Parallel BZIP2: pbzip2
I constantly find myself compressing files for archival purposes. Today I was sitting, waiting on one such compression on an SMP box, and thought it seemed silly that bzip2 does not use more than one CPU. A quick flip through the bzip2 man page turned up no way to make it use more than one core. A quick web search yielded pbzip2 (http://compression.ca/pbzip2/), a separate project that does let you use more than one CPU for compression and decompression of bzip2 files. A quick test showed a huge reduction in compression time, from about 95 seconds on one core to under 4 seconds on 64:
Using 1 Core:
[user@server source]$ pbzip2 -p1 -v -m1000 source.tar
Parallel BZIP2 v1.1.1 - by: Jeff Gilchrist [http://compression.ca]
[Apr. 17, 2010] (uses libbzip2 by Julian Seward)
# CPUs: 1
BWT Block Size: 900 KB
File Block Size: 900 KB
Maximum Memory: 1000 MB
-------------------------------------------
File: 1 of 1
Input Name: source.tar
Output Name: source.tar.bz2
Input Size: 445163520 bytes
Compressing data (no threads)...
Output Size: 80063773 bytes
-------------------------------------------
Wall Clock: 95.100318 seconds
Using 64 Cores:
[user@server source]$ pbzip2 -p64 -v -m1000 source.tar
Parallel BZIP2 v1.1.1 - by: Jeff Gilchrist [http://compression.ca]
[Apr. 17, 2010] (uses libbzip2 by Julian Seward)
# CPUs: 64
BWT Block Size: 900 KB
File Block Size: 900 KB
Maximum Memory: 1000 MB
-------------------------------------------
File: 1 of 1
Input Name: source.tar
Output Name: source.tar.bz2
Input Size: 445163520 bytes
Compressing data...
Output Size: 80063773 bytes
-------------------------------------------
Wall Clock: 3.795763 seconds
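To put a number on the difference, here is a small Python sketch that works out the speedup and the per-core efficiency from the two wall-clock times above. The variable names are my own, and the only inputs are the figures pbzip2 printed.

```python
# Wall-clock times copied from the two pbzip2 runs above.
t_1core = 95.100318   # seconds, run with -p1
t_64core = 3.795763   # seconds, run with -p64
cores = 64

# Speedup: how many times faster the 64-core run finished.
speedup = t_1core / t_64core

# Parallel efficiency: fraction of the ideal 64x speedup actually achieved.
efficiency = speedup / cores

print(f"speedup: {speedup:.1f}x")
print(f"parallel efficiency: {efficiency:.0%}")
```

That comes out to roughly a 25x speedup, or about 39% of the ideal 64x. These two runs alone do not show where the rest of the time goes, so treat this as a single data point rather than a benchmark.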
Needless to say, I’ve found a new utility for my compression needs.
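For the curious, the general idea behind compressing bzip2 data on several cores can be sketched in a few lines of Python. This is not pbzip2's actual code, just a minimal illustration under my own assumptions: the input is split into independent chunks, each chunk is compressed as its own bzip2 stream, and the streams are concatenated. The function names, the worker count, and the 900,000-byte chunk size (chosen to mirror the "File Block Size: 900 KB" line in the output above) are all hypothetical. Threads are used here on the assumption that CPython's bz2 module releases the GIL while compressing.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size, picked to mirror "File Block Size: 900 KB" above.
CHUNK_SIZE = 900 * 1000


def compress_chunk(chunk: bytes) -> bytes:
    """Compress one chunk as a complete, standalone bzip2 stream."""
    return bz2.compress(chunk, compresslevel=9)


def parallel_bzip2(data: bytes, workers: int = 4) -> bytes:
    """Compress data chunk by chunk on several threads.

    The result is a concatenation of independent bzip2 streams.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the streams line up.
        return b"".join(pool.map(compress_chunk, chunks))


# Round trip: bz2.decompress accepts concatenated streams (Python 3.3+).
payload = b"some repetitive archive data\n" * 200_000
packed = parallel_bzip2(payload)
assert bz2.decompress(packed) == payload
print(len(payload), "->", len(packed), "bytes")
```

Because every chunk is a valid bzip2 stream on its own, the concatenated result can be read back by a decompressor that handles multi-stream input, which is what makes the chunks safe to compress independently.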
If you like pbzip2, also check out lbzip2 and plzip. There is a benchmark comparison at http://vbtechsupport.com/1614/. It would be interesting if you tested them with 64 cores!