
Archiving HTTP log files: bzip2 versus gzip

The bzip2 compressed files are half the size of the gzip compressed files

The servers I work with generate fairly large HTTP log files, which we archive for later analysis. Each month a typical log file (tens of millions of 'hits') gets compressed. I used to do this with gzip, but recently moved over to bzip2 because I had heard it produces smaller files. Before 'trans-compressing' all of the archives to bz2 files, though, in the spirit of "measure before you optimize", I tested it on five files from last year. The results:

File size in megabytes (% of original)

File    txt      gz            bz2
A       6,832    403 (5.9%)    203 (3.0%)
B       6,145    366 (6.0%)    184 (3.0%)
C       9,092    513 (5.6%)    257 (2.8%)
D       7,146    418 (5.8%)    208 (2.9%)
E       7,675    448 (5.8%)    223 (2.9%)

While bzip2 takes much, much longer than gzip to compress a file, the resulting files are about half the size of the gzip files, and a stunning 3% of the size of the originals. That means I can file away twice as many of these logs on a single CD, which is very handy!
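
For anyone who wants to repeat the measurement on their own logs, here is a minimal sketch in Python using only the standard library. It is not the procedure used for the table above (that was done with the gzip and bzip2 tools themselves); the function names `compressed_sizes` and `transcompress` and the chunk size are hypothetical choices made for illustration. Note that `compressed_sizes` holds the whole input in memory, so for multi-gigabyte logs it is only suitable for a sample of the file.

```python
import bz2
import gzip
import shutil
from pathlib import Path


def compressed_sizes(data: bytes) -> dict:
    """Return the raw, gzip and bzip2 sizes of `data`, in bytes.

    Both compressors run at level 9 (their strongest setting).
    """
    return {
        "raw": len(data),
        "gz": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
    }


def transcompress(src_gz: Path, dst_bz2: Path, chunk_size: int = 1 << 20) -> None:
    """Recompress a .gz file as .bz2 without writing the raw text to disk.

    The data is streamed in chunks of `chunk_size` bytes (an assumed
    default of 1 MiB), so memory use stays small even for huge logs.
    """
    with gzip.open(src_gz, "rb") as src, bz2.open(dst_bz2, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Calling `compressed_sizes` on a few megabytes read from a log gives a quick estimate of the ratio before committing to the (slow) conversion, and `transcompress(Path("access.log.gz"), Path("access.log.bz2"))` converts one archive; the file names here are placeholders.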

(Yes, some of us geeks really do spend time and effort thinking about such weighty topics.)


