A co-worker and I were discussing whether extracting ZIP files with multiple threads could outperform the typical single-threaded extraction that most (if not all) mainstream archive utilities use. Given Java’s excellent built-in ZIP support, it seemed fairly trivial to give this a shot, and so UberZip was born.
UberZip is my simple little sample Java command-line application that extracts ZIP files using however many threads you specify. My single biggest headache is extracting Eclipse from the ZIP file it ships in, so I decided that would be an ideal test for my little program. I downloaded the Eclipse J2EE bundle (3.1.1), which is a happy 132 meg with thousands of files.
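The core idea can be sketched in a few lines of Java. This is not UberZip’s actual source; it is a minimal sketch assuming a hypothetical `ParallelUnzip` class whose `extract(zip, dest, threads)` method submits one task per file entry to a fixed-size thread pool. It relies on `java.util.zip.ZipFile`, whose OpenJDK implementation lets several entry streams be read from different threads at once, so decompression is spread across the workers. The sketch also adds a check (not part of the original description) that rejects entries whose paths would escape the destination directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch of multithreaded ZIP extraction; not UberZip's real code.
public class ParallelUnzip {

    /** Extracts every entry of zipPath into destDir using `threads` workers; returns files written. */
    public static int extract(Path zipPath, Path destDir, int threads) throws Exception {
        Path root = destDir.toAbsolutePath().normalize();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            List<Future<?>> pending = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                // One task per file entry: each worker opens its own entry stream,
                // inflates it, and writes the result to disk.
                pending.add(pool.submit(() -> {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    return null;
                }));
            }
            // Wait for every task; get() rethrows any worker failure.
            for (Future<?> f : pending) {
                f.get();
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: java ParallelUnzip <file.zip> <destDir> [threads]");
            return;
        }
        int threads = args.length > 2 ? Integer.parseInt(args[2]) : 1;
        long start = System.nanoTime();
        int count = extract(Paths.get(args[0]), Paths.get(args[1]), threads);
        System.out.printf("%d files with %d thread(s) in %.2f s%n",
                count, threads, (System.nanoTime() - start) / 1e9);
    }
}
```

Running something like `java ParallelUnzip eclipse.zip out 5` would print the file count and elapsed time. Submitting one task per entry keeps the code simple; for archives with many tiny files, batching several entries per task might reduce scheduling overhead.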
I still need to test on a machine that can make better use of multiple threads, but here are the stats for my AMD Athlon 64 3200+ (a single-core chip, so I only see a very slight benefit) running Windows XP:
14.7 seconds with 1 thread
13.4 seconds with 5 threads
11.6 seconds with 30 threads
Anything above 30 threads seems to actually create more overhead than gain.
Now, to be fair, I compared this against the fastest unzipper I know of, 7-Zip (http://www.7-zip.org). It took about 14 seconds to unzip the file in 7-Zip, which falls almost exactly in line with UberZip’s single-threaded run.
For those of you who are interested in looking at this very simple example, I have committed the source to my public repository:
And if you’d rather just download the JAR or EXE, I have uploaded copies for those of you who would like to unzip files “uber” fast. 🙂
I then tried it out on my work machine (two dual-core 3.2 GHz Xeon processors and 4 gig of memory, which Windows reports as 8 processors) and got the following stats:
7-Zip – 25 seconds
UberZip (1 thread) – 22.17 seconds
UberZip (5 threads) – 19.5 seconds
UberZip (30 threads) – 21.6 seconds
Oddly, about 5 threads seems to be the “sweet spot” for this machine. Some of the results are a bit strange, and the hard drives on this machine don’t seem to perform quite as well as the ones in my home machine, but the numbers suggest that with the right thread count, multithreading can deliver a real gain over the single-threaded extractors out there.
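One way to hunt for that sweet spot more systematically is to time the same job at several thread counts and keep the best of a few runs per count, which smooths out some of the disk and OS file-cache noise that can make results look strange. The sketch below assumes a hypothetical `ThreadSweep` class with a `Job` interface that receives the thread count; the placeholder job in `main` just sleeps and would need to be replaced with a real extraction call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical thread-count sweep harness; the Job is whatever you want to time.
public class ThreadSweep {

    /** A unit of work parameterized by thread count, e.g. one full extraction. */
    @FunctionalInterface
    public interface Job {
        void run(int threads) throws Exception;
    }

    /** Runs job `runs` times per thread count and keeps the best wall-clock time in seconds. */
    public static Map<Integer, Double> sweep(int[] threadCounts, int runs, Job job) throws Exception {
        Map<Integer, Double> best = new LinkedHashMap<>(); // preserves the order tested
        for (int threads : threadCounts) {
            double min = Double.MAX_VALUE;
            for (int r = 0; r < runs; r++) {
                long start = System.nanoTime();
                job.run(threads);
                min = Math.min(min, (System.nanoTime() - start) / 1e9);
            }
            best.put(threads, min);
        }
        return best;
    }

    /** Returns the thread count with the lowest recorded time (-1 if the map is empty). */
    public static int fastest(Map<Integer, Double> timings) {
        int bestThreads = -1;
        double bestTime = Double.MAX_VALUE;
        for (Map.Entry<Integer, Double> e : timings.entrySet()) {
            if (e.getValue() < bestTime) {
                bestTime = e.getValue();
                bestThreads = e.getKey();
            }
        }
        return bestThreads;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder job: swap in a call that extracts the archive into a
        // fresh directory using the given number of threads.
        Job job = threads -> Thread.sleep(10);
        Map<Integer, Double> timings = sweep(new int[] {1, 5, 30}, 3, job);
        timings.forEach((t, s) -> System.out.printf("%2d threads: %.3f s%n", t, s));
        System.out.println("Sweet spot: " + fastest(timings) + " threads");
    }
}
```

Each run should extract into a fresh or emptied directory so every repetition does the same work. After the first run the ZIP file is likely sitting in the OS file cache, so later runs mostly measure decompression and writing rather than reading the archive.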
Perhaps someone else will find this information useful and turn UberZip into a product people can use. 😉
I’ve completely rewritten this functionality in Scala and posted it on GitHub: https://github.com/outr/uberzip. It’s faster than ever and more cleanly written. Tested on the latest Eclipse ZIP (320 meg), it unzipped 2,981 files in 0.73 seconds; the same test on the same machine with Linux unzip took 1.7 seconds.