This is a continuation (without prize money) of the Calgary Compression Challenge, a contest run by Leonid A. Broukhis from May 21, 1996 through May 21, 2016. The goal of the contest is to produce the smallest possible archive containing either the 14-file Calgary corpus, or a program that, when run taking input only from other files in the archive (if any), outputs the 14-file Calgary corpus.
Size | Date | Author |
---|---|---|
759881 | Sep 1997 | Malcolm Taylor |
692154 | Aug 2001 | Maxim Smirnov |
680558 | Sep 2001 | Maxim Smirnov |
653720 | Nov 2002 | Serge Voskoboynikov |
645667 | Jan 10, 2004 | Matt Mahoney |
637116 | Apr 2, 2004 | Alexander Rhatushnyak |
608980 | Dec 31, 2004 | Alexander Rhatushnyak |
603416 | Apr 4, 2005 | Przemysław Skibiński |
596314 | Oct 2005 | Alexander Rhatushnyak |
593620 | Dec 3, 2005 | Alexander Rhatushnyak |
589863 | May 2006 | Alexander Rhatushnyak |
580170 | Jul 2, 2010 | Alexander Rhatushnyak |
Submissions must improve on the previous best result by at least 1000 bytes.
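As a quick illustration of the 1000-byte rule, the following minimal sketch checks a candidate size against the last leaderboard entry. The names `CURRENT_BEST`, `MIN_IMPROVEMENT`, and `qualifies` are made up for this example and are not part of the contest.

```python
# Last entry in the table above (Jul 2, 2010).
CURRENT_BEST = 580170
# Minimum required improvement in bytes, from the rule above.
MIN_IMPROVEMENT = 1000

def qualifies(size, best=CURRENT_BEST):
    """Return True if `size` beats `best` by at least MIN_IMPROVEMENT bytes."""
    return best - size >= MIN_IMPROVEMENT

if __name__ == "__main__":
    # Largest size that still qualifies against the current best.
    print(CURRENT_BEST - MIN_IMPROVEMENT)
```

Under this reading, a submission of 579,170 bytes or less would qualify against the current best of 580,170.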
An archive is a file or set of files that may be processed by any of the following: unzip, bunzip2, unrar, or PPMd var. I. Effective June 1, 2017, 7zip and zpaq are also allowed. If submitting more than one file, then the size of the archive is calculated as the sum of the file sizes, plus the lengths of the file names, plus 4 bytes per file.
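The size calculation for a multi-file submission can be sketched as follows. This is only an illustration under two assumptions that the rule does not spell out: the file name length is taken as the byte length of the base name (without directories), and a single-file submission is counted as its file size alone. The function name `archive_size` is hypothetical.

```python
import os

# Bytes charged per file in a multi-file submission, from the rule above.
PER_FILE_OVERHEAD = 4

def archive_size(paths):
    """Return the contest size of a submission made of the given files.

    Assumptions (not stated in the rules): name length is the byte length
    of the base name, and a single file is charged its size only.
    """
    paths = list(paths)
    if len(paths) == 1:
        return os.path.getsize(paths[0])
    total = 0
    for path in paths:
        name = os.path.basename(path)
        total += os.path.getsize(path) + len(name.encode()) + PER_FILE_OVERHEAD
    return total
```

For example, two files named `a.bin` (10 bytes) and `bb.bin` (20 bytes) would count as (10 + 5 + 4) + (20 + 6 + 4) = 49 bytes.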
A program is a 32- or 64-bit Linux or Windows executable program or a source program written in C, C++, or Perl. It must run to completion in 6 hours or less on a Core i7 M620 with 4 GB memory. If the archive contains one or more other files, then the program will be run once for each file with the file name passed as a command line argument. Otherwise it will be run with no arguments. The program must not take any input other than from the file whose name is passed to it.
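The invocation rule can be sketched as a function that lists the command lines that would be run. This is an illustration of the rule as written, not the actual test procedure; the names `invocations` and `TIME_LIMIT_SECONDS` are hypothetical, and whether the 6-hour limit applies per run or in total is not specified above.

```python
# 6 hours expressed in seconds; how the limit is applied across
# multiple runs is an open assumption, so it is only recorded here.
TIME_LIMIT_SECONDS = 6 * 60 * 60

def invocations(program, data_files):
    """Return the argument lists with which `program` would be run.

    With no other files in the archive, the program runs once with no
    arguments. Otherwise it runs once per file, with that file's name
    as its only argument.
    """
    if not data_files:
        return [[program]]
    return [[program, name] for name in data_files]
```

For instance, an archive holding a hypothetical `decomp` program plus `x.dat` and `y.dat` would yield two runs: `decomp x.dat` and `decomp y.dat`.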
The Calgary corpus is the following set of 14 files:
Size | Name |
---|---|
111,261 | bib |
768,771 | book1 |
610,856 | book2 |
102,400 | geo |
377,109 | news |
21,504 | obj1 |
246,814 | obj2 |
53,161 | paper1 |
82,199 | paper2 |
513,216 | pic |
39,611 | progc |
71,646 | progl |
49,379 | progp |
93,695 | trans |
The concatenation of these files in alphabetical order by name (as shown) to a single file of size 3,141,622 bytes has the following hashes (as computed by fsum):
Hash | Value |
---|---|
md5 | 1b62b5d5c9536368b0b691fd9a41a536 |
sha-1 | 937b489e26962b094aff0547e7b34c02eac1b0f5 |
sha-256 | 3a1586fb28c0d9b767e561b604092ce73336cd3eedc5df0f29c9db1a63f0f124 |
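One way to check a decompressed corpus against these values is sketched below, using Python's standard `hashlib` instead of fsum. It hashes the 14 files in alphabetical order as if they were concatenated, without writing the combined file. The function names and the directory argument are assumptions for illustration; the expected size and hashes are copied from above.

```python
import hashlib
import os

CORPUS_FILES = ["bib", "book1", "book2", "geo", "news", "obj1", "obj2",
                "paper1", "paper2", "pic", "progc", "progl", "progp", "trans"]
EXPECTED_SIZE = 3141622
EXPECTED = {
    "md5": "1b62b5d5c9536368b0b691fd9a41a536",
    "sha1": "937b489e26962b094aff0547e7b34c02eac1b0f5",
    "sha256": "3a1586fb28c0d9b767e561b604092ce73336cd3eedc5df0f29c9db1a63f0f124",
}

def corpus_digests(directory, names=CORPUS_FILES):
    """Hash the named files in alphabetical order as one byte stream.

    Returns (total_size, {algorithm: hex_digest}).
    """
    hashers = {alg: hashlib.new(alg) for alg in EXPECTED}
    size = 0
    for name in sorted(names):
        with open(os.path.join(directory, name), "rb") as f:
            # Read in 64 KiB chunks to avoid loading whole files at once.
            for chunk in iter(lambda: f.read(1 << 16), b""):
                size += len(chunk)
                for h in hashers.values():
                    h.update(chunk)
    return size, {alg: h.hexdigest() for alg, h in hashers.items()}

def verify(directory):
    """Return True if the files in `directory` match the size and hashes above."""
    size, digests = corpus_digests(directory)
    return size == EXPECTED_SIZE and digests == EXPECTED
```

Calling `verify("calgary")` on a directory holding the 14 files (the directory name here is only an example) should return `True` for an exact reconstruction.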
I reserve the right to change these rules or to reject submissions not in keeping with the spirit of the contest.
Send your submission to Matt Mahoney at mattmahoneyfl at gmail.com. If accepted, I will post it and add your name to the leaderboard.
May 19, 2016. Created this page in taking over the contest.
Apr. 3, 2017. Added 7zip and zpaq to the list of allowable archive formats, effective June 1, 2017. A decompression program is optional. Calgary corpus defined.