Large Text Compression Benchmark

Matt Mahoney
Last update: July 9, 2014.

This competition ranks lossless data compression programs by the compressed size (including the size of the decompression program) of the first 10^9 bytes of the XML text dump of the English version of Wikipedia on Mar. 3, 2006.

The goal of this benchmark is not to find the best overall compression program, but to encourage research in artificial intelligence and natural language processing (NLP). A fundamental problem in both NLP and text compression is modeling: the ability to distinguish between high probability strings like "recognize speech" and low probability strings like "reckon eyes peach".
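The link between modeling and compression can be stated with a standard information-theoretic identity (a textbook result, not specific to any program in this benchmark): a model that assigns probability P(x) to a string x can encode it, for example with arithmetic coding, in about -log2 P(x) bits. By the chain rule, the cost of a whole text is the sum of the costs of predicting each symbol from the ones before it, so a model that ranks "recognize speech" above "reckon eyes peach" also codes it in fewer bits.

```latex
L(x) \approx -\log_2 P(x) \text{ bits}, \qquad
-\log_2 P(x_1 x_2 \cdots x_n) = \sum_{i=1}^{n} -\log_2 P(x_i \mid x_1 \cdots x_{i-1})

P(\text{recognize speech}) > P(\text{reckon eyes peach})
\;\Longrightarrow\;
L(\text{recognize speech}) < L(\text{reckon eyes peach})
```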

This is an open benchmark. Anyone may contribute results. Please read the rules first.

Compression improvements to the first 10^8 bytes are eligible for the Hutter Prize, with 50,000 euros of funding.
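Both test files are plain byte prefixes of the same dump: enwik8 is the first 10^8 bytes and enwik9 the first 10^9 bytes. The sketch below shows one way to cut such a prefix (equivalent to `head -c N`). The function name `copy_prefix` is hypothetical, and the dump file name in the commented usage is an assumption, not taken from this page.

```python
import io

ENWIK8_BYTES = 10**8  # enwik8: first 10^8 bytes of the dump
ENWIK9_BYTES = 10**9  # enwik9: first 10^9 bytes of the dump


def copy_prefix(src, dst, n, chunk_size=1 << 20):
    """Copy exactly the first n bytes of binary stream src to dst (like `head -c n`).

    Reads in chunks so the 10^9-byte case never needs the whole file in memory.
    Raises ValueError if src ends before n bytes have been copied.
    """
    remaining = n
    while remaining > 0:
        block = src.read(min(chunk_size, remaining))
        if not block:
            raise ValueError(f"source ended {remaining} bytes short of the requested prefix")
        dst.write(block)
        remaining -= len(block)
    return n


# Self-check on a small in-memory stand-in for the XML dump.
sample = b"<mediawiki><page><title>Example</title></page></mediawiki>\n" * 20
out = io.BytesIO()
copy_prefix(io.BytesIO(sample), out, 100)

# Hypothetical usage on the real dump (the input file name is an assumption):
# with open("enwiki-20060303-pages-articles.xml", "rb") as src, open("enwik8", "wb") as dst:
#     copy_prefix(src, dst, ENWIK8_BYTES)
```

Copying byte-for-byte, rather than by lines or XML elements, matters here: the benchmark definition cuts the file at an exact byte count, even if that falls in the middle of an article.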

Benchmark Results

Compressors are ranked by the compressed size of enwik9 (10^9 bytes) plus the size of a zip archive containing the decompresser. Options are selected for maximum compression at the cost of speed and memory. Other data in the table does not affect rankings. This benchmark is for informational purposes only. There is no prize money for a top ranking. Notes about the table:

                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
cmix v3                                       15,809,519  125,971,560    274,992 x  126,246,552 267978 266622 26681 CM 66
durilca'kingsize  -m13000 -o40 -t2            16,209,167  127,377,411    407,477 xd 127,784,888   1398  1797 13000 PPM 31
paq8hp12any       -8                          16,230,028  132,045,026    330,700 x  132,375,726  37660 37584 1850 CM   41
paq8pxd_v10fix    -8                          16,607,760  134,780,308     72,382 s  134,852,690  37177 54433 1633 CM   48
zpaq 6.42         -m s10.0.5fmax6             17,855,729  142,252,605      4,760 sd 142,257,365   6699 14739 14000 CM  61

drt|lpaq9m        9                           17,964,751  143,943,759    110,579 x  144,054,338    868   898 1542 CM   41
nanozip 0.09a     w32c -cc -m3g -nm           18,723,846  150,037,341          0 xd 150,037,341   1110  1084 2693 CM   40
xwrt 3.2    -l14 -b255 -m96 -s -e40000 -f200  18,679,742  151,171,364     52,569 s  151,223,933   2537  2328 1691 CM
fp8 v3            -8                          18,438,169  153,188,176     50,068 s  153,238,244  20605 22593 1192 CM   26
WinRK 3.03        pwcm +td 800MB SFX          18,612,453  156,291,924     99,665 xd 156,391,589  68555        800 CM   10
10

ppmonstr J        -m1700 -o16                 19,055,092  157,007,383     42,019 x  157,049,402   3574 ~3600 1700 PPM
slim 23d          -m1700 -o12                 19,077,276  159,772,839     69,453 x  159,842,292   5232 ~5400 1700 PPM
bwmonstr 0.02                                 20,307,295  160,468,597     69,401 x  160,537,998 331801 156147 590 BWT  30
zcm 0.92          -m8 -t1                     19,700,970  160,848,578    225,205 x  161,073,783    489   474 2400 CM   48
nanozipltcb 0.09                              20,537,902  161,581,290    133,784 x  161,715,074     64    30 3350 BWT  40

M03 1.1b          1000000000                  20,710,197  163,667,431     50,468 x  163,717,899    457   406 5735 BWT  52
bcm 0.14          c1000                       20,736,614  163,885,873     74,569 x  163,960,442    162   153 5000 BWT  60
bsc 2.00          -b1000p                     20,789,147  163,888,465    122,581 s  164,011,046    237   199 5095 BWT  39
bbb               m1000                       20,847,290  164,032,650     11,227 s  164,043,877   4524  2619 1401 BWT
mcm v0.3          -9                          19,707,487  164,464,527    122,205 x  164,586,732   1387  1435 1452 CM   26
20

paq9a             -9                          19,974,112  165,193,368     13,749 s  165,207,117   3997  4021 1585 CM
uda 0.300                                     19,393,460  166,272,261     11,264 x  166,283,525  25282 25174  180 CM
BWTmix v1         c10000                      20,608,793  167,852,106      9,565 x  167,861,671   1794   690 5000 BWT  49
lrzip 0.612       -z -L 9 -p 1                19,847,690  169,318,794     99,363 x  169,418,157   2987  2929 2700 CM   33
cm4_ext                                       20,188,048  170,566,799    204,782 x  170,771,581   4123  4130 1906 CM   26

M1x2 v0.6         7 enwik7.txt                20,723,056  172,212,773     38,467 s  172,251,240    711   715 1051 CM   26
cmm4 v0.1e        96                          20,569,034  172,669,955     31,314 x  172,701,269   2052  2056 1321 CM
ccmx 1.30         7                           20,857,925  174,142,092     15,014 x  174,157,106   1313  1338 1332 CM
bit 0.7           -p=5                        20,823,204  174,425,039     62,493 x  174,487,532   2050  2100  663 CM   26
mcomp 2.00        -mw -M320m                  21,103,670  174,388,351    172,531 x  174,560,882    473   399 1643 BWT  26
30

epmopt|epm r9     -m800 -n20 --fixedorder:12  19,713,502  174,817,424    141,101 x  174,958,525   3179  3376  800 PPM
WinUDA 2.91       mode 3 (194 MB)             20,332,366  174,975,730     17,203 x  174,992,933  23610 23473  194 CM
dark 0.51         -b333mf                     21,169,819  175,471,417     34,797 x  175,506,214    533   453 1692 BWT
FreeArc 0.40pre-4 -mppmd:1012m:o13:r1         20,931,605  175,254,732    748,202 x  176,002,934   1175  1216 1046 PPM
hook v1.4         1700                        21,990,502  176,648,663     37,004 x  176,685,667    741   695 1777 DMC  26

7zip 4.46a        -m0=ppmd:mem=1630m:o=10 ... 21,197,559  178,965,454          0 xd 178,965,454    503   546 1630 PPM  23
ash 04a           /m700 /o10                  19,963,105  180,735,542     11,137 x  180,746,679   6100  5853  700 CM
pimple2                                       20,871,457  180,251,530     78,642 x  180,330,172  18474 17992  128 CM
tree 0.9                                      22,366,748  181,324,992      7,104 sd 181,332,096  70723    15 1850 Dict 64
ocamyd LTCB 1.0   -s0 -m3                     21,285,121  182,359,986     21,030 x  182,381,016 108960~110000 300 DMC   6
40

bee 0.79 b0154    -m3 -d8                     20,975,994  182,373,904     57,046 x  182,430,950   9295  9285  512 PPM
st                -b                          21,589,955  182,668,405     13,724 x  182,682,129   1344  1356 1810 PPM  26
uhbc 1.0          -m3 -b100m                  20,930,838  182,918,172     56,242 x  182,974,414   1569   809  800 BWT
smac 1.20                                     21,781,544  183,190,888      4,356 x  183,195,244   4249  4399 1542 CM   26
ppmd J1           -m256 -o10 -r1              21,388,296  183,964,915     11,099 s  183,976,014    880   895  256 PPM

tc 5.2 dev 2                                  21,481,399  184,939,711     41,112 x  184,980,823   3637  3655  230 CM
rings 2.0         -m7 -t1                     21,194,965  185,256,848    164,995 x  185,421,843    375   206  493 BWT  26
bwtsdc v1                                     23,414,955  185,709,858      8,421 s  185,718,279   2100   420 5213 BWT  47
fbc v1.1          333333334                   22,554,133  185,975,548     23,576 x  185,999,124    451   415 1647 BWT  55
ppmvc v1.1        -m256 -o8 -r1               21,484,294  186,208,405     25,241 x  186,233,646    898   913  272 PPM
50

chile 0.4         -b=244141                   22,218,917  186,979,614     11,530 s  186,991,144   2513   512 1426 BWT
bwtdisk 0.9.0     -b 2 -m 3500                24,725,277  190,004,306    169,579 s  190,173,885   1124       3500 BWT  48
CTXf 0.75 pre b1  -me                         22,072,783  191,008,871     57,337 x  191,066,298   1112  1037   78 PPM
m03exp 2005-02-15 32MB blocks                 21,948,192  191,250,500     44,593 x  191,295,093  ~4800 ~2100  256 BWT
Stuffit 12.0.0.17 -m=4 -l=16 -x=30            22,105,654  190,372,707  2,658,122 xd 193,030,829    628   658 1062 PPM

plzma v3b         c2 ... (see below)          24,206,571  193,240,160    101,221 x  193,341,381   8889    55 10110 LZ77 58
crook v0.1        -m1600 -O8                  22,503,627  193,333,159      8,539 s  193,341,698    483   513 1641 PPM  26
ppmx 0.03                                     22,572,808  193,643,464     54,964 x  193,698,428    777   784  609 PPM  26
lzturbo 1.1       -49 -b1000 -p0              24,416,777  194,681,713    110,670 x  194,792,383   1920     9 14700 LZ77 59
enc 0.15          aq                          22,156,982  195,604,166     94,888 x  195,699,054   6843  6868   50 CM
60

comprolz 0.11.0-bugfix1  -b250 -f             22,813,215  196,651,379     29,453 x  196,680,832    984   308  688 ROLZ 26
sbc 0.970r2       -ad -m3 -b63                22,470,539  197,066,203     99,094 xd 197,165,297   1733   313  224 BWT
WinRAR 3.60b3     -mc7:128t+ -sfxWinCon.sfx   22,713,569  198,454,545          0 xd 198,454,545    506   415  128 PPM
quark v0.95r beta -m1 -d25 -l8                22,988,924  198,600,023     80,264 x  198,680,287  27952   217  534 LZ77
lzip 1.14-rc3     -9 -s512MiB                 24,756,063  199,410,543     21,682 s  199,432,225   2409    21 5632 LZ77 57

comprox 0.11.0-bugfix1 -b250 -f -m100         23,064,386  199,515,912     34,176 x  199,550,088    917   153  688 LZ77 26
bssc 0.95 alpha   -b16383                     23,117,061  201,810,709     45,489 x  201,856,198    578   217  140 BWT   4
flashzip 1.0.0    -mx7 -b7                    23,869,034  202,363,445    123,053 x  202,486,498   1296   122  802 ROLZ 26
lzham alpha 3 x64 -m4 -d29                    24,954,329  206,393,809    155,282 x  206,549,091    595     9 4800 LZ77 45 
uharc 0.6b        -mx -md32768                23,911,123  208,026,696     73,608 xd 208,100,304   1666  1330   50 PPM
70

TarsaLZP Jan 29 2012                          24,751,389  208,867,187     13,081 s  208,880,268    203      ~2000 LZP  54
GRZipII 0.2.4     -b8m                        23,846,878  208,993,966     41,645 s  209,035,641    312   216   58 BWT
4x4 0.2a          4t (grzip:m1:h18)           23,833,244  208,787,642    317,097 x  209,104,739    386   240  269 BWT
rzm 0.07h                                     24,361,070  210,126,103     17,667 x  210,143,770   2336    81  160 ROLZ
pim 2.50          best                        24,303,638  210,124,895    330,901 x  210,455,796    764  ~764   88 PPM

xz 5.0.1          -9 -e                       24,831,648  211,776,220    103,692 x  211,879,912   2482    36  660 LZ77 26
CTW 0.1           -d6 -n16M -f16M             23,670,293  211,995,206     43,247 x  212,038,452  19221 19524  144 CM
boa 0.58b         -m15                        24,322,643  213,845,481     55,813 x  213,901,294   3953 ~4100   17 PPM
packet 1.1        -mx -b512 -h4               25,348,872  213,722,850    265,102 x  213,987,952    767    26 1500 LZ77 48
yxz 0.11          -m9 -b7 -h6                 25,754,856  214,317,684    131,062 x  214,448,746    642    77 1590 LZ   26
80

tornado 0.6       -16                         25,768,105  217,749,028     83,694 s  217,832,722   1482     9 1290 LZ77 48
LZPXj 1.2h        9                           25,205,783  217,880,584      4,853 s  217,885,437    783   717 1316 PPM  
scmppm 0.93.3     -l 9                        25,198,832  217,867,392     37,043 s  217,904,435    708   644   20 PPM
acb 2.00c         u                           25,063,656  218,473,968     38,976 x  218,512,944  10656 10883   16 LZ77 26
crushm                                        25,013,576  218,656,416     30,097 x  218,686,513    617   649   39 CM   26

PX v1.0                                       24,971,871  219,091,398      3,054 s  219,094,452   1838  1809   66 CM    3
DGCA 1.10         default+SFX                 25,203,248  219,655,072          0 xd 219,655,072    858   270   76
Squeez 5.20.4600  sqx2.0 32MB Ultra           25,118,441  220,004,873     91,019 xd 220,095,892   2575   116  365
fpaq2                                         25,287,775  221,242,386      3,429 s  221,245,815  20183 20186  131 CM
TinyCM 0.1        9                           25,913,605  221,773,542     12,553 x  221,786,095   1342  1330 1083 CM   26
90

dmc               c 1800000000                25,320,517  222,605,607      2,220 s  222,607,827    676   721 1800 DMC
szip 1.12a        -b41o16                     26,120,472  227,586,463     31,708 x  227,618,171   1191   289   21 BWT  26
balz 1.13         ex                          26,421,416  228,337,644     49,024 x  228,286,668   3700   190  206 ROLZ
lzpm 0.11         9                           26,501,542  229,083,971     46,824 x  229,130,795  15395    57  740 ROLZ
qazar 0.0pre5     -l7 -d9 -x7                 26,455,170  229,846,871     71,959 x  229,918,830   5738   903  105 LZP

csc32 final       -m3 -d512                   26,842,072  229,929,654     53,665 s  229,983,319    423    47  660 LZ77 26
KuaiZip 2.3.2 x86                             25,895,915  227,905,650  3,857,649 x  231,763,299   1061    47  197 LZ77 26
qc 0.050          -8                          26,763,343  232,784,501     46,100 x  232,830,601   8218  1503  151
ppms J            -o5                         26,310,248  233,442,414     16,467 x  233,458,881    330   354  1.8 PPM
dzo beta                                      26,616,115  235,056,859    618,883 x  235,675,742   1088   159  200 LZ77 26
100

comprox_ba 20110929                           27,828,189  242,846,243      4,134 s  242,850,377    397   101  226 BWTS 48
WinTurtle 1.60    512 MB buffer               28,379,612  245,217,944    160,090 x  245,378,034    273   237  583 PPM
diz                                           26,545,256  246,679,382     12,945 s  246,692,327  21240 22746 1350 PPM  26
lza x64 0.10      -mx5 -b7 -h7                28,835,165  246,671,312    259,425 x  246,930,737    265    12 1800 LZ77 48
cabarc 1.00.0601  -m lzx:21                   28,465,607  250,756,595     51,917 xd 250,808,853   1619    15   20 LZ77

sr3                                           28,926,691  253,031,980      9,399 s  253,054,625    148   160   68 SR   26
bzip2 1.0.2       -9                          29,008,736  253,977,839     30,036 x  254,007,875    379   129    8 BWT
RangeCoderC v1.7  c7 26                       28,788,013  254,527,369      7,858 x  254,535,227   2460  2436 1116 CM   26
quad v1.11        -x                          29,110,579  256,145,858     13,387 s  256,159,245    956   116   34 ROLZ
WinACE            -sfx -m5 -d4096             29,481,470  257,237,710          0 xd 257,237,710   1080    77    4
110

RH4_x64 22Mar2014 c6                          29,553,289  258,411,625     79,155 x  258,490,780    301    12   27 ROLZ 48
lzsr 0.01                                     29,433,834  258,912,605     40,287 x  258,952,892    194    88    6 LZ77 26
xpv5              c2                          29,963,217  262,525,246     14,371 x  262,539,617   2359   516    9 ROLZ 26
sr3c 1.0                                      29,731,019  266,035,006      7,701 x  266,042,707    160   145    5 SR   26
lzc v0.08         10                          30,611,315  266,565,255     11,364 x  266,576,619    302    63  550 LZ77

libzling 20140430-bugfix e4                   30,707,022  268,793,105     32,148 s  268,825,253     40    10   27 ROLZ 48
crush 1.00        cx                          31,731,711  279,491,430      2,489 s  279,493,919    948   2.9  148 LZ77 60
bzp 0.2                                       31,563,865  283,908,295     36,808 x  283,945,103    110   120    3 LZP
ha 0.98           a2                          31,250,524  285,739,328     28,404 x  285,767,732   2010  1800  0.8 PPM
irolz                                         33,310,676  292,448,365      4,584 s  292,452,949    274   144   17 ROLZ 26
120

lcssr 0.2         -b7 -l9                     34,549,048  296,160,661      8,802 x  296,169,463   8186  8281 1184 SR
zlite                                         33,975,840  298,470,807      4,880 s  298,475,687     61    28   36 ROLZ 26
lazy 1.00         5                           35,024,082  306,245,949      5,986 s  306,251,935    273    24   96 LZ77 26
zhuff 0.97 beta   -c2                         34,907,478  308,530,122     63,209 x  308,593,331     24   3.5   32 LZ77 48
slug 1.27                                     35,093,954  309,201,454      6,809 x  309,208,263     32    28   14 ROLZ

pigz 2.3          -11                         35,002,893  309,812,953     52,717 s  309,865,670   2237    13   25 LZ77 48
kzip May 13 2006  /b1024                      35,016,649  310,188,783     29,184 xd 310,217,967   6063    62  121 LZ77  2
uc2 rev 3 pro     -tst                        35,384,822  312,767,652    123,031 x  312,890,683    360    63    4 LZ77
thor 0.95         e4                          35,795,184  314,092,324     49,925 x  314,142,249     64    34   16 LZP
etincelle a3                                  35,776,971  314,801,710     44,103 x  314,845,813     29    18  976 ROLZ 26
130

gzip124hack 1.2.4 -9                          36,273,716  321,050,648     62,653 x  321,113,301    149    19    1 LZ77 
doboz 0.1                                     36,367,430  322,415,409     83,591 x  322,499,000    533   3.4 1200 LZ77 48
gzip 1.3.5        -9                          36,445,248  322,591,995     38,801 x  322,630,796    101    17  1.6 LZ77
Info-ZIP 2.3.1    -9                          36,445,373  322,592,120     57,583 x  322,649,703    104    35  0.1 LZ77
pkzip 2.0.4       -ex                         36,556,552  323,403,526     29,184 xd 323,432,710    171    50  2.5 LZ77

jar (Java) 0.98-gcc  cvfM                     36,520,144  323,747,582     19,054 x  323,766,636    118    95  1.2 LZ77
PeaZip            better, no integrity check  36,580,548  323,884,274    561,079 x  324,445,353    243   243    8 LZ77 20
arj 3.10          -m1                         37,091,317  328,553,982    143,956 x  328,697,938    262    67    3 LZ77 26
ulz               c5                          37,652,826  332,626,591     47,809 x  332,674,400   1077     9   43 LZ77 26
lzgt3a                                        37,444,440  334,405,713      4,387 xd 334,410,100   1581  2886    2 LZ77
140

lzuf Apr.15.2009                              38,036,810  338,488,945      4,070 xd 338,493,015    446    40    2 LZ77 26
pucrunch          -d -c0                      39,199,165  350,265,471     34,359 s  350,299,830   2649   463    2 LZ77
packARC v0.7RC11  -sfx -np                    38,375,065  361,905,425          0 xd 361,905,425   1359  1486   23 CM
urban                                         38,215,763  362,677,440      4,280 s  362,681,720    381   450    6 o2   48
lzop v1.01        -9                          41,217,688  366,349,786     54,438 x  366,404,224    289    12  1.8 LZ77

lzw 0.2                                       41,960,994  367,633,910        671 s  367,634,581   3597    31   18 LZW
MTCompressor v1.0                             41,295,546  370,152,396      3,620 x  370,156,016    173   117   74 LZ77 26
arbc2z                                        38,756,037  379,054,068      6,255 sd 379,060,323   2659  2674   68 PPM
lz4 v1.2          -c2                         42,870,164  379,999,522     49,128 x  380,048,650     91     6   20 LZ77 26
lzss 0.02         cx                          42,874,387  380,192,378     48,114 x  380,240,492    107   2.3  145 LZSS 63
150

xdelta 3.0u       -9                          44,288,463  389,302,725    107,985 x  389,410,710   1021    30   47 LZ77
mtari 0.2                                     41,655,528  397,232,608      4,156 s  397,236,764     80    99   18 CM   26
srank 1.1         -C8                         43,091,439  409,217,739      6,546 x  409,224,285     51    45    2 SR
QuickLZ 1.30b     (quick3)                    46,378,438  410,633,262     44,202 x  410,677,464     48    12    3 LZ77
lzf 1.01          cx                          46,318,130  416,377,741     47,737 x  416,425,469     12   2.3   18 LZ77 60

stz 0.7.2         -c2                         47,192,312  416,524,596     41,941 x  416,566,537     14    13    3 LZ77 26
compress 4.3d                                 45,763,941  424,588,663     16,473 x  424,605,136    103    70  1.8 LZW
BriefLZ 1.05                                  46,638,341  425,384,313      5,298 x  425,389,611     66    18    2 LZ77
lzrw3-a                                       48,009,194  438,253,704      4,750 x  438,258,454     38    17    2 LZ77
fcm1                                          45,402,225  447,305,681      1,116 s  447,306,797    228   261    1 CM1
160

runcoder1                                     46,883,939  458,125,932      5,488 s  458,131,420    140   156    4 o1   26
data-shrinker 23Mar2012                       51,658,517  459,825,318      3,706 s  459,829,024     14     4    2 LZ77 26
lzwc_bitwise 0.7                              46,639,414  463,884,550      4,183 x  463,888,733    123   134   71 LZW  26 
exdupe 0.3.3                                  53,717,422  478,788,378  1,092,986 x  479,881,364     27     5 1000 LZ77 48
lzv 0.1.0                                     54,950,847  488,436,027     10,385 x  488,446,412      4   2.6    3 LZ77 48

FastLZ Jun 12 2007                            54,658,924  493,066,558      7,065 xd 493,073,623     18    13    1 LZ77
sharc 0.9.11b     -c2                         53,175,042  494,421,068     81,001 s  494,502,069     15    14    6 LZP  26
flzp v1                                       57,366,279  497,535,428      3,942 s  497,539,370     78    38    8 LZP
alba 0.5.1        cd                          52,728,620  515,760,096      4,870 s  515,764,966    239    10    4 BPE  48
snappy 1.0.1                                  58,350,605  527,772,054     23,844 s  527,795,898     25    12  0.1 LZ77 26
170

bpe               5000 4096 200 3             53,906,667  532,250,688      1,037 sd 532,251,725    639    28  0.5 Dict 26
kwc                                           54,097,740  532,622,518     15,186 x  532,637,704    438   145  668 Dict 26
bpe2 v3                                       55,289,197  542,748,980      2,979 s  542,751,959    518   132  0.5 Dict 26
fpaq0f2                                       56,916,872  558,645,708      3,066 x  558,648,769    222   207  0.4 o0
ppp                                           61,657,971  579,352,307      1,472 s  579,353,779     80    59    1 SR

ksc               4                           59,511,259  580,557,413     13,507 x  580,570,920  40050  7917 1700 SR   48
lzbw1 0.8                                     67,620,436  590,235,688     21,751 x  590,257,439     15    12   55 LZP  26
lzp2 0.7c                                     67,909,076  598,076,882     40,819 x  598,117,701     11     8   15 LZP  26
NTFS              LZNT1                       76,955,648  636,870,656          0    636,870,656     10     9  0.1 LZ77 26
shindlet_fs                                   62,890,267  637,390,277      1,275 xd 637,391,552    113   103  0.6 o0
180

arb255                                        63,501,996  644,561,595      4,871 sd 644,566,466   2551  2574  1.6 o0
compact                                       63,862,371  648,370,029      3,600 sd 648,373,629    216   164  0.2 o0
TinyLZP 0.1                                   79,220,546  694,274,932      2,811 s  694,277,743     32    38   10 LZP  26
smile                                         71,154,788  695,562,502        207 xd 695,562,709  10517 10414  0.6 MTF  26
barf              (2 passes)                  76,074,327  758,482,743    983,782 s  759,466,525    756    53    4 LZ77

arb2x v20060602                               99,642,909  995,674,993      3,433 sd 995,678,426   2616  2464  1.6 o0b
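The ranking column and the time columns are both simple arithmetic on the table. The sketch below, using four rows copied from the table above, recomputes "Total size" as enwik9 compressed size plus zipped decompresser size, sorts by it, and converts the per-byte compression time into hours for the whole 10^9-byte input. The function names are hypothetical. Times were measured on different machines (see the notes below), so converted times are only rough comparisons.

```python
ENWIK9_BYTES = 10**9

# (program, enwik9 compressed bytes, zipped decompresser bytes, compression ns/byte),
# copied from rows of the table above.
rows = [
    ("bwtdisk 0.9.0", 190_004_306, 169_579, 1124),
    ("Stuffit 12.0.0.17", 190_372_707, 2_658_122, 628),
    ("m03exp 2005-02-15", 191_250_500, 44_593, 4800),  # table lists ~4800
    ("plzma v3b", 193_240_160, 101_221, 8889),
]


def total_size(compressed, decompresser):
    """Ranking key: compressed enwik9 plus the zipped decompresser."""
    return compressed + decompresser


def seconds(ns_per_byte, n_bytes=ENWIK9_BYTES):
    """Convert a per-byte time from the table into seconds for the whole input."""
    return ns_per_byte * n_bytes / 1e9


ranked = sorted(rows, key=lambda r: total_size(r[1], r[2]))
for name, compressed, decompresser, ns in ranked:
    hours = seconds(ns) / 3600
    print(f"{name:18s} {total_size(compressed, decompresser):>13,}  {hours:5.2f} h to compress")
```

Stuffit has the second-smallest enwik9 output of these four rows, but it drops to third once its 2,658,122-byte decompresser is counted. The same conversion shows why the top entry is slow in practice: cmix's 267978 ns/byte is 267,978 seconds, or about 74 hours, for enwik9.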

Fails on enwik9

                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
hipp 5819         /o8                         20,555,951  (fails)         36,724 x                5570  5670  719 CM
ppmz2                                         23,557,867  (fails)         29,362 s               92210 88070 1497 PPM  26
XMill 0.8         -w -P -9 -m800              26,579,004  (fails)        114,764 xd                616   530  800 PPM
lzp3o2                                        33,041,439  (fails)         23,427 xd                230   270  151 LZP

Programs that properly decompress enwik8 and don't use external dictionaries are still eligible for the Hutter Prize.

Testing not yet completed

                       Compression               Compressed size      Decompresser  Total size   Time (ns/byte)
Program                  Options                enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------                  -------              ----------  -----------  -----------  -----------  ----- -----  --- --- ----
rdmc 0.06b                                    33,181,612                                          1394  1381      DMC  6
ESP v1.92                                     36,651,292                                           223            LZ77 16

Pareto frontier: compressed size vs. compression time as of Aug. 18, 2008 from the main table (options for maximum compression).

Pareto frontier: compressed size vs. memory as of Aug. 18, 2008 (options for maximum compression).
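The two charts themselves are not reproduced here. As a sketch of how such a frontier can be computed, the code below keeps every program for which no other program is at least as small and strictly cheaper (in time or memory). It assumes "compressed size" means the enwik9 column, and it uses a handful of current rows from the main table, so its result is not the Aug. 18, 2008 frontier shown in the charts. `pareto_frontier` is a hypothetical name.

```python
# (program, enwik9 compressed bytes, compression ns/byte, memory MB), copied from the main table.
rows = [
    ("cmix v3", 125_971_560, 267978, 26681),
    ("durilca'kingsize", 127_377_411, 1398, 13000),
    ("paq8hp12any", 132_045_026, 37660, 1850),
    ("drt|lpaq9m", 143_943_759, 868, 1542),
    ("zcm 0.92", 160_848_578, 489, 2400),
    ("nanozipltcb 0.09", 161_581_290, 64, 3350),
    ("bzip2 1.0.2", 253_977_839, 379, 8),
]


def pareto_frontier(points):
    """Return names of (name, size, cost) points not dominated by any other point.

    Smaller is better in both coordinates. Sorting by size (ties broken by cost)
    means a point is on the frontier exactly when its cost beats every smaller point.
    """
    frontier = []
    best_cost = float("inf")
    for name, size, cost in sorted(points, key=lambda p: (p[1], p[2])):
        if cost < best_cost:
            frontier.append(name)
            best_cost = cost
    return frontier


time_frontier = pareto_frontier((name, size, t) for name, size, t, _ in rows)
memory_frontier = pareto_frontier((name, size, mem) for name, size, _, mem in rows)
print(time_frontier)
print(memory_frontier)
```

The sort-then-sweep approach takes O(n log n) time, and the same function serves both charts by swapping the cost column.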

Notes about compressors

I only test the latest supported version of a program. I attempt to find the options that give the best compression, but will not generally do an exhaustive search. If an option advertises maximum compression or memory, I don't try the alternatives. If you know of a better combination, please let me know. I will select the maximum memory setting that does not cause disk thrashing, usually about 1800 MB. If the compressor is not downloadable as a zip file, then I will compress the source or executable (whichever archive is smaller) plus any other needed files (dictionaries) into a single zip archive using 7zip 4.32 -tzip -mx=9. If no executable is available, I will attempt to compile in C or C++ (MinGW 3.4.2, Borland 5.5 or Digital Mars), Java 1.5.0, MASM, NASM, or gas.
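The decompresser size that enters the ranking is the size of one zip archive holding everything needed to decompress. The sketch below measures that quantity with Python's standard `zipfile` module at deflate level 9. This is a substitute for the 7zip 4.32 -tzip -mx=9 command named above: both produce zip archives, but the encoders differ, so the byte counts will generally not match exactly. The function name and the example file contents are hypothetical.

```python
import io
import zipfile


def zipped_size(files):
    """Size in bytes of a zip archive holding the given {name: bytes} files.

    Uses Python's deflate at level 9 as a stand-in for 7zip -tzip -mx=9;
    results approximate, but will not exactly equal, the benchmark's numbers.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


# Hypothetical decompresser: one source file plus one dictionary file.
files = {
    "decomp.c": b"int main(void) { return 0; }\n" * 50,
    "dict.txt": b"the\nof\nand\nin\n" * 200,
}
print(zipped_size(files))
```

Counting the dictionary inside the same archive is what prevents a program from moving model knowledge out of the compressed file and into an uncounted side file.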

1. Reported by Guillermo Gabrielli, May 16, 2006. Timed on a Celeron D325 2.53 GHz, Windows XP SP2, 256 MB RAM.
2. Decompression size and time for pkzip 2.0.4. kzip only compresses.
3. Reported by Ilia Muraviev (author of PX, TC, pimple), June 10-July 18, 2006. Timed on a P4 3.0 GHz, 1GB RAM, WinXP SP2.
4. enwik9 reported by Johan de Bock, May 19, 2006. Timed on Intel Pentium-4 2.8 GHz 512KB L2-cache, 1024MB DDR-SDRAM.
5. Compressed with paq8h (VC++ compile) and decompressed with paq-8h (Intel compile of same source code). Normally compression and decompression are the same speed.
6. ocamyd 1.65.final and LTCB 1.0 reported by Mauro Vezzosi, May 30-June 20, 2006. Timed on a 1.91 GHz AMD Athlon XP 2600+, 512 MB, WinXP Pro 2002 SP2 using timer 3.01. ocamyd 1.66.final reported Feb. 3, 2007. Times are process times.
7. Under development by Mauro Vezzosi, May 24, 2006.
8. Reported by Denis Kyznetsov (author of qazar), June 2, 2006.
9. Reported by sportman, May 24, 2006. Timed on an Intel Pentium D 830 dual core 3.0 GHz, 2 x 512 MB DDR2-SDRAM PC4300 533 MHz, memory timing 4-4-4-12 (833,000 KB free), Windows XP Home SP2. CPU was at 52%, so apparently only one of 2 cores was used. Decompression verified on enwik8 only (not timed, about 2.5 hours). WinRK compression options: Model size 800MB, Audio model order: 255, Bit-stream model order: 27, Use text dictionary: Enabled, Fast analyses: Disabled, Fast executable code compression: Disabled.
10. Reported by Malcolm Taylor (author of WinRK), May 24, 2006. Timed on an Athlon X2 4400+ with 2 GB, running WinXP 64. Decompression not tested. Decompresser size is based on the SFX stub size reported by Artyom (A.A.Z.), Sept. 2, 2007, although it was not tested this way.
11. Reported by sportman, May 25, 2006. CPU as in note 9.
12. Reported by sportman, May 30, 2006. CPU as in note 9 (50% utilized).
13. xwrt 3.2 options are -2 -b255 -m250 -s -f64. ppmonstr J options are -o10 -m1650.
14. Reported by Michael A Maniscalco, June 15, 2006.
15. Reported by Jeremiah Gilbert on the Hutter group, Aug. 18, 2006. Tested under Linux on a dual Xeon 1.6 GHz (LV), overclocked to 2.13 GHz, with 2 GB memory. Time is user+sys (real = 196500 ns/byte).
16. Reported by Anthony Williams, Aug. 19-22, 2006. Timed on a 2.53 GHz Pentium 4 with 512 MB under WinXP Home SP2.
17. Tested Aug. 20, 2006 under Ubuntu Linux 2.6.15 on a 2.2 GHz Athlon-64 with 2 GB memory. Time is approximate wall time due to disk thrashing. User+sys time is 153600 ns/byte compress, 148650 decompress.
18. Reported by Dmitry Shkarin (author of durilca4linux), Aug. 22-23, 2006 for durilca4linux_1; and Oct. 16-18, 2006 for durilca4linux_2. 3 GB memory usage is RAM + swap. Tested on AMD Athlon X2 4400+, 2.22 GHz, 2 GB memory under SuSE Linux AMD64 v10.0. durilca4linux_3 reported Feb. 21, 2008 using 4 GB RAM + 1 GB swap. v2 reported Apr. 22, 2008. v3 reported May 22, 2008.
19. enwik8 confirmed by sportman, Sept. 20, 2006. Compression time 61480 ns/byte timed on a 2 x dual core (only one core active) Intel Woodcrest 2 GHz with 1333 MHz FSB and 4 GB 667 MHz CL5 memory under SiSoftware Sandra Lite 2007.SP1 (10.105). Dhrystone ALU 37,014 MIPS, Whetstone iSSE3 25,393 MFLOPS, Integer x8 iSSE4 220,008 it/s, Floating-point x4 iSSE2 119,227 it/s.
20. Reported by Giorgio Tani (author of PeaZip) on Nov. 10, 2006. Tested on a MacBook Pro, Intel T2500 Core Duo CPU (one core used), with 512 MB memory under WinXP SP2. Time is combined compression and decompression.
21. enwik9 -8 reported by sportman, Dec. 12-13, 2006. Hardware as note 19. enwik9 decompression not verified. paq8hp7 -8 enwik8 compression was reported as 16,417,650 (4 bytes longer; the size depends on the length of the input filename, which was enwik8.txt rather than enwik8). I verified enwik8 -7 and -8 decompression.
22. paq8hp8 -8 enwik9 reported by sportman, Jan. 18, 2007. paq8hp10 -8 enwik9 on Apr. 2, 2007. paq8hp11 -8 enwik9 on May 10, 2007. paq8hp12 -8 enwik8/9 on May 20, 2007. Hardware as in note 19. Decompression verified for enwik8 only.
23. 7zip 4.46a options were -m0=PPMd:mem=1630m:o=10 -sfx7xCon.sfx
24. paq8o8-intel (Intel compile of paq8o8) -1 and paq8o8z-jun7 (DOS port of paq8o8) -1 reported by Rugxulo on Jun 10, 2008. Timed on an AMD64x2 TK-53 Tyler 1.7 GHz laptop with Vista Home Premium SP1.
25. paq8o8z -1 enwik8 (DJGPP compile) reported by Rugxulo on Jun 17, 2008. Tested on a 2.52 GHz P4 Northwood, no HTT, WinXP Home SP2.
26. Tested on a Gateway M-7301U laptop with 2.0 GHz dual core Pentium T3200 (1MB L2 cache), 3 GB RAM, Vista SP1, 32 bit. Run times are similar to my older computer.
27. enwik9 size reported by Eugene Shelwien, Mar. 5, 2009. enwik8 size and all speeds are tested as in note 26.
28. Reported by Eugene Shelwien on a Q6600, 3.3 GHz, WinXP SP3, ramdrive: bcm 0.06 on Mar. 15, 2009, bcm 0.08 on June 1, 2009.
29. Reported by kaitz (KZ): paq8p3 on Apr. 19, 2009, v2 on Apr. 21, 2009, paq8pxd on Jan. 21, 2012, v2 on Feb. 11, 2012, v3 on Feb. 23, 2012, v4 on Apr. 23, 2012. 2012 tests on a Core2Duo T8300 2.4 GHz, 2 GB.
30. Reported by Sami Runsas (author of bwmonstr), July 14, 2009. Tested on an Athlon XP 2200 (Win32).
31. Reported by Dmitry Shkarin, July 21, 2009, Nov. 12, 2009. Tested on a 3.8 GHz Q9650 with 16 GB memory under Windows XP 64bit Pro SP2. Requires msvcr90.dll.
32. Reported by Mike Russell, Sept. 11, 2009. Tested on a 2.93 GHz Intel Q6800 with 3.5 GB memory.
33. Reported by Con Kolivas (author of lrzip) on Nov. 27, 2009 (lrzip 0.40), Nov. 30, 2009 (lrzip 0.42), Mar. 17, 2012 (lrzip 0.612). Tested on a 3 GHz quad core Q9650, 8 GB, 64 bit Debian Linux.
34. Reported by sportman, Nov. 29, 2009 (durilca'kingsize), Nov. 30, 2009 (durilca'kingsize4), Apr. 8, 2010 (bsc 1.0.0). Test hardware: 2 x 2.4 GHz (overclocked at 2.53 GHz) quad core Xeon Nehalem, 24 GB DDR3 1066 MHz, 8 x 2 TB RAID5, Windows Server 2008 R2 64 bit.
35. Reported by zody on Dec. 12, 2009. Tested under Windows 7 x64 on a 3.6 GHz E8200 with 4 GB 1066 MHz RAM.
36. Reported by Ilia Muraviev on Dec. 16, 2009. Tested on a 2.40 GHz Core 2 Duo, DDR2-800 4GB RAM, Windows7 x64.
37. Reported by Sami Runsas, Mar. 3, 2010. Tested under Win64 on a Q6600 at 3.0 GHz.
38. Reported by Ilya Grebnov, Apr. 7, 2010. Tested on an Intel Core 2 Duo E8500, 8 GB memory, Windows 7.
39. Reported by Ilya Grebnov, Apr. 8, 2010. Tested on an Intel Core 2 Quad Q9400, 8 GB memory, Windows 7. bsc 2.00 on May 3, 2010. bsc 2.2.0 on June 15, 2010.
40. Reported by Sami Runsas, May 10, 2010. Tested on an overclocked Intel Core i7 860. nanozip 0.08a tested June 6, 2010. nanozip 0.09a on Nov. 5, 2011.
41. lpaq9m reported by Alexander Rhatushnyak on June 9, 2010. Tested on an Intel Core i7 930 (4 cores, 8 threads), 2.8 GHz, 2.99 GB RAM. paq8hp12any tested June 28, 2010.
42. Reported by Michal Hajicek, June 4, 2010 on an AMD Phenom II 965, 64 bit Windows. WinRK, ppmonstr on June 14.
43. Reported by Ilia Muraviev, June 26, 2010. Tested on a Core 2 Quad Q9300, 2.50 GHz, 4 GB DDR2, Windows 7.
44. Timed on a Dell Latitude E6510 laptop Core i7 M620, 2.66 GHz, 4 GB, Windows 7 32-bit.
45. Reported by Richard Geldreich (lzham author) on Aug. 30, 2010. Tested on a 2.6 GHz Core i7 (quad core + HT), 6 GB, Win7 x64.
46. Reported by Stefan Gedo (ST author) on Oct. 14, 2010. Tested on Athlon II X4 635 2.9 GHz, 4 GB memory, Windows 7.
47. Reported by David A. Scott on Dec. 15, 2010. Tested on an i3-370 with 6 GB DDR3 1033 MHz memory.
48. Timed on a Dell Latitude E6510 laptop Core i7 M620, 2.66 GHz, 4 GB, Ubuntu Linux 64-bit.
49. Tested by the author on a Q9450, 3.52 GHz = 440x8, ramdrive.
50. Tested by the author on an Intel Core i7-2600, 3.4 GHz, Kingston 8 GB DDR3, WD VelociRaptor 10000 RPM 600 GB SATA3, Windows 7 Ultimate SP1.
51. Tested by Bulat Ziganshin on i7-2600, 4.6 GHz with 1600 MHz RAM (8-8-8-21-1T) and NVIDIA GeForce 560Ti at 900/2000 MHz.
52. Tested by Michael Maniscalco on an 8 core Intel Xeon E5620, 2.40 GHz, 12 GB memory running Windows 7 Enterprise SP1, 64 bit.
53. Tested by the author on a Core i7-2600K @ 4.6GHz, 8GB DDR3 @ 1866MHz, 240GB Corsair Force GT SSD.
54. Tested by Piotr Tarsa on a Core 2 Duo E8400, 8 GiB RAM, Ubuntu 11.10 64-bit, OpenJDK 7.
55. Tested by David Catt on a 64 bit Windows 7 laptop, 2.33 GHz, 4 GB, 4 cores.
56. Reported by the author on an Athlon II X4 635 2.9 GHz, 4 GB, Windows 8 Enterprise.
57. Reported by the author on a x86_64 Athlon 64 X2 5200+ with 8 GiB of RAM running GNU/Linux 2.6.38.6-libre.
58. Reported by the author on a 4 GHz i7-930 from ramdrive.
59. Reported by the author on an i7-2600, 4.6 GHz, 16 GB RAM, Ubuntu 13.04.
60. Tested by Ilia Muravyov on an Intel Core i7-3770K, 4.8 GHz, 16 GB Corsair Vengeance LP 1800 MHz CL9, Corsair Force GS 240 GB SSD, Windows 7 SP1.
61. Tested by Matt Mahoney on a dual Xeon E5-2620, 2.0 GHz, 12+12 hyperthreads, 64 GB RAM (20 GB usable), Fedora Linux.
62. Tested by Valéry Croizier on a 2.5 GHz Core i5-2520M, 4 GB memory, Windows 7 64 bit.
63. Tested by Ilia Muravyov on an Intel i7-3770, 4.7 GHz, Corsair Vengeance LP 1600 MHz CL9 16 GB RAM, Samsung 840 Pro 512 GB SSD, Windows 7 SP1.
64. Tested by Kennon Conrad on a 3.2 GHz AMD A8-5500.
65. Tested by sportman on an Intel Core i7 4960X 3.6 GHz overclocked to 4.5 GHz, 6 cores (12 threads), 22 nm Ivy Bridge-E, Kingston 8 x 4 GB (32 GB) DDR3 2400 MHz 11-14-14 underclocked to 2000 MHz 10-11-11, Windows 8.1 Pro 64-bit, SoftPerfect RAM Disk 3.4.5 64-bit.
66. Tested by Byron Knoll on an Intel Core i7-3770, 31.4 GB memory, Linux Mint 14.

I have not verified results submitted by others. Timing information, when available, may vary widely depending on the test machine used.

About the Compressors

The numbers in the headings are the compression ratios on enwik9.
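The heading numbers appear to be the total size (enwik9 plus the zipped decompresser) divided by 10^9 and truncated, not rounded, to four digits: cmix v3 gives 126,246,552 / 10^9 = 0.12624..., shown as .1262, and durilca'kingsize_4 gives 0.12778..., shown as .1277. This is inferred from the totals in the tables on this page rather than stated in them. A minimal sketch of the calculation (the function name is hypothetical):

```python
def heading_ratio(total_bytes: int, input_bytes: int = 10**9) -> str:
    """Total size / input size, truncated to 4 decimal places, formatted like '.1262'."""
    ratio_e4 = total_bytes * 10_000 // input_bytes  # integer floor division truncates
    return f".{ratio_e4:04d}"


# Totals (enwik9 + zipped decompresser) taken from the tables in this section
for name, total in [("cmix v3", 126_246_552),
                    ("durilca'kingsize_4", 127_784_888),
                    ("paq8hp12any -8", 132_375_726)]:
    print(heading_ratio(total), name)
```

Rounding instead of truncating would give .1278 for durilca and .1324 for paq8hp12any, which do not match the headings.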

.1262 cmix

cmix v1 is a free, open source (GPL) file compressor by Byron Knoll, Apr. 16, 2014. It is a context mixing compressor with dictionary preprocessing, based on code from paq8hp12any and paq8l but with more context models and mixer layers. It takes no compression options.

cmix v2 was released May 29, 2014.

cmix v3 was released June 27, 2014.

            Compression     Compressed size      Decompresser  Total size   Time (ns/byte)
Program       Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp   Decomp   Mem   Notes
-------       -------    ----------  -----------  -----------  -----------  ------ ------  -----  -----
cmix v1                  16,076,381  128,647,538    279,185 x  128,926,723  181924 179706  20785  66
cmix v2                  15,863,623  126,323,656    310,068 x  126,633,724  580083 577626  28152  66
cmix v3                  15,809,519  125,971,560    274,992 x  126,246,552  267978 266622  26681  66

.1277 durilca

durilca and durilca'light 0.5 by Dmitry Shkarin (Apr. 1, 2006) are closed source, experimental command line file compressors based on ppmd/ppmonstr with filters for text, exe, and data with fixed length records (wav, bmp, etc). durilca'light is a faster version with less compression. Unfortunately both crash on enwik9. Decompression is verified on enwik8.

The -m700 option selects 700 MB of memory. (It appears to use substantially more for enwik9 according to Windows task manager.) -o12 selects PPM order 12 (optimal for enwik9 -t0). -t0 (default) turns off text modeling, which hurts compression but is necessary to compress enwik9 (although decompression still crashes). -t2(3) turns on text preprocessing (dictionary; thus the increased decompresser size). -t2 also supports 3 additive flags (4, 8, 16) which have no effect on this data, thus -t2(31) or -t2 (default is 31) give the same compression as -t2(3).

durilca 0.5(Hutter) was released 1457Z Aug. 16, 2006. It does not use external dictionaries. When run with 1 GB memory (-m700), -o13 is optimal. With 2 GB (-m1650), -o21 is optimal. The unzipped .exe file is 86,016 bytes.

durilca4linux_1 (0825Z Aug 23 2006) is a Linux version of durilca 0.5(Hutter) which successfully compresses enwik9 and decompresses with UnDur (23,375 bytes zipped, 42,065 bytes uncompressed). All versions of durilca require memory specified by -m plus memory to read the input file into memory. In Windows, this exceeds the 2 GB process limit regardless of available RAM and swap. Thus, enwik9 compresses only under Linux with 2 GB real memory and 1 GB additional swap. The -o12 option is optimal for enwik9 (tested under 64 bit SuSE 10.0 by the author), -o24 for enwik8 (verified by me under 64 bit Ubuntu 2.6.15).

durilca4linux_2 (Oct. 16, 2006) is a closed source Linux version specialized for this benchmark. It includes a warning that use on other files may cause data loss. It requires AMD64 Linux and 3 GB of memory (2 GB for enwik8). The decompresser files (EnWiki.dur and UnDur) are contained within a 241,322 byte zip file in the rar distribution. To compress:

  ./DURILCA d EnWiki.dur
  ./DURILCA e -m1800 -o10 -t2 enwik9
To decompress:
  ./UnDur EnWiki.dur
  ./UnDur enwik9.dur
The first step extracts a compressed dictionary. It is organized in a similar manner to paq8hp2-paq8hp5 in that syntactically related words and words with the same suffix are grouped together. Results are reported by the author under SuSE Linux 10.0. I verified enwik8 only (6480 ns/b to compress on a 2.2 GHz Athlon 64 with 2 GB memory under Ubuntu Linux). Compressing enwik9 caused disk thrashing.

durilca4linux_3 (dictionary version v1) was released Feb. 21, 2008. Like version 2, it requires extraction of EnWiki.dur before compressing or decompressing, and may not work with files other than enwik8 and enwik9. As tested, it requires 64-bit Linux, 4 GB RAM, and 5 GB RAM+swap.

undur3 v2 contains an improved dictionary (version v2), released Apr. 22, 2008, for DURILCA4Linux_3. The compression and decompression programs are the same. The decompression program UnDur (Linux executable) is included. To compress, download durilca4linux_3 and replace the dictionary (EnWiki.dur) with this one. The options are -m3600 (3600 MB memory), -o14 (order 14 PPM), -t2 (text model 2).

undur3 v3, released May 22, 2008, uses an improved dictionary but the same compressor and decompresser as v1 and v2. The dictionary contains 123,995 lowercase words separated by NUL bytes. Of these, 5579 words occur more than once (wasted space?). I tested option -m1500 under Ubuntu Linux with 2 GB memory. At -m1500, top reports 2157 MB virtual memory and 1894 MB real memory. -m1600 caused disk thrashing.
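Once the dictionary has been extracted from EnWiki.dur, statistics like the word and duplicate counts above can be computed directly. A sketch, assuming the input is the raw NUL-separated word list and reading "occur more than once" as the number of distinct words with two or more copies (counting surplus copies instead would give a different number). The function name is hypothetical.

```python
from collections import Counter


def dictionary_stats(data: bytes) -> tuple[int, int]:
    """Return (number of words, number of distinct words appearing 2+ times)
    for a NUL-separated word list. Empty entries, e.g. from a trailing NUL, are skipped."""
    words = [w for w in data.split(b"\0") if w]
    counts = Counter(words)
    repeated = sum(1 for c in counts.values() if c > 1)
    return len(words), repeated


print(dictionary_stats(b"the\0cat\0the\0sat\0cat\0the\0"))  # (6, 2)
```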

durilca kingsize (July 21, 2009) runs under 64 bit Windows and requires 13 GB memory. It is designed to work only on this benchmark and not in general. The dictionary file EnWiki.fsd must be extracted first from EnWiki.dur before compression or decompression. Requires msvcr90.dll. enwik8 can be compressed with -m1200 (1.2 GB).

durilca4_decoder is a new dictionary for durilca'kingsize (above), Nov. 12, 2009. It is reported as "durilca'kingsize_4" below. Decompression time is reported to be 1411.88 sec with "durilca d" and 1796.98 sec with "UnDur". enwik8 compresses with 1200 MB (-m1200) in 157.38 sec.
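The Time columns are in nanoseconds per byte of input, so a time in seconds converts as seconds * 10^9 / bytes. The 1796.98 s UnDur decompression of enwik9 (10^9 bytes) is thus about 1797 ns/byte, which matches the durilca'kingsize_4 decompression time in the table, and 157.38 s for enwik8 (10^8 bytes) is about 1574 ns/byte. A small helper (hypothetical name):

```python
def ns_per_byte(seconds: float, n_bytes: int) -> float:
    """Convert a run time in seconds to nanoseconds per input byte."""
    return seconds * 1e9 / n_bytes


ENWIK8 = 10**8
ENWIK9 = 10**9

print(round(ns_per_byte(1796.98, ENWIK9)))  # 1797: UnDur decompression of enwik9
print(round(ns_per_byte(157.38, ENWIK8)))   # 1574: enwik8 compression with -m1200
```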

                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp   Notes
-------           -------                     ----------  -----------  -----------  -----------  ----- -----   -----  
durilca'light 0.5   -m650  -o12               21,089,993  178,562,475  1,495,422 x  180,057,897   1227 (fails)  
durilca 0.5         -m700  -o12 -t0           19,227,202  162,117,578     74,292 x  162,191,870   4140 (fails)
                    -m800  -o128              19,321,003  164,298,178     74,292 x  164,372,470   7718 (fails)
                    -m700  -o12 -t2(3)        18,520,589    (fails)    1,507,312 x                3330  3940
durilca 0.5(Hutter) -m700  -o13 -t2           18,128,339    (fails)       77,295 x                5905
                    -m1650 -o21 -t2           17,958,687    (fails)       77,295 x                6140  6140
durilca4linux_1     -m700  -o13 -t2           18,128,334                  23,375 xd               5950  5880
                    -m1750 -o12 -t2           18,027,888  146,521,559     23,375 xd 146,544,934   5500  7301    18
                    -m1750 -o24 -t2           17,949,422                  23,375 xd               6190  6780
durilca4linux_2     -m1800 -o10 '-t2(11)'     17,002,831  136,536,189    241,322 xd 136,777,511   4249  4827    18
                    -m1800 -o10 -t2           16,998,300  136,596,818    241,322 xd 136,838,140   4405  4894    18
durilca4linux_3 v1  -m3600 -o14 -t2           16,356,063  129,933,145    345,957 xd 130,279,102   3649  3715    18
                    -m1200 -o32 -t2           16,348,796                                          4170  4178    18
durilca4linux_3 v2  -m3600 -o14 -t2           16,323,581  129,670,441    344,525 xd 130,014,966   3628  3639    18
                    -m1200 -o32 -t2           16,316,255                                          4148  4157    18
durilca4linux_3 v3  -m3600 -o14 -t2           16,292,414  129,469,384    339,990 xd 129,809,374   3624  3627    18
                    -m1200 -o32 -t2           16,285,285                                          4135  4138    18
                    -m1500 -o6  -t2           16,517,051  133,674,565                             3852
                    -m1500 -o7  -t2           16,418,799  132,239,495                             4006
                    -m1500 -o8  -t2           16,368,632  131,722,213                             4149
                    -m1500 -o9  -t2           16,335,259  131,549,901    339,990 xd 131,889,891   4261  4344
                    -m1500 -o10 -t2           16,316,775  131,574,739                             4405
                    -m1500 -o11 -t2           16,306,086  131,707,901                             4544
                    -m1500 -o12 -t2           16,299,411  131,807,298                             4554
                    -m1500 -o14 -t2           16,292,414  132,238,662                             4763
                    -m1500 -o16 -t2           16,289,512  132,516,825                             4879
                    -m1500 -o32 -t2           16,285,285  134,238,759                             5440
durilca'kingsize    -m13000 -o40 -t2          16,258,380  127,695,666    333,790 xd 128,029,456   1413 1805     31
                    -m22500 -o40 -t2                      127,695,666                             1806 1814     34
durilca'kingsize_4  -m13000 -o40 -t2          16,209,167  127,377,411    407,477 xd 127,784,888   1398 1797     31
                                              16,209,167  127,377,411                             1788 1802     34

.1323 paq8hp12any

paq8hp12any was developed as a fork of the PAQ series of open source context mixing compressors by Alexander Rhatushnyak. It was forked from the paq8 series developed largely by Matt Mahoney, and uses a dictionary preprocessor (xml-wrt) originally developed by Przemyslaw Skibinski as a separate program and later integrated. All versions are optimized for the Hutter prize. Thus, they are tuned for enwik8. The 12 versions are described below in chronological order. They originally were located here (link broken) and can now be found here (as a zpaq archive) (as of Sept. 16, 2009). All programs are free, GPL open source, command line archivers. Most take a single option controlling memory usage.

Note: these programs are compressed with upack, which compresses better than upx. Some virus detectors give false alarms on all upack-compressed executables. The programs are not infected.

paq8hp1 by Alexander Rhatushnyak, 1945Z Aug. 21, 2006. It is a modification of paq8h using a custom dictionary tuned to enwik8 for the Hutter prize. Because the Hutter prize requires no external dictionaries, the dictionary is spliced into the .exe file during the build process. When run, it creates the dictionary as a temporary file. The program must be run in the current directory (not in your PATH or with an explicit path), or else it can't find this file. The unzipped paq8hp1.exe is 206,764 bytes. Decompression was verified for enwik8 (60730 ns/b for -8, 60660 ns/b for -7). enwik9 is pending.

paq8hp2 (source code) by Alexander Rhatushnyak, 0233Z Aug. 28, 2006 is an improved version of paq8hp1 submitted for the Hutter prize. paq8hp2.exe size is 205,276 bytes. It differs from paq8hp1 mainly in that the 43K word dictionary for 2-3 byte codes is sorted alphabetically. The 80 most frequent words, coded as 1 byte before compression, are grouped by syntactic type (pronoun, preposition, etc).

paq8hp3 (source code) by Alexander Rhatushnyak, released Aug. 29, 2006 is an improved version of paq8hp2 submitted for the Hutter prize on Sept. 3, 2006. The 80 dictionary words coded with 1 byte and 2560 words coded with 2 bytes are organized into semantically related groups or by common suffixes. The 40,960 words with 3 byte codes are sorted from the last character in reverse alphabetical order. paq8hp3.exe is 178,468 bytes unzipped. enwik9 decompression is not yet verified. For enwik8, decompression is verified with time 60300 ns/b compression, 60220 ns/b decompression.
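One reading of "sorted from the last character" is ordering words by their reversed spelling, which places words that share a suffix next to each other. A sketch of that ordering; the function name is hypothetical, and the page does not say whether the real order is ascending or descending, which does not change the grouping.

```python
def suffix_order(words):
    """Sort words by their reversed spelling, so words sharing a suffix become adjacent."""
    return sorted(words, key=lambda w: w[::-1])


print(suffix_order(["walking", "talked", "running", "jumped"]))
# ['talked', 'jumped', 'walking', 'running']
```

Grouping by suffix gives the model contexts in which the code bytes of related words (here the "-ed" and "-ing" forms) look alike.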

paq8hp4 (source code) by Alexander Rhatushnyak, released and submitted for the Hutter prize on Sept. 10, 2006, is an improved version of paq8hp3. The dictionary is further organized into semantically related groups among 3-byte codes. The unzipped size of paq8hp4.exe is 206,336 bytes.

paq8hp5 (source code) by Alexander Rhatushnyak, released Sept. 20, 2006, is an improved version of paq8hp4, submitted for the Hutter prize on Sept. 25, 2006. The unzipped size of paq8hp5.exe is 174,616 bytes (in spite of a slightly larger dictionary). The dictionary size is optimized for enwik8; a larger dictionary would improve compression of enwik9. Decompression is verified for enwik8 only (-8 at 74640 ns/b). A Linux port of paq8hp5 is by Лъчезар Илиев Георгиев (Luchezar Georgiev), Oct 26, 2006 (mirror).

paq8hp6 (source code) by Alexander Rhatushnyak, released Oct. 29, 2006, is an improved version of paq8hp5. It was submitted as a Hutter prize candidate on Nov. 6, 2006. Unzipped paq8hp6.exe size is 170,400 bytes. The -8 option was not tested on enwik9 due to disk thrashing on my 2 GB PC. Compression was about 25% finished after 9 hours.

paq8hp7a by Alexander Rhatushnyak, Dec. 7, 2006, was intended to supersede paq8hp6 as a Hutter prize entry, then was withdrawn on Dec. 10, 2006 with the release of paq8hp7. Unzipped executable size is 151,664 bytes. -8 for enwik9 (but not enwik8) caused disk thrashing on my computer (2 GB, WinXP).

paq8hp7 (source code) by Alexander Rhatushnyak was released Dec. 10, 2006 as a Hutter prize entry. Unzipped paq8hp7.exe size is 152,556 bytes.

paq8hp8 (source code) by Alexander Rhatushnyak was released Jan. 18, 2007 as a Hutter prize entry (replacing an incorrect version posted 2 days earlier). Unzipped size is 152,692 bytes. The dictionary is identical to paq8hp7.

paq8hp9 (mirror) (source code) by Alexander Rhatushnyak, Feb. 20, 2007, is a Hutter prize entry. Only the -7 option works. The unzipped size of paq8hp9.exe is 112,628 bytes.

paq8hp9any (Feb. 23, 2007) by Alexander Rhatushnyak is a paq8hp9 -7 compatible version with external dictionary where all options work. However the zipped program is larger and -8 was not tested due to disk thrashing, so results are unchanged.

paq8hp10 (Mar. 26, 2007) by Alexander Rhatushnyak was derived from paq8hp9 as a Hutter prize entry. The unzipped size is 103,224 bytes. Only the -7 option works.

paq8hp10any (source code), Mar. 31, 2007, by Alexander Rhatushnyak is archive compatible with paq8hp10 -7 but works with other memory options. When run, paq8hp10.exe and both dictionary files should be in the current directory. This program is not a Hutter prize entry.

paq8hp11 (mirror) by Alexander Rhatushnyak, Apr. 30, 2007, is a Hutter prize entry. paq8hp11.exe is 99,816 bytes. Like paq8hp10, it works only with the -7 option.

  To compress:   paq8hp11 -7 enwik8.paq8hp11 enwik8
  To decompress: paq8hp11 enwik8.paq8hp11

paq8hp11any (source code) by Alexander Rhatushnyak, May 2, 2007, is a paq8hp11 variant that accepts any memory option. It was optimized for speed rather than size. It includes two dictionary files which must be present in the current directory when run, unlike paq8hp11 where the dictionary is self extracted. -8 selects 1850 MB memory. -7 produces the same archive as paq8hp11. Run speeds for -8 enwik8 are 76770+76820 ns/B.

paq8hp12 (mirror) by Alexander Rhatushnyak, May 14, 2007, is a Hutter prize entry. paq8hp12.exe size is 99,696 bytes. It works only with the -7 option like paq8hp11.

paq8hp12any (source code) by Alexander Rhatushnyak, May 20, 2007, is a paq8hp12 variant that accepts any memory option (like paq8hp11any). The -7 option produces an archive identical to that of paq8hp12.

paq8hp12any was updated on Jan. 9, 2009 to fix a compiler issue and add a 64 bit Linux version. Compressed file format was not changed. It was not retested.

Options select memory usage as shown in the table.

           Compression     Compressed size      Decompresser  Total size   Time (ns/byte)
Program      Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp  Decomp  Mem Note
-------      -------    ----------  -----------  -----------  -----------  -----  -----  --- ----
paq8hp1         -7      17,566,769                 205,783 x               60170  60660  748
                -8      17,397,023  142,477,977    205,783 x  142,683,760  63317        1595
paq8hp2         -7      17,390,490                 204,557 x               62000  62330  747
                -8      17,223,661  141,145,684    204,557 x  141,350,241  65323        1584
paq8hp3         -7      17,241,280                 177,477 x               61360  59690  742
                -8      17,085,021  139,905,045    177,477 x  140,082,522  63420        1586
paq8hp4         -7      17,039,173                 198,525 x              ~65000  65110  755
                -8      16,889,237  138,188,695    198,525 x  138,387,220  67956  68120 1598
paq8hp5         -7      16,898,402                 161,887 x               76300  77710  900  19
                -8      16,761,044  137,017,311    161,887 x  137,179,198 ~85153  75162 1787
paq8hp6         -7      16,731,800  138,828,889    166,715 x  138,995,604  74953  73707  941
                -8      16,568,451  135,281,289    166,715 x  135,448,004  60865        1807  21
paq8hp7a        -7      16,592,672  137,441,743    150,678 x  137,592,421  79795         940
                -8      16,431,239                 150,678 x               76940  77600 1790
paq8hp7         -7      16,579,500                 151,633 x               79620  79660  940
                -8      16,417,646  133,835,408    151,633 x  133,987,041  66074        1850  21
paq8hp8         -7      16,528,353                 151,711 x               79580  79970  940
                -8      16,372,960  133,271,398    151,711 x  133,423,109  64639        1849  22
paq8hp9         -7      16,516,789  136,676,674    111,653 x  136,788,327  84529  85957  940
paq8hp10        -7      16,490,947                 102,256 x               86720  88890  940
paq8hp10any     -8      16,335,197  132,979,531    333,925 x  133,313,456  55639        1849  22
paq8hp11        -7      16,459,515                  98,851 x              129540 128530  947
paq8hp11any     -8      16,304,862  132,757,799    327,608 s  133,085,407  57503        1850  22
paq8hp12        -7      16,381,959                  98,745 x              130820 131480  936
paq8hp12any     -7      16,381,959                 330,700 x               78860  76190  941
                -8      16,230,028  132,045,026    330,700 x  132,375,726  56993        1850  22
                -8      16,230,028  132,045,026    330,700 x  132,375,726  37660  37584 1850  41

paq8hp1 through paq8hp12 can be used as a preprocessor to other compressors by compressing with option -0. In the following tests on ppmonstr, options were tuned for the best possible compression of enwik8 with 2 GB memory (1.65 GB available under WinXP). The xml-wrt 2.0 options are -l0 -w -s -c -b255 -m100 -e2300 (level 0, turn off word containers, turn off space modeling, turn off containers, 255 MB buffer for dictionary, 100 MB buffer, 2300 word dictionary). The xml-wrt 3.0 options are -l0 -b255 -m255 -3 -s -e7000 (-3 = optimize for PPM).

xml-wrt prepends the dictionary to its output. To make the comparison fair, the compressed size of the dictionary must be added. This is done in two ways: first by compressing the preprocessed text and the dictionary separately and adding the compressed sizes (the total column), and second by prepending the dictionary to the preprocessed text before compression (the dict+enwik8 column). The first method compresses 700 to 2,000 bytes smaller.

The uncompressed size of each dictionary for paq8hp1 through paq8hp4 is 398,210 bytes. They contain identical words, but in different order. The first two dictionaries are identical. They compress smaller because they are sorted alphabetically. The dictionary for paq8hp5 is 411,681 bytes. It contains all of the words in the first 4 dictionaries plus 1280 new words (44,880 total).
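If the paq8hp1-paq8hp4 dictionaries fill exactly the three code tiers described for paq8hp3 (80 one-byte, 2560 two-byte and 40,960 three-byte codes), they hold 43,600 words, and adding the 1280 new words gives the 44,880 stated for paq8hp5. A sketch that maps a word's rank to its code length, assuming tiers are filled in rank order (the page does not say how paq8hp5 distributes its new words; names are hypothetical):

```python
# (code length in bytes, number of words), as described for paq8hp3
TIERS = [(1, 80), (2, 2560), (3, 40_960)]


def code_length(rank: int) -> int:
    """Code length for the word at 0-based position `rank`, tiers filled in order."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    start = 0
    for length, count in TIERS:
        if rank < start + count:
            return length
        start += count
    raise ValueError(f"rank {rank} is outside the {start}-word dictionary")


DICT_WORDS = sum(count for _, count in TIERS)
print(DICT_WORDS, DICT_WORDS + 1280)  # 43600 44880
```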

Preprocessor    Compressor                 enwik8     dict      total    dict+enwik8
------------    ----------               ----------  -------  ----------  ---------
paq8hp1 -0    | ppmonstr J -m1650 -o64   18,322,077   81,190  18,403,267  18,403,991
paq8hp2 -0    | ppmonstr J -m1650 -o64   18,266,424   81,190  18,347,614  18,349,587
paq8hp3 -0    | ppmonstr J -m1650 -o64   18,197,797  107,583  18,305,380  18,306,690
paq8hp4 -0    | ppmonstr J -m1650 -o64   18,170,944  107,590  18,278,534  18,280,098
paq8hp5 -0    | ppmonstr J -m1650 -o64   18,154,921  111,935  18,266,856  18,267,556
xml-wrt 2.0   | ppmonstr J -m1650 -o64   18,625,624
xml-wrt 3.0   | ppmonstr J -m1650 -o64   18,494,374
 (none)         ppmonstr J -m1650 -o16   19,062,555
                ppmonstr J -m1650 -o32   19,084,964
                ppmonstr J -m1650 -o64   19,098,634
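The two ways of charging for the dictionary can be checked against the table: the total column is enwik8 plus dict, and subtracting it from dict+enwik8 gives the saving from compressing the dictionary separately. A sketch using the ppmonstr figures above (the data layout and function name are illustrative):

```python
# (preprocessed enwik8, dictionary, dictionary prepended to enwik8): compressed bytes
ROWS = {
    "paq8hp1": (18_322_077, 81_190, 18_403_991),
    "paq8hp2": (18_266_424, 81_190, 18_349_587),
    "paq8hp3": (18_197_797, 107_583, 18_306_690),
    "paq8hp4": (18_170_944, 107_590, 18_280_098),
    "paq8hp5": (18_154_921, 111_935, 18_267_556),
}


def compare_methods(rows):
    """Return {name: (separate_total, prepended, bytes_saved_by_separate)}."""
    result = {}
    for name, (text, dictionary, prepended) in rows.items():
        separate = text + dictionary  # method 1: compress each part, add the sizes
        result[name] = (separate, prepended, prepended - separate)
    return result


for name, (sep, pre, saved) in compare_methods(ROWS).items():
    print(f"{name}: {sep:,} vs {pre:,} (separate is {saved:,} bytes smaller)")
```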

The transform done by paq8hp1 through paq8hp5 is based on WRT by Przemyslaw Skibinski, which first appeared in PAsQDa and paqar, and later in paq8g and xml-wrt. The steps are as follows:

WRT has additional capabilities depending on input, such as skipping encoding if little or no text is detected. The dictionary format is one word per line (linefeed only) with a 13 line header.
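As an illustration of the general idea of word replacement only, the toy transform below swaps dictionary words for one-byte codes and restores them on decoding. The codebook (the word of rank i becomes byte 0x80 + i, at most 128 words), the tokenization and the function names are all hypothetical; WRT's real code assignment, capitalization handling and escapes are different and are not reproduced here.

```python
import re


def build_codebook(words):
    """Hypothetical codebook: the word of rank i maps to the single byte 0x80 + i."""
    if len(words) > 128:
        raise ValueError("toy codebook holds at most 128 words")
    return {w: bytes([0x80 + i]) for i, w in enumerate(words)}


def encode(text, words):
    """Replace each maximal run of lowercase letters found in the dictionary by its code.
    The text must be ASCII, so any byte >= 0x80 in the output is a code."""
    codebook = build_codebook(words)
    out = bytearray()
    for token in re.findall(r"[a-z]+|[^a-z]+", text):
        out += codebook.get(token, token.encode("ascii"))
    return bytes(out)


def decode(data, words):
    """Invert encode(): code bytes become words, other bytes are ASCII characters."""
    return "".join(words[b - 0x80] if b >= 0x80 else chr(b) for b in data)


words = ["the", "cat", "on"]
packed = encode("the cat sat on the mat", words)
print(packed, decode(packed, words))
```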

.1348 paq8pxd_v10

paq8pxd_v10 is the latest version in the following PAQ series of open source (GPL) context mixing archivers.

p5, p6, and p12 (Matt Mahoney, May 13, 2000) use a neural network with 256K or 4M inputs, no hidden layer and a single output to predict the next bit of input, given hashes of various contexts to select active inputs. The output is arithmetic coded. p5 uses 1 MB memory and context orders 0 to 3. p6 uses 16 MB and orders 0-5. p12 uses 16 MB, orders 1-4 and word-level orders 0-1 as an optimization for text. The programs take no options. The algorithm is described in M. Mahoney, Fast Text Compression with Neural Networks, Proc. AAAI FLAIRS, Orlando, 2000 (C) 2000, AAAI.
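The p5/p6/p12 design can be sketched as a single logistic unit whose inputs are selected by hashing contexts. In the sketch below, each context (supplied as bytes) is hashed with CRC-32 to one of 2^18 = 256K weights, the active weights are summed and squashed to P(next bit = 1), and each active weight moves by the learning rate times the prediction error. The hash, learning rate and class name are assumptions, not the original programs' code.

```python
import math
import zlib


class HashedLogisticPredictor:
    """Hypothetical sketch: one output, no hidden layer, inputs chosen by context hashes."""

    def __init__(self, n_inputs=1 << 18, lr=0.02):
        self.n = n_inputs
        self.w = [0.0] * n_inputs
        self.lr = lr
        self.active = []
        self.p = 0.5

    def predict(self, contexts):
        # Each active input has value 1, so the net input is the sum of its weights.
        self.active = [zlib.crc32(c) % self.n for c in contexts]
        s = max(-50.0, min(50.0, sum(self.w[i] for i in self.active)))
        self.p = 1.0 / (1.0 + math.exp(-s))
        return self.p

    def update(self, bit):
        err = bit - self.p  # target minus prediction
        for i in self.active:
            self.w[i] += self.lr * err


model = HashedLogisticPredictor()
for _ in range(1000):  # a context that is always followed by a 1 bit
    model.predict([b"order0", b"order1:a"])
    model.update(1)
print(model.predict([b"order0", b"order1:a"]))
```

In a bitwise coder the contexts for each bit would combine the bits of the current byte seen so far with some number of preceding bytes, and the resulting probability drives the arithmetic coder.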

paq1 (Matt Mahoney, Jan. 6, 2001) replaces the neural network in p5, p6, p12 with a fixed weighted averaging of model outputs. Described in an unpublished report, M. Mahoney, The PAQ1 Data Compression Program, 2002.

paq6 (Matt Mahoney and Serge Osnach, Dec. 30, 2003) evolved as a series of improvements to paq1. It is described in M. Mahoney, Adaptive Weighing of Context Models for Lossless Data Compression, Florida Tech. Technical Report CS-2005-16, 2005. The most significant improvements are replacing the fixed model weights with adaptive linear mixing (Matt Mahoney), and SSE (secondary symbol estimation) postprocessing on the output probability, and modeling of sparse contexts (Serge Osnach). Other models were added for x86 executable code, and automatic detection of fixed length records in binary data. Intermediate versions can be found here.

paqar 4.5 (Alexander Rhatushnyak, Feb. 13, 2006) is the last of a long series of improvements to paq6 by Alexander Rhatushnyak (paqar: multimixer model, .exe preprocessor, other model improvements), Przemyslaw Skibinski (WRT text preprocessing), Berto Destasio (model tuning), Fabio Buffoni (speed optimizations), David A. Scott (arithmetic coder optimizations), Jason Schmidt (model improvements), and Johan de Bock (compiler optimizations). For text, the biggest improvement came from WRT (Word Reducing Transform), which replaces words with shorter codes from an external English dictionary and was first added to PAsQDa 1.0 on Jan. 18, 2005. WRT is described in P. Skibiński, Sz. Grabowski, and S. Deorowicz, Revisiting dictionary-based compression, Software - Practice & Experience, 35 (15), pp. 1455-1476, December 2005. There were a great number of versions by many contributors, mostly in 2004 when the PAQ series moved to the top of most compression benchmarks and attracted interest. Prior to PAQ, the top ranked programs were generally closed source.

paq8f (Matt Mahoney, Feb. 28, 2006) evolved from paq7 (Dec. 24, 2005) as a complete rewrite of paq6/paqar. The important improvements were replacing the adaptive linear mixing of models with a neural network (coded in MMX assembler), a more memory-efficient mapping of contexts to bit histories using a cache-aligned hash table, adaptive mapping of bit histories to probabilities, and models for bmp, tiff, and jpeg images. It models text using whole-word contexts and case folding, like all versions back to p12, but lacks WRT text preprocessing. It served as a baseline for the Hutter prize. Details are in the source code comments.
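The paq8 neural network mixer combines model predictions in the logistic domain: each input probability is stretched, ln(p/(1-p)), the weighted sum is squashed back to a probability, and each weight moves in proportion to the prediction error times its input. A floating point sketch of that idea (paq8 itself uses fixed point arithmetic in MMX assembler; the learning rate, initial weight and names here are assumptions):

```python
import math


def stretch(p):
    """ln(p / (1 - p)), with p clamped away from 0 and 1."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def squash(x):
    """Inverse of stretch: 1 / (1 + e^-x), with x clamped to avoid overflow."""
    x = min(max(x, -50.0), 50.0)
    return 1.0 / (1.0 + math.exp(-x))


class LogisticMixer:
    def __init__(self, n_models, lr=0.02):
        self.w = [0.3] * n_models  # initial weight is an arbitrary assumption
        self.lr = lr
        self.x = []
        self.p = 0.5

    def mix(self, probs):
        self.x = [stretch(q) for q in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # bit - p is the negative gradient of -ln P(bit) w.r.t. the weighted sum
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]


mixer = LogisticMixer(2)
for _ in range(500):  # model 0 is right, model 1 is confidently wrong
    mixer.mix([0.9, 0.1])
    mixer.update(1)
print(mixer.mix([0.9, 0.1]), mixer.w)
```

Unlike linear averaging, a weight can become negative, so a model that is consistently wrong still contributes information.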

paq8g (Przemyslaw Skibinski, Mar. 3, 2006) adds back WRT text preprocessing.

paq8h (Alexander Rhatushnyak, Mar. 24, 2006) added additional contexts to the neural network mixer. It was top ranked on enwik9 (but not enwik8) when the Hutter prize was launched on Aug. 6, 2006. This is the 78th version since p5.

raq8g by Rudi Cilibrasi, released 0721Z Aug. 16, 2006, is a modification of paq8f. It adds a NestModel to model nesting of parentheses and brackets. The test below for -7 is based on a Windows compile, raq8g.exe. The test for -8 was under Linux. The unzipped Linux executable is 27,660 bytes.
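A nesting context can be sketched as the current depth of open brackets plus the closing byte that would end the innermost one, which helps predict where a closer belongs. This is a hypothetical illustration of the idea; raq8g's NestModel source is not described here, and the bracket set, the (depth, closer) context and the handling of unmatched closers are assumptions.

```python
# Hypothetical opener -> closer table
CLOSER = {ord("("): ord(")"), ord("["): ord("]"), ord("{"): ord("}")}


def nesting_contexts(data: bytes):
    """Return one (depth, expected_closer) pair per byte, computed before the byte is seen.
    expected_closer is 0 when no bracket is open; unmatched closers are ignored."""
    stack = []
    contexts = []
    for b in data:
        contexts.append((len(stack), stack[-1] if stack else 0))
        if b in CLOSER:
            stack.append(CLOSER[b])
        elif stack and b == stack[-1]:
            stack.pop()
    return contexts


print(nesting_contexts(b"a(b[c]d)e"))
```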

paq8j by Bill Pettis, Nov. 13, 2006, is based on paq8f (no dictionary) with model improvements taken from paq8hp5. It is a general purpose compressor like paq8f, not specialized for text.

paq8ja.zip by Serge Osnach, Nov. 16, 2006, is an improvement of paq8j, using additional contexts based on character classifications.

paq8jb.zip by Serge Osnach, Nov. 22, 2006, adds contexts using the distance to an anchor byte (x00, space, newline, xff) combined with previous characters. The -8 test caused some minor disk thrashing at 2 GB memory under WinXP Home (82% CPU usage). Time reported is wall time.
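One way to form such contexts: for each anchor byte, track the distance since its last occurrence and pair it with the previous byte. The cap of 255, the tuple layout and the function name are assumptions for illustration; paq8jb's actual context hashing is not shown here.

```python
ANCHORS = (0x00, 0x20, 0x0A, 0xFF)  # NUL, space, newline, 0xFF


def anchor_contexts(data: bytes, max_dist: int = 255):
    """For each position i, return the contexts available before byte i is coded:
    one (anchor, distance since that anchor, previous byte) tuple per anchor.
    Distances are capped at max_dist; an anchor not yet seen also gets max_dist."""
    last_seen = {a: None for a in ANCHORS}
    contexts = []
    for i, b in enumerate(data):
        prev = data[i - 1] if i > 0 else 0
        row = []
        for a in ANCHORS:
            pos = last_seen[a]
            dist = max_dist if pos is None else min(i - pos, max_dist)
            row.append((a, dist, prev))
        contexts.append(row)
        if b in last_seen:
            last_seen[b] = i
    return contexts


print(anchor_contexts(b"ab cd")[4])
```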

paq8jc.zip by Serge Osnach, Nov. 28, 2006, improves the record model for better compression of some binary files, although it is slightly worse for text. Time for -8 is wall time at 72% CPU usage.

paq8jd by Bill Pettis, Dec. 30, 2006, improves on paq8j with additional SSE (APM) stages. enwik8 -8 caused some disk thrashing at 2 GB memory.

paq8k is by Bill Pettis, Feb. 13, 2007.

paq8l by Matt Mahoney, Mar. 8, 2007, is based on paq8jd. It adds a DMC model and minor improvements.

paq8fthis2 by Jan Ondrus, Aug. 12, 2007, is paq8f with an improved model for compressing JPEG images. It is otherwise archive compatible with paq8f for data without JPEG images (such as enwik8 and enwik9).

paq8n by Matt Mahoney, Aug. 18, 2007, combines paq8l with the JPEG model from paq8fthis2.

paq8o and paq8osse by Andreas Morphis, Aug 22 2007, are paq8n with an improved model for .bmp images. There are two executables that produce identical archives. paq8o.exe is for Pentium MMX or higher. paq8osse.exe is for newer processors that support SSE2 instructions like the Pentium 4. It is about 8% faster, but uses more memory. Both use the same C++ source but use different (but equivalent) assembler code to implement the neural network mixer. paq8osse.exe was compiled with Intel C++, which produces slightly faster executables than g++ used in earlier versions. The current version is paq8o ver. 2 (Aug. 24, 2007), which fixes the file name extension (was .paq8n) but does not change compression. The benchmark is based on the first version.

paq8o3 by KZ, Sept. 11, 2007, combines paq8o with an improved JPEG model from paq8fthis3 (Jan Ondrus, Sept. 8, 2007) and an improved model for grayscale PGM images from paq8i (Pavel Holoborodko, Aug. 18, 2006). Text compression is unchanged from paq8l, paq8m, paq8o, or paq8o2.

paq8o4 v1 by KZ, Sept. 15, 2007, includes a grayscale .bmp model (based on the grayscale PGM model). Text compression is unaffected. It was compiled with Intel C++. paq8o4 v2 by Matt Mahoney, Sept. 17, 2007, is a port to g++ which allows wildcards, directory traversal, and directory creation, but is 8% slower. It is archive compatible with v1.

paq8o6 by KZ, Sept. 28, 2007, is based on paq8o5 by KZ, Sept. 21, 2007 with the improved JPEG model from paq8fthis4 by Jan Ondrus, Sept. 27, 2007. paq8o5 is paq8o4 with an improved StateMap from lpaq1. The improved compression of enwik8 comes from this StateMap. Compression of enwik8 is unchanged from paq8o5 to paq8o6.
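A StateMap maps a context to a probability that adapts quickly at first and more slowly as the context is seen more often. In lpaq1 the adaptation rate works out to roughly 1/(n + 1.5) after n previous updates, with n capped at a limit. A floating point sketch of that rule, assuming p starts at 0.5 and the limit is 127 (both assumptions; lpaq1 packs a 22-bit probability and a 10-bit count into one 32-bit word):

```python
class StateMap:
    """Context -> P(bit = 1), updated with rate 1 / (n + 1.5), n capped at `limit`."""

    def __init__(self, n_contexts, limit=127):
        self.p = [0.5] * n_contexts
        self.n = [0] * n_contexts
        self.limit = limit
        self.cxt = 0

    def predict(self, cxt):
        self.cxt = cxt
        return self.p[cxt]

    def update(self, bit):
        c = self.cxt
        n = self.n[c]
        self.p[c] += (bit - self.p[c]) / (n + 1.5)  # fast at first, then slower
        if n < self.limit:
            self.n[c] = n + 1


sm = StateMap(4)
sm.predict(2)
sm.update(1)
print(sm.p[2])  # 0.5 + 0.5 / 1.5
```

The cap keeps the rate from falling toward zero, so the map can still track contexts whose statistics drift.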

paq8o7 by KZ, Oct. 16, 2007, improves on paq8o6 with better JPEG compression and support for 4 and 8 bit BMP images. Text is not affected.

paq8o8 by KZ, Oct. 23, 2007, further improves the JPEG compression of paq8o7.

paq8o8-jun7 is a DOS port of paq8o8 by Rugxulo, June 7, 2008.

paq8o10t is by KZ, June 11, 2008. Discussion.

paq8p3 is by KZ, Apr. 19, 2009.

paq8p3 v2 is by KZ, Apr. 21, 2009.

paq8px_v60_turbo (source code and discussion) was by Jan Ondrus (with contributions from many others), June 20, 2009, and speed optimized by LovePimple on July 11, 2009. By default the turbo version runs in high priority under Windows, but was tested at normal priority. The v60 version was released after a long period of development beginning with v1 on Apr. 25, 2009. Development was aimed mostly at improving x86, image and wav compression. Decompression was not verified.

paq8px_v69 was released Apr. 26, 2010.

paq8pxd by kaitz, Jan. 21, 2012, modifies paq8px_v69 by adding dynamic dictionary preprocessing (based on XWRT), UTF-8 detection, and an alternating byte sparse model.

paq8pxd_v2 by kaitz (KZo) was released Feb. 11, 2012.

paq8pxd_v3 by kaitz (KZo) was released Feb. 23, 2012. It modifies im8model and the base64 support in the email model, and fixes false image detection in enwik9.

paq8pxd_v4 by kaitz was released Apr. 19, 2012. It adds a 4 bit BMP model and base64 fixes, incorporates the WRT source code, and includes other fixes.

paq8pxd_v5 by kaitz was released Apr. 18, 2013.

paq8pxd_v7 by kaitz was released Aug. 14, 2013.

paq8pxd_v8 by kaitz was a temporary release on June 16, 2014. It was still under development to fix bugs causing it to fail on JPEG and WAV input, but there were no errors for enwik8 or enwik9. To test, it was compiled from source under 64 bit Ubuntu using g++ 4.8.1 -O3.

paq8pxd_v10fix was released June 21, 2014. It was compiled from source under 64 bit Ubuntu, g++ 4.8.1 -O3.

Options select memory usage as shown in the table. Early versions took no options. Most versions were not tested on enwik9 due to their slow speed.

           Compression     Compressed size      Decompresser  Total size   Time (ns/byte)
Program      Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp  Decomp  Mem Note
-------      -------    ----------  -----------  -----------  -----------  -----  -----  --- ----
p5                      31,255,092                   9,298 s                3421           1   6
p6                      25,377,998                   9,421 s                4190          16   6
p12                     24,714,219                   9,598 s                4160          16   6
paq1                    22,156,982                  16,436 s                7800   7790   50
paq6 v2         -8      19,589,267                  26,548 s               47624         808
paqar 4.5       -7      18,388,609                 414,164 s              118690 119010  470
paq8f           -7      18,289,559                  34,371 x               68960         854
                -8      18,075,265                  34,371 x               69170        1693
paq8g           -7      17,817,246                 804,867 s               44130         854
paq8h           -7      17,674,700  147,195,723    801,612 s  147,997,335  56511  57278  854   5
raq8g           -7      18,132,399                  33,483 x               84555  84793 1089
                -8      17,923,022                  27,660 x             337430 ~330000 2095  17
                -8      17,923,022                  27,660 x             196540 ~196000 2095  15
paq8j           -7      18,208,284                  39,366 s              138030 138260  959
                -8      17,991,628                  39,366 s              138990 136500 1896
paq8ja          -7      18,184,224                  39,781 s              148560 143200  993
                -8      17,968,233                  39,781 s              154700 153990 1965
paq8jb          -7      18,180,081                  39,982 s              148570 148200 1009
                -8      17,964,363                  39,982 s              188590 190190 1999
paq8jc          -7      18,185,705                  40,064 s              150910 152080 1017
                -8      17,970,943                  40,064 s              224410 234900 2015
paq8jd          -7      18,158,159                  40,460 s              157340 156350 1030
                -8      17,943,042                  40,460 s              406730        2028
paq8k           -8      18,239,915                  41,881 s              457150        1463
paq8l           -6      18,518,485                  35,955 x              133910         435
                -7      18,168,563                  35,955 x              134770         837
                -8      17,916,450                  35,955 x              136000 136390 1643
paq8fthis2      -8      18,075,265                  34,846 x               69100  69310 1693
paq8n           -8      17,916,420                  37,402 x              134880 135480 1643
paq8o           -8      17,916,451                  42,389 s              135850 135260 1643
paq8osse        -8      17,916,451                  42,290 s              125260 124570 1778
paq8o3          -8      17,916,450                  43,745 s              134580 134530 1636
paq8o4 v1       -8      17,916,450                  43,876 s              126780 126560 1636
paq8o6          -8      17,904,721                  44,883 s              139530 139520 1712
paq8o7          -8      17,904,756                  45,979 s              139140 138530 1574
paq8o8          -8      17,904,756                  46,381 s              139370 139150 1574
paq8o8-intel    -1      22,260,679                  46,381 s               24687          37  24
paq8o8z-jun7    -1      22,260,679                  49,085 s               25919          37  24
                -1      22,260,680                                         29639          37  25
paq8o10t        -8      17,772,821                  50,865 s              144250 143720 1591
paq8p3          -7      18,044,229  150,709,834     57,288 s  150,767,122  72412         803  29
paq8p3 v2       -7      17,990,788                                         86891         803  29
                -8      17,759,875                                         87305        1574  29
paq8px_v60_turbo -8     17,733,057  146,272,609     53,846 s  146,326,455 143846        1643  26
paq8px_v69      -7      17,939,225                                         20170         878  26
paq8pxd_v1      -7      17,596,170  144,773,408     83,547 s  144,856,955  63302         811  29
paq8pxd_v2      -7      17,045,653                                         94280         853  29
                -8      16,848,214                                         95350        1658  29
paq8pxd_v3      -7      17,045,354  140,110,094     72,976 s  140,183,094  80069         853  29
                -8      16,847,903  136,777,893     72,976 s  136,850,869  82822        1658  29
paq8pxd_v4      -8      16,642,941  135,027,170     67,766 s  135,094,936  88409        1633  29
paq8pxd_v5      -8      16,699,597                  67,745 s              114960 116450 1633  26
paq8pxd_v7      -8      16,606,773  134,791,909     70,210 s  134,862,119  93751        1633  29        
paq8pxd_v8      -8      16,607,759  134,781,085     72,059 s  134,853,144  59387  54611 1521  48
paq8pxd_v10fix  -8      16,607,760  134,780,308     72,382 s  134,852,690  37177  54433 1633  48

.1422 zpaq

zpaq 1.03 is a free, open source command line archiver by Matt Mahoney, Sept. 8, 2009. zpaq implements the proposed ZPAQ standard format for highly compressed data. The goal of the standard is to allow the development of new compression algorithms without breaking compatibility with older decompressers. ZPAQ is described by the level 1 specification and a reference decoder. The specification does not describe the encoding algorithm. It only requires that compressed files be readable by the reference decoder, which was first released with the standard on Mar. 12, 2009 (v1.00). The release followed a development period with 9 experimental and incompatible versions (level 0, v0.01 through v0.09) released beginning Feb. 15, 2009. All level 1 versions from v1.00 onward are forward and backward compatible with each other. Higher levels may be introduced in the future with only a forward compatibility requirement: higher level decompressers must read archives produced by lower level compressors, back to level 1.

A ZPAQ archive is organized into independently compressed blocks. Each block is divided into one or more segments which must be decompressed in sequence. Each segment represents a file or a part of a file. The standard supports both archivers and single file compressors. In the case of a compressor, no filenames are stored in the segment headers, and all the blocks and segments are concatenated to a single output file specified by the user.

ZPAQ uses a streaming format that can be read or written in a single pass. The arithmetic coded data is designed so that the end of a segment can be found by scanning quickly without decoding. There is no central directory information to update when blocks are added, removed, or reordered.

The ZPAQ standard requires that the decompression algorithm be described in the block headers. The header describes a collection of bitwise predictive models based loosely on PAQ components, a program to compute the bytewise contexts for each model, and a second program to perform arbitrary postprocessing on the output data. The two programs are written in an interpreted bytecode language called ZPAQL.

A ZPAQ model specifies a list of 1 to 255 components. Each component outputs a prediction, or probability, that the next bit will be a 1. Each component may receive as input a computed 32-bit context and the output predictions of earlier components on the list. The last component's prediction is fed to an arithmetic coder to encode or decode the next bit. The component types are CONST, CM, ICM, MATCH, AVG, MIX2, MIX, ISSE, and SSE.

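There are two ZPAQL virtual machines, one (HCOMP) to compute contexts, and one (PCOMP) to postprocess the decoded data. Each program is called once per decoded byte with that byte as input. The state of a ZPAQL machine consists of the 32-bit registers A, B, C, and D, a 1-bit condition flag F, 256 additional 32-bit registers R0 through R255, a program counter, and two arrays, M (bytes) and H (32-bit words), whose sizes are given in the block header.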

Most instructions are either 1 byte or 2 bytes with an 8 bit operand (0..255). There is one 3 byte instruction (16 bit jump). The possible instructions are assignment, swap, add, subtract, multiply, divide, mod, and, or, xor, not-and, left shift, right shift, less than, equals, greater than, increment, decrement, complement, jump, conditional jump, hash, output, and halt. The hash instruction is convenient for updating a context hash with an input byte by the formula hash := (hash + byte + 512) * 773.
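
The hash update can be written directly in C++. This is a minimal sketch that assumes the 32-bit wraparound arithmetic of ZPAQL registers. hashStep and hashBytes are hypothetical helper names, not part of zpaq.

```cpp
#include <cstdint>
#include <string>

// One application of hash := (hash + byte + 512) * 773. Unsigned 32-bit
// arithmetic wraps around on overflow, like a 32-bit ZPAQL register.
uint32_t hashStep(uint32_t hash, uint8_t byte) {
  return (hash + byte + 512u) * 773u;
}

// Hypothetical helper: hash a sequence of bytes starting from 0, for example
// to form a context from the last few bytes seen.
uint32_t hashBytes(const std::string& bytes) {
  uint32_t h = 0;
  for (unsigned char c : bytes) h = hashStep(h, c);
  return h;
}
```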

zpaq 1.03 takes as input a configuration file which describes the arrangement of components, their parameters, and the ZPAQL program HCOMP written one token per byte in a C-like syntax (e.g. "A=B" to assign B to A). PCOMP is not specified because in general the preprocessing step performed by the compressor is different from (and usually more complex than) the postprocessing step. Instead, zpaq 1.03 provides the option of two built-in preprocessors, LZP and E8E9. If one is selected, the preprocessing is done in C++ by the compressor, which generates ZPAQL code to perform the inverse transform and inserts it into the archive block header. (PCOMP is actually prepended to the input data and compressed with it. HCOMP is not compressed.)

E8E9 is used to improve compression of 32 bit x86 executable files. It converts the 32 bit relative address that follows a CALL (0xE8) or JMP (0xE9) instruction into an absolute address by adding its offset from the beginning of the file. This improves compression because a program often contains several calls to the same target, which after the transform all carry the same address. PCOMP performs the inverse transform in ZPAQL by subtracting the offset.
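
A minimal C++ sketch of the transform and its inverse follows. The exact offset convention used by zpaq 1.03 is not stated here, so this sketch assumes that the offset added is the position of the 0xE8 or 0xE9 byte and that the address is stored little-endian. The function name e8e9 is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Forward (encode = true) or inverse (encode = false) E8E9 transform, in place.
// Assumption: the offset is the position of the opcode byte.
void e8e9(std::vector<uint8_t>& buf, bool encode) {
  const size_t n = buf.size();
  for (size_t i = 0; i + 4 < n; ++i) {
    if (buf[i] != 0xE8 && buf[i] != 0xE9) continue;
    // 32 bit little-endian address following the opcode
    uint32_t a = uint32_t{buf[i + 1]} | uint32_t{buf[i + 2]} << 8 |
                 uint32_t{buf[i + 3]} << 16 | uint32_t{buf[i + 4]} << 24;
    const uint32_t offset = static_cast<uint32_t>(i);
    a = encode ? a + offset : a - offset;  // relative <-> absolute, mod 2^32
    for (int k = 0; k < 4; ++k) buf[i + 1 + k] = static_cast<uint8_t>(a >> (8 * k));
    i += 4;  // skip the address so both directions see the same opcode positions
  }
}
```

Because both directions skip the 4 address bytes after each opcode, the inverse examines exactly the same opcode positions as the forward pass, which makes it reversible. For example, a CALL at offset 0 with relative address 105 and a CALL at offset 5 with relative address 100 both target offset 110. After the transform, both carry 105.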

LZP encodes long string matches as an escape byte followed by a length byte. The decompresser maintains a rolling context hash that indexes a pointer table (the H array). Each entry points into the output buffer (the M array) at the previous match of that context. When an escape is decoded, the indicated number of bytes is copied from the previous context match. In zpaq 1.03, the user can specify the sizes of M and H, the hash multiplier (which effectively chooses the context length), the value to use as the escape byte (preferably one that occurs rarely in the input), and the minimum match length. An escape byte that occurs in the input is encoded as the escape byte followed by a length of 0.

zpaq 1.03 is distributed with three configuration files, min.cfg (for speed), mid.cfg (the default), and max.cfg (for good compression). However, the user can also write their own config files.

o0.cfg, o1.cfg, and o2.cfg are order 0, 1, and 2 models with a single CM and direct context lookup with no hashing. o0 is equivalent to fpaq0. In each of the models the asymptotic learning rate was tuned for maximum compression. Other values are given as comments in the sources. The CM uses 2KB, 512KB and 128MB respectively.
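
The following sketch estimates the ideal compressed size of a byte string under a bitwise order 0 or order 1 model with direct context lookup, in the spirit of o0.cfg and o1.cfg. It is not ZPAQ's CM. As an assumption, it uses 12-bit probabilities and a fixed adaptation shift (rate) in place of ZPAQ's tuned learning rate. The context index is the previous byte (order 1 only) times 512 plus the partially coded current byte, so no hashing is needed.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Ideal code length in bits of `data` under a bitwise order-0 (order = 0) or
// order-1 (order = 1) model with direct context lookup. `rate` (>= 1) is a
// hypothetical fixed adaptation shift standing in for the tuned learning rate.
double codeLengthBits(const std::vector<uint8_t>& data, int order, int rate) {
  // Index = (previous byte, order 1 only) * 512 + partial byte 1..255.
  std::vector<uint16_t> p(order == 0 ? 512 : 256 * 512, 2048);  // P(1) in 1/4096 units
  double bits = 0;
  uint32_t prev = 0;
  for (uint8_t byte : data) {
    uint32_t partial = 1;  // leading 1 followed by the bits coded so far
    for (int i = 7; i >= 0; --i) {
      int bit = (byte >> i) & 1;
      uint32_t ctx = (order == 0 ? 0 : prev * 512) + partial;
      double p1 = p[ctx] / 4096.0;
      bits -= std::log2(bit ? p1 : 1.0 - p1);       // cost of coding this bit
      if (bit) p[ctx] += (4096 - p[ctx]) >> rate;  // move toward the observed bit
      else     p[ctx] -= p[ctx] >> rate;
      partial = partial * 2 + bit;
    }
    prev = byte;
  }
  return bits;
}
```

On input that alternates "abab...", the order 1 model learns that each byte determines the next and codes it far more cheaply than the order 0 model, which sees a 50/50 choice at one bit of every byte.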

min.cfg uses LZP preprocessing with a minimum match length of 3 and an order 4 context hash, followed by compression with a single CM with an order 3 context and 512K entries. The LZP has a 1 MB output buffer and a 256K entry index. It uses 4 MB memory.

mid.cfg (the default) does no preprocessing. It has an order 0 ICM, a chain of ISSE with context orders 1 through 5, each taking the previous ISSE as input, a MATCH with an order 7 context, and a final MIX with an order 1 context taking input from all other models. It uses 111 MB memory.

max.cfg does no preprocessing. It has 21 components: an order 0 ICM, a chain of order 1, 2, 3, 4, 5, 7 ISSE, an order 8 MATCH, a wordwise order 0-1 ICM-ISSE chain (for text), sparse order 1 ICM with gaps of 1, 2, and 3, a partially masked order 2 ICM with a gap of 216 for CCITT images (calgary/pic), order 0 and 1 mixers taking a CONST and all previous components as input and averaged together with a context free MIX2, followed by a chain of order 0 and 1 SSE each partially bypassed by a context free and order 0 MIX2, and a final context free MIX of all other components. The two wordwise contexts depend on the current and previous case insensitive sequences of letters in the range a-z. It uses 278 MB memory.

max3.cfg is a variation of max.cfg by Jan Ondrus (Sept. 10, 2009) using 550 MB memory and without a CCITT model.

max4.cfg is a variation of max3.cfg (Sept. 15, 2009) using 1465 MB memory.

drt is the dictionary preprocessor from lpaq9m by Alexander Rhatushnyak. The results include the dictionary file lpqdict0.dic compressed from 465,210 to 88,759 bytes in 8 seconds as a separate archive with max4.cfg and decompressed in 7 seconds, and drt.exe with a size of 15,548 bytes (whether uncompressed or as a zip file) with 38 seconds to encode enwik9 and 38 seconds to decode.

max_enwik9.cfg is a variation of max.cfg by Mike Russell, Sept. 11, 2009. It adds 5 more models for higher order contexts using an ISSE chain after the first order 5 mixer.

max_enwik9drt.cfg is a variation of max_enwik9.cfg, Sept. 18, 2009, modified to define word contexts for ASCII range 65-255 instead of A-Z,a-z because DRT encodes words using bytes in the range 128-255. The compressed size of lpqdict0.dic is 86810 bytes, 12+9 sec, compressed separately and added to the compressed sizes.

zpipe 1.00 is a ZPAQ compatible streaming file compressor that compresses or decompresses from standard input to standard output. It takes no options. It compresses equivalently to mid.cfg without storing a filename or comment. The decompresser outputs the contents of archives to a single file by concatenation.

bwt_j2.cfg implements an inverse BWT transform. It was written by Jan Ondrus, Oct. 6, 2009. The forward transform is implemented by an external preprocessor, bwtpre (included above), by Matt Mahoney, Oct. 6, 2009. bwtpre is based on BBB fast mode compression but does not itself compress. The argument ",18" tells bwt_j2.cfg to use a block size of 2^(10+18) - 256 bytes (that is, 2^28 - 256). Memory usage is 5x blocksize for both the preprocessor and postprocessor, plus 100 MB for the model. The ability of config files to call external preprocessors was added to zpaq v1.05 on Sept. 28, 2009. The ability to pass arguments was added to zpaq v1.07 on Oct. 2, 2009.
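
The division of labor above (forward transform outside ZPAQ, inverse transform in ZPAQL) is easier to follow with the transform itself in view. Below is a naive C++ sketch: the forward BWT sorts all rotations (O(n^2 log n), for illustration only), and the inverse walks the LF mapping. It is neither the BBB nor the ZPAQL implementation, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive forward BWT: sort all rotations of s. `primary` receives the row of the original string.
std::string bwtForward(const std::string& s, size_t& primary) {
  const size_t n = s.size();
  primary = 0;
  std::vector<size_t> rot(n);
  std::iota(rot.begin(), rot.end(), size_t{0});
  std::sort(rot.begin(), rot.end(), [&](size_t a, size_t b) {
    for (size_t k = 0; k < n; ++k) {
      unsigned char ca = s[(a + k) % n], cb = s[(b + k) % n];
      if (ca != cb) return ca < cb;
    }
    return false;  // identical rotations (periodic input)
  });
  std::string last(n, '\0');
  for (size_t i = 0; i < n; ++i) {
    last[i] = s[(rot[i] + n - 1) % n];  // last column: the byte before each rotation
    if (rot[i] == 0) primary = i;
  }
  return last;
}

// Inverse BWT: the k-th occurrence of byte c in the last column is the k-th
// occurrence of c in the sorted first column (the LF mapping).
std::string bwtInverse(const std::string& last, size_t primary) {
  const size_t n = last.size();
  std::vector<size_t> start(257, 0);
  for (unsigned char c : last) ++start[c + 1];
  for (int c = 0; c < 256; ++c) start[c + 1] += start[c];  // start[c] = number of bytes < c
  std::vector<size_t> lf(n);
  for (size_t i = 0; i < n; ++i) lf[i] = start[static_cast<unsigned char>(last[i])]++;
  std::string out(n, '\0');
  size_t row = primary;
  for (size_t i = n; i-- > 0;) {  // rebuild the string from the end
    out[i] = last[row];
    row = lf[row];
  }
  return out;
}
```

For example, the rotations of "banana" sort to give the last column "nnbaaa", with the original string in row 3.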

zpaq v1.08 (Oct. 14, 2009) adds the capability to compile ZPAQL configuration files and corresponding archive headers to C++ and link to a copy of itself to speed up compression and decompression. The program first looks for an optimized version of the program, writes and compiles it if needed, then runs it to compress or decompress. Some tests are shown for speed comparison. max.cfg was modified to use less memory. The arguments to min.cfg, mid.cfg, and max.cfg have the effect of improving compression at the cost of doubling memory for each increment.

bwt_slowmode1_1GB_block.cfg implements slow mode BWT transform using 1.25x blocksize memory based on BBB. The inverse transform was re-implemented in ZPAQL by Jan Ondrus, Oct. 15, 2009.

zpaq v1.09 is mainly a Linux port of v1.08 with some cosmetic improvements. Times for obwt_j2.cfg,18 are shown for comparison to v1.07 without optimization. Memory usage is 1838 MB for compression (includes preprocessor) and 1443 MB for decompression.

The c command followed by the name of a configuration file creates a new archive using that file. By default the archive header includes the file name (6 bytes), size (10 bytes), and SHA1 checksum (20 bytes). There are options to omit these and save 36 bytes. The "oc" command in zpaq v1.08 optimizes for speed.

zp 1.00 is a ZPAQ compatible archiver by Matt Mahoney, May 7, 2010. It is designed to have fewer options so it is easier to use. It has 3 compression levels: 1=fast, 2=mid, 3=max. It uses compiled ZPAQL code (like zpaq oc/ox) but without requiring an external C++ compiler to be installed. It automatically detects when an archive is compressed with one of these three models and decompresses with compiled code. Otherwise, it will decompress all other ZPAQ compatible archives with slower, interpreted code. Levels 2 and 3 are the same as zpaq mid.cfg and max.cfg. Only level 1 (fast) was tested because it uses a new model, fast.cfg, an ICM chain of length 2 with order 2 and 4 contexts. It is equivalent to compressing with zpaq ocfast.cfg.

                  Compression                          Compressed size      Decompresser  Total size   Time (ns/byte)
Program            Options                           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------            -------                         ----------  -----------  -----------  -----------  ----- -----  --- --- ----
zpaq 1.03          co0.cfg                         61,217,687  620,040,242     14,317 xd 620,054,559    441   453  0.4 o0  26
                   co1.cfg                         46,083,596  454,040,416     14,317 xd 454,054,733    459   480  0.6 o1  26
                   co2.cfg                         36,694,483  346,551,263     14,317 xd 346,565,580    557   560  134 o2  26
                   cmin.cfg                        33,460,947  294,281,789     14,317 xd 294,296,106    438   513    4 LZP 26
                   cmid.cfg                        20,941,558  180,279,221     14,317 xd 180,293,538   3521  3652  111 CM  26
                   cmax.cfg                        19,412,353  165,191,085     14,317 xd 165,205,402  12211 12204  278 CM  26
                   cmax3.cfg                       19,179,311  161,604,379     14,317 xd 161,618,696  14108 13609  550 CM  26
                   cmax4.cfg                       18,986,507  157,246,349     14,317 xd 157,260,666  14061 13077 1465 CM  26
                   cmax_enwik9.cfg                 18,238,435  149,376,058     14,317 xd 149,390,375  11961       2002 CM  32
drt|zpaq 1.03      cmax4.cfg                       18,400,773  149,761,125     29,865 xd 149,790,990   8663  8547 1465 CM  26
                   cmax_enwik9drt.cfg              18,022,167  146,078,502     29,865 xd 146,108,367  11494 11614 1952 CM  26
zpipe 1.00                                         20,941,543  180,279,205     13,421 x  180,292,626   3540  3480  111 CM  26
zpaq 1.07          cbwt_j2.cfg,18                  20,756,888  174,171,969     13,421 x  174,185,390   5593  4347 1838 BWT 26
zpaq 1.08          ocbwt_slowmodel_1GB_block.cfg   20,756,996  163,565,006     29,153 x  163,594,159   7957  3875 1443 BWT 26
                   oco0.cfg                        61,217,687                                           335   407  0.4 o0  26
                   ocmin.cfg                       33,460,960                                           414   383    4 LZP 26
                   ocmid.cfg                       20,941,558                                          2392  2456  111 CM  26
                   ocmax.cfg                       19,448,650                                          6569  6641  246 CM  26
                   ocmax.cfg,3                     18,977,961                                          6667  6640 1861 CM  26
zpaq 1.09          ocbwt_j2.cfg,18                 20,756,883  174,171,965     31,744 x  174,203,709   4529  1847 1838 BWT 26
zp 1.00            c1                              24,837,469  222,310,430     26,815 s  222,337,245    688   776   37 CM  26
                                                                                                        587   688          44

pzpaq 0.01 (a predecessor to zp 1.02) is a free, open source file compressor and archiver by Matt Mahoney, Jan. 21, 2011. It uses a ZPAQ compatible format with speed optimizations for the 3 default compression levels supported by libzpaq, zpaq, and zpipe. It supports parallel compression and decompression by dividing the input into blocks which are compressed or decompressed at the same time in separate threads, writing the results to temporary files, and then concatenating them when done. For compression with N threads, the input is divided into N blocks of equal size by default, although a different block size can be specified. Larger blocks make compression better but reduce the number of threads that can run at the same time. Using more threads also increases the memory required. pzpaq can also compress or decompress multiple files at once to separate archives or pack them into a solid archive or an archive with the packed files split across blocks within the archive.

The version 0.01 distribution includes a 32 bit Windows executable and source code to compile for Windows or Linux. For Windows, the code must be linked with Pthreads-Win32 and pthreadGC2.dll is required at run time. The program size was calculated from the source code (including libzpaq) required for Linux, which has pthreads installed by default and is not included in the size.

The test results shown below are for 2 machines, a 2.67 GHz Intel Core i7 M620 with 2 cores and 2 hyperthreads per core, running 64 bit Linux (note 48), and a 2.0 GHz Intel T3200 with 2 cores without hyperthreading running 32 bit Windows (note 26). The Linux version was compiled with g++ 4.4.4 -O3 -s -march=native -DNDEBUG. The Windows version used the distributed pzpaq.exe and pthreadGC2.dll. It was compiled with g++ 4.5.0 -O2 -s -march=pentiumpro -fomit-frame-pointer. Times shown are wall (real) times, not process times, in nanoseconds per byte.

We observe the normal 3 way tradeoff between speed, memory, and compression. Compression levels -1, -2, and -3 require 38 MB, 112 MB, and 247 MB per thread respectively. The default is -2. -t selects the number of threads. The default is -t2. -b selects the block size. The default is the input size divided by the number of threads. The -m option limits memory usage in MB by reducing -t. The default is -m500. Selecting larger -m than required has no effect on compression, speed, or actual memory used. -m is only required with -3 -t3 or higher.

                                            C/D time     C/D time
Lev Thr Block      Memory      enwik8       Note 48      Note 26
-------------------------    ----------   -----------   -----------
-1 -t2 -b1000000     -m76    28,176,221                  471
-1 -t2 -b2500000     -m76    26,915,416                  443
-1 -t2 -b5000000     -m76    26,236,689                  436
-1 -t2 -b10000000    -m76    25,728,498                  429
-1 -t4 -b25000000    -m152   25,253,629    210    220
-1 -t3 -b33333334    -m114   25,144,587    220    240
-1 -t2 -b50000000    -m76    25,009,236    240    290    410   430
-1 -t1 -b100000000   -m38    24,837,482    420    470    750   800

-2 -t2 -b1000000     -m224   24,582,373                 1440
-2 -t2 -b2500000     -m224   23,374,191                 1396
-2 -t2 -b5000000     -m224   22,644,738                 1417
-2 -t2 -b10000000    -m224   22,044,838                 1430
-2 -t2 -b25000000    -m224   21,438,679                 1382
-2 -t4 -b25000000    -m448   21,438,679    720    730
-2 -t3 -b33333334    -m336   21,303,705    790    820
-2 -t2 -b50000000    -m224   21,138,877    950    980   1300  1310
-2 -t1 -b100000000   -m112   20,941,571   1510   1560   2350  2330

-3 -t2 -b1000000     -m494   23,281,943                 4142
-3 -t2 -b2500000     -m494   22,105,128                 3896
-3 -t2 -b5000000     -m494   21,371,902                 3866
-3 -t2 -b10000000    -m494   20,745,064                 3854
-3 -t2 -b25000000    -m494   20,073,978                 3816
-3 -t4 -b25000000    -m988   20,073,978   1900   1950
-3 -t3 -b33333334    -m741   19,914,412   2070   2120
-3 -t2 -b50000000    -m494   19,710,450   2180   2250   3670  3990
-3 -t1 -b100000000   -m247   19,448,663   3780   3910   6080  6200

                                            C/D time     C/D time
Lev Thr Block      Memory     enwik9        Note 48      Note 26
-------------------------   -----------   -----------   -----------
-1 -t2 -b1000000     -m76   254,931,717                  582
-1 -t2 -b10000000    -m76   232,278,737                  425
-1 -t2 -b100000000   -m76   224,233,690                  392
-1 -t2 -b250000000   -m76   223,043,964                  393
-1 -t4 -b250000000   -m152  223,043,964    198    223
-1 -t3 -b333333334   -m114  222,789,971    224    254
-1 -t2 -b500000000   -m76   222,544,698    236    276    408   556
-1 -t1 -b1000000000  -m38   222,310,443    410    470    758   800

-2 -t2 -b1000000     -m224  216,322,292                 1377
-2 -t2 -b10000000    -m224  192,436,071                 1286
-2 -t2 -b100000000   -m224  182,293,069                 1275
-2 -t2 -b250000000   -m224  180,995,559                 1278
-2 -t4 -b250000000   -m448  180,995,559    710    742
-2 -t3 -b333333334   -m336  180,716,954    768    811
-2 -t2 -b500000000   -m224  180,516,414    854    881   1275
-2 -t1 -b1000000000  -m112  180,279,234   1487   1532   2231

-3 -t2 -b1000000     -m494  203,976,295                 3824
-3 -t2 -b10000000    -m494  180,499,077                 3657
-3 -t2 -b100000000   -m494  168,839,648                 3611
-3 -t2 -b250000000   -m494  167,036,071                 3635
-3 -t4 -b250000000   -m988  167,036,071   1881   1926
-3 -t3 -b333333334   -m741  166,567,322   2025   2158
-3 -t2 -b500000000   -m494  166,324,415   2172   2236   3599
-3 -t1 -b1000000000  -m247  165,887,518   3708   3846   5989
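
The default block sizes and memory limits in the tables above follow from simple arithmetic. The helpers below are hypothetical, not pzpaq source code. They assume the per-thread memory figures of 38, 112, and 247 MB given above, and that at least one thread always runs.

```cpp
#include <algorithm>
#include <cstdint>

// Default -b: the input size divided by the number of threads, rounded up.
uint64_t defaultBlockSize(uint64_t inputSize, unsigned threads) {
  return (inputSize + threads - 1) / threads;
}

// Per-thread memory in MB for levels -1, -2, -3.
unsigned memoryPerThreadMB(int level) {
  static const unsigned kMB[] = {38, 112, 247};
  return kMB[level - 1];
}

// -m: reduce -t until the threads fit in the limit (assumption: never below one thread).
unsigned threadsWithinLimit(unsigned requested, int level, unsigned limitMB) {
  unsigned fit = limitMB / memoryPerThreadMB(level);
  return std::max(1u, std::min(requested, fit));
}
```

With the default -m500, -3 is held to two threads (2 x 247 = 494 MB), which is why -m must be raised for -3 -t3 or higher.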

zp 1.02 is a successor to pzpaq, which was considered experimental. It adds two new BWT compression modes which replace the "fast" (-1) model. Option -m1 selects the faster BWT mode (bwtrle1), which consists of right-context sorting (using libdivsufsort by Yuta Mori), run length encoding (RLE), and a single order 0 ICM with the RLE state (literal or count) as context. The BWT output is run length encoded by replacing runs of 2 to 257 identical bytes with 2 bytes and a count. The ICM maps the context to a bit history and then to a bit prediction, which is adjusted after coding to reduce the prediction error.
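
The RLE step can be sketched as follows. A run of 2 to 257 identical bytes is sent as the byte twice followed by a count of the remaining repeats (0 to 255). Longer runs are split, and a single byte is sent as itself. This byte layout is an assumption consistent with the description, not necessarily zp's exact encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Runs of 2..257 identical bytes become: byte, byte, (run length - 2).
std::vector<uint8_t> rleEncode(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  size_t i = 0;
  while (i < in.size()) {
    size_t run = 1;
    while (run < 257 && i + run < in.size() && in[i + run] == in[i]) ++run;
    out.push_back(in[i]);
    if (run >= 2) {  // second copy signals that a count follows
      out.push_back(in[i]);
      out.push_back(static_cast<uint8_t>(run - 2));
    }
    i += run;
  }
  return out;
}

std::vector<uint8_t> rleDecode(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  size_t i = 0;
  while (i < in.size()) {
    uint8_t c = in[i];
    if (i + 2 < in.size() && in[i + 1] == c) {  // two equal bytes: a count follows
      out.insert(out.end(), in[i + 2] + 2u, c);
      i += 3;
    } else {
      out.push_back(c);
      ++i;
    }
  }
  return out;
}
```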

Option -m2 selects the better BWT mode (bwt2), which drops the RLE step and uses an order 0-1 ISSE chain. The order-1 ISSE adjusts the order-0 ICM prediction by mixing it in the logistic domain with a constant, such that the pair of weights is selected by an 8-bit bit history, which is selected by an order 1 context of the BWT output. After coding, the mixing weights are adjusted to reduce the prediction error.
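
The ISSE update can be illustrated with a simplified floating point sketch. Several details are assumptions rather than ZPAQ's: the bit history here is just the last up to 7 bits seen in the context (with a leading 1), not ZPAQ's bit history state machine; the constant input is 1.0; and the learning rate is arbitrary. What it does show is the structure described above: the input prediction is stretched into the logistic domain, mixed with a constant using a weight pair selected by the context's bit history, and the selected weights are adjusted after coding.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

double stretch(double p) { return std::log(p / (1.0 - p)); }    // probability -> logistic domain
double squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }  // logistic domain -> probability

// Simplified ISSE: refines an input prediction with a weight pair selected
// by the bit history of the current (order 1) context.
struct IsseSketch {
  std::vector<uint8_t> hist = std::vector<uint8_t>(1 << 16, 1);  // context -> history; 1 = empty
  double w[256][2];                      // weight pair per bit-history state
  double lr = 0.02;                      // hypothetical learning rate
  size_t ctx = 0;
  double x0 = 0.0, p = 0.5;
  static constexpr double kConst = 1.0;  // constant input

  IsseSketch() {
    for (auto& pair : w) { pair[0] = 1.0; pair[1] = 0.0; }  // start by passing the input through
  }

  // pIn in (0, 1): prediction of the previous component, e.g. the order 0 ICM.
  double predict(double pIn, size_t context) {
    ctx = context & 0xFFFF;
    x0 = stretch(pIn);
    const double* wp = w[hist[ctx]];
    p = squash(wp[0] * x0 + wp[1] * kConst);
    return p;
  }

  // After coding: adjust the selected weights to reduce the error, then append
  // the bit to the context's history.
  void update(int bit) {
    double* wp = w[hist[ctx]];
    const double err = bit - p;
    wp[0] += lr * err * x0;
    wp[1] += lr * err * kConst;
    unsigned h = hist[ctx] * 2u + static_cast<unsigned>(bit);
    if (h >= 256) h = 128 | (h & 127);  // keep a leading 1 and the last 7 bits
    hist[ctx] = static_cast<uint8_t>(h);
  }
};
```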

Options -m3 and -m4 select the "mid" and "max" modes, the same as -2 and -3 respectively in pzpaq. The option -bN selects a block size of N*2^20 - 256 bytes. Memory usage per thread for the two BWT modes is 5 times the block size after rounding up to a power of 2. The default is -b32, which uses 160 MB per thread for -m1 and -m2. Memory usage for -m3 and -m4 is not affected by block size: it is 111 MB and 246 MB per thread respectively.
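
The BWT memory figures follow directly from the block size rule. Below is a small sketch with hypothetical helper names, where MB means 2^20 bytes.

```cpp
#include <cstdint>

// Block size selected by -bN: N * 2^20 - 256 bytes.
uint64_t bwtBlockSize(uint64_t n) { return n * (uint64_t{1} << 20) - 256; }

// Per-thread memory for -m1/-m2: 5 times the block size rounded up to a power of 2.
uint64_t bwtMemoryPerThread(uint64_t n) {
  uint64_t size = bwtBlockSize(n), pow2 = 1;
  while (pow2 < size) pow2 <<= 1;
  return 5 * pow2;
}
```

For -b32, the block size 2^25 - 256 rounds up to 2^25, giving 5 x 32 = 160 MB per thread. -b128 and -b256 give the 640 MB and 1280 MB shown in the table.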

Other changes: there is no longer an option to limit memory. The default number of threads (-t option) is the number of cores. There is no solid mode compression because BWT requires that each block contain only one whole or part of a file. There is a separate decompresser, unzp, which is optimized for fast, mid, max, bwtrle1, and bwt2 modes, and can be configured to optimize for other models by generating, compiling, linking, and running C++ code for an optimized version of itself. Compressed sizes are based on the unzp source code (37,967 bytes).

zpaq 4.00 was released Nov. 13, 2011. It uses libzpaq v4.00, which translates ZPAQL just-in-time (JIT) into x86-32 or x86-64 machine code. This runs about as fast as the previous version, which translated ZPAQL to C++ and compiled it. Unlike the earlier version, it correctly handles all legal ZPAQL, including jumps into the middle of a 2 byte instruction, as occur in max_enwik9.cfg. Like zp 1.02, it uses multi-threading and the same built-in compression levels -m1 through -m4.

Results are shown below for a 2.66 GHz Core i7 M620 with 4 GB memory (note 40), which has 2 cores with 2 hyperthreads each, running 64 bit Ubuntu Linux. Compression and decompression times (wall times, ns/byte) are shown for 1 through 4 threads (-t1 through -t4) as the compression method (-m) and block size (-b) are varied. max_enwik9 runs in one thread in a single block.

Compressor  Options        enwik8       enwik9       -t1        -t2        -t3        -t4     MB/thread
----------  --------     ----------  -----------  ---------  ---------  ---------  ---------  ----------
zp 1.02     -m1 -b32     24,091,153  210,224,876   264  313   144  184   131  170   120  165   160
            -m1 -b128    22,823,452  197,571,474   264  335   163  208   137  187   136  179   640
            -m1 -b256    22,823,452  191,741,553              167  218                        1280
            -m2 -b32     22,440,353  195,887,789   446  514   259  304   237  274   231  267   160
            -m2 -b128    21,246,043  184,023,690   467  543   291  343   250  295   248  294   640
            -m2 -b256    21,246,043  178,551,919              304  351                        1280
            -m3 -b32     21,301,940  185,584,854  1420 1478   805  856   760  790   713  745   111
            -m3 -b128    20,941,571  181,908,375  1430 1491   851  897   772  823   723  758   111
            -m3 -b1024   20,941,571  180,279,234  1446 1503                                    111
            -m4 -b32     19,912,920  172,989,918  3567 3695  2075 2145  1966 2011  1868 1906   246
            -m4 -b128    19,448,663  168,312,889  3578 3706  2156 2234  1984 2043  1875 1925   246
            -m4 -b1024   19,448,663  165,887,518  3597 3732                                    246


             Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------     ------------     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
zpaq 4.00   -mmax_enwik9     18,238,435  149,376,058     66,958 s  149,440,016   6327  6528 2002 CM   48

zpaq v6.12, Oct. 19, 2012, is a journaling, deduplicating, incremental archiver. These features were added in zpaq v6.00 on Sept. 26, 2012. It implements the level 2 ZPAQ standard introduced with libzpaq v5.00 on Feb. 1, 2012. The level 2 standard allows for uncompressed (but possibly pre/post-processed) data. The format is described in the ZPAQ specification v2.01.

zpaq v6.12 is designed for large backups. It will compress 100 GB to an external drive in a few hours, then perform daily incremental backups, which add only the files whose dates have changed, in a few minutes. It recursively traverses directories, storing the last-modified dates and attributes of added files.

A journaling archive is append-only. When a journaling archive is updated, it keeps both the old and new versions of each file or directory. The old version can be extracted by specifying a dated version, and any later updates are ignored.

Input is deduplicated before compression by dividing input files into fragments averaging 64 KB on content-dependent boundaries that move when data is inserted or removed. The archive stores fragment SHA-1 hashes and stores any fragment with a matching hash as a pointer to an existing fragment. Any remaining fragments are packed into 16 MB blocks in memory and compressed by multiple threads in parallel to memory buffers before being appended to the archive. After compression is completed, the fragment sizes and hashes are appended, and then a list of index updates in separately compressed blocks. Each update is either a deletion (filename only) or an update (filename, date, attributes, and list of fragment pointers).
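The fragment-level deduplication described above can be sketched in Python as follows. This is a minimal illustration, not zpaq's code: it assumes the input files have already been split into fragments, keeps the fragment store and the hash index in memory, and the function name `dedupe` is hypothetical.

```python
import hashlib

def dedupe(files):
    """Store each distinct fragment once and describe every file as a list
    of pointers (indices) into the fragment store.

    `files` maps a filename to its list of fragments (bytes objects).
    """
    store = []      # unique fragments, in the order first seen
    index = {}      # SHA-1 digest -> position in `store`
    pointers = {}   # filename -> list of fragment positions
    for name, fragments in files.items():
        ptrs = []
        for frag in fragments:
            digest = hashlib.sha1(frag).digest()
            if digest not in index:          # new content: append it
                index[digest] = len(store)
                store.append(frag)
            ptrs.append(index[digest])       # duplicate: pointer only
        pointers[name] = ptrs
    return store, pointers

store, ptrs = dedupe({"a.txt": [b"hello ", b"world"],
                      "b.txt": [b"world", b"hello "]})
print(len(store), ptrs)   # 2 {'a.txt': [0, 1], 'b.txt': [1, 0]}
```

Reassembling a file is then just `b"".join(store[i] for i in ptrs[name])`.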

An update is performed as a transaction by first appending a temporary header, then the compressed data and index, and then finally going back and updating the header to store the compressed data size so that it can be skipped over when listing the archive contents or preparing a list of files to add or extract. If compression is interrupted or an error occurs, then the temporary header is not updated. If zpaq encounters a temporary header then it assumes that any data following it is corrupted and ignores it during extraction or listing, and overwrites it during the next update.
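A minimal sketch of this commit protocol, using an in-memory file. The 4-byte size header and the 0xFFFFFFFF "uncommitted" marker are hypothetical stand-ins for zpaq's real block format; only the pattern (append a temporary header, write the data, go back and patch the header, ignore and later overwrite anything after an unpatched header) is taken from the description above.

```python
import io
import struct

TEMP = 0xFFFFFFFF  # hypothetical marker meaning "update not committed"

def committed_end(archive):
    """Offset of the first temporary header, or end of file if none."""
    archive.seek(0)
    while True:
        pos = archive.tell()
        header = archive.read(4)
        if len(header) < 4:
            return pos
        (size,) = struct.unpack("<I", header)
        if size == TEMP:
            return pos
        archive.seek(size, io.SEEK_CUR)      # skip committed data unread

def append_update(archive, payload):
    start = committed_end(archive)
    archive.seek(start)
    archive.truncate()                        # overwrite any uncommitted tail
    archive.write(struct.pack("<I", TEMP))    # 1. temporary header
    archive.write(payload)                    # 2. compressed data and index
    archive.seek(start)
    archive.write(struct.pack("<I", len(payload)))  # 3. commit: store the size

def list_updates(archive):
    """Committed payloads only; anything after a temporary header is ignored."""
    archive.seek(0)
    updates = []
    while True:
        header = archive.read(4)
        if len(header) < 4:
            break
        (size,) = struct.unpack("<I", header)
        if size == TEMP:
            break
        updates.append(archive.read(size))
    return updates

arc = io.BytesIO()
append_update(arc, b"version 1")
arc.seek(0, io.SEEK_END)
arc.write(struct.pack("<I", TEMP) + b"partial")   # simulate an interrupted update
print(list_updates(arc))                           # [b'version 1']
append_update(arc, b"version 2")
print(list_updates(arc))                           # [b'version 1', b'version 2']
```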

zpaq also has features to summarize the contents of archives containing millions of files, show update history and version dates, and compare and extract individual files and directories and rename them. Archives can be encrypted.

The deduplication algorithm uses a rolling hash of the input that depends on the last 32 bytes that are not predicted in an order-1 context. Missed predictions (from a 256 byte table) are counted as a heuristic to guess whether a block can be compressed. If not, then it is stored without compression as a speed optimization. There are 4 compression levels (-method 1 through 4). The threshold for compressing a block is 1/16, 1/32, 1/64, and 1/128 of bytes predicted by the order 1 model, respectively. Like earlier versions of zpaq, it also accepts configuration files and external preprocessors. These are always compressed.
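The fragmentation and compressibility check can be sketched as below. It follows the description (an order-1 prediction table, a hash that depends on the last 32 mispredicted bytes, a boundary roughly every 64 KB, and a per-method threshold on predicted bytes), but the multiplier constants, the boundary test `h < 2**(32 - bits)`, the table reset at each boundary, and the function names are assumptions rather than zpaq's exact code. Because the miss multiplier is twice an odd number, each mispredicted byte shifts older contributions up by one bit, so after 32 misses they fall out of the 32-bit hash.

```python
MASK = 0xFFFFFFFF
HIT_MULT = 314159265    # odd: a predicted byte keeps older bytes in the hash
MISS_MULT = 271828182   # 2 x odd: each miss shifts older bytes up one bit

def fragment(data, bits=16):
    """Split data at content-dependent boundaries (about one per 2**bits
    bytes). Returns a list of (fragment, hits) pairs, where hits counts
    bytes correctly predicted by the order-1 table."""
    o1 = [0] * 256   # o1[c1] = byte last seen after context byte c1
    h = c1 = hits = start = 0
    out = []
    for i, c in enumerate(data):
        if o1[c1] == c:
            h = (h + c + 1) * HIT_MULT & MASK
            hits += 1
        else:
            h = (h + c + 1) * MISS_MULT & MASK
        o1[c1] = c
        c1 = c
        if h < 1 << (32 - bits):             # boundary
            out.append((data[start:i + 1], hits))
            start, h, hits = i + 1, 0, 0
            o1 = [0] * 256
    if start < len(data):
        out.append((data[start:], hits))
    return out

THRESHOLD = {1: 1 / 16, 2: 1 / 32, 3: 1 / 64, 4: 1 / 128}

def should_compress(size, hits, method):
    """Store the block unless enough bytes were predicted for this method."""
    return hits >= size * THRESHOLD[method]
```

For random data the order-1 table guesses right only about 1 time in 256, which is below even the 1/128 threshold of method 4, so such blocks are stored.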

The journaling format is not compatible with zpaq versions prior to 6.00. Older versions would decompress a journaling archive to a set of jDC* files that could in theory reconstruct the data. To support older versions, there are three additional modes: streaming, solid, and tiny. In streaming mode, each file is compressed in parallel in a separate block, and large files are split into 16 MB blocks. In solid mode, all files are compressed to a single block in a single thread. Tiny mode is like solid mode except that comments (uncompressed sizes), checksums, and header locator tags (for error recovery) are not stored, saving a few bytes each. None of these modes supports journaling, incremental backup, or deduplication, and none saves file attributes or empty directories. An update appends to an archive without checking whether the files have been added before.

There are 4 built in methods. Method 1 is equivalent to "lazy" level 3. It is LZ77 using variable length codes to represent the lengths of literal byte strings or the length and offset of matches to earlier occurrences of the same string in a 16 MB output block. Matches are found by indexing a hash of the next 4 bytes in the input buffer into a table of size 4M which is grouped into 512K buckets of 8 pointers each. The longest match is coded, provided the length is at least 4, or 5 if the offset is greater than 64K and the last output was a literal. Ties are broken by favoring the smaller offset. Bucket elements are selected for replacement using the low 3 bits of the output count.
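The bucketed match search can be sketched as follows. Assumptions: the multiplicative hash, the sparse dict used in place of a preallocated 4M-entry table, the greedy parser, and the token format are illustrative choices, and only positions where a search starts are inserted into the table. The bucket size, minimum lengths, tie-break, and slot-replacement rule follow the paragraph above.

```python
HASH_BITS = 19        # 512K buckets
BUCKET = 8            # 8 pointers each, 4M entries in total

def hash4(buf, i):
    """Hash of the 4 bytes at buf[i:i+4] (multiplicative hash, assumed)."""
    x = int.from_bytes(buf[i:i + 4], "little")
    return (x * 2654435761 & 0xFFFFFFFF) >> (32 - HASH_BITS)

def find_match(buf, i, table, out_count, last_was_literal):
    """Longest match for buf[i:] among the bucket's pointers, as
    (length, offset), or (0, 0) if it is too short to code."""
    if i + 4 > len(buf):
        return 0, 0
    bucket = table.setdefault(hash4(buf, i), [None] * BUCKET)
    best_len = best_off = 0
    for p in bucket:
        if p is None:
            continue
        n = 0
        while i + n < len(buf) and buf[p + n] == buf[i + n]:
            n += 1
        off = i - p
        if n > best_len or (n == best_len and n and off < best_off):
            best_len, best_off = n, off          # ties favor the smaller offset
    bucket[out_count & (BUCKET - 1)] = i         # low 3 bits choose the slot
    need = 5 if best_off > 65536 and last_was_literal else 4
    return (best_len, best_off) if best_len >= need else (0, 0)

def lz77_parse(buf):
    """Greedy parse into ('L', byte) and ('M', length, offset) tokens."""
    table, tokens, i, last_lit = {}, [], 0, True
    while i < len(buf):
        n, off = find_match(buf, i, table, len(tokens), last_lit)
        if n:
            tokens.append(("M", n, off))
            i += n
        else:
            tokens.append(("L", buf[i]))
            i += 1
        last_lit = not n
    return tokens

def lz77_unparse(tokens):
    out = bytearray()
    for t in tokens:
        if t[0] == "L":
            out.append(t[1])
        else:
            for _ in range(t[1]):                # byte-wise copy handles overlap
                out.append(out[-t[2]])
    return bytes(out)
```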

Literal lengths are coded using "marked binary" Elias gamma codes: the leading 1 bit of the number is dropped, a 1 bit is inserted in front of each of the remaining bits, and a 0 marks the end. For example, 1100 is coded as 1,1,1,0,1,0,0. Matches are coded as a length and an offset. The length is at least 4, and all but its last 2 bits are coded as a marked binary. The number of offset bits is given in the first 5 bits of the code. If the code starts with 00, then a literal length and a string of literal bytes follow. Otherwise the 5 bits code a number from 0 to 23, and that number of bits, with an implied leading 1, gives the offset.

The codes are not compressed further. They are stored in the ZPAQ level 2 format, consisting of a sequence of sub-blocks each preceded by a 4 byte header giving the sub-block size.
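A short sketch of the marked binary code used for the literal length field (the function names are hypothetical). For a value n it uses 2*floor(log2 n) + 1 bits, the same length as an Elias gamma code, and the decoder simply reads (1, bit) pairs until it sees the terminating 0.

```python
def marked_binary(n):
    """Code n >= 1: drop the leading 1 bit, put a 1 before each remaining
    bit, and end with a 0."""
    if n < 1:
        raise ValueError("n must be at least 1")
    bits = []
    for b in bin(n)[3:]:          # bin(12) == '0b1100', so this is '100'
        bits += [1, int(b)]
    return bits + [0]

def read_marked_binary(bits, pos=0):
    """Decode one value starting at bits[pos]; return (value, next_pos)."""
    n = 1
    while bits[pos] == 1:
        n = 2 * n + bits[pos + 1]
        pos += 2
    return n, pos + 1

print(marked_binary(12))   # [1, 1, 1, 0, 1, 0, 0]
```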

Method 2 is also LZ77, but the codes are byte aligned and context modeled rather than coded directly. It also searches 4 order-7 and 4 order-4 context hashes for matches, rather than 8 order-4 hashes like method 1. Method 2 first codes the input as follows, according to the high 2 bits of the first byte of each code:

  00 = literal of length 1..64, followed by uncompressed bytes.
  01 = match of length 4..11 and offset 1..2048.
  10 = match of length 1..64 and offset of 1..65536.
  11 = match of length 1..64 and offset of 1..16777216.
These codes are arithmetic coded using an indirect context model. The context depends on the parse state and in the case of literals, on the previous byte. An indirect context model maps a context into a bit history (represented as an 8 bit state) and then to a bit prediction. The model is updated by adjusting the prediction to reduce the error by 0.1%. A bit history represents a bounded pair of bit counts (n0,n1) and the value of the most recent bit. The bounds for (n0,n1) and (n1,n0) are (20,0), (48,1), (15,2), (8,3), (6,4), (5,5).
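A simplified indirect context model in Python. Assumptions, not zpaq's implementation: the bit history is kept as an explicit (n0, n1, last bit) tuple rather than an 8-bit state; when one count grows, the opposite count is discounted (halved, plus 1, when above 2) before the pair is clamped to the bounds listed above; and a history's first prediction is initialized to (n1 + 0.5) / (n0 + n1 + 1). The 1/1024 adaptation rate corresponds to the roughly 0.1% update described above.

```python
LIMIT = {0: 20, 1: 48, 2: 15, 3: 8, 4: 6, 5: 5}  # smaller count -> max larger count

def next_history(h, y):
    """h = (n0, n1, last_bit). Count bit y, discount the other count,
    then clamp the pair to the bounds above (a simplification)."""
    n = [h[0], h[1]]
    n[y] += 1
    if n[1 - y] > 2:
        n[1 - y] = n[1 - y] // 2 + 1
    big = n.index(max(n))
    n[big] = min(n[big], LIMIT[min(min(n), 5)])
    return (n[0], n[1], y)

class ICM:
    """Context -> bit history -> adaptive probability."""
    def __init__(self, rate=1 / 1024):
        self.hist = {}   # context -> bit history
        self.prob = {}   # bit history -> P(bit = 1)
        self.rate = rate

    def predict(self, ctx):
        h = self.hist.get(ctx, (0, 0, 0))
        if h not in self.prob:                   # initialize from the counts
            self.prob[h] = (h[1] + 0.5) / (h[0] + h[1] + 1)
        return self.prob[h]

    def update(self, ctx, y):
        h = self.hist.get(ctx, (0, 0, 0))
        p = self.predict(ctx)
        self.prob[h] = p + (y - p) * self.rate   # reduce the error by ~0.1%
        self.hist[ctx] = next_history(h, y)
```

Because predictions are attached to histories rather than to contexts, a context seen only a few times still gets a prediction learned from every other context that produced the same history.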

Method 3 uses a Burrows-Wheeler transform (BWT) using libdivsufsort-lite v2.0. This is equivalent to -m2 in older zpaq versions. The input bytes are sorted by their right contexts and compressed using an order 0-1 ICM-ISSE chain. The order 0 ICM (indirect context model) works as in method 2, taking only the previous bits of the current byte (MSB first) as context. The prediction is adjusted by an order-1 indirect secondary symbol estimator (ISSE). An ISSE maps its context (the previous byte and the leading bits of the current byte) to a bit history, and the history selects a pair of mixing weights to compute the weighted average of the constant 1 and the ICM output in the logistic domain, log(p/(1-p)). The output is converted back to linear, and the two weights are updated to reduce the prediction error in favor of the better model. In other words, the output is:

  p' := 1/(1 + exp(-w1*1 - w2*log(p/(1-p))))
and after the bit is arithmetic coded, the weights w1 and w2 are updated:
  w1 := w1 + 1            * 0.001 * (bit - p')
  w2 := w2 + log(p/(1-p)) * 0.001 * (bit - p')
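The ISSE formulas above translate directly into code. This sketch keeps one weight pair per bit history in a dictionary; the starting weights (0, 1), which make a new pair pass the input prediction through unchanged, are an assumption.

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class ISSE:
    """One (w1, w2) weight pair per bit history; mixes the constant 1 with
    stretch(p) of the previous component's prediction p."""
    def __init__(self, rate=0.001):
        self.weights = {}
        self.rate = rate

    def predict(self, history, p):
        self.history = history
        self.st = stretch(p)
        w1, w2 = self.weights.get(history, (0.0, 1.0))   # pass-through at start
        self.p_out = squash(w1 * 1 + w2 * self.st)
        return self.p_out

    def update(self, bit):
        w1, w2 = self.weights.get(self.history, (0.0, 1.0))
        err = (bit - self.p_out) * self.rate
        self.weights[self.history] = (w1 + 1 * err, w2 + self.st * err)
```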

Method 4 is equivalent to mid.cfg or -m3 in older zpaq versions. It directly models the data using an order 0-5 ICM-ISSE chain, an order 7 match model, and an order 1 mixer which produces the bit prediction by mixing the predictions of all other components. The 6 components in the chain each mix the next lower order prediction using a hash of the next higher order context to select a bit history for that context, which selects the mixing weights. A match model has a 16 MB history buffer and a 4M hash table of the previous occurrence of the current context. If a match is found, it predicts the bit that followed the match with probability 1 - 1/(length in bits). The outputs of all 7 models are then mixed as with an ISSE except with a vector of 7 weights selected by an order 1 (16 bit) context, and with a faster weight update rate of about 0.01.
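The final mixer can be sketched the same way, generalized to a vector of weights selected by a context. The 0.01 learning rate follows the text; the initial weights of 1/n and the class interface are assumptions, and the match model and ICM-ISSE chain that would supply the input probabilities are omitted.

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class Mixer:
    """Mix n predictions in the logistic domain with a weight vector
    chosen by a small context (e.g. an order-1, 16-bit context)."""
    def __init__(self, n, rate=0.01):
        self.n, self.rate = n, rate
        self.weights = {}

    def mix(self, probs, ctx):
        self.x = [stretch(p) for p in probs]
        self.w = self.weights.setdefault(ctx, [1.0 / self.n] * self.n)
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        err = (bit - self.p) * self.rate
        for i, x in enumerate(self.x):
            self.w[i] += x * err      # gradient step on coding cost
```

An input that always predicts 0.5 has a stretched value of 0, so its weight never moves, while inputs that are usually right gain weight.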

With method 4 you can give an argument like "-method 4 1" to double the memory allocated to the components to improve compression. The same extra memory is needed to decompress. The default is 111 MB per thread. An argument n multiplies memory usage by 2^n. n can be negative.

Methods 1, 2, and 3 only work in journaling and streaming mode, since they have a 16 MB block size limit. Method 4 and configuration files work in all modes.

The following tests are on a 2.0 GHz T3200 with 2 cores. zpaq will automatically detect the number of cores and use the same number of compression or decompression threads, although this can be overridden.

             Compression                 Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------     ------------             ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
zpaq 6.12   -method 1                37,397,857  328,974,375    104,067 s  329,078,442     93    53  152 LZ77  26
            -method 1 -streaming     37,359,931  328,618,875    104,067 s  328,722,942     85    28  151 LZ77  26
            -method 2                31,765,035  281,184,939    104,067 s  281,289,006    196   108  153 LZ77  26
            -method 2 -streaming     31,730,884                                           218   126  151 LZ77  26
            -method 3                23,341,562  203,365,453    104,067 s  203,469,520    429   369  238 BWT   26
            -method 3 -streaming     23,328,888                                           425   375  238 BWT   26
            -method 4                21,768,810                                          1403  1371  299 CM    26
            -method 4 -streaming     21,744,770                                          1403  1356  299 CM    26
            -method 4 -solid         20,941,591                                          2036  2056  109 CM    26
            -method 4 1 -solid       20,740,920                                          2338  2197  216 CM    26
            -method 4 4 -solid       20,581,270                                          2356  2289 1482 CM    26
            -method 4 4 -tiny        20,581,208  173,028,477    104,067 s  173,132,544   2107  2230 1654 CM    26

zpaq v6.19, Jan. 23, 2013, moves the -solid and -tiny modes into a separate program, zpaqd, and eliminates -streaming. It adds 5 more compression levels (0 through 9). Method 5 is max.cfg, a 22 component CM with some of the component sizes reduced to use about 225 MB per thread. Methods 6 through 9 each double the memory size (450 MB to 1.8 GB) and block size (32 MB to 256 MB). All levels except 0 (store uncompressed) have an E8E9 pre/post-processor. Methods 0 through 4 are unchanged.

             Compression                 Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------     ------------             ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
zpaq 6.19   -method 0 -threads 2    100,050,464                                            37    42  169 copy  26
            -method 1 -threads 2     37,398,697                                           143    61  225 LZ77  26
            -method 2 -threads 2     31,766,023                                           294   185  225 LZ77  26
            -method 3 -threads 2     23,342,327                                           635   548  322 BWT   26
            -method 4 -threads 2     21,770,084                                          1319  1331  378 CM    26
            -method 5 -threads 2     20,491,832                                          3778  3773  563 CM    26
            -method 6 -threads 2     19,901,321                                          4446  4615  991 CM    26
            -method 7 -threads 2     19,497,869                                          4625  4711 1845 CM    26
            -method 8 -threads 1     19,038,853  164,475,887     95,914 s  164,571,801   6153  6296 1911 CM    26
            -method 8 -threads 2     19,038,853                                          3553  3551 3800 CM    48
            -method 9 -threads 1     19,004,217  161,001,056     95,914 s  161,096,970   3468  3521 3800 CM    48

zpaq v6.34 has 7 compression methods as follows:

Methods 0 and 1 use 16 MB blocks by default. Methods 2..6 use 64 MB blocks. The size can be specified by a second digit N, which specifies 2^N MB blocks. Thus, the defaults are 04, 14, 26, 36, 46, 56, 66. Larger blocks compress better but require more memory per thread.

Methods 1..6 use heuristics to detect already compressed data and either store it or compress it with a fast method like 1 depending on the degree of compressibility. The heuristic depends on the 256 byte order-1 prediction table that is used to compute the rolling hash used in the fragmentation algorithm. The table is initialized to all zeros at each fragment boundary, and contains the last byte seen in each of 256 possible 1 byte contexts. If the data is random, then at each fragment boundary (average size 64K), the following properties are expected:

A compressibility statistic is calculated for each test, and the highest (least random) is used. When packing fragments into blocks, if the previous fragments are detected as random and a new file is started, then the block is passed to the compressor when it is 1/8, 1/4, or 1/2 full, depending on the total compressibility. Otherwise the block is passed to the compressor only when it is at least 3/4 full and there is no room for the next file, assuming no deduplication.

In addition, the order 1 tables are used to detect text and x86 (.exe) data types. Text is detected if the number of letter, digit, period, or comma contexts that predict a space, minus the number of predicted characters in the range 1..8, 11, 12, or 14..31 (which normally do not appear in text files), is at least 5. If at least 1/4 of the fragments are detected as text, then methods 5 and 6 add extra models for it. x86 is detected if at least 5 contexts predict a 139 (an x86 MOV reg, r/m instruction). If at least 1/8 of the fragments are detected as x86, then an E8E9 pre/post-processor is used in methods 1..6.
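The two detection rules can be expressed over the order-1 table as below. Reading the text rule as "space predictions minus control-character predictions must be at least 5" is this sketch's interpretation; the function names are hypothetical, and the per-fragment voting (1/4 of fragments for text, 1/8 for x86) is left out.

```python
def order1_table(data):
    """o1[c] = the byte that most recently followed byte c (0 if none)."""
    o1 = [0] * 256
    c1 = 0
    for c in data:
        o1[c1] = c
        c1 = c
    return o1

TEXT_CONTEXTS = [c for c in range(128) if chr(c).isalnum()] + [ord("."), ord(",")]
CONTROL = set(range(1, 9)) | {11, 12} | set(range(14, 32))

def looks_like_text(o1):
    spaces = sum(1 for c in TEXT_CONTEXTS if o1[c] == ord(" "))
    controls = sum(1 for c in range(256) if o1[c] in CONTROL)
    return spaces - controls >= 5

def looks_like_x86(o1):
    # 139 = 0x8B, the opcode of MOV reg, r/m
    return sum(1 for c in range(256) if o1[c] == 139) >= 5
```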

LZ77 and BWT removed the 16 MB block size limitation of the previous version. Variable length LZ77 adds an extra field of rb = 1..8 bits to represent the low bits of an offset up to 32 bits, where rb increases by 1 for each doubling of the block size over 16 MB. 2^rb - 1 is added to the offset, so that it requires an rb..rb+23 bit code.
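The offset arithmetic can be checked with a few lines of Python. The rule used here for computing rb from the block size (0 up to 16 MB, then one more bit per doubling, capped at 8) is this sketch's reading of the paragraph above, and the function names are hypothetical.

```python
BASE = 1 << 24   # 16 MB

def extra_bits(block_size):
    """rb: 0 for blocks up to 16 MB, then +1 per doubling (at most 8)."""
    rb = 0
    while rb < 8 and BASE << rb < block_size:
        rb += 1
    return rb

def offset_code_bits(offset, rb):
    """Explicit bits needed after the implied leading 1, once 2**rb - 1
    has been added to the offset."""
    return (offset + (1 << rb) - 1).bit_length() - 1
```

For offsets from 1 up to just under 16 MB * 2^rb this gives between rb and rb + 23 bits, matching the stated code length.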

Byte aligned LZ77 removed the limitation by eliminating the short code (3 bit length and 11 bit offset) and adding a code with 4 offset bytes. Lengths range from m..m+63, where m is the minimum match length, normally 8 when used with an order-1 context model.

BWT removes the block size limitation by removing the IBWT optimization of packing pointers and the byte pointed to into a single 32 bit linked list element when the block size is over 16 MB. No changes were required for higher compression levels.
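A small round-trip sketch of the transform and of a packed inverse. Assumptions: a naive suffix sort stands in for libdivsufsort, the end of the block is treated as an implicit smallest symbol whose row is returned as `idx`, and each inverse-table element packs a 24-bit row pointer with the byte for that row into one 32-bit value, which is why this packed layout only covers blocks under 16 MB.

```python
def bwt(data):
    """Naive BWT: sort all suffixes (the empty suffix, i.e. the end of the
    block, sorts first) and output the byte before each one. Returns the
    transformed bytes and idx, the row of the whole block, whose preceding
    symbol is the implicit end marker and is not output."""
    n = len(data)
    rows = sorted(range(n + 1), key=lambda i: data[i:])
    out, idx = bytearray(), 0
    for k, i in enumerate(rows):
        if i == 0:
            idx = k
        else:
            out.append(data[i - 1])
    return bytes(out), idx

def ibwt(out, idx):
    """Inverse BWT walking a linked list of packed elements: the high 24
    bits hold the next row, the low 8 bits hold a byte of the output."""
    n = len(out)
    assert n < 1 << 24, "24-bit pointers limit the block to 16 MB"
    last = list(out[:idx]) + [-1] + list(out[idx:])      # -1 = end marker
    order = sorted(range(n + 1), key=lambda k: last[k])  # stable sort
    packed = [(k << 8) | (last[k] & 0xFF) for k in order]
    res, p = bytearray(), idx
    for _ in range(n):
        e = packed[p]
        p = e >> 8               # row of the next suffix
        res.append(e & 0xFF)     # first byte of the current suffix, same element
    return bytes(res)

print(bwt(b"banana"))   # (b'annbaa', 4)
```

Storing the byte next to the pointer means each output byte costs one memory access instead of two, which matters because the inverse transform jumps randomly through a large table.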

zpaq versions since v6.22 support custom context models through the command line. When compressing enwik8 and enwik9 the following models are automatically generated:

Option    Equivalent
------    ----------
  -m 0    -m x4,0
  -m 1    -m x4,1,4,0,3,24,16,18
  -m 18   -m x8,1,4,0,3,27,16,18
  -m 2    -m x6,1,4,8,4,26,16,18
  -m 28   -m x8,1,4,8,4,27,16,18
  -m 3    -m x6,2,8,0,4,26,16,24c0,0,511
  -m 38   -m x8,2,8,0,4,26,16,24c0,0,511
  -m 4    -m x6,3ci1
  -m 48   -m x8,3ci1
  -m 5    -m x6,0ci1,1,1,1,2awm
  -m 58   -m x8,0ci1,1,1,1,2awm
  -m 6    -m x6,0w2c0,1010,255i1c256ci1,1,1,1,1,1,2ac0,2,0,255i1c0,3,0,0,255i1c0,4,0,0,0,255i1mm16ts19t0
  -m 68   -m x8,0w2c0,1010,255i1c256ci1,1,1,1,1,1,2ac0,2,0,255i1c0,3,0,0,255i1c0,4,0,0,0,255i1mm16ts19t0

The meaning is as follows.

x (experimental) rather than a digit selects a specific method which is the same for every block. It can also be s to add in streaming mode with each file in a separate block and large files split into blocks with no deduplication.

The first digit N1 after x selects a maximum block size of 2^(N1+20) - 4096 bytes. This is selected by the second digit of the method, if present, or else it defaults to 6 for methods 2..6 or 4 otherwise.

The second digit N2 selects the pre/post processing step. 0 means none. 1 means LZ77 with variable length codes. 2 means LZ77 with byte aligned codes. 3 means BWT. 4..7 means 0..3 with E8E9 filtering.

N3..N8 apply to the LZ77 modes only. N3 (4 or 8) is the minimum match length. N4 (8 or 0), if not 0, specifies a context order to search first. N5 (3 or 4) says to search 2^N5 contexts of each order to look for matches. N6 (24..27) specifies 2^N6 elements in the hash table for lookups. Each entry requires 4 bytes of memory. It defaults to the block size up to N1=26, then N1-1. N7 and N8 specify that the minimum match (N3) should be increased by 1 after a literal or match, respectively, when the match offset is greater than 2^N7 or 2^N8, respectively.
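As a worked example of this numbering, the sketch below decodes the leading digits of an x method string into the sizes they imply. It is a hypothetical helper, not part of zpaq: it reads only N1..N8, ignores any model letters that follow, and does not apply the default for N6.

```python
import re

PREPROCESS = ["none", "LZ77, variable length codes", "LZ77, byte aligned codes", "BWT"]

def describe_method(spec):
    """Decode N1..N8 from a method string such as 'x6,1,4,8,4,26,16,18'."""
    m = re.match(r"x([0-9,]*)", spec)
    if not m:
        raise ValueError("expected a method string starting with x")
    nums = [int(v) for v in m.group(1).split(",") if v]
    n1 = nums[0]
    n2 = nums[1] if len(nums) > 1 else 0
    info = {
        "block_bytes": (1 << (n1 + 20)) - 4096,
        "preprocess": PREPROCESS[n2 % 4],
        "e8e9": n2 >= 4,
    }
    if n2 % 4 in (1, 2) and len(nums) >= 8:      # LZ77 parameters
        n3, n4, n5, n6, n7, n8 = nums[2:8]
        info.update(
            min_match=n3,
            first_context_order=n4,
            contexts_searched=1 << n5,
            hash_entries=1 << n6,
            hash_table_bytes=4 << n6,               # 4 bytes per entry
            longer_min_after_literal_if_offset_over=1 << n7,
            longer_min_after_match_if_offset_over=1 << n8,
        )
    return info

print(describe_method("x6,1,4,8,4,26,16,18")["hash_table_bytes"])   # 268435456
```

So the -m 2 equivalent above uses 64 MB blocks (less 4 KB) and a 256 MB hash table.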

The sequence of strings starting with letters followed by a comma-separated list of numbers specifies various context models used by methods 3 and higher. c0 specifies an ICM (indirect context model: context to bit history to prediction). c1...c256 (used in -m 6) specifies a CM (context to prediction) with an update rate of 1/count and maximum count of N1*4-4, e.g. c256 specifies 1020. The remaining arguments to c default to 0. N2 describes any special contexts. N2 in 1..255 (e.g. c0,2) means offset mod N2. N2 in 1000..1255 means the distance to the last occurrence of N2-1000 (e.g. c0,1010 means how far from the last linefeed). N3 and up specify byte masks starting with the most recent context byte (e.g. c0,2,0,255 means offset mod 2 combined with the second context byte, a sparse model). A value of 256..511 includes the byte aligned LZ77 parse state if applicable (e.g. c0,0,511 means the order 1 context plus parse state hashed together).
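The direct CM component (the c1..c256 case) can be sketched as a table of probabilities with per-entry counts. The limit of N1*4-4 follows the text; using 1/(count + 1.5) instead of exactly 1/count, so that the first update does not push the probability all the way to 0 or 1, is an assumption of this sketch.

```python
class CM:
    """Direct context model: context -> (probability, count). Early updates
    move p quickly; the step shrinks as the count grows, until the count
    reaches its limit and the rate stays fixed."""
    def __init__(self, n1):
        self.limit = n1 * 4 - 4      # c256 -> 1020
        self.table = {}

    def predict(self, ctx):
        return self.table.get(ctx, (0.5, 0))[0]

    def update(self, ctx, y):
        p, n = self.table.get(ctx, (0.5, 0))
        p += (y - p) / (n + 1.5)
        self.table[ctx] = (p, min(n + 1, self.limit))
```

With this rate, k consecutive 1 bits starting from 0.5 give p = 1 - 0.25 / (k + 0.5), so the model tracks a stationary frequency until the count limit makes it adaptive again.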

i followed by a list specifies a chain of ISSE components, each increasing the context order by the specified amount by hashing it with the previous component's context (e.g. ci1,1,1,1,2 specifies an order 0 ICM chained with order 1, 2, 3, 4, and 6 ISSEs). Each ISSE (indirect secondary symbol estimator) adjusts the prediction of the previous component using the bit history of the current context (hashed together with the previous component's context).

a specifies a match model, which predicts the bit which followed the most recent occurrence of the current (normally high order) context. It can take parameters specifying buffer size, hash table index size and context order.

wN1 specifies a word model, an ICM-ISSE chain of increasing order from 0 to N1-1 in words rather than bytes. A word is defined as a sequence of letters converted to upper case, ignoring all other characters (e.g. w2 specifies an order 0 ICM and order 1 ISSE). It can take additional parameters specifying an alphabet range and a mask to convert case.
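One way to form these word contexts, as a hypothetical sketch: letters are folded to upper case and hashed into the current word, any other byte ends the word, and the order 1 context combines the current word hash with the previous word's. The hash constants are arbitrary choices.

```python
MASK = 0xFFFFFFFF

def word_hashes(data):
    """For each byte, return (order0, order1) word-context hashes.
    Letters are folded to upper case and hashed into the current word;
    any other byte ends the word, which becomes the previous word."""
    cur = prev = 0
    out = []
    for c in data:
        if 65 <= c <= 90 or 97 <= c <= 122:            # ASCII letter
            cur = (cur + (c & 0xDF) + 1) * 773 & MASK  # & 0xDF: upper case
        elif cur:
            prev, cur = cur, 0
        out.append((cur, (cur + prev * 271) & MASK))
    return out
```

Because case and punctuation are ignored, "The cat" and "the CAT" produce identical contexts.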

m specifies a mixer, which adaptively averages the predictions of all prior components. It can take a parameter (default 8), the number of bits of context used to select the mixing weights (e.g. m16 selects a byte-wise order 1 context). It takes additional parameters specifying the update rate.

t is a MIX2 2-input mixer which averages just the last 2 components.

s is an SSE which adjusts the previous prediction like an ISSE but using a direct context instead of a bit history. It takes parameters specifying the number of context bits (e.g. s19 selects the current and previous bytes and the 3 high bits of the second byte), and additional parameters specifying initial and final update rates.
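A sketch of an SSE stage in the style used by the paq family: each direct context owns a row of 33 probabilities spaced evenly in the stretched domain, the input prediction is interpolated between the two nearest entries, and both entries are nudged toward the coded bit. The row size, the stretch range of -8..8, and the 0.02 rate are assumptions, not the parameters zpaq uses.

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class SSE:
    """Refine a probability (strictly between 0 and 1) using a direct
    context: each context owns 33 entries placed at stretch = -8, -7.5,
    ..., 8; the input is interpolated between the two nearest entries,
    which are then trained toward the bit."""
    def __init__(self, rate=0.02):
        self.rows = {}
        self.rate = rate

    def predict(self, p, ctx):
        self.row = self.rows.setdefault(
            ctx, [squash((i - 16) / 2) for i in range(33)])
        pos = (min(max(stretch(p), -7.999), 7.999) + 8) * 2
        self.lo = int(pos)
        self.frac = pos - self.lo
        r = self.row
        return r[self.lo] * (1 - self.frac) + r[self.lo + 1] * self.frac

    def update(self, bit):
        r, lo, f = self.row, self.lo, self.frac
        r[lo] += (bit - r[lo]) * self.rate * (1 - f)
        r[lo + 1] += (bit - r[lo + 1]) * self.rate * f
```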

-m is short for -method. -th 1 (-threads 1) selects 1 thread. The default on the test machine is 4 (2 cores with 2 hyperthreads each). The option can also be used during decompression to reduce memory usage.

             Compression                 Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------     ------------             ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
zpaq 6.34   -m 1                     36,720,879  322,717,507                               38    15  456 LZ77  48
            -m 18 -th 1              36,174,283  316,439,766                               85    25 1200 LZ77  48
            -m 2                     32,785,291  287,047,166                               76    17 1500 LZ77  48
            -m 28 -th 1              32,123,217  279,231,899                              159    25 1200 LZ77  48
            -m 3                     30,759,444  270,317,562                               89    56 1500 LZ77  48
            -m 38 -th 1              30,216,795  264,333,006                              198   106 1200 LZ77  48
            -m 4                     21,982,505  189,860,169                              285   224 1800 BWT   48
            -m 48 -th 1              21,293,686  179,016,475                              596   512 1400 BWT   48
            -m 5                     20,742,462  179,365,293                              937   658 2100 CM    48
            -m 58 -th 1              20,214,879  172,645,399                             1931  1430 2400 CM    48
            -m 6                     19,627,225  168,583,236                             2348  2356 3300 CM    48
            -m 68 -th 1              18,998,601  160,541,121     118,086 s  160,659,207  4300  4408 3200 CM    48

The following table shows compression with the config file max5.cfg (Oct. 14, 2013). This is the same model as max_enwik9.cfg except that it was modified to take an argument that doubles memory usage for most of the components for each increment. With argument 0, it is the same as max_enwik9. Compression was with zpaqd 6.33 (June 20, 2013), which is the development tool that accompanies zpaq and produces streaming mode archives from a config file. Thus, the command "zpaqd c max5 3 archive enwik9" compresses to archive.zpaq with 3 passed to $1 in max5.cfg. This has the effect of using almost 8 times as much memory for both compression and decompression as max_enwik9. The archive was decompressed with both zpaq 6.42 (Sept. 26, 2013) and with tiny_unzpaq (Mar. 21, 2012, public domain) compiled with g++ 4.1.2 -O3 under Linux on the test machine, which has 20 GB of available memory. zpaq 6.42 is an archiver like zpaq 6.33 with a number of added features and bug fixes unrelated to compression. tiny_unzpaq is a stand-alone program that extracts only streaming mode archives and is designed so that the source code is as small as possible. It does not support JIT compilation of the ZPAQL code or multithreading, and it has no error checking or help message. It takes an archive as an argument with no options and extracts to the saved names.

max6.cfg (Oct. 15, 2013) modifies max5 by rewriting the word model and adding models that count brackets ("[" minus "]" in range 0..2) and a column model (counts bytes after the last linefeed in range 0..64). It also changes the memory parameter from $1 to $3 so it can be passed to zpaq like "-m s10.0.5fmax6". This means: choose streaming mode (s), a block size of 2^10 MB (10), no preprocessing (0), and pass 5 as $3, selecting 14 GB (1 would select 1.4 GB), using max6.cfg. For this test, tiny_unzpaq is used to extract when the decompresser is given as "sd", although either program could be used.

             Compression                 Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------     ------------             ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
zpaqd 6.33  max5 0                   18,238,448                                          5960        2000 CM   61
            max5 1                   18,135,013  146,750,019                             6309        3400 CM   61
            max5 2                   18,095,676  144,918,290                             6521        6600 CM   61
            max5 3                   18,084,027  143,757,714       4,760 sd 143,762,474  5894 13173 13100 CM   61
zpaq 6.42                                        143,757,714     125,670 s  143,883,384        5985 13500 CM   61
zpaq 6.42   -m s10.0.1fmax6          18,167,158  150,622,666     125,670 s  150,748,336  6368  6475  1400 CM   61
            -m s10.0.5fmax6          17,855,729  142,252,605       4,760 sd 142,257,365  6699 14739 14000 CM   61

zpaq 6.50, Mar. 21, 2014, uses 5 compression levels instead of 6. LZ77 when used in methods 2 and higher uses a suffix array to find matches. There are also other improvements in sorting files, grouping into blocks, detecting file type, detecting random data, and selecting compression algorithm based on type. Tests below used 4 threads.

             Compression                 Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------     ------------             ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
zpaq 6.50   -method 1                35,691,734  314,117,968     137,993 s 314,255,964     35    23  512 LZ77  48
            -method 2                31,184,422  271,626,606     137,993 s 271,764,602    150    24 1800 LZ77  48
            -method 3                21,980,366  189,875,990     137,993 s 190,013,986    222   220 1600 BWT   48
            -method 4                20,740,505  179,455,249     137,993 s 179,593,245    665   670 2200 CM    48
            -method 5                19,625,015  168,590,741     137,993 s 168,728,730   2410  2419 3400 CM    48

.1440 drt|lpaq9m

lpaq versions 1 through 8 may be downloaded here. lpaq9* can be downloaded here or as a zpaq archive. The decomp8 series of Hutter Prize entries (decompresser and enwik8 archive) are also listed here because they followed a period of development of the lpaq series.

Note: some of these programs are compressed with upack, which compresses better than upx. Some virus detectors give false alarms on all upack-compressed executables. The programs are not infected.

lpaq1 is a free, open source (GPL) file compressor by Matt Mahoney, July 24, 2007. It uses context mixing. It is a "lite" version of paq8l, about 35 times faster at the cost of about 10% in compression. The "9" option selects maximum memory. The options range from 0 (6 MB) to 9 (1.5 GB). Memory usage is 3 + 3*2^N MB, N = 0..9.
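The memory formula can be tabulated directly; the option 9 value agrees with the 1539 MB shown for lpaq1 in the table below. The function name is hypothetical.

```python
def lpaq1_memory_mb(option):
    """Memory used by lpaq1 for options 0..9: 3 + 3 * 2**N MB."""
    return 3 + 3 * 2 ** option

print([lpaq1_memory_mb(n) for n in range(10)])
# [6, 9, 15, 27, 51, 99, 195, 387, 771, 1539]
```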

The compressor mixes 7 contexts: orders 1, 2, 3, 4, 6, a unigram word context (consecutive letters, case insensitive), and a matched bit context. The contexts (except the matched bit) are mapped to nonstationary bit histories using nibble-aligned hash tables, then mapped to bit prediction probabilities using stationary adaptive tables with bit counts to control the adaptation rate. The matched bit context maps the predicted bit (based on a context match), match length, and order-1 context (or order 0 if no match) to a bit prediction. The probabilities are combined in the logistic domain (log(p/(1-p))) using a single layer neural network selected by a small context (3 high bits of last byte + context order), then passed through 2 SSE stages (orders 0 and 1) and arithmetic coded. Except for one model for ASCII text, there are no specialized models for binary data, .exe, .bmp, .jpeg, etc.

lpaq2 by Alexander Rhatushnyak, Sept. 20, 2007, contains some speed optimizations.

lprepaq 1.2 by Christian Schnaader, Sept. 29, 2007, is lpaq1 combined with precomp as a preprocessor. precomp compresses JPEG files and also expands data segments compressed with zlib, often making them more compressible. This preprocessing has no effect on text files.

lpaq3 and elpaq3 by Alexander Rhatushnyak, Sept. 29, 2007, are two versions built from the same source code. When compiled with -DWIKI, the result is elpaq3, which is tuned for large text files. The normal compile produces lpaq3.

lpaq3a by Alexander Rhatushnyak, Sept. 30, 2007, improves compression on some files over lpaq3 (but not enwik8/9). The archive also contains lpaq3e.exe, which is an archive-compatible Intel compile of elpaq3.exe.

lpaq4 and lpaq4e (mirror) are by Alexander Rhatushnyak, Oct. 1, 2007. lpaq4e is tuned for large text files.

lpaq5 and lpaq5e are by Alexander Rhatushnyak, Oct. 16, 2007. Option 9 selects 1542 MB memory. lpaq5e is tuned for large text files. It includes separate programs for compression only (lpaq5e-c.exe) and decompression only (lpaq5e-d.exe). Tests were done with these programs, rather than the version that does both (lpaq5e.exe).

lpaq6 and lpaq6e are by Alexander Rhatushnyak, Oct. 22, 2007. Option 9 selects 1542 MB memory. lpaq6e is tuned for large text files. lpaq6 includes an E8E9 transform for compressing x86 executables.

lpaq7 and lpaq7e (mirror) are by Alexander Rhatushnyak, Oct. 31, 2007.

lpaq8 and lpaq8e are by Alexander Rhatushnyak, Dec. 10, 2007. The executables are packed with upack. zip -9 would make them larger.

lpaq1a by Matt Mahoney, Dec. 21, 2007, uses the same model as lpaq1 but replaces the arithmetic coder with the asymmetric binary coder from fpaqb.

lpq1 by Matt Mahoney, Dec. 23, 2007, is an archiver (not a file compressor) based on lpaq1 option 7.

drt|lpaq9e is by Alexander Rhatushnyak, Feb. 20, 2008. It is specialized for English text. It includes a separate program drt.exe (without source code) which performs a dictionary transform prior to compression with lpaq9e. The option 9 applies to lpaq9e and selects maximum memory. The program size is computed by adding lpaq9e.exe, drt.exe, and the compressed dictionary, which must be uncompressed with lpaq9e before running. The size is smaller without a zip archive. Decompression consists of uncompressing the dictionary with lpaq9e, uncompressing the transformed file with lpaq9e, and reversing the transform with drt. Run times are for the sum of all three operations (1+62+2943, 1+2929+45 sec).

lpaq9f by Alexander Rhatushnyak, Apr. 27, 2008, works like lpaq9e. Run times are (2+55+2801, 2+2819+38 sec). drt uses 8 MB for compression and 4 MB for decompression.

lpaq9g by Alexander Rhatushnyak, May 23, 2008, works like lpaq9e. Run times are (2+51+2691, 2+2682+38 sec).

lpaq9h by Alexander Rhatushnyak, June 3, 2008, works like lpaq9e. Run times are (2+53+2530, 2+2529+44 sec).

lpaq9i by Alexander Rhatushnyak, June 13, 2008, works like lpaq9e. Run times are (2+59+2425, 2+2453+46 sec). drt.exe and the dictionary file (tmpdict0.dic) are unchanged in all versions starting with lpaq9f.

lpaq9j by Alexander Rhatushnyak, Aug. 17, 2008, has a new version of drt.exe and dictionary. Run times are (2+58+2365, 2+2358+48 sec).

lpaq9k is by Alexander Rhatushnyak, Sept. 30, 2008. Run times are (2+59+2336, 2+2346+47 sec). Decompresser size is the total of 3 files (not zipped).

lpaq9l is by Alexander Rhatushnyak, Dec. 2, 2008. Run times are (2+41+2132, 2+2179+40 sec) on the computer described in note 26, and (2+58+2338, 2+2422+50 sec) on the computer used to test all the earlier versions. Decompresser size is the total of 3 files (not zipped).

lpaq9m (zpaq archive) is by Alexander Rhatushnyak, Feb. 20, 2009. Run times are (2+38+2067, 2+2111+38 sec). Decompresser size is the total of 3 files (not zipped).

decomp8 is a Hutter Prize entry by Alexander Rhatushnyak, Mar. 23, 2009. It consists of a decompresser (Windows executable only) and an archive (archive8.bin) which decompresses to enwik8. There is no compressor. During decompression, the program creates a temporary file containing a dictionary similar to the one used in paq8hp12 and by drt. The command to decompress is "decomp8 archive8.bin enwik8". The total size (not zipped) is 15,986,677 bytes.

decomp8b is an update to the Hutter Prize entry decomp8 by Alexander Rhatushnyak, Apr. 22, 2009. Total size (not zipped) is 15,958,674 bytes.

decmprs8 is an update to the Hutter Prize entry decomp8b by Alexander Rhatushnyak, May 23, 2009. Total size (not zipped) is 15,949,688 bytes. The command to decompress is "decmprs8.exe archive8.dat enwik8".

Prog       Opt     enwik8      enwik9         prog       Total       Comp  Deco Mem  Alg Note
----       ---   ----------  -----------      ----     -----------   ----  ---- ---- --- ----
lpaq1       9    19,755,948  164,508,919      6,676 x  164,515,595   3646  3594 1539 CM
lpaq2       9    19,755,471  164,496,295      6,888 x  164,503,183   3260  3354 1539 CM
lprepaq 1.2 9    19,755,989  164,509,300    189,891 x  164,699,191   8696  7888 1582 CM
lpaq3       9    19,580,276  165,600,121      7,514 x  165,607,635   3695  3735 1542 CM
elpaq3      9    19,392,604  160,081,507      7,377 x  160,088,884   3411  3454 1542 CM
lpaq3a      9    19,585,951  165,661,890     12,004 s  165,673,894   4177  4163 1542 CM
lpaq3e      9    19,392,604  160,081,507     12,004 s  160,093,511   3967  3932 1542 CM
lpaq4       9    19,583,905  165,603,612      7,117 x  165,610,729   3693  3697 1542 CM
lpaq4e      9    19,358,662  159,675,213      6,990 x  159,682,203   3383  3422 1542 CM
lpaq5       9    19,455,395  161,410,276      8,382 x  161,418,658   3614  3630 1542 CM
lpaq5e      9    19,078,767  156,194,860      7,841 xd 156,202,701   3428  3605 1542 CM
lpaq6       9    19,562,861  165,224,012      8,848 x  165,232,860   3586  3624 1542 CM
lpaq6e      9    19,054,076  155,943,020      8,866 x  155,951,886   3420  3478 1542 CM
lpaq7       9    19,557,894  162,359,435      9,078 x  162,368,513   3922  3850 1542 CM
lpaq7e      9    19,039,516  155,840,757      8,570 x  155,849,327   3477  3490 1542 CM
lpaq8       9    19,523,803  161,987,713      9,676 x  161,997,389   3682  3718 1542 CM
lpaq8e      9    18,982,007  155,232,477      8,888 x  155,241,365   3424  3475 1542 CM
lpaq1a      9    19,759,778  164,547,926      8,558 x  164,556,484   3462  3423 1540 CM
lpq1             19,888,399  168,467,267      9,151 x  168,476,408   3389  3402  387 CM
drt|lpaq9e  9    18,151,024  145,628,635    110,844 x  145,739,479   3006  2975 1542 CM
drt|lpaq9f  9    18,079,247  144,877,844    110,864 x  144,988,708   2858  2859 1542 CM
drt|lpaq9g  9    18,069,107  144,838,636    110,318 x  144,948,954   2744  2722 1542 CM
drt|lpaq9h  9    18,067,711  144,763,248    110,376 x  144,873,624   2585  2575 1542 CM
drt|lpaq9i  9    18,065,347  144,752,858    110,149 x  144,863,007   2486  2501 1542 CM
drt|lpaq9j  9    18,056,997  144,687,646    110,135 x  144,797,781   2425  2408 1542 CM
drt|lpaq9k  9    18,007,677  144,277,379    110,785 x  144,388,164   2397  2395 1542 CM
drt|lpaq9l  9    17,979,724  144,082,479    110,479 x  144,192,958   2398  2474 1542 CM
drt|lpaq9l  9    17,979,724  144,082,479    110,479 x  144,192,958   2175  2221 1542 CM  26
drt|lpaq9m  9    17,964,751  143,943,759    110,579 x  144,054,338   2107  2151 1542 CM  26
drt|lpaq9m  9    17,964,751  143,943,759    110,579 x  144,054,338    868   896 1542 CM  41
decomp8          15,970,425                  16,252 xd                    78180  936 CM  26
decomp8b         15,942,290                  16,384 xd                    74790  934 CM  26
decmprs8         15,932,968                  16,720 xd                    76080  936 CM  26

drt may be combined with other compressors to improve compression. The following results were obtained using drt and tmpdict0.dic (from lpaq9i) with ppmonstr J (PPM). Option -m1650 selects 1650 MB memory. -r1 partially rebuilds the model when memory is exhausted. -o selects the PPM model order. Compression time is for ppmonstr only. Mem8 is the actual memory (MB) used to compress enwik8.drt; enwik9.drt always uses 1650 MB. As a separate compressor, the decompresser size would be 147,915 bytes for a zip file containing drt.exe, ppmonstr.exe, and tmpdict0.pmm (tmpdict0.dic compressed with ppmonstr -m1650 -r1 -o64). Total size would be 148,047,289 (using the best enwik9 result, -o11).

For drt 9j, the decompresser size is 149,468 and the total size is 148,008,619.

    Compressors          options         enwik8    enwik9       Comp Mem8
-------------------  ----------------  ----------  -----------  ---- ----
drt 9i | ppmonstr J  -m1650 -r1 -o10   18,185,633  147,936,682  2509  825
                     -m1650 -r1 -o11   18,166,961  147,899,374  2634  895
                     -m1650 -r1 -o12   18,152,982  147,907,628  2661  953
                     -m1650 -r1 -o16   18,142,625  148,306,179  2888 1109
                     -m1650 -r1 -o32   18,124,722  149,857,650  3361 1371
                     -m1650 -r1 -o64   18,122,785  151,343,426  3870 1554
                     -m1650 -r1 -o128  18,130,333                    1650
drt 9j | ppmonstr J  -m1650 -r1 -o11   18,165,440  147,859,151  2636
                     -m1650 -r1 -o64   18,120,770               2603

The following shows the effects of drt from lpaq9m on enwik8. The first numeric column is the compressed size of enwik8. The second is the compressed size of the uncompressed dictionary (lpqdict0.dic, 465,210 bytes) concatenated with enwik8.drt (61,289,634 bytes), using compressor versions that were current as of June 26, 2010 unless indicated. The ratio (dic+drt divided by enwik8) shows the improvement due to preprocessing; lower is better. The dictionary contains 44880 lowercase words. DRT replaces word occurrences with codes of 1 to 3 bytes and uses codes to indicate capitalized words or letters.

Compressor    enwik8   dic+drt   ratio   Options (version)
----------   -------   --------  ------  -----------------
paq8px_v67   18293940  17342041  0.9480  -6
paq8l        18518485  17560378  0.9483  -6
nanozip      18826931  18633832  0.9897  -cc (v0.08a)
lpaq9m       19072743  18077356  0.9478  8
zpaq         19448650  18928856  0.9733  ocmax.cfg
pmm          19701161  18650601  0.9467  (J)
lpaq1        19796957  18905483  0.9550
paq9a        20129573  19374291  0.9625
paq6         20303336  19439547  0.9575  -6
cmm4         20548514  19133313  0.9311  (v0.1e)
zpaq         20941558  19447733  0.9287  ocmid.cfg
nz           20948832  20588807  0.9828  (v0.08a)
bwt.fpaq0f2  21798843  21406906  0.9820
paq1         22156982  21437426  0.9675
bwt.fpaq0p   23809591  22855730  0.9599
grzip        23846878  22379326  0.9385  (0.2.4)
bbb          24576921  22701384  0.9237
zpaq         24837469  21559014  0.8680  ocfast.cfg
tarsalzp     25134862  22773386  0.9060
lzpxj        25251404  21877402  0.8664  8 (1.2h)
p6           25377998  23078246  0.9094
ctw          25453025  24454785  0.9608
7z           25895909  23487746  0.9070  (9.12b)
szip         26120472  24045552  0.9206  -b41 -o16
ppmd         26275353  23448205  0.8924  (J)
ppms         26310248  23824677  0.9055  (J)
dmc          28402672  25532850  0.8990  100000000
cabarc       28465607  25963613  0.9121  -m lzx:21
bzip2        29008758  25612712  0.8829  -9
sr2          30432506  26328768  0.8652
RAR          35107917  30132497  0.8583  -m5 (v2.50)
HA           36379137  30633820  0.8421  (0.98)
gzip         36445248  30902821  0.8479  -9 (1.3.5)
zip          36445470  30903043  0.8479  -9 (2.32)
lzop         41217688  33358696  0.8093  -9 (1.01)
srank        43091439  38492535  0.8933  -C8
fcm1         45402225  29581661  0.6515
compress     45763941  37478724  0.8190
lzrw3-a      48009194  38635335  0.8047
bpe          53906667  41403271  0.7681  5000 4096 200 3
fastlz       54658924  42337322  0.7746
lzrw2        55360907  41854974  0.7560
fpaq0f2      56916872  40415334  0.7101
flzp         57366279  43944882  0.7660
lzrw5        59375192  46019812  0.7751
lzrw1-a      59471657  43184084  0.7261
fpaq0p       61457810  44979267  0.7319
ppp          61657971  44103741  0.7153
fpaq0        63391013  47589951  0.7507
            100000000  61289634  0.6129  (uncompressed)
bwt         100000004  61289638  0.6129  (msufsort 3.1b)
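drt is closed source. The following is only a hypothetical sketch of the kind of transform described before the table: a lowercase dictionary, 1- to 3-byte codes, and a flag byte for a capitalized word. It assumes the dictionary is sorted by frequency, so earlier words get shorter codes. The byte layout is an assumption: 0x01 is the capital flag and code bytes start at 0x80. The sketch also assumes 7-bit ASCII input that contains no 0x01 bytes. Words in other case patterns, such as all capitals, pass through unchanged.

```python
import re

CAP = 0x01  # hypothetical flag byte: the next word starts with a capital


def code_for(i: int) -> bytes:
    """Map dictionary index i (0 = most frequent) to a 1- to 3-byte code.

    The lead byte tells the decoder the code length:
    0x80-0xBF -> 1 byte, 0xC0-0xDF -> 2 bytes, 0xE0-0xFF -> 3 bytes.
    """
    if i < 64:
        return bytes([0x80 + i])
    i -= 64
    if i < 32 * 256:
        return bytes([0xC0 + i // 256, i % 256])
    i -= 32 * 256
    if i < 32 * 65536:
        return bytes([0xE0 + i // 65536, (i // 256) % 256, i % 256])
    raise ValueError("dictionary too large for 3-byte codes")


def drt_encode(text: str, words: list) -> bytes:
    """Replace lowercase or Capitalized dictionary words with codes."""
    index = {w: i for i, w in enumerate(words)}
    out = bytearray()
    for token in re.findall(r"[A-Za-z]+|[^A-Za-z]+", text):
        low = token.lower()
        if low in index and token[1:] == low[1:]:
            if token[0] != low[0]:
                out.append(CAP)  # first letter was uppercase
            out += code_for(index[low])
        else:
            out += token.encode("ascii")  # other text passes through
    return bytes(out)


def drt_decode(data: bytes, words: list) -> str:
    """Invert drt_encode using the same dictionary."""
    out = []
    cap = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == CAP:
            cap, i = True, i + 1
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            if b < 0xC0:
                idx, n = b - 0x80, 1
            elif b < 0xE0:
                idx, n = 64 + (b - 0xC0) * 256 + data[i + 1], 2
            else:
                idx = 64 + 32 * 256 + (b - 0xE0) * 65536 + data[i + 1] * 256 + data[i + 2]
                n = 3
            word = words[idx]
            out.append(word.capitalize() if cap else word)
            cap, i = False, i + n
    return "".join(out)
```

After this transform, a downstream compressor sees one short symbol per common word instead of several letters. That is one plausible reason the ratio column above is below 1 for every compressor listed.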

.1500 nanozip

nanozip 0.01a is a free, experimental, closed source GUI and command line archiver by Sami Runsas, July 14, 2008. For these tests, the command line version (smaller executable) was used. It compresses using several algorithms (fastest to best): LZP (options -cf and -cF), LZ77 (-cd, -cD), BWT (-co, -cO, uses 5N block size) and CM (-cc). The uppercase options (-cF, -cD, -cO) compress better but more slowly than the corresponding lowercase options and may use more memory. The default compression mode is -co (fast BWT). -m1500m selects 1500 MB memory, although the reported memory usage may differ, and the actual memory usage (Cmem, Dmem, in MB) measured with Task Manager is usually lower than reported. When run, the program may use less memory than requested if available physical memory is limited; -forcemem was used to override this. For all tests, -nm was used to turn off checksums and not store timestamps or file permissions. For -cO, the program uses an LZ77 variant (called LZT) instead of BWT for binary files. -txt is an optimization for text files with -co or -cO.
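nanozip is closed source. The following only illustrates the general LZP idea behind its fastest modes; it is not nanozip's -cf or -cF algorithm. This simplified byte-at-a-time variant maps each order-3 context (a hypothetical choice) to the position that last followed it. It emits a hit marker (None) when the predicted byte is correct and the literal byte on a miss. A real LZP coder would encode match lengths, or the flags and literals, with an entropy coder rather than return a token list.

```python
def lzp_encode(data: bytes, order: int = 3) -> list:
    """Return tokens: None = predicted byte was right, int = literal byte."""
    table = {}  # order-byte context -> position of the byte that followed it
    tokens = []
    for i, b in enumerate(data):
        ctx = data[i - order:i] if i >= order else None
        pos = table.get(ctx) if ctx is not None else None
        if pos is not None and data[pos] == b:
            tokens.append(None)  # hit: the decoder can predict this byte
        else:
            tokens.append(b)     # miss (or no prediction): send the literal
        if ctx is not None:
            table[ctx] = i       # remember the newest follower of ctx
    return tokens


def lzp_decode(tokens: list, order: int = 3) -> bytes:
    """Rebuild the same table from already-decoded bytes."""
    table = {}
    out = bytearray()
    for i, tok in enumerate(tokens):
        ctx = bytes(out[i - order:i]) if i >= order else None
        out.append(out[table[ctx]] if tok is None else tok)
        if ctx is not None:
            table[ctx] = i
    return bytes(out)
```

The decoder rebuilds the identical table from bytes it has already output, so no table is transmitted. Both directions do the same small amount of work per byte.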

nanozip 0.03a was released July 31, 2008. Only -cc was tested.

nanozip 0.05a was released Oct. 20, 2008. Options are as in 0.01a and include -nm -forcemem.

nanozip 0.06a was released Feb. 13, 2009. Options are as in 0.01a and include -nm -forcemem. w32c creates a self extracting archive (.exe file).

nanozip 0.08a was released June 3, 2010. _64 refers to the Windows 64 bit version. w32c means to produce a self extracting archive. -nm means do not store metadata or redundancy information. -cc selects a context mixing model. -m2.6g means use 2.6 GB memory. enwik8 was tested with -m2g (uses 1670 MB).

nanozip 0.09a was released Nov. 4, 2011. Option w32c selects a self extracting archive, so the decompresser size is 0. Option -p4 runs multithreaded compression on 4 processors. Tested under 64 bit Linux.

Program       Options                enwik8      enwik9     zip size      Total     Comp  Deco  Cmem Dmem (reported) Alg  Note
--------    -----------            ----------  -----------  ---------  -----------  ----  ----  ---- ---- ---- ----  ---  ----
nz 0.01a    -cf                    46,381,713                                         24    24    96       404  404  LZP
            -cf -m1500m            46,381,713  417,351,980  266,797 x  417,618,777    26    31   975  978 1476 1476  LZP
            -cF                    40,733,125                                         62    43   155       404  404  LZP
            -cF -m1500m            40,733,125  359,192,720             359,459,517    63    40  1040 1045 1476 1476  LZP
            -cd                    33,241,150                                        127    28    89       422  402  LZ77
            -cd -m1500m            33,001,952  292,180,617             292,447,414   156    28   768  687 1546 1474  LZ77
            -cD                    29,384,997                                        288    27   282       466  258  LZ77
            -cD -m1500m            29,253,158  258,513,190             258,779,987   323    31  1020  693 1314  994  LZ77
            -co                    21,838,721                                        391   186   333       431  336  BWT
            -co -m1500m            20,503,629  176,470,974             176,737,771   448   221  1667 1160 1810 1294  BWT
            -co -m1500m -txt       20,503,629  170,711,387             170,978,184   336   234  1074 1120 1471 1463  BWT
            -cO                    21,623,801                                        465   247   333       431  266  BWT
            -cO -m1500m            20,306,489  174,770,662             175,037,459   511   269  1378 1135 1810 1294  BWT
            -cO -m1500m -txt       20,306,489  169,092,652             169,359,449   393   280  1074 1274 1471 1463  BWT
            -cO -m1670m -txt       20,306,489  167,509,921             167,776,718   403   284  1170 1325 1633 1625  BWT
            -cc                    18,994,349                                       2975  2910   360       436  435  CM
            -cc -m1500m            18,723,413  152,654,332             152,921,129  3147  3091  1556 1556 1524 1523  CM
nz 0.03a    -cc -m1670m            18,679,094  151,668,563  263,953 x  151,932,516  3058  3003  1700 1700 1700 1699  CM
nz 0.05a    -cf -m1670m            46,381,713                                         18    22   100                 LZP
            -cF -m1670m            40,608,638                                         66    41   164                 LZP
            -cd -m1670m            31,555,257                                         96    29   289                 LZ77
            -cD -m1670m            27,811,031                                        182    35   170                 LZ77
            -co -m1670m            20,499,411                                        351   177   626                 BWT
            -cO -m1670m            20,302,501                                        422   240   642                 BWT
            -cc -m1670m            18,638,419  151,176,555  288,449 x  151,465,004  3032  2975  1668                 CM
nz 0.06a    -co -m1670m            20,499,412                                        250   183   441                 BWT   26
            -cO -m1670m            20,302,502                                        300   243   457                 BWT   26
            -cc -m1670m            18,636,515  151,177,510  336,273 x  151,513,783  2143  2137  1670                 CM    26
            w32c -cc -m1670m       18,754,787  151,295,782        0 xd 151,295,782  2156  2173  1670                 CM    26
nz 0.08a_64 w32c -nm -cc -m2.6g    18,752,842  150,441,103        0 xd 150,441,103  1109  1086  2760                 CM    40
            -cc -m2g               18,623,317  150,375,385  459,607 x  150,834,992  1616        2088                 CM    42
nz 0.09a    w32c -cc -m3g -nm      18,723,846  150,037,341        0 xd 150,037,341  1110  1084  2693                 CM    40
            w32c -cc -m3g -nm -p4              158,107,738        0 xd 158,107,738   299        3124                 CM    40
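nanozip's -co and -cO modes are BWT-based. In the drt comparison table above, bwt (msufsort 3.1b) outputs exactly 4 bytes more than its input. That fits a Burrows-Wheeler transform, which only permutes the bytes and needs a small index to undo the permutation. The transform therefore shrinks nothing by itself and must be followed by a compressor. The sketch below is a naive illustration, not msufsort, which builds a suffix array far more efficiently. The 4-byte little-endian index layout is an assumption.

```python
import struct


def bwt(data: bytes) -> bytes:
    """Naive BWT: sort every rotation and keep the last column.

    Assumed output layout: 4-byte little-endian index of the row holding
    the original text, then the last column (same length as the input).
    """
    n = len(data)
    if n == 0:
        return struct.pack("<I", 0)
    rows = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in rows)
    return struct.pack("<I", rows.index(0)) + last


def inverse_bwt(blob: bytes) -> bytes:
    """Undo bwt() with the last-to-first mapping."""
    (row,) = struct.unpack("<I", blob[:4])
    last = blob[4:]
    # A stable sort of the last column yields the first column: entry j is
    # the position in `last` of the byte that starts row j, which is the
    # row whose rotation begins one byte later in the text.
    nxt = sorted(range(len(last)), key=lambda i: last[i])
    out = bytearray()
    for _ in range(len(last)):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)
```

Sorting rotations groups bytes that precede similar contexts, so the last column has long runs that a following model such as fpaq0p or a CM stage can exploit.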

.1512 xwrt

xml-wrt 2.0 is a free command line file compressor with source available, by Przemyslaw Skibinski, June 19, 2006. It uses LZMA (LZ77 + arithmetic coding) with preprocessing for modeling text, XML tags, dates, and numbers. It may also be used as a preprocessor for input to other compressors. Version 1.0 was strictly a preprocessor without built-in compression.

The -l6 option selects maximum LZMA compression. -b255 selects maximum buffer size of 255 MB for building a dynamic dictionary. -m255 selects maximum memory. -s turns off spaces modeling. -f8 sets the minimum word frequency for dictionary inclusion to 8 (default is 6).

xml-wrt 3.0 (Sept. 14, 2006) includes a stripped-down version of PAQ8 (-l11 option) in addition to LZMA compression.

xwrt 3.2 (Oct. 29, 2007) is a dictionary preprocessor frontend to LZMA, PPMVC and lpaq6 as well as a standalone preprocessor. Option -l14 selects lpaq6 option 9 (1542 MB). -b255 selects 255 MB memory (maximum) for building the dictionary. -m96 selects a 96 MB buffer during compression. (Higher values cause an out-of-memory error.) -s turns off space modeling. -e40000 limits the dictionary size to 40000 words. -f200 limits the dictionary to words that occur at least 200 times.

                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
xml-wrt 2.0  -l6 -b255 -m255 -s -f8           23,199,202  196,914,328     25,354 s  196,939,682    905    70  525 LZ77
xml-wrt 3.0  -l11 -b255 -m255 -f24            19,663,305  165,274,422     40,447 s  165,314,869   4398  4317  416 CM
xwrt 3.2     -l14 -b255 -m96 -s -e40000 -f200 18,679,742  151,171,364     52,569 s  151,223,933   2537  2328 1691 CM
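The xwrt 3.2 options -e40000 and -f200 bound the dictionary in two ways: by maximum size and by minimum word frequency. Below is a sketch of that selection rule. It assumes words are maximal runs of ASCII letters, counted case-insensitively and ranked by frequency; xwrt's actual tokenizer and ranking may differ.

```python
import re
from collections import Counter


def build_dictionary(text: str, min_freq: int, max_words: int) -> list:
    """Words seen at least min_freq times (like -f), capped at max_words
    (like -e), most frequent first; ties are broken alphabetically."""
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    eligible = [w for w, c in counts.items() if c >= min_freq]
    eligible.sort(key=lambda w: (-counts[w], w))
    return eligible[:max_words]
```

For example, build_dictionary("The cat and the dog and the bird", 2, 10) returns ["the", "and"]. The text below reports that the best -f value grows with input size (about -f180 for enwik8 versus -f1800 for enwik9 with xml-wrt 1.0). One plausible reason is that word counts scale roughly with input length, so a ten times larger file needs a roughly ten times higher threshold to keep a similar number of words.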

xml-wrt 2.0 and higher and xwrt 3.2 can be used as either a standalone compressor or as a preprocessor to other compressors. The table below shows the best known settings for enwik9 and enwik8 for xml-wrt 3.0 and 2.0 as a preprocessor to ppmonstr var. J, the best known combination for which xml-wrt improves compression. xml-wrt 1.0 is a preprocessor only. See also xml-wrt and xwrt as a standalone compressor.


                                                                         Compressed size      Decompresser  Total size   Time (ns/byte)
Program/options                                                         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
-------------------------------------------------------------------   ----------  -----------  -----------  -----------  ----- -----  --- ---
xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e20000    | ppmonstr J -m1650 -o10 18,592,499  150,004,636     82,466 sx 150,087,102   3067  2708 1650 PPM
xml-wrt 3.0 -l0 -b255 -m255 -3 -s -e7000     | ppmonstr J -m1650 -o64 18,494,374                  82,466 sx               3500  3340 1650 PPM
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e10000 | ppmonstr J -m1700 -o10 18,794,295  150,651,873     67,309 sx 150,719,182   2715 ~2650 1700 PPM
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e2300  | ppmonstr J -m1650 -o64 18,625,624                  67,309 sx               3550  3360 1650 PPM
xml-wrt 2.0 -l0 -w -s -c -b255 -m100 -e10000 | ppmonstr J -m800 -o8   18,863,790  154,223,582     67,309 sx 154,290,891   2820        800 PPM
xml-wrt 1.0 -f800                            | ppmonstr J -m800 -o8   19,043,178  154,749,585     56,837 sx 154,806,422   2702 ~2700  800 PPM

xml-wrt 1.0 (XML Word Reducing Transform) is a free command line single file preprocessor with source code by Przemyslaw Skibinski, May 10, 2006. It is not intended to compress files by itself (although it does shrink them somewhat). Rather, it is intended to improve the compressibility of text and XML files by replacing common words and XML substrings with shorter symbols. (In effect, it is LZW with a static dictionary prepended to the output.) It improves compression for most programs except for those that already have English text models, such as paq8h. Some additional results for combinations with other compressors are shown below.

                     Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program                Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Notes
-------                -------                     ----------  -----------  -----------  -----------  ----- -----  -----
xml-wrt 1.0|ppmonstr J -f1800 | -m800 -o10         18,965,658  155,066,074     56,837 sx 155,122,911   2905  2809
xml-wrt 1.0|slim23d    -f1800 | -m700 -o12         19,163,987  156,734,571     69,453 x  156,804,024   4702  4717
xml-wrt 1.0|ppmd J1    -f1800 | -m256 -o8 -r1      21,128,019  178,154,529     25,917 s  178,180,446    717   722

The following table shows the compressed size (without decompresser except SFX) of enwik8 before and after the XML-WRT transform with option -f180 for several compressors. A ratio less than 1 means that XML-WRT improves compression.


Program           Options                       enwik8   enwik8.xwrt  Ratio   Alg
-------           -------                    -----------  ----------  ------  ---
paq8h             -7                          17,674,700  18,341,959  1.0378  CM
ppmonstr J        -o10 -m800                  19,338,065  18,886,224  0.9766  PPM
slim23d           -m700 -o10                  19,264,094  18,938,602  0.9830  PPM
WinUDA 2.91       mode 3 (194 MB)             20,332,366  20,859,165  1.0259  CM
ppmd J1           -o10 -m256 -r1              21,388,296  20,945,220  0.9793  PPM
uhbc 1.0          -m3 -b100m                  20,930,838  21,171,204  1.0115  BWT
M03exp            32 MB                       21,948,192  21,583,059  0.9834  BWT
sbc               -ad -m3 -b63                22,470,539  22,216,425  0.9887  BWT
WinRAR 3.60b3     -mc7:128t+ -sfxWinCon.sfx   22,713,569  22,457,785  0.9887  PPM
PX 1.0                                        24,971,871  22,818,070  0.9137  CM
uharc 0.6b        -mx -md32768                23,911,123  22,915,299  0.9583  PPM
chile 0.3d-1      -b=40000                    23,408,335  22,884,519  0.9776  BWT
cabarc 1.00.0601  -m lzx:21                   28,465,607  25,739,214  0.9042  LZ77
WinACE            -sfx -m5                    30,919,182  27,112,651  0.8769
bzip2 1.0.3                                   29,008,758  27,339,845  0.9425  BWT
gzip 1.3.5        -9                          36,445,248  30,403,738  0.8342  LZ77
pkzip 2.0.4                                   36,934,712  30,729,525  0.8432  LZ77
thor 0.9a         ex                          41,670,916  32,586,444  0.7820
compress 4.3d                                 45,763,941  38,485,494  0.8409  LZW
Original size                                100,000,000  52,174,989  0.5217

The -f option (default -f6) selects the minimum frequency a word needs to be added to the dictionary. The optimal setting depends on the input size. When used with ppmd or ppmonstr (the best compressors improved by XML-WRT), the optimal settings are about -f180 for enwik8 and -f1800 for enwik9, which results in a dictionary of 7697 words for enwik8 and 6657 words for enwik9. The following table shows the effect of the -f and -o options for ppmonstr -m800 on enwik9. The best combination found is -f800 -o8.

 -f       -o7          -o8          -o9          -o10        -o11         -o12         -o16         -o32
 ---  -----------  -----------  -----------  -----------  -----------  -----------  -----------  -----------
 100                                                                   155,908,621
 200                                                                   155,775,164
 300                                                                   155,653,815
 500               154,884,542               155,367,681  155,465,355  155,547,660
 600               154,787,455                                         155,497,645
 800               154,749,585
1000  154,909,136  154,794,501  154,951,751  155,122,278  155,306,526  155,409,926  155,948,066  157,901,320
1500  155,092,513  154,895,455  154,999,654  155,073,186  155,306,526  155,301,322
1800  155,191,178  154,924,936  155,036,534  155,066,074  155,366,281  155,297,828
2000               154,998,528                                         155,296,112
3000                                                                   155,379,959

The following table shows that the optimal setting for -f is lower for smaller files (with ppmd):

              Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  
-------         -------         ----------  -----------  -----------  -----------  ----- -----  
  xml-wrt 1.0   -f1800         (70,826,140)(532,089,443)   (14,818 s)(532,104,261)  (115) (103)
+ ppmd J        -m256 -o8 -r1   21,128,019  178,154,529     41,653 sx 178,196,182    712   723
  xml-wrt 1.0   -f180          (52,174,989)(468,964,104)   (14,818 s)(468,978,922)  (113) (103)
+ ppmd J        -m256 -o8 -r1   20,910,527  178,215,315     41,653 sx 178,256,968    690   699
ppmd J          -m256 -o10 -r1  21,388,296  183,964,915     26,835 x  183,991,750    880   895

Leaving both -s (disable spaces model) and -t (disable try smaller word) off, which is the default, appears to work best on this data.

xml-wrt -f1800 enwik9 | ppmonstr -m800 -o8
------------------------------------------
(default)   154,924,936
-s          155,040,558
-t          155,421,035
-s -t       155,542,575

xml-wrt 2.0 released June 14, 2006 (updated June 19, 2006) has additional transform options, and also includes LZ77 (zlib) and LZMA (LZ with arithmetic coding) compression. When used as a preprocessor, this compression is turned off. enwik9 was compressed using the options:

  xml-wrt -l0 -w -s -c -b255 -m100 -e10000 enwik9
  ppmonstr e -o8 -m800 enwik9.xwrt

The option -l0 turns off compression. -w turns off word containers. -s turns off space modeling (this hurts compression in version 1.0 but helps in 2.0). -c turns off word and number containers (independent of -w and -n; -n hurts compression). -b255 sets memory for the dictionary to 255 MB, the maximum. -m100 sets the memory buffer to 100 MB rather than the maximum of 255 MB, because larger values hurt compression. -e10000 sets the dictionary size to 10000 words. (The dictionary size can also be controlled with -f as in version 1.0, but -e is less dependent on input size, so it helps with enwik8.) Additional tests showing the effects of -e, -m, and -o:

xml-wrt 2.0 options                ppmonstr J     enwik9
--------------------------------   ----------   -----------
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o8    154,223,582
-l0 -w -s -c -b255 -m100 -e8000  | -m800 -o8    154,234,621  (smaller -e)
-l0 -w -s -c -b255 -m100 -e12000 | -m800 -o8    154,239,769  (larger -e)
-l0 -w -s -c -b255 -m50  -e10000 | -m800 -o8    154,259,117  (smaller -m)
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o7    154,322,272  (smaller -o)
-l0 -w -s -c -b255 -m150 -e10000 | -m800 -o8    154,426,554  (larger -m)
-l0 -w -s -c -b255 -m100 -e10000 | -m800 -o9    154,445,811  (larger -o)

The optimal values of -w, -c, -s, -n (turn off number containers), and -t (turn off try shorter words) were determined on enwik7 and enwik8 but not tested on enwik9.

A bug fix for LZMA compression, released June 19, 2006, does not change any values for the June 14, 2006 version (using the -l0 option). However the compressed source code increases from 25,290 bytes to 25,354 bytes. The June 14 version is no longer published. The URL is unchanged.

xml-wrt 3.0 (Sept. 14, 2006) option -3 means to optimize the default settings for PPM compressors. Version 3.0 also has a FastPAQ8 compressor for standalone compression which was tested separately.

xwrt 3.2 (described above) with ppmonstr J has the following results.

xwrt 3.2 options        ppmonstr J opt    enwik8      enwik9        program size      total        Comp    Decomp   Mem
----------------------  --------------  ----------  -----------  -----------------  -----------  --------  ------- ----
-2 -b255 -m255 -s -f64   -o10 -m1650    18,456,706  148,915,761  52,569s + 26,835x  148,995,165  475+2512  43+2503 1650
-2 -b255 -m255 -s -f64   -o64 -m1650    18,397,126                                               210+2810  50+2884 1527

ppmonstr option -o64 is optimal for enwik8, but -o10 is optimal for enwik9. -m1650 selects 1650 MB memory. xwrt option -2 optimizes for PPM. -b255 selects buffer size 255 MB for building the dictionary. -m255 selects 255 MB memory buffer. -s turns off space modeling. -f64 sets minimum word frequency for the dictionary to 64. Program size and times are xwrt + ppmonstr. Memory usage is 512 MB for xwrt, 1650 MB for ppmonstr.

.1532 fp8_v3

fp8 v1 (fast paq) is a free, open source archiver by Jan Ondrus, May 2, 2010. It is derived from paq8px_v68. It has fewer models than paq8px for better speed but retains the models for wav, bmp, and jpg. The option -8 selects maximum memory.

fp8 v2, Apr. 10, 2012, has some modeling improvements.

fp8 v3, May 13, 2012, has some more compression improvements (at a slight cost in speed) and a JPEG bug fix.

tangelo 1.0, June 17, 2013, is a single-file compressor based on fp8. It removes specialized models and preprocessors for exe, bmp, wav and jpeg types. It takes no options. It uses fixed memory of 567 MB, equivalent to fp8 -7.

tangelo 2.0, July 6, 2013, removed some models and made other simplifications for better speed and less memory but worse compression.

tangelo 2.1, July 20, 2013, is faster but compresses less.

tangelo 2.3, July 22, 2013, re-added APM for better compression, and minor changes for better speed.

Program   options    enwik8      enwik9    program size      total    Comp  Decomp Mem  Alg Note
-------   -------  ----------  ----------  ------------  -----------  ----- ------ ---- --- ----
fp8 v1       -8    18,573,126                  49,865 s               20010        1150 CM   26
fp8 v2       -8    18,556,327  154,359,664     49,964 s  154,409,626  19059  21196 1192 CM   26
fp8 v3       -8    18,438,169  153,188,176     50,068 s  153,238,244  20605  22593 1192 CM   26
tangelo 1.0        18,593,738  156,355,536      8,365 s  156,363,901  19849  19977  567 CM   26
tangelo 2.0        20,202,547  171,678,313      6,275 s  171,684,588   6028   6007  362 CM   26
tangelo 2.1        21,021,150  179,879,607     11,320 s  179,890,927   2275   2262  361 CM   26
tangelo 2.3        20,921,619  178,497,116     11,687 s  178,508,803   2172   2194  361 CM   26

.1563 WinRK

WinRK 3.0.3 is a commercial GUI archiver by Malcolm Taylor (Mar. 6, 2006). It is top ranked on some benchmarks. Unfortunately, it is not available for free download (as of May 16, 2006). The "free trial" expires as soon as you install it. (Update, Sept. 11, 2006: versions 3.0.2 and 3.0.3 are no longer available for download. They appear to have been withdrawn in August 2006.) WinRK in PWCM mode (Paq Weighted Context Modeling) is based on the paq7/8 algorithm with text dictionary preprocessing and specialized models for wav, bmp, and exe files. Version 3.0.2 was based on the earlier paq6 algorithm, which uses adaptive linear model mixing rather than a neural network that mixes bitwise predictions from models in the logistic (log p/(1-p)) domain. The +td and -td options turn English dictionary preprocessing on or off, respectively. 800MB selects the memory limit. When not specified, PWCM appears to allocate all available memory except for 8 MB.
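The difference between paq6-style linear mixing and paq7/8-style logistic mixing can be sketched generically; this is not WinRK's code. Each model outputs a probability that the next bit is 1. The logistic mixer combines the models as squash(sum of w_i * stretch(p_i)) and adjusts the weights by gradient descent on coding cost. The linear mixer is shown with fixed weights for brevity, although paq6's weights adapt. The learning rate and initial weights are arbitrary assumptions.

```python
import math


def stretch(p: float) -> float:
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the log finite
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch: 1 / (1 + e^-x)."""
    x = min(max(x, -40.0), 40.0)  # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


class LogisticMixer:
    """paq7/8-style idea: a single-layer network over stretched inputs."""

    def __init__(self, n_inputs: int, lr: float = 0.02):
        self.w = [0.3] * n_inputs  # arbitrary starting weights
        self.lr = lr
        self.x = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs) -> float:
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit: int) -> None:
        # The gradient of coding cost -ln P(bit) with respect to w_i is
        # (p - bit) * x_i, so step each weight the opposite way.
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]


def linear_mix(probs, weights) -> float:
    """paq6-style idea, simplified: a weighted average of probabilities."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)
```

In the logistic domain, a model that is confidently wrong can receive a negative weight, so its predictions still carry information. A linear average with non-negative weights cannot exploit such a model.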

                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp Mem   Alg  Notes
-------           -------                     ----------  -----------  -----------  -----------  ----- ----- ----  ---  -----
WinRK 3.03        PWCM (800MB +td)            18,612,453  156,291,924  3,017,362 x  159,309,286  68555             CM   10
WinRK 3.03        PWCM                        18,612,551  156,349,910  3,017,362 x  159,367,272 102973~90000       CM    9
WinRK 3.03        FPW1 (800MB +td)            19,035,564                                         24950                  10
WinRK 3.03        PWCM (800MB -td)            19,060,620                                         88310             CM   10
WinRK 3.03        Efficient                   21,157,165                                          5380             PPM  10
WinRK 3.03        Normal (PPMd)               22,322,981                                           620             PPM  10
WinRK 3.03        PWCM (800MB +td)            18,612,453  156,291,924     99,665 xd 156,391,589  68555        800  CM   10
WinRK 3.03 x64    PWCM (2047MB +td o28)       18,101,637  150,481,300                                        3053  CM   42

RK and RKC are predecessors of WinRK so I don't plan to test them.

.1570 ppmonstr, ppmd, ppms

ppmonstr, ppmd, and ppms var. J are free command line file compressors by Dmitry Shkarin (model) and Dmitry Subbotin (range coder), Feb. 16, 2006 (ppms on Feb. 21, 2006). ppmonstr is a slower, experimental version of ppmd with better compression. Source code is available for ppms and ppmd but not ppmonstr. ppms is a small memory (1 MB) version of ppmd. They all use PPMII (PPM with information inheritance). The -m256 option selects 256 MB memory (maximum for ppmd). The -o10 option selects PPM order 10. (Higher orders use up memory faster, which hurts compression.) When ppmd runs out of memory, it discards the model and starts over. The -r1 option (default in ppmonstr) tells ppmd to back up and partially rebuild the model before resuming compression. The default options for ppmd are -m10 -o4 -r0, which are designed for reasonably good compression with high speed and low memory usage (see table below).

ppms accepts only options -o2 through -o8. The default is -o5. This also gives the best compression on enwik8. Task Manager shows 1.8 MB memory used.

              Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Note
-------         -------         ----------  -----------  -----------  -----------  ----- -----  ----  
ppmonstr J      -m1700 -o16     19,055,092  157,007,383     42,019 x  157,049,402   3574 ~3600
ppmonstr J      -m800 -o16      19,230,657  161,496,685     42,019 x  161,538,704   3783 ~3800
ppmonstr J      -m1863 -o16     19,040,451  156,578,769     42,019 x  156,620,788               42
ppmd J          -m256 -o10 -r1  21,388,296  183,964,915     11,099 s  183,976,014    880   895
ppmd J          -m10 -o4 -r0    26,275,353  236,509,791     11,099 s  236,520,890    194   206
ppms J          -o5             26,310,248  233,442,414     16,467 x  233,458,881    330   354
                -o2             36,866,748                                           102
                -o3             30,242,535                                           135
                -o4             27,030,761                                           246
                -o6             26,644,863                                           449
                -o7             27,028,318                                           492
                -o8             27,343,283                                           532

ppmd was updated to J1 on May 10, 2006 to fix a bug. Compression benchmarks are unchanged except the size of the compressor (11,099 bytes as zipped source code). ppmonstr is unchanged.

.1598 slim

slim 23d is a free, closed source command line archiver by Serge Voskoboynikov, Sept. 21, 2004. It uses a PPMII core (ppmd/ppmonstr) by Dmitry Shkarin with filters for special file types including text. The -m700 option selects 700 MB of memory. (I found -m800 causes disk thrashing at 1 GB.) The -o10 option selects order 10 PPM. (With 1 GB, -o12 and -o16 caused slim to fail on enwik9, creating an empty archive and exiting after about 60% completion. Smaller files were OK. There was no error with 2 GB.)

As with other PPM compressors (ppmd, ppmonstr), using a higher order improves compression but consumes memory faster. For enwik8, -o32 is optimal with 700MB available, but lower orders are better for enwik9.

              Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  
-------         -------         ----------  -----------  -----------  -----------  ----- -----  
slim23d         -m1700 -o12     19,077,276  159,772,839     69,453 x  159,842,292   5232 ~5400
slim23d         -m700 -o32      19,226,339  (failed)        69,453 x                6530  6770
slim23d         -m700 -o10      19,264,094  162,529,098     69,453 x  162,598,551   5175  5360

.1605 bwmonstr

bwmonstr 0.00 is a free, experimental, closed source file compressor by Sami Runsas, Mar. 10, 2009. It uses BWT. The program takes no options. It loads the input file into a single block and allocates 1.25 times the block size in memory for either compression or decompression. Thus, it is able to transform enwik9 in a single block.

bwmonstr 0.01 was released Mar. 18, 2009.

bwmonstr 0.02 was released July 8, 2009. It uses a compressed representation internally, so memory usage is less than the 1 GB block size. It compresses the entire input file in a single block and requires enough memory to hold the file. The program is multi-threaded even on a single block. Times shown are for a single core processor but would be faster on a multi-core processor. reorder2 is an alphabet reordering program by Eugene Shelwien. drt is the dictionary preprocessor from lpaq9m by Alexander Rhatushnyak.

              Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp  Decomp  Mem Alg Note
-------         -------         ----------  -----------  -----------  -----------  -----  ----- ---- --- ---- 
bwmonstr 0.00                   20,401,888  161,249,951     27,772 x  161,277,723  15638  13028 1224 BWT  26
bwmonstr 0.01                   20,379,365  161,026,258     32,163 x  161,058,420  15695  14135 1224 BWT  26
bwmonstr 0.02                   20,307,295  160,468,597     69,401 x  160,537,998 331801 156147  590 BWT  30
reorder2|bwmonstr 0.02          20,229,555                                                       590 BWT  30
drt|bwmonstr 0.02               19,750,461                                                       450 BWT  30

.1610 zcm

zcm v0.01 (discussion) is a free, experimental, closed source compressor for 32 bit Windows by Nania Francesco Antonio, Dec. 16, 2011. It uses context mixing. Commands c1 through c7 select memory usage for compression. Decompression uses the same memory. c7 uses the most memory and gets the best compression.

zcm v0.02 was released Dec. 23, 2011.

zcm v0.03 was released Dec. 28, 2011.

zcm v0.04 was released Jan. 30, 2012. (Program banner says v0.03).

zcm v0.11 was released Feb. 19, 2012. It is described as mixing 6 contexts. It detects the file type and uses exe, delta, and LZP preprocessors. It has separate models for text and binary data. Speed and memory usage are the same for compression and decompression. Commands c0 through c7 select memory usage. Each increment doubles memory, resulting in better compression. Memory usage grows as the program runs, up to a maximum value that is not reached on enwik8 for c5 and higher. For enwik8, c7 uses 1286 MB rather than 1716 MB.

zcm 0.20b was released Apr. 4, 2012. It is an archiver rather than a single file compressor. Option -m7 selects maximum memory usage (range 32 MB to 1.7 GB).

zcm 0.30 was released May 2, 2012.

zcm 0.40 was released May 16, 2012. It is described as using CM with 6 contexts, a mixer, and one re-mixer (APM or SSE) to adjust the mixer output. It uses LZP preprocessing.
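
The re-mixer stage mentioned for zcm 0.40 is commonly built in PAQ-style compressors as an adaptive probability map (APM, also called SSE): a table indexed by a small context and by the quantized stretched input probability, whose two nearest entries are interpolated and then nudged toward each coded bit. The sketch below shows only that general idea in floating point. It is not zcm's code (zcm is closed source), and the 33-bucket layout, the clamp to [-8, 8] in the stretched domain, and the update rate are assumptions.

```python
import math

def stretch(p: float) -> float:
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class APM:
    """Refine a probability using a small context (a sketch of SSE)."""

    def __init__(self, n_contexts: int, rate: float = 0.02):
        # 33 interpolation points spanning stretch(p) in [-8, 8], 0.5 apart,
        # initialized so the map starts out approximately as the identity.
        self.table = [[squash((j - 16) / 2) for j in range(33)]
                      for _ in range(n_contexts)]
        self.rate = rate           # assumed adaptation rate
        self.last = (0, 0, 0.0)    # (context, lower bucket, interpolation weight)

    def refine(self, p: float, cx: int) -> float:
        s = max(-7.999, min(7.999, stretch(p)))  # clamp into the table's range
        pos = (s + 8.0) * 2.0                    # fractional bucket index, 0..32
        lo = int(pos)
        w = pos - lo
        self.last = (cx, lo, w)
        row = self.table[cx]
        return row[lo] * (1.0 - w) + row[lo + 1] * w

    def update(self, bit: int) -> None:
        # Move the two buckets used by the last refine() toward the coded bit,
        # each in proportion to its interpolation weight.
        cx, lo, w = self.last
        row = self.table[cx]
        row[lo] += self.rate * (1.0 - w) * (bit - row[lo])
        row[lo + 1] += self.rate * w * (bit - row[lo + 1])
```

Because each context has its own row, the map can learn, for example, that the mixer's 0.5 really means "almost certainly 1" in one context while leaving other contexts unchanged.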

zcm 0.50a was released June 2, 2012.

zcm 0.60d adds multithreading and other improvements. The -t option selects the number of tasks. -t0 auto-detects the number of cores, which is equivalent to -t2 on the dual core test machine (T3200, 3 GB). The default is -t1. The -m option selects memory usage from -m1 (46 MB per task) to -m7 (1.6 GB per task). The default is -m4. Parallel compression is performed by separate processes that can independently access 2 GB of memory each in 32 bit Windows. When run with -t2, there is also a third task using 5 MB of memory. All three tasks saturate one CPU core each. It was found that -t2 makes compression worse (probably by splitting the input in half and compressing each separately) and is not much faster than -t1. The -t option can also be given during extraction. If the archive was compressed with -t2 then extraction with -t2 doubles memory usage but only improves speed slightly. If compressed with -t1 then extraction with -t2 is 4 seconds slower for enwik8 than with -t1 because the extra task exits immediately and the third 5 MB task continues to run.

zcm 0.70b was released Oct. 14, 2012.

zcm 0.80 was released May 15, 2013. It was tested in Linux under Wine. When -t2 was used to compress in 2 threads, it was also used to extract.

zcm 0.88 (discussion) was released June 21, 2013. It was tested both in Windows and in Linux under Wine.

zcm 0.90 was released May 3, 2014.

zcm 0.92 was released May 16, 2014. A 64 bit Windows version was released July 3, 2014. It supports the undocumented -m8 option using up to 3 GB memory.

Program      Option     enwik8      enwik9       Prog      Total      Comp  Deco  Mem   Note
---------    ------   ----------  -----------  --------  ---------    ----  ----  ----  ----
zcm v0.01    c1       23,914,413                                      2260  2730    35   26
             c7       20,093,284  169,397,795  47,975 x  169,445,770  2965  2883  1486   26
zcm v0.02    c7       20,277,130  170,848,574                         2419  2396  1470   26
zcm v0.03    c7       20,159,212  169,368,119  27,589 x  169,395,708  2416  2369  1476   26
zcm v0.04    c7       20,853,133  173,956,638  27,731 x  173,984,369  1462  1459  1520   26
zcm v0.11    c0       23,963,073                                      1230  1210    22   26
             c1       22,937,669                                      1280          35   26
             c2       22,076,074                                      1290          62   26
             c3       21,362,445                                      1330         115   26
             c4       20,810,077                                      1370         222   26
             c5       20,447,150                                      1390         401   26
             c6       20,215,116                                      1400         697   26
             c7       20,078,151  165,518,908  31,576 x  165,550,484  1275  1190  1716   26
zcm 0.20b    -m7      20,204,267  167,177,534 161,122 x  167,338,656  1199  1204  1657   26
zcm 0.30     -m7      20,237,368  167,198,948 161,558 x  167,360,506   949   970  1720   26
zcm 0.40     -m7      20,200,819  167,138,719 161,502 x  167,300,221   904   929  1511   26
zcm 0.50a    -m7      19,966,605  164,661,654 161,614 x  164,823,268   947   971  1579   26
zcm 0.60d    -m7 -t1  19,786,363  162,731,120 171,517 x  162,902,637   915   960  1662   26
             -m1 -t1  23,374,636                                       890   920    46   26
             -m1 -t2  23,440,140                                       830   910    97   26
             -m4 -t1  20,698,415                                       950  1000   226   26
             -m4 -t2  20,925,875                                       940   990   389   26
             -m6 -t1  19,933,151                                      1030  1050   651   26
             -m6 -t2  20,359,596                                      1070   990  1160   26
             -m7 -t2  20,267,309                                      2080  1130  2450   26
zcm 0.70b    -m7 -t1  20,065,306  166,373,795 159,493 x  166,532,988   870   884  1412   26
zcm 0.80     -m7 -t1  19,937,741  164,724,585 110,565 x  164,835,150   552   557  1700   48
             -m7 -t2  20,554,326  166,468,556 110,565 x  166,579,121   414   415  1990   48
zcm 0.88         -t1  21,383,928                                       940   930   196   26
                 -t2  25,767,005                                       800   820   120   26
             -m7 -t2  20,418,171                                      1110   890  1400   26
             -m7 -t1  19,970,859  164,702,310 162,136 x  164,864,446   910   891  1434   26
             -m7 -t1  19,970,859  164,702,310 162,136 x  164,864,446   546   527  1434   48
zcm 0.90     -m7 -t1  20,006,179  165,266,797 164,361 x  165,431,158   511   516 ~1700   48
zcm 0.92     -m7 -t1  19,803,545  163,246,657 166,763 x  163,413,420   500   512  1546   48
zcm_x64 0.92 -m7 -t1  19,803,545  163,246,657 225,205 x  163,471,862   488   471  1549   48
             -m8 -t1  19,700,970  160,848,578 225,205 x  161,073,783   489   474  2400   48

.1617 nanozipltcb

nanozipltcb is a free file compressor by Sami Runsas, July 25, 2008. It uses BWT. It takes no options. It is a customized version of nanozip, similar to -cO -txt -m1700m, but tuned to this benchmark. Files compressed with nanozipltcb are not compatible with nanozip.

nanozipltcb 0.08, Mar. 3, 2010, is multithreaded and has other optimizations. Size is based on a self-extracting archive. Only a 64 bit Windows version exists. It was tested by the author on a quad core Q6600 at 3.0 GHz. The older version is withdrawn.

nanozipltcb 0.09 was released May 10, 2010. Only a 64 bit Linux executable is available.

             Compression            Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options             enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------       ----------------   ----------  -----------  -----------  -----------  ----- -----  --- --- ----
nanozip 0.01  -cO -m1670m -txt   20,306,489  167,509,921    266,797 x   167,776,718   403   284 1325 BWT
nanozipltcb                      20,494,670  166,251,135    239,124 x   166,490,259   348   185 1729 BWT
nanozipltcb 0.08                 20,626,962  166,571,051          0 xd  166,571,051    93    53 1729 BWT 37
nanozipltcb 0.09                 20,537,902  161,581,290    133,784 x   161,715,074    64    30 3350 BWT 40

.1637 M03

M99 (mirror) is a free file compressor by Michael Maniscalco, originally written in 1999 and ported to Windows on Mar. 27, 2007. It uses BWT, based on MSufSort 3.1. M99 is a predecessor to M03. Command line is:

M99.exe e|d -switches blocksize input output 

switches are:
-r = post BWT run length encoding
-a = arithmetic coding instead of M99 style bit packing
-f = fast mode
-m = max compression mode (implies -a).
Blocksize can be specified in bytes (like 10000) or with a suffix such as 100k or 100m. Memory required for compression is at most 6 times the block size, although in most cases only a little over 5 times the block size is used. A block size of 239m divides enwik9 into 4 approximately equal parts and requires about 1500 MB memory.
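
The -r switch applies run-length encoding to the BWT output, which tends to contain long runs of the same byte. M99's actual RLE format is not described here, so the following is a generic, hypothetical scheme: each run becomes a (byte, length) pair, with runs capped at 255 so the length fits in one byte.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as two bytes: the symbol, then the run length (1..255)."""
    out = bytearray()
    i = 0
    while i < len(data):
        sym = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == sym and run < 255:
            run += 1
        out += bytes((sym, run))
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand (symbol, length) pairs back into runs."""
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k]]) * data[k + 1]
    return bytes(out)
```

This naive pairing doubles the size of data without runs; a practical coder would typically encode only runs above some threshold, or pass run lengths to an adaptive entropy coder.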

Version 2.1 was released Apr. 19, 2007.

M99 2.2.1, released July 18, 2008, has an optimization to compress the contents of TAR files separately. For other files, it increases the size by 1 byte.

M03 v0.2a, Oct. 10, 2009, takes just one option, which is the block size in bytes. Memory usage is 6x block size for compression and 5x for decompression.

M03 v1.1 beta was released Oct. 24, 2011 for 64 bit Windows. It includes some new, fully parallel suffix sorting and BWT construction algorithms. The option 1000000000 specifies a single block requiring 5 GB memory to compress or decompress.

                Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options        enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------      ----------  -----------  -----------  -----------  ----- -----  --- --- ----
M99               e -m 239m    21,431,211  180,477,144     67,697 x  180,544,841    674   496 1500 BWT
M99 v2.1          e -m 239m    21,251,170  178,910,174     68,052 x  178,978,226    713   535 1500 BWT
M99 v2.2.1        e -m 239m    21,251,171  178,910,175     72,245 x  178,982,420    704   520 1500 BWT
M03 0.2a          e 250000000  20,713,383  173,944,553     95,699 x  174,040,252    868   624 1470 BWT  26
M03 1.1b          e 1000000000 20,710,197  163,667,431     50,468 x  163,717,899    457   406 5735 BWT  52

.1639 bcm

bcm 0.03 (discussion) is a free command line compressor by Ilia Muraviev, Feb. 9, 2009. It uses BWT with a fixed block size of 32 MB and an order 0 CM back end. It takes no command line options.

bcm 0.04 (discussion) was released Feb. 11, 2009. It increases the block size to 64 MB and has modeling improvements including interpolated SSE.

bcm 0.05 (discussion) was released Mar. 5, 2009. The option -b327680 selects 327680 KB block size. It uses 5x block size memory.

bcm 0.07 (discussion) was released Mar. 15, 2009.

bcm 0.08 (discussion) was released May 31, 2009. The command e370 means to use a block size of 370 MB. Memory usage is 5 times block size. Larger values gave an "out of memory" error under 32 bit Windows Vista with 3 GB memory. reorder v2 (discussion) is an alphabet reordering preprocessor for BWT compressors by Eugene Shelwien, May 26, 2009. xlt is a pair of 256 byte files that defines the alphabet permutation used by reorder, released June 4, 2009 by Eugene Shelwien.
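
The reorder preprocessor and the xlt permutation files mentioned above apply a fixed byte permutation before the BWT, and the inverse permutation is applied after decompression. Changing the alphabet order changes which contexts end up adjacent after sorting, so a well-chosen order can group similar contexts together. The sketch below shows only the mechanics. The permutation used in the example is a placeholder, not the one in Eugene Shelwien's xlt files, and the function names are hypothetical.

```python
def inverse_permutation(perm: bytes) -> bytes:
    """Given a 256-byte table mapping old byte -> new byte, return the reverse map."""
    if sorted(perm) != list(range(256)):
        raise ValueError("perm must be a permutation of all 256 byte values")
    inv = bytearray(256)
    for old, new in enumerate(perm):
        inv[new] = old
    return bytes(inv)

def reorder(data: bytes, perm: bytes) -> bytes:
    """Apply the alphabet permutation (run before the BWT compressor)."""
    return data.translate(perm)

def restore(data: bytes, perm: bytes) -> bytes:
    """Undo the permutation (run after decompression)."""
    return data.translate(inverse_permutation(perm))
```

For example, with the placeholder table perm = bytes(range(255, -1, -1)), which simply reverses the byte order, restore(reorder(text, perm), perm) returns the original text.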

bcm 0.09 (discussion) was released Aug. 19, 2009. Option -b328 selects a block size of 328 MB. Memory usage is 5 times block size for both compression and decompression.

bcm 0.10 (x64 and x86 builds) was released Dec. 11, 2009 (discussion). The x64 version is for 64 bit Windows. The x86 version is for 32 bit Windows. The -b option gives the block size in MB. Memory usage is 5x block size.

bcm 0.11 (discussion) was released June 22, 2010. It is described as a complete rewrite.

bcm 0.12 (discussion) was released Oct. 31, 2010. A 64 bit version was tested by the author with -b1000 on June 1, 2011.

bcm 0.14 (discussion) was released June 22, 2013. Only a 64 bit Windows version was released. Command c1000 means to compress in 1000 MB blocks.

                  Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program             Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------             -------       ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
bcm 0.03                          22,007,655  192,194,478     67,988 x  192,262,466    517   437  164 BWT  26
bcm 0.04                          21,450,604  185,368,446     69,553 x  185,455,999    578   486  329 BWT  26
bcm 0.05            -b327680      20,770,671  172,180,796     69,040 x  172,249,836    684   535 1642 BWT  26
                    -b406991                  171,857,720     69,040 x  171,926,760              2030 BWT  27
bcm 0.07            -b327680      20,770,673  172,180,037     60,990 x  172,241,027    818   578 1642 BWT  26
                    -b488282                  169,396,680     60,990 x  169,457,670    472   341 2440 BWT  28
bcm 0.08            e370          20,744,613  171,891,509     61,666 x  171,953,175    948   709 1900 BWT  26
                    e477          20,744,613  169,179,098     61,666 x  169,232,764    545   418 2385 BWT  28
reorder_v2|bcm 0.08 e477          20,677,205  168,694,909     80,149 x  168,775,058    548   422 2385 BWT  28
reorder_V2|bcm 0.08 e477 xlt      20,665,536  168,598,121     80,661 x  168,678,782    552   420 2385 BWT  28
bcm 0.09            -b328         20,625,697  170,913,486     63,704 x  170,977,190   1342  1053 1652 BWT  26
bcm 0.10 x86        -b370         20,811,710  172,570,245     63,788 x  172,634,033    758   483 1899 BWT  26
bcm 0.10 x64        -b512                     169,871,532     72,366 x  169,943,898    362       2560 BWT  35
bcm 0.10 x64        -b477                     169,843,006     72,366 x  169,915,372    522   373 2500 BWT  36
bcm 0.11            -b328         20,773,468  172,267,889     70,936 x  172,338,825    798   548 1552 BWT  26
                    -b477         20,773,468  169,466,640     70,936 x  169,537,576    611   423 2500 BWT  43
bcm 0.12            -b328         20,825,972  172,665,135     61,874 x  172,727,009    637   414 1683 BWT  26
                    -b1000        20,825,972  164,654,285     61,974 x  164,716,259    281   214 5000 BWT  50
bcm 0.14            c1000         20,736,614  163,885,873     74,569 x  163,960,442    162   153 5000 BWT  60

.1640 bsc

bsc 1.00 (x86 and x64 versions) is a free, experimental file compressor by Ilya Grebnov, Apr. 7, 2010. It uses BWT with LZP preprocessing. The option -b1000t selects a block size of 1000 MB and turns off multithreading (parallel compression on multiple cores). The memory requirement is 6x the block size times the number of threads. Multithreading was turned off (-t) for both compression and decompression in order to maximize compression. Nevertheless, compression shows CPU utilization of 109% on 2 cores even with -t set. -p turns off LZP preprocessing. -m2 selects a sort (Schindler) transform of order 5.

Other options select the LZP table size (default 2^18 bytes, range 2^10 to 2^28), the LZP match length (default 128, range 4..255), the block sorting algorithm (default BWT; alternatively an order 4 or 5 sort (Schindler) transform), and preceding or following context for sorting (default following). Only the defaults of these options were tested, which may not be optimal. There are two versions: x86 for 32 bit Windows with a 2 GB memory limit, and x64 for 64 bit Windows with no memory limit. Notes apply to enwik9. enwik8 size is tested as in note 26.
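
The sort (Schindler) transform differs from the BWT only in how far the sort looks: rows are ordered by their first k symbols, and rows with equal k-symbol contexts keep their original text order. The naive sketch below illustrates that definition and is nothing like bsc's fast (and, for higher orders, GPU) implementation. It sorts by following context, which is bsc's default, and returns the output column plus the row of the first rotation; how bsc actually stores that index is not assumed.

```python
def sort_transform(s: bytes, k: int) -> tuple[bytes, int]:
    """Order-k sort transform: stable sort of cyclic rotations by their first k bytes."""
    n = len(s)
    doubled = s + s  # lets each rotation's prefix be sliced without wrap-around logic

    def context(i: int) -> bytes:
        return doubled[i:i + min(k, n)]

    rows = sorted(range(n), key=context)  # sorted() is stable: ties keep text order
    out = bytes(s[i - 1] for i in rows)   # preceding byte; s[-1] wraps for row 0
    return out, rows.index(0)
```

With k equal to the block length, every rotation of a non-periodic block is fully distinguished, so the output equals the BWT; smaller k trades some compression for a much cheaper sort.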

bsc 1.03 x86 and x64 (discussion), Apr. 11, 2010, are bug fixes that do not change results except for the size of the program. The x64 version is 276,292 bytes.

bsc 2.00, May 3, 2010, is available with source code licensed under LGPL.

bsc 2.20, June 15, 2010, has speed improvements for multi-core support. -b1000p means use 1000 MB block size (-b1000, requires 5 GB memory) with no preprocessing (-p). -b80p uses 80 MB block size with no preprocessing. -m2f means use sort transform order 5 (-m2) and fast compression (-f). enwik8 was tested as in note 26 on bsc-x32 replacing -b1000p with -b100p.

bsc 2.26, July 26, 2010, has some speed improvements but retains compatibility with version 2.25. -b328 selects a block size of 328 MB, which divides enwik9 into 3 blocks. This is the fewest number of blocks supported by the x86 version because of a 2 GB process limit. The x64 version does not have this limit but requires 64 bit Windows. -t disables parallel block processing, which would double the memory requirement. -T disables all multicore processing. This gives a smaller compressed size but is slower than -t. -T or -t must be specified during decompression to prevent an out of memory error. With -t, CPU usage is 156% for compression and 129% for decompression on a dual core T3200 (2 GHz, 3 GB, Vista 32 bit).

bsc 2.4.5, Jan. 3, 2011, improves the speed of decompression. It remains compatible with the previous version.

bsc 2.5.0, Mar. 20, 2011, had no significant changes for the tests performed. It has minor performance enhancements, and CRC32 is replaced with Adler32.

bsc 3.0.0, Aug. 27, 2011, adds experimental NVIDIA (CUDA) GPU acceleration for forward sort transforms ST5 through ST8. ST7 and ST8 are GPU only. There are 32 and 64 bit versions. For the test shown, the 64 bit version was used. -b32 selects a 32 MB block size, -p disables preprocessing, -m8 selects an order 8 sort transform, and -f selects fast compression. The test machine is a Core i7-2600K (4 cores, 8 threads, 8 MB cache) overclocked from 3.4 GHz to 4.6 GHz, with a 384 CUDA core GeForce 560Ti GPU overclocked from 822 MHz to 900 MHz, with 2000 MHz memory speed. Compression takes 8.705 seconds using 1129 MB CPU memory and about 1 GB GPU memory. Decompression uses only the CPU, taking 18.595 seconds using 1395 MB memory.

bsc 3.1.0 was released July 8, 2012.

              Compression       Compressed size      Decompresser  Total size   Time (ns/byte)
Program        Options        enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
-------        -------      ----------  -----------  -----------  -----------  ----- ----- ---- ---  ----
bsc-x64 1.00   -b1024       20,769,550  163,820,253    274,197 x  164,094,450    311   212 6000 BWT  34
bsc-x64 1.00   -b1000 -p    20,787,437  163,882,152    274,197 x  164,156,349    271   209 6000 BWT  38
bsc-x86 1.00   -b250 -t     20,769,550  174,337,692    258,824 x  174,596,616    473   276 1504 BWT  28
bsc-x64 1.00   -b79 -p -m2  22,864,952  200,607,811    274,197 x  200,882,008     36    71 1896 ST5  39
bsc-x86 1.03   -b250 -t     20,769,550  174,337,692    261,058 x  174,598,750    470   280 1504 BWT  28
bsc-x64 2.00   -b1000p      20,789,147  163,888,465    122,581 s  164,011,046    237   199 5095 BWT  39
bsc-x64 2.20   -b1000p      20,789,228  163,888,858    149,153 s  164,038,011    238    93 5095 BWT  39
               -b80p -m2f   23,031,164  201,321,919    149,153 s  201,471,072     27    68 1624 ST5  39
bsc-x86 2.26   -b328 -t     20,774,446  171,826,969    138,293 s  171,965,262    386   183 1667 BWT  28
               -b328 -T     20,772,543  171,820,075    138,293 s  171,958,368    438   274 1663 BWT  28
bsc-x86 2.45   -b328 -t     20,774,446  171,826,969    130,327 s  171,957,296    382   141 1667 BWT  28
               -b328 -T     20,772,543  171,820,075    130,327 s  171,950,402    443   195 1667 BWT  28
bsc-x86 2.50   -b328 -t     20,774,446  171,826,969    129,593 s  171,956,562    398   139 1670 BWT  28
               -b328 -T     20,772,543  171,820,075    129,593 s  171,949,668    444   195 1670 BWT  28
bsc-x64 3.00   -b32p -m8f   22,461,680  196,398,933    934,176 x  197,333,109      8    18 3129 ST8  51
bsc-x86 3.10   -b328 -T     20,920,018  173,026,090    241,476 s  173,267,566    390   149 1712 BWT  28                

.1640 bbb

bbb ver. 1 is a free, open source (GPL) command line file compressor by Matt Mahoney, Aug. 31, 2006. It uses a memory efficient BWT allowing blocks up to 80% of available memory. The transformed data is compressed with an order 0 PAQ-like model: the previous bits of the current byte are mapped first to a bit history, then through a 6-level chain of adaptive probability correctors before bitwise arithmetic coding.

The m1000 command selects 1000 MB block size. Thus, enwik9 is suffix sorted in one block. This is accomplished by sorting 16 smaller blocks, writing the pointers to 4 GB of temporary files, and merging them. The inverse transform is done in memory without building a linked list. Rather, the next position is found by looking up the approximate location in an index of size n/16 and finding the exact location by linear search.
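
The inverse transform described above can be sketched as follows. The first column of the sorted rotations is never stored: each step finds the current row's first symbol from the symbol counts, then locates the matching occurrence of that symbol in the BWT output using a sparse index (the position of every 16th occurrence of each symbol, roughly n/16 entries in total) plus a short linear scan. This illustrates the idea under our own assumptions about the index layout; it is not bbb's actual code, and it favors clarity over speed. The naive forward bwt() is included only to produce test input.

```python
from bisect import bisect_right

STEP = 16  # sample every 16th occurrence of each symbol (index of about n/16 entries)

def bwt(s: bytes) -> tuple[bytes, int]:
    """Naive forward BWT for testing: sort all rotations, keep the last column."""
    n = len(s)
    rows = sorted(range(n), key=lambda i: s[i:] + s[:i])
    return bytes(s[i - 1] for i in rows), rows.index(0)

def inverse_bwt(last: bytes, primary: int) -> bytes:
    n = len(last)
    # start[c] = number of symbols smaller than c = first row that begins with c
    counts = [0] * 256
    for c in last:
        counts[c] += 1
    start, total = [], 0
    for c in range(256):
        start.append(total)
        total += counts[c]
    # samples[c][j] = position in `last` of occurrence number j*STEP of symbol c
    samples = [[] for _ in range(256)]
    seen = [0] * 256
    for pos, c in enumerate(last):
        if seen[c] % STEP == 0:
            samples[c].append(pos)
        seen[c] += 1
    out = bytearray()
    row = primary
    for _ in range(n):
        c = bisect_right(start, row) - 1  # first-column symbol of this row
        k = row - start[c]                # this row is the k-th row beginning with c
        pos = samples[c][k // STEP]       # jump near the k-th c in `last`...
        for _ in range(k % STEP):         # ...then scan forward to it
            pos += 1
            while last[pos] != c:
                pos += 1
        out.append(c)
        row = pos  # row of the rotation that starts one symbol later in the text
    return bytes(out)
```

The sampling interval trades memory for time: a larger STEP shrinks the index but lengthens the average scan.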

bbb.exe Win32 executable compiled with MinGW g++ 3.4.2 and UPX 1.24w.

  g++ -Wall -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o bbb.exe bbb.cpp
  upx bbb.exe

bbb Linux executable, supplied by Phil Carmody (Aug. 31, 2006). Compiled with g++-4.1 -Wall -O2 -o bbb bbb.cpp; strip bbb

bbb has a faster mode for both compression and decompression that does a "normal" BWT using 5x blocksize in memory. Output format is the same for fast and slow mode for both compression and decompression. A file compressed in fast mode can be decompressed in slow mode on another computer with less memory, and vice versa. The mode has no effect on the compressed file contents.

Recommended usage for best compression: For files smaller than 20% of available memory, use fast mode and one block. For example, if you have 1 GB memory (800 MB available under Windows) and foo is 100 MB:

  bbb cfm100 foo foo.bbb  (c = compress, f = fast, m100 = 100 MB blocks)
  bbb df foo.bbb foo.out  (d = decompress, f = fast)
If the file is 20% to 80% of available memory, use one block in slow mode. If foo is 500 MB:
  bbb cm500 foo foo.bbb
  bbb d foo.bbb foo.out
If the file is over 80% of memory, use 80% of memory as the block size in slow mode. If foo is 1 GB:
  bbb cm640 foo foo.bbb
  bbb d foo.bbb foo.out
The model requires about an additional 6 MB that should be subtracted from available memory.
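
The recommendations above reduce to a simple rule. The helper below is hypothetical (bbb has no such function): it returns the compress command for a given file size, with all sizes in MB. The 20% and 80% thresholds come from the text, while the handling of sizes exactly at 20% or 80% is our own choice. Because the examples above use 800 MB without subtracting the model's 6 MB, the helper leaves that subtraction to the caller.

```python
def bbb_compress_command(file_mb: int, available_mb: int) -> str:
    """Hypothetical helper: recommended bbb compress command for a file_mb MB file."""
    if file_mb * 10 < available_mb * 2:    # under 20% of memory: fast mode, one block
        return f"cfm{file_mb}"
    if file_mb * 10 <= available_mb * 8:   # 20% to 80%: slow mode, one block
        return f"cm{file_mb}"
    return f"cm{available_mb * 8 // 10}"   # over 80%: slow mode, blocks of 80%
```

With 800 MB available, it returns cfm100, cm500, and cm640 for the 100 MB, 500 MB, and 1 GB (1000 MB) examples above.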

bbb results by block size are shown below. Gain is the compression improvement obtained by increasing the block size tenfold. Gain(blocksize) is defined as C(blocksize/10)/C(blocksize) - 1, where C(x) is the compressed size of enwik9 with block size x. Compression times are for fast mode with block sizes 10^1 through 10^8 and slow mode for 10^9, on a 2.2 GHz Athlon-64 with 2 GB memory under WinXP Home SP2.

Block  enwik8      enwik9       Gain  Comp ns/b
-----  ----------  -----------  ----  ---------
10^1   66,414,034  646,449,572        4359
10^2   56,241,619  542,912,447  .191  2169
10^3   45,500,201  435,597,745  .246  1907
10^4   37,006,646  343,663,203  .267  1802
10^5   30,946,413  275,172,983  .249  1838
10^6   26,661,555  233,555,297  .178  2095
10^7   23,460,457  204,355,672  .142  2499
10^8   20,847,290  182,162,626  .122  3106
10^9   20,847,290  164,032,650  .110  4524
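
The Gain column can be recomputed from the enwik9 column using the definition above. The sizes below are copied from the table, keyed by the exponent e of the block size 10^e.

```python
# enwik9 compressed size for block sizes 10^1 .. 10^9 bytes (from the table)
sizes = {
    1: 646_449_572, 2: 542_912_447, 3: 435_597_745, 4: 343_663_203,
    5: 275_172_983, 6: 233_555_297, 7: 204_355_672, 8: 182_162_626,
    9: 164_032_650,
}

def gain(e: int) -> float:
    """Gain(10^e) = C(10^(e-1)) / C(10^e) - 1."""
    return sizes[e - 1] / sizes[e] - 1

for e in range(2, 10):
    print(f"10^{e}: {gain(e):.4f}")
```

The recomputed values agree with the tabulated gains to within 0.001; the last digit in the table is not always rounded the same way.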

.1644 mcm

mcm v0.0 is a free, experimental, closed source file compressor by Mathieu Chartier, June 4, 2013. It uses CM. Options -1 ... -9 select 8 MB to about 1500 MB memory.

mcm v0.2, June 11, 2013, has automatic detection of text and binary files with UTF modeling in text mode and sparse models in binary mode, an improved match model, and cache optimizations.

mcm v0.3 was released June 17, 2013.

mcm 0.4 was released as open source on July 17, 2013. To test, it was compiled with g++ 4.8.0 using the supplied make.bat file.

        Compression     Compressed size      Decompresser  Total size   Time (ns/byte)
Program  Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------  -------    ----------  -----------  -----------  -----------  ----- -----  --- --- ----
mcm v0.0   -9       19,842,740  166,276,589    116,198 x  166,392,787   1425  1449 1447 CM   26
mcm v0.2   -9       19,768,502  165,480,329    137,308 x  165,617,637   1453  1468 1451 CM   26
mcm v0.3   -9       19,707,487  164,464,527    122,205 x  164,586,732   1387  1435 1452 CM   26
mcm v0.4            19,858,418                                          1718  1623  735 CM   26
           -9       19,762,418  165,009,983     43,479 s  165,053,462   1552  1494 1457 CM   26

.1652 paq9a

paq9a is a free, open source, command line archiver by Matt Mahoney, Dec. 31, 2007. It is a context mixing compressor with an LZP preprocessor to improve speed for highly redundant files. Matches to a context length of 12 or more are coded as 1 bit, and literals as 9 bits. Context mixing differs from paq8 in that it uses a chain of 2-input mixers rather than one mixer with many inputs. It mixes sparse order-1 contexts with gaps of 3, 2, 1, 0, then orders 2 through 6, then text word orders 0 and 1. Option -9 selects maximum memory.
        Compression     Compressed size      Decompresser  Total size   Time (ns/byte)
Program  Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
-------  -------    ----------  -----------  -----------  -----------  ----- -----  --- ---
paq9a    -9         19,974,112  165,193,368     13,749 s  165,207,117   3997  4021 1585 CM
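
The LZP step described for paq9a can be sketched as a cost estimate. This is a hypothetical simplification, not paq9a's code: it looks up the last position where the previous 12 bytes occurred (using an exact dictionary instead of the hash table a real implementation would use), predicts the byte that followed, and charges 1 bit for a correct prediction and 9 bits for any other byte, matching the 1-bit match and 9-bit literal costs in the description. paq9a's context mixing stage is ignored; the sketch only counts raw bits.

```python
MINLEN = 12  # context length required for a prediction (from the text)

def lzp_cost_bits(data: bytes) -> int:
    """Estimate LZP output size: 1 bit per correctly predicted byte, 9 bits otherwise."""
    last_seen: dict[bytes, int] = {}  # context -> position of the byte that followed it
    bits = 0
    for i, byte in enumerate(data):
        ctx = data[i - MINLEN:i] if i >= MINLEN else None
        predicted = data[last_seen[ctx]] if ctx in last_seen else None
        bits += 1 if predicted == byte else 9
        if ctx is not None:
            last_seen[ctx] = i
    return bits
```

On highly redundant input most bytes cost 1 bit, which is why the preprocessor mainly speeds up such files; on input with no repeated 12-byte contexts, every byte costs 9 bits.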

.1662 uda

uda 0.300 is a free, experimental file compressor by dwing, July 16, 2006. It is a modification of PAQ8H with optimizations for speed. It takes no options. The decompresser size is for uda.exe, since this is smaller than the corresponding zip file.

.1678 BWTmix

BWTmix v1 (from here) is a free, open source, experimental file compressor by Eugene Shelwien, June 28, 2009. It uses BWT (implemented using quicksort) followed by an 8 model CM mixed using a tree of 2-input mixers. The option c10000 selects a block size of 10000 * 100KB. The default block size is 100 MiB. Memory usage is 5x block size.

Program      Option     enwik8      enwik9     Comp  Deco  Mem   Note
---------    ------   ----------  -----------  ----  ----  ----  ----
bwtmix v1    c3334    20,608,793  170,596,616  3413  1253  1670   26
             c10000   20,608,793  167,978,527  1793   690  5000   49

.1694 lrzip

lrzip 0.40 is a free, open source file compressor by Con Kolivas, Nov. 26, 2009. It uses a range dictionary preprocessor to remove long range redundancies (based on rzip), followed by lzma (7zip) compression. It also has options to compress with lzo (lzop) or bzip2 after preprocessing, or to output the preprocessed data for compression with other programs. It runs under Linux.

lrzip 0.42 adds zpipe (zpaq cmid.cfg) as a back end compressor using option -z. It was tested in this mode.

lrzip 0.612 (discussion), Mar. 17, 2012, uses the current version of libzpaq (v5.01) for faster execution. The options select built in level 3 (max.cfg) compression.

Program     Options            enwik8      enwik9         prog       total       Comp  Deco Mem  alg  note
----------  ------------     ----------  -----------     --------  -----------   ----  ---- ---- ---- ----
lrzip 0.40                   25,190,577  214,903,304     38,173 x  214,941,477    843    31 1700 LZ77 33
lrzip 0.42  -z               21,327,441  183,609,156     49,881 x  183,659,037   2173  2230 1800 CM   33
lrzip 0.612 -z -L 9 -p 1     19,847,690  169,318,794     99,363 x  169,418,157   2987  2929 2700 CM   33

.1707 cm4_ext

cm0, cm0_ext, cm1 (discussion), and bwcm (discussion) are a series of free file compressors for Windows by Nauful. cm0 is a context mixing compressor released Dec. 4, 2013. cm0_ext is a slower version of the same program with better compression, also released Dec. 4, 2013. cm1 uses ROLZ and was released Dec. 5, 2013. bwcm uses BWT and was released Dec. 6, 2013. Only bwcm takes any options. The option c128 selects a 128 MB block size. The default is c16. bwcm requires 12x block size in memory for compression and 5x for decompression. All programs are single-threaded.

cm4_ext was released Jan. 21, 2014. It is an order 10 CM with a match model and SSE.

                Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------        ----------  -----------  -----------  -----------  ----- -----  --- --- ----
cm0                              23,276,242  206,929,764     201,213 x 207,130,977   1731  1791   68  CM  26
cm0_ext                          21,156,055  181,772,665     201,303 x 181,973,968   4206  4250  516  CM  26
cm1                              28,092,863  243,631,412     202,038 x 243,833,450    391   226  211  CM  26
bwcm                             23,265,333  204,416,216     202,803 x 204,619,019   1142   335  184  CM  26
bwcm              c128           21,278,364  185,473,048     202,803 x 185,675,851   1525   407 1469  CM  26
cm4_ext                          20,188,048  170,566,799     204,782 x 170,771,581   4123  4130 1906  CM  26

.1722 M1x2

M1 0.2a is a free, open source (GPL) file compressor by Christopher Mattern, released Oct. 3, 2008. It uses context mixing with only two contexts. The contexts are 64 bits with some bits masked out. The masks and several other parameters were selected by a combination of genetic and hill-climbing algorithms, run for several hours to 3 days to optimize compression on this benchmark as discussed here.
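A masked 64-bit context of the kind described for M1 can be sketched in three steps: pack the last 8 bytes into an integer, AND it with a mask, and hash the result into a table. The masks and the multiplicative hash constant are hypothetical illustrations. M1 selects its masks by optimization, and its actual hashing is not described here.

```python
MASK64 = (1 << 64) - 1

def history64(data: bytes, pos: int) -> int:
    """The up to 8 bytes before pos as one 64-bit integer (most recent byte lowest)."""
    h = 0
    for k in range(1, 9):
        if pos - k < 0:
            break
        h |= data[pos - k] << (8 * (k - 1))
    return h

def masked_context(data: bytes, pos: int, mask: int) -> int:
    """Keep only the history bits selected by mask."""
    return history64(data, pos) & mask

def table_index(ctx: int, bits: int = 22) -> int:
    """Hypothetical multiplicative hash of a 64-bit context into a 2^bits table."""
    return ((ctx * 0x9E3779B97F4A7C15) & MASK64) >> (64 - bits)

# Example masks (hypothetical; M1 chooses its masks by optimization).
ORDER1 = 0xFF           # last byte only
ORDER2 = 0xFFFF         # last two bytes
SPARSE = 0xFF00         # second-to-last byte, skipping the last one
TEXT_CASELESS = 0xDFDF  # last two bytes with the ASCII case bit (0x20) cleared
```

Two positions share a table slot exactly when their masked histories agree. A sparse or case-folding mask therefore merges contexts that a plain order-n context would keep apart.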

M1 0.3 was released Jan. 2, 2009.

M1 0.3b was released Apr. 12, 2009. This version takes a configuration file created by an optimization version of the program. The configuration file is required by the decompresser (and is included in the program size).

e8-m103b1-mh is a parameter file for M1 0.3b obtained by mhajicek after about 3 days of CPU time running M1's genetic optimization program on enwik8.

M1x2 v0.5-1 was released Dec. 8, 2009. The option 6 means to use 48 x 26 MB memory. The option enwik7.txt is an optimization file which resulted from tuning parameters on the first 10 MB of the benchmark by a separate optimization process. It must be specified during decompression. The file size (242 bytes) is included in the decompresser size. The program includes source code and compiled Windows and Linux versions. The Windows version was tested. The program is described as follows by the author:

M1x2 mixes two ordinary M1 models in the logistic domain (thus four models in total). Data is processed bitwise with a flat decomposition. Contexts are mapped to states, which represent bit histories encountered under the corresponding context. In this implementation contexts are restricted to byte masks with some tweaks for text; the context mapping is implemented using hash tables. Two bit history states s1, s2 are quantised Q(.,.) and mapped to a linear counter to produce a prediction p = P(y=1|Q(s1, s2)), where y is the next bit. Afterwards the two predictions are transformed into the logistic domain and mixed linearly. The final prediction is: p = Sq[ (St(p2)-St(p1))*w + St(p1) ]; St(.) and Sq(.) name stretch and squash (see PAQ). There is just a single weight w in [0, 1]. The predictions and the weight are updated to minimize coding cost. As in previous versions, a genetic optimizer can tune all degrees of freedom to a training data set. Parameters include: contexts, state machine structure, counter and mixer settings.
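The final mixing step quoted above can be written directly in code. This sketch assumes the single weight is trained by a plain gradient step on the coding cost -ln P(y) and clipped to [0, 1]. The learning rate and starting weight are hypothetical, and the state-to-prediction counters are not shown.

```python
import math

def stretch(p: float) -> float:
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    x = min(max(x, -40.0), 40.0)
    return 1.0 / (1.0 + math.exp(-x))

class SingleWeightMixer:
    """p = Sq[(St(p2) - St(p1)) * w + St(p1)] with one weight w in [0, 1]."""

    def __init__(self, w: float = 0.5, rate: float = 0.02):
        self.w = w        # starting weight (hypothetical)
        self.rate = rate  # learning rate (hypothetical)

    def mix(self, p1: float, p2: float) -> float:
        self.s1, self.s2 = stretch(p1), stretch(p2)
        self.p = squash((self.s2 - self.s1) * self.w + self.s1)
        return self.p

    def update(self, y: int) -> None:
        # d(-ln P(y))/dw = (p - y) * (s2 - s1); step against it and keep w in [0, 1].
        self.w += self.rate * (y - self.p) * (self.s2 - self.s1)
        self.w = min(1.0, max(0.0, self.w))
```

At w = 0 the output equals p1, and at w = 1 it equals p2. The single weight therefore interpolates between the two models in the stretched domain, shifting toward whichever one has been predicting better.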

m1x2 v0.6 (discussion), Feb. 8, 2010, preprocesses the input by pre-compressing it with an order-1 Huffman code whose code lengths are limited to 12 bits, prior to compression with the context mixing model of v0.5-1. This improves speed by reducing the size of the input, and it improves compression because the context hash tables are not filled as quickly. The 7 option says to use 8 x 2^7 MB memory. The decompresser size includes the 242 byte configuration file enwik7.txt. The length-limited Huffman codes are generated using an algorithm described by A. Turpin and A. Moffat in Practical Length-Limited Coding for Large Alphabets, The Computer Journal, 38, (5), 339-347, 1995.
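Optimal length-limited code lengths can be computed in a few lines with the package-merge algorithm (Larmore and Hirschberg). This is a different algorithm from the Turpin and Moffat method that m1x2 uses; it appears here only because it is short. The function returns one code length per symbol, none longer than `max_len`. Assigning canonical codes from the lengths is not shown.

```python
from collections import Counter

def package_merge(freqs, max_len):
    """Optimal prefix-code lengths for freqs with no length above max_len."""
    n = len(freqs)
    if n == 0:
        return []
    if n == 1:
        return [1]
    if (1 << max_len) < n:
        raise ValueError("max_len too small for alphabet size")
    # Each item is (weight, symbols it contains); leaves hold one symbol.
    leaves = sorted(((f, (i,)) for i, f in enumerate(freqs)), key=lambda it: it[0])
    current = leaves
    for _ in range(max_len - 1):
        # Package adjacent pairs (an odd last item is dropped), then merge with leaves.
        packages = [(current[k][0] + current[k + 1][0], current[k][1] + current[k + 1][1])
                    for k in range(0, len(current) - 1, 2)]
        current = sorted(leaves + packages, key=lambda it: it[0])  # stable: leaves first
    # A symbol's code length is the number of selected items that contain it.
    counts = Counter()
    for _, syms in current[:2 * n - 2]:
        counts.update(syms)
    return [counts[i] for i in range(n)]
```

For example, frequencies 1, 1, 2, 4 give lengths 3, 3, 2, 1 when the limit is 3 (the same as an unrestricted Huffman code), and 2, 2, 2, 2 when the limit is 2.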

                Compression     Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Notes
-------           -------    ----------  -----------  -----------  -----------  ----- -----  --- ---  -----
M1 0.2a                      24,656,008  219,115,069     25,336 s  219,140,405    452   447   33 CM    26
M1 0.3                       24,004,989  215,101,056     24,596 s  215,125,652    395   404   33 CM    26
M1 0.3b       text2.txt      23,506,215  209,057,165     23,150 s  209,080,315    377   403   33 CM    26
M1 0.3b       text.txt       23,558,990                                           360   390   33 CM    26
M1 0.3b       e8-m103b1-mh   23,456,037  207,931,967     23,150 s  207,955,117    383   412   33 CM    26
M1x2 v0.5-1   6 enwik7.txt   20,812,625  172,771,031     47,608 x  172,818,639   1019  1091 1576 CM    26
M1x2 v0.6     7 enwik7.txt   20,723,056  172,212,773     38,467 s  172,251,240    711   715 1051 CM    26

.1727 cmm4

cmm1 is a free, open source (GPL) file compressor by Christopher Mattern, Sept. 18, 2007. It uses context mixing with LZP preprocessing.

cmm2 was released Dec. 10, 2007 without source code.

cmm2 080113 was released Jan. 13, 2008 without source code.

cmm3 080207 (test release) was released Feb. 7, 2008 without source code.

cmm4 v0.0 (test release) was released Mar. 14, 2008 without source code.

cmm4 v0.1e was released Apr. 20, 2008 without source code. It takes a 2 digit option "wm" (e.g. 96 meaning w=9, m=6). Memory usage is 2^w MB for a sliding window, and 12*2^m MB for a context mixing model (order 1,2,3,4,6). On my machine m=7 caused disk thrashing.
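The cmm4 v0.1e memory formula (2^w MB for the window plus 12*2^m MB for the model) can be checked with a small calculation. The function name is hypothetical, and the formula ignores any fixed overhead.

```python
def cmm4_memory_mb(option: str) -> int:
    """Memory in MB for a two-digit cmm4 option "wm": 2^w + 12 * 2^m."""
    if len(option) != 2 or not option.isdigit():
        raise ValueError("expected two digits, e.g. '96'")
    w, m = int(option[0]), int(option[1])
    window_mb = 2 ** w       # sliding window
    model_mb = 12 * 2 ** m   # context mixing model (orders 1, 2, 3, 4, 6)
    return window_mb + model_mb
```

Option 96 gives 512 + 768 = 1280 MB, a little below the 1321 MB measured in the table below. With m=7 the model alone would need 1536 MB.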

Description by the author: CMM4 0.1e is a variable order context mixing coder. It predicts using the four "highest" (ranking: 643210) models in each bit coding step and, in addition, the match model input. Orders 0 and 1 are implemented using a table lookup; all higher orders use nibble based hashing. Matches are found using order 4 and 6 LZP; the pointers and a quick exclusion hash are stored within the model's hashing tables. The mixer joins the 4 (or 5 in presence of a match model) predictions and outputs them to an SSE stage. A mixer (similar to (L)PAQ) is selected based on the last byte's 4 MSBs and on the coding order. The SSE context is made of an order 0 context and a quantized combination of the previous symbol rank, the match length and the partially matched symbol. This results in a notable compression increase on redundant data. Since CMM4, the model's counters are quantized using PAQ's state machine (this will be replaced). Despite the use of hashing, most data structures are tuned to never cross a cache line per nibble (the models) or octet (the mixer); only SSE does. The core compression performance is equivalent to LPAQ1/2, while being faster. In addition there's a filter framework, which currently implements an x86 transform and will be extended.

Compression           Compressed size      Decompresser  Total size   Time (ns/byte)
Program      Opt     enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
-------      ---   ----------  -----------  -----------  -----------  ----- -----  --- ---
cmm1               23,495,627  207,266,867     18,785 x  207,285,652   1165  1198   50 CM
cmm2               23,477,008  208,268,161     17,901 x  208,286,062   1756  1849   32 CM
cmm2 080113        22,303,128  191,477,052     18,263 x  191,495,315   2180  2127  329 CM
cmm3 080207        21,212,766  179,633,451     18,700 x  179,652,151   2328 ~2609  395 CM
cmm4 v0.0          21,459,665  186,395,591     18,042 x  186,413,633   1807  1849  116 CM
cmm4 v0.1e   96    20,569,034  172,669,955     31,314 x  172,701,269   2052  2056 1321 CM
cmm4 v0.2b   87    20,550,129  171,969,035                                        1803 CM  42

.1741 ccm

ccm 1.03a is one of 3 versions of a free file compressor by Christian Martelock, Feb. 11, 2007. It uses context mixing. The 3 versions are ccm (fastest, uses 17 MB memory), ccm_high (slower but better compression), and ccm_extra (best compression, uses 100 MB memory). The programs take no options.

ccm 1.1.1a (Feb. 23, 2007) has only one version.

ccm 1.1.2a (Mar. 2, 2007) includes a ccm_low version using less memory, which was not tested.

ccm 1.20a (Mar. 21, 2007) has only one version.

ccm 1.20d (Apr. 8, 2007) has two versions: ccm using 99 MB memory and ccmx using 210 MB for better compression. Only ccmx was tested.

ccm 1.21 (mirror) (Apr. 22, 2007) includes an option to select memory usage. 7 selects maximum memory, 1300 MB. Only the high compression version (ccmx) was tested.

ccm 1.30 (mirror) was released Jan. 7, 2008. Only ccmx 7 (high compression version, maximum memory) was tested.

Compression           Compressed size      Decompresser  Total size   Time (ns/byte)
Program              enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
-------            ----------  -----------  -----------  -----------  ----- -----  --- ---
ccm       1.0.3a   27,667,346  240,296,736      7,217 x  240,303,953    676   679   17 CM
ccm_high  1.0.3a   25,412,726  221,177,776      7,229 x  221,185,005   1119  1171   17 CM
ccm_extra 1.0.3a   24,027,805  207,273,926      7,230 x  207,281,156   1341  1353  100 CM
ccm       1.1.1a   22,824,629  197,271,467      9,019 x  197,280,486   1247  1252   82 CM
ccm       1.1.2a   22,675,768  195,965,427      8,502 x  195,973,929   1161  1183   83 CM
ccm       1.20a    21,350,295  182,784,655     13,346 x  182,798,001   1794  1801  210 CM
ccmx      1.20d    21,310,303  182,379,461     13,468 x  182,392,929   1383  1485  210 CM
ccmx 7    1.21     20,819,656  174,161,536     21,139 x  174,182,675   1521  1493 1324 CM
ccmx 7    1.30     20,857,925  174,142,092     15,014 x  174,157,106   1313  1338 1332 CM

.1744 bit

bit 0.1 is a free, closed source file compressor by Osman Turan, Dec. 19, 2007. It uses ROLZ optimized for binary files. It takes no options.

bit 0.2b is an archiver, released June 14, 2008. Option -m lwcm selects the compression type (lightweight context mixing). This is the only type supported. Option -mem 9 selects maximum memory. This option ranges from 0 to 9 and uses 3 + 2^(opt+1) MB memory. The program uses order 1, 2, 3, 4, and 6 context mixing with 2 SSE stages as discussed here. Comments by author:

LWCX (Light-Weight Context Mixing) is a codec of BIT Archiver. It is designed to achieve a high compression ratio at acceptable speed (not fast enough currently). LWCX is a bit-wise context mixing scheme which mixes order-n models (orders 0, 1, 2, 3, 4, 6). The statistics are gathered by counters which predict the next bit using a semi-stationary update rule. After gathering the predictions from all models, a neural network (similar to PAQ's neural network) outputs a new mixed prediction. The mixed prediction is processed by a 2D SSE stage which has 32 vertices. Finally, a carryless arithmetic coder codes the given bit with the final prediction.

Most data structures are designed to avoid cache misses. Order-0 and order-1 statistics are stored in a direct lookup table. Statistics for the higher orders (2, 3, 4 and 6) are stored in a large hash table. The hash table size is selected by the "-mem N" option (memory usage is 3+2^(N+1) MB, N ranges from 0 to 9). The codec looks up a hash entry only once per coded nibble.
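A 2D SSE stage with interpolation between vertices might look like the following sketch. It makes several assumptions, none taken from bit's source: the two dimensions are a small context and the input probability; the 32 vertices are spaced evenly along the stretched-probability axis; and the range and learning rate are illustrative. The output interpolates between the two vertices that bracket the input, and both vertices are then moved toward the coded bit.

```python
import math

def stretch(p: float) -> float:
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    x = min(max(x, -40.0), 40.0)
    return 1.0 / (1.0 + math.exp(-x))

class SSE2D:
    """SSE indexed by a small context and by the (stretched) input probability."""

    def __init__(self, n_ctx: int, vertices: int = 32, rate: float = 0.02,
                 lo: float = -8.0, hi: float = 8.0):
        self.lo, self.rate, self.vertices = lo, rate, vertices
        self.step = (hi - lo) / (vertices - 1)
        # Start as the identity map: each vertex outputs the probability it sits at.
        self.t = [[squash(lo + k * self.step) for k in range(vertices)]
                  for _ in range(n_ctx)]

    def refine(self, p: float, ctx: int) -> float:
        pos = (stretch(p) - self.lo) / self.step
        pos = min(max(pos, 0.0), self.vertices - 1.0)
        k = min(int(pos), self.vertices - 2)  # left vertex of the bracketing pair
        self.ctx, self.k, self.frac = ctx, k, pos - k
        row = self.t[ctx]
        return row[k] * (1.0 - self.frac) + row[k + 1] * self.frac

    def update(self, bit: int) -> None:
        row = self.t[self.ctx]
        # Move both bracketing vertices toward the bit, weighted by their share.
        row[self.k] += self.rate * (1.0 - self.frac) * (bit - row[self.k])
        row[self.k + 1] += self.rate * self.frac * (bit - row[self.k + 1])
```

Each context row starts as a pass-through and learns a correction to the mixer's output. This is how SSE can fix systematic over- or under-confidence in particular contexts.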

bit 0.7 has options -p=1 through -p=5 to select memory usage of 10 + 20*2^p MB.

Compressor       Opt      enwik8      enwik9         Prog      Total       Comp Decomp  Mem Alg  Note
---------        ---    ---------   -----------     -------  -----------   ----  ----   --- ---- ----
bit 0.1                 31,186,930  271,705,328    35,400 x  271,740,728    535    83    35 ROLZ
bit 0.2b -m lwcm -mem 9 21,971,587  189,881,180    63,665 x  189,944,845   2708  2747  1052 CM
bit 0.7  -p=5           20,823,204  174,425,039    62,493 x  174,487,532   2050  2100   663 CM   26

.1745 mcomp

mcomp x32 v2.00 is a free, closed source, command line file compressor by Malcolm Taylor (author of WinRK), released Aug. 23, 2008. It uses a large number of algorithms, although not the same ones as WinRK. There is a 32 bit version (mcomp_x32.exe) and a 64 bit version (mcomp_x64.exe) for Windows. Only the 32 bit version was tested (in 32-bit Vista). It displays the following help message:

LibMComp Demo Compressor (v2.00).
Copyright (c) 2008 M Software Ltd.

mcomp [options] pofile(s)

Options:
    -m[..]    Compression method:
              b    - BZIP2.
              c    - Experimental DMC codec.
              d    - Optimised deflate (df - fast, dx - max)
              d64  - Optimised deflate64 (d64f - fast, d64x - max)
              lz   - Optimised LZ (lzf - fast, lzx - max)
              f    - Optimised ROLZ (ff - fast, fx - max)
              f3   - Optimised ROLZ3 (f3f - fast, f3x - max)
              p    - PPMd var.J.
              sl   - Bitstream (LSB first).
              sm   - Bitstream (MSB first).
              w    - Experimental BWT codec.
    -MNN[k,m] Model size (in kb (default) or Mb, default 64M).
    -oNN      Order (for Bitstream and PPMd).
    -np       Display no progress information.

pofile(s) means input file and output file. When run with no compression options, the program decompresses. Test results are as follows on a dual core 2 GHz Pentium T3200 with 3 GB as in note 26.

Compressor Opt                 enwik8      enwik9         Prog      Total       Comp Decomp  Mem Alg  Note
---------  ---               ---------   -----------     -------  -----------   ----  ----   --- ---- ----
mcomp_x32  -mb               29,997,076                                         2070   970     4 BWT  -M has no effect
           -mc               23,546,185                                         1350  1410    50 DMC
           -mc -M512m        22,561,089                                         1520         322 DMC  max memory
           -mdf              fails
           -md               35,436,114                                         2140  1421     4 LZ77 fails
           -mdx              35,383,881                                         2240  1420     4 LZ77 fails
           -md64f            fails
           -md64x            32,983,178                                        28930  1310     4 LZ77 fails
           -mlz              24,648,445                                         3090    50   595 LZ77
           -mf               24,331,132                                         2240    78   149 ROLZ
           -mf -M1800m       23,187,091                                         3320    77   414 ROLZ
           -mfx -M1800m      23,182,541                                         3410    81   414 ROLZ
           -mf3x -M1800m     23,098,116                                         3850   112   415 ROLZ
           -mp -M1800m -o10  21,039,213  177,948,781   172,531 x  178,121,312   4580 12180  1847 PPM
           -mp -M1800m -o12  20,917,657  179,193,238   172,531 x  179,365,769   5180        1847 PPM
           -mp -M1800m -o16  20,868,127  181,150,814   172,531 x  181,323,345   5750        1847 PPM
           -msl -M1800m -o12 54,428,147                                         6510  6480     1 CM?  -M has no effect
           -msm              59,731,673                                         5880  5810     1 CM?  -M has no effect
           -mw               21,805,857  188,095,082   172,531 x  188,267,613    356   232   660 BWT  2 cores
           -mw -M180m        21,103,670  179,838,392   172,531 x  180,010,923    329   284  1850 BWT  2 cores
           -mw -M320m        21,103,670  174,388,351   172,531 x  174,560,882    473   399  1643 BWT  1 core

-mb produces bzip2 compatible format. -M has no effect. Memory usage is fixed at 4 MB.

-mc uses DMC. If -M is set higher than 512m, the program aborts with a failed assertion.

-md and -md64 are supposed to generate deflate and deflate64 formats (zip or gzip). However, -mdf and -md64f (fast modes) crash immediately during compression. The other modes decompress to files that are the correct size but not identical to the original. Run times are very slow because most of the CPU time (up to 90%) is spent in the kernel, as reported by timer 3.01.

-mp uses PPMd var. J, but allows more memory (up to about 1800 MB). The original program was limited to 256 MB. The optimal orders are different for enwik8 and enwik9. Higher orders help compression, but lower orders save memory on larger files. The maximum order is -o16; higher values have no effect. Decompression is slow because 55% of the CPU time is spent in the kernel. Normally this is around 1%, and decompression speed would be the same as compression.

-msl and -msm ignore the -M option and use 1 MB memory, resulting in poor compression.

-mw (experimental BWT) is the only option that uses both cores. All others result in 50% CPU usage on a 2 core processor. The -M option actually selects the block size, not total memory usage. Memory usage is 5x block size if one core is used, or 10x if both are used. Both are used only if enough memory is available. The default is to split the file in half and compress the two halves in parallel. However, better but slower compression can be obtained by using -M to select one block for the whole file. Maximum memory is 2 GB, even if more is available. For enwik9, -M320 selects 3 blocks, which are compressed in series on one core. For two cores, time reported is wall time. Process time for -mw -M320m is 187% of wall time for compression and 139% for decompression.
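The block and core counts described for -mw can be reproduced with simple arithmetic. This sketch makes three assumptions: -M units are MiB (2^20 bytes); each block in progress needs 5x its size; and both cores are used only when 2 blocks (10x block size) fit under the 2 GB cap. These rules are inferred from the notes and table above, not from mcomp's code. The default half-and-half split is not modeled.

```python
import math

MIB = 1 << 20

def mw_plan(file_size: int, block_mib: int, mem_cap_mib: int = 2048):
    """Hypothetical model of -mw -M<block>m: (blocks, uses two cores, memory in MiB)."""
    blocks = math.ceil(file_size / (block_mib * MIB))
    # Two blocks in parallel need 2 x 5x block size; use both cores only if that fits.
    two_cores = blocks >= 2 and 10 * block_mib <= mem_cap_mib
    mem_mib = (10 if two_cores else 5) * block_mib
    return blocks, two_cores, mem_mib
```

For enwik9, -M320m gives 3 blocks, one core and 1600 MB. -M180m gives 6 blocks, two cores and 1800 MB. These are close to the 1643 MB and 1850 MB measured.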

.1749 epmopt | epm

epmopt + epm r9 is an experimental, closed source command line optimizer and file compressor by Serge Osnach, Oct. 16, 2003. It was intended for enc r16, but development on that project stopped at enc r15, according to the web page (in Russian). The program has two parts: epm, a PPM compressor with text preprocessing, and epmopt, which attempts to optimize the parameters to epm by compressing repeatedly and varying the options one at a time until there is no more improvement. The input to epmopt may differ from the input to epm, and epmopt supports optimization on sets of files matching patterns in specified sets of directories. The options to epm are memory limit, PPM order, and 20 undocumented options each specified by a single digit. The exact same options must be passed to the decompresser. In the results, I added 27 bytes to the compressed file sizes to account for this information. enwik9 was compressed and decompressed as follows:

  epmopt -m800 -n20 --fixedorder:12 enwik6 .
  epm c01286014321245957352513 enwik9 enwik9.epm -m800
  epm d01286014321245957352513 enwik9.epm enwik9.tmp -m800
The optimization data was enwik6, the first 10^6 bytes of the input file. epmopt compressed this about 100 times in 368 seconds with different options, making 35 passes through the list of 20 undocumented parameters, adjusting each one up or down one at a time. The fixed parameters were -m800 (800 MB memory limit) and PPM order 12 (--fixedorder:12, which also appears as the first 3 digits, 012, of the parameter string). Allowing epmopt to set the PPM order on a smaller training file will cause it to choose too large a value, hurting compression. I only tested orders 10, 12, and 20 on enwik8, and 12 gave the best compression. The -n20 option tells epmopt to tune all 20 parameters. The parameter string is written to the file enc.ini. The -m800 option need not be the same for epmopt and epm, but it must be the same for epm during compression and decompression.
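The search epmopt performs is a form of coordinate descent. It varies each single-digit parameter up or down one at a time, keeps any change that helps, and stops when a full pass brings no improvement. Here is a minimal sketch, assuming a hypothetical `cost` function. For epmopt, `cost` would compress the training file with the given parameters and return the compressed size.

```python
def optimize_params(cost, params, lo=0, hi=9, max_passes=100):
    """Coordinate descent over digit parameters: try +1 and -1 on each, keep improvements."""
    params = list(params)
    best = cost(params)
    passes = 0
    improved = True
    while improved and passes < max_passes:
        improved = False
        passes += 1
        for i in range(len(params)):
            for delta in (1, -1):
                trial = params.copy()
                trial[i] += delta
                if not lo <= trial[i] <= hi:
                    continue  # each parameter is a single digit
                c = cost(trial)
                if c < best:
                    best, params, improved = c, trial, True
    return params, best, passes
```

Each call to `cost` is a full compression of the training data in the real tool. That is why a small file such as enwik6 is used for tuning.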

Warning: epm failed to decompress correctly on enwik7 (the first 10^7 bytes). In the output, some linefeeds were changed to spaces. This happened with all parameter combinations I tested, including the defaults (epm c enwik7 enwik7.epm). Decompression was bit-exact for enwik5, enwik6, enwik8 and enwik9.

.1749 WinUDA

WinUDA 0.291 is a free, closed source GUI archiver by dwing, July 4, 2005. It uses context mixing and is derived from paq6. Mode 3 is the slowest (about 3x slower than mode 0) and uses the most memory, 194 MB.

.1755 dark

dark v0.51 is a free, closed source archiver by Malyshev Dmitry Alexandrovich, Jan. 2, 2007. It uses BWT + distance coding without preprocessors. The -b333m option selects 333 MB blocks. -f (-f0 in 0.40 and 0.46, not supported in 0.32) forces no segmentation. Memory usage is 5 times the block size for compression (6x prior to v0.46).

opendark ver. A is an open source version of dark. The supplied Windows dark.exe crashed when decompressing enwik9 (size is 177,675,818). Decompression works up to -b127m. opendark does not support the -f option.

                             Compression      Compressed size      Decompresser  Total size   Time (ns/byte)
Program                        Options       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------                        -------     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
dark 0.32b  July  9, 2006      -b128m      21,414,479  185,844,554     31,076 x  185,875,590    481   407  790 BWT
dark 0.40b  Aug. 14, 2006      -b128mf0    21,243,259  184,271,115     34,688 x  184,305,803    471   316  790 BWT
dark 0.46   Aug. 23, 2006      -b160mf0    21,231,325  181,904,374     40,780 x  181,945,154    488   404  813 BWT
                               -b333mf0    21,231,325  175,955,412     40,780 x  175,996,192    432   425 1692 BWT
opendark A  Nov. 14, 2006      -b333m      21,432,727    (fails)       10,089 s                 450   390 1692 BWT
                               -b127m      21,432,727  185,985,101     10,089 s  185,995,190    389   331  652 BWT  26
dark 0.51   Jan.  2, 2007      -b333mf     21,169,819  175,471,417     34,797 x  175,506,214    533   453 1692 BWT

.1760 FreeArc

FreeArc 0.36 is a free, open source archiver by Bulat Ziganshin, Feb. 21, 2007. It incorporates 7 compression libraries: PPMd, GRZipII, and LZMA (7zip), plus the BCJ (7zip), REP (rzip-like), dynamic dictionary and LZP preprocessors. The option -m9 selects maximum compression (dict + LZP + PPMd for text files, REP + LZMA for binary). -lc1600000000 limits memory to 1.6 GB (same as -lc1600m). There is an option to use ppmonstr as an external compressor, which was not included in the test.

FreeArc 0.40 pre-4 is a free, open source archiver by Bulat Ziganshin, Dec. 16, 2007. It compresses using ppmd, GRZipII, and LZMA along with multimedia filters, a dictionary preprocessor and a REP preprocessor for removing repeating strings. It has Windows and Linux versions and an optional GUI.

ppmd generally gives the best compression for text. It will also call ppmonstr as an external program, but this mode was not tested, even though it compresses better.

For this test, the Windows command line version was tested. The option -mppmd:1012m:o13:r1 is equivalent to ppmd -m1012 -o13 -r1, selecting 1012 MB memory, order 13, and partial reinitialization of the model when memory is exhausted. Note that ppmd normally allows only up to -m256. This program was tested with 2 GB memory but values higher than -m1012 caused the program to crash during compression.

FreeArc 0.666 was released May 19, 2010. The 32 bit Windows console version was tested. -m9 selects maximum compression. There are many other compression options but these were not tested.

FreeArc 0.67a was released Mar. 15, 2014. Options -m1 to -m9 select the compression level from fastest to best. -m1x to -m9x select levels with fast decompression. Decompression was tested with the separate unarc.exe program.

                             Compression      Compressed size      Decompresser  Total size   Time (ns/byte)
Program                        Options       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------                        -------     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
FreeArc 0.36        -m9 -lc1600000000      21,153,231  184,498,111    372,457 s  184,870,568    665   517 1600 PPM
FreeArc 0.40 pre-4  -mppmd:1012m:o13:r1    20,931,605  175,254,732    748,202 x  176,002,934   1175  1216 1046 PPM
FreeArc 0.666       -m9                    21,659,587  189,696,374  1,214,530 x  190,910,004    524   416  785 PPM  26
FreeArc 0.67a       -m1                    39,485,049                                            25    27  191      26
                    -m2                    26,831,928                                            59   121  117      26
                    -m3                    25,221,359                                           147   100  157      26
                    -m4                    24,285,483                                           174   132  155      26
                    -m5                    23,020,671                                           410   443  311      26
                    -m6                    21,659,587                                           570   471  463      26
                    -m7                    21,659,587                                           592   477  463      26
                    -m8                    21,659,587                                           604   495  448      26
                    -m9                    21,659,587  189,696,374    148,665 xd 189,845,039    519   420  813      26
                    -m1x                   39,485,049                                            27    25  194      26
                    -m2x                   34,307,417                                            73    28  170      26
                    -m3x                   27,336,122                                           269    32  186      26
                    -m4x                   25,652,947                                           357    45  189      26
                    -m5x                   24,897,495                                           564    43  204      26
                    -m6x                   23,870,179                                           522    41  453      26
                    -m7x                   23,788,636                                           546    41  599      26
                    -m8x                   23,788,633                                           565    41  584      26
                    -m9x                   23,788,633                                           567    41  584      26

.1766 hook

hook v0.2 is a free, open source (GPL) command line file compressor by Nania Francesco Antonio, Jan. 8, 2007. It uses DMC: a state machine in which each state represents a bitwise context. Each state has 2 outgoing transitions corresponding to next bits 0 and 1, and a count n0 or n1 associated with each transition. Bit y (0 or 1) is compressed by arithmetic coding with probability ny/(n0+n1) (where ny is n0 or n1 according to y), and then ny is incremented.

After each input bit, the next state represents a context obtained by appending that bit on the right and possibly dropping bits on the left. States are cloned (copied) whenever the incoming and outgoing counts exceed certain limits. This has the effect of creating a new context in which no bits are dropped. In the example below, the state representing context 110 (dropping 2 bits from the previous context) is cloned by creating a new state 11110 because the incoming 0 transition count (ny for y=0) from state 1111 exceeded a limit. The new context is longer because it does not drop any bits. This transition is moved to point to the new state. Other incoming transitions (not shown) remain pointing to the original state. The outgoing transitions are copied. The counts of the original state are distributed to the new state in proportion to the moved transition's contribution to those counts, which is w = ny/(n0+n1).

                n0 ----> 1100           n0*(1-w) ----> 1100
         ny       /                             /     /
   1111 -----> 110               1111        110     /
        (y=0)     \                 |           \   /
                n1 ----> 1101       |   n1*(1-w) ----> 1101
                                    |             /    /
                                    |     n0*w   /    /
                                    | ny        /    /
                                    +----> 11110    /
                                                \  /
                                          n1*w   --

        Before cloning            After cloning 110 to 11110

Normally, the initial set of contexts begins on byte boundaries. The cloning mechanism ensures that new contexts also have this property.

In hook v0.2, the counts are 32 bit floating point numbers initialized to 0.1. The initial state machine has 256*255 states representing bytewise order 1 contexts with uniform statistics. When memory is exhausted, the model is discarded and the state machine is reinitialized. A new state is cloned when ny > limit and n0+n1-ny > length, where limit and length are parameters. The optimal parameters for enwik8 and enwik9 are "c 7 2 6", c means compress, 7 selects the maximum of 1 GB memory (64M states at 16 bytes each, minimum is 8 MB memory), 2 is the limit (range 1 to 7), and 6 selects a length of 32 (possible values are 1, 2, 3, 4, 8, 16, 32, 64). Larger lengths are better for large files because they conserve memory at the expense of compression.
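Here is a Python sketch of the DMC model described above, with several simplifications that are assumptions. The initial machine is an order-0 binary tree of 255 states, whereas hook v0.2 starts with 256*255 order-1 states. Counts are floats starting at 0.1, as in v0.2. The cloning rule is read as follows: the state t reached from s by bit y is cloned when n_y(s) > limit and n0(t)+n1(t) - n_y(s) > length. When that happens, t's counts are split in proportion w = n_y(s)/(n0(t)+n1(t)). Instead of an arithmetic coder, the sketch sums the ideal code length -log2 P(y). The class and function names and the default parameter values are hypothetical.

```python
import math

class DMC:
    """Dynamic Markov coding with state cloning (simplified sketch)."""

    def __init__(self, limit=2.0, length=2.0, max_states=1 << 16, init=0.1):
        self.limit, self.length = limit, length
        self.max_states, self.init = max_states, init
        self.reset()

    def reset(self):
        # Initial machine: a binary tree of 255 states, one per partial byte
        # (order 0). Node j (1..255) is stored as state j - 1; its children are
        # nodes 2j and 2j+1, and the 8th bit of a byte returns to the root.
        self.n = []    # per-state counts [n0, n1]
        self.nxt = []  # per-state successors [after bit 0, after bit 1]
        for j in range(1, 256):
            self.n.append([self.init, self.init])
            self.nxt.append([0 if 2 * j + b >= 256 else 2 * j + b - 1 for b in (0, 1)])
        self.state = 0

    def p1(self):
        c0, c1 = self.n[self.state]
        return c1 / (c0 + c1)

    def update(self, y):
        s = self.state
        t = self.nxt[s][y]
        ny = self.n[s][y]
        total = self.n[t][0] + self.n[t][1]
        # Clone t when this transition is busy and t is also reached from elsewhere.
        if ny > self.limit and total - ny > self.length and len(self.n) < self.max_states:
            w = ny / total                      # this transition's share of t's counts
            self.n.append([c * w for c in self.n[t]])
            self.n[t] = [c * (1 - w) for c in self.n[t]]
            self.nxt.append(list(self.nxt[t]))  # clone keeps t's outgoing transitions
            t = self.nxt[s][y] = len(self.n) - 1
        self.n[s][y] += 1
        self.state = t

def code_length_bits(data: bytes, model: DMC) -> float:
    """Ideal arithmetic-coding cost of data, in bits, under the adaptive model."""
    bits = 0.0
    for byte in data:
        if len(model.n) >= model.max_states:  # memory full: discard and restart
            model.reset()
        for i in range(7, -1, -1):
            y = (byte >> i) & 1
            p = model.p1()
            bits -= math.log2(p if y else 1.0 - p)
            model.update(y)
    return bits
```

Cloning the root for each preceding byte turns the order-0 tree into order-1 and then longer contexts. The model grows only where the data is repetitive enough to pass the limit and length tests.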

hook v0.3 (Jan. 11, 2007) allows up to 1.8 GB memory (first option = 9) and uses double precision predictions in the 32 bit arithmetic coder.

hook v0.3a (Jan. 12, 2007) initializes the counts to 0.125 (instead of 0.1) and uses 24 bit precision in the arithmetic coder (instead of 32 bit).

hook v0.4 (Jan. 15, 2007) initializes counts to 0.1. Argument 2 selects length 3 (not 2).

hook v0.5b (Jan. 22, 2007) adds an LZP preprocessor. If the next byte to be coded is the same as the byte that occurred in the last matching 3 byte context, then this is indicated by coding a flag bit in an order 3 model (32 MB memory), and a match length coded by DMC with a fixed size of 128 MB. If there is no match, then the literal byte is coded by another variable sized DMC model. The parameters "c 1600000000 2 64 1 6" select compression (c), 1.6 GB for the DMC literal model (1600000000), a limit of 2 (minimum count for the cloned state), length of 64 (minimum remaining count for the state to be cloned), LZP selected (1), and a minimum match length of 6.
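The LZP idea can be shown with a simple parser that emits literal and match tokens. This sketch looks up the order-3 context in a dictionary (standing in for a hash table) to find where that context last occurred. It emits a match only if at least `min_len` bytes agree from there. Unlike hook, it does not entropy-code the flags, lengths or literals, and the token format is hypothetical. The decoder rebuilds the same table from the bytes it has already output, so no positions need to be transmitted.

```python
def lzp_encode(data: bytes, order: int = 3, min_len: int = 6):
    """Parse data into ("lit", byte) and ("match", length) tokens."""
    data = bytes(data)
    table = {}  # order-byte context -> position that most recently followed it
    tokens = []
    i, n = 0, len(data)
    while i < n:
        length = 0
        if i >= order:
            p = table.get(data[i - order:i])
            if p is not None:
                # Count how far the bytes after the previous occurrence predict ours.
                while i + length < n and data[p + length] == data[i + length]:
                    length += 1
        if length >= min_len:
            tokens.append(("match", length))
            step = length
        else:
            tokens.append(("lit", data[i]))
            step = 1
        for j in range(i, i + step):  # update the table for every position consumed
            if j >= order:
                table[data[j - order:j]] = j
        i += step
    return tokens

def lzp_decode(tokens, order: int = 3) -> bytes:
    out = bytearray()
    table = {}
    for kind, value in tokens:
        i = len(out)
        if kind == "lit":
            out.append(value)
            step = 1
        else:
            p = table[bytes(out[i - order:i])]  # same prediction the encoder used
            for k in range(value):
                out.append(out[p + k])  # byte by byte, so overlapping copies work
            step = value
        for j in range(i, i + step):
            if j >= order:
                table[bytes(out[j - order:j])] = j
    return bytes(out)
```

Because only the match length is sent, a long repeat costs a flag and a length instead of many literals. That is the speed and size benefit LZP gives ahead of a slower literal model.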

hook v0.6 (Feb. 7, 2007) removes the "length" parameter (effectively infinite). The arguments "c 1600 4 1 6" mean to compress (c), use 1600 MB memory, set the "limit" parameter to 4, turn on LZP preprocessing (1) with a minimum match length of 6. The "limit" parameter is the minimum count for an outbound DMC state transition to clone the state. Limit was tuned on enwik8.

hook v0.6b (Feb. 8, 2007) includes support for files up to 2^64 bytes. It was compiled by Ilia Muraviev; earlier versions were compiled with MinGW g++ 3.4.5 by Matt Mahoney. The "limit" parameter was tuned on both enwik8 and enwik9. Higher values conserve memory at the expense of compression on smaller files.

hook v0.6c (Feb. 14, 2007) stores the input filename in the compressed file and uses it during decompression.

hook v0.7 (Mar. 10, 2007) uses 325 MB more memory than advertised, so it was tested with a lower option.

hook v0.7b (Mar. 12, 2007) reduces the excess memory to 94 MB.

hook v0.8 was released Mar. 17, 2007. Some additional results on enwik9 with different values of the limit parameter (the second argument), which changes the rate at which the state machine fills up and is flushed:

hook08 params    enwik9
------------  -----------
c 1700 1 1 6  183,175,857
c 1700 2 1 6  181,578,888
c 1700 3 1 6  181,220,553
c 1700 4 1 6  181,268,867
c 1700 5 1 6  181,197,310
c 1700 6 1 6  181,567,697
c 1700 7 1 6  181,813,763
c 1700 8 1 6  182,360,391

hook v0.8b (Mar. 18, 2007) has some LZP improvements.

hook v0.8c (Mar. 19, 2007) is a minor bug fix. Compressed sizes are 1 byte larger than v0.8b.

hook v0.8d was released Mar. 21, 2007.

hook v0.8e was released Mar. 27, 2007.

hook v0.9 (Apr. 6, 2007) is closed source. It requires a processor that supports SSE instructions. It has some speed improvements and an E8/E9 filter for improved compression of .exe files. Memory usage is the second argument (in MB) plus 60 MB.
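An E8/E9 filter rests on the fact that x86 CALL (opcode 0xE8) and JMP (0xE9) instructions are followed by a 32 bit little-endian offset relative to the instruction's position. Converting these offsets to absolute addresses makes repeated calls to the same function produce identical byte strings, which a context model can then predict. hook v0.9's filter is closed source, so this is only a minimal Python sketch of a common variant; the function name e8e9 and the choice of adding the opcode position (rather than the end of the instruction) are assumptions.

```python
def e8e9(data: bytes, encode: bool = True) -> bytes:
    """Convert CALL/JMP rel32 operands to absolute form (encode) or back (decode)."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in (0xE8, 0xE9):              # CALL rel32 / JMP rel32 opcode
            rel = int.from_bytes(buf[i + 1:i + 5], "little")
            delta = i if encode else -i         # add or remove the position
            buf[i + 1:i + 5] = ((rel + delta) & 0xFFFFFFFF).to_bytes(4, "little")
            i += 5                              # skip the operand in both directions
        else:
            i += 1
    return bytes(buf)
```

The opcode bytes themselves are never changed and operands are skipped the same way in both directions, so the decoder visits exactly the positions the encoder did and the transform is exactly reversible, even on data that is not x86 code.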

freehook 0.2 is an open source port of hook v0.8e from C++ to C by Eugene Ortmann, Apr. 7, 2007. The supplied .exe file requires SSE instructions (Pentium 3 or higher), but the source can be recompiled for other processors.

hook v0.9b (Apr 10, 2007) replaces floating point arithmetic with integer arithmetic, so that archives are compatible across different processors. Note: I reduced the memory setting from 1800 to 1700 to prevent disk thrashing, which was a problem in earlier tests. I will do this from now on. This hurts enwik9 compression (but not enwik8) slightly, from 180,444,546 to 180,582,601. Actual memory usage is 60 MB over.

freehook 0.3 (Apr 10, 2007) has only very minor changes from 0.2 but is slightly faster due to different g++ compiler options. Compression is the same as 0.2. Memory usage is about 160 MB over.

hook v0.9c (May 8, 2007) has some speed improvements in the arithmetic coder. It compresses the same size as v0.9b.

hook v1.0 (Sept. 20, 2007) is closed source. The only option is memory size in MB.

The zip file linked above contains all versions (C++ source and Win32 .exe).

hook 1.1 (Nov. 13, 2007) improves BMP and WAV compression.

hook 1.3 was released Dec. 14, 2007, modified Dec. 15, 2007.

hook 1.4 was released Apr. 29, 2009.

Compression                             Compressed size      Decompresser  Total size   Time (ns/byte)
Program       Options                  enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
-------       -------                ----------  -----------  -----------  -----------  ----- -----  --- ---
hook v0.2     c 7 2 6                23,628,061  208,211,084      2,556 s  208,213,640    772   779 1052 DMC
hook v0.3     c 9 2 6                23,548,017  202,024,740      3,567 s  202,028,307    849   864 1764 DMC
hook v0.3a    c 9 2 6                23,499,700  201,934,976      3,555 s  201,938,531    862   832 1764 DMC
hook v0.4     c 9 2 6                23,349,695  199,829,234      4,112 s  199,833,346    934   959 1764 DMC
hook v0.5b    c 1600000000 2 64 1 6  22,806,402  193,227,085      5,113 s  193,232,198   1084  1029 1764 LZP+DMC
hook v0.6     c 1600 4 1 6           22,472,884  191,733,561      5,112 s  191,738,673   1146  1034 1600 LZP+DMC
hook v0.6b    c 1600 4 1 6           22,535,069  189,932,778      5,174 s  189,937,952   1040       1600 LZP+DMC
              c 1600 6 1 6           22,776,927  188,384,238      5,174 s  188,389,412   1090  1026 1600
hook v0.6c    c 1600 6 1 6           22,561,621  188,081,694      5,878 s  188,087,572   1131  1092 1600 LZP+DMC
hook v0.7     c 1000 6 1 6           22,410,669  191,516,313      6,195 s  191,522,508   1360  1353 1375 LZP+DMC
hook v0.7b    c 1700 6 1 6           22,404,817  184,765,030      6,195 s  184,771,225   1516  1655 1794 LZP+DMC
hook v0.8     c 1700 5 1 6           22,290,033  181,197,310      6,686 s  181,203,996   1110  1118 1700 LZP+DMC
hook v0.8b    c 1700 5 1 6           22,399,354  180,335,788      6,944 s  180,342,732    988  1033 1700 LZP+DMC
hook v0.8c    c 1700 5 1 6           22,399,355  180,335,789      7,071 s  180,342,860   1043  1005 1700 LZP+DMC
hook v0.8d    c 1700 5 1 6           22,399,027  180,319,203      7,037 s  180,326,240    928   915 1700 LZP+DMC
hook v0.8e    c 1700 3 1 6           22,039,935  178,140,788      7,263 s  178,148,051    952  1009 1700 LZP+DMC
hook v0.9     c 1800 2 1 6           21,969,342  178,932,435     10,069 x  178,942,435    869       1860 LZP+DMC
              c 1800 3 1 6           22,077,883  178,599,478     10,069 x  178,609,547    833   916 1860 LZP+DMC
freehook 0.2  c 1700 3 1 6           22,039,914  178,141,036      7,386 s  178,148,422    813   855 1860 LZP+DMC
hook v0.9b    c 1700 3 1 6           22,496,910  180,582,601      9,278 x  180,591,879    810   810 1721 LZP+DMC
freehook 0.3  c 1600 3 1 6           22,039,914  178,619,149      7,352 s  178,626,501    789   818 1713 LZP+DMC
hook v0.9c    c 1700 3 1 6           22,496,910  180,582,601      8,506 x  180,591,107    774   791 1721 LZP+DMC
hook v1.0     c 1700                 22,122,484  177,843,658     11,163 x  177,854,821    865   879 1739 LZP+DMC
hook v1.1     c 1700                 22,122,484  177,843,658     25,854 x  177,869,512    877   872 1739 LZP+DMC
hook v1.3     c 1700                 22,030,108  178,216,980     13,870 x  178,230,850    825   835 1736 LZP+DMC
hook v1.4     c 1700                 21,990,502  176,648,663     37,004 x  176,685,667    741   695 1777 LZP+DMC

.1789 7zip

7zip 4.42 is an open source GUI and command line archiver by Igor Pavlov, May 14, 2006. It compresses to 7z, zip, gzip, ppmd.H and tar formats, optionally encrypts with AES, and will uncompress several other formats.

7z is the default format. It uses LZMA compression, a variation of LZ77. The option -mx=9 selects ultra (maximum) compression in this mode. The option -sfx7zCon.sfx creates a console-based self-extracting executable by prepending a 131,584 byte decompresser. This is slightly smaller than the Windows GUI version (132,096 bytes) and much smaller than the decompression program itself as a zipped self-extracting download (817,795 bytes). The best compression is with ppmd. The options -m0=ppmd:mem=768m:o=10 are equivalent to ppmd var. H (with minor changes) at order 10 with 768 MB memory. 7zip 4.46a was announced May 21, 2007 (its improved compression is due to testing with more memory).

7zip 9.04a was released Dec. 3, 2009. It gave an out of memory error with mem=1630.

7zip 9.20 was released Nov. 18, 2010. Default (LZMA) mode was tested. It uses 196 MB for compression using 75% of 2 cores, and 18 MB for decompression on a 2.0 GHz T3200 under Windows.

The following include the best known option combinations for 7zip on enwik8 in ppmd (PPM), 7z (LZMA), bzip2 (BWT) and zip (LZ77) formats.

                Compression                         Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Alg  Notes
-------           -------                        ----------  -----------  -----------  -----------  ----- -----  ---  -----
7zip 4.42 -m0=ppmd:mem=768m:o=10 -sfx7zCon.sfx   21,375,060  185,043,783          0 xd 185,043,783    505  ~500  PPM
7zip 4.42 -m0=ppmd:mem=293m:o=7                  21,791,628                                           647   655  PPM   6
7zip 4.42 -mx=9 -sfx7zCon.sfx                    24,996,113  213,490,979          0 xd 213,490,979   2286    63  LZMA
7zip 4.42 -tbzip2 -mpass=2                       29,003,844                                          1974   176  BWT   6
7zip 4.42 -tzip -mm=deflate64 -mfb=153 -mpass=8  33,727,442                                          2803    28  LZ77  6
7zip 4.42 -tzip -mm=deflate -mfb=171 -mpass=8    35,056,389                                          2672    27  LZ77  6
7zip 4.42 -tzip -mm=deflate -mfb=258 -mpass=8    35,057,040                                          2664    29  LZ77  6
7zip 4.42 Zip/Ultra (in GUI)                     35,057,347                                          4307        LZ77  1
7zip 4.46a -m0=ppmd:mem=1630m:o=10 -sfx7zCon.sfx 21,197,559  178,965,454          0 xd 178,965,454    503   546  PPM
7zip 9.04a -m0=ppmd:mem=1500m:o=10 -sfx7zCon.sfx 21,211,895  179,209,403          0 xd 179,209,403    506   520  PPM   26
7zip 9.12b -m0=ppmd:mem=2048m:o=10               21,060,863  177,187,967                                         PPM   42
7zip 9.20                                        25,895,909  227,905,645    518,536 x  228,424,181   1031     42 LZMA  26

.1803 pimple2

pimple 1.43 beta is a free, closed source GUI archiver by Ilia Muraviev, Apr. 24, 2006. It uses context mixing.

pimple2 is a command line file compressor, June 11, 2007.

                Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------                     ----------  -----------  -----------  -----------  ----- -----  --- --- ----
pimple 1.43 beta  512MB, order 8, match 32    20,992,830  181,998,817    353,472 x  182,352,259   9638 10112  512 CM    3
pimple2           (none)                      20,871,457  180,251,530     78,642 x  180,330,172  18474 17992  128 CM          

.1807 ash

ash 04a is a free, experimental command line file compressor by Eugene D. Shelwien, Dec. 5, 2003. The /m700 option selects a 700 MB memory limit (/m800 causes disk thrashing with 1 GB). /o10 selects model order 9. This gives good results on smaller files when memory is constrained, but I did not try to optimize it. There is also a /s option to select SSE depth; the default value of /s5 gives good results, so I did not try to optimize it either. Other results:

ash04a options           enwik9    Comp (ns/byte)
----------            -----------  ----
/m700 /o8  (order 7)  180,830,523  5883
/m700 /o10 (order 9)  180,735,542  6011
Note: the actual memory usage (commit charge) for enwik9 /m700 /o8 was 1910 MB at the end of compression, minus 257 MB for other programs, according to Windows Task Manager. This is generally not a problem if your swap file is large enough. It appears to be a slow memory leak (recovered when the program exits) and does not cause thrashing.

ash /m1700 /o10 and /o12 failed to compress enwik9 with 2 GB memory (error: could not allocate a block). enwik8 compressed to 19,713,239 using /o10 and 19,446,859 using /o12.

.1813 tree

tree 0.1 is a free, experimental, open source compressor by Kennon Conrad, Mar. 31, 2014. It is designed specifically to compress enwik9 and is not a general purpose compressor. The compressor is 3 separate programs. The first, TreeCapEncode.c, converts upper case letters to lower case plus special symbols. It takes 4 minutes. The second, TreeCompress.c, uses a suffix tree to parse the input into tokens. It takes 3 days, 21 hours, 37 minutes and uses 1850 MB memory. The third, TreeBitEncode.c, encodes the tokens using variable length codes. This takes 27 seconds. The decoder, TreeDecode.c, takes 22 seconds using 400 MB memory. Compressed size depends on available memory, so the results below are machine dependent.
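The capital conversion step can be illustrated with a simple escape scheme: each upper case ASCII letter becomes a flag byte followed by its lower case form, so "The" and "the" share the same statistics. This is a minimal Python sketch; the flag values (0x01 for a capital, 0x02 to escape a literal 0x01 or 0x02) and the function names are hypothetical, and the special symbols TreeCapEncode.c actually uses may differ.

```python
CAP, ESC = 0x01, 0x02  # hypothetical flag bytes


def cap_encode(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A:        # 'A'..'Z' -> flag + lower case letter
            out += bytes((CAP, b + 0x20))
        elif b in (CAP, ESC):        # escape the flag bytes so decoding stays exact
            out += bytes((ESC, b))
        else:
            out.append(b)
    return bytes(out)


def cap_decode(data: bytes) -> bytes:
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == CAP:
            out.append(next(it) - 0x20)  # restore the upper case letter
        elif b == ESC:
            out.append(next(it))         # literal flag byte
        else:
            out.append(b)
    return bytes(out)
```

For example, cap_encode(b"The") returns b"\x01the".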

tree 0.3 was released Apr. 27, 2014. It uses a model that only parses whole words with a leading space.

tree 0.4 was released May 21, 2014.

tree 0.5 was released May 25, 2014.

tree 0.9 was released July 5, 2014. It includes a multi-threaded decompression program for better speed. TreeCapEncode.c is now TreePreEncode.c and runs in 11 seconds.

                Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------       ----------  -----------  -----------  -----------  ----- -----  --- --- ----
tree 0.1                                    187,985,256      6,656 sd 187,985,256 337287    22 1850 Dict 64
                                23,660,364  187,933,399      6,656 sd 187,940,055 174589    12 1850 Dict 65
tree 0.2                        23,250,856  185,311,980                           337287    22 1850 Dict 64
tree 0.3                        23,233,932  184,838,711      6,591 sd 184,845,302 105728    23 1850 Dict 64
tree 0.4                        23,178,500  184,312,072      7,216 sd 184,319,288  68866    22 1850 Dict 64
tree 0.5                        23,084,884  181,375,076      8,271 sd 181,383,347  68869    22 1850 Dict 64
tree 0.9                        22,366,748  181,324,992      7,104 sd 181,332,096  70723    15 1850 Dict 64

.1823 ocamyd

ocamyd 1.65.final is a free, open source command line file compressor by Frank Schwellinger, May 25, 2006. It uses DMC. The -s0 option selects slowest (maximum) compression. The -m8 option selects 800 MB memory (maximum is -m9 = 900 MB).

ocamyd LTCB 1.0 is a modification by Mauro Vezzosi on June 20, 2006 of Frank Schwellinger's ocamyd-1.65-final. The option -s0 selects maximum compression. -m3 selects 300 MB memory (the maximum for the test machine), but it supports up to -m8.

ocamyd 1.66.final, by Frank Schwellinger, Feb. 1, 2007, includes the -f option to prevent flushing and rebuilding the DMC model when memory is exhausted.

                Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------        ----------  -----------  -----------  -----------  ----- -----  --- --- ----
ocamyd 1.65.final -s0 -m8        21,456,536  185,727,437     20,618 x  185,748,055  50782 50935  800 DMC
ocamyd LTCB 1.0   -s0 -m3        21,285,121  182,359,986     21,030 x  182,381,016 108960 ~110000 300 DMC   6
ocamyd 1.66.final -s0 -m3 -f     21,123,280  182,410,035     20,636 x  182,430,561  59130 59637  300 DMC   6

The following table shows the effect of the -s and -m options on ocamyd 1.65.final on enwik8. Times are in ns/byte, process (kernel+user) time by timer 3.01, ~ indicates global (wall) time.

Options    enwik8    Comp  Decomp  Notes
-------  ----------  -----  -----  -----
-s0 -m8  21,456,536  42030  42010

-s0 -m4  22,073,527  70482  70538  6 (400 MB) (~101015 ~92921 global time)
-s1 -m4  23,944,647 ~33535         6
-s2 -m4  26,345,297  ~1940         6
-s3 -m4  28,060,900  ~1826         6

-s0 -m3  22,296,826 ~70960         6 (300 MB)
-s1 -m3  24,114,574 ~33818         6
-s2 -m3  26,911,154  ~1603         6
-s3 -m3  28,278,662  ~1514         6

-s0 -m2  22,688,950 ~70172         6 (200 MB)
-s1 -m2  24,511,065 ~33771         6
-s2 -m2  27,614,083  ~1562         6
-s3 -m2  28,928,850  ~1448         6

-s0 -m1  23,487,047 ~68522         6 (100 MB)
-s1 -m1  25,280,406 ~33277         6
-s2 -m1  29,045,902  ~1509         6
-s3 -m1  30,080,719  ~1408         6

-s0 -m0  24,210,216 ~66463         6 (64 MB)
-s1 -m0  25,882,226 ~33121         6
-s2 -m0  30,591,255  ~1481         6
-s3 -m0  31,276,535  ~1377         6

.1824 bee

bee 0.78 build 0154 is an open source (Delphi Object Pascal) command line archiver (with optional GUI) by Andrew Filinsky and Melchiorre Caruso, Sept. 23, 2005. It uses PPM. The -m3 option selects maximum compression (default is -m1). The -d8 option selects 512 MB memory, the maximum that does not cause disk thrashing (default is -d2 = 10 MB).

bee includes beeopt, a parameter optimizer similar to epmopt. This was not tested. bee comes preconfigured with parameters trained on .txt and .xml files (and other types) in the file bee.ini. This was tested by renaming enwik7 (first 10^7 bytes) to enwik7.txt and enwik7.xml, but compression was worse. The decompresser size is that of a zip archive containing bee.exe and bee.ini, which is much smaller than the zipped source code download.

.1826 st

st 0.51 is a free, closed source file compressor by Stefan Gedo, Oct. 15, 2010. It uses PPM. It has 3 compression levels, -n (normal), -f (fast) and -b (best). The memory used depends on the file size and is higher for decompression (Dmem) than compression (Cmem).

st 0.81 was released July 12, 2012.

Compression           Compressed size      Decompresser  Total size   Time (ns/byte)
Program      Opt     enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  CMem Dmem Alg Note
-------      ---   ----------  -----------  -----------  -----------  ----- -----  ---- ---- --- ----
st 0.51       -f   37,570,294                                           211   218   135  198 PPM  26
              -n   28,054,812                                           322   341   136  207 PPM  26
              -b   27,426,599                                           498   526   167  238 PPM  26
              -b   27,426,599  241,665,756      44,447 x 241,710,203    432   464  1259 1413 PPM  26
st 0.81       -f   37,625,424                                           740   780    52   53 PPM  26
              -n   26,081,061                                           720   950    96   97 PPM  26
              -b   21,589,955  182,668,405      13,724 x 182,682,129   1344  1356  1809 1810 PPM  26
              -b   21,589,955  182,668,434      13,629 x 182,682,063   1165  1200  1800 1800 PPM  56

.1829 uhbc

uhbc 1.0 is an experimental, closed source command line file compressor by Uwe Herklotz, June 30, 2003. It uses BWT. The -b100m option selects 100 MB block size, which requires 800 MB for compression and 500 MB for decompression. -m3 selects maximum compression for the entropy coding stage, which consists of run length coding (RLE) + DWFC (double weighted frequency counting) + entropy coding. WFC is described in Deorowicz, S., Improvements to Burrows–Wheeler compression algorithm, Software–Practice and Experience, 2000; 30(13):1465–1483.

Additional results on enwik8:

Options                                     enwik8 size  Comp  Decomp (ns/byte)
-----------------------------------------   -----------  ----  ------
-m3 -b100m (one 100 MB block)                20,930,838  1145   858
-m3 (default block size is 5 MB)             24,296,345   914   733
-m2 (RLE + WFC + entropy coding, default)    24,411,843   806   644
-m2 -cp (prefix sort, default is suffix)     24,589,110   813   578
-m1 (RLE + MTF (move to front) + entropy)    25,021,683   680   547
-m0 (RLE + direct entropy coding)            25,341,274   603   500
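The move-to-front stage of the -m1 mode replaces each byte by its position in a recency list and then moves it to the front, so the runs of repeated symbols typical of BWT output become runs of zeros that the RLE and entropy coding stages handle well. This is a minimal Python sketch of plain MTF; uhbc itself is closed source, and its default WFC stage ranks symbols by weighted frequency rather than simple recency, which this sketch does not show.

```python
def mtf_encode(data: bytes) -> list[int]:
    table = list(range(256))      # recency list, most recent first
    ranks = []
    for b in data:
        r = table.index(b)        # current position of b in the list
        ranks.append(r)
        table.pop(r)
        table.insert(0, b)        # move b to the front
    return ranks


def mtf_decode(ranks) -> bytes:
    table = list(range(256))
    out = bytearray()
    for r in ranks:
        b = table.pop(r)          # the symbol at rank r
        out.append(b)
        table.insert(0, b)
    return bytes(out)
```

For example, mtf_encode(b"aaabbb") returns [97, 0, 0, 98, 0, 0]: each run costs one nonzero rank followed by zeros.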

.1831 smac

smac v1.8 (discussion) is a free, experimental file compressor for Windows by Jean-Marie Barone, Jan. 22, 2013. It uses an order-4 bitwise context model and arithmetic coding. It takes no options. Source code is in x86 assembler.

smac v1.9, Jan. 31, 2013, uses an order 4 and order 6 context model and chooses at each bit the model whose prediction is further away from 1/2.

smac v1.10, Feb. 7, 2013, uses a nonstationary model like PAQ6. When a bit count is incremented and the count of the other bit value exceeds 2, half of that excess over 2 is discarded.
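The nonstationary update rule fits in a few lines. In this minimal Python sketch (the function name is hypothetical), observing a bit increments its count, and if the opposite count exceeds 2 it is reduced to 2 plus half of the excess, which is n // 2 + 1 in integer arithmetic. The effect is that old statistics are forgotten quickly when the data changes.

```python
def nonstationary_update(n0: int, n1: int, bit: int) -> tuple[int, int]:
    """Update bit counts, discarding half of the opposite count above 2."""
    if bit:
        n1 += 1
        if n0 > 2:
            n0 = n0 // 2 + 1   # same as 2 + (n0 - 2) // 2
    else:
        n0 += 1
        if n1 > 2:
            n1 = n1 // 2 + 1
    return n0, n1


# After a long run of 0s, three 1s quickly rebalance the counts.
counts = (10, 0)
for bit in (1, 1, 1):
    counts = nonstationary_update(*counts, bit)
print(counts)  # (3, 3); plain counting would give (10, 3)
```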

smac v1.11, Feb. 18, 2013, switches between order 6, 4, and 3 context models depending on which prediction is furthest away from 1/2. For files smaller than 5 MB, it switches between lower order contexts.

smac v1.12a, Mar. 11, 2013, uses indirect context models. The context is mapped to a 16 bit state representing the number of 0 and 1 bits as 7 bit counters, plus the last 2 bits. When the counters reach the maximum value of 127, they are both halved and incremented. v1.12a is a speed improvement over v1.12 (released the day before) using prefetch instructions.

smac v1.13, Mar. 22, 2013, mixes the order 6, 4, and 3 indirect context models in the logistic domain, log(p(1)/p(0)). Each prediction has a fixed weight of 1/3.

smac v1.14, Apr. 20, 2013, uses adaptive mixer weight update with a learning rate of 0.002.
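Mixing in the logistic domain, with the adaptive weight update added in v1.14, can be sketched as follows. Each input probability is stretched, st = ln(p/(1-p)), the weighted sum is squashed back to a probability, and after coding bit y each weight moves by lr * (y - p) * st, the gradient step on coding cost used by PAQ-style mixers. This floating-point Python sketch assumes natural logs and starts every weight at 1/n (1/3 for three models, as in v1.13); smac's own fixed-point scaling, and hence the exact effect of its 0.002 rate, may differ. The class name is hypothetical.

```python
import math


def stretch(p: float) -> float:
    return math.log(p / (1.0 - p))      # logistic domain: ln(p(1)/p(0))


def squash(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))   # inverse of stretch


class Mixer:
    def __init__(self, n: int, lr: float = 0.002):
        self.w = [1.0 / n] * n          # equal starting weights, 1/3 each for n = 3
        self.lr = lr
        self.st = [0.0] * n
        self.p = 0.5

    def mix(self, probs):
        """Combine model predictions P(1), each strictly between 0 and 1."""
        self.st = [stretch(p) for p in probs]
        self.p = squash(sum(w * s for w, s in zip(self.w, self.st)))
        return self.p

    def update(self, y: int) -> None:
        """Gradient step on coding cost once the actual bit y is known."""
        err = y - self.p
        self.w = [w + self.lr * err * s for w, s in zip(self.w, self.st)]
```

With the fixed weights of v1.13, update() is simply never called. A model that predicts 0.5 has a stretch of 0, so it neither contributes to the mix nor has its weight changed.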

smac v1.15, May 19, 2013, uses an order 6-4-3-2-1 context mixing algorithm.

smac v1.16, July 30, 2013, has improvements to the context bit history model and match model.

smac 1.17 (discussion), Nov. 1, 2013, has some speed optimizations and small changes in the bit history counter rounding and use of floating point lookup tables.

smac 1.17a (discussion), Nov. 17, 2013, has some speed improvements with no change in compression.

smac 1.18 (discussion), Dec. 8, 2013, uses a polynomial function to compute squash() to improve speed.

smac 1.19 (discussion), Dec. 17, 2013, has a speed optimization of the squash function.

smac 1.20, Jan. 16, 2014, improves modeling of 0 frequency counts using a Laplace estimator, p=(n0+1)/(n0+n1+2).

Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
  Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem  Alg Note
  -------        ----------  -----------  -----------  -----------  ----- -----  ---  --- ----
smac 1.8         29,143,755  265,303,304     2,713 x   265,306,017   1917  1935 1691  o4  26
smac 1.9         26,888,498  242,014,586     2,832 x   242,017,418   3168  3266 1690  CM  26
smac 1.10        26,398,662  230,781,496     2,791 x   230,784,287   2917  3085 1649  CM  26
smac 1.11        25,633,348  223,294,431     2,831 x   223,297,262   3930  4331 1616  CM  26
smac 1.12a       24,948,001  216,016,106     2,833 x   216,018,939   4463  4568 1565  CM  26
smac 1.13        23,322,767  202,011,435     2,818 x   202,014,253   6801  6502 1613  CM  26
smac 1.14        22,675,896  193,797,222     2,965 x   193,800,187   5943  6148 1577  CM  26
smac 1.15        22,303,381  191,064,676     3,074 x   191,067,750   6518  7313 1658  CM  26
smac 1.16        21,831,822  183,551,384     3,465 x   183,554,849   6949  7285 1542  CM  26
smac 1.17        21,816,272  183,459,153     3,429 x   183,462,582   5672  5867 1542  CM  26
smac 1.17a       21,816,272  183,459,153     3,429 x   183,462,582   5335  5613 1542  CM  26
smac 1.18        21,816,285  183,459,860     4,522 x   183,464,382   4901  5137 1544  CM  26
smac 1.19        21,816,323  183,459,942     4,361 x   183,464,303   4211  4257 1542  CM  26
smac 1.20        21,781,544  183,190,888     4,356 x   183,195,244   4249  4399 1542  CM  26

.1839 ppmd

See ppmonstr (above).

.1849 tc

TC 5.2 dev 2 is an experimental command line file compressor, currently under development by Ilia Muraviev. It takes no options.

                                   Compressed size      Decompresser  Total size   Time (ns/byte)
Program                           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------                         ----------  -----------  -----------  -----------  ----- -----  --- --- ----
tc 5.0 dev 1  (May  26 2006)    33,774,535  295,836,604     23,681 x  295,860,285    236   204      LZP   3
tc 5.0 dev 2  (June 10 2006)    32,417,139  283,039,249     22,659 x  283,061,908    270   244      LZP   3
tc 5.0 dev 4  (June 21 2006)    32,417,139  283,039,249     22,496 x  283,061,745    224   206      LZP   3
tc 5.0 dev 6  (July  6 2006)    29,544,971  257,416,397     28,528 x  257,444,925    279   279      PPM   3
tc 5.0 dev 7  (July  9 2006)    28,111,955  250,077,573     30,058 x  250,107,631    285   325   20 PPM   3
tc 5.0 dev 9  (July 18 2006)    27,801,253  246,923,158     30,106 x  246,953,264    363   385   24 PPM   3
tc 5.0 dev 11 (July 24 2006)    27,293,396  242,199,762     31,074 x  242,230,836    446   393   56 PPM   3
tc 5.1 dev 1  (Oct.  1 2006)    31,708,176  280,007,538     26,578 x  280,034,116    289   154   25 LZ
tc 5.1 dev 2  (Oct.  2 2006)    31,155,963  274,831,393     24,620 x  274,856,013    344   147   25 LZ
tc 5.1 dev 5  (Oct. 13 2006)    28,567,681  247,853,181     26,659 x  247,879,840    951   439  148 CM
tc 5.1 dev 7  (Dec. 18 2006)    27,934,960  241,898,216     40,104 x  241,938,320   1864   639  148 CM
tc 5.1 dev 7x (Jan. 13 2007)    27,888,899  241,088,655     41,265 x  241,129,920   1974   638  609 CM
tc 5.2 dev 2  (Feb.  7 2007)    21,481,399  184,939,711     41,112 x  184,980,823   3637  3655  230 CM

5.0 dev 1 uses LZP. Dev 4 includes an improved hash table to conserve memory and a faster range coder compared to dev 2, but compression is the same. Starting with 5.0 dev 6, LZP literals and match lengths are encoded using PPMC (PPM with fixed escape probabilities to lower orders). Dev 7 and 9 use order 3-1-0 PPMC.

tc 5.0 dev 11 (July 24, 2006) is the last of this series.

tc 5.1 dev 1 uses ROLZ (reduced offset LZ) with PPM order 1-0 for literals, offset set reduced with order 2 context, and a 16 MB dictionary.

tc 5.1 dev 2 has improved parsing and is archive compatible with dev 1.

tc 5.1 dev 5 uses ROLZ plus context mixing (instead of PPM) for order 2 literals.

tc 5.1 dev 7 uses improved parsing (flexible parsing) and adds SSE.

tc 5.1 dev 7x uses a larger dictionary.

tc 5.2 dev 2 uses FPW (fast PAQ weighting).

.1854 rings

rings 0.1 is a free, closed source, experimental file compressor by Nania Francesco Antonio, Sept. 21, 2007. It uses LZP with order-2 coding of literals and arithmetic coding. It takes no command line options.

rings 0.2 (Nov. 16, 2007) includes improved BMP, WAV, TIFF, and PGM filters.

rings 0.3 was released Dec. 21, 2007.

rings 1.0 was released Feb. 8, 2008. It uses 50 MB for compression and 43 MB for decompression.

rings 1.1 was released Feb. 13, 2008 with same memory usage. It uses CM with LZP preprocessing for faster compression.

rings 1.2 was released Mar. 4, 2008 with the same memory usage.

rings 1.3 was released Apr. 2, 2008. It uses 54 MB for compression and 47 MB for decompression.

rings 1.4c was released Apr. 14, 2008. It has an option (1-9) which selects memory usage. Each increment doubles usage. Memory usage and run time are greater for decompression than compression. For option 9, compression uses 526 MB and decompression uses 789 MB. The program uses BWT. The transformed data is encoded using MTF (move to front), pre-Huffman coding followed by arithmetic coding.

rings 1.5 was released Apr. 21, 2008. It improves compression and is symmetric with regard to memory usage. Options are like 1.4c. The table below compares timing results on my old and new computers.

rings 1.6 was released Aug. 16, 2009. The option ranges from 1 to 10, where 10 uses the most memory. It includes a Linux version (18,348 bytes zipped) which was not tested.

rings 2.0 (discussion) is a multi-threaded archiver rather than a file compressor. It uses BWT. It has an interface similar to zcm. Option -m7 selects maximum block size of 100 MB using 500 MB memory per thread. Option -t1 or -t2 selects 1 or 2 threads. On a 2 core machine, selecting 2 threads shows 3 processes in Windows Task Manager, two of which use 500 MB memory and I/O dividing the input and output files, and one process using 7 MB with several GB of input and a lot of kernel CPU time. These 3 processes must share 2 cores. As a result, it runs slower than 1 thread.

                Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------       ----------  -----------  -----------  -----------  ----- -----  --- --- ----
rings 0.1                       35,693,969  314,161,660     11,271 x  314,172,931    187   179   16 LZP
rings 0.2                       35,693,969  314,161,660     25,832 x  314,187,492    192   167   16 LZP
rings 0.3                       35,151,555  309,179,126     32,132 x  309,211,258    188   154   16 LZP
rings 1.0                       26,384,013  235,897,616     25,585 x  235,923,201    221   321   50 CM
rings 1.1                       26,793,247  238,353,988     27,513 x  238,381,501    151   255   50 CM
rings 1.2                       25,873,235  229,695,548     30,484 x  229,726,032    120   175   50 CM
rings 1.3                       25,873,235  229,695,548     43,329 x  229,738,877    104   163   54 CM
rings 1.4c        9             24,591,826  217,427,384     39,149 x  217,466,533    103   287  789 BWT
rings 1.5         9             21,848,093  191,067,972     44,565 x  191,112,537    172   189  426 BWT
rings 1.5         9             21,848,093  191,067,972     44,565 x  191,112,537    144   188  425 BWT  26
rings 1.6         10            21,918,217  189,242,552     47,618 x  189,290,170    165   192  795 BWT  26
rings 2.0         -m7 -t2       21,195,013  185,258,194    164,995 x  185,423,189    398   223  986 BWT  26
rings 2.0         -m7 -t1       21,194,965  185,256,848    164,995 x  185,421,843    375   206  493 BWT  26

.1857 bwtsdc

bwtsdc v1 (discussion) is a free, experimental file compressor with source code by David A. Scott and Yuta Mori. It takes no options. Memory usage is 5 times the file size. The program is bijective, meaning that any file is valid input to the decompresser, and no two inputs will decompress to the same file. In other words, there is an exact 1 to 1 mapping between uncompressed files and compressed files. The compressor uses multiple stages, each of which is bijective. The first stage is a BWT variant called BWTS (BWT Scottified) developed by Scott. In this variation, it is not necessary to store the starting point for the inverse BWT. This is achieved by dividing the input into a lexicographically nonincreasing sequence of Lyndon words. A Lyndon word is a nonempty string that is strictly lexicographically smaller than all of its proper rotations. The block is then sorted using contexts that wrap within Lyndon words rather than the whole block. The BWTS is followed by distance coding (DC, developed in part by Mori) and Fibonacci coding, each of which is also bijective. The compressor is implemented as 3 programs called from a .bat file.
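The BWTS stage can be sketched directly from this description: factor the input into Lyndon words (here with Duval's algorithm), sort all rotations of all factors by comparing their infinite repetitions, and output the last character of each. Comparing u+v with v+u gives the same order as comparing the infinite repetitions of u and v. The inverse follows each cycle of the usual LF mapping to recover one Lyndon word per cycle, then concatenates the words in nonincreasing order, so no starting index is needed. This is a minimal O(n^2 log n) Python sketch for illustration, not bwtsdc's implementation; the function names are hypothetical.

```python
from functools import cmp_to_key


def lyndon_factors(s: str) -> list[str]:
    """Duval's algorithm: split s into a nonincreasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def omega_cmp(u: str, v: str) -> int:
    """Compare u repeated forever with v repeated forever."""
    a, b = u + v, v + u
    return (a > b) - (a < b)


def bwts(s: str) -> str:
    rotations = [w[i:] + w[:i] for w in lyndon_factors(s) for i in range(len(w))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rotations)


def inverse_bwts(last: str) -> str:
    # Stable sort: the k-th occurrence of a symbol in `last` belongs to the
    # k-th sorted rotation that starts with that symbol (the LF mapping).
    order = sorted(range(len(last)), key=lambda i: last[i])
    seen = [False] * len(last)
    words = []
    for start in range(len(last)):       # each cycle yields one Lyndon word
        word, i = [], start
        while not seen[i]:
            seen[i] = True
            word.append(last[order[i]])
            i = order[i]
        if word:
            words.append("".join(word))
    return "".join(reversed(words))      # cycles were found in nondecreasing order
```

For example, "banana" factors into b, an, an, a, and bwts("banana") returns "annbaa", a permutation of the input with no index to store.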

.1859 fbc

fbc v1.0 is a free, experimental file compressor for Windows by David Catt, Feb. 29, 2012. It is described as using BWT (divsufsort) with a fast adapting (rate 1/16) 14 bit context model consisting of an 11 bit history and 3 bits to encode the position in the current byte. The input is preprocessed using Eugene Shelwien's alphabet reordering preprocessor, BWT_reorder_v2. The argument 250000000 selects the block size in bytes. Memory usage is 5 x block size.
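A context model of this shape can be sketched with a table of 2^14 adaptive probabilities indexed by an 11 bit history and the 3 bit position within the current byte, each entry moving 1/16 of the way toward every observed bit. Reading "11 bit history" as the last 11 coded bits is an assumption, as are the 16 bit probability scale and the class name; this minimal Python sketch is not fbc's code.

```python
class FastAdaptiveModel:
    """2^14 contexts: 11 bits of history plus 3 bits of position in the byte."""

    def __init__(self, rate_shift: int = 4):   # shift 4 -> adaptation rate 1/16
        self.p = [1 << 15] * (1 << 14)          # P(bit = 1), scaled to 16 bits
        self.shift = rate_shift
        self.hist = 0                           # last 11 coded bits (assumption)
        self.pos = 0                            # bit position in the current byte

    def context(self) -> int:
        return (self.hist << 3) | self.pos

    def p1(self) -> float:
        return self.p[self.context()] / 65536

    def update(self, y: int) -> None:
        c = self.context()
        if y:
            self.p[c] += (65536 - self.p[c]) >> self.shift
        else:
            self.p[c] -= self.p[c] >> self.shift
        self.hist = ((self.hist << 1) | y) & 0x7FF
        self.pos = (self.pos + 1) & 7
```

The integer shifts stop short of 0 and 65536, so a probability never becomes exactly 0 or 1 and every bit stays codable.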

fbc v1.1, Mar. 2, 2012, fixes a memory allocation bug that caused decompression to fail for a block size of 333 MB. It automatically selects between 32 and 64 bit versions of divsufsort. Results are shown for the 64 bit version.

                Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------       ----------  -----------  -----------  -----------  ----- -----  --- --- ----
fbc v1.0        250000000       22,554,133  188,976,445     21,244 x  188,997,689    541   480 1225 BWT  26
fbc v1.1        333333334       22,554,133  185,975,548     23,576 x  185,999,124    451   415 1647 BWT  55

.1862 ppmvc

ppmvc v1.1 is a free command line file compressor by Przemysław Skibiński, May 12, 2006, based on PPMd var. J by Dmitry Shkarin. It uses variable length contexts as described in the paper: P. Skibinski and Sz. Grabowski, Variable-length contexts for PPM, Proceedings of the IEEE Data Compression Conference (DCC04), pp. 409-418, 2004. Long matching strings are encoded as in high order ROLZ: as an index to a matching context and a length.

The command line options are the same as in PPMd: -o8 selects order 8, -m256 selects 256 MB memory, -r1 partially rebuilds the model when memory is exhausted. I tuned the compressor to -o8 on enwik8. There are additional options related to VC compression (which must be specified during decompression), but I used the defaults since there is no guidance on how to set them in the program documentation. The paper suggests that the best values (and defaults) are to encode matches of context length order+1 with a minimum match length of 2*order, searching the last 8 to 16 contexts for the longest match. The effect is usually greatest for low order PPM.
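The matching step described in the paper can be sketched in Python as follows. This is a hypothetical illustration, not ppmvc's code, and the PPM coding of literals is omitted. Each context of order+1 bytes keeps its last max_candidates positions (16 here). A match is coded as (candidate index, length) only when it is at least 2*order bytes long. The decoder rebuilds the same candidate table, so the index alone identifies the source position.

```python
from collections import defaultdict, deque


def _new_table(max_candidates):
    # For each (order+1)-byte context, the most recent positions that followed it.
    return defaultdict(lambda: deque(maxlen=max_candidates))


def _record(table, data, start, end, ctx_len):
    for p in range(max(start, ctx_len), end):
        table[bytes(data[p - ctx_len:p])].appendleft(p)   # index 0 = most recent


def vc_parse(data: bytes, order: int = 4, max_candidates: int = 16):
    ctx_len, min_len = order + 1, max(2 * order, 1)
    table, tokens, pos = _new_table(max_candidates), [], 0
    while pos < len(data):
        best = None
        if pos >= ctx_len:
            ctx = bytes(data[pos - ctx_len:pos])
            for idx, cand in enumerate(table.get(ctx, ())):
                n = 0
                while pos + n < len(data) and data[cand + n] == data[pos + n]:
                    n += 1
                if n >= min_len and (best is None or n > best[1]):
                    best = (idx, n)                       # keep the longest match
        if best is not None:
            tokens.append(("match", best[0], best[1]))
            step = best[1]
        else:
            tokens.append(("literal", data[pos]))         # would be coded by PPM
            step = 1
        _record(table, data, pos, pos + step, ctx_len)
        pos += step
    return tokens


def vc_unparse(tokens, order: int = 4, max_candidates: int = 16) -> bytes:
    ctx_len, table, out = order + 1, _new_table(max_candidates), bytearray()
    for tok in tokens:
        start = len(out)
        if tok[0] == "literal":
            out.append(tok[1])
        else:
            _, idx, length = tok
            cand = table[bytes(out[start - ctx_len:start])][idx]
            for k in range(length):
                out.append(out[cand + k])   # byte by byte so overlapping copies work
        _record(table, out, start, len(out), ctx_len)
    return bytes(out)
```

Searching only a few recent occurrences of a long context keeps the match index small and cheap to code, which fits the paper's suggestion to search the last 8 to 16 contexts.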

.1869 chile

chile 0.3d-1 is a free, command line file compressor distributed as C source code by Alexandru Mosoi, May 29, 2006. It uses BWT. The option -b40000 selects a block size of 40000 KB, which requires about 785 MB of memory for compression and 240 MB for decompression. Version 0.3d-1 is identical to version 0.3d except that the maximum block size was increased from 2048 KB to 99999 KB. For this test the program was compiled for Windows using MinGW 3.4.2 as specified in the Makefile.

chile 0.4 (Jan. 27, 2007) introduces a faster algorithm for building suffix arrays that uses less memory (7N). The option -b=244141 selects the block size in KB (to split enwik9 into 4 equal parts). It was compiled using MinGW gcc 3.4.5 with options -W -Wall -fomit-frame-pointer -g -O3 and tested in WinXP Home with 2 GB memory.
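A BWT block can be derived from a suffix array. The minimal Python sketch below shows the relationship using a naive comparison sort of suffixes, which is nothing like chile's fast 7N construction. It treats the block as if it ended with a unique sentinel smaller than every byte. The sentinel is dropped from the output and its row is returned as the primary index. The inverse walks the LF mapping back from the sentinel row.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    n = len(block)
    sa = sorted(range(n), key=lambda i: block[i:])   # naive suffix array
    last = bytearray(block[-1:])     # row 0: the sentinel suffix, preceded by the last byte
    primary = 0                      # for an empty block the sentinel is row 0
    for row, start in enumerate(sa, start=1):
        if start == 0:
            primary = row            # suffix 0 is preceded by the sentinel
        else:
            last.append(block[start - 1])
    return bytes(last), primary


def inverse_bwt(last: bytes, primary: int) -> bytes:
    n = len(last)
    full = list(last[:primary]) + [-1] + list(last[primary:])   # -1 = sentinel
    order = sorted(range(n + 1), key=lambda i: full[i])          # stable sort
    lf = [0] * (n + 1)
    for j, i in enumerate(order):
        lf[i] = j                    # row i's last char is the first char of row j
    out, row = bytearray(), 0        # start at the sentinel suffix row
    for _ in range(n):
        out.append(full[row])        # recovers the block back to front
        row = lf[row]
    return bytes(reversed(out))
```

For example, bwt(b"banana") gives (b"annbaa", 4), which corresponds to the familiar "annb$aa" with the sentinel removed.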

                Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
-------           -------       ----------  -----------  -----------  -----------  ----- -----  --- ---
chile 0.3d-1      -b40000       23,408,335  203,451,387     11,298 s  203,462,685   4957   435  785 BWT
chile 0.4         -b=244141     22,218,917  186,979,614     11,530 s  186,991,144   2513   512 1426 BWT

.1901 bwtdisk

bwtdisk 0.9.0 is a free, experimental, open source (GPL v3) file compressor by Giovanni Manzini, July 7, 2010. It uses BWT. Its purpose is to test the techniques for low memory BWT described in the paper Lightweight Data Indexing and Compression in External Memory by Ferragina, Gagie, and Manzini, Proc. LATIN 2010. The forward BWT computes the suffix array in small segments, then makes multiple passes over the BWT output to merge the result. The external disk usage can be further reduced by compressing the input first with zlib or lzma and decompressing the input on each pass. The program is single threaded.

The program is supplied as source code only. It was compiled with g++ 4.6.3 using the supplied Makefile in Ubuntu on a Core i7 M620, 4 GB. There are two programs, the compressor "bwte" and the decompresser "unbwti". The compressor computes a low memory BWT using at most the memory specified by the -m option (in MB). The -b option specifies how the BWT transformed input is to be compressed: -b 1 specifies zlib, -b 4 specifies lzma, and -b 2 specifies run length coding and range coding. There is no block size parameter. The input is compressed in a single block. Decompression requires 4 times the file size in memory, which exhausted the test machine's memory for enwik9, so decompression was tested on enwik8 only. Compression of enwik9 with -b 4 failed (cannot create pipe).

                Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
-------           -------       ----------  -----------  -----------  -----------  ----- -----  --- --- ----
bwtdisk 0.9.0   -b 1 -m 3500    27,173,252                                           245   234  500 BWT  48
                -b 1 -m 3500    27,173,252  214,137,751    169,579 s  214,342,831   1124       3500 BWT  48
                -b 2 -m 3500    24,725,277                                           186   255  500 BWT  48
                -b 2 -m 3500    24,725,277  190,004,306    169,579 s  190,173,885   1124       3500 BWT  48
                -b 4 -m 3500    26,975,980                                           270   247  500 BWT  48

.1910 CTXf

CTXf 0.75 pre-beta 1 is a free, closed source command line archiver by Nikita Lesnikov, Sept. 20, 2003. It uses PPM with preprocessing for text, exe and multimedia files. The option -me selects extreme (best) compression. It uses about 78 MB of memory, as shown in Windows Task Manager.

.1912 M03exp

m03exp-2005-01-27 is an experimental, closed source GUI file compressor by mij4x, Jan. 27, 2005. It uses BWT, implementing the M03 algorithm by Michael A. Maniscalco, with a maximum block size of 8 MB. (Note on the GUI: to compress or decompress, drop a file on the program window. Right click to select options.) m03exp-2005-02-15 (Feb. 15, 2005) supports blocks up to 32 MB but is otherwise identical.

Block size    enwik8    Comp  Decomp (ns/byte approx)
----------  ----------  ----  ------
8 MB        23,461,984  3860   1840
32 MB       21,948,192  4800   2100

.1930 Stuffit

Stuffit 9.0 is a commercial GUI archiver by Allume Systems, now Smith Micro. This was the current version as of May, 2006. Note: their free 30 day trial required registration and a credit card number which was charged if you forgot to cancel. The options tested were:

  • Stuffit X: Method 4 - Best Text Compression, Level 16, Memory 25 (36.1MB), Optimizers On, Block mode On, Redundancy Off, Text Encoding None, Encrypt archive disabled, Segment archive disabled.
  • Stuffit X: Method 6 - Auto-picks the best method, Level 25, Memory 25 (68.6MB), Optimizers On, Block mode On, Redundancy Off, Text Encoding None, Encrypt archive disabled, Segment archive disabled.

    Stuffit 12.0.0.17 (compression technology version 12.0.0.21) was released Jan. 31, 2008. It includes lossless compression of JPEG and MP3 files and lossy recompression of zip archives, GIF, TIFF, PNG, and PDF files. It supports a native SITX format as well as zip, gzip, rar, bzip2, compress, tar, cab, and some more obscure formats. It is multithreaded for multicore support, although I tested it on a single core processor. I only tested the native general-purpose formats. For these tests, I used the command line programs console_stuff.exe and console_unstuff.exe to reduce the executable size and measure run time more accurately. The options are -m=1 (LZ77-Huffman), -m=2 (LZ77-arithmetic), -m=4 (PPM), -m=8 (BWT), -l (level 2-16, higher is slower but better), and -x (memory extents, max 30, higher uses more memory). The best compression for text is -m=4 (PPM) with maximum memory -x=30. (In the GUI, but not the command line, values above 29 cause an out of memory error with 2 GB RAM.) The -l option apparently has no effect on PPM. The decompresser size is based on console_unstuff.exe and the minimum set of 5 .dll files needed to run it (4 common plus Plugins/sitx.dll). The full GUI installer (without Office plugins) zips to 17,051,856 bytes. The tested version was a complimentary copy provided by the company.

    Stuffit 2009 13.0.0.19 (compression technology 13.0.0.24) was released Dec. 19, 2008. I tested it as with Stuffit 12. However, the technique of finding the minimal set of .dll files that I used in Stuffit 12 did not work (internal error), so I had to include the zipped distribution size (StuffIt2009.exe), which includes many other compression formats and a GUI. The tested version was a complimentary copy provided by the company.

                    Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Notes
    -------           -------                     ----------  -----------  -----------  -----------  ----- -----  --- ---  -----
    Stuffit 9.0.0.21  Method 4 (best text)        24,310,583  210,801,103  1,015,808 x  211,816,911    542   503   36      12
                      Method 6 (auto-pick best)   24,419,299  212,392,465  1,015,808 x  213,408,273   2149         68      12
    Stuffit 12.0.0.17 -m=1 -l=16 -x=30            25,926,107                                          2540   420  298 LZ77
                      -m=2 -l=16 -x=27            24,874,987                                          3080    90  881 LZ77
                      -m=8 -l=16 -x=30            25,574,676                                           560   230  229 BWT
                      -m=4 -l=16 -x=28            23,482,855                                           730   694  274 PPM
                      -m=4 -l=16 -x=29            22,744,155                                           770   720  537 PPM
                      -m=4 -l=16 -x=30            22,105,654  190,372,707  2,658,122 xd 193,030,829    628   658 1062 PPM
    Stuffit 13.0.0.19 -m=4 -l=16 -x=30            22,105,658  190,372,711 21,611,401 x  211,984,112    567   604 1060 PPM  26
    

    .1933 plzma

    plzma_v3b (discussion) is a free, closed source, experimental file compressor for Windows (32 and 64 bit versions) by Eugene Shelwien, Oct. 8, 2011. It uses LZMA (7zip equivalent) with a modified entropy encoder. plzma_v3c was released Mar. 19, 2012. Options are as follows:

                                                                 Compressed size      Decompresser  Total size   Time (ns/byte)
    Program       Options                                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  CMem Dmem Alg  Notes
    -------   ---------------                                 ----------  -----------  -----------  -----------  ----- -----   --- ---- ---  -----
    plzma_v3b c2  1000000000 999999999 273 8 0 0 6000 1 1 1 7 24,206,571  193,240,160    101,221 x  193,341,381   8889    55 10110  975 LZMA 58
              c2                                              24,778,033                                          2110   167   394   54 LZMA 26
    plzma_v3c e                                               25,182,314                                          2050    39   394   54 LZMA 26
              c                                               24,866,192                                          2060   164   394   54 LZMA 26
              c2                                              24,778,037  213,154,428     55,974 x  213,210,402   2086   149   394   54 LZMA 26
    

    .1933 crook

    crook v0.1 (discussion) is a free, open source file compressor by Jüri Valdmann, Mar. 5, 2012. It uses bit-level PPM. Because it predicts bits rather than bytes, there is no escape modeling. This is like DMC in that each bit-level context is mapped to a next-bit prediction and a count (equivalent to two counts of zeros and ones). But unlike DMC, it avoids the problem of duplicate states representing the same contexts, which would dilute the statistics and waste memory.

    Bits are modeled MSB first. Contexts are stored in a binary tree where the two child nodes represent the current context extended by one bit on the right. Each node also has a pointer to a suffix node, representing the current context shortened by one byte on the left. Contexts always begin on byte boundaries. Each context maps to a 22 bit prediction for the next bit (initialized to 0.5) and a count. When a bit is coded, the current node and all of its suffix nodes are updated by adjusting the prediction to reduce the error by 1/count, and the count is incremented by 1 up to a limit of 32. The initial tree is bytewise order 0 (255 contexts) with initial counts of 12. Whenever there is no node to represent the 1 bit extension of the current context, a new node is added with a count of 1.5 and a prediction inherited from its suffix node, and the new node becomes the current context.
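The update rule can be sketched in Python as below. This is a hypothetical illustration, not crook's source. It assumes the error is scaled by the count before the count is incremented, and it stores the 22 bit prediction as a float rather than a fixed-point integer. Node, new_child and update are illustrative names only.

```python
PROB_BITS = 22
ONE = 1 << PROB_BITS        # probability 1.0 in 22-bit fixed point
COUNT_LIMIT = 32


class Node:
    """Hypothetical bit-context node: P(next bit = 1), a count, and a suffix link."""

    def __init__(self, p=ONE / 2, count=12.0, suffix=None):
        self.p = p              # initial tree nodes start at 0.5 with count 12
        self.count = count
        self.suffix = suffix    # same context shortened by one byte on the left


def new_child(suffix: Node) -> Node:
    # New nodes start with count 1.5 and inherit the suffix node's prediction.
    return Node(p=suffix.p, count=1.5, suffix=suffix)


def update(node, bit: int) -> None:
    """Update the current node and every node on its suffix chain."""
    target = ONE if bit else 0
    while node is not None:
        node.p += (target - node.p) / node.count          # reduce the error by 1/count
        node.count = min(node.count + 1, COUNT_LIMIT)     # then count += 1, capped at 32
        node = node.suffix
```

Because a new node starts with a small count, its first few updates move its prediction strongly. The well-established initial nodes, with counts of 12, change slowly.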

    The option -m1600 limits memory usage to 1600 MiB. When memory is exhausted, no new nodes are added to the tree, but predictions and counts of existing nodes continue to be updated. The current context then becomes the suffix node if needed. The option -O8 limits the tree depth to bytewise order 8 (found to be optimal for both enwik8 and enwik9). When the current node reaches this depth, no child nodes are added, but existing nodes and their suffixes continue to be updated, just as if the memory limit were reached. Increasing the model order improves compression but also causes the tree to grow faster, which sometimes makes compression worse if the memory limit is reached sooner. The defaults are -m128 -O4.

    Compression and decompression require the same time and memory. Also, the same compression options must be given again during decompression. (I added 10 bytes to the decompresser size to account for this.) The compressed file is arithmetic coded with the original file size saved in the first 4 bytes. File sizes are limited to less than 2 GiB. The program is distributed as source code only. To test, I compiled with g++ 4.6.1 in 32 bit Windows using the options recommended in the source comments.

                    Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Note
    -------           -------       ----------  -----------  -----------  -----------  ----- -----  --- ----
    crook v0.1       -m1600 -O4     25,693,515  229,770,948                              379   393  781  26
                     -m1600 -O5     23,664,987  207,093,726                              423   442 1641  26
                     -m1600 -O6     22,793,009  197,202,156                              446   475 1641  26
                     -m1600 -O7     22,505,951  193,896,089                              462   496 1641  26
                     -m1600 -O8     22,503,627  193,333,159      8,539 s  193,341,698    483   513 1641  26
                     -m1600 -O9     22,620,471  193,912,162                              479   519 1641  26
                     -m1600 -O10    22,752,285  194,794,021                              488   511 1641  26
                     -m1600 -O12    22,957,581  196,397,188                              492   505 1641  26
                     -m1600 -O16    23,105,056  197,631,364                              477   503 1641  26
    

    .1936 ppmx

    ppmx 0.01 is a free, experimental, closed source file compressor by Ilia Muraviev, released Nov. 25, 2008. It uses PPM with no filters. It takes no options.

    ppmx 0.02 was released Dec. 2, 2008. It uses order 9 PPM with hashed context tables, as discussed here. There is also a Core 2 Duo version, which is faster although it runs on only one core, and which has a slightly larger executable. Note that the table below is misleading: on enwik8, the regular version compressed at 976 ns/byte and decompressed at 992 ns/byte, which is 12% and 4.5% slower, respectively, than the Core 2 Duo version.

    ppmx 0.03 (discussed here) was released Dec. 22, 2008.

    ppmx 0.04 (discussed here) was released Jan. 5, 2009. It uses order 12-5-3-2-1-0 PPM and 280 MB of memory.

    ppmx 0.05 (discussion), Jan. 19, 2010, adds SEE (secondary escape estimation), more memory, and some optimizations.

    ppmx 0.06, released July 27, 2010, is designed for improved speed and less memory usage rather than compression ratio. It removes SEE and uses only a fixed order 4-2-1-0 model with hash tables. It has a P4 version for Pentium-4 and higher that is about 12% faster. This is the version tested. It has a larger executable (54,496 vs. 45,216 bytes).

    ppmx 0.07, Feb. 20, 2011, uses order 5-3-2-1-0-(-1) PPM with hash tables. Memory usage is increased to 302 MB.

    ppmx v0.08 (discussion), Jan. 1, 2012, uses order 6-4-2-1-0-(-1) PPM with hash tables and SEE improvements.

    ppmx 0.09 (discussion) was released Mar. 24, 2014.

                              Compressed size      Decompresser  Total size   Time (ns/byte)
    Program                  enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------                ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    ppmx 0.01              24,369,312  213,206,926     51,454 x  213,258,380    557   515  550 PPM 26
    ppmx 0.02              22,580,291  194,298,469     53,511 x  194,351,980    874   888  609 PPM 26
    ppmxcore2duo 0.02      22,580,291                  55,824 x                 871   949  609 PPM 26
    ppmx 0.03              22,572,808  193,643,464     54,964 x  193,698,428    777   784  609 PPM 26
    ppmx 0.04              23,150,510  201,384,355     52,406 x  201,436,761    791   801  280 PPM 26
    ppmx 0.05              22,905,422  196,548,444     53,476 x  196,601,920    863   882  576 PPM 26
    ppmx 0.06              26,131,726  235,257,572     54,596 x  235,312,168    276   317   71 PPM 26
    ppmx 0.07              23,941,730  211,671,802     44,104 x  211,715,906    314   352  302 PPM 26
    ppmx 0.08              23,204,040  202,868,559     54,098 x  202,922,657    397   420  355 PPM 26
                           23,204,040  202,868,559     54,098 x  202,922,657    107   127  355 PPM 53
    ppmx 0.09              25,952,954  232,581,333     50,873 x  232,632,206    122   150  279 PPM 48
                           25,952,954  232,581,333     50,873 x  232,632,206     57    69  279 PPM 63
    

    .1947 lzturbo

    lzturbo 0.01 is a free, experimental, closed source file compressor by Hamid Bouzidi, Aug. 15, 2007. There is some controversy over the origin of the source code. Discussion. Discussion.

    It uses LZ77 with arithmetic coding. The option -49 selects method 4 (of 1, 2, or 4) and level 9 (of 1..9) for best compression. Other combinations were not tested. There is also a Linux version, which was not tested. Memory usage fluctuates but peaks at 654 MB for compression and 90 MB for decompression. The Windows version produces read-only output files that must be made writable with "attrib -r" before they can be modified or deleted.

    lzturbo 0.1 (Oct. 5, 2007) is threaded for parallel execution on multicore machines. The maximum compression level is -59, where it uses 248 MB for compression and a peak of 72 MB for decompression. Other modes compress much faster. The read-only bug was fixed.

    lzturbo 0.9 was released Feb. 25, 2008. Decompression memory peaks at 79 MB.

    lzturbo 0.94 was released Apr. 11, 2009. The option -59 selects method 5, compression level 9 for maximum compression. -b100 selects a block size of 100 MB for independent compression in separate threads. The default is 32 MB. -p0 forces the compressor to run on one core. By default the program runs on all cores, but this causes it to run out of memory with -59 because each thread uses 1450 MB. Decompression ran on 2 cores with a process time of 20 seconds per core and wall time of 28 seconds using about 300 MB memory. The faster modes tested below were run on 2 cores, with average process time per core shown.

    Prog           Opt              enwik8      enwik9         prog       Total       Comp  Deco  Mem Alg  Note
    ------------   ---            ----------  -----------     ------    -----------   ----  ----  --- ---- ----
    lzturbo 0.01   -49            26,678,709  233,322,999     68,561 x  233,391,560   1412    50  654 LZ77
    lzturbo 0.1    -59            26,616,816  232,708,136    129,344 x  232,837,480   1385    49  248 LZ77
    lzturbo 0.9    -59            26,616,278  232,701,587    116,508 x  232,818,095   1420    52  248 LZ77
    lzturbo 0.94   -59 -b100 -p0  24,763,542  217,342,694    152,254 x  217,494,948   5196    20 1450 LZ77 26
                   -10            51,426,368                                            10     8   78 LZ77 26
                   -14            38,325,178                                            74    10  171 LZ77 26
                   -39 -b50       26,123,933                                          1290    16 1450 LZ77 26
                   -41            36,615,397  325,577,604    152,254 x  325,729,858     29    23  203 LZ77 26
    

    lzturbo 1.1, Apr. 29, 2013, runs only on 64 bit Windows and 64 bit Linux. The Linux version was tested under Ubuntu (note 48) using the non-static (smaller) executable. The 2 digit options -11...-49 select the compression method and level. The first digit can be 1..4 with higher numbers compressing better. The second digit can be 0, 1, 2, or 9 with higher numbers compressing slower without affecting decompression speed. The program gave an error during compression with -40, -41, -42.

    Option -b1000 selects a block size of 1000 MB. The default is -b24. Separate blocks can be compressed and decompressed in parallel. The test machine automatically selects 4 threads. Larger blocks improve compression but use more memory and allow fewer threads to be allocated. -b1000 causes it to use 1 thread since there is a single block. At level 9 (-19, -29, -39, -49), it is not possible to compress enwik9 with -b1000 on the 4 GB test machine because it will use over 6 GB memory and start disk thrashing. -p1 selects 1 thread. -p0 disables multi-threading.
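The block size tradeoff can be reproduced in miniature. The Python sketch below is a stand-in, not lzturbo's format. It splits the input into blocks, compresses each block independently with the standard library's lzma in a thread pool, and decompresses the blocks in parallel. Matches cannot reach across block boundaries, so smaller blocks give more parallelism but worse compression. The function names and the default thread count are illustrative.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(data: bytes, block_size: int, threads: int = 2, preset: int = 6) -> list[bytes]:
    """Compress each block independently; blocks can be processed in parallel."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda b: lzma.compress(b, preset=preset), blocks))


def decompress_blocks(chunks: list[bytes], threads: int = 2) -> bytes:
    """Decompress independent blocks in parallel and concatenate them in order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return b"".join(pool.map(lzma.decompress, chunks))
```

With input that repeats at long distance, one large block finds the repeats and several small blocks cannot, which matches the enwik8 results for -b1000 vs. -b24 in the table.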

                 Compression       Compressed size      Decompresser  Total size  Time (ns/byte)
    Program        Options        enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Note
    -------        -------      ----------  -----------  -----------  -----------  ----- ----- ---- ----
    lzturbo 1.1    -10 -b24     53,199,932                                             3     2       48
                   -10 -b1000   53,194,540                                             6     2       48
                   -11 -b24     47,619,485                                             6     2       48
                   -11 -b1000   47,611,974                                            12     2       48
                   -12 -b24     44,421,925                                            18     2       48
                   -12 -b1000   44,413,087                                            36     2       48
                   -19 -b24     41,929,879                                           493     2       48
                   -19 -b1000   41,920,122                                          1610     2       48
    
                   -20 -b24     49,736,192                                             3     2       48
                   -20 -b1000   49,725,239                                             6     3       48
                   -21 -b24     42,628,330                                             6     2       48
                   -21 -b1000   42,538,087                                            12     3       48
                   -22 -b24     39,541,490                                            18     2       48
                   -22 -b1000   39,210,560                                            35     3       48
                   -29 -b24     32,919,788                                           543     2       48
                   -29 -b1000   31,370,930                                          1760     4       48
    
                   -30 -b24     39,036,288                                             5     3       48
                   -30 -b1000   39,023,229                                            10     6       48
                   -31 -b24     35,632,652                                             7     3       48
                   -31 -b1000   35,572,973                                            13     6       48
                   -32 -b24     31,266,016                                            18     3       48
                   -32 -b1000   30,753,365                                            38     6       48
                   -39 -b24     26,892,107                                           573     3       48
                   -39 -b1000   25,298,784                                          1838     6       48
    
                   -49 -b24     25,870,196  225,397,956     110,565 x  225,508,521   792    13 1702  48
               -p1 -49 -b200    24,416,777  207,335,845     110,565 x  207,446,410  2566    17 3200  48
                   -49 -b1000   24,416,777                                          2110    20       48
               -p0 -49 -b1000   24,416,777  194,681,713     110,670 x  194,792,383  1920     9 14700 59
    

    .1956 enc

    enc 0.15 is an experimental, closed source command line archiver by Serge Osnach, Feb. 14, 2003. It uses PPM and CM (in PaQ mode). It tries up to 5 different compression methods (depending on options) and chooses the best one. The methods are ("a" means "add to archive"): Methods ae and ab with options -o8 -d256 were found to give the best compression on enwik7 (first 10^7 bytes). These methods discard the model when the memory limit is reached, and this was observed to happen (in Task Manager), so these options should hold for larger files. However, with -d127 (necessary to decompress), method aq gives the best compression.
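The "try several methods and keep the best" strategy is simple to express. In the Python sketch below, the standard library codecs (stored, zlib, bz2, lzma) are stand-ins for enc's PPM and CM methods. The one-byte method tag is a hypothetical container format, not enc's archive format. The stored method guarantees that the output is never more than one byte larger than the input.

```python
import bz2
import lzma
import zlib

# Hypothetical method table: id -> (compress, decompress). Stand-ins for enc's methods.
METHODS = {
    0: (lambda d: d, lambda d: d),                           # stored
    1: (lambda d: zlib.compress(d, 9), zlib.decompress),
    2: (lambda d: bz2.compress(d, 9), bz2.decompress),
    3: (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def compress_best(data: bytes) -> bytes:
    """Run every method and keep the smallest output, tagged with its method id."""
    best_id, best = min(((mid, comp(data)) for mid, (comp, _) in METHODS.items()),
                        key=lambda t: len(t[1]))
    return bytes([best_id]) + best


def decompress_best(blob: bytes) -> bytes:
    return METHODS[blob[0]][1](blob[1:])
```

The cost is compression time: every method runs on every input. Decompression runs only the chosen method.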

    .1966 comprolz

    comprolz 0.1.0 (discussion) is a free, open source, experimental file compressor by Zhang Li, Oct. 7, 2012. It uses ROLZ. The option -b256 selects the maximum block size. During compression it uses 60-65% of two cores. Decompression uses one core.

    Only source code was provided. It was compiled for 32 bit Windows Vista with MinGW 4.6.1 using "gcc -O3 *.c".

    comprolz 0.2.0 was released Oct. 16, 2012. It includes the -f option to select flexible parsing. It is slower but compresses better.

    comprolz 0.10.0 (discussion) was released Nov. 25, 2012. It includes a dictionary derived from the first 10 MB of enwik8. To test, it was compiled as suggested in the documentation using gcc 4.7.0 with options "-O3 -fomit-frame-pointer -mno-ms-bitfields". Source code is shared with comprox 0.10.0. The executable, packed with UPX, is smaller.

    comprolz 0.11.0 was released Dec. 17, 2012. The program builds a dictionary from the input instead of using a static dictionary. 32 bit executables are included for Windows and Linux. The Windows version was tested.

    comprolz 0.11.0-bugfix1, Dec. 18, 2012, fixes a bug that caused poor compression.

    Compressor         Opt            enwik8      enwik9         Prog      Total        Comp Decomp  Mem Alg  Note
    ---------          ---          ---------   -----------     -------  -----------    ----  ----   --- ---- ----
    comprolz 0.1.0     -b256        24,835,082  215,770,703     41,170 s 215,811,873     595   262   602 ROLZ 26
    comprolz 0.2.0     -b250 -f     24,280,609  210,255,761     43,899 s 210,299,660    1415   319   666 ROLZ 26
    comprolz 0.10.0    -b250 -f     23,050,103  198,635,448     82,824 x 198,718,272    1086   333   595 ROLZ 26
    comprolz 0.11.0    -b250 -f     23,687,477  213,585,466     29,509 x 213,614,975    1608   324   866 ROLZ 26
    comprolz 0.11.0b1  -b250 -f     22,813,215  196,651,379     29,453 x 196,680,832     984   308   688 ROLZ 26
    

    .1971 sbc

    sbc 0.970r2 is a free, closed source command line archiver and file encryptor by Sami, June 27, 2005. Compression options suggest it uses BWT. The -m3 option selects maximum compression, requiring 32 MB memory (-m1 is minimum). The -b63 option selects maximum block size (32 MB, requiring 192 MB additional memory). -ad disables adaptive block size reduction for homogeneous data. SBC runs faster with smaller block sizes and minimum compression as shown:

                  Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
    Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  
    -------         -------         ----------  -----------  -----------  -----------  ----- -----  
    sbc 0.970r2     -ad -m3 -b63    22,470,539  197,066,203     99,094 xd 197,165,297   1733   313
    sbc 0.970r2     -ad -m1 -b31    23,288,217                  99,094 xd                620   230
    sbc 0.970r2     -ad -m1 -b1     27,087,118                  99,094 xd                300   180
    

    .1984 WinRAR

    WinRAR 3.60 beta 3 is a commercial (free trial) Windows GUI and command line archiver by Eugene Roshal, May 8, 2006. It produces rar and zip archives and decompresses many other formats. It also encrypts and performs other functions. The best compression mode uses PPM (actually ppmd var. I, an earlier version of ppmd J) with optimizations for text and other formats (exe, wav, bmp). The -mc7:128t+ option says to use PPM order 7, 128 MB memory (maximum) and force text preprocessing. The -sfxWinCon.sfx option says to produce a self extracting console executable (adding 79,360 bytes).

    The model order was tuned on enwik8. Additional results are shown for order 10, for -m5 (maximum compression), and for normal compression as a .exe and .rar file. The decompresser in the last case is zipped unrar.exe.

    WinRAR 4.20 was released June 9, 2012. It costs $29 with a 40 day free trial as of Feb. 1, 2013. Options are the same. -m1 through -m5 select compression level. The default is -m3. The algorithm is LZ77 with a 4 MB window. -mc7:128t+ selects PPM, order 7, with maximum 128 MB memory. Time and memory to decompress with PPM is about the same as compression.

    WinRAR 5.00b2 was released Apr. 29, 2013. It includes a larger dictionary, up to 1 GB for the 64 bit version and 256 MB for the 32 bit version. Option -ma5 selects the new archive format, which is not compatible with v4.20 or earlier. The default is the older format. In the newer format, option -mc is silently ignored. Option -m3 is the default compression level.

                          Compression              Compressed size      Decompresser  Total size   Time (ns/byte)
    Program                 Options               enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------         --------------------------  ----------  -----------  -----------  -----------  ----- ----- ---- ---- ----
    WinRAR 3.60b3   -mc7:128t+  -sfxWinCon.sfx  22,713,569  198,454,545          0 xd 198,454,545    506   415
                    -mc10:128t+ -sfxWinCon.sfx  23,233,523                       0 xd                770
                    -m5         -sfxWinCon.sfx  24,832,649                       0 xd                680   520
                                -sfxWinCon.sfx  29,828,890                       0 xd                780    40
                                                29,749,530                  98,888 xd                780    40
    WinRAR 4.20     -m1                         40,234,511                                            36    32   99 LZ77  26
                    -m2                         30,564,700                                           180    29   99 LZ77  26
                    -m3 (default)               29,671,175                                           325    30   99 LZ77  26
                    -m4                         29,329,237                                           484    30   99 LZ77  26
                    -m5                         29,225,016                                           590    30   99 LZ77  26
                    -mc5:128t+                  23,440,773                                           358        229 PPM   26
                    -mc6:128t+                  22,701,033                                           418        229 PPM   26
                    -mc7:128t+                  22,635,718  198,372,701    141,019 xd 198,513,720    440   373  229 PPM   26
                    -mc8:128t+                  22,769,557                                           518   456  229 PPM   26
                    -mc10:128t+                 23,153,065                                           582        229 PPM   26
                    -mc12:128t+                 23,401,290                                           609        229 PPM   26
    WinRAR 5.00b2   -mc7:128t+                  22,635,718  198,372,701    153,763 x  198,526,464    433   368  226 PPM   26
                    -ma5 -m1                    40,565,268                                            54    31  406 LZ77  26
                    -ma5 -m2                    29,758,785                                           228    30  435 LZ77  26
                    -ma5 -m3                    28,662,794                                           439    32  435 LZ77  26
                    -ma5 -m4                    28,072,832                                           751    31  435 LZ77  26
                    -ma5 -m5                    27,835,431                                          1004    31  435 LZ77  26
    

    .1986 quark

    quark v0.95r beta is a free, closed source command line file compressor by Frederic Bautista, Mar. 10, 2006. It uses LZ. It is characterized by high compression and fast decompression. The -m1 option selects relative mode compression, which normally gives the best compression but is the slowest. The -d25 option selects a dictionary size of 2^25 bytes (32 MB), which is the largest that will run without thrashing with 1 GB RAM. The -l8 option selects the search depth. Higher values normally improve compression (up to -l13, default -l4), but -l8 was the highest practical value for reasonable compression speed (7.5 hours). Also, larger values were found to hurt compression on enwik5. Compression time increases approximately exponentially with the -l value; with -l13, compression takes 6,100,000 ns/byte.

    .1994 lzip

    plzip is a free, open source file compressor by Antonio Diaz Diaz, Feb. 16, 2010. It is "parallel lzip", compatible with lzip, but multi-threaded for parallel execution. It uses LZMA (LZ77 with arithmetic coding). The -9 option selects maximum compression. It has a command line interface similar to gzip. When it compresses, it removes the original file and adds a .lz extension.

    lzip and plzip are written for Linux. A Windows port by Christian Schnaader on May 2, 2010 was tested. On my test computer (2 core T3200, 2 GHz), compression showed 180% CPU and decompression showed 117%.

    lzip 1.14-rc3 was released Jan. 15, 2013.

    Compressor     Opt            enwik8      enwik9         Prog       Total       Comp Decomp  Mem  Alg  Note
    ---------      ---          ---------   -----------     -------   -----------   ----  ----   ---  ---- ----
    plzip          -9           25,578,352  221,845,216     56,614 x  221,901,830   1308    37   1028 LZ77  26
    lzip 1.14-rc3  -9 -s512MiB  24,756,063  199,410,543     21,682 s  199,432,225   2409    21   5632 LZ77  57
    

    .1995 comprox

    comprox_sa 20110927 (discussion) is a free, experimental, open source file compressor by Zhang Li, Sept. 27, 2011. It uses LZSS (in 4 MB blocks) followed by arithmetic coding. The program takes no arguments. It uses 60 MB memory for compression and 6 MB for decompression. It runs in both Windows and Linux. Only the Windows version was tested.

    Version 20110928 was released Sept. 28, 2011. Compression runs in 2 threads. Both the Windows and Linux versions were tested (on different computers).

    Version 20110929 was released Sept. 29, 2011. Decompression also runs in 2 threads. Compression is slightly improved.

    comprox version 0.1.1, Oct. 10, 2011, replaces comprox_sa. It is a rewrite using LZ77 (instead of LZSS) and arithmetic coding. It takes a compression level 0 (fastest) to 9 (best) with a default of 5. All levels use the same memory, 218 MB for compression and 44 MB for decompression. The Linux version reports the same resident memory as Windows but higher virtual memory: 236 MB to compress and 284 MB to decompress. Both compression and decompression run in 2 threads. Reported times are real times.

    comprox 0.6.0 was released Aug. 24, 2012. It uses static 4K dictionary encoding followed by LZ77 and arithmetic coding. It was released as open source (3 clause BSD) C code only. For testing, it was compiled using g++ 4.6.1 as "gcc -O3 *.c" under 32 bit Windows. The option e200 means to use a 200 MiB block size. The default is e16. Larger blocks improve compression but use more memory. The program crashed with e250 or larger.

    comprox 0.7.0 (discussion) was released Sept. 10, 2012. It includes multi-threaded compression and other improvements. It includes a static English dictionary with about 3000 common words. It was tested in 64 bit Linux compiled with "gcc -O3 *.c -lpthread" and in 32 bit Windows compiled with "gcc -O3 *.c -lpthread -Wl,--stack,8000000".

    comprox v0.8.0 was released Sept. 26, 2012 with better compression. The Linux version was compiled with "gcc -O3 -march=native *.c -lpthread". The Windows version was compiled as before.

    comprox 0.8.0-bugfix1, Sept. 27, 2012, fixed a bug that caused compression to crash on some input files. It was compiled with MinGW 4.6.1 with "gcc -O3 -msse2 -s -Wl,--stack,8000000 *.c -lpthread".

    comprox 0.9.0 was released Oct. 16, 2012. The -b option sets the block size in MB. Default is -b16. -m sets the number of matches to check. Default is -m40. -f selects flexible parsing. To test, the program was compiled with "gcc -O3 -march=native -s *.c" as above.

    comprox 0.10.0 (discussion) was released Nov. 25, 2012. It includes a dictionary derived from the first 10 MB of enwik8. To test, it was compiled as suggested in the documents using gcc 4.7.0 with options "-O3 -fomit-frame-pointer -mno-ms-bitfields". Source code is shared with comprolz 0.10.0. The executable, packed with UPX, is smaller.

    comprox 0.11.0 was released Dec. 17, 2012. It builds a dictionary from the input rather than using a static dictionary. Executables are included for 32 bit Windows and Linux. These compressed smaller than the source code. The compressor crashed with -b250 (250 MB block size) on enwik9, but -b200 worked. -m100 selects the match search limit (default -m40). -f selects flexible parsing. With a large -m, compression time grows faster than linearly with input size: it increased from 75 s for enwik8 to 2115 s for enwik9.

    comprox 0.11.0-bugfix1, Dec. 18, 2012, fixes a bug that caused poor compression.

    Compression                                      Compressed size      Decompresser  Total size   Time (ns/byte)
    Program    Version          Opt                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  CMem Dmem Alg Note
    -------    --------         ---               ----------  -----------  -----------  -----------  ----- -----  ---- ---- --- ----
    comprox_sa 20110927 (Win32)                   32,654,393  287,588,097      3,791 s  287,591,888    398   101    60    6 LZSS 26
    comprox_sa 20110928 (Win32)                   32,654,718  287,590,343      3,790 s  287,594,133    205   101   122   10 LZSS 26
    comprox_sa 20110928 (Linux)                   32,654,718  287,590,343      3,790 s  287,594,133    126    59   141   10 LZSS 48
    comprox_sa 20110929 (Win32)                   32,652,597  287,575,768      3,774 s  287,579,542    209    71   122   12 LZSS 48
    comprox_sa 20110929 (Linux)                   32,652,597  287,575,768      3,774 s  287,579,542    116    37   145   36 LZSS 48
    comprox 0.1.1       (Win32)  0                29,463,135                                           146    65   219   44 LZ77 26
                        (Win32)  5                28,836,139                                           290    65   218   43 LZ77 26
                        (Win32)  9                28,586,545  250,565,797      5,430 s  250,571,227    768    57   218   43 LZ77 26
                        (Linux)  9                28,586,545  250,565,797      5,430 s  250,571,227    496    29   218   44 LZ77 48
    comprox 0.6.0       (Win32) e200              25,504,328  221,405,873     23,367 s  221,429,240    484    92  1567  590 LZ77 26
                                e16               26,816,904                                           395   132   169   68 LZ77 26
    comprox 0.7.0       (Linux) e200              25,068,368  217,403,007     36,702 s  217,439,709    225    52  1000  410 LZ77 48
                        (Win32) e200              25,068,368  217,403,007     36,702 s  217,439,709    390   126  1107  472 LZ77 26
                        (Linux) e500              25,068,368  212,824,614     36,702 s  212,861,316    260    57  2500 1100 LZ77 48
                        (Linux) e700              25,068,368  212,348,904     36,702 s  212,385,606    309    49  3400 1500 LZ77 48
    comprox 0.8.0       (Win32) e200              24,537,383  212,651,678     42,764 s  212,694,442    460   128  1143  279 LZ77 26
                        (Linux) e500              24,537,383  208,328,173     42,764 s  208,370,937    296    49  2500  558 LZ77 48
    comprox 0.8.0-bugfix1 (Win) e200              24,537,453  212,652,159     42,804 s  212,694,963    480   145  1108  281 LZ77 26
    comprox 0.9.0       (Win32) -b250 -f -m100    24,243,078  208,369,181     46,387 s  208,415,568   1657   130  1405  326 LZ77 26
                                -b250 -f          24,281,529                                           748   161   733  164 LZ77 26
                                -b250             24,486,987                                           398   160   733  164 LZ77 26
                                                  25,494,243                                           317   167   151   86 LZ77 26
    comprox 0.10.0      (Win32) -b250 -f -m100    23,332,113  201,288,183     86,687 x  201,374,870   1209   151  1271      LZ77 26
    comprox 0.11.0      (Win32) -b200 -f -m100    23,990,134  217,340,709     34,176 x  217,374,885   2115   144  1211      LZ77 26
                        (Win32)                   25,003,709  234,265,741     34,176 x  234,299,917    436   145   269      LZ77 26
    comprox 0.11.0-bugfix1(Win) -b250 -f -m100    23,064,386  199,515,912     34,176 x  199,550,088    917   153   688      LZ77 26
                                                  23,861,257  209,481,309     34,176 x  209,515,485    307   162   196      LZ77 26
    

    .2018 bssc

    bssc 0.95a is a free command line file compressor by Sergeo Sizikov, 2005. It uses BWT. The -m16383 option selects the maximum block size of 16383 KB (uses 140 MB memory).
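    A minimal Python sketch of the forward and inverse BWT on a single block may help show what a BWT compressor such as bssc (or GRZipII below) applies to each block of up to the selected size. It sorts rotations naively, so it is only practical for small blocks; a complete compressor also uses a faster sorting method and follows the transform with a second-stage model and entropy coder, none of which is shown. The function names are hypothetical and not taken from bssc.

```python
def bwt(block: bytes):
    """Burrows-Wheeler transform of one block by naive rotation sorting.

    Returns the last column of the sorted rotation matrix and the row
    index (primary index) at which the original block appears.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort the starting offsets of all n cyclic rotations.
    rows = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The last byte of the rotation starting at offset i is block[i - 1].
    last = bytes(block[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert bwt() using the mapping between the first and last columns."""
    n = len(last)
    # A stable sort of positions in `last` by byte value gives the first
    # column: the k-th row starts with the byte last[order[k]].
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        # Move to the rotation that starts one byte later in the block;
        # its last byte is the first byte of the current row.
        row = order[row]
        out.append(last[row])
    return bytes(out)
```

    For example, `bwt(b"banana")` returns `(b"nnbaaa", 3)`, and `inverse_bwt(b"nnbaaa", 3)` restores `b"banana"`. Grouping bytes that precede similar contexts is what makes the last column easier to compress than the original block.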

    .2024 flashzip

    flashzip 0.1 is a free, closed source file compressor by Nania Francesco Antonio, Jan. 10, 2008. It uses LZP and arithmetic coding.

    flashzip 0.2 was released Jan. 11, 2008. It is compatible with version 0.1 but faster. Note: in both versions, CPU utilization during compression is about 28% to 35%. Times shown are process times.

    flashzip 0.3 was released Feb. 4, 2008. It uses ROLZ plus arithmetic coding. It takes an option x for better (slower) compression and a level 1 through 5, where 5 is the slowest with the best compression.

    flashzip 0.9 was released June 28, 2008. Option -m2 selects method 2 (default is -m1). -b1 through -b5 select buffer size, which affects memory usage. Default is -b3. -s1 through -s7 select match length and speed. Default is -s1 (fastest, worst compression).

    flashzip 0.91 was released Aug. 17, 2008. Options are like version 0.9. Memory usage was increased to 198 MB for compression and 138 MB for decompression using settings for best compression. Minimum requirement is 10 MB and 6 MB.

    flashzip 0.93a was released Mar. 9, 2009.

    flashzip 0.94 was released Mar. 25, 2009.

    flashzip 0.99 was released July 23, 2009.

    flashzip 0.99b4 (Aug. 25, 2009) is an archiver rather than a compressor. The -s option was renamed to -c and the -b option was increased to -b8 to allow more memory usage. For enwik8, memory usage for both -m1 and -m2 is 182 MB for compression and 162 MB for decompression. For enwik9, memory usage for -m2 is 609 MB for compression and 592 MB for decompression.

    flashzip 0.99b8 (Feb. 28, 2010) has 4 compression levels from -m0 (fastest) to -m3 (best). The buffer size option was increased to -b9 (1 GB). Memory usage depends on the input size. For -m0 -c7 -b7 enwik8, compression takes 214 MB and decompression takes 195 MB. For -m1 through -m3 -c7 -b8, enwik8 compression takes 231 MB and decompression takes 195 MB. For -m3 -c7 -b8, enwik9 compression takes 658 MB and decompression takes 625 MB. Changing -b8 to -b9 has no effect on size, speed, or memory usage for enwik8, but for enwik9 it improves compression and increases memory usage to 1111 MB for compression and 1078 MB for decompression. The -s1 option enables the -b9 option. Otherwise -b9 will cause a "no memory" error.

    flashzip 0.99c1 (June 1, 2011) improves compression and speed. The option ranges are -m0...-m3, -c1...-c7 and -b1...-b7. Only the maximum compression options were tested.

    flashzip 0.99c3 (Oct. 10, 2011) is multi-threaded for compression in modes -m1, -m2, -m3. Decompression runs in a single thread. The archive is compatible with the previous version. In the tested mode (maximum compression), memory usage depends on the file size and climbs steadily during compression or decompression. It is the same for both, and the same as in the previous single-threaded version.

    flashzip 0.99d1 was released Oct. 31, 2011. It has only two options, -m0...-m9 (default -m4) for compression method (fastest...best) and -b1...-b7 (default -b1) for buffer size. Memory usage ranges from 30 MB at -b1 to 1100 MB at -b7.

    flashzip 1.0.0 was released Oct. 3, 2012. Options -m1 to -m7 select the compression level, and -mx7 compresses best. Higher levels compress more slowly and use more memory but have little effect on decompression speed, which is generally faster. Decompression uses the same memory as compression, up to 1.1 GB depending on the file size. Options -b1 to -b7 select buffer size. Larger values use more memory but don't affect speed. The default is -b4. The program can use up to 8 threads and auto-detects the number of available cores. In the high compression modes tested, only 1 of 2 available cores was used. -e creates a self-extracting archive, which extracts to the saved name using both cores.

    flashzip 1.1.2 was released Dec. 12, 2012. It includes a GUI that calls the command line version. The command line version was tested. The compression options were changed to -m0..-m3 and -mx0..-mx3, with -mx3 selecting maximum compression. Options -k0..-k7 select the ROLZ dictionary size; -k7 (256 MB) gives the best compression and uses the most memory. -b1024 selects a buffer size of 1024 MB, which also gives the best compression at the cost of the most memory. There is a -t option for multi-threading, which defaults to -t1 (a single thread). Using more threads makes compression worse. The -e option creates a self-extracting archive by appending the compressed file to a copy of flashzip.exe, so it does not compress any smaller when the decompresser is included.

                    Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------           -------         ----------  -----------  -----------  -----------  ----- -----  --- ---  ----
    flashzip 0.1                      34,053,198  299,443,551     25,734 x  299,469,285     67    51   47 LZP
    flashzip 0.2                      34,053,198  299,443,551     25,257 x  299,468,808     62    52   47 LZP
    flashzip 0.3      5               28,541,292  248,094,851     26,738 x  248,121,589    297    73   86 ROLZ
                      x 5             27,845,033  241,997,412     26,738 x  242,024,150    673    72   86 ROLZ
    flashzip 0.9      (-m1 -s1 -b3)   31,856,012                                           141   124   83 ROLZ
                      -b1             32,088,940                                           148   125   70 ROLZ
                      -b5             31,764,213                                           143   119  132 ROLZ
                      -s4             29,235,064                                           269    99   83 ROLZ
                      -s7             28,370,670                                           928    87   83 ROLZ
                      -m2             31,641,305                                           188   121   83 ROLZ
                      -m2 -s7         27,665,526                                          2081    97   83 ROLZ
                      -m2 -s7 -b5     26,737,801  230,987,395     30,052 x  231,017,447   2476    75  132 ROLZ
    flashzip 0.91     -m2 -s7 -b5     26,068,507  227,945,252     34,222 x  227,979,474   3560   112  198 ROLZ
                      -m1 -s7 -b5     26,851,582                                          1305   127  198 ROLZ
    flashzip 0.93a    -m2 -s7 -b5     26,243,745  227,048,196     36,367 x  227,084,563   1458    95  132 ROLZ
                      -m1 -s7 -b5     27,004,639                                          1030   140  198 ROLZ 26
    flashzip 0.94     -m2 -s7 -b5     26,236,095  226,981,882     35,996 x  227,017,878   2451    87  132 ROLZ 26
                      -m1 -s7 -b5     26,662,405  230,985,291     35,996 x  231,021,287   1275    84  198 ROLZ 26
    flashzip 0.99     -m2 -s7 -b5     26,027,791  224,648,225     37,361 x  224,685,586   2399   110  198 ROLZ 26
                      -m1 -s7 -b5     26,305,210                                          1230   160  132 ROLZ 26
    flashzip 0.99b4   -m2 -c7 -b8     25,804,706  218,328,751    141,207 x  218,469,958   3037    86  609 ROLZ 26
                      -m1 -c7 -b8     26,255,893                                          1580    97  182 ROLZ 26
    flashzip 0.99b8   -m0 -c7 -b8     29,191,973                                           200   110  214 ROLZ 26
                      -m1 -c7 -b8     27,752,588                                           510   110  231 ROLZ 26
                      -m2 -c7 -b8     26,351,718                                          1420   110  231 ROLZ 26
                      -m3 -c7 -b8     26,008,189  220,193,756    119,185 x  220,312,941   3281    84  658 ROLZ 26
                  -s1 -m3 -c7 -b9     26,008,189  218,405,144    119,185 x  218,524,329   3531    89 1111 ROLZ 26
    flashzip 0.99c1   -m3 -c7 -b7     24,840,311  206,005,639    131,128 x  206,136,767   2139   117 1050 ROLZ 26
    flashzip 0.99c3   -m3 -c7 -b7     24,840,025  205,992,947    246,816 x  206,239,763   1925   112 1050 ROLZ 26
    flashzip 0.99d1                   28,022,537                                           253    92   46 ROLZ 26
                              -b7     28,088,756                                           542   102  127 ROLZ 26
                      -m9     -b7     24,363,049  207,354,714    170,353 x  207,525,067   1180    94 1100 ROLZ 26
    flashzip 1.00                     26,788,895                                           168   127   37 ROLZ 26
                              -b7     26,761,559                                           174   123   91 ROLZ 26
                      -m7     -b7     26,761,559                                           762   130  136 ROLZ 26
                      -mx7    -b7     23,869,034  202,363,445    123,053 x  202,486,498   1296   122  802 ROLZ 26
                      -mx7 -e -b7     23,995,498  202,489,909          0 x  202,489,909   1123   123  840 ROLZ 26
    flashzip 1.12     -mx3 -k7 -b1024 24,726,693  211,104,283    151,961 x  211,256,255    581    94 1152 ROLZ 26
    

    .2065 lzham

    lzham alpha 2 is a free, open source (MIT license) file compressor and library by Richard Geldreich Jr., Aug. 21, 2010. LZHAM is short for LZMA-Huffman-Arithmetic-Markov. It is based on LZMA (7zip), but instead of using arithmetic coding throughout, it uses arithmetic coding only for binary decisions and uses Huffman or Polar codes for literals and matches. A Polar code is similar to a Huffman code but is simpler to calculate at a cost of 0.1% in compression. Polar codes are calculated as follows:

    1. Symbols are sorted from highest to lowest frequency.
    2. The total frequency is rounded up to a power of 2.
    3. Individual frequencies are rounded down to a power of 2.
    4. Individual frequencies are doubled in descending order until the sum is equal.
    5. Step 4 is repeated as needed.
    6. At this point each frequency divided by the rounded total is a power of 1/2, so the code lengths are fixed and codes are assigned.
    For example, if the symbols and their frequencies are A=3, B=2, C=1, then the sum (6) is rounded up to 8 and the individual frequencies are rounded down to A=2, B=2, C=1, which sums to 5. We then double A=4, which sums to 7. We cannot double B=4 because the sum would exceed 8, so we continue to C. At this point we have A=4, B=2, C=2, which sums to 8, and we may assign codes of appropriate lengths such as A=0, B=10, C=11.
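    The steps and the worked example can be checked with a short Python sketch. `polar_code_lengths` follows steps 1 to 6 literally; `assign_codes` then assigns canonical prefix codes (shorter codes first), which is one way, assumed here, to assign codes "of appropriate lengths", and it reproduces A=0, B=10, C=11 for this example. Both function names are hypothetical; this is not LZHAM's code.

```python
def polar_code_lengths(freqs):
    """Code length per symbol using the Polar code steps described above.

    freqs maps each symbol to a positive integer frequency.
    """
    # Step 1: sort symbols from highest to lowest frequency.
    symbols = sorted(freqs, key=lambda s: freqs[s], reverse=True)
    # Step 2: round the total frequency up to a power of 2.
    target = 1 << (sum(freqs.values()) - 1).bit_length()
    # Step 3: round each frequency down to a power of 2.
    f = {s: 1 << (freqs[s].bit_length() - 1) for s in symbols}
    total = sum(f.values())
    # Steps 4-5: in descending order, double each frequency that still fits
    # under the target; repeat passes until the sum reaches the target.
    while total < target:
        for s in symbols:
            if total + f[s] <= target:
                total += f[s]
                f[s] *= 2
    # Step 6: f[s] / target is now a power of 1/2, so the code length
    # is log2(target / f[s]).
    return {s: (target // f[s]).bit_length() - 1 for s in symbols}


def assign_codes(lengths):
    """Assign canonical prefix codes (as bit strings) from code lengths."""
    codes = {}
    code = 0
    prev_len = 0
    for s in sorted(lengths, key=lambda s: (lengths[s], s)):
        code <<= lengths[s] - prev_len
        codes[s] = format(code, "b").zfill(lengths[s]) if lengths[s] else ""
        code += 1
        prev_len = lengths[s]
    return codes
```

    Here `polar_code_lengths({"A": 3, "B": 2, "C": 1})` gives `{'A': 1, 'B': 2, 'C': 2}`, and `assign_codes` turns that into `{'A': '0', 'B': '10', 'C': '11'}`. Each doubling pass is guaranteed to make progress because the gap to the target is always a multiple of the smallest rounded frequency.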

    For this test, lzhamtest_x86 was used. There is a _x64 version for 64 bit machines which is faster. The library supports different speeds and dictionary sizes, but the test program does not have any options to select them, so none were used. Decompression uses 67 MB memory vs. 609 MB for compression. Compression uses both cores on the test machine but decompression uses only one.

    Version alpha 3, Aug. 30, 2010, supports all of the options supported by the library. Option -d26 selects a 64M dictionary, the largest supported by the x86 version. (The x64 version supports up to -d29 = 512M.) -m4 selects "uber" compression mode. There are 5 compression levels from -m0 through -m4. The highest two levels use Huffman codes rather than Polar codes. -t2 says to use 2 helper threads (to match the number of cores on the test machine). The default is to use 1 less than the number of cores, up to 16 threads. Decompression is not multi-threaded.

    The x64 version was tested by the author. I guessed at memory usage. Each increment of the -d option approximately doubles memory usage.

                           Compression               Compressed size      Decompresser  Total size   Time (ns/byte)
    Program                  Options                enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------                  -------             ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    lzham alpha 2 x86                             25,907,665  224,554,163     95,922 x  224,650,085   2485    21  609 LZ77 26
    lzham alpha 3 x86     -m4 -d26 -t2            24,991,681  213,868,601    139,694 x  214,008,295   2970    22  611 LZ77 26
    lzham alpha 3 x64     -m4 -d29                24,954,329  206,393,809    155,282 x  206,549,091    595     9 4800 LZ77 45 
    

    .2081 uharc

    uharc 0.6b is a free (for noncommercial use) closed source command line archiver by Uwe Herklotz, Oct. 1, 2005. In maximum compression mode (-mx) it uses PPM. In modes -m1 (fastest) to -m3 (best) it uses ALZ: LZ77 with arithmetic coding. -mz uses LZP. -md32768 selects the maximum dictionary size (uses 50 MB memory, default is -md4096). Additional results for enwik8:
    Options         enwik8    Comp  Decomp (ns/byte)
    -------       ----------  ----  ------
    -mx -md32768  23,911,123  1830  1510
    -mx           23,952,039  1832  1546
    -m3           27,957,245  1840   110
    -m2           28,459,084  1726   110
    -m1           29,660,279  1242   121
    -mz           30,429,795   191   236
    

    .2088 TarsaLZP

    TarsaLZP Aug 8 2007 is a free, experimental file compressor with public domain source code (FASM) by Piotr Tarsa.

    Older versions used order 3 LZP to code the last 16 matches at order 3, followed by order 2 PPM encoding of literals. It takes no command line options but compression/decompression settings may be specified in an initialization file. For this test, default settings were used and others were not tried.

    The Jul 30 2007 version uses 2 LZP models, one with a 4 byte context and one 8 byte. The program selects the one that gives a higher probability of a match. There is no initialization file.
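    The basic LZP idea behind these models can be shown in a short Python sketch. At each position, the previous `order` bytes are looked up in a table that remembers where that context last occurred; the bytes that followed it are the prediction, so a match is sent as a length only, with no offset. This is a simplification under stated assumptions: it uses one context length, a dict instead of a hash table, and plain tuples instead of arithmetic-coded match flags and literals, so it does not reproduce TarsaLZP's two-model scheme or its PPM literal coding. The function names are hypothetical.

```python
def lzp_encode(data: bytes, order: int = 4):
    """LZP: tokens are ("match", length) or ("lit", byte)."""
    table = {}  # last `order` bytes -> position that most recently followed them
    tokens = []
    i = 0
    while i < len(data):
        length = 0
        if i >= order:
            ctx = data[i - order:i]
            p = table.get(ctx)  # predicted position; no offset is transmitted
            table[ctx] = i
            if p is not None:
                while i + length < len(data) and data[p + length] == data[i + length]:
                    length += 1
        if length > 0:
            tokens.append(("match", length))
            i += length
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lzp_decode(tokens, order: int = 4) -> bytes:
    table = {}
    out = bytearray()
    for tok in tokens:
        i = len(out)
        p = None
        if i >= order:
            # Same lookup and update as the encoder, on the bytes decoded so far.
            ctx = bytes(out[i - order:i])
            p = table.get(ctx)
            table[ctx] = i
        if tok[0] == "match":
            for n in range(tok[1]):
                out.append(out[p + n])  # byte-by-byte copy allows overlap
        else:
            out.append(tok[1])
    return bytes(out)
```

    Because the decoder rebuilds the same table from the bytes it has already output, only the match length needs to be sent. A model like the Jul 30 2007 version's would run two such predictors (4 and 8 byte contexts) and choose between them.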

    The Aug 8 2007 version uses 341 MB memory for compression and 333 MB for decompression.

    The interim Aug 10 2007 version runs at high priority. (CAUTION, this will make your computer unusable while running).

    TarsaLZP 29 Jan 2012 is distributed as Java source and class files. It has a GUI interface.

    TarsaLZP 18 Nov 2012 takes several options, but defaults were used for testing. It is available as source code in Python, Java, Javascript, and C. The C version was tested by compiling with MinGW gcc 4.7.0 with options "-O3 -std=c99" in 32 bit Vista.

                              Compressed size      Decompresser  Total size   Time (ns/byte)
    Program                  enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------                ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    TarsaLZP Jul  4 2006   35,745,297  334,661,013      2,255 sd 334,663,268    149   163   54 LZP
    TarsaLZP Jul 30 2006   34,321,697  320,160,237      1,455 xd 320,161,692    110   117   54 LZP
    TarsaLZP Aug  5 2006   32,270,002  295,312,202      1,579 xd 295,313,781    110   127   70 LZP
    TarsaLZP May  6 2007   32,461,606  297,130,840      1,580 xd 297,132,420     97   121   71 LZP
    TarsaLZP Jun 17 2007   31,233,381  283,895,945      1,604 xd 283,897,549    100   122   71 LZP
    TarsaLZP Jul 18 2007   31,363,533  285,248,058      2,365 xd 285,250,423     88   105   71 LZP
    TarsaLZP Jul 30 2007   26,664,933  233,613,937      2,472 xd 233,616,409    247   255   42 LZP
    TarsaLZP Aug  8 2007   25,134,862  215,301,412      2,843 xd 215,304,255    249   287  341 LZP
    TarsaLZP Aug 10 2007   25,135,357  215,301,079      3,546 xd 215,304,626    269   322  341 LZP
    TarsaLZP Jan 29 2012   24,751,389  208,867,187     13,081 s  208,880,268    203      ~2000 LZP 54
    TarsaLZP Nov 18 2012   24,860,676  211,990,481     20,303 s  212,010,784    244   277  330 LZP 26
    

    .2090 GRZipII

    GRZipII 0.2.4 is a free, open source (LGPL) command line file compressor by Grebnov Ilya, Feb. 12, 2004. It uses BWT. The -b8m option selects the maximum block size of 8 MB.

    .2091 4x4

    4x4 0.2a is a free, open source file compressor by Bulat Ziganshin, June 2, 2008. It is a wrapper around GRZipII, tornado, and LZMA (7zip), and a subset of the FreeARC archiver. Source code is included in the FreeARC distribution. The program allows arguments to be passed to each compressor, plus 16 preset options. Only the fastest and slowest preset option for each compressor was tested. Options 1-7 are tornado, 8-12 are LZMA, and 1t-4t are GRZipII.
                    Compression               Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options                enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------     -------------------------- ----------  -----------  -----------  -----------  ----- -----  --- ---
    4x4 0.2a    1 (tor:1:4m)               59,711,544                                            17    13   54 LZ77
                7 (tor:7:64m)              32,433,532                                           197    24  230 LZ77
                8 (lzma:fast:128m:ht4:mc8) 32,698,603                                           292    43  230 LZ77
               12 (lzma:128m:ht4:mc128)    27,307,504                                          4354    43  230 LZ77
               1t (grzip:m4)               26,576,294                                           167   232  128 BWT
               4t (grzip:m1:h18)           23,833,244  208,787,642     317,097 x 209,104,739    386   240  269 BWT
    

    .2101 rzm

    rzm 0.06c (mirror) is a free file compressor by Christian Martelock, Mar. 4, 2008. It uses order-1 ROLZ as discussed here. It takes no options. Memory usage is advertised as 258 MB for compression and 130 MB for decompression. Measured values (shown) are 180 MB for compression and 104 MB for decompression.
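    Order-1 ROLZ (reduced offset LZ) restricts match candidates to recent positions that followed the same previous byte, so a match can be coded as a small index into that byte's candidate list plus a length instead of a full offset. The following is a minimal Python sketch, assuming 16 candidates per context, a minimum match length of 3, and plain tuples instead of entropy-coded output; rzm's actual parameters and coding are not given here. Only token start positions are added to the candidate lists, a simplification that keeps the encoder and decoder in step. The function names are hypothetical.

```python
from collections import defaultdict, deque


def rolz_encode(data: bytes, slots: int = 16, min_len: int = 3):
    """Order-1 ROLZ: tokens are ("match", index, length) or ("lit", byte)."""
    # previous byte -> recent positions that followed it, newest first
    table = defaultdict(lambda: deque(maxlen=slots))
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_idx = 0, -1
        if i > 0:
            candidates = table[data[i - 1]]
            for idx, p in enumerate(candidates):
                n = 0
                while i + n < len(data) and data[p + n] == data[i + n]:
                    n += 1
                if n > best_len:
                    best_len, best_idx = n, idx
            candidates.appendleft(i)  # record after searching, as the decoder does
        if best_len >= min_len:
            tokens.append(("match", best_idx, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def rolz_decode(tokens, slots: int = 16) -> bytes:
    table = defaultdict(lambda: deque(maxlen=slots))
    out = bytearray()
    for tok in tokens:
        i = len(out)
        candidates = table[out[i - 1]] if i > 0 else None
        if tok[0] == "match":
            _, idx, length = tok
            p = candidates[idx]  # look up before recording position i
            for n in range(length):
                out.append(out[p + n])  # byte-by-byte copy allows overlap
        else:
            out.append(tok[1])
        if candidates is not None:
            candidates.appendleft(i)
    return bytes(out)
```

    With 16 slots, a match index fits in 4 bits regardless of how far back the match is, which is the main saving over plain LZ77 offsets; the cost is that the decoder must maintain the same candidate lists.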

    rzm 0.07h was released Apr. 24, 2008. Advertised memory usage is unchanged.

                    Compression            Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options             enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------           -------           ----------  -----------  -----------  -----------  ----- -----  --- ---
    rzm 0.06c                           24,429,597  210,719,085     12,903 x  210,731,988   2216    92   180 ROLZ
    rzm 0.07h                           24,361,070  210,126,103     17,667 x  210,143,770   2336    81   160 ROLZ
    

    .2104 pim

    pim 2.01 is a free GUI archiver by Ilia Muraviev, based on PPMd by Dmitry Shkarin, using PPM. Version 2.01 was released June 14, 2007. It has options to model color images and .exe files. These make no difference on text and were turned off. It was timed with a watch.

    pim 2.04 beta was released July 21, 2007. It has PPMd as its only option.

    pim 2.10 was released July 31, 2007. Older versions are no longer supported.

    pim 2.50 was released July 22, 2008. It supports 3 compression modes: store, normal, and best. Only best was tested. It compresses in PPMd, bzip2 and DCL formats and extracts BALZ, QUAD, ZIP, JAR, PK3, PK4 and QUAKE PAK archives.

                    Compression            Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options             enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------           -------           ----------  -----------  -----------  -----------  ----- -----  --- ---
    pim 2.01   PPMd, no exe, no color   24,303,638  210,124,895    340,951 x  210,465,846   ~600   639   92 PPM
    pim 2.04b  PPMd                     24,303,638  210,124,895    335,004 x  210,459,899    900   780   84 PPM
    pim 2.10   PPMd                     24,303,638  210,124,895    335,374 x  210,460,269    895  ~900   84 PPM
    pim 2.50   best                     24,303,638  210,124,895    330,901 x  210,455,796    764  ~764   88 PPM
    

    .2118 xz

    xz 5.0.1 is a free, open source file compressor by Lasse Collin, released Jan. 29, 2011. It defines the .xz container format and uses the public domain LZMA2 compressed format from 7zip by Igor Pavlov. There are versions for most operating systems including Windows and Linux. The Windows version was tested. The option -9 selects maximum compression and memory. The default is -6. The option -e (extreme) gives better compression at a cost in compression (but not decompression) time.

    Program size is based on xz.exe. There is a separate decompressor (xzdec.exe) which is smaller and decompresses to standard output, but the Windows version does not work because it outputs in text mode. Additional results are shown below for enwik8 for compression and decompression time (ns/byte) and compression and decompression memory (in MB).

    Version     Options    enwik8    Ctime Dtime Cmem Dmem Note
    --------    -------  ----------  ----- ----- ---- ---- ----
    xz 5.0.1    -9 -e    24,831,648  2310   40   690  66   26
                -9       24,865,244  2600   40   690  66   26
                         26,375,764  2020   45    95   8   26
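    The same presets are available from Python's standard `lzma` module, which wraps liblzma from XZ Utils. This sketch maps the level `-0` to `-9` to the `preset` argument and `-e` to the `lzma.PRESET_EXTREME` flag. It is only an illustration and is not how the numbers above were measured. The high presets need hundreds of MB for compression, as in the table, so it is safest to try small presets first. The helper names are hypothetical.

```python
import lzma


def xz_compress(data: bytes, level: int = 6, extreme: bool = False) -> bytes:
    """Compress to the .xz container, roughly like `xz -<level> [-e]`."""
    if not 0 <= level <= 9:
        raise ValueError("xz presets range from 0 to 9")
    preset = level | (lzma.PRESET_EXTREME if extreme else 0)
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)


def xz_decompress(packed: bytes) -> bytes:
    """Decompress a .xz stream (the container format is detected automatically)."""
    return lzma.decompress(packed)
```

    For example, `xz_compress(text, 9, extreme=True)` corresponds to the "-9 -e" row above, and `xz_decompress` reverses it.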
    

    .2120 CTW

    CTW 0.1 is a free, command line file compressor with source code by Erik Franken and Marcel Peeters, Nov. 13, 2002. It uses CTW (context tree weighting), a type of context-mixing algorithm (with single bit prediction and arithmetic coding) combining the predictions of different order contexts. Statistics are stored in a suffix tree.

    The -d6 option selects order 6 (depth of context tree). -n16M selects the maximum of 16M nodes for the tree (using 128 MB memory). -f16M selects the maximum 16 MB file buffer (for rebuilding pruned contexts). The default values of all other options were tested on enwik6 and found optimal. For -d, there is a tradeoff between compression and memory usage as with PPM compressors. -d6 was found optimal on both enwik7 and enwik8.

    Option    enwik7     enwik8      enwik9     Comp (ns/byte)
    ------  ---------  ----------  -----------  -----
    -d5     2,490,460  24,174,511               11340
    -d6     2,438,708  23,670,293  211,995,206  19221
    -d7     2,455,765  23,689,423               24680
    -d9     2,494,767
    -d12    2,531,284
    

    .2139 boa

    boa 0.58b is a free, closed source command line archiver by Ian Sutton, Apr. 2, 1998. It uses PPM. The -m15 option selects maximum memory, 15 MB.

    .2139 packet

    packet 0.01 is a free, experimental file compressor by Nania Francesco Antonio, May 11, 2008. It uses LZP. It takes no options.

    packet 0.02, May 16, 2008, improves compression for .wav files and supports files over 2 GB.

    packet 0.03b, May 20, 2008, uses LZ77 with 3 MB of memory for compression and 1 MB for decompression. It takes an optional argument 'x' meaning better but slower compression, and a level 1 through 6, where 6 is slowest with best compression.

    packet 0.90b, June 18, 2008, has options -m1 to -m4 (method) and -s0 to -s9 (intensity). All options use 10 MB for compression and 2 MB for decompression.

    packet 0.91b, Aug. 6, 2009, has methods -m1 through -m6, where -m6 is maximum compression. Decompression requires 1.5 MB.

    packet 1.0 (discussion) was released Aug. 4, 2013. Options -m0..-mx9 select compression level (default -m4). Option -t2 selects 2 threads (default -t1).

    packet 1.1 (discussion) was released Dec. 7, 2013 for 64 bit Windows. It was tested in Ubuntu under wine. Option -m9 (or -mx) selects maximum compression. Default is -m4. -b512 selects maximum buffer size of 512 MB. Default is -b64. -h4 selects maximum number of buffers. Default is -h2.

                   Compression           Compressed size      Decompresser  Total size  Time (ns/byte)
    Program          Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------          -------         ----------  -----------  -----------  -----------  ----- ----- ---- ---- ----
    packet 0.01                      37,637,275  334,473,465     30,508 x  334,503,973     50    43    4 LZP
    packet 0.02                      37,637,276  334,473,466     27,900 x  334,501,366     58    42    4 LZP
    packet 0.03b     1               35,576,495                                           140    20    3 LZ77
                     x 1             34,792,199                                           170    20    3 LZ77
                     6               34,563,297                                           450    20    3 LZ77
                     x 6             33,752,502  297,266,174     26,435 x  297,292,609    594    18    3 LZ77
    packet 0.90b     -m1 -s0         35,426,140                                           199    28   10 LZ77
                     -m1 -s9         32,780,039                                          2887    26   10 LZ77
                     -m2 -s0         34,281,503                                           274    24   10 LZ77
                     -m2 -s9         31,968,711                                          4527    25   10 LZ77
                     -m3 -s0         34,966,621                                           236    56   10 LZ77
                     -m3 -s9         32,199,212                                          2965    51   10 LZ77
                     -m4 -s0         33,612,046                                           307    61   10 LZ77
                     -m4 -s3         32,033,412                                           861    57   10 LZ77
                     -m4 -s6         31,367,386                                          2411    57   10 LZ77
                     -m4 -s9         31,208,752  273,176,127     32,305 x  273,208,432   3871    48   10 LZ77
    packet 0.91b     -m6 -s9         31,306,703  274,033,491     45,358 x  274,078,849   3669    36   10 LZ77 26
    packet 1.0       -m4             28,349,717                                           487    37  416 LZ77 26
                     -m4 -t2         28,789,607                                           385    53  500 LZ77 26
                     -m9             27,439,216                                          4530    37  425 LZ77 26
                     -mx9            26,895,256  232,428,377    114,566 x  232,542,943  19749    34  429 LZ77 26
    packet 1.1                       26,848,041  233,803,751    265,102 x  234,068,853    295    26  335 LZ77 48
                     -m9 -b512 -h4   25,624,659  216,849,389    265,102 x  217,114,491    647    26 1500 LZ77 48
                     -mx -b512 -h4   25,348,872  213,722,850    265,102 x  213,987,952    767    26 1500 LZ77 48
    

    .2144 yzx

    yzx 0.01 (discussion) is a free, experimental command line archiver by Nania Francesco Antonio, May 3, 2010. It uses "LZKS", described as an LZ type algorithm. Option -b5 selects maximum memory. Option -m2 selects method 2 (default is -m1). -c8 selects number of match keys (range -c1 to -c8, default -c3). Memory usage is 732 MB for compression and 137 MB for decompression.

    yzx 0.02, May 7, 2010, corrects a bug in compression.

    yzx 0.03 was released May 21, 2010. The range of options is -m1..m2, -c1..c5, -b1..b6. Memory usage with -m2 -c5 -b6 is 404 MB for compression and 268 MB for decompression.

    yzx 0.04 was released May 27, 2010. Decompression memory remains at 268 MB.

    yzx 0.11 was released Jan. 4, 2012. Options -m0..-m9 select compression method (fast..slow). Options -b1..-b8 select ring buffer size (small..large). Options -h1..-h6 select search buffer size (small..large). Default is -m2 -b2 -h4. There was not enough memory to test maximum compression (-m9 -b8 -h6) without reducing either -b or -h.

    Compressor   Opt            enwik8      enwik9         Prog      Total        Comp Decomp  Mem Alg  Note
    ---------    ---          ---------   -----------     -------  -----------    ----  ----   --- ---- ----
    yzx 0.01     -b5          28,984,962  249,903,552    116,793 x  250,020,345    395    73   732 LZ   26
    yzx 0.02     -m2 -c8 -b5  27,293,259  229,890,264    116,795 x  230,007,059  10927    67   732 LZ   26
    yzx 0.03     -m2 -c5 -b6  28,132,853  241,790,934    116,141 x  241,907,075    911    71   404 LZ   26
    yzx 0.04     -m2 -c5 -b6  27,670,096  235,198,449    116,507 x  235,314,956    833    69   535 LZ   26
    yzx 0.11                  27,694,742                                           258    85   293 LZ   26
                 -m9 -b8 -h5  25,768,724                                           518    81   636 LZ   26
                 -m9 -b7 -h6  25,754,856  214,317,684    131,062 x  214,448,746    642    77  1590 LZ   26
    

    .2178 tornado

    tornado 0.1 is a free, open source file compressor by Bulat Ziganshin, Apr. 16, 2007. It uses LZ77 with arithmetic coding. The -9 option selects a predefined compression profile for maximum compression. There are custom options for hash table size, hash chain length, block size, type of coder, and an option to force or prohibit cache matching. Some of these options might give better compression, but were not tested.
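Hash chains, one of the tunable parts listed above, can be sketched as follows. Each position is linked to the previous position that starts with the same 3 bytes, and the match finder walks at most `max_chain` links looking for the longest match. Longer chains can find longer matches at the cost of more time. This is a hypothetical illustration (`hash_chain_matches` is not tornado's API): it keys on the exact 3 bytes instead of a fixed-size hash table, and it does no parsing or coding.

```python
HASH_BYTES = 3   # bytes used to start a chain (assumption)

def hash_chain_matches(data: bytes, max_chain: int = 16, window: int = 1 << 20):
    """Return (length, distance) of the longest match found at each position."""
    head = {}    # 3-byte prefix -> most recent position with that prefix
    prev = {}    # position -> previous position with the same prefix (the chain)
    result = []
    for i in range(len(data)):
        best_len, best_dist = 0, 0
        if i + HASH_BYTES <= len(data):
            key = data[i:i + HASH_BYTES]
            cand = head.get(key)
            steps = 0
            # Walk the chain from newest to oldest, up to max_chain candidates.
            while cand is not None and steps < max_chain and i - cand <= window:
                length = 0
                while i + length < len(data) and data[cand + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - cand
                cand = prev.get(cand)
                steps += 1
            # Insert position i at the head of its chain.
            prev[i] = head.get(key)
            head[key] = i
        result.append((best_len, best_dist))
    return result
```

For example, in b"abcdXabcYabcd" the match at position 9 is only 3 bytes long with `max_chain=1`, because the nearer candidate is checked first, but 4 bytes long when the chain is followed back to position 0.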

    tornado 0.3 has options -1 through -12. Each increment approximately doubles compression time and memory usage. Decompression is fast in all cases, and decompression memory is approximately 2/3 that of compression (for the LZ77 buffer). -12 caused disk thrashing and was not tested for enwik9. There are several other options that were not tested.

    tornado 0.4a was released June 1, 2008. It includes Windows and Linux versions. There is a small version (tor-small.exe) which does not include some of the advanced options. The advanced options were not tested. Option -12 caused disk thrashing (2 GB memory) when compression of enwik9 was 80% complete, so -11 was used instead.

    tornado 0.6, Mar. 8, 2014, adds optimal parsing. It has 16 compression levels. The default is -5. For testing (note 48) it was compiled from source in Linux with g++ 4.8.1 using the provided build.sh script. Windows and Linux 32 and 64 bit executables are also provided.

                    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp   Mem Alg  Note
    -------           -------        ----------  -----------  -----------  -----------  ----- -----   --- ---- ----
    tornado     0.1     -9           34,491,218  303,034,530     20,336 s  303,054,866    204    25   210 LZ77
    tornado     0.3     -1           59,790,826                                            18             LZ77
                        -2           44,570,662                                            22             LZ77
                        -3           40,173,986                                            28             LZ77
                        -4           37,849,654                                            60             LZ77
                        -5           34,206,892                                            81             LZ77
                        -6           33,319,753                                           130             LZ77
                        -7           32,346,652                                           195          96 LZ77
                        -8           31,659,225                                           304         192 LZ77
                        -9           30,967,871                                           506         384 LZ77
                        -10          30,614,648                                           802         768 LZ77
                        -11          30,274,896  259,412,590     45,833 s  259,458,423   1646    25  1510 LZ77
                        -12          30,057,549                                          3700    28  1768 LZ77
    tornado     0.4a    -11          30,157,610  258,761,459     42,516 s  258,803,975    783    25  1513 LZ77
                        -12          30,026,843                                          3200    29 >1800 LZ77
    tornado     0.6     -1           59,790,838  531,349,003                                8     5     2 LZ77  48
                        -2           49,093,116                                             8     6     3 LZ77  48
                        -3           39,510,585                                            14     9     5 LZ77  48
                        -4           38,018,770                                            18     9    11 LZ77  48
                        -5           34,175,257  300,482,758                               41     9    25 LZ77  48
                                     34,175,257  300,482,758                               93    24    29 LZ77  26       
                        -6           32,921,124                                            57    10    97 LZ77  48
                        -7           30,131,376                                           134    10   229 LZ77  48
                        -8           29,507,281                                           290    11   613 LZ77  48
                        -9           29,327,427                                           392    11   613 LZ77  48
                        -10          29,048,467                                           371    11   628 LZ77  48
                        -11          30,108,427                                           270    10   356 LZ77  48
                        -12          28,596,548                                           397     9   356 LZ77  48
                        -13          28,042,448                                           503     9   484 LZ77  48
                        -14          27,129,826                                           672     9   614 LZ77  48
                        -15          26,762,749                                           985    10   614 LZ77  48
                        -16          25,768,105  217,749,028     83,694 s  217,832,722   1482     9  1290 LZ77  48
    

    .2178 LZPXj

    LZPXj 1.1d is an experimental open source (GPL) command line file compressor by Ilia Muraviev and Jan Ondrus, May 21, 2006. The -m3 option selects maximum compression. The -e0 option turns off the exe filter (has no effect on text). The -r3 and -a0 options were tuned experimentally on enwik7. -r sets the rescale rate (range 1-5, default 3). -a0 turns off the alternate one byte matcher (default -a1 = on).

    LZPXj 1.2h, Mar. 6, 2007, uses LZP + PPM with a preprocessor for x86 executables. It has just one option (1-9), which selects memory usage. The default is 6. The maximum is 9. Each increment doubles usage.

                    Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Notes
    -------           -------                     ----------  -----------  -----------  -----------  ----- -----  --- ---  -----
    LZPXj 1.1b        -s (best, = -r4 in 1.1d)    28,387,611                                           674            LZP
    LZPXj 1.1b           (default)                28,440,958                                           677            LZP
    LZPXj 1.1d        -m3 -r4 -a0 -e0             28,386,512  246,468,866      6,534 s  246,475,400    362   402  216 LZP
    LZPXj 1.2h        9                           25,205,783  217,880,584      4,853 s  217,885,437    783   717 1316 PPM  
    

    .2179 scmppm

    scmppm 0.93.3 is a GPL open source command line compressor for XML files by James Cheney and Joaquín Adiego, Oct. 3, 2005. It uses PPMd var. I by Dmitry Shkarin. It works by grouping XML data by tag, then compressing with ppmd (similar to XMill). scmppm is distributed as UNIX source code only. For this test it was compiled and run under WinXP using the latest version of Cygwin, g++, flex, and make as of May 24, 2006. To compile I had to add the line extern "C" int fileno(FILE*); to lex.yy.c.

    The -l 9 option selects maximum compression.
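The grouping step can be illustrated with a small, hypothetical `group_by_tag` function. It collects the text of leaf elements into one stream per tag name, so that similar text (all titles, all ids) ends up together before a compressor such as ppmd sees it. This is a simplification under stated assumptions: only non-nested elements of the form <tag>text</tag> are handled, and the document structure, which scmppm must also store in order to decompress, is discarded.

```python
import re
from collections import defaultdict

# Leaf elements only: <tag>text</tag> with no markup inside (simplification).
LEAF = re.compile(rb"<(\w+)>([^<]*)</\1>")

def group_by_tag(xml: bytes) -> dict:
    """Collect the text of each leaf element into one stream per tag name."""
    streams = defaultdict(list)
    for m in LEAF.finditer(xml):
        streams[m.group(1)].append(m.group(2))
    # Separate the items of each stream with NUL bytes.
    return {tag: b"\x00".join(parts) for tag, parts in streams.items()}
```

Each resulting stream would then be compressed with its own model, which is where the gain over compressing the interleaved text would come from.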

    .2185 acb

    acb (discussion) is a shareware archiver for DOS by George Buyanovsky. It achieved some popularity in Russia in 1997 after being described in a popular magazine there. acb uses a complex variant of LZ77 called "associative coding" (ACB means "associative coding by Buyanovsky"). History is collected in a context sorted ring (like BWT) called a "funnel of analogies". A string match is coded by the position of the longest (nearest) match in this data structure. The length is coded depending on the lengths of neighboring matches. The result is arithmetic coded.

    There are 4 versions: All versions limit file size to 64 MB but do not limit archive size. To test enwik8, it was divided into 2 equal parts of 50 MB and compressed into one archive. Archives are compressed in "solid" mode.

    enwik9 was divided into 16 equal parts of 62.5 MB each (named 01 through 16) and compressed to 16 separate archives. The compressor crashed with an illegal interrupt (after 12 hours and producing 1474 MB of output in 3 files) when attempting to compress enwik9 into a single archive.

    .2186 crushm

    crushm is a free file compressor for Windows by Abhilash, July 12, 2013. It uses CM. It takes no options.

    .2190 PX

    PX v1.0 is a free command line file compressor by Ilia Muraviev, Feb. 17, 2006. It is a context mixing compressor based on PAQ1 with fixed weight models.

    .2196 DGCA

    DGCA v1.10 is a free, closed source GUI archiver, Aug. 8, 2006. The installer is in Japanese but the program runs in several languages including English. It was tested with default settings except for producing a self extracting archive. This adds 189,936 bytes to enwik8.

    .2200 Squeez

    Squeez 5.20.4600 is a commercial (60 day trial) GUI archiver by SpeedProject, Apr. 11, 2006. It supports 13 different formats, but only the native .sqx (possibly LZ77) format was tested. The options used were 2.0 format (newest), 32 MB dictionary (largest, actually uses 365 MB memory), Ultra compression (best), and all checkboxes off (including no exe or multimedia compression). There is an SFX option, but using UnSqueez to decompress instead gives a smaller size.

    .2212 fpaq2

    fpaq0s2 is a free, open source (GPL) file compressor by Nania Francesco Antonio, Sept. 29, 2006. It is an order 2 model based on the order 0 compressor fpaq0s by David A. Scott, which modifies the arithmetic coder of fpaq0 by Matt Mahoney. fpaq0x is the same order 2 model based directly on fpaq0.

    fpaq0x1a is an order 3 model (hashed context) using fpaq0's arithmetic coder. fpaq0s2b is a similar model based on fpaq0s. Both were released Oct. 1, 2006.

    fpaq0x1b (Oct. 6, 2006) switches between different models up to order 3.

    fpaq0s3 (Oct. 8, 2006) uses a simple order 0 model on groups of 3 bytes.

    fpaq0s4 (Oct. 12, 2006) uses a combined order 0-1-2, PPM and LZ model.

    fpaq0s5 (Oct. 15, 2006) improves on fpaq0s4. Memory usage is 200 MB when run at normal priority and 160 MB when run at below normal priority (WinXP Home).

    fpaq2 (Oct. 21, 2006) uses a combination context mixing and PPM algorithm.

    fpaq0s6 (Oct. 30, 2006) improves on fpaq0s5.

    fastari (Nov. 7, 2006) is an order 2 compressor with an all new arithmetic coder and greater speed.

    fpaq3 (Nov. 20, 2006) is an order 3 compressor.

    fpaq3b (Dec. 2, 2006) is a bitwise order 28 compressor.

    fpaq3c (Dec. 21, 2006) is an improved bitwise order 28 compressor.

    fpaq3d (Dec. 28, 2006) adds an option to fpaq3c to select memory usage from 16 MB to 2 GB. Option 6 selects 1 GB memory (the highest tested).

    All programs are here.
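The fpaq0 family codes one bit at a time, predicting each bit from the bits already seen in the current byte; the order 2 and order 3 variants above add the preceding bytes to that context. Below is a minimal sketch of the modeling half. It assumes a simple count-based (KT-style) estimator and exact byte contexts, instead of the hashed tables and whatever update rule each program actually uses. `bitwise_code_length` is a hypothetical name, and the arithmetic coder is replaced by the ideal code length -log2 p.

```python
import math

def bitwise_code_length(data: bytes, order: int = 0) -> float:
    """Ideal code length (bits) of `data` under a bitwise order-`order` model."""
    counts = {}                  # (previous bytes, partial byte) -> [n0, n1]
    total = 0.0
    for i, byte in enumerate(data):
        ctx = data[max(0, i - order):i]    # the preceding `order` bytes
        node = 1                           # bits seen so far, with a leading 1
        for shift in range(7, -1, -1):     # most significant bit first
            bit = (byte >> shift) & 1
            n = counts.setdefault((ctx, node), [0, 0])
            p = (n[bit] + 0.5) / (n[0] + n[1] + 1.0)
            total -= math.log2(p)          # cost an ideal arithmetic coder would pay
            n[bit] += 1
            node = node * 2 + bit
    return total
```

On text with long repeats, such as b"abracadabra" * 50, the order 2 estimate comes out well below the order 0 estimate, which mirrors the ordering of the o2 and o0-style entries in the table.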

    Program Opt   enwik8      enwik9     prog (zip)   enwik9+prog  Comp Decomp  Mem Alg
    ------- --- ----------  -----------  -----------  -----------  ----- -----  --- --
    fpaq2       25,287,775  221,242,386      3,429 s  221,245,815  20183 20186  131 CM
    fpaq3d    6 26,656,082  233,750,402      3,309 s  233,753,711   1922  1938 1050 o28b
    fpaq3c      27,978,995  248,253,886      2,535 s  248,256,421   1446  1456  268 o28b
    fpaq0s6     30,012,650  263,438,012      4,150 s  263,442,162    547   505  174 PPM
    fpaq0s5     30,374,122  266,244,843      4,027 s  266,248,870    480   419  200 PPM
    fpaq3b      29,992,583  270,804,549      2,926 s  270,807,475   1526  1517  256 o28b
    fpaq3       31,176,104  282,922,749      8,820 x  282,931,569   1770  1807  250 o3
    fpaq0x1b    30,860,828  283,001,299      2,727 s  283,004,026   1178  1180 1094 PPM
    fpaq0s4     33,327,611  311,104,858      3,528 s  311,108,386    477   473  147 PPM
    fpaq0x1a    36,186,433  339,131,763      2,561 s  339,134,324    621   623 1052 o3
    fpaq0s2b    35,934,548  343,603,459      3,029 s  343,606,488    599   605 1052 o3
    fastari     39,392,220  371,909,475      2,287 s  371,911,762    224   261  133 o2
    fpaq0s2     38,812,873  375,050,952      2,982 s  375,053,934    591   595  131 o2
    fpaq0x      38,845,305  375,276,899      2,482 s  375,279,381    631   631  263 o2
    fpaq0s3     49,728,923  490,781,136      3,000 s  490,784,136    525   475   32 o2
    

    .2217 TinyCM

    TinyCM 0.1 is a free, open source (GPL v3) file compressor by David Werecat, Oct. 12, 2012. It uses an order 1-2-3-6 context mixing model. It takes one option, a single digit "level" which apparently has no effect except to store the value in the first byte of the archive. (I used "9"). Memory is the same for compression and decompression. The supplied executables require MSVCR110.dll, which I did not have, so I recompiled the source code with g++ 4.6.1 using "gcc -O3 -march=native -s *.c -I." on a 2.0 GHz T3200 under 32 bit Vista.
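Context mixing combines the bit predictions of several models. One common approach, used in the PAQ family, is logistic mixing. Each prediction p is mapped to stretch(p) = ln(p/(1-p)), the stretched values are combined with weights, and the result is squashed back to a probability. The weights are then adjusted by online gradient descent on coding cost. The sketch below mixes order 1, 2, 3 and 6 count models in that way. Whether TinyCM mixes exactly like this is an assumption, and the learning rate, initial weights and count estimator are arbitrary choices. `cm_code_length` is hypothetical and returns the ideal code length in bits.

```python
import math

def stretch(p: float) -> float:
    return math.log(p / (1.0 - p))

def squash(x: float) -> float:
    x = max(-40.0, min(40.0, x))           # keep exp() in range
    return 1.0 / (1.0 + math.exp(-x))

def cm_code_length(data: bytes, orders=(1, 2, 3, 6), lr: float = 0.02) -> float:
    """Ideal code length (bits) of `data` when order-n bit models are mixed."""
    tables = [{} for _ in orders]          # one count table per context order
    weights = [0.3] * len(orders)
    total = 0.0
    for i, byte in enumerate(data):
        ctxs = [data[max(0, i - k):i] for k in orders]
        node = 1                           # partial byte with a leading 1
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            counts = [t.setdefault((c, node), [0, 0]) for t, c in zip(tables, ctxs)]
            # each model's P(bit = 1), mapped to the logistic domain
            st = [stretch((n[1] + 0.5) / (n[0] + n[1] + 1.0)) for n in counts]
            p = squash(sum(w * s for w, s in zip(weights, st)))
            p = min(max(p, 1e-6), 1.0 - 1e-6)
            total -= math.log2(p if bit else 1.0 - p)
            # gradient step on coding cost: w_i += lr * (bit - p) * stretch(p_i)
            err = bit - p
            weights = [w + lr * err * s for w, s in zip(weights, st)]
            for n in counts:
                n[bit] += 1
            node = node * 2 + bit
    return total
```

The mixer learns to trust whichever order predicts best in practice. On highly repetitive input, that is typically the order 6 model once its contexts have been seen.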

    Compressor   Opt         enwik8      enwik9         Prog       Total       Comp Decomp  Mem Note
    ---------    ---       ---------   -----------     -------   -----------   ----  ----   --- ----
    TinyCM 0.1     9      25,913,605   221,773,542      12,553 x 221,786,095   1342  1330  1083 26
    

    .2226 dmc

    dmc is the original DMC compressor written by Gordon V. Cormack in 1987 and described in "Data Compression using Dynamic Markov Modelling", by Gordon Cormack and Nigel Horspool in Computer Journal 30:6 (December 1987). The algorithm is the same as described in hook with the last 2 arguments fixed at "2 2". The dmc argument "c 1800000000" means to compress with 1.8 GB memory. The memory size must also be given for decompression, so 10 bytes (the size of the argument) were added to the decompresser size (source zipped with Info-Zip 2.31 -9). Because dmc compresses and decompresses from stdin to stdout, it was tested in Linux (Ubuntu 2.6.15.27-amd64-generic), compiled in gcc 4.0.3 x86-64 as follows:
      gcc -O -s -Dexp=expand dmc.c
    
    and tested on a 2.2 GHz Athlon-64 with 2 GB memory. The compiler argument "-Dexp=expand" removes a compiler error due to a K&R style redefinition of exp().
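DMC predicts one bit at a time from a finite state machine and grows the machine by cloning. Suppose the transition from state s to state t on bit b has been used at least MIN_CNT1 times, and t is also entered often from other states (its remaining count is at least MIN_CNT2). Then t is cloned, s is redirected to the clone, and t's counts are split between t and the clone in proportion to the traffic. The sketch below follows that rule. The starting model (a byte-aligned 255-state tree), the 0.2 initial counts, and the mapping of the "2 2" arguments onto `min_cnt1` and `min_cnt2` are assumptions. `dmc_code_length` is a hypothetical function that returns the ideal code length instead of running an arithmetic coder. When `max_states` (standing in for the memory argument) is reached, this sketch simply stops cloning.

```python
import math

class State:
    """One DMC state: bit counts and the next state for each bit value."""
    __slots__ = ("n", "nxt")
    def __init__(self, n0: float, n1: float):
        self.n = [n0, n1]
        self.nxt = [None, None]

def initial_model() -> State:
    """Byte-aligned start: a 255-node binary tree whose leaves return to the root."""
    nodes = [State(0.2, 0.2) for _ in range(256)]   # nodes[0] is unused
    for i in range(1, 256):
        for b in (0, 1):
            j = 2 * i + b
            nodes[i].nxt[b] = nodes[j] if j < 256 else nodes[1]
    return nodes[1]

def dmc_code_length(data: bytes, min_cnt1: float = 2.0, min_cnt2: float = 2.0,
                    max_states: int = 1 << 20) -> float:
    s = initial_model()
    states = 255
    total = 0.0
    for byte in data:
        for shift in range(7, -1, -1):
            b = (byte >> shift) & 1
            total -= math.log2(s.n[b] / (s.n[0] + s.n[1]))
            t = s.nxt[b]
            t_total = t.n[0] + t.n[1]
            # Clone t if the edge s->t is busy and t is also reached from elsewhere.
            if (states < max_states and s.n[b] >= min_cnt1
                    and t_total - s.n[b] >= min_cnt2):
                ratio = s.n[b] / t_total          # share of t's traffic from s
                clone = State(t.n[0] * ratio, t.n[1] * ratio)
                t.n[0] -= clone.n[0]
                t.n[1] -= clone.n[1]
                clone.nxt = list(t.nxt)           # same successors as t
                s.nxt[b] = clone
                t = clone
                states += 1
            s.n[b] += 1
            s = t
    return total
```

Each clone starts with the same predictions as the state it was copied from, then specializes to the longer context that leads into it. This is how DMC reaches high orders without an explicit context length.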

    .2276 szip

    szip 1.12a is a free, open source file compressor by Michael Schindler, Mar. 3, 2000. It uses a modified BWT (a Schindler transform) which sorts using a truncated string comparison to speed the transform on highly redundant data. The algorithm is protected by patent 6,199,064 in the U.S. until Nov. 19, 2017. The first version of szip was released on June 2, 1997.

    The option -b41o16 selects a block size of 4.1 MB (the maximum) and order 16, the maximum length of string comparisons. Memory usage is 17 MB (4x block size) for compression and 21 MB (5x block size) for decompression. o0 means unbounded order, which is the same as a normal BWT. The default is -b16o6.
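One common way to state the order-k Schindler transform is to sort the rotations of the block by their first k bytes only, and break ties by position (a stable sort) instead of comparing further. With unbounded order the sort compares whole rotations, which gives the ordinary BWT. The sketch below, under the hypothetical name `st_transform`, uses szip's convention that order 0 means unbounded. It omits the primary index and the inverse transform needed for decompression, and makes no attempt at szip's speed.

```python
def st_transform(block: bytes, order: int) -> bytes:
    """Order-`order` sort transform; order 0 means unbounded (plain BWT)."""
    n = len(block)
    doubled = block + block                  # rotation i is doubled[i:i + n]
    width = n if order == 0 else min(order, n)
    # Stable sort on the first `width` bytes of each rotation: ties keep their
    # original order (position) instead of being compared further.
    rows = sorted(range(n), key=lambda i: doubled[i:i + width])
    # Output the last byte of each sorted rotation.
    return bytes(block[(i - 1) % n] for i in rows)
```

For example, b"banana$" transforms to b"annb$aa" with unbounded order (the usual BWT result) and to b"abnn$aa" with order 1. A small order sorts faster, but its output groups similar contexts less well, which matches the pattern in the szip table below.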

    Compressor   Opt         enwik8      enwik9         Prog       Total       Comp Decomp  Mem Note
    ---------    ---       ---------   -----------     -------   -----------   ----  ----   --- ----
    szip 1.12a   -b41o16   26,120,472  227,586,463     31,708 x  227,618,171   1191   289    21  26
                 -b41o4    27,561,829                                            70   210    21  26
                 -b16o6    27,666,448                                           270   220     8  26
                 -b41o6    26,365,058                                           360   240    21  26
                 -b41o8    26,185,222                                           530   250    21  26
                 -b41o32   26,128,020                                          2550   400    21  26
                 -b41o64   26,130,850                                          5210   600    21  26
                 -b41o0    26,130,985                                           750   200    21  26
    

    .2282 balz

    balz 1.02 is a free, closed source file compressor by Ilia Muraviev, Mar. 8, 2008. It uses LZ77 with arithmetic coding, a 512K buffer, and Storer and Szymanski parsing. It takes no options. Memory usage is 346 MB for compression and 18 MB for decompression.

    balz 1.06, May 9, 2008, has two compression options, e for normal and ex for better but slower compression. Both options use 67 MB for compression and 48 MB for decompression.

    balz 1.07 was released May 14, 2008. It uses 132 MB for compression and 95 MB for decompression.

    balz 1.08 was released May 20, 2008. It uses 200 MB for compression and 126 MB for decompression. Only mode ex was tested.

    balz 1.09 was released May 21, 2008. It uses 128 MB for decompression. Only mode ex was tested.

    balz 1.12 was released June 3, 2008. It uses 123 MB for decompression.

    balz 1.13 was released June 11, 2008. It uses 127 MB for decompression.

    balz 1.15 was released as open source on July 8, 2008. It uses 67 MB for compression and 49 MB for decompression.

    Compressor   Opt     enwik8      enwik9         Prog      Total       Comp Decomp  Mem Alg
    ---------    ---   ---------   -----------     -------  -----------   ----  ----   --- ----
    balz 1.02          30,634,726  268,552,062     48,030 x  268,600,092  21804    58  346 LZ77
    balz 1.06    e     28,674,640                                          1580    79   67 ROLZ
    balz 1.06    ex    28,234,913  245,288,229     48,937 x  245,337,166   2440    75   67 ROLZ
    balz 1.07    e     28,271,200                                          1060    96  132 ROLZ
    balz 1.07    ex    27,416,245  237,492,151     49,082 x  237,541,233   2106    77  132 ROLZ
    balz 1.08    ex    26,534,890  229,477,116     49,351 x  229,526,467   4431   126  200 ROLZ
    balz 1.09    ex    26,534,257  229,476,459     49,928 x  229,526,387   4049   128  201 ROLZ
    balz 1.12    e     27,522,348                                          1800   177  201 ROLZ
    balz 1.12    ex    26,522,258  229,347,434     48,400 x  229,395,834   3989   148  201 ROLZ
    balz 1.13    e     27,405,650                                          1670   221  206 ROLZ
    balz 1.13    ex    26,421,416  228,337,644     49,024 x  228,286,668   3700   190  206 ROLZ
    balz 1.15    ex    28,232,824  245,218,274      4,045 s  245,222,319   1064    95   67 ROLZ
    

    .2291 lzpm

    lzpm 0.02 is a free, closed source file compressor by Ilia Muraviev, Apr. 19, 2007. It uses LZ77. It takes no options.

    lzpm 0.03, Apr. 28, 2007, uses more memory for compression (181 MB), but still uses 20 MB for decompression.

    lzpm 0.04, May 4, 2007, uses ROLZ. Memory usage is 83 MB for compression and 20 MB for decompression. The new design uses circular hash chains, which are faster on binary files but a little slower on text.

    lzpm 0.06, May 19, 2007, improves compression over 0.04 with the same memory usage.

    lzpm 0.07, Aug. 6, 2007, and later versions use 280 MB for compression and 20 MB for decompression.

    lzpm 0.08, Aug. 8, 2007.

    lzpm 0.09, Aug. 15, 2007.

    lzpm 0.10, Aug. 23, 2007.

    lzpm 0.11, Sept. 5, 2007, takes the command 1..9 to choose the compression level (fastest...maximum). 1 uses greedy parsing. 2..8 use 1..7 byte lookahead. 9 uses unbounded lookahead. All modes use 723 MB for compression and 77 MB for decompression.

    lzpmlite 0.11, Sept. 13, 2007, is a "lite" version of lzpm that uses about half as much memory and runs about twice as fast. Options range from 1 to 9, with 1 fastest and 9 giving the best compression (3 is a good compromise). All modes use 362 MB for compression and 39 MB for decompression.

    lzpm 0.13 was released Dec. 1, 2007.

    lzpm 0.14 was released Jan. 1, 2008. It uses 40 MB for decompression.

    lzpm 0.15 was released Jan. 16, 2008. It uses 40 MB for decompression.
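ROLZ (reduced offset LZ) replaces the match distance with a small index. For each context (here, the previous byte), the coder keeps a short list of recent positions that followed that context, and a match is sent as (index, length). Because the decoder rebuilds the same lists, full offsets never need to be transmitted. Below is a minimal sketch under assumptions that are not taken from lzpm: an order 1 context, 16 slots per context, positions recorded only where a token starts, a minimum match of 3, and tokens left as Python tuples instead of being entropy coded. `rolz_encode` and `rolz_decode` are hypothetical names.

```python
from collections import defaultdict, deque

MIN_MATCH = 3

def rolz_encode(data: bytes, slots: int = 16) -> list:
    # previous byte -> recent positions that followed it, newest first
    recent = defaultdict(lambda: deque(maxlen=slots))
    tokens = []
    i = 0
    while i < len(data):
        cands = recent[data[i - 1] if i else None]
        best_len, best_idx = 0, -1
        for idx, pos in enumerate(cands):
            length = 0
            while i + length < len(data) and data[pos + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_idx = length, idx
        cands.appendleft(i)                 # record i after searching
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_idx, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def rolz_decode(tokens: list, slots: int = 16) -> bytes:
    recent = defaultdict(lambda: deque(maxlen=slots))
    out = bytearray()
    for token in tokens:
        i = len(out)
        cands = recent[out[i - 1] if i else None]
        if token[0] == "match":
            _, idx, length = token
            pos = cands[idx]                # look up before inserting i, as the encoder did
            cands.appendleft(i)
            for k in range(length):         # byte by byte, so overlaps work
                out.append(out[pos + k])
        else:
            cands.appendleft(i)
            out.append(token[1])
    return bytes(out)
```

An index into 16 slots needs at most 4 bits, compared with the 20 or more bits a raw offset into a large window would need. The price is that only positions following the same context can be referenced.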

    Compression            Compressed size      Decompresser  Total size   Time (ns/byte)
    Program      Opt      enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------      ---    ----------  -----------  -----------  -----------  ----- -----  --- ----
    lzpm 0.02           29,274,461  254,596,796     26,078 x  254,622,874    612    59   83 LZ77
    lzpm 0.03           29,248,641  254,378,973     26,089 x  254,405,062    749    59  181 LZ77
    lzpm 0.04           29,297,905  254,793,933     25,333 x  254,819,266    665    60   83 ROLZ
    lzpm 0.06           28,896,680  251,111,835     25,369 x  251,137,204    852    58   83 ROLZ
    lzpm 0.07           28,385,939  246,426,198     46,692 x  246,472,890   2185    56  280 ROLZ
    lzpm 0.08           28,259,984  245,221,254     48,122 x  245,269,376   2754    59  280 ROLZ
    lzpm 0.09           27,986,111  242,929,442     46,933 x  242,976,375   2451    56  280 ROLZ
    lzpm 0.10           27,849,915  241,719,857     46,871 x  241,766,728   2598    57  280 ROLZ
    lzpm 0.11     1     29,728,112                                          1162    76  723 ROLZ
                  2     27,967,747                                          3746    66  723 ROLZ
                  3     27,424,937                                          5204    68  723 ROLZ
                  4     27,239,304                                          6488    66  723 ROLZ
                  5     27,134,495                                          7446    63  723 ROLZ
                  6     27,038,405                                          8143    64  723 ROLZ
                  7     26,962,337                                          8761    63  723 ROLZ
                  8     26,890,422                                          9330    62  723 ROLZ
    lzpm 0.11     9     26,501,542  229,083,971     46,824 x  229,130,795  15395    57  723 ROLZ
    lzpmlite 0.11 1     30,136,214                                           627    69  362 ROLZ
                  3     27,918,695                                          2620    64  362 ROLZ
    lzpmlite 0.11 9     27,096,516  235,135,224     48,144 x  235,183,368   6235    59  362 ROLZ
    lzpm 0.12     9     27,391,197  237,915,048     47,030 x  237,962,078   4501    57  280 ROLZ
    lzpm 0.13     9     27,318,013  237,241,658     47,129 x  237,288,787   4543    59  280 ROLZ
    lzpm 0.14     9     27,091,358  235,074,141     48,790 x  235,122,931   6467    73  428 ROLZ
    lzpm 0.15     9     27,145,224  235,567,823     48,401 x  235,616,224   6557    62  427 ROLZ
    

    .2299 qazar

    qazar 0.0pre5 is a free, closed source command line file compressor by Denis Kyznetsov, Jan. 31, 2006. It uses LZP, an LZ77 variant where the decompresser dynamically computes the same sequence of context matches as the compressor. The compressor uses a single bit flag to indicate if the pointer computed by the decompresser should be followed. In qazar, the output symbols are arithmetic coded.

    The -d9 option selects maximum dictionary size. -x7 selects maximum hash level (most memory). -l7 selects maximum search level (slowest).
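In LZP the match position is never sent, and a small sketch makes this visible. Both sides look up the most recent position that followed the same order-3 context, and the encoder only says whether to follow it (and, in this variant, for how many bytes). Assumptions: exact 3-byte contexts rather than a hash table, a table updated only where a token starts, and tokens kept as Python tuples instead of being arithmetic coded as in qazar. `lzp_encode` and `lzp_decode` are hypothetical names, not qazar's interface.

```python
ORDER = 3   # context length in bytes (assumption)

def lzp_encode(data: bytes, order: int = ORDER) -> list:
    table = {}          # context bytes -> most recent position that followed it
    tokens = []
    i = 0
    while i < len(data):
        ctx = data[i - order:i] if i >= order else None
        pred = table.get(ctx)               # predicted position, or None
        if ctx is not None:
            table[ctx] = i
        if pred is None:
            tokens.append(("lit", data[i]))  # no prediction: plain literal
            i += 1
            continue
        length = 0
        while i + length < len(data) and data[pred + length] == data[i + length]:
            length += 1
        if length > 0:
            tokens.append(("match", length))  # flag 1: follow the pointer
            i += length
        else:
            tokens.append(("miss", data[i]))  # flag 0: literal follows
            i += 1
    return tokens

def lzp_decode(tokens: list, order: int = ORDER) -> bytes:
    table = {}
    out = bytearray()
    for kind, value in tokens:
        i = len(out)
        ctx = bytes(out[i - order:i]) if i >= order else None
        pred = table.get(ctx)               # recomputed, never transmitted
        if ctx is not None:
            table[ctx] = i
        if kind == "match":
            for k in range(value):          # byte by byte, so overlaps work
                out.append(out[pred + k])
        else:
            out.append(value)
    return bytes(out)
```

No token carries an offset. For b"abc" * 60, the encoder emits six literals and then a single ("match", 174) token, because the context b"abc" predicts position 3 and the data repeats from there.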

    .2299 csc32

    csc2 is a free, experimental, closed source file compressor by Fu Siyuan, Apr. 18, 2009. It uses LZP with order 1 modeling of literals and range coding over a 270-symbol alphabet. The program takes no options. It recognizes whether the input file is compressed, and if so, decompresses it.

    csc3 v.2009.08.12 is a free file compressor with source code in C by Fu Siyuan, Aug. 11, 2009. It uses LZ77. The option -m3 selects best and slowest compression (range -m1 to -m3, default -m2). -d7 selects the maximum dictionary size (range -d1 to -d7, default -d4). -fo turns off EXE and delta filtering (default unless detected by file name extension). The decompresser size is based on csc3.exe, which is smaller than csc3compile2.exe, but does not work on some machines. It is smaller than the zipped source code (17,247 bytes). Timing is similar for both versions and a version compiled with gcc 4.4 with -O2 -s -march=pentium4 -fomit-frame-pointer.

    csc31 was released Sept. 23, 2009 without source code. Discussion.

    csc32 a2 (discussion), May 9, 2010, is a rewrite of csc31. The option -m3 selects maximum compression. -d9 selects maximum dictionary size. Memory usage is 528 MB for compression and 330 MB for decompression.

    csc32 final, Mar. 1, 2011, has 3 compression settings from -m1 (fastest) to -m3 (best) and dictionary sizes up to -d512 (512 MB) which get the best compression but use the most memory. Compression requires memory in addition to the dictionary, but decompression does not. Source code is now available.

                    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------           -------        ----------  -----------  -----------  -----------  ----- -----  --- ---- ----
    csc2                             34,119,354  298,385,256      9,092 x  298,394,348    141   201   49 LZP  26
    csc3 2009.08.12   -m1 -d1        33,920,768                                           150    59   15 LZ77 26
                      -m1 -d7        33,510,724                                           320    59  511 LZ77 26
                      -m2 -d4        31,627,835                                           660    56   93 LZ77 26
                      -m2 -d7        31,460,838                                           730    55  511 LZ77 26
                      -m3 -d7        30,430,159  263,485,695     14,027 x  263,499,722   1514    43  675 LZ77 26
    csc31             -m3 -d7        28,984,849  250,172,831     64,214 x  250,237,045   1045    33  791 LZ77 26
    csc32 a2          -m3 -d9        30,304,020  262,999,383    111,571 x  263,110,954    340    35  528 LZ77 26
    csc32 final       -m1 -d128      28,973,600                                           178    49  166 LZ77 26
                      -m2 -d128      28,624,802                                           283    52  166 LZ77 26
                      -m3 -d4        27,776,206                                           416    52   24 LZ77 26
                      -m3 -d128      26,842,072  232,326,926     53,665 s  232,380,591    420    46  201 LZ77 26
                      -m3 -d512      26,842,072  229,929,654     53,665 s  229,983,319    423    47  660 LZ77 26
    

    .2317 KuaiZip

    KuaiZip 2.3.2 is a free GUI archiver for Windows, Sept. 9, 2011. It uses a proprietary compression algorithm, probably LZMA. It takes no compression options. On the test machine (dual core T3200), compression uses 1.5 threads (75% CPU). Decompression uses one thread. Times are reported by the application.

    Compression                         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program    Version                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  CMem Dmem Alg Note
    -------    --------              ----------  -----------  -----------  -----------  ----- -----  ---- ---- --- ----
    KuaiZip    2.3.2 x86             25,895,915  227,905,650  3,857,649 x  231,763,299   1061    47   197   19 LZMA  26
    

    .2328 qc

    qc 0.050 is a free, closed source, command line file compressor by Denis Kyznetsov, Sept. 17, 2006. The -8 option selects maximum compression (slowest and most memory).

    .2334 ppms

    See ppmonstr above.

    .2356 dzo

    dzo is a commercial GUI deduplicator and archiver for Windows by Essenso Labs. A beta version (32 day free trial) dated Sept. 15, 2011 was tested. The trial version will compress either a single file or a folder. It first finds duplicate files or regions within files and writes an intermediate temporary file (file.dp) with the duplicates removed. Then it compresses the temporary file using LZMA (7zip) to file.dzo and deletes the temporary file. The original files are not removed. Decompression restores a single file to (dzo)file or a folder to folder(dzo), again through a temporary .dp file. Both commands are run by right-clicking the file or folder to compress (or the .dzo file to decompress) and selecting the command from the context menu. Times are as reported by the application. LZMA compression is multi-threaded.
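    dzo's deduplication method is not documented beyond the description above. As a rough illustration only, the two-stage idea can be sketched in Python with fixed-size blocks identified by SHA-256 hashes, followed by the standard library's lzma module. The block size and hashing scheme are assumptions; practical deduplicators often use content-defined (variable-size) chunks so that an insertion does not shift every later block boundary.

```python
import hashlib
import lzma

BLOCK_SIZE = 4096  # hypothetical; dzo's real block size and matching method are unknown


def dedup(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each distinct block.

    Returns (unique_blocks, recipe), where recipe[k] is the index in
    unique_blocks of the k-th block of the input.
    """
    index_of = {}        # SHA-256 digest -> position in unique_blocks
    unique_blocks = []
    recipe = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in index_of:
            index_of[digest] = len(unique_blocks)
            unique_blocks.append(block)
        recipe.append(index_of[digest])
    return unique_blocks, recipe


def compress(data: bytes, block_size: int = BLOCK_SIZE):
    """Deduplicate (like the intermediate .dp file), then LZMA-compress the unique blocks."""
    unique_blocks, recipe = dedup(data, block_size)
    return lzma.compress(b"".join(unique_blocks)), recipe


def decompress(packed: bytes, recipe, block_size: int = BLOCK_SIZE) -> bytes:
    blob = lzma.decompress(packed)
    # Only the final input block can be short, and a short block is always
    # the last unique block, so fixed-size splitting recovers the list.
    unique_blocks = [blob[i:i + block_size] for i in range(0, len(blob), block_size)]
    return b"".join(unique_blocks[k] for k in recipe)
```

    A real archive would also have to store the recipe list; here it is returned separately to keep the sketch short.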

    .2428 comprox_ba

    comprox_ba 20110927 (discussion) is a free, experimental, open source file compressor by Zhang Li, Sept. 27, 2011. It uses BWTS (BWT Scottified) with 4 MB blocks, followed by MTF (move to front), RLEZ (run length encoding of zeros) and arithmetic coding. BWTS is a bijective variant of BWT developed by David A. Scott in which the starting index is not stored. In BWTS, the input is factored into a lexicographically non-increasing sequence of Lyndon words, and the rotations of all the words are then sorted together. The starting indexes for the inverse BWTS are the beginnings of each word.
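    The Lyndon factorization can be computed in linear time with Duval's algorithm. The Python sketch below is an illustration, not comprox_ba's code: it factors a string, then forms a BWTS by sorting the rotations of all factors together, comparing rotations by their infinite repetitions, and taking the last character of each. No starting index is produced. It works on str for readability and sorts naively; a real compressor would work on bytes with a fast suffix sort.

```python
def lyndon_factorization(s: str) -> list:
    """Duval's algorithm: split s into Lyndon words w1 >= w2 >= ... >= wk."""
    factors = []
    n, i = len(s), 0
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # A strictly larger character extends the current Lyndon word;
            # an equal one continues a possible repetition of it.
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bwts(s: str) -> str:
    """Bijective BWT: sort the rotations of all Lyndon factors by their infinite repetitions."""
    factors = lyndon_factorization(s)
    width = 2 * max((len(f) for f in factors), default=0)
    rotations = [f[r:] + f[:r] for f in factors for r in range(len(f))]
    # Comparing the first 2*max(len) characters of each rotation repeated forever
    # is enough to order them (Fine-Wilf theorem).
    rotations.sort(key=lambda rot: (rot * (width // len(rot) + 1))[:width])
    return "".join(rot[-1] for rot in rotations)
```

    For example, "banana" factors as b | an | an | a, and its BWTS is "annbaa".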

    The program takes no arguments. It uses 103 MB (24x block size) for compression and 25 MB (6x block size) for decompression. There is a Windows and a Linux version. Only the Windows version was tested.

    comprox_ba 20110928 was released Sept. 28, 2011. Compression runs in 2 threads. Both the Windows and Linux versions were tested (on different computers).

    comprox_ba 20110929 was released Sept. 29, 2011. Compression is slightly improved. Both compression and decompression are now multi-threaded.

    Compression                         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program    Version                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  CMem Dmem Alg Note
    -------    --------              ----------  -----------  -----------  -----------  ----- -----  ---- ---- --- ----
    comprox_ba 20110927              27,831,722  242,858,769      4,165 s  242,862,934   1500   227   103   25 BWTS 26
    comprox_ba 20110928 (Win32)      27,831,722  242,858,769      4,151 s  242,862,920    957   227   206   25 BWTS 26
    comprox_ba 20110928 (Linux)      27,831,722  242,858,769      4,151 s  242,862,920    363   168   226   30 BWTS 48
    comprox_ba 20110929 (Win32)      27,828,189  242,846,243      4,134 s  242,850,377    984   152   206   50 BWTS 26
    comprox_ba 20110929 (Linux)      27,828,189  242,846,243      4,134 s  242,850,377    397   101   226   76 BWTS 48
    

    .2453 turtle

    turtle 0.01 is a free, experimental, closed source file compressor by Nania Francesco Antonio, June 1, 2007. It uses PPM. It takes no options.

    turtle 0.02 was released June 2, 2007. Compression is identical.

    turtle 0.03 was released June 5, 2007. It is faster and improves compression slightly. The file name is stored in the compressed file.

    turtle 0.04 was released June 8, 2007. It recognizes several different file types.

    turtle 0.05 was released June 12, 2007. It improves compression at the cost of time and memory.

    turtle 0.07 was released June 23, 2007. It includes a model for audio files.

    WinTurtle 1.2 is a Windows GUI version of turtle, released Aug. 16, 2007. It uses PPM with LZP preprocessing. It detects .tar, .iso, .nrg, .wav, .aiff, .bmp, .exe, .pdf, .log and text files. Compression times are wall times. Note: the user interface is not fully functional. To compress a file, click "Drive", then click "Buffer" until it is set to 512 MB. ("Buffer" does not work until "Drive" has been clicked, and a 1 GB buffer caused the program to crash on enwik8.) Then select "File/compress single file" from the upper menu, and select the input file and output archive from the two file dialogs. The program adds a .tur extension to the output archive. To decompress, select File/open archive, click on the file name, click Select, click Extract, and select an output folder from the file dialog.

    WinTurtle 1.21, Aug. 16, 2007, fixes an unrelated bug but is otherwise the same as 1.2.

    WinTurtle 1.30 was released Aug. 30, 2007.

    WinTurtle 1.60 was released Jan. 1, 2008.

    Compressor   Opt       enwik8      enwik9         Prog      Total       Comp Decomp  Mem Alg
    ---------    ---     ---------   -----------     -------  -----------   ----  ----   --- ----
    turtle v0.01         31,314,961  274,696,820     5,079 x  274,701,899    187   178   122 PPM
    turtle v0.02         31,314,961  274,696,820     4,637 x  274,701,457    196   175   122 PPM
    turtle v0.03         31,287,161  274,649,069     7,111 x  274,656,180    142   129   122 PPM
    turtle v0.04         31,137,531  273,100,225     7,808 x  273,108,033    141   128   122 PPM
    turtle v0.05         28,860,689  251,626,176     9,779 x  251,635,955    242   203   174 PPM
    turtle v0.07         28,669,320  250,600,644    10,625 x  250,611,269    217   175   206 PPM
    WinTurtle 1.2  8MB   29,601,717  258,927,402   238,080 x  259,164,482    248   242    31 PPM
                   512MB 28,814,475  250,364,644   238,080 x  250,598,724    264   240   548 PPM
    WinTurtle 1.21 512MB 28,814,475  250,364,644   225,123 x  250,589,767    255   219   548 PPM
    WinTurtle 1.30 512MB 28,814,478  250,364,647   239,247 x  250,603,594    243   240   597 PPM
    WinTurtle 1.60 512MB 28,379,612  245,217,944   160,090 x  245,378,034    273   237   583 PPM
    

    .2466 diz

    diz is a free, experimental, open source (GPL) file compressor by Roger Flores, Aug. 3, 2012. It is a PPMC based compressor written in Python. It is distributed as source code only. The program was tested as recommended by running in pypy version 1.9.
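    For reference, escape method C estimates, in a context where n symbols have been seen with q distinct values, the probability of a previously seen symbol s as c(s)/(n+q) and the escape probability as q/(n+q). The Python sketch below is an illustration, not diz's code: it only computes the ideal code length in bits (no arithmetic coder), uses full exclusion, falls back to a uniform order -1 model over the bytes not yet excluded, and its maximum order of 2 is an arbitrary choice.

```python
import math
from collections import Counter, defaultdict


def ppmc_bits(data: bytes, max_order: int = 2) -> float:
    """Ideal code length in bits of data under PPM with escape method C and full exclusion."""
    tables = defaultdict(Counter)      # context (bytes) -> counts of the bytes that followed it
    total = 0.0
    for i, sym in enumerate(data):
        excluded = set()
        coded = False
        for order in range(min(max_order, i), -1, -1):
            counts = tables[data[i - order:i]]
            live = {s: c for s, c in counts.items() if s not in excluded}
            if not live:
                continue               # nothing to predict here, so no escape is coded
            n, q = sum(live.values()), len(live)
            if sym in live:
                total -= math.log2(live[sym] / (n + q))
                coded = True
                break
            total -= math.log2(q / (n + q))   # escape to the next lower order
            excluded.update(live)              # these bytes are ruled out at lower orders
        if not coded:
            total += math.log2(256 - len(excluded))   # order -1: uniform over remaining bytes
        for order in range(min(max_order, i) + 1):
            tables[data[i - order:i]][sym] += 1
    return total
```

    The first byte always costs 8 bits (order -1), and a repeated byte then costs 1 bit, since method C gives it probability 1/(1+1) in the order 0 context.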

    .2469 lza

    lza 0.01 is a free archiver for 32 bit Windows by Nania Francesco Antonio, May 29, 2014. It uses LZ77 (apparently based on zcm). Option -t selects the number of threads (default -t1). Using more threads makes compression worse because the input is split among the threads. -h0..-h7 selects hash buffer memory from 8 MB to 1 GB (default -h2, 32 MB). -b0..-b7 selects LZ buffer memory from 8 MB to 1 GB (default -b3, 64 MB). Option combinations -b6 -h7, -b7 -h6, or higher run out of memory. -m1..-m5 selects the compression level (faster..better). Default is -m3.

    lza 0.10 was released June 29, 2014. It improves compression and speed and adds compression levels -mx1..-mx5 for higher compression. A 64 bit version was released July 3, 2014 to support larger memory options.

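    The option ranges above are consistent with each step doubling the buffer from 8 MB at level 0, which reproduces the stated defaults (-h2 = 32 MB, -b3 = 64 MB). That doubling rule is an inference, not documented behavior. Under that assumption, a small Python helper gives the memory for each setting:

```python
def lza_buffer_mb(level: int) -> int:
    """Buffer size in MB for -hN or -bN, assuming 8 MB at level 0, doubling per level."""
    if not 0 <= level <= 7:
        raise ValueError("level must be 0..7")
    return 8 << level


# Combined hash + LZ buffer memory for a few option pairs.
for b, h in [(3, 2), (6, 6), (6, 7)]:
    print(f"-b{b} -h{h}: {lza_buffer_mb(b) + lza_buffer_mb(h)} MB")
```

    On this reading, -b6 -h6 needs 1024 MB for the two buffers, matching the memory column for that setting in the table below, while -b6 -h7 would need 1536 MB, which plausibly explains why it runs out of memory in a 32 bit process.
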
                    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------           -------        ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    lza 0.01                         39,644,188  302,602,114                              142     9  111 LZ77 48
                  -m5 -b6 -h6 -t1    32,766,063  275,376,918    159,693 x  275,536,611    345    11 1024 LZ77 48
                  -m5 -b6 -h6 -t2    33,496,841  277,860,891                              237    20 2048 LZ77 48
    lza 0.10      -mx5 -b6 -h6       29,052,976  250,653,981    159,953 x  250,813,934    238    11 1012 LZ77 48
    lza_x64 0.10  -mx5 -b7 -h7       28,835,165  246,671,312    259,425 x  246,930,737    265    12 1800 LZ77 48
    

    .2508 cabarc

    cabarc 1.00.0601 is a command line archiver available for free download from Microsoft, Mar. 18, 1997 (SDK released Jan. 8, 2002). It produces .cab files, which are often used to distribute Microsoft software. It is designed for very fast decompression. It uses LZX, a variant of LZ77 with static Huffman coding, in which short codes are reserved for reusing the three most recent match offsets. The option -m lzx:21 selects a window size of 2^21 bytes (2 MB) for maximum compression. There is a separate extraction program, "extract". The actual (global) decompression time of 32 sec. includes 15 sec. of CPU (process) time; the rest is disk I/O.
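    In LZX the three most recent match offsets, usually called R0, R1 and R2 and initialized to 1, are kept in a small queue. The Python sketch below shows how a decoder might resolve a match offset and update that queue, following the usual description of LZX's repeated-offset rules. The function name and the simplified slot/new_offset interface are hypothetical: real LZX derives the slot from the main Huffman symbol and reads an explicit offset from extra bits.

```python
def lzx_match_offset(recent: list, slot: int, new_offset: int = 0) -> int:
    """Resolve a match offset and update the recent-offset queue [R0, R1, R2] in place.

    slot 0, 1 or 2 reuses R0, R1 or R2 (cheap codes); any other slot means an
    explicit new_offset was coded. (Simplified interface for illustration.)
    """
    r0, r1, r2 = recent
    if slot == 0:
        offset = r0                       # repeat the last offset: queue unchanged
    elif slot == 1:
        offset = r1
        recent[:] = [r1, r0, r2]          # swap R0 and R1
    elif slot == 2:
        offset = r2
        recent[:] = [r2, r1, r0]          # swap R0 and R2
    else:
        offset = new_offset
        recent[:] = [new_offset, r0, r1]  # push the new offset, dropping R2
    return offset


recent = [1, 1, 1]                        # LZX starts with R0 = R1 = R2 = 1
offsets = [lzx_match_offset(recent, 3, 100), lzx_match_offset(recent, 3, 200),
           lzx_match_offset(recent, 1)]   # -> [100, 200, 100]
```

    Text with repeated structure (table rows, markup) often reuses a recent offset, which is why these codes are kept short.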

    .2530 sr3

    sr2 is a free, open source (GPL) file compressor by Matt Mahoney, Aug. 3, 2007. It uses symbol ranking. It takes no options. There are separate programs for compression and decompression.

    Compression works as follows. A 20-bit hashed order-4 context is mapped to the last 3 bytes seen in that context, kept in a move-to-front queue, plus a consecutive hit count. Queue positions (hits) or literals (misses) are arithmetic coded using the count and an order-1 context (order-0 if the count is more than 3) as secondary context. After a byte is coded, it is moved to the front of the queue. The hit count is updated as follows: incremented (max 63) if the first byte is matched, set to 1 if any other byte is matched, or set to 0 in case of a miss.
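    The modeling half of this scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not sr2's source: the 20-bit context hash here is zlib.crc32 of the previous 4 bytes masked to 20 bits (sr2's actual hash function is not given above), hash collisions simply share an entry, and instead of arithmetic coding the sketch returns the hit/miss events together with the hit count that would serve as coding context. A matching decoder shows that the events are enough to rebuild the input.

```python
import zlib


def _context(buf, i: int) -> int:
    # Hypothetical 20-bit hash of the previous (up to) 4 bytes.
    return zlib.crc32(bytes(buf[max(0, i - 4):i])) & 0xFFFFF


def sr_model(data: bytes) -> list:
    """Return ('hit', rank, count) or ('miss', byte, count) for each input byte."""
    table = {}                              # context hash -> [move-to-front queue, hit count]
    events = []
    for i, byte in enumerate(data):
        entry = table.setdefault(_context(data, i), [[], 0])
        queue, count = entry
        if byte in queue:
            rank = queue.index(byte)
            events.append(("hit", rank, count))
            queue.remove(byte)
            entry[1] = min(count + 1, 63) if rank == 0 else 1
        else:
            events.append(("miss", byte, count))
            del queue[2:]                   # make room: the queue holds at most 3 bytes
            entry[1] = 0
        queue.insert(0, byte)               # the coded byte moves to the front
    return events


def sr_unmodel(events: list) -> bytes:
    """Invert sr_model by replaying the same queue and count updates."""
    out = bytearray()
    table = {}
    for kind, value, _count in events:
        entry = table.setdefault(_context(out, len(out)), [[], 0])
        queue, count = entry
        if kind == "hit":
            byte = queue.pop(value)
            entry[1] = min(count + 1, 63) if value == 0 else 1
        else:
            byte = value
            del queue[2:]
            entry[1] = 0
        queue.insert(0, byte)
        out.append(byte)
    return bytes(out)
```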

    sr3 (mirror) is a modification by Nania Francesco Antonio, Oct. 28, 2007. The context table size is increased from 4 MB to 64 MB, which effectively increases the context from order-4 to order-5. This helps compression on larger files, but makes it worse for some smaller files. The program also detects file type. For .bmp files, the order is decreased. For .wav files, the input is split into separate 1 byte wide streams for each audio sample. Unlike sr2, compression and decompression are done by a single program.

    sr3.exe was recompiled on July 23, 2009 without upack to remove antivirus false alarms, resulting in a larger executable. The new size is shown using source code.

    Program    enwik8      enwik9         prog       Total      Comp  Deco Mem Alg
    -------  ----------  -----------      ----   ------------   ----  ---- --- --- 
    sr2      30,432,506  273,906,319     2,831 sd 273,909,150     99   111   6 SR
    sr3      28,926,691  253,031,980     5,611 x  253,037,591    130   146  68 SR
    sr3      28,926,691  253,031,980     9,399 s  253,054,625    148   160  68 SR  26
    

    .2540 bzip2

    bzip2 1.0.2 is an open source command line single file compressor by Julian Seward, released Dec. 30, 2001. It uses BWT. The -9 option selects maximum compression.
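    To make the transform concrete, here is a naive Python sketch of the forward and inverse BWT. It sorts all rotations directly, which is far too slow for real use; bzip2 applies the transform to blocks of up to 900 kB at -9 and follows it with move-to-front, run-length and Huffman coding, none of which is shown. The function names are illustrative.

```python
def bwt(s: bytes):
    """Return (last column of the sorted rotations, row index of the original string)."""
    if not s:
        return b"", 0
    n = len(s)
    rows = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last = bytes(s[i - 1] for i in rows)      # s[-1] wraps around for the rotation at 0
    return last, rows.index(0)


def inverse_bwt(last: bytes, index: int) -> bytes:
    """Rebuild the input by following each first-column character to the last column."""
    # A stable sort of positions by symbol gives, for each row j, the position in
    # the last column of the character occurrence that starts row j.
    nxt = sorted(range(len(last)), key=lambda i: last[i])
    out = bytearray()
    row = index
    for _ in range(len(last)):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)
```

    For example, bwt(b"banana") gives (b"nnbaaa", 3): like characters end up grouped, which the later stages exploit.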

    bzip2 1.0.3 (May 22, 2005) produces very slightly larger output but is faster, as shown by the following table. The decompresser size is based on zipped bunzip2.exe, which is smaller than the source (724,919 bytes as a zip download).

                    Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp
    -------           -------       ----------  -----------  -----------  -----------  ----- -----
    bzip2 1.0.2       -9            29,008,736  253,977,839     30,036 x  254,007,875    379   129
    bzip2 1.0.3       -9            29,008,758  253,977,891     56,082 xd 254,033,973    334   120
    

    .2545 RangeCoderC

    RangeCoderC v1.2 (discussion) is a free, experimental, open source file compressor by David Catt, Nov. 23, 2011. The option 26 selects a simple bitwise order 26 model. An order n model requires 16*2^n bytes of memory.
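    The memory formula works out to 16*2^26 bytes = 1 GiB for the order 26 model. As an illustration of what a bitwise order n model computes, the Python sketch below estimates the ideal compressed size in bits, assuming the context is simply the previous n bits of input and each context keeps a pair of counts with a +1/2 (Krichevsky-Trofimov) estimator. These details are assumptions, not RangeCoderC's documented design, and a dict stands in for the 2^n-entry table.

```python
import math


def model_memory_bytes(n: int) -> int:
    """Memory of an order-n table at 16 bytes per context, as stated for v1.2."""
    return 16 << n


def bitwise_order_n_bits(data: bytes, n: int) -> float:
    """Ideal code length in bits under an adaptive model whose context is the previous n bits."""
    mask = (1 << n) - 1
    counts = {}                        # context -> [zeros, ones]; a dict instead of a 2**n table
    history = 0
    total = 0.0
    for byte in data:
        for shift in range(7, -1, -1):                  # most significant bit first
            bit = (byte >> shift) & 1
            c = counts.setdefault(history, [0, 0])
            p = (c[bit] + 0.5) / (c[0] + c[1] + 1.0)    # Krichevsky-Trofimov estimate
            total -= math.log2(p)                       # cost an arithmetic coder would approach
            c[bit] += 1
            history = ((history << 1) | bit) & mask
    return total
```

    Higher orders predict better on text but need exponentially more memory, which is why the tested orders stop where the 32 bit address space runs out.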

    RangeCoderC v1.3, Nov. 25, 2011, has 3 versions. The standard version is compatible with v1.2 but uses half as much memory. The "double" version uses a main model to select among several sub-models to improve compression at a cost in speed and memory. There is also an "indirect" version that was not tested because there was no 32 bit Windows version.

    RangeCoderC v1.4 was released Nov. 28, 2011. It has 4 versions: standard, double, indirect, and a new version, hashed, which computes a hashed context and gives the best compression.

    RangeCoderC v1.5 was released Nov. 29, 2011. It combines the 4 models from v1.4 into one program and includes the model type in the archive header. Option c3 selects the hashed model. It gives the same size as v1.4. The other models were not tested.

    RangeCoderC v1.6 was released Dec. 1, 2011. It has 6 compression modes selected by options c0 through c5 as follows:

    0 - Simple Bitwise Model (default)
    1 - Indirect Bitwise Model
    2 - Indexed Bitwise Model Array
    3 - Hashed Bitwise Model
    4 - Bitwise Linear CM
    5 - Bitwise Linear CM With SSE
    
    c1 failed on enwik8. It produced a "compressed" file of about 2.5 GB, which decompressed incorrectly. The other modes were tested at the highest order allowed by the 2 GB memory space available in the 32 bit version.

    RangeCoderC v1.7 alpha, Dec. 5, 2011, fixes the bug in c1 mode in v1.6. The other 5 modes are presumably the same and were not tested. It is a pre-release of version 1.7, released without source code.

    RangeCoderC v1.7, Dec. 9, 2011, adds two new compression modes:

    6 - Bytewise Hashed Model
    7 - Combined Model
    
    The Bytewise Hashed model uses the hash and cache structure from ZPAQ to achieve high speeds, even at higher orders. The Combined Model uses the same structure as the Double Model but has a hashed context and feeds its predictions into an SSE (secondary symbol estimation) model for better compression.

    RangeCoderC v1.8, Dec. 13, 2011, removes two obsolete modes and adds one mode: "The Bitwise Adaptive Model uses probabilities instead of counts, which are adjusted nonlinearly for better compression on changing data. The learning speed of the model is derived from the model order." The modes are:

    0 - Bytewise Hashed Model
    1 - Simple Bitwise Model (default)
    2 - Adaptive Bitwise Model
    3 - Indexed Bitwise Model Array
    4 - Hashed Bitwise Model
    5 - Combined Model
    
    Only the new mode (c2) was tested.
                      Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program             Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp Mem  Note
    -------             -------       ----------  -----------  -----------  -----------  ----- ----- ---  ----
    RangeCoderC v1.2    0             99,801,301
                        1             99,660,153
                        2             97,987,717
                        3             96,963,829
                        4             95,670,157
                        5             94,154,825
                        6             87,831,925
                        7             80,009,581
                        8             73,016,189
                        16            46,805,877
                        24            35,625,897
                        25            34,635,889
                        26            33,761,533  320,897,805      4,120 x  320,901,925   1324  1348 1050  26
    RangeCoderC v1.3    27            33,225,249  314,021,089      3,977 x  314,025,066   1210  1234 1050  26
                        26 (Double)   30,934,993  285,258,957      4,052 x  285,263,009   1501  1488 1100  26
    RangeCoderC v1.4    27 (Hashed)   30,371,685  271,371,793      4,407 x  271,376,200   1809  1658 1050  26
                        26 (Double)   30,934,989                   4,359 x                1560  1650 1116  26
                        27 (Indirect) 36,108,281                   4,773 x                2700  3090 1182  26
                        27 (Standard) 33,225,245                   4,288 x                1210  1270 1050  26
    RangeCoderC v1.5    c3 27         30,371,685                   5,747 x                1740  1810 1050  26
    RangeCoderC v1.6    c0 26         33,761,529                   7,028 x                1200  1230  525  26
                        c0 27         33,225,245                                          1280  1330 1050  26
                        c2 26         30,934,989                                          1610  1680 1116  26
                        c3 26         30,832,497                                          1610  1720  525  26
                        c3 27         30,371,685                                          1740  1790 1050  26
                        c4 26         29,269,185                                          5320  5880 1642  26
                        c5 26         28,461,477  260,009,661      7,028 x  260,016,689   5752  5833 1642  26
    RangeCoderC v1.7a   c1 27         36,108,281                   7,060 x                2570  3000 1182  26
    RangeCoderC v1.7    c0 27         33,225,245                                          1300  1330 1050  26
                        c1 27         36,108,281                                          2490  2420 1182  26
                        c2 26         30,934,989                                          1590  1660 1116  26
                        c3 27         30,371,685                                          1710  1980 1050  26
                        c4 26         29,269,185                                          5120  5130 1641  27
                        c5 26         28,461,477  260,009,661      7,858 x  260,017,519   5832  5779 1642  26
                        c6 27         35,265,593                                           990  1020 1050  26
                        c7 26         28,788,013  254,527,369      7,858 x  254,535,227   2460  2436 1116  26
    RangeCoderC v1.8    c2 28         32,432,825  285,488,437      6,537 x  285,494,974   1338  1363 1050  26
    

    .2561 quad

    quad is a free file compressor by Ilia Muraviev. Only the latest version (now open source) is supported, so only that version appears in the main table.

    As described by the author: QUAD uses ROLZ compression (Reduced Offset LZ). It makes use of an order-2 context to reduce the offset set that is matched to. This can be regarded as a fast large dictionary LZ. Literals and Match Lengths fit in a single alphabet which is coded using an order-2-0 PPM with Full Exclusion. Match indexes are coded using an order-0 model. QUAD uses a 16 MB dictionary. For selectable compression speed and ratio, QUAD uses different parsing schemes: with Normal mode (Default) QUAD uses Lazy Matching; with Max mode (-x option) QUAD uses a variant of Flexible Parsing. In addition, QUAD has an E8/E9 transformer for better executable compression which is always enabled.
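    The difference from plain LZ77 can be shown with a small Python sketch: candidate match positions are remembered per order-2 context, so a match is coded as a short index into that context's list instead of a full offset. The table size of 16 positions per context, the minimum match length of 3, and greedy parsing are illustrative assumptions (quad itself uses lazy matching or flexible parsing and entropy codes its output), and only the token stream is produced.

```python
def rolz_encode(data: bytes, order: int = 2, slots: int = 16, min_len: int = 3) -> list:
    """Greedy ROLZ parse into ('lit', byte) and ('match', index, length) tokens."""
    table = {}                         # order-2 context -> recent positions, newest first
    tokens = []
    i = 0
    while i < len(data):
        candidates = []
        if i >= order:
            recent = table.setdefault(data[i - order:i], [])
            candidates = list(recent)  # snapshot before adding the current position
            recent.insert(0, i)
            del recent[slots:]
        best_len, best_idx = 0, 0
        for idx, pos in enumerate(candidates):
            length = 0
            while i + length < len(data) and data[pos + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_idx = length, idx
        if best_len >= min_len:
            tokens.append(("match", best_idx, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def rolz_decode(tokens: list, order: int = 2, slots: int = 16) -> bytes:
    """Replay the same context-table updates to turn indexes back into positions."""
    table = {}
    out = bytearray()
    for token in tokens:
        i = len(out)
        candidates = []
        if i >= order:
            recent = table.setdefault(bytes(out[i - order:i]), [])
            candidates = list(recent)
            recent.insert(0, i)
            del recent[slots:]
        if token[0] == "match":
            _, idx, length = token
            pos = candidates[idx]
            for k in range(length):
                out.append(out[pos + k])   # byte-by-byte copy allows overlapping matches
        else:
            out.append(token[1])
    return bytes(out)
```

    With 16 slots, a match index fits in 4 bits, whereas an LZ77 offset into a 16 MB window needs 24.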

    quad 1.01a (Dec. 24, 2006) used LZ77. It was closed source and took no options.

    quad 1.04a (Feb. 8, 2007) used LZP. Memory was expanded for this version only; however, it is no longer supported.

    quad 1.07beta (Feb. 22, 2007) included the "x" option for better compression.

    quad 1.08 was released Mar. 12, 2007. With this version, quad became open source.

    quad 1.10 was released Mar. 19, 2007. -x selects maximum compression.

    quad 1.11 (Apr. 4, 2007) uses ROLZ.

    quad 1.11HASH2 (Apr. 5, 2007, experimental, no source code) produces the same size archives, but uses a hash table for faster compression.

    quad 1.12 was released Apr. 7, 2007.

    Compression            Compressed size      Decompresser  Total size   Time (ns/byte)
    Program              enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------            ----------  -----------  -----------  -----------  ----- -----  --- ----
    quad v1.01a        29,930,547  263,137,995     26,927 x  263,164,922   1281   168   33 LZ77
    quad v1.04a        27,712,832  239,596,416     38,552 x  239,634,968    933   748  165 LZP
    quad v1.07b     x  29,360,404  258,361,092     61,067 x  258,422,159   1282   146   33 LZP
    quad v1.08      x  29,171,593  256,664,803     13,042 s  256,677,845   1206   164   33 LZP
    quad v1.10      -x 29,152,166  256,486,470     13,288 s  256,499,758   1007   117   34 LZP
    quad v1.11      -x 29,110,579  256,145,858     13,387 s  256,159,245    956   116   34 ROLZ
    quad v1.11HASH2 -x 29,110,519  256,145,858     30,129 x  256,175,987    705   117   42 ROLZ
    quad v1.12      -x 29,110,519  256,145,858     13,516 s  256,159,334    527   120   34 ROLZ
    

    .2572 WinACE

    WinACE 2.61 is a shareware GUI/command line archiver, Mar. 8, 2006. It compresses in ACE and ZIP formats and decompresses many others. ACE decompresses much faster than it compresses, suggesting it is based on LZ77. The option -m5 selects maximum compression. -d4096 selects the maximum dictionary size of 4 MB (default is -d1024 = 1 MB). -sfx creates a self-extracting archive, which adds less space than the decompression program itself would.

    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
      Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp
      -------        ----------  -----------  -----------  -----------  ----- -----
    -sfx -m5 -d4096  29,481,470  257,237,710          0 xd 257,237,710   1080    77
    -sfx -m5         30,919,182  270,578,538          0 xd 270,578,538    738    79
    -sfx             30,937,342                                          ~770   ~40
    

    .2584 RH4

    RH is a free, experimental file compressor by Nauful, Feb. 17, 2014. There are two versions, RH and RH2. RH uses order 3 ROLZ and Huffman coding, using 8 MB memory. RH2 has 3 compression levels using 64 MB memory. Level c1 uses LZP. c2 uses order 1 ROLZ with limited search. c3 uses full search. A literal is coded with 1 bit plus the value. A match is coded with 1 bit to signal a match, 8 bits for the length, and 12 bits for the index into the ROLZ table.
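    Taking those field widths at face value, and assuming the literal value is stored in 8 bits (the text does not say whether it is entropy coded), a match costs 1 + 8 + 12 = 21 bits and a literal 9 bits, so a match only pays off from length 3 upward. The 12-bit index also allows up to 2^12 = 4096 candidate positions per ROLZ context. A quick check in Python:

```python
LITERAL_BITS = 1 + 8         # flag + value (assumed to be a raw 8-bit byte)
MATCH_BITS = 1 + 8 + 12      # flag + length + index into the ROLZ table


def bits_saved(length: int) -> int:
    """Bits saved by coding `length` bytes as one match instead of literals."""
    return length * LITERAL_BITS - MATCH_BITS


min_useful = next(n for n in range(1, 257) if bits_saved(n) > 0)
print(min_useful, bits_saved(min_useful), 2 ** 12)   # 3 6 4096
```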

    The 32 and 64 bit Windows .exe versions produce incompatible archives. The 32 bit version was tested in Windows. The 64 bit version was tested in Ubuntu under Wine 1.6.

    RH2 20Feb2014, released Feb. 27, 2014, has 5 compression levels c1..c5.

    RH4_x64, Mar. 22, 2014, is an archiver with file-level deduplication and compression improvements. It has 6 compression levels. There are several earlier versions without version numbers that were not tested.

    The RH4 version dated Apr. 24, 2014 was also tested.

                   Compression     Compressed size      Decompresser  Total size  Time (ns/byte)
    Program          Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------          -------    ----------  -----------  -----------  -----------  ----- ----- ---- ---- ----
    RH_x86                      35,675,086                  91,772 x                  78    47    8 ROLZ  26
    RH2_x86            c1       34,857,781                                            67    27   64 LZP   26
                       c2       31,957,388                                           149    28   64 ROLZ  26
                       c3       31,937,059  279,524,710     93,364 x  279,618,074    152    28   64 ROLZ  26
    RH2_x64            c3       31,937,063  279,524,714     97,016 x  279,621,730     72    20   64 ROLZ  48
    RH2_x64 20Feb2014  c1       34,816,471  306,646,293                               32    15   64 LZP   48
                       c2       32,215,361  282,209,254                               48    14   64 ROLZ  48
                       c3       30,960,001  271,181,799                               67    17   64 ROLZ  48
                       c4       30,787,281  269,670,002                               76    15   64 ROLZ  48
                       c5       30,543,306  267,344,532     53,408 x  267,397,940    447    18   64 ROLZ  48
    RH4_x64 22Mar2014  c1       32,664,118                                            44    13
                       c2       31,309,650                                            47    12
                       c3       30,906,206                                            61    12
                       c4       30,872,697                                            64    12
                       c5       30,030,867                                           128    11
                       c6       29,553,289  258,411,625     79,155 x  258,490,780    301    12   27 ROLZ  48
    RH4_x64 24Apr2014  c2       31,309,670  274,101,406     90,071 x  274,191,477     44     9   31 ROLZ  48
                       c6       29,553,309  258,411,645     90,071 x  258,501,716    287     9   31 ROLZ  48
    

    .2589 lzsr

    lzsr 0.01 is a free file compressor for Windows by Nania Francesco Antonio, Oct. 1, 2011. It is described as using a "fusion of LZ77-LZP and SR" and arithmetic coding. It takes no options.

    .2625 xpv5

    xpv5 is a free Windows command line file compressor by Abhilash Anand, Oct. 20, 2011. It is described as using ROLZ with an order 1 back end. It has 3 compression levels: c0, c1, c2. All levels use 9 MB memory for compression or decompression. It is single threaded.

    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
      Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem  Alg Note
      -------        ----------  -----------  -----------  -----------  ----- -----  ---  --- ----
    xpv5 c0          31,675,180  277,174,541     14,371 x  277,188,912   908    534    9 ROLZ  26
         c1          30,297,863  265,643,665     14,371 x  265,658,036  1236    515    9 ROLZ  26
         c2          29,963,217  262,525,246     14,371 x  262,539,617  2359    516    9 ROLZ  26
    

    .2660 sr3c

    sr3c 1.0 is a free, open source (MIT license) file compressor and library by Kenneth Oksanen, released Nov. 27, 2008. It uses symbol ranking, based on ideas from SR3, but completely rewritten in C. The distribution contains a portable compression engine and source code for drivers for UNIX/Linux. To test, I wrote a simple driver for Windows (sr3cw) and compiled it using gcc 3.4.5 -O3 -fomit-frame-pointer -march=pentiumpro -s and included sr3cw.exe in the distribution. The driver takes no options.

    .2665 lzc

    lzc v0.01 is a free, closed source file compressor by Nania Francesco Antonio, May 8, 2007. It uses an LZ77-like algorithm. The option 4 selects the maximum memory mode, 1 GB + 100 MB for compression and 16 MB + 100 MB for decompression. The actual memory usage indicated by Windows Task Manager in this mode was 360 MB for compression and 107 MB for decompression.

    lzc 0.03 was released May 11, 2007.

    lzc 0.04 was released May 16, 2007. All versions up to 0.04 use 107 MB memory for decompression.

    lzc 0.05b was released May 26, 2007. It has options from 1 (fastest) to 16 (best compression). It uses 771 MB to compress and 390 MB to decompress.

    All versions through 0.05b are linked in the above archive.

    lzc 0.06b was released Aug. 27, 2007. It uses 790 MB (peak) for compression and 409 MB (peak) for decompression.

    lzc 0.07 was released Oct. 24, 2007. Options range from 1 (fastest) to 10 (slowest).

    lzc 0.08 was released Nov. 15, 2007. It improves BMP and WAV compression.

    Compressor   Opt     enwik8      enwik9         Prog      Total       Comp Decomp  Mem Alg
    ---------    ---   ---------   -----------     -------  -----------   ----  ----   --- ----
    lzc v0.01     4    40,312,925  363,504,638     7,656 x  363,512,294    238    61   360 LZ77
    lzc v0.03     4    37,908,748  341,811,895     8,268 x  341,820,163    182    61   515 LZ77
    lzc v0.04     4    37,779,426  340,628,765     8,869 x  340,637,634    142    59   540 LZ77
    lzc v0.05b    1    44,893,624                                          117    54       LZ77
    lzc v0.05b   16    30,611,315  267,784,591     9,158 x  267,793,749    365    82   771 LZ77
    lzc v0.06b   16    30,611,315  267,784,590    12,170 x  267,796,760    347    68   790 LZ77
    lzc v0.07     1    40,554,444                                          110    60    70 LZ77
    lzc v0.07    10    30,611,315  266,565,255    28,997 x  266,594,252    309    67   584 LZ77
    lzc v0.08    10    30,611,315  266,565,255    11,364 x  266,576,619    302    63   550 LZ77
    

    .2688 zling

    zling (discussion) is a free, open source (BSD license) file compressor by Zhang Li, Nov. 1, 2013. It uses order 1 ROLZ, based on the order 3 ROLZ compressor zlite. It takes no options. It is distributed as C source code only. To test, it was compiled with gcc 4.8.0 -O3 for 32 bit Windows.

    zling (discussion) was updated Dec. 25, 2013. It was tested in Ubuntu with gcc 4.8.1 and Boost_1_55_0 using the supplied Makefile.

    zling 20140121 (discussion), Jan. 21, 2014, adds some optimizations and removes the dependency on Boost. It was tested by compiling with g++ 4.8.1 -O3 in Windows and with the supplied Makefile in Linux.

    libzling 20140219, Feb. 19, 2014, separates the program into a compression API and a simple demo program. It was tested by building the demo using cmake under Linux as recommended in the readme file.

    libzling 20140324 was released Mar. 24, 2014. The demo program has 5 compression levels.

    libzling 20140414 was released Apr. 14, 2014. It is faster with better compression.

    libzling 20140430-bugfix (discussion) was released May 4, 2014.

                    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------           -------        ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    zling Nov-01-2013                33,297,650  292,746,596      5,468 s  292,752,064     80    21   37 ROLZ 26
    zling Dec-25-2013                32,222,737  282,435,374     12,807 s  282,448,181     33     8   27 ROLZ 48
    zling Jan-21-2014                32,189,336  281,869,136     14,886 s  281,884,022     78    21   29 ROLZ 26
    zling Jan-21-2014                32,189,336  281,869,136     14,886 s  281,884,022     29     7   29 ROLZ 48
    zling_demo Feb-19-2014           31,310,257  274,180,830     32,046 s  274,212,876     56    14   27 ROLZ 48
    zling_demo Mar-24-2014   e0      33,391,083                                            24     9   27 ROLZ 48
                             e1      32,613,829                                            29     9   27 ROLZ 48
                             e2      31,732,466                                            33     9   27 ROLZ 48
                             e3      31,310,257                                            40     9   27 ROLZ 48
                             e4      30,861,848  270,258,636     32,421 s  270,291,057     51     9   27 ROLZ 48
    zling_demo Apr-14-2014   e0      32,456,306  284,804,449                               23     9   27 ROLZ 48
                             e1      31,800,497  278,703,086                               28     9   27 ROLZ 48
                             e2      31,419,861  275,231,487                               32     9   27 ROLZ 48
                             e3      31,064,418  271,969,050                               36     9   27 ROLZ 48
                             e4      30,782,340  269,496,300     31,644 s  269,527,944     42     9   27 ROLZ 48
    zling_demo 20140430-bugfix e0    32,378,187                                            29    11   27 ROLZ 48
                             e1      31,720,214                                            30    11   27 ROLZ 48
                             e2      31,340,822                                            34    11   27 ROLZ 48
                             e3      30,979,872                                            39    11   27 ROLZ 48
                             e4      30,707,022  268,793,105     32,148 s  268,825,253     40    10   27 ROLZ 48
    

    .2794 crush

    crush 0.01 is a free, experimental file compressor by Ilia Muraviev, May 17, 2011. It uses LZ77. It has 3 compression modes: cf (fast), c (medium), and cx (best). Compression in all modes uses 143 MB of memory, and decompression uses 65 MB.

    Source code (public domain) was released on June 26, 2013. The file format consists of 64 MiB blocks with a 4 byte header in machine dependent (LSB first for x86) order giving the block size. Literal and match codes are packed LSB first and padded with trailing 0 bits in the last byte. Codes are as follows:

      0,xxxxxxxx              - literal byte x
      1,1,xx                  - match length x+3   (3..6)
      1,0,1,xx                - match length x+7   (7..10)
      1,0,0,1,xx              - match length x+11  (11..14)
      1,0,0,0,1,xxx           - match length x+15  (15..22)
      1,0,0,0,0,1,xxxxx       - match length x+23  (23..54)
      1,0,0,0,0,0,xxxxxxxxx   - match length x+55  (55..566)
    
    A match code is followed by 2 fields (call them L and P) giving the offset. L is 4 bits, and gives the length of P. If L is 0000, then P is 5 bits and the offset is P + 1 (1..32). If L is in 1..15, then P is L + 4 bits long and the offset is 2^(L+4) + P + 1 (33..2^20). A match is decoded by going back offset bytes in the output and copying the specified length to the output.
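
    The length and offset codes above can be sketched in Python as follows. This is a minimal sketch, not crush's source: the function names are hypothetical, each code is returned as a list of (value, nbits) fields, and the order in which the bits of one field are packed is not modeled. base_bits=5 follows the crush 0.01 description; 6 would correspond to the wider offsets of crush 1.00.

```python
# Hypothetical helpers; fields are (value, nbits) pairs.
LENGTH_CLASSES = [  # (smallest length, zeros after the match flag, x bits)
    (3, 0, 2), (7, 1, 2), (11, 2, 2), (15, 3, 3), (23, 4, 5), (55, 5, 9),
]

def encode_length(length):
    """Match flag 1, unary zeros, a terminating 1 (absent in the last class), then x."""
    for base, zeros, xbits in LENGTH_CLASSES:
        if base <= length < base + (1 << xbits):
            fields = [(1, 1)] + [(0, 1)] * zeros
            if zeros < 5:
                fields.append((1, 1))
            fields.append((length - base, xbits))
            return fields
    raise ValueError("match length must be in 3..566")

def encode_offset(offset, base_bits=5):
    """Return [(L, 4), (P, width)]; base_bits=5 covers offsets 1..2**20."""
    if offset < 1:
        raise ValueError("offset must be >= 1")
    v = offset - 1
    L = max(0, v.bit_length() - base_bits)
    if L > 15:
        raise ValueError("offset larger than the window")
    if L == 0:
        return [(0, 4), (v, base_bits)]          # offset = P + 1
    width = L + base_bits - 1
    return [(L, 4), (v - (1 << width), width)]   # offset = 2**width + P + 1

def decode_offset(L, P, base_bits=5):
    return P + 1 if L == 0 else (1 << (L + base_bits - 1)) + P + 1
```

    For example, encode_offset(33) gives [(1, 4), (0, 5)]: L = 1, so P has 5 bits and the offset is 2^5 + 0 + 1.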

    The compressor maintains an index for finding matches, consisting of two hash tables of size 2^21 for strings of length 3 and 2^24 for strings of length 4. The second table is maintained as a linked list. The two rolling context hashes are computed by shifting the current hash 7 or 6 bits left, respectively, adding the next byte, and chopping off the high bits. It tests the length 3 hash first, then follows the linked list of length 4 hashes to find the best match for up to 4, 256, or 4096 locations in the input buffer for compression options cf, c, and cx respectively. In addition, for option cx, the compressor looks ahead one byte and codes the current byte as a literal if starting at the next byte produces a better match. A match is better if it is longer, with a penalty of log16(offset) plus one for the literal in the case of looking ahead. The minimum match length is 3 for offsets less than 64 KiB, otherwise 4.
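
    A small sketch of the two rolling hashes, assuming the update is exactly "shift, add, mask" as described. HASH3_BITS, HASH4_BITS, update_hash, and rolling_hashes are hypothetical names.

```python
HASH3_BITS = 21   # 2**21-entry table for length-3 strings
HASH4_BITS = 24   # 2**24-entry table for length-4 strings

def update_hash(h, byte, shift, bits):
    """Shift left, add the next byte, and chop off bits above `bits`."""
    return ((h << shift) + byte) & ((1 << bits) - 1)

def rolling_hashes(data):
    """Return (h3, h4) after each byte of `data`."""
    h3 = h4 = 0
    out = []
    for b in data:
        h3 = update_hash(h3, b, 7, HASH3_BITS)
        h4 = update_hash(h4, b, 6, HASH4_BITS)
        out.append((h3, h4))
    return out
```

    Because 3 × 7 = 21 and 4 × 6 = 24, the contribution of any byte older than the last 3 (or 4) is shifted past the mask, so each hash depends only on the string it indexes.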

    To save memory, only the last 2^20 linked list pointers are saved in a rotating queue. As a speed optimization for testing matches, the first and last byte at the current best match length are tested first, then the rest of the string.
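
    The rotating queue and the quick rejection test might look like the sketch below. Assumptions: ChainMatcher is a hypothetical name, the caller supplies the hash key h, the whole input sits in one buffer, and the "last byte" tested first is taken to be the byte at index best_len (the one a longer match must also match). A chain is abandoned once it reaches a position whose link slot may have been reused.

```python
class ChainMatcher:
    """Hash chains whose links are kept only for the last 2**window_bits positions."""

    def __init__(self, window_bits=20):
        self.mask = (1 << window_bits) - 1
        self.head = {}                         # hash -> most recent position
        self.prev = [-1] * (1 << window_bits)  # rotating queue of chain links

    def insert(self, h, pos):
        self.prev[pos & self.mask] = self.head.get(h, -1)
        self.head[h] = pos

    def find(self, data, h, pos, max_chain, max_len=566):
        """Follow up to max_chain links (4, 256 or 4096 for cf, c, cx)."""
        best_len, best_off = 0, 0
        limit = min(max_len, len(data) - pos)
        cand = self.head.get(h, -1)
        for _ in range(max_chain):
            if cand < 0 or pos - cand > self.mask:
                break                          # link slot may have been overwritten
            # Quick rejection: first byte and the byte at the current best length.
            if (best_len < limit and data[cand] == data[pos]
                    and data[cand + best_len] == data[pos + best_len]):
                n = 0
                while n < limit and data[cand + n] == data[pos + n]:
                    n += 1
                if n > best_len:
                    best_len, best_off = n, pos - cand
            cand = self.prev[cand & self.mask]
        return best_len, best_off
```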

    crush 1.00 (discussion) was released July 1, 2013. It increases the window size from 2^20 to 2^21, thus increasing the minimum and maximum length of an offset code by 1 bit, i.e. if L is 0 then P is 6 bits (1..64) and if L is in 1..15 then P is L + 5 bits (65..2^21). Also, the penalty for coding a match offset is changed to log8(offset/16).

                   Compression     Compressed size      Decompresser  Total size  Time (ns/byte)
    Program          Options      enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------          -------    ----------  -----------  -----------  -----------  ----- ----- ---- ---- ----
    crush 0.01       cf         37,401,090  330,975,986     46,879 x  331,022,865     94  17.2  143 LZ77 26
    crush 0.01       cf         37,401,090  330,975,986     46,879 x  331,022,865     21   4.2  143 LZ77 50
    crush 0.01       c          33,618,865                                          1040  13    143 LZ77 26
    crush 0.01       c          33,618,865  297,103,092     46,879 x  297,721,957    129   3.9  143 LZ77 50
    crush 0.01       cx         32,577,338                                          4490  13    143 LZ77 26
    crush 0.01       cx         32,577,338  287,333,602     46,879 x  287,380,481    532   3.8  143 LZ77 50
    crush 0.01       cx         32,577,338  287,333,602      2,469 s  287,336,071    532   3.8  143 LZ77 50
    
    crush 1.00       cf         37,308,893                                           132  15    148 LZ77 26
    crush 1.00       c          32,878,537                                          1541  15    148 LZ77 26
    crush 1.00       cx         31,731,537                                          7916  15    148 LZ77 26
    crush 1.00       cx         31,731,711  279,491,430      2,489 s  279,493,919    948   2.9  148 LZ77 60
    

    .2839 bzp

    bzp 0.2 is a free file archiver by Nania Francesco Antonio, Sept. 16, 2008. It uses LZP and arithmetic coding. It takes no options. Earlier versions (0.0, 0.1) were not tested.

    .2857 ha

    ha 0.98 is a free command line archiver by Harry Hirvola, Jan. 7, 1993. A later version, 0.999b, is available for UNIX with source code and ports to DOS. It uses order-5 PPMC, that is, PPM with fixed escape probabilities for dropping to a lower order context. (Newer PPM compressors such as PPMZ and PPMII use adaptive escape probabilities given a small context.) The command a2 selects compression method HSC (the default is a1 = ASC). a21 automatically chooses the best method. Times are in ns/byte.
    Version     Options      enwik8    Comp  Decomp Notes
    --------    -----      ----------  ----  ----   -----
    ha 0.98      a1        36,379,137   873   257          ns/byte
    ha 0.98      a2        31,250,524  2080  1850
    ha 0.999b    a21       31,250,523  2447          16    DOS compile, 1995
    ha 0.9991a   a21       31,250,524  1551          16    DOS (.com) compile, 1995
    ha 0.999b    a21       31,250,524  1290          16    Compiled for NT by Michael Markowsky at Apr 30 1997
    lgha v1.1    a21       31,250,524  1110          16    ha v.0999c DOS compile by Lyapko George, 1999
    lgha v1.1              31,250,524  1068  1114    16
    

    .2924 irolz

    irolz source code is a free, open source (GPL), experimental file compressor by Andrew Polar, Sept. 26, 2010. It uses ROLZ. The algorithm is like LZ77 except that match offsets are coded by counting previous occurrences of the current context in the history buffer rather than as pointers. In irolz, the context is order 2. Previous occurrences are stored in a linked list with a maximum length of 31 (5 bit offset). Matches less than 4 bytes are coded as literals. Symbols (match flags, 5 bit offsets, 8 bit lengths, and 8 bit literals) are binary arithmetic coded. Lengths and literals are coded in an order 2 context model. Match flags and offset counts are modeled without context. Each symbol and context to be predicted is mapped to 2 16-bit predictions, one fast adapting (learning rate 1/8) and one slow adapting (rate 1/64). The prediction is the average of the two.
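
    The two-rate probability estimate can be sketched as follows. The class and function names are hypothetical, the integer shift update is an assumption about how the learning rates 1/8 and 1/64 are applied, and predictors_for_byte shows one common way (also an assumption here) of coding an 8 bit symbol as 8 binary decisions within a context.

```python
from collections import defaultdict

class TwoRatePredictor:
    """P(next bit = 1) as two 16-bit estimates; the prediction is their average."""

    def __init__(self):
        self.fast = 1 << 15   # start at p = 0.5
        self.slow = 1 << 15

    def p(self):
        return (self.fast + self.slow) >> 1

    def update(self, bit):
        target = 65535 if bit else 0
        self.fast += (target - self.fast) >> 3   # learning rate 1/8
        self.slow += (target - self.slow) >> 6   # learning rate 1/64

def predictors_for_byte(table, context, byte):
    """Predict and update the 8 bits of `byte`, MSB first; return the predictions."""
    ps, partial = [], 1
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        pr = table[(context, partial)]   # one predictor per context and bit prefix
        ps.append(pr.p())
        pr.update(bit)
        partial = (partial << 1) | bit
    return ps
```

    The fast estimate tracks recent changes, while the slow one averages over a longer history; averaging the two is a simple compromise between them.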

    Only source code is available. For this test, the program irolz.cpp was compiled using g++ 4.5.0 on a 2 GHz T3200 under 32 bit Vista with options -O2 -march=pentiumpro -fomit-frame-pointer -s.

    .2961 lcssr

    symbra 0.2 is a free, open source (GPL) (mirror with .exe) file compressor by Frank Schwellinger, Nov. 29, 2007. It uses symbol ranking. Only source code (C++) is provided. For the test, the program was compiled as indicated in the source comments and tested in Windows XP (32 bit). The option -c4 or -c5 selects order 4 or 5 context. -m5 turns on suffix matching with maximum buffer size, which greatly slows compression. -p2 selects 2 passes, which reorders the alphabet by descending frequency. The defaults are -c4 -m0 -p1.
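
    Here is a generic symbol ranking sketch: each order-k context keeps a short queue of recently seen symbols, and the coder emits the rank of the next symbol in that queue, or an escape (-1 here) if it is absent. This is not symbra's algorithm (its secondary queue, suffix matching, and two-pass alphabet reordering are not modeled); queue_len=3 and the function name are assumptions.

```python
from collections import defaultdict

def symbol_ranks(data, order=4, queue_len=3):
    """Rank of each byte in its context's recency queue, or -1 for an escape."""
    queues = defaultdict(list)
    out = []
    for i, c in enumerate(data):
        q = queues[data[max(0, i - order):i]]   # order-`order` context
        if c in q:
            r = q.index(c)
            q.pop(r)
        else:
            r = -1
            if len(q) == queue_len:
                q.pop()                          # drop the least recent symbol
        q.insert(0, c)                           # move the symbol to the front
        out.append(r)
    return out
```

    On predictable text most ranks are 0, which an entropy coder can code in well under one bit.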

    lcssr 0.2 (Dec. 3, 2007, same website) (mirror with .exe) is derived from symbra. It drops the secondary symbol queue and instead uses a variable length context based on the length of the longest match as with LZ77/LZP. The option -b7 selects a 1152 MB buffer for finding context matches.

                    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------           -------        ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    symbra 0.2        -c4 -m0 -p1    38,308,164  352,524,859     11,299 s  352,536,158    245   282   68 SR  26
    symbra 0.2        -c4 -m5 -p2    34,644,072  302,948,753     11,299 s  302,960,062   4669  4633  112 SR  26
    symbra 0.2        -c5 -m5 -p2    34,683,661  302,656,095     11,299 s  302,667,394   4700  4622  112 SR  26
    lcssr 0.2         -b7 -l9        34,549,048  296,160,661      8,802 x  296,169,463   8186  8281 1184 SR  26
    

    .2984 zlite

    zlite is an open source file compressor by Zhang Li, Aug. 20, 2013. It uses ROLZ. It was released as C source code only. To test, it was compiled with MinGW gcc 4.8.0 with option -O3. zlite takes no options.

    .3062 lazy

    lazy v1.00 is a free, open source file compressor by Matt Mahoney, Oct. 10, 2012. It uses LZ77. It has 5 compression levels from 1 to 5. Higher levels are slower and use more memory to compress. However, decompression speed does not depend on the level, and decompression always uses 16 MB.

    The LZ77 format codes literals uncompressed after a length code. Matches can have an offset in the range 1 to 2^24-1 and a length of 4 to 2^24-1. Literals are coded as 00,N,L[N], where N is the number of literals to follow, coded in marked binary. A marked binary number discards the leading 1, then precedes each remaining bit by a 1 and marks the end with a 0 bit. For example, 5=101 would be coded as 1,0,1,1,0. Matches are coded as 5 bits to indicate the number of offset bits (where the first 2 bits are not 00) in the range 0..23, then the match length as a marked binary number except for the last 2 bits, then the low 2 bits of the match length are coded directly, and then 0 to 23 bits of the offset without the leading 1 bit.
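
    The marked binary code can be checked with a short sketch. The function names are hypothetical; the bit rules follow the description above, and literal_run_header only builds the 00,N prefix of a literal run.

```python
def marked_binary(n):
    """Drop the leading 1 of n, put a 1 before each remaining bit, end with a 0."""
    if n < 1:
        raise ValueError("marked binary codes integers >= 1")
    bits = []
    for b in bin(n)[3:]:          # bin(5) == '0b101'; [3:] also drops the leading 1
        bits += [1, int(b)]
    bits.append(0)
    return bits

def read_marked_binary(bits, pos=0):
    """Decode a marked binary number starting at bits[pos]; return (n, next pos)."""
    n = 1
    while bits[pos] == 1:
        n = (n << 1) | bits[pos + 1]
        pos += 2
    return n, pos + 1

def literal_run_header(count):
    """Prefix of a literal run: 00, then the count in marked binary."""
    return [0, 0] + marked_binary(count)
```

    A number with k significant bits costs 2k - 1 bits, so small counts are cheap: 1 is coded as a single 0 bit.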

    Compression is achieved in a 16 MB sliding window implemented as a pair of buffers. A hash table of 2^19 buckets of 2^level (2..32) pointers each, indexed by an order 4 context hash, maintains pointers for finding matches. The longest match of length at least 4 is coded, except that if the offset is over 64K and the last symbol is a match, then the minimum length is 5.
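
    A rough sketch of the parameters in this paragraph. The helper names are hypothetical, "over 64K" is read as offset > 65536, and the memory estimate assumes 4 byte pointers and two 16 MB buffers; under those assumptions it gives 36, 40, 48, 64, and 96 MB for levels 1 to 5, which agrees with the Mem column of the table below.

```python
HASH_BUCKETS = 1 << 19      # buckets indexed by an order 4 context hash

def bucket_size(level):
    """Pointers per bucket for compression levels 1..5: 2, 4, 8, 16, 32."""
    return 1 << level

def compress_memory_mb(level, pointer_bytes=4, window_mb=16):
    """Estimate: two window buffers plus the pointer table (sizes assumed)."""
    table_mb = HASH_BUCKETS * bucket_size(level) * pointer_bytes / (1 << 20)
    return 2 * window_mb + table_mb

def min_match_length(offset, last_was_match):
    """Shortest codable match; 'over 64K' is read here as offset > 65536."""
    return 5 if offset > 65536 and last_was_match else 4
```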

                    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options          enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------           -------        ----------  -----------  -----------  -----------  ----- -----  --- ---
    lazy 1.00         1              40,518,222  359,237,695      5,986 s  359,243,681     57    25   36 LZ77
                      2              38,580,043  340,152,648      5,986 s  340,158,634     75    25   40 LZ77
                      3              37,074,105  325,609,617      5,986 s  325,615,603    104    29   48 LZ77
                      4              35,908,430  314,545,955      5,986 s  314,551,941    166    25   64 LZ77
                      5              35,024,082  306,245,949      5,986 s  306,251,935    273    24   96 LZ77
    

    .3085 zhuff

    zhuff 0.1 is a free file compressor for Windows by Yann Collet, Dec. 13, 2009. It is described as a combination of LZ4 and Huff0, a fast Huffman coder. LZ4 uses LZSS, an LZ77 variant using flags to identify matches and literals. It requires the Microsoft runtime libraries, which are not included in the program size shown.
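
    To illustrate what flag-based LZSS decoding involves, here is a generic sketch. The token representation (0 for a literal, 1 for a match) is hypothetical and is not the LZ4 or zhuff bit stream. The point is the byte-by-byte copy, which lets a match overlap the bytes it produces.

```python
def lzss_decode(tokens):
    """tokens: (0, byte) for a literal or (1, offset, length) for a match."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 0:
            out.append(tok[1])
        else:
            _, offset, length = tok
            start = len(out) - offset
            if start < 0:
                raise ValueError("offset points before the start of the output")
            for i in range(length):
                out.append(out[start + i])   # byte by byte, so overlap works
    return bytes(out)
```

    For example, a literal a, a literal b, and a match with offset 2 and length 5 decode to abababa.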

    zhuff 0.7, Mar. 15, 2011, is multithreaded. It automatically detects the number of cores and compresses or decompresses in parallel, or the number of threads can be set with -t. However, since the program is already faster than disk I/O with one thread, using more threads makes no difference in practice. Speeds shown below are total process times. Actual times are 17 seconds to compress and 44 to decompress with either -t1 or -t2. Compressed size is the same either way and the archives are compatible but not identical.

    zhuff 0.8 (discussion) has 3 compression levels, from -c0 (fastest) to -c2 (best). All are multithreaded, but decompression at all levels and compression with -c0 are I/O bound (about 40 seconds). Times are process times for these cases, and real times for -c1 and -c2 compression.

    zhuff 0.95b was released Jan. 27, 2014. zhuff 0.97 beta was released Feb. 2, 2014. Both programs were tested using the 64 bit Windows version under Ubuntu Wine. There are also 32 bit Windows versions that produce identical compressed files.

                    Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------           -------       ----------  -----------  -----------  -----------  ----- -----  --- ---- ----
    zhuff 0.1                       43,299,291  384,578,436      9,626 x  384,588,062     16    10  1.4 LZ77 26
    zhuff 0.7         -t1           40,974,542  365,122,888     45,522 x  365,168,410     17    10   12 LZ77 26
                      -t2           40,974,542  365,122,888     45,522 x  365,168,410     17    10   19 LZ77 26
    zhuff 0.8         -c0           40,990,942  365,277,964     50,939 x  365,328,903     18    13   19 LZ77 26
                      -c1           36,235,017  320,629,066     50,939 x  320,680,005     73    12   19 LZ77 26
                      -c2           35,078,148  309,881,876     50,939 x  309,932,815    111    11   19 LZ77 26
    zhuff 0.95b       -c0           40,615,710  362,653,616     61,684 x  362,715,300      6.5   4.2 32 LZ77 48
                      -c1           35,973,813  319,010,291     61,684 x  319,071,975     15     3.6 32 LZ77 48
                      -c2           35,022,597  309,639,139     61,684 x  309,700,823     24     3.6 32 LZ77 48
    zhuff 0.97 beta   -c0           37,076,873  328,438,763     63,209 x  328,501,972     10     4.0 32 LZ77 48
                      -c1           35,864,003  317,929,499     63,209 x  317,992,708     16     3.7 32 LZ77 48
                      -c2           34,907,478  308,530,122     63,209 x  308,593,331     24     3.5 32 LZ77 48
    

    .3092 slug

    slug v1.1b (mirror) is a free, closed source file compressor by Christian Martelock, Apr. 26, 2007. It uses an LZ type algorithm with a 128K non-sliding window and Huffman coding. It is designed for high speed and low memory usage. System (wall) times for enwik9: 18 (51) seconds for compression, 14 (30) for decompression.

    slug 1.27, May 7, 2007, uses a ROLZ variant with an 8 MB non-sliding window and semi-dynamic Huffman coding trees rebuilt every 4 KB (more frequently near the beginning of a file).
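
    The core step of rebuilding a Huffman code can be sketched as follows. This is a generic Huffman construction, not slug's code. The 4 KB rebuild schedule from counts seen so far is an assumption (the text does not say how slug gathers statistics or transmits trees), and the extra rebuilds near the start of a file are not modeled.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code over the nonzero counts."""
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items()) if f > 0]
    lengths = {syms[0]: 0 for _, _, syms in heap}
    if len(heap) == 1:                      # a lone symbol still needs 1 bit
        return {s: 1 for s in lengths}
    heapq.heapify(heap)
    tie = len(freqs)                        # unique tie-breakers for merged nodes
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1                 # each merge adds one bit to its members
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths

def rebuilt_tables(data, block=4096):
    """Rebuild code lengths every `block` bytes from the counts seen so far."""
    counts, tables = Counter(), []
    for start in range(0, len(data), block):
        tables.append(huffman_code_lengths(counts) if counts else None)
        counts.update(data[start:start + block])
    return tables
```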

             Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program   Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------   -------       ----------  -----------  -----------  -----------  ----- -----  --- ---
    slug 1.1b               45,274,048  404,250,979      5,836 x  404,256,815     18    14    1 LZ77
    slug 1.27               35,093,954  309,201,454      6,809 x  309,208,263     32    28   14 ROLZ
    

    .3098 pigz

    pigz 2.2.3 is a free command-line file compressor for Linux, Jan. 15, 2012. It uses the deflate (LZ77) format for compatibility with gzip, but is multi-threaded for better speed at a small cost in compression ratio. -9 selects best compression. Decompression is single-threaded and I/O bound.

    pigz is distributed as source code only. It requires linking with zlib version 1.2.3 or higher. For this test, pigz was compiled using the supplied Makefile under Ubuntu Linux with g++ 4.6.1 and linked to zlib 1.2.5. Decompression was tested with unpigz, compiled similarly. It was tested on a 2.66 GHz Core i7 M620 (2 cores x 2 hyperthreads per core) as in note 48. Virtual memory usage was measured with top at 115 MB for compression and 33 MB for decompression. Resident memory usage was 2 MB. Compression time is real time at about 350% CPU usage. Decompression is I/O bound (less than 100% CPU), so CPU time is reported. gzip is shown for comparison.

    pigz 2.3, Mar. 4, 2013, adds option -11, implementing Google's zopfli algorithm, a very highly optimized and slow implementation of deflate. Decompression speed is not affected, and the output remains compatible with gzip. The test program was built from source code in Ubuntu using the supplied Makefile with g++ 4.6.3.

                          Compression                 Compressed size      Decompresser  Total size  Time (ns/byte)
    Program                 Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Notes
    -------                 -------               ----------  -----------  -----------  -----------  ----- ----- ---- ---- --
    gzip 1.3.5              -9                    36,445,248  322,591,995     34,408 x  322,626,403     55    22  4.5 LZ77 48
    pigz 2.2.3              -9                    36,490,716  322,926,625     36,521 xd 322,963,146     31    10  115 LZ77 48
    pigz 2.3                                      36,565,142  324,081,152     52,717 s  324,133,869     25    12    3 LZ77 48
                            -9                    36,490,716  322,926,625     52,717 s  322,979,342     29    13    3 LZ77 48
                            -11                   35,002,893  309,812,953     52,717 s  309,865,670   2237    13   25 LZ77 48
    

    .3102 kzip

    kzip is a free, closed source command line compressor by Ken Silverman, compiled May 13, 2006, released May 18, 2006. It is an optimizing compressor producing zip-compatible archives but with better compression. The option /b512 sets the block splitting threshold. The default is /b256, but /b512 was found optimal on enwik8. The /s option ranges from /s0 to /s3; /s0 (the default) selects maximum compression. No decompresser is included, but archives can be read with any program that reads zip files (pkzip, unzip, 7zip, WinRAR, WinACE, etc).
    Options      enwik8    Comp (ns/B)    enwik9
    -------    ----------  -----------  ----------
    /s0 /b0    35,029,924  2490                     (one large block)
    /s0 /b256  35,025,767  5220         310,281,906 (default, s0 = extreme mode)
    /s0 /b512  35,012,219  5410         310,248,404 (best enwik8)
    /s0 /b1024 35,016,649  4440         310,188,783 (best enwik9)
    /s1        35,028,473  5240                     (s1 = intense mode)
    /s2        42,370,689   860                     (s2 = longest run)
    /s3        63,191,700   820                     (s3 = Huffman code only)
    pkzip 204  36,934,712   123                     (for comparison)
    

    .3128 uc2

    uc2 (UltraCompressor II revision 3 pro) is a commercial (free for noncommercial use) command line and GUI archiver for DOS by Nico de Vries, June 1, 1995. It uses LZ77 and Huffman coding. The -tst option selects maximum compression.

    uc2 includes a program for converting archives to self-extracting programs (uc2sea), which produced smaller files (enwik8.exe = 35,397,343 bytes, enwik9.exe = 312,759,499 bytes), but in this mode decompression failed for enwik9, truncating the last 21 bytes of output. uc2sea works by first extracting the archive and then recompressing it using a slightly different algorithm.

    .3141 thor

    thor 0.9a is an experimental, closed source, command line file compressor by Oscar Garcia, Mar. 19, 2006. It is the fastest compressor on the maximumcompression benchmark. It has 3 modes: ef (fastest), e (normal) and ex (best). However, in this test its speed appears to be limited by disk I/O.

    thor 0.94 alpha (mirror) (mirror) was released Apr. 22, 2007. exx is a new mode that selects maximum compression. Times shown are process times excluding disk I/O. Actual times are 96 sec. to compress and 75 sec. to decompress.

    thor 0.95 (mirror), May 8, 2007, has 5 compression options: e1 through e4 are LZP in order of increasing compression; e5 is LZ77. Note that e5 is best on enwik8 but e4 on enwik9.

    thor 0.96a, Aug. 23, 2007, works like 0.95.

                    Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp Mem
    -------           -------       ----------  -----------  -----------  -----------  ----- ----- ---
    thor 0.9a         ex            41,670,916  368,669,696     61,556 x  368,731,252     54    51 5.5
    thor 0.9a         e             45,842,692  412,096,696     61,556 x  412,157,852     44    50
    thor 0.9a         ef            55,063,944  490,400,720     61,556 x  490,461,876     45    53
    
    thor 0.94a        exx           35,696,028  315,611,168     68,922 x  315,680,090     82    32   2
    
    thor 0.95         e1            55,138,792                                            21    27
    thor 0.95         e2            45,714,740                                            21    23
    thor 0.95         e3            41,528,948                                            29    29
    thor 0.95         e4            35,795,184  314,092,324     49,925 x  314,142,249     64    34  16
    thor 0.95         e5            35,696,032  315,611,172     49,925 x  315,661,097     80    22   2
    
    thor 0.96a        e1            54,915,456  488,397,982     50,071 x  488,448,053     17    20 1.6
    thor 0.96a        e2            45,714,724  411,416,252     50,071 x  411,466,323     23    19 1.5
    thor 0.96a        e3            41,531,628  367,671,220     50,071 x  367,721,291     27    24   6
    thor 0.96a        e4            35,795,184  314,092,324     50,071 x  314,142,395     62    30  16
    thor 0.96a        e5            35,696,032  315,611,172     50,071 x  315,661,243     80    18   2
    

    .3148 etincelle

    etincelle alpha 3 is a free file compressor by Yann Collet, Mar. 26, 2010. It uses ROLZ with an order 1 context to reduce the offset length, followed by Huffman coding.
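
    A minimal order 1 ROLZ sketch, assuming a table of up to 16 recent positions per previous byte and a minimum match of 3 (both values and all names are hypothetical, and no entropy coding is shown). The encoder emits the index of the chosen position within its context's table instead of a distance. The decoder maintains the same tables, so it can turn the index back into a position.

```python
from collections import defaultdict

MAX_CANDIDATES = 16   # hypothetical number of positions kept per context
MIN_MATCH = 3         # hypothetical shortest match worth coding

def _register(table, buf, start, stop):
    """Record positions start..stop-1 under their order-1 context (previous byte)."""
    for p in range(max(start, 1), stop):
        slots = table[buf[p - 1]]
        slots.insert(0, p)            # most recent first
        del slots[MAX_CANDIDATES:]

def rolz_encode(data):
    table, out, i = defaultdict(list), [], 0
    while i < len(data):
        best_len, best_idx = 0, -1
        if i > 0:
            for idx, cand in enumerate(table[data[i - 1]]):
                n = 0
                while i + n < len(data) and data[cand + n] == data[i + n]:
                    n += 1
                if n > best_len:
                    best_len, best_idx = n, idx
        if best_len >= MIN_MATCH:
            out.append(("match", best_idx, best_len))   # table index, not distance
            step = best_len
        else:
            out.append(("lit", data[i]))
            step = 1
        _register(table, data, i, i + step)
        i += step
    return out

def rolz_decode(tokens):
    table, out = defaultdict(list), bytearray()
    for tok in tokens:
        i = len(out)
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, idx, length = tok
            cand = table[out[i - 1]][idx]
            for n in range(length):
                out.append(out[cand + n])   # may overlap the bytes being written
        _register(table, out, i, len(out))
    return bytes(out)
```

    Since an index into a table of at most 16 entries fits in 4 bits, it is cheaper to code than a raw distance, at the cost of only considering positions that follow the same byte.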

    .3211 gzip124hack

    gzip124hack (mirror) (discussion) is a modified version of gzip 1.2.4 by Ilia Muraviev, Aug. 13, 2007. It uses LZ77. It is a file compressor like gzip, except that it does not delete the input file. It improves compression by using LZ77 lazy matching with 2 byte lookahead. The compressed format is compatible with gzip. -9 selects maximum compression.
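
    Lazy matching with a 2 byte lookahead can be sketched as follows. This is not gzip124hack's code: matches are found by brute force, the window and maximum length are deflate's 32768 and 258, and the rule (defer when a match starting 1 or 2 bytes later is strictly longer) is an assumption.

```python
def longest_match(data, pos, window=32768, max_len=258):
    """Brute-force longest earlier match for data[pos:]; returns (length, offset)."""
    best_len = best_off = 0
    for cand in range(max(0, pos - window), pos):
        n = 0
        while n < max_len and pos + n < len(data) and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_off = n, pos - cand
    return best_len, best_off

def lazy_parse(data, lookahead=2, min_len=3):
    out, i = [], 0
    while i < len(data):
        length, offset = longest_match(data, i)
        # Lazy step: if a match starting 1..lookahead bytes later is longer,
        # emit a literal now and reconsider at the next position.
        if length >= min_len and any(
                longest_match(data, i + k)[0] > length
                for k in range(1, lookahead + 1) if i + k < len(data)):
            length = 0
        if length >= min_len:
            out.append(("match", offset, length))
            i += length
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def expand(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            for _ in range(tok[2]):
                out.append(out[-tok[1]])   # byte-by-byte copy handles overlap
    return bytes(out)
```

    For b"abcdxbcdefgabcdefg", the parser emits a literal a at position 11 instead of taking the 4 byte match there, because the match starting at position 12 is 6 bytes long.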

    .3224 doboz

    doboz 0.1 is a free, open source file compressor by Attila T. Áfra, Mar. 18, 2011. It uses LZ77. It is both a compression library and a simple single-threaded file compressor which takes no options. For this test, the supplied compressors for 32 and 64 bit Windows were used. The 32 bit version crashed while compressing enwik9, possibly due to reading the whole file into memory. The 64 bit version succeeded under Ubuntu/wine.

    .3226 gzip

    gzip 1.3.5 is an open source single file command line compressor by Jean-loup Gailly and Mark Adler, Sept. 30, 2002. It uses LZ77 (flate, but not compatible with zip). The -9 option selects maximum compression, although its effect is small (see below).

                  Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
    Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp Note
    -------         -------         ----------  -----------  -----------  -----------  ----- ----- ---- 
    gzip 1.3.5      -9              36,445,248  322,591,995     34,408 x  322,626,403     55    22  48 (Linux)
    gzip 1.3.5      -9              36,445,248  322,591,995     38,801 x  322,630,796    101    17     (Windows)
    gzip 1.3.5                      36,518,329  323,742,882     38,801 x  323,781,683     85    19
    
                  Compression          Compressed size      Decompresser  Total size   Time (ns/byte)
    Program         Options           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Note
    -------         -------         ----------  -----------  -----------  -----------  ----- ----- ---- ---- 
    doboz 0.1                       36,367,430         fail     76,471 x                 940    10       26
                                    36,367,430  322,415,409     83,591 x  322,499,000    533   3.4 1200  48
    

    .3226 Info-ZIP

    Info-ZIP 2.3.1 (Mar. 8, 2005) is a free, open source archiver for many operating systems. It uses the standard LZ77 "flate" format, like gzip and many zip-compatible programs. (The compressed sizes are exactly 125 bytes larger than gzip's.) This test was under Linux (Ubuntu 2.6.15.27-amd64-generic) on a 2.2 GHz Athlon-64. Decompression was with UnZip 5.52 (Feb. 28, 2005); both programs are part of the normal Ubuntu distribution. The -9 option selects maximum compression.

    The Windows version 2.32 is dated June 19, 2006.

    Info-ZIP 3.00 was released July 7, 2008. Decompression was tested with UnZip 6.00, released Apr. 29, 2009.

                          Compression                 Compressed size      Decompresser  Total size  Time (ns/byte)
    Program                 Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Notes
    -------                 -------               ----------  -----------  -----------  -----------  ----- ----- ---- ---- --
    Info-ZIP 2.31 (Linux)   -9                    36,445,373  322,592,120     57,583 x  322,649,703    104    35  0.1 LZ77
    Info-ZIP 2.32 (DOS)     -9 (unset TZ)         36,445,333                                           178   101      LZ77 16
    Info-ZIP 2.32 (DOS)     -9                    36,445,351                                           179            LZ77 16
    Info-ZIP 2.32 (Win32)   -9                    36,445,474                                           183            LZ77 16
    Info-ZIP 2.32 (Win32)   -9                    36,445,443  322,592,190     75,806 xd 322,667,996     96    13  1.2 LZ77
    Info-ZIP 3.00 (Win32)   -9                    36,445,475  322,592,222    101,079 xd 322,693,301    114    18  1.3 LZ77 26
    

    .3234 pkzip

    pkzip 2.04e is a commercial (free trial) command line archiver by PKWARE Inc., written Jan. 25, 1993. It uses LZ77 (flate format). The option -ex selects maximum compression. The decompresser is pkunzip 2.04e. Times are wall times (the timer doesn't show process times for DOS programs).

    There are many programs that produce zip files. I don't plan to test them all.

                          Compression                 Compressed size      Decompresser  Total size  Time (ns/byte)
    Program                 Options                 enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------                 -------               ----------  -----------  -----------  -----------  ----- ----- ---- ----
    pkzip 2.0.4                                   36,934,712  327,607,376     29,184 xd 327,636,560    123    44  1.7 LZ77
    pkzip 2.0.4             -ex                   36,556,552  323,403,526     29,184 xd 323,432,710    171    50  2.5 LZ77
    

    .3237 jar

    jar 0.98-gcc is an open source command line archiver by Bryan Burns, 2002. It uses LZ77 (zip). It is included with Java (1.5.0_06) and is normally used to create .jar files for compiled Java applications and applets, but it can also be used as an archiver. It has no compression options. The cvf options create an archive. The M option says not to add a manifest file.

    Note: this is not the jar compressor from Arjsoft.

    .3244 PeaZip

    PeaZip 1.0 by Giorgio Tani (Nov. 6, 2006) is a GPL open source GUI archiver supporting several common formats. The format tested is the native format which uses zlib (gzip algorithm). The "better" option chooses best compression (equivalent to gzip -9). Integrity check (checksum) and encryption are turned off.

    .3286 arj

    arj 3.10 is a free, open source (GPL v2) archiver by ARJ Software Russia, June 23, 2005. It is compatible with the original ARJ by Robert K. Jung, which was patented (U.S. patent 5140321 A, filed Sept. 4, 1991, and presumably expired). According to the patent, it uses LZ77 with flags to indicate a repeat of the last match (like LZX used in cabarc). Matches are found from a hash table of FIFO queues.
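
    A sketch of a match finder built on a hash table of FIFO queues, plus a repeated-offset flag. Assumptions: the queues are bounded (depth is hypothetical), the hash key is simply the next 3 bytes, and "a repeat of the last match" is read as reusing the previous match offset, as LZX does. None of this is taken from arj's source.

```python
from collections import defaultdict, deque

class FifoMatchFinder:
    """Hash table whose buckets are bounded FIFO queues of earlier positions."""

    def __init__(self, depth=8, key_len=3):
        self.key_len = key_len
        self.buckets = defaultdict(lambda: deque(maxlen=depth))

    def find(self, data, pos):
        """Longest match among the queued candidates: (length, offset)."""
        best_len, best_off = 0, 0
        for cand in self.buckets.get(data[pos:pos + self.key_len], ()):
            n = 0
            while pos + n < len(data) and data[cand + n] == data[pos + n]:
                n += 1
            if n > best_len:
                best_len, best_off = n, pos - cand
        return best_len, best_off

    def insert(self, data, pos):
        # deque(maxlen=...) drops the oldest position when the queue is full
        self.buckets[data[pos:pos + self.key_len]].append(pos)

def code_offset(offset, last_offset):
    """Emit a repeat flag instead of the offset when it equals the previous one."""
    return ("repeat",) if offset == last_offset else ("offset", offset)
```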

    The options -m0 through -m4 select the compression level. The default, -m1, gives maximum compression. -m0 stores with no compression. -m2 through -m4 compress progressively faster, but with larger output and slower decompression.

    Program   Options    enwik8      enwik9      prog size     Total       Comp Decomp Mem Alg  Note
    -------   -------  ----------  -----------   ----------   -----------   ---- ----- --- ---- ----
    arj 3.10  -m0     100,000,127                                             12    10   3 store 26
              -m1      37,091,317  328,553,982     143,956 x  328,697,938    262    67   3 LZ77  26
              -m2      37,381,391                                            224    68   3 LZ77  26
              -m3      39,413,127                                            185    72   3 LZ77  26
              -m4      44,157,478                                            116    91   3 LZ77  26
    

    .3326 ulz

    ulz 0.01 (discussion) is a free, experimental file compressor by Ilia Muraviev, Feb. 1, 2010. It uses LZ77 with bytewise encoding. The options c1 through c5 select the compression level from fastest to best. The option does not affect memory usage. All levels use 43 MB for compression and 33 MB for decompression.

    ulz 0.02 adds a new faster mode (c1). Options c2 through c6 are the same as c1 through c5 in ulz 0.01.

    Program   Options    enwik8      enwik9      prog size     Total       Comp Decomp Mem Alg  Note
    -------   -------  ----------  -----------   ----------   -----------   ---- ----- --- ---- ----
    ulz 0.01  c1       45,751,335  411,826,108     47,809 x   411,873,917     50    11  43 LZ77 26
              c2       41,677,764                                             77    10  43 LZ77 26
              c3       39,368,127                                            145     9  43 LZ77 26
              c4       37,861,566                                            581     9  43 LZ77 26
              c5       37,652,826  332,626,591                332,674,400   1077     9  43 LZ77 26
    ulz 0.02  c1       50,382,083                                             37    10  43 LZ77 26
              c2       45,751,335                                             52    10  43 LZ77 26
              c3       41,677,764                                             74     9  43 LZ77 26
              c4       39,368,127                                            139     8  43 LZ77 26
              c5       37,861,566                                            576     8  43 LZ77 26
              c6       37,652,826  332,626,591     47,833 x   332,674,424   1056     8  43 LZ77 26
    

    .3344 lzgt3a

    lzgt1 (included in lzgt3a.zip) is one of a group of free, open source, experimental file compressors by Gerald R. Tamayo, released July 17, 2008. It uses LZT (Lempel-Ziv-Tamayo) compression, an LZ77 variant in which the decompresser rebuilds a list of matches sorted by context match length, and the match length is implied or partially implied by the position in the list. lzgt implements LZT using a 4K sliding window, 32 byte look-ahead buffer and 3 bit code length. lzgt1 is like lzgt but uses a 16K sliding window and 128 byte look-ahead buffer. lzgt2 eliminates the code length entirely. lzgt3 is an improved version of lzgt2. All programs have separate decompressers (lzgtd1, etc.) and are compiled for DOS (and Windows).

    lzgt3a was added Oct. 25, 2008. It uses a 128K window size, 64K lookahead buffer, and improved coding.

    Program           enwik8      enwik9      prog size     Total       Comp Decomp Mem Alg
    -------         ----------  -----------   ----------   -----------   ---- ----- --- ----
    lzgt            47,560,234                   1,989 sd                 634   234   2 LZ77
    lzgt1           43,928,072  403,385,292      2,025 sd  403,387,317   3390   865   2 LZ77
    lzgt2           57,268,099                   1,935 sd                 982   274   1 LZ77
    lzgt3           54,253,334                   1,963 sd                 889   280   1 LZ77
    lzgt3a          37,444,440  334,405,713      4,387 xd  334,410,100   1581  2886   2 LZ77
    

    .3388 lzuf

    lzuf is a free, experimental open source file compressor by Gerald R. Tamayo, Apr. 15, 2009. It uses LZ77 with folded unary encoding of match lengths. It takes no arguments. It has a separate decompression program, lzufd.exe.

    .3502 pucrunch

    pucrunch is a free, open source file compressor by Pasi Ojala, last updated Mar. 8, 2002. It uses a combination of run length encoding (RLE) and LZ77 with Elias Gamma coding of the offsets and run lengths. The original version was written on Mar. 14, 1997 for the Commodore series (Vic 20, Commodore 64, Commodore 128 and Commodore Plus 4/C16) in 6510 assembly language, with updates on Dec. 17, 1997 and Oct. 14, 1998. The 6510 is a 1 MHz, 8 bit microprocessor with 3 registers, 16 bit (64K) address space, no cache, no pipelining, 8 bit ALU, no multiply or floating point instructions, and no support for multitasking or virtual memory. The decompresser was designed to execute quickly in this environment with only a few hundred bytes of memory.
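    Elias gamma coding, which pucrunch uses for offsets and run lengths, writes a positive integer n with N significant bits as N-1 zero bits followed by the N-bit binary value of n. A minimal sketch of the textbook code using bit strings follows; pucrunch's actual bit order and how the codes are combined with its RLE and LZ77 flags are not shown here.

```python
def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of n >= 1 as a string of '0'/'1' bits."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for n >= 1")
    binary = bin(n)[2:]                       # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary   # N-1 zeros, then N-bit binary


def elias_gamma_decode(bits: str, pos: int = 0):
    """Decode one gamma code starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":           # count the length prefix
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end
```

    Small values get short codes (1 takes 1 bit, 5 takes 5 bits), which suits short offsets and run lengths, and the decoder needs only shifts and tests, which fits an 8 bit CPU without multiply instructions.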

    The most recent version was written in Visual C and ported to Windows as a cross compressor intended to produce self extracting archives for the Commodore. By default, pucrunch adds a 276 byte header containing 6510 code to extract the file. There are also standalone decompressers written in 6510 assembler and in Z80 assembler. I could not test in these environments, so I used the -d -c0 options to turn off the self extracting feature; decompression then requires the (larger) Win32 external compressor/decompresser.

    There are two additional limitations. First, the decompresser adds a 2 byte header at the start of its output to indicate the load address, which is required by the Commodore. To make the decompressed file bitwise identical to the original, this header must be stripped off. Second, the input file size is limited to 64,936 bytes. The author tested a modified version without a file size limit on the Calgary corpus, but this version was not posted, so I did not use it.

    To overcome these limitations I wrote the following Perl scripts to compress and decompress. The first script compresses by splitting the input into blocks of 64,936 bytes, compressing them separately, and appending the compressed blocks, each with a 2 byte header to indicate its size. The second script decompresses each block one at a time, strips off the 2 byte Commodore header, and appends the results. Each script takes the input and output files as command line arguments. The second script is included in the decompresser size.

    
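    #!/usr/bin/perl
    # compress with pucrunch: perl p input output
    # Splits the input into 64,936 byte blocks, compresses each block with
    # pucrunch, and writes each result preceded by its 2 byte size (high byte first).
    open(IN,"$ARGV[0]")||die "$!: $ARGV[0]";
    open(OUT,">$ARGV[1]")||die "$!: $ARGV[1]";
    binmode(IN);
    binmode(OUT);
    while ($n=read(IN, $s, 64936)) {
      # write the block to a temporary file
      open(TMP1,">tmp1")||die "$!: tmp1";
      binmode(TMP1);
      syswrite(TMP1, $s, $n);
      close(TMP1);
      # -d -c0 turns off the self extracting header
      `pucrunch -d -c0 tmp1 tmp2`;
      open(TMP2,"tmp2")||die "$!: tmp2";
      binmode(TMP2);
      $size=(stat(TMP2))[7];
      print("$n -> $size\n");
      $n=read(TMP2,$s,$size);
      printf(OUT "%c%c%s", int($size/256), $size%256, $s);
      close(TMP2);
    }
    close(OUT);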
    
    
    #!/usr/bin/perl
    # unpack with pucrunch: perl up input output
    open(IN,"$ARGV[0]")||die "$!: $ARGV[0]";
    open(OUT,">$ARGV[1]")||die "$!: $ARGV[1]";
    binmode(IN);
    binmode(OUT);
    while (($c1=getc(IN)) ne "") {
      $c2=getc(IN);
      $size=unpack("C",$c1)*256+unpack("C",$c2);
      $n=read(IN, $s, $size);
      if ($size!=$n) {die "size=$size n=$n\n";}
      open(TMP1,">tmp1")||die "$!: tmp1";
      binmode(TMP1);
      syswrite(TMP1, $s, $n);
      close(TMP1);
      `pucrunch -u tmp1 tmp2`;
      open(TMP2,"tmp2")||die "$!: tmp2";
      binmode(TMP2);
      read(TMP2,$s,2);
      read(TMP2,$s,64936);
      printf(OUT "%s", $s);
      close(TMP2);
    }
    

    pucrunch suggests using -p1 and -m6 options to improve compression but these do not help.

    Run times are wall times. When running scripts, Timer 3.01 does not provide useful process times, since it times Perl rather than pucrunch. The decompression time (463 sec) is probably inflated, because Windows Task Manager shows that pucrunch is running only a small fraction of the time, perhaps 10%. Most of the time is probably the overhead of file I/O and of running pucrunch 15,400 times.

    .3619 packARC

    packARC v0.7RC11 (discussion) is a free, open source (GPL v3) archiver by Matthias Stirner, Dec. 7, 2013. It incorporates packJPG (JPEG compressor), packMP3 (MP3 compressor) and packPNM (BMP, PPM, PGM, PBM image compressor). Other file types are compressed with a simple context model and arithmetic coder. Option -sfx creates a self extracting archive. Option -np tells the program not to pause when done. For this test, the source was compiled with MinGW g++ 4.8.0 using the supplied buil_packarc.bat for 32 bit Windows.

    .3626 urban

    urban is an open source file compressor for Unix by Urban Koistinen, Apr. 30, 1991. The program is an order-2 indirect context model with bitwise arithmetic coding. A hash of the last two whole bytes plus the previously coded bits of the current byte (MSB first) is mapped to a hash table of size 710123. Each table element contains a count of 0s and 1s in the range 0 through 8, and a hash verification consisting of a second hash. When a collision is detected, the counts are reset to 0. Otherwise, the appropriate count is incremented and both are halved if either exceeds 8.

    The pair of bit counts and the character count mod 3 (probably unnecessary) are mapped to a second table of counts to compute the next-bit probability. That table is updated by incrementing the appropriate count and halving both if the sum exceeds 60000. The initial mapping of this second table is (n0,n1) to (n0,n1) except if either of the input counts is 0, in which case the mapping is (0,n1) to (1,1+2^n1) or (n0,0) to (1+2^n0,1). The final bit prediction is n1/(n0+n1).
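    The two-level model described above can be sketched as follows. This is a hypothetical reconstruction from the description, not Koistinen's source: the caller is assumed to supply the two hashes h1 and h2 (their computation is not shown), an unseen (0,0) pair is assumed to fall under the (0,n1) rule, and halving in the second table is assumed to round up so that neither count reaches 0. The arithmetic coder is omitted.

```python
TABLE_SIZE = 710123  # hash table size given in the description

check = [0] * TABLE_SIZE  # second hash, used to verify the slot owner
n0 = [0] * TABLE_SIZE     # count of 0 bits in this context (0..8)
n1 = [0] * TABLE_SIZE     # count of 1 bits in this context (0..8)
secondary = {}            # (n0, n1, char_count % 3) -> [c0, c1]


def lookup(h1: int, h2: int) -> int:
    """Return the slot for context hash h1, resetting it if h2 does not verify."""
    i = h1 % TABLE_SIZE
    if check[i] != h2:  # collision: take over the slot with zero counts
        check[i], n0[i], n1[i] = h2, 0, 0
    return i


def initial_counts(a: int, b: int) -> list:
    """Initial second-table entry for first-level counts (a, b)."""
    if a == 0:          # assumption: (0, 0) falls under this rule, giving [1, 2]
        return [1, 1 + 2 ** b]
    if b == 0:
        return [1 + 2 ** a, 1]
    return [a, b]


def predict(i: int, char_count: int):
    """Return P(next bit = 1) and the second-table key used to compute it."""
    key = (n0[i], n1[i], char_count % 3)
    if key not in secondary:
        secondary[key] = initial_counts(n0[i], n1[i])
    c0, c1 = secondary[key]
    return c1 / (c0 + c1), key


def update(i: int, key, bit: int) -> None:
    """Update both tables after coding bit in slot i."""
    if bit:
        n1[i] += 1
    else:
        n0[i] += 1
    if n0[i] > 8 or n1[i] > 8:  # keep first-level counts in 0..8
        n0[i] //= 2
        n1[i] //= 2
    counts = secondary[key]
    counts[bit] += 1
    if counts[0] + counts[1] > 60000:
        # assumption: round up so neither count reaches 0
        counts[0] = (counts[0] + 1) // 2
        counts[1] = (counts[1] + 1) // 2
```

    The indirection means the second table learns, for example, how often a context that has seen "three 1s and no 0s" is actually followed by a 1, and that statistic is shared by every context in the same state.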

    The program was a submission to a data compression contest for Dr. Dobb's Journal. To test, the source code was compiled using make and tested in Linux. It compresses and decompresses from standard input to standard output. It takes no options.

    .3663 lzop

    lzop v1.01 is a free, open source (GPL) command line file compressor by Markus F.X.J. Oberhumer, Apr. 27, 2003. A newer version, 1.02 rc1, was released July 25, 2005, but no Win32 executable was available for download as of May 29, 2006. lzop uses LZ77. It is designed for high speed. -9 selects maximum compression. lzop is I/O bound. Timer 3.01 reports the decompression process time as 12 seconds. The remaining 38 seconds are due to disk access.

    .3676 lzw

    lzw v0.1 is a free, experimental file compressor by Ilia Muraviev, Jan. 30, 2008. It uses LZW with 16 bit code words. It takes no options.

    lzw v0.2 was released with public domain source code for the decompresser, which zips to 671 bytes. The file format is as follows. There is no header or trailer. Each 16 bit code word is in machine dependent order (LSB first on x86). Codes 0-255 represent single bytes of the same value. Codes 256-65535 are assigned in ascending order by concatenating the decoded values of the previous two codes. After assigning code 65535, new codes are assigned by replacing the oldest codes first, starting with 256. Data is decoded into a rotating buffer of size 16 MiB (2^24 bytes) by copying a string from elsewhere in the buffer. Neither the original nor copied string crosses the buffer boundary, and they do not overlap each other. No new symbol is added after decoding the first byte of the buffer.

             Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program   Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------   -------       ----------  -----------  -----------  -----------  ----- -----  --- ---
    lzw 0.1                 42,554,530  380,782,976     42,215 x  380,825,191   1917    27   17 LZW
    lzw 0.2                 41,960,994  367,633,910        671 s  367,634,581   3597    31   18 LZW
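    The lzw 0.2 code assignment rule can be sketched as a decoder. This reads "concatenating the decoded values of the previous two codes" as: after each code except the first, the next free code is assigned the previous code's string followed by the current code's string (a reading that resembles the LZMW variant of LZW). It is a simplified sketch under that assumption: strings are kept in a Python list instead of the 16 MiB rotating buffer, and the rule about not adding a symbol after the first byte of the buffer is ignored, so it would not decode every real lzw 0.2 file.

```python
import struct

FIRST_CODE = 256    # codes 0-255 are single bytes
LAST_CODE = 65535   # 16 bit code words


def decode_codes(codes) -> bytes:
    """Decode codes where each new entry joins the strings of the last two codes."""
    table = [bytes([b]) for b in range(256)] + [b""] * (LAST_CODE + 1 - FIRST_CODE)
    next_code = FIRST_CODE
    out = bytearray()
    prev = None
    for code in codes:
        cur = table[code]
        out += cur
        if prev is not None:
            table[next_code] = prev + cur   # previous string + current string
            next_code += 1
            if next_code > LAST_CODE:       # table full: reuse the oldest codes
                next_code = FIRST_CODE
        prev = cur
    return bytes(out)


def decode_file(data: bytes) -> bytes:
    """Read little-endian 16 bit code words (no header or trailer) and decode."""
    count = len(data) // 2
    return decode_codes(struct.unpack("<%dH" % count, data[:2 * count]))
```

    For example, the codes 65, 66, 256, 257 decode to "A", "B", "AB" (code 256 = "A"+"B") and "BAB" (code 257 = "B"+"AB"), giving "ABABBAB". Because a code is defined as soon as the code after its second half is read, the decoder never sees an undefined code under this reading.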
    

    .3701 MTCompressor

    MTCompressor v1.0 (discussion) is a free, experimental command line compressor for Windows by David Catt, Jan. 20, 2012. It uses an LZ77 variant similar to deflate. It is multi-threaded. Reported time is real time running on 2 cores (note 26). Memory usage fluctuates during use. The peak is reported.

    .3790 arbc2z

    arbc2z is a free, experimental command line file compressor with source code by David A. Scott, June 23, 2006. It is a bijective order-2 (PPM) arithmetic coder. A bijective coder has the property that all inputs to the decompresser are valid and produce distinct outputs. The above archive also contains arbc2, which uses a different method of handling the zero frequency problem, arbc1 (order 1), and arbc0 (order 0), all of which are bijective.

                    Compression        Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg
    -------           -------       ----------  -----------  -----------  -----------  ----- -----  --- ---
    arbc2z                          38,756,037  379,054,068      6,255 sd 379,060,323   2659  2674   68 PPM2
    arbc2                           38,780,256  379,093,120      6,070 sd 379,099,190   2528  2646   67 PPM2
    arbc1                           48,586,591  486,892,000      6,047 sd 486,898,047   2439  2611  1.8 PPM1
    arbc0                           63,501,994  644,561,590      5,988 sd 644,567,578   2459  2606  1.5 o0
    

    .3800 lz4

    lz4 v0.2 (website) is a free file compressor by Yann Collet, Oct. 16, 2009. It uses LZSS (an LZ77 variant with flags to mark literals and matches). It takes no options. Run times are dominated by disk access.

    lz4 0.6 was released Dec. 12, 2010. lz4hc 0.9 (Dec. 13, 2010, same link) is a compatible version with better compression. In both cases, run times are dominated by disk access. Times shown are process times. Actual times were 80+37 sec. for lz4 and 137+39 sec. for lz4hc. The programs take no compression options.

    lz4 v1.2 was released Oct. 10, 2011. It has 3 compression levels (c0...c2). The program automatically detects the number of cores (2, note 26) and uses the same number of threads. However, compression in mode c0 and decompression in all modes are I/O bound, using about 20% of available CPU. For these modes, process times are reported. For compression modes c1 and c2, real times are reported, with both cores fully utilized.

                            Compressed size      Decompresser  Total size   Time (ns/byte)
    Program    Opt         enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------    ---       ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    lz4 0.2              55,092,932  488,028,718      9,556 x  488,038,274     13     7   13 LZ77 26
    lz4 0.6              55,062,753  487,772,940     42,139 x  487,815,079     14     7   13 LZ77 26
    lz4hc 0.9            44,182,558  392,102,544     43,617 x  392,146,161     65     7   14 LZ77 26
    lz4 1.2    -c0       54,303,743  481,142,522     49,128 x  481,191,650     15     6   20 LZ77 26
               -c1       44,218,551  392,460,229     49,128 x  392,509,357     69     6   21 LZ77 26
               -c2       42,870,164  379,999,522     49,128 x  380,048,650     91     6   20 LZ77 26
    

    .3802 lzss

    lzss 0.01 (withdrawn) is a free, experimental file compressor by Ilia Muravyov, Aug. 1, 2008. It uses LZSS, a byte aligned LZ77 variant with matches encoded with an 18 bit pointer and 6 bit length field, and 1 bit flags to distinguish matches from literals. It is discussed here. Compression options are e (fast) or ex (smaller). The program is designed for fast decompression. The program uses 625 MB for compression and 33 MB for decompression.

    lzss 0.02 (discussion) was released Feb. 7, 2014. Options cf, c, cx select fast, medium, and best compression.

               Compression     Compressed size      Decompresser  Total size  Time (ns/byte)
    Program      Options     enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg  Note
    -------      -------   ----------  -----------  -----------  -----------  ----- ----- ---- ---- ----
    lzss 0.01    e         48,615,051  426,009,994      44,555 x 426,054,549    193    15  625 LZSS
                 ex        38,254,303  337,565,308      44,555 x 337,609,863   9708    14  625 LZSS
    lzss 0.02    cf        50,110,565  448,712,956      48,114 x 448,761,070     22    12   17 LZSS  26
                 cf                    448,712,956      48,114 x 448,761,070      6.0       17 LZSS  63
                 c         45,093,733  399,850,630      48,114 x 399,898,744     40    11   17 LZSS  26
                 c                     399,850,630      48,114 x 399,898,744     12.5       17 LZSS  63
                 cx        42,874,387  380,192,378      48,114 x 380,240,492    265    10  145 LZSS  26
                 cx        42,874,387  380,192,378      48,114 x 380,240,492    107   2.3  145 LZSS  63
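    The lzss 0.01 token format described above (1 bit flags, and matches stored as an 18 bit pointer plus a 6 bit length, 3 bytes in all) can be sketched as follows. The byte order, the grouping of 8 flags into one byte ahead of their tokens, and the length bias MIN_MATCH are assumptions for illustration; the token list input is a hypothetical interface, and the match search is omitted.

```python
OFFSET_BITS = 18   # from the description
LENGTH_BITS = 6    # from the description
MIN_MATCH = 3      # assumption: stored length = actual length - MIN_MATCH


def pack_match(offset: int, length: int) -> bytes:
    """Pack an 18 bit offset and 6 bit length into 3 bytes."""
    stored = length - MIN_MATCH
    if not (0 < offset < 1 << OFFSET_BITS and 0 <= stored < 1 << LENGTH_BITS):
        raise ValueError("match does not fit in 24 bits")
    return ((offset << LENGTH_BITS) | stored).to_bytes(3, "little")


def unpack_match(b: bytes):
    word = int.from_bytes(b, "little")
    return word >> LENGTH_BITS, (word & ((1 << LENGTH_BITS) - 1)) + MIN_MATCH


def serialize(tokens) -> bytes:
    """Write tokens 8 at a time behind a flag byte (bit k = 1: token k is a match)."""
    out = bytearray()
    for g in range(0, len(tokens), 8):
        flags, body = 0, bytearray()
        for k, tok in enumerate(tokens[g:g + 8]):
            if tok[0] == "match":
                flags |= 1 << k
                body += pack_match(tok[1], tok[2])
            else:
                body.append(tok[1])
        out.append(flags)
        out += body
    return bytes(out)


def decode(stream: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(stream):
        flags = stream[i]
        i += 1
        for k in range(8):
            if i >= len(stream):   # the last group may hold fewer than 8 tokens
                break
            if (flags >> k) & 1:
                offset, length = unpack_match(stream[i:i + 3])
                i += 3
                for _ in range(length):   # byte-by-byte copy allows overlap
                    out.append(out[-offset])
            else:
                out.append(stream[i])
                i += 1
    return bytes(out)
```

    Keeping every field byte aligned means the decoder never has to shift bits across byte boundaries except within one flag byte, which is consistent with the very low decompression times in the table.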
    

    .3894 xdelta

    xdelta 3.0u is a free, open source command line file compressor by Joshua McDonald, Oct. 12, 2008. It uses LZ77. The program is a delta coder: it outputs the compressed difference between two files, and can then reconstruct the second file when given the first file uncompressed. The first file may be omitted, in which case it simply compresses; this is how the test was done. -9 specifies maximum compression.

    .3972 mtari

    mtari 0.2 is a free, open source (GPL v3) file compressor by David Werecat, Dec. 10, 2013. It is a multi-threaded bitwise order 17 context model with arithmetic coding. To test, it was compiled with MinGW gcc 4.8.0 with options -O2 -fopenmp.

    .4092 srank

    srank 1.1 is a free, open source file compressor by P. M. Fenwick, originally written Sept. 5, 1996 and last updated Apr. 10, 1997. It uses symbol ranking, like MTF (move to front) in BWT, but in order 3 contexts without a BWT transform. When a symbol is encountered it is encoded with 1, 3, or 4 bits according to its position in a queue of length 3, then moved to the front. Long runs of first place symbols are run length encoded using 12 bits to encode the length of the length of the run. A miss is coded using pseudo-MTF in an order-0 context using 7 bits for the first 32 symbols and 12 bits for the rest. It is pseudo-MTF because after a symbol is found it is swapped with another symbol about half way to the front, with some dithering. The algorithm is designed for speed rather than good compression.
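    The core of symbol ranking, a short move-to-front queue per order 3 context, can be sketched as follows. This is a simplified sketch, not Fenwick's code: contexts are the raw preceding 3 bytes used as dictionary keys (not srank's limited table of contexts), the function returns hypothetical ("hit", rank) and ("miss", byte) events instead of bits, and the run length coding of first place symbols and the order-0 pseudo-MTF for misses are omitted.

```python
from collections import defaultdict

ORDER = 3       # order 3 contexts, as described
QUEUE_LEN = 3   # queue of length 3, as described


def rank_symbols(data: bytes) -> list:
    """Return ('hit', rank) or ('miss', byte) events for symbol ranking."""
    queues = defaultdict(list)  # context bytes -> up to 3 recent symbols, best first
    events = []
    for i, sym in enumerate(data):
        q = queues[data[max(0, i - ORDER):i]]
        if sym in q:
            rank = q.index(sym)         # 0, 1 or 2: coded with 1, 3 or 4 bits
            events.append(("hit", rank))
            q.pop(rank)
        else:
            events.append(("miss", sym))  # srank codes this with order-0 pseudo-MTF
            if len(q) == QUEUE_LEN:
                q.pop()                 # drop the lowest ranked symbol
        q.insert(0, sym)                # move (or insert) to the front
    return events
```

    In highly repetitive text most events are rank 0 hits, which is why srank can afford to run length code long runs of them.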

    The -C8 option selects the maximum number of contexts, 2^18. For this test, the C source code was compiled with MinGW 3.4.5:

      gcc -O2 -march=pentium4 -fomit-frame-pointer -s srank.c -o srank.exe
    

    .4106 QuickLZ

    QuickLZ v0.1 is an open source (GPL) compression library designed for high speed by Lasse Mikkel Reinhold, Sept. 24, 2006. Tests were performed with demo.exe. Speed is I/O bound. Times shown are process times, but wall times can be 2-4 times greater. On enwik9 compression, the program reports "file too big".

    Version 0.9 (Oct. 22, 2006) is a faster version (quick.exe) which handles large (64 bit) files.

    Version 1.20 (Mar. 15, 2007) is an archiver rather than a file compressor.

    Version 1.30 beta (Apr. 16, 2007) has 4 modes (0-3) with 4 separate executables. Only version 3 (quick3.exe, max compression) was tested.

    Version 1.30 (Aug. 14, 2007) modes 0, 1, and 2 are compatible with version 1.20, but mode 3 (best compression) is new.

    Version 1.40 (Nov. 13, 2007) is an experimental version designed for better speed. It has only one mode.

    Version           enwik8      enwik9      prog size     Total       Comp Decomp Mem Alg
    -------         ----------  -----------   ----------  -----------    ---- ----- --- ----
    QuickLZ 0.1     57,331,969  (fails)         45,361 x                  19    21  154 LZ77
    QuickLZ 0.9     56,900,177  507,806,141     45,086 x  507,851,227     11    11   10 LZ77
    QuickLZ 1.20    57,147,067  510,018,447     43,501 x  510,061,948     17    12    2 LZ77
    quick3 1.30b    46,378,438  410,633,262     44,202 x  410,677,464     48    12    3 LZ77
    QuickLZ 1.30 -3 46,445,704  411,493,051     47,304 x  411,540,355     49    12    2 LZ77
                 -2 51,941,357                                            23    11
                 -1 57,153,015                                            12    11
                 -0 52,803,919                                            20    16
    quickLZ 1.40    47,728,849  417,653,684     43,922 x  417,697,606     28    13   13 LZ77
    

    .4164 lzf

    lzf v1.00 (discussion) is a free, experimental file compressor by Ilya Muravyov, Oct. 29, 2013. It uses byte aligned LZ77 with an 8 KB window. Commands c and cx give faster or better compression, respectively.

    lzf 1.01 is a performance optimization with no change in compression.

    Version           enwik8      enwik9      prog size     Total       Comp Decomp Mem Alg  Note
    -------         ----------  -----------   ----------  -----------    ---- ----- --- ---- ----
    lzf 1.00  c     48,947,532  440,862,551     47,737 x  440,910,288     39    12   18 LZ77 26
              c     48,947,532  440,862,551     47,737 x  440,910,288      8            LZ77 60
              cx    46,318,130  416,377,741     47,737 x  416,425,478     53    11   18 LZ77 26
              cx    46,318,130  416,377,741     47,737 x  416,425,478     14   2.3      LZ77 60
    lzf 1.01  c     48,947,532  440,862,551     47,728 x  440,910,279     39    12   18 LZ77 26
              c     48,947,532  440,862,551     47,728 x  440,910,279      8            LZ77 60
              cx    46,318,130  416,377,741     47,728 x  416,425,469     49    11   18 LZ77 26
              cx    46,318,130  416,377,741     47,728 x  416,425,469     12   2.3      LZ77 60
    

    .4165 stz

    stz 0.7.2 is a free, experimental file compressor by Bruno Wyttenbach, Feb. 15, 2011. It uses LZ77. It has 4 compression modes as shown in the table below. Times are process times. Real times are closer to 40-45 seconds. Memory is 3.3 MB for all compression modes and the same for decompression. Most of the memory is for I/O buffers (2 MB each). The actual algorithm uses 48 KB. Modes -c and -c3 compress to the same size but the archives differ by 1 byte in the header. stz.exe zip size is 40,425.

    stz 0.8, Mar. 4, 2011, improves compression and adds two new experimental modes. Compression and decompression process times in ns/byte are given below for both enwik8 and enwik9. Wall times are slower due to disk I/O. Modes -c, -c1, and -c2 select best compression speed, best uncompression speed, and best size respectively, but this appears only to hold for enwik8, probably because of disk I/O interference. Modes -c3, -c4, and -c5 produce identical archives. Additional changes are a Drag'n'drop interface, a CRC check (adds 2% to time), and more flexible command line interface. 5313_stz.zip size is 41,941.

    Version   Option                                  enwik8  C/D Time   enwik9     C/D Time   Mem Note
    -------   ------------------------------------  ---------- ------- -----------  ---------  --- ----
    stz 0.7.2 -c  (LZBW2 best compression speed)    50,575,825         447,732,354   15   13    3   26
              -c1 (LZBW3 best uncompression speed)  56,100,810         510,600,276   16   10    3   26
              -c2 (LZBW2A best compression)         47,681,682         420,391,400   16   12    3   26
              -c3 (LZBW3A experimental)             50,575,825         447,732,354   15   11    3   26
    stz 0.8   -c  (LZBW2 best compression speed)    50,143,263  11  11 444,061,128   16   13    3   26
              -c1 (LZBW3 best uncompression speed)  55,670,417  16   9 506,622,114   18   12    3   26
              -c2 (LZBW2A best compression)         47,192,312  16  11 416,524,596   14   13    3   26
              -c3 (LZBW3A)                          54,080,795  15  11 480,696,931   18   12    3   26
              -c4 (LZBW2B experimental)             54,080,795  13   9 480,696,931   20   13    3   26
              -c5 (LZBW3B experimental)             54,080,795  16  12 480,696,931   19   14    3   26
    

    .4246 compress

    compress 4.3d is the Windows version of the UNIX compress command, released Jan. 18, 1990. It uses LZW and has no compression options.

    .4253 BriefLZ

    BriefLZ 1.05 is a free, open source (C and MASM) file compressor by Joergen Ibsen, Jan. 15, 2005. It uses LZ77. It takes no options. It uses about 2 MB memory for compression and about 900 KB for decompression.

    .4382 lzrw3-a

    lzrw3-a is one of a series of public domain (open source) memory to memory compressors by Ross Williams in 1991. The programs were implemented as file compressors by Matt Mahoney on Feb. 14, 2008. The programs are as follows:

    lzrw1 (Mar. 31, 1991) is byte-aligned LZ77 with a 12 bit offset and 4 bit length field allowing lengths 3-16. Each group of 16 phrases (pointers or literals) is preceded by 2 flag bytes to distinguish pointers from literals. Matches are found using a 4K hash table without confirmation which is updated after each phrase. It uses 16K of memory plus the input and output buffers.

    lzrw1-a (June 25, 1991) is lzrw1 except that the length field represents values 3-18.

    lzrw2 (June 29, 1991) replaces the offset with a 12 bit index into a rotating table of offsets, allowing the last 4K phrases (rather than 4K bytes) to be reached. The decompresser must reconstruct the phrase table (but not the hash table). It uses 24K memory plus buffers.

    lzrw3 (June 30, 1991) replaces the 12 bit phrase index with a 12 bit index into the hash table. The decompresser must reconstruct the hash table. It uses 16K memory plus buffers.

    lzrw3-a (July 15, 1991) uses a deep hash table (8 offsets per hash) with LRU replacement. It uses 16K memory plus buffers.

    lzrw5 (July 17, 1991) uses LZW. The dictionary is implemented as a tree. It uses up to 384K memory plus buffers.

    There is an experimental lzrw4, but it was never fully implemented.

    All of the compression algorithms were originally implemented as memory to memory compression functions in C, not as complete programs. I wrote a driver program which divides the input into 1 MB blocks (except lzrw5), compresses them independently by calling the provided functions, and writes each block's compressed size as a 4 byte number followed by the compressed data. Compression could be improved by using larger blocks, at the cost of more memory. For lzrw5 the block size is 64K because the program is not guaranteed to work correctly for larger blocks. It did work on this benchmark for a 192K block size, but not for 256K. The distribution linked above uses a 64K block size.

    Compressor     enwik8      enwik9           prog     Total       Comp  Deco  Mem Alg
    -------      ----------  -----------      -------  -----------   ----  ----  --- ---
    lzrw1        59,692,493  564,053,011      3,142 s  564,056,153     24    17    2 LZ77
    lzrw1-a      59,471,657  560,457,545      4,328 x  560,461,873     23    15    2 LZ77
    lzrw2        55,360,907  511,142,568      4,420 x  511,146,988     22    16    2 LZ77
    lzrw3        52,616,827  483,918,830      4,622 x  483,923,452     21    17    2 LZ77
    lzrw3-a      48,009,194  438,253,704      4,750 x  438,258,454     38    17    2 LZ77
    lzrw5 (64K)  59,375,192  570,387,858      4,544 x  570,392,402    146    14    1 LZW
    lzrw5 (192K) 50,721,610  479,044,732                              174    14    1 LZW
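    The lzrw1 format described above (groups of 16 phrases behind 2 flag bytes, 2 byte pointers with a 12 bit offset and a 4 bit length for lengths 3-16, and a 4K hash table updated at every phrase) can be sketched as follows. This is an illustration, not Ross Williams' code: the hash function, the flag bit order and the byte layout of the pointer are assumptions, and the 1 MB block driver is omitted. Because the table stores only positions, every candidate is confirmed by comparing bytes.

```python
HASH_SIZE = 4096  # 4K hash table, as described


def _hash(b0: int, b1: int, b2: int) -> int:
    # Hypothetical hash of the next 3 bytes; not Williams' exact formula.
    return ((((b0 << 8) ^ (b1 << 4) ^ b2) * 40543) >> 4) & (HASH_SIZE - 1)


def lzrw1_compress(src: bytes) -> bytes:
    table = [0] * HASH_SIZE  # one position per hash; no record of its bytes
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        flag_pos = len(out)
        out += b"\0\0"       # placeholder for 16 flag bits (1 = pointer)
        flags = 0
        for k in range(16):  # up to 16 phrases per group
            if i >= n:
                break
            length, offset = 0, 0
            if i + 3 <= n:
                h = _hash(src[i], src[i + 1], src[i + 2])
                cand, table[h] = table[h], i  # read old entry, then update it
                offset = i - cand
                if 0 < offset < 4096:         # offset must fit in 12 bits
                    # Compare bytes, since the table entry may be a collision.
                    while (length < 16 and i + length < n
                           and src[cand + length] == src[i + length]):
                        length += 1
            if length >= 3:
                flags |= 1 << k
                word = (offset << 4) | (length - 3)  # 12 bit offset, 4 bit length
                out += word.to_bytes(2, "big")
                i += length
            else:
                out.append(src[i])
                i += 1
        out[flag_pos:flag_pos + 2] = flags.to_bytes(2, "little")
    return bytes(out)


def lzrw1_decompress(comp: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(comp)
    while i < n:
        flags = int.from_bytes(comp[i:i + 2], "little")
        i += 2
        for k in range(16):
            if i >= n:
                break
            if (flags >> k) & 1:
                word = int.from_bytes(comp[i:i + 2], "big")
                i += 2
                offset, length = word >> 4, (word & 15) + 3
                for _ in range(length):   # byte-by-byte copy allows overlap
                    out.append(out[-offset])
            else:
                out.append(comp[i])
                i += 1
    return bytes(out)
```

    With a single candidate per hash the compressor does almost no searching, which keeps it fast; lzrw3-a trades some of that speed for better matches by keeping 8 candidates per hash.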
    

    .4473 fcm1

    fcm1 is a free, open source file compressor by Ilia Muraviev, May 23, 2008. It mixes order 0 and order 1 models and uses bitwise arithmetic coding as in fpaq0 and paq. The bit predictions are combined by weighted averaging, with the order 1 model weighted 15/16 unless the model is in its initial state, in which case the order 0 model prediction is used. Each context is mapped to 2 16-bit counters in initial state 1/2. One counter is updated by 1/8 of the prediction error and the other by 1/32. The model prediction is the average of these two values. The compressed file has a 4 byte header containing the file size.

    Compressor     enwik8      enwik9           prog     Total       Comp  Deco  Mem Alg
    -------      ----------  -----------      -------  -----------   ----  ----  --- ---
    fcm1         45,402,225  447,305,681      1,116 s  447,306,797    228   261    1 CM1
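    The fcm1 predictor described above can be sketched as follows. This is a hypothetical reconstruction, not Muraviev's source: the context layout (the partial byte with a leading 1 bit, combined with the previous byte for order 1), the use of a "seen" flag to detect the initial state, and the clamping of the final probability are assumptions. The arithmetic coder and the 4 byte size header are omitted.

```python
ONE = 1 << 16  # 16-bit fixed point: ONE represents probability 1.0


class TwoRate:
    """A context's pair of counters, adapting at rates 1/8 and 1/32."""
    __slots__ = ("fast", "slow", "seen")

    def __init__(self):
        self.fast = ONE // 2   # initial state 1/2
        self.slow = ONE // 2
        self.seen = False      # assumption: "initial state" = never updated

    def p(self) -> int:
        return (self.fast + self.slow) >> 1   # average of the two counters

    def update(self, bit: int) -> None:
        target = ONE - 1 if bit else 0
        self.fast += (target - self.fast) >> 3   # 1/8 of the error
        self.slow += (target - self.slow) >> 5   # 1/32 of the error
        self.seen = True


class Fcm1Like:
    """Order 0 + order 1 bit predictor, order 1 weighted 15/16 once it is used."""

    def __init__(self):
        self.o0 = [TwoRate() for _ in range(256)]       # partial byte
        self.o1 = [TwoRate() for _ in range(1 << 16)]   # previous byte + partial byte
        self.c0 = 1     # bits of the current byte so far, behind a leading 1
        self.prev = 0   # previous whole byte

    def predict(self) -> int:
        """Return P(next bit = 1) scaled to 16 bits."""
        a = self.o0[self.c0].p()
        b = self.o1[(self.prev << 8) | self.c0]
        p = (a + 15 * b.p()) >> 4 if b.seen else a
        return min(max(p, 1), ONE - 1)   # assumption: keep p strictly inside (0, 1)

    def update(self, bit: int) -> None:
        self.o0[self.c0].update(bit)
        self.o1[(self.prev << 8) | self.c0].update(bit)
        self.c0 = (self.c0 << 1) | bit
        if self.c0 >= 256:   # 8 bits done: the byte becomes the order 1 context
            self.prev = self.c0 & 255
            self.c0 = 1
```

    The fast counter reacts quickly to changes while the slow one averages over a longer history; averaging the two is a cheap compromise between the two adaptation rates.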
    

    .4581 runcoder1

    runcoder1 is a free, open source (GPL) file compressor by Andrew Polar, Mar. 30, 2009. It uses an order 1 model with arithmetic coding. It takes no options. The program is available as source code (C++) only. For this test it was compiled with MinGW g++ 3.4.2 with options -O2 -march=pentiumpro -fomit-frame-pointer -s for 32-bit Vista as noted in note 26.

    .4598 data-shrinker

    data-shrinker is a free, open source file compressor by Siyuan Fu, Mar. 23, 2012. It uses an LZ77 format similar to LZ4 for high speed. It takes no options. No executable was provided. To test, the source code was compiled with g++ 4.5.1 -O3 -s under 32 bit Windows and process times measured with output to nul:

    Compressor    Version    Opt    enwik8      enwik9          prog     Total       Comp  Deco  Mem Alg  Note
    ----------    ---------  ---  ----------  -----------     -------  -----------   ----  ----  --- ---- ----
    data-shrinker 23Mar2012       51,658,517  459,825,318     3,706 s  459,829,024     14     4    2 LZ77 26
    

    .4638 lzwc

    lzwc 0.3 is a free, open source (GPL) file compressor by David Catt, Jan. 15, 2013. It uses LZW with dictionary entries coded using 2 bytes. There is also a version 0.1 which produces identical compressed files but is not as fast. The program takes no options.

    lzwc v0.7 fixes a bug in decompression of binary files, but does not change compressed size or speed. lzwc_bitwise is a version that uses less than 16 bits to encode symbols when the dictionary is small.

    Compressor           enwik8      enwik9          prog     Total       Comp  Deco  Mem Alg  Note
    ----------         ----------  -----------     -------  -----------   ----  ----  --- ---- ----
    lzwc 0.1           46,647,318                  1,955 x                 280   290   70 LZW   26
    lzwc 0.3           46,647,318  463,892,454     3,017 x  463,895,471     85    90   71 LZW   26
    lzwc_bitwise 0.7   46,639,414  463,884,550     4,183 x  463,888,733    123   134   71 LZW   26 
    

    .4798 exdupe

    exdupe v0.3.3 beta is a deduplicating archiver supporting full and incremental backups, under development by Lasse Reinhold, Oct. 20, 2011. When the beta phase ends, it will be a commercial program with source code available under restricted and non-permissive terms. Only 64 bit systems are supported. Partial source code is available for this version, although not for the compression and decompression code, which is derived from QuickLZ (LZ77). It was tested in Linux. A later version, 0.3.6 beta, was available only for 64 bit Windows on Oct. 30, 2012, and was not tested.

    Compressor     Opt    enwik8      enwik9          prog     Total       Comp  Deco   Mem Alg  Note
    ----------     ---  ----------  -----------     -------  -----------   ----  ----  ---- ---- ----
    exdupe 0.3.3        53,717,422  478,788,378  1,092,986 x 479,881,364     27     5  1000 LZ77  48
    

    .4884 lzv

    lzv 0.1.0 is a free, experimental file compressor for Windows by Valéry Croizier, Jan. 1, 2014. It takes no options.

    Compressor     Opt    enwik8      enwik9          prog     Total       Comp  Deco   Mem Alg  Note
    ----------     ---  ----------  -----------     -------  -----------   ----  ----  ---- ---- ----
    lzv 0.1.0           54,950,847  488,436,027     10,385 x 488,446,412      6     5     3 LZ77  62
    lzv 0.1.0           54,950,847  488,436,027     10,385 x 488,446,412     15     6     3 LZ77  26
    lzv 0.1.0           54,950,847  488,436,027     10,385 x 488,446,412      4   2.6     3 LZ77  48
    

    .4930 FastLZ

    FastLZ is a free, open source compression library and file compressor by Ariya Hidayat, announced June 12, 2007 with no date or version number, and downloaded and tested on June 16, 2007. It uses byte-aligned LZ77. The software was released as source code only (in C). For this test it was compiled with MinGW gcc 3.4.5 as suggested by README.TXT (plus -s to strip debugging info):

      gcc -march=pentium -O3 -fomit-frame-pointer -mtune=pentium 6pack.c fastlz.c -o 6pack -s
      gcc -march=pentium -O3 -fomit-frame-pointer -mtune=pentium 6unpack.c fastlz.c -o 6unpack -s
    
    6pack and 6unpack are the compressor and decompresser, respectively. They take no options. The compressed file name is stored without a path in the archive.

    .4945 sharc

    sharc 0.9.6 beta is a free, open source (GPL v3) file compressor by Guillaume Voirin, Aug. 1, 2013. It uses dictionary coding. Option -c0 uses 1 pass and -c1 uses 2 passes for better compression.

    sharc 0.9.10 was released Dec. 12, 2013.

    sharc 0.9.11b, Dec. 14, 2013 has compression levels -c1 and -c2. -c0 selects no compression. -c1 selects dictionary encoding. -c2 selects LZP preprocessing followed by dictionary coding. The program uses the Density 0.9.12b compression library which is now a separate component.

    Compressor    Opt        enwik8      enwik9       prog     Total       Comp  Deco  Mem Alg  Note
    -------       ----     ----------  -----------  -------  -----------   ----  ----  --- ---  ----
    sharc 0.9.6   -c0      63,290,900  625,090,400  25,822 s 625,116,222     14    11   14 Dict  26
    sharc 0.9.6   -c1      58,612,834  554,587,996  25,822 s 554,613,818     19    15   14 Dict  26
    sharc 0.9.10  -c0      61,798,570  610,691,896  11,765 s 610,703,661     13    11    4 Dict  26
    sharc 0.9.10  -c1      57,031,766  538,757,716  11,765 s 538,769,481     14    15    5 Dict  26
    sharc 0.9.11b -c1      61,611,730  608,740,104  81,001 s 608,821,105     12     9    5 Dict  26
    sharc 0.9.11b -c2      53,175,042  494,421,068  81,001 s 494,502,069     15    14    6 LZP   26
    

    .4975 flzp

    flzp v1 is a free, open source file compressor by Matt Mahoney, June 18, 2008. It uses byte-oriented LZP. The input is divided into blocks that end at 64 KB or just before fewer than 33 byte values would remain unused, whichever comes first. The unused byte values then code an end of block symbol plus match lengths from 2 up to the number of unused bytes - 1. A match length is decoded by finding the most recent context hash match in a 4 MB rotating buffer and outputting the bytes that follow. It uses a 1M hash table and an order 4 context hash. Each block begins with a 32 byte bitmap to distinguish symbols for matches from literals. flzp can be used as a preprocessor to a low order compressor like fpaq0 or ppmd -o3 to improve compression and speed.
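    The core LZP idea can be sketched as below. This is a simplified illustration, not flzp's code: it emits abstract tokens instead of flzp's byte codes and end of block symbol, keeps the whole input in memory instead of a 4 MB rotating buffer, and uses a hypothetical context hash and a hypothetical maximum match length of 32. Because the match position is predicted from the order 4 context, no offset is stored; the decoder finds the same position by repeating the same table lookups and updates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One LZP token: a literal byte or a match length. No match offset is stored.
struct LzpToken {
  bool is_match;
  unsigned char literal;  // used when !is_match
  int length;             // used when is_match
};

constexpr int kHashBits = 20;  // 1M-entry hash table, as in flzp
constexpr int kMinMatch = 2;   // shortest match that is coded
constexpr int kMaxMatch = 32;  // hypothetical cap chosen for this sketch

// Hypothetical hash of the order 4 context (the 4 bytes before position i).
uint32_t ContextHash(const std::string& s, size_t i) {
  uint32_t h = 0;
  for (size_t k = i - 4; k < i; ++k) h = h * 256 + static_cast<unsigned char>(s[k]);
  return (h * 2654435761u) >> (32 - kHashBits);
}

std::vector<LzpToken> LzpEncode(const std::string& in) {
  std::vector<uint32_t> table(size_t(1) << kHashBits, 0);  // 0 = no prediction yet
  std::vector<LzpToken> out;
  size_t i = 0;
  while (i < in.size()) {
    if (i >= 4) {
      uint32_t h = ContextHash(in, i);
      size_t p = table[h];  // position that followed this context last time
      table[h] = static_cast<uint32_t>(i);
      int len = 0;
      while (p > 0 && len < kMaxMatch && i + len < in.size() &&
             in[p + len] == in[i + len])
        ++len;
      if (len >= kMinMatch) {  // predicted bytes are correct: code only the length
        out.push_back({true, 0, len});
        i += len;
        continue;
      }
    }
    out.push_back({false, static_cast<unsigned char>(in[i]), 0});
    ++i;
  }
  return out;
}

std::string LzpDecode(const std::vector<LzpToken>& tokens) {
  std::vector<uint32_t> table(size_t(1) << kHashBits, 0);
  std::string out;
  for (const LzpToken& t : tokens) {
    size_t i = out.size();
    size_t p = 0;
    if (i >= 4) {  // same lookup and update as the encoder
      uint32_t h = ContextHash(out, i);
      p = table[h];
      table[h] = static_cast<uint32_t>(i);
    }
    if (t.is_match) {
      for (int k = 0; k < t.length; ++k) out.push_back(out[p + k]);  // may overlap
    } else {
      out.push_back(static_cast<char>(t.literal));
    }
  }
  return out;
}
```

    For any input, LzpDecode(LzpEncode(s)) reproduces s; repeated phrases turn into match tokens once their 4 byte context has been seen before.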

    .5157 alba

    alba 0.1 is a free, open source, experimental file compressor by xezz, Feb. 4, 2014, updated Feb. 5, 2014 to fix a bug in the "C" option. It uses byte pair encoding. The option c32768 selects the maximum block size. The default is 4096. It has an "optimal" compression mode "C". It was tested in Linux by compiling with gcc 4.8.1 -O3.

    alba 0.2, Feb. 6, 2014, adds extreme (e) mode. Modes c and C are unchanged.

    alba 0.5.1, Feb. 18, 2014, adds dynamic block sizing (cd).

    Compressor    Opt        enwik8      enwik9       prog     Total       Comp  Deco  Mem Alg  Note
    -------       ----     ----------  -----------  -------  -----------   ----  ----  --- ---  ----
    alba 0.1      c        53,643,211  526,932,392   2,950 s 526,935,342    219    10    1 BPE   48
                  c32768   57,419,643  548,461,196   2,950 s 548,464,146    171     8    1 BPE   48
                  C        53,618,232  526,577,702   2,880 s 526,580,582    227    14    1 BPE   48
                  C32768   57,395,415  547,792,821   2,880 s 547,795,701    179    12    1 BPE   48
    alba 0.2      e        53,611,841  526,860,426   3,247 s 526,863,673    819   603    1 BPE   48
    alba 0.5.1    cd       52,728,620  515,760,096   4,870 s 515,764,966    239    10    4 BPE   48
    

    .5277 snappy

    snappy 1.0.1 is a free, open source (Apache) compression library for Linux from Google, Mar. 25, 2011. It uses byte aligned LZ77, and is intended for high speed rather than good compression. Google uses snappy internally to compress its data structures for its search engine.

    The compressed data consists of tag bytes whose low 2 bits indicate a literal or a match. A "1 byte", "2 byte", or "4 byte" match is followed by that many bytes of offset:

      00 = literal
      01 = 1 byte match
      10 = 2 byte match
      11 = 4 byte match (not used)
    

    A literal of length 1 to 60 is encoded by storing the length - 1 in the upper 6 bits. Longer literals are coded by storing 60..63 in the upper 6 bits to indicate that the length is encoded in the next 1 to 4 bytes in little-endian (LSB first) format. This is followed by the uncompressed literals.

    Matches of length 4 to 11 with offsets of 1 to 2047 are encoded using a 1 byte match. The match length - 4 is stored in the middle 3 bits of the tag byte. The most significant 3 bits of the offset are stored in the most significant 3 bits of the tag byte. The lower 8 bits of the offset are stored in the next byte. A match may overlap the area to be copied. Thus, the string "abababa" could be written using a literal "ab" and a match with an offset of 2 and length of 5. This would be encoded as:

      000001 00  (literal of length 2)
      01100001   (literal 'a')
      01100010   (literal 'b')
      000 001 01 (high bits of offset, match of length 5)
      00000010   (low 8 bits of offset)
    

    Matches of length 1 to 64 with offsets of 1 to 65535 are encoded using a 2 byte match. The length - 1 is encoded in the high 6 bits of the tag byte. The offset is stored in the next 2 bytes, least significant byte first. Longer matches are encoded as a series of 64 byte matches with a final shorter match of 4 to 63. If the final part of the match is shorter than 4 bytes, then it is encoded as a 60 byte match plus a 4 to 7 byte match.

    A 4 byte match encodes the length as in a 2 byte match but stores the offset in the next 4 bytes, allowing offsets up to 2^32 - 1. The decompresser will decode them, but the compressor does not produce them because the input is compressed in 32K blocks such that a match does not span a block boundary.

    The entire sequence of matches and literals is preceded by the uncompressed length, up to 2^32 - 1, written in base 128, LSB first, using 1 to 5 digits stored in the low 7 bits of each byte. The high bit is 1 to indicate that more digits follow.
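    A decoder for the format described above can be sketched as follows. It is written from this description, not taken from the Snappy library (whose real entry point is snappy::Uncompress), and the function name is hypothetical. It returns std::nullopt for malformed input rather than reproducing the library's exact error handling. Matches are copied one byte at a time so that overlapping copies work as in the "abababa" example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Decodes one buffer in the Snappy format described above.
std::optional<std::string> SnappyDecode(const std::vector<uint8_t>& in) {
  size_t pos = 0;
  // Preamble: uncompressed length in base 128, LSB first; high bit set = more digits.
  uint64_t length = 0;
  for (int shift = 0;; shift += 7) {
    if (pos >= in.size() || shift > 28) return std::nullopt;  // at most 5 digits
    uint8_t b = in[pos++];
    length |= uint64_t(b & 0x7f) << shift;
    if (!(b & 0x80)) break;
  }
  std::string out;
  while (pos < in.size()) {
    uint8_t tag = in[pos++];
    uint32_t upper = tag >> 2;  // upper 6 bits of the tag
    if ((tag & 3) == 0) {       // 00: literal
      uint64_t len = upper + 1;
      if (upper >= 60) {        // 60..63: length - 1 follows in 1..4 bytes, LSB first
        size_t nbytes = upper - 59;
        if (in.size() - pos < nbytes) return std::nullopt;
        uint64_t n = 0;
        for (size_t k = 0; k < nbytes; ++k) n |= uint64_t(in[pos++]) << (8 * k);
        len = n + 1;
      }
      if (in.size() - pos < len) return std::nullopt;
      out.append(reinterpret_cast<const char*>(in.data() + pos), len);
      pos += len;
      continue;
    }
    uint64_t len, offset = 0;
    if ((tag & 3) == 1) {  // 01: length 4..11, 11-bit offset (3 bits in tag + 1 byte)
      if (pos >= in.size()) return std::nullopt;
      len = (upper & 7) + 4;
      offset = (uint64_t(tag >> 5) << 8) | in[pos++];
    } else {               // 10 or 11: length 1..64, 2 or 4 offset bytes, LSB first
      size_t nbytes = (tag & 3) == 2 ? 2 : 4;
      if (in.size() - pos < nbytes) return std::nullopt;
      len = upper + 1;
      for (size_t k = 0; k < nbytes; ++k) offset |= uint64_t(in[pos++]) << (8 * k);
    }
    if (offset == 0 || offset > out.size()) return std::nullopt;
    size_t from = out.size() - offset;
    for (uint64_t k = 0; k < len; ++k) out.push_back(out[from + k]);  // overlap allowed
  }
  if (out.size() != length) return std::nullopt;
  return out;
}
```

    Decoding the bytes 07 04 61 62 05 02 (the preamble 7 followed by the "abababa" example) yields "abababa".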

    Compression searches for matches by comparing a hash of the 4 current bytes with previous occurrences of the same hash earlier in the 32K block. The hash function interprets the 4 bytes as a 32 bit value, LSB first, multiplies by 0x1e35a7bd, and shifts out the low bits. The hash table size is the smallest power of 2 in the range 256 to 16384 that is at least as large as the input string. As an optimization for hard to compress data, after 32 failures to find a match, the compressor checks only every second location in the input for the next 32 tests, then every third for the next 32 tests, and so on. When it finds a match, it goes back to testing every location.
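    The match search can be sketched as follows, using the multiplier, table size rule, and skipping schedule given above. The function names and the simplified loop, which stops at the first verified 4 byte match instead of extending and emitting it, are hypothetical and not Snappy's own code. Table entries hold 16-bit positions, which assumes a block of at most 64K bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Smallest power of 2 in [256, 16384] that is at least n.
size_t HashTableSize(size_t n) {
  size_t size = 256;
  while (size < 16384 && size < n) size <<= 1;
  return size;
}

// Reads 4 bytes as a 32-bit value, LSB first, multiplies, and keeps the top bits.
uint32_t Hash4(const unsigned char* p, int shift) {
  uint32_t v = uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
               uint32_t(p[3]) << 24;
  return (v * 0x1e35a7bdu) >> shift;
}

// Distance to the next tested position after `misses` failed lookups:
// 1 for the first 32 failures, 2 for the next 32, then 3, and so on.
size_t SkipStep(uint32_t misses) { return 1 + misses / 32; }

// Finds the first position whose 4 bytes equal those at the earlier position
// stored under the same hash. Returns false if none is found.
bool FindFirstMatch(const unsigned char* block, size_t n, size_t* pos, size_t* candidate) {
  const size_t size = HashTableSize(n);
  int bits = 0;
  while ((size_t(1) << bits) < size) ++bits;
  const int shift = 32 - bits;           // keeps log2(size) high bits of the product
  std::vector<uint16_t> table(size, 0);  // last position seen for each hash
  uint32_t misses = 0;
  for (size_t i = 0; i + 4 <= n; i += SkipStep(misses++)) {
    uint32_t h = Hash4(block + i, shift);
    size_t cand = table[h];
    table[h] = static_cast<uint16_t>(i);
    if (cand < i && std::memcmp(block + cand, block + i, 4) == 0) {  // verify the hit
      *pos = i;
      *candidate = cand;
      return true;
    }
  }
  return false;
}
```

    Because the table keeps no collision information, every hash hit is confirmed by comparing the actual bytes before it is treated as a match.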

    As another optimization for the x86-64 architecture, copies of 16 bytes or less are done using two 64-bit (8 byte) assignments rather than memcpy(). To support this, if 15 or fewer bytes remain after a match, they are encoded as literals with no further search.

    Snappy compresses from memory to memory rather than from file to file, so it was necessary to write a small test program (below), which was not included in the compressed size. The program loads the input into a string, compresses or decompresses it to a new string, and writes it to output. It gives the best possible compression but is not optimal for speed or memory. With this test, speed is 25 ns/byte for compression and 12 ns/byte for decompression (under 64 bit Linux). In a separate test (not shown), compressing in 32K chunks takes 9 ns/byte with very slightly larger size due to storing the size in each chunk. Decompression was not tested in this mode, but should be twice as fast. Memory usage for the test program is 2 GB to store the input and output, but actual memory usage by the library is at most 32K for the hash table.

    The test program was compiled with g++ 4.4.5 -O3 in 64 bit Ubuntu Linux and linked to Snappy after running "./configure; make". Use -DMODE=Compress or -DMODE=Uncompress to create a compressor or decompresser respectively.

    #define NDEBUG 1  // turn off debugging checks
    #include "snappy.h"
    #include <stdio.h>
    #include <string>
    int main() {
      std::string input, output;
      int c;
      while ((c=getchar())!=EOF) input+=char(c);  // read from stdin
      snappy::MODE(input.c_str(), input.size(), &output);  // MODE = Compress or Uncompress
      fwrite(output.c_str(), 1, output.size(), stdout);  // write to stdout
      return 0;
    }
    

    .5322 bpe

    bpe is a free, experimental file compressor by Philip Gage. It was published as source code only in "The C Users Journal" in Feb. 1994. It uses byte pair encoding. The input is divided into blocks, which are iteratively compressed by finding the most frequent byte pair and replacing it with a byte value that never occurs in the block, until no unused byte values remain or no pair occurs more than a minimum number of times.

    For testing, I compiled with gcc 4.4.0 -s -O2 -march=pentiumpro -fomit-frame-pointer. I used the recommended compression options "5000 4096 200 3" and did not try to find a better combination. The options say to use a maximum block size of 5000, a hash table size of 4096 (it is recommended to be 5% to 20% smaller than the block size), a maximum of 200 different byte values per block, and not to replace pairs that occur fewer than 3 times.

    .5326 kwc

    kwc (discussion) is a free GUI file compressor by sportman, Jan. 18, 2010. The input is divided into strings of 6 bytes each, and each string is replaced with a dictionary code. The dictionary size is not bounded, so memory usage increases with the size and randomness of the input. enwik9 uses 668 MB for compression and 333 MB for decompression.

    .5427 bpe2

    bpe2 v1 is a free, experimental, open source (public domain) file compressor by Will, Jan. 15, 2010. It uses byte pair encoding. It divides the input into blocks of 8192 bytes which are compressed independently. A block is compressed by finding the most frequent byte pair and a byte value that never occurs in the block, and then substituting that byte value for each occurrence of the pair. The byte pair and its replacement are appended to the block as a 3 byte header. The process is repeated until either there are no unused byte values left, or there is no pair that occurs at least 4 times. The block is output with an additional 2 byte header to indicate its size.
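    One block of this scheme can be sketched as follows. It is an illustration, not bpe2's source: the header serialization and the 2 byte size field are omitted, the caller is assumed to split the input into 8192 byte blocks, and the names are hypothetical. min_count = 4 follows the threshold above. Decoding works because each replacement byte never occurred in the block before its rule was applied, so expanding the rules in reverse order restores the original.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct BpeRule { uint8_t first, second, replacement; };  // the 3 byte header

struct BpeBlock {
  std::vector<uint8_t> data;   // block after substitution
  std::vector<BpeRule> rules;  // in the order they were applied
};

// Repeatedly replaces the most frequent pair with an unused byte value.
BpeBlock BpeCompressBlock(std::vector<uint8_t> data, int min_count = 4) {
  BpeBlock block;
  for (;;) {
    std::array<bool, 256> used{};
    for (uint8_t c : data) used[c] = true;
    int unused = -1;
    for (int c = 0; c < 256 && unused < 0; ++c)
      if (!used[c]) unused = c;
    if (unused < 0) break;  // no unused byte values left

    std::vector<int> count(65536, 0);  // pair counts (overlapping pairs are counted)
    int best = 0, best_count = 0;
    for (size_t i = 0; i + 1 < data.size(); ++i) {
      int pair = data[i] << 8 | data[i + 1];
      if (++count[pair] > best_count) { best_count = count[pair]; best = pair; }
    }
    if (best_count < min_count) break;  // no pair is frequent enough

    BpeRule rule{static_cast<uint8_t>(best >> 8), static_cast<uint8_t>(best & 0xff),
                 static_cast<uint8_t>(unused)};
    std::vector<uint8_t> next;
    for (size_t i = 0; i < data.size(); ++i) {
      if (i + 1 < data.size() && data[i] == rule.first && data[i + 1] == rule.second) {
        next.push_back(rule.replacement);
        ++i;  // skip the second byte of the pair
      } else {
        next.push_back(data[i]);
      }
    }
    data = std::move(next);
    block.rules.push_back(rule);
  }
  block.data = std::move(data);
  return block;
}

// Undoes the substitutions, last rule first.
std::vector<uint8_t> BpeExpandBlock(const BpeBlock& block) {
  std::vector<uint8_t> data = block.data;
  for (auto it = block.rules.rbegin(); it != block.rules.rend(); ++it) {
    std::vector<uint8_t> next;
    for (uint8_t c : data) {
      if (c == it->replacement) {
        next.push_back(it->first);
        next.push_back(it->second);
      } else {
        next.push_back(c);
      }
    }
    data = std::move(next);
  }
  return data;
}
```

    Counting all pairs and then substituting left to right is the simplest choice; it can overcount overlapping pairs such as "aaa", which only affects which pair is chosen, not correctness.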

    bpe2 v2, Jan. 15, 2010, uses a faster algorithm to find the most frequent byte pair during compression.

    bpe2 v3, Feb. 12, 2010, has some optimizations. (discussion)

    The programs were tested by compiling with g++ 4.4.0 -O2 -s -march=pentiumpro -fomit-frame-pointer under Windows Vista on a 2.0 GHz T3200.

    Compression         Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem Alg Note
    -------         ----------  -----------  -----------  -----------  ----- -----  --- --- ----
    bpe2 v1         55,390,822  545,319,505      1,621 s  545,321,126   2785   228  0.5 Dict 26
    bpe2 v2         55,389,832  545,268,425      1,635 s  545,270,060   1257   229  0.5 Dict 26
    bpe2 v3         55,289,197  542,748,980      2,979 s  542,751,959    518   132  0.5 Dict 26
    

    .5586 fpaq0f2

    fpaq is a free, experimental command line file compressor with source code (in assembler) by Nikolay Petrov, Feb. 20, 2006. It is a faster implementation of fpaq0 by Matt Mahoney (Sept. 3, 2004) maintaining archive compatibility. fpaq is an order-0 arithmetic coder which models independent, identically distributed (i.i.d.) characters, and is not intended as a general purpose compressor. Its purpose is to test the efficiency of different arithmetic coding algorithms. There are several variants.
    Compressor     enwik8      enwik9    Comp  Decomp  Author                  Date
    ----------   ----------  ----------  ----  ----    --------------          ----
    fpaq0        63,391,013  641,421,110  336   351    Matt Mahoney            Sep 03 2004
    fpaq1        63,502,003               477   489    Matt Mahoney            Jan 10 2006
    fpaq0b       63,375,460               457   437    Fabio Buffoni           Jan 10 2006
    fpaq0s       63,375,457               427   417    David A. Scott          Jan 16 2006
    fpaq         63,391,013  641,421,110  255   246    Nicolay Petrov          Feb 20 2006
    fpaq0p       61,457,810  622,237,009  131   131    Ilia Muraviev           Apr 15 2007
    fpaq02       63,501,997  644,561,596 1345  1325    David Anderson          May 27 2007
    fpaqa        61,340,408  620,681,885  262   237    Matt Mahoney            Dec 15 2007
    fpaqb        61,270,458  620,278,361  264   171    Matt Mahoney            Dec 20 2007
    fpaq0m       61,389,879  621,285,504  153   135    Ilia Muraviev           Dec 20 2007
    fpaq0mw      61,271,869  618,959,309  455   457    Eugene Shelwien         Dec 21 2007
    fpaqc        61,270,455  620,278,358  252   177    Matt Mahoney            Dec 24 2007
    fpaq0pv2     61,280,398  620,379,449  116   133    Ilia Muraviev           Dec 26 2007
    fpaq0r       61,234,684  620,169,855  129   142    Alexander Rhatushnyak   Jan 09 2008
    fpaq0rs      61,202,171  619,839,546  139   138    Alexander Rhatushnyak   Jan 09 2008
    fpaq0f       58,088,230  581,053,251  265   251    Matt Mahoney            Jan 28 2008
    fpaq0f2      56,916,872  558,645,708  222   207    Matt Mahoney            Jan 30 2008
    fpaq0pv3     61,457,810  622,237,009  103   119    Nania Francesco Antonio Apr 04 2008
    fpaq0pv4     61,457,810  622,237,009   70    79    Eugene Shelwien         Apr 06 2008
    fpaq0pv4nc   61,350,834  621,169,159   64    69    Eugene Shelwien         Apr 06 2008
    fpaq0pv4nc0  61,287,662  620,506,072   68    74    Eugene Shelwien         Apr 06 2008
    fpaq0pv5     61,457,810  622,237,009   81    87    Nania Francesco Antonio Apr 06 2008
    fpaq0pv4a    61,457,810  622,237,009   70    75    Eugene Shelwien         Apr 07 2008
    fpaq0pv4anc  61,323,986  621,169,159   64    65    Eugene Shelwien         Apr 07 2008
    fpaq0pv4anc0 61,287,662  620,506,072   66    66    Eugene Shelwien         Apr 07 2008
    fpaq0pv4b1   61,287,234  620,488,244   56    60    Eugene Shelwien         Apr 18 2008
    

    fpaq0 uses a 32-bit carryless arithmetic coder to code binary decisions and output one byte at a time. fpaq1 uses a 64 bit coder. fpaq0b uses a 32 bit coder but counts carries and outputs a bit at a time to achieve greater internal precision. fpaq0s improves on fpaq0b by using the compressed EOF to encode the uncompressed EOF, unlike the other models which code an extra bit for each byte to indicate the end. fpaq02 extends this idea to 64 bits. All programs except fpaq are distributed as C++ source code and were compiled as follows with MinGW 3.4.2 (where %1 is the program name):

    g++ -Wall %1.cpp -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o %1.exe
    

    fpaq0p by Ilia Muraviev, Apr. 15, 2007, uses an adaptive order 0 model. Instead of keeping counts of 0 and 1 bits for each context, it keeps a probability and adjusts it by 1/32 of the prediction error after each bit. This is faster because it avoids a division instruction.
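    The following sketch combines a 32-bit carryless binary arithmetic coder of the kind used in fpaq0 with an fpaq0p-style order 0 model that keeps a 12-bit probability per context and moves it 1/32 of the way toward each coded bit. It is an illustration, not the original source: it assumes the caller stores the byte count separately instead of coding fpaq0's per-byte EOF flag, and the class and function names are hypothetical. The context is the partially coded byte (1 to 255), so each byte is coded as 8 binary decisions, most significant bit first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// fpaq0p-style order 0 model. P() stays in [31, 4065], never 0 or 4096.
class Order0Model {
 public:
  uint32_t P() const { return p_[ctx_]; }  // P(next bit = 1), scaled by 4096
  void Update(int bit) {
    if (bit) p_[ctx_] += (4096 - p_[ctx_]) >> 5;  // move 1/32 of the way toward 1
    else     p_[ctx_] -= p_[ctx_] >> 5;           // or toward 0
    ctx_ = (ctx_ << 1) | static_cast<uint32_t>(bit);
    if (ctx_ >= 256) ctx_ = 1;                    // byte complete
  }
 private:
  std::vector<uint32_t> p_ = std::vector<uint32_t>(256, 2048);
  uint32_t ctx_ = 1;
};

std::vector<uint8_t> ArithEncode(const std::string& in) {
  Order0Model m;
  uint32_t x1 = 0, x2 = 0xffffffff;  // current range [x1, x2]
  std::vector<uint8_t> out;
  for (unsigned char c : in) {
    for (int i = 7; i >= 0; --i) {
      int bit = (c >> i) & 1;
      uint32_t xmid = x1 + ((x2 - x1) >> 12) * m.P();  // split in proportion to P(1)
      if (bit) x2 = xmid; else x1 = xmid + 1;
      m.Update(bit);
      while (((x1 ^ x2) & 0xff000000) == 0) {  // leading bytes agree: output one
        out.push_back(static_cast<uint8_t>(x2 >> 24));
        x1 <<= 8;
        x2 = (x2 << 8) | 255;
      }
    }
  }
  out.push_back(static_cast<uint8_t>(x1 >> 24));  // flush: first byte that differs
  return out;
}

std::string ArithDecode(const std::vector<uint8_t>& in, size_t count) {
  Order0Model m;
  uint32_t x1 = 0, x2 = 0xffffffff, x = 0;
  size_t pos = 0;
  auto next = [&]() -> uint32_t { return pos < in.size() ? in[pos++] : 255; };
  for (int i = 0; i < 4; ++i) x = (x << 8) | next();
  std::string out;
  for (size_t k = 0; k < count; ++k) {
    int c = 0;
    for (int i = 0; i < 8; ++i) {
      uint32_t xmid = x1 + ((x2 - x1) >> 12) * m.P();
      int bit = x <= xmid ? 1 : 0;
      if (bit) x2 = xmid; else x1 = xmid + 1;
      m.Update(bit);
      c = (c << 1) | bit;
      while (((x1 ^ x2) & 0xff000000) == 0) {
        x1 <<= 8;
        x2 = (x2 << 8) | 255;
        x = (x << 8) | next();
      }
    }
    out.push_back(static_cast<char>(c));
  }
  return out;
}
```

    "Carryless" means the coder never propagates a carry into bytes already written: it outputs a byte only once x1 and x2 agree on it, at the cost of a small loss of efficiency when the range straddles a byte boundary.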

    fpaqa by Matt Mahoney, Dec. 15, 2007, is the first implementation of Jarek Duda's asymmetric binary coder, described in section 3 of Optimal encoding on discrete lattice with translational invariant constrains using statistical algorithms, 2007.

    The model is based on fpaq0p (adaptive order 0), but with probabilities modeled with 16 bits resolution (instead of 12) to improve compression. The source (GPL) can be compiled with -DARITH to substitute the arithmetic coder from fpaq0 and fpaq0p for the asymmetric coder.

    An asymmetric coder has a single N-bit integer state variable x, as opposed to two variables (low and high) in an arithmetic coder, which allows a lookup table implementation. In fpaqa, N=10. A bit d (0 or 1) with probability q = P(d = 1) (0 < q < 1, a multiple of 2^-N) is coded:

      if d = 0 then x := ceil((x+1)/(1-q)) - 1
      if d = 1 then x := floor(x/q)
    
    To decode, given x and q
      d = ceil((x+1)*q) - ceil(x*q)  (1 if fract(x*q) >= 1-q, else 0)
      if d = 0 then x := x - ceil(x*q)
      if d = 1 then x := ceil(x*q)
    
    x is maintained in the range 2^N to 2^(N+1) - 1 by writing the low bits of x prior to encoding d and reading into the low bits of x after decoding. Because compression and decompression are reverse operations of each other, they must be performed in reverse order. The encoder divides the input into blocks of size B=500K bits, saves the predictions (q) in a stack, then encodes the bits in reverse order to a second stack. The block size and final state x are then written, followed by the compressed bits in the second stack in reverse order that they were coded. The decompresser runs everything in the forward direction, reading the saved x at the beginning of each block.
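    The coding rules and renormalization above can be sketched with direct integer arithmetic, closer to fpaqb than to fpaqa's lookup tables. Assumptions not taken from fpaqa or fpaqb: N = 12, q is given as an integer Q with q = Q/2^N and 0 < Q < 2^N, the written state bits are kept in a std::vector<int> used as a stack, and the whole input is one block. Before coding d, the encoder shifts out low bits of x until x lies in [k, 2k), where k = Q for d = 1 and k = 2^N - Q for d = 0; with q a multiple of 2^-N, those are exactly the states that the coding rules map back into [2^N, 2^(N+1) - 1].

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kN = 12;                      // assumed state precision
constexpr uint64_t kL = uint64_t(1) << kN;  // x stays in [kL, 2*kL - 1]

// Coding rules from the text with q = Q / 2^N, using integer arithmetic.
uint64_t AbsEncodeStep(uint64_t x, int d, uint64_t Q) {
  if (d) return (x << kN) / Q;                           // floor(x/q)
  return ((x + 1) * kL + (kL - Q) - 1) / (kL - Q) - 1;  // ceil((x+1)/(1-q)) - 1
}

// Decoding rule from the text; returns d and replaces x with the previous state.
int AbsDecodeStep(uint64_t& x, uint64_t Q) {
  uint64_t cx = (x * Q + kL - 1) >> kN;         // ceil(x*q)
  uint64_t cx1 = ((x + 1) * Q + kL - 1) >> kN;  // ceil((x+1)*q)
  int d = static_cast<int>(cx1 - cx);
  x = d ? cx : x - cx;
  return d;
}

struct AbsStream {
  std::vector<int> bits;     // low bits of x written during encoding (a stack)
  uint64_t final_state = 0;  // x after the last encoding step
};

// Encodes d[i] with P(d[i] = 1) = Q[i] / 2^N, processing the bits in reverse order.
AbsStream AbsEncode(const std::vector<int>& d, const std::vector<uint64_t>& Q) {
  AbsStream s;
  uint64_t x = kL;  // any start state in [kL, 2*kL) works
  for (size_t i = d.size(); i-- > 0;) {
    const uint64_t k = d[i] ? Q[i] : kL - Q[i];
    while (x >= 2 * k) {  // write low bits until x is in [k, 2k)
      s.bits.push_back(static_cast<int>(x & 1));
      x >>= 1;
    }
    x = AbsEncodeStep(x, d[i], Q[i]);  // now kL <= x < 2*kL
  }
  s.final_state = x;
  return s;
}

// Decodes in the forward direction, reading bits back from the top of the stack.
std::vector<int> AbsDecode(AbsStream s, const std::vector<uint64_t>& Q) {
  std::vector<int> d;
  uint64_t x = s.final_state;
  for (size_t i = 0; i < Q.size(); ++i) {
    d.push_back(AbsDecodeStep(x, Q[i]));
    while (x < kL) {  // read bits back into the low end of x
      x = (x << 1) | static_cast<uint64_t>(s.bits.back());
      s.bits.pop_back();
    }
  }
  return d;
}
```

    With N = 2 and Q = 1 (q = 1/4), for example, the states 4, 5, 6, 7 decode to (d, x) = (1, 1), (0, 3), (0, 4), (0, 5), and the encoding rules map each of those pairs back to the same state.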

    To reduce the size of the coding tables, q is quantized to R=7 bits on a nonlinear scale with closer spacing near 0 and 1. The quantization is such that ln(q/(1-q)) is a multiple of 1/8 between -8 and 8.

    In the source, N, R, and B are adjustable parameters up to N=12, R=7. Larger values improve compression at the expense of speed and memory. fpaqa uses 2^(N+R+2) + 5*B/4 bytes for compression and 2^(N+R+1) bytes for decompression.

    fpaqb (Matt Mahoney, Dec. 17, 2007, updated to ver. 2 on Dec. 20, 2007) is a revision of fpaqa that uses the same model but an asymmetric coder that updates the state with direct calculations instead of lookup tables. This allows higher precision, which improves compression (eliminating a 0.03% penalty), saves memory, and allows bytewise I/O (x in range 2^N to 2^(N+8) - 1 for N=12). Compression is about the same speed as fpaqa but decompression is 28% faster. Ver. 2 is faster but maintains archive compatibility with ver. 1.

    fpaq0m by Ilia Muraviev, Dec. 20, 2007, uses arithmetic coding and 2 order 0 models averaged together, one with fast update (rate 1/16) and one slow (1/64).

    fpaq0mw by Eugene Shelwien, Dec. 21, 2007, modifies fpaq0m by using a weighted mix of a fast (1/16) and slow (1/256) adapting order 0 model, where the weight is adjusted dynamically to favor the better model.

    fpaqc (Matt Mahoney, Dec. 24, 2007) is fpaqb with some optimizations to the asymmetric coder.

    fpaq0pv2 (Ilia Muraviev, Dec. 26, 2007) is a speed optimized version of fpaq0p with arithmetic coding.

    fpaq0r by Alexander Rhatushnyak, Jan. 9, 2008, is an order 0 model with arithmetic coding. The model is tuned for better text compression. When compiled with -DSLOWER (fpaq0rs.exe), the arithmetic coder uses higher precision for better compression with a small speed penalty.

    fpaq0f by Matt Mahoney, Jan. 28, 2008, uses an adaptive order 0 model which includes the bit history (as an 8 bit state) in each context. (It is controversial whether this is really "order 0".) It uses arithmetic coding with 16 bit probabilities (rather than 12 bits).

    fpaq0f2 by Matt Mahoney, Jan. 30, 2008, uses a simplified bit history consisting of just the last 8 bits, plus some minor improvements.

    fpaq0pv3 by Nania Francesco Antonio, Apr. 4, 2008, is compatible with fpaq0p but 20-30% faster.

    fpaq0pv4, including fpaq0pv4nc and fpaq0pv4nc0, are speed optimizations by Eugene Shelwien, Apr. 6, 2008, as discussed here. fpaq0pv4 is compatible with fpaq0p but faster. The nc and nc0 variants dispense with the extra EOF flags in each byte.

    fpaq0pv5 by Nania Francesco Antonio, Apr. 6, 2008, is a modification to fpaq0pv4.

    fpaq0pv4a, including fpaq0pv4anc and fpaq0pv4anc0, are bug fixes to fpaq0pv4 by Eugene Shelwien, Apr. 7, 2008, as discussed above.

    fpaq0pv4b by Eugene Shelwien, Apr. 18, 2008, replaces the arithmetic coder with sh_v1m port (uses carries), Windows I/O, and other optimizations as discussed here. The Intel-compiled .exe only runs on Intel machines. I tested fpaq0pv4b1 which was patched on May 19, 2008 to run on AMD machines.

    .5793 ppp

    ppp is the public domain file compressor specified in RFC 1978 for datagram compression using the Point-to-Point Protocol. The RFC includes an implementation in C written by Dave Rand with modifications by Ian Donaldson and Carsten Bormann, published in Aug. 1996. The program uses order-4 symbol ranking with a queue length of 1 with a 64K hash table without collision detection. Match flags are packed 8 to a byte, followed by up to 8 literals for each incorrect guess. The 16 bit context hash is updated by shifting left 4 bits and XORing with the current byte. The program reads from a file and outputs to stdout like this:
      ppp enwik9 > enwik9.ppp     (compress)
      ppp -d enwik9.ppp > enwik9  (decompress)
    
    The original code opens both files in text mode, which does not work in Windows. For testing, I modified 3 lines of code to open the input and output files in binary mode as follows:
      #include <fcntl.h>  // added
      setmode(fileno(stdout), O_BINARY);  // added
      FILE *f = fopen(*p, "rb");  // changed "r" to "rb"
    
    I compiled using gcc 3.4.2 -O3 -fomit-frame-pointer -march=pentiumpro and packed with UPX (linked above, Feb. 11 2008). Times are wall times. I did not use timer 3.01 because its output would be redirected to the output file. Process times are about 50% of wall time based on watching Task Manager.
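    The predictor scheme described above can be sketched as follows. This is a re-creation from the description, not the RFC 1978 code: the flag bit order (first byte of a group in the lowest bit), the zero-initialized table, and the function names are assumptions, and the decompressor is given the original length because a final group may hold fewer than 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 64K table of guesses indexed by a 16-bit hash of the preceding bytes.
// There is no collision detection: a wrong guess simply costs a literal.
std::vector<uint8_t> PredictorCompress(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> guess(65536, 0);
  uint16_t hash = 0;
  std::vector<uint8_t> out;
  for (size_t i = 0; i < in.size(); i += 8) {
    size_t flag_pos = out.size();
    out.push_back(0);  // placeholder for this group's flag byte
    uint8_t flags = 0;
    for (size_t j = 0; j < 8 && i + j < in.size(); ++j) {
      uint8_t c = in[i + j];
      if (guess[hash] == c) {
        flags |= static_cast<uint8_t>(1 << j);  // correct guess: flag only
      } else {
        guess[hash] = c;                        // wrong guess: learn it, emit literal
        out.push_back(c);
      }
      hash = static_cast<uint16_t>((hash << 4) ^ c);  // shift left 4, XOR byte
    }
    out[flag_pos] = flags;
  }
  return out;
}

std::vector<uint8_t> PredictorDecompress(const std::vector<uint8_t>& in, size_t n) {
  std::vector<uint8_t> guess(65536, 0);
  uint16_t hash = 0;
  std::vector<uint8_t> out;
  size_t pos = 0;
  while (out.size() < n && pos < in.size()) {
    uint8_t flags = in[pos++];
    for (int j = 0; j < 8 && out.size() < n; ++j) {
      uint8_t c;
      if (flags & (1 << j)) {
        c = guess[hash];
      } else {
        if (pos >= in.size()) return out;  // truncated input
        c = in[pos++];
        guess[hash] = c;
      }
      out.push_back(c);
      hash = static_cast<uint16_t>((hash << 4) ^ c);
    }
  }
  return out;
}
```

    Because the 16-bit hash is shifted 4 bits per byte, only about the last 4 bytes influence it, which is what makes this an order-4 predictor.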

    .5805 ksc

    ksc (keyword shuffle compressor) is a free, experimental file compressor for Windows by Sportman, Feb. 13, 2014. It uses symbol ranking of 1, 2, 3, or 4 byte fixed length strings (user selected) encoded from a move-to-front queue with dictionary entries near the front encoded with half the bits of the maximum pointer length. Decoding is in reverse order and therefore requires reading the whole file into memory. Thus, decompression requires more memory than compression, depending on the file size. The option 1..4 selects the string length.

    The program uses a Windows GUI when run with no arguments. It was tested with command line arguments under Wine 1.6 in Ubuntu.

    Compressor Opt     enwik8      enwik9           prog     Total       Comp  Deco  Cmem Dmem Alg  Note
    -------    ---   ----------  -----------      -------  -----------   ----  ----  ---- ---- ---  ----
    ksc         1    79,706,130                                          3250  2790    40  265 SR   48
                2    67,676,824                                          3730  1480    40  227 SR   48
                3    62,570,897                                          8560  1800    59  273 SR   48
                4    59,511,259                                         32780  6670    62  220 SR   48
                4                580,557,413     13,507 x 580,570,920   40050  7917   155 1700 SR   48
    

    .5902 lzbw1

    lzbw1 0.8 is a free, command line file compressor by Bruno Wyttenbach, Apr. 26, 2009. It uses LZP and is derived from LZP2. It takes no options.

    .5981 lzp2

    lzp2 0.1 is a free file compressor by Yann Collet, Apr. 17, 2009. It uses LZP. There are no compression options. There is a smaller, separate program (unlzp2) that only decompresses.

    lzp2 0.7c was released Oct. 10, 2009. Run times are dominated by disk access, not included below.

    Compressor     enwik8      enwik9           prog     Total       Comp  Deco  Mem Alg  Note
    -------      ----------  -----------      -------  -----------   ----  ----  --- ---  ----
    lzp2 0.1     74,358,722  655,709,055      5,855 xd 655,714,910     11     9   15 LZP  26
    lzp2 0.7c    67,909,076  598,076,882     40,819 x  598,117,701     11     8   15 LZP  26
    

    .6368 NTFS

    NTFS disk compression is used in Microsoft Windows when the "compress files to save disk space" checkbox is checked in the folder properties dialog box. Disk compression was introduced in NTFS v1.2 in mid 1995 according to Wikipedia. The compression format is called LZNT1. The algorithm is proprietary. However, it was reverse engineered (in Russian, see also here). The algorithm is LZSS (similar to lzrw1). The format consists of groups of 8 symbols, each group preceded by 8 flag bits packed into a byte. A 0 bit indicates a literal symbol, which is decoded by copying it. A 1 bit indicates a 2 byte offset-length pair which is decoded by going back 'offset' bytes in the output and copying the next 'length'+3 bytes. An offset-length pair uses a variable number of bits allocated to the offset (from 4 to 12) depending on the position in the file, and any remaining bits allocated to the length of the match. A 12 bit offset would correspond to a 4 KB block on disk.
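    Decoding one 4 KB chunk can be sketched as follows. The description above does not give the exact offset/length split or whether offsets are stored minus one, so this sketch assumes the convention used by the open-source ntfs-3g driver: the offset field widens from 4 to 12 bits as the output position grows, the stored offset is the real offset minus 1, flag bits are read least significant first, and the 2 byte chunk header is not included. The function names are hypothetical. Treat it as an illustration of LZSS flag groups, not a verified LZNT1 implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Bits holding (length - 3) when `pos` bytes (pos >= 1) of the chunk are decoded.
// ASSUMPTION: offset bits = 16 - LengthBits(pos), growing from 4 to 12.
int LengthBits(size_t pos) {
  int lg = 0;
  for (size_t i = pos - 1; i >= 0x10; i >>= 1) ++lg;
  return 12 - lg;
}

std::optional<std::vector<uint8_t>> Lznt1DecodeChunk(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  size_t pos = 0;
  while (pos < in.size()) {
    uint8_t flags = in[pos++];  // one flag bit per following symbol
    for (int bit = 0; bit < 8 && pos < in.size(); ++bit) {
      if (!(flags & (1 << bit))) {  // 0: literal byte
        out.push_back(in[pos++]);
        continue;
      }
      // 1: 16-bit little-endian offset-length pair
      if (in.size() - pos < 2 || out.empty()) return std::nullopt;
      uint16_t pair = static_cast<uint16_t>(in[pos] | in[pos + 1] << 8);
      pos += 2;
      int lbits = LengthBits(out.size());
      size_t length = (pair & ((1u << lbits) - 1)) + 3;
      size_t offset = (pair >> lbits) + 1;  // ASSUMPTION: stored as offset - 1
      if (offset > out.size()) return std::nullopt;
      size_t from = out.size() - offset;
      for (size_t k = 0; k < length; ++k) {
        uint8_t b = out[from + k];  // copy byte by byte so matches may overlap
        out.push_back(b);
      }
    }
  }
  return out;
}
```

    Under these assumptions, the bytes 08 61 62 63 03 20 decode as three literals "abc" followed by a match with offset 3 and length 6, giving "abcabcabc".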

    I tested by copying enwik9 between folders with compression turned on in one folder, and compared with the time to copy between two folders with compression turned off. I tried each copy twice and took the second time, which was at most 1 second faster than the first copy. I used the test machine in note 26 running Windows Vista Home Premium SP1 32 bit with 3 GB memory and a 200 GB disk, copying between folders on the same partition. Copying between two uncompressed folders takes 41 seconds. Copying to a compressed folder takes 51 seconds, or a difference of 10 seconds. Copying from a compressed folder takes 35 seconds. I estimated 9 seconds for decompression by assuming that copying the compressed file directly would take 26 seconds based on its size of 636 MB. (This is probably wrong because the file would be cached in memory uncompressed, but the alternative is a negative time for decompression. Copying either the compressed or uncompressed file to NUL: takes 2 seconds on the second try.)

    Times were recorded with a watch because timer 3.01 will not time built-in commands like 'copy'. Task Manager does not show any processes consuming CPU time or memory during copying. However, memory use should be insignificant (under 16 KB) for LZSS with 4 KB blocks. Sizes are as reported by right clicking on the compressed file in Explorer as 'size on disk'. The size of the decompression program is not known.

    .6373 shindlet

    shindlet (mirror) is a series of 3 free command line file compressors by Piotr Tarsa. All are order-0 arithmetic coders with identical models written in assembler (included). The three variants are fs (frequency sorting), bt (binary tree), and sl (linear search). All three produce identical sized compressed files. In addition, the compressed output of bt and sl are identical. Results for all 3 variations are below. Comp and Decomp show wall clock times including disk I/O in ns/byte, with CPU (process) times in parentheses. Date is the latest program timestamp in the distribution, not the release date.
    Compressor       Date        enwik8       enwik9     prog     Total size     Comp      Decomp
    -----------  ------------  -----------  ----------  -------   -----------  ---------  ---------
    shindlet_fs  May  7, 2006  62,890,267  637,390,277  1,275 xd  637,391,552  185 (113)  123 (103)
    shindlet_bt  May 27, 2006  62,890,267  637,390,277  1,387 xd  637,391,664  163  (85)  118  (96)
    shindlet_sl  Apr 12, 2006  62,890,267  637,390,277  2,415 xd  637,392,692  166  (94)  121 (102)
    

    .6445 arb255

    arb255 is a free, experimental command line file compressor with source code available by David A. Scott, July 28, 2004. It is a bijective order-0 arithmetic coder, best suited for i.i.d. bytes (like fpaq). It takes no arguments except the input and output filenames. The decompresser is unarb255.exe.

    .6483 compact

    compact (man page) is a file compressor by Colin L. Mc Master, Feb. 28, 1979. It was written in K&R C for VAX/PDP11 and SUN under Berkeley UNIX. It uses adaptive order-0 Huffman coding. The (separate) decompression program rebuilds the Huffman tree, so it need not be transmitted.

    Neither program takes options. compact deletes the input file and creates an output file with a .C extension. uncompact deletes the compressed file and restores the original. compact was later superseded by compress, which gives better compression.

    For this test, compact was compiled using the provided Makefile and tested under Ubuntu Linux. Minor source code corrections were needed to compile under gcc. However, the decompresser size is based on the original code. A port to Windows would be possible but would require more source code changes.

    .6942 TinyLZP

    TinyLZP is a free, open source (GPL v3) file compressor by David Werecat, Oct. 12, 2012. It uses LZP and takes no options. The first entry is compiled from source using "cl /O2 tinylzp.c /I." using Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86 and tested on a 2.0 GHz T3200 under 32 bit Vista. The second entry, TinyLZP-x86-SSE2.exe, is supplied and requires MSVCR110.dll (Visual Studio 2012 C++ runtime) to run.

    Compressor                   enwik8       enwik9     prog     Total size   Comp  Deco  Mem  Alg Note
    -----------                -----------  ----------  -------   -----------  ----  ----  ---  --- ----
    TinyLZP 0.1                79,220,546  694,274,932   2,811 s  694,277,743    58    46   10  LZP  26
    TinyLZP-x86-SSE2           79,220,546  694,274,932   2,811 s  694,277,743    32    38   10  LZP  26
    

    .6955 smile

    smile (Nov. 5, 2004) and smile256 (Dec. 5, 2004) (discussion) are free, open source file compressors by Andrei Frolov. These programs are unique for their small executable size. smile consists of two programs: a 250 byte compressor, smile_e.com and a 207 byte decompresser, smile_d.com. smile256 is both a compressor and a decompresser in 256 bytes. This includes code to parse the command line and open the input and output files. Source code is in 16 bit assembler for DOS. Program size is given for the uncompressed .com files because zip makes them larger.

    Both programs use a move-to-front algorithm with the queue position encoded using an interleaved Elias Gamma code. The position of the current byte in the queue (1..256) is encoded by dropping the leading 1 bit, preceding each of the remaining bits with a 0 bit, then terminating with a 1 bit. After encoding, the byte value is moved to the front of the queue. smile256 also encodes EOF as 257, resulting in a file that is usually 1 byte larger than smile_e.

    Compressor                   enwik8       enwik9     prog     Total size   Comp  Deco  Mem  Alg Note
    -----------                -----------  ----------  -------   -----------  ----  ----  ---  --- ----
    smile_e/smile_d             71,154,788  695,562,502  207 xd   695,562,709 10517 10414  0.6  MTF  26
    smile256                    71,154,789               256 x                11190 10840  0.6  MTF  26
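    The coding step of these programs can be sketched as follows, with the code written as a string of '0' and '1' characters for readability rather than packed bits. The initial queue order (byte values 0 to 255 in order) and the function names are assumptions, and the decoder is given the symbol count instead of decoding smile256's EOF code 257.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Interleaved Elias gamma code for n >= 1: drop the leading 1 bit, precede each
// remaining bit with a 0, then terminate with a 1.
std::string Gamma(unsigned n) {
  int top = 31;
  while (((n >> top) & 1) == 0) --top;  // position of the leading 1 bit
  std::string bits;
  for (int i = top - 1; i >= 0; --i) {
    bits += '0';
    bits += static_cast<char>('0' + ((n >> i) & 1));
  }
  bits += '1';
  return bits;
}

static std::vector<uint8_t> InitialQueue() {
  std::vector<uint8_t> q(256);
  for (int i = 0; i < 256; ++i) q[i] = static_cast<uint8_t>(i);  // assumed order
  return q;
}

std::string MtfGammaEncode(const std::string& in) {
  std::vector<uint8_t> queue = InitialQueue();
  std::string bits;
  for (unsigned char c : in) {
    size_t pos = 0;
    while (queue[pos] != c) ++pos;
    bits += Gamma(static_cast<unsigned>(pos + 1));  // queue positions are 1..256
    queue.erase(queue.begin() + pos);               // move the byte to the front
    queue.insert(queue.begin(), c);
  }
  return bits;
}

std::string MtfGammaDecode(const std::string& bits, size_t count) {
  std::vector<uint8_t> queue = InitialQueue();
  std::string out;
  size_t i = 0;
  while (out.size() < count) {
    unsigned v = 1;  // the dropped leading 1 bit
    while (bits.at(i++) == '0') v = 2 * v + static_cast<unsigned>(bits.at(i++) - '0');
    uint8_t c = queue.at(v - 1);
    out += static_cast<char>(c);
    queue.erase(queue.begin() + (v - 1));
    queue.insert(queue.begin(), c);
  }
  return out;
}
```

    For example, Gamma(1) is "1", Gamma(2) is "001", and Gamma(5) is "00011", so a byte already at the front of the queue costs 1 bit and one at position 256 costs 17 bits.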
    

    .7594 barf

    barf is a free, open source file compressor by Matt Mahoney, Sept. 21, 2003. It was written as a joke to debunk claims of recursive compression. The algorithm is as follows:
    1. If the input is one of the 14 files of the Calgary corpus, the output is coded as 1 byte to indicate which file.
    2. If not, then the input is compressed with a byte oriented LZ77 code, in which bytes 0-31 code a literal of that length, and 32-255 code a match of length 2 and offset 0-223.
    3. If step 2 does not compress, then the first byte is removed and a filename extension is added to encode that byte.
    The main table shows the size and total process time after 2 compression passes. Further passes will "compress" by one byte. The decompresser source code size includes the Calgary corpus, which is needed to build the executable. (barf.exe is 1,009,274 bytes after packing with UPX and zip). Results by pass are shown below. Times are process times (Timer 3.01) with actual wall times in parentheses.
    Pass    enwik8      enwik9    size (zip)   enwik9+prog  Comp (wall) Decomp  Mem Alg   Filename
    ----  ----------  ----------- -----------  -----------  ----------  ------- --- ----  --------
    1     76,450,126  763,918,762   983,782 s  764,902,544   315 (330)   30 (73)  4 LZ77  enwik9.x
    2     76,074,327  758,482,743   983,782 s  759,466,525   439 (462)   23 (60)  4 LZ77  enwik9.x.x
    3     76,074,326  758,482,742   983,782 s  759,466,524   488 (551)   18 (44)  4 copy  enwik9.x.x.x9v
    

    A similar program, barfest.exe, compresses the million random digits file to 1 byte, rather than the Calgary corpus. The decompresser size is 455,755 bytes (zipped).

    .9956 arb2x

    arb2x v20060602 is a free, experimental command line file compressor with source code available by David A. Scott, updated June 2, 2006. It is a bitwise bijective order-0 arithmetic coder, best suited for i.i.d. bits. It takes no arguments except the input and output filenames. The decompresser is unarb2x.exe.

    Failed and Pending Tests

    hipp

    hipp v0.5819 is an experimental command line file compressor with source code available by Bogatov Roman, Aug. 19, 2005. It uses context mixing with ordinary and, optionally, sparse (fixed gap) contexts, and it stores statistics in a suffix tree with path compression. The options are: /m sets the memory limit in MB (default /m2048); /o sets the primary context order, i.e. the depth of the path-compressed suffix tree (default /o256); /do sets the maximum deterministic order, which is the actual order reached with path decompression (default /do256, requires do >= o); /so sets the number of sparse contexts (default /so0). Sparse contexts are useful for binary data but generally not for text. Memory usage increases with the size of the file and with /o and /so (but not /do), and an error occurs if the memory limit is exceeded. Unfortunately, enwik9 cannot be compressed at all because initialization requires more than 800 MB. Some results for enwik8:

    hipp5819   enwik8     Mem (MB)  Comp (ns/byte)
    -------  ----------   --------  --------------
    /o5      22,390,366    248.5        ~3710
    /o8      20,555,951    719.5        ~4300
    
    Zipped size: C++ source (commented in Russian) = 98,765, exe = 36,724.
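    The "context mixing" that hipp uses combines the bit predictions of several context models into one probability. In hipp's case those models would be the ordinary and sparse contexts. One common way to do this, used in the PAQ family, is a logistic mixer. Each prediction is mapped to the logistic domain, weighted, and summed, and the weights are adjusted online to reduce coding cost. The sketch below is a generic illustration under that assumption. It does not show hipp's own mixing or its suffix-tree statistics, and the learning rate and initial weights are arbitrary.

```python
import math


def stretch(p: float) -> float:
    """Logit: map a probability in (0, 1) to the logistic domain."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer:
    """Generic PAQ-style logistic mixer (illustrative; not hipp's code)."""

    def __init__(self, n_inputs: int, lr: float = 0.02) -> None:
        self.w = [0.3] * n_inputs  # arbitrary starting weights
        self.lr = lr               # arbitrary learning rate
        self.x = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs) -> float:
        # Each input is P(next bit = 1) from one context model.
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit: int) -> None:
        # Gradient step on coding cost: w_i += lr * (bit - p) * stretch(p_i).
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]
```

    In this sketch, the first input is a good predictor and the second always says 0.5. Trained on such a stream, the mixer learns to trust the first input. A model that always predicts 0.5 has stretch(0.5) = 0, so its weight never changes.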

    ppmz2

    ppmz2 v0.81 is a free, experimental, open source file compressor by Charles Bloom, May 9, 2004. It uses PPM. It takes several compression options but only the defaults were tested. Memory usage grows as the program runs. On enwik9 it runs out of memory.

    XMill

    XMill 0.8 is an open source command line XML preprocessor/compressor by AT&T, written by Dan Suciu, Hartmut Liefke, and Hedzer Westra in March, 2003. It works by sorting the content by XML tag to bring similar data together, then compressing with gzip, bzip2, or ppmd. Optionally, it can (in theory) output the preprocessed data for input to another compressor.
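    The grouping idea can be illustrated with a toy sketch. It collects the text of each element into a per-tag "container" and compresses each container separately with zlib. This is a simplification under stated assumptions. XMill's own container mapping and its separate encoding of the document structure are not reproduced. The toy discards the structure, so it is not lossless.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def containers_by_tag(xml_text: str) -> dict:
    """Toy XMill-style grouping: element text collected per tag name.

    Drops the tag structure, so this is illustrative and not lossless.
    """
    groups = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        if elem.text and elem.text.strip():
            groups[elem.tag].append(elem.text)
    return {tag: "\n".join(texts) for tag, texts in groups.items()}


def compressed_sizes(containers: dict) -> dict:
    """Compress each container independently so similar data shares one model."""
    return {tag: len(zlib.compress(text.encode("utf-8"), 9))
            for tag, text in containers.items()}
```

    On real data, the benefit comes from putting all values of one kind in a single stream. For example, all timestamps would go in one container and all article text in another.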

    Unfortunately, the compressor will not accept truncated XML files such as this benchmark. It can be made to work by appending the following 38 bytes to enwik8 or enwik9 to create a properly formed XML file (a trailing newline is optional but was not used):

    "</text></revision></page></mediawiki>
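    A minimal way to build the well-formed copy is to append those 38 bytes, including the leading double quote, with no trailing newline:

```python
import shutil

# The 38-byte tail given above; the leading '"' is part of it.
TAIL = b'"</text></revision></page></mediawiki>'


def make_wellformed_copy(src: str, dst: str) -> None:
    """Copy src to dst and append TAIL (no trailing newline)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.write(TAIL)
```

    For example, make_wellformed_copy("enwik8", "enwik8.xml") would produce an input file for xcmill. The output filename is a hypothetical choice.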
    
    However, decompression succeeds for enwik8 but fails for enwik9: the decompresser (xdemill) reports "corrupt file". Failed values are shown in parentheses; times are for enwik8.
                    Compression                      Compressed size      Decompresser  Total size   Time (ns/byte)
    Program           Options                       enwik8      enwik9     size (zip)   enwik9+prog  Comp Decomp  Mem
    -------           -------                     ----------  -----------  -----------  -----------  ----- -----  ---
    xcmill 0.8        -w -P -9 -m800              26,579,004 (230,934,622)  114,764 xd (231,049,386)   616 (530)  800
    xcmill 0.9.1      -w -P -9 -m1700             26,579,004 (230,914,289)  108,845 xd (231,023,134)   711        984
    
    The -w option preserves whitespace; without it, compression is lossy. -P selects ppmdi compression (bzip2, gzip and no compression are also available). -9 selects maximum compression. -m800 allows 800 MB of memory.

    In theory, using no compression (-N) would allow XMill to be used as a preprocessor to other compressors. However, the decompresser will not accept either enwik8 or enwik9 (with closing tags appended) if processed with -N (reports "corrupt file").

    xmill 0.9.1 (Mar. 15, 2004) also fails to decompress enwik9 and fails to decompress either file with -N.

    lzp3o2

    lzp3o2 (LZP 3 with order 2 literal coding) is one of a family of open source file compressors by Charles Bloom, originally written in 1995. The algorithm is described in a paper submitted to DCC'96. lzp3o2 uses LZP compression with order 2 modeling of literals and arithmetic coding. The tested version of the source code is dated Aug. 25, 1996 and compiled for Windows Oct. 10, 1998. The compiled distribution from here was tested.
    Program    enwik8     Comp  Deco  Mem  Alg
    -------  ----------   ----  ----  ---  ---
    lzp1     56,013,656     23    20  153  LZP
    lzp2     40,350,594     80        280  LZP
    lzp3o2   33,041,439    230   270  151  LZP
    

    All programs report "malloc failed" on enwik9. The LZP algorithms use very little memory themselves, but these implementations allocate input and output buffers all at once. This fails for enwik9 because of the 2 GB process limit in Windows.

    lzp1 is both a compressor and decompresser. To decompress, use -d as the third argument. lzp2 is a compressor only. There is a source code decompresser "lzp2d" but I was unsuccessful in compiling it. It allows an unexplained option "HuffType" which I did not experiment with. lzp3o2 has a separate decompresser "lzp3o2d.exe" included in the distribution.
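    The core LZP idea is to predict the match position from the preceding context, so that no offset needs to be transmitted. It can be sketched as follows. The sketch makes several assumptions: a 3-byte context, a Python dictionary in place of a fixed-size hash table, a table updated only at the start of each token, and tokens returned as a list. It does not show lzp3o2's order-2 literal model or its arithmetic coder.

```python
def lzp_encode(data: bytes, order: int = 3) -> list:
    """Tokens: ("L", byte) for a literal, ("M", length) for a predicted match.

    Assumptions: dict instead of a hash table, table updated only at token
    starts, no literal modeling or entropy coding.
    """
    table = {}  # context -> most recent position that followed it
    tokens = []
    i, n = 0, len(data)
    while i < n:
        pred = None
        if i >= order:
            ctx = data[i - order:i]
            pred = table.get(ctx)
            table[ctx] = i
        length = 0
        if pred is not None:
            # Extend the match; overlap is fine because the decoder copies byte by byte.
            while i + length < n and data[pred + length] == data[i + length]:
                length += 1
        if length > 0:
            tokens.append(("M", length))
            i += length
        else:
            tokens.append(("L", data[i]))
            i += 1
    return tokens


def lzp_decode(tokens: list, order: int = 3) -> bytes:
    """Rebuild the same context table as the encoder, so matches need no offset."""
    table = {}
    out = bytearray()
    for kind, value in tokens:
        i = len(out)
        pred = None
        if i >= order:
            ctx = bytes(out[i - order:i])
            pred = table.get(ctx)
            table[ctx] = i
        if kind == "M":
            # The match position is implied by the context, not stored in the token.
            for k in range(value):
                out.append(out[pred + k])
        else:
            out.append(value)
    return bytes(out)
```

    Because the decoder rebuilds the same table, a match token carries only a length. In a real coder, the literals would then be coded with a context model (order 2 in lzp3o2) and an arithmetic coder.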

    History

    May 10 2006 - benchmark began with 1 month of testing about 2 compressors per day.
    Jun 10 2006 - began test data analysis.
    Jun 14 2006 - updated xml-wrt 2.0 14.06.06 | ppmonstr.
    Jun 17 2006 - reorganized website from 1 big page to 4 smaller pages.
    Jun 19 2006 - added xml-wrt 2.0 19.06.06 (standalone LZMA mode).
    Jun 20 2006 - added ocamyd 1.65 LTCB 1.0.
    Jun 21 2006 - updated TC 5.0 to dev 4 (compression unchanged but faster).
    Jul 19 2006 - updated TC 5.0 to dev 9, added dark 0.32b.
    Jul 20 2006 - added arbc2z.
    Jul 21 2006 - added TarsaLZP (July 4 2006).
    Jul 22 2006 - added uda 0.300.
    Jul 23 2006 - verified uda 0.300 decompression.
    Jul 24 2006 - updated TC 5.0 to dev 11.
    Jul 29 2006 - added CTW 0.1.
    Aug 01 2006 - updated TarsaLZP (July 30 2006), added ppmvc v1.1.
    Aug 06 2006 - added the Hutter Prize, renamed Large Text Compression Benchmark to Human Knowledge Compression Contest,
                  added rules for the Hutter Prize, and updated rationale to add a section on AIXI.
    Aug 07 2006 - added link to paq8f, updated prize formula (Z might not decrease), and noted that prize committee members
                  are not eligible for prize money.  Added logo.  Minor edit to rationale.
    Aug 08 2006 - the prize fund (Z) does not decrease.
    Aug 11 2006 - added a lexical and string repetition analysis to the data study.
    Aug 13 2006 - typo in Rationale.
    Aug 14 2006 - updated dark v0.40. Edited Rationale (AIXI, compression does not seem like AI, lossy compression).
    Aug 16 2006 - raq8g and durilca 0.5(Hutter) submitted for Hutter prize, neither verified yet.
    Aug 17 2006 - verified durilca 0.5(Hutter) claim.  Posted raq8g.exe for Windows.
    Aug 18 2006 - verified raq8h -7 on enwik8 under Windows.  Tested paq8f -8 on enwik8 (not verified).
                  Reported raq8h -8 result (Linux).
    Aug 19 2006 - updated ha, added Info-ZIP, ESP.  Clarified rules 5 and 6.
    Aug 20 2006 - Removed rules and results for the Hutter prize.  These may be found on the Hutter Prize website.
                  Updated ha and Info-ZIP.
    Aug 22 2006 - added paq8hp1.  Updated Info-ZIP.  Added submission times and unzipped .exe sizes for Hutter prize candidates.
    Aug 23 2006 - updated paq8hp1 for enwik9 -8 (compress only).  Tuned xml-wrt|ppmonstr for enwik8 at 2 GB.  Added durilca4linux.
    Aug 26 2006 - updated dark 0.46.  Fixed link to durilca4linux.  Posted enwik8.bz2 and enwik9.bz2 on the data page.
    Aug 28 2006 - added paq8hp2 (enwik8, 1 GB, not checked).  Updated ppmonstr, xmlwrt|ppmonstr, slim, and ash for 2 GB memory.
    Aug 29 2006 - verified paq8hp2 for enwik8 (1 GB and 2 GB).
    Aug 31 2006 - added bbb.
    Sep 01 2006 - updated bbb, TarsaLZP, paq8hp2 (as a preprocessor).
    Sep 02 2006 - corrected error in lexical analysis table on data page (found by Szymon Grabowski).
    Sep 03 2006 - added paq8hp3 -7 for enwik8 (Hutter prize candidate, verified).
    Sep 05 2006 - updated paq8hp3 (enwik9 -8, not verified).
    Sep 10 2006 - updated paq8hp4 (verified for enwik8), fixed links to PX and pimple.
    Sep 11 2006 - updated paq8hp4 for enwik9 (compression only), added paq1 and expanded PAQ series documentation.
    Sep 12 2006 - minor edits in paq8hp1, raq8g descriptions.
    Sep 13 2006 - updated paq8hp2 for enwik9.
    Sep 14 2006 - updated xml-wrt 3.0.
    Sep 15 2006 - updated xml-wrt 3.0|ppmonstr.
    Sep 20 2006 - updated paq8hp5 -7 enwik8.  Verified paq8hp4 -8 enwik9.
    Sep 21 2006 - updated paq8hp5 -8 enwik8.
    Sep 23 2006 - updated paq8hp5 -8 enwik9 (not verified).
    Sep 24 2006 - added QuickLZ.
    Sep 29 2006 - added fpaq0x, fpaq0s2.
    Sep 30 2006 - clarified submission dates for paq8hp2 through paq8hp5.  Posted paq8hp2 source code.
    Oct 01 2006 - updated fpaq0x1a, fpaq0s2b, tc 5.1 dev 1.
    Oct 02 2006 - updated tc 5.1 dev 2.
    Oct 06 2006 - posted paq8hp3 source code (now top ranked).  Added fpaq0x1b.
    Oct 08 2006 - added fpaq0s3.
    Oct 10 2006 - posted paq8hp4 source code (now top ranked).
    Oct 12 2006 - added fpaq0s4.
    Oct 13 2006 - added tc 5.1 dev 5.
    Oct 15 2006 - verified paq8hp5 -8 enwik9 decompression.  Added fpaq0s5.
    Oct 16 2006 - added durilca4linux_2 (now top ranked, not yet verified for enwik9).
    Oct 18 2006 - updated durilca4linux_2 (-t2(11) option).
    Oct 21 2006 - added fpaq2.
    Oct 22 2006 - updated QuickLZ 0.9.
    Oct 27 2006 - posted paq8hp5 source code (now ranked #2).
    Oct 30 2006 - updated fpaq0s6.
    Nov 03 2006 - mirrored enwik8.bz2 and enwik9.bz2 to mattmahoney.net/text
    Nov 05 2006 - updated paq8hp6.  Linked to FV results on data page.
    Nov 06 2006 - verified paq8hp6 -7 enwik9 decompression.
    Nov 07 2006 - updated fastari.
    Nov 10 2006 - added PeaZip.
    Nov 15 2006 - added paq8j.
    Nov 17 2006 - added paq8ja.
    Nov 20 2006 - added fpaq3.
    Nov 22 2006 - added paq8jb.
    Nov 29 2006 - added paq8jc.
    Dec 02 2006 - added fpaq3b.
    Dec 08 2006 - added paq8hp7a (enwik8 only), posted paq8hp6 source.
    Dec 10 2006 - updated paq8hp7a for enwik9 (not verified).
    Dec 12 2006 - added paq8hp7.
    Dec 13 2006 - updated paq8hp6 -8 enwik9.
    Dec 17 2006 - posted enwik8.pmd and enwik9.pmd (PPMD var. J format).
    Dec 21 2006 - added fpaq3c.
    Dec 24 2006 - added quad v1.01a, tc 5.1 dev 7.
    Dec 28 2006 - added fpaq3d.
    Jan 01 2007 - added paq8jd (enwik8 -7).
    Jan 02 2007 - updated paq8jd -8 enwik8 (not verified).
    Jan 08 2007 - added hook v0.2.
    Jan 11 2007 - added hook v0.3.
    Jan 12 2007 - added hook v0.3a.
    Jan 13 2007 - added tc 5.1dev7x.  Fixed hook.zip archive.
    Jan 15 2007 - posted paq8hp7 source code.  Added hook v0.4.
    Jan 17 2007 - completed dmc and Info-Zip 2.3.1.
    Jan 19 2007 - added paq8hp8.
    Jan 22 2007 - added hook v0.5b.
    Jan 27 2007 - added chile 0.4.
    Feb 03 2007 - added ocamyd-1.66.final (merged with ocamyd LTCB).
    Feb 07 2007 - added hook v0.6.
    Feb 08 2007 - added hook v0.6b, quad v1.04a, tc 5.2 dev 2.
    Feb 09 2007 - corrected error in tc 5.2 dev 2.
    Feb 12 2007 - added ccm_extra 1.03a.
    Feb 14 2007 - added hook v0.6c.
    Feb 15 2007 - added paq8k -8 enwik8 (not verified).
    Feb 20 2007 - added paq8hp9 -7 enwik8 (verified).
    Feb 22 2007 - updated paq8hp9 -7 enwik9.
    Feb 23 2007 - added link to paq8hp9any (revised paq8hp9, not tested), added quad 1.07b, ccm 1.1.1a.
    Mar 02 2007 - added ccm 1.1.2a.
    Mar 06 2007 - added LZPXj 1.2h.
    Mar 10 2007 - added paq8l enwik8.
    Mar 11 2007 - added hook v0.7.
    Mar 13 2007 - added hook v0.7b.
    Mar 14 2007 - added quad 1.08.
    Mar 17 2007 - added hook v0.8.
    Mar 18 2007 - added hook v0.8b.
    Mar 19 2007 - added hook v0.8c.
    Mar 21 2007 - added hook v0.8d, FreeArc 0.36.
    Mar 24 2007 - added quad 1.10.
    Mar 27 2007 - added paq8hp10 -7 enwik8, posted paq8hp9 source code, added hook v0.8e, M99.
    Mar 28 2007 - corrected M99 enwik8 result, updated FreeArc description, removed unsupported quad versions from main table.
    Mar 31 2007 - added paq8hp10any -8 enwik8.
    Apr 01 2007 - added dark 0.51, opendark.
    Apr 02 2007 - updated paq8hp10any -8 enwik9 (decompression not verified), added DGCA 1.10.
    Apr 05 2007 - added quad 1.11, quad 1.11HASH2, ccm 1.20a, updated FreeArc description.
    Apr 06 2007 - added hook v0.9.
    Apr 08 2007 - added freehook 0.2, ccm 1.20d.
    Apr 09 2007 - added xmill 0.9.1 (fails), barf, quad 1.12.
    Apr 10 2007 - added hook 0.9b, freehook 0.3.
    Apr 19 2007 - added M99 v2.1, QuickLZ 1.20 and 1.30beta, lzpm 0.02, tornado 0.1.
    Apr 22 2007 - added thor 0.94a.
    Apr 23 2007 - added ccm (ccmx) 1.21.
    Apr 27 2007 - added slug 1.1b.
    Apr 30 2007 - added paq8hp11 -7 enwik8.  Posted paq8hp10any source code.
    May 03 2007 - added paq8hp11any -8 enwik8, fpaq0p.
    May 05 2007 - added lzpm 0.03 and 0.04.  Fixed misleading description of DMC algorithm in hook.
    May 08 2007 - added lzc 0.01, hook0.9c.
    May 09 2007 - added pucrunch, TarsaLZP May 6 2007, thor 0.95, srank 1.1.
    May 10 2007 - added paq8hp11any -8 enwik9 (decompression not verified).
    May 11 2007 - added lzc 0.03, updated table description (time, memory, algorithms).
    May 14 2007 - added paq8hp12 -7 enwik8.
    May 16 2007 - added uc2, lzc 0.04.
    May 18 2007 - added BriefLZ 1.05.
    May 20 2007 - added paq8hp12any -8 enwik8/9 (decompression not verified), lzpm 0.06.  Updated times in main table to process times.
    May 21 2007 - added paq8hp12any -7/-8 enwik8 (decompression verified), 7zip 4.46a.
    May 26 2007 - added lzc 0.05b.
    May 29 2007 - added fpaq02.
    Jun 01 2007 - added turtle 0.01.
    Jun 02 2007 - added turtle 0.02.
    Jun 05 2007 - added turtle 0.03.
    Jun 08 2007 - added turtle 0.04.
    Jun 12 2007 - posted paq8hp11any source code, added turtle 0.05.
    Jun 16 2007 - added TarsaLZP ver. Jun 17 2007, FastLZ ver. Jun 12 2007, pim 2.01.
    Jun 23 2007 - added turtle 0.07.
    Jul 24 2007 - added lpaq1, pim 2.04b, TarsaLZP Jul 18 2007, posted paq8hp12any source code.
    Jul 30 2007 - added TarsaLZP Jul 30 2007.  Updated rules to allow 1800 MB memory.
    Jul 31 2007 - added pim 2.10.
    Aug 03 2007 - added sr2.
    Aug 07 2007 - added lzpm 0.07.  Underlined times and memory to indicate records.
    Aug 08 2007 - added pimple2.
    Aug 09 2007 - added lzpm 0.08, TarsaLZP Aug 8 2007.
    Aug 11 2007 - added TarsaLZP Aug 10 2007.
    Aug 13 2007 - added gziphack, retested gzip 1.3.5, Info-ZIP 2.32 Win32.
    Aug 14 2007 - added QuickLZ 1.30, compact.
    Aug 15 2007 - added lzturbo 0.01, WinTurtle 1.2.
    Aug 16 2007 - added paq8fthis2 -8 enwik8, WinTurtle 1.21, lzpm 0.09.
    Aug 23 2007 - added paq8n -8 enwik8, paq8osse -8 enwik8, thor 0.96a, lzpm 0.10.
    Aug 24 2007 - added paq8o -8 enwik8.
    Aug 29 2007 - added lzc 0.06b.
    Aug 30 2007 - added HKCC-2 enwik8 decompresser, added link to paq8o ver. 2, added WinTurtle 1.30, qazar 0.0pre5.
    Aug 31 2007 - added qc 0.050.
    Sep 02 2007 - added HKCC-2 Sep 01 2007 version, WinRK 3.03 SFX.
    Sep 06 2007 - added lzpm 0.11.
    Sep 13 2007 - added lzpmlite 0.11.
    Sep 14 2007 - added paq8o3 -8 enwik8.
    Sep 20 2007 - added lpaq2, hook 1.0.
    Sep 22 2007 - added paq8o4 v1, rings 0.1.
    Sep 29 2007 - added paq8o6 -8 enwik8.
    Sep 30 2007 - added lpaq3, elpaq3, lprepaq 1.2.
    Oct 01 2007 - added lpaq3a, lpaq3e.
    Oct 04 2007 - added lpaq4, lpaq4e.
    Oct 05 2007 - added lzturbo 0.1.
    Oct 16 2007 - added lpaq5, lpaq5e, withdrew HKCC-2.
    Oct 20 2007 - added paq8o7 -8 enwik8.
    Oct 23 2007 - added lpaq6, lpaq6e.
    Oct 24 2007 - added paq8o8 -8 enwik8.
    Oct 25 2007 - added lzc 0.07.
    Oct 28 2007 - added rule that benchmark results will be delayed 30 days after the latest version of the program is published.
    Nov 09 2007 - added lpaq7, lpaq7e*, xwrt 3.2*, sr3*.
    Nov 22 2007 - added quickLZ 1.40, rings 0.2, hook 1.1, lzc 0.08*.
    Nov 23 2007 - added lzpm 0.12.
    Dec 03 2007 - ranked lpaq7e, xwrt 3.2, sr3, lzc 0.08.
    Dec 04 2007 - added and ranked xwrt 3.2|ppmonstr J.
    Dec 05 2007 - added symbra 0.2*.
    Dec 11 2007 - added lpaq8*, lpaq8e*.
    Dec 13 2007 - added lcssr 0.2*.
    Dec 16 2007 - uploaded symbra 0.2, lcssr 0.2 mirrors, added fpaqa*, hook 1.3, lzpm 1.3, cmm1, cmm2.
    Dec 17 2007 - corrected cmm1, cmm2, ranked cmm1.
    Dec 18 2007 - added fpaqb*.
    Dec 20 2007 - updated fpaqb v2*, added fpaq0m, bit 0.1*.
    Dec 21 2007 - added lpaq1a.
    Dec 24 2007 - added fpaqc*.
    Dec 25 2007 - added lpq1, rings 0.3*.
    Dec 26 2007 - added FreeArc 0.40-pre-4*.
    Jan 09 2008 - added fpaq0r, fpaq0rs*, ranked lpaq8e, lcssr 0.2.
    Jan 11 2008 - added flashzip 0.01, flashzip 0.02*, winturtle 1.60*, ccmx 1.30*.
    Jan 13 2008 - added lzpm 0.14, cmm 080113*.  Updated pkzip 2.04 -ex.
    Jan 17 2008 - added lzpm 0.15.
    Jan 25 2008 - added fpaq0pv2, ranked FreeArc 0.40-pre-4, bit 0.1, rings 0.3, fpaq0mw.
    Jan 28 2008 - added fpaq0f*.
    Jan 30 2008 - added fpaq0f2*.
    Jan 31 2008 - added lzw 0.1, paq9a.  Repealed 30 day wait rule and ranked pending compressors marked with *.
    Feb 04 2008 - added flashzip 0.3.
    Feb 08 2008 - added lzw 0.2, rings 1.0.
    Feb 09 2008 - added cmm3 080207.
    Feb 11 2008 - added ppp.
    Feb 12 2008 - added lzp3o2, updated ppp description.
    Feb 13 2008 - added rings 1.1, lzrw1.
    Feb 14 2008 - added lzrw1-a, lzrw2, lzrw3, lzrw3-a, lzrw5, updated lzrw1.
    Feb 17 2008 - updated lzrw1-a, lzrw2, lzrw3, lzrw3-a, lzrw5 (new .exe sizes).
    Feb 21 2008 - added durilca4linux_3.
    Feb 22 2008 - added drt|lpaq9e.
    Feb 25 2008 - added lzturbo 0.9.
    Mar 04 2008 - added rings 1.2.
    Mar 09 2008 - added balz 1.02, rzm 0.06c, tornado 0.3.
    Mar 13 2008 - added Stuffit 12.0.0.17.
    Mar 14 2008 - added cmm4 v0.0.
    Apr 02 2008 - added rings 1.3.
    Apr 04 2008 - added fpaq0pv3.
    Apr 06 2008 - added fpaq0pv5.
    Apr 14 2008 - added rings 1.4c.
    Apr 15 2008 - updated rings 1.4c description.
    Apr 21 2008 - added rings 1.5.
    Apr 22 2008 - added durilca4linux_3 v2 (new dictionary).
    Apr 28 2008 - added lpaq9f.
    May 09 2008 - added balz 1.06.
    May 11 2008 - added packet 0.01, slug 1.27, rzm 0.07h.
    May 14 2008 - added balz 1.07.
    May 18 2008 - added packet 0.02.
    May 19 2008 - added fpaq0pv4, fpaq0pv4nc, fpaq0pv4nc0, fpaq0pv4a, fpaq0pv4anc, fpaq0pv4and0.
    May 20 2008 - added packet 0.03b, balz 1.08, fpaq0pv4b1.
    May 21 2008 - added balz 1.09.
    May 22 2008 - added durilca4linux3 v3, cmm4 v0.1e.
    May 23 2008 - updated cmm4 v0.1e description, lpaq9g, fcm1.
    Jun 03 2008 - added balz 1.12.
    Jun 04 2008 - added lpaq9h.
    Jun 10 2008 - added paq8o8-intel -1, paq8o8z-jun7 -1.
    Jun 12 2008 - added paq8o10t (enwik8 only), balz 1.13.
    Jun 13 2008 - added lpaq9i.
    Jun 14 2008 - added drt|ppmonstr (under lpaq9i).
    Jun 17 2008 - updated paq8o8z (note 25), durilca4linux_3 v3 (2 GB).
    Jun 18 2008 - added flzp v1.
    Jun 19 2008 - added packet 0.90b.
    Jul 17 2008 - added lzgt, lzgt1, lzgt2, lzgt3.
    Jul 19 2008 - added nanozip 0.01a, balz 1.15.
    Jul 20 2008 - updated nanozip 0.01a -txt, clarified method of creating zip archive of decompresser.
    Jul 22 2008 - added pim 2.50, tornado 0.4a, M99 v2.2.1.
    Jul 24 2008 - added 4x4 0.2a, bit 0.2b.
    Jul 25 2008 - added nanozipltcb.
    Jul 26 2008 - added flashzip 0.9.
    Jul 28 2008 - corrected Pareto frontier.
    Aug 02 2008 - added nanozip 0.03a, lzss 0.01.
    Aug 18 2008 - added flashzip 0.91, lpaq9j.
    Sep 05 2008 - added size vs. speed and memory graphs.
    Sep 26 2008 - added bzp 0.2, ppms J.
    Oct 02 2008 - added lpaq9k.
    Oct 27 2008 - added nanozip 0.05a.
    Oct 28 2008 - added lzgt3a.
    Nov 21 2008 - added bit 0.7. Updated test computer (note 26).
    Nov 27 2008 - added ppmx 0.01, sr3c.
    Nov 28 2008 - added mcomp 2.00.
    Dec 02 2008 - added lpaq9l, ppmx 0.02.
    Dec 22 2008 - added ppmx 0.03.
    Dec 29 2008 - added M1 0.2a.
    Jan 02 2009 - added M1 0.3.
    Jan 05 2009 - added ppmx 0.04.
    Jan 09 2009 - updated link to paq8hp12any.
    Jan 28 2009 - added xdelta 3.0u.
    Feb 09 2009 - added bcm 0.03.
    Feb 11 2009 - added bcm 0.04.
    Feb 21 2009 - added drt|lpaq9m.
    Mar 02 2009 - added Stuffit 2009 13.0.0.19, nanozip 0.06a, NTFS (LZNT1).
    Mar 05 2009 - added bcm 0.05.
    Mar 06 2009 - updated bcm 0.05.
    Mar 10 2009 - added flashzip 0.93a, fixed links to winturtle, flashzip, rings, hook, packet, bzp.
    Mar 12 2009 - added bwmonstr 0.00.
    Mar 15 2009 - added bcm 0.07.
    Mar 20 2009 - added bwmonstr 0.01.
    Mar 26 2009 - added flashzip 0.94, decomp8.
    Apr 01 2009 - added runcoder1.
    Apr 13 2009 - added lzturbo 0.94, M1 0.3b.
    Apr 14 2009 - added lzuf.
    Apr 16 2009 - added M1 0.3b parameter e8-m103b1-mh.
    Apr 17 2009 - added lzp2.
    Apr 18 2009 - added csc2.
    Apr 21 2009 - added paq8p3, paq8p3 v2.
    Apr 22 2009 - added decomp8b.
    Apr 22 2009 - added lzbw1 0.8.
    Apr 29 2009 - added hook 1.4.
    May 08 2009 - updated opendark-A.
    May 26 2009 - added decmprs8.
    Jun 01 2009 - added bcm 0.08.
    Jun 02 2009 - added reorder_v2|bcm 0.08.
    Jun 05 2009 - updated reorder_v2|bcm 0.08 xlt.
    Jul 14 2009 - added bwmonstr 0.02.
    Jul 16 2009 - updated bwmonstr 0.02 comments.
    Jul 21 2009 - added durilca'kingsize.
    Jul 23 2009 - moved website to http://mattmahoney.net/dc/
                  added paq8px_v60_turbo, split paq from paq8hp entries, moved decompr8 series to lpaq,
                  added flashzip 0.99, updated sr3.exe to remove antivirus false alarms due to upack.
    Aug 07 2009 - added packet 0.91b.
    Aug 14 2009 - added csc3 v.2009.8.12, combined with csc2.
    Aug 16 2009 - added and corrected rings 1.6.
    Aug 26 2009 - added flashzip 0.99b4.
    Sep 14 2009 - added zpaq 1.03.
    Sep 15 2009 - updated zpaq 1.03 cmax3.cfg.
    Sep 16 2009 - updated zpaq 1.03 cmax4.cfg, updated paq8hp12 links.
    Sep 17 2009 - added rule that each compressor can only be listed once,
                  so removed xwrt|ppmonstr. Updated zpaq 1.03 with drt|cmax4.cfg (not in main table),
                  updated zpaq 1.03 cmax_enwik9.
    Sep 18 2009 - updated zpaq 1.03 o0.cfg, o1.cfg, o2.cfg, drt|max_enwik9drt.cfg.
    Sep 23 2009 - added csc31.
    Oct 01 2009 - added zpipe 1.00 (zpaq).
    Oct 07 2009 - added zpaq cbwt_j2.cfg,18.
    Oct 11 2009 - added M03 v0.2a, lzp2 0.7c.
    Oct 13 2009 - added bcm 0.09.
    Oct 15 2009 - added zpaq v1.08 cbwt_slowmode1_1GB_block.cfg.
    Oct 15 2009 - added lz4 0.2.
    Oct 26 2009 - added zpaq v1.09 ocbwt_j1.cfg and corrected memory usage.
    Oct 29 2009 - corrections to Pareto frontier.
    Nov 12 2009 - added durilca'kingsize_4 (new dictionary).
    Nov 27 2009 - added lrzip 0.40.
    Nov 29 2009 - added tests for durilca'kingsize.
    Nov 30 2009 - added tests for durilca'kingsize_4, added lrzip 0.42.
    Dec 07 2009 - added 7zip 9.04a.
    Dec 15 2009 - added zhuff 0.1, bcm 0.10.
    Dec 17 2009 - added M1x2 v0.5-1.
    Dec 29 2009 - updated bcm 0.10.
    Jan 15 2010 - added bpe2 v1, bpe2 v2.
    Jan 17 2010 - updated shindlet link.
    Jan 19 2010 - added kwc.
    Jan 21 2010 - added acb 2.00c.
    Feb 01 2010 - added ulz 0.01.
    Feb 06 2010 - added ulz 0.02.
    Feb 08 2010 - added m1x2 0.6.
    Feb 12 2010 - added bpe, bpe2v3.
    Feb 14 2010 - updated bpe2v3 description.
    Feb 16 2010 - updated srank link.
    Feb 19 2010 - added ppmx 0.05.
    Feb 24 2010 - added szip 1.12a, fixed typos.
    Mar 01 2010 - added flashzip 0.99b8.
    Mar 03 2010 - added nanozipltcb 0.08.
    Mar 30 2010 - added etincelle alpha 3.
    Apr 07 2010 - added bsc 1.0.0.
    Apr 08 2010 - updated bsc 1.0.0.
    Apr 11 2010 - added bsc 1.0.3.
    Apr 23 2010 - corrections to ppmvc, ctxf.
    May 03 2010 - added yzx 0.01, bsc 2.00, fp8_v1, plzip.
    May 10 2010 - added csc32 a2, yzx 0.02, nanozipltcb 0.09.
    May 21 2010 - added yzx 0.03.
    May 27 2010 - added yzx 0.04.
    Jun 06 2010 - added nanozip 0.08a.
    Jun 09 2010 - updated lpaq9m.
    Jun 11 2010 - updated nanozip 0.08a, cmm4 0.2b, 7zip 9.12b (note 42).
    Jun 15 2010 - added bsc 2.20.
    Jun 21 2010 - updated winrk 3.03, ppmonstr J.
    Jun 22 2010 - added bcm 0.11.
    Jun 26 2010 - updated bcm 0.11, drt (lpaq9m).
    Jun 28 2010 - updated paq8hp12any (note 41), bcm link.
    Jul 16 2010 - added zp 1.00.
    Jul 28 2010 - added ppmx 0.06, bsc 2.26. Updated links to pimple2, ocamyd.
    Aug 05 2010 - updated zp 1.00 (zpaq).
    Aug 26 2010 - added lzham alpha 2.
    Aug 30 2010 - added lzham alpha 3.
    Sep 01 2010 - updated lzham alpha 3.
    Sep 26 2010 - added irolz.
    Oct 15 2010 - added st 0.51.
    Nov 02 2010 - added bcm 0.12.
    Dec 16 2010 - added bwtsdc v1.
    Jan 06 2011 - added bsc 2.4.5.
    Jan 23 2011 - added pzpaq 0.01.
    Jan 24 2011 - updated pzpaq 0.01.
    Jan 25 2011 - added lz4 0.6, lz4hc 0.9.
    Jan 31 2011 - added xz 5.0.1.
    Feb 19 2011 - added stz 0.7.2.
    Feb 23 2011 - added ppmx 0.07.
    Mar 02 2011 - added BWTmix v1.
    Mar 04 2011 - added stz 0.8.
    Mar 22 2011 - added csc32 final, zhuff 0.7.
    Mar 23 2011 - added bsc 2.5.0.
    Apr 27 2011 - added snappy 1.0.1.
    May 17 2011 - added crush 0.01.
    May 20 2011 - added zp 1.02.
    May 28 2011 - updated bwtsdc description.
    Jun 01 2011 - added flashzip 0.99c1. updated bcm 0.12.
    Aug 29 2011 - added bsc 3.0.0.
    Aug 30 2011 - corrections to bsc 3.0.0 description.
    Sep 01 2011 - added enwik8.zip and enwik9.zip to textdata.html.
    Sep 27 2011 - added comprox_ba 20110927, comprox_sa 20110927.
    Sep 28 2011 - added dzo beta, comprox_ba 20110928, comprox_sa 20110928.
    Sep 29 2011 - added comprox_ba 20110929, comprox_sa 20110929.
    Sep 30 2011 - added KuaiZip 2.3.2 x86, 7zip 9.20, Info-ZIP 3.00.
    Oct 02 2011 - added lzsr 0.01.
    Oct 10 2011 - added comprox 0.1.1, flashzip 0.99c3.
    Oct 12 2011 - added lz4 v1.2.
    Oct 20 2011 - added xpv5.
    Oct 31 2011 - added flashzip 0.99d1.
    Nov 02 2011 - added M03 v1.1b.
    Nov 05 2011 - added nanozip 0.09a. Added link to enwik8 ranking on compressionratings.com.
    Nov 13 2011 - added zpaq v4.00, merged with zp.
    Nov 24 2011 - added RangeCoderC v1.2.
    Nov 26 2011 - added RangeCoderC v1.3.
    Nov 29 2011 - added zhuff v0.8, RangeCoderC v1.4 and v1.5, link to dark.
    Dec 05 2011 - added RangeCoderC v1.6, v1.7a.
    Dec 09 2011 - added RangeCoderC v1.7.
    Dec 13 2011 - added RangeCoderC v1.8.
    Dec 17 2011 - added zcm v0.01.
    Dec 23 2011 - added zcm v0.02.
    Dec 31 2011 - added ppmx v0.08.
    Jan 01 2012 - updated ppmx v0.08.
    Jan 04 2012 - added yzx 0.11, zcm 0.03.
    Jan 17 2012 - added pigz 2.2.3, updated gzip 1.3.5.
    Jan 24 2012 - added MTCompressor 1.0.
    Jan 26 2012 - added paq8pxd.
    Jan 29 2012 - added TarsaLZP 29 Jan 2012.
    Jan 30 2012 - added zcm v0.04.
    Feb 11 2012 - added paq8pxd_v2.
    Feb 17 2012 - added paq8px_v69.
    Feb 19 2012 - added zcm 0.11.
    Mar 01 2012 - added fbc v1.0.
    Mar 02 2012 - added fbc v1.1. Converted decmprs8, decomp8, decomp8b, all_HKCC, lpaq9* to .zpaq.
    Mar 05 2012 - added crook v0.1.
    Mar 18 2012 - added lrzip 0.612.
    Mar 22 2012 - corrected lrzip options.
    Mar 23 2012 - added data-shrinker 23Mar2012.
    Apr 04 2012 - added zcm 0.20b.
    Apr 11 2012 - added fp8 v2, FreeArc 0.666.
    Apr 19 2012 - added paq8pxd_v3.
    Apr 23 2012 - added paq8pxd_v4.
    May 02 2012 - added zcm 0.30.
    May 15 2012 - added fp8 v3.
    May 16 2012 - added zcm 0.40.
    May 17 2012 - updated zcm 0.40.
    Jun 02 2012 - added zcm 0.50a.
    Jun 12 2012 - changed spelling "Ratushnyak" to "Rhatushnyak" due to name change.
    Jun 17 2012 - added urban.
    Jul 10 2012 - added bsc 3.1.0.
    Aug 05 2012 - added diz.
    Aug 24 2012 - added comprox 0.6.0.
    Sep 01 2012 - added st 0.81.
    Sep 10 2012 - added comprox 0.7.0.
    Sep 11 2012 - updated comprox 0.7.0, added zcm 0.60d.
    Sep 26 2012 - added comprox 0.8.0.
    Sep 27 2012 - added comprox 0.8.0-bugfix1.
    Oct 05 2012 - added flashzip 1.0.0.
    Oct 07 2012 - added comprolz 0.1.0.
    Oct 10 2012 - added lazy 1.00.
    Oct 12 2012 - added TinyLZP 0.1, TinyCM 0.1.
    Oct 14 2012 - updated TinyLZP 0.1, added zcm 0.70b.
    Oct 18 2012 - added comprox 0.9.0, comprolz 0.2.0.
    Oct 21 2012 - added smile.
    Oct 22 2012 - updated smile.
    Oct 23 2012 - added zpaq 6.12.
    Oct 30 2012 - added exdupe 0.3.3 beta.
    Nov 19 2012 - added TarsaLZP 18.nov.2012.
    Nov 20 2012 - updated link to dmc.
    Nov 26 2012 - added comprox 0.10.0, comprolz 0.10.0.
    Dec 12 2012 - added flashzip 1.1.2.
    Dec 17 2012 - added comprox 0.11.0, comprolz 0.11.0.
    Dec 18 2012 - added comprox 0.11.0-bugfix1, comprolz 0.11.0-bugfix1.
    Jan 15 2013 - added lzwc 0.1, lzwc 0.3, lzwc_bitwise 0.7, lzip 1.14-rc3.
    Jan 17 2013 - added plzma_v3p, plzma_v3c.
    Jan 18 2013 - updated plzma_v3b (not v3p), plzma_v3c.
    Jan 23 2013 - added smac 1.8.
    Jan 24 2013 - added zpaq 6.19.
    Jan 31 2013 - added smac 1.9.
    Feb 01 2013 - added WinRAR 4.20.
    Feb 07 2013 - added smac 1.10.
    Feb 24 2013 - added smac 1.11.
    Mar 11 2013 - added smac 1.12a.
    Mar 15 2013 - added pigz 2.3.
    Mar 25 2013 - added smac 1.13.
    Apr 15 2013 - updated bwmonstr description.
    Apr 20 2013 - added smac 1.14.
    Apr 21 2013 - added paq8pxd_v5.
    Apr 30 2013 - added WinRAR 5.00b2.
    May 01 2013 - updated WinRAR 5.00b2.
    May 14 2013 - added lzturbo 1.1.
    May 15 2013 - updated lzturbo 1.1.
    May 16 2013 - added zcm 0.80.
    May 21 2013 - added smac 1.15.
    Jun 04 2013 - added mcm 0.0.
    Jun 13 2013 - added mcm 0.2.
    Jun 18 2013 - added tangelo 1.0 (fp8).
    Jun 22 2013 - added bcm 0.14, zcm 0.88.
    Jun 26 2013 - added zpaq 6.34.
    Jun 27 2013 - updated crush 0.01, added mcm 0.3.
    Jun 28 2013 - updated crush 0.01 description.
    Jun 30 2013 - updated bsc 3.10 description.
    Jul 01 2013 - added crush 1.00.
    Jul 02 2013 - updated crush 1.00.
    Jul 06 2013 - added tangelo 2.0 (fp8).
    Jul 08 2013 - updated tangelo 2.0.
    Jul 11 2013 - added rings 2.0.
    Jul 14 2013 - added bwtdisk 0.9.0.
    Jul 15 2013 - added crushm.
    Jul 17 2013 - added mcm 0.4.
    Jul 20 2013 - added tangelo 2.1.
    Jul 24 2013 - added tangelo 2.3.
    Jul 31 2013 - added smac 1.16, sharc 0.9.5b.
    Aug 01 2013 - updated sharc 0.9.6.
    Aug 20 2013 - added packet 1.0, paq8pxd_v7, zlite.
    Aug 28 2013 - added ppmz2 0.81.
    Oct 14 2013 - added zpaq 6.42, zpaqd 6.32 max5.cfg.
    Oct 16 2013 - added arj 3.10, zpaq 6.42 max6.cfg.
    Oct 28 2013 - added lzf 1.00.
    Oct 30 2013 - added lzf 1.01.
    Nov 01 2013 - added zling.
    Nov 04 2013 - added smac 1.17.
    Nov 19 2013 - added smac 1.17a.
    Dec 10 2013 - added smac 1.18, packet 1.1, packARC 0.7RC11, mtari 0.2.
    Dec 11 2013 - added cm0_ext (includes cm0, cm1, bwcm).
    Dec 12 2013 - added sharc 0.9.10.
    Dec 13 2013 - added sharc 0.9.11b.
    Dec 14 2013 - updated sharc 0.9.11b description.
    Dec 19 2013 - added smac 1.19.
    Dec 26 2013 - added zling Dec-25-2013.
    Jan 02 2014 - added lzv 0.1.0.
    Jan 08 2014 - added doboz 0.1.
    Jan 17 2014 - added smac 1.20.
    Jan 21 2014 - added zling Jan-21-2014.
    Jan 23 2014 - added cm4_ext.
    Feb 04 2014 - added zhuff 0.95b, 0.97 beta, alba 0.1.
    Feb 05 2014 - updated alba 0.1.
    Feb 06 2014 - added alba 0.2.
    Feb 10 2014 - added lzss 0.2.
    Feb 11 2014 - updated lzss 0.2.
    Feb 17 2014 - added RH, RH2.
    Feb 18 2014 - added alba 0.5.1.
    Feb 22 2014 - added ksc.
    Feb 27 2014 - added RH2 20Feb2014.
    Mar 02 2014 - added zling (libzling) 20140219.
    Mar 10 2014 - added tornado 0.6.
    Mar 15 2014 - added freearc 0.67a.
    Mar 23 2014 - added RH4_x64 22Mar2014.
    Mar 24 2014 - added libzling 20140324.
    Mar 25 2014 - added ppmx 0.09, zpaq 6.50.
    Apr 01 2014 - added tree 0.1.
    Apr 02 2014 - updated tree 0.1.
    Apr 04 2014 - updated tree 0.1.
    Apr 14 2014 - added libzling 20140414.
    Apr 16 2014 - added cmix v1.
    Apr 28 2014 - added tree 0.3.
    Apr 29 2014 - added RH4 24Apr2014.
    May 04 2014 - added zcm 0.90.
    May 05 2014 - added zling (libzling) 20140430-bugfix.
    May 12 2014 - updated gzip124hack description and link.
    May 16 2014 - added zcm 0.92.
    May 27 2014 - added tree 0.4, tree 0.5.
    May 29 2014 - added cmix v2.
    Jun 02 2014 - added lza 0.01.
    Jun 18 2014 - added paq8pxd_v8.
    Jun 27 2014 - added cmix v3.
    Jun 29 2014 - added paq8pxd_v10.
    Jun 30 2014 - added lza 0.10.
    Jul 05 2014 - added lza_x64 0.10.
    Jul 06 2014 - added tree 0.9.
    Jul 07 2014 - added zcm_x64 0.92.
    Jul 09 2014 - updated zcm_x64 0.92.
    

    This page is maintained by Matt Mahoney, mattmahoneyfl (at) gmail (dot) com.