PAQ is a series of open source data compression archivers that have evolved through collaborative development to top rankings on several benchmarks measuring compression ratio (although at the expense of speed and memory usage). This page traces their development. All versions may be downloaded here (GPL source, Windows and Linux executables). Latest well supported versions.
Large Text Compression Benchmark
Benchmarks on Calgary corpus
PAQ benchmarks (solid archive)
WRT dictionary benchmarks
Calgary Corpus Challenge
Contributors (each listed oldest to newest):
  Matt Mahoney, Serge Osnach
    Neural Network Compression (includes AAAI paper)
    PAQ1 (includes an unpublished paper)
    PAQ6 (and technical report)
    PAQ7 archiver
    PAQ8A, PAQ8F, PAQ8L, PAQ8M, PAQ8N
  Berto Destasio
  Johan de Bock
  David A Scott
  Fabio Buffoni
  Jason Schmidt
  Alexander Rhatushnyak (PAQAR, PAQ8H, PAQ8HP1-12)
  Przemyslaw Skibinski (WRT, PAsQDa, PAQ8B,C,D,E,G)
  Rudi Cilibrasi (raq8g)
  Pavel Holoborodko (PAQ8I)
  Bill Pettis (PAQ8JD, PAQ8K)
  Serge Osnach (PAQ8JB)
  Jan Ondrus (PAQ8FTHIS2)
The most recent paper describes PAQ6 and its derivatives PAsQDa and PAQAR as of 2005. The compressors use context mixing: a large number of models estimate the probability that the next bit of data will be a 0 or 1. These predictions are combined and arithmetic coded (similar to PPM). In PAQ6, predictions are combined by weighted averaging, then adjusting the weights to favor the most accurate models.
M. Mahoney, Adaptive Weighing of Context Models for Lossless Data Compression, Florida Tech. Technical Report CS-2005-16, 2005.
PAQ7 and later differ mainly in that model predictions are combined using a neural network rather than by weighted averaging. This is described in more detail in the paq8f.cpp comments.
See also the Wikipedia article on PAQ.
The Calgary corpus benchmarks have not been maintained
since about 2005 except for PAQ versions. Timing tests were done on a
now dead computer. Recent benchmarks.
Test results are shown for the Calgary corpus (14 individual files,
or concatenated into a single file of 3,141,622 bytes) on a 750 MHz
Duron with 256 MB of memory under Windows Me. All options are set for
maximum compression (generally slower) within 64 MB of memory (which
limits compression on many of the better programs) unless indicated
otherwise. Programs are ordered by increasing compression on the
concatenated corpus.
For sources to many programs, see
ftp://ftp.elf.stuba.sk/pub/pc/pack/.
Notes: slim does not have options to limit memory usage. slim
caused disk thrashing on my 256 MB PC, which was eliminated by using
-d16, with no loss of compression.
rkc (with -td+ option), durilca, compressia, and WinRK use English
dictionaries (marked with "D").
For programs that are not archivers (compress, gzip, epm, durilca, rkc,
ash),
the 14 file test size is the total size of 14 compressed files rather
than the size of the archive (so grouping similar files in a tar file
first might improve compression).
ash /m64 (64 MB memory) compresses poorly on the concatenated corpus
(about 1.2 MB) so I posted the result for unlimited memory. I didn't
try all the options to see which got the best compression.
Increasing WinRK 1.0 memory to 224 MB or PPM order from 16 to 32
does not improve compression.
The following are available below. Compressed size for the
concatenated corpus is always about 150-200 bytes smaller (due solely
to the archive header), and compression time is about the same.
Decompression time below is about the same as compression time, although
for some programs above (like gzip), decompression may be faster.
*Timed on an AMD 2800+ with 1 GB memory by Werner Bergmans. Times are
approximated for 750 MHz by multiplying by 3.6, the approximate
ratio of run times on both machines.
Times marked with "t" denote some disk thrashing.
**Tested on a PIII 500 MHz by Leonardo (run times not adjusted).
***Tested on a 2.2 GHz AMD-64 (in 32 bit XP); times adjusted by multiplying by 4.17.
D = Uses external English dictionary.
WRT11
is a word replacing transform preprocessor written by
Przemyslaw Skibinski. It replaces words with 1-3
byte symbols using an external dictionary. Run times include the
3 seconds to run WRT.
WRT20
was released Dec. 29, 2003.
WRT30
(generic dictionary) +
d2
dictionary (tuned to Calgary corpus as with WRT11-20) was released
Jan. 29, 2004. Results below:
Some improvement is possible by compressing the four binary files
separately and the text files as a solid archive. For example,
PAQ6 -6 and WRT20 + PAQ6 -6 each compress about 5K smaller. Savings
are similar for other PAQ and WRT versions.
PAsQDa 2.0 integrates PAQAR 4.0 with WRT and file reordering to
compress to 577,404 bytes; later versions improve on this.
To restore the Calgary corpus:
The source for d.cpp is released under the GNU General Public License.
(It doesn't say so because there are no comments).
It is a stripped down version of PAQC that does decompression only.
PAQC can also be used as a general purpose archiver, although the
compression is usually not quite as good as PAQ6.
(PAQC differs mainly in an improved model for pic.) Use the
compression option -1 (default) for text, -2 for CCITT images, or -3
for other binary files. The program uses 190 MB memory.
On Apr. 2, 2004, Alexander Rhatushnyak submitted an entry of
637,116 bytes
using a modified version of PAQ (paqar -6).
He improved this to 619,922 bytes on Apr. 25, 2004,
to 614,738 bytes on May 19, 2004,
to 610,920 bytes on June 24, 2004,
and to 609,650 bytes on July 12, 2004.
The table below compares the
compression with paq6-emilcont-blaster (paq6eb -5), which was the best
available version of PAQ at the time of get637
(paq6eb -6 should compress about
500 bytes smaller but thrashes the disk on my 256 MB PC).
The corresponding compressor (source and executable) for get614
is PAQAR 1.0 (use -6 option).
The corresponding compressor for get610 is
PAQAR 2.0 -6.
Przemyslaw Skibinski submitted a challenge entry, pc.ha,
of 603,416 bytes on Apr. 4, 2005. The decompressor appears to be a
variant of PAsQDa with a tiny built-in dictionary, and the data is a
single archive of 592,486 bytes. Alexander Rhatushnyak improved this
to 596,314 bytes (cc 596) on Oct. 25, 2005, to 593,620 bytes on
Dec. 3, 2005, and to 589,862 bytes on June 5, 2006.
The actual 589,862 byte entry is the two files prog.pmd and c.dat in
cc589.zip, not the zip archive. The size is calculated by adding the length
of the data file (c.dat), plus 1 byte for the terminator and 3 bytes
for the size. prog.pmd is a PPMd var. I archive containing the
decompressor C++ source code and two include files.
cc580 (580,170 bytes) was submitted by
Alexander Rhatushnyak on July 2, 2010. The code appears to be
based on PAQ8 rather than PAQAR, but is organized as above.
These programs trace the historical development of the PAQ series of
archivers. I don't maintain this code, so if it doesn't work on your
compiler you will have to fix it yourself. These programs all work
like PAQ6 except that there are no options in the older programs.
PAQ1SSE/PAQ2 and PAQ3N are by Serge Osnach. Other versions are
by Matt Mahoney. Additional contributors after the release of PAQ6
are listed separately.
All of the compressors on this page work the same way.
These papers describe how the programs work.
PAQ1 uses a combination of models, the most important of
which is a nonstationary context-sensitive bit predictor (but
no neural network).
It gives better compression than stationary models such as PPM
or Burrows-Wheeler on data where the statistics change over time
(such as concatenated files of different types).
paq1.exe Windows executable, requires 64 MB memory.
Originally posted Jan. 6, 2002. Last updated Jan. 21, 2002 to
use a Borland executable (rather than DJGPP), since it's smaller,
and to fix some bugs. Run time is the same and the archives are compatible.
Paper: The PAQ1 Data Compression Program (draft), PDF.
Jan. 20, 2002, revised Feb. 28, Mar. 2, Mar. 6, and Mar. 19.
paq1.cpp source code and documentation. Updated
Jan. 21, 2002 to fix bugs and port to Borland (does not affect archive
compatibility).
To compile: g++ -O paq1.cpp
If you want to modify the code, you might need
stategen.cpp which generated some of the source
code (the state tables for type Counter). Updated Jan. 20, 2002.
paq2.cpp source code. The source of PAQ2 is PAQ1SSE which can be found at
compression.graphicon.ru/so/
(in Russian). The only changes are to rename the program and to give
credit in the banner. Unfortunately this makes the archives incompatible
because the 4th byte of every archive is changed from "1" to "2".
(I changed it because PAQ1 and PAQ2 archives are genuinely incompatible
and I wanted both programs to give a sensible error message).
paq3.cpp source code.
paq3.exe executable for Windows, compiled with
g++ -O (DJGPP 2.95.2) and packed with
UPX on 9/2/03.
paq3a.exe for Pentium 4, AMD Athlon, or higher.
Compiled with VS .net 7.1 and packed with UPX.
Runs 10% faster than paq3.exe.
(Compiled by Jason Schmidt, 9/6/03).
paq3b.exe compiled
with Intel 7.1 using the "release" and "whole program optimization" options,
and packed with UPX.
It is about 10% faster than paq3a.exe on his 1600 MHz Athlon XP, but
about the same speed as paq3a on my 750 MHz Duron.
(Compiled by Jason Schmidt, 9/18/03).
paq3c.exe compiled with Intel 8.0 (beta).
The smallest (37,376 byte executable) and fastest.
(Compiled by Eugene D. Shelwien, 9/20/03).
All executables are archive compatible. I recommend paq3c.exe.
All of the following Windows executables are archive compatible:
PAQ4 mixes models using adaptive rather than fixed weights, and also
includes an improved model for data with fixed length records. This
is all explained in the source code.
paq4v2.cpp Source code (ver. 2, Nov. 15, 2003)
Version 2 fixes a bug in which some files were not decompressed correctly
in the last few bytes. It will correctly decompress files compressed
with either PAQ4 or PAQ4V2. Version 1 is given below for reference only.
(Thanks to Alexander Rhatushnyak for finding the bug).
paq4.cpp Source code (Oct. 16, 2003)
PAQ5 has some minor improvements over PAQ4, including word models
for text, models for audio and images, an improved hash table, dual
mixers, and modeling of run lengths within contexts. It uses about 186 MB
of memory. Updated Dec. 18, 2003.
paq5.cpp source code, includes a more detailed
description.
The main improvement in PAQ6 over PAQ5 is in the context counter states.
When counting 0 and 1 bits in a context, it more
aggressively decreases the opposite bit count, and gives greater weight
to counts when there is a large difference between them. It also includes
models for .exe/.dll files and CCITT images. See the source code comments
for details.
PAQ6 is an archiving data compression program for most operating
systems including Windows, UNIX, and Linux. It ranks among the
top archivers for data compression, at the expense of speed and
memory. (A derived version has won the Calgary
Challenge). PAQ6 should be considered experimental, as I expect future
improvements. The purpose of the program is to foster the development
of better data models and algorithms.
These programs were developed with the help of many people.
They are open source and are free under terms of the
GNU General Public License.
To create a new archive, you specify the name of the archive on
the command line, and the files you want to compress, either after the
archive name or from standard input. Wildcards are not expanded in
Windows, so you can use dir/b to get the same effect. For
example, to compress all .txt files into archive.pq6
To view the contents of an archive:
PAQ6 (but not earlier versions) includes an option to trade off
compression vs. memory and speed. To compress:
There are no decompression options. Instead, the compression option
stored in the archive is used,
which means that the decompressor needs the same amount of
memory as was used to compress the files.
There are no options to add, update, or extract individual files.
You have to create or extract the entire archive all at once.
File names are stored and extracted as they are entered. Thus, if
you enter the file names without a directory path (which I recommend),
then they will be extracted to the current directory. The archive
does not store timestamps, permissions, etc., as these can't be
done portably.
paq6v2.cpp source code (Jan. 8, 2004)
If you want to modify the state tables in the source code, you will
need stgen6.cpp.
PAQ6V2 is a replacement for PAQ6, which incorrectly decompresses some
small files (those that compress smaller than 4 bytes). PAQ6V2 will
correctly decompress files made by either version. Compression produces
identical archives so the benchmarks below for PAQ6 are valid.
See the bottom of this page for variants that improve on PAQ6 slightly.
Note that all versions are archive incompatible with each other unless noted.
paq6.cpp Source code, fully documented,
Dec. 30, 2003
paq6.exe, Windows executable, compiled using
Intel 8 + UPX, the fastest version in my
tests, compiled by Jason Schmidt, Dec. 31, 2003.
Non-Windows users can compile as follows:
The Windows executables below are slower but are archive compatible.
These are included for benchmarking purposes only.
paq6a.exe, DJGPP g++ 2.95.2 + UPX
PAQ7 is a complete rewrite of PAQ6 and variants (PAQAR, PAsQDa).
Its compression ratio is similar to PAQAR's, but it is 3 times faster. However, it
lacks an x86 model and a dictionary, so it does not compress Windows executables and
English text files as well as PAsQDa. It does include models for
color .bmp, .tiff, and .jpeg files, so compresses these files better.
The primary difference from PAQ6 is it uses a neural network to combine
models rather than a gradient descent mixer.
paq7.exe Windows executable, g++ compile (76,288 bytes, Dec. 24, 2005)
To use:
Tested under 32-bit Windows (g++, Borland, Mars under Me and XP),
64-bit Linux, and Solaris (Sparc). For non-Windows, see source code comments
to compile.
In Windows only the g++ version accepts wildcards in file names.
Note: when reading file names by piping DIR/B be sure the archive is not
in the directory you are compressing or else PAQ7 might try to compress (part of)
itself. Either put the archive in another directory or give the
archive a different extension than the files you are compressing like this:
paq7pp.exe is compiled with NASM 0.98.38, MinGW C++ 3.4.2, and UPX 1.24w as follows.
Executable size is 30,208 bytes.
The x86 model uses a preprocessor which is tested for correct decompression during compression.
If this fails, then the preprocessor is bypassed and compression is still correct.
Options are -0 (18 MB memory) to -9 (4 GB). -0 is faster than other options, and is the default.
-4 uses 115 MB. Each increment doubles memory usage.
paq8a.exe Windows executable (Pentium Pro or newer), Jan. 27, 2006.
To install in Windows, put paq8f.exe or a shortcut on the desktop.
To compress a file or folder, drop it on the icon. An archive with
a .paq8f extension is put in the same folder as the source. To
extract, drop the compressed file on the icon.
From the command line use as follows:
paq8f has a more robust detector for x86 preprocessing.
Rather than depend on the file name extension (.exe, .dll...) or
"MZ" in the header, it tries the E8E9 transform and tests if it
helps compression. This allows it to detect Linux executables
and reject 16-bit Windows executables. It divides the input file
into blocks and will not use the transform on non-executable data
within the file. Like earlier versions, the transform is tested
at compression time for correct decompression, and abandoned
if it fails. No user intervention is required.
paq8f uses a new indirect context model that improves compression
on most files, text and binary. For example, given a string
"AB...AC...AB...AC...AB...A?" it guesses "C" based on the previous
observation that "C" followed "BCB" after the first 3 occurrences of "A".
This is an example of an order (1,3) indirect context. paq8f also
models orders (1,1), (1,2), (2,1) and (2,2).
paq8f.exe Windows executable, g++ compile (Pentium Pro or higher), Feb. 28, 2006
Update Nov. 21, 2006. Updated the wording of the copyright notice (GPL).
There is no change to the code or the license. It is recommended that
all future versions should use this wording.
Update Nov. 22, 2006. paq-8f.zip
and paq-8f.tar.gz (Nov. 23, 2006)
UNIX/Linux source distribution prepared by Jari Aalto.
Update Dec. 15, 2006. paq-x86_64.tgz
x86_64 Linux port of paq8f by Matthew Fite. Also as a
patch. The updated assembler code
paq7asm-x86_64.asm in paq-x86_64.tgz assembled with YASM
should work with any version of PAQ that uses paq7asm.asm, which includes
all versions of paq7, paq8, and paq8hp* under Linux on X86_64
processors. It replaces MMX code with 64 bit SSE2 code.
Update Jan. 19, 2007. The assembler code above does not work and has
been replaced.
paq8f.zip and paq8jd.zip use
new assembler code, which can be linked to any paq7/8 version with no changes
to the C++ code. The 64 bit Linux versions are archive compatible with the
Win32 versions but about 7% faster on an Athlon 64.
Update Jan. 30, 2007. Added 32-bit SSE2 assembler code by wowtiger
for Pentium 4.
Update Feb. 2, 2007. Added 32-bit Linux executables (by Giorgio
Tani) to paq8f.zip and paq8jd.zip. The archives contain source
and executables for Win32 for Pentium-MMX or higher, Win32 for
Pentium 4 or higher, and 32 and 64 bit Linux executables, and all
source code. (updated readme.txt on Feb. 12, 2007).
paq8l, Mar. 7, 2007, improves on paq8jd by adding a DMC model
and removing some redundant models in SparseModel, plus minor tuneups
and documentation fixes.
paq8m, Aug. 4, 2007, is paq8l with the improved
JPEG model from paq8fthis by Jan Ondrus. The JPEG model includes a bug
fix (it crashed on some malformed JPEG files), and some speed optimization
of the DCT/IDCT code. However, JPEG compression is still slower than paq8l.
The program will now report errors in case of malformed JPEGs, but they are
harmless.
Note: paq8m still crashes on one of the JPEG images in the private MFC compression
test from maximumcompression.com. paq8l does not have this problem.
Benchmarks with -6 option (files from maximumcompression.com)
on a 2.2 GHz Athlon-64, 2 GB, Win32:
These large-memory variations by Berto Destasio improve on PAQ4 and PAQ5.
paq4-emilcont-duritium.exe is
a large memory version (about 364 MB) of PAQ4v2 by Berto Destasio which takes
first place on his benchmark as of Nov. 22, 2003.
It's not compatible with any other version. I did not test this on the
Calgary corpus because my PC has only 256 MB memory.
Also, from examining the source code at
paq4v2-emilcont-duritium.cpp,
I believe there is a bug in the random number generator that could cause
decompression errors.
The program uses modified counter state transition tables, generated with
stategen-emilcont.cpp
paq5-emilcont-deuterium.cpp
(needs 168 MB), Dec. 26, 2003, tuned from PAQ5. The bug in the random number generator
is fixed. It also includes improvements from pre-release versions of PAQ6 that I
sent him. PAQ6 improves on these, however.
paq6-emilcont-jackdamarioum.cpp
(needs 344 MB), Dec. 29, 2003 Adds a new sparse model (SparseModel2) to paq606fb.
paq6-emilcont-febas.cpp, Mar. 28, 2004. No source code yet.
paq6-emilcont-anny.exe, Mar. 30, 2004
paq6-emilcont-anny-607fb.exe, Apr. 1, 2004
paq6-emilcont-blaster.cpp, Apr. 7, 2004
Versions derived from paq6ebb.cpp. Compiled by Jason Schmidt,
Apr. 18, 2004. (Add "using namespace std;" to the .cpp file to compile.)
paq6-emilcont-destroyer.cpp, Apr. 12, 2004
paq6-emilcont-annyhilator.cpp, Apr. 12, 2004
paq6-emilcont-harlock.cpp, Apr. 15, 2004
paq6-emilcont-italia, May 2, 2004
The newest versions of Emilcont can be found at
http://www.freewebs.com/emilcont/index.htm
PAQ6eb compiled by Johan De Bock contains 2 minor changes to
paq6-emilcont-blaster to compile with the
Intel 8 compiler (added "using namespace std;" and corrected the
line "CounterMap t0, t1, t2, t3, t4, t5, t6,;"). It is otherwise
identical to paq6-emilcont-blaster but about 40% faster.
paq6eb.cpp, Apr. 8, 2004
PAQ6ebb is a version of PAQ6eb that reports compression progress as it runs.
This replaces a version posted Apr. 9 which had a bug and was removed.
paq6ebb.cpp, Apr. 10, 2004
PAQ6v2ds is a variant of PAQ6v2 by David A. Scott that uses
64 bit arithmetic encoding.
It improves compression by about 0.05% over PAQ6v2, but is about 3% slower.
The compiler must
support the unsigned long long type (e.g. g++ and some others).
All of the PAQ6 variants from here on accept the same compression
options as PAQ6.
paq6v2ds.cpp, Jan. 17, 2004
PAQ6fdj2 is a variant of PAQ6fdj that has about the same performance but
includes an integrity check during decompression. It uses a CACM
arithmetic coder which compresses very close to the Shannon limit.
(See Moffat, A., Neal, R. M., Witten, I. H. (1998),
Arithmetic Coding Revisited,
ACM Trans. Information Systems, 16(3) 256-294).
paq6fdj2.cpp, bit_byts.cpp, bit_byts.h (source, Jan. 20, 2004)
PAQ32 is a variant of PAQ6fdj2 that returns the encoder to 32 bits
for a bit more speed. Compression is nearly identical to PAQ6fdj2
(since there is no point in using higher precision with a CACM coder).
paq32.cpp, bit_bytm.cpp, bit_bytm.h (source, Jan. 24, 2004)
paq6fb.cpp, Jan. 19, 2004
PAQ601 includes a new mixer, some word model changes and some
SSE context changes. It uses the original PAQ6 arithmetic coder.
paq601.cpp, Jan. 24, 2004.
PAQ603 is a version that uses David Scott's 32 bit CACM coder.
paq603.cpp, bit_bytm.cpp, bit_bytm.h (Jan. 25, 2004)
PAQ605fb: new recordmodel, changes to state table,
minor changes and fine tuning. Includes CACM coder all in one file.
paq605fb.cpp, Jan. 30, 2004.
PAQ606fb contains minor changes.
paq606fb.cpp, Mar. 15, 2004.
PAQ607fb: several tuning changes (state table, SSE, charmodel,
sparsemodel), new recordmodel, extended mixer, modified sparsemodel2,
5% slower than paq606fb. Memory usage: -6 = 206 MB, -7 = 412 MB, -8 = 824 MB.
(Has a bug).
paq607fb.cpp, Mar. 30, 2004
paq6fdj.cpp, Jan. 19, 2004
This variant of PAQ601 includes David Scott's 64 bit coder from PAQ6fdj2.
paq602.cpp, Jan. 25, 2004.
This uses his 32 bit CACM coder.
paq604.cpp, bit_bytm.cpp, bit_bytm.h (Jan. 25, 2004)
PAQ605fbj adds sparse record and word models to PAQ605fb. Memory usage
is 20% higher than stated in the help message.
paq605fbj.cpp, Jan. 30, 2004
These variants add even more models for a slight improvement at the
cost of speed and memory. The -5 option works with 256 MB memory but -6
does not.
paq6fbj8.cpp, Feb. 20, 2004
paq6fbj9.cpp, Feb. 20, 2004
Versions derived from paq6-emilcont-destroyer with changes to the
counter state tables, one extra CharModel order, and a minor change to
RecordModel2. VarB also adds sparse word modeling out to 12 words,
and is somewhat slower and
takes more memory than VarA, but gives better compression.
paq6ed-schmidtvara.cpp, Apr. 19, 2004
paq6ed-schmidtvarb.cpp, Apr. 19, 2004
PAQAR 1.0a is the compressor producing the files for get614.ha, the
top entry to the Calgary Challenge (614,738 bytes including the
decompressor) as of May 19, 2004. It also works as a general purpose
compressor and is the first PAQ version to take the #1 spot in the
Maximum Compression benchmark.
It uses 240 MB but will run very slowly on a 256 MB machine due to
disk thrashing (3.5 hours). With more memory it should take about
20 minutes (750 MHz).
To compile in g++ I had to add "#include <cstdio>"
and fix 2 old style for-loop scoping problems. (I did not change
the posted version, however).
Source and .exe (RAR archive)
PAQAR 1.1 improves compression and uses slightly less memory.
paqar1_1.rar, May 22, 2004
PAQAR 1.2 accepts the option -Ne
(e.g. -6e) to improve compression of x86 code (.exe, .dll files).
Source and .exe (RAR archive)
PAQAR 1.3
Source and .exe (RAR archive)
PAQAR 2.0
Source and .exe (RAR archive)
PAQAR 3.0 compresses the Calgary corpus to 603,375 bytes as follows:
Source and .exe (RAR archive)
PAQAR 4.0 compresses the Calgary corpus to 602,556 bytes as follows (GET609 order):
Source and .exe (RAR archive)
PAQAR 4.1 has a bug fix in the x86 preprocessor that caused
some 16-bit executables to decompress incorrectly when used
with the -e option in earlier versions. This bug also occurred
in PAsQDa versions prior to 4.3b. Calgary corpus results are
the same as 4.0.
Source and .exe (RAR archive, Dec. 12, 2005)
PAQAR differs from PAQ6 as follows (see whatsnew.txt in the distribution):
PAQ7PLUS 1.11 combines the models from PAQ7 (includes .bmp, .tif, .jpg,
mixed with neural network) with the state table,
arithmetic coder, English dictionary and TE8E9 x86 preprocessor from PAsQDa.
Use with options -0 through -4 (low to high memory) or -0e to -4e to compress
.exe or .dll files. Speed is about the same for all options (like PAQ7).
paq7plus.rar
PAQ7PLUS v1.19 - small improvements over v1.11,
posted Jan. 23, 2006.
PAQAR 4.5 and PAQARCC 4.5 will probably be the last versions based on the PAQ6 core,
with nothing from PAQ7 or PAQ8.
PAQ8H is based on PAQ8G with some improvements to the model. Released Mar. 22, 2006, updated Mar. 24, 2006.
PAQ8HP1 through PAQ8HP6 are specialized for the Hutter prize (text), and
lack models for binary data. They are not benchmarked here. See the
large text benchmark.
pasqda10.zip
(source, Windows .exe and dictionary)
PAsQDa 2.0 combines WRT with PAQAR 4.0 and also reorders the input
files to improve compression.
pasqda20.zip
(source, Windows .exe and dictionary), Jan. 24, 2005
PAsQDa 2.1 - on non-text files, does not use the dictionary and automatically
restarts PAQ model. -Ne (-1e to -9e) on .exe/.dll files works like in PAQAR.
pasqda21.zip
(source, Windows .exe and dictionary), Jan. 31, 2005
PAsQDa 3.0 - word model is optimized for the preprocessor. During
compression of Calgary corpus, book2 becomes a predictor for textual
files (which increases the memory requirement).
pasqda30.zip
(source, Windows .exe and dictionary), Feb. 7, 2005
PAsQDa 4.0 - new dictionary and other improvements.
pasqda40.zip,
Apr. 4, 2005.
PAsQDa 3.9 - uses less memory than 4.0.
pasqda39.zip,
Apr. 7, 2005.
PAsQDa 4.1 - includes a version optimized for the Calgary
corpus - PAsQDaCC.
pasqda41.zip,
July 1, 2005
PAsQDa 4.1b is a bug fix for 4.1.
Version 4.1 fails to correctly decompress the word "bulandsness".
Thanks to Alexander Rhatushnyak for finding the bug.
pasqda41b.zip,
Oct. 13, 2005
PAsQDa 4.2 has 2 bug fixes. First, it fixes a bug in PAsQDa 4.1b
that incorrectly decompressed text files ending with a space character
(no trailing newline). Second, it fixes a bug in the x86 exe preprocessor
TE8E9 that incorrectly decompressed some 16-bit executables.
(Thanks to Alexander Rhatushnyak for finding both bugs and fixing the
x86 bug). Additional features:
pasqda42.zip, Dec. 8, 2005
PAsQDa 4.3 adds 2 more options. Intel compiles by Johan de Bock.
pasqda43.zip, Dec. 7, 2005
PAsQDa 4.3b fixes another bug in executables compressed with -e in
version 4.3. No changes in benchmarks.
pasqda43b.zip, Dec. 14, 2005
PAsQDa 4.3c fixes a bug in 4.3b that caused files ending in a punctuation
character such as , or ! to decompress incorrectly.
pasqda43c.zip, Dec. 21, 2005
PAsQDa 4.4 has improved file type detection and improved
compression on foreign language text.
pasqda44.zip, Jan. 4, 2006
PAQ8A2 adds WRT dictionaries to PAQ8A (Feb. 7, 2006).
PAQ8B replaces PAQ8A2 (which was a pre-release I wasn't supposed
to post). It is faster (Intel 8 compile by Johan De Bock), has
improved file detection, and fixes a bug in PAQ8A and PAQ8A2 where
it was leaving temporary files behind. To install, put paq8b.exe in
your PATH and put the 7 wrt*.dic files in a subdirectory
TextFilter under the directory where you put paq8b.exe.
paq8b.zip, Feb. 8, 2006
PAQ8C
paq8c.zip, Feb. 12, 2006
PAQ8D
paq8d.zip, Feb. 15, 2006
PAQ8E
paq8e.zip, Feb. 23, 2006
PAQ8G is PAQ8F with dictionaries added. However, it uses the
same user interface as older PAQ versions (no drag and drop).
Additional improvements:
paq8g.zip (source, Windows and Linux executables, Mar. 3, 2006).
Update: Aug. 22, 2006. I added paq8ib.exe to the archive. This is a Borland 5.5
compile of the same code to fix a bug (also in paq8g and paq8h) that causes the program
to crash on some text files when compiled with MINGW 3.4.2 g++ -O. The bug does not occur
when compiled with Borland, VC++, or Intel C++, or with g++ without optimization.
However, paq8ib.exe is about 20% slower than paq8i.exe. No source code was
changed but a file "vector" was added. They were compiled:
Update: Sept. 4, 2006. paq8ib.exe crashes on most files, so I removed it.
I added paq8idmc.exe, compiled with Digital Mars 8.38n, which appears to work.
The original g++ compile is named paq8igcc.exe. I changed one line of paq8i.cpp
from #include "vector" to #include <vector>.
The Mars compile is 12-14% slower than the gcc compile. To compile in Mars:
Update: Sept. 13, 2006. paq8i_cleaned.zip
is a "cleaned up" version of the source code with a Mars 8.49 compile,
by Michael Adams. It splits up the source code, strips out
inline targets, and fixes some warnings. It is archive-compatible with other
paq8i versions.
paq8j
(Nov. 13, 2006) is based on paq8f with model improvements from paq8hp5,
but without dictionaries. It uses the paq8f drag and drop interface.
paq8jd (Dec. 30, 2006) (linked above) is based on paq8jc with additional
APM (SSE) stages.
Update (Jan 19, 2007). Ported paq8f and
paq8jd to AMD64 Linux. The zip files contain
source code (C++ and 32 and 64 bit NASM/YASM assembler) and Win32 and Linux-x86_64
executables. The new paq7asm-x86_64.asm (using 64 bit SSE2 code in YASM)
can be linked to any paq7/8 version with no changes to the .cpp file.
Update (Jan 30, 2007). Added SSE2 assembler source code by wowtiger
for 32-bit Pentium 4 or higher
to the paq8f.zip and paq8jd.zip
downloads. The code should work with any paq7/8 version.
Speed is improved by about 1%. A Win32 paq8jdsse.exe is included in paq8jd.
paq8k,
Feb. 13, 2007.
(See also PAQ1SSE and PAQ3N).
paq8ja (Nov. 16, 2006)
improves the sparse model of paq8j for better
compression of binary and some text files. The model groups bytes in 6 categories
(letters, punctuation, etc) and uses up to order-11 contexts.
paq8ja uses the drag and drop interface of paq8j.
paq8jb (Nov. 21, 2006)
adds a distance model, using context of distance back to an anchor
character (x00, space, newline, xff) combined with previous
characters. Win32 compiled with VS2003.
Update, Nov. 23, 2006. paq8jbb.zip by
Andrew Paterson fixes some minor bugs (memory leaks) identified by Borland
CodeGuard. It maintains compatibility with paq8jb. It also includes a
Borland .exe, although it is slower than the VS compile.
paq8jc (Nov. 28, 2006) includes paq8jbb bug fixes,
improvements to the record model and minor tuneups.
paq8fthis (July 27, 2007) is paq8f with
improved JPEG compression.
paq8fthis2 (Aug. 12, 2007) further
improves JPEG compression, is faster, and fixes a bug that caused
paq8fthis to crash on some malformed JPEG data (e.g. JPEG fragments
in some Thumbs.db files).
Matt Mahoney, mmahoney@cs.fit.edu
Benchmarks
Calgary Corpus
Program             Options                          14 files  Seconds  Concatenated  Memory
-------             -------                          --------  -------  ------------  ------
compress                                            1,272,772      1.5     1,318,269
pkzip 2.04e                                         1,032,290      1.5     1,033,217
gzip 1.2.4          -9                              1,017,624        2     1,021,863
bzip2 1.0.0         -9                                828,347        5       859,448
winhki v1.3e free   (hki1 max)                        830,315        6       852,745
7zip 3.11           a -mx=9                           822,059       20       821,872
sbc 0.910           c -m3                             740,161      4.1       819,016
GRZipII 0.2.2       e                                 768,609      4.5       794,045
GRZipII 0.2.4       e                                 773,008      3.9       793,866
sbc 0.970r2         -b8 -m3                           738,253      5.5       784,749
ppms                e                                 765,587        4       774,072
acb                 u                                 766,322      110       769,363
boa 0.58b           -m15                              751,413       44       769,196
winhki 1.3e reg     (hki2 max)                        752,927       14       768,108
winrar 3.20 b3      best, solid                       754,270        7       760,953
ppmd H              e -m64 -o16                       744,057        5       759,674
rk 1.04             -mx3 -M64 -ts                     712,188       36       755,872
ppmd J              e -o16 -m64                       756,763   5.5***       753,848
rk 1.02 b5          -mx3 -M64 -ts                     707,160       44       750,744
ppmn 1.00b N1       e -O9 -M:50 -MT1                  716,297       23       748,588
enc v0.15           a                                 724,540      251       739,052
ppmonstr H          e -m64 -o1                        719,922       13       736,899
rkc                 a -M80m                           685,226       87       710,125  80 MB
ppmonstr Ipre       e -m64 -o128                      696,647       35       703,320
epm r7              c -m64                            693,538       49       702,612
durilca v.03a       e -m64 D                          696,789       29       696,845
                    (as in READ_ME.TXT) D             647,028       35
rkc                 a -M80m -td+ D                    661,102       91       695,900  80 MB
ash cn-04 sse-9A9   /s64                              709,837      109       694,527  387 MB
epm r9              c                                 668,115       54       693,636  ?
slim 16             a -d16                            662,991      139       686,796  ?
slim 17             a                                 661,333      141       681,714  ?
slim 18-19          a                                 659,358      153       678,898  ?
slim 20             a                                 659,213      159       678,880  ?
slim 21             a                                 658,494      156       678,652  ?
durilca v.0.2a      e -t2(7) -m64 D                   658,943       30       678,372
                    (as in READ_ME.TXT and -m64) D    652,599       32
durilca v.0.1       e -t2(7) -m64 D                   659,670       31       677,989
                    (as in READ_ME.TXT) D             652,840       33
compressia 1.0 beta D                                 650,398       66       674,830  180 MB
                    Block size 5, English D           709,614        7       674,994  60 MB
ppmonstr J          e -o128                           673,744    46***       667,050  146 MB
WinRK 1.00b2        64M ppmz16 no dict                668,692      102       683,462
WinRK 1.00b2        64M ppmz16 dict D                 639,545      102       655,955
WinRK 2.0.1         PWCM, no dictionary               617,240     1275       619,205  192 MB
WinRK 2.0.1         PWCM, dictionary D                593,348     1107       597,939  192 MB
WinRK 3.0.2b        PWCM, dict. D                     586,148  1326***       591,342  700 MB
                    no dictionary, 700 MB             603,916  1505***       608,915  700 MB
                    no dict., 256 MB                  606,018  1301***       611,188  256 MB
PAQ compressors found here
Compressor Solid archive size Seconds Memory used
---------- ------------------ ------- -----------
P5 992,902 31.8 256 KB
P6 841,717 38.4 16 MB
P12 831,341 39.2 16 MB
P12a 831,341 36.6
PAQ1 716,704 68.1 48 MB
PAQ2 702,382 93.1 48 MB
PAQ3 696,616 76.7 48 MB
PAQ3a 70.0
PAQ3b 70.6
PAQ3c 69.6
PAQ3N 684,580 156.2 80 MB
PAQ3Na 147.2
PAQ3N_ic8_ml_ipo (fastest) 142.0
PAQ3N_vc71 (smallest .exe) 162.0
PAQ4 672,134 222.4 84 MB
PAQ4a 186.0
PAQ4b 166.5
PAQ4v2a 183.2
WRT11 + PAQ4v2a 649,201 139.0
PAQ5a 661,811 366.3 186 MB
PAQ5b 298.3
WRT11 + PAQ5a 638,635 261.3 186 MB
PAQ5-EMILCONT-DEUTERIUM 661,604 494.6 168 MB
PAQ6a -0 858,954 51.8 2 MB
PAQ6a -1 780,031 65.6 3 MB
PAQ6a -2 725,798 76.1 6 MB
PAQ6a -3 709,806 97.4 18 MB
PAQ6b -3 79.2
PAQ6 -3 73.5
PAQ6a -4 655,694 354.1 64 MB
PAQ6a -5 648,951 625.2 154 MB
PAQ6a -6 648,892 635.8 202 MB
PAQ6b -6 549.2
PAQ6 -6 516.7
PAQ6b -7 647,767 592.6* 404 MB
PAQ6b -8 647,646 607.0* 808 MB
PAQ6v2ds -6 648,572 505.1 202 MB
PAQ6fb -6 648,257 428.3 202 MB
PAQ6fdj -6 647,923 444.7 202 MB
PAQ6fdj -7 646,932 455.8* 404 MB
PAQ6fdj -8 646,943 472.1* 808 MB
PAQ6fdj2 -6 647,898 430.0 202 MB
PAQ32 -6 647,898 428.5 202 MB
PAQ601 -6 647,369 445.9 202 MB
PAQ602 -6 646,931 430.6 202 MB
PAQ604 -6 646,875 435.0 202 MB
PAQ603 -6 644,978 419.9 202 MB
PAQ605fb -6 642,178 400.2 202 MB
-7 641,357 412.0* 404 MB
-8 640,978 423.8* 808 MB
PAQ605fbj -6 640,730 623.2 252 MB
-7 639,924 644.6 504 MB
-8 639,468 670.5 1008 MB
PAQ605fbj8 -5 640,629 750.7 <256 MB
-6 640,133 >256 MB
PAQ605fbj9 -5 640,768 716.3 <256 MB
-6 640,242 >256 MB
PAQ606fb -6 640,464 423.3 202 MB
PAQ6-emilcont-febas -5 639,770 625.8 <256 MB
-6 639,371 626* >256 MB
-7 638,404 636* >512 MB
-8 638,046 648* >1024 MB
PAQ6-emilcont-anny -5 638,740 817.9 <256 MB
-6 638,279 820* >256 MB
-7 637,289 833* >512 MB
-8 636,867 861* >1024 MB
PAQ607fb -6 634,892 556.4 206 MB (g++ compile)
PAQ6-emilcont-anny-607fb -5 634,471 805.8 <256 MB
-6 633,943
-7 633,133
-8 632,865
PAQ6-emilcont-blaster -5 633,551 891.5 <256 MB
-6 633,084
-7 632,242
-8 631,834
PAQ6-emilcont-destroyer -5 633,373 831.3 <256 MB (g++ compile)
PAQ6-emilcont-annyhilator -5 633,788 828.7 <256 MB (g++ compile)
PAQ6-emilcont-harlock -5 633,582 967.3 <256 MB (MARS compile)
PAQ6ed-schmidtvara -5 632,659 709.8 <256 MB
PAQ6ed-schmidtvarb -5 632,119 851.6 <256 MB
PAQ6-emilcont-italia -4 640,727 <256 MB
PAQAR 1.0 -6 (get614) 610,647 12733.7t 240 MB
PAQAR 1.0 -6 (get614) 610,647 1580* 240 MB
PAQAR 1.0 -7 (get614) 610,468 1598* 480 MB
PAQAR 1.0 -8 (get614) 610,649 9800*t 960 MB
PAQAR 1.1 -6 610,270 1675* 230 MB
PAQAR 1.1 -7 610,036 1696* 460 MB
PAQAR 1.1 -8 610,247 8453*t 920 MB
PAQAR 1.2 -6 610,244 7541.0t 230 MB
PAQAR 1.2 -6 1681*
PAQAR 1.2 -7 610,062 1701* 460 MB
PAQAR 1.3 -6 608,656 1668* 230 MB
PAQAR 1.3 -7 608,438 1687* 460 MB
PAQAR 2.0 -5 607,541 1792* 120 MB
PAQAR 2.0 -6 606,117 1779* 230 MB
PAQAR 2.0 -7 606,131 1780* 460 MB
PAQAR 3.0 -5 607,417 2021* 120 MB
PAQAR 3.0 -6 605,187 2024* 230 MB
PAQAR 3.0 -7 604,872 2015* 460 MB
PAQAR 4.0 -5 606,641 2129* 120 MB
PAQAR 4.0 -6 604,254 2127* 230 MB
PAQAR 4.0 -7 604,037 2116* 460 MB
PAQAR 4.0 -8 604,232 7311*t 920 MB
emilcontv02 -4 (MARS build) 654,118 334 <256 MB
(Intel 8 build) 228
emilcontv02 -5 (Intel 8 build) 635,336 669t ~256 MB
emilcontv03 alpha -3 651,932 789 <192 MB
PAsQDa10 -5 D 614,614 444.4 164 MB
PAsQDa20 -5 D 577,404 1564 130 MB
-6 D 576,890 1563* 240 MB
-7 D 577,063 1559* 470 MB
-8 D 577,178 2370* 930 MB
PAsQDa21 -4 D 578,750 1462 100 MB
-5 D 576,471 1555* 180 MB
-6 D 575,911 1552* 330 MB
-7 D 575,870 1548* 630 MB
-7e D 576,835 1574* 630 MB
PAsQDa30 -5 D 573,644 1585* 191 MB
-6 D 572,968 1576* 354 MB
-7 D 572,938 1580* 690 MB
PAsQDa40 -5 D 569,250 1570* 191 MB
-6 D 568,318 1558* 354 MB
-7 D 568,229 1563* 690 MB
-7e D 569,245 1584* 690 MB
PAsQDa39 -5 D 571,478 1609* 128 MB
-6 D 570,833 1601* 240 MB
-7 D 570,773 1601* 470 MB
-7e D 571,750 1623* 470 MB
-8 D 570,874 2890* 930 MB
-8e D 571,827 2801* 930 MB
PAsQDA41 -5 D 571,127 1586* 128 MB
-6 D 570,451 1579* 240 MB
-7 D 570,429 1600* 470 MB
-7e D 570,704 3186* 470 MB
PAsQDaCC41 -5 D 568,511 1627* 191 MB
-6 D 568,152 1616* 354 MB
-7 D 568,043 1634* 690 MB
-7e D 569,099 1634* 690 MB
PAsQDa 4.2 -5 D 571,268 1488 112 MB
PAsQDacc 4.2 -5 D 568,876 1432 175 MB
PAsQDa 4.3 -5 D 571,080 1442 128 MB
PAsQDacc 4.3 -5 D 568,580 1643 191 MB
PAsQDa 4.3c -5 D 571,080 1494* 191 MB
-6 D 570,385 1490 128 MB
-7 D 570,351 1483* 240 MB
-7e D 571,717 1508* 470 MB
-8 D 570,502 2955* 930 MB
PAsQDacc 4.3c -5 D 568,234 1490* 191 MB
-6 D 567,833 1490* 322 MB
-7 D 567,668 1512* 626 MB
-8 D 569,139
PAQ7 -1 625,924 650 56 MB (times are for g++ compile)
-2 618,301 645 87 MB
-3 614,209 710 150 MB
-4 612,338 740*** 275 MB
-5 611,684 740*** 525 MB
PAsQDa 4.4 -5 D 571,803 1538 128 MB
-7 D 571,011 1475*** 470 MB
PAsQDaCC 4.4 -5 D 567,548 1630 191 MB
-7 D 567,245 1480*** 626 MB
PAQ7PLUS v1.11 -0 D 586,198 461 53 MB
-1 D 582,337 468 84 MB
-2 D 579,799 501 146 MB
-3 D 578,388 503*** 272 MB
-4 D 577,691 507*** 522 MB
PAQ7PLUS v1.19 -0 D 585,071 478 53 MB
-1 D 581,602 480 84 MB
-2 D 579,357 500 146 MB
-3 D 578,057 512*** 272 MB
-4 D 575,538 514*** 522 MB
PAQ8A -4 610,624 792*** 115 MB
PAQ8A2 -4 D 592,976 577*** 116 MB
-6 D 592,847 577*** 418 MB
PAQ8B -4 D 592,976 515*** 116 MB
-6 D 592,847 516*** 418 MB
PAQ8C -4 D 572,763 497*** 116 MB
-6 D 572,265 501*** 418 MB
PAQAR 4.5 -5 D 570,374 1557*** 191 MB
-7 D 569,956 1540*** 626 MB
PAQARCC 4.5 -5 D 566,495 1552*** 191 MB
-7 D 565,495 1847*** 626 MB
PAQ8D -4 D 572,089 495*** 116 MB
-6 D 571,717 500*** 418 MB
PAQ8E -4 D 572,461 500*** 116 MB
-6 D 572,115 503*** 418 MB
PAQ8F -4 606,605 828*** 120 MB
-6 605,650 840*** 435 MB
-7 605,792 881*** 854 MB
PAQ8Fsse -7 816***
PAQ8G -4 D 575,351 561*** 120 MB
-6 D 575,521 572*** 435 MB
PAQ8H -4 D 572,018 694*** 120 MB
-6 D 572,077 702*** 450 MB
RAQ8G -6 603,312 1150*** 552 MB
PAQ8I -7 D 572,277 832*** 730 MB
PAQ8J -7 598,081 1810*** 959 MB
PAQ8JA -7 597,106 1997*** 992 MB
PAQ8JB -7 596,824 2030*** 1004 MB
PAQ8JC -7 596,883 2052*** 1017 MB
PAQ8JD -7 596,179 1997*** 1030 MB
PAQ8JDsse 1886***
PAQ8K -7 595,537 5984*** 767 MB
PAQ8L -6 595,586 1918*** 435 MB
-7 594,857 1872*** 837 MB
WRT (dictionary) benchmarks
WRT11 + PAQ6a -6 626,395 446.9 202 MB
WRT20 + PAQ6a -6 617,734 439.2 202 MB
WRT20 + PAQ6b -7 617,376 415.5* 404 MB
WRT20 + PAQ6b -8 618,005 423.2* 808 MB
WRT30 -p -b + PAQ6v2 -6 624,067 384.7 202 MB
WRT30d2 -p -b + PAQ6v2 -6 615,325 384.1 202 MB
WRT30 -p -b + PAQ603 -6 621,350 317.1 202 MB
WRT30d2 -p -b + PAQ603 -6 613,684 327.3 202 MB
WRT30d2 -p -b + PAQ606fb -6 609,877 312.0 202 MB
WRT30d2 -p -b + PAQ607fb -6 605,601 206 MB
WRT30 -p -b + PAQAR 1.2 -6 599,638 2091** 240 MB
WRT30d2 -p -b + PAQAR 1.2 -6 592,156 1934** 240 MB
WRT30 -p -b + PAQAR 1.2 -6 589,111 ** 240 MB (binaries separate)
WRT30d2 -p -b + PAQAR 1.2 -6 581,945 ** 240 MB (binaries separate)
WRT30 -p -b + PAQAR 4.0 -5 594,364 1633 120 MB
WRT30d2 -p -b + PAQAR 4.0 -5 587,029 1612 120 MB
Command -> size with PAQ6 -6, size with WRT20 + PAQ6 -6
paq6 -6 archive1 news bib book1 book2 paper1 paper2 progc progl progp trans
-> 508514 476557
paq6 -6 archive2 geo -> 45263 45274
paq6 -6 archive3 pic -> 29274 29254
paq6 -6 archive4 obj1 -> 8189 8068
paq6 -6 archive5 obj2 -> 52554 52965
------ ------
Total 643794 612118 (5 archives)
paq6 -6 archive * 648892 617734 (one archive)
File sizes for PAQAR 1.2 -5 and -6 (reported by Leonardo on May 27-28, 2004).
Text file order is bib, book1, book2, news, paper1, paper2, progc, progl,
progp, trans, compressed together. PAQAR 2.0 results reported June 27, 2004.
PAQAR 1.2 -5 -6 2.0 -6
------------ ------ ------
text + WRT30 467172 466536
text + WRT30d2 459937 459370 457638
geo 44498 44481 44338
obj1 7778 7776 7653
obj2 46489 46331 45649
pic 23996 23987 23883
--- ------ ------ ------
Total WRT30 589933 589111
Total WRT30d2 582698 581945 579161
Calgary Challenge
On Jan. 10, 2004, paqc.cpp produced a winning
entry to the Calgary Challenge:
a RAR archive of 645,667 bytes containing a decompression program
and 5 compressed files. PAQC is derived from PAQ6, as
explained in the source code. To decompress:
unrar e calgary.rar
gxx -O d.cpp -o d.exe (depending on your compiler)
d v
d w
d x
d y
d z
The 5 compressed files (total size 639,567 bytes) were produced as follows:
paqc -1 v news bib book1 book2 paper1 paper2 progc progl progp trans
paqc -2 w pic
paqc -3 x geo
paqc -3 y obj1
paqc -3 z obj2
File paqc paq6eb get637 get619 get614 get610 get609 pc.ha cc 596 cc 593 cc 589
---- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
geo 45346 44955 45173 44409 44491 44338 44323
obj1 8154 8105 8216 7836 7781
obj2 52569 49667 50196 47516 46542 45649
pic 26072 27552 25840 24252 23989 23883 23872
others 507426 500377 499380 489644 485804 490713 535178 592486 588183 586071 582325
------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
Archive 639567 630636 628805 613657 608607 604583 603373 592486 588183 586071 582325
+code 637116 619992 614738 610920 609650 603416 596314 593620 589862
Contributors
Versions by Matt Mahoney and Serge Osnach
Neural Network Data Compression
P5, P6, and P12 are the only known data compression programs based on neural
networks that are fast enough for practical use.
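The general idea of predicting bits with a neural network can be sketched in a few lines of C++. This is an illustrative single-layer network, not the code of P5, P6, or P12. It assumes binary inputs that are selected by (hashed) context indices into a weight table. The output is the logistic function of the weighted sum, and after each bit is coded, each active weight is adjusted by the prediction error. BitPredictor and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative single-layer network predicting P(next bit = 1).
// Each active input is an index (e.g. a context hash) into a weight table;
// inactive inputs are 0, so only the active weights matter.
class BitPredictor {
 public:
  BitPredictor(std::size_t tableSize, double learningRate)
      : w_(tableSize, 0.0), lr_(learningRate) {
    assert(tableSize > 0);
  }

  double predict(const std::vector<std::size_t>& active) const {
    double dot = 0.0;
    for (std::size_t i : active) dot += w_[i % w_.size()];
    return 1.0 / (1.0 + std::exp(-dot));  // logistic squash
  }

  // After coding `bit`, move each active weight along the error (bit - p).
  void update(const std::vector<std::size_t>& active, int bit) {
    const double err = bit - predict(active);
    for (std::size_t i : active) w_[i % w_.size()] += lr_ * err;
  }

 private:
  std::vector<double> w_;
  double lr_;
};
```

A compressor would call predict() before each bit, pass the probability to an arithmetic coder, and then call update() with the actual bit. The decompressor makes the same calls in the same order, so both sides stay in sync.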
You may download, use, copy, modify, and distribute these programs
under the terms of the
GNU general public license.
I recommend P12 unless you're short on memory. Files compressed with
one program cannot be decompressed with another.
Windows Executables
To use these archivers, run them from the command line in an MS-DOS box:
p12 Print this help message
p12 archive file file... Create new archive
dir/b | archive Create new archive of whole directory
p12 archive Extract or compare files from existing archive
more < archive View contents of archive
Files are never clobbered. The command:
p12 archive file
has the following meanings:
You can't update or extract individual files in an archive. You can
only create or extract/compare the whole archive at once. Timestamps,
permissions, etc. are not preserved. If you enter a path when compressing,
then the filename will be stored that way and extracted to that path,
for example:
p12 archive file1 sub\file2 \tmp\file3
p12 archive
then file1 will be extracted to the current directory, file2 will be
extracted to the subdirectory sub of the current directory
(which must exist or the file will be skipped during extraction), and file3
will be extracted to \tmp from the root directory
(which also must exist). Substitute / for \ in UNIX. If you want your
files to be portable across Windows and UNIX, don't use a path, and
enter filenames in lower case.
Source Code
All three programs use std.h, a replacement for Borland 5.0's poor
implementation of vector and string (later fixed
in version 5.5). I am including them for reference because the papers
below are based on them, but you may
have to port the code. I later ported P12 to g++ 2.95.2 (DJGPP for
Windows) as p12a.cpp, which does not require
std.h. This is the one I recommend you use. Archives created with
p12a and p12 are compatible with each other; other combinations are not.
To compile (ignore warnings):
gxx -O p12a.cpp -o p12.exe
PAQ1 Archiver
or: bcc32 -O paq1.cpp
PAQ2 Archiver
This is an improved version of PAQ1 with SSE (secondary symbol estimation) added by Serge Osnach
(ench at netcity.ru). It compresses the Calgary corpus to 702,242
bytes (updated May 11, 2003).
paq2.exe executable for Windows.
PAQ3 Archiver
PAQ3 introduces improvements to SSE in PAQ2: linear interpolation between
buckets, a more compact SSE representation (two 1-byte counters),
initialization to SSE(p) = p, and some minor improvements (updated
Sept. 3, 2003). Thanks to Serge Osnach for introducing me to SSE.
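SSE with interpolation between buckets can be sketched as follows. For readability, this sketch assumes that each bucket holds a probability as a double, rather than PAQ3's two 1-byte counters. It also assumes that buckets are spaced linearly in p. The class name SSEMap and the update rate are hypothetical. The input probability falls between two buckets, and the output is their linear interpolation. After coding, both buckets move toward the coded bit in proportion to their weight. Initializing bucket j to j/(n-1) gives SSE(p) = p before any training.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of SSE with interpolation. Assumption: buckets hold probabilities
// as doubles (PAQ3 uses two 1-byte counters per bucket instead).
class SSEMap {
 public:
  SSEMap(std::size_t contexts, std::size_t buckets, double rate)
      : n_(buckets), rate_(rate), t_(contexts * buckets) {
    assert(buckets >= 2);
    for (std::size_t c = 0; c < contexts; ++c)
      for (std::size_t j = 0; j < buckets; ++j)
        t_[c * buckets + j] = double(j) / double(buckets - 1);  // SSE(p) = p
  }

  // Refine p in [0,1] under context ctx by interpolating two buckets.
  double refine(double p, std::size_t ctx) {
    assert(p >= 0.0 && p <= 1.0);
    assert(ctx * n_ + n_ <= t_.size());
    const double pos = p * double(n_ - 1);
    std::size_t lo = static_cast<std::size_t>(pos);
    if (lo > n_ - 2) lo = n_ - 2;  // p == 1 uses the last pair of buckets
    frac_ = pos - double(lo);
    idx_ = ctx * n_ + lo;
    return t_[idx_] * (1.0 - frac_) + t_[idx_ + 1] * frac_;
  }

  // After coding, move both buckets toward the bit by their weights.
  void update(int bit) {
    t_[idx_] += rate_ * (1.0 - frac_) * (bit - t_[idx_]);
    t_[idx_ + 1] += rate_ * frac_ * (bit - t_[idx_ + 1]);
  }

 private:
  std::size_t n_;
  double rate_;
  std::vector<double> t_;
  std::size_t idx_ = 0;
  double frac_ = 0.0;
};
```

If a context keeps producing 1 bits where the model predicts 0.3, the refined estimate for that context rises toward 1. Other contexts keep returning p unchanged.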
PAQ3N Archiver
PAQ3N contains modifications to PAQ3 by Serge Osnach, released Oct. 9, 2003.
It includes improvements
to the SSE context (including the last two characters) and a new
submodel, SparseModel, consisting of three order-2 models that each skip over
one byte. It is not archive compatible with PAQ3. It uses about 80 MB of
memory. It is available from his website at
www.thepipe.kiev.ua/download/paq3n.zip
or mirrored here:
paq3n.cpp
paq3n.exe (compiled by Serge Osnach, 10/9/03)
paq3na.exe (compiled by Jason Schmidt using VS .net 2003, 10/9/03)
ru.datacompression.info/paq3nb.rar contains
several faster and smaller variants compiled by Eugene D. Shelwien (10/9/03)
paq3n_ic8_ml_ipo.exe (fastest)
paq3n_vc71.exe (smallest, 10,752 bytes)
PAQ4 Archiver
paq4v2.exe Windows executable (g++ -O, UPX, 88,148 bytes)
paq4v2a.exe (39,424 bytes, 16% faster,
compiled by Jason Schmidt in VS .net 7.1 /O2 /G7, UPX --brute --force,
Nov. 22, 2003)
paq4.exe Windows executable (compiled with g++ -O
and packed with UPX, 88,136 bytes)
paq4a.exe, smaller (39,424 bytes) and 16% faster,
compiled by Jason Schmidt using VS .net 7.1 /O2 /G7 and packed with UPX 1.90w --brute
--force (Oct. 17, 2003)
paq4b.exe, even smaller (31,744 bytes) and
another 10% faster, compiled by Eugene Shelwien using Intel 8
(Oct. 21, 2003). Other versions (some as small as 9728 bytes) are
here.
PAQ5
paq5a.exe Windows executable, compiled with g++ -O
and UPX. I'm waiting for a faster version to call paq5.exe.
paq5b.exe compiled by Jason Schmidt, Dec. 19, 2003,
VS .net 7.1 /O2 /G7, UPX --brute --force
PAQ6
To compress:
paq6 archive.pq6 file1.txt file2.txt (in any operating system)
paq6 archive.pq6 *.txt (in UNIX)
dir/b *.txt | paq6 archive.pq6 (in Windows)
To decompress:
paq6 archive.pq6
PAQ6 assumes you want to extract rather than compress files if the
archive already exists.
If the files to be extracted also exist, then PAQ6 will simply compare
them and report whether they are identical. PAQ6 never clobbers any files.
To view the contents of an archive:
more < archive.pq6
File names and their lengths are stored in a human-readable header ending
with a Windows EOF character and a formfeed to hide the binary compressed
data. The first line starts with "PAQ6" so you know which version you
need to extract the files. Different versions (PAQ1, PAQ2, etc.) produce
incompatible archives.
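Here is a sketch of writing and reading such a header in C++. The following details come from the description above: the "PAQ6" prefix, the file names and lengths, and the terminator of EOF (0x1A) plus formfeed (0x0C). The rest of the layout is an assumption: a first line with the compression option, then one length<TAB>name line per file with CR LF line endings. The names makeHeader and parseHeader are also assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (name, length) for each archived file
using FileList = std::vector<std::pair<std::string, long>>;

// Assumed layout: "PAQ6 -N" line, then "length<TAB>name" per file (CR LF
// line endings), then EOF (0x1A) and formfeed (0x0C) before the binary data.
std::string makeHeader(const FileList& files, int level) {
  assert(level >= 0 && level <= 9);
  std::ostringstream out;
  out << "PAQ6 -" << level << "\r\n";
  for (const auto& f : files) out << f.second << '\t' << f.first << "\r\n";
  out << '\x1A' << '\x0C';
  return out.str();
}

// Parses the assumed layout; returns an empty list if "PAQ6" is missing.
FileList parseHeader(const std::string& h) {
  FileList files;
  if (h.compare(0, 4, "PAQ6") != 0) return files;
  std::size_t end = h.find('\x1A');
  if (end == std::string::npos) end = h.size();
  std::istringstream in(h.substr(0, end));
  std::string line;
  std::getline(in, line);  // skip the "PAQ6 -N" line
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    std::size_t tab = line.find('\t');
    if (tab == std::string::npos) continue;  // not a file entry
    files.emplace_back(line.substr(tab + 1), std::stol(line.substr(0, tab)));
  }
  return files;
}
```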
To select a compression level:
paq6 -3 archive.pq6 files...
The -3 is optional; it is the default and gives a reasonable tradeoff between
compression, speed, and memory. The possible values are:
Compression option Memory needed to compress/decompress
------------------ ------------------------------------
-0 2 MB (fastest)
-1 3 MB
-2 6 MB
-3 18 MB (default)
-4 64 MB
-5 154 MB
-6 202 MB
-7 404 MB
-8 808 MB
-9 1616 MB (best compression, slowest)
paq6v2.exe Windows executable (Intel 8, UPX, by
Jason Schmidt)
PAQ6 v1
This version has a bug in that small files (those that compress to less than
4 bytes) will not decompress correctly. PAQ6V2 will correctly decompress
all files compressed with PAQ6. Thanks to Alexander Rhatushnyak for finding
the bug.
stgen6.cpp, a program to generate the state table
in paq6.cpp. (You need it only if you want to modify the state table.)
To compile:
g++ -O paq6.cpp
paq6b.exe,
compiled by Jason Schmidt using VS .net 7.1 /O2 /G7 + UPX (Dec. 30, 2003)
paq6_versions.rar, 8 other compiles by Jason
Schmidt for older or multithreaded processors (RAR archive). See the
readme file. The fastest of these (by about 3%) on my PC is
PAQ6_P4_Athlon_AXP.exe, which is just paq6b.exe above.
Other executables
by Eugene Shelwien, including the smallest (12,288 bytes), and one
which displays compression progress (paq6_verb).
Source
(Jan. 5, 2004).
PAQ7
paq-7.exe Intel compile by Johan De Bock, 15% faster but doesn't accept wildcards (use dir/b) (47,616 bytes, Dec. 25, 2005)
paq7pp.exe g++ compile for Pentium Pro and higher (PCs since 1997),
4% slower than paq-7 but accepts wildcards (30,208 bytes, Jan. 2, 2006).
paq7 32-bit Linux 2.6.9 binary (elf, shared libraries, compiled like paq7pp, 66,908 bytes), Jan. 5, 2006
paq7static 32-bit Linux binary, static libraries (517,472 bytes), Jan. 5, 2006
To compress: paq7 -3 archive files...
or (in Windows): dir/b | paq7 -3 archive (reads filenames from standard input)
To extract/compare: paq7 archive
To extract with different names: paq7 archive files...
To view contents: more < archive
Compression option is -1 to -5 to control memory usage. Speed is about
the same for all options (slow):
-1 = 62 MB
-2 = 96 MB
-3 = 163 MB (default)
-4 = 296 MB
-5 = 525 MB
Memory usage is 10% less if no .jpeg images are detected.
dir/b *.txt | paq7 \temp\textfiles.paq7
Source code: paq7.cpp
and paq7asm.asm (assemble with NASM,
or compile with -DNOASM, which is 1/3 slower).
To build paq7pp.exe with g++:
nasm -f win32 paq7asm.asm --prefix _
g++ paq7.cpp paq7asm.obj -O2 -Os -s -o paq7pp.exe -march=pentiumpro -fomit-frame-pointer
upx paq7pp.exe
PAQ8A
PAQ8A is an experimental pre-release of PAQ8. It has an improved context map (2 byte hash)
and state table, bug fixes in the JPEG model, a new x86 model, and minor improvements.
Unlike paq7plus and PAsQDa, it does not include an English dictionary. It also has no .wav model.
paq8a.cpp Source code (compiled as with paq7pp and linked with paq7asm.obj)
PAQ8F
PAQ8F has 3 improvements over PAQ8A: a more memory efficient context
model, a new indirect context model to improve compression, and
a new user interface to support drag and drop in Windows. Unlike
PAQ8B/C/D/E, it does not use an English dictionary.
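An indirect context can be sketched as follows. For each byte value, the model remembers the bytes that most recently followed it. It then uses the current byte together with that history as the context for predicting the next byte. This is a hedged C++ illustration of the general technique, not paq8f's indirect model. The class IndirectContext, the two-byte history, and the 24-bit context layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative indirect context (not paq8f's code): for each byte value,
// remember the two bytes that most recently followed it.
class IndirectContext {
 public:
  // Call once per byte after it has been coded.
  void update(std::uint8_t c) {
    if (havePrev_) {
      std::uint16_t& h = followers_[prev_];
      h = static_cast<std::uint16_t>((h << 8) | c);  // shift in newest follower
    }
    prev_ = c;
    havePrev_ = true;
  }

  // 24-bit context: current byte, then the two bytes that followed it last.
  std::uint32_t context() const {
    assert(havePrev_);
    return (static_cast<std::uint32_t>(prev_) << 16) | followers_[prev_];
  }

 private:
  std::uint16_t followers_[256] = {};
  std::uint8_t prev_ = 0;
  bool havePrev_ = false;
};
```

After "abaca", the current byte is 'a', and the last two bytes that followed 'a' were 'b' and 'c'. The context is therefore 'a','b','c'. The history predicts what comes next even when the direct order-1 statistics are weak.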
paq8f [-level] archive files... (compresses to archive.paq8f)
paq8f [-d] dir1\archive.paq8f [dir2] (extracts to dir2 if given, else dir1)
-level ranges from -0 (store without compression) to -9 (smallest,
slowest, uses most memory). The default is -5 (needs 256 MB of memory).
You can also compress directories the same way as files. The directory hierarchy
is restored upon extraction, creating directories as needed. However
file attributes like timestamps and permissions are not preserved.
To support drag and drop, paq8f pauses until you press ENTER if it is run
with only one argument and no options. To prevent this, use an option
such as -5 or -d even if it is not required. Unlike earlier versions, paq8f
does not read file names from standard input. Wildcards are allowed
(requires g++ compile).
paq-8f.exe Intel compile by Johan de Bock (10% faster, but does not accept wildcards)
paq8f.cpp, see source for compile instructions, link with paq7asm.asm from paq7
PAQ8L
PAQ8M
PAQ8N
paq8n, Aug 18, 2007, is paq8l with the further
improved JPEG model from paq8fthis2 by Jan Ondrus. It no longer
reports harmless errors for malformed JPEGs.
Size (bytes) File Compression time (seconds)
842,468 a10.jpg
698,214 a10.jpg.paq8f 19
667,190 a10.jpg.paq8fthis 47
667,722 a10.jpg.paq8l 22
674,995 a10.jpg.paq8m 36
660,740 a10.jpg.paq8fthis2 23
661,321 a10.jpg.paq8n 27
4,168,192 ohs.doc (contains a large embedded JPEG file).
553,493 ohs.doc.paq8f 105
524,926 ohs.doc.paq8fthis 217
547,082 ohs.doc.paq8l 171
518,694 ohs.doc.paq8m 228
519,163 ohs.doc.paq8fthis2 120
513,045 ohs.doc.paq8n 188
Compression is identical to paq8l and paq8m for non JPEG data.
Versions by Berto Destasio
paq5-emilcont-deuterium.exe,
compiled with Digital MARS
paq5ed.exe, about 23% faster, compiled by Jason
Schmidt using VS .net 7.1, Dec. 27, 2003 (not archive compatible).
paq6d-emilcont-jackdamarioum.cpp
(needs 396 MB), Dec. 29, 2003
paq6-emilcont-febas.exe
(has a bug).
paq6-emilcont-blaster.exe
paq6eba.exe Intel 8, UPX compile by Jason Schmidt, Apr. 8, 2004.
paq6-emilcont-destroyer.exe, Intel 8, UPX
paq6-emilcont-annyhilator.exe, Intel 8, UPX
paq6-emilcont-harlock.exe, Intel 8, UPX
Intel builds by Johan De Bock can be found at
http://studwww.ugent.be/~jdebock/win32_compressor_builds.htm
Versions by Johan De Bock
paq6eb.exe
paq6ebb.exe, Intel 8, UPX (Jason Schmidt, Apr. 11, 2004)
Versions by David A. Scott
paq6v2ds.exe, Windows executable,
compiled by Jason Schmidt
paq6fdj2.exe, Intel 8, UPX
(compiled by Jason Schmidt)
paq32.exe Intel 8, UPX (compiled by Jason Schmidt)
Versions by Fabio Buffoni
PAQ6fb is a variant of PAQ6 by Fabio Buffoni that is a bit faster and gives
better compression than PAQ6. It should compile with g++, Borland, Digital Mars,
and VC6 (with either the old or new for-loop scoping rules).
paq6fb.exe, Intel 8, UPX compiled by Jason Schmidt
paq601.exe Intel 8, UPX (compiled by Jason Schmidt)
paq603.exe, Intel 8, UPX (compiled by Jason Schmidt)
paq605fb.exe, Intel 8, UPX (compiled by Jason Schmidt)
paq606fb.exe, Intel 8, UPX (compiled by Jason Schmidt)
paq607fb.exe, DJGPP g++ 2.95.2, UPX
paq607fba.exe, Intel 8, UPX by Jason Schmidt, Apr. 8, 2004
Versions by Jason Schmidt
This variant by Jason Schmidt combines the modifications from both
PAQ6v2ds and PAQ6fb. (fdj = Fabio, David, Jason).
paq6fdj.exe, Intel 8, UPX
paq602.exe, Intel 8, UPX
paq604.exe, Intel 8, UPX
paq605fbj.exe, Intel 8, UPX
paq6fbj8.exe Intel 8, UPX
paq6fbj9.exe Intel 8, UPX
paq6ed-schmidtvara.exe, Intel 8, UPX
paq6ed-schmidtvarb.exe, Intel 8, UPX
Versions by Alexander Rhatushnyak
paqar1_0.rar, mirror, May 20, 2004
paqar1_2.rar, mirror, May 22, 2004
paqar1_3.rar, mirror, June 9, 2004
paqar2.rar, mirror, June 24, 2004
paqar -6 v book1 news paper2 paper1 book2 bib trans progc progp progl obj1 obj2
paqar -6 w pic
paqar -6 x geo
paqar3.rar, mirror, July 11, 2004
paqar -6 a book1 news paper2 paper1 book2 bib trans progc progl progp obj1 obj2
paqar -6 p pic
paqar -6 g geo
paqar4.rar, mirror, July 25, 2004, updated July 27
to fix a bug in the decompressor (does not change statistics).
paqar41.rar, mirror, posted Jan. 3, 2006.
Mirror (Jan 11, 2006).
paqar45.rar Feb. 13, 2006
paqar45.rar (mirror)
g++/NASM port (for Linux) by Luchezar Georgiev, Aug. 30, 2006, updated Sept. 4, 2006
paq8h.rar source code.
paq8h.zip includes source, Windows .exe and dictionaries.
Note: there are two executables: paq8h.exe (VC++) and paq-8h.exe (Intel by Johan de Bock).
The Intel compile is about 2% faster than the VC++ compile, and 9% faster than the original
g++ compile posted Mar. 22, which has since been removed. All executables produce identical archives.
The benchmark timings are based on the Intel compile.
Versions by Przemyslaw Skibinski
PAsQDa 1.0 combines dictionary coding (WRT) with PAQ6v2.
The command "pasqda -5 calgary.paqd book1 book2 paper1 paper2 bib news progc
progl progp pic trans obj1 obj2 geo" produces a file of 614,170 bytes (225.81
s on a 2.4 GHz Celeron).
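The dictionary idea behind WRT can be illustrated with a toy, reversible word-replacement transform in C++. This is not WRT's format. The sketch assumes 7-bit ASCII input and a dictionary of at most 128 words, and it replaces each whole dictionary word with the single byte 0x80 + index. WordTransform is a hypothetical name. The point of such a transform is to shorten the text and to present frequent words as single symbols before context modeling. Decoding restores the input exactly.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy word-replacement transform (not WRT's format). Assumes 7-bit ASCII
// input and at most 128 dictionary words; word i is coded as byte 0x80 + i.
class WordTransform {
 public:
  explicit WordTransform(std::vector<std::string> dict) : dict_(std::move(dict)) {
    assert(dict_.size() <= 128);
    for (std::size_t i = 0; i < dict_.size(); ++i) index_[dict_[i]] = i;
  }

  std::string encode(const std::string& text) const {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
      const unsigned char u = static_cast<unsigned char>(text[i]);
      assert(u < 0x80);  // bytes >= 0x80 are reserved for word codes
      if (!std::isalpha(u)) {
        out += text[i++];
        continue;
      }
      std::size_t j = i;  // find the whole run of letters
      while (j < text.size() && std::isalpha(static_cast<unsigned char>(text[j]))) ++j;
      const std::string word = text.substr(i, j - i);
      const auto it = index_.find(word);
      if (it != index_.end()) out += static_cast<char>(0x80 + it->second);
      else out += word;
      i = j;
    }
    return out;
  }

  std::string decode(const std::string& data) const {
    std::string out;
    for (char ch : data) {
      const unsigned char u = static_cast<unsigned char>(ch);
      if (u >= 0x80) {
        assert(u - 0x80u < dict_.size());
        out += dict_[u - 0x80];
      } else {
        out += ch;
      }
    }
    return out;
  }

 private:
  std::vector<std::string> dict_;
  std::unordered_map<std::string, std::size_t> index_;
};
```

Only whole words are replaced, so "theory" is left intact even if "the" is in the dictionary. This keeps decoding unambiguous.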
Mirror, Jan. 18, 2005
Mirror, posted Jan. 26, 2005
Mirror, posted Feb. 1, 2005
Mirror, posted Feb. 7, 2005
Mirror, posted Apr. 5, 2005
Mirror, posted Apr. 7, 2005
Mirror, posted July 15, 2005.
Mirror, posted Oct. 13, 2005.
Mirror, Dec. 8, 2005
These replace the post of Dec. 5, 2005 with faster executables (Intel compile
courtesy of Johan de Bock). No source code changes.
Mirror, posted Dec. 8, 2005
Mirror, posted Dec. 14, 2005
Mirror, posted Dec. 21, 2005
Mirror, posted Jan. 4, 2006
To install:
Mirror, Feb. 8, 2006
Intel9 compile by Johan de Bock.
(mirror) Feb. 13, 2006
(mirror) Feb. 15, 2006
Intel9 compile by Johan de Bock.
(mirror) Feb. 23, 2006
Additional dictionaries in 6 other languages are
available at http://www.ii.uni.wroc.pl/~inikep/research/dicts/.
Mirror.
Versions by Rudi Cilibrasi
raq8g is a modification of paq8f with
optimizations for the Hutter Prize, released Aug. 16, 2006.
The improvements come mainly from modeling the
nesting of parentheses and brackets in text, and from increased memory usage.
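Tracking bracket nesting for use as a context can be sketched in C++ as shown here. This is an independent illustration, not raq8g's code, which is in raq8g.cpp. The following are all assumptions: the class NestingContext, the depth cap of 64, and the choice of context (depth capped at 15, combined with the closing bracket expected next).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative bracket-nesting tracker (not raq8g's code).
class NestingContext {
 public:
  // Call once per byte after it has been coded.
  void update(std::uint8_t c) {
    if (c == '(' || c == '[' || c == '{') {
      if (stack_.size() < kMaxDepth) stack_.push_back(c);  // bound memory
    } else if (!stack_.empty() && c == closerFor(stack_.back())) {
      stack_.pop_back();  // matching close; mismatched closers are ignored
    }
  }

  std::size_t depth() const { return stack_.size(); }

  // Closing byte expected for the innermost open bracket, or 0 if none.
  std::uint8_t expectedCloser() const {
    return stack_.empty() ? 0 : closerFor(stack_.back());
  }

  // Context: depth (capped at 15) in bits 8-11, expected closer in bits 0-7.
  std::uint32_t context() const {
    const std::uint32_t d = depth() < 15 ? static_cast<std::uint32_t>(depth()) : 15u;
    return (d << 8) | expectedCloser();
  }

 private:
  static constexpr std::size_t kMaxDepth = 64;
  static std::uint8_t closerFor(std::uint8_t open) {
    assert(open == '(' || open == '[' || open == '{');
    return open == '(' ? ')' : open == '[' ? ']' : '}';
  }
  std::vector<std::uint8_t> stack_;
};
```

A model could use context() when predicting the next byte. Inside "f(a[1", the depth is 2 and the expected closer is ']', so that byte can be learned as likely to come next.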
raq8g.exe (Windows executable, compiled with g++ and linked with
paq7asm.asm (NASM)). Commands work like paq8f. It does not use
dictionaries. The website has a Linux executable and raq8g.cpp.
Versions by Pavel L. Holoborodko
paq8i by Pavel L. Holoborodko, Aug. 18, 2006,
is a modification to paq8h to add a PGM (grayscale image) model.
Some results are included as a spreadsheet in the distribution.
BMP compression is also improved (small bug fix). It works like paq8h and
uses the same dictionaries for text compression. For decompression, the dictionaries must be
present and identical, in a TextFilter subdirectory of the directory containing paq8i.exe.
To compile with Borland C++:
nasm -f obj --prefix _ paq7asm.asm
bcc32 -O -DWIN32 -w-8027 paq8i.cpp paq7asm.obj
rename paq8i.exe paq8ib.exe
upx paq8ib.exe
To compile with g++:
nasm -f win32 --prefix _ paq7asm.asm
g++ -Wall paq8i.cpp paq7asm.obj -O2 -Os -march=pentiumpro -fomit-frame-pointer -s -o paq8i.exe
upx paq8i.exe
To compile with Digital Mars:
nasm -f obj --prefix _ paq7asm.asm
dmc -O -Ae -DWIN32 -I\dm\stlport\stlport paq8i.cpp paq7asm.obj
Versions by Bill Pettis
Versions by Serge Osnach
Versions by Jan Ondrus