ZPAQ Utilities

zpaqd v7.15 (Windows exe, source, docs) is a public domain development tool that allows you to create, test, debug, and optimize new compression algorithms in streaming format that are compatible with all existing ZPAQ decompressors such as zpaq, zpipe, zpsfx, and the reference decoder. There are 3 built-in compression levels, or you can describe the algorithm in the ZPAQL language in a configuration (.cfg) file, sometimes paired with an external preprocessor. Neither file is needed to decompress. See libzpaq.h for a description of the ZPAQL language.

Sample configuration files.

fast.cfg, mid.cfg, and max.cfg describe the built in models for levels 1, 2, and 3 in libzpaq and zpipe. mid is equivalent to compression level -method 4 in zpaq. They are context mixing models without preprocessors using 2, 8, and 22 components respectively. With zpaq, you can pass an argument to increase memory usage to improve compression of large files.

bwt.1.zip contains 4 BWT-based configurations with preprocessors (C++ source and Win32 executables) similar to -method 3. All are based on the Burrows-Wheeler transform, which sorts the input bytes by their following context; the sorted output is then compressed with a low order model. The preprocessor uses Yuta Mori's libdivsufsort (open source MIT license).
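To illustrate what "sorting by following context" means, here is a minimal sketch of a naive Burrows-Wheeler transform. It is not the code in bwt.1.zip (which uses libdivsufsort for speed); the function name bwt_naive is made up for this example, and the quadratic rotation sort is only practical for tiny inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive BWT sketch (hypothetical helper, not from bwt.1.zip).
// Sorts all rotations of s and emits the byte that precedes each rotation,
// so every output byte is ordered by the context that follows it.
// Returns the transformed string and the row index of the original string,
// which an inverse transform would need.
std::pair<std::string, std::size_t> bwt_naive(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    std::iota(rot.begin(), rot.end(), 0);  // rot[i] = start of rotation i
    std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
        for (std::size_t k = 0; k < n; ++k) {
            unsigned char ca = static_cast<unsigned char>(s[(a + k) % n]);
            unsigned char cb = static_cast<unsigned char>(s[(b + k) % n]);
            if (ca != cb) return ca < cb;
        }
        return false;  // identical rotations compare equal
    });
    std::string out(n, '\0');
    std::size_t primary = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = s[(rot[i] + n - 1) % n];  // byte preceding this rotation
        if (rot[i] == 0) primary = i;      // row holding the original string
    }
    return {out, primary};
}
```

For the input "banana" this groups the three a's and the two n's together, which is what makes the output easy for a low order model to predict.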

bwt_slowmode1.zip is a low memory BWT compression algorithm by Jan Ondrus. It is based on BBB slow mode with block sizes up to 1 GB using 1443 MB memory, instead of the usual 5 times block size. It compresses enwik9 to 163,565,006 bytes in a single block.

bmp_j4c.zip by Jan Ondrus uses a color transform preprocessor and a context model based on paq8px_v64 for compressing .bmp images. It achieves the best known result for the rafale.bmp benchmark, 521,992 bytes.

exe_j1.zip by Jan Ondrus contains an E8E9 preprocessor that improves compression of .exe and .dll files by translating x86 JMP and CALL addresses from relative to absolute. The context model is the same as max.cfg, but it typically compresses these file types 7-10% smaller at the same speed.
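The idea behind an E8E9 filter can be sketched as follows. This is a simplified illustration, not the filter shipped in exe_j1.zip: the name e8e9_sketch is hypothetical, it treats every E8 (CALL) or E9 (JMP) byte as an opcode followed by a 32-bit little-endian operand, and a real filter may apply extra checks before rewriting an address.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified E8E9 sketch (hypothetical, not the exe_j1.zip code).
// forward = true:  relative target -> absolute (before compression)
// forward = false: absolute target -> relative (after decompression)
// Opcode bytes are never modified and operand bytes are skipped, so the
// inverse pass visits exactly the same positions as the forward pass.
void e8e9_sketch(std::vector<std::uint8_t>& buf, bool forward) {
    std::size_t i = 0;
    while (i + 5 <= buf.size()) {
        if ((buf[i] & 0xFE) != 0xE8) {  // neither E8 nor E9
            ++i;
            continue;
        }
        std::uint32_t v = static_cast<std::uint32_t>(buf[i + 1])
                        | static_cast<std::uint32_t>(buf[i + 2]) << 8
                        | static_cast<std::uint32_t>(buf[i + 3]) << 16
                        | static_cast<std::uint32_t>(buf[i + 4]) << 24;
        // x86 relative targets are measured from the next instruction.
        const std::uint32_t next = static_cast<std::uint32_t>(i + 5);
        v = forward ? v + next : v - next;  // wraps modulo 2^32
        buf[i + 1] = static_cast<std::uint8_t>(v);
        buf[i + 2] = static_cast<std::uint8_t>(v >> 8);
        buf[i + 3] = static_cast<std::uint8_t>(v >> 16);
        buf[i + 4] = static_cast<std::uint8_t>(v >> 24);
        i += 5;  // skip the operand just rewritten
    }
}
```

Repeated calls to the same function then carry identical absolute addresses instead of differing relative ones, which gives the context model more repeated byte strings to exploit.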

jpg_test2.zip by Jan Ondrus compresses JPEG images (which are already compressed) by an additional 15%. It uses a preprocessor that expands Huffman codes to whole bytes, followed by context modeling.

pi.cfg compresses pi.txt from the Canterbury miscellaneous corpus from 1 MB to 114 bytes. No other compressor is known to compress it to less than 415,241 bytes. It uses a postprocessor that ignores the decoded data and computes pi to 1 million digits. Warning: it may take several hours or days to run. It works only on this file. Discussion.
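As a rough illustration of how a short program can generate the digits instead of storing them, here is a sketch of a widely circulated compact spigot algorithm for pi, rewritten in readable C++. It is not the ZPAQL code in pi.cfg, whose method may differ; the function name pi_digits is hypothetical. It produces 4 digits per pass, performs no carry correction, and the last group or two can be inaccurate, so treat only the leading digits as reliable.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Spigot sketch (hypothetical helper, not from pi.cfg).
// Returns groups*4 decimal digits of pi, starting with "3141".
// Uses 14 mixed-radix terms per 4-digit group.
std::string pi_digits(int groups) {
    const long long a = 10000;             // each pass yields one base-10000 digit
    int c = 14 * groups;                   // number of active terms
    std::vector<long long> f(static_cast<std::size_t>(c) + 1, a / 5);
    f[static_cast<std::size_t>(c)] = 0;    // topmost term starts at zero
    std::string out;
    long long e = 0;                       // low 4 digits held over from the last pass
    for (; c > 0; c -= 14) {
        long long d = 0;
        long long g = 2LL * c;
        for (int b = c;;) {
            d += f[static_cast<std::size_t>(b)] * a;
            --g;                           // g is now 2*b - 1
            f[static_cast<std::size_t>(b)] = d % g;
            d /= g;
            --g;
            if (--b == 0) break;
            d *= b;                        // carry into the next lower term
        }
        char buf[32];
        std::snprintf(buf, sizeof buf, "%04lld", e + d / a);
        out += buf;
        e = d % a;
    }
    return out;
}
```

Calling pi_digits(200) corresponds to the classic 800-digit run of this algorithm. Working memory grows linearly and running time quadratically with the number of digits.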

pi10k.zpaq by Kai Lüke and Matt Mahoney compresses 10,000 digits of pi to 112 bytes using the predicted byte as context rather than a postprocessor. (Download containing pi.cfg and pi10000.txt is 919 bytes and takes 18 seconds to decompress).

lz1.zip is a fast LZ77 based model similar to -method 1 and 2. It compresses enwik9 to 271,702,398 bytes in 391 seconds and decompresses in 227 seconds on a 2.0 GHz T3200 (Win32) in a single thread using 66 MB memory to compress and 18 MB to decompress. It includes lz1.cfg and a preprocessor (lzpre.cpp and lzpre.exe). The preprocessor can also compress by itself with less compression but better speed (327,501,489 bytes, 215 sec to compress, 26 sec to decompress). The preprocessed format is a byte-aligned LZ77 using 2, 3, or 4 bytes to encode matches of up to 64 bytes with offsets up to 16 MB as explained in the source comments.
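A byte-aligned LZ77 decoder of this general kind can be sketched as below. The token layout is an assumption invented for this example, not the documented lzpre format (see the lzpre.cpp source comments for the real one): the top 2 bits of a control byte select either a literal run (0) or a match with 1, 2, or 3 offset bytes, and the low 6 bits give a length of 1 to 64. That yields 2, 3, or 4 byte match codes and offsets up to 16 MB, consistent with the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decoder for a HYPOTHETICAL byte-aligned LZ77 layout (not the lzpre format):
//   control byte = kkllllll, length = llllll + 1 (1..64)
//   kk == 0   : 'length' literal bytes follow
//   kk == 1..3: kk big-endian offset bytes follow; copy 'length' bytes
//               starting (offset + 1) bytes back in the output
std::vector<std::uint8_t> lz_decode_sketch(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned ctl = in[i++];
        const unsigned kind = ctl >> 6;
        const std::size_t len = (ctl & 63u) + 1;
        if (kind == 0) {  // literal run
            if (in.size() - i < len) throw std::runtime_error("truncated literal");
            for (std::size_t k = 0; k < len; ++k) out.push_back(in[i + k]);
            i += len;
        } else {          // match
            if (in.size() - i < kind) throw std::runtime_error("truncated match");
            std::size_t off = 0;
            for (unsigned k = 0; k < kind; ++k) off = (off << 8) | in[i++];
            off += 1;     // stored offset 0 means "1 byte back"
            if (off > out.size()) throw std::runtime_error("offset out of range");
            const std::size_t start = out.size() - off;
            // Copy byte by byte so overlapping matches repeat recent output.
            for (std::size_t k = 0; k < len; ++k) {
                const std::uint8_t b = out[start + k];
                out.push_back(b);
            }
        }
    }
    return out;
}
```

Because every token starts on a byte boundary, decoding needs no bit shifting across bytes, which is one reason such formats decompress quickly.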

lazy v1.00 is a fast LZ77 compressor (source, Win32 .exe) with 5 compression levels and an equivalent config file (lazy.cfg). Level 3 compresses enwik9 to 325,609,617 bytes in 125 sec (2 GHz T3200) and decompresses in 64 sec to disk or 21 sec to nul:. It is the compressor used by zpaq -method 1 versions 6.00 through 6.14 (at level 3).

lazy v2.10 is like lazy v1.00, but with an E8E9 filter for better compression of .exe and .dll files. File size is limited to 1 GB (unlike lazy v1.00, which has no limit), and it needs about 1 GB memory. It includes an equivalent lazy2.cfg for decompression (but limited to 16 MB using 16 MB memory).

Miscellaneous Applications

zpipe (user's guide) is a simple application that compresses or decompresses from standard input to standard output. It illustrates the use of libzpaq. Contains GPL source and a Windows executable (updated to libzpaq v5.01 on Feb. 2, 2012).

zpsfx v1.01 (Apr. 4, 2012) is a self extracting archive stub for Windows. To use: compile, then append an archive created with zpaqd to the executable. Be sure that the archive begins with a header locator tag (included by default). To extract, run the program. For example:

  g++ -O2 -s zpsfx101.cpp libzpaq.cpp -o zpsfx.exe
  upx zpsfx.exe
  zpaqd c 2 archive files...                       (compression level 1, 2, or 3)
  copy/b zpsfx.exe+archive.zpaq archive.exe
To extract:
  archive.exe

For either program, you need libzpaq from the zpaq distribution to compile.

tiny_unzpaq.cpp v1.0 (Mar. 21, 2012) is a small ZPAQ level 2 extractor derived from unzpaq200.cpp, with error checking, extra code, comments, and white space removed to make the source code compress as small as possible. It compiles by itself. It was written by Matt Mahoney and released to the public domain. To run: tiny_unzpaq archive.zpaq. It will create files as named in the archive. It only extracts files created in streaming format, as with zpaqd or zpaq -method s...