zpaqlpy compiler
================

Compiles a zpaqlpy source file (a Python subset) to a ZPAQ configuration file for use with zpaqd.

This makes it easy to develop new compression algorithms with ZPAQ.

It is also a way to bring a decompression algorithm to the ZPAQ format, so that the compressed data can be stored in a ZPAQ archive without breaking compatibility.

The Python source files can be run standalone with Python 3 (tested: 3.4, 3.5).

Jump to the end for a tutorial, or look at test/lz1.py, test/pnm.py or test/brotli.py for examples.

Build with `make zpaqlpy`; to build again, run `make clean` first.

Copyright (C) 2016 Kai Lüke
[email protected]

… the compressed data this archive format wants to solve the problem that
changes to the algorithm need new software at the recipient's device.
It also acknowledges that different input data should be handled with
different compression techniques.

The PAQ compression programmes typically use context mixing, i.e. they mix
the predictions of several context-aware models for use in an
arithmetic coder, and thus often achieve the best known compression
results. The ZPAQ archiver is their successor and also supports
simpler models such as LZ77 and BWT, depending on the input data.
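
To illustrate the mixing idea, here is a minimal sketch of a PAQ-style logistic mixer in plain floating-point Python. It is not ZPAQ's actual fixed-point MIX component: the class name `Mixer`, the initial weight of 0.3 and the learning rate of 0.02 are hypothetical choices. Each input supplies the probability that the next bit is 1; the mixer combines the inputs in the logistic ("stretched") domain and nudges its weights towards the inputs that predicted the coded bit well.

```python
import math


def stretch(p):
    """Logit: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Logistic function, the inverse of stretch()."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer:
    """Hypothetical floating-point logistic mixer (ZPAQ itself uses fixed-point arithmetic)."""

    def __init__(self, n, rate=0.02):
        self.w = [0.3] * n        # one weight per input (assumed start value)
        self.rate = rate          # learning rate (assumed value)
        self.st = [0.0] * n       # stretched inputs of the last prediction
        self.p = 0.5              # last mixed probability that the bit is 1

    def predict(self, probs):
        """Combine the input probabilities into one probability for the coder."""
        self.st = [stretch(p) for p in probs]
        self.p = squash(sum(w * s for w, s in zip(self.w, self.st)))
        return self.p

    def update(self, bit):
        """Gradient step on the coding cost once the actual bit (0 or 1) is known."""
        err = bit - self.p
        self.w = [w + self.rate * err * s for w, s in zip(self.w, self.st)]


m = Mixer(2)
p_before = m.predict([0.9, 0.6])
m.update(1)
p_after = m.predict([0.9, 0.6])   # higher than p_before, since both inputs favoured 1
```

In a full compressor the mixed probability would drive the arithmetic coder bit by bit, and the inputs would come from context model components whose contexts are computed by the hcomp code.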

Only the decompression process is specified. The format makes
use of predefined context model components which can be woven into
a network, a binary code for context computation for components and a
… to the archive. Arbitrary algorithms are not supported, but a good
variety of specialised and universal methods is available.
6368
6469Homepage: http://mattmahoney.net/dc/zpaq.html
70+
6571Working principle: http://mattmahoney.net/dc/zpaq_compression.pdf
6672
6773** zpaqd - development tool for new algorithms**

…

The zpaqlpy Python-subset

… For user-defined sections of the template. Not everything is supported, but
unsupported constructs are still included in the grammar so that specific error
messages can be given instead of parser errors (e.g. nonlocal, dicts, strings or
the @-operator for matrix multiplication).

Listed here are productions with NUMBER, NAME, "symbols", NEWLINE, INDENT,
DEDENT or STRING as terminals; nonterminals are defined on the left side of the -> arrow.
9097
@@ -142,15 +149,17 @@ the values for hh, hm, ph, pm like in a ZPAQ configuration to define the size of
142149H and M in hcomp and pcomp sections. In the dict which serves for calculation of
143150n (i.e. number of context mixing components) you have to specify the components
144151as in a ZPAQ configuration file, arguments are documented in the specification
145- (see --info-zpaq for link).
152+ (see ` --info-zpaq ` for link).

Only valid Python programmes that run without exceptions are supported as input, so run
them standalone before compiling.
There is no bounds check for the arrays on top of H or M, so please make sure
the Python version works correctly. If you need a ring buffer on H or M, you have
to use `% len(hH)` or `&((1<<hh)-1)` and cannot rely on integer overflows or on the
modulo-array-length operation on indices in H or M as in plain ZPAQL, because
H is expanded to contain the stack (and because integers do not overflow when
the plain Python script is run).
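
A minimal plain-Python sketch of the explicit wrap-around, assuming a hypothetical `hh = 4` (so H has 16 entries) and a module-level list standing in for `hH`. The helper `ring_put` is hypothetical and not part of the zpaqlpy template; it shows that masking with `(1<<hh)-1` selects the same slot as `% len(hH)` for a power-of-two size.

```python
hh = 4                        # hypothetical header constant: H has 2**4 = 16 entries
hH = [0] * (1 << hh)          # plain-Python stand-in for the hcomp H array


def ring_put(pos, value):
    """Store value at ring position pos and return the next position.

    The index is wrapped explicitly, so hH is never indexed out of range
    even though pos itself keeps growing.
    """
    hH[pos & ((1 << hh) - 1)] = value   # same slot as pos % len(hH) here
    return pos + 1


pos = 0
for v in range(20):           # 20 writes into a 16-entry buffer
    pos = ring_put(pos, v)

# The newest values 16..19 have overwritten the oldest slots 0..3.
```

Because 2**32 is a multiple of any power-of-two array length up to 2**32, the masked index also stays consistent if a 32-bit position counter wraps around in the compiled code.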

Only non-negative (unsigned) 32-bit integers can be used; strings, lists, arbitrarily big
numbers, classes, closures and (function) objects are not supported.
156165
@@ -167,16 +176,16 @@ refer to definitions in the first section.
167176 API functions for input and output, initialization of memory | no
168177 function hcomp and associated global variables and functions | yes
169178 function pcomp and associated global variables and functions | yes
170- code for standalone execution of the Python file analog to running a ZPAQL configuration with zpaqd r [ cfg] p|h | no
179+ code for standalone execution of the Python file analog to running a ZPAQL configuration with zpaqd ` r [cfg] p|h ` | no
171180
172181** Exposed API**
173182
174- The 32- or 8-bit memory areas H and M are available as arrays hH, pH, hM, pM
175- depending on being a hcomp or pcomp section with size 2** hh , 2** hm , 2** ph ,
176- 2** pm defined in the header as available constants hh, hm, ph, pm.
177- There is support for len(hH), len(pH), len(hM), len(pM) instead of calculating
178- 2** hh. But in general len() is not supported, see len_hH() below for dynamic
179- arrays. NONE is a shortcut for 0 - 1 = 4294967295.
183+ The 32- or 8-bit memory areas H and M are available as arrays ` hH ` , ` pH ` , ` hM ` , ` pM `
184+ depending on being a hcomp or pcomp section with size ` 2**hh ` , ` 2**hm ` , ` 2**ph ` ,
185+ ` 2**pm ` defined in the header as available constants hh, hm, ph, pm.
186+ There is support for ` len(hH) ` , ` len(pH) ` , ` len(hM) ` , ` len(pM) ` instead of calculating
187+ ` 2**hh ` . But in general len() is not supported, see ` len_hH() ` below for dynamic
188+ arrays. ` NONE ` is a shortcut for 0 - 1 = 4294967295.
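
A small plain-Python model of the array sizes and of `NONE`, assuming hypothetical header values `hh = 2` and `hm = 3`. The helper `u32` is hypothetical: it reduces a result modulo 2**32 to mimic the unsigned 32-bit arithmetic of the compiled ZPAQL code, which plain Python integers do not do on their own.

```python
hh, hm = 2, 3                 # hypothetical header constants

hH = [0] * (2 ** hh)          # H: 2**hh entries of 32 bits each
hM = [0] * (2 ** hm)          # M: 2**hm entries of 8 bits each


def u32(x):
    """Reduce x modulo 2**32, mimicking unsigned 32-bit ZPAQL arithmetic."""
    return x & 0xFFFFFFFF


NONE = u32(0 - 1)             # 0 - 1 wraps around to 4294967295
```

Adding 1 to `NONE` under the same reduction gives 0 again, which is why `NONE` works as an out-of-band marker but should not be used in further arithmetic without care.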
180189
181190 Other functions | Description
182191----------------------------|--------------------------------------------------
@@ -191,18 +200,19 @@ len_pH(aref), … | Get the length of an array in pH/pM/hH/hM
191200free_pH(aref), … | Free the memory in pH/pM/hH/hM again by
192201 | destructing the array
193202
194- If backend implementations addr_alloc_pH(size), addr_free_pH(addr), … are
203+ If backend implementations ` addr_alloc_pH(size) ` , ` addr_free_pH(addr) ` , … are
195204defined then dynamic memory management is available though the API functions
196- alloc_pM and free_pM. The cast array_pH(numbervar) is sometimes needed when the
205+ ` alloc_pM ` and ` free_pM ` . The cast ` array_pH(numbervar) ` is sometimes needed when the
197206array reference is passed between functions because then it is just treated as
198207integer again because no boxed types are used in general.
199208
200- The template provides sample implementations of addr_alloc_pM, addr_free_pM , ….
209+ The template provides sample implementations of ` addr_alloc_pM ` , ` addr_free_pM ` , ….
201210The returned pointer is expected to point at the first element of the array. One
202211entry before the first element is used to store whether this memory section is
203212free or not. Before that the length of the array is store, i.e.
204213H[ arraypointer-2] for arrays in H and the four bytes
205214M[ arraypointer-5] …M[ arraypointer-2] of the 32-bit length for arrays in M.
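
The layout for H can be modelled in plain Python. The first-fit allocator below is a hypothetical sketch, not the template's sample `addr_alloc_pH`/`addr_free_pH`: it keeps each block's length at `H[p-2]` and a free marker at `H[p-1]` in front of the returned pointer `p`. The names `init_heap`, `alloc`, `free` and `length_of`, the 64-entry size and the encoding FREE = 1 / USED = 0 are assumptions, and merging of adjacent free blocks is left out. For arrays in M the length would instead occupy four bytes at `M[p-5]` … `M[p-2]`.

```python
H = [0] * 64          # hypothetical plain-Python stand-in for H (32-bit entries)
FREE, USED = 1, 0     # assumed encoding of the free marker at H[p-1]


def init_heap():
    """Turn all of H into one free block: H[0] = length, H[1] = marker, data from H[2]."""
    H[0] = len(H) - 2
    H[1] = FREE


def alloc(size):
    """First-fit allocation; return a pointer to the first element, or None."""
    p = 2                              # first element of the first block
    while p < len(H):
        length, marker = H[p - 2], H[p - 1]
        if marker == FREE and length >= size:
            rest = length - size - 2   # data left over after a new 2-entry header
            if rest > 0:               # split: write the header of the remainder
                H[p - 2] = size
                H[p + size] = rest
                H[p + size + 1] = FREE
            H[p - 1] = USED
            return p
        p += length + 2                # skip this block's data and the next header
    return None


def free(p):
    """Mark the block as free again (no merging of neighbours in this sketch)."""
    H[p - 1] = FREE


def length_of(p):
    """Plays the role of len_hH() in this model: the length sits two entries before p."""
    return H[p - 2]


init_heap()
a = alloc(10)   # 2: header in H[0..1], data in H[2..11]
b = alloc(5)    # 14: header in H[12..13], data in H[14..18]
```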

The last addressable starting point for any list is 2147483647 == (1<<31) - 1,
because the compiler uses the 32nd bit to distinguish between pointers into M and pointers into H.
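
A sketch of such a tagging scheme in plain Python: addresses up to 2**31 - 1 fit into the lower 31 bits, and the 32nd bit (bit 31) marks the memory area. Which bit value the compiler actually assigns to M and which to H is an assumption here, as are the helper names `tag_pointer` and `untag_pointer`.

```python
BIT32 = 1 << 31                  # the 32nd bit, used as the M/H tag
MAX_START = (1 << 31) - 1        # 2147483647, last usable starting point


def tag_pointer(addr, in_m):
    """Combine an address with the area tag (assumed: bit set means M)."""
    if not 0 <= addr <= MAX_START:
        raise ValueError("address does not fit into 31 bits")
    return addr | BIT32 if in_m else addr


def untag_pointer(ref):
    """Split a 32-bit reference into (address, is_m)."""
    return ref & MAX_START, (ref & BIT32) != 0
```

An address of 2**31 or above would collide with the tag bit, which is why 2147483647 is the last possible starting point.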
208218
0 commit comments