LZMA SDK 9.20
-------------

LZMA SDK provides the documentation, samples, header files, libraries,
and tools you need to develop applications that use LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the famous LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
high decompression speed and low memory requirements for
decompression.



LICENSE
-------

LZMA SDK is written and placed in the public domain by Igor Pavlov.

Some code in LZMA SDK is based on public domain code from other developers:
  1) PPMd var.H (2001): Dmitry Shkarin
  2) SHA-256: Wei Dai (Crypto++ library)


LZMA SDK Contents
-----------------

LZMA SDK includes:

  - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
  - Compiled file->file LZMA compressing/decompressing program for Windows


UNIX/Linux version
------------------
To compile the C++ version of the file->file LZMA encoder, go to the directory
CPP/7zip/Bundles/LzmaCon
and call make to recompile it:
  make -f makefile.gcc clean all

On some UNIX/Linux systems you must compile LZMA with static libraries.
To compile with static libraries, you can use
LIB = -lm -static


Files
-----
lzma.txt     - LZMA SDK description (this file)
7zFormat.txt - 7z Format description
7zC.txt      - 7z ANSI-C Decoder description
methods.txt  - Compression method IDs for .7z
lzma.exe     - Compiled file->file LZMA encoder/decoder for Windows
7zr.exe      - 7-Zip with 7z/lzma/xz support.
history.txt  - history of the LZMA SDK


Source code structure
---------------------

C/  - C files
        7zCrc*.*   - CRC code
        Alloc.*    - Memory allocation functions
        Bra*.*     - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
        LzFind.*   - Match finder for LZ (LZMA) encoders
        LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreading encoding
        LzHash.h   - Additional file for LZ match finder
        LzmaDec.*  - LZMA decoding
        LzmaEnc.*  - LZMA encoding
        LzmaLib.*  - LZMA Library for DLL calling
        Types.h    - Basic types for other .c files
        Threads.*  - The code for multithreading.

    LzmaLib  - LZMA Library (.DLL for Windows)

    LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).

    Archive - files related to archiving
      7z     - 7z ANSI-C Decoder

CPP/ - C++ files

  Common  - common files for C++ projects
  Windows - common files for Windows related code

  7zip    - files related to 7-Zip Project

    Common   - common files for 7-Zip

    Compress - files related to compression/decompression

    Archive - files related to archiving

      Common   - common files for archive handling
      7z       - 7z C++ Encoder/Decoder

    Bundles    - Modules that are bundles of other modules

      Alone7z           - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
      LzmaCon           - lzma.exe: LZMA compression/decompression
      Format7zR         - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
      Format7zExtractR  - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.

    UI        - User Interface files

      Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
      Common   - Common UI files
      Console  - Code for console archiver



CS/ - C# files
  7zip
    Common   - some common files for 7-Zip
    Compress - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      LzmaAlone    - file->file LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)

Java/  - Java files
  SevenZip
    Compression    - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)


C/C++ source code of LZMA SDK is part of the 7-Zip project.
7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/



LZMA features
-------------
  - Variable dictionary size (up to 1 GB)
  - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
  - Estimated decompressing speed:
      - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
      - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
  - Small memory requirements for decompressing (16 KB + DictionarySize)
  - Small code size for decompressing: 5-8 KB

LZMA decoder uses only integer operations and can be
implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations
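
All three operations listed above appear when a range decoder decodes a
single bit. The following is a minimal sketch of that step, not the SDK's
own code: the names RangeDec and rc_decode_bit are hypothetical, and the
constants (11-bit probabilities, adaptation shift of 5, refill threshold
of 2^24) are assumed here as the values typically used by LZMA range coders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; constants assumed, see the note above. */
#define NUM_BIT_MODEL_TOTAL_BITS 11
#define BIT_MODEL_TOTAL (1u << NUM_BIT_MODEL_TOTAL_BITS)
#define NUM_MOVE_BITS 5
#define TOP_VALUE (1u << 24)

typedef struct {
    uint32_t range;      /* current interval width */
    uint32_t code;       /* current position inside the interval */
    const uint8_t *buf;  /* remaining compressed bytes */
    size_t avail;        /* number of bytes left in buf */
} RangeDec;

/* Decode one bit; *prob is the adaptive 11-bit probability that the bit is 0. */
static unsigned rc_decode_bit(RangeDec *rc, uint16_t *prob)
{
    uint32_t bound;

    /* 32-bit shifts: refill one byte when the interval gets too narrow. */
    if (rc->range < TOP_VALUE) {
        uint8_t next = 0;
        if (rc->avail) { next = *rc->buf++; rc->avail--; }
        rc->range <<= 8;
        rc->code = (rc->code << 8) | next;
    }

    /* 32*16 bit multiply: split the interval according to the probability. */
    bound = (rc->range >> NUM_BIT_MODEL_TOTAL_BITS) * (uint32_t)*prob;

    /* Data-dependent branch: this is where mispredictions cost time. */
    if (rc->code < bound) {
        rc->range = bound;
        *prob = (uint16_t)(*prob + ((BIT_MODEL_TOTAL - *prob) >> NUM_MOVE_BITS));
        return 0;
    }
    rc->range -= bound;
    rc->code -= bound;
    *prob = (uint16_t)(*prob - (*prob >> NUM_MOVE_BITS));
    return 1;
}
```

Each decoded bit costs one multiply, one hard-to-predict branch and a few
shifts, so the per-bit cost is dominated by the CPU rather than by memory.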

The speed of LZMA decompression mostly depends on CPU speed.
Memory speed matters little. But if your CPU has a small data cache,
the overall weight of memory speed will slightly increase.


How To Use
----------

Using LZMA encoder/decoder executable
-------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  e: encode file

  d: decode file

  b: Benchmark. There are two tests: compressing and decompressing
     with LZMA method. Benchmark shows rating in MIPS (million
     instructions per second). Rating value is calculated from
     measured speed and it is normalized with Intel's Core 2 results.
     Benchmark also checks for possible hardware errors (RAM
     errors in most cases). Benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only the -d parameter.
     You can also change the number of iterations. Example for 30 iterations:
       LZMA b 30
     Default number of iterations is 10.

<Switches>


  -a{N}:  set compression mode 0 = fast, 1 = normal
          default: 1 (normal)

  -d{N}:  set dictionary size - [0, 30], default: 23 (8MB)
          The maximum value for dictionary size is 1 GB = 2^30 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with dictionary
          size D = 2^N you need about D bytes of memory (RAM).

  -fb{N}: set number of fast bytes - [5, 273], default: 128
          Usually a bigger number gives a slightly better compression ratio
          and a slower compression process.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          lp switch is intended for periodical data when the period is
          equal to 2^N. For example, for 32-bit (4 bytes)
          periodical data you can use lp=2. Often it's better to set lc0,
          if you change lp switch.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          pb switch is intended for periodical data
          when the period is equal to 2^N.

  -mf{MF_ID}: set Match Finder. Default: bt4.
              Algorithms from the hc* group don't provide a good compression
              ratio, but they often work pretty fast in combination with
              fast mode (-a0).

              Memory requirements depend on dictionary size
              (parameter "d" in table below).

                MF_ID  Memory          Description

                bt2    d *  9.5 + 4MB  Binary Tree with 2 bytes hashing.
                bt3    d * 11.5 + 4MB  Binary Tree with 3 bytes hashing.
                bt4    d * 11.5 + 4MB  Binary Tree with 4 bytes hashing.
                hc4    d *  7.5 + 4MB  Hash Chain with 4 bytes hashing.

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          eos marker, since LZMA decoder knows uncompressed size
          stored in .lzma file header.

  -si:    Read data from stdin (it will write End Of Stream marker).
  -so:    Write data to stdout
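
The arithmetic behind the -d and -mf switches can be written out in a few
lines of C. This is only an illustrative sketch: the function names
lzma_dict_size and lzma_mf_memory are hypothetical (they are not part of
the SDK), and "4MB" from the table is assumed to mean 4 * 2^20 bytes.

```c
#include <stdint.h>
#include <string.h>

/* DictionarySize = 2^N bytes for -d{N}; returns 0 if N is outside [0, 30]. */
static uint64_t lzma_dict_size(unsigned n)
{
    return n <= 30 ? (uint64_t)1 << n : 0;
}

/* Approximate encoder memory in bytes for a match finder ID, following the
   table above: d * factor + 4 MB. Returns 0 for an unknown ID. */
static uint64_t lzma_mf_memory(const char *mf_id, uint64_t d)
{
    /* The factors 9.5, 11.5 and 7.5 are stored doubled to stay in integers. */
    unsigned twice;
    if (strcmp(mf_id, "bt2") == 0)
        twice = 19;
    else if (strcmp(mf_id, "bt3") == 0 || strcmp(mf_id, "bt4") == 0)
        twice = 23;
    else if (strcmp(mf_id, "hc4") == 0)
        twice = 15;
    else
        return 0;
    return d * twice / 2 + ((uint64_t)4 << 20);
}
```

For the defaults (-d23 with bt4) this gives 8 MB * 11.5 + 4 MB = 96 MB for
the encoder, while the decoder needs only about the 8 MB dictionary.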


Examples:

1) LZMA e file.bin file.lzma -d16 -lc0

compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
and 0 literal context bits. -lc0 reduces memory requirements
for decompression.


2) LZMA e file.bin file.lzma -lc0 -lp2

compresses file.bin to file.lzma with settings suitable
for 32-bit periodical data (for example, ARM or MIPS code).

3) LZMA d file.lzma file.bin

decompresses file.lzma to file.bin.


Compression ratio hints
-----------------------

Recommendations
---------------

To increase the compression ratio for LZMA compression, it's desirable
to have aligned data (if possible), and also to arrange the data
so that code is grouped in one place and data is
grouped in another (that is better than mixing them: code, data, code,
data, ...).


Filters
-------
You can increase the compression ratio for some data types by using
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find C source code of such filters in C/Bra*.* files.

You can check the compression ratio gain of these filters with the following
7-Zip commands (example for ARM code):
No filter:
  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=arm -m1=lzma

It works like this:
Compressing    = Filter_encoding + LZMA_encoding
Decompressing  = LZMA_decoding + Filter_decoding

Compressing and decompressing speed of such filters is very high,
so they will not increase decompressing time much.
Moreover, filtering reduces decompression time for LZMA_decoding,
since the compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible.
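
To illustrate the idea, here is a deliberately simplified sketch for x86
CALL instructions (opcode 0xE8 followed by a 32-bit little-endian relative
offset). It is NOT the filter from C/Bra*.*, which applies extra checks to
avoid converting bytes that are not real CALL instructions; the name
toy_call_filter and the helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Toy CALL filter (hypothetical, simplified).
   encoding != 0: relative offset -> absolute address (before compression)
   encoding == 0: absolute address -> relative offset (after decompression) */
static void toy_call_filter(uint8_t *buf, size_t size, int encoding)
{
    size_t i = 0;
    while (i + 5 <= size) {
        if (buf[i] == 0xE8) {
            /* A relative CALL is measured from the next instruction. */
            uint32_t next = (uint32_t)(i + 5);
            uint32_t v = rd32le(buf + i + 1);
            wr32le(buf + i + 1, encoding ? v + next : v - next);
            i += 5;  /* skip the operand so both directions scan identically */
        } else {
            i++;
        }
    }
}
```

Two calls to the same procedure from different places have different
relative offsets but the same absolute address, so after filtering they
produce identical byte sequences that the LZ match finder can reuse.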

For some ISAs (for example, for MIPS) it's impossible to get a gain from such a filter.


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
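
The 13-byte header above can be unpacked with a few lines of C. This is a
sketch, not SDK code: the LzmaHeader type and lzma_parse_header function
are hypothetical names, and the packing of the first byte as
(pb * 5 + lp) * 9 + lc is stated here as an assumption, since the table
only says "encoded form".

```c
#include <stdint.h>

typedef struct {
    unsigned lc, lp, pb;     /* literal context, literal pos and pos bits */
    uint32_t dict_size;      /* dictionary size in bytes */
    uint64_t unpack_size;    /* all bits set (-1) means unknown size */
} LzmaHeader;

/* Parse the 13-byte .lzma header.
   Returns 0 on success, -1 if the properties byte is out of range. */
static int lzma_parse_header(const uint8_t h[13], LzmaHeader *out)
{
    unsigned d = h[0];
    int i;

    /* Assumed encoding: props = (pb * 5 + lp) * 9 + lc */
    if (d >= 9 * 5 * 5)
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    /* Offset 1: dictionary size, 4 bytes, little endian */
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);

    /* Offset 5: uncompressed size, 8 bytes, little endian */
    out->unpack_size = 0;
    for (i = 0; i < 8; i++)
        out->unpack_size |= (uint64_t)h[5 + i] << (8 * i);
    return 0;
}
```

With the default switches (lc=3, lp=0, pb=2) the first byte works out to
(2 * 5 + 0) * 9 + 3 = 93 = 0x5D under the assumed encoding.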


ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of LZMA SDK
from the sourceforge.net site.

To use ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + Types.h
LzmaUtil/LzmaUtil.c is an example application that uses these files.


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of LZMA decoding function for local variables is not
larger than 200-400 bytes.

LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
With the default settings (lc=3, lp=0), state_size = 16 KB.
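
The state_size formula is easy to evaluate in integer arithmetic once it is
expressed in bytes (4 KB = 4096 bytes, 1.5 KB = 1536 bytes). The helper
names below are hypothetical and only illustrate the formula; they are not
SDK functions.

```c
#include <stdint.h>

/* state_size = (4 + (1.5 << (lc + lp))) KB, returned in bytes.
   Valid for lc in [0, 8] and lp in [0, 4]. */
static uint32_t lzma_state_size(unsigned lc, unsigned lp)
{
    return 4096u + (1536u << (lc + lp));
}

/* Rough decoder memory: internal state plus the dictionary buffer. */
static uint64_t lzma_decoder_memory(unsigned lc, unsigned lp, uint64_t dict_size)
{
    return (uint64_t)lzma_state_size(lc, lp) + dict_size;
}
```

For lc=3, lp=0 this gives 16384 bytes (16 KB). With lc=0 the state shrinks
to 5632 bytes (5.5 KB), which is why -lc0 is useful on small targets.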


How To decompress data
----------------------

LZMA Decoder (ANSI-C version) now supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must use an external allocator.
Example:
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

You can use the p = p; statement to suppress unused-parameter compiler warnings.


Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Returns:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If LZMA decoder sees end_marker before reaching output limit, it returns OK result,
  and output value of destLen will be less than output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check Result and "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know real compressedSize.
       You must use correct finish mode in that case.


Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init LzmaDec structure before any new LZMA stream. And call LzmaDec_DecodeToBuf in loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    res = LzmaDec_DecodeToBuf(&state, dest, &destLen,
        src, &srcLen, finishMode, &status);
    ...
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

For a full code example, see C/LzmaUtil/LzmaUtil.c.


How To compress data
--------------------

Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size

LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

See C/LzmaUtil/LzmaUtil.c for an example.

When to use: file->file compressing

1) You must implement callback structures for interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress
ISzAlloc

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) {  p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize CLzmaEncProps properties:

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send LZMA properties to LZMA Encoder

  res = LzmaEnc_SetProps(enc, &props);

5) Write encoded properties to header

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy LZMA Encoder Object
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);


If a callback function returns an error code, LzmaEnc_Encode also returns that code,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.


Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

HRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)



Defines
-------

_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use size_t type.


_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in ANSI-C .7z decoder.


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
C++ LZMA code uses COM-like interfaces. So if you want to use it,
you can study the basics of COM/OLE.
C++ LZMA code is just a wrapper over ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must check that you work correctly with the "new" operator.
7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:
void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
If you use a version of MSVC that throws an exception from the "new" operator, you can compile without
"NewHandler.cpp", so the standard exception will be used. Actually, some code in
7-Zip catches any exception in internal code and converts it to an HRESULT code.
So you don't need to catch CNewException if you call COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html