<html><head><title>LZ4 Framing format - stable</title><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css">@import url('https://themes.googleusercontent.com/fonts/css?kit=wAPX1HepqA24RkYW1AuHYA');ol{margin:0;padding:0}.c13{max-width:453.6pt;background-color:#ffffff;padding:70.8pt 70.8pt 70.8pt 70.8pt}.c3{font-size:10pt;font-family:"Courier New";font-weight:bold}.c0{font-size:14pt;text-decoration:underline;font-weight:bold}.c8{color:inherit;text-decoration:inherit}.c1{text-decoration:underline;font-weight:bold}.c4{color:#1155cc;text-decoration:underline}.c7{line-height:1.0;padding-bottom:0pt}.c6{margin-left:36pt}.c12{font-style:italic}.c10{text-align:center}.c18{font-size:14pt}.c17{color:#0000ff}.c5{height:11pt}.c15{font-size:18pt}.c11{text-decoration:underline}.c2{direction:ltr}.c9{font-weight:bold}.c16{font-family:"Courier New"}.c14{margin-left:18pt}.title{padding-top:12pt;line-height:1.15;text-align:center;color:#000000;font-size:16pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}.subtitle{padding-top:0pt;line-height:1.15;text-align:center;color:#000000;font-size:11pt;font-family:"Arial";padding-bottom:3pt}li{color:#000000;font-size:11pt;font-family:"Calibri"}p{color:#000000;font-size:11pt;margin:0;font-family:"Calibri"}h1{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:16pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}h2{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-style:italic;font-size:14pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}h3{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:13pt;font-family:"Arial";font-weight:bold;padding-bottom:3pt}h4{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:14pt;font-family:"Calibri";font-weight:bold;padding-bottom:3pt}h5{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-style:italic;font-size:13pt;font-family:"Calibri";font-weight:bold;padding-bottom:3pt}h6{padding-top:12pt;line-height:1.15;text-align:left;color:#000000;font-size:11pt;font-family:"Calibri";font-weight:bold;padding-bottom:3pt}</style></head>
<body class="c13"><hr><p class="c10 c2"><span class="c15 c9">LZ4 </span><span class="c15 c9">Framing </span><span class="c15 c9">Format</span></p><hr><p class="c5 c2"><span class="c9 c15"></span></p><p class="c2"><span class="c0">Notices</span></p><p class="c2"><span>Copyright (c) 2013-2014 Yann Collet</span></p><p class="c2"><span>Permission is granted to copy and distribute this document for any purpose and without charge, including translations into other languages and incorporation into compilations, provided that the copyright notice and this notice are preserved, and that any substantive changes or deletions from the original are clearly marked.</span></p><p class="c2"><span class="c0">Version</span></p><p class="c2"><span>1.4.1</span></p><h1 class="c2"><a name="h.2z5bl598dfq9"></a><span>Introduction</span></h1><p class="c2"><span>The purpose of this document is to define a lossless compressed data format that is independent of CPU type, operating system, file system and character set, and that is suitable for file compression, pipe and streaming compression, using the LZ4 algorithm: </span><span class="c11 c17"><a class="c8" href="http://code.google.com/p/lz4/">http://code.google.com/p/lz4/</a></span></p><p class="c2"><span>The data can be produced or consumed, even for an arbitrarily long sequentially presented input data stream, using only an a priori bounded amount of intermediate storage, and hence can be used in data communications. 
The format uses the LZ4 compression method, and the </span><span class="c4"><a class="c8" href="http://code.google.com/p/xxhash/">xxHash-32</a></span><span>&nbsp;checksum method for detection of data corruption.</span></p><p class="c2"><span>The data format defined by this specification does not attempt to allow random access to compressed data.</span></p><p class="c2"><span>This specification is intended for use by implementers of software to compress data into LZ4 format and/or decompress data from LZ4 format. The text of the specification assumes a basic background in programming at the level of bits and other primitive data representations.</span></p><p class="c2"><span>Unless otherwise indicated below, a compliant compressor must produce data sets that conform to all the specifications presented here.</span></p><p class="c2"><span>A compliant decompressor must be able to accept and decompress at least one data set that conforms to the specifications presented here; whenever it does not support a given parameter, it must produce an unambiguous error code and an associated error message explaining which parameter value is unsupported (a typical example being an unsupported buffer size).</span></p><p class="c2"><span>Distribution of this document is unlimited.</span></p><p class="c7 c5 c2"><span></span></p><hr style="page-break-before:always;display:none;"><p class="c7 c5 c2"><span></span></p><p class="c2 c7"><span class="c0">Summary </span><span class="c1">:</span></p><p class="c7 c5 c2"><span></span></p><p class="c2 c14"><span class="c4"><a class="c8" href="#h.2z5bl598dfq9">Introduction</a></span></p><p class="c2 c14"><span class="c4">General structure of </span><span class="c4"><a class="c8" href="#h.1615sutikt7e">LZ4 Framing Format</a></span></p><p class="c2 c6"><span class="c4">Frame </span><span class="c4"><a class="c8" href="#h.uof0plru1f66">Descriptor</a></span></p><p class="c6 c2"><span 
class="c4"><a class="c8" href="#h.u8dkhfnwqyg">Data Blocks</a></span></p><p class="c2 c14"><span class="c4"><a class="c8" href="#h.152pfqac8luc">Skippable Frames</a></span></p><p class="c2 c14"><span class="c4"><a class="c8" href="#h.ujcdmapf87vn">Legacy format</a></span></p><p class="c2 c14"><span class="c4"><a class="c8" href="#h.zij6fhosmkvv">Appendix</a></span></p><p class="c5 c2"><span></span></p><p class="c5 c2"><span class="c0"></span></p><hr style="page-break-before:always;display:none;"><p class="c5 c2"><span class="c0"></span></p><h1 class="c2"><a name="h.1615sutikt7e"></a><span class="c11">General Structure of </span><span class="c11">LZ4 Framing format</span></h1><p class="c5 c2 c10"><span class="c0"></span></p><p class="c10 c2"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 408.00px; height: 106.00px;"><img alt="LZ4 Framing Format - General Structure.png" src="images/image05.png" style="width: 408.00px; height: 106.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c5 c2"><span class="c0"></span></p><p class="c2"><span class="c1">Magic Number</span></p><p class="c2"><span>4 Bytes, </span><span class="c11">Little endian</span><span>&nbsp;format.<br>Value : </span><span class="c9 c16">0x184D2204</span></p><p class="c5 c2"><span class="c3"></span></p><p class="c2"><span class="c1">Frame Descriptor</span></p><p class="c2"><span>3 to 15 Bytes, detailed in the next section.<br>This is the most significant part of the specification.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Data Blocks</span></p><p class="c2"><span>To 
be detailed later on.<br>This is where the compressed data is stored.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">EndMark</span></p><p class="c2"><span>The flow of blocks ends with a data block whose size is &ldquo;</span><span class="c9">0</span><span>&rdquo;.<br>The size is expressed as a 32-bit value.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Content Checksum</span></p><p class="c2"><span>The Content Checksum verifies that the full content has been decoded correctly.<br>The content checksum is the result of the </span><span class="c4"><a class="c8" href="http://code.google.com/p/xxhash/">xxh32()</a></span><span>&nbsp;hash function applied to the original (decoded) data, using a seed of zero.<br>The content checksum is only present when its </span><span class="c4"><a class="c8" href="#id.s5zerkv6retr">associated flag </a></span><span>is set in the frame descriptor. It validates that all blocks were fully transmitted, in the correct order and without error, and also that the encoding/decoding process itself introduced no distortion. Its usage is recommended. </span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Frame Concatenation</span></p><p class="c2"><span>In some circumstances, it may be preferable to append multiple frames, for example in order to add new data to an existing compressed file without re-framing it.</span></p><p class="c2"><span>In such a case, each frame has its own set of descriptor flags. Each frame is independent; the only relation between frames is their sequential order.</span></p><p class="c2"><span>The ability to decode multiple concatenated frames within a single stream or file is left outside the scope of this specification. 
While a logical default behavior could be to decode the frames in their sequential order, this is not a requirement. </span></p><p class="c5 c2"><span></span></p><hr style="page-break-before:always;display:none;"><p class="c5 c2"><span></span></p><h2 class="c2"><a name="h.uof0plru1f66"></a><span class="c11">Frame </span><span class="c11">Descriptor</span></h2><p class="c10 c2"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 292.00px; height: 106.00px;"><img alt="LZ4 Framing Format - Frame Descriptor.png" src="images/image03.png" style="width: 292.00px; height: 106.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c10 c2"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 606.00px; height: 114.67px;"><img alt="LZ4 Framing Format - Descriptor Flags.png" src="images/image02.png" style="width: 606.00px; height: 114.67px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c2"><span>The descriptor uses a minimum of 3 bytes, and up to 15 bytes depending on optional parameters.<br>In the picture, bit 7 is the highest bit, while bit 0 is the lowest.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Version Number :</span></p><p class="c2"><span>2-bit field, </span><span class="c1">must</span><span class="c9">&nbsp;</span><span>be set to &ldquo;</span><span class="c9">01</span><span>&rdquo;.<br>Any other value cannot be decoded by this 
</span><span>version of the specification.</span><span><br>Other version numbers will use different flag layouts.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Block Independence flag :</span></p><p class="c2"><span>If this flag is set to &ldquo;1&rdquo;, blocks are independent, and can therefore be decoded independently, in parallel.<br>If this flag is set to &ldquo;0&rdquo;, each block depends on previous ones for decoding (up to the LZ4 window size, which is 64 KB). In this case, all blocks must be decoded in sequence.</span></p><p class="c2"><span>Block dependency improves the compression ratio, especially for small blocks. On the other hand, it makes random jumps and multi-threaded decoding impossible.</span></p><p class="c5 c2"><span></span></p><a href="#" name="id.r4mqxzdxswxz"></a><p class="c2"><span class="c1">Block checksum flag :</span></p><p class="c2"><span>If this flag is set, each data block will be followed by a 4-byte checksum, calculated by using the xxHash-32 algorithm on the raw (compressed) data block.</span></p><p class="c5 c2"><span></span></p><h2 class="c2"><a name="h.u8dkhfnwqyg"></a><span class="c11">Data Blocks</span></h2><p class="c2"><span class="c1">Block Size</span></p><p class="c2"><span>This field uses 4 bytes, in little endian format.<br>The highest bit is &ldquo;1&rdquo; if data in the block is uncompressed.<br>The highest bit is &ldquo;0&rdquo; if data in the block is compressed by LZ4.</span></p><p class="c2"><span>All other bits give the size, in bytes, of the following data block (the size does not include the block checksum, if present).</span></p><p class="c2"><span>Block Size shall never be larger than Block Maximum Size. This could otherwise happen when the original data is incompressible; in that case, the data block shall be passed in uncompressed format.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Data</span></p><p class="c2"><span>This is where the actual data to decode is stored. It may be compressed or not, depending on the preceding Block Size field.<br>The uncompressed size of Data can be anything up to &ldquo;block maximum size&rdquo;. 
<br>Note that a data block is not necessarily full: an arbitrary &ldquo;flush&rdquo; may happen at any time, so any block can be </span><span>&ldquo;partially filled&rdquo;.</span></p><p class="c5 c2"><span></span></p><a href="#" name="id.3p4pcqe6ab8n"></a><p class="c2"><span class="c1">Block checksum :</span></p><p class="c2"><span>Only present if the </span><span class="c4"><a class="c8" href="#id.r4mqxzdxswxz">associated flag is set</a></span><span>.<br>This is a 4-byte checksum value, in little endian format, <br>calculated by using the xxHash-32 algorithm </span><span class="c11">on the raw (undecoded) data block</span><span>, <br>with a seed of zero.</span><span><br>The intention is to detect data corruption (storage or transmission errors) </span><span class="c12">before </span><span>decoding.</span></p><p class="c2"><span>Block checksum is cumulative with Content checksum: both may be present in the same frame.</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c2"><span class="c0"></span></p><h1 class="c2"><a name="h.152pfqac8luc"></a><span class="c11">Skippable Frames</span></h1><p class="c10 c2"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 285.00px; height: 106.00px;"><img alt="LZ4 Framing Format - Skippable Frame.png" src="images/image01.png" style="width: 285.00px; height: 106.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c2"><span>Skippable frames allow the integration of user-defined data into a flow of concatenated frames.<br>Their design is straightforward, with the sole objective of allowing the decoder to quickly skip over user-defined data and continue decoding.</span></p><p class="c2"><span>To facilitate identification, it is discouraged to start a flow 
of concatenated frames with a skippable frame. If such a flow must start with user data encapsulated in a skippable frame, it is recommended to start with a zero-byte LZ4 frame followed by the skippable frame. This makes the flow easier to recognize for file type identifiers.</span></p><p class="c2"><span>&nbsp;</span></p><p class="c2"><span class="c1">Magic Number</span></p><p class="c2"><span>4 Bytes, </span><span class="c11">Little endian</span><span>&nbsp;format.<br>Value : </span><span class="c3">0x184D2A5X</span><span>, which means any value from</span><span class="c3">&nbsp;0x184D2A50 to 0x184D2A5F.</span><span>&nbsp;All 16 values are valid to identify a skippable frame.<br></span></p><p class="c2"><span class="c1">Frame Size</span></p><p class="c2"><span>This is the size, in bytes, of the following User Data (excluding the magic number and the size field itself).<br>4 Bytes, </span><span class="c11">Little endian</span><span>&nbsp;format, unsigned 32-bit.<br>This means User Data can&rsquo;t be bigger than (2^32-1) bytes.<br></span></p><p class="c2"><span class="c1">User Data</span></p><p class="c2"><span>User Data can be anything. It is simply skipped by the decoder. 
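</span></p><p class="c2"><span>The skipping rule described above can be illustrated with the following minimal Python sketch. It is not part of this specification and not a reference implementation: the helper names (classify_magic, make_skippable_frame, skip_skippable_frames) are hypothetical, and the code assumes only the values and layouts stated in this document, namely a 4-byte little endian Magic Number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little endian unsigned Frame Size that excludes both header fields.</span></p>

```python
import struct

# Hypothetical helpers for illustration only; not part of any LZ4 library API.
# Magic numbers as given in this specification (stored little endian on disk).
LZ4_FRAME_MAGIC = 0x184D2204
LEGACY_FRAME_MAGIC = 0x184C2102
SKIPPABLE_MAGIC_MIN = 0x184D2A50  # all 16 values up to 0x184D2A5F are valid
SKIPPABLE_MAGIC_MAX = 0x184D2A5F


def classify_magic(magic: int) -> str:
    """Map a 32-bit magic number to the kind of frame it introduces."""
    if magic == LZ4_FRAME_MAGIC:
        return "lz4"
    if SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        return "skippable"
    if magic == LEGACY_FRAME_MAGIC:
        return "legacy"
    return "unknown"


def make_skippable_frame(user_data: bytes, variant: int = 0) -> bytes:
    """Wrap user_data in a skippable frame; variant (0..15) picks the magic value."""
    if not 0 <= variant <= 0xF:
        raise ValueError("variant must be in 0..15")
    if len(user_data) > 0xFFFFFFFF:
        raise ValueError("User Data cannot exceed 2^32-1 bytes")
    # Magic Number (4 bytes LE) + Frame Size (4 bytes LE, unsigned) + User Data.
    header = struct.pack("<II", SKIPPABLE_MAGIC_MIN + variant, len(user_data))
    return header + user_data


def skip_skippable_frames(buf: bytes, pos: int = 0) -> int:
    """Return the offset of the first non-skippable frame at or after pos."""
    while pos + 4 <= len(buf):
        (magic,) = struct.unpack_from("<I", buf, pos)
        if classify_magic(magic) != "skippable":
            break  # an LZ4 frame, a legacy frame, or unknown data starts here
        if pos + 8 > len(buf):
            raise ValueError("truncated skippable frame header")
        (frame_size,) = struct.unpack_from("<I", buf, pos + 4)
        # Frame Size excludes the magic number and the size field itself.
        end = pos + 8 + frame_size
        if end > len(buf):
            raise ValueError("truncated skippable frame User Data")
        pos = end
    return pos


# Usage: one skippable frame carrying b"hello", then the 4-byte magic of an LZ4 frame.
stream = make_skippable_frame(b"hello", 3) + struct.pack("<I", LZ4_FRAME_MAGIC)
offset = skip_skippable_frames(stream)
(next_magic,) = struct.unpack_from("<I", stream, offset)
print(offset, classify_magic(next_magic))  # prints: 13 lz4
```

<p class="c2"><span>Because Frame Size counts only the User Data, each skippable frame occupies exactly 8 + Frame Size bytes, so a decoder can hop over it without inspecting its contents; the loop stops at the first magic number that does not identify a skippable frame.</span></p><p class="c5 c2"><span>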
</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c2"><span class="c0"></span></p><h1 class="c2"><a name="h.ujcdmapf87vn"></a><span class="c11">Legacy frame</span></h1><p class="c10 c2"><span style="overflow: hidden; display: inline-block; margin: 0.00px 0.00px; border: 0.00px solid #000000; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px); width: 570.00px; height: 90.00px;"><img alt="" src="images/image00.png" style="width: 570.00px; height: 90.00px; margin-left: 0.00px; margin-top: 0.00px; transform: rotate(0.00rad) translateZ(0px); -webkit-transform: rotate(0.00rad) translateZ(0px);" title=""></span></p><p class="c2"><span>The Legacy frame format was defined in the initial versions of &ldquo;LZ4Demo&rdquo;.<br>Newer compressors should no longer use this format, since it is too restrictive.<br>It is recommended that decompressors be able to decode this format during the transition period.</span></p><p class="c2"><span>Main properties of the legacy format :<br>- Fixed block size : 8 MB.<br>- All blocks must be completely filled, except the last one.<br>- All blocks are always compressed, even when compression is detrimental.<br>- The last block is detected either because it is followed by the &ldquo;EOF&rdquo; (End of File) mark, or because it is followed by a known Frame Magic Number.<br>- No checksum.<br>- Convention is Little endian.</span></p><p class="c5 c2"><span></span></p><p class="c2"><span class="c1">Magic Number</span></p><p class="c2"><span>4 Bytes, </span><span class="c11">Little endian</span><span>&nbsp;format.<br>Value : </span><span class="c3">0x184C2102<br></span></p><p class="c2"><span class="c1">Block Compressed Size</span></p><p class="c2"><span>This is the size, in bytes, of the following compressed data block.<br>4 Bytes, </span><span class="c11">Little 
endian</span><span>&nbsp;format.<br></span></p><p class="c2"><span class="c1">Data</span></p><p class="c2"><span>This is where the actual data is stored. <br>Data is </span><span class="c11">always</span><span>&nbsp;compressed, even when compression is detrimental (i.e. when the compressed block is larger than the original data).</span></p><hr style="page-break-before:always;display:none;"><p class="c5 c2"><span class="c0"></span></p><h1 class="c2"><a name="h.zij6fhosmkvv"></a><span class="c1">Appendix </span><span>&nbsp;</span></h1><p class="c2"><span class="c18">Version changes</span></p><p class="c2"><span>1.4.1 : changed wording from &ldquo;stream&rdquo; to &ldquo;frame&rdquo;</span></p><p class="c2"><span>1.4 : added skippable streams, re-added stream checksum </span></p><p class="c2"><span>1.3 : modified header checksum</span></p><p class="c2"><span>1.2 : reduced choice of &ldquo;block size&rdquo;, to postpone decision on &ldquo;dynamic size of BlockSize Field&rdquo;.</span></p><p class="c2"><span>1.1 : optional fields are now part of the descriptor</span></p><p class="c2"><span>1.0 : changed &ldquo;block size&rdquo; specification, adding a compressed