      1 <html>
      2 <head>
      3 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
      4 <title>bzip2 and libbzip2, version 1.0.6</title>
      5 <meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
      6 <style type="text/css" media="screen">/* Colours:
      7 #74240f  dark brown      h1, h2, h3, h4
      8 #336699  medium blue     links
      9 #339999  turquoise       link hover colour
     10 #202020  almost black    general text
     11 #761596  purple          md5sum text
     12 #626262  dark gray       pre border
     13 #eeeeee  very light gray pre background
     14 #f2f2f9  very light blue nav table background
     15 #3366cc  medium blue     nav table border
     16 */
     17 
     18 a, a:link, a:visited, a:active { color: #336699; }
     19 a:hover { color: #339999; }
     20 
     21 body { font: 80%/126% sans-serif; }
     22 h1, h2, h3, h4 { color: #74240f; }
     23 
     24 dt { color: #336699; font-weight: bold }
     25 dd { 
     26  margin-left: 1.5em; 
     27  padding-bottom: 0.8em;
     28 }
     29 
     30 /* -- ruler -- */
     31 div.hr_blue { 
     32   height:  3px; 
     33   background:#ffffff url("/images/hr_blue.png") repeat-x; }
     34 div.hr_blue hr { display:none; }
     35 
     36 /* release styles */
     37 #release p { margin-top: 0.4em; }
     38 #release .md5sum { color: #761596; }
     39 
     40 
     41 /* ------ styles for docs|manuals|howto ------ */
     42 /* -- lists -- */
     43 ul  { 
     44  margin:     0px 4px 16px 16px;
     45  padding:    0px;
     46  list-style: url("/images/li-blue.png"); 
     47 }
     48 ul li { 
     49  margin-bottom: 10px;
     50 }
     51 ul ul	{ 
     52  list-style-type:  none; 
     53  list-style-image: none; 
     54  margin-left:      0px; 
     55 }
     56 
     57 /* header / footer nav tables */
     58 table.nav {
     59  border:     solid 1px #3366cc;
     60  background: #f2f2f9;
     61  background-color: #f2f2f9;
     62  margin-bottom: 0.5em;
     63 }
     64 /* don't have underlined links in chunked nav menus */
     65 table.nav a { text-decoration: none; }
     66 table.nav a:hover { text-decoration: underline; }
     67 table.nav td { font-size: 85%; }
     68 
     69 code, tt, pre { font-size: 120%; }
     70 code, tt { color: #761596; }
     71 
     72 div.literallayout, pre.programlisting, pre.screen {
     73  color:      #000000;
     74  padding:    0.5em;
     75  background: #eeeeee;
     76  border:     1px solid #626262;
     77  background-color: #eeeeee;
     78  margin: 4px 0px 4px 0px; 
     79 }
     80 </style>
     81 </head>
     82 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="bzip2 and libbzip2, version 1.0.6">
     83 <div class="titlepage">
     84 <div>
     85 <div><h1 class="title">
     86 <a name="userman"></a>bzip2 and libbzip2, version 1.0.6</h1></div>
     87 <div><h2 class="subtitle">A program and library for data compression</h2></div>
     88 <div><div class="authorgroup"><div class="author">
     89 <h3 class="author">
     90 <span class="firstname">Julian</span> <span class="surname">Seward</span>
     91 </h3>
     92 <div class="affiliation"><span class="orgname">http://www.bzip.org<br></span></div>
     93 </div></div></div>
     94 <div><p class="releaseinfo">Version 1.0.6 of 6 September 2010</p></div>
     95 <div><p class="copyright">Copyright &copy; 1996-2010 Julian Seward</p></div>
     96 <div><div class="legalnotice" title="Legal Notice">
     97 <a name="id537185"></a><p>This program, <code class="computeroutput">bzip2</code>, the
     98   associated library <code class="computeroutput">libbzip2</code>, and
     99   all documentation, are copyright &copy; 1996-2010 Julian Seward.
    100   All rights reserved.</p>
    101 <p>Redistribution and use in source and binary forms, with
    102   or without modification, are permitted provided that the
    103   following conditions are met:</p>
    104 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
    105 <li class="listitem" style="list-style-type: disc"><p>Redistributions of source code must retain the
    106    above copyright notice, this list of conditions and the
    107    following disclaimer.</p></li>
    108 <li class="listitem" style="list-style-type: disc"><p>The origin of this software must not be
    109    misrepresented; you must not claim that you wrote the original
    110    software.  If you use this software in a product, an
    111    acknowledgment in the product documentation would be
    112    appreciated but is not required.</p></li>
    113 <li class="listitem" style="list-style-type: disc"><p>Altered source versions must be plainly marked
    114    as such, and must not be misrepresented as being the original
    115    software.</p></li>
    116 <li class="listitem" style="list-style-type: disc"><p>The name of the author may not be used to
    117    endorse or promote products derived from this software without
    118    specific prior written permission.</p></li>
    119 </ul></div>
    120 <p>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY
    121   EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
    122   THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
    123   PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
    124   AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
    125   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
    126   TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    127   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
    128   ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    129   LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
    130   IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
    131   THE POSSIBILITY OF SUCH DAMAGE.</p>
    132 <p>PATENTS: To the best of my knowledge,
    133  <code class="computeroutput">bzip2</code> and
    134  <code class="computeroutput">libbzip2</code> do not use any patented
    135  algorithms.  However, I do not have the resources to carry
    136  out a patent search.  Therefore I cannot give any guarantee of
    137  the above statement.
    138  </p>
    139 </div></div>
    140 </div>
    141 <hr>
    142 </div>
    143 <div class="toc">
    144 <p><b>Table of Contents</b></p>
    145 <dl>
    146 <dt><span class="chapter"><a href="#intro">1. Introduction</a></span></dt>
    147 <dt><span class="chapter"><a href="#using">2. How to use bzip2</a></span></dt>
    148 <dd><dl>
    149 <dt><span class="sect1"><a href="#name">2.1. NAME</a></span></dt>
    150 <dt><span class="sect1"><a href="#synopsis">2.2. SYNOPSIS</a></span></dt>
    151 <dt><span class="sect1"><a href="#description">2.3. DESCRIPTION</a></span></dt>
    152 <dt><span class="sect1"><a href="#options">2.4. OPTIONS</a></span></dt>
    153 <dt><span class="sect1"><a href="#memory-management">2.5. MEMORY MANAGEMENT</a></span></dt>
    154 <dt><span class="sect1"><a href="#recovering">2.6. RECOVERING DATA FROM DAMAGED FILES</a></span></dt>
    155 <dt><span class="sect1"><a href="#performance">2.7. PERFORMANCE NOTES</a></span></dt>
    156 <dt><span class="sect1"><a href="#caveats">2.8. CAVEATS</a></span></dt>
    157 <dt><span class="sect1"><a href="#author">2.9. AUTHOR</a></span></dt>
    158 </dl></dd>
    159 <dt><span class="chapter"><a href="#libprog">3. 
    160 Programming with <code class="computeroutput">libbzip2</code>
    161 </a></span></dt>
    162 <dd><dl>
    163 <dt><span class="sect1"><a href="#top-level">3.1. Top-level structure</a></span></dt>
    164 <dd><dl>
    165 <dt><span class="sect2"><a href="#ll-summary">3.1.1. Low-level summary</a></span></dt>
    166 <dt><span class="sect2"><a href="#hl-summary">3.1.2. High-level summary</a></span></dt>
    167 <dt><span class="sect2"><a href="#util-fns-summary">3.1.3. Utility functions summary</a></span></dt>
    168 </dl></dd>
    169 <dt><span class="sect1"><a href="#err-handling">3.2. Error handling</a></span></dt>
    170 <dt><span class="sect1"><a href="#low-level">3.3. Low-level interface</a></span></dt>
    171 <dd><dl>
    172 <dt><span class="sect2"><a href="#bzcompress-init">3.3.1. BZ2_bzCompressInit</a></span></dt>
    173 <dt><span class="sect2"><a href="#bzCompress">3.3.2. BZ2_bzCompress</a></span></dt>
    174 <dt><span class="sect2"><a href="#bzCompress-end">3.3.3. BZ2_bzCompressEnd</a></span></dt>
    175 <dt><span class="sect2"><a href="#bzDecompress-init">3.3.4. BZ2_bzDecompressInit</a></span></dt>
    176 <dt><span class="sect2"><a href="#bzDecompress">3.3.5. BZ2_bzDecompress</a></span></dt>
    177 <dt><span class="sect2"><a href="#bzDecompress-end">3.3.6. BZ2_bzDecompressEnd</a></span></dt>
    178 </dl></dd>
    179 <dt><span class="sect1"><a href="#hl-interface">3.4. High-level interface</a></span></dt>
    180 <dd><dl>
    181 <dt><span class="sect2"><a href="#bzreadopen">3.4.1. BZ2_bzReadOpen</a></span></dt>
    182 <dt><span class="sect2"><a href="#bzread">3.4.2. BZ2_bzRead</a></span></dt>
    183 <dt><span class="sect2"><a href="#bzreadgetunused">3.4.3. BZ2_bzReadGetUnused</a></span></dt>
    184 <dt><span class="sect2"><a href="#bzreadclose">3.4.4. BZ2_bzReadClose</a></span></dt>
    185 <dt><span class="sect2"><a href="#bzwriteopen">3.4.5. BZ2_bzWriteOpen</a></span></dt>
    186 <dt><span class="sect2"><a href="#bzwrite">3.4.6. BZ2_bzWrite</a></span></dt>
    187 <dt><span class="sect2"><a href="#bzwriteclose">3.4.7. BZ2_bzWriteClose</a></span></dt>
    188 <dt><span class="sect2"><a href="#embed">3.4.8. Handling embedded compressed data streams</a></span></dt>
    189 <dt><span class="sect2"><a href="#std-rdwr">3.4.9. Standard file-reading/writing code</a></span></dt>
    190 </dl></dd>
    191 <dt><span class="sect1"><a href="#util-fns">3.5. Utility functions</a></span></dt>
    192 <dd><dl>
    193 <dt><span class="sect2"><a href="#bzbufftobuffcompress">3.5.1. BZ2_bzBuffToBuffCompress</a></span></dt>
    194 <dt><span class="sect2"><a href="#bzbufftobuffdecompress">3.5.2. BZ2_bzBuffToBuffDecompress</a></span></dt>
    195 </dl></dd>
    196 <dt><span class="sect1"><a href="#zlib-compat">3.6. zlib compatibility functions</a></span></dt>
    197 <dt><span class="sect1"><a href="#stdio-free">3.7. Using the library in a stdio-free environment</a></span></dt>
    198 <dd><dl>
    199 <dt><span class="sect2"><a href="#stdio-bye">3.7.1. Getting rid of stdio</a></span></dt>
    200 <dt><span class="sect2"><a href="#critical-error">3.7.2. Critical error handling</a></span></dt>
    201 </dl></dd>
    202 <dt><span class="sect1"><a href="#win-dll">3.8. Making a Windows DLL</a></span></dt>
    203 </dl></dd>
    204 <dt><span class="chapter"><a href="#misc">4. Miscellanea</a></span></dt>
    205 <dd><dl>
    206 <dt><span class="sect1"><a href="#limits">4.1. Limitations of the compressed file format</a></span></dt>
    207 <dt><span class="sect1"><a href="#port-issues">4.2. Portability issues</a></span></dt>
    208 <dt><span class="sect1"><a href="#bugs">4.3. Reporting bugs</a></span></dt>
    209 <dt><span class="sect1"><a href="#package">4.4. Did you get the right package?</a></span></dt>
    210 <dt><span class="sect1"><a href="#reading">4.5. Further Reading</a></span></dt>
    211 </dl></dd>
    212 </dl>
    213 </div>
    214 <div class="chapter" title="1.Introduction">
    215 <div class="titlepage"><div><div><h2 class="title">
    216 <a name="intro"></a>1. Introduction</h2></div></div></div>
    217 <p><code class="computeroutput">bzip2</code> compresses files
    218 using the Burrows-Wheeler block-sorting text compression
    219 algorithm, and Huffman coding.  Compression is generally
    220 considerably better than that achieved by more conventional
    221 LZ77/LZ78-based compressors, and approaches the performance of
    222 the PPM family of statistical compressors.</p>
    223 <p><code class="computeroutput">bzip2</code> is built on top of
    224 <code class="computeroutput">libbzip2</code>, a flexible library for
    225 handling compressed data in the
    226 <code class="computeroutput">bzip2</code> format.  This manual
    227 describes both how to use the program and how to work with the
    228 library interface.  Most of the manual is devoted to this
    229 library, not the program, which is good news if your interest is
    230 only in the program.</p>
    231 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
    232 <li class="listitem" style="list-style-type: disc"><p><a class="xref" href="#using" title="2.How to use bzip2">How to use bzip2</a> describes how to use
    233  <code class="computeroutput">bzip2</code>; this is the only part
    234  you need to read if you just want to know how to operate the
    235  program.</p></li>
    236 <li class="listitem" style="list-style-type: disc"><p><a class="xref" href="#libprog" title="3. Programming with libbzip2">Programming with libbzip2</a> describes the
    237  programming interfaces in detail, and</p></li>
    238 <li class="listitem" style="list-style-type: disc"><p><a class="xref" href="#misc" title="4.Miscellanea">Miscellanea</a> records some
    239  miscellaneous notes which I thought ought to be recorded
    240  somewhere.</p></li>
    241 </ul></div>
    242 </div>
    243 <div class="chapter" title="2.How to use bzip2">
    244 <div class="titlepage"><div><div><h2 class="title">
    245 <a name="using"></a>2. How to use bzip2</h2></div></div></div>
    246 <div class="toc">
    247 <p><b>Table of Contents</b></p>
    248 <dl>
    249 <dt><span class="sect1"><a href="#name">2.1. NAME</a></span></dt>
    250 <dt><span class="sect1"><a href="#synopsis">2.2. SYNOPSIS</a></span></dt>
    251 <dt><span class="sect1"><a href="#description">2.3. DESCRIPTION</a></span></dt>
    252 <dt><span class="sect1"><a href="#options">2.4. OPTIONS</a></span></dt>
    253 <dt><span class="sect1"><a href="#memory-management">2.5. MEMORY MANAGEMENT</a></span></dt>
    254 <dt><span class="sect1"><a href="#recovering">2.6. RECOVERING DATA FROM DAMAGED FILES</a></span></dt>
    255 <dt><span class="sect1"><a href="#performance">2.7. PERFORMANCE NOTES</a></span></dt>
    256 <dt><span class="sect1"><a href="#caveats">2.8. CAVEATS</a></span></dt>
    257 <dt><span class="sect1"><a href="#author">2.9. AUTHOR</a></span></dt>
    258 </dl>
    259 </div>
    260 <p>This chapter contains a copy of the
    261 <code class="computeroutput">bzip2</code> man page, and nothing
    262 else.</p>
    263 <div class="sect1" title="2.1.NAME">
    264 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    265 <a name="name"></a>2.1. NAME</h2></div></div></div>
    266 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
    267 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2</code>,
    268   <code class="computeroutput">bunzip2</code> - a block-sorting file
    269   compressor, v1.0.6</p></li>
    270 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzcat</code> -
    271    decompresses files to stdout</p></li>
    272 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2recover</code> -
    273    recovers data from damaged bzip2 files</p></li>
    274 </ul></div>
    275 </div>
    276 <div class="sect1" title="2.2.SYNOPSIS">
    277 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    278 <a name="synopsis"></a>2.2. SYNOPSIS</h2></div></div></div>
    279 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
    280 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2</code> [
    281   -cdfkqstvzVL123456789 ] [ filenames ...  ]</p></li>
    282 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bunzip2</code> [
    283   -fkvsVL ] [ filenames ...  ]</p></li>
    284 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzcat</code> [ -s ] [
    285   filenames ...  ]</p></li>
    286 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2recover</code>
    287   filename</p></li>
    288 </ul></div>
    289 </div>
    290 <div class="sect1" title="2.3.DESCRIPTION">
    291 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    292 <a name="description"></a>2.3. DESCRIPTION</h2></div></div></div>
    293 <p><code class="computeroutput">bzip2</code> compresses files
    294 using the Burrows-Wheeler block-sorting text compression
    295 algorithm, and Huffman coding.  Compression is generally
    296 considerably better than that achieved by more conventional
    297 LZ77/LZ78-based compressors, and approaches the performance of
    298 the PPM family of statistical compressors.</p>
    299 <p>The command-line options are deliberately very similar to
    300 those of GNU <code class="computeroutput">gzip</code>, but they are
    301 not identical.</p>
    302 <p><code class="computeroutput">bzip2</code> expects a list of
    303 file names to accompany the command-line flags.  Each file is
    304 replaced by a compressed version of itself, with the name
    305 <code class="computeroutput">original_name.bz2</code>.  Each
    306 compressed file has the same modification date, permissions, and,
    307 when possible, ownership as the corresponding original, so that
    308 these properties can be correctly restored at decompression time.
    309 File name handling is naive in the sense that there is no
    310 mechanism for preserving original file names, permissions,
    311 ownerships or dates in filesystems which lack these concepts, or
    312 have serious file name length restrictions, such as
    313 MS-DOS.</p>
    314 <p><code class="computeroutput">bzip2</code> and
    315 <code class="computeroutput">bunzip2</code> will by default not
    316 overwrite existing files.  If you want this to happen, specify
    317 the <code class="computeroutput">-f</code> flag.</p>
    318 <p>If no file names are specified,
    319 <code class="computeroutput">bzip2</code> compresses from standard
    320 input to standard output.  In this case,
    321 <code class="computeroutput">bzip2</code> will decline to write
    322 compressed output to a terminal, as this would be entirely
    323 incomprehensible and therefore pointless.</p>
    324 <p><code class="computeroutput">bunzip2</code> (or
    325 <code class="computeroutput">bzip2 -d</code>) decompresses all
    326 specified files.  Files which were not created by
    327 <code class="computeroutput">bzip2</code> will be detected and
    328 ignored, and a warning issued.
    329 <code class="computeroutput">bzip2</code> attempts to guess the
    330 filename for the decompressed file from that of the compressed
    331 file as follows:</p>
    332 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
    333 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.bz2 </code>
    334   becomes
    335   <code class="computeroutput">filename</code></p></li>
    336 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.bz </code>
    337   becomes
    338   <code class="computeroutput">filename</code></p></li>
    339 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.tbz2</code>
    340   becomes
    341   <code class="computeroutput">filename.tar</code></p></li>
    342 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.tbz </code>
    343   becomes
    344   <code class="computeroutput">filename.tar</code></p></li>
    345 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">anyothername </code>
    346   becomes
    347   <code class="computeroutput">anyothername.out</code></p></li>
    348 </ul></div>
    349 <p>If the file does not end in one of the recognised endings,
    350 <code class="computeroutput">.bz2</code>,
    351 <code class="computeroutput">.bz</code>,
    352 <code class="computeroutput">.tbz2</code> or
    353 <code class="computeroutput">.tbz</code>,
    354 <code class="computeroutput">bzip2</code> complains that it cannot
    355 guess the name of the original file, and uses the original name
    356 with <code class="computeroutput">.out</code> appended.</p>
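<p>The renaming rules above can be sketched in a few lines of Python.
This is only an illustration of the rules as listed, not the code
<code class="computeroutput">bzip2</code> itself uses; the names
<code class="computeroutput">SUFFIX_RULES</code> and
<code class="computeroutput">guess_output_name</code> are invented for
this example.</p>
<pre class="programlisting">
```python
# Suffix rules copied from the list above (hypothetical helper, not part of bzip2).
SUFFIX_RULES = [
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
    (".bz2", ""),
    (".bz", ""),
]


def guess_output_name(compressed_name):
    """Return the name a decompressed file would get under the rules above."""
    for suffix, replacement in SUFFIX_RULES:
        if compressed_name.endswith(suffix):
            return compressed_name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the whole name and append ".out".
    return compressed_name + ".out"


for name in ["notes.bz2", "notes.bz", "backup.tbz2", "backup.tbz", "archive.dat"]:
    print(name, "becomes", guess_output_name(name))
```
</pre>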
    357 <p>As with compression, supplying no filenames causes
    358 decompression from standard input to standard output.</p>
    359 <p><code class="computeroutput">bunzip2</code> will correctly
    360 decompress a file which is the concatenation of two or more
    361 compressed files.  The result is the concatenation of the
    362 corresponding uncompressed files.  Integrity testing
    363 (<code class="computeroutput">-t</code>) of concatenated compressed
    364 files is also supported.</p>
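<p>The same property can be observed from Python's standard
<code class="computeroutput">bz2</code> module, which (in Python 3.3
and later) also accepts concatenated streams. The sketch below shows
that behaviour in the Python module, assuming it mirrors what
<code class="computeroutput">bunzip2</code> does; the sample byte
strings are made up.</p>
<pre class="programlisting">
```python
import bz2

first = b"first file contents\n"
second = b"second file contents\n"

# Compress each input separately, then join the two compressed streams,
# much as concatenating two .bz2 files would.
combined = bz2.compress(first) + bz2.compress(second)

# Decompressing the joined data yields the joined originals.
restored = bz2.decompress(combined)
print(restored == first + second)
```
</pre>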
    365 <p>You can also compress or decompress files to the standard
    366 output by giving the <code class="computeroutput">-c</code> flag.
    367 Multiple files may be compressed and decompressed like this.  The
    368 resulting outputs are fed sequentially to stdout.  Compression of
    369 multiple files in this manner generates a stream containing
    370 multiple compressed file representations.  Such a stream can be
    371 decompressed correctly only by
    372 <code class="computeroutput">bzip2</code> version 0.9.0 or later.
    373 Earlier versions of <code class="computeroutput">bzip2</code> will
    374 stop after decompressing the first file in the stream.</p>
    375 <p><code class="computeroutput">bzcat</code> (or
    376 <code class="computeroutput">bzip2 -dc</code>) decompresses all
    377 specified files to the standard output.</p>
    378 <p><code class="computeroutput">bzip2</code> will read arguments
    379 from the environment variables
    380 <code class="computeroutput">BZIP2</code> and
    381 <code class="computeroutput">BZIP</code>, in that order, and will
    382 process them before any arguments read from the command line.
    383 This gives a convenient way to supply default arguments.</p>
    384 <p>Compression is always performed, even if the compressed
    385 file is slightly larger than the original.  Files of less than
    386 about one hundred bytes tend to get larger, since the compression
    387 mechanism has a constant overhead in the region of 50 bytes.
    388 Random data (including the output of most file compressors) is
    389 coded at about 8.05 bits per byte, giving an expansion of around
    390 0.5%.</p>
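<p>A quick way to see this overhead is to compress a very short input
and a block of pseudo-random bytes and compare sizes. The sketch below
uses Python's standard <code class="computeroutput">bz2</code> module
rather than the <code class="computeroutput">bzip2</code> program; the
inputs and the helper name
<code class="computeroutput">compressed_size</code> are chosen only
for illustration, and exact sizes may vary between library
versions.</p>
<pre class="programlisting">
```python
import bz2
import random


def compressed_size(data):
    """Length in bytes of the bzip2 stream produced for data."""
    return len(bz2.compress(data))


tiny = b"hello, world\n"
rng = random.Random(0)  # fixed seed so repeated runs use the same bytes
noise = bytes(rng.getrandbits(8) for _ in range(200000))

for label, data in [("tiny", tiny), ("noise", noise)]:
    size = compressed_size(data)
    print(label, len(data), "bytes in,", size, "bytes out")
```
</pre>
<p>Both inputs should come out larger than they went in: the short
one because of the fixed overhead, the pseudo-random one because it
has no redundancy to remove.</p>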
    391 <p>As a self-check for your protection,
    392 <code class="computeroutput">bzip2</code> uses 32-bit CRCs to make
    393 sure that the decompressed version of a file is identical to the
    394 original.  This guards against corruption of the compressed data,
    395 and against undetected bugs in
    396 <code class="computeroutput">bzip2</code> (hopefully very unlikely).
    397 The chance of data corruption going undetected is microscopic,
    398 about one chance in four billion for each file processed.  Be
    399 aware, though, that the check occurs upon decompression, so it
    400 can only tell you that something is wrong.  It can't help you
    401 recover the original uncompressed data.  You can use
    402 <code class="computeroutput">bzip2recover</code> to try to recover
    403 data from damaged files.</p>
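<p>The figure of one in four billion corresponds to the 2<sup>32</sup>
(4,294,967,296) possible values of a 32-bit CRC. As a rough
illustration of the self-check, the following sketch damages one byte
of a compressed stream and confirms that decompression is refused. It
uses Python's standard <code class="computeroutput">bz2</code> module,
not the <code class="computeroutput">bzip2</code> program, and the
helper name <code class="computeroutput">is_detected</code> is
invented here.</p>
<pre class="programlisting">
```python
import bz2

original = b"bzip2 self-check example\n" * 200
stream = bytearray(bz2.compress(original))

# Damage one byte near the middle of the compressed stream.
stream[len(stream) // 2] ^= 0xFF


def is_detected(damaged):
    """Return True if decompression rejects the damaged stream."""
    try:
        bz2.decompress(bytes(damaged))
    except (OSError, ValueError):
        # The module reports invalid or truncated streams with these errors.
        return True
    return False


print(is_detected(stream))
```
</pre>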
    404 <p>Return values: 0 for a normal exit, 1 for environmental
    405 problems (file not found, invalid flags, I/O errors, etc.), 2
    406 to indicate a corrupt compressed file, 3 for an internal
    407 consistency error (e.g. a bug) which caused
    408 <code class="computeroutput">bzip2</code> to panic.</p>
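<p>A script that drives <code class="computeroutput">bzip2</code> can
branch on these return values. The sketch below is one possible
wrapper, assuming a <code class="computeroutput">bzip2</code>
executable is available on the search path; the names
<code class="computeroutput">EXIT_MEANINGS</code>,
<code class="computeroutput">describe_exit</code> and
<code class="computeroutput">check_archive</code> are hypothetical.</p>
<pre class="programlisting">
```python
import subprocess

# Meanings copied from the "Return values" paragraph above.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt compressed file",
    3: "internal consistency error",
}


def describe_exit(status):
    """Translate a bzip2 exit status into the wording used above."""
    return EXIT_MEANINGS.get(status, "undocumented status")


def check_archive(path):
    """Run "bzip2 -t" on path and describe the outcome (needs bzip2 on PATH)."""
    result = subprocess.run(["bzip2", "-t", "--", path])
    return describe_exit(result.returncode)
```
</pre>
<p>For example, <code class="computeroutput">check_archive("data.bz2")</code>
would run an integrity test on a file of that name and return the
matching description.</p>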
    409 </div>
    410 <div class="sect1" title="2.4.OPTIONS">
    411 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    412 <a name="options"></a>2.4. OPTIONS</h2></div></div></div>
    413 <div class="variablelist"><dl>
    414 <dt><span class="term"><code class="computeroutput">-c --stdout</code></span></dt>
    415 <dd><p>Compress or decompress to standard
    416   output.</p></dd>
    417 <dt><span class="term"><code class="computeroutput">-d --decompress</code></span></dt>
    418 <dd><p>Force decompression.
    419   <code class="computeroutput">bzip2</code>,
    420   <code class="computeroutput">bunzip2</code> and
    421   <code class="computeroutput">bzcat</code> are really the same
    422   program, and the decision about what actions to take is done on
    423   the basis of which name is used.  This flag overrides that
    424   mechanism, and forces bzip2 to decompress.</p></dd>
    425 <dt><span class="term"><code class="computeroutput">-z --compress</code></span></dt>
    426 <dd><p>The complement to
    427   <code class="computeroutput">-d</code>: forces compression,
    428   regardless of the invocation name.</p></dd>
    429 <dt><span class="term"><code class="computeroutput">-t --test</code></span></dt>
    430 <dd><p>Check integrity of the specified file(s), but
    431   don't decompress them.  This really performs a trial
    432   decompression and throws away the result.</p></dd>
    433 <dt><span class="term"><code class="computeroutput">-f --force</code></span></dt>
    434 <dd>
    435 <p>Force overwrite of output files.  Normally,
    436   <code class="computeroutput">bzip2</code> will not overwrite
    437   existing output files.  Also forces
    438   <code class="computeroutput">bzip2</code> to break hard links to
    439   files, which it otherwise wouldn't do.</p>
    440 <p><code class="computeroutput">bzip2</code> normally declines
    441   to decompress files which don't have the correct magic header
    442   bytes. If forced (<code class="computeroutput">-f</code>),
    443   however, it will pass such files through unmodified. This is
    444   how GNU <code class="computeroutput">gzip</code> behaves.</p>
    445 </dd>
    446 <dt><span class="term"><code class="computeroutput">-k --keep</code></span></dt>
    447 <dd><p>Keep (don't delete) input files during
    448   compression or decompression.</p></dd>
    449 <dt><span class="term"><code class="computeroutput">-s --small</code></span></dt>
    450 <dd>
    451 <p>Reduce memory usage, for compression,
    452   decompression and testing.  Files are decompressed and tested
    453   using a modified algorithm which only requires 2.5 bytes per
    454   block byte.  This means any file can be decompressed in 2300k
    455   of memory, albeit at about half the normal speed.</p>
    456 <p>During compression, <code class="computeroutput">-s</code>
    457   selects a block size of 200k, which limits memory use to around
    458   the same figure, at the expense of your compression ratio.  In
    459   short, if your machine is low on memory (8 megabytes or less),
    460   use <code class="computeroutput">-s</code> for everything.  See
    461   <a class="xref" href="#memory-management" title="2.5.MEMORY MANAGEMENT">MEMORY MANAGEMENT</a> below.</p>
    462 </dd>
    463 <dt><span class="term"><code class="computeroutput">-q --quiet</code></span></dt>
    464 <dd><p>Suppress non-essential warning messages.
    465   Messages pertaining to I/O errors and other critical events
    466   will not be suppressed.</p></dd>
    467 <dt><span class="term"><code class="computeroutput">-v --verbose</code></span></dt>
    468 <dd><p>Verbose mode -- show the compression ratio for
    469   each file processed.  Further
    470   <code class="computeroutput">-v</code>'s increase the verbosity
    471   level, spewing out lots of information which is primarily of
    472   interest for diagnostic purposes.</p></dd>
    473 <dt><span class="term"><code class="computeroutput">-L --license -V --version</code></span></dt>
    474 <dd><p>Display the software version, license terms and
    475   conditions.</p></dd>
    476 <dt><span class="term"><code class="computeroutput">-1</code> (or
    477  <code class="computeroutput">--fast</code>) to
    478  <code class="computeroutput">-9</code> (or
    479  <code class="computeroutput">--best</code>)</span></dt>
    480 <dd><p>Set the block size to 100 k, 200 k ...  900 k
    481   when compressing.  Has no effect when decompressing.  See <a class="xref" href="#memory-management" title="2.5.MEMORY MANAGEMENT">MEMORY MANAGEMENT</a> below.  The
    482   <code class="computeroutput">--fast</code> and
    483   <code class="computeroutput">--best</code> aliases are primarily
    484   for GNU <code class="computeroutput">gzip</code> compatibility.
    485   In particular, <code class="computeroutput">--fast</code> doesn't
    486   make things significantly faster.  And
    487   <code class="computeroutput">--best</code> merely selects the
    488   default behaviour.</p></dd>
    489 <dt><span class="term"><code class="computeroutput">--</code></span></dt>
    490 <dd><p>Treats all subsequent arguments as file names,
    491   even if they start with a dash.  This is so you can handle
    492   files with names beginning with a dash, for example:
    493   <code class="computeroutput">bzip2 --
    494   -myfilename</code>.</p></dd>
    495 <dt>
    496 <span class="term"><code class="computeroutput">--repetitive-fast</code>, </span><span class="term"><code class="computeroutput">--repetitive-best</code></span>
    497 </dt>
    498 <dd><p>These flags are redundant in versions 0.9.5 and
    499   above.  They provided some coarse control over the behaviour of
    500   the sorting algorithm in earlier versions, which was sometimes
    501   useful.  0.9.5 and above have an improved algorithm which
    502   renders these flags irrelevant.</p></dd>
    503 </dl></div>
    504 </div>
    505 <div class="sect1" title="2.5.MEMORY MANAGEMENT">
    506 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    507 <a name="memory-management"></a>2.5. MEMORY MANAGEMENT</h2></div></div></div>
    508 <p><code class="computeroutput">bzip2</code> compresses large
    509 files in blocks.  The block size affects both the compression
    510 ratio achieved, and the amount of memory needed for compression
    511 and decompression.  The flags <code class="computeroutput">-1</code>
    512 through <code class="computeroutput">-9</code> specify the block
    513 size to be 100,000 bytes through 900,000 bytes (the default)
    514 respectively.  At decompression time, the block size used for
    515 compression is read from the header of the compressed file, and
    516 <code class="computeroutput">bunzip2</code> then allocates itself
    517 just enough memory to decompress the file.  Since block sizes are
    518 stored in compressed files, it follows that the flags
    519 <code class="computeroutput">-1</code> to
    520 <code class="computeroutput">-9</code> are irrelevant to and so
    521 ignored during decompression.</p>
    522 <p>Compression and decompression requirements, in bytes, can be
    523 estimated as:</p>
    524 <pre class="programlisting">Compression:   400k + ( 8 x block size )
    525 
    526 Decompression: 100k + ( 4 x block size ), or
    527                100k + ( 2.5 x block size ) with -s</pre>
    528 <p>Larger block sizes give rapidly diminishing marginal
    529 returns.  Most of the compression comes from the first two or
    530 three hundred k of block size, a fact worth bearing in mind when
    531 using <code class="computeroutput">bzip2</code> on small machines.
    532 It is also important to appreciate that the decompression memory
    533 requirement is set at compression time by the choice of block
    534 size.</p>
    535 <p>For files compressed with the default 900k block size,
    536 <code class="computeroutput">bunzip2</code> will require about 3700
    537 kbytes to decompress.  To support decompression of any file on a
    538 4 megabyte machine, <code class="computeroutput">bunzip2</code> has
    539 an option to decompress using approximately half this amount of
    540 memory, about 2300 kbytes.  Decompression speed is also halved,
    541 so you should use this option only where necessary.  The relevant
    542 flag is <code class="computeroutput">-s</code>.</p>
<p>In general, try to use the largest block size memory
    544 constraints allow, since that maximises the compression achieved.
    545 Compression and decompression speed are virtually unaffected by
    546 block size.</p>
    547 <p>Another significant point applies to files which fit in a
    548 single block -- that means most files you'd encounter using a
    549 large block size.  The amount of real memory touched is
    550 proportional to the size of the file, since the file is smaller
    551 than a block.  For example, compressing a file 20,000 bytes long
    552 with the flag <code class="computeroutput">-9</code> will cause the
    553 compressor to allocate around 7600k of memory, but only touch
    554 400k + 20000 * 8 = 560 kbytes of it.  Similarly, the decompressor
    555 will allocate 3700k but only touch 100k + 20000 * 4 = 180
    556 kbytes.</p>
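<p>The arithmetic in this example can be sketched in C as follows.
The helper names are invented for illustration, k is taken to mean
1000 bytes, and the amount of data held at any one time is assumed
to be the smaller of the file size and the block size.</p>
<pre class="programlisting">#include &lt;stddef.h&gt;

/* Hypothetical helpers, not part of bzip2. */
static size_t min_size(size_t a, size_t b)
{
    return a &lt; b ? a : b;
}

/* Approximate bytes touched when compressing with flag -level. */
size_t compress_touched_bytes(size_t file_bytes, int level)
{
    size_t block = (size_t)level * 100000u;
    return 400000u + 8u * min_size(file_bytes, block);
}

/* Approximate bytes touched when decompressing (without -s). */
size_t decompress_touched_bytes(size_t file_bytes, int level)
{
    size_t block = (size_t)level * 100000u;
    return 100000u + 4u * min_size(file_bytes, block);
}</pre>
<p>With a 20,000 byte file and flag
<code class="computeroutput">-9</code>, these return 560000 and
180000 bytes, matching the figures above.</p>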
    557 <p>Here is a table which summarises the maximum memory usage
    558 for different block sizes.  Also recorded is the total compressed
    559 size for 14 files of the Calgary Text Compression Corpus
    560 totalling 3,141,622 bytes.  This column gives some feel for how
    561 compression varies with block size.  These figures tend to
    562 understate the advantage of larger block sizes for larger files,
    563 since the Corpus is dominated by smaller files.</p>
    564 <pre class="programlisting">        Compress   Decompress   Decompress   Corpus
    565 Flag     usage      usage       -s usage     Size
    566 
    567  -1      1200k       500k         350k      914704
    568  -2      2000k       900k         600k      877703
    569  -3      2800k      1300k         850k      860338
    570  -4      3600k      1700k        1100k      846899
    571  -5      4400k      2100k        1350k      845160
    572  -6      5200k      2500k        1600k      838626
    573  -7      6100k      2900k        1850k      834096
    574  -8      6800k      3300k        2100k      828642
    575  -9      7600k      3700k        2350k      828642</pre>
    576 </div>
    577 <div class="sect1" title="2.6.RECOVERING DATA FROM DAMAGED FILES">
    578 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="recovering"></a>2.6. RECOVERING DATA FROM DAMAGED FILES</h2></div></div></div>
    580 <p><code class="computeroutput">bzip2</code> compresses files in
    581 blocks, usually 900kbytes long.  Each block is handled
    582 independently.  If a media or transmission error causes a
    583 multi-block <code class="computeroutput">.bz2</code> file to become
    584 damaged, it may be possible to recover data from the undamaged
    585 blocks in the file.</p>
    586 <p>The compressed representation of each block is delimited by
    587 a 48-bit pattern, which makes it possible to find the block
    588 boundaries with reasonable certainty.  Each block also carries
    589 its own 32-bit CRC, so damaged blocks can be distinguished from
    590 undamaged ones.</p>
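<p>To give a feel for how such a search might work, here is a
minimal sketch that scans a buffer for a 48-bit pattern at every
bit offset.  It is not the code used by
<code class="computeroutput">bzip2recover</code>.  The function
name is hypothetical, and two details are assumptions not stated in
this section: that the block-start pattern is
<code class="computeroutput">0x314159265359</code>, and that bits
are packed most significant bit first.</p>
<pre class="programlisting">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Assumed block-start pattern (48 bits); see the note above. */
#define BLOCK_MAGIC 0x314159265359ULL
#define MASK_48     0xFFFFFFFFFFFFULL

/* Scan buf bit by bit.  Stores the bit offset of the start of each
   match in pos[] (at most max_pos entries) and returns the number
   of matches found. */
size_t find_block_starts(const unsigned char *buf, size_t len,
                         uint64_t *pos, size_t max_pos)
{
    uint64_t window = 0;     /* the last 48 bits seen */
    uint64_t bits_seen = 0;
    size_t found = 0;
    size_t i;
    int b;

    for (i = 0; i &lt; len; i++) {
        for (b = 7; b &gt;= 0; b--) {
            window = ((window &lt;&lt; 1) | ((buf[i] &gt;&gt; b) &amp; 1u)) &amp; MASK_48;
            bits_seen++;
            if (bits_seen &gt;= 48 &amp;&amp; window == BLOCK_MAGIC) {
                if (found &lt; max_pos)
                    pos[found] = bits_seen - 48;
                found++;
            }
        }
    }
    return found;
}</pre>
<p>Because blocks need not start on a byte boundary, the scan works
on individual bits rather than bytes.  The bit positions are kept
in 64-bit integers so that large files can be handled.</p>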
    591 <p><code class="computeroutput">bzip2recover</code> is a simple
    592 program whose purpose is to search for blocks in
    593 <code class="computeroutput">.bz2</code> files, and write each block
    594 out into its own <code class="computeroutput">.bz2</code> file.  You
    595 can then use <code class="computeroutput">bzip2 -t</code> to test
    596 the integrity of the resulting files, and decompress those which
    597 are undamaged.</p>
    598 <p><code class="computeroutput">bzip2recover</code> takes a
    599 single argument, the name of the damaged file, and writes a
    600 number of files <code class="computeroutput">rec0001file.bz2</code>,
    601 <code class="computeroutput">rec0002file.bz2</code>, etc, containing
    602 the extracted blocks.  The output filenames are designed so that
    603 the use of wildcards in subsequent processing -- for example,
    604 <code class="computeroutput">bzip2 -dc rec*file.bz2 &gt;
    605 recovered_data</code> -- lists the files in the correct
    606 order.</p>
    607 <p><code class="computeroutput">bzip2recover</code> should be of
    608 most use dealing with large <code class="computeroutput">.bz2</code>
    609 files, as these will contain many blocks.  It is clearly futile
    610 to use it on damaged single-block files, since a damaged block
    611 cannot be recovered.  If you wish to minimise any potential data
    612 loss through media or transmission errors, you might consider
    613 compressing with a smaller block size.</p>
    614 </div>
    615 <div class="sect1" title="2.7.PERFORMANCE NOTES">
    616 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="performance"></a>2.7. PERFORMANCE NOTES</h2></div></div></div>
    618 <p>The sorting phase of compression gathers together similar
    619 strings in the file.  Because of this, files containing very long
    620 runs of repeated symbols, like "aabaabaabaab ..."  (repeated
    621 several hundred times) may compress more slowly than normal.
    622 Versions 0.9.5 and above fare much better than previous versions
    623 in this respect.  The ratio between worst-case and average-case
    624 compression time is in the region of 10:1.  For previous
    625 versions, this figure was more like 100:1.  You can use the
    626 <code class="computeroutput">-vvvv</code> option to monitor progress
    627 in great detail, if you want.</p>
    628 <p>Decompression speed is unaffected by these
    629 phenomena.</p>
    630 <p><code class="computeroutput">bzip2</code> usually allocates
    631 several megabytes of memory to operate in, and then charges all
    632 over it in a fairly random fashion.  This means that performance,
    633 both for compressing and decompressing, is largely determined by
    634 the speed at which your machine can service cache misses.
    635 Because of this, small changes to the code to reduce the miss
    636 rate have been observed to give disproportionately large
    637 performance improvements.  I imagine
    638 <code class="computeroutput">bzip2</code> will perform best on
    639 machines with very large caches.</p>
    640 </div>
    641 <div class="sect1" title="2.8.CAVEATS">
    642 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="caveats"></a>2.8. CAVEATS</h2></div></div></div>
    644 <p>I/O error messages are not as helpful as they could be.
    645 <code class="computeroutput">bzip2</code> tries hard to detect I/O
    646 errors and exit cleanly, but the details of what the problem is
    647 sometimes seem rather misleading.</p>
    648 <p>This manual page pertains to version 1.0.6 of
    649 <code class="computeroutput">bzip2</code>.  Compressed data created by
    650 this version is entirely forwards and backwards compatible with the
previous public releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0,
    652 1.0.1, 1.0.2 and 1.0.3, but with the following exception: 0.9.0 and
    653 above can correctly decompress multiple concatenated compressed files.
    654 0.1pl2 cannot do this; it will stop after decompressing just the first
    655 file in the stream.</p>
<p><code class="computeroutput">bzip2recover</code> versions
prior to 1.0.2 used 32-bit integers to represent bit positions in
compressed files, so they could not handle compressed files more
than 512 megabytes long.  Versions 1.0.2 and above use 64-bit integers
on some platforms which support them (GNU supported targets, and
Windows).  To establish whether or not
    662 <code class="computeroutput">bzip2recover</code> was built with such
    663 a limitation, run it without arguments. In any event you can
    664 build yourself an unlimited version if you can recompile it with
    665 <code class="computeroutput">MaybeUInt64</code> set to be an
    666 unsigned 64-bit integer.</p>
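<p>The 512 megabyte figure follows from the width of the bit
index: a 32-bit unsigned integer can number 2^32 bit positions,
which is 2^32 / 8 = 2^29 bytes.  A small sketch of that
arithmetic, using a hypothetical helper name:</p>
<pre class="programlisting">#include &lt;stdint.h&gt;

/* Hypothetical helper: number of bytes whose bit positions can all
   be numbered by an unsigned index of index_bits bits.
   Valid for index_bits from 3 to 64. */
uint64_t max_indexable_bytes(unsigned int index_bits)
{
    /* 2^index_bits bits, 8 bits per byte */
    return (uint64_t)1 &lt;&lt; (index_bits - 3);
}</pre>
<p>For a 32-bit index this is 536,870,912 bytes, that is, 512
megabytes counting a megabyte as 1024 x 1024 bytes.</p>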
    667 </div>
    668 <div class="sect1" title="2.9.AUTHOR">
    669 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="author"></a>2.9. AUTHOR</h2></div></div></div>
    671 <p>Julian Seward,
<code class="computeroutput">jseward@bzip.org</code></p>
    673 <p>The ideas embodied in
    674 <code class="computeroutput">bzip2</code> are due to (at least) the
    675 following people: Michael Burrows and David Wheeler (for the
    676 block sorting transformation), David Wheeler (again, for the
    677 Huffman coder), Peter Fenwick (for the structured coding model in
    678 the original <code class="computeroutput">bzip</code>, and many
    679 refinements), and Alistair Moffat, Radford Neal and Ian Witten
    680 (for the arithmetic coder in the original
    681 <code class="computeroutput">bzip</code>).  I am much indebted for
    682 their help, support and advice.  See the manual in the source
    683 distribution for pointers to sources of documentation.  Christian
    684 von Roques encouraged me to look for faster sorting algorithms,
    685 so as to speed up compression.  Bela Lubkin encouraged me to
    686 improve the worst-case compression performance.  
    687 Donna Robinson XMLised the documentation.
    688 Many people sent
    689 patches, helped with portability problems, lent machines, gave
    690 advice and were generally helpful.</p>
    691 </div>
    692 </div>
    693 <div class="chapter" title="3. Programming with libbzip2">
    694 <div class="titlepage"><div><div><h2 class="title">
    695 <a name="libprog"></a>3.
    696 Programming with <code class="computeroutput">libbzip2</code>
    697 </h2></div></div></div>
    698 <div class="toc">
    699 <p><b>Table of Contents</b></p>
    700 <dl>
    701 <dt><span class="sect1"><a href="#top-level">3.1. Top-level structure</a></span></dt>
    702 <dd><dl>
    703 <dt><span class="sect2"><a href="#ll-summary">3.1.1. Low-level summary</a></span></dt>
    704 <dt><span class="sect2"><a href="#hl-summary">3.1.2. High-level summary</a></span></dt>
    705 <dt><span class="sect2"><a href="#util-fns-summary">3.1.3. Utility functions summary</a></span></dt>
    706 </dl></dd>
    707 <dt><span class="sect1"><a href="#err-handling">3.2. Error handling</a></span></dt>
    708 <dt><span class="sect1"><a href="#low-level">3.3. Low-level interface</a></span></dt>
    709 <dd><dl>
    710 <dt><span class="sect2"><a href="#bzcompress-init">3.3.1. BZ2_bzCompressInit</a></span></dt>
    711 <dt><span class="sect2"><a href="#bzCompress">3.3.2. BZ2_bzCompress</a></span></dt>
    712 <dt><span class="sect2"><a href="#bzCompress-end">3.3.3. BZ2_bzCompressEnd</a></span></dt>
    713 <dt><span class="sect2"><a href="#bzDecompress-init">3.3.4. BZ2_bzDecompressInit</a></span></dt>
    714 <dt><span class="sect2"><a href="#bzDecompress">3.3.5. BZ2_bzDecompress</a></span></dt>
    715 <dt><span class="sect2"><a href="#bzDecompress-end">3.3.6. BZ2_bzDecompressEnd</a></span></dt>
    716 </dl></dd>
    717 <dt><span class="sect1"><a href="#hl-interface">3.4. High-level interface</a></span></dt>
    718 <dd><dl>
    719 <dt><span class="sect2"><a href="#bzreadopen">3.4.1. BZ2_bzReadOpen</a></span></dt>
    720 <dt><span class="sect2"><a href="#bzread">3.4.2. BZ2_bzRead</a></span></dt>
    721 <dt><span class="sect2"><a href="#bzreadgetunused">3.4.3. BZ2_bzReadGetUnused</a></span></dt>
    722 <dt><span class="sect2"><a href="#bzreadclose">3.4.4. BZ2_bzReadClose</a></span></dt>
    723 <dt><span class="sect2"><a href="#bzwriteopen">3.4.5. BZ2_bzWriteOpen</a></span></dt>
    724 <dt><span class="sect2"><a href="#bzwrite">3.4.6. BZ2_bzWrite</a></span></dt>
    725 <dt><span class="sect2"><a href="#bzwriteclose">3.4.7. BZ2_bzWriteClose</a></span></dt>
    726 <dt><span class="sect2"><a href="#embed">3.4.8. Handling embedded compressed data streams</a></span></dt>
    727 <dt><span class="sect2"><a href="#std-rdwr">3.4.9. Standard file-reading/writing code</a></span></dt>
    728 </dl></dd>
    729 <dt><span class="sect1"><a href="#util-fns">3.5. Utility functions</a></span></dt>
    730 <dd><dl>
    731 <dt><span class="sect2"><a href="#bzbufftobuffcompress">3.5.1. BZ2_bzBuffToBuffCompress</a></span></dt>
    732 <dt><span class="sect2"><a href="#bzbufftobuffdecompress">3.5.2. BZ2_bzBuffToBuffDecompress</a></span></dt>
    733 </dl></dd>
    734 <dt><span class="sect1"><a href="#zlib-compat">3.6. zlib compatibility functions</a></span></dt>
    735 <dt><span class="sect1"><a href="#stdio-free">3.7. Using the library in a stdio-free environment</a></span></dt>
    736 <dd><dl>
    737 <dt><span class="sect2"><a href="#stdio-bye">3.7.1. Getting rid of stdio</a></span></dt>
    738 <dt><span class="sect2"><a href="#critical-error">3.7.2. Critical error handling</a></span></dt>
    739 </dl></dd>
    740 <dt><span class="sect1"><a href="#win-dll">3.8. Making a Windows DLL</a></span></dt>
    741 </dl>
    742 </div>
    743 <p>This chapter describes the programming interface to
    744 <code class="computeroutput">libbzip2</code>.</p>
    745 <p>For general background information, particularly about
    746 memory use and performance aspects, you'd be well advised to read
    747 <a class="xref" href="#using" title="2.How to use bzip2">How to use bzip2</a> as well.</p>
    748 <div class="sect1" title="3.1.Top-level structure">
    749 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="top-level"></a>3.1. Top-level structure</h2></div></div></div>
    751 <p><code class="computeroutput">libbzip2</code> is a flexible
    752 library for compressing and decompressing data in the
    753 <code class="computeroutput">bzip2</code> data format.  Although
packaged as a single entity, it helps to regard the library as
three separate parts: the low-level interface, the high-level
interface, and some utility functions.</p>
    757 <p>The structure of
    758 <code class="computeroutput">libbzip2</code>'s interfaces is similar
    759 to that of Jean-loup Gailly's and Mark Adler's excellent
    760 <code class="computeroutput">zlib</code> library.</p>
    761 <p>All externally visible symbols have names beginning
    762 <code class="computeroutput">BZ2_</code>.  This is new in version
    763 1.0.  The intention is to minimise pollution of the namespaces of
    764 library clients.</p>
    765 <p>To use any part of the library, you need to
    766 <code class="computeroutput">#include &lt;bzlib.h&gt;</code>
    767 into your sources.</p>
    768 <div class="sect2" title="3.1.1.Low-level summary">
    769 <div class="titlepage"><div><div><h3 class="title">
<a name="ll-summary"></a>3.1.1. Low-level summary</h3></div></div></div>
    771 <p>This interface provides services for compressing and
    772 decompressing data in memory.  There's no provision for dealing
    773 with files, streams or any other I/O mechanisms, just straight
    774 memory-to-memory work.  In fact, this part of the library can be
    775 compiled without inclusion of
    776 <code class="computeroutput">stdio.h</code>, which may be helpful
    777 for embedded applications.</p>
    778 <p>The low-level part of the library has no global variables
    779 and is therefore thread-safe.</p>
<p>Six routines make up the low-level interface:
    781 <code class="computeroutput">BZ2_bzCompressInit</code>,
    782 <code class="computeroutput">BZ2_bzCompress</code>, and
    783 <code class="computeroutput">BZ2_bzCompressEnd</code> for
    784 compression, and a corresponding trio
    785 <code class="computeroutput">BZ2_bzDecompressInit</code>,
    786 <code class="computeroutput">BZ2_bzDecompress</code> and
    787 <code class="computeroutput">BZ2_bzDecompressEnd</code> for
    788 decompression.  The <code class="computeroutput">*Init</code>
    789 functions allocate memory for compression/decompression and do
    790 other initialisations, whilst the
    791 <code class="computeroutput">*End</code> functions close down
    792 operations and release memory.</p>
    793 <p>The real work is done by
    794 <code class="computeroutput">BZ2_bzCompress</code> and
    795 <code class="computeroutput">BZ2_bzDecompress</code>.  These
    796 compress and decompress data from a user-supplied input buffer to
    797 a user-supplied output buffer.  These buffers can be any size;
    798 arbitrary quantities of data are handled by making repeated calls
    799 to these functions.  This is a flexible mechanism allowing a
    800 consumer-pull style of activity, or producer-push, or a mixture
    801 of both.</p>
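<p>As a minimal sketch of the repeated-call pattern, the following
helper compresses one in-memory buffer, pulling the output in
fixed-size chunks.  The helper name, the 4096-byte chunk size and
the choice of block size 9 with default work factor are
assumptions made for this example.  Programs using the library
are typically linked with
<code class="computeroutput">-lbz2</code>.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper: compress src into dst, collecting the output
   in 4096-byte chunks.  Returns the compressed size, or -1 on error. */
long compress_buffer(const char *src, unsigned int src_len,
                     char *dst, unsigned int dst_cap)
{
    bz_stream strm;
    char chunk[4096];
    unsigned int written = 0;
    int ret;

    /* NULL here selects the standard malloc / free routines. */
    strm.bzalloc = NULL;
    strm.bzfree  = NULL;
    strm.opaque  = NULL;
    if (BZ2_bzCompressInit(&amp;strm, 9, 0, 0) != BZ_OK)
        return -1;

    strm.next_in  = (char *)src;   /* cast needed: next_in is char * */
    strm.avail_in = src_len;

    do {
        unsigned int got;

        /* Offer a fresh output chunk on every call. */
        strm.next_out  = chunk;
        strm.avail_out = sizeof chunk;
        ret = BZ2_bzCompress(&amp;strm, BZ_FINISH);
        if (ret != BZ_FINISH_OK &amp;&amp; ret != BZ_STREAM_END)
            break;                         /* a real error */

        got = (unsigned int)sizeof chunk - strm.avail_out;
        if (got &gt; dst_cap - written) {     /* caller's buffer too small */
            ret = BZ_OUTBUFF_FULL;
            break;
        }
        memcpy(dst + written, chunk, got);
        written += got;
    } while (ret != BZ_STREAM_END);

    BZ2_bzCompressEnd(&amp;strm);
    return (ret == BZ_STREAM_END) ? (long)written : -1;
}</pre>
<p>Here all the input is supplied up front and only the output is
consumed piecemeal.  A producer-push variant would instead refill
<code class="computeroutput">next_in</code> and
<code class="computeroutput">avail_in</code> between calls.</p>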
    802 </div>
    803 <div class="sect2" title="3.1.2.High-level summary">
    804 <div class="titlepage"><div><div><h3 class="title">
<a name="hl-summary"></a>3.1.2. High-level summary</h3></div></div></div>
    806 <p>This interface provides some handy wrappers around the
    807 low-level interface to facilitate reading and writing
    808 <code class="computeroutput">bzip2</code> format files
    809 (<code class="computeroutput">.bz2</code> files).  The routines
    810 provide hooks to facilitate reading files in which the
    811 <code class="computeroutput">bzip2</code> data stream is embedded
    812 within some larger-scale file structure, or where there are
    813 multiple <code class="computeroutput">bzip2</code> data streams
    814 concatenated end-to-end.</p>
    815 <p>For reading files,
    816 <code class="computeroutput">BZ2_bzReadOpen</code>,
    817 <code class="computeroutput">BZ2_bzRead</code>,
    818 <code class="computeroutput">BZ2_bzReadClose</code> and 
    819 <code class="computeroutput">BZ2_bzReadGetUnused</code> are
    820 supplied.  For writing files,
    821 <code class="computeroutput">BZ2_bzWriteOpen</code>,
<code class="computeroutput">BZ2_bzWriteClose</code> are
    823 <code class="computeroutput">BZ2_bzWriteFinish</code> are
    824 available.</p>
    825 <p>As with the low-level library, no global variables are used
    826 so the library is per se thread-safe.  However, if I/O errors
    827 occur whilst reading or writing the underlying compressed files,
    828 you may have to consult <code class="computeroutput">errno</code> to
    829 determine the cause of the error.  In that case, you'd need a C
    830 library which correctly supports
    831 <code class="computeroutput">errno</code> in a multithreaded
    832 environment.</p>
    833 <p>To make the library a little simpler and more portable,
    834 <code class="computeroutput">BZ2_bzReadOpen</code> and
    835 <code class="computeroutput">BZ2_bzWriteOpen</code> require you to
    836 pass them file handles (<code class="computeroutput">FILE*</code>s)
    837 which have previously been opened for reading or writing
    838 respectively.  That avoids portability problems associated with
    839 file operations and file attributes, whilst not being much of an
    840 imposition on the programmer.</p>
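<p>A short sketch of this arrangement follows: the caller opens
the <code class="computeroutput">FILE*</code> and then hands it to
the library.  The two helper names are hypothetical, error
handling is reduced to a single failure code, and the reader
assumes the whole stream fits in the caller's buffer.</p>
<pre class="programlisting">#include &lt;stdio.h&gt;
#include &lt;bzlib.h&gt;

/* Hypothetical helper: write len bytes to path as a .bz2 file.
   Returns 0 on success, -1 on failure. */
int write_bz2(const char *path, const char *data, int len)
{
    FILE *f = fopen(path, "wb");       /* opened by the caller */
    BZFILE *b;
    int bzerr, write_err, ok;

    if (f == NULL)
        return -1;
    b = BZ2_bzWriteOpen(&amp;bzerr, f, 9, 0, 0);
    if (bzerr != BZ_OK) {
        fclose(f);
        return -1;
    }
    BZ2_bzWrite(&amp;bzerr, b, (void *)data, len);
    write_err = bzerr;
    /* Abandon the stream if the write failed. */
    BZ2_bzWriteClose(&amp;bzerr, b, write_err != BZ_OK, NULL, NULL);
    ok = (write_err == BZ_OK &amp;&amp; bzerr == BZ_OK);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read up to cap decompressed bytes from path.
   Returns the number of bytes read, or -1 on failure. */
int read_bz2(const char *path, char *buf, int cap)
{
    FILE *f = fopen(path, "rb");       /* opened by the caller */
    BZFILE *b;
    int bzerr, read_err, n;

    if (f == NULL)
        return -1;
    b = BZ2_bzReadOpen(&amp;bzerr, f, 0, 0, NULL, 0);
    if (bzerr != BZ_OK) {
        fclose(f);
        return -1;
    }
    n = BZ2_bzRead(&amp;bzerr, b, buf, cap);
    read_err = bzerr;
    BZ2_bzReadClose(&amp;bzerr, b);
    fclose(f);
    return (read_err == BZ_OK || read_err == BZ_STREAM_END) ? n : -1;
}</pre>
<p>Note that closing the
<code class="computeroutput">FILE*</code> remains the caller's job
in both directions.</p>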
    841 </div>
    842 <div class="sect2" title="3.1.3.Utility functions summary">
    843 <div class="titlepage"><div><div><h3 class="title">
<a name="util-fns-summary"></a>3.1.3. Utility functions summary</h3></div></div></div>
    845 <p>For very simple needs,
    846 <code class="computeroutput">BZ2_bzBuffToBuffCompress</code> and
    847 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code> are
provided.  These compress and decompress data in memory, from one
buffer to another, in a single function call.  You should assess
    850 whether these functions fulfill your memory-to-memory
    851 compression/decompression requirements before investing effort in
    852 understanding the more general but more complex low-level
    853 interface.</p>
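<p>A minimal round-trip sketch using these two calls is shown
below.  The helper name is hypothetical and the input is assumed
to be non-empty.  The output buffer is sized at the input length
plus 1% plus 600 bytes, which is the allowance suggested in the
description of
<code class="computeroutput">BZ2_bzBuffToBuffCompress</code> later
in this manual.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper: compress src, decompress the result, and
   return 1 if the original bytes come back unchanged, else 0.
   Assumes len &gt; 0. */
int roundtrip_ok(const char *src, unsigned int len)
{
    unsigned int comp_len = len + len / 100 + 600;
    unsigned int back_len = len;
    char *comp = malloc(comp_len);
    char *back = malloc(len);
    int ok = 0;

    if (comp != NULL &amp;&amp; back != NULL) {
        /* On entry *destLen is the buffer size, on exit the data size. */
        int ret = BZ2_bzBuffToBuffCompress(comp, &amp;comp_len,
                                           (char *)src, len, 9, 0, 0);
        if (ret == BZ_OK)
            ret = BZ2_bzBuffToBuffDecompress(back, &amp;back_len,
                                             comp, comp_len, 0, 0);
        if (ret == BZ_OK &amp;&amp; back_len == len
            &amp;&amp; memcmp(back, src, len) == 0)
            ok = 1;
    }
    free(comp);
    free(back);
    return ok;
}</pre>
<p>The decompression side works here only because the original
length is known in advance; in general the caller has to record
it or guess a large enough buffer.</p>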
    854 <p>Yoshioka Tsuneo
(<code class="computeroutput">tsuneo@rr.iij4u.or.jp</code>) has
    856 contributed some functions to give better
    857 <code class="computeroutput">zlib</code> compatibility.  These
    858 functions are <code class="computeroutput">BZ2_bzopen</code>,
    859 <code class="computeroutput">BZ2_bzread</code>,
    860 <code class="computeroutput">BZ2_bzwrite</code>,
    861 <code class="computeroutput">BZ2_bzflush</code>,
    862 <code class="computeroutput">BZ2_bzclose</code>,
    863 <code class="computeroutput">BZ2_bzerror</code> and
    864 <code class="computeroutput">BZ2_bzlibVersion</code>.  You may find
    865 these functions more convenient for simple file reading and
    866 writing, than those in the high-level interface.  These functions
    867 are not (yet) officially part of the library, and are minimally
    868 documented here.  If they break, you get to keep all the pieces.
    869 I hope to document them properly when time permits.</p>
    870 <p>Yoshioka also contributed modifications to allow the
    871 library to be built as a Windows DLL.</p>
    872 </div>
    873 </div>
    874 <div class="sect1" title="3.2.Error handling">
    875 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="err-handling"></a>3.2. Error handling</h2></div></div></div>
    877 <p>The library is designed to recover cleanly in all
    878 situations, including the worst-case situation of decompressing
    879 random data.  I'm not 100% sure that it can always do this, so
    880 you might want to add a signal handler to catch segmentation
    881 violations during decompression if you are feeling especially
    882 paranoid.  I would be interested in hearing more about the
    883 robustness of the library to corrupted compressed data.</p>
<p>Version 1.0.3 is more robust in this respect than any
    885 previous version.  Investigations with Valgrind (a tool for detecting
    886 problems with memory management) indicate
    887 that, at least for the few files I tested, all single-bit errors
in the compressed data are caught properly, with no
    889 segmentation faults, no uses of uninitialised data, no out of
    890 range reads or writes, and no infinite looping in the decompressor.
    891 So it's certainly pretty robust, although
    892 I wouldn't claim it to be totally bombproof.</p>
    893 <p>The file <code class="computeroutput">bzlib.h</code> contains
    894 all definitions needed to use the library.  In particular, you
    895 should definitely not include
    896 <code class="computeroutput">bzlib_private.h</code>.</p>
    897 <p>In <code class="computeroutput">bzlib.h</code>, the various
    898 return values are defined.  The following list is not intended as
    899 an exhaustive description of the circumstances in which a given
    900 value may be returned -- those descriptions are given later.
    901 Rather, it is intended to convey the rough meaning of each return
value.  The first five values are normal and are not intended to
denote an error situation.</p>
    904 <div class="variablelist"><dl>
    905 <dt><span class="term"><code class="computeroutput">BZ_OK</code></span></dt>
    906 <dd><p>The requested action was completed
    907    successfully.</p></dd>
    908 <dt><span class="term"><code class="computeroutput">BZ_RUN_OK, BZ_FLUSH_OK,
    909     BZ_FINISH_OK</code></span></dt>
    910 <dd><p>In 
    911    <code class="computeroutput">BZ2_bzCompress</code>, the requested
    912    flush/finish/nothing-special action was completed
    913    successfully.</p></dd>
    914 <dt><span class="term"><code class="computeroutput">BZ_STREAM_END</code></span></dt>
    915 <dd><p>Compression of data was completed, or the
    916    logical stream end was detected during
    917    decompression.</p></dd>
    918 </dl></div>
    919 <p>The following return values indicate an error of some
    920 kind.</p>
    921 <div class="variablelist"><dl>
    922 <dt><span class="term"><code class="computeroutput">BZ_CONFIG_ERROR</code></span></dt>
    923 <dd><p>Indicates that the library has been improperly
    924    compiled on your platform -- a major configuration error.
    925    Specifically, it means that
    926    <code class="computeroutput">sizeof(char)</code>,
    927    <code class="computeroutput">sizeof(short)</code> and
    928    <code class="computeroutput">sizeof(int)</code> are not 1, 2 and
    929    4 respectively, as they should be.  Note that the library
    930    should still work properly on 64-bit platforms which follow
    931    the LP64 programming model -- that is, where
    932    <code class="computeroutput">sizeof(long)</code> and
    933    <code class="computeroutput">sizeof(void*)</code> are 8.  Under
    934    LP64, <code class="computeroutput">sizeof(int)</code> is still 4,
    935    so <code class="computeroutput">libbzip2</code>, which doesn't
    936    use the <code class="computeroutput">long</code> type, is
    937    OK.</p></dd>
    938 <dt><span class="term"><code class="computeroutput">BZ_SEQUENCE_ERROR</code></span></dt>
    939 <dd><p>When using the library, it is important to call
    940    the functions in the correct sequence and with data structures
    941    (buffers etc) in the correct states.
    942    <code class="computeroutput">libbzip2</code> checks as much as it
    943    can to ensure this is happening, and returns
    944    <code class="computeroutput">BZ_SEQUENCE_ERROR</code> if not.
    945    Code which complies precisely with the function semantics, as
    946    detailed below, should never receive this value; such an event
    947    denotes buggy code which you should
    948    investigate.</p></dd>
    949 <dt><span class="term"><code class="computeroutput">BZ_PARAM_ERROR</code></span></dt>
    950 <dd><p>Returned when a parameter to a function call is
    951    out of range or otherwise manifestly incorrect.  As with
    952    <code class="computeroutput">BZ_SEQUENCE_ERROR</code>, this
    953    denotes a bug in the client code.  The distinction between
    954    <code class="computeroutput">BZ_PARAM_ERROR</code> and
    955    <code class="computeroutput">BZ_SEQUENCE_ERROR</code> is a bit
    956    hazy, but still worth making.</p></dd>
    957 <dt><span class="term"><code class="computeroutput">BZ_MEM_ERROR</code></span></dt>
    958 <dd><p>Returned when a request to allocate memory
    959    failed.  Note that the quantity of memory needed to decompress
    960    a stream cannot be determined until the stream's header has
    961    been read.  So
    962    <code class="computeroutput">BZ2_bzDecompress</code> and
    963    <code class="computeroutput">BZ2_bzRead</code> may return
    964    <code class="computeroutput">BZ_MEM_ERROR</code> even though some
    965    of the compressed data has been read.  The same is not true
    966    for compression; once
    967    <code class="computeroutput">BZ2_bzCompressInit</code> or
    968    <code class="computeroutput">BZ2_bzWriteOpen</code> have
    969    successfully completed,
    970    <code class="computeroutput">BZ_MEM_ERROR</code> cannot
    971    occur.</p></dd>
    972 <dt><span class="term"><code class="computeroutput">BZ_DATA_ERROR</code></span></dt>
    973 <dd><p>Returned when a data integrity error is
    974    detected during decompression.  Most importantly, this means
    975    when stored and computed CRCs for the data do not match.  This
    976    value is also returned upon detection of any other anomaly in
    977    the compressed data.</p></dd>
    978 <dt><span class="term"><code class="computeroutput">BZ_DATA_ERROR_MAGIC</code></span></dt>
    979 <dd><p>As a special case of
    980    <code class="computeroutput">BZ_DATA_ERROR</code>, it is
    981    sometimes useful to know when the compressed stream does not
    982    start with the correct magic bytes (<code class="computeroutput">'B' 'Z'
    983    'h'</code>).</p></dd>
    984 <dt><span class="term"><code class="computeroutput">BZ_IO_ERROR</code></span></dt>
    985 <dd><p>Returned by
    986    <code class="computeroutput">BZ2_bzRead</code> and
    987    <code class="computeroutput">BZ2_bzWrite</code> when there is an
    988    error reading or writing in the compressed file, and by
    989    <code class="computeroutput">BZ2_bzReadOpen</code> and
    990    <code class="computeroutput">BZ2_bzWriteOpen</code> for attempts
    991    to use a file for which the error indicator (viz,
    992    <code class="computeroutput">ferror(f)</code>) is set.  On
    993    receipt of <code class="computeroutput">BZ_IO_ERROR</code>, the
    994    caller should consult <code class="computeroutput">errno</code>
    995    and/or <code class="computeroutput">perror</code> to acquire
    996    operating-system specific information about the
    997    problem.</p></dd>
    998 <dt><span class="term"><code class="computeroutput">BZ_UNEXPECTED_EOF</code></span></dt>
    999 <dd><p>Returned by
   1000    <code class="computeroutput">BZ2_bzRead</code> when the
   1001    compressed file finishes before the logical end of stream is
   1002    detected.</p></dd>
   1003 <dt><span class="term"><code class="computeroutput">BZ_OUTBUFF_FULL</code></span></dt>
   1004 <dd><p>Returned by
   1005    <code class="computeroutput">BZ2_bzBuffToBuffCompress</code> and
   1006    <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code> to
   1007    indicate that the output data will not fit into the output
   1008    buffer provided.</p></dd>
   1009 </dl></div>
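<p>When logging or debugging, it can help to turn these values
into readable names.  The sketch below does that with two
hypothetical helpers; they are not part of the library.  The
second helper relies on an assumption taken from
<code class="computeroutput">bzlib.h</code> rather than from the
list above: that the five normal values are non-negative and all
error values are negative.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;

/* Hypothetical helper: printable name for a libbzip2 return value. */
const char *bz_status_name(int code)
{
    switch (code) {
    case BZ_OK:               return "BZ_OK";
    case BZ_RUN_OK:           return "BZ_RUN_OK";
    case BZ_FLUSH_OK:         return "BZ_FLUSH_OK";
    case BZ_FINISH_OK:        return "BZ_FINISH_OK";
    case BZ_STREAM_END:       return "BZ_STREAM_END";
    case BZ_SEQUENCE_ERROR:   return "BZ_SEQUENCE_ERROR";
    case BZ_PARAM_ERROR:      return "BZ_PARAM_ERROR";
    case BZ_MEM_ERROR:        return "BZ_MEM_ERROR";
    case BZ_DATA_ERROR:       return "BZ_DATA_ERROR";
    case BZ_DATA_ERROR_MAGIC: return "BZ_DATA_ERROR_MAGIC";
    case BZ_IO_ERROR:         return "BZ_IO_ERROR";
    case BZ_UNEXPECTED_EOF:   return "BZ_UNEXPECTED_EOF";
    case BZ_OUTBUFF_FULL:     return "BZ_OUTBUFF_FULL";
    case BZ_CONFIG_ERROR:     return "BZ_CONFIG_ERROR";
    default:                  return "(unknown)";
    }
}

/* Hypothetical helper: assumes error values are negative. */
int bz_is_error(int code)
{
    return code &lt; 0;
}</pre>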
   1010 </div>
   1011 <div class="sect1" title="3.3.Low-level interface">
   1012 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="low-level"></a>3.3. Low-level interface</h2></div></div></div>
   1014 <div class="sect2" title="3.3.1.BZ2_bzCompressInit">
   1015 <div class="titlepage"><div><div><h3 class="title">
<a name="bzcompress-init"></a>3.3.1. BZ2_bzCompressInit</h3></div></div></div>
   1017 <pre class="programlisting">typedef struct {
   1018   char *next_in;
   1019   unsigned int avail_in;
   1020   unsigned int total_in_lo32;
   1021   unsigned int total_in_hi32;
   1022 
   1023   char *next_out;
   1024   unsigned int avail_out;
   1025   unsigned int total_out_lo32;
   1026   unsigned int total_out_hi32;
   1027 
   1028   void *state;
   1029 
   1030   void *(*bzalloc)(void *,int,int);
   1031   void (*bzfree)(void *,void *);
   1032   void *opaque;
   1033 } bz_stream;
   1034 
   1035 int BZ2_bzCompressInit ( bz_stream *strm, 
   1036                          int blockSize100k, 
   1037                          int verbosity,
   1038                          int workFactor );</pre>
   1039 <p>Prepares for compression.  The
   1040 <code class="computeroutput">bz_stream</code> structure holds all
   1041 data pertaining to the compression activity.  A
   1042 <code class="computeroutput">bz_stream</code> structure should be
   1043 allocated and initialised prior to the call.  The fields of
   1044 <code class="computeroutput">bz_stream</code> comprise the entirety
   1045 of the user-visible data.  <code class="computeroutput">state</code>
   1046 is a pointer to the private data structures required for
   1047 compression.</p>
   1048 <p>Custom memory allocators are supported, via fields
   1049 <code class="computeroutput">bzalloc</code>,
   1050 <code class="computeroutput">bzfree</code>, and
<code class="computeroutput">opaque</code> is passed as the first
   1052 <code class="computeroutput">opaque</code> is passed to as the first
   1053 argument to all calls to <code class="computeroutput">bzalloc</code>
   1054 and <code class="computeroutput">bzfree</code>, but is otherwise
   1055 ignored by the library.  The call <code class="computeroutput">bzalloc (
   1056 opaque, n, m )</code> is expected to return a pointer
   1057 <code class="computeroutput">p</code> to <code class="computeroutput">n *
   1058 m</code> bytes of memory, and <code class="computeroutput">bzfree (
   1059 opaque, p )</code> should free that memory.</p>
   1060 <p>If you don't want to use a custom memory allocator, set
   1061 <code class="computeroutput">bzalloc</code>,
   1062 <code class="computeroutput">bzfree</code> and
   1063 <code class="computeroutput">opaque</code> to
   1064 <code class="computeroutput">NULL</code>, and the library will then
   1065 use the standard <code class="computeroutput">malloc</code> /
   1066 <code class="computeroutput">free</code> routines.</p>
   1067 <p>Before calling
   1068 <code class="computeroutput">BZ2_bzCompressInit</code>, fields
   1069 <code class="computeroutput">bzalloc</code>,
   1070 <code class="computeroutput">bzfree</code> and
   1071 <code class="computeroutput">opaque</code> should be filled
   1072 appropriately, as just described.  Upon return, the internal
   1073 state will have been allocated and initialised, and
   1074 <code class="computeroutput">total_in_lo32</code>,
   1075 <code class="computeroutput">total_in_hi32</code>,
   1076 <code class="computeroutput">total_out_lo32</code> and
   1077 <code class="computeroutput">total_out_hi32</code> will have been
   1078 set to zero.  These four fields are used by the library to inform
   1079 the caller of the total amount of data passed into and out of the
   1080 library, respectively.  You should not try to change them.  As of
   1081 version 1.0, 64-bit counts are maintained, even on 32-bit
   1082 platforms, using the <code class="computeroutput">_hi32</code>
   1083 fields to store the upper 32 bits of the count.  So, for example,
   1084 the total amount of data in is <code class="computeroutput">(total_in_hi32
   1085 &lt;&lt; 32) + total_in_lo32</code>, evaluated in 64-bit arithmetic.</p>
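<p>Because both halves are declared as
<code class="computeroutput">unsigned int</code>, the upper half has to
be widened before it is shifted.  A minimal sketch, assuming a C99
compiler with <code class="computeroutput">&lt;stdint.h&gt;</code>; the
helper name <code class="computeroutput">bz_total64</code> is
hypothetical:</p>

```c
#include <stdint.h>

/* Combine the _hi32 and _lo32 halves of a counter into one 64-bit value.
   The cast happens before the shift, so the shift is done in 64 bits. */
static uint64_t bz_total64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

<p>It would be called as
<code class="computeroutput">bz_total64 ( strm.total_in_hi32,
strm.total_in_lo32 )</code>, and likewise for the output counters.</p>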
   1086 <p>Parameter <code class="computeroutput">blockSize100k</code>
   1087 specifies the block size to be used for compression.  It should
   1088 be a value between 1 and 9 inclusive, and the actual block size
   1089 used is 100000 times this figure, in bytes.  9 gives the best
   1090 compression but takes the most memory.</p>
   1091 <p>Parameter <code class="computeroutput">verbosity</code> should
   1092 be set to a number between 0 and 4 inclusive.  0 is silent, and
   1093 greater numbers give increasingly verbose monitoring/debugging
   1094 output.  If the library has been compiled with
   1095 <code class="computeroutput">-DBZ_NO_STDIO</code>, no such output
   1096 will appear for any verbosity setting.</p>
   1097 <p>Parameter <code class="computeroutput">workFactor</code>
   1098 controls how the compression phase behaves when presented with
   1099 worst case, highly repetitive, input data.  If compression runs
   1100 into difficulties caused by repetitive data, the library switches
   1101 from the standard sorting algorithm to a fallback algorithm.  The
   1102 fallback is slower than the standard algorithm by perhaps a
   1103 factor of three, but always behaves reasonably, no matter how bad
   1104 the input.</p>
   1105 <p>Lower values of <code class="computeroutput">workFactor</code>
   1106 reduce the amount of effort the standard algorithm will expend
   1107 before resorting to the fallback.  You should set this parameter
   1108 carefully; too low, and many inputs will be handled by the
   1109 fallback algorithm and so compress rather slowly; too high, and
   1110 your average-to-worst case compression times can become very
   1111 large.  The default value of 30 gives reasonable behaviour over a
   1112 wide range of circumstances.</p>
   1113 <p>Allowable values range from 0 to 250 inclusive.  0 is a
   1114 special case, equivalent to using the default value of 30.</p>
   1115 <p>Note that the compressed output generated is the same
   1116 regardless of whether or not the fallback algorithm is
   1117 used.</p>
   1118 <p>Be aware also that this parameter may disappear entirely in
   1119 future versions of the library.  In principle it should be
   1120 possible to devise a good way to automatically choose which
   1121 algorithm to use.  Such a mechanism would render the parameter
   1122 obsolete.</p>
   1123 <p>Possible return values:</p>
   1124 <pre class="programlisting">BZ_CONFIG_ERROR
   1125   if the library has been mis-compiled
   1126 BZ_PARAM_ERROR
   1127   if strm is NULL 
   1128   or blockSize100k &lt; 1 or blockSize100k &gt; 9
   1129   or verbosity &lt; 0 or verbosity &gt; 4
   1130   or workFactor &lt; 0 or workFactor &gt; 250
   1131 BZ_MEM_ERROR 
   1132   if not enough memory is available
   1133 BZ_OK 
   1134   otherwise</pre>
   1135 <p>Allowable next actions:</p>
   1136 <pre class="programlisting">BZ2_bzCompress
   1137   if BZ_OK is returned
   1138   no specific action needed in case of error</pre>
   1139 </div>
   1140 <div class="sect2" title="3.3.2. BZ2_bzCompress">
   1141 <div class="titlepage"><div><div><h3 class="title">
   1142 <a name="bzCompress"></a>3.3.2. BZ2_bzCompress</h3></div></div></div>
   1143 <pre class="programlisting">int BZ2_bzCompress ( bz_stream *strm, int action );</pre>
   1144 <p>Provides more input and/or output buffer space for the
   1145 library.  The caller maintains input and output buffers, and
   1146 calls <code class="computeroutput">BZ2_bzCompress</code> to transfer
   1147 data between them.</p>
   1148 <p>Before each call to
   1149 <code class="computeroutput">BZ2_bzCompress</code>,
   1150 <code class="computeroutput">next_in</code> should point at the data
   1151 to be compressed, and <code class="computeroutput">avail_in</code>
   1152 should indicate how many bytes the library may read.
   1153 <code class="computeroutput">BZ2_bzCompress</code> updates
   1154 <code class="computeroutput">next_in</code>,
   1155 <code class="computeroutput">avail_in</code> and
   1156 <code class="computeroutput">total_in</code> to reflect the number
   1157 of bytes it has read.</p>
   1158 <p>Similarly, <code class="computeroutput">next_out</code> should
   1159 point to a buffer in which the compressed data is to be placed,
   1160 with <code class="computeroutput">avail_out</code> indicating how
   1161 much output space is available.
   1162 <code class="computeroutput">BZ2_bzCompress</code> updates
   1163 <code class="computeroutput">next_out</code>,
   1164 <code class="computeroutput">avail_out</code> and
   1165 <code class="computeroutput">total_out</code> to reflect the number
   1166 of bytes output.</p>
   1167 <p>You may provide and remove as little or as much data as you
   1168 like on each call of
   1169 <code class="computeroutput">BZ2_bzCompress</code>.  In the limit,
   1170 it is acceptable to supply and remove data one byte at a time,
   1171 although this would be terribly inefficient.  You should always
   1172 ensure that at least one byte of output space is available at
   1173 each call.</p>
   1174 <p>A second purpose of
   1175 <code class="computeroutput">BZ2_bzCompress</code> is to request a
   1176 change of mode of the compressed stream.</p>
   1177 <p>Conceptually, a compressed stream can be in one of four
   1178 states: IDLE, RUNNING, FLUSHING and FINISHING.  Before
   1179 initialisation
   1180 (<code class="computeroutput">BZ2_bzCompressInit</code>) and after
   1181 termination (<code class="computeroutput">BZ2_bzCompressEnd</code>),
   1182 a stream is regarded as IDLE.</p>
   1183 <p>Upon initialisation
   1184 (<code class="computeroutput">BZ2_bzCompressInit</code>), the stream
   1185 is placed in the RUNNING state.  Subsequent calls to
   1186 <code class="computeroutput">BZ2_bzCompress</code> that supply more input should pass
   1187 <code class="computeroutput">BZ_RUN</code> as the requested action;
   1188 an action that is not allowed in the stream's current state results in
   1189 <code class="computeroutput">BZ_SEQUENCE_ERROR</code>.</p>
   1190 <p>At some point, the calling program will have provided all
   1191 the input data it wants to.  It will then want to finish up -- in
   1192 effect, asking the library to process any data it might have
   1193 buffered internally.  In this state,
   1194 <code class="computeroutput">BZ2_bzCompress</code> will no longer
   1195 attempt to read data from
   1196 <code class="computeroutput">next_in</code>, but it will want to
   1197 write data to <code class="computeroutput">next_out</code>.  Because
   1198 the output buffer supplied by the user can be arbitrarily small,
   1199 the finishing-up operation cannot necessarily be done with a
   1200 single call of
   1201 <code class="computeroutput">BZ2_bzCompress</code>.</p>
   1202 <p>Instead, the calling program passes
   1203 <code class="computeroutput">BZ_FINISH</code> as an action to
   1204 <code class="computeroutput">BZ2_bzCompress</code>.  This changes
   1205 the stream's state to FINISHING.  Any remaining input (ie,
   1206 <code class="computeroutput">next_in[0 .. avail_in-1]</code>) is
   1207 compressed and transferred to the output buffer.  To do this,
   1208 <code class="computeroutput">BZ2_bzCompress</code> must be called
   1209 repeatedly until all the output has been consumed.  At that
   1210 point, <code class="computeroutput">BZ2_bzCompress</code> returns
   1211 <code class="computeroutput">BZ_STREAM_END</code>, and the stream's
   1212 state is set back to IDLE.
   1213 <code class="computeroutput">BZ2_bzCompressEnd</code> should then be
   1214 called.</p>
   1215 <p>Just to make sure the calling program does not cheat, the
   1216 library makes a note of <code class="computeroutput">avail_in</code>
   1217 at the time of the first call to
   1218 <code class="computeroutput">BZ2_bzCompress</code> which has
   1219 <code class="computeroutput">BZ_FINISH</code> as an action (ie, at
   1220 the time the program has announced its intention to not supply
   1221 any more input).  By comparing this value with that of
   1222 <code class="computeroutput">avail_in</code> over subsequent calls
   1223 to <code class="computeroutput">BZ2_bzCompress</code>, the library
   1224 can detect any attempts to slip in more data to compress.  Any
   1225 calls for which this is detected will return
   1226 <code class="computeroutput">BZ_SEQUENCE_ERROR</code>.  This
   1227 indicates a programming mistake which should be corrected.</p>
   1228 <p>Instead of asking to finish, the calling program may ask
   1229 <code class="computeroutput">BZ2_bzCompress</code> to take all the
   1230 remaining input, compress it and terminate the current
   1231 (Burrows-Wheeler) compression block.  This could be useful for
   1232 error control purposes.  The mechanism is analogous to that for
   1233 finishing: call <code class="computeroutput">BZ2_bzCompress</code>
   1234 with an action of <code class="computeroutput">BZ_FLUSH</code>,
   1235 remove output data, and persist with the
   1236 <code class="computeroutput">BZ_FLUSH</code> action until the value
   1237 <code class="computeroutput">BZ_RUN_OK</code> is returned.  As with
   1238 finishing, <code class="computeroutput">BZ2_bzCompress</code>
   1239 detects any attempt to provide more input data once the flush has
   1240 begun.</p>
   1241 <p>Once the flush is complete, the stream returns to the
   1242 normal RUNNING state.</p>
   1243 <p>This all sounds pretty complex, but isn't really.  Here's a
   1244 table which shows which actions are allowable in each state, what
   1245 action will be taken, what the next state is, and what the
   1246 non-error return values are.  Note that you can't explicitly ask
   1247 what state the stream is in, but nor do you need to -- it can be
   1248 inferred from the values returned by
   1249 <code class="computeroutput">BZ2_bzCompress</code>.</p>
   1250 <pre class="programlisting">IDLE/any
   1251   Illegal.  IDLE state only exists after BZ2_bzCompressEnd or
   1252   before BZ2_bzCompressInit.
   1253   Return value = BZ_SEQUENCE_ERROR
   1254 
   1255 RUNNING/BZ_RUN
   1256   Compress from next_in to next_out as much as possible.
   1257   Next state = RUNNING
   1258   Return value = BZ_RUN_OK
   1259 
   1260 RUNNING/BZ_FLUSH
   1261   Remember current value of avail_in. Compress from next_in
   1262   to next_out as much as possible, but do not accept any more input.
   1263   Next state = FLUSHING
   1264   Return value = BZ_FLUSH_OK
   1265 
   1266 RUNNING/BZ_FINISH
   1267   Remember current value of avail_in. Compress from next_in
   1268   to next_out as much as possible, but do not accept any more input.
   1269   Next state = FINISHING
   1270   Return value = BZ_FINISH_OK
   1271 
   1272 FLUSHING/BZ_FLUSH
   1273   Compress from next_in to next_out as much as possible, 
   1274   but do not accept any more input.
   1275   If all the existing input has been used up and all compressed
   1276   output has been removed
   1277     Next state = RUNNING; Return value = BZ_RUN_OK
   1278   else
   1279     Next state = FLUSHING; Return value = BZ_FLUSH_OK
   1280 
   1281 FLUSHING/other     
   1282   Illegal.
   1283   Return value = BZ_SEQUENCE_ERROR
   1284 
   1285 FINISHING/BZ_FINISH
   1286   Compress from next_in to next_out as much as possible,
   1287   but do not accept any more input.
   1288   If all the existing input has been used up and all compressed
   1289   output has been removed
   1290     Next state = IDLE; Return value = BZ_STREAM_END
   1291   else
   1292     Next state = FINISHING; Return value = BZ_FINISH_OK
   1293 
   1294 FINISHING/other
   1295   Illegal.
   1296   Return value = BZ_SEQUENCE_ERROR</pre>
   1297 <p>That still looks complicated?  Well, fair enough.  The
   1298 usual sequence of calls for compressing a load of data is:</p>
   1299 <div class="orderedlist"><ol class="orderedlist" type="1">
   1300 <li class="listitem"><p>Get started with
   1301   <code class="computeroutput">BZ2_bzCompressInit</code>.</p></li>
   1302 <li class="listitem"><p>Shovel data in and shlurp out its compressed form
   1303   using zero or more calls of
   1304   <code class="computeroutput">BZ2_bzCompress</code> with action =
   1305   <code class="computeroutput">BZ_RUN</code>.</p></li>
   1306 <li class="listitem"><p>Finish up. Repeatedly call
   1307   <code class="computeroutput">BZ2_bzCompress</code> with action =
   1308   <code class="computeroutput">BZ_FINISH</code>, copying out the
   1309   compressed output, until
   1310   <code class="computeroutput">BZ_STREAM_END</code> is
   1311   returned.</p></li>
   1312 <li class="listitem"><p>Close up and go home.  Call
   1313   <code class="computeroutput">BZ2_bzCompressEnd</code>.</p></li>
   1314 </ol></div>
   1315 <p>If the data you want to compress fits into your input
   1316 buffer all at once, you can skip the calls of
   1317 <code class="computeroutput">BZ2_bzCompress ( ..., BZ_RUN )</code>
   1318 and just do the <code class="computeroutput">BZ2_bzCompress ( ..., BZ_FINISH
   1319 )</code> calls.</p>
   1320 <p>All required memory is allocated by
   1321 <code class="computeroutput">BZ2_bzCompressInit</code>.  The
   1322 compression library can accept any data at all (obviously).  So
   1323 you shouldn't get any error return values from the
   1324 <code class="computeroutput">BZ2_bzCompress</code> calls.  If you
   1325 do, they will be
   1326 <code class="computeroutput">BZ_SEQUENCE_ERROR</code>, and indicate
   1327 a bug in your programming.</p>
   1328 <p>Other (trivial) possible return values:</p>
   1329 <pre class="programlisting">BZ_PARAM_ERROR
   1330   if strm is NULL, or strm-&gt;state is NULL</pre>
   1331 </div>
   1332 <div class="sect2" title="3.3.3. BZ2_bzCompressEnd">
   1333 <div class="titlepage"><div><div><h3 class="title">
   1334 <a name="bzCompress-end"></a>3.3.3. BZ2_bzCompressEnd</h3></div></div></div>
   1335 <pre class="programlisting">int BZ2_bzCompressEnd ( bz_stream *strm );</pre>
   1336 <p>Releases all memory associated with a compression
   1337 stream.</p>
   1338 <p>Possible return values:</p>
   1339 <pre class="programlisting">BZ_PARAM_ERROR  if strm is NULL or strm-&gt;state is NULL
   1340 BZ_OK           otherwise</pre>
   1341 </div>
   1342 <div class="sect2" title="3.3.4. BZ2_bzDecompressInit">
   1343 <div class="titlepage"><div><div><h3 class="title">
   1344 <a name="bzDecompress-init"></a>3.3.4. BZ2_bzDecompressInit</h3></div></div></div>
   1345 <pre class="programlisting">int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );</pre>
   1346 <p>Prepares for decompression.  As with
   1347 <code class="computeroutput">BZ2_bzCompressInit</code>, a
   1348 <code class="computeroutput">bz_stream</code> record should be
   1349 allocated and initialised before the call.  Fields
   1350 <code class="computeroutput">bzalloc</code>,
   1351 <code class="computeroutput">bzfree</code> and
   1352 <code class="computeroutput">opaque</code> should be set if a custom
   1353 memory allocator is required, or made
   1354 <code class="computeroutput">NULL</code> for the normal
   1355 <code class="computeroutput">malloc</code> /
   1356 <code class="computeroutput">free</code> routines.  Upon return, the
   1357 internal state will have been initialised, and
   1358 <code class="computeroutput">total_in</code> and
   1359 <code class="computeroutput">total_out</code> will be zero.</p>
   1360 <p>For the meaning of parameter
   1361 <code class="computeroutput">verbosity</code>, see
   1362 <code class="computeroutput">BZ2_bzCompressInit</code>.</p>
   1363 <p>If <code class="computeroutput">small</code> is nonzero, the
   1364 library will use an alternative decompression algorithm which
   1365 uses less memory but at the cost of decompressing more slowly
   1366 (roughly speaking, half the speed, but the maximum memory
   1367 requirement drops to around 2300k).  See <a class="xref" href="#using" title="2.How to use bzip2">How to use bzip2</a>
   1368 for more information on memory management.</p>
   1369 <p>Note that the amount of memory needed to decompress a
   1370 stream cannot be determined until the stream's header has been
   1371 read, so even if
   1372 <code class="computeroutput">BZ2_bzDecompressInit</code> succeeds, a
   1373 subsequent <code class="computeroutput">BZ2_bzDecompress</code>
   1374 could fail with
   1375 <code class="computeroutput">BZ_MEM_ERROR</code>.</p>
   1376 <p>Possible return values:</p>
   1377 <pre class="programlisting">BZ_CONFIG_ERROR
   1378   if the library has been mis-compiled
   1379 BZ_PARAM_ERROR
   1380   if ( small != 0 &amp;&amp; small != 1 )
   1381   or (verbosity &lt; 0 || verbosity &gt; 4)
   1382 BZ_MEM_ERROR
   1383   if insufficient memory is available</pre>
   1384 <p>Allowable next actions:</p>
   1385 <pre class="programlisting">BZ2_bzDecompress
   1386   if BZ_OK was returned
   1387   no specific action required in case of error</pre>
   1388 </div>
   1389 <div class="sect2" title="3.3.5. BZ2_bzDecompress">
   1390 <div class="titlepage"><div><div><h3 class="title">
   1391 <a name="bzDecompress"></a>3.3.5. BZ2_bzDecompress</h3></div></div></div>
   1392 <pre class="programlisting">int BZ2_bzDecompress ( bz_stream *strm );</pre>
   1393 <p>Provides more input and/or output buffer space for the
   1394 library.  The caller maintains input and output buffers, and uses
   1395 <code class="computeroutput">BZ2_bzDecompress</code> to transfer
   1396 data between them.</p>
   1397 <p>Before each call to
   1398 <code class="computeroutput">BZ2_bzDecompress</code>,
   1399 <code class="computeroutput">next_in</code> should point at the
   1400 compressed data, and <code class="computeroutput">avail_in</code>
   1401 should indicate how many bytes the library may read.
   1402 <code class="computeroutput">BZ2_bzDecompress</code> updates
   1403 <code class="computeroutput">next_in</code>,
   1404 <code class="computeroutput">avail_in</code> and
   1405 <code class="computeroutput">total_in</code> to reflect the number
   1406 of bytes it has read.</p>
   1407 <p>Similarly, <code class="computeroutput">next_out</code> should
   1408 point to a buffer in which the uncompressed output is to be
   1409 placed, with <code class="computeroutput">avail_out</code>
   1410 indicating how much output space is available.
   1411 <code class="computeroutput">BZ2_bzDecompress</code> updates
   1412 <code class="computeroutput">next_out</code>,
   1413 <code class="computeroutput">avail_out</code> and
   1414 <code class="computeroutput">total_out</code> to reflect the number
   1415 of bytes output.</p>
   1416 <p>You may provide and remove as little or as much data as you
   1417 like on each call of
   1418 <code class="computeroutput">BZ2_bzDecompress</code>.  In the limit,
   1419 it is acceptable to supply and remove data one byte at a time,
   1420 although this would be terribly inefficient.  You should always
   1421 ensure that at least one byte of output space is available at
   1422 each call.</p>
   1423 <p>Use of <code class="computeroutput">BZ2_bzDecompress</code> is
   1424 simpler than
   1425 <code class="computeroutput">BZ2_bzCompress</code>.</p>
   1426 <p>You should provide input and remove output as described
   1427 above, and repeatedly call
   1428 <code class="computeroutput">BZ2_bzDecompress</code> until
   1429 <code class="computeroutput">BZ_STREAM_END</code> is returned.
   1430 Appearance of <code class="computeroutput">BZ_STREAM_END</code>
   1431 denotes that <code class="computeroutput">BZ2_bzDecompress</code>
   1432 has detected the logical end of the compressed stream.
   1433 <code class="computeroutput">BZ2_bzDecompress</code> will not
   1434 produce <code class="computeroutput">BZ_STREAM_END</code> until all
   1435 output data has been placed into the output buffer, so once
   1436 <code class="computeroutput">BZ_STREAM_END</code> appears, you are
   1437 guaranteed to have available all the decompressed output, and
   1438 <code class="computeroutput">BZ2_bzDecompressEnd</code> can safely
   1439 be called.</p>
   1440 <p>In case of an error return value, you should call
   1441 <code class="computeroutput">BZ2_bzDecompressEnd</code> to clean up
   1442 and release memory.</p>
   1443 <p>Possible return values:</p>
   1444 <pre class="programlisting">BZ_PARAM_ERROR
   1445   if strm is NULL or strm-&gt;state is NULL
   1446   or strm-&gt;avail_out &lt; 1
   1447 BZ_DATA_ERROR
   1448   if a data integrity error is detected in the compressed stream
   1449 BZ_DATA_ERROR_MAGIC
   1450   if the compressed stream doesn't begin with the right magic bytes
   1451 BZ_MEM_ERROR
   1452   if there wasn't enough memory available
   1453 BZ_STREAM_END
   1454   if the logical end of the data stream was detected and all
   1455   output has been consumed, eg strm-&gt;avail_out &gt; 0
   1456 BZ_OK
   1457   otherwise</pre>
   1458 <p>Allowable next actions:</p>
   1459 <pre class="programlisting">BZ2_bzDecompress
   1460   if BZ_OK was returned
   1461 BZ2_bzDecompressEnd
   1462   otherwise</pre>
   1463 </div>
   1464 <div class="sect2" title="3.3.6. BZ2_bzDecompressEnd">
   1465 <div class="titlepage"><div><div><h3 class="title">
   1466 <a name="bzDecompress-end"></a>3.3.6. BZ2_bzDecompressEnd</h3></div></div></div>
   1467 <pre class="programlisting">int BZ2_bzDecompressEnd ( bz_stream *strm );</pre>
   1468 <p>Releases all memory associated with a decompression
   1469 stream.</p>
   1470 <p>Possible return values:</p>
   1471 <pre class="programlisting">BZ_PARAM_ERROR
   1472   if strm is NULL or strm-&gt;state is NULL
   1473 BZ_OK
   1474   otherwise</pre>
   1475 <p>Allowable next actions:</p>
   1476 <pre class="programlisting">  None.</pre>
   1477 </div>
   1478 </div>
   1479 <div class="sect1" title="3.4. High-level interface">
   1480 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
   1481 <a name="hl-interface"></a>3.4. High-level interface</h2></div></div></div>
   1482 <p>This interface provides functions for reading and writing
   1483 <code class="computeroutput">bzip2</code> format files.  First, some
   1484 general points.</p>
   1485 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
   1486 <li class="listitem" style="list-style-type: disc"><p>All of the functions take an
   1487   <code class="computeroutput">int*</code> first argument,
   1488   <code class="computeroutput">bzerror</code>.  After each call,
   1489   <code class="computeroutput">bzerror</code> should be consulted
   1490   first to determine the outcome of the call.  If
   1491   <code class="computeroutput">bzerror</code> is
   1492   <code class="computeroutput">BZ_OK</code>, the call completed
   1493   successfully, and only then should the return value of the
   1494   function (if any) be consulted.  If
   1495   <code class="computeroutput">bzerror</code> is
   1496   <code class="computeroutput">BZ_IO_ERROR</code>, there was an
   1497   error reading/writing the underlying compressed file, and you
   1498   should then consult <code class="computeroutput">errno</code> /
   1499   <code class="computeroutput">perror</code> to determine the cause
   1500   of the difficulty.  <code class="computeroutput">bzerror</code>
   1501   may also be set to various other values; precise details are
   1502   given on a per-function basis below.</p></li>
   1503 <li class="listitem" style="list-style-type: disc"><p>If <code class="computeroutput">bzerror</code> indicates
   1504   an error (ie, anything except
   1505   <code class="computeroutput">BZ_OK</code> and
   1506   <code class="computeroutput">BZ_STREAM_END</code>), you should
   1507   immediately call
   1508   <code class="computeroutput">BZ2_bzReadClose</code> (or
   1509   <code class="computeroutput">BZ2_bzWriteClose</code>, depending on
   1510   whether you are attempting to read or to write) to free up all
   1511   resources associated with the stream.  Once an error has been
   1512   indicated, behaviour of all calls except
   1513   <code class="computeroutput">BZ2_bzReadClose</code>
   1514   (<code class="computeroutput">BZ2_bzWriteClose</code>) is
   1515   undefined.  The implication is that (1)
   1516   <code class="computeroutput">bzerror</code> should be checked
   1517   after each call, and (2) if
   1518   <code class="computeroutput">bzerror</code> indicates an error,
   1519   <code class="computeroutput">BZ2_bzReadClose</code>
   1520   (<code class="computeroutput">BZ2_bzWriteClose</code>) should then
   1521   be called to clean up.</p></li>
   1522 <li class="listitem" style="list-style-type: disc"><p>The <code class="computeroutput">FILE*</code> arguments
   1523   passed to <code class="computeroutput">BZ2_bzReadOpen</code> /
   1524   <code class="computeroutput">BZ2_bzWriteOpen</code> should be set
   1525   to binary mode.  Most Unix systems will do this by default, but
   1526   other platforms, including Windows and Mac, will not.  If you
   1527   omit this, you may encounter problems when moving code to new
   1528   platforms.</p></li>
   1529 <li class="listitem" style="list-style-type: disc"><p>Memory allocation requests are handled by
   1530   <code class="computeroutput">malloc</code> /
   1531   <code class="computeroutput">free</code>.  At present there is no
   1532   facility for user-defined memory allocators in the file I/O
   1533   functions (could easily be added, though).</p></li>
   1534 </ul></div>
   1535 <div class="sect2" title="3.4.1. BZ2_bzReadOpen">
   1536 <div class="titlepage"><div><div><h3 class="title">
   1537 <a name="bzreadopen"></a>3.4.1. BZ2_bzReadOpen</h3></div></div></div>
   1538 <pre class="programlisting">typedef void BZFILE;
   1539 
   1540 BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f, 
   1541                         int verbosity, int small,
   1542                         void *unused, int nUnused );</pre>
   1543 <p>Prepare to read compressed data from file handle
   1544 <code class="computeroutput">f</code>.
   1545 <code class="computeroutput">f</code> should refer to a file which
   1546 has been opened for reading, and for which the error indicator
   1547 (<code class="computeroutput">ferror(f)</code>) is not set.  If
   1548 <code class="computeroutput">small</code> is 1, the library will try
   1549 to decompress using less memory, at the expense of speed.</p>
   1550 <p>For reasons explained below,
   1551 <code class="computeroutput">BZ2_bzRead</code> will decompress the
   1552 <code class="computeroutput">nUnused</code> bytes starting at
   1553 <code class="computeroutput">unused</code>, before starting to read
   1554 from the file <code class="computeroutput">f</code>.  At most
   1555 <code class="computeroutput">BZ_MAX_UNUSED</code> bytes may be
   1556 supplied like this.  If this facility is not required, you should
   1557 pass <code class="computeroutput">NULL</code> and
   1558 <code class="computeroutput">0</code> for
   1559 <code class="computeroutput">unused</code> and
   1560 <code class="computeroutput">nUnused</code> respectively.</p>
   1561 <p>For the meaning of parameters
   1562 <code class="computeroutput">small</code> and
   1563 <code class="computeroutput">verbosity</code>, see
   1564 <code class="computeroutput">BZ2_bzDecompressInit</code>.</p>
   1565 <p>The amount of memory needed to decompress a file cannot be
   1566 determined until the file's header has been read.  So it is
   1567 possible that <code class="computeroutput">BZ2_bzReadOpen</code>
   1568 returns <code class="computeroutput">BZ_OK</code> but a subsequent
   1569 call of <code class="computeroutput">BZ2_bzRead</code> will return
   1570 <code class="computeroutput">BZ_MEM_ERROR</code>.</p>
   1571 <p>Possible assignments to
   1572 <code class="computeroutput">bzerror</code>:</p>
   1573 <pre class="programlisting">BZ_CONFIG_ERROR
   1574   if the library has been mis-compiled
   1575 BZ_PARAM_ERROR
   1576   if f is NULL
   1577   or small is neither 0 nor 1
   1578   or ( unused == NULL &amp;&amp; nUnused != 0 )
   1579   or ( unused != NULL &amp;&amp; !(0 &lt;= nUnused &lt;= BZ_MAX_UNUSED) )
   1580 BZ_IO_ERROR
   1581   if ferror(f) is nonzero
   1582 BZ_MEM_ERROR
   1583   if insufficient memory is available
   1584 BZ_OK
   1585   otherwise.</pre>
   1586 <p>Possible return values:</p>
   1587 <pre class="programlisting">Pointer to an abstract BZFILE
   1588   if bzerror is BZ_OK
   1589 NULL
   1590   otherwise</pre>
   1591 <p>Allowable next actions:</p>
   1592 <pre class="programlisting">BZ2_bzRead
   1593   if bzerror is BZ_OK
   1594 BZ2_bzReadClose
   1595   otherwise</pre>
   1596 </div>
   1597 <div class="sect2" title="3.4.2. BZ2_bzRead">
   1598 <div class="titlepage"><div><div><h3 class="title">
   1599 <a name="bzread"></a>3.4.2. BZ2_bzRead</h3></div></div></div>
   1600 <pre class="programlisting">int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );</pre>
   1601 <p>Reads up to <code class="computeroutput">len</code>
   1602 (uncompressed) bytes from the compressed file
   1603 <code class="computeroutput">b</code> into the buffer
   1604 <code class="computeroutput">buf</code>.  If the read was
   1605 successful, <code class="computeroutput">bzerror</code> is set to
   1606 <code class="computeroutput">BZ_OK</code> and the number of bytes
   1607 read is returned.  If the logical end-of-stream was detected,
   1608 <code class="computeroutput">bzerror</code> will be set to
   1609 <code class="computeroutput">BZ_STREAM_END</code>, and the number of
   1610 bytes read is returned.  All other
   1611 <code class="computeroutput">bzerror</code> values denote an
   1612 error.</p>
   1613 <p><code class="computeroutput">BZ2_bzRead</code> will supply
   1614 <code class="computeroutput">len</code> bytes, unless the logical
   1615 stream end is detected or an error occurs.  Because of this, it
   1616 is possible to detect the stream end by observing when the number
   1617 of bytes returned is less than the number requested.
   1618 Nevertheless, this is regarded as inadvisable; you should instead
   1619 check <code class="computeroutput">bzerror</code> after every call
   1620 and watch out for
   1621 <code class="computeroutput">BZ_STREAM_END</code>.</p>
   1622 <p>Internally, <code class="computeroutput">BZ2_bzRead</code>
   1623 copies data from the compressed file in chunks of size
   1624 <code class="computeroutput">BZ_MAX_UNUSED</code> bytes before
   1625 decompressing it.  If the file contains more bytes than strictly
   1626 needed to reach the logical end-of-stream,
   1627 <code class="computeroutput">BZ2_bzRead</code> will almost certainly
   1628 read some of the trailing data before signalling
<code class="computeroutput">BZ_STREAM_END</code>.  To collect the
read but unused data once
<code class="computeroutput">BZ_STREAM_END</code> has appeared,
   1632 call <code class="computeroutput">BZ2_bzReadGetUnused</code>
   1633 immediately before
   1634 <code class="computeroutput">BZ2_bzReadClose</code>.</p>
   1635 <p>Possible assignments to
   1636 <code class="computeroutput">bzerror</code>:</p>
   1637 <pre class="programlisting">BZ_PARAM_ERROR
   1638   if b is NULL or buf is NULL or len &lt; 0
   1639 BZ_SEQUENCE_ERROR
   1640   if b was opened with BZ2_bzWriteOpen
   1641 BZ_IO_ERROR
   1642   if there is an error reading from the compressed file
   1643 BZ_UNEXPECTED_EOF
   1644   if the compressed file ended before 
   1645   the logical end-of-stream was detected
   1646 BZ_DATA_ERROR
   1647   if a data integrity error was detected in the compressed stream
   1648 BZ_DATA_ERROR_MAGIC
   1649   if the stream does not begin with the requisite header bytes 
   1650   (ie, is not a bzip2 data file).  This is really 
   1651   a special case of BZ_DATA_ERROR.
   1652 BZ_MEM_ERROR
   1653   if insufficient memory was available
   1654 BZ_STREAM_END
   1655   if the logical end of stream was detected.
   1656 BZ_OK
   1657   otherwise.</pre>
   1658 <p>Possible return values:</p>
   1659 <pre class="programlisting">number of bytes read
   1660   if bzerror is BZ_OK or BZ_STREAM_END
   1661 undefined
   1662   otherwise</pre>
   1663 <p>Allowable next actions:</p>
   1664 <pre class="programlisting">collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
   1665   if bzerror is BZ_OK
   1666 collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
  if bzerror is BZ_STREAM_END
   1668 BZ2_bzReadClose
   1669   otherwise</pre>
   1670 </div>
   1671 <div class="sect2" title="3.4.3.BZ2_bzReadGetUnused">
   1672 <div class="titlepage"><div><div><h3 class="title">
<a name="bzreadgetunused"></a>3.4.3. BZ2_bzReadGetUnused</h3></div></div></div>
   1674 <pre class="programlisting">void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b, 
   1675                           void** unused, int* nUnused );</pre>
   1676 <p>Returns data which was read from the compressed file but
   1677 was not needed to get to the logical end-of-stream.
   1678 <code class="computeroutput">*unused</code> is set to the address of
   1679 the data, and <code class="computeroutput">*nUnused</code> to the
   1680 number of bytes.  <code class="computeroutput">*nUnused</code> will
   1681 be set to a value between <code class="computeroutput">0</code> and
   1682 <code class="computeroutput">BZ_MAX_UNUSED</code> inclusive.</p>
   1683 <p>This function may only be called once
   1684 <code class="computeroutput">BZ2_bzRead</code> has signalled
   1685 <code class="computeroutput">BZ_STREAM_END</code> but before
   1686 <code class="computeroutput">BZ2_bzReadClose</code>.</p>
   1687 <p>Possible assignments to
   1688 <code class="computeroutput">bzerror</code>:</p>
   1689 <pre class="programlisting">BZ_PARAM_ERROR
   1690   if b is NULL
   1691   or unused is NULL or nUnused is NULL
   1692 BZ_SEQUENCE_ERROR
   1693   if BZ_STREAM_END has not been signalled
   1694   or if b was opened with BZ2_bzWriteOpen
   1695 BZ_OK
   1696   otherwise</pre>
   1697 <p>Allowable next actions:</p>
   1698 <pre class="programlisting">BZ2_bzReadClose</pre>
   1699 </div>
   1700 <div class="sect2" title="3.4.4.BZ2_bzReadClose">
   1701 <div class="titlepage"><div><div><h3 class="title">
<a name="bzreadclose"></a>3.4.4. BZ2_bzReadClose</h3></div></div></div>
   1703 <pre class="programlisting">void BZ2_bzReadClose ( int *bzerror, BZFILE *b );</pre>
   1704 <p>Releases all memory pertaining to the compressed file
   1705 <code class="computeroutput">b</code>.
   1706 <code class="computeroutput">BZ2_bzReadClose</code> does not call
   1707 <code class="computeroutput">fclose</code> on the underlying file
   1708 handle, so you should do that yourself if appropriate.
   1709 <code class="computeroutput">BZ2_bzReadClose</code> should be called
   1710 to clean up after all error situations.</p>
   1711 <p>Possible assignments to
   1712 <code class="computeroutput">bzerror</code>:</p>
   1713 <pre class="programlisting">BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
   1715 BZ_OK
   1716   otherwise</pre>
   1717 <p>Allowable next actions:</p>
   1718 <pre class="programlisting">none</pre>
   1719 </div>
   1720 <div class="sect2" title="3.4.5.BZ2_bzWriteOpen">
   1721 <div class="titlepage"><div><div><h3 class="title">
<a name="bzwriteopen"></a>3.4.5. BZ2_bzWriteOpen</h3></div></div></div>
   1723 <pre class="programlisting">BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f, 
   1724                          int blockSize100k, int verbosity,
   1725                          int workFactor );</pre>
   1726 <p>Prepare to write compressed data to file handle
   1727 <code class="computeroutput">f</code>.
   1728 <code class="computeroutput">f</code> should refer to a file which
   1729 has been opened for writing, and for which the error indicator
(<code class="computeroutput">ferror(f)</code>) is not set.</p>
   1731 <p>For the meaning of parameters
   1732 <code class="computeroutput">blockSize100k</code>,
   1733 <code class="computeroutput">verbosity</code> and
   1734 <code class="computeroutput">workFactor</code>, see
   1735 <code class="computeroutput">BZ2_bzCompressInit</code>.</p>
   1736 <p>All required memory is allocated at this stage, so if the
   1737 call completes successfully,
   1738 <code class="computeroutput">BZ_MEM_ERROR</code> cannot be signalled
   1739 by a subsequent call to
   1740 <code class="computeroutput">BZ2_bzWrite</code>.</p>
   1741 <p>Possible assignments to
   1742 <code class="computeroutput">bzerror</code>:</p>
   1743 <pre class="programlisting">BZ_CONFIG_ERROR
   1744   if the library has been mis-compiled
   1745 BZ_PARAM_ERROR
   1746   if f is NULL
   1747   or blockSize100k &lt; 1 or blockSize100k &gt; 9
   1748 BZ_IO_ERROR
   1749   if ferror(f) is nonzero
   1750 BZ_MEM_ERROR
   1751   if insufficient memory is available
   1752 BZ_OK
   1753   otherwise</pre>
   1754 <p>Possible return values:</p>
   1755 <pre class="programlisting">Pointer to an abstract BZFILE
   1756   if bzerror is BZ_OK
   1757 NULL
   1758   otherwise</pre>
   1759 <p>Allowable next actions:</p>
   1760 <pre class="programlisting">BZ2_bzWrite
   1761   if bzerror is BZ_OK
   1762   (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
   1763 BZ2_bzWriteClose
   1764   otherwise</pre>
   1765 </div>
   1766 <div class="sect2" title="3.4.6.BZ2_bzWrite">
   1767 <div class="titlepage"><div><div><h3 class="title">
<a name="bzwrite"></a>3.4.6. BZ2_bzWrite</h3></div></div></div>
   1769 <pre class="programlisting">void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );</pre>
   1770 <p>Absorbs <code class="computeroutput">len</code> bytes from the
   1771 buffer <code class="computeroutput">buf</code>, eventually to be
   1772 compressed and written to the file.</p>
   1773 <p>Possible assignments to
   1774 <code class="computeroutput">bzerror</code>:</p>
   1775 <pre class="programlisting">BZ_PARAM_ERROR
   1776   if b is NULL or buf is NULL or len &lt; 0
   1777 BZ_SEQUENCE_ERROR
   1778   if b was opened with BZ2_bzReadOpen
   1779 BZ_IO_ERROR
   1780   if there is an error writing the compressed file.
   1781 BZ_OK
   1782   otherwise</pre>
   1783 </div>
   1784 <div class="sect2" title="3.4.7.BZ2_bzWriteClose">
   1785 <div class="titlepage"><div><div><h3 class="title">
<a name="bzwriteclose"></a>3.4.7. BZ2_bzWriteClose</h3></div></div></div>
<pre class="programlisting">void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );

void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
                         int abandon,
                         unsigned int* nbytes_in_lo32,
                         unsigned int* nbytes_in_hi32,
                         unsigned int* nbytes_out_lo32,
                         unsigned int* nbytes_out_hi32 );</pre>
   1798 <p>Compresses and flushes to the compressed file all data so
   1799 far supplied by <code class="computeroutput">BZ2_bzWrite</code>.
   1800 The logical end-of-stream markers are also written, so subsequent
   1801 calls to <code class="computeroutput">BZ2_bzWrite</code> are
   1802 illegal.  All memory associated with the compressed file
   1803 <code class="computeroutput">b</code> is released.
   1804 <code class="computeroutput">fflush</code> is called on the
   1805 compressed file, but it is not
   1806 <code class="computeroutput">fclose</code>'d.</p>
   1807 <p>If <code class="computeroutput">BZ2_bzWriteClose</code> is
   1808 called to clean up after an error, the only action is to release
   1809 the memory.  The library records the error codes issued by
   1810 previous calls, so this situation will be detected automatically.
   1811 There is no attempt to complete the compression operation, nor to
   1812 <code class="computeroutput">fflush</code> the compressed file.  You
   1813 can force this behaviour to happen even in the case of no error,
   1814 by passing a nonzero value to
   1815 <code class="computeroutput">abandon</code>.</p>
<p>If <code class="computeroutput">nbytes_in</code> is non-null,
<code class="computeroutput">*nbytes_in</code> will be set to the
total volume of uncompressed data handled.  Similarly, if
<code class="computeroutput">nbytes_out</code> is non-null,
<code class="computeroutput">*nbytes_out</code> will be set to the
total volume of compressed data written.  For compatibility with
   1821 older versions of the library,
   1822 <code class="computeroutput">BZ2_bzWriteClose</code> only yields the
   1823 lower 32 bits of these counts.  Use
   1824 <code class="computeroutput">BZ2_bzWriteClose64</code> if you want
   1825 the full 64 bit counts.  These two functions are otherwise
   1826 absolutely identical.</p>
   1827 <p>Possible assignments to
   1828 <code class="computeroutput">bzerror</code>:</p>
   1829 <pre class="programlisting">BZ_SEQUENCE_ERROR
   1830   if b was opened with BZ2_bzReadOpen
   1831 BZ_IO_ERROR
   1832   if there is an error writing the compressed file
   1833 BZ_OK
   1834   otherwise</pre>
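<p>As a small illustration of the 64-bit variant, the sketch below
joins the two 32-bit halves that
<code class="computeroutput">BZ2_bzWriteClose64</code> reports into
one 64-bit total.  The helper name
<code class="computeroutput">join_count</code> is hypothetical and not
part of the library; it assumes a C99 compiler providing
<code class="computeroutput">uint64_t</code>.</p>

```c
#include <stdint.h>

/* Join the low and high 32-bit halves of a byte count, as delivered
   through the *_lo32 / *_hi32 out-parameters, into one 64-bit value.
   (join_count is a hypothetical helper, not part of libbzip2.) */
uint64_t join_count ( unsigned int lo32, unsigned int hi32 )
{
   /* Mask the low half in case unsigned int is wider than 32 bits. */
   return ( (uint64_t)hi32 << 32 ) | ( (uint64_t)lo32 & 0xFFFFFFFFu );
}
```

<p>After a call such as
<code class="computeroutput">BZ2_bzWriteClose64 ( &amp;bzerror, b, 0, &amp;in_lo, &amp;in_hi, &amp;out_lo, &amp;out_hi )</code>,
the totals would be
<code class="computeroutput">join_count ( in_lo, in_hi )</code> and
<code class="computeroutput">join_count ( out_lo, out_hi )</code>.</p>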
   1835 </div>
   1836 <div class="sect2" title="3.4.8.Handling embedded compressed data streams">
   1837 <div class="titlepage"><div><div><h3 class="title">
<a name="embed"></a>3.4.8. Handling embedded compressed data streams</h3></div></div></div>
   1839 <p>The high-level library facilitates use of
   1840 <code class="computeroutput">bzip2</code> data streams which form
   1841 some part of a surrounding, larger data stream.</p>
   1842 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
   1843 <li class="listitem" style="list-style-type: disc"><p>For writing, the library takes an open file handle,
   1844   writes compressed data to it,
   1845   <code class="computeroutput">fflush</code>es it but does not
   1846   <code class="computeroutput">fclose</code> it.  The calling
   1847   application can write its own data before and after the
   1848   compressed data stream, using that same file handle.</p></li>
   1849 <li class="listitem" style="list-style-type: disc"><p>Reading is more complex, and the facilities are not as
   1850   general as they could be since generality is hard to reconcile
   1851   with efficiency.  <code class="computeroutput">BZ2_bzRead</code>
   1852   reads from the compressed file in blocks of size
   1853   <code class="computeroutput">BZ_MAX_UNUSED</code> bytes, and in
   1854   doing so probably will overshoot the logical end of compressed
   1855   stream.  To recover this data once decompression has ended,
   1856   call <code class="computeroutput">BZ2_bzReadGetUnused</code> after
   1857   the last call of <code class="computeroutput">BZ2_bzRead</code>
   1858   (the one returning
   1859   <code class="computeroutput">BZ_STREAM_END</code>) but before
   1860   calling
   1861   <code class="computeroutput">BZ2_bzReadClose</code>.</p></li>
   1862 </ul></div>
   1863 <p>This mechanism makes it easy to decompress multiple
   1864 <code class="computeroutput">bzip2</code> streams placed end-to-end.
At the end of one stream, when
   1866 <code class="computeroutput">BZ2_bzRead</code> returns
   1867 <code class="computeroutput">BZ_STREAM_END</code>, call
   1868 <code class="computeroutput">BZ2_bzReadGetUnused</code> to collect
   1869 the unused data (copy it into your own buffer somewhere).  That
   1870 data forms the start of the next compressed stream.  To start
   1871 uncompressing that next stream, call
   1872 <code class="computeroutput">BZ2_bzReadOpen</code> again, feeding in
   1873 the unused data via the <code class="computeroutput">unused</code> /
   1874 <code class="computeroutput">nUnused</code> parameters.  Keep doing
   1875 this until <code class="computeroutput">BZ_STREAM_END</code> return
   1876 coincides with the physical end of file
   1877 (<code class="computeroutput">feof(f)</code>).  In this situation
   1878 <code class="computeroutput">BZ2_bzReadGetUnused</code> will of
   1879 course return no data.</p>
   1880 <p>This should give some feel for how the high-level interface
   1881 can be used.  If you require extra flexibility, you'll have to
   1882 bite the bullet and get to grips with the low-level
   1883 interface.</p>
   1884 </div>
   1885 <div class="sect2" title="3.4.9.Standard file-reading/writing code">
   1886 <div class="titlepage"><div><div><h3 class="title">
<a name="std-rdwr"></a>3.4.9. Standard file-reading/writing code</h3></div></div></div>
   1888 <p>Here's how you'd write data to a compressed file:</p>
<pre class="programlisting">FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );
if ( !f ) {
 /* handle error */
}
/* blockSize100k = 9, verbosity = 0, workFactor = 0 (use the default) */
b = BZ2_bzWriteOpen( &amp;bzerror, f, 9, 0, 0 );
if (bzerror != BZ_OK) {
 BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
 /* handle error */
}

while ( /* condition */ ) {
 /* get data to write into buf, and set nBuf appropriately */
 BZ2_bzWrite ( &amp;bzerror, b, buf, nBuf );
 if (bzerror == BZ_IO_ERROR) { 
   BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
   /* handle error */
 }
}

/* abandon = 0: finish the stream; the byte counts are not wanted */
BZ2_bzWriteClose( &amp;bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) {
 /* handle error */
}
fclose ( f );   /* the library never closes f itself */</pre>
   1919 <p>And to read from a compressed file:</p>
<pre class="programlisting">FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );
if ( !f ) {
  /* handle error */
}
/* verbosity = 0, small = 0, no leftover input to feed in */
b = BZ2_bzReadOpen ( &amp;bzerror, f, 0, 0, NULL, 0 );
if ( bzerror != BZ_OK ) {
  BZ2_bzReadClose ( &amp;bzerror, b );
  /* handle error */
}

bzerror = BZ_OK;
while ( bzerror == BZ_OK &amp;&amp; /* arbitrary other conditions */) {
  nBuf = BZ2_bzRead ( &amp;bzerror, b, buf, /* size of buf */ );
  /* the final chunk of data arrives together with BZ_STREAM_END */
  if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
    /* do something with buf[0 .. nBuf-1] */
  }
}
if ( bzerror != BZ_STREAM_END ) {
   BZ2_bzReadClose ( &amp;bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &amp;bzerror, b );
}
fclose ( f );   /* the library never closes f itself */</pre>
   1950 </div>
   1951 </div>
   1952 <div class="sect1" title="3.5.Utility functions">
   1953 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="util-fns"></a>3.5. Utility functions</h2></div></div></div>
   1955 <div class="sect2" title="3.5.1.BZ2_bzBuffToBuffCompress">
   1956 <div class="titlepage"><div><div><h3 class="title">
<a name="bzbufftobuffcompress"></a>3.5.1. BZ2_bzBuffToBuffCompress</h3></div></div></div>
   1958 <pre class="programlisting">int BZ2_bzBuffToBuffCompress( char*         dest,
   1959                               unsigned int* destLen,
   1960                               char*         source,
   1961                               unsigned int  sourceLen,
   1962                               int           blockSize100k,
   1963                               int           verbosity,
   1964                               int           workFactor );</pre>
   1965 <p>Attempts to compress the data in <code class="computeroutput">source[0
   1966 .. sourceLen-1]</code> into the destination buffer,
   1967 <code class="computeroutput">dest[0 .. *destLen-1]</code>.  If the
   1968 destination buffer is big enough,
   1969 <code class="computeroutput">*destLen</code> is set to the size of
   1970 the compressed data, and <code class="computeroutput">BZ_OK</code>
   1971 is returned.  If the compressed data won't fit,
   1972 <code class="computeroutput">*destLen</code> is unchanged, and
   1973 <code class="computeroutput">BZ_OUTBUFF_FULL</code> is
   1974 returned.</p>
   1975 <p>Compression in this manner is a one-shot event, done with a
   1976 single call to this function.  The resulting compressed data is a
   1977 complete <code class="computeroutput">bzip2</code> format data
   1978 stream.  There is no mechanism for making additional calls to
   1979 provide extra input data.  If you want that kind of mechanism,
   1980 use the low-level interface.</p>
   1981 <p>For the meaning of parameters
   1982 <code class="computeroutput">blockSize100k</code>,
   1983 <code class="computeroutput">verbosity</code> and
   1984 <code class="computeroutput">workFactor</code>, see
   1985 <code class="computeroutput">BZ2_bzCompressInit</code>.</p>
   1986 <p>To guarantee that the compressed data will fit in its
   1987 buffer, allocate an output buffer of size 1% larger than the
   1988 uncompressed data, plus six hundred extra bytes.</p>
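<p>The sizing rule above can be written down as a one-line helper.
This is a sketch only:
<code class="computeroutput">bz_dest_bound</code> is a hypothetical
name, and it rounds the 1% up so that small inputs are not
short-changed by integer division.</p>

```c
#include <stddef.h>

/* Worst-case output size for compressing sourceLen bytes in one shot:
   the input size, plus 1% (rounded up), plus 600 bytes.
   (bz_dest_bound is a hypothetical helper, not part of libbzip2.) */
size_t bz_dest_bound ( size_t sourceLen )
{
   return sourceLen + ( sourceLen + 99 ) / 100 + 600;
}
```

<p>The result is what you would allocate for
<code class="computeroutput">dest</code> and store in
<code class="computeroutput">*destLen</code> before the call.  Since
<code class="computeroutput">destLen</code> points to an
<code class="computeroutput">unsigned int</code>, check that the
bound fits in that type for very large inputs.</p>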
<p><code class="computeroutput">BZ2_bzBuffToBuffCompress</code>
   1990 will not write data at or beyond
   1991 <code class="computeroutput">dest[*destLen]</code>, even in case of
   1992 buffer overflow.</p>
   1993 <p>Possible return values:</p>
   1994 <pre class="programlisting">BZ_CONFIG_ERROR
   1995   if the library has been mis-compiled
   1996 BZ_PARAM_ERROR
   1997   if dest is NULL or destLen is NULL
   1998   or blockSize100k &lt; 1 or blockSize100k &gt; 9
   1999   or verbosity &lt; 0 or verbosity &gt; 4
   2000   or workFactor &lt; 0 or workFactor &gt; 250
   2001 BZ_MEM_ERROR
   2002   if insufficient memory is available 
   2003 BZ_OUTBUFF_FULL
   2004   if the size of the compressed data exceeds *destLen
   2005 BZ_OK
   2006   otherwise</pre>
   2007 </div>
   2008 <div class="sect2" title="3.5.2.BZ2_bzBuffToBuffDecompress">
   2009 <div class="titlepage"><div><div><h3 class="title">
<a name="bzbufftobuffdecompress"></a>3.5.2. BZ2_bzBuffToBuffDecompress</h3></div></div></div>
   2011 <pre class="programlisting">int BZ2_bzBuffToBuffDecompress( char*         dest,
   2012                                 unsigned int* destLen,
   2013                                 char*         source,
   2014                                 unsigned int  sourceLen,
   2015                                 int           small,
   2016                                 int           verbosity );</pre>
   2017 <p>Attempts to decompress the data in <code class="computeroutput">source[0
   2018 .. sourceLen-1]</code> into the destination buffer,
   2019 <code class="computeroutput">dest[0 .. *destLen-1]</code>.  If the
   2020 destination buffer is big enough,
   2021 <code class="computeroutput">*destLen</code> is set to the size of
   2022 the uncompressed data, and <code class="computeroutput">BZ_OK</code>
is returned.  If the uncompressed data won't fit,
   2024 <code class="computeroutput">*destLen</code> is unchanged, and
   2025 <code class="computeroutput">BZ_OUTBUFF_FULL</code> is
   2026 returned.</p>
   2027 <p><code class="computeroutput">source</code> is assumed to hold
   2028 a complete <code class="computeroutput">bzip2</code> format data
   2029 stream.
   2030 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code> tries
   2031 to decompress the entirety of the stream into the output
   2032 buffer.</p>
   2033 <p>For the meaning of parameters
   2034 <code class="computeroutput">small</code> and
   2035 <code class="computeroutput">verbosity</code>, see
   2036 <code class="computeroutput">BZ2_bzDecompressInit</code>.</p>
   2037 <p>Because the compression ratio of the compressed data cannot
   2038 be known in advance, there is no easy way to guarantee that the
   2039 output buffer will be big enough.  You may of course make
   2040 arrangements in your code to record the size of the uncompressed
   2041 data, but such a mechanism is beyond the scope of this
   2042 library.</p>
   2043 <p><code class="computeroutput">BZ2_bzBuffToBuffDecompress</code>
   2044 will not write data at or beyond
   2045 <code class="computeroutput">dest[*destLen]</code>, even in case of
   2046 buffer overflow.</p>
   2047 <p>Possible return values:</p>
   2048 <pre class="programlisting">BZ_CONFIG_ERROR
   2049   if the library has been mis-compiled
   2050 BZ_PARAM_ERROR
   2051   if dest is NULL or destLen is NULL
   2052   or small != 0 &amp;&amp; small != 1
   2053   or verbosity &lt; 0 or verbosity &gt; 4
   2054 BZ_MEM_ERROR
   2055   if insufficient memory is available 
   2056 BZ_OUTBUFF_FULL
  if the size of the uncompressed data exceeds *destLen
   2058 BZ_DATA_ERROR
   2059   if a data integrity error was detected in the compressed data
   2060 BZ_DATA_ERROR_MAGIC
   2061   if the compressed data doesn't begin with the right magic bytes
   2062 BZ_UNEXPECTED_EOF
   2063   if the compressed data ends unexpectedly
   2064 BZ_OK
   2065   otherwise</pre>
   2066 </div>
   2067 </div>
   2068 <div class="sect1" title="3.6.zlib compatibility functions">
   2069 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="zlib-compat"></a>3.6. zlib compatibility functions</h2></div></div></div>
   2071 <p>Yoshioka Tsuneo has contributed some functions to give
   2072 better <code class="computeroutput">zlib</code> compatibility.
   2073 These functions are <code class="computeroutput">BZ2_bzopen</code>,
   2074 <code class="computeroutput">BZ2_bzread</code>,
   2075 <code class="computeroutput">BZ2_bzwrite</code>,
   2076 <code class="computeroutput">BZ2_bzflush</code>,
   2077 <code class="computeroutput">BZ2_bzclose</code>,
   2078 <code class="computeroutput">BZ2_bzerror</code> and
   2079 <code class="computeroutput">BZ2_bzlibVersion</code>.  These
   2080 functions are not (yet) officially part of the library.  If they
   2081 break, you get to keep all the pieces.  Nevertheless, I think
   2082 they work ok.</p>
   2083 <pre class="programlisting">typedef void BZFILE;
   2084 
   2085 const char * BZ2_bzlibVersion ( void );</pre>
   2086 <p>Returns a string indicating the library version.</p>
   2087 <pre class="programlisting">BZFILE * BZ2_bzopen  ( const char *path, const char *mode );
   2088 BZFILE * BZ2_bzdopen ( int        fd,    const char *mode );</pre>
   2089 <p>Opens a <code class="computeroutput">.bz2</code> file for
   2090 reading or writing, using either its name or a pre-existing file
   2091 descriptor.  Analogous to <code class="computeroutput">fopen</code>
   2092 and <code class="computeroutput">fdopen</code>.</p>
   2093 <pre class="programlisting">int BZ2_bzread  ( BZFILE* b, void* buf, int len );
   2094 int BZ2_bzwrite ( BZFILE* b, void* buf, int len );</pre>
   2095 <p>Reads/writes data from/to a previously opened
   2096 <code class="computeroutput">BZFILE</code>.  Analogous to
   2097 <code class="computeroutput">fread</code> and
   2098 <code class="computeroutput">fwrite</code>.</p>
   2099 <pre class="programlisting">int  BZ2_bzflush ( BZFILE* b );
   2100 void BZ2_bzclose ( BZFILE* b );</pre>
   2101 <p>Flushes/closes a <code class="computeroutput">BZFILE</code>.
   2102 <code class="computeroutput">BZ2_bzflush</code> doesn't actually do
   2103 anything.  Analogous to <code class="computeroutput">fflush</code>
   2104 and <code class="computeroutput">fclose</code>.</p>
<pre class="programlisting">const char * BZ2_bzerror ( BZFILE *b, int *errnum );</pre>
<p>Returns a string describing the most recent error status of
   2107 <code class="computeroutput">b</code>, and also sets
   2108 <code class="computeroutput">*errnum</code> to its numerical
   2109 value.</p>
   2110 </div>
   2111 <div class="sect1" title="3.7.Using the library in a stdio-free environment">
   2112 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="stdio-free"></a>3.7. Using the library in a stdio-free environment</h2></div></div></div>
   2114 <div class="sect2" title="3.7.1.Getting rid of stdio">
   2115 <div class="titlepage"><div><div><h3 class="title">
<a name="stdio-bye"></a>3.7.1. Getting rid of stdio</h3></div></div></div>
   2117 <p>In a deeply embedded application, you might want to use
   2118 just the memory-to-memory functions.  You can do this
   2119 conveniently by compiling the library with preprocessor symbol
   2120 <code class="computeroutput">BZ_NO_STDIO</code> defined.  Doing this
   2121 gives you a library containing only the following eight
   2122 functions:</p>
   2123 <p><code class="computeroutput">BZ2_bzCompressInit</code>,
   2124 <code class="computeroutput">BZ2_bzCompress</code>,
<code class="computeroutput">BZ2_bzCompressEnd</code>,
<code class="computeroutput">BZ2_bzDecompressInit</code>,
<code class="computeroutput">BZ2_bzDecompress</code>,
<code class="computeroutput">BZ2_bzDecompressEnd</code>,
   2129 <code class="computeroutput">BZ2_bzBuffToBuffCompress</code>,
   2130 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code></p>
   2131 <p>When compiled like this, all functions will ignore
   2132 <code class="computeroutput">verbosity</code> settings.</p>
   2133 </div>
   2134 <div class="sect2" title="3.7.2.Critical error handling">
   2135 <div class="titlepage"><div><div><h3 class="title">
<a name="critical-error"></a>3.7.2. Critical error handling</h3></div></div></div>
   2137 <p><code class="computeroutput">libbzip2</code> contains a number
   2138 of internal assertion checks which should, needless to say, never
   2139 be activated.  Nevertheless, if an assertion should fail,
   2140 behaviour depends on whether or not the library was compiled with
   2141 <code class="computeroutput">BZ_NO_STDIO</code> set.</p>
   2142 <p>For a normal compile, an assertion failure yields the
   2143 message:</p>
   2144 <div class="blockquote"><blockquote class="blockquote">
   2145 <p>bzip2/libbzip2: internal error number N.</p>
   2146 <p>This is a bug in bzip2/libbzip2, 1.0.6 of 6 September 2010.
Please report it to me at: jseward@bzip.org.  If this happened
   2148 when you were using some program which uses libbzip2 as a
   2149 component, you should also report this bug to the author(s)
   2150 of that program.  Please make an effort to report this bug;
   2151 timely and accurate bug reports eventually lead to higher
   2152 quality software.  Thanks.  Julian Seward, 6 September 2010.
   2153 </p>
   2154 </blockquote></div>
   2155 <p>where <code class="computeroutput">N</code> is some error code
   2156 number.  If <code class="computeroutput">N == 1007</code>, it also
   2157 prints some extra text advising the reader that unreliable memory
   2158 is often associated with internal error 1007. (This is a
   2159 frequently-observed-phenomenon with versions 1.0.0/1.0.1).</p>
   2160 <p><code class="computeroutput">exit(3)</code> is then
   2161 called.</p>
   2162 <p>For a <code class="computeroutput">stdio</code>-free library,
   2163 assertion failures result in a call to a function declared
   2164 as:</p>
   2165 <pre class="programlisting">extern void bz_internal_error ( int errcode );</pre>
   2166 <p>The relevant code is passed as a parameter.  You should
   2167 supply such a function.</p>
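<p>What the handler does is up to you.  One possibility, sketched
below, is to record the code and jump back to a recovery point set
up by the caller, falling back to
<code class="computeroutput">abort</code> when none is in place.
Only the name and signature of
<code class="computeroutput">bz_internal_error</code> come from the
library; <code class="computeroutput">bz_guarded_call</code> and the
<code class="computeroutput">bz_panic_*</code> variables are
hypothetical, and the sketch assumes your environment provides
<code class="computeroutput">setjmp</code> /
<code class="computeroutput">longjmp</code> and
<code class="computeroutput">abort</code>.</p>

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf      bz_panic_env;        /* recovery point set by the caller */
static volatile int bz_panic_armed = 0;  /* nonzero while bz_panic_env is valid */
static volatile int bz_panic_code  = 0;  /* last code passed by the library */

/* Called by a BZ_NO_STDIO build of libbzip2 when an internal assertion
   fails.  This version never returns to the library. */
void bz_internal_error ( int errcode )
{
   bz_panic_code = errcode;
   if ( bz_panic_armed ) longjmp ( bz_panic_env, 1 );
   abort ();   /* no recovery point in place: stop here */
}

/* Run work(arg) with a recovery point in place.  Returns 0 if it ran to
   completion, or the internal error number if the library bailed out.
   (bz_guarded_call is a hypothetical helper, not part of libbzip2.) */
int bz_guarded_call ( void (*work)(void*), void* arg )
{
   if ( setjmp ( bz_panic_env ) != 0 ) {
      bz_panic_armed = 0;
      return bz_panic_code;
   }
   bz_panic_armed = 1;
   work ( arg );
   bz_panic_armed = 0;
   return 0;
}
```

<p>Jumping out this way skips the library's own clean-up, so any
memory it had allocated for the stream is lost, and the stream must
not be used again.</p>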
   2168 <p>In either case, once an assertion failure has occurred, any
   2169 <code class="computeroutput">bz_stream</code> records involved can
   2170 be regarded as invalid.  You should not attempt to resume normal
   2171 operation with them.</p>
   2172 <p>You may, of course, change critical error handling to suit
   2173 your needs.  As I said above, critical errors indicate bugs in
   2174 the library and should not occur.  All "normal" error situations
   2175 are indicated via error return codes from functions, and can be
   2176 recovered from.</p>
   2177 </div>
   2178 </div>
   2179 <div class="sect1" title="3.8.Making a Windows DLL">
   2180 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="win-dll"></a>3.8. Making a Windows DLL</h2></div></div></div>
   2182 <p>Everything related to Windows has been contributed by
   2183 Yoshioka Tsuneo
(<code class="computeroutput">tsuneo@rr.iij4u.or.jp</code>), so
you should send your queries to him (but perhaps Cc: me,
<code class="computeroutput">jseward@bzip.org</code>).</p>
   2187 <p>My vague understanding of what to do is: using Visual C++
   2188 5.0, open the project file
   2189 <code class="computeroutput">libbz2.dsp</code>, and build.  That's
   2190 all.</p>
   2191 <p>If you can't open the project file for some reason, make a
   2192 new one, naming these files:
   2193 <code class="computeroutput">blocksort.c</code>,
   2194 <code class="computeroutput">bzlib.c</code>,
   2195 <code class="computeroutput">compress.c</code>,
   2196 <code class="computeroutput">crctable.c</code>,
   2197 <code class="computeroutput">decompress.c</code>,
   2198 <code class="computeroutput">huffman.c</code>,
   2199 <code class="computeroutput">randtable.c</code> and
   2200 <code class="computeroutput">libbz2.def</code>.  You will also need
   2201 to name the header files <code class="computeroutput">bzlib.h</code>
   2202 and <code class="computeroutput">bzlib_private.h</code>.</p>
   2203 <p>If you don't use VC++, you may need to define the
preprocessor symbol
   2205 <code class="computeroutput">_WIN32</code>.</p>
   2206 <p>Finally, <code class="computeroutput">dlltest.c</code> is a
   2207 sample program using the DLL.  It has a project file,
   2208 <code class="computeroutput">dlltest.dsp</code>.</p>
   2209 <p>If you just want a makefile for Visual C, have a look at
   2210 <code class="computeroutput">makefile.msc</code>.</p>
   2211 <p>Be aware that if you compile
   2212 <code class="computeroutput">bzip2</code> itself on Win32, you must
   2213 set <code class="computeroutput">BZ_UNIX</code> to 0 and
   2214 <code class="computeroutput">BZ_LCCWIN32</code> to 1, in the file
   2215 <code class="computeroutput">bzip2.c</code>, before compiling.
   2216 Otherwise the resulting binary won't work correctly.</p>
   2217 <p>I haven't tried any of this stuff myself, but it all looks
   2218 plausible.</p>
   2219 </div>
   2220 </div>
<div class="chapter" title="4. Miscellanea">
<div class="titlepage"><div><div><h2 class="title">
<a name="misc"></a>4. Miscellanea</h2></div></div></div>
   2224 <div class="toc">
   2225 <p><b>Table of Contents</b></p>
   2226 <dl>
   2227 <dt><span class="sect1"><a href="#limits">4.1. Limitations of the compressed file format</a></span></dt>
   2228 <dt><span class="sect1"><a href="#port-issues">4.2. Portability issues</a></span></dt>
   2229 <dt><span class="sect1"><a href="#bugs">4.3. Reporting bugs</a></span></dt>
   2230 <dt><span class="sect1"><a href="#package">4.4. Did you get the right package?</a></span></dt>
   2231 <dt><span class="sect1"><a href="#reading">4.5. Further Reading</a></span></dt>
   2232 </dl>
   2233 </div>
   2234 <p>These are just some random thoughts of mine.  Your mileage
   2235 may vary.</p>
<div class="sect1" title="4.1. Limitations of the compressed file format">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="limits"></a>4.1. Limitations of the compressed file format</h2></div></div></div>
   2239 <p><code class="computeroutput">bzip2-1.0.X</code>,
   2240 <code class="computeroutput">0.9.5</code> and
   2241 <code class="computeroutput">0.9.0</code> use exactly the same file
   2242 format as the original version,
   2243 <code class="computeroutput">bzip2-0.1</code>.  This decision was
   2244 made in the interests of stability.  Creating yet another
   2245 incompatible compressed file format would create further
   2246 confusion and disruption for users.</p>
   2247 <p>Nevertheless, this is not a painless decision.  Development
   2248 work since the release of
   2249 <code class="computeroutput">bzip2-0.1</code> in August 1997 has
   2250 shown complexities in the file format which slow down
   2251 decompression and, in retrospect, are unnecessary.  These
   2252 are:</p>
   2253 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
   2254 <li class="listitem" style="list-style-type: disc"><p>The run-length encoder, which is the first of the
   2255    compression transformations, is entirely irrelevant.  The
   2256    original purpose was to protect the sorting algorithm from the
   2257    very worst case input: a string of repeated symbols.  But
   2258    algorithm steps Q6a and Q6b in the original Burrows-Wheeler
   2259    technical report (SRC-124) show how repeats can be handled
   2260    without difficulty in block sorting.</p></li>
   2261 <li class="listitem" style="list-style-type: disc">
   2262 <p>The randomisation mechanism doesn't really need to be
   2263    there.  Udi Manber and Gene Myers published a suffix array
   2264    construction algorithm a few years back, which can be employed
   2265    to sort any block, no matter how repetitive, in O(N log N)
   2266    time.  Subsequent work by Kunihiko Sadakane has produced a
   2267    derivative O(N (log N)^2) algorithm which usually outperforms
   2268    the Manber-Myers algorithm.</p>
   2269 <p>I could have changed to Sadakane's algorithm, but I find
   2270    it to be slower than <code class="computeroutput">bzip2</code>'s
   2271    existing algorithm for most inputs, and the randomisation
   2272    mechanism protects adequately against bad cases.  I didn't
   2273    think it was a good tradeoff to make.  Partly this is due to
   2274    the fact that I was not flooded with email complaints about
   2275    <code class="computeroutput">bzip2-0.1</code>'s performance on
   2276    repetitive data, so perhaps it isn't a problem for real
   2277    inputs.</p>
   2278 <p>Probably the best long-term solution, and the one I have
   2279    incorporated into 0.9.5 and above, is to use the existing
   sorting algorithm initially, and fall back to an O(N (log N)^2)
   2281    algorithm if the standard algorithm gets into
   2282    difficulties.</p>
   2283 </li>
   2284 <li class="listitem" style="list-style-type: disc"><p>The compressed file format was never designed to be
   handled by a library, and I have had to jump through some hoops
   2286    to produce an efficient implementation of decompression.  It's
   2287    a bit hairy.  Try passing
   2288    <code class="computeroutput">decompress.c</code> through the C
   2289    preprocessor and you'll see what I mean.  Much of this
   2290    complexity could have been avoided if the compressed size of
   2291    each block of data was recorded in the data stream.</p></li>
   2292 <li class="listitem" style="list-style-type: disc"><p>An Adler-32 checksum, rather than a CRC32 checksum,
   2293    would be faster to compute.</p></li>
   2294 </ul></div>
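<p>To make the last point concrete, here is a minimal sketch of
Adler-32 as it is defined for zlib (RFC 1950).  It is not part of
<code class="computeroutput">libbzip2</code>, and the name
<code class="computeroutput">adler32_sketch</code> is made up for
illustration.  Each input byte costs two additions and two
reductions, with no lookup table.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Largest prime below 2^16; Adler-32 reduces both sums modulo this. */
#define ADLER_MOD 65521u

/* Hypothetical helper, not part of libbzip2:
   returns the Adler-32 checksum of buf[0 .. len-1]. */
uint32_t adler32_sketch(const unsigned char *buf, size_t len)
{
    uint32_t a = 1;  /* running sum of the bytes, starting from 1 */
    uint32_t b = 0;  /* running sum of the successive values of a */
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    /* b goes in the high 16 bits, a in the low 16 bits. */
    return (b << 16) | a;
}
```

<p>A tuned version would normally postpone the modulo operations
and reduce only once per batch of bytes; the sketch reduces on
every byte to keep the definition easy to see.</p>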
   2295 <p>It would be fair to say that the
   2296 <code class="computeroutput">bzip2</code> format was frozen before I
   2297 properly and fully understood the performance consequences of
   2298 doing so.</p>
   2299 <p>Improvements which I was able to incorporate into 0.9.0,
   2300 despite using the same file format, are:</p>
   2301 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
   2302 <li class="listitem" style="list-style-type: disc"><p>Single array implementation of the inverse BWT.  This
   2303   significantly speeds up decompression, presumably because it
   2304   reduces the number of cache misses.</p></li>
   2305 <li class="listitem" style="list-style-type: disc"><p>Faster inverse MTF transform for large MTF values.
   2306   The new implementation is based on the notion of sliding blocks
   2307   of values.</p></li>
   2308 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2-0.9.0</code> now reads
   2309   and writes files with <code class="computeroutput">fread</code>
   2310   and <code class="computeroutput">fwrite</code>; version 0.1 used
   2311   <code class="computeroutput">putc</code> and
   2312   <code class="computeroutput">getc</code>.  Duh!  Well, you live
   2313   and learn.</p></li>
   2314 </ul></div>
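<p>As an illustration of the first item, the following is a
minimal sketch of an inverse BWT that uses one work array, packing
each byte and the index of its successor into a single 32-bit
word.  It is not copied from
<code class="computeroutput">decompress.c</code>; the function name,
the parameter names and the 24-bit index limit are assumptions of
the sketch.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the library's code.
 * last[0 .. n-1] is the last column of the sorted rotation matrix and
 * orig_ptr is the row that holds the original block.  Writes the n
 * original bytes to out.  Returns 0 on success, -1 if the arguments
 * are unusable (n == 0, n >= 2^24, orig_ptr >= n) or memory runs out.
 */
int inverse_bwt_sketch(const unsigned char *last, size_t n,
                       size_t orig_ptr, unsigned char *out)
{
    uint32_t count[256] = {0};
    uint32_t start[256];  /* next free slot for each byte value */
    uint32_t *tt;         /* the single work array */
    uint32_t pos, sum = 0;
    size_t i;
    int c;

    if (n == 0 || n >= ((size_t)1 << 24) || orig_ptr >= n) return -1;

    tt = malloc(n * sizeof *tt);
    if (tt == NULL) return -1;

    /* start[c] = number of bytes in the block that are smaller than c. */
    for (i = 0; i < n; i++) count[last[i]]++;
    for (c = 0; c < 256; c++) { start[c] = sum; sum += count[c]; }

    /* Low 8 bits of tt[i]: the byte last[i]. */
    for (i = 0; i < n; i++) tt[i] = last[i];

    /* High 24 bits of tt[j]: the index to visit after j. */
    for (i = 0; i < n; i++) {
        unsigned char ch = last[i];
        tt[start[ch]] |= (uint32_t)i << 8;
        start[ch]++;
    }

    /* Follow the links; each step yields one byte of the original. */
    pos = tt[orig_ptr] >> 8;
    for (i = 0; i < n; i++) {
        uint32_t entry = tt[pos];
        out[i] = (unsigned char)(entry & 0xff);
        pos = entry >> 8;
    }

    free(tt);
    return 0;
}
```

<p>For the block <code class="computeroutput">banana</code>, the
last column is <code class="computeroutput">nnbaaa</code> and the
original row is 3, so calling the sketch with those values
reproduces <code class="computeroutput">banana</code>.  Because the
byte and its link share one word, each step of the final loop
touches a single array element.</p>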
   2315 <p>Further ahead, it would be nice to be able to do random
   2316 access into files.  This will require some careful design of
   2317 compressed file formats.</p>
   2318 </div>
<div class="sect1" title="4.2. Portability issues">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="port-issues"></a>4.2. Portability issues</h2></div></div></div>
   2322 <p>After some consideration, I have decided not to use GNU
   2323 <code class="computeroutput">autoconf</code> to configure 0.9.5 or
   2324 1.0.</p>
   2325 <p><code class="computeroutput">autoconf</code>, admirable and
   2326 wonderful though it is, mainly assists with portability problems
   2327 between Unix-like platforms.  But
   2328 <code class="computeroutput">bzip2</code> doesn't have much in the
   2329 way of portability problems on Unix; most of the difficulties
   2330 appear when porting to the Mac, or to Microsoft's operating
   2331 systems.  <code class="computeroutput">autoconf</code> doesn't help
   2332 in those cases, and brings in a whole load of new
   2333 complexity.</p>
   2334 <p>Most people should be able to compile the library and
   2335 program under Unix straight out-of-the-box, so to speak,
   2336 especially if you have a version of GNU C available.</p>
   2337 <p>There are a couple of
   2338 <code class="computeroutput">__inline__</code> directives in the
   2339 code.  GNU C (<code class="computeroutput">gcc</code>) should be
   2340 able to handle them.  If you're not using GNU C, your C compiler
   2341 shouldn't see them at all.  If your compiler does, for some
   2342 reason, see them and doesn't like them, just
   2343 <code class="computeroutput">#define</code>
   2344 <code class="computeroutput">__inline__</code> to be
   2345 <code class="computeroutput">/* */</code>.  One easy way to do this
   2346 is to compile with the flag
   2347 <code class="computeroutput">-D__inline__=</code>, which should be
   2348 understood by most Unix compilers.</p>
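<p>The same effect can be had in source form.  The fragment below
is only a sketch: the <code class="computeroutput">__GNUC__</code>
guard and the function <code class="computeroutput">twice</code> are
assumptions made for illustration and do not come from the
<code class="computeroutput">bzip2</code> sources.</p>

```c
/* Make __inline__ expand to nothing for compilers that reject it.
   The __GNUC__ guard is an assumption of this sketch: GNU C
   understands __inline__, so the directive is left alone there. */
#ifndef __GNUC__
#define __inline__ /* */
#endif

/* Hypothetical function, present only to show the directive in use. */
static __inline__ int twice(int x)
{
    return 2 * x;
}
```

<p>With the definition in place, a compiler that does not know
<code class="computeroutput">__inline__</code> sees an ordinary
<code class="computeroutput">static</code> function.</p>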
   2349 <p>If you still have difficulties, try compiling with the
   2350 macro <code class="computeroutput">BZ_STRICT_ANSI</code> defined.
   2351 This should enable you to build the library in a strictly ANSI
   2352 compliant environment.  Building the program itself like this is
   2353 dangerous and not supported, since you remove
   2354 <code class="computeroutput">bzip2</code>'s checks against
   2355 compressing directories, symbolic links, devices, and other
   2356 not-really-a-file entities.  This could cause filesystem
   2357 corruption!</p>
   2358 <p>One other thing: if you create a
   2359 <code class="computeroutput">bzip2</code> binary for public distribution,
   2360 please consider linking it statically (<code class="computeroutput">gcc
   2361 -static</code>).  This avoids all sorts of library-version
   2362 issues that others may encounter later on.</p>
   2363 <p>If you build <code class="computeroutput">bzip2</code> on
   2364 Win32, you must set <code class="computeroutput">BZ_UNIX</code> to 0
   2365 and <code class="computeroutput">BZ_LCCWIN32</code> to 1, in the
   2366 file <code class="computeroutput">bzip2.c</code>, before compiling.
   2367 Otherwise the resulting binary won't work correctly.</p>
   2368 </div>
<div class="sect1" title="4.3. Reporting bugs">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="bugs"></a>4.3. Reporting bugs</h2></div></div></div>
   2372 <p>I tried pretty hard to make sure
   2373 <code class="computeroutput">bzip2</code> is bug free, both by
   2374 design and by testing.  Hopefully you'll never need to read this
   2375 section for real.</p>
   2376 <p>Nevertheless, if <code class="computeroutput">bzip2</code> dies
   2377 with a segmentation fault, a bus error or an internal assertion
   2378 failure, it will ask you to email me a bug report.  Experience from
years of feedback from bzip2 users indicates that almost all these
   2380 problems can be traced to either compiler bugs or hardware
   2381 problems.</p>
   2382 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
   2383 <li class="listitem" style="list-style-type: disc">
   2384 <p>Recompile the program with no optimisation, and
   2385   see if it works.  And/or try a different compiler.  I heard all
   2386   sorts of stories about various flavours of GNU C (and other
   2387   compilers) generating bad code for
   2388   <code class="computeroutput">bzip2</code>, and I've run across two
   2389   such examples myself.</p>
   2390 <p>2.7.X versions of GNU C are known to generate bad code
   2391   from time to time, at high optimisation levels.  If you get
   2392   problems, try using the flags
   2393   <code class="computeroutput">-O2</code>
   2394   <code class="computeroutput">-fomit-frame-pointer</code>
   2395   <code class="computeroutput">-fno-strength-reduce</code>.  You
   2396   should specifically <span class="emphasis"><em>not</em></span> use
   2397   <code class="computeroutput">-funroll-loops</code>.</p>
   2398 <p>You may notice that the Makefile runs six tests as part
   2399   of the build process.  If the program passes all of these, it's
   2400   a pretty good (but not 100%) indication that the compiler has
   2401   done its job correctly.</p>
   2402 </li>
   2403 <li class="listitem" style="list-style-type: disc">
   2404 <p>If <code class="computeroutput">bzip2</code>
   2405   crashes randomly, and the crashes are not repeatable, you may
   2406   have a flaky memory subsystem.
   2407   <code class="computeroutput">bzip2</code> really hammers your
   2408   memory hierarchy, and if it's a bit marginal, you may get these
   2409   problems.  Ditto if your disk or I/O subsystem is slowly
   2410   failing.  Yup, this really does happen.</p>
   2411 <p>Try using a different machine of the same type, and see
   2412   if you can repeat the problem.</p>
   2413 </li>
   2414 <li class="listitem" style="list-style-type: disc"><p>This isn't really a bug, but ... If
   2415   <code class="computeroutput">bzip2</code> tells you your file is
   2416   corrupted on decompression, and you obtained the file via FTP,
   2417   there is a possibility that you forgot to tell FTP to do a
   2418   binary mode transfer.  That absolutely will cause the file to
   2419   be non-decompressible.  You'll have to transfer it
   2420   again.</p></li>
   2421 </ul></div>
   2422 <p>If you've incorporated
   2423 <code class="computeroutput">libbzip2</code> into your own program
and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library are correct
and in accordance with what the documentation says is allowable.
   2427 I have tried to make the library robust against such problems,
   2428 but I'm sure I haven't succeeded.</p>
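<p>One cheap way to follow this advice is to check your arguments
before they reach the library.  The sketch below tests the three
numeric compression parameters against the ranges given for
<code class="computeroutput">BZ2_bzCompressInit</code> in the
library chapter of this manual.  The helper
<code class="computeroutput">compress_params_ok</code> is
hypothetical and is not part of
<code class="computeroutput">libbzip2</code>.</p>

```c
/* Hypothetical helper, not part of libbzip2.
   Returns 1 if the arguments lie in the ranges this manual documents
   for BZ2_bzCompressInit, and 0 otherwise. */
int compress_params_ok(int blockSize100k, int verbosity, int workFactor)
{
    if (blockSize100k < 1 || blockSize100k > 9) return 0;  /* 1 .. 9   */
    if (verbosity < 0 || verbosity > 4) return 0;          /* 0 .. 4   */
    if (workFactor < 0 || workFactor > 250) return 0;      /* 0 .. 250 */
    return 1;
}
```

<p>The library performs these checks itself and reports
<code class="computeroutput">BZ_PARAM_ERROR</code>, so a helper like
this is mainly useful for catching the mistake closer to where the
bad value was computed.</p>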
   2429 <p>Finally, if the above comments don't help, you'll have to
   2430 send me a bug report.  Now, it's just amazing how many people
   2431 will send me a bug report saying something like:</p>
   2432 <pre class="programlisting">bzip2 crashed with segmentation fault on my machine</pre>
<p>and absolutely nothing else.  Needless to say, such a
   2434 report is <span class="emphasis"><em>totally, utterly, completely and
   2435 comprehensively 100% useless; a waste of your time, my time, and
   2436 net bandwidth</em></span>.  With no details at all, there's no way
   2437 I can possibly begin to figure out what the problem is.</p>
   2438 <p>The rules of the game are: facts, facts, facts.  Don't omit
   2439 them because "oh, they won't be relevant".  At the bare
   2440 minimum:</p>
   2441 <pre class="programlisting">Machine type.  Operating system version.  
   2442 Exact version of bzip2 (do bzip2 -V).  
   2443 Exact version of the compiler used.  
   2444 Flags passed to the compiler.</pre>
   2445 <p>However, the most important single thing that will help me
   2446 is the file that you were trying to compress or decompress at the
time the problem happened.  Without that, my ability to do
anything more than speculate about the cause is limited.</p>
   2449 </div>
<div class="sect1" title="4.4. Did you get the right package?">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="package"></a>4.4. Did you get the right package?</h2></div></div></div>
   2453 <p><code class="computeroutput">bzip2</code> is a resource hog.
   2454 It soaks up large amounts of CPU cycles and memory.  Also, it
   2455 gives very large latencies.  In the worst case, you can feed many
   2456 megabytes of uncompressed data into the library before getting
   2457 any compressed output, so this probably rules out applications
   2458 requiring interactive behaviour.</p>
   2459 <p>These aren't faults of my implementation, I hope, but more
   2460 an intrinsic property of the Burrows-Wheeler transform
   2461 (unfortunately).  Maybe this isn't what you want.</p>
   2462 <p>If you want a compressor and/or library which is faster,
   2463 uses less memory but gets pretty good compression, and has
   2464 minimal latency, consider Jean-loup Gailly's and Mark Adler's
   2465 work, <code class="computeroutput">zlib-1.2.1</code> and
   2466 <code class="computeroutput">gzip-1.2.4</code>.  Look for them at 
   2467 <a class="ulink" href="http://www.zlib.org" target="_top">http://www.zlib.org</a> and 
   2468 <a class="ulink" href="http://www.gzip.org" target="_top">http://www.gzip.org</a>
   2469 respectively.</p>
   2470 <p>For something faster and lighter still, you might try Markus F
   2471 X J Oberhumer's <code class="computeroutput">LZO</code> real-time
   2472 compression/decompression library, at 
   2473 <a class="ulink" href="http://www.oberhumer.com/opensource" target="_top">http://www.oberhumer.com/opensource</a>.</p>
   2474 </div>
<div class="sect1" title="4.5. Further Reading">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="reading"></a>4.5. Further Reading</h2></div></div></div>
   2478 <p><code class="computeroutput">bzip2</code> is not research
   2479 work, in the sense that it doesn't present any new ideas.
   2480 Rather, it's an engineering exercise based on existing
   2481 ideas.</p>
   2482 <p>Four documents describe essentially all the ideas behind
   2483 <code class="computeroutput">bzip2</code>:</p>
<div class="literallayout"><p>Michael Burrows and D. J. Wheeler:<br>
"A block-sorting lossless data compression algorithm"<br>
10th May 1994.<br>
Digital SRC Research Report 124.<br>
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz<br>
If you have trouble finding it, try searching at the<br>
New Zealand Digital Library, http://www.nzdl.org.<br>
<br>
Daniel S. Hirschberg and Debra A. LeLewer<br>
"Efficient Decoding of Prefix Codes"<br>
Communications of the ACM, April 1990, Vol 33, Number 4.<br>
You might be able to get an electronic copy of this<br>
from the ACM Digital Library.<br>
<br>
David J. Wheeler<br>
Program bred3.c and accompanying document bred3.ps.<br>
This contains the idea behind the multi-table Huffman coding scheme.<br>
ftp://ftp.cl.cam.ac.uk/users/djw3/<br>
<br>
Jon L. Bentley and Robert Sedgewick<br>
"Fast Algorithms for Sorting and Searching Strings"<br>
Available from Sedgewick's web page,<br>
www.cs.princeton.edu/~rs<br>
</p></div>
   2508 <p>The following paper gives valuable additional insights into
   2509 the algorithm, but is not immediately the basis of any code used
   2510 in bzip2.</p>
<div class="literallayout"><p>Peter Fenwick:<br>
Block Sorting Text Compression<br>
Proceedings of the 19th Australasian Computer Science Conference,<br>
Melbourne, Australia. Jan 31 - Feb 2, 1996.<br>
ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</p></div>
   2516 <p>Kunihiko Sadakane's sorting algorithm, mentioned above, is
   2517 available from:</p>
   2518 <div class="literallayout"><p>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz<br>
   2519 </p></div>
   2520 <p>The Manber-Myers suffix array construction algorithm is
   2521 described in a paper available from:</p>
   2522 <div class="literallayout"><p>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps<br>
   2523 </p></div>
   2524 <p>Finally, the following papers document some
   2525 investigations I made into the performance of sorting
   2526 and decompression algorithms:</p>
<div class="literallayout"><p>Julian Seward<br>
On the Performance of BWT Sorting Algorithms<br>
Proceedings of the IEEE Data Compression Conference 2000<br>
Snowbird, Utah. 28-30 March 2000.<br>
<br>
Julian Seward<br>
Space-time Tradeoffs in the Inverse B-W Transform<br>
Proceedings of the IEEE Data Compression Conference 2001<br>
Snowbird, Utah. 27-29 March 2001.<br>
</p></div>
   2537 </div>
   2538 </div>
   2539 </div></body>
   2540 </html>
   2541