<?xml version="1.0"?>
<!--

   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

-->
<document>
  <properties>
    <title>Commons Compress TAR package</title>
    <author email="dev@commons.apache.org">Commons Documentation Team</author>
  </properties>
  <body>
    <section name="The TAR package">

      <p>In addition to the information stored
      in <code>ArchiveEntry</code>, a <code>TarArchiveEntry</code>
      stores various attributes including information about the
      original owner and permissions.</p>

      <p>There are several different dialects of the TAR format, maybe
      even different TAR formats. The tar package contains special
      cases in order to read many of the existing dialects and will by
      default try to create archives in the original format (often
      called "ustar"). This original format doesn't support file names
      longer than 100 characters or entries bigger than 8 GiB, and the
      tar package will by default fail if you try to write an entry
      that goes beyond those limits. "ustar" is the common denominator
      of all the existing tar dialects and is understood by most of
      the existing tools.</p>
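
      <p>The two ustar limits mentioned above can be illustrated
      with plain Java.  This is only a minimal sketch and not code
      from Commons Compress: the class <code>UstarLimits</code> and
      the method <code>fitsUstar</code> are hypothetical names, names
      are assumed to be counted in UTF-8 bytes, and the size limit
      assumes a size field of eleven octal digits.</p>

      <source><![CDATA[
```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; illustrates the ustar limits, not the library's own check. */
final class UstarLimits {
    /** The classic name field holds at most 100 bytes. */
    static final int MAX_NAME_BYTES = 100;

    /** Eleven octal digits: the largest storable size is 8 GiB minus one byte. */
    static final long MAX_SIZE = 8L * 1024 * 1024 * 1024 - 1;

    /** Returns true if an entry with this name and size fits both limits. */
    static boolean fitsUstar(String name, long size) {
        // Assumption: the name length is measured in UTF-8 bytes.
        int nameBytes = name.getBytes(StandardCharsets.UTF_8).length;
        return nameBytes <= MAX_NAME_BYTES && size >= 0 && size <= MAX_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(fitsUstar("docs/readme.txt", 1024L)); // true
        System.out.println(fitsUstar("big.bin", MAX_SIZE + 1));  // false
    }
}
```
]]></source>

      <p>An entry for which such a check fails is the kind of entry
      that needs one of the options described in the following
      sections.</p>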

      <p>The tar package supports neither the full POSIX tar standard
      nor the more modern GNU extensions of said standard.</p>

      <subsection name="Long File Names">

        <p>The <code>longFileMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files with
        names longer than 100 characters are handled.  The possible
        choices are:</p>

        <ul>
          <li><code>LONGFILE_ERROR</code>: throw an exception if such a
          file is added.  This is the default.</li>
          <li><code>LONGFILE_TRUNCATE</code>: truncate such names.</li>
          <li><code>LONGFILE_GNU</code>: store such names using a GNU
          tar variant now referred to as "oldgnu".  If you choose
          the GNU tar option, the archive cannot be extracted using
          many other tar implementations like the ones of OpenBSD,
          Solaris or Mac OS X.</li>
          <li><code>LONGFILE_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.  Most modern tar
          implementations are able to extract such archives.
          <em>Since Commons Compress 1.4</em></li>
        </ul>

        <p><code>TarArchiveInputStream</code> recognizes the GNU
        tar as well as the POSIX extensions (starting with Commons
        Compress 1.2) for long file names and reads the longer names
        transparently.</p>
      </subsection>

      <subsection name="Big Numeric Values">

        <p>The <code>bigNumberMode</code> option of
        <code>TarArchiveOutputStream</code> controls how files larger
        than 8 GiB, or with other big numeric values that can't be
        encoded in traditional header fields, are handled.  The
        possible choices are:</p>

        <ul>
          <li><code>BIGNUMBER_ERROR</code>: throw an exception if such an
          entry is added.  This is the default.</li>
          <li><code>BIGNUMBER_STAR</code>: use a variant first
          introduced by J&#xf6;rg Schilling's <a
          href="http://developer.berlios.de/projects/star">star</a>
          and later adopted by GNU and BSD tar.  This method is not
          supported by all implementations.</li>
          <li><code>BIGNUMBER_POSIX</code>: use a PAX <a
          href="http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_03">extended
          header</a> as defined by POSIX 1003.1.  Most modern tar
          implementations are able to extract such archives.</li>
        </ul>

        <p>Starting with Commons Compress 1.4,
        <code>TarArchiveInputStream</code> recognizes the star as
        well as the POSIX extensions for big numeric values and reads
        them transparently.</p>
      </subsection>

      <subsection name="File Name Encoding">
        <p>The original ustar format only supports 7-bit ASCII file
        names; later implementations use the platform's default
        encoding to encode file names.  The POSIX standard recommends
        using PAX extension headers for non-ASCII file names
        instead.</p>

        <p>Commons Compress 1.1 to 1.3 assumed file names would be
        encoded using ISO-8859-1.  Starting with Commons Compress 1.4
        you can specify the encoding to expect when reading as a
        parameter to <code>TarArchiveInputStream</code> and the
        encoding to use when writing as a parameter to
        <code>TarArchiveOutputStream</code>.  Both now default to the
        platform's default encoding.</p>

        <p>Since Commons Compress 1.4 another optional parameter,
        <code>addPaxHeadersForNonAsciiNames</code>, of
        <code>TarArchiveOutputStream</code> controls whether PAX
        extension headers will be written for non-ASCII file names.
        By default they are not written, in order to save space.
        <code>TarArchiveInputStream</code> will read them
        transparently if present.</p>
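
        <p>Whether a name falls under the "non-ASCII" case can be
        checked with the JDK alone.  The following is a minimal
        sketch under that assumption; <code>AsciiNames</code> and
        <code>isPlainAscii</code> are hypothetical names and do not
        show how Commons Compress itself makes this decision.</p>

        <source><![CDATA[
```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Commons Compress. */
final class AsciiNames {
    /** Returns true if every character of the name is 7-bit ASCII. */
    static boolean isPlainAscii(String name) {
        // A fresh encoder is created per call because encoders are not thread-safe.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        System.out.println(isPlainAscii("report.txt"));           // true
        System.out.println(isPlainAscii("r\u00e9sum\u00e9.txt")); // false
    }
}
```
]]></source>

        <p>A name for which the check returns false is one that a
        reader using a different default encoding may decode
        incorrectly unless a PAX header carries it.</p>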
      </subsection>

      <subsection name="Sparse files">

        <p><code>TarArchiveInputStream</code> will recognize sparse
        file entries stored using the "oldgnu" format
        (<code>-&#x2d;sparse-version=0.0</code> in GNU tar) but is not
        able to extract them correctly.
        <a href="#Unsupported Features"><code>canReadEntryData</code></a>
        will return false on such entries.  The other variants of
        sparse files currently cannot be detected at all.</p>
      </subsection>

      <subsection name="Consuming Archives Completely">

        <p>The end of a tar archive is signalled by two consecutive
        records of all zeros.  Unfortunately not all tar
        implementations adhere to this and some only write one record
        to end the archive.  Commons Compress will always write two
        records but stops reading an archive as soon as it finds one
        record of all zeros.</p>

        <p>Prior to version 1.5 this could leave the second EOF record
        inside the stream when <code>getNextEntry</code> or
        <code>getNextTarEntry</code> returned <code>null</code>.
        Starting with version 1.5 <code>TarArchiveInputStream</code>
        will try to read a second record as well if present,
        effectively consuming the archive completely.</p>
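
        <p>The end-of-archive handling described above can be
        sketched on a plain byte array.  This is a simplified
        illustration, not the library's implementation:
        <code>TarEof</code> and <code>endOfArchive</code> are
        hypothetical names, a record size of 512 bytes is assumed,
        and every record is inspected without skipping over entry
        data as a real reader would.</p>

        <source><![CDATA[
```java
/** Hypothetical helper; a simplified model of end-of-archive detection. */
final class TarEof {
    /** Assumed record size in bytes. */
    static final int RECORD_SIZE = 512;

    /** Returns true if a complete record of zeros starts at the given offset. */
    static boolean isZeroRecord(byte[] data, int offset) {
        if (offset + RECORD_SIZE > data.length) {
            return false; // not enough bytes left for a whole record
        }
        for (int i = offset; i < offset + RECORD_SIZE; i++) {
            if (data[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the offset just past the end-of-archive marker, or -1 if
     * there is none. One zero record ends the archive; a second zero
     * record directly after it is consumed as well if present.
     */
    static int endOfArchive(byte[] data) {
        for (int pos = 0; pos + RECORD_SIZE <= data.length; pos += RECORD_SIZE) {
            if (isZeroRecord(data, pos)) {
                int end = pos + RECORD_SIZE;
                if (isZeroRecord(data, end)) {
                    end += RECORD_SIZE; // also consume the second EOF record
                }
                return end;
            }
        }
        return -1;
    }
}
```
]]></source>

        <p>With two trailing zero records the returned offset lies
        behind both of them; with a single one it lies directly
        behind that record, so any data that follows is left
        untouched.</p>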

      </subsection>

      <subsection name="PAX Extended Header">
        <p>The tar package has supported reading PAX extended headers
        since 1.3 for local headers and 1.11 for global headers. The
        following entries of PAX headers are applied when reading:</p>

        <dl>
          <dt>path</dt>
          <dd>set the entry's name</dd>

          <dt>linkpath</dt>
          <dd>set the entry's link name</dd>

          <dt>gid</dt>
          <dd>set the entry's group id</dd>

          <dt>gname</dt>
          <dd>set the entry's group name</dd>

          <dt>uid</dt>
          <dd>set the entry's user id</dd>

          <dt>uname</dt>
          <dd>set the entry's user name</dd>

          <dt>size</dt>
          <dd>set the entry's size</dd>

          <dt>mtime</dt>
          <dd>set the entry's modification time</dd>

          <dt>SCHILY.devminor</dt>
          <dd>set the entry's minor device number</dd>

          <dt>SCHILY.devmajor</dt>
          <dd>set the entry's major device number</dd>
        </dl>

        <p>In addition, some fields used by GNU tar and star to
        signal sparse entries are supported and are used for the
        <code>is*GNUSparse</code> and <code>isStarSparse</code>
        methods.</p>

        <p>Some PAX extra headers may be set when writing archives,
        for example for non-ASCII names or big numeric values. This
        depends on various settings of the output stream; see the
        previous sections.</p>

        <p>Since 1.15 you can directly access all PAX extension
        headers that have been found when reading an entry or specify
        extra headers to be written to a (local) PAX extended header
        entry.</p>

        <p>Some hints if you try to set extended headers:</p>

        <ul>
          <li>PAX header keywords should be ASCII.  star/gnutar
          (SCHILY.xattr.*) do not check for this.  libarchive/bsdtar
          (LIBARCHIVE.xattr.*) uses URL-Encoding.</li>
          <li>PAX header values should be encoded as UTF-8 characters
          (including trailing <code>\0</code>).  star/gnutar
          (SCHILY.xattr.*) do not check for this.  libarchive/bsdtar
          (LIBARCHIVE.xattr.*) encodes values using Base64.</li>
          <li>libarchive/bsdtar will read SCHILY.xattr headers, but
          will not generate them.</li>
          <li>gnutar will complain about LIBARCHIVE.xattr (and any
          other unknown) headers and will neither encode nor decode
          them.</li>
        </ul>
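
        <p>For orientation, POSIX stores each extended header entry
        as a record of the form <code>length keyword=value</code>
        followed by a newline, where the decimal length counts the
        bytes of the whole record including the length digits
        themselves.  The following minimal sketch builds such a
        record with the JDK only; <code>PaxRecord</code> and
        <code>encode</code> are hypothetical names, and this is not
        the code Commons Compress uses.</p>

        <source><![CDATA[
```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; builds a single PAX extended header record. */
final class PaxRecord {
    /** Returns "length keyword=value\n" with a self-inclusive byte length. */
    static String encode(String keyword, String value) {
        String body = " " + keyword + "=" + value + "\n";
        // Lengths are counted in bytes of the UTF-8 encoded record.
        int bodyLen = body.getBytes(StandardCharsets.UTF_8).length;
        int total = bodyLen + 1; // first guess: a single length digit
        // The length field counts its own digits, so repeat until stable.
        while (String.valueOf(total).length() + bodyLen != total) {
            total = String.valueOf(total).length() + bodyLen;
        }
        return total + body;
    }

    public static void main(String[] args) {
        // Prints "14 path=a.txt" followed by the record's own newline.
        System.out.print(encode("path", "a.txt"));
    }
}
```
]]></source>

        <p>The loop is needed because adding a digit to the length
        field can itself push the total across a power of ten, for
        example from 99 to 101 bytes.</p>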
      </subsection>

    </section>
  </body>
</document>