Home | History | Annotate | only in /external/archive-patcher
Up to higher level directory
NameDateSize
.classpath22-Oct-20201K
.project22-Oct-2020397
Android.bp22-Oct-2020847
applier/22-Oct-2020
build.gradle22-Oct-2020205
explainer/22-Oct-2020
generator/22-Oct-2020
gradle/22-Oct-2020
gradle.properties22-Oct-2020855
gradlew22-Oct-20204.9K
gradlew.bat22-Oct-20202.3K
integrationtest/22-Oct-2020
LICENSE22-Oct-202012.2K
METADATA22-Oct-2020462
MODULE_LICENSE_APACHE222-Oct-20200
NOTICE22-Oct-202011.1K
OWNERS22-Oct-2020250
README.md22-Oct-202031.5K
sample/22-Oct-2020
settings.gradle22-Oct-2020145
shared/22-Oct-2020
sharedtest/22-Oct-2020
tools/22-Oct-2020

README.md

      1 # Archive Patcher Documentation
      2 
      3 Copyright 2016 Google Inc. All rights reserved.
      4 
      5 Licensed under the Apache License, Version 2.0 (the "License");
      6 you may not use this file except in compliance with the License.
      7 You may obtain a copy of the License at
      8 
      9      http://www.apache.org/licenses/LICENSE-2.0
     10 
     11 Unless required by applicable law or agreed to in writing, software
     12 distributed under the License is distributed on an "AS IS" BASIS,
     13 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     14 See the License for the specific language governing permissions and
     15 limitations under the License.
     16 
     17 ----
     18 
     19 # Table of Contents
     20 * [Introduction](#introduction)
     21 * [How It Works](#how-it-works)
     22  * [Generating a Patch](#generating-a-patch)
     23  * [Applying a Patch](#applying-a-patch)
     24  * [Handled Cases](#handled-cases)
     25 * [Sample Code: Generating a Patch](#sample-code-generating-a-patch)
     26 * [Sample Code: Applying a Patch](#sample-code-applying-a-patch)
     27 * [Background](#background)
     28 * [The File-by-File v1 Patch Format](#the-file-by-file-v1-patch-format)
     29  * [Old Archive Uncompression Op](#old-archive-uncompression-op)
     30  * [New Archive Recompression Op](#new-archive-recompression-op)
     31  * [Compression Settings](#compression-settings)
     32  * [Compatibility Window](#compatibility-window)
     33  * [Delta Descriptor Record](#delta-descriptor-record)
     34 * [Appendix](#appendix)
     35  * [Interesting Obstacles to Patching Archives](#interesting-obstacles-to-patching-archives)
     36  * [Areas For Improvement](#areas-for-improvement)
     37 * [Acknowledgements](#acknowledgements)
     38 
     39 # Introduction
     40 **Archive-patcher is an open-source project that allows space-efficient patching of zip archives.** Many common distribution formats (such as jar and apk) are valid zip archives; archive-patcher works with all of them.
     41 
     42 Because the patching process examines each individual file within the input archives, we refer to the process as **File-by-File patching** and an individual patch generated by that process as a **File-by-File patch**. Archive-patcher processes almost all zip files, but it is most efficient for zip files created with "standard" tools like PKWARE's 'zip', Oracle's 'jar', and Google's 'aapt'.
     43 
     44 By design, **File-by-File patches are uncompressed**. This allows freedom in choosing the best compression algorithms for a given use case. It is usually best to compress the patches for storage or transport.
     45 
     46 > *Note: Archive-patcher does not currently handle 'zip64' archives (archives supporting more than 65,535 files or containing files larger than 4GB in size).*
     47 
     48 # How It Works
     49 Archive-patcher **transforms** archives into a **delta-friendly space** to generate and apply a delta. This transformation involves uncompressing the compressed content that has changed, while leaving everything else alone. The patch applier then recompresses the content that has changed to create a perfect binary copy of the original input file. In v1, bsdiff is the delta algorithm used within the delta-friendly space. Much more information on this subject is available in the [Appendix](#appendix).
     50 
     51 Diagrams and examples follow. In these examples we will use an old archive and a new archive, each containing 3 files: foo.txt, bar.xml, and baz.lib:
     52 
     53 * **foo.txt** has changed its content between the old and new archives. It is uncompressed from both the old and new archives during transformation to the delta-friendly space. This will allow the delta between v1 and v2 of the file to be encoded efficiently.
     54 * **bar.xml** has also changed its content between the old and new archives. It is already uncompressed in the old and new archives, so it is left alone during transformation to the delta-friendly space. The delta between v1 and v2 of the file can already be encoded efficiently.
     55 * **baz.lib** has *not* changed between the old and new archives. It is left alone during transformation to the delta-friendly space because it has not changed and the delta for an unchanged file is trivially empty.
     56 
     57 ## Generating a Patch
     58 1. Determine which files in the new archive have changed from the old archive.
     59 2. Determine which of the changed files from (1) have deflate settings that can be determined and record those settings.
     60 3. Determine the original offsets and lengths of all files in (2) in both the old and new archives.
     61 4. Create delta-friendly versions of both the old and new archives, uncompressing the files from (2). The resulting intermediate artifacts are called **delta-friendly blobs**; they are no longer valid zip archives.
     62 5. Generate a delta between the old and new delta-friendly blobs from (4).
     63 6. Output the patch carrying the data from (2), (3) and (5).
     64 
     65 ```
     66 File-by-File v1: Patch Generation Overview
     67 
     68 
     69                       Delta-Friendly       Delta-Friendly
     70    Old Archive           Old Blob             New Blob            New Archive
     71  ----------------    ----------------     ----------------    ----------------
     72  |   foo.txt    |    |   foo.txt    |     |   foo.txt    |    |   foo.txt    |
     73  |   version 1  |    |   version 1  |     |   version 2  |    |   version 2  |
     74  | (compressed) |    |(uncompressed)|     |(uncompressed)|    | (compressed) |
     75  |--------------|    |              |     |              |    |--------------|
     76  |   bar.xml    |    |              |     |              |    |   bar.xml    |
     77  |   version 1  |    |--------------|     |--------------|    |   version 2  |
     78  |(uncompressed)|--->|   bar.xml    |----|   bar.xml    |<---|(uncompressed)|
     79  |--------------|    |   version 1  |  |  |   version 2  |    |--------------|
     80  |   baz.lib    |    |(uncompressed)|  |  |(uncompressed)|    |   baz.lib    |
     81  |   version 1  |    |--------------|  |  |--------------|    |   version 1  |
     82  | (compressed) |    |   baz.lib    |  |  |   baz.lib    |    | (compressed) |
     83  ----------------    |   version 1  |  |  |   version 1  |    ----------------
     84         |            | (compressed) |  |  | (compressed) |            |
     85         |            ----------------  |  ----------------            |
     86         v                              v                              v
     87  ----------------                 ----------                  ----------------
     88  |Uncompression |                 | delta  |                  |Recompression |
     89  |   metadata   |                 ----------                  |   metadata   |
     90  ----------------                      |                      ----------------
     91         |                              v                              |
     92         |                   ----------------------                    |
     93         ------------------>|  File-by-File v1   |<-------------------
     94                             |       Patch        |
     95                             ----------------------
     96 ```
     97 
     98 ## Applying a Patch
     99 1. Reconstruct the delta-friendly old blob using information from the patch.
    100 2. Apply the delta to the delta-friendly old blob generated in (1). This generates the delta-friendly new blob.
    101 3. Recompress the files in the delta-friendly new blob using information from the patch. The result is the "new archive" that was the original input to the patch generator.
    102 
    103 ```
    104 File-by-File v1: Patch Application Overview
    105 
    106 
    107                       Delta-Friendly       Delta-Friendly
    108    Old Archive           Old Blob             New Blob           New Archive
    109  ----------------    ----------------     ---------------     ----------------
    110  |   foo.txt    |    |   foo.txt    |     |   foo.txt    |    |   foo.txt    |
    111  |   version 1  |    |   version 1  |     |   version 2  |    |   version 2  |
    112  | (compressed) |    |(uncompressed)|     |(uncompressed)|    | (compressed) |
    113  |--------------|    |              |     |              |    |--------------|
    114  |   bar.xml    |    |              |     |              |    |   bar.xml    |
    115  |   version 1  |    |--------------|     |--------------|    |   version 2  |
    116  |(uncompressed)|-->|   bar.xml    |     |   bar.xml    |-->|(uncompressed)|
    117  |--------------| |  |   version 1  |     |   version 2  | |  |--------------|
    118  |   baz.lib    | |  |(uncompressed)|     |(uncompressed)| |  |   baz.lib    |
    119  |   version 1  | |  |--------------|     |--------------| |  |   version 1  |
    120  | (compressed) | |  |   baz.lib    |     |   baz.lib    | |  | (compressed) |
    121  ---------------- |  |   version 1  |     |   version 1  | |  ----------------
    122                   |  | (compressed) |     | (compressed) | |
    123                   |  ----------------     ---------------- |
    124                   |         |                    ^         |
    125  ---------------- |         |     ----------     |         |  ----------------
    126  |Uncompression |-         ---->| delta  |-----         --|Recompression |
    127  |   metadata   |                 ----------                  |   metadata   |
    128  ----------------                      ^                      ----------------
    129         ^                              |                              ^
    130         |                   ----------------------                    |
    131         -------------------|  File-by-File v1   |--------------------
    132                             |       Patch        |
    133                             ----------------------
    134 ```
    135 
    136 ## Handled Cases
    137 The examples above used two simple archives with 3 common files to help explain the process, but there is significantly more nuance in the implementation. The implementation searches for and handles changes of many types, including some trickier edge cases such as a file that changes compression level, becomes compressed or becomes uncompressed, or is renamed without changes.
    138 
    139 Files that are only in the *new* archive are always left alone, and the delta usually encodes them as a literal copy. Files that are only in the *old* archive are similarly left alone, and the delta usually just discards their bytes completely. And of course, files whose deflate settings cannot be inferred are left alone, since they cannot be recompressed and are therefore required to remain in their existing compressed form.
    140 
    141 > *Note: The v1 implementation does not detect files that are renamed and changed at the same time. This is the domain of similar-file detection, a feature deemed desirable - but not critical - for v1.*
    142 
    143 # Sample Code: Generating a Patch
    144 The following code snippet illustrates how to generate a patch and compress it with deflate compression. The example in the subsequent section shows how to apply such a patch.
    145 
    146 ```java
    147 import com.google.archivepatcher.generator.FileByFileV1DeltaGenerator;
    148 import com.google.archivepatcher.shared.DefaultDeflateCompatibilityWindow;
    149 import java.io.File;
    150 import java.io.FileOutputStream;
    151 import java.util.zip.Deflater;
    152 import java.util.zip.DeflaterOutputStream;
    153 
    154 /** Generate a patch; args are old file path, new file path, and patch file path. */
    155 public class SamplePatchGenerator {
    156   public static void main(String... args) throws Exception {
    157     if (!new DefaultDeflateCompatibilityWindow().isCompatible()) {
    158       System.err.println("zlib not compatible on this system");
    159       System.exit(-1);
    160     }
    161     File oldFile = new File(args[0]); // must be a zip archive
    162     File newFile = new File(args[1]); // must be a zip archive
    163     Deflater compressor = new Deflater(9, true); // to compress the patch
    164     try (FileOutputStream patchOut = new FileOutputStream(args[2]);
    165         DeflaterOutputStream compressedPatchOut =
    166             new DeflaterOutputStream(patchOut, compressor, 32768)) {
    167       new FileByFileV1DeltaGenerator().generateDelta(oldFile, newFile, compressedPatchOut);
    168       compressedPatchOut.finish();
    169       compressedPatchOut.flush();
    170     } finally {
    171       compressor.end();
    172     }
    173   }
    174 }
    175 ```
    176 
    177 # Sample Code: Applying a Patch
    178 The following code snippet illustrates how to apply a patch that was compressed with deflate compression, as in the previous example.
    179 
    180 ```java
    181 import com.google.archivepatcher.applier.FileByFileV1DeltaApplier;
    182 import com.google.archivepatcher.shared.DefaultDeflateCompatibilityWindow;
    183 import java.io.File;
    184 import java.io.FileInputStream;
    185 import java.io.FileOutputStream;
    186 import java.util.zip.Inflater;
    187 import java.util.zip.InflaterInputStream;
    188 
    189 /** Apply a patch; args are old file path, patch file path, and new file path. */
    190 public class SamplePatchApplier {
    191   public static void main(String... args) throws Exception {
    192     if (!new DefaultDeflateCompatibilityWindow().isCompatible()) {
    193       System.err.println("zlib not compatible on this system");
    194       System.exit(-1);
    195     }
    196     File oldFile = new File(args[0]); // must be a zip archive
    197     Inflater uncompressor = new Inflater(true); // to uncompress the patch
    198     try (FileInputStream compressedPatchIn = new FileInputStream(args[1]);
    199         InflaterInputStream patchIn =
    200             new InflaterInputStream(compressedPatchIn, uncompressor, 32768);
    201         FileOutputStream newFileOut = new FileOutputStream(args[2])) {
    202       new FileByFileV1DeltaApplier().applyDelta(oldFile, patchIn, newFileOut);
    203     } finally {
    204       uncompressor.end();
    205     }
    206   }
    207 }
    208 ```
    209 
    210 # Background
    211 Patching software exists primarily to make updating software or data files **spatially efficient**. This is accomplished by figuring out what has changed between the inputs (usually an old version and a new version of a given file) and transmitting **only the changes** instead of transmitting the entire file. For example, if we wanted to update a dictionary with one new definition, it's much more efficient to send just the one updated definition than to send along a brand new dictionary! A number of excellent algorithms exist to do just this - diff, bsdiff, xdelta and many more.
    212 
    213 In order to generate **spatially efficient** patches for zip archives, the content within the zip archives needs to be uncompressed. This necessitates recompressing after applying a patch, and this in turn requires knowing the settings that were originally used to compress the data within the zip archive and being able to reproduce them exactly. These three problems are what make patching zip archives a unique challenge, and their solutions are what make archive-patcher interesting. If you'd like to read more about this now, skip down to [Interesting Obstacles to Patching Archives](#interesting-obstacles-to-patching-archives).
    214 
    215 # The File-by-File v1 Patch Format
    216 The v1 patch format is a sequence of bytes described below. Care has been taken to make the format friendly to streaming, so the order of fields in the patch is intended to reflect the order of operationsneededtoapplythe patch. Unlessotherwise noted, the following constraintsapply:
    217 
    218 * Allintegerfieldscontain **unsigned**, **big endian** values. However:
    219  * 32-bitintegerfieldshave a maximumvalue of 2^31 - 1 (due to limitationsin Java)
    220  * 64-bitintegerfieldshave a maximumvalue of 2^63 - 1 (due to limitationsin Java)
    221 
    222 ```
    223 |------------------------------------------------------|
    224 | Versioned Identifier (8 bytes) (UTF-8 text)          | Literal: "GFbFv1_0"
    225 |------------------------------------------------------|
    226 | Flags (4 bytes) (currently unused, but reserved)     |
    227 |------------------------------------------------------|
    228 | Delta-friendly old archive size (8 bytes) (uint64)   |
    229 |------------------------------------------------------|
    230 | Num old archive uncompression ops (4 bytes) (uint32) |
    231 |------------------------------------------------------|
    232 | Old archive uncompression op 1...n (variable length) | (see definition below)
    233 |------------------------------------------------------|
    234 | Num new archive recompression ops (4 bytes) (uint32) |
    235 |------------------------------------------------------|
    236 | New archive recompression op 1...n (variable length) | (see definition below)
    237 |------------------------------------------------------|
    238 | Num delta descriptor records (4 bytes) (uint32)      |
    239 |------------------------------------------------------|
    240 | Delta descriptor record 1...n (variable legth)       | (see definition below)
    241 |------------------------------------------------------|
    242 | Delta 1...n (variable length)                        | (see definition below)
    243 |------------------------------------------------------|
    244 ```
    245 
    246 ## Old Archive Uncompression Op
    247 The number of these entries is determined by the "Num old archive uncompression ops" field previously defined. Each entry consists of an offset (from the beginning of the file) and a number of bytes to uncompress. Important notes:
    248 
    249 * Entries must be ordered in ascending order by offset. This is to allow the transformation of the old archive into the delta-friendly space to be done by reading a the old archive as a stream, instead of requiring random access.
    250 * Entries must not overlap (for sanity)
    251 * Areas of the old archive that are not included in any uncompression op will be left alone, i.e. represent arbitrary data that should **not** be uncompressed, such as zip structural components or blocks of data that are stored without compression already.
    252 
    253 ```
    254 |------------------------------------------------------|
    255 | Offset of first byte to uncompress (8 bytes) (uint64)|
    256 |------------------------------------------------------|
    257 | Number of bytes to uncompress (8 bytes) (uint64)     |
    258 |------------------------------------------------------|
    259 ```
    260 
    261 ## New Archive Recompression Op
    262 The number of these entries is determined by the "Num new archive recompression ops" field previously defined. Like an old archive uncompression op, each entry consists of an offset - but this time from the beginning of the delta-friendly new blob. This is followed by the number of bytes to compress, and finally a compression settings field. Important notes:
    263 
    264 * Entries must be ordered in ascending order by offset. This allows the output from the delta apply process (which creates the delta-friendly new blob) to be piped to an intelligent partially-compressing stream that is seeded with the knowledge of which ranges to recompress and the settings to use for each. This avoids the need to write the delta-friendly new blob to persistent storage, an important optimization.
    265 * Entries must not overlap (for sanity)
    266 * Areas of the new archive that are not included in any recompression op will be copied through from the delta-friendly new blob without modification. These represent arbitrary data that should **not** be compressed, such as zip structural components or blocks of data that are stored without compression in the new archive.
    267 
    268 ```
    269 |------------------------------------------------------|
    270 | Offset of first byte to compress (8 bytes) (uint64)  |
    271 |------------------------------------------------------|
    272 | Number of bytes to compress (8 bytes) (uint64)       |
    273 |------------------------------------------------------|
    274 | Compression settings (4 bytes)                       | (see definition below)
    275 |------------------------------------------------------|
    276 ```
    277 
    278 ## Compression Settings
    279 The compression settings define the deflate level (in the range 1 to 9, inclusive), the deflate strategy (in the range 0 to 2, inclusive) and the wrapping mode (wrap or nowrap). The settings are specific to a **compatibility window**, discussed in the next section in more detail.
    280 
    281 > *In practice almost all entries in zip archives have strategy 0 (the default) and wrapping mode 'nowrap'. The other strategies are primarily used in-situ, e.g., the compression used within the PNG format; wrapping, on the other hand, is almost exclusively used in gzip operations.*
    282 
    283 ```
    284 |------------------------------------------------------|
    285 | Compatibility window ID (1 byte) (uint8)             | (see definition below)
    286 |------------------------------------------------------|
    287 | Deflate level (1 byte) (uint8) (range: [1,9])        |
    288 |------------------------------------------------------|
    289 | Deflate strategy (1 bytes) (uint8) (range: [0,2]     |
    290 |------------------------------------------------------|
    291 | Wrap mode (1 bytes) (uint8) (0=wrap, 1=nowrap)       |
    292 |------------------------------------------------------|
    293 ```
    294 
    295 ## Compatibility Window
    296 A compatibility window specifies a compression algorithm along with a range of versions and platforms upon which it is known to produce predictable and consistent output. That is, all implementations within a given compatibility window must produce *identical output* for any *identical inputs* consisting of bytes to be compressed along with the compression settings (level, strategy, wrapping mode).
    297 
    298 In File-by-File v1, there is only one compatibility window defined. It is **the default deflate compatibility window**, having **ID=0** (all other values reserved for future expansion), and it specifies the following configuration:
    299 
    300 * Algorithm: deflate (zlib)
    301 * Windowlength:32,768 bytes(hardcoded and implied, not explicitly set)
    302 * Valid compressionlevels: 1 through 9 (0 meansstore, and isunused)
    303 * Valid strategies:0,1,or2 (java.util.zip doesnot support anylater strategies)
    304 * Valid wrapping modes: wrap, nowrap
    305 
    306 The default compatibility window is compatible with the following runtime environments based on empiricaltesting. Other environments may be compatible, but the ones in this table are known to be.
    307 
    308 Runtime Environment | OS | Hardware Architectures | Min Version | Max Version | Notes
    309 --- | --- | --- | --- | --- | ---
    310 Sun/Oracle JRE (including OpenJDK) | Linux | x64 | 1.7 (07 Jul, 2011) | None known as of September 2016 | Still compatible asof 1.8, the latest asof August2016. Versionsprior to 1.7 have different level_flags(see [zlib change](https://github.com/madler/zlib/commit/086e982175da84b3db958191031380794315f95f)).
    311 Dalvik/ART | Android | armeabiv7a, arm64v8a, x86 | API 15 (19 Oct, 2011) | None known as of September 2016 | Still compatible asof API 24 (Nougat), the latest asof September2016. Versionsprior to API 15 (Ice Cream Sandwich)used a smaller sliding window size (see [AOSPchange](https://android.googlesource.com/platform/libcore/+/909a18fd6628cee6718865a7b7bf2534ea25f5ec%5E%21/#F0)).
    312 
    313 ## Delta Descriptor Record
    314 Delta descriptor records are grouped together before any of the actual deltas. In File-by-File v1 there is always exactly one delta, so there is exactly one delta descriptor record followed immediately by the delta data. Conceptually, the descriptor defines input and output regions of the archives along with a delta to be applied to those regions (reading from one, and writing to the other).
    315 
    316 > *In subsequent versions there may be arbitrarily many deltas. When there is more than one delta, all the descriptors are listed in a contiguous block followed by all of the deltas themselves, also in a contiguous block. This allows the patch applier to preprocess the list of all deltas that are goingtobeapplied and allocate resources accordingly. As with the other descriptors, these must be ordered by ascending offset and overlaps are not allowed.*
    317 
    318 ```
    319 |------------------------------------------------------|
    320 | Delta format ID (1 byte) (uint8)                     |
    321 |------------------------------------------------------|
    322 | Old delta-friendly region start (8 bytes) (uint64)   |
    323 |------------------------------------------------------|
    324 | Old delta-friendly region length (8 bytes) (uint64)  |
    325 |------------------------------------------------------|
    326 | New delta-friendly region start (8 bytes) (uint64)   |
    327 |------------------------------------------------------|
    328 | New delta-friendly region length (8 bytes) (uint64)  |
    329 |------------------------------------------------------|
    330 | Delta length (8 bytes) (uint64)                      |
    331 |------------------------------------------------------|
    332 ```
    333 
    334 Description of the fields within this record are a little more complex than in the other parts of the patch:
    335 
    336 * **Delta format**: The only delta format in File-by-File v1 is **bsdiff**, having **ID=0**.
    337 * **Old delta-friendly region start**: The offset into the old archive (*after* transformation *into* the delta-friendly space) to which the delta applies. In File-by-File v1, this is always zero.
    338 * **Old delta-friendly region length**: The number of bytes in the old archive (again, *after* transformation *into* the delta-friendly space) to which the delta applies. In File-by-File v1, this is always the length of the old archive in the delta-friendly space.
    339 * **New delta-friendly region start**: The offset into the new archive (*before* transformation *out of* the delta-friendly space) to which the delta applies. In File-by-File v1, this is always zero.
    340 * **New delta-friendly region length**: The number of bytes in the new archive (again, *before* transformation *out of* the delta-friendly space) to which the delta applies. In File-by-File v1, this is always the length of the new archive in the delta-friendly space.
    341 * **Delta length**: The number of bytes in the actual delta (e.g., a bsdiff patch) that needs to be applied to the regions defined above. The type of the delta is determined by the delta format, also defined above.
    342 
    343 # Appendix
    344 
    345 ## Interesting Obstacles to Patching Archives
    346 
    347 ### Problem #1: Spatial Efficiency
    348 **Problem**: Zip files make patching hard because compression obscures the changes. Deflate, the compression algorithm used most widely in zip archives, uses a 32k "sliding window" to compress, carrying state with it as it goes. Because state is carried along, even small changes to the data that is being compressed can result in drastic changes to the bytes that are output - even if the size remains similar. If you change the definition of 'aardvark' in our imaginary dictionary (from back in the [Background](#background) section) and zip both the old and new copies, the resulting zip files will be about the **same size**, but will have very **different bytes**. If you try to generate a patch between the two zip files with the same algorithm you used before (e.g., bsdiff) you'll find that the resulting patch file is much, much larger - probably about the same size of one of the zip files. This is because the files are too dissimilar to express any changes succinctly, so the patching algorithm ends up having to just embed a copy of almost the entire file.
    349 
    350 **Solution**: Archive-patcher **transforms** the input archives into what we refer to as **delta-friendly space** where changed files are stored uncompressed, allowing diffing algorithms like bsdiff to function far more effectively.
    351 
    352 > *Note: There are techniques that can be applied to deflate to isolate changes and stop them from causing the entire output to be different, such those used in rsync-friendly gzip. However, zip archives created with such techniques are uncommon - and tend to be slightly larger in size.*
    353 
    354 ### Problem #2: Correctness When Generating Patches
    355 **Problem**: In order for the generated patch to be correct, we need to know the **original deflate settings** that were used for any changed content that we plan to uncompress during the transformation to the delta-friendly space. This is necessary so that the patch applier can **recompress** that changed content after applying the delta, such that the resulting archive is exactly the same as the input to the patch generator. The deflate settings we care about are the **level**, **strategy**, and **wrap mode**.
    356 
    357 **Solution**: Archive-patcher iteratively recompresses each piece of changed content with different deflate settings, looking for a perfect match. The search is ordered based on empirical data and one of the first 3 guesses is extremely likely to succeed. Because deflate has a stateful and small sliding window, mismatches are quickly identified and discarded. If a match *is* found, the corresponding settings are added to the patch stream and the content is uncompressed in-place as previously described; if a match *is not* found then the content is left compressed (because we lack any way to tell the patch applier how to recompress it later).
    358 
    359 > *Note: While it is possible to change other settings for deflate (like the size of its sliding window), in practice this is almost never done. Content that has been compressed with other settings changed will be left compressed during patch generation.*
    360 
    361 ### Problem #3: Correctness When Applying Patches
    362 **Problem**: The patch applier needs to know that it can reproduce deflate output in exactly the same way as the patch generator did. If this is not possible, patching will fail. The biggest risk is that the patch applier's implementation of deflate differs in some way from that of the patch generator that detected the deflate settings. Any deviation will cause the output to diverge from the original input to the patch generator. Archive-patcher relies on the java.util.zip package which in turn wraps a copy of zlib that ships with the JRE. It is this version of zlib that provides the implementation of deflate.
    363 
    364 **Solution**: Archive-patcher contains a ~9000 byte **corpus** of text that produces a unique output for every possible combination of deflate settings that are exposed through the java.util.zip interface (level, strategy, and wrapping mode). These outputs are digested to produce "fingerprints" for each combination of deflate settings on a given version of the zlib library; these fingerprints are then hard-coded into the application. The patch applier checks the local zlib implementation's suitability by repeating the process, deflating the corpus with each combination of java.util.zip settings and digesting the results, then checks that the resulting fingerprints match the hard-coded values.
    365 
    366 > *Note: At the time of this writing (September, 2016), all zlib versions since 1.2.0.4 (dated 10 August 2003) have identical fingerprints. This includes every version of Sun/Oracle Java from 1.6.0 onwards on x86 and x86_64 as well as all versions of the Android Open Source Project from 4.0 onward on x86, arm32 and arm64. Other platforms may also be compatible but have not been tested.*
    367 
    368 > *Note: This solution is somewhat brittle, but is demonstrably suitable to cover 13 years of zlib updates. Compatibility may be extended in a future version by bundling specific versions of zlib with the application to avoid a dependency upon the zlib in the JRE as necessary.*
    369 
    370 ## Areas For Improvement
    371 The File-by-File v1 patching process dramatically improves the spatial efficiency of patches for zip archives, but there are many improvements that can still be made. Here are a few of the more obvious ones that did not make it into v1, but are good candidates for inclusion into later versions:
    372 
    373 * Support for detecting "similar" files between the old and new archives to handle renames that are coupled with content changes.
    374 * Support for additional versions of zlib or other implementations of deflate.
    375 * Support for other archive formats.
    376 * Support for other delta algorithms.
    377 * Support for more than one delta (i.e., applying different algorithms to different regions of the archives).
    378 * Support for file-specific transformations (i.e., delta-friendly optimization of different files types during the transformation into the delta-friendly space).
    379 * Support for partial decompression (i.e., only undoing the Huffman coding steps of deflate and operating on the LZ77 instruction stream during patch generation, allowing a much faster "recompression" step that skips LZ77).
    380 
    381 # Acknowledgements
    382 Major software contributions, in alphabetical order:
    383 
    384 * Andrew Hayden - design, implementation, documentation
    385 * Anthony Morris - code reviews, cleanups, div suffix sort port, and invaluable suggestions
    386 * Glenn Hartmann - code reviews, initial bsdiff port and quick suffix sort port, bsdiff cleanups
    387 
    388 Additionally, we wish to acknowledge the following, also in alphabetical order:
    389 
    390 * Colin Percival - the author of [bsdiff](http://www.daemonology.net/bsdiff/)
    391 * Mark Adler - the author of [zlib](http://www.zlib.net)
    392 * N. Jesper Larsson and Kunihiko Sadakane - for their paper "[Faster Suffix Sorting](http://www.larsson.dogma.net/ssrev-tr.pdf)", basis of the quick suffix sort algorithm
    393 * PKWARE, Inc. - creators and stewards of the [zip specification](https://support.pkware.com/display/PKZIP/APPNOTE)
    394 * Yuta Mori - for the C implementation of the "div suffix sort" ([libdivsufsort](http://code.google.com/p/libdivsufsort/))
    395 
    396 # Contact
    397 archive-patcher-contacts (a] google.com
    398