Courgette Internals
===================

Patch Generation
----------------

![Patch Generation](generation.png)

- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch
  generation by calling ensemble\_create.cc:GenerateEnsemblePatch.

- The input files are read in by courgette:SourceStream objects.

- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which
  uses MakeGenerator to create
  patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 objects.

- PatchGeneratorX86\_32's Transform method transforms the input file
  using Courgette's core techniques that make the bsdiff delta
  smaller.  The steps it takes are the following:

  - _disassemble_ the old and new binaries into AssemblyProgram
    objects,

  - _adjust_ the new AssemblyProgram object, and

  - _encode_ the AssemblyProgram object back into raw bytes.

### Disassemble

- The input is a pointer to a buffer containing the raw bytes of the
  input file.

- Disassembly converts certain machine instructions that reference
  addresses into Courgette instructions.  It is not actually
  disassembly, but this is the term the code-base uses.  Specifically,
  it detects instructions that use absolute addresses given by the
  binary file's relocation table, and relative addresses used in
  relative branches.

- This is done by disassemble:ParseDetectedExecutable, which selects the
  appropriate Disassembler subclass by looking at the binary file's
  headers.

  - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler

  - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler

  - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler

- The Disassembler replaces the relocation table with a Courgette
  instruction that can regenerate the relocation table.

- The Disassembler builds a list of addresses referenced by the
  machine code, numbering each one.

- The Disassembler replaces each address used in machine instructions
  with its index number.

- The output is an assembly\_program.h:AssemblyProgram object, which
  contains a list of instructions, machine or Courgette, and a mapping
  of indices to actual addresses.
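
To make the numbering step concrete, here is a minimal C++ sketch of how referenced addresses might be collected and replaced by indices. It is not Courgette's code: `Instruction`, `ToyProgram` and `NumberTargets` are hypothetical names, the instruction model is reduced to raw bytes plus absolute or relative references, and indices are assigned in order of first reference. The real AssemblyProgram may keep separate tables for absolute and relative targets; a single table is used here for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified instruction: either raw machine bytes that are
// kept as-is, or a reference to an absolute or relative target address.
struct Instruction {
  enum Kind { kBytes, kAbs32, kRel32 } kind;
  std::vector<uint8_t> bytes;  // raw bytes, used when kind == kBytes
  uint32_t target = 0;         // referenced address, used otherwise
  uint32_t index = 0;          // filled in by NumberTargets()
};

// Hypothetical stand-in for an AssemblyProgram: the instructions plus the
// mapping of indices to actual addresses.
struct ToyProgram {
  std::vector<Instruction> instructions;
  std::vector<uint32_t> index_to_address;
};

// Numbers each distinct target address in order of first reference and
// records that number in every instruction that references it.
void NumberTargets(ToyProgram& program) {
  std::map<uint32_t, uint32_t> address_to_index;
  program.index_to_address.clear();
  for (Instruction& ins : program.instructions) {
    if (ins.kind == Instruction::kBytes) continue;
    const auto next = static_cast<uint32_t>(program.index_to_address.size());
    auto [it, inserted] = address_to_index.try_emplace(ins.target, next);
    if (inserted) program.index_to_address.push_back(ins.target);
    ins.index = it->second;
  }
  assert(address_to_index.size() == program.index_to_address.size());
}
```

Two references to the same target share one index, so the instruction stream no longer mentions the address itself; only the index-to-address table does.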

### Adjust

- This step takes the AssemblyProgram for the new file and reassigns
  the indices that map to actual addresses.  It is performed by
  adjustment_method.cc:Adjust().

- The goal is to match the indices from the old program to the new
  program as closely as possible.

- When matched correctly, machine instructions that jump to the same
  function in both the new and old binary will look the same to
  bsdiff, even if the function is located in a different part of the
  binary.
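
As a rough illustration of what adjustment aims for, the sketch below reduces each program to the sequence of indices its instructions reference, then renumbers the new program's indices by positional voting against the old sequence. This is a deliberately simple, hypothetical stand-in: `AdjustIndices` and the voting heuristic are assumptions for illustration, and adjustment_method.cc:Adjust() does its own matching, which this does not attempt to reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Returns remap, where remap[new_index] is the index the new program should
// use instead. Each new index takes the old index it lines up with most often
// when the two reference streams are compared position by position; new
// indices with no unclaimed match get fresh numbers past the old range.
std::vector<uint32_t> AdjustIndices(const std::vector<uint32_t>& old_refs,
                                    const std::vector<uint32_t>& new_refs,
                                    uint32_t old_count, uint32_t new_count) {
  // votes[new_index][old_index] = number of positions where they coincide.
  std::vector<std::map<uint32_t, int>> votes(new_count);
  const size_t n = std::min(old_refs.size(), new_refs.size());
  for (size_t i = 0; i < n; ++i) {
    assert(new_refs[i] < new_count && old_refs[i] < old_count);
    ++votes[new_refs[i]][old_refs[i]];
  }

  std::vector<uint32_t> remap(new_count);
  std::vector<bool> old_taken(old_count, false);
  uint32_t next_fresh = old_count;
  for (uint32_t ni = 0; ni < new_count; ++ni) {
    int best_votes = 0;
    uint32_t best_old = 0;
    for (const auto& [oi, count] : votes[ni]) {
      if (!old_taken[oi] && count > best_votes) {
        best_votes = count;
        best_old = oi;
      }
    }
    if (best_votes > 0) {
      remap[ni] = best_old;
      old_taken[best_old] = true;
    } else {
      remap[ni] = next_fresh++;
    }
  }
  return remap;
}
```

With `old_refs = {0, 1, 0, 2}` and `new_refs = {2, 0, 2, 1}`, the remap is `{1, 2, 0}`; applying it to `new_refs` reproduces `old_refs` exactly, which is the situation in which bsdiff sees identical bytes. The greedy, first-come assignment keeps the sketch short at the cost of being order dependent.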

### Encode

- This step takes an AssemblyProgram object and encodes both the
  instructions and the mapping of indices to addresses as byte
  vectors.  This format can be written to a file directly, and is also
  more appropriate for bsdiffing.  It is done by
  AssemblyProgram.Encode().

- encoded_program.h:EncodedProgram defines the binary format and a
  WriteTo method that writes to a file.
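
The exact byte format is defined by encoded_program.h:EncodedProgram. As a sketch of the general idea only, the code below writes the index-to-address table and the referenced indices into separate byte vectors, using variable-length integers and delta coding for the address table so that small, repetitive values dominate. `ToyEncoded`, `EncodeToyProgram` and `WriteVarint32` are hypothetical names, and the varint and delta choices are illustrative assumptions, not a description of Courgette's wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `value` as an unsigned LEB128-style varint: 7 payload bits per
// byte, with the high bit set on every byte except the last.
void WriteVarint32(std::vector<uint8_t>& out, uint32_t value) {
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
}

// Hypothetical encoded form: one byte vector per kind of data.
struct ToyEncoded {
  std::vector<uint8_t> address_stream;  // index -> address table
  std::vector<uint8_t> index_stream;    // index used by each reference
};

ToyEncoded EncodeToyProgram(const std::vector<uint32_t>& index_to_address,
                            const std::vector<uint32_t>& referenced_indices) {
  ToyEncoded encoded;
  // Delta coding: store each address as the difference from the previous
  // one. Unsigned subtraction wraps modulo 2^32, so a decoder that adds the
  // deltas back still recovers an unsorted table, only less compactly.
  uint32_t previous = 0;
  for (uint32_t address : index_to_address) {
    WriteVarint32(encoded.address_stream, address - previous);
    previous = address;
  }
  for (uint32_t index : referenced_indices) {
    assert(index < index_to_address.size());
    WriteVarint32(encoded.index_stream, index);
  }
  return encoded;
}
```

For example, the table `{0x1000, 0x1010}` becomes the address bytes `80 20 10`, and the references `{0, 1, 0}` become the index bytes `00 01 00`. Keeping the two kinds of data in separate vectors means each vector holds similar values, which tends to help both diffing and compression.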

### bsdiff

- The transformed old and new data are then diffed by
  simple\_delta.cc:GenerateSimpleDelta, Courgette's built-in
  implementation of bsdiff.
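
Why the preceding steps help can be seen from one of bsdiff's ingredients: for regions of the old and new data that roughly line up, bsdiff stores the bytewise difference, which is mostly zeros when only a few bytes changed, and then compresses it. The sketch below computes such a difference for an x86 `call rel32` (opcode `E8` followed by a 4-byte little-endian displacement). `DiffBlock` and `CountNonZero` are hypothetical helpers; real bsdiff also searches for matching regions with suffix sorting and emits control and extra data, none of which is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// bsdiff-style difference of two aligned, equal-length regions:
// diff[i] = new[i] - old[i] (mod 256). Unchanged bytes become zero.
std::vector<uint8_t> DiffBlock(const std::vector<uint8_t>& old_bytes,
                               const std::vector<uint8_t>& new_bytes) {
  assert(old_bytes.size() == new_bytes.size());
  std::vector<uint8_t> diff(new_bytes.size());
  for (size_t i = 0; i < new_bytes.size(); ++i) {
    diff[i] = static_cast<uint8_t>(new_bytes[i] - old_bytes[i]);
  }
  return diff;
}

// Counts nonzero bytes, i.e. the positions where the two regions differ.
size_t CountNonZero(const std::vector<uint8_t>& bytes) {
  size_t count = 0;
  for (uint8_t b : bytes) {
    if (b != 0) ++count;
  }
  return count;
}
```

If the callee moves by 0x20 bytes, the raw displacement changes from `10 00 00 00` to `30 00 00 00` and the difference contains a nonzero byte; in a real binary, every reference to a moved function contributes such bytes. Once the displacement has been replaced by an adjusted index, the same call encodes identically in both programs and its difference is all zeros.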

Patch Application
-----------------

![Patch Application](application.png)

- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application
  by calling ensemble\_apply.cc:ApplyEnsemblePatch.

- ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the
  patch's header, then calls the overloaded version of
  ensemble\_apply.cc:ApplyEnsemblePatch.

- The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
  object, which generates a set of patcher_x86_32.h:PatcherX86_32
  objects for the sections in the patch.

- The original file is disassembled and encoded via a call to
  EnsemblePatchApplication.TransformUp, which in turn calls
  patcher_x86_32.h:PatcherX86_32.Transform.

- The transformed file is then bspatched via
  EnsemblePatchApplication.SubpatchTransformedElements, which calls
  EnsemblePatchApplication.SubpatchStreamSets, which calls
  simple_delta.cc:ApplySimpleDelta, Courgette's built-in
  implementation of bspatch.

- Finally, EnsemblePatchApplication.TransformDown assembles the patched
  binary data, i.e., reverses the encoding and disassembly.
  This is done by calling PatcherX86_32.Reform, which in turn calls
  the global function encoded_program.cc:Assemble, which calls
  EncodedProgram.AssembleTo.
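
Assembly, the last step above, is the inverse of disassembly: machine bytes are copied through and every index is turned back into an address. The following is a minimal sketch under simplifying assumptions: `ToyOp`, `AssembleToy` and `AppendLittleEndian32` are hypothetical, only 32-bit absolute references and x86 relative references are modeled, and the relocation table regeneration that Courgette also performs is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified assembly-program instruction.
struct ToyOp {
  enum Kind { kBytes, kAbs32, kRel32 } kind;
  std::vector<uint8_t> bytes;  // machine bytes, copied as-is (kBytes)
  uint32_t index = 0;          // index into the address table (kAbs32/kRel32)
};

void AppendLittleEndian32(std::vector<uint8_t>& out, uint32_t value) {
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<uint8_t>(value >> shift));
  }
}

// Rebuilds raw bytes: machine bytes are copied through and each index is
// replaced by the address it stands for. `base` is the address at which the
// first output byte will be loaded.
std::vector<uint8_t> AssembleToy(const std::vector<ToyOp>& ops,
                                 const std::vector<uint32_t>& index_to_address,
                                 uint32_t base) {
  std::vector<uint8_t> out;
  for (const ToyOp& op : ops) {
    if (op.kind == ToyOp::kBytes) {
      out.insert(out.end(), op.bytes.begin(), op.bytes.end());
      continue;
    }
    assert(op.index < index_to_address.size());
    const uint32_t target = index_to_address[op.index];
    if (op.kind == ToyOp::kAbs32) {
      AppendLittleEndian32(out, target);
    } else {
      // For x86 call/jmp rel32 the 4-byte displacement is the last field, so
      // it is measured from the address just past that field.
      const uint32_t next = base + static_cast<uint32_t>(out.size()) + 4;
      AppendLittleEndian32(out, target - next);
    }
  }
  return out;
}
```

Assembling `E8`, a relative reference to index 0, and an absolute reference to index 0 at base address `0x1000` with the table `{0x1020}` yields `E8 1B 00 00 00 20 10 00 00`: the call displacement is measured from `0x1005`, the address right after the displacement field.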

Glossary
--------

**Adjust**: Reassign address indices in the new program to more
  closely match those from the old.

**Assembly program**: The output of _disassembly_.  Contains a list of
  _Courgette instructions_ and an index of branch target addresses.

**Assemble**: Convert an _assembly program_ back into an object file
  by evaluating the _Courgette instructions_ and leaving the machine
  instructions in place.

**Courgette instruction**: Replaces machine instructions in the
  program.  Courgette instructions replace branches with an index into
  the table of target addresses and replace part of the relocation
  table.

**Disassembler**: Takes a binary file and produces an _assembly
  program_.

**Encode**: Convert an _assembly program_ into an _encoded program_ by
  serializing its data structures into byte vectors more appropriate
  for storage in a file.

**Encoded Program**: The output of encoding.

**Ensemble**: A Courgette-style patch containing sections for the list
  of branch addresses and the encoded program.  It supports patching
  multiple object files at once.

**Opcode**: The number corresponding to either a machine or _Courgette
  instruction_.