Home | History | Annotate | Download | only in courgette
      1 <h1>Courgette Internals</h1>
      2 
      3 <h2>Patch Generation</h2>
      4 
      5 <p><img src="generation.png" alt="Patch Generation" title="" /></p>
      6 
      7 <ul>
      8 <li><p>courgette_tool.cc:GenerateEnsemblePatch kicks off the patch
      9 generation by calling ensemble_create.cc:GenerateEnsemblePatch</p></li>
     10 <li><p>The files are read in by in courgette:SourceStream objects</p></li>
     11 <li><p>ensemble_create.cc:GenerateEnsemblePatch uses FindGenerators, which
     12 uses MakeGenerator to create
     13 patch_generator_x86_32.h:PatchGeneratorX86_32 classes.</p></li>
     14 <li><p>PatchGeneratorX86_32's Transform method transforms the input file
     15 using Courgette's core techniques that make the bsdiff delta
     16 smaller.  The steps it takes are the following:</p>
     17 
     18 <ul>
     19 <li><p><em>disassemble</em> the old and new binaries into AssemblyProgram
     20 objects,</p></li>
     21 <li><p><em>adjust</em> the new AssemblyProgram object, and</p></li>
     22 <li><p><em>encode</em> the AssemblyProgram object back into raw bytes.</p></li>
     23 </ul></li>
     24 </ul>
     25 
     26 <h3>Disassemble</h3>
     27 
     28 <ul>
     29 <li><p>The input is a pointer to a buffer containing the raw bytes of the
     30 input file.</p></li>
     31 <li><p>Disassembly converts certain machine instructions that reference
     32 addresses to Courgette instructions.  It is not actually
     33 disassembly, but this is the term the code-base uses.  Specifically,
     34 it detects instructions that use absolute addresses given by the
     35 binary file's relocation table, and relative addresses used in
     36 relative branches.</p></li>
     37 <li><p>Done by disassemble:ParseDetectedExecutable, which selects the
     38 appropriate Disassembler subclass by looking at the binary file's
     39 headers.</p>
     40 
     41 <ul>
     42 <li><p>disassembler_win32_x86.h defines the PE/COFF x86 disassembler</p></li>
     43 <li><p>disassembler_elf_32_x86.h defines the ELF 32-bit x86 disassembler</p></li>
     44 <li><p>disassembler_elf_32_arm.h defines the ELF 32-bit arm disassembler</p></li>
     45 </ul></li>
     46 <li><p>The Disassembler replaces the relocation table with a Courgette
     47 instruction that can regenerate the relocation table.</p></li>
     48 <li><p>The Disassembler builds a list of addresses referenced by the
     49 machine code, numbering each one.</p></li>
     50 <li><p>The Disassembler replaces and address used in machine instructions
     51 with its index number.</p></li>
     52 <li><p>The output is an assembly_program.h:AssemblyProgram class, which
     53 contains a list of instructions, machine or Courgette, and a mapping
     54 of indices to actual addresses.</p></li>
     55 </ul>
     56 
     57 <h3>Adjust</h3>
     58 
     59 <ul>
     60 <li><p>This step takes the AssemblyProgram for the old file and reassigns
     61 the indices that map to actual addresses.  It is performed by
     62 adjustment_method.cc:Adjust().</p></li>
     63 <li><p>The goal is the match the indices from the old program to the new
     64 program as closely as possible.</p></li>
     65 <li><p>When matched correctly, machine instructions that jump to the
     66 function in both the new and old binary will look the same to
     67 bsdiff, even the function is located in a different part of the
     68 binary.</p></li>
     69 </ul>
     70 
     71 <h3>Encode</h3>
     72 
     73 <ul>
     74 <li><p>This step takes an AssemblyProgram object and encodes both the
     75 instructions and the mapping of indices to addresses as byte
     76 vectors.  This format can be written to a file directly, and is also
     77 more appropriate for bsdiffing.  It is done by
     78 AssemblyProgram.Encode().</p></li>
     79 <li><p>encoded_program.h:EncodedProgram defines the binary format and a
     80 WriteTo method that writes to a file.</p></li>
     81 </ul>
     82 
     83 <h3>bsdiff</h3>
     84 
     85 <ul>
     86 <li>simple_delta.c:GenerateSimpleDelta</li>
     87 </ul>
     88 
     89 <h2>Patch Application</h2>
     90 
     91 <p><img src="application.png" alt="Patch Application" title="" /></p>
     92 
     93 <ul>
     94 <li><p>courgette_tool.cc:ApplyEnsemblePatch kicks off the patch generation
     95 by calling ensemble_apply.cc:ApplyEnsemblePatch</p></li>
     96 <li><p>ensemble_create.cc:ApplyEnsemblePatch, reads and verifies the
     97 patch's header, then calls the overloaded version of
     98 ensemble_create.cc:ApplyEnsemblePatch.</p></li>
     99 <li><p>The patch is read into an ensemble<em>apply.cc:EnsemblePatchApplication
    100 object, which generates a set of patcher</em>x86<em>32.h:PatcherX86</em>32
    101 objects for the sections in the patch.</p></li>
    102 <li><p>The original file is disassembled and encoded via a call
    103 EnsemblePatchApplication.TransformUp, which in turn call
    104 patcher<em>x86</em>32.h:PatcherX86_32.Transform.</p></li>
    105 <li><p>The transformed file is then bspatched via
    106 EnsemblePatchApplication.SubpatchTransformedElements, which calls
    107 EnsemblePatchApplication.SubpatchStreamSets, which calls
    108 simple_delta.cc:ApplySimpleDelta, Courgette's built-in
    109 implementation of bspatch.</p></li>
    110 <li><p>Finally, EnsemblePatchApplication.TransformDown assembles, i.e.,
    111 reverses the encoding and disassembly, on the patched binary data.
    112 This is done by calling PatcherX86<em>32.Reform, which in turn calls
    113 the global function encoded</em>program.cc:Assemble, which calls
    114 EncodedProgram.AssembleTo.</p></li>
    115 </ul>
    116 
    117 <h2>Glossary</h2>
    118 
    119 <p><strong>Adjust</strong>: Reassign address indices in the new program to match more
    120   closely those from the old.</p>
    121 
    122 <p><strong>Assembly program</strong>: The output of <em>disassembly</em>.  Contains a list of
    123   <em>Courgette instructions</em> and an index of branch target addresses.</p>
    124 
    125 <p><strong>Assemble</strong>: Convert an <em>assembly program</em> back into an object file
    126   by evaluating the <em>Courgette instructions</em> and leaving the machine
    127   instructions in place.</p>
    128 
    129 <p><strong>Courgette instruction</strong>: Replaces machine instructions in the
    130   program.  Courgette instructions replace branches with an index to
    131   the target addresses and replace part of the relocation table.</p>
    132 
    133 <p><strong>Disassembler</strong>: Takes a binary file and produces an <em>assembly
    134   program</em>.</p>
    135 
    136 <p><strong>Encode</strong>: Convert an <em>assembly program</em> into an <em>encoded program</em> by
    137   serializing its data structures into byte vectors more appropriate
    138   for storage in a file.</p>
    139 
    140 <p><strong>Encoded Program</strong>: The output of encoding.</p>
    141 
    142 <p><strong>Ensemble</strong>: A Courgette-style patch containing sections for the list
    143   of branch addresses, the encoded program.  It supports patching
    144   multiple object files at once.</p>
    145 
    146 <p><strong>Opcode</strong>: The number corresponding to either a machine or <em>Courgette
    147   instruction</em>.</p>
    148