1 <h1>Courgette Internals</h1> 2 3 <h2>Patch Generation</h2> 4 5 <p><img src="generation.png" alt="Patch Generation" title="" /></p> 6 7 <ul> 8 <li><p>courgette_tool.cc:GenerateEnsemblePatch kicks off the patch 9 generation by calling ensemble_create.cc:GenerateEnsemblePatch</p></li> 10 <li><p>The files are read in by in courgette:SourceStream objects</p></li> 11 <li><p>ensemble_create.cc:GenerateEnsemblePatch uses FindGenerators, which 12 uses MakeGenerator to create 13 patch_generator_x86_32.h:PatchGeneratorX86_32 classes.</p></li> 14 <li><p>PatchGeneratorX86_32's Transform method transforms the input file 15 using Courgette's core techniques that make the bsdiff delta 16 smaller. The steps it takes are the following:</p> 17 18 <ul> 19 <li><p><em>disassemble</em> the old and new binaries into AssemblyProgram 20 objects,</p></li> 21 <li><p><em>adjust</em> the new AssemblyProgram object, and</p></li> 22 <li><p><em>encode</em> the AssemblyProgram object back into raw bytes.</p></li> 23 </ul></li> 24 </ul> 25 26 <h3>Disassemble</h3> 27 28 <ul> 29 <li><p>The input is a pointer to a buffer containing the raw bytes of the 30 input file.</p></li> 31 <li><p>Disassembly converts certain machine instructions that reference 32 addresses to Courgette instructions. It is not actually 33 disassembly, but this is the term the code-base uses. Specifically, 34 it detects instructions that use absolute addresses given by the 35 binary file's relocation table, and relative addresses used in 36 relative branches.</p></li> 37 <li><p>Done by disassemble:ParseDetectedExecutable, which selects the 38 appropriate Disassembler subclass by looking at the binary file's 39 headers.</p> 40 41 <ul> 42 <li><p>disassembler_win32_x86.h defines the PE/COFF x86 disassembler</p></li> 43 <li><p>disassembler_elf_32_x86.h defines the ELF 32-bit x86 disassembler</p></li> 44 <li><p>disassembler_elf_32_arm.h defines the ELF 32-bit arm disassembler</p></li> 45 </ul></li> 46 <li><p>The Disassembler replaces the relocation table with a Courgette 47 instruction that can regenerate the relocation table.</p></li> 48 <li><p>The Disassembler builds a list of addresses referenced by the 49 machine code, numbering each one.</p></li> 50 <li><p>The Disassembler replaces and address used in machine instructions 51 with its index number.</p></li> 52 <li><p>The output is an assembly_program.h:AssemblyProgram class, which 53 contains a list of instructions, machine or Courgette, and a mapping 54 of indices to actual addresses.</p></li> 55 </ul> 56 57 <h3>Adjust</h3> 58 59 <ul> 60 <li><p>This step takes the AssemblyProgram for the old file and reassigns 61 the indices that map to actual addresses. It is performed by 62 adjustment_method.cc:Adjust().</p></li> 63 <li><p>The goal is the match the indices from the old program to the new 64 program as closely as possible.</p></li> 65 <li><p>When matched correctly, machine instructions that jump to the 66 function in both the new and old binary will look the same to 67 bsdiff, even the function is located in a different part of the 68 binary.</p></li> 69 </ul> 70 71 <h3>Encode</h3> 72 73 <ul> 74 <li><p>This step takes an AssemblyProgram object and encodes both the 75 instructions and the mapping of indices to addresses as byte 76 vectors. This format can be written to a file directly, and is also 77 more appropriate for bsdiffing. It is done by 78 AssemblyProgram.Encode().</p></li> 79 <li><p>encoded_program.h:EncodedProgram defines the binary format and a 80 WriteTo method that writes to a file.</p></li> 81 </ul> 82 83 <h3>bsdiff</h3> 84 85 <ul> 86 <li>simple_delta.c:GenerateSimpleDelta</li> 87 </ul> 88 89 <h2>Patch Application</h2> 90 91 <p><img src="application.png" alt="Patch Application" title="" /></p> 92 93 <ul> 94 <li><p>courgette_tool.cc:ApplyEnsemblePatch kicks off the patch generation 95 by calling ensemble_apply.cc:ApplyEnsemblePatch</p></li> 96 <li><p>ensemble_create.cc:ApplyEnsemblePatch, reads and verifies the 97 patch's header, then calls the overloaded version of 98 ensemble_create.cc:ApplyEnsemblePatch.</p></li> 99 <li><p>The patch is read into an ensemble<em>apply.cc:EnsemblePatchApplication 100 object, which generates a set of patcher</em>x86<em>32.h:PatcherX86</em>32 101 objects for the sections in the patch.</p></li> 102 <li><p>The original file is disassembled and encoded via a call 103 EnsemblePatchApplication.TransformUp, which in turn call 104 patcher<em>x86</em>32.h:PatcherX86_32.Transform.</p></li> 105 <li><p>The transformed file is then bspatched via 106 EnsemblePatchApplication.SubpatchTransformedElements, which calls 107 EnsemblePatchApplication.SubpatchStreamSets, which calls 108 simple_delta.cc:ApplySimpleDelta, Courgette's built-in 109 implementation of bspatch.</p></li> 110 <li><p>Finally, EnsemblePatchApplication.TransformDown assembles, i.e., 111 reverses the encoding and disassembly, on the patched binary data. 112 This is done by calling PatcherX86<em>32.Reform, which in turn calls 113 the global function encoded</em>program.cc:Assemble, which calls 114 EncodedProgram.AssembleTo.</p></li> 115 </ul> 116 117 <h2>Glossary</h2> 118 119 <p><strong>Adjust</strong>: Reassign address indices in the new program to match more 120 closely those from the old.</p> 121 122 <p><strong>Assembly program</strong>: The output of <em>disassembly</em>. Contains a list of 123 <em>Courgette instructions</em> and an index of branch target addresses.</p> 124 125 <p><strong>Assemble</strong>: Convert an <em>assembly program</em> back into an object file 126 by evaluating the <em>Courgette instructions</em> and leaving the machine 127 instructions in place.</p> 128 129 <p><strong>Courgette instruction</strong>: Replaces machine instructions in the 130 program. Courgette instructions replace branches with an index to 131 the target addresses and replace part of the relocation table.</p> 132 133 <p><strong>Disassembler</strong>: Takes a binary file and produces an <em>assembly 134 program</em>.</p> 135 136 <p><strong>Encode</strong>: Convert an <em>assembly program</em> into an <em>encoded program</em> by 137 serializing its data structures into byte vectors more appropriate 138 for storage in a file.</p> 139 140 <p><strong>Encoded Program</strong>: The output of encoding.</p> 141 142 <p><strong>Ensemble</strong>: A Courgette-style patch containing sections for the list 143 of branch addresses, the encoded program. It supports patching 144 multiple object files at once.</p> 145 146 <p><strong>Opcode</strong>: The number corresponding to either a machine or <em>Courgette 147 instruction</em>.</p> 148