Courgette Internals
===================

Patch Generation
----------------

![Patch Generation](generation.png)

- courgette\_tool.cc:GenerateEnsemblePatch kicks off the patch
  generation by calling ensemble\_create.cc:GenerateEnsemblePatch.

- The files are read in by courgette:SourceStream objects.

- ensemble\_create.cc:GenerateEnsemblePatch uses FindGenerators, which
  uses MakeGenerator to create
  patch\_generator\_x86\_32.h:PatchGeneratorX86\_32 classes.

- PatchGeneratorX86\_32's Transform method transforms the input file
  using Courgette's core techniques that make the bsdiff delta
  smaller. The steps it takes are the following:

  - _disassemble_ the old and new binaries into AssemblyProgram
    objects,

  - _adjust_ the new AssemblyProgram object, and

  - _encode_ the AssemblyProgram object back into raw bytes.

### Disassemble

- The input is a pointer to a buffer containing the raw bytes of the
  input file.

- Disassembly converts certain machine instructions that reference
  addresses to Courgette instructions. It is not actually
  disassembly, but this is the term the code-base uses. Specifically,
  it detects instructions that use absolute addresses given by the
  binary file's relocation table, and relative addresses used in
  relative branches.

- Disassembly is done by disassemble:ParseDetectedExecutable, which
  selects the appropriate Disassembler subclass by looking at the
  binary file's headers.

  - disassembler\_win32\_x86.h defines the PE/COFF x86 disassembler

  - disassembler\_elf\_32\_x86.h defines the ELF 32-bit x86 disassembler

  - disassembler\_elf\_32\_arm.h defines the ELF 32-bit ARM disassembler

- The Disassembler replaces the relocation table with a Courgette
  instruction that can regenerate the relocation table.

- The Disassembler builds a list of addresses referenced by the
  machine code, numbering each one.

- The Disassembler replaces each address used in machine instructions
  with its index number.

- The output is an assembly\_program.h:AssemblyProgram class, which
  contains a list of instructions, machine or Courgette, and a mapping
  of indices to actual addresses.

### Adjust

- This step takes the AssemblyProgram for the new file and reassigns
  the indices that map to actual addresses. It is performed by
  adjustment_method.cc:Adjust().

- The goal is to match the indices from the old program to the new
  program as closely as possible.

- When matched correctly, machine instructions that jump to the same
  function in both the new and old binary will look the same to
  bsdiff, even if the function is located in a different part of the
  binary.

### Encode

- This step takes an AssemblyProgram object and encodes both the
  instructions and the mapping of indices to addresses as byte
  vectors. This format can be written to a file directly, and is also
  more appropriate for bsdiffing. It is done by
  AssemblyProgram.Encode().

- encoded_program.h:EncodedProgram defines the binary format and a
  WriteTo method that writes to a file.

### bsdiff

- simple_delta.cc:GenerateSimpleDelta

Patch Application
-----------------

![Patch Application](application.png)

- courgette\_tool.cc:ApplyEnsemblePatch kicks off the patch application
  by calling ensemble\_apply.cc:ApplyEnsemblePatch.

- ensemble\_apply.cc:ApplyEnsemblePatch reads and verifies the
  patch's header, then calls the overloaded version of
  ensemble\_apply.cc:ApplyEnsemblePatch.

- The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
  object, which generates a set of patcher_x86_32.h:PatcherX86_32
  objects for the sections in the patch.

- The original file is disassembled and encoded via a call to
  EnsemblePatchApplication.TransformUp, which in turn calls
  patcher_x86_32.h:PatcherX86_32.Transform.

- The transformed file is then bspatched via
  EnsemblePatchApplication.SubpatchTransformedElements, which calls
  EnsemblePatchApplication.SubpatchStreamSets, which calls
  simple_delta.cc:ApplySimpleDelta, Courgette's built-in
  implementation of bspatch.

- Finally, EnsemblePatchApplication.TransformDown assembles the
  patched binary data, i.e., reverses the encoding and disassembly.
  This is done by calling PatcherX86_32.Reform, which in turn calls
  the global function encoded_program.cc:Assemble, which calls
  EncodedProgram.AssembleTo.


Glossary
--------

**Adjust**: Reassign address indices in the new program to match more
  closely those from the old.

**Assembly program**: The output of _disassembly_. Contains a list of
  _Courgette instructions_ and an index of branch target addresses.

**Assemble**: Convert an _assembly program_ back into an object file
  by evaluating the _Courgette instructions_ and leaving the machine
  instructions in place.

**Courgette instruction**: Replaces machine instructions in the
  program. Courgette instructions replace branches with an index to
  the target addresses and replace part of the relocation table.

**Disassembler**: Takes a binary file and produces an _assembly
  program_.

**Encode**: Convert an _assembly program_ into an _encoded program_ by
  serializing its data structures into byte vectors more appropriate
  for storage in a file.

**Encoded Program**: The output of encoding.

**Ensemble**: A Courgette-style patch containing sections for the list
  of branch addresses and the encoded program. It supports patching
  multiple object files at once.

**Opcode**: The number corresponding to either a machine or _Courgette
  instruction_.

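
The address-to-index substitution described under Disassemble can be
illustrated with a small Python sketch. Everything in it is a toy
assumption made for illustration: the `(opcode, target)` tuples, the
`disassemble` function, and the sample addresses are hypothetical and
are not Courgette's actual data model or API. It only shows why two
binaries whose branch targets have moved can still yield identical
instruction streams once addresses are replaced by indices.

```python
def disassemble(instructions):
    """Replace each address operand with an index into an address table.

    `instructions` is a list of (opcode, target) tuples, where `target`
    is an address or None. This is a toy stand-in for machine code,
    not Courgette's real representation.
    """
    addresses = []  # index -> address, in order of first use
    index_of = {}   # address -> index
    program = []
    for opcode, target in instructions:
        if target is None:
            # Instructions without an address operand are kept as-is.
            program.append((opcode, None))
            continue
        if target not in index_of:
            index_of[target] = len(addresses)
            addresses.append(target)
        program.append((opcode, index_of[target]))
    return program, addresses


# Hypothetical "old" binary.
old = [("call", 0x1000), ("nop", None), ("jmp", 0x2000), ("call", 0x1000)]
# Hypothetical "new" binary: same code, but both targets moved by 0x40.
new = [("call", 0x1040), ("nop", None), ("jmp", 0x2040), ("call", 0x1040)]

old_program, old_addresses = disassemble(old)
new_program, new_addresses = disassemble(new)

# The indexed instruction streams match; only the address tables differ.
print(old_program == new_program)
print(old_addresses, new_addresses)
```

In this toy case the indices line up on their own because both inputs
reference their targets in the same order. In general they do not,
which is the situation the Adjust step exists to handle.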