The design of crazy_linker:
===========================

Introduction:
-------------

A system linker (e.g. ld.so on Linux, or /system/bin/linker on Android) is a
particularly sophisticated piece of code because it is used to load and start
_executables_ on the system. This requires dealing with low-level details
such as:

  - The way the kernel loads and initializes binaries into a new process.

  - The way it passes initialization data (e.g. command-line arguments) to
    the process being launched.

  - Setting up the C runtime library, thread-local storage, and other state
    properly before calling main().

  - Operating very carefully, because it is also used to load set-uid
    programs.

  - Supporting a flurry of exotic flags and environment variables that
    affect runtime behaviour in "interesting" but mostly unpredictable ways
    (see the manpages for dlopen, dlsym and ld.so for details).

Add to this that most of this must be done before the C library is loaded or
initialized. No wonder this code is really complex.

By contrast, crazy_linker is a static library whose only purpose is to load
ELF shared libraries inside an _existing_ executable process. This makes it
considerably simpler:

  - The runtime environment (C library, libstdc++) is available and properly
    initialized.

  - No need to care about kernel interfaces. Everything uses mmap() and simple
    file accesses.

  - The API is simple and straightforward (no hidden behaviour changes due to
    environment variables).

This document explains how the crazy_linker works. A good understanding of the
ELF file format is recommended, though not necessary.


I. ELF Loading Basics:
----------------------

When it comes to loading shared libraries, an ELF file consists mainly of the
following parts:

  - A fixed-size header that identifies the file as an ELF file and gives
    offsets/sizes to other tables.

  - A table (called the "program header table"), containing entries describing
    'segments' of interest in the ELF file.

  - A table (called the "dynamic table"), containing entries describing
    properties of the ELF library. The most interesting entries list the
    libraries the current one depends on.

  - A table describing the symbols (function or global names) that the library
    references or exports.

  - One or more tables containing 'relocations'. Because libraries can be loaded
    at any page-aligned address in memory, numerical pointers they contain must
    be adjusted after load. That's what the relocation entries do. They can
    also reference symbols to be found in other libraries.
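
As a rough illustration of what the simplest relocation entries do, here is a
minimal sketch that adds the load address to pointer-sized slots of an image
held in memory. RelativeReloc and ApplyRelativeRelocs() are hypothetical names
made up for this example, not crazy_linker code; real ELF relocation entries
also carry a type and, often, a symbol index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified relocation record: |offset| is the position of a
// pointer-sized slot, relative to the start of the loaded image.
struct RelativeReloc {
  size_t offset;
};

// Adds |load_bias| (the address the image was actually loaded at) to every
// slot listed in |relocs|. |image| points to a writable copy of the library.
void ApplyRelativeRelocs(uint8_t* image,
                         size_t image_size,
                         uintptr_t load_bias,
                         const RelativeReloc* relocs,
                         size_t count) {
  for (size_t i = 0; i < count; ++i) {
    // Each slot must lie entirely inside the image.
    assert(relocs[i].offset + sizeof(uintptr_t) <= image_size);
    uint8_t* slot = image + relocs[i].offset;
    uintptr_t value;
    std::memcpy(&value, slot, sizeof(value));  // Safe for unaligned slots.
    value += load_bias;
    std::memcpy(slot, &value, sizeof(value));
  }
}
```

For an image loaded at 0x4000, a slot that held 0x10 at link time would hold
0x4010 afterwards.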

The process of loading a given ELF shared library can be decomposed into four
steps:

  1) Map loadable segments into memory.

     This step parses the program header table to identify 'loadable' segments,
     reserves the corresponding address space, then maps them directly into
     memory with mmap().

        Related: src/crazy_linker_elf_loader.cpp

  2) Load library dependencies.

     This step parses the dynamic table to identify all the other shared
     libraries the current one depends on, then _recursively_ loads them.

        Related: src/crazy_linker_library_list.cpp
                 (crazy::LibraryList::LoadLibrary())

  3) Apply all relocations.

     This step adjusts all pointers within the library for the actual load
     address. Relocations can also reference symbols that appear in other
     libraries loaded in step 2).

        Related: src/crazy_linker_elf_relocator.cpp

  4) Run constructors.

     Libraries include a list of functions to be run at load time, typically
     to perform static C++ initialization.

        Related: src/crazy_linker_shared_library.cpp
                 (SharedLibrary::RunConstructors())
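
Step 1 has to know how much address space to reserve before it maps anything.
The following is a minimal sketch of that computation, assuming 4096-byte pages
and a simplified, hypothetical Segment record standing in for the real program
header entries. It is not taken from src/crazy_linker_elf_loader.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages. Real code should query the system page size.
constexpr uint64_t kPageSize = 4096;

// Value of PT_LOAD in the ELF specification.
constexpr uint32_t kPtLoad = 1;

// Hypothetical, simplified view of one program header entry.
struct Segment {
  uint32_t type;      // Segment type, e.g. kPtLoad.
  uint64_t vaddr;     // Virtual address, relative to the load address.
  uint64_t mem_size;  // Size of the segment once in memory.
};

struct LoadRange {
  uint64_t min_vaddr;  // Page-aligned start of the lowest loadable segment.
  uint64_t size;       // Page-aligned span covering all loadable segments.
};

// Scans |segments| and returns the range that must be reserved to hold every
// loadable segment. Returns {0, 0} if there is no loadable segment.
LoadRange ComputeLoadRange(const Segment* segments, size_t count) {
  assert(segments != nullptr || count == 0);
  uint64_t min_vaddr = UINT64_MAX;
  uint64_t max_vaddr = 0;
  bool found = false;
  for (size_t i = 0; i < count; ++i) {
    const Segment& seg = segments[i];
    if (seg.type != kPtLoad)
      continue;  // Only loadable segments occupy address space.
    found = true;
    if (seg.vaddr < min_vaddr)
      min_vaddr = seg.vaddr;
    if (seg.vaddr + seg.mem_size > max_vaddr)
      max_vaddr = seg.vaddr + seg.mem_size;
  }
  if (!found)
    return LoadRange{0, 0};
  min_vaddr &= ~(kPageSize - 1);                               // Round down.
  max_vaddr = (max_vaddr + kPageSize - 1) & ~(kPageSize - 1);  // Round up.
  return LoadRange{min_vaddr, max_vaddr - min_vaddr};
}
```

The returned size would then be reserved in one go, and each loadable segment
mapped at its own offset inside that reservation.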

Unloading a library is similar, but in reverse order:

  1) Run destructors.
  2) Unload dependencies recursively.
  3) Unmap loadable segments.
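
The constructor and destructor steps can be pictured as walking two tables of
function pointers. The sketch below assumes the usual ELF convention that
load-time functions run in table order and unload-time functions run in
reverse table order. RunConstructorTable() and RunDestructorTable() are names
made up for this example and are not the crazy_linker API.

```cpp
#include <cassert>
#include <cstddef>

// Type of the parameterless functions stored in the tables.
using InitFunction = void (*)();

// Calls every function of |table| from first to last.
void RunConstructorTable(const InitFunction* table, size_t count) {
  assert(table != nullptr || count == 0);
  for (size_t i = 0; i < count; ++i)
    table[i]();
}

// Calls every function of |table| from last to first.
void RunDestructorTable(const InitFunction* table, size_t count) {
  assert(table != nullptr || count == 0);
  for (size_t i = count; i > 0; --i)
    table[i - 1]();
}
```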


II. Managing the list of libraries:
-----------------------------------

It is crucial to avoid loading the same library twice in the same process,
otherwise serious undefined behaviour may occur.

This implies that, inside an Android application process, all system libraries
should be loaded by the system linker (because otherwise, the Dalvik-based
framework might load the same library on demand, at an unpredictable time).

To handle this, the crazy_linker uses a custom class (crazy::LibraryList) where
each entry (crazy::LibraryView) is reference-counted and references either:

  - An application shared library, loaded by the crazy_linker itself.
  - A system shared library, loaded through the system dlopen().

Libraries loaded by the crazy_linker are modelled by a crazy::SharedLibrary
object. The source code comments often refer to these objects as
"crazy libraries", as opposed to "system libraries".
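
The reference-counting idea can be sketched as follows. LibraryListSketch and
ViewSketch are hypothetical stand-ins written for this document, not the real
crazy::LibraryList and crazy::LibraryView classes, and std::map is used only
to keep the example short (the real implementation avoids the STL).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for one list entry.
struct ViewSketch {
  int ref_count = 0;
  bool is_system = false;  // True if the system dlopen() would own it.
};

class LibraryListSketch {
 public:
  // Registers one more user of |name| and returns the new reference count.
  // The entry is created only on the first request, so a library is never
  // recorded twice.
  int Load(const std::string& name, bool is_system) {
    ViewSketch& view = views_[name];
    if (view.ref_count == 0)
      view.is_system = is_system;
    return ++view.ref_count;
  }

  // Drops one reference to |name| and returns the remaining count. The entry
  // disappears once nobody uses it anymore.
  int Unload(const std::string& name) {
    auto it = views_.find(name);
    assert(it != views_.end());
    int remaining = --it->second.ref_count;
    if (remaining == 0)
      views_.erase(it);
    return remaining;
  }

  size_t size() const { return views_.size(); }

 private:
  std::map<std::string, ViewSketch> views_;
};
```

Requesting 'libfoo.so' twice leaves a single entry with a count of 2, and the
entry only goes away after two matching Unload() calls.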

As an example, here's a diagram that shows the list after loading a library
'libfoo.so' that depends on the system libraries 'libc.so', 'libm.so' and
'libOpenSLES.so'.

    +-------------+
    | LibraryList |
    +-------------+
           |
           |    +-------------+
           +----| LibraryView | ----> libc.so
           |    +-------------+
           |
           |    +-------------+
           +----| LibraryView | ----> libm.so
           |    +-------------+
           |
           |    +-------------+
           +----| LibraryView | ----> libOpenSLES.so
           |    +-------------+
           |
           |    +-------------+      +-------------+
           +----| LibraryView |----->|SharedLibrary| ---> libfoo.so
           |    +-------------+      +-------------+
           |
          ___
           _

System libraries are identified by name. Only the official NDK system
libraries are listed. Using crazy_linker to load non-NDK system libraries is
unlikely to work correctly, so don't do it.


III. Wrapping of linker symbols within crazy ones:
--------------------------------------------------

Libraries loaded by the crazy_linker are not visible to the system linker.

This means that directly calling the system dlopen() or dlsym() from library
code loaded by the crazy_linker will not work properly.

To work around this, crazy_linker redirects all linker symbols to its own
wrapper implementation. This redirection happens transparently.

  Related: src/crazy_linker_wrappers.cpp

This also includes a few "hidden" dynamic linker symbols which are used for
stack unwinding. This ensures that C++ exception propagation works.
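
One way to picture the redirection: when a symbol name is resolved, it is
first checked against a small table of linker symbols, and a match yields the
wrapper instead of the system function. The sketch below only illustrates
that lookup. FindWrapper() and the Sketch* functions are hypothetical, their
bodies are empty placeholders, and the real table in
src/crazy_linker_wrappers.cpp may be organized differently.

```cpp
#include <cassert>
#include <cstring>

// Placeholder wrappers. Real ones would consult the crazy library list
// before falling back to the system linker.
void* SketchDlopen(const char* /*path*/, int /*flags*/) { return nullptr; }
void* SketchDlsym(void* /*handle*/, const char* /*name*/) { return nullptr; }
int SketchDlclose(void* /*handle*/) { return 0; }

struct WrappedSymbol {
  const char* name;
  void* address;
};

// Returns the wrapper address for |symbol_name|, or nullptr if the name is
// not a linker symbol and should be resolved the normal way.
void* FindWrapper(const char* symbol_name) {
  static const WrappedSymbol kWrappers[] = {
      {"dlopen", reinterpret_cast<void*>(&SketchDlopen)},
      {"dlsym", reinterpret_cast<void*>(&SketchDlsym)},
      {"dlclose", reinterpret_cast<void*>(&SketchDlclose)},
  };
  assert(symbol_name != nullptr);
  for (const WrappedSymbol& wrapped : kWrappers) {
    if (std::strcmp(wrapped.name, symbol_name) == 0)
      return wrapped.address;
  }
  return nullptr;
}
```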


IV. GDB support:
----------------

The crazy_linker contains support code to ensure that libraries loaded with it
are visible through GDB at runtime. For more details, see the extensive comments
in src/crazy_linker_rdebug.h.


V. Other Implementation details:
--------------------------------

The crazy_linker is written in C++, but its API is completely C-based.

The implementation doesn't require any C++ STL feature (except for new
and delete).

Very little of the code is actually Android-specific. The target system's
bitness is abstracted through a C++ traits class (see src/elf_traits.h).
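
To give an idea of what such a traits class can look like, here is a minimal
sketch. ElfTraitsSketch, NativeTraits and PageStart() are made up for this
example; the actual contents of src/elf_traits.h are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// One specialization per bitness; each provides the matching integer types.
template <int Bits>
struct ElfTraitsSketch;

template <>
struct ElfTraitsSketch<32> {
  using Addr = uint32_t;
  using Off = uint32_t;
  static constexpr int kElfClass = 1;  // ELFCLASS32 in the ELF specification.
};

template <>
struct ElfTraitsSketch<64> {
  using Addr = uint64_t;
  using Off = uint64_t;
  static constexpr int kElfClass = 2;  // ELFCLASS64 in the ELF specification.
};

// Traits matching the pointer size of the current build.
using NativeTraits = ElfTraitsSketch<static_cast<int>(sizeof(void*) * 8)>;

// Example of code written once against the traits: rounds |address| down to
// the start of its page. |page_size| must be a power of two.
template <typename Traits>
typename Traits::Addr PageStart(typename Traits::Addr address,
                                typename Traits::Addr page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  return address & ~static_cast<typename Traits::Addr>(page_size - 1);
}
```

Code that only refers to Traits::Addr and friends compiles unchanged for both
32-bit and 64-bit targets.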

It was originally written for Chrome, so it follows the Chromium coding style,
which can be enforced by using the 'clang-format' tool with:

  cd /path/to/crazy_linker/
  find . -name "*.h" -o -name "*.cpp" | xargs clang-format -style=Chromium -i