=========================
Driver Design & Internals
=========================

.. contents::
   :local:

Introduction
============

This document describes the Clang driver. The purpose of this document
is to describe both the motivation and design goals for the driver, as
well as details of the internal implementation.

Features and Goals
==================

The Clang driver is intended to be a production-quality compiler driver
providing access to the Clang compiler and tools, with a command line
interface which is compatible with the gcc driver.

Although the driver is part of and driven by the Clang project, it is
logically a separate tool which shares many of the same goals as Clang:

.. contents:: Features
   :local:

GCC Compatibility
-----------------

The number one goal of the driver is to ease the adoption of Clang by
allowing users to drop Clang into a build system which was designed to
call GCC. Although this makes the driver much more complicated than
might otherwise be necessary, we decided that being very compatible with
the gcc command line interface was worth it in order to allow users to
quickly test clang on their projects.

Flexible
--------

The driver was designed to be flexible and easily accommodate new uses
as we grow the clang and LLVM infrastructure. As one example, the driver
can easily support the introduction of tools which have an integrated
assembler, something we hope to add to LLVM in the future.

Similarly, most of the driver functionality is kept in a library which
can be used to build other tools which want to implement or accept a
gcc-like interface.

Low Overhead
------------

The driver should have as little overhead as possible. In practice, we
found that the gcc driver by itself incurred a small but meaningful
overhead when compiling many small files. The driver doesn't do much
work compared to a compilation, but we have tried to keep it as
efficient as possible by following a few simple principles:

-  Avoid memory allocation and string copying when possible.
-  Don't parse arguments more than once.
-  Provide a few simple interfaces for efficiently searching arguments.
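
The principles above can be illustrated with a minimal Python sketch. It
is not the driver's C++ implementation: the ``ParsedArg`` and
``ParsedArgs`` names are hypothetical, and only a gcc-style ``-I`` option
and plain inputs are handled. The argument vector is parsed once, each
parsed argument records an index into the original vector instead of
holding its own copy of the value, and lookups go through one small
search helper.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List, Optional


   @dataclass(frozen=True)
   class ParsedArg:
       """A parsed argument: an option name plus an index into argv (hypothetical)."""
       option: str   # e.g. "-I" or "<input>"
       index: int    # position of the argument in the original argv
       joined: bool  # True for "-Ifoo", False for "-I foo" and for inputs


   class ParsedArgs:
       """Parses argv once and answers queries without re-parsing."""

       def __init__(self, argv: List[str]) -> None:
           self.argv = argv  # kept by reference, never copied
           self.args: List[ParsedArg] = []
           i = 0
           while i < len(argv):
               s = argv[i]
               if s.startswith("-I") and len(s) > 2:
                   self.args.append(ParsedArg("-I", i, True))
               elif s == "-I" and i + 1 < len(argv):
                   self.args.append(ParsedArg("-I", i, False))
                   i += 1  # the value is the next string
               else:
                   self.args.append(ParsedArg("<input>", i, False))
               i += 1

       def value(self, arg: ParsedArg) -> str:
           """Materialize a value from the original strings only on demand."""
           if arg.option == "<input>":
               return self.argv[arg.index]
           if arg.joined:
               return self.argv[arg.index][len(arg.option):]
           return self.argv[arg.index + 1]

       def last(self, option: str) -> Optional[ParsedArg]:
           """Return the last argument for an option, or None if absent."""
           for arg in reversed(self.args):
               if arg.option == option:
                   return arg
           return None

With ``ParsedArgs(["-Ifoo", "-I", "bar", "t.c"])``, both spellings of
``-I`` map to the same option name, and ``last("-I")`` finds the second
one without touching the strings again.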

Simple
------

Finally, the driver was designed to be "as simple as possible", given
the other goals. Notably, trying to be completely compatible with the
gcc driver adds a significant amount of complexity. However, the design
of the driver attempts to mitigate this complexity by dividing the
process into a number of independent stages instead of a single
monolithic task.

Internal Design and Implementation
==================================

.. contents::
   :local:
   :depth: 1

Internals Introduction
----------------------

In order to satisfy the stated goals, the driver was designed to
completely subsume the functionality of the gcc executable; that is, the
driver should not need to delegate to gcc to perform subtasks. On
Darwin, this implies that the Clang driver also subsumes the gcc
driver-driver, which is used to implement support for building universal
images (binaries and object files). This also implies that the driver
should be able to call the language specific compilers (e.g. cc1)
directly, which means that it must have enough information to forward
command line arguments to child processes correctly.

Design Overview
---------------

The diagram below shows the significant components of the driver
architecture and how they relate to one another. The orange components
represent concrete data structures built by the driver, the green
components indicate conceptually distinct stages which manipulate these
data structures, and the blue components are important helper classes.

.. image:: DriverArchitecture.png
   :align: center
   :alt: Driver Architecture Diagram

Driver Stages
-------------

The driver functionality is conceptually divided into five stages:

#. **Parse: Option Parsing**

   The command line argument strings are decomposed into arguments
   (``Arg`` instances). The driver expects to understand all available
   options, although there is some facility for just passing certain
   classes of options through (like ``-Wl,``).

   Each argument corresponds to exactly one abstract ``Option``
   definition, which describes how the option is parsed along with some
   additional metadata. The Arg instances themselves are lightweight and
   merely contain enough information for clients to determine which
   option they correspond to and their values (if they have additional
   parameters).

   For example, a command line like ``-Ifoo -I foo`` would parse to two
   Arg instances (a JoinedArg and a SeparateArg instance), but each
   would refer to the same Option.

   Options are lazily created in order to avoid populating all Option
   classes when the driver is loaded. Most of the driver code only needs
   to deal with options by their unique ID (e.g., ``options::OPT_I``).

   Arg instances themselves do not generally store the values of
   parameters. In many cases, this would simply result in creating
   unnecessary string copies. Instead, Arg instances are always embedded
   inside an ArgList structure, which contains the original vector of
   argument strings. Each Arg itself only needs to contain an index into
   this vector instead of storing its values directly.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-options`` flag (which must precede any actual command
   line arguments). For example:

   .. code-block:: console

      $ clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
      Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
      Option 1 - Name: "-Wa,", Values: {"-fast"}
      Option 2 - Name: "-I", Values: {"foo"}
      Option 3 - Name: "-I", Values: {"foo"}
      Option 4 - Name: "<input>", Values: {"t.c"}

   After this stage is complete the command line should be broken down
   into well-defined option objects with their appropriate parameters.
   Subsequent stages should rarely, if ever, need to do any string
   processing.

#. **Pipeline: Compilation Job Construction**

   Once the arguments are parsed, the tree of subprocess jobs needed for
   the desired compilation sequence is constructed. This involves
   determining the input files and their types, what work is to be done
   on them (preprocess, compile, assemble, link, etc.), and constructing
   a list of Action instances for each task. The result is a list of one
   or more top-level actions, each of which generally corresponds to a
   single output (for example, an object or linked executable).

   The majority of Actions correspond to actual tasks; however, there are
   two special Actions. The first is InputAction, which simply serves to
   adapt an input argument for use as an input to other Actions. The
   second is BindArchAction, which conceptually alters the architecture
   to be used for all of its input Actions.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-phases`` flag. For example:

   .. code-block:: console

      $ clang -ccc-print-phases -x c t.c -x assembler t.s
      0: input, "t.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: input, "t.s", assembler
      5: assembler, {4}, object
      6: linker, {3, 5}, image

   Here the driver is constructing seven distinct actions: four to
   compile the ``t.c`` input into an object file, two to assemble the
   ``t.s`` input, and one to link them together.

   A rather different compilation pipeline is shown here; in this
   example there are two top-level actions to compile the input files
   into two separate object files, where each object file is built using
   ``lipo`` to merge results built for two separate architectures.

   .. code-block:: console

      $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
      0: input, "t0.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: bind-arch, "i386", {3}, object
      5: bind-arch, "x86_64", {3}, object
      6: lipo, {4, 5}, object
      7: input, "t1.c", c
      8: preprocessor, {7}, cpp-output
      9: compiler, {8}, assembler
      10: assembler, {9}, object
      11: bind-arch, "i386", {10}, object
      12: bind-arch, "x86_64", {10}, object
      13: lipo, {11, 12}, object

   After this stage is complete the compilation process is divided into
   a simple set of actions which need to be performed to produce
   intermediate or final outputs (in some cases, like ``-fsyntax-only``,
   there is no "real" final output). Phases are well-known compilation
   steps, such as "preprocess", "compile", "assemble", "link", etc.

#. **Bind: Tool & Filename Selection**

   This stage (in conjunction with the Translate stage) turns the tree
   of Actions into a list of actual subprocesses to run. Conceptually, the
   driver performs a top-down matching to assign Action(s) to Tools. The
   ToolChain is responsible for selecting the tool to perform a
   particular action; once selected, the driver interacts with the tool
   to see if it can match additional actions (for example, by having an
   integrated preprocessor).

   Once Tools have been selected for all actions, the driver determines
   how the tools should be connected (for example, using an in-process
   module, pipes, temporary files, or user provided filenames). If an
   output file is required, the driver also computes the appropriate
   file name (the suffix and file location depend on the input types and
   options such as ``-save-temps``).

   The driver interacts with a ToolChain to perform the Tool bindings.
   Each ToolChain contains information about all the tools needed for
   compilation for a particular architecture, platform, and operating
   system. A single driver invocation may query multiple ToolChains
   during one compilation in order to interact with tools for separate
   architectures.

   The results of this stage are not computed directly, but the driver
   can print the results via the ``-ccc-print-bindings`` option. For
   example:

   .. code-block:: console

      $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
      # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
      # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
      # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
      # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
      # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
      # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
      # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"

   This shows the tool chain, tool, inputs and outputs which have been
   bound for this compilation sequence. Here clang is being used to
   compile t0.c on the i386 architecture and darwin specific versions of
   the tools are being used to assemble and link the result, but generic
   gcc versions of the tools are being used on PowerPC.

#. **Translate: Tool Specific Argument Translation**

   Once a Tool has been selected to perform a particular Action, the
   Tool must construct concrete Jobs which will be executed during
   compilation. The main work is in translating from the gcc style
   command line options to whatever options the subprocess expects.

   Some tools, such as the assembler, only interact with a handful of
   arguments and just determine the path of the executable to call and
   pass on their input and output arguments. Others, like the compiler
   or the linker, may translate a large number of arguments in addition.

   The ArgList class provides a number of simple helper methods to
   assist with translating arguments; for example, to pass on only the
   last of the arguments corresponding to some option, or all arguments
   for an option.

   The result of this stage is a list of Jobs (executable paths and
   argument strings) to execute.

#. **Execute**

   Finally, the compilation pipeline is executed. This is mostly
   straightforward, although there is some interaction with options like
   ``-pipe``, ``-pass-exit-codes`` and ``-time``.
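
As an illustration of the *Pipeline* stage described above, the following
Python sketch builds a flat list of actions for a set of typed inputs and
formats it in the style of the ``-ccc-print-phases`` listing. It is a
simplified model, not the driver's code: the ``PHASES`` table, the
``build_actions`` and ``format_actions`` helpers, and the tuple layout
are all hypothetical, and only inputs that end in a single link step are
handled.

.. code-block:: python

   from typing import List, Tuple

   # (phase name, input type it accepts, output type it produces).
   # A hypothetical table covering only the steps used in the example.
   PHASES = [
       ("preprocessor", "c", "cpp-output"),
       ("compiler", "cpp-output", "assembler"),
       ("assembler", "assembler", "object"),
   ]

   # An action is (kind, inputs, output type). For "input" actions the
   # inputs field is a file name; otherwise it is a list of action indices.
   Action = Tuple[str, object, str]


   def build_actions(inputs: List[Tuple[str, str]]) -> List[Action]:
       """Build actions for (filename, type) inputs, ending in one link step."""
       actions: List[Action] = []
       objects: List[int] = []
       for filename, file_type in inputs:
           actions.append(("input", filename, file_type))
           current_type = file_type
           # Chain every phase whose accepted type matches the current output.
           for name, accepts, produces in PHASES:
               if current_type == accepts:
                   actions.append((name, [len(actions) - 1], produces))
                   current_type = produces
           objects.append(len(actions) - 1)
       actions.append(("linker", objects, "image"))
       return actions


   def format_actions(actions: List[Action]) -> List[str]:
       """Render actions one per line, numbered by their position."""
       lines = []
       for index, (kind, inputs, out_type) in enumerate(actions):
           if kind == "input":
               shown = '"%s"' % inputs
           else:
               shown = "{%s}" % ", ".join(str(i) for i in inputs)
           lines.append("%d: %s, %s, %s" % (index, kind, shown, out_type))
       return lines

For the inputs ``("t.c", "c")`` and ``("t.s", "assembler")``,
``format_actions(build_actions(...))`` yields the same seven lines as the
first ``-ccc-print-phases`` example above.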

Additional Notes
----------------

The Compilation Object
^^^^^^^^^^^^^^^^^^^^^^

The driver constructs a Compilation object for each set of command line
arguments. The Driver itself is intended to be invariant during
construction of a Compilation; an IDE should be able to construct a
single long-lived driver instance to use for an entire build, for
example.

The Compilation object holds information that is particular to each
compilation sequence. For example, it holds the list of used temporary
files (which must be removed once compilation is finished) and result
files (which should be removed if compilation fails).
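
These cleanup rules can be sketched in Python as follows. This is an
illustrative model, not the driver's ``Compilation`` class: the
``CompilationSketch`` name, its methods, and the injectable ``remove``
callback are assumptions made for the example.

.. code-block:: python

   import os
   from typing import Callable, List


   class CompilationSketch:
       """Tracks files created for one compilation sequence (hypothetical)."""

       def __init__(self, remove: Callable[[str], None] = os.remove) -> None:
           self.temp_files: List[str] = []
           self.result_files: List[str] = []
           self._remove = remove

       def add_temp_file(self, path: str) -> str:
           self.temp_files.append(path)
           return path

       def add_result_file(self, path: str) -> str:
           self.result_files.append(path)
           return path

       def finish(self, succeeded: bool) -> None:
           """Always remove temporaries; remove results only on failure."""
           doomed = list(self.temp_files)
           if not succeeded:
               doomed += self.result_files
           for path in doomed:
               try:
                   self._remove(path)
               except OSError:
                   pass  # the file may never have been created

Passing a different ``remove`` callback (for example, ``some_list.append``)
makes it possible to observe which files would be deleted without touching
the file system.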

Unified Parsing & Pipelining
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Parsing and pipelining both occur without reference to a Compilation
instance. This is by design; the driver expects that both of these
phases are platform neutral, with a few very well defined exceptions
such as whether the platform uses a driver driver.

ToolChain Argument Translation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to match gcc very closely, the clang driver currently allows
tool chains to perform their own translation of the argument list (into
a new ArgList data structure). Although this allows the clang driver to
match gcc easily, it also makes the driver operation much harder to
understand (since the Tools stop seeing some arguments the user
provided, and see new ones instead).

For example, on Darwin ``-gfull`` gets translated into two separate
arguments, ``-g`` and ``-fno-eliminate-unused-debug-symbols``. Trying to
write Tool logic to do something with ``-gfull`` will not work, because
Tool argument translation is done after the arguments have been
translated.
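
The effect can be shown with a small Python sketch. The
``translate_args`` and ``tool_sees`` helpers and the
``DARWIN_TRANSLATIONS`` table are hypothetical stand-ins for a tool
chain's translation step; only the ``-gfull`` mapping is taken from the
description above.

.. code-block:: python

   from typing import Dict, List

   # Hypothetical table; only the -gfull entry comes from the text.
   DARWIN_TRANSLATIONS: Dict[str, List[str]] = {
       "-gfull": ["-g", "-fno-eliminate-unused-debug-symbols"],
   }


   def translate_args(args: List[str], table: Dict[str, List[str]]) -> List[str]:
       """Return a new argument list with tool chain translations applied."""
       translated: List[str] = []
       for arg in args:
           # Arguments without an entry pass through unchanged.
           translated.extend(table.get(arg, [arg]))
       return translated


   def tool_sees(args: List[str], option: str) -> bool:
       """Model a Tool, which only ever inspects the translated list."""
       return option in translate_args(args, DARWIN_TRANSLATIONS)

In this model ``tool_sees(["-gfull", "t.c"], "-gfull")`` is false while
``tool_sees(["-gfull", "t.c"], "-g")`` is true, which is why Tool logic
keyed on ``-gfull`` never fires.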

A long-term goal is to remove this tool chain specific translation, and
instead force each tool to change its own logic to do the right thing on
the untranslated original arguments.

Unused Argument Warnings
^^^^^^^^^^^^^^^^^^^^^^^^

The driver operates by parsing all arguments but giving Tools the
opportunity to choose which arguments to pass on. One downside of this
infrastructure is that if the user misspells some option, or is confused
about which options to use, some command line arguments the user really
cared about may go unused. This problem is particularly important when
using clang as a compiler, since the clang compiler does not support
anywhere near all the options that gcc does, and we want to make sure
users know which ones are being used.

To support this, the driver maintains a bit associated with each
argument recording whether it has been used (at all) during the
compilation. This bit usually doesn't need to be set by hand, as the key
ArgList accessors will set it automatically.

When a compilation is successful (there are no errors), the driver
checks the bit and emits an "unused argument" warning for any arguments
which were never accessed. This is conservative (the argument may not
have been used to do what the user wanted) but still catches the most
obvious cases.
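
A minimal Python sketch of this bookkeeping is shown below. The
``TrackedArg`` and ``TrackedArgList`` classes, the choice to mark every
matching occurrence as used, and the warning text are assumptions made
for illustration; they are not the driver's actual classes or
diagnostics.

.. code-block:: python

   from typing import List, Optional


   class TrackedArg:
       """An argument plus the bit recording whether it was ever used."""

       def __init__(self, spelling: str) -> None:
           self.spelling = spelling
           self.claimed = False


   class TrackedArgList:
       def __init__(self, spellings: List[str]) -> None:
           self.args = [TrackedArg(s) for s in spellings]

       def get_last(self, spelling: str) -> Optional[TrackedArg]:
           """Look up an argument; matching arguments are marked as used."""
           found = None
           for arg in self.args:
               if arg.spelling == spelling:
                   arg.claimed = True  # set automatically by the accessor
                   found = arg
           return found

       def unused_warnings(self) -> List[str]:
           """Report every argument that no accessor ever touched."""
           return [
               "unused argument: '%s'" % arg.spelling
               for arg in self.args
               if not arg.claimed
           ]

If a caller queries ``-O2`` and ``t.c`` on the list
``["-O2", "-fsome-flag", "t.c"]`` but never asks for ``-fsome-flag``,
only that argument is reported by ``unused_warnings()``.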

Relation to GCC Driver Concepts
-------------------------------

For those familiar with the gcc driver, this section provides a brief
overview of how things from the gcc driver map to the clang driver.

-  **Driver Driver**

   The driver driver is fully integrated into the clang driver. The
   driver simply constructs additional Actions to bind the architecture
   during the *Pipeline* phase. The tool chain specific argument
   translation is responsible for handling ``-Xarch_``.

   The one caveat is that this approach requires that ``-Xarch_`` not be
   used to alter the compilation itself (for example, one cannot provide
   ``-S`` as an ``-Xarch_`` argument). The driver attempts to reject
   such invocations, and overall there isn't a good reason to abuse
   ``-Xarch_`` to that end in practice.

   The upside is that the clang driver is more efficient and does little
   extra work to support universal builds. It also provides better error
   reporting and UI consistency.

-  **Specs**

   The clang driver has no direct correspondent for "specs". The
   majority of the functionality that is embedded in specs is in the
   Tool specific argument translation routines. The parts of specs which
   control the compilation pipeline are generally part of the *Pipeline*
   stage.

-  **Toolchains**

   The gcc driver has no direct understanding of tool chains. Each gcc
   binary roughly corresponds to the information which is embedded
   inside a single ToolChain.

   The clang driver is intended to be portable and support complex
   compilation environments. All platform and tool chain specific code
   should be protected behind either abstract or well-defined interfaces
   (such as whether the platform supports use as a driver driver).