=========================
Driver Design & Internals
=========================

.. contents::
   :local:

Introduction
============

This document describes the Clang driver. The purpose of this document
is to describe both the motivation and design goals for the driver, as
well as details of the internal implementation.

Features and Goals
==================

The Clang driver is intended to be a production-quality compiler driver
providing access to the Clang compiler and tools, with a command line
interface which is compatible with the gcc driver.

Although the driver is part of and driven by the Clang project, it is
logically a separate tool which shares many of the same goals as Clang:

.. contents:: Features
   :local:

GCC Compatibility
-----------------

The number one goal of the driver is to ease the adoption of Clang by
allowing users to drop Clang into a build system which was designed to
call GCC. Although this makes the driver much more complicated than
might otherwise be necessary, we decided that being very compatible with
the gcc command line interface was worth it in order to allow users to
quickly test clang on their projects.

Flexible
--------

The driver was designed to be flexible and easily accommodate new uses
as we grow the clang and LLVM infrastructure. As one example, the driver
can easily support the introduction of tools which have an integrated
assembler, something we hope to add to LLVM in the future.

Similarly, most of the driver functionality is kept in a library which
can be used to build other tools which want to implement or accept a
gcc-like interface.

Low Overhead
------------

The driver should have as little overhead as possible. In practice, we
found that the gcc driver by itself incurred a small but meaningful
overhead when compiling many small files.
The driver doesn't do much work compared to a compilation, but we have
tried to keep it as efficient as possible by following a few simple
principles:

- Avoid memory allocation and string copying when possible.
- Don't parse arguments more than once.
- Provide a few simple interfaces for efficiently searching arguments.

Simple
------

Finally, the driver was designed to be "as simple as possible", given
the other goals. Notably, trying to be completely compatible with the
gcc driver adds a significant amount of complexity. However, the design
of the driver attempts to mitigate this complexity by dividing the
process into a number of independent stages instead of a single
monolithic task.

Internal Design and Implementation
==================================

.. contents::
   :local:
   :depth: 1

Internals Introduction
----------------------

In order to satisfy the stated goals, the driver was designed to
completely subsume the functionality of the gcc executable; that is, the
driver should not need to delegate to gcc to perform subtasks. On
Darwin, this implies that the Clang driver also subsumes the gcc
driver-driver, which is used to implement support for building universal
images (binaries and object files). This also implies that the driver
should be able to call the language specific compilers (e.g. cc1)
directly, which means that it must have enough information to forward
command line arguments to child processes correctly.

Design Overview
---------------

The diagram below shows the significant components of the driver
architecture and how they relate to one another. The orange components
represent concrete data structures built by the driver, the green
components indicate conceptually distinct stages which manipulate these
data structures, and the blue components are important helper classes.

.. image:: DriverArchitecture.png
   :align: center
   :alt: Driver Architecture Diagram

Driver Stages
-------------

The driver functionality is conceptually divided into five stages:

#. **Parse: Option Parsing**

   The command line argument strings are decomposed into arguments
   (``Arg`` instances). The driver expects to understand all available
   options, although there is some facility for just passing certain
   classes of options through (like ``-Wl,``).

   Each argument corresponds to exactly one abstract ``Option``
   definition, which describes how the option is parsed along with some
   additional metadata. The Arg instances themselves are lightweight and
   merely contain enough information for clients to determine which
   option they correspond to and their values (if they have additional
   parameters).

   For example, a command line like "-Ifoo -I foo" would parse to two
   Arg instances (a JoinedArg and a SeparateArg instance), but each
   would refer to the same Option.

   Options are lazily created in order to avoid populating all Option
   classes when the driver is loaded. Most of the driver code only needs
   to deal with options by their unique ID (e.g., ``options::OPT_I``).

   Arg instances themselves do not generally store the values of
   parameters. In many cases, this would simply result in creating
   unnecessary string copies. Instead, Arg instances are always embedded
   inside an ArgList structure, which contains the original vector of
   argument strings. Each Arg itself only needs to contain an index into
   this vector instead of storing its values directly.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-options`` flag (which must precede any actual command
   line arguments). For example:

   .. code-block:: console

      $ clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
      Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
      Option 1 - Name: "-Wa,", Values: {"-fast"}
      Option 2 - Name: "-I", Values: {"foo"}
      Option 3 - Name: "-I", Values: {"foo"}
      Option 4 - Name: "<input>", Values: {"t.c"}

   After this stage is complete the command line should be broken down
   into well defined option objects with their appropriate parameters.
   Subsequent stages should rarely, if ever, need to do any string
   processing.

#. **Pipeline: Compilation Job Construction**

   Once the arguments are parsed, the tree of subprocess jobs needed for
   the desired compilation sequence is constructed. This involves
   determining the input files and their types, what work is to be done
   on them (preprocess, compile, assemble, link, etc.), and constructing
   a list of Action instances for each task. The result is a list of one
   or more top-level actions, each of which generally corresponds to a
   single output (for example, an object or linked executable).

   The majority of Actions correspond to actual tasks; however, there
   are two special Actions. The first is InputAction, which simply
   serves to adapt an input argument for use as an input to other
   Actions. The second is BindArchAction, which conceptually alters the
   architecture to be used for all of its input Actions.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-phases`` flag. For example:

   .. code-block:: console

      $ clang -ccc-print-phases -x c t.c -x assembler t.s
      0: input, "t.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: input, "t.s", assembler
      5: assembler, {4}, object
      6: linker, {3, 5}, image

   Here the driver is constructing seven distinct actions, four to
   compile the "t.c" input into an object file, two to assemble the
   "t.s" input, and one to link them together.

   A rather different compilation pipeline is shown here; in this
   example there are two top level actions to compile the input files
   into two separate object files, where each object file is built using
   ``lipo`` to merge results built for two separate architectures.

   .. code-block:: console

      $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
      0: input, "t0.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: bind-arch, "i386", {3}, object
      5: bind-arch, "x86_64", {3}, object
      6: lipo, {4, 5}, object
      7: input, "t1.c", c
      8: preprocessor, {7}, cpp-output
      9: compiler, {8}, assembler
      10: assembler, {9}, object
      11: bind-arch, "i386", {10}, object
      12: bind-arch, "x86_64", {10}, object
      13: lipo, {11, 12}, object

   After this stage is complete the compilation process is divided into
   a simple set of actions which need to be performed to produce
   intermediate or final outputs (in some cases, like ``-fsyntax-only``,
   there is no "real" final output). Phases are well known compilation
   steps, such as "preprocess", "compile", "assemble", "link", etc.

#. **Bind: Tool & Filename Selection**

   This stage (in conjunction with the Translate stage) turns the tree
   of Actions into a list of actual subprocesses to run. Conceptually,
   the driver performs a top down matching to assign Action(s) to Tools.
   The ToolChain is responsible for selecting the tool to perform a
   particular action; once selected the driver interacts with the tool
   to see if it can match additional actions (for example, by having an
   integrated preprocessor).

   Once Tools have been selected for all actions, the driver determines
   how the tools should be connected (for example, using an in-process
   module, pipes, temporary files, or user provided filenames). If an
   output file is required, the driver also computes the appropriate
   file name (the suffix and file location depend on the input types and
   options such as ``-save-temps``).

   The driver interacts with a ToolChain to perform the Tool bindings.
   Each ToolChain contains information about all the tools needed for
   compilation for a particular architecture, platform, and operating
   system. A single driver invocation may query multiple ToolChains
   during one compilation in order to interact with tools for separate
   architectures.

   The results of this stage are not computed directly, but the driver
   can print the results via the ``-ccc-print-bindings`` option. For
   example:

   .. code-block:: console

      $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
      # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
      # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
      # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
      # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
      # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
      # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
      # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"

   This shows the tool chain, tool, inputs and outputs which have been
   bound for this compilation sequence. Here clang is being used to
   compile t0.c on the i386 architecture and darwin specific versions of
   the tools are being used to assemble and link the result, but generic
   gcc versions of the tools are being used on PowerPC.

#. **Translate: Tool Specific Argument Translation**

   Once a Tool has been selected to perform a particular Action, the
   Tool must construct concrete Jobs which will be executed during
   compilation. The main work is in translating from the gcc style
   command line options to whatever options the subprocess expects.

   Some tools, such as the assembler, only interact with a handful of
   arguments and just determine the path of the executable to call and
   pass on their input and output arguments. Others, like the compiler
   or the linker, may translate a large number of arguments in addition.

   The ArgList class provides a number of simple helper methods to
   assist with translating arguments; for example, to pass on only the
   last of arguments corresponding to some option, or all arguments for
   an option.


   The result of this stage is a list of Jobs (executable paths and
   argument strings) to execute.

#. **Execute**

   Finally, the compilation pipeline is executed. This is mostly
   straightforward, although there is some interaction with options like
   ``-pipe``, ``-pass-exit-codes`` and ``-time``.

Additional Notes
----------------

The Compilation Object
^^^^^^^^^^^^^^^^^^^^^^

The driver constructs a Compilation object for each set of command line
arguments. The Driver itself is intended to be invariant during
construction of a Compilation; an IDE should be able to construct a
single long lived driver instance to use for an entire build, for
example.

The Compilation object holds information that is particular to each
compilation sequence. For example, it holds the list of used temporary
files (which must be removed once compilation is finished) and result
files (which should be removed if compilation fails).

Unified Parsing & Pipelining
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Parsing and pipelining both occur without reference to a Compilation
instance. This is by design; the driver expects that both of these
phases are platform neutral, with a few very well defined exceptions
such as whether the platform uses a driver driver.

ToolChain Argument Translation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to match gcc very closely, the clang driver currently allows
tool chains to perform their own translation of the argument list (into
a new ArgList data structure). Although this allows the clang driver to
match gcc easily, it also makes the driver operation much harder to
understand (since the Tools stop seeing some arguments the user
provided, and see new ones instead).

For example, on Darwin ``-gfull`` gets translated into two separate
arguments, ``-g`` and ``-fno-eliminate-unused-debug-symbols``.
Trying to write Tool logic to do something with ``-gfull`` will not
work, because Tool argument translation is done after the arguments have
been translated.

A long term goal is to remove this tool chain specific translation, and
instead force each tool to change its own logic to do the right thing on
the untranslated original arguments.

Unused Argument Warnings
^^^^^^^^^^^^^^^^^^^^^^^^

The driver operates by parsing all arguments but giving Tools the
opportunity to choose which arguments to pass on. One downside of this
infrastructure is that if the user misspells some option, or is confused
about which options to use, some command line arguments the user really
cared about may go unused. This problem is particularly important when
using clang as a compiler, since the clang compiler does not support
anywhere near all the options that gcc does, and we want to make sure
users know which ones are being used.

To support this, the driver maintains a bit associated with each
argument recording whether it has been used (at all) during the
compilation. This bit usually doesn't need to be set by hand, as the key
ArgList accessors will set it automatically.

When a compilation is successful (there are no errors), the driver
checks the bit and emits an "unused argument" warning for any arguments
which were never accessed. This is conservative (the argument may not
have been used to do what the user wanted) but still catches the most
obvious cases.

Relation to GCC Driver Concepts
-------------------------------

For those familiar with the gcc driver, this section provides a brief
overview of how things from the gcc driver map to the clang driver.

- **Driver Driver**

  The driver driver is fully integrated into the clang driver.
  The driver simply constructs additional Actions to bind the
  architecture during the *Pipeline* phase. The tool chain specific
  argument translation is responsible for handling ``-Xarch_``.

  The one caveat is that this approach requires ``-Xarch_`` not be used
  to alter the compilation itself (for example, one cannot provide
  ``-S`` as an ``-Xarch_`` argument). The driver attempts to reject
  such invocations, and overall there isn't a good reason to abuse
  ``-Xarch_`` to that end in practice.

  The upside is that the clang driver is more efficient and does little
  extra work to support universal builds. It also provides better error
  reporting and UI consistency.

- **Specs**

  The clang driver has no direct counterpart to "specs". The majority
  of the functionality that is embedded in specs is in the Tool
  specific argument translation routines. The parts of specs which
  control the compilation pipeline are generally part of the *Pipeline*
  stage.

- **Toolchains**

  The gcc driver has no direct understanding of tool chains. Each gcc
  binary roughly corresponds to the information which is embedded
  inside a single ToolChain.

  The clang driver is intended to be portable and support complex
  compilation environments. All platform and tool chain specific code
  should be protected behind either abstract or well defined interfaces
  (such as whether the platform supports use as a driver driver).
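
As a rough illustration of the argument bookkeeping described under
*Unused Argument Warnings* and in the *Parse* stage, the following Python
sketch models arguments that store only an index into the original
argument vector, plus a "claimed" bit that accessors set automatically.
It is a toy model under stated assumptions, not the driver's C++
implementation: the Python classes ``Arg`` and ``ArgList``, the methods
``values``, ``get_all_args`` and ``unclaimed``, the option ID strings, and
the ``-fmisspelled`` flag are all hypothetical names chosen for the
example.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Arg:
       """Toy model of a parsed argument (hypothetical, not Clang's API)."""
       option_id: str         # illustrative ID string, e.g. "OPT_I"
       index: int             # position of the option in the original argv
       num_values: int        # number of value strings following the option
       claimed: bool = False  # the "used" bit checked after compilation


   class ArgList:
       """Owns the original strings; each Arg only refers into them."""

       def __init__(self, strings, args):
           self.strings = list(strings)
           self.args = list(args)

       def values(self, arg):
           # Values are looked up by index rather than copied into the Arg.
           start = arg.index + 1
           return self.strings[start:start + arg.num_values]

       def get_all_args(self, option_id):
           # Accessing arguments marks them as claimed automatically.
           matches = [a for a in self.args if a.option_id == option_id]
           for arg in matches:
               arg.claimed = True
           return matches

       def unclaimed(self):
           return [a for a in self.args if not a.claimed]


   strings = ["-I", "foo", "-I", "bar", "-fmisspelled"]
   args = ArgList(strings, [
       Arg("OPT_I", 0, 1),
       Arg("OPT_I", 2, 1),
       Arg("OPT_fmisspelled", 4, 0),
   ])

   # A "tool" consumes every -I argument; nothing asks for -fmisspelled.
   include_args = args.get_all_args("OPT_I")
   include_dirs = [args.values(a) for a in include_args]

   # After a successful run, never-accessed arguments would be reported.
   unused = [args.strings[a.index] for a in args.unclaimed()]
   for name in unused:
       print(f"warning: argument unused during compilation: '{name}'")

In this model the warning falls out of ordinary accessor use: a tool that
never asks for an option leaves its bit clear, so no separate list of
"known unused" options has to be maintained.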