<html>
<head>
  <title>Clang Driver Manual</title>
  <link type="text/css" rel="stylesheet" href="../menu.css" />
  <link type="text/css" rel="stylesheet" href="../content.css" />
  <style type="text/css">
    td {
      vertical-align: top;
    }
  </style>
</head>
<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Driver Design &amp; Internals</h1>

<ul>
  <li><a href="#intro">Introduction</a></li>
  <li><a href="#features">Features and Goals</a></li>
  <ul>
    <li><a href="#gcccompat">GCC Compatibility</a></li>
    <li><a href="#components">Flexible</a></li>
    <li><a href="#performance">Low Overhead</a></li>
    <li><a href="#simple">Simple</a></li>
  </ul>
  <li><a href="#design">Design</a></li>
  <ul>
    <li><a href="#int_intro">Internals Introduction</a></li>
    <li><a href="#int_overview">Design Overview</a></li>
    <li><a href="#int_stages">Driver Stages</a></li>
    <li><a href="#int_notes">Additional Notes</a></li>
    <ul>
      <li><a href="#int_compilation">The Compilation Object</a></li>
      <li><a href="#int_unified_parsing">Unified Parsing &amp; Pipelining</a></li>
      <li><a href="#int_toolchain_translation">ToolChain Argument Translation</a></li>
      <li><a href="#int_unused_warnings">Unused Argument Warnings</a></li>
    </ul>
    <li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li>
  </ul>
</ul>


<!-- ======================================================================= -->
<h2 id="intro">Introduction</h2>
<!-- ======================================================================= -->

<p>This document describes the Clang driver.
The purpose of this
  document is to describe both the motivation and design goals
  for the driver, as well as details of the internal
  implementation.</p>

<!-- ======================================================================= -->
<h2 id="features">Features and Goals</h2>
<!-- ======================================================================= -->

<p>The Clang driver is intended to be a production-quality
  compiler driver providing access to the Clang compiler and
  tools, with a command-line interface that is compatible with
  the gcc driver.</p>

<p>Although the driver is part of and driven by the Clang
  project, it is logically a separate tool which shares many of
  the same goals as Clang:</p>

<p><b>Features</b>:</p>
<ul>
  <li><a href="#gcccompat">GCC Compatibility</a></li>
  <li><a href="#components">Flexible</a></li>
  <li><a href="#performance">Low Overhead</a></li>
  <li><a href="#simple">Simple</a></li>
</ul>

<!--=======================================================================-->
<h3 id="gcccompat">GCC Compatibility</h3>
<!--=======================================================================-->

<p>The number one goal of the driver is to ease the adoption of
  Clang by allowing users to drop Clang into a build system
  that was designed to call GCC. Although this makes the driver
  much more complicated than might otherwise be necessary, we
  decided that being very compatible with the gcc command-line
  interface was worth it in order to allow users to quickly test
  clang on their projects.</p>

<!--=======================================================================-->
<h3 id="components">Flexible</h3>
<!--=======================================================================-->

<p>The driver was designed to be flexible and to easily accommodate
  new uses as we grow the clang and LLVM infrastructure.
As one
  example, the driver can easily support the introduction of
  tools that have an integrated assembler, something we hope to
  add to LLVM in the future.</p>

<p>Similarly, most of the driver functionality is kept in a
  library which can be used to build other tools that want to
  implement or accept a gcc-like interface.</p>

<!--=======================================================================-->
<h3 id="performance">Low Overhead</h3>
<!--=======================================================================-->

<p>The driver should have as little overhead as possible. In
  practice, we found that the gcc driver by itself incurred a
  small but meaningful overhead when compiling many small
  files. The driver doesn't do much work compared to a
  compilation, but we have tried to keep it as efficient as
  possible by following a few simple principles:</p>
<ul>
  <li>Avoid memory allocation and string copying when
    possible.</li>

  <li>Don't parse arguments more than once.</li>

  <li>Provide a few simple interfaces for efficiently searching
    arguments.</li>
</ul>

<!--=======================================================================-->
<h3 id="simple">Simple</h3>
<!--=======================================================================-->

<p>Finally, the driver was designed to be "as simple as
  possible", given the other goals. Notably, trying to be
  completely compatible with the gcc driver adds a significant
  amount of complexity.
However, the design of the driver
  attempts to mitigate this complexity by dividing the process
  into a number of independent stages instead of a single
  monolithic task.</p>

<!-- ======================================================================= -->
<h2 id="design">Internal Design and Implementation</h2>
<!-- ======================================================================= -->

<ul>
  <li><a href="#int_intro">Internals Introduction</a></li>
  <li><a href="#int_overview">Design Overview</a></li>
  <li><a href="#int_stages">Driver Stages</a></li>
  <li><a href="#int_notes">Additional Notes</a></li>
  <li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li>
</ul>

<!--=======================================================================-->
<h3><a name="int_intro">Internals Introduction</a></h3>
<!--=======================================================================-->

<p>In order to satisfy the stated goals, the driver was designed
  to completely subsume the functionality of the gcc executable;
  that is, the driver should not need to delegate to gcc to
  perform subtasks. On Darwin, this implies that the Clang
  driver also subsumes the gcc driver-driver, which is used to
  implement support for building universal images (binaries and
  object files). This also implies that the driver should be
  able to call the language-specific compilers (e.g., cc1)
  directly, which means that it must have enough information to
  forward command-line arguments to child processes
  correctly.</p>

<!--=======================================================================-->
<h3><a name="int_overview">Design Overview</a></h3>
<!--=======================================================================-->

<p>The diagram below shows the significant components of the
  driver architecture and how they relate to one another.
The
  orange components represent concrete data structures built by
  the driver, the green components indicate conceptually
  distinct stages which manipulate these data structures, and
  the blue components are important helper classes.</p>

<center>
<a href="DriverArchitecture.png" alt="Driver Architecture Diagram">
<img width=400 src="DriverArchitecture.png">
</a>
</center>

<!--=======================================================================-->
<h3><a name="int_stages">Driver Stages</a></h3>
<!--=======================================================================-->

<p>The driver functionality is conceptually divided into five stages:</p>

<ol>
  <li>
    <b>Parse: Option Parsing</b>

    <p>The command-line argument strings are decomposed into
      arguments (<tt>Arg</tt> instances). The driver expects to
      understand all available options, although there is some
      facility for just passing certain classes of options
      through (like <tt>-Wl,</tt>).</p>

    <p>Each argument corresponds to exactly one
      abstract <tt>Option</tt> definition, which describes how
      the option is parsed along with some additional
      metadata. The Arg instances themselves are lightweight and
      merely contain enough information for clients to determine
      which option they correspond to and their values (if they
      have additional parameters).</p>

    <p>For example, a command line like "-Ifoo -I foo" would
      parse to two Arg instances (a JoinedArg and a SeparateArg
      instance), but each would refer to the same Option.</p>

    <p>Options are lazily created in order to avoid populating
      all Option classes when the driver is loaded. Most of the
      driver code only needs to deal with options by their
      unique ID (e.g., <tt>options::OPT_I</tt>).</p>

    <p>Arg instances themselves do not generally store the
      values of parameters.
In many cases, this would
      simply result in creating unnecessary string
      copies. Instead, Arg instances are always embedded inside
      an ArgList structure, which contains the original vector
      of argument strings. Each Arg itself only needs to contain
      an index into this vector instead of storing its values
      directly.</p>

    <p>The clang driver can dump the results of this
      stage using the <tt>-ccc-print-options</tt> flag (which
      must precede any actual command-line arguments). For
      example:</p>
<pre>
$ <b>clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c</b>
Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
Option 1 - Name: "-Wa,", Values: {"-fast"}
Option 2 - Name: "-I", Values: {"foo"}
Option 3 - Name: "-I", Values: {"foo"}
Option 4 - Name: "&lt;input&gt;", Values: {"t.c"}
</pre>

    <p>After this stage is complete, the command line should be
      broken down into well-defined option objects with their
      appropriate parameters. Subsequent stages should rarely,
      if ever, need to do any string processing.</p>
  </li>

  <li>
    <b>Pipeline: Compilation Job Construction</b>

    <p>Once the arguments are parsed, the tree of subprocess
      jobs needed for the desired compilation sequence is
      constructed. This involves determining the input files and
      their types, what work is to be done on them (preprocess,
      compile, assemble, link, etc.), and constructing a list of
      Action instances for each task. The result is a list of
      one or more top-level actions, each of which generally
      corresponds to a single output (for example, an object or
      linked executable).</p>

    <p>The majority of Actions correspond to actual tasks;
      however, there are two special Actions. The first is
      InputAction, which simply serves to adapt an input
      argument for use as an input to other Actions.
The second
      is BindArchAction, which conceptually alters the
      architecture to be used for all of its input Actions.</p>

    <p>The clang driver can dump the results of this
      stage using the <tt>-ccc-print-phases</tt> flag. For
      example:</p>
<pre>
$ <b>clang -ccc-print-phases -x c t.c -x assembler t.s</b>
0: input, "t.c", c
1: preprocessor, {0}, cpp-output
2: compiler, {1}, assembler
3: assembler, {2}, object
4: input, "t.s", assembler
5: assembler, {4}, object
6: linker, {3, 5}, image
</pre>
    <p>Here the driver is constructing seven distinct actions:
      four to compile the "t.c" input into an object file, two to
      assemble the "t.s" input, and one to link them together.</p>

    <p>A rather different compilation pipeline is shown here; in
      this example there are two top-level actions to compile
      the input files into two separate object files, where each
      object file is built using <tt>lipo</tt> to merge results
      built for two separate architectures.</p>
<pre>
$ <b>clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c</b>
0: input, "t0.c", c
1: preprocessor, {0}, cpp-output
2: compiler, {1}, assembler
3: assembler, {2}, object
4: bind-arch, "i386", {3}, object
5: bind-arch, "x86_64", {3}, object
6: lipo, {4, 5}, object
7: input, "t1.c", c
8: preprocessor, {7}, cpp-output
9: compiler, {8}, assembler
10: assembler, {9}, object
11: bind-arch, "i386", {10}, object
12: bind-arch, "x86_64", {10}, object
13: lipo, {11, 12}, object
</pre>

    <p>After this stage is complete, the compilation process is
      divided into a simple set of actions which need to be
      performed to produce intermediate or final outputs (in
      some cases, like <tt>-fsyntax-only</tt>, there is no
      "real" final output).
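</p>

    <p>The action list printed by <tt>-ccc-print-phases</tt> is a small
      directed acyclic graph in which each action refers to its inputs
      by index. The C++ sketch below is a hypothetical, simplified model
      of that structure. <tt>PhaseAction</tt>, <tt>PhaseGraph</tt>, and
      the <tt>addCToObject</tt> helper are illustrative names, not
      Clang's actual Action classes, and the text format is only
      modeled on the listings above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the action graph built by the Pipeline
// stage. It mirrors only the shape of the -ccc-print-phases listing; it is
// not Clang's actual Action class hierarchy.
struct PhaseAction {
  std::string kind;         // e.g. "input", "compiler", "bind-arch", "lipo"
  std::string detail;       // input file name or architecture; may be empty
  std::vector<int> inputs;  // indices of the actions this one consumes
  std::string outputType;   // type of the result, e.g. "object" or "image"
};

class PhaseGraph {
 public:
  // Appends an action and returns its index. Inputs must name actions that
  // were already added, which keeps the list in topological order.
  int add(std::string kind, std::string detail, std::vector<int> inputs,
          std::string outputType) {
    for (int in : inputs) {
      assert(in >= 0 && in < static_cast<int>(actions_.size()));
      (void)in;  // avoid an unused-variable warning when NDEBUG is set
    }
    actions_.push_back(PhaseAction{std::move(kind), std::move(detail),
                                   std::move(inputs), std::move(outputType)});
    return static_cast<int>(actions_.size()) - 1;
  }

  // Hypothetical helper: the four actions that turn a C file into an object.
  int addCToObject(const std::string& file) {
    int input = add("input", file, {}, "c");
    int pre = add("preprocessor", "", {input}, "cpp-output");
    int comp = add("compiler", "", {pre}, "assembler");
    return add("assembler", "", {comp}, "object");
  }

  // Renders one line per action, in a format modeled on -ccc-print-phases.
  std::string str() const {
    std::ostringstream os;
    for (std::size_t i = 0; i < actions_.size(); ++i) {
      const PhaseAction& a = actions_[i];
      os << i << ": " << a.kind;
      if (!a.detail.empty()) os << ", \"" << a.detail << '"';
      if (!a.inputs.empty()) {
        os << ", {";
        for (std::size_t j = 0; j < a.inputs.size(); ++j) {
          if (j > 0) os << ", ";
          os << a.inputs[j];
        }
        os << '}';
      }
      os << ", " << a.outputType << '\n';
    }
    return os.str();
  }

 private:
  std::vector<PhaseAction> actions_;
};
```

    <p>The first listing above can be reproduced in four steps:
      add <tt>t.c</tt> with <tt>addCToObject</tt>, add an input action
      and an assembler action for <tt>t.s</tt>, then add a linker action
      over indices 3 and 5. The output is the same seven lines. The
      bind-arch and lipo actions of the second listing fit the same
      model. An action can only be added after its inputs, so the list
      stays in topological order and plain integer indices are enough
      to describe the graph.</p>

    <p>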
Phases are well-known compilation
      steps, such as "preprocess", "compile", "assemble",
      "link", etc.</p>
  </li>

  <li>
    <b>Bind: Tool &amp; Filename Selection</b>

    <p>This stage (in conjunction with the Translate stage)
      turns the tree of Actions into a list of actual subprocesses
      to run. Conceptually, the driver performs a top-down
      matching to assign Action(s) to Tools. The ToolChain is
      responsible for selecting the tool to perform a particular
      action; once a tool is selected, the driver interacts with it
      to see if it can match additional actions (for example, by
      having an integrated preprocessor).</p>

    <p>Once Tools have been selected for all actions, the driver
      determines how the tools should be connected (for example,
      using an in-process module, pipes, temporary files, or
      user-provided filenames). If an output file is required, the
      driver also computes the appropriate file name (the suffix
      and file location depend on the input types and options
      such as <tt>-save-temps</tt>).</p>

    <p>The driver interacts with a ToolChain to perform the Tool
      bindings. Each ToolChain contains information about all
      the tools needed for compilation for a particular
      architecture, platform, and operating system. A single
      driver invocation may query multiple ToolChains during one
      compilation in order to interact with tools for separate
      architectures.</p>

    <p>The results of this stage are not computed directly, but
      the driver can print the results via
      the <tt>-ccc-print-bindings</tt> option.
For example:</p>
<pre>
$ <b>clang -ccc-print-bindings -arch i386 -arch ppc t0.c</b>
# "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
# "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
# "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
# "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
# "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
# "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
# "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"
</pre>

    <p>This shows the tool chain, tool, inputs and outputs which
      have been bound for this compilation sequence. Here clang
      is being used to compile t0.c on the i386 architecture, and
      Darwin-specific versions of the tools are being used to
      assemble and link the result, but generic gcc versions of
      the tools are being used on PowerPC.</p>
  </li>

  <li>
    <b>Translate: Tool-Specific Argument Translation</b>

    <p>Once a Tool has been selected to perform a particular
      Action, the Tool must construct concrete Jobs which will be
      executed during compilation. The main work is in translating
      from the gcc-style command-line options to whatever options
      the subprocess expects.</p>

    <p>Some tools, such as the assembler, only interact with a
      handful of arguments and just determine the path of the
      executable to call and pass on their input and output
      arguments.
Others, like the compiler or the linker, may
      translate a large number of arguments in addition.</p>

    <p>The ArgList class provides a number of simple helper
      methods to assist with translating arguments; for example,
      to pass on only the last of the arguments corresponding to some
      option, or all arguments for an option.</p>

    <p>The result of this stage is a list of Jobs (executable
      paths and argument strings) to execute.</p>
  </li>

  <li>
    <b>Execute</b>
    <p>Finally, the compilation pipeline is executed. This is
      mostly straightforward, although there is some interaction
      with options
      like <tt>-pipe</tt>, <tt>-pass-exit-codes</tt>
      and <tt>-time</tt>.</p>
  </li>

</ol>

<!--=======================================================================-->
<h3><a name="int_notes">Additional Notes</a></h3>
<!--=======================================================================-->

<h4 id="int_compilation">The Compilation Object</h4>

<p>The driver constructs a Compilation object for each set of
  command-line arguments. The Driver itself is intended to be
  invariant during construction of a Compilation; an IDE should be
  able to construct a single long-lived driver instance to use
  for an entire build, for example.</p>

<p>The Compilation object holds information that is particular
  to each compilation sequence, for example, the list of
  temporary files used (which must be removed once compilation is
  finished) and result files (which should be removed if
  compilation fails).</p>

<h4 id="int_unified_parsing">Unified Parsing &amp; Pipelining</h4>

<p>Parsing and pipelining both occur without reference to a
  Compilation instance.
This is by design; the driver expects that
  both of these phases are platform-neutral, with a few very
  well-defined exceptions, such as whether the platform uses a driver
  driver.</p>

<h4 id="int_toolchain_translation">ToolChain Argument Translation</h4>

<p>In order to match gcc very closely, the clang driver
  currently allows tool chains to perform their own translation of
  the argument list (into a new ArgList data structure). Although
  this allows the clang driver to match gcc easily, it also makes
  the driver operation much harder to understand (since the Tools
  stop seeing some arguments the user provided, and see new ones
  instead).</p>

<p>For example, on Darwin <tt>-gfull</tt> gets translated into two
  separate arguments, <tt>-g</tt>
  and <tt>-fno-eliminate-unused-debug-symbols</tt>. Trying to write Tool
  logic to do something with <tt>-gfull</tt> will not work, because Tool
  argument translation is done after the tool chain has already
  translated the arguments.</p>

<p>A long-term goal is to remove this tool-chain-specific
  translation, and instead force each tool to change its own logic
  to do the right thing on the untranslated original arguments.</p>

<h4 id="int_unused_warnings">Unused Argument Warnings</h4>
<p>The driver operates by parsing all arguments but giving Tools
  the opportunity to choose which arguments to pass on. One
  downside of this infrastructure is that if the user misspells
  some option, or is confused about which options to use, some
  command-line arguments the user really cared about may go
  unused.
This problem is particularly important when using
  clang as a compiler, since the clang compiler does not support
  anywhere near all the options that gcc does, and we want to make
  sure users know which ones are being used.</p>

<p>To support this, the driver maintains a bit for each
  argument recording whether it has been used (at all) during the
  compilation. This bit usually doesn't need to be set by hand,
  as the key ArgList accessors will set it automatically.</p>

<p>When a compilation is successful (there are no errors), the
  driver checks the bit and emits an "unused argument" warning for
  any arguments which were never accessed. This is conservative
  (the argument may not have been used to do what the user wanted)
  but still catches the most obvious cases.</p>

<!--=======================================================================-->
<h3><a name="int_gcc_concepts">Relation to GCC Driver Concepts</a></h3>
<!--=======================================================================-->

<p>For those familiar with the gcc driver, this section provides
  a brief overview of how things from the gcc driver map to the
  clang driver.</p>

<ul>
  <li>
    <b>Driver Driver</b>
    <p>The driver driver is fully integrated into the clang
      driver. The driver simply constructs additional Actions to
      bind the architecture during the <i>Pipeline</i>
      phase. The tool-chain-specific argument translation is
      responsible for handling <tt>-Xarch_</tt>.</p>

    <p>The one caveat is that this approach
      requires <tt>-Xarch_</tt> not be used to alter the
      compilation itself (for example, one cannot
      provide <tt>-S</tt> as an <tt>-Xarch_</tt> argument).
The
      driver attempts to reject such invocations, and overall
      there isn't a good reason to abuse <tt>-Xarch_</tt> to
      that end in practice.</p>

    <p>The upside is that the clang driver is more efficient and
      does little extra work to support universal builds. It also
      provides better error reporting and UI consistency.</p>
  </li>

  <li>
    <b>Specs</b>
    <p>The clang driver has no direct correspondent for
      "specs". The majority of the functionality that is
      embedded in specs is in the Tool-specific argument
      translation routines. The parts of specs which control the
      compilation pipeline are generally part of
      the <i>Pipeline</i> stage.</p>
  </li>

  <li>
    <b>Toolchains</b>
    <p>The gcc driver has no direct understanding of tool
      chains. Each gcc binary roughly corresponds to the
      information which is embedded inside a single
      ToolChain.</p>

    <p>The clang driver is intended to be portable and support
      complex compilation environments. All platform- and
      tool-chain-specific code should be protected behind either
      abstract or well-defined interfaces (such as whether the
      platform supports use as a driver driver).</p>
  </li>
</ul>
</div>
</body>
</html>