<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Clang - Features and Goals</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Features and Goals</h1>
<!--*************************************************************************-->

<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean.
These features are:
</p>

<p>End-User Features:</p>

<ul>
<li><a href="#performance">Fast compiles and low memory use</a></li>
<li><a href="#expressivediags">Expressive diagnostics</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>

<p>Utility and Applications:</p>

<ul>
<li><a href="#libraryarch">Library based architecture</a></li>
<li><a href="#diverseclients">Support diverse clients</a></li>
<li><a href="#ideintegration">Integration with IDEs</a></li>
<li><a href="#license">Use the LLVM 'BSD' License</a></li>
</ul>

<p>Internal Design and Implementation:</p>

<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#simplecode">A simple and hackable code base</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
    and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
    variants</a></li>
</ul>

<!--*************************************************************************-->
<h2><a name="enduser">End-User
Features</a></h2>
<!--*************************************************************************-->


<!--=======================================================================-->
<h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
<!--=======================================================================-->

<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>

<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than GCC and uses less memory. For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>

<img class="img_slide" src="feature-compile1.png" width="400" height="300"
     alt="Time to parse carbon.h: -fsyntax-only">

<p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>

<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code. GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only). In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's.
If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.
</p>

<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>

<img class="img_slide" src="feature-memory1.png" width="400" height="300"
     alt="Space">

<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
source-level information than GCC's trees do. This feat is accomplished through
the use of carefully designed APIs and efficient representations.</p>

<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>

<img class="img_slide" src="feature-compile2.png" width="400" height="300"
     alt="Preprocessor Speeds: GCC 4.2 vs clang-all">

<p>This slide shows how the clang preprocessor can be used to make "distcc"
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
"distcc" quickly bottlenecks on the preprocessor running on the central driver
machine, so a fast preprocessor is very useful.
Comparing the first two bars
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
of these large C++ apps by about 40% (shocking!).</p>

<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable. This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>

<p>The clean framework-based design of clang means that many things are possible
that would be very difficult in other systems, for example incremental
compilation, multithreading, intelligent caching, etc. We are only starting
to tap the full potential of the clang design.</p>


<!--=======================================================================-->
<h3><a name="expressivediags">Expressive Diagnostics</a></h3>
<!--=======================================================================-->

<p>In addition to being fast and functional, we aim to make Clang extremely
user-friendly. As far as a command-line compiler goes, this basically boils down
to making the diagnostics (error and warning messages) generated by the compiler
as useful as possible.
There are several ways that we do this, but the
most important are pinpointing exactly what is wrong in the program,
highlighting related information so that it is easy to understand at a glance,
and making the wording as clear as possible.</p>

<p>Here is one simple example that illustrates the difference between a typical
GCC and Clang diagnostic:</p>

<pre>
$ <b>gcc-4.2 -fsyntax-only t.c</b>
t.c:7: error: invalid operands to binary + (have 'int' and 'struct A')
$ <b>clang -fsyntax-only t.c</b>
t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
<span style="color:darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</span>
<span style="color:blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</span>
</pre>

<p>Here you can see that you don't even need to see the original source code to
understand what is wrong based on the Clang error: because clang prints a
caret, you know exactly <em>which</em> plus it is complaining about. The range
information highlights the left and right sides of the plus, which makes it
immediately obvious what the compiler is talking about. This is very useful for
cases involving precedence issues and in many other situations.</p>

<p>Clang diagnostics are very polished and have many features. For more
information and examples, please see the <a href="diagnostics.html">Expressive
Diagnostics</a> page.</p>

<!--=======================================================================-->
<h3><a name="gcccompat">GCC Compatibility</a></h3>
<!--=======================================================================-->

<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code.
GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>

<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99
or not.</p>

<p>As described under <a href="#conformance">conformance</a>, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>


<!--*************************************************************************-->
<h2><a name="applications">Utility and Applications</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="libraryarch">Library Based Architecture</a></h3>
<!--=======================================================================-->

<p>A major design concept for clang is its use of a library-based
architecture. In this design, various parts of the front-end can be cleanly
divided into separate libraries which can then be mixed up for different needs
and uses. In addition, the library-based approach encourages good interfaces
and makes it easier for new developers to get involved (because they only need
to understand small pieces of the big picture).</p>

<blockquote><p>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend.
This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</p></blockquote>

<p>
Currently, clang is divided into the following libraries and tool:
</p>

<ul>
<li><b>libsupport</b> - Basic support library, from LLVM.</li>
<li><b>libsystem</b> - System abstraction library, from LLVM.</li>
<li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
    file system caching for input source files.</li>
<li><b>libast</b> - Provides classes to represent the C AST, the C type system,
    builtin functions, and various helpers for analyzing and manipulating the
    AST (visitors, pretty printers, etc.).</li>
<li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
    handling, tokens, and macro expansion.</li>
<li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
    provided by the client (e.g. libsema builds ASTs) but knows nothing about
    ASTs or other client-specific data structures.</li>
<li><b>libsema</b> - Semantic Analysis. This provides a set of parser actions
    to build a standardized AST for programs.</li>
<li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
    generation.</li>
<li><b>librewrite</b> - Editing of text buffers (important for code rewriting
    transformation, like refactoring).</li>
<li><b>libanalysis</b> - Static analysis support.</li>
<li><b>clang</b> - A driver program, client of the libraries at various
    levels.</li>
</ul>

<p>As an example of the power of this library-based design: if you wanted to
build a preprocessor, you would take the Basic and Lexer libraries. If you want
an indexer, you would take the previous two and add the Parser library and
some actions for indexing.
If you want a refactoring, static analysis, or
source-to-source compiler tool, you would then add the AST building and
semantic analyzer libraries.</p>

<p>For more information about the low-level implementation details of the
various clang libraries, please see the <a href="docs/InternalsManual.html">
clang Internals Manual</a>.</p>

<!--=======================================================================-->
<h3><a name="diverseclients">Support Diverse Clients</a></h3>
<!--=======================================================================-->

<p>Clang is designed and built with many grand plans for how we can use it. The
driving force is the fact that we use C and C++ daily, and have to suffer due to
a lack of good tools available for them. We believe that the C and C++ tools
ecosystem has been significantly limited by how difficult it is to parse and
represent the source code for these languages, and we aim to rectify this
problem in clang.</p>

<p>The problem with this goal is that different clients have very different
requirements. Consider code generation, for example: a simple front-end that
parses for code generation must analyze the code for validity and emit code
in some intermediate form to pass off to an optimizer or backend. Because
validity analysis and code generation can largely be done on the fly, there is
no hard requirement that the front-end actually build up a full AST for all
the expressions and statements in the code. TCC and GCC are examples of
compilers that either build no real AST (in the former case) or build a stripped
down and simplified AST (in the latter case) because they focus primarily on
codegen.</p>

<p>On the opposite side of the spectrum, some clients (like refactoring) want
highly detailed information about the original source code and want a complete
AST to describe it with.
Refactoring wants to have information about macro
expansions, the location of every paren expression '(((x)))' vs 'x', full
position information, and much more. Further, refactoring wants to look
<em>across the whole program</em> to ensure that it is making transformations
that are safe. Making this efficient and getting this right requires a
significant amount of engineering and algorithmic work that is simply
unnecessary for a simple static compiler.</p>

<p>The beauty of the clang approach is that it does not restrict how you use it.
In particular, it is possible to use the clang preprocessor and parser to build
an extremely quick and light-weight on-the-fly code generator (similar to TCC)
that does not build an AST at all. As an intermediate step, clang supports
using the current AST generation and semantic analysis code and having a code
generation client free the AST for each function after code generation. Finally,
clang provides support for building and retaining fully-fledged ASTs, and even
supports writing them out to disk.</p>

<p>Designing the libraries with clean and simple APIs allows these high-level
policy decisions to be determined in the client, instead of forcing "one true
way" in the implementation of any of these libraries. Getting this right is
hard, and we don't always get it right the first time, but we fix any problems
when we realize we made a mistake.</p>

<!--=======================================================================-->
<h3 id="ideintegration">Integration with IDEs</h3>
<!--=======================================================================-->

<p>
We believe that Integrated Development Environments (IDEs) are a great way
to pull together various pieces of the development puzzle, and aim to make clang
work well in such an environment.
The chief advantage of IDEs is that they
typically have visibility across your entire project and are long-lived
processes, whereas stand-alone compiler tools are typically invoked on each
individual file in the project, and thus have limited scope.</p>

<p>There are many implications of this difference, but a significant one has to
do with efficiency and caching: sharing an address space across different files
in a project means that you can use intelligent caching and other techniques to
dramatically reduce analysis/compilation time.</p>

<p>A further difference between IDEs and batch compilers is that they often
impose very different requirements on the front-end: IDEs depend on high
performance in order to provide a "snappy" experience, and thus really want
techniques like "incremental compilation", "fuzzy parsing", etc. Finally, IDEs
often have very different requirements than code generation, often requiring
information that a codegen-only frontend can throw away. Clang is
specifically designed and built to capture this information.
</p>


<!--=======================================================================-->
<h3><a name="license">Use the LLVM 'BSD' License</a></h3>
<!--=======================================================================-->

<p>We actively intend for clang (and LLVM as a whole) to be used for
commercial projects, not only as a stand-alone compiler but also as a library
embedded inside a proprietary application. The BSD license is the simplest way
to allow this. We feel that the license encourages contributors to pick up the
source and work with it, and believe that those individuals and organizations
will contribute back their work if they do not want to have to maintain a fork
forever (which is time consuming and expensive when merges are involved).
Further, nobody makes money on compilers these days, but many people need them
to get bigger goals accomplished: it makes sense for everyone to work
together.</p>

<p>For more information about the LLVM/clang license, please see the <a
href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License
Description</a>.</p>



<!--*************************************************************************-->
<h2><a name="design">Internal Design and Implementation</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="real">A real-world, production quality compiler</a></h3>
<!--=======================================================================-->

<p>
Clang is designed and built by experienced compiler developers who
are increasingly frustrated with the problems that <a
href="comparison.html">existing open source compilers</a> have. Clang is
carefully and thoughtfully designed and built to provide the foundation of a
whole new generation of C/C++/Objective C development tools, and we intend for
it to be production quality.</p>

<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people.
While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>

<!--=======================================================================-->
<h3><a name="simplecode">A simple and hackable code base</a></h3>
<!--=======================================================================-->

<p>Our goal is to make it possible for anyone with a basic understanding
of compilers and working knowledge of the C/C++/ObjC languages to understand and
extend the clang source base. A large part of this falls out of our decision to
make the AST mirror the languages as closely as possible: you have your friendly
if statement, for statement, parenthesis expression, structs, unions, etc., all
represented in a simple and explicit way.</p>

<p>In addition to a simple design, we work to make the source base approachable
by commenting it well, including citations of the language standards where
appropriate, and designing the code for simplicity. Beyond that, clang offers
a set of AST dumpers, printers, and visualizers that make it easy to put code in
and see how it is represented.</p>

<!--=======================================================================-->
<h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h3>
<!--=======================================================================-->

<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser.
Because it is plain C++ code, recursive descent makes it very easy for
new developers to understand the code, easily supports the ad-hoc rules and
other strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>

<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than maintaining separate C and C++
parsers, which must be bug-fixed and maintained independently of each other.</p>

<!--=======================================================================-->
<h3><a name="conformance">Conformance with C/C++/ObjC and their
variants</a></h3>
<!--=======================================================================-->

<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
super-natural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>

<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc.). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit warnings about it (which are disabled by default,
but can optionally be mapped to either warnings or errors), allowing you to use
clang in "strict" mode if you desire.</p>

<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
C, C++'03, Objective-C 2, etc.</p>

</div>
</body>
</html>