      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2           "http://www.w3.org/TR/html4/strict.dtd">
      3 <html>
      4 <head>
      5   <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
      6   <title>Clang - Features and Goals</title>
      7   <link type="text/css" rel="stylesheet" href="menu.css" />
      8   <link type="text/css" rel="stylesheet" href="content.css" />
      9   <style type="text/css">
     10 </style>
     11 </head>
     12 <body>
     13 
     14 <!--#include virtual="menu.html.incl"-->
     15 
     16 <div id="content">
     17 
     18 <!--*************************************************************************-->
     19 <h1>Clang - Features and Goals</h1>
     20 <!--*************************************************************************-->
     21 
     22 <p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean.
     25 These features are:
     26 </p>
     27 
     28 <p>End-User Features:</p>
     29 
     30 <ul>
     31 <li><a href="#performance">Fast compiles and low memory use</a></li>
     32 <li><a href="#expressivediags">Expressive diagnostics</a></li>
     33 <li><a href="#gcccompat">GCC compatibility</a></li>
     34 </ul>
     35 
     36 <p>Utility and Applications:</p>
     37 
     38 <ul>
     39 <li><a href="#libraryarch">Library based architecture</a></li>
     40 <li><a href="#diverseclients">Support diverse clients</a></li>
     41 <li><a href="#ideintegration">Integration with IDEs</a></li>
     42 <li><a href="#license">Use the LLVM 'BSD' License</a></li>
     43 </ul>
     44 
     45 <p>Internal Design and Implementation:</p>
     46 
     47 <ul>
     48 <li><a href="#real">A real-world, production quality compiler</a></li>
     49 <li><a href="#simplecode">A simple and hackable code base</a></li>
     50 <li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
     51     and Objective C++</a></li>
     52 <li><a href="#conformance">Conformance with C/C++/ObjC and their
     53     variants</a></li>
     54 </ul>
     55 
     56 <!--*************************************************************************-->
     57 <h2><a name="enduser">End-User Features</a></h2>
     58 <!--*************************************************************************-->
     59 
     60 
     61 <!--=======================================================================-->
     62 <h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
     63 <!--=======================================================================-->
     64 
     65 <p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
     67 profile the cost of each layer of the stack, and the driver has a number of
     68 options for performance analysis.</p>
     69 
<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than gcc and uses less memory.  For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>
     73 
     74 <img class="img_slide" src="feature-compile1.png" width="400" height="300" />
     75 
     76 <p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
     77 declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
     79 talk</a> for more information). It is also #include'd into almost every C file
     80 in a GUI app on the Mac, so its compile time is very important.</p>
     81 
     82 <p>From the slide above, you can see that we can measure the time to preprocess
     83 the file independently from the time to parse it, and independently from the
     84 time to build the ASTs for the code.  GCC doesn't provide a way to measure the
     85 parser without AST building (it only provides -fsyntax-only).  In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's.  If you have
     88 sources that do not depend as heavily on the preprocessor (or if you 
     89 use Precompiled Headers) you may see a much bigger speedup from clang.
     90 </p>
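<p>The arithmetic behind that last remark can be sketched roughly as follows.
This is a back-of-the-envelope model, not a measurement: it assumes that "40%
faster" means the preprocessor takes 0.6x the time, that "~4x faster" means the
parser and AST builder take 0.25x the time, and that no other phase matters.
The function and parameter names are made up for illustration.</p>

```python
def overall_speedup(preprocess_fraction,
                    preprocess_time_ratio=0.6,
                    parse_time_ratio=0.25):
    """Estimate the end-to-end speedup of a front-end run.

    preprocess_fraction is the share of the baseline compiler's time that
    is spent preprocessing (0.0 to 1.0); the rest is parsing and AST
    building.  The two ratios are "new time / old time" for each phase
    and are assumptions derived from the figures quoted in the text.
    """
    new_time = (preprocess_fraction * preprocess_time_ratio
                + (1.0 - preprocess_fraction) * parse_time_ratio)
    return 1.0 / new_time


# Print the estimate for a few hypothetical preprocessing shares.
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"preprocessing share {share:.2f}: {overall_speedup(share):.2f}x")
```

<p>Under these assumptions, the less time a file spends in the preprocessor,
the closer the overall speedup gets to the 4x of the parser and AST builder;
a 2.5x overall figure would correspond to the baseline spending roughly 3/7 of
its time preprocessing.</p>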
     91 
<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>
     96 
     97 <img class="img_slide" src="feature-memory1.png" width="400" height="300" />
     98 
     99 <p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
    100 than GCC's syntax trees, despite the fact that clang's ASTs capture far more 
    101 source-level information than GCC's trees do.  This feat is accomplished through
    102 the use of carefully designed APIs and efficient representations.</p>
    103 
    104 <p>In addition to being efficient when pitted head-to-head against GCC in batch
    105 mode, clang is built with a <a href="#libraryarch">library based 
    106 architecture</a> that makes it relatively easy to adapt it and build new tools
    107 with it.  This means that it is often possible to apply out-of-the-box thinking
    108 and novel techniques to improve compilation in various ways.</p> 
    109   
    110 <img class="img_slide" src="feature-compile2.png" width="400" height="300" />
    111 
    112 <p>This slide shows how the clang preprocessor can be used to make "distcc"
    113 parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
    114 "distcc" quickly bottlenecks on the preprocessor running on the central driver
    115 machine, so a fast preprocessor is very useful.  Comparing the first two bars
    116 of each group shows how a ~40% faster preprocessor can reduce preprocessing time
    117 of these large C++ apps by about 40% (shocking!).</p>
    118 
    119 <p>The third bar on the slide is the interesting part: it shows how trivial
    120 caching of file system accesses across invocations of the preprocessor allows 
    121 clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable.  This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>
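<p>The idea of caching file system accesses can be sketched as below. This is
an illustrative Python sketch, not clang's actual code: the
<code>StatCache</code> class and the <code>find_header</code> helper are
hypothetical names, and the sketch assumes that files do not change while the
cache is alive.</p>

```python
import os


class StatCache:
    """Memoizes stat results so repeated lookups skip the kernel."""

    def __init__(self, stat_fn=os.stat):
        self._stat_fn = stat_fn
        self._entries = {}  # path -> stat result, or None if the path is missing
        self.misses = 0     # number of real stat calls that were made

    def stat(self, path):
        if path not in self._entries:
            self.misses += 1
            try:
                self._entries[path] = self._stat_fn(path)
            except OSError:
                # Cache failures too: an include search probes many
                # directories in which the header does not exist.
                self._entries[path] = None
        return self._entries[path]

    def exists(self, path):
        return self.stat(path) is not None


def find_header(name, search_dirs, cache):
    """Return the first existing path for `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if cache.exists(candidate):
            return candidate
    return None
```

<p>Sharing one such cache across many preprocessor invocations means that each
distinct path costs at most one system call, however many files include
it.</p>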
    125 
    126 <p>The clean framework-based design of clang means that many things are possible
    127 that would be very difficult in other systems, for example incremental
    128 compilation, multithreading, intelligent caching, etc.  We are only starting
    129 to tap the full potential of the clang design.</p>
    130 
    131 
    132 <!--=======================================================================-->
    133 <h3><a name="expressivediags">Expressive Diagnostics</a></h3>
    134 <!--=======================================================================-->
    135 
<p>In addition to being fast and functional, we aim to make Clang extremely user
friendly.  For a command-line compiler, this basically boils down to
making the diagnostics (error and warning messages) generated by the compiler
as useful as possible.  There are several ways that we do this, but the
most important are pinpointing exactly what is wrong in the program,
highlighting related information so that it is easy to understand at a glance,
and making the wording as clear as possible.</p>
    143 
    144 <p>Here is one simple example that illustrates the difference between a typical
    145 GCC and Clang diagnostic:</p>
    146 
    147 <pre>
    148   $ <b>gcc-4.2 -fsyntax-only t.c</b>
    149   t.c:7: error: invalid operands to binary + (have 'int' and 'struct A')
    150   $ <b>clang -fsyntax-only t.c</b>
    151   t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
    152   <font color="darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</font>
    153   <font color="blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</font>
    154 </pre>
    155 
<p>Here you can see that you don't even need to see the original source code to
understand what is wrong based on the Clang error: because clang prints a
caret, you know exactly <em>which</em> plus it is complaining about.  The range
information highlights the left and right sides of the plus, making it
immediately obvious what the compiler is talking about.  This is very useful for
cases involving precedence issues and many other situations.</p>
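<p>How a caret line like the one in the example can be derived from a column
and two source ranges is sketched below. This is an illustrative Python sketch,
not Clang's implementation: <code>render_diagnostic</code> and its parameters
are hypothetical names, and columns are assumed to be 1-based with inclusive
ranges.</p>

```python
def render_diagnostic(filename, line_no, column, message, source_line, ranges=()):
    """Format an error with a caret under `column` and '~' under each range.

    Columns are 1-based; every range is an inclusive (start, end) column pair.
    """
    width = max([len(source_line), column] + [end for _, end in ranges])
    marks = [" "] * width
    for start, end in ranges:
        for col in range(start, end + 1):
            marks[col - 1] = "~"
    marks[column - 1] = "^"  # the caret wins if it overlaps a range
    header = f"{filename}:{line_no}:{column}: error: {message}"
    return "\n".join([header, source_line, "".join(marks).rstrip()])


# The source line and column from the t.c example; the two ranges cover
# the left and right operands of the '+' at column 39.
line = "  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);"
print(render_diagnostic(
    "t.c", 7, 39,
    "invalid operands to binary expression ('int' and 'struct A')",
    line, ranges=[(24, 37), (41, 45)]))
```

<p>The point of the sketch is that the front-end only has to keep a location
for the operator and a range for each operand; the rendering itself is
simple.</p>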
    162 
    163 <p>Clang diagnostics are very polished and have many features.  For more 
    164 information and examples, please see the <a href="diagnostics.html">Expressive
    165 Diagnostics</a> page.</p>
    166 
    167 <!--=======================================================================-->
    168 <h3><a name="gcccompat">GCC Compatibility</a></h3>
    169 <!--=======================================================================-->
    170 
<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code.  GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>
    175 
    176 <p>While it would be nice to be able to ignore these extensions and focus on
    177 implementing the language standards to the letter, pragmatics force us to
    178 support the GCC extensions that see the most use.  Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99
    180 or not.</p>
    181 
<p>All extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored (see
<a href="#conformance">conformance</a>).
</p>
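<p>The mapping of extension diagnostics can be pictured roughly like this. The
sketch is in Python and is not Clang's diagnostic machinery:
<code>Mapping</code>, <code>DiagnosticEngine</code> and
<code>report_extension</code> are hypothetical names, and the message wording
is invented. The default of ignoring extensions follows the statement on this
page that these diagnostics are disabled by default.</p>

```python
from enum import Enum


class Mapping(Enum):
    IGNORE = "ignore"
    WARNING = "warning"
    ERROR = "error"


class DiagnosticEngine:
    """Collects diagnostics; extension diagnostics obey a client-chosen mapping."""

    def __init__(self, extension_mapping=Mapping.IGNORE):
        self.extension_mapping = extension_mapping
        self.messages = []
        self.error_count = 0

    def report_extension(self, location, description):
        """Report use of a language extension; returns the text or None."""
        if self.extension_mapping is Mapping.IGNORE:
            return None
        if self.extension_mapping is Mapping.ERROR:
            self.error_count += 1
        level = self.extension_mapping.value
        text = f"{location}: {level}: {description} is a language extension"
        self.messages.append(text)
        return text


# The same source construct, seen under the three mappings.
for mapping in Mapping:
    engine = DiagnosticEngine(mapping)
    print(mapping.name, "->", engine.report_extension("t.c:3:5", "statement expression"))
```

<p>Because the front-end always recognizes the extension and only the mapping
changes, the same parser can serve both permissive and strict clients.</p>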
    186 
    187 
    188 <!--*************************************************************************-->
    189 <h2><a name="applications">Utility and Applications</a></h2>
    190 <!--*************************************************************************-->
    191 
    192 <!--=======================================================================-->
    193 <h3><a name="libraryarch">Library Based Architecture</a></h3>
    194 <!--=======================================================================-->
    195 
    196 <p>A major design concept for clang is its use of a library-based
    197 architecture.  In this design, various parts of the front-end can be cleanly
    198 divided into separate libraries which can then be mixed up for different needs
    199 and uses.  In addition, the library-based approach encourages good interfaces
    200 and makes it easier for new developers to get involved (because they only need
    201 to understand small pieces of the big picture).</p>
    202 
    203 <blockquote>
    204 "The world needs better compiler tools, tools which are built as libraries.
    205 This design point allows reuse of the tools in new and novel ways. However,
    206 building the tools as libraries isn't enough: they must have clean APIs, be as
    207 decoupled from each other as possible, and be easy to modify/extend. This
    208 requires clean layering, decent design, and keeping the libraries independent of
    209 any specific client."</blockquote>
    210 
    211 <p>
    212 Currently, clang is divided into the following libraries and tool:
    213 </p>
    214 
    215 <ul>
    216 <li><b>libsupport</b> - Basic support library, from LLVM.</li>
    217 <li><b>libsystem</b> - System abstraction library, from LLVM.</li>
    218 <li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
    219     file system caching for input source files.</li>
    220 <li><b>libast</b> - Provides classes to represent the C AST, the C type system,
    221     builtin functions, and various helpers for analyzing and manipulating the
    222     AST (visitors, pretty printers, etc).</li>
    223 <li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
    224     handling, tokens, and macro expansion.</li>
    225 <li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
    226     provided by the client (e.g. libsema builds ASTs) but knows nothing about
    227     ASTs or other client-specific data structures.</li>
    228 <li><b>libsema</b> - Semantic Analysis.  This provides a set of parser actions
    229     to build a standardized AST for programs.</li>
    230 <li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
    231     generation.</li>
    232 <li><b>librewrite</b> - Editing of text buffers (important for code rewriting
    233     transformation, like refactoring).</li>
    234 <li><b>libanalysis</b> - Static analysis support.</li>
    235 <li><b>clang</b> - A driver program, client of the libraries at various
    236     levels.</li>
    237 </ul>
    238 
<p>As an example of the power of this library-based design: if you want to
build a preprocessor, you take the Basic and Lexer libraries. If you want
an indexer, you take the previous two and add the Parser library and
some actions for indexing. If you want a refactoring, static analysis, or
source-to-source compiler tool, you then add the AST building and
semantic analyzer libraries.</p>
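<p>The split between the parser and client-provided actions can be illustrated
with a toy sketch. It is written in Python, uses a made-up "type name;"
grammar, and all class and function names are hypothetical; it only shows the
shape of the idea, not the real libparse interface.</p>

```python
class Actions:
    """Callback interface; the default implementation ignores every event."""

    def on_declaration(self, type_name, identifier, position):
        return None


class IndexingActions(Actions):
    """An indexer client: records where each name was declared, builds no AST."""

    def __init__(self):
        self.index = {}

    def on_declaration(self, type_name, identifier, position):
        self.index[identifier] = position


class AstBuildingActions(Actions):
    """A semantic-analysis-like client: builds a node for each declaration."""

    def __init__(self):
        self.decls = []

    def on_declaration(self, type_name, identifier, position):
        node = ("VarDecl", type_name, identifier)
        self.decls.append(node)
        return node


def parse_declarations(source, actions):
    """Parse 'type name;' statements, reporting each one to `actions`.

    The parser never inspects what the actions return, so it knows
    nothing about ASTs or any other client data structure.
    """
    count = 0
    for position, statement in enumerate(source.split(";")):
        words = statement.split()
        if not words:
            continue
        if len(words) != 2:
            raise SyntaxError(f"expected 'type name' in statement {position}")
        actions.on_declaration(words[0], words[1], position)
        count += 1
    return count
```

<p>The same <code>parse_declarations</code> function serves a syntax-only
check (the base <code>Actions</code>), an indexer, and an AST builder; which
one runs is decided entirely by the client.</p>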
    245 
    246 <p>For more information about the low-level implementation details of the
    247 various clang libraries, please see the <a href="docs/InternalsManual.html">
    248 clang Internals Manual</a>.</p>
    249 
    250 <!--=======================================================================-->
    251 <h3><a name="diverseclients">Support Diverse Clients</a></h3>
    252 <!--=======================================================================-->
    253 
    254 <p>Clang is designed and built with many grand plans for how we can use it.  The
driving force is the fact that we use C and C++ daily, and have to suffer due to
a lack of good tools available for them.  We believe that the C and C++ tools
    257 ecosystem has been significantly limited by how difficult it is to parse and
    258 represent the source code for these languages, and we aim to rectify this
    259 problem in clang.</p>
    260 
    261 <p>The problem with this goal is that different clients have very different
    262 requirements.  Consider code generation, for example: a simple front-end that
    263 parses for code generation must analyze the code for validity and emit code
in some intermediate form to pass off to an optimizer or backend.  Because
validity analysis and code generation can largely be done on the fly, there is
no hard requirement that the front-end actually build up a full AST for all
the expressions and statements in the code.  TCC and GCC are examples of
compilers that either build no real AST (in the former case) or build a stripped
down and simplified AST (in the latter case) because they focus primarily on
    270 codegen.</p>
    271 
    272 <p>On the opposite side of the spectrum, some clients (like refactoring) want
    273 highly detailed information about the original source code and want a complete
    274 AST to describe it with.  Refactoring wants to have information about macro
    275 expansions, the location of every paren expression '(((x)))' vs 'x', full
    276 position information, and much more.  Further, refactoring wants to look
    277 <em>across the whole program</em> to ensure that it is making transformations
that are safe.  Making this efficient and getting this right requires a
significant amount of engineering and algorithmic work that is simply
unnecessary for a simple static compiler.</p>
    281 
    282 <p>The beauty of the clang approach is that it does not restrict how you use it.
    283 In particular, it is possible to use the clang preprocessor and parser to build
    284 an extremely quick and light-weight on-the-fly code generator (similar to TCC)
    285 that does not build an AST at all.   As an intermediate step, clang supports
    286 using the current AST generation and semantic analysis code and having a code 
    287 generation client free the AST for each function after code generation. Finally,
    288 clang provides support for building and retaining fully-fledged ASTs, and even
    289 supports writing them out to disk.</p>
    290 
    291 <p>Designing the libraries with clean and simple APIs allows these high-level
    292 policy decisions to be determined in the client, instead of forcing "one true
    293 way" in the implementation of any of these libraries.  Getting this right is
    294 hard, and we don't always get it right the first time, but we fix any problems
    295 when we realize we made a mistake.</p>
    296 
    297 <!--=======================================================================-->
<h3><a name="ideintegration">Integration with IDEs</a></h3>
    299 <!--=======================================================================-->
    300 
    301 <p>
We believe that Integrated Development Environments (IDEs) are a great way
to pull together various pieces of the development puzzle, and aim to make clang
work well in such an environment.  The chief advantage of IDEs is that they
    305 typically have visibility across your entire project and are long-lived
    306 processes, whereas stand-alone compiler tools are typically invoked on each
    307 individual file in the project, and thus have limited scope.</p>
    308 
    309 <p>There are many implications of this difference, but a significant one has to
do with efficiency and caching: sharing an address space across different files
in a project means that you can use intelligent caching and other techniques to
    312 dramatically reduce analysis/compilation time.</p>
    313 
<p>A further difference between IDEs and batch compilers is that IDEs often
impose very different requirements on the front-end: they depend on high
performance in order to provide a "snappy" experience, and thus really want
techniques like "incremental compilation", "fuzzy parsing", etc.  Finally, IDEs
have very different requirements than code generation, often requiring
information that a codegen-only frontend can throw away.  Clang is
specifically designed and built to capture this information.
    321 </p>
    322 
    323 
    324 <!--=======================================================================-->
    325 <h3><a name="license">Use the LLVM 'BSD' License</a></h3>
    326 <!--=======================================================================-->
    327 
    328 <p>We actively intend for clang (and LLVM as a whole) to be used for
    329 commercial projects, and the BSD license is the simplest way to allow this.  We
    330 feel that the license encourages contributors to pick up the source and work
    331 with it, and believe that those individuals and organizations will contribute
    332 back their work if they do not want to have to maintain a fork forever (which is
    333 time consuming and expensive when merges are involved).  Further, nobody makes
    334 money on compilers these days, but many people need them to get bigger goals
    335 accomplished: it makes sense for everyone to work together.</p>
    336 
<p>For more information about the LLVM/clang license, please see the <a
href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License
Description</a>.</p>
    340 
    341 
    342 
    343 <!--*************************************************************************-->
    344 <h2><a name="design">Internal Design and Implementation</a></h2>
    345 <!--*************************************************************************-->
    346 
    347 <!--=======================================================================-->
    348 <h3><a name="real">A real-world, production quality compiler</a></h3>
    349 <!--=======================================================================-->
    350 
    351 <p>
    352 Clang is designed and built by experienced compiler developers who
    353 are increasingly frustrated with the problems that <a 
    354 href="comparison.html">existing open source compilers</a> have.  Clang is
    355 carefully and thoughtfully designed and built to provide the foundation of a
    356 whole new generation of C/C++/Objective C development tools, and we intend for
    357 it to be production quality.</p>
    358 
    359 <p>Being a production quality compiler means many things: it means being high
    360 performance, being solid and (relatively) bug free, and it means eventually
    361 being used and depended on by a broad range of people.  While we are still in
    362 the early development stages, we strongly believe that this will become a
    363 reality.</p>
    364 
    365 <!--=======================================================================-->
    366 <h3><a name="simplecode">A simple and hackable code base</a></h3>
    367 <!--=======================================================================-->
    368 
    369 <p>Our goal is to make it possible for anyone with a basic understanding
    370 of compilers and working knowledge of the C/C++/ObjC languages to understand and
    371 extend the clang source base.  A large part of this falls out of our decision to
    372 make the AST mirror the languages as closely as possible: you have your friendly
if statement, for statement, parenthesis expression, structs, unions, etc., all
    374 represented in a simple and explicit way.</p>
    375 
    376 <p>In addition to a simple design, we work to make the source base approachable
    377 by commenting it well, including citations of the language standards where
    378 appropriate, and designing the code for simplicity.  Beyond that, clang offers
    379 a set of AST dumpers, printers, and visualizers that make it easy to put code in
    380 and see how it is represented.</p>
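<p>To make "an AST that mirrors the language" concrete, here is a small sketch
in which parentheses are kept as explicit nodes and a dumper prints the tree.
It is written in Python with hypothetical node names (<code>DeclRef</code>,
<code>Paren</code>, <code>Binary</code>); clang's real AST classes and dump
format differ.</p>

```python
from dataclasses import dataclass


@dataclass
class DeclRef:
    name: str


@dataclass
class Paren:
    inner: object


@dataclass
class Binary:
    op: str
    lhs: object
    rhs: object


def dump(node, depth=0):
    """Return one indented line per node, outermost node first."""
    pad = "  " * depth
    if isinstance(node, DeclRef):
        return [f"{pad}DeclRef '{node.name}'"]
    if isinstance(node, Paren):
        return [f"{pad}Paren"] + dump(node.inner, depth + 1)
    if isinstance(node, Binary):
        return ([f"{pad}Binary '{node.op}'"]
                + dump(node.lhs, depth + 1)
                + dump(node.rhs, depth + 1))
    raise TypeError(f"unknown node: {node!r}")


def ignore_parens(node):
    """Strip redundant parentheses for clients that do not care about them."""
    while isinstance(node, Paren):
        node = node.inner
    return node


tree = Binary("+", Paren(Paren(DeclRef("x"))), DeclRef("y"))  # ((x)) + y
print("\n".join(dump(tree)))
```

<p>A refactoring client can rely on the <code>Paren</code> nodes to reproduce
the source exactly, while a code generator can call
<code>ignore_parens</code> and never see them.</p>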
    381 
    382 <!--=======================================================================-->
    383 <h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
    384 and Objective C++</a></h3>
    385 <!--=======================================================================-->
    386 
    387 <p>Clang is the "C Language Family Front-end", which means we intend to support
    388 the most popular members of the C family.  We are convinced that the right
    389 parsing technology for this class of languages is a hand-built recursive-descent
parser.  Because it is plain C++ code, a recursive-descent parser is very easy
for new developers to understand, easily supports ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>
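<p>As a rough illustration of why hand-written recursive descent lends itself
to precise diagnostics, here is a toy parser for integer expressions. It is a
Python sketch rather than clang's C++ code; the grammar, the
<code>Parser</code> class and the <code>ParseError</code> type are all
invented for this example.</p>

```python
class ParseError(Exception):
    def __init__(self, column, message):
        super().__init__(f"column {column}: {message}")
        self.column = column  # 1-based column of the offending character


class Parser:
    """expr   := term (('+' | '-') term)*
    term   := factor ('*' factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _peek(self):
        # Skip whitespace, then return the next character ('' at end of input).
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse(self):
        value = self.expr()
        if self._peek():
            raise ParseError(self.pos + 1, f"unexpected {self._peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self._peek() in ("+", "-"):
            op = self.text[self.pos]
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self._peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        ch = self._peek()
        if ch == "(":
            open_column = self.pos + 1
            self.pos += 1
            value = self.expr()
            if self._peek() != ")":
                # Each grammar rule is a function, so the error can name
                # exactly what was expected and where the construct began.
                raise ParseError(
                    self.pos + 1,
                    f"expected ')' to match '(' at column {open_column}")
            self.pos += 1
            return value
        if ch.isdigit():
            start = self.pos
            while self.pos < len(self.text) and self.text[self.pos].isdigit():
                self.pos += 1
            return int(self.text[start:self.pos])
        raise ParseError(self.pos + 1, "expected a number or '('")


print(Parser("2 + 3 * (4 - 1)").parse())
```

<p>Every rule is an ordinary function, so ad-hoc checks, targeted error
messages, and recovery logic can be added at the exact point in the grammar
where they apply.</p>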
    394 
<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than maintaining separate C and C++
parsers, which must be bug-fixed and maintained independently of each other.</p>
    398 
    399 <!--=======================================================================-->
    400 <h3><a name="conformance">Conformance with C/C++/ObjC and their
    401  variants</a></h3>
    402 <!--=======================================================================-->
    403 
    404 <p>When you start work on implementing a language, you find out that there is a
    405 huge gap between how the language works and how most people understand it to
    406 work.  This gap is the difference between a normal programmer and a (scary?
    407 super-natural?) "language lawyer", who knows the ins and outs of the language
    408 and can grok standardese with ease.</p>
    409 
    410 <p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc.).  Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit diagnostics about it (which are disabled by default,
    415 but can optionally be mapped to either warnings or errors), allowing you to use
    416 clang in "strict" mode if you desire.</p>
    417 
    418 <p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
    419 C, C++'03, Objective-C 2, etc.</p>
    420 
    421 </div>
    422 </body>
    423 </html>
    424