      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2           "http://www.w3.org/TR/html4/strict.dtd">
      3 <html>
      4 <head>
      5   <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
      6   <title>Clang - Features and Goals</title>
      7   <link type="text/css" rel="stylesheet" href="menu.css">
      8   <link type="text/css" rel="stylesheet" href="content.css">
      9   <style type="text/css">
     10 </style>
     11 </head>
     12 <body>
     13 
     14 <!--#include virtual="menu.html.incl"-->
     15 
     16 <div id="content">
     17 
     18 <!--*************************************************************************-->
     19 <h1>Clang - Features and Goals</h1>
     20 <!--*************************************************************************-->
     21 
     22 <p>
     23 This page describes the <a href="index.html#goals">features and goals</a> of
     24 Clang in more detail and explains more fully what we mean by each.
     25 These features are:
     26 </p>
     27 
     28 <p>End-User Features:</p>
     29 
     30 <ul>
     31 <li><a href="#performance">Fast compiles and low memory use</a></li>
     32 <li><a href="#expressivediags">Expressive diagnostics</a></li>
     33 <li><a href="#gcccompat">GCC compatibility</a></li>
     34 </ul>
     35 
     36 <p>Utility and Applications:</p>
     37 
     38 <ul>
     39 <li><a href="#libraryarch">Library based architecture</a></li>
     40 <li><a href="#diverseclients">Support diverse clients</a></li>
     41 <li><a href="#ideintegration">Integration with IDEs</a></li>
     42 <li><a href="#license">Use the LLVM 'BSD' License</a></li>
     43 </ul>
     44 
     45 <p>Internal Design and Implementation:</p>
     46 
     47 <ul>
     48 <li><a href="#real">A real-world, production quality compiler</a></li>
     49 <li><a href="#simplecode">A simple and hackable code base</a></li>
     50 <li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
     51     and Objective C++</a></li>
     52 <li><a href="#conformance">Conformance with C/C++/ObjC and their
     53     variants</a></li>
     54 </ul>
     55 
     56 <!--*************************************************************************-->
     57 <h2><a name="enduser">End-User Features</a></h2>
     58 <!--*************************************************************************-->
     59 
     60 
     61 <!--=======================================================================-->
     62 <h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
     63 <!--=======================================================================-->
     64 
     65 <p>A major focus of our work on clang is to make it fast, light and scalable.
     66 The library-based architecture of clang makes it straightforward to time and
     67 profile the cost of each layer of the stack, and the driver has a number of
     68 options for performance analysis. Many detailed benchmarks can be found online.</p>
     69 
     70 <p>Compile time performance is important, but when using clang as an API,
     71 memory use is often even more so: the less memory the code takes, the more code
     72 you can fit into memory at a time (useful for whole program analysis tools, for
     73 example).</p>
     74 
     75 <p>In addition to being efficient when pitted head-to-head against GCC in batch
     76 mode, clang is built with a <a href="#libraryarch">library based
     77 architecture</a> that makes it relatively easy to adapt it and build new tools
     78 with it.  This means that it is often possible to apply out-of-the-box thinking
     79 and novel techniques to improve compilation in various ways.</p>
     80 
     81 
     82 <!--=======================================================================-->
     83 <h3><a name="expressivediags">Expressive Diagnostics</a></h3>
     84 <!--=======================================================================-->
     85 
     86 <p>In addition to being fast and functional, we aim to make Clang extremely user
     87 friendly.  For a command-line compiler, this basically boils down to
     88 making the diagnostics (error and warning messages) generated by the compiler
     89 as useful as possible.  There are several ways that we do this, but the
     90 most important are pinpointing exactly what is wrong in the program,
     91 highlighting related information so that it is easy to understand at a glance,
     92 and making the wording as clear as possible.</p>
     93 
     94 <p>Here is one simple example that illustrates the difference between a typical
     95 GCC and Clang diagnostic:</p>
     96 
     97 <pre>
     98   $ <b>gcc-4.2 -fsyntax-only t.c</b>
     99   t.c:7: error: invalid operands to binary + (have 'int' and 'struct A')
    100   $ <b>clang -fsyntax-only t.c</b>
    101   t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
    102   <span style="color:darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</span>
    103   <span style="color:blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</span>
    104 </pre>
    105 
    106 <p>Here you can see that you don't even need to see the original source code to
    107 understand what is wrong based on the Clang error: because Clang prints a
    108 caret, you know exactly <em>which</em> plus it is complaining about.  The range
    109 information highlights the left and right sides of the plus, which makes it
    110 immediately obvious what the compiler is talking about. This is very useful for
    111 cases involving precedence issues and many other situations.</p>
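
<p>As a rough illustration of how a caret-and-range marker line like the one
above can be produced, here is a minimal C++ sketch.  The names
<code>ColumnRange</code> and <code>renderMarkerLine</code> are hypothetical and
are not part of Clang; the sketch assumes 0-based columns and half-open
ranges.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical half-open column range [begin, end), using 0-based columns.
using ColumnRange = std::pair<std::size_t, std::size_t>;

// Builds the marker line printed under a source line: '~' under each
// highlighted range and '^' under the caret column.
std::string renderMarkerLine(std::size_t caretCol,
                             const std::vector<ColumnRange>& ranges) {
  // The line must be wide enough for the caret and for every range.
  std::size_t width = caretCol + 1;
  for (const ColumnRange& r : ranges) {
    assert(r.first <= r.second && "range must not be reversed");
    if (r.second > width) width = r.second;
  }

  std::string line(width, ' ');
  for (const ColumnRange& r : ranges)
    for (std::size_t c = r.first; c < r.second; ++c) line[c] = '~';
  line[caretCol] = '^';  // the caret wins over any range that overlaps it
  return line;
}
```

<p>Calling <code>renderMarkerLine(5, {{0, 4}, {7, 10}})</code> marks an operator
at column 5 with its left operand in columns 0 to 3 and its right operand in
columns 7 to 9.</p>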
    112 
    113 <p>Clang diagnostics are very polished and have many features.  For more 
    114 information and examples, please see the <a href="diagnostics.html">Expressive
    115 Diagnostics</a> page.</p>
    116 
    117 <!--=======================================================================-->
    118 <h3><a name="gcccompat">GCC Compatibility</a></h3>
    119 <!--=======================================================================-->
    120 
    121 <p>GCC is currently the de facto standard open source compiler, and it
    122 routinely compiles a huge volume of code.  GCC supports a huge number of
    123 extensions and features (many of which are undocumented), and a lot of
    124 code and header files depend on these features in order to build.</p>
    125 
    126 <p>While it would be nice to be able to ignore these extensions and focus on
    127 implementing the language standards to the letter, pragmatics force us to
    128 support the GCC extensions that see the most use.  Many users just want their
    129 code to compile; they don't care to argue about whether it is pedantically C99
    130 or not.</p>
    131 
    132 <p>As described under <a href="#conformance">conformance</a>, all
    133 extensions are explicitly recognized as such and marked with extension
    134 diagnostics, which can be mapped to warnings, errors, or just ignored.
    135 </p>
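
<p>The idea of mapping each extension diagnostic to a warning, an error, or
nothing can be sketched as a small lookup table.  This is only an illustration
under stated assumptions: <code>Severity</code>,
<code>ExtensionDiagnosticMap</code>, and the extension name used below are
made up for this example and do not reflect Clang's actual classes or flag
names.</p>

```cpp
#include <map>
#include <string>

// Hypothetical severities an extension diagnostic can be mapped to.
enum class Severity { Ignored, Warning, Error };

// Hypothetical per-extension mapping with a configurable fallback.
class ExtensionDiagnosticMap {
 public:
  explicit ExtensionDiagnosticMap(Severity fallback = Severity::Ignored)
      : fallback_(fallback) {}

  // Overrides the severity for one named extension.
  void map(const std::string& extension, Severity severity) {
    overrides_[extension] = severity;
  }

  // Returns the override if one exists, otherwise the fallback.
  Severity severityFor(const std::string& extension) const {
    auto it = overrides_.find(extension);
    return it == overrides_.end() ? fallback_ : it->second;
  }

 private:
  Severity fallback_;
  std::map<std::string, Severity> overrides_;
};
```

<p>A client that wants a "strict" build would map the extensions it cares about
to <code>Severity::Error</code> and leave the rest at the fallback.</p>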
    136 
    137 
    138 <!--*************************************************************************-->
    139 <h2><a name="applications">Utility and Applications</a></h2>
    140 <!--*************************************************************************-->
    141 
    142 <!--=======================================================================-->
    143 <h3><a name="libraryarch">Library Based Architecture</a></h3>
    144 <!--=======================================================================-->
    145 
    146 <p>A major design concept for clang is its use of a library-based
    147 architecture.  In this design, various parts of the front-end can be cleanly
    148 divided into separate libraries which can then be combined for different needs
    149 and uses.  In addition, the library-based approach encourages good interfaces
    150 and makes it easier for new developers to get involved (because they only need
    151 to understand small pieces of the big picture).</p>
    152 
    153 <blockquote><p>
    154 "The world needs better compiler tools, tools which are built as libraries.
    155 This design point allows reuse of the tools in new and novel ways. However,
    156 building the tools as libraries isn't enough: they must have clean APIs, be as
    157 decoupled from each other as possible, and be easy to modify/extend. This
    158 requires clean layering, decent design, and keeping the libraries independent of
    159 any specific client."</p></blockquote>
    160 
    161 <p>
    162 Currently, clang is divided into the following libraries and tool:
    163 </p>
    164 
    165 <ul>
    166 <li><b>libsupport</b> - Basic support library, from LLVM.</li>
    167 <li><b>libsystem</b> - System abstraction library, from LLVM.</li>
    168 <li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
    169     file system caching for input source files.</li>
    170 <li><b>libast</b> - Provides classes to represent the C AST, the C type system,
    171     builtin functions, and various helpers for analyzing and manipulating the
    172     AST (visitors, pretty printers, etc.).</li>
    173 <li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
    174     handling, tokens, and macro expansion.</li>
    175 <li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
    176     provided by the client (e.g. libsema builds ASTs) but knows nothing about
    177     ASTs or other client-specific data structures.</li>
    178 <li><b>libsema</b> - Semantic Analysis.  This provides a set of parser actions
    179     to build a standardized AST for programs.</li>
    180 <li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
    181     generation.</li>
    182 <li><b>librewrite</b> - Editing of text buffers (important for code rewriting
    183     transformation, like refactoring).</li>
    184 <li><b>libanalysis</b> - Static analysis support.</li>
    185 <li><b>clang</b> - A driver program, client of the libraries at various
    186     levels.</li>
    187 </ul>
    188 
    189 <p>As an example of the power of this library-based design: if you want to
    190 build a preprocessor, you take the Basic and Lexer libraries. If you want
    191 an indexer, you take the previous two and add the Parser library and
    192 some actions for indexing. If you want a refactoring, static analysis, or
    193 source-to-source compiler tool, you then add the AST building and
    194 semantic analyzer libraries.</p>
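
<p>The separation between a parser and the client-provided actions it invokes
can be sketched in a few lines of C++.  Everything here is hypothetical and far
simpler than the real libraries: <code>Actions</code>,
<code>parseIdentifiers</code>, <code>CountingActions</code>, and
<code>IndexingActions</code> are illustrative names, and the "parser" only
recognizes identifier-like words.  The point is that the parsing code knows
nothing about what each client does with the callbacks.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback interface: the parser reports what it recognized
// and leaves all policy decisions to the client.
struct Actions {
  virtual ~Actions() = default;
  virtual void onIdentifier(const std::string& name) = 0;
};

// Toy stand-in for a parser: scans the input and reports every run of
// letters, digits, and underscores that starts with a letter or underscore.
void parseIdentifiers(const std::string& input, Actions& actions) {
  std::size_t i = 0;
  while (i < input.size()) {
    unsigned char ch = static_cast<unsigned char>(input[i]);
    if (std::isalpha(ch) || ch == '_') {
      std::size_t start = i;
      while (i < input.size() &&
             (std::isalnum(static_cast<unsigned char>(input[i])) ||
              input[i] == '_'))
        ++i;
      actions.onIdentifier(input.substr(start, i - start));
    } else {
      ++i;  // skip anything that cannot start an identifier
    }
  }
}

// Lightweight client: keeps no data structure, only a count.
struct CountingActions : Actions {
  std::size_t count = 0;
  void onIdentifier(const std::string&) override { ++count; }
};

// Heavier client: retains every name, as an indexer might.
struct IndexingActions : Actions {
  std::vector<std::string> names;
  void onIdentifier(const std::string& name) override {
    names.push_back(name);
  }
};
```

<p>Both clients reuse the same <code>parseIdentifiers</code> function; only the
<code>Actions</code> object passed in differs.</p>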
    195 
    196 <p>For more information about the low-level implementation details of the
    197 various clang libraries, please see the <a href="docs/InternalsManual.html">
    198 clang Internals Manual</a>.</p>
    199 
    200 <!--=======================================================================-->
    201 <h3><a name="diverseclients">Support Diverse Clients</a></h3>
    202 <!--=======================================================================-->
    203 
    204 <p>Clang is designed and built with many grand plans for how we can use it.  The
    205 driving force is the fact that we use C and C++ daily, and have to suffer due to
    206 a lack of good tools available for them.  We believe that the C and C++ tools
    207 ecosystem has been significantly limited by how difficult it is to parse and
    208 represent the source code for these languages, and we aim to rectify this
    209 problem in clang.</p>
    210 
    211 <p>The problem with this goal is that different clients have very different
    212 requirements.  Consider code generation, for example: a simple front-end that
    213 parses for code generation must analyze the code for validity and emit code
    214 in some intermediate form to pass off to an optimizer or backend.  Because
    215 validity analysis and code generation can largely be done on the fly, there is
    216 no hard requirement that the front-end actually build up a full AST for all
    217 the expressions and statements in the code.  TCC and GCC are examples of
    218 compilers that either build no real AST (in the former case) or build a stripped
    219 down and simplified AST (in the latter case) because they focus primarily on
    220 codegen.</p>
    221 
    222 <p>On the opposite side of the spectrum, some clients (like refactoring) want
    223 highly detailed information about the original source code and want a complete
    224 AST to describe it with.  Refactoring wants to have information about macro
    225 expansions, the location of every paren expression '(((x)))' vs 'x', full
    226 position information, and much more.  Further, refactoring wants to look
    227 <em>across the whole program</em> to ensure that it is making transformations
    228 that are safe.  Making this efficient and getting this right requires a
    229 significant amount of engineering and algorithmic work that is simply
    230 unnecessary for a simple static compiler.</p>
    231 
    232 <p>The beauty of the clang approach is that it does not restrict how you use it.
    233 In particular, it is possible to use the clang preprocessor and parser to build
    234 an extremely quick and light-weight on-the-fly code generator (similar to TCC)
    235 that does not build an AST at all.   As an intermediate step, clang supports
    236 using the current AST generation and semantic analysis code and having a code 
    237 generation client free the AST for each function after code generation. Finally,
    238 clang provides support for building and retaining fully-fledged ASTs, and even
    239 supports writing them out to disk.</p>
    240 
    241 <p>Designing the libraries with clean and simple APIs allows these high-level
    242 policy decisions to be determined in the client, instead of forcing "one true
    243 way" in the implementation of any of these libraries.  Getting this right is
    244 hard, and we don't always get it right the first time, but we fix any problems
    245 when we realize we made a mistake.</p>
    246 
    247 <!--=======================================================================-->
    248 <h3 id="ideintegration">Integration with IDEs</h3>
    249 <!--=======================================================================-->
    250 
    251 <p>
    252 We believe that Integrated Development Environments (IDEs) are a great way
    253 to pull together various pieces of the development puzzle, and aim to make clang
    254 work well in such an environment.  The chief advantage of IDEs is that they
    255 typically have visibility across your entire project and are long-lived
    256 processes, whereas stand-alone compiler tools are typically invoked on each
    257 individual file in the project, and thus have limited scope.</p>
    258 
    259 <p>There are many implications of this difference, but a significant one has to
    260 do with efficiency and caching: sharing an address space across different files
    261 in a project means that you can use intelligent caching and other techniques to
    262 dramatically reduce analysis/compilation time.</p>
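
<p>The caching benefit of a long-lived process can be illustrated with a small
memoizing cache.  This is a sketch under stated assumptions, not how Clang or
any particular IDE implements it: <code>HeaderCache</code> and its
<code>Analyzer</code> callback are hypothetical, and the "analysis result" is
reduced to a single number.</p>

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache that lives as long as the process: each header is
// analyzed once and the result is reused for every file that includes it.
class HeaderCache {
 public:
  using Analyzer = std::function<std::size_t(const std::string&)>;

  explicit HeaderCache(Analyzer analyze) : analyze_(std::move(analyze)) {}

  // Returns the cached result, running the analyzer only on a miss.
  std::size_t get(const std::string& path) {
    auto it = results_.find(path);
    if (it != results_.end()) return it->second;
    ++misses_;
    std::size_t result = analyze_(path);
    results_.emplace(path, result);
    return result;
  }

  // Number of times the (expensive) analyzer actually ran.
  std::size_t misses() const { return misses_; }

 private:
  Analyzer analyze_;
  std::map<std::string, std::size_t> results_;
  std::size_t misses_ = 0;
};
```

<p>A batch compiler invoked once per file starts with an empty cache each time,
whereas a single process that handles the whole project pays the analysis cost
for a shared header only once.</p>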
    263 
    264 <p>A further difference between IDEs and batch compilers is that IDEs often
    265 impose very different requirements on the front-end: they depend on high
    266 performance in order to provide a "snappy" experience, and thus really want
    267 techniques like "incremental compilation", "fuzzy parsing", etc.  Finally, IDEs
    268 often have very different requirements than code generation, often requiring
    269 information that a codegen-only frontend can throw away.  Clang is
    270 specifically designed and built to capture this information.
    271 </p>
    272 
    273 
    274 <!--=======================================================================-->
    275 <h3><a name="license">Use the LLVM 'BSD' License</a></h3>
    276 <!--=======================================================================-->
    277 
    278 <p>We actively intend for clang (and LLVM as a whole) to be used for
    279 commercial projects, not only as a stand-alone compiler but also as a library
    280 embedded inside a proprietary application.  The BSD license is the simplest way
    281 to allow this.  We feel that the license encourages contributors to pick up the
    282 source and work with it, and believe that those individuals and organizations
    283 will contribute back their work if they do not want to have to maintain a fork
    284 forever (which is time-consuming and expensive when merges are involved).
    285 Further, nobody makes money on compilers these days, but many people need them
    286 to get bigger goals accomplished: it makes sense for everyone to work
    287 together.</p>
    288 
    289 <p>For more information about the LLVM/clang license, please see the <a 
    290 href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License 
    291 Description</a>.</p>
    292 
    293 
    294 
    295 <!--*************************************************************************-->
    296 <h2><a name="design">Internal Design and Implementation</a></h2>
    297 <!--*************************************************************************-->
    298 
    299 <!--=======================================================================-->
    300 <h3><a name="real">A real-world, production quality compiler</a></h3>
    301 <!--=======================================================================-->
    302 
    303 <p>
    304 Clang is designed and built by experienced compiler developers who
    305 are increasingly frustrated with the problems that <a 
    306 href="comparison.html">existing open source compilers</a> have.  Clang is
    307 carefully and thoughtfully designed and built to provide the foundation of a
    308 whole new generation of C/C++/Objective C development tools, and we intend for
    309 it to be production quality.</p>
    310 
    311 <p>Being a production quality compiler means many things: it means being high
    312 performance, being solid and (relatively) bug free, and it means eventually
    313 being used and depended on by a broad range of people.  While we are still in
    314 the early development stages, we strongly believe that this will become a
    315 reality.</p>
    316 
    317 <!--=======================================================================-->
    318 <h3><a name="simplecode">A simple and hackable code base</a></h3>
    319 <!--=======================================================================-->
    320 
    321 <p>Our goal is to make it possible for anyone with a basic understanding
    322 of compilers and working knowledge of the C/C++/ObjC languages to understand and
    323 extend the clang source base.  A large part of this falls out of our decision to
    324 make the AST mirror the languages as closely as possible: you have your friendly
    325 if statement, for statement, parenthesis expression, structs, unions, etc., all
    326 represented in a simple and explicit way.</p>
    327 
    328 <p>In addition to a simple design, we work to make the source base approachable
    329 by commenting it well, including citations of the language standards where
    330 appropriate, and designing the code for simplicity.  Beyond that, clang offers
    331 a set of AST dumpers, printers, and visualizers that make it easy to put code in
    332 and see how it is represented.</p>
    333 
    334 <!--=======================================================================-->
    335 <h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
    336 and Objective C++</a></h3>
    337 <!--=======================================================================-->
    338 
    339 <p>Clang is the "C Language Family Front-end", which means we intend to support
    340 the most popular members of the C family.  We are convinced that the right
    341 parsing technology for this class of languages is a hand-built recursive-descent
    342 parser.  Because it is plain C++ code, a recursive-descent parser is easy for
    343 new developers to understand, easily supports the ad-hoc rules and other
    344 strange hacks required by C/C++, and makes it straightforward to implement
    345 excellent diagnostics and error recovery.</p>
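
<p>To show what "plain C++ code" means for a hand-built recursive-descent
parser, here is a minimal sketch for a toy grammar of sums and products.  It is
not taken from Clang: <code>ExprParser</code> and its member names are
hypothetical, the grammar is deliberately tiny, and integer overflow is not
handled.  Each grammar rule becomes one function, and the current position
doubles as the location to report when parsing fails.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical recursive-descent parser and evaluator for the toy grammar:
//   sum     := product ('+' product)*
//   product := primary ('*' primary)*
//   primary := number | '(' sum ')'
class ExprParser {
 public:
  explicit ExprParser(std::string text) : text_(std::move(text)) {}

  // Parses the whole input. On failure returns false, and errorColumn()
  // gives the 1-based column where parsing stopped.
  bool parse(long& result) {
    pos_ = 0;
    if (!parseSum(result)) return false;
    skipSpaces();
    return pos_ == text_.size();
  }

  std::size_t errorColumn() const { return pos_ + 1; }

 private:
  void skipSpaces() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_])))
      ++pos_;
  }

  // Consumes the character c if it is next (after spaces).
  bool consume(char c) {
    skipSpaces();
    if (pos_ < text_.size() && text_[pos_] == c) {
      ++pos_;
      return true;
    }
    return false;
  }

  bool parseSum(long& value) {
    if (!parseProduct(value)) return false;
    while (consume('+')) {
      long rhs = 0;
      if (!parseProduct(rhs)) return false;
      value += rhs;
    }
    return true;
  }

  bool parseProduct(long& value) {
    if (!parsePrimary(value)) return false;
    while (consume('*')) {
      long rhs = 0;
      if (!parsePrimary(rhs)) return false;
      value *= rhs;
    }
    return true;
  }

  bool parsePrimary(long& value) {
    if (consume('(')) return parseSum(value) && consume(')');
    std::size_t start = pos_;  // consume() already skipped leading spaces
    value = 0;
    while (pos_ < text_.size() &&
           std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
      value = value * 10 + (text_[pos_] - '0');
      ++pos_;
    }
    return pos_ != start;  // fail if no digits were read
  }

  std::string text_;
  std::size_t pos_ = 0;
};
```

<p>Because the failure position is simply the parser's current offset, pointing
a diagnostic at the offending token requires no extra bookkeeping in this
sketch.</p>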
    346 
    347 <p>We believe that implementing C/C++/ObjC in a single unified parser makes the
    348 end result easier to maintain and evolve than maintaining separate C and C++
    349 parsers, which must be bug-fixed and maintained independently of each other.</p>
    350 
    351 <!--=======================================================================-->
    352 <h3><a name="conformance">Conformance with C/C++/ObjC and their
    353  variants</a></h3>
    354 <!--=======================================================================-->
    355 
    356 <p>When you start work on implementing a language, you find out that there is a
    357 huge gap between how the language works and how most people understand it to
    358 work.  This gap is the difference between a normal programmer and a (scary?
    359 super-natural?) "language lawyer", who knows the ins and outs of the language
    360 and can grok standardese with ease.</p>
    361 
    362 <p>In practice, being conformant with the languages means that we aim to support
    363 the full language, including the dark and dusty corners (like trigraphs,
    364 preprocessor arcana, C99 VLAs, etc.).  Where we support extensions above and
    365 beyond what the standard officially allows, we make an effort to explicitly call
    366 this out in the code and emit diagnostics about it (which are disabled by default,
    367 but can optionally be mapped to either warnings or errors), allowing you to use
    368 clang in "strict" mode if you desire.</p>
    369 
    370 <p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
    371 C, C++03, Objective-C 2, etc.</p>
    372 
    373 </div>
    374 </body>
    375 </html>
    376