      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2           "http://www.w3.org/TR/html4/strict.dtd">
      3 <html>
      4 <head>
      5 <title>"Clang" CFE Internals Manual</title>
      6 <link type="text/css" rel="stylesheet" href="../menu.css">
      7 <link type="text/css" rel="stylesheet" href="../content.css">
      8 <style type="text/css">
      9 td {
     10 	vertical-align: top;
     11 }
     12 </style>
     13 </head>
     14 <body>
     15 
     16 <!--#include virtual="../menu.html.incl"-->
     17 
     18 <div id="content">
     19 
     20 <h1>"Clang" CFE Internals Manual</h1>
     21 
     22 <ul>
     23 <li><a href="#intro">Introduction</a></li>
     24 <li><a href="#libsupport">LLVM Support Library</a></li>
     25 <li><a href="#libbasic">The Clang 'Basic' Library</a>
     26   <ul>
     27   <li><a href="#Diagnostics">The Diagnostics Subsystem</a></li>
     28   <li><a href="#SourceLocation">The SourceLocation and SourceManager
     29       classes</a></li>
     30   <li><a href="#SourceRange">SourceRange and CharSourceRange</a></li>
     31   </ul>
     32 </li>
     33 <li><a href="#libdriver">The Driver Library</a>
     34 </li>
<li><a href="#pch">Precompiled Headers</a></li>
     36 <li><a href="#libfrontend">The Frontend Library</a>
     37 </li>
     38 <li><a href="#liblex">The Lexer and Preprocessor Library</a>
     39   <ul>
     40   <li><a href="#Token">The Token class</a></li>
     41   <li><a href="#Lexer">The Lexer class</a></li>
     42   <li><a href="#AnnotationToken">Annotation Tokens</a></li>
     43   <li><a href="#TokenLexer">The TokenLexer class</a></li>
     44   <li><a href="#MultipleIncludeOpt">The MultipleIncludeOpt class</a></li>
     45   </ul>
     46 </li>
     47 <li><a href="#libparse">The Parser Library</a>
     48 </li>
     49 <li><a href="#libast">The AST Library</a>
     50   <ul>
     51   <li><a href="#Type">The Type class and its subclasses</a></li>
     52   <li><a href="#QualType">The QualType class</a></li>
     53   <li><a href="#DeclarationName">Declaration names</a></li>
     54   <li><a href="#DeclContext">Declaration contexts</a>
     55     <ul>
     56       <li><a href="#Redeclarations">Redeclarations and Overloads</a></li>
     57       <li><a href="#LexicalAndSemanticContexts">Lexical and Semantic
     58       Contexts</a></li>
     59       <li><a href="#TransparentContexts">Transparent Declaration Contexts</a></li>
     60       <li><a href="#MultiDeclContext">Multiply-Defined Declaration Contexts</a></li>
     61     </ul>
     62   </li>
     63   <li><a href="#CFG">The CFG class</a></li>
     64   <li><a href="#Constants">Constant Folding in the Clang AST</a></li>
     65   </ul>
     66 </li>
     67 <li><a href="#Howtos">Howto guides</a>
     68   <ul>
     69     <li><a href="#AddingAttributes">How to add an attribute</a></li>
     70     <li><a href="#AddingExprStmt">How to add a new expression or statement</a></li>
     71   </ul>
     72 </li>
     73 </ul>
     74 
     75 
     76 <!-- ======================================================================= -->
     77 <h2 id="intro">Introduction</h2>
     78 <!-- ======================================================================= -->
     79 
     80 <p>This document describes some of the more important APIs and internal design
     81 decisions made in the Clang C front-end.  The purpose of this document is to
     82 both capture some of this high level information and also describe some of the
     83 design decisions behind it.  This is meant for people interested in hacking on
     84 Clang, not for end-users.  The description below is categorized by
     85 libraries, and does not describe any of the clients of the libraries.</p>
     86 
     87 <!-- ======================================================================= -->
     88 <h2 id="libsupport">LLVM Support Library</h2>
     89 <!-- ======================================================================= -->
     90 
     91 <p>The LLVM libsupport library provides many underlying libraries and
     92 <a href="http://llvm.org/docs/ProgrammersManual.html">data-structures</a>,
     93 including command line option processing, various containers and a system
     94 abstraction layer, which is used for file system access.</p>
     95 
     96 <!-- ======================================================================= -->
     97 <h2 id="libbasic">The Clang 'Basic' Library</h2>
     98 <!-- ======================================================================= -->
     99 
    100 <p>This library certainly needs a better name.  The 'basic' library contains a
    101 number of low-level utilities for tracking and manipulating source buffers,
    102 locations within the source buffers, diagnostics, tokens, target abstraction,
    103 and information about the subset of the language being compiled for.</p>
    104 
<p>Part of this infrastructure is specific to C (such as the TargetInfo class);
other parts could be reused for other non-C-based languages (SourceLocation,
    107 SourceManager, Diagnostics, FileManager).  When and if there is future demand
    108 we can figure out if it makes sense to introduce a new library, move the general
    109 classes somewhere else, or introduce some other solution.</p>
    110 
    111 <p>We describe the roles of these classes in order of their dependencies.</p>
    112 
    113 
    114 <!-- ======================================================================= -->
    115 <h3 id="Diagnostics">The Diagnostics Subsystem</h3>
    116 <!-- ======================================================================= -->
    117 
    118 <p>The Clang Diagnostics subsystem is an important part of how the compiler
    119 communicates with the human.  Diagnostics are the warnings and errors produced
    120 when the code is incorrect or dubious.  In Clang, each diagnostic produced has
    121 (at the minimum) a unique ID, an English translation associated with it, a <a
    122 href="#SourceLocation">SourceLocation</a> to "put the caret", and a severity (e.g.
<tt>WARNING</tt> or <tt>ERROR</tt>).  They can also optionally include a number
of arguments to the diagnostic (which fill in "%0"'s in the string) as well as a
number of source ranges that relate to the diagnostic.</p>
    126 
    127 <p>In this section, we'll be giving examples produced by the Clang command line
    128 driver, but diagnostics can be <a href="#DiagnosticClient">rendered in many
    129 different ways</a> depending on how the DiagnosticClient interface is
    130 implemented.  A representative example of a diagnostic is:</p>
    131 
    132 <pre>
    133 t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float')
    134    <span style="color:darkgreen">P = (P-42) + Gamma*4;</span>
    135        <span style="color:blue">~~~~~~ ^ ~~~~~~~</span>
    136 </pre>
    137 
<p>In this example, you can see the English translation, the severity (error),
the source location (the caret ("^") and file/line/column info), the source
ranges ("~~~~"), and the arguments to the diagnostic ('int *' and '_Complex
float').  You'll have to believe me that there is a unique ID backing the
diagnostic :).</p>

<p>Getting all of this to happen takes several steps and involves many moving
pieces.  This section describes them and talks about best practices when adding
a new diagnostic.</p>
    147 
    148 <!-- ============================= -->
    149 <h4>The Diagnostic*Kinds.td files</h4>
    150 <!-- ============================= -->
    151 
    152 <p>Diagnostics are created by adding an entry to one of the <tt>
    153 clang/Basic/Diagnostic*Kinds.td</tt> files, depending on what library will
    154 be using it.  From this file, tblgen generates the unique ID of the diagnostic,
    155 the severity of the diagnostic and the English translation + format string.</p>
    156 
    157 <p>There is little sanity with the naming of the unique ID's right now.  Some
    158 start with err_, warn_, ext_ to encode the severity into the name.  Since the
    159 enum is referenced in the C++ code that produces the diagnostic, it is somewhat
    160 useful for it to be reasonably short.</p>
    161 
    162 <p>The severity of the diagnostic comes from the set {<tt>NOTE</tt>,
    163 <tt>WARNING</tt>, <tt>EXTENSION</tt>, <tt>EXTWARN</tt>, <tt>ERROR</tt>}.  The
    164 <tt>ERROR</tt> severity is used for diagnostics indicating the program is never
    165 acceptable under any circumstances.  When an error is emitted, the AST for the
    166 input code may not be fully built.  The <tt>EXTENSION</tt> and <tt>EXTWARN</tt>
    167 severities are used for extensions to the language that Clang accepts.  This
    168 means that Clang fully understands and can represent them in the AST, but we
    169 produce diagnostics to tell the user their code is non-portable.  The difference
is that the former are ignored by default, and the latter warn by default.  The
    171 <tt>WARNING</tt> severity is used for constructs that are valid in the currently
    172 selected source language but that are dubious in some way.  The <tt>NOTE</tt>
    173 level is used to staple more information onto previous diagnostics.</p>
    174 
    175 <p>These <em>severities</em> are mapped into a smaller set (the
    176 Diagnostic::Level enum, {<tt>Ignored</tt>, <tt>Note</tt>, <tt>Warning</tt>,
    177 <tt>Error</tt>, <tt>Fatal</tt> }) of output <em>levels</em> by the diagnostics
    178 subsystem based on various configuration options.  Clang internally supports a
    179 fully fine grained mapping mechanism that allows you to map almost any
    180 diagnostic to the output level that you want.  The only diagnostics that cannot
be mapped are <tt>NOTE</tt>s, which always follow the severity of the previously
emitted diagnostic, and <tt>ERROR</tt>s, which can only be mapped to
    183 <tt>Fatal</tt> (it is not possible to turn an error into a warning,
    184 for example).</p>
    185 
<p>Diagnostic mappings are used in many ways.  For example, if the user
specifies <tt>-pedantic</tt>, <tt>EXTENSION</tt> maps to <tt>Warning</tt>; if
they specify <tt>-pedantic-errors</tt>, it turns into <tt>Error</tt>.  Mappings
are also used to implement options like <tt>-Wunused-macros</tt>, <tt>-Wundef</tt>, etc.
    190 </p>
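<p>As a rough illustration of the mapping described above, here is a minimal,
self-contained C++ sketch.  The names (<tt>Severity</tt>, <tt>MappingOptions</tt>,
<tt>mapSeverity</tt>) are hypothetical and are not Clang's real API.  The handling
of <tt>EXTWARN</tt> under <tt>-pedantic-errors</tt>, and of a note that follows an
ignored diagnostic, are assumptions made for the example.</p>

```cpp
#include <cassert>

// Hypothetical enums that mirror the sets described in the text.
enum class Severity { Note, Warning, Extension, ExtWarn, Error };
enum class Level { Ignored, Note, Warning, Error, Fatal };

// Hypothetical option bag modelling two command line flags.
struct MappingOptions {
  bool Pedantic = false;        // models -pedantic
  bool PedanticErrors = false;  // models -pedantic-errors
};

// Map a declared severity to an output level.  PreviousLevel is the level
// of the most recently emitted non-note diagnostic.
Level mapSeverity(Severity S, const MappingOptions &Opts, Level PreviousLevel) {
  switch (S) {
  case Severity::Note:
    // Assumption: a note is shown only if the diagnostic it follows was shown.
    return PreviousLevel == Level::Ignored ? Level::Ignored : Level::Note;
  case Severity::Warning:
    return Level::Warning;
  case Severity::Extension:
    // Ignored by default; -pedantic and -pedantic-errors raise it.
    if (Opts.PedanticErrors)
      return Level::Error;
    return Opts.Pedantic ? Level::Warning : Level::Ignored;
  case Severity::ExtWarn:
    // Warns by default.  Assumption: -pedantic-errors raises it to an error.
    return Opts.PedanticErrors ? Level::Error : Level::Warning;
  case Severity::Error:
    // Errors can never be downgraded to a warning.
    return Level::Error;
  }
  assert(false && "unknown severity");
  return Level::Fatal;
}
```

<p>With default options, <tt>mapSeverity(Severity::Extension, ...)</tt> yields
<tt>Level::Ignored</tt>; setting <tt>Pedantic</tt> turns it into a warning.</p>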
    191 
    192 <p>
    193 Mapping to <tt>Fatal</tt> should only be used for diagnostics that are
    194 considered so severe that error recovery won't be able to recover sensibly from
them (thus spewing a ton of bogus errors).  One example of this class of error
is failure to #include a file.
    197 </p>
    198 
    199 <!-- ================= -->
    200 <h4>The Format String</h4>
    201 <!-- ================= -->
    202 
    203 <p>The format string for the diagnostic is very simple, but it has some power.
    204 It takes the form of a string in English with markers that indicate where and
    205 how arguments to the diagnostic are inserted and formatted.  For example, here
    206 are some simple format strings:</p>
    207 
    208 <pre>
    209   "binary integer literals are an extension"
    210   "format string contains '\\0' within the string body"
    211   "more '<b>%%</b>' conversions than data arguments"
    212   "invalid operands to binary expression (<b>%0</b> and <b>%1</b>)"
    213   "overloaded '<b>%0</b>' must be a <b>%select{unary|binary|unary or binary}2</b> operator"
    214        " (has <b>%1</b> parameter<b>%s1</b>)"
    215 </pre>
    216 
    217 <p>These examples show some important points of format strings.  You can use any
    218    plain ASCII character in the diagnostic string except "%" without a problem,
    219    but these are C strings, so you have to use and be aware of all the C escape
    220    sequences (as in the second example).  If you want to produce a "%" in the
    221    output, use the "%%" escape sequence, like the third diagnostic.  Finally,
    222    Clang uses the "%...[digit]" sequences to specify where and how arguments to
    223    the diagnostic are formatted.</p>
    224    
    225 <p>Arguments to the diagnostic are numbered according to how they are specified
    226    by the C++ code that <a href="#producingdiag">produces them</a>, and are
    227    referenced by <tt>%0</tt> .. <tt>%9</tt>.  If you have more than 10 arguments
    228    to your diagnostic, you are doing something wrong :).  Unlike printf, there
    229    is no requirement that arguments to the diagnostic end up in the output in
   the same order as they are specified; you could have a format string with
   <tt>"%1 %0"</tt> that swaps them, for example.  The text in between the
   percent and digit gives formatting instructions.  If there are no instructions,
    233    the argument is just turned into a string and substituted in.</p>
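<p>To make the numbering rule concrete, here is a small, self-contained sketch of
an expander that handles only "%%" and plain "%0" .. "%9" markers.  The function
name <tt>expandFormat</tt> is hypothetical, arguments are assumed to be already
converted to strings, and formatting instructions between the percent and the
digit are deliberately left out.  This is not how Clang implements it.</p>

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Expand "%%" and plain "%N" markers in a diagnostic format string.
// Modifiers such as "%s1" or "%select{...}2" are not handled in this sketch.
std::string expandFormat(const std::string &Format,
                         const std::vector<std::string> &Args) {
  std::string Out;
  for (std::size_t I = 0; I < Format.size(); ++I) {
    char C = Format[I];
    if (C != '%') {
      Out += C;
      continue;
    }
    assert(I + 1 < Format.size() && "dangling '%' at end of format string");
    char Next = Format[++I];
    if (Next == '%') {  // "%%" produces a literal percent sign
      Out += '%';
      continue;
    }
    assert(std::isdigit(static_cast<unsigned char>(Next)) &&
           "only %% and %0..%9 are supported here");
    std::size_t Index = static_cast<std::size_t>(Next - '0');
    assert(Index < Args.size() && "format string references a missing argument");
    Out += Args[Index];  // arguments may appear in any order
  }
  return Out;
}
```

<p>Because each marker names its argument by index, a format string such as
<tt>"%1 %0"</tt> swaps the two arguments without any change at the call site.</p>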
    234 
    235 <p>Here are some "best practices" for writing the English format string:</p>
    236 
    237 <ul>
    238 <li>Keep the string short.  It should ideally fit in the 80 column limit of the
    239     <tt>DiagnosticKinds.td</tt> file.  This avoids the diagnostic wrapping when
    240     printed, and forces you to think about the important point you are conveying
    241     with the diagnostic.</li>
    242 <li>Take advantage of location information.  The user will be able to see the
    243     line and location of the caret, so you don't need to tell them that the
    244     problem is with the 4th argument to the function: just point to it.</li>
    245 <li>Do not capitalize the diagnostic string, and do not end it with a
    246     period.</li>
    247 <li>If you need to quote something in the diagnostic string, use single
    248     quotes.</li>
    249 </ul>
    250 
    251 <p>Diagnostics should never take random English strings as arguments: you
    252 shouldn't use <tt>"you have a problem with %0"</tt> and pass in things like
    253 <tt>"your argument"</tt> or <tt>"your return value"</tt> as arguments. Doing
    254 this prevents <a href="#translation">translating</a> the Clang diagnostics to
    255 other languages (because they'll get random English words in their otherwise
    256 localized diagnostic).  The exceptions to this are C/C++ language keywords
    257 (e.g. auto, const, mutable, etc) and C/C++ operators (<tt>/=</tt>).  Note
    258 that things like "pointer" and "reference" are not keywords.  On the other
    259 hand, you <em>can</em> include anything that comes from the user's source code,
    260 including variable names, types, labels, etc.  The 'select' format can be 
    261 used to achieve this sort of thing in a localizable way, see below.</p>
    262 
    263 <!-- ==================================== -->
    264 <h4>Formatting a Diagnostic Argument</h4>
    265 <!-- ==================================== -->
    266 
<p>Arguments to diagnostics are fully typed internally, and come from a few
different classes: integers, types, names, and random strings.  Depending on
    269 the class of the argument, it can be optionally formatted in different ways.
    270 This gives the DiagnosticClient information about what the argument means
    271 without requiring it to use a specific presentation (consider this MVC for
    272 Clang :).</p>
    273 
    274 <p>Here are the different diagnostic argument formats currently supported by
    275 Clang:</p>
    276 
    277 <table>
    278 <tr><td colspan="2"><b>"s" format</b></td></tr>
    279 <tr><td>Example:</td><td><tt>"requires %1 parameter%s1"</tt></td></tr>
    280 <tr><td>Class:</td><td>Integers</td></tr>
    281 <tr><td>Description:</td><td>This is a simple formatter for integers that is
    282     useful when producing English diagnostics.  When the integer is 1, it prints
    283     as nothing.  When the integer is not 1, it prints as "s".  This allows some
    simple grammatical forms to be handled correctly, and eliminates the
    285     need to use gross things like <tt>"requires %1 parameter(s)"</tt>.</td></tr>
    286 
    287 <tr><td colspan="2"><b>"select" format</b></td></tr>
    288 <tr><td>Example:</td><td><tt>"must be a %select{unary|binary|unary or binary}2
    289      operator"</tt></td></tr>
    290 <tr><td>Class:</td><td>Integers</td></tr>
    291 <tr><td>Description:</td><td><p>This format specifier is used to merge multiple
    292     related diagnostics together into one common one, without requiring the
    293     difference to be specified as an English string argument.  Instead of
    294     specifying the string, the diagnostic gets an integer argument and the
    295     format string selects the numbered option.  In this case, the "%2" value
    must be an integer in the range [0..2].  If it is 0, it prints 'unary'; if
    it is 1, it prints 'binary'; if it is 2, it prints 'unary or binary'.  This
    298     allows other language translations to substitute reasonable words (or entire
    299     phrases) based on the semantics of the diagnostic instead of having to do
    300     things textually.</p>
    301     <p>The selected string does undergo formatting.</p></td></tr>
    302 
    303 <tr><td colspan="2"><b>"plural" format</b></td></tr>
    304 <tr><td>Example:</td><td><tt>"you have %1 %plural{1:mouse|:mice}1 connected to
    305     your computer"</tt></td></tr>
    306 <tr><td>Class:</td><td>Integers</td></tr>
    307 <tr><td>Description:</td><td><p>This is a formatter for complex plural forms.
    308     It is designed to handle even the requirements of languages with very
    309 	complex plural forms, as many Baltic languages have. The argument consists
	of a series of expression/form pairs.  Pairs are separated by '|', and each
	expression is separated from its form by ':'.  The first form whose
	expression evaluates to true is the result of the modifier.</p>
    312 	<p>An expression can be empty, in which case it is always true. See the
    313 	example at the top. Otherwise, it is a series of one or more numeric
    314 	conditions, separated by ','. If any condition matches, the expression
    315 	matches. Each numeric condition can take one of three forms.</p>
    316 	<ul>
    317 	    <li>number: A simple decimal number matches if the argument is the same
    318 		as the number. Example: <tt>"%plural{1:mouse|:mice}4"</tt></li>
    319 		<li>range: A range in square brackets matches if the argument is within
		the range. The range is inclusive on both ends. Example:
    321 		<tt>"%plural{0:none|1:one|[2,5]:some|:many}2"</tt></li>
		<li>modulo: A modulo operator is followed by a number, an
		equals sign, and either a number or a range. The tests are the
		same as for plain numbers and ranges, but the argument is taken
		modulo the number first.
    326 		Example: <tt>"%plural{%100=0:even hundred|%100=[1,50]:lower half|:everything
    327 		else}1"</tt></li>
    328 	</ul>
    329 	<p>The parser is very unforgiving. A syntax error, even whitespace, will
    330 	abort, as will a failure to match the argument against any
    331 	expression.</p></td></tr>
    332 
    333 <tr><td colspan="2"><b>"ordinal" format</b></td></tr>
    334 <tr><td>Example:</td><td><tt>"ambiguity in %ordinal0 argument"</tt></td></tr>
    335 <tr><td>Class:</td><td>Integers</td></tr>
    336 <tr><td>Description:</td><td><p>This is a formatter which represents the
    337     argument number as an ordinal:  the value <tt>1</tt> becomes <tt>1st</tt>,
    338     <tt>3</tt> becomes <tt>3rd</tt>, and so on.  Values less than <tt>1</tt>
    339     are not supported.</p>
    340     <p>This formatter is currently hard-coded to use English ordinals.</p></td></tr>
    341 
    342 <tr><td colspan="2"><b>"objcclass" format</b></td></tr>
    343 <tr><td>Example:</td><td><tt>"method %objcclass0 not found"</tt></td></tr>
    344 <tr><td>Class:</td><td>DeclarationName</td></tr>
    345 <tr><td>Description:</td><td><p>This is a simple formatter that indicates the
    346     DeclarationName corresponds to an Objective-C class method selector.  As
    347     such, it prints the selector with a leading '+'.</p></td></tr>
    348 
    349 <tr><td colspan="2"><b>"objcinstance" format</b></td></tr>
    350 <tr><td>Example:</td><td><tt>"method %objcinstance0 not found"</tt></td></tr>
    351 <tr><td>Class:</td><td>DeclarationName</td></tr>
    352 <tr><td>Description:</td><td><p>This is a simple formatter that indicates the
    353     DeclarationName corresponds to an Objective-C instance method selector.  As
    354     such, it prints the selector with a leading '-'.</p></td></tr>
    355 
    356 <tr><td colspan="2"><b>"q" format</b></td></tr>
    357 <tr><td>Example:</td><td><tt>"candidate found by name lookup is %q0"</tt></td></tr>
    358 <tr><td>Class:</td><td>NamedDecl*</td></tr>
<tr><td>Description:</td><td><p>This formatter indicates that the fully-qualified name of the declaration should be printed, e.g., "std::vector" rather than "vector".</p></td></tr>
    360 
    361 <tr><td colspan="2"><b>"diff" format</b></td></tr>
<tr><td>Example:</td><td><tt>"no known conversion %diff{from $ to $|from argument type to parameter type}1,2"</tt></td></tr>
    363 <tr><td>Class:</td><td>QualType</td></tr>
<tr><td>Description:</td><td><p>This formatter takes two QualTypes and attempts to print a template difference between the two.  If tree printing is off, the text inside the braces before the pipe is printed, with the formatted text replacing the $.  If tree printing is on, the text after the pipe is printed and a type tree is printed after the diagnostic message.
    365 </p></td></tr>
    366     
    367 </table>
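<p>The three simplest integer formats in the table above ("s", "select" and
"ordinal") can be sketched as standalone helpers.  The function names below are
hypothetical, and the code only models the behavior described in the table; it
is not taken from Clang's implementation.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// "%sN": prints nothing when the value is 1, and "s" otherwise.
std::string formatS(int Value) { return Value == 1 ? "" : "s"; }

// "%select{a|b|c}N": picks the Value-th alternative, counting from zero.
std::string formatSelect(const std::vector<std::string> &Options, int Value) {
  assert(Value >= 0 && static_cast<std::size_t>(Value) < Options.size() &&
         "select index out of range");
  return Options[static_cast<std::size_t>(Value)];
}

// "%ordinalN": English ordinals only; values below 1 are not supported.
std::string formatOrdinal(int Value) {
  assert(Value >= 1 && "ordinal requires a value of at least 1");
  const char *Suffix = "th";
  int LastTwo = Value % 100;
  if (LastTwo < 11 || LastTwo > 13) {  // 11th, 12th and 13th keep "th"
    switch (Value % 10) {
    case 1: Suffix = "st"; break;
    case 2: Suffix = "nd"; break;
    case 3: Suffix = "rd"; break;
    default: break;
    }
  }
  return std::to_string(Value) + Suffix;
}
```

<p>For instance, <tt>"requires " + std::to_string(N) + " parameter" + formatS(N)</tt>
reads correctly for both one and several parameters, which is the point of the
"s" format.</p>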
    368 
    369 <p>It is really easy to add format specifiers to the Clang diagnostics system,
    370 but they should be discussed before they are added.  If you are creating a lot
    371 of repetitive diagnostics and/or have an idea for a useful formatter, please
    372 bring it up on the cfe-dev mailing list.</p>
    373 
    374 <!-- ===================================================== -->
    375 <h4 id="producingdiag">Producing the Diagnostic</h4>
    376 <!-- ===================================================== -->
    377 
    378 <p>Now that you've created the diagnostic in the DiagnosticKinds.td file, you
    379 need to write the code that detects the condition in question and emits the
    380 new diagnostic.  Various components of Clang (e.g. the preprocessor, Sema,
    381 etc) provide a helper function named "Diag".  It creates a diagnostic and
    382 accepts the arguments, ranges, and other information that goes along with
    383 it.</p>
    384 
    385 <p>For example, the binary expression error comes from code like this:</p>
    386 
    387 <pre>
    388   if (various things that are bad)
    389     Diag(Loc, diag::err_typecheck_invalid_operands)
    390       &lt;&lt; lex-&gt;getType() &lt;&lt; rex-&gt;getType()
    391       &lt;&lt; lex-&gt;getSourceRange() &lt;&lt; rex-&gt;getSourceRange();
    392 </pre>
    393 
<p>This shows the use of the Diag method: it takes a location (a <a
href="#SourceLocation">SourceLocation</a> object) and a diagnostic enum value
(which matches the name from DiagnosticKinds.td).  If the diagnostic takes
    397 arguments, they are specified with the &lt;&lt; operator: the first argument
    398 becomes %0, the second becomes %1, etc.  The diagnostic interface allows you to
    399 specify arguments of many different types, including <tt>int</tt> and
    400 <tt>unsigned</tt> for integer arguments, <tt>const char*</tt> and
    401 <tt>std::string</tt> for string arguments, <tt>DeclarationName</tt> and
    402 <tt>const IdentifierInfo*</tt> for names, <tt>QualType</tt> for types, etc.
    403 SourceRanges are also specified with the &lt;&lt; operator, but do not have a
    404 specific ordering requirement.</p>
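<p>The streaming style can be sketched with a toy builder.  Everything here
(<tt>ToyDiagBuilder</tt>, <tt>Range</tt>, plain integers for the location and ID)
is a hypothetical stand-in chosen to keep the example self-contained; Clang's
real builder keeps arguments typed rather than converting them to strings.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a source range: a pair of offsets.
struct Range {
  unsigned Begin;
  unsigned End;
};

// Toy builder that collects arguments and ranges streamed into it.
class ToyDiagBuilder {
public:
  ToyDiagBuilder(unsigned Loc, unsigned ID) : Loc(Loc), ID(ID) {}

  // Arguments are numbered in the order they are streamed: %0, %1, ...
  ToyDiagBuilder &operator<<(const std::string &Arg) {
    assert(Args.size() < 10 && "only %0..%9 can be referenced");
    Args.push_back(Arg);
    return *this;
  }
  ToyDiagBuilder &operator<<(int Arg) {
    return *this << std::to_string(Arg);
  }
  // Ranges go into a separate list, so where they appear among the
  // arguments does not affect argument numbering.
  ToyDiagBuilder &operator<<(Range R) {
    Ranges.push_back(R);
    return *this;
  }

  unsigned Loc;
  unsigned ID;
  std::vector<std::string> Args;
  std::vector<Range> Ranges;
};
```

<p>Streaming <tt>"int *"</tt>, a range, and then <tt>"_Complex float"</tt> leaves
the two strings as arguments 0 and 1 regardless of where the range was
inserted.</p>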
    405 
    406 <p>As you can see, adding and producing a diagnostic is pretty straightforward.
    407 The hard part is deciding exactly what you need to say to help the user, picking
    408 a suitable wording, and providing the information needed to format it correctly.
    409 The good news is that the call site that issues a diagnostic should be
    410 completely independent of how the diagnostic is formatted and in what language
    411 it is rendered.
    412 </p>
    413 
    414 <!-- ==================================================== -->
    415 <h4 id="fix-it-hints">Fix-It Hints</h4>
    416 <!-- ==================================================== -->
    417 
    418 <p>In some cases, the front end emits diagnostics when it is clear
that some small change to the source code would fix the problem.  Examples
include a missing semicolon at the end of a statement or a use of
deprecated syntax that is easily rewritten into a more modern form.
    422 Clang tries very hard to emit the diagnostic and recover gracefully
    423 in these and other cases.</p>
    424 
    425 <p>However, for these cases where the fix is obvious, the diagnostic
    426 can be annotated with a hint (referred to as a "fix-it hint") that
    427 describes how to change the code referenced by the diagnostic to fix
    428 the problem. For example, it might add the missing semicolon at the
    429 end of the statement or rewrite the use of a deprecated construct
    430 into something more palatable. Here is one such example from the C++
    431 front end, where we warn about the right-shift operator changing
    432 meaning from C++98 to C++11:</p>
    433 
    434 <pre>
    435 test.cpp:3:7: warning: use of right-shift operator ('&gt;&gt;') in template argument will require parentheses in C++11
    436 A&lt;100 &gt;&gt; 2&gt; *a;
    437       ^
    438   (       )
    439 </pre>
    440 
    441 <p>Here, the fix-it hint is suggesting that parentheses be added,
    442 and showing exactly where those parentheses would be inserted into the
    443 source code. The fix-it hints themselves describe what changes to make
    444 to the source code in an abstract manner, which the text diagnostic
    445 printer renders as a line of "insertions" below the caret line. <a
    446 href="#DiagnosticClient">Other diagnostic clients</a> might choose
    447 to render the code differently (e.g., as markup inline) or even give
    448 the user the ability to automatically fix the problem.</p>
    449 
    450 <p>Fix-it hints on errors and warnings need to obey these rules:</p>
    451 
    452 <ul>
    453 <li>Since they are automatically applied if <code>-Xclang -fixit</code>
    454 is passed to the driver, they should only be used when it's very likely they
    455 match the user's intent.</li>
    456 <li>Clang must recover from errors as if the fix-it had been applied.</li>
    457 </ul>
    458 
    459 <p>If a fix-it can't obey these rules, put the fix-it on a note. Fix-its on
    460 notes are not applied automatically.</p>
    461 
    462 <p>All fix-it hints are described by the <code>FixItHint</code> class,
    463 instances of which should be attached to the diagnostic using the
    464 &lt;&lt; operator in the same way that highlighted source ranges and
arguments are passed to the diagnostic. Fix-it hints can be created
with one of three static factory functions:</p>
    467 
    468 <dl>
    469   <dt><code>FixItHint::CreateInsertion(Loc, Code)</code></dt>
    470   <dd>Specifies that the given <code>Code</code> (a string) should be inserted
    471   before the source location <code>Loc</code>.</dd>
    472 
    473   <dt><code>FixItHint::CreateRemoval(Range)</code></dt>
    474   <dd>Specifies that the code in the given source <code>Range</code>
    475   should be removed.</dd>
    476 
    477   <dt><code>FixItHint::CreateReplacement(Range, Code)</code></dt>
    478   <dd>Specifies that the code in the given source <code>Range</code>
    479   should be removed, and replaced with the given <code>Code</code> string.</dd>
    480 </dl>
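<p>To show what the three kinds of hint mean for the source text, here is a
self-contained sketch that applies a hint to a string.  It assumes byte offsets
and half-open ranges in place of <code>SourceLocation</code> and source ranges,
and the names <code>ToyFixIt</code> and <code>applyFixIt</code> are hypothetical;
it does not use the real <code>FixItHint</code> class.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model: locations are byte offsets into the buffer and a
// range is half-open, [Begin, End).
struct ToyFixIt {
  std::size_t Begin;
  std::size_t End;
  std::string Code;

  // Insert Code before offset Loc.
  static ToyFixIt insertion(std::size_t Loc, std::string Code) {
    return {Loc, Loc, std::move(Code)};
  }
  // Remove the text in [Begin, End).
  static ToyFixIt removal(std::size_t Begin, std::size_t End) {
    return {Begin, End, ""};
  }
  // Remove the text in [Begin, End) and put Code in its place.
  static ToyFixIt replacement(std::size_t Begin, std::size_t End,
                              std::string Code) {
    return {Begin, End, std::move(Code)};
  }
};

// Return a copy of Source with the hint applied.
std::string applyFixIt(const std::string &Source, const ToyFixIt &Hint) {
  assert(Hint.Begin <= Hint.End && Hint.End <= Source.size() &&
         "hint range lies outside the buffer");
  std::string Result = Source;
  Result.replace(Hint.Begin, Hint.End - Hint.Begin, Hint.Code);
  return Result;
}
```

<p>When several hints target one buffer, applying the one with the largest
offset first keeps the remaining offsets valid, which is the order used when
adding the closing and then the opening parenthesis in the right-shift
example.</p>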
    481 
    482 <!-- ============================================================= -->
    483 <h4><a name="DiagnosticClient">The DiagnosticClient Interface</a></h4>
    484 <!-- ============================================================= -->
    485 
    486 <p>Once code generates a diagnostic with all of the arguments and the rest of
    487 the relevant information, Clang needs to know what to do with it.  As previously
    488 mentioned, the diagnostic machinery goes through some filtering to map a
    489 severity onto a diagnostic level, then (assuming the diagnostic is not mapped to
"<tt>Ignored</tt>") it invokes an object that implements the DiagnosticClient
    491 interface with the information.</p>
    492 
    493 <p>It is possible to implement this interface in many different ways.  For
    494 example, the normal Clang DiagnosticClient (named 'TextDiagnosticPrinter') turns
    495 the arguments into strings (according to the various formatting rules), prints
    496 out the file/line/column information and the string, then prints out the line of
    497 code, the source ranges, and the caret.  However, this behavior isn't required.
    498 </p>
    499 
    500 <p>Another implementation of the DiagnosticClient interface is the
    501 'TextDiagnosticBuffer' class, which is used when Clang is in -verify mode.
    502 Instead of formatting and printing out the diagnostics, this implementation just
    503 captures and remembers the diagnostics as they fly by.  Then -verify compares
    504 the list of produced diagnostics to the list of expected ones.  If they disagree,
    505 it prints out its own output.
    506 </p>
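<p>The capture-and-compare idea can be sketched as follows.  The interface and
class names (<tt>ToyDiagnosticClient</tt>, <tt>ToyBufferingClient</tt>) and the
flat message string are hypothetical simplifications; the real interface
receives structured information rather than a finished string.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Level { Ignored, Note, Warning, Error, Fatal };

// Hypothetical, flattened view of one emitted diagnostic.
struct ToyDiagnostic {
  Level L;
  std::string Message;
};

// Hypothetical client interface: one callback per emitted diagnostic.
class ToyDiagnosticClient {
public:
  virtual ~ToyDiagnosticClient() = default;
  virtual void handleDiagnostic(const ToyDiagnostic &D) = 0;
};

// Remembers diagnostics instead of printing them.
class ToyBufferingClient : public ToyDiagnosticClient {
public:
  void handleDiagnostic(const ToyDiagnostic &D) override {
    assert(D.L != Level::Ignored && "ignored diagnostics never reach a client");
    Seen.push_back(D);
  }

  // Returns true when the captured messages equal Expected, in order.
  bool matches(const std::vector<std::string> &Expected) const {
    if (Expected.size() != Seen.size())
      return false;
    for (std::size_t I = 0; I < Seen.size(); ++I)
      if (Seen[I].Message != Expected[I])
        return false;
    return true;
  }

  std::vector<ToyDiagnostic> Seen;
};
```

<p>A verification step in this style would run the compiler with the buffering
client installed and then call <tt>matches</tt> with the expected list, reporting
any disagreement itself.</p>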
    507 
    508 <p>There are many other possible implementations of this interface, and this is
    509 why we prefer diagnostics to pass down rich structured information in arguments.
For example, an HTML output might want declaration names to be linkified to where
    511 they come from in the source.  Another example is that a GUI might let you click
    512 on typedefs to expand them.  This application would want to pass significantly
    513 more information about types through to the GUI than a simple flat string.  The
    514 interface allows this to happen.</p>
    515 
    516 <!-- ====================================================== -->
    517 <h4><a name="translation">Adding Translations to Clang</a></h4>
    518 <!-- ====================================================== -->
    519 
<p>Not possible yet!  Diagnostic strings should be written in UTF-8; the client
can translate to the relevant code page if needed.  Each translation completely
    522 replaces the format string for the diagnostic.</p>
    523 
    524 
    525 <!-- ======================================================================= -->
    526 <h3 id="SourceLocation">The SourceLocation and SourceManager classes</h3>
    527 <!-- ======================================================================= -->
    528 
    529 <p>Strangely enough, the SourceLocation class represents a location within the
    530 source code of the program.  Important design points include:</p>
    531 
    532 <ol>
    533 <li>sizeof(SourceLocation) must be extremely small, as these are embedded into
    534     many AST nodes and are passed around often.  Currently it is 32 bits.</li>
    535 <li>SourceLocation must be a simple value object that can be efficiently
    536     copied.</li>
    537 <li>We should be able to represent a source location for any byte of any input
    538     file.  This includes in the middle of tokens, in whitespace, in trigraphs,
    539     etc.</li>
    540 <li>A SourceLocation must encode the current #include stack that was active when
    541     the location was processed.  For example, if the location corresponds to a
    542     token, it should contain the set of #includes active when the token was
    543     lexed.  This allows us to print the #include stack for a diagnostic.</li>
    544 <li>SourceLocation must be able to describe macro expansions, capturing both
    545     the ultimate instantiation point and the source of the original character
    546     data.</li>
    547 </ol>
    548 
    549 <p>In practice, the SourceLocation works together with the SourceManager class
    550 to encode two pieces of information about a location: its spelling location
    551 and its instantiation location.  For most tokens, these will be the same.
    552 However, for a macro expansion (or tokens that came from a _Pragma directive)
    553 these will describe the location of the characters corresponding to the token
    554 and the location where the token was used (i.e. the macro instantiation point
    555 or the location of the _Pragma itself).</p>
    556 
    557 <p>The Clang front-end inherently depends on the location of a token being
    558 tracked correctly.  If it is ever incorrect, the front-end may get confused and
    559 die.  The reason for this is that the notion of the 'spelling' of a Token in
    560 Clang depends on being able to find the original input characters for the token.
    561 This concept maps directly to the "spelling location" for the token.</p>
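
<p>The split between a spelling location and an instantiation location can be
sketched with a toy model.  This is only an illustration under stated
assumptions: <tt>ToyLoc</tt>, <tt>ToySourceManager</tt> and the high-bit
encoding are hypothetical names and choices made for this example, not Clang's
real classes or its actual encoding.  The point is that the location itself
stays 32 bits wide while the manager holds the out-of-line data.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model; none of these names are Clang's real API.
struct ToyLoc {
  std::uint32_t ID; // opaque; only the manager can interpret it
};
static_assert(sizeof(ToyLoc) == 4, "a location must stay 32 bits wide");

// Assumption for this sketch: the high bit marks a macro-derived location.
constexpr std::uint32_t MacroBit = 1u << 31;

class ToySourceManager {
  struct MacroEntry {
    std::uint32_t Spelling;      // where the characters were written
    std::uint32_t Instantiation; // where the macro was used
  };
  std::vector<MacroEntry> Macros;

public:
  // A location in ordinary file text: both queries yield the same offset.
  static ToyLoc fileLoc(std::uint32_t Offset) {
    assert((Offset & MacroBit) == 0 && "offset too large for this toy");
    return ToyLoc{Offset};
  }

  // A location produced by macro expansion: store the pair out of line and
  // hand back a small index with the high bit set.
  ToyLoc macroLoc(std::uint32_t Spelling, std::uint32_t Instantiation) {
    Macros.push_back(MacroEntry{Spelling, Instantiation});
    return ToyLoc{MacroBit | static_cast<std::uint32_t>(Macros.size() - 1)};
  }

  std::uint32_t spellingOffset(ToyLoc L) const {
    return (L.ID & MacroBit) ? Macros[L.ID & ~MacroBit].Spelling : L.ID;
  }
  std::uint32_t instantiationOffset(ToyLoc L) const {
    return (L.ID & MacroBit) ? Macros[L.ID & ~MacroBit].Instantiation : L.ID;
  }
};
```

<p>For a location created with <tt>fileLoc(120)</tt>, both queries return 120.
For one created with <tt>macroLoc(10, 300)</tt>, the spelling query returns 10
(the macro body) and the instantiation query returns 300 (the use site).</p>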
    562 
    563 
    564 <!-- ======================================================================= -->
    565 <h3 id="SourceRange">SourceRange and CharSourceRange</h3>
    566 <!-- ======================================================================= -->
    567 <!-- mostly taken from
    568   http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-August/010595.html -->
    569 
    570 <p>Clang represents most source ranges by [first, last], where first and last
    571 each point to the beginning of their respective tokens. For example
    572 consider the SourceRange of the following statement:</p>
    573 <pre>
    574 x = foo + bar;
    575 ^first    ^last
    576 </pre>
    577 
    578 <p>To map from this representation to a character-based
    579 representation, the 'last' location needs to be adjusted to point to
    580 (or past) the end of that token with either
    581 <code>Lexer::MeasureTokenLength()</code> or
    582 <code>Lexer::getLocForEndOfToken()</code>. For the rare cases
    583 where character-level source ranges information is needed we use
    584 the <code>CharSourceRange</code> class.</p>
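
<p>The adjustment of 'last' can be illustrated without any Clang headers.  The
sketch below is an assumption-laden stand-in: <tt>measureToyToken</tt> and
<tt>toCharRange</tt> are hypothetical helpers that work on byte offsets into a
string and only understand identifier-like runs and single punctuation
characters.  The real work is done by the <code>Lexer</code> functions named
above.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Toy stand-in for measuring a token: an identifier/number run, or a single
// punctuation character. Real lexing rules are far richer than this.
std::size_t measureToyToken(const std::string &Buf, std::size_t Begin) {
  assert(Begin < Buf.size());
  std::size_t End = Begin;
  while (End < Buf.size() &&
         (std::isalnum(static_cast<unsigned char>(Buf[End])) ||
          Buf[End] == '_'))
    ++End;
  return End == Begin ? 1 : End - Begin;
}

// Convert a token range [First, Last] (both offsets of token starts) into a
// half-open character range [First, End).
std::pair<std::size_t, std::size_t>
toCharRange(const std::string &Buf, std::size_t First, std::size_t Last) {
  return {First, Last + measureToyToken(Buf, Last)};
}
```

<p>With the statement <tt>x = foo + bar;</tt>, 'first' is offset 0 and 'last'
is offset 10 (the start of <tt>bar</tt>).  The character range ends one past
the <tt>r</tt>, so it covers <tt>x = foo + bar</tt> without the semicolon.</p>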
    585 
    586 
    587 <!-- ======================================================================= -->
    588 <h2 id="libdriver">The Driver Library</h2>
    589 <!-- ======================================================================= -->
    590 
    591 <p>The clang Driver and library are documented <a
href="DriverInternals.html">here</a>.</p>
    593 
    594 <!-- ======================================================================= -->
    595 <h2 id="pch">Precompiled Headers</h2>
    596 <!-- ======================================================================= -->
    597 
    598 <p>Clang supports two implementations of precompiled headers. The
    599    default implementation, precompiled headers (<a
    600     href="PCHInternals.html">PCH</a>) uses a serialized representation
    601    of Clang's internal data structures, encoded with the <a
    602     href="http://llvm.org/docs/BitCodeFormat.html">LLVM bitstream
    603    format</a>. Pretokenized headers (<a
    604     href="PTHInternals.html">PTH</a>), on the other hand, contain a
    605    serialized representation of the tokens encountered when
    606    preprocessing a header (and anything that header includes).</p>
    607 
    608 
    609 <!-- ======================================================================= -->
    610 <h2 id="libfrontend">The Frontend Library</h2>
    611 <!-- ======================================================================= -->
    612 
    613 <p>The Frontend library contains functionality useful for building
    614 tools on top of the clang libraries, for example several methods for
    615 outputting diagnostics.</p>
    616 
    617 <!-- ======================================================================= -->
    618 <h2 id="liblex">The Lexer and Preprocessor Library</h2>
    619 <!-- ======================================================================= -->
    620 
    621 <p>The Lexer library contains several tightly-connected classes that are involved
    622 with the nasty process of lexing and preprocessing C source code.  The main
    623 interface to this library for outside clients is the large <a 
    624 href="#Preprocessor">Preprocessor</a> class.
    625 It contains the various pieces of state that are required to coherently read
    626 tokens out of a translation unit.</p>
    627 
    628 <p>The core interface to the Preprocessor object (once it is set up) is the
    629 Preprocessor::Lex method, which returns the next <a href="#Token">Token</a> from
    630 the preprocessor stream.  There are two types of token providers that the
    631 preprocessor is capable of reading from: a buffer lexer (provided by the <a 
    632 href="#Lexer">Lexer</a> class) and a buffered token stream (provided by the <a
href="#TokenLexer">TokenLexer</a> class).</p>
    634 
    635 
    636 <!-- ======================================================================= -->
    637 <h3 id="Token">The Token class</h3>
    638 <!-- ======================================================================= -->
    639 
    640 <p>The Token class is used to represent a single lexed token.  Tokens are
intended to be used by the lexer/preprocessor and parser libraries, but are not
intended to live beyond them (for example, they should not live in the ASTs).</p>
    643 
    644 <p>Tokens most often live on the stack (or some other location that is efficient
    645 to access) as the parser is running, but occasionally do get buffered up.  For
    646 example, macro definitions are stored as a series of tokens, and the C++
    647 front-end periodically needs to buffer tokens up for tentative parsing and
various pieces of look-ahead.  As such, the size of a Token matters.  On a 32-bit
    649 system, sizeof(Token) is currently 16 bytes.</p>
    650 
    651 <p>Tokens occur in two forms: "<a href="#AnnotationToken">Annotation
Tokens</a>" and normal tokens.  Normal tokens are those returned by the lexer;
annotation tokens represent semantic information and are produced by the parser,
    654 replacing normal tokens in the token stream.  Normal tokens contain the
    655 following information:</p>
    656 
    657 <ul>
    658 <li><b>A SourceLocation</b> - This indicates the location of the start of the
    659 token.</li>
    660 
    661 <li><b>A length</b> - This stores the length of the token as stored in the
    662 SourceBuffer.  For tokens that include them, this length includes trigraphs and
    663 escaped newlines which are ignored by later phases of the compiler.  By pointing
    664 into the original source buffer, it is always possible to get the original
    665 spelling of a token completely accurately.</li>
    666 
    667 <li><b>IdentifierInfo</b> - If a token takes the form of an identifier, and if
    668 identifier lookup was enabled when the token was lexed (e.g. the lexer was not
    669 reading in 'raw' mode) this contains a pointer to the unique hash value for the
    670 identifier.  Because the lookup happens before keyword identification, this
    671 field is set even for language keywords like 'for'.</li>
    672 
    673 <li><b>TokenKind</b> - This indicates the kind of token as classified by the
    674 lexer.  This includes things like <tt>tok::starequal</tt> (for the "*="
    675 operator), <tt>tok::ampamp</tt> for the "&amp;&amp;" token, and keyword values
    676 (e.g. <tt>tok::kw_for</tt>) for identifiers that correspond to keywords.  Note 
    677 that some tokens can be spelled multiple ways.  For example, C++ supports
    678 "operator keywords", where things like "and" are treated exactly like the
    679 "&amp;&amp;" operator.  In these cases, the kind value is set to
    680 <tt>tok::ampamp</tt>, which is good for the parser, which doesn't have to 
    681 consider both forms.  For something that cares about which form is used (e.g.
    682 the preprocessor 'stringize' operator) the spelling indicates the original
    683 form.</li>
    684 
    685 <li><b>Flags</b> - There are currently four flags tracked by the
    686 lexer/preprocessor system on a per-token basis:
    687 
    688   <ol>
    689   <li><b>StartOfLine</b> - This was the first token that occurred on its input
    690        source line.</li>
    691   <li><b>LeadingSpace</b> - There was a space character either immediately
    692        before the token or transitively before the token as it was expanded
       through a macro.  The definition of this flag is dictated by
    694        the stringizing requirements of the preprocessor.</li>
    695   <li><b>DisableExpand</b> - This flag is used internally to the preprocessor to
    696       represent identifier tokens which have macro expansion disabled.  This
    697       prevents them from being considered as candidates for macro expansion ever
    698       in the future.</li>
    699   <li><b>NeedsCleaning</b> - This flag is set if the original spelling for the
    700       token includes a trigraph or escaped newline.  Since this is uncommon,
      many pieces of code can fast-path on tokens that did not need cleaning.</li>
    702    </ol>
    703 </li>
    704 </ul>
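
<p>The per-token flags and the <b>NeedsCleaning</b> fast path can be sketched
as follows.  This is a simplified illustration: <tt>ToyToken</tt>,
<tt>ToyTokenFlags</tt> and <tt>toySpelling</tt> are hypothetical names, the
flag values are chosen arbitrarily, and the slow path only removes escaped
newlines (trigraphs are ignored here).</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical flag values; only the names mirror the list above.
enum ToyTokenFlags : std::uint8_t {
  StartOfLine = 0x01,
  LeadingSpace = 0x02,
  DisableExpand = 0x04,
  NeedsCleaning = 0x08
};

struct ToyToken {
  std::uint32_t Loc;    // offset of the token's first character
  std::uint32_t Length; // length as stored in the buffer
  std::uint8_t Flags;

  void setFlag(ToyTokenFlags F) {
    Flags = static_cast<std::uint8_t>(Flags | F);
  }
  bool hasFlag(ToyTokenFlags F) const { return (Flags & F) != 0; }
};

// Recover the spelling of a token from the original buffer.
std::string toySpelling(const ToyToken &T, const std::string &Buf) {
  std::string Raw = Buf.substr(T.Loc, T.Length);
  if (!T.hasFlag(NeedsCleaning))
    return Raw; // fast path: the bytes in the buffer are the spelling

  // Slow path: drop backslash-newline pairs.
  std::string Clean;
  for (std::size_t I = 0; I < Raw.size(); ++I) {
    if (Raw[I] == '\\' && I + 1 < Raw.size() && Raw[I + 1] == '\n') {
      ++I; // skip the newline as well
      continue;
    }
    Clean += Raw[I];
  }
  return Clean;
}
```

<p>A token spanning the five bytes <tt>fo\</tt>, newline, <tt>o</tt> has
length 5 in the buffer but spells <tt>foo</tt> once cleaned.  Tokens without
the flag never pay for the scan.</p>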
    705 
    706 <p>One interesting (and somewhat unusual) aspect of normal tokens is that they
    707 don't contain any semantic information about the lexed value.  For example, if
    708 the token was a pp-number token, we do not represent the value of the number
    709 that was lexed (this is left for later pieces of code to decide).  Additionally,
    710 the lexer library has no notion of typedef names vs variable names: both are
    711 returned as identifiers, and the parser is left to decide whether a specific
    712 identifier is a typedef or a variable (tracking this requires scope information 
    713 among other things).  The parser can do this translation by replacing tokens
    714 returned by the preprocessor with "Annotation Tokens".</p>
    715 
    716 <!-- ======================================================================= -->
    717 <h3 id="AnnotationToken">Annotation Tokens</h3>
    718 <!-- ======================================================================= -->
    719 
    720 <p>Annotation Tokens are tokens that are synthesized by the parser and injected
    721 into the preprocessor's token stream (replacing existing tokens) to record
    722 semantic information found by the parser.  For example, if "foo" is found to be
    723 a typedef, the "foo" <tt>tok::identifier</tt> token is replaced with an
    724 <tt>tok::annot_typename</tt>.  This is useful for a couple of reasons: 1) this
    725 makes it easy to handle qualified type names (e.g. "foo::bar::baz&lt;42&gt;::t")
    726 in C++ as a single "token" in the parser. 2) if the parser backtracks, the
    727 reparse does not need to redo semantic analysis to determine whether a token
    728 sequence is a variable, type, template, etc.</p>
    729 
    730 <p>Annotation Tokens are created by the parser and reinjected into the parser's
    731 token stream (when backtracking is enabled).  Because they can only exist in
    732 tokens that the preprocessor-proper is done with, it doesn't need to keep around
    733 flags like "start of line" that the preprocessor uses to do its job.
    734 Additionally, an annotation token may "cover" a sequence of preprocessor tokens
    735 (e.g. <tt>a::b::c</tt> is five preprocessor tokens).  As such, the valid fields
    736 of an annotation token are different than the fields for a normal token (but
    737 they are multiplexed into the normal Token fields):</p>
    738 
    739 <ul>
    740 <li><b>SourceLocation "Location"</b> - The SourceLocation for the annotation
    741 token indicates the first token replaced by the annotation token. In the example
    742 above, it would be the location of the "a" identifier.</li>
    743 
    744 <li><b>SourceLocation "AnnotationEndLoc"</b> - This holds the location of the
    745 last token replaced with the annotation token.  In the example above, it would
    746 be the location of the "c" identifier.</li>
    747 
    748 <li><b>void* "AnnotationValue"</b> - This contains an opaque object
    749 that the parser gets from Sema.  The parser merely preserves the
    750 information for Sema to later interpret based on the annotation token
    751 kind.</li>
    752 
    753 <li><b>TokenKind "Kind"</b> - This indicates the kind of Annotation token this
    754 is.  See below for the different valid kinds.</li>
    755 </ul>
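
<p>How an annotation token "covers" several preprocessor tokens can be shown
with a toy token stream.  Everything here is hypothetical: <tt>ToyTok</tt>,
<tt>ToyKind</tt> and <tt>tryAnnotateToyName</tt> are illustration names, a
string stands in for the opaque AnnotationValue, and no semantic lookup is
performed (a real parser would consult Sema before annotating).</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ToyKind { Identifier, ColonColon, Other, Annotation };

struct ToyTok {
  ToyKind Kind;
  unsigned Loc;         // start of the (first covered) token
  unsigned AnnotEndLoc; // annotations only: start of the last covered token
  std::string Value;    // stands in for the opaque AnnotationValue
};

// If a qualified name "id (:: id)+" starts at Pos, replace the whole run with
// one annotation token and return true. Otherwise leave the stream untouched.
bool tryAnnotateToyName(std::vector<ToyTok> &Toks, std::size_t Pos) {
  if (Pos >= Toks.size() || Toks[Pos].Kind != ToyKind::Identifier)
    return false;

  std::size_t End = Pos + 1; // one past the last covered token
  std::string Value = Toks[Pos].Value;
  while (End + 1 < Toks.size() && Toks[End].Kind == ToyKind::ColonColon &&
         Toks[End + 1].Kind == ToyKind::Identifier) {
    Value += "::" + Toks[End + 1].Value;
    End += 2;
  }
  if (End == Pos + 1)
    return false; // a plain identifier stays an identifier

  ToyTok Annot{ToyKind::Annotation, Toks[Pos].Loc, Toks[End - 1].Loc, Value};
  auto First = Toks.begin() + static_cast<std::ptrdiff_t>(Pos);
  auto Last = Toks.begin() + static_cast<std::ptrdiff_t>(End);
  Toks.insert(Toks.erase(First, Last), Annot);
  return true;
}
```

<p>For the stream <tt>a :: b :: c ;</tt>, the five tokens of <tt>a::b::c</tt>
collapse into one annotation whose location is that of <tt>a</tt> and whose
end location is that of <tt>c</tt>; the trailing <tt>;</tt> is untouched.</p>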
    756 
    757 <p>Annotation tokens currently come in three kinds:</p>
    758 
    759 <ol>
    760 <li><b>tok::annot_typename</b>: This annotation token represents a
    761 resolved typename token that is potentially qualified.  The
    762 AnnotationValue field contains the <tt>QualType</tt> returned by
    763 Sema::getTypeName(), possibly with source location information
    764 attached.</li>
    765 
    766 <li><b>tok::annot_cxxscope</b>: This annotation token represents a C++
    767 scope specifier, such as "A::B::".  This corresponds to the grammar
    768 productions "::" and ":: [opt] nested-name-specifier".  The
    769 AnnotationValue pointer is a <tt>NestedNameSpecifier*</tt> returned by
    770 the Sema::ActOnCXXGlobalScopeSpecifier and
    771 Sema::ActOnCXXNestedNameSpecifier callbacks.</li>
    772 
    773 <li><b>tok::annot_template_id</b>: This annotation token represents a
    774 C++ template-id such as "foo&lt;int, 4&gt;", where "foo" is the name
    775 of a template. The AnnotationValue pointer is a pointer to a malloc'd
    776 TemplateIdAnnotation object. Depending on the context, a parsed
    777 template-id that names a type might become a typename annotation token
    778 (if all we care about is the named type, e.g., because it occurs in a
    779 type specifier) or might remain a template-id token (if we want to
    780 retain more source location information or produce a new type, e.g.,
    781 in a declaration of a class template specialization). template-id
    782 annotation tokens that refer to a type can be "upgraded" to typename
    783 annotation tokens by the parser.</li>
    784 
    785 </ol>
    786 
<p>As mentioned above, annotation tokens are not returned by the preprocessor;
they are formed on demand by the parser.  This means that the parser has to be
    789 aware of cases where an annotation could occur and form it where appropriate.
    790 This is somewhat similar to how the parser handles Translation Phase 6 of C99:
    791 String Concatenation (see C99 5.1.1.2).  In the case of string concatenation,
    792 the preprocessor just returns distinct tok::string_literal and
    793 tok::wide_string_literal tokens and the parser eats a sequence of them wherever
    794 the grammar indicates that a string literal can occur.</p>
    795 
    796 <p>In order to do this, whenever the parser expects a tok::identifier or
    797 tok::coloncolon, it should call the TryAnnotateTypeOrScopeToken or
    798 TryAnnotateCXXScopeToken methods to form the annotation token.  These methods
    799 will maximally form the specified annotation tokens and replace the current
token with them, if applicable.  If the current token is not valid for an
    801 annotation token, it will remain an identifier or :: token.</p>
    802 
    803 
    804 
    805 <!-- ======================================================================= -->
    806 <h3 id="Lexer">The Lexer class</h3>
    807 <!-- ======================================================================= -->
    808 
    809 <p>The Lexer class provides the mechanics of lexing tokens out of a source
    810 buffer and deciding what they mean.  The Lexer is complicated by the fact that
    811 it operates on raw buffers that have not had spelling eliminated (this is a
    812 necessity to get decent performance), but this is countered with careful coding
    813 as well as standard performance techniques (for example, the comment handling
    814 code is vectorized on X86 and PowerPC hosts).</p>
    815 
    816 <p>The lexer has a couple of interesting modal features:</p>
    817 
    818 <ul>
    819 <li>The lexer can operate in 'raw' mode.  This mode has several features that
    820     make it possible to quickly lex the file (e.g. it stops identifier lookup,
    821     doesn't specially handle preprocessor tokens, handles EOF differently, etc).
    822     This mode is used for lexing within an "<tt>#if 0</tt>" block, for
    823     example.</li>
    824 <li>The lexer can capture and return comments as tokens.  This is required to
    825     support the -C preprocessor mode, which passes comments through, and is
    used by the diagnostic checker to identify expect-error annotations.</li>
    827 <li>The lexer can be in ParsingFilename mode, which happens when preprocessing
    828     after reading a #include directive.  This mode changes the parsing of '&lt;'
    829     to return an "angled string" instead of a bunch of tokens for each thing
    830     within the filename.</li>
    831 <li>When parsing a preprocessor directive (after "<tt>#</tt>") the
    ParsingPreprocessorDirective mode is entered.  This changes the lexer to
    833     return EOD at a newline.</li>
    834 <li>The Lexer uses a LangOptions object to know whether trigraphs are enabled,
    835     whether C++ or ObjC keywords are recognized, etc.</li>
    836 </ul>
    837 
    838 <p>In addition to these modes, the lexer keeps track of a couple of other
    839    features that are local to a lexed buffer, which change as the buffer is
    840    lexed:</p>
    841 
    842 <ul>
    843 <li>The Lexer uses BufferPtr to keep track of the current character being
    844     lexed.</li>
    845 <li>The Lexer uses IsAtStartOfLine to keep track of whether the next lexed token
    846     will start with its "start of line" bit set.</li>
    847 <li>The Lexer keeps track of the current #if directives that are active (which
    848     can be nested).</li>
    849 <li>The Lexer keeps track of an <a href="#MultipleIncludeOpt">
    850     MultipleIncludeOpt</a> object, which is used to
    851     detect whether the buffer uses the standard "<tt>#ifndef XX</tt> /
    852     <tt>#define XX</tt>" idiom to prevent multiple inclusion.  If a buffer does,
    853     subsequent includes can be ignored if the XX macro is defined.</li>
    854 </ul>
    855 
    856 <!-- ======================================================================= -->
    857 <h3 id="TokenLexer">The TokenLexer class</h3>
    858 <!-- ======================================================================= -->
    859 
    860 <p>The TokenLexer class is a token provider that returns tokens from a list
of tokens that came from somewhere else.  It is typically used for two things:
1) returning tokens from a macro definition as it is being expanded, and 2)
returning tokens from an arbitrary buffer of tokens.  The latter use is used by _Pragma and
    864 will most likely be used to handle unbounded look-ahead for the C++ parser.</p>
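
<p>The idea of a token provider that replays a stored list can be sketched as
below.  This is a rough model, not the real interface: <tt>ToyTokenLexer</tt>
and <tt>ToyPreprocessor</tt> are hypothetical names, tokens are plain strings,
and the provider stack is an assumption made to show how an entered token list
is drained before lexing resumes from whatever was active before it.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Replays a fixed list of tokens, one per call.
class ToyTokenLexer {
  std::vector<std::string> Tokens;
  std::size_t Next = 0;

public:
  explicit ToyTokenLexer(std::vector<std::string> Toks)
      : Tokens(std::move(Toks)) {}

  // Returns false once the list is exhausted.
  bool lex(std::string &Result) {
    if (Next == Tokens.size())
      return false;
    Result = Tokens[Next++];
    return true;
  }
};

// Keeps a stack of providers; the most recently entered one is read first.
class ToyPreprocessor {
  std::vector<ToyTokenLexer> Stack;

public:
  void enterTokenStream(std::vector<std::string> Toks) {
    Stack.emplace_back(std::move(Toks));
  }

  bool lex(std::string &Result) {
    while (!Stack.empty()) {
      if (Stack.back().lex(Result))
        return true;
      Stack.pop_back(); // provider drained; fall back to the previous one
    }
    return false;
  }
};
```

<p>If the outer stream is <tt>a b</tt> and a list <tt>1 + 2</tt> is entered
after <tt>a</tt> has been read (as a macro expansion might do), the tokens come
out as <tt>a 1 + 2 b</tt>.</p>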
    865 
    866 <!-- ======================================================================= -->
    867 <h3 id="MultipleIncludeOpt">The MultipleIncludeOpt class</h3>
    868 <!-- ======================================================================= -->
    869 
    870 <p>The MultipleIncludeOpt class implements a really simple little state machine
    871 that is used to detect the standard "<tt>#ifndef XX</tt> / <tt>#define XX</tt>"
    872 idiom that people typically use to prevent multiple inclusion of headers.  If a
    873 buffer uses this idiom and is subsequently #include'd, the preprocessor can
    874 simply check to see whether the guarding condition is defined or not.  If so,
    875 the preprocessor can completely ignore the include of the header.</p>
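
<p>A line-based approximation of such a state machine is shown below.  It is a
sketch under simplifying assumptions: <tt>detectToyIncludeGuard</tt> is a
hypothetical function, it looks at whitespace-separated words rather than real
tokens, it does not understand comments or <tt># ifndef</tt> with a space, and
it does not insist on the <tt>#define XX</tt> line.  It returns the guard macro
name, or an empty string if the buffer does not follow the idiom.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

std::string detectToyIncludeGuard(const std::vector<std::string> &Lines) {
  std::string Guard;
  unsigned Depth = 0;  // nesting depth of conditional directives
  bool Closed = false; // true once the guarding #ifndef has been closed

  for (const std::string &Line : Lines) {
    std::istringstream IS(Line);
    std::string Word, Name;
    if (!(IS >> Word))
      continue; // blank line
    if (Closed)
      return ""; // something follows the closing #endif

    if (Depth == 0) {
      // The first non-blank line must open the guard.
      if (Word != "#ifndef" || !(IS >> Name))
        return "";
      Guard = Name;
      Depth = 1;
    } else if (Word == "#if" || Word == "#ifdef" || Word == "#ifndef") {
      ++Depth;
    } else if (Word == "#endif") {
      if (--Depth == 0)
        Closed = true;
    } else if (Depth == 1 && (Word == "#else" || Word == "#elif")) {
      return ""; // the guard has an alternative branch, so it is not a guard
    }
  }
  return Closed ? Guard : "";
}
```

<p>Once a guard name is known for a header, a later <tt>#include</tt> of that
header only needs a lookup of whether the macro is currently defined.</p>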
    876 
    877 
    878 
    879 <!-- ======================================================================= -->
    880 <h2 id="libparse">The Parser Library</h2>
    881 <!-- ======================================================================= -->
    882 
    883 <!-- ======================================================================= -->
    884 <h2 id="libast">The AST Library</h2>
    885 <!-- ======================================================================= -->
    886 
    887 <!-- ======================================================================= -->
    888 <h3 id="Type">The Type class and its subclasses</h3>
    889 <!-- ======================================================================= -->
    890 
    891 <p>The Type class (and its subclasses) are an important part of the AST.  Types
    892 are accessed through the ASTContext class, which implicitly creates and uniques
    893 them as they are needed.  Types have a couple of non-obvious features: 1) they
    894 do not capture type qualifiers like const or volatile (See
    895 <a href="#QualType">QualType</a>), and 2) they implicitly capture typedef
    896 information.  Once created, types are immutable (unlike decls).</p>
    897 
    898 <p>Typedefs in C make semantic analysis a bit more complex than it would
    899 be without them.  The issue is that we want to capture typedef information
    900 and represent it in the AST perfectly, but the semantics of operations need to
    901 "see through" typedefs.  For example, consider this code:</p>
    902 
    903 <code>
    904 void func() {<br>
    905 &nbsp;&nbsp;typedef int foo;<br>
    906 &nbsp;&nbsp;foo X, *Y;<br>
    907 &nbsp;&nbsp;typedef foo* bar;<br>
    908 &nbsp;&nbsp;bar Z;<br>
    909 &nbsp;&nbsp;*X;   <i>// error</i><br>
    910 &nbsp;&nbsp;**Y;  <i>// error</i><br>
    911 &nbsp;&nbsp;**Z;  <i>// error</i><br>
    912 }<br>
    913 </code>
    914 
    915 <p>The code above is illegal, and thus we expect there to be diagnostics emitted
    916 on the annotated lines.  In this example, we expect to get:</p>
    917 
    918 <pre>
    919 <b>test.c:6:1: error: indirection requires pointer operand ('foo' invalid)</b>
    920 *X; // error
    921 <span style="color:blue">^~</span>
    922 <b>test.c:7:1: error: indirection requires pointer operand ('foo' invalid)</b>
    923 **Y; // error
    924 <span style="color:blue">^~~</span>
    925 <b>test.c:8:1: error: indirection requires pointer operand ('foo' invalid)</b>
    926 **Z; // error
    927 <span style="color:blue">^~~</span>
    928 </pre>
    929 
    930 <p>While this example is somewhat silly, it illustrates the point: we want to
    931 retain typedef information where possible, so that we can emit errors about
    932 "<tt>std::string</tt>" instead of "<tt>std::basic_string&lt;char, std:...</tt>".
    933 Doing this requires properly keeping typedef information (for example, the type
    934 of "X" is "foo", not "int"), and requires properly propagating it through the
    935 various operators (for example, the type of *Y is "foo", not "int").  In order
    936 to retain this information, the type of these expressions is an instance of the
    937 TypedefType class, which indicates that the type of these expressions is a
    938 typedef for foo.
    939 </p>
    940 
    941 <p>Representing types like this is great for diagnostics, because the
    942 user-specified type is always immediately available.  There are two problems
    943 with this: first, various semantic checks need to make judgements about the
    944 <em>actual structure</em> of a type, ignoring typedefs.  Second, we need an
    945 efficient way to query whether two types are structurally identical to each
    946 other, ignoring typedefs.  The solution to both of these problems is the idea of
    947 canonical types.</p>
    948 
    949 <!-- =============== -->
    950 <h4>Canonical Types</h4>
    951 <!-- =============== -->
    952 
    953 <p>Every instance of the Type class contains a canonical type pointer.  For
    954 simple types with no typedefs involved (e.g. "<tt>int</tt>", "<tt>int*</tt>",
    955 "<tt>int**</tt>"), the type just points to itself.  For types that have a
    956 typedef somewhere in their structure (e.g. "<tt>foo</tt>", "<tt>foo*</tt>",
    957 "<tt>foo**</tt>", "<tt>bar</tt>"), the canonical type pointer points to their
    958 structurally equivalent type without any typedefs (e.g. "<tt>int</tt>",
    959 "<tt>int*</tt>", "<tt>int**</tt>", and "<tt>int*</tt>" respectively).</p>
    960 
    961 <p>This design provides a constant time operation (dereferencing the canonical
    962 type pointer) that gives us access to the structure of types.  For example,
    963 we can trivially tell that "bar" and "foo*" are the same type by dereferencing
    964 their canonical type pointers and doing a pointer comparison (they both point
    965 to the single "<tt>int*</tt>" type).</p>
    966 
    967 <p>Canonical types and typedef types bring up some complexities that must be
    968 carefully managed.  Specifically, the "isa/cast/dyncast" operators generally
    969 shouldn't be used in code that is inspecting the AST.  For example, when type
    970 checking the indirection operator (unary '*' on a pointer), the type checker
    971 must verify that the operand has a pointer type.  It would not be correct to
    972 check that with "<tt>isa&lt;PointerType&gt;(SubExpr-&gt;getType())</tt>",
    973 because this predicate would fail if the subexpression had a typedef type.</p>
    974 
<p>The solution to this problem is a set of helper methods on Type, used to
    976 check their properties.  In this case, it would be correct to use
    977 "<tt>SubExpr-&gt;getType()-&gt;isPointerType()</tt>" to do the check.  This
    978 predicate will return true if the <em>canonical type is a pointer</em>, which is
    979 true any time the type is structurally a pointer type.  The only hard part here
    980 is remembering not to use the <tt>isa/cast/dyncast</tt> operations.</p>
    981 
    982 <p>The second problem we face is how to get access to the pointer type once we
    983 know it exists.  To continue the example, the result type of the indirection
    984 operator is the pointee type of the subexpression.  In order to determine the
    985 type, we need to get the instance of PointerType that best captures the typedef
    986 information in the program.  If the type of the expression is literally a
    987 PointerType, we can return that, otherwise we have to dig through the
    988 typedefs to find the pointer type.  For example, if the subexpression had type
    989 "<tt>foo*</tt>", we could return that type as the result.  If the subexpression
    990 had type "<tt>bar</tt>", we want to return "<tt>foo*</tt>" (note that we do
    991 <em>not</em> want "<tt>int*</tt>").  In order to provide all of this, Type has
    992 a getAsPointerType() method that checks whether the type is structurally a
    993 PointerType and, if so, returns the best one.  If not, it returns a null
    994 pointer.</p>
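
<p>The interplay of canonical type pointers, <tt>isPointerType()</tt> and
<tt>getAsPointerType()</tt> can be modelled in a few lines.  The sketch below
is a toy, not the real AST: <tt>ToyType</tt> and <tt>ToyContext</tt> are
hypothetical, only builtin, pointer and typedef types exist, and uniquing is
done with a linear search.  It reuses the <tt>foo</tt> and <tt>bar</tt>
typedefs from the example above.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ToyType {
  enum Kind { Builtin, Pointer, Typedef } K;
  std::string Name;         // Builtin and Typedef only
  const ToyType *Inner;     // pointee (Pointer) or underlying type (Typedef)
  const ToyType *Canonical; // structurally equivalent type without typedefs

  // True whenever the type is structurally a pointer, typedefs or not.
  bool isPointerType() const { return Canonical->K == Pointer; }

  // Strip typedefs only until a pointer node appears, so typedefs inside the
  // pointee survive. Returns null if the type is not structurally a pointer.
  const ToyType *getAsPointerType() const {
    if (!isPointerType())
      return nullptr;
    const ToyType *T = this;
    while (T->K == Typedef)
      T = T->Inner;
    return T;
  }
};

class ToyContext {
  std::vector<std::unique_ptr<ToyType>> Types;

  const ToyType *make(ToyType::Kind K, std::string Name, const ToyType *Inner,
                      const ToyType *Canon) {
    Types.push_back(std::unique_ptr<ToyType>(
        new ToyType{K, std::move(Name), Inner, Canon}));
    ToyType *T = Types.back().get();
    if (!T->Canonical)
      T->Canonical = T; // canonical types point to themselves
    return T;
  }

public:
  const ToyType *getBuiltin(const std::string &Name) {
    for (const auto &T : Types)
      if (T->K == ToyType::Builtin && T->Name == Name)
        return T.get();
    return make(ToyType::Builtin, Name, nullptr, nullptr);
  }

  const ToyType *getPointer(const ToyType *Pointee) {
    for (const auto &T : Types)
      if (T->K == ToyType::Pointer && T->Inner == Pointee)
        return T.get();
    const ToyType *Canon = Pointee->Canonical == Pointee
                               ? nullptr
                               : getPointer(Pointee->Canonical);
    return make(ToyType::Pointer, "", Pointee, Canon);
  }

  const ToyType *getTypedef(const std::string &Name, const ToyType *Under) {
    return make(ToyType::Typedef, Name, Under, Under->Canonical);
  }
};
```

<p>In this model <tt>bar</tt> is not itself a pointer node, yet
<tt>isPointerType()</tt> is true for it, its canonical type is the unique
<tt>int*</tt>, and <tt>getAsPointerType()</tt> yields <tt>foo*</tt> rather than
<tt>int*</tt>.</p>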
    995 
    996 <p>This structure is somewhat mystical, but after meditating on it, it will 
    997 make sense to you :).</p>
    998 
    999 <!-- ======================================================================= -->
   1000 <h3 id="QualType">The QualType class</h3>
   1001 <!-- ======================================================================= -->
   1002 
   1003 <p>The QualType class is designed as a trivial value class that is
   1004 small, passed by-value and is efficient to query.  The idea of
   1005 QualType is that it stores the type qualifiers (const, volatile,
   1006 restrict, plus some extended qualifiers required by language
   1007 extensions) separately from the types themselves.  QualType is
   1008 conceptually a pair of "Type*" and the bits for these type qualifiers.</p>
   1009 
   1010 <p>By storing the type qualifiers as bits in the conceptual pair, it is
   1011 extremely efficient to get the set of qualifiers on a QualType (just return the
   1012 field of the pair), add a type qualifier (which is a trivial constant-time
   1013 operation that sets a bit), and remove one or more type qualifiers (just return
   1014 a QualType with the bitfield set to empty).</p>
   1015 
   1016 <p>Further, because the bits are stored outside of the type itself, we do not
   1017 need to create duplicates of types with different sets of qualifiers (i.e. there
   1018 is only a single heap allocated "int" type: "const int" and "volatile const int"
both point to the same heap allocated "int" type).  This reduces the heap size
used to represent types and also means we do not have to consider qualifiers when
   1021 uniquing types (<a href="#Type">Type</a> does not even contain qualifiers).</p>
   1022 
   1023 <p>In practice, the two most common type qualifiers (const and
   1024 restrict) are stored in the low bits of the pointer to the Type
   1025 object, together with a flag indicating whether extended qualifiers
   1026 are present (which must be heap-allocated).  This means that QualType
   1027 is exactly the same size as a pointer.</p>
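
<p>Packing qualifier bits into the low bits of an aligned pointer can be
sketched as follows.  This is an illustration only: <tt>ToyQualType</tt>,
<tt>ToyTypeObj</tt>, the bit assignments and the 8-byte alignment are
assumptions of the example, and the extended-qualifier flag is reserved but not
otherwise modelled.</p>

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a heap-allocated type node. The 8-byte alignment guarantees
// that the low three bits of its address are zero.
struct alignas(8) ToyTypeObj {
  int Dummy;
};

// Hypothetical bit assignments for this sketch.
constexpr std::uintptr_t QualConst = 0x1;
constexpr std::uintptr_t QualRestrict = 0x2;
constexpr std::uintptr_t QualHasExt = 0x4; // extended qualifiers present
constexpr std::uintptr_t QualMask = 0x7;

class ToyQualType {
  std::uintptr_t Value; // pointer bits and qualifier bits together

public:
  ToyQualType(const ToyTypeObj *T, std::uintptr_t Quals)
      : Value(reinterpret_cast<std::uintptr_t>(T) | Quals) {
    assert((reinterpret_cast<std::uintptr_t>(T) & QualMask) == 0);
    assert((Quals & ~QualMask) == 0);
  }

  const ToyTypeObj *getTypePtr() const {
    return reinterpret_cast<const ToyTypeObj *>(Value & ~QualMask);
  }
  std::uintptr_t getQualifiers() const { return Value & QualMask; }
  bool isConstQualified() const { return (Value & QualConst) != 0; }

  // Adding a qualifier sets a bit; the type node itself is shared.
  ToyQualType withConst() const {
    return ToyQualType(getTypePtr(), getQualifiers() | QualConst);
  }
  // Removing all qualifiers clears the bits.
  ToyQualType getUnqualifiedType() const {
    return ToyQualType(getTypePtr(), 0);
  }
};

static_assert(sizeof(ToyQualType) == sizeof(void *),
              "the pair must fit in one pointer");
```

<p>Both the qualified and unqualified values refer to the same
<tt>ToyTypeObj</tt>; only the low bits differ, which is why no second type node
is ever allocated for "const int".</p>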
   1028 
   1029 <!-- ======================================================================= -->
   1030 <h3 id="DeclarationName">Declaration names</h3>
   1031 <!-- ======================================================================= -->
   1032 
   1033 <p>The <tt>DeclarationName</tt> class represents the name of a
   1034   declaration in Clang. Declarations in the C family of languages can
   1035   take several different forms. Most declarations are named by 
   1036   simple identifiers, e.g., "<code>f</code>" and "<code>x</code>" in
   1037   the function declaration <code>f(int x)</code>. In C++, declaration
   1038   names can also name class constructors ("<code>Class</code>"
   1039   in <code>struct Class { Class(); }</code>), class destructors
   1040   ("<code>~Class</code>"), overloaded operator names ("operator+"),
   1041   and conversion functions ("<code>operator void const *</code>"). In
   1042   Objective-C, declaration names can refer to the names of Objective-C
   1043   methods, which involve the method name and the parameters,
   1044   collectively called a <i>selector</i>, e.g.,
   1045   "<code>setWidth:height:</code>". Since all of these kinds of
   1046   entities - variables, functions, Objective-C methods, C++
   1047   constructors, destructors, and operators - are represented as
   1048   subclasses of Clang's common <code>NamedDecl</code>
   1049   class, <code>DeclarationName</code> is designed to efficiently
   1050   represent any kind of name.</p>
   1051 
   1052 <p>Given
   1053   a <code>DeclarationName</code> <code>N</code>, <code>N.getNameKind()</code>
   1054   will produce a value that describes what kind of name <code>N</code>
  stores. There are 8 options (all of the names are inside
  the <code>DeclarationName</code> class):</p>
   1057 <dl>
   1058   <dt>Identifier</dt>
   1059   <dd>The name is a simple
   1060   identifier. Use <code>N.getAsIdentifierInfo()</code> to retrieve the
   1061   corresponding <code>IdentifierInfo*</code> pointing to the actual
  identifier. C++ overloaded operator names (e.g.,
  "<code>operator+</code>") are not stored as identifiers; they have
  their own name kind, <code>CXXOperatorName</code>, described
  below.</dd>
   1067 
   1068   <dt>ObjCZeroArgSelector, ObjCOneArgSelector,
   1069   ObjCMultiArgSelector</dt>
   1070   <dd>The name is an Objective-C selector, which can be retrieved as a
   1071     <code>Selector</code> instance
   1072     via <code>N.getObjCSelector()</code>. The three possible name
   1073     kinds for Objective-C reflect an optimization within
   1074     the <code>DeclarationName</code> class: both zero- and
   1075     one-argument selectors are stored as a
   1076     masked <code>IdentifierInfo</code> pointer, and therefore require
   1077     very little space, since zero- and one-argument selectors are far
   1078     more common than multi-argument selectors (which use a different
   1079     structure).</dd>
   1080 
   1081   <dt>CXXConstructorName</dt>
   1082   <dd>The name is a C++ constructor
   1083     name. Use <code>N.getCXXNameType()</code> to retrieve
   1084     the <a href="#QualType">type</a> that this constructor is meant to
   1085     construct. The type is always the canonical type, since all
   1086     constructors for a given type have the same name.</dd>
   1087 
   1088   <dt>CXXDestructorName</dt>
   1089   <dd>The name is a C++ destructor
   1090     name. Use <code>N.getCXXNameType()</code> to retrieve
   1091     the <a href="#QualType">type</a> whose destructor is being
   1092     named. This type is always a canonical type.</dd>
   1093 
   1094   <dt>CXXConversionFunctionName</dt>
   1095   <dd>The name is a C++ conversion function. Conversion functions are
   1096   named according to the type they convert to, e.g., "<code>operator void
   1097       const *</code>". Use <code>N.getCXXNameType()</code> to retrieve
   1098   the type that this conversion function converts to. This type is
   1099     always a canonical type.</dd>
   1100 
   1101   <dt>CXXOperatorName</dt>
   1102   <dd>The name is a C++ overloaded operator name. Overloaded operators
   1103   are named according to their spelling, e.g.,
   1104   "<code>operator+</code>" or "<code>operator new
   1105   []</code>". Use <code>N.getCXXOverloadedOperator()</code> to
   1106   retrieve the overloaded operator (a value of
   1107     type <code>OverloadedOperatorKind</code>).</dd>
   1108 </dl>
   1109 
   1110 <p><code>DeclarationName</code>s are cheap to create, copy, and
   1111   compare. They require only a single pointer's worth of storage in
   1112   the common cases (identifiers, zero-
   1113   and one-argument Objective-C selectors) and use dense, uniqued
   1114   storage for the other kinds of
   1115   names. Two <code>DeclarationName</code>s can be compared for
   1116   equality (<code>==</code>, <code>!=</code>) using a simple bitwise
   1117   comparison, can be ordered
   1118   with <code>&lt;</code>, <code>&gt;</code>, <code>&lt;=</code>,
   1119   and <code>&gt;=</code> (which provide a lexicographical ordering for
   1120   normal identifiers but an unspecified ordering for other kinds of
   1121   names), and can be placed into LLVM <code>DenseMap</code>s
   1122   and <code>DenseSet</code>s.</p>
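<p>The single-pointer storage and bitwise comparison described above can be
  illustrated with a small, self-contained sketch. This is not Clang's
  implementation: <code>ToyIdentifier</code> and <code>ToyName</code> are
  hypothetical names, and the sketch assumes that identifiers are uniqued and
  8-byte aligned, so the low three bits of a pointer are free to hold a kind
  tag.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a uniqued identifier. alignas(8) keeps the low
// three bits of its address zero, leaving room for a tag.
struct alignas(8) ToyIdentifier {
  std::string Spelling;
};

// Toy name handle: one pointer-sized word holding a pointer plus a kind tag.
class ToyName {
public:
  enum Kind : std::uintptr_t {
    Identifier = 0,
    ZeroArgSelector = 1,
    OneArgSelector = 2
  };

  ToyName(const ToyIdentifier *II, Kind K)
      : Bits(reinterpret_cast<std::uintptr_t>(II) | K) {
    assert((reinterpret_cast<std::uintptr_t>(II) & KindMask) == 0 &&
           "pointer must be 8-byte aligned");
  }

  // Recover the tag from the low bits.
  Kind getKind() const { return static_cast<Kind>(Bits & KindMask); }

  // Recover the pointer by masking the tag off.
  const ToyIdentifier *getIdentifier() const {
    return reinterpret_cast<const ToyIdentifier *>(Bits & ~KindMask);
  }

  // Equality is one integer comparison; this is only meaningful because the
  // sketch assumes identifiers are uniqued.
  friend bool operator==(ToyName A, ToyName B) { return A.Bits == B.Bits; }
  friend bool operator!=(ToyName A, ToyName B) { return A.Bits != B.Bits; }

private:
  static constexpr std::uintptr_t KindMask = 0x7;
  std::uintptr_t Bits;
};

static_assert(sizeof(ToyName) == sizeof(void *),
              "a name costs one pointer's worth of storage");
```

<p>Two <code>ToyName</code> values built from the same identifier and kind
  compare equal, while the same identifier tagged as a one-argument selector
  compares unequal to the plain identifier.</p>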
   1123 
   1124 <p><code>DeclarationName</code> instances can be created in different
   1125   ways depending on what kind of name the instance will store. Normal
   1126   identifiers (<code>IdentifierInfo</code> pointers) and Objective-C selectors
   1127   (<code>Selector</code>) can be implicitly converted
   1128   to <code>DeclarationName</code>s. Names for C++ constructors,
   1129   destructors, conversion functions, and overloaded operators can be retrieved from
   1130   the <code>DeclarationNameTable</code>, an instance of which is
   1131   available as <code>ASTContext::DeclarationNames</code>. The member
   1132   functions <code>getCXXConstructorName</code>, <code>getCXXDestructorName</code>,
   1133   <code>getCXXConversionFunctionName</code>, and <code>getCXXOperatorName</code>, respectively,
   1134   return <code>DeclarationName</code> instances for the four kinds of
   1135   C++ special function names.</p>
   1136 
   1137 <!-- ======================================================================= -->
   1138 <h3 id="DeclContext">Declaration contexts</h3>
   1139 <!-- ======================================================================= -->
   1140 <p>Every declaration in a program exists within some <i>declaration
   1141     context</i>, such as a translation unit, namespace, class, or
   1142     function. Declaration contexts in Clang are represented by
   1143     the <code>DeclContext</code> class, from which the various
   1144   declaration-context AST nodes
   1145   (<code>TranslationUnitDecl</code>, <code>NamespaceDecl</code>, <code>RecordDecl</code>, <code>FunctionDecl</code>,
  etc.) derive. The <code>DeclContext</code> class provides
   1147   several facilities common to each declaration context:</p>
   1148 <dl>
   1149   <dt>Source-centric vs. Semantics-centric View of Declarations</dt>
   1150   <dd><code>DeclContext</code> provides two views of the declarations
   1151   stored within a declaration context. The source-centric view
   1152   accurately represents the program source code as written, including
   1153   multiple declarations of entities where present (see the
   1154     section <a href="#Redeclarations">Redeclarations and
   1155   Overloads</a>), while the semantics-centric view represents the
   1156   program semantics. The two views are kept synchronized by semantic
   1157   analysis while the ASTs are being constructed.</dd>
   1158 
   1159   <dt>Storage of declarations within that context</dt>
   1160   <dd>Every declaration context can contain some number of
   1161     declarations. For example, a C++ class (represented
   1162     by <code>RecordDecl</code>) contains various member functions,
   1163     fields, nested types, and so on. All of these declarations will be
   1164     stored within the <code>DeclContext</code>, and one can iterate
   1165     over the declarations via
   1166     [<code>DeclContext::decls_begin()</code>, 
   1167     <code>DeclContext::decls_end()</code>). This mechanism provides
   1168     the source-centric view of declarations in the context.</dd>
   1169 
   1170   <dt>Lookup of declarations within that context</dt>
   1171   <dd>The <code>DeclContext</code> structure provides efficient name
   1172     lookup for names within that declaration context. For example,
   1173     if <code>N</code> is a namespace we can look for the
   1174     name <code>N::f</code>
   1175     using <code>DeclContext::lookup</code>. The lookup itself is
   1176     based on a lazily-constructed array (for declaration contexts
   1177     with a small number of declarations) or hash table (for
   1178     declaration contexts with more declarations). The lookup
   1179     operation provides the semantics-centric view of the declarations
   1180     in the context.</dd>
   1181 
   1182   <dt>Ownership of declarations</dt>
   1183   <dd>The <code>DeclContext</code> owns all of the declarations that
   1184   were declared within its declaration context, and is responsible
   1185   for the management of their memory as well as their
   1186   (de-)serialization.</dd>
   1187 </dl>
   1188 
   1189 <p>All declarations are stored within a declaration context, and one
   1190   can query
   1191   information about the context in which each declaration lives. One
   1192   can retrieve the <code>DeclContext</code> that contains a
   1193   particular <code>Decl</code>
   1194   using <code>Decl::getDeclContext</code>. However, see the
   1195   section <a href="#LexicalAndSemanticContexts">Lexical and Semantic
   1196   Contexts</a> for more information about how to interpret this
   1197   context information.</p>
   1198 
   1199 <h4 id="Redeclarations">Redeclarations and Overloads</h4>
   1200 <p>Within a translation unit, it is common for an entity to be
   1201 declared several times. For example, we might declare a function "f"
   1202   and then later re-declare it as part of an inlined definition:</p>
   1203 
   1204 <pre>
   1205 void f(int x, int y, int z = 1);
   1206 
   1207 inline void f(int x, int y, int z) { /* ... */ }
   1208 </pre>
   1209 
   1210 <p>The representation of "f" differs in the source-centric and
   1211   semantics-centric views of a declaration context. In the
   1212   source-centric view, all redeclarations will be present, in the
   1213   order they occurred in the source code, making 
   1214     this view suitable for clients that wish to see the structure of
   1215     the source code. In the semantics-centric view, only the most recent "f"
   1216   will be found by the lookup, since it effectively replaces the first
   1217   declaration of "f".</p>
   1218 
   1219 <p>In the semantics-centric view, overloading of functions is
   1220   represented explicitly. For example, given two declarations of a
   1221   function "g" that are overloaded, e.g.,</p>
   1222 <pre>
   1223 void g();
   1224 void g(int);
   1225 </pre>
   1226 <p>the <code>DeclContext::lookup</code> operation will return
   1227   a <code>DeclContext::lookup_result</code> that contains a range of iterators 
   1228   over declarations of "g". Clients that perform semantic analysis on a
  program and are not concerned with the actual source code will
   1230   primarily use this semantics-centric view.</p>
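<p>The difference between the two views can be modelled with a toy declaration
  context. The sketch below is an illustration only: <code>ToyDecl</code>,
  <code>ToyDeclContext</code>, and the use of a signature string to tell
  redeclarations from overloads are assumptions made for this example, not
  Clang's data structures.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical declaration: a name plus a signature string that distinguishes
// overloads from redeclarations.
struct ToyDecl {
  std::string Name;
  std::string Signature;
};

class ToyDeclContext {
public:
  void addDecl(const ToyDecl *D) {
    assert(D && "cannot add a null declaration");
    // Source-centric view: keep every declaration, in source order.
    Decls.push_back(D);

    // Semantics-centric view: a redeclaration replaces the earlier
    // declaration with the same signature; an overload is added alongside.
    std::vector<const ToyDecl *> &Found = Table[D->Name];
    for (const ToyDecl *&Existing : Found) {
      if (Existing->Signature == D->Signature) {
        Existing = D;
        return;
      }
    }
    Found.push_back(D);
  }

  // Source-centric iteration.
  const std::vector<const ToyDecl *> &decls() const { return Decls; }

  // Semantics-centric lookup: all visible declarations with this name.
  std::vector<const ToyDecl *> lookup(const std::string &Name) const {
    auto It = Table.find(Name);
    if (It == Table.end())
      return std::vector<const ToyDecl *>();
    return It->second;
  }

private:
  std::vector<const ToyDecl *> Decls;
  std::map<std::string, std::vector<const ToyDecl *>> Table;
};
```

<p>Adding the two declarations of "f" and the two overloads of "g" from the
  examples above leaves four entries in the source-centric list, one result
  for "f" (the most recent declaration), and two results for "g".</p>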
   1231 
   1232 <h4 id="LexicalAndSemanticContexts">Lexical and Semantic Contexts</h4>
   1233 <p>Each declaration has two potentially different
   1234   declaration contexts: a <i>lexical</i> context, which corresponds to
   1235   the source-centric view of the declaration context, and
   1236   a <i>semantic</i> context, which corresponds to the
   1237   semantics-centric view. The lexical context is accessible
   1238   via <code>Decl::getLexicalDeclContext</code> while the
   1239   semantic context is accessible
   1240   via <code>Decl::getDeclContext</code>, both of which return
   1241   <code>DeclContext</code> pointers. For most declarations, the two
   1242   contexts are identical. For example:</p>
   1243 
   1244 <pre>
   1245 class X {
   1246 public:
   1247   void f(int x);
   1248 };
   1249 </pre>
   1250 
   1251 <p>Here, the semantic and lexical contexts of <code>X::f</code> are
   1252   the <code>DeclContext</code> associated with the
   1253   class <code>X</code> (itself stored as a <code>RecordDecl</code> AST
   1254   node). However, we can now define <code>X::f</code> out-of-line:</p>
   1255 
   1256 <pre>
   1257 void X::f(int x = 17) { /* ... */ }
   1258 </pre>
   1259 
<p>This definition of <code>X::f</code> has different lexical and semantic
   1261   contexts. The lexical context corresponds to the declaration
   1262   context in which the actual declaration occurred in the source
   1263   code, e.g., the translation unit containing <code>X</code>. Thus,
   1264   this declaration of <code>X::f</code> can be found by traversing
   1265   the declarations provided by
   1266   [<code>decls_begin()</code>, <code>decls_end()</code>) in the
   1267   translation unit.</p>
   1268 
   1269 <p>The semantic context of <code>X::f</code> corresponds to the
   1270   class <code>X</code>, since this member function is (semantically) a
   1271   member of <code>X</code>. Lookup of the name <code>f</code> into
   1272   the <code>DeclContext</code> associated with <code>X</code> will
   1273   then return the definition of <code>X::f</code> (including
   1274   information about the default argument).</p>
   1275 
   1276 <h4 id="TransparentContexts">Transparent Declaration Contexts</h4>
   1277 <p>In C and C++, there are several contexts in which names that are
   1278   logically declared inside another declaration will actually "leak"
   1279   out into the enclosing scope from the perspective of name
   1280   lookup. The most obvious instance of this behavior is in
   1281   enumeration types, e.g.,</p>
   1282 <pre>
   1283 enum Color {
   1284   Red, 
   1285   Green,
   1286   Blue
   1287 };
   1288 </pre>
   1289 
   1290 <p>Here, <code>Color</code> is an enumeration, which is a declaration
   1291   context that contains the
   1292   enumerators <code>Red</code>, <code>Green</code>,
   1293   and <code>Blue</code>. Thus, traversing the list of declarations
   1294   contained in the enumeration <code>Color</code> will
   1295   yield <code>Red</code>, <code>Green</code>,
   1296   and <code>Blue</code>. However, outside of the scope
   1297   of <code>Color</code> one can name the enumerator <code>Red</code>
   1298   without qualifying the name, e.g.,</p>
   1299 
   1300 <pre>
   1301 Color c = Red;
   1302 </pre>
   1303 
   1304 <p>There are other entities in C++ that provide similar behavior. For
   1305   example, linkage specifications that use curly braces:</p>
   1306 
   1307 <pre>
   1308 extern "C" {
   1309   void f(int);
   1310   void g(int);
   1311 }
   1312 // f and g are visible here
   1313 </pre>
   1314 
   1315 <p>For source-level accuracy, we treat the linkage specification and
   1316   enumeration type as a
   1317   declaration context in which its enclosed declarations ("Red",
   1318   "Green", and "Blue"; "f" and "g")
   1319   are declared. However, these declarations are visible outside of the
   1320   scope of the declaration context.</p>
   1321 
   1322 <p>These language features (and several others, described below) have
   1323   roughly the same set of 
   1324   requirements: declarations are declared within a particular lexical
   1325   context, but the declarations are also found via name lookup in
   1326   scopes enclosing the declaration itself. This feature is implemented
   1327   via <i>transparent</i> declaration contexts
   1328   (see <code>DeclContext::isTransparentContext()</code>), whose
   1329   declarations are visible in the nearest enclosing non-transparent
   1330   declaration context. This means that the lexical context of the
   1331   declaration (e.g., an enumerator) will be the
   1332   transparent <code>DeclContext</code> itself, as will the semantic
   1333   context, but the declaration will be visible in every outer context
   1334   up to and including the first non-transparent declaration context (since
   1335   transparent declaration contexts can be nested).</p>
   1336 
<p>The transparent <code>DeclContext</code>s are:</p>
   1338 <ul>
   1339   <li>Enumerations (but not C++11 "scoped enumerations"):
   1340     <pre>
   1341 enum Color { 
   1342   Red, 
   1343   Green, 
   1344   Blue 
   1345 };
   1346 // Red, Green, and Blue are in scope
   1347   </pre></li>
   1348   <li>C++ linkage specifications:
   1349   <pre>
   1350 extern "C" {
   1351   void f(int);
   1352   void g(int);
   1353 }
   1354 // f and g are in scope
   1355   </pre></li>
   1356   <li>Anonymous unions and structs:
   1357     <pre>
   1358 struct LookupTable {
   1359   bool IsVector;
   1360   union {
   1361     std::vector&lt;Item&gt; *Vector;
   1362     std::set&lt;Item&gt; *Set;
   1363   };
   1364 };
   1365 
   1366 LookupTable LT;
   1367 LT.Vector = 0; // Okay: finds Vector inside the unnamed union
   1368     </pre>
   1369   </li>
   1370   <li>C++11 inline namespaces:
   1371 <pre>
   1372 namespace mylib {
   1373   inline namespace debug {
   1374     class X;
   1375   }
   1376 }
   1377 mylib::X *xp; // okay: mylib::X refers to mylib::debug::X
   1378 </pre>
   1379 </li>
   1380 </ul>
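<p>The visibility rule for transparent contexts can be sketched as a walk up
  the chain of parents. The code below is a simplified model under stated
  assumptions: <code>ToyContext</code> and its members are hypothetical, each
  name maps to a single declaring context, and visibility is recorded eagerly
  at declaration time rather than lazily as a real lookup table might do.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical declaration context with a parent link and a transparency flag.
struct ToyContext {
  std::string Label;
  ToyContext *Parent;
  bool Transparent;
  // Name -> context in which the name was actually declared.
  std::map<std::string, const ToyContext *> Visible;

  ToyContext(std::string L, ToyContext *P, bool T)
      : Label(std::move(L)), Parent(P), Transparent(T) {}

  // Declare Name in this context. While the current context is transparent,
  // the name is also made visible in its parent, stopping after the first
  // non-transparent context has been reached.
  void declare(const std::string &Name) {
    assert(!Name.empty() && "declarations in this toy must be named");
    ToyContext *Ctx = this;
    Ctx->Visible[Name] = this;
    while (Ctx->Transparent && Ctx->Parent) {
      Ctx = Ctx->Parent;
      Ctx->Visible[Name] = this;
    }
  }

  // Return the declaring context if Name is visible here, otherwise nullptr.
  const ToyContext *lookup(const std::string &Name) const {
    auto It = Visible.find(Name);
    return It == Visible.end() ? nullptr : It->second;
  }
};
```

<p>With an unscoped enumeration nested inside a linkage specification inside
  a translation unit, an enumerator declared in the enumeration is found by
  lookup in all three contexts, and in each case the declaring context is
  still the enumeration. An enumerator of a non-transparent context (such as a
  scoped enumeration) is found only in that context.</p>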
   1381 
   1382 
   1383 <h4 id="MultiDeclContext">Multiply-Defined Declaration Contexts</h4>
   1384 <p>C++ namespaces have the interesting--and, so far, unique--property that 
   1385 the namespace can be defined multiple times, and the declarations
   1386 provided by each namespace definition are effectively merged (from
   1387 the semantic point of view). For example, the following two code
   1388 snippets are semantically indistinguishable:</p>
   1389 <pre>
   1390 // Snippet #1:
   1391 namespace N {
   1392   void f();
   1393 }
   1394 namespace N {
   1395   void f(int);
   1396 }
   1397 
   1398 // Snippet #2:
   1399 namespace N {
   1400   void f();
   1401   void f(int);
   1402 }
   1403 </pre>
   1404 
   1405 <p>In Clang's representation, the source-centric view of declaration
   1406   contexts will actually have two separate <code>NamespaceDecl</code>
   1407   nodes in Snippet #1, each of which is a declaration context that
   1408   contains a single declaration of "f". However, the semantics-centric
   1409   view provided by name lookup into the namespace <code>N</code> for
   1410   "f" will return a <code>DeclContext::lookup_result</code> that contains
   1411   a range of iterators over declarations of "f".</p>
   1412 
   1413 <p><code>DeclContext</code> manages multiply-defined declaration
   1414   contexts internally. The
   1415   function <code>DeclContext::getPrimaryContext</code> retrieves the
   1416   "primary" context for a given <code>DeclContext</code> instance,
   1417   which is the <code>DeclContext</code> responsible for maintaining
   1418   the lookup table used for the semantics-centric view. Given the
   1419   primary context, one can follow the chain
   1420   of <code>DeclContext</code> nodes that define additional
   1421   declarations via <code>DeclContext::getNextContext</code>. Note that
   1422   these functions are used internally within the lookup and insertion
   1423   methods of the <code>DeclContext</code>, so the vast majority of
   1424   clients can ignore them.</p>
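<p>As a rough illustration of the primary-context idea, the sketch below keeps
  one lookup table on the first definition of a namespace and links later
  definitions into a chain. <code>ToyNamespaceDef</code> and its members are
  hypothetical, and the table only counts declarations per name instead of
  storing them.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one "namespace N { ... }" definition.
struct ToyNamespaceDef {
  ToyNamespaceDef *Primary;         // first definition; owns the lookup table
  ToyNamespaceDef *Next = nullptr;  // next definition of the same namespace
  std::vector<std::string> Decls;   // source-centric: declared in this block
  std::map<std::string, int> Table; // used on the primary only: name -> count

  // Pass nullptr to create the first definition, or the primary definition
  // to append a further definition of the same namespace.
  explicit ToyNamespaceDef(ToyNamespaceDef *First)
      : Primary(First ? First : this) {
    assert((!First || First->Primary == First) && "must chain to the primary");
    if (First) {
      ToyNamespaceDef *Last = First;
      while (Last->Next)
        Last = Last->Next;
      Last->Next = this;
    }
  }

  // The chain stores raw pointers to its nodes, so copying is disabled.
  ToyNamespaceDef(const ToyNamespaceDef &) = delete;
  ToyNamespaceDef &operator=(const ToyNamespaceDef &) = delete;

  // Record the declaration locally and in the primary's merged table.
  void addDecl(const std::string &Name) {
    Decls.push_back(Name);
    ++Primary->Table[Name];
  }

  // Lookup always consults the primary, whichever definition is asked.
  int lookupCount(const std::string &Name) const {
    auto It = Primary->Table.find(Name);
    return It == Primary->Table.end() ? 0 : It->second;
  }
};
```

<p>Modelling Snippet #1 with two definitions that each declare "f" leaves one
  declaration in each definition's own list, while lookup through either
  definition reports both.</p>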
   1425 
   1426 <!-- ======================================================================= -->
   1427 <h3 id="CFG">The <tt>CFG</tt> class</h3>
   1428 <!-- ======================================================================= -->
   1429 
   1430 <p>The <tt>CFG</tt> class is designed to represent a source-level
   1431 control-flow graph for a single statement (<tt>Stmt*</tt>).  Typically
   1432 instances of <tt>CFG</tt> are constructed for function bodies (usually
   1433 an instance of <tt>CompoundStmt</tt>), but can also be instantiated to
   1434 represent the control-flow of any class that subclasses <tt>Stmt</tt>,
   1435 which includes simple expressions.  Control-flow graphs are especially
   1436 useful for performing
   1437 <a href="http://en.wikipedia.org/wiki/Data_flow_analysis#Sensitivities">flow-
   1438 or path-sensitive</a> program analyses on a given function.</p>
   1439 
   1440 <!-- ============ -->
   1441 <h4>Basic Blocks</h4>
   1442 <!-- ============ -->
   1443 
   1444 <p>Concretely, an instance of <tt>CFG</tt> is a collection of basic
   1445 blocks.  Each basic block is an instance of <tt>CFGBlock</tt>, which
   1446 simply contains an ordered sequence of <tt>Stmt*</tt> (each referring
   1447 to statements in the AST).  The ordering of statements within a block
   1448 indicates unconditional flow of control from one statement to the
   1449 next.  <a href="#ConditionalControlFlow">Conditional control-flow</a>
   1450 is represented using edges between basic blocks.  The statements
   1451 within a given <tt>CFGBlock</tt> can be traversed using
   1452 the <tt>CFGBlock::*iterator</tt> interface.</p>
   1453 
   1454 <p>
   1455 A <tt>CFG</tt> object owns the instances of <tt>CFGBlock</tt> within
   1456 the control-flow graph it represents.  Each <tt>CFGBlock</tt> within a
   1457 CFG is also uniquely numbered (accessible
   1458 via <tt>CFGBlock::getBlockID()</tt>).  Currently the number is
based on the order in which the blocks were created, but no assumptions
should be made about how <tt>CFGBlock</tt>s are numbered other than that
their numbers are unique and that they run from 0 to N-1 (where N is
   1462 the number of basic blocks in the CFG).</p>
   1463 
   1464 <!-- ===================== -->
   1465 <h4>Entry and Exit Blocks</h4>
   1466 <!-- ===================== -->
   1467 
<p>Each instance of <tt>CFG</tt> contains two special blocks:
an <i>entry</i> block (accessible via <tt>CFG::getEntry()</tt>), which
has no incoming edges, and an <i>exit</i> block (accessible
via <tt>CFG::getExit()</tt>), which has no outgoing edges.  Neither
block contains any statements, and they serve the role of providing a
clear entrance and exit for a body of code such as a function body.
The presence of these empty blocks greatly simplifies the
implementation of many analyses built on top of CFGs.</p>
   1476 
   1477 <!-- ===================================================== -->
   1478 <h4 id ="ConditionalControlFlow">Conditional Control-Flow</h4>
   1479 <!-- ===================================================== -->
   1480 
<p>Conditional control-flow (such as that induced by if-statements
   1482 and loops) is represented as edges between <tt>CFGBlock</tt>s.
   1483 Because different C language constructs can induce control-flow,
   1484 each <tt>CFGBlock</tt> also records an extra <tt>Stmt*</tt> that
   1485 represents the <i>terminator</i> of the block.  A terminator is simply
   1486 the statement that caused the control-flow, and is used to identify
   1487 the nature of the conditional control-flow between blocks.  For
   1488 example, in the case of an if-statement, the terminator refers to
   1489 the <tt>IfStmt</tt> object in the AST that represented the given
   1490 branch.</p>
   1491 
   1492 <p>To illustrate, consider the following code example:</p>
   1493 
<pre>
int foo(int x) {
  x = x + 1;

  if (x &gt; 2) x++;
  else {
    x += 2;
    x *= 2;
  }

  return x;
}
</pre>
   1507 
   1508 <p>After invoking the parser+semantic analyzer on this code fragment,
   1509 the AST of the body of <tt>foo</tt> is referenced by a
   1510 single <tt>Stmt*</tt>.  We can then construct an instance
   1511 of <tt>CFG</tt> representing the control-flow graph of this function
body by a single call to a static class method:</p>
   1513 
<pre>
  Stmt* FooBody = ...
  CFG*  FooCFG = <b>CFG::buildCFG</b>(FooBody);
</pre>
   1518 
   1519 <p>It is the responsibility of the caller of <tt>CFG::buildCFG</tt>
   1520 to <tt>delete</tt> the returned <tt>CFG*</tt> when the CFG is no
   1521 longer needed.</p>
   1522 
   1523 <p>Along with providing an interface to iterate over
   1524 its <tt>CFGBlock</tt>s, the <tt>CFG</tt> class also provides methods
   1525 that are useful for debugging and visualizing CFGs.  For example, the
   1526 method
   1527 <tt>CFG::dump()</tt> dumps a pretty-printed version of the CFG to
   1528 standard error.  This is especially useful when one is using a
   1529 debugger such as gdb.  For example, here is the output
   1530 of <tt>FooCFG->dump()</tt>:</p>
   1531 
<pre>
 [ B5 (ENTRY) ]
    Predecessors (0):
    Successors (1): B4

 [ B4 ]
    1: x = x + 1
    2: (x &gt; 2)
    <b>T: if [B4.2]</b>
    Predecessors (1): B5
    Successors (2): B3 B2

 [ B3 ]
    1: x++
    Predecessors (1): B4
    Successors (1): B1

 [ B2 ]
    1: x += 2
    2: x *= 2
    Predecessors (1): B4
    Successors (1): B1

 [ B1 ]
    1: return x;
    Predecessors (2): B2 B3
    Successors (1): B0

 [ B0 (EXIT) ]
    Predecessors (1): B1
    Successors (0):
</pre>
   1564 
<p>For each block, the pretty-printed output displays the number
of <i>predecessor</i> blocks (blocks that have outgoing
control-flow to the given block) and <i>successor</i> blocks (blocks
that have incoming control-flow from the given
   1569 block).  We can also clearly see the special entry and exit blocks at
   1570 the beginning and end of the pretty-printed output.  For the entry
   1571 block (block B5), the number of predecessor blocks is 0, while for the
   1572 exit block (block B0) the number of successor blocks is 0.</p>
   1573 
   1574 <p>The most interesting block here is B4, whose outgoing control-flow
   1575 represents the branching caused by the sole if-statement
   1576 in <tt>foo</tt>.  Of particular interest is the second statement in
   1577 the block, <b><tt>(x > 2)</tt></b>, and the terminator, printed
   1578 as <b><tt>if [B4.2]</tt></b>.  The second statement represents the
   1579 evaluation of the condition of the if-statement, which occurs before
   1580 the actual branching of control-flow.  Within the <tt>CFGBlock</tt>
   1581 for B4, the <tt>Stmt*</tt> for the second statement refers to the
   1582 actual expression in the AST for <b><tt>(x > 2)</tt></b>.  Thus
   1583 pointers to subclasses of <tt>Expr</tt> can appear in the list of
   1584 statements in a block, and not just subclasses of <tt>Stmt</tt> that
   1585 refer to proper C statements.</p>
   1586 
   1587 <p>The terminator of block B4 is a pointer to the <tt>IfStmt</tt>
   1588 object in the AST.  The pretty-printer outputs <b><tt>if
   1589 [B4.2]</tt></b> because the condition expression of the if-statement
   1590 has an actual place in the basic block, and thus the terminator is
   1591 essentially
   1592 <i>referring</i> to the expression that is the second statement of
   1593 block B4 (i.e., B4.2).  In this manner, conditions for control-flow
   1594 (which also includes conditions for loops and switch statements) are
   1595 hoisted into the actual basic block.</p>
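<p>To make the shape of this graph concrete, the following sketch rebuilds the
  blocks and edges of the dump above by hand. It does not use Clang's
  <tt>CFG</tt> or <tt>CFGBlock</tt> classes: <tt>ToyBlock</tt>,
  <tt>ToyCFG</tt>, and <tt>buildFooCFG</tt> are hypothetical, statements are
  stored as strings, and placing the entry block last and the exit block first
  simply mirrors the numbering in this one example.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical basic block: statements as strings, an optional terminator,
// and edges to neighbouring blocks.
struct ToyBlock {
  unsigned ID = 0;
  std::vector<std::string> Stmts;
  std::string Terminator; // empty when the block ends unconditionally
  std::vector<ToyBlock *> Preds, Succs;
};

class ToyCFG {
public:
  // Create NumBlocks empty blocks numbered 0..NumBlocks-1.
  explicit ToyCFG(unsigned NumBlocks) : Blocks(NumBlocks) {
    for (unsigned I = 0; I != NumBlocks; ++I)
      Blocks[I].ID = I;
  }

  // Edges hold raw pointers into Blocks. Moving the graph keeps them valid;
  // copying would not, so copying is disabled.
  ToyCFG(ToyCFG &&) = default;
  ToyCFG(const ToyCFG &) = delete;

  ToyBlock &block(unsigned ID) { return Blocks[ID]; }
  ToyBlock &getEntry() { return Blocks.back(); }  // highest ID in this example
  ToyBlock &getExit() { return Blocks.front(); }  // ID 0 in this example

  // Record an edge in both directions.
  void addEdge(unsigned From, unsigned To) {
    assert(From < Blocks.size() && To < Blocks.size());
    Blocks[From].Succs.push_back(&Blocks[To]);
    Blocks[To].Preds.push_back(&Blocks[From]);
  }

private:
  std::vector<ToyBlock> Blocks;
};

// Hand-built graph for the body of foo() shown above.
ToyCFG buildFooCFG() {
  ToyCFG G(6);
  G.block(4).Stmts = {"x = x + 1", "(x > 2)"};
  G.block(4).Terminator = "if [B4.2]"; // refers to the hoisted condition
  G.block(3).Stmts = {"x++"};
  G.block(2).Stmts = {"x += 2", "x *= 2"};
  G.block(1).Stmts = {"return x;"};
  G.addEdge(5, 4);
  G.addEdge(4, 3);
  G.addEdge(4, 2);
  G.addEdge(2, 1);
  G.addEdge(3, 1);
  G.addEdge(1, 0);
  return G;
}
```

<p>In the resulting graph the entry block has no predecessors, the exit block
  has no successors, and B4 is the only block with a terminator and two
  successors.</p>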
   1596 
   1597 <!-- ===================== -->
   1598 <!-- <h4>Implicit Control-Flow</h4> -->
   1599 <!-- ===================== -->
   1600 
   1601 <!--
   1602 <p>A key design principle of the <tt>CFG</tt> class was to not require
   1603 any transformations to the AST in order to represent control-flow.
   1604 Thus the <tt>CFG</tt> does not perform any "lowering" of the
   1605 statements in an AST: loops are not transformed into guarded gotos,
   1606 short-circuit operations are not converted to a set of if-statements,
   1607 and so on.</p>
   1608 -->
   1609 
   1610 
   1611 <!-- ======================================================================= -->
   1612 <h3 id="Constants">Constant Folding in the Clang AST</h3>
   1613 <!-- ======================================================================= -->
   1614 
   1615 <p>There are several places where constants and constant folding matter a lot to
   1616 the Clang front-end.  First, in general, we prefer the AST to retain the source
   1617 code as close to how the user wrote it as possible.  This means that if they
wrote "5+4", we want to keep the addition and two constants in the AST rather
than fold them to "9".  This means that constant folding in various ways turns
   1620 into a tree walk that needs to handle the various cases.</p>
   1621 
   1622 <p>However, there are places in both C and C++ that require constants to be
   1623 folded.  For example, the C standard defines what an "integer constant
   1624 expression" (i-c-e) is with very precise and specific requirements.  The
   1625 language then requires i-c-e's in a lot of places (for example, the size of a
bitfield, the value for a case statement, etc.).  For these, we have to be able
   1627 to constant fold the constants, to do semantic checks (e.g. verify bitfield size
   1628 is non-negative and that case statements aren't duplicated).  We aim for Clang
   1629 to be very pedantic about this, diagnosing cases when the code does not use an
   1630 i-c-e where one is required, but accepting the code unless running with
   1631 <tt>-pedantic-errors</tt>.</p>
   1632 
   1633 <p>Things get a little bit more tricky when it comes to compatibility with
   1634 real-world source code.  Specifically, GCC has historically accepted a huge
   1635 superset of expressions as i-c-e's, and a lot of real world code depends on this
unfortunate accident of history (including, e.g., the glibc system headers).  GCC
   1637 accepts anything its "fold" optimizer is capable of reducing to an integer
   1638 constant, which means that the definition of what it accepts changes as its
   1639 optimizer does.  One example is that GCC accepts things like "case X-X:" even
   1640 when X is a variable, because it can fold this to 0.</p>
   1641 
<p>Another issue is how constants interact with the extensions we support, such
   1643 as __builtin_constant_p, __builtin_inf, __extension__ and many others.  C99
   1644 obviously does not specify the semantics of any of these extensions, and the
   1645 definition of i-c-e does not include them.  However, these extensions are often
   1646 used in real code, and we have to have a way to reason about them.</p>
   1647 
   1648 <p>Finally, this is not just a problem for semantic analysis.  The code
   1649 generator and other clients have to be able to fold constants (e.g. to
initialize global variables) and have to handle a superset of what C99 allows.
   1651 Further, these clients can benefit from extended information.  For example, we
   1652 know that "foo()||1" always evaluates to true, but we can't replace the
   1653 expression with true because it has side effects.</p>
   1654 
   1655 <!-- ======================= -->
   1656 <h4>Implementation Approach</h4>
   1657 <!-- ======================= -->
   1658 
<p>After trying several different approaches, we've finally converged on a
design (note: at the time of this writing, not all of this has been implemented;
consider this a design goal!).  Our basic approach is to define a single
recursive evaluation method (<tt>Expr::Evaluate</tt>), which is
   1663 implemented in <tt>AST/ExprConstant.cpp</tt>.  Given an expression with 'scalar'
   1664 type (integer, fp, complex, or pointer) this method returns the following
   1665 information:</p>
   1666 
   1667 <ul>
   1668 <li>Whether the expression is an integer constant expression, a general
   1669     constant that was folded but has no side effects, a general constant that
   1670     was folded but that does have side effects, or an uncomputable/unfoldable
   1671     value.
   1672 </li>
   1673 <li>If the expression was computable in any way, this method returns the APValue
   1674     for the result of the expression.</li>
   1675 <li>If the expression is not evaluatable at all, this method returns
   1676     information on one of the problems with the expression.  This includes a
   1677     SourceLocation for where the problem is, and a diagnostic ID that explains
    the problem.  The diagnostic should have ERROR type.</li>
   1679 <li>If the expression is not an integer constant expression, this method returns
   1680     information on one of the problems with the expression.  This includes a
   1681     SourceLocation for where the problem is, and a diagnostic ID that explains
    the problem.  The diagnostic should have EXTENSION type.</li>
   1683 </ul>
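<p>The four-way classification above can be sketched with a toy evaluator. This
is only an illustration under stated assumptions, not Clang's implementation:
<tt>EvalKind</tt>, <tt>EvalResult</tt>, the toy <tt>Expr</tt> node, the helper
functions, and the diagnostic names are all hypothetical, a <tt>long long</tt>
stands in for <tt>APValue</tt>, and an <tt>int</tt> offset stands in for
<tt>SourceLocation</tt>.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Ordered from most to least restrictive, so the "worse" of two
// classifications is simply the larger enumerator.
enum class EvalKind {
  IntegerConstantExpr,
  FoldedNoSideEffects,
  FoldedWithSideEffects,
  NotFoldable
};

struct EvalResult {
  EvalKind Kind;
  long long Value;     // stand-in for APValue; unspecified when NotFoldable
  int DiagLoc;         // stand-in for SourceLocation; -1 if no diagnostic
  std::string DiagID;  // hypothetical diagnostic name; empty if none
};

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  enum Op { IntLit, Call, Comma, LogicalOr } Kind;
  int Loc;
  long long Literal;
  ExprPtr LHS, RHS;
};

ExprPtr lit(int Loc, long long V) {
  return std::make_shared<Expr>(Expr{Expr::IntLit, Loc, V, nullptr, nullptr});
}
ExprPtr call(int Loc) {
  return std::make_shared<Expr>(Expr{Expr::Call, Loc, 0, nullptr, nullptr});
}
ExprPtr binary(Expr::Op Op, int Loc, ExprPtr L, ExprPtr R) {
  return std::make_shared<Expr>(Expr{Op, Loc, 0, std::move(L), std::move(R)});
}

EvalKind worse(EvalKind A, EvalKind B) { return A < B ? B : A; }

EvalResult Evaluate(const Expr &E) {
  switch (E.Kind) {
  case Expr::IntLit:
    return {EvalKind::IntegerConstantExpr, E.Literal, -1, ""};
  case Expr::Call:
    // An opaque call has no computable value: report an ERROR-class problem.
    return {EvalKind::NotFoldable, 0, E.Loc, "err_not_constant"};
  case Expr::Comma: {
    EvalResult L = Evaluate(*E.LHS), R = Evaluate(*E.RHS);
    if (R.Kind == EvalKind::NotFoldable)
      return R;
    // C99 forbids an evaluated comma operator in an i-c-e, so the best case
    // is "folded". An unfoldable LHS is treated as a side effect.
    EvalKind K = L.Kind == EvalKind::NotFoldable
                     ? EvalKind::FoldedWithSideEffects
                     : worse(worse(L.Kind, R.Kind),
                             EvalKind::FoldedNoSideEffects);
    return {K, R.Value, E.Loc, "ext_comma_in_constant"};
  }
  case Expr::LogicalOr: {
    EvalResult L = Evaluate(*E.LHS);
    if (L.Kind != EvalKind::NotFoldable && L.Value != 0)
      return {L.Kind, 1, L.DiagLoc, L.DiagID};  // RHS is never evaluated
    EvalResult R = Evaluate(*E.RHS);
    if (R.Kind == EvalKind::NotFoldable)
      return R;
    if (L.Kind == EvalKind::NotFoldable) {
      // "foo() || 1": the value is known only when the RHS is true, and the
      // LHS still has to run for its side effects.
      if (R.Value == 0)
        return L;
      return {EvalKind::FoldedWithSideEffects, 1, L.DiagLoc,
              "ext_side_effects_in_constant"};
    }
    // The LHS folded to 0, so the result is the truth value of the RHS.
    const EvalResult &Worst = L.Kind < R.Kind ? R : L;
    return {Worst.Kind, R.Value != 0, Worst.DiagLoc, Worst.DiagID};
  }
  }
  return {EvalKind::NotFoldable, 0, E.Loc, "err_not_constant"};
}
```

<p>In this sketch, <tt>42</tt> classifies as an integer constant expression,
<tt>(1, 2)</tt> as a folded constant without side effects,
<tt>foo() || 1</tt> as a folded constant with side effects, and a bare
<tt>foo()</tt> as unfoldable.</p>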
   1684 
   1685 <p>This information gives various clients the flexibility that they want, and we
   1686 will eventually have some helper methods for various extensions.  For example,
Sema should have a <tt>Sema::VerifyIntegerConstantExpression</tt> method, which
calls <tt>Evaluate</tt> on the expression.  If the expression is not foldable,
the method emits the error and returns true.  If the expression is foldable but
is not an i-c-e, it emits the EXTENSION diagnostic.  In either foldable case it
then returns false to indicate that the AST is ok.</p>
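<p>A minimal sketch of such a helper follows. It is a free function rather than
a <tt>Sema</tt> member and takes an already-computed result instead of calling
<tt>Evaluate</tt> itself. The <tt>EvalResult</tt> and <tt>Diagnostic</tt> types
are hypothetical stand-ins, so treat this as an outline of the control flow
only.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class EvalKind {
  IntegerConstantExpr,
  FoldedNoSideEffects,
  FoldedWithSideEffects,
  NotFoldable
};

// Hypothetical, simplified result of evaluating one expression.
struct EvalResult {
  EvalKind Kind;
  long long Value;     // stand-in for APValue
  std::string DiagID;  // hypothetical diagnostic name
};

// Hypothetical record of an emitted diagnostic.
struct Diagnostic {
  enum Level { Extension, Error } Lvl;
  std::string ID;
};

// Returns true on error and false when the AST is ok, following the
// convention described in the text.
bool VerifyIntegerConstantExpression(const EvalResult &R,
                                     std::vector<Diagnostic> &Diags,
                                     long long *Result = nullptr) {
  if (R.Kind == EvalKind::NotFoldable) {
    Diags.push_back({Diagnostic::Error, R.DiagID});
    return true;
  }
  // Foldable but not a strict i-c-e: accept it as an extension.
  if (R.Kind != EvalKind::IntegerConstantExpr)
    Diags.push_back({Diagnostic::Extension, R.DiagID});
  if (Result)
    *Result = R.Value;
  return false;
}
```

<p>A client that only cares whether folding succeeded, such as codegen, could
skip this helper and inspect the classification directly.</p>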
   1692 
   1693 <p>Other clients can use the information in other ways, for example, codegen can
   1694 just use expressions that are foldable in any way.</p>
   1695 
   1696 <!-- ========== -->
   1697 <h4>Extensions</h4>
   1698 <!-- ========== -->
   1699 
<p>This section describes how some of the various extensions Clang supports
interact with constant evaluation:</p>
   1702 
   1703 <ul>
   1704 <li><b><tt>__extension__</tt></b>: The expression form of this extension causes
   1705     any evaluatable subexpression to be accepted as an integer constant
   1706     expression.</li>
   1707 <li><b><tt>__builtin_constant_p</tt></b>: This returns true (as an integer
   1708     constant expression) if the operand evaluates to either a numeric value
   1709     (that is, not a pointer cast to integral type) of integral, enumeration,
   1710     floating or complex type, or if it evaluates to the address of the first
   1711     character of a string literal (possibly cast to some other type). As a
   1712     special case, if <tt>__builtin_constant_p</tt> is the (potentially
   1713     parenthesized) condition of a conditional operator expression ("?:"), only
   1714     the true side of the conditional operator is considered, and it is evaluated
   1715     with full constant folding.</li>
   1716 <li><b><tt>__builtin_choose_expr</tt></b>: The condition is required to be an
   1717     integer constant expression, but we accept any constant as an "extension of
   1718     an extension".  This only evaluates one operand depending on which way the
   1719     condition evaluates.</li>
   1720 <li><b><tt>__builtin_classify_type</tt></b>: This always returns an integer
   1721     constant expression.</li>
   1722 <li><b><tt>__builtin_inf,nan,..</tt></b>: These are treated just like a
   1723     floating-point literal.</li>
   1724 <li><b><tt>__builtin_abs,copysign,..</tt></b>: These are constant folded as
   1725     general constant expressions.</li>
   1726 <li><b><tt>__builtin_strlen</tt></b> and <b><tt>strlen</tt></b>: These are
   1727     constant folded as integer constant expressions if the argument is a string
   1728     literal.</li>
   1729 </ul>
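<p>A few of these builtins can be exercised directly. The sketch below assumes
a GCC-compatible compiler such as Clang. The function names are hypothetical,
and <tt>__builtin_fabs</tt> is used here as one representative of the
"abs, copysign, .." family.</p>

```cpp
#include <cassert>

// Folded as a constant because the argument is a string literal.
static_assert(__builtin_strlen("clang") == 5,
              "strlen of a string literal should fold");

// A numeric literal operand makes __builtin_constant_p evaluate to true.
static_assert(__builtin_constant_p(42),
              "a literal operand is a compile-time constant");

// Treated just like a floating-point literal.
double positiveInfinity() { return __builtin_inf(); }

// Folded as general constant expressions.
double magnitude() { return __builtin_fabs(-2.5); }
double withSignOf() { return __builtin_copysign(3.0, -1.0); }
```

<p>The <tt>static_assert</tt> lines only compile if the compiler folds the
builtin calls at compile time, which is the behavior the list describes.</p>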
   1730 
   1731 
   1732 <!-- ======================================================================= -->
   1733 <h2 id="Howtos">How to change Clang</h2>
   1734 <!-- ======================================================================= -->
   1735 
   1736 <!-- ======================================================================= -->
   1737 <h3 id="AddingAttributes">How to add an attribute</h3>
   1738 <!-- ======================================================================= -->
   1739 
   1740 <p>To add an attribute, you'll have to add it to the list of attributes, add it
   1741 to the parsing phase, and look for it in the AST scan.
   1742 <a href="http://llvm.org/viewvc/llvm-project?view=rev&revision=124217">r124217</a>
   1743 has a good example of adding a warning attribute.</p>
   1744 
   1745 <p>(Beware that this hasn't been reviewed/fixed by the people who designed the
   1746 attributes system yet.)</p>
   1747 
   1748 <h4><a
   1749 href="http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Attr.td?view=markup">include/clang/Basic/Attr.td</a></h4>
   1750 
   1751 <p>Each attribute gets a <tt>def</tt> inheriting from <tt>Attr</tt> or one of
   1752 its subclasses.  <tt>InheritableAttr</tt> means that the attribute also applies
   1753 to subsequent declarations of the same name.</p>
   1754 
   1755 <p><tt>Spellings</tt> lists the strings that can appear in
   1756 <tt>__attribute__((here))</tt> or <tt>[[here]]</tt>.  All such strings
   1757 will be synonymous.  If you want to allow the <tt>[[]]</tt> C++11
   1758 syntax, you have to define a list of <tt>Namespaces</tt>, which will
let users write <tt>[[namespace::spelling]]</tt>.  Using the empty
string for a namespace will allow users to write just the spelling
with no "<tt>::</tt>".</p>
   1762 
<p><tt>Subjects</tt> restricts the kinds of AST node to which this attribute
can appertain (roughly, attach).</p>
   1765 
   1766 <p><tt>Args</tt> names the arguments the attribute takes, in order. If
   1767 <tt>Args</tt> is <tt>[StringArgument&lt;"Arg1">, IntArgument&lt;"Arg2">]</tt>
   1768 then <tt>__attribute__((myattribute("Hello", 3)))</tt> will be a valid use.</p>
   1769 
   1770 <h4>Boilerplate</h4>
   1771 
   1772 <p>Write a new <tt>HandleYourAttr()</tt> function in <a
   1773 href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDeclAttr.cpp?view=markup">lib/Sema/SemaDeclAttr.cpp</a>,
   1774 and add a case to the switch in <tt>ProcessNonInheritableDeclAttr()</tt> or
   1775 <tt>ProcessInheritableDeclAttr()</tt> forwarding to it.</p>
   1776 
   1777 <p>If your attribute causes extra warnings to fire, define a <tt>DiagGroup</tt>
   1778 in <a
   1779 href="http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticGroups.td?view=markup">include/clang/Basic/DiagnosticGroups.td</a>
   1780 named after the attribute's <tt>Spelling</tt> with "_"s replaced by "-"s.  If
   1781 you're only defining one diagnostic, you can skip <tt>DiagnosticGroups.td</tt>
   1782 and use <tt>InGroup&lt;DiagGroup&lt;"your-attribute">></tt> directly in <a
href="http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td?view=markup">DiagnosticSemaKinds.td</a>.</p>
   1784 
   1785 <h4>The meat of your attribute</h4>
   1786 
   1787 <p>Find an appropriate place in Clang to do whatever your attribute needs to do.
   1788 Check for the attribute's presence using <tt>Decl::getAttr&lt;YourAttr>()</tt>.</p>
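<p>The lookup pattern can be illustrated with a self-contained model. The
<tt>Attr</tt>, <tt>YourAttr</tt>, <tt>OtherAttr</tt>, and <tt>Decl</tt> types
below are simplified stand-ins written for this example, not Clang's classes.
In particular, the model uses <tt>dynamic_cast</tt> for brevity, which is an
assumption of the sketch rather than a description of how Clang identifies
attribute kinds.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the attribute hierarchy.
struct Attr {
  virtual ~Attr() = default;
};

struct YourAttr : Attr {
  explicit YourAttr(std::string M) : Message(std::move(M)) {}
  std::string Message;
};

struct OtherAttr : Attr {};

// Simplified stand-in for a declaration that owns its attributes.
struct Decl {
  std::vector<std::unique_ptr<Attr>> Attrs;

  // Returns the first attribute of type T, or null if there is none.
  template <typename T> T *getAttr() const {
    for (const auto &A : Attrs)
      if (T *Found = dynamic_cast<T *>(A.get()))
        return Found;
    return nullptr;
  }
};

// "The meat": act only when the attribute is present on the declaration.
std::string describe(const Decl &D) {
  if (const YourAttr *A = D.getAttr<YourAttr>())
    return "has your_attr: " + A->Message;
  return "no your_attr";
}
```

<p>The shape to notice is that the null result doubles as the "attribute is
absent" signal, so the presence check and the access to the attribute's
arguments happen in one step.</p>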
   1789 
   1790 <p>Update the <a href="LanguageExtensions.html">Clang Language Extensions</a>
   1791 document to describe your new attribute.</p>
   1792 
   1793 <!-- ======================================================================= -->
   1794 <h3 id="AddingExprStmt">How to add an expression or statement</h3>
   1795 <!-- ======================================================================= -->
   1796 
<p>Expressions and statements are among the most fundamental constructs within a
   1798 compiler, because they interact with many different parts of the AST,
   1799 semantic analysis, and IR generation. Therefore, adding a new
   1800 expression or statement kind into Clang requires some care. The following list
   1801 details the various places in Clang where an expression or statement needs to be
   1802 introduced, along with patterns to follow to ensure that the new
   1803 expression or statement works well across all of the C languages. We
   1804 focus on expressions, but statements are similar.</p>
   1805 
   1806 <ol>
   1807   <li>Introduce parsing actions into the parser. Recursive-descent
   1808   parsing is mostly self-explanatory, but there are a few things that
   1809   are worth keeping in mind:
   1810   <ul>
   1811     <li>Keep as much source location information as possible! You'll
   1812     want it later to produce great diagnostics and support Clang's
   1813     various features that map between source code and the AST.</li>
   1814    <li>Write tests for all of the "bad" parsing cases, to make sure
   1815     your recovery is good. If you have matched delimiters (e.g.,
   1816     parentheses, square brackets, etc.), use
   1817     <tt>Parser::BalancedDelimiterTracker</tt> to give nice diagnostics when
   1818     things go wrong.</li>
   1819   </ul>
   1820   </li>
   1821 
   1822   <li>Introduce semantic analysis actions into <tt>Sema</tt>. Semantic
   1823   analysis should always involve two functions: an <tt>ActOnXXX</tt>
   1824   function that will be called directly from the parser, and a
   1825   <tt>BuildXXX</tt> function that performs the actual semantic
   1826   analysis and will (eventually!) build the AST node. It's fairly
  common for the <tt>ActOnXXX</tt> function to do very little (often
   1828   just some minor translation from the parser's representation to
   1829   <tt>Sema</tt>'s representation of the same thing), but the separation
   1830   is still important: C++ template instantiation, for example,
   1831   should always call the <tt>BuildXXX</tt> variant. Several notes on
   1832   semantic analysis before we get into construction of the AST:
   1833   <ul>
   1834     <li>Your expression probably involves some types and some
   1835     subexpressions. Make sure to fully check that those types, and the
   1836     types of those subexpressions, meet your expectations. Add
   1837     implicit conversions where necessary to make sure that all of the
   1838     types line up exactly the way you want them. Write extensive tests
   1839     to check that you're getting good diagnostics for mistakes and
   1840     that you can use various forms of subexpressions with your
   1841     expression.</li>
   1842    <li>When type-checking a type or subexpression, make sure to first
   1843     check whether the type is "dependent"
   1844     (<tt>Type::isDependentType()</tt>) or whether a subexpression is
   1845     type-dependent (<tt>Expr::isTypeDependent()</tt>). If any of these
   1846     return true, then you're inside a template and you can't do much
   1847     type-checking now. That's normal, and your AST node (when you get
   1848     there) will have to deal with this case. At this point, you can
   1849     write tests that use your expression within templates, but don't
   1850     try to instantiate the templates.</li>
   1851    <li>For each subexpression, be sure to call
   1852     <tt>Sema::CheckPlaceholderExpr()</tt> to deal with "weird"
   1853     expressions that don't behave well as subexpressions. Then,
   1854     determine whether you need to perform
   1855     lvalue-to-rvalue conversions
    (<tt>Sema::DefaultLvalueConversion</tt>) or
   1857     the usual unary conversions
   1858     (<tt>Sema::UsualUnaryConversions</tt>), for places where the
   1859     subexpression is producing a value you intend to use.</li>
   1860     <li>Your <tt>BuildXXX</tt> function will probably just return
   1861     <tt>ExprError()</tt> at this point, since you don't have an AST.
   1862     That's perfectly fine, and shouldn't impact your testing.</li>
   1863   </ul>
   1864   </li>
   1865 
   1866   <li>Introduce an AST node for your new expression. This starts with
  declaring the node in <tt>include/clang/Basic/StmtNodes.td</tt> and
  creating a new class for your expression in the appropriate
  <tt>include/clang/AST/Expr*.h</tt> header. It's best to look at the class
   1870   for a similar expression to get ideas, and there are some specific
   1871   things to watch for:
   1872   <ul>
   1873     <li>If you need to allocate memory, use the <tt>ASTContext</tt>
   1874     allocator to allocate memory. Never use raw <tt>malloc</tt> or
   1875     <tt>new</tt>, and never hold any resources in an AST node, because
   1876     the destructor of an AST node is never called.</li>
   1877 
   1878     <li>Make sure that <tt>getSourceRange()</tt> covers the exact
   1879     source range of your expression. This is needed for diagnostics
   1880     and for IDE support.</li>
   1881 
   1882     <li>Make sure that <tt>children()</tt> visits all of the
   1883     subexpressions. This is important for a number of features (e.g., IDE
   1884     support, C++ variadic templates). If you have sub-types, you'll
   1885     also need to visit those sub-types in the
   1886     <tt>RecursiveASTVisitor</tt>.</li>
   1887 
   1888     <li>Add printing support (<tt>StmtPrinter.cpp</tt>) and dumping
   1889     support (<tt>StmtDumper.cpp</tt>) for your expression.</li>
   1890 
   1891     <li>Add profiling support (<tt>StmtProfile.cpp</tt>) for your AST
   1892     node, noting the distinguishing (non-source location)
   1893     characteristics of an instance of your expression. Omitting this
   1894     step will lead to hard-to-diagnose failures regarding matching of
   1895     template declarations.</li>
   1896   </ul>
   1897   </li>
   1898 
   1899   <li>Teach semantic analysis to build your AST node! At this point,
   1900   you can wire up your <tt>Sema::BuildXXX</tt> function to actually
   1901   create your AST. A few things to check at this point:
   1902   <ul>
   1903     <li>If your expression can construct a new C++ class or return a
   1904     new Objective-C object, be sure to update and then call
   1905     <tt>Sema::MaybeBindToTemporary</tt> for your just-created AST node
   1906     to be sure that the object gets properly destructed. An easy way
   1907     to test this is to return a C++ class with a private destructor:
   1908     semantic analysis should flag an error here with the attempt to
   1909     call the destructor.</li>
   1910    <li>Inspect the generated AST by printing it using <tt>clang -cc1
   1911     -ast-print</tt>, to make sure you're capturing all of the
   1912     important information about how the AST was written.</li>
   1913    <li>Inspect the generated AST under <tt>clang -cc1 -ast-dump</tt>
   1914     to verify that all of the types in the generated AST line up the
   1915     way you want them. Remember that clients of the AST should never
   1916     have to "think" to understand what's going on. For example, all
   1917     implicit conversions should show up explicitly in the AST.</li>
   1918     <li>Write tests that use your expression as a subexpression of
   1919     other, well-known expressions. Can you call a function using your
   1920     expression as an argument? Can you use the ternary operator?</li>
   1921   </ul>
   1922   </li>
   1923 
  <li>Teach code generation to create IR for your AST node. This step
  is the first (and only) one that requires knowledge of LLVM IR. There
   1926   are several things to keep in mind:
   1927   <ul>
   1928     <li>Code generation is separated into scalar/aggregate/complex and
   1929     lvalue/rvalue paths, depending on what kind of result your
   1930     expression produces. On occasion, this requires some careful
   1931     factoring of code to avoid duplication.</li>
   1932 
   1933     <li><tt>CodeGenFunction</tt> contains functions
   1934     <tt>ConvertType</tt> and <tt>ConvertTypeForMem</tt> that convert
   1935     Clang's types (<tt>clang::Type*</tt> or <tt>clang::QualType</tt>)
   1936     to LLVM types.
    Use the former for values, and the latter for memory locations:
   1938     test with the C++ "bool" type to check this. If you find
   1939     that you are having to use LLVM bitcasts to make
   1940     the subexpressions of your expression have the type that your
   1941     expression expects, STOP! Go fix semantic analysis and the AST so
   1942     that you don't need these bitcasts.</li>
   1943     
   1944     <li>The <tt>CodeGenFunction</tt> class has a number of helper
   1945     functions to make certain operations easy, such as generating code
   1946     to produce an lvalue or an rvalue, or to initialize a memory
   1947     location with a given value. Prefer to use these functions rather
   1948     than directly writing loads and stores, because these functions
   1949     take care of some of the tricky details for you (e.g., for
   1950     exceptions).</li>
   1951 
   1952     <li>If your expression requires some special behavior in the event
   1953     of an exception, look at the <tt>push*Cleanup</tt> functions in
   1954     <tt>CodeGenFunction</tt> to introduce a cleanup. You shouldn't
   1955     have to deal with exception-handling directly.</li>
   1956 
   1957     <li>Testing is extremely important in IR generation. Use <tt>clang
   1958     -cc1 -emit-llvm</tt> and <a
   1959     href="http://llvm.org/cmds/FileCheck.html">FileCheck</a> to verify
   1960     that you're generating the right IR.</li>
   1961   </ul>
   1962   </li>
   1963 
   1964   <li>Teach template instantiation how to cope with your AST
   1965   node, which requires some fairly simple code:
   1966   <ul>
   1967     <li>Make sure that your expression's constructor properly
   1968     computes the flags for type dependence (i.e., the type your
   1969     expression produces can change from one instantiation to the
   1970     next), value dependence (i.e., the constant value your expression
   1971     produces can change from one instantiation to the next),
   1972     instantiation dependence (i.e., a template parameter occurs
   1973     anywhere in your expression), and whether your expression contains
   1974     a parameter pack (for variadic templates). Often, computing these
   1975     flags just means combining the results from the various types and
   1976     subexpressions.</li>
   1977     
   1978     <li>Add <tt>TransformXXX</tt> and <tt>RebuildXXX</tt> functions to
   1979     the
   1980     <tt>TreeTransform</tt> class template in <tt>Sema</tt>.
   1981     <tt>TransformXXX</tt> should (recursively) transform all of the
   1982     subexpressions and types
   1983     within your expression, using <tt>getDerived().TransformYYY</tt>.
   1984     If all of the subexpressions and types transform without error, it
   1985     will then call the <tt>RebuildXXX</tt> function, which will in
   1986     turn call <tt>getSema().BuildXXX</tt> to perform semantic analysis
   1987     and build your expression.</li>
   1988     
   1989     <li>To test template instantiation, take those tests you wrote to
   1990     make sure that you were type checking with type-dependent
   1991     expressions and dependent types (from step #2) and instantiate
   1992     those templates with various types, some of which type-check and
   1993     some that don't, and test the error messages in each case.</li>
   1994   </ul>
   1995   </li>
   1996   
   1997   <li>There are some "extras" that make other features work better.
   1998   It's worth handling these extras to give your expression complete
   1999   integration into Clang:
   2000   <ul>
   2001     <li>Add code completion support for your expression in
   2002     <tt>SemaCodeComplete.cpp</tt>.</li>
   2003     
   2004     <li>If your expression has types in it, or has any "interesting"
   2005     features other than subexpressions, extend libclang's
   2006     <tt>CursorVisitor</tt> to provide proper visitation for your
   2007     expression, enabling various IDE features such as syntax
   2008     highlighting, cross-referencing, and so on. The
   2009     <tt>c-index-test</tt> helper program can be used to test these
   2010     features.</li>
   2011   </ul>
   2012   </li>
   2013 </ol>
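<p>The <tt>ActOnXXX</tt>/<tt>BuildXXX</tt> split from step 2 and the
<tt>TransformXXX</tt>/<tt>RebuildXXX</tt> pair from step 6 can be sketched with
a toy model. Everything here is a simplified assumption made for illustration:
the one-operand "negate" expression, the string-based types, the
<tt>Instantiator</tt> class, and the <tt>TransformLeaf</tt> hook are
hypothetical, and the real <tt>Sema</tt> and <tt>TreeTransform</tt> interfaces
are considerably richer.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy AST node: a type name plus an optional operand. The type "T" marks a
// type-dependent expression, i.e. one that mentions a template parameter.
struct Expr {
  std::string Type;
  std::shared_ptr<Expr> Operand;  // null for leaves
  bool isTypeDependent() const { return Type == "T"; }
};
using ExprResult = std::shared_ptr<Expr>;  // null plays the role of ExprError()

struct Sema {
  // BuildXXX: performs the semantic checks and creates the node.
  ExprResult BuildNegate(ExprResult Operand) {
    if (!Operand)
      return nullptr;
    if (Operand->isTypeDependent())  // cannot type-check yet, so defer
      return std::make_shared<Expr>(Expr{"T", Operand});
    if (Operand->Type != "int" && Operand->Type != "double")
      return nullptr;  // a real implementation would emit a diagnostic
    return std::make_shared<Expr>(Expr{Operand->Type, Operand});
  }

  // ActOnXXX: called by the parser; it only translates and forwards.
  ExprResult ActOnNegate(ExprResult Parsed) {
    return BuildNegate(std::move(Parsed));
  }
};

// CRTP transformer: subclasses override individual steps, and every step is
// reached through getDerived().
template <typename Derived> struct TreeTransform {
  Sema &Actions;
  explicit TreeTransform(Sema &S) : Actions(S) {}
  Derived &getDerived() { return static_cast<Derived &>(*this); }
  Sema &getSema() { return Actions; }

  ExprResult TransformExpr(const ExprResult &E) {
    return E->Operand ? getDerived().TransformNegate(E)
                      : getDerived().TransformLeaf(E);
  }
  ExprResult TransformLeaf(const ExprResult &E) { return E; }
  ExprResult TransformNegate(const ExprResult &E) {
    ExprResult Sub = getDerived().TransformExpr(E->Operand);
    if (!Sub)
      return nullptr;
    return getDerived().RebuildNegate(std::move(Sub));
  }
  // RebuildXXX re-runs semantic analysis through the BuildXXX entry point.
  ExprResult RebuildNegate(ExprResult Sub) {
    return getSema().BuildNegate(std::move(Sub));
  }
};

// "Instantiation": replace each dependent leaf with a concrete type.
struct Instantiator : TreeTransform<Instantiator> {
  std::string Replacement;
  Instantiator(Sema &S, std::string R)
      : TreeTransform<Instantiator>(S), Replacement(std::move(R)) {}
  ExprResult TransformLeaf(const ExprResult &E) {
    if (!E->isTypeDependent())
      return E;
    return std::make_shared<Expr>(Expr{Replacement, nullptr});
  }
};
```

<p>Because <tt>RebuildNegate</tt> goes through <tt>BuildNegate</tt> rather than
<tt>ActOnNegate</tt>, instantiating with a type the expression accepts
type-checks and yields a node, while instantiating with one it rejects fails
with the same check the parser path would have hit.</p>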
   2014 
   2015 </div>
   2016 </body>
   2017 </html>
   2018