<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Introduction to the Clang AST</title>
<link type="text/css" rel="stylesheet" href="../menu.css" />
<link type="text/css" rel="stylesheet" href="../content.css" />
</head>
<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Introduction to the Clang AST</h1>
<p>This document gives a gentle introduction to the mysteries of the Clang AST.
It is aimed at developers who want to contribute to Clang or to use
tools built on Clang's AST, such as the AST matchers.</p>
<!-- FIXME: Add link once we have an AST matcher document -->

<!-- ======================================================================= -->
<h2 id="intro">Introduction</h2>
<!-- ======================================================================= -->

<p>Clang's AST differs from the ASTs produced by some other compilers in that it closely
resembles both the written C++ code and the C++ standard. For example,
parenthesis expressions and compile-time constants are available in an unreduced
form in the AST. This makes Clang's AST a good fit for refactoring tools.</p>

<p>Documentation for all Clang AST nodes is available via the generated
<a href="http://clang.llvm.org/doxygen">Doxygen</a>. The online Doxygen
documentation is also indexed by search engines, so a search for "clang"
plus the AST node's class name usually turns up the documentation of the
class you're looking for (for example, search for: clang ParenExpr).</p>

<!-- ======================================================================= -->
<h2 id="examine">Examining the AST</h2>
<!-- ======================================================================= -->

<p>A good way to familiarize yourself with the Clang AST is to look
at it for some simple example code. Clang has built-in AST dump modes, which
can be enabled with the flags -ast-dump and -ast-dump-xml. Note that -ast-dump-xml
currently only works with debug builds of clang.</p>

<p>Let's look at a simple example AST:</p>
<pre>
$ cat test.cc
int f(int x) {
  int result = (x / 42);
  return result;
}

# Clang by default is a frontend for many tools; -cc1 tells it to directly
# use the C++ compiler mode. -undef leaves out some internal declarations.
$ clang -cc1 -undef -ast-dump-xml test.cc
... cutting out internal declarations of clang ...
&lt;TranslationUnit ptr="0x4871160">
 &lt;Function ptr="0x48a5800" name="f" prototype="true">
  &lt;FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
   &lt;BuiltinType ptr="0x4871250" canonical="0x4871250"/>
   &lt;parameters>
    &lt;BuiltinType ptr="0x4871250" canonical="0x4871250"/>
   &lt;/parameters>
  &lt;/FunctionProtoType>
  &lt;ParmVar ptr="0x4871d80" name="x" initstyle="c">
   &lt;BuiltinType ptr="0x4871250" canonical="0x4871250"/>
  &lt;/ParmVar>
  &lt;Stmt>
(CompoundStmt 0x48a5a38 &lt;t2.cc:1:14, line:4:1>
  (DeclStmt 0x48a59c0 &lt;line:2:3, col:24>
    0x48a58c0 "int result =
      (ParenExpr 0x48a59a0 &lt;col:16, col:23> 'int'
        (BinaryOperator 0x48a5978 &lt;col:17, col:21> 'int' '/'
          (ImplicitCastExpr 0x48a5960 &lt;col:17> 'int' &lt;LValueToRValue>
            (DeclRefExpr 0x48a5918 &lt;col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
          (IntegerLiteral 0x48a5940 &lt;col:21> 'int' 42)))")
  (ReturnStmt 0x48a5a18 &lt;line:3:3, col:10>
    (ImplicitCastExpr 0x48a5a00 &lt;col:10> 'int' &lt;LValueToRValue>
      (DeclRefExpr 0x48a59d8 &lt;col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

  &lt;/Stmt>
 &lt;/Function>
&lt;/TranslationUnit>
</pre>
<p>In general, -ast-dump-xml dumps declarations in an XML-style format and
statements in an S-expression-style format.
The top-level declaration in a translation unit is always the
<a href="http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html">translation unit declaration</a>.
In this example, our first user-written declaration is the
<a href="http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html">function declaration</a>
of 'f'. The body of 'f' is a <a href="http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html">compound statement</a>,
whose child nodes are a <a href="http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html">declaration statement</a>
that declares our result variable, and the
<a href="http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html">return statement</a>.</p>
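<p>To make the nesting described above concrete, here is a minimal, self-contained
C++ sketch that rebuilds the statement tree of 'f' by hand and prints it in a
parenthesized form similar to the dump. ToyNode, dump and buildBodyOfF are
hypothetical names invented for this illustration; they are not part of Clang's API,
and the output only approximates the real dump format (pointers, source ranges and
types are omitted).</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an AST node. This is NOT clang::Stmt; all names here are
// hypothetical and exist only for this illustration.
struct ToyNode {
  std::string Kind;               // node class, e.g. "CompoundStmt"
  std::string Detail;             // extra text such as a name; may be empty
  std::vector<ToyNode> Children;  // child nodes in source order
};

// Render a subtree in a parenthesized, indented form: each node opens with
// "(Kind Detail", its children follow on their own lines, and ")" closes it.
std::string dump(const ToyNode &N, unsigned Depth = 0) {
  assert(!N.Kind.empty() && "every node needs a kind");
  std::string Out(Depth * 2, ' ');
  Out += "(" + N.Kind;
  if (!N.Detail.empty())
    Out += " " + N.Detail;
  for (const ToyNode &Child : N.Children)
    Out += "\n" + dump(Child, Depth + 1);
  return Out + ")";
}

// Hand-built model of the body of f from test.cc:
//   int result = (x / 42);
//   return result;
ToyNode buildBodyOfF() {
  ToyNode Division{"BinaryOperator", "'/'",
                   {ToyNode{"ImplicitCastExpr", "<LValueToRValue>",
                            {ToyNode{"DeclRefExpr", "'x'", {}}}},
                    ToyNode{"IntegerLiteral", "42", {}}}};
  ToyNode Decl{"DeclStmt", "result", {ToyNode{"ParenExpr", "", {Division}}}};
  ToyNode Ret{"ReturnStmt", "",
              {ToyNode{"ImplicitCastExpr", "<LValueToRValue>",
                       {ToyNode{"DeclRefExpr", "'result'", {}}}}}};
  // The compound statement has exactly two children, as in the real dump.
  return ToyNode{"CompoundStmt", "", {Decl, Ret}};
}
```

<p>Calling dump(buildBodyOfF()) yields a tree whose root is the compound statement,
with the declaration statement and the return statement as its two children. Note
that the ParenExpr node is kept rather than folded away, mirroring the unreduced
form of the real AST.</p>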

<!-- ======================================================================= -->
<h2 id="context">AST Context</h2>
<!-- ======================================================================= -->

<p>All information about the AST for a translation unit is bundled up in the class
<a href="http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html">ASTContext</a>.
It allows traversal of the whole translation unit starting from
<a href="http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64">getTranslationUnitDecl</a>,
and provides access to Clang's <a href="http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4">table of identifiers</a>
for the parsed translation unit.</p>
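<p>The role of such a context can be sketched with a small stand-in class. The
sketch below assumes a design in which the context owns every node, hands out the
translation unit as the single entry point, and interns identifier spellings so
that equal names share one table entry. ToyContext, ToyDecl, createDecl and
getIdentifier are hypothetical names; only getTranslationUnitDecl mirrors the name
of the real ASTContext method, and nothing here reflects Clang's actual
implementation.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Toy declaration node (hypothetical; not clang::Decl).
struct ToyDecl {
  const std::string *Name = nullptr;  // points into the context's identifier table
  std::vector<ToyDecl *> Members;     // nested declarations, not owned by this node
};

// Toy context: owns all nodes and the identifier table for one translation unit.
class ToyContext {
public:
  ToyContext() { TU = createDecl("<translation unit>"); }

  // Intern a spelling: equal strings always yield the same pointer.
  // Pointers to unordered_set elements stay valid when the set grows.
  const std::string *getIdentifier(const std::string &Spelling) {
    return &*Identifiers.insert(Spelling).first;
  }

  // Allocate a node whose lifetime is tied to the context.
  ToyDecl *createDecl(const std::string &Name) {
    Storage.push_back(std::make_unique<ToyDecl>());
    Storage.back()->Name = getIdentifier(Name);
    return Storage.back().get();
  }

  // Entry point for walking everything in this translation unit.
  ToyDecl *getTranslationUnitDecl() {
    assert(TU && "translation unit is created in the constructor");
    return TU;
  }

private:
  std::unordered_set<std::string> Identifiers;
  std::vector<std::unique_ptr<ToyDecl>> Storage;
  ToyDecl *TU = nullptr;
};
```

<p>With this shape, a client only needs the context object: it can reach every
declaration through getTranslationUnitDecl() and compare names by pointer, because
two declarations called 'f' share the same identifier entry.</p>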

<!-- ======================================================================= -->
<h2 id="nodes">AST Nodes</h2>
<!-- ======================================================================= -->

<p>Clang's AST nodes are modeled on a class hierarchy that does not have a common
ancestor. Instead, there are multiple larger hierarchies for basic node types like
<a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a> and
<a href="http://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>. Many
important AST nodes derive from <a href="http://clang.llvm.org/doxygen/classclang_1_1Type.html">Type</a>,
<a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>,
<a href="http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html">DeclContext</a> or
<a href="http://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>,
with some classes deriving from both Decl and DeclContext.</p>
<p>There are also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes,
like <a href="http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>.
</p>

<p>Thus, to traverse the full AST, one starts from the <a href="http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html">TranslationUnitDecl</a>
and then recursively traverses everything that can be reached from that node;
which nodes are reachable has to be encoded for each specific node type. This algorithm
is implemented in the <a href="http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html">RecursiveASTVisitor</a>.
See the <a href="http://clang.llvm.org/docs/RAVFrontendAction.html">RecursiveASTVisitor tutorial</a> for details.</p>
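<p>The idea of encoding, per node type, how to reach its children can be sketched
without any Clang dependency. The example below is a toy model, loosely shaped
like a CRTP visitor: the base class knows which children each node type has, and
a derived class only overrides the hooks it cares about. ToyExpr, ToyLiteral,
ToyParen, ToyBinary, ToyRecursiveVisitor, LiteralCollector and buildSample are all
hypothetical names; this is not the RecursiveASTVisitor interface itself.</p>

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy expression nodes (hypothetical; not Clang classes). Each node type
// stores its children differently, so traversal must know every type.
struct ToyExpr {
  virtual ~ToyExpr() = default;
};

struct ToyLiteral : ToyExpr {
  int Value;
  explicit ToyLiteral(int V) : Value(V) {}
};

struct ToyParen : ToyExpr {
  std::unique_ptr<ToyExpr> Sub;
  explicit ToyParen(std::unique_ptr<ToyExpr> S) : Sub(std::move(S)) {}
};

struct ToyBinary : ToyExpr {
  std::unique_ptr<ToyExpr> LHS, RHS;
  ToyBinary(std::unique_ptr<ToyExpr> L, std::unique_ptr<ToyExpr> R)
      : LHS(std::move(L)), RHS(std::move(R)) {}
};

// Pre-order traversal: call the hook for a node, then recurse into the
// children that this particular node type is known to have.
template <typename Derived> class ToyRecursiveVisitor {
public:
  void traverse(const ToyExpr &E) {
    if (const auto *P = dynamic_cast<const ToyParen *>(&E)) {
      derived().visitParen(*P);
      assert(P->Sub && "a parenthesized expression has one operand");
      traverse(*P->Sub);
    } else if (const auto *B = dynamic_cast<const ToyBinary *>(&E)) {
      derived().visitBinary(*B);
      assert(B->LHS && B->RHS && "a binary operator has two operands");
      traverse(*B->LHS);
      traverse(*B->RHS);
    } else if (const auto *L = dynamic_cast<const ToyLiteral *>(&E)) {
      derived().visitLiteral(*L);  // leaf node: nothing to recurse into
    }
  }

  // Default hooks do nothing; derived classes shadow the ones they need.
  void visitParen(const ToyParen &) {}
  void visitBinary(const ToyBinary &) {}
  void visitLiteral(const ToyLiteral &) {}

private:
  Derived &derived() { return static_cast<Derived &>(*this); }
};

// Example client: collects literal values and counts parentheses.
struct LiteralCollector : ToyRecursiveVisitor<LiteralCollector> {
  std::vector<int> Values;
  unsigned ParenCount = 0;
  void visitLiteral(const ToyLiteral &L) { Values.push_back(L.Value); }
  void visitParen(const ToyParen &) { ++ParenCount; }
};

// Builds the tree for "(8 / 2) / 42".
std::unique_ptr<ToyExpr> buildSample() {
  auto Inner = std::make_unique<ToyBinary>(std::make_unique<ToyLiteral>(8),
                                           std::make_unique<ToyLiteral>(2));
  auto Paren = std::make_unique<ToyParen>(std::move(Inner));
  return std::make_unique<ToyBinary>(std::move(Paren),
                                     std::make_unique<ToyLiteral>(42));
}
```

<p>A client creates a LiteralCollector, calls traverse on the root returned by
buildSample(), and then reads the collected values. Adding a new node type to this
toy model means adding one more branch to traverse, which is the per-node encoding
the paragraph above refers to.</p>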

<p>The two most basic nodes in the Clang AST are statements (<a href="http://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>)
and declarations (<a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>).
Note that expressions (<a href="http://clang.llvm.org/doxygen/classclang_1_1Expr.html">Expr</a>)
are also statements in Clang's AST.</p>
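<p>The relationships described in this section can be modeled in a few lines of
C++. The sketch below uses hypothetical toy classes (ToyStmt, ToyExpr, ToyDecl,
ToyDeclContext, ToyVarDecl, ToyFunctionDecl, countDecls) to show separate
hierarchies with no common ancestor, an expression type that is also a statement,
and a declaration that is at the same time a container for other declarations. It
illustrates the shape only and says nothing about the members of the real classes.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Statement hierarchy: expressions are a kind of statement.
struct ToyStmt {
  virtual ~ToyStmt() = default;
};
struct ToyExpr : ToyStmt {};

// Declaration hierarchy: unrelated to ToyStmt.
struct ToyDecl {
  std::string Name;
  explicit ToyDecl(std::string N) : Name(std::move(N)) {}
  virtual ~ToyDecl() = default;
};

// Something that can contain declarations; deliberately not a ToyDecl itself.
struct ToyDeclContext {
  std::vector<ToyDecl *> Decls;  // non-owning pointers to contained declarations
  void addDecl(ToyDecl *D) {
    assert(D && "cannot add a null declaration");
    Decls.push_back(D);
  }
};

// A plain declaration.
struct ToyVarDecl : ToyDecl {
  using ToyDecl::ToyDecl;
};

// A declaration that is also a context for other declarations.
struct ToyFunctionDecl : ToyDecl, ToyDeclContext {
  using ToyDecl::ToyDecl;
};

// The hierarchy facts, checked at compile time.
static_assert(std::is_base_of<ToyStmt, ToyExpr>::value,
              "expressions are statements");
static_assert(!std::is_base_of<ToyStmt, ToyDecl>::value,
              "declarations and statements share no ancestor");
static_assert(!std::is_base_of<ToyDecl, ToyDeclContext>::value,
              "a declaration context is not itself a declaration");

// Number of declarations stored directly in a context.
std::size_t countDecls(const ToyDeclContext &DC) { return DC.Decls.size(); }
```

<p>A ToyFunctionDecl can be passed both where a ToyDecl and where a ToyDeclContext
is expected, which is the practical consequence of deriving from both bases.</p>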

</div>
</body>
</html>