=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang or use tools that work with Clang's AST, such as the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so searching for
"clang" together with the AST node's class name usually turns up the
Doxygen page of the class you are looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has built-in AST-dump modes,
which can be enabled with the flags ``-ast-dump`` and ``-ast-dump-xml``. Note
that ``-ast-dump-xml`` currently only works with debug builds of clang.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -cc1 tells it to directly
    # use the C++ compiler mode. -undef leaves out some internal declarations.
    $ clang -cc1 -undef -ast-dump-xml test.cc
    ... cutting out internal declarations of clang ...
    <TranslationUnit ptr="0x4871160">
     <Function ptr="0x48a5800" name="f" prototype="true">
      <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       <parameters>
        <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       </parameters>
      </FunctionProtoType>
      <ParmVar ptr="0x4871d80" name="x" initstyle="c">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
      </ParmVar>
      <Stmt>
    (CompoundStmt 0x48a5a38 <test.cc:1:14, line:4:1>
      (DeclStmt 0x48a59c0 <line:2:3, col:24>
        0x48a58c0 "int result =
          (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
            (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
              (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
                (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
              (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
      (ReturnStmt 0x48a5a18 <line:3:3, col:10>
        (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
          (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

      </Stmt>
     </Function>
    </TranslationUnit>

In general, ``-ast-dump-xml`` dumps declarations in an XML-style format and
statements in an S-expression-style format. The top-level declaration of
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.
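
To make the shape of that dump concrete, the following self-contained C++
sketch rebuilds the body of ``f`` as a toy tree and prints it as an indented
outline, one node per line. The ``Node`` struct and the ``leaf``,
``bodyOfF`` and ``outline`` functions are hypothetical illustrations, not
Clang classes or APIs; only the node kind names are taken from the dump
above. The toy tree places the ``VarDecl`` for ``result`` under the
``DeclStmt`` and its initializer under the ``VarDecl``, which the dump
prints inline instead.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical toy node: the kind name as printed in the dump, an optional
    // detail (a name, operator or value) and the children in source order.
    // Clang's real AST uses distinct classes such as CompoundStmt and ParenExpr.
    struct Node {
      std::string kind;
      std::string detail;
      std::vector<Node> children;  // vector of an incomplete type: OK since C++17
    };

    Node leaf(const std::string& kind, const std::string& detail) {
      return Node{kind, detail, {}};
    }

    // Rebuilds the tree for the body of f() in test.cc, bottom-up.
    Node bodyOfF() {
      Node castX{"ImplicitCastExpr", "LValueToRValue", {leaf("DeclRefExpr", "x")}};
      Node division{"BinaryOperator", "/", {castX, leaf("IntegerLiteral", "42")}};
      Node paren{"ParenExpr", "", {division}};
      Node var{"VarDecl", "result", {paren}};  // the initializer is its only child
      Node declStmt{"DeclStmt", "", {var}};
      Node castResult{"ImplicitCastExpr", "LValueToRValue",
                      {leaf("DeclRefExpr", "result")}};
      Node ret{"ReturnStmt", "", {castResult}};
      return Node{"CompoundStmt", "", {declStmt, ret}};
    }

    // Appends one line per node, indenting each nesting level by two spaces.
    void outline(const Node& n, int depth, std::ostringstream& out) {
      assert(!n.kind.empty());  // every toy node must carry a kind name
      out << std::string(2 * static_cast<std::size_t>(depth), ' ') << n.kind;
      if (!n.detail.empty()) out << ' ' << n.detail;
      out << '\n';
      for (const Node& child : n.children) outline(child, depth + 1, out);
    }

    std::string outline(const Node& root) {
      std::ostringstream out;
      outline(root, 0, out);
      return out.str();
    }

Calling ``outline(bodyOfF())`` returns this text::

    CompoundStmt
      DeclStmt
        VarDecl result
          ParenExpr
            BinaryOperator /
              ImplicitCastExpr LValueToRValue
                DeclRefExpr x
              IntegerLiteral 42
      ReturnStmt
        ImplicitCastExpr LValueToRValue
          DeclRefExpr result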

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit, starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and provides access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.
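
As a rough mental model of that bundling, the self-contained C++ sketch
below puts a root declaration and an interning identifier table into one
owner object. ``ToyASTContext``, ``ToyTranslationUnitDecl``,
``ToyIdentifierTable`` and ``ToyIdentifierInfo`` are hypothetical names, and
the accessor only borrows the name ``getTranslationUnitDecl`` from the real
class; Clang's ``ASTContext`` holds far more state. The property the sketch
illustrates is interning: looking up the same spelling twice returns the
same entry.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical record for one interned identifier.
    struct ToyIdentifierInfo {
      std::string name;
    };

    // Interns identifier spellings: each distinct spelling is stored exactly once.
    class ToyIdentifierTable {
     public:
      ToyIdentifierInfo& get(const std::string& name) {
        assert(!name.empty() && "identifiers have a non-empty spelling");
        // try_emplace inserts only if the spelling is new. References to
        // unordered_map elements stay valid when the map rehashes.
        auto result = table_.try_emplace(name, ToyIdentifierInfo{name});
        return result.first->second;
      }
      std::size_t size() const { return table_.size(); }

     private:
      std::unordered_map<std::string, ToyIdentifierInfo> table_;
    };

    // Hypothetical root declaration; here it only records top-level names.
    struct ToyTranslationUnitDecl {
      std::vector<const ToyIdentifierInfo*> topLevelNames;
    };

    // Owns everything for one translation unit, loosely like ASTContext.
    class ToyASTContext {
     public:
      ToyTranslationUnitDecl* getTranslationUnitDecl() { return &tu_; }
      ToyIdentifierTable& idents() { return idents_; }

     private:
      ToyTranslationUnitDecl tu_;
      ToyIdentifierTable idents_;
    };

Because ``get("f")`` hands back the same entry on every call, two
identifiers in this toy model can be compared by address instead of by
string contents.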

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.
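
The self-contained C++ sketch below models that shape in miniature: two
unrelated polymorphic roots, ``Decl`` and ``DeclContext``, and a function
declaration that derives from both. Everything lives in a hypothetical
``toy`` namespace and is not Clang's real class layout. The sketch uses C++
``dynamic_cast`` to move from the ``Decl`` side of an object to its
``DeclContext`` side; Clang itself is normally built without C++ RTTI and
uses LLVM's ``isa<>``/``dyn_cast<>`` templates for this kind of check.

.. code-block:: c++

    #include <cassert>
    #include <string>
    #include <vector>

    namespace toy {

    // Two independent roots with no shared base class, mirroring the absence
    // of a single common ancestor for all AST nodes.
    struct Decl {
      virtual ~Decl() = default;
      std::string name;
    };

    struct DeclContext {
      virtual ~DeclContext() = default;
      std::vector<Decl*> decls;  // declarations contained here (not owned)
    };

    // A variable is a declaration but cannot contain other declarations.
    struct VarDecl : Decl {};

    // A function is both a declaration and a context for nested declarations.
    struct FunctionDecl : Decl, DeclContext {};

    // Returns the names of the declarations nested in d, or an empty list when
    // d is not also a DeclContext. dynamic_cast performs the cross-cast from
    // the Decl base to the DeclContext base of the same object.
    std::vector<std::string> nestedNames(Decl* d) {
      assert(d != nullptr);
      std::vector<std::string> names;
      if (auto* dc = dynamic_cast<DeclContext*>(d))
        for (const Decl* inner : dc->decls) names.push_back(inner->name);
      return names;
    }

    }  // namespace toy

Given a ``toy::FunctionDecl`` that contains two variables, ``nestedNames``
returns their names, while for a ``toy::VarDecl`` it returns an empty list,
because only the former is also a ``DeclContext``.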

There are also many nodes in the AST that are not part of a
larger hierarchy and are reachable only from specific other nodes, such as
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. Which nodes can be reached from a given node has to be encoded
separately for each node type; this algorithm is implemented in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
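
The traversal pattern itself can be sketched without Clang. In the
self-contained C++ code below, a CRTP base class (``RecursiveVisitor``) walks
a toy tree depth-first and calls a ``visitNode`` hook on the derived class
for every node; returning ``false`` from the hook stops the walk. All names
here are hypothetical and only imitate the structure of
``RecursiveASTVisitor``, which is also a CRTP template. With the real class
you would instead derive from ``RecursiveASTVisitor<YourVisitor>``, define
hooks such as ``bool VisitFunctionDecl(FunctionDecl *D)``, and start the walk
with ``TraverseDecl(Context.getTranslationUnitDecl())``.

.. code-block:: c++

    #include <cassert>
    #include <string>
    #include <vector>

    namespace toy {

    // Hypothetical uniform node type; Clang's real nodes are distinct classes.
    struct Node {
      std::string kind;
      std::string name;
      std::vector<Node> children;
    };

    // CRTP base: Derived inherits from RecursiveVisitor<Derived>, so the base
    // can call Derived's hooks statically, without virtual functions.
    template <typename Derived>
    class RecursiveVisitor {
     public:
      // Pre-order, depth-first walk; returns false if a hook aborted it.
      bool traverse(const Node& n) {
        assert(!n.kind.empty());
        if (!derived().visitNode(n)) return false;
        for (const Node& child : n.children)
          if (!traverse(child)) return false;
        return true;
      }

      // Default hook, used when Derived does not declare its own visitNode.
      bool visitNode(const Node&) { return true; }

     private:
      Derived& derived() { return static_cast<Derived&>(*this); }
    };

    // Example client: records the names referenced by DeclRefExpr nodes.
    class DeclRefCollector : public RecursiveVisitor<DeclRefCollector> {
     public:
      bool visitNode(const Node& n) {
        if (n.kind == "DeclRefExpr") refs.push_back(n.name);
        return true;  // keep traversing
      }
      std::vector<std::string> refs;
    };

    }  // namespace toy

``DeclRefCollector`` returns ``true`` from every call and therefore sees the
whole tree; a hook that returns ``false`` ends the walk immediately, and
``traverse`` then returns ``false`` as well.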

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
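
One practical consequence is that an expression can appear wherever a
statement is expected, for example directly among the children of a
compound statement when it is used as an expression statement such as
``42;``. The self-contained C++ sketch below mimics that relationship with a
hypothetical ``toy`` hierarchy (not Clang's real classes) in which ``Expr``
derives from ``Stmt``, so a ``CompoundStmt`` holding ``Stmt`` pointers can
store expressions without any wrapper node.

.. code-block:: c++

    #include <cassert>
    #include <memory>
    #include <string>
    #include <vector>

    namespace toy {

    // Hypothetical miniature of the Stmt/Expr relationship.
    struct Stmt {
      virtual ~Stmt() = default;
      virtual std::string kind() const = 0;
    };

    // Every expression is also a statement.
    struct Expr : Stmt {};

    struct IntegerLiteral : Expr {
      explicit IntegerLiteral(int v) : value(v) {}
      std::string kind() const override { return "IntegerLiteral"; }
      int value;
    };

    struct ReturnStmt : Stmt {
      std::string kind() const override { return "ReturnStmt"; }
      std::unique_ptr<Expr> value;  // the returned expression, if any
    };

    struct CompoundStmt : Stmt {
      std::string kind() const override { return "CompoundStmt"; }
      std::vector<std::unique_ptr<Stmt>> body;  // may hold expressions directly
    };

    // Counts the direct children of c that are expressions.
    int countExprChildren(const CompoundStmt& c) {
      int n = 0;
      for (const auto& s : c.body) {
        assert(s != nullptr);
        if (dynamic_cast<const Expr*>(s.get()) != nullptr) ++n;
      }
      return n;
    }

    }  // namespace toy

Pushing a ``toy::IntegerLiteral`` into ``CompoundStmt::body`` needs only the
implicit derived-to-base conversion, which is what the ``Expr``-is-a-``Stmt``
relationship provides in this toy model.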