=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who want either to contribute to
Clang or to use tools that work on Clang's AST, such as the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST differs from the ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesis expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so searching for
"clang" plus the AST node's class name usually turns up the Doxygen
page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to look at it
for some simple example code. Clang has a built-in AST-dump mode,
which can be enabled with the flag ``-ast-dump``.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

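To make the tree notation in the dump easier to read, here is a minimal
sketch of how such an outline can be printed. It is a toy model, not
Clang's implementation: ``ToyNode``, ``dump`` and ``dumpExample`` are
hypothetical names invented for this illustration, and the node labels
are copied by hand from the dump of ``f`` without addresses, source
ranges or types.

.. code-block:: c++

    #include <cassert>
    #include <memory>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy node, NOT a Clang class: it only stores a label and its children.
    struct ToyNode {
      std::string Label;
      std::vector<std::unique_ptr<ToyNode>> Children;

      explicit ToyNode(std::string L) : Label(std::move(L)) {}

      // Appends a child and returns a reference to it so calls can be chained.
      ToyNode &add(std::string L) {
        Children.push_back(std::make_unique<ToyNode>(std::move(L)));
        return *Children.back();
      }
    };

    // Writes the node, then each child one level deeper. "|-" marks a child
    // that has later siblings, "`-" marks the last child of its parent.
    void dump(const ToyNode &N, std::ostream &OS, const std::string &Indent = "",
              bool IsRoot = true, bool IsLast = true) {
      if (IsRoot)
        OS << N.Label << '\n';
      else
        OS << Indent << (IsLast ? "`-" : "|-") << N.Label << '\n';

      // Below a non-last child, keep a "|" so later siblings stay connected.
      std::string ChildIndent = IsRoot ? "" : Indent + (IsLast ? "  " : "| ");
      for (std::size_t I = 0; I < N.Children.size(); ++I)
        dump(*N.Children[I], OS, ChildIndent, false, I + 1 == N.Children.size());
    }

    // Rebuilds the structure of f() from the dump by hand and prints it.
    std::string dumpExample() {
      ToyNode TU("TranslationUnitDecl");
      ToyNode &F = TU.add("FunctionDecl f");
      F.add("ParmVarDecl x");
      ToyNode &Body = F.add("CompoundStmt");
      ToyNode &Div = Body.add("DeclStmt")
                         .add("VarDecl result")
                         .add("ParenExpr")
                         .add("BinaryOperator '/'");
      Div.add("ImplicitCastExpr").add("DeclRefExpr x");
      Div.add("IntegerLiteral 42");
      Body.add("ReturnStmt").add("ImplicitCastExpr").add("DeclRefExpr result");

      std::ostringstream OS;
      dump(TU, OS);
      return OS.str();
    }

Calling ``dumpExample()`` returns an outline with the same shape as the
real dump: the compound statement has two children, and the initializer
``(x / 42)`` hangs below the ``VarDecl`` as a ``ParenExpr`` wrapping a
``BinaryOperator``.
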
AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and gives access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

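The role of such a context object can be illustrated with a small toy
model. This is only a sketch and does not use Clang's API:
``ToyContext``, ``ToyTranslationUnit``, ``getTranslationUnit`` and
``getIdentifier`` are hypothetical names, and the sketch assumes, as is
common for identifier tables, that each distinct spelling is stored
exactly once.

.. code-block:: c++

    #include <cassert>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Toy stand-ins, NOT Clang's classes.
    struct ToyTranslationUnit {
      std::vector<std::string> TopLevelDecls; // names only, for brevity
    };

    class ToyContext {
    public:
      // Root of the tree: every traversal starts here.
      ToyTranslationUnit &getTranslationUnit() { return TU; }

      // Returns one shared string object per distinct spelling, so two
      // lookups of the same name yield the same address.
      const std::string *getIdentifier(const std::string &Spelling) {
        return &*Idents.insert(Spelling).first;
      }

    private:
      ToyTranslationUnit TU;
      // References to unordered_set elements stay valid across rehashing.
      std::unordered_set<std::string> Idents;
    };

With this model, ``Ctx.getIdentifier("x")`` yields the same pointer on
every call, so identifiers can be compared by address rather than by
string contents.
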
AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. What can be reached has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.

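The ideas in this section (separate hierarchies without a common
ancestor, expressions being statements, and a traversal that must know
how to reach the children of each node type) can be sketched with a toy
model. None of the classes below are Clang's: ``ToyDecl``, ``ToyStmt``,
``ToyExpr``, ``ToyRecursiveVisitor`` and ``NodeCounter`` are
hypothetical and far simpler than the real thing. The visitor takes the
derived class as a template parameter, as Clang's
``RecursiveASTVisitor`` also does.

.. code-block:: c++

    #include <cassert>
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy stand-ins, NOT Clang's classes: two separate hierarchies with no
    // shared base class, mirroring the Decl / Stmt split.
    struct ToyStmt {
      std::string Kind;
      std::vector<std::unique_ptr<ToyStmt>> Children;
      explicit ToyStmt(std::string K) : Kind(std::move(K)) {}
      virtual ~ToyStmt() = default;
    };

    // Expressions are statements too, so ToyExpr derives from ToyStmt.
    struct ToyExpr : ToyStmt {
      using ToyStmt::ToyStmt;
    };

    struct ToyDecl {
      std::string Name;
      std::vector<std::unique_ptr<ToyDecl>> Members; // nested declarations
      std::unique_ptr<ToyStmt> Body;                 // e.g. a function body
      explicit ToyDecl(std::string N) : Name(std::move(N)) {}
    };

    // Knows how to reach every child of each node kind and calls the
    // derived class's visit hooks on the way down.
    template <typename Derived> class ToyRecursiveVisitor {
    public:
      void traverseDecl(const ToyDecl &D) {
        self().visitDecl(D);
        for (const auto &M : D.Members)
          traverseDecl(*M);
        if (D.Body)
          traverseStmt(*D.Body);
      }
      void traverseStmt(const ToyStmt &S) {
        self().visitStmt(S);
        for (const auto &C : S.Children)
          traverseStmt(*C);
      }
      // Default hooks do nothing; derived classes shadow the ones they need.
      void visitDecl(const ToyDecl &) {}
      void visitStmt(const ToyStmt &) {}

    private:
      Derived &self() { return static_cast<Derived &>(*this); }
    };

    // Example client: counts declarations, statements, and expressions.
    struct NodeCounter : ToyRecursiveVisitor<NodeCounter> {
      int Decls = 0, Stmts = 0, Exprs = 0;
      void visitDecl(const ToyDecl &) { ++Decls; }
      void visitStmt(const ToyStmt &S) {
        ++Stmts; // every expression is also counted as a statement
        if (dynamic_cast<const ToyExpr *>(&S))
          ++Exprs;
      }
    };

    // Builds a small tree (a unit containing f(x) { return <expr>; }) and
    // traverses it from the root.
    NodeCounter countExample() {
      ToyDecl TU("TranslationUnit");
      auto F = std::make_unique<ToyDecl>("f");
      F->Members.push_back(std::make_unique<ToyDecl>("x"));
      auto Body = std::make_unique<ToyStmt>("CompoundStmt");
      auto Ret = std::make_unique<ToyStmt>("ReturnStmt");
      Ret->Children.push_back(std::make_unique<ToyExpr>("DeclRefExpr"));
      Body->Children.push_back(std::move(Ret));
      F->Body = std::move(Body);
      TU.Members.push_back(std::move(F));

      NodeCounter Counter;
      Counter.traverseDecl(TU);
      return Counter;
    }

In this sketch the knowledge of what is reachable from a node lives in
``traverseDecl`` and ``traverseStmt``, while clients only override the
visit hooks. For real code, use ``RecursiveASTVisitor`` as described in
the tutorial linked above.
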