=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The Doxygen online
documentation is also indexed by your favorite search engine, so
searching for "clang" together with the AST node's class name usually
turns up the Doxygen page of the class you're looking for (for example,
search for: ``clang ParenExpr``).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to look at it
on some simple example code. Clang has a built-in AST-dump mode, which
can be enabled with the ``-ast-dump`` flag.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our ``result`` variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and gives access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes (for example ``FunctionDecl``) deriving from both Decl and
DeclContext.

There are also many nodes in the AST that are not part of a
larger hierarchy and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. Which children are reachable has to be encoded separately for each
node type, and this algorithm is implemented once in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
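
As a rough sketch of that traversal (not code from the tutorial itself),
the following C++ starts a ``RecursiveASTVisitor`` at the
``TranslationUnitDecl`` obtained from the ``ASTContext`` and records the
names of the functions declared in the parsed input. It assumes a recent
Clang with libTooling (it uses the ``std::unique_ptr`` overload of
``clang::tooling::runToolOnCode``) and has to be linked against the Clang
libraries (for example ``clangTooling``, ``clangFrontend`` and
``clangAST``). The class names and the ``collectFunctionNames`` helper are
hypothetical.

.. code-block:: c++

    #include "clang/AST/ASTConsumer.h"
    #include "clang/AST/ASTContext.h"
    #include "clang/AST/Decl.h"
    #include "clang/AST/RecursiveASTVisitor.h"
    #include "clang/Basic/SourceManager.h"
    #include "clang/Frontend/CompilerInstance.h"
    #include "clang/Frontend/FrontendAction.h"
    #include "clang/Tooling/Tooling.h"
    #include "llvm/ADT/StringRef.h"
    #include <memory>
    #include <string>
    #include <vector>

    namespace {

    // Hypothetical visitor: collects the names of FunctionDecls that were
    // written in the main (parsed) file.
    class FunctionNameCollector
        : public clang::RecursiveASTVisitor<FunctionNameCollector> {
    public:
      FunctionNameCollector(clang::ASTContext &Ctx,
                            std::vector<std::string> &Out)
          : Ctx(Ctx), Out(Out) {}

      // Called by RecursiveASTVisitor for every FunctionDecl it reaches.
      bool VisitFunctionDecl(clang::FunctionDecl *FD) {
        if (Ctx.getSourceManager().isInMainFile(FD->getLocation()))
          Out.push_back(FD->getNameAsString());
        return true; // Returning false would abort the whole traversal.
      }

    private:
      clang::ASTContext &Ctx;
      std::vector<std::string> &Out;
    };

    // Starts the traversal at the TranslationUnitDecl once parsing is done.
    class CollectorConsumer : public clang::ASTConsumer {
    public:
      explicit CollectorConsumer(std::vector<std::string> &Out) : Out(Out) {}

      void HandleTranslationUnit(clang::ASTContext &Ctx) override {
        FunctionNameCollector Visitor(Ctx, Out);
        Visitor.TraverseDecl(Ctx.getTranslationUnitDecl());
      }

    private:
      std::vector<std::string> &Out;
    };

    // Frontend action that plugs the consumer into Clang's pipeline.
    class CollectorAction : public clang::ASTFrontendAction {
    public:
      explicit CollectorAction(std::vector<std::string> &Out) : Out(Out) {}

      std::unique_ptr<clang::ASTConsumer>
      CreateASTConsumer(clang::CompilerInstance &, llvm::StringRef) override {
        return std::make_unique<CollectorConsumer>(Out);
      }

    private:
      std::vector<std::string> &Out;
    };

    } // namespace

    // Hypothetical helper: parses Code as C++ (syntax only) and returns the
    // names of the functions declared in it, in traversal order.
    std::vector<std::string> collectFunctionNames(const std::string &Code) {
      std::vector<std::string> Names;
      clang::tooling::runToolOnCode(std::make_unique<CollectorAction>(Names),
                                    Code);
      return Names;
    }

With this sketch, ``collectFunctionNames("int f(int x) { return x; } void
g();")`` should return ``f`` followed by ``g``. The ``isInMainFile`` check
keeps only declarations whose location lies in the parsed input, so
declarations Clang creates internally are not reported.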

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST: ``Expr`` is a subclass of ``Stmt``.
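
To make that relationship concrete, here is a minimal sketch, again
assuming a Clang installation with libTooling, that parses a snippet
with ``clang::tooling::buildASTFromCode``, picks the first function
definition and counts how many statements in its body are also ``Expr``
nodes, using LLVM's ``isa``/``dyn_cast`` helpers. The
``countExprStatements`` helper is hypothetical.

.. code-block:: c++

    #include "clang/AST/ASTContext.h"
    #include "clang/AST/Decl.h"
    #include "clang/AST/Expr.h"
    #include "clang/AST/Stmt.h"
    #include "clang/Frontend/ASTUnit.h"
    #include "clang/Tooling/Tooling.h"
    #include "llvm/Support/Casting.h"
    #include <memory>
    #include <string>

    // Hypothetical helper: parses Code as C++ and returns how many direct
    // children of the first function body are expressions.
    unsigned countExprStatements(const std::string &Code) {
      std::unique_ptr<clang::ASTUnit> AST =
          clang::tooling::buildASTFromCode(Code);
      if (!AST)
        return 0;

      clang::TranslationUnitDecl *TU =
          AST->getASTContext().getTranslationUnitDecl();
      for (clang::Decl *D : TU->decls()) {
        // Decl and Stmt are separate hierarchies; dyn_cast selects functions.
        auto *FD = llvm::dyn_cast<clang::FunctionDecl>(D);
        if (!FD || !FD->doesThisDeclarationHaveABody())
          continue;

        auto *Body = llvm::dyn_cast<clang::CompoundStmt>(FD->getBody());
        if (!Body)
          return 0; // For example, a function-try-block body.

        unsigned Count = 0;
        for (clang::Stmt *S : Body->body())
          // Every Expr is a Stmt, so isa<Expr> can be asked of any Stmt.
          if (llvm::isa<clang::Expr>(S))
            ++Count;
        return Count;
      }
      return 0;
    }

For ``void g(); void f(int x) { x + 1; int y = x; g(); return; }`` the
helper should return 2: the ``BinaryOperator`` and the ``CallExpr`` are
expressions used directly as statements, while the ``DeclStmt`` and the
``ReturnStmt`` are statements only.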