=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so a search for
"clang" plus the AST node's class name usually turns up the Doxygen
page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has built-in AST-dump modes,
which can be enabled with the flags ``-ast-dump`` and ``-ast-dump-xml``. Note
that ``-ast-dump-xml`` currently only works with debug builds of clang.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -cc1 tells it to directly
    # use the C++ compiler mode. -undef leaves out some internal declarations.
    $ clang -cc1 -undef -ast-dump-xml test.cc
    ... cutting out internal declarations of clang ...
    <TranslationUnit ptr="0x4871160">
     <Function ptr="0x48a5800" name="f" prototype="true">
      <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       <parameters>
        <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       </parameters>
      </FunctionProtoType>
      <ParmVar ptr="0x4871d80" name="x" initstyle="c">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
      </ParmVar>
      <Stmt>
    (CompoundStmt 0x48a5a38 <t2.cc:1:14, line:4:1>
      (DeclStmt 0x48a59c0 <line:2:3, col:24>
        0x48a58c0 "int result =
          (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
            (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
              (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
                (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
              (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
      (ReturnStmt 0x48a5a18 <line:3:3, col:10>
        (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
          (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

      </Stmt>
     </Function>
    </TranslationUnit>

In general, ``-ast-dump-xml`` dumps declarations in an XML-style format and
statements in an S-expression-style format. The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``".
The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
or to access Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There are also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

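
As a rough mental model of the hierarchy shape described above, here is a
minimal, self-contained C++ sketch. Every type in the ``toy`` namespace is a
hypothetical stand-in written for this illustration, not one of Clang's real
classes; the helper ``derivesFrom`` is likewise made up. Only the shape is
meant to mirror the text: separate roots with no common ancestor, some nodes
deriving from both a ``Decl``-like and a ``DeclContext``-like base, and a
free-standing node that is reachable only through its owner.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Clang's classes.
namespace toy {

// Separate roots: there is deliberately no shared "Node" base class.
struct Type { virtual ~Type() = default; };
struct Decl { virtual ~Decl() = default; };
struct Stmt { virtual ~Stmt() = default; };
struct DeclContext { virtual ~DeclContext() = default; };

// A declaration that can also contain other declarations derives from both.
struct FunctionDecl : Decl, DeclContext {};

// A declaration that is not a context derives from Decl only.
struct VarDecl : Decl {};

// A node outside the larger hierarchies; it is only reachable through
// the record declaration that owns it.
struct BaseSpecifier { std::string BaseName; };
struct RecordDecl : Decl, DeclContext {
  std::vector<BaseSpecifier> Bases;
};

// Hypothetical compile-time query: does Node derive from Base?
template <typename Base, typename Node>
constexpr bool derivesFrom() {
  return std::is_base_of<Base, Node>::value;
}

static_assert(derivesFrom<Decl, FunctionDecl>(), "a function is a Decl");
static_assert(derivesFrom<DeclContext, FunctionDecl>(), "and a DeclContext");
static_assert(!derivesFrom<DeclContext, VarDecl>(), "a variable is not a context");
static_assert(!derivesFrom<Decl, BaseSpecifier>(), "base specifiers stand alone");

} // namespace toy
```

The ``static_assert`` lines only check the toy types. For the real
inheritance graph, consult the Doxygen pages linked above.
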

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node - this information has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
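
To make the traversal idea concrete, the following is a small sketch of a
recursive walk over a toy tree. It does not use Clang at all: the ``toy``
types, ``visitDecl``, ``visitStmt`` and ``buildExample`` are all hypothetical
names invented for this illustration. The sketch assumes a heavily simplified
version of the ``f`` example from earlier (implicit casts and most expressions
are left out), and shows two points from the text: each node type has to know
how to reach its own children, and an expression is handled as a statement.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model for illustration only; none of these types are Clang's.
namespace toy {

struct Stmt {
  std::string Kind;                             // e.g. "CompoundStmt"
  std::vector<std::unique_ptr<Stmt>> Children;  // sub-statements and sub-expressions
  explicit Stmt(std::string K) : Kind(std::move(K)) {}
  virtual ~Stmt() = default;
};

// Expressions are statements too, so they fit wherever a Stmt is expected.
struct Expr : Stmt {
  using Stmt::Stmt;
};

struct Decl {
  std::string Kind;                            // e.g. "FunctionDecl"
  std::vector<std::unique_ptr<Decl>> Members;  // nested declarations
  std::unique_ptr<Stmt> Body;                  // function body; null if absent
  explicit Decl(std::string K) : Kind(std::move(K)) {}
};

// Pre-order walk over a statement and everything below it.
void visitStmt(const Stmt &S, std::vector<std::string> &Out) {
  Out.push_back(S.Kind);
  for (const auto &Child : S.Children)
    visitStmt(*Child, Out);
}

// Pre-order walk over a declaration: nested declarations first, then the body.
// How to reach the children is specific to the node type.
void visitDecl(const Decl &D, std::vector<std::string> &Out) {
  Out.push_back(D.Kind);
  for (const auto &Member : D.Members)
    visitDecl(*Member, Out);
  if (D.Body)
    visitStmt(*D.Body, Out);
}

// Builds a simplified tree for: int f(int x) { int result = ...; return result; }
std::unique_ptr<Decl> buildExample() {
  auto Ret = std::make_unique<Stmt>("ReturnStmt");
  Ret->Children.push_back(std::make_unique<Expr>("DeclRefExpr"));

  auto Body = std::make_unique<Stmt>("CompoundStmt");
  Body->Children.push_back(std::make_unique<Stmt>("DeclStmt"));
  Body->Children.push_back(std::move(Ret));

  auto F = std::make_unique<Decl>("FunctionDecl");
  F->Body = std::move(Body);

  auto TU = std::make_unique<Decl>("TranslationUnitDecl");
  TU->Members.push_back(std::move(F));
  return TU;
}

} // namespace toy
```

Calling ``toy::visitDecl(*toy::buildExample(), Out)`` with an empty
``std::vector<std::string> Out`` fills it with the node kinds in pre-order,
starting at the translation unit. In real tools this bookkeeping is what
``RecursiveASTVisitor`` provides, so the walk does not have to be written by
hand.
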