Home | History | Annotate | only in /external/clang/lib/StaticAnalyzer
Up to higher level directory
NameDateSize
Checkers/10-Jul-2012
CMakeLists.txt10-Jul-201277
Core/10-Jul-2012
Frontend/10-Jul-2012
Makefile10-Jul-2012588
README.txt10-Jul-20126.6K

README.txt

      1 //===----------------------------------------------------------------------===//
      2 // Clang Static Analyzer
      3 //===----------------------------------------------------------------------===//
      4 
      5 = Library Structure =
      6 
      7 The analyzer library has two layers: a (low-level) static analysis
      8 engine (GRExprEngine.cpp and friends), and some static checkers
      9 (*Checker.cpp).  The latter are built on top of the former via the
     10 Checker and CheckerVisitor interfaces (Checker.h and
     11 CheckerVisitor.h).  The Checker interface is designed to be minimal
     12 and simple for checker writers, and attempts to isolate them from much
     13 of the gore of the internal analysis engine.
     14 
     15 = How It Works =
     16 
     17 The analyzer is inspired by several foundational research papers ([1],
     18 [2]).  (FIXME: kremenek to add more links)
     19 
     20 In a nutshell, the analyzer is basically a source code simulator that
     21 traces out possible paths of execution.  The state of the program
     22 (values of variables and expressions) is encapsulated by the state
     23 (ProgramState).  A location in the program is called a program point
     24 (ProgramPoint), and the combination of state and program point is a
     25 node in an exploded graph (ExplodedGraph).  The term "exploded" comes
     26 from exploding the control-flow edges in the control-flow graph (CFG).
     27 
     28 Conceptually the analyzer does a reachability analysis through the
     29 ExplodedGraph.  We start at a root node, which has the entry program
     30 point and initial state, and then simulate transitions by analyzing
     31 individual expressions.  The analysis of an expression can cause the
     32 state to change, resulting in a new node in the ExplodedGraph with an
     33 updated program point and an updated state.  A bug is found by hitting
     34 a node that satisfies some "bug condition" (basically a violation of a
     35 checking invariant).
     36 
     37 The analyzer traces out multiple paths by reasoning about branches and
     38 then bifurcating the state: on the true branch the conditions of the
     39 branch are assumed to be true and on the false branch the conditions
     40 of the branch are assumed to be false.  Such "assumptions" create
     41 constraints on the values of the program, and those constraints are
     42 recorded in the ProgramState object (and are manipulated by the
     43 ConstraintManager).  If assuming the conditions of a branch would
     44 cause the constraints to be unsatisfiable, the branch is considered
     45 infeasible and that path is not taken.  This is how we get
     46 path-sensitivity.  We reduce exponential blow-up by caching nodes.  If
     47 a new node with the same state and program point as an existing node
     48 would get generated, the path "caches out" and we simply reuse the
     49 existing node.  Thus the ExplodedGraph is not a DAG; it can contain
     50 cycles as paths loop back onto each other and cache out.
     51 
     52 ProgramState and ExplodedNodes are basically immutable once created.  Once
     53 one creates a ProgramState, you need to create a new one to get a new
     54 ProgramState.  This immutability is key since the ExplodedGraph represents
     55 the behavior of the analyzed program from the entry point.  To
     56 represent these efficiently, we use functional data structures (e.g.,
     57 ImmutableMaps) which share data between instances.
     58 
     59 Finally, individual Checkers work by also manipulating the analysis
     60 state.  The analyzer engine talks to them via a visitor interface.
     61 For example, the PreVisitCallExpr() method is called by GRExprEngine
     62 to tell the Checker that we are about to analyze a CallExpr, and the
     63 checker is asked to check for any preconditions that might not be
     64 satisfied.  The checker can do nothing, or it can generate a new
     65 ProgramState and ExplodedNode which contains updated checker state.  If it
     66 finds a bug, it can tell the BugReporter object about the bug,
     67 providing it an ExplodedNode which is the last node in the path that
     68 triggered the problem.
     69 
     70 = Notes about C++ =
     71 
     72 Since now constructors are seen before the variable that is constructed 
     73 in the CFG, we create a temporary object as the destination region that 
     74 is constructed into. See ExprEngine::VisitCXXConstructExpr().
     75 
     76 In ExprEngine::processCallExit(), we always bind the object region to the
     77 evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
     78 corresponding lazy compound value if the variable is not a reference, and
     79 bind the variable region to the lazy compound value. If the variable
     80 is a reference, just use the object region as the initilizer value.
     81 
     82 Before entering a C++ method (or ctor/dtor), the 'this' region is bound
     83 to the object region. In ctors, we synthesize 'this' region with  
     84 CXXRecordDecl*, which means we do not use type qualifiers. In methods, we
     85 synthesize 'this' region with CXXMethodDecl*, which has getThisType() 
     86 taking type qualifiers into account. It does not matter we use qualified
     87 'this' region in one method and unqualified 'this' region in another
     88 method, because we only need to ensure the 'this' region is consistent 
     89 when we synthesize it and create it directly from CXXThisExpr in a single
     90 method call.
     91 
     92 = Working on the Analyzer =
     93 
     94 If you are interested in bringing up support for C++ expressions, the
     95 best place to look is the visitation logic in GRExprEngine, which
     96 handles the simulation of individual expressions.  There are plenty of
     97 examples there of how other expressions are handled.
     98 
     99 If you are interested in writing checkers, look at the Checker and
    100 CheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look
    101 at the files named *Checker.cpp for examples on how you can implement
    102 these interfaces.
    103 
    104 = Debugging the Analyzer =
    105 
    106 There are some useful command-line options for debugging.  For example:
    107 
    108 $ clang -cc1 -help | grep analyze
    109  -analyze-function <value>
    110  -analyzer-display-progress
    111  -analyzer-viz-egraph-graphviz
    112  ...
    113 
    114 The first allows you to specify only analyzing a specific function.
    115 The second prints to the console what function is being analyzed.  The
    116 third generates a graphviz dot file of the ExplodedGraph.  This is
    117 extremely useful when debugging the analyzer and viewing the
    118 simulation results.
    119 
    120 Of course, viewing the CFG (Control-Flow Graph) is also useful:
    121 
    122 $ clang -cc1 -help | grep cfg
    123  -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
    124  -cfg-add-initializers   Add C++ initializers to CFGs for all analyses
    125  -cfg-dump               Display Control-Flow Graphs
    126  -cfg-view               View Control-Flow Graphs using GraphViz
    127  -unoptimized-cfg        Generate unoptimized CFGs for all analyses
    128 
    129 -cfg-dump dumps a textual representation of the CFG to the console,
    130 and -cfg-view creates a GraphViz representation.
    131 
    132 = References =
    133 
    134 [1] Precise interprocedural dataflow analysis via graph reachability,
    135     T Reps, S Horwitz, and M Sagiv, POPL '95,
    136     http://portal.acm.org/citation.cfm?id=199462
    137 
    138 [2] A memory model for static analysis of C programs, Z Xu, T
    139     Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf
    140