<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1 style="color:red">This Page Is Under Construction</h1>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs symbolic execution of the program and 
relies on a set of checkers to implement the logic for detecting bugs and 
constructing bug reports. This page provides hints and guidelines for anyone 
interested in implementing their own checker. The static analyzer is 
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> 
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for general developer guidelines and information.</p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Analyzer Overview</a></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#skeleton">Checker Skeleton</a></li>
      <li><a href="#node">Exploded Node</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of 
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> 
    page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build 
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>
    
    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for the different output formats, fine-tuning, and 
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>
 
<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the 
  input values are represented with symbolic values; further, the engine deduces 
  the values of all the expressions in the program based on the input symbols 
  and the path. The execution is path-sensitive and every possible path through 
  the program is explored. The explored execution traces are represented by an 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, 
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> 
  represents the corresponding location in the program (or in the CFG). 
  <tt>ProgramPoint</tt> is also used to record additional information on 
  when and how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt> 
  kind means that the state is the result of purging dead symbols - the 
  analyzer's equivalent of garbage collection. 
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> 
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic 
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values
  </ul>
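  <p>
  As a rough illustration of the three components listed above, here is a 
  minimal standalone C++ sketch. It is not the analyzer's implementation: 
  <tt>ToyState</tt> and <tt>bindLoc</tt> are hypothetical names, values are 
  plain strings, and the maps are ordinary <tt>std::map</tt> objects.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins for the three parts of a program state. Every name here is
// hypothetical; the analyzer's real classes are considerably richer.
struct ToyState {
  std::map<std::string, std::string> Environment;    // expression text -> value
  std::map<std::string, std::string> Store;          // memory location -> value
  std::map<std::string, std::string> GenericDataMap; // symbol -> constraint
};

// "Changing" a state produces a new state. The parameter S is a copy, so the
// caller's state is left untouched.
inline ToyState bindLoc(ToyState S, const std::string &Loc,
                        const std::string &Val) {
  assert(!Loc.empty() && "a binding needs a location");
  S.Store[Loc] = Val;
  return S;
}
```

  <p>
  Copying the whole state on each update keeps the sketch short. The only 
  point it makes is that an update yields a new state while the old one 
  stays usable, for example as the state of an earlier node on the path.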
  
  <h3>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core changes - they 
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part 
  of the state. Each time the analyzer engine explores a new statement, it 
  notifies each checker registered to listen for that statement, giving it an 
  opportunity to either report a bug or modify the state. (As a rule of thumb, 
  the checker itself should be stateless.) The checkers are called one after another 
  in a predefined order; thus, calling all the checkers adds a chain to the 
  <tt>ExplodedGraph</tt>. 
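  <p>
  The notification scheme described above can be pictured with the following 
  standalone sketch. It is a simplified model under stated assumptions, not the 
  engine's code: <tt>ToyState</tt>, <tt>ToyChecker</tt> and 
  <tt>visitStatement</tt> are hypothetical names, and a "state" holds nothing 
  but checker-defined integers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model: a state holds only checker-defined data.
using ToyState = std::map<std::string, int>;

// A checker callback is stateless: it reads the incoming state and returns
// the (possibly updated) state instead of keeping data in member variables.
using ToyChecker = std::function<ToyState(const ToyState &)>;

// Run the checkers in their fixed order for one statement. Every call
// contributes one entry, so the result is a chain of states.
inline std::vector<ToyState>
visitStatement(const ToyState &Start, const std::vector<ToyChecker> &Checkers) {
  std::vector<ToyState> Chain{Start};
  for (const ToyChecker &C : Checkers)
    Chain.push_back(C(Chain.back()));
  return Chain;
}
```

  <p>
  With three checkers registered for a statement, the chain has four entries: 
  the incoming state followed by one state per checker call.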
  
  <h3>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> 
  objects are used to represent the semantic evaluation of expressions. 
  They can represent things like concrete 
  integers, symbolic values, or memory locations (which are memory regions). 
  They are a discriminated union of "values", symbolic and otherwise. 
  If a value isn't symbolic, usually that means there is no symbolic 
  information to track. For example, if the value is an integer, such as 
  <tt>42</tt>, it is a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, 
  and the checker doesn't usually need to track any state with the concrete 
  number. In some cases, an <tt>SVal</tt> is not a symbol even though it really 
  should be a symbolic value. This happens when the analyzer cannot reason about 
  something (yet). An example is floating point numbers. In such cases, the 
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>. 
  This represents a case that is outside the realm of the analyzer's reasoning 
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed 
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as 
  symbols or regions. 
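  <p>
  To make the "discriminated union" idea concrete, here is a small standalone 
  sketch of a tagged value type. It only mirrors the cases named above; 
  <tt>ToySVal</tt>, its factory functions and the free function 
  <tt>dump</tt> are hypothetical and do not reflect how the real 
  <tt>SVal</tt> hierarchy is laid out.

```cpp
#include <cassert>
#include <string>

// Hypothetical tagged value: the Kind field says which payload is meaningful.
struct ToySVal {
  enum Kind { Unknown, ConcreteInt, Symbol, Loc };
  Kind K = Unknown;
  long Int = 0;        // meaningful when K == ConcreteInt
  unsigned SymId = 0;  // meaningful when K == Symbol
  std::string Region;  // meaningful when K == Loc

  static ToySVal makeInt(long V) { ToySVal S; S.K = ConcreteInt; S.Int = V; return S; }
  static ToySVal makeSymbol(unsigned Id) { ToySVal S; S.K = Symbol; S.SymId = Id; return S; }
  static ToySVal makeLoc(const std::string &R) { ToySVal S; S.K = Loc; S.Region = R; return S; }
};

// Render a value for debugging, in the spirit of a dump() helper.
inline std::string dump(const ToySVal &V) {
  switch (V.K) {
  case ToySVal::ConcreteInt: return std::to_string(V.Int);
  case ToySVal::Symbol:      return "$" + std::to_string(V.SymId);
  case ToySVal::Loc:         return "&" + V.Region;
  case ToySVal::Unknown:     break;
  }
  return "Unknown";
}

// A checker usually only has state worth tracking for symbolic values.
inline bool isSymbolic(const ToySVal &V) { return V.K == ToySVal::Symbol; }
```

  <p>
  A default-constructed <tt>ToySVal</tt> plays the role of the unknown value: 
  it carries no payload, so there is nothing for a checker to track.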
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) 
  is meant to represent an abstract, but named, symbolic value. A symbol represents 
  an actual (immutable) value. We might not know what its specific value is, but 
  we can associate constraints with that value as we analyze a path. For 
  example, we might record that the value of a symbol is greater than 
  <tt>0</tt>.
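  <p>
  The following standalone sketch shows one way such path constraints could be 
  recorded as integer ranges. It is an assumption-laden toy, not the analyzer's 
  constraint manager: <tt>ToyRange</tt>, <tt>assumeGreater</tt>, 
  <tt>assumeLess</tt> and <tt>isInfeasible</tt> are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <map>

// Hypothetical model: each symbol id maps to the inclusive range of values
// the symbol may still have on the current path.
struct ToyRange {
  long Lo = LONG_MIN;
  long Hi = LONG_MAX;
};
using ToyConstraints = std::map<unsigned, ToyRange>;

// Record the assumption "Sym > Bound". The symbol itself never changes;
// only what is known about it does. C is a copy, so the input is preserved.
inline ToyConstraints assumeGreater(ToyConstraints C, unsigned Sym, long Bound) {
  assert(Bound < LONG_MAX && "no value is greater than LONG_MAX");
  ToyRange &R = C[Sym];
  if (R.Lo < Bound + 1)
    R.Lo = Bound + 1;
  return C;
}

// Record the assumption "Sym < Bound".
inline ToyConstraints assumeLess(ToyConstraints C, unsigned Sym, long Bound) {
  assert(Bound > LONG_MIN && "no value is less than LONG_MIN");
  ToyRange &R = C[Sym];
  if (R.Hi > Bound - 1)
    R.Hi = Bound - 1;
  return C;
}

// True if no value satisfies the recorded constraints for Sym, which means
// the path that made those assumptions cannot be taken.
inline bool isInfeasible(const ToyConstraints &C, unsigned Sym) {
  ToyConstraints::const_iterator It = C.find(Sym);
  return It != C.end() && It->second.Lo > It->second.Hi;
}
```

  <p>
  Assuming first that a symbol is greater than <tt>0</tt> and then that it is 
  less than <tt>1</tt> leaves an empty range, so in this toy model a path 
  making both assumptions would be discarded.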

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol. 
  It is used to provide a lexicon of how to describe abstract memory. Regions can 
  layer on top of other regions, providing a layered approach to representing memory. 
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, 
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could 
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> 
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the 
  symbol is unique and has a unique name, that symbol names the region.
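  <p>
  The layering of regions can be sketched with a parent pointer, as in the 
  standalone example below. This is only an illustration: <tt>ToyRegion</tt>, 
  <tt>describe</tt> and <tt>isSubRegionOf</tt> are hypothetical names, and 
  the real region classes carry far more information.

```cpp
#include <cassert>
#include <string>

// Hypothetical region node: every region except a top-level one is layered
// on a super-region.
struct ToyRegion {
  std::string Name;
  const ToyRegion *Super; // nullptr for a top-level region
};

// Spell out a region by walking up through its super-regions, e.g. "s.f".
inline std::string describe(const ToyRegion &R) {
  return R.Super ? describe(*R.Super) + "." + R.Name : R.Name;
}

// True if R is layered, directly or indirectly, on top of Ancestor.
inline bool isSubRegionOf(const ToyRegion &R, const ToyRegion &Ancestor) {
  for (const ToyRegion *P = R.Super; P; P = P->Super)
    if (P == &Ancestor)
      return true;
  return false;
}
```

  <p>
  Here a region for a struct variable <tt>s</tt> would have no super-region, 
  while the region for its field <tt>f</tt> would point back at it.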
  
  <p>
  Let's see how the analyzer processes the expressions in the following example:
  <p>
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, 
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in 
this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. 
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, 
which references the value <b>currently bound</b> to <tt>x</tt>. That value is 
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. 
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, 
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When 
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, 
and create a new <tt>SVal</tt> that represents their multiplication (which in 
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we 
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), 
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) 
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same 
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt> 
might reference the same underlying value.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are 
unique names for abstract symbolic values. Some MemRegions represent abstract 
symbolic chunks of memory, and thus are also based on symbols. SVals are just 
references to values, and can reference either MemRegions, Symbols, or concrete 
values (e.g., the number 1).
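<p>
The walk-through of <tt>foo</tt> can be replayed with a small standalone 
sketch. It is a toy under stated assumptions rather than the engine's logic: 
<tt>ToyEngine</tt> and <tt>evalFooBody</tt> are hypothetical names, values are 
plain strings such as <tt>"$0"</tt>, and a region with no binding is simply 
given a fresh symbol when it is first loaded.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy engine: a store from region names to textual values.
struct ToyEngine {
  std::map<std::string, std::string> Store; // region -> bound value
  std::map<std::string, std::string> Defs;  // symbol -> expression it names
  unsigned NextSym = 0;

  std::string freshSymbol() { return "$" + std::to_string(NextSym++); }

  // lvalue-to-rvalue conversion: fetch whatever is currently bound to the
  // region. An unbound region (a parameter on entry) gets a fresh symbol.
  std::string load(const std::string &Region) {
    std::map<std::string, std::string>::iterator It = Store.find(Region);
    if (It == Store.end())
      It = Store.insert(std::make_pair(Region, freshSymbol())).first;
    return It->second;
  }

  // Multiplying with a symbolic operand yields a new symbolic expression,
  // which the toy names with the next free symbol.
  std::string mul(const std::string &L, const std::string &R) {
    std::string Sym = freshSymbol();
    Defs[Sym] = L + " * " + R;
    return Sym;
  }

  void bind(const std::string &Region, const std::string &Val) {
    assert(!Region.empty() && "binding needs a region");
    Store[Region] = Val;
  }
};

// Replays the two statements of foo from the example above.
inline ToyEngine evalFooBody() {
  ToyEngine E;
  E.bind("y", E.mul(E.load("x"), "2")); // int y = x * 2;
  E.bind("z", E.load("x"));             // int z = x;
  return E;
}
```

<p>
After both statements, <tt>x</tt> and <tt>z</tt> are bound to the same symbol 
while <tt>y</tt> is bound to the symbol that names the product, which matches 
the observation that two values may reference the same underlying symbol.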

  <!-- 
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.  
  -->
<h2 id=idea>Idea for a Checker</h2>
  Here are several questions you should consider when evaluating your 
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive 
    analysis? See <a href="#ast">AST Visitors</a>.</li>
    
    <li>How high is the false positive rate going to be? Looking at the occurrences 
    of the issue you want to write a checker for in existing code bases might 
    give you some ideas.</li>
    
    <li>How will the current limitations of the analysis affect the false alarm 
    rate? Currently, the analyzer only reasons about one procedure at a time (no 
    inter-procedural analysis). Also, it uses a simple range-tracking-based 
    solver to model symbolic execution.</li>
    
    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a> 
    to get some ideas for new checkers, and consider starting by improving or fixing 
    bugs in the existing checkers.</li>
  </ul>

<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt> 
  folder. Follow the steps below to register a new checker with the analyzer.
<ol>
  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>:
<pre class="code_example">
// Headers as used by the existing checkers in this directory.
#include "ClangSACheckers.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
// Called before each call expression is evaluated.
class NewChecker : public Checker&lt;check::PreStmt&lt;CallExpr&gt; &gt; {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
};
} // end anonymous namespace

void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker&gt;();
}
</pre>

<li>Pick the package name for your checker and add the registration code to 
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should 
first be developed as experimental. If our new checker performs 
security-related checks, we should add the following lines under the 
<tt>SecurityExperimental</tt> package: 
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker&lt;"NewChecker"&gt;,
  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
  DescFile&lt;"NewChecker.cpp"&gt;;
...
} // end "security.experimental"
</pre>

<li>Make the source code file visible to CMake by adding it to 
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

<li>Compile and see your checker in the list of available checkers by running:<br>
<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
</ol>
   

<h2 id=skeleton>Checker Skeleton</h2>
  There are two main decisions you need to make:
  <ul>
    <li> Which events the checker should be tracking. 
    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a> 
    for the list of available checker callbacks.</li>
    <li> What data you want to store as part of the checker-specific program 
    state. Try to minimize the checker state as much as possible. </li>
  </ul>

<h2 id=bugs>Bug Reports</h2>

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk 
  might be sufficient. If that is the case, consider implementing a Clang 
  compiler warning. On the other hand, a check might not be acceptable as a compiler 
  warning, for example because of a relatively high false positive rate. In this 
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and 
  <tt><b>checkASTCodeBody</b></tt> are your best friends. 

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests 
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, 
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>TESTDIRS=Analysis make test</b>
    </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only 
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt> 
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the 
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> 
instead of <tt>clang --analyze</tt>, as the latter would call the compiler 
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while 
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and 
execute:
<br><tt> 
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
<br><tt>
(gdb) <b>p State->dump()</b>
</tt> 
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you 
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the 
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt> 
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
<br>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

</div>
</div>
</body>
</html>