<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h3 style="color:red">This Page Is Under Construction</h3>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone who is interested in implementing their own
checker should check out the "Building a Checker in 24 Hours" talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
</p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Static Analyzer Overview</a>
      <ul>
        <li><a href="#interaction">Interaction with Checkers</a></li>
        <li><a href="#values">Representing Values</a></li>
      </ul></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
      <li><a href="#extendingstates">Custom Program States</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands/Debugging Hints</a></li>
      <li><a href="#additioninformation">Additional Sources of Information</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
    page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine-tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path-sensitive: the engine tries to explore every
  possible path through the program. The explored execution traces are represented by an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols, the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any
    checker-specific data
  </ul>
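<p>The node structure can be pictured with a small standalone C++ model. This
is a hypothetical sketch, not the Clang API: <tt>ToyState</tt>,
<tt>ToyNode</tt>, and <tt>ToyGraph</tt> are made-up names, a program point is
reduced to a string label, and a state is reduced to the list of conditions
assumed on the current path. It illustrates the path-sensitive part: at a
branch, each successor receives its own copy of the state, so the two paths
never see each other's assumptions.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;memory&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical toy model, not the Clang API. A state here is only the list
// of conditions assumed on the current path; real states hold much more.
struct ToyState {
  std::vector&lt;std::string&gt; assumptions;
};

// A node pairs a program point (a label standing in for ProgramPoint) with
// an immutable state, and remembers its predecessor on the path.
struct ToyNode {
  std::string point;
  std::shared_ptr&lt;const ToyState&gt; state;
  const ToyNode *pred;
};

// Owns every node created during exploration.
class ToyGraph {
public:
  const ToyNode *addNode(std::string point,
                         std::shared_ptr&lt;const ToyState&gt; state,
                         const ToyNode *pred) {
    nodes_.push_back(std::make_unique&lt;ToyNode&gt;(
        ToyNode{std::move(point), std::move(state), pred}));
    return nodes_.back().get();
  }
  std::size_t size() const { return nodes_.size(); }

private:
  std::vector&lt;std::unique_ptr&lt;ToyNode&gt;&gt; nodes_;
};

// Forks the path at a branch on `cond`: each successor gets its own copy of
// the state, extended with the assumption made on that side of the branch.
std::vector&lt;const ToyNode *&gt; branch(ToyGraph &amp;g, const ToyNode *n,
                                    const std::string &amp;cond) {
  assert(n != nullptr);
  auto onTrue = std::make_shared&lt;ToyState&gt;(*n-&gt;state);
  onTrue-&gt;assumptions.push_back(cond);
  auto onFalse = std::make_shared&lt;ToyState&gt;(*n-&gt;state);
  onFalse-&gt;assumptions.push_back("!(" + cond + ")");
  return {g.addNode("then", onTrue, n), g.addNode("else", onFalse, n)};
}

// Counts the nodes on the path from `n` back to the root.
std::size_t pathLength(const ToyNode *n) {
  std::size_t len = 0;
  for (; n != nullptr; n = n-&gt;pred) ++len;
  return len;
}
</pre>

<p>For example, branching on <tt>x &gt; 0</tt> from an entry node adds a
"then" node whose state records <tt>x &gt; 0</tt> and an "else" node whose
state records <tt>!(x &gt; 0)</tt>, while the entry node's state stays empty.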

  <h3 id=interaction>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core changes; they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless.) The checkers are called one after another
  in a predefined order; thus, calling all the checkers adds a chain of nodes to the
  <tt>ExplodedGraph</tt>.
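<p>The sketch below models that chain in standalone C++. It is hypothetical,
not the Clang API: a <tt>ToyChecker</tt> is a plain function from the
incoming state to the state it transitions to, and a state is a
<tt>std::map</tt> standing in for checker-defined data. Because the checkers
keep nothing in member variables, all of their data flows through the
states, which is what the stateless rule of thumb above asks for.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;functional&gt;
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical toy model, not the Clang API. A state is a value-semantics
// map standing in for checker-defined data in the GenericDataMap.
using ToyState = std::map&lt;std::string, int&gt;;

// A checker is a stateless function from the incoming state to the state it
// wants to transition to; it keeps no data of its own.
using ToyChecker = std::function&lt;ToyState(const ToyState &amp;)&gt;;

// Calls the checkers one after another in registration order. Each checker
// sees the state produced by the previous one, and each call appends one
// state to the chain, mirroring the chain of nodes described above.
std::vector&lt;ToyState&gt; runCheckers(const std::vector&lt;ToyChecker&gt; &amp;checkers,
                                  const ToyState &amp;initial) {
  std::vector&lt;ToyState&gt; chain{initial};
  for (const ToyChecker &amp;check : checkers)
    chain.push_back(check(chain.back()));
  assert(chain.size() == checkers.size() + 1);  // initial + one per checker
  return chain;
}

// Two example checkers that record their data in the state.
ToyState countCalls(const ToyState &amp;s) {
  ToyState next = s;
  next["calls"] += 1;
  return next;
}

ToyState markVisited(const ToyState &amp;s) {
  ToyState next = s;
  next["visited"] = 1;
  return next;
}
</pre>

<p>Running <tt>countCalls</tt> and then <tt>markVisited</tt> from an empty
state yields three states: the initial one, then one per checker, each
building on the state before it.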

  <h3 id=values>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, that usually means there is no symbolic
  information to track. For example, if the value was an integer, such as
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, an <tt>SVal</tt> is not a symbol, but it really should be
  a symbolic value. This happens when the analyzer cannot reason about something
  (yet). An example is floating-point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
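<p>The "discriminated union" idea can be illustrated with
<tt>std::variant</tt>. In this hypothetical sketch, <tt>IntValue</tt>,
<tt>SymbolValue</tt>, <tt>RegionValue</tt>, <tt>UnknownValue</tt>, and
<tt>ToySVal</tt> are simplified stand-ins, not Clang classes: a value is
exactly one of four alternatives, and <tt>dump</tt> is only loosely analogous
to the real <tt>.dump()</tt> method (its output format is made up).

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;string&gt;
#include &lt;variant&gt;

// Hypothetical stand-ins, not the Clang classes: a value is exactly one of
// a concrete integer, a symbol ($0, $1, ...), a memory region, or unknown.
struct IntValue { long long value; };
struct SymbolValue { unsigned id; };
struct RegionValue { std::string name; };
struct UnknownValue {};  // outside the analyzer's reasoning capabilities

using ToySVal = std::variant&lt;IntValue, SymbolValue, RegionValue, UnknownValue&gt;;

// Only symbolic values carry information a checker would usually track.
bool isSymbolic(const ToySVal &amp;v) {
  return std::holds_alternative&lt;SymbolValue&gt;(v);
}

// Renders the value for debugging, in a made-up format.
std::string dump(const ToySVal &amp;v) {
  if (const auto *i = std::get_if&lt;IntValue&gt;(&amp;v))
    return std::to_string(i-&gt;value);
  if (const auto *s = std::get_if&lt;SymbolValue&gt;(&amp;v))
    return "$" + std::to_string(s-&gt;id);
  if (const auto *r = std::get_if&lt;RegionValue&gt;(&amp;v))
    return "&amp;" + r-&gt;name;
  assert(std::holds_alternative&lt;UnknownValue&gt;(v));
  return "Unknown";
}
</pre>

<p>A checker working with such a type would typically ignore
<tt>IntValue</tt> and <tt>UnknownValue</tt> and only attach data to
<tt>SymbolValue</tt>s, matching the observation above that concrete values
rarely need tracking.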
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. Symbols represent
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>, etc.
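<p>One simple way to picture such constraints is to keep, for each symbol,
the interval of values it may still take on the current path. The standalone
sketch below is a hypothetical simplification: <tt>Range</tt>,
<tt>Constraints</tt>, <tt>assumeInRange</tt>, <tt>assumeGreaterThan</tt>, and
<tt>assumeAtMost</tt> are illustrative names, symbols are plain numbers, and
each symbol keeps a single interval, whereas a real range-based solver may
track several disjoint ranges. An assumption that empties the interval means
the path is infeasible.

<pre class="code_example">
#include &lt;algorithm&gt;
#include &lt;cassert&gt;
#include &lt;climits&gt;
#include &lt;map&gt;
#include &lt;optional&gt;

// Hypothetical simplification, not the Clang constraint manager: each symbol
// (a plain number standing in for $0, $1, ...) keeps one closed interval.
struct Range {
  long long lo;
  long long hi;
};
using Constraints = std::map&lt;unsigned, Range&gt;;

// The interval known for `sym`; an unconstrained symbol may be any value.
Range rangeOf(const Constraints &amp;c, unsigned sym) {
  auto it = c.find(sym);
  return it != c.end() ? it-&gt;second : Range{LLONG_MIN, LLONG_MAX};
}

// Narrows `sym` to [lo, hi] on a copy of the constraints. An empty result
// means the assumption contradicts the path, so the path is infeasible.
std::optional&lt;Constraints&gt; assumeInRange(Constraints c, unsigned sym,
                                         long long lo, long long hi) {
  assert(lo &lt;= hi);
  Range r = rangeOf(c, sym);
  r.lo = std::max(r.lo, lo);
  r.hi = std::min(r.hi, hi);
  if (r.lo &gt; r.hi) return std::nullopt;
  c[sym] = r;
  return c;
}

// "sym &gt; k" on one branch of a comparison...
std::optional&lt;Constraints&gt; assumeGreaterThan(const Constraints &amp;c,
                                             unsigned sym, long long k) {
  if (k == LLONG_MAX) return std::nullopt;  // no value exceeds the maximum
  return assumeInRange(c, sym, k + 1, LLONG_MAX);
}

// ...and "sym &lt;= k" on the other.
std::optional&lt;Constraints&gt; assumeAtMost(const Constraints &amp;c, unsigned sym,
                                        long long k) {
  return assumeInRange(c, sym, LLONG_MIN, k);
}
</pre>

<p>Assuming <tt>$0 &gt; 0</tt> narrows <tt>$0</tt> to
<tt>[1, LLONG_MAX]</tt>; on that path the opposite assumption
<tt>$0 &lt;= 0</tt> has no solution, so the corresponding branch would not be
explored.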

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
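<p>The layering can be modeled as a chain of super-region links. In the
hypothetical sketch below (<tt>ToyRegion</tt>, <tt>makeRegion</tt>,
<tt>baseRegion</tt>, and <tt>describe</tt> are made-up names, and region
kinds are plain strings rather than Clang classes), a field region points to
the region that contains it, and a symbolic region acts as a base region just
as a variable region does, named by its symbol.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;string&gt;
#include &lt;utility&gt;

// Hypothetical toy model, not the Clang class hierarchy. Each region may
// sit on top of a "super region", giving memory a layered shape.
struct ToyRegion {
  std::string kind;  // "VarRegion", "FieldRegion", "SymbolicRegion", ...
  std::string name;  // variable name, field name, or symbol name
  std::shared_ptr&lt;const ToyRegion&gt; super;  // null for a base region
};
using RegionRef = std::shared_ptr&lt;const ToyRegion&gt;;

RegionRef makeRegion(std::string kind, std::string name, RegionRef super) {
  return std::make_shared&lt;ToyRegion&gt;(
      ToyRegion{std::move(kind), std::move(name), std::move(super)});
}

// Follows super-region links until reaching a region with no super region.
RegionRef baseRegion(RegionRef r) {
  assert(r != nullptr);
  while (r-&gt;super) r = r-&gt;super;
  return r;
}

// Describes a region by its path from the base, e.g. "s.f".
std::string describe(const RegionRef &amp;r) {
  if (!r-&gt;super) return r-&gt;name;
  return describe(r-&gt;super) + "." + r-&gt;name;
}
</pre>

<p>A field <tt>f</tt> of a stack variable <tt>s</tt> is described as
<tt>s.f</tt> with base region <tt>s</tt>; a field <tt>g</tt> reached through
a pointer whose value is the symbol <tt>$0</tt> is described as
<tt>$0.g</tt>, with the symbolic region as its base.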

  <p>
  Let's see how the analyzer processes the expressions in the following example:
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     /* ... */
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<p>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.
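<p>The walkthrough can be replayed with a tiny evaluator. This is a
hypothetical sketch, not the analyzer's implementation: <tt>ToyEngine</tt>
keys its store by variable name instead of by <tt>MemRegion</tt>, symbols are
numbered in creation order, only multiplication is modeled, and for
simplicity the symbolic product is given its own number. Reading <tt>x</tt>
twice yields the same symbol <tt>$0</tt>, and <tt>x*2</tt> becomes
<tt>$1</tt>, which stands for <tt>$0 * 2</tt>.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;variant&gt;
#include &lt;vector&gt;

// Hypothetical toy model of the walkthrough, not the Clang API. A value is
// a concrete integer or a symbol; symbols are numbered $0, $1, ...
struct Concrete { long long value; };
struct Sym { unsigned id; };
using Val = std::variant&lt;Concrete, Sym&gt;;

class ToyEngine {
public:
  // At function entry, a parameter is bound to a fresh symbol.
  void bindParam(const std::string &amp;var) {
    store_[var] = freshSymbol("initial value of " + var);
  }

  // lvalue-to-rvalue conversion: read the value currently bound to `var`.
  Val load(const std::string &amp;var) const { return store_.at(var); }

  // lhs * rhs: two concrete operands fold to a concrete result; otherwise
  // the product is a new symbolic expression.
  Val multiply(const Val &amp;lhs, const Val &amp;rhs) {
    const auto *a = std::get_if&lt;Concrete&gt;(&amp;lhs);
    const auto *b = std::get_if&lt;Concrete&gt;(&amp;rhs);
    if (a &amp;&amp; b) return Concrete{a-&gt;value * b-&gt;value};
    return freshSymbol(name(lhs) + " * " + name(rhs));
  }

  // Assignment: bind the right-hand side's value to the variable.
  void assign(const std::string &amp;var, const Val &amp;v) { store_[var] = v; }

  std::string name(const Val &amp;v) const {
    if (const auto *c = std::get_if&lt;Concrete&gt;(&amp;v))
      return std::to_string(c-&gt;value);
    assert(std::holds_alternative&lt;Sym&gt;(v));
    return "$" + std::to_string(std::get&lt;Sym&gt;(v).id);
  }

  // What a symbol stands for, e.g. "$0 * 2" for $1.
  const std::string &amp;meaning(unsigned id) const { return symbols_.at(id); }

private:
  Sym freshSymbol(std::string description) {
    symbols_.push_back(std::move(description));
    return Sym{static_cast&lt;unsigned&gt;(symbols_.size() - 1)};
  }

  std::map&lt;std::string, Val&gt; store_;  // variable (region) -&gt; bound value
  std::vector&lt;std::string&gt; symbols_;  // symbol id -&gt; what it stands for
};
</pre>

<p>Replaying <tt>foo</tt> (bind <tt>x</tt>, assign
<tt>multiply(load("x"), Concrete{2})</tt> to <tt>y</tt>, then assign
<tt>load("x")</tt> to <tt>z</tt>) leaves <tt>y</tt> bound to <tt>$1</tt> and
<tt>z</tt> bound to <tt>$0</tt>: two separate reads of <tt>x</tt> reference
the same symbol.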

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).

  <!-- 
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.  
  -->

<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas.</li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer's inter-procedural reasoning is limited: calls
    are analyzed by inlining the callee when its body is available, and are
    otherwise evaluated conservatively. Also, the analyzer uses a simple
    range-tracking-based solver to reason about constraints on symbolic values.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
    to get some ideas for new checkers and consider starting with improving/fixing
    bugs in the existing checkers.</li>
  </ul>

<p>Once an idea for a checker has been chosen, there are two key decisions that
need to be made:
  <ul>
    <li>Which events the checker should be tracking. This is discussed in more
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
    Checker Class Structure</a>.
    <li>What checker-specific data needs to be stored as part of the program
    state (if any). This should be minimized as much as possible. More detail about
    implementing custom program state is given in the section <a
    href="#extendingstates">Custom Program States</a>.
  </ul>


<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
  stream APIs, was registered with the analyzer.
  Similar steps should be followed for a new checker.
<ol>
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
  <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
<li>A package was selected for the checker and the checker was defined in the
table of checkers at <tt>lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Since all
checkers should first be developed as "alpha", and the SimpleStreamChecker
performs UNIX API checks, the correct package is "alpha.unix", and the following
was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>

<li>The source code file was made visible to CMake by adding it to
<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

</ol>

After adding a new checker to the analyzer, one can verify that the new checker
was successfully added by seeing if it appears in the list of available checkers:
<br> <tt><b>$ clang -cc1 -analyzer-checker-help</b></tt>
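<p>The package chosen above becomes part of the checker's full name, so this
checker is listed as <tt>alpha.unix.SimpleStream</tt>. The sketch below
models only that naming scheme. <tt>ToyRegistry</tt> is hypothetical and does
not reflect how <tt>CheckerManager</tt> or <tt>Checkers.td</tt> work
internally; it also assumes that passing a package name to
<tt>-analyzer-checker</tt> enables every checker nested under that package,
while a full name enables exactly one checker.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;set&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical toy registry, not the Clang implementation. A checker is
// recorded under its full name: the package path, a dot, and its own name.
class ToyRegistry {
public:
  void add(const std::string &amp;package, const std::string &amp;name) {
    assert(!package.empty() &amp;&amp; !name.empty());
    names_.insert(package + "." + name);
  }

  // Full names selected by an option value: either exactly one checker, or
  // every checker whose full name starts with "&lt;option&gt;." (a package).
  std::vector&lt;std::string&gt; enabledBy(const std::string &amp;option) const {
    const std::string prefix = option + ".";
    std::vector&lt;std::string&gt; result;
    for (const std::string &amp;full : names_) {
      bool inPackage = full.compare(0, prefix.size(), prefix) == 0;
      if (full == option || inPackage) result.push_back(full);
    }
    return result;  // sorted, because std::set iterates in order
  }

private:
  std::set&lt;std::string&gt; names_;
};
</pre>

<p>With <tt>alpha.unix.SimpleStream</tt> and <tt>core.DivideZero</tt>
registered, the option values <tt>alpha.unix</tt> and <tt>alpha</tt> both
select only the former, while <tt>alpha.un</tt> selects nothing, because
package names are matched as whole components.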

<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>

<p> All checkers inherit from the <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
Checker</a></tt> template class; the template parameter(s) describe the type of
events that the checker is interested in processing. The various types of events
that are available are described in the file <a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.

<p> For each event type requested, a corresponding callback function must be
defined in the checker class (<a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a> shows the
correct function name and signature for each event type).

<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
take action at the following times:

<ul>
<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
If so, check the parameter being passed.
<li>After making a function call, check if the function is <tt>fopen</tt>. If
so, process the return value.
<li>When values go out of scope, check whether they are still-open file
streams, and report a bug if so. In addition, remove any information about
them from the program state in order to keep the state as small as possible.
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
track them), mark them as such. This prevents false positives in the cases where
the analyzer cannot be sure whether the file was closed or not.
</ul>

<p>The events that will be used for each of these actions are, respectively, <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:

<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:

  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>
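<p>Stripped of the Clang types, the four callbacks amount to transitions on a
per-path map from streams to their open/closed status. The standalone sketch
below is a hypothetical simplification, not the real checker: streams are
plain integers standing in for the <tt>SymbolRef</tt> of the
<tt>FILE*</tt> returned by <tt>fopen</tt>; the made-up functions
<tt>afterFopen</tt>, <tt>beforeFclose</tt>, <tt>onDead</tt>, and
<tt>onEscape</tt> loosely correspond to the PostCall, PreCall, DeadSymbols,
and PointerEscape callbacks; appending to a <tt>Reports</tt> vector stands in
for emitting a bug report; and the check on the <tt>fclose</tt> argument is
assumed here to be a double-close check.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical simplification of SimpleStreamChecker's logic, not the real
// checker. Streams are integers standing in for the symbol of the FILE*
// returned by fopen; the map is the per-path state the checker would keep
// in the ProgramState.
enum class StreamState { Opened, Closed };
using StreamMap = std::map&lt;int, StreamState&gt;;
using Reports = std::vector&lt;std::string&gt;;  // stands in for emitReport

// PostCall on fopen: start tracking the returned stream as open.
StreamMap afterFopen(StreamMap s, int stream) {
  assert(s.count(stream) == 0);  // fopen returns a fresh symbol
  s[stream] = StreamState::Opened;
  return s;
}

// PreCall on fclose: closing an already-closed stream is reported.
StreamMap beforeFclose(StreamMap s, int stream, Reports &amp;out) {
  auto it = s.find(stream);
  if (it != s.end() &amp;&amp; it-&gt;second == StreamState::Closed)
    out.push_back("double close of stream " + std::to_string(stream));
  s[stream] = StreamState::Closed;
  return s;
}

// DeadSymbols: a stream that dies while still open has leaked. Dead
// streams are dropped either way to keep the state small.
StreamMap onDead(StreamMap s, int stream, Reports &amp;out) {
  auto it = s.find(stream);
  if (it == s.end()) return s;
  if (it-&gt;second == StreamState::Opened)
    out.push_back("leak of stream " + std::to_string(stream));
  s.erase(it);
  return s;
}

// PointerEscape: stop tracking, since the analyzer can no longer tell
// whether the stream gets closed; this avoids a false leak report.
StreamMap onEscape(StreamMap s, int stream) {
  s.erase(stream);
  return s;
}
</pre>

<p>Because every function takes the map by value and returns the updated
copy, the logic itself stays stateless: closing stream 1 twice produces one
double-close report, a stream that dies while open produces a leak report,
and a stream that escapes before dying produces none.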

<h2 id=extendingstates>Custom Program States</h2>

<p> Checkers often need to keep track of information specific to the checks they
perform. However, since checkers have no guarantee about the order in which the
program will be explored, or even that all possible paths will be explored, this
state information cannot be kept within individual checkers. Therefore, if
checkers need to store custom information, they need to add new categories of
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
several macros designed for this purpose. They are:

<ul>
<li><a
href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
Used when the state information is a single value. The methods available for
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
Used when the state information is a list of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
Used when the state information is a set of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
Used when the state information is a map from a key to a value. The methods
available for state types declared with this macro are <tt>add</tt>,
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
</ul>

<p>All of these macros take as parameters the name to be used for the custom
category of state information and the data type(s) to be used for storage. The
data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.

<p>For example, a common case is the need to track data associated with a
symbolic expression; a map type is the most logical way to implement this. The
key for this map will be a pointer to a symbolic expression
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
expression is an integer, then the custom category of state information would be
declared as

<pre class="code_example">
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
</pre>

The data would be accessed with the <tt>get</tt> function. For a map, it
returns a pointer to the stored value, which is null if no value is bound to
the key:

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
if (currentValue) {
  // A value is bound to Sym on this path; read it as *currentValue.
}
</pre>

and set with the <tt>set</tt> function, which returns a new state:

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
int newValue;
...
ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
</pre>
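<p>The behavior described here (a lookup that may find nothing, and an
update that produces a new state while the old one stays unchanged) can be
modeled without Clang. <tt>ToyProgramState</tt> below is a hypothetical
stand-in, not a Clang class, with <tt>int</tt> keys standing in for
<tt>SymbolRef</tt>. It copies a <tt>std::map</tt> on every update for
simplicity; <tt>llvm::ImmutableMap</tt> instead shares the unchanged parts
between versions, which is much cheaper.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;memory&gt;
#include &lt;utility&gt;

// Hypothetical stand-in for a ProgramState holding one map-like trait; it
// is not a Clang class. Keys are ints standing in for SymbolRef.
class ToyProgramState {
public:
  using Ref = std::shared_ptr&lt;const ToyProgramState&gt;;

  static Ref empty() { return Ref(new ToyProgramState(std::map&lt;int, int&gt;{})); }

  // Like a map trait lookup: a pointer to the value, or nullptr if absent.
  const int *get(int sym) const {
    auto it = data_.find(sym);
    return it == data_.end() ? nullptr : &amp;it-&gt;second;
  }

  // Returns a new state with the binding added or changed; *this is left
  // untouched, so nodes holding the old state still see the old value.
  Ref set(int sym, int value) const {
    std::map&lt;int, int&gt; copy = data_;
    copy[sym] = value;
    return Ref(new ToyProgramState(std::move(copy)));
  }

  // Returns a new state without the binding for `sym`.
  Ref remove(int sym) const {
    std::map&lt;int, int&gt; copy = data_;
    copy.erase(sym);
    return Ref(new ToyProgramState(std::move(copy)));
  }

private:
  explicit ToyProgramState(std::map&lt;int, int&gt; data) : data_(std::move(data)) {}
  std::map&lt;int, int&gt; data_;
};
</pre>

<p>The null check in the model corresponds to testing the pointer returned by
<tt>state-&gt;get&lt;ExampleDataType&gt;(Sym)</tt> before dereferencing it.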

<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialization
of the <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
class template. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:

<pre class="code_example">
typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
</pre>

<p>These macros will cover a majority of use cases; however, they still have a
few limitations. They cannot be used inside namespaces (since they expand to
contain top-level namespace references), and the data types that they define
cannot be referenced from more than one file.

<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.

<h2 id=bugs>Bug Reports</h2>


<p> When a checker detects a mistake in the analyzed code, it needs a way to
report it to the analyzer core so that it can be displayed. The two classes used
to construct this report are <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
and <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
BugReport</a></tt>.

<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in the
summary page generated by the scan-build tool.

<p>
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
<ol>
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
<li>A short descriptive string. This is placed at the location of the bug in
the detailed line-by-line output generated by scan-build.
<li>The context in which the bug occurred. This includes both the location of
the bug in the program and the program's state when the location is reached. These are
both encapsulated in an <tt>ExplodedNode</tt>.
</ol>

<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
as to whether or not analysis can continue along the current path. This decision
is based on whether the detected bug is one that would prevent the program under
analysis from continuing. For example, leaking a resource should not stop
analysis, as the program can continue to run after the leak. Dereferencing a
null pointer, on the other hand, should stop analysis, as there is no way for
the program to meaningfully continue after such an error.

<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
generated by the checker can be passed to the <tt>BugReport</tt> constructor
without additional modification. This <tt>ExplodedNode</tt> will be the one
returned by the most recent call to <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
If no transition has been performed during the current callback, the checker should call <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a>
and use the returned node for bug reporting.

<p>If analysis cannot continue, then the current state should be transitioned
into a so-called <i>sink node</i>, a node from which no further analysis will be
performed. This is done by calling the <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
CheckerContext::generateSink</a> function; this function is the same as the
<tt>addTransition</tt> function, but marks the state as a sink node. Like
<tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
state, which can then be passed to the <tt>BugReport</tt> constructor.

<p>
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
by calling <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
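<p>The effect of the sink decision on exploration can be shown with a small
worklist model. This is a hypothetical sketch, not the Clang API:
<tt>ToyNode</tt>, <tt>ToyReport</tt>, and <tt>explore</tt> are made up, the
analyzed code is reduced to a straight-line path of numbered steps, and the
parameters <tt>fatalAt</tt> and <tt>leakAt</tt> are invented positions where
a null dereference or a leak is detected. A leak is reported on a regular
node and exploration continues past it; a null dereference is reported on a
sink node, and nothing after it is explored.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;deque&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical toy model, not the Clang API. The analyzed code is a
// straight-line path of numbered steps.
struct ToyNode {
  int step;   // position along the path
  bool sink;  // true: exploration does not continue from this node
};

struct ToyReport {
  std::string bugType;      // stands in for BugType
  std::string description;  // short message shown at the bug location
  ToyNode node;             // where (and in which state) the bug occurred
};

// Explores a path of `length` steps. `fatalAt` and `leakAt` are made-up
// positions where a null dereference or a resource leak is detected.
std::vector&lt;ToyReport&gt; explore(int length, int fatalAt, int leakAt) {
  assert(length &gt;= 0);
  std::vector&lt;ToyReport&gt; reports;
  std::deque&lt;ToyNode&gt; worklist{ToyNode{0, false}};
  while (!worklist.empty()) {
    ToyNode n = worklist.front();
    worklist.pop_front();
    if (n.sink || n.step == length) continue;  // nothing to expand
    int next = n.step + 1;
    if (next == fatalAt) {
      ToyNode sink{next, true};  // like generateSink: the path ends here
      reports.push_back({"Null dereference", "null pointer dereferenced", sink});
      worklist.push_back(sink);  // popped later, but never expanded
    } else {
      ToyNode succ{next, false};  // like addTransition
      if (next == leakAt)
        reports.push_back({"Resource leak", "stream never closed", succ});
      worklist.push_back(succ);
    }
  }
  return reports;
}
</pre>

<p>For a five-step path with a leak at step 2 and a null dereference at step
3, both bugs are reported and exploration stops at step 3. If the null
dereference comes first, at step 2, a leak at step 4 is never reached and
therefore never reported.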

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning; for example, because of a relatively high false positive rate. In this
  situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.
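<p>What distinguishes such a check is that it visits each statement of a body
once, with no program state and no notion of paths. The recursive walk below
is a hypothetical illustration in plain C++: <tt>ToyStmt</tt>,
<tt>makeStmt</tt>, and <tt>findCalls</tt> are made up, whereas a real
<tt>checkASTCodeBody</tt> callback receives Clang's own AST nodes. It
collects every call to a given function, for example <tt>gets</tt>.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical miniature AST, not Clang's: a statement has a kind, a
// callee name (meaningful only for calls), and child statements.
struct ToyStmt {
  std::string kind;  // "Compound", "If", "Call", ...
  std::string callee;
  std::vector&lt;std::unique_ptr&lt;ToyStmt&gt;&gt; children;
};

std::unique_ptr&lt;ToyStmt&gt; makeStmt(std::string kind, std::string callee = "") {
  assert(!kind.empty());
  auto s = std::make_unique&lt;ToyStmt&gt;();
  s-&gt;kind = std::move(kind);
  s-&gt;callee = std::move(callee);
  return s;
}

// Visits every statement once, in source order, and collects the calls to
// `banned`. There are no paths or states: a call is reported whether or not
// any execution can actually reach it.
void findCalls(const ToyStmt &amp;s, const std::string &amp;banned,
               std::vector&lt;const ToyStmt *&gt; &amp;found) {
  if (s.kind == "Call" &amp;&amp; s.callee == banned) found.push_back(&amp;s);
  for (const auto &amp;child : s.children) findCalls(*child, banned, found);
}
</pre>

<p>The trade-off is visible in the comment above: a call inside an
<tt>if</tt> whose condition can never be true is still reported, which is
acceptable only for checks whose findings are worth flagging regardless of
reachability.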

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>TESTDIRS=Analysis make test</b>
    </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view/dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
<br><tt>
(gdb) <b>p State-&gt;dump()</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a SourceManager object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E-&gt;dump()</b>
</tt>
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()-&gt;getCodeDecl().getBody()-&gt;dump()</b>
<br>
(gdb) <b>p C.getPredecessor()-&gt;getCodeDecl().getBody()-&gt;dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

<h2 id=additioninformation>Additional Sources of Information</h2>

Here are some additional resources that are useful when working on the Clang
Static Analyzer:

<ul>
<li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
up-to-date documentation about the APIs available in Clang. Relevant entries
have been linked throughout this page. Also of use is the
<a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
from LLVM.
<li> The <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">
cfe-dev mailing list</a>. This is the primary mailing list used for
discussion of Clang development (including static code analysis). The
<a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev">archive</a> also contains
a lot of information.
<li> The "Building a Checker in 24 Hours" presentation given at the <a
href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developers'
Meeting</a>. Describes the construction of SimpleStreamChecker. <a
href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
and <a
href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>
are available.
</ul>

</div>
</div>
</body>
</html>