      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2           "http://www.w3.org/TR/html4/strict.dtd">
      3 <html>
      4 <head>
      5   <title>Checker Developer Manual</title>
      6   <link type="text/css" rel="stylesheet" href="menu.css">
      7   <link type="text/css" rel="stylesheet" href="content.css">
      8   <script type="text/javascript" src="scripts/menu.js"></script>
      9 </head>
     10 <body>
     11 
     12 <div id="page">
     13 <!--#include virtual="menu.html.incl"-->
     14 
     15 <div id="content">
     16 
     17 <h3 style="color:red">This Page Is Under Construction</h3>
     18 
     19 <h1>Checker Developer Manual</h1>
     20 
     21 <p>The static analyzer engine performs path-sensitive exploration of the program and 
     22 relies on a set of checkers to implement the logic for detecting and 
constructing specific bug reports. Anyone interested in implementing their own 
checker should check out the Building a Checker in 24 Hours talk 
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>) 
and refer to this page for additional information on writing a checker. The static analyzer is 
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> 
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a> 
for developer guidelines, and send your questions and proposals to the 
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>. 
     32 </p>
     33 
     34     <ul>
     35       <li><a href="#start">Getting Started</a></li>
     36       <li><a href="#analyzer">Static Analyzer Overview</a>
     37       <ul>
     38         <li><a href="#interaction">Interaction with Checkers</a></li>
     39         <li><a href="#values">Representing Values</a></li>
     40       </ul></li>
     41       <li><a href="#idea">Idea for a Checker</a></li>
     42       <li><a href="#registration">Checker Registration</a></li>
     43       <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
     44       <li><a href="#extendingstates">Custom Program States</a></li>
     45       <li><a href="#bugs">Bug Reports</a></li>
     46       <li><a href="#ast">AST Visitors</a></li>
     47       <li><a href="#testing">Testing</a></li>
     48       <li><a href="#commands">Useful Commands/Debugging Hints</a></li>
     49       <li><a href="#additioninformation">Additional Sources of Information</a></li>
     50       <li><a href="#links">Useful Links</a></li>
     51     </ul>
     52 
     53 <h2 id=start>Getting Started</h2>
     54   <ul>
     55     <li>To check out the source code and build the project, follow steps 1-4 of 
     56     the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> 
     57   page.</li>
     58 
     59     <li>The analyzer source code is located under the Clang source tree:
     60     <br><tt>
     61     $ <b>cd llvm/tools/clang</b>
     62     </tt>
     63     <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     64      <tt>test/Analysis</tt>.</li>
     65 
    <li>The analyzer regression tests can be executed from Clang's build 
     67     directory:
     68     <br><tt>
     69     $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
     70     </tt></li>
     71     
     72     <li>Analyze a file with the specified checker:
     73     <br><tt>
     74     $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
     75     </tt></li>
     76 
     77     <li>List the available checkers:
     78     <br><tt>
     79     $ <b>clang -cc1 -analyzer-checker-help</b>
     80     </tt></li>
     81 
     82     <li>See the analyzer help for different output formats, fine tuning, and 
     83     debug options:
     84     <br><tt>
     85     $ <b>clang -cc1 -help | grep "analyzer"</b>
     86     </tt></li>
     87 
     88   </ul>
     89  
     90 <h2 id=analyzer>Static Analyzer Overview</h2>
     91   The analyzer core performs symbolic execution of the given program. All the 
     92   input values are represented with symbolic values; further, the engine deduces 
     93   the values of all the expressions in the program based on the input symbols  
     94   and the path. The execution is path sensitive and every possible path through 
  the program is explored. The explored execution traces are represented with 
  an <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an 
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, 
     99   which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
    100   <p>
    101   <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> 
  represents the corresponding location in the program (or the CFG). 
  <tt>ProgramPoint</tt> is also used to record additional information on 
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt> 
  kind means that the state is the result of purging dead symbols, the 
  analyzer's equivalent of garbage collection. 
    107   <p>
    108   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> 
  represents the abstract state of the program. It consists of:
    110   <ul>
    111     <li><tt>Environment</tt> - a mapping from source code expressions to symbolic 
    112     values
    113     <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    114     <li><tt>GenericDataMap</tt> - constraints on symbolic values
    115   </ul>
    116   
    117   <h3 id=interaction>Interaction with Checkers</h3>
    118   Checkers are not merely passive receivers of the analyzer core changes - they 
    119   actively participate in the <tt>ProgramState</tt> construction through the
    120   <tt>GenericDataMap</tt> which can be used to store the checker-defined part 
    121   of the state. Each time the analyzer engine explores a new statement, it 
    122   notifies each checker registered to listen for that statement, giving it an 
    123   opportunity to either report a bug or modify the state. (As a rule of thumb, 
    124   the checker itself should be stateless.) The checkers are called one after another 
    125   in the predefined order; thus, calling all the checkers adds a chain to the 
    126   <tt>ExplodedGraph</tt>. 
    127   
    128   <h3 id=values>Representing Values</h3>
    129   During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> 
    130   objects are used to represent the semantic evaluation of expressions. 
    131   They can represent things like concrete 
    132   integers, symbolic values, or memory locations (which are memory regions). 
    133   They are a discriminated union of "values", symbolic and otherwise. 
    134   If a value isn't symbolic, usually that means there is no symbolic 
    135   information to track. For example, if the value was an integer, such as 
    136   <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, 
    137   and the checker doesn't usually need to track any state with the concrete 
    138   number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be 
    139   a symbolic value. This happens when the analyzer cannot reason about something 
    140   (yet). An example is floating point numbers. In such cases, the 
    141   <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
    142   This represents a case that is outside the realm of the analyzer's reasoning 
    143   capabilities. <tt>SVals</tt> are value objects and their values can be viewed 
    144   using the <tt>.dump()</tt> method. Often they wrap persistent objects such as 
    145   symbols or regions. 
    146   <p>
    147   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) 
  is meant to represent an abstract, but named, symbolic value. Symbols represent 
    149   an actual (immutable) value. We might not know what its specific value is, but 
    150   we can associate constraints with that value as we analyze a path. For 
    151   example, we might record that the value of a symbol is greater than 
    152   <tt>0</tt>, etc.
    153   <p>
    154 
    155   <p>
    156   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.  
    157   It is used to provide a lexicon of how to describe abstract memory. Regions can 
    158   layer on top of other regions, providing a layered approach to representing memory. 
    159   For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, 
    160   but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could 
    161   be used to represent the memory associated with a specific field of that object.
    162   So how do we represent symbolic memory regions? That's what 
    163   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> 
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the 
  symbol is unique and has a unique name, that symbol names the region.
    166   
    167   <P>
    168   Let's see how the analyzer processes the expressions in the following example:
    169   <p>
    170   <pre class="code_example">
    171   int foo(int x) {
    172      int y = x * 2;
    173      int z = x;
    174      ...
    175   }
    176   </pre>
    177   <p>
    178 Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, 
    179 we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in 
    180 this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. 
    181 Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, 
    182 which references the value <b>currently bound</b> to <tt>x</tt>. That value is 
    183 symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. 
    184 Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, 
    185 and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When 
    186 we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, 
    187 and create a new <tt>SVal</tt> that represents their multiplication (which in 
    188 this case is a new symbolic expression, which we might call <tt>$1</tt>). When we 
    189 evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), 
    190 and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) 
    191 to the <tt>MemRegion</tt> in the symbolic store.
    192 <br>
    193 The second line is similar. When we evaluate <tt>x</tt> again, we do the same 
    194 dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt> 
    195 might reference the same underlying values.
    196 
    197 <p>
    198 To summarize, MemRegions are unique names for blocks of memory. Symbols are 
unique names for abstract symbolic values. Some MemRegions represent abstract 
    200 symbolic chunks of memory, and thus are also based on symbols. SVals are just 
    201 references to values, and can reference either MemRegions, Symbols, or concrete 
    202 values (e.g., the number 1).
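
<p>
The walkthrough above can be mimicked with a small, self-contained toy model.
This is only a sketch: <tt>ToySVal</tt>, <tt>ConcreteInt</tt>, <tt>Symbol</tt>,
<tt>Unknown</tt>, <tt>describe</tt>, and <tt>evalMul</tt> are hypothetical
stand-ins written for illustration (C++17 is assumed for <tt>std::variant</tt>).
They are not the analyzer's classes and do not use its API.
<pre class="code_example">
#include &lt;string&gt;
#include &lt;variant&gt;

// Toy stand-ins for the analyzer's value kinds (NOT the real Clang classes).
struct ConcreteInt { long Value; };       // a known integer, e.g. 2
struct Symbol      { std::string Name; }; // an unknown but named value, e.g. "$0"
struct Unknown     {};                    // outside what the engine can reason about

// A value is a discriminated union over those kinds.
using ToySVal = std::variant&lt;ConcreteInt, Symbol, Unknown&gt;;

// Renders a value as text, loosely in the spirit of a dump() method.
std::string describe(const ToySVal &amp;V) {
  if (const auto *CI = std::get_if&lt;ConcreteInt&gt;(&amp;V))
    return std::to_string(CI-&gt;Value);
  if (const auto *S = std::get_if&lt;Symbol&gt;(&amp;V))
    return S-&gt;Name;
  return "Unknown";
}

// Evaluates L * R: folds two constants, gives up on Unknown operands,
// and otherwise builds a new symbolic expression from the operands.
ToySVal evalMul(const ToySVal &amp;L, const ToySVal &amp;R) {
  if (std::holds_alternative&lt;Unknown&gt;(L) || std::holds_alternative&lt;Unknown&gt;(R))
    return Unknown{};
  const auto *LC = std::get_if&lt;ConcreteInt&gt;(&amp;L);
  const auto *RC = std::get_if&lt;ConcreteInt&gt;(&amp;R);
  if (LC &amp;&amp; RC)
    return ConcreteInt{LC-&gt;Value * RC-&gt;Value};
  return Symbol{"(" + describe(L) + " * " + describe(R) + ")"};
}
</pre>
<p>
In this model, <tt>evalMul(Symbol{"$0"}, ConcreteInt{2})</tt> yields a new
symbolic value that <tt>describe</tt> renders as <tt>($0 * 2)</tt>. That value
plays the role of <tt>$1</tt> in the walkthrough, while two concrete operands
simply fold into another concrete integer.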
    203 
    204   <!-- 
    205   TODO: Add a picture.
    206   <br>
    207   Symbols<br>
    208   FunctionalObjects are used throughout.  
    209   -->
    210 
    211 <h2 id=idea>Idea for a Checker</h2>
    212   Here are several questions which you should consider when evaluating your 
    213   checker idea:
    214   <ul>
    215     <li>Can the check be effectively implemented without path-sensitive 
    216     analysis? See <a href="#ast">AST Visitors</a>.</li>
    217     
    <li>How high is the false positive rate going to be? Looking at the occurrences 
    219     of the issue you want to write a checker for in the existing code bases might 
    220     give you some ideas. </li>
    221     
    <li>How will the current limitations of the analysis affect the false alarm 
    223     rate? Currently, the analyzer only reasons about one procedure at a time (no 
    224     inter-procedural analysis). Also, it uses a simple range tracking based 
    225     solver to model symbolic execution.</li>
    226     
    227     <li>Consult the <a
    228     href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a> 
    229     to get some ideas for new checkers and consider starting with improving/fixing  
    230     bugs in the existing checkers.</li>
    231   </ul>
    232 
    233 <p>Once an idea for a checker has been chosen, there are two key decisions that
    234 need to be made:
    235   <ul>
    236     <li> Which events the checker should be tracking. This is discussed in more
    237     detail in the section <a href="#events_callbacks">Events, Callbacks, and
    238     Checker Class Structure</a>.
    239     <li> What checker-specific data needs to be stored as part of the program
    240     state (if any). This should be minimized as much as possible. More detail about
    241     implementing custom program state is given in section <a
    242     href="#extendingstates">Custom Program States</a>.
    243   </ul>
    244 
    245 
    246 <h2 id=registration>Checker Registration</h2>
    247   All checker implementation files are located in
  the <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
    249   how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of 
    250   stream APIs, was registered with the analyzer.
    251   Similar steps should be followed for a new checker.
    252 <ol>
    253   <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
    254   created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
    255   <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
    261 <li>A package was selected for the checker and the checker was defined in the
    262 table of checkers at <tt>lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Since all
    263 checkers should first be developed as "alpha", and the SimpleStreamChecker
    264 performs UNIX API checks, the correct package is "alpha.unix", and the following
    265 was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>
    275 
    276 <li>The source code file was made visible to CMake by adding it to
    277 <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
    278 
    279 </ol>
    280 
    281 After adding a new checker to the analyzer, one can verify that the new checker
    282 was successfully added by seeing if it appears in the list of available checkers:
<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
    284 
    285 <h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>
    286 
    287 <p> All checkers inherit from the <tt><a
    288 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
    289 Checker</a></tt> template class; the template parameter(s) describe the type of
    290 events that the checker is interested in processing. The various types of events
    291 that are available are described in the file <a
    292 href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.
    294 
    295 <p> For each event type requested, a corresponding callback function must be
    296 defined in the checker class (<a
    297 href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
    298 CheckerDocumentation.cpp</a> shows the
    299 correct function name and signature for each event type).
    300 
    301 <p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
    302 take action at the following times:
    303 
    304 <ul>
    305 <li>Before making a call to a function, check if the function is <tt>fclose</tt>.
    306 If so, check the parameter being passed.
    307 <li>After making a function call, check if the function is <tt>fopen</tt>. If
    308 so, process the return value.
    309 <li>When values go out of scope, check whether they are still-open file
    310 descriptors, and report a bug if so. In addition, remove any information about
    311 them from the program state in order to keep the state as small as possible.
    312 <li>When file pointers "escape" (are used in a way that the analyzer can no longer
    313 track them), mark them as such. This prevents false positives in the cases where
    314 the analyzer cannot be sure whether the file was closed or not.
    315 </ul>
    316 
<p>The events that will be used for each of these actions are, respectively, <a
    318 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
    319 <a
    320 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
    321 <a
    322 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
    323 and <a
    324 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
    325 The high-level structure of the checker's class is thus:
    326 
    327 <pre class="code_example">
    328 class SimpleStreamChecker : public Checker&lt;check::PreCall,
    329                                            check::PostCall,
    330                                            check::DeadSymbols,
    331                                            check::PointerEscape&gt; {
    332 public:
    333 
    334   void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
    335 
    336   void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
    337 
    338   void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;
    339 
    340   ProgramStateRef checkPointerEscape(ProgramStateRef State,
    341                                      const InvalidatedSymbols &amp;Escaped,
    342                                      const CallEvent *Call,
    343                                      PointerEscapeKind Kind) const;
    344 };
    345 </pre>
    346 
    347 <h2 id=extendingstates>Custom Program States</h2>
    348 
    349 <p> Checkers often need to keep track of information specific to the checks they
    350 perform. However, since checkers have no guarantee about the order in which the
    351 program will be explored, or even that all possible paths will be explored, this
    352 state information cannot be kept within individual checkers. Therefore, if
    353 checkers need to store custom information, they need to add new categories of
    354 data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
    355 several macros designed for this purpose. They are:
    356 
    357 <ul>
    358 <li><a
    359 href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
    360 Used when the state information is a single value. The methods available for
    361 state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
    362 <tt>remove</tt>.
    363 <li><a
    364 href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
    365 Used when the state information is a list of values. The methods available for
    366 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
    367 <tt>remove</tt>, and <tt>contains</tt>.
    368 <li><a
    369 href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
    370 Used when the state information is a set of values. The methods available for
    371 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
    372 <tt>remove</tt>, and <tt>contains</tt>.
    373 <li><a
    374 href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
    375 Used when the state information is a map from a key to a value. The methods
    376 available for state types declared with this macro are <tt>add</tt>,
    377 <tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
    378 </ul>
    379 
    380 <p>All of these macros take as parameters the name to be used for the custom
    381 category of state information and the data type(s) to be used for storage. The
    382 data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.
    385 
    386 <p>For example, a common case is the need to track data associated with a
    387 symbolic expression; a map type is the most logical way to implement this. The
    388 key for this map will be a pointer to a symbolic expression
    389 (<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
    390 expression is an integer, then the custom category of state information would be
    391 declared as
    392 
    393 <pre class="code_example">
    394 REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
    395 </pre>
    396 
The data would be accessed with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
// For map traits, get() returns a pointer to the stored value,
// or a null pointer if Sym has no entry.
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>
    405 
    406 and set with the function
    407 
    408 <pre class="code_example">
    409 ProgramStateRef state;
    410 SymbolRef Sym;
    411 int newValue;
    412 ...
    413 ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
    414 </pre>
    415 
    416 <p>In addition, the macros define a data type used for storing the data of the
    417 new data category; the name of this type is the name of the data category with
    418 "Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialized
    420 version of the <a
    421 href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
    422 <a
    423 href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
    424 or <a
    425 href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
    426 templated class. For the <tt>ExampleDataType</tt> example above, the type
    427 created would be equivalent to writing the declaration:
    428 
    429 <pre class="code_example">
    430 typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
    431 </pre>
    432 
    433 <p>These macros will cover a majority of use cases; however, they still have a
    434 few limitations. They cannot be used inside namespaces (since they expand to
    435 contain top-level namespace references), and the data types that they define
    436 cannot be referenced from more than one file.
    437 
    438 <p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
    439 one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
    441 analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
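
<p>
As a rough sketch of how these pieces fit together, the callback below looks up
an entry in the <tt>ExampleDataType</tt> map declared earlier, creates an
updated state, and hands it back to the analyzer core. The checker name
<tt>ExampleTrackingChecker</tt> and the stored value <tt>1</tt> are hypothetical
and chosen only for illustration; registration code is omitted, and header paths
and signatures may differ between Clang versions.
<pre class="code_example">
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

// Declared at top level, since the macro cannot be used inside a namespace.
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)

namespace {
// Hypothetical checker that tags the symbolic return value of every call.
class ExampleTrackingChecker : public Checker&lt;check::PostCall&gt; {
public:
  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
};
} // end anonymous namespace

void ExampleTrackingChecker::checkPostCall(const CallEvent &amp;Call,
                                           CheckerContext &amp;C) const {
  SymbolRef Sym = Call.getReturnValue().getAsSymbol();
  if (!Sym)
    return; // the return value is not symbolic, so there is nothing to track

  ProgramStateRef State = C.getState();
  // For a map trait, get&lt;&gt; returns a pointer; null means "no entry".
  if (State-&gt;get&lt;ExampleDataType&gt;(Sym))
    return; // already tracked

  // set&lt;&gt; leaves State untouched and returns a new state with the change.
  ProgramStateRef NewState = State-&gt;set&lt;ExampleDataType&gt;(Sym, 1);
  // The new state only takes effect once it is given to the analyzer core.
  C.addTransition(NewState);
}
</pre>
<p>
If the call to <tt>addTransition</tt> were left out, the value returned by
<tt>set</tt> would simply be discarded and later callbacks would still see the
old state.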
    442 <h2 id=bugs>Bug Reports</h2>
    443 
    444 
    445 <p> When a checker detects a mistake in the analyzed code, it needs a way to
    446 report it to the analyzer core so that it can be displayed. The two classes used
    447 to construct this report are <tt><a
    448 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
    449 and <tt><a
    450 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
    451 BugReport</a></tt>.
    452 
    453 <p>
    454 <tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in the
summary page generated by the scan-build tool.
    458 
    459 <P>
    460   The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
    461   the most common case, three parameters are used to form a <tt>BugReport</tt>:
    462 <ol>
    463 <li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
    464 <li>A short descriptive string. This is placed at the location of the bug in
    465 the detailed line-by-line output generated by scan-build.
    466 <li>The context in which the bug occurred. This includes both the location of
    467 the bug in the program and the program's state when the location is reached. These are
    468 both encapsulated in an <tt>ExplodedNode</tt>.
    469 </ol>
    470 
    471 <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
    472 as to whether or not analysis can continue along the current path. This decision
    473 is based on whether the detected bug is one that would prevent the program under
    474 analysis from continuing. For example, leaking of a resource should not stop
    475 analysis, as the program can continue to run after the leak. Dereferencing a
    476 null pointer, on the other hand, should stop analysis, as there is no way for
    477 the program to meaningfully continue after such an error.
    478 
    479 <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt> 
    480 generated by the checker can be passed to the <tt>BugReport</tt> constructor 
    481 without additional modification. This <tt>ExplodedNode</tt> will be the one 
    482 returned by the most recent call to <a
    483 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
    484 If no transition has been performed during the current callback, the checker should call <a
    485 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a> 
    486 and use the returned node for bug reporting.
    487 
<p>If analysis cannot continue, then the current state should be transitioned
    489 into a so-called <i>sink node</i>, a node from which no further analysis will be
    490 performed. This is done by calling the <a
    491 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
    492 CheckerContext::generateSink</a> function; this function is the same as the
    493 <tt>addTransition</tt> function, but marks the state as a sink node. Like
    494 <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
    495 state, which can then be passed to the <tt>BugReport</tt> constructor.
    496 
    497 <p>
    498 After a <tt>BugReport</tt> is created, it should be passed to the analyzer core 
    499 by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
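
<p>
The sequence described in this section might look roughly like the sketch below
for a bug that should stop analysis. It assumes the interface as described on
this page (a <tt>BugReport</tt> built from a <tt>BugType</tt>, a description,
and an <tt>ExplodedNode</tt>); constructors and ownership conventions may
differ in other Clang versions, so check the doxygen for the release you build
against. The helper name <tt>reportFatalMisuse</tt> and the message text are
hypothetical.
<pre class="code_example">
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

// Hypothetical helper: reports a fatal misuse detected at the current call.
// BT is a BugType owned by the checker (its construction is not shown).
static void reportFatalMisuse(BugType &amp;BT, const CallEvent &amp;Call,
                              CheckerContext &amp;C) {
  // The program cannot meaningfully continue, so end this path in a sink.
  ExplodedNode *ErrNode = C.generateSink();
  if (!ErrNode)
    return; // no node was produced, so there is nothing to attach a report to

  // Bug type, short description, and the node carrying location and state.
  BugReport *R = new BugReport(BT, "Example description of the misuse", ErrNode);
  R-&gt;addRange(Call.getSourceRange()); // highlight the offending call
  C.emitReport(R);
}
</pre>
<p>
For a bug that should not stop analysis, such as a leak, the same helper would
obtain its node from <tt>C.addTransition()</tt> instead of
<tt>C.generateSink()</tt>.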
    500 
    501 <h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk 
    503   might be sufficient. If that is the case, consider implementing a Clang 
    504   compiler warning. On the other hand, a check might not be acceptable as a compiler 
    505   warning; for example, because of a relatively high false positive rate. In this 
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and 
    507   <tt><b>checkASTCodeBody</b></tt> are your best friends. 
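
<p>
A skeleton for such an AST-only checker might look like the following sketch.
The class name <tt>ExampleASTChecker</tt> is hypothetical and both callback
bodies are left empty; the callback signatures follow those listed in
<tt>CheckerDocumentation.cpp</tt>, which remains the authoritative reference
for your Clang version.
<pre class="code_example">
#include "clang/AST/Decl.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"

using namespace clang;
using namespace ento;

namespace {
// Hypothetical AST-only checker; it never sees a ProgramState.
class ExampleASTChecker : public Checker&lt;check::ASTDecl&lt;FunctionDecl&gt;,
                                         check::ASTCodeBody&gt; {
public:
  // Called for FunctionDecl nodes in the AST.
  void checkASTDecl(const FunctionDecl *D, AnalysisManager &amp;Mgr,
                    BugReporter &amp;BR) const {
    // Inspect the declaration itself here, e.g. D-&gt;getNameAsString().
  }

  // Called for declarations that have a code body.
  void checkASTCodeBody(const Decl *D, AnalysisManager &amp;Mgr,
                        BugReporter &amp;BR) const {
    // Walk D-&gt;getBody() here, for example with a StmtVisitor.
  }
};
} // end anonymous namespace
</pre>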
    508 
    509 <h2 id=testing>Testing</h2>
    510   Every patch should be well tested with Clang regression tests. The checker tests 
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, 
    512   execute the following from the <tt>clang</tt> build directory:
    513     <pre class="code">
    514     $ <b>TESTDIRS=Analysis make test</b>
    515     </pre>
    516 
    517 <h2 id=commands>Useful Commands/Debugging Hints</h2>
    518 <ul>
    519 <li>
    520 While investigating a checker-related issue, instruct the analyzer to only 
    521 execute a single checker:
    522 <br><tt>
    523 $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
    524 </tt>
    525 </li>
    526 <li>
To dump the AST:
    528 <br><tt>
    529 $ <b>clang -cc1 -ast-dump test.c</b>
    530 </tt>
    531 </li>
    532 <li>
To view/dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
    534 <br><tt>
    535 $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
    536 </tt> 
    537 </li>
    538 <li>
    539 To see all available debug checkers:
    540 <br><tt>
    541 $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
    542 </tt>
    543 </li>
    544 <li>
To see which function is failing while processing a large file, use the 
<tt>-analyzer-display-progress</tt> option.
    547 </li>
    548 <li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> 
instead of <tt>clang --analyze</tt>, as the latter would call the compiler 
in a separate process.
    552 </li>
    553 <li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while 
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and 
execute:
    557 <br><tt> 
    558 (gdb) <b>p ViewGraph(0)</b>
    559 </tt>
    560 </li>
    561 <li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
    563 <br><tt>
    564 (gdb) <b>p State->dump()</b>
    565 </tt> 
    566 </li>
    567 <li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you 
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the 
source code.
    571 <br><tt>
    572 (gdb) <b>p E->dump()</b>
    573 </tt> 
    574 </li>
    575 <li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
<br>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
    580 </tt>
    581 </li>
    582 </ul>
    583 
    584 <h2 id=additioninformation>Additional Sources of Information</h2>
    585 
    586 Here are some additional resources that are useful when working on the Clang
    587 Static Analyzer:
    588 
    589 <ul>
    590 <li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
    591 up-to-date documentation about the APIs available in Clang. Relevant entries
    592 have been linked throughout this page. Also of use is the
    593 <a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
    594 from LLVM.
    595 <li> The <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">
    596 cfe-dev mailing list</a>. This is the primary mailing list used for
    597 discussion of Clang development (including static code analysis). The
    598 <a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev">archive</a> also contains
    599 a lot of information.
    600 <li> The "Building a Checker in 24 hours" presentation given at the <a
    601 href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
    602 meeting</a>. Describes the construction of SimpleStreamChecker. <a
    603 href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
    604 and <a
    605 href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>
    606 are available.
    607 </ul>
    608 
    609 <h2 id=links>Useful Links</h2>
    610 <ul>
    611 <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
    612 </ul>
    613 
    614 </div>
    615 </div>
    616 </body>
    617 </html>
    618