This Page Is Under Construction

Checker Developer Manual

The static analyzer engine performs symbolic execution of the program and relies on a set of checkers to implement the logic for detecting bugs and constructing bug reports. This page provides hints and guidelines for anyone who is interested in implementing their own checker. The static analyzer is a part of the Clang project, so consult the Hacking on Clang guide and the LLVM Programmer's Manual for general developer guidelines and information.

Getting Started

Static Analyzer Overview

The analyzer core performs symbolic execution of the given program. All the input values are represented with symbolic values; further, the engine deduces the values of all the expressions in the program based on the input symbols and the path. The execution is path sensitive and every possible path through the program is explored. The explored execution traces are represented with an ExplodedGraph object. Each node of the graph is an ExplodedNode, which consists of a ProgramPoint and a ProgramState.
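
As a rough illustration of the structure described above, the following self-contained sketch models an exploded graph in plain C++. It does not use Clang's real classes: ToyProgramPoint, ToyProgramState, ToyExplodedNode, ToyExplodedGraph and buildBranchExample are hypothetical names invented for this example, and the state is reduced to two string maps.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins (hypothetical); Clang's real classes are far richer.
struct ToyProgramPoint {
  std::string Kind; // e.g. "BlockEntrance" or "PostStmt"
  int Location;     // made-up index of a CFG element
};

struct ToyProgramState {
  std::map<std::string, std::string> Bindings;    // variable -> symbolic value
  std::map<std::string, std::string> Constraints; // symbol -> known constraint
};

struct ToyExplodedNode {
  ToyProgramPoint Point;
  ToyProgramState State;
  std::vector<std::size_t> Successors; // indices into ToyExplodedGraph::Nodes
};

struct ToyExplodedGraph {
  std::vector<ToyExplodedNode> Nodes;

  // Adds a (point, state) pair and returns the index of the new node.
  std::size_t addNode(ToyProgramPoint P, ToyProgramState S) {
    ToyExplodedNode N;
    N.Point = std::move(P);
    N.State = std::move(S);
    Nodes.push_back(std::move(N));
    return Nodes.size() - 1;
  }

  void addEdge(std::size_t From, std::size_t To) {
    assert(From < Nodes.size() && To < Nodes.size());
    Nodes[From].Successors.push_back(To);
  }
};

// Models `if (x > 0) ... else ...`: the path splits into two successors,
// and each successor carries its own copy of the state.
ToyExplodedGraph buildBranchExample() {
  ToyExplodedGraph G;

  ToyProgramState Entry;
  Entry.Bindings["x"] = "$0";
  std::size_t EntryId = G.addNode({"BlockEntrance", 0}, Entry);

  ToyProgramState Taken = Entry;
  Taken.Constraints["$0"] = "> 0";
  std::size_t TakenId = G.addNode({"BlockEntrance", 1}, Taken);

  ToyProgramState NotTaken = Entry;
  NotTaken.Constraints["$0"] = "<= 0";
  std::size_t NotTakenId = G.addNode({"BlockEntrance", 2}, NotTaken);

  G.addEdge(EntryId, TakenId);
  G.addEdge(EntryId, NotTakenId);
  return G;
}
```

Calling buildBranchExample() yields three nodes: the entry node has two successors, and the two successors differ only in the constraint recorded for $0.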

ProgramPoint represents the corresponding location in the program (or the CFG). ProgramPoint is also used to record additional information on when/how the state was added. For example, the PostPurgeDeadSymbolsKind kind means that the state is the result of purging dead symbols - the analyzer's equivalent of garbage collection.

ProgramState represents the abstract state of the program. It consists of:

Interaction with Checkers

Checkers are not merely passive receivers of the analyzer core changes - they actively participate in the ProgramState construction through the GenericDataMap which can be used to store the checker-defined part of the state. Each time the analyzer engine explores a new statement, it notifies each checker registered to listen for that statement, giving it an opportunity to either report a bug or modify the state. (As a rule of thumb, the checker itself should be stateless.) The checkers are called one after another in the predefined order; thus, calling all the checkers adds a chain to the ExplodedGraph.
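
The interaction described above can be sketched with a small model. This is not the analyzer's API: ToyState, StateRef, ToyChecker, runCheckers, countCalls and passThrough are hypothetical names, and the CheckerData map only stands in for the GenericDataMap. The sketch assumes that states are immutable, so a checker that wants to change something returns a new state instead of keeping data in its own members.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Immutable toy state; CheckerData stands in for the GenericDataMap.
struct ToyState {
  std::map<std::string, int> CheckerData;
};
typedef std::shared_ptr<const ToyState> StateRef;

// A checker callback keeps no data of its own: it receives the current
// state and returns either the same state or a modified copy.
typedef std::function<StateRef(const StateRef &)> ToyChecker;

// Calls the checkers one after another in the given order. Every call
// contributes one link to the chain, mirroring how calling all the
// checkers adds a chain to the exploded graph.
std::vector<StateRef> runCheckers(const std::vector<ToyChecker> &Checkers,
                                  StateRef Initial) {
  assert(Initial && "the chain needs a starting state");
  std::vector<StateRef> Chain;
  Chain.push_back(Initial);
  for (std::size_t I = 0; I < Checkers.size(); ++I)
    Chain.push_back(Checkers[I](Chain.back()));
  return Chain;
}

// Example checker that records how many statements it has seen.
StateRef countCalls(const StateRef &S) {
  std::shared_ptr<ToyState> Next = std::make_shared<ToyState>(*S);
  ++Next->CheckerData["calls.seen"];
  return Next;
}

// Example checker that has nothing to add and returns the state unchanged.
StateRef passThrough(const StateRef &S) { return S; }
```

Because every checker works on the state handed to it by the previous one, the initial state is never modified, and a checker that does nothing simply forwards the same state object.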

Representing Values

During symbolic execution, SVal objects are used to represent the semantic evaluation of expressions. They can represent things like concrete integers, symbolic values, or memory locations (which are memory regions). They are a discriminated union of "values", symbolic and otherwise. If a value isn't symbolic, usually that means there is no symbolic information to track. For example, if the value is an integer, such as 42, it is a ConcreteInt, and the checker doesn't usually need to track any state with the concrete number. In some cases, an SVal is not symbolic even though it really should be. This happens when the analyzer cannot reason about something (yet). An example is floating-point numbers. In such cases, the SVal will evaluate to UnknownVal. This represents a case that is outside the realm of the analyzer's reasoning capabilities. SVals are value objects and their values can be viewed using the .dump() method. Often they wrap persistent objects such as symbols or regions.
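
To make the "discriminated union" idea concrete, here is a minimal sketch of a tagged value type. It is a toy model, not Clang's SVal: ToySVal, the make* helpers, evalLiteral, needsTracking and dumpToString are hypothetical names, and the rule that any literal containing a dot is floating point is a simplifying assumption.

```cpp
#include <cassert>
#include <string>

// Toy discriminated union: the Kind tag says which fields are meaningful.
struct ToySVal {
  enum Kind { Unknown, ConcreteInt, Symbol, Loc };
  Kind K;
  long Int;         // meaningful only when K == ConcreteInt
  std::string Name; // symbol name (Symbol) or region name (Loc)
};

ToySVal makeUnknown() { return {ToySVal::Unknown, 0, ""}; }
ToySVal makeConcreteInt(long V) { return {ToySVal::ConcreteInt, V, ""}; }
ToySVal makeSymbol(const std::string &N) { return {ToySVal::Symbol, 0, N}; }
ToySVal makeLoc(const std::string &Region) { return {ToySVal::Loc, 0, Region}; }

// Toy rule mirroring the text: integer literals become concrete values,
// while floating-point literals are outside the model and become Unknown.
ToySVal evalLiteral(const std::string &Text) {
  assert(!Text.empty());
  if (Text.find('.') != std::string::npos)
    return makeUnknown();
  return makeConcreteInt(std::stol(Text));
}

// A checker usually has state to track only for symbolic values and
// memory locations, not for concrete or unknown values.
bool needsTracking(const ToySVal &V) {
  return V.K == ToySVal::Symbol || V.K == ToySVal::Loc;
}

// Rough analogue of dumping a value for debugging.
std::string dumpToString(const ToySVal &V) {
  switch (V.K) {
  case ToySVal::ConcreteInt:
    return std::to_string(V.Int);
  case ToySVal::Symbol:
    return V.Name;
  case ToySVal::Loc:
    return "&" + V.Name;
  case ToySVal::Unknown:
    break;
  }
  return "Unknown";
}
```

In this model, evalLiteral("42") is a concrete integer that a checker can ignore, evalLiteral("3.14") is Unknown, and only values built with makeSymbol or makeLoc are worth tracking.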

SymExpr (symbol) is meant to represent an abstract, but named, symbolic value. A symbol represents an actual (immutable) value. We might not know what its specific value is, but we can associate constraints with that value as we analyze a path. For example, we might record that the value of a symbol is greater than 0, etc.
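
The idea of attaching constraints to a symbol along a path can be sketched as follows. This is a toy range tracker, not the analyzer's constraint manager: ToyRange, ToyConstraints and their member functions are hypothetical, and one ToyConstraints object is assumed to model the constraints collected along a single path.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <string>

// Inclusive range of values a symbol may still have on the current path.
struct ToyRange {
  long Lo;
  long Hi;
};

class ToyConstraints {
public:
  // Unconstrained symbols may hold any value.
  ToyRange rangeOf(const std::string &Sym) const {
    std::map<std::string, ToyRange>::const_iterator It = Ranges.find(Sym);
    if (It == Ranges.end()) {
      ToyRange Full = {LONG_MIN, LONG_MAX};
      return Full;
    }
    return It->second;
  }

  // Records "Sym > Bound". Returns false if that contradicts what is
  // already known, i.e. the path would be infeasible.
  bool assumeGreaterThan(const std::string &Sym, long Bound) {
    if (Bound == LONG_MAX)
      return false;
    ToyRange R = rangeOf(Sym);
    if (R.Lo < Bound + 1)
      R.Lo = Bound + 1;
    return commit(Sym, R);
  }

  // Records "Sym < Bound" with the same convention.
  bool assumeLessThan(const std::string &Sym, long Bound) {
    if (Bound == LONG_MIN)
      return false;
    ToyRange R = rangeOf(Sym);
    if (R.Hi > Bound - 1)
      R.Hi = Bound - 1;
    return commit(Sym, R);
  }

private:
  std::map<std::string, ToyRange> Ranges;

  // Stores the narrowed range only if it is non-empty.
  bool commit(const std::string &Sym, const ToyRange &R) {
    assert(!Sym.empty());
    if (R.Lo > R.Hi)
      return false;
    Ranges[Sym] = R;
    return true;
  }
};
```

The symbol itself never changes; only the set of values it may stand for shrinks. After assuming that $0 is greater than 0, a later assumption that $0 is less than 1 is rejected as infeasible and leaves the recorded range untouched.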

MemRegion is similar to a symbol. It is used to provide a lexicon of how to describe abstract memory. Regions can layer on top of other regions, providing a layered approach to representing memory. For example, a struct object on the stack might be represented by a VarRegion, but a FieldRegion which is a subregion of the VarRegion could be used to represent the memory associated with a specific field of that object. So how do we represent symbolic memory regions? That's what SymbolicRegion is for. It is a MemRegion that has an associated symbol. Since the symbol is unique and has a unique name, that symbol names the region.
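
The layering of regions can be pictured with a small tree model. The sketch below is a toy, not Clang's MemRegion hierarchy: ToyRegion, baseRegion and regionPath are hypothetical names, and the printed "SymRegion{...}" form is only an illustrative notation.

```cpp
#include <cassert>
#include <string>

// Toy region: every region may sit on top of a super-region.
struct ToyRegion {
  enum Kind { Var, Field, Symbolic };
  Kind K;
  std::string Name;       // variable, field, or symbol name
  const ToyRegion *Super; // enclosing region, or nullptr for a base region
};

// Walks up the layers to the region everything else is built on.
const ToyRegion *baseRegion(const ToyRegion *R) {
  assert(R != nullptr);
  while (R->Super)
    R = R->Super;
  return R;
}

// Renders a region as a path through its layers.
std::string regionPath(const ToyRegion *R) {
  assert(R != nullptr);
  if (R->K == ToyRegion::Field && R->Super)
    return regionPath(R->Super) + "." + R->Name;
  if (R->K == ToyRegion::Symbolic)
    return "SymRegion{" + R->Name + "}"; // the symbol names the region
  return R->Name;
}
```

For a stack variable p of struct type, the field x is a Field region whose super-region is the Var region for p, so regionPath gives "p.x". If the struct is only reachable through a pointer whose value is the symbol $0, the same field sits on top of a Symbolic region instead and is rendered as "SymRegion{$0}.x".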

Let's see how the analyzer processes the expressions in the following example:

  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  

Let's look at how x*2 gets evaluated. When x is evaluated, we first construct an SVal that represents the lvalue of x; in this case, it is an SVal that references the MemRegion for x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value currently bound to x. That value is symbolic; it's whatever x was bound to at the start of the function. Let's call that symbol $0. Similarly, we evaluate the expression for 2, and get an SVal that references the concrete number 2. When we evaluate x*2, we take the two SVals of the subexpressions, and create a new SVal that represents their multiplication (which in this case is a new symbolic expression, which we might call $1). When we evaluate the assignment to y, we again compute its lvalue (a MemRegion), and then bind the SVal for the RHS (which references the symbolic value $1) to the MemRegion in the symbolic store.
The second line is similar. When we evaluate x again, we do the same dance, and create an SVal that references the symbol $0. Note that two SVals might reference the same underlying value.
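
The walk-through above can be replayed with a toy evaluator. Everything in this sketch is hypothetical (ToyEngine, its members, and analyzeFoo are invented names), values are plain strings, and the symbol numbering simply follows the $0/$1 naming used in the text rather than the analyzer's real symbol kinds.

```cpp
#include <cassert>
#include <map>
#include <string>

struct ToyEngine {
  std::map<std::string, std::string> Store;       // region -> bound value
  std::map<std::string, std::string> Definitions; // symbol -> expression text
  int NextSymbol;

  ToyEngine() : NextSymbol(0) {}

  std::string freshSymbol() { return "$" + std::to_string(NextSymbol++); }

  // Lvalue-to-rvalue conversion: return the value bound to the region.
  // A region with no binding yet (such as a parameter at function entry)
  // gets a fresh symbol that stands for its initial value.
  std::string load(const std::string &Region) {
    assert(!Region.empty());
    std::map<std::string, std::string>::iterator It = Store.find(Region);
    if (It != Store.end())
      return It->second;
    std::string Sym = freshSymbol();
    Store[Region] = Sym;
    return Sym;
  }

  // Multiplying two values produces a new symbolic expression.
  std::string multiply(const std::string &L, const std::string &R) {
    std::string Sym = freshSymbol();
    Definitions[Sym] = L + " * " + R;
    return Sym;
  }

  // Assignment: bind the value of the RHS to the region of the LHS.
  void bind(const std::string &Region, const std::string &Value) {
    Store[Region] = Value;
  }
};

// Replays `int y = x * 2; int z = x;` from the example function foo.
ToyEngine analyzeFoo() {
  ToyEngine E;
  std::string X = E.load("x");        // rvalue of x: $0
  std::string Y = E.multiply(X, "2"); // x * 2: new symbolic expression $1
  E.bind("y", Y);                     // y is bound to $1
  E.bind("z", E.load("x"));           // x evaluates to $0 again
  return E;
}
```

After analyzeFoo() runs, x and z are both bound to $0, which shows two values referencing the same underlying symbol, while y is bound to $1, defined as "$0 * 2".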

To summarize, MemRegions are unique names for blocks of memory. Symbols are unique names for abstract symbolic values. Some MemRegions represent abstract symbolic chunks of memory, and thus are also based on symbols. SVals are just references to values, and can reference either MemRegions, Symbols, or concrete values (e.g., the number 1).

Idea for a Checker

Here are several questions which you should consider when evaluating your checker idea:

Checker Registration

All checker implementation files are located in the clang/lib/StaticAnalyzer/Checkers folder. Follow the steps below to register a new checker with the analyzer.
  1. Create a new checker implementation file, for example ./lib/StaticAnalyzer/Checkers/NewChecker.cpp
    using namespace clang;
    using namespace ento;
    
    namespace {
    class NewChecker: public Checker< check::PreStmt<CallExpr> > {
    public:
      void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {}
    };
    } // end anonymous namespace
    void ento::registerNewChecker(CheckerManager &mgr) {
      mgr.registerChecker<NewChecker>();
    }
    
  2. Pick the package name for your checker and add the registration code to ./lib/StaticAnalyzer/Checkers/Checkers.td. Note that all checkers should first be developed as experimental. Suppose our new checker performs security-related checks; then we should add the following lines under the SecurityExperimental package:
    let ParentPackage = SecurityExperimental in {
    ...
    def NewChecker : Checker<"NewChecker">,
      HelpText<"This text should give a short description of the checks performed.">,
      DescFile<"NewChecker.cpp">;
    ...
    } // end "security.experimental"
    
  3. Make the source code file visible to CMake by adding it to ./lib/StaticAnalyzer/Checkers/CMakeLists.txt.
  4. Compile and see your checker in the list of available checkers by running:
    $ clang -cc1 -analyzer-checker-help

Checker Skeleton

There are two main decisions you need to make:

Bug Reports

AST Visitors

Some checks might not require path-sensitivity to be effective. A simple AST walk might be sufficient. If that is the case, consider implementing a Clang compiler warning. On the other hand, a check might not be acceptable as a compiler warning; for example, because of a relatively high false positive rate. In this situation, AST callbacks checkASTDecl and checkASTCodeBody are your best friends.
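
To show what "a simple AST walk" means in practice, here is a toy traversal over a hand-built tree. It does not use Clang's AST classes or the checkASTDecl/checkASTCodeBody callbacks: ToyNode and countCallsTo are hypothetical, and node kinds are plain strings chosen to resemble Clang's names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy syntax tree node.
struct ToyNode {
  std::string Kind; // e.g. "CompoundStmt", "IfStmt", "CallExpr"
  std::string Name; // callee name when Kind == "CallExpr", else empty
  std::vector<ToyNode> Children;
};

// Path-insensitive check: visit every node once and count the calls to a
// given function, regardless of which execution paths reach them.
int countCallsTo(const ToyNode &N, const std::string &Callee) {
  assert(!Callee.empty());
  int Count = (N.Kind == "CallExpr" && N.Name == Callee) ? 1 : 0;
  for (std::size_t I = 0; I < N.Children.size(); ++I)
    Count += countCallsTo(N.Children[I], Callee);
  return Count;
}
```

A check of this shape needs no symbolic values or program states at all: it reports every syntactic occurrence, including ones nested inside branches, which is exactly why it is cheap and also why it cannot tell feasible paths from infeasible ones.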

Testing

Every patch should be well tested with Clang regression tests. The checker tests live in the clang/test/Analysis folder. To run all of the analyzer tests, execute the following from the clang build directory:
    $ TESTDIRS=Analysis make test
    

Useful Commands/Debugging Hints