Home | History | Annotate | Download | only in analyzer
      1 The analyzer "Store" represents the contents of memory regions. It is an opaque
      2 functional data structure stored in each ProgramState; the only class that can
      3 modify the store is its associated StoreManager.
      4 
      5 Currently (Feb. 2013), the only StoreManager implementation being used is
      6 RegionStoreManager. This store records bindings to memory regions using a "base
      7 region + offset" key. (This allows `*p` and `p[0]` to map to the same location,
      8 among other benefits.)
      9 
     10 Regions are grouped into "clusters", which roughly correspond to "regions with
     11 the same base region". This allows certain operations to be more efficient,
     12 such as invalidation.
     13 
     14 Regions that do not have a known offset use a special "symbolic" offset. These
     15 keys store both the original region, and the "concrete offset region" -- the
     16 last region whose offset is entirely concrete. (For example, in the expression
     17 `foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`,
     18 since that has a known offset from the start of the top-level `foo` struct.)
     19 
     20 
     21 Binding Invalidation
     22 ====================
     23 
     24 Supporting both concrete and symbolic offsets makes things a bit tricky. Here's
     25 an example:
     26 
     27     foo[0] = 0;
     28     foo[1] = 1;
     29     foo[i] = i;
     30 
     31 After the third assignment, nothing can be said about the value of `foo[0]`,
     32 because `foo[i]` may have overwritten it! Thus, *binding to a region with a
     33 symbolic offset invalidates the entire concrete offset region.* We know
     34 `foo[i]` is somewhere within `foo`, so we don't have to invalidate anything
     35 else, but we do have to be conservative about all other bindings within `foo`.
     36 
     37 Continuing the example:
     38 
     39     foo[i] = i;
     40     foo[0] = 0;
     41 
     42 After this latest assignment, nothing can be said about the value of `foo[i]`,
     43 because `foo[0]` may have overwritten it! *Binding to a region R with a
     44 concrete offset invalidates any symbolic offset bindings whose concrete offset
     45 region is a super-region **or** sub-region of R.* All we know about `foo[i]` is
     46 that it is somewhere within `foo`, so changing *anything* within `foo` might
     47 change `foo[i]`, and changing *all* of `foo` (or its base region) will
     48 *definitely* change `foo[i]`.
     49 
     50 This logic could be improved by using the current constraints on `i`, at the
     51 cost of speed. The latter case could also be improved by matching region kinds,
     52 i.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i`
     53 is.
     54 
     55 For more detail, read through RegionStoreManager::removeSubRegionBindings in
     56 RegionStore.cpp.
     57 
     58 
     59 ObjCIvarRegions
     60 ===============
     61 
     62 Objective-C instance variables require a bit of special handling. Like struct
     63 fields, they are not base regions, and when their parent object region is
     64 invalidated, all the instance variables must be invalidated as well. However,
     65 they have no concrete compile-time offsets (in the modern, "non-fragile"
     66 runtime), and so cannot easily be represented as an offset from the start of
     67 the object in the analyzer. Moreover, this means that invalidating a single
     68 instance variable should *not* invalidate the rest of the object, since unlike
     69 struct fields or array elements there is no way to perform pointer arithmetic
     70 to access another instance variable.
     71 
     72 Consequently, although the base region of an ObjCIvarRegion is the entire
     73 object, RegionStore offsets are computed from the start of the instance
     74 variable. Thus it is not valid to assume that all bindings with non-symbolic
     75 offsets start from the base region!
     76 
     77 
     78 Region Invalidation
     79 ===================
     80 
     81 Unlike binding invalidation, region invalidation occurs when the entire
     82 contents of a region may have changed---say, because it has been passed to a
     83 function the analyzer can model, like memcpy, or because its address has
     84 escaped, usually as an argument to an opaque function call. In these cases we
     85 need to throw away not just all bindings within the region itself, but within
     86 its entire cluster, since neighboring regions may be accessed via pointer
     87 arithmetic.
     88 
     89 Region invalidation typically does even more than this, however. Because it
     90 usually represents the complete escape of a region from the analyzer's model,
     91 its *contents* must also be transitively invalidated. (For example, if a region
     92 'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have
     93 changed as well.) The algorithm that traverses this transitive closure of
     94 accessible regions is known as ClusterAnalysis, and is also used for finding
     95 all live bindings in the store (in order to throw away the dead ones). The name
     96 "ClusterAnalysis" predates the cluster-based organization of bindings, but
     97 refers to the same concept: during invalidation and liveness analysis, all
     98 bindings within a cluster must be treated in the same way for a conservative
     99 model of program behavior.
    100 
    101 
    102 Default Bindings
    103 ================
    104 
    105 Most bindings in RegionStore are simple scalar values -- integers and pointers.
    106 These are known as "Direct" bindings. However, RegionStore supports a second
    107 type of binding called a "Default" binding. These are used to provide values to
    108 all the elements of an aggregate type (struct or array) without having to
    109 explicitly specify a binding for each individual element.
    110 
    111 When there is no Direct binding for a particular region, the store manager
    112 looks at each super-region in turn to see if there is a Default binding. If so,
    113 this value is used as the value of the original region. The search ends when
    114 the base region is reached, at which point the RegionStore will pick an
    115 appropriate default value for the region (usually a symbolic value, but
    116 sometimes zero, for static data, or "uninitialized", for stack variables).
    117 
    118   int manyInts[10];
    119   manyInts[1] = 42;   // Creates a Direct binding for manyInts[1].
    120   print(manyInts[1]); // Retrieves the Direct binding for manyInts[1];
    121   print(manyInts[0]); // There is no Direct binding for manyInts[1].
    122                       // Is there a Default binding for the entire array?
    123                       // There is not, but it is a stack variable, so we use
    124                       // "uninitialized" as the default value (and emit a
    125                       // diagnostic!).
    126 
    127 NOTE: The fact that bindings are stored as a base region plus an offset limits
    128 the Default Binding strategy, because in C aggregates can contain other
    129 aggregates. In the current implementation of RegionStore, there is no way to
    130 distinguish a Default binding for an entire aggregate from a Default binding
    131 for the sub-aggregate at offset 0.
    132 
    133 
    134 Lazy Bindings (LazyCompoundVal)
    135 ===============================
    136 
    137 RegionStore implements an optimization for copying aggregates (structs and
    138 arrays) called "lazy bindings", implemented using a special SVal called
    139 LazyCompoundVal. When the store is asked for the "binding" for an entire
    140 aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a
    141 LazyCompoundVal instead. When this value is then stored into a variable, it is
    142 bound as a Default value. This makes copying arrays and structs much cheaper
    143 than if they had required memberwise access.
    144 
    145 Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region,
    146 store), representing "the value of the region during this 'snapshot' of the
    147 store". This has important implications for any sort of liveness or
    148 reachability analysis, which must take the bindings in the old store into
    149 account.
    150 
    151 Retrieving a value from a lazy binding happens in the same way as any other
    152 Default binding: since there is no direct binding, the store manager falls back
    153 to super-regions to look for an appropriate default binding. LazyCompoundVal
    154 differs from a normal default binding, however, in that it contains several
    155 different values, instead of one value that will appear several times. Because
    156 of this, the store manager has to reconstruct the subregion chain on top of the
    157 LazyCompoundVal region, and look up *that* region in the previous store.
    158 
    159 Here's a concrete example:
    160 
    161     CGPoint p;
    162     p.x = 42;       // A Direct binding is made to the FieldRegion 'p.x'.
    163     CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a
    164                     // snapshot of the current store state. This value is then
    165                     // used as a Default binding for the VarRegion 'p2'.
    166     return p2.x;    // The binding for FieldRegion 'p2.x' is requested.
    167                     // There is no Direct binding, so we look for a Default
    168                     // binding to 'p2' and find the LCV.
    169                     // Because it's an LCV, we look at our requested region
    170                     // and see that it's the '.x' field. We ask for the value
    171                     // of 'p.x' within the snapshot, and get back 42.
    172