========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules.  If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files.  The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis.  The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file).  Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file.  Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file.  Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place.  This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and use those AST
modules.  From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers.  In particular, there can be only one precompiled header, and it must
be included at the beginning of the translation unit.  The extensions to the
AST file format required for modules are discussed in the section on
:ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file.  The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk.  When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored.  The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times.  The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) is deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.

When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk.  For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header.  These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.
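
As a rough illustration of lazy loading and of how the figures above are
derived, the following Python sketch models one table whose entries are
deserialized on first use.  It is not Clang's implementation: ``LazyTable``,
its in-memory ``blob``, and the offset index are hypothetical stand-ins for the
bitstream and its index records.

.. code-block:: python

  class LazyTable:
      """Toy model of one lazily loaded AST-file table (hypothetical)."""

      def __init__(self, name, blob, offsets):
          self.name = name
          self.blob = blob        # serialized records; stands in for the file
          self.offsets = offsets  # small index: entity ID -> (start, end)
          self.loaded = {}        # entities deserialized so far

      def get(self, entity_id):
          # Deserialize an entity only on first reference, then cache it.
          if entity_id not in self.loaded:
              start, end = self.offsets[entity_id]
              self.loaded[entity_id] = self.blob[start:end].decode("ascii")
          return self.loaded[entity_id]

      def stats(self):
          # Same shape as one line of the statistics shown above.
          read, total = len(self.loaded), len(self.offsets)
          percent = 100.0 * read / total
          return "%d/%d %s read (%f%%)" % (read, total, self.name, percent)


  blob = b"intfloatchar*"
  types = LazyTable("types", blob, {1: (0, 3), 2: (3, 8), 3: (8, 13)})
  types.get(3)
  types.get(3)  # second reference hits the cache; nothing is re-read
  print(types.stats())

Only the index is consulted up front, so the work done in this model grows with
the number of entities referenced, not with the size of ``blob``.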

Precompiled headers can be chained.  When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file.  For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file.  The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

Clang's AST files are organized into several different blocks, each of which
contains the serialized representation of a part of Clang's internal
representation.  Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

For a given AST file, the `llvm-bcanalyzer
<http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used
to examine the actual structure of the bitstream for the AST file.  This
information can be used both to help understand the structure of the AST file
and to isolate areas where AST files can still be optimized, e.g., through the
introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built.  This metadata is primarily used to validate the use of
an AST file.  For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target.  The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and minor options
  (e.g., support for "``//``" comments).  The contents of this record correspond to
  the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format.  Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file.  The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options.  For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions.  The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on.  Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture, and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.
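
The validation described in this section can be pictured with the following
Python sketch.  It is a simplified model under stated assumptions, not the
checks Clang actually performs: the ``Metadata`` class, its field names, and
the use of strict equality for language options and the predefines buffer are
all hypothetical.

.. code-block:: python

  from dataclasses import dataclass


  @dataclass(frozen=True)
  class Metadata:
      """Hypothetical stand-in for the validated parts of the metadata."""
      version_major: int
      version_minor: int
      target_triple: str
      lang_options: frozenset
      predefines: str


  def validation_errors(ast_file, compilation):
      """Return the reasons (possibly none) the AST file cannot be used."""
      errors = []
      if ast_file.version_major != compilation.version_major:
          errors.append("AST format major version differs")
      # A differing minor version is tolerated, as described above.
      if ast_file.target_triple != compilation.target_triple:
          errors.append("target triple differs")
      if ast_file.lang_options != compilation.lang_options:
          errors.append("language options differ")
      if ast_file.predefines != compilation.predefines:
          errors.append("predefines buffer differs")
      return errors


  predefines = "#define __STDC__ 1\n"
  pch = Metadata(1, 0, "i386-apple-darwin9", frozenset({"objc"}), predefines)
  tu = Metadata(1, 2, "x86_64-apple-darwin9", frozenset({"objc"}), predefines)
  print(validation_errors(pch, tu))

In this model the 32-bit PCH is rejected for the 64-bit compilation because of
the target triple alone; the newer minor version does not count against it.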

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation.  The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer.  Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location.  Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored.  The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file.  This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor.  Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro.  The macro definitions are only read
from the AST file when the name of the macro first occurs in the program.  This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit.  Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type.  Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered.  The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types.  When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits.  The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.
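
The reference encoding can be sketched in Python as follows.  This is a
minimal illustration, not code from Clang: the helper names are hypothetical,
and the assignment of individual qualifiers to particular bits is an
assumption made for the example.

.. code-block:: python

  # Assumed bit assignment for the three qualifier bits (illustrative only).
  QUAL_CONST = 0x1
  QUAL_RESTRICT = 0x2
  QUAL_VOLATILE = 0x4

  QUALIFIER_BITS = 3
  QUALIFIER_MASK = (1 << QUALIFIER_BITS) - 1


  def encode_type_ref(type_id, qualifiers=0):
      """Pack a type ID and its qualifier bits into one integer."""
      if not 0 <= qualifiers <= QUALIFIER_MASK:
          raise ValueError("qualifiers must fit in three bits")
      return (type_id << QUALIFIER_BITS) | qualifiers


  def decode_type_ref(ref):
      """Split a packed reference back into (type ID, qualifier bits)."""
      return ref >> QUALIFIER_BITS, ref & QUALIFIER_MASK


  ref = encode_type_ref(42, QUAL_CONST | QUAL_VOLATILE)
  type_id, quals = decode_type_ref(ref)
  print(ref, type_id, bool(quals & QUAL_CONST), bool(quals & QUAL_RESTRICT))

Because the qualifiers travel with the reference rather than with the type
record, a ``const`` use of a type and an unqualified use of it can share one
serialized type record.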

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit.  Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file.  When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node.  As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file.  In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically.  At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node.  Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on.  Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context.  Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit.  When required, the declarations
stored within a declaration context will be deserialized.  There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name.  The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized.  For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  The code generator and semantic analysis do commonly traverse declaration
  contexts for structs, classes, unions, and enumerations, but those contexts
  contain relatively few declarations in the common case.
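
The difference between the two access paths can be sketched in Python.  The
class below is a hypothetical model, not Clang's ``DeclContext``: a plain
dictionary stands in for the on-disk hash table, and ``read_decl`` is an
assumed callback that deserializes one declaration by ID.

.. code-block:: python

  class LazyDeclContext:
      """Toy declaration context backed by serialized declarations."""

      def __init__(self, lookup_table, lexical_ids, read_decl):
          self.lookup_table = lookup_table  # name -> list of declaration IDs
          self.lexical_ids = lexical_ids    # every declaration ID, in order
          self.read_decl = read_decl        # deserializes one declaration
          self.cache = {}

      def _load(self, decl_id):
          if decl_id not in self.cache:
              self.cache[decl_id] = self.read_decl(decl_id)
          return self.cache[decl_id]

      def lookup(self, name):
          # Name lookup touches only the declarations with that name.
          return [self._load(i) for i in self.lookup_table.get(name, [])]

      def decls(self):
          # Iteration deserializes everything in the context.
          return [self._load(i) for i in self.lexical_ids]


  on_disk = {10: "FieldDecl x", 11: "FieldDecl y", 12: "FieldDecl z"}
  reads = []

  def read_decl(decl_id):
      reads.append(decl_id)
      return on_disk[decl_id]

  point = LazyDeclContext({"x": [10], "y": [11], "z": [12]},
                          [10, 11, 12], read_decl)
  found = point.lookup("x")        # deserializes declaration 10 only
  reads_after_lookup = list(reads)
  fields = point.decls()           # deserializes the remaining declarations

After the lookup, ``reads`` holds a single ID; after the iteration it holds all
three, each read exactly once.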

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration.  The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression.  For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression.  Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size).  Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those expressions, using a form
of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_.  For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
|       ``STOP``        |
+-----------------------+

When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression onto a stack.  When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack.  The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.
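
The stack discipline can be sketched in Python for the ``3 - 4 + 5`` example.
This is an illustrative model only: the record tuples, the nested-tuple tree,
and the choice that the first operand popped is the left-hand side (which
follows from storing subexpressions in reverse order) are assumptions of the
sketch, not a description of Clang's reader.

.. code-block:: python

  def read_expression(records):
      """Rebuild one expression tree from a stream of (kind, value) records."""
      stack = []
      for kind, value in records:
          if kind == "STOP":
              break  # records after STOP belong to a different expression
          if kind == "IntegerLiteral":
              stack.append(value)
          elif kind == "BinaryOperator":
              lhs = stack.pop()  # operands were stored in reverse order
              rhs = stack.pop()
              stack.append((value, lhs, rhs))
          else:
              raise ValueError("unknown record kind: %r" % (kind,))
      if len(stack) != 1:
          raise ValueError("malformed expression stream")
      return stack[0]


  def evaluate(node):
      """Evaluate a tree produced by read_expression (only + and -)."""
      if isinstance(node, int):
          return node
      op, lhs, rhs = node
      left, right = evaluate(lhs), evaluate(rhs)
      return left - right if op == "-" else left + right


  records = [
      ("IntegerLiteral", 5),
      ("IntegerLiteral", 4),
      ("IntegerLiteral", 3),
      ("BinaryOperator", "-"),
      ("BinaryOperator", "+"),
      ("STOP", None),
      ("IntegerLiteral", 99),  # start of the next expression; not read here
  ]
  tree = read_expression(records)
  print(tree, evaluate(tree))

The resulting tree groups the operands as ``(3 - 4) + 5``, matching the
left-to-right associativity of the source expression.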

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure).  The
serialized representation contains:

* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source.  Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table.  If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.

Within the AST file, the identifiers used to name declarations are represented
with an integral value.  A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored.  This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file.  The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file.  A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.

AST Reader Integration Points
-----------------------------

The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang.  For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules.  The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures.  ``ASTReader`` implements the
following abstract classes:

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen.  In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever the abstract syntax tree nodes need to be loaded from the AST
  file.  It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.

.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user.  To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
--- called a precompiled *preamble* --- is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``.  When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes).  However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header.  This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies.  A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it).  This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities --- identifiers, declarations, types, etc.
  --- have ID numbers that start at 1 or some other predefined constant and
  grow upward.  Each precompiled header records the maximum ID number it has
  assigned in each category.  Then, when a new precompiled header is generated
  that depends on (chains to) another precompiled header, it will start
  counting at the next available ID number.  This way, one can determine, given
  an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based.  This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most-recent precompiled header.  If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header.  For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class.  Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.
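
The ID numbering and name lookup rules for a chain can be sketched in Python.
The model below is hypothetical throughout: ``ChainedFile``, ``owner_of``,
``lookup``, the file names, and the ID ranges are invented for illustration
and do not correspond to Clang data structures.

.. code-block:: python

  import bisect


  class ChainedFile:
      """One precompiled header in a chain (toy model)."""

      def __init__(self, name, first_id, identifiers):
          self.name = name
          self.first_id = first_id        # first ID this file assigned
          self.identifiers = identifiers  # only entries new or changed here


  def owner_of(chain, entity_id):
      """Find the file whose ID range holds entity_id (chain: oldest first)."""
      starts = [f.first_id for f in chain]
      index = bisect.bisect_right(starts, entity_id) - 1
      if index < 0:
          raise KeyError(entity_id)
      return chain[index]


  def lookup(chain, name):
      """Search from the most recent file back; the first hit is definitive."""
      for f in reversed(chain):
          if name in f.identifiers:
              return f.identifiers[name]
      return None


  # common.pch assigned IDs 1..100, so preamble.pch starts counting at 101.
  common = ChainedFile("common.pch", 1,
                       {"BUFSIZE": "macro 512", "size_t": "typedef"})
  preamble = ChainedFile("preamble.pch", 101, {"BUFSIZE": "undefined"})
  chain = [common, preamble]

  print(owner_of(chain, 57).name, owner_of(chain, 101).name)
  print(lookup(chain, "BUFSIZE"), lookup(chain, "size_t"))

Here the later file overrides ``BUFSIZE`` (as an ``#undef`` would), while
``size_t`` is still found by falling back to the earlier file.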

.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files.  All of the same techniques used to make chained
precompiled headers work --- ID numbering, name lookup, update records --- are
shared with modules.  However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules.  To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities.  The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a 1-1 mapping
  between entities (in whatever AST file they inhabit) and global ID numbers.
  If that translation unit is then serialized into an AST file, this mapping
  will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places.  For example, two different
  headers can have the declaration of ``printf`` or could forward-declare
  ``struct stat``.  If each of those headers is included in a module, and some
  third party imports both of those modules, there is a potentially serious
  problem: name lookup for ``printf`` or ``struct stat`` will find both
  declarations, but the AST nodes are unrelated.  This would result in a
  compilation error, due to an ambiguity in name lookup.  Therefore, the AST
  reader performs declaration merging according to the appropriate language
  semantics, ensuring that the two disjoint declarations are merged into a
  single redeclaration chain (with a common canonical declaration), so that it
  is as if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients.  The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup.  When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.
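
The local-to-global ID mapping can be sketched in Python.  This is a deliberately
simplified model, not Clang's scheme: ``ModuleFile`` and ``Reader`` are
hypothetical, each module is assumed to number its imports' entities first and
its own entities afterwards, and transitive imports are ignored.

.. code-block:: python

  class ModuleFile:
      """Toy module file with its own local ID numbering."""

      def __init__(self, name, num_own, imports=()):
          self.name = name
          self.num_own = num_own
          self.ranges = []  # (first local ID, count, owning module name)
          next_local = 1
          for imported in imports:
              self.ranges.append((next_local, imported.num_own, imported.name))
              next_local += imported.num_own
          self.ranges.append((next_local, num_own, name))


  class Reader:
      """Assigns each loaded module a slice of one global ID space."""

      def __init__(self):
          self.global_base = {}  # module name -> global ID of its first entity
          self.next_global = 1

      def load(self, module):
          if module.name not in self.global_base:  # shared modules load once
              self.global_base[module.name] = self.next_global
              self.next_global += module.num_own

      def to_global(self, module, local_id):
          for first, count, owner in module.ranges:
              if first <= local_id < first + count:
                  return self.global_base[owner] + (local_id - first)
          raise KeyError(local_id)


  std = ModuleFile("std", 3)
  other = ModuleFile("other", 2)
  a = ModuleFile("a", 2, imports=[std])         # std is local 1..3 in a
  b = ModuleFile("b", 4, imports=[other, std])  # std is local 3..5 in b

  reader = Reader()
  for module in (std, other, a, b):
      reader.load(module)

  # The second entity of std has different local IDs in a and b ...
  print(reader.to_global(a, 2), reader.to_global(b, 4))
  # ... while the first own entity of each module gets a distinct global ID.
  print(reader.to_global(a, 4), reader.to_global(b, 6))

Both references to the shared entity resolve to the same global ID, which is
the 1-1 property the text describes.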