================================
Source Level Debugging with LLVM
================================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM.  It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information.  Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here.  The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler.  No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats.  This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects.  The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information.  As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process.  This meta information gives an LLVM user
a mapping between the generated code and the original program source code.

Currently, debug information is consumed by DwarfDebug to produce DWARF
information used by the gdb debugger.  Other targets could use the same
information to produce stabs or other debug forms.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

TODO - expound a bit more.

.. _intro_debugopt:

Debugging optimized code
------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analyses.  In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run, and without any modification to the
  optimizations themselves.  However, some optimizations may impact the
  ability to modify the current state of the program with a debugger, such
  as setting program variables, or calling functions that have been
  deleted.

* As desired, LLVM optimizations can be upgraded to be aware of the LLVM
  debugging information, allowing them to update the debugging information
  as they perform aggressive optimizations.  This means that, with effort,
  the LLVM optimizers could optimize debug code just as well as non-debug
  code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc.).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities.  For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger.  Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions that were optimized out of the program, or inlined away
completely.

The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
the optimizer's handling of debugging information.  It can be run like this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This will test the impact of debugging information on optimization passes.  If
debugging information influences optimization passes then it will be reported
as a failure.  See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information.  In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g. DWARF/Stabs/etc).  It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc: this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum.  The only common features that the LLVM debugger assumes
exist are :ref:`source files <format_files>`, and :ref:`program objects
<format_global_variables>`.  These abstract objects are used by a debugger to
form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language.  :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors
-----------------------------

In consideration of the complexity and volume of debug information, LLVM
provides a specification for well-formed debug descriptors.

Consumers of LLVM debug information expect the descriptors for program objects
to start in a canonical format, but the descriptors can include additional
information appended at the end that is source-language specific.  All debugging
information objects start with a tag to indicate what type of object they are.
The source-language is allowed to define its own objects, by using unreserved
tag numbers.  We recommend using tags in the range 0x1000 through 0x2000
(there is a defined ``enum DW_TAG_user_base = 0x1000``.)

The fields of debug descriptors used internally by LLVM are restricted to only
the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
``mdnode``.

.. code-block:: llvm

  !1 = metadata !{
    i32,   ;; A tag
    ...
  }

The first field of a descriptor is always an ``i32`` containing a tag value
identifying the content of the descriptor.  The remaining fields are specific
to the descriptor.  The values of tags are loosely bound to the tag values of
DWARF information entries.  However, that does not restrict the use of the
information supplied to DWARF targets.
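
The tag fields in the ``foo`` example later in this document show how tags are
stored in practice: the compile unit has ``i32 786449`` and the subprogram
``i32 786478``, each exactly 786432 (``0xC0000``) above its DWARF tag (17 and
46).  The C sketch below assumes, from those values alone, that the low 16 bits
of the field hold the DWARF tag and the upper bits hold a version marker.  The
helper names are hypothetical and are not part of LLVM's API.

.. code-block:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Recommended start of the range for language-specific tags. */
  #define DW_TAG_user_base 0x1000u

  /* Assumption: the DWARF tag occupies the low 16 bits of the field. */
  uint32_t dwarf_tag_of(uint32_t tag_field) {
      return tag_field & 0xFFFFu;
  }

  /* Assumption: the remaining upper bits carry a debug-info version marker. */
  uint32_t debug_version_of(uint32_t tag_field) {
      return tag_field >> 16;
  }

  /* True if a DWARF tag lies in the recommended 0x1000 through 0x2000 range. */
  bool is_user_tag(uint32_t dwarf_tag) {
      return dwarf_tag >= DW_TAG_user_base && dwarf_tag <= 0x2000u;
  }

Under these assumptions, ``dwarf_tag_of(786473)`` yields 41
(``DW_TAG_file_type``, as in descriptor ``!5`` of the example), and
``debug_version_of`` yields 12 for every tag field in that example.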

The details of the various descriptors follow.

Compile unit descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    i32,       ;; Tag = 17 (DW_TAG_compile_unit)
    metadata,  ;; Source directory (including trailing slash) & file pair
    i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
    metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    i1,        ;; True if this is optimized.
    metadata,  ;; Flags
    i32,       ;; Runtime version
    metadata,  ;; List of enum types
    metadata,  ;; List of retained types
    metadata,  ;; List of subprograms
    metadata,  ;; List of global variables
    metadata,  ;; List of imported entities
    metadata   ;; Split debug filename
  }

These descriptors contain a source language ID for the file (we use the DWARF
3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
``DW_LANG_Cobol74``, etc.), a reference to a metadata node containing a pair of
strings for the source file name and the working directory, as well as an
identifier string for the compiler that produced it.

Compile unit descriptors provide the root context for objects declared in a
specific compilation unit.  File descriptors are defined using this context.
These descriptors are collected by the named metadata ``!llvm.dbg.cu``.  They
keep track of subprograms, global variables, type information, and imported
entities (declarations and namespaces).
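
The language field holds one of the DWARF language codes; in the ``foo``
example later in this document it is ``i32 12``, annotated as
``DW_LANG_C99``.  A consumer could map the code back to a name with a small
lookup such as the sketch below.  The numeric values are those assigned by the
DWARF standard for the languages named in this section; the function
``dw_lang_name`` is hypothetical, and the table deliberately covers only a few
languages.

.. code-block:: c

  #include <stddef.h>

  /* DWARF language codes for the languages mentioned in this section. */
  enum {
      DW_LANG_C89         = 0x0001,
      DW_LANG_C           = 0x0002,
      DW_LANG_C_plus_plus = 0x0004,
      DW_LANG_Cobol74     = 0x0005,
      DW_LANG_C99         = 0x000c
  };

  /* Hypothetical helper: name of a compile unit's language code, or NULL. */
  const char *dw_lang_name(unsigned lang) {
      switch (lang) {
      case DW_LANG_C89:         return "DW_LANG_C89";
      case DW_LANG_C:           return "DW_LANG_C";
      case DW_LANG_C_plus_plus: return "DW_LANG_C_plus_plus";
      case DW_LANG_Cobol74:     return "DW_LANG_Cobol74";
      case DW_LANG_C99:         return "DW_LANG_C99";
      default:                  return NULL; /* not in this small table */
      }
  }

With this table, ``dw_lang_name(12)`` returns ``"DW_LANG_C99"``, matching the
annotation on the example compile unit.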

.. _format_files:

File descriptors
^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    i32,      ;; Tag = 41 (DW_TAG_file_type)
    metadata  ;; Source directory (including trailing slash) & file pair
  }

These descriptors contain information for a file.  Global variables and
top-level functions would be defined using this context.  File descriptors also
provide context for source line correspondence.

Each input file is encoded as a separate file descriptor in LLVM debugging
information output.

.. _format_global_variables:

Global variable descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !1 = metadata !{
    i32,      ;; Tag = 52 (DW_TAG_variable)
    i32,      ;; Unused field.
    metadata, ;; Reference to context descriptor
    metadata, ;; Name
    metadata, ;; Display name (fully qualified C++ name)
    metadata, ;; MIPS linkage name (for C++)
    metadata, ;; Reference to file where defined
    i32,      ;; Line number where defined
    metadata, ;; Reference to type descriptor
    i1,       ;; True if the global is local to compile unit (static)
    i1,       ;; True if the global is defined in the compile unit (not extern)
    {}*,      ;; Reference to the global variable
    metadata  ;; The static member declaration, if any
  }

These descriptors provide debug information about global variables.  They
provide details such as name, type and where the variable is defined.  All
global variables are collected in the compile unit's list of global variables,
which is reachable from the named metadata ``!llvm.dbg.cu``.
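
The two ``i1`` flags follow from how a C global is declared.  As an
illustration only, the sketch below derives them from a declaration's storage
class.  The ``enum storage`` type and ``global_flags_for`` function are
hypothetical, and the mapping (``static`` means local to the compile unit,
``extern`` means not defined in it) is read from the field comments above
rather than taken from an actual front-end.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical storage classes for a file-scope C variable. */
  enum storage { STORAGE_NONE, STORAGE_STATIC, STORAGE_EXTERN };

  /* The two i1 fields of a global variable descriptor. */
  struct global_flags {
      bool is_local;      /* local to the compile unit (static) */
      bool is_definition; /* defined in this compile unit (not extern) */
  };

  /* Derive both flags from the storage class of the declaration. */
  struct global_flags global_flags_for(enum storage sc) {
      struct global_flags f;
      f.is_local = (sc == STORAGE_STATIC);
      f.is_definition = (sc != STORAGE_EXTERN);
      return f;
  }

So ``static int x = 1;`` would be described as local and defined, while
``extern int y;`` would be neither.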

.. _format_subprograms:

Subprogram descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32,      ;; Tag = 46 (DW_TAG_subprogram)
    metadata, ;; Source directory (including trailing slash) & file pair
    metadata, ;; Reference to context descriptor
    metadata, ;; Name
    metadata, ;; Display name (fully qualified C++ name)
    metadata, ;; MIPS linkage name (for C++)
    i32,      ;; Line number where defined
    metadata, ;; Reference to type descriptor
    i1,       ;; True if the global is local to compile unit (static)
    i1,       ;; True if the global is defined in the compile unit (not extern)
    i32,      ;; Virtuality, e.g. dwarf::DW_VIRTUALITY_virtual
    i32,      ;; Index into the virtual function table
    metadata, ;; indicates which base type contains the vtable pointer for the
              ;; derived class
    i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
    i1,       ;; isOptimized
    {}*,      ;; Reference to the LLVM function
    metadata, ;; Lists function template parameters
    metadata, ;; Function declaration descriptor
    metadata, ;; List of function variables
    i32       ;; Line number where the scope of the subprogram begins
  }

These descriptors provide debug information about functions, methods and
subprograms.  They provide details such as name, return types and the source
location where the subprogram is defined.

Block descriptors
^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !3 = metadata !{
    i32,      ;; Tag = 11 (DW_TAG_lexical_block)
    metadata, ;; Source directory (including trailing slash) & file pair
    metadata, ;; Reference to context descriptor
    i32,      ;; Line number
    i32,      ;; Column number
    i32,      ;; DWARF path discriminator value
    i32       ;; Unique ID to identify blocks from a template function
  }

This descriptor provides debug information about nested blocks within a
subprogram.  The line number and column numbers are used to distinguish two
lexical blocks at the same depth.

.. code-block:: llvm

  !3 = metadata !{
    i32,      ;; Tag = 11 (DW_TAG_lexical_block)
    metadata, ;; Source directory (including trailing slash) & file pair
    metadata  ;; Reference to the scope we're annotating with a file change
  }

This descriptor provides a wrapper around a lexical scope to handle file
changes in the middle of a lexical block.

.. _format_basic_type:

Basic type descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !4 = metadata !{
    i32,      ;; Tag = 36 (DW_TAG_base_type)
    metadata, ;; Source directory (including trailing slash) & file pair (may be null)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags
    i32       ;; DWARF type encoding
  }

These descriptors define primitive types used in the code, for example
``int``, ``bool`` and ``float``.  The context provides the scope of the type,
which is usually the top level.  Since basic types are not usually
user-defined, the context and line number can be left as NULL and 0.  The
size, alignment and offset are expressed in bits and can be 64-bit values.
The alignment is used to round the offset when embedded in a :ref:`composite
type <format_composite_type>` (for example, to keep doubles on 64-bit
boundaries).  The offset is the bit offset if embedded in a :ref:`composite
type <format_composite_type>`.

The type encoding provides the details of the type.  The values are typically
one of the following:

.. code-block:: llvm

  DW_ATE_address       = 1
  DW_ATE_boolean       = 2
  DW_ATE_float         = 4
  DW_ATE_signed        = 5
  DW_ATE_signed_char   = 6
  DW_ATE_unsigned      = 7
  DW_ATE_unsigned_char = 8
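
To illustrate how a front-end might fill in the encoding field, the sketch
below maps a few C scalar types to the ``DW_ATE_*`` values listed above.  The
``c_scalar`` enumeration and ``encoding_for`` function are hypothetical, and
plain ``char`` is assumed to be signed (its signedness is
implementation-defined in C).

.. code-block:: c

  /* DWARF base type encodings, as listed above. */
  enum dw_ate {
      DW_ATE_address       = 1,
      DW_ATE_boolean       = 2,
      DW_ATE_float         = 4,
      DW_ATE_signed        = 5,
      DW_ATE_signed_char   = 6,
      DW_ATE_unsigned      = 7,
      DW_ATE_unsigned_char = 8
  };

  /* Hypothetical set of C scalar types a front-end might describe. */
  enum c_scalar {
      C_BOOL, C_CHAR, C_UNSIGNED_CHAR, C_INT, C_UNSIGNED_INT, C_FLOAT, C_DOUBLE
  };

  /* Pick the encoding for a basic type descriptor. */
  enum dw_ate encoding_for(enum c_scalar t) {
      switch (t) {
      case C_BOOL:          return DW_ATE_boolean;
      case C_CHAR:          return DW_ATE_signed_char; /* assumes signed char */
      case C_UNSIGNED_CHAR: return DW_ATE_unsigned_char;
      case C_INT:           return DW_ATE_signed;
      case C_UNSIGNED_INT:  return DW_ATE_unsigned;
      case C_FLOAT:
      case C_DOUBLE:        return DW_ATE_float;
      }
      return DW_ATE_signed; /* not reached for valid input */
  }

Both ``float`` and ``double`` share ``DW_ATE_float``; the descriptor's size
field is what tells them apart.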

.. _format_derived_type:

Derived type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !5 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Source directory (including trailing slash) & file pair (may be null)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags to encode attributes, e.g. private
    metadata, ;; Reference to type derived from
    metadata, ;; (optional) Name of the Objective-C property associated with
              ;; an Objective-C ivar, or the type whose members this
              ;; pointer-to-member points to.
    metadata, ;; (optional) Name of the Objective-C property getter selector.
    metadata, ;; (optional) Name of the Objective-C property setter selector.
    i32       ;; (optional) Objective-C property attributes.
  }

These descriptors are used to define types derived from other types.  The value
of the tag varies depending on the meaning.  The following are possible tag
values:

.. code-block:: llvm

  DW_TAG_formal_parameter   = 5
  DW_TAG_member             = 13
  DW_TAG_pointer_type       = 15
  DW_TAG_reference_type     = 16
  DW_TAG_typedef            = 22
  DW_TAG_ptr_to_member_type = 31
  DW_TAG_const_type         = 38
  DW_TAG_volatile_type      = 53
  DW_TAG_restrict_type      = 55

``DW_TAG_member`` is used to define a member of a :ref:`composite type
<format_composite_type>` or :ref:`subprogram <format_subprograms>`.  The
member's type is the one referenced by the "type derived from" field of its
:ref:`derived type <format_derived_type>` descriptor.
``DW_TAG_formal_parameter`` is used to define a member which is a formal
argument of a subprogram.

``DW_TAG_typedef`` is used to provide a name for the derived type.

``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
:ref:`derived type <format_derived_type>`.

:ref:`Derived type <format_derived_type>` location can be determined from the
context and line number.  The size, alignment and offset are expressed in bits
and can be 64-bit values.  The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (for example, to
keep doubles on 64-bit boundaries).  The offset is the bit offset if embedded
in a :ref:`composite type <format_composite_type>`.

Note that the ``void *`` type is expressed as a type derived from NULL.
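
The chaining through the "type derived from" field can be illustrated with a
deliberately tiny model.  The sketch below uses a hypothetical C
``struct type`` (not LLVM's API) with one basic kind and two derived kinds, and
spells ``const int *`` by walking from the pointer to the const qualifier to
``int``; a NULL base stands for ``void``, as in the note above.  Qualifiers are
written after what they qualify (``int const *``), so that a const pointer also
comes out unambiguous (``int * const``).

.. code-block:: c

  #include <stddef.h>
  #include <stdio.h>

  /* Hypothetical, simplified model of a type descriptor (not LLVM's API). */
  enum kind { TY_BASIC, TY_POINTER, TY_CONST };

  struct type {
      enum kind kind;
      const char *name;        /* used only for TY_BASIC */
      const struct type *base; /* "type derived from"; NULL means void */
  };

  /* Write a C spelling of t into buf by recursing through the base types. */
  void spell(const struct type *t, char *buf, size_t n) {
      char inner[128];
      if (t == NULL) {
          snprintf(buf, n, "void");
          return;
      }
      switch (t->kind) {
      case TY_BASIC:
          snprintf(buf, n, "%s", t->name);
          break;
      case TY_CONST:
          spell(t->base, inner, sizeof inner);
          snprintf(buf, n, "%s const", inner);
          break;
      case TY_POINTER:
          spell(t->base, inner, sizeof inner);
          snprintf(buf, n, "%s *", inner);
          break;
      }
  }

A pointer whose base is NULL is spelled ``void *``, which is exactly the
convention described above.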

.. _format_composite_type:

Composite type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Source directory (including trailing slash) & file pair (may be null)
    metadata, ;; Reference to context
    metadata, ;; Name (may be "" for anonymous types)
    i32,      ;; Line number where defined (may be 0)
    i64,      ;; Size in bits
    i64,      ;; Alignment in bits
    i64,      ;; Offset in bits
    i32,      ;; Flags
    metadata, ;; Reference to type derived from
    metadata, ;; Reference to array of member descriptors
    i32,      ;; Runtime languages
    metadata, ;; Base type containing the vtable pointer for this type
    metadata, ;; Template parameters
    metadata  ;; A unique identifier for type uniquing purpose (may be null)
  }

These descriptors are used to define types that are composed of 0 or more
elements.  The value of the tag varies depending on the meaning.  The following
are possible tag values:

.. code-block:: llvm

  DW_TAG_array_type       = 1
  DW_TAG_enumeration_type = 4
  DW_TAG_structure_type   = 19
  DW_TAG_union_type       = 23
  DW_TAG_subroutine_type  = 21
  DW_TAG_inheritance      = 28

The vector flag indicates that an array type is a native packed vector.

The members of array types (tag = ``DW_TAG_array_type``) are
:ref:`subrange descriptors <format_subrange>`, each
representing the range of subscripts at that level of indexing.

The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
:ref:`enumerator descriptors <format_enumerator>`, each representing the
definition of an enumeration value for the set.  All enumeration type
descriptors are collected in the compile unit's list of enum types, which is
reachable from the named metadata ``!llvm.dbg.cu``.

The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
``DW_TAG_union_type``) types are any one of the :ref:`basic
<format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
<format_composite_type>` type descriptors, each representing a field member of
the structure or union.

For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
information about base classes, static members and member functions.  If a
member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
of ``DW_TAG_inheritance``, then the type represents a base class.  If the member
is a :ref:`global variable descriptor <format_global_variables>` then it
represents a static member.  And, if the member is a :ref:`subprogram
descriptor <format_subprograms>` then it represents a member function.  For
static members and member functions, ``getName()`` returns the member's link
name or the C++ mangled name, and ``getDisplayName()`` returns a simplified
version of the name.

The first element of a subroutine type (tag = ``DW_TAG_subroutine_type``) is
the return type for the subroutine.  The remaining elements are the formal
arguments to the subroutine.
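
Because the first element of a subroutine type is the return type and the rest
are the formal arguments, a consumer can rebuild a C-style prototype by
treating element 0 specially.  The sketch below assumes the elements have
already been resolved to type names.  The function ``signature`` is
hypothetical, the caller must supply a large enough buffer, and a NULL return
element is assumed to stand for ``void``, by analogy with the ``void *``
convention for derived types.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* elems[0] is the return type (NULL meaning void); elems[1..n-1] are the
   * formal argument types.  buf must be large enough for the result. */
  void signature(const char *const *elems, size_t n, const char *name,
                 char *buf) {
      strcpy(buf, (n > 0 && elems[0] != NULL) ? elems[0] : "void");
      strcat(buf, " ");
      strcat(buf, name);
      strcat(buf, "(");
      if (n <= 1)
          strcat(buf, "void"); /* no formal arguments */
      for (size_t i = 1; i < n; ++i) {
          if (i > 1)
              strcat(buf, ", ");
          strcat(buf, elems[i]);
      }
      strcat(buf, ")");
  }

With the elements ``{"int", "char *", "long"}`` and the name ``f``, this
produces ``int f(char *, long)``.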

:ref:`Composite type <format_composite_type>` location can be determined from
the context and line number.  The size, alignment and offset are expressed in
bits and can be 64-bit values.  The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (for example, to
keep doubles on 64-bit boundaries).  The offset is the bit offset if
embedded in a :ref:`composite type <format_composite_type>`.
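
The rounding described above is ordinary round-up-to-a-multiple arithmetic on
bit quantities.  As a minimal sketch, assuming alignments are nonzero bit
counts: a member with 64-bit alignment that follows a 32-bit member at offset 0
is placed at bit offset 64 rather than 32.  The function names are
hypothetical.

.. code-block:: c

  #include <stdint.h>

  /* Round bit_offset up to the next multiple of align_bits (align_bits > 0). */
  uint64_t align_bit_offset(uint64_t bit_offset, uint64_t align_bits) {
      return (bit_offset + align_bits - 1) / align_bits * align_bits;
  }

  /* Place a member right after the previous one, respecting its alignment. */
  uint64_t next_member_offset(uint64_t prev_offset, uint64_t prev_size,
                              uint64_t align_bits) {
      return align_bit_offset(prev_offset + prev_size, align_bits);
  }

For instance, ``next_member_offset(0, 32, 64)`` returns 64, matching the case
described above.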

.. _format_subrange:

Subrange descriptors
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !42 = metadata !{
    i32,      ;; Tag = 33 (DW_TAG_subrange_type)
    i64,      ;; Low value
    i64       ;; High value
  }

These descriptors are used to define ranges of array subscripts for an array
:ref:`composite type <format_composite_type>`.  The low value defines the lower
bound, typically zero for C/C++.  The high value is the upper bound.  Values
are 64-bit.  ``High - Low + 1`` is the size of the array.  If ``Low > High``
the array bounds are not included in generated debugging information.
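
The size rule translates directly into code.  The sketch below, with
hypothetical names, returns 0 for ``Low > High`` to mirror the case where the
bounds are not emitted, and multiplies the per-dimension counts for a
multi-dimensional array, since each subrange describes one level of indexing.
For C's ``int a[10]``, the single subrange would have Low = 0 and High = 9,
giving 10 elements.  Overflow for extreme 64-bit bounds is not handled.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  struct subrange { int64_t low, high; };

  /* Elements covered by one subrange: High - Low + 1, or 0 if Low > High. */
  uint64_t subrange_count(struct subrange r) {
      if (r.low > r.high)
          return 0; /* bounds are not emitted in this case */
      return (uint64_t)(r.high - r.low) + 1;
  }

  /* Total elements of an array with one subrange per level of indexing. */
  uint64_t array_element_count(const struct subrange *dims, size_t n) {
      uint64_t total = 1;
      for (size_t i = 0; i < n; ++i)
          total *= subrange_count(dims[i]);
      return total;
  }

A C array ``int m[3][4]`` would have the subranges ``{0, 2}`` and ``{0, 3}``,
for 12 elements in total.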

.. _format_enumerator:

Enumerator descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    i32,      ;; Tag = 40 (DW_TAG_enumerator)
    metadata, ;; Name
    i64       ;; Value
  }

These descriptors are used to define members of an enumeration :ref:`composite
type <format_composite_type>`; each one associates a name with a value.

Local variables
^^^^^^^^^^^^^^^

.. code-block:: llvm

  !7 = metadata !{
    i32,      ;; Tag (see below)
    metadata, ;; Context
    metadata, ;; Name
    metadata, ;; Reference to file where defined
    i32,      ;; 24 bit - Line number where defined
              ;; 8 bit - Argument number. 1 indicates 1st argument.
    metadata, ;; Reference to the type descriptor
    i32,      ;; flags
    metadata, ;; (optional) Reference to inline location
    metadata  ;; (optional) Reference to a complex expression (see below)
  }

These descriptors are used to define variables local to a subprogram.  The
value of the tag depends on the usage of the variable:

.. code-block:: llvm

  DW_TAG_auto_variable   = 256
  DW_TAG_arg_variable    = 257

An auto variable is any variable declared in the body of the function.  An
argument variable is any variable that appears as a formal argument to the
function.

The context is either the subprogram or block where the variable is defined.
Name is the source variable name.  Context and line indicate where the variable
was defined.  Type descriptor defines the declared type of the variable.
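
The line number and argument number share one ``i32`` in the descriptor above.
The sketch below assumes the argument number occupies the high 8 bits and the
line number the low 24 bits, which is one reading of the field comment rather
than a confirmed layout.  The helper names are hypothetical, and using an
argument number of 0 for non-argument (auto) variables is also an assumption.

.. code-block:: c

  #include <stdint.h>

  /* Assumed layout: bits 24-31 = argument number, bits 0-23 = line number. */
  #define LINE_BITS 24u
  #define LINE_MASK ((1u << LINE_BITS) - 1u)

  /* Combine a line number and a 1-based argument number (0 = not an arg). */
  uint32_t pack_line_arg(uint32_t line, uint32_t arg_no) {
      return (arg_no << LINE_BITS) | (line & LINE_MASK);
  }

  uint32_t unpack_line(uint32_t field)   { return field & LINE_MASK; }
  uint32_t unpack_arg_no(uint32_t field) { return field >> LINE_BITS; }

For example, a second argument declared on line 42 packs to
``(2u << 24) | 42u``, from which ``unpack_line`` recovers 42 and
``unpack_arg_no`` recovers 2.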

.. _format_common_intrinsics:

Debugger intrinsic functions
----------------------------

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
provide debug information at various points in generated code.

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  declare void @llvm.dbg.declare(metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the alloca for the variable.  The second
argument is metadata containing a description of the variable.

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  declare void @llvm.dbg.value(metadata, i64, metadata)

This intrinsic provides information when a user source variable is set to a new
value.  The first argument is the new value (wrapped as metadata).  The second
argument is the offset in the user source variable where the new value is
written.  The third argument is metadata containing a description of the user
source variable.

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function.  In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in.  In functional languages, values are only
readable after they have been defined.  Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information.  Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X;
  7.    }
  8.    X = Y;
  9.  }

Compiled to LLVM, this function would be represented like this:

.. code-block:: llvm

  define void @foo() #0 {
  entry:
    %X = alloca i32, align 4
    %Y = alloca i32, align 4
    %Z = alloca i32, align 4
    call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
      ; [debug line = 2:7] [debug variable = X]
    store i32 21, i32* %X, align 4, !dbg !12
    call void @llvm.dbg.declare(metadata !{i32* %Y}, metadata !13), !dbg !14
      ; [debug line = 3:7] [debug variable = Y]
    store i32 22, i32* %Y, align 4, !dbg !14
    call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
      ; [debug line = 5:9] [debug variable = Z]
    store i32 23, i32* %Z, align 4, !dbg !17
    %0 = load i32* %X, align 4, !dbg !18
      ; [debug line = 6:5]
    store i32 %0, i32* %Z, align 4, !dbg !18
    %1 = load i32* %Y, align 4, !dbg !19
      ; [debug line = 8:3]
    store i32 %1, i32* %X, align 4, !dbg !19
    ret void, !dbg !20
  }

  ; Function Attrs: nounwind readnone
  declare void @llvm.dbg.declare(metadata, metadata) #1

  attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false"
    "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
    "no-infs-fp-math"="false" "no-nans-fp-math"="false"
    "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
    "use-soft-float"="false" }
  attributes #1 = { nounwind readnone }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!8}
  !llvm.ident = !{!9}

  !0 = metadata !{i32 786449, metadata !1, i32 12,
                  metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)",
                  i1 false, metadata !"", i32 0, metadata !2, metadata !2, metadata !3,
                  metadata !2, metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] \
                    [/private/tmp/foo.c] \
                    [DW_LANG_C99]
  !1 = metadata !{metadata !"t.c", metadata !"/private/tmp"}
  !2 = metadata !{i32 0}
  !3 = metadata !{metadata !4}
  !4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
                  metadata !"foo", metadata !"", i32 1, metadata !6,
                  i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
                  void ()* @foo, null, null, metadata !2, i32 1}
                  ; [ DW_TAG_subprogram ] [line 1] [def] [foo]
  !5 = metadata !{i32 786473, metadata !1}  ; [ DW_TAG_file_type ] \
                    [/private/tmp/t.c]
  !6 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0,
                  i64 0, i32 0, null, metadata !7, i32 0, null, null, null}
    710                   ; [ DW_TAG_subroutine_type ] \
    711                     [line 0, size 0, align 0, offset 0] [from ]
    712   !7 = metadata !{null}
    713   !8 = metadata !{i32 2, metadata !"Dwarf Version", i32 2}
    714   !9 = metadata !{metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)"}
    715   !10 = metadata !{i32 786688, metadata !4, metadata !"X", metadata !5, i32 2,
    716                    metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [X] \
    717                      [line 2]
    718   !11 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32,
    719                    i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] [int] \
    720                      [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
    721   !12 = metadata !{i32 2, i32 0, metadata !4, null}
    722   !13 = metadata !{i32 786688, metadata !4, metadata !"Y", metadata !5, i32 3,
    723                    metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Y] \
    724                      [line 3]
    725   !14 = metadata !{i32 3, i32 0, metadata !4, null}
    726   !15 = metadata !{i32 786688, metadata !16, metadata !"Z", metadata !5, i32 5,
    727                    metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Z] \
    728                      [line 5]
    729   !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0,
    730                    i32 0} \
    731                    ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
    732   !17 = metadata !{i32 5, i32 0, metadata !16, null}
    733   !18 = metadata !{i32 6, i32 0, metadata !16, null}
    734   !19 = metadata !{i32 8, i32 0, metadata !4, null} ; [ DW_TAG_imported_declaration ]
    735   !20 = metadata !{i32 9, i32 0, metadata !4, null}
    736 
This example illustrates a few important details about LLVM debugging
information.  In particular, it shows how the ``llvm.dbg.declare`` intrinsic
and the location information attached to an instruction are applied together
to allow a debugger to analyze the relationship between statements, variable
definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
    ; [debug line = 2:7] [debug variable = X]

The first call to the intrinsic ``@llvm.dbg.declare`` encodes debugging
information for the variable ``X``.  The metadata ``!dbg !12`` attached to the
intrinsic provides scope information for the variable ``X``.

.. code-block:: llvm

  !12 = metadata !{i32 2, i32 0, metadata !4, null}
  !4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
                  metadata !"foo", metadata !"", i32 1, metadata !6,
                  i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
                  void ()* @foo, null, null, metadata !2, i32 1}
                    ; [ DW_TAG_subprogram ] [line 1] [def] [foo]

Here ``!12`` is metadata providing location information.  It has four fields:
line number, column number, scope, and original scope.  The original scope
represents the inline location if this instruction has been inlined into a
caller, and is null otherwise.  In this example, the scope is encoded by
``!4``, a :ref:`subprogram descriptor <format_subprograms>`.  In this way, the
location information attached to the intrinsic indicates that the variable
``X`` is declared at line number 2, in the function-level scope of function
``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
    ; [debug line = 5:9] [debug variable = Z]

The third call to the intrinsic ``@llvm.dbg.declare`` encodes debugging
information for the variable ``Z``.  The metadata ``!dbg !17`` attached to the
intrinsic provides scope information for the variable ``Z``.

.. code-block:: llvm

  !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0,
                   i32 0}
                   ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
  !17 = metadata !{i32 5, i32 0, metadata !16, null}

Here ``!17`` indicates that ``Z`` is declared at line number 5 and column
number 0 inside lexical scope ``!16``.  The lexical scope itself resides
inside subprogram ``!4``, described above.

The scope information attached to each instruction provides a straightforward
way to find the instructions covered by a scope.
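
To make that last point concrete, here is a minimal C++ sketch (not LLVM API
code) of how a consumer could decide which locations a scope covers.  It
assumes a hypothetical, flattened copy of the example above: each scope is
reduced to its metadata number and its enclosing scope (``!16`` nested in
``!4``), and each ``!dbg`` location to its line and scope.  All names in the
sketch are illustrative.

.. code-block:: c++

  #include <array>
  #include <cstddef>

  // Hypothetical flattened view of the scope descriptors in the example:
  // each entry pairs a scope's metadata number with its enclosing scope
  // (0 means there is no enclosing scope within the function).
  struct ScopeEntry {
    int id;
    int parent;
  };

  constexpr std::array<ScopeEntry, 2> kScopes = {{
      {4, 0},   // !4:  DW_TAG_subprogram for foo
      {16, 4},  // !16: DW_TAG_lexical_block nested in foo
  }};

  // Hypothetical flattened view of the !dbg locations: source line and scope.
  struct Loc {
    int line;
    int scope;
  };

  constexpr std::array<Loc, 6> kLocs = {{
      {2, 4},   // !12: declaration of X
      {3, 4},   // !14: declaration of Y
      {5, 16},  // !17: declaration of Z
      {6, 16},  // !18: Z = X
      {8, 4},   // !19: X = Y
      {9, 4},   // !20: ret void
  }};

  // Returns the enclosing scope of `scope`, or 0 if it has none (or is unknown).
  constexpr int parentOf(int scope) {
    for (const ScopeEntry &entry : kScopes)
      if (entry.id == scope)
        return entry.parent;
    return 0;
  }

  // True if `scope` is `outer` itself or is nested, transitively, inside it.
  constexpr bool isWithin(int scope, int outer) {
    for (int s = scope; s != 0; s = parentOf(s))
      if (s == outer)
        return true;
    return false;
  }

  // Counts the distinct !dbg locations whose scope lies inside `outer`.
  constexpr std::size_t countCovered(int outer) {
    std::size_t count = 0;
    for (const Loc &loc : kLocs)
      if (isWithin(loc.scope, outer))
        ++count;
    return count;
  }

With this data, only the two locations on lines 5 and 6 fall inside the lexical
block ``!16``, while the function scope ``!4`` covers all six.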

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a format
that is effectively identical to `DWARF 3.0
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
content.  This allows code generators to trivially support native debuggers by
generating standard DWARF information, and the format contains enough
information for non-DWARF targets to translate it as needed.

This section describes the forms used to represent C and C++ programs.  Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of various C/C++ constructs and the
debug information that would best describe those constructs.
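
Each descriptor in the following sections begins with a numeric tag such as
``786449`` or ``786478``.  In this LLVM 3.4-era format, the tag field is
assumed here to be a standard DWARF tag plus a version marker of ``12 << 16``
(``786432``), which matches every tag value in these examples: ``786449`` is
``786432 + 0x11`` (``DW_TAG_compile_unit``) and ``786478`` is
``786432 + 0x2e`` (``DW_TAG_subprogram``).  The C++ sketch below decodes the
tags used in this document under that assumption.  The helper names are
hypothetical, not LLVM API, and ``0x100`` is LLVM's own
``DW_TAG_auto_variable`` pseudo-tag rather than a standard DWARF tag.

.. code-block:: c++

  #include <cstdint>
  #include <string_view>

  // Assumed encoding: tag field = DWARF tag | (12 << 16).
  constexpr std::uint32_t kDebugVersion = 12u << 16;  // 786432
  constexpr std::uint32_t kVersionMask = 0xffff0000u;

  // Strips the version marker, leaving the DWARF tag.
  constexpr std::uint32_t dwarfTag(std::uint32_t field) {
    return field & ~kVersionMask;
  }

  // Checks that the upper 16 bits hold the expected version marker.
  constexpr bool hasExpectedVersion(std::uint32_t field) {
    return (field & kVersionMask) == kDebugVersion;
  }

  // Names for the tags that appear in this document.
  constexpr std::string_view tagName(std::uint32_t field) {
    switch (dwarfTag(field)) {
    case 0x04: return "DW_TAG_enumeration_type";
    case 0x0b: return "DW_TAG_lexical_block";
    case 0x0d: return "DW_TAG_member";
    case 0x0f: return "DW_TAG_pointer_type";
    case 0x11: return "DW_TAG_compile_unit";
    case 0x13: return "DW_TAG_structure_type";
    case 0x15: return "DW_TAG_subroutine_type";
    case 0x16: return "DW_TAG_typedef";
    case 0x24: return "DW_TAG_base_type";
    case 0x26: return "DW_TAG_const_type";
    case 0x28: return "DW_TAG_enumerator";
    case 0x29: return "DW_TAG_file_type";
    case 0x2e: return "DW_TAG_subprogram";
    case 0x34: return "DW_TAG_variable";
    case 0x100: return "DW_TAG_auto_variable";  // LLVM-specific pseudo-tag
    default: return "unknown";
    }
  }

For instance, ``tagName(786473)`` yields ``DW_TAG_file_type`` and
``tagName(786688)`` yields ``DW_TAG_auto_variable``.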

C/C++ source file information
-----------------------------

Given the source files ``MySource.cpp`` and ``MyHeader.h`` located in the
directory ``/Users/mine/sources``, the following code:

.. code-block:: c

  #include "MyHeader.h"

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ...
  ;;
  ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
  ;;
  !0 = metadata !{
    i32 786449,   ;; Tag
    metadata !1,  ;; File/directory name
    i32 4,        ;; Language Id (DW_LANG_C_plus_plus)
    metadata !"clang version 3.4 ",  ;; Producer
    i1 false,     ;; Optimized compile unit
    metadata !"", ;; Compiler flags
    i32 0,        ;; Runtime version
    metadata !2,  ;; Enumeration types
    metadata !2,  ;; Retained types
    metadata !3,  ;; Subprograms
    metadata !2,  ;; Global variables
    metadata !2,  ;; Imported entities (declarations and namespaces)
    metadata !""  ;; Split debug filename
  }

  ;;
  ;; Define the file descriptors for "/Users/mine/sources/MySource.cpp".
  ;;
  !1 = metadata !{
    metadata !"MySource.cpp",
    metadata !"/Users/mine/sources"
  }
  !5 = metadata !{
    i32 786473, ;; Tag
    metadata !1
  }

  ;;
  ;; Define the file descriptors for "/Users/mine/sources/MyHeader.h".
  ;;
  !14 = metadata !{
    i32 786473, ;; Tag
    metadata !15
  }
  !15 = metadata !{
    metadata !"./MyHeader.h",
    metadata !"/Users/mine/sources"
  }

  ...

``llvm::Instruction`` provides easy access to the metadata attached to an
instruction.  One can extract line number information encoded in LLVM IR using
``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``.

.. code-block:: c++

  if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
    DILocation Loc(N);                      // DILocation is in DebugInfo.h
    unsigned Line = Loc.getLineNumber();
    StringRef File = Loc.getFilename();
    StringRef Dir = Loc.getDirectory();
  }

C/C++ global variable information
---------------------------------

Given an integer global variable declared as follows:

.. code-block:: c

  int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100
  ...
  ;;
  ;; List of compile units; each one lists its own global variables.
  ;;
  !llvm.dbg.cu = !{!0}

  ;; Define the compile unit.
  !0 = metadata !{
    i32 786449,                       ;; Tag
    i32 0,                            ;; Context
    i32 4,                            ;; Language
    metadata !"foo.cpp",              ;; File
    metadata !"/Volumes/Data/tmp",    ;; Directory
    metadata !"clang version 3.1 ",   ;; Producer
    i1 true,                          ;; Deprecated field
    i1 false,                         ;; Is optimized
    metadata !"",                     ;; Flags
    i32 0,                            ;; Runtime Version
    metadata !1,                      ;; Enum Types
    metadata !1,                      ;; Retained Types
    metadata !1,                      ;; Subprograms
    metadata !3,                      ;; Global Variables
    metadata !1,                      ;; Imported entities
    metadata !""                      ;; Split debug filename
  } ; [ DW_TAG_compile_unit ]

  ;; The Array of Global Variables
  !3 = metadata !{
    metadata !4
  }

  ;;
  ;; Define the global variable itself.
  ;;
  !4 = metadata !{
    i32 786484,                        ;; Tag
    i32 0,                             ;; Unused
    null,                              ;; Unused
    metadata !"MyGlobal",              ;; Name
    metadata !"MyGlobal",              ;; Display Name
    metadata !"",                      ;; Linkage Name
    metadata !6,                       ;; File
    i32 1,                             ;; Line
    metadata !7,                       ;; Type
    i32 0,                             ;; IsLocalToUnit
    i32 1,                             ;; IsDefinition
    i32* @MyGlobal,                    ;; LLVM-IR Value
    null                               ;; Static member declaration
  } ; [ DW_TAG_variable ]

  ;;
  ;; Define the file
  ;;
  !5 = metadata !{
    metadata !"foo.cpp",               ;; File
    metadata !"/Volumes/Data/tmp"      ;; Directory
  }
  !6 = metadata !{
    i32 786473,                        ;; Tag
    metadata !5                        ;; File/directory pair
  } ; [ DW_TAG_file_type ]

  ;;
  ;; Define the type
  ;;
  !7 = metadata !{
    i32 786468,                         ;; Tag
    null,                               ;; Unused
    null,                               ;; Unused
    metadata !"int",                    ;; Name
    i32 0,                              ;; Line
    i64 32,                             ;; Size in Bits
    i64 32,                             ;; Align in Bits
    i64 0,                              ;; Offset
    i32 0,                              ;; Flags
    i32 5                               ;; Encoding
  } ; [ DW_TAG_base_type ]

C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the subprogram descriptor for main.
  ;;
  !6 = metadata !{
    i32 786478,        ;; Tag
    metadata !1,       ;; File
    metadata !1,       ;; Context
    metadata !"main",  ;; Name
    metadata !"main",  ;; Display name
    metadata !"main",  ;; Linkage name
    i32 1,             ;; Line number
    metadata !4,       ;; Type
    i1 false,          ;; Is local
    i1 true,           ;; Is definition
    i32 0,             ;; Virtuality attribute, e.g. pure virtual function
    i32 0,             ;; Index into virtual table for C++ methods
    null,              ;; Type that holds virtual table.
    i32 0,             ;; Flags
    i1 false,          ;; True if this function is optimized
    i32 (i32, i8**)* @main,  ;; Pointer to llvm::Function
    null,              ;; Function template parameters
    null,              ;; Function declaration
    null,              ;; List of function variables (emitted when optimizing)
    i32 1              ;; Line number of the opening '{' of the function
  }
  ;;
  ;; Define the function itself.
  ;;
  define i32 @main(i32 %argc, i8** %argv) {
  ...
  }

C/C++ basic types
-----------------

The following are the basic type descriptors for C/C++ core types:

bool
^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"bool",  ;; Name
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 2              ;; Encoding
  }

char
^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"char",  ;; Name
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 6              ;; Encoding
  }

unsigned char
^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"unsigned char",  ;; Name
    i32 0,             ;; Line number
    i64 8,             ;; Size in Bits
    i64 8,             ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 8              ;; Encoding
  }

short
^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"short int",  ;; Name
    i32 0,             ;; Line number
    i64 16,            ;; Size in Bits
    i64 16,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned short
^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"short unsigned int",  ;; Name
    i32 0,             ;; Line number
    i64 16,            ;; Size in Bits
    i64 16,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

int
^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"int",   ;; Name
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned int
^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"unsigned int",  ;; Name
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

long long
^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"long long int",  ;; Name
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 5              ;; Encoding
  }

unsigned long long
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"long long unsigned int",  ;; Name
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }

float
^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"float", ;; Name
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 4              ;; Encoding
  }

double
^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"double", ;; Name
    i32 0,             ;; Line number
    i64 64,            ;; Size in Bits
    i64 64,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 4              ;; Encoding
  }
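
The last field of each basic type descriptor above is its DWARF base type
encoding, a ``DW_ATE_*`` constant.  As a reading aid, the C++ sketch below maps
the encoding values used in this section to their DWARF names.  The values are
standard DWARF constants; the helper function itself is hypothetical.  Plain
``char`` is shown with ``DW_ATE_signed_char``, which reflects a target where
``char`` is signed.

.. code-block:: c++

  #include <string_view>

  // Maps the DWARF base type encodings used in this section to their names.
  constexpr std::string_view encodingName(unsigned encoding) {
    switch (encoding) {
    case 0x02: return "DW_ATE_boolean";        // bool
    case 0x04: return "DW_ATE_float";          // float, double
    case 0x05: return "DW_ATE_signed";         // short, int, long long
    case 0x06: return "DW_ATE_signed_char";    // char (signed on this target)
    case 0x07: return "DW_ATE_unsigned";       // unsigned short/int/long long
    case 0x08: return "DW_ATE_unsigned_char";  // unsigned char
    default:   return "unknown";
    }
  }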

C/C++ derived types
-------------------

Given the following example of a C/C++ derived type:

.. code-block:: c

  typedef const int *IntPtr;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the typedef "IntPtr".
  ;;
  !2 = metadata !{
    i32 786454,          ;; Tag
    metadata !3,         ;; File
    metadata !1,         ;; Context
    metadata !"IntPtr",  ;; Name
    i32 0,               ;; Line number
    i64 0,               ;; Size in bits
    i64 0,               ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !4          ;; Derived From type
  }
  ;;
  ;; Define the pointer type.
  ;;
  !4 = metadata !{
    i32 786447,          ;; Tag
    null,                ;; File
    null,                ;; Context
    metadata !"",        ;; Name
    i32 0,               ;; Line number
    i64 64,              ;; Size in bits
    i64 64,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !5          ;; Derived From type
  }
  ;;
  ;; Define the const type.
  ;;
  !5 = metadata !{
    i32 786470,          ;; Tag
    null,                ;; File
    null,                ;; Context
    metadata !"",        ;; Name
    i32 0,               ;; Line number
    i64 0,               ;; Size in bits
    i64 0,               ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    metadata !6          ;; Derived From type
  }
  ;;
  ;; Define the int type.
  ;;
  !6 = metadata !{
    i32 786468,          ;; Tag
    null,                ;; File
    null,                ;; Context
    metadata !"int",     ;; Name
    i32 0,               ;; Line number
    i64 32,              ;; Size in bits
    i64 32,              ;; Align in bits
    i64 0,               ;; Offset in bits
    i32 0,               ;; Flags
    i32 5                ;; Encoding
  }
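
Note that the typedef and const descriptors above record a size of 0, so a
consumer that needs the size of ``IntPtr`` would follow the "Derived From"
links until it reaches a descriptor with a non-zero size.  The C++ sketch below
models that walk over a hypothetical, flattened copy of descriptors ``!2``,
``!4``, ``!5`` and ``!6``, keyed by their metadata numbers.  It illustrates the
chain only; it is not how LLVM stores or resolves these nodes.

.. code-block:: c++

  #include <array>
  #include <cstdint>
  #include <string_view>

  // Hypothetical flattened copy of the four descriptors above.
  struct TypeDesc {
    int id;                    // metadata number
    std::uint32_t tag;         // DWARF tag (version marker stripped)
    std::string_view name;
    std::uint64_t sizeInBits;
    int derivedFrom;           // 0 for base types
  };

  constexpr std::array<TypeDesc, 4> kTypes = {{
      {2, 0x16, "IntPtr", 0, 4},  // DW_TAG_typedef
      {4, 0x0f, "", 64, 5},       // DW_TAG_pointer_type
      {5, 0x26, "", 0, 6},        // DW_TAG_const_type
      {6, 0x24, "int", 32, 0},    // DW_TAG_base_type
  }};

  // Finds a descriptor by metadata number, or nullptr if there is none.
  constexpr const TypeDesc *lookup(int id) {
    for (const TypeDesc &type : kTypes)
      if (type.id == id)
        return &type;
    return nullptr;
  }

  // Follows "Derived From" links until some descriptor records a size.
  constexpr std::uint64_t effectiveSizeInBits(int id) {
    for (const TypeDesc *type = lookup(id); type; type = lookup(type->derivedFrom))
      if (type->sizeInBits != 0)
        return type->sizeInBits;
    return 0;
  }

  // Follows "Derived From" links to the name of the underlying base type.
  constexpr std::string_view baseTypeName(int id) {
    const TypeDesc *type = lookup(id);
    while (type && type->derivedFrom != 0)
      type = lookup(type->derivedFrom);
    return type ? type->name : std::string_view("<unknown>");
  }

Here the typedef takes its 64-bit size from the pointer it names, while the
const type takes its 32-bit size from ``int``.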

C/C++ struct/union types
------------------------

Given the following example of a C/C++ struct type:

.. code-block:: c

  struct Color {
    unsigned Red;
    unsigned Green;
    unsigned Blue;
  };

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define basic type for unsigned int.
  ;;
  !5 = metadata !{
    i32 786468,        ;; Tag
    null,              ;; File
    null,              ;; Context
    metadata !"unsigned int",  ;; Name
    i32 0,             ;; Line number
    i64 32,            ;; Size in Bits
    i64 32,            ;; Align in Bits
    i64 0,             ;; Offset in Bits
    i32 0,             ;; Flags
    i32 7              ;; Encoding
  }
  ;;
  ;; Define composite type for struct Color.
  ;;
  !2 = metadata !{
    i32 786451,        ;; Tag
    metadata !1,       ;; File
    null,              ;; Context
    metadata !"Color", ;; Name
    i32 1,             ;; Line number
    i64 96,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    null,              ;; Derived From
    metadata !3,       ;; Elements
    i32 0,             ;; Runtime Language
    null,              ;; Base type containing the vtable pointer for this type
    null               ;; Template parameters
  }

  ;;
  ;; Define the Red field.
  ;;
  !4 = metadata !{
    i32 786445,        ;; Tag
    metadata !1,       ;; File
    metadata !1,       ;; Context
    metadata !"Red",   ;; Name
    i32 2,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the Green field.
  ;;
  !6 = metadata !{
    i32 786445,        ;; Tag
    metadata !1,       ;; File
    metadata !1,       ;; Context
    metadata !"Green", ;; Name
    i32 3,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 32,            ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the Blue field.
  ;;
  !7 = metadata !{
    i32 786445,        ;; Tag
    metadata !1,       ;; File
    metadata !1,       ;; Context
    metadata !"Blue",  ;; Name
    i32 4,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 64,            ;; Offset in bits
    i32 0,             ;; Flags
    metadata !5        ;; Derived From type
  }

  ;;
  ;; Define the array of fields used by the composite type Color.
  ;;
  !3 = metadata !{metadata !4, metadata !6, metadata !7}
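
The member offsets in these descriptors (0, 32 and 64 bits) and the struct's
96-bit size follow from laying the members out in declaration order, each at
its natural alignment.  The C++ sketch below recomputes them from the member
sizes and alignments.  It assumes a plain C layout with no bit-fields, packing
attributes, or base classes, and its helper names are hypothetical.

.. code-block:: c++

  #include <array>
  #include <cstddef>
  #include <cstdint>

  // Size and alignment of one member, in bits, as in the member descriptors.
  struct Field {
    std::uint64_t sizeInBits;
    std::uint64_t alignInBits;
  };

  // Rounds `value` up to the next multiple of `align` (align must be non-zero).
  constexpr std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
    return (value + align - 1) / align * align;
  }

  // Offset in bits of member `index` when members are laid out in declaration
  // order, each at the next offset that satisfies its alignment.
  template <std::size_t N>
  constexpr std::uint64_t offsetOf(const std::array<Field, N> &fields,
                                   std::size_t index) {
    std::uint64_t end = 0;
    for (std::size_t i = 0; i < index; ++i)
      end = alignTo(end, fields[i].alignInBits) + fields[i].sizeInBits;
    return alignTo(end, fields[index].alignInBits);
  }

  // Total size in bits: the end of the last member, rounded up to the largest
  // member alignment so that arrays of the struct stay aligned.
  template <std::size_t N>
  constexpr std::uint64_t structSizeInBits(const std::array<Field, N> &fields) {
    std::uint64_t end = 0;
    std::uint64_t maxAlign = 1;
    for (const Field &f : fields) {
      end = alignTo(end, f.alignInBits) + f.sizeInBits;
      if (f.alignInBits > maxAlign)
        maxAlign = f.alignInBits;
    }
    return alignTo(end, maxAlign);
  }

  // Red, Green and Blue: three 32-bit, 32-bit-aligned unsigned members.
  constexpr std::array<Field, 3> kColor = {{{32, 32}, {32, 32}, {32, 32}}};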

C/C++ enumeration types
-----------------------

Given the following example of a C/C++ enumeration type:

.. code-block:: c

  enum Trees {
    Spruce = 100,
    Oak = 200,
    Maple = 300
  };

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define composite type for enum Trees
  ;;
  !2 = metadata !{
    i32 786436,        ;; Tag
    metadata !1,       ;; File
    metadata !1,       ;; Context
    metadata !"Trees", ;; Name
    i32 1,             ;; Line number
    i64 32,            ;; Size in bits
    i64 32,            ;; Align in bits
    i64 0,             ;; Offset in bits
    i32 0,             ;; Flags
    null,              ;; Derived From type
    metadata !3,       ;; Elements
    i32 0              ;; Runtime language
  }

  ;;
  ;; Define the array of enumerators used by composite type Trees.
  ;;
  !3 = metadata !{metadata !4, metadata !5, metadata !6}

  ;;
  ;; Define Spruce enumerator.
  ;;
  !4 = metadata !{i32 786472, metadata !"Spruce", i64 100}

  ;;
  ;; Define Oak enumerator.
  ;;
  !5 = metadata !{i32 786472, metadata !"Oak", i64 200}

  ;;
  ;; Define Maple enumerator.
  ;;
  !6 = metadata !{i32 786472, metadata !"Maple", i64 300}
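
The enumerator descriptors pair each name with its value, which is what a
debugger needs to print a variable of type ``enum Trees`` by name.  The C++
sketch below shows that lookup over a hypothetical, flattened copy of
descriptors ``!4``, ``!5`` and ``!6``; it is an illustration, not an LLVM or
debugger API.

.. code-block:: c++

  #include <array>
  #include <cstdint>
  #include <string_view>

  // Hypothetical flattened copy of the enumerator descriptors above.
  struct Enumerator {
    std::string_view name;
    std::int64_t value;
  };

  constexpr std::array<Enumerator, 3> kTrees = {{
      {"Spruce", 100},
      {"Oak", 200},
      {"Maple", 300},
  }};

  // Maps a raw value to its enumerator name; returns an empty view when no
  // enumerator matches (a debugger would then fall back to the number).
  constexpr std::string_view enumeratorName(std::int64_t value) {
    for (const Enumerator &e : kTrees)
      if (e.value == value)
        return e.name;
    return {};
  }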

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties.  The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables.  However, the debugger does not know anything
about the properties defined in Objective C interfaces.  The debugger consumes
information generated by the compiler in DWARF format, and that format does not
support encoding Objective C properties.  This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

   1492 Proposal
   1493 ^^^^^^^^
   1494 
   1495 Objective C properties exist separately from class members.  A property can be
   1496 defined only by "setter" and "getter" selectors, and be calculated anew on each
   1497 access.  Or a property can just be a direct access to some declared ivar.
   1498 Finally it can have an ivar "automatically synthesized" for it by the compiler,
   1499 in which case the property can be referred to in user code directly using the
   1500 standard C dereference syntax as well as through the property "dot" syntax, but
   1501 there is no entry in the ``@interface`` declaration corresponding to this ivar.
   1502 
   1503 To facilitate debugging, these properties we will add a new DWARF TAG into the
   1504 ``DW_TAG_structure_type`` definition for the class to hold the description of a
   1505 given property, and a set of DWARF attributes that provide said description.
   1506 The property tag will also contain the name and declared type of the property.
   1507 
   1508 If there is a related ivar, there will also be a DWARF property attribute placed
   1509 in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
   1510 for that property.  And in the case where the compiler synthesizes the ivar
   1511 directly, the compiler is expected to generate a ``DW_TAG_member`` for that
   1512 ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
   1513 to access this ivar directly in code, and with the property attribute pointing
   1514 back to the property it is backing.
   1515 
   1516 The following examples will serve as illustration for our discussion:
   1517 
   1518 .. code-block:: objc
   1519 
   1520   @interface I1 {
   1521     int n2;
   1522   }
   1523 
   1524   @property int p1;
   1525   @property int p2;
   1526   @end
   1527 
   1528   @implementation I1
   1529   @synthesize p1;
   1530   @synthesize p2 = n2;
   1531   @end
   1532 
   1533 This produces the following DWARF (this is a "pseudo dwarfdump" output):
   1534 
   1535 .. code-block:: none
   1536 
   1537   0x00000100:  TAG_structure_type [7] *
   1538                  AT_APPLE_runtime_class( 0x10 )
   1539                  AT_name( "I1" )
   1540                  AT_decl_file( "Objc_Property.m" )
   1541                  AT_decl_line( 3 )
   1542 
   1543   0x00000110    TAG_APPLE_property
   1544                   AT_name ( "p1" )
   1545                   AT_type ( {0x00000150} ( int ) )
   1546 
   1547   0x00000120:   TAG_APPLE_property
   1548                   AT_name ( "p2" )
   1549                   AT_type ( {0x00000150} ( int ) )
   1550 
   1551   0x00000130:   TAG_member [8]
   1552                   AT_name( "_p1" )
   1553                   AT_APPLE_property ( {0x00000110} "p1" )
   1554                   AT_type( {0x00000150} ( int ) )
   1555                   AT_artificial ( 0x1 )
   1556 
   1557   0x00000140:    TAG_member [8]
   1558                    AT_name( "n2" )
   1559                    AT_APPLE_property ( {0x00000120} "p2" )
   1560                    AT_type( {0x00000150} ( int ) )
   1561 
   1562   0x00000150:  AT_type( ( int ) )
   1563 
   1564 Note, the current convention is that the name of the ivar for an
   1565 auto-synthesized property is the name of the property from which it derives
   1566 with an underscore prepended, as is shown in the example.  But we actually
   1567 don't need to know this convention, since we are given the name of the ivar
   1568 directly.
   1569 
   1570 Also, it is common practice in ObjC to have different property declarations in
   1571 the @interface and @implementation - e.g. to provide a read-only property in
   1572 the interface,and a read-write interface in the implementation.  In that case,
   1573 the compiler should emit whichever property declaration will be in force in the
   1574 current translation unit.
   1575 
   1576 Developers can decorate a property with attributes which are encoded using
   1577 ``DW_AT_APPLE_property_attribute``.
   1578 
   1579 .. code-block:: objc
   1580 
   1581   @property (readonly, nonatomic) int pr;
   1582 
   1583 .. code-block:: none
   1584 
   1585   TAG_APPLE_property [8]
   1586     AT_name( "pr" )
   1587     AT_type ( {0x00000147} (int) )
   1588     AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
   1589 
   1590 The setter and getter method names are attached to the property using
   1591 ``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.
   1592 
   1593 .. code-block:: objc
   1594 
   1595   @interface I1
   1596   @property (setter=myOwnP3Setter:) int p3;
   1597   -(void)myOwnP3Setter:(int)a;
   1598   @end
   1599 
   1600   @implementation I1
   1601   @synthesize p3;
   1602   -(void)myOwnP3Setter:(int)a{ }
   1603   @end
   1604 
   1605 The DWARF for this would be:
   1606 
   1607 .. code-block:: none
   1608 
   1609   0x000003bd: TAG_structure_type [7] *
   1610                 AT_APPLE_runtime_class( 0x10 )
   1611                 AT_name( "I1" )
   1612                 AT_decl_file( "Objc_Property.m" )
   1613                 AT_decl_line( 3 )
   1614 
   1615   0x000003cd      TAG_APPLE_property
   1616                     AT_name ( "p3" )
   1617                     AT_APPLE_property_setter ( "myOwnP3Setter:" )
   1618                     AT_type( {0x00000147} ( int ) )
   1619 
   1620   0x000003f3:     TAG_member [8]
   1621                     AT_name( "_p3" )
   1622                     AT_type ( {0x00000147} ( int ) )
   1623                     AT_APPLE_property ( {0x000003cd} )
   1624                     AT_artificial ( 0x1 )
   1625 
   1626 New DWARF Tags
   1627 ^^^^^^^^^^^^^^
   1628 
   1629 +-----------------------+--------+
   1630 | TAG                   | Value  |
   1631 +=======================+========+
   1632 | DW_TAG_APPLE_property | 0x4200 |
   1633 +-----------------------+--------+
   1634 
   1635 New DWARF Attributes
   1636 ^^^^^^^^^^^^^^^^^^^^
   1637 
   1638 +--------------------------------+--------+-----------+
   1639 | Attribute                      | Value  | Classes   |
   1640 +================================+========+===========+
   1641 | DW_AT_APPLE_property           | 0x3fed | Reference |
   1642 +--------------------------------+--------+-----------+
   1643 | DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
   1644 +--------------------------------+--------+-----------+
   1645 | DW_AT_APPLE_property_setter    | 0x3fea | String    |
   1646 +--------------------------------+--------+-----------+
   1647 | DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
   1648 +--------------------------------+--------+-----------+
   1649 
   1650 New DWARF Constants
   1651 ^^^^^^^^^^^^^^^^^^^
   1652 
   1653 +--------------------------------+-------+
   1654 | Name                           | Value |
   1655 +================================+=======+
   1656 | DW_AT_APPLE_PROPERTY_readonly  | 0x1   |
   1657 +--------------------------------+-------+
   1658 | DW_AT_APPLE_PROPERTY_readwrite | 0x2   |
   1659 +--------------------------------+-------+
   1660 | DW_AT_APPLE_PROPERTY_assign    | 0x4   |
   1661 +--------------------------------+-------+
   1662 | DW_AT_APPLE_PROPERTY_retain    | 0x8   |
   1663 +--------------------------------+-------+
   1664 | DW_AT_APPLE_PROPERTY_copy      | 0x10  |
   1665 +--------------------------------+-------+
   1666 | DW_AT_APPLE_PROPERTY_nonatomic | 0x20  |
   1667 +--------------------------------+-------+
   1668 
   1669 Name Accelerator Tables
   1670 -----------------------
   1671 
   1672 Introduction
   1673 ^^^^^^^^^^^^
   1674 
   1675 The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
   1676 debugger needs.  The "``pub``" in the section name indicates that the entries
   1677 in the table are publicly visible names only.  This means no static or hidden
   1678 functions show up in the "``.debug_pubnames``".  No static variables or private
   1679 class variables are in the "``.debug_pubtypes``".  Many compilers add different
   1680 things to these tables, so we can't rely upon the contents between gcc, icc, or
   1681 clang.
   1682 
   1683 The typical query given by users tends not to match up with the contents of
   1684 these tables.  For example, the DWARF spec states that "In the case of the name
   1685 of a function member or static data member of a C++ structure, class or union,
   1686 the name presented in the "``.debug_pubnames``" section is not the simple name
   1687 given by the ``DW_AT_name attribute`` of the referenced debugging information
   1688 entry, but rather the fully qualified name of the data or function member."
   1689 So the only names in these tables for complex C++ entries is a fully
   1690 qualified name.  Debugger users tend not to enter their search strings as
   1691 "``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``" , or
   1692 "``a::b::c``".  So the name entered in the name table must be demangled in
   1693 order to chop it up appropriately and additional names must be manually entered
   1694 into the table to make it effective as a name lookup table for debuggers to
   1695 se.
   1696 
   1697 All debuggers currently ignore the "``.debug_pubnames``" table as a result of
   1698 its inconsistent and useless public-only name content making it a waste of
   1699 space in the object file.  These tables, when they are written to disk, are not
   1700 sorted in any way, leaving every debugger to do its own parsing and sorting.
   1701 These tables also include an inlined copy of the string values in the table
   1702 itself making the tables much larger than they need to be on disk, especially
   1703 for large C++ programs.
   1704 
   1705 Can't we just fix the sections by adding all of the names we need to this
   1706 table? No, because that is not what the tables are defined to contain and we
   1707 won't know the difference between the old bad tables and the new good tables.
   1708 At best we could make our own renamed sections that contain all of the data we
   1709 need.
   1710 
   1711 These tables are also insufficient for what a debugger like LLDB needs.  LLDB
   1712 uses clang for its expression parsing where LLDB acts as a PCH.  LLDB is then
   1713 often asked to look for type "``foo``" or namespace "``bar``", or list items in
   1714 namespace "``baz``".  Namespaces are not included in the pubnames or pubtypes
   1715 tables.  Since clang asks a lot of questions when it is parsing an expression,
   1716 we need to be very fast when looking up names, as it happens a lot.  Having new
   1717 accelerator tables that are optimized for very quick lookups will benefit this
   1718 type of debugging experience greatly.
   1719 
   1720 We would like to generate name lookup tables that can be mapped into memory
   1721 from disk, and used as is, with little or no up-front parsing.  We would also
   1722 be able to control the exact content of these different tables so they contain
   1723 exactly what we need.  The Name Accelerator Tables were designed to fix these
   1724 issues.  In order to solve these issues we need to:
   1725 
   1726 * Have a format that can be mapped into memory from disk and used as is
   1727 * Lookups should be very fast
   1728 * Extensible table format so these tables can be made by many producers
   1729 * Contain all of the names needed for typical lookups out of the box
   1730 * Strict rules for the contents of tables
   1731 
   1732 Table size is important and the accelerator table format should allow the reuse
   1733 of strings from common string tables so the strings for the names are not
   1734 duplicated.  We also want to make sure the table is ready to be used as-is by
   1735 simply mapping the table into memory with minimal header parsing.
   1736 
   1737 The name lookups need to be fast and optimized for the kinds of lookups that
   1738 debuggers tend to do.  Optimally we would like to touch as few parts of the
   1739 mapped table as possible when doing a name lookup and be able to quickly find
   1740 the name entry we are looking for, or discover there are no matches.  In the
   1741 case of debuggers we optimized for lookups that fail most of the time.
   1742 
   1743 Each table that is defined should have strict rules on exactly what is in the
   1744 accelerator tables and documented so clients can rely on the content.
   1745 
   1746 Hash Tables
   1747 ^^^^^^^^^^^
   1748 
   1749 Standard Hash Tables
   1750 """"""""""""""""""""
   1751 
   1752 Typical hash tables have a header, buckets, and each bucket points to the
   1753 bucket contents:
   1754 
   1755 .. code-block:: none
   1756 
   1757   .------------.
   1758   |  HEADER    |
   1759   |------------|
   1760   |  BUCKETS   |
   1761   |------------|
   1762   |  DATA      |
   1763   `------------'
   1764 
   1765 The BUCKETS are an array of offsets to DATA for each hash:
   1766 
   1767 .. code-block:: none
   1768 
   1769   .------------.
   1770   | 0x00001000 | BUCKETS[0]
   1771   | 0x00002000 | BUCKETS[1]
   1772   | 0x00002200 | BUCKETS[2]
   1773   | 0x000034f0 | BUCKETS[3]
   1774   |            | ...
   1775   | 0xXXXXXXXX | BUCKETS[n_buckets]
   1776   '------------'
   1777 
   1778 So for ``bucket[3]`` in the example above, we have an offset into the table
   1779 0x000034f0 which points to a chain of entries for the bucket.  Each bucket must
   1780 contain a next pointer, full 32 bit hash value, the string itself, and the data
   1781 for the current string value.
   1782 
   1783 .. code-block:: none
   1784 
   1785               .------------.
   1786   0x000034f0: | 0x00003500 | next pointer
   1787               | 0x12345678 | 32 bit hash
   1788               | "erase"    | string value
   1789               | data[n]    | HashData for this bucket
   1790               |------------|
   1791   0x00003500: | 0x00003550 | next pointer
   1792               | 0x29273623 | 32 bit hash
   1793               | "dump"     | string value
   1794               | data[n]    | HashData for this bucket
   1795               |------------|
   1796   0x00003550: | 0x00000000 | next pointer
   1797               | 0x82638293 | 32 bit hash
   1798               | "main"     | string value
   1799               | data[n]    | HashData for this bucket
   1800               `------------'
   1801 
   1802 The problem with this layout for debuggers is that we need to optimize for the
   1803 negative lookup case where the symbol we're searching for is not present.  So
   1804 if we were to lookup "``printf``" in the table above, we would make a 32 hash
   1805 for "``printf``", it might match ``bucket[3]``.  We would need to go to the
   1806 offset 0x000034f0 and start looking to see if our 32 bit hash matches.  To do
   1807 so, we need to read the next pointer, then read the hash, compare it, and skip
   1808 to the next bucket.  Each time we are skipping many bytes in memory and
   1809 touching new cache pages just to do the compare on the full 32 bit hash.  All
   1810 of these accesses then tell us that we didn't have a match.
   1811 
   1812 Name Hash Tables
   1813 """"""""""""""""
   1814 
   1815 To solve the issues mentioned above we have structured the hash tables a bit
   1816 differently: a header, buckets, an array of all unique 32 bit hash values,
   1817 followed by an array of hash value data offsets, one for each hash value, then
   1818 the data for all hash values:
   1819 
   1820 .. code-block:: none
   1821 
   1822   .-------------.
   1823   |  HEADER     |
   1824   |-------------|
   1825   |  BUCKETS    |
   1826   |-------------|
   1827   |  HASHES     |
   1828   |-------------|
   1829   |  OFFSETS    |
   1830   |-------------|
   1831   |  DATA       |
   1832   `-------------'
   1833 
   1834 The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array.  By
   1835 making all of the full 32 bit hash values contiguous in memory, we allow
   1836 ourselves to efficiently check for a match while touching as little memory as
   1837 possible.  Most often checking the 32 bit hash values is as far as the lookup
   1838 goes.  If it does match, it usually is a match with no collisions.  So for a
   1839 table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
   1840 values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
   1841 ``OFFSETS`` as:
   1842 
   1843 .. code-block:: none
   1844 
   1845   .-------------------------.
   1846   |  HEADER.magic           | uint32_t
   1847   |  HEADER.version         | uint16_t
   1848   |  HEADER.hash_function   | uint16_t
   1849   |  HEADER.bucket_count    | uint32_t
   1850   |  HEADER.hashes_count    | uint32_t
   1851   |  HEADER.header_data_len | uint32_t
   1852   |  HEADER_DATA            | HeaderData
   1853   |-------------------------|
   1854   |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
   1855   |-------------------------|
   1856   |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
   1857   |-------------------------|
   1858   |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
   1859   |-------------------------|
   1860   |  ALL HASH DATA          |
   1861   `-------------------------'
   1862 
   1863 So taking the exact same data from the standard hash example above we end up
   1864 with:
   1865 
   1866 .. code-block:: none
   1867 
   1868               .------------.
   1869               | HEADER     |
   1870               |------------|
   1871               |          0 | BUCKETS[0]
   1872               |          2 | BUCKETS[1]
   1873               |          5 | BUCKETS[2]
   1874               |          6 | BUCKETS[3]
   1875               |            | ...
   1876               |        ... | BUCKETS[n_buckets]
   1877               |------------|
   1878               | 0x........ | HASHES[0]
   1879               | 0x........ | HASHES[1]
   1880               | 0x........ | HASHES[2]
   1881               | 0x........ | HASHES[3]
   1882               | 0x........ | HASHES[4]
   1883               | 0x........ | HASHES[5]
   1884               | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
   1885               | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
   1886               | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
   1887               | 0x........ | HASHES[9]
   1888               | 0x........ | HASHES[10]
   1889               | 0x........ | HASHES[11]
   1890               | 0x........ | HASHES[12]
   1891               | 0x........ | HASHES[13]
   1892               | 0x........ | HASHES[n_hashes]
   1893               |------------|
   1894               | 0x........ | OFFSETS[0]
   1895               | 0x........ | OFFSETS[1]
   1896               | 0x........ | OFFSETS[2]
   1897               | 0x........ | OFFSETS[3]
   1898               | 0x........ | OFFSETS[4]
   1899               | 0x........ | OFFSETS[5]
   1900               | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
   1901               | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
   1902               | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
   1903               | 0x........ | OFFSETS[9]
   1904               | 0x........ | OFFSETS[10]
   1905               | 0x........ | OFFSETS[11]
   1906               | 0x........ | OFFSETS[12]
   1907               | 0x........ | OFFSETS[13]
   1908               | 0x........ | OFFSETS[n_hashes]
   1909               |------------|
   1910               |            |
   1911               |            |
   1912               |            |
   1913               |            |
   1914               |            |
   1915               |------------|
   1916   0x000034f0: | 0x00001203 | .debug_str ("erase")
   1917               | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
   1918               | 0x........ | HashData[0]
   1919               | 0x........ | HashData[1]
   1920               | 0x........ | HashData[2]
   1921               | 0x........ | HashData[3]
   1922               | 0x00000000 | String offset into .debug_str (terminate data for hash)
   1923               |------------|
   1924   0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
   1925               | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
   1926               | 0x........ | HashData[0]
   1927               | 0x........ | HashData[1]
   1928               | 0x00001203 | String offset into .debug_str ("dump")
   1929               | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
   1930               | 0x........ | HashData[0]
   1931               | 0x........ | HashData[1]
   1932               | 0x........ | HashData[2]
   1933               | 0x00000000 | String offset into .debug_str (terminate data for hash)
   1934               |------------|
   1935   0x00003550: | 0x00001203 | String offset into .debug_str ("main")
   1936               | 0x00000009 | A 32 bit array count - number of HashData with name "main"
   1937               | 0x........ | HashData[0]
   1938               | 0x........ | HashData[1]
   1939               | 0x........ | HashData[2]
   1940               | 0x........ | HashData[3]
   1941               | 0x........ | HashData[4]
   1942               | 0x........ | HashData[5]
   1943               | 0x........ | HashData[6]
   1944               | 0x........ | HashData[7]
   1945               | 0x........ | HashData[8]
   1946               | 0x00000000 | String offset into .debug_str (terminate data for hash)
   1947               `------------'
   1948 
   1949 So we still have all of the same data, we just organize it more efficiently for
   1950 debugger lookup.  If we repeat the same "``printf``" lookup from above, we
   1951 would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
   1952 hash value and modulo it by ``n_buckets``.  ``BUCKETS[3]`` contains "6" which
   1953 is the index into the ``HASHES`` table.  We would then compare any consecutive
   1954 32 bit hashes values in the ``HASHES`` array as long as the hashes would be in
   1955 ``BUCKETS[3]``.  We do this by verifying that each subsequent hash value modulo
   1956 ``n_buckets`` is still 3.  In the case of a failed lookup we would access the
   1957 memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
   1958 before we know that we have no match.  We don't end up marching through
   1959 multiple words of memory and we really keep the number of processor data cache
   1960 lines being accessed as small as possible.
   1961 
   1962 The string hash that is used for these lookup tables is the Daniel J.
   1963 Bernstein hash which is also used in the ELF ``GNU_HASH`` sections.  It is a
   1964 very good hash for all kinds of names in programs with very few hash
   1965 collisions.
   1966 
   1967 Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
   1968 
   1969 Details
   1970 ^^^^^^^
   1971 
   1972 These name hash tables are designed to be generic where specializations of the
   1973 table get to define additional data that goes into the header ("``HeaderData``"),
   1974 how the string value is stored ("``KeyType``") and the content of the data for each
   1975 hash value.
   1976 
   1977 Header Layout
   1978 """""""""""""
   1979 
   1980 The header has a fixed part, and the specialized part.  The exact format of the
   1981 header is:
   1982 
   1983 .. code-block:: c
   1984 
   1985   struct Header
   1986   {
   1987     uint32_t   magic;           // 'HASH' magic value to allow endian detection
   1988     uint16_t   version;         // Version number
   1989     uint16_t   hash_function;   // The hash function enumeration that was used
   1990     uint32_t   bucket_count;    // The number of buckets in this hash table
   1991     uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
   1992     uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
   1993                                 // Specifically the length of the following HeaderData field - this does not
   1994                                 // include the size of the preceding fields
   1995     HeaderData header_data;     // Implementation specific header data
   1996   };
   1997 
   1998 The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
   1999 encoded as an ASCII integer.  This allows the detection of the start of the
   2000 hash table and also allows the table's byte order to be determined so the table
   2001 can be correctly extracted.  The "``magic``" value is followed by a 16 bit
   2002 ``version`` number which allows the table to be revised and modified in the
   2003 future.  The current version number is 1. ``hash_function`` is a ``uint16_t``
   2004 enumeration that specifies which hash function was used to produce this table.
   2005 The current values for the hash function enumerations include:
   2006 
   2007 .. code-block:: c
   2008 
   2009   enum HashFunctionType
   2010   {
   2011     eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
   2012   };
   2013 
   2014 ``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
   2015 are in the ``BUCKETS`` array.  ``hashes_count`` is the number of unique 32 bit
   2016 hash values that are in the ``HASHES`` array, and is the same number of offsets
   2017 are contained in the ``OFFSETS`` array.  ``header_data_len`` specifies the size
   2018 in bytes of the ``HeaderData`` that is filled in by specialized versions of
   2019 this table.
   2020 
   2021 Fixed Lookup
   2022 """"""""""""
   2023 
   2024 The header is followed by the buckets, hashes, offsets, and hash value data.
   2025 
   2026 .. code-block:: c
   2027 
   2028   struct FixedTable
   2029   {
   2030     uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
   2031     uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
   2032     uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
   2033   };
   2034 
   2035 ``buckets`` is an array of 32 bit indexes into the ``hashes`` array.  The
   2036 ``hashes`` array contains all of the 32 bit hash values for all names in the
   2037 hash table.  Each hash in the ``hashes`` table has an offset in the ``offsets``
   2038 array that points to the data for the hash value.
   2039 
   2040 This table setup makes it very easy to repurpose these tables to contain
   2041 different data, while keeping the lookup mechanism the same for all tables.
   2042 This layout also makes it possible to save the table to disk and map it in
   2043 later and do very efficient name lookups with little or no parsing.
   2044 
   2045 DWARF lookup tables can be implemented in a variety of ways and can store a lot
   2046 of information for each name.  We want to make the DWARF tables extensible and
   2047 able to store the data efficiently so we have used some of the DWARF features
   2048 that enable efficient data storage to define exactly what kind of data we store
   2049 for each name.
   2050 
   2051 The ``HeaderData`` contains a definition of the contents of each HashData chunk.
   2052 We might want to store an offset to all of the debug information entries (DIEs)
   2053 for each name.  To keep things extensible, we create a list of items, or
   2054 Atoms, that are contained in the data for each name.  First comes the type of
   2055 the data in each atom:
   2056 
   2057 .. code-block:: c
   2058 
   2059   enum AtomType
   2060   {
   2061     eAtomTypeNULL       = 0u,
   2062     eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
   2063     eAtomTypeCUOffset   = 2u,   // DIE offset of the compiler unit header that contains the item in question
   2064     eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
   2065     eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
   2066     eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
   2067   };
   2068 
   2069 The enumeration values and their meanings are:
   2070 
   2071 .. code-block:: none
   2072 
   2073   eAtomTypeNULL       - a termination atom that specifies the end of the atom list
   2074   eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
   2075   eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
   2076   eAtomTypeDIETag     - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
   2077   eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
   2078   eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
   2079 
   2080 Then we allow each atom type to define the atom type and how the data for each
   2081 atom type data is encoded:
   2082 
   2083 .. code-block:: c
   2084 
   2085   struct Atom
   2086   {
   2087     uint16_t type;  // AtomType enum value
   2088     uint16_t form;  // DWARF DW_FORM_XXX defines
   2089   };
   2090 
   2091 The ``form`` type above is from the DWARF specification and defines the exact
   2092 encoding of the data for the Atom type.  See the DWARF specification for the
   2093 ``DW_FORM_`` definitions.
   2094 
   2095 .. code-block:: c
   2096 
   2097   struct HeaderData
   2098   {
   2099     uint32_t die_offset_base;
   2100     uint32_t atom_count;
   2101     Atoms    atoms[atom_count0];
   2102   };
   2103 
   2104 ``HeaderData`` defines the base DIE offset that should be added to any atoms
   2105 that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
   2106 ``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata``.  It also defines
   2107 what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large
   2108 each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data
   2109 should be interpreted.
   2110 
   2111 For the current implementations of the "``.apple_names``" (all functions +
   2112 globals), the "``.apple_types``" (names of all types that are defined), and
   2113 the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
   2114 array to be:
   2115 
   2116 .. code-block:: c
   2117 
   2118   HeaderData.atom_count = 1;
   2119   HeaderData.atoms[0].type = eAtomTypeDIEOffset;
   2120   HeaderData.atoms[0].form = DW_FORM_data4;
   2121 
   2122 This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
   2123 encoded as a 32 bit value (DW_FORM_data4).  This allows a single name to have
   2124 multiple matching DIEs in a single file, which could come up with an inlined
   2125 function for instance.  Future tables could include more information about the
   2126 DIE such as flags indicating if the DIE is a function, method, block,
   2127 or inlined.
   2128 
   2129 The KeyType for the DWARF table is a 32 bit string table offset into the
   2130 ".debug_str" table.  The ".debug_str" is the string table for the DWARF which
   2131 may already contain copies of all of the strings.  This helps make sure, with
   2132 help from the compiler, that we reuse the strings between all of the DWARF
   2133 sections and keeps the hash table size down.  Another benefit to having the
   2134 compiler generate all strings as DW_FORM_strp in the debug info, is that
   2135 DWARF parsing can be made much faster.
   2136 
   2137 After a lookup is made, we get an offset into the hash data.  The hash data
   2138 needs to be able to deal with 32 bit hash collisions, so the chunk of data
   2139 at the offset in the hash data consists of a triple:
   2140 
   2141 .. code-block:: c
   2142 
   2143   uint32_t str_offset
   2144   uint32_t hash_data_count
   2145   HashData[hash_data_count]

If ``str_offset`` is zero, there are no more entries at this offset.  About
99.9% of the hash data chunks contain a single item (no 32-bit hash collision):

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

If there are collisions, you will have multiple valid string offsets:

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'
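
A reader can walk such a chain and compare the actual key strings, so that
names which merely share a 32-bit hash are skipped.  The following C sketch
assumes the hash data has already been converted to host byte order as an array
of ``uint32_t`` words, that each ``HashData`` entry is a single
``DW_FORM_data4`` DIE offset (the current ``Atom`` setup), and that
``.debug_str`` is in memory; ``lookup_chain()`` is a hypothetical name.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Walk one hash data chain (words[0] is the first KeyType) and copy the
     DIE offsets of entries whose key string equals `name` into `out`,
     storing at most `max_out` of them.  Returns the number stored, or -1 if
     the chain runs past `nwords` or a KeyType points outside .debug_str. */
  int lookup_chain(const uint32_t *words, size_t nwords,
                   const char *debug_str, size_t debug_str_size,
                   const char *name, uint32_t *out, int max_out) {
    size_t i = 0;
    int found = 0;
    for (;;) {
      if (i >= nwords)
        return -1;
      uint32_t str_offset = words[i++];
      if (str_offset == 0)            /* a zero KeyType ends the chain */
        return found;
      if (i >= nwords || str_offset >= debug_str_size)
        return -1;
      uint32_t count = words[i++];    /* HashData count */
      if (count > nwords - i)
        return -1;
      /* Names with the same 32-bit hash share a chain, so compare the
         actual strings (only if the key is NUL-terminated in the section). */
      const char *key = debug_str + str_offset;
      int match = memchr(key, '\0', debug_str_size - str_offset) != NULL &&
                  strcmp(key, name) == 0;
      for (uint32_t j = 0; j < count; ++j, ++i) {
        if (match && found < max_out)
          out[found++] = words[i];    /* DIE offset (DW_FORM_data4) */
      }
    }
  }

For the collision example above, looking up ``"print"`` returns the two DIE
offsets stored after its ``KeyType`` and skips the four ``"main"`` entries that
share its hash.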

Testing with real-world C++ binaries has shown that there is around one 32-bit
hash collision per 100,000 name entries.
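
As a rough consistency check, assume the 32-bit hash values are uniformly
distributed (an idealization).  The birthday approximation then gives the
expected number of colliding pairs among :math:`n` names:

.. math::

   E[\text{collisions}] \approx \binom{n}{2} \, 2^{-32} \approx \frac{n^2}{2^{33}}

   n = 10^5 \;\Rightarrow\; E[\text{collisions}] \approx \frac{10^{10}}{8.59 \times 10^{9}} \approx 1.16

This agrees with the measured figure for a table of about 100,000 names.
Because the expected count grows roughly with :math:`n^2`, larger tables should
see more collisions per name.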

Contents
^^^^^^^^

As mentioned earlier, we want to define exactly what is included in the
different tables.  For DWARF, there are three tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
tag is ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` and that has address attributes (``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges``, or ``DW_AT_entry_pc``).  It should also
contain ``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in their location
(global and static variables).  All global and static variables should be
included, including those scoped within functions and classes.  For example,
given the following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table.  All
functions should emit both their full names and their basenames.  For C or C++,
the full name is the mangled name (if available), which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and ``DW_AT_name`` contains the
function basename.  If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, it should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.
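
These inclusion and naming rules can be summarized as a predicate over a DIE.
The C sketch below uses a simplified, hypothetical DIE model (``struct die``,
with booleans standing in for the attribute checks; it is not an LLVM API),
while the tag values are the standard DWARF codes.  ``apple_names_keys()``
returns the mangled name first, when present, followed by the basename.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* DWARF tag codes (values from the DWARF specification). */
  enum {
    DW_TAG_label = 0x0a,
    DW_TAG_inlined_subroutine = 0x1d,
    DW_TAG_subprogram = 0x2e,
    DW_TAG_variable = 0x34
  };

  /* Hypothetical, simplified view of a DIE. */
  struct die {
    uint16_t tag;
    const char *name;         /* DW_AT_name, or NULL */
    const char *linkage_name; /* DW_AT_MIPS_linkage_name, or NULL */
    bool has_address_attr;    /* low_pc, high_pc, ranges or entry_pc */
    bool location_has_addr;   /* DW_AT_location contains DW_OP_addr */
  };

  /* True if the DIE belongs in .apple_names under the rules above. */
  bool in_apple_names(const struct die *d) {
    switch (d->tag) {
    case DW_TAG_label:
    case DW_TAG_inlined_subroutine:
    case DW_TAG_subprogram:
      return d->has_address_attr;
    case DW_TAG_variable:
      return d->location_has_addr; /* globals and statics at any scope */
    default:
      return false;
    }
  }

  /* Store the keys to emit for a qualifying DIE in keys[0..1] and return
     how many were stored (0 if the DIE does not belong in the table). */
  size_t apple_names_keys(const struct die *d, const char *keys[2]) {
    size_t n = 0;
    if (!in_apple_names(d))
      return 0;
    if (d->linkage_name != NULL)
      keys[n++] = d->linkage_name; /* full (mangled) name */
    if (d->name != NULL)
      keys[n++] = d->name;         /* basename */
    return n;
  }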

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* ``DW_TAG_array_type``
* ``DW_TAG_class_type``
* ``DW_TAG_enumeration_type``
* ``DW_TAG_pointer_type``
* ``DW_TAG_reference_type``
* ``DW_TAG_string_type``
* ``DW_TAG_structure_type``
* ``DW_TAG_subroutine_type``
* ``DW_TAG_typedef``
* ``DW_TAG_union_type``
* ``DW_TAG_ptr_to_member_type``
* ``DW_TAG_set_type``
* ``DW_TAG_subrange_type``
* ``DW_TAG_base_type``
* ``DW_TAG_const_type``
* ``DW_TAG_constant``
* ``DW_TAG_file_type``
* ``DW_TAG_namelist``
* ``DW_TAG_packed_type``
* ``DW_TAG_volatile_type``
* ``DW_TAG_restrict_type``
* ``DW_TAG_interface_type``
* ``DW_TAG_unspecified_type``
* ``DW_TAG_shared_type``

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value).  For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

This produces a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The ``DW_TAG_base_type`` DIE for ``int`` is included, but the
``DW_TAG_pointer_type`` DIE is not, because it does not have a ``DW_AT_name``.
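
The ``.apple_types`` rule can likewise be sketched in C.  The tag codes below
are the standard DWARF values for the tags listed above; ``struct type_die``
and ``in_apple_types()`` are hypothetical names for a simplified DIE model, not
an LLVM API.  Applied to the example, the ``int`` base type qualifies and the
unnamed pointer type does not.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Tag codes for the types listed above (values from the DWARF spec). */
  static const uint16_t apple_type_tags[] = {
    0x01, /* DW_TAG_array_type */
    0x02, /* DW_TAG_class_type */
    0x04, /* DW_TAG_enumeration_type */
    0x0f, /* DW_TAG_pointer_type */
    0x10, /* DW_TAG_reference_type */
    0x12, /* DW_TAG_string_type */
    0x13, /* DW_TAG_structure_type */
    0x15, /* DW_TAG_subroutine_type */
    0x16, /* DW_TAG_typedef */
    0x17, /* DW_TAG_union_type */
    0x1f, /* DW_TAG_ptr_to_member_type */
    0x20, /* DW_TAG_set_type */
    0x21, /* DW_TAG_subrange_type */
    0x24, /* DW_TAG_base_type */
    0x26, /* DW_TAG_const_type */
    0x27, /* DW_TAG_constant */
    0x29, /* DW_TAG_file_type */
    0x2b, /* DW_TAG_namelist */
    0x2d, /* DW_TAG_packed_type */
    0x35, /* DW_TAG_volatile_type */
    0x37, /* DW_TAG_restrict_type */
    0x38, /* DW_TAG_interface_type */
    0x3b, /* DW_TAG_unspecified_type */
    0x40, /* DW_TAG_shared_type */
  };

  /* Hypothetical, simplified view of a type DIE. */
  struct type_die {
    uint16_t tag;
    const char *name; /* DW_AT_name, or NULL if absent */
    bool declaration; /* DW_AT_declaration present with a non-zero value */
  };

  /* True if the DIE belongs in .apple_types: a listed tag, a name, and
     not a forward declaration. */
  bool in_apple_types(const struct type_die *d) {
    if (d->name == NULL || d->declaration)
      return false;
    for (size_t i = 0; i < sizeof apple_type_tags / sizeof apple_type_tags[0]; ++i) {
      if (apple_type_tags[i] == d->tag)
        return true;
    }
    return false;
  }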

The "``.apple_namespaces``" section should contain all ``DW_TAG_namespace``
DIEs.  A namespace without a name is an anonymous namespace, and its name
should be output as "``(anonymous namespace)``" (without the quotes).  This
matches the output of ``abi::__cxa_demangle()`` in the standard C++ library,
which demangles mangled names.
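
A minimal C sketch of the naming rule, using a hypothetical
``namespace_key()`` helper that receives the ``DW_AT_name`` value (or ``NULL``
when the attribute is absent).  Treating an empty name the same as a missing
one is an assumption of this sketch.

.. code-block:: c

  #include <stddef.h>

  /* Key used for unnamed namespaces, matching the demangler's spelling. */
  static const char anonymous_namespace_name[] = "(anonymous namespace)";

  /* Return the .apple_namespaces key for a DW_TAG_namespace DIE whose
     DW_AT_name is `name` (NULL if absent).  An empty name is also treated
     as anonymous, which is an assumption. */
  const char *namespace_key(const char *name) {
    if (name == NULL || name[0] == '\0')
      return anonymous_namespace_name;
    return name;
  }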


Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

The "``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for
an Objective-C class.  The name used in the hash table is the name of the
Objective-C class itself.  If the Objective-C class has a category, then an
entry is made for both the class name without the category and the class name
with the category.  For example, if a DIE at offset 0x1234 has the method name
"``-[NSString(my_additions) stringWithSpecialString:]``", we add an entry for
"``NSString``" that points to DIE 0x1234 and an entry for
"``NSString(my_additions)``" that also points to 0x1234.  This allows us to
quickly track down all Objective-C methods for an Objective-C class when
evaluating expressions.  It is needed because of the dynamic nature of
Objective-C, where anyone can add methods to a class.  The DWARF for
Objective-C methods is also emitted differently from that of C++ classes: the
methods are usually not contained in the class definition but are scattered
across one or more compile units.  Categories can also be defined in different
shared libraries, so we need to be able to quickly find all of the methods and
class functions given the Objective-C class name, or quickly find all methods
and class functions for a class + category name.  This table does not contain
any selector names; it only maps Objective-C class names (or class names +
category) to all of the methods and class functions.  The selectors are added
as function basenames in the "``.apple_names``" section.
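
Deriving the two ``.apple_objc`` keys from a method name is a small parsing
step.  The following C sketch is one possible approach, assuming method names
have the ``-[Class(category) selector]`` or ``+[Class selector]`` shape shown
above; ``objc_class_keys()`` is a hypothetical helper.  It returns a pointer to
the class name plus two lengths, so both keys can be hashed without copying.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* For a method name such as
     "-[NSString(my_additions) stringWithSpecialString:]", return a pointer
     to the class name and set *class_len to the length of "NSString" and
     *class_cat_len to the length of "NSString(my_additions)" (0 when there
     is no category).  Returns NULL if the name has an unexpected shape. */
  const char *objc_class_keys(const char *method, size_t *class_len,
                              size_t *class_cat_len) {
    if ((method[0] != '-' && method[0] != '+') || method[1] != '[')
      return NULL;
    const char *cls = method + 2;
    size_t n = strcspn(cls, "( ]");   /* class name ends at '(' or ' ' */
    if (n == 0 || (cls[n] != '(' && cls[n] != ' '))
      return NULL;
    *class_len = n;
    *class_cat_len = 0;
    if (cls[n] == '(') {
      const char *close = strchr(cls + n, ')');
      if (close == NULL)
        return NULL;
      *class_cat_len = (size_t)(close - cls) + 1; /* include the ')' */
    }
    return cls;
  }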

In the "``.apple_names``" section, for Objective-C functions the full name is
the entire function name with the brackets
("``-[NSString stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").
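
Similarly, the selector basename is the text between the first space and the
closing bracket.  A minimal C sketch, assuming well-formed ``-[...]`` or
``+[...]`` names; ``objc_selector()`` is a hypothetical helper:

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* Return a pointer to the selector in a full Objective-C method name
     such as "-[NSString stringWithCString:]" and store its length in *len,
     or return NULL if the name is not of the form "-[Class selector]" or
     "+[Class selector]" (optionally with a category after the class). */
  const char *objc_selector(const char *full, size_t *len) {
    if ((full[0] != '-' && full[0] != '+') || full[1] != '[')
      return NULL;
    size_t n = strlen(full);
    const char *space = strchr(full, ' ');
    if (space == NULL || full[n - 1] != ']')
      return NULL;
    /* The selector runs from just after the space up to the final ']'. */
    *len = (size_t)((full + n - 1) - (space + 1));
    return space + 1;
  }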

Mach-O Changes
""""""""""""""

The section names for the Apple hash tables given above are for non-Mach-O
files.  For Mach-O files, the sections should be contained in the ``__DWARF``
segment with the following names:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16-character limit)
* "``.apple_objc``" -> "``__apple_objc``"
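
The renaming rule is mechanical.  A minimal C sketch, assuming the
16-character limit refers to the fixed-size ``sectname`` field of a Mach-O
section header; ``macho_section_name()`` is a hypothetical helper.  It uses
``strncpy``, which suits fixed-width name fields because it truncates long
names and NUL-pads short ones.

.. code-block:: c

  #include <string.h>

  /* Mach-O section names are stored in a fixed 16-byte field. */
  #define MACHO_SECTNAME_MAX 16

  /* Map a name such as ".apple_namespaces" to its __DWARF section name by
     replacing the leading '.' with "__" and truncating to 16 characters.
     `out` must hold MACHO_SECTNAME_MAX + 1 bytes; a terminating NUL is
     always written.  Returns 0 on success, or -1 if `name` does not start
     with '.'. */
  int macho_section_name(const char *name, char out[MACHO_SECTNAME_MAX + 1]) {
    if (name[0] != '.')
      return -1;
    out[0] = '_';
    out[1] = '_';
    /* Copy at most 14 more characters; shorter names are NUL-padded. */
    strncpy(out + 2, name + 1, MACHO_SECTNAME_MAX - 2);
    out[MACHO_SECTNAME_MAX] = '\0';
    return 0;
  }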