      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      2                       "http://www.w3.org/TR/html4/strict.dtd">
      3 <html>
      4 <head>
      5   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      6   <title>Source Level Debugging with LLVM</title>
      7   <link rel="stylesheet" href="_static/llvm.css" type="text/css">
      8 </head>
      9 <body>
     10 
     11 <h1>Source Level Debugging with LLVM</h1>
     12 
     13 <table class="layout" style="width:100%">
     14   <tr class="layout">
     15     <td class="left">
     16 <ul>
     17   <li><a href="#introduction">Introduction</a>
     18   <ol>
     19     <li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
     20     <li><a href="#consumers">Debug information consumers</a></li>
     21     <li><a href="#debugopt">Debugging optimized code</a></li>
     22   </ol></li>
     23   <li><a href="#format">Debugging information format</a>
     24   <ol>
     25     <li><a href="#debug_info_descriptors">Debug information descriptors</a>
     26     <ul>
     27       <li><a href="#format_compile_units">Compile unit descriptors</a></li>
     28       <li><a href="#format_files">File descriptors</a></li>
     29       <li><a href="#format_global_variables">Global variable descriptors</a></li>
     30       <li><a href="#format_subprograms">Subprogram descriptors</a></li>
     31       <li><a href="#format_blocks">Block descriptors</a></li>
     32       <li><a href="#format_basic_type">Basic type descriptors</a></li>
     33       <li><a href="#format_derived_type">Derived type descriptors</a></li>
     34       <li><a href="#format_composite_type">Composite type descriptors</a></li>
     35       <li><a href="#format_subrange">Subrange descriptors</a></li>
     36       <li><a href="#format_enumeration">Enumerator descriptors</a></li>
     37       <li><a href="#format_variables">Local variables</a></li>
     38     </ul></li>
     39     <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a>
     40       <ul>
     41       <li><a href="#format_common_declare">llvm.dbg.declare</a></li>
     42       <li><a href="#format_common_value">llvm.dbg.value</a></li>
     43     </ul></li>
     44   </ol></li>
     45   <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li>
     46   <li><a href="#ccxx_frontend">C/C++ front-end specific debug information</a>
     47   <ol>
     48     <li><a href="#ccxx_compile_units">C/C++ source file information</a></li>
     49     <li><a href="#ccxx_global_variable">C/C++ global variable information</a></li>
     50     <li><a href="#ccxx_subprogram">C/C++ function information</a></li>
     51     <li><a href="#ccxx_basic_types">C/C++ basic types</a></li>
     52     <li><a href="#ccxx_derived_types">C/C++ derived types</a></li>
     53     <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li>
     54     <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li>
     55   </ol></li>
     56   <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a>
     57     <ol>
     58       <li><a href="#objcproperty">Debugging Information Extension
     59 	  for Objective C Properties</a>
     60         <ul>
     61 	  <li><a href="#objcpropertyintroduction">Introduction</a></li>
     62 	  <li><a href="#objcpropertyproposal">Proposal</a></li>
     63 	  <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li>
     64 	  <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li>
     65         </ul>
     66       </li>
     67       <li><a href="#acceltable">Name Accelerator Tables</a>
     68         <ul>
     69           <li><a href="#acceltableintroduction">Introduction</a></li>
     70           <li><a href="#acceltablehashes">Hash Tables</a></li>
     71           <li><a href="#acceltabledetails">Details</a></li>
     72           <li><a href="#acceltablecontents">Contents</a></li>
     73           <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li>
     74         </ul>
     75       </li>
     76     </ol>
     77   </li>
     78 </ul>
     79 </td>
     80 </tr></table>
     81 
     82 <div class="doc_author">
     83   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
     84             and <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
     85 </div>
     86 
     87 
     88 <!-- *********************************************************************** -->
     89 <h2><a name="introduction">Introduction</a></h2>
     90 <!-- *********************************************************************** -->
     91 
     92 <div>
     93 
     94 <p>This document is the central repository for all information pertaining to
     95    debug information in LLVM.  It describes the <a href="#format">actual format
     96    that the LLVM debug information</a> takes, which is useful for those
     97    interested in creating front-ends or dealing directly with the information.
     98    Further, this document provides specific examples of what debug information
     99    for C/C++ looks like.</p>
    100 
    101 <!-- ======================================================================= -->
    102 <h3>
    103   <a name="phil">Philosophy behind LLVM debugging information</a>
    104 </h3>
    105 
    106 <div>
    107 
    108 <p>The idea of the LLVM debugging information is to capture how the important
    109    pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
    110    Several design aspects have shaped the solution that appears here.  The
    111    important ones are:</p>
    112 
    113 <ul>
    114   <li>Debugging information should have very little impact on the rest of the
    115       compiler.  No transformations, analyses, or code generators should need to
    116       be modified because of debugging information.</li>
    117 
    118   <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
    119       easily described ways</a> with the debugging information.</li>
    120 
    121   <li>Because LLVM is designed to support arbitrary programming languages,
    122       LLVM-to-LLVM tools should not need to know anything about the semantics of
    123       the source-level-language.</li>
    124 
    125   <li>Source-level languages are often <b>widely</b> different from one another.
    126       LLVM should not put any restrictions on the flavor of the source-language,
    127       and the debugging information should work with any language.</li>
    128 
    129   <li>With code generator support, it should be possible to use an LLVM compiler
    130       to compile a program to native machine code and standard debugging
    131       formats.  This allows compatibility with traditional machine-code level
    132       debuggers, like GDB or DBX.</li>
    133 </ul>
    134 
    135 <p>The approach used by the LLVM implementation is to use a small set
    136    of <a href="#format_common_intrinsics">intrinsic functions</a> to define a
    137    mapping between LLVM program objects and the source-level objects.  The
    138    description of the source-level program is maintained in LLVM metadata
    139    in an <a href="#ccxx_frontend">implementation-defined format</a>
    140    (the C/C++ front-end currently uses working draft 7 of
    141    the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3
    142    standard</a>).</p>
    143 
    144 <p>When a program is being debugged, a debugger interacts with the user and
    145    turns the stored debug information into source-language specific information.
    146    As such, a debugger must be aware of the source-language, and is thus tied to
    147    a specific language or family of languages.</p>
    148 
    149 </div>
    150 
    151 <!-- ======================================================================= -->
    152 <h3>
    153   <a name="consumers">Debug information consumers</a>
    154 </h3>
    155 
    156 <div>
    157 
    158 <p>The role of debug information is to provide meta information normally
    159    stripped away during the compilation process.  This meta information provides
    160    an LLVM user a relationship between generated code and the original program
    161    source code.</p>
    162 
    163 <p>Currently, debug information is consumed by DwarfDebug to produce DWARF
    164    information used by the GDB debugger.  Other targets could use the same
    165    information to produce stabs or other debug forms.</p>
    166 
    167 <p>It would also be reasonable to use debug information to feed profiling tools
    168    for analysis of generated code, or tools for reconstructing the original
    169    source from generated code.</p>
    170 
    171 <p>TODO - expound a bit more.</p>
    172 
    173 </div>
    174 
    175 <!-- ======================================================================= -->
    176 <h3>
    177   <a name="debugopt">Debugging optimized code</a>
    178 </h3>
    179 
    180 <div>
    181 
    182 <p>An extremely high priority of LLVM debugging information is to make it
    183    interact well with optimizations and analysis.  In particular, the LLVM debug
    184    information provides the following guarantees:</p>
    185 
    186 <ul>
    187   <li>LLVM debug information <b>always provides information to accurately read
    188       the source-level state of the program</b>, regardless of which LLVM
    189       optimizations have been run, and without any modification to the
    190       optimizations themselves.  However, some optimizations may impact the
    191       ability to modify the current state of the program with a debugger, such
    192       as setting program variables, or calling functions that have been
    193       deleted.</li>
    194 
    195   <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
    196       debugging information, allowing them to update the debugging information
    197       as they perform aggressive optimizations.  This means that, with effort,
    198       the LLVM optimizers could optimize debug code just as well as non-debug
    199       code.</li>
    200 
    201   <li>LLVM debug information does not prevent optimizations from
    202       happening (for example inlining, basic block reordering/merging/cleanup,
    203       tail duplication, etc).</li>
    204 
    205   <li>LLVM debug information is automatically optimized along with the rest of
    206       the program, using existing facilities.  For example, duplicate
    207       information is automatically merged by the linker, and unused information
    208       is automatically removed.</li>
    209 </ul>
    210 
    211 <p>Basically, the debug information allows you to compile a program with
    212    "<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily
    213    modify the program as it executes from a debugger.  Compiling a program with
    214    "<tt>-O3 -g</tt>" gives you full debug information that is always available
    215    and accurate for reading (e.g., you get accurate stack traces despite tail
    216    call elimination and inlining), but you might lose the ability to modify the
    217    program and call functions which were optimized out of the program, or
    218    inlined away completely.</p>
    219 
    220 <p>The <a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a
    221    framework to test the optimizer's handling of debugging information. It can be
    222    run like this:</p>
    223 
    224 <div class="doc_code">
    225 <pre>
    226 % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
    227 % make TEST=dbgopt
    228 </pre>
    229 </div>
    230 
    231 <p>This tests the impact of debugging information on optimization passes. If
    232    debugging information influences optimization passes, it is reported
    233    as a failure. See the <a href="TestingGuide.html">TestingGuide</a> for more
    234    information on the LLVM test infrastructure and how to run various tests.</p>
    235 
    236 </div>
    237 
    238 </div>
    239 
    240 <!-- *********************************************************************** -->
    241 <h2>
    242   <a name="format">Debugging information format</a>
    243 </h2>
    244 <!-- *********************************************************************** -->
    245 
    246 <div>
    247 
    248 <p>LLVM debugging information has been carefully designed to make it possible
    249    for the optimizer to optimize the program and debugging information without
    250    necessarily having to know anything about debugging information.  In
    251    particular, the use of metadata avoids duplicated debugging information from
    252    the beginning, and the global dead code elimination pass automatically
    253    deletes debugging information for a function if it decides to delete the
    254    function. </p>
    255 
    256 <p>To do this, most of the debugging information (descriptors for types,
    257    variables, functions, source files, etc) is inserted by the language
    258    front-end in the form of LLVM metadata. </p>
    259 
    260 <p>Debug information is designed to be agnostic about the target debugger and
    261    debugging information representation (e.g. DWARF/Stabs/etc).  It uses a
    262    generic pass to decode the information that represents variables, types,
    263    functions, namespaces, etc: this allows for arbitrary source-language
    264    semantics and type-systems to be used, as long as there is a module
    265    written for the target debugger to interpret the information. </p>
    266 
    267 <p>To provide basic functionality, the LLVM debugger does have to make some
    268    assumptions about the source-level language being debugged, though it keeps
    269    these to a minimum.  The only common features that the LLVM debugger assumes
    270    exist are <a href="#format_files">source files</a>,
    271    and <a href="#format_global_variables">program objects</a>.  These abstract
    272    objects are used by a debugger to form stack traces, show information about
    273    local variables, etc.</p>
    274 
    275 <p>This section of the documentation first describes the representation aspects
    276    common to any source-language.  The <a href="#ccxx_frontend">next section</a>
    277    describes the data layout conventions used by the C and C++ front-ends.</p>
    278 
    279 <!-- ======================================================================= -->
    280 <h3>
    281   <a name="debug_info_descriptors">Debug information descriptors</a>
    282 </h3>
    283 
    284 <div>
    285 
    286 <p>In consideration of the complexity and volume of debug information, LLVM
    287    provides a specification for well formed debug descriptors. </p>
    288 
    289 <p>Consumers of LLVM debug information expect the descriptors for program
    290    objects to start in a canonical format, but the descriptors can include
    291    additional information appended at the end that is source-language
    292    specific. All LLVM debugging information is versioned, allowing backwards
    293    compatibility in the case that the core structures need to change in some
    294    way.  Also, all debugging information objects start with a tag to indicate
    295    what type of object it is.  The source-language is allowed to define its own
    296    objects, by using unreserved tag numbers.  We recommend using tags in
    297    the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base =
    298    0x1000).</p>
    299 
    300 <p>The fields of debug descriptors used internally by LLVM
    301    are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>,
    302    <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p>
    303 
    304 <div class="doc_code">
    305 <pre>
    306 !1 = metadata !{
    307   i32,   ;; A tag
    308   ...
    309 }
    310 </pre>
    311 </div>
    312 
    313 <p><a name="LLVMDebugVersion">The first field of a descriptor is always an
    314    <tt>i32</tt> containing a tag value identifying the content of the
    315    descriptor.  The remaining fields are specific to the descriptor.  The values
    316    of tags are loosely bound to the tag values of DWARF information entries.
    317    However, that does not restrict the use of the information supplied to DWARF
    318    targets.  To facilitate versioning of debug information, the tag is augmented
    319    with the current debug version (LLVMDebugVersion = 8 &lt;&lt; 16 or
    320    0x80000 or 524288.)</a></p>
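<p>As a worked illustration of the versioning arithmetic described above, the
   following Python sketch adds the debug version to a DWARF tag and splits a
   versioned value back into its two parts.  The helper names
   <tt>versioned_tag</tt> and <tt>split_tag</tt> are hypothetical and are not
   part of LLVM; only the constants are taken from this document.</p>

```python
# Illustrative sketch only; the helper names are assumptions, not LLVM APIs.
VERSION_SHIFT = 16
LLVM_DEBUG_VERSION = 8 * 2**VERSION_SHIFT  # 8 shifted left by 16 bits = 524288

DW_TAG_COMPILE_UNIT = 17
DW_TAG_USER_BASE = 0x1000  # start of the range recommended for language-defined tags


def versioned_tag(dwarf_tag):
    """Return the first field of a descriptor: DWARF tag plus debug version."""
    return dwarf_tag + LLVM_DEBUG_VERSION


def split_tag(field):
    """Split a versioned first field into (version number, DWARF tag)."""
    version, dwarf_tag = divmod(field, 2**VERSION_SHIFT)
    return version, dwarf_tag


if __name__ == "__main__":
    field = versioned_tag(DW_TAG_COMPILE_UNIT)
    print(field, hex(field))  # 524305 0x80011
    print(split_tag(field))   # (8, 17)
```

<p>Because the version occupies the bits above the low 16, the tag values used
   in this document and the recommended user range both fit below it without
   colliding.</p>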
    321 
    322 <p>The details of the various descriptors follow.</p>
    323 
    324 <!-- ======================================================================= -->
    325 <h4>
    326   <a name="format_compile_units">Compile unit descriptors</a>
    327 </h4>
    328 
    329 <div>
    330 
    331 <div class="doc_code">
    332 <pre>
    333 !0 = metadata !{
    334   i32,       ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
    335              ;; (DW_TAG_compile_unit)
    336   i32,       ;; Unused field.
    337   i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
    338   metadata,  ;; Source file name
    339   metadata,  ;; Source file directory (includes trailing slash)
    340   metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
    341   i1,        ;; True if this is a main compile unit.
    342   i1,        ;; True if this is optimized.
    343   metadata,  ;; Flags
    344   i32,       ;; Runtime version
    345   metadata,  ;; List of enum types
    346   metadata,  ;; List of retained types
    347   metadata,  ;; List of subprograms
    348   metadata   ;; List of global variables
    349 }
    350 </pre>
    351 </div>
    352 
    353 <p>These descriptors contain a source language ID for the file (we use the DWARF
    354    3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>,
    355    <tt>DW_LANG_Cobol74</tt>, etc), three strings describing the filename,
    356    working directory of the compiler, and an identifier string for the compiler
    357    that produced it.</p>
    358 
    359 <p>Compile unit descriptors provide the root context for objects declared in a
    360    specific compilation unit. File descriptors are defined using this context.
    361    These descriptors are collected by a named metadata
    362    <tt>!llvm.dbg.cu</tt>. Each compile unit descriptor keeps track of subprograms,
    363    global variables and type information.</p>
    364 
    365 </div>
    366 
    367 <!-- ======================================================================= -->
    368 <h4>
    369   <a name="format_files">File descriptors</a>
    370 </h4>
    371 
    372 <div>
    373 
    374 <div class="doc_code">
    375 <pre>
    376 !0 = metadata !{
    377   i32,       ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
    378              ;; (DW_TAG_file_type)
    379   metadata,  ;; Source file name
    380   metadata,  ;; Source file directory (includes trailing slash)
    381   metadata   ;; Unused
    382 }
    383 </pre>
    384 </div>
    385 
    386 <p>These descriptors contain information for a file. Global variables and top
    387    level functions would be defined using this context. File descriptors also
    388    provide context for source line correspondence. </p>
    389 
    390 <p>Each input file is encoded as a separate file descriptor in LLVM debugging
    391    information output. </p>
    392 
    393 </div>
    394 
    395 <!-- ======================================================================= -->
    396 <h4>
    397   <a name="format_global_variables">Global variable descriptors</a>
    398 </h4>
    399 
    400 <div>
    401 
    402 <div class="doc_code">
    403 <pre>
    404 !1 = metadata !{
    405   i32,      ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
    406             ;; (DW_TAG_variable)
    407   i32,      ;; Unused field.
    408   metadata, ;; Reference to context descriptor
    409   metadata, ;; Name
    410   metadata, ;; Display name (fully qualified C++ name)
    411   metadata, ;; MIPS linkage name (for C++)
    412   metadata, ;; Reference to file where defined
    413   i32,      ;; Line number where defined
    414   metadata, ;; Reference to type descriptor
    415   i1,       ;; True if the global is local to compile unit (static)
    416   i1,       ;; True if the global is defined in the compile unit (not extern)
    417   {}*       ;; Reference to the global variable
    418 }
    419 </pre>
    420 </div>
    421 
    422 <p>These descriptors provide debug information about global variables.  They
    423 provide details such as name, type and where the variable is defined. All
    424 global variables are collected inside the named metadata
    425 <tt>!llvm.dbg.cu</tt>.</p>
    426 
    427 </div>
    428 
    429 <!-- ======================================================================= -->
    430 <h4>
    431   <a name="format_subprograms">Subprogram descriptors</a>
    432 </h4>
    433 
    434 <div>
    435 
    436 <div class="doc_code">
    437 <pre>
    438 !2 = metadata !{
    439   i32,      ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
    440             ;; (DW_TAG_subprogram)
    441   i32,      ;; Unused field.
    442   metadata, ;; Reference to context descriptor
    443   metadata, ;; Name
    444   metadata, ;; Display name (fully qualified C++ name)
    445   metadata, ;; MIPS linkage name (for C++)
    446   metadata, ;; Reference to file where defined
    447   i32,      ;; Line number where defined
    448   metadata, ;; Reference to type descriptor
    449   i1,       ;; True if the subprogram is local to compile unit (static)
    450   i1,       ;; True if the subprogram is defined in the compile unit (not extern)
    451   i32,      ;; Line number where the scope of the subprogram begins
    452   i32,      ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
    453   i32,      ;; Index into a virtual function
    454   metadata, ;; indicates which base type contains the vtable pointer for the
    455             ;; derived class
    456   i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
    457   i1,       ;; isOptimized
    458   Function *, ;; Pointer to LLVM function
    459   metadata, ;; Lists function template parameters
    460   metadata, ;; Function declaration descriptor
    461   metadata  ;; List of function variables
    462 }
    463 </pre>
    464 </div>
    465 
    466 <p>These descriptors provide debug information about functions, methods and
    467    subprograms.  They provide details such as name, return types and the source
    468    location where the subprogram is defined.
    469 </p>
    470 
    471 </div>
    472 
    473 <!-- ======================================================================= -->
    474 <h4>
    475   <a name="format_blocks">Block descriptors</a>
    476 </h4>
    477 
    478 <div>
    479 
    480 <div class="doc_code">
    481 <pre>
    482 !3 = metadata !{
    483   i32,     ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
    484   metadata,;; Reference to context descriptor
    485   i32,     ;; Line number
    486   i32,     ;; Column number
    487   metadata,;; Reference to source file
    488   i32      ;; Unique ID to identify blocks from a template function
    489 }
    490 </pre>
    491 </div>
    492 
    493 <p>This descriptor provides debug information about nested blocks within a
    494    subprogram. The line number and column numbers are used to distinguish
    495    two lexical blocks at the same depth. </p>
    496 
    497 <div class="doc_code">
    498 <pre>
    499 !3 = metadata !{
    500   i32,     ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
    501   metadata, ;; Reference to the scope we're annotating with a file change
    502   metadata  ;; Reference to the file the scope is enclosed in.
    503 }
    504 </pre>
    505 </div>
    506 
    507 <p>This descriptor provides a wrapper around a lexical scope to handle file
    508    changes in the middle of a lexical block.</p>
    509 
    510 </div>
    511 
    512 <!-- ======================================================================= -->
    513 <h4>
    514   <a name="format_basic_type">Basic type descriptors</a>
    515 </h4>
    516 
    517 <div>
    518 
    519 <div class="doc_code">
    520 <pre>
    521 !4 = metadata !{
    522   i32,      ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
    523             ;; (DW_TAG_base_type)
    524   metadata, ;; Reference to context
    525   metadata, ;; Name (may be "" for anonymous types)
    526   metadata, ;; Reference to file where defined (may be NULL)
    527   i32,      ;; Line number where defined (may be 0)
    528   i64,      ;; Size in bits
    529   i64,      ;; Alignment in bits
    530   i64,      ;; Offset in bits
    531   i32,      ;; Flags
    532   i32       ;; DWARF type encoding
    533 }
    534 </pre>
    535 </div>
    536 
    537 <p>These descriptors define primitive types used in the code, for example int,
    538    bool and float.  The context provides the scope of the type, which is usually
    539    the top level.  Since basic types are not usually user defined, the context
    540    and line number can be left as NULL and 0.  The size, alignment and offset
    541    are expressed in bits and can be 64 bit values.  The alignment is used to
    542    round the offset when embedded in a
    543    <a href="#format_composite_type">composite type</a> (for example, to keep
    544    double-precision floats on 64 bit boundaries).  The offset is the bit offset
    545    if embedded in a <a href="#format_composite_type">composite type</a>.</p>
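<p>The rounding of offsets to alignment boundaries mentioned above can be
   sketched as follows.  This is a minimal illustration that assumes a simple
   C-like sequential layout; the function names <tt>round_up_to_alignment</tt>
   and <tt>layout</tt> are hypothetical, and a real front-end computes its
   layout according to its own language rules.</p>

```python
# Illustrative sketch only; assumes members are placed in declaration order.
def round_up_to_alignment(offset_bits, align_bits):
    """Round a bit offset up to the next multiple of align_bits."""
    if align_bits == 0:
        return offset_bits  # assumption: zero alignment means no constraint
    return -(-offset_bits // align_bits) * align_bits


def layout(members):
    """Assign bit offsets to (name, size_bits, align_bits) members in order."""
    offsets = {}
    next_free = 0
    for name, size_bits, align_bits in members:
        offset = round_up_to_alignment(next_free, align_bits)
        offsets[name] = offset
        next_free = offset + size_bits
    return offsets


if __name__ == "__main__":
    # An 8 bit char followed by a 64 bit double aligned to 64 bits.
    print(layout([("c", 8, 8), ("d", 64, 64)]))  # {'c': 0, 'd': 64}
```

<p>In this example the second member cannot start at bit 8, so its offset is
   rounded up to bit 64, leaving 56 bits of padding after the first member.</p>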
    546 
    547 <p>The type encoding provides the details of the type.  The values are typically
    548    one of the following:</p>
    549 
    550 <div class="doc_code">
    551 <pre>
    552 DW_ATE_address       = 1
    553 DW_ATE_boolean       = 2
    554 DW_ATE_float         = 4
    555 DW_ATE_signed        = 5
    556 DW_ATE_signed_char   = 6
    557 DW_ATE_unsigned      = 7
    558 DW_ATE_unsigned_char = 8
    559 </pre>
    560 </div>
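<p>To make the field order concrete, the following Python sketch assembles the
   ten fields of a basic type descriptor as a plain tuple.  It only models the
   layout shown above and is not how LLVM builds metadata; the function
   <tt>basic_type</tt> is hypothetical, and the 32 bit size chosen for
   <tt>int</tt> in the usage example is an assumption about the target.</p>

```python
# Illustrative model of the basic type descriptor layout; not an LLVM API.
LLVM_DEBUG_VERSION = 524288  # 8 shifted left by 16 bits
DW_TAG_BASE_TYPE = 36

DW_ATE = {
    "address": 1,
    "boolean": 2,
    "float": 4,
    "signed": 5,
    "signed_char": 6,
    "unsigned": 7,
    "unsigned_char": 8,
}


def basic_type(name, size_bits, align_bits, encoding):
    """Return the descriptor fields in the order listed in this section."""
    return (
        DW_TAG_BASE_TYPE + LLVM_DEBUG_VERSION,  # tag
        None,                                   # context (NULL for basic types)
        name,                                   # name
        None,                                   # file where defined (NULL)
        0,                                      # line number (0)
        size_bits,                              # size in bits
        align_bits,                             # alignment in bits
        0,                                      # offset in bits
        0,                                      # flags
        DW_ATE[encoding],                       # DWARF type encoding
    )


if __name__ == "__main__":
    print(basic_type("int", 32, 32, "signed"))
```

<p>For the assumed 32 bit <tt>int</tt>, the first field is 524324 (36 plus the
   debug version) and the last field is 5 (<tt>DW_ATE_signed</tt>).</p>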
    561 
    562 </div>
    563 
    564 <!-- ======================================================================= -->
    565 <h4>
    566   <a name="format_derived_type">Derived type descriptors</a>
    567 </h4>
    568 
    569 <div>
    570 
    571 <div class="doc_code">
    572 <pre>
    573 !5 = metadata !{
    574   i32,      ;; Tag (see below)
    575   metadata, ;; Reference to context
    576   metadata, ;; Name (may be "" for anonymous types)
    577   metadata, ;; Reference to file where defined (may be NULL)
    578   i32,      ;; Line number where defined (may be 0)
    579   i64,      ;; Size in bits
    580   i64,      ;; Alignment in bits
    581   i64,      ;; Offset in bits
    582   i32,      ;; Flags to encode attributes, e.g. private
    583   metadata, ;; Reference to type derived from
    584   metadata, ;; (optional) Name of the Objective C property associated with
    585             ;; an Objective-C ivar
    586   metadata, ;; (optional) Name of the Objective C property getter selector.
    587   metadata, ;; (optional) Name of the Objective C property setter selector.
    588   i32       ;; (optional) Objective C property attributes.
    589 }
    590 </pre>
    591 </div>
    592 
    593 <p>These descriptors are used to define types derived from other types.  The
    594 value of the tag varies depending on the meaning.  The following are possible
    595 tag values:</p>
    596 
    597 <div class="doc_code">
    598 <pre>
    599 DW_TAG_formal_parameter = 5
    600 DW_TAG_member           = 13
    601 DW_TAG_pointer_type     = 15
    602 DW_TAG_reference_type   = 16
    603 DW_TAG_typedef          = 22
    604 DW_TAG_const_type       = 38
    605 DW_TAG_volatile_type    = 53
    606 DW_TAG_restrict_type    = 55
    607 </pre>
    608 </div>
    609 
    610 <p><tt>DW_TAG_member</tt> is used to define a member of
    611    a <a href="#format_composite_type">composite type</a>
    612    or <a href="#format_subprograms">subprogram</a>.  The type of the member is
    613    the <a href="#format_derived_type">derived
    614    type</a>. <tt>DW_TAG_formal_parameter</tt> is used to define a member which
    615    is a formal argument of a subprogram.</p>
    616 
    617 <p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p>
    618 
    619 <p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>,
    620    <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and
    621    <tt>DW_TAG_restrict_type</tt> are used to qualify
    622    the <a href="#format_derived_type">derived type</a>. </p>
    623 
    624 <p><a href="#format_derived_type">Derived type</a> location can be determined
    625    from the context and line number.  The size, alignment and offset are
    626    expressed in bits and can be 64 bit values.  The alignment is used to round
    627    the offset when embedded in a <a href="#format_composite_type">composite
    628    type</a> (for example, to keep double-precision floats on 64 bit
    629    boundaries).  The offset is the bit offset if embedded in
    630    a <a href="#format_composite_type">composite type</a>.</p>
    631 
    632 <p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL.
    633 </p>
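<p>The way qualifying tags chain from one derived type to the next can be
   sketched in Python.  In this simplified model, which is an assumption made
   for illustration, a derived type is a <tt>(tag, base)</tt> pair, a basic type
   is just its name, and NULL is <tt>None</tt>.  Only a subset of the tags
   listed above is handled, and <tt>describe</tt> is a hypothetical helper.</p>

```python
# Illustrative sketch only; the (tag, base) pairs are a simplified stand-in
# for real descriptors, which carry more fields.
DW_TAG_POINTER_TYPE = 15
DW_TAG_CONST_TYPE = 38
DW_TAG_VOLATILE_TYPE = 53
DW_TAG_RESTRICT_TYPE = 55

QUALIFIER_TEXT = {
    DW_TAG_POINTER_TYPE: "*",
    DW_TAG_CONST_TYPE: "const",
    DW_TAG_VOLATILE_TYPE: "volatile",
    DW_TAG_RESTRICT_TYPE: "restrict",
}


def describe(node):
    """Render a type as C-like text, reading qualifiers outward from the base."""
    if node is None:
        return "void"  # a type derived from NULL stands for void
    if isinstance(node, str):
        return node    # a basic type, represented here by its name
    tag, base = node
    return describe(base) + " " + QUALIFIER_TEXT[tag]


if __name__ == "__main__":
    print(describe((DW_TAG_POINTER_TYPE, None)))                         # void *
    print(describe((DW_TAG_POINTER_TYPE, (DW_TAG_CONST_TYPE, "char"))))  # char const *
```

<p>Each descriptor qualifies the type it is derived from, so a pointer to a
   constant <tt>char</tt> is a pointer type derived from a const type derived
   from the basic type.</p>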
    634 
    635 </div>
    636 
    637 <!-- ======================================================================= -->
    638 <h4>
    639   <a name="format_composite_type">Composite type descriptors</a>
    640 </h4>
    641 
    642 <div>
    643 
    644 <div class="doc_code">
    645 <pre>
    646 !6 = metadata !{
    647   i32,      ;; Tag (see below)
    648   metadata, ;; Reference to context
    649   metadata, ;; Name (may be "" for anonymous types)
    650   metadata, ;; Reference to file where defined (may be NULL)
    651   i32,      ;; Line number where defined (may be 0)
    652   i64,      ;; Size in bits
    653   i64,      ;; Alignment in bits
    654   i64,      ;; Offset in bits
    655   i32,      ;; Flags
    656   metadata, ;; Reference to type derived from
    657   metadata, ;; Reference to array of member descriptors
    658   i32       ;; Runtime languages
    659 }
    660 </pre>
    661 </div>
    662 
    663 <p>These descriptors are used to define types that are composed of 0 or more
    664 elements.  The value of the tag varies depending on the meaning.  The following
    665 are possible tag values:</p>
    666 
    667 <div class="doc_code">
    668 <pre>
    669 DW_TAG_array_type       = 1
    670 DW_TAG_enumeration_type = 4
    671 DW_TAG_structure_type   = 19
    672 DW_TAG_union_type       = 23
    673 DW_TAG_vector_type      = 259
    674 DW_TAG_subroutine_type  = 21
    675 DW_TAG_inheritance      = 28
    676 </pre>
    677 </div>
    678 
    679 <p>The vector flag indicates that an array type is a native packed vector.</p>
    680 
    681 <p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types
    682    (tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange
    683    descriptors</a>, each representing the range of subscripts at that level of
    684    indexing.</p>
    685 
<p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are
   <a href="#format_enumeration">enumerator descriptors</a>, each representing
   the definition of an enumeration value for the set. All enumeration type
   descriptors are collected inside the named metadata
   <tt>!llvm.dbg.cu</tt>.</p>
    691 
    692 <p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag
    693    = <tt>DW_TAG_union_type</tt>) types are any one of
    694    the <a href="#format_basic_type">basic</a>,
    695    <a href="#format_derived_type">derived</a>
    696    or <a href="#format_composite_type">composite</a> type descriptors, each
    697    representing a field member of the structure or union.</p>
    698 
<p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors
   provide information about base classes, static members and member
   functions. If a member is a <a href="#format_derived_type">derived type
   descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type
   represents a base class. If the member is
   a <a href="#format_global_variables">global variable descriptor</a> then it
   represents a static member.  And, if the member is
   a <a href="#format_subprograms">subprogram descriptor</a> then it represents
   a member function.  For static members and member
   functions, <tt>getName()</tt> returns the member's linkage name, that is, the
   C++ mangled name.  <tt>getDisplayName()</tt> returns the simplified version
   of the name.</p>
    710 
    711 <p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type
    712    elements is the return type for the subroutine.  The remaining elements are
    713    the formal arguments to the subroutine.</p>
    714 
<p>The location of a <a href="#format_composite_type">composite type</a> can be
   determined from the context and line number.  The size, alignment and
   offset are expressed in bits and can be 64-bit values.  The alignment is used
   to round the offset when the type is embedded in
   a <a href="#format_composite_type">composite type</a> (for example, to keep
   doubles on 64-bit boundaries).  The offset is the bit offset if embedded
   in a <a href="#format_composite_type">composite type</a>.</p>
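
<p>The rounding described above can be sketched in C++ as follows. This is a
   minimal illustration, not code taken from LLVM: the function name is
   hypothetical, and treating an alignment of 0 as "no constraint" is an
   assumption made for the sketch.</p>

```cpp
#include <cassert>
#include <cstdint>

// Round a bit offset up to the next multiple of the alignment (both in bits).
// Assumption: an alignment of 0 means "no alignment constraint".
constexpr std::uint64_t alignOffsetInBits(std::uint64_t offset,
                                          std::uint64_t align) {
  return align == 0 ? offset : (offset + align - 1) / align * align;
}

// A 64-bit aligned member following a 32-bit member starts at bit 64.
static_assert(alignOffsetInBits(32, 64) == 64, "rounded up to 64");
// An offset that is already aligned is left unchanged.
static_assert(alignOffsetInBits(64, 64) == 64, "already aligned");
```

<p>For a structure holding an <tt>int</tt> followed by a <tt>double</tt>, this
   places the second member at bit offset 64 rather than 32 on a target where
   <tt>double</tt> is aligned to 64 bits.</p>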
    722 
    723 </div>
    724 
    725 <!-- ======================================================================= -->
    726 <h4>
    727   <a name="format_subrange">Subrange descriptors</a>
    728 </h4>
    729 
    730 <div>
    731 
    732 <div class="doc_code">
    733 <pre>
    734 !42 = metadata !{
    735   i32,    ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type)
    736   i64,    ;; Low value
    737   i64     ;; High value
    738 }
    739 </pre>
    740 </div>
    741 
<p>These descriptors are used to define ranges of array subscripts for an array
   <a href="#format_composite_type">composite type</a>.  The low value defines
   the lower bound, typically zero for C/C++.  The high value is the upper
   bound.  Values are 64 bit.  High - low + 1 is the size of the array.  If
   low &gt; high, the array bounds are not included in generated debugging
   information.
</p>
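
<p>The size rule can be illustrated with a small C++ sketch. The
   <tt>Subrange</tt> struct and helper names are hypothetical stand-ins for the
   descriptor fields, and returning 0 for an unbounded range is an assumption
   of the sketch.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two value fields of a subrange descriptor.
struct Subrange {
  std::int64_t low;
  std::int64_t high;
};

// Bounds are only meaningful when low <= high.
constexpr bool hasBounds(Subrange r) { return r.low <= r.high; }

// Number of elements at this level of indexing: high - low + 1.
// Assumption: report 0 when the range carries no bounds (low > high).
constexpr std::int64_t elementCount(Subrange r) {
  return hasBounds(r) ? r.high - r.low + 1 : 0;
}

// "int a[10]" in C would use low = 0, high = 9.
static_assert(elementCount(Subrange{0, 9}) == 10, "ten elements");
// low > high marks an array without recorded bounds.
static_assert(!hasBounds(Subrange{1, 0}), "no bounds");
```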
    748 
    749 </div>
    750 
    751 <!-- ======================================================================= -->
    752 <h4>
    753   <a name="format_enumeration">Enumerator descriptors</a>
    754 </h4>
    755 
    756 <div>
    757 
    758 <div class="doc_code">
    759 <pre>
    760 !6 = metadata !{
    761   i32,      ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
    762             ;; (DW_TAG_enumerator)
    763   metadata, ;; Name
    764   i64       ;; Value
    765 }
    766 </pre>
    767 </div>
    768 
<p>These descriptors are used to define members of an
   enumeration <a href="#format_composite_type">composite type</a>. Each
   descriptor associates a name with its value.</p>
    772 
    773 </div>
    774 
    775 <!-- ======================================================================= -->
    776 <h4>
    777   <a name="format_variables">Local variables</a>
    778 </h4>
    779 
    780 <div>
    781 
    782 <div class="doc_code">
    783 <pre>
    784 !7 = metadata !{
    785   i32,      ;; Tag (see below)
    786   metadata, ;; Context
    787   metadata, ;; Name
    788   metadata, ;; Reference to file where defined
    789   i32,      ;; 24 bit - Line number where defined
    790             ;; 8 bit - Argument number. 1 indicates 1st argument.
    791   metadata, ;; Type descriptor
    792   i32,      ;; flags
    793   metadata  ;; (optional) Reference to inline location
    794 }
    795 </pre>
    796 </div>
    797 
<p>These descriptors are used to define variables local to a subprogram.  The
   value of the tag depends on the usage of the variable:</p>
    800 
    801 <div class="doc_code">
    802 <pre>
    803 DW_TAG_auto_variable   = 256
    804 DW_TAG_arg_variable    = 257
    805 DW_TAG_return_variable = 258
    806 </pre>
    807 </div>
    808 
    809 <p>An auto variable is any variable declared in the body of the function.  An
    810    argument variable is any variable that appears as a formal argument to the
    811    function.  A return variable is used to track the result of a function and
    812    has no source correspondent.</p>
    813 
<p>The context is either the subprogram or block where the variable is defined.
   Name is the source variable name.  Context and line indicate where the
   variable was defined. Type descriptor defines the declared type of the
   variable.</p>
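
<p>The fifth field of the descriptor shares one <tt>i32</tt> between the line
   number (24 bits) and the argument number (8 bits). The C++ sketch below shows
   one possible packing. It assumes the line number sits in the low 24 bits and
   the argument number in the high 8 bits; the function names are
   hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>

// Assumption: low 24 bits hold the line number, high 8 bits the argument
// number (1 for the first argument, 0 for non-argument variables).
constexpr std::uint32_t packLineAndArg(std::uint32_t line,
                                       std::uint32_t argNo) {
  return (line & 0x00ffffffu) | ((argNo & 0xffu) << 24);
}

// Recover the line number from the packed field.
constexpr std::uint32_t lineOf(std::uint32_t field) {
  return field & 0x00ffffffu;
}

// Recover the argument number from the packed field.
constexpr std::uint32_t argNumberOf(std::uint32_t field) {
  return field >> 24;
}

// An auto variable on line 2 has no argument number, so the field is just 2.
static_assert(packLineAndArg(2, 0) == 2, "plain line number");
// The first argument of a function declared on line 1.
static_assert(argNumberOf(packLineAndArg(1, 1)) == 1, "first argument");
```

<p>With this layout, a descriptor for an auto variable carries only the line
   number in the field, since its argument number is 0.</p>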
    818 
    819 </div>
    820 
    821 </div>
    822 
    823 <!-- ======================================================================= -->
    824 <h3>
    825   <a name="format_common_intrinsics">Debugger intrinsic functions</a>
    826 </h3>
    827 
    828 <div>
    829 
    830 <p>LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to
    831    provide debug information at various points in generated code.</p>
    832 
    833 <!-- ======================================================================= -->
    834 <h4>
    835   <a name="format_common_declare">llvm.dbg.declare</a>
    836 </h4>
    837 
    838 <div>
    839 <pre>
    840   void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata)
    841 </pre>
    842 
    843 <p>This intrinsic provides information about a local element (e.g., variable). The
    844    first argument is metadata holding the alloca for the variable. The
    845    second argument is metadata containing a description of the variable.</p>
    846 </div>
    847 
    848 <!-- ======================================================================= -->
    849 <h4>
    850   <a name="format_common_value">llvm.dbg.value</a>
    851 </h4>
    852 
    853 <div>
    854 <pre>
    855   void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata)
    856 </pre>
    857 
    858 <p>This intrinsic provides information when a user source variable is set to a
    859    new value.  The first argument is the new value (wrapped as metadata).  The
    860    second argument is the offset in the user source variable where the new value
    861    is written.  The third argument is metadata containing a description of the
    862    user source variable.</p>
    863 </div>
    864 
    865 </div>
    866 
    867 <!-- ======================================================================= -->
    868 <h3>
    869   <a name="format_common_lifetime">Object lifetimes and scoping</a>
    870 </h3>
    871 
    872 <div>
    873 <p>In many languages, the local variables in functions can have their lifetimes
    874    or scopes limited to a subset of a function.  In the C family of languages,
    875    for example, variables are only live (readable and writable) within the
    876    source block that they are defined in.  In functional languages, values are
    877    only readable after they have been defined.  Though this is a very obvious
   concept, it is non-trivial to model in LLVM, because LLVM has no notion of
    879    scoping in this sense, and does not want to be tied to a language's scoping
    880    rules.</p>
    881 
    882 <p>In order to handle this, the LLVM debug format uses the metadata attached to
   LLVM instructions to encode line number and scoping information. Consider
    884    the following C fragment, for example:</p>
    885 
    886 <div class="doc_code">
    887 <pre>
1.  void foo() {
2.    int X = 21;
3.    int Y = 22;
4.    {
5.      int Z = 23;
6.      Z = X + Y;
7.    }
8.    X = Y;
9.  }
    897 </pre>
    898 </div>
    899 
    900 <p>Compiled to LLVM, this function would be represented like this:</p>
    901 
    902 <div class="doc_code">
    903 <pre>
    904 define void @foo() nounwind ssp {
    905 entry:
    906   %X = alloca i32, align 4                        ; &lt;i32*&gt; [#uses=4]
    907   %Y = alloca i32, align 4                        ; &lt;i32*&gt; [#uses=4]
    908   %Z = alloca i32, align 4                        ; &lt;i32*&gt; [#uses=3]
    909   %0 = bitcast i32* %X to {}*                     ; &lt;{}*&gt; [#uses=1]
    910   call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
    911   store i32 21, i32* %X, !dbg !8
    912   %1 = bitcast i32* %Y to {}*                     ; &lt;{}*&gt; [#uses=1]
    913   call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
    914   store i32 22, i32* %Y, !dbg !11
    915   %2 = bitcast i32* %Z to {}*                     ; &lt;{}*&gt; [#uses=1]
    916   call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
    917   store i32 23, i32* %Z, !dbg !15
    918   %tmp = load i32* %X, !dbg !16                   ; &lt;i32&gt; [#uses=1]
    919   %tmp1 = load i32* %Y, !dbg !16                  ; &lt;i32&gt; [#uses=1]
    920   %add = add nsw i32 %tmp, %tmp1, !dbg !16        ; &lt;i32&gt; [#uses=1]
    921   store i32 %add, i32* %Z, !dbg !16
    922   %tmp2 = load i32* %Y, !dbg !17                  ; &lt;i32&gt; [#uses=1]
    923   store i32 %tmp2, i32* %X, !dbg !17
    924   ret void, !dbg !18
    925 }
    926 
    927 declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
    928 
    929 !0 = metadata !{i32 459008, metadata !1, metadata !"X",
    930                 metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
    931 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    932 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
    933                metadata !"foo", metadata !3, i32 1, metadata !4,
    934                i1 false, i1 true}; [DW_TAG_subprogram ]
    935 !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
    936                 metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
    937                 i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
    938 !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
    939                 i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
    940 !5 = metadata !{null}
    941 !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
    942                 i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
    943 !7 = metadata !{i32 2, i32 7, metadata !1, null}
    944 !8 = metadata !{i32 2, i32 3, metadata !1, null}
    945 !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
    946                 metadata !6}; [ DW_TAG_auto_variable ]
    947 !10 = metadata !{i32 3, i32 7, metadata !1, null}
    948 !11 = metadata !{i32 3, i32 3, metadata !1, null}
    949 !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
    950                  metadata !6}; [ DW_TAG_auto_variable ]
    951 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
    952 !14 = metadata !{i32 5, i32 9, metadata !13, null}
    953 !15 = metadata !{i32 5, i32 5, metadata !13, null}
    954 !16 = metadata !{i32 6, i32 5, metadata !13, null}
    955 !17 = metadata !{i32 8, i32 3, metadata !1, null}
    956 !18 = metadata !{i32 9, i32 1, metadata !2, null}
    957 </pre>
    958 </div>
    959 
    960 <p>This example illustrates a few important details about LLVM debugging
    961    information. In particular, it shows how the <tt>llvm.dbg.declare</tt>
    962    intrinsic and location information, which are attached to an instruction,
    963    are applied together to allow a debugger to analyze the relationship between
    964    statements, variable definitions, and the code used to implement the
    965    function.</p>
    966 
    967 <div class="doc_code">
    968 <pre>
    969 call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7
    970 </pre>
    971 </div>
    972 
    973 <p>The first intrinsic
    974    <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
    975    encodes debugging information for the variable <tt>X</tt>. The metadata
    976    <tt>!dbg !7</tt> attached to the intrinsic provides scope information for the
    977    variable <tt>X</tt>.</p>
    978 
    979 <div class="doc_code">
    980 <pre>
    981 !7 = metadata !{i32 2, i32 7, metadata !1, null}
    982 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
    983 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
    984                 metadata !"foo", metadata !"foo", metadata !3, i32 1,
    985                 metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
    986 </pre>
    987 </div>
    988 
    989 <p>Here <tt>!7</tt> is metadata providing location information. It has four
    990    fields: line number, column number, scope, and original scope. The original
    991    scope represents inline location if this instruction is inlined inside a
    992    caller, and is null otherwise. In this example, scope is encoded by
    993    <tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope
    994    <tt>!2</tt>, where <tt>!2</tt> is a
    995    <a href="#format_subprograms">subprogram descriptor</a>. This way the
    996    location information attached to the intrinsics indicates that the
    997    variable <tt>X</tt> is declared at line number 2 at a function level scope in
    998    function <tt>foo</tt>.</p>
    999 
<p>Now let's take another example.</p>
   1001 
   1002 <div class="doc_code">
   1003 <pre>
   1004 call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14
   1005 </pre>
   1006 </div>
   1007 
<p>The third
   <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
   intrinsic in the listing encodes debugging information for the variable
   <tt>Z</tt>. The metadata <tt>!dbg !14</tt> attached to the intrinsic provides
   scope information for the variable <tt>Z</tt>.</p>
   1013 
   1014 <div class="doc_code">
   1015 <pre>
   1016 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
   1017 !14 = metadata !{i32 5, i32 9, metadata !13, null}
   1018 </pre>
   1019 </div>
   1020 
   1021 <p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and
   1022    column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope
   1023    itself resides inside of lexical scope <tt>!1</tt> described above.</p>
   1024 
<p>The scope information attached to each instruction provides a
   straightforward way to find instructions covered by a scope.</p>
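
<p>To make the scope relationships concrete, the following C++ sketch models
   locations and scopes with plain structs and walks the parent chain. The
   <tt>Scope</tt> and <tt>Location</tt> types and the helper functions are
   hypothetical simplifications written for this illustration; they are not the
   LLVM debug info classes.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for a scope descriptor (subprogram or lexical block).
struct Scope {
  bool isSubprogram;    // true for a subprogram, false for a lexical block
  const Scope *parent;  // enclosing scope, or nullptr at the top
};

// Hypothetical stand-in for a location: line, column and scope.
struct Location {
  unsigned line;
  unsigned column;
  const Scope *scope;
};

// Walk outward from a scope until a subprogram is reached.
inline const Scope *enclosingSubprogram(const Scope *s) {
  while (s && !s->isSubprogram)
    s = s->parent;
  return s;
}

// Count the lexical blocks between a location and its subprogram.
inline unsigned blockDepth(const Location &loc) {
  unsigned depth = 0;
  for (const Scope *s = loc.scope; s && !s->isSubprogram; s = s->parent)
    ++depth;
  return depth;
}

// True if `inner` is `outer` itself or is nested somewhere inside it.
inline bool isWithin(const Scope *inner, const Scope *outer) {
  for (; inner; inner = inner->parent)
    if (inner == outer)
      return true;
  return false;
}
```

<p>Modelling <tt>!2</tt> as a subprogram, <tt>!1</tt> as a block inside it and
   <tt>!13</tt> as a block inside <tt>!1</tt>, the location <tt>!14</tt> of
   <tt>Z</tt> is within <tt>!1</tt>, while the location <tt>!7</tt> of
   <tt>X</tt> is not within <tt>!13</tt>. A pass could use a check like
   <tt>isWithin</tt> to collect the instructions covered by a given scope.</p>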
   1027 
   1028 </div>
   1029 
   1030 </div>
   1031 
   1032 <!-- *********************************************************************** -->
   1033 <h2>
   1034   <a name="ccxx_frontend">C/C++ front-end specific debug information</a>
   1035 </h2>
   1036 <!-- *********************************************************************** -->
   1037 
   1038 <div>
   1039 
<p>The C and C++ front-ends represent information about the program in a format
   that is effectively identical
   to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in
   terms of information content.  This allows code generators to trivially
   support native debuggers by generating standard DWARF information, and
   contains enough information for non-DWARF targets to translate it as
   needed.</p>
   1047 
   1048 <p>This section describes the forms used to represent C and C++ programs. Other
   1049    languages could pattern themselves after this (which itself is tuned to
   1050    representing programs in the same way that DWARF 3 does), or they could
   1051    choose to provide completely different forms if they don't fit into the DWARF
   1052    model.  As support for debugging information gets added to the various LLVM
   1053    source-language front-ends, the information used should be documented
   1054    here.</p>
   1055 
   1056 <p>The following sections provide examples of various C/C++ constructs and the
   1057    debug information that would best describe those constructs.</p>
   1058 
   1059 <!-- ======================================================================= -->
   1060 <h3>
   1061   <a name="ccxx_compile_units">C/C++ source file information</a>
   1062 </h3>
   1063 
   1064 <div>
   1065 
   1066 <p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located
   1067    in the directory <tt>/Users/mine/sources</tt>, the following code:</p>
   1068 
   1069 <div class="doc_code">
   1070 <pre>
   1071 #include "MyHeader.h"
   1072 
   1073 int main(int argc, char *argv[]) {
   1074   return 0;
   1075 }
   1076 </pre>
   1077 </div>
   1078 
   1079 <p>a C/C++ front-end would generate the following descriptors:</p>
   1080 
   1081 <div class="doc_code">
   1082 <pre>
   1083 ...
   1084 ;;
   1085 ;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
   1086 ;;
   1087 !2 = metadata !{
   1088   i32 524305,    ;; Tag
   1089   i32 0,         ;; Unused
   1090   i32 4,         ;; Language Id
   1091   metadata !"MySource.cpp",
   1092   metadata !"/Users/mine/sources",
   1093   metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
   1094   i1 true,       ;; Main Compile Unit
   1095   i1 false,      ;; Optimized compile unit
   1096   metadata !"",  ;; Compiler flags
   1097   i32 0}         ;; Runtime version
   1098 
   1099 ;;
   1100 ;; Define the file for the file "/Users/mine/sources/MySource.cpp".
   1101 ;;
   1102 !1 = metadata !{
   1103   i32 524329,    ;; Tag
   1104   metadata !"MySource.cpp",
   1105   metadata !"/Users/mine/sources",
   1106   metadata !2    ;; Compile unit
   1107 }
   1108 
   1109 ;;
;; Define the file for the file "/Users/mine/sources/MyHeader.h"
;;
!3 = metadata !{
  i32 524329,    ;; Tag
  metadata !"MyHeader.h",
   1115   metadata !"/Users/mine/sources",
   1116   metadata !2    ;; Compile unit
   1117 }
   1118 
   1119 ...
   1120 </pre>
   1121 </div>
   1122 
<p><tt>llvm::Instruction</tt> provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR
using <tt>Instruction::getMetadata()</tt> and
<tt>DILocation::getLineNumber()</tt>.</p>

<div class="doc_code">
<pre>
if (MDNode *N = I-&gt;getMetadata("dbg")) {  // Here I is an LLVM instruction
  DILocation Loc(N);                      // DILocation is in DebugInfo.h
  unsigned Line = Loc.getLineNumber();
  StringRef File = Loc.getFilename();
  StringRef Dir = Loc.getDirectory();
}
</pre>
</div>
   1135 </div>
   1136 
   1137 <!-- ======================================================================= -->
   1138 <h3>
   1139   <a name="ccxx_global_variable">C/C++ global variable information</a>
   1140 </h3>
   1141 
   1142 <div>
   1143 
   1144 <p>Given an integer global variable declared as follows:</p>
   1145 
   1146 <div class="doc_code">
   1147 <pre>
   1148 int MyGlobal = 100;
   1149 </pre>
   1150 </div>
   1151 
   1152 <p>a C/C++ front-end would generate the following descriptors:</p>
   1153 
   1154 <div class="doc_code">
   1155 <pre>
   1156 ;;
   1157 ;; Define the global itself.
   1158 ;;
@MyGlobal = global i32 100
   1160 ...
   1161 ;;
   1162 ;; List of debug info of globals
   1163 ;;
   1164 !llvm.dbg.cu = !{!0}
   1165 
   1166 ;; Define the compile unit.
   1167 !0 = metadata !{
   1168   i32 786449,                       ;; Tag
   1169   i32 0,                            ;; Context
   1170   i32 4,                            ;; Language
   1171   metadata !"foo.cpp",              ;; File
   1172   metadata !"/Volumes/Data/tmp",    ;; Directory
   1173   metadata !"clang version 3.1 ",   ;; Producer
   1174   i1 true,                          ;; Deprecated field
   1175   i1 false,                         ;; "isOptimized"?
   1176   metadata !"",                     ;; Flags
   1177   i32 0,                            ;; Runtime Version
   1178   metadata !1,                      ;; Enum Types
   1179   metadata !1,                      ;; Retained Types
   1180   metadata !1,                      ;; Subprograms
   1181   metadata !3                       ;; Global Variables
   1182 } ; [ DW_TAG_compile_unit ]
   1183 
   1184 ;; The Array of Global Variables
   1185 !3 = metadata !{
   1186   metadata !4
   1187 }
   1188 
   1189 !4 = metadata !{
   1190   metadata !5
   1191 }
   1192 
   1193 ;;
   1194 ;; Define the global variable itself.
   1195 ;;
   1196 !5 = metadata !{
   1197   i32 786484,                        ;; Tag
   1198   i32 0,                             ;; Unused
   1199   null,                              ;; Unused
   1200   metadata !"MyGlobal",              ;; Name
   1201   metadata !"MyGlobal",              ;; Display Name
   1202   metadata !"",                      ;; Linkage Name
   1203   metadata !6,                       ;; File
   1204   i32 1,                             ;; Line
   1205   metadata !7,                       ;; Type
   1206   i32 0,                             ;; IsLocalToUnit
   1207   i32 1,                             ;; IsDefinition
   1208   i32* @MyGlobal                     ;; LLVM-IR Value
   1209 } ; [ DW_TAG_variable ]
   1210 
   1211 ;;
   1212 ;; Define the file
   1213 ;;
   1214 !6 = metadata !{
   1215   i32 786473,                        ;; Tag
   1216   metadata !"foo.cpp",               ;; File
   1217   metadata !"/Volumes/Data/tmp",     ;; Directory
   1218   null                               ;; Unused
   1219 } ; [ DW_TAG_file_type ]
   1220 
   1221 ;;
   1222 ;; Define the type
   1223 ;;
   1224 !7 = metadata !{
   1225   i32 786468,                         ;; Tag
   1226   null,                               ;; Unused
   1227   metadata !"int",                    ;; Name
   1228   null,                               ;; Unused
   1229   i32 0,                              ;; Line
   1230   i64 32,                             ;; Size in Bits
   1231   i64 32,                             ;; Align in Bits
   1232   i64 0,                              ;; Offset
   1233   i32 0,                              ;; Flags
   1234   i32 5                               ;; Encoding
   1235 } ; [ DW_TAG_base_type ]
   1236 
   1237 </pre>
   1238 </div>
   1239 
   1240 </div>
   1241 
   1242 <!-- ======================================================================= -->
   1243 <h3>
   1244   <a name="ccxx_subprogram">C/C++ function information</a>
   1245 </h3>
   1246 
   1247 <div>
   1248 
   1249 <p>Given a function declared as follows:</p>
   1250 
   1251 <div class="doc_code">
   1252 <pre>
   1253 int main(int argc, char *argv[]) {
   1254   return 0;
   1255 }
   1256 </pre>
   1257 </div>
   1258 
   1259 <p>a C/C++ front-end would generate the following descriptors:</p>
   1260 
   1261 <div class="doc_code">
   1262 <pre>
;;
;; Define the subprogram descriptor.  The low 16 bits of the tag (524334)
;; hold 46, which is DW_TAG_subprogram.
;;
   1268 !6 = metadata !{
   1269   i32 524334,        ;; Tag
   1270   i32 0,             ;; Unused
   1271   metadata !1,       ;; Context
   1272   metadata !"main",  ;; Name
   1273   metadata !"main",  ;; Display name
   1274   metadata !"main",  ;; Linkage name
   1275   metadata !1,       ;; File
   1276   i32 1,             ;; Line number
   1277   metadata !4,       ;; Type
   1278   i1 false,          ;; Is local
   1279   i1 true,           ;; Is definition
   1280   i32 0,             ;; Virtuality attribute, e.g. pure virtual function
   1281   i32 0,             ;; Index into virtual table for C++ methods
   1282   i32 0,             ;; Type that holds virtual table.
   1283   i32 0,             ;; Flags
   1284   i1 false,          ;; True if this function is optimized
  i32 (i32, i8**)* @main,  ;; Pointer to llvm::Function
   1286   null               ;; Function template parameters
   1287 }
   1288 ;;
   1289 ;; Define the subprogram itself.
   1290 ;;
   1291 define i32 @main(i32 %argc, i8** %argv) {
   1292 ...
   1293 }
   1294 </pre>
   1295 </div>
   1296 
   1297 </div>
   1298 
   1299 <!-- ======================================================================= -->
   1300 <h3>
   1301   <a name="ccxx_basic_types">C/C++ basic types</a>
   1302 </h3>
   1303 
   1304 <div>
   1305 
   1306 <p>The following are the basic type descriptors for C/C++ core types:</p>
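
<p>The last field of each basic type descriptor below is the encoding. The
   values used in this section correspond to the DWARF base type encodings
   (<tt>DW_ATE_*</tt>). As a reading aid, the following C++ sketch maps the
   encodings that appear in these examples to their DWARF names; the function
   name is hypothetical and only the values used in this section are
   covered.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map the encoding field of a basic type descriptor to its DWARF name.
// Only the encodings that appear in this section are listed.
inline std::string encodingName(std::uint32_t encoding) {
  switch (encoding) {
  case 2: return "DW_ATE_boolean";        // bool
  case 4: return "DW_ATE_float";          // float, double
  case 5: return "DW_ATE_signed";         // short, int, long long
  case 6: return "DW_ATE_signed_char";    // char
  case 7: return "DW_ATE_unsigned";       // unsigned short/int/long long
  case 8: return "DW_ATE_unsigned_char";  // unsigned char
  default: return "unknown";
  }
}
```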
   1307 
   1308 <!-- ======================================================================= -->
   1309 <h4>
   1310   <a name="ccxx_basic_type_bool">bool</a>
   1311 </h4>
   1312 
   1313 <div>
   1314 
   1315 <div class="doc_code">
   1316 <pre>
   1317 !2 = metadata !{
   1318   i32 524324,        ;; Tag
   1319   metadata !1,       ;; Context
   1320   metadata !"bool",  ;; Name
   1321   metadata !1,       ;; File
   1322   i32 0,             ;; Line number
   1323   i64 8,             ;; Size in Bits
   1324   i64 8,             ;; Align in Bits
   1325   i64 0,             ;; Offset in Bits
   1326   i32 0,             ;; Flags
   1327   i32 2              ;; Encoding
   1328 }
   1329 </pre>
   1330 </div>
   1331 
   1332 </div>
   1333 
   1334 <!-- ======================================================================= -->
   1335 <h4>
   1336   <a name="ccxx_basic_char">char</a>
   1337 </h4>
   1338 
   1339 <div>
   1340 
   1341 <div class="doc_code">
   1342 <pre>
   1343 !2 = metadata !{
   1344   i32 524324,        ;; Tag
   1345   metadata !1,       ;; Context
   1346   metadata !"char",  ;; Name
   1347   metadata !1,       ;; File
   1348   i32 0,             ;; Line number
   1349   i64 8,             ;; Size in Bits
   1350   i64 8,             ;; Align in Bits
   1351   i64 0,             ;; Offset in Bits
   1352   i32 0,             ;; Flags
   1353   i32 6              ;; Encoding
   1354 }
   1355 </pre>
   1356 </div>
   1357 
   1358 </div>
   1359 
   1360 <!-- ======================================================================= -->
   1361 <h4>
   1362   <a name="ccxx_basic_unsigned_char">unsigned char</a>
   1363 </h4>
   1364 
   1365 <div>
   1366 
   1367 <div class="doc_code">
   1368 <pre>
   1369 !2 = metadata !{
   1370   i32 524324,        ;; Tag
   1371   metadata !1,       ;; Context
   1372   metadata !"unsigned char",
   1373   metadata !1,       ;; File
   1374   i32 0,             ;; Line number
   1375   i64 8,             ;; Size in Bits
   1376   i64 8,             ;; Align in Bits
   1377   i64 0,             ;; Offset in Bits
   1378   i32 0,             ;; Flags
   1379   i32 8              ;; Encoding
   1380 }
   1381 </pre>
   1382 </div>
   1383 
   1384 </div>
   1385 
   1386 <!-- ======================================================================= -->
   1387 <h4>
   1388   <a name="ccxx_basic_short">short</a>
   1389 </h4>
   1390 
   1391 <div>
   1392 
   1393 <div class="doc_code">
   1394 <pre>
   1395 !2 = metadata !{
   1396   i32 524324,        ;; Tag
   1397   metadata !1,       ;; Context
   1398   metadata !"short int",
   1399   metadata !1,       ;; File
   1400   i32 0,             ;; Line number
   1401   i64 16,            ;; Size in Bits
   1402   i64 16,            ;; Align in Bits
   1403   i64 0,             ;; Offset in Bits
   1404   i32 0,             ;; Flags
   1405   i32 5              ;; Encoding
   1406 }
   1407 </pre>
   1408 </div>
   1409 
   1410 </div>
   1411 
   1412 <!-- ======================================================================= -->
   1413 <h4>
   1414   <a name="ccxx_basic_unsigned_short">unsigned short</a>
   1415 </h4>
   1416 
   1417 <div>
   1418 
   1419 <div class="doc_code">
   1420 <pre>
   1421 !2 = metadata !{
   1422   i32 524324,        ;; Tag
   1423   metadata !1,       ;; Context
   1424   metadata !"short unsigned int",
   1425   metadata !1,       ;; File
   1426   i32 0,             ;; Line number
   1427   i64 16,            ;; Size in Bits
   1428   i64 16,            ;; Align in Bits
   1429   i64 0,             ;; Offset in Bits
   1430   i32 0,             ;; Flags
   1431   i32 7              ;; Encoding
   1432 }
   1433 </pre>
   1434 </div>
   1435 
   1436 </div>
   1437 
   1438 <!-- ======================================================================= -->
   1439 <h4>
   1440   <a name="ccxx_basic_int">int</a>
   1441 </h4>
   1442 
   1443 <div>
   1444 
   1445 <div class="doc_code">
   1446 <pre>
   1447 !2 = metadata !{
   1448   i32 524324,        ;; Tag
   1449   metadata !1,       ;; Context
   1450   metadata !"int",   ;; Name
   1451   metadata !1,       ;; File
   1452   i32 0,             ;; Line number
   1453   i64 32,            ;; Size in Bits
   1454   i64 32,            ;; Align in Bits
   1455   i64 0,             ;; Offset in Bits
   1456   i32 0,             ;; Flags
   1457   i32 5              ;; Encoding
   1458 }
   1459 </pre></div>
   1460 
   1461 </div>
   1462 
   1463 <!-- ======================================================================= -->
   1464 <h4>
   1465   <a name="ccxx_basic_unsigned_int">unsigned int</a>
   1466 </h4>
   1467 
   1468 <div>
   1469 
   1470 <div class="doc_code">
   1471 <pre>
   1472 !2 = metadata !{
   1473   i32 524324,        ;; Tag
   1474   metadata !1,       ;; Context
   1475   metadata !"unsigned int",
   1476   metadata !1,       ;; File
   1477   i32 0,             ;; Line number
   1478   i64 32,            ;; Size in Bits
   1479   i64 32,            ;; Align in Bits
   1480   i64 0,             ;; Offset in Bits
   1481   i32 0,             ;; Flags
   1482   i32 7              ;; Encoding
   1483 }
   1484 </pre>
   1485 </div>
   1486 
   1487 </div>
   1488 
   1489 <!-- ======================================================================= -->
   1490 <h4>
   1491   <a name="ccxx_basic_long_long">long long</a>
   1492 </h4>
   1493 
   1494 <div>
   1495 
   1496 <div class="doc_code">
   1497 <pre>
   1498 !2 = metadata !{
   1499   i32 524324,        ;; Tag
   1500   metadata !1,       ;; Context
   1501   metadata !"long long int",
   1502   metadata !1,       ;; File
   1503   i32 0,             ;; Line number
   1504   i64 64,            ;; Size in Bits
   1505   i64 64,            ;; Align in Bits
   1506   i64 0,             ;; Offset in Bits
   1507   i32 0,             ;; Flags
   1508   i32 5              ;; Encoding
   1509 }
   1510 </pre>
   1511 </div>
   1512 
   1513 </div>
   1514 
   1515 <!-- ======================================================================= -->
   1516 <h4>
   1517   <a name="ccxx_basic_unsigned_long_long">unsigned long long</a>
   1518 </h4>
   1519 
   1520 <div>
   1521 
   1522 <div class="doc_code">
   1523 <pre>
   1524 !2 = metadata !{
   1525   i32 524324,        ;; Tag
   1526   metadata !1,       ;; Context
   1527   metadata !"long long unsigned int",
   1528   metadata !1,       ;; File
   1529   i32 0,             ;; Line number
   1530   i64 64,            ;; Size in Bits
   1531   i64 64,            ;; Align in Bits
   1532   i64 0,             ;; Offset in Bits
   1533   i32 0,             ;; Flags
   1534   i32 7              ;; Encoding
   1535 }
   1536 </pre>
   1537 </div>
   1538 
   1539 </div>
   1540 
   1541 <!-- ======================================================================= -->
   1542 <h4>
   1543   <a name="ccxx_basic_float">float</a>
   1544 </h4>
   1545 
   1546 <div>
   1547 
   1548 <div class="doc_code">
   1549 <pre>
   1550 !2 = metadata !{
   1551   i32 524324,        ;; Tag
   1552   metadata !1,       ;; Context
   1553   metadata !"float",
   1554   metadata !1,       ;; File
   1555   i32 0,             ;; Line number
   1556   i64 32,            ;; Size in Bits
   1557   i64 32,            ;; Align in Bits
   1558   i64 0,             ;; Offset in Bits
   1559   i32 0,             ;; Flags
   1560   i32 4              ;; Encoding
   1561 }
   1562 </pre>
   1563 </div>
   1564 
   1565 </div>
   1566 
   1567 <!-- ======================================================================= -->
   1568 <h4>
   1569   <a name="ccxx_basic_double">double</a>
   1570 </h4>
   1571 
   1572 <div>
   1573 
   1574 <div class="doc_code">
   1575 <pre>
   1576 !2 = metadata !{
   1577   i32 524324,        ;; Tag
   1578   metadata !1,       ;; Context
   1579   metadata !"double",;; Name
   1580   metadata !1,       ;; File
   1581   i32 0,             ;; Line number
   1582   i64 64,            ;; Size in Bits
   1583   i64 64,            ;; Align in Bits
   1584   i64 0,             ;; Offset in Bits
   1585   i32 0,             ;; Flags
   1586   i32 4              ;; Encoding
   1587 }
   1588 </pre>
   1589 </div>
   1590 
   1591 </div>
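<p>The six descriptors above differ only in name, size and encoding. As a rough
illustration, the following Python sketch renders such a descriptor on a single
line. The helper name <tt>basic_type_descriptor</tt> is hypothetical, the
alignment is assumed to equal the size (true of the examples above, but not in
general), and the tag is assumed to be the debug version (8 &lt;&lt; 16) plus
DW_TAG_base_type (0x24), which matches the value 524324 shown above.</p>

```python
# DWARF base type encodings used by the examples above.
DW_ATE_FLOAT = 0x04
DW_ATE_SIGNED = 0x05
DW_ATE_UNSIGNED = 0x07

# Assumption: tag = debug version (8 << 16) + DW_TAG_base_type (0x24).
BASE_TYPE_TAG = (8 << 16) + 0x24


def basic_type_descriptor(node_id, name, size_bits, encoding, context=1):
    """Render a basic type descriptor in the field order shown above."""
    fields = [
        f"i32 {BASE_TYPE_TAG}",    # Tag
        f"metadata !{context}",    # Context
        f'metadata !"{name}"',     # Name
        f"metadata !{context}",    # File
        "i32 0",                   # Line number
        f"i64 {size_bits}",        # Size in bits
        f"i64 {size_bits}",        # Align in bits (assumed equal to size)
        "i64 0",                   # Offset in bits
        "i32 0",                   # Flags
        f"i32 {encoding}",         # Encoding
    ]
    return f"!{node_id} = metadata !{{" + ", ".join(fields) + "}"


print(basic_type_descriptor(2, "float", 32, DW_ATE_FLOAT))
print(basic_type_descriptor(2, "long long unsigned int", 64, DW_ATE_UNSIGNED))
```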
   1592 
   1593 </div>
   1594 
   1595 <!-- ======================================================================= -->
   1596 <h3>
   1597   <a name="ccxx_derived_types">C/C++ derived types</a>
   1598 </h3>
   1599 
   1600 <div>
   1601 
   1602 <p>Given the following as an example of C/C++ derived type:</p>
   1603 
   1604 <div class="doc_code">
   1605 <pre>
   1606 typedef const int *IntPtr;
   1607 </pre>
   1608 </div>
   1609 
   1610 <p>a C/C++ front-end would generate the following descriptors:</p>
   1611 
   1612 <div class="doc_code">
   1613 <pre>
   1614 ;;
   1615 ;; Define the typedef "IntPtr".
   1616 ;;
   1617 !2 = metadata !{
   1618   i32 524310,          ;; Tag
   1619   metadata !1,         ;; Context
   1620   metadata !"IntPtr",  ;; Name
   1621   metadata !1,         ;; File
   1622   i32 0,               ;; Line number
   1623   i64 0,               ;; Size in bits
   1624   i64 0,               ;; Align in bits
   1625   i64 0,               ;; Offset in bits
   1626   i32 0,               ;; Flags
   1627   metadata !4          ;; Derived From type
   1628 }
   1629 
   1630 ;;
   1631 ;; Define the pointer type.
   1632 ;;
   1633 !4 = metadata !{
   1634   i32 524303,          ;; Tag
   1635   metadata !1,         ;; Context
   1636   metadata !"",        ;; Name
   1637   metadata !1,         ;; File
   1638   i32 0,               ;; Line number
   1639   i64 64,              ;; Size in bits
   1640   i64 64,              ;; Align in bits
   1641   i64 0,               ;; Offset in bits
   1642   i32 0,               ;; Flags
   1643   metadata !5          ;; Derived From type
   1644 }
   1645 ;;
   1646 ;; Define the const type.
   1647 ;;
   1648 !5 = metadata !{
   1649   i32 524326,          ;; Tag
   1650   metadata !1,         ;; Context
   1651   metadata !"",        ;; Name
   1652   metadata !1,         ;; File
   1653   i32 0,               ;; Line number
   1654   i64 32,              ;; Size in bits
   1655   i64 32,              ;; Align in bits
   1656   i64 0,               ;; Offset in bits
   1657   i32 0,               ;; Flags
   1658   metadata !6          ;; Derived From type
   1659 }
   1660 ;;
   1661 ;; Define the int type.
   1662 ;;
   1663 !6 = metadata !{
   1664   i32 524324,          ;; Tag
   1665   metadata !1,         ;; Context
   1666   metadata !"int",     ;; Name
   1667   metadata !1,         ;; File
   1668   i32 0,               ;; Line number
   1669   i64 32,              ;; Size in bits
   1670   i64 32,              ;; Align in bits
   1671   i64 0,               ;; Offset in bits
   1672   i32 0,               ;; Flags
   1673   i32 5                ;; Encoding
   1674 }
   1675 </pre>
   1676 </div>
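<p>To see how the four descriptors chain together, the sketch below follows the
"Derived From" links starting at the typedef and names the DWARF tag at each
step. It is an illustration only: the <tt>DESCRIPTORS</tt> table is a
hypothetical in-memory form of the metadata above, and the tag is assumed to
carry the debug version in its upper bits with the DWARF tag in the low 16
bits, which is consistent with the numbers shown.</p>

```python
# Standard DWARF tag values that appear in the example above.
DW_TAG_NAMES = {
    0x0F: "DW_TAG_pointer_type",
    0x16: "DW_TAG_typedef",
    0x24: "DW_TAG_base_type",
    0x26: "DW_TAG_const_type",
}

# Hypothetical in-memory form: node id -> (tag, name, derived-from node id).
DESCRIPTORS = {
    2: (524310, "IntPtr", 4),
    4: (524303, "", 5),
    5: (524326, "", 6),
    6: (524324, "int", None),
}


def dwarf_tag(tag):
    """Strip the version bits, assuming the DWARF tag sits in the low 16 bits."""
    return tag & 0xFFFF


def derivation_chain(node_id):
    """Follow the derived-from links and return the DWARF tag names in order."""
    chain = []
    while node_id is not None:
        tag, _name, derived_from = DESCRIPTORS[node_id]
        chain.append(DW_TAG_NAMES[dwarf_tag(tag)])
        node_id = derived_from
    return chain


print(" -> ".join(derivation_chain(2)))
```

<p>Read from left to right, the chain is typedef, pointer, const, base type,
which mirrors how <tt>const int *</tt> is read from the outside in.</p>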
   1677 
   1678 </div>
   1679 
   1680 <!-- ======================================================================= -->
   1681 <h3>
   1682   <a name="ccxx_composite_types">C/C++ struct/union types</a>
   1683 </h3>
   1684 
   1685 <div>
   1686 
   1687 <p>Given the following as an example of C/C++ struct type:</p>
   1688 
   1689 <div class="doc_code">
   1690 <pre>
   1691 struct Color {
   1692   unsigned Red;
   1693   unsigned Green;
   1694   unsigned Blue;
   1695 };
   1696 </pre>
   1697 </div>
   1698 
   1699 <p>a C/C++ front-end would generate the following descriptors:</p>
   1700 
   1701 <div class="doc_code">
   1702 <pre>
   1703 ;;
   1704 ;; Define basic type for unsigned int.
   1705 ;;
   1706 !5 = metadata !{
   1707   i32 524324,        ;; Tag
   1708   metadata !1,       ;; Context
   1709   metadata !"unsigned int",
   1710   metadata !1,       ;; File
   1711   i32 0,             ;; Line number
   1712   i64 32,            ;; Size in Bits
   1713   i64 32,            ;; Align in Bits
   1714   i64 0,             ;; Offset in Bits
   1715   i32 0,             ;; Flags
   1716   i32 7              ;; Encoding
   1717 }
   1718 ;;
   1719 ;; Define composite type for struct Color.
   1720 ;;
   1721 !2 = metadata !{
   1722   i32 524307,        ;; Tag
   1723   metadata !1,       ;; Context
   1724   metadata !"Color", ;; Name
   1725   metadata !1,       ;; File
   1726   i32 1,             ;; Line number
   1727   i64 96,            ;; Size in bits
   1728   i64 32,            ;; Align in bits
   1729   i64 0,             ;; Offset in bits
   1730   i32 0,             ;; Flags
   1731   null,              ;; Derived From
   1732   metadata !3,       ;; Elements
   1733   i32 0              ;; Runtime Language
   1734 }
   1735 
   1736 ;;
   1737 ;; Define the Red field.
   1738 ;;
   1739 !4 = metadata !{
   1740   i32 524301,        ;; Tag
   1741   metadata !1,       ;; Context
   1742   metadata !"Red",   ;; Name
   1743   metadata !1,       ;; File
   1744   i32 2,             ;; Line number
   1745   i64 32,            ;; Size in bits
   1746   i64 32,            ;; Align in bits
   1747   i64 0,             ;; Offset in bits
   1748   i32 0,             ;; Flags
   1749   metadata !5        ;; Derived From type
   1750 }
   1751 
   1752 ;;
   1753 ;; Define the Green field.
   1754 ;;
   1755 !6 = metadata !{
   1756   i32 524301,        ;; Tag
   1757   metadata !1,       ;; Context
   1758   metadata !"Green", ;; Name
   1759   metadata !1,       ;; File
   1760   i32 3,             ;; Line number
   1761   i64 32,            ;; Size in bits
   1762   i64 32,            ;; Align in bits
   1763   i64 32,            ;; Offset in bits
   1764   i32 0,             ;; Flags
   1765   metadata !5        ;; Derived From type
   1766 }
   1767 
   1768 ;;
   1769 ;; Define the Blue field.
   1770 ;;
   1771 !7 = metadata !{
   1772   i32 524301,        ;; Tag
   1773   metadata !1,       ;; Context
   1774   metadata !"Blue",  ;; Name
   1775   metadata !1,       ;; File
   1776   i32 4,             ;; Line number
   1777   i64 32,            ;; Size in bits
   1778   i64 32,            ;; Align in bits
   1779   i64 64,            ;; Offset in bits
   1780   i32 0,             ;; Flags
   1781   metadata !5        ;; Derived From type
   1782 }
   1783 
   1784 ;;
   1785 ;; Define the array of fields used by the composite type Color.
   1786 ;;
   1787 !3 = metadata !{metadata !4, metadata !6, metadata !7}
   1788 </pre>
   1789 </div>
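<p>The size, alignment and offset fields of the member descriptors follow from
ordinary C struct layout. The sketch below recomputes them for
<tt>struct Color</tt>. The function name <tt>layout_struct</tt> is hypothetical,
and the rule it applies (align each member up to its own alignment, then round
the total up to the largest member alignment) is the common C layout rule, not
something a particular front-end is guaranteed to use.</p>

```python
def layout_struct(fields):
    """Compute member offsets for a C-style struct.

    fields is a list of (name, size_bits, align_bits) tuples.
    Returns (offsets, size_bits, align_bits).
    """
    offset = 0
    struct_align = 8  # a struct is at least byte aligned
    offsets = {}
    for name, size, align in fields:
        # Round the running offset up to this member's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    # Round the total size up to the struct's alignment.
    size = (offset + struct_align - 1) // struct_align * struct_align
    return offsets, size, struct_align


offsets, size, align = layout_struct(
    [("Red", 32, 32), ("Green", 32, 32), ("Blue", 32, 32)]
)
print(offsets, size, align)
```

<p>For <tt>struct Color</tt> this gives offsets 0, 32 and 64, a size of 96 bits
and an alignment of 32 bits, matching the descriptors above.</p>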
   1790 
   1791 </div>
   1792 
   1793 <!-- ======================================================================= -->
   1794 <h3>
   1795   <a name="ccxx_enumeration_types">C/C++ enumeration types</a>
   1796 </h3>
   1797 
   1798 <div>
   1799 
   1800 <p>Given the following as an example of C/C++ enumeration type:</p>
   1801 
   1802 <div class="doc_code">
   1803 <pre>
   1804 enum Trees {
   1805   Spruce = 100,
   1806   Oak = 200,
   1807   Maple = 300
   1808 };
   1809 </pre>
   1810 </div>
   1811 
   1812 <p>a C/C++ front-end would generate the following descriptors:</p>
   1813 
   1814 <div class="doc_code">
   1815 <pre>
   1816 ;;
   1817 ;; Define composite type for enum Trees
   1818 ;;
   1819 !2 = metadata !{
   1820   i32 524292,        ;; Tag
   1821   metadata !1,       ;; Context
   1822   metadata !"Trees", ;; Name
   1823   metadata !1,       ;; File
   1824   i32 1,             ;; Line number
   1825   i64 32,            ;; Size in bits
   1826   i64 32,            ;; Align in bits
   1827   i64 0,             ;; Offset in bits
   1828   i32 0,             ;; Flags
   1829   null,              ;; Derived From type
   1830   metadata !3,       ;; Elements
   1831   i32 0              ;; Runtime language
   1832 }
   1833 
   1834 ;;
   1835 ;; Define the array of enumerators used by composite type Trees.
   1836 ;;
   1837 !3 = metadata !{metadata !4, metadata !5, metadata !6}
   1838 
   1839 ;;
   1840 ;; Define Spruce enumerator.
   1841 ;;
   1842 !4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
   1843 
   1844 ;;
   1845 ;; Define Oak enumerator.
   1846 ;;
   1847 !5 = metadata !{i32 524328, metadata !"Oak", i64 200}
   1848 
   1849 ;;
   1850 ;; Define Maple enumerator.
   1851 ;;
   1852 !6 = metadata !{i32 524328, metadata !"Maple", i64 300}
   1853 
   1854 </pre>
   1855 </div>
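<p>Going the other way, a consumer can recover the enumerators from the textual
form. The following sketch is a toy parser for the three-field enumerator
descriptors shown above. The regular expression and the function name
<tt>parse_enumerators</tt> are hypothetical and only handle this exact textual
shape; the check on the low 16 bits assumes the DWARF tag (DW_TAG_enumerator,
0x28) is stored there.</p>

```python
import re

# Matches lines such as: !4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
ENUMERATOR_RE = re.compile(
    r'!(\d+) = metadata !\{i32 (\d+), metadata !"([^"]*)", i64 (-?\d+)\}'
)

DW_TAG_ENUMERATOR = 0x28

SAMPLE = """
!4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
!5 = metadata !{i32 524328, metadata !"Oak", i64 200}
!6 = metadata !{i32 524328, metadata !"Maple", i64 300}
"""


def parse_enumerators(text):
    """Return {name: value} for every enumerator descriptor found in text."""
    result = {}
    for _node_id, tag, name, value in ENUMERATOR_RE.findall(text):
        if int(tag) & 0xFFFF != DW_TAG_ENUMERATOR:
            continue  # not an enumerator descriptor
        result[name] = int(value)
    return result


print(parse_enumerators(SAMPLE))
```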
   1856 
   1857 </div>
   1858 
   1859 </div>
   1860 
   1861 
   1862 <!-- *********************************************************************** -->
   1863 <h2>
   1864   <a name="llvmdwarfextension">LLVM DWARF extensions</a>
   1865 </h2>
   1866 <!-- *********************************************************************** -->
   1867 <div>
   1868 <!-- ======================================================================= -->
   1869 <h3>
   1870   <a name="objcproperty">Debugging Information Extension for Objective C Properties</a>
   1871 </h3>
   1872 <div>
   1873 <!-- *********************************************************************** -->
   1874 <h4>
   1875   <a name="objcpropertyintroduction">Introduction</a>
   1876 </h4>
   1877 <!-- *********************************************************************** -->
   1878 
   1879 <div>
   1880 <p>Objective C provides a simpler way to declare and define accessor methods
   1881 using declared properties. The language provides features to declare a
   1882 property and to let the compiler synthesize accessor methods.
   1883 </p>
   1884 
   1885 <p>The debugger lets developers inspect Objective C interfaces and their
   1886 instance variables and class variables. However, the debugger does not know
   1887 anything about the properties defined in Objective C interfaces. The debugger
   1888 consumes information generated by the compiler in DWARF format. The format does
   1889 not support encoding of Objective C properties. This proposal describes DWARF
   1890 extensions to encode Objective C properties, which the debugger can use to let
   1891 developers inspect Objective C properties.
   1892 </p>
   1893 
   1894 </div>
   1895 
   1896 
   1897 <!-- *********************************************************************** -->
   1898 <h4>
   1899   <a name="objcpropertyproposal">Proposal</a>
   1900 </h4>
   1901 <!-- *********************************************************************** -->
   1902 
   1903 <div>
   1904 <p>Objective C properties exist separately from class members. A property
   1905 can be defined only by &quot;setter&quot; and &quot;getter&quot; selectors, and
   1906 be calculated anew on each access. Alternatively, a property can be a direct
   1907 access to some declared ivar. Finally, it can have an ivar &quot;automatically
   1908 synthesized&quot; for it by the compiler, in which case the property can be
   1909 referred to in user code directly using the standard C dereference syntax as
   1910 well as through the property &quot;dot&quot; syntax, but there is no entry in
   1911 the @interface declaration corresponding to this ivar.
   1912 </p>
   1913 <p>
   1914 To facilitate debugging these properties, we will add a new DWARF tag to the
   1915 DW_TAG_structure_type definition for the class to hold the description of a
   1916 given property, and a set of DWARF attributes that provide said description.
   1917 The property tag will also contain the name and declared type of the property.
   1918 </p>
   1919 <p>
   1920 If there is a related ivar, there will also be a DWARF property attribute placed
   1921 in the DW_TAG_member DIE for that ivar referring back to the property TAG for
   1922 that property. And in the case where the compiler synthesizes the ivar directly,
   1923 the compiler is expected to generate a DW_TAG_member for that ivar (with the
   1924 DW_AT_artificial set to 1), whose name will be the name used to access this
   1925 ivar directly in code, and with the property attribute pointing back to the
   1926 property it is backing.
   1927 </p>
   1928 <p>
   1929 The following examples will serve as illustration for our discussion:
   1930 </p>
   1931 
   1932 <div class="doc_code">
   1933 <pre>
   1934 @interface I1 {
   1935   int n2;
   1936 }
   1937 
   1938 @property int p1;
   1939 @property int p2;
   1940 @end
   1941 
   1942 @implementation I1
   1943 @synthesize p1;
   1944 @synthesize p2 = n2;
   1945 @end
   1946 </pre>
   1947 </div>
   1948 
   1949 <p>
   1950 This produces the following DWARF (this is a &quot;pseudo dwarfdump&quot; output):
   1951 </p>
   1952 <div class="doc_code">
   1953 <pre>
   1954 0x00000100:  TAG_structure_type [7] *
   1955                AT_APPLE_runtime_class( 0x10 )
   1956                AT_name( "I1" )
   1957                AT_decl_file( "Objc_Property.m" )
   1958                AT_decl_line( 3 )
   1959 
   1960 0x00000110:   TAG_APPLE_property
   1961                 AT_name ( "p1" )
   1962                 AT_type ( {0x00000150} ( int ) )
   1963 
   1964 0x00000120:   TAG_APPLE_property
   1965                 AT_name ( "p2" )
   1966                 AT_type ( {0x00000150} ( int ) )
   1967 
   1968 0x00000130:   TAG_member [8]
   1969                 AT_name( "_p1" )
   1970                 AT_APPLE_property ( {0x00000110} "p1" )
   1971                 AT_type( {0x00000150} ( int ) )
   1972                 AT_artificial ( 0x1 )
   1973 
   1974 0x00000140:    TAG_member [8]
   1975                  AT_name( "n2" )
   1976                  AT_APPLE_property ( {0x00000120} "p2" )
   1977                  AT_type( {0x00000150} ( int ) )
   1978 
   1979 0x00000150:  AT_type( ( int ) )
   1980 </pre>
   1981 </div>
   1982 
   1983 <p>Note that the current convention is that the name of the ivar for an
   1984 auto-synthesized property is the name of the property from which it derives with
   1985 an underscore prepended, as is shown in the example.
   1986 However, we don't actually need to know this convention, since we are given the
   1987 name of the ivar directly.
   1988 </p>
   1989 
   1990 <p>
   1991 Also, it is common practice in ObjC to have different property declarations in
   1992 the @interface and @implementation - e.g. to provide a read-only property in
   1993 the interface, and a read-write property in the implementation.  In that case,
   1994 the compiler should emit whichever property declaration will be in force in the
   1995 current translation unit.
   1996 </p>
   1997 
   1998 <p> Developers can decorate a property with attributes which are encoded using
   1999 DW_AT_APPLE_property_attribute.
   2000 </p>
   2001 
   2002 <div class="doc_code">
   2003 <pre>
   2004 @property (readonly, nonatomic) int pr;
   2005 </pre>
   2006 </div>
   2007 <p>
   2008 This produces a property tag:
   2009 </p>
   2010 <div class="doc_code">
   2011 <pre>
   2012 TAG_APPLE_property [8]
   2013   AT_name( "pr" )
   2014   AT_type ( {0x00000147} (int) )
   2015   AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
   2016 </pre>
   2017 </div>
   2018 
   2019 <p> The setter and getter method names are attached to the property using
   2020 DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.
   2021 </p>
   2022 <div class="doc_code">
   2023 <pre>
   2024 @interface I1
   2025 @property (setter=myOwnP3Setter:) int p3;
   2026 -(void)myOwnP3Setter:(int)a;
   2027 @end
   2028 
   2029 @implementation I1
   2030 @synthesize p3;
   2031 -(void)myOwnP3Setter:(int)a{ }
   2032 @end
   2033 </pre>
   2034 </div>
   2035 
   2036 <p>
   2037 The DWARF for this would be:
   2038 </p>
   2039 <div class="doc_code">
   2040 <pre>
   2041 0x000003bd: TAG_structure_type [7] *
   2042               AT_APPLE_runtime_class( 0x10 )
   2043               AT_name( "I1" )
   2044               AT_decl_file( "Objc_Property.m" )
   2045               AT_decl_line( 3 )
   2046 
   2047 0x000003cd:     TAG_APPLE_property
   2048                   AT_name ( "p3" )
   2049                   AT_APPLE_property_setter ( "myOwnP3Setter:" )
   2050                   AT_type( {0x00000147} ( int ) )
   2051 
   2052 0x000003f3:     TAG_member [8]
   2053                   AT_name( "_p3" )
   2054                   AT_type ( {0x00000147} ( int ) )
   2055                   AT_APPLE_property ( {0x000003cd} )
   2056                   AT_artificial ( 0x1 )
   2057 </pre>
   2058 </div>
   2059 
   2060 </div>
   2061 
   2062 <!-- *********************************************************************** -->
   2063 <h4>
   2064   <a name="objcpropertynewtags">New DWARF Tags</a>
   2065 </h4>
   2066 <!-- *********************************************************************** -->
   2067 
   2068 <div>
   2069 <table border="1" cellspacing="0">
   2070   <col width="200">
   2071   <col width="200">
   2072   <tr>
   2073     <th>TAG</th>
   2074     <th>Value</th>
   2075   </tr>
   2076   <tr>
   2077     <td>DW_TAG_APPLE_property</td>
   2078     <td>0x4200</td>
   2079   </tr>
   2080 </table>
   2081 
   2082 </div>
   2083 
   2084 <!-- *********************************************************************** -->
   2085 <h4>
   2086   <a name="objcpropertynewattributes">New DWARF Attributes</a>
   2087 </h4>
   2088 <!-- *********************************************************************** -->
   2089 
   2090 <div>
   2091 <table border="1" cellspacing="0">
   2092   <col width="200">
   2093   <col width="200">
   2094   <col width="200">
   2095   <tr>
   2096     <th>Attribute</th>
   2097     <th>Value</th>
   2098     <th>Classes</th>
   2099   </tr>
   2100   <tr>
   2101     <td>DW_AT_APPLE_property</td>
   2102     <td>0x3fed</td>
   2103     <td>Reference</td>
   2104   </tr>
   2105   <tr>
   2106     <td>DW_AT_APPLE_property_getter</td>
   2107     <td>0x3fe9</td>
   2108     <td>String</td>
   2109   </tr>
   2110   <tr>
   2111     <td>DW_AT_APPLE_property_setter</td>
   2112     <td>0x3fea</td>
   2113     <td>String</td>
   2114   </tr>
   2115   <tr>
   2116     <td>DW_AT_APPLE_property_attribute</td>
   2117     <td>0x3feb</td>
   2118     <td>Constant</td>
   2119   </tr>
   2120 </table>
   2121 
   2122 </div>
   2123 
   2124 <!-- *********************************************************************** -->
   2125 <h4>
   2126   <a name="objcpropertynewconstants">New DWARF Constants</a>
   2127 </h4>
   2128 <!-- *********************************************************************** -->
   2129 
   2130 <div>
   2131 <table border="1" cellspacing="0">
   2132   <col width="200">
   2133   <col width="200">
   2134   <tr>
   2135     <th>Name</th>
   2136     <th>Value</th>
   2137   </tr>
   2138   <tr>
   2139     <td>DW_APPLE_PROPERTY_readonly</td>
   2140     <td>0x1</td>
   2141   </tr>
   2142   <tr>
   2143     <td>DW_APPLE_PROPERTY_readwrite</td>
   2144     <td>0x2</td>
   2145   </tr>
   2146   <tr>
   2147     <td>DW_APPLE_PROPERTY_assign</td>
   2148     <td>0x4</td>
   2149   </tr>
   2150   <tr>
   2151     <td>DW_APPLE_PROPERTY_retain</td>
   2152     <td>0x8</td>
   2153   </tr>
   2154   <tr>
   2155     <td>DW_APPLE_PROPERTY_copy</td>
   2156     <td>0x10</td>
   2157   </tr>
   2158   <tr>
   2159     <td>DW_APPLE_PROPERTY_nonatomic</td>
   2160     <td>0x20</td>
   2161   </tr>
   2162 </table>
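<p>Because each constant occupies its own bit, several property attributes can
plausibly be packed into one value. The sketch below assumes they are combined
with a bitwise OR, which the power-of-two values suggest but this section does
not state outright. The class name <tt>AppleProperty</tt> and the helper
<tt>decode</tt> are hypothetical.</p>

```python
from enum import IntFlag


class AppleProperty(IntFlag):
    """Property attribute bits, using the values from the table above."""
    READONLY = 0x1
    READWRITE = 0x2
    ASSIGN = 0x4
    RETAIN = 0x8
    COPY = 0x10
    NONATOMIC = 0x20


def decode(value):
    """List the attribute names whose bits are set in value."""
    return [flag.name.lower() for flag in AppleProperty if value & flag]


# @property (readonly, nonatomic) int pr;
encoded = AppleProperty.READONLY | AppleProperty.NONATOMIC
print(hex(encoded), decode(encoded))
```

<p>Under that assumption, the <tt>readonly, nonatomic</tt> property from the
earlier example would be encoded as 0x21.</p>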
   2163 
   2164 </div>
   2165 </div>
   2166 
   2167 <!-- ======================================================================= -->
   2168 <h3>
   2169   <a name="acceltable">Name Accelerator Tables</a>
   2170 </h3>
   2171 <!-- ======================================================================= -->
   2172 <div>
   2173 <!-- ======================================================================= -->
   2174 <h4>
   2175   <a name="acceltableintroduction">Introduction</a>
   2176 </h4>
   2177 <!-- ======================================================================= -->
   2178 <div>
   2179 <p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
   2180   needs. The "pub" in the section name indicates that the entries in the
   2181   table are publicly visible names only. This means no static or hidden
   2182   functions show up in the .debug_pubnames. No static variables or private class
   2183   variables are in the .debug_pubtypes. Many compilers add different things to
   2184   these tables, so we can't rely on consistent contents across gcc, icc, or clang.</p>
   2185 
   2186 <p>The typical query given by users tends not to match up with the contents of
   2187   these tables. For example, the DWARF spec states that "In the case of the
   2188   name of a function member or static data member of a C++ structure, class or
   2189   union, the name presented in the .debug_pubnames section is not the simple
   2190   name given by the DW_AT_name attribute of the referenced debugging information
   2191   entry, but rather the fully qualified name of the data or function member."
   2192   So the only names in these tables for complex C++ entries are fully
   2193   qualified names.  Debugger users tend not to enter their search strings as
   2194   "a::b::c(int,const Foo&amp;) const", but rather as "c", "b::c", or "a::b::c".  So
   2195   the name entered in the name table must be demangled in order to chop it up
   2196   appropriately and additional names must be manually entered into the table
   2197   to make it effective as a name lookup table for debuggers to use.</p>
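<p>The "chopping up" described above can be sketched as follows. This is a
deliberately naive illustration: the function name <tt>lookup_names</tt> is
hypothetical, and the splitting ignores cases such as template arguments that
contain <tt>::</tt> or parentheses, and <tt>operator()</tt>.</p>

```python
def lookup_names(qualified):
    """Return the names a user might type for a fully qualified C++ name.

    The result is ordered from the shortest name to the fully qualified one.
    """
    # Drop the parameter list and any trailing qualifiers such as "const".
    base = qualified.split("(", 1)[0]
    parts = base.split("::")
    # Build every suffix: "c", then "b::c", then "a::b::c".
    return ["::".join(parts[i:]) for i in range(len(parts) - 1, -1, -1)]


print(lookup_names("a::b::c(int,const Foo&) const"))
```

<p>Each of the returned names would need its own entry in a lookup table for a
query on that name to succeed.</p>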
   2198 
   2199 <p>All debuggers currently ignore the .debug_pubnames table because its
   2200   inconsistent, public-only name content makes it useless and a waste of
   2201   space in the object file. These tables, when they are written to disk, are
   2202   not sorted in any way, leaving every debugger to do its own parsing
   2203   and sorting. These tables also include an inlined copy of the string values
   2204   in the table itself making the tables much larger than they need to be on
   2205   disk, especially for large C++ programs.</p>
   2206 
   2207 <p>Can't we just fix the sections by adding all of the names we need to this
   2208   table? No, because that is not what the tables are defined to contain and we
   2209   won't know the difference between the old bad tables and the new good tables.
   2210   At best we could make our own renamed sections that contain all of the data
   2211   we need.</p>
   2212 
   2213 <p>These tables are also insufficient for what a debugger like LLDB needs.
   2214   LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
   2215   then often asked to look for type "foo" or namespace "bar", or list items in
   2216   namespace "baz". Namespaces are not included in the pubnames or pubtypes
   2217   tables. Since clang asks a lot of questions when it is parsing an expression,
   2218   we need to be very fast when looking up names, as it happens a lot. Having new
   2219   accelerator tables that are optimized for very quick lookups will benefit
   2220   this type of debugging experience greatly.</p>
   2221 
   2222 <p>We would like to generate name lookup tables that can be mapped into
   2223   memory from disk, and used as is, with little or no up-front parsing. We would
   2224   also like to control the exact content of these different tables so they
   2225   contain exactly what we need. The Name Accelerator Tables were designed
   2226   to fix these issues. To do so, the tables need to:</p>
   2227 
   2228 <ul>
   2229   <li>Have a format that can be mapped into memory from disk and used as is</li>
   2230   <li>Support very fast lookups</li>
   2231   <li>Have an extensible format so these tables can be made by many producers</li>
   2232   <li>Contain all of the names needed for typical lookups out of the box</li>
   2233   <li>Follow strict rules for the contents of tables</li>
   2234 </ul>
   2235 
   2236 <p>Table size is important and the accelerator table format should allow the
   2237   reuse of strings from common string tables so the strings for the names are
   2238   not duplicated. We also want to make sure the table is ready to be used as-is
   2239   by simply mapping the table into memory with minimal header parsing.</p>
   2240 
   2241 <p>The name lookups need to be fast and optimized for the kinds of lookups
   2242   that debuggers tend to do. Optimally we would like to touch as few parts of
   2243   the mapped table as possible when doing a name lookup and be able to quickly
   2244   find the name entry we are looking for, or discover there are no matches. In
   2245   the case of debuggers we optimized for lookups that fail most of the time.</p>
   2246 
   2247 <p>Each table that is defined should have strict, documented rules on exactly
   2248   what it contains, so that clients can rely on the content.</p>
   2249 
   2250 </div>
   2251 
   2252 <!-- ======================================================================= -->
   2253 <h4>
   2254   <a name="acceltablehashes">Hash Tables</a>
   2255 </h4>
   2256 <!-- ======================================================================= -->
   2257 
   2258 <div>
   2259 <h5>Standard Hash Tables</h5>
   2260 
   2261 <p>Typical hash tables have a header, buckets, and each bucket points to the
   2262 bucket contents:
   2263 </p>
   2264 
   2265 <div class="doc_code">
   2266 <pre>
   2267 .------------.
   2268 |  HEADER    |
   2269 |------------|
   2270 |  BUCKETS   |
   2271 |------------|
   2272 |  DATA      |
   2273 `------------'
   2274 </pre>
   2275 </div>
   2276 
   2277 <p>The BUCKETS are an array of offsets to DATA for each hash:</p>
   2278 
   2279 <div class="doc_code">
   2280 <pre>
   2281 .------------.
   2282 | 0x00001000 | BUCKETS[0]
   2283 | 0x00002000 | BUCKETS[1]
   2284 | 0x00002200 | BUCKETS[2]
   2285 | 0x000034f0 | BUCKETS[3]
   2286 |            | ...
   2287 | 0xXXXXXXXX | BUCKETS[n_buckets]
   2288 `------------'
   2289 </pre>
   2290 </div>
   2291 
   2292 <p>So for bucket[3] in the example above, we have an offset into the table,
   2293   0x000034f0, which points to a chain of entries for the bucket. Each entry
   2294   in the chain must contain a next pointer, full 32 bit hash value, the string
   2295   itself, and the data for the current string value.</p>
   2296 
   2297 <div class="doc_code">
   2298 <pre>
   2299             .------------.
   2300 0x000034f0: | 0x00003500 | next pointer
   2301             | 0x12345678 | 32 bit hash
   2302             | "erase"    | string value
   2303             | data[n]    | HashData for this bucket
   2304             |------------|
   2305 0x00003500: | 0x00003550 | next pointer
   2306             | 0x29273623 | 32 bit hash
   2307             | "dump"     | string value
   2308             | data[n]    | HashData for this bucket
   2309             |------------|
   2310 0x00003550: | 0x00000000 | next pointer
   2311             | 0x82638293 | 32 bit hash
   2312             | "main"     | string value
   2313             | data[n]    | HashData for this bucket
   2314             `------------'
   2315 </pre>
   2316 </div>
   2317 
   2318 <p>The problem with this layout for debuggers is that we need to optimize for
   2319   the negative lookup case where the symbol we're searching for is not present.
   2320   So if we were to look up "printf" in the table above, we would make a 32 bit
   2321   hash for "printf", which might match bucket[3]. We would need to go to the offset
   2322   0x000034f0 and start looking to see if our 32 bit hash matches. To do so, we
   2323   need to read the next pointer, then read the hash, compare it, and skip to
   2324   the next entry. Each time we are skipping many bytes in memory and touching
   2325   new cache pages just to do the compare on the full 32 bit hash. All of these
   2326   accesses then tell us that we didn't have a match.</p>
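<p>The cost of a failed lookup in this layout can be made concrete with a toy
model. In the sketch below, the <tt>DATA</tt> dictionary stands in for the
chain at 0x000034f0 shown above, and the hash value used for "printf" is made
up purely for illustration.</p>

```python
# Toy model of the chain for bucket 3: offset -> (next offset, hash, string).
DATA = {
    0x34F0: (0x3500, 0x12345678, "erase"),
    0x3500: (0x3550, 0x29273623, "dump"),
    0x3550: (0x0000, 0x82638293, "main"),
}


def chained_lookup(bucket_offset, wanted_hash, wanted_name):
    """Walk one bucket chain and return (found, entries_visited)."""
    visited = 0
    offset = bucket_offset
    while offset != 0:
        next_offset, entry_hash, name = DATA[offset]
        visited += 1  # each entry lives at a different place in memory
        if entry_hash == wanted_hash and name == wanted_name:
            return True, visited
        offset = next_offset
    return False, visited


# 0xDEADBEEF is a made-up hash for "printf" that is assumed to map to bucket 3.
print(chained_lookup(0x34F0, 0xDEADBEEF, "printf"))
print(chained_lookup(0x34F0, 0x29273623, "dump"))
```

<p>The miss has to visit all three entries, each at a separate location in
memory, before it can report that there is no match.</p>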
   2327 
   2328 <h5>Name Hash Tables</h5>
   2329 
   2330 <p>To solve the issues mentioned above we have structured the hash tables
   2331   a bit differently: a header, buckets, an array of all unique 32 bit hash
   2332   values, followed by an array of hash value data offsets, one for each hash
   2333   value, then the data for all hash values:</p>
   2334 
   2335 <div class="doc_code">
   2336 <pre>
   2337 .-------------.
   2338 |  HEADER     |
   2339 |-------------|
   2340 |  BUCKETS    |
   2341 |-------------|
   2342 |  HASHES     |
   2343 |-------------|
   2344 |  OFFSETS    |
   2345 |-------------|
   2346 |  DATA       |
   2347 `-------------'
   2348 </pre>
   2349 </div>
   2350 
   2351 <p>The BUCKETS in the name tables are an index into the HASHES array. By
   2352   making all of the full 32 bit hash values contiguous in memory, we allow
   2353   ourselves to efficiently check for a match while touching as little
   2354   memory as possible. Most often checking the 32 bit hash values is as far as
   2355   the lookup goes. If it does match, it usually is a match with no collisions.
   2356   So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
   2357   values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p>
   2358 
   2359 <div class="doc_code">
   2360 <pre>
   2361 .-------------------------.
   2362 |  HEADER.magic           | uint32_t
   2363 |  HEADER.version         | uint16_t
   2364 |  HEADER.hash_function   | uint16_t
   2365 |  HEADER.bucket_count    | uint32_t
   2366 |  HEADER.hashes_count    | uint32_t
   2367 |  HEADER.header_data_len | uint32_t
   2368 |  HEADER_DATA            | HeaderData
   2369 |-------------------------|
   2370 |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
   2371 |-------------------------|
   2372 |  HASHES                 | uint32_t[n_hashes]  // 32 bit hash values
   2373 |-------------------------|
   2374 |  OFFSETS                | uint32_t[n_hashes]  // 32 bit offsets to hash value data
   2375 |-------------------------|
   2376 |  ALL HASH DATA          |
   2377 `-------------------------'
   2378 </pre>
   2379 </div>
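<p>As a rough illustration of the layout above, the byte position of each
  region can be derived from three header fields. This is a minimal sketch, not
  the implementation: the names "TableLayout" and "computeLayout" are
  hypothetical, and it assumes the regions are packed back to back with no
  padding beyond what "header_data_len" already accounts for.</p>

```cpp
#include <cassert>
#include <cstdint>

// Byte positions of each region, measured from the start of the table.
// (Hypothetical helper type, not part of the format itself.)
struct TableLayout {
  uint64_t buckets;  // start of BUCKETS
  uint64_t hashes;   // start of HASHES
  uint64_t offsets;  // start of OFFSETS
  uint64_t data;     // start of ALL HASH DATA
};

// magic (4) + version (2) + hash_function (2) + bucket_count (4) +
// hashes_count (4) + header_data_len (4)
constexpr uint64_t kFixedHeaderSize = 20;

// Assumption: regions follow each other directly, 4 bytes per array element.
TableLayout computeLayout(uint32_t bucket_count, uint32_t hashes_count,
                          uint32_t header_data_len) {
  TableLayout layout;
  layout.buckets = kFixedHeaderSize + header_data_len;
  layout.hashes = layout.buckets + 4ull * bucket_count;
  layout.offsets = layout.hashes + 4ull * hashes_count;
  layout.data = layout.offsets + 4ull * hashes_count;
  return layout;
}
```

<p>Because every region position depends only on the header, a reader can map
  the table into memory and index directly into any of the arrays without
  parsing the hash data.</p>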
   2380 
   2381 <p>So taking the exact same data from the standard hash example above we end up
   2382   with:</p>
   2383 
   2384 <div class="doc_code">
   2385 <pre>
   2386             .------------.
   2387             | HEADER     |
   2388             |------------|
   2389             |          0 | BUCKETS[0]
   2390             |          2 | BUCKETS[1]
   2391             |          5 | BUCKETS[2]
   2392             |          6 | BUCKETS[3]
   2393             |            | ...
   2394             |        ... | BUCKETS[n_buckets - 1]
   2395             |------------|
   2396             | 0x........ | HASHES[0]
   2397             | 0x........ | HASHES[1]
   2398             | 0x........ | HASHES[2]
   2399             | 0x........ | HASHES[3]
   2400             | 0x........ | HASHES[4]
   2401             | 0x........ | HASHES[5]
   2402             | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
   2403             | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
   2404             | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
   2405             | 0x........ | HASHES[9]
   2406             | 0x........ | HASHES[10]
   2407             | 0x........ | HASHES[11]
   2408             | 0x........ | HASHES[12]
   2409             | 0x........ | HASHES[13]
   2410             | 0x........ | HASHES[n_hashes - 1]
   2411             |------------|
   2412             | 0x........ | OFFSETS[0]
   2413             | 0x........ | OFFSETS[1]
   2414             | 0x........ | OFFSETS[2]
   2415             | 0x........ | OFFSETS[3]
   2416             | 0x........ | OFFSETS[4]
   2417             | 0x........ | OFFSETS[5]
   2418             | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
   2419             | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
   2420             | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
   2421             | 0x........ | OFFSETS[9]
   2422             | 0x........ | OFFSETS[10]
   2423             | 0x........ | OFFSETS[11]
   2424             | 0x........ | OFFSETS[12]
   2425             | 0x........ | OFFSETS[13]
   2426             | 0x........ | OFFSETS[n_hashes - 1]
   2427             |------------|
   2428             |            |
   2429             |            |
   2430             |            |
   2431             |            |
   2432             |            |
   2433             |------------|
   2434 0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
   2435             | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
   2436             | 0x........ | HashData[0]
   2437             | 0x........ | HashData[1]
   2438             | 0x........ | HashData[2]
   2439             | 0x........ | HashData[3]
   2440             | 0x00000000 | String offset into .debug_str (terminate data for hash)
   2441             |------------|
   2442 0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
   2443             | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
   2444             | 0x........ | HashData[0]
   2445             | 0x........ | HashData[1]
   2446             | 0x00001203 | String offset into .debug_str ("dump")
   2447             | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
   2448             | 0x........ | HashData[0]
   2449             | 0x........ | HashData[1]
   2450             | 0x........ | HashData[2]
   2451             | 0x00000000 | String offset into .debug_str (terminate data for hash)
   2452             |------------|
   2453 0x00003550: | 0x00001203 | String offset into .debug_str ("main")
   2454             | 0x00000009 | A 32 bit array count - number of HashData with name "main"
   2455             | 0x........ | HashData[0]
   2456             | 0x........ | HashData[1]
   2457             | 0x........ | HashData[2]
   2458             | 0x........ | HashData[3]
   2459             | 0x........ | HashData[4]
   2460             | 0x........ | HashData[5]
   2461             | 0x........ | HashData[6]
   2462             | 0x........ | HashData[7]
   2463             | 0x........ | HashData[8]
   2464             | 0x00000000 | String offset into .debug_str (terminate data for hash)
   2465             `------------'
   2466 </pre>
   2467 </div>
   2468 
   2469 <p>So we still have all of the same data; we just organize it more efficiently
   2470   for debugger lookup. If we repeat the same "printf" lookup from above, we
   2471   would hash "printf" and find that it maps to BUCKETS[3] by taking the 32 bit
   2472   hash value modulo n_buckets. BUCKETS[3] contains "6", which is the index
   2473   into the HASHES array. We would then compare consecutive 32 bit hash
   2474   values in the HASHES array for as long as those hashes belong to BUCKETS[3]. We
   2475   do this by verifying that each subsequent hash value modulo n_buckets is still
   2476   3. In the case of a failed lookup we would access the memory for BUCKETS[3], and
   2477   then compare a few consecutive 32 bit hashes before we know that we have no match.
   2478   We don't end up marching through multiple words of memory, and we keep the
   2479   number of processor data cache lines being accessed as small as possible.</p>
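<p>The lookup just described can be sketched as follows. This is an
  illustrative sketch under stated assumptions: the arrays have already been
  read into memory in host byte order, and the names "NameTable" and
  "findHashDataOffset" are hypothetical. An empty bucket is marked with
  UINT32_MAX.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// In-memory view of the three fixed arrays (hypothetical type).
struct NameTable {
  std::vector<uint32_t> buckets;  // index into hashes, or UINT32_MAX if empty
  std::vector<uint32_t> hashes;   // all unique hashes, grouped by bucket
  std::vector<uint32_t> offsets;  // offsets[i] locates the data for hashes[i]
};

// Returns true and sets offset_out if `hash` is present in the table.
bool findHashDataOffset(const NameTable &t, uint32_t hash,
                        uint32_t &offset_out) {
  assert(t.hashes.size() == t.offsets.size());
  if (t.buckets.empty())
    return false;
  const uint32_t n_buckets = static_cast<uint32_t>(t.buckets.size());
  const uint32_t bucket = hash % n_buckets;
  uint32_t i = t.buckets[bucket];
  if (i == UINT32_MAX)
    return false;  // empty bucket: no hash maps here
  // Scan consecutive hashes only while they still belong to this bucket.
  for (; i < t.hashes.size() && t.hashes[i] % n_buckets == bucket; ++i) {
    if (t.hashes[i] == hash) {
      offset_out = t.offsets[i];
      return true;
    }
  }
  return false;
}
```

<p>A failed lookup reads one bucket entry and a short run of adjacent hash
  values, which is what keeps the number of cache lines touched small.</p>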
   2480 
   2481 <p>The string hash that is used for these lookup tables is the Daniel J.
   2482   Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very
   2483   good hash for all kinds of names in programs with very few hash collisions.</p>
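<p>For reference, here is a minimal sketch of the Bernstein hash in its
  classic form. The text above does not spell out the seed or multiplier; the
  values used here (seed 5381, multiply by 33) are the commonly published ones
  and should be treated as an assumption. The function name "djbHash" is
  illustrative.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Classic DJB string hash: h = h * 33 + c, starting from 5381.
// Arithmetic wraps modulo 2^32 because uint32_t is unsigned.
uint32_t djbHash(const std::string &name) {
  uint32_t h = 5381;
  for (unsigned char c : name)
    h = (h << 5) + h + c;  // (h << 5) + h is h * 33
  return h;
}
```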
   2484 
   2485 <p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p>
   2486 </div>
   2487 
   2488 <!-- ======================================================================= -->
   2489 <h4>
   2490   <a name="acceltabledetails">Details</a>
   2491 </h4>
   2492 <!-- ======================================================================= -->
   2493 <div>
   2494 <p>These name hash tables are designed to be generic where specializations of
   2495   the table get to define additional data that goes into the header
   2496   ("HeaderData"), how the string value is stored ("KeyType") and the content
   2497   of the data for each hash value.</p>
   2498 
   2499 <h5>Header Layout</h5>
   2500 <p>The header has a fixed part and a specialized part. The exact format of
   2501   the header is:</p>
   2502 <div class="doc_code">
   2503 <pre>
   2504 struct Header
   2505 {
   2506   uint32_t   magic;           // 'HASH' magic value to allow endian detection
   2507   uint16_t   version;         // Version number
   2508   uint16_t   hash_function;   // The hash function enumeration that was used
   2509   uint32_t   bucket_count;    // The number of buckets in this hash table
   2510   uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
   2511   uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
   2512                               // Specifically the length of the following HeaderData field - this does not
   2513                               // include the size of the preceding fields
   2514   HeaderData header_data;     // Implementation specific header data
   2515 };
   2516 </pre>
   2517 </div>
   2518 <p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
   2519   an ASCII integer. This allows the detection of the start of the hash table and
   2520   also allows the table's byte order to be determined so the table can be
   2521   correctly extracted. The "magic" value is followed by a 16 bit version number
   2522   which allows the table to be revised and modified in the future. The current
   2523   version number is 1. "hash_function" is a uint16_t enumeration that specifies
   2524   which hash function was used to produce this table. The current values for the
   2525   hash function enumerations include:</p>
   2526 <div class="doc_code">
   2527 <pre>
   2528 enum HashFunctionType
   2529 {
   2530   eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
   2531 };
   2532 </pre>
   2533 </div>
   2534 <p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
   2535   are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
   2536   values that are in the HASHES array, and is also the number of offsets
   2537   contained in the OFFSETS array. "header_data_len" specifies the size in
   2538   bytes of the HeaderData that is filled in by specialized versions of this
   2539   table.</p>
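<p>The byte order check that the "magic" value enables might look like the
  sketch below. It reads the 20 fixed header bytes, compares the magic against
  'HASH' (0x48415348), and byte swaps the remaining fields if the magic only
  matches when reversed. The names "FixedHeader" and "readHeader" are
  hypothetical, and the specialized HeaderData that follows is not parsed
  here.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t kHashMagic = 0x48415348;  // 'H' 'A' 'S' 'H'

// Fixed part of the header only (hypothetical in-memory form).
struct FixedHeader {
  uint32_t magic;
  uint16_t version;
  uint16_t hash_function;
  uint32_t bucket_count;
  uint32_t hashes_count;
  uint32_t header_data_len;
};

static uint16_t swap16(uint16_t v) {
  return static_cast<uint16_t>((v >> 8) | (v << 8));
}
static uint32_t swap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) |
         (v << 24);
}

// Returns false if the buffer is too small or the magic is not recognized.
// On success, `swapped` reports whether the table uses the opposite byte
// order from the host, and `out` holds the fields in host byte order.
bool readHeader(const unsigned char *data, size_t size, FixedHeader &out,
                bool &swapped) {
  if (size < 20)
    return false;
  std::memcpy(&out.magic, data, 4);
  std::memcpy(&out.version, data + 4, 2);
  std::memcpy(&out.hash_function, data + 6, 2);
  std::memcpy(&out.bucket_count, data + 8, 4);
  std::memcpy(&out.hashes_count, data + 12, 4);
  std::memcpy(&out.header_data_len, data + 16, 4);
  if (out.magic == kHashMagic) {
    swapped = false;
    return true;
  }
  if (swap32(out.magic) != kHashMagic)
    return false;
  swapped = true;
  out.magic = kHashMagic;
  out.version = swap16(out.version);
  out.hash_function = swap16(out.hash_function);
  out.bucket_count = swap32(out.bucket_count);
  out.hashes_count = swap32(out.hashes_count);
  out.header_data_len = swap32(out.header_data_len);
  return true;
}
```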
   2540 
   2541 <h5>Fixed Lookup</h5>
   2542 <p>The header is followed by the buckets, hashes, offsets, and hash value
   2543   data:</p>
   2544 <div class="doc_code">
   2545 <pre>
   2546 struct FixedTable
   2547 {
   2548   uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
   2549   uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
   2550   uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
   2551 };
   2552 </pre>
   2553 </div>
   2554 <p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
   2555   "hashes" array contains all of the 32 bit hash values for all names in the
   2556   hash table. Each hash in the "hashes" table has an offset in the "offsets"
   2557   array that points to the data for the hash value.</p>
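<p>On the producer side, the "buckets" and "hashes" arrays could be built from
  a list of unique hash values roughly as follows. This is a sketch under
  assumptions: the helper names are hypothetical, the order of hashes within
  one bucket is taken to be unimportant, and the "offsets" array is left out
  because its values depend on how the hash data is serialized.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Result of grouping hashes by bucket (hypothetical type).
struct FixedArrays {
  std::vector<uint32_t> buckets;  // first index into hashes per bucket
  std::vector<uint32_t> hashes;   // unique hashes, grouped by bucket
};

FixedArrays buildFixedArrays(std::vector<uint32_t> unique_hashes,
                             uint32_t bucket_count) {
  assert(bucket_count > 0);
  // Group hashes so that all hashes of one bucket are contiguous.
  std::stable_sort(unique_hashes.begin(), unique_hashes.end(),
                   [bucket_count](uint32_t a, uint32_t b) {
                     return a % bucket_count < b % bucket_count;
                   });
  FixedArrays out;
  // UINT32_MAX marks a bucket that no hash maps to.
  out.buckets.assign(bucket_count, UINT32_MAX);
  for (uint32_t i = 0; i < unique_hashes.size(); ++i) {
    const uint32_t bucket = unique_hashes[i] % bucket_count;
    if (out.buckets[bucket] == UINT32_MAX)
      out.buckets[bucket] = i;  // first hash that belongs to this bucket
  }
  out.hashes = std::move(unique_hashes);
  return out;
}
```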
   2558 
   2559 <p>This table setup makes it very easy to repurpose these tables to contain
   2560   different data, while keeping the lookup mechanism the same for all tables.
   2561   This layout also makes it possible to save the table to disk and map it in
   2562   later and do very efficient name lookups with little or no parsing.</p>
   2563 
   2564 <p>DWARF lookup tables can be implemented in a variety of ways and can store
   2565   a lot of information for each name. We want to make the DWARF tables
   2566   extensible and able to store the data efficiently so we have used some of the
   2567   DWARF features that enable efficient data storage to define exactly what kind
   2568   of data we store for each name.</p>
   2569 
   2570 <p>The "HeaderData" contains a definition of the contents of each HashData
   2571   chunk. We might want to store an offset to all of the debug information
   2572   entries (DIEs) for each name. To keep things extensible, we create a list of
   2573   items, or Atoms, that are contained in the data for each name. First comes the
   2574   type of the data in each atom:</p>
   2575 <div class="doc_code">
   2576 <pre>
   2577 enum AtomType
   2578 {
   2579   eAtomTypeNULL       = 0u,
   2580   eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
   2581   eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
   2582   eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
   2583   eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
   2584   eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
   2585 };
   2586 </pre>
   2587 </div>
   2588 <p>The enumeration values and their meanings are:</p>
   2589 <div class="doc_code">
   2590 <pre>
   2591   eAtomTypeNULL       - a termination atom that specifies the end of the atom list
   2592   eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
   2593   eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
   2594   eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
   2595   eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
   2596   eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
   2597 </pre>
   2598 </div>
   2599 <p>Then each Atom specifies its atom type and how the data for that atom
   2600   is encoded:</p>
   2601 <div class="doc_code">
   2602 <pre>
   2603 struct Atom
   2604 {
   2605   uint16_t type;  // AtomType enum value
   2606   uint16_t form;  // DWARF DW_FORM_XXX defines
   2607 };
   2608 </pre>
   2609 </div>
   2610 <p>The "form" type above is from the DWARF specification and defines the
   2611   exact encoding of the data for the Atom type. See the DWARF specification for
   2612   the DW_FORM_ definitions.</p>
   2613 <div class="doc_code">
   2614 <pre>
   2615 struct HeaderData
   2616 {
   2617   uint32_t die_offset_base;
   2618   uint32_t atom_count;
   2619   Atom     atoms[atom_count];
   2620 };
   2621 </pre>
   2622 </div>
   2623 <p>"HeaderData" defines the base DIE offset that should be added to any atoms
   2624   that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
   2625   DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in
   2626   each "HashData" object -- Atom.form tells us how large each field will be in
   2627   the HashData and the Atom.type tells us how this data should be interpreted.</p>
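<p>To illustrate how Atom.form determines the size of each field, the sketch
  below sums the sizes of the atoms in a HashData entry. It handles only a
  subset of fixed-size forms; the DW_FORM_ constants are the values from the
  DWARF specification, while the function names are hypothetical. Variable
  length forms such as DW_FORM_ref_udata are reported as unsupported rather
  than guessed at.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fixed-size DW_FORM_* values from the DWARF specification (subset).
enum : uint16_t {
  DW_FORM_data2 = 0x05,
  DW_FORM_data4 = 0x06,
  DW_FORM_data8 = 0x07,
  DW_FORM_data1 = 0x0b,
  DW_FORM_ref1 = 0x11,
  DW_FORM_ref2 = 0x12,
  DW_FORM_ref4 = 0x13,
  DW_FORM_ref8 = 0x14,
};

struct Atom {
  uint16_t type;  // AtomType enum value
  uint16_t form;  // DW_FORM_* value
};

// Byte size of a fixed-size form, or 0 if this sketch does not handle it.
uint32_t formSize(uint16_t form) {
  switch (form) {
  case DW_FORM_data1: case DW_FORM_ref1: return 1;
  case DW_FORM_data2: case DW_FORM_ref2: return 2;
  case DW_FORM_data4: case DW_FORM_ref4: return 4;
  case DW_FORM_data8: case DW_FORM_ref8: return 8;
  default: return 0;
  }
}

// Total byte size of one HashData entry, or 0 if any atom is unsupported.
uint32_t hashDataSize(const std::vector<Atom> &atoms) {
  uint32_t total = 0;
  for (const Atom &atom : atoms) {
    const uint32_t size = formSize(atom.form);
    if (size == 0)
      return 0;
    total += size;
  }
  return total;
}
```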
   2628 
   2629 <p>For the current implementations of the ".apple_names" (all functions + globals),
   2630   the ".apple_types" (names of all types that are defined), and the
   2631   ".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p>
   2632 <div class="doc_code">
   2633 <pre>
   2634 HeaderData.atom_count = 1;
   2635 HeaderData.atoms[0].type = eAtomTypeDIEOffset;
   2636 HeaderData.atoms[0].form = DW_FORM_data4;
   2637 </pre>
   2638 </div>
   2639 <p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
   2640   encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
   2641   multiple matching DIEs in a single file, which could come up with an inlined
   2642   function for instance. Future tables could include more information about the
   2643   DIE such as flags indicating if the DIE is a function, method, block,
   2644   or inlined.</p>
   2645 
   2646 <p>The KeyType for the DWARF table is a 32 bit string table offset into the
   2647   ".debug_str" table. The ".debug_str" is the string table for the DWARF which
   2648   may already contain copies of all of the strings. This helps make sure, with
   2649   help from the compiler, that we reuse the strings between all of the DWARF
   2650   sections and keeps the hash table size down. Another benefit to having the
   2651   compiler generate all strings as DW_FORM_strp in the debug info, is that
   2652   DWARF parsing can be made much faster.</p>
   2653 
   2654 <p>After a lookup is made, we get an offset into the hash data. The hash data
   2655   needs to be able to deal with 32 bit hash collisions, so the chunk of data
   2656   at the offset in the hash data consists of a triple:</p>
   2657 <div class="doc_code">
   2658 <pre>
   2659 uint32_t str_offset
   2660 uint32_t hash_data_count
   2661 HashData[hash_data_count]
   2662 </pre>
   2663 </div>
   2664 <p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
   2665   hash data chunks contain a single item (no 32 bit hash collision):</p>
   2666 <div class="doc_code">
   2667 <pre>
   2668 .------------.
   2669 | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
   2670 | 0x00000004 | uint32_t HashData count
   2671 | 0x........ | uint32_t HashData[0] DIE offset
   2672 | 0x........ | uint32_t HashData[1] DIE offset
   2673 | 0x........ | uint32_t HashData[2] DIE offset
   2674 | 0x........ | uint32_t HashData[3] DIE offset
   2675 | 0x00000000 | uint32_t KeyType (end of hash chain)
   2676 `------------'
   2677 </pre>
   2678 </div>
   2679 <p>If there are collisions, you will have multiple valid string offsets:</p>
   2680 <div class="doc_code">
   2681 <pre>
   2682 .------------.
   2683 | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
   2684 | 0x00000004 | uint32_t HashData count
   2685 | 0x........ | uint32_t HashData[0] DIE offset
   2686 | 0x........ | uint32_t HashData[1] DIE offset
   2687 | 0x........ | uint32_t HashData[2] DIE offset
   2688 | 0x........ | uint32_t HashData[3] DIE offset
   2689 | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
   2690 | 0x00000002 | uint32_t HashData count
   2691 | 0x........ | uint32_t HashData[0] DIE offset
   2692 | 0x........ | uint32_t HashData[1] DIE offset
   2693 | 0x00000000 | uint32_t KeyType (end of hash chain)
   2694 `------------'
   2695 </pre>
   2696 </div>
   2697 <p>Current testing with real world C++ binaries has shown that there is around
   2698   one 32 bit hash collision per 100,000 name entries.</p>
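<p>Resolving a collision means walking the chunk and comparing the actual
  strings. The sketch below assumes the current atom setup, where each
  HashData is a single 32 bit DIE offset, and that the hash data has already
  been decoded into host-order 32 bit words. The function name and parameters
  are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// words:     hash data as 32 bit words (assumed already in host byte order)
// start:     index of the first word of the chunk found via OFFSETS
// debug_str: raw contents of .debug_str (NUL-separated strings)
// Returns the DIE offsets stored for `name`, or an empty vector if the
// chain ends (str_offset == 0) without a matching string.
std::vector<uint32_t> findDieOffsets(const std::vector<uint32_t> &words,
                                     size_t start,
                                     const std::string &debug_str,
                                     const std::string &name) {
  size_t i = start;
  while (i + 1 < words.size()) {
    const uint32_t str_offset = words[i];
    if (str_offset == 0)
      break;  // end of hash chain
    const uint32_t count = words[i + 1];
    const size_t first = i + 2;
    if (count > words.size() - first)
      break;  // malformed chunk: count runs past the data
    if (str_offset < debug_str.size() &&
        name == debug_str.c_str() + str_offset) {
      return std::vector<uint32_t>(words.begin() + first,
                                   words.begin() + first + count);
    }
    i = first + count;  // same hash, different string: keep walking
  }
  return std::vector<uint32_t>();
}
```

<p>In the common case with no collision, the first string comparison succeeds
  and the loop body runs once.</p>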
   2699 </div>
   2700 <!-- ======================================================================= -->
   2701 <h4>
   2702   <a name="acceltablecontents">Contents</a>
   2703 </h4>
   2704 <!-- ======================================================================= -->
   2705 <div>
   2706 <p>As we said, we want to strictly define exactly what is included in the
   2707   different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
   2708   and ".apple_namespaces".</p>
   2709 
   2710 <p>".apple_names" sections should contain an entry for each DWARF DIE whose
   2711   DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
   2712   has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
   2713   DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
   2714   in the location (global and static variables). All global and static variables
   2715   should be included, including those scoped within functions and classes. For
   2716   example using the following code:</p>
   2717 <div class="doc_code">
   2718 <pre>
   2719 static int var = 0;
   2720 
   2721 void f ()
   2722 {
   2723   static int var = 0;
   2724 }
   2725 </pre>
   2726 </div>
   2727 <p>Both of the static "var" variables would be included in the table. All
   2728   functions should emit both their full names and their basenames. For C or C++,
   2729   the full name is the mangled name (if available) which is usually in the
   2730   DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function
   2731   basename. If global or static variables have a mangled name in a
   2732   DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
   2733   simple name found in the DW_AT_name attribute.</p>
   2734 
   2735 <p>".apple_types" sections should contain an entry for each DWARF DIE whose
   2736   tag is one of:</p>
   2737 <ul>
   2738   <li>DW_TAG_array_type</li>
   2739   <li>DW_TAG_class_type</li>
   2740   <li>DW_TAG_enumeration_type</li>
   2741   <li>DW_TAG_pointer_type</li>
   2742   <li>DW_TAG_reference_type</li>
   2743   <li>DW_TAG_string_type</li>
   2744   <li>DW_TAG_structure_type</li>
   2745   <li>DW_TAG_subroutine_type</li>
   2746   <li>DW_TAG_typedef</li>
   2747   <li>DW_TAG_union_type</li>
   2748   <li>DW_TAG_ptr_to_member_type</li>
   2749   <li>DW_TAG_set_type</li>
   2750   <li>DW_TAG_subrange_type</li>
   2751   <li>DW_TAG_base_type</li>
   2752   <li>DW_TAG_const_type</li>
   2753   <li>DW_TAG_constant</li>
   2754   <li>DW_TAG_file_type</li>
   2755   <li>DW_TAG_namelist</li>
   2756   <li>DW_TAG_packed_type</li>
   2757   <li>DW_TAG_volatile_type</li>
   2758   <li>DW_TAG_restrict_type</li>
   2759   <li>DW_TAG_interface_type</li>
   2760   <li>DW_TAG_unspecified_type</li>
   2761   <li>DW_TAG_shared_type</li>
   2762 </ul>
   2763 <p>Only entries with a DW_AT_name attribute are included, and the entry must
   2764   not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
   2765   For example, using the following code:</p>
   2766 <div class="doc_code">
   2767 <pre>
   2768 int main ()
   2769 {
   2770   int *b = 0;
   2771   return *b;
   2772 }
   2773 </pre>
   2774 </div>
   2775 <p>We get a few type DIEs:</p>
   2776 <div class="doc_code">
   2777 <pre>
   2778 0x00000067:     TAG_base_type [5]
   2779                 AT_encoding( DW_ATE_signed )
   2780                 AT_name( "int" )
   2781                 AT_byte_size( 0x04 )
   2782 
   2783 0x0000006e:     TAG_pointer_type [6]
   2784                 AT_type( {0x00000067} ( int ) )
   2785                 AT_byte_size( 0x08 )
   2786 </pre>
   2787 </div>
   2788 <p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p>
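<p>The inclusion rule for ".apple_types" can be expressed as a small predicate.
  This is a sketch only: "DieInfo" is a hypothetical summary of a DIE, and the
  switch lists just a handful of the tags from the list above (with their
  values from the DWARF specification) to keep the example short.</p>

```cpp
#include <cassert>
#include <cstdint>

// A few DW_TAG_* values from the DWARF specification (subset).
enum : uint16_t {
  DW_TAG_array_type = 0x01,
  DW_TAG_class_type = 0x02,
  DW_TAG_enumeration_type = 0x04,
  DW_TAG_pointer_type = 0x0f,
  DW_TAG_structure_type = 0x13,
  DW_TAG_typedef = 0x16,
  DW_TAG_union_type = 0x17,
  DW_TAG_base_type = 0x24,
  DW_TAG_variable = 0x34,
};

// Hypothetical summary of the attributes that matter for this decision.
struct DieInfo {
  uint16_t tag;
  bool has_name;        // DIE has a DW_AT_name attribute
  bool is_declaration;  // DW_AT_declaration is present and non-zero
};

bool belongsInAppleTypes(const DieInfo &die) {
  switch (die.tag) {
  case DW_TAG_array_type:
  case DW_TAG_class_type:
  case DW_TAG_enumeration_type:
  case DW_TAG_pointer_type:
  case DW_TAG_structure_type:
  case DW_TAG_typedef:
  case DW_TAG_union_type:
  case DW_TAG_base_type:
    break;  // a type tag; the remaining tags from the list would go here
  default:
    return false;
  }
  // Unnamed types and forward declarations are skipped.
  return die.has_name && !die.is_declaration;
}
```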
   2789 
   2790 <p>The ".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If
   2791   we run into a namespace that has no name, it is an anonymous namespace,
   2792   and the name should be output as "(anonymous namespace)" (without the quotes).
   2793   Why? This matches the output of the abi::__cxa_demangle() function in the standard
   2794   C++ library, which demangles mangled names.</p>
   2795 </div>
   2796 
   2797 <!-- ======================================================================= -->
   2798 <h4>
   2799   <a name="acceltableextensions">Language Extensions and File Format Changes</a>
   2800 </h4>
   2801 <!-- ======================================================================= -->
   2802 <div>
   2803 <h5>Objective-C Extensions</h5>
   2804 <p>The ".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
   2805   Objective-C class. The name used in the hash table is the name of the
   2806   Objective-C class itself. If the Objective-C class has a category, then an
   2807   entry is made for both the class name without the category, and for the class
   2808   name with the category. So if we have a DIE at offset 0x1234 with a name
   2809   of method "-[NSString(my_additions) stringWithSpecialString:]", we would add
   2810   an entry for "NSString" that points to DIE 0x1234, and an entry for
   2811   "NSString(my_additions)" that points to 0x1234. This allows us to quickly
   2812   track down all Objective-C methods for an Objective-C class when doing
   2813   expressions. It is needed because of the dynamic nature of Objective-C where
   2814   anyone can add methods to a class. The DWARF for Objective-C methods is also
   2815   emitted differently from C++ classes: the methods are not usually
   2816   contained in the class definition but are scattered across one or more
   2817   compile units. Categories can also be defined in different shared libraries.
   2818   So we need to be able to quickly find all of the methods and class functions
   2819   given the Objective-C class name, or quickly find all methods and class
   2820   functions for a class + category name. This table does not contain any selector
   2821   names; it just maps Objective-C class names (or class names + category) to all
   2822   of the methods and class functions. The selectors are added as function
   2823   basenames in the ".apple_names" section.</p>
   2824 
   2825 <p>In the ".apple_names" section for Objective-C functions, the full name is the
   2826   entire function name with the brackets ("-[NSString stringWithCString:]") and the
   2827   basename is the selector only ("stringWithCString:").</p>
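<p>The names derived from an Objective-C method name can be sketched as below.
  Given a full name such as "-[NSString(my_additions) stringWithSpecialString:]",
  the function extracts the class name, the class name with category (if any),
  and the selector. The struct and function names are hypothetical, and the
  parsing assumes the simple "-[Class(Category) selector]" shape shown in the
  text.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Names derived from one Objective-C method name (hypothetical type).
struct ObjCNames {
  std::string class_name;           // key for ".apple_objc"
  std::string class_with_category;  // second ".apple_objc" key, may be empty
  std::string selector;             // basename for ".apple_names"
};

// Returns false if `full` does not look like "-[Class sel]" or "+[Class sel]".
bool parseObjCMethodName(const std::string &full, ObjCNames &out) {
  if (full.size() < 5 || (full[0] != '-' && full[0] != '+') ||
      full[1] != '[' || full.back() != ']')
    return false;
  const size_t space = full.find(' ', 2);
  if (space == std::string::npos || space == 2)
    return false;  // no selector part, or empty class name
  const std::string cls = full.substr(2, space - 2);
  const std::string sel = full.substr(space + 1, full.size() - space - 2);
  if (sel.empty())
    return false;
  const size_t paren = cls.find('(');
  out.class_name = cls.substr(0, paren);  // whole string if no category
  out.class_with_category =
      (paren == std::string::npos) ? std::string() : cls;
  out.selector = sel;
  return true;
}
```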
   2828 
   2829 <h5>Mach-O Changes</h5>
   2830 <p>The section names given above for the Apple hash tables are for non-Mach-O files. For
   2831   Mach-O files, the sections should be contained in the "__DWARF" segment with
   2832   names as follows:</p>
   2833 <ul>
   2834   <li>".apple_names" -&gt; "__apple_names"</li>
   2835   <li>".apple_types" -&gt; "__apple_types"</li>
   2836   <li>".apple_namespaces" -&gt; "__apple_namespac" (16 character limit)</li>
   2837   <li>".apple_objc" -&gt; "__apple_objc"</li>
   2838 </ul>
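<p>The mapping in the list above follows a simple pattern: replace the leading
  "." with "__" and truncate to 16 characters. A minimal sketch, with a
  hypothetical function name:</p>

```cpp
#include <cassert>
#include <string>

// Maps a generic section name such as ".apple_names" to its Mach-O form.
// Mach-O section names are limited to 16 characters, so longer names are cut.
std::string machOSectionName(const std::string &name) {
  assert(!name.empty() && name[0] == '.');
  std::string result = "__" + name.substr(1);
  if (result.size() > 16)
    result.resize(16);
  return result;
}
```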
   2839 </div>
   2840 </div>
   2841 </div>
   2842 
   2843 <!-- *********************************************************************** -->
   2844 
   2845 <hr>
   2846 <address>
   2847   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
   2848   src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
   2849   <a href="http://validator.w3.org/check/referer"><img
   2850   src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
   2851 
   2852   <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br>
   2853   <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
   2854   Last modified: $Date$
   2855 </address>
   2856 
   2857 </body>
   2858 </html>
   2859