================================
Source Level Debugging with LLVM
================================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM.  It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information.  Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here.  The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler.  No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level-language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats.  This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects.  The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information.  As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process.  This meta information provides an LLVM
user a relationship between generated code and the original program source
code.

Currently, there are two backend consumers of debug info: DwarfDebug and
CodeViewDebug. DwarfDebug produces DWARF suitable for use with GDB, LLDB, and
other DWARF-based debuggers. :ref:`CodeViewDebug <codeview>` produces CodeView,
the Microsoft debug info format, which is usable with Microsoft debuggers such
as Visual Studio and WinDBG. LLVM's debug information format is mostly derived
from and inspired by DWARF, but it is feasible to translate into other target
debug info formats such as STABS.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.

.. _intro_debugopt:

Debugging optimized code
------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis.  In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run, and without any modification to the
  optimizations themselves.  However, some optimizations may impact the
  ability to modify the current state of the program with a debugger, such
  as setting program variables, or calling functions that have been
  deleted.

* As desired, LLVM optimizations can be upgraded to be aware of the LLVM
  debugging information, allowing them to update the debugging information
  as they perform aggressive optimizations.  This means that, with effort,
  the LLVM optimizers could optimize debug code just as well as non-debug
  code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities.  For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger.  Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions that were optimized out of the program, or inlined away
completely.

The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
the optimizer's handling of debugging information.  It can be run like this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This tests the impact of debugging information on optimization passes.  If
debugging information influences optimization passes, it is reported as a
failure.  See :doc:`TestingGuide` for more information on the LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information.  In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g. DWARF/Stabs/etc).  It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc: this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum.  The only common features that the LLVM debugger assumes
exist are `source files <LangRef.html#difile>`_, and `program objects
<LangRef.html#diglobalvariable>`_.  These abstract objects are used by a
debugger to form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language.  :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors are `specialized metadata nodes
<LangRef.html#specialized-metadata>`_, first-class subclasses of ``Metadata``.

.. _format_common_intrinsics:

Debugger intrinsic functions
----------------------------

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
provide debug information at various points in generated code.

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the alloca for the variable.  The second
argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
description of the variable.  The third argument is a `complex expression
<LangRef.html#diexpression>`_.

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, i64, metadata, metadata)

This intrinsic provides information when a user source variable is set to a new
value.  The first argument is the new value (wrapped as metadata).  The second
argument is the offset in the user source variable where the new value is
written.  The third argument is a `local variable
<LangRef.html#dilocalvariable>`_ containing a description of the variable.  The
fourth argument is a `complex expression <LangRef.html#diexpression>`_.

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function.  In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in.  In functional languages, values are only
readable after they have been defined.  Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information.  Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X;
  7.    }
  8.    X = Y;
  9.  }

Compiled to LLVM, this function would be represented like this:

.. code-block:: llvm

  ; Function Attrs: nounwind ssp uwtable
  define void @foo() #0 !dbg !4 {
  entry:
    %X = alloca i32, align 4
    %Y = alloca i32, align 4
    %Z = alloca i32, align 4
    call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    store i32 21, i32* %X, align 4, !dbg !14
    call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
    store i32 22, i32* %Y, align 4, !dbg !16
    call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    store i32 23, i32* %Z, align 4, !dbg !19
    %0 = load i32, i32* %X, align 4, !dbg !20
    store i32 %0, i32* %Z, align 4, !dbg !21
    %1 = load i32, i32* %Y, align 4, !dbg !22
    store i32 %1, i32* %X, align 4, !dbg !23
    ret void, !dbg !24
  }

  ; Function Attrs: nounwind readnone
  declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

  attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
  attributes #1 = { nounwind readnone }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!7, !8, !9}
  !llvm.ident = !{!10}

  !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2)
  !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info")
  !2 = !{}
  !3 = !{!4}
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, variables: !2)
  !5 = !DISubroutineType(types: !6)
  !6 = !{null}
  !7 = !{i32 2, !"Dwarf Version", i32 2}
  !8 = !{i32 2, !"Debug Info Version", i32 3}
  !9 = !{i32 1, !"PIC Level", i32 2}
  !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
  !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
  !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
  !13 = !DIExpression()
  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
  !16 = !DILocation(line: 3, column: 9, scope: !4)
  !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)
  !20 = !DILocation(line: 6, column: 11, scope: !18)
  !21 = !DILocation(line: 6, column: 9, scope: !18)
  !22 = !DILocation(line: 8, column: 9, scope: !4)
  !23 = !DILocation(line: 8, column: 7, scope: !4)
  !24 = !DILocation(line: 9, column: 3, scope: !4)


This example illustrates a few important details about LLVM debugging
information.  In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
    ; [debug line = 2:9] [debug variable = X]

The first ``llvm.dbg.declare`` intrinsic encodes debugging information for the
variable ``X``.  The metadata ``!dbg !14`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: llvm

  !14 = !DILocation(line: 2, column: 9, scope: !4)
  !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
                              isLocal: false, isDefinition: true, scopeLine: 1,
                              isOptimized: false, variables: !2)

Here ``!14`` is metadata providing `location information
<LangRef.html#dilocation>`_.  In this example, scope is encoded by ``!4``, a
`subprogram descriptor <LangRef.html#disubprogram>`_.  This way the location
information attached to the intrinsics indicates that the variable ``X`` is
declared at line number 2 at a function level scope in function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
    ; [debug line = 5:11] [debug variable = Z]

The third ``llvm.dbg.declare`` intrinsic encodes debugging information for
variable ``Z``.  The metadata ``!dbg !19`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: llvm

  !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
  !19 = !DILocation(line: 5, column: 11, scope: !18)

Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!18``.  The lexical scope itself resides
inside of subprogram ``!4`` described above.

The scope information attached with each instruction provides a straightforward
way to find instructions covered by a scope.
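
The scope lookup can be sketched in plain C++.  This is only an illustration:
the ``Scope`` and ``Location`` structs and the helpers ``isCoveredBy`` and
``countCovered`` are hypothetical stand-ins for scope descriptors and
``DILocation``, not LLVM classes or APIs.  The sketch assumes each scope stores
a pointer to its parent, mirroring the ``scope:`` field in the metadata above.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical stand-in for a scope descriptor (subprogram or lexical block).
  struct Scope {
    std::string Name;
    const Scope *Parent; // nullptr for the outermost scope
  };

  // Hypothetical stand-in for the location attached to one instruction.
  struct Location {
    unsigned Line;
    unsigned Column;
    const Scope *S;
  };

  // Returns true if Loc's scope is Target itself or is nested inside Target.
  bool isCoveredBy(const Location &Loc, const Scope &Target) {
    for (const Scope *S = Loc.S; S != nullptr; S = S->Parent)
      if (S == &Target)
        return true;
    return false;
  }

  // Counts how many of the given locations are covered by Target.
  unsigned countCovered(const std::vector<Location> &Locs, const Scope &Target) {
    unsigned N = 0;
    for (const Location &L : Locs)
      if (isCoveredBy(L, Target))
        ++N;
    return N;
  }

Applied to the example, a location whose scope is the lexical block ``!18`` is
covered by both ``!18`` and the enclosing subprogram ``!4``, while a location
scoped directly to ``!4`` is not covered by ``!18``.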

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a format
that is effectively identical to `DWARF 3.0
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
content.  This allows code generators to trivially support native debuggers by
generating standard DWARF information, and contains enough information for
non-DWARF targets to translate it as needed.

This section describes the forms used to represent C and C++ programs.  Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of a few C/C++ constructs and the debug
information that would best describe those constructs.  The canonical
references are the ``DIDescriptor`` classes defined in
``include/llvm/IR/DebugInfo.h`` and the implementations of the helper functions
in ``lib/IR/DIBuilder.cpp``.

C/C++ source file information
-----------------------------

``llvm::Instruction`` provides easy access to metadata attached with an
instruction.  One can extract line number information encoded in LLVM IR using
``Instruction::getDebugLoc()`` and ``DILocation::getLine()``.

.. code-block:: c++

  if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
    unsigned Line = Loc->getLine();
    StringRef File = Loc->getFilename();
    StringRef Dir = Loc->getDirectory();
  }

C/C++ global variable information
---------------------------------

Given an integer global variable declared as follows:

.. code-block:: c

  int MyGlobal = 100;

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the global itself.
  ;;
  @MyGlobal = global i32 100, align 4

  ;;
  ;; List of debug info of globals
  ;;
  !llvm.dbg.cu = !{!0}

  ;; Some unrelated metadata.
  !llvm.module.flags = !{!6, !7}

  ;; Define the compile unit.
  !0 = !DICompileUnit(language: DW_LANG_C99, file: !1,
                      producer:
                      "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)",
                      isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug,
                      enums: !2, retainedTypes: !2, subprograms: !2, globals:
                      !3, imports: !2)

  ;;
  ;; Define the file
  ;;
  !1 = !DIFile(filename: "/dev/stdin",
               directory: "/Users/dexonsmith/data/llvm/debug-info")

  ;; An empty array.
  !2 = !{}

  ;; The Array of Global Variables
  !3 = !{!4}

  ;;
  ;; Define the global variable itself.
  ;;
  !4 = !DIGlobalVariable(name: "MyGlobal", scope: !0, file: !1, line: 1,
                         type: !5, isLocal: false, isDefinition: true,
                         variable: i32* @MyGlobal)

  ;;
  ;; Define the type
  ;;
  !5 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)

  ;; Dwarf version to output.
  !6 = !{i32 2, !"Dwarf Version", i32 2}

  ;; Debug info schema version.
  !7 = !{i32 2, !"Debug Info Version", i32 3}

C/C++ function information
--------------------------

Given a function declared as follows:

.. code-block:: c

  int main(int argc, char *argv[]) {
    return 0;
  }

a C/C++ front-end would generate the following descriptors:

.. code-block:: llvm

  ;;
  ;; Define the anchor for subprograms.
  ;;
  !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5,
                     isLocal: false, isDefinition: true, scopeLine: 1,
                     flags: DIFlagPrototyped, isOptimized: false,
                     variables: !2)

  ;;
  ;; Define the subprogram itself.
  ;;
  define i32 @main(i32 %argc, i8** %argv) !dbg !4 {
  ...
  }

Debugging information format
============================

Debugging Information Extension for Objective C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

Objective C provides a simpler way to declare and define accessor methods using
declared properties.  The language provides features to declare a property and
to let the compiler synthesize accessor methods.

The debugger lets developers inspect Objective C interfaces and their instance
variables and class variables.  However, the debugger does not know anything
about the properties defined in Objective C interfaces.  The debugger consumes
information generated by the compiler in DWARF format.  The format does not
support encoding of Objective C properties.  This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.

Proposal
^^^^^^^^

Objective C properties exist separately from class members.  A property can be
defined only by "setter" and "getter" selectors, and be calculated anew on each
access.  Or a property can just be a direct access to some declared ivar.
Finally it can have an ivar "automatically synthesized" for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property "dot" syntax, but
there is no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging these properties, we will add a new DWARF TAG into the
``DW_TAG_structure_type`` definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property.  And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x00000110:   TAG_APPLE_property
                  AT_name ( "p1" )
                  AT_type ( {0x00000150} ( int ) )

  0x00000120:   TAG_APPLE_property
                  AT_name ( "p2" )
                  AT_type ( {0x00000150} ( int ) )

  0x00000130:   TAG_member [8]
                  AT_name( "_p1" )
                  AT_APPLE_property ( {0x00000110} "p1" )
                  AT_type( {0x00000150} ( int ) )
                  AT_artificial ( 0x1 )

  0x00000140:   TAG_member [8]
                  AT_name( "n2" )
                  AT_APPLE_property ( {0x00000120} "p2" )
                  AT_type( {0x00000150} ( int ) )

  0x00000150:  AT_type( ( int ) )

Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as is shown in the example.  But we actually
don't need to know this convention, since we are given the name of the ivar
directly.
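
A consumer could therefore resolve the backing ivar by following the
``AT_APPLE_property`` reference instead of guessing a name.  The following is a
minimal sketch of that idea in plain C++; ``MemberDIE`` and ``findBackingIvar``
are hypothetical names invented for illustration and do not correspond to any
real DWARF-parsing API.

.. code-block:: c++

  #include <cstdint>
  #include <string>
  #include <vector>

  // Hypothetical, simplified view of a DW_TAG_member DIE.
  struct MemberDIE {
    std::string Name;          // AT_name
    bool HasProperty;          // true if AT_APPLE_property is present
    std::uint32_t PropertyRef; // DIE offset that AT_APPLE_property points at
    bool Artificial;           // AT_artificial
  };

  // Returns the member whose AT_APPLE_property refers to the property DIE at
  // PropertyOffset, or nullptr if no member backs that property.
  const MemberDIE *findBackingIvar(const std::vector<MemberDIE> &Members,
                                   std::uint32_t PropertyOffset) {
    for (const MemberDIE &M : Members)
      if (M.HasProperty && M.PropertyRef == PropertyOffset)
        return &M;
    return nullptr;
  }

With the members from the dump above, looking up offset ``0x00000120`` (property
``p2``) yields the member named ``n2``, which does not follow the underscore
convention.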

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation``, e.g. to provide a read-only
property in the interface, and a read-write property in the implementation.
In that case, the compiler should emit whichever property declaration will be
in force in the current translation unit.

    581 Developers can decorate a property with attributes which are encoded using
    582 ``DW_AT_APPLE_property_attribute``.
    583 
    584 .. code-block:: objc
    585 
    586   @property (readonly, nonatomic) int pr;
    587 
    588 .. code-block:: none
    589 
    590   TAG_APPLE_property [8]
    591     AT_name( "pr" )
    592     AT_type ( {0x00000147} (int) )
    593     AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
    594 
    595 The setter and getter method names are attached to the property using
    596 ``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.
    597 
    598 .. code-block:: objc
    599 
    600   @interface I1
    601   @property (setter=myOwnP3Setter:) int p3;
    602   -(void)myOwnP3Setter:(int)a;
    603   @end
    604 
    605   @implementation I1
    606   @synthesize p3;
    607   -(void)myOwnP3Setter:(int)a{ }
    608   @end
    609 
    610 The DWARF for this would be:
    611 
    612 .. code-block:: none
    613 
    614   0x000003bd: TAG_structure_type [7] *
    615                 AT_APPLE_runtime_class( 0x10 )
    616                 AT_name( "I1" )
    617                 AT_decl_file( "Objc_Property.m" )
    618                 AT_decl_line( 3 )
    619 
    620   0x000003cd      TAG_APPLE_property
    621                     AT_name ( "p3" )
    622                     AT_APPLE_property_setter ( "myOwnP3Setter:" )
    623                     AT_type( {0x00000147} ( int ) )
    624 
    625   0x000003f3:     TAG_member [8]
    626                     AT_name( "_p3" )
    627                     AT_type ( {0x00000147} ( int ) )
    628                     AT_APPLE_property ( {0x000003cd} )
    629                     AT_artificial ( 0x1 )
    630 
    631 New DWARF Tags
    632 ^^^^^^^^^^^^^^
    633 
    634 +-----------------------+--------+
    635 | TAG                   | Value  |
    636 +=======================+========+
    637 | DW_TAG_APPLE_property | 0x4200 |
    638 +-----------------------+--------+
    639 
    640 New DWARF Attributes
    641 ^^^^^^^^^^^^^^^^^^^^
    642 
    643 +--------------------------------+--------+-----------+
    644 | Attribute                      | Value  | Classes   |
    645 +================================+========+===========+
    646 | DW_AT_APPLE_property           | 0x3fed | Reference |
    647 +--------------------------------+--------+-----------+
    648 | DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
    649 +--------------------------------+--------+-----------+
    650 | DW_AT_APPLE_property_setter    | 0x3fea | String    |
    651 +--------------------------------+--------+-----------+
    652 | DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
    653 +--------------------------------+--------+-----------+
    654 
    655 New DWARF Constants
    656 ^^^^^^^^^^^^^^^^^^^
    657 
    658 +--------------------------------------+-------+
    659 | Name                                 | Value |
    660 +======================================+=======+
    661 | DW_APPLE_PROPERTY_readonly           | 0x01  |
    662 +--------------------------------------+-------+
    663 | DW_APPLE_PROPERTY_getter             | 0x02  |
    664 +--------------------------------------+-------+
    665 | DW_APPLE_PROPERTY_assign             | 0x04  |
    666 +--------------------------------------+-------+
    667 | DW_APPLE_PROPERTY_readwrite          | 0x08  |
    668 +--------------------------------------+-------+
    669 | DW_APPLE_PROPERTY_retain             | 0x10  |
    670 +--------------------------------------+-------+
    671 | DW_APPLE_PROPERTY_copy               | 0x20  |
    672 +--------------------------------------+-------+
    673 | DW_APPLE_PROPERTY_nonatomic          | 0x40  |
    674 +--------------------------------------+-------+
    675 | DW_APPLE_PROPERTY_setter             | 0x80  |
    676 +--------------------------------------+-------+
    677 | DW_APPLE_PROPERTY_atomic             | 0x100 |
    678 +--------------------------------------+-------+
    679 | DW_APPLE_PROPERTY_weak               | 0x200 |
    680 +--------------------------------------+-------+
    681 | DW_APPLE_PROPERTY_strong             | 0x400 |
    682 +--------------------------------------+-------+
    683 | DW_APPLE_PROPERTY_unsafe_unretained  | 0x800 |
    684 +--------------------------------------+-------+
    685 | DW_APPLE_PROPERTY_nullability        | 0x1000|
    686 +--------------------------------------+-------+
    687 | DW_APPLE_PROPERTY_null_resettable    | 0x2000|
    688 +--------------------------------------+-------+
    689 | DW_APPLE_PROPERTY_class              | 0x4000|
    690 +--------------------------------------+-------+
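The constants above are distinct powers of two, which suggests that a
``DW_AT_APPLE_property_attribute`` value is a bitwise OR of them.  Under that
assumption, a consumer could test individual bits as in the minimal sketch
below.  The enum copies a subset of the table, and the helper name
``property_has`` is hypothetical and not part of LLVM.

.. code-block:: c

  #include <stdint.h>

  /* Subset of the DW_APPLE_PROPERTY_* constants from the table above. */
  enum {
    DW_APPLE_PROPERTY_readonly  = 0x01,
    DW_APPLE_PROPERTY_getter    = 0x02,
    DW_APPLE_PROPERTY_assign    = 0x04,
    DW_APPLE_PROPERTY_readwrite = 0x08,
    DW_APPLE_PROPERTY_retain    = 0x10,
    DW_APPLE_PROPERTY_copy      = 0x20,
    DW_APPLE_PROPERTY_nonatomic = 0x40,
    DW_APPLE_PROPERTY_setter    = 0x80
  };

  /* Hypothetical helper: returns 1 if `flag` is set in `attrs`, else 0. */
  static int property_has(uint32_t attrs, uint32_t flag)
  {
    return (attrs & flag) != 0;
  }
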
    691 
    692 Name Accelerator Tables
    693 -----------------------
    694 
    695 Introduction
    696 ^^^^^^^^^^^^
    697 
    698 The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
    699 debugger needs.  The "``pub``" in the section name indicates that the entries
    700 in the table are publicly visible names only.  This means no static or hidden
    701 functions show up in the "``.debug_pubnames``".  No static variables or private
    702 class variables are in the "``.debug_pubtypes``".  Many compilers add different
    703 things to these tables, so we can't rely upon the contents between gcc, icc, or
    704 clang.
    705 
The typical query given by users tends not to match up with the contents of
these tables.  For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only names in these tables for complex C++ entries are fully
qualified names.  Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``".  So the name entered in the name table must be demangled in
order to chop it up appropriately, and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.
    719 
All debuggers currently ignore the "``.debug_pubnames``" table because its
inconsistent and useless public-only name content makes it a waste of
space in the object file.  These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself, making the tables much larger than they need to be on disk, especially
for large C++ programs.
    727 
    728 Can't we just fix the sections by adding all of the names we need to this
    729 table? No, because that is not what the tables are defined to contain and we
    730 won't know the difference between the old bad tables and the new good tables.
    731 At best we could make our own renamed sections that contain all of the data we
    732 need.
    733 
    734 These tables are also insufficient for what a debugger like LLDB needs.  LLDB
    735 uses clang for its expression parsing where LLDB acts as a PCH.  LLDB is then
    736 often asked to look for type "``foo``" or namespace "``bar``", or list items in
    737 namespace "``baz``".  Namespaces are not included in the pubnames or pubtypes
    738 tables.  Since clang asks a lot of questions when it is parsing an expression,
    739 we need to be very fast when looking up names, as it happens a lot.  Having new
    740 accelerator tables that are optimized for very quick lookups will benefit this
    741 type of debugging experience greatly.
    742 
We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing.  We would also
like to be able to control the exact content of these different tables so they
contain exactly what we need.  The Name Accelerator Tables were designed to fix
these issues.  In order to solve these issues we need to:

* Have a format that can be mapped into memory from disk and used as is
* Make lookups very fast
* Use an extensible table format so these tables can be made by many producers
* Contain all of the names needed for typical lookups out of the box
* Enforce strict rules for the contents of tables
    754 
    755 Table size is important and the accelerator table format should allow the reuse
    756 of strings from common string tables so the strings for the names are not
    757 duplicated.  We also want to make sure the table is ready to be used as-is by
    758 simply mapping the table into memory with minimal header parsing.
    759 
    760 The name lookups need to be fast and optimized for the kinds of lookups that
    761 debuggers tend to do.  Optimally we would like to touch as few parts of the
    762 mapped table as possible when doing a name lookup and be able to quickly find
    763 the name entry we are looking for, or discover there are no matches.  In the
    764 case of debuggers we optimized for lookups that fail most of the time.
    765 
Each table that is defined should have strict, documented rules on exactly
what it contains, so that clients can rely on the content.
    768 
    769 Hash Tables
    770 ^^^^^^^^^^^
    771 
    772 Standard Hash Tables
    773 """"""""""""""""""""
    774 
    775 Typical hash tables have a header, buckets, and each bucket points to the
    776 bucket contents:
    777 
    778 .. code-block:: none
    779 
    780   .------------.
    781   |  HEADER    |
    782   |------------|
    783   |  BUCKETS   |
    784   |------------|
    785   |  DATA      |
    786   `------------'
    787 
    788 The BUCKETS are an array of offsets to DATA for each hash:
    789 
    790 .. code-block:: none
    791 
    792   .------------.
    793   | 0x00001000 | BUCKETS[0]
    794   | 0x00002000 | BUCKETS[1]
    795   | 0x00002200 | BUCKETS[2]
    796   | 0x000034f0 | BUCKETS[3]
    797   |            | ...
    798   | 0xXXXXXXXX | BUCKETS[n_buckets]
    799   '------------'
    800 
So for ``BUCKETS[3]`` in the example above, we have an offset into the table,
0x000034f0, which points to a chain of entries for the bucket.  Each entry in
the chain must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.
    805 
    806 .. code-block:: none
    807 
    808               .------------.
    809   0x000034f0: | 0x00003500 | next pointer
    810               | 0x12345678 | 32 bit hash
    811               | "erase"    | string value
    812               | data[n]    | HashData for this bucket
    813               |------------|
    814   0x00003500: | 0x00003550 | next pointer
    815               | 0x29273623 | 32 bit hash
    816               | "dump"     | string value
    817               | data[n]    | HashData for this bucket
    818               |------------|
    819   0x00003550: | 0x00000000 | next pointer
    820               | 0x82638293 | 32 bit hash
    821               | "main"     | string value
    822               | data[n]    | HashData for this bucket
    823               `------------'
    824 
The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present.  So
if we were to look up "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", which might match ``BUCKETS[3]``.  We would need to go to
the offset 0x000034f0 and start looking to see if our 32 bit hash matches.  To
do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next entry.  Each time we are skipping many bytes in memory and
touching new cache pages just to do the compare on the full 32 bit hash.  All
of these accesses then tell us that we didn't have a match.
    834 
    835 Name Hash Tables
    836 """"""""""""""""
    837 
    838 To solve the issues mentioned above we have structured the hash tables a bit
    839 differently: a header, buckets, an array of all unique 32 bit hash values,
    840 followed by an array of hash value data offsets, one for each hash value, then
    841 the data for all hash values:
    842 
    843 .. code-block:: none
    844 
    845   .-------------.
    846   |  HEADER     |
    847   |-------------|
    848   |  BUCKETS    |
    849   |-------------|
    850   |  HASHES     |
    851   |-------------|
    852   |  OFFSETS    |
    853   |-------------|
    854   |  DATA       |
    855   `-------------'
    856 
    857 The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array.  By
    858 making all of the full 32 bit hash values contiguous in memory, we allow
    859 ourselves to efficiently check for a match while touching as little memory as
    860 possible.  Most often checking the 32 bit hash values is as far as the lookup
    861 goes.  If it does match, it usually is a match with no collisions.  So for a
    862 table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
    863 values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
    864 ``OFFSETS`` as:
    865 
    866 .. code-block:: none
    867 
    868   .-------------------------.
    869   |  HEADER.magic           | uint32_t
    870   |  HEADER.version         | uint16_t
    871   |  HEADER.hash_function   | uint16_t
    872   |  HEADER.bucket_count    | uint32_t
    873   |  HEADER.hashes_count    | uint32_t
    874   |  HEADER.header_data_len | uint32_t
    875   |  HEADER_DATA            | HeaderData
    876   |-------------------------|
    877   |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
    878   |-------------------------|
    879   |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
    880   |-------------------------|
    881   |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
    882   |-------------------------|
    883   |  ALL HASH DATA          |
    884   `-------------------------'
    885 
    886 So taking the exact same data from the standard hash example above we end up
    887 with:
    888 
    889 .. code-block:: none
    890 
    891               .------------.
    892               | HEADER     |
    893               |------------|
    894               |          0 | BUCKETS[0]
    895               |          2 | BUCKETS[1]
    896               |          5 | BUCKETS[2]
    897               |          6 | BUCKETS[3]
    898               |            | ...
    899               |        ... | BUCKETS[n_buckets]
    900               |------------|
    901               | 0x........ | HASHES[0]
    902               | 0x........ | HASHES[1]
    903               | 0x........ | HASHES[2]
    904               | 0x........ | HASHES[3]
    905               | 0x........ | HASHES[4]
    906               | 0x........ | HASHES[5]
    907               | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
    908               | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
    909               | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
    910               | 0x........ | HASHES[9]
    911               | 0x........ | HASHES[10]
    912               | 0x........ | HASHES[11]
    913               | 0x........ | HASHES[12]
    914               | 0x........ | HASHES[13]
    915               | 0x........ | HASHES[n_hashes]
    916               |------------|
    917               | 0x........ | OFFSETS[0]
    918               | 0x........ | OFFSETS[1]
    919               | 0x........ | OFFSETS[2]
    920               | 0x........ | OFFSETS[3]
    921               | 0x........ | OFFSETS[4]
    922               | 0x........ | OFFSETS[5]
    923               | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
    924               | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
    925               | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
    926               | 0x........ | OFFSETS[9]
    927               | 0x........ | OFFSETS[10]
    928               | 0x........ | OFFSETS[11]
    929               | 0x........ | OFFSETS[12]
    930               | 0x........ | OFFSETS[13]
    931               | 0x........ | OFFSETS[n_hashes]
    932               |------------|
    933               |            |
    934               |            |
    935               |            |
    936               |            |
    937               |            |
    938               |------------|
    939   0x000034f0: | 0x00001203 | .debug_str ("erase")
    940               | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
    941               | 0x........ | HashData[0]
    942               | 0x........ | HashData[1]
    943               | 0x........ | HashData[2]
    944               | 0x........ | HashData[3]
    945               | 0x00000000 | String offset into .debug_str (terminate data for hash)
    946               |------------|
    947   0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
    948               | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
    949               | 0x........ | HashData[0]
    950               | 0x........ | HashData[1]
    951               | 0x00001203 | String offset into .debug_str ("dump")
    952               | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
    953               | 0x........ | HashData[0]
    954               | 0x........ | HashData[1]
    955               | 0x........ | HashData[2]
    956               | 0x00000000 | String offset into .debug_str (terminate data for hash)
    957               |------------|
    958   0x00003550: | 0x00001203 | String offset into .debug_str ("main")
    959               | 0x00000009 | A 32 bit array count - number of HashData with name "main"
    960               | 0x........ | HashData[0]
    961               | 0x........ | HashData[1]
    962               | 0x........ | HashData[2]
    963               | 0x........ | HashData[3]
    964               | 0x........ | HashData[4]
    965               | 0x........ | HashData[5]
    966               | 0x........ | HashData[6]
    967               | 0x........ | HashData[7]
    968               | 0x........ | HashData[8]
    969               | 0x00000000 | String offset into .debug_str (terminate data for hash)
    970               `------------'
    971 
So we still have all of the same data, we just organize it more efficiently for
debugger lookup.  If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
hash value modulo ``n_buckets``.  ``BUCKETS[3]`` contains "6", which is the
index into the ``HASHES`` table.  We would then compare consecutive 32 bit
hash values in the ``HASHES`` array for as long as the hashes belong to
``BUCKETS[3]``.  We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3.  In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match.  We don't end up marching through
multiple words of memory, and we keep the number of processor data cache
lines being accessed as small as possible.
    984 
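The lookup just described can be sketched roughly as follows.  This is an
illustration under stated assumptions, not the code LLDB or LLVM actually use:
the function name ``find_hash_index`` and its signature are hypothetical, the
arrays are assumed to be already in host byte order, and empty buckets are
assumed to hold ``UINT32_MAX``.

.. code-block:: c

  #include <stdint.h>

  #define EMPTY_BUCKET UINT32_MAX  /* marks a bucket that has no hashes */

  /* Returns the index into hashes[] (and therefore offsets[]) whose value
     equals `hash`, or -1 if the hash is not present in the table. */
  static int64_t find_hash_index(const uint32_t *buckets, uint32_t n_buckets,
                                 const uint32_t *hashes, uint32_t n_hashes,
                                 uint32_t hash)
  {
    if (n_buckets == 0)
      return -1;
    uint32_t bucket = hash % n_buckets;
    uint32_t i = buckets[bucket];
    if (i == EMPTY_BUCKET)
      return -1;
    /* Scan consecutive hashes while they still belong to this bucket. */
    for (; i < n_hashes && hashes[i] % n_buckets == bucket; ++i) {
      if (hashes[i] == hash)
        return (int64_t)i;
    }
    return -1;
  }

A failed lookup touches one ``buckets`` entry and a short run of adjacent
``hashes`` entries, which is the access pattern the layout is designed for.
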
    985 The string hash that is used for these lookup tables is the Daniel J.
    986 Bernstein hash which is also used in the ELF ``GNU_HASH`` sections.  It is a
    987 very good hash for all kinds of names in programs with very few hash
    988 collisions.
    989 
    990 Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
    991 
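For reference, a minimal sketch of a Bernstein-style string hash is shown
below.  It assumes the common variant seeded with 5381 that multiplies by 33
and adds each byte; the function name ``djb_hash`` is hypothetical, and the
exact variant should be confirmed against the producer being targeted.

.. code-block:: c

  #include <stdint.h>

  /* Bernstein string hash: h = h * 33 + c, seeded with 5381 (assumed variant). */
  static uint32_t djb_hash(const char *s)
  {
    uint32_t h = 5381;
    for (; *s; ++s)
      h = (h << 5) + h + (unsigned char)*s;  /* (h << 5) + h == h * 33 */
    return h;
  }
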
    992 Details
    993 ^^^^^^^
    994 
    995 These name hash tables are designed to be generic where specializations of the
    996 table get to define additional data that goes into the header ("``HeaderData``"),
    997 how the string value is stored ("``KeyType``") and the content of the data for each
    998 hash value.
    999 
   1000 Header Layout
   1001 """""""""""""
   1002 
   1003 The header has a fixed part, and the specialized part.  The exact format of the
   1004 header is:
   1005 
   1006 .. code-block:: c
   1007 
   1008   struct Header
   1009   {
   1010     uint32_t   magic;           // 'HASH' magic value to allow endian detection
   1011     uint16_t   version;         // Version number
   1012     uint16_t   hash_function;   // The hash function enumeration that was used
   1013     uint32_t   bucket_count;    // The number of buckets in this hash table
   1014     uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
   1015     uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
   1016                                 // Specifically the length of the following HeaderData field - this does not
   1017                                 // include the size of the preceding fields
   1018     HeaderData header_data;     // Implementation specific header data
   1019   };
   1020 
   1021 The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
   1022 encoded as an ASCII integer.  This allows the detection of the start of the
   1023 hash table and also allows the table's byte order to be determined so the table
   1024 can be correctly extracted.  The "``magic``" value is followed by a 16 bit
   1025 ``version`` number which allows the table to be revised and modified in the
   1026 future.  The current version number is 1. ``hash_function`` is a ``uint16_t``
   1027 enumeration that specifies which hash function was used to produce this table.
   1028 The current values for the hash function enumerations include:
   1029 
   1030 .. code-block:: c
   1031 
   1032   enum HashFunctionType
   1033   {
   1034     eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
   1035   };
   1036 
``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array.  ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, which is also the number of
offsets contained in the ``OFFSETS`` array.  ``header_data_len`` specifies the
size in bytes of the ``HeaderData`` that is filled in by specialized versions of
this table.
   1043 
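As an illustration of how the ``magic`` field allows byte order detection, the
sketch below checks a value read from the start of a table.  It assumes that
``'HASH'`` is packed with ``'H'`` in the most significant byte, giving
``0x48415348``; the names ``HASH_MAGIC``, ``swap32`` and ``check_magic`` are
hypothetical.

.. code-block:: c

  #include <stdint.h>

  #define HASH_MAGIC 0x48415348u  /* 'H' 'A' 'S' 'H' packed into 32 bits (assumed) */

  /* Reverses the byte order of a 32 bit value. */
  static uint32_t swap32(uint32_t v)
  {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
  }

  /* Returns 0 if the table matches host byte order, 1 if every field needs
     to be byte-swapped, and -1 if the value is not a valid magic. */
  static int check_magic(uint32_t magic_as_read)
  {
    if (magic_as_read == HASH_MAGIC)
      return 0;
    if (swap32(magic_as_read) == HASH_MAGIC)
      return 1;
    return -1;
  }
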
   1044 Fixed Lookup
   1045 """"""""""""
   1046 
   1047 The header is followed by the buckets, hashes, offsets, and hash value data.
   1048 
   1049 .. code-block:: c
   1050 
   1051   struct FixedTable
   1052   {
   1053     uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
   1054     uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
   1055     uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
   1056   };
   1057 
   1058 ``buckets`` is an array of 32 bit indexes into the ``hashes`` array.  The
   1059 ``hashes`` array contains all of the 32 bit hash values for all names in the
   1060 hash table.  Each hash in the ``hashes`` table has an offset in the ``offsets``
   1061 array that points to the data for the hash value.
   1062 
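Because every array has a size that follows from the header, a consumer can
compute where each one starts without parsing anything else.  The sketch below
assumes the arrays are packed back to back directly after the header data and
that the fixed part of the header occupies 20 bytes (4 + 2 + 2 + 4 + 4 + 4);
``TableLayout`` and ``compute_layout`` are hypothetical names.

.. code-block:: c

  #include <stdint.h>

  /* magic(4) + version(2) + hash_function(2) + three uint32_t fields(12) */
  #define FIXED_HEADER_SIZE 20u

  /* Byte offsets of each array, measured from the start of the table. */
  struct TableLayout {
    uint32_t buckets_off;
    uint32_t hashes_off;
    uint32_t offsets_off;
    uint32_t data_off;
  };

  static struct TableLayout compute_layout(uint32_t header_data_len,
                                           uint32_t bucket_count,
                                           uint32_t hashes_count)
  {
    struct TableLayout l;
    l.buckets_off = FIXED_HEADER_SIZE + header_data_len;
    l.hashes_off  = l.buckets_off + 4u * bucket_count;
    l.offsets_off = l.hashes_off + 4u * hashes_count;
    l.data_off    = l.offsets_off + 4u * hashes_count;
    return l;
  }
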
   1063 This table setup makes it very easy to repurpose these tables to contain
   1064 different data, while keeping the lookup mechanism the same for all tables.
   1065 This layout also makes it possible to save the table to disk and map it in
   1066 later and do very efficient name lookups with little or no parsing.
   1067 
   1068 DWARF lookup tables can be implemented in a variety of ways and can store a lot
   1069 of information for each name.  We want to make the DWARF tables extensible and
   1070 able to store the data efficiently so we have used some of the DWARF features
   1071 that enable efficient data storage to define exactly what kind of data we store
   1072 for each name.
   1073 
   1074 The ``HeaderData`` contains a definition of the contents of each HashData chunk.
   1075 We might want to store an offset to all of the debug information entries (DIEs)
   1076 for each name.  To keep things extensible, we create a list of items, or
   1077 Atoms, that are contained in the data for each name.  First comes the type of
   1078 the data in each atom:
   1079 
.. code-block:: c

  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };
   1091 
   1092 The enumeration values and their meanings are:
   1093 
.. code-block:: none

  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
   1102 
Then we allow each atom to define its type and how the data for that atom
is encoded:
   1105 
   1106 .. code-block:: c
   1107 
   1108   struct Atom
   1109   {
   1110     uint16_t type;  // AtomType enum value
   1111     uint16_t form;  // DWARF DW_FORM_XXX defines
   1112   };
   1113 
   1114 The ``form`` type above is from the DWARF specification and defines the exact
   1115 encoding of the data for the Atom type.  See the DWARF specification for the
   1116 ``DW_FORM_`` definitions.
   1117 
.. code-block:: c

  struct HeaderData
  {
    uint32_t die_offset_base;    // Base added to DW_FORM_ref* encoded DIE offsets
    uint32_t atom_count;         // Number of entries in atoms[]
    Atom     atoms[atom_count];  // Describes the fields of each HashData
  };
   1126 
``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms.  It also
defines what is contained in each ``HashData`` object -- ``Atom.form`` tells us
how large each field will be in the ``HashData`` and ``Atom.type`` tells us how
this data should be interpreted.
   1133 
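To make the role of ``Atom.form`` concrete, the sketch below computes the size
in bytes of one ``HashData`` from an atom list.  It is a simplified
illustration that handles only the fixed-size ``DW_FORM_data*`` forms (values
taken from the DWARF specification) and reports 0 for anything else; the
helper names ``form_size`` and ``hash_data_size`` are hypothetical.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Fixed-size data forms, with their DWARF specification values. */
  enum {
    DW_FORM_data2 = 0x05,
    DW_FORM_data4 = 0x06,
    DW_FORM_data8 = 0x07,
    DW_FORM_data1 = 0x0b
  };

  struct Atom {
    uint16_t type;  /* AtomType enum value */
    uint16_t form;  /* DW_FORM_xxx value */
  };

  /* Size in bytes of a value with the given form, or 0 if not handled here. */
  static size_t form_size(uint16_t form)
  {
    switch (form) {
    case DW_FORM_data1: return 1;
    case DW_FORM_data2: return 2;
    case DW_FORM_data4: return 4;
    case DW_FORM_data8: return 8;
    default:            return 0;  /* variable-size or unhandled form */
    }
  }

  /* Total size of one HashData, or 0 if any atom uses an unhandled form. */
  static size_t hash_data_size(const struct Atom *atoms, uint32_t atom_count)
  {
    size_t total = 0;
    for (uint32_t i = 0; i < atom_count; ++i) {
      size_t n = form_size(atoms[i].form);
      if (n == 0)
        return 0;
      total += n;
    }
    return total;
  }
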
   1134 For the current implementations of the "``.apple_names``" (all functions +
   1135 globals), the "``.apple_types``" (names of all types that are defined), and
   1136 the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
   1137 array to be:
   1138 
   1139 .. code-block:: c
   1140 
   1141   HeaderData.atom_count = 1;
   1142   HeaderData.atoms[0].type = eAtomTypeDIEOffset;
   1143   HeaderData.atoms[0].form = DW_FORM_data4;
   1144 
This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
encoded as a 32 bit value (``DW_FORM_data4``).  This allows a single name to have
multiple matching DIEs in a single file, which could come up with an inlined
function for instance.  Future tables could include more information about the
DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

The ``KeyType`` for the DWARF table is a 32 bit string table offset into the
"``.debug_str``" table.  The "``.debug_str``" is the string table for the DWARF
which may already contain copies of all of the strings.  This helps make sure,
with help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down.  Another benefit to having the
compiler generate all strings as ``DW_FORM_strp`` in the debug info is that
DWARF parsing can be made much faster.
   1159 
   1160 After a lookup is made, we get an offset into the hash data.  The hash data
   1161 needs to be able to deal with 32 bit hash collisions, so the chunk of data
   1162 at the offset in the hash data consists of a triple:
   1163 
   1164 .. code-block:: c
   1165 
   1166   uint32_t str_offset
   1167   uint32_t hash_data_count
   1168   HashData[hash_data_count]
   1169 
If ``str_offset`` is zero, then the chain of entries for this hash value is
done.  99.9% of the hash data chunks contain a single item (no 32 bit hash
collision):
   1172 
.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'
   1184 
   1185 If there are collisions, you will have multiple valid string offsets:
   1186 
.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'
   1202 
Current testing with real world C++ binaries has shown that there is around
one 32 bit hash collision per 100,000 name entries.
   1205 
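Resolving a lookup once the hash has matched means walking the records at the
corresponding offset and comparing strings.  The sketch below assumes each
``HashData`` is a single 32 bit DIE offset, as in the current ``.apple_*``
tables, and that the chunk and the "``.debug_str``" contents are already mapped
in host byte order.  The function name ``find_name_in_chunk`` is hypothetical.

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  /* Walks (str_offset, count, HashData[count]) records until a zero
     str_offset.  Returns a pointer to the DIE offsets for `name` and stores
     how many there are in *count, or returns NULL if `name` is not present. */
  static const uint32_t *find_name_in_chunk(const uint32_t *chunk,
                                            const char *debug_str,
                                            const char *name,
                                            uint32_t *count)
  {
    while (chunk[0] != 0) {
      uint32_t str_offset = chunk[0];
      uint32_t n = chunk[1];
      if (strcmp(debug_str + str_offset, name) == 0) {
        *count = n;
        return chunk + 2;
      }
      chunk += 2 + n;  /* skip to the record for the next colliding name */
    }
    return NULL;
  }

The string comparison only happens after the 32 bit hash has already matched,
so in the common case it runs exactly once per successful lookup.
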
   1206 Contents
   1207 ^^^^^^^^
   1208 
   1209 As we said, we want to strictly define exactly what is included in the
   1210 different tables.  For DWARF, we have 3 tables: "``.apple_names``",
   1211 "``.apple_types``", and "``.apple_namespaces``".
   1212 
   1213 "``.apple_names``" sections should contain an entry for each DWARF DIE whose
   1214 ``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
   1215 ``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
   1216 ``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``.  It also contains
   1217 ``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
   1218 static variables).  All global and static variables should be included,
   1219 including those scoped within functions and classes.  For example using the
   1220 following code:
   1221 
   1222 .. code-block:: c
   1223 
   1224   static int var = 0;
   1225 
   1226   void f ()
   1227   {
   1228     static int var = 0;
   1229   }
   1230 
   1231 Both of the static ``var`` variables would be included in the table.  All
   1232 functions should emit both their full names and their basenames.  For C or C++,
   1233 the full name is the mangled name (if available) which is usually in the
   1234 ``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
   1235 function basename.  If global or static variables have a mangled name in a
   1236 ``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
   1237 simple name found in the ``DW_AT_name`` attribute.
   1238 
   1239 "``.apple_types``" sections should contain an entry for each DWARF DIE whose
   1240 tag is one of:
   1241 
   1242 * DW_TAG_array_type
   1243 * DW_TAG_class_type
   1244 * DW_TAG_enumeration_type
   1245 * DW_TAG_pointer_type
   1246 * DW_TAG_reference_type
   1247 * DW_TAG_string_type
   1248 * DW_TAG_structure_type
   1249 * DW_TAG_subroutine_type
   1250 * DW_TAG_typedef
   1251 * DW_TAG_union_type
   1252 * DW_TAG_ptr_to_member_type
   1253 * DW_TAG_set_type
   1254 * DW_TAG_subrange_type
   1255 * DW_TAG_base_type
   1256 * DW_TAG_const_type
   1257 * DW_TAG_file_type
   1258 * DW_TAG_namelist
   1259 * DW_TAG_packed_type
   1260 * DW_TAG_volatile_type
   1261 * DW_TAG_restrict_type
   1262 * DW_TAG_interface_type
   1263 * DW_TAG_unspecified_type
   1264 * DW_TAG_shared_type
   1265 
   1266 Only entries with a ``DW_AT_name`` attribute are included, and the entry must
   1267 not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
   1268 value).  For example, using the following code:
   1269 
   1270 .. code-block:: c
   1271 
   1272   int main ()
   1273   {
   1274     int *b = 0;
   1275     return *b;
   1276   }
   1277 
   1278 We get a few type DIEs:
   1279 
   1280 .. code-block:: none
   1281 
   1282   0x00000067:     TAG_base_type [5]
   1283                   AT_encoding( DW_ATE_signed )
   1284                   AT_name( "int" )
   1285                   AT_byte_size( 0x04 )
   1286 
   1287   0x0000006e:     TAG_pointer_type [6]
   1288                   AT_type( {0x00000067} ( int ) )
   1289                   AT_byte_size( 0x08 )
   1290 
The ``DW_TAG_pointer_type`` is not included because it does not have a ``DW_AT_name``.
   1292 
The "``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
If we run into a namespace that has no name, this is an anonymous namespace, and
the name should be output as "``(anonymous namespace)``" (without the quotes).
Why?  This matches the output of the ``abi::__cxa_demangle()`` function in the
standard C++ library that demangles mangled names.
   1298 
   1299 
Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

The "``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for
an Objective-C class.  The name used in the hash table is the name of the
Objective-C class itself.  If the Objective-C class has a category, then an
entry is made both for the class name without the category and for the class
name with the category.  So if we have a DIE at offset 0x1234 for the method
"``-[NSString(my_additions) stringWithSpecialString:]``", we would add an entry
for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234.  This allows us to quickly
track down all Objective-C methods for an Objective-C class when evaluating
expressions.

This table is needed because of the dynamic nature of Objective-C, where anyone
can add methods to a class.  The DWARF for Objective-C methods is also emitted
differently from that for C++ classes: the methods are not usually contained in
the class definition, but are scattered across one or more compile units.
Categories can also be defined in different shared libraries.  So we need to be
able to quickly find all of the methods and class functions given the
Objective-C class name, or quickly find all methods and class functions for a
class + category name.  This table does not contain any selector names; it
just maps Objective-C class names (or class names + category) to all of the
methods and class functions.  The selectors are added
as function basenames in the "``.apple_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets
("``-[NSString stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").
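
The way one Objective-C method name fans out into table keys can be sketched
as follows.  The regular expression and the helper name ``objc_table_names``
are assumptions made for illustration; they are not how LLVM parses method
names.

```python
import re

# Matches "-[Class selector]" or "+[Class(Category) selector]".
OBJC_METHOD = re.compile(
    r"^[-+]\[(?P<cls>[^\s(\]]+)(?:\((?P<cat>[^)]*)\))?\s+(?P<sel>[^\]]+)\]$"
)


def objc_table_names(full_name):
    """Return (objc_keys, basename) for an Objective-C method name.

    objc_keys are the class-level keys described for ".apple_objc";
    basename is the selector used as the function basename.
    """
    m = OBJC_METHOD.match(full_name)
    if m is None:
        raise ValueError(f"not an Objective-C method name: {full_name!r}")
    cls, cat, sel = m.group("cls", "cat", "sel")
    objc_keys = [cls]
    if cat is not None:
        # A category adds a second key: the class name with the category.
        objc_keys.append(f"{cls}({cat})")
    return objc_keys, sel


keys, basename = objc_table_names(
    "-[NSString(my_additions) stringWithSpecialString:]")
print(keys)
print(basename)
```

For a method without a category, such as
``-[NSString stringWithCString:]``, the sketch yields the single key
``NSString``.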
   1331 
Mach-O Changes
""""""""""""""

The section names used so far for the Apple hash tables are for non-Mach-O
files.  For Mach-O files, the sections should be contained in the ``__DWARF``
segment with names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"
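
The four mappings are consistent with replacing the leading "``.``" with
"``__``" and truncating to 16 characters.  That rule is inferred from the list
rather than stated by it, and the helper name ``macho_section_name`` is
hypothetical:

```python
MACHO_SECTION_NAME_LIMIT = 16  # the 16 character limit noted in the list


def macho_section_name(name):
    """Map a '.apple_*' section name to its name in the __DWARF segment."""
    if not name.startswith("."):
        raise ValueError(f"expected a leading '.': {name!r}")
    return ("__" + name[1:])[:MACHO_SECTION_NAME_LIMIT]


for name in (".apple_names", ".apple_types", ".apple_namespaces", ".apple_objc"):
    print(name, "->", macho_section_name(name))
```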
   1343 
.. _codeview:

CodeView Debug Info Format
==========================

LLVM supports emitting CodeView, the Microsoft debug info format, and this
section describes the design and implementation of that support.

Format Background
-----------------

CodeView as a format is clearly oriented around C++ debugging, and in C++, the
majority of debug information tends to be type information. Therefore, the
overriding design constraint of CodeView is the separation of type information
from other "symbol" information so that type information can be efficiently
merged across translation units. Both type information and symbol information
are generally stored as a sequence of records, where each record begins with a
16-bit record size and a 16-bit record kind.
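
The record layout can be illustrated by walking a buffer of synthetic records.
This sketch makes several assumptions that the text does not state: both
header fields are little-endian, the size field counts the bytes that follow
it (the kind plus the payload), and any section signature has already been
skipped.  The kind values and payloads are placeholders, and ``iter_records``
and ``make_record`` are hypothetical helpers.

```python
import struct


def iter_records(data):
    """Yield (kind, payload) for each record in a byte string of records."""
    offset = 0
    while offset + 4 <= len(data):
        # Assumption: little-endian header; size excludes the size field
        # itself, so it covers the 2-byte kind plus the payload.
        size, kind = struct.unpack_from("<HH", data, offset)
        if size < 2 or offset + 2 + size > len(data):
            raise ValueError(f"malformed record at offset {offset}")
        payload = data[offset + 4 : offset + 2 + size]
        yield kind, payload
        offset += 2 + size


def make_record(kind, payload):
    """Build one synthetic record for the demonstration below."""
    return struct.pack("<HH", 2 + len(payload), kind) + payload


# Two placeholder records: one with a 4-byte payload, one with none.
stream = make_record(0x1002, b"\x01\x02\x03\x04") + make_record(0x1505, b"")
for kind, payload in iter_records(stream):
    print(hex(kind), payload.hex())
```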
   1362 
Type information is usually stored in the ``.debug$T`` section of the object
file.  All other debug info, such as line info, string table, symbol info, and
inlinee info, is stored in one or more ``.debug$S`` sections. There may only be
one ``.debug$T`` section per object file, since all other debug info refers to
it. If a PDB (enabled by the ``/Zi`` MSVC option) was used during compilation,
the ``.debug$T`` section will contain only an ``LF_TYPESERVER2`` record pointing
to the PDB. When using PDBs, symbol information appears to remain in the object
file ``.debug$S`` sections.

Type records are referred to by their index, which is the number of records in
the stream before a given record plus ``0x1000``. Many common basic types, such
as the basic integral types and unqualified pointers to them, are represented
using type indices less than ``0x1000``. Such basic types are built into
CodeView consumers and do not require type records.
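
The index arithmetic is small enough to show directly.  The helper names
below are hypothetical; only the ``0x1000`` base comes from the text.

```python
FIRST_RECORD_TYPE_INDEX = 0x1000


def type_index_for_position(position):
    """Type index of the record that has `position` records before it."""
    return FIRST_RECORD_TYPE_INDEX + position


def is_builtin_type_index(type_index):
    """True for indices that name built-in types and need no type record."""
    return type_index < FIRST_RECORD_TYPE_INDEX


print(hex(type_index_for_position(0)))   # first record in the stream
print(hex(type_index_for_position(5)))   # sixth record in the stream
print(is_builtin_type_index(0x74))       # some index below 0x1000
```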
   1377 
Each type record may only contain type indices that are less than its own type
index. This ensures that the graph of type stream references is acyclic. While
the source-level type graph may contain cycles through pointer types (consider a
linked list struct), these cycles are removed from the type stream by always
referring to the forward declaration record of user-defined record types. Only
"symbol" records in the ``.debug$S`` streams may refer to complete,
non-forward-declaration type records.
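
The ordering rule can be checked mechanically.  The sketch below models a
linked list struct as four records; the exact record layout is a
simplification chosen for illustration, and the labels are informal
descriptions rather than real record kinds.

```python
FIRST_TYPE_INDEX = 0x1000

# Each entry lists the type indices referenced by one record, in stream order.
LINKED_LIST_STREAM = [
    ("forward declaration of struct Node", []),        # 0x1000
    ("pointer to Node", [0x1000]),                      # 0x1001
    ("member list: Node *next", [0x1001]),              # 0x1002
    ("complete definition of struct Node", [0x1002]),   # 0x1003
]


def find_forward_references(stream):
    """Return (own_index, referenced_index) pairs that break the rule."""
    violations = []
    for position, (_label, refs) in enumerate(stream):
        own_index = FIRST_TYPE_INDEX + position
        for ref in refs:
            # A record may only refer to indices smaller than its own.
            if ref >= own_index:
                violations.append((own_index, ref))
    return violations


print(find_forward_references(LINKED_LIST_STREAM))
```

Because the pointer record refers to the forward declaration at ``0x1000``
instead of the complete definition at ``0x1003``, no record refers to a later
one and the cycle in the source-level type graph disappears.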
   1385 
Working with CodeView
---------------------

These are instructions for some common tasks for developers working to improve
LLVM's CodeView support. Most of them revolve around using the CodeView dumper
embedded in ``llvm-readobj``.

* Testing MSVC's output::

    $ cl -c -Z7 foo.cpp # Use /Z7 to keep types in the object file
    $ llvm-readobj -codeview foo.obj

* Getting LLVM IR debug info out of Clang::

    $ clang -g -gcodeview --target=x86_64-windows-msvc foo.cpp -S -emit-llvm

  Use this to generate LLVM IR for LLVM test cases.

* Generating and dumping CodeView from LLVM IR metadata::

    $ llc foo.ll -filetype=obj -o foo.obj
    $ llvm-readobj -codeview foo.obj > foo.txt

  Use this pattern in lit test cases and FileCheck the output of
  ``llvm-readobj``.

Improving LLVM's CodeView support is a process of finding interesting type
records, constructing a C++ test case that makes MSVC emit those records,
dumping the records, understanding them, and then generating equivalent records
in LLVM's backend.
   1415