1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 2 "http://www.w3.org/TR/html4/strict.dtd"> 3 <html> 4 <head> 5 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 6 <title>Source Level Debugging with LLVM</title> 7 <link rel="stylesheet" href="_static/llvm.css" type="text/css"> 8 </head> 9 <body> 10 11 <h1>Source Level Debugging with LLVM</h1> 12 13 <table class="layout" style="width:100%"> 14 <tr class="layout"> 15 <td class="left"> 16 <ul> 17 <li><a href="#introduction">Introduction</a> 18 <ol> 19 <li><a href="#phil">Philosophy behind LLVM debugging information</a></li> 20 <li><a href="#consumers">Debug information consumers</a></li> 21 <li><a href="#debugopt">Debugging optimized code</a></li> 22 </ol></li> 23 <li><a href="#format">Debugging information format</a> 24 <ol> 25 <li><a href="#debug_info_descriptors">Debug information descriptors</a> 26 <ul> 27 <li><a href="#format_compile_units">Compile unit descriptors</a></li> 28 <li><a href="#format_files">File descriptors</a></li> 29 <li><a href="#format_global_variables">Global variable descriptors</a></li> 30 <li><a href="#format_subprograms">Subprogram descriptors</a></li> 31 <li><a href="#format_blocks">Block descriptors</a></li> 32 <li><a href="#format_basic_type">Basic type descriptors</a></li> 33 <li><a href="#format_derived_type">Derived type descriptors</a></li> 34 <li><a href="#format_composite_type">Composite type descriptors</a></li> 35 <li><a href="#format_subrange">Subrange descriptors</a></li> 36 <li><a href="#format_enumeration">Enumerator descriptors</a></li> 37 <li><a href="#format_variables">Local variables</a></li> 38 </ul></li> 39 <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a> 40 <ul> 41 <li><a href="#format_common_declare">llvm.dbg.declare</a></li> 42 <li><a href="#format_common_value">llvm.dbg.value</a></li> 43 </ul></li> 44 </ol></li> 45 <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li> 46 <li><a 
href="#ccxx_frontend">C/C++ front-end specific debug information</a> 47 <ol> 48 <li><a href="#ccxx_compile_units">C/C++ source file information</a></li> 49 <li><a href="#ccxx_global_variable">C/C++ global variable information</a></li> 50 <li><a href="#ccxx_subprogram">C/C++ function information</a></li> 51 <li><a href="#ccxx_basic_types">C/C++ basic types</a></li> 52 <li><a href="#ccxx_derived_types">C/C++ derived types</a></li> 53 <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li> 54 <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li> 55 </ol></li> 56 <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a> 57 <ol> 58 <li><a href="#objcproperty">Debugging Information Extension 59 for Objective C Properties</a> 60 <ul> 61 <li><a href="#objcpropertyintroduction">Introduction</a></li> 62 <li><a href="#objcpropertyproposal">Proposal</a></li> 63 <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li> 64 <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li> 65 </ul> 66 </li> 67 <li><a href="#acceltable">Name Accelerator Tables</a> 68 <ul> 69 <li><a href="#acceltableintroduction">Introduction</a></li> 70 <li><a href="#acceltablehashes">Hash Tables</a></li> 71 <li><a href="#acceltabledetails">Details</a></li> 72 <li><a href="#acceltablecontents">Contents</a></li> 73 <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li> 74 </ul> 75 </li> 76 </ol> 77 </li> 78 </ul> 79 </td> 80 </tr></table> 81 82 <div class="doc_author"> 83 <p>Written by <a href="mailto:sabre (a] nondot.org">Chris Lattner</a> 84 and <a href="mailto:jlaskey (a] mac.com">Jim Laskey</a></p> 85 </div> 86 87 88 <!-- *********************************************************************** --> 89 <h2><a name="introduction">Introduction</a></h2> 90 <!-- *********************************************************************** --> 91 92 <div> 93 94 <p>This document is the central repository for all information 
pertaining to 95 debug information in LLVM. It describes the <a href="#format">actual format 96 that the LLVM debug information</a> takes, which is useful for those 97 interested in creating front-ends or dealing directly with the information. 98 Further, this document provides specific examples of what debug information 99 for C/C++ looks like.</p> 100 101 <!-- ======================================================================= --> 102 <h3> 103 <a name="phil">Philosophy behind LLVM debugging information</a> 104 </h3> 105 106 <div> 107 108 <p>The idea of the LLVM debugging information is to capture how the important 109 pieces of the source-language's Abstract Syntax Tree map onto LLVM code. 110 Several design aspects have shaped the solution that appears here. The 111 important ones are:</p> 112 113 <ul> 114 <li>Debugging information should have very little impact on the rest of the 115 compiler. No transformations, analyses, or code generators should need to 116 be modified because of debugging information.</li> 117 118 <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and 119 easily described ways</a> with the debugging information.</li> 120 121 <li>Because LLVM is designed to support arbitrary programming languages, 122 LLVM-to-LLVM tools should not need to know anything about the semantics of 123 the source-level language.</li> 124 125 <li>Source-level languages are often <b>widely</b> different from one another. 126 LLVM should not put any restrictions on the flavor of the source-language, 127 and the debugging information should work with any language.</li> 128 129 <li>With code generator support, it should be possible to use an LLVM compiler 130 to compile a program to native machine code and standard debugging 131 formats. 
This allows compatibility with traditional machine-code level 132 debuggers, like GDB or DBX.</li> 133 </ul> 134 135 <p>The approach used by the LLVM implementation is to use a small set 136 of <a href="#format_common_intrinsics">intrinsic functions</a> to define a 137 mapping between LLVM program objects and the source-level objects. The 138 description of the source-level program is maintained in LLVM metadata 139 in an <a href="#ccxx_frontend">implementation-defined format</a> 140 (the C/C++ front-end currently uses working draft 7 of 141 the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3 142 standard</a>).</p> 143 144 <p>When a program is being debugged, a debugger interacts with the user and 145 turns the stored debug information into source-language specific information. 146 As such, a debugger must be aware of the source-language, and is thus tied to 147 a specific language or family of languages.</p> 148 149 </div> 150 151 <!-- ======================================================================= --> 152 <h3> 153 <a name="consumers">Debug information consumers</a> 154 </h3> 155 156 <div> 157 158 <p>The role of debug information is to provide meta information normally 159 stripped away during the compilation process. This meta information provides 160 an LLVM user a relationship between generated code and the original program 161 source code.</p> 162 163 <p>Currently, debug information is consumed by DwarfDebug to produce dwarf 164 information used by the gdb debugger. 
Other targets could use the same 165 information to produce stabs or other debug forms.</p> 166 167 <p>It would also be reasonable to use debug information to feed profiling tools 168 for analysis of generated code, or, tools for reconstructing the original 169 source from generated code.</p> 170 171 <p>TODO - expound a bit more.</p> 172 173 </div> 174 175 <!-- ======================================================================= --> 176 <h3> 177 <a name="debugopt">Debugging optimized code</a> 178 </h3> 179 180 <div> 181 182 <p>An extremely high priority of LLVM debugging information is to make it 183 interact well with optimizations and analysis. In particular, the LLVM debug 184 information provides the following guarantees:</p> 185 186 <ul> 187 <li>LLVM debug information <b>always provides information to accurately read 188 the source-level state of the program</b>, regardless of which LLVM 189 optimizations have been run, and without any modification to the 190 optimizations themselves. However, some optimizations may impact the 191 ability to modify the current state of the program with a debugger, such 192 as setting program variables, or calling functions that have been 193 deleted.</li> 194 195 <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM 196 debugging information, allowing them to update the debugging information 197 as they perform aggressive optimizations. This means that, with effort, 198 the LLVM optimizers could optimize debug code just as well as non-debug 199 code.</li> 200 201 <li>LLVM debug information does not prevent optimizations from 202 happening (for example inlining, basic block reordering/merging/cleanup, 203 tail duplication, etc).</li> 204 205 <li>LLVM debug information is automatically optimized along with the rest of 206 the program, using existing facilities. 
For example, duplicate 207 information is automatically merged by the linker, and unused information 208 is automatically removed.</li> 209 </ul> 210 211 <p>Basically, the debug information allows you to compile a program with 212 "<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily 213 modify the program as it executes from a debugger. Compiling a program with 214 "<tt>-O3 -g</tt>" gives you full debug information that is always available 215 and accurate for reading (e.g., you get accurate stack traces despite tail 216 call elimination and inlining), but you might lose the ability to modify the 217 program and call functions that were optimized out of the program, or 218 inlined away completely.</p> 219 220 <p>The <a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a 221 framework to test the optimizer's handling of debugging information. It can be 222 run like this:</p> 223 224 <div class="doc_code"> 225 <pre> 226 % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level 227 % make TEST=dbgopt 228 </pre> 229 </div> 230 231 <p>This will test the impact of debugging information on optimization passes. If 232 debugging information influences optimization passes then it will be reported 233 as a failure. See <a href="TestingGuide.html">TestingGuide</a> for more 234 information on LLVM test infrastructure and how to run various tests.</p> 235 236 </div> 237 238 </div> 239 240 <!-- *********************************************************************** --> 241 <h2> 242 <a name="format">Debugging information format</a> 243 </h2> 244 <!-- *********************************************************************** --> 245 246 <div> 247 248 <p>LLVM debugging information has been carefully designed to make it possible 249 for the optimizer to optimize the program and debugging information without 250 necessarily having to know anything about debugging information. 
In 251 particular, the use of metadata avoids duplicated debugging information from 252 the beginning, and the global dead code elimination pass automatically 253 deletes debugging information for a function if it decides to delete the 254 function. </p> 255 256 <p>To do this, most of the debugging information (descriptors for types, 257 variables, functions, source files, etc) is inserted by the language 258 front-end in the form of LLVM metadata. </p> 259 260 <p>Debug information is designed to be agnostic about the target debugger and 261 debugging information representation (e.g. DWARF/Stabs/etc). It uses a 262 generic pass to decode the information that represents variables, types, 263 functions, namespaces, etc: this allows for arbitrary source-language 264 semantics and type-systems to be used, as long as there is a module 265 written for the target debugger to interpret the information. </p> 266 267 <p>To provide basic functionality, the LLVM debugger does have to make some 268 assumptions about the source-level language being debugged, though it keeps 269 these to a minimum. The only common features that the LLVM debugger assumes 270 exist are <a href="#format_files">source files</a>, 271 and <a href="#format_global_variables">program objects</a>. These abstract 272 objects are used by a debugger to form stack traces, show information about 273 local variables, etc.</p> 274 275 <p>This section of the documentation first describes the representation aspects 276 common to any source-language. The <a href="#ccxx_frontend">next section</a> 277 describes the data layout conventions used by the C and C++ front-ends.</p> 278 279 <!-- ======================================================================= --> 280 <h3> 281 <a name="debug_info_descriptors">Debug information descriptors</a> 282 </h3> 283 284 <div> 285 286 <p>In consideration of the complexity and volume of debug information, LLVM 287 provides a specification for well formed debug descriptors. 
</p> 288 289 <p>Consumers of LLVM debug information expect the descriptors for program 290 objects to start in a canonical format, but the descriptors can include 291 additional information appended at the end that is source-language 292 specific. All LLVM debugging information is versioned, allowing backwards 293 compatibility in the case that the core structures need to change in some 294 way. Also, all debugging information objects start with a tag to indicate 295 what type of object it is. The source-language is allowed to define its own 296 objects, by using unreserved tag numbers. We recommend using tags in 297 the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base = 298 0x1000).</p> 299 300 <p>The fields of debug descriptors used internally by LLVM 301 are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>, 302 <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p> 303 304 <div class="doc_code"> 305 <pre> 306 !1 = metadata !{ 307 i32, ;; A tag 308 ... 309 } 310 </pre> 311 </div> 312 313 <p><a name="LLVMDebugVersion">The first field of a descriptor is always an 314 <tt>i32</tt> containing a tag value identifying the content of the 315 descriptor. The remaining fields are specific to the descriptor. The values 316 of tags are loosely bound to the tag values of DWARF information entries. 317 However, that does not restrict the use of the information supplied to DWARF 318 targets. 
To facilitate versioning of debug information, the tag is augmented 319 with the current debug version (LLVMDebugVersion = 8 << 16 or 320 0x80000 or 524288.)</a></p> 321 322 <p>The details of the various descriptors follow.</p> 323 324 <!-- ======================================================================= --> 325 <h4> 326 <a name="format_compile_units">Compile unit descriptors</a> 327 </h4> 328 329 <div> 330 331 <div class="doc_code"> 332 <pre> 333 !0 = metadata !{ 334 i32, ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> 335 ;; (DW_TAG_compile_unit) 336 i32, ;; Unused field. 337 i32, ;; DWARF language identifier (ex. DW_LANG_C89) 338 metadata, ;; Source file name 339 metadata, ;; Source file directory (includes trailing slash) 340 metadata ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") 341 i1, ;; True if this is a main compile unit. 342 i1, ;; True if this is optimized. 343 metadata, ;; Flags 344 i32 ;; Runtime version 345 metadata ;; List of enums types 346 metadata ;; List of retained types 347 metadata ;; List of subprograms 348 metadata ;; List of global variables 349 } 350 </pre> 351 </div> 352 353 <p>These descriptors contain a source language ID for the file (we use the DWARF 354 3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>, 355 <tt>DW_LANG_Cobol74</tt>, etc), three strings describing the filename, 356 working directory of the compiler, and an identifier string for the compiler 357 that produced it.</p> 358 359 <p>Compile unit descriptors provide the root context for objects declared in a 360 specific compilation unit. File descriptors are defined using this context. 361 These descriptors are collected by a named metadata 362 <tt>!llvm.dbg.cu</tt>. Compile unit descriptor keeps track of subprograms, 363 global variables and type information. 
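</p>

<p>As an illustration of the layout above, a filled-in compile unit descriptor
   might look like the following sketch. The tag value follows from the text
   (17 + 524288 = 524305) and <tt>DW_LANG_C89</tt> is 1 in DWARF. The file
   name, directory and producer string are hypothetical, and the use of
   <tt>!1</tt> as a placeholder for an empty list is an assumption rather than
   something this document specifies.</p>

<div class="doc_code">
<pre>
!llvm.dbg.cu = !{!0}

!0 = metadata !{
  i32 524305,                   ;; Tag = 17 + 524288 (DW_TAG_compile_unit + LLVMDebugVersion)
  i32 0,                        ;; Unused field
  i32 1,                        ;; DWARF language identifier (DW_LANG_C89)
  metadata !"example.c",        ;; Source file name (hypothetical)
  metadata !"/tmp/",            ;; Source file directory (hypothetical)
  metadata !"example producer", ;; Producer (hypothetical)
  i1 true,                      ;; This is a main compile unit
  i1 false,                     ;; Not optimized
  metadata !"",                 ;; Flags
  i32 0,                        ;; Runtime version
  metadata !1,                  ;; List of enum types (assumed empty)
  metadata !1,                  ;; List of retained types (assumed empty)
  metadata !1,                  ;; List of subprograms (assumed empty)
  metadata !1                   ;; List of global variables (assumed empty)
}
!1 = metadata !{i32 0}          ;; Placeholder standing in for an empty list (assumption)
</pre>
</div>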
364 365 </div> 366 367 <!-- ======================================================================= --> 368 <h4> 369 <a name="format_files">File descriptors</a> 370 </h4> 371 372 <div> 373 374 <div class="doc_code"> 375 <pre> 376 !0 = metadata !{ 377 i32, ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> 378 ;; (DW_TAG_file_type) 379 metadata, ;; Source file name 380 metadata, ;; Source file directory (includes trailing slash) 381 metadata ;; Unused 382 } 383 </pre> 384 </div> 385 386 <p>These descriptors contain information for a file. Global variables and top 387 level functions are defined using this context. File descriptors also 388 provide context for source line correspondence. </p> 389 390 <p>Each input file is encoded as a separate file descriptor in LLVM debugging 391 information output. </p> 392 393 </div> 394 395 <!-- ======================================================================= --> 396 <h4> 397 <a name="format_global_variables">Global variable descriptors</a> 398 </h4> 399 400 <div> 401 402 <div class="doc_code"> 403 <pre> 404 !1 = metadata !{ 405 i32, ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> 406 ;; (DW_TAG_variable) 407 i32, ;; Unused field. 408 metadata, ;; Reference to context descriptor 409 metadata, ;; Name 410 metadata, ;; Display name (fully qualified C++ name) 411 metadata, ;; MIPS linkage name (for C++) 412 metadata, ;; Reference to file where defined 413 i32, ;; Line number where defined 414 metadata, ;; Reference to type descriptor 415 i1, ;; True if the global is local to compile unit (static) 416 i1, ;; True if the global is defined in the compile unit (not extern) 417 {}* ;; Reference to the global variable 418 } 419 </pre> 420 </div> 421 422 <p>These descriptors provide debug information about global variables. They 423 provide details such as name, type and where the variable is defined. 
All 424 global variables are collected inside the named metadata 425 <tt>!llvm.dbg.cu</tt>.</p> 426 427 </div> 428 429 <!-- ======================================================================= --> 430 <h4> 431 <a name="format_subprograms">Subprogram descriptors</a> 432 </h4> 433 434 <div> 435 436 <div class="doc_code"> 437 <pre> 438 !2 = metadata !{ 439 i32, ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> 440 ;; (DW_TAG_subprogram) 441 i32, ;; Unused field. 442 metadata, ;; Reference to context descriptor 443 metadata, ;; Name 444 metadata, ;; Display name (fully qualified C++ name) 445 metadata, ;; MIPS linkage name (for C++) 446 metadata, ;; Reference to file where defined 447 i32, ;; Line number where defined 448 metadata, ;; Reference to type descriptor 449 i1, ;; True if the subprogram is local to compile unit (static) 450 i1, ;; True if the subprogram is defined in the compile unit (not extern) 451 i32, ;; Line number where the scope of the subprogram begins 452 i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual 453 i32, ;; Index into a virtual function 454 metadata, ;; indicates which base type contains the vtable pointer for the 455 ;; derived class 456 i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped. 457 i1, ;; isOptimized 458 Function *,;; Pointer to LLVM function 459 metadata, ;; Lists function template parameters 460 metadata, ;; Function declaration descriptor 461 metadata ;; List of function variables 462 } 463 </pre> 464 </div> 465 466 <p>These descriptors provide debug information about functions, methods and 467 subprograms. They provide details such as name, return types and the source 468 location where the subprogram is defined. 
469 </p> 470 471 </div> 472 473 <!-- ======================================================================= --> 474 <h4> 475 <a name="format_blocks">Block descriptors</a> 476 </h4> 477 478 <div> 479 480 <div class="doc_code"> 481 <pre> 482 !3 = metadata !{ 483 i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block) 484 metadata,;; Reference to context descriptor 485 i32, ;; Line number 486 i32, ;; Column number 487 metadata,;; Reference to source file 488 i32 ;; Unique ID to identify blocks from a template function 489 } 490 </pre> 491 </div> 492 493 <p>This descriptor provides debug information about nested blocks within a 494 subprogram. The line and column numbers are used to distinguish 495 two lexical blocks at the same depth. </p> 496 497 <div class="doc_code"> 498 <pre> 499 !3 = metadata !{ 500 i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block) 501 metadata,;; Reference to the scope we're annotating with a file change 502 metadata ;; Reference to the file the scope is enclosed in. 
503 } 504 </pre> 505 </div> 506 507 <p>This descriptor provides a wrapper around a lexical scope to handle file 508 changes in the middle of a lexical block.</p> 509 510 </div> 511 512 <!-- ======================================================================= --> 513 <h4> 514 <a name="format_basic_type">Basic type descriptors</a> 515 </h4> 516 517 <div> 518 519 <div class="doc_code"> 520 <pre> 521 !4 = metadata !{ 522 i32, ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> 523 ;; (DW_TAG_base_type) 524 metadata, ;; Reference to context 525 metadata, ;; Name (may be "" for anonymous types) 526 metadata, ;; Reference to file where defined (may be NULL) 527 i32, ;; Line number where defined (may be 0) 528 i64, ;; Size in bits 529 i64, ;; Alignment in bits 530 i64, ;; Offset in bits 531 i32, ;; Flags 532 i32 ;; DWARF type encoding 533 } 534 </pre> 535 </div> 536 537 <p>These descriptors define primitive types used in the code. Example int, bool 538 and float. The context provides the scope of the type, which is usually the 539 top level. Since basic types are not usually user defined the context 540 and line number can be left as NULL and 0. The size, alignment and offset 541 are expressed in bits and can be 64 bit values. The alignment is used to 542 round the offset when embedded in a 543 <a href="#format_composite_type">composite type</a> (example to keep float 544 doubles on 64 bit boundaries.) The offset is the bit offset if embedded in 545 a <a href="#format_composite_type">composite type</a>.</p> 546 547 <p>The type encoding provides the details of the type. 
The values are typically 548 one of the following:</p> 549 550 <div class="doc_code"> 551 <pre> 552 DW_ATE_address = 1 553 DW_ATE_boolean = 2 554 DW_ATE_float = 4 555 DW_ATE_signed = 5 556 DW_ATE_signed_char = 6 557 DW_ATE_unsigned = 7 558 DW_ATE_unsigned_char = 8 559 </pre> 560 </div> 561 562 </div> 563 564 <!-- ======================================================================= --> 565 <h4> 566 <a name="format_derived_type">Derived type descriptors</a> 567 </h4> 568 569 <div> 570 571 <div class="doc_code"> 572 <pre> 573 !5 = metadata !{ 574 i32, ;; Tag (see below) 575 metadata, ;; Reference to context 576 metadata, ;; Name (may be "" for anonymous types) 577 metadata, ;; Reference to file where defined (may be NULL) 578 i32, ;; Line number where defined (may be 0) 579 i64, ;; Size in bits 580 i64, ;; Alignment in bits 581 i64, ;; Offset in bits 582 i32, ;; Flags to encode attributes, e.g. private 583 metadata, ;; Reference to type derived from 584 metadata, ;; (optional) Name of the Objective C property associated with 585 ;; Objective-C an ivar 586 metadata, ;; (optional) Name of the Objective C property getter selector. 587 metadata, ;; (optional) Name of the Objective C property setter selector. 588 i32 ;; (optional) Objective C property attributes. 589 } 590 </pre> 591 </div> 592 593 <p>These descriptors are used to define types derived from other types. The 594 value of the tag varies depending on the meaning. The following are possible 595 tag values:</p> 596 597 <div class="doc_code"> 598 <pre> 599 DW_TAG_formal_parameter = 5 600 DW_TAG_member = 13 601 DW_TAG_pointer_type = 15 602 DW_TAG_reference_type = 16 603 DW_TAG_typedef = 22 604 DW_TAG_const_type = 38 605 DW_TAG_volatile_type = 53 606 DW_TAG_restrict_type = 55 607 </pre> 608 </div> 609 610 <p><tt>DW_TAG_member</tt> is used to define a member of 611 a <a href="#format_composite_type">composite type</a> 612 or <a href="#format_subprograms">subprogram</a>. 
The type of the member is 613 the <a href="#format_derived_type">derived 614 type</a>. <tt>DW_TAG_formal_parameter</tt> is used to define a member which 615 is a formal argument of a subprogram.</p> 616 617 <p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p> 618 619 <p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>, 620 <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and 621 <tt>DW_TAG_restrict_type</tt> are used to qualify 622 the <a href="#format_derived_type">derived type</a>. </p> 623 624 <p><a href="#format_derived_type">Derived type</a> location can be determined 625 from the context and line number. The size, alignment and offset are 626 expressed in bits and can be 64 bit values. The alignment is used to round 627 the offset when embedded in a <a href="#format_composite_type">composite 628 type</a> (example to keep float doubles on 64 bit boundaries.) The offset is 629 the bit offset if embedded in a <a href="#format_composite_type">composite 630 type</a>.</p> 631 632 <p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL. 633 </p> 634 635 </div> 636 637 <!-- ======================================================================= --> 638 <h4> 639 <a name="format_composite_type">Composite type descriptors</a> 640 </h4> 641 642 <div> 643 644 <div class="doc_code"> 645 <pre> 646 !6 = metadata !{ 647 i32, ;; Tag (see below) 648 metadata, ;; Reference to context 649 metadata, ;; Name (may be "" for anonymous types) 650 metadata, ;; Reference to file where defined (may be NULL) 651 i32, ;; Line number where defined (may be 0) 652 i64, ;; Size in bits 653 i64, ;; Alignment in bits 654 i64, ;; Offset in bits 655 i32, ;; Flags 656 metadata, ;; Reference to type derived from 657 metadata, ;; Reference to array of member descriptors 658 i32 ;; Runtime languages 659 } 660 </pre> 661 </div> 662 663 <p>These descriptors are used to define types that are composed of 0 or more 664 elements. 
The value of the tag varies depending on the meaning. The following 665 are possible tag values:</p> 666 667 <div class="doc_code"> 668 <pre> 669 DW_TAG_array_type = 1 670 DW_TAG_enumeration_type = 4 671 DW_TAG_structure_type = 19 672 DW_TAG_union_type = 23 673 DW_TAG_vector_type = 259 674 DW_TAG_subroutine_type = 21 675 DW_TAG_inheritance = 28 676 </pre> 677 </div> 678 679 <p>The vector flag indicates that an array type is a native packed vector.</p> 680 681 <p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types 682 (tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange 683 descriptors</a>, each representing the range of subscripts at that level of 684 indexing.</p> 685 686 <p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are 687 <a href="#format_enumeration">enumerator descriptors</a>, each representing 688 the definition of an enumeration value for the set. All enumeration type 689 descriptors are collected inside the named metadata 690 <tt>!llvm.dbg.cu</tt>.</p> 691 692 <p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag 693 = <tt>DW_TAG_union_type</tt>) types are any one of 694 the <a href="#format_basic_type">basic</a>, 695 <a href="#format_derived_type">derived</a> 696 or <a href="#format_composite_type">composite</a> type descriptors, each 697 representing a field member of the structure or union.</p> 698 699 <p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors 700 provide information about base classes, static members and member 701 functions. If a member is a <a href="#format_derived_type">derived type 702 descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type 703 represents a base class. If the member is 704 a <a href="#format_global_variables">global variable descriptor</a> then it 705 represents a static member. 
And, if the member is 706 a <a href="#format_subprograms">subprogram descriptor</a> then it represents 707 a member function. For static members and member 708 functions, <tt>getName()</tt> returns the member's linkage name or the C++ mangled 709 name. <tt>getDisplayName()</tt> returns the simplified version of the name.</p> 710 711 <p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type 712 elements is the return type for the subroutine. The remaining elements are 713 the formal arguments to the subroutine.</p> 714 715 <p><a href="#format_composite_type">Composite type</a> location can be 716 determined from the context and line number. The size, alignment and 717 offset are expressed in bits and can be 64 bit values. The alignment is used 718 to round the offset when embedded in 719 a <a href="#format_composite_type">composite type</a> (as an example, to keep 720 float doubles on 64 bit boundaries.) The offset is the bit offset if embedded 721 in a <a href="#format_composite_type">composite type</a>.</p> 722 723 </div> 724 725 <!-- ======================================================================= --> 726 <h4> 727 <a name="format_subrange">Subrange descriptors</a> 728 </h4> 729 730 <div> 731 732 <div class="doc_code"> 733 <pre> 734 !42 = metadata !{ 735 i32, ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type) 736 i64, ;; Low value 737 i64 ;; High value 738 } 739 </pre> 740 </div> 741 742 <p>These descriptors are used to define ranges of array subscripts for an array 743 <a href="#format_composite_type">composite type</a>. The low value defines 744 the lower bound, typically zero for C/C++. The high value is the upper 745 bound. Values are 64 bit. High - low + 1 is the size of the array. If low 746 > high the array bounds are not included in generated debugging information. 
747 </p> 748 749 </div> 750 751 <!-- ======================================================================= --> 752 <h4> 753 <a name="format_enumeration">Enumerator descriptors</a> 754 </h4> 755 756 <div> 757 758 <div class="doc_code"> 759 <pre> 760 !6 = metadata !{ 761 i32, ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> 762 ;; (DW_TAG_enumerator) 763 metadata, ;; Name 764 i64 ;; Value 765 } 766 </pre> 767 </div> 768 769 <p>These descriptors are used to define members of an 770 enumeration <a href="#format_composite_type">composite type</a>. Each one 771 associates a name with a value.</p> 772 773 </div> 774 775 <!-- ======================================================================= --> 776 <h4> 777 <a name="format_variables">Local variables</a> 778 </h4> 779 780 <div> 781 782 <div class="doc_code"> 783 <pre> 784 !7 = metadata !{ 785 i32, ;; Tag (see below) 786 metadata, ;; Context 787 metadata, ;; Name 788 metadata, ;; Reference to file where defined 789 i32, ;; 24 bit - Line number where defined 790 ;; 8 bit - Argument number. 1 indicates 1st argument. 791 metadata, ;; Type descriptor 792 i32, ;; flags 793 metadata ;; (optional) Reference to inline location 794 } 795 </pre> 796 </div> 797 798 <p>These descriptors are used to define variables local to a subprogram. The 799 value of the tag depends on the usage of the variable:</p> 800 801 <div class="doc_code"> 802 <pre> 803 DW_TAG_auto_variable = 256 804 DW_TAG_arg_variable = 257 805 DW_TAG_return_variable = 258 806 </pre> 807 </div> 808 809 <p>An auto variable is any variable declared in the body of the function. An 810 argument variable is any variable that appears as a formal argument to the 811 function. A return variable is used to track the result of a function and 812 has no source correspondent.</p> 813 814 <p>The context is either the subprogram or block where the variable is defined. 815 Name is the source variable name. Context and line indicate where the 816 variable was defined. 
Type descriptor defines the declared type of the
variable.</p>

</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="format_common_intrinsics">Debugger intrinsic functions</a>
</h3>

<div>

<p>LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to
provide debug information at various points in generated code.</p>

<!-- ======================================================================= -->
<h4>
<a name="format_common_declare">llvm.dbg.declare</a>
</h4>

<div>
<pre>
void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata)
</pre>

<p>This intrinsic provides information about a local element (e.g., variable). The
first argument is metadata holding the alloca for the variable. The
second argument is metadata containing a description of the variable.</p>
</div>

<!-- ======================================================================= -->
<h4>
<a name="format_common_value">llvm.dbg.value</a>
</h4>

<div>
<pre>
void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata)
</pre>

<p>This intrinsic provides information when a user source variable is set to a
new value. The first argument is the new value (wrapped as metadata). The
second argument is the offset in the user source variable where the new value
is written. The third argument is metadata containing a description of the
user source variable.</p>
</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="format_common_lifetime">Object lifetimes and scoping</a>
</h3>

<div>
<p>In many languages, the local variables in functions can have their lifetimes
or scopes limited to a subset of a function.
In the C family of languages,
for example, variables are only live (readable and writable) within the
source block that they are defined in. In functional languages, values are
only readable after they have been defined. Though this is a very obvious
concept, it is non-trivial to model in LLVM, because it has no notion of
scoping in this sense, and does not want to be tied to a language's scoping
rules.</p>

<p>In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information. Consider
the following C fragment, for example:</p>

<div class="doc_code">
<pre>
1.  void foo() {
2.    int X = 21;
3.    int Y = 22;
4.    {
5.      int Z = 23;
6.      Z = X;
7.    }
8.    X = Y;
9.  }
</pre>
</div>

<p>Compiled to LLVM, this function would be represented like this:</p>

<div class="doc_code">
<pre>
define void @foo() nounwind ssp {
entry:
  %X = alloca i32, align 4                        ; <i32*> [#uses=4]
  %Y = alloca i32, align 4                        ; <i32*> [#uses=4]
  %Z = alloca i32, align 4                        ; <i32*> [#uses=3]
  %0 = bitcast i32* %X to {}*                     ; <{}*> [#uses=1]
  call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
  store i32 21, i32* %X, !dbg !8
  %1 = bitcast i32* %Y to {}*                     ; <{}*> [#uses=1]
  call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
  store i32 22, i32* %Y, !dbg !11
  %2 = bitcast i32* %Z to {}*                     ; <{}*> [#uses=1]
  call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
  store i32 23, i32* %Z, !dbg !15
  %tmp = load i32* %X, !dbg !16                   ; <i32> [#uses=1]
  %tmp1 = load i32* %Y, !dbg !16                  ; <i32> [#uses=1]
  %add = add nsw i32 %tmp, %tmp1, !dbg !16        ; <i32> [#uses=1]
  store i32 %add, i32* %Z, !dbg !16
  %tmp2 = load i32* %Y, !dbg !17                  ; <i32> [#uses=1]
  store i32 %tmp2, i32* %X, !dbg !17
  ret void, !dbg !18
}

declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone

!0 = metadata !{i32 459008, metadata !1, metadata !"X",
                metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
                metadata !"foo", metadata !3, i32 1, metadata !4,
                i1 false, i1 true}; [DW_TAG_subprogram ]
!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
                metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
                i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
                i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
!5 = metadata !{null}
!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
                i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
!7 = metadata !{i32 2, i32 7, metadata !1, null}
!8 = metadata !{i32 2, i32 3, metadata !1, null}
!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
                metadata !6}; [ DW_TAG_auto_variable ]
!10 = metadata !{i32 3, i32 7, metadata !1, null}
!11 = metadata !{i32 3, i32 3, metadata !1, null}
!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
                 metadata !6}; [ DW_TAG_auto_variable ]
!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
!14 = metadata !{i32 5, i32 9, metadata !13, null}
!15 = metadata !{i32 5, i32 5, metadata !13, null}
!16 = metadata !{i32 6, i32 5, metadata !13, null}
!17 = metadata !{i32 8, i32 3, metadata !1, null}
!18 = metadata !{i32 9, i32 1, metadata !2, null}
</pre>
</div>

<p>This example illustrates a few important details about LLVM debugging
information.
In particular, it shows how the <tt>llvm.dbg.declare</tt>
intrinsic and location information, which are attached to an instruction,
are applied together to allow a debugger to analyze the relationship between
statements, variable definitions, and the code used to implement the
function.</p>

<div class="doc_code">
<pre>
call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7
</pre>
</div>

<p>The first intrinsic
<tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
encodes debugging information for the variable <tt>X</tt>. The metadata
<tt>!dbg !7</tt> attached to the intrinsic provides scope information for the
variable <tt>X</tt>.</p>

<div class="doc_code">
<pre>
!7 = metadata !{i32 2, i32 7, metadata !1, null}
!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
                metadata !"foo", metadata !"foo", metadata !3, i32 1,
                metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
</pre>
</div>

<p>Here <tt>!7</tt> is metadata providing location information. It has four
fields: line number, column number, scope, and original scope. The original
scope represents inline location if this instruction is inlined inside a
caller, and is null otherwise. In this example, scope is encoded by
<tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope
<tt>!2</tt>, where <tt>!2</tt> is a
<a href="#format_subprograms">subprogram descriptor</a>.
This way the
location information attached to the intrinsics indicates that the
variable <tt>X</tt> is declared at line number 2 at a function level scope in
function <tt>foo</tt>.</p>

<p>Now let's take another example.</p>

<div class="doc_code">
<pre>
call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14
</pre>
</div>

<p>The second intrinsic
<tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
encodes debugging information for variable <tt>Z</tt>. The metadata
<tt>!dbg !14</tt> attached to the intrinsic provides scope information for
the variable <tt>Z</tt>.</p>

<div class="doc_code">
<pre>
!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
!14 = metadata !{i32 5, i32 9, metadata !13, null}
</pre>
</div>

<p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and
column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope
itself resides inside of lexical scope <tt>!1</tt> described above.</p>

<p>The scope information attached to each instruction provides a
straightforward way to find instructions covered by a scope.</p>

</div>

</div>

<!-- *********************************************************************** -->
<h2>
<a name="ccxx_frontend">C/C++ front-end specific debug information</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>The C and C++ front-ends represent information about the program in a format
that is effectively identical
to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in
terms of information content.
This allows code generators to trivially
support native debuggers by generating standard DWARF information, and
contains enough information for non-DWARF targets to translate it as
needed.</p>

<p>This section describes the forms used to represent C and C++ programs. Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could
choose to provide completely different forms if they don't fit into the DWARF
model. As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented
here.</p>

<p>The following sections provide examples of various C/C++ constructs and the
debug information that would best describe those constructs.</p>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_compile_units">C/C++ source file information</a>
</h3>

<div>

<p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located
in the directory <tt>/Users/mine/sources</tt>, the following code:</p>

<div class="doc_code">
<pre>
#include "MyHeader.h"

int main(int argc, char *argv[]) {
  return 0;
}
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
...
;;
;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
;;
!2 = metadata !{
  i32 524305,    ;; Tag
  i32 0,         ;; Unused
  i32 4,         ;; Language Id
  metadata !"MySource.cpp",
  metadata !"/Users/mine/sources",
  metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
  i1 true,       ;; Main Compile Unit
  i1 false,      ;; Optimized compile unit
  metadata !"",  ;; Compiler flags
  i32 0}         ;; Runtime version

;;
;; Define the file for the file "/Users/mine/sources/MySource.cpp".
;;
!1 = metadata !{
  i32 524329,    ;; Tag
  metadata !"MySource.cpp",
  metadata !"/Users/mine/sources",
  metadata !2    ;; Compile unit
}

;;
;; Define the file for the file "/Users/mine/sources/MyHeader.h"
;;
!3 = metadata !{
  i32 524329,    ;; Tag
  metadata !"MyHeader.h",
  metadata !"/Users/mine/sources",
  metadata !2    ;; Compile unit
}

...
</pre>
</div>

<p><tt>llvm::Instruction</tt> provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR
using <tt>Instruction::getMetadata()</tt> and
<tt>DILocation::getLineNumber()</tt>.</p>
<div class="doc_code">
<pre>
if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
  DILocation Loc(N);                      // DILocation is in DebugInfo.h
  unsigned Line = Loc.getLineNumber();
  StringRef File = Loc.getFilename();
  StringRef Dir = Loc.getDirectory();
}
</pre>
</div>
</div>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_global_variable">C/C++ global variable information</a>
</h3>

<div>

<p>Given an integer global variable declared as follows:</p>

<div class="doc_code">
<pre>
int MyGlobal = 100;
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define the global itself.
;;
@MyGlobal = global i32 100
...
;;
;; List of compile units; each one references the debug info of its globals
;;
!llvm.dbg.cu = !{!0}

;; Define the compile unit.
!0 = metadata !{
  i32 786449,                       ;; Tag
  i32 0,                            ;; Context
  i32 4,                            ;; Language
  metadata !"foo.cpp",              ;; File
  metadata !"/Volumes/Data/tmp",    ;; Directory
  metadata !"clang version 3.1 ",   ;; Producer
  i1 true,                          ;; Deprecated field
  i1 false,                         ;; "isOptimized"?
  metadata !"",                     ;; Flags
  i32 0,                            ;; Runtime Version
  metadata !1,                      ;; Enum Types
  metadata !1,                      ;; Retained Types
  metadata !1,                      ;; Subprograms
  metadata !3                       ;; Global Variables
} ; [ DW_TAG_compile_unit ]

;; The Array of Global Variables
!3 = metadata !{
  metadata !4
}

!4 = metadata !{
  metadata !5
}

;;
;; Define the global variable itself.
;;
!5 = metadata !{
  i32 786484,                        ;; Tag
  i32 0,                             ;; Unused
  null,                              ;; Unused
  metadata !"MyGlobal",              ;; Name
  metadata !"MyGlobal",              ;; Display Name
  metadata !"",                      ;; Linkage Name
  metadata !6,                       ;; File
  i32 1,                             ;; Line
  metadata !7,                       ;; Type
  i32 0,                             ;; IsLocalToUnit
  i32 1,                             ;; IsDefinition
  i32* @MyGlobal                     ;; LLVM-IR Value
} ; [ DW_TAG_variable ]

;;
;; Define the file
;;
!6 = metadata !{
  i32 786473,                        ;; Tag
  metadata !"foo.cpp",               ;; File
  metadata !"/Volumes/Data/tmp",     ;; Directory
  null                               ;; Unused
} ; [ DW_TAG_file_type ]

;;
;; Define the type
;;
!7 = metadata !{
  i32 786468,                         ;; Tag
  null,                               ;; Unused
  metadata !"int",                    ;; Name
  null,                               ;; Unused
  i32 0,                              ;; Line
  i64 32,                             ;; Size in Bits
  i64 32,                             ;; Align in Bits
  i64 0,                              ;; Offset
  i32 0,                              ;; Flags
  i32 5                               ;; Encoding
} ; [ DW_TAG_base_type ]

</pre>
</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_subprogram">C/C++ function information</a>
</h3>

<div>

<p>Given a function declared as follows:</p>

<div class="doc_code">
<pre>
int main(int argc, char *argv[]) {
  return 0;
}
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define the descriptor for the subprogram. Note that the tag 524334 is
;; LLVMDebugVersion (8 << 16) plus 46, the DWARF tag for subprograms
;; (46 = DW_TAG_subprogram.)
;;
!6 = metadata !{
  i32 524334,        ;; Tag
  i32 0,             ;; Unused
  metadata !1,       ;; Context
  metadata !"main",  ;; Name
  metadata !"main",  ;; Display name
  metadata !"main",  ;; Linkage name
  metadata !1,       ;; File
  i32 1,             ;; Line number
  metadata !4,       ;; Type
  i1 false,          ;; Is local
  i1 true,           ;; Is definition
  i32 0,             ;; Virtuality attribute, e.g. pure virtual function
  i32 0,             ;; Index into virtual table for C++ methods
  i32 0,             ;; Type that holds virtual table.
  i32 0,             ;; Flags
  i1 false,          ;; True if this function is optimized
  i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
  null               ;; Function template parameters
}
;;
;; Define the subprogram itself.
;;
define i32 @main(i32 %argc, i8** %argv) {
...
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_basic_types">C/C++ basic types</a>
</h3>

<div>

<p>The following are the basic type descriptors for C/C++ core types:</p>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_type_bool">bool</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"bool",  ;; Name
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 8,             ;; Size in Bits
  i64 8,             ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 2              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_char">char</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"char",  ;; Name
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 8,             ;; Size in Bits
  i64 8,             ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 6              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_unsigned_char">unsigned char</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"unsigned char",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 8,             ;; Size in Bits
  i64 8,             ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 8              ;; Encoding
}
</pre>
</div>

</div>

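<p>The numeric tag and encoding fields in the basic type descriptors above can
be read with a few lines of C++. The following is only a sketch: the helper
names are hypothetical and not part of any LLVM API, and it assumes (as the tag
values in this document suggest) that <tt>LLVMDebugVersion</tt> occupies the
upper 16 bits of the tag field. The encoding names are the standard DWARF
<tt>DW_ATE_*</tt> base type encodings.</p>

<div class="doc_code">
<pre>
// Hypothetical helpers, for illustration only (not LLVM API).
// Assumption: the tag field is (debug version * 65536) + DWARF tag.
unsigned debugVersion(unsigned TagField) { return TagField / 65536; }
unsigned dwarfTag(unsigned TagField) { return TagField % 65536; }

// Map the "Encoding" field of a basic type descriptor to its DWARF name.
const char *encodingName(unsigned Encoding) {
  switch (Encoding) {
  case 2: return "DW_ATE_boolean";
  case 4: return "DW_ATE_float";
  case 5: return "DW_ATE_signed";
  case 6: return "DW_ATE_signed_char";
  case 7: return "DW_ATE_unsigned";
  case 8: return "DW_ATE_unsigned_char";
  default: return "unknown";
  }
}
</pre>
</div>

<p>For the <tt>bool</tt> descriptor, <tt>dwarfTag(524324)</tt> yields 36
(<tt>DW_TAG_base_type</tt>) and <tt>encodingName(2)</tt> yields
<tt>DW_ATE_boolean</tt>.</p>
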
<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_short">short</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"short int",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 16,            ;; Size in Bits
  i64 16,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 5              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_unsigned_short">unsigned short</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"short unsigned int",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 16,            ;; Size in Bits
  i64 16,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 7              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_int">int</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"int",   ;; Name
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 32,            ;; Size in Bits
  i64 32,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 5              ;; Encoding
}
</pre></div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_unsigned_int">unsigned int</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"unsigned int",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 32,            ;; Size in Bits
  i64 32,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 7              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_long_long">long long</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"long long int",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 64,            ;; Size in Bits
  i64 64,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 5              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_unsigned_long_long">unsigned long long</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"long long unsigned int",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 64,            ;; Size in Bits
  i64 64,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 7              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_float">float</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"float",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 32,            ;; Size in Bits
  i64 32,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 4              ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
<a name="ccxx_basic_double">double</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"double",;; Name
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 64,            ;; Size in Bits
  i64 64,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 4              ;; Encoding
}
</pre>
</div>

</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_derived_types">C/C++ derived types</a>
</h3>

<div>

<p>Given the following as an example of C/C++ derived type:</p>

<div class="doc_code">
<pre>
typedef const int *IntPtr;
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define the typedef "IntPtr".
;;
!2 = metadata !{
  i32 524310,          ;; Tag
  metadata !1,         ;; Context
  metadata !"IntPtr",  ;; Name
  metadata !3,         ;; File
  i32 0,               ;; Line number
  i64 0,               ;; Size in bits
  i64 0,               ;; Align in bits
  i64 0,               ;; Offset in bits
  i32 0,               ;; Flags
  metadata !4          ;; Derived From type
}

;;
;; Define the pointer type.
;;
!4 = metadata !{
  i32 524303,          ;; Tag
  metadata !1,         ;; Context
  metadata !"",        ;; Name
  metadata !1,         ;; File
  i32 0,               ;; Line number
  i64 64,              ;; Size in bits
  i64 64,              ;; Align in bits
  i64 0,               ;; Offset in bits
  i32 0,               ;; Flags
  metadata !5          ;; Derived From type
}
;;
;; Define the const type.
;;
!5 = metadata !{
  i32 524326,          ;; Tag
  metadata !1,         ;; Context
  metadata !"",        ;; Name
  metadata !1,         ;; File
  i32 0,               ;; Line number
  i64 32,              ;; Size in bits
  i64 32,              ;; Align in bits
  i64 0,               ;; Offset in bits
  i32 0,               ;; Flags
  metadata !6          ;; Derived From type
}
;;
;; Define the int type.
;;
!6 = metadata !{
  i32 524324,          ;; Tag
  metadata !1,         ;; Context
  metadata !"int",     ;; Name
  metadata !1,         ;; File
  i32 0,               ;; Line number
  i64 32,              ;; Size in bits
  i64 32,              ;; Align in bits
  i64 0,               ;; Offset in bits
  i32 0,               ;; Flags
  i32 5                ;; Encoding
}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_composite_types">C/C++ struct/union types</a>
</h3>

<div>

<p>Given the following as an example of C/C++ struct type:</p>

<div class="doc_code">
<pre>
struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define basic type for unsigned int.
;;
!5 = metadata !{
  i32 524324,        ;; Tag
  metadata !1,       ;; Context
  metadata !"unsigned int",
  metadata !1,       ;; File
  i32 0,             ;; Line number
  i64 32,            ;; Size in Bits
  i64 32,            ;; Align in Bits
  i64 0,             ;; Offset in Bits
  i32 0,             ;; Flags
  i32 7              ;; Encoding
}
;;
;; Define composite type for struct Color.
;;
!2 = metadata !{
  i32 524307,        ;; Tag
  metadata !1,       ;; Context
  metadata !"Color", ;; Name
  metadata !1,       ;; Compile unit
  i32 1,             ;; Line number
  i64 96,            ;; Size in bits
  i64 32,            ;; Align in bits
  i64 0,             ;; Offset in bits
  i32 0,             ;; Flags
  null,              ;; Derived From
  metadata !3,       ;; Elements
  i32 0              ;; Runtime Language
}

;;
;; Define the Red field.
;;
!4 = metadata !{
  i32 524301,        ;; Tag
  metadata !1,       ;; Context
  metadata !"Red",   ;; Name
  metadata !1,       ;; File
  i32 2,             ;; Line number
  i64 32,            ;; Size in bits
  i64 32,            ;; Align in bits
  i64 0,             ;; Offset in bits
  i32 0,             ;; Flags
  metadata !5        ;; Derived From type
}

;;
;; Define the Green field.
;;
!6 = metadata !{
  i32 524301,        ;; Tag
  metadata !1,       ;; Context
  metadata !"Green", ;; Name
  metadata !1,       ;; File
  i32 3,             ;; Line number
  i64 32,            ;; Size in bits
  i64 32,            ;; Align in bits
  i64 32,            ;; Offset in bits
  i32 0,             ;; Flags
  metadata !5        ;; Derived From type
}

;;
;; Define the Blue field.
;;
!7 = metadata !{
  i32 524301,        ;; Tag
  metadata !1,       ;; Context
  metadata !"Blue",  ;; Name
  metadata !1,       ;; File
  i32 4,             ;; Line number
  i64 32,            ;; Size in bits
  i64 32,            ;; Align in bits
  i64 64,            ;; Offset in bits
  i32 0,             ;; Flags
  metadata !5        ;; Derived From type
}

;;
;; Define the array of fields used by the composite type Color.
;;
!3 = metadata !{metadata !4, metadata !6, metadata !7}
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h3>
<a name="ccxx_enumeration_types">C/C++ enumeration types</a>
</h3>

<div>

<p>Given the following as an example of C/C++ enumeration type:</p>

<div class="doc_code">
<pre>
enum Trees {
  Spruce = 100,
  Oak = 200,
  Maple = 300
};
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define composite type for enum Trees
;;
!2 = metadata !{
  i32 524292,        ;; Tag
  metadata !1,       ;; Context
  metadata !"Trees", ;; Name
  metadata !1,       ;; File
  i32 1,             ;; Line number
  i64 32,            ;; Size in bits
  i64 32,            ;; Align in bits
  i64 0,             ;; Offset in bits
  i32 0,             ;; Flags
  null,              ;; Derived From type
  metadata !3,       ;; Elements
  i32 0              ;; Runtime language
}

;;
;; Define the array of enumerators used by composite type Trees.
;;
!3 = metadata !{metadata !4, metadata !5, metadata !6}

;;
;; Define Spruce enumerator.
;;
!4 = metadata !{i32 524328, metadata !"Spruce", i64 100}

;;
;; Define Oak enumerator.
;;
!5 = metadata !{i32 524328, metadata !"Oak", i64 200}

;;
;; Define Maple enumerator.
;;
!6 = metadata !{i32 524328, metadata !"Maple", i64 300}

</pre>
</div>

</div>

</div>


<!-- *********************************************************************** -->
<h2>
<a name="llvmdwarfextension">LLVM Dwarf Extensions</a>
</h2>
<!-- *********************************************************************** -->
<div>
<!-- ======================================================================= -->
<h3>
<a name="objcproperty">Debugging Information Extension for Objective C Properties</a>
</h3>
<div>
<!-- *********************************************************************** -->
<h4>
<a name="objcpropertyintroduction">Introduction</a>
</h4>
<!-- *********************************************************************** -->

<div>
<p>Objective C provides a simpler way to declare and define accessor methods
using declared properties. The language provides features to declare a
property and to let the compiler synthesize accessor methods.
</p>

<p>The debugger lets developers inspect Objective C interfaces and their
instance variables and class variables. However, the debugger does not know
anything about the properties defined in Objective C interfaces. The debugger
consumes information generated by the compiler in DWARF format. The format does
not support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.
</p>

</div>


<!-- *********************************************************************** -->
<h4>
<a name="objcpropertyproposal">Proposal</a>
</h4>
<!-- *********************************************************************** -->

<div>
<p>Objective C properties exist separately from class members. A property
can be defined only by "setter" and "getter" selectors, and
be calculated anew on each access. Or a property can just be a direct access
to some declared ivar. Finally, it can have an ivar "automatically
synthesized" for it by the compiler, in which case the property can be
referred to in user code directly using the standard C dereference syntax as
well as through the property "dot" syntax, but there is no entry in
the @interface declaration corresponding to this ivar.
</p>
<p>
To facilitate debugging these properties, we will add a new DWARF TAG into the
DW_TAG_structure_type definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.
</p>
<p>
If there is a related ivar, there will also be a DWARF property attribute placed
in the DW_TAG_member DIE for that ivar referring back to the property TAG for
that property. And in the case where the compiler synthesizes the ivar directly,
the compiler is expected to generate a DW_TAG_member for that ivar (with the
DW_AT_artificial set to 1), whose name will be the name used to access this
ivar directly in code, and with the property attribute pointing back to the
property it is backing.
</p>
<p>
The following examples will serve as illustration for our discussion:
</p>

<div class="doc_code">
<pre>
@interface I1 {
  int n2;
}

@property int p1;
@property int p2;
@end

@implementation I1
@synthesize p1;
@synthesize p2 = n2;
@end
</pre>
</div>

<p>
This produces the following DWARF (this is a "pseudo dwarfdump" output):
</p>
<div class="doc_code">
<pre>
0x00000100:  TAG_structure_type [7] *
               AT_APPLE_runtime_class( 0x10 )
               AT_name( "I1" )
               AT_decl_file( "Objc_Property.m" )
               AT_decl_line( 3 )

0x00000110:    TAG_APPLE_property
                 AT_name ( "p1" )
                 AT_type ( {0x00000150} ( int ) )

0x00000120:    TAG_APPLE_property
                 AT_name ( "p2" )
                 AT_type ( {0x00000150} ( int ) )

0x00000130:    TAG_member [8]
                 AT_name( "_p1" )
                 AT_APPLE_property ( {0x00000110} "p1" )
                 AT_type( {0x00000150} ( int ) )
                 AT_artificial ( 0x1 )

0x00000140:    TAG_member [8]
                 AT_name( "n2" )
                 AT_APPLE_property ( {0x00000120} "p2" )
                 AT_type( {0x00000150} ( int ) )

0x00000150:  AT_type( ( int ) )
</pre>
</div>

<p>Note that the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives with
an underscore prepended, as is shown in the example.
But we actually don't need to know this convention, since we are given the name
of the ivar directly.
</p>

<p>
Also, it is common practice in ObjC to have different property declarations in
the @interface and @implementation - e.g. to provide a read-only property in
the interface, and a read-write property in the implementation. In that case,
the compiler should emit whichever property declaration will be in force in the
current translation unit.
</p>

<p>Developers can decorate a property with attributes which are encoded using
DW_AT_APPLE_property_attribute.
</p>

<div class="doc_code">
<pre>
@property (readonly, nonatomic) int pr;
</pre>
</div>
<p>
Which produces a property tag:
</p>
<div class="doc_code">
<pre>
TAG_APPLE_property [8]
  AT_name( "pr" )
  AT_type ( {0x00000147} (int) )
  AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
</pre>
</div>

<p>The setter and getter method names are attached to the property using
DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.
</p>
<div class="doc_code">
<pre>
@interface I1
@property (setter=myOwnP3Setter:) int p3;
-(void)myOwnP3Setter:(int)a;
@end

@implementation I1
@synthesize p3;
-(void)myOwnP3Setter:(int)a{ }
@end
</pre>
</div>

<p>
The DWARF for this would be:
</p>
<div class="doc_code">
<pre>
0x000003bd:  TAG_structure_type [7] *
               AT_APPLE_runtime_class( 0x10 )
               AT_name( "I1" )
               AT_decl_file( "Objc_Property.m" )
               AT_decl_line( 3 )

0x000003cd:    TAG_APPLE_property
                 AT_name ( "p3" )
                 AT_APPLE_property_setter ( "myOwnP3Setter:" )
                 AT_type( {0x00000147} ( int ) )

0x000003f3:    TAG_member [8]
                 AT_name( "_p3" )
                 AT_type ( {0x00000147} ( int ) )
                 AT_APPLE_property ( {0x000003cd} )
                 AT_artificial ( 0x1 )
</pre>
</div>

</div>

<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertynewtags">New DWARF Tags</a>
</h4>
<!-- *********************************************************************** -->

<div>
<table border="1" cellspacing="0">
  <col width="200">
  <col width="200">
  <tr>
    <th>TAG</th>
    <th>Value</th>
  </tr>
  <tr>
    <td>DW_TAG_APPLE_property</td>
    <td>0x4200</td>
  </tr>
</table>

</div>

<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertynewattributes">New DWARF Attributes</a>
</h4>
<!-- *********************************************************************** -->

<div>
<table border="1" cellspacing="0">
  <col width="200">
  <col width="200">
  <col width="200">
  <tr>
    <th>Attribute</th>
    <th>Value</th>
    <th>Classes</th>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property</td>
    <td>0x3fed</td>
    <td>Reference</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property_getter</td>
    <td>0x3fe9</td>
    <td>String</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property_setter</td>
    <td>0x3fea</td>
    <td>String</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property_attribute</td>
    <td>0x3feb</td>
    <td>Constant</td>
  </tr>
</table>

</div>

<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertynewconstants">New DWARF Constants</a>
</h4>
<!-- *********************************************************************** -->

<div>
<table border="1" cellspacing="0">
  <col width="200">
  <col width="200">
  <tr>
    <th>Name</th>
    <th>Value</th>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_readonly</td>
    <td>0x1</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_readwrite</td>
    <td>0x2</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_assign</td>
    <td>0x4</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_retain</td>
    <td>0x8</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_copy</td>
    <td>0x10</td>
  </tr>
  <tr>
    <td>DW_APPLE_PROPERTY_nonatomic</td>
    <td>0x20</td>
  </tr>
</table>

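<p>The constant values above are distinct powers of two, which suggests that a
consumer can treat the DW_AT_APPLE_property_attribute value as a bitmask. The
following is a minimal sketch under that assumption, not code from any
debugger. The helper name <tt>describe_property_attributes</tt> is
hypothetical, and the constant names use the DW_APPLE_PROPERTY_ spelling that
appears in the property tag example in the Proposal section.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values copied from the "New DWARF Constants" table. */
enum {
    DW_APPLE_PROPERTY_readonly  = 0x01,
    DW_APPLE_PROPERTY_readwrite = 0x02,
    DW_APPLE_PROPERTY_assign    = 0x04,
    DW_APPLE_PROPERTY_retain    = 0x08,
    DW_APPLE_PROPERTY_copy      = 0x10,
    DW_APPLE_PROPERTY_nonatomic = 0x20
};

/* Hypothetical helper: writes a comma separated list of the attribute
 * names whose bits are set in `attrs` into `buf`, and returns how many
 * names were written. Assumes the attribute value is a bitmask. */
static int describe_property_attributes(uint32_t attrs, char *buf,
                                        size_t buf_size)
{
    static const struct { uint32_t bit; const char *name; } names[] = {
        { DW_APPLE_PROPERTY_readonly,  "readonly"  },
        { DW_APPLE_PROPERTY_readwrite, "readwrite" },
        { DW_APPLE_PROPERTY_assign,    "assign"    },
        { DW_APPLE_PROPERTY_retain,    "retain"    },
        { DW_APPLE_PROPERTY_copy,      "copy"      },
        { DW_APPLE_PROPERTY_nonatomic, "nonatomic" }
    };
    size_t used = 0;
    int found = 0;

    if (buf_size == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if ((attrs & names[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, buf_size - used, "%s%s",
                         found ? ", " : "", names[i].name);
        if (n < 0 || (size_t)n >= buf_size - used) {
            buf[used] = '\0';   /* out of space: drop the partial name */
            break;
        }
        used += (size_t)n;
        ++found;
    }
    return found;
}
```

<p>Under the same assumption, the <tt>@property (readonly, nonatomic) int pr;</tt>
declaration shown earlier would carry the value 0x21, which this helper
renders as "readonly, nonatomic".</p>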
</div>
</div>

<!-- ======================================================================= -->
<h3>
  <a name="acceltable">Name Accelerator Tables</a>
</h3>
<!-- ======================================================================= -->
<div>
<!-- ======================================================================= -->
<h4>
  <a name="acceltableintroduction">Introduction</a>
</h4>
<!-- ======================================================================= -->
<div>
<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
  needs. The "pub" in the section name indicates that the entries in the
  table are publicly visible names only. This means no static or hidden
  functions show up in the .debug_pubnames. No static variables or private class
  variables are in the .debug_pubtypes. Many compilers add different things to
  these tables, so we can't rely upon the contents between gcc, icc, or clang.</p>

<p>The typical query given by users tends not to match up with the contents of
  these tables. For example, the DWARF spec states that "In the case of the
  name of a function member or static data member of a C++ structure, class or
  union, the name presented in the .debug_pubnames section is not the simple
  name given by the DW_AT_name attribute of the referenced debugging information
  entry, but rather the fully qualified name of the data or function member."
  So the only names in these tables for complex C++ entries are fully
  qualified names. Debugger users tend not to enter their search strings as
  "a::b::c(int,const Foo&amp;) const", but rather as "c", "b::c", or "a::b::c".
  So the name entered in the name table must be demangled in order to chop it up
  appropriately and additional names must be manually entered into the table
  to make it effective as a name lookup table for debuggers to use.</p>

<p>All debuggers currently ignore the .debug_pubnames table as a result of
  its inconsistent and useless public-only name content making it a waste of
  space in the object file. These tables, when they are written to disk, are
  not sorted in any way, leaving every debugger to do its own parsing
  and sorting. These tables also include an inlined copy of the string values
  in the table itself making the tables much larger than they need to be on
  disk, especially for large C++ programs.</p>

<p>Can't we just fix the sections by adding all of the names we need to this
  table? No, because that is not what the tables are defined to contain and we
  won't know the difference between the old bad tables and the new good tables.
  At best we could make our own renamed sections that contain all of the data
  we need.</p>

<p>These tables are also insufficient for what a debugger like LLDB needs.
  LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
  then often asked to look for type "foo" or namespace "bar", or list items in
  namespace "baz". Namespaces are not included in the pubnames or pubtypes
  tables. Since clang asks a lot of questions when it is parsing an expression,
  we need to be very fast when looking up names, as it happens a lot. Having new
  accelerator tables that are optimized for very quick lookups will benefit
  this type of debugging experience greatly.</p>

<p>We would like to generate name lookup tables that can be mapped into
  memory from disk, and used as is, with little or no up-front parsing.
  We would also be able to control the exact content of these different tables
  so they contain exactly what we need. The Name Accelerator Tables were designed
  to fix these issues. In order to solve these issues we need to:</p>

<ul>
  <li>Have a format that can be mapped into memory from disk and used as is</li>
  <li>Lookups should be very fast</li>
  <li>Extensible table format so these tables can be made by many producers</li>
  <li>Contain all of the names needed for typical lookups out of the box</li>
  <li>Strict rules for the contents of tables</li>
</ul>

<p>Table size is important and the accelerator table format should allow the
  reuse of strings from common string tables so the strings for the names are
  not duplicated. We also want to make sure the table is ready to be used as-is
  by simply mapping the table into memory with minimal header parsing.</p>

<p>The name lookups need to be fast and optimized for the kinds of lookups
  that debuggers tend to do. Optimally we would like to touch as few parts of
  the mapped table as possible when doing a name lookup and be able to quickly
  find the name entry we are looking for, or discover there are no matches.
  In the case of debuggers we optimized for lookups that fail most of the time.</p>

<p>Each table that is defined should have strict rules on exactly what is in
  the accelerator tables and documented so clients can rely on the content.</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltablehashes">Hash Tables</a>
</h4>
<!-- ======================================================================= -->

<div>
<h5>Standard Hash Tables</h5>

<p>Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:
</p>

<div class="doc_code">
<pre>
.------------.
|  HEADER    |
|------------|
|  BUCKETS   |
|------------|
|  DATA      |
`------------'
</pre>
</div>

<p>The BUCKETS are an array of offsets to DATA for each hash:</p>

<div class="doc_code">
<pre>
.------------.
| 0x00001000 | BUCKETS[0]
| 0x00002000 | BUCKETS[1]
| 0x00002200 | BUCKETS[2]
| 0x000034f0 | BUCKETS[3]
|            | ...
| 0xXXXXXXXX | BUCKETS[n_buckets]
'------------'
</pre>
</div>

<p>So for bucket[3] in the example above, we have an offset into the table
  0x000034f0 which points to a chain of entries for the bucket. Each entry in
  the chain must contain a next pointer, the full 32 bit hash value, the string
  itself, and the data for the current string value.</p>

<div class="doc_code">
<pre>
            .------------.
0x000034f0: | 0x00003500 | next pointer
            | 0x12345678 | 32 bit hash
            | "erase"    | string value
            | data[n]    | HashData for this bucket
            |------------|
0x00003500: | 0x00003550 | next pointer
            | 0x29273623 | 32 bit hash
            | "dump"     | string value
            | data[n]    | HashData for this bucket
            |------------|
0x00003550: | 0x00000000 | next pointer
            | 0x82638293 | 32 bit hash
            | "main"     | string value
            | data[n]    | HashData for this bucket
            `------------'
</pre>
</div>

<p>The problem with this layout for debuggers is that we need to optimize for
  the negative lookup case where the symbol we're searching for is not present.
  So if we were to look up "printf" in the table above, we would compute a 32 bit
  hash for "printf", which might map to bucket[3]. We would need to go to the
  offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
  so, we need to read the next pointer, then read the hash, compare it, and skip
  to the next entry in the chain. Each time we are skipping many bytes in memory
  and touching new cache pages just to do the compare on the full 32 bit hash.
  All of these accesses then tell us that we didn't have a match.</p>

<h5>Name Hash Tables</h5>

<p>To solve the issues mentioned above we have structured the hash tables
  a bit differently: a header, buckets, an array of all unique 32 bit hash
  values, followed by an array of hash value data offsets, one for each hash
  value, then the data for all hash values:</p>

<div class="doc_code">
<pre>
.-------------.
|  HEADER     |
|-------------|
|  BUCKETS    |
|-------------|
|  HASHES     |
|-------------|
|  OFFSETS    |
|-------------|
|  DATA       |
`-------------'
</pre>
</div>

<p>The BUCKETS in the name tables are an index into the HASHES array.
  By making all of the full 32 bit hash values contiguous in memory, we allow
  ourselves to efficiently check for a match while touching as little
  memory as possible. Most often checking the 32 bit hash values is as far as
  the lookup goes. If it does match, it usually is a match with no collisions.
  So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
  values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p>

<div class="doc_code">
<pre>
.-------------------------.
|  HEADER.magic           | uint32_t
|  HEADER.version         | uint16_t
|  HEADER.hash_function   | uint16_t
|  HEADER.bucket_count    | uint32_t
|  HEADER.hashes_count    | uint32_t
|  HEADER.header_data_len | uint32_t
|  HEADER_DATA            | HeaderData
|-------------------------|
|  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
|-------------------------|
|  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
|-------------------------|
|  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
|-------------------------|
|  ALL HASH DATA          |
`-------------------------'
</pre>
</div>

<p>So taking the exact same data from the standard hash example above we end up
  with:</p>

<div class="doc_code">
<pre>
            .------------.
            | HEADER     |
            |------------|
            |          0 | BUCKETS[0]
            |          2 | BUCKETS[1]
            |          5 | BUCKETS[2]
            |          6 | BUCKETS[3]
            |            | ...
            |        ... | BUCKETS[n_buckets]
            |------------|
            | 0x........ | HASHES[0]
            | 0x........ | HASHES[1]
            | 0x........ | HASHES[2]
            | 0x........ | HASHES[3]
            | 0x........ | HASHES[4]
            | 0x........ | HASHES[5]
            | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
            | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
            | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
            | 0x........ | HASHES[9]
            | 0x........ | HASHES[10]
            | 0x........
                         | HASHES[11]
            | 0x........ | HASHES[12]
            | 0x........ | HASHES[13]
            | 0x........ | HASHES[n_hashes]
            |------------|
            | 0x........ | OFFSETS[0]
            | 0x........ | OFFSETS[1]
            | 0x........ | OFFSETS[2]
            | 0x........ | OFFSETS[3]
            | 0x........ | OFFSETS[4]
            | 0x........ | OFFSETS[5]
            | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
            | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
            | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
            | 0x........ | OFFSETS[9]
            | 0x........ | OFFSETS[10]
            | 0x........ | OFFSETS[11]
            | 0x........ | OFFSETS[12]
            | 0x........ | OFFSETS[13]
            | 0x........ | OFFSETS[n_hashes]
            |------------|
            |            |
            |            |
            |            |
            |            |
            |            |
            |------------|
0x000034f0: | 0x00001203 | .debug_str ("erase")
            | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x........ | HashData[2]
            | 0x........ | HashData[3]
            | 0x00000000 | String offset into .debug_str (terminate data for hash)
            |------------|
0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
            | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x00001203 | String offset into .debug_str ("dump")
            | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x........ | HashData[2]
            | 0x00000000 | String offset into .debug_str (terminate data for hash)
            |------------|
0x00003550: | 0x00001203 | String offset into .debug_str ("main")
            | 0x00000009 | A 32 bit array count - number of HashData with name "main"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x........ | HashData[2]
            | 0x........ | HashData[3]
            | 0x........ | HashData[4]
            | 0x........
                         | HashData[5]
            | 0x........ | HashData[6]
            | 0x........ | HashData[7]
            | 0x........ | HashData[8]
            | 0x00000000 | String offset into .debug_str (terminate data for hash)
            `------------'
</pre>
</div>

<p>So we still have all of the same data, we just organize it more efficiently
  for debugger lookup. If we repeat the same "printf" lookup from above, we
  would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash
  value modulo n_buckets. BUCKETS[3] contains "6" which is the index
  into the HASHES table. We would then compare any consecutive 32 bit hash
  values in the HASHES array as long as the hashes would be in BUCKETS[3]. We
  do this by verifying that each subsequent hash value modulo n_buckets is still
  3. In the case of a failed lookup we would access the memory for BUCKETS[3], and
  then compare a few consecutive 32 bit hashes before we know that we have no match.
  We don't end up marching through multiple words of memory and we really keep the
  number of processor data cache lines being accessed as small as possible.</p>

<p>The string hash that is used for these lookup tables is the Daniel J.
  Bernstein hash which is also used in the ELF GNU_HASH sections.
  It is a very good hash for all kinds of names in programs with very few hash
  collisions.</p>

<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p>
</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltabledetails">Details</a>
</h4>
<!-- ======================================================================= -->
<div>
<p>These name hash tables are designed to be generic where specializations of
  the table get to define additional data that goes into the header
  ("HeaderData"), how the string value is stored ("KeyType") and the content
  of the data for each hash value.</p>

<h5>Header Layout</h5>
<p>The header has a fixed part, and the specialized part. The exact format of
  the header is:</p>
<div class="doc_code">
<pre>
struct Header
{
  uint32_t   magic;           // 'HASH' magic value to allow endian detection
  uint16_t   version;         // Version number
  uint16_t   hash_function;   // The hash function enumeration that was used
  uint32_t   bucket_count;    // The number of buckets in this hash table
  uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
  uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                              // Specifically the length of the following HeaderData field - this does not
                              // include the size of the preceding fields
  HeaderData header_data;     // Implementation specific header data
};
</pre>
</div>
<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
  an ASCII integer. This allows the detection of the start of the hash table and
  also allows the table's byte order to be determined so the table can be
  correctly extracted.
  The "magic" value is followed by a 16 bit version number
  which allows the table to be revised and modified in the future. The current
  version number is 1. "hash_function" is a uint16_t enumeration that specifies
  which hash function was used to produce this table. The current values for the
  hash function enumerations include:</p>
<div class="doc_code">
<pre>
enum HashFunctionType
{
  eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
};
</pre>
</div>
<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
  are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
  values in the HASHES array, and is also the number of offsets
  contained in the OFFSETS array. "header_data_len" specifies the size in
  bytes of the HeaderData that is filled in by specialized versions of this
  table.</p>

<h5>Fixed Lookup</h5>
<p>The header is followed by the buckets, hashes, offsets, and hash value
  data:</p>
<div class="doc_code">
<pre>
struct FixedTable
{
  uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
  uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
  uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
};
</pre>
</div>
<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
  "hashes" array contains all of the 32 bit hash values for all names in the
  hash table. Each hash in the "hashes" table has an offset in the "offsets"
  array that points to the data for the hash value.</p>

<p>This table setup makes it very easy to repurpose these tables to contain
  different data, while keeping the lookup mechanism the same for all tables.
  This layout also makes it possible to save the table to disk and map it in
  later and do very efficient name lookups with little or no parsing.</p>

<p>DWARF lookup tables can be implemented in a variety of ways and can store
  a lot of information for each name. We want to make the DWARF tables
  extensible and able to store the data efficiently so we have used some of the
  DWARF features that enable efficient data storage to define exactly what kind
  of data we store for each name.</p>

<p>The "HeaderData" contains a definition of the contents of each HashData
  chunk. We might want to store an offset to all of the debug information
  entries (DIEs) for each name. To keep things extensible, we create a list of
  items, or Atoms, that are contained in the data for each name. First comes the
  type of the data in each atom:</p>
<div class="doc_code">
<pre>
enum AtomType
{
  eAtomTypeNULL       = 0u,
  eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
  eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
  eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
  eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
  eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
};
</pre>
</div>
<p>The enumeration values and their meanings are:</p>
<div class="doc_code">
<pre>
eAtomTypeNULL      - a termination atom that specifies the end of the atom list
eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
eAtomTypeCUOffset  - an offset into the .debug_info section for the CU that contains the DIE
eAtomTypeTag       - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
eAtomTypeNameFlags - Flags for functions and global variables (isFunction,
                     isInlined, isExternal...)
eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...)
</pre>
</div>
<p>Then each atom specifies its type and how the data for that atom is
  encoded:</p>
<div class="doc_code">
<pre>
struct Atom
{
  uint16_t type;  // AtomType enum value
  uint16_t form;  // DWARF DW_FORM_XXX defines
};
</pre>
</div>
<p>The "form" type above is from the DWARF specification and defines the
  exact encoding of the data for the Atom type. See the DWARF specification for
  the DW_FORM_ definitions.</p>
<div class="doc_code">
<pre>
struct HeaderData
{
  uint32_t die_offset_base;
  uint32_t atom_count;
  Atom     atoms[atom_count];
};
</pre>
</div>
<p>"HeaderData" defines the base DIE offset that should be added to any atoms
  that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
  DW_FORM_ref8 or DW_FORM_ref_udata forms. It also defines what is contained in
  each "HashData" object: Atom.form tells us how large each field will be in
  the HashData and Atom.type tells us how this data should be interpreted.</p>

<p>For the current implementations of the ".apple_names" (all functions + globals),
  the ".apple_types" (names of all types that are defined), and the
  ".apple_namespaces" (all namespaces), we set the Atom array to be:</p>
<div class="doc_code">
<pre>
HeaderData.atom_count = 1;
HeaderData.atoms[0].type = eAtomTypeDIEOffset;
HeaderData.atoms[0].form = DW_FORM_data4;
</pre>
</div>
<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
  encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
  multiple matching DIEs in a single file, which could come up with an inlined
  function for instance.
  Future tables could include more information about the
  DIE such as flags indicating if the DIE is a function, method, block,
  or inlined.</p>

<p>The KeyType for the DWARF table is a 32 bit string table offset into the
  ".debug_str" table. The ".debug_str" is the string table for the DWARF which
  may already contain copies of all of the strings. This helps make sure, with
  help from the compiler, that we reuse the strings between all of the DWARF
  sections and keeps the hash table size down. Another benefit to having the
  compiler generate all strings as DW_FORM_strp in the debug info, is that
  DWARF parsing can be made much faster.</p>

<p>After a lookup is made, we get an offset into the hash data. The hash data
  needs to be able to deal with 32 bit hash collisions, so the chunk of data
  at the offset in the hash data consists of a triple:</p>
<div class="doc_code">
<pre>
uint32_t str_offset
uint32_t hash_data_count
HashData[hash_data_count]
</pre>
</div>
<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
  hash data chunks contain a single item (no 32 bit hash collision):</p>
<div class="doc_code">
<pre>
.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] =&gt; "main")
| 0x00000004 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x........ | uint32_t HashData[2] DIE offset
| 0x........ | uint32_t HashData[3] DIE offset
| 0x00000000 | uint32_t KeyType (end of hash chain)
`------------'
</pre>
</div>
<p>If there are collisions, you will have multiple valid string offsets:</p>
<div class="doc_code">
<pre>
.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] =&gt; "main")
| 0x00000004 | uint32_t HashData count
| 0x........
             | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x........ | uint32_t HashData[2] DIE offset
| 0x........ | uint32_t HashData[3] DIE offset
| 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] =&gt; "print")
| 0x00000002 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x00000000 | uint32_t KeyType (end of hash chain)
`------------'
</pre>
</div>
<p>Current testing with real world C++ binaries has shown that there is around
  one 32 bit hash collision per 100,000 name entries.</p>
</div>
<!-- ======================================================================= -->
<h4>
  <a name="acceltablecontents">Contents</a>
</h4>
<!-- ======================================================================= -->
<div>
<p>As we said, we want to strictly define exactly what is included in the
  different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
  and ".apple_namespaces".</p>

<p>".apple_names" sections should contain an entry for each DWARF DIE whose
  DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
  has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
  DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
  in the location (global and static variables). All global and static variables
  should be included, including those scoped within functions and classes. For
  example using the following code:</p>
<div class="doc_code">
<pre>
static int var = 0;

void f ()
{
  static int var = 0;
}
</pre>
</div>
<p>Both of the static "var" variables would be included in the table. All
  functions should emit both their full names and their basenames.
  For C or C++, the full name is the mangled name (if available) which is
  usually in the DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains
  the function basename. If global or static variables have a mangled name in a
  DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
  simple name found in the DW_AT_name attribute.</p>

<p>".apple_types" sections should contain an entry for each DWARF DIE whose
  tag is one of:</p>
<ul>
  <li>DW_TAG_array_type</li>
  <li>DW_TAG_class_type</li>
  <li>DW_TAG_enumeration_type</li>
  <li>DW_TAG_pointer_type</li>
  <li>DW_TAG_reference_type</li>
  <li>DW_TAG_string_type</li>
  <li>DW_TAG_structure_type</li>
  <li>DW_TAG_subroutine_type</li>
  <li>DW_TAG_typedef</li>
  <li>DW_TAG_union_type</li>
  <li>DW_TAG_ptr_to_member_type</li>
  <li>DW_TAG_set_type</li>
  <li>DW_TAG_subrange_type</li>
  <li>DW_TAG_base_type</li>
  <li>DW_TAG_const_type</li>
  <li>DW_TAG_constant</li>
  <li>DW_TAG_file_type</li>
  <li>DW_TAG_namelist</li>
  <li>DW_TAG_packed_type</li>
  <li>DW_TAG_volatile_type</li>
  <li>DW_TAG_restrict_type</li>
  <li>DW_TAG_interface_type</li>
  <li>DW_TAG_unspecified_type</li>
  <li>DW_TAG_shared_type</li>
</ul>
<p>Only entries with a DW_AT_name attribute are included, and the entry must
  not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
For example, using the following code:</p>
<div class="doc_code">
<pre>
int main ()
{
  int *b = 0;
  return *b;
}
</pre>
</div>
<p>We get a few type DIEs:</p>
<div class="doc_code">
<pre>
0x00000067:     TAG_base_type [5]
                AT_encoding( DW_ATE_signed )
                AT_name( "int" )
                AT_byte_size( 0x04 )

0x0000006e:     TAG_pointer_type [6]
                AT_type( {0x00000067} ( int ) )
                AT_byte_size( 0x08 )
</pre>
</div>
<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p>

<p>The ".apple_namespaces" section should contain all DW_TAG_namespace DIEs. A
namespace that has no name is an anonymous namespace, and its name should be
output as "(anonymous namespace)" (without the quotes). This matches the
output of abi::__cxa_demangle(), the function in the standard C++ library that
demangles mangled names.</p>
</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltableextensions">Language Extensions and File Format Changes</a>
</h4>
<!-- ======================================================================= -->
<div>
<h5>Objective-C Extensions</h5>
<p>The ".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
Objective-C class. The name used in the hash table is the name of the
Objective-C class itself. If the Objective-C class has a category, then an
entry is made both for the class name without the category and for the class
name with the category. So if we have a DIE at offset 0x1234 for a method
named "-[NSString(my_additions) stringWithSpecialString:]", we would add
an entry for "NSString" that points to DIE 0x1234, and an entry for
"NSString(my_additions)" that points to 0x1234.
This allows us to quickly
track down all Objective-C methods for an Objective-C class when evaluating
expressions. It is needed because of the dynamic nature of Objective-C, where
anyone can add methods to a class. The DWARF for Objective-C methods is also
emitted differently from that for C++ classes: the methods are not usually
contained in the class definition, but are scattered across one or more
compile units. Categories can also be defined in different shared libraries.
So we need to be able to quickly find all of the methods and class functions
given the Objective-C class name, or quickly find all methods and class
functions for a class + category name. This table does not contain any selector
names; it just maps Objective-C class names (or class names + category) to all
of the methods and class functions. The selectors are added as function
basenames in the .debug_names section.</p>

<p>In the ".apple_names" section for Objective-C functions, the full name is the
entire function name with the brackets ("-[NSString stringWithCString:]") and the
basename is the selector only ("stringWithCString:").</p>

<h5>Mach-O Changes</h5>
<p>The section names given for the Apple hash tables are for non-Mach-O files.
For
Mach-O files, the sections should be contained in the "__DWARF" segment with
names as follows:</p>
<ul>
  <li>".apple_names" -&gt; "__apple_names"</li>
  <li>".apple_types" -&gt; "__apple_types"</li>
  <li>".apple_namespaces" -&gt; "__apple_namespac" (16 character limit)</li>
  <li>".apple_objc" -&gt; "__apple_objc"</li>
</ul>
</div>
</div>
</div>

<!-- *********************************************************************** -->

<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>

  <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
</address>

</body>
</html>