<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Source Level Debugging with LLVM</title>
  <link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<body>

<h1>Source Level Debugging with LLVM</h1>

<table class="layout" style="width:100%">
  <tr class="layout">
    <td class="left">
<ul>
  <li><a href="#introduction">Introduction</a>
  <ol>
    <li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
    <li><a href="#consumers">Debug information consumers</a></li>
    <li><a href="#debugopt">Debugging optimized code</a></li>
  </ol></li>
  <li><a href="#format">Debugging information format</a>
  <ol>
    <li><a href="#debug_info_descriptors">Debug information descriptors</a>
    <ul>
      <li><a href="#format_compile_units">Compile unit descriptors</a></li>
      <li><a href="#format_files">File descriptors</a></li>
      <li><a href="#format_global_variables">Global variable descriptors</a></li>
      <li><a href="#format_subprograms">Subprogram descriptors</a></li>
      <li><a href="#format_blocks">Block descriptors</a></li>
      <li><a href="#format_basic_type">Basic type descriptors</a></li>
      <li><a href="#format_derived_type">Derived type descriptors</a></li>
      <li><a href="#format_composite_type">Composite type descriptors</a></li>
      <li><a href="#format_subrange">Subrange descriptors</a></li>
      <li><a href="#format_enumeration">Enumerator descriptors</a></li>
      <li><a href="#format_variables">Local variables</a></li>
    </ul></li>
    <li><a href="#format_common_intrinsics">Debugger intrinsic functions</a>
    <ul>
      <li><a href="#format_common_declare">llvm.dbg.declare</a></li>
      <li><a href="#format_common_value">llvm.dbg.value</a></li>
    </ul></li>
  </ol></li>
  <li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li>
  <li><a href="#ccxx_frontend">C/C++
front-end specific debug information</a>
  <ol>
    <li><a href="#ccxx_compile_units">C/C++ source file information</a></li>
    <li><a href="#ccxx_global_variable">C/C++ global variable information</a></li>
    <li><a href="#ccxx_subprogram">C/C++ function information</a></li>
    <li><a href="#ccxx_basic_types">C/C++ basic types</a></li>
    <li><a href="#ccxx_derived_types">C/C++ derived types</a></li>
    <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li>
    <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li>
  </ol></li>
  <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a>
  <ol>
    <li><a href="#objcproperty">Debugging Information Extension
    for Objective C Properties</a>
    <ul>
      <li><a href="#objcpropertyintroduction">Introduction</a></li>
      <li><a href="#objcpropertyproposal">Proposal</a></li>
      <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li>
      <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li>
    </ul>
    </li>
    <li><a href="#acceltable">Name Accelerator Tables</a>
    <ul>
      <li><a href="#acceltableintroduction">Introduction</a></li>
      <li><a href="#acceltablehashes">Hash Tables</a></li>
      <li><a href="#acceltabledetails">Details</a></li>
      <li><a href="#acceltablecontents">Contents</a></li>
      <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li>
    </ul>
    </li>
  </ol>
  </li>
</ul>
</td>
<td class="right">
<img src="img/venusflytrap.jpg" alt="A leafy and green bug eater" width="247"
height="369">
</td>
</tr></table>

<div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
            and <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
</div>


<!-- *********************************************************************** -->
<h2><a name="introduction">Introduction</a></h2>
<!--
*********************************************************************** -->

<div>

<p>This document is the central repository for all information pertaining to
   debug information in LLVM.  It describes the <a href="#format">actual format
   that the LLVM debug information</a> takes, which is useful for those
   interested in creating front-ends or dealing directly with the information.
   Further, this document provides specific examples of what debug information
   for C/C++ looks like.</p>

<!-- ======================================================================= -->
<h3>
  <a name="phil">Philosophy behind LLVM debugging information</a>
</h3>

<div>

<p>The idea of the LLVM debugging information is to capture how the important
   pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
   Several design aspects have shaped the solution that appears here.  The
   important ones are:</p>

<ul>
  <li>Debugging information should have very little impact on the rest of the
      compiler.  No transformations, analyses, or code generators should need to
      be modified because of debugging information.</li>

  <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
      easily described ways</a> with the debugging information.</li>

  <li>Because LLVM is designed to support arbitrary programming languages,
      LLVM-to-LLVM tools should not need to know anything about the semantics of
      the source-level language.</li>

  <li>Source-level languages are often <b>widely</b> different from one another.
      LLVM should not put any restrictions on the flavor of the source-language,
      and the debugging information should work with any language.</li>

  <li>With code generator support, it should be possible to use an LLVM compiler
      to compile a program to native machine code and standard debugging
      formats.
This allows compatibility with traditional machine-code level
      debuggers, like GDB or DBX.</li>
</ul>

<p>The approach used by the LLVM implementation is to use a small set
   of <a href="#format_common_intrinsics">intrinsic functions</a> to define a
   mapping between LLVM program objects and the source-level objects.  The
   description of the source-level program is maintained in LLVM metadata
   in an <a href="#ccxx_frontend">implementation-defined format</a>
   (the C/C++ front-end currently uses working draft 7 of
   the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3
   standard</a>).</p>

<p>When a program is being debugged, a debugger interacts with the user and
   turns the stored debug information into source-language specific information.
   As such, a debugger must be aware of the source-language, and is thus tied to
   a specific language or family of languages.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="consumers">Debug information consumers</a>
</h3>

<div>

<p>The role of debug information is to provide meta information normally
   stripped away during the compilation process.  This meta information provides
   an LLVM user with a relationship between generated code and the original
   program source code.</p>

<p>Currently, debug information is consumed by DwarfDebug to produce DWARF
   information used by the GDB debugger.
Other targets could use the same
   information to produce stabs or other debug forms.</p>

<p>It would also be reasonable to use debug information to feed profiling tools
   for analysis of generated code, or tools for reconstructing the original
   source from generated code.</p>

<p>TODO - expound a bit more.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="debugopt">Debugging optimized code</a>
</h3>

<div>

<p>An extremely high priority of LLVM debugging information is to make it
   interact well with optimizations and analysis.  In particular, the LLVM debug
   information provides the following guarantees:</p>

<ul>
  <li>LLVM debug information <b>always provides information to accurately read
      the source-level state of the program</b>, regardless of which LLVM
      optimizations have been run, and without any modification to the
      optimizations themselves.  However, some optimizations may impact the
      ability to modify the current state of the program with a debugger, such
      as setting program variables, or calling functions that have been
      deleted.</li>

  <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
      debugging information, allowing them to update the debugging information
      as they perform aggressive optimizations.  This means that, with effort,
      the LLVM optimizers could optimize debug code just as well as non-debug
      code.</li>

  <li>LLVM debug information does not prevent optimizations from
      happening (for example inlining, basic block reordering/merging/cleanup,
      tail duplication, etc).</li>

  <li>LLVM debug information is automatically optimized along with the rest of
      the program, using existing facilities.
For example, duplicate
      information is automatically merged by the linker, and unused information
      is automatically removed.</li>
</ul>

<p>Basically, the debug information allows you to compile a program with
   "<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily
   modify the program as it executes from a debugger.  Compiling a program with
   "<tt>-O3 -g</tt>" gives you full debug information that is always available
   and accurate for reading (e.g., you get accurate stack traces despite tail
   call elimination and inlining), but you might lose the ability to modify the
   program and call functions which were optimized out of the program, or
   inlined away completely.</p>

<p>The <a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a
   framework to test the optimizer's handling of debugging information.  It can
   be run like this:</p>

<div class="doc_code">
<pre>
% cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
% make TEST=dbgopt
</pre>
</div>

<p>This will test the impact of debugging information on optimization passes.
   If debugging information influences optimization passes then it will be
   reported as a failure.  See <a href="TestingGuide.html">TestingGuide</a> for
   more information on LLVM test infrastructure and how to run various
   tests.</p>

</div>

</div>

<!-- *********************************************************************** -->
<h2>
  <a name="format">Debugging information format</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>LLVM debugging information has been carefully designed to make it possible
   for the optimizer to optimize the program and debugging information without
   necessarily having to know anything about debugging information.
In
   particular, the use of metadata avoids duplicated debugging information from
   the beginning, and the global dead code elimination pass automatically
   deletes debugging information for a function if it decides to delete the
   function. </p>

<p>To do this, most of the debugging information (descriptors for types,
   variables, functions, source files, etc) is inserted by the language
   front-end in the form of LLVM metadata. </p>

<p>Debug information is designed to be agnostic about the target debugger and
   debugging information representation (e.g. DWARF/Stabs/etc).  It uses a
   generic pass to decode the information that represents variables, types,
   functions, namespaces, etc: this allows for arbitrary source-language
   semantics and type-systems to be used, as long as there is a module
   written for the target debugger to interpret the information. </p>

<p>To provide basic functionality, the LLVM debugger does have to make some
   assumptions about the source-level language being debugged, though it keeps
   these to a minimum.  The only common features that the LLVM debugger assumes
   exist are <a href="#format_files">source files</a>,
   and <a href="#format_global_variables">program objects</a>.  These abstract
   objects are used by a debugger to form stack traces, show information about
   local variables, etc.</p>

<p>This section of the documentation first describes the representation aspects
   common to any source-language.  The <a href="#ccxx_frontend">next section</a>
   describes the data layout conventions used by the C and C++ front-ends.</p>

<!-- ======================================================================= -->
<h3>
  <a name="debug_info_descriptors">Debug information descriptors</a>
</h3>

<div>

<p>In consideration of the complexity and volume of debug information, LLVM
   provides a specification for well-formed debug descriptors.
</p>

<p>Consumers of LLVM debug information expect the descriptors for program
   objects to start in a canonical format, but the descriptors can include
   additional information appended at the end that is source-language
   specific.  All LLVM debugging information is versioned, allowing backwards
   compatibility in the case that the core structures need to change in some
   way.  Also, all debugging information objects start with a tag to indicate
   what type of object it is.  The source-language is allowed to define its own
   objects, by using unreserved tag numbers.  We recommend using tags in
   the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base =
   0x1000.)</p>

<p>The fields of debug descriptors used internally by LLVM
   are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>,
   <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p>

<div class="doc_code">
<pre>
!1 = metadata !{
  i32,   ;; A tag
  ...
}
</pre>
</div>

<p><a name="LLVMDebugVersion">The first field of a descriptor is always an
   <tt>i32</tt> containing a tag value identifying the content of the
   descriptor.  The remaining fields are specific to the descriptor.  The values
   of tags are loosely bound to the tag values of DWARF information entries.
   However, that does not restrict the use of the information supplied to DWARF
   targets.
To facilitate versioning of debug information, the tag is augmented
   with the current debug version (LLVMDebugVersion = 8 &lt;&lt; 16 or
   0x80000 or 524288.)</a></p>

<p>The details of the various descriptors follow.</p>

<!-- ======================================================================= -->
<h4>
  <a name="format_compile_units">Compile unit descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!0 = metadata !{
  i32,       ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
             ;; (DW_TAG_compile_unit)
  i32,       ;; Unused field.
  i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
  metadata,  ;; Source file name
  metadata,  ;; Source file directory (includes trailing slash)
  metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
  i1,        ;; True if this is a main compile unit.
  i1,        ;; True if this is optimized.
  metadata,  ;; Flags
  i32,       ;; Runtime version
  metadata,  ;; List of enum types
  metadata,  ;; List of retained types
  metadata,  ;; List of subprograms
  metadata   ;; List of global variables
}
</pre>
</div>

<p>These descriptors contain a source language ID for the file (we use the DWARF
   3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>,
   <tt>DW_LANG_Cobol74</tt>, etc), and three strings giving the file name, the
   working directory of the compiler, and an identifier string for the compiler
   that produced it.</p>

<p>Compile unit descriptors provide the root context for objects declared in a
   specific compilation unit.  File descriptors are defined using this context.
   These descriptors are collected by a named metadata
   <tt>!llvm.dbg.cu</tt>.  A compile unit descriptor keeps track of subprograms,
   global variables and type information.</p>
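<p>As a rough illustration of the tag arithmetic described for the first
   descriptor field, the following Python sketch combines DWARF tag numbers
   with <tt>LLVMDebugVersion</tt> and splits them apart again.  The helper names
   <tt>encode_tag</tt> and <tt>decode_tag</tt> are hypothetical and are not part
   of LLVM; splitting the field at bit 16 is an assumption based on the stated
   value of LLVMDebugVersion, not a description of the actual implementation.</p>

<div class="doc_code">
<pre>
# Hypothetical helpers for illustration only; not an LLVM API.
LLVM_DEBUG_VERSION = 8 * 2**16   # 0x80000 = 524288, as stated in this document

# DWARF tag numbers quoted in the descriptor listings of this document.
DW_TAG_compile_unit = 17
DW_TAG_file_type = 41


def encode_tag(dwarf_tag):
    """Return the i32 value stored in the first field of a descriptor."""
    return dwarf_tag + LLVM_DEBUG_VERSION


def decode_tag(field):
    """Split a first field into (debug version, DWARF tag).

    Assumes the version occupies the bits above bit 15.
    """
    version_bits, dwarf_tag = divmod(field, 2**16)
    return version_bits * 2**16, dwarf_tag


print(encode_tag(DW_TAG_compile_unit))   # 524305
print(decode_tag(524329))                # (524288, 41)
</pre>
</div>

<p>Under these assumptions a compile unit descriptor starts with
   <tt>i32 524305</tt> and a file descriptor with <tt>i32 524329</tt>.</p>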
</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_files">File descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!0 = metadata !{
  i32,       ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
             ;; (DW_TAG_file_type)
  metadata,  ;; Source file name
  metadata,  ;; Source file directory (includes trailing slash)
  metadata   ;; Unused
}
</pre>
</div>

<p>These descriptors contain information for a file.  Global variables and top
   level functions would be defined using this context.  File descriptors also
   provide context for source line correspondence. </p>

<p>Each input file is encoded as a separate file descriptor in LLVM debugging
   information output. </p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_global_variables">Global variable descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!1 = metadata !{
  i32,      ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
            ;; (DW_TAG_variable)
  i32,      ;; Unused field.
  metadata, ;; Reference to context descriptor
  metadata, ;; Name
  metadata, ;; Display name (fully qualified C++ name)
  metadata, ;; MIPS linkage name (for C++)
  metadata, ;; Reference to file where defined
  i32,      ;; Line number where defined
  metadata, ;; Reference to type descriptor
  i1,       ;; True if the global is local to compile unit (static)
  i1,       ;; True if the global is defined in the compile unit (not extern)
  {}*       ;; Reference to the global variable
}
</pre>
</div>

<p>These descriptors provide debug information about global variables.  They
   provide details such as name, type and where the variable is defined.
All
   global variables are collected inside the named metadata
   <tt>!llvm.dbg.cu</tt>.</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_subprograms">Subprogram descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!2 = metadata !{
  i32,      ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
            ;; (DW_TAG_subprogram)
  i32,      ;; Unused field.
  metadata, ;; Reference to context descriptor
  metadata, ;; Name
  metadata, ;; Display name (fully qualified C++ name)
  metadata, ;; MIPS linkage name (for C++)
  metadata, ;; Reference to file where defined
  i32,      ;; Line number where defined
  metadata, ;; Reference to type descriptor
  i1,       ;; True if the global is local to compile unit (static)
  i1,       ;; True if the global is defined in the compile unit (not extern)
  i32,      ;; Line number where the scope of the subprogram begins
  i32,      ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
  i32,      ;; Index into a virtual function
  metadata, ;; indicates which base type contains the vtable pointer for the
            ;; derived class
  i32,      ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
  i1,       ;; isOptimized
  Function *,;; Pointer to LLVM function
  metadata, ;; Lists function template parameters
  metadata, ;; Function declaration descriptor
  metadata  ;; List of function variables
}
</pre>
</div>

<p>These descriptors provide debug information about functions, methods and
   subprograms.  They provide details such as name, return types and the source
   location where the subprogram is defined.
</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_blocks">Block descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!3 = metadata !{
  i32,     ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
  metadata,;; Reference to context descriptor
  i32,     ;; Line number
  i32,     ;; Column number
  metadata,;; Reference to source file
  i32      ;; Unique ID to identify blocks from a template function
}
</pre>
</div>

<p>This descriptor provides debug information about nested blocks within a
   subprogram.  The line number and column number are used to distinguish
   two lexical blocks at the same depth. </p>

<div class="doc_code">
<pre>
!3 = metadata !{
  i32,     ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
  metadata,;; Reference to the scope we're annotating with a file change
  metadata ;; Reference to the file the scope is enclosed in.
}
</pre>
</div>

<p>This descriptor provides a wrapper around a lexical scope to handle file
   changes in the middle of a lexical block.</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_basic_type">Basic type descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!4 = metadata !{
  i32,      ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
            ;; (DW_TAG_base_type)
  metadata, ;; Reference to context
  metadata, ;; Name (may be "" for anonymous types)
  metadata, ;; Reference to file where defined (may be NULL)
  i32,      ;; Line number where defined (may be 0)
  i64,      ;; Size in bits
  i64,      ;; Alignment in bits
  i64,      ;; Offset in bits
  i32,      ;; Flags
  i32       ;; DWARF type encoding
}
</pre>
</div>

<p>These descriptors define primitive types used in the code, for example int,
   bool and float.  The context provides the scope of the type, which is usually
   the top level.  Since basic types are not usually user defined, the context
   and line number can be left as NULL and 0.  The size, alignment and offset
   are expressed in bits and can be 64 bit values.  The alignment is used to
   round the offset when embedded in a
   <a href="#format_composite_type">composite type</a> (for example, to keep
   doubles on 64 bit boundaries.)  The offset is the bit offset if embedded in
   a <a href="#format_composite_type">composite type</a>.</p>

<p>The type encoding provides the details of the type.
The values are typically
   one of the following:</p>

<div class="doc_code">
<pre>
DW_ATE_address       = 1
DW_ATE_boolean       = 2
DW_ATE_float         = 4
DW_ATE_signed        = 5
DW_ATE_signed_char   = 6
DW_ATE_unsigned      = 7
DW_ATE_unsigned_char = 8
</pre>
</div>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_derived_type">Derived type descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!5 = metadata !{
  i32,      ;; Tag (see below)
  metadata, ;; Reference to context
  metadata, ;; Name (may be "" for anonymous types)
  metadata, ;; Reference to file where defined (may be NULL)
  i32,      ;; Line number where defined (may be 0)
  i64,      ;; Size in bits
  i64,      ;; Alignment in bits
  i64,      ;; Offset in bits
  i32,      ;; Flags to encode attributes, e.g. private
  metadata, ;; Reference to type derived from
  metadata, ;; (optional) Name of the Objective C property associated with
            ;; an Objective-C ivar
  metadata, ;; (optional) Name of the Objective C property getter selector.
  metadata, ;; (optional) Name of the Objective C property setter selector.
  i32       ;; (optional) Objective C property attributes.
}
</pre>
</div>

<p>These descriptors are used to define types derived from other types.  The
   value of the tag varies depending on the meaning.  The following are possible
   tag values:</p>

<div class="doc_code">
<pre>
DW_TAG_formal_parameter = 5
DW_TAG_member           = 13
DW_TAG_pointer_type     = 15
DW_TAG_reference_type   = 16
DW_TAG_typedef          = 22
DW_TAG_const_type       = 38
DW_TAG_volatile_type    = 53
DW_TAG_restrict_type    = 55
</pre>
</div>

<p><tt>DW_TAG_member</tt> is used to define a member of
   a <a href="#format_composite_type">composite type</a>
   or <a href="#format_subprograms">subprogram</a>.
The type of the member is
   the <a href="#format_derived_type">derived
   type</a>. <tt>DW_TAG_formal_parameter</tt> is used to define a member which
   is a formal argument of a subprogram.</p>

<p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p>

<p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>,
   <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and
   <tt>DW_TAG_restrict_type</tt> are used to qualify
   the <a href="#format_derived_type">derived type</a>. </p>

<p>The location of a <a href="#format_derived_type">derived type</a> can be
   determined from the context and line number.  The size, alignment and offset
   are expressed in bits and can be 64 bit values.  The alignment is used to
   round the offset when embedded in a <a href="#format_composite_type">composite
   type</a> (for example, to keep doubles on 64 bit boundaries.)  The offset is
   the bit offset if embedded in a <a href="#format_composite_type">composite
   type</a>.</p>

<p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL.
</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_composite_type">Composite type descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!6 = metadata !{
  i32,      ;; Tag (see below)
  metadata, ;; Reference to context
  metadata, ;; Name (may be "" for anonymous types)
  metadata, ;; Reference to file where defined (may be NULL)
  i32,      ;; Line number where defined (may be 0)
  i64,      ;; Size in bits
  i64,      ;; Alignment in bits
  i64,      ;; Offset in bits
  i32,      ;; Flags
  metadata, ;; Reference to type derived from
  metadata, ;; Reference to array of member descriptors
  i32       ;; Runtime languages
}
</pre>
</div>

<p>These descriptors are used to define types that are composed of 0 or more
   elements.
The value of the tag varies depending on the meaning.  The following
   are possible tag values:</p>

<div class="doc_code">
<pre>
DW_TAG_array_type       = 1
DW_TAG_enumeration_type = 4
DW_TAG_structure_type   = 19
DW_TAG_union_type       = 23
DW_TAG_vector_type      = 259
DW_TAG_subroutine_type  = 21
DW_TAG_inheritance      = 28
</pre>
</div>

<p>The vector flag indicates that an array type is a native packed vector.</p>

<p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types
   (tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange
   descriptors</a>, each representing the range of subscripts at that level of
   indexing.</p>

<p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are
   <a href="#format_enumeration">enumerator descriptors</a>, each representing
   the definition of an enumeration value for the set.  All enumeration type
   descriptors are collected inside the named metadata
   <tt>!llvm.dbg.cu</tt>.</p>

<p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag
   = <tt>DW_TAG_union_type</tt>) types are any one of
   the <a href="#format_basic_type">basic</a>,
   <a href="#format_derived_type">derived</a>
   or <a href="#format_composite_type">composite</a> type descriptors, each
   representing a field member of the structure or union.</p>

<p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors
   provide information about base classes, static members and member
   functions.  If a member is a <a href="#format_derived_type">derived type
   descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type
   represents a base class.  If the member is
   a <a href="#format_global_variables">global variable descriptor</a> then it
   represents a static member.
And, if the member is
   a <a href="#format_subprograms">subprogram descriptor</a> then it represents
   a member function.  For static members and member
   functions, <tt>getName()</tt> returns the member's linkage name or the C++
   mangled name.  <tt>getDisplayName()</tt> returns the simplified version of
   the name.</p>

<p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type
   elements is the return type for the subroutine.  The remaining elements are
   the formal arguments to the subroutine.</p>

<p>The location of a <a href="#format_composite_type">composite type</a> can be
   determined from the context and line number.  The size, alignment and
   offset are expressed in bits and can be 64 bit values.  The alignment is used
   to round the offset when embedded in
   a <a href="#format_composite_type">composite type</a> (as an example, to keep
   doubles on 64 bit boundaries.)  The offset is the bit offset if embedded
   in a <a href="#format_composite_type">composite type</a>.</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_subrange">Subrange descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!42 = metadata !{
  i32,    ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type)
  i64,    ;; Low value
  i64     ;; High value
}
</pre>
</div>

<p>These descriptors are used to define ranges of array subscripts for an array
   <a href="#format_composite_type">composite type</a>.  The low value defines
   the lower bound, typically zero for C/C++.  The high value is the upper
   bound.  Values are 64 bit.  High - low + 1 is the size of the array.  If low
   &gt; high the array bounds are not included in generated debugging information.
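</p>

<p>The size rule for subranges can be sketched in a few lines of Python.  The
   function name <tt>subrange_size</tt> is hypothetical, and returning
   <tt>None</tt> when low is greater than high is only one possible way of
   modelling "bounds not emitted"; neither is taken from LLVM itself.</p>

<div class="doc_code">
<pre>
def subrange_size(low, high):
    """Number of elements described by a subrange descriptor.

    Returns None when low is greater than high, the case in which the
    array bounds are left out of the generated debugging information.
    """
    if low > high:
        return None
    return high - low + 1


print(subrange_size(0, 9))   # 10
print(subrange_size(1, 0))   # None
</pre>
</div>

<p>For a C array declared as <tt>int a[10]</tt>, a front-end that uses a lower
   bound of zero would presumably emit low = 0 and high = 9, which yields a
   size of 10.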
</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_enumeration">Enumerator descriptors</a>
</h4>

<div>

<div class="doc_code">
<pre>
!6 = metadata !{
  i32,      ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
            ;; (DW_TAG_enumerator)
  metadata, ;; Name
  i64       ;; Value
}
</pre>
</div>

<p>These descriptors are used to define members of an
   enumeration <a href="#format_composite_type">composite type</a>; each
   associates a name with a value.</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_variables">Local variables</a>
</h4>

<div>

<div class="doc_code">
<pre>
!7 = metadata !{
  i32,      ;; Tag (see below)
  metadata, ;; Context
  metadata, ;; Name
  metadata, ;; Reference to file where defined
  i32,      ;; 24 bit - Line number where defined
            ;; 8 bit - Argument number. 1 indicates 1st argument.
  metadata, ;; Type descriptor
  i32,      ;; flags
  metadata  ;; (optional) Reference to inline location
}
</pre>
</div>

<p>These descriptors are used to define variables local to a subprogram.  The
   value of the tag depends on the usage of the variable:</p>

<div class="doc_code">
<pre>
DW_TAG_auto_variable   = 256
DW_TAG_arg_variable    = 257
DW_TAG_return_variable = 258
</pre>
</div>

<p>An auto variable is any variable declared in the body of the function.  An
   argument variable is any variable that appears as a formal argument to the
   function.  A return variable is used to track the result of a function and
   has no source correspondent.</p>

<p>The context is either the subprogram or block where the variable is defined.
   Name is the source variable name.  Context and line indicate where the
   variable was defined.
Type descriptor defines the declared type of the
   variable.</p>

</div>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="format_common_intrinsics">Debugger intrinsic functions</a>
</h3>

<div>

<p>LLVM uses several intrinsic functions (names prefixed with "llvm.dbg") to
   provide debug information at various points in generated code.</p>

<!-- ======================================================================= -->
<h4>
  <a name="format_common_declare">llvm.dbg.declare</a>
</h4>

<div>
<pre>
  void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata)
</pre>

<p>This intrinsic provides information about a local element (e.g., variable). The
   first argument is metadata holding the alloca for the variable. The
   second argument is metadata containing a description of the variable.</p>
</div>

<!-- ======================================================================= -->
<h4>
  <a name="format_common_value">llvm.dbg.value</a>
</h4>

<div>
<pre>
  void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata)
</pre>

<p>This intrinsic provides information when a user source variable is set to a
   new value. The first argument is the new value (wrapped as metadata). The
   second argument is the offset in the user source variable where the new value
   is written. The third argument is metadata containing a description of the
   user source variable.</p>
</div>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="format_common_lifetime">Object lifetimes and scoping</a>
</h3>

<div>
<p>In many languages, the local variables in functions can have their lifetimes
   or scopes limited to a subset of a function.
In the C family of languages, 879 for example, variables are only live (readable and writable) within the 880 source block that they are defined in. In functional languages, values are 881 only readable after they have been defined. Though this is a very obvious 882 concept, it is non-trivial to model in LLVM, because it has no notion of 883 scoping in this sense, and does not want to be tied to a language's scoping 884 rules.</p> 885 886 <p>In order to handle this, the LLVM debug format uses the metadata attached to 887 llvm instructions to encode line number and scoping information. Consider 888 the following C fragment, for example:</p> 889 890 <div class="doc_code"> 891 <pre> 892 1. void foo() { 893 2. int X = 21; 894 3. int Y = 22; 895 4. { 896 5. int Z = 23; 897 6. Z = X; 898 7. } 899 8. X = Y; 900 9. } 901 </pre> 902 </div> 903 904 <p>Compiled to LLVM, this function would be represented like this:</p> 905 906 <div class="doc_code"> 907 <pre> 908 define void @foo() nounwind ssp { 909 entry: 910 %X = alloca i32, align 4 ; <i32*> [#uses=4] 911 %Y = alloca i32, align 4 ; <i32*> [#uses=4] 912 %Z = alloca i32, align 4 ; <i32*> [#uses=3] 913 %0 = bitcast i32* %X to {}* ; <{}*> [#uses=1] 914 call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7 915 store i32 21, i32* %X, !dbg !8 916 %1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1] 917 call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10 918 store i32 22, i32* %Y, !dbg !11 919 %2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1] 920 call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14 921 store i32 23, i32* %Z, !dbg !15 922 %tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1] 923 %tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1] 924 %add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1] 925 store i32 %add, i32* %Z, !dbg !16 926 %tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1] 927 store i32 %tmp2, i32* %X, !dbg !17 928 ret void, !dbg !18 929 } 930 931 declare void 
@llvm.dbg.declare(metadata, metadata) nounwind readnone 932 933 !0 = metadata !{i32 459008, metadata !1, metadata !"X", 934 metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ] 935 !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] 936 !2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", 937 metadata !"foo", metadata !3, i32 1, metadata !4, 938 i1 false, i1 true}; [DW_TAG_subprogram ] 939 !3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", 940 metadata !"/private/tmp", metadata !"clang 1.1", i1 true, 941 i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ] 942 !4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, 943 i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ] 944 !5 = metadata !{null} 945 !6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, 946 i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ] 947 !7 = metadata !{i32 2, i32 7, metadata !1, null} 948 !8 = metadata !{i32 2, i32 3, metadata !1, null} 949 !9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, 950 metadata !6}; [ DW_TAG_auto_variable ] 951 !10 = metadata !{i32 3, i32 7, metadata !1, null} 952 !11 = metadata !{i32 3, i32 3, metadata !1, null} 953 !12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5, 954 metadata !6}; [ DW_TAG_auto_variable ] 955 !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] 956 !14 = metadata !{i32 5, i32 9, metadata !13, null} 957 !15 = metadata !{i32 5, i32 5, metadata !13, null} 958 !16 = metadata !{i32 6, i32 5, metadata !13, null} 959 !17 = metadata !{i32 8, i32 3, metadata !1, null} 960 !18 = metadata !{i32 9, i32 1, metadata !2, null} 961 </pre> 962 </div> 963 964 <p>This example illustrates a few important details about LLVM debugging 965 information. 
In particular, it shows how the <tt>llvm.dbg.declare</tt>
   intrinsic and the location information attached to an instruction
   are applied together to allow a debugger to analyze the relationship between
   statements, variable definitions, and the code used to implement the
   function.</p>

<div class="doc_code">
<pre>
call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
</pre>
</div>

<p>The first intrinsic
   <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
   encodes debugging information for the variable <tt>X</tt>. The metadata
   <tt>!dbg !7</tt> attached to the intrinsic provides scope information for the
   variable <tt>X</tt>.</p>

<div class="doc_code">
<pre>
!7 = metadata !{i32 2, i32 7, metadata !1, null}
!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
                metadata !"foo", metadata !"foo", metadata !3, i32 1,
                metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
</pre>
</div>

<p>Here <tt>!7</tt> is metadata providing location information. It has four
   fields: line number, column number, scope, and original scope. The original
   scope represents the inline location if this instruction is inlined inside a
   caller, and is null otherwise. In this example, the scope is encoded by
   <tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope
   <tt>!2</tt>, where <tt>!2</tt> is a
   <a href="#format_subprograms">subprogram descriptor</a>.
This way the
   location information attached to the intrinsic indicates that the
   variable <tt>X</tt> is declared at line 2, at function-level scope in
   function <tt>foo</tt>.</p>

<p>Now let's take another example.</p>

<div class="doc_code">
<pre>
call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
</pre>
</div>

<p>This
   <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt> intrinsic
   encodes debugging information for the variable <tt>Z</tt>. The metadata
   <tt>!dbg !14</tt> attached to the intrinsic provides scope information for
   the variable <tt>Z</tt>.</p>

<div class="doc_code">
<pre>
!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
!14 = metadata !{i32 5, i32 9, metadata !13, null}
</pre>
</div>

<p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and
   column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope
   itself resides inside of lexical scope <tt>!1</tt> described above.</p>

<p>The scope information attached to each instruction provides a
   straightforward way to find the instructions covered by a scope.</p>

</div>

</div>

<!-- *********************************************************************** -->
<h2>
  <a name="ccxx_frontend">C/C++ front-end specific debug information</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>The C and C++ front-ends represent information about the program in a format
   that is effectively identical
   to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in
   terms of information content.
This allows code generators to trivially
   support native debuggers by generating standard DWARF information, and
   contains enough information for non-DWARF targets to translate it as
   needed.</p>

<p>This section describes the forms used to represent C and C++ programs. Other
   languages could pattern themselves after this (which itself is tuned to
   representing programs in the same way that DWARF 3 does), or they could
   choose to provide completely different forms if they don't fit into the DWARF
   model. As support for debugging information gets added to the various LLVM
   source-language front-ends, the information used should be documented
   here.</p>

<p>The following sections provide examples of various C/C++ constructs and the
   debug information that would best describe those constructs.</p>

<!-- ======================================================================= -->
<h3>
  <a name="ccxx_compile_units">C/C++ source file information</a>
</h3>

<div>

<p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located
   in the directory <tt>/Users/mine/sources</tt>, the following code:</p>

<div class="doc_code">
<pre>
#include "MyHeader.h"

int main(int argc, char *argv[]) {
  return 0;
}
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
...
;;
;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
;;
!2 = metadata !{
  i32 524305,    ;; Tag
  i32 0,         ;; Unused
  i32 4,         ;; Language Id
  metadata !"MySource.cpp",
  metadata !"/Users/mine/sources",
  metadata !"4.2.1 (Based on Apple Inc.
build 5649) (LLVM build 00)",
  i1 true,       ;; Main Compile Unit
  i1 false,      ;; Optimized compile unit
  metadata !"",  ;; Compiler flags
  i32 0}         ;; Runtime version

;;
;; Define the file descriptor for "/Users/mine/sources/MySource.cpp".
;;
!1 = metadata !{
  i32 524329,    ;; Tag
  metadata !"MySource.cpp",
  metadata !"/Users/mine/sources",
  metadata !2    ;; Compile unit
}

;;
;; Define the file descriptor for "/Users/mine/sources/MyHeader.h".
;;
!3 = metadata !{
  i32 524329,    ;; Tag
  metadata !"MyHeader.h",
  metadata !"/Users/mine/sources",
  metadata !2    ;; Compile unit
}

...
</pre>
</div>

<p><tt>llvm::Instruction</tt> provides easy access to the metadata attached to an
   instruction. One can extract line number information encoded in LLVM IR
   using <tt>Instruction::getMetadata()</tt> and
   <tt>DILocation::getLineNumber()</tt>.</p>
<div class="doc_code">
<pre>
if (MDNode *N = I-&gt;getMetadata("dbg")) {  // Here I is an LLVM instruction
  DILocation Loc(N);                      // DILocation is in DebugInfo.h
  unsigned Line = Loc.getLineNumber();
  StringRef File = Loc.getFilename();
  StringRef Dir = Loc.getDirectory();
}
</pre>
</div>
</div>

<!-- ======================================================================= -->
<h3>
  <a name="ccxx_global_variable">C/C++ global variable information</a>
</h3>

<div>

<p>Given an integer global variable declared as follows:</p>

<div class="doc_code">
<pre>
int MyGlobal = 100;
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define the global itself.
;;
@MyGlobal = global i32 100
...
;;
;; List of debug info of globals
;;
!llvm.dbg.cu = !{!0}

;; Define the compile unit.
1171 !0 = metadata !{ 1172 i32 786449, ;; Tag 1173 i32 0, ;; Context 1174 i32 4, ;; Language 1175 metadata !"foo.cpp", ;; File 1176 metadata !"/Volumes/Data/tmp", ;; Directory 1177 metadata !"clang version 3.1 ", ;; Producer 1178 i1 true, ;; Deprecated field 1179 i1 false, ;; "isOptimized"? 1180 metadata !"", ;; Flags 1181 i32 0, ;; Runtime Version 1182 metadata !1, ;; Enum Types 1183 metadata !1, ;; Retained Types 1184 metadata !1, ;; Subprograms 1185 metadata !3 ;; Global Variables 1186 } ; [ DW_TAG_compile_unit ] 1187 1188 ;; The Array of Global Variables 1189 !3 = metadata !{ 1190 metadata !4 1191 } 1192 1193 !4 = metadata !{ 1194 metadata !5 1195 } 1196 1197 ;; 1198 ;; Define the global variable itself. 1199 ;; 1200 !5 = metadata !{ 1201 i32 786484, ;; Tag 1202 i32 0, ;; Unused 1203 null, ;; Unused 1204 metadata !"MyGlobal", ;; Name 1205 metadata !"MyGlobal", ;; Display Name 1206 metadata !"", ;; Linkage Name 1207 metadata !6, ;; File 1208 i32 1, ;; Line 1209 metadata !7, ;; Type 1210 i32 0, ;; IsLocalToUnit 1211 i32 1, ;; IsDefinition 1212 i32* @MyGlobal ;; LLVM-IR Value 1213 } ; [ DW_TAG_variable ] 1214 1215 ;; 1216 ;; Define the file 1217 ;; 1218 !6 = metadata !{ 1219 i32 786473, ;; Tag 1220 metadata !"foo.cpp", ;; File 1221 metadata !"/Volumes/Data/tmp", ;; Directory 1222 null ;; Unused 1223 } ; [ DW_TAG_file_type ] 1224 1225 ;; 1226 ;; Define the type 1227 ;; 1228 !7 = metadata !{ 1229 i32 786468, ;; Tag 1230 null, ;; Unused 1231 metadata !"int", ;; Name 1232 null, ;; Unused 1233 i32 0, ;; Line 1234 i64 32, ;; Size in Bits 1235 i64 32, ;; Align in Bits 1236 i64 0, ;; Offset 1237 i32 0, ;; Flags 1238 i32 5 ;; Encoding 1239 } ; [ DW_TAG_base_type ] 1240 1241 </pre> 1242 </div> 1243 1244 </div> 1245 1246 <!-- ======================================================================= --> 1247 <h3> 1248 <a name="ccxx_subprogram">C/C++ function information</a> 1249 </h3> 1250 1251 <div> 1252 1253 <p>Given a function declared as follows:</p> 1254 1255 <div 
class="doc_code">
<pre>
int main(int argc, char *argv[]) {
  return 0;
}
</pre>
</div>

<p>a C/C++ front-end would generate the following descriptors:</p>

<div class="doc_code">
<pre>
;;
;; Define the descriptor for the subprogram. The tag 524334 is
;; LLVMDebugVersion (524288) + 46, where 46 = DW_TAG_subprogram.
;;
!6 = metadata !{
  i32 524334,        ;; Tag
  i32 0,             ;; Unused
  metadata !1,       ;; Context
  metadata !"main",  ;; Name
  metadata !"main",  ;; Display name
  metadata !"main",  ;; Linkage name
  metadata !1,       ;; File
  i32 1,             ;; Line number
  metadata !4,       ;; Type
  i1 false,          ;; Is local
  i1 true,           ;; Is definition
  i32 0,             ;; Virtuality attribute, e.g. pure virtual function
  i32 0,             ;; Index into virtual table for C++ methods
  i32 0,             ;; Type that holds virtual table.
  i32 0,             ;; Flags
  i1 false,          ;; True if this function is optimized
  i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
  null               ;; Function template parameters
}
;;
;; Define the subprogram itself.
;;
define i32 @main(i32 %argc, i8** %argv) {
...
1297 } 1298 </pre> 1299 </div> 1300 1301 </div> 1302 1303 <!-- ======================================================================= --> 1304 <h3> 1305 <a name="ccxx_basic_types">C/C++ basic types</a> 1306 </h3> 1307 1308 <div> 1309 1310 <p>The following are the basic type descriptors for C/C++ core types:</p> 1311 1312 <!-- ======================================================================= --> 1313 <h4> 1314 <a name="ccxx_basic_type_bool">bool</a> 1315 </h4> 1316 1317 <div> 1318 1319 <div class="doc_code"> 1320 <pre> 1321 !2 = metadata !{ 1322 i32 524324, ;; Tag 1323 metadata !1, ;; Context 1324 metadata !"bool", ;; Name 1325 metadata !1, ;; File 1326 i32 0, ;; Line number 1327 i64 8, ;; Size in Bits 1328 i64 8, ;; Align in Bits 1329 i64 0, ;; Offset in Bits 1330 i32 0, ;; Flags 1331 i32 2 ;; Encoding 1332 } 1333 </pre> 1334 </div> 1335 1336 </div> 1337 1338 <!-- ======================================================================= --> 1339 <h4> 1340 <a name="ccxx_basic_char">char</a> 1341 </h4> 1342 1343 <div> 1344 1345 <div class="doc_code"> 1346 <pre> 1347 !2 = metadata !{ 1348 i32 524324, ;; Tag 1349 metadata !1, ;; Context 1350 metadata !"char", ;; Name 1351 metadata !1, ;; File 1352 i32 0, ;; Line number 1353 i64 8, ;; Size in Bits 1354 i64 8, ;; Align in Bits 1355 i64 0, ;; Offset in Bits 1356 i32 0, ;; Flags 1357 i32 6 ;; Encoding 1358 } 1359 </pre> 1360 </div> 1361 1362 </div> 1363 1364 <!-- ======================================================================= --> 1365 <h4> 1366 <a name="ccxx_basic_unsigned_char">unsigned char</a> 1367 </h4> 1368 1369 <div> 1370 1371 <div class="doc_code"> 1372 <pre> 1373 !2 = metadata !{ 1374 i32 524324, ;; Tag 1375 metadata !1, ;; Context 1376 metadata !"unsigned char", 1377 metadata !1, ;; File 1378 i32 0, ;; Line number 1379 i64 8, ;; Size in Bits 1380 i64 8, ;; Align in Bits 1381 i64 0, ;; Offset in Bits 1382 i32 0, ;; Flags 1383 i32 8 ;; Encoding 1384 } 1385 </pre> 1386 </div> 1387 1388 </div> 1389 1390 
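<p>The tag and encoding fields of the basic type descriptors in this section can
   be read as sketched below. This is a minimal, self-contained C++ sketch and
   not LLVM's own API: the helper names (<tt>debugVersion</tt>,
   <tt>dwarfTag</tt>, <tt>encodingName</tt>, <tt>checkExamples</tt>) are
   hypothetical, and the split of the tag field into an upper 16-bit version and
   a lower 16-bit DWARF tag is an assumption inferred from the
   "Tag = N + LLVMDebugVersion" comments in the descriptor listings. The
   encoding names are the standard DWARF <tt>DW_ATE_*</tt> constants.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: the upper 16 bits of a descriptor's tag field hold the
// debug-info version and the lower 16 bits hold the DWARF tag.
constexpr std::uint32_t kVersionMask = 0xffff0000u;

// Version part of the tag field, e.g. 524288 for the descriptors above.
constexpr std::uint32_t debugVersion(std::uint32_t tagField) {
  return tagField & kVersionMask;
}

// DWARF tag part of the tag field, e.g. 36 (DW_TAG_base_type).
constexpr std::uint32_t dwarfTag(std::uint32_t tagField) {
  return tagField & ~kVersionMask;
}

// Names of the DWARF base type encodings used by the descriptors
// in this section (values from the DWARF standard).
std::string encodingName(std::uint32_t encoding) {
  switch (encoding) {
  case 2: return "DW_ATE_boolean";
  case 4: return "DW_ATE_float";
  case 5: return "DW_ATE_signed";
  case 6: return "DW_ATE_signed_char";
  case 7: return "DW_ATE_unsigned";
  case 8: return "DW_ATE_unsigned_char";
  default: return "unknown";
  }
}

// Hypothetical self-check against values that appear in this document.
void checkExamples() {
  assert(dwarfTag(524324) == 36);        // basic type descriptors
  assert(debugVersion(524324) == 524288);
  assert(dwarfTag(524334) == 46);        // DW_TAG_subprogram
  assert(encodingName(2) == "DW_ATE_boolean");
}
```

<p>Under these assumptions, every basic type descriptor in this section carries
   the same tag value and differs only in its name, size, alignment and
   encoding.</p>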
<!-- ======================================================================= --> 1391 <h4> 1392 <a name="ccxx_basic_short">short</a> 1393 </h4> 1394 1395 <div> 1396 1397 <div class="doc_code"> 1398 <pre> 1399 !2 = metadata !{ 1400 i32 524324, ;; Tag 1401 metadata !1, ;; Context 1402 metadata !"short int", 1403 metadata !1, ;; File 1404 i32 0, ;; Line number 1405 i64 16, ;; Size in Bits 1406 i64 16, ;; Align in Bits 1407 i64 0, ;; Offset in Bits 1408 i32 0, ;; Flags 1409 i32 5 ;; Encoding 1410 } 1411 </pre> 1412 </div> 1413 1414 </div> 1415 1416 <!-- ======================================================================= --> 1417 <h4> 1418 <a name="ccxx_basic_unsigned_short">unsigned short</a> 1419 </h4> 1420 1421 <div> 1422 1423 <div class="doc_code"> 1424 <pre> 1425 !2 = metadata !{ 1426 i32 524324, ;; Tag 1427 metadata !1, ;; Context 1428 metadata !"short unsigned int", 1429 metadata !1, ;; File 1430 i32 0, ;; Line number 1431 i64 16, ;; Size in Bits 1432 i64 16, ;; Align in Bits 1433 i64 0, ;; Offset in Bits 1434 i32 0, ;; Flags 1435 i32 7 ;; Encoding 1436 } 1437 </pre> 1438 </div> 1439 1440 </div> 1441 1442 <!-- ======================================================================= --> 1443 <h4> 1444 <a name="ccxx_basic_int">int</a> 1445 </h4> 1446 1447 <div> 1448 1449 <div class="doc_code"> 1450 <pre> 1451 !2 = metadata !{ 1452 i32 524324, ;; Tag 1453 metadata !1, ;; Context 1454 metadata !"int", ;; Name 1455 metadata !1, ;; File 1456 i32 0, ;; Line number 1457 i64 32, ;; Size in Bits 1458 i64 32, ;; Align in Bits 1459 i64 0, ;; Offset in Bits 1460 i32 0, ;; Flags 1461 i32 5 ;; Encoding 1462 } 1463 </pre></div> 1464 1465 </div> 1466 1467 <!-- ======================================================================= --> 1468 <h4> 1469 <a name="ccxx_basic_unsigned_int">unsigned int</a> 1470 </h4> 1471 1472 <div> 1473 1474 <div class="doc_code"> 1475 <pre> 1476 !2 = metadata !{ 1477 i32 524324, ;; Tag 1478 metadata !1, ;; Context 1479 metadata !"unsigned int", 
1480 metadata !1, ;; File 1481 i32 0, ;; Line number 1482 i64 32, ;; Size in Bits 1483 i64 32, ;; Align in Bits 1484 i64 0, ;; Offset in Bits 1485 i32 0, ;; Flags 1486 i32 7 ;; Encoding 1487 } 1488 </pre> 1489 </div> 1490 1491 </div> 1492 1493 <!-- ======================================================================= --> 1494 <h4> 1495 <a name="ccxx_basic_long_long">long long</a> 1496 </h4> 1497 1498 <div> 1499 1500 <div class="doc_code"> 1501 <pre> 1502 !2 = metadata !{ 1503 i32 524324, ;; Tag 1504 metadata !1, ;; Context 1505 metadata !"long long int", 1506 metadata !1, ;; File 1507 i32 0, ;; Line number 1508 i64 64, ;; Size in Bits 1509 i64 64, ;; Align in Bits 1510 i64 0, ;; Offset in Bits 1511 i32 0, ;; Flags 1512 i32 5 ;; Encoding 1513 } 1514 </pre> 1515 </div> 1516 1517 </div> 1518 1519 <!-- ======================================================================= --> 1520 <h4> 1521 <a name="ccxx_basic_unsigned_long_long">unsigned long long</a> 1522 </h4> 1523 1524 <div> 1525 1526 <div class="doc_code"> 1527 <pre> 1528 !2 = metadata !{ 1529 i32 524324, ;; Tag 1530 metadata !1, ;; Context 1531 metadata !"long long unsigned int", 1532 metadata !1, ;; File 1533 i32 0, ;; Line number 1534 i64 64, ;; Size in Bits 1535 i64 64, ;; Align in Bits 1536 i64 0, ;; Offset in Bits 1537 i32 0, ;; Flags 1538 i32 7 ;; Encoding 1539 } 1540 </pre> 1541 </div> 1542 1543 </div> 1544 1545 <!-- ======================================================================= --> 1546 <h4> 1547 <a name="ccxx_basic_float">float</a> 1548 </h4> 1549 1550 <div> 1551 1552 <div class="doc_code"> 1553 <pre> 1554 !2 = metadata !{ 1555 i32 524324, ;; Tag 1556 metadata !1, ;; Context 1557 metadata !"float", 1558 metadata !1, ;; File 1559 i32 0, ;; Line number 1560 i64 32, ;; Size in Bits 1561 i64 32, ;; Align in Bits 1562 i64 0, ;; Offset in Bits 1563 i32 0, ;; Flags 1564 i32 4 ;; Encoding 1565 } 1566 </pre> 1567 </div> 1568 1569 </div> 1570 1571 <!-- 
======================================================================= --> 1572 <h4> 1573 <a name="ccxx_basic_double">double</a> 1574 </h4> 1575 1576 <div> 1577 1578 <div class="doc_code"> 1579 <pre> 1580 !2 = metadata !{ 1581 i32 524324, ;; Tag 1582 metadata !1, ;; Context 1583 metadata !"double",;; Name 1584 metadata !1, ;; File 1585 i32 0, ;; Line number 1586 i64 64, ;; Size in Bits 1587 i64 64, ;; Align in Bits 1588 i64 0, ;; Offset in Bits 1589 i32 0, ;; Flags 1590 i32 4 ;; Encoding 1591 } 1592 </pre> 1593 </div> 1594 1595 </div> 1596 1597 </div> 1598 1599 <!-- ======================================================================= --> 1600 <h3> 1601 <a name="ccxx_derived_types">C/C++ derived types</a> 1602 </h3> 1603 1604 <div> 1605 1606 <p>Given the following as an example of C/C++ derived type:</p> 1607 1608 <div class="doc_code"> 1609 <pre> 1610 typedef const int *IntPtr; 1611 </pre> 1612 </div> 1613 1614 <p>a C/C++ front-end would generate the following descriptors:</p> 1615 1616 <div class="doc_code"> 1617 <pre> 1618 ;; 1619 ;; Define the typedef "IntPtr". 1620 ;; 1621 !2 = metadata !{ 1622 i32 524310, ;; Tag 1623 metadata !1, ;; Context 1624 metadata !"IntPtr", ;; Name 1625 metadata !3, ;; File 1626 i32 0, ;; Line number 1627 i64 0, ;; Size in bits 1628 i64 0, ;; Align in bits 1629 i64 0, ;; Offset in bits 1630 i32 0, ;; Flags 1631 metadata !4 ;; Derived From type 1632 } 1633 1634 ;; 1635 ;; Define the pointer type. 1636 ;; 1637 !4 = metadata !{ 1638 i32 524303, ;; Tag 1639 metadata !1, ;; Context 1640 metadata !"", ;; Name 1641 metadata !1, ;; File 1642 i32 0, ;; Line number 1643 i64 64, ;; Size in bits 1644 i64 64, ;; Align in bits 1645 i64 0, ;; Offset in bits 1646 i32 0, ;; Flags 1647 metadata !5 ;; Derived From type 1648 } 1649 ;; 1650 ;; Define the const type. 
1651 ;; 1652 !5 = metadata !{ 1653 i32 524326, ;; Tag 1654 metadata !1, ;; Context 1655 metadata !"", ;; Name 1656 metadata !1, ;; File 1657 i32 0, ;; Line number 1658 i64 32, ;; Size in bits 1659 i64 32, ;; Align in bits 1660 i64 0, ;; Offset in bits 1661 i32 0, ;; Flags 1662 metadata !6 ;; Derived From type 1663 } 1664 ;; 1665 ;; Define the int type. 1666 ;; 1667 !6 = metadata !{ 1668 i32 524324, ;; Tag 1669 metadata !1, ;; Context 1670 metadata !"int", ;; Name 1671 metadata !1, ;; File 1672 i32 0, ;; Line number 1673 i64 32, ;; Size in bits 1674 i64 32, ;; Align in bits 1675 i64 0, ;; Offset in bits 1676 i32 0, ;; Flags 1677 5 ;; Encoding 1678 } 1679 </pre> 1680 </div> 1681 1682 </div> 1683 1684 <!-- ======================================================================= --> 1685 <h3> 1686 <a name="ccxx_composite_types">C/C++ struct/union types</a> 1687 </h3> 1688 1689 <div> 1690 1691 <p>Given the following as an example of C/C++ struct type:</p> 1692 1693 <div class="doc_code"> 1694 <pre> 1695 struct Color { 1696 unsigned Red; 1697 unsigned Green; 1698 unsigned Blue; 1699 }; 1700 </pre> 1701 </div> 1702 1703 <p>a C/C++ front-end would generate the following descriptors:</p> 1704 1705 <div class="doc_code"> 1706 <pre> 1707 ;; 1708 ;; Define basic type for unsigned int. 1709 ;; 1710 !5 = metadata !{ 1711 i32 524324, ;; Tag 1712 metadata !1, ;; Context 1713 metadata !"unsigned int", 1714 metadata !1, ;; File 1715 i32 0, ;; Line number 1716 i64 32, ;; Size in Bits 1717 i64 32, ;; Align in Bits 1718 i64 0, ;; Offset in Bits 1719 i32 0, ;; Flags 1720 i32 7 ;; Encoding 1721 } 1722 ;; 1723 ;; Define composite type for struct Color. 
1724 ;; 1725 !2 = metadata !{ 1726 i32 524307, ;; Tag 1727 metadata !1, ;; Context 1728 metadata !"Color", ;; Name 1729 metadata !1, ;; Compile unit 1730 i32 1, ;; Line number 1731 i64 96, ;; Size in bits 1732 i64 32, ;; Align in bits 1733 i64 0, ;; Offset in bits 1734 i32 0, ;; Flags 1735 null, ;; Derived From 1736 metadata !3, ;; Elements 1737 i32 0 ;; Runtime Language 1738 } 1739 1740 ;; 1741 ;; Define the Red field. 1742 ;; 1743 !4 = metadata !{ 1744 i32 524301, ;; Tag 1745 metadata !1, ;; Context 1746 metadata !"Red", ;; Name 1747 metadata !1, ;; File 1748 i32 2, ;; Line number 1749 i64 32, ;; Size in bits 1750 i64 32, ;; Align in bits 1751 i64 0, ;; Offset in bits 1752 i32 0, ;; Flags 1753 metadata !5 ;; Derived From type 1754 } 1755 1756 ;; 1757 ;; Define the Green field. 1758 ;; 1759 !6 = metadata !{ 1760 i32 524301, ;; Tag 1761 metadata !1, ;; Context 1762 metadata !"Green", ;; Name 1763 metadata !1, ;; File 1764 i32 3, ;; Line number 1765 i64 32, ;; Size in bits 1766 i64 32, ;; Align in bits 1767 i64 32, ;; Offset in bits 1768 i32 0, ;; Flags 1769 metadata !5 ;; Derived From type 1770 } 1771 1772 ;; 1773 ;; Define the Blue field. 1774 ;; 1775 !7 = metadata !{ 1776 i32 524301, ;; Tag 1777 metadata !1, ;; Context 1778 metadata !"Blue", ;; Name 1779 metadata !1, ;; File 1780 i32 4, ;; Line number 1781 i64 32, ;; Size in bits 1782 i64 32, ;; Align in bits 1783 i64 64, ;; Offset in bits 1784 i32 0, ;; Flags 1785 metadata !5 ;; Derived From type 1786 } 1787 1788 ;; 1789 ;; Define the array of fields used by the composite type Color. 
1790 ;; 1791 !3 = metadata !{metadata !4, metadata !6, metadata !7} 1792 </pre> 1793 </div> 1794 1795 </div> 1796 1797 <!-- ======================================================================= --> 1798 <h3> 1799 <a name="ccxx_enumeration_types">C/C++ enumeration types</a> 1800 </h3> 1801 1802 <div> 1803 1804 <p>Given the following as an example of C/C++ enumeration type:</p> 1805 1806 <div class="doc_code"> 1807 <pre> 1808 enum Trees { 1809 Spruce = 100, 1810 Oak = 200, 1811 Maple = 300 1812 }; 1813 </pre> 1814 </div> 1815 1816 <p>a C/C++ front-end would generate the following descriptors:</p> 1817 1818 <div class="doc_code"> 1819 <pre> 1820 ;; 1821 ;; Define composite type for enum Trees 1822 ;; 1823 !2 = metadata !{ 1824 i32 524292, ;; Tag 1825 metadata !1, ;; Context 1826 metadata !"Trees", ;; Name 1827 metadata !1, ;; File 1828 i32 1, ;; Line number 1829 i64 32, ;; Size in bits 1830 i64 32, ;; Align in bits 1831 i64 0, ;; Offset in bits 1832 i32 0, ;; Flags 1833 null, ;; Derived From type 1834 metadata !3, ;; Elements 1835 i32 0 ;; Runtime language 1836 } 1837 1838 ;; 1839 ;; Define the array of enumerators used by composite type Trees. 1840 ;; 1841 !3 = metadata !{metadata !4, metadata !5, metadata !6} 1842 1843 ;; 1844 ;; Define Spruce enumerator. 1845 ;; 1846 !4 = metadata !{i32 524328, metadata !"Spruce", i64 100} 1847 1848 ;; 1849 ;; Define Oak enumerator. 1850 ;; 1851 !5 = metadata !{i32 524328, metadata !"Oak", i64 200} 1852 1853 ;; 1854 ;; Define Maple enumerator. 
;;
!6 = metadata !{i32 524328, metadata !"Maple", i64 300}

</pre>
</div>

</div>

</div>


<!-- *********************************************************************** -->
<h2>
  <a name="llvmdwarfextension">LLVM Dwarf Extensions</a>
</h2>
<!-- *********************************************************************** -->
<div>
<!-- ======================================================================= -->
<h3>
  <a name="objcproperty">Debugging Information Extension for Objective C Properties</a>
</h3>
<div>
<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertyintroduction">Introduction</a>
</h4>
<!-- *********************************************************************** -->

<div>
<p>Objective C provides a simpler way to declare and define accessor methods
using declared properties. The language provides features to declare a
property and to let the compiler synthesize accessor methods.
</p>

<p>The debugger lets developers inspect Objective C interfaces and their
instance variables and class variables. However, the debugger does not know
anything about the properties defined in Objective C interfaces. The debugger
consumes information generated by the compiler in DWARF format. The format does
not support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.
</p>

</div>


<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertyproposal">Proposal</a>
</h4>
<!-- *********************************************************************** -->

<div>
<p>Objective C properties exist separately from class members. A property
can be defined only by "setter" and "getter" selectors, and
be calculated anew on each access. Or a property can just be a direct access
to some declared ivar. Finally, it can have an ivar "automatically
synthesized" for it by the compiler, in which case the property can be
referred to in user code directly using the standard C dereference syntax as
well as through the property "dot" syntax, but there is no entry in
the @interface declaration corresponding to this ivar.
</p>
<p>
To facilitate debugging these properties, we will add a new DWARF TAG into the
DW_TAG_structure_type definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.
</p>
<p>
If there is a related ivar, there will also be a DWARF property attribute placed
in the DW_TAG_member DIE for that ivar referring back to the property TAG for
that property. In the case where the compiler synthesizes the ivar directly,
the compiler is expected to generate a DW_TAG_member for that ivar (with
DW_AT_artificial set to 1), whose name will be the name used to access this
ivar directly in code, and with the property attribute pointing back to the
property it is backing.
</p>
<p>
The following examples will serve as illustration for our discussion:
</p>

<div class="doc_code">
<pre>
@interface I1 {
  int n2;
}

@property int p1;
@property int p2;
@end

@implementation I1
@synthesize p1;
@synthesize p2 = n2;
@end
</pre>
</div>

<p>
This produces the following DWARF (this is a "pseudo dwarfdump" output):
</p>
<div class="doc_code">
<pre>
0x00000100:  TAG_structure_type [7] *
               AT_APPLE_runtime_class( 0x10 )
               AT_name( "I1" )
               AT_decl_file( "Objc_Property.m" )
               AT_decl_line( 3 )

0x00000110:    TAG_APPLE_property
                 AT_name ( "p1" )
                 AT_type ( {0x00000150} ( int ) )

0x00000120:    TAG_APPLE_property
                 AT_name ( "p2" )
                 AT_type ( {0x00000150} ( int ) )

0x00000130:    TAG_member [8]
                 AT_name( "_p1" )
                 AT_APPLE_property ( {0x00000110} "p1" )
                 AT_type( {0x00000150} ( int ) )
                 AT_artificial ( 0x1 )

0x00000140:    TAG_member [8]
                 AT_name( "n2" )
                 AT_APPLE_property ( {0x00000120} "p2" )
                 AT_type( {0x00000150} ( int ) )

0x00000150:  AT_type( ( int ) )
</pre>
</div>

<p> Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives with
an underscore prepended, as is shown in the example.
But we actually don't need to know this convention, since we are given the name
of the ivar directly.
</p>

<p>
Also, it is common practice in ObjC to have different property declarations in
the @interface and @implementation - e.g. to provide a read-only property in
the interface, and a read-write property in the implementation. In that case,
the compiler should emit whichever property declaration will be in force in the
current translation unit.
</p>

<p> Developers can decorate a property with attributes, which are encoded using
DW_AT_APPLE_property_attribute.
</p>

<div class="doc_code">
<pre>
@property (readonly, nonatomic) int pr;
</pre>
</div>
<p>
Which produces a property tag:
</p>
<div class="doc_code">
<pre>
TAG_APPLE_property [8]
  AT_name( "pr" )
  AT_type ( {0x00000147} (int) )
  AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
</pre>
</div>

<p> The setter and getter method names are attached to the property using
DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.
</p>
<div class="doc_code">
<pre>
@interface I1
@property (setter=myOwnP3Setter:) int p3;
-(void)myOwnP3Setter:(int)a;
@end

@implementation I1
@synthesize p3;
-(void)myOwnP3Setter:(int)a{ }
@end
</pre>
</div>

<p>
The DWARF for this would be:
</p>
<div class="doc_code">
<pre>
0x000003bd:  TAG_structure_type [7] *
               AT_APPLE_runtime_class( 0x10 )
               AT_name( "I1" )
               AT_decl_file( "Objc_Property.m" )
               AT_decl_line( 3 )

0x000003cd:    TAG_APPLE_property
                 AT_name ( "p3" )
                 AT_APPLE_property_setter ( "myOwnP3Setter:" )
                 AT_type( {0x00000147} ( int ) )

0x000003f3:    TAG_member [8]
                 AT_name( "_p3" )
                 AT_type ( {0x00000147} ( int ) )
                 AT_APPLE_property ( {0x000003cd} )
                 AT_artificial ( 0x1 )
</pre>
</div>

</div>

<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertynewtags">New DWARF Tags</a>
</h4>
<!-- *********************************************************************** -->

<div>
<table border="1" cellspacing="0">
  <col width="200">
  <col width="200">
  <tr>
    <th>TAG</th>
    <th>Value</th>
  </tr>
  <tr>
    <td>DW_TAG_APPLE_property</td>
    <td>0x4200</td>
  </tr>
</table>

</div>

<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertynewattributes">New DWARF Attributes</a>
</h4>
<!-- *********************************************************************** -->

<div>
<table border="1" cellspacing="0">
  <col width="200">
  <col width="200">
  <col width="200">
  <tr>
    <th>Attribute</th>
    <th>Value</th>
    <th>Classes</th>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property</td>
    <td>0x3fed</td>
    <td>Reference</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property_getter</td>
    <td>0x3fe9</td>
    <td>String</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property_setter</td>
    <td>0x3fea</td>
    <td>String</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_property_attribute</td>
    <td>0x3feb</td>
    <td>Constant</td>
  </tr>
</table>

</div>

<!-- *********************************************************************** -->
<h4>
  <a name="objcpropertynewconstants">New DWARF Constants</a>
</h4>
<!-- *********************************************************************** -->

<div>
<table border="1" cellspacing="0">
  <col width="200">
  <col width="200">
  <tr>
    <th>Name</th>
    <th>Value</th>
  </tr>
  <tr>
    <td>DW_AT_APPLE_PROPERTY_readonly</td>
    <td>0x1</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_PROPERTY_readwrite</td>
    <td>0x2</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_PROPERTY_assign</td>
    <td>0x4</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_PROPERTY_retain</td>
    <td>0x8</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_PROPERTY_copy</td>
    <td>0x10</td>
  </tr>
  <tr>
    <td>DW_AT_APPLE_PROPERTY_nonatomic</td>
    <td>0x20</td>
  </tr>
</table>

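<p>Because each constant in the table above occupies its own bit, a consumer
can treat the value of DW_AT_APPLE_property_attribute as a bitmask. The
following is a minimal sketch of that decoding in C, not code taken from any
debugger. The enumerator names and the helper
<tt>describe_property_attrs</tt> are hypothetical; only the numeric values
come from the table.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the "New DWARF Constants" table. The identifier names
   are illustrative only and are not taken from any real header. */
enum {
    PROP_READONLY  = 0x01,
    PROP_READWRITE = 0x02,
    PROP_ASSIGN    = 0x04,
    PROP_RETAIN    = 0x08,
    PROP_COPY      = 0x10,
    PROP_NONATOMIC = 0x20
};

/* Hypothetical helper: writes a comma-separated list of the flag names set
   in `attrs` into `buf` (always NUL-terminated) and returns how many flag
   names were written in full. */
static int describe_property_attrs(uint32_t attrs, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } table[] = {
        { PROP_READONLY,  "readonly"  },
        { PROP_READWRITE, "readwrite" },
        { PROP_ASSIGN,    "assign"    },
        { PROP_RETAIN,    "retain"    },
        { PROP_COPY,      "copy"      },
        { PROP_NONATOMIC, "nonatomic" }
    };
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if ((attrs & table[i].bit) == 0)
            continue;                       /* this flag is not set */
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? ", " : "", table[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                          /* buffer too small: stop early */
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

<p>Under these assumptions, the <tt>pr</tt> property declared earlier as
<tt>(readonly, nonatomic)</tt> would carry the value 0x21 (0x1 | 0x20), and
passing that value to the helper fills the buffer with
"readonly, nonatomic".</p>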
</div>
</div>

<!-- ======================================================================= -->
<h3>
  <a name="acceltable">Name Accelerator Tables</a>
</h3>
<!-- ======================================================================= -->
<div>
<!-- ======================================================================= -->
<h4>
  <a name="acceltableintroduction">Introduction</a>
</h4>
<!-- ======================================================================= -->
<div>
<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
  needs. The "pub" in the section name indicates that the entries in the
  table are publicly visible names only. This means no static or hidden
  functions show up in the .debug_pubnames. No static variables or private class
  variables are in the .debug_pubtypes. Many compilers add different things to
  these tables, so we can't rely upon the contents between gcc, icc, or clang.</p>

<p>The typical query given by users tends not to match up with the contents of
  these tables. For example, the DWARF spec states that "In the case of the
  name of a function member or static data member of a C++ structure, class or
  union, the name presented in the .debug_pubnames section is not the simple
  name given by the DW_AT_name attribute of the referenced debugging information
  entry, but rather the fully qualified name of the data or function member."
  So the only names in these tables for complex C++ entries are fully
  qualified names. Debugger users tend not to enter their search strings as
  "a::b::c(int,const Foo&amp;) const", but rather as "c", "b::c", or "a::b::c".
  So
  the name entered in the name table must be demangled in order to chop it up
  appropriately and additional names must be manually entered into the table
  to make it effective as a name lookup table for debuggers to use.</p>

<p>All debuggers currently ignore the .debug_pubnames table as a result of
  its inconsistent and useless public-only name content making it a waste of
  space in the object file. These tables, when they are written to disk, are
  not sorted in any way, leaving every debugger to do its own parsing
  and sorting. These tables also include an inlined copy of the string values
  in the table itself making the tables much larger than they need to be on
  disk, especially for large C++ programs.</p>

<p>Can't we just fix the sections by adding all of the names we need to this
  table? No, because that is not what the tables are defined to contain and we
  won't know the difference between the old bad tables and the new good tables.
  At best we could make our own renamed sections that contain all of the data
  we need.</p>

<p>These tables are also insufficient for what a debugger like LLDB needs.
  LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
  then often asked to look for type "foo" or namespace "bar", or list items in
  namespace "baz". Namespaces are not included in the pubnames or pubtypes
  tables. Since clang asks a lot of questions when it is parsing an expression,
  we need to be very fast when looking up names, as it happens a lot. Having new
  accelerator tables that are optimized for very quick lookups will benefit
  this type of debugging experience greatly.</p>

<p>We would like to generate name lookup tables that can be mapped into
  memory from disk, and used as is, with little or no up-front parsing.
  We would
  also be able to control the exact content of these different tables so they
  contain exactly what we need. The Name Accelerator Tables were designed
  to fix these issues. In order to solve these issues we need to:</p>

<ul>
  <li>Have a format that can be mapped into memory from disk and used as is</li>
  <li>Lookups should be very fast</li>
  <li>Extensible table format so these tables can be made by many producers</li>
  <li>Contain all of the names needed for typical lookups out of the box</li>
  <li>Strict rules for the contents of tables</li>
</ul>

<p>Table size is important and the accelerator table format should allow the
  reuse of strings from common string tables so the strings for the names are
  not duplicated. We also want to make sure the table is ready to be used as-is
  by simply mapping the table into memory with minimal header parsing.</p>

<p>The name lookups need to be fast and optimized for the kinds of lookups
  that debuggers tend to do. Optimally we would like to touch as few parts of
  the mapped table as possible when doing a name lookup and be able to quickly
  find the name entry we are looking for, or discover there are no matches.
  In
  the case of debuggers we optimized for lookups that fail most of the time.</p>

<p>Each table that is defined should have strict rules on exactly what is in
  the accelerator tables and documented so clients can rely on the content.</p>

</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltablehashes">Hash Tables</a>
</h4>
<!-- ======================================================================= -->

<div>
<h5>Standard Hash Tables</h5>

<p>Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:
</p>

<div class="doc_code">
<pre>
.------------.
|  HEADER    |
|------------|
|  BUCKETS   |
|------------|
|  DATA      |
`------------'
</pre>
</div>

<p>The BUCKETS are an array of offsets to DATA for each hash:</p>

<div class="doc_code">
<pre>
.------------.
| 0x00001000 | BUCKETS[0]
| 0x00002000 | BUCKETS[1]
| 0x00002200 | BUCKETS[2]
| 0x000034f0 | BUCKETS[3]
|            | ...
| 0xXXXXXXXX | BUCKETS[n_buckets]
'------------'
</pre>
</div>

<p>So for bucket[3] in the example above, we have an offset into the table
  0x000034f0 which points to a chain of entries for the bucket. Each bucket
  must contain a next pointer, full 32 bit hash value, the string itself,
  and the data for the current string value.</p>

<div class="doc_code">
<pre>
            .------------.
0x000034f0: | 0x00003500 | next pointer
            | 0x12345678 | 32 bit hash
            | "erase"    | string value
            | data[n]    | HashData for this bucket
            |------------|
0x00003500: | 0x00003550 | next pointer
            | 0x29273623 | 32 bit hash
            | "dump"     | string value
            | data[n]    | HashData for this bucket
            |------------|
0x00003550: | 0x00000000 | next pointer
            | 0x82638293 | 32 bit hash
            | "main"     | string value
            | data[n]    | HashData for this bucket
            `------------'
</pre>
</div>

<p>The problem with this layout for debuggers is that we need to optimize for
  the negative lookup case where the symbol we're searching for is not present.
  So if we were to look up "printf" in the table above, we would make a 32 bit
  hash for "printf", and it might match bucket[3]. We would need to go to the
  offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do
  so, we need to read the next pointer, then read the hash, compare it, and skip
  to the next bucket. Each time we are skipping many bytes in memory and touching
  new cache pages just to do the compare on the full 32 bit hash. All of these
  accesses then tell us that we didn't have a match.</p>

<h5>Name Hash Tables</h5>

<p>To solve the issues mentioned above we have structured the hash tables
  a bit differently: a header, buckets, an array of all unique 32 bit hash
  values, followed by an array of hash value data offsets, one for each hash
  value, then the data for all hash values:</p>

<div class="doc_code">
<pre>
.-------------.
|  HEADER     |
|-------------|
|  BUCKETS    |
|-------------|
|  HASHES     |
|-------------|
|  OFFSETS    |
|-------------|
|  DATA       |
`-------------'
</pre>
</div>

<p>The BUCKETS in the name tables are an index into the HASHES array.
  By
  making all of the full 32 bit hash values contiguous in memory, we allow
  ourselves to efficiently check for a match while touching as little
  memory as possible. Most often checking the 32 bit hash values is as far as
  the lookup goes. If it does match, it usually is a match with no collisions.
  So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
  values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p>

<div class="doc_code">
<pre>
.-------------------------.
|  HEADER.magic           | uint32_t
|  HEADER.version         | uint16_t
|  HEADER.hash_function   | uint16_t
|  HEADER.bucket_count    | uint32_t
|  HEADER.hashes_count    | uint32_t
|  HEADER.header_data_len | uint32_t
|  HEADER_DATA            | HeaderData
|-------------------------|
|  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
|-------------------------|
|  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
|-------------------------|
|  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
|-------------------------|
|  ALL HASH DATA          |
`-------------------------'
</pre>
</div>

<p>So taking the exact same data from the standard hash example above we end up
  with:</p>

<div class="doc_code">
<pre>
            .------------.
            | HEADER     |
            |------------|
            |          0 | BUCKETS[0]
            |          2 | BUCKETS[1]
            |          5 | BUCKETS[2]
            |          6 | BUCKETS[3]
            |            | ...
            |        ... | BUCKETS[n_buckets]
            |------------|
            | 0x........ | HASHES[0]
            | 0x........ | HASHES[1]
            | 0x........ | HASHES[2]
            | 0x........ | HASHES[3]
            | 0x........ | HASHES[4]
            | 0x........ | HASHES[5]
            | 0x12345678 | HASHES[6]    hash for BUCKETS[3]
            | 0x29273623 | HASHES[7]    hash for BUCKETS[3]
            | 0x82638293 | HASHES[8]    hash for BUCKETS[3]
            | 0x........ | HASHES[9]
            | 0x........ | HASHES[10]
            | 0x........ | HASHES[11]
            | 0x........ | HASHES[12]
            | 0x........ | HASHES[13]
            | 0x........ | HASHES[n_hashes]
            |------------|
            | 0x........ | OFFSETS[0]
            | 0x........ | OFFSETS[1]
            | 0x........ | OFFSETS[2]
            | 0x........ | OFFSETS[3]
            | 0x........ | OFFSETS[4]
            | 0x........ | OFFSETS[5]
            | 0x000034f0 | OFFSETS[6]   offset for BUCKETS[3]
            | 0x00003500 | OFFSETS[7]   offset for BUCKETS[3]
            | 0x00003550 | OFFSETS[8]   offset for BUCKETS[3]
            | 0x........ | OFFSETS[9]
            | 0x........ | OFFSETS[10]
            | 0x........ | OFFSETS[11]
            | 0x........ | OFFSETS[12]
            | 0x........ | OFFSETS[13]
            | 0x........ | OFFSETS[n_hashes]
            |------------|
            |            |
            |            |
            |            |
            |            |
            |            |
            |------------|
0x000034f0: | 0x00001203 | .debug_str ("erase")
            | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x........ | HashData[2]
            | 0x........ | HashData[3]
            | 0x00000000 | String offset into .debug_str (terminate data for hash)
            |------------|
0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
            | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x00001203 | String offset into .debug_str ("dump")
            | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x........ | HashData[2]
            | 0x00000000 | String offset into .debug_str (terminate data for hash)
            |------------|
0x00003550: | 0x00001203 | String offset into .debug_str ("main")
            | 0x00000009 | A 32 bit array count - number of HashData with name "main"
            | 0x........ | HashData[0]
            | 0x........ | HashData[1]
            | 0x........ | HashData[2]
            | 0x........ | HashData[3]
            | 0x........ | HashData[4]
            | 0x........ | HashData[5]
            | 0x........ | HashData[6]
            | 0x........ | HashData[7]
            | 0x........ | HashData[8]
            | 0x00000000 | String offset into .debug_str (terminate data for hash)
            `------------'
</pre>
</div>

<p>So we still have all of the same data, we just organize it more efficiently
  for debugger lookup. If we repeat the same "printf" lookup from above, we
  would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash
  value modulo n_buckets. BUCKETS[3] contains "6" which is the index
  into the HASHES table. We would then compare any consecutive 32 bit hash
  values in the HASHES array as long as the hashes would be in BUCKETS[3]. We
  do this by verifying that each subsequent hash value modulo n_buckets is still
  3. In the case of a failed lookup we would access the memory for BUCKETS[3], and
  then compare a few consecutive 32 bit hashes before we know that we have no match.
  We don't end up marching through multiple words of memory and we really keep the
  number of processor data cache lines being accessed as small as possible.</p>

<p>The string hash that is used for these lookup tables is the Daniel J.
  Bernstein hash which is also used in the ELF GNU_HASH sections.
  It is a very
  good hash for all kinds of names in programs with very few hash collisions.</p>

<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p>
</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltabledetails">Details</a>
</h4>
<!-- ======================================================================= -->
<div>
<p>These name hash tables are designed to be generic where specializations of
  the table get to define additional data that goes into the header
  ("HeaderData"), how the string value is stored ("KeyType") and the content
  of the data for each hash value.</p>

<h5>Header Layout</h5>
<p>The header has a fixed part, and the specialized part. The exact format of
  the header is:</p>
<div class="doc_code">
<pre>
struct Header
{
  uint32_t   magic;           // 'HASH' magic value to allow endian detection
  uint16_t   version;         // Version number
  uint16_t   hash_function;   // The hash function enumeration that was used
  uint32_t   bucket_count;    // The number of buckets in this hash table
  uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
  uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                              // Specifically the length of the following HeaderData field - this does not
                              // include the size of the preceding fields
  HeaderData header_data;     // Implementation specific header data
};
</pre>
</div>
<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
  an ASCII integer. This allows the detection of the start of the hash table and
  also allows the table's byte order to be determined so the table can be
  correctly extracted.
  The "magic" value is followed by a 16 bit version number
  which allows the table to be revised and modified in the future. The current
  version number is 1. "hash_function" is a uint16_t enumeration that specifies
  which hash function was used to produce this table. The current values for the
  hash function enumerations include:</p>
<div class="doc_code">
<pre>
enum HashFunctionType
{
  eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
};
</pre>
</div>
<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
  are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
  values that are in the HASHES array, and is also the number of offsets
  contained in the OFFSETS array. "header_data_len" specifies the size in
  bytes of the HeaderData that is filled in by specialized versions of this
  table.</p>

<h5>Fixed Lookup</h5>
<p>The header is followed by the buckets, hashes, offsets, and hash value
  data.</p>
<div class="doc_code">
<pre>
struct FixedTable
{
  uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
  uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
  uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
};
</pre>
</div>
<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
  "hashes" array contains all of the 32 bit hash values for all names in the
  hash table. Each hash in the "hashes" table has an offset in the "offsets"
  array that points to the data for the hash value.</p>

<p>This table setup makes it very easy to repurpose these tables to contain
  different data, while keeping the lookup mechanism the same for all tables.
  This layout also makes it possible to save the table to disk and map it in
  later and do very efficient name lookups with little or no parsing.</p>

<p>DWARF lookup tables can be implemented in a variety of ways and can store
  a lot of information for each name. We want to make the DWARF tables
  extensible and able to store the data efficiently so we have used some of the
  DWARF features that enable efficient data storage to define exactly what kind
  of data we store for each name.</p>

<p>The "HeaderData" contains a definition of the contents of each HashData
  chunk. We might want to store an offset to all of the debug information
  entries (DIEs) for each name. To keep things extensible, we create a list of
  items, or Atoms, that are contained in the data for each name. First comes the
  type of the data in each atom:</p>
<div class="doc_code">
<pre>
enum AtomType
{
  eAtomTypeNULL       = 0u,
  eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
  eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
  eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
  eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
  eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
};
</pre>
</div>
<p>The enumeration values and their meanings are:</p>
<div class="doc_code">
<pre>
  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)
</pre>
</div>
<p>Each atom then specifies its atom type and how the data for that atom
  is encoded:</p>
<div class="doc_code">
<pre>
struct Atom
{
  uint16_t type;  // AtomType enum value
  uint16_t form;  // DWARF DW_FORM_XXX defines
};
</pre>
</div>
<p>The "form" type above is from the DWARF specification and defines the
  exact encoding of the data for the Atom type. See the DWARF specification for
  the DW_FORM_ definitions.</p>
<div class="doc_code">
<pre>
struct HeaderData
{
  uint32_t die_offset_base;
  uint32_t atom_count;
  Atom     atoms[atom_count];
};
</pre>
</div>
<p>"HeaderData" defines the base DIE offset that should be added to any atoms
  that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
  DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in
  each "HashData" object -- Atom.form tells us how large each field will be in
  the HashData and the Atom.type tells us how this data should be interpreted.</p>

<p>For the current implementations of the ".apple_names" (all functions + globals),
  the ".apple_types" (names of all types that are defined), and the
  ".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p>
<div class="doc_code">
<pre>
HeaderData.atom_count = 1;
HeaderData.atoms[0].type = eAtomTypeDIEOffset;
HeaderData.atoms[0].form = DW_FORM_data4;
</pre>
</div>
<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
  encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
  multiple matching DIEs in a single file, which could come up with an inlined
  function for instance.
  Future tables could include more information about the
  DIE such as flags indicating if the DIE is a function, method, block,
  or inlined.</p>

<p>The KeyType for the DWARF table is a 32 bit string table offset into the
  ".debug_str" table. The ".debug_str" is the string table for the DWARF which
  may already contain copies of all of the strings. This helps make sure, with
  help from the compiler, that we reuse the strings between all of the DWARF
  sections and keeps the hash table size down. Another benefit to having the
  compiler generate all strings as DW_FORM_strp in the debug info, is that
  DWARF parsing can be made much faster.</p>

<p>After a lookup is made, we get an offset into the hash data. The hash data
  needs to be able to deal with 32 bit hash collisions, so the chunk of data
  at the offset in the hash data consists of a triple:</p>
<div class="doc_code">
<pre>
uint32_t str_offset
uint32_t hash_data_count
HashData[hash_data_count]
</pre>
</div>
<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
  hash data chunks contain a single item (no 32 bit hash collision):</p>
<div class="doc_code">
<pre>
.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] =&gt; "main")
| 0x00000004 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x........ | uint32_t HashData[2] DIE offset
| 0x........ | uint32_t HashData[3] DIE offset
| 0x00000000 | uint32_t KeyType (end of hash chain)
`------------'
</pre>
</div>
<p>If there are collisions, you will have multiple valid string offsets:</p>
<div class="doc_code">
<pre>
.------------.
| 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] =&gt; "main")
| 0x00000004 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x........ | uint32_t HashData[2] DIE offset
| 0x........ | uint32_t HashData[3] DIE offset
| 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] =&gt; "print")
| 0x00000002 | uint32_t HashData count
| 0x........ | uint32_t HashData[0] DIE offset
| 0x........ | uint32_t HashData[1] DIE offset
| 0x00000000 | uint32_t KeyType (end of hash chain)
`------------'
</pre>
</div>
<p>Current testing with real world C++ binaries has shown that there is around 1
  32 bit hash collision per 100,000 name entries.</p>
</div>
<!-- ======================================================================= -->
<h4>
  <a name="acceltablecontents">Contents</a>
</h4>
<!-- ======================================================================= -->
<div>
<p>As we said, we want to strictly define exactly what is included in the
  different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
  and ".apple_namespaces".</p>

<p>".apple_names" sections should contain an entry for each DWARF DIE whose
  DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
  has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
  DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
  in the location (global and static variables). All global and static variables
  should be included, including those scoped within functions and classes. For
  example using the following code:</p>
<div class="doc_code">
<pre>
static int var = 0;

void f ()
{
  static int var = 0;
}
</pre>
</div>
<p>Both of the static "var" variables would be included in the table. All
  functions should emit both their full names and their basenames.
  For C or C++,
  the full name is the mangled name (if available) which is usually in the
  DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function
  basename. If global or static variables have a mangled name in a
  DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
  simple name found in the DW_AT_name attribute.</p>

<p>".apple_types" sections should contain an entry for each DWARF DIE whose
  tag is one of:</p>
<ul>
  <li>DW_TAG_array_type</li>
  <li>DW_TAG_class_type</li>
  <li>DW_TAG_enumeration_type</li>
  <li>DW_TAG_pointer_type</li>
  <li>DW_TAG_reference_type</li>
  <li>DW_TAG_string_type</li>
  <li>DW_TAG_structure_type</li>
  <li>DW_TAG_subroutine_type</li>
  <li>DW_TAG_typedef</li>
  <li>DW_TAG_union_type</li>
  <li>DW_TAG_ptr_to_member_type</li>
  <li>DW_TAG_set_type</li>
  <li>DW_TAG_subrange_type</li>
  <li>DW_TAG_base_type</li>
  <li>DW_TAG_const_type</li>
  <li>DW_TAG_constant</li>
  <li>DW_TAG_file_type</li>
  <li>DW_TAG_namelist</li>
  <li>DW_TAG_packed_type</li>
  <li>DW_TAG_volatile_type</li>
  <li>DW_TAG_restrict_type</li>
  <li>DW_TAG_interface_type</li>
  <li>DW_TAG_unspecified_type</li>
  <li>DW_TAG_shared_type</li>
</ul>
<p>Only entries with a DW_AT_name attribute are included, and the entry must
  not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
   For example, using the following code:</p>
<div class="doc_code">
<pre>
int main ()
{
  int *b = 0;
  return *b;
}
</pre>
</div>
<p>We get a few type DIEs:</p>
<div class="doc_code">
<pre>
0x00000067:     TAG_base_type [5]
                AT_encoding( DW_ATE_signed )
                AT_name( "int" )
                AT_byte_size( 0x04 )

0x0000006e:     TAG_pointer_type [6]
                AT_type( {0x00000067} ( int ) )
                AT_byte_size( 0x08 )
</pre>
</div>
<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p>

<p>The ".apple_namespaces" section should contain all DW_TAG_namespace DIEs. A
   namespace that has no name is an anonymous namespace, and its name should be
   output as "(anonymous namespace)" (without the quotes). This matches the
   output of abi::__cxa_demangle(), the function in the standard C++ library
   that demangles mangled names.</p>
</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltableextensions">Language Extensions and File Format Changes</a>
</h4>
<!-- ======================================================================= -->
<div>
<h5>Objective-C Extensions</h5>
<p>The ".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
   Objective-C class. The name used in the hash table is the name of the
   Objective-C class itself. If the Objective-C class has a category, then an
   entry is made both for the class name without the category and for the class
   name with the category. So if we have a DIE at offset 0x1234 for a method
   named "-[NSString(my_additions) stringWithSpecialString:]", we would add
   an entry for "NSString" that points to DIE 0x1234, and an entry for
   "NSString(my_additions)" that points to 0x1234.
   This allows us to quickly track down all Objective-C methods for an
   Objective-C class when evaluating expressions. It is needed because of the
   dynamic nature of Objective-C, where anyone can add methods to a class. The
   DWARF for Objective-C methods is also emitted differently from that for C++
   classes: the methods are not usually contained in the class definition but
   are scattered across one or more compile units. Categories can also be
   defined in different shared libraries. So we need to be able to quickly find
   all of the methods and class functions given the Objective-C class name, or
   quickly find all methods and class functions for a class + category name.
   This table does not contain any selector names; it just maps Objective-C
   class names (or class names + category) to all of the methods and class
   functions. The selectors are added as function basenames in the .debug_names
   section.</p>

<p>In the ".apple_names" section for Objective-C functions, the full name is the
   entire function name with the brackets ("-[NSString stringWithCString:]") and the
   basename is the selector only ("stringWithCString:").</p>

<h5>Mach-O Changes</h5>
<p>The section names given for the Apple hash tables apply to non-Mach-O files.
   For Mach-O files, the sections should be contained in the "__DWARF" segment
   with names as follows:</p>
<ul>
  <li>".apple_names" -> "__apple_names"</li>
  <li>".apple_types" -> "__apple_types"</li>
  <li>".apple_namespaces" -> "__apple_namespac" (16-character limit)</li>
  <li>".apple_objc" -> "__apple_objc"</li>
</ul>
</div>
</div>
</div>

<!-- *********************************************************************** -->

<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>

  <a href="mailto:sabre (a] nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
</address>

</body>
</html>