.. role:: raw-html(raw)
   :format: html

=================================
LLVM Code Coverage Mapping Format
=================================

.. contents::
   :local:

Introduction
============

LLVM's code coverage mapping format is used to provide code coverage
analysis using LLVM's and Clang's instrumentation-based profiling
(Clang's ``-fprofile-instr-generate`` option).

This document is aimed at those who use LLVM's code coverage mapping to provide
code coverage analysis for their own programs, and at those who would like
to know how it works under the hood. Prior knowledge of how Clang's
profile-guided optimization works is useful, but not required.

We start by showing how to use LLVM and Clang for code coverage analysis,
then we briefly describe LLVM's code coverage mapping format and the
way that Clang and LLVM's code coverage tool work with this format. After
the basics are down, more advanced features of the coverage mapping format
are discussed, such as the data structures, the LLVM IR representation and
the binary encoding.

Quick Start
===========

Here's a short walkthrough that describes how to generate a code coverage
overview for a sample source file called *test.c*.

* First, compile an instrumented version of your program using Clang's
  ``-fprofile-instr-generate`` option with the additional ``-fcoverage-mapping``
  option:

  ``clang -o test -fprofile-instr-generate -fcoverage-mapping test.c``
* Then, run the instrumented binary.
  The runtime will produce a file called
  *default.profraw* containing the raw profile instrumentation data:

  ``./test``
* After that, merge the profile data using the *llvm-profdata* tool:

  ``llvm-profdata merge -o test.profdata default.profraw``
* Finally, run LLVM's code coverage tool (*llvm-cov*) to produce the code
  coverage overview for the sample source file:

  ``llvm-cov show ./test -instr-profile=test.profdata test.c``

High Level Overview
===================

LLVM's code coverage mapping format is designed to be a self-contained
data format that can be embedded into the LLVM IR and object files.
It's described in this document as a **mapping** format because its goal is
to store the data that is required for a code coverage tool to map between
the specific source ranges in a file and the execution counts obtained
after running the instrumented version of the program.

The mapping data is used in two places in the code coverage process:

1. When clang compiles a source file with ``-fcoverage-mapping``, it
   generates the mapping information that describes the mapping between the
   source ranges and the profiling instrumentation counters.
   This information gets embedded into the LLVM IR and conveniently
   ends up in the final executable file when the program is linked.

2. It is also used by *llvm-cov*: the mapping information is extracted from an
   object file and is used to associate the execution counts (the values of the
   profile instrumentation counters) with the source ranges in a file.
   After that, the tool is able to generate various code coverage reports
   for the program.

The coverage mapping format aims to be a "universal format" that would be
suitable for usage by any frontend, and not just by Clang.
It also aims to
provide the frontend the possibility of generating the minimal coverage mapping
data in order to reduce the size of the IR and object files. For example,
instead of emitting mapping information for each statement in a function, the
frontend is allowed to group the statements with the same execution count into
regions of code, and emit the mapping information only for those regions.

Advanced Concepts
=================

The remainder of this guide is meant to give you insight into the way the
coverage mapping format works.

The coverage mapping format operates on a per-function level as the
profile instrumentation counters are associated with a specific function.
For each function that requires code coverage, the frontend has to create
coverage mapping data that can map between the source code ranges and
the profile instrumentation counters for that function.

Mapping Region
--------------

The function's coverage mapping data contains an array of mapping regions.
A mapping region stores the `source code range`_ that is covered by this region,
the `file id <coverage file id_>`_, the `coverage mapping counter`_ and
the region's kind.
There are several kinds of mapping regions:

* Code regions associate portions of source code and `coverage mapping
  counters`_. They make up the majority of the mapping regions. They are used
  by the code coverage tool to compute the execution counts for lines,
  highlight the regions of code that were never executed, and to obtain
  the various code coverage statistics for a function.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:40 to 9:2</span>
  <span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  if (argc &gt; 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Code Region from 3:17 to 5:4</span>
  <span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span>
  <span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Code Region from 5:10 to 7:4</span>
  <span style='background-color:#F6D55D'>    printf("\n"); </span>
  <span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Skipped regions are used to represent source ranges that were skipped
  by Clang's preprocessor. They don't associate with
  `coverage mapping counters`_, as the frontend knows that they are never
  executed. They are used by the code coverage tool to mark the skipped lines
  inside a function as non-code lines that don't have execution counts.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:12 to 6:2</span>
  <span style='background-color:#85C1F5'>#ifdef DEBUG </span> <span class='c1'>// Skipped Region from 2:1 to 4:2</span>
  <span style='background-color:#85C1F5'>  printf("Hello world"); </span>
  <span style='background-color:#85C1F5'>#</span><span style='background-color:#4A789C'>endif </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Expansion regions are used to represent Clang's macro expansions. They
  have an additional property: the *expanded file id*. This property can be
  used by the code coverage tool to find the mapping regions that are created
  as a result of this macro expansion, by checking if their file id matches the
  expanded file id. They don't associate with `coverage mapping counters`_,
  as the code coverage tool can determine the execution count for this region
  by looking up the execution count of the first region with a corresponding
  file id.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x) </span><span style='background-color:#4A789C'>{ </span>
  <span style='background-color:#4A789C'>  #define MAX(x,y) </span><span style='background-color:#85C1F5'>((x) &gt; (y)? </span><span style='background-color:#F6D55D'>(x)</span><span style='background-color:#85C1F5'> : </span><span style='background-color:#F4BA70'>(y)</span><span style='background-color:#85C1F5'>)</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return </span><span style='background-color:#7FCA9F'>MAX</span><span style='background-color:#4A789C'>(x, 42); </span> <span class='c1'>// Expansion Region from 3:10 to 3:13</span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`

.. _source code range:

Source Range:
^^^^^^^^^^^^^

The source range record contains the starting and ending location of a certain
mapping region. Both locations include the line and the column numbers.

.. _coverage file id:

File ID:
^^^^^^^^

The file id is an integer value that tells us
in which source file or macro expansion this region is located.
It enables Clang to produce mapping information for the code
defined inside macros, as this example demonstrates:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>void func(const char *str) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:28 to 6:2 with file id 0</span>
<span style='background-color:#4A789C'>  #define PUT </span><span style='background-color:#85C1F5'>printf("%s\n", str)</span><span style='background-color:#4A789C'> </span> <span class='c1'>// 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2</span>
<span style='background-color:#4A789C'>  if(*str) </span>
<span style='background-color:#4A789C'>    </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1</span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

.. _coverage mapping counter:
.. _coverage mapping counters:

Counter:
^^^^^^^^

A coverage mapping counter can represent a reference to the profile
instrumentation counter. The execution count for a region with such a counter
is determined by looking up the value of the corresponding profile
instrumentation counter.

It can also represent a binary arithmetical expression that operates on
coverage mapping counters or other expressions.
The execution count for a region with an expression counter is determined by
evaluating the expression's arguments and then adding them together or
subtracting them from one another.
In the example below, a subtraction expression is used to compute the execution
count for the compound statement that follows the *else* keyword:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #0</span>
<span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  if (argc &gt; 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #1</span>
<span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span><span> </span>
<span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)</span>
<span style='background-color:#F6D55D'>    printf("\n"); </span>
<span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>}</span>
</pre>`

Finally, a coverage mapping counter can also represent an execution count
of zero. The zero counter is used to provide coverage mapping for
unreachable statements and expressions, as in the example below:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#85C1F5'>printf("Hello world!\n")</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Unreachable region's counter is zero</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

The zero counters allow the code coverage tool to display proper line execution
counts for the unreachable lines and highlight the unreachable code.
Without them, the tool would think that those lines and regions were still
executed, as it doesn't possess the frontend's knowledge.

LLVM IR Representation
======================

The coverage mapping data is stored in the LLVM IR using a single global
constant structure variable called *__llvm_coverage_mapping*
with the *__llvm_covmap* section specifier.

For example, let's consider a C file and how it gets compiled to LLVM:

.. _coverage mapping sample:

.. code-block:: c

  int foo() {
    return 42;
  }
  int bar() {
    return 13;
  }

The coverage mapping variable generated by Clang has 3 fields:

* Coverage mapping header.

* An array of function records.

* Coverage mapping data which is an array of bytes.
  Zero padding is added at the end to force 8 byte alignment.

.. code-block:: llvm

  @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [2 x { i64, i32, i64 }], [40 x i8] }
  {
    { i32, i32, i32, i32 } ; Coverage map header
    {
      i32 2,  ; The number of function records
      i32 20, ; The length of the string that contains the encoded translation unit filenames
      i32 20, ; The length of the string that contains the encoded coverage mapping data
      i32 1   ; Coverage mapping format version
    },
    [2 x { i64, i32, i64 }] [ ; Function records
     { i64, i32, i64 } {
       i64 0x5cf8c24cdb18bdac, ; Function's name MD5
       i32 9, ; Function's encoded coverage mapping data string length
       i64 0  ; Function's structural hash
     },
     { i64, i32, i64 } {
       i64 0xe413754a191db537, ; Function's name MD5
       i32 9, ; Function's encoded coverage mapping data string length
       i64 0  ; Function's structural hash
     }],
   [40 x i8] c"..." ; Encoded data (dissected later)
  }, section "__llvm_covmap", align 8

The function record layout has evolved since version 1. In version 1, the function record for *foo* is defined as follows:

.. code-block:: llvm

  { i8*, i32, i32, i64 } { i8* getelementptr inbounds ([3 x i8]* @__profn_foo, i32 0, i32 0), ; Function's name
    i32 3, ; Function's name length
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }


Coverage Mapping Header:
------------------------

The coverage mapping header has the following fields:

* The number of function records.

* The length of the string in the third field of *__llvm_coverage_mapping* that contains the encoded translation unit filenames.

* The length of the string in the third field of *__llvm_coverage_mapping* that contains the encoded coverage mapping data.

* The format version.
  The current version is 2 (encoded as a 1).

.. _function records:

Function record:
----------------

A function record is a structure of the following type:

.. code-block:: llvm

  { i64, i32, i64 }

It contains the MD5 of the function's name, the length of the encoded mapping data for that function, and the function's
structural hash value.

Encoded data:
-------------

The encoded data is stored in a single string that contains
the encoded filenames used by this translation unit and the encoded coverage
mapping data for each function in this translation unit.

The encoded data has the following structure:

``[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]``

If necessary, the encoded data is padded with zeroes so that the size
of the data string is rounded up to the nearest multiple of 8 bytes.

Dissecting the sample:
^^^^^^^^^^^^^^^^^^^^^^

Here's an overview of the encoded data that was stored in the
IR for the `coverage mapping sample`_ that was shown earlier:

* The IR contains the following string constant that represents the encoded
  coverage mapping data for the sample translation unit:

  .. code-block:: llvm

    c"\01\12/Users/alex/test.c\01\00\00\01\01\01\0C\02\02\01\00\00\01\01\04\0C\02\02\00\00"

* The string contains values that are encoded in the LEB128 format, which is
  used throughout for storing integers. It also contains a string value.

* The length of the substring that contains the encoded translation unit
  filenames is the value of the second field in the coverage mapping header
  of the *__llvm_coverage_mapping* structure, which is 20, thus the filenames
  are encoded in this string:

  .. code-block:: llvm

    c"\01\12/Users/alex/test.c"

  This string contains the following data:

  * Its first byte has a value of ``0x01``.
    It stores the number of filenames
    contained in this string.
  * Its second byte stores the length of the first filename in this string.
  * The remaining 18 bytes are used to store the first filename.

* The length of the substring that contains the encoded coverage mapping data
  for the first function is the value of the second field in the first
  structure in the array of `function records`_ stored in the
  second field of the *__llvm_coverage_mapping* structure, which is 9.
  Therefore, the coverage mapping for the first function record is encoded
  in this string:

  .. code-block:: llvm

    c"\01\00\00\01\01\01\0C\02\02"

  This string consists of the following bytes:

  .. list-table::
     :widths: 10 90

     * - ``0x01``
       - The number of file ids used by this function. There is only one file
         id used by the mapping data in this function.
     * - ``0x00``
       - An index into the filenames array which corresponds to the file
         "/Users/alex/test.c".
     * - ``0x00``
       - The number of counter expressions used by this function. This
         function doesn't use any expressions.
     * - ``0x01``
       - The number of mapping regions that are stored in an array for the
         function's file id #0.
     * - ``0x01``
       - The coverage mapping counter for the first region in this function.
         The value of 1 tells us that it's a coverage mapping counter that is
         a reference to the profile instrumentation counter with an index
         of 0.
     * - ``0x01``
       - The starting line of the first mapping region in this function.
     * - ``0x0C``
       - The starting column of the first mapping region in this function.
     * - ``0x02``
       - The number of lines between the starting line and the ending line of
         the first mapping region in this function (*numLines*), so the
         region ends on line 3.
     * - ``0x02``
       - The ending column of the first mapping region in this function.

* The length of the substring that contains the encoded coverage mapping data
  for the second function record is also 9. It's structured like the mapping data
  for the first function record.

* The two trailing bytes are zeroes and are used to pad the coverage mapping
  data to give it the 8 byte alignment.

Encoding
========

The per-function coverage mapping data is encoded as a stream of bytes,
with a simple structure. The structure consists of the encoding
`types <cvmtypes_>`_ like variable-length unsigned integers, that
are used to encode `File ID Mapping`_, `Counter Expressions`_ and
the `Mapping Regions`_.

The format of the structure follows:

``[file id mapping, counter expressions, mapping regions]``

The translation unit filenames are encoded using the same encoding
`types <cvmtypes_>`_ as the per-function coverage mapping data, with the
following structure:

``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``

.. _cvmtypes:

Types
-----

This section describes the basic types that are used by the encoding format
and can appear after ``:`` in the ``[foo : type]`` description.

.. _LEB128:

LEB128
^^^^^^

LEB128 is an unsigned integer value that is encoded using DWARF's LEB128
encoding, optimizing for the case where values are small
(1 byte for values less than 128).

.. _Strings:

Strings
^^^^^^^

``[length : LEB128, characters...]``

String values are encoded with a `LEB value <LEB128_>`_ for the length
of the string and a sequence of bytes for its characters.

.. _file id mapping:

File ID Mapping
---------------

``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]``

File id mapping in a function's coverage mapping stream
contains the indices into the translation unit's filenames array.

Counter
-------

``[value : LEB128]``

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things: the `tag <counter-tag_>`_
which is stored in the lowest 2 bits, and the `counter data`_ which is stored
in the remaining bits.

.. _counter-tag:

Tag:
^^^^

The counter's tag encodes the counter's kind
and, if the counter is an expression, the expression's kind.
The possible tag values are:

* 0 - The counter is zero.

* 1 - The counter is a reference to the profile instrumentation counter.

* 2 - The counter is a subtraction expression.

* 3 - The counter is an addition expression.

.. _counter data:

Data:
^^^^^

The counter's data is interpreted in the following manner:

* When the counter is a reference to the profile instrumentation counter,
  then the counter's data is the id of the profile counter.
* When the counter is an expression, then the counter's data
  is the index into the array of counter expressions.

.. _Counter Expressions:

Counter Expressions
-------------------

``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]``

Counter expressions consist of two counters as they
represent binary arithmetic operations.
The expression's kind is determined from the `tag <counter-tag_>`_ of the
counter that references this expression.

.. _Mapping Regions:

Mapping Regions
---------------

``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]``

The mapping regions are stored in an array of sub-arrays where every
region in a particular sub-array has the same file id.

The file id for a sub-array of regions is the index of that
sub-array in the main array, e.g., the first sub-array will have the file id
of 0.

Sub-Array of Regions
^^^^^^^^^^^^^^^^^^^^

``[numRegions : LEB128, region0, region1, ...]``

The mapping regions for a specific file id are stored in an array that is
sorted in an ascending order by the region's starting location.

Mapping Region
^^^^^^^^^^^^^^

``[header, source range]``

The mapping region record contains two sub-records:
the `header`_, which stores the counter and/or the region's kind,
and the `source range`_ that contains the starting and ending
location of this region.

.. _header:

Header
^^^^^^

``[counter]``

or

``[pseudo-counter]``

The header encodes the region's counter and the region's kind.

The value of the counter's tag distinguishes between the counters and
pseudo-counters: if the tag is zero, then this header contains a
pseudo-counter, otherwise this header contains an ordinary counter.

Counter:
""""""""

A mapping region whose header has a counter with a non-zero tag is
a code region.

Pseudo-Counter:
"""""""""""""""

``[value : LEB128]``

A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like
the ordinary counter. It has the following interpretation:

* bits 0-1: tag, which is always 0.

* bit 2: expansionRegionTag. If this bit is set, then this mapping region
  is an expansion region.

* remaining bits: data. If this region is an expansion region, then the data
  contains the expanded file id of that region.

  Otherwise, the data contains the region's kind. The possible region
  kind values are:

  * 0 - This mapping region is a code region with a counter of zero.
  * 2 - This mapping region is a skipped region.

.. _source range:

Source Range
^^^^^^^^^^^^

``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]``

The source range record contains the following fields:

* *deltaLineStart*: The difference between the starting line of the
  current mapping region and the starting line of the previous mapping region.

  If the current mapping region is the first region in the current
  sub-array, then it stores the starting line of that region.

* *columnStart*: The starting column of the mapping region.

* *numLines*: The difference between the ending line and the starting line
  of the current mapping region.

* *columnEnd*: The ending column of the mapping region.
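
To tie the encoding rules back to the dissected sample, the following Python
sketch decodes the filename string and the two per-function byte strings shown
earlier. It is an illustration only and not LLVM's own reader: the function
names (``read_leb128``, ``decode_filenames``, ``describe_header`` and
``decode_function``) are made up for this example, and it assumes, as the
sample bytes suggest, that one sub-array of regions follows for each file id
rather than being preceded by an explicit sub-array count.

```python
from typing import List, Tuple


def read_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value at `pos`; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # the low 7 bits carry the payload
        if not byte & 0x80:              # a clear high bit marks the last byte
            return value, pos
        shift += 7


def decode_filenames(data: bytes) -> List[str]:
    """Decode [numFilenames : LEB128, filename0 : string, ...]."""
    count, pos = read_leb128(data, 0)
    names = []
    for _ in range(count):
        length, pos = read_leb128(data, pos)
        names.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return names


def describe_header(header: int) -> Tuple[str, str]:
    """Split a region header into (region kind, counter description)."""
    tag = header & 0x3
    if tag == 1:
        return "code", f"profile counter #{header >> 2}"
    if tag in (2, 3):
        op = "subtraction" if tag == 2 else "addition"
        return "code", f"{op} expression #{header >> 2}"
    if header & 0x4:  # pseudo-counter with expansionRegionTag set
        return "expansion", f"expanded file id {header >> 3}"
    return ("skipped", "") if header >> 3 == 2 else ("code", "zero")


def decode_function(data: bytes) -> List[dict]:
    """Decode one function's file id mapping, expressions and regions."""
    num_files, pos = read_leb128(data, 0)
    for _ in range(num_files):        # filename indices are read and discarded
        _, pos = read_leb128(data, pos)
    num_exprs, pos = read_leb128(data, pos)
    for _ in range(2 * num_exprs):    # each expression stores an LHS and an RHS
        _, pos = read_leb128(data, pos)

    regions = []
    for file_id in range(num_files):  # assumption: one sub-array per file id
        num_regions, pos = read_leb128(data, pos)
        line = 0                      # line deltas restart in every sub-array
        for _ in range(num_regions):
            header, pos = read_leb128(data, pos)
            delta, pos = read_leb128(data, pos)
            col_start, pos = read_leb128(data, pos)
            num_lines, pos = read_leb128(data, pos)
            col_end, pos = read_leb128(data, pos)
            line += delta
            kind, counter = describe_header(header)
            regions.append({"file_id": file_id, "kind": kind, "counter": counter,
                            "start": (line, col_start),
                            "end": (line + num_lines, col_end)})
    return regions


names = decode_filenames(b"\x01\x12/Users/alex/test.c")
foo = decode_function(b"\x01\x00\x00\x01\x01\x01\x0C\x02\x02")
bar = decode_function(b"\x01\x00\x00\x01\x01\x04\x0C\x02\x02")
print(names, foo, bar)
```

For the sample, this reports one code region per function: 1:12 to 3:2 for the
first record and 4:12 to 6:2 for the second, each referring to profile
counter #0. Keeping the line number as a running sum within a sub-array is what
makes *deltaLineStart* cheap to encode for regions that start close together.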