===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. Patch points look
much like calls to LLVM. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project, see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. `GCFunctionMetadata` provides the location of
pointers into a collected heap captured by the `GCRoot` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the `GCFunctionMetadata` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments.
If the runtime patches code at the stack map's address, it will
destructively overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the ``live
values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.
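
As a small illustration of how a runtime might turn this offset into
an absolute address, consider the following Python sketch. It is not
part of LLVM: the name ``stack_map_address`` and the load address used
in the example are hypothetical, and the sketch assumes the runtime
already knows where the enclosing function was placed in memory.

.. code-block:: python

  def stack_map_address(function_address, instruction_offset):
      """Return the absolute instruction address of a stack map.

      function_address: where the runtime loaded the enclosing function.
      instruction_offset: offset from function entry stored in the record.
      """
      if instruction_offset < 0:
          raise ValueError("instruction offset must not be negative")
      return function_address + instruction_offset


  # Hypothetical function loaded at 0x1000 with a stack map at offset 0x05.
  address = stack_map_address(0x1000, 0x05)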

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with 8 byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
                                                         i64* %ptr)
  %val = load i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl  $0xffff, %rax <--- patched code at stack map address
  0x0a callq *%rax         <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.
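
The bounds check a runtime might perform before such destructive
patching can be sketched in Python as follows. This is an illustration
only: ``patch_shadow`` is a hypothetical helper, the code buffer is
modeled as a ``bytearray``, and the patch bytes in the example are
placeholders rather than real instruction encodings.

.. code-block:: python

  def patch_shadow(code, stack_map_offset, shadow_bytes, patch):
      """Overwrite the bytes at a stack map's offset with `patch`.

      code: bytearray holding the machine code of one function.
      stack_map_offset: instruction offset taken from the stack map record.
      shadow_bytes: the <numShadowBytes> operand given to the intrinsic.
      patch: replacement bytes, which must fit inside the shadow.
      """
      if len(patch) > shadow_bytes:
          raise ValueError("patch does not fit in the stack map shadow")
      end = stack_map_offset + len(patch)
      if end > len(code):
          raise ValueError("patch extends past the end of the code buffer")
      # Same-length slice assignment keeps the buffer size unchanged.
      code[stack_map_offset:end] = patch
      return code


  # Hypothetical 16-byte function with a stack map at offset 0x05 and an
  # 8-byte shadow; 0xAA stands in for real instruction bytes.
  code = bytearray(16)
  patch_shadow(code, 0x05, 8, b"\xaa" * 7)

Rejecting a patch that is longer than the shadow mirrors the guarantee
described above: only the shadow bytes are known to stay inside the
current basic block and clear of other stack maps.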

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                           i8* <target>, i32 <numArgs>, ...)
      declare i64
        @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                          i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap.*``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention.
The remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.

This example shows a patch point reserving 15 bytes, with one argument
in %rdi, and a return value in %rax per native calling convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15-bytes
  0x0f addq    $0x3, %rax
  0x10 movl    %rax, 8(%rsp)

Note that no stack map locations will be recorded.
If the patched code sequence does not need arguments fixed to specific
calling convention registers, then the ``anyregcc`` convention may be
used:

.. code-block:: none

  %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
                                                     i8* %target, i32 1,
                                                     i64* %ptr)

The stack map now indicates the location of the %ptr argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in %r8 and return a value allocated in %r9:

.. code-block:: none

  0x00 movslq 4(%r8), %r9             <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                            <--- end of reserved 15-bytes
  0x0f addq $0x3, %r9
  0x10 movl %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 1)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (location flags)
      uint16 : Dwarf RegNum
      int32  : Offset or SmallConstant
    }
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 bytes)
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
-------- ------------- ------------------- ---------------------------
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``.
If an ``alloca`` value is passed directly to a stack map intrinsic,
then LLVM may fold the frame index into the stack map as an
optimization to avoid allocating a register or stack slot. These frame
indices will be encoded as ``Direct`` locations in the form
``BP + Offset``. LLVM may also optimize constants by emitting them
directly in the stack map, either in the ``Offset`` of a ``Constant``
location or in the constant pool, referred to by ``ConstantIndex``
locations.

At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format.
Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

On Darwin, the stack map section name is "__llvm_stackmaps". The
segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, which LLVM preserves.

Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map.
For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow metadata to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-section`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the llvm.gcroot intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.
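
As a rough illustration of that check, and of the version 1 layout
given in the Stack Map Format section, the following Python sketch
decodes a stack map section and tests whether a location is Direct.
It is not part of LLVM: the names ``parse_stack_maps`` and
``is_direct`` are hypothetical, and the sketch assumes a little-endian
target and that record padding is computed relative to the start of
the section.

.. code-block:: python

  import struct
  from collections import namedtuple

  # Location type encodings from the table in the Stack Map Format section.
  LOCATION_TYPES = {1: "Register", 2: "Direct", 3: "Indirect",
                    4: "Constant", 5: "ConstantIndex"}

  Location = namedtuple("Location", "type regnum offset")
  LiveOut = namedtuple("LiveOut", "regnum size")
  Record = namedtuple("Record", "id instruction_offset locations liveouts")
  StackMaps = namedtuple("StackMaps", "functions constants records")


  def parse_stack_maps(data):
      """Parse a version 1 stack map section held in a bytes-like object."""
      # Assumption: little-endian layout, as on x86-64.
      version, _reserved8, _reserved16 = struct.unpack_from("<BBH", data, 0)
      if version != 1:
          raise ValueError("unsupported stack map version: %d" % version)
      counts = struct.unpack_from("<III", data, 4)
      num_functions, num_constants, num_records = counts
      pos = 16

      functions = []  # (function address, stack size) pairs
      for _ in range(num_functions):
          functions.append(struct.unpack_from("<QQ", data, pos))
          pos += 16

      constants = []  # pool referenced by ConstantIndex locations
      for _ in range(num_constants):
          constants.append(struct.unpack_from("<Q", data, pos)[0])
          pos += 8

      records = []
      for _ in range(num_records):
          rec_id, offset, _flags, num_locations = struct.unpack_from(
              "<QIHH", data, pos)
          pos += 16
          locations = []
          for _ in range(num_locations):
              kind, _lflags, regnum, value = struct.unpack_from(
                  "<BBHi", data, pos)
              locations.append(Location(LOCATION_TYPES[kind], regnum, value))
              pos += 8
          _padding, num_liveouts = struct.unpack_from("<HH", data, pos)
          pos += 4
          liveouts = []
          for _ in range(num_liveouts):
              regnum, _reserved, size = struct.unpack_from("<HBB", data, pos)
              liveouts.append(LiveOut(regnum, size))
              pos += 4
          # Skip the optional padding that realigns the next record to 8 bytes.
          pos = (pos + 7) & ~7
          records.append(Record(rec_id, offset, locations, liveouts))
      return StackMaps(functions, constants, records)


  def is_direct(record, index):
      """Check that the index-th location of a record is a Direct location."""
      return record.locations[index].type == "Direct"

A runtime could call ``parse_stack_maps`` on the bytes of the section
it captured through its memory manager callback, look up the record
for a given ID, and use ``is_direct`` to confirm that an alloca passed
to the intrinsic was recorded as a frame address rather than as a
spilled or register value.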