===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. To LLVM, patch points
look much like calls. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patch point and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project (see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_), but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. ``GCFunctionMetadata`` provides the location of
pointers into a collected heap captured by the ``GCRoot`` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the ``GCFunctionMetadata`` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``).

The choice of which to use depends on whether it is necessary to
reserve space for code patching and whether any of the intrinsic
arguments should be lowered according to calling conventions.
``llvm.experimental.stackmap`` does not reserve any space, nor does it
expect any call arguments. If the runtime patches code at the stack
map's address, it will destructively overwrite the program text. This
is unlike ``llvm.experimental.patchpoint``, which reserves space for
in-place patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the ``live
values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.

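Because LLVM does not check IDs for uniqueness, a runtime that reuses
an ID may want to keep every record that carries it. The following is
a minimal Python sketch of such a lookup table, assuming the records
have already been decoded into ``(id, instruction_offset)`` pairs. The
function name and the pair layout are illustrative and not part of
LLVM.

.. code-block:: python

  from collections import defaultdict


  def index_records_by_id(records):
      """Group (id, instruction_offset) pairs by ID, keeping duplicates."""
      by_id = defaultdict(list)
      for record_id, instruction_offset in records:
          # Records sharing an ID are all kept, in section order.
          by_id[record_id].append(instruction_offset)
      return dict(by_id)

A runtime that expects unique IDs could instead treat any list longer
than one element as an error in its own IR generator.
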
LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
                                                         i64* %ptr)
  %val = load i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl  $0xffff, %eax <--- patched code at stack map address
  0x0a callq *%rax         <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                           i8* <target>, i32 <numArgs>, ...)
      declare i64
        @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                          i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.

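The all-or-nothing patching rule can be sketched in Python. This is an
illustration only, not part of LLVM: ``patch_region`` and its
parameters are hypothetical names, the code buffer is modelled as a
``bytearray``, and the single-byte x86 ``nop`` (``0x90``) is assumed as
the padding instruction.

.. code-block:: python

  X86_NOP = 0x90  # assumption: single-byte x86 nop used as padding


  def patch_region(code, offset, num_bytes, new_code):
      """Overwrite all num_bytes reserved at offset, padding with nops."""
      if len(new_code) > num_bytes:
          raise ValueError("patch does not fit in the reserved space")
      if offset < 0 or offset + num_bytes > len(code):
          raise ValueError("reserved region lies outside the code buffer")
      padding = bytes([X86_NOP]) * (num_bytes - len(new_code))
      # Replace the whole region in one slice assignment, so nothing
      # that LLVM originally emitted in the reserved space survives.
      code[offset:offset + num_bytes] = bytes(new_code) + padding
      return code

A real runtime would additionally have to make the page writable and
synchronize with any thread that may be executing the region. Neither
concern is modelled here.
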
This example shows a patch point reserving 15 bytes, with one argument
in ``%rdi``, and a return value in ``%rax`` per native calling
convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15-bytes
  0x0f addq    $0x3, %rax
  0x10 movq    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: llvm

  %val = call anyregcc i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1,
                                             i64* %ptr)

The stack map now indicates the locations of the return value and of
the ``%ptr`` argument:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in ``%r8`` and return a value allocated in ``%r9``:

.. code-block:: none

  0x00 movslq 4(%r8), %r9             <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                            <--- end of reserved 15-bytes
  0x0f addq    $0x3, %r9
  0x10 movq    %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 1)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (location flags)
      uint16 : Dwarf RegNum
      int32  : Offset or SmallConstant
    }
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 bytes)
  }

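As a rough illustration of the layout above, the following Python
sketch reads the header, the three counts, the ``StkSizeRecord`` array
and the ``Constants`` array from a version 1 section. The function name
and return shape are hypothetical, and little-endian byte order is an
assumption (it matches x86-64, but the section uses the target's byte
order).

.. code-block:: python

  import struct


  def parse_stackmap_prologue(data):
      """Parse everything that precedes the StkMapRecord array.

      Returns (functions, constants, num_records, offset_of_first_record).
      """
      # Header: version byte plus three reserved bytes.
      version, _reserved8, _reserved16 = struct.unpack_from("<BBH", data, 0)
      if version != 1:
          raise ValueError("unsupported stack map version: %d" % version)
      num_functions, num_constants, num_records = struct.unpack_from(
          "<III", data, 4)
      pos = 16  # 4-byte header + three uint32 counts
      functions = []
      for _ in range(num_functions):
          address, stack_size = struct.unpack_from("<QQ", data, pos)
          functions.append((address, stack_size))
          pos += 16
      constants = list(struct.unpack_from("<%dQ" % num_constants, data, pos))
      pos += 8 * num_constants
      return functions, constants, num_records, pos

The returned offset is where the first ``StkMapRecord`` begins, so a
record parser can continue from there.
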
The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
======== ============= =================== ===========================
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.

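The five location types can be decoded with a few lines of Python. This
is a sketch under stated assumptions: little-endian byte order, a
hypothetical ``describe_location`` helper, and registers printed simply
as ``r<DWARF number>`` rather than by their target-specific names.

.. code-block:: python

  import struct

  # Location type encodings from the table above.
  REGISTER, DIRECT, INDIRECT, CONSTANT, CONSTANT_INDEX = 1, 2, 3, 4, 5


  def describe_location(entry, constants):
      """Return a readable description of one 8-byte Location entry."""
      kind, _flags, regnum, offset = struct.unpack("<BBHi", entry)
      if kind == REGISTER:
          return "value in register r%d" % regnum
      if kind == DIRECT:
          # The address itself is the value (a folded frame index).
          return "value is the address r%d%+d" % (regnum, offset)
      if kind == INDIRECT:
          # The value must be loaded from the stack slot.
          return "value is loaded from [r%d%+d]" % (regnum, offset)
      if kind == CONSTANT:
          return "constant %d" % offset
      if kind == CONSTANT_INDEX:
          return "constant %d" % constants[offset]
      raise ValueError("unknown location type: %d" % kind)

Here ``constants`` is the list read from the ``Constants`` array of the
section, which is only consulted for ``ConstantIndex`` locations.
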
At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

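The conservative interpretation of liveout entries can be sketched as
follows. Both helpers are hypothetical, little-endian byte order is
assumed, and the set of widths a runtime is able to save per register
class is supplied by the caller rather than derived from the stack map.

.. code-block:: python

  import struct


  def parse_liveouts(data, num_liveouts, pos=0):
      """Decode (Dwarf RegNum, Size in Bytes) pairs from LiveOuts entries."""
      liveouts = []
      for _ in range(num_liveouts):
          regnum, _reserved, size = struct.unpack_from("<HBB", data, pos)
          liveouts.append((regnum, size))
          pos += 4  # each entry is uint16 + uint8 + uint8
      return liveouts, pos


  def save_width(size, widths):
      """Pick the narrowest width the runtime can save that covers size."""
      candidates = [w for w in widths if w >= size]
      if not candidates:
          raise ValueError("no save width covers %d bytes" % size)
      return min(candidates)

With widths of ``(8,)`` for a general-purpose register, a one-byte
liveout still results in an 8-byte save. With widths of ``(16, 32)``
for a vector register, a 16-byte liveout needs only the 16-byte save.
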
The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

On Darwin, the stack map section name is "__llvm_stackmaps". The
segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, which LLVM preserves.

Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as ``readonly``. In the future we may
also allow metadata to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.

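The verification step described above amounts to a single check on the
location's type byte. A minimal Python sketch, assuming little-endian
byte order and the 8-byte ``Location`` layout from the stack map format
(the helper name is hypothetical):

.. code-block:: python

  import struct

  DIRECT = 0x2  # "Direct" encoding from the location type table


  def alloca_frame_offset(entry):
      """Return (regnum, offset) for a Direct location, else raise."""
      kind, _flags, regnum, offset = struct.unpack("<BBHi", entry)
      if kind != DIRECT:
          # Some transformation replaced the alloca with another value.
          raise ValueError("alloca was not recorded as a Direct location")
      return regnum, offset

Because a Direct location does not depend on run-time state, this check
can be made as soon as the stack map section has been parsed.
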