Home | History | Annotate | Download | only in docs
      1 ===========================
      2 LLVM Branch Weight Metadata
      3 ===========================
      4 
      5 .. contents::
      6    :local:
      7 
      8 Introduction
      9 ============
     10 
     11 Branch Weight Metadata represents branch weights as its likeliness to be taken
     12 (see :doc:`BlockFrequencyTerminology`). Metadata is assigned to the
     13 ``TerminatorInst`` as a ``MDNode`` of the ``MD_prof`` kind. The first operator
     14 is always a ``MDString`` node with the string "branch_weights".  Number of
     15 operators depends on the terminator type.
     16 
     17 Branch weights might be fetch from the profiling file, or generated based on
     18 `__builtin_expect`_ instruction.
     19 
     20 All weights are represented as an unsigned 32-bit values, where higher value
     21 indicates greater chance to be taken.
     22 
     23 Supported Instructions
     24 ======================
     25 
     26 ``BranchInst``
     27 ^^^^^^^^^^^^^^
     28 
     29 Metadata is only assigned to the conditional branches. There are two extra
     30 operands for the true and the false branch.
     31 
     32 .. code-block:: none
     33 
     34   !0 = metadata !{
     35     metadata !"branch_weights",
     36     i32 <TRUE_BRANCH_WEIGHT>,
     37     i32 <FALSE_BRANCH_WEIGHT>
     38   }
     39 
     40 ``SwitchInst``
     41 ^^^^^^^^^^^^^^
     42 
     43 Branch weights are assigned to every case (including the ``default`` case which
     44 is always case #0).
     45 
     46 .. code-block:: none
     47 
     48   !0 = metadata !{
     49     metadata !"branch_weights",
     50     i32 <DEFAULT_BRANCH_WEIGHT>
     51     [ , i32 <CASE_BRANCH_WEIGHT> ... ]
     52   }
     53 
     54 ``IndirectBrInst``
     55 ^^^^^^^^^^^^^^^^^^
     56 
     57 Branch weights are assigned to every destination.
     58 
     59 .. code-block:: none
     60 
     61   !0 = metadata !{
     62     metadata !"branch_weights",
     63     i32 <LABEL_BRANCH_WEIGHT>
     64     [ , i32 <LABEL_BRANCH_WEIGHT> ... ]
     65   }
     66 
     67 ``CallInst``
     68 ^^^^^^^^^^^^^^^^^^
     69 
     70 Calls may have branch weight metadata, containing the execution count of
     71 the call. It is currently used in SamplePGO mode only, to augment the
     72 block and entry counts which may not be accurate with sampling.
     73 
     74 .. code-block:: none
     75 
     76   !0 = metadata !{
     77     metadata !"branch_weights",
     78     i32 <CALL_BRANCH_WEIGHT>
     79   }
     80 
     81 Other
     82 ^^^^^
     83 
     84 Other terminator instructions are not allowed to contain Branch Weight Metadata.
     85 
     86 .. _\__builtin_expect:
     87 
     88 Built-in ``expect`` Instructions
     89 ================================
     90 
     91 ``__builtin_expect(long exp, long c)`` instruction provides branch prediction
     92 information. The return value is the value of ``exp``.
     93 
     94 It is especially useful in conditional statements. Currently Clang supports two
     95 conditional statements:
     96 
     97 ``if`` statement
     98 ^^^^^^^^^^^^^^^^
     99 
    100 The ``exp`` parameter is the condition. The ``c`` parameter is the expected
    101 comparison value. If it is equal to 1 (true), the condition is likely to be
    102 true, in other case condition is likely to be false. For example:
    103 
    104 .. code-block:: c++
    105 
    106   if (__builtin_expect(x > 0, 1)) {
    107     // This block is likely to be taken.
    108   }
    109 
    110 ``switch`` statement
    111 ^^^^^^^^^^^^^^^^^^^^
    112 
    113 The ``exp`` parameter is the value. The ``c`` parameter is the expected
    114 value. If the expected value doesn't show on the cases list, the ``default``
    115 case is assumed to be likely taken.
    116 
    117 .. code-block:: c++
    118 
    119   switch (__builtin_expect(x, 5)) {
    120   default: break;
    121   case 0:  // ...
    122   case 3:  // ...
    123   case 5:  // This case is likely to be taken.
    124   }
    125 
    126 CFG Modifications
    127 =================
    128 
    129 Branch Weight Metatada is not proof against CFG changes. If terminator operands'
    130 are changed some action should be taken. In other case some misoptimizations may
    131 occur due to incorrect branch prediction information.
    132 
    133 Function Entry Counts
    134 =====================
    135 
    136 To allow comparing different functions during inter-procedural analysis and
    137 optimization, ``MD_prof`` nodes can also be assigned to a function definition.
    138 The first operand is a string indicating the name of the associated counter.
    139 
    140 Currently, one counter is supported: "function_entry_count". The second operand
    141 is a 64-bit counter that indicates the number of times that this function was
    142 invoked (in the case of instrumentation-based profiles). In the case of
    143 sampling-based profiles, this operand is an approximation of how many times
    144 the function was invoked.
    145 
    146 For example, in the code below, the instrumentation for function foo()
    147 indicates that it was called 2,590 times at runtime.
    148 
    149 .. code-block:: llvm
    150 
    151   define i32 @foo() !prof !1 {
    152     ret i32 0
    153   }
    154   !1 = !{!"function_entry_count", i64 2590}
    155 
    156 If "function_entry_count" has more than 2 operands, the later operands are
    157 the GUID of the functions that needs to be imported by ThinLTO. This is only
    158 set by sampling based profile. It is needed because the sampling based profile
    159 was collected on a binary that had already imported and inlined these functions,
    160 and we need to ensure the IR matches in the ThinLTO backends for profile
    161 annotation. The reason why we cannot annotate this on the callsite is that it
    162 can only goes down 1 level in the call chain. For the cases where
    163 foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we will need to go down 2 levels
    164 in the call chain to import both bar_in_b_cc and baz_in_c_cc.
    165