==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family through the current Volcanic Islands (GCN Gen 3) family.


Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

   ============= ============================================
   Address Space Memory Space
   ============= ============================================
   0             Private
   1             Global
   2             Constant
   3             Local
   4             Generic (Flat)
   5             Region
   ============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, see test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs).  These
all apply to the Southern Islands ISA.  Sea Islands and Volcanic Islands
are also supported, but may be missing some instructions and have more bugs.

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets.  All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.
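
As a rough illustration of the supported forms, the lines below sketch a few
s_load_dword* variants.  The register numbers and offsets are arbitrary
placeholders chosen for this example, not values taken from a real kernel:

.. code-block:: nasm

   ; Load one dword; the base address is an SGPR pair and the offset is an
   ; immediate.
   s_load_dword s1, s[2:3], 0x1

   ; Wider loads take a correspondingly wider destination register range.
   s_load_dwordx2 s[4:5], s[2:3], 0x1
   s_load_dwordx4 s[8:11], s[2:3], 0x1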

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only.  No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.
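
A short sketch of what this looks like in practice follows.  The operand
values are arbitrary and only illustrate the integer-operand form; because the
assembler does not range-check them, choosing valid values is left to the
programmer:

.. code-block:: nasm

   ; SOPP instructions with a single integer operand.
   s_nop 1
   s_sleep 10
   s_setprio 1

   ; SOPP instructions without operands are written on their own.
   s_barrier
   s_endpgm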

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   ; Wait for all counters to be 0
   s_waitcnt 0

   ; Equivalent to s_waitcnt 0.  Counter names can also be delimited by
   ; '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   ; Wait for vmcnt counter to be 1.
   s_waitcnt vmcnt(1)
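
To make the delimiter forms concrete, the following sketch writes the same
wait with each of the two separators described for s_waitcnt.  Both lines are
intended to be equivalent to the space-separated form:

.. code-block:: nasm

   ; Counter names delimited by '&'.
   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)

   ; Counter names delimited by ','.
   s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0)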

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands.  If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction.  Most, but not all,
instructions support an explicit suffix.  These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.  This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings.  *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.
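
For comparison with the argument-free form, here is a sketch of the directive
with all five arguments spelled out.  The ISA version 7,0,0 is only an
illustrative value assumed for this example; use the version that matches
your target:

.. code-block:: nasm

   .hsa_code_object_version 1,0

   ; Explicit ISA version, followed by the fixed vendor and arch strings.
   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"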

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive.  For
any amd_kernel_code_t values that are unspecified, a default value will be
used.  The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4.  Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^\ **n**.
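
As a small worked example of the power-of-two convention, the fragment below
overrides one alignment key and leaves the others at their defaults.  The
value 5 is an arbitrary choice for illustration:

.. code-block:: nasm

   .amd_kernel_code_t
      ; 2^5 = 32-byte alignment for the kernarg segment.
      kernarg_segment_alignment = 5
      ; group_segment_alignment and private_segment_alignment are left
      ; unspecified, so they keep the default of 4 (2^4 = 16 bytes).
   .end_amd_kernel_code_t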

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s.  For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl  hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1] 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size   hello_world, .Lfunc_end0-hello_world