==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).

Assembler
=========

The assembler is currently considered experimental.

For syntax examples, see the tests in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs).  These
all apply to the Southern Islands ISA.  Sea Islands and Volcanic Islands
are also supported, but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.
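
As a rough sketch (the register numbers are arbitrary, and the operand order
shown, address first and then data for writes, is an assumption that should be
checked against the tests in test/MC/AMDGPU), a DS write followed by a read of
the same LDS address might look like this:

.. code-block:: nasm

   // Write the dword in v2 to the LDS address held in v1
   ds_write_b32 v1, v2
   // Read the dword at the LDS address in v1 back into v3
   ds_read_b32 v3, v1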

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets.  All FLAT instructions are supported for these architectures.
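
For illustration, a FLAT load takes its 64-bit address from a pair of VGPRs.
A minimal sketch, with arbitrarily chosen registers:

.. code-block:: nasm

   // Load the dword at the flat address in v[3:4] into v1
   flat_load_dword v1, v[3:4]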

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.
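
A sketch of two such loads, assuming the base address is held in the SGPR
pair s[2:3] (an arbitrary choice for illustration):

.. code-block:: nasm

   // Load one dword from s[2:3] plus an immediate offset of 0x1 into s1
   s_load_dword s1, s[2:3], 0x1
   // Load four consecutive dwords from s[2:3] into s[4:7]
   s_load_dwordx4 s[4:7], s[2:3], 0x0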

SOP1 Instructions
-----------------
All SOP1 instructions are supported.
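
A few SOP1 instructions, as a sketch with arbitrary register numbers; the
64-bit variants operate on SGPR pairs:

.. code-block:: nasm

   // Copy s1 into s0
   s_mov_b32 s0, s1
   // Bitwise NOT of s3, written to s2
   s_not_b32 s2, s3
   // Copy a 64-bit value from s[2:3] into s[0:1]
   s_mov_b64 s[0:1], s[2:3]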

SOP2 Instructions
-----------------
All SOP2 instructions are supported.
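
Similarly, a sketch of two SOP2 instructions, each taking two source operands
(registers chosen arbitrarily):

.. code-block:: nasm

   // s0 = s1 + s2; the carry-out is written to SCC
   s_add_u32 s0, s1, s2
   // 64-bit bitwise AND of two SGPR pairs
   s_and_b64 s[0:1], s[2:3], s[4:5]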

SOPC Instructions
-----------------
All SOPC instructions are supported.
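
SOPC instructions compare two scalar operands and write the result to the SCC
bit.  A minimal sketch, with arbitrary registers:

.. code-block:: nasm

   // Set SCC to 1 if s0 == s1 (compared as signed 32-bit integers), else 0
   s_cmp_eq_i32 s0, s1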

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only.  No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.
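
For example, each of the following SOPP instructions takes a single integer
operand.  The operand values here are arbitrary examples; consult the ISA
documentation for the valid range of each one:

.. code-block:: nasm

   // Each operand is a plain integer literal
   s_nop 0
   s_sleep 2
   s_setprio 3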

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   // Wait for all counters to be 0
   s_waitcnt 0

   // Equivalent to s_waitcnt 0.  Counter names can also be delimited by
   // '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   // Wait until vmcnt is at most 1, i.e. no more than one vector memory
   // operation is still outstanding.
   s_waitcnt vmcnt(1)
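
The delimiter note in the comments above can be illustrated with a short
sketch.  Per that note, the first two lines below should be equivalent to
s_waitcnt 0; the third waits only on lgkmcnt, the counter for LDS, GDS, scalar
(constant) memory, and message operations:

.. code-block:: nasm

   // Counters delimited by '&'
   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)
   // Counters delimited by ','
   s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0)
   // Wait on lgkmcnt only
   s_waitcnt lgkmcnt(0)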

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands.  If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction.  Most, but not all,
instructions support an explicit suffix.  These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3
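
Operand-based encoding selection is easiest to see with VOPC.  In the 32-bit
encoding a VOPC comparison can only write its result to vcc, so any other
destination requires the 64-bit encoding.  A sketch, with arbitrary registers:

.. code-block:: nasm

   // Destination is vcc, so the 32-bit encoding can be used
   v_cmp_eq_i32 vcc, v1, v2
   // Destination is another SGPR pair, so the assembler should select
   // the 64-bit (_e64) encoding
   v_cmp_eq_i32 s[0:1], v1, v2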

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.  This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings.  *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

The ISA version, *vendor*, and *arch* will all be stored in a single entry of
the .note section.
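
As a sketch, the two forms of the directive are shown below as alternatives (a
real file would use only one).  The explicit values 7,0,0 assume a target whose
ISA version is 7.0.0, such as the Sea Islands part kaveri; substitute the
version of your own target.  The llvm-mc invocation in the comment is likewise
an assumed example:

.. code-block:: nasm

   // Explicit form: ISA version 7.0.0, vendor "AMD", arch "AMDGPU"
   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"

   // Implicit form: the values are derived from -mcpu, e.g. when running
   // llvm-mc -triple amdgcn--amdhsa -mcpu=kaveri
   .hsa_code_object_isa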

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive.  For
any amd_kernel_code_t values that are unspecified, a default value will be
used.  The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.  Like the alignments below, it is specified
  as a power of two, so 6 means a wavefront of 2^6 = 64 work-items.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4.  Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n** bytes;
  the default of 4 therefore means 16-byte alignment.

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.
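
To make the defaults listed above concrete, the following sketch spells them
out explicitly, placed directly after a function label.  The label name
my_kernel is hypothetical, and the machine_version_* keys are omitted because
they are derived from -mcpu.  Assuming the defaults are exactly as listed, this
block should be equivalent to an empty *.amd_kernel_code_t* block
(wavefront_size = 6 gives 64 work-items, and each alignment of 4 gives 16
bytes):

.. code-block:: nasm

   // my_kernel is a hypothetical label used only for illustration
   my_kernel:
      .amd_kernel_code_t
         kernel_code_version_major = 1
         machine_kind = 1
         kernel_code_entry_byte_offset = 256
         wavefront_size = 6
         kernarg_segment_alignment = 4
         group_segment_alignment = 4
         private_segment_alignment = 4
      .end_amd_kernel_code_t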

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s.  For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .text

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      // s[0:1] holds the kernarg segment pointer; load the 64-bit pointer
      // argument stored at offset 0 of the kernarg segment
      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      // Wait for the scalar load to finish before reading s[0:1]
      s_waitcnt lgkmcnt(0)
      // Copy the pointer into a VGPR pair to use as the FLAT address
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      // Store the value in v0 to the address in v[1:2]
      flat_store_dword v0, v[1:2]
      s_endpgm