==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).


Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

============= ============================================
Address Space Memory Space
============= ============================================
0             Private
1             Global
2             Constant
3             Local
4             Generic (Flat)
5             Region
============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, look in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs). These
all apply to the Southern Islands ISA. Sea Islands and Volcanic Islands
are also supported, but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets. All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the ``s_load_dword*`` SMRD instructions are supported.

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   ; Wait for all counters to be 0
   s_waitcnt 0

   ; Equivalent to s_waitcnt 0. Counter names can also be delimited by
   ; '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   ; Wait for vmcnt counter to be 1.
   s_waitcnt vmcnt(1)

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an ``_e32`` (for 32-bit encoding) or
``_e64`` (for 64-bit encoding) suffix to the instruction. Most, but not all,
instructions support an explicit suffix. These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. This version is stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.
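
The code object version set by *.hsa_code_object_version* ends up as an entry
in the .note section. To make that concrete, the following Python sketch packs
``.hsa_code_object_version 1,0`` into a note entry, assuming the standard ELF
note layout (``namesz``, ``descsz``, ``type``, then the NUL-terminated owner
name and the descriptor, each padded to 4 bytes). The owner name ``"AMD"``, the
note type value, and the descriptor layout (two little-endian 32-bit words) are
assumptions for illustration only; the authoritative encoding is whatever the
AMDGPU back-end actually emits.

.. code-block:: python

   import struct

   # Hypothetical note type for the code object version entry; the real
   # value is defined by the AMDGPU back-end, not by this sketch.
   NT_CODE_OBJECT_VERSION = 1


   def pad4(data: bytes) -> bytes:
       """Pad with NUL bytes up to a multiple of 4 bytes."""
       return data + b"\0" * (-len(data) % 4)


   def make_note(owner: str, note_type: int, desc: bytes) -> bytes:
       """Pack one ELF note: namesz, descsz, type, owner name, descriptor."""
       name = owner.encode("ascii") + b"\0"  # namesz counts the trailing NUL
       header = struct.pack("<III", len(name), len(desc), note_type)
       return header + pad4(name) + pad4(desc)


   def code_object_version_note(major: int, minor: int) -> bytes:
       """Sketch of the entry for `.hsa_code_object_version major, minor`."""
       desc = struct.pack("<II", major, minor)  # assumed descriptor layout
       return make_note("AMD", NT_CODE_OBJECT_VERSION, desc)


   note = code_object_version_note(1, 0)
   print(note.hex())  # 040000000800000001000000414d44000100000000000000

Under these assumptions the entry is 24 bytes: a 12-byte header, the padded
owner name ``AMD\0``, and the two version words.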

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key/value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. Any
amd_kernel_code_t value that is left unspecified takes a default value. The
default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2\ :sup:`n`
  bytes (the default of 4 therefore means 16 bytes).

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size hello_world, .Lfunc_end0-hello_world
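
The defaulting rules for *.amd_kernel_code_t* can be summarized in a few lines.
The Python sketch below is not part of LLVM; it only models the documented
behavior: keys you specify override the defaults, the listed keys have non-zero
defaults, the *machine_version_** keys come from -mcpu (represented here by a
hypothetical ``isa_version`` argument), and every other key is 0. The
``from_log2`` helper decodes the power-of-two alignment fields. Applying the
same decoding to *wavefront_size* (6 meaning 64 lanes) is an assumption here;
check the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

.. code-block:: python

   from typing import Dict, Tuple

   # Documented non-zero defaults; every other amd_kernel_code_t key is 0.
   NON_ZERO_DEFAULTS: Dict[str, int] = {
       "kernel_code_version_major": 1,
       "machine_kind": 1,
       "kernel_code_entry_byte_offset": 256,
       "wavefront_size": 6,
       "kernarg_segment_alignment": 4,
       "group_segment_alignment": 4,
       "private_segment_alignment": 4,
   }


   def resolve_kernel_code(explicit: Dict[str, int],
                           isa_version: Tuple[int, int, int]) -> Dict[str, int]:
       """Merge explicit key/value pairs over the documented defaults.

       `isa_version` is a hypothetical stand-in for the (major, minor,
       stepping) triple the assembler derives from -mcpu. Keys missing
       from the result are implicitly 0.
       """
       major, minor, stepping = isa_version
       values = dict(NON_ZERO_DEFAULTS)
       values["machine_version_major"] = major
       values["machine_version_minor"] = minor
       values["machine_version_stepping"] = stepping
       values.update(explicit)  # explicitly specified keys always win
       return values


   def from_log2(n: int) -> int:
       """Decode a power-of-two field: a stored value of n means 2**n."""
       return 1 << n


   # Arbitrary example ISA version; not tied to any particular -mcpu value.
   kc = resolve_kernel_code({"is_ptr64": 1, "kernarg_segment_byte_size": 8},
                            (7, 0, 0))
   print(from_log2(kc["kernarg_segment_alignment"]))  # 16
   print(kc.get("compute_pgm_rsrc1_vgprs", 0))        # 0

Keeping only the non-zero defaults in the table mirrors the rule stated above:
anything not listed, and not specified in the directive, is simply 0.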