========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow writing flexible descriptions and
for common features of these records to be factored out. This reduces the
amount of duplication in the description, reduces the chance of error, and makes
it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator`
and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen much and use emacs or vim, you can find an emacs
"TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in your build directory under ``bin``. It is not installed in the
system (or where your sysroot is set to), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read. If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used. These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list).
For example, to get a list of all of the definitions that subclass a particular
type (which can be useful for building up an enum list of these records), use
the ``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions. This is a good way to see
what the various definitions fully expand to. Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: llvm

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    bit neverHasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.
The body of the record contains all of the data that TableGen assembled for the
record: that the instruction is part of the "X86" namespace, the pattern
describing how the instruction is selected by the code generator, that it is a
two-address instruction, that it has a particular encoding, and so on. The
contents and semantics of the information in the record are specific to the
needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: llvm

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Each ``def`` record has a special entry called ``NAME``. This is the name of the
record ("``ADD32rr``" above). In the general case, ``def`` names can be formed
from various kinds of string processing expressions, and ``NAME`` resolves to
the final value obtained after resolving all of those expressions. Users may
refer to ``NAME`` anywhere they want to use the final name of the ``def``.
``NAME`` should not be defined anywhere else in user code to avoid conflicts.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification. In addition, TableGen's syntax introduces some
automation concepts like ``multiclass``, ``foreach``, ``let``, etc.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are fixed and enforced by TableGen.

**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: llvm

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values. Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in an included one. Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).
TableGen keeps track of all of the classes that are used to build up a
definition, so the backend can find all definitions of a particular class, such
as "Instruction".

.. code-block:: llvm

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin`` takes a parameter ``Name`` of type ``string`` and
a list of target features, and specializes the class ``Processor`` by passing
those arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: llvm

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;



See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a back-end.
The default operation of running ``llvm-tblgen`` is to print the information in
a textual format, but that's only useful for debugging the TableGen files
themselves. The power of TableGen, however, is that it interprets the source
files into an internal representation from which a back-end can generate
anything you want.

TableGen is currently used to generate huge include files containing tables
that you can either include directly (if the output is in the language you're
coding in) or use in pre-processing via macros surrounding the include of the
file.

Direct output can be used if the back-end already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names); in that case your back-end
should print a meta-information list that can be shaped into different
compile-time formats.

See the `TableGen BackEnds <BackEnds.html>`_ for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build Domain-Specific-Languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to create virtually any meaning of
the basic concepts via custom-made back-ends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

There are some in favour of extending the semantics even more, but making sure
back-ends adhere to strict rules. Others are suggesting we should move to fewer,
more powerful DSLs designed with specific purposes, or even re-using existing
DSLs.


Either way, this is a discussion that will likely span several years,
if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.