===========================
TableGen Language Reference
===========================

.. sectionauthor:: Sean Silva <silvas@purdue.edu>

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvmdev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read :doc:`/TableGenFundamentals` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; . = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*

Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators" which have a
wide variety of meanings:

.. productionlist::
   BangOperator: one of
               :!eq     !if      !head    !tail      !con
               :!add    !shl     !sra     !srl
               :!cast   !empty   !subst   !foreach   !strconcat

Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`

TableGen's top-level production consists of "objects".

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`

A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.

.. productionlist::
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is pretty much what you would expect as a C++
programmer.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

It assigns the value to the identifier.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in
some contexts.


:token:`SimpleValue` has a number of forms:


.. productionlist::
   SimpleValue: `TokIdentifier`

The value will be the variable referenced by the identifier. It can be one
of:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* name of a ``def``, such as the use of ``Bar`` in::

     def Bar : SomeClass {
       int X = 5;
     }

     def Foo {
       SomeClass Baz = Bar;
     }

* value local to a ``def``, such as the use of ``Bar`` in::

     def Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg of a ``class``, such as the use of ``Bar`` in::

     class Foo<int Bar> {
       int Baz = Bar;
     }

* value local to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo {
       int Bar = 5;
       int Baz = Bar;
     }

* a template arg to a ``multiclass``, such as the use of ``Bar`` in::

     multiclass Foo<int Bar> {
       int Baz = Bar;
     }

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated as in C/C++. The value
is the concatenation of the strings.

.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` `DagArgList` ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`] | `TokVarName`

The initial :token:`DagArg` is called the "operator" of the dag.

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [":" `BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

It is after parsing the base class list that the "let stack" is applied.

.. productionlist::
   Body: ";" | "{" BodyList "}"
   BodyList: BodyItem*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` [`RangeList`] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.

``def``
-------

.. TODO::
   There can be pastes in the names here, like ``#NAME#``. Look into that
   and document it (it boils down to ParseIDValue with IDParseMode ==
   ParseNameMode). ParseObjectName calls into the general ParseValue, with
   the only difference from "arbitrary expression parsing" being
   IDParseMode == Mode.

.. productionlist::
   Def: "def" `TokIdentifier` `ObjectBody`

Defines a record whose name is given by the :token:`TokIdentifier`. The
fields of the record are inherited from the base classes and defined in the
body.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

``defm``
--------

.. productionlist::
   Defm: "defm" `TokIdentifier` ":" `BaseClassListNE` ";"

Note that in the :token:`BaseClassList`, all of the ``multiclass``'s must
precede any ``class``'s that appear.

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
          :| "foreach" `Declaration` "in" `Object`

The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.

Top-Level ``let``
-----------------

.. productionlist::
   Let: "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` [`RangeList`] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time. The bindings are
applied at the end of parsing the base classes of a record.

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
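
As an illustration of how the ``multiclass`` and ``defm`` productions fit
together, the following is a minimal sketch and not an excerpt from any
real target description. The names ``Inst``, ``AsmString``, ``RegImm``, and
``ADD`` are hypothetical. It assumes the customary behavior in which the
``defm`` name is prefixed onto the name of each ``def`` inside the
multiclass, so the final ``defm`` is expected to yield two records named
``ADDrr`` and ``ADDri``::

   // Hypothetical base class with a single template argument.
   class Inst<string asm> {
     string AsmString = asm;
   }

   // Each def here is a MultiClassObject; its name is a suffix that is
   // combined with the name given at the defm site.
   multiclass RegImm<string mnemonic> {
     def rr : Inst<!strconcat(mnemonic, " reg, reg")>;
     def ri : Inst<!strconcat(mnemonic, " reg, imm")>;
   }

   // Instantiates the multiclass; the template argument is passed through
   // a SubClassRef in the BaseClassListNE of the defm.
   defm ADD : RegImm<"add">;

Writing the two ``def``\s out by hand would be equivalent in this small
case; the multiclass form is mainly useful when the same group of records
has to be repeated for many different template arguments.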