===========================
TableGen Language Reference
===========================

.. contents::
   :local:

.. warning::
   This document is extremely rough. If you find something lacking, please
   fix it, file a documentation bug, or ask about it on llvm-dev.

Introduction
============

This document is meant to be a normative spec about the TableGen language
in and of itself (i.e. how to understand a given construct in terms of how
it affects the final set of records represented by the TableGen file). If
you are unsure if this document is really what you are looking for, please
read the :doc:`introduction to TableGen <index>` first.

Notation
========

The lexical and syntax notation used here is intended to imitate
`Python's`_. In particular, for lexical definitions, the productions
operate at the character level and there is no implied whitespace between
elements. The syntax definitions operate at the token level, so there is
implied whitespace between tokens.

.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation

Lexical Analysis
================

TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
comments.

The following is a listing of the basic punctuation tokens::

   - + [ ] { } ( ) < > : ; .  = ? #

Numeric literals take one of the following forms:

.. TableGen actually will lex some pretty strange sequences and interpret
   them as numbers. What is shown here is an attempt to approximate what it
   "should" accept.

.. productionlist::
   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
   DecimalInteger: ["+" | "-"] ("0"..."9")+
   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
   BinInteger: "0b" ("0" | "1")+

One aspect to note is that the :token:`DecimalInteger` token *includes* the
``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
most languages do.

Also note that :token:`BinInteger` creates a value of type ``bits<n>``
(where ``n`` is the number of bits). This will implicitly convert to
integers when needed.

TableGen has identifier-like tokens:

.. productionlist::
   ualpha: "a"..."z" | "A"..."Z" | "_"
   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*

Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
begin with a number. In case of ambiguity, a token will be interpreted as a
numeric literal rather than an identifier.

TableGen also has two string-like literals:

.. productionlist::
   TokString: '"' <non-'"' characters and C-like escapes> '"'
   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"

:token:`TokCodeFragment` is essentially a multiline string literal
delimited by ``[{`` and ``}]``.

.. note::
   The current implementation accepts the following C-like escapes::

      \\ \' \" \t \n

TableGen also has the following keywords::

   bit   bits      class   code         dag
   def   foreach   defm    field        in
   int   let       list    multiclass   string

TableGen also has "bang operators" which have a
wide variety of meanings:

.. productionlist::
   BangOperator: one of
               :!eq     !if      !head    !tail      !con
               :!add    !shl     !sra     !srl       !and
               :!or     !empty   !subst   !foreach   !strconcat
               :!cast   !listconcat       !size      !foldl
               :!isa    !dag     !le      !lt        !ge
               :!gt     !ne


Syntax
======

TableGen has an ``include`` mechanism. It does not play a role in the
syntax per se, since it is lexically replaced with the contents of the
included file.

.. productionlist::
   IncludeDirective: "include" `TokString`

TableGen's top-level production consists of "objects".

.. productionlist::
   TableGenFile: `Object`*
   Object: `Class` | `Def` | `Defm` | `Defset` | `Let` | `MultiClass` |
           `Foreach`

``class``\es
------------

.. productionlist::
   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"

A ``class`` declaration creates a record which other records can inherit
from. A class can be parametrized by a list of "template arguments", whose
values can be used in the class body.

A given class can only be defined once. A ``class`` declaration is
considered to define the class if any of the following is true:

.. break ObjectBody into its constituents so that they are present here?

#. The :token:`TemplateArgList` is present.
#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.

You can declare an empty class by giving an empty :token:`TemplateArgList`
and an empty :token:`ObjectBody`. This can serve as a restricted form of
forward declaration: note that records deriving from the forward-declared
class will inherit no fields from it since the record expansion is done
when the record is parsed.

Every class has an implicit template argument called ``NAME``, which is set
to the name of the instantiating ``def`` or ``defm``. The result is undefined
if the class is instantiated by an anonymous record.

Declarations
------------

.. Omitting mention of arcane "field" prefix to discourage its use.

The declaration syntax is pretty much what you would expect as a C++
programmer.

.. productionlist::
   Declaration: `Type` `TokIdentifier` ["=" `Value`]

It assigns the value to the identifier.

Types
-----

.. productionlist::
   Type: "string" | "code" | "bit" | "int" | "dag"
       :| "bits" "<" `TokInteger` ">"
       :| "list" "<" `Type` ">"
       :| `ClassID`
   ClassID: `TokIdentifier`

Both ``string`` and ``code`` correspond to the string type; the difference
is purely to indicate programmer intention.

The :token:`ClassID` must identify a class that has been previously
declared or defined.

Values
------

.. productionlist::
   Value: `SimpleValue` `ValueSuffix`*
   ValueSuffix: "{" `RangeList` "}"
              :| "[" `RangeList` "]"
              :| "." `TokIdentifier`
   RangeList: `RangePiece` ("," `RangePiece`)*
   RangePiece: `TokInteger`
             :| `TokInteger` "-" `TokInteger`
             :| `TokInteger` `TokInteger`

The peculiar last form of :token:`RangePiece` is due to the fact that the
"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
instead of "1", "-", and "5".
The :token:`RangeList` can be thought of as specifying a "list slice" in
some contexts.


:token:`SimpleValue` has a number of forms:


.. productionlist::
   SimpleValue: `TokIdentifier`

The value will be the variable referenced by the identifier. It can be one
of:

.. The code for this is exceptionally abstruse. These examples are a
   best-effort attempt.

* name of a ``def``, such as the use of ``Bar`` in::

    def Bar : SomeClass {
      int X = 5;
    }

    def Foo {
      SomeClass Baz = Bar;
    }

* value local to a ``def``, such as the use of ``Bar`` in::

    def Foo {
      int Bar = 5;
      int Baz = Bar;
    }

  Values defined in superclasses can be accessed the same way.

* a template arg of a ``class``, such as the use of ``Bar`` in::

    class Foo<int Bar> {
      int Baz = Bar;
    }

* value local to a ``class``, such as the use of ``Bar`` in::

    class Foo {
      int Bar = 5;
      int Baz = Bar;
    }

* a template arg to a ``multiclass``, such as the use of ``Bar`` in::

    multiclass Foo<int Bar> {
      def : SomeClass<Bar>;
    }

* the iteration variable of a ``foreach``, such as the use of ``i`` in::

    foreach i = 0-5 in
    def Foo#i;

* a variable defined by ``defset``

* the implicit template argument ``NAME`` in a ``class`` or ``multiclass``

.. productionlist::
   SimpleValue: `TokInteger`

This represents the numeric value of the integer.

.. productionlist::
   SimpleValue: `TokString`+

Multiple adjacent string literals are concatenated like in C/C++. The value
is the concatenation of the strings.

.. productionlist::
   SimpleValue: `TokCodeFragment`

The value is the string value of the code fragment.

.. productionlist::
   SimpleValue: "?"

``?`` represents an "unset" initializer.

.. productionlist::
   SimpleValue: "{" `ValueList` "}"
   ValueList: [`ValueListNE`]
   ValueListNE: `Value` ("," `Value`)*

This represents a sequence of bits, as would be used to initialize a
``bits<n>`` field (where ``n`` is the number of bits).

.. productionlist::
   SimpleValue: `ClassID` "<" `ValueListNE` ">"

This generates a new anonymous record definition (as would be created by an
unnamed ``def`` inheriting from the given class with the given template
arguments) and the value is the value of that record definition.

.. productionlist::
   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]

A list initializer. The optional :token:`Type` can be used to indicate a
specific element type, otherwise the element type will be deduced from the
given values.

.. The initial `DagArg` of the dag must start with an identifier or
   !cast, but this is more of an implementation detail and so for now just
   leave it out.

.. productionlist::
   SimpleValue: "(" `DagArg` [`DagArgList`] ")"
   DagArgList: `DagArg` ("," `DagArg`)*
   DagArg: `Value` [":" `TokVarName`] | `TokVarName`

The initial :token:`DagArg` is called the "operator" of the dag.

.. productionlist::
   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"

Bodies
------

.. productionlist::
   ObjectBody: `BaseClassList` `Body`
   BaseClassList: [":" `BaseClassListNE`]
   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
   SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
   DefmID: `TokIdentifier`

The version with the :token:`MultiClassID` is only valid in the
:token:`BaseClassList` of a ``defm``.
The :token:`MultiClassID` should be the name of a ``multiclass``.

.. put this somewhere else

It is after parsing the base class list that the "let stack" is applied.

.. productionlist::
   Body: ";" | "{" `BodyList` "}"
   BodyList: `BodyItem`*
   BodyItem: `Declaration` ";"
           :| "let" `TokIdentifier` [ "{" `RangeList` "}" ] "=" `Value` ";"

The ``let`` form allows overriding the value of an inherited field.

``def``
-------

.. productionlist::
   Def: "def" [`Value`] `ObjectBody`

Defines a record whose name is given by the optional :token:`Value`. The value
is parsed in a special mode where global identifiers (records and variables
defined by ``defset``) are not recognized, and all unrecognized identifiers
are interpreted as strings.

If no name is given, the record is anonymous. The final name of anonymous
records is undefined, but globally unique.

Special handling occurs if this ``def`` appears inside a ``multiclass`` or
a ``foreach``.

When a non-anonymous record is defined in a multiclass and the given name
does not contain a reference to the implicit template argument ``NAME``, such
a reference will automatically be prepended. That is, the following are
equivalent inside a multiclass::

    def Foo;
    def NAME#Foo;

``defm``
--------

.. productionlist::
   Defm: "defm" [`Value`] ":" `BaseClassListNE` ";"

The :token:`BaseClassListNE` is a list of at least one ``multiclass`` and any
number of ``class``'s. The ``multiclass``'s must occur before any ``class``'s.

Instantiates all records defined in all given ``multiclass``'s and adds the
given ``class``'s as superclasses.

The name is parsed in the same special mode used by ``def``. If the name is
missing, a globally unique string is used instead (but instantiated records
are not considered to be anonymous, unless they were originally defined by an
anonymous ``def``). That is, the following have different semantics::

    defm : SomeMultiClass<...>;    // some globally unique name
    defm "" : SomeMultiClass<...>; // empty name string

When it occurs inside a multiclass, the second variant is equivalent to
``defm NAME : ...``.
More generally, when ``defm`` occurs in a multiclass and
its name does not contain a reference to the implicit template argument
``NAME``, such a reference will automatically be prepended. That is, the
following are equivalent inside a multiclass::

    defm Foo : SomeMultiClass<...>;
    defm NAME#Foo : SomeMultiClass<...>;

``defset``
----------

.. productionlist::
   Defset: "defset" `Type` `TokIdentifier` "=" "{" `Object`* "}"

All records defined inside the braces via ``def`` and ``defm`` are collected
in a globally accessible list of the given name (in addition to being added
to the global collection of records as usual). Anonymous records created inside
initializer expressions using the ``Class<args...>`` syntax are never collected
in a defset.

The given type must be ``list<A>``, where ``A`` is some class. It is an error
to define a record (via ``def`` or ``defm``) inside the braces which doesn't
derive from ``A``.

``foreach``
-----------

.. productionlist::
   Foreach: "foreach" `ForeachDeclaration` "in" "{" `Object`* "}"
          :| "foreach" `ForeachDeclaration` "in" `Object`
   ForeachDeclaration: `TokIdentifier` "=" ( "{" `RangeList` "}" | `RangePiece` | `Value` )

The value assigned to the variable in the declaration is iterated over and
the object or object list is reevaluated with the variable set at each
iterated value.

Note that the productions involving :token:`RangeList` and
:token:`RangePiece` have precedence over the more generic value parsing
based on the first token.

Top-Level ``let``
-----------------

.. productionlist::
   Let: "let" `LetList` "in" "{" `Object`* "}"
      :| "let" `LetList` "in" `Object`
   LetList: `LetItem` ("," `LetItem`)*
   LetItem: `TokIdentifier` [`RangeList`] "=" `Value`

This is effectively equivalent to ``let`` inside the body of a record
except that it applies to multiple records at a time.
The bindings are
applied at the end of parsing the base classes of a record.

``multiclass``
--------------

.. productionlist::
   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
             : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
   MultiClassID: `TokIdentifier`
   MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
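
As a rough illustration of how the ``multiclass`` production combines with
``defm``, the following sketch assumes a hypothetical class ``Inst`` and a
hypothetical multiclass ``RegImm``. Neither name comes from this document or
from any real target description; they exist only to make the example
self-contained::

   // Hypothetical base class, used only for illustration.
   class Inst<string mnemonic> {
     string Mnemonic = mnemonic;
   }

   // Hypothetical multiclass. Neither def name mentions NAME, so
   // NAME is prepended: these behave like NAME#rr and NAME#ri.
   multiclass RegImm<string mnemonic> {
     def rr : Inst<!strconcat(mnemonic, "_rr")>;
     def ri : Inst<!strconcat(mnemonic, "_ri")>;
   }

   // NAME is set to "ADD" for this instantiation.
   defm ADD : RegImm<"add">;

Under the naming rules given for ``def`` inside a multiclass, this ``defm``
should yield two records, ``ADDrr`` and ``ADDri``, whose ``Mnemonic`` fields
hold ``"add_rr"`` and ``"add_ri"`` respectively.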