========
TableGen
========

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   LangRef
   LangIntro
   Deficiencies

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, it is specifically designed to allow flexible descriptions to be
written and common features of these records to be factored out.  This reduces
the amount of duplication in the description, reduces the chance of error, and
makes it easier to structure domain-specific information.

The core part of TableGen parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.

The current major users of TableGen are :doc:`../CodeGenerator` and the
`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

If you work on TableGen much and use Emacs or Vim, you can find an Emacs
"TableGen mode" and a Vim language file in the ``llvm/utils/emacs`` and
``llvm/utils/vim`` directories of your LLVM distribution, respectively.

.. _intro:

The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in your build directory under ``bin``.  It is not installed in the
system (or wherever your sysroot points), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used.  These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list).  For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions.  This is a good way to see
what the various definitions expand to fully.  Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: llvm

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    bit neverHasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named ``ADD32rr``,
and the comment at the end of the line indicates the superclasses of the
definition.  The body of the record contains all of the data that TableGen
assembled for the record: the instruction is part of the "X86" namespace, it has
a pattern indicating how the code generator selects it, it is a two-address
instruction, it has a particular encoding, and so on.  The contents and
semantics of the information in the record are specific to the needs of the X86
backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: llvm

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Each ``def`` record has a special entry called ``NAME``.  This is the name of
the record (``ADD32rr`` above).  In the general case, ``def`` names can be
formed from various kinds of string processing expressions, and ``NAME``
resolves to the final value obtained after resolving all of those expressions.
You may refer to ``NAME`` anywhere you want to use the ultimate name of the
``def``.  To avoid conflicts, ``NAME`` should not be defined anywhere else in
user code.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification.  In addition, TableGen's syntax introduces some
automation concepts like ``multiclass``, ``foreach``, ``let``, etc.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: llvm

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record
initialised with some values.  Classes such as ``SubtargetFeature`` are defined
with the ``class`` keyword, either in the same file or in another file that it
includes.  Most target TableGen files include the generic ones in
``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).  TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: llvm

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which receives a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor``
by passing these arguments down and hard-coding ``NoItineraries``.
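
The following C++ sketch makes the "find all definitions of a particular
class" point concrete.  It is a toy model that uses only the standard library,
not LLVM's own record classes: ``RecordSet``, ``derivedDefinitions`` and
``formatEnumList`` are hypothetical names.  Each class and definition lists only
its direct superclasses, as written after the colon in a ``.td`` file; the
sketch walks them transitively, mirroring how ``ADD32rr`` is reported with the
superclasses ``Instruction X86Inst I`` even though its definition names only
``I``.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  // Toy model of a parsed .td file; NOT LLVM's Record API.
  // Each class or def maps to its *direct* superclasses only.
  struct RecordSet {
    std::map<std::string, std::vector<std::string>> classes;
    std::map<std::string, std::vector<std::string>> defs;
  };

  // Adds every class that `name` inherits from, directly or indirectly.
  void collectSuperclasses(const RecordSet &rs, const std::string &name,
                           std::set<std::string> &out) {
    auto it = rs.classes.find(name);
    if (it == rs.classes.end())
      return;
    for (const std::string &parent : it->second)
      if (out.insert(parent).second) // recurse only on first visit
        collectSuperclasses(rs, parent, out);
  }

  // Returns every def that has `cls` anywhere in its flattened superclass
  // list. std::map iterates in key order, so the result is sorted by name.
  std::vector<std::string> derivedDefinitions(const RecordSet &rs,
                                              const std::string &cls) {
    std::vector<std::string> result;
    for (const auto &entry : rs.defs) {
      std::set<std::string> supers;
      for (const std::string &parent : entry.second) {
        supers.insert(parent);
        collectSuperclasses(rs, parent, supers);
      }
      if (supers.count(cls))
        result.push_back(entry.first);
    }
    return result;
  }

  // Joins names in a comma-separated "A, B, " style.
  std::string formatEnumList(const std::vector<std::string> &names) {
    std::string out;
    for (const std::string &n : names)
      out += n + ", ";
    return out;
  }

Because ``std::map`` keeps its keys ordered, the results come out sorted by
name, and ``formatEnumList`` joins them in a style similar to the
``-print-enums`` listing.  A real backend would also read the field values of
each record, which this toy omits.
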

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once.  Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: llvm

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
information on the usage of the language, and the
:doc:`TableGen Language Reference <LangRef>` for a more in-depth description
of the formal language specification.

.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a backend.  The default operation
of running ``llvm-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves.  The power of
TableGen, however, lies in interpreting the source files into an internal
representation that can be turned into anything you want.

TableGen is currently used to create huge include files containing tables,
which you can either include directly (if the output is in the language you're
coding in) or use in pre-processing via macros surrounding the include of the
file.

Direct output can be used if the backend already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like instruction names), so your backend should print
a meta-information list that can be shaped into different compile-time formats.
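
One common way to consume pre-processed output is the "X macro" idiom: each
entry of the generated list is wrapped in a macro call, and each user supplies
a macro definition that shapes the entries into the form it needs.  The C++
sketch below assumes a hypothetical backend output; the list is inlined as
``EXAMPLE_INSTRUCTION_LIST`` so that the example compiles on its own, whereas a
real build would ``#include`` the generated file at that point.  All names are
illustrative, not what an LLVM backend actually emits.

.. code-block:: c++

  #include <cassert>
  #include <cstring>

  // Stand-in for a hypothetical backend-generated .inc file: one X(...)
  // entry per record. A real build would #include the generated file.
  #define EXAMPLE_INSTRUCTION_LIST(X) \
    X(ADD32rr)                        \
    X(ADC32rr)                        \
    X(AND32rr)

  // Context 1: shape the list into an enum.
  enum ExampleOpcode {
  #define AS_ENUMERATOR(Name) Name,
    EXAMPLE_INSTRUCTION_LIST(AS_ENUMERATOR)
  #undef AS_ENUMERATOR
    NumExampleOpcodes // number of entries in the list
  };

  // Context 2: shape the same list into a name table, in the same order.
  static const char *const ExampleOpcodeNames[] = {
  #define AS_STRING(Name) #Name,
      EXAMPLE_INSTRUCTION_LIST(AS_STRING)
  #undef AS_STRING
  };

  // Both expansions come from one list, so their lengths must match.
  static_assert(static_cast<int>(sizeof(ExampleOpcodeNames) /
                                 sizeof(ExampleOpcodeNames[0])) ==
                    NumExampleOpcodes,
                "enum and name table must have the same length");

  const char *opcodeName(ExampleOpcode Op) { return ExampleOpcodeNames[Op]; }

  // Reverse lookup by name; returns -1 for names not in the list.
  int lookupOpcode(const char *Name) {
    for (int I = 0; I < NumExampleOpcodes; ++I)
      if (std::strcmp(ExampleOpcodeNames[I], Name) == 0)
        return I;
    return -1;
  }

The ``static_assert`` shows the main benefit of this style: the enum and the
name table are expanded from one list, so they cannot drift apart.  Clang's
diagnostics are consumed in a broadly similar way, with the includer defining
a ``DIAG`` macro before including a generated ``.inc`` file.
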

See the `TableGen BackEnds <BackEnds.html>`_ document for more information.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times.  The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to create virtually any meaning of
the basic concepts via custom-made backends, which can pervert the original
design and make it very hard for newcomers to understand the evil TableGen
file.

There are some in favour of extending the semantics even more, but making sure
backends adhere to strict rules.  Others suggest moving to fewer, more powerful
DSLs designed for specific purposes, or even reusing existing DSLs.

Either way, this is a discussion that will likely span several years, if not
decades.  You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
document.
    310