1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 2 "http://www.w3.org/TR/html4/strict.dtd"> 3 <html> 4 <head> 5 <meta http-equiv="content-type" content="text/html; charset=utf-8"> 6 <title>The LLVM Target-Independent Code Generator</title> 7 <link rel="stylesheet" href="llvm.css" type="text/css"> 8 9 <style type="text/css"> 10 .unknown { background-color: #C0C0C0; text-align: center; } 11 .unknown:before { content: "?" } 12 .no { background-color: #C11B17 } 13 .no:before { content: "N" } 14 .partial { background-color: #F88017 } 15 .yes { background-color: #0F0; } 16 .yes:before { content: "Y" } 17 </style> 18 19 </head> 20 <body> 21 22 <h1> 23 The LLVM Target-Independent Code Generator 24 </h1> 25 26 <ol> 27 <li><a href="#introduction">Introduction</a> 28 <ul> 29 <li><a href="#required">Required components in the code generator</a></li> 30 <li><a href="#high-level-design">The high-level design of the code 31 generator</a></li> 32 <li><a href="#tablegen">Using TableGen for target description</a></li> 33 </ul> 34 </li> 35 <li><a href="#targetdesc">Target description classes</a> 36 <ul> 37 <li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li> 38 <li><a href="#targetdata">The <tt>TargetData</tt> class</a></li> 39 <li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li> 40 <li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li> 41 <li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li> 42 <li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li> 43 <li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li> 44 <li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li> 45 </ul> 46 </li> 47 <li><a href="#codegendesc">The "Machine" Code Generator classes</a> 48 <ul> 49 <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li> 50 <li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt> 51 class</a></li> 52 <li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li> 53 </ul> 54 </li> 55 <li><a href="#mc">The "MC" Layer</a> 56 <ul> 57 <li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li> 58 <li><a href="#mccontext">The <tt>MCContext</tt> class</a> 59 <li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li> 60 <li><a href="#mcsection">The <tt>MCSection</tt> class</a></li> 61 <li><a href="#mcinst">The <tt>MCInst</tt> class</a></li> 62 </ul> 63 </li> 64 <li><a href="#codegenalgs">Target-independent code generation algorithms</a> 65 <ul> 66 <li><a href="#instselect">Instruction Selection</a> 67 <ul> 68 <li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li> 69 <li><a href="#selectiondag_process">SelectionDAG Code Generation 70 Process</a></li> 71 <li><a href="#selectiondag_build">Initial SelectionDAG 72 Construction</a></li> 73 <li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li> 74 <li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li> 75 <li><a href="#selectiondag_optimize">SelectionDAG Optimization 76 Phase: the DAG Combiner</a></li> 77 <li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li> 78 <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation 79 Phase</a></li> 80 <li><a href="#selectiondag_future">Future directions for the 81 SelectionDAG</a></li> 82 </ul></li> 83 <li><a href="#liveintervals">Live Intervals</a> 84 <ul> 85 <li><a href="#livevariable_analysis">Live Variable Analysis</a></li> 86 <li><a 
href="#liveintervals_analysis">Live Intervals Analysis</a></li> 87 </ul></li> 88 <li><a href="#regalloc">Register Allocation</a> 89 <ul> 90 <li><a href="#regAlloc_represent">How registers are represented in 91 LLVM</a></li> 92 <li><a href="#regAlloc_howTo">Mapping virtual registers to physical 93 registers</a></li> 94 <li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li> 95 <li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li> 96 <li><a href="#regAlloc_fold">Instruction folding</a></li> 97 <li><a href="#regAlloc_builtIn">Built in register allocators</a></li> 98 </ul></li> 99 <li><a href="#codeemit">Code Emission</a></li> 100 </ul> 101 </li> 102 <li><a href="#nativeassembler">Implementing a Native Assembler</a></li> 103 104 <li><a href="#targetimpls">Target-specific Implementation Notes</a> 105 <ul> 106 <li><a href="#targetfeatures">Target Feature Matrix</a></li> 107 <li><a href="#tailcallopt">Tail call optimization</a></li> 108 <li><a href="#sibcallopt">Sibling call optimization</a></li> 109 <li><a href="#x86">The X86 backend</a></li> 110 <li><a href="#ppc">The PowerPC backend</a> 111 <ul> 112 <li><a href="#ppc_abi">LLVM PowerPC ABI</a></li> 113 <li><a href="#ppc_frame">Frame Layout</a></li> 114 <li><a href="#ppc_prolog">Prolog/Epilog</a></li> 115 <li><a href="#ppc_dynamic">Dynamic Allocation</a></li> 116 </ul></li> 117 </ul></li> 118 119 </ol> 120 121 <div class="doc_author"> 122 <p>Written by the LLVM Team.</p> 123 </div> 124 125 <div class="doc_warning"> 126 <p>Warning: This is a work in progress.</p> 127 </div> 128 129 <!-- *********************************************************************** --> 130 <h2> 131 <a name="introduction">Introduction</a> 132 </h2> 133 <!-- *********************************************************************** --> 134 135 <div> 136 137 <p>The LLVM target-independent code generator is a framework that provides a 138 suite of reusable components for translating the LLVM internal representation 139 to the machine code for a specified target—either in assembly form 140 (suitable for a static compiler) or in binary machine code format (usable for 141 a JIT compiler). The LLVM target-independent code generator consists of six 142 main components:</p> 143 144 <ol> 145 <li><a href="#targetdesc">Abstract target description</a> interfaces which 146 capture important properties about various aspects of the machine, 147 independently of how they will be used. These interfaces are defined in 148 <tt>include/llvm/Target/</tt>.</li> 149 150 <li>Classes used to represent the <a href="#codegendesc">code being 151 generated</a> for a target. These classes are intended to be abstract 152 enough to represent the machine code for <i>any</i> target machine. These 153 classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level, 154 concepts like "constant pool entries" and "jump tables" are explicitly 155 exposed.</li> 156 157 <li>Classes and algorithms used to represent code as the object file level, 158 the <a href="#mc">MC Layer</a>. These classes represent assembly level 159 constructs like labels, sections, and instructions. At this level, 160 concepts like "constant pool entries" and "jump tables" don't exist.</li> 161 162 <li><a href="#codegenalgs">Target-independent algorithms</a> used to implement 163 various phases of native code generation (register allocation, scheduling, 164 stack frame representation, etc). 
This code lives 165 in <tt>lib/CodeGen/</tt>.</li> 166 167 <li><a href="#targetimpls">Implementations of the abstract target description 168 interfaces</a> for particular targets. These machine descriptions make 169 use of the components provided by LLVM, and can optionally provide custom 170 target-specific passes, to build complete code generators for a specific 171 target. Target descriptions live in <tt>lib/Target/</tt>.</li> 172 173 <li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is 174 completely target independent (it uses the <tt>TargetJITInfo</tt> 175 structure to interface for target-specific issues. The code for the 176 target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li> 177 </ol> 178 179 <p>Depending on which part of the code generator you are interested in working 180 on, different pieces of this will be useful to you. In any case, you should 181 be familiar with the <a href="#targetdesc">target description</a> 182 and <a href="#codegendesc">machine code representation</a> classes. If you 183 want to add a backend for a new target, you will need 184 to <a href="#targetimpls">implement the target description</a> classes for 185 your new target and understand the <a href="LangRef.html">LLVM code 186 representation</a>. If you are interested in implementing a 187 new <a href="#codegenalgs">code generation algorithm</a>, it should only 188 depend on the target-description and machine code representation classes, 189 ensuring that it is portable.</p> 190 191 <!-- ======================================================================= --> 192 <h3> 193 <a name="required">Required components in the code generator</a> 194 </h3> 195 196 <div> 197 198 <p>The two pieces of the LLVM code generator are the high-level interface to the 199 code generator and the set of reusable components that can be used to build 200 target-specific backends. The two most important interfaces 201 (<a href="#targetmachine"><tt>TargetMachine</tt></a> 202 and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are 203 required to be defined for a backend to fit into the LLVM system, but the 204 others must be defined if the reusable code generator components are going to 205 be used.</p> 206 207 <p>This design has two important implications. The first is that LLVM can 208 support completely non-traditional code generation targets. For example, the 209 C backend does not require register allocation, instruction selection, or any 210 of the other standard components provided by the system. As such, it only 211 implements these two interfaces, and does its own thing. Another example of 212 a code generator like this is a (purely hypothetical) backend that converts 213 LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p> 214 215 <p>This design also implies that it is possible to design and implement 216 radically different code generators in the LLVM system that do not make use 217 of any of the built-in components. 
Doing so is not recommended at all, but 218 could be required for radically different targets that do not fit into the 219 LLVM machine description model: FPGAs for example.</p> 220 221 </div> 222 223 <!-- ======================================================================= --> 224 <h3> 225 <a name="high-level-design">The high-level design of the code generator</a> 226 </h3> 227 228 <div> 229 230 <p>The LLVM target-independent code generator is designed to support efficient 231 and quality code generation for standard register-based microprocessors. 232 Code generation in this model is divided into the following stages:</p> 233 234 <ol> 235 <li><b><a href="#instselect">Instruction Selection</a></b> — This phase 236 determines an efficient way to express the input LLVM code in the target 237 instruction set. This stage produces the initial code for the program in 238 the target instruction set, then makes use of virtual registers in SSA 239 form and physical registers that represent any required register 240 assignments due to target constraints or calling conventions. This step 241 turns the LLVM code into a DAG of target instructions.</li> 242 243 <li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> — 244 This phase takes the DAG of target instructions produced by the 245 instruction selection phase, determines an ordering of the instructions, 246 then emits the instructions 247 as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering. 248 Note that we describe this in the <a href="#instselect">instruction 249 selection section</a> because it operates on 250 a <a href="#selectiondag_intro">SelectionDAG</a>.</li> 251 252 <li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> — 253 This optional stage consists of a series of machine-code optimizations 254 that operate on the SSA-form produced by the instruction selector. 255 Optimizations like modulo-scheduling or peephole optimization work 256 here.</li> 257 258 <li><b><a href="#regalloc">Register Allocation</a></b> — The target code 259 is transformed from an infinite virtual register file in SSA form to the 260 concrete register file used by the target. This phase introduces spill 261 code and eliminates all virtual register references from the program.</li> 262 263 <li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> — Once 264 the machine code has been generated for the function and the amount of 265 stack space required is known (used for LLVM alloca's and spill slots), 266 the prolog and epilog code for the function can be inserted and "abstract 267 stack location references" can be eliminated. This stage is responsible 268 for implementing optimizations like frame-pointer elimination and stack 269 packing.</li> 270 271 <li><b><a href="#latemco">Late Machine Code Optimizations</a></b> — 272 Optimizations that operate on "final" machine code can go here, such as 273 spill code scheduling and peephole optimizations.</li> 274 275 <li><b><a href="#codeemit">Code Emission</a></b> — The final stage 276 actually puts out the code for the current function, either in the target 277 assembler format or in machine code.</li> 278 </ol> 279 280 <p>The code generator is based on the assumption that the instruction selector 281 will use an optimal pattern matching selector to create high-quality 282 sequences of native instructions. Alternative code generator designs based 283 on pattern expansion and aggressive iterative peephole optimization are much 284 slower. 
This design permits efficient compilation (important for JIT 285 environments) and aggressive optimization (used when generating code offline) 286 by allowing components of varying levels of sophistication to be used for any 287 step of compilation.</p> 288 289 <p>In addition to these stages, target implementations can insert arbitrary 290 target-specific passes into the flow. For example, the X86 target uses a 291 special pass to handle the 80x87 floating point stack architecture. Other 292 targets with unusual requirements can be supported with custom passes as 293 needed.</p> 294 295 </div> 296 297 <!-- ======================================================================= --> 298 <h3> 299 <a name="tablegen">Using TableGen for target description</a> 300 </h3> 301 302 <div> 303 304 <p>The target description classes require a detailed description of the target 305 architecture. These target descriptions often have a large amount of common 306 information (e.g., an <tt>add</tt> instruction is almost identical to a 307 <tt>sub</tt> instruction). In order to allow the maximum amount of 308 commonality to be factored out, the LLVM code generator uses 309 the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big 310 chunks of the target machine, which allows the use of domain-specific and 311 target-specific abstractions to reduce the amount of repetition.</p> 312 313 <p>As LLVM continues to be developed and refined, we plan to move more and more 314 of the target description to the <tt>.td</tt> form. Doing so gives us a 315 number of advantages. The most important is that it makes it easier to port 316 LLVM because it reduces the amount of C++ code that has to be written, and 317 the surface area of the code generator that needs to be understood before 318 someone can get something working. Second, it makes it easier to change 319 things. In particular, if tables and other things are all emitted 320 by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to 321 update all of the targets to a new interface.</p> 322 323 </div> 324 325 </div> 326 327 <!-- *********************************************************************** --> 328 <h2> 329 <a name="targetdesc">Target description classes</a> 330 </h2> 331 <!-- *********************************************************************** --> 332 333 <div> 334 335 <p>The LLVM target description classes (located in the 336 <tt>include/llvm/Target</tt> directory) provide an abstract description of 337 the target machine independent of any particular client. These classes are 338 designed to capture the <i>abstract</i> properties of the target (such as the 339 instructions and registers it has), and do not incorporate any particular 340 pieces of code generation algorithms.</p> 341 342 <p>All of the target description classes (except the 343 <tt><a href="#targetdata">TargetData</a></tt> class) are designed to be 344 subclassed by the concrete target implementation, and have virtual methods 345 implemented. 
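</p>

<p>As a rough illustration (the <tt>MyTarget</tt> names below are hypothetical
   and not part of any in-tree backend), a concrete backend typically owns
   instances of its description classes and hands them out through virtual
   accessors on its <tt>TargetMachine</tt> subclass:</p>

<div class="doc_code">
<pre>
// Hypothetical sketch of a backend's TargetMachine specialization.
class MyTargetMachine : public LLVMTargetMachine {
  MyTargetInstrInfo    InstrInfo;   // implements TargetInstrInfo
  MyTargetFrameInfo    FrameInfo;   // implements TargetFrameInfo
  MyTargetRegisterInfo RegInfo;     // implements TargetRegisterInfo
public:
  virtual const TargetInstrInfo *getInstrInfo() const { return &InstrInfo; }
  virtual const TargetFrameInfo *getFrameInfo() const { return &FrameInfo; }
  virtual const TargetRegisterInfo *getRegisterInfo() const { return &RegInfo; }
  // ... plus getTargetData(), getSubtargetImpl(), getJITInfo(), etc.
};
</pre>
</div>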
<p>To get to these implementations, the
   <tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
   that should be implemented by the target.</p>

<!-- ======================================================================= -->
<h3>
  <a name="targetmachine">The <tt>TargetMachine</tt> class</a>
</h3>

<div>

<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
   access the target-specific implementations of the various target description
   classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
   <tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.).  This class is
   designed to be specialized by a concrete target implementation
   (e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
   methods.  The only required target description class is
   the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
   generator components are to be used, the other interfaces should be
   implemented as well.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetdata">The <tt>TargetData</tt> class</a>
</h3>

<div>

<p>The <tt>TargetData</tt> class is the only required target description class,
   and it is the only class that is not extensible (you cannot derive a new
   class from it).  <tt>TargetData</tt> specifies information about how the
   target lays out memory for structures, the alignment requirements for
   various data types, the size of pointers in the target, and whether the
   target is little-endian or big-endian.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetlowering">The <tt>TargetLowering</tt> class</a>
</h3>

<div>

<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
   selectors primarily to describe how LLVM code should be lowered to
   SelectionDAG operations.  Among other things, this class indicates:</p>

<ul>
  <li>an initial register class to use for various <tt>ValueType</tt>s,</li>

  <li>which operations are natively supported by the target machine,</li>

  <li>the return type of <tt>setcc</tt> operations,</li>

  <li>the type to use for shift amounts, and</li>

  <li>various high-level characteristics, like whether it is profitable to turn
      division by a constant into a multiplication sequence.</li>
</ul>
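<p>Most of this information is set up in the constructor of the target's
   <tt>TargetLowering</tt> subclass.  The fragment below is a hedged sketch
   (the <tt>MyTarget</tt> names and the GPR register class are invented; the
   <tt>addRegisterClass</tt> and <tt>setOperationAction</tt> hooks are the real
   ones described in the <a href="#selectiondag_legalize_types">type</a> and
   <a href="#selectiondag_legalize">operation</a> legalization sections
   below):</p>

<div class="doc_code">
<pre>
// Hypothetical sketch of a TargetLowering constructor.
MyTargetLowering::MyTargetLowering(MyTargetMachine &TM)
    : TargetLowering(TM) {  // base-class arguments simplified for the sketch
  // i32 values live in the (invented) 32-bit GPR register class.
  addRegisterClass(MVT::i32, MyTarget::GPRRegisterClass);

  // No native integer remainder instruction; expand it into other operations.
  setOperationAction(ISD::SREM, MVT::i32, Expand);
  setOperationAction(ISD::UREM, MVT::i32, Expand);

  // Lower SELECT_CC with target-specific C++ code instead.
  setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);

  computeRegisterProperties();
}
</pre>
</div>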
</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
</h3>

<div>

<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
   of the target and any interactions between the registers.</p>

<p>Registers in the code generator are represented by unsigned integers.
   Physical registers (those that actually exist in the target description) are
   unique small numbers, and virtual registers are generally large.  Note that
   register #0 is reserved as a flag value.</p>

<p>Each register in the processor description has an associated
   <tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
   register (used for assembly output and debugging dumps) and a set of aliases
   (used to indicate whether one register overlaps with another).</p>

<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
   class exposes a set of processor-specific register classes (instances of the
   <tt>TargetRegisterClass</tt> class).  Each register class contains sets of
   registers that have the same properties (for example, they are all 32-bit
   integer registers).  Each SSA virtual register created by the instruction
   selector has an associated register class.  When the register allocator
   runs, it replaces virtual registers with a physical register in the set.</p>

<p>The target-specific implementations of these classes are auto-generated from
   a <a href="TableGenFundamentals.html">TableGen</a> description of the
   register file.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
</h3>

<div>

<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
   instructions supported by the target.  It is essentially an array of
   <tt>TargetInstrDescriptor</tt> objects, each of which describes one
   instruction the target supports.  Descriptors define things like the
   mnemonic for the opcode, the number of operands, the list of implicit
   register uses and defs, whether the instruction has certain
   target-independent properties (accesses memory, is commutable, etc), and
   hold any target-specific flags.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
</h3>

<div>

<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
   stack frame layout of the target.  It holds the direction of stack growth,
   the known stack alignment on entry to each function, and the offset to the
   local area.  The offset to the local area is the offset from the stack
   pointer on function entry to the first location where function data (local
   variables, spill locations) can be stored.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
</h3>

<div>

<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
   specific chip set being targeted.  A sub-target informs code generation of
   which instructions are supported, instruction latencies, and instruction
   execution itinerary; i.e., which processing units are used, in what order,
   and for how long.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
</h3>

<div>

<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
   Just-In-Time code generator to perform target-specific activities, such as
   emitting stubs.
If a <tt>TargetMachine</tt> supports JIT code generation, it 505 should provide one of these objects through the <tt>getJITInfo</tt> 506 method.</p> 507 508 </div> 509 510 </div> 511 512 <!-- *********************************************************************** --> 513 <h2> 514 <a name="codegendesc">Machine code description classes</a> 515 </h2> 516 <!-- *********************************************************************** --> 517 518 <div> 519 520 <p>At the high-level, LLVM code is translated to a machine specific 521 representation formed out of 522 <a href="#machinefunction"><tt>MachineFunction</tt></a>, 523 <a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>, 524 and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined 525 in <tt>include/llvm/CodeGen</tt>). This representation is completely target 526 agnostic, representing instructions in their most abstract form: an opcode 527 and a series of operands. This representation is designed to support both an 528 SSA representation for machine code, as well as a register allocated, non-SSA 529 form.</p> 530 531 <!-- ======================================================================= --> 532 <h3> 533 <a name="machineinstr">The <tt>MachineInstr</tt> class</a> 534 </h3> 535 536 <div> 537 538 <p>Target machine instructions are represented as instances of the 539 <tt>MachineInstr</tt> class. This class is an extremely abstract way of 540 representing machine instructions. In particular, it only keeps track of an 541 opcode number and a set of operands.</p> 542 543 <p>The opcode number is a simple unsigned integer that only has meaning to a 544 specific backend. All of the instructions for a target should be defined in 545 the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are 546 auto-generated from this description. The <tt>MachineInstr</tt> class does 547 not have any information about how to interpret the instruction (i.e., what 548 the semantics of the instruction are); for that you must refer to the 549 <tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p> 550 551 <p>The operands of a machine instruction can be of several different types: a 552 register reference, a constant integer, a basic block reference, etc. In 553 addition, a machine operand should be marked as a def or a use of the value 554 (though only registers are allowed to be defs).</p> 555 556 <p>By convention, the LLVM code generator orders instruction operands so that 557 all register definitions come before the register uses, even on architectures 558 that are normally printed in other orders. For example, the SPARC add 559 instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1", and "%i2" registers 560 and stores the result into the "%i3" register. In the LLVM code generator, 561 the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the 562 destination first.</p> 563 564 <p>Keeping destination (definition) operands at the beginning of the operand 565 list has several advantages. 
In particular, the debugging printer will print
   the instruction like this:</p>

<div class="doc_code">
<pre>
%r3 = add %i1, %i2
</pre>
</div>

<p>Also, if the first operand is a def, it is easier to <a href="#buildmi">create
   instructions</a> whose only def is the first operand.</p>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
</h4>

<div>

<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
   located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file.  The
   <tt>BuildMI</tt> functions make it easy to build arbitrary machine
   instructions.  Usage of the <tt>BuildMI</tt> functions looks like this:</p>

<div class="doc_code">
<pre>
// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
// instruction.  The '1' specifies how many operands will be added.
MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);

// Create the same instr, but insert it at the end of a basic block.
MachineBasicBlock &MBB = ...
BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);

// Create the same instr, but insert it before a specified iterator point.
MachineBasicBlock::iterator MBBI = ...
BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);

// Create a 'cmp Reg, 0' instruction, no destination reg.
MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);

// Create an 'sahf' instruction which takes no operands and stores nothing.
MI = BuildMI(X86::SAHF, 0);

// Create a self-looping branch instruction.
BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
</pre>
</div>

<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
   have to specify the number of operands that the machine instruction will
   take.  This allows for efficient memory allocation.  Also note that operands
   default to be uses of values, not definitions.  If you need to add a
   definition operand (other than the optional destination register), you must
   explicitly mark it as such:</p>

<div class="doc_code">
<pre>
MI.addReg(Reg, RegState::Define);
</pre>
</div>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="fixedregs">Fixed (preassigned) registers</a>
</h4>

<div>

<p>One important issue that the code generator needs to be aware of is the
   presence of fixed registers.  In particular, there are often places in the
   instruction stream where the register allocator <em>must</em> arrange for a
   particular value to be in a particular register.  This can occur due to
   limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
   with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
   calling conventions.
In any case, the instruction selector should emit code 642 that copies a virtual register into or out of a physical register when 643 needed.</p> 644 645 <p>For example, consider this simple LLVM example:</p> 646 647 <div class="doc_code"> 648 <pre> 649 define i32 @test(i32 %X, i32 %Y) { 650 %Z = udiv i32 %X, %Y 651 ret i32 %Z 652 } 653 </pre> 654 </div> 655 656 <p>The X86 instruction selector produces this machine code for the <tt>div</tt> 657 and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to 658 get this):</p> 659 660 <div class="doc_code"> 661 <pre> 662 ;; Start of div 663 %EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX 664 %reg1027 = sar %reg1024, 31 665 %EDX = mov %reg1027 ;; Sign extend X into EDX 666 idiv %reg1025 ;; Divide by Y (in reg1025) 667 %reg1026 = mov %EAX ;; Read the result (Z) out of EAX 668 669 ;; Start of ret 670 %EAX = mov %reg1026 ;; 32-bit return value goes in EAX 671 ret 672 </pre> 673 </div> 674 675 <p>By the end of code generation, the register allocator has coalesced the 676 registers and deleted the resultant identity moves producing the following 677 code:</p> 678 679 <div class="doc_code"> 680 <pre> 681 ;; X is in EAX, Y is in ECX 682 mov %EAX, %EDX 683 sar %EDX, 31 684 idiv %ECX 685 ret 686 </pre> 687 </div> 688 689 <p>This approach is extremely general (if it can handle the X86 architecture, it 690 can handle anything!) and allows all of the target specific knowledge about 691 the instruction stream to be isolated in the instruction selector. Note that 692 physical registers should have a short lifetime for good code generation, and 693 all physical registers are assumed dead on entry to and exit from basic 694 blocks (before register allocation). Thus, if you need a value to be live 695 across basic block boundaries, it <em>must</em> live in a virtual 696 register.</p> 697 698 </div> 699 700 <!-- _______________________________________________________________________ --> 701 <h4> 702 <a name="ssa">Machine code in SSA form</a> 703 </h4> 704 705 <div> 706 707 <p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are 708 maintained in SSA-form until register allocation happens. For the most part, 709 this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes 710 become machine code PHI nodes, and virtual registers are only allowed to have 711 a single definition.</p> 712 713 <p>After register allocation, machine code is no longer in SSA-form because 714 there are no virtual registers left in the code.</p> 715 716 </div> 717 718 </div> 719 720 <!-- ======================================================================= --> 721 <h3> 722 <a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a> 723 </h3> 724 725 <div> 726 727 <p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions 728 (<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly 729 corresponds to the LLVM code input to the instruction selector, but there can 730 be a one-to-many mapping (i.e. one LLVM basic block can map to multiple 731 machine basic blocks). 
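</p>

<p>These classes nest in the obvious way: a
   <a href="#machinefunction"><tt>MachineFunction</tt></a> (described below)
   holds a list of <tt>MachineBasicBlock</tt>s, and each block holds a list of
   <tt>MachineInstr</tt>s.  A minimal sketch of walking this hierarchy (the
   helper function itself is invented; the iterator-based loops are the
   conventional idiom) looks like this:</p>

<div class="doc_code">
<pre>
// Sketch: count the MachineInstrs in a MachineFunction.
unsigned countInstructions(const MachineFunction &MF) {
  unsigned Count = 0;
  for (MachineFunction::const_iterator MBB = MF.begin(), MBBE = MF.end();
       MBB != MBBE; ++MBB)
    for (MachineBasicBlock::const_iterator MI = MBB->begin(), MIE = MBB->end();
         MI != MIE; ++MI)
      ++Count;  // each MI is a MachineInstr: an opcode plus its operands
  return Count;
}
</pre>
</div>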
<p>The <tt>MachineBasicBlock</tt> class has a
   "<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
   comes from.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="machinefunction">The <tt>MachineFunction</tt> class</a>
</h3>

<div>

<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
   (<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances).  It
   corresponds one-to-one with the LLVM function input to the instruction
   selector.  In addition to a list of basic blocks,
   the <tt>MachineFunction</tt> contains a <tt>MachineConstantPool</tt>,
   a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
   <tt>MachineRegisterInfo</tt>.  See
   <tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>

</div>

</div>

<!-- *********************************************************************** -->
<h2>
  <a name="mc">The "MC" Layer</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>
The MC Layer is used to represent and process code at the raw machine code
level, devoid of "high level" information like "constant pools", "jump tables",
"global variables" or anything like that.  At this level, LLVM handles things
like label names, machine instructions, and sections in the object file.  The
code in this layer is used for a number of important purposes: the tail end of
the code generator uses it to write a .s or .o file, and it is also used by the
llvm-mc tool to implement standalone machine code assemblers and disassemblers.
</p>

<p>
This section describes some of the important classes.  There are also a number
of important subsystems that interact at this layer; they are described later
in this manual.
</p>

<!-- ======================================================================= -->
<h3>
  <a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
</h3>

<div>

<p>
MCStreamer is best thought of as an assembler API.  It is an abstract API which
is <em>implemented</em> in different ways (e.g. to output a .s file, output an
ELF .o file, etc) but whose API corresponds directly to what you see in a .s
file.  MCStreamer has one method per directive, such as EmitLabel,
EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
directly correspond to assembly level directives.  It also has an
EmitInstruction method, which is used to output an MCInst to the streamer.
</p>

<p>
This API is most important for two clients: the llvm-mc stand-alone assembler
is effectively a parser that parses a line, then invokes a method on
MCStreamer; and in the code generator, the <a href="#codeemit">Code
Emission</a> phase lowers higher level LLVM IR and Machine* constructs down to
the MC layer, emitting directives through MCStreamer.</p>

<p>
On the implementation side of MCStreamer, there are two major implementations:
one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
file (MCObjectStreamer).  MCAsmStreamer is a straightforward implementation
that prints out a directive for each method (e.g. EmitValue -> .byte), but
MCObjectStreamer implements a full assembler.</p>
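<p>The following fragment gives the flavor of driving the streamer directly.
   It is a hedged sketch only: the <tt>emitFooByte</tt> wrapper is invented,
   and the exact method signatures have varied across LLVM versions, but the
   directives map one-to-one onto what would appear in a .s file:</p>

<div class="doc_code">
<pre>
// Sketch: emit "foo: .byte 42" into a given section through an MCStreamer.
void emitFooByte(MCStreamer &Out, MCContext &Ctx, const MCSection *Section) {
  Out.SwitchSection(Section);                         // .section ...
  MCSymbol *Foo = Ctx.GetOrCreateSymbol("foo");       // uniqued by MCContext
  Out.EmitLabel(Foo);                                 // foo:
  Out.EmitValue(MCConstantExpr::Create(42, Ctx), 1);  // .byte 42
}
</pre>
</div>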
</div>

<!-- ======================================================================= -->
<h3>
  <a name="mccontext">The <tt>MCContext</tt> class</a>
</h3>

<div>

<p>
The MCContext class is the owner of a variety of uniqued data structures at the
MC layer, including symbols, sections, etc.  As such, this is the class that
you interact with to create symbols and sections.  This class cannot be
subclassed.
</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
</h3>

<div>

<p>
The MCSymbol class represents a symbol (aka label) in the assembly file.  There
are two interesting kinds of symbols: assembler temporary symbols and normal
symbols.  Assembler temporary symbols are used and processed by the assembler
but are discarded when the object file is produced.  The distinction is usually
represented by adding a prefix to the label; for example, "L" labels are
assembler temporary labels in MachO.
</p>

<p>MCSymbols are created by MCContext and uniqued there.  This means that
MCSymbols can be compared for pointer equivalence to find out if they are the
same symbol.  Note that pointer inequality does not guarantee the labels will
end up at different addresses though.  It's perfectly legal to output something
like this to the .s file:</p>

<pre>
  foo:
  bar:
    .byte 4
</pre>

<p>In this case, both the foo and bar symbols will have the same address.</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="mcsection">The <tt>MCSection</tt> class</a>
</h3>

<div>

<p>
The MCSection class represents an object-file specific section.  It is
subclassed by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
<tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
by MCContext.  The MCStreamer has a notion of the current section, which can be
changed with the SwitchSection method (which corresponds to a ".section"
directive in a .s file).
</p>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="mcinst">The <tt>MCInst</tt> class</a>
</h3>

<div>

<p>
The MCInst class is a target-independent representation of an instruction.  It
is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
that holds a target-specific opcode and a vector of MCOperands.  MCOperand, in
turn, is a simple discriminated union of three cases: 1) a simple immediate,
2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
MCExpr.
</p>

<p>MCInst is the common currency used to represent machine instructions at the
MC layer.  It is the type used by the instruction encoder, the instruction
printer, and the type generated by the assembly parser and disassembler.
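</p>

<p>Building an MCInst by hand is straightforward.  The sketch below uses an X86
   opcode purely as an example; note that the capitalization of the MCOperand
   factory functions (<tt>CreateReg</tt>/<tt>CreateImm</tt>) has varied across
   LLVM versions:</p>

<div class="doc_code">
<pre>
// Sketch: construct "movl $42, %eax" as an MCInst.
MCInst Inst;
Inst.setOpcode(X86::MOV32ri);
Inst.addOperand(MCOperand::CreateReg(X86::EAX));  // destination register
Inst.addOperand(MCOperand::CreateImm(42));        // immediate operand
// The result can be handed to an MCStreamer with EmitInstruction(Inst).
</pre>
</div>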
899 </p> 900 901 </div> 902 903 </div> 904 905 <!-- *********************************************************************** --> 906 <h2> 907 <a name="codegenalgs">Target-independent code generation algorithms</a> 908 </h2> 909 <!-- *********************************************************************** --> 910 911 <div> 912 913 <p>This section documents the phases described in the 914 <a href="#high-level-design">high-level design of the code generator</a>. 915 It explains how they work and some of the rationale behind their design.</p> 916 917 <!-- ======================================================================= --> 918 <h3> 919 <a name="instselect">Instruction Selection</a> 920 </h3> 921 922 <div> 923 924 <p>Instruction Selection is the process of translating LLVM code presented to 925 the code generator into target-specific machine instructions. There are 926 several well-known ways to do this in the literature. LLVM uses a 927 SelectionDAG based instruction selector.</p> 928 929 <p>Portions of the DAG instruction selector are generated from the target 930 description (<tt>*.td</tt>) files. Our goal is for the entire instruction 931 selector to be generated from these <tt>.td</tt> files, though currently 932 there are still things that require custom C++ code.</p> 933 934 <!-- _______________________________________________________________________ --> 935 <h4> 936 <a name="selectiondag_intro">Introduction to SelectionDAGs</a> 937 </h4> 938 939 <div> 940 941 <p>The SelectionDAG provides an abstraction for code representation in a way 942 that is amenable to instruction selection using automatic techniques 943 (e.g. dynamic-programming based optimal pattern matching selectors). It is 944 also well-suited to other phases of code generation; in particular, 945 instruction scheduling (SelectionDAG's are very close to scheduling DAGs 946 post-selection). Additionally, the SelectionDAG provides a host 947 representation where a large variety of very-low-level (but 948 target-independent) <a href="#selectiondag_optimize">optimizations</a> may be 949 performed; ones which require extensive information about the instructions 950 efficiently supported by the target.</p> 951 952 <p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the 953 <tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its 954 operation code (Opcode) that indicates what operation the node performs and 955 the operands to the operation. The various operation node types are 956 described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt> 957 file.</p> 958 959 <p>Although most operations define a single value, each node in the graph may 960 define multiple values. For example, a combined div/rem operation will 961 define both the dividend and the remainder. Many other situations require 962 multiple values as well. Each node also has some number of operands, which 963 are edges to the node defining the used value. Because nodes may define 964 multiple values, edges are represented by instances of the <tt>SDValue</tt> 965 class, which is a <tt><SDNode, unsigned></tt> pair, indicating the node 966 and result value being used, respectively. Each value produced by 967 an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type) 968 indicating what the type of the value is.</p> 969 970 <p>SelectionDAGs contain two different kinds of values: those that represent 971 data flow and those that represent control flow dependencies. 
Data values
   are simple edges with an integer or floating point value type.  Control edges
   are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
   These edges provide an ordering between nodes that have side effects (such as
   loads, stores, calls, returns, etc).  All nodes that have side effects should
   take a token chain as input and produce a new one as output.  By convention,
   token chain inputs are always operand #0, and chain results are always the
   last value produced by an operation.</p>

<p>A SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
   always a marker node with an Opcode of <tt>ISD::EntryToken</tt>.  The Root
   node is the final side-effecting node in the token chain.  For example, in a
   single basic block function it would be the return node.</p>

<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
   "illegal" DAG.  A legal DAG for a target is one that only uses supported
   operations and supported types.  On a 32-bit PowerPC, for example, a DAG with
   a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
   uses a SREM or UREM operation.  The
   <a href="#selectiondag_legalize_types">legalize types</a> and
   <a href="#selectiondag_legalize">legalize operations</a> phases are
   responsible for turning an illegal DAG into a legal DAG.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
</h4>

<div>

<p>SelectionDAG-based instruction selection consists of the following steps:</p>

<ol>
  <li><a href="#selectiondag_build">Build initial DAG</a> — This stage
      performs a simple translation from the input LLVM code to an illegal
      SelectionDAG.</li>

  <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — This
      stage performs simple optimizations on the SelectionDAG to simplify it,
      and recognize meta instructions (like rotates
      and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
      operations.  This makes the resultant code more efficient and
      the <a href="#selectiondag_select">select instructions from DAG</a> phase
      (below) simpler.</li>

  <li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
      — This stage transforms SelectionDAG nodes to eliminate any types
      that are unsupported on the target.</li>

  <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
      SelectionDAG optimizer is run to clean up redundancies exposed by type
      legalization.</li>

  <li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> —
      This stage transforms SelectionDAG nodes to eliminate any operations
      that are unsupported on the target.</li>

  <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
      SelectionDAG optimizer is run to eliminate inefficiencies introduced by
      operation legalization.</li>

  <li><a href="#selectiondag_select">Select instructions from DAG</a> —
      Finally, the target instruction selector matches the DAG operations to
      target instructions.
This process translates the target-independent input 1037 DAG into another DAG of target instructions.</li> 1038 1039 <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a> 1040 — The last phase assigns a linear order to the instructions in the 1041 target-instruction DAG and emits them into the MachineFunction being 1042 compiled. This step uses traditional prepass scheduling techniques.</li> 1043 </ol> 1044 1045 <p>After all of these steps are complete, the SelectionDAG is destroyed and the 1046 rest of the code generation passes are run.</p> 1047 1048 <p>One great way to visualize what is going on here is to take advantage of a 1049 few LLC command line options. The following options pop up a window 1050 displaying the SelectionDAG at specific times (if you only get errors printed 1051 to the console while using this, you probably 1052 <a href="ProgrammersManual.html#ViewGraph">need to configure your system</a> 1053 to add support for it).</p> 1054 1055 <ul> 1056 <li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built, 1057 before the first optimization pass.</li> 1058 1059 <li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li> 1060 1061 <li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second 1062 optimization pass.</li> 1063 1064 <li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li> 1065 1066 <li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li> 1067 </ul> 1068 1069 <p>The <tt>-view-sunit-dags</tt> displays the Scheduler's dependency graph. 1070 This graph is based on the final SelectionDAG, with nodes that must be 1071 scheduled together bundled into a single scheduling-unit node, and with 1072 immediate operands and other nodes that aren't relevant for scheduling 1073 omitted.</p> 1074 1075 </div> 1076 1077 <!-- _______________________________________________________________________ --> 1078 <h4> 1079 <a name="selectiondag_build">Initial SelectionDAG Construction</a> 1080 </h4> 1081 1082 <div> 1083 1084 <p>The initial SelectionDAG is naïvely peephole expanded from the LLVM 1085 input by the <tt>SelectionDAGLowering</tt> class in the 1086 <tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of 1087 this pass is to expose as much low-level, target-specific details to the 1088 SelectionDAG as possible. This pass is mostly hard-coded (e.g. an 1089 LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a 1090 <tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass 1091 requires target-specific hooks to lower calls, returns, varargs, etc. For 1092 these features, the <tt><a href="#targetlowering">TargetLowering</a></tt> 1093 interface is used.</p> 1094 1095 </div> 1096 1097 <!-- _______________________________________________________________________ --> 1098 <h4> 1099 <a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a> 1100 </h4> 1101 1102 <div> 1103 1104 <p>The Legalize phase is in charge of converting a DAG to only use the types 1105 that are natively supported by the target.</p> 1106 1107 <p>There are two main ways of converting values of unsupported scalar types to 1108 values of supported types: converting small types to larger types 1109 ("promoting"), and breaking up large integer types into smaller ones 1110 ("expanding"). For example, a target might require that all f32 values are 1111 promoted to f64 and that all i1/i8/i16 values are promoted to i32. 
The same
   target might require that all i64 values be expanded into pairs of i32
   values.  These changes can insert sign and zero extensions as needed to make
   sure that the final code has the same behavior as the input.</p>

<p>There are two main ways of converting values of unsupported vector types to
   values of supported types: splitting vector types, multiple times if
   necessary, until a legal type is found, and extending vector types by adding
   elements to the end to round them out to legal types ("widening").  If a
   vector gets split all the way down to single-element parts with no supported
   vector type being found, the elements are converted to scalars
   ("scalarizing").</p>

<p>A target implementation tells the legalizer which types are supported (and
   which register class to use for them) by calling the
   <tt>addRegisterClass</tt> method in its <tt>TargetLowering</tt>
   constructor.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
</h4>

<div>

<p>The Legalize phase is in charge of converting a DAG to only use the
   operations that are natively supported by the target.</p>

<p>Targets often have weird constraints, such as not supporting every operation
   on every supported datatype (e.g. X86 does not support byte conditional moves
   and PowerPC does not support sign-extending loads from a 16-bit memory
   location).  Legalize takes care of this by open-coding another sequence of
   operations to emulate the operation ("expansion"), by promoting one type to a
   larger type that supports the operation ("promotion"), or by using a
   target-specific hook to implement the legalization ("custom").</p>

<p>A target implementation tells the legalizer which operations are not
   supported (and which of the above three actions to take) by calling the
   <tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
   constructor.</p>

<p>Prior to the existence of the Legalize passes, we required that every target
   <a href="#selectiondag_optimize">selector</a> supported and handled every
   operator and type even if they were not natively supported.  The
   introduction of the Legalize phases allows all of the canonicalization
   patterns to be shared across targets, and makes it very easy to optimize the
   canonicalized code because it is still in the form of a DAG.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="selectiondag_optimize">
    SelectionDAG Optimization Phase: the DAG Combiner
  </a>
</h4>

<div>

<p>The SelectionDAG optimization phase is run multiple times for code
   generation, immediately after the DAG is built and once after each
   legalization.  The first run of the pass allows the initial code to be
   cleaned up (e.g. performing optimizations that depend on knowing that the
   operators have restricted type inputs).
Subsequent runs of the pass clean up 1176 the messy code generated by the Legalize passes, which allows Legalize to be 1177 very simple (it can focus on making code legal instead of focusing on 1178 generating <em>good</em> and legal code).</p> 1179 1180 <p>One important class of optimizations performed is optimizing inserted sign 1181 and zero extension instructions. We currently use ad-hoc techniques, but 1182 could move to more rigorous techniques in the future. Here are some good 1183 papers on the subject:</p> 1184 1185 <p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening 1186 integer arithmetic</a>"<br> 1187 Kevin Redwine and Norman Ramsey<br> 1188 International Conference on Compiler Construction (CC) 2004</p> 1189 1190 <p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective 1191 sign extension elimination</a>"<br> 1192 Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br> 1193 Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design 1194 and Implementation.</p> 1195 1196 </div> 1197 1198 <!-- _______________________________________________________________________ --> 1199 <h4> 1200 <a name="selectiondag_select">SelectionDAG Select Phase</a> 1201 </h4> 1202 1203 <div> 1204 1205 <p>The Select phase is the bulk of the target-specific code for instruction 1206 selection. This phase takes a legal SelectionDAG as input, pattern matches 1207 the instructions supported by the target to this DAG, and produces a new DAG 1208 of target code. For example, consider the following LLVM fragment:</p> 1209 1210 <div class="doc_code"> 1211 <pre> 1212 %t1 = fadd float %W, %X 1213 %t2 = fmul float %t1, %Y 1214 %t3 = fadd float %t2, %Z 1215 </pre> 1216 </div> 1217 1218 <p>This LLVM code corresponds to a SelectionDAG that looks basically like 1219 this:</p> 1220 1221 <div class="doc_code"> 1222 <pre> 1223 (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z) 1224 </pre> 1225 </div> 1226 1227 <p>If a target supports floating point multiply-and-add (FMA) operations, one of 1228 the adds can be merged with the multiply. On the PowerPC, for example, the 1229 output of the instruction selector might look like this DAG:</p> 1230 1231 <div class="doc_code"> 1232 <pre> 1233 (FMADDS (FADDS W, X), Y, Z) 1234 </pre> 1235 </div> 1236 1237 <p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its 1238 first two operands and adds the third (as single-precision floating-point 1239 numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision 1240 add instruction. To perform this pattern match, the PowerPC backend includes 1241 the following instruction definitions:</p> 1242 1243 <div class="doc_code"> 1244 <pre> 1245 def FMADDS : AForm_1<59, 29, 1246 (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB), 1247 "fmadds $FRT, $FRA, $FRC, $FRB", 1248 [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC), 1249 F4RC:$FRB))</b>]>; 1250 def FADDS : AForm_2<59, 21, 1251 (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB), 1252 "fadds $FRT, $FRA, $FRB", 1253 [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]>; 1254 </pre> 1255 </div> 1256 1257 <p>The portion of the instruction definition in bold indicates the pattern used 1258 to match the instruction. The DAG operators 1259 (like <tt>fmul</tt>/<tt>fadd</tt>) are defined in 1260 the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file. 
" 1261 <tt>F4RC</tt>" is the register class of the input and result values.</p> 1262 1263 <p>The TableGen DAG instruction selector generator reads the instruction 1264 patterns in the <tt>.td</tt> file and automatically builds parts of the 1265 pattern matching code for your target. It has the following strengths:</p> 1266 1267 <ul> 1268 <li>At compiler-compiler time, it analyzes your instruction patterns and tells 1269 you if your patterns make sense or not.</li> 1270 1271 <li>It can handle arbitrary constraints on operands for the pattern match. In 1272 particular, it is straight-forward to say things like "match any immediate 1273 that is a 13-bit sign-extended value". For examples, see the 1274 <tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC 1275 backend.</li> 1276 1277 <li>It knows several important identities for the patterns defined. For 1278 example, it knows that addition is commutative, so it allows the 1279 <tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as 1280 well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having 1281 to specially handle this case.</li> 1282 1283 <li>It has a full-featured type-inferencing system. In particular, you should 1284 rarely have to explicitly tell the system what type parts of your patterns 1285 are. In the <tt>FMADDS</tt> case above, we didn't have to tell 1286 <tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'. 1287 It was able to infer and propagate this knowledge from the fact that 1288 <tt>F4RC</tt> has type 'f32'.</li> 1289 1290 <li>Targets can define their own (and rely on built-in) "pattern fragments". 1291 Pattern fragments are chunks of reusable patterns that get inlined into 1292 your patterns during compiler-compiler time. For example, the integer 1293 "<tt>(not x)</tt>" operation is actually defined as a pattern fragment 1294 that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not 1295 have a native '<tt>not</tt>' operation. Targets can define their own 1296 short-hand fragments as they see fit. See the definition of 1297 '<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li> 1298 1299 <li>In addition to instructions, targets can specify arbitrary patterns that 1300 map to one or more instructions using the 'Pat' class. For example, the 1301 PowerPC has no way to load an arbitrary integer immediate into a register 1302 in one instruction. To tell tblgen how to do this, it defines: 1303 <br> 1304 <br> 1305 <div class="doc_code"> 1306 <pre> 1307 // Arbitrary immediate support. Implement in terms of LIS/ORI. 1308 def : Pat<(i32 imm:$imm), 1309 (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>; 1310 </pre> 1311 </div> 1312 <br> 1313 If none of the single-instruction patterns for loading an immediate into a 1314 register match, this will be used. This rule says "match an arbitrary i32 1315 immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and 1316 an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to 1317 the left 16 bits') instruction". 
To make this work, the 1318 <tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate 1319 the input immediate (in this case, take the high or low 16-bits of the 1320 immediate).</li> 1321 1322 <li>While the system does automate a lot, it still allows you to write custom 1323 C++ code to match special cases if there is something that is hard to 1324 express.</li> 1325 </ul> 1326 1327 <p>While it has many strengths, the system currently has some limitations, 1328 primarily because it is a work in progress and is not yet finished:</p> 1329 1330 <ul> 1331 <li>Overall, there is no way to define or match SelectionDAG nodes that define 1332 multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>, 1333 etc). This is the biggest reason that you currently still <em>have 1334 to</em> write custom C++ code for your instruction selector.</li> 1335 1336 <li>There is no great way to support matching complex addressing modes yet. 1337 In the future, we will extend pattern fragments to allow them to define 1338 multiple values (e.g. the four operands of the <a href="#x86_memory">X86 1339 addressing mode</a>, which are currently matched with custom C++ code). 1340 In addition, we'll extend fragments so that a fragment can match multiple 1341 different patterns.</li> 1342 1343 <li>We don't automatically infer flags like isStore/isLoad yet.</li> 1344 1345 <li>We don't automatically generate the set of supported registers and 1346 operations for the <a href="#selectiondag_legalize">Legalizer</a> 1347 yet.</li> 1348 1349 <li>We don't have a way of tying in custom legalized nodes yet.</li> 1350 </ul> 1351 1352 <p>Despite these limitations, the instruction selector generator is still quite 1353 useful for most of the binary and logical operations in typical instruction 1354 sets. If you run into any problems or can't figure out how to do something, 1355 please let Chris know!</p> 1356 1357 </div> 1358 1359 <!-- _______________________________________________________________________ --> 1360 <h4> 1361 <a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a> 1362 </h4> 1363 1364 <div> 1365 1366 <p>The scheduling phase takes the DAG of target instructions from the selection 1367 phase and assigns an order. The scheduler can pick an order depending on 1368 various constraints of the machines (i.e. order for minimal register pressure 1369 or try to cover instruction latencies). 
Once an order is established, the 1370 DAG is converted to a list 1371 of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is 1372 destroyed.</p> 1373 1374 <p>Note that this phase is logically separate from the instruction selection 1375 phase, but is tied to it closely in the code because it operates on 1376 SelectionDAGs.</p> 1377 1378 </div> 1379 1380 <!-- _______________________________________________________________________ --> 1381 <h4> 1382 <a name="selectiondag_future">Future directions for the SelectionDAG</a> 1383 </h4> 1384 1385 <div> 1386 1387 <ol> 1388 <li>Optional function-at-a-time selection.</li> 1389 1390 <li>Auto-generate entire selector from <tt>.td</tt> file.</li> 1391 </ol> 1392 1393 </div> 1394 1395 </div> 1396 1397 <!-- ======================================================================= --> 1398 <h3> 1399 <a name="ssamco">SSA-based Machine Code Optimizations</a> 1400 </h3> 1401 <div><p>To Be Written</p></div> 1402 1403 <!-- ======================================================================= --> 1404 <h3> 1405 <a name="liveintervals">Live Intervals</a> 1406 </h3> 1407 1408 <div> 1409 1410 <p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>. 1411 They are used by some <a href="#regalloc">register allocator</a> passes to 1412 determine if two or more virtual registers which require the same physical 1413 register are live at the same point in the program (i.e., they conflict). 1414 When this situation occurs, one virtual register must be <i>spilled</i>.</p> 1415 1416 <!-- _______________________________________________________________________ --> 1417 <h4> 1418 <a name="livevariable_analysis">Live Variable Analysis</a> 1419 </h4> 1420 1421 <div> 1422 1423 <p>The first step in determining the live intervals of variables is to calculate 1424 the set of registers that are immediately dead after the instruction (i.e., 1425 the instruction calculates the value, but it is never used) and the set of 1426 registers that are used by the instruction, but are never used after the 1427 instruction (i.e., they are killed). Live variable information is computed 1428 for each <i>virtual</i> register and <i>register allocatable</i> physical 1429 register in the function. This is done in a very efficient manner because it 1430 uses SSA to sparsely compute lifetime information for virtual registers 1431 (which are in SSA form) and only has to track physical registers within a 1432 block. Before register allocation, LLVM can assume that physical registers 1433 are only live within a single basic block. This allows it to do a single, 1434 local analysis to resolve physical register lifetimes within each basic 1435 block. If a physical register is not register allocatable (e.g., a stack 1436 pointer or condition codes), it is not tracked.</p> 1437 1438 <p>Physical registers may be live in to or out of a function. Live in values are 1439 typically arguments in registers. Live out values are typically return values 1440 in registers. Live in values are marked as such, and are given a dummy 1441 "defining" instruction during live intervals analysis. 
If the last basic 1442 block of a function is a <tt>return</tt>, then it's marked as using all live 1443 out values in the function.</p> 1444 1445 <p><tt>PHI</tt> nodes need to be handled specially, because the calculation of 1446 the live variable information from a depth first traversal of the CFG of the 1447 function won't guarantee that a virtual register used by the <tt>PHI</tt> 1448 node is defined before it's used. When a <tt>PHI</tt> node is encountered, 1449 only the definition is handled, because the uses will be handled in other 1450 basic blocks.</p> 1451 1452 <p>For each <tt>PHI</tt> node of the current basic block, we simulate an 1453 assignment at the end of the current basic block and traverse the successor 1454 basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of 1455 the <tt>PHI</tt> node's operands is coming from the current basic block, then 1456 the variable is marked as <i>alive</i> within the current basic block and all 1457 of its predecessor basic blocks, until the basic block with the defining 1458 instruction is encountered.</p> 1459 1460 </div> 1461 1462 <!-- _______________________________________________________________________ --> 1463 <h4> 1464 <a name="liveintervals_analysis">Live Intervals Analysis</a> 1465 </h4> 1466 1467 <div> 1468 1469 <p>We now have the information available to perform the live intervals analysis 1470 and build the live intervals themselves. We start off by numbering the basic 1471 blocks and machine instructions. We then handle the "live-in" values. These 1472 are in physical registers, so the physical register is assumed to be killed 1473 by the end of the basic block. Live intervals for virtual registers are 1474 computed for some ordering of the machine instructions <tt>[1, N]</tt>. A 1475 live interval is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j 1476 < N</tt>, for which a variable is live.</p> 1477 1478 <p><i><b>More to come...</b></i></p> 1479 1480 </div> 1481 1482 </div> 1483 1484 <!-- ======================================================================= --> 1485 <h3> 1486 <a name="regalloc">Register Allocation</a> 1487 </h3> 1488 1489 <div> 1490 1491 <p>The <i>Register Allocation problem</i> consists in mapping a program 1492 <i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers, 1493 to a program <i>P<sub>p</sub></i> that contains a finite (possibly small) 1494 number of physical registers. Each target architecture has a different number 1495 of physical registers. If the number of physical registers is not enough to 1496 accommodate all the virtual registers, some of them will have to be mapped 1497 into memory. These virtuals are called <i>spilled virtuals</i>.</p> 1498 1499 <!-- _______________________________________________________________________ --> 1500 1501 <h4> 1502 <a name="regAlloc_represent">How registers are represented in LLVM</a> 1503 </h4> 1504 1505 <div> 1506 1507 <p>In LLVM, physical registers are denoted by integer numbers that normally 1508 range from 1 to 1023. To see how this numbering is defined for a particular 1509 architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that 1510 architecture. For instance, by 1511 inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the 1512 32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register 1513 <tt>MM0</tt> is mapped to 48.</p> 1514 1515 <p>Some architectures contain registers that share the same physical location. A 1516 notable example is the X86 platform. 
For instance, in the X86 architecture, the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a particular architecture, you can check which registers are aliased by inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method <tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing all the physical registers aliased to the register <tt>p_reg</tt>.</p>

<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>. Elements in the same register class are functionally equivalent, and can be used interchangeably. Each virtual register can only be mapped to physical registers of a particular class. For instance, in the X86 architecture, some virtuals can only be allocated to 8 bit registers. A register class is described by <tt>TargetRegisterClass</tt> objects. To discover if a virtual register is compatible with a given physical register, this code can be used:</p>

<div class="doc_code">
<pre>
bool RegMapping_Fer::compatible_class(MachineFunction &mf,
                                      unsigned v_reg,
                                      unsigned p_reg) {
  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
         "Target register must be physical");
  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
  return trc->contains(p_reg);
}
</pre>
</div>

<p>Sometimes, mostly for debugging purposes, it is useful to change the number of physical registers available in the target architecture. This must be done statically, inside the <tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt> for <tt>RegisterClass</tt>, the last parameter of which is a list of registers. Just commenting some out is one simple way to avoid them being used. A more polite way is to explicitly exclude some registers from the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.</p>

<p>Virtual registers are also denoted by integer numbers. Contrary to physical registers, different virtual registers never share the same number. Whereas physical registers are statically defined in a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the application developer, that is not the case with virtual registers. In order to create new virtual registers, use the method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method will return a new virtual register. Use an <tt>IndexedMap<Foo, VirtReg2IndexFunctor></tt> to hold information per virtual register. If you need to enumerate all virtual registers, use the function <tt>TargetRegisterInfo::index2VirtReg()</tt> to find the virtual register numbers:</p>

<div class="doc_code">
<pre>
  for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
    unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
    stuff(VirtReg);
  }
</pre>
</div>

<p>Before register allocation, the operands of an instruction are mostly virtual registers, although physical registers may also be used. In order to check if a given machine operand is a register, use the boolean function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt> defines register 1026, and uses registers 1025 and 1024. Given a register operand, the method <tt>MachineOperand::isUse()</tt> indicates whether that register is being used by the instruction, and the method <tt>MachineOperand::isDef()</tt> indicates whether that register is being defined.</p>
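<p>For example, the following minimal sketch (not part of the LLVM sources) walks the operands of a <tt>MachineInstr</tt> and reports, for every register operand, whether it is a def or a use and whether the register is physical or virtual. It assumes the current <tt>isReg()</tt> spelling of the register-operand test:</p>

<div class="doc_code">
<pre>
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetRegisterInfo.h"
using namespace llvm;

// Print every register operand of MI together with its def/use role.
static void printRegisterOperands(const MachineInstr &MI) {
  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
    const MachineOperand &MO = MI.getOperand(i);
    if (!MO.isReg() || MO.getReg() == 0)
      continue;                                  // not a register operand
    unsigned Reg = MO.getReg();
    errs() << (MO.isDef() ? "def of " : "use of ")
           << (TargetRegisterInfo::isPhysicalRegister(Reg) ? "physical"
                                                           : "virtual")
           << " register " << Reg << "\n";
  }
}
</pre>
</div>

<p>Note that a single operand may answer true to both <tt>isUse()</tt> and <tt>isDef()</tt>; such tied operands appear, for example, after two address conversion (described later in this section).</p>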
<p>We will call physical registers present in the LLVM bitcode before register allocation <i>pre-colored registers</i>. Pre-colored registers are used in many different situations, for instance, to pass parameters of function calls, and to store results of particular instructions. There are two types of pre-colored registers: the ones <i>implicitly</i> defined, and those <i>explicitly</i> defined. Explicitly defined registers are normal operands, and can be accessed with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In order to check which registers are implicitly defined by an instruction, use <tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>, where <tt>opcode</tt> is the opcode of the target instruction. One important difference between explicit and implicit physical registers is that the latter are defined statically for each instruction, whereas the former may vary depending on the program being compiled. For example, an instruction that represents a function call will always implicitly define or use the same set of physical registers. To read the registers implicitly used by an instruction, use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored registers impose constraints on any register allocation algorithm. The register allocator must make sure that none of them are overwritten by the values of virtual registers while they are still alive.</p>

</div>

<!-- _______________________________________________________________________ -->

<h4>
  <a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
</h4>

<div>

<p>There are two ways to map virtual registers to physical registers (or to memory slots). The first way, which we will call <i>direct mapping</i>, is based on the use of methods of the classes <tt>TargetRegisterInfo</tt> and <tt>MachineOperand</tt>. The second way, which we will call <i>indirect mapping</i>, relies on the <tt>VirtRegMap</tt> class to insert loads and stores that move values to and from memory.</p>

<p>The direct mapping provides more flexibility to the developer of the register allocator; however, it is more error prone, and demands more implementation work. Basically, the programmer will have to specify where load and store instructions should be inserted in the target function being compiled in order to get and store values in memory. To assign a physical register to a virtual register present in a given operand, use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction, use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot</tt>.</p>

<p>The indirect mapping shields the application developer from the complexities of inserting load and store instructions. In order to map a virtual register to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In order to map a certain virtual register to memory, use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return the stack slot where <tt>vreg</tt>'s value will be located. If it is necessary to map another virtual register to the same stack slot, use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One important point to consider when using the indirect mapping is that even if a virtual register is mapped to memory, it still needs to be mapped to a physical register. This physical register is the location where the virtual register is supposed to be found before being stored or after being reloaded.</p>
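<p>Putting the pieces together, an allocator using the indirect strategy records each decision in the <tt>VirtRegMap</tt> roughly as follows. This is only a sketch of the usage described above; the function name and the <tt>spill</tt> flag are made up for the example:</p>

<div class="doc_code">
<pre>
#include "VirtRegMap.h"  // this header lives alongside the allocators in lib/CodeGen
using namespace llvm;

// Record the allocator's decision for one virtual register.  'spill' stands
// in for whatever heuristic the allocator uses to decide to spill.
static void mapVirtualRegister(VirtRegMap &VRM, unsigned VirtReg,
                               unsigned PhysReg, bool spill) {
  if (!spill) {
    VRM.assignVirt2Phys(VirtReg, PhysReg);   // keep the value in a register
    return;
  }
  // Spilled: the value lives in a stack slot and the spiller will insert the
  // actual loads and stores later.  As noted above, around each use the value
  // must still pass through some physical register.
  int Slot = VRM.assignVirt2StackSlot(VirtReg);
  (void)Slot;  // another virtual could be mapped to this same slot with the
               // two-argument form of assignVirt2StackSlot
}
</pre>
</div>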
<p>If the indirect strategy is used, after all the virtual registers have been mapped to physical registers or stack slots, it is necessary to use a spiller object to place load and store instructions in the code. Every virtual that has been mapped to a stack slot will be stored to memory after being defined and will be loaded before being used. The implementation of the spiller tries to recycle load/store instructions, avoiding unnecessary instructions. For an example of how to invoke the spiller, see <tt>RegAllocLinearScan::runOnMachineFunction</tt> in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="regAlloc_twoAddr">Handling two address instructions</a>
</h4>

<div>

<p>With very rare exceptions (e.g., function calls), the LLVM machine code instructions are three address instructions. That is, each instruction is expected to define at most one register, and to use at most two registers. However, some architectures use two address instructions. In this case, the defined register is also one of the used registers. For instance, an instruction such as <tt>ADD %EAX, %EBX</tt> in X86 is actually equivalent to <tt>%EAX = %EAX + %EBX</tt>.</p>

<p>In order to produce correct code, LLVM must convert three address instructions that represent two address instructions into true two address instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for this specific purpose. It must be run before register allocation takes place; a sketch of how an allocator requests this appears below. After its execution, the resulting code may no longer be in SSA form. This happens, for instance, in situations where an instruction such as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>

<div class="doc_code">
<pre>
%a = MOVE %b
%a = ADD %a %c
</pre>
</div>

<p>Notice that, internally, the second instruction is represented as <tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both used and defined by the instruction.</p>
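<p>The following is a minimal sketch (the pass name <tt>MyRegAlloc</tt> is invented for the example) of how a register allocator marks the two address conversion pass, and the PHI elimination pass described in the next subsection, as prerequisites:</p>

<div class="doc_code">
<pre>
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/Passes.h"
using namespace llvm;

namespace {
  struct MyRegAlloc : public MachineFunctionPass {
    static char ID;
    MyRegAlloc() : MachineFunctionPass(ID) {}

    // Ask the pass manager to run PHI elimination and two-address
    // conversion before this allocator.
    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.addRequiredID(PHIEliminationID);
      AU.addRequiredID(TwoAddressInstructionPassID);
      MachineFunctionPass::getAnalysisUsage(AU);
    }

    virtual bool runOnMachineFunction(MachineFunction &MF) {
      // ... allocate registers for MF ...
      return true;
    }
  };
  char MyRegAlloc::ID = 0;
}
</pre>
</div>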
</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
</h4>

<div>

<p>An important transformation that happens during register allocation is called the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses that are performed on the control flow graph of programs. However, traditional instruction sets do not implement PHI instructions. Thus, in order to generate executable code, compilers must replace PHI instructions with other instructions that preserve their semantics.</p>

<p>There are many ways in which PHI instructions can safely be removed from the target code. The most traditional PHI deconstruction algorithm replaces PHI instructions with copy instructions. That is the strategy adopted by LLVM. The SSA deconstruction algorithm is implemented in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the identifier <tt>PHIEliminationID</tt> must be marked as required in the code of the register allocator.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="regAlloc_fold">Instruction folding</a>
</h4>

<div>

<p><i>Instruction folding</i> is an optimization performed during register allocation that removes unnecessary copy instructions. For instance, a sequence of instructions such as:</p>

<div class="doc_code">
<pre>
%EBX = LOAD %mem_address
%EAX = COPY %EBX
</pre>
</div>

<p>can be safely substituted by the single instruction:</p>

<div class="doc_code">
<pre>
%EAX = LOAD %mem_address
</pre>
</div>

<p>Instructions can be folded with the <tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be taken when folding instructions; a folded instruction can be quite different from the original instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt> in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its use.</p>

</div>

<!-- _______________________________________________________________________ -->

<h4>
  <a name="regAlloc_builtIn">Built in register allocators</a>
</h4>

<div>

<p>The LLVM infrastructure provides the application developer with three different register allocators:</p>

<ul>
<li><i>Linear Scan</i> — <i>The default allocator</i>. This is the well-known linear scan register allocator. Whereas the <i>Simple</i> and <i>Local</i> algorithms use a direct mapping implementation technique, the <i>Linear Scan</i> implementation uses a spiller in order to place loads and stores.</li>

<li><i>Fast</i> — This register allocator is the default for debug builds. It allocates registers on a basic block level, attempting to keep values in registers and reusing registers as appropriate.</li>

<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP) based register allocator.
This allocator works by constructing a PBQP 1783 problem representing the register allocation problem under consideration, 1784 solving this using a PBQP solver, and mapping the solution back to a 1785 register assignment.</li> 1786 1787 </ul> 1788 1789 <p>The type of register allocator used in <tt>llc</tt> can be chosen with the 1790 command line option <tt>-regalloc=...</tt>:</p> 1791 1792 <div class="doc_code"> 1793 <pre> 1794 $ llc -regalloc=linearscan file.bc -o ln.s; 1795 $ llc -regalloc=fast file.bc -o fa.s; 1796 $ llc -regalloc=pbqp file.bc -o pbqp.s; 1797 </pre> 1798 </div> 1799 1800 </div> 1801 1802 </div> 1803 1804 <!-- ======================================================================= --> 1805 <h3> 1806 <a name="proepicode">Prolog/Epilog Code Insertion</a> 1807 </h3> 1808 <div><p>To Be Written</p></div> 1809 <!-- ======================================================================= --> 1810 <h3> 1811 <a name="latemco">Late Machine Code Optimizations</a> 1812 </h3> 1813 <div><p>To Be Written</p></div> 1814 1815 <!-- ======================================================================= --> 1816 <h3> 1817 <a name="codeemit">Code Emission</a> 1818 </h3> 1819 1820 <div> 1821 1822 <p>The code emission step of code generation is responsible for lowering from 1823 the code generator abstractions (like <a 1824 href="#machinefunction">MachineFunction</a>, <a 1825 href="#machineinstr">MachineInstr</a>, etc) down 1826 to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>, 1827 <a href="#mcstreamer">MCStreamer</a>, etc). This is 1828 done with a combination of several different classes: the (misnamed) 1829 target-independent AsmPrinter class, target-specific subclasses of AsmPrinter 1830 (such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p> 1831 1832 <p>Since the MC layer works at the level of abstraction of object files, it 1833 doesn't have a notion of functions, global variables etc. Instead, it thinks 1834 about labels, directives, and instructions. A key class used at this time is 1835 the MCStreamer class. This is an abstract API that is implemented in different 1836 ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively 1837 an "assembler API". MCStreamer has one method per directive, such as EmitLabel, 1838 EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly 1839 level directives. 1840 </p> 1841 1842 <p>If you are interested in implementing a code generator for a target, there 1843 are three important things that you have to implement for your target:</p> 1844 1845 <ol> 1846 <li>First, you need a subclass of AsmPrinter for your target. This class 1847 implements the general lowering process converting MachineFunction's into MC 1848 label constructs. The AsmPrinter base class provides a number of useful methods 1849 and routines, and also allows you to override the lowering process in some 1850 important ways. You should get much of the lowering for free if you are 1851 implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile 1852 class implements much of the common logic.</li> 1853 1854 <li>Second, you need to implement an instruction printer for your target. The 1855 instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a 1856 raw_ostream as text. 
Most of this is automatically generated from the .td file (when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the instructions), but you need to implement routines to print operands.</li>

<li>Third, you need to implement code that lowers a <a href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in "<target>MCInstLower.cpp". This lowering process is often target specific, and is responsible for turning jump table entries, constant pool indices, global variable addresses, etc into MCLabels as appropriate. This translation layer is also responsible for expanding pseudo ops used by the code generator into the actual machine instructions they correspond to. The MCInsts that are generated by this are fed into the instruction printer or the encoder.</li>

</ol>

<p>Finally, at your choosing, you can also implement a subclass of MCCodeEmitter which lowers MCInst's into machine code bytes and relocations. This is important if you want to support direct .o file emission, or would like to implement an assembler for your target.</p>

</div>

</div>

<!-- *********************************************************************** -->
<h2>
  <a name="nativeassembler">Implementing a Native Assembler</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>Though you're probably reading this because you want to write or maintain a compiler backend, LLVM also fully supports building a native assembler. We've tried hard to automate the generation of the assembler from the .td files (in particular the instruction syntax and encodings), which means that a large part of the manual and repetitive data entry can be factored and shared with the compiler.</p>

<!-- ======================================================================= -->
<h3 id="na_instparsing">Instruction Parsing</h3>

<div><p>To Be Written</p></div>


<!-- ======================================================================= -->
<h3 id="na_instaliases">
  Instruction Alias Processing
</h3>

<div>
<p>Once the instruction is parsed, it enters the MatchInstructionImpl function. The MatchInstructionImpl function performs alias processing and then does actual matching.</p>

<p>Alias processing is the phase that canonicalizes different lexical forms of the same instructions down to one representation. There are several different kinds of alias that are possible to implement, and they are listed below in the order that they are processed (which is in order from simplest/weakest to most complex/powerful). Generally you want to use the first alias mechanism that meets the needs of your instruction, because it will allow a more concise description.</p>

<!-- _______________________________________________________________________ -->
<h4>Mnemonic Aliases</h4>

<div>

<p>The first phase of alias processing is simple instruction mnemonic remapping for classes of instructions which are allowed with two different mnemonics. This phase is a simple and unconditional remapping from one input mnemonic to one output mnemonic.
It isn't possible for this form of alias to look at the operands at all, so the remapping must apply for all forms of a given mnemonic. Mnemonic aliases are defined simply, for example X86 has:</p>

<div class="doc_code">
<pre>
def : MnemonicAlias<"cbw",     "cbtw">;
def : MnemonicAlias<"smovq",   "movsq">;
def : MnemonicAlias<"fldcww",  "fldcw">;
def : MnemonicAlias<"fucompi", "fucomip">;
def : MnemonicAlias<"ud2a",    "ud2">;
</pre>
</div>

<p>... and many others. With a MnemonicAlias definition, the mnemonic is remapped simply and directly. Though MnemonicAlias's can't look at any aspect of the instruction (such as the operands), they can depend on global modes (the same ones supported by the matcher), through a Requires clause:</p>

<div class="doc_code">
<pre>
def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
</pre>
</div>

<p>In this example, the mnemonic is remapped to a different output mnemonic depending on the current instruction set.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>Instruction Aliases</h4>

<div>

<p>The most general phase of alias processing occurs while matching is happening: it provides new forms for the matcher to match along with a specific instruction to generate. An instruction alias has two parts: the string to match and the instruction to generate. For example:
</p>

<div class="doc_code">
<pre>
def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8  :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX32rr8  GR32:$dst, GR8  :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX64rr8  GR64:$dst, GR8  :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
</pre>
</div>

<p>This shows a powerful example of instruction aliases, matching the same mnemonic in multiple different ways depending on what operands are present in the assembly. The result of instruction aliases can include operands in a different order than the destination instruction, and can use an input multiple times, for example:</p>

<div class="doc_code">
<pre>
def : InstAlias<"clrb $reg", (XOR8rr  GR8 :$reg, GR8 :$reg)>;
def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
</pre>
</div>

<p>This example also shows that tied operands are only listed once. In the X86 backend, XOR8rr has two input GR8's and one output GR8 (where an input is tied to the output). InstAliases take a flattened operand list without duplicates for tied operands. The result of an instruction alias can also use immediates and fixed physical registers which are added as simple immediate operands in the result, for example:</p>

<div class="doc_code">
<pre>
// Fixed Immediate operand.
2008 def : InstAlias<"aad", (AAD8i8 10)>; 2009 2010 // Fixed register operand. 2011 def : InstAlias<"fcomi", (COM_FIr ST1)>; 2012 2013 // Simple alias. 2014 def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>; 2015 </pre> 2016 </div> 2017 2018 2019 <p>Instruction aliases can also have a Requires clause to make them 2020 subtarget specific.</p> 2021 2022 <p>If the back-end supports it, the instruction printer can automatically emit 2023 the alias rather than what's being aliased. It typically leads to better, 2024 more readable code. If it's better to print out what's being aliased, then 2025 pass a '0' as the third parameter to the InstAlias definition.</p> 2026 2027 </div> 2028 2029 </div> 2030 2031 <!-- ======================================================================= --> 2032 <h3 id="na_matching">Instruction Matching</h3> 2033 2034 <div><p>To Be Written</p></div> 2035 2036 </div> 2037 2038 <!-- *********************************************************************** --> 2039 <h2> 2040 <a name="targetimpls">Target-specific Implementation Notes</a> 2041 </h2> 2042 <!-- *********************************************************************** --> 2043 2044 <div> 2045 2046 <p>This section of the document explains features or design decisions that are 2047 specific to the code generator for a particular target. First we start 2048 with a table that summarizes what features are supported by each target.</p> 2049 2050 <!-- ======================================================================= --> 2051 <h3> 2052 <a name="targetfeatures">Target Feature Matrix</a> 2053 </h3> 2054 2055 <div> 2056 2057 <p>Note that this table does not include the C backend or Cpp backends, since 2058 they do not use the target independent code generator infrastructure. It also 2059 doesn't list features that are not supported fully by any target yet. It 2060 considers a feature to be supported if at least one subtarget supports it. A 2061 feature being supported means that it is useful and works for most cases, it 2062 does not indicate that there are zero known bugs in the implementation. 
Here 2063 is the key:</p> 2064 2065 2066 <table border="1" cellspacing="0"> 2067 <tr> 2068 <th>Unknown</th> 2069 <th>No support</th> 2070 <th>Partial Support</th> 2071 <th>Complete Support</th> 2072 </tr> 2073 <tr> 2074 <td class="unknown"></td> 2075 <td class="no"></td> 2076 <td class="partial"></td> 2077 <td class="yes"></td> 2078 </tr> 2079 </table> 2080 2081 <p>Here is the table:</p> 2082 2083 <table width="689" border="1" cellspacing="0"> 2084 <tr><td></td> 2085 <td colspan="13" align="center" style="background-color:#ffc">Target</td> 2086 </tr> 2087 <tr> 2088 <th>Feature</th> 2089 <th>ARM</th> 2090 <th>Alpha</th> 2091 <th>Blackfin</th> 2092 <th>CellSPU</th> 2093 <th>MBlaze</th> 2094 <th>MSP430</th> 2095 <th>Mips</th> 2096 <th>PTX</th> 2097 <th>PowerPC</th> 2098 <th>Sparc</th> 2099 <th>SystemZ</th> 2100 <th>X86</th> 2101 <th>XCore</th> 2102 </tr> 2103 2104 <tr> 2105 <td><a href="#feat_reliable">is generally reliable</a></td> 2106 <td class="yes"></td> <!-- ARM --> 2107 <td class="unknown"></td> <!-- Alpha --> 2108 <td class="no"></td> <!-- Blackfin --> 2109 <td class="no"></td> <!-- CellSPU --> 2110 <td class="no"></td> <!-- MBlaze --> 2111 <td class="unknown"></td> <!-- MSP430 --> 2112 <td class="no"></td> <!-- Mips --> 2113 <td class="no"></td> <!-- PTX --> 2114 <td class="yes"></td> <!-- PowerPC --> 2115 <td class="yes"></td> <!-- Sparc --> 2116 <td class="unknown"></td> <!-- SystemZ --> 2117 <td class="yes"></td> <!-- X86 --> 2118 <td class="unknown"></td> <!-- XCore --> 2119 </tr> 2120 2121 <tr> 2122 <td><a href="#feat_asmparser">assembly parser</a></td> 2123 <td class="no"></td> <!-- ARM --> 2124 <td class="no"></td> <!-- Alpha --> 2125 <td class="no"></td> <!-- Blackfin --> 2126 <td class="no"></td> <!-- CellSPU --> 2127 <td class="yes"></td> <!-- MBlaze --> 2128 <td class="no"></td> <!-- MSP430 --> 2129 <td class="no"></td> <!-- Mips --> 2130 <td class="no"></td> <!-- PTX --> 2131 <td class="no"></td> <!-- PowerPC --> 2132 <td class="no"></td> <!-- Sparc --> 2133 <td class="no"></td> <!-- SystemZ --> 2134 <td class="yes"></td> <!-- X86 --> 2135 <td class="no"></td> <!-- XCore --> 2136 </tr> 2137 2138 <tr> 2139 <td><a href="#feat_disassembler">disassembler</a></td> 2140 <td class="yes"></td> <!-- ARM --> 2141 <td class="no"></td> <!-- Alpha --> 2142 <td class="no"></td> <!-- Blackfin --> 2143 <td class="no"></td> <!-- CellSPU --> 2144 <td class="yes"></td> <!-- MBlaze --> 2145 <td class="no"></td> <!-- MSP430 --> 2146 <td class="no"></td> <!-- Mips --> 2147 <td class="no"></td> <!-- PTX --> 2148 <td class="no"></td> <!-- PowerPC --> 2149 <td class="no"></td> <!-- Sparc --> 2150 <td class="no"></td> <!-- SystemZ --> 2151 <td class="yes"></td> <!-- X86 --> 2152 <td class="no"></td> <!-- XCore --> 2153 </tr> 2154 2155 <tr> 2156 <td><a href="#feat_inlineasm">inline asm</a></td> 2157 <td class="yes"></td> <!-- ARM --> 2158 <td class="unknown"></td> <!-- Alpha --> 2159 <td class="yes"></td> <!-- Blackfin --> 2160 <td class="no"></td> <!-- CellSPU --> 2161 <td class="yes"></td> <!-- MBlaze --> 2162 <td class="unknown"></td> <!-- MSP430 --> 2163 <td class="no"></td> <!-- Mips --> 2164 <td class="unknown"></td> <!-- PTX --> 2165 <td class="yes"></td> <!-- PowerPC --> 2166 <td class="unknown"></td> <!-- Sparc --> 2167 <td class="unknown"></td> <!-- SystemZ --> 2168 <td class="yes"><a href="#feat_inlineasm_x86">*</a></td> <!-- X86 --> 2169 <td class="unknown"></td> <!-- XCore --> 2170 </tr> 2171 2172 <tr> 2173 <td><a href="#feat_jit">jit</a></td> 2174 <td class="partial"><a 
href="#feat_jit_arm">*</a></td> <!-- ARM --> 2175 <td class="no"></td> <!-- Alpha --> 2176 <td class="no"></td> <!-- Blackfin --> 2177 <td class="no"></td> <!-- CellSPU --> 2178 <td class="no"></td> <!-- MBlaze --> 2179 <td class="unknown"></td> <!-- MSP430 --> 2180 <td class="no"></td> <!-- Mips --> 2181 <td class="unknown"></td> <!-- PTX --> 2182 <td class="yes"></td> <!-- PowerPC --> 2183 <td class="unknown"></td> <!-- Sparc --> 2184 <td class="unknown"></td> <!-- SystemZ --> 2185 <td class="yes"></td> <!-- X86 --> 2186 <td class="unknown"></td> <!-- XCore --> 2187 </tr> 2188 2189 <tr> 2190 <td><a href="#feat_objectwrite">.o file writing</a></td> 2191 <td class="no"></td> <!-- ARM --> 2192 <td class="no"></td> <!-- Alpha --> 2193 <td class="no"></td> <!-- Blackfin --> 2194 <td class="no"></td> <!-- CellSPU --> 2195 <td class="yes"></td> <!-- MBlaze --> 2196 <td class="no"></td> <!-- MSP430 --> 2197 <td class="no"></td> <!-- Mips --> 2198 <td class="no"></td> <!-- PTX --> 2199 <td class="no"></td> <!-- PowerPC --> 2200 <td class="no"></td> <!-- Sparc --> 2201 <td class="no"></td> <!-- SystemZ --> 2202 <td class="yes"></td> <!-- X86 --> 2203 <td class="no"></td> <!-- XCore --> 2204 </tr> 2205 2206 <tr> 2207 <td><a href="#feat_tailcall">tail calls</a></td> 2208 <td class="yes"></td> <!-- ARM --> 2209 <td class="unknown"></td> <!-- Alpha --> 2210 <td class="no"></td> <!-- Blackfin --> 2211 <td class="no"></td> <!-- CellSPU --> 2212 <td class="no"></td> <!-- MBlaze --> 2213 <td class="unknown"></td> <!-- MSP430 --> 2214 <td class="no"></td> <!-- Mips --> 2215 <td class="unknown"></td> <!-- PTX --> 2216 <td class="yes"></td> <!-- PowerPC --> 2217 <td class="unknown"></td> <!-- Sparc --> 2218 <td class="unknown"></td> <!-- SystemZ --> 2219 <td class="yes"></td> <!-- X86 --> 2220 <td class="unknown"></td> <!-- XCore --> 2221 </tr> 2222 2223 2224 </table> 2225 2226 <!-- _______________________________________________________________________ --> 2227 <h4 id="feat_reliable">Is Generally Reliable</h4> 2228 2229 <div> 2230 <p>This box indicates whether the target is considered to be production quality. 2231 This indicates that the target has been used as a static compiler to 2232 compile large amounts of code by a variety of different people and is in 2233 continuous use.</p> 2234 </div> 2235 2236 <!-- _______________________________________________________________________ --> 2237 <h4 id="feat_asmparser">Assembly Parser</h4> 2238 2239 <div> 2240 <p>This box indicates whether the target supports parsing target specific .s 2241 files by implementing the MCAsmParser interface. 
This is required for llvm-mc to be able to act as a native assembler and is required for inline assembly support in the native .o file writer.</p>

</div>


<!-- _______________________________________________________________________ -->
<h4 id="feat_disassembler">Disassembler</h4>

<div>
<p>This box indicates whether the target supports the MCDisassembler API for disassembling machine opcode bytes into MCInst's.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4 id="feat_inlineasm">Inline Asm</h4>

<div>
<p>This box indicates whether the target supports most popular inline assembly constraints and modifiers.</p>

<p id="feat_inlineasm_x86">X86 lacks reliable support for inline assembly constraints relating to the X86 floating point stack.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4 id="feat_jit">JIT Support</h4>

<div>
<p>This box indicates whether the target supports the JIT compiler through the ExecutionEngine interface.</p>

<p id="feat_jit_arm">The ARM backend has basic support for integer code in ARM codegen mode, but lacks NEON and full Thumb support.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4 id="feat_objectwrite">.o File Writing</h4>

<div>

<p>This box indicates whether the target supports writing .o files (e.g. MachO, ELF, and/or COFF) directly from the target. Note that the target also must include an assembly parser and general inline assembly support for full inline assembly support in the .o writer.</p>

<p>Targets that don't support this feature can obviously still write out .o files; they just rely on having an external assembler to translate from a .s file to a .o file (as is the case for many C compilers).</p>

</div>

<!-- _______________________________________________________________________ -->
<h4 id="feat_tailcall">Tail Calls</h4>

<div>

<p>This box indicates whether the target supports guaranteed tail calls. These are calls marked "<a href="LangRef.html#i_call">tail</a>" and use the fastcc calling convention. Please see the <a href="#tailcallopt">tail call section</a> for more details.</p>

</div>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="tailcallopt">Tail call optimization</a>
</h3>

<div>

<p>Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC.
It is performed if:</p> 2320 2321 <ul> 2322 <li>Caller and callee have the calling convention <tt>fastcc</tt> or 2323 <tt>cc 10</tt> (GHC call convention).</li> 2324 2325 <li>The call is a tail call - in tail position (ret immediately follows call 2326 and ret uses value of call or is void).</li> 2327 2328 <li>Option <tt>-tailcallopt</tt> is enabled.</li> 2329 2330 <li>Platform specific constraints are met.</li> 2331 </ul> 2332 2333 <p>x86/x86-64 constraints:</p> 2334 2335 <ul> 2336 <li>No variable argument lists are used.</li> 2337 2338 <li>On x86-64 when generating GOT/PIC code only module-local calls (visibility 2339 = hidden or protected) are supported.</li> 2340 </ul> 2341 2342 <p>PowerPC constraints:</p> 2343 2344 <ul> 2345 <li>No variable argument lists are used.</li> 2346 2347 <li>No byval parameters are used.</li> 2348 2349 <li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li> 2350 </ul> 2351 2352 <p>Example:</p> 2353 2354 <p>Call as <tt>llc -tailcallopt test.ll</tt>.</p> 2355 2356 <div class="doc_code"> 2357 <pre> 2358 declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4) 2359 2360 define fastcc i32 @tailcaller(i32 %in1, i32 %in2) { 2361 %l1 = add i32 %in1, %in2 2362 %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1) 2363 ret i32 %tmp 2364 } 2365 </pre> 2366 </div> 2367 2368 <p>Implications of <tt>-tailcallopt</tt>:</p> 2369 2370 <p>To support tail call optimization in situations where the callee has more 2371 arguments than the caller a 'callee pops arguments' convention is used. This 2372 currently causes each <tt>fastcc</tt> call that is not tail call optimized 2373 (because one or more of above constraints are not met) to be followed by a 2374 readjustment of the stack. So performance might be worse in such cases.</p> 2375 2376 </div> 2377 <!-- ======================================================================= --> 2378 <h3> 2379 <a name="sibcallopt">Sibling call optimization</a> 2380 </h3> 2381 2382 <div> 2383 2384 <p>Sibling call optimization is a restricted form of tail call optimization. 2385 Unlike tail call optimization described in the previous section, it can be 2386 performed automatically on any tail calls when <tt>-tailcallopt</tt> option 2387 is not specified.</p> 2388 2389 <p>Sibling call optimization is currently performed on x86/x86-64 when the 2390 following constraints are met:</p> 2391 2392 <ul> 2393 <li>Caller and callee have the same calling convention. It can be either 2394 <tt>c</tt> or <tt>fastcc</tt>. 2395 2396 <li>The call is a tail call - in tail position (ret immediately follows call 2397 and ret uses value of call or is void).</li> 2398 2399 <li>Caller and callee have matching return type or the callee result is not 2400 used. 2401 2402 <li>If any of the callee arguments are being passed in stack, they must be 2403 available in caller's own incoming argument stack and the frame offsets 2404 must be the same. 2405 </ul> 2406 2407 <p>Example:</p> 2408 <div class="doc_code"> 2409 <pre> 2410 declare i32 @bar(i32, i32) 2411 2412 define i32 @foo(i32 %a, i32 %b, i32 %c) { 2413 entry: 2414 %0 = tail call i32 @bar(i32 %a, i32 %b) 2415 ret i32 %0 2416 } 2417 </pre> 2418 </div> 2419 2420 </div> 2421 <!-- ======================================================================= --> 2422 <h3> 2423 <a name="x86">The X86 backend</a> 2424 </h3> 2425 2426 <div> 2427 2428 <p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. 
This code generator is capable of targeting a variety of x86-32 and x86-64 processors, and includes support for ISA extensions such as MMX and SSE.</p>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="x86_tt">X86 Target Triples supported</a>
</h4>

<div>

<p>The following are the known target triples that are supported by the X86 backend. This is not an exhaustive list, and it would be useful to add those that people test.</p>

<ul>
<li><b>i686-pc-linux-gnu</b> — Linux</li>

<li><b>i386-unknown-freebsd5.3</b> — FreeBSD 5.3</li>

<li><b>i686-pc-cygwin</b> — Cygwin on Win32</li>

<li><b>i686-pc-mingw32</b> — MingW on Win32</li>

<li><b>i386-pc-mingw32msvc</b> — MingW crosscompiler on Linux</li>

<li><b>i686-apple-darwin*</b> — Apple Darwin on X86</li>

<li><b>x86_64-unknown-linux-gnu</b> — Linux</li>
</ul>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="x86_cc">X86 Calling Conventions supported</a>
</h4>


<div>

<p>The following target-specific calling conventions are known to the backend:</p>

<ul>
<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft Windows platform (CC ID = 64).</li>
<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft Windows platform (CC ID = 65).</li>
<li><b>x86_ThisCall</b> — Similar to X86_StdCall. Passes first argument in ECX, others via stack. Callee is responsible for stack cleaning. This convention is used by MSVC by default for methods in its ABI (CC ID = 70).</li>
</ul>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
</h4>

<div>

<p>The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following form directly in integer instructions (which use ModR/M addressing):</p>

<div class="doc_code">
<pre>
SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
</pre>
</div>

<p>In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means that the "load" form of '<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>

<div class="doc_code">
<pre>
Index:        0     |    1        2       3           4          5
Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
</pre>
</div>

<p>Stores, and all other instructions, treat the five memory operands in the same way and in the same order. If the segment register is unspecified (regno = 0), then no segment override is generated. "Lea" operations do not have a segment register specified, so they only have 4 operands for their memory reference.</p>
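<p>For illustration, a target-specific pass could construct such a load by hand, supplying the five memory operands in exactly this order. This is only a sketch; <tt>MBB</tt>, <tt>InsertPt</tt>, <tt>DL</tt>, <tt>TII</tt> and the registers are assumed to come from the surrounding X86 pass, and <tt>MOV32rm</tt> is simply one instruction that takes this form of memory operand:</p>

<div class="doc_code">
<pre>
// DestReg = 32-bit load from [BaseReg + 4*IndexReg + 8]
BuildMI(MBB, InsertPt, DL, TII->get(X86::MOV32rm), DestReg)
  .addReg(BaseReg)   // base register
  .addImm(4)         // scale
  .addReg(IndexReg)  // index register
  .addImm(8)         // displacement
  .addReg(0);        // segment register (0 means no segment override)
</pre>
</div>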
</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="x86_memory">X86 address spaces supported</a>
</h4>

<div>

<p>x86 has a feature which provides the ability to perform loads and stores to different address spaces via the x86 segment registers. A segment override prefix byte on an instruction causes the instruction's memory access to go to the specified segment. LLVM address space 0 is the default address space, which includes the stack, and any unqualified memory accesses in a program. Address spaces 1-255 are currently reserved for user-defined code. The GS-segment is represented by address space 256, while the FS-segment is represented by address space 257. Other x86 segments have yet to be allocated address space numbers.</p>

<p>While these address spaces may seem similar to TLS via the <tt>thread_local</tt> keyword, and often use the same underlying hardware, there are some fundamental differences.</p>

<p>The <tt>thread_local</tt> keyword applies to global variables and specifies that they are to be allocated in thread-local memory. There are no type qualifiers involved, and these variables can be pointed to with normal pointers and accessed with normal loads and stores. The <tt>thread_local</tt> keyword is target-independent at the LLVM IR level (though LLVM doesn't yet have implementations of it for some configurations).</p>

<p>Special address spaces, in contrast, apply to static types. Every load and store has a particular address space in its address operand type, and this is what determines which address space is accessed. LLVM ignores these special address space qualifiers on global variables, and does not provide a way to directly allocate storage in them. At the LLVM IR level, the behavior of these special address spaces depends in part on the underlying OS or runtime environment, and they are specific to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>

<p>Some operating systems and runtime environments use (or may in the future use) the FS/GS-segment registers for various low-level purposes, so care should be taken when considering them.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="x86_names">Instruction naming</a>
</h4>

<div>

<p>An instruction name consists of the base name, a default operand size, and a character per operand with an optional special size. For example:</p>

<div class="doc_code">
<pre>
ADD8rr      -> add, 8-bit register, 8-bit register
IMUL16rmi   -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8  -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
</pre>
</div>

</div>

</div>

<!-- ======================================================================= -->
<h3>
  <a name="ppc">The PowerPC backend</a>
</h3>

<div>

<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The code generation is retargetable to several variations or <i>subtargets</i> of the PowerPC ISA, including ppc32, ppc64 and altivec.</p>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="ppc_abi">LLVM PowerPC ABI</a>
</h4>

<div>

<p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses PC-relative (PIC) or static addressing for accessing global values, so no TOC (r2) is used.
Second, r31 is used as a frame pointer to allow dynamic growth 2610 of a stack frame. LLVM takes advantage of having no TOC to provide space to 2611 save the frame pointer in the PowerPC linkage area of the caller frame. 2612 Other details of PowerPC ABI can be found at <a href= 2613 "http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html" 2614 >PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The 64 bit ABI 2615 is similar except space for GPRs are 8 bytes wide (not 4) and r13 is reserved 2616 for system use.</p> 2617 2618 </div> 2619 2620 <!-- _______________________________________________________________________ --> 2621 <h4> 2622 <a name="ppc_frame">Frame Layout</a> 2623 </h4> 2624 2625 <div> 2626 2627 <p>The size of a PowerPC frame is usually fixed for the duration of a 2628 function's invocation. Since the frame is fixed size, all references 2629 into the frame can be accessed via fixed offsets from the stack pointer. The 2630 exception to this is when dynamic alloca or variable sized arrays are 2631 present, then a base pointer (r31) is used as a proxy for the stack pointer 2632 and stack pointer is free to grow or shrink. A base pointer is also used if 2633 llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is 2634 always aligned to 16 bytes, so that space allocated for altivec vectors will 2635 be properly aligned.</p> 2636 2637 <p>An invocation frame is laid out as follows (low memory at top);</p> 2638 2639 <table class="layout"> 2640 <tr> 2641 <td>Linkage<br><br></td> 2642 </tr> 2643 <tr> 2644 <td>Parameter area<br><br></td> 2645 </tr> 2646 <tr> 2647 <td>Dynamic area<br><br></td> 2648 </tr> 2649 <tr> 2650 <td>Locals area<br><br></td> 2651 </tr> 2652 <tr> 2653 <td>Saved registers area<br><br></td> 2654 </tr> 2655 <tr style="border-style: none hidden none hidden;"> 2656 <td><br></td> 2657 </tr> 2658 <tr> 2659 <td>Previous Frame<br><br></td> 2660 </tr> 2661 </table> 2662 2663 <p>The <i>linkage</i> area is used by a callee to save special registers prior 2664 to allocating its own frame. Only three entries are relevant to LLVM. The 2665 first entry is the previous stack pointer (sp), aka link. This allows 2666 probing tools like gdb or exception handlers to quickly scan the frames in 2667 the stack. A function epilog can also use the link to pop the frame from the 2668 stack. The third entry in the linkage area is used to save the return 2669 address from the lr register. Finally, as mentioned above, the last entry is 2670 used to save the previous frame pointer (r31.) 
<p>An invocation frame is laid out as follows (low memory at top):</p>

<table class="layout">
  <tr>
    <td>Linkage<br><br></td>
  </tr>
  <tr>
    <td>Parameter area<br><br></td>
  </tr>
  <tr>
    <td>Dynamic area<br><br></td>
  </tr>
  <tr>
    <td>Locals area<br><br></td>
  </tr>
  <tr>
    <td>Saved registers area<br><br></td>
  </tr>
  <tr style="border-style: none hidden none hidden;">
    <td><br></td>
  </tr>
  <tr>
    <td>Previous Frame<br><br></td>
  </tr>
</table>

<p>The <i>linkage</i> area is used by a callee to save special registers prior
   to allocating its own frame.  Only three entries are relevant to LLVM.  The
   first entry is the previous stack pointer (sp), also known as the link.
   This allows probing tools like gdb or exception handlers to quickly scan
   the frames in the stack.  A function epilog can also use the link to pop
   the frame from the stack.  The third entry in the linkage area is used to
   save the return address from the lr register.  Finally, as mentioned above,
   the last entry is used to save the previous frame pointer (r31).  The
   entries in the linkage area are the size of a GPR, thus the linkage area is
   24 bytes long in 32 bit mode and 48 bytes in 64 bit mode.</p>

<p>32 bit linkage area:</p>

<table class="layout">
  <tr>
    <td>0</td>
    <td>Saved SP (r1)</td>
  </tr>
  <tr>
    <td>4</td>
    <td>Saved CR</td>
  </tr>
  <tr>
    <td>8</td>
    <td>Saved LR</td>
  </tr>
  <tr>
    <td>12</td>
    <td>Reserved</td>
  </tr>
  <tr>
    <td>16</td>
    <td>Reserved</td>
  </tr>
  <tr>
    <td>20</td>
    <td>Saved FP (r31)</td>
  </tr>
</table>

<p>64 bit linkage area:</p>

<table class="layout">
  <tr>
    <td>0</td>
    <td>Saved SP (r1)</td>
  </tr>
  <tr>
    <td>8</td>
    <td>Saved CR</td>
  </tr>
  <tr>
    <td>16</td>
    <td>Saved LR</td>
  </tr>
  <tr>
    <td>24</td>
    <td>Reserved</td>
  </tr>
  <tr>
    <td>32</td>
    <td>Reserved</td>
  </tr>
  <tr>
    <td>40</td>
    <td>Saved FP (r31)</td>
  </tr>
</table>

<p>The <i>parameter area</i> is used to store arguments being passed to a
   callee function.  Following the PowerPC ABI, the first few arguments are
   actually passed in registers, with the space in the parameter area unused.
   However, if there are not enough registers or the callee is a thunk or
   vararg function, these register arguments can be spilled into the parameter
   area.  Thus, the parameter area must be large enough to store all the
   parameters for the largest call sequence made by the caller.  The size must
   also be at least large enough to spill registers r3-r10.  This gives
   callees that are blind to the call signature, such as thunks and vararg
   functions, enough space to cache the argument registers.  Therefore, the
   parameter area is minimally 32 bytes (64 bytes in 64 bit mode).  Also note
   that since the parameter area is at a fixed offset from the top of the
   frame, a callee can access its spilled arguments using fixed offsets from
   the stack pointer (or base pointer).</p>
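<p>As a rough sketch (the functions below are hypothetical), a variadic callee
   cannot tell from its signature how many arguments arrived in registers, so
   it may spill all of r3-r10 into the parameter area of its caller's frame;
   this is why the caller always reserves at least those 32 bytes (64 bytes in
   64 bit mode), even for a call that passes only a few arguments:</p>

<div class="doc_code">
<pre>
declare void @log_values(i8* %fmt, ...)

define void @caller(i8* %fmt, i32 %a, i32 %b) {
entry:
  ; %fmt, %a and %b travel in r3-r5, but the parameter area reserved by
  ; @caller must still be large enough for @log_values to spill r3-r10.
  call void (i8*, ...)* @log_values(i8* %fmt, i32 %a, i32 %b)
  ret void
}
</pre>
</div>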
<p>Combining the information about the linkage and parameter areas with the
   alignment requirement, a stack frame is minimally 64 bytes in 32 bit mode
   and 128 bytes in 64 bit mode.</p>

<p>The <i>dynamic area</i> starts out with a size of zero.  If a function uses
   dynamic alloca then space is added to the stack, the linkage and parameter
   areas are shifted to the top of the stack, and the new space is available
   immediately below the linkage and parameter areas.  The cost of shifting
   the linkage and parameter areas is minor since only the link value needs to
   be copied.  The link value can be easily fetched by adding the original
   frame size to the base pointer.  Note that allocations in the dynamic space
   need to observe 16 byte alignment.</p>

<p>The <i>locals area</i> is where the LLVM compiler reserves space for local
   variables.</p>

<p>The <i>saved registers area</i> is where the LLVM compiler spills callee
   saved registers on entry to the callee.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="ppc_prolog">Prolog/Epilog</a>
</h4>

<div>

<p>The LLVM prolog and epilog are the same as described in the PowerPC ABI,
   with the following exceptions.  Callee saved registers are spilled after
   the frame is created.  This allows the LLVM prolog/epilog support to be
   common with other targets.  The base pointer callee saved register r31 is
   saved in the TOC slot of the linkage area.  This simplifies allocation of
   space for the base pointer and makes it convenient to locate
   programmatically and during debugging.</p>

</div>

<!-- _______________________________________________________________________ -->
<h4>
  <a name="ppc_dynamic">Dynamic Allocation</a>
</h4>

<div>

<p><i>TODO - More to come.</i></p>

</div>

</div>

</div>

<!-- *********************************************************************** -->
<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>

  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
</address>

</body>
</html>