1 ======================== 2 LLVM Programmer's Manual 3 ======================== 4 5 .. contents:: 6 :local: 7 8 .. warning:: 9 This is always a work in progress. 10 11 .. _introduction: 12 13 Introduction 14 ============ 15 16 This document is meant to highlight some of the important classes and interfaces 17 available in the LLVM source-base. This manual is not intended to explain what 18 LLVM is, how it works, and what LLVM code looks like. It assumes that you know 19 the basics of LLVM and are interested in writing transformations or otherwise 20 analyzing or manipulating the code. 21 22 This document should get you oriented so that you can find your way in the 23 continuously growing source code that makes up the LLVM infrastructure. Note 24 that this manual is not intended to serve as a replacement for reading the 25 source code, so if you think there should be a method in one of these classes to 26 do something, but it's not listed, check the source. Links to the `doxygen 27 <http://llvm.org/doxygen/>`__ sources are provided to make this as easy as 28 possible. 29 30 The first section of this document describes general information that is useful 31 to know when working in the LLVM infrastructure, and the second describes the 32 Core LLVM classes. In the future this manual will be extended with information 33 describing how to use extension libraries, such as dominator information, CFG 34 traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen 35 <http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template. 36 37 .. _general: 38 39 General Information 40 =================== 41 42 This section contains general information that is useful if you are working in 43 the LLVM source-base, but that isn't specific to any particular API. 44 45 .. _stl: 46 47 The C++ Standard Template Library 48 --------------------------------- 49 50 LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much 51 more than you are used to, or have seen before. Because of this, you might want 52 to do a little background reading in the techniques used and capabilities of the 53 library. There are many good pages that discuss the STL, and several books on 54 the subject that you can get, so it will not be discussed in this document. 55 56 Here are some useful links: 57 58 #. `cppreference.com 59 <http://en.cppreference.com/w/>`_ - an excellent 60 reference for the STL and other parts of the standard C++ library. 61 62 #. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly 63 book in the making. It has a decent Standard Library Reference that rivals 64 Dinkumware's, and is unfortunately no longer free since the book has been 65 published. 66 67 #. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_. 68 69 #. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a 70 useful `Introduction to the STL 71 <http://www.sgi.com/tech/stl/stl_introduction.html>`_. 72 73 #. `Bjarne Stroustrup's C++ Page 74 <http://www.research.att.com/%7Ebs/C++.html>`_. 75 76 #. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 77 (even better, get the book) 78 <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_. 79 80 You are also encouraged to take a look at the :doc:`LLVM Coding Standards 81 <CodingStandards>` guide which focuses on how to write maintainable code more 82 than where to put your curly braces. 83 84 .. _resources: 85 86 Other useful references 87 ----------------------- 88 89 #. `Using static and shared libraries across platforms 90 <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_ 91 92 .. _apis: 93 94 Important and useful LLVM APIs 95 ============================== 96 97 Here we highlight some LLVM APIs that are generally useful and good to know 98 about when writing transformations. 99 100 .. _isa: 101 102 The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates 103 ------------------------------------------------------ 104 105 The LLVM source-base makes extensive use of a custom form of RTTI. These 106 templates have many similarities to the C++ ``dynamic_cast<>`` operator, but 107 they don't have some drawbacks (primarily stemming from the fact that 108 ``dynamic_cast<>`` only works on classes that have a v-table). Because they are 109 used so often, you must know what they do and how they work. All of these 110 templates are defined in the ``llvm/Support/Casting.h`` (`doxygen 111 <http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very 112 rarely have to include this file directly). 113 114 ``isa<>``: 115 The ``isa<>`` operator works exactly like the Java "``instanceof``" operator. 116 It returns true or false depending on whether a reference or pointer points to 117 an instance of the specified class. This can be very useful for constraint 118 checking of various sorts (example below). 119 120 ``cast<>``: 121 The ``cast<>`` operator is a "checked cast" operation. It converts a pointer 122 or reference from a base class to a derived class, causing an assertion 123 failure if it is not really an instance of the right type. This should be 124 used in cases where you have some information that makes you believe that 125 something is of the right type. An example of the ``isa<>`` and ``cast<>`` 126 template is: 127 128 .. code-block:: c++ 129 130 static bool isLoopInvariant(const Value *V, const Loop *L) { 131 if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V)) 132 return true; 133 134 // Otherwise, it must be an instruction... 135 return !L->contains(cast<Instruction>(V)->getParent()); 136 } 137 138 Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``, 139 for that use the ``dyn_cast<>`` operator. 140 141 ``dyn_cast<>``: 142 The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see 143 if the operand is of the specified type, and if so, returns a pointer to it 144 (this operator does not work with references). If the operand is not of the 145 correct type, a null pointer is returned. Thus, this works very much like 146 the ``dynamic_cast<>`` operator in C++, and should be used in the same 147 circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if`` 148 statement or some other flow control statement like this: 149 150 .. code-block:: c++ 151 152 if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) { 153 // ... 154 } 155 156 This form of the ``if`` statement effectively combines together a call to 157 ``isa<>`` and a call to ``cast<>`` into one statement, which is very 158 convenient. 159 160 Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's 161 ``instanceof`` operator, can be abused. In particular, you should not use big 162 chained ``if/then/else`` blocks to check for lots of different variants of 163 classes. If you find yourself wanting to do this, it is much cleaner and more 164 efficient to use the ``InstVisitor`` class to dispatch over the instruction 165 type directly. 166 167 ``cast_or_null<>``: 168 The ``cast_or_null<>`` operator works just like the ``cast<>`` operator, 169 except that it allows for a null pointer as an argument (which it then 170 propagates). This can sometimes be useful, allowing you to combine several 171 null checks into one. 172 173 ``dyn_cast_or_null<>``: 174 The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>`` 175 operator, except that it allows for a null pointer as an argument (which it 176 then propagates). This can sometimes be useful, allowing you to combine 177 several null checks into one. 178 179 These five templates can be used with any classes, whether they have a v-table 180 or not. If you want to add support for these templates, see the document 181 :doc:`How to set up LLVM-style RTTI for your class hierarchy 182 <HowToSetUpLLVMStyleRTTI>` 183 184 .. _string_apis: 185 186 Passing strings (the ``StringRef`` and ``Twine`` classes) 187 --------------------------------------------------------- 188 189 Although LLVM generally does not do much string manipulation, we do have several 190 important APIs which take strings. Two important examples are the Value class 191 -- which has names for instructions, functions, etc. -- and the ``StringMap`` 192 class which is used extensively in LLVM and Clang. 193 194 These are generic classes, and they need to be able to accept strings which may 195 have embedded null characters. Therefore, they cannot simply take a ``const 196 char *``, and taking a ``const std::string&`` requires clients to perform a heap 197 allocation which is usually unnecessary. Instead, many LLVM APIs use a 198 ``StringRef`` or a ``const Twine&`` for passing strings efficiently. 199 200 .. _StringRef: 201 202 The ``StringRef`` class 203 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 204 205 The ``StringRef`` data type represents a reference to a constant string (a 206 character array and a length) and supports the common operations available on 207 ``std::string``, but does not require heap allocation. 208 209 It can be implicitly constructed using a C style null-terminated string, an 210 ``std::string``, or explicitly with a character pointer and length. For 211 example, the ``StringRef`` find function is declared as: 212 213 .. code-block:: c++ 214 215 iterator find(StringRef Key); 216 217 and clients can call it using any one of: 218 219 .. code-block:: c++ 220 221 Map.find("foo"); // Lookup "foo" 222 Map.find(std::string("bar")); // Lookup "bar" 223 Map.find(StringRef("\0baz", 4)); // Lookup "\0baz" 224 225 Similarly, APIs which need to return a string may return a ``StringRef`` 226 instance, which can be used directly or converted to an ``std::string`` using 227 the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen 228 <http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more 229 information. 230 231 You should rarely use the ``StringRef`` class directly, because it contains 232 pointers to external memory it is not generally safe to store an instance of the 233 class (unless you know that the external storage will not be freed). 234 ``StringRef`` is small and pervasive enough in LLVM that it should always be 235 passed by value. 236 237 The ``Twine`` class 238 ^^^^^^^^^^^^^^^^^^^ 239 240 The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__) 241 class is an efficient way for APIs to accept concatenated strings. For example, 242 a common LLVM paradigm is to name one instruction based on the name of another 243 instruction with a suffix, for example: 244 245 .. code-block:: c++ 246 247 New = CmpInst::Create(..., SO->getName() + ".cmp"); 248 249 The ``Twine`` class is effectively a lightweight `rope 250 <http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to 251 temporary (stack allocated) objects. Twines can be implicitly constructed as 252 the result of the plus operator applied to strings (i.e., a C strings, an 253 ``std::string``, or a ``StringRef``). The twine delays the actual concatenation 254 of strings until it is actually required, at which point it can be efficiently 255 rendered directly into a character array. This avoids unnecessary heap 256 allocation involved in constructing the temporary results of string 257 concatenation. See ``llvm/ADT/Twine.h`` (`doxygen 258 <http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>` 259 for more information. 260 261 As with a ``StringRef``, ``Twine`` objects point to external memory and should 262 almost never be stored or mentioned directly. They are intended solely for use 263 when defining a function which should be able to efficiently accept concatenated 264 strings. 265 266 .. _error_apis: 267 268 Error handling 269 -------------- 270 271 Proper error handling helps us identify bugs in our code, and helps end-users 272 understand errors in their tool usage. Errors fall into two broad categories: 273 *programmatic* and *recoverable*, with different strategies for handling and 274 reporting. 275 276 Programmatic Errors 277 ^^^^^^^^^^^^^^^^^^^ 278 279 Programmatic errors are violations of program invariants or API contracts, and 280 represent bugs within the program itself. Our aim is to document invariants, and 281 to abort quickly at the point of failure (providing some basic diagnostic) when 282 invariants are broken at runtime. 283 284 The fundamental tools for handling programmatic errors are assertions and the 285 llvm_unreachable function. Assertions are used to express invariant conditions, 286 and should include a message describing the invariant: 287 288 .. code-block:: c++ 289 290 assert(isPhysReg(R) && "All virt regs should have been allocated already."); 291 292 The llvm_unreachable function can be used to document areas of control flow 293 that should never be entered if the program invariants hold: 294 295 .. code-block:: c++ 296 297 enum { Foo, Bar, Baz } X = foo(); 298 299 switch (X) { 300 case Foo: /* Handle Foo */; break; 301 case Bar: /* Handle Bar */; break; 302 default: 303 llvm_unreachable("X should be Foo or Bar here"); 304 } 305 306 Recoverable Errors 307 ^^^^^^^^^^^^^^^^^^ 308 309 Recoverable errors represent an error in the program's environment, for example 310 a resource failure (a missing file, a dropped network connection, etc.), or 311 malformed input. These errors should be detected and communicated to a level of 312 the program where they can be handled appropriately. Handling the error may be 313 as simple as reporting the issue to the user, or it may involve attempts at 314 recovery. 315 316 Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme 317 represents errors using function return values, similar to classic C integer 318 error codes, or C++'s ``std::error_code``. However, the ``Error`` class is 319 actually a lightweight wrapper for user-defined error types, allowing arbitrary 320 information to be attached to describe the error. This is similar to the way C++ 321 exceptions allow throwing of user-defined types. 322 323 Success values are created by calling ``Error::success()``: 324 325 .. code-block:: c++ 326 327 Error foo() { 328 // Do something. 329 // Return success. 330 return Error::success(); 331 } 332 333 Success values are very cheap to construct and return - they have minimal 334 impact on program performance. 335 336 Failure values are constructed using ``make_error<T>``, where ``T`` is any class 337 that inherits from the ErrorInfo utility: 338 339 .. code-block:: c++ 340 341 class MyError : public ErrorInfo<MyError> { 342 public: 343 MyError(std::string Msg) : Msg(Msg) {} 344 void log(OStream &OS) const override { OS << "MyError - " << Msg; } 345 static char ID; 346 private: 347 std::string Msg; 348 }; 349 350 char MyError::ID = 0; // In MyError.cpp 351 352 Error bar() { 353 if (checkErrorCondition) 354 return make_error<MyError>("Error condition detected"); 355 356 // No error - proceed with bar. 357 358 // Return success value. 359 return Error::success(); 360 } 361 362 Error values can be implicitly converted to bool: true for error, false for 363 success, enabling the following idiom: 364 365 .. code-block:: c++ 366 367 Error mayFail(); 368 369 Error foo() { 370 if (auto Err = mayFail()) 371 return Err; 372 // Success! We can proceed. 373 ... 374 375 For functions that can fail but need to return a value the ``Expected<T>`` 376 utility can be used. Values of this type can be constructed with either a 377 ``T``, or a ``Error``. Expected<T> values are also implicitly convertible to 378 boolean, but with the opposite convention to Error: true for success, false for 379 error. If success, the ``T`` value can be accessed via the dereference operator. 380 If failure, the ``Error`` value can be extracted using the ``takeError()`` 381 method. Idiomatic usage looks like: 382 383 .. code-block:: c++ 384 385 Expected<float> parseAndSquareRoot(IStream &IS) { 386 float f; 387 OS >> f; 388 if (f < 0) 389 return make_error<FloatingPointError>(...); 390 return sqrt(f); 391 } 392 393 Error foo(IStream &IS) { 394 if (auto SqrtOrErr = parseAndSquartRoot(IS)) { 395 float Sqrt = *SqrtOrErr; 396 // ... 397 } else 398 return SqrtOrErr.takeError(); 399 } 400 401 All Error instances, whether success or failure, must be either checked or 402 moved from (via std::move or a return) before they are destructed. Accidentally 403 discarding an unchecked error will cause a program abort at the point where the 404 unchecked value's destructor is run, making it easy to identify and fix 405 violations of this rule. 406 407 Success values are considered checked once they have been tested (by invoking 408 the boolean conversion operator): 409 410 .. code-block:: c++ 411 412 if (auto Err = canFail(...)) 413 return Err; // Failure value - move error to caller. 414 415 // Safe to continue: Err was checked. 416 417 In contrast, the following code will always cause an abort, regardless of the 418 return value of ``foo``: 419 420 .. code-block:: c++ 421 422 canFail(); 423 // Program will always abort here, even if canFail() returns Success, since 424 // the value is not checked. 425 426 Failure values are considered checked once a handler for the error type has 427 been activated: 428 429 .. code-block:: c++ 430 431 auto Err = canFail(...); 432 if (auto Err2 = 433 handleErrors(std::move(Err), 434 [](std::unique_ptr<MyError> M) { 435 // Try to handle 'M'. If successful, return a success value from 436 // the handler. 437 if (tryToHandle(M)) 438 return Error::success(); 439 440 // We failed to handle 'M' - return it from the handler. 441 // This value will be passed back from catchErrors and 442 // wind up in Err2, where it will be returned from this function. 443 return Error(std::move(M)); 444 }))) 445 return Err2; 446 447 448 .. _function_apis: 449 450 More information on Error and its related utilities can be found in the 451 Error.h header file. 452 453 Passing functions and other callable objects 454 -------------------------------------------- 455 456 Sometimes you may want a function to be passed a callback object. In order to 457 support lambda expressions and other function objects, you should not use the 458 traditional C approach of taking a function pointer and an opaque cookie: 459 460 .. code-block:: c++ 461 462 void takeCallback(bool (*Callback)(Function *, void *), void *Cookie); 463 464 Instead, use one of the following approaches: 465 466 Function template 467 ^^^^^^^^^^^^^^^^^ 468 469 If you don't mind putting the definition of your function into a header file, 470 make it a function template that is templated on the callable type. 471 472 .. code-block:: c++ 473 474 template<typename Callable> 475 void takeCallback(Callable Callback) { 476 Callback(1, 2, 3); 477 } 478 479 The ``function_ref`` class template 480 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 481 482 The ``function_ref`` 483 (`doxygen <http://llvm.org/docs/doxygen/html/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class 484 template represents a reference to a callable object, templated over the type 485 of the callable. This is a good choice for passing a callback to a function, 486 if you don't need to hold onto the callback after the function returns. In this 487 way, ``function_ref`` is to ``std::function`` as ``StringRef`` is to 488 ``std::string``. 489 490 ``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from 491 any callable object that can be called with arguments of type ``Param1``, 492 ``Param2``, ..., and returns a value that can be converted to type ``Ret``. 493 For example: 494 495 .. code-block:: c++ 496 497 void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) { 498 for (BasicBlock &BB : *F) 499 if (Callback(&BB)) 500 return; 501 } 502 503 can be called using: 504 505 .. code-block:: c++ 506 507 visitBasicBlocks(F, [&](BasicBlock *BB) { 508 if (process(BB)) 509 return isEmpty(BB); 510 return false; 511 }); 512 513 Note that a ``function_ref`` object contains pointers to external memory, so it 514 is not generally safe to store an instance of the class (unless you know that 515 the external storage will not be freed). If you need this ability, consider 516 using ``std::function``. ``function_ref`` is small enough that it should always 517 be passed by value. 518 519 .. _DEBUG: 520 521 The ``DEBUG()`` macro and ``-debug`` option 522 ------------------------------------------- 523 524 Often when working on your pass you will put a bunch of debugging printouts and 525 other code into your pass. After you get it working, you want to remove it, but 526 you may need it again in the future (to work out new bugs that you run across). 527 528 Naturally, because of this, you don't want to delete the debug printouts, but 529 you don't want them to always be noisy. A standard compromise is to comment 530 them out, allowing you to enable them if you need them in the future. 531 532 The ``llvm/Support/Debug.h`` (`doxygen 533 <http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named 534 ``DEBUG()`` that is a much nicer solution to this problem. Basically, you can 535 put arbitrary code into the argument of the ``DEBUG`` macro, and it is only 536 executed if '``opt``' (or any other tool) is run with the '``-debug``' command 537 line argument: 538 539 .. code-block:: c++ 540 541 DEBUG(errs() << "I am here!\n"); 542 543 Then you can run your pass like this: 544 545 .. code-block:: none 546 547 $ opt < a.bc > /dev/null -mypass 548 <no output> 549 $ opt < a.bc > /dev/null -mypass -debug 550 I am here! 551 552 Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not 553 have to create "yet another" command line option for the debug output for your 554 pass. Note that ``DEBUG()`` macros are disabled for non-asserts builds, so they 555 do not cause a performance impact at all (for the same reason, they should also 556 not contain side-effects!). 557 558 One additional nice thing about the ``DEBUG()`` macro is that you can enable or 559 disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set 560 DebugFlag=1``" from the gdb if the program is running. If the program hasn't 561 been started yet, you can always just run it with ``-debug``. 562 563 .. _DEBUG_TYPE: 564 565 Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option 566 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 567 568 Sometimes you may find yourself in a situation where enabling ``-debug`` just 569 turns on **too much** information (such as when working on the code generator). 570 If you want to enable debug information with more fine-grained control, you 571 should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as 572 follows: 573 574 .. code-block:: c++ 575 576 #define DEBUG_TYPE "foo" 577 DEBUG(errs() << "'foo' debug type\n"); 578 #undef DEBUG_TYPE 579 #define DEBUG_TYPE "bar" 580 DEBUG(errs() << "'bar' debug type\n")); 581 #undef DEBUG_TYPE 582 583 Then you can run your pass like this: 584 585 .. code-block:: none 586 587 $ opt < a.bc > /dev/null -mypass 588 <no output> 589 $ opt < a.bc > /dev/null -mypass -debug 590 'foo' debug type 591 'bar' debug type 592 $ opt < a.bc > /dev/null -mypass -debug-only=foo 593 'foo' debug type 594 $ opt < a.bc > /dev/null -mypass -debug-only=bar 595 'bar' debug type 596 $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar 597 'foo' debug type 598 'bar' debug type 599 600 Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file, 601 to specify the debug type for the entire module. Be careful that you only do 602 this after including Debug.h and not around any #include of headers. Also, you 603 should use names more meaningful than "foo" and "bar", because there is no 604 system in place to ensure that names do not conflict. If two different modules 605 use the same string, they will all be turned on when the name is specified. 606 This allows, for example, all debug information for instruction scheduling to be 607 enabled with ``-debug-only=InstrSched``, even if the source lives in multiple 608 files. The name must not include a comma (,) as that is used to separate the 609 arguments of the ``-debug-only`` option. 610 611 For performance reasons, -debug-only is not available in optimized build 612 (``--enable-optimized``) of LLVM. 613 614 The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would 615 like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It 616 takes an additional first parameter, which is the type to use. For example, the 617 preceding example could be written as: 618 619 .. code-block:: c++ 620 621 DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); 622 DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n")); 623 624 .. _Statistic: 625 626 The ``Statistic`` class & ``-stats`` option 627 ------------------------------------------- 628 629 The ``llvm/ADT/Statistic.h`` (`doxygen 630 <http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class 631 named ``Statistic`` that is used as a unified way to keep track of what the LLVM 632 compiler is doing and how effective various optimizations are. It is useful to 633 see what optimizations are contributing to making a particular program run 634 faster. 635 636 Often you may run your pass on some big program, and you're interested to see 637 how many times it makes a certain transformation. Although you can do this with 638 hand inspection, or some ad-hoc method, this is a real pain and not very useful 639 for big programs. Using the ``Statistic`` class makes it very easy to keep 640 track of this information, and the calculated information is presented in a 641 uniform manner with the rest of the passes being executed. 642 643 There are many examples of ``Statistic`` uses, but the basics of using it are as 644 follows: 645 646 #. Define your statistic like this: 647 648 .. code-block:: c++ 649 650 #define DEBUG_TYPE "mypassname" // This goes before any #includes. 651 STATISTIC(NumXForms, "The # of times I did stuff"); 652 653 The ``STATISTIC`` macro defines a static variable, whose name is specified by 654 the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and 655 the description is taken from the second argument. The variable defined 656 ("NumXForms" in this case) acts like an unsigned integer. 657 658 #. Whenever you make a transformation, bump the counter: 659 660 .. code-block:: c++ 661 662 ++NumXForms; // I did stuff! 663 664 That's all you have to do. To get '``opt``' to print out the statistics 665 gathered, use the '``-stats``' option: 666 667 .. code-block:: none 668 669 $ opt -stats -mypassname < program.bc > /dev/null 670 ... statistics output ... 671 672 Note that in order to use the '``-stats``' option, LLVM must be 673 compiled with assertions enabled. 674 675 When running ``opt`` on a C file from the SPEC benchmark suite, it gives a 676 report that looks like this: 677 678 .. code-block:: none 679 680 7646 bitcodewriter - Number of normal instructions 681 725 bitcodewriter - Number of oversized instructions 682 129996 bitcodewriter - Number of bitcode bytes written 683 2817 raise - Number of insts DCEd or constprop'd 684 3213 raise - Number of cast-of-self removed 685 5046 raise - Number of expression trees converted 686 75 raise - Number of other getelementptr's formed 687 138 raise - Number of load/store peepholes 688 42 deadtypeelim - Number of unused typenames removed from symtab 689 392 funcresolve - Number of varargs functions resolved 690 27 globaldce - Number of global variables removed 691 2 adce - Number of basic blocks removed 692 134 cee - Number of branches revectored 693 49 cee - Number of setcc instruction eliminated 694 532 gcse - Number of loads removed 695 2919 gcse - Number of instructions removed 696 86 indvars - Number of canonical indvars added 697 87 indvars - Number of aux indvars removed 698 25 instcombine - Number of dead inst eliminate 699 434 instcombine - Number of insts combined 700 248 licm - Number of load insts hoisted 701 1298 licm - Number of insts hoisted to a loop pre-header 702 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header) 703 75 mem2reg - Number of alloca's promoted 704 1444 cfgsimplify - Number of blocks simplified 705 706 Obviously, with so many optimizations, having a unified framework for this stuff 707 is very nice. Making your pass fit well into the framework makes it more 708 maintainable and useful. 709 710 .. _ViewGraph: 711 712 Viewing graphs while debugging code 713 ----------------------------------- 714 715 Several of the important data structures in LLVM are graphs: for example CFGs 716 made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM 717 :ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection 718 DAGs <SelectionDAG>`. In many cases, while debugging various parts of the 719 compiler, it is nice to instantly visualize these graphs. 720 721 LLVM provides several callbacks that are available in a debug build to do 722 exactly that. If you call the ``Function::viewCFG()`` method, for example, the 723 current LLVM tool will pop up a window containing the CFG for the function where 724 each basic block is a node in the graph, and each node contains the instructions 725 in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does 726 not include the instructions), the ``MachineFunction::viewCFG()`` and 727 ``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()`` 728 methods. Within GDB, for example, you can usually use something like ``call 729 DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to 730 these functions in your code in places you want to debug. 731 732 Getting this to work requires a small amount of setup. On Unix systems 733 with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make 734 sure 'dot' and 'gv' are in your path. If you are running on Mac OS X, download 735 and install the Mac OS X `Graphviz program 736 <http://www.pixelglow.com/graphviz/>`_ and add 737 ``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to 738 your path. The programs need not be present when configuring, building or 739 running LLVM and can simply be installed when needed during an active debug 740 session. 741 742 ``SelectionDAG`` has been extended to make it easier to locate *interesting* 743 nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node, 744 "color")``, then the next ``call DAG.viewGraph()`` would highlight the node in 745 the specified color (choices of colors can be found at `colors 746 <http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes 747 can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can 748 be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.) 749 If you want to restart and clear all the current graph attributes, then you can 750 ``call DAG.clearGraphAttrs()``. 751 752 Note that graph visualization features are compiled out of Release builds to 753 reduce file size. This means that you need a Debug+Asserts or Release+Asserts 754 build to use these features. 755 756 .. _datastructure: 757 758 Picking the Right Data Structure for a Task 759 =========================================== 760 761 LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we 762 commonly use STL data structures. This section describes the trade-offs you 763 should consider when you pick one. 764 765 The first step is a choose your own adventure: do you want a sequential 766 container, a set-like container, or a map-like container? The most important 767 thing when choosing a container is the algorithmic properties of how you plan to 768 access the container. Based on that, you should use: 769 770 771 * a :ref:`map-like <ds_map>` container if you need efficient look-up of a 772 value based on another value. Map-like containers also support efficient 773 queries for containment (whether a key is in the map). Map-like containers 774 generally do not support efficient reverse mapping (values to keys). If you 775 need that, use two maps. Some map-like containers also support efficient 776 iteration through the keys in sorted order. Map-like containers are the most 777 expensive sort, only use them if you need one of these capabilities. 778 779 * a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into 780 a container that automatically eliminates duplicates. Some set-like 781 containers support efficient iteration through the elements in sorted order. 782 Set-like containers are more expensive than sequential containers. 783 784 * a :ref:`sequential <ds_sequential>` container provides the most efficient way 785 to add elements and keeps track of the order they are added to the collection. 786 They permit duplicates and support efficient iteration, but do not support 787 efficient look-up based on a key. 788 789 * a :ref:`string <ds_string>` container is a specialized sequential container or 790 reference structure that is used for character or byte arrays. 791 792 * a :ref:`bit <ds_bit>` container provides an efficient way to store and 793 perform set operations on sets of numeric id's, while automatically 794 eliminating duplicates. Bit containers require a maximum of 1 bit for each 795 identifier you want to store. 796 797 Once the proper category of container is determined, you can fine tune the 798 memory use, constant factors, and cache behaviors of access by intelligently 799 picking a member of the category. Note that constant factors and cache behavior 800 can be a big deal. If you have a vector that usually only contains a few 801 elements (but could contain many), for example, it's much better to use 802 :ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so 803 avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding 804 the elements to the container. 805 806 .. _ds_sequential: 807 808 Sequential Containers (std::vector, std::list, etc) 809 --------------------------------------------------- 810 811 There are a variety of sequential containers available for you, based on your 812 needs. Pick the first in this section that will do what you want. 813 814 .. _dss_arrayref: 815 816 llvm/ADT/ArrayRef.h 817 ^^^^^^^^^^^^^^^^^^^ 818 819 The ``llvm::ArrayRef`` class is the preferred class to use in an interface that 820 accepts a sequential list of elements in memory and just reads from them. By 821 taking an ``ArrayRef``, the API can be passed a fixed size array, an 822 ``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous 823 in memory. 824 825 .. _dss_fixedarrays: 826 827 Fixed Size Arrays 828 ^^^^^^^^^^^^^^^^^ 829 830 Fixed size arrays are very simple and very fast. They are good if you know 831 exactly how many elements you have, or you have a (low) upper bound on how many 832 you have. 833 834 .. _dss_heaparrays: 835 836 Heap Allocated Arrays 837 ^^^^^^^^^^^^^^^^^^^^^ 838 839 Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good 840 if the number of elements is variable, if you know how many elements you will 841 need before the array is allocated, and if the array is usually large (if not, 842 consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated 843 array is the cost of the new/delete (aka malloc/free). Also note that if you 844 are allocating an array of a type with a constructor, the constructor and 845 destructors will be run for every element in the array (re-sizable vectors only 846 construct those elements actually used). 847 848 .. _dss_tinyptrvector: 849 850 llvm/ADT/TinyPtrVector.h 851 ^^^^^^^^^^^^^^^^^^^^^^^^ 852 853 ``TinyPtrVector<Type>`` is a highly specialized collection class that is 854 optimized to avoid allocation in the case when a vector has zero or one 855 elements. It has two major restrictions: 1) it can only hold values of pointer 856 type, and 2) it cannot hold a null pointer. 857 858 Since this container is highly specialized, it is rarely used. 859 860 .. _dss_smallvector: 861 862 llvm/ADT/SmallVector.h 863 ^^^^^^^^^^^^^^^^^^^^^^ 864 865 ``SmallVector<Type, N>`` is a simple class that looks and smells just like 866 ``vector<Type>``: it supports efficient iteration, lays out elements in memory 867 order (so you can do pointer arithmetic between elements), supports efficient 868 push_back/pop_back operations, supports efficient random access to its elements, 869 etc. 870 871 The advantage of SmallVector is that it allocates space for some number of 872 elements (N) **in the object itself**. Because of this, if the SmallVector is 873 dynamically smaller than N, no malloc is performed. This can be a big win in 874 cases where the malloc/free call is far more expensive than the code that 875 fiddles around with the elements. 876 877 This is good for vectors that are "usually small" (e.g. the number of 878 predecessors/successors of a block is usually less than 8). On the other hand, 879 this makes the size of the SmallVector itself large, so you don't want to 880 allocate lots of them (doing so will waste a lot of space). As such, 881 SmallVectors are most useful when on the stack. 882 883 SmallVector also provides a nice portable and efficient replacement for 884 ``alloca``. 885 886 .. note:: 887 888 Prefer to use ``SmallVectorImpl<T>`` as a parameter type. 889 890 In APIs that don't care about the "small size" (most?), prefer to use 891 the ``SmallVectorImpl<T>`` class, which is basically just the "vector 892 header" (and methods) without the elements allocated after it. Note that 893 ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the 894 conversion is implicit and costs nothing. E.g. 895 896 .. code-block:: c++ 897 898 // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>. 899 hardcodedSmallSize(SmallVector<Foo, 2> &Out); 900 // GOOD: Clients can pass any SmallVector<Foo, N>. 901 allowsAnySmallSize(SmallVectorImpl<Foo> &Out); 902 903 void someFunc() { 904 SmallVector<Foo, 8> Vec; 905 hardcodedSmallSize(Vec); // Error. 906 allowsAnySmallSize(Vec); // Works. 907 } 908 909 Even though it has "``Impl``" in the name, this is so widely used that 910 it really isn't "private to the implementation" anymore. A name like 911 ``SmallVectorHeader`` would be more appropriate. 912 913 .. _dss_vector: 914 915 <vector> 916 ^^^^^^^^ 917 918 ``std::vector`` is well loved and respected. It is useful when SmallVector 919 isn't: when the size of the vector is often large (thus the small optimization 920 will rarely be a benefit) or if you will be allocating many instances of the 921 vector itself (which would waste space for elements that aren't in the 922 container). vector is also useful when interfacing with code that expects 923 vectors :). 924 925 One worthwhile note about std::vector: avoid code like this: 926 927 .. code-block:: c++ 928 929 for ( ... ) { 930 std::vector<foo> V; 931 // make use of V. 932 } 933 934 Instead, write this as: 935 936 .. code-block:: c++ 937 938 std::vector<foo> V; 939 for ( ... ) { 940 // make use of V. 941 V.clear(); 942 } 943 944 Doing so will save (at least) one heap allocation and free per iteration of the 945 loop. 946 947 .. _dss_deque: 948 949 <deque> 950 ^^^^^^^ 951 952 ``std::deque`` is, in some senses, a generalized version of ``std::vector``. 953 Like ``std::vector``, it provides constant time random access and other similar 954 properties, but it also provides efficient access to the front of the list. It 955 does not guarantee continuity of elements within memory. 956 957 In exchange for this extra flexibility, ``std::deque`` has significantly higher 958 constant factor costs than ``std::vector``. If possible, use ``std::vector`` or 959 something cheaper. 960 961 .. _dss_list: 962 963 <list> 964 ^^^^^^ 965 966 ``std::list`` is an extremely inefficient class that is rarely useful. It 967 performs a heap allocation for every element inserted into it, thus having an 968 extremely high constant factor, particularly for small data types. 969 ``std::list`` also only supports bidirectional iteration, not random access 970 iteration. 971 972 In exchange for this high cost, std::list supports efficient access to both ends 973 of the list (like ``std::deque``, but unlike ``std::vector`` or 974 ``SmallVector``). In addition, the iterator invalidation characteristics of 975 std::list are stronger than that of a vector class: inserting or removing an 976 element into the list does not invalidate iterator or pointers to other elements 977 in the list. 978 979 .. _dss_ilist: 980 981 llvm/ADT/ilist.h 982 ^^^^^^^^^^^^^^^^ 983 984 ``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive, 985 because it requires the element to store and provide access to the prev/next 986 pointers for the list. 987 988 ``ilist`` has the same drawbacks as ``std::list``, and additionally requires an 989 ``ilist_traits`` implementation for the element type, but it provides some novel 990 characteristics. In particular, it can efficiently store polymorphic objects, 991 the traits class is informed when an element is inserted or removed from the 992 list, and ``ilist``\ s are guaranteed to support a constant-time splice 993 operation. 994 995 These properties are exactly what we want for things like ``Instruction``\ s and 996 basic blocks, which is why these are implemented with ``ilist``\ s. 997 998 Related classes of interest are explained in the following subsections: 999 1000 * :ref:`ilist_traits <dss_ilist_traits>` 1001 1002 * :ref:`iplist <dss_iplist>` 1003 1004 * :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>` 1005 1006 * :ref:`Sentinels <dss_ilist_sentinel>` 1007 1008 .. _dss_packedvector: 1009 1010 llvm/ADT/PackedVector.h 1011 ^^^^^^^^^^^^^^^^^^^^^^^ 1012 1013 Useful for storing a vector of values using only a few number of bits for each 1014 value. Apart from the standard operations of a vector-like container, it can 1015 also perform an 'or' set operation. 1016 1017 For example: 1018 1019 .. code-block:: c++ 1020 1021 enum State { 1022 None = 0x0, 1023 FirstCondition = 0x1, 1024 SecondCondition = 0x2, 1025 Both = 0x3 1026 }; 1027 1028 State get() { 1029 PackedVector<State, 2> Vec1; 1030 Vec1.push_back(FirstCondition); 1031 1032 PackedVector<State, 2> Vec2; 1033 Vec2.push_back(SecondCondition); 1034 1035 Vec1 |= Vec2; 1036 return Vec1[0]; // returns 'Both'. 1037 } 1038 1039 .. _dss_ilist_traits: 1040 1041 ilist_traits 1042 ^^^^^^^^^^^^ 1043 1044 ``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>`` 1045 (and consequently ``ilist<T>``) publicly derive from this traits class. 1046 1047 .. _dss_iplist: 1048 1049 iplist 1050 ^^^^^^ 1051 1052 ``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower 1053 interface. Notably, inserters from ``T&`` are absent. 1054 1055 ``ilist_traits<T>`` is a public base of this class and can be used for a wide 1056 variety of customizations. 1057 1058 .. _dss_ilist_node: 1059 1060 llvm/ADT/ilist_node.h 1061 ^^^^^^^^^^^^^^^^^^^^^ 1062 1063 ``ilist_node<T>`` implements the forward and backward links that are expected 1064 by the ``ilist<T>`` (and analogous containers) in the default manner. 1065 1066 ``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually 1067 ``T`` publicly derives from ``ilist_node<T>``. 1068 1069 .. _dss_ilist_sentinel: 1070 1071 Sentinels 1072 ^^^^^^^^^ 1073 1074 ``ilist``\ s have another specialty that must be considered. To be a good 1075 citizen in the C++ ecosystem, it needs to support the standard container 1076 operations, such as ``begin`` and ``end`` iterators, etc. Also, the 1077 ``operator--`` must work correctly on the ``end`` iterator in the case of 1078 non-empty ``ilist``\ s. 1079 1080 The only sensible solution to this problem is to allocate a so-called *sentinel* 1081 along with the intrusive list, which serves as the ``end`` iterator, providing 1082 the back-link to the last element. However conforming to the C++ convention it 1083 is illegal to ``operator++`` beyond the sentinel and it also must not be 1084 dereferenced. 1085 1086 These constraints allow for some implementation freedom to the ``ilist`` how to 1087 allocate and store the sentinel. The corresponding policy is dictated by 1088 ``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need 1089 for a sentinel arises. 1090 1091 While the default policy is sufficient in most cases, it may break down when 1092 ``T`` does not provide a default constructor. Also, in the case of many 1093 instances of ``ilist``\ s, the memory overhead of the associated sentinels is 1094 wasted. To alleviate the situation with numerous and voluminous 1095 ``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*. 1096 1097 Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which 1098 superpose the sentinel with the ``ilist`` instance in memory. Pointer 1099 arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s 1100 ``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves 1101 as the back-link of the sentinel. This is the only field in the ghostly 1102 sentinel which can be legally accessed. 1103 1104 .. _dss_other: 1105 1106 Other Sequential Container options 1107 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1108 1109 Other STL containers are available, such as ``std::string``. 1110 1111 There are also various STL adapter classes such as ``std::queue``, 1112 ``std::priority_queue``, ``std::stack``, etc. These provide simplified access 1113 to an underlying container but don't affect the cost of the container itself. 1114 1115 .. _ds_string: 1116 1117 String-like containers 1118 ---------------------- 1119 1120 There are a variety of ways to pass around and use strings in C and C++, and 1121 LLVM adds a few new options to choose from. Pick the first option on this list 1122 that will do what you need, they are ordered according to their relative cost. 1123 1124 Note that it is generally preferred to *not* pass strings around as ``const 1125 char*``'s. These have a number of problems, including the fact that they 1126 cannot represent embedded nul ("\0") characters, and do not have a length 1127 available efficiently. The general replacement for '``const char*``' is 1128 StringRef. 1129 1130 For more information on choosing string containers for APIs, please see 1131 :ref:`Passing Strings <string_apis>`. 1132 1133 .. _dss_stringref: 1134 1135 llvm/ADT/StringRef.h 1136 ^^^^^^^^^^^^^^^^^^^^ 1137 1138 The StringRef class is a simple value class that contains a pointer to a 1139 character and a length, and is quite related to the :ref:`ArrayRef 1140 <dss_arrayref>` class (but specialized for arrays of characters). Because 1141 StringRef carries a length with it, it safely handles strings with embedded nul 1142 characters in it, getting the length does not require a strlen call, and it even 1143 has very convenient APIs for slicing and dicing the character range that it 1144 represents. 1145 1146 StringRef is ideal for passing simple strings around that are known to be live, 1147 either because they are C string literals, std::string, a C array, or a 1148 SmallVector. Each of these cases has an efficient implicit conversion to 1149 StringRef, which doesn't result in a dynamic strlen being executed. 1150 1151 StringRef has a few major limitations which make more powerful string containers 1152 useful: 1153 1154 #. You cannot directly convert a StringRef to a 'const char*' because there is 1155 no way to add a trailing nul (unlike the .c_str() method on various stronger 1156 classes). 1157 1158 #. StringRef doesn't own or keep alive the underlying string bytes. 1159 As such it can easily lead to dangling pointers, and is not suitable for 1160 embedding in datastructures in most cases (instead, use an std::string or 1161 something like that). 1162 1163 #. For the same reason, StringRef cannot be used as the return value of a 1164 method if the method "computes" the result string. Instead, use std::string. 1165 1166 #. StringRef's do not allow you to mutate the pointed-to string bytes and it 1167 doesn't allow you to insert or remove bytes from the range. For editing 1168 operations like this, it interoperates with the :ref:`Twine <dss_twine>` 1169 class. 1170 1171 Because of its strengths and limitations, it is very common for a function to 1172 take a StringRef and for a method on an object to return a StringRef that points 1173 into some string that it owns. 1174 1175 .. _dss_twine: 1176 1177 llvm/ADT/Twine.h 1178 ^^^^^^^^^^^^^^^^ 1179 1180 The Twine class is used as an intermediary datatype for APIs that want to take a 1181 string that can be constructed inline with a series of concatenations. Twine 1182 works by forming recursive instances of the Twine datatype (a simple value 1183 object) on the stack as temporary objects, linking them together into a tree 1184 which is then linearized when the Twine is consumed. Twine is only safe to use 1185 as the argument to a function, and should always be a const reference, e.g.: 1186 1187 .. code-block:: c++ 1188 1189 void foo(const Twine &T); 1190 ... 1191 StringRef X = ... 1192 unsigned i = ... 1193 foo(X + "." + Twine(i)); 1194 1195 This example forms a string like "blarg.42" by concatenating the values 1196 together, and does not form intermediate strings containing "blarg" or "blarg.". 1197 1198 Because Twine is constructed with temporary objects on the stack, and because 1199 these instances are destroyed at the end of the current statement, it is an 1200 inherently dangerous API. For example, this simple variant contains undefined 1201 behavior and will probably crash: 1202 1203 .. code-block:: c++ 1204 1205 void foo(const Twine &T); 1206 ... 1207 StringRef X = ... 1208 unsigned i = ... 1209 const Twine &Tmp = X + "." + Twine(i); 1210 foo(Tmp); 1211 1212 ... because the temporaries are destroyed before the call. That said, Twine's 1213 are much more efficient than intermediate std::string temporaries, and they work 1214 really well with StringRef. Just be aware of their limitations. 1215 1216 .. _dss_smallstring: 1217 1218 llvm/ADT/SmallString.h 1219 ^^^^^^^^^^^^^^^^^^^^^^ 1220 1221 SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some 1222 convenience APIs like += that takes StringRef's. SmallString avoids allocating 1223 memory in the case when the preallocated space is enough to hold its data, and 1224 it calls back to general heap allocation when required. Since it owns its data, 1225 it is very safe to use and supports full mutation of the string. 1226 1227 Like SmallVector's, the big downside to SmallString is their sizeof. While they 1228 are optimized for small strings, they themselves are not particularly small. 1229 This means that they work great for temporary scratch buffers on the stack, but 1230 should not generally be put into the heap: it is very rare to see a SmallString 1231 as the member of a frequently-allocated heap data structure or returned 1232 by-value. 1233 1234 .. _dss_stdstring: 1235 1236 std::string 1237 ^^^^^^^^^^^ 1238 1239 The standard C++ std::string class is a very general class that (like 1240 SmallString) owns its underlying data. sizeof(std::string) is very reasonable 1241 so it can be embedded into heap data structures and returned by-value. On the 1242 other hand, std::string is highly inefficient for inline editing (e.g. 1243 concatenating a bunch of stuff together) and because it is provided by the 1244 standard library, its performance characteristics depend a lot of the host 1245 standard library (e.g. libc++ and MSVC provide a highly optimized string class, 1246 GCC contains a really slow implementation). 1247 1248 The major disadvantage of std::string is that almost every operation that makes 1249 them larger can allocate memory, which is slow. As such, it is better to use 1250 SmallVector or Twine as a scratch buffer, but then use std::string to persist 1251 the result. 1252 1253 .. _ds_set: 1254 1255 Set-Like Containers (std::set, SmallSet, SetVector, etc) 1256 -------------------------------------------------------- 1257 1258 Set-like containers are useful when you need to canonicalize multiple values 1259 into a single representation. There are several different choices for how to do 1260 this, providing various trade-offs. 1261 1262 .. _dss_sortedvectorset: 1263 1264 A sorted 'vector' 1265 ^^^^^^^^^^^^^^^^^ 1266 1267 If you intend to insert a lot of elements, then do a lot of queries, a great 1268 approach is to use a vector (or other sequential container) with 1269 std::sort+std::unique to remove duplicates. This approach works really well if 1270 your usage pattern has these two distinct phases (insert then query), and can be 1271 coupled with a good choice of :ref:`sequential container <ds_sequential>`. 1272 1273 This combination provides the several nice properties: the result data is 1274 contiguous in memory (good for cache locality), has few allocations, is easy to 1275 address (iterators in the final vector are just indices or pointers), and can be 1276 efficiently queried with a standard binary search (e.g. 1277 ``std::lower_bound``; if you want the whole range of elements comparing 1278 equal, use ``std::equal_range``). 1279 1280 .. _dss_smallset: 1281 1282 llvm/ADT/SmallSet.h 1283 ^^^^^^^^^^^^^^^^^^^ 1284 1285 If you have a set-like data structure that is usually small and whose elements 1286 are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has 1287 space for N elements in place (thus, if the set is dynamically smaller than N, 1288 no malloc traffic is required) and accesses them with a simple linear search. 1289 When the set grows beyond N elements, it allocates a more expensive 1290 representation that guarantees efficient access (for most types, it falls back 1291 to :ref:`std::set <dss_set>`, but for pointers it uses something far better, 1292 :ref:`SmallPtrSet <dss_smallptrset>`. 1293 1294 The magic of this class is that it handles small sets extremely efficiently, but 1295 gracefully handles extremely large sets without loss of efficiency. The 1296 drawback is that the interface is quite small: it supports insertion, queries 1297 and erasing, but does not support iteration. 1298 1299 .. _dss_smallptrset: 1300 1301 llvm/ADT/SmallPtrSet.h 1302 ^^^^^^^^^^^^^^^^^^^^^^ 1303 1304 ``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of 1305 pointers is transparently implemented with a ``SmallPtrSet``), but also supports 1306 iterators. If more than N insertions are performed, a single quadratically 1307 probed hash table is allocated and grows as needed, providing extremely 1308 efficient access (constant time insertion/deleting/queries with low constant 1309 factors) and is very stingy with malloc traffic. 1310 1311 Note that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet`` 1312 are invalidated whenever an insertion occurs. Also, the values visited by the 1313 iterators are not visited in sorted order. 1314 1315 .. _dss_stringset: 1316 1317 llvm/ADT/StringSet.h 1318 ^^^^^^^^^^^^^^^^^^^^ 1319 1320 ``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`, 1321 and it allows efficient storage and retrieval of unique strings. 1322 1323 Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports 1324 iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you 1325 need to call ``i->getKey()`` to access the item of the StringSet.) On the 1326 other hand, ``StringSet`` doesn't support range-insertion and 1327 copy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet 1328 <dss_smallptrset>` do support. 1329 1330 .. _dss_denseset: 1331 1332 llvm/ADT/DenseSet.h 1333 ^^^^^^^^^^^^^^^^^^^ 1334 1335 DenseSet is a simple quadratically probed hash table. It excels at supporting 1336 small values: it uses a single allocation to hold all of the pairs that are 1337 currently inserted in the set. DenseSet is a great way to unique small values 1338 that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for 1339 pointers). Note that DenseSet has the same requirements for the value type that 1340 :ref:`DenseMap <dss_densemap>` has. 1341 1342 .. _dss_sparseset: 1343 1344 llvm/ADT/SparseSet.h 1345 ^^^^^^^^^^^^^^^^^^^^ 1346 1347 SparseSet holds a small number of objects identified by unsigned keys of 1348 moderate size. It uses a lot of memory, but provides operations that are almost 1349 as fast as a vector. Typical keys are physical registers, virtual registers, or 1350 numbered basic blocks. 1351 1352 SparseSet is useful for algorithms that need very fast clear/find/insert/erase 1353 and fast iteration over small sets. It is not intended for building composite 1354 data structures. 1355 1356 .. _dss_sparsemultiset: 1357 1358 llvm/ADT/SparseMultiSet.h 1359 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1360 1361 SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's 1362 desirable attributes. Like SparseSet, it typically uses a lot of memory, but 1363 provides operations that are almost as fast as a vector. Typical keys are 1364 physical registers, virtual registers, or numbered basic blocks. 1365 1366 SparseMultiSet is useful for algorithms that need very fast 1367 clear/find/insert/erase of the entire collection, and iteration over sets of 1368 elements sharing a key. It is often a more efficient choice than using composite 1369 data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for 1370 building composite data structures. 1371 1372 .. _dss_FoldingSet: 1373 1374 llvm/ADT/FoldingSet.h 1375 ^^^^^^^^^^^^^^^^^^^^^ 1376 1377 FoldingSet is an aggregate class that is really good at uniquing 1378 expensive-to-create or polymorphic objects. It is a combination of a chained 1379 hash table with intrusive links (uniqued objects are required to inherit from 1380 FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID 1381 process. 1382 1383 Consider a case where you want to implement a "getOrCreateFoo" method for a 1384 complex object (for example, a node in the code generator). The client has a 1385 description of **what** it wants to generate (it knows the opcode and all the 1386 operands), but we don't want to 'new' a node, then try inserting it into a set 1387 only to find out it already exists, at which point we would have to delete it 1388 and return the node that already exists. 1389 1390 To support this style of client, FoldingSet perform a query with a 1391 FoldingSetNodeID (which wraps SmallVector) that can be used to describe the 1392 element that we want to query for. The query either returns the element 1393 matching the ID or it returns an opaque ID that indicates where insertion should 1394 take place. Construction of the ID usually does not require heap traffic. 1395 1396 Because FoldingSet uses intrusive links, it can support polymorphic objects in 1397 the set (for example, you can have SDNode instances mixed with LoadSDNodes). 1398 Because the elements are individually allocated, pointers to the elements are 1399 stable: inserting or removing elements does not invalidate any pointers to other 1400 elements. 1401 1402 .. _dss_set: 1403 1404 <set> 1405 ^^^^^ 1406 1407 ``std::set`` is a reasonable all-around set class, which is decent at many 1408 things but great at nothing. std::set allocates memory for each element 1409 inserted (thus it is very malloc intensive) and typically stores three pointers 1410 per element in the set (thus adding a large amount of per-element space 1411 overhead). It offers guaranteed log(n) performance, which is not particularly 1412 fast from a complexity standpoint (particularly if the elements of the set are 1413 expensive to compare, like strings), and has extremely high constant factors for 1414 lookup, insertion and removal. 1415 1416 The advantages of std::set are that its iterators are stable (deleting or 1417 inserting an element from the set does not affect iterators or pointers to other 1418 elements) and that iteration over the set is guaranteed to be in sorted order. 1419 If the elements in the set are large, then the relative overhead of the pointers 1420 and malloc traffic is not a big deal, but if the elements of the set are small, 1421 std::set is almost never a good choice. 1422 1423 .. _dss_setvector: 1424 1425 llvm/ADT/SetVector.h 1426 ^^^^^^^^^^^^^^^^^^^^ 1427 1428 LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a 1429 set-like container along with a :ref:`Sequential Container <ds_sequential>` The 1430 important property that this provides is efficient insertion with uniquing 1431 (duplicate elements are ignored) with iteration support. It implements this by 1432 inserting elements into both a set-like container and the sequential container, 1433 using the set-like container for uniquing and the sequential container for 1434 iteration. 1435 1436 The difference between SetVector and other sets is that the order of iteration 1437 is guaranteed to match the order of insertion into the SetVector. This property 1438 is really important for things like sets of pointers. Because pointer values 1439 are non-deterministic (e.g. vary across runs of the program on different 1440 machines), iterating over the pointers in the set will not be in a well-defined 1441 order. 1442 1443 The drawback of SetVector is that it requires twice as much space as a normal 1444 set and has the sum of constant factors from the set-like container and the 1445 sequential container that it uses. Use it **only** if you need to iterate over 1446 the elements in a deterministic order. SetVector is also expensive to delete 1447 elements out of (linear time), unless you use its "pop_back" method, which is 1448 faster. 1449 1450 ``SetVector`` is an adapter class that defaults to using ``std::vector`` and a 1451 size 16 ``SmallSet`` for the underlying containers, so it is quite expensive. 1452 However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class, 1453 which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size. 1454 If you use this, and if your sets are dynamically smaller than ``N``, you will 1455 save a lot of heap traffic. 1456 1457 .. _dss_uniquevector: 1458 1459 llvm/ADT/UniqueVector.h 1460 ^^^^^^^^^^^^^^^^^^^^^^^ 1461 1462 UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a 1463 unique ID for each element inserted into the set. It internally contains a map 1464 and a vector, and it assigns a unique ID for each value inserted into the set. 1465 1466 UniqueVector is very expensive: its cost is the sum of the cost of maintaining 1467 both the map and vector, it has high complexity, high constant factors, and 1468 produces a lot of malloc traffic. It should be avoided. 1469 1470 .. _dss_immutableset: 1471 1472 llvm/ADT/ImmutableSet.h 1473 ^^^^^^^^^^^^^^^^^^^^^^^ 1474 1475 ImmutableSet is an immutable (functional) set implementation based on an AVL 1476 tree. Adding or removing elements is done through a Factory object and results 1477 in the creation of a new ImmutableSet object. If an ImmutableSet already exists 1478 with the given contents, then the existing one is returned; equality is compared 1479 with a FoldingSetNodeID. The time and space complexity of add or remove 1480 operations is logarithmic in the size of the original set. 1481 1482 There is no method for returning an element of the set, you can only check for 1483 membership. 1484 1485 .. _dss_otherset: 1486 1487 Other Set-Like Container Options 1488 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1489 1490 The STL provides several other options, such as std::multiset and the various 1491 "hash_set" like containers (whether from C++ TR1 or from the SGI library). We 1492 never use hash_set and unordered_set because they are generally very expensive 1493 (each insertion requires a malloc) and very non-portable. 1494 1495 std::multiset is useful if you're not interested in elimination of duplicates, 1496 but has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector 1497 (where you don't delete duplicate entries) or some other approach is almost 1498 always better. 1499 1500 .. _ds_map: 1501 1502 Map-Like Containers (std::map, DenseMap, etc) 1503 --------------------------------------------- 1504 1505 Map-like containers are useful when you want to associate data to a key. As 1506 usual, there are a lot of different ways to do this. :) 1507 1508 .. _dss_sortedvectormap: 1509 1510 A sorted 'vector' 1511 ^^^^^^^^^^^^^^^^^ 1512 1513 If your usage pattern follows a strict insert-then-query approach, you can 1514 trivially use the same approach as :ref:`sorted vectors for set-like containers 1515 <dss_sortedvectorset>`. The only difference is that your query function (which 1516 uses std::lower_bound to get efficient log(n) lookup) should only compare the 1517 key, not both the key and value. This yields the same advantages as sorted 1518 vectors for sets. 1519 1520 .. _dss_stringmap: 1521 1522 llvm/ADT/StringMap.h 1523 ^^^^^^^^^^^^^^^^^^^^ 1524 1525 Strings are commonly used as keys in maps, and they are difficult to support 1526 efficiently: they are variable length, inefficient to hash and compare when 1527 long, expensive to copy, etc. StringMap is a specialized container designed to 1528 cope with these issues. It supports mapping an arbitrary range of bytes to an 1529 arbitrary other object. 1530 1531 The StringMap implementation uses a quadratically-probed hash table, where the 1532 buckets store a pointer to the heap allocated entries (and some other stuff). 1533 The entries in the map must be heap allocated because the strings are variable 1534 length. The string data (key) and the element object (value) are stored in the 1535 same allocation with the string data immediately after the element object. 1536 This container guarantees the "``(char*)(&Value+1)``" points to the key string 1537 for a value. 1538 1539 The StringMap is very fast for several reasons: quadratic probing is very cache 1540 efficient for lookups, the hash value of strings in buckets is not recomputed 1541 when looking up an element, StringMap rarely has to touch the memory for 1542 unrelated objects when looking up a value (even when hash collisions happen), 1543 hash table growth does not recompute the hash values for strings already in the 1544 table, and each pair in the map is store in a single allocation (the string data 1545 is stored in the same allocation as the Value of a pair). 1546 1547 StringMap also provides query methods that take byte ranges, so it only ever 1548 copies a string if a value is inserted into the table. 1549 1550 StringMap iteratation order, however, is not guaranteed to be deterministic, so 1551 any uses which require that should instead use a std::map. 1552 1553 .. _dss_indexmap: 1554 1555 llvm/ADT/IndexedMap.h 1556 ^^^^^^^^^^^^^^^^^^^^^ 1557 1558 IndexedMap is a specialized container for mapping small dense integers (or 1559 values that can be mapped to small dense integers) to some other type. It is 1560 internally implemented as a vector with a mapping function that maps the keys 1561 to the dense integer range. 1562 1563 This is useful for cases like virtual registers in the LLVM code generator: they 1564 have a dense mapping that is offset by a compile-time constant (the first 1565 virtual register ID). 1566 1567 .. _dss_densemap: 1568 1569 llvm/ADT/DenseMap.h 1570 ^^^^^^^^^^^^^^^^^^^ 1571 1572 DenseMap is a simple quadratically probed hash table. It excels at supporting 1573 small keys and values: it uses a single allocation to hold all of the pairs 1574 that are currently inserted in the map. DenseMap is a great way to map 1575 pointers to pointers, or map other small types to each other. 1576 1577 There are several aspects of DenseMap that you should be aware of, however. 1578 The iterators in a DenseMap are invalidated whenever an insertion occurs, 1579 unlike map. Also, because DenseMap allocates space for a large number of 1580 key/value pairs (it starts with 64 by default), it will waste a lot of space if 1581 your keys or values are large. Finally, you must implement a partial 1582 specialization of DenseMapInfo for the key that you want, if it isn't already 1583 supported. This is required to tell DenseMap about two special marker values 1584 (which can never be inserted into the map) that it needs internally. 1585 1586 DenseMap's find_as() method supports lookup operations using an alternate key 1587 type. This is useful in cases where the normal key type is expensive to 1588 construct, but cheap to compare against. The DenseMapInfo is responsible for 1589 defining the appropriate comparison and hashing methods for each alternate key 1590 type used. 1591 1592 .. _dss_valuemap: 1593 1594 llvm/IR/ValueMap.h 1595 ^^^^^^^^^^^^^^^^^^^ 1596 1597 ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping 1598 ``Value*``\ s (or subclasses) to another type. When a Value is deleted or 1599 RAUW'ed, ValueMap will update itself so the new version of the key is mapped to 1600 the same value, just as if the key were a WeakVH. You can configure exactly how 1601 this happens, and what else happens on these two events, by passing a ``Config`` 1602 parameter to the ValueMap template. 1603 1604 .. _dss_intervalmap: 1605 1606 llvm/ADT/IntervalMap.h 1607 ^^^^^^^^^^^^^^^^^^^^^^ 1608 1609 IntervalMap is a compact map for small keys and values. It maps key intervals 1610 instead of single keys, and it will automatically coalesce adjacent intervals. 1611 When the map only contains a few intervals, they are stored in the map object 1612 itself to avoid allocations. 1613 1614 The IntervalMap iterators are quite big, so they should not be passed around as 1615 STL iterators. The heavyweight iterators allow a smaller data structure. 1616 1617 .. _dss_map: 1618 1619 <map> 1620 ^^^^^ 1621 1622 std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a 1623 single allocation per pair inserted into the map, it offers log(n) lookup with 1624 an extremely large constant factor, imposes a space penalty of 3 pointers per 1625 pair in the map, etc. 1626 1627 std::map is most useful when your keys or values are very large, if you need to 1628 iterate over the collection in sorted order, or if you need stable iterators 1629 into the map (i.e. they don't get invalidated if an insertion or deletion of 1630 another element takes place). 1631 1632 .. _dss_mapvector: 1633 1634 llvm/ADT/MapVector.h 1635 ^^^^^^^^^^^^^^^^^^^^ 1636 1637 ``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The 1638 main difference is that the iteration order is guaranteed to be the insertion 1639 order, making it an easy (but somewhat expensive) solution for non-deterministic 1640 iteration over maps of pointers. 1641 1642 It is implemented by mapping from key to an index in a vector of key,value 1643 pairs. This provides fast lookup and iteration, but has two main drawbacks: 1644 the key is stored twice and removing elements takes linear time. If it is 1645 necessary to remove elements, it's best to remove them in bulk using 1646 ``remove_if()``. 1647 1648 .. _dss_inteqclasses: 1649 1650 llvm/ADT/IntEqClasses.h 1651 ^^^^^^^^^^^^^^^^^^^^^^^ 1652 1653 IntEqClasses provides a compact representation of equivalence classes of small 1654 integers. Initially, each integer in the range 0..n-1 has its own equivalence 1655 class. Classes can be joined by passing two class representatives to the 1656 join(a, b) method. Two integers are in the same class when findLeader() returns 1657 the same representative. 1658 1659 Once all equivalence classes are formed, the map can be compressed so each 1660 integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m 1661 is the total number of equivalence classes. The map must be uncompressed before 1662 it can be edited again. 1663 1664 .. _dss_immutablemap: 1665 1666 llvm/ADT/ImmutableMap.h 1667 ^^^^^^^^^^^^^^^^^^^^^^^ 1668 1669 ImmutableMap is an immutable (functional) map implementation based on an AVL 1670 tree. Adding or removing elements is done through a Factory object and results 1671 in the creation of a new ImmutableMap object. If an ImmutableMap already exists 1672 with the given key set, then the existing one is returned; equality is compared 1673 with a FoldingSetNodeID. The time and space complexity of add or remove 1674 operations is logarithmic in the size of the original map. 1675 1676 .. _dss_othermap: 1677 1678 Other Map-Like Container Options 1679 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1680 1681 The STL provides several other options, such as std::multimap and the various 1682 "hash_map" like containers (whether from C++ TR1 or from the SGI library). We 1683 never use hash_set and unordered_set because they are generally very expensive 1684 (each insertion requires a malloc) and very non-portable. 1685 1686 std::multimap is useful if you want to map a key to multiple values, but has all 1687 the drawbacks of std::map. A sorted vector or some other approach is almost 1688 always better. 1689 1690 .. _ds_bit: 1691 1692 Bit storage containers (BitVector, SparseBitVector) 1693 --------------------------------------------------- 1694 1695 Unlike the other containers, there are only two bit storage containers, and 1696 choosing when to use each is relatively straightforward. 1697 1698 One additional option is ``std::vector<bool>``: we discourage its use for two 1699 reasons 1) the implementation in many common compilers (e.g. commonly 1700 available versions of GCC) is extremely inefficient and 2) the C++ standards 1701 committee is likely to deprecate this container and/or change it significantly 1702 somehow. In any case, please don't use it. 1703 1704 .. _dss_bitvector: 1705 1706 BitVector 1707 ^^^^^^^^^ 1708 1709 The BitVector container provides a dynamic size set of bits for manipulation. 1710 It supports individual bit setting/testing, as well as set operations. The set 1711 operations take time O(size of bitvector), but operations are performed one word 1712 at a time, instead of one bit at a time. This makes the BitVector very fast for 1713 set operations compared to other containers. Use the BitVector when you expect 1714 the number of set bits to be high (i.e. a dense set). 1715 1716 .. _dss_smallbitvector: 1717 1718 SmallBitVector 1719 ^^^^^^^^^^^^^^ 1720 1721 The SmallBitVector container provides the same interface as BitVector, but it is 1722 optimized for the case where only a small number of bits, less than 25 or so, 1723 are needed. It also transparently supports larger bit counts, but slightly less 1724 efficiently than a plain BitVector, so SmallBitVector should only be used when 1725 larger counts are rare. 1726 1727 At this time, SmallBitVector does not support set operations (and, or, xor), and 1728 its operator[] does not provide an assignable lvalue. 1729 1730 .. _dss_sparsebitvector: 1731 1732 SparseBitVector 1733 ^^^^^^^^^^^^^^^ 1734 1735 The SparseBitVector container is much like BitVector, with one major difference: 1736 Only the bits that are set, are stored. This makes the SparseBitVector much 1737 more space efficient than BitVector when the set is sparse, as well as making 1738 set operations O(number of set bits) instead of O(size of universe). The 1739 downside to the SparseBitVector is that setting and testing of random bits is 1740 O(N), and on large SparseBitVectors, this can be slower than BitVector. In our 1741 implementation, setting or testing bits in sorted order (either forwards or 1742 reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends 1743 on size) of the current bit is also O(1). As a general statement, 1744 testing/setting bits in a SparseBitVector is O(distance away from last set bit). 1745 1746 .. _common: 1747 1748 Helpful Hints for Common Operations 1749 =================================== 1750 1751 This section describes how to perform some very simple transformations of LLVM 1752 code. This is meant to give examples of common idioms used, showing the 1753 practical side of LLVM transformations. 1754 1755 Because this is a "how-to" section, you should also read about the main classes 1756 that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference 1757 <coreclasses>` contains details and descriptions of the main classes that you 1758 should know about. 1759 1760 .. _inspection: 1761 1762 Basic Inspection and Traversal Routines 1763 --------------------------------------- 1764 1765 The LLVM compiler infrastructure have many different data structures that may be 1766 traversed. Following the example of the C++ standard template library, the 1767 techniques used to traverse these various data structures are all basically the 1768 same. For a enumerable sequence of values, the ``XXXbegin()`` function (or 1769 method) returns an iterator to the start of the sequence, the ``XXXend()`` 1770 function returns an iterator pointing to one past the last valid element of the 1771 sequence, and there is some ``XXXiterator`` data type that is common between the 1772 two operations. 1773 1774 Because the pattern for iteration is common across many different aspects of the 1775 program representation, the standard template library algorithms may be used on 1776 them, and it is easier to remember how to iterate. First we show a few common 1777 examples of the data structures that need to be traversed. Other data 1778 structures are traversed in very similar ways. 1779 1780 .. _iterate_function: 1781 1782 Iterating over the ``BasicBlock`` in a ``Function`` 1783 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1784 1785 It's quite common to have a ``Function`` instance that you'd like to transform 1786 in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To 1787 facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that 1788 constitute the ``Function``. The following is an example that prints the name 1789 of a ``BasicBlock`` and the number of ``Instruction``\ s it contains: 1790 1791 .. code-block:: c++ 1792 1793 // func is a pointer to a Function instance 1794 for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) 1795 // Print out the name of the basic block if it has one, and then the 1796 // number of instructions that it contains 1797 errs() << "Basic block (name=" << i->getName() << ") has " 1798 << i->size() << " instructions.\n"; 1799 1800 Note that i can be used as if it were a pointer for the purposes of invoking 1801 member functions of the ``Instruction`` class. This is because the indirection 1802 operator is overloaded for the iterator classes. In the above code, the 1803 expression ``i->size()`` is exactly equivalent to ``(*i).size()`` just like 1804 you'd expect. 1805 1806 .. _iterate_basicblock: 1807 1808 Iterating over the ``Instruction`` in a ``BasicBlock`` 1809 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1810 1811 Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to 1812 iterate over the individual instructions that make up ``BasicBlock``\ s. Here's 1813 a code snippet that prints out each instruction in a ``BasicBlock``: 1814 1815 .. code-block:: c++ 1816 1817 // blk is a pointer to a BasicBlock instance 1818 for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) 1819 // The next statement works since operator<<(ostream&,...) 1820 // is overloaded for Instruction& 1821 errs() << *i << "\n"; 1822 1823 1824 However, this isn't really the best way to print out the contents of a 1825 ``BasicBlock``! Since the ostream operators are overloaded for virtually 1826 anything you'll care about, you could have just invoked the print routine on the 1827 basic block itself: ``errs() << *blk << "\n";``. 1828 1829 .. _iterate_insiter: 1830 1831 Iterating over the ``Instruction`` in a ``Function`` 1832 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1833 1834 If you're finding that you commonly iterate over a ``Function``'s 1835 ``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s, 1836 ``InstIterator`` should be used instead. You'll need to include 1837 ``llvm/IR/InstIterator.h`` (`doxygen 1838 <http://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate 1839 ``InstIterator``\ s explicitly in your code. Here's a small example that shows 1840 how to dump all instructions in a function to the standard error stream: 1841 1842 .. code-block:: c++ 1843 1844 #include "llvm/IR/InstIterator.h" 1845 1846 // F is a pointer to a Function instance 1847 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) 1848 errs() << *I << "\n"; 1849 1850 Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with 1851 its initial contents. For example, if you wanted to initialize a work list to 1852 contain all instructions in a ``Function`` F, all you would need to do is 1853 something like: 1854 1855 .. code-block:: c++ 1856 1857 std::set<Instruction*> worklist; 1858 // or better yet, SmallPtrSet<Instruction*, 64> worklist; 1859 1860 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) 1861 worklist.insert(&*I); 1862 1863 The STL set ``worklist`` would now contain all instructions in the ``Function`` 1864 pointed to by F. 1865 1866 .. _iterate_convert: 1867 1868 Turning an iterator into a class pointer (and vice-versa) 1869 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1870 1871 Sometimes, it'll be useful to grab a reference (or pointer) to a class instance 1872 when all you've got at hand is an iterator. Well, extracting a reference or a 1873 pointer from an iterator is very straight-forward. Assuming that ``i`` is a 1874 ``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``: 1875 1876 .. code-block:: c++ 1877 1878 Instruction& inst = *i; // Grab reference to instruction reference 1879 Instruction* pinst = &*i; // Grab pointer to instruction reference 1880 const Instruction& inst = *j; 1881 1882 However, the iterators you'll be working with in the LLVM framework are special: 1883 they will automatically convert to a ptr-to-instance type whenever they need to. 1884 Instead of dereferencing the iterator and then taking the address of the result, 1885 you can simply assign the iterator to the proper pointer type and you get the 1886 dereference and address-of operation as a result of the assignment (behind the 1887 scenes, this is a result of overloading casting mechanisms). Thus the second 1888 line of the last example, 1889 1890 .. code-block:: c++ 1891 1892 Instruction *pinst = &*i; 1893 1894 is semantically equivalent to 1895 1896 .. code-block:: c++ 1897 1898 Instruction *pinst = i; 1899 1900 It's also possible to turn a class pointer into the corresponding iterator, and 1901 this is a constant time operation (very efficient). The following code snippet 1902 illustrates use of the conversion constructors provided by LLVM iterators. By 1903 using these, you can explicitly grab the iterator of something without actually 1904 obtaining it via iteration over some structure: 1905 1906 .. code-block:: c++ 1907 1908 void printNextInstruction(Instruction* inst) { 1909 BasicBlock::iterator it(inst); 1910 ++it; // After this line, it refers to the instruction after *inst 1911 if (it != inst->getParent()->end()) errs() << *it << "\n"; 1912 } 1913 1914 Unfortunately, these implicit conversions come at a cost; they prevent these 1915 iterators from conforming to standard iterator conventions, and thus from being 1916 usable with standard algorithms and containers. For example, they prevent the 1917 following code, where ``B`` is a ``BasicBlock``, from compiling: 1918 1919 .. code-block:: c++ 1920 1921 llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end()); 1922 1923 Because of this, these implicit conversions may be removed some day, and 1924 ``operator*`` changed to return a pointer instead of a reference. 1925 1926 .. _iterate_complex: 1927 1928 Finding call sites: a slightly more complex example 1929 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1930 1931 Say that you're writing a FunctionPass and would like to count all the locations 1932 in the entire module (that is, across every ``Function``) where a certain 1933 function (i.e., some ``Function *``) is already in scope. As you'll learn 1934 later, you may want to use an ``InstVisitor`` to accomplish this in a much more 1935 straight-forward manner, but this example will allow us to explore how you'd do 1936 it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we 1937 want to do: 1938 1939 .. code-block:: none 1940 1941 initialize callCounter to zero 1942 for each Function f in the Module 1943 for each BasicBlock b in f 1944 for each Instruction i in b 1945 if (i is a CallInst and calls the given function) 1946 increment callCounter 1947 1948 And the actual code is (remember, because we're writing a ``FunctionPass``, our 1949 ``FunctionPass``-derived class simply has to override the ``runOnFunction`` 1950 method): 1951 1952 .. code-block:: c++ 1953 1954 Function* targetFunc = ...; 1955 1956 class OurFunctionPass : public FunctionPass { 1957 public: 1958 OurFunctionPass(): callCounter(0) { } 1959 1960 virtual runOnFunction(Function& F) { 1961 for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) { 1962 for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) { 1963 if (CallInst* callInst = dyn_cast<CallInst>(&*i)) { 1964 // We know we've encountered a call instruction, so we 1965 // need to determine if it's a call to the 1966 // function pointed to by m_func or not. 1967 if (callInst->getCalledFunction() == targetFunc) 1968 ++callCounter; 1969 } 1970 } 1971 } 1972 } 1973 1974 private: 1975 unsigned callCounter; 1976 }; 1977 1978 .. _calls_and_invokes: 1979 1980 Treating calls and invokes the same way 1981 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1982 1983 You may have noticed that the previous example was a bit oversimplified in that 1984 it did not deal with call sites generated by 'invoke' instructions. In this, 1985 and in other situations, you may find that you want to treat ``CallInst``\ s and 1986 ``InvokeInst``\ s the same way, even though their most-specific common base 1987 class is ``Instruction``, which includes lots of less closely-related things. 1988 For these cases, LLVM provides a handy wrapper class called ``CallSite`` 1989 (`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is 1990 essentially a wrapper around an ``Instruction`` pointer, with some methods that 1991 provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s. 1992 1993 This class has "value semantics": it should be passed by value, not by reference 1994 and it should not be dynamically allocated or deallocated using ``operator new`` 1995 or ``operator delete``. It is efficiently copyable, assignable and 1996 constructable, with costs equivalents to that of a bare pointer. If you look at 1997 its definition, it has only a single pointer member. 1998 1999 .. _iterate_chains: 2000 2001 Iterating over def-use & use-def chains 2002 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2003 2004 Frequently, we might have an instance of the ``Value`` class (`doxygen 2005 <http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine 2006 which ``User`` s use the ``Value``. The list of all ``User``\ s of a particular 2007 ``Value`` is called a *def-use* chain. For example, let's say we have a 2008 ``Function*`` named ``F`` to a particular function ``foo``. Finding all of the 2009 instructions that *use* ``foo`` is as simple as iterating over the *def-use* 2010 chain of ``F``: 2011 2012 .. code-block:: c++ 2013 2014 Function *F = ...; 2015 2016 for (User *U : F->users()) { 2017 if (Instruction *Inst = dyn_cast<Instruction>(U)) { 2018 errs() << "F is used in instruction:\n"; 2019 errs() << *Inst << "\n"; 2020 } 2021 2022 Alternatively, it's common to have an instance of the ``User`` Class (`doxygen 2023 <http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what 2024 ``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is 2025 known as a *use-def* chain. Instances of class ``Instruction`` are common 2026 ``User`` s, so we might want to iterate over all of the values that a particular 2027 instruction uses (that is, the operands of the particular ``Instruction``): 2028 2029 .. code-block:: c++ 2030 2031 Instruction *pi = ...; 2032 2033 for (Use &U : pi->operands()) { 2034 Value *v = U.get(); 2035 // ... 2036 } 2037 2038 Declaring objects as ``const`` is an important tool of enforcing mutation free 2039 algorithms (such as analyses, etc.). For this purpose above iterators come in 2040 constant flavors as ``Value::const_use_iterator`` and 2041 ``Value::const_op_iterator``. They automatically arise when calling 2042 ``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively. 2043 Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns 2044 remain unchanged. 2045 2046 .. _iterate_preds: 2047 2048 Iterating over predecessors & successors of blocks 2049 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2050 2051 Iterating over the predecessors and successors of a block is quite easy with the 2052 routines defined in ``"llvm/IR/CFG.h"``. Just use code like this to 2053 iterate over all predecessors of BB: 2054 2055 .. code-block:: c++ 2056 2057 #include "llvm/Support/CFG.h" 2058 BasicBlock *BB = ...; 2059 2060 for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) { 2061 BasicBlock *Pred = *PI; 2062 // ... 2063 } 2064 2065 Similarly, to iterate over successors use ``succ_iterator/succ_begin/succ_end``. 2066 2067 .. _simplechanges: 2068 2069 Making simple changes 2070 --------------------- 2071 2072 There are some primitive transformation operations present in the LLVM 2073 infrastructure that are worth knowing about. When performing transformations, 2074 it's fairly common to manipulate the contents of basic blocks. This section 2075 describes some of the common methods for doing so and gives example code. 2076 2077 .. _schanges_creating: 2078 2079 Creating and inserting new ``Instruction``\ s 2080 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2081 2082 *Instantiating Instructions* 2083 2084 Creation of ``Instruction``\ s is straight-forward: simply call the constructor 2085 for the kind of instruction to instantiate and provide the necessary parameters. 2086 For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus: 2087 2088 .. code-block:: c++ 2089 2090 AllocaInst* ai = new AllocaInst(Type::Int32Ty); 2091 2092 will create an ``AllocaInst`` instance that represents the allocation of one 2093 integer in the current stack frame, at run time. Each ``Instruction`` subclass 2094 is likely to have varying default parameters which change the semantics of the 2095 instruction, so refer to the `doxygen documentation for the subclass of 2096 Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that 2097 you're interested in instantiating. 2098 2099 *Naming values* 2100 2101 It is very useful to name the values of instructions when you're able to, as 2102 this facilitates the debugging of your transformations. If you end up looking 2103 at generated LLVM machine code, you definitely want to have logical names 2104 associated with the results of instructions! By supplying a value for the 2105 ``Name`` (default) parameter of the ``Instruction`` constructor, you associate a 2106 logical name with the result of the instruction's execution at run time. For 2107 example, say that I'm writing a transformation that dynamically allocates space 2108 for an integer on the stack, and that integer is going to be used as some kind 2109 of index by some other code. To accomplish this, I place an ``AllocaInst`` at 2110 the first point in the first ``BasicBlock`` of some ``Function``, and I'm 2111 intending to use it within the same ``Function``. I might do: 2112 2113 .. code-block:: c++ 2114 2115 AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc"); 2116 2117 where ``indexLoc`` is now the logical name of the instruction's execution value, 2118 which is a pointer to an integer on the run time stack. 2119 2120 *Inserting instructions* 2121 2122 There are essentially three ways to insert an ``Instruction`` into an existing 2123 sequence of instructions that form a ``BasicBlock``: 2124 2125 * Insertion into an explicit instruction list 2126 2127 Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``, 2128 and a newly-created instruction we wish to insert before ``*pi``, we do the 2129 following: 2130 2131 .. code-block:: c++ 2132 2133 BasicBlock *pb = ...; 2134 Instruction *pi = ...; 2135 Instruction *newInst = new Instruction(...); 2136 2137 pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb 2138 2139 Appending to the end of a ``BasicBlock`` is so common that the ``Instruction`` 2140 class and ``Instruction``-derived classes provide constructors which take a 2141 pointer to a ``BasicBlock`` to be appended to. For example code that looked 2142 like: 2143 2144 .. code-block:: c++ 2145 2146 BasicBlock *pb = ...; 2147 Instruction *newInst = new Instruction(...); 2148 2149 pb->getInstList().push_back(newInst); // Appends newInst to pb 2150 2151 becomes: 2152 2153 .. code-block:: c++ 2154 2155 BasicBlock *pb = ...; 2156 Instruction *newInst = new Instruction(..., pb); 2157 2158 which is much cleaner, especially if you are creating long instruction 2159 streams. 2160 2161 * Insertion into an implicit instruction list 2162 2163 ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly 2164 associated with an existing instruction list: the instruction list of the 2165 enclosing basic block. Thus, we could have accomplished the same thing as the 2166 above code without being given a ``BasicBlock`` by doing: 2167 2168 .. code-block:: c++ 2169 2170 Instruction *pi = ...; 2171 Instruction *newInst = new Instruction(...); 2172 2173 pi->getParent()->getInstList().insert(pi, newInst); 2174 2175 In fact, this sequence of steps occurs so frequently that the ``Instruction`` 2176 class and ``Instruction``-derived classes provide constructors which take (as 2177 a default parameter) a pointer to an ``Instruction`` which the newly-created 2178 ``Instruction`` should precede. That is, ``Instruction`` constructors are 2179 capable of inserting the newly-created instance into the ``BasicBlock`` of a 2180 provided instruction, immediately before that instruction. Using an 2181 ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the 2182 above code becomes: 2183 2184 .. code-block:: c++ 2185 2186 Instruction* pi = ...; 2187 Instruction* newInst = new Instruction(..., pi); 2188 2189 which is much cleaner, especially if you're creating a lot of instructions and 2190 adding them to ``BasicBlock``\ s. 2191 2192 * Insertion using an instance of ``IRBuilder`` 2193 2194 Inserting several ``Instruction``\ s can be quite laborious using the previous 2195 methods. The ``IRBuilder`` is a convenience class that can be used to add 2196 several instructions to the end of a ``BasicBlock`` or before a particular 2197 ``Instruction``. It also supports constant folding and renaming named 2198 registers (see ``IRBuilder``'s template arguments). 2199 2200 The example below demonstrates a very simple use of the ``IRBuilder`` where 2201 three instructions are inserted before the instruction ``pi``. The first two 2202 instructions are Call instructions and third instruction multiplies the return 2203 value of the two calls. 2204 2205 .. code-block:: c++ 2206 2207 Instruction *pi = ...; 2208 IRBuilder<> Builder(pi); 2209 CallInst* callOne = Builder.CreateCall(...); 2210 CallInst* callTwo = Builder.CreateCall(...); 2211 Value* result = Builder.CreateMul(callOne, callTwo); 2212 2213 The example below is similar to the above example except that the created 2214 ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``. 2215 2216 .. code-block:: c++ 2217 2218 BasicBlock *pb = ...; 2219 IRBuilder<> Builder(pb); 2220 CallInst* callOne = Builder.CreateCall(...); 2221 CallInst* callTwo = Builder.CreateCall(...); 2222 Value* result = Builder.CreateMul(callOne, callTwo); 2223 2224 See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``. 2225 2226 2227 .. _schanges_deleting: 2228 2229 Deleting Instructions 2230 ^^^^^^^^^^^^^^^^^^^^^ 2231 2232 Deleting an instruction from an existing sequence of instructions that form a 2233 BasicBlock_ is very straight-forward: just call the instruction's 2234 ``eraseFromParent()`` method. For example: 2235 2236 .. code-block:: c++ 2237 2238 Instruction *I = .. ; 2239 I->eraseFromParent(); 2240 2241 This unlinks the instruction from its containing basic block and deletes it. If 2242 you'd just like to unlink the instruction from its containing basic block but 2243 not delete it, you can use the ``removeFromParent()`` method. 2244 2245 .. _schanges_replacing: 2246 2247 Replacing an Instruction with another Value 2248 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2249 2250 Replacing individual instructions 2251 """"""""""""""""""""""""""""""""" 2252 2253 Including "`llvm/Transforms/Utils/BasicBlockUtils.h 2254 <http://llvm.org/doxygen/BasicBlockUtils_8h-source.html>`_" permits use of two 2255 very useful replace functions: ``ReplaceInstWithValue`` and 2256 ``ReplaceInstWithInst``. 2257 2258 .. _schanges_deleting_sub: 2259 2260 Deleting Instructions 2261 """"""""""""""""""""" 2262 2263 * ``ReplaceInstWithValue`` 2264 2265 This function replaces all uses of a given instruction with a value, and then 2266 removes the original instruction. The following example illustrates the 2267 replacement of the result of a particular ``AllocaInst`` that allocates memory 2268 for a single integer with a null pointer to an integer. 2269 2270 .. code-block:: c++ 2271 2272 AllocaInst* instToReplace = ...; 2273 BasicBlock::iterator ii(instToReplace); 2274 2275 ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii, 2276 Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty))); 2277 2278 * ``ReplaceInstWithInst`` 2279 2280 This function replaces a particular instruction with another instruction, 2281 inserting the new instruction into the basic block at the location where the 2282 old instruction was, and replacing any uses of the old instruction with the 2283 new instruction. The following example illustrates the replacement of one 2284 ``AllocaInst`` with another. 2285 2286 .. code-block:: c++ 2287 2288 AllocaInst* instToReplace = ...; 2289 BasicBlock::iterator ii(instToReplace); 2290 2291 ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii, 2292 new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt")); 2293 2294 2295 Replacing multiple uses of Users and Values 2296 """"""""""""""""""""""""""""""""""""""""""" 2297 2298 You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to 2299 change more than one use at a time. See the doxygen documentation for the 2300 `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class 2301 <http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more 2302 information. 2303 2304 .. _schanges_deletingGV: 2305 2306 Deleting GlobalVariables 2307 ^^^^^^^^^^^^^^^^^^^^^^^^ 2308 2309 Deleting a global variable from a module is just as easy as deleting an 2310 Instruction. First, you must have a pointer to the global variable that you 2311 wish to delete. You use this pointer to erase it from its parent, the module. 2312 For example: 2313 2314 .. code-block:: c++ 2315 2316 GlobalVariable *GV = .. ; 2317 2318 GV->eraseFromParent(); 2319 2320 2321 .. _create_types: 2322 2323 How to Create Types 2324 ------------------- 2325 2326 In generating IR, you may need some complex types. If you know these types 2327 statically, you can use ``TypeBuilder<...>::get()``, defined in 2328 ``llvm/Support/TypeBuilder.h``, to retrieve them. ``TypeBuilder`` has two forms 2329 depending on whether you're building types for cross-compilation or native 2330 library use. ``TypeBuilder<T, true>`` requires that ``T`` be independent of the 2331 host environment, meaning that it's built out of types from the ``llvm::types`` 2332 (`doxygen <http://llvm.org/doxygen/namespacellvm_1_1types.html>`__) namespace 2333 and pointers, functions, arrays, etc. built of those. ``TypeBuilder<T, false>`` 2334 additionally allows native C types whose size may depend on the host compiler. 2335 For example, 2336 2337 .. code-block:: c++ 2338 2339 FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get(); 2340 2341 is easier to read and write than the equivalent 2342 2343 .. code-block:: c++ 2344 2345 std::vector<const Type*> params; 2346 params.push_back(PointerType::getUnqual(Type::Int32Ty)); 2347 FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false); 2348 2349 See the `class comment 2350 <http://llvm.org/doxygen/TypeBuilder_8h-source.html#l00001>`_ for more details. 2351 2352 .. _threading: 2353 2354 Threads and LLVM 2355 ================ 2356 2357 This section describes the interaction of the LLVM APIs with multithreading, 2358 both on the part of client applications, and in the JIT, in the hosted 2359 application. 2360 2361 Note that LLVM's support for multithreading is still relatively young. Up 2362 through version 2.5, the execution of threaded hosted applications was 2363 supported, but not threaded client access to the APIs. While this use case is 2364 now supported, clients *must* adhere to the guidelines specified below to ensure 2365 proper operation in multithreaded mode. 2366 2367 Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic 2368 intrinsics in order to support threaded operation. If you need a 2369 multhreading-capable LLVM on a platform without a suitably modern system 2370 compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and 2371 using the resultant compiler to build a copy of LLVM with multithreading 2372 support. 2373 2374 .. _shutdown: 2375 2376 Ending Execution with ``llvm_shutdown()`` 2377 ----------------------------------------- 2378 2379 When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to 2380 deallocate memory used for internal structures. 2381 2382 .. _managedstatic: 2383 2384 Lazy Initialization with ``ManagedStatic`` 2385 ------------------------------------------ 2386 2387 ``ManagedStatic`` is a utility class in LLVM used to implement static 2388 initialization of static resources, such as the global type tables. In a 2389 single-threaded environment, it implements a simple lazy initialization scheme. 2390 When LLVM is compiled with support for multi-threading, however, it uses 2391 double-checked locking to implement thread-safe lazy initialization. 2392 2393 .. _llvmcontext: 2394 2395 Achieving Isolation with ``LLVMContext`` 2396 ---------------------------------------- 2397 2398 ``LLVMContext`` is an opaque class in the LLVM API which clients can use to 2399 operate multiple, isolated instances of LLVM concurrently within the same 2400 address space. For instance, in a hypothetical compile-server, the compilation 2401 of an individual translation unit is conceptually independent from all the 2402 others, and it would be desirable to be able to compile incoming translation 2403 units concurrently on independent server threads. Fortunately, ``LLVMContext`` 2404 exists to enable just this kind of scenario! 2405 2406 Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity 2407 (``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's 2408 in-memory IR belongs to an ``LLVMContext``. Entities in different contexts 2409 *cannot* interact with each other: ``Module``\ s in different contexts cannot be 2410 linked together, ``Function``\ s cannot be added to ``Module``\ s in different 2411 contexts, etc. What this means is that is is safe to compile on multiple 2412 threads simultaneously, as long as no two threads operate on entities within the 2413 same context. 2414 2415 In practice, very few places in the API require the explicit specification of a 2416 ``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every 2417 ``Type`` carries a reference to its owning context, most other entities can 2418 determine what context they belong to by looking at their own ``Type``. If you 2419 are adding new entities to LLVM IR, please try to maintain this interface 2420 design. 2421 2422 .. _jitthreading: 2423 2424 Threads and the JIT 2425 ------------------- 2426 2427 LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple 2428 threads can call ``ExecutionEngine::getPointerToFunction()`` or 2429 ``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run 2430 code output by the JIT concurrently. The user must still ensure that only one 2431 thread accesses IR in a given ``LLVMContext`` while another thread might be 2432 modifying it. One way to do that is to always hold the JIT lock while accessing 2433 IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s). 2434 Another way is to only call ``getPointerToFunction()`` from the 2435 ``LLVMContext``'s thread. 2436 2437 When the JIT is configured to compile lazily (using 2438 ``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race 2439 condition <http://llvm.org/bugs/show_bug.cgi?id=5184>`_ in updating call sites 2440 after a function is lazily-jitted. It's still possible to use the lazy JIT in a 2441 threaded program if you ensure that only one thread at a time can call any 2442 particular lazy stub and that the JIT lock guards any IR access, but we suggest 2443 using only the eager JIT in threaded programs. 2444 2445 .. _advanced: 2446 2447 Advanced Topics 2448 =============== 2449 2450 This section describes some of the advanced or obscure API's that most clients 2451 do not need to be aware of. These API's tend manage the inner workings of the 2452 LLVM system, and only need to be accessed in unusual circumstances. 2453 2454 .. _SymbolTable: 2455 2456 The ``ValueSymbolTable`` class 2457 ------------------------------ 2458 2459 The ``ValueSymbolTable`` (`doxygen 2460 <http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides 2461 a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for 2462 naming value definitions. The symbol table can provide a name for any Value_. 2463 2464 Note that the ``SymbolTable`` class should not be directly accessed by most 2465 clients. It should only be used when iteration over the symbol table names 2466 themselves are required, which is very special purpose. Note that not all LLVM 2467 Value_\ s have names, and those without names (i.e. they have an empty name) do 2468 not exist in the symbol table. 2469 2470 Symbol tables support iteration over the values in the symbol table with 2471 ``begin/end/iterator`` and supports querying to see if a specific name is in the 2472 symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no 2473 public mutator methods, instead, simply call ``setName`` on a value, which will 2474 autoinsert it into the appropriate symbol table. 2475 2476 .. _UserLayout: 2477 2478 The ``User`` and owned ``Use`` classes' memory layout 2479 ----------------------------------------------------- 2480 2481 The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__) 2482 class provides a basis for expressing the ownership of ``User`` towards other 2483 `Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The 2484 ``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper 2485 class is employed to do the bookkeeping and to facilitate *O(1)* addition and 2486 removal. 2487 2488 .. _Use2User: 2489 2490 Interaction and relationship between ``User`` and ``Use`` objects 2491 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2492 2493 A subclass of ``User`` can choose between incorporating its ``Use`` objects or 2494 refer to them out-of-line by means of a pointer. A mixed variant (some ``Use`` 2495 s inline others hung off) is impractical and breaks the invariant that the 2496 ``Use`` objects belonging to the same ``User`` form a contiguous array. 2497 2498 We have 2 different layouts in the ``User`` (sub)classes: 2499 2500 * Layout a) 2501 2502 The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User`` 2503 object and there are a fixed number of them. 2504 2505 * Layout b) 2506 2507 The ``Use`` object(s) are referenced by a pointer to an array from the 2508 ``User`` object and there may be a variable number of them. 2509 2510 As of v2.4 each layout still possesses a direct pointer to the start of the 2511 array of ``Use``\ s. Though not mandatory for layout a), we stick to this 2512 redundancy for the sake of simplicity. The ``User`` object also stores the 2513 number of ``Use`` objects it has. (Theoretically this information can also be 2514 calculated given the scheme presented below.) 2515 2516 Special forms of allocation operators (``operator new``) enforce the following 2517 memory layouts: 2518 2519 * Layout a) is modelled by prepending the ``User`` object by the ``Use[]`` 2520 array. 2521 2522 .. code-block:: none 2523 2524 ...---.---.---.---.-------... 2525 | P | P | P | P | User 2526 '''---'---'---'---'-------''' 2527 2528 * Layout b) is modelled by pointing at the ``Use[]`` array. 2529 2530 .. code-block:: none 2531 2532 .-------... 2533 | User 2534 '-------''' 2535 | 2536 v 2537 .---.---.---.---... 2538 | P | P | P | P | 2539 '---'---'---'---''' 2540 2541 *(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in 2542 each* ``Use`` *object in the member* ``Use::Prev`` *)* 2543 2544 .. _Waymarking: 2545 2546 The waymarking algorithm 2547 ^^^^^^^^^^^^^^^^^^^^^^^^ 2548 2549 Since the ``Use`` objects are deprived of the direct (back)pointer to their 2550 ``User`` objects, there must be a fast and exact method to recover it. This is 2551 accomplished by the following scheme: 2552 2553 A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev`` 2554 allows to find the start of the ``User`` object: 2555 2556 * ``00`` --- binary digit 0 2557 2558 * ``01`` --- binary digit 1 2559 2560 * ``10`` --- stop and calculate (``s``) 2561 2562 * ``11`` --- full stop (``S``) 2563 2564 Given a ``Use*``, all we have to do is to walk till we get a stop and we either 2565 have a ``User`` immediately behind or we have to walk to the next stop picking 2566 up digits and calculating the offset: 2567 2568 .. code-block:: none 2569 2570 .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---------------- 2571 | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*) 2572 '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---------------- 2573 |+15 |+10 |+6 |+3 |+1 2574 | | | | | __> 2575 | | | | __________> 2576 | | | ______________________> 2577 | | ______________________________________> 2578 | __________________________________________________________> 2579 2580 Only the significant number of bits need to be stored between the stops, so that 2581 the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects 2582 associated with a ``User``. 2583 2584 .. _ReferenceImpl: 2585 2586 Reference implementation 2587 ^^^^^^^^^^^^^^^^^^^^^^^^ 2588 2589 The following literate Haskell fragment demonstrates the concept: 2590 2591 .. code-block:: haskell 2592 2593 > import Test.QuickCheck 2594 > 2595 > digits :: Int -> [Char] -> [Char] 2596 > digits 0 acc = '0' : acc 2597 > digits 1 acc = '1' : acc 2598 > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc 2599 > 2600 > dist :: Int -> [Char] -> [Char] 2601 > dist 0 [] = ['S'] 2602 > dist 0 acc = acc 2603 > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r 2604 > dist n acc = dist (n - 1) $ dist 1 acc 2605 > 2606 > takeLast n ss = reverse $ take n $ reverse ss 2607 > 2608 > test = takeLast 40 $ dist 20 [] 2609 > 2610 2611 Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"`` 2612 2613 The reverse algorithm computes the length of the string just by examining a 2614 certain prefix: 2615 2616 .. code-block:: haskell 2617 2618 > pref :: [Char] -> Int 2619 > pref "S" = 1 2620 > pref ('s':'1':rest) = decode 2 1 rest 2621 > pref (_:rest) = 1 + pref rest 2622 > 2623 > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest 2624 > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest 2625 > decode walk acc _ = walk + acc 2626 > 2627 2628 Now, as expected, printing <pref test> gives ``40``. 2629 2630 We can *quickCheck* this with following property: 2631 2632 .. code-block:: haskell 2633 2634 > testcase = dist 2000 [] 2635 > testcaseLength = length testcase 2636 > 2637 > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr 2638 > where arr = takeLast n testcase 2639 > 2640 2641 As expected <quickCheck identityProp> gives: 2642 2643 :: 2644 2645 *Main> quickCheck identityProp 2646 OK, passed 100 tests. 2647 2648 Let's be a bit more exhaustive: 2649 2650 .. code-block:: haskell 2651 2652 > 2653 > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p 2654 > 2655 2656 And here is the result of <deepCheck identityProp>: 2657 2658 :: 2659 2660 *Main> deepCheck identityProp 2661 OK, passed 500 tests. 2662 2663 .. _Tagging: 2664 2665 Tagging considerations 2666 ^^^^^^^^^^^^^^^^^^^^^^ 2667 2668 To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never 2669 change after being set up, setters of ``Use::Prev`` must re-tag the new 2670 ``Use**`` on every modification. Accordingly getters must strip the tag bits. 2671 2672 For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit 2673 set). Following this pointer brings us to the ``User``. A portable trick 2674 ensures that the first bytes of ``User`` (if interpreted as a pointer) never has 2675 the LSBit set. (Portability is relying on the fact that all known compilers 2676 place the ``vptr`` in the first word of the instances.) 2677 2678 .. _polymorphism: 2679 2680 Designing Type Hiercharies and Polymorphic Interfaces 2681 ----------------------------------------------------- 2682 2683 There are two different design patterns that tend to result in the use of 2684 virtual dispatch for methods in a type hierarchy in C++ programs. The first is 2685 a genuine type hierarchy where different types in the hierarchy model 2686 a specific subset of the functionality and semantics, and these types nest 2687 strictly within each other. Good examples of this can be seen in the ``Value`` 2688 or ``Type`` type hierarchies. 2689 2690 A second is the desire to dispatch dynamically across a collection of 2691 polymorphic interface implementations. This latter use case can be modeled with 2692 virtual dispatch and inheritance by defining an abstract interface base class 2693 which all implementations derive from and override. However, this 2694 implementation strategy forces an **"is-a"** relationship to exist that is not 2695 actually meaningful. There is often not some nested hierarchy of useful 2696 generalizations which code might interact with and move up and down. Instead, 2697 there is a singular interface which is dispatched across a range of 2698 implementations. 2699 2700 The preferred implementation strategy for the second use case is that of 2701 generic programming (sometimes called "compile-time duck typing" or "static 2702 polymorphism"). For example, a template over some type parameter ``T`` can be 2703 instantiated across any particular implementation that conforms to the 2704 interface or *concept*. A good example here is the highly generic properties of 2705 any type which models a node in a directed graph. LLVM models these primarily 2706 through templates and generic programming. Such templates include the 2707 ``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism 2708 truly needs **dynamic** dispatch you can generalize it using a technique 2709 called *concept-based polymorphism*. This pattern emulates the interfaces and 2710 behaviors of templates using a very limited form of virtual dispatch for type 2711 erasure inside its implementation. You can find examples of this technique in 2712 the ``PassManager.h`` system, and there is a more detailed introduction to it 2713 by Sean Parent in several of his talks and papers: 2714 2715 #. `Inheritance Is The Base Class of Evil 2716 <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_ 2717 - The GoingNative 2013 talk describing this technique, and probably the best 2718 place to start. 2719 #. `Value Semantics and Concepts-based Polymorphism 2720 <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk 2721 describing this technique in more detail. 2722 #. `Sean Parent's Papers and Presentations 2723 <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_ 2724 - A Github project full of links to slides, video, and sometimes code. 2725 2726 When deciding between creating a type hierarchy (with either tagged or virtual 2727 dispatch) and using templates or concepts-based polymorphism, consider whether 2728 there is some refinement of an abstract base class which is a semantically 2729 meaningful type on an interface boundary. If anything more refined than the 2730 root abstract interface is meaningless to talk about as a partial extension of 2731 the semantic model, then your use case likely fits better with polymorphism and 2732 you should avoid using virtual dispatch. However, there may be some exigent 2733 circumstances that require one technique or the other to be used. 2734 2735 If you do need to introduce a type hierarchy, we prefer to use explicitly 2736 closed type hierarchies with manual tagged dispatch and/or RTTI rather than the 2737 open inheritance model and virtual dispatch that is more common in C++ code. 2738 This is because LLVM rarely encourages library consumers to extend its core 2739 types, and leverages the closed and tag-dispatched nature of its hierarchies to 2740 generate significantly more efficient code. We have also found that a large 2741 amount of our usage of type hierarchies fits better with tag-based pattern 2742 matching rather than dynamic dispatch across a common interface. Within LLVM we 2743 have built custom helpers to facilitate this design. See this document's 2744 section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document 2745 <HowToSetUpLLVMStyleRTTI>` which describes how you can implement this 2746 pattern for use with the LLVM helpers. 2747 2748 .. _abi_breaking_checks: 2749 2750 ABI Breaking Checks 2751 ------------------- 2752 2753 Checks and asserts that alter the LLVM C++ ABI are predicated on the 2754 preprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM 2755 libraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI 2756 compatible LLVM libraries built without it defined. By default, 2757 turning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS` 2758 so a default +Asserts build is not ABI compatible with a 2759 default -Asserts build. Clients that want ABI compatibility 2760 between +Asserts and -Asserts builds should use the CMake or autoconf 2761 build systems to set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently 2762 of `LLVM_ENABLE_ASSERTIONS`. 2763 2764 .. _coreclasses: 2765 2766 The Core LLVM Class Hierarchy Reference 2767 ======================================= 2768 2769 ``#include "llvm/IR/Type.h"`` 2770 2771 header source: `Type.h <http://llvm.org/doxygen/Type_8h-source.html>`_ 2772 2773 doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_ 2774 2775 The Core LLVM classes are the primary means of representing the program being 2776 inspected or transformed. The core LLVM classes are defined in header files in 2777 the ``include/llvm/IR`` directory, and implemented in the ``lib/IR`` 2778 directory. It's worth noting that, for historical reasons, this library is 2779 called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect. 2780 2781 .. _Type: 2782 2783 The Type class and Derived Types 2784 -------------------------------- 2785 2786 ``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``. 2787 ``Type`` cannot be instantiated directly but only through its subclasses. 2788 Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and 2789 ``DoubleType``) have hidden subclasses. They are hidden because they offer no 2790 useful functionality beyond what the ``Type`` class offers except to distinguish 2791 themselves from other subclasses of ``Type``. 2792 2793 All other types are subclasses of ``DerivedType``. Types can be named, but this 2794 is not a requirement. There exists exactly one instance of a given shape at any 2795 one time. This allows type equality to be performed with address equality of 2796 the Type Instance. That is, given two ``Type*`` values, the types are identical 2797 if the pointers are identical. 2798 2799 .. _m_Type: 2800 2801 Important Public Methods 2802 ^^^^^^^^^^^^^^^^^^^^^^^^ 2803 2804 * ``bool isIntegerTy() const``: Returns true for any integer type. 2805 2806 * ``bool isFloatingPointTy()``: Return true if this is one of the five 2807 floating point types. 2808 2809 * ``bool isSized()``: Return true if the type has known size. Things 2810 that don't have a size are abstract types, labels and void. 2811 2812 .. _derivedtypes: 2813 2814 Important Derived Types 2815 ^^^^^^^^^^^^^^^^^^^^^^^ 2816 2817 ``IntegerType`` 2818 Subclass of DerivedType that represents integer types of any bit width. Any 2819 bit width between ``IntegerType::MIN_INT_BITS`` (1) and 2820 ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented. 2821 2822 * ``static const IntegerType* get(unsigned NumBits)``: get an integer 2823 type of a specific bit width. 2824 2825 * ``unsigned getBitWidth() const``: Get the bit width of an integer type. 2826 2827 ``SequentialType`` 2828 This is subclassed by ArrayType, PointerType and VectorType. 2829 2830 * ``const Type * getElementType() const``: Returns the type of each 2831 of the elements in the sequential type. 2832 2833 ``ArrayType`` 2834 This is a subclass of SequentialType and defines the interface for array 2835 types. 2836 2837 * ``unsigned getNumElements() const``: Returns the number of elements 2838 in the array. 2839 2840 ``PointerType`` 2841 Subclass of SequentialType for pointer types. 2842 2843 ``VectorType`` 2844 Subclass of SequentialType for vector types. A vector type is similar to an 2845 ArrayType but is distinguished because it is a first class type whereas 2846 ArrayType is not. Vector types are used for vector operations and are usually 2847 small vectors of an integer or floating point type. 2848 2849 ``StructType`` 2850 Subclass of DerivedTypes for struct types. 2851 2852 .. _FunctionType: 2853 2854 ``FunctionType`` 2855 Subclass of DerivedTypes for function types. 2856 2857 * ``bool isVarArg() const``: Returns true if it's a vararg function. 2858 2859 * ``const Type * getReturnType() const``: Returns the return type of the 2860 function. 2861 2862 * ``const Type * getParamType (unsigned i)``: Returns the type of the ith 2863 parameter. 2864 2865 * ``const unsigned getNumParams() const``: Returns the number of formal 2866 parameters. 2867 2868 .. _Module: 2869 2870 The ``Module`` class 2871 -------------------- 2872 2873 ``#include "llvm/IR/Module.h"`` 2874 2875 header source: `Module.h <http://llvm.org/doxygen/Module_8h-source.html>`_ 2876 2877 doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_ 2878 2879 The ``Module`` class represents the top level structure present in LLVM 2880 programs. An LLVM module is effectively either a translation unit of the 2881 original program or a combination of several translation units merged by the 2882 linker. The ``Module`` class keeps track of a list of :ref:`Function 2883 <c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_. 2884 Additionally, it contains a few helpful member functions that try to make common 2885 operations easy. 2886 2887 .. _m_Module: 2888 2889 Important Public Members of the ``Module`` class 2890 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2891 2892 * ``Module::Module(std::string name = "")`` 2893 2894 Constructing a Module_ is easy. You can optionally provide a name for it 2895 (probably based on the name of the translation unit). 2896 2897 * | ``Module::iterator`` - Typedef for function list iterator 2898 | ``Module::const_iterator`` - Typedef for const_iterator. 2899 | ``begin()``, ``end()``, ``size()``, ``empty()`` 2900 2901 These are forwarding methods that make it easy to access the contents of a 2902 ``Module`` object's :ref:`Function <c_Function>` list. 2903 2904 * ``Module::FunctionListType &getFunctionList()`` 2905 2906 Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use 2907 when you need to update the list or perform a complex action that doesn't have 2908 a forwarding method. 2909 2910 ---------------- 2911 2912 * | ``Module::global_iterator`` - Typedef for global variable list iterator 2913 | ``Module::const_global_iterator`` - Typedef for const_iterator. 2914 | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()`` 2915 2916 These are forwarding methods that make it easy to access the contents of a 2917 ``Module`` object's GlobalVariable_ list. 2918 2919 * ``Module::GlobalListType &getGlobalList()`` 2920 2921 Returns the list of GlobalVariable_\ s. This is necessary to use when you 2922 need to update the list or perform a complex action that doesn't have a 2923 forwarding method. 2924 2925 ---------------- 2926 2927 * ``SymbolTable *getSymbolTable()`` 2928 2929 Return a reference to the SymbolTable_ for this ``Module``. 2930 2931 ---------------- 2932 2933 * ``Function *getFunction(StringRef Name) const`` 2934 2935 Look up the specified function in the ``Module`` SymbolTable_. If it does not 2936 exist, return ``null``. 2937 2938 * ``Function *getOrInsertFunction(const std::string &Name, const FunctionType 2939 *T)`` 2940 2941 Look up the specified function in the ``Module`` SymbolTable_. If it does not 2942 exist, add an external declaration for the function and return it. 2943 2944 * ``std::string getTypeName(const Type *Ty)`` 2945 2946 If there is at least one entry in the SymbolTable_ for the specified Type_, 2947 return it. Otherwise return the empty string. 2948 2949 * ``bool addTypeName(const std::string &Name, const Type *Ty)`` 2950 2951 Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is 2952 already an entry for this name, true is returned and the SymbolTable_ is not 2953 modified. 2954 2955 .. _Value: 2956 2957 The ``Value`` class 2958 ------------------- 2959 2960 ``#include "llvm/IR/Value.h"`` 2961 2962 header source: `Value.h <http://llvm.org/doxygen/Value_8h-source.html>`_ 2963 2964 doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ 2965 2966 The ``Value`` class is the most important class in the LLVM Source base. It 2967 represents a typed value that may be used (among other things) as an operand to 2968 an instruction. There are many different types of ``Value``\ s, such as 2969 Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function 2970 <c_Function>`\ s are ``Value``\ s. 2971 2972 A particular ``Value`` may be used many times in the LLVM representation for a 2973 program. For example, an incoming argument to a function (represented with an 2974 instance of the Argument_ class) is "used" by every instruction in the function 2975 that references the argument. To keep track of this relationship, the ``Value`` 2976 class keeps a list of all of the ``User``\ s that is using it (the User_ class 2977 is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s). 2978 This use list is how LLVM represents def-use information in the program, and is 2979 accessible through the ``use_*`` methods, shown below. 2980 2981 Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this 2982 Type_ is available through the ``getType()`` method. In addition, all LLVM 2983 values can be named. The "name" of the ``Value`` is a symbolic string printed 2984 in the LLVM code: 2985 2986 .. code-block:: llvm 2987 2988 %foo = add i32 1, 2 2989 2990 .. _nameWarning: 2991 2992 The name of this instruction is "foo". **NOTE** that the name of any value may 2993 be missing (an empty string), so names should **ONLY** be used for debugging 2994 (making the source code easier to read, debugging printouts), they should not be 2995 used to keep track of values or map between them. For this purpose, use a 2996 ``std::map`` of pointers to the ``Value`` itself instead. 2997 2998 One important aspect of LLVM is that there is no distinction between an SSA 2999 variable and the operation that produces it. Because of this, any reference to 3000 the value produced by an instruction (or the value available as an incoming 3001 argument, for example) is represented as a direct pointer to the instance of the 3002 class that represents this value. Although this may take some getting used to, 3003 it simplifies the representation and makes it easier to manipulate. 3004 3005 .. _m_Value: 3006 3007 Important Public Members of the ``Value`` class 3008 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3009 3010 * | ``Value::use_iterator`` - Typedef for iterator over the use-list 3011 | ``Value::const_use_iterator`` - Typedef for const_iterator over the 3012 use-list 3013 | ``unsigned use_size()`` - Returns the number of users of the value. 3014 | ``bool use_empty()`` - Returns true if there are no users. 3015 | ``use_iterator use_begin()`` - Get an iterator to the start of the 3016 use-list. 3017 | ``use_iterator use_end()`` - Get an iterator to the end of the use-list. 3018 | ``User *use_back()`` - Returns the last element in the list. 3019 3020 These methods are the interface to access the def-use information in LLVM. 3021 As with all other iterators in LLVM, the naming conventions follow the 3022 conventions defined by the STL_. 3023 3024 * ``Type *getType() const`` 3025 This method returns the Type of the Value. 3026 3027 * | ``bool hasName() const`` 3028 | ``std::string getName() const`` 3029 | ``void setName(const std::string &Name)`` 3030 3031 This family of methods is used to access and assign a name to a ``Value``, be 3032 aware of the :ref:`precaution above <nameWarning>`. 3033 3034 * ``void replaceAllUsesWith(Value *V)`` 3035 3036 This method traverses the use list of a ``Value`` changing all User_\ s of the 3037 current value to refer to "``V``" instead. For example, if you detect that an 3038 instruction always produces a constant value (for example through constant 3039 folding), you can replace all uses of the instruction with the constant like 3040 this: 3041 3042 .. code-block:: c++ 3043 3044 Inst->replaceAllUsesWith(ConstVal); 3045 3046 .. _User: 3047 3048 The ``User`` class 3049 ------------------ 3050 3051 ``#include "llvm/IR/User.h"`` 3052 3053 header source: `User.h <http://llvm.org/doxygen/User_8h-source.html>`_ 3054 3055 doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_ 3056 3057 Superclass: Value_ 3058 3059 The ``User`` class is the common base class of all LLVM nodes that may refer to 3060 ``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s 3061 that the User is referring to. The ``User`` class itself is a subclass of 3062 ``Value``. 3063 3064 The operands of a ``User`` point directly to the LLVM ``Value`` that it refers 3065 to. Because LLVM uses Static Single Assignment (SSA) form, there can only be 3066 one definition referred to, allowing this direct connection. This connection 3067 provides the use-def information in LLVM. 3068 3069 .. _m_User: 3070 3071 Important Public Members of the ``User`` class 3072 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3073 3074 The ``User`` class exposes the operand list in two ways: through an index access 3075 interface and through an iterator based interface. 3076 3077 * | ``Value *getOperand(unsigned i)`` 3078 | ``unsigned getNumOperands()`` 3079 3080 These two methods expose the operands of the ``User`` in a convenient form for 3081 direct access. 3082 3083 * | ``User::op_iterator`` - Typedef for iterator over the operand list 3084 | ``op_iterator op_begin()`` - Get an iterator to the start of the operand 3085 list. 3086 | ``op_iterator op_end()`` - Get an iterator to the end of the operand list. 3087 3088 Together, these methods make up the iterator based interface to the operands 3089 of a ``User``. 3090 3091 3092 .. _Instruction: 3093 3094 The ``Instruction`` class 3095 ------------------------- 3096 3097 ``#include "llvm/IR/Instruction.h"`` 3098 3099 header source: `Instruction.h 3100 <http://llvm.org/doxygen/Instruction_8h-source.html>`_ 3101 3102 doxygen info: `Instruction Class 3103 <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ 3104 3105 Superclasses: User_, Value_ 3106 3107 The ``Instruction`` class is the common base class for all LLVM instructions. 3108 It provides only a few methods, but is a very commonly used class. The primary 3109 data tracked by the ``Instruction`` class itself is the opcode (instruction 3110 type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To 3111 represent a specific type of instruction, one of many subclasses of 3112 ``Instruction`` are used. 3113 3114 Because the ``Instruction`` class subclasses the User_ class, its operands can 3115 be accessed in the same way as for other ``User``\ s (with the 3116 ``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods). 3117 An important file for the ``Instruction`` class is the ``llvm/Instruction.def`` 3118 file. This file contains some meta-data about the various different types of 3119 instructions in LLVM. It describes the enum values that are used as opcodes 3120 (for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the 3121 concrete sub-classes of ``Instruction`` that implement the instruction (for 3122 example BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this 3123 file confuses doxygen, so these enum values don't show up correctly in the 3124 `doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_. 3125 3126 .. _s_Instruction: 3127 3128 Important Subclasses of the ``Instruction`` class 3129 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3130 3131 .. _BinaryOperator: 3132 3133 * ``BinaryOperator`` 3134 3135 This subclasses represents all two operand instructions whose operands must be 3136 the same type, except for the comparison instructions. 3137 3138 .. _CastInst: 3139 3140 * ``CastInst`` 3141 This subclass is the parent of the 12 casting instructions. It provides 3142 common operations on cast instructions. 3143 3144 .. _CmpInst: 3145 3146 * ``CmpInst`` 3147 3148 This subclass respresents the two comparison instructions, 3149 `ICmpInst <LangRef.html#i_icmp>`_ (integer opreands), and 3150 `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands). 3151 3152 .. _TerminatorInst: 3153 3154 * ``TerminatorInst`` 3155 3156 This subclass is the parent of all terminator instructions (those which can 3157 terminate a block). 3158 3159 .. _m_Instruction: 3160 3161 Important Public Members of the ``Instruction`` class 3162 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3163 3164 * ``BasicBlock *getParent()`` 3165 3166 Returns the BasicBlock_ that this 3167 ``Instruction`` is embedded into. 3168 3169 * ``bool mayWriteToMemory()`` 3170 3171 Returns true if the instruction writes to memory, i.e. it is a ``call``, 3172 ``free``, ``invoke``, or ``store``. 3173 3174 * ``unsigned getOpcode()`` 3175 3176 Returns the opcode for the ``Instruction``. 3177 3178 * ``Instruction *clone() const`` 3179 3180 Returns another instance of the specified instruction, identical in all ways 3181 to the original except that the instruction has no parent (i.e. it's not 3182 embedded into a BasicBlock_), and it has no name. 3183 3184 .. _Constant: 3185 3186 The ``Constant`` class and subclasses 3187 ------------------------------------- 3188 3189 Constant represents a base class for different types of constants. It is 3190 subclassed by ConstantInt, ConstantArray, etc. for representing the various 3191 types of Constants. GlobalValue_ is also a subclass, which represents the 3192 address of a global variable or function. 3193 3194 .. _s_Constant: 3195 3196 Important Subclasses of Constant 3197 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3198 3199 * ConstantInt : This subclass of Constant represents an integer constant of 3200 any width. 3201 3202 * ``const APInt& getValue() const``: Returns the underlying 3203 value of this constant, an APInt value. 3204 3205 * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an 3206 int64_t via sign extension. If the value (not the bit width) of the APInt 3207 is too large to fit in an int64_t, an assertion will result. For this 3208 reason, use of this method is discouraged. 3209 3210 * ``uint64_t getZExtValue() const``: Converts the underlying APInt value 3211 to a uint64_t via zero extension. IF the value (not the bit width) of the 3212 APInt is too large to fit in a uint64_t, an assertion will result. For this 3213 reason, use of this method is discouraged. 3214 3215 * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt 3216 object that represents the value provided by ``Val``. The type is implied 3217 as the IntegerType that corresponds to the bit width of ``Val``. 3218 3219 * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the 3220 ConstantInt object that represents the value provided by ``Val`` for integer 3221 type ``Ty``. 3222 3223 * ConstantFP : This class represents a floating point constant. 3224 3225 * ``double getValue() const``: Returns the underlying value of this constant. 3226 3227 * ConstantArray : This represents a constant array. 3228 3229 * ``const std::vector<Use> &getValues() const``: Returns a vector of 3230 component constants that makeup this array. 3231 3232 * ConstantStruct : This represents a constant struct. 3233 3234 * ``const std::vector<Use> &getValues() const``: Returns a vector of 3235 component constants that makeup this array. 3236 3237 * GlobalValue : This represents either a global variable or a function. In 3238 either case, the value is a constant fixed address (after linking). 3239 3240 .. _GlobalValue: 3241 3242 The ``GlobalValue`` class 3243 ------------------------- 3244 3245 ``#include "llvm/IR/GlobalValue.h"`` 3246 3247 header source: `GlobalValue.h 3248 <http://llvm.org/doxygen/GlobalValue_8h-source.html>`_ 3249 3250 doxygen info: `GlobalValue Class 3251 <http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_ 3252 3253 Superclasses: Constant_, User_, Value_ 3254 3255 Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the 3256 only LLVM values that are visible in the bodies of all :ref:`Function 3257 <c_Function>`\ s. Because they are visible at global scope, they are also 3258 subject to linking with other globals defined in different translation units. 3259 To control the linking process, ``GlobalValue``\ s know their linkage rules. 3260 Specifically, ``GlobalValue``\ s know whether they have internal or external 3261 linkage, as defined by the ``LinkageTypes`` enumeration. 3262 3263 If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C), 3264 it is not visible to code outside the current translation unit, and does not 3265 participate in linking. If it has external linkage, it is visible to external 3266 code, and does participate in linking. In addition to linkage information, 3267 ``GlobalValue``\ s keep track of which Module_ they are currently part of. 3268 3269 Because ``GlobalValue``\ s are memory objects, they are always referred to by 3270 their **address**. As such, the Type_ of a global is always a pointer to its 3271 contents. It is important to remember this when using the ``GetElementPtrInst`` 3272 instruction because this pointer must be dereferenced first. For example, if 3273 you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array 3274 of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to 3275 that array. Although the address of the first element of this array and the 3276 value of the ``GlobalVariable`` are the same, they have different types. The 3277 ``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is 3278 ``i32.`` Because of this, accessing a global value requires you to dereference 3279 the pointer with ``GetElementPtrInst`` first, then its elements can be accessed. 3280 This is explained in the `LLVM Language Reference Manual 3281 <LangRef.html#globalvars>`_. 3282 3283 .. _m_GlobalValue: 3284 3285 Important Public Members of the ``GlobalValue`` class 3286 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3287 3288 * | ``bool hasInternalLinkage() const`` 3289 | ``bool hasExternalLinkage() const`` 3290 | ``void setInternalLinkage(bool HasInternalLinkage)`` 3291 3292 These methods manipulate the linkage characteristics of the ``GlobalValue``. 3293 3294 * ``Module *getParent()`` 3295 3296 This returns the Module_ that the 3297 GlobalValue is currently embedded into. 3298 3299 .. _c_Function: 3300 3301 The ``Function`` class 3302 ---------------------- 3303 3304 ``#include "llvm/IR/Function.h"`` 3305 3306 header source: `Function.h <http://llvm.org/doxygen/Function_8h-source.html>`_ 3307 3308 doxygen info: `Function Class 3309 <http://llvm.org/doxygen/classllvm_1_1Function.html>`_ 3310 3311 Superclasses: GlobalValue_, Constant_, User_, Value_ 3312 3313 The ``Function`` class represents a single procedure in LLVM. It is actually 3314 one of the more complex classes in the LLVM hierarchy because it must keep track 3315 of a large amount of data. The ``Function`` class keeps track of a list of 3316 BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_. 3317 3318 The list of BasicBlock_\ s is the most commonly used part of ``Function`` 3319 objects. The list imposes an implicit ordering of the blocks in the function, 3320 which indicate how the code will be laid out by the backend. Additionally, the 3321 first BasicBlock_ is the implicit entry node for the ``Function``. It is not 3322 legal in LLVM to explicitly branch to this initial block. There are no implicit 3323 exit nodes, and in fact there may be multiple exit nodes from a single 3324 ``Function``. If the BasicBlock_ list is empty, this indicates that the 3325 ``Function`` is actually a function declaration: the actual body of the function 3326 hasn't been linked in yet. 3327 3328 In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track 3329 of the list of formal Argument_\ s that the function receives. This container 3330 manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does 3331 for the BasicBlock_\ s. 3332 3333 The SymbolTable_ is a very rarely used LLVM feature that is only used when you 3334 have to look up a value by name. Aside from that, the SymbolTable_ is used 3335 internally to make sure that there are not conflicts between the names of 3336 Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body. 3337 3338 Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The 3339 value of the function is its address (after linking) which is guaranteed to be 3340 constant. 3341 3342 .. _m_Function: 3343 3344 Important Public Members of the ``Function`` 3345 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3346 3347 * ``Function(const FunctionType *Ty, LinkageTypes Linkage, 3348 const std::string &N = "", Module* Parent = 0)`` 3349 3350 Constructor used when you need to create new ``Function``\ s to add the 3351 program. The constructor must specify the type of the function to create and 3352 what type of linkage the function should have. The FunctionType_ argument 3353 specifies the formal arguments and return value for the function. The same 3354 FunctionType_ value can be used to create multiple functions. The ``Parent`` 3355 argument specifies the Module in which the function is defined. If this 3356 argument is provided, the function will automatically be inserted into that 3357 module's list of functions. 3358 3359 * ``bool isDeclaration()`` 3360 3361 Return whether or not the ``Function`` has a body defined. If the function is 3362 "external", it does not have a body, and thus must be resolved by linking with 3363 a function defined in a different translation unit. 3364 3365 * | ``Function::iterator`` - Typedef for basic block list iterator 3366 | ``Function::const_iterator`` - Typedef for const_iterator. 3367 | ``begin()``, ``end()``, ``size()``, ``empty()`` 3368 3369 These are forwarding methods that make it easy to access the contents of a 3370 ``Function`` object's BasicBlock_ list. 3371 3372 * ``Function::BasicBlockListType &getBasicBlockList()`` 3373 3374 Returns the list of BasicBlock_\ s. This is necessary to use when you need to 3375 update the list or perform a complex action that doesn't have a forwarding 3376 method. 3377 3378 * | ``Function::arg_iterator`` - Typedef for the argument list iterator 3379 | ``Function::const_arg_iterator`` - Typedef for const_iterator. 3380 | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()`` 3381 3382 These are forwarding methods that make it easy to access the contents of a 3383 ``Function`` object's Argument_ list. 3384 3385 * ``Function::ArgumentListType &getArgumentList()`` 3386 3387 Returns the list of Argument_. This is necessary to use when you need to 3388 update the list or perform a complex action that doesn't have a forwarding 3389 method. 3390 3391 * ``BasicBlock &getEntryBlock()`` 3392 3393 Returns the entry ``BasicBlock`` for the function. Because the entry block 3394 for the function is always the first block, this returns the first block of 3395 the ``Function``. 3396 3397 * | ``Type *getReturnType()`` 3398 | ``FunctionType *getFunctionType()`` 3399 3400 This traverses the Type_ of the ``Function`` and returns the return type of 3401 the function, or the FunctionType_ of the actual function. 3402 3403 * ``SymbolTable *getSymbolTable()`` 3404 3405 Return a pointer to the SymbolTable_ for this ``Function``. 3406 3407 .. _GlobalVariable: 3408 3409 The ``GlobalVariable`` class 3410 ---------------------------- 3411 3412 ``#include "llvm/IR/GlobalVariable.h"`` 3413 3414 header source: `GlobalVariable.h 3415 <http://llvm.org/doxygen/GlobalVariable_8h-source.html>`_ 3416 3417 doxygen info: `GlobalVariable Class 3418 <http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_ 3419 3420 Superclasses: GlobalValue_, Constant_, User_, Value_ 3421 3422 Global variables are represented with the (surprise surprise) ``GlobalVariable`` 3423 class. Like functions, ``GlobalVariable``\ s are also subclasses of 3424 GlobalValue_, and as such are always referenced by their address (global values 3425 must live in memory, so their "name" refers to their constant address). See 3426 GlobalValue_ for more on this. Global variables may have an initial value 3427 (which must be a Constant_), and if they have an initializer, they may be marked 3428 as "constant" themselves (indicating that their contents never change at 3429 runtime). 3430 3431 .. _m_GlobalVariable: 3432 3433 Important Public Members of the ``GlobalVariable`` class 3434 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3435 3436 * ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage, 3437 Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)`` 3438 3439 Create a new global variable of the specified type. If ``isConstant`` is true 3440 then the global variable will be marked as unchanging for the program. The 3441 Linkage parameter specifies the type of linkage (internal, external, weak, 3442 linkonce, appending) for the variable. If the linkage is InternalLinkage, 3443 WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then 3444 the resultant global variable will have internal linkage. AppendingLinkage 3445 concatenates together all instances (in different translation units) of the 3446 variable into a single variable but is only applicable to arrays. See the 3447 `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details 3448 on linkage types. Optionally an initializer, a name, and the module to put 3449 the variable into may be specified for the global variable as well. 3450 3451 * ``bool isConstant() const`` 3452 3453 Returns true if this is a global variable that is known not to be modified at 3454 runtime. 3455 3456 * ``bool hasInitializer()`` 3457 3458 Returns true if this ``GlobalVariable`` has an intializer. 3459 3460 * ``Constant *getInitializer()`` 3461 3462 Returns the initial value for a ``GlobalVariable``. It is not legal to call 3463 this method if there is no initializer. 3464 3465 .. _BasicBlock: 3466 3467 The ``BasicBlock`` class 3468 ------------------------ 3469 3470 ``#include "llvm/IR/BasicBlock.h"`` 3471 3472 header source: `BasicBlock.h 3473 <http://llvm.org/doxygen/BasicBlock_8h-source.html>`_ 3474 3475 doxygen info: `BasicBlock Class 3476 <http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_ 3477 3478 Superclass: Value_ 3479 3480 This class represents a single entry single exit section of the code, commonly 3481 known as a basic block by the compiler community. The ``BasicBlock`` class 3482 maintains a list of Instruction_\ s, which form the body of the block. Matching 3483 the language definition, the last element of this list of instructions is always 3484 a terminator instruction (a subclass of the TerminatorInst_ class). 3485 3486 In addition to tracking the list of instructions that make up the block, the 3487 ``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that 3488 it is embedded into. 3489 3490 Note that ``BasicBlock``\ s themselves are Value_\ s, because they are 3491 referenced by instructions like branches and can go in the switch tables. 3492 ``BasicBlock``\ s have type ``label``. 3493 3494 .. _m_BasicBlock: 3495 3496 Important Public Members of the ``BasicBlock`` class 3497 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3498 3499 * ``BasicBlock(const std::string &Name = "", Function *Parent = 0)`` 3500 3501 The ``BasicBlock`` constructor is used to create new basic blocks for 3502 insertion into a function. The constructor optionally takes a name for the 3503 new block, and a :ref:`Function <c_Function>` to insert it into. If the 3504 ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically 3505 inserted at the end of the specified :ref:`Function <c_Function>`, if not 3506 specified, the BasicBlock must be manually inserted into the :ref:`Function 3507 <c_Function>`. 3508 3509 * | ``BasicBlock::iterator`` - Typedef for instruction list iterator 3510 | ``BasicBlock::const_iterator`` - Typedef for const_iterator. 3511 | ``begin()``, ``end()``, ``front()``, ``back()``, 3512 ``size()``, ``empty()`` 3513 STL-style functions for accessing the instruction list. 3514 3515 These methods and typedefs are forwarding functions that have the same 3516 semantics as the standard library methods of the same names. These methods 3517 expose the underlying instruction list of a basic block in a way that is easy 3518 to manipulate. To get the full complement of container operations (including 3519 operations to update the list), you must use the ``getInstList()`` method. 3520 3521 * ``BasicBlock::InstListType &getInstList()`` 3522 3523 This method is used to get access to the underlying container that actually 3524 holds the Instructions. This method must be used when there isn't a 3525 forwarding function in the ``BasicBlock`` class for the operation that you 3526 would like to perform. Because there are no forwarding functions for 3527 "updating" operations, you need to use this if you want to update the contents 3528 of a ``BasicBlock``. 3529 3530 * ``Function *getParent()`` 3531 3532 Returns a pointer to :ref:`Function <c_Function>` the block is embedded into, 3533 or a null pointer if it is homeless. 3534 3535 * ``TerminatorInst *getTerminator()`` 3536 3537 Returns a pointer to the terminator instruction that appears at the end of the 3538 ``BasicBlock``. If there is no terminator instruction, or if the last 3539 instruction in the block is not a terminator, then a null pointer is returned. 3540 3541 .. _Argument: 3542 3543 The ``Argument`` class 3544 ---------------------- 3545 3546 This subclass of Value defines the interface for incoming formal arguments to a 3547 function. A Function maintains a list of its formal arguments. An argument has 3548 a pointer to the parent Function. 3549 3550 3551