================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Does the University of Illinois Open Source License really qualify as an "open source" license?
------------------------------------------------------------------------------------------------
Yes, the license is `certified
<http://www.opensource.org/licenses/UoI-NCSA.php>`_ by the Open Source
Initiative (OSI).


Can I modify LLVM source code and redistribute the modified source?
--------------------------------------------------------------------
Yes.  The modified source distribution must retain the copyright notice and
follow the three bulleted conditions listed in the `LLVM license
<http://llvm.org/svn/llvm-project/llvm/trunk/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
---------------------------------------------------------------------------------------------------------------------------
Yes.  This is why we distribute LLVM under a less restrictive license than the
GPL, as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems.  Most of the code is written in standard C++ with operating system
services abstracted to a support library.  The tools required to build and
test LLVM have been ported to a plethora of platforms.

Some porting problems may exist in the following areas:

* The autoconf/makefile build system relies heavily on UNIX shell tools,
  like the Bourne Shell and sed.  Porting to systems without these tools
  (MacOS 9, Plan 9) will require more effort.


What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
-----------------------------------------------------------------------------------------------------

In short: you can't.  It's actually kind of a silly question once you grok
what's going on.  Basically, in code like this:

.. code-block:: llvm

    %result = add i32 %foo, %bar

``%result`` is just a name given to the ``Value`` of the ``add``
instruction.  In other words, ``%result`` *is* the add instruction.  The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it.  However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction, since
instructions can simply keep pointers to any other ``Value``\ s that they
reference.  In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).

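
If it helps to see the same idea from the C++ side, here is a minimal sketch
using ``IRBuilder`` (header paths as in recent releases; the builder and the
two operand ``Value``\ s are assumed to exist already, and the function and
variable names are made up for illustration).  The ``Value*`` returned by
``CreateAdd`` *is* the instruction, and a user of the result simply keeps a
pointer to it:

.. code-block:: c++

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Value.h"

    using namespace llvm;

    // Emit "%result = add i32 %foo, %bar" and then a use of it.
    Value *emitAdds(IRBuilder<> &Builder, Value *Foo, Value *Bar) {
      // The returned Value *is* the add instruction; "result" is only the
      // label that will appear in the textual IR.
      Value *Result = Builder.CreateAdd(Foo, Bar, "result");

      // No "store to %result" happens anywhere: the multiply below uses the
      // add by holding a pointer to it.
      return Builder.CreateMul(Result, Result, "squared");
    }

Calling ``setName`` on ``Result`` afterwards would only change that label; it
would not move or copy the value anywhere.
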

Source Languages
================

What source languages are supported?
------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <http://clang.llvm.org/>`_.  Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <http://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler.  How should I interface with the LLVM middle-end optimizers and back-end code generators?
--------------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format.  Assuming you want to write your
language's compiler in the language itself (rather than C++), there are three
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries using your language's FFI (foreign function
   interface).**

   * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * *for:* enables running LLVM optimization passes without an emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in include/llvm-c should help
a lot, since most languages have strong support for interfacing with C.  The
most common hurdle with calling C from managed code is interfacing with the
garbage collector.  The C interface was designed to require very little memory
management, and so is straightforward in this regard.

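
To give a feel for the shape of that glue code, here is a small, made-up
sketch against the C API from ``include/llvm-c`` (written as C++ only so it
matches the other examples in this FAQ; a front-end in another language would
bind these same entry points through its FFI).  It builds a module containing
a two-argument ``add`` function and dumps the textual IR:

.. code-block:: c++

    #include "llvm-c/Core.h"

    int main() {
      // A module and a function "i32 add(i32, i32)".
      LLVMModuleRef Mod = LLVMModuleCreateWithName("demo");
      LLVMTypeRef Params[] = {LLVMInt32Type(), LLVMInt32Type()};
      LLVMTypeRef FnTy = LLVMFunctionType(LLVMInt32Type(), Params, 2, 0);
      LLVMValueRef Fn = LLVMAddFunction(Mod, "add", FnTy);

      // Add the two parameters and return the result.
      LLVMBuilderRef Builder = LLVMCreateBuilder();
      LLVMPositionBuilderAtEnd(Builder, LLVMAppendBasicBlock(Fn, "entry"));
      LLVMValueRef Sum = LLVMBuildAdd(Builder, LLVMGetParam(Fn, 0),
                                      LLVMGetParam(Fn, 1), "sum");
      LLVMBuildRet(Builder, Sum);

      LLVMDumpModule(Mod);  // print the .ll form to stderr

      LLVMDisposeBuilder(Builder);
      LLVMDisposeModule(Mod);
      return 0;
    }

A binding generated from ``Core.h`` exposes essentially this same sequence of
calls, which is where most of the "ugly glue code" effort goes.
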

What support is there for higher level source language constructs for building a compiler?
--------------------------------------------------------------------------------------------
Currently, there isn't much.  LLVM supports an intermediate representation
which is useful for code representation but does not support the high level
(abstract syntax tree) representation needed by most compilers.  There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction.  Help!
------------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
-----------------------------------------------------------------
No.  C and C++ are inherently platform-dependent languages.  The most obvious
example of this is the preprocessor.  A very common way that C code is made
portable is by using the preprocessor to include platform-specific code.  In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``.  It's common for ``sizeof(long)`` to vary
between platforms.  In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.

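
Both effects are visible in a tiny made-up fragment like the one below: the
preprocessor keeps only one branch, and the front-end folds ``sizeof(long)``
to whatever the target says, so the LLVM IR that comes out is already
committed to a single platform:

.. code-block:: c++

    #include <cstdio>

    int main() {
      // Only one of these branches survives preprocessing, so the other
      // platform's code never reaches LLVM at all.
    #ifdef _WIN32
      std::printf("built for Windows\n");
    #else
      std::printf("built for a Unix-like system\n");
    #endif

      // sizeof(long) is folded to a constant by the front-end (commonly 4 on
      // Windows, 8 on 64-bit Unix), hard-wiring the target's data layout.
      std::printf("sizeof(long) = %u\n", (unsigned)sizeof(long));
      return 0;
    }
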

Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
--------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``.  This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file.  The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\ s to print values.


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in.  Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed.  For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable.  If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.

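
For instance, a loop like the one below (an arbitrary example, not taken from
the demo page) survives optimization only because its result is observable:
it is both returned and written to a ``volatile`` global, and either of those
alone would be enough to keep it from being deleted:

.. code-block:: c++

    // Reads and writes of a volatile object may not be optimized away, so
    // this global acts as a sink that keeps the computation alive.
    volatile int Sink;

    int sum_of_squares(int N) {
      int Total = 0;
      for (int I = 0; I < N; ++I)
        Total += I * I;

      Sink = Total;   // observable side effect: the loop must be kept
      return Total;   // returning the value also makes it "needed"
    }
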

What is this "``undef``" thing that shows up in my code?
--------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined.  You
can get these if you do not initialize a variable before you use it.  For
example, the C function:

.. code-block:: c

    int X() { int i; return i; }

Is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
--------------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem for authors of front-ends that use custom calling
conventions: you need to make sure to set the right calling convention on
both the function and on each call to the function.  For example, this code:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @bar() {
        call void @foo()
        ret void
    }

Is optimized to:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @bar() {
        unreachable
    }

... with "``opt -instcombine -simplifycfg``".  This often bites people because
"all their code disappears".  Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code).  The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define internal void @bar(void()* %FP, i1 %cond) {
        br i1 %cond, label %T, label %F
    T:
        call void %FP()
        ret void
    F:
        call fastcc void %FP()
        ret void
    }
    define void @test() {
        %X = or i1 false, false
        call void @bar(void()* @foo, i1 %X)
        ret void
    }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling convention (thus,
the code is perfectly well defined).  If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @test() {
        %X = or i1 false, false
        br i1 %X, label %T.i, label %F.i
    T.i:
        call void @foo()
        br label %bar.exit
    F.i:
        call fastcc void @foo()
        br label %bar.exit
    bar.exit:
        ret void
    }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention.  We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code.  In this
case, dead code elimination can trivially remove the undefined code.  However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }

    define void @test(i1 %X) {
        br i1 %X, label %T.i, label %F.i
    T.i:
        call void @foo()
        br label %bar.exit
    F.i:
        call fastcc void @foo()
        br label %bar.exit
    bar.exit:
        ret void
    }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable.  However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

    define fastcc void @foo() {
        ret void
    }
    define void @test(i1 %X) {
    F.i:
        call fastcc void @foo()
        ret void
    }

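
The practical takeaway for front-end authors is simply to set the convention
in both places when building the IR.  A minimal C++ sketch (assuming you
already hold the ``Function`` and the ``CallInst`` you just created; header
paths as in recent releases, and the helper name is made up):

.. code-block:: c++

    #include "llvm/IR/CallingConv.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // The convention must agree on both sides: the callee's definition and
    // every call site that invokes it.
    void useFastCC(Function *Callee, CallInst *Call) {
      Callee->setCallingConv(CallingConv::Fast);  // define fastcc void @foo()
      Call->setCallingConv(CallingConv::Fast);    // call fastcc void @foo()
    }

If the two ever disagree, the call has undefined behavior, and as shown above
the optimizers are entitled to turn that call site into ``unreachable``.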