.. _exception_handling:

==========================
Exception Handling in LLVM
==========================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to
exception handling in LLVM. It describes the format that LLVM exception
handling information takes, which is useful for those interested in creating
front-ends or dealing directly with the information. Further, this document
provides specific examples of what exception handling information is used for in
C and C++.

Itanium ABI Zero-cost Exception Handling
----------------------------------------

Exception handling for most programming languages is designed to recover from
conditions that rarely occur during general use of an application. To that end,
exception handling should not interfere with the main flow of an application's
algorithm by performing checkpointing tasks, such as saving the current pc or
register state.

The Itanium ABI Exception Handling Specification defines a methodology for
providing outlying data in the form of exception tables without inlining
speculative exception handling code in the flow of an application's main
algorithm. Thus, the specification is said to add "zero-cost" to the normal
execution of an application.

A more complete description of the Itanium ABI exception handling runtime
support can be found at `Itanium C++ ABI: Exception Handling
<http://www.codesourcery.com/cxx-abi/abi-eh.html>`_. A description of the
exception frame format can be found at `Exception Frames
<http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html>`_,
with details of the DWARF 4 specification at `DWARF 4 Standard
<http://dwarfstd.org/Dwarf4Std.php>`_.
A description of the C++ exception
table formats can be found at `Exception Handling Tables
<http://www.codesourcery.com/cxx-abi/exceptions.pdf>`_.

Setjmp/Longjmp Exception Handling
---------------------------------

Setjmp/Longjmp (SJLJ) based exception handling uses LLVM intrinsics
`llvm.eh.sjlj.setjmp`_ and `llvm.eh.sjlj.longjmp`_ to handle control flow for
exception handling.

Each function that does exception processing, be it ``try``/``catch`` blocks or
cleanups, registers itself on a global frame list. When exceptions are
unwinding, the runtime uses this list to identify which functions need
processing.

Landing pad selection is encoded in the call site entry of the function
context. The runtime returns to the function via `llvm.eh.sjlj.longjmp`_, where
a switch table transfers control to the appropriate landing pad based on the
index stored in the function context.

In contrast to DWARF exception handling, which encodes exception regions and
frame information in out-of-line tables, SJLJ exception handling builds and
removes the unwind frame context at runtime. This results in faster exception
handling at the expense of slower execution when no exceptions are thrown. As
exceptions are, by their nature, intended for uncommon code paths, DWARF
exception handling is generally preferred to SJLJ.

Overview
--------

When an exception is thrown in LLVM code, the runtime does its best to find a
handler suited to processing the circumstance.

The runtime first attempts to find an *exception frame* corresponding to the
function where the exception was thrown. If the programming language supports
exception handling (e.g. C++), the exception frame contains a reference to an
exception table describing how to process the exception. If the language does
not support exception handling (e.g.
C), or if the exception needs to be
forwarded to a prior activation, the exception frame contains information about
how to unwind the current activation and restore the state of the prior
activation. This process is repeated until the exception is handled. If the
exception is not handled and no activations remain, then the application is
terminated with an appropriate error message.

Because different programming languages have different behaviors when handling
exceptions, the exception handling ABI provides a mechanism for
supplying *personalities*. An exception handling personality is defined by
way of a *personality function* (e.g. ``__gxx_personality_v0`` in C++),
which receives the context of the exception, an *exception structure*
containing the exception object type and value, and a reference to the exception
table for the current function. The personality function for the current
compile unit is specified in a *common exception frame*.

The organization of an exception table is language dependent. For C++, an
exception table is organized as a series of code ranges defining what to do if
an exception occurs in that range. Typically, the information associated with a
range defines which types of exception objects (using C++ *type info*) are
handled in that range, and an associated action that should take place. Actions
typically pass control to a *landing pad*.

A landing pad corresponds roughly to the code found in the ``catch`` portion of
a ``try``/``catch`` sequence. When execution resumes at a landing pad, it
receives an *exception structure* and a *selector value* corresponding to the
*type* of exception thrown. The selector is then used to determine which *catch*
should actually process the exception.

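
As a source-level illustration of the selection described above, the following
C++ sketch has a single ``try`` block with several ``catch`` clauses. The names
``throw_kind`` and ``classify`` are hypothetical and exist only for this
example. Conceptually, the clauses share one landing pad, and the selector
value decides which of them runs.

.. code-block:: c++

  #include <stdexcept>
  #include <string>

  // Hypothetical helper: throws a different exception type depending on `kind`.
  void throw_kind(int kind) {
    if (kind == 0) throw 42;                          // exception object of type int
    if (kind == 1) throw std::runtime_error("boom");  // derives from std::exception
    if (kind == 2) throw std::string("text");         // matched only by catch (...)
  }

  // Returns the name of the catch clause that handled the exception, or
  // "none" when nothing was thrown.
  std::string classify(int kind) {
    try {
      throw_kind(kind);
    } catch (int) {
      return "int";
    } catch (const std::exception &) {
      return "std::exception";
    } catch (...) {
      return "catch-all";
    }
    return "none";
  }

The clauses are tried in source order, which is why the ``catch (...)`` clause
must come last.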

LLVM Code Generation
====================

From a C++ developer's perspective, exceptions are defined in terms of the
``throw`` and ``try``/``catch`` statements. In this section we will describe the
implementation of LLVM exception handling in terms of C++ examples.

Throw
-----

Languages that support exception handling typically provide a ``throw``
operation to initiate the exception process. Internally, a ``throw`` operation
breaks down into two steps.

#. A request is made to allocate exception space for an exception structure.
   This structure needs to survive beyond the current activation. This structure
   will contain the type and value of the object being thrown.

#. A call is made to the runtime to raise the exception, passing the exception
   structure as an argument.

In C++, the allocation of the exception structure is done by the
``__cxa_allocate_exception`` runtime function. The exception raising is handled
by ``__cxa_throw``. The type of the exception is represented using a C++ RTTI
structure.

Try/Catch
---------

A call within the scope of a *try* statement can potentially raise an
exception. In those circumstances, the LLVM C++ front-end replaces the call with
an ``invoke`` instruction. Unlike a call, the ``invoke`` has two potential
continuation points:

#. where to continue when the call succeeds as per normal, and

#. where to continue if the call raises an exception, either by a throw or the
   unwinding of a throw

The place where an ``invoke`` continues after an exception is called a
*landing pad*. LLVM landing pads are conceptually
alternative function entry points where an exception structure reference and a
type info index are passed in as arguments.
The landing pad saves the exception
structure reference and then proceeds to select the catch block that corresponds
to the type info of the exception object.

The LLVM `landingpad instruction <LangRef.html#i_landingpad>`_ is used to convey
information about the landing pad to the back end. For C++, the ``landingpad``
instruction returns a pointer and integer pair corresponding to the pointer to
the *exception structure* and the *selector value* respectively.

The ``landingpad`` instruction takes a reference to the personality function to
be used for this ``try``/``catch`` sequence. The remainder of the instruction is
a list of *cleanup*, *catch*, and *filter* clauses. The exception is tested
against the clauses sequentially from first to last. The selector value is a
positive number if the exception matched a type info, a negative number if it
matched a filter, and zero if it matched a cleanup. If nothing is matched, the
behavior of the program is `undefined`_. If a type info matched, then the
selector value is the index of the type info in the exception table, which can
be obtained using the `llvm.eh.typeid.for`_ intrinsic.

Once the landing pad has the type info selector, the code branches to the code
for the first catch. The catch then checks the value of the type info selector
against the index of type info for that catch. Since the type info index is not
known until all the type infos have been gathered in the backend, the catch code
must call the `llvm.eh.typeid.for`_ intrinsic to determine the index for a given
type info. If the catch fails to match the selector then control is passed on to
the next catch.

Finally, the entry and exit of catch code are bracketed with calls to
``__cxa_begin_catch`` and ``__cxa_end_catch``.


* ``__cxa_begin_catch`` takes an exception structure reference as an argument
  and returns the value of the exception object.

* ``__cxa_end_catch`` takes no arguments. This function:

  #. Locates the most recently caught exception and decrements its handler
     count,

  #. Removes the exception from the *caught* stack if the handler count goes to
     zero, and

  #. Destroys the exception if the handler count goes to zero and the exception
     was not re-thrown by throw.

  .. note::

    A rethrow from within the catch may replace this call with a
    ``__cxa_rethrow``.

Cleanups
--------

A cleanup is extra code which needs to be run as part of unwinding a scope. C++
destructors are a typical example, but other languages and language extensions
provide a variety of different kinds of cleanups. In general, a landing pad may
need to run arbitrary amounts of cleanup code before actually entering a catch
block. To indicate the presence of cleanups, a `landingpad
instruction <LangRef.html#i_landingpad>`_ should have a *cleanup*
clause. Otherwise, the unwinder will not stop at the landing pad if there are no
catches or filters that require it to.

.. note::

  Do not allow a new exception to propagate out of the execution of a
  cleanup. This can corrupt the internal state of the unwinder. Different
  languages describe different high-level semantics for these situations: for
  example, C++ requires that the process be terminated, whereas Ada cancels both
  exceptions and throws a third.

When all cleanups are finished, if the exception is not handled by the current
function, resume unwinding by using the `resume
instruction <LangRef.html#i_resume>`_, passing in the result of the
``landingpad`` instruction for the original landing pad.

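
The ordering of cleanups relative to handlers can be observed from C++ source.
The sketch below is only an illustration: ``Guard``, ``work``, ``run``, and the
``events`` log are hypothetical names, not part of any LLVM or runtime API. The
destructor of ``Guard`` plays the role of the cleanup that runs while ``work``
is being unwound, before the handler in ``run`` is entered.

.. code-block:: c++

  #include <stdexcept>
  #include <string>
  #include <vector>

  // Hypothetical event log, used only to make the ordering observable.
  std::vector<std::string> events;

  struct Guard {
    // The destructor is the cleanup for any scope that owns a Guard.
    ~Guard() { events.push_back("cleanup"); }
  };

  void work() {
    Guard g;
    // Unwinding out of work() must destroy g before leaving the frame.
    throw std::runtime_error("failure");
  }

  // Runs work() and returns the recorded order of events.
  std::vector<std::string> run() {
    events.clear();
    try {
      work();
    } catch (const std::exception &) {
      events.push_back("handler");
    }
    return events;
  }

Because the exception is caught in ``run``, the log records the cleanup first
and the handler second.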

Throw Filters
-------------

C++ allows the specification of which exception types may be thrown from a
function. To represent this, a top level landing pad may exist to filter out
invalid types. To express this in LLVM code the `landingpad
instruction <LangRef.html#i_landingpad>`_ will have a filter clause. The clause
consists of an array of type infos. ``landingpad`` will return a negative value
if the exception does not match any of the type infos. If no match is found,
a call to ``__cxa_call_unexpected`` should be made; otherwise,
``_Unwind_Resume`` should be called. Each of these functions requires a
reference to the exception structure. Note that the most general form of a
``landingpad`` instruction can have any number of catch, cleanup, and filter
clauses (though having more than one cleanup is pointless). The LLVM C++
front-end can generate such ``landingpad`` instructions due to inlining
creating nested exception handling scopes.

.. _undefined:

Restrictions
------------

The unwinder delegates the decision of whether to stop in a call frame to that
call frame's language-specific personality function. Not all unwinders guarantee
that they will stop to perform cleanups. For example, the GNU C++ unwinder
doesn't do so unless the exception is actually caught somewhere further up the
stack.

In order for inlining to behave correctly, landing pads must be prepared to
handle selector results that they did not originally advertise. Suppose that a
function catches exceptions of type ``A``, and it's inlined into a function that
catches exceptions of type ``B``. The inliner will update the ``landingpad``
instruction for the inlined landing pad to include the fact that ``B`` is also
caught. If that landing pad assumes that it will only be entered to catch an
``A``, it's in for a rude awakening.
Consequently, landing pads must test for
the selector results they understand and then resume exception propagation with
the `resume instruction <LangRef.html#i_resume>`_ if none of the conditions
match.

Exception Handling Intrinsics
=============================

In addition to the ``landingpad`` and ``resume`` instructions, LLVM uses several
intrinsic functions (name prefixed with ``llvm.eh``) to provide exception
handling information at various points in generated code.

.. _llvm.eh.typeid.for:

llvm.eh.typeid.for
------------------

.. code-block:: llvm

  i32 @llvm.eh.typeid.for(i8* %type_info)


This intrinsic returns the type info index in the exception table of the current
function. This value can be used to compare against the result of the
``landingpad`` instruction. The single argument is a reference to a type info.

.. _llvm.eh.sjlj.setjmp:

llvm.eh.sjlj.setjmp
-------------------

.. code-block:: llvm

  i32 @llvm.eh.sjlj.setjmp(i8* %setjmp_buf)

For SJLJ based exception handling, this intrinsic forces register saving for the
current function and stores the address of the following instruction for use as
a destination address by `llvm.eh.sjlj.longjmp`_. The buffer format and the
overall functioning of this intrinsic are compatible with the GCC
``__builtin_setjmp`` implementation, allowing code built with clang and GCC
to interoperate.

The single parameter is a pointer to a five word buffer in which the calling
context is saved. The front end places the frame pointer in the first word, and
the target implementation of this intrinsic should place the destination address
for a `llvm.eh.sjlj.longjmp`_ in the second word. The following three words are
available for use in a target-specific manner.

.. _llvm.eh.sjlj.longjmp:

llvm.eh.sjlj.longjmp
--------------------

.. code-block:: llvm

  void @llvm.eh.sjlj.longjmp(i8* %setjmp_buf)

For SJLJ based exception handling, the ``llvm.eh.sjlj.longjmp`` intrinsic is
used to implement ``__builtin_longjmp()``. The single parameter is a pointer to
a buffer populated by `llvm.eh.sjlj.setjmp`_. The frame pointer and stack
pointer are restored from the buffer, then control is transferred to the
destination address.

llvm.eh.sjlj.lsda
-----------------

.. code-block:: llvm

  i8* @llvm.eh.sjlj.lsda()

For SJLJ based exception handling, the ``llvm.eh.sjlj.lsda`` intrinsic returns
the address of the Language Specific Data Area (LSDA) for the current
function. The SJLJ front-end code stores this address in the exception handling
function context for use by the runtime.

llvm.eh.sjlj.callsite
---------------------

.. code-block:: llvm

  void @llvm.eh.sjlj.callsite(i32 %call_site_num)

For SJLJ based exception handling, the ``llvm.eh.sjlj.callsite`` intrinsic
identifies the callsite value associated with the following ``invoke``
instruction. This is used to ensure that landing pad entries in the LSDA are
generated in matching order.

Asm Table Formats
=================

There are two tables that are used by the exception handling runtime to
determine which actions should be taken when an exception is thrown.

Exception Handling Frame
------------------------

An exception handling frame ``eh_frame`` is very similar to the unwind frame
used by DWARF debug info. The frame contains all the information necessary to
tear down the current frame and restore the state of the prior frame. There is
an exception handling frame for each function in a compile unit, plus a common
exception handling frame that defines information common to all functions in the
unit.

Exception Tables
----------------

An exception table contains information about what actions to take when an
exception is thrown in a particular part of a function's code. There is one
exception table per function, except for leaf functions and functions that call
only non-throwing functions; these do not need an exception table.
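
At the C++ source level, the two exempt categories can be sketched as follows.
The function names are hypothetical, and whether a table is actually emitted
for a given function is decided by the compiler and target. The example only
shows how source code marks calls as non-throwing and how the ``noexcept``
operator reports that at compile time.

.. code-block:: c++

  // Leaf function: makes no calls at all.
  int leaf(int x) noexcept { return x + 1; }

  // Calls only a function that is declared non-throwing.
  int calls_only_nothrow(int x) noexcept { return leaf(x) * 2; }

  // Potentially throwing: a caller that wraps this call in a try block, or
  // that has live cleanups, needs a landing pad for it.
  int may_throw(int x) {
    if (x < 0) throw x;
    return x;
  }

  // The noexcept operator is true only for expressions declared unable to throw.
  static_assert(noexcept(leaf(1)), "leaf is non-throwing");
  static_assert(noexcept(calls_only_nothrow(1)), "calls only non-throwing code");
  static_assert(!noexcept(may_throw(1)), "may_throw is potentially throwing");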