==========================
Exception Handling in LLVM
==========================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to
exception handling in LLVM.  It describes the format that LLVM exception
handling information takes, which is useful for those interested in creating
front-ends or dealing directly with the information.  Further, this document
provides specific examples of what exception handling information is used for in
C and C++.

Itanium ABI Zero-cost Exception Handling
----------------------------------------

Exception handling for most programming languages is designed to recover from
conditions that rarely occur during general use of an application.  To that end,
exception handling should not interfere with the main flow of an application's
algorithm by performing checkpointing tasks, such as saving the current program
counter or register state.

The Itanium ABI Exception Handling Specification defines a methodology for
providing outlying data in the form of exception tables without inlining
speculative exception handling code in the flow of an application's main
algorithm.  Thus, the specification is said to add "zero cost" to the normal
execution of an application.

A more complete description of the Itanium ABI exception handling runtime
support can be found at `Itanium C++ ABI: Exception Handling
<http://mentorembedded.github.com/cxx-abi/abi-eh.html>`_. A description of the
exception frame format can be found at `Exception Frames
<http://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html>`_,
with details of the DWARF 4 specification at `DWARF 4 Standard
<http://dwarfstd.org/Dwarf4Std.php>`_.  A description of the C++ exception
table formats can be found at `Exception Handling Tables
<http://mentorembedded.github.com/cxx-abi/exceptions.pdf>`_.

Setjmp/Longjmp Exception Handling
---------------------------------

Setjmp/Longjmp (SJLJ) based exception handling uses LLVM intrinsics
`llvm.eh.sjlj.setjmp`_ and `llvm.eh.sjlj.longjmp`_ to handle control flow for
exception handling.

Each function that does exception processing, whether for ``try``/``catch``
blocks or for cleanups, registers itself on a global frame list. When an
exception is being unwound, the runtime uses this list to identify which
functions need processing.

Landing pad selection is encoded in the call site entry of the function
context. The runtime returns to the function via `llvm.eh.sjlj.longjmp`_, where
a switch table transfers control to the appropriate landing pad based on the
index stored in the function context.

In contrast to DWARF exception handling, which encodes exception regions and
frame information in out-of-line tables, SJLJ exception handling builds and
removes the unwind frame context at runtime. This results in faster exception
handling at the expense of slower execution when no exceptions are thrown. As
exceptions are, by their nature, intended for uncommon code paths, DWARF
exception handling is generally preferred to SJLJ.

Overview
--------

When an exception is thrown in LLVM code, the runtime does its best to find a
handler suited to processing the circumstance.

The runtime first attempts to find an *exception frame* corresponding to the
function where the exception was thrown.  If the programming language supports
exception handling (e.g. C++), the exception frame contains a reference to an
exception table describing how to process the exception.  If the language does
not support exception handling (e.g. C), or if the exception needs to be
forwarded to a prior activation, the exception frame contains information about
how to unwind the current activation and restore the state of the prior
activation.  This process is repeated until the exception is handled. If the
exception is not handled and no activations remain, then the application is
terminated with an appropriate error message.

Because different programming languages have different behaviors when handling
exceptions, the exception handling ABI provides a mechanism for
supplying *personalities*. An exception handling personality is defined by
way of a *personality function* (e.g. ``__gxx_personality_v0`` in C++),
which receives the context of the exception, an *exception structure*
containing the exception object type and value, and a reference to the exception
table for the current function.  The personality function for the current
compile unit is specified in a *common exception frame*.

The organization of an exception table is language dependent. For C++, an
exception table is organized as a series of code ranges defining what to do if
an exception occurs in that range. Typically, the information associated with a
range defines which types of exception objects (using C++ *type info*) are
handled in that range, and an associated action that should take place. Actions
typically pass control to a *landing pad*.

A landing pad corresponds roughly to the code found in the ``catch`` portion of
a ``try``/``catch`` sequence. When execution resumes at a landing pad, it
receives an *exception structure* and a *selector value* corresponding to the
*type* of exception thrown. The selector is then used to determine which *catch*
should actually process the exception.

LLVM Code Generation
====================

From a C++ developer's perspective, exceptions are defined in terms of the
``throw`` and ``try``/``catch`` statements. In this section we will describe the
implementation of LLVM exception handling in terms of C++ examples.

Throw
-----

Languages that support exception handling typically provide a ``throw``
operation to initiate the exception process. Internally, a ``throw`` operation
breaks down into two steps.

#. A request is made to allocate exception space for an exception structure.
   This structure needs to survive beyond the current activation. This structure
   will contain the type and value of the object being thrown.

#. A call is made to the runtime to raise the exception, passing the exception
   structure as an argument.

In C++, the allocation of the exception structure is done by the
``__cxa_allocate_exception`` runtime function. The exception raising is handled
by ``__cxa_throw``. The type of the exception is represented using a C++ RTTI
structure.
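
As a rough illustration of the two steps, the following sketch does by hand
what a compiler targeting the Itanium C++ ABI arranges for ``throw value;``
with an ``int``. It assumes a toolchain whose ``<cxxabi.h>`` declares
``__cxa_allocate_exception`` and ``__cxa_throw`` in namespace ``abi`` (GCC and
Clang on typical Unix-like targets do). The helper names
``throw_int_by_hand`` and ``catch_int_thrown_by_hand`` are invented for this
example, and real front-ends emit the equivalent calls directly in IR.

.. code-block:: c++

  #include <cxxabi.h>
  #include <typeinfo>

  // Hand-written equivalent of "throw value;" for an int (Itanium C++ ABI only).
  [[noreturn]] void throw_int_by_hand(int value) {
    // Step 1: allocate exception storage that outlives this activation.
    void *storage = abi::__cxa_allocate_exception(sizeof(int));
    *static_cast<int *>(storage) = value;

    // Step 2: raise the exception. The RTTI object identifies the thrown type;
    // an int needs no destructor, so the last argument is null.
    abi::__cxa_throw(storage, const_cast<std::type_info *>(&typeid(int)),
                     nullptr);
  }

  // An ordinary handler catches the hand-raised exception like any other.
  int catch_int_thrown_by_hand() {
    try {
      throw_int_by_hand(42);
    } catch (int value) {
      return value;
    }
    return -1;  // not reached
  }

Calling ``catch_int_thrown_by_hand()`` returns ``42``: the handler cannot tell
the hand-raised exception apart from one produced by a ``throw`` expression.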

Try/Catch
---------

A call within the scope of a *try* statement can potentially raise an
exception. In those circumstances, the LLVM C++ front-end replaces the call with
an ``invoke`` instruction. Unlike a call, the ``invoke`` has two potential
continuation points:

#. where to continue when the call succeeds as per normal, and

#. where to continue if the call raises an exception, either by a throw or the
   unwinding of a throw

The place where an ``invoke`` continues after an exception is called a
*landing pad*. LLVM landing pads are conceptually alternative function entry
points where an exception structure reference and a type info index are passed
in as arguments. The landing pad saves the exception structure reference and
then proceeds to select the catch block that corresponds to the type info of
the exception object.
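
For instance, in the small sketch below (the functions ``may_throw`` and
``guarded`` are invented for illustration), the call to ``may_throw`` sits
inside a ``try`` block, so a C++ front-end would be expected to emit it as an
``invoke`` whose unwind destination is the landing pad for the handler.

.. code-block:: c++

  #include <stdexcept>

  int may_throw(int x) {
    if (x < 0)
      throw std::runtime_error("negative input");
    return x * 2;
  }

  int guarded(int x) {
    try {
      // Inside a try block: typically emitted as an invoke. The normal edge
      // continues with the return; the unwind edge leads to the landing pad.
      return may_throw(x);
    } catch (const std::runtime_error &) {
      // Reached through the landing pad when the selector matches
      // std::runtime_error (or a type derived from it).
      return -1;
    }
  }

Here ``guarded(21)`` follows the normal edge and returns ``42``, while
``guarded(-1)`` takes the unwind edge and returns ``-1``.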

The LLVM :ref:`i_landingpad` is used to convey information about the landing
pad to the back end. For C++, the ``landingpad`` instruction returns a pointer
and integer pair corresponding to the pointer to the *exception structure* and
the *selector value* respectively.

The ``landingpad`` instruction takes a reference to the personality function to
be used for this ``try``/``catch`` sequence. The remainder of the instruction is
a list of *cleanup*, *catch*, and *filter* clauses. The exception is tested
against the clauses sequentially from first to last. The clauses have the
following meanings:

-  ``catch <type> @ExcType``

   - This clause means that the landingpad block should be entered if the
     exception being thrown is of type ``@ExcType`` or a subtype of
     ``@ExcType``. For C++, ``@ExcType`` is a pointer to the ``std::type_info``
     object (an RTTI object) representing the C++ exception type.

   - If ``@ExcType`` is ``null``, any exception matches, so the landingpad
     should always be entered. This is used for C++ catch-all blocks ("``catch
     (...)``").

   - When this clause is matched, the selector value will be equal to the value
     returned by "``@llvm.eh.typeid.for(i8* @ExcType)``". This will always be a
     positive value.

-  ``filter <type> [<type> @ExcType1, ..., <type> @ExcTypeN]``

   - This clause means that the landingpad should be entered if the exception
     being thrown does *not* match any of the types in the list (which, for C++,
     are again specified as ``std::type_info`` pointers).

   - C++ front-ends use this to implement C++ exception specifications, such as
     "``void foo() throw (ExcType1, ..., ExcTypeN) { ... }``".

   - When this clause is matched, the selector value will be negative.

   - The array argument to ``filter`` may be empty; for example, "``[0 x i8**]
     undef``". This means that the landingpad should always be entered. (Note
     that such a ``filter`` would not be equivalent to "``catch i8* null``",
     because ``filter`` and ``catch`` produce negative and positive selector
     values respectively.)

-  ``cleanup``

   - This clause means that the landingpad should always be entered.

   - C++ front-ends use this for calling objects' destructors.

   - When this clause is matched, the selector value will be zero.

   - The runtime may treat "``cleanup``" differently from "``catch <type>
     null``".

     In C++, if an unhandled exception occurs, the language runtime will call
     ``std::terminate()``, but it is implementation-defined whether the runtime
     unwinds the stack and calls object destructors first. For example, the GNU
     C++ unwinder does not call object destructors when an unhandled exception
     occurs. The reason for this is to improve debuggability: it ensures that
     ``std::terminate()`` is called from the context of the ``throw``, so that
     this context is not lost by unwinding the stack. A runtime will typically
     implement this by searching for a matching non-``cleanup`` clause, and
     aborting if it does not find one, before entering any landingpad blocks.
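
The ``catch`` clause rules above mirror what C++ handlers do at the source
level. In this small sketch (the types ``Base`` and ``Derived`` and the
function ``classify`` are invented for illustration), the first handler
corresponds to a ``catch`` clause naming the type info of ``Base``, which also
matches the subtype ``Derived``; the second corresponds to a ``catch`` clause
with a ``null`` type.

.. code-block:: c++

  struct Base {};
  struct Derived : Base {};

  // Returns 1 if the Base handler ran, 2 if the catch-all handler ran, and
  // 0 if nothing was thrown.
  int classify(int which) {
    try {
      if (which == 1)
        throw Derived{};  // matched by the Base handler (subtype match)
      if (which == 2)
        throw 7;          // an int matches only the catch-all handler
      return 0;
    } catch (const Base &) {
      return 1;
    } catch (...) {
      return 2;
    }
  }

Because clauses are tested in order, listing the catch-all handler first would
make the ``Base`` handler unreachable, which is why C++ requires ``catch
(...)`` to come last.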

Once the landing pad has the type info selector, the code branches to the code
for the first catch. The catch then checks the value of the type info selector
against the index of type info for that catch.  Since the type info index is not
known until all the type infos have been gathered in the backend, the catch code
must call the `llvm.eh.typeid.for`_ intrinsic to determine the index for a given
type info. If the catch fails to match the selector then control is passed on to
the next catch.
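
The compare-and-fall-through chain described above can be pictured with a toy
model in plain C++. This is only a sketch of the control flow, not LLVM's
implementation: ``TypeTable``, ``typeid_for``, ``Outcome``, and ``dispatch``
are hypothetical names, and in real code the index comes from the
`llvm.eh.typeid.for`_ intrinsic and is assigned by the back end.

.. code-block:: c++

  #include <cstddef>
  #include <typeinfo>
  #include <vector>

  // Toy stand-in for the per-function type table that the back end emits.
  struct TypeTable {
    std::vector<const std::type_info *> entries;

    // Models llvm.eh.typeid.for: returns a positive, 1-based index and adds
    // the type info to the table the first time it is seen.
    int typeid_for(const std::type_info &type) {
      for (std::size_t i = 0; i < entries.size(); ++i)
        if (*entries[i] == type)
          return static_cast<int>(i) + 1;
      entries.push_back(&type);
      return static_cast<int>(entries.size());
    }
  };

  enum class Outcome { FirstCatch, SecondCatch, Resume };

  // Models the chain of selector comparisons that follows a landingpad.
  Outcome dispatch(TypeTable &table, int selector) {
    if (selector == table.typeid_for(typeid(int)))
      return Outcome::FirstCatch;
    if (selector == table.typeid_for(typeid(double)))
      return Outcome::SecondCatch;
    // Cleanup-only entries (selector 0) and selectors this landing pad does
    // not recognize fall through; real code would resume unwinding here.
    return Outcome::Resume;
  }

With a fresh ``TypeTable``, selector ``1`` selects the first catch, selector
``2`` the second, and any other value (including ``0``) falls through to the
resume path.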

Finally, the entry and exit of catch code is bracketed with calls to
``__cxa_begin_catch`` and ``__cxa_end_catch``.

* ``__cxa_begin_catch`` takes an exception structure reference as an argument
  and returns the value of the exception object.

* ``__cxa_end_catch`` takes no arguments. This function:

  #. Locates the most recently caught exception and decrements its handler
     count,

  #. Removes the exception from the *caught* stack if the handler count goes to
     zero, and

  #. Destroys the exception if the handler count goes to zero and the exception
     was not re-thrown by throw.

  .. note::

    A rethrow from within the catch may replace this call with a
    ``__cxa_rethrow``.
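
One observable consequence of this bookkeeping is that a ``throw;`` inside a
handler rethrows the same exception object rather than a copy, so the object
must survive the exit from the inner handler. The sketch below demonstrates
this at the source level; ``Trace`` and ``count_handlers`` are names invented
for the example.

.. code-block:: c++

  struct Trace {
    int handlers_seen = 0;
  };

  int count_handlers() {
    try {
      try {
        throw Trace{};
      } catch (Trace &trace) {
        ++trace.handlers_seen;
        throw;  // rethrow the exception currently being handled
      }
    } catch (Trace &trace) {
      ++trace.handlers_seen;  // same object the inner handler modified
      return trace.handlers_seen;
    }
    return -1;  // not reached
  }

``count_handlers()`` returns ``2`` because both handlers increment the counter
on one and the same exception object.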

Cleanups
--------

A cleanup is extra code which needs to be run as part of unwinding a scope.  C++
destructors are a typical example, but other languages and language extensions
provide a variety of different kinds of cleanups. In general, a landing pad may
need to run arbitrary amounts of cleanup code before actually entering a catch
block. To indicate the presence of cleanups, a :ref:`i_landingpad` should have
a *cleanup* clause.  Otherwise, the unwinder will not stop at the landing pad if
there are no catches or filters that require it to.

.. note::

  Do not allow a new exception to propagate out of the execution of a
  cleanup. This can corrupt the internal state of the unwinder.  Different
  languages describe different high-level semantics for these situations: for
  example, C++ requires that the process be terminated, whereas Ada cancels both
  exceptions and throws a third.

When all cleanups are finished, if the exception is not handled by the current
function, resume unwinding by executing the `resume
instruction <LangRef.html#i_resume>`_, passing in the result of the
``landingpad`` instruction for the original landing pad.
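
A C++ destructor is the most familiar cleanup. In the sketch below (the names
``Guard``, ``work``, ``count_cleanups``, and ``cleanups_run`` are invented for
illustration), ``work`` has no handler of its own, so the unwinder only stops
in it to run the destructor of ``guard`` and then continues to the caller.

.. code-block:: c++

  #include <stdexcept>

  static int cleanups_run = 0;

  struct Guard {
    ~Guard() { ++cleanups_run; }  // runs as cleanup code during unwinding
  };

  void work(bool fail) {
    Guard guard;
    if (fail)
      throw std::runtime_error("work failed");
    // On the normal path the destructor runs at the end of the scope instead.
  }

  int count_cleanups() {
    cleanups_run = 0;
    try {
      work(true);
    } catch (const std::runtime_error &) {
      // By the time the handler runs, the cleanup in work() has finished.
    }
    return cleanups_run;
  }

``count_cleanups()`` returns ``1``: the destructor ran exactly once, on the
unwind path, before the handler in the caller was entered.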

Throw Filters
-------------

C++ allows the specification of which exception types may be thrown from a
function. To represent this, a top level landing pad may exist to filter out
invalid types. To express this in LLVM code the :ref:`i_landingpad` will have a
filter clause. The clause consists of an array of type infos. The selector
value returned by ``landingpad`` will be negative if the exception does not
match any of the type infos. In that case a call to ``__cxa_call_unexpected``
should be made; otherwise, unwinding should continue with ``_Unwind_Resume``.
Each of these functions requires a reference to the exception structure.

Note that the most general form of a ``landingpad`` instruction can have any
number of catch, cleanup, and filter clauses (though having more than one
cleanup is pointless). The LLVM C++ front-end can generate such ``landingpad``
instructions due to inlining creating nested exception handling scopes.

.. _undefined:

Restrictions
------------

The unwinder delegates the decision of whether to stop in a call frame to that
call frame's language-specific personality function. Not all unwinders guarantee
that they will stop to perform cleanups. For example, the GNU C++ unwinder
doesn't do so unless the exception is actually caught somewhere further up the
stack.

In order for inlining to behave correctly, landing pads must be prepared to
handle selector results that they did not originally advertise. Suppose that a
function catches exceptions of type ``A``, and it's inlined into a function that
catches exceptions of type ``B``. The inliner will update the ``landingpad``
instruction for the inlined landing pad to include the fact that ``B`` is also
caught. If that landing pad assumes that it will only be entered to catch an
``A``, it's in for a rude awakening.  Consequently, landing pads must test for
the selector results they understand and then resume exception propagation with
the `resume instruction <LangRef.html#i_resume>`_ if none of the conditions
match.

Exception Handling Intrinsics
=============================

In addition to the ``landingpad`` and ``resume`` instructions, LLVM uses several
intrinsic functions (with names prefixed by ``llvm.eh``) to provide exception
handling information at various points in generated code.

.. _llvm.eh.typeid.for:

``llvm.eh.typeid.for``
----------------------

.. code-block:: llvm

  i32 @llvm.eh.typeid.for(i8* %type_info)

This intrinsic returns the type info index in the exception table of the current
function.  This value can be used to compare against the result of the
``landingpad`` instruction.  The single argument is a reference to a type info.

Uses of this intrinsic are generated by the C++ front-end.

SJLJ Intrinsics
---------------

The ``llvm.eh.sjlj`` intrinsics are used internally within LLVM's
backend.  Uses of them are generated by the backend's
``SjLjEHPrepare`` pass.

.. _llvm.eh.sjlj.setjmp:

``llvm.eh.sjlj.setjmp``
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: llvm

  i32 @llvm.eh.sjlj.setjmp(i8* %setjmp_buf)

For SJLJ based exception handling, this intrinsic forces register saving for the
current function and stores the address of the following instruction for use as
a destination address by `llvm.eh.sjlj.longjmp`_. The buffer format and the
overall functioning of this intrinsic are compatible with the GCC
``__builtin_setjmp`` implementation, allowing code built with Clang and GCC
to interoperate.

The single parameter is a pointer to a five-word buffer in which the calling
context is saved. The front end places the frame pointer in the first word, and
the target implementation of this intrinsic should place the destination address
for a `llvm.eh.sjlj.longjmp`_ in the second word. The following three words are
available for use in a target-specific manner.

.. _llvm.eh.sjlj.longjmp:

``llvm.eh.sjlj.longjmp``
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: llvm

  void @llvm.eh.sjlj.longjmp(i8* %setjmp_buf)

For SJLJ based exception handling, the ``llvm.eh.sjlj.longjmp`` intrinsic is
used to implement ``__builtin_longjmp()``. The single parameter is a pointer to
a buffer populated by `llvm.eh.sjlj.setjmp`_. The frame pointer and stack
pointer are restored from the buffer, then control is transferred to the
destination address.
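
As a rough illustration of the buffer and control transfer described above,
the sketch below uses the GCC-compatible builtins ``__builtin_setjmp`` and
``__builtin_longjmp`` rather than the intrinsics themselves. It assumes GCC or
Clang on a target that supports these builtins; ``jump_buffer``, ``phase``,
``jump_back``, and ``run_sjlj_demo`` are names invented for the example.

.. code-block:: c++

  // Five pointer-sized words, matching the buffer layout described above.
  static void *jump_buffer[5];

  // volatile so the value is always re-read after the non-local jump.
  static volatile int phase = 0;

  // The jump must be made from a different function than the one that
  // called __builtin_setjmp, so keep this helper out of line.
  __attribute__((noinline)) static void jump_back() {
    phase = 2;
    __builtin_longjmp(jump_buffer, 1);  // the second argument must be 1
  }

  int run_sjlj_demo() {
    phase = 0;
    if (__builtin_setjmp(jump_buffer) == 0) {
      // First return: the calling context has just been saved.
      phase = 1;
      jump_back();  // control does not come back to this point
    }
    // Second return: reached through the saved destination address.
    return phase;
  }

``run_sjlj_demo()`` returns ``2``: the value written by ``jump_back`` is
visible after control is transferred back to the saved destination address.
Unlike exception unwinding, this transfer runs no destructors or other
cleanups for the frames it skips.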

``llvm.eh.sjlj.lsda``
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: llvm

  i8* @llvm.eh.sjlj.lsda()

For SJLJ based exception handling, the ``llvm.eh.sjlj.lsda`` intrinsic returns
the address of the Language Specific Data Area (LSDA) for the current
function. The SJLJ front-end code stores this address in the exception handling
function context for use by the runtime.

``llvm.eh.sjlj.callsite``
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: llvm

  void @llvm.eh.sjlj.callsite(i32 %call_site_num)

For SJLJ based exception handling, the ``llvm.eh.sjlj.callsite`` intrinsic
identifies the callsite value associated with the following ``invoke``
instruction. This is used to ensure that landing pad entries in the LSDA are
generated in matching order.

Asm Table Formats
=================

There are two tables that are used by the exception handling runtime to
determine which actions should be taken when an exception is thrown.

Exception Handling Frame
------------------------

An exception handling frame ``eh_frame`` is very similar to the unwind frame
used by DWARF debug info. The frame contains all the information necessary to
tear down the current frame and restore the state of the prior frame. There is
an exception handling frame for each function in a compile unit, plus a common
exception handling frame that defines information common to all functions in the
unit.

Exception Tables
----------------

An exception table contains information about what actions to take when an
exception is thrown in a particular part of a function's code. There is one
exception table per function, except for leaf functions and functions that
call only non-throwing functions, which do not need an exception table.