==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory.  Primarily, this feature is required for
compatibility with the Microsoft C++ ABI.  Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory.  Prior to the addition of inalloca, calls in LLVM
were indivisible instructions.  There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer.  With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call.  Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
    void g(Foo a, Foo b);
    void f() {
      g(Foo(), Foo());
    }

.. code-block:: llvm

    %struct.Foo = type { i32, i32 }
    declare void @Foo_ctor(%struct.Foo* %this)
    declare void @Foo_dtor(%struct.Foo* %this)
    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

    define void @f() {
    entry:
      %base = call i8* @llvm.stacksave()
      ; The argument allocation itself must carry the inalloca keyword.
      %memargs = alloca inalloca <{ %struct.Foo, %struct.Foo }>
      ; The first index steps through the pointer, the second selects the field.
      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
      call void @Foo_ctor(%struct.Foo* %b)

      ; If a's ctor throws, we must destruct b.
      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
      invoke void @Foo_ctor(%struct.Foo* %a)
          to label %invoke.cont unwind label %invoke.unwind

    invoke.cont:
      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      ; Landing pad (landingpad instruction elided): destroy b, then free the stack.
      call void @Foo_dtor(%struct.Foo* %b)
      call void @llvm.stackrestore(i8* %base)
      ...
    }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`.  Then, it allocates the
argument stack space with alloca and calls the default constructor for
``b``.  The default constructor for ``a`` could throw an exception, so
the frontend has to call it with ``invoke`` and create a landing pad.
The landing pad has to destroy the already constructed argument ``b``
before restoring the stack pointer.  If the constructor does not unwind,
``g`` is called.  In the Microsoft C++ ABI, ``g`` will destroy its
arguments, and then the stack is restored in ``f``.
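
The sequence above can be modelled in portable C++ as a rough sketch.
The names ``ArgMemory``, ``events``, and ``run`` are hypothetical, the
buffer stands in for the ``alloca``, placement ``new`` stands in for the
constructor calls, and the order in which ``g`` destroys its arguments
is chosen arbitrarily.  This is not what a frontend emits; it only
illustrates the order of construction and cleanup.

.. code-block:: c++

    #include <new>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Hypothetical event log so the order of operations can be inspected.
    static std::vector<std::string> events;

    struct Foo {
      int a = 0, b = 0;
      const char *name;
      explicit Foo(const char *n, bool fail = false) : name(n) {
        if (fail) throw std::runtime_error("ctor failed");
        events.push_back(std::string("ctor ") + name);
      }
      ~Foo() { events.push_back(std::string("dtor ") + name); }
    };

    // Stand-in for the <{ Foo, Foo }> argument block.
    struct ArgMemory {
      alignas(Foo) unsigned char a[sizeof(Foo)];
      alignas(Foo) unsigned char b[sizeof(Foo)];
    };

    // Stand-in for the callee: it destroys its own arguments.
    void g(Foo *a, Foo *b) {
      events.push_back("call g");
      a->~Foo();
      b->~Foo();
    }

    void f(bool a_throws) {
      ArgMemory memargs;                  // models the alloca
      Foo *b = new (memargs.b) Foo("b");  // b is constructed first
      Foo *a = nullptr;
      try {
        a = new (memargs.a) Foo("a", a_throws);  // models the invoke
      } catch (...) {
        b->~Foo();  // landing pad: destroy the already constructed b
        throw;      // leaving the scope plays the role of the stack restore
      }
      g(a, b);
    }

    // Runs f once and returns the recorded events.
    std::vector<std::string> run(bool a_throws) {
      events.clear();
      try {
        f(a_throws);
      } catch (const std::runtime_error &) {
        events.push_back("caught");
      }
      return events;
    }

Under these assumptions, ``run(false)`` records both constructors, the
call, and the two destructors run by ``g``, while ``run(true)`` records
only the construction and destruction of ``b`` before the exception is
caught.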

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments.  We cannot vend pointers to that memory at function entry
because after code generation they will alias.

The rule against allocas between argument allocations and the call site
avoids this problem, but it creates a cleanup problem.  Cleanup and
lifetime are handled explicitly with stack save and restore calls.  In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack adjusting cleanup is less
powerful than a full stack save and restore.

Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots.  This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
    Foo bar(Foo b);
    int main() {
      bar(bar(Foo()));
    }

In this case, we want to be able to elide copies into ``bar``'s argument
slots.  That means we need to have more than one set of argument frames
active at the same time.  First, we need to allocate the frame for the
outer call so we can pass it in as the hidden struct return pointer to
the inner call.  Then we do the same for the inner call, allocating a
frame and passing its address to ``Foo``'s default constructor.  By
wrapping the evaluation of the inner ``bar`` with stack save and
restore, we can have multiple overlapping active call frames.
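
The nesting described above can be illustrated with a toy model of a
downward-growing stack.  ``ToyStack``, ``Trace``, and
``evaluate_nested_call`` are hypothetical names, addresses are plain
integers, and nothing here reflects how LLVM itself represents frames;
the sketch only shows that restoring the inner save point leaves the
outer argument frame at the top of the stack.

.. code-block:: c++

    #include <cstddef>

    // Toy stack: only tracks the stack pointer as an integer offset.
    class ToyStack {
    public:
      explicit ToyStack(std::size_t size) : sp_(size) {}
      std::size_t save() const { return sp_; }          // like llvm.stacksave
      void restore(std::size_t saved) { sp_ = saved; }  // like llvm.stackrestore
      // Like alloca: moves the stack pointer down and returns the new address.
      // Assumes n does not exceed the remaining space.
      std::size_t allocate(std::size_t n) { sp_ -= n; return sp_; }
      std::size_t sp() const { return sp_; }
    private:
      std::size_t sp_;
    };

    struct Trace {
      std::size_t outer_frame, inner_frame, sp_after_inner, sp_at_end;
    };

    // Walks through the allocation order for bar(bar(Foo())).
    Trace evaluate_nested_call(ToyStack &stack, std::size_t foo_size) {
      Trace t{};
      std::size_t outer_save = stack.save();
      t.outer_frame = stack.allocate(foo_size);  // argument slot of the outer bar

      std::size_t inner_save = stack.save();
      t.inner_frame = stack.allocate(foo_size);  // argument slot of the inner bar
      // Foo() would be constructed in inner_frame here, and the inner bar
      // would write its result directly into outer_frame.
      stack.restore(inner_save);                 // frees only the inner frame
      t.sp_after_inner = stack.sp();

      // The outer bar would be called here with outer_frame at the top of the stack.
      stack.restore(outer_save);
      t.sp_at_end = stack.sp();
      return t;
    }

With a 64-byte stack and a 4-byte ``Foo``, both frames are live between
the two allocations, and after the inner restore the stack pointer
equals the address of the outer frame again.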

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions.  On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments.  In some sense, this means that
the allocas are automatically cleared by the call.  However, LLVM does
not model this as a stack adjustment.  Instead, it models it as a write
of undef to all of the inalloca values passed to the call.  Frontends
should still restore the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception.  If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak.  This means the cleanup of the stack memory cannot be tied to the
call itself.  There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct.  In particular, using inalloca should not require a base
pointer.  If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer.  While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.
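
The "one possible stack level" condition can be sketched as a small
forward propagation over a control-flow graph.  The ``Block`` type and
``uniqueStackLevels`` function are hypothetical, each block is reduced
to a net stack adjustment in bytes, and block 0 is assumed to be the
entry; a real backend analysis would also have to handle stack restores
to a saved level rather than plain deltas.

.. code-block:: c++

    #include <cstddef>
    #include <queue>
    #include <vector>

    // Hypothetical CFG node: net stack adjustment and successor indices.
    struct Block {
      long delta;                      // bytes allocated (+) or freed (-) in the block
      std::vector<std::size_t> succs;  // indices of successor blocks (assumed valid)
    };

    // Computes the stack level at entry to every reachable block.  Returns
    // false as soon as some block can be reached with two different levels.
    bool uniqueStackLevels(const std::vector<Block> &cfg, std::vector<long> &levels) {
      levels.assign(cfg.size(), 0);
      if (cfg.empty()) return true;
      std::vector<bool> seen(cfg.size(), false);
      std::queue<std::size_t> work;
      seen[0] = true;  // block 0 is the entry, at level 0
      work.push(0);
      while (!work.empty()) {
        std::size_t b = work.front();
        work.pop();
        long out = levels[b] + cfg[b].delta;  // level at the end of block b
        for (std::size_t s : cfg[b].succs) {
          if (!seen[s]) {
            seen[s] = true;
            levels[s] = out;
            work.push(s);
          } else if (levels[s] != out) {
            return false;  // two paths disagree about the stack level
          }
        }
      }
      return true;
    }

A graph shaped like ``f`` above (an entry block that allocates, followed
by two blocks that each free the same amount) passes the check, while a
loop that allocates on every iteration without freeing does not.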