Home | History | Annotate | Download | only in internals
      1 
      2 This file describes in detail how Calltree accurately tracks function
      3 entry/exit, one of those harder-than-you'd-think things.
      4 
      5 -----------------------------------------------------------------------------
      6 Josef's description
      7 -----------------------------------------------------------------------------
      8 From: Josef Weidendorfer <Josef.Weidendorfer (a] gmx.de>
      9 To: Nicholas Nethercote <njn25 (a] cam.ac.uk>
     10 Cc: valgrind-developers (a] lists.sourceforge.net
     11 Subject: [Valgrind-developers] Re: Tracking function entry/exit
     12 
     13 On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote:
     14 > Josef,
     15 >
     16 > The topic of tracking function entry/exit has come up a few times on the
     17 > mailing lists recently.  My usual answer is that it's difficult to do
     18 > correctly.  However, you seem to do it with Calltree.  I looked at the
     19 > source code a bit, and it looks like you are doing some reasonably
     20 > complicated things to get it right, eg. unwinding the stack.  How robust
     21 > is your approach?  Can you briefly explain how it works?
     22 
     23 A note before describing the mechanism: I need to have a helper call at start
     24 of every BB anyway, so I use this helper to do the tracking. This of course 
     25 has some overhead, and perhaps can be avoided, but it seems to add to the 
     26 robustness. I have a bug fix here for reentrent entering of a signal handler 
     27 (2 bug reports). Otherwise I have no bug reports, so I assume that the 
     28 mechanism to be quite robust.
     29 
     30 I have a shadow call stack for every thread. For signal handlers of a thread, 
     31 I first PUSH a separation marker on the shadow stack, and use the stack as 
     32 normal. The marker is used for unwinding when leaving the signal handler. 
     33 This is fine as there is no scheduling among signal handlers of one thread.
     34 
     35 Instrumentation of calltree:
     36 * Store at the end of each basic block the jmpkind into a tool-global, static 
     37 variable.
     38 * At the start of every BB, jump to a helper function.
     39 
     40 The helper function does the following regarding function call tracking:
     41 - for a control transfer to another ELF object/ELF section, override jmpkind 
     42   with a CALL (*1)
     43 - for a control transfer to the 1st basic block of a function, override 
     44   jmpkind with a CALL (*2)
     45 - do unwinding if needed (i.e, POPs of the shadow call stack)
     46 - if jmpkind is RET and there was no unwinding/POP:
     47         - if our call stack is empty, simulate a CALL lasting from beginning
     48           (with Valgrind 2.1.x, this is not needed any more, as we run on
     49           simulated CPU from first client instruction)
     50         - otherwise this is a JMP using a RET instruction (typically used in
     51           the runtime linker). Do a POP, setting previous BB address to call
     52           site and override jmpkind with a CALL. By this, you get 2 function
     53           calls from a calling site.
     54 - when jmpkind is a CALL, push new function call from previous BB to current
     55   BB on shadow call stack.
     56 - Save current BB address to be available for call to handler in next BB.
     57 
     58 Special care is needed at thread switches and enter/leave of signal handlers, 
     59 as we need separate shadow call stacks.
     60 
     61 Known bug: We should check for the need of unwinding when ESP is explicitly 
     62 written to. I hope this doesn't create too much overhead.
     63 
     64 Remarks:
     65 (*1) Jumps between ELF objects are function calls to a shared library. This is 
     66      mainly done to catch the JMP from PLT code.
     67 (*2) This is what your function tracking skin/tool does. It is needed here
     68      mainly to catch tail recursion. In general, for functions doing a
     69      "return otherfunction()", GCC produces JMPs with -O2. 
     70 
     71 Additional points:
     72 - If I need a name for a function, but there is no debug info, I use the 
     73   instruction address minus the load offset of the corresponding ELF object
     74   (if there is one) to get a relative address for that ELF object. This
     75   offset can be used with objdump later in postprocessing tools (e.g.
     76   objdump). I would suggest this change even for cachegrind instead of a
     77   "???".
     78 - I introduced the ability to specify functions to be "skipped". This means 
     79   that execution of these functions is attributed to the calling function.
     80   The default is to skip all functions located in PLT sections. Thus, in
     81   effect, costs of PLT functions are attributed to callers, and the call to
     82   a shared library function starts directly with code in the other ELF
     83   object.
     84 - As Vg 2.1.x does pointerchecking, the instrumentation can't write to
     85   memory space of Valgrind any longer. Currently, my tool needs
     86   "--pointercheck=no" to be able to run. Jeremy and me already agreed on
     87   replacing current LD/ST with a CLD/CST (Client Load/Store) with pointer
     88   check and keep original LD/ST for tool usage without pointerchecking.
     89 
     90 Looking at these things, it seems possible to do function tracking at end of a 
     91 basic block instead of the beginning of the next BB. This way, we can perhaps 
     92 avoid calls to helpers at every BB.
     93 
     94 From my point of view, it would be great to integrate optional function 
     95 tracking into Valgrind core with some hooks.
     96 
     97 Josef
     98 
     99 
    100 -----------------------------------------------------------------------------
    101 Josef's clarification of Nick's summary of Josef's description
    102 -----------------------------------------------------------------------------
    103 On Monday 21 June 2004 12:15, Nicholas Nethercote wrote:
    104 
    105 > I've paraphrased your description to help me understand it better, but I'm
    106 > still not quite clear on some points.  I looked at the code, but found it
    107 > hard to understand.  Could you help me?  I've written my questions in
    108 > square brackets.  Here's the description.
    109 >
    110 > --------
    111 >
    112 > Data structures:
    113 >
    114 > - have a shadow call stack for every thread
    115 > [not sure exactly what goes on this]
    116 
    117 That's the resizable array of struct _call_entry's.
    118 Probably most important for call tracking is the %ESP value
    119 directly after a CALL, and a pointer to some struct storing information
    120 about the call arc or the called function.
    121 
    122 The esp value is needed to be able to robustly unwind correctly at %esp 
    123 changes with %esp > stored esp on shadow stack.
    124 
    125 > Action at BB start -- depends on jmp_kind from previous BB:
    126 >
    127 > - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring,
    128 > JmpCond or JmpSyscall) and we transferred from one ELF object/section to
    129 > another, it must be a function call to a shared library -- treat as a
    130 > call.  This catches jmps from PLT code.
    131 >
    132 > - If this is the first BB of a function, treat as a call.  This catches
    133 > tail calls (which gcc uses for "return f()" with -O2).
    134 > [What if a function had a 'goto' back to its beginning?  Would that be
    135 > interpreted as a call?]
    136 
    137 Yes. IMHO, there is no way to distinguish between optimized tail recursion 
    138 using a jump and regular jumping. But as most functions need parameters on 
    139 the stack, a normal jump will rarely jump to the first BB of a function, 
    140 wouldn't it?
    141 
    142 > - Unwind the shadow call stack if necessary.
    143 > [when is "necessary"?  If the real %esp > the shadow stack %esp?]
    144 
    145 Yes. Currently I do this at every BB boundary, but perhaps it should be 
    146 checked at every %esp change. Then, OTOH, it would look strange to attribute 
    147 instructions of one BB to different functions?
    148 
    149 > - If this is a function return and there was no shadow stack unwinding,
    150 > this must be a RET control transfer (typically used in the runtime
    151 > linker).  Pop the shadow call stack, setting the previous BB address to
    152 > call site and override jmpkind with a CALL. By this, you get 2 function
    153 > calls from a calling site.
    154 > [I don't understand this...  What is a "RET control transfer"?  Why do
    155 > you end up with 2 function calls -- is that a bad thing?]
    156 
    157 If there is a RET instruction, this usually should unwind (i.e. leave a 
    158 function) at least one entry of the shadow call stack. But this doesn't need 
    159 to be the case, i.e. even after a RET, %esp could be lower or equal to the 
    160 one on the shadow stack. E.g. suppose
    161 
    162 	PUSH addr
    163 	RET
    164 
    165 This is only another way of saying "JMP addr", and doesn't add/remove any 
    166 stack frame at all.
    167 Now, if addr is (according to debug information) inside of another function, 
    168 this is a JMP between functions, let's say from B to C. Suppose B was called 
    169 from A, I generate a RETURN event to A and a CALL event from A to C in this 
    170 case.
    171 
    172 > - If we're treating the control transfer as a call, push new function call
    173 > from previous BB to current BB on shadow call stack.
    174 > [when is this information used?]
    175 
    176 I meant: Append a struct call_entry to the shadow stack (together with the 
    177 current %esp value). As I said before, the shadow stack is used for robust 
    178 unwinding.
    179 
    180 > - Save current BB address to be available for call to handler in next BB.
    181 >
    182 >
    183 > Other actions:
    184 >
    185 > When entering a signal handler, first push a separation marker on the
    186 > thread's shadow stack, then use it as normal.  The marker is used for
    187 > unwinding when leaving the signal handler.  This is fine as there is no
    188 > scheduling among signal handlers of one thread.
    189 >
    190 > Special care is needed at thread switches and enter/leave of signal
    191 > handlers, as we need separate shadow call stacks.
    192 > [Do you mean "separate shadow call stacks for each thread"?]
    193 
    194 Yes.
    195 
    196 > What about stack switching -- does it cope with that?  (Not that Valgrind
    197 > in general does...)
    198 
    199 No.
    200 If you could give me a hint how to do it, I would be pleased. The problem here 
    201 IMHO is: How to distinguish among a stack switch and allocating a huge array 
    202 on the stack?
    203 
    204 Josef
    205 
    206