
/* Make a thread the running thread.  The thread must previously have
   been sleeping, and not holding the CPU semaphore.  This will set the
   thread state to VgTs_Runnable, and the thread will attempt to take
   the CPU semaphore.  By the time it returns, tid will be the running
   thread. */
extern void VG_(set_running) ( ThreadId tid );

/* Set a thread into a sleeping state.  Before the call, the thread
   must be runnable, and holding the CPU semaphore.  When this call
   returns, the thread will be set to the specified sleeping state,
   and will not be holding the CPU semaphore.  Note that another
   thread could be running by the time this call returns, so the
   caller must be careful not to touch any shared state.  It is also
   the caller's responsibility to actually block until the thread is
   ready to run again. */
extern void VG_(set_sleeping) ( ThreadId tid, ThreadStatus state );

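A minimal sketch of how a caller might pair these two calls around a
blocking operation (illustrative only; VgTs_WaitSys and
do_the_blocking_work() are assumed names, not necessarily the real
ones):

/* We enter holding the CPU semaphore, in state VgTs_Runnable. */
static void run_blocking_operation ( ThreadId tid )
{
   /* Give up the CPU.  After this point another thread may be
      running, so touch no shared state until we are running again. */
   VG_(set_sleeping)( tid, VgTs_WaitSys );

   do_the_blocking_work();   /* block without holding the semaphore */

   /* Reacquire the CPU semaphore; on return, tid is the running
      thread and its state is VgTs_Runnable again. */
   VG_(set_running)( tid );
}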

The master semaphore is run_sema in vg_scheduler.c.


(what happens at a fork?)

VG_(scheduler_init) registers sched_fork_cleanup as a child atfork
handler.  sched_fork_cleanup, among other things, reinitializes the
semaphore with a new pipe so the process has its own.

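The same pattern in generic POSIX terms (a sketch only: Valgrind uses
its own atfork registration and a pipe-based semaphore, not
pthread_atfork/sem_t; the names below are illustrative):

#include <pthread.h>
#include <semaphore.h>

static sem_t run_sema;

/* Runs in the child immediately after fork(); the child has exactly
   one thread, so give it a private, freshly-initialized semaphore. */
static void sched_fork_cleanup_child(void)
{
   sem_destroy(&run_sema);
   sem_init(&run_sema, /*pshared=*/0, /*value=*/1);
}

static void scheduler_init(void)
{
   sem_init(&run_sema, 0, 1);
   pthread_atfork(NULL, NULL, sched_fork_cleanup_child);
}
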
--------------------------------------------------------------------

Re:   New World signal handling
From: Jeremy Fitzhardinge <jeremy@goop.org>
To:   Julian Seward <jseward@acm.org>
Date: Mon Mar 14 09:03:51 2005

Well, the big-picture things to be clear about are:

   1. signal handlers are process-wide global state
   2. signal masks are per-thread (there's no notion of a process-wide
      signal mask)
   3. a signal can be targeted to either
         1. the whole process (any eligible thread is picked for
            delivery), or
         2. a specific thread

1 is why it is always a bug to temporarily reset a signal handler (say,
for SIGSEGV), because if any other thread happens to be sent one in that
window it will cause havoc (I think there's still one instance of this
in the symtab stuff).
2 is the meat of your questions; more below.
3 is responsible for some of the nitty-gritty detail in the signal
stuff, so it's worth bearing in mind to understand it all.  (Note that
even if a signal is targeting the whole process, it's only ever
delivered to one particular thread; there's no such thing as a
broadcast signal.)

While a thread is running core code or generated code, it has almost
all its signals blocked (all but the fault signals: SEGV, BUS, ILL, etc).

Every N basic blocks, each thread calls VG_(poll_signals) to see what
signals are pending for it.  poll_signals grabs the next pending signal
which the client signal mask doesn't block, and sets it up for delivery;
it uses the sigtimedwait() syscall to fetch blocked pending signals
rather than have them delivered to a signal handler.  This means that
we avoid the complexity of having signals delivered asynchronously via
the signal handlers; we can just poll for them synchronously when
they're easy to deal with.
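
A minimal sketch of that synchronous-polling idea (a standalone
illustration, not the real VG_(poll_signals)):

#define _GNU_SOURCE
#include <signal.h>
#include <errno.h>

/* Poll once, without blocking, for any pending signal that the
   client's mask does not block.  Returns the signal number, or 0 if
   nothing suitable is pending. */
static int poll_one_signal(const sigset_t *client_blocked)
{
   sigset_t wanted;
   siginfo_t info;
   struct timespec zero = { 0, 0 };
   int sig;

   /* Wait for everything the client has NOT blocked. */
   sigfillset(&wanted);
   for (sig = 1; sig < NSIG; sig++)
      if (sigismember(client_blocked, sig))
         sigdelset(&wanted, sig);

   /* A zero timeout makes sigtimedwait dequeue a pending signal
      synchronously rather than running a handler for it. */
   sig = sigtimedwait(&wanted, &info, &zero);
   if (sig < 0)
      return 0;   /* EAGAIN: nothing pending (or EINTR) */
   return sig;
}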

Fault signals, being caused by a specific instruction, are the exception
because they can't be held off; if they're blocked when an instruction
raises one, the kernel will just summarily kill the process.  Therefore,
they must always be unblocked, and the signal handler is called when an
instruction raises one of these exceptions.  (It's also necessary to
call poll_signals after any syscall which may raise a signal, since
signal-raising syscalls are considered to be synchronous with respect to
their signal; i.e., calling kill(getpid(), SIGUSR1) will call the handler
for SIGUSR1 before kill is seen to complete.)

The one time when the thread's real signal mask actually matches the
client's requested signal mask is while running a blocking syscall.  We
have to set things up to accept signals during a syscall so that we get
the right signal-interrupts-syscall semantics.  The tricky part about
this is that there's no general atomic
set-signal-mask-and-block-in-syscall mechanism, so we need to fake it
with the stuff in VGA_(_client_syscall)/VGA_(interrupted_syscall).
These two basically form an explicit state machine, where the state
variable is the instruction pointer, which allows it to determine what
point the syscall got to when the async signal happens.  By keeping the
window where signals are actually unblocked very narrow, the number of
possible states is pretty small.
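
An illustration of why that window needs a state machine (generic C, not
Valgrind's code; the handler can only tell the numbered points apart by
inspecting the interrupted instruction pointer):

#include <signal.h>
#include <unistd.h>

static void do_client_syscall(const sigset_t *client_mask)
{
   sigset_t saved;
   char buf[1];

   sigprocmask(SIG_SETMASK, client_mask, &saved); /* 1: client mask live  */
   /* 2: a signal here means the syscall has not started - restart it     */
   (void)read(0, buf, 1);                         /* 3: blocked in kernel */
   /* 4: a signal here means the syscall completed - keep its result      */
   sigprocmask(SIG_SETMASK, &saved, NULL);        /* 5: our mask restored */
}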

This is all quite nice because the kernel does almost all the work of
determining which thread should get a signal, what the correct action is
for a syscall when it has been interrupted, etc.  Particularly nice is
that we don't need to worry about all the queuing semantics, and the
per-signal special cases (which is, roughly, signals 1-32 are not queued
except when they are, and signals 33-64 are queued except when they aren't).
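
A rough demonstration of that queuing difference (a standalone program,
nothing to do with Valgrind itself): a classic signal pends at most once
while blocked, whereas a realtime signal queues each send.

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
   sigset_t set;
   siginfo_t info;
   struct timespec zero = { 0, 0 };
   union sigval val = { .sival_int = 0 };
   int i, sig, classic = 0, rt = 0;

   sigemptyset(&set);
   sigaddset(&set, SIGUSR1);
   sigaddset(&set, SIGRTMIN);
   sigprocmask(SIG_BLOCK, &set, NULL);   /* hold both signals pending */

   for (i = 0; i < 3; i++) {
      kill(getpid(), SIGUSR1);           /* classic: coalesces to one  */
      sigqueue(getpid(), SIGRTMIN, val); /* realtime: each send queues */
   }

   while ((sig = sigtimedwait(&set, &info, &zero)) > 0) {
      if (sig == SIGUSR1) classic++; else rt++;
   }
   /* Typically prints: SIGUSR1 x1, SIGRTMIN x3 */
   printf("SIGUSR1 x%d, SIGRTMIN x%d\n", classic, rt);
   return 0;
}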

BUT, there's another complexity: because the Unix signal mechanism has
been overloaded to deal with two separate kinds of events (asynchronous
signals raised by kill(), and synchronous faults raised by an
instruction), we can't block a signal for one form and not the other.
That is, because we have to leave SIGSEGV unblocked for faulting
instructions, it also leaves us open to getting an async SIGSEGV sent
with kill(pid, SIGSEGV).

To handle this, there's a small per-thread signal queue (I'm using tid
0's queue for "signals sent to the whole process" - a hack, I'll
admit).  If an async SIGSEGV (etc) signal appears, then it is pushed
onto the appropriate queue.  VG_(poll_signals) also checks these queues
for pending signals to decide what signal to deliver next.  These
queues are only manipulated with *all* signals blocked, so there's no
risk of two concurrent async signal handlers modifying the queues at
once.  Also, because the likelihood of actually being sent an async
SIGSEGV is pretty low, the queues are only allocated on demand.
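
A sketch of that queue discipline (names and sizes are made up for
illustration; also, inside Valgrind this runs under its own allocator
with every signal blocked - in an ordinary program calloc would not be
async-signal-safe):

#include <signal.h>
#include <stdlib.h>

#define SIGQUEUE_MAX 8
#define N_THREADS    256

typedef struct {
   siginfo_t sigs[SIGQUEUE_MAX];
   int       count;
} SigQueue;

static SigQueue *sigqueues[N_THREADS];   /* NULL until first needed */

static void queue_async_fault(int tid, const siginfo_t *si)
{
   sigset_t all, saved;

   sigfillset(&all);
   sigprocmask(SIG_BLOCK, &all, &saved); /* no other handler can run now */

   if (sigqueues[tid] == NULL)           /* allocated on demand */
      sigqueues[tid] = calloc(1, sizeof(SigQueue));

   if (sigqueues[tid] != NULL && sigqueues[tid]->count < SIGQUEUE_MAX)
      sigqueues[tid]->sigs[sigqueues[tid]->count++] = *si;

   sigprocmask(SIG_SETMASK, &saved, NULL);
}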



There are two mechanisms to prevent disaster if multiple threads get
signals concurrently.  One is that a signal handler is set up to block a
set of signals while the signal is being delivered.  Valgrind's handlers
block all signals, so there's no risk of a new signal being delivered to
the same thread until the old handler has finished.
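
A sketch of that first mechanism (illustrative, not Valgrind's actual
handler-installation code): fill sa_mask so nothing can interrupt the
handler while it runs.

#include <signal.h>

static void async_signalhandler(int sig, siginfo_t *si, void *uc)
{
   /* ... queue the signal or set it up for delivery ... */
   (void)sig; (void)si; (void)uc;
}

static void install_handler(int sig)
{
   struct sigaction sa;

   sa.sa_sigaction = async_signalhandler;
   sa.sa_flags     = SA_SIGINFO;
   sigfillset(&sa.sa_mask);    /* block ALL signals while the handler runs */
   sigaction(sig, &sa, NULL);
}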

The other is that if the thread which receives the signal is not running
(i.e., doesn't hold the run_sema, which implies it must be waiting for a
syscall to complete), then the signal handler will grab the run_sema
before making any global state changes.  Since the only time we can
actually receive an async signal is during a blocking syscall, this
should be all the time.  (And since synchronous signals are always the
result of running an instruction, we should already be holding run_sema.)


Valgrind will occasionally generate signals for itself.  These are
always synchronous faults resulting from an instruction fetch or from
something an instruction did.  The two mechanisms are the synth_fault_*
functions, which are used to signal a problem while fetching an
instruction, and getting generated code to call a helper which contains
a fault-raising instruction (used to deal with illegal/unimplemented
instructions and for instructions whose only job is to raise exceptions).

That all explains how signals come in, but the second part is how they
get delivered.

The main function for this is VG_(deliver_signal).  There are three cases:

   1. the process is ignoring the signal (SIG_IGN)
   2. the process is using the default handler (SIG_DFL)
   3. the process has a handler for the signal

In general, VG_(deliver_signal) shouldn't be called for ignored signals;
if it has been called, it assumes the ignore is being overridden (if an
instruction gets a SEGV etc, SIG_IGN is ignored and treated as SIG_DFL).

VG_(deliver_signal) handles the default handler case, and the
client-specified signal handler case.

The default handler case is relatively easy: the signal's default action
is either Terminate, or Ignore.  We can ignore Ignore.

Terminate always kills the entire process; there's no such thing as a
thread-specific signal death.  Terminate comes in two forms: with
coredump, or without.  vg_default_action() will write a core file, and
then will tell all the threads to start terminating; it then longjmps
back to the current thread's scheduler loop.  The scheduler loop will
terminate immediately, and the master_tid thread will wait for all the
others to exit before shutting down the process (this is the same
mechanism as exit_group).

Delivering a signal to a client-side handler modifies the thread state
so that there's a signal frame on the stack, and the instruction pointer
is pointing to the handler.  The fiddly bit is that there are two
completely different signal frame formats: old and RT.  While in theory
the exact shape of these frames on the stack is abstracted, there are
real programs which know exactly where various parts of the structures
are on the stack (most notably, g++'s exception throwing code), which is
why there have to be two separate pieces of code, one for each frame
format.  Another tricky case is dealing with the client stack running
out/overflowing while setting up the signal frame.
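
The client-side difference that forces the two formats, in sketch form
(generic POSIX code, not Valgrind's): whether the handler was installed
with SA_SIGINFO determines whether it expects an old-style frame or an
RT frame carrying siginfo_t and ucontext_t.

#include <signal.h>

static void old_style_handler(int sig)
{
   (void)sig;                        /* old frame: just the signal number */
}

static void rt_style_handler(int sig, siginfo_t *si, void *ucontext)
{
   (void)sig; (void)si; (void)ucontext;  /* RT frame: extra info expected */
}

static void install_both(void)
{
   struct sigaction old_sa, rt_sa;

   old_sa.sa_handler = old_style_handler;
   old_sa.sa_flags   = 0;                 /* no SA_SIGINFO: old frame */
   sigemptyset(&old_sa.sa_mask);
   sigaction(SIGUSR1, &old_sa, NULL);

   rt_sa.sa_sigaction = rt_style_handler;
   rt_sa.sa_flags     = SA_SIGINFO;       /* SA_SIGINFO: RT frame */
   sigemptyset(&rt_sa.sa_mask);
   sigaction(SIGUSR2, &rt_sa, NULL);
}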

Signal return is also interesting.  There are two syscalls, sigreturn
and rt_sigreturn, which a signal handler will use to resume execution.
The client will call the right one for the frame it was passed, so the
core doesn't need to track that state.  The tricky part is moving the
frame's register state back into the thread's state, particularly all
the FPU state reformatting gunk.  Also, *sigreturn checks for new
pending signals after the old frame has been cleaned up, since there's a
requirement that all deliverable pending signals are delivered before
the mainline code makes progress.  This means that a program could
live-lock on signals, but that's what would happen running natively...

Another thing to watch for: programs which unwind the stack (like gdb,
or exception throwers) recognize the existence of a signal frame by
looking at the code the return address points to: if it is one of the
two specific signal return sequences, it knows it's a signal frame.
That's why the signal handler return address must point to a very
specific set of instructions.


What else.  Ah, the two internal signals.

SIGVGKILL is pretty straightforward: it's just used to dislodge a thread
from being blocked in a syscall, so that we can get the thread to
terminate in a timely fashion.

SIGVGCHLD is used by a thread to tell the master_tid that it has
exited.  However, the only time the master_tid cares about this is when
it has already exited, and it's waiting for everyone else to exit.  If
the master_tid hasn't exited, then this signal is ignored.  It isn't
enough to simply block it, because that would cause a pile of queued
SIGVGCHLDs to build up, eventually clogging the kernel's signal delivery
mechanism.  If it's unblocked and ignored, it doesn't interrupt syscalls
and it doesn't accumulate.
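
The unblocked-and-ignored idea in generic POSIX terms (a sketch, not
Valgrind's code): an ignored-but-unblocked signal is discarded on
arrival, so it neither interrupts syscalls nor piles up as pending.

#include <signal.h>

static void ignore_child_notifications(int sig)
{
   struct sigaction sa;

   sa.sa_handler = SIG_IGN;       /* discarded on arrival: nothing queues */
   sa.sa_flags   = 0;
   sigemptyset(&sa.sa_mask);
   sigaction(sig, &sa, NULL);

   /* Deliberately NOT blocked with sigprocmask: blocking would leave
      each send pending (or queued, for a realtime signal) instead. */
}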


I hope that helps clarify things.  And explain why there's so much stuff
in there: it's tracking a very complex and arcane underlying set of
machinery.

    J

--------------------------------------------------------------------

>I've been seeing references to 'master thread' around the place.
>What distinguishes the master thread from the rest?  Where does
>the requirement to have a master thread come from?
>
It used to be tid 1, but I had to generalize it.

The master_tid isn't very special; its main job is at process shutdown.
It waits for all the other threads to exit, and then produces all the
final reports.  Until it exits, it's just a normal thread, with no other
responsibilities.

The alternative to having a master thread would be to make whichever
thread exits last be responsible for emitting all the output.  That
would work, but it would make the results a bit asynchronous (that is,
if the main thread exits and the others hang around for a while, anyone
waiting on the process would see it as having exited, but no results
would have been produced).

VG_(master_tid) is a variable to handle the case where a threaded program
forks.  In the first process, the master_tid will be 1.  If that program
creates a few threads, and then, say, thread 3 forks, the child process
will have a single thread in it.  In the child, master_tid will be 3.
It was easier to make the master thread a variable than to try to work
out how to rename thread 3 to 1 after a fork.

    J

--------------------------------------------------------------------

Re:   Fwd: Documentation of kernel's signal routing ?
From: David Woodhouse <...>
To:   Julian Seward <jseward@acm.org>

> Regarding sys_clone created threads.  I have a vague idea that
> there is a notion of 'thread group'.  I further understand that if
> one thread in a group calls sys_exit_group then all threads in that
> group exit.  Whereas if a thread calls sys_exit then just that
> thread exits.
>
> I'm pretty hazy on this:

Hmm, so am I :)

> * Is the above correct?

Yes, I believe so.

> * How is thread-group membership defined/changed?

By specifying CLONE_THREAD in the flags to clone(), you remain part of
the same thread group as the parent.  In a single-threaded process, the
thread group id (tgid) is the same as the pid.

Linux just has tasks, which sometimes happen to share VM -- and now with
NPTL we also share other stuff like signals, etc.  The 'pid' in Linux is
what POSIX would call the 'thread id', and the 'tgid' in Linux is
equivalent to the POSIX 'pid'.
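
A quick standalone illustration of that naming (not Valgrind code; uses
the raw gettid syscall): in the main thread the Linux tid equals the
tgid reported by getpid(), while a second thread gets its own tid but
shares the tgid.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <pthread.h>

static void *worker(void *arg)
{
   (void)arg;
   printf("worker: getpid()=%d  tid=%ld\n",
          (int)getpid(), (long)syscall(SYS_gettid));
   return NULL;
}

int main(void)
{
   pthread_t t;
   printf("main:   getpid()=%d  tid=%ld\n",
          (int)getpid(), (long)syscall(SYS_gettid));
   pthread_create(&t, NULL, worker, NULL);
   pthread_join(t, NULL);
   return 0;
}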

> * Do you know offhand how LinuxThreads and NPTL use thread groups?

I believe that LT doesn't use the kernel's concept of thread groups at
all.  LT predates the kernel's support for proper POSIX-like sharing of
anything much but memory, so it uses only the CLONE_VM (and possibly
CLONE_FILES) flags.  I don't _think_ it uses CLONE_SIGHAND -- it does
most of its work by propagating signals manually between threads.

NPTL uses thread groups as generated by the CLONE_THREAD flag, which is
what invokes the POSIX-related thread semantics.

>   Is it the case that each LinuxThreads thread is in its own
>   group whereas all NPTL threads [in a process] are in a single
>   group?

Yes, that's my understanding.

--
dwmw2