Home | History | Annotate | Download | only in tools
      1 Demonstrations of reset-trace, for Linux bcc/BPF.
      2 
      3 
      4 You will probably never need this tool. If you kill -9 a bcc tool (plus other
      5 signals, like SIGTERM), or if a bcc tool crashes, then kernel tracing can be
      6 left in a semi-enabled state. It's not as bad as it sounds: there may just be
      7 overhead for writing to ring buffers that are never read. This tool can be
      8 used to clean up the tracing state, and reset and disable active tracing.
      9 
     10 WARNING: Make sure no other tracing sessions are active, as it will likely
     11 stop them from functioning (perhaps ungracefully).
     12 
     13 This specifically clears the state in at least the following files in
     14 /sys/kernel/debug/tracing: kprobe_events, uprobe_events, trace_pipe.
     15 Other tracing facilities (ftrace) are checked, and if not in an expected state,
     16 a note is printed. All tracing files can be reset with -F for force, but this
     17 will interfere with any other running tracing sessions (eg, ftrace).
     18 
     19 Here's an example:
     20 
     21 # ./reset-trace.sh
     22 #
     23 
     24 That's it.
     25 
     26 
     27 You can use -v to see what it does:
     28 
     29 # ./reset-trace.sh -v
     30 Reseting tracing state...
     31 
     32 Checking /sys/kernel/debug/tracing/kprobe_events
     33 Checking /sys/kernel/debug/tracing/uprobe_events
     34 Checking /sys/kernel/debug/tracing/trace
     35 Checking /sys/kernel/debug/tracing/current_tracer
     36 Checking /sys/kernel/debug/tracing/set_ftrace_filter
     37 Checking /sys/kernel/debug/tracing/set_graph_function
     38 Checking /sys/kernel/debug/tracing/set_ftrace_pid
     39 Checking /sys/kernel/debug/tracing/events/enable
     40 Checking /sys/kernel/debug/tracing/tracing_thresh
     41 Checking /sys/kernel/debug/tracing/tracing_on
     42 
     43 Done.
     44 
     45 In this example, no resetting was necessary.
     46 
     47 
     48 Here's an example of actually needing it:
     49 
     50 # ./funccount 'bash:r*'
     51 Tracing 317 functions for "bash:r*"... Hit Ctrl-C to end.
     52 ^C
     53 FUNC                                    COUNT
     54 rl_free_undo_list                           1
     55 rl_deprep_terminal                          1
     56 readline_internal_teardown                  1
     57 rl_on_new_line                              1
     58 rl_crlf                                     1
     59 rl_clear_signals                            1
     60 rl_prep_terminal                            1
     61 rl_reset_line_state                         1
     62 rl_initialize                               1
     63 rl_newline                                  1
     64 readline_internal_setup                     1
     65 rl_set_screen_size                          1
     66 readline                                    1
     67 rl_set_signals                              1
     68 rl_expand_prompt                            1
     69 replace_history_data                        1
     70 rl_set_prompt                               1
     71 rl_add_undo                                 1
     72 rl_insert_text                              2
     73 rl_insert                                   2
     74 rl_redisplay                                3
     75 rl_read_key                                 3
     76 rl_getc                                     3
     77 readline_internal_char                      3
     78 restore_parser_state                        6
     79 reap_dead_jobs                              6
     80 reset_parser                                6
     81 restore_input_line_state                    6
     82 realloc                                     7
     83 read_octal                                 10
     84 read_tty_modified                          13
     85 run_exit_trap                              13
     86 redirection_expand                         13
     87 restore_pipestatus_array                   18
     88 reader_loop                                20
     89 run_return_trap                            21
     90 remember_args                              25
     91 reset_signal_handlers                      30
     92 remove_quoted_escapes                      60
     93 run_unwind_frame                          102
     94 reset_terminating_signals                 125
     95 restore_original_signals                  139
     96 reset_internal_getopt                     405
     97 run_debug_trap                            719
     98 read_command                              940
     99 remove_quoted_nulls                      1830
    100 run_pending_traps                        3207
    101 ^C
    102 ^C
    103 ^C
    104 
    105 I've traced 317 functions using funccount, and when I hit Ctrl-C, funccount is
    106 not exiting (it can normally take many seconds, but this really looks stuck):
    107 
    108 # pidstat 1
    109 Linux 4.9.0-rc1-virtual (bgregg-xenial-bpf-i-xxx) 	10/18/2016 	_x86_64_	(8 CPU)
    110 
    111 10:00:33 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
    112 10:00:34 PM 60004      3277    0.00    0.98    0.00    0.98     0  redis-server
    113 10:00:34 PM     0     27980   87.25   10.78    0.00   98.04     3  funccount.py
    114 10:00:34 PM     0     29965    0.00    0.98    0.00    0.98     6  pidstat
    115 
    116 10:00:34 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
    117 10:00:35 PM 65534      3276    0.00    1.00    0.00    1.00     2  multilog
    118 10:00:35 PM     0     27980   77.00   23.00    0.00  100.00     3  funccount.py
    119 10:00:35 PM     0     29965    0.00    1.00    0.00    1.00     6  pidstat
    120 10:00:35 PM 60004     29990    0.00    1.00    0.00    1.00     6  catalina.sh
    121 
    122 funccount looks a lot like it's in an infinite loop (I can use a stack-sampling
    123 profiler to confirm). This is a known bug (#665) and may be fixed by the time
    124 you read this. But right now it's a good example of needing reset-trace.
    125 
    126 I'll send a SIGTERM, before resorting to a SIGKILL:
    127 
    128 # kill 27980
    129 Terminated
    130 
    131 Ok, so the process is now gone, but it did leave tracing in a semi-enabled
    132 state. Using reset-trace:
    133 
    134 # ./reset-trace.sh -v
    135 Reseting tracing state...
    136 
    137 Checking /sys/kernel/debug/tracing/kprobe_events
    138 Checking /sys/kernel/debug/tracing/uprobe_events
    139 Needed to reset /sys/kernel/debug/tracing/uprobe_events
    140 uprobe_events, before (line enumerated):
    141      1	p:uprobes/p__bin_bash_0xa2540 /bin/bash:0x00000000000a2540
    142      2	p:uprobes/p__bin_bash_0x21220 /bin/bash:0x0000000000021220
    143      3	p:uprobes/p__bin_bash_0x78530 /bin/bash:0x0000000000078530
    144      4	p:uprobes/p__bin_bash_0xa3840 /bin/bash:0x00000000000a3840
    145      5	p:uprobes/p__bin_bash_0x9c550 /bin/bash:0x000000000009c550
    146      6	p:uprobes/p__bin_bash_0x5e360 /bin/bash:0x000000000005e360
    147      7	p:uprobes/p__bin_bash_0xb2630 /bin/bash:0x00000000000b2630
    148      8	p:uprobes/p__bin_bash_0xb1e70 /bin/bash:0x00000000000b1e70
    149      9	p:uprobes/p__bin_bash_0xb2540 /bin/bash:0x00000000000b2540
    150     10	p:uprobes/p__bin_bash_0xb16e0 /bin/bash:0x00000000000b16e0
    151 [...]
    152    312	p:uprobes/p__bin_bash_0xa80b0 /bin/bash:0x00000000000a80b0
    153    313	p:uprobes/p__bin_bash_0x9e280 /bin/bash:0x000000000009e280
    154    314	p:uprobes/p__bin_bash_0x9e100 /bin/bash:0x000000000009e100
    155    315	p:uprobes/p__bin_bash_0xb2bd0 /bin/bash:0x00000000000b2bd0
    156    316	p:uprobes/p__bin_bash_0x9d9c0 /bin/bash:0x000000000009d9c0
    157    317	p:uprobes/p__bin_bash_0x4a930 /bin/bash:0x000000000004a930
    158 uprobe_events, after (line enumerated):
    159 
    160 Checking /sys/kernel/debug/tracing/trace
    161 Checking /sys/kernel/debug/tracing/current_tracer
    162 Checking /sys/kernel/debug/tracing/set_ftrace_filter
    163 Checking /sys/kernel/debug/tracing/set_graph_function
    164 Checking /sys/kernel/debug/tracing/set_ftrace_pid
    165 Checking /sys/kernel/debug/tracing/events/enable
    166 Checking /sys/kernel/debug/tracing/tracing_thresh
    167 Checking /sys/kernel/debug/tracing/tracing_on
    168 
    169 Done.
    170 
    171 Now looks clean. I did truncate the output here: there were a few hundred lines
    172 from uprobe_events.
    173 
    174 Here's the same situation, but without the verbose option:
    175 
    176 # ./reset-trace.sh
    177 Needed to reset /sys/kernel/debug/tracing/uprobe_events
    178 #
    179 
    180 And again with quiet:
    181 
    182 # ./reset-trace.sh -q
    183 #
    184 
    185 
    186 Here is an example of reset-trace detecting an unrelated tracing session:
    187 
    188 # ./reset-trace.sh 
    189 Noticed unrelated tracing file /sys/kernel/debug/tracing/set_ftrace_filter isn't set as expected. Not reseting (-F to force, -v for verbose).
    190 
    191 And verbose:
    192 
    193 # ./reset-trace.sh -v
    194 Reseting tracing state...
    195 
    196 Checking /sys/kernel/debug/tracing/kprobe_events
    197 Checking /sys/kernel/debug/tracing/uprobe_events
    198 Checking /sys/kernel/debug/tracing/trace
    199 Checking /sys/kernel/debug/tracing/current_tracer
    200 Checking /sys/kernel/debug/tracing/set_ftrace_filter
    201 Noticed unrelated tracing file /sys/kernel/debug/tracing/set_ftrace_filter isn't set as expected. Not reseting (-F to force, -v for verbose).
    202 Contents of set_ftrace_filter is (line enumerated):
    203      1	tcp_send_mss
    204      2	tcp_sendpage
    205      3	tcp_sendmsg
    206      4	tcp_send_dupack
    207      5	tcp_send_challenge_ack.isra.53
    208      6	tcp_send_rcvq
    209      7	tcp_send_ack
    210      8	tcp_send_loss_probe
    211      9	tcp_send_fin
    212     10	tcp_send_active_reset
    213     11	tcp_send_synack
    214     12	tcp_send_delayed_ack
    215     13	tcp_send_window_probe
    216     14	tcp_send_probe0
    217 Expected "".
    218 Checking /sys/kernel/debug/tracing/set_graph_function
    219 Checking /sys/kernel/debug/tracing/set_ftrace_pid
    220 Checking /sys/kernel/debug/tracing/events/enable
    221 Checking /sys/kernel/debug/tracing/tracing_thresh
    222 Checking /sys/kernel/debug/tracing/tracing_on
    223 
    224 Done.
    225 
    226 So this file is not currently used by bcc, but it may be useful to know that
    227 it's not in the default state -- something is either using it or has left it
    228 enabled. These files can be reset with -F, but that may break other tools that
    229 are currently using them.
    230 
    231 
    232 Use -h to print the USAGE message:
    233 
    234 # ./reset-trace.sh -h
    235 USAGE: reset-trace [-Fhqv]
    236                  -F             # force: reset all tracing files
    237                  -v             # verbose: print details while working
    238                  -h             # this usage message
    239                  -q             # quiet: no output
    240   eg,
    241        reset-trace              # disable semi-enabled tracing
    242