Home | History | Annotate | Download | only in tools
      1 Demonstrations of deadlock_detector.
      2 
      3 This program detects potential deadlocks on a running process. The program
      4 attaches uprobes on `pthread_mutex_lock` and `pthread_mutex_unlock` to build
      5 a mutex wait directed graph, and then looks for a cycle in this graph. This
      6 graph has the following properties:
      7 
      8 - Nodes in the graph represent mutexes.
      9 - Edge (A, B) exists if there exists some thread T where lock(A) was called
     10   and lock(B) was called before unlock(A) was called.
     11 
     12 If there is a cycle in this graph, this indicates that there is a lock order
     13 inversion (potential deadlock). If the program finds a lock order inversion, the
     14 program will dump the cycle of mutexes, dump the stack traces where each mutex
     15 was acquired, and then exit.
     16 
     17 This program can only find potential deadlocks that occur while the program
     18 is tracing the process. It cannot find deadlocks that may have occurred
     19 before the program was attached to the process.
     20 
     21 Since this traces all mutex lock and unlock events and all thread creation
     22 events on the traced process, the overhead of this bpf program can be very
     23 high if the process has many threads and mutexes. You should only run this on
     24 a process where the slowdown is acceptable.
     25 
     26 Note: This tool does not work for shared mutexes or recursive mutexes.
     27 
     28 For shared (read-write) mutexes, a deadlock requires a cycle in the wait
     29 graph where at least one of the mutexes in the cycle is acquiring exclusive
     30 (write) ownership.
     31 
     32 For recursive mutexes, lock() is called multiple times on the same mutex.
     33 However, there is no way to determine if a mutex is a recursive mutex
     34 after the mutex has been created. As a result, this tool will not find
     35 potential deadlocks that involve only one mutex.
     36 
     37 
     38 # ./deadlock_detector.py 181
     39 Tracing... Hit Ctrl-C to end.
     40 ----------------
     41 Potential Deadlock Detected!
     42 
     43 Cycle in lock order graph: Mutex M0 (main::static_mutex3 0x0000000000473c60) => Mutex M1 (0x00007fff6d738400) => Mutex M2 (global_mutex1 0x0000000000473be0) => Mutex M3 (global_mutex2 0x0000000000473c20) => Mutex M0 (main::static_mutex3 0x0000000000473c60)
     44 
     45 Mutex M1 (0x00007fff6d738400) acquired here while holding Mutex M0 (main::static_mutex3 0x0000000000473c60) in Thread 357250 (lockinversion):
     46 @ 00000000004024d0 pthread_mutex_lock
     47 @ 0000000000406dd0 std::mutex::lock()
     48 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
     49 @ 0000000000402e38 main::{lambda()#3}::operator()() const
     50 @ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>)
     51 @ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()()
     52 @ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run()
     53 @ 00007fd4496564e1 execute_native_thread_routine
     54 @ 00007fd449dd57f1 start_thread
     55 @ 00007fd44909746d __clone
     56 
     57 Mutex M0 (main::static_mutex3 0x0000000000473c60) previously acquired by the same Thread 357250 (lockinversion) here:
     58 @ 00000000004024d0 pthread_mutex_lock
     59 @ 0000000000406dd0 std::mutex::lock()
     60 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
     61 @ 0000000000402e22 main::{lambda()#3}::operator()() const
     62 @ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>)
     63 @ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()()
     64 @ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run()
     65 @ 00007fd4496564e1 execute_native_thread_routine
     66 @ 00007fd449dd57f1 start_thread
     67 @ 00007fd44909746d __clone
     68 
     69 Mutex M2 (global_mutex1 0x0000000000473be0) acquired here while holding Mutex M1 (0x00007fff6d738400) in Thread 357251 (lockinversion):
     70 @ 00000000004024d0 pthread_mutex_lock
     71 @ 0000000000406dd0 std::mutex::lock()
     72 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
     73 @ 0000000000402ea8 main::{lambda()#4}::operator()() const
     74 @ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>)
     75 @ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()()
     76 @ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run()
     77 @ 00007fd4496564e1 execute_native_thread_routine
     78 @ 00007fd449dd57f1 start_thread
     79 @ 00007fd44909746d __clone
     80 
     81 Mutex M1 (0x00007fff6d738400) previously acquired by the same Thread 357251 (lockinversion) here:
     82 @ 00000000004024d0 pthread_mutex_lock
     83 @ 0000000000406dd0 std::mutex::lock()
     84 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
     85 @ 0000000000402e97 main::{lambda()#4}::operator()() const
     86 @ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>)
     87 @ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()()
     88 @ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run()
     89 @ 00007fd4496564e1 execute_native_thread_routine
     90 @ 00007fd449dd57f1 start_thread
     91 @ 00007fd44909746d __clone
     92 
     93 Mutex M3 (global_mutex2 0x0000000000473c20) acquired here while holding Mutex M2 (global_mutex1 0x0000000000473be0) in Thread 357247 (lockinversion):
     94 @ 00000000004024d0 pthread_mutex_lock
     95 @ 0000000000406dd0 std::mutex::lock()
     96 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
     97 @ 0000000000402d5f main::{lambda()#1}::operator()() const
     98 @ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)
     99 @ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()()
    100 @ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run()
    101 @ 00007fd4496564e1 execute_native_thread_routine
    102 @ 00007fd449dd57f1 start_thread
    103 @ 00007fd44909746d __clone
    104 
    105 Mutex M2 (global_mutex1 0x0000000000473be0) previously acquired by the same Thread 357247 (lockinversion) here:
    106 @ 00000000004024d0 pthread_mutex_lock
    107 @ 0000000000406dd0 std::mutex::lock()
    108 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
    109 @ 0000000000402d4e main::{lambda()#1}::operator()() const
    110 @ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)
    111 @ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()()
    112 @ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run()
    113 @ 00007fd4496564e1 execute_native_thread_routine
    114 @ 00007fd449dd57f1 start_thread
    115 @ 00007fd44909746d __clone
    116 
    117 Mutex M0 (main::static_mutex3 0x0000000000473c60) acquired here while holding Mutex M3 (global_mutex2 0x0000000000473c20) in Thread 357248 (lockinversion):
    118 @ 00000000004024d0 pthread_mutex_lock
    119 @ 0000000000406dd0 std::mutex::lock()
    120 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
    121 @ 0000000000402dc9 main::{lambda()#2}::operator()() const
    122 @ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>)
    123 @ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()()
    124 @ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run()
    125 @ 00007fd4496564e1 execute_native_thread_routine
    126 @ 00007fd449dd57f1 start_thread
    127 @ 00007fd44909746d __clone
    128 
    129 Mutex M3 (global_mutex2 0x0000000000473c20) previously acquired by the same Thread 357248 (lockinversion) here:
    130 @ 00000000004024d0 pthread_mutex_lock
    131 @ 0000000000406dd0 std::mutex::lock()
    132 @ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
    133 @ 0000000000402db8 main::{lambda()#2}::operator()() const
    134 @ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>)
    135 @ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()()
    136 @ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run()
    137 @ 00007fd4496564e1 execute_native_thread_routine
    138 @ 00007fd449dd57f1 start_thread
    139 @ 00007fd44909746d __clone
    140 
    141 Thread 357248 created by Thread 350692 (lockinversion) here:
    142 @ 00007fd449097431 __clone
    143 @ 00007fd449dd5ef5 pthread_create
    144 @ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
    145 @ 00000000004033ac std::thread::thread<main::{lambda()#2}>(main::{lambda()#2}&&)
    146 @ 000000000040308f main
    147 @ 00007fd448faa0f6 __libc_start_main
    148 @ 0000000000402ad8 [unknown]
    149 
    150 Thread 357250 created by Thread 350692 (lockinversion) here:
    151 @ 00007fd449097431 __clone
    152 @ 00007fd449dd5ef5 pthread_create
    153 @ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
    154 @ 00000000004034b2 std::thread::thread<main::{lambda()#3}>(main::{lambda()#3}&&)
    155 @ 00000000004030b9 main
    156 @ 00007fd448faa0f6 __libc_start_main
    157 @ 0000000000402ad8 [unknown]
    158 
    159 Thread 357251 created by Thread 350692 (lockinversion) here:
    160 @ 00007fd449097431 __clone
    161 @ 00007fd449dd5ef5 pthread_create
    162 @ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
    163 @ 00000000004035b8 std::thread::thread<main::{lambda()#4}>(main::{lambda()#4}&&)
    164 @ 00000000004030e6 main
    165 @ 00007fd448faa0f6 __libc_start_main
    166 @ 0000000000402ad8 [unknown]
    167 
    168 Thread 357247 created by Thread 350692 (lockinversion) here:
    169 @ 00007fd449097431 __clone
    170 @ 00007fd449dd5ef5 pthread_create
    171 @ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
    172 @ 00000000004032a6 std::thread::thread<main::{lambda()#1}>(main::{lambda()#1}&&)
    173 @ 0000000000403070 main
    174 @ 00007fd448faa0f6 __libc_start_main
    175 @ 0000000000402ad8 [unknown]
    176 
    177 This is output from a process that has a potential deadlock involving 4 mutexes
    178 and 4 threads:
    179 
    180 - Thread 357250 acquired M1 while holding M0 (edge M0 -> M1)
    181 - Thread 357251 acquired M2 while holding M1 (edge M1 -> M2)
    182 - Thread 357247 acquired M3 while holding M2 (edge M2 -> M3)
    183 - Thread 357248 acquired M0 while holding M3 (edge M3 -> M0)
    184 
    185 This is the C++ program that generated the output above:
    186 
    187 ```c++
    188 #include <chrono>
    189 #include <iostream>
    190 #include <mutex>
    191 #include <thread>
    192 
    193 std::mutex global_mutex1;
    194 std::mutex global_mutex2;
    195 
    196 int main(void) {
    197   static std::mutex static_mutex3;
    198   std::mutex local_mutex4;
    199 
    200   std::cout << "sleeping for a bit to allow trace to attach..." << std::endl;
    201   std::this_thread::sleep_for(std::chrono::seconds(10));
    202   std::cout << "starting program..." << std::endl;
    203 
    204   auto t1 = std::thread([] {
    205     std::lock_guard<std::mutex> g1(global_mutex1);
    206     std::lock_guard<std::mutex> g2(global_mutex2);
    207   });
    208   t1.join();
    209 
    210   auto t2 = std::thread([] {
    211     std::lock_guard<std::mutex> g2(global_mutex2);
    212     std::lock_guard<std::mutex> g3(static_mutex3);
    213   });
    214   t2.join();
    215 
    216   auto t3 = std::thread([&local_mutex4] {
    217     std::lock_guard<std::mutex> g3(static_mutex3);
    218     std::lock_guard<std::mutex> g4(local_mutex4);
    219   });
    220   t3.join();
    221 
    222   auto t4 = std::thread([&local_mutex4] {
    223     std::lock_guard<std::mutex> g4(local_mutex4);
    224     std::lock_guard<std::mutex> g1(global_mutex1);
    225   });
    226   t4.join();
    227 
    228   std::cout << "sleeping to allow trace to collect data..." << std::endl;
    229   std::this_thread::sleep_for(std::chrono::seconds(5));
    230   std::cout << "done!" << std::endl;
    231 }
    232 ```
    233 
    234 Note that an actual deadlock did not occur, although this mutex lock ordering
    235 creates the possibility of a deadlock, and this is a hint to the programmer to
    236 reconsider the lock ordering. If the mutexes are global or static and debug
    237 symbols are enabled, the output will contain the mutex symbol name. The output
    238 uses a similar format as ThreadSanitizer
    239 (https://github.com/google/sanitizers/wiki/ThreadSanitizerDeadlockDetector).
    240 
    241 
    242 # ./deadlock_detector.py 181 --binary /usr/local/bin/lockinversion
    243 
    244 Tracing... Hit Ctrl-C to end.
    245 ^C
    246 
    247 If the traced process is instantiated from a statically-linked executable,
    248 this argument is optional, and the program will determine the path of the
    249 executable from the pid. However, on older kernels without this patch
    250 ("uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET",
    251 https://lkml.org/lkml/2017/1/13/585), binaries that contain `:` in the path
    252 cannot be attached with uprobes. As a workaround, we can create a symlink
    253 to the binary, and provide the symlink name instead to the `--binary` option.
    254 
    255 
    256 # ./deadlock_detector.py 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0
    257 
    258 Tracing... Hit Ctrl-C to end.
    259 ^C
    260 
    261 If the traced process is instantiated from a dynamically-linked executable,
    262 this argument is required and needs to be the path to the pthread shared
    263 library used by the executable.
    264 
    265 
    266 # ./deadlock_detector.py 181 --dump-graph graph.json --verbose
    267 
    268 Tracing... Hit Ctrl-C to end.
    269 Mutexes: 0, Edges: 0
    270 Mutexes: 532, Edges: 411
    271 Mutexes: 735, Edges: 675
    272 Mutexes: 1118, Edges: 1278
    273 Mutexes: 1666, Edges: 2185
    274 Mutexes: 2056, Edges: 2694
    275 Mutexes: 2245, Edges: 2906
    276 Mutexes: 2656, Edges: 3479
    277 Mutexes: 2813, Edges: 3785
    278 ^C
    279 
    280 If the program does not find a deadlock, it will keep running until you hit
    281 Ctrl-C. If you pass the `--verbose` flag, the program will also dump statistics
    282 about the number of mutexes and edges in the mutex wait graph. If you want to
    283 serialize the graph to analyze it later, you can pass the `--dump-graph FILE`
    284 flag, and the program will serialize the graph in json.
    285 
    286 
    287 # ./deadlock_detector.py 181 --lock-symbols custom_mutex1_lock,custom_mutex2_lock --unlock_symbols custom_mutex1_unlock,custom_mutex2_unlock --verbose
    288 
    289 Tracing... Hit Ctrl-C to end.
    290 Mutexes: 0, Edges: 0
    291 Mutexes: 532, Edges: 411
    292 Mutexes: 735, Edges: 675
    293 Mutexes: 1118, Edges: 1278
    294 Mutexes: 1666, Edges: 2185
    295 Mutexes: 2056, Edges: 2694
    296 Mutexes: 2245, Edges: 2906
    297 Mutexes: 2656, Edges: 3479
    298 Mutexes: 2813, Edges: 3785
    299 ^C
    300 
    301 If your program is using custom mutexes and not pthread mutexes, you can use
    302 the `--lock-symbols` and `--unlock-symbols` flags to specify different mutex
    303 symbols to trace. The flags take a comma-separated string of symbol names.
    304 Note that if the symbols are inlined in the binary, then this program can result
    305 in false positives.
    306 
    307 
    308 USAGE message:
    309 
    310 # ./deadlock_detector.py -h
    311 
    312 usage: deadlock_detector.py [-h] [--binary BINARY] [--dump-graph DUMP_GRAPH]
    313                             [--verbose] [--lock-symbols LOCK_SYMBOLS]
    314                             [--unlock-symbols UNLOCK_SYMBOLS]
    315                             pid
    316 
    317 Detect potential deadlocks (lock inversions) in a running binary.
    318 Must be run as root.
    319 
    320 positional arguments:
    321   pid                   Pid to trace
    322 
    323 optional arguments:
    324   -h, --help            show this help message and exit
    325   --binary BINARY       If set, trace the mutexes from the binary at this
    326                         path. For statically-linked binaries, this argument is
    327                         not required. For dynamically-linked binaries, this
    328                         argument is required and should be the path of the
    329                         pthread library the binary is using. Example:
    330                         /lib/x86_64-linux-gnu/libpthread.so.0
    331   --dump-graph DUMP_GRAPH
    332                         If set, this will dump the mutex graph to the
    333                         specified file.
    334   --verbose             Print statistics about the mutex wait graph.
    335   --lock-symbols LOCK_SYMBOLS
    336                         Comma-separated list of lock symbols to trace. Default
    337                         is pthread_mutex_lock. These symbols cannot be inlined
    338                         in the binary.
    339   --unlock-symbols UNLOCK_SYMBOLS
    340                         Comma-separated list of unlock symbols to trace.
    341                         Default is pthread_mutex_unlock. These symbols cannot
    342                         be inlined in the binary.
    343 
    344 Examples:
    345     deadlock_detector 181        # Analyze PID 181
    346 
    347     deadlock_detector 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0
    348                                  # Analyze PID 181 and locks from this binary.
    349                                  # If tracing a process that is running from
    350                                  # a dynamically-linked binary, this argument
    351                                  # is required and should be the path to the
    352                                  # pthread library.
    353 
    354     deadlock_detector 181 --verbose
    355                                  # Analyze PID 181 and print statistics about
    356                                  # the mutex wait graph.
    357 
    358     deadlock_detector 181 --lock-symbols my_mutex_lock1,my_mutex_lock2 \
    359         --unlock-symbols my_mutex_unlock1,my_mutex_unlock2
    360                                  # Analyze PID 181 and trace custom mutex
    361                                  # symbols instead of pthread mutexes.
    362 
    363     deadlock_detector 181 --dump-graph graph.json
    364                                  # Analyze PID 181 and dump the mutex wait
    365                                  # graph to graph.json.
    366