<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link href="style.css" rel="stylesheet" type="text/css" />
<title>LLDB Example - Python Scripting to Debug a Problem</title>
</head>

<body>
<div class="www_title">
Example - Using Scripting and Python to Debug in LLDB
</div>

<div id="container">
<div id="content">
<!--#include virtual="sidebar.incl"-->
<div id="middle">
<div class="post">
<h1 class ="postheader">Introduction</h1>
<div class="postcontent">

<p>LLDB has been structured from the beginning to be scriptable in two ways:
a Unix Python session can initiate and run a debug session non-interactively
using LLDB, and within the LLDB debugger tool, Python scripts can be used to
help with many tasks, including inspecting program data, iterating over
containers, and determining whether a breakpoint should stop execution or
continue. This document shows how to do some of these things by working through
an example: using Python scripting to find a bug in a program that searches for
text in a large binary tree.</p>

</div>
<div class="postfooter"></div>

<div class="post">
<h1 class ="postheader">The Test Program and Input</h1>
<div class="postcontent">

<p>We have a simple C program (dictionary.c) that reads in a text file, and
stores all the words from the file in a Binary Search Tree, sorted
alphabetically.
It then enters a loop that prompts the user for a word, searches
for the word in the tree (using Binary Search), and reports to the user
whether or not it found the word in the tree.</p>

<p>The input text file we are using to test our program contains the text of
William Shakespeare's famous tragedy "Romeo and Juliet".</p>

</div>
<div class="postfooter"></div>

<div class="post">
<h1 class ="postheader">The Bug</h1>
<div class="postcontent">

<p>When we try running our program, we find there is a problem. While it
successfully finds some of the words we would expect to find, such as "love"
or "sun", it fails to find the word "Romeo", which MUST be in the input text
file:</p>

<code color=#ff0000>
% ./dictionary Romeo-and-Juliet.txt<br>
Dictionary loaded.<br>
Enter search word: love<br>
Yes!<br>
Enter search word: sun<br>
Yes!<br>
Enter search word: Romeo<br>
No!<br>
Enter search word: ^D<br>
%<br>
</code>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader">Is the word in our tree: Using Depth First Search</h1>
<div class="postcontent">

<p>Our first job is to determine whether the word "Romeo" actually got inserted
into the tree. Since "Romeo and Juliet" has thousands of words, trying to
examine our binary search tree by hand is completely impractical. Therefore we
will write a Python script to search the tree for us. We will write a recursive
Depth First Search function that traverses the tree searching for a word while
maintaining information about the path from the root of the tree to the
current node. If it finds the word in the tree, it returns the path from the
root to the node containing the word.
This is what our DFS function in Python
would look like, with line numbers added for easy reference in later
explanations:</p>

<code>
<pre><tt>
 1: def DFS (root, word, cur_path):
 2:     root_word_ptr = root.GetChildMemberWithName ("word")
 3:     left_child_ptr = root.GetChildMemberWithName ("left")
 4:     right_child_ptr = root.GetChildMemberWithName ("right")
 5:     root_word = root_word_ptr.GetSummary()
 6:     end = len (root_word) - 1
 7:     if root_word[0] == '"' and root_word[end] == '"':
 8:         root_word = root_word[1:end]
 9:     end = len (root_word) - 1
10:     if root_word[0] == '\'' and root_word[end] == '\'':
11:        root_word = root_word[1:end]
12:     if root_word == word:
13:         return cur_path
14:     elif word &lt; root_word:
15:         if left_child_ptr.GetValue() == None:
16:             return ""
17:         else:
18:             cur_path = cur_path + "L"
19:             return DFS (left_child_ptr, word, cur_path)
20:     else:
21:         if right_child_ptr.GetValue() == None:
22:             return ""
23:         else:
24:             cur_path = cur_path + "R"
25:             return DFS (right_child_ptr, word, cur_path)
</tt></pre>
</code>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader"><a name="accessing-variables">Accessing &amp; Manipulating <strong>Program</strong> Variables in Python</a>
</h1>
<div class="postcontent">

<p>Before we can call any Python function on any of our program's variables, we
need to get the variable into a form that Python can access. To show you how to
do this we will look at the parameters for the DFS function. The first
parameter is going to be a node in our binary search tree, put into a Python
variable.
The second parameter is the word we are searching for (a string), and
the third parameter is a string representing the path from the root of the tree
to our current node.</p>

<p>The most interesting parameter is the first one, the Python variable that
needs to contain a node in our search tree. How can we take a variable out of
our program and put it into a Python variable? What kind of Python variable
will it be? The answer is to use the LLDB API functions, provided as part of
the LLDB Python module. When we run Python from inside LLDB, LLDB
automatically gives us our current frame object as a Python variable,
"lldb.frame". This variable has the type "SBFrame" (see the LLDB API for
more information about SBFrame objects). One of the things we can do with a
frame object is ask it to find and return one of its local variables. We will
call the API function "FindVariable" on the lldb.frame object to give us our
dictionary variable as a Python variable:</p>

<code>
root = lldb.frame.FindVariable ("dictionary")
</code>

<p>The line above, executed in the Python script interpreter in LLDB, asks the
current frame to find the variable named "dictionary" and return it. We then
store the returned value in the Python variable named "root". This answers the
question of HOW to get the variable, but it still doesn't explain WHAT actually
gets put into "root". If you examine the LLDB API, you will find that the
SBFrame method "FindVariable" returns an object of type SBValue. SBValue
objects are used, among other things, to wrap up program variables and values.
There are many useful methods defined in the SBValue class that allow you to get
information or child values out of SBValues. For complete information, see
the header file <a href="http://llvm.org/svn/llvm-project/lldb/trunk/include/lldb/API/SBValue.h">SBValue.h</a>.
The
SBValue methods that we use in our DFS function are
<code>GetChildMemberWithName()</code>,
<code>GetSummary()</code>, and <code>GetValue()</code>.</p>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader">Explaining Depth First Search Script in Detail</h1>
<div class="postcontent">

<p><strong>"DFS" Overview.</strong> Before diving into the details of this
code, it would be best to give a high-level overview of what it does. The nodes
in our binary search tree were defined to have type <code>tree_node *</code>,
which is defined as:</p>

<code>
<pre><tt>typedef struct tree_node
{
  const char *word;
  struct tree_node *left;
  struct tree_node *right;
} tree_node;</tt></pre></code>

<p>Lines 2-11 of DFS get data out of the current tree node and get
ready to do the actual search; lines 12-25 are the actual depth-first search.
Lines 2-4 of our DFS function get the <code>word</code>, <code>left</code> and
<code>right</code> fields out of the current node and store them in Python
variables. Since <code>root_word_ptr</code> is a pointer to our word, and we
want the actual word, line 5 calls <code>GetSummary()</code> to get a string
containing the value out of the pointer. Since <code>GetSummary()</code> adds
quotes around its result, lines 6-11 strip surrounding quotes off the word.</p>

<p>Line 12 checks to see if the word in the current node is the one we are
searching for. If so, we are done, and line 13 returns the current path.
Otherwise, line 14 checks to see if we should go left (search word comes before
the current word). If we decide to go left, line 15 checks to see if the left
pointer child is NULL ("None" is the Python equivalent of NULL). If the left
pointer is NULL, then the word is not in this tree and we return an empty path
(line 16).
Otherwise, we add an "L" to the end of our current path string, to
indicate we are going left (line 18), and then recurse on the left child (line
19). Lines 20-25 are the same as lines 14-19, except for going right rather
than going left.</p>

<p>One other note: Typing something as long as our DFS function directly into
the interpreter can be difficult, as a single typing mistake means having
to start all over. Therefore we recommend doing as we have done: writing your
longer, more complicated script functions in a separate file (in this case
tree_utils.py) and then importing it into your LLDB Python interpreter.</p>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader">Seeing the DFS Script in Action</h1>
<div class="postcontent">


<p>At this point we are ready to use the DFS function to see if the word "Romeo"
is in our tree or not. To actually use it in LLDB on our dictionary program,
you would do something like this:</p>

<code>
% <strong>lldb</strong><br>
(lldb) <strong>process attach -n "dictionary"</strong><br>
Architecture set to: x86_64.<br>
Process 521 stopped<br>
* thread #1: tid = 0x2c03, 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8, stop reason = signal SIGSTOP<br>
frame #0: 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8<br>
(lldb) <strong>breakpoint set -n find_word</strong><br>
Breakpoint created: 1: name = 'find_word', locations = 1, resolved = 1<br>
(lldb) <strong>continue</strong><br>
Process 521 resuming<br>
Process 521 stopped<br>
* thread #1: tid = 0x2c03, 0x0000000100001830 dictionary`find_word + 16 <br>
at dictionary.c:105, stop reason = breakpoint 1.1<br>
frame #0: 0x0000000100001830 dictionary`find_word + 16 at dictionary.c:105<br>
102 int<br>
103 find_word (tree_node *dictionary, char *word)<br>
104 {<br>
-> 105 if (!word ||
!dictionary)<br>
106 return 0;<br>
107 <br>
108 int compare_value = strcmp (word, dictionary->word);<br>
(lldb) <strong>script</strong><br>
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.<br>
>>> <strong>import tree_utils</strong><br>
>>> <strong>root = lldb.frame.FindVariable ("dictionary")</strong><br>
>>> <strong>current_path = ""</strong><br>
>>> <strong>path = tree_utils.DFS (root, "Romeo", current_path)</strong><br>
>>> <strong>print path</strong><br>
LLRRL<br>
>>> <strong>^D</strong><br>
(lldb) <br>
</code>

<p>The first bit of code above shows starting lldb, attaching to the dictionary
program, and getting to the find_word function in LLDB. The interesting part
(as far as this example is concerned) begins when we enter the
<code>script</code> command and drop into the embedded interactive Python
interpreter. We will go over this Python code line by line. The first line</p>

<code>
import tree_utils
</code>

<p>imports the file where we wrote our DFS function, tree_utils.py, into Python.
Notice that to import the file we leave off the ".py" extension. We can now
call any function in that file, giving it the prefix "tree_utils.", so that
Python knows where to look for the function. The line</p>

<code>
root = lldb.frame.FindVariable ("dictionary")
</code>

<p>gets our program variable "dictionary" (which contains the binary search
tree) and puts it into the Python variable "root". See
<a href="#accessing-variables">Accessing &amp; Manipulating Program Variables in Python</a>
above for more details about how this works. The next line is</p>

<code>
current_path = ""
</code>

<p>This line initializes current_path, the path from the root of the tree to our
current node. Since we are starting at the root of the tree, our current path
starts as an empty string.
As we go right and left through the tree, the DFS
function will append an 'R' or an 'L' to the current path, as appropriate. The
line</p>

<code>
path = tree_utils.DFS (root, "Romeo", current_path)
</code>

<p>calls our DFS function (prefixing it with the module name so that Python can
find it). We pass in our binary tree stored in the variable <code>root</code>,
the word we are searching for, and our current path. We assign whatever path
the DFS function returns to the Python variable <code>path</code>.</p>


<p>Finally, we want to see if the word was found or not, and if so we want to
see the path through the tree to the word. So we do</p>

<code>
print path
</code>

<p>From this we can see that the word "Romeo" was indeed found in the tree, and
the path from the root of the tree to the node containing "Romeo" is
left-left-right-right-left.</p>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader">What next? Using Breakpoint Command Scripts...</h1>
<div class="postcontent">

<p>We are halfway to figuring out what the problem is. We know the word we are
looking for is in the binary tree, and we know exactly where it is in the binary
tree. Now we need to figure out why our binary search algorithm is not finding
the word. We will do this using breakpoint command scripts.</p>


<p>The idea is as follows. The binary search algorithm has two main decision
points: the decision to follow the right branch, and the decision to follow
the left branch. We will set a breakpoint at each of these decision points, and
attach a Python breakpoint command script to each breakpoint. The breakpoint
commands will use the global <code>path</code> Python variable that we got from
our DFS function.
Each time one of these decision breakpoints is hit, the script
will compare the actual decision with the decision that the front of the
<code>path</code> variable (its first character) says should be made.
If the actual decision and the path agree, then the front character is
stripped off the path, and execution is resumed. In this case the user never
even sees the breakpoint being hit. But if the decision differs from what the
path says it should be, then the script prints out a message and does NOT resume
execution, leaving the user sitting at the first point where a wrong decision is
being made.</p>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader">Side Note: Python Breakpoint Command Scripts are NOT What They Seem</h1>
<div class="postcontent">

</div>
<div class="postfooter"></div>

<p>What do we mean by that? When you enter a Python breakpoint command in LLDB,
it appears that you are entering one or more plain lines of Python. BUT LLDB
then takes what you entered and wraps it into a Python FUNCTION (just like using
the "def" Python statement). It automatically gives the function an obscure,
unique, hard-to-stumble-across function name, and gives it two parameters:
<code>frame</code> and <code>bp_loc</code>. When the breakpoint gets hit, LLDB
wraps up the frame object where the breakpoint was hit, and the breakpoint
location object for the breakpoint that was hit, and puts them into Python
variables for you. It then calls the Python function that was created for the
breakpoint command, and passes in the frame and breakpoint location objects.</p>

<p>So, being practical, what does this mean for you when you write your Python
breakpoint commands? It means that there are two things you need to keep in
mind: 1.
If you want to access any Python variables created outside your script,
<strong>you must declare such variables to be global</strong>. If you do not
declare them as global, then the Python function will treat them as local
variables, and you will get unexpected behavior. 2. <strong>All Python
breakpoint command scripts automatically have a <code>frame</code> and a
<code>bp_loc</code> variable.</strong> The variables are pre-loaded by LLDB
with the correct context for the breakpoint. You do not have to use these
variables, but they are there if you want them.</p>

</div>
<div class="postfooter"></div>


<div class="post">
<h1 class ="postheader">The Decision Point Breakpoint Commands</h1>
<div class="postcontent">

<p>This is what the Python breakpoint command script would look like for the
decision to go right:</p>

<code><pre><tt>
global path
if path[0] == 'R':
    path = path[1:]
    thread = frame.GetThread()
    process = thread.GetProcess()
    process.Continue()
else:
    print "Here is the problem; going right, should go left!"
</tt></pre></code>

<p>Just as a reminder, LLDB is going to take this script and wrap it up in a
function, like this:</p>

<code><pre><tt>
def some_unique_and_obscure_function_name (frame, bp_loc):
    global path
    if path[0] == 'R':
        path = path[1:]
        thread = frame.GetThread()
        process = thread.GetProcess()
        process.Continue()
    else:
        print "Here is the problem; going right, should go left!"
</tt></pre></code>

<p>LLDB will call the function, passing in the correct frame and breakpoint
location whenever the breakpoint gets hit. There are several things to notice
about this function. The first one is that we are accessing and updating a
piece of state (the <code>path</code> variable), and actually conditioning our
behavior based upon this variable.
Since the variable was defined outside of
our script (and therefore outside of the corresponding function) we need to tell
Python that we are accessing a global variable. That is what the first line of
the script does. Next we check where the path says we should go and compare it to
our decision (recall that we are at the breakpoint for the decision to go
right). If the path agrees with our decision, then we strip the first character
off of the path.</p>

<p>Since the decision matched the path, we want to resume execution. To do this
we make use of the <code>frame</code> parameter that LLDB guarantees will be
there for us. We use LLDB API functions to get the current thread from the
current frame, and then to get the process from the thread. Once we have the
process, we tell it to resume execution (using the <code>Continue()</code> API
function).</p>

<p>If the decision to go right does not agree with the path, then we do not
resume execution. We leave the process stopped at the breakpoint (by doing
nothing), and we print an informational message telling the user we have found
the problem, and what the problem is.</p>

</div>
<div class="postfooter"></div>

<div class="post">
<h1 class ="postheader">Actually Using the Breakpoint Commands</h1>
<div class="postcontent">

<p>Now we will look at what happens when we actually use these breakpoint
commands on our program. Doing a <code>source list -n find_word</code> shows
us the function containing our two decision points. Looking at the code below,
we see that we want to set our breakpoints on lines 113 and 115:</p>

<code><pre><tt>
(lldb) source list -n find_word
File: /Volumes/Data/HD2/carolinetice/Desktop/LLDB-Web-Examples/dictionary.c.
   101 
   102 int
   103 find_word (tree_node *dictionary, char *word)
   104 {
   105   if (!word || !dictionary)
   106     return 0;
   107 
   108   int compare_value = strcmp (word, dictionary->word);
   109 
   110   if (compare_value == 0)
   111     return 1;
   112   else if (compare_value &lt; 0)
   113     return find_word (dictionary->left, word);
   114   else
   115     return find_word (dictionary->right, word);
   116 }
   117 
</tt></pre></code>

<p>So, we set our breakpoints, enter our breakpoint command scripts, and see
what happens:</p>

<code><pre><tt>
(lldb) breakpoint set -l 113
Breakpoint created: 2: file ='dictionary.c', line = 113, locations = 1, resolved = 1
(lldb) breakpoint set -l 115
Breakpoint created: 3: file ='dictionary.c', line = 115, locations = 1, resolved = 1
(lldb) breakpoint command add -s python 2
Enter your Python command(s). Type 'DONE' to end.
> global path
> if (path[0] == 'L'):
>     path = path[1:]
>     thread = frame.GetThread()
>     process = thread.GetProcess()
>     process.Continue()
> else:
>     print "Here is the problem. Going left, should go right!"
> DONE
(lldb) breakpoint command add -s python 3
Enter your Python command(s). Type 'DONE' to end.
> global path
> if (path[0] == 'R'):
>     path = path[1:]
>     thread = frame.GetThread()
>     process = thread.GetProcess()
>     process.Continue()
> else:
>     print "Here is the problem. Going right, should go left!"
> DONE
(lldb) continue
Process 696 resuming
Here is the problem. Going right, should go left!
Process 696 stopped
* thread #1: tid = 0x2d03, 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115, stop reason = breakpoint 3.1
  frame #0: 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115
   112   else if (compare_value &lt; 0)
   113     return find_word (dictionary->left, word);
   114   else
-> 115     return find_word (dictionary->right, word);
   116 }
   117 
   118 void
(lldb)
</tt></pre></code>


<p>After setting our breakpoints, adding our breakpoint commands and continuing,
we run for a little bit and then hit one of our breakpoints, printing out the
error message from the breakpoint command. Apparently at this point in the
tree, our search algorithm decided to go right, but our path says the node we
want is to the left. Examining the word at the node where we stopped, and our
search word, we see:</p>

<code>
(lldb) expr dictionary->word<br>
(const char *) $1 = 0x0000000100100080 "dramatis"<br>
(lldb) expr word<br>
(char *) $2 = 0x00007fff5fbff108 "romeo"<br>
</code>

<p>So the word at our current node is "dramatis", and the word we are searching
for is "romeo". "romeo" comes after "dramatis" alphabetically, so it seems like
going right would be the correct decision. Let's ask Python what it thinks the
path from the current node to our word is:</p>

<code>
(lldb) script print path<br>
LLRRL<br>
</code>

<p>According to Python we need to go left-left-right-right-left from our current
node to find the word we are looking for. Let's double check our tree, and see
what word it has at that node:</p>

<code>
(lldb) expr dictionary->left->left->right->right->left->word<br>
(const char *) $4 = 0x0000000100100880 "Romeo"<br>
</code>

<p>So the word we are searching for is "romeo" and the word at our DFS location
is "Romeo". Aha!
One is uppercase and the other is lowercase: we seem to have
a case conversion problem somewhere in our program (we do).</p>

<p>This is the end of our example on how you might use Python scripting in LLDB
to help you find bugs in your program.</p>

</div>
<div class="postfooter"></div>

<div class="post">
<h1 class ="postheader">Source Files for The Example</h1>
<div class="postcontent">

<p>The complete code for the Dictionary program (with case-conversion bug),
the DFS function and other Python script examples (tree_utils.py) used for this
example are available via the following links:</p>

<a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/scripting/tree_utils.py">tree_utils.py</a> - Example Python functions using LLDB's API, including DFS<br>
<a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/scripting/dictionary.c">dictionary.c</a> - Sample dictionary program, with bug<br>

<p>The text for "Romeo and Juliet" can be obtained from Project Gutenberg
(http://www.gutenberg.org).</p>

</div>
<div class="postfooter"></div>
</div>
</div>
</div>
</div>
</body>
</html>