All about co_lnotab, the line number table.

Code objects store a field named co_lnotab.  This is an array of unsigned bytes
disguised as a Python string.  It is used to map bytecode offsets to source code
line #s for tracebacks and to identify line number boundaries for line tracing.

The array is conceptually a compressed list of
    (bytecode offset increment, line number increment)
pairs.  The details are important and delicate, best illustrated by example:

    byte code offset    source code line number
        0                   1
        6                   2
       50                   7
      350                 307
      361                 308

Instead of storing these numbers literally, we compress the list by storing only
the increments from one row to the next.  Conceptually, the stored list might
look like:

    0, 1,  6, 1,  44, 5,  300, 300,  11, 1
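
As a rough illustration of the increment step, the following Python sketch
derives the list above from the example table.  The names rows and
to_increments are made up for this note, and the (0, 0) starting point is an
assumption taken from the example, not from compile.c.

```python
# Rows from the example table: (bytecode offset, source line number).
rows = [(0, 1), (6, 2), (50, 7), (350, 307), (361, 308)]


def to_increments(rows):
    """Return the flat list of (offset increment, line increment) values."""
    flat = []
    prev_addr, prev_line = 0, 0  # assumed starting point, as in the example
    for addr, line in rows:
        flat += [addr - prev_addr, line - prev_line]
        prev_addr, prev_line = addr, line
    return flat


print(to_increments(rows))  # [0, 1, 6, 1, 44, 5, 300, 300, 11, 1]
```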

The above doesn't really work, but it's a start. Note that an unsigned byte
can't hold negative values, or values larger than 255, and the above example
contains two such values. So we make two tweaks:

 (a) there's a deep assumption that byte code offsets and their corresponding
 line #s both increase monotonically, and
 (b) if at least one column jumps by more than 255 from one row to the next,
 more than one pair is written to the table. In case #b, there's no way to know
 from looking at the table later how many were written.  That's the delicate
 part.  A user of co_lnotab desiring to find the source line number
 corresponding to a bytecode address A should do something like this:
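
    def addr2line(co_lnotab, A):
        # co_lnotab is treated here as a sequence of
        # (addr_incr, line_incr) pairs, not as raw bytes.
        lineno = addr = 0
        for addr_incr, line_incr in co_lnotab:
            addr += addr_incr
            if addr > A:
                return lineno
            lineno += line_incr
        # A lies at or beyond the last entry.
        return lineno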

(In C, this is implemented by PyCode_Addr2Line().)  In order for this to work,
when the addr field increments by more than 255, the line # increment in each
pair generated must be 0 until the remaining addr increment is < 256.  So, in
the example above, assemble_lnotab in compile.c should not (as was actually done
until 2.2) expand 300, 300 to
    255, 255, 45, 45,
but to
    255, 0, 45, 255, 0, 45.
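
The expansion rule can be sketched in Python as follows.  This is only an
illustration under stated assumptions: encode_lnotab and addr2line are
hypothetical helper names, the table is modelled as a flat list of ints rather
than a byte string, and counting starts from offset 0 and line 0 as in the
example.  It is not the code in compile.c.

```python
def encode_lnotab(rows):
    """Encode (offset, line) rows as a flat list of values that fit in a byte."""
    out = []
    prev_addr, prev_line = 0, 0  # assumed starting point, as in the example
    for addr, line in rows:
        d_addr, d_line = addr - prev_addr, line - prev_line
        # Emit the address increment first, with a line increment of 0,
        # until what remains of it fits in one byte.
        while d_addr > 255:
            out += [255, 0]
            d_addr -= 255
        # Then emit the line increment; only the first of these pairs
        # carries the leftover address increment.
        while d_line > 255:
            out += [d_addr, 255]
            d_addr = 0
            d_line -= 255
        out += [d_addr, d_line]
        prev_addr, prev_line = addr, line
    return out


def addr2line(lnotab, A):
    """Look up the line for bytecode address A, following the loop above."""
    lineno = addr = 0
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        if addr > A:
            return lineno
        lineno += line_incr
    return lineno


rows = [(0, 1), (6, 2), (50, 7), (350, 307), (361, 308)]
good = encode_lnotab(rows)
# The same table with 300, 300 expanded the pre-2.2 way.
bad = [0, 1, 6, 1, 44, 5, 255, 255, 45, 45, 11, 1]

print(good)                  # [0, 1, 6, 1, 44, 5, 255, 0, 45, 255, 0, 45, 11, 1]
print(addr2line(good, 320))  # 7
print(addr2line(bad, 320))   # 262
```

Address 320 lies between offsets 50 and 350, so it belongs to line 7.  With the
old expansion the lookup has already added 255 to the line number by the time
it passes offset 305, which is why it reports 262.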

The above is sufficient to reconstruct line numbers for tracebacks, but not for
line tracing.  Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
and maybe_call_line_trace() in ceval.c.

*** Tracing ***

To a first approximation, we want to call the tracing function when the line
number of the current instruction changes.  Re-computing the current line for
every instruction is a little slow, though, so each time we compute the line
number we save the bytecode indices where it's valid:

     *instr_lb <= frame->f_lasti < *instr_ub

is true so long as execution does not change lines.  That is, *instr_lb holds
the first bytecode index of the current line, and *instr_ub holds the first
bytecode index of the next line.  As long as the above expression is true,
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber().  Note
that the same line may appear multiple times in the lnotab, either because the
bytecode jumped more than 255 indices between line number changes or because
the compiler inserted the same line twice.  Even in that case, *instr_ub holds
the first index of the next line.
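
One way such bounds might be computed is sketched below in Python.  The
function name check_line_number, its firstlineno parameter, and the flat list
of ints standing in for the byte string are all assumptions made for this
illustration, and sys.maxsize is used to mean "no upper bound".  It follows the
description above and is not a transcription of the C code.

```python
import sys


def check_line_number(lnotab, lasti, firstlineno=0):
    """Return (line, lower, upper) with lower <= lasti < upper."""
    pairs = list(zip(lnotab[0::2], lnotab[1::2]))
    addr, line, lower = 0, firstlineno, 0
    i = 0
    # Consume every pair that starts at or before lasti.
    while i < len(pairs) and addr + pairs[i][0] <= lasti:
        addr_incr, line_incr = pairs[i]
        addr += addr_incr
        if line_incr:
            # Only a pair that changes the line starts a new range.
            lower = addr
        line += line_incr
        i += 1
    if i < len(pairs):
        # Skip forward to the next pair that actually changes the line,
        # so padding pairs (line increment 0) do not end the range early.
        while i < len(pairs):
            addr_incr, line_incr = pairs[i]
            addr += addr_incr
            i += 1
            if line_incr:
                break
        upper = addr
    else:
        upper = sys.maxsize
    return line, lower, upper


table = [0, 1, 6, 1, 44, 5, 255, 0, 45, 255, 0, 45, 11, 1]
print(check_line_number(table, 320))  # (7, 50, 350)
print(check_line_number(table, 350))  # (307, 350, 361)
```

Offset 320 sits past the padding pair 255, 0, yet the range reported is still
50 to 350, the whole extent of line 7.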

However, we don't *always* want to call the line trace function when the above
test fails.

Consider this code:

1: def f(a):
2:    while a:
3:       print 1,
4:       break
5:    else:
6:       print 2,

which compiles to this:

  2           0 SETUP_LOOP              19 (to 22)
        >>    3 LOAD_FAST                0 (a)
              6 POP_JUMP_IF_FALSE       17

  3           9 LOAD_CONST               1 (1)
             12 PRINT_ITEM

  4          13 BREAK_LOOP
             14 JUMP_ABSOLUTE            3
        >>   17 POP_BLOCK

  6          18 LOAD_CONST               2 (2)
             21 PRINT_ITEM
        >>   22 LOAD_CONST               0 (None)
             25 RETURN_VALUE

If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 17
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
In this case, we could instead associate the POP_BLOCK with line 5, but that
would break jumps around loops without else clauses.

We fix this by only calling the line trace function for a forward jump if the
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
instruction offset matches the offset given for the start of a line by the
co_lnotab.  For backward jumps, however, we always call the line trace function,
which lets a debugger stop on every evaluation of a loop guard (which usually
won't be the first opcode in a line).
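
The rule can be tried out on the disassembly above with a small Python
simulation.  Everything here is a sketch for illustration: LINE_STARTS is read
off the disassembly by hand, bounds and line_events are hypothetical helpers,
and the offset sequences are the instruction offsets one would expect to be
executed for a false and a true 'a'.

```python
import bisect

# Line-start offsets read off the disassembly above: (offset, line).
LINE_STARTS = [(0, 2), (9, 3), (13, 4), (18, 6)]


def bounds(lasti):
    """Return (line, lb, ub) for the line range containing lasti."""
    offsets = [off for off, _ in LINE_STARTS]
    i = bisect.bisect_right(offsets, lasti) - 1
    ub = offsets[i + 1] if i + 1 < len(offsets) else float("inf")
    return LINE_STARTS[i][1], offsets[i], ub


def line_events(executed_offsets):
    """Return the line numbers for which a line event would be reported."""
    events = []
    line, lb, ub = None, 0, 0  # empty range forces the first recomputation
    prev = -1
    for lasti in executed_offsets:
        if not (lb <= lasti < ub):
            line, lb, ub = bounds(lasti)
        # Trace at the start of a line, or on any backward jump.
        if lasti == lb or lasti < prev:
            events.append(line)
        prev = lasti
    return events


print(line_events([0, 3, 6, 17, 18, 21, 22, 25]))  # 'a' false: [2, 6]
print(line_events([0, 3, 6, 9, 12, 13, 22, 25]))   # 'a' true:  [2, 3, 4]
```

In the false case the jump to offset 17 lands in the middle of the range for
line 4, so no event is reported until offset 18 starts line 6.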

Why do we set f_lineno when tracing, and only just before calling the trace
function?  Well, consider the code above when 'a' is true.  If stepping through
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
line events on lines 2, 3, and 4, then a "return" type event -- but because the
code for the return actually falls in the range of the "line 6" opcodes, you
would be shown line 6 during this event.  This is a change from the behaviour in
2.2 and before, and I've found it confusing in practice.  By setting and using
f_lineno when tracing, one can report a line number different from that
suggested by f_lasti on this one occasion where it's desirable.