Home | History | Annotate | Download | only in library
      1 :mod:`difflib` --- Helpers for computing deltas
      2 ===============================================
      3 
      4 .. module:: difflib
      5    :synopsis: Helpers for computing differences between objects.
      6 .. moduleauthor:: Tim Peters <tim_one (a] users.sourceforge.net>
      7 .. sectionauthor:: Tim Peters <tim_one (a] users.sourceforge.net>
      8 .. Markup by Fred L. Drake, Jr. <fdrake (a] acm.org>
      9 
     10 .. testsetup::
     11 
     12    import sys
     13    from difflib import *
     14 
     15 .. versionadded:: 2.1
     16 
     17 This module provides classes and functions for comparing sequences. It
     18 can be used for example, for comparing files, and can produce difference
     19 information in various formats, including HTML and context and unified
     20 diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
     21 
     22 .. class:: SequenceMatcher
     23 
     24    This is a flexible class for comparing pairs of sequences of any type, so long
     25    as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
     26    little fancier than, an algorithm published in the late 1980's by Ratcliff and
     27    Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
     28    find the longest contiguous matching subsequence that contains no "junk"
     29    elements (the Ratcliff and Obershelp algorithm doesn't address junk).  The same
     30    idea is then applied recursively to the pieces of the sequences to the left and
     31    to the right of the matching subsequence.  This does not yield minimal edit
     32    sequences, but does tend to yield matches that "look right" to people.
     33 
     34    **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
     35    case and quadratic time in the expected case. :class:`SequenceMatcher` is
     36    quadratic time for the worst case and has expected-case behavior dependent in a
     37    complicated way on how many elements the sequences have in common; best case
     38    time is linear.
     39 
     40    **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
     41    automatically treats certain sequence items as junk. The heuristic counts how many
     42    times each individual item appears in the sequence. If an item's duplicates (after
     43    the first one) account for more than 1% of the sequence and the sequence is at least
     44    200 items long, this item is marked as "popular" and is treated as junk for
     45    the purpose of sequence matching. This heuristic can be turned off by setting
     46    the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
     47 
     48    .. versionadded:: 2.7.1
     49       The *autojunk* parameter.
     50 
     51 .. class:: Differ
     52 
     53    This is a class for comparing sequences of lines of text, and producing
     54    human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
     55    both to compare sequences of lines, and to compare sequences of characters
     56    within similar (near-matching) lines.
     57 
     58    Each line of a :class:`Differ` delta begins with a two-letter code:
     59 
     60    +----------+-------------------------------------------+
     61    | Code     | Meaning                                   |
     62    +==========+===========================================+
     63    | ``'- '`` | line unique to sequence 1                 |
     64    +----------+-------------------------------------------+
     65    | ``'+ '`` | line unique to sequence 2                 |
     66    +----------+-------------------------------------------+
     67    | ``'  '`` | line common to both sequences             |
     68    +----------+-------------------------------------------+
     69    | ``'? '`` | line not present in either input sequence |
     70    +----------+-------------------------------------------+
     71 
     72    Lines beginning with '``?``' attempt to guide the eye to intraline differences,
     73    and were not present in either input sequence. These lines can be confusing if
     74    the sequences contain tab characters.
     75 
     76 
     77 .. class:: HtmlDiff
     78 
     79    This class can be used to create an HTML table (or a complete HTML file
     80    containing the table) showing a side by side, line by line comparison of text
     81    with inter-line and intra-line change highlights.  The table can be generated in
     82    either full or contextual difference mode.
     83 
     84    The constructor for this class is:
     85 
     86 
     87    .. function:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
     88 
     89       Initializes instance of :class:`HtmlDiff`.
     90 
     91       *tabsize* is an optional keyword argument to specify tab stop spacing and
     92       defaults to ``8``.
     93 
     94       *wrapcolumn* is an optional keyword to specify column number where lines are
     95       broken and wrapped, defaults to ``None`` where lines are not wrapped.
     96 
     97       *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
     98       (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
     99       :func:`ndiff` documentation for argument default values and descriptions.
    100 
    101    The following methods are public:
    102 
    103 
    104    .. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
    105 
    106       Compares *fromlines* and *tolines* (lists of strings) and returns a string which
    107       is a complete HTML file containing a table showing line by line differences with
    108       inter-line and intra-line changes highlighted.
    109 
    110       *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
    111       column header strings (both default to an empty string).
    112 
    113       *context* and *numlines* are both optional keyword arguments. Set *context* to
    114       ``True`` when contextual differences are to be shown, else the default is
    115       ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
    116       is ``True`` *numlines* controls the number of context lines which surround the
    117       difference highlights.  When *context* is ``False`` *numlines* controls the
    118       number of lines which are shown before a difference highlight when using the
    119       "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
    120       the next difference highlight at the top of the browser without any leading
    121       context).
    122 
    123 
    124    .. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
    125 
    126       Compares *fromlines* and *tolines* (lists of strings) and returns a string which
    127       is a complete HTML table showing line by line differences with inter-line and
    128       intra-line changes highlighted.
    129 
    130       The arguments for this method are the same as those for the :meth:`make_file`
    131       method.
    132 
    133    :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
    134    contains a good example of its use.
    135 
    136    .. versionadded:: 2.4
    137 
    138 
    139 .. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
    140 
    141    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
    142    generating the delta lines) in context diff format.
    143 
    144    Context diffs are a compact way of showing just the lines that have changed plus
    145    a few lines of context.  The changes are shown in a before/after style.  The
    146    number of context lines is set by *n* which defaults to three.
    147 
    148    By default, the diff control lines (those with ``***`` or ``---``) are created
    149    with a trailing newline.  This is helpful so that inputs created from
    150    :func:`file.readlines` result in diffs that are suitable for use with
    151    :func:`file.writelines` since both the inputs and outputs have trailing
    152    newlines.
    153 
    154    For inputs that do not have trailing newlines, set the *lineterm* argument to
    155    ``""`` so that the output will be uniformly newline free.
    156 
    157    The context diff format normally has a header for filenames and modification
    158    times.  Any or all of these may be specified using strings for *fromfile*,
    159    *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
    160    expressed in the ISO 8601 format. If not specified, the
    161    strings default to blanks.
    162 
    163       >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
    164       >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
    165       >>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
    166       ...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
    167       *** before.py
    168       --- after.py
    169       ***************
    170       *** 1,4 ****
    171       ! bacon
    172       ! eggs
    173       ! ham
    174         guido
    175       --- 1,4 ----
    176       ! python
    177       ! eggy
    178       ! hamster
    179         guido
    180 
    181    See :ref:`difflib-interface` for a more detailed example.
    182 
    183    .. versionadded:: 2.3
    184 
    185 
    186 .. function:: get_close_matches(word, possibilities[, n][, cutoff])
    187 
    188    Return a list of the best "good enough" matches.  *word* is a sequence for which
    189    close matches are desired (typically a string), and *possibilities* is a list of
    190    sequences against which to match *word* (typically a list of strings).
    191 
    192    Optional argument *n* (default ``3``) is the maximum number of close matches to
    193    return; *n* must be greater than ``0``.
    194 
    195    Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
    196    Possibilities that don't score at least that similar to *word* are ignored.
    197 
    198    The best (no more than *n*) matches among the possibilities are returned in a
    199    list, sorted by similarity score, most similar first.
    200 
    201       >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
    202       ['apple', 'ape']
    203       >>> import keyword
    204       >>> get_close_matches('wheel', keyword.kwlist)
    205       ['while']
    206       >>> get_close_matches('apple', keyword.kwlist)
    207       []
    208       >>> get_close_matches('accept', keyword.kwlist)
    209       ['except']
    210 
    211 
    212 .. function:: ndiff(a, b[, linejunk][, charjunk])
    213 
    214    Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
    215    delta (a :term:`generator` generating the delta lines).
    216 
    217    Optional keyword parameters *linejunk* and *charjunk* are for filter functions
    218    (or ``None``):
    219 
    220    *linejunk*: A function that accepts a single string argument, and returns true
    221    if the string is junk, or false if not. The default is (``None``), starting with
    222    Python 2.3.  Before then, the default was the module-level function
    223    :func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
    224    for at most one pound character (``'#'``). As of Python 2.3, the underlying
    225    :class:`SequenceMatcher` class does a dynamic analysis of which lines are so
    226    frequent as to constitute noise, and this usually works better than the pre-2.3
    227    default.
    228 
    229    *charjunk*: A function that accepts a character (a string of length 1), and
    230    returns if the character is junk, or false if not. The default is module-level
    231    function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
    232    blank or tab; note: bad idea to include newline in this!).
    233 
    234    :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
    235 
    236       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
    237       ...              'ore\ntree\nemu\n'.splitlines(1))
    238       >>> print ''.join(diff),
    239       - one
    240       ?  ^
    241       + ore
    242       ?  ^
    243       - two
    244       - three
    245       ?  -
    246       + tree
    247       + emu
    248 
    249 
    250 .. function:: restore(sequence, which)
    251 
    252    Return one of the two sequences that generated a delta.
    253 
    254    Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
    255    lines originating from file 1 or 2 (parameter *which*), stripping off line
    256    prefixes.
    257 
    258    Example:
    259 
    260       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
    261       ...              'ore\ntree\nemu\n'.splitlines(1))
    262       >>> diff = list(diff) # materialize the generated delta into a list
    263       >>> print ''.join(restore(diff, 1)),
    264       one
    265       two
    266       three
    267       >>> print ''.join(restore(diff, 2)),
    268       ore
    269       tree
    270       emu
    271 
    272 
    273 .. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
    274 
    275    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
    276    generating the delta lines) in unified diff format.
    277 
    278    Unified diffs are a compact way of showing just the lines that have changed plus
    279    a few lines of context.  The changes are shown in an inline style (instead of
    280    separate before/after blocks).  The number of context lines is set by *n* which
    281    defaults to three.
    282 
    283    By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
    284    created with a trailing newline.  This is helpful so that inputs created from
    285    :func:`file.readlines` result in diffs that are suitable for use with
    286    :func:`file.writelines` since both the inputs and outputs have trailing
    287    newlines.
    288 
    289    For inputs that do not have trailing newlines, set the *lineterm* argument to
    290    ``""`` so that the output will be uniformly newline free.
    291 
    292    The context diff format normally has a header for filenames and modification
    293    times.  Any or all of these may be specified using strings for *fromfile*,
    294    *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
    295    expressed in the ISO 8601 format. If not specified, the
    296    strings default to blanks.
    297 
    298       >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
    299       >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
    300       >>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
    301       ...     sys.stdout.write(line)   # doctest: +NORMALIZE_WHITESPACE
    302       --- before.py
    303       +++ after.py
    304       @@ -1,4 +1,4 @@
    305       -bacon
    306       -eggs
    307       -ham
    308       +python
    309       +eggy
    310       +hamster
    311        guido
    312 
    313    See :ref:`difflib-interface` for a more detailed example.
    314 
    315    .. versionadded:: 2.3
    316 
    317 
    318 .. function:: IS_LINE_JUNK(line)
    319 
    320    Return true for ignorable lines.  The line *line* is ignorable if *line* is
    321    blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
    322    default for parameter *linejunk* in :func:`ndiff` before Python 2.3.
    323 
    324 
    325 .. function:: IS_CHARACTER_JUNK(ch)
    326 
    327    Return true for ignorable characters.  The character *ch* is ignorable if *ch*
    328    is a space or tab, otherwise it is not ignorable.  Used as a default for
    329    parameter *charjunk* in :func:`ndiff`.
    330 
    331 
    332 .. seealso::
    333 
    334    `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
    335       Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
    336       was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.
    337 
    338 
    339 .. _sequence-matcher:
    340 
    341 SequenceMatcher Objects
    342 -----------------------
    343 
    344 The :class:`SequenceMatcher` class has this constructor:
    345 
    346 
    347 .. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
    348 
    349    Optional argument *isjunk* must be ``None`` (the default) or a one-argument
    350    function that takes a sequence element and returns true if and only if the
    351    element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
    352    equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
    353    For example, pass::
    354 
    355       lambda x: x in " \t"
    356 
    357    if you're comparing lines as sequences of characters, and don't want to synch up
    358    on blanks or hard tabs.
    359 
    360    The optional arguments *a* and *b* are sequences to be compared; both default to
    361    empty strings.  The elements of both sequences must be :term:`hashable`.
    362 
    363    The optional argument *autojunk* can be used to disable the automatic junk
    364    heuristic.
    365 
    366    .. versionadded:: 2.7.1
    367       The *autojunk* parameter.
    368 
    369    :class:`SequenceMatcher` objects have the following methods:
    370 
    371    .. method:: set_seqs(a, b)
    372 
    373       Set the two sequences to be compared.
    374 
    375    :class:`SequenceMatcher` computes and caches detailed information about the
    376    second sequence, so if you want to compare one sequence against many
    377    sequences, use :meth:`set_seq2` to set the commonly used sequence once and
    378    call :meth:`set_seq1` repeatedly, once for each of the other sequences.
    379 
    380 
    381    .. method:: set_seq1(a)
    382 
    383       Set the first sequence to be compared.  The second sequence to be compared
    384       is not changed.
    385 
    386 
    387    .. method:: set_seq2(b)
    388 
    389       Set the second sequence to be compared.  The first sequence to be compared
    390       is not changed.
    391 
    392 
    393    .. method:: find_longest_match(alo, ahi, blo, bhi)
    394 
    395       Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
    396 
    397       If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
    398       ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
    399       <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
    400       k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
    401       <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
    402       all maximal matching blocks, return one that starts earliest in *a*, and
    403       of all those maximal matching blocks that start earliest in *a*, return
    404       the one that starts earliest in *b*.
    405 
    406          >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
    407          >>> s.find_longest_match(0, 5, 0, 9)
    408          Match(a=0, b=4, size=5)
    409 
    410       If *isjunk* was provided, first the longest matching block is determined
    411       as above, but with the additional restriction that no junk element appears
    412       in the block.  Then that block is extended as far as possible by matching
    413       (only) junk elements on both sides. So the resulting block never matches
    414       on junk except as identical junk happens to be adjacent to an interesting
    415       match.
    416 
    417       Here's the same example as before, but considering blanks to be junk. That
    418       prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
    419       second sequence directly.  Instead only the ``'abcd'`` can match, and
    420       matches the leftmost ``'abcd'`` in the second sequence:
    421 
    422          >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
    423          >>> s.find_longest_match(0, 5, 0, 9)
    424          Match(a=1, b=0, size=4)
    425 
    426       If no blocks match, this returns ``(alo, blo, 0)``.
    427 
    428       .. versionchanged:: 2.6
    429          This method returns a :term:`named tuple` ``Match(a, b, size)``.
    430 
    431 
    432    .. method:: get_matching_blocks()
    433 
    434       Return list of triples describing matching subsequences. Each triple is of
    435       the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``.  The
    436       triples are monotonically increasing in *i* and *j*.
    437 
    438       The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
    439       is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
    440       are adjacent triples in the list, and the second is not the last triple in
    441       the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
    442       triples always describe non-adjacent equal blocks.
    443 
    444       .. XXX Explain why a dummy is used!
    445 
    446       .. versionchanged:: 2.5
    447          The guarantee that adjacent triples always describe non-adjacent blocks
    448          was implemented.
    449 
    450       .. doctest::
    451 
    452          >>> s = SequenceMatcher(None, "abxcd", "abcd")
    453          >>> s.get_matching_blocks()
    454          [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
    455 
    456 
    457    .. method:: get_opcodes()
    458 
    459       Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
    460       of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
    461       0``, and remaining tuples have *i1* equal to the *i2* from the preceding
    462       tuple, and, likewise, *j1* equal to the previous *j2*.
    463 
    464       The *tag* values are strings, with these meanings:
    465 
    466       +---------------+---------------------------------------------+
    467       | Value         | Meaning                                     |
    468       +===============+=============================================+
    469       | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
    470       |               | ``b[j1:j2]``.                               |
    471       +---------------+---------------------------------------------+
    472       | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
    473       |               | ``j1 == j2`` in this case.                  |
    474       +---------------+---------------------------------------------+
    475       | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
    476       |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
    477       |               | this case.                                  |
    478       +---------------+---------------------------------------------+
    479       | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
    480       |               | are equal).                                 |
    481       +---------------+---------------------------------------------+
    482 
    483       For example:
    484 
    485          >>> a = "qabxcd"
    486          >>> b = "abycdf"
    487          >>> s = SequenceMatcher(None, a, b)
    488          >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
    489          ...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
    490          ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
    491           delete a[0:1] (q) b[0:0] ()
    492            equal a[1:3] (ab) b[0:2] (ab)
    493          replace a[3:4] (x) b[2:3] (y)
    494            equal a[4:6] (cd) b[3:5] (cd)
    495           insert a[6:6] () b[5:6] (f)
    496 
    497 
    498    .. method:: get_grouped_opcodes([n])
    499 
    500       Return a :term:`generator` of groups with up to *n* lines of context.
    501 
    502       Starting with the groups returned by :meth:`get_opcodes`, this method
    503       splits out smaller change clusters and eliminates intervening ranges which
    504       have no changes.
    505 
    506       The groups are returned in the same format as :meth:`get_opcodes`.
    507 
    508       .. versionadded:: 2.3
    509 
    510 
    511    .. method:: ratio()
    512 
    513       Return a measure of the sequences' similarity as a float in the range [0,
    514       1].
    515 
    516       Where T is the total number of elements in both sequences, and M is the
    517       number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
    518       sequences are identical, and ``0.0`` if they have nothing in common.
    519 
    520       This is expensive to compute if :meth:`get_matching_blocks` or
    521       :meth:`get_opcodes` hasn't already been called, in which case you may want
    522       to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
    523       upper bound.
    524 
    525 
    526    .. method:: quick_ratio()
    527 
    528       Return an upper bound on :meth:`ratio` relatively quickly.
    529 
    530 
    531    .. method:: real_quick_ratio()
    532 
    533       Return an upper bound on :meth:`ratio` very quickly.
    534 
    535 
    536 The three methods that return the ratio of matching to total characters can give
    537 different results due to differing levels of approximation, although
    538 :meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
    539 :meth:`ratio`:
    540 
    541    >>> s = SequenceMatcher(None, "abcd", "bcde")
    542    >>> s.ratio()
    543    0.75
    544    >>> s.quick_ratio()
    545    0.75
    546    >>> s.real_quick_ratio()
    547    1.0
    548 
    549 
    550 .. _sequencematcher-examples:
    551 
    552 SequenceMatcher Examples
    553 ------------------------
    554 
    555 This example compares two strings, considering blanks to be "junk:"
    556 
    557    >>> s = SequenceMatcher(lambda x: x == " ",
    558    ...                     "private Thread currentThread;",
    559    ...                     "private volatile Thread currentThread;")
    560 
    561 :meth:`ratio` returns a float in [0, 1], measuring the similarity of the
    562 sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
    563 sequences are close matches:
    564 
    565    >>> print round(s.ratio(), 3)
    566    0.866
    567 
    568 If you're only interested in where the sequences match,
    569 :meth:`get_matching_blocks` is handy:
    570 
    571    >>> for block in s.get_matching_blocks():
    572    ...     print "a[%d] and b[%d] match for %d elements" % block
    573    a[0] and b[0] match for 8 elements
    574    a[8] and b[17] match for 21 elements
    575    a[29] and b[38] match for 0 elements
    576 
    577 Note that the last tuple returned by :meth:`get_matching_blocks` is always a
    578 dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
    579 tuple element (number of elements matched) is ``0``.
    580 
    581 If you want to know how to change the first sequence into the second, use
    582 :meth:`get_opcodes`:
    583 
    584    >>> for opcode in s.get_opcodes():
    585    ...     print "%6s a[%d:%d] b[%d:%d]" % opcode
    586     equal a[0:8] b[0:8]
    587    insert a[8:8] b[8:17]
    588     equal a[8:29] b[17:38]
    589 
    590 .. seealso::
    591 
    592    * The :func:`get_close_matches` function in this module which shows how
    593      simple code building on :class:`SequenceMatcher` can be used to do useful
    594      work.
    595 
    596    * `Simple version control recipe
    597      <https://code.activestate.com/recipes/576729/>`_ for a small application
    598      built with :class:`SequenceMatcher`.
    599 
    600 
    601 .. _differ-objects:
    602 
    603 Differ Objects
    604 --------------
    605 
    606 Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
    607 diffs. To the contrary, minimal diffs are often counter-intuitive, because they
    608 synch up anywhere possible, sometimes accidental matches 100 pages apart.
    609 Restricting synch points to contiguous matches preserves some notion of
    610 locality, at the occasional cost of producing a longer diff.
    611 
    612 The :class:`Differ` class has this constructor:
    613 
    614 
    615 .. class:: Differ([linejunk[, charjunk]])
    616 
    617    Optional keyword parameters *linejunk* and *charjunk* are for filter functions
    618    (or ``None``):
    619 
    620    *linejunk*: A function that accepts a single string argument, and returns true
    621    if the string is junk.  The default is ``None``, meaning that no line is
    622    considered junk.
    623 
    624    *charjunk*: A function that accepts a single character argument (a string of
    625    length 1), and returns true if the character is junk. The default is ``None``,
    626    meaning that no character is considered junk.
    627 
    628    :class:`Differ` objects are used (deltas generated) via a single method:
    629 
    630 
    631    .. method:: Differ.compare(a, b)
    632 
    633       Compare two sequences of lines, and generate the delta (a sequence of lines).
    634 
    635       Each sequence must contain individual single-line strings ending with
    636       newlines.  Such sequences can be obtained from the
    637       :meth:`~file.readlines` method of file-like objects.  The delta
    638       generated also consists of newline-terminated strings, ready to be
    639       printed as-is via the :meth:`~file.writelines` method of a
    640       file-like object.
    641 
    642 
    643 .. _differ-examples:
    644 
    645 Differ Example
    646 --------------
    647 
    648 This example compares two texts. First we set up the texts, sequences of
    649 individual single-line strings ending with newlines (such sequences can also be
    650 obtained from the :meth:`~file.readlines` method of file-like objects):
    651 
    652    >>> text1 = '''  1. Beautiful is better than ugly.
    653    ...   2. Explicit is better than implicit.
    654    ...   3. Simple is better than complex.
    655    ...   4. Complex is better than complicated.
    656    ... '''.splitlines(1)
    657    >>> len(text1)
    658    4
    659    >>> text1[0][-1]
    660    '\n'
    661    >>> text2 = '''  1. Beautiful is better than ugly.
    662    ...   3.   Simple is better than complex.
    663    ...   4. Complicated is better than complex.
    664    ...   5. Flat is better than nested.
    665    ... '''.splitlines(1)
    666 
    667 Next we instantiate a Differ object:
    668 
    669    >>> d = Differ()
    670 
    671 Note that when instantiating a :class:`Differ` object we may pass functions to
    672 filter out line and character "junk."  See the :meth:`Differ` constructor for
    673 details.
    674 
    675 Finally, we compare the two:
    676 
    677    >>> result = list(d.compare(text1, text2))
    678 
    679 ``result`` is a list of strings, so let's pretty-print it:
    680 
    681    >>> from pprint import pprint
    682    >>> pprint(result)
    683    ['    1. Beautiful is better than ugly.\n',
    684     '-   2. Explicit is better than implicit.\n',
    685     '-   3. Simple is better than complex.\n',
    686     '+   3.   Simple is better than complex.\n',
    687     '?     ++\n',
    688     '-   4. Complex is better than complicated.\n',
    689     '?            ^                     ---- ^\n',
    690     '+   4. Complicated is better than complex.\n',
    691     '?           ++++ ^                      ^\n',
    692     '+   5. Flat is better than nested.\n']
    693 
    694 As a single multi-line string it looks like this:
    695 
    696    >>> import sys
    697    >>> sys.stdout.writelines(result)
    698        1. Beautiful is better than ugly.
    699    -   2. Explicit is better than implicit.
    700    -   3. Simple is better than complex.
    701    +   3.   Simple is better than complex.
    702    ?     ++
    703    -   4. Complex is better than complicated.
    704    ?            ^                     ---- ^
    705    +   4. Complicated is better than complex.
    706    ?           ++++ ^                      ^
    707    +   5. Flat is better than nested.
    708 
    709 
    710 .. _difflib-interface:
    711 
    712 A command-line interface to difflib
    713 -----------------------------------
    714 
    715 This example shows how to use difflib to create a ``diff``-like utility.
    716 It is also contained in the Python source distribution, as
    717 :file:`Tools/scripts/diff.py`.
    718 
    719 .. testcode::
    720 
    721    """ Command line interface to difflib.py providing diffs in four formats:
    722 
    723    * ndiff:    lists every line and highlights interline changes.
    724    * context:  highlights clusters of changes in a before/after format.
    725    * unified:  highlights clusters of changes in an inline format.
    726    * html:     generates side by side comparison with change highlights.
    727 
    728    """
    729 
    730    import sys, os, time, difflib, optparse
    731 
    732    def main():
    733         # Configure the option parser
    734        usage = "usage: %prog [options] fromfile tofile"
    735        parser = optparse.OptionParser(usage)
    736        parser.add_option("-c", action="store_true", default=False,
    737                          help='Produce a context format diff (default)')
    738        parser.add_option("-u", action="store_true", default=False,
    739                          help='Produce a unified format diff')
    740        hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
    741        parser.add_option("-m", action="store_true", default=False, help=hlp)
    742        parser.add_option("-n", action="store_true", default=False,
    743                          help='Produce a ndiff format diff')
    744        parser.add_option("-l", "--lines", type="int", default=3,
    745                          help='Set number of context lines (default 3)')
    746        (options, args) = parser.parse_args()
    747 
    748        if len(args) == 0:
    749            parser.print_help()
    750            sys.exit(1)
    751        if len(args) != 2:
    752            parser.error("need to specify both a fromfile and tofile")
    753 
    754        n = options.lines
    755        fromfile, tofile = args # as specified in the usage string
    756 
    757        # we're passing these as arguments to the diff function
    758        fromdate = time.ctime(os.stat(fromfile).st_mtime)
    759        todate = time.ctime(os.stat(tofile).st_mtime)
    760        with open(fromfile, 'U') as f:
    761            fromlines = f.readlines()
    762        with open(tofile, 'U') as f:
    763            tolines = f.readlines()
    764 
    765        if options.u:
    766            diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
    767                                        fromdate, todate, n=n)
    768        elif options.n:
    769            diff = difflib.ndiff(fromlines, tolines)
    770        elif options.m:
    771            diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
    772                                                tofile, context=options.c,
    773                                                numlines=n)
    774        else:
    775            diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
    776                                        fromdate, todate, n=n)
    777 
    778        # we're using writelines because diff is a generator
    779        sys.stdout.writelines(diff)
    780 
    781    if __name__ == '__main__':
    782        main()
    783