Home | History | Annotate | Download | only in library
      1 :mod:`difflib` --- Helpers for computing deltas
      2 ===============================================
      3 
      4 .. module:: difflib
      5    :synopsis: Helpers for computing differences between objects.
      6 
      7 .. moduleauthor:: Tim Peters <tim_one (a] users.sourceforge.net>
      8 .. sectionauthor:: Tim Peters <tim_one (a] users.sourceforge.net>
      9 .. Markup by Fred L. Drake, Jr. <fdrake (a] acm.org>
     10 
     11 **Source code:** :source:`Lib/difflib.py`
     12 
     13 .. testsetup::
     14 
     15    import sys
     16    from difflib import *
     17 
     18 --------------
     19 
     20 This module provides classes and functions for comparing sequences. It
     21 can be used for example, for comparing files, and can produce difference
     22 information in various formats, including HTML and context and unified
     23 diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
     24 
     25 
     26 .. class:: SequenceMatcher
     27 
     28    This is a flexible class for comparing pairs of sequences of any type, so long
     29    as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
     30    little fancier than, an algorithm published in the late 1980's by Ratcliff and
     31    Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
     32    find the longest contiguous matching subsequence that contains no "junk"
     33    elements; these "junk" elements are ones that are uninteresting in some
     34    sense, such as blank lines or whitespace.  (Handling junk is an
     35    extension to the Ratcliff and Obershelp algorithm.) The same
     36    idea is then applied recursively to the pieces of the sequences to the left and
     37    to the right of the matching subsequence.  This does not yield minimal edit
     38    sequences, but does tend to yield matches that "look right" to people.
     39 
     40    **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
     41    case and quadratic time in the expected case. :class:`SequenceMatcher` is
     42    quadratic time for the worst case and has expected-case behavior dependent in a
     43    complicated way on how many elements the sequences have in common; best case
     44    time is linear.
     45 
     46    **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
     47    automatically treats certain sequence items as junk. The heuristic counts how many
     48    times each individual item appears in the sequence. If an item's duplicates (after
     49    the first one) account for more than 1% of the sequence and the sequence is at least
     50    200 items long, this item is marked as "popular" and is treated as junk for
     51    the purpose of sequence matching. This heuristic can be turned off by setting
     52    the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
     53 
     54    .. versionadded:: 3.2
     55       The *autojunk* parameter.
     56 
     57 
     58 .. class:: Differ
     59 
     60    This is a class for comparing sequences of lines of text, and producing
     61    human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
     62    both to compare sequences of lines, and to compare sequences of characters
     63    within similar (near-matching) lines.
     64 
     65    Each line of a :class:`Differ` delta begins with a two-letter code:
     66 
     67    +----------+-------------------------------------------+
     68    | Code     | Meaning                                   |
     69    +==========+===========================================+
     70    | ``'- '`` | line unique to sequence 1                 |
     71    +----------+-------------------------------------------+
     72    | ``'+ '`` | line unique to sequence 2                 |
     73    +----------+-------------------------------------------+
     74    | ``'  '`` | line common to both sequences             |
     75    +----------+-------------------------------------------+
     76    | ``'? '`` | line not present in either input sequence |
     77    +----------+-------------------------------------------+
     78 
     79    Lines beginning with '``?``' attempt to guide the eye to intraline differences,
     80    and were not present in either input sequence. These lines can be confusing if
     81    the sequences contain tab characters.
     82 
     83 
     84 .. class:: HtmlDiff
     85 
     86    This class can be used to create an HTML table (or a complete HTML file
     87    containing the table) showing a side by side, line by line comparison of text
     88    with inter-line and intra-line change highlights.  The table can be generated in
     89    either full or contextual difference mode.
     90 
     91    The constructor for this class is:
     92 
     93 
     94    .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
     95 
     96       Initializes instance of :class:`HtmlDiff`.
     97 
     98       *tabsize* is an optional keyword argument to specify tab stop spacing and
     99       defaults to ``8``.
    100 
    101       *wrapcolumn* is an optional keyword to specify column number where lines are
    102       broken and wrapped, defaults to ``None`` where lines are not wrapped.
    103 
    104       *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
    105       (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
    106       :func:`ndiff` documentation for argument default values and descriptions.
    107 
    108    The following methods are public:
    109 
    110    .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
    111                          numlines=5, *, charset='utf-8')
    112 
    113       Compares *fromlines* and *tolines* (lists of strings) and returns a string which
    114       is a complete HTML file containing a table showing line by line differences with
    115       inter-line and intra-line changes highlighted.
    116 
    117       *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
    118       column header strings (both default to an empty string).
    119 
    120       *context* and *numlines* are both optional keyword arguments. Set *context* to
    121       ``True`` when contextual differences are to be shown, else the default is
    122       ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
    123       is ``True`` *numlines* controls the number of context lines which surround the
    124       difference highlights.  When *context* is ``False`` *numlines* controls the
    125       number of lines which are shown before a difference highlight when using the
    126       "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
    127       the next difference highlight at the top of the browser without any leading
    128       context).
    129 
    130       .. versionchanged:: 3.5
    131          *charset* keyword-only argument was added.  The default charset of
    132          HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.
    133 
    134    .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
    135 
    136       Compares *fromlines* and *tolines* (lists of strings) and returns a string which
    137       is a complete HTML table showing line by line differences with inter-line and
    138       intra-line changes highlighted.
    139 
    140       The arguments for this method are the same as those for the :meth:`make_file`
    141       method.
    142 
    143    :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
    144    contains a good example of its use.
    145 
    146 
    147 .. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
    148 
    149    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
    150    generating the delta lines) in context diff format.
    151 
    152    Context diffs are a compact way of showing just the lines that have changed plus
    153    a few lines of context.  The changes are shown in a before/after style.  The
    154    number of context lines is set by *n* which defaults to three.
    155 
    156    By default, the diff control lines (those with ``***`` or ``---``) are created
    157    with a trailing newline.  This is helpful so that inputs created from
    158    :func:`io.IOBase.readlines` result in diffs that are suitable for use with
    159    :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
    160    newlines.
    161 
    162    For inputs that do not have trailing newlines, set the *lineterm* argument to
    163    ``""`` so that the output will be uniformly newline free.
    164 
    165    The context diff format normally has a header for filenames and modification
    166    times.  Any or all of these may be specified using strings for *fromfile*,
    167    *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
    168    expressed in the ISO 8601 format. If not specified, the
    169    strings default to blanks.
    170 
    171       >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
    172       >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
    173       >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
    174       *** before.py
    175       --- after.py
    176       ***************
    177       *** 1,4 ****
    178       ! bacon
    179       ! eggs
    180       ! ham
    181         guido
    182       --- 1,4 ----
    183       ! python
    184       ! eggy
    185       ! hamster
    186         guido
    187 
    188    See :ref:`difflib-interface` for a more detailed example.
    189 
    190 
    191 .. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
    192 
    193    Return a list of the best "good enough" matches.  *word* is a sequence for which
    194    close matches are desired (typically a string), and *possibilities* is a list of
    195    sequences against which to match *word* (typically a list of strings).
    196 
    197    Optional argument *n* (default ``3``) is the maximum number of close matches to
    198    return; *n* must be greater than ``0``.
    199 
    200    Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
    201    Possibilities that don't score at least that similar to *word* are ignored.
    202 
    203    The best (no more than *n*) matches among the possibilities are returned in a
    204    list, sorted by similarity score, most similar first.
    205 
    206       >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
    207       ['apple', 'ape']
    208       >>> import keyword
    209       >>> get_close_matches('wheel', keyword.kwlist)
    210       ['while']
    211       >>> get_close_matches('pineapple', keyword.kwlist)
    212       []
    213       >>> get_close_matches('accept', keyword.kwlist)
    214       ['except']
    215 
    216 
    217 .. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
    218 
    219    Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
    220    delta (a :term:`generator` generating the delta lines).
    221 
    222    Optional keyword parameters *linejunk* and *charjunk* are filtering functions
    223    (or ``None``):
    224 
    225    *linejunk*: A function that accepts a single string argument, and returns
    226    true if the string is junk, or false if not. The default is ``None``. There
    227    is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
    228    without visible characters, except for at most one pound character (``'#'``)
    229    -- however the underlying :class:`SequenceMatcher` class does a dynamic
    230    analysis of which lines are so frequent as to constitute noise, and this
    231    usually works better than using this function.
    232 
    233    *charjunk*: A function that accepts a character (a string of length 1), and
    234    returns if the character is junk, or false if not. The default is module-level
    235    function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
    236    blank or tab; it's a bad idea to include newline in this!).
    237 
    238    :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
    239 
    240       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
    241       ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
    242       >>> print(''.join(diff), end="")
    243       - one
    244       ?  ^
    245       + ore
    246       ?  ^
    247       - two
    248       - three
    249       ?  -
    250       + tree
    251       + emu
    252 
    253 
    254 .. function:: restore(sequence, which)
    255 
    256    Return one of the two sequences that generated a delta.
    257 
    258    Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
    259    lines originating from file 1 or 2 (parameter *which*), stripping off line
    260    prefixes.
    261 
    262    Example:
    263 
    264       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
    265       ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
    266       >>> diff = list(diff) # materialize the generated delta into a list
    267       >>> print(''.join(restore(diff, 1)), end="")
    268       one
    269       two
    270       three
    271       >>> print(''.join(restore(diff, 2)), end="")
    272       ore
    273       tree
    274       emu
    275 
    276 
    277 .. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
    278 
    279    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
    280    generating the delta lines) in unified diff format.
    281 
    282    Unified diffs are a compact way of showing just the lines that have changed plus
    283    a few lines of context.  The changes are shown in an inline style (instead of
    284    separate before/after blocks).  The number of context lines is set by *n* which
    285    defaults to three.
    286 
    287    By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
    288    created with a trailing newline.  This is helpful so that inputs created from
    289    :func:`io.IOBase.readlines` result in diffs that are suitable for use with
    290    :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
    291    newlines.
    292 
    293    For inputs that do not have trailing newlines, set the *lineterm* argument to
    294    ``""`` so that the output will be uniformly newline free.
    295 
    296    The context diff format normally has a header for filenames and modification
    297    times.  Any or all of these may be specified using strings for *fromfile*,
    298    *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
    299    expressed in the ISO 8601 format. If not specified, the
    300    strings default to blanks.
    301 
    302 
    303       >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
    304       >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
    305       >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
    306       --- before.py
    307       +++ after.py
    308       @@ -1,4 +1,4 @@
    309       -bacon
    310       -eggs
    311       -ham
    312       +python
    313       +eggy
    314       +hamster
    315        guido
    316 
    317    See :ref:`difflib-interface` for a more detailed example.
    318 
    319 .. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')
    320 
    321    Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
    322    sequence of delta lines (also bytes) in the format returned by *dfunc*.
    323    *dfunc* must be a callable, typically either :func:`unified_diff` or
    324    :func:`context_diff`.
    325 
    326    Allows you to compare data with unknown or inconsistent encoding. All
    327    inputs except *n* must be bytes objects, not str. Works by losslessly
    328    converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
    329    fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
    330    *dfunc* is then converted back to bytes, so the delta lines that you
    331    receive have the same unknown/inconsistent encodings as *a* and *b*.
    332 
    333    .. versionadded:: 3.5
    334 
    335 .. function:: IS_LINE_JUNK(line)
    336 
    337    Return true for ignorable lines.  The line *line* is ignorable if *line* is
    338    blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
    339    default for parameter *linejunk* in :func:`ndiff` in older versions.
    340 
    341 
    342 .. function:: IS_CHARACTER_JUNK(ch)
    343 
    344    Return true for ignorable characters.  The character *ch* is ignorable if *ch*
    345    is a space or tab, otherwise it is not ignorable.  Used as a default for
    346    parameter *charjunk* in :func:`ndiff`.
    347 
    348 
    349 .. seealso::
    350 
    351    `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
    352       Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
    353       was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.
    354 
    355 
    356 .. _sequence-matcher:
    357 
    358 SequenceMatcher Objects
    359 -----------------------
    360 
    361 The :class:`SequenceMatcher` class has this constructor:
    362 
    363 
    364 .. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
    365 
    366    Optional argument *isjunk* must be ``None`` (the default) or a one-argument
    367    function that takes a sequence element and returns true if and only if the
    368    element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
    369    equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
    370    For example, pass::
    371 
    372       lambda x: x in " \t"
    373 
    374    if you're comparing lines as sequences of characters, and don't want to synch up
    375    on blanks or hard tabs.
    376 
    377    The optional arguments *a* and *b* are sequences to be compared; both default to
    378    empty strings.  The elements of both sequences must be :term:`hashable`.
    379 
    380    The optional argument *autojunk* can be used to disable the automatic junk
    381    heuristic.
    382 
    383    .. versionadded:: 3.2
    384       The *autojunk* parameter.
    385 
    386    SequenceMatcher objects get three data attributes: *bjunk* is the
    387    set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
    388    non-junk elements considered popular by the heuristic (if it is not
    389    disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
    390    of positions where they occur. All three are reset whenever *b* is reset
    391    with :meth:`set_seqs` or :meth:`set_seq2`.
    392 
    393    .. versionadded:: 3.2
    394       The *bjunk* and *bpopular* attributes.
    395 
    396    :class:`SequenceMatcher` objects have the following methods:
    397 
    398    .. method:: set_seqs(a, b)
    399 
    400       Set the two sequences to be compared.
    401 
    402    :class:`SequenceMatcher` computes and caches detailed information about the
    403    second sequence, so if you want to compare one sequence against many
    404    sequences, use :meth:`set_seq2` to set the commonly used sequence once and
    405    call :meth:`set_seq1` repeatedly, once for each of the other sequences.
    406 
    407 
    408    .. method:: set_seq1(a)
    409 
    410       Set the first sequence to be compared.  The second sequence to be compared
    411       is not changed.
    412 
    413 
    414    .. method:: set_seq2(b)
    415 
    416       Set the second sequence to be compared.  The first sequence to be compared
    417       is not changed.
    418 
    419 
    420    .. method:: find_longest_match(alo, ahi, blo, bhi)
    421 
    422       Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
    423 
    424       If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
    425       ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
    426       <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
    427       k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
    428       <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
    429       all maximal matching blocks, return one that starts earliest in *a*, and
    430       of all those maximal matching blocks that start earliest in *a*, return
    431       the one that starts earliest in *b*.
    432 
    433          >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
    434          >>> s.find_longest_match(0, 5, 0, 9)
    435          Match(a=0, b=4, size=5)
    436 
    437       If *isjunk* was provided, first the longest matching block is determined
    438       as above, but with the additional restriction that no junk element appears
    439       in the block.  Then that block is extended as far as possible by matching
    440       (only) junk elements on both sides. So the resulting block never matches
    441       on junk except as identical junk happens to be adjacent to an interesting
    442       match.
    443 
    444       Here's the same example as before, but considering blanks to be junk. That
    445       prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
    446       second sequence directly.  Instead only the ``'abcd'`` can match, and
    447       matches the leftmost ``'abcd'`` in the second sequence:
    448 
    449          >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
    450          >>> s.find_longest_match(0, 5, 0, 9)
    451          Match(a=1, b=0, size=4)
    452 
    453       If no blocks match, this returns ``(alo, blo, 0)``.
    454 
    455       This method returns a :term:`named tuple` ``Match(a, b, size)``.
    456 
    457 
    458    .. method:: get_matching_blocks()
    459 
    460       Return list of triples describing matching subsequences. Each triple is of
    461       the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``.  The
    462       triples are monotonically increasing in *i* and *j*.
    463 
    464       The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
    465       is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
    466       are adjacent triples in the list, and the second is not the last triple in
    467       the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
    468       triples always describe non-adjacent equal blocks.
    469 
    470       .. XXX Explain why a dummy is used!
    471 
    472       .. doctest::
    473 
    474          >>> s = SequenceMatcher(None, "abxcd", "abcd")
    475          >>> s.get_matching_blocks()
    476          [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
    477 
    478 
    479    .. method:: get_opcodes()
    480 
    481       Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
    482       of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
    483       0``, and remaining tuples have *i1* equal to the *i2* from the preceding
    484       tuple, and, likewise, *j1* equal to the previous *j2*.
    485 
    486       The *tag* values are strings, with these meanings:
    487 
    488       +---------------+---------------------------------------------+
    489       | Value         | Meaning                                     |
    490       +===============+=============================================+
    491       | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
    492       |               | ``b[j1:j2]``.                               |
    493       +---------------+---------------------------------------------+
    494       | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
    495       |               | ``j1 == j2`` in this case.                  |
    496       +---------------+---------------------------------------------+
    497       | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
    498       |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
    499       |               | this case.                                  |
    500       +---------------+---------------------------------------------+
    501       | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
    502       |               | are equal).                                 |
    503       +---------------+---------------------------------------------+
    504 
    505       For example::
    506 
    507         >>> a = "qabxcd"
    508         >>> b = "abycdf"
    509         >>> s = SequenceMatcher(None, a, b)
    510         >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
    511         ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
    512         ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
    513         delete    a[0:1] --> b[0:0]      'q' --> ''
    514         equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
    515         replace   a[3:4] --> b[2:3]      'x' --> 'y'
    516         equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
    517         insert    a[6:6] --> b[5:6]       '' --> 'f'
    518 
    519 
    520    .. method:: get_grouped_opcodes(n=3)
    521 
    522       Return a :term:`generator` of groups with up to *n* lines of context.
    523 
    524       Starting with the groups returned by :meth:`get_opcodes`, this method
    525       splits out smaller change clusters and eliminates intervening ranges which
    526       have no changes.
    527 
    528       The groups are returned in the same format as :meth:`get_opcodes`.
    529 
    530 
    531    .. method:: ratio()
    532 
    533       Return a measure of the sequences' similarity as a float in the range [0,
    534       1].
    535 
    536       Where T is the total number of elements in both sequences, and M is the
    537       number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
    538       sequences are identical, and ``0.0`` if they have nothing in common.
    539 
    540       This is expensive to compute if :meth:`get_matching_blocks` or
    541       :meth:`get_opcodes` hasn't already been called, in which case you may want
    542       to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
    543       upper bound.
    544 
    545 
    546    .. method:: quick_ratio()
    547 
    548       Return an upper bound on :meth:`ratio` relatively quickly.
    549 
    550 
    551    .. method:: real_quick_ratio()
    552 
    553       Return an upper bound on :meth:`ratio` very quickly.
    554 
    555 
    556 The three methods that return the ratio of matching to total characters can give
    557 different results due to differing levels of approximation, although
    558 :meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
    559 :meth:`ratio`:
    560 
    561    >>> s = SequenceMatcher(None, "abcd", "bcde")
    562    >>> s.ratio()
    563    0.75
    564    >>> s.quick_ratio()
    565    0.75
    566    >>> s.real_quick_ratio()
    567    1.0
    568 
    569 
    570 .. _sequencematcher-examples:
    571 
    572 SequenceMatcher Examples
    573 ------------------------
    574 
    575 This example compares two strings, considering blanks to be "junk":
    576 
    577    >>> s = SequenceMatcher(lambda x: x == " ",
    578    ...                     "private Thread currentThread;",
    579    ...                     "private volatile Thread currentThread;")
    580 
    581 :meth:`ratio` returns a float in [0, 1], measuring the similarity of the
    582 sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
    583 sequences are close matches:
    584 
    585    >>> print(round(s.ratio(), 3))
    586    0.866
    587 
    588 If you're only interested in where the sequences match,
    589 :meth:`get_matching_blocks` is handy:
    590 
    591    >>> for block in s.get_matching_blocks():
    592    ...     print("a[%d] and b[%d] match for %d elements" % block)
    593    a[0] and b[0] match for 8 elements
    594    a[8] and b[17] match for 21 elements
    595    a[29] and b[38] match for 0 elements
    596 
    597 Note that the last tuple returned by :meth:`get_matching_blocks` is always a
    598 dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
    599 tuple element (number of elements matched) is ``0``.
    600 
    601 If you want to know how to change the first sequence into the second, use
    602 :meth:`get_opcodes`:
    603 
    604    >>> for opcode in s.get_opcodes():
    605    ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    606     equal a[0:8] b[0:8]
    607    insert a[8:8] b[8:17]
    608     equal a[8:29] b[17:38]
    609 
    610 .. seealso::
    611 
    612    * The :func:`get_close_matches` function in this module which shows how
    613      simple code building on :class:`SequenceMatcher` can be used to do useful
    614      work.
    615 
    616    * `Simple version control recipe
    617      <https://code.activestate.com/recipes/576729/>`_ for a small application
    618      built with :class:`SequenceMatcher`.
    619 
    620 
    621 .. _differ-objects:
    622 
    623 Differ Objects
    624 --------------
    625 
    626 Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
    627 diffs. To the contrary, minimal diffs are often counter-intuitive, because they
    628 synch up anywhere possible, sometimes accidental matches 100 pages apart.
    629 Restricting synch points to contiguous matches preserves some notion of
    630 locality, at the occasional cost of producing a longer diff.
    631 
    632 The :class:`Differ` class has this constructor:
    633 
    634 
    635 .. class:: Differ(linejunk=None, charjunk=None)
    636 
    637    Optional keyword parameters *linejunk* and *charjunk* are for filter functions
    638    (or ``None``):
    639 
    640    *linejunk*: A function that accepts a single string argument, and returns true
    641    if the string is junk.  The default is ``None``, meaning that no line is
    642    considered junk.
    643 
    644    *charjunk*: A function that accepts a single character argument (a string of
    645    length 1), and returns true if the character is junk. The default is ``None``,
    646    meaning that no character is considered junk.
    647 
    648    These junk-filtering functions speed up matching to find
    649    differences and do not cause any differing lines or characters to
    650    be ignored.  Read the description of the
    651    :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
    652    parameter for an explanation.
    653 
    654    :class:`Differ` objects are used (deltas generated) via a single method:
    655 
    656 
    657    .. method:: Differ.compare(a, b)
    658 
    659       Compare two sequences of lines, and generate the delta (a sequence of lines).
    660 
    661       Each sequence must contain individual single-line strings ending with
    662       newlines.  Such sequences can be obtained from the
    663       :meth:`~io.IOBase.readlines` method of file-like objects.  The delta
    664       generated also consists of newline-terminated strings, ready to be
    665       printed as-is via the :meth:`~io.IOBase.writelines` method of a
    666       file-like object.
    667 
    668 
    669 .. _differ-examples:
    670 
    671 Differ Example
    672 --------------
    673 
    674 This example compares two texts. First we set up the texts, sequences of
    675 individual single-line strings ending with newlines (such sequences can also be
    676 obtained from the :meth:`~io.BaseIO.readlines` method of file-like objects):
    677 
    678    >>> text1 = '''  1. Beautiful is better than ugly.
    679    ...   2. Explicit is better than implicit.
    680    ...   3. Simple is better than complex.
    681    ...   4. Complex is better than complicated.
    682    ... '''.splitlines(keepends=True)
    683    >>> len(text1)
    684    4
    685    >>> text1[0][-1]
    686    '\n'
    687    >>> text2 = '''  1. Beautiful is better than ugly.
    688    ...   3.   Simple is better than complex.
    689    ...   4. Complicated is better than complex.
    690    ...   5. Flat is better than nested.
    691    ... '''.splitlines(keepends=True)
    692 
    693 Next we instantiate a Differ object:
    694 
    695    >>> d = Differ()
    696 
    697 Note that when instantiating a :class:`Differ` object we may pass functions to
    698 filter out line and character "junk."  See the :meth:`Differ` constructor for
    699 details.
    700 
    701 Finally, we compare the two:
    702 
    703    >>> result = list(d.compare(text1, text2))
    704 
    705 ``result`` is a list of strings, so let's pretty-print it:
    706 
    707    >>> from pprint import pprint
    708    >>> pprint(result)
    709    ['    1. Beautiful is better than ugly.\n',
    710     '-   2. Explicit is better than implicit.\n',
    711     '-   3. Simple is better than complex.\n',
    712     '+   3.   Simple is better than complex.\n',
    713     '?     ++\n',
    714     '-   4. Complex is better than complicated.\n',
    715     '?            ^                     ---- ^\n',
    716     '+   4. Complicated is better than complex.\n',
    717     '?           ++++ ^                      ^\n',
    718     '+   5. Flat is better than nested.\n']
    719 
    720 As a single multi-line string it looks like this:
    721 
    722    >>> import sys
    723    >>> sys.stdout.writelines(result)
    724        1. Beautiful is better than ugly.
    725    -   2. Explicit is better than implicit.
    726    -   3. Simple is better than complex.
    727    +   3.   Simple is better than complex.
    728    ?     ++
    729    -   4. Complex is better than complicated.
    730    ?            ^                     ---- ^
    731    +   4. Complicated is better than complex.
    732    ?           ++++ ^                      ^
    733    +   5. Flat is better than nested.
    734 
    735 
    736 .. _difflib-interface:
    737 
    738 A command-line interface to difflib
    739 -----------------------------------
    740 
    741 This example shows how to use difflib to create a ``diff``-like utility.
    742 It is also contained in the Python source distribution, as
    743 :file:`Tools/scripts/diff.py`.
    744 
    745 .. literalinclude:: ../../Tools/scripts/diff.py
    746