1 :mod:`difflib` --- Helpers for computing deltas 2 =============================================== 3 4 .. module:: difflib 5 :synopsis: Helpers for computing differences between objects. 6 7 .. moduleauthor:: Tim Peters <tim_one (a] users.sourceforge.net> 8 .. sectionauthor:: Tim Peters <tim_one (a] users.sourceforge.net> 9 .. Markup by Fred L. Drake, Jr. <fdrake (a] acm.org> 10 11 **Source code:** :source:`Lib/difflib.py` 12 13 .. testsetup:: 14 15 import sys 16 from difflib import * 17 18 -------------- 19 20 This module provides classes and functions for comparing sequences. It 21 can be used for example, for comparing files, and can produce difference 22 information in various formats, including HTML and context and unified 23 diffs. For comparing directories and files, see also, the :mod:`filecmp` module. 24 25 26 .. class:: SequenceMatcher 27 28 This is a flexible class for comparing pairs of sequences of any type, so long 29 as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a 30 little fancier than, an algorithm published in the late 1980's by Ratcliff and 31 Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to 32 find the longest contiguous matching subsequence that contains no "junk" 33 elements; these "junk" elements are ones that are uninteresting in some 34 sense, such as blank lines or whitespace. (Handling junk is an 35 extension to the Ratcliff and Obershelp algorithm.) The same 36 idea is then applied recursively to the pieces of the sequences to the left and 37 to the right of the matching subsequence. This does not yield minimal edit 38 sequences, but does tend to yield matches that "look right" to people. 39 40 **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst 41 case and quadratic time in the expected case. :class:`SequenceMatcher` is 42 quadratic time for the worst case and has expected-case behavior dependent in a 43 complicated way on how many elements the sequences have in common; best case 44 time is linear. 45 46 **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that 47 automatically treats certain sequence items as junk. The heuristic counts how many 48 times each individual item appears in the sequence. If an item's duplicates (after 49 the first one) account for more than 1% of the sequence and the sequence is at least 50 200 items long, this item is marked as "popular" and is treated as junk for 51 the purpose of sequence matching. This heuristic can be turned off by setting 52 the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`. 53 54 .. versionadded:: 3.2 55 The *autojunk* parameter. 56 57 58 .. class:: Differ 59 60 This is a class for comparing sequences of lines of text, and producing 61 human-readable differences or deltas. Differ uses :class:`SequenceMatcher` 62 both to compare sequences of lines, and to compare sequences of characters 63 within similar (near-matching) lines. 64 65 Each line of a :class:`Differ` delta begins with a two-letter code: 66 67 +----------+-------------------------------------------+ 68 | Code | Meaning | 69 +==========+===========================================+ 70 | ``'- '`` | line unique to sequence 1 | 71 +----------+-------------------------------------------+ 72 | ``'+ '`` | line unique to sequence 2 | 73 +----------+-------------------------------------------+ 74 | ``' '`` | line common to both sequences | 75 +----------+-------------------------------------------+ 76 | ``'? '`` | line not present in either input sequence | 77 +----------+-------------------------------------------+ 78 79 Lines beginning with '``?``' attempt to guide the eye to intraline differences, 80 and were not present in either input sequence. These lines can be confusing if 81 the sequences contain tab characters. 82 83 84 .. class:: HtmlDiff 85 86 This class can be used to create an HTML table (or a complete HTML file 87 containing the table) showing a side by side, line by line comparison of text 88 with inter-line and intra-line change highlights. The table can be generated in 89 either full or contextual difference mode. 90 91 The constructor for this class is: 92 93 94 .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK) 95 96 Initializes instance of :class:`HtmlDiff`. 97 98 *tabsize* is an optional keyword argument to specify tab stop spacing and 99 defaults to ``8``. 100 101 *wrapcolumn* is an optional keyword to specify column number where lines are 102 broken and wrapped, defaults to ``None`` where lines are not wrapped. 103 104 *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff` 105 (used by :class:`HtmlDiff` to generate the side by side HTML differences). See 106 :func:`ndiff` documentation for argument default values and descriptions. 107 108 The following methods are public: 109 110 .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \ 111 numlines=5, *, charset='utf-8') 112 113 Compares *fromlines* and *tolines* (lists of strings) and returns a string which 114 is a complete HTML file containing a table showing line by line differences with 115 inter-line and intra-line changes highlighted. 116 117 *fromdesc* and *todesc* are optional keyword arguments to specify from/to file 118 column header strings (both default to an empty string). 119 120 *context* and *numlines* are both optional keyword arguments. Set *context* to 121 ``True`` when contextual differences are to be shown, else the default is 122 ``False`` to show the full files. *numlines* defaults to ``5``. When *context* 123 is ``True`` *numlines* controls the number of context lines which surround the 124 difference highlights. When *context* is ``False`` *numlines* controls the 125 number of lines which are shown before a difference highlight when using the 126 "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place 127 the next difference highlight at the top of the browser without any leading 128 context). 129 130 .. versionchanged:: 3.5 131 *charset* keyword-only argument was added. The default charset of 132 HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``. 133 134 .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5) 135 136 Compares *fromlines* and *tolines* (lists of strings) and returns a string which 137 is a complete HTML table showing line by line differences with inter-line and 138 intra-line changes highlighted. 139 140 The arguments for this method are the same as those for the :meth:`make_file` 141 method. 142 143 :file:`Tools/scripts/diff.py` is a command-line front-end to this class and 144 contains a good example of its use. 145 146 147 .. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n') 148 149 Compare *a* and *b* (lists of strings); return a delta (a :term:`generator` 150 generating the delta lines) in context diff format. 151 152 Context diffs are a compact way of showing just the lines that have changed plus 153 a few lines of context. The changes are shown in a before/after style. The 154 number of context lines is set by *n* which defaults to three. 155 156 By default, the diff control lines (those with ``***`` or ``---``) are created 157 with a trailing newline. This is helpful so that inputs created from 158 :func:`io.IOBase.readlines` result in diffs that are suitable for use with 159 :func:`io.IOBase.writelines` since both the inputs and outputs have trailing 160 newlines. 161 162 For inputs that do not have trailing newlines, set the *lineterm* argument to 163 ``""`` so that the output will be uniformly newline free. 164 165 The context diff format normally has a header for filenames and modification 166 times. Any or all of these may be specified using strings for *fromfile*, 167 *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally 168 expressed in the ISO 8601 format. If not specified, the 169 strings default to blanks. 170 171 >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] 172 >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] 173 >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py')) 174 *** before.py 175 --- after.py 176 *************** 177 *** 1,4 **** 178 ! bacon 179 ! eggs 180 ! ham 181 guido 182 --- 1,4 ---- 183 ! python 184 ! eggy 185 ! hamster 186 guido 187 188 See :ref:`difflib-interface` for a more detailed example. 189 190 191 .. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6) 192 193 Return a list of the best "good enough" matches. *word* is a sequence for which 194 close matches are desired (typically a string), and *possibilities* is a list of 195 sequences against which to match *word* (typically a list of strings). 196 197 Optional argument *n* (default ``3``) is the maximum number of close matches to 198 return; *n* must be greater than ``0``. 199 200 Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1]. 201 Possibilities that don't score at least that similar to *word* are ignored. 202 203 The best (no more than *n*) matches among the possibilities are returned in a 204 list, sorted by similarity score, most similar first. 205 206 >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy']) 207 ['apple', 'ape'] 208 >>> import keyword 209 >>> get_close_matches('wheel', keyword.kwlist) 210 ['while'] 211 >>> get_close_matches('pineapple', keyword.kwlist) 212 [] 213 >>> get_close_matches('accept', keyword.kwlist) 214 ['except'] 215 216 217 .. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK) 218 219 Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style 220 delta (a :term:`generator` generating the delta lines). 221 222 Optional keyword parameters *linejunk* and *charjunk* are filtering functions 223 (or ``None``): 224 225 *linejunk*: A function that accepts a single string argument, and returns 226 true if the string is junk, or false if not. The default is ``None``. There 227 is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines 228 without visible characters, except for at most one pound character (``'#'``) 229 -- however the underlying :class:`SequenceMatcher` class does a dynamic 230 analysis of which lines are so frequent as to constitute noise, and this 231 usually works better than using this function. 232 233 *charjunk*: A function that accepts a character (a string of length 1), and 234 returns if the character is junk, or false if not. The default is module-level 235 function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a 236 blank or tab; it's a bad idea to include newline in this!). 237 238 :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function. 239 240 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), 241 ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) 242 >>> print(''.join(diff), end="") 243 - one 244 ? ^ 245 + ore 246 ? ^ 247 - two 248 - three 249 ? - 250 + tree 251 + emu 252 253 254 .. function:: restore(sequence, which) 255 256 Return one of the two sequences that generated a delta. 257 258 Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract 259 lines originating from file 1 or 2 (parameter *which*), stripping off line 260 prefixes. 261 262 Example: 263 264 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), 265 ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) 266 >>> diff = list(diff) # materialize the generated delta into a list 267 >>> print(''.join(restore(diff, 1)), end="") 268 one 269 two 270 three 271 >>> print(''.join(restore(diff, 2)), end="") 272 ore 273 tree 274 emu 275 276 277 .. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n') 278 279 Compare *a* and *b* (lists of strings); return a delta (a :term:`generator` 280 generating the delta lines) in unified diff format. 281 282 Unified diffs are a compact way of showing just the lines that have changed plus 283 a few lines of context. The changes are shown in an inline style (instead of 284 separate before/after blocks). The number of context lines is set by *n* which 285 defaults to three. 286 287 By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are 288 created with a trailing newline. This is helpful so that inputs created from 289 :func:`io.IOBase.readlines` result in diffs that are suitable for use with 290 :func:`io.IOBase.writelines` since both the inputs and outputs have trailing 291 newlines. 292 293 For inputs that do not have trailing newlines, set the *lineterm* argument to 294 ``""`` so that the output will be uniformly newline free. 295 296 The context diff format normally has a header for filenames and modification 297 times. Any or all of these may be specified using strings for *fromfile*, 298 *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally 299 expressed in the ISO 8601 format. If not specified, the 300 strings default to blanks. 301 302 303 >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] 304 >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] 305 >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py')) 306 --- before.py 307 +++ after.py 308 @@ -1,4 +1,4 @@ 309 -bacon 310 -eggs 311 -ham 312 +python 313 +eggy 314 +hamster 315 guido 316 317 See :ref:`difflib-interface` for a more detailed example. 318 319 .. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n') 320 321 Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a 322 sequence of delta lines (also bytes) in the format returned by *dfunc*. 323 *dfunc* must be a callable, typically either :func:`unified_diff` or 324 :func:`context_diff`. 325 326 Allows you to compare data with unknown or inconsistent encoding. All 327 inputs except *n* must be bytes objects, not str. Works by losslessly 328 converting all inputs (except *n*) to str, and calling ``dfunc(a, b, 329 fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of 330 *dfunc* is then converted back to bytes, so the delta lines that you 331 receive have the same unknown/inconsistent encodings as *a* and *b*. 332 333 .. versionadded:: 3.5 334 335 .. function:: IS_LINE_JUNK(line) 336 337 Return true for ignorable lines. The line *line* is ignorable if *line* is 338 blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a 339 default for parameter *linejunk* in :func:`ndiff` in older versions. 340 341 342 .. function:: IS_CHARACTER_JUNK(ch) 343 344 Return true for ignorable characters. The character *ch* is ignorable if *ch* 345 is a space or tab, otherwise it is not ignorable. Used as a default for 346 parameter *charjunk* in :func:`ndiff`. 347 348 349 .. seealso:: 350 351 `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_ 352 Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This 353 was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988. 354 355 356 .. _sequence-matcher: 357 358 SequenceMatcher Objects 359 ----------------------- 360 361 The :class:`SequenceMatcher` class has this constructor: 362 363 364 .. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) 365 366 Optional argument *isjunk* must be ``None`` (the default) or a one-argument 367 function that takes a sequence element and returns true if and only if the 368 element is "junk" and should be ignored. Passing ``None`` for *isjunk* is 369 equivalent to passing ``lambda x: 0``; in other words, no elements are ignored. 370 For example, pass:: 371 372 lambda x: x in " \t" 373 374 if you're comparing lines as sequences of characters, and don't want to synch up 375 on blanks or hard tabs. 376 377 The optional arguments *a* and *b* are sequences to be compared; both default to 378 empty strings. The elements of both sequences must be :term:`hashable`. 379 380 The optional argument *autojunk* can be used to disable the automatic junk 381 heuristic. 382 383 .. versionadded:: 3.2 384 The *autojunk* parameter. 385 386 SequenceMatcher objects get three data attributes: *bjunk* is the 387 set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of 388 non-junk elements considered popular by the heuristic (if it is not 389 disabled); *b2j* is a dict mapping the remaining elements of *b* to a list 390 of positions where they occur. All three are reset whenever *b* is reset 391 with :meth:`set_seqs` or :meth:`set_seq2`. 392 393 .. versionadded:: 3.2 394 The *bjunk* and *bpopular* attributes. 395 396 :class:`SequenceMatcher` objects have the following methods: 397 398 .. method:: set_seqs(a, b) 399 400 Set the two sequences to be compared. 401 402 :class:`SequenceMatcher` computes and caches detailed information about the 403 second sequence, so if you want to compare one sequence against many 404 sequences, use :meth:`set_seq2` to set the commonly used sequence once and 405 call :meth:`set_seq1` repeatedly, once for each of the other sequences. 406 407 408 .. method:: set_seq1(a) 409 410 Set the first sequence to be compared. The second sequence to be compared 411 is not changed. 412 413 414 .. method:: set_seq2(b) 415 416 Set the second sequence to be compared. The first sequence to be compared 417 is not changed. 418 419 420 .. method:: find_longest_match(alo, ahi, blo, bhi) 421 422 Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``. 423 424 If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns 425 ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo 426 <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j', 427 k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i 428 <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of 429 all maximal matching blocks, return one that starts earliest in *a*, and 430 of all those maximal matching blocks that start earliest in *a*, return 431 the one that starts earliest in *b*. 432 433 >>> s = SequenceMatcher(None, " abcd", "abcd abcd") 434 >>> s.find_longest_match(0, 5, 0, 9) 435 Match(a=0, b=4, size=5) 436 437 If *isjunk* was provided, first the longest matching block is determined 438 as above, but with the additional restriction that no junk element appears 439 in the block. Then that block is extended as far as possible by matching 440 (only) junk elements on both sides. So the resulting block never matches 441 on junk except as identical junk happens to be adjacent to an interesting 442 match. 443 444 Here's the same example as before, but considering blanks to be junk. That 445 prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the 446 second sequence directly. Instead only the ``'abcd'`` can match, and 447 matches the leftmost ``'abcd'`` in the second sequence: 448 449 >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") 450 >>> s.find_longest_match(0, 5, 0, 9) 451 Match(a=1, b=0, size=4) 452 453 If no blocks match, this returns ``(alo, blo, 0)``. 454 455 This method returns a :term:`named tuple` ``Match(a, b, size)``. 456 457 458 .. method:: get_matching_blocks() 459 460 Return list of triples describing non-overlapping matching subsequences. 461 Each triple is of the form ``(i, j, n)``, 462 and means that ``a[i:i+n] == b[j:j+n]``. The 463 triples are monotonically increasing in *i* and *j*. 464 465 The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It 466 is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')`` 467 are adjacent triples in the list, and the second is not the last triple in 468 the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent 469 triples always describe non-adjacent equal blocks. 470 471 .. XXX Explain why a dummy is used! 472 473 .. doctest:: 474 475 >>> s = SequenceMatcher(None, "abxcd", "abcd") 476 >>> s.get_matching_blocks() 477 [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)] 478 479 480 .. method:: get_opcodes() 481 482 Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is 483 of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 == 484 0``, and remaining tuples have *i1* equal to the *i2* from the preceding 485 tuple, and, likewise, *j1* equal to the previous *j2*. 486 487 The *tag* values are strings, with these meanings: 488 489 +---------------+---------------------------------------------+ 490 | Value | Meaning | 491 +===============+=============================================+ 492 | ``'replace'`` | ``a[i1:i2]`` should be replaced by | 493 | | ``b[j1:j2]``. | 494 +---------------+---------------------------------------------+ 495 | ``'delete'`` | ``a[i1:i2]`` should be deleted. Note that | 496 | | ``j1 == j2`` in this case. | 497 +---------------+---------------------------------------------+ 498 | ``'insert'`` | ``b[j1:j2]`` should be inserted at | 499 | | ``a[i1:i1]``. Note that ``i1 == i2`` in | 500 | | this case. | 501 +---------------+---------------------------------------------+ 502 | ``'equal'`` | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences | 503 | | are equal). | 504 +---------------+---------------------------------------------+ 505 506 For example:: 507 508 >>> a = "qabxcd" 509 >>> b = "abycdf" 510 >>> s = SequenceMatcher(None, a, b) 511 >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): 512 ... print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format( 513 ... tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2])) 514 delete a[0:1] --> b[0:0] 'q' --> '' 515 equal a[1:3] --> b[0:2] 'ab' --> 'ab' 516 replace a[3:4] --> b[2:3] 'x' --> 'y' 517 equal a[4:6] --> b[3:5] 'cd' --> 'cd' 518 insert a[6:6] --> b[5:6] '' --> 'f' 519 520 521 .. method:: get_grouped_opcodes(n=3) 522 523 Return a :term:`generator` of groups with up to *n* lines of context. 524 525 Starting with the groups returned by :meth:`get_opcodes`, this method 526 splits out smaller change clusters and eliminates intervening ranges which 527 have no changes. 528 529 The groups are returned in the same format as :meth:`get_opcodes`. 530 531 532 .. method:: ratio() 533 534 Return a measure of the sequences' similarity as a float in the range [0, 535 1]. 536 537 Where T is the total number of elements in both sequences, and M is the 538 number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the 539 sequences are identical, and ``0.0`` if they have nothing in common. 540 541 This is expensive to compute if :meth:`get_matching_blocks` or 542 :meth:`get_opcodes` hasn't already been called, in which case you may want 543 to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an 544 upper bound. 545 546 547 .. method:: quick_ratio() 548 549 Return an upper bound on :meth:`ratio` relatively quickly. 550 551 552 .. method:: real_quick_ratio() 553 554 Return an upper bound on :meth:`ratio` very quickly. 555 556 557 The three methods that return the ratio of matching to total characters can give 558 different results due to differing levels of approximation, although 559 :meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as 560 :meth:`ratio`: 561 562 >>> s = SequenceMatcher(None, "abcd", "bcde") 563 >>> s.ratio() 564 0.75 565 >>> s.quick_ratio() 566 0.75 567 >>> s.real_quick_ratio() 568 1.0 569 570 571 .. _sequencematcher-examples: 572 573 SequenceMatcher Examples 574 ------------------------ 575 576 This example compares two strings, considering blanks to be "junk": 577 578 >>> s = SequenceMatcher(lambda x: x == " ", 579 ... "private Thread currentThread;", 580 ... "private volatile Thread currentThread;") 581 582 :meth:`ratio` returns a float in [0, 1], measuring the similarity of the 583 sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the 584 sequences are close matches: 585 586 >>> print(round(s.ratio(), 3)) 587 0.866 588 589 If you're only interested in where the sequences match, 590 :meth:`get_matching_blocks` is handy: 591 592 >>> for block in s.get_matching_blocks(): 593 ... print("a[%d] and b[%d] match for %d elements" % block) 594 a[0] and b[0] match for 8 elements 595 a[8] and b[17] match for 21 elements 596 a[29] and b[38] match for 0 elements 597 598 Note that the last tuple returned by :meth:`get_matching_blocks` is always a 599 dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last 600 tuple element (number of elements matched) is ``0``. 601 602 If you want to know how to change the first sequence into the second, use 603 :meth:`get_opcodes`: 604 605 >>> for opcode in s.get_opcodes(): 606 ... print("%6s a[%d:%d] b[%d:%d]" % opcode) 607 equal a[0:8] b[0:8] 608 insert a[8:8] b[8:17] 609 equal a[8:29] b[17:38] 610 611 .. seealso:: 612 613 * The :func:`get_close_matches` function in this module which shows how 614 simple code building on :class:`SequenceMatcher` can be used to do useful 615 work. 616 617 * `Simple version control recipe 618 <https://code.activestate.com/recipes/576729/>`_ for a small application 619 built with :class:`SequenceMatcher`. 620 621 622 .. _differ-objects: 623 624 Differ Objects 625 -------------- 626 627 Note that :class:`Differ`\ -generated deltas make no claim to be **minimal** 628 diffs. To the contrary, minimal diffs are often counter-intuitive, because they 629 synch up anywhere possible, sometimes accidental matches 100 pages apart. 630 Restricting synch points to contiguous matches preserves some notion of 631 locality, at the occasional cost of producing a longer diff. 632 633 The :class:`Differ` class has this constructor: 634 635 636 .. class:: Differ(linejunk=None, charjunk=None) 637 638 Optional keyword parameters *linejunk* and *charjunk* are for filter functions 639 (or ``None``): 640 641 *linejunk*: A function that accepts a single string argument, and returns true 642 if the string is junk. The default is ``None``, meaning that no line is 643 considered junk. 644 645 *charjunk*: A function that accepts a single character argument (a string of 646 length 1), and returns true if the character is junk. The default is ``None``, 647 meaning that no character is considered junk. 648 649 These junk-filtering functions speed up matching to find 650 differences and do not cause any differing lines or characters to 651 be ignored. Read the description of the 652 :meth:`~SequenceMatcher.find_longest_match` method's *isjunk* 653 parameter for an explanation. 654 655 :class:`Differ` objects are used (deltas generated) via a single method: 656 657 658 .. method:: Differ.compare(a, b) 659 660 Compare two sequences of lines, and generate the delta (a sequence of lines). 661 662 Each sequence must contain individual single-line strings ending with 663 newlines. Such sequences can be obtained from the 664 :meth:`~io.IOBase.readlines` method of file-like objects. The delta 665 generated also consists of newline-terminated strings, ready to be 666 printed as-is via the :meth:`~io.IOBase.writelines` method of a 667 file-like object. 668 669 670 .. _differ-examples: 671 672 Differ Example 673 -------------- 674 675 This example compares two texts. First we set up the texts, sequences of 676 individual single-line strings ending with newlines (such sequences can also be 677 obtained from the :meth:`~io.BaseIO.readlines` method of file-like objects): 678 679 >>> text1 = ''' 1. Beautiful is better than ugly. 680 ... 2. Explicit is better than implicit. 681 ... 3. Simple is better than complex. 682 ... 4. Complex is better than complicated. 683 ... '''.splitlines(keepends=True) 684 >>> len(text1) 685 4 686 >>> text1[0][-1] 687 '\n' 688 >>> text2 = ''' 1. Beautiful is better than ugly. 689 ... 3. Simple is better than complex. 690 ... 4. Complicated is better than complex. 691 ... 5. Flat is better than nested. 692 ... '''.splitlines(keepends=True) 693 694 Next we instantiate a Differ object: 695 696 >>> d = Differ() 697 698 Note that when instantiating a :class:`Differ` object we may pass functions to 699 filter out line and character "junk." See the :meth:`Differ` constructor for 700 details. 701 702 Finally, we compare the two: 703 704 >>> result = list(d.compare(text1, text2)) 705 706 ``result`` is a list of strings, so let's pretty-print it: 707 708 >>> from pprint import pprint 709 >>> pprint(result) 710 [' 1. Beautiful is better than ugly.\n', 711 '- 2. Explicit is better than implicit.\n', 712 '- 3. Simple is better than complex.\n', 713 '+ 3. Simple is better than complex.\n', 714 '? ++\n', 715 '- 4. Complex is better than complicated.\n', 716 '? ^ ---- ^\n', 717 '+ 4. Complicated is better than complex.\n', 718 '? ++++ ^ ^\n', 719 '+ 5. Flat is better than nested.\n'] 720 721 As a single multi-line string it looks like this: 722 723 >>> import sys 724 >>> sys.stdout.writelines(result) 725 1. Beautiful is better than ugly. 726 - 2. Explicit is better than implicit. 727 - 3. Simple is better than complex. 728 + 3. Simple is better than complex. 729 ? ++ 730 - 4. Complex is better than complicated. 731 ? ^ ---- ^ 732 + 4. Complicated is better than complex. 733 ? ++++ ^ ^ 734 + 5. Flat is better than nested. 735 736 737 .. _difflib-interface: 738 739 A command-line interface to difflib 740 ----------------------------------- 741 742 This example shows how to use difflib to create a ``diff``-like utility. 743 It is also contained in the Python source distribution, as 744 :file:`Tools/scripts/diff.py`. 745 746 .. literalinclude:: ../../Tools/scripts/diff.py 747