1 ******************************** 2 Functional Programming HOWTO 3 ******************************** 4 5 :Author: A. M. Kuchling 6 :Release: 0.32 7 8 In this document, we'll take a tour of Python's features suitable for 9 implementing programs in a functional style. After an introduction to the 10 concepts of functional programming, we'll look at language features such as 11 :term:`iterator`\s and :term:`generator`\s and relevant library modules such as 12 :mod:`itertools` and :mod:`functools`. 13 14 15 Introduction 16 ============ 17 18 This section explains the basic concept of functional programming; if 19 you're just interested in learning about Python language features, 20 skip to the next section on :ref:`functional-howto-iterators`. 21 22 Programming languages support decomposing problems in several different ways: 23 24 * Most programming languages are **procedural**: programs are lists of 25 instructions that tell the computer what to do with the program's input. C, 26 Pascal, and even Unix shells are procedural languages. 27 28 * In **declarative** languages, you write a specification that describes the 29 problem to be solved, and the language implementation figures out how to 30 perform the computation efficiently. SQL is the declarative language you're 31 most likely to be familiar with; a SQL query describes the data set you want 32 to retrieve, and the SQL engine decides whether to scan tables or use indexes, 33 which subclauses should be performed first, etc. 34 35 * **Object-oriented** programs manipulate collections of objects. Objects have 36 internal state and support methods that query or modify this internal state in 37 some way. Smalltalk and Java are object-oriented languages. C++ and Python 38 are languages that support object-oriented programming, but don't force the 39 use of object-oriented features. 40 41 * **Functional** programming decomposes a problem into a set of functions. 42 Ideally, functions only take inputs and produce outputs, and don't have any 43 internal state that affects the output produced for a given input. Well-known 44 functional languages include the ML family (Standard ML, OCaml, and other 45 variants) and Haskell. 46 47 The designers of some computer languages choose to emphasize one 48 particular approach to programming. This often makes it difficult to 49 write programs that use a different approach. Other languages are 50 multi-paradigm languages that support several different approaches. 51 Lisp, C++, and Python are multi-paradigm; you can write programs or 52 libraries that are largely procedural, object-oriented, or functional 53 in all of these languages. In a large program, different sections 54 might be written using different approaches; the GUI might be 55 object-oriented while the processing logic is procedural or 56 functional, for example. 57 58 In a functional program, input flows through a set of functions. Each function 59 operates on its input and produces some output. Functional style discourages 60 functions with side effects that modify internal state or make other changes 61 that aren't visible in the function's return value. Functions that have no side 62 effects at all are called **purely functional**. Avoiding side effects means 63 not using data structures that get updated as a program runs; every function's 64 output must only depend on its input. 65 66 Some languages are very strict about purity and don't even have assignment 67 statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all 68 side effects. Printing to the screen or writing to a disk file are side 69 effects, for example. For example, in Python a call to the :func:`print` or 70 :func:`time.sleep` function both return no useful value; they're only called for 71 their side effects of sending some text to the screen or pausing execution for a 72 second. 73 74 Python programs written in functional style usually won't go to the extreme of 75 avoiding all I/O or all assignments; instead, they'll provide a 76 functional-appearing interface but will use non-functional features internally. 77 For example, the implementation of a function will still use assignments to 78 local variables, but won't modify global variables or have other side effects. 79 80 Functional programming can be considered the opposite of object-oriented 81 programming. Objects are little capsules containing some internal state along 82 with a collection of method calls that let you modify this state, and programs 83 consist of making the right set of state changes. Functional programming wants 84 to avoid state changes as much as possible and works with data flowing between 85 functions. In Python you might combine the two approaches by writing functions 86 that take and return instances representing objects in your application (e-mail 87 messages, transactions, etc.). 88 89 Functional design may seem like an odd constraint to work under. Why should you 90 avoid objects and side effects? There are theoretical and practical advantages 91 to the functional style: 92 93 * Formal provability. 94 * Modularity. 95 * Composability. 96 * Ease of debugging and testing. 97 98 99 Formal provability 100 ------------------ 101 102 A theoretical benefit is that it's easier to construct a mathematical proof that 103 a functional program is correct. 104 105 For a long time researchers have been interested in finding ways to 106 mathematically prove programs correct. This is different from testing a program 107 on numerous inputs and concluding that its output is usually correct, or reading 108 a program's source code and concluding that the code looks right; the goal is 109 instead a rigorous proof that a program produces the right result for all 110 possible inputs. 111 112 The technique used to prove programs correct is to write down **invariants**, 113 properties of the input data and of the program's variables that are always 114 true. For each line of code, you then show that if invariants X and Y are true 115 **before** the line is executed, the slightly different invariants X' and Y' are 116 true **after** the line is executed. This continues until you reach the end of 117 the program, at which point the invariants should match the desired conditions 118 on the program's output. 119 120 Functional programming's avoidance of assignments arose because assignments are 121 difficult to handle with this technique; assignments can break invariants that 122 were true before the assignment without producing any new invariants that can be 123 propagated onward. 124 125 Unfortunately, proving programs correct is largely impractical and not relevant 126 to Python software. Even trivial programs require proofs that are several pages 127 long; the proof of correctness for a moderately complicated program would be 128 enormous, and few or none of the programs you use daily (the Python interpreter, 129 your XML parser, your web browser) could be proven correct. Even if you wrote 130 down or generated a proof, there would then be the question of verifying the 131 proof; maybe there's an error in it, and you wrongly believe you've proved the 132 program correct. 133 134 135 Modularity 136 ---------- 137 138 A more practical benefit of functional programming is that it forces you to 139 break apart your problem into small pieces. Programs are more modular as a 140 result. It's easier to specify and write a small function that does one thing 141 than a large function that performs a complicated transformation. Small 142 functions are also easier to read and to check for errors. 143 144 145 Ease of debugging and testing 146 ----------------------------- 147 148 Testing and debugging a functional-style program is easier. 149 150 Debugging is simplified because functions are generally small and clearly 151 specified. When a program doesn't work, each function is an interface point 152 where you can check that the data are correct. You can look at the intermediate 153 inputs and outputs to quickly isolate the function that's responsible for a bug. 154 155 Testing is easier because each function is a potential subject for a unit test. 156 Functions don't depend on system state that needs to be replicated before 157 running a test; instead you only have to synthesize the right input and then 158 check that the output matches expectations. 159 160 161 Composability 162 ------------- 163 164 As you work on a functional-style program, you'll write a number of functions 165 with varying inputs and outputs. Some of these functions will be unavoidably 166 specialized to a particular application, but others will be useful in a wide 167 variety of programs. For example, a function that takes a directory path and 168 returns all the XML files in the directory, or a function that takes a filename 169 and returns its contents, can be applied to many different situations. 170 171 Over time you'll form a personal library of utilities. Often you'll assemble 172 new programs by arranging existing functions in a new configuration and writing 173 a few functions specialized for the current task. 174 175 176 .. _functional-howto-iterators: 177 178 Iterators 179 ========= 180 181 I'll start by looking at a Python language feature that's an important 182 foundation for writing functional-style programs: iterators. 183 184 An iterator is an object representing a stream of data; this object returns the 185 data one element at a time. A Python iterator must support a method called 186 :meth:`~iterator.__next__` that takes no arguments and always returns the next 187 element of the stream. If there are no more elements in the stream, 188 :meth:`~iterator.__next__` must raise the :exc:`StopIteration` exception. 189 Iterators don't have to be finite, though; it's perfectly reasonable to write 190 an iterator that produces an infinite stream of data. 191 192 The built-in :func:`iter` function takes an arbitrary object and tries to return 193 an iterator that will return the object's contents or elements, raising 194 :exc:`TypeError` if the object doesn't support iteration. Several of Python's 195 built-in data types support iteration, the most common being lists and 196 dictionaries. An object is called :term:`iterable` if you can get an iterator 197 for it. 198 199 You can experiment with the iteration interface manually: 200 201 >>> L = [1, 2, 3] 202 >>> it = iter(L) 203 >>> it #doctest: +ELLIPSIS 204 <...iterator object at ...> 205 >>> it.__next__() # same as next(it) 206 1 207 >>> next(it) 208 2 209 >>> next(it) 210 3 211 >>> next(it) 212 Traceback (most recent call last): 213 File "<stdin>", line 1, in <module> 214 StopIteration 215 >>> 216 217 Python expects iterable objects in several different contexts, the most 218 important being the :keyword:`for` statement. In the statement ``for X in Y``, 219 Y must be an iterator or some object for which :func:`iter` can create an 220 iterator. These two statements are equivalent:: 221 222 223 for i in iter(obj): 224 print(i) 225 226 for i in obj: 227 print(i) 228 229 Iterators can be materialized as lists or tuples by using the :func:`list` or 230 :func:`tuple` constructor functions: 231 232 >>> L = [1, 2, 3] 233 >>> iterator = iter(L) 234 >>> t = tuple(iterator) 235 >>> t 236 (1, 2, 3) 237 238 Sequence unpacking also supports iterators: if you know an iterator will return 239 N elements, you can unpack them into an N-tuple: 240 241 >>> L = [1, 2, 3] 242 >>> iterator = iter(L) 243 >>> a, b, c = iterator 244 >>> a, b, c 245 (1, 2, 3) 246 247 Built-in functions such as :func:`max` and :func:`min` can take a single 248 iterator argument and will return the largest or smallest element. The ``"in"`` 249 and ``"not in"`` operators also support iterators: ``X in iterator`` is true if 250 X is found in the stream returned by the iterator. You'll run into obvious 251 problems if the iterator is infinite; :func:`max`, :func:`min` 252 will never return, and if the element X never appears in the stream, the 253 ``"in"`` and ``"not in"`` operators won't return either. 254 255 Note that you can only go forward in an iterator; there's no way to get the 256 previous element, reset the iterator, or make a copy of it. Iterator objects 257 can optionally provide these additional capabilities, but the iterator protocol 258 only specifies the :meth:`~iterator.__next__` method. Functions may therefore 259 consume all of the iterator's output, and if you need to do something different 260 with the same stream, you'll have to create a new iterator. 261 262 263 264 Data Types That Support Iterators 265 --------------------------------- 266 267 We've already seen how lists and tuples support iterators. In fact, any Python 268 sequence type, such as strings, will automatically support creation of an 269 iterator. 270 271 Calling :func:`iter` on a dictionary returns an iterator that will loop over the 272 dictionary's keys:: 273 274 >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, 275 ... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} 276 >>> for key in m: 277 ... print(key, m[key]) 278 Jan 1 279 Feb 2 280 Mar 3 281 Apr 4 282 May 5 283 Jun 6 284 Jul 7 285 Aug 8 286 Sep 9 287 Oct 10 288 Nov 11 289 Dec 12 290 291 Note that starting with Python 3.7, dictionary iteration order is guaranteed 292 to be the same as the insertion order. In earlier versions, the behaviour was 293 unspecified and could vary between implementations. 294 295 Applying :func:`iter` to a dictionary always loops over the keys, but 296 dictionaries have methods that return other iterators. If you want to iterate 297 over values or key/value pairs, you can explicitly call the 298 :meth:`~dict.values` or :meth:`~dict.items` methods to get an appropriate 299 iterator. 300 301 The :func:`dict` constructor can accept an iterator that returns a finite stream 302 of ``(key, value)`` tuples: 303 304 >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')] 305 >>> dict(iter(L)) 306 {'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'} 307 308 Files also support iteration by calling the :meth:`~io.TextIOBase.readline` 309 method until there are no more lines in the file. This means you can read each 310 line of a file like this:: 311 312 for line in file: 313 # do something for each line 314 ... 315 316 Sets can take their contents from an iterable and let you iterate over the set's 317 elements:: 318 319 S = {2, 3, 5, 7, 11, 13} 320 for i in S: 321 print(i) 322 323 324 325 Generator expressions and list comprehensions 326 ============================================= 327 328 Two common operations on an iterator's output are 1) performing some operation 329 for every element, 2) selecting a subset of elements that meet some condition. 330 For example, given a list of strings, you might want to strip off trailing 331 whitespace from each line or extract all the strings containing a given 332 substring. 333 334 List comprehensions and generator expressions (short form: "listcomps" and 335 "genexps") are a concise notation for such operations, borrowed from the 336 functional programming language Haskell (https://www.haskell.org/). You can strip 337 all the whitespace from a stream of strings with the following code:: 338 339 line_list = [' line 1\n', 'line 2 \n', ...] 340 341 # Generator expression -- returns iterator 342 stripped_iter = (line.strip() for line in line_list) 343 344 # List comprehension -- returns list 345 stripped_list = [line.strip() for line in line_list] 346 347 You can select only certain elements by adding an ``"if"`` condition:: 348 349 stripped_list = [line.strip() for line in line_list 350 if line != ""] 351 352 With a list comprehension, you get back a Python list; ``stripped_list`` is a 353 list containing the resulting lines, not an iterator. Generator expressions 354 return an iterator that computes the values as necessary, not needing to 355 materialize all the values at once. This means that list comprehensions aren't 356 useful if you're working with iterators that return an infinite stream or a very 357 large amount of data. Generator expressions are preferable in these situations. 358 359 Generator expressions are surrounded by parentheses ("()") and list 360 comprehensions are surrounded by square brackets ("[]"). Generator expressions 361 have the form:: 362 363 ( expression for expr in sequence1 364 if condition1 365 for expr2 in sequence2 366 if condition2 367 for expr3 in sequence3 ... 368 if condition3 369 for exprN in sequenceN 370 if conditionN ) 371 372 Again, for a list comprehension only the outside brackets are different (square 373 brackets instead of parentheses). 374 375 The elements of the generated output will be the successive values of 376 ``expression``. The ``if`` clauses are all optional; if present, ``expression`` 377 is only evaluated and added to the result when ``condition`` is true. 378 379 Generator expressions always have to be written inside parentheses, but the 380 parentheses signalling a function call also count. If you want to create an 381 iterator that will be immediately passed to a function you can write:: 382 383 obj_total = sum(obj.count for obj in list_all_objects()) 384 385 The ``for...in`` clauses contain the sequences to be iterated over. The 386 sequences do not have to be the same length, because they are iterated over from 387 left to right, **not** in parallel. For each element in ``sequence1``, 388 ``sequence2`` is looped over from the beginning. ``sequence3`` is then looped 389 over for each resulting pair of elements from ``sequence1`` and ``sequence2``. 390 391 To put it another way, a list comprehension or generator expression is 392 equivalent to the following Python code:: 393 394 for expr1 in sequence1: 395 if not (condition1): 396 continue # Skip this element 397 for expr2 in sequence2: 398 if not (condition2): 399 continue # Skip this element 400 ... 401 for exprN in sequenceN: 402 if not (conditionN): 403 continue # Skip this element 404 405 # Output the value of 406 # the expression. 407 408 This means that when there are multiple ``for...in`` clauses but no ``if`` 409 clauses, the length of the resulting output will be equal to the product of the 410 lengths of all the sequences. If you have two lists of length 3, the output 411 list is 9 elements long: 412 413 >>> seq1 = 'abc' 414 >>> seq2 = (1, 2, 3) 415 >>> [(x, y) for x in seq1 for y in seq2] #doctest: +NORMALIZE_WHITESPACE 416 [('a', 1), ('a', 2), ('a', 3), 417 ('b', 1), ('b', 2), ('b', 3), 418 ('c', 1), ('c', 2), ('c', 3)] 419 420 To avoid introducing an ambiguity into Python's grammar, if ``expression`` is 421 creating a tuple, it must be surrounded with parentheses. The first list 422 comprehension below is a syntax error, while the second one is correct:: 423 424 # Syntax error 425 [x, y for x in seq1 for y in seq2] 426 # Correct 427 [(x, y) for x in seq1 for y in seq2] 428 429 430 Generators 431 ========== 432 433 Generators are a special class of functions that simplify the task of writing 434 iterators. Regular functions compute a value and return it, but generators 435 return an iterator that returns a stream of values. 436 437 You're doubtless familiar with how regular function calls work in Python or C. 438 When you call a function, it gets a private namespace where its local variables 439 are created. When the function reaches a ``return`` statement, the local 440 variables are destroyed and the value is returned to the caller. A later call 441 to the same function creates a new private namespace and a fresh set of local 442 variables. But, what if the local variables weren't thrown away on exiting a 443 function? What if you could later resume the function where it left off? This 444 is what generators provide; they can be thought of as resumable functions. 445 446 Here's the simplest example of a generator function: 447 448 >>> def generate_ints(N): 449 ... for i in range(N): 450 ... yield i 451 452 Any function containing a :keyword:`yield` keyword is a generator function; 453 this is detected by Python's :term:`bytecode` compiler which compiles the 454 function specially as a result. 455 456 When you call a generator function, it doesn't return a single value; instead it 457 returns a generator object that supports the iterator protocol. On executing 458 the ``yield`` expression, the generator outputs the value of ``i``, similar to a 459 ``return`` statement. The big difference between ``yield`` and a ``return`` 460 statement is that on reaching a ``yield`` the generator's state of execution is 461 suspended and local variables are preserved. On the next call to the 462 generator's :meth:`~generator.__next__` method, the function will resume 463 executing. 464 465 Here's a sample usage of the ``generate_ints()`` generator: 466 467 >>> gen = generate_ints(3) 468 >>> gen #doctest: +ELLIPSIS 469 <generator object generate_ints at ...> 470 >>> next(gen) 471 0 472 >>> next(gen) 473 1 474 >>> next(gen) 475 2 476 >>> next(gen) 477 Traceback (most recent call last): 478 File "stdin", line 1, in <module> 479 File "stdin", line 2, in generate_ints 480 StopIteration 481 482 You could equally write ``for i in generate_ints(5)``, or ``a, b, c = 483 generate_ints(3)``. 484 485 Inside a generator function, ``return value`` causes ``StopIteration(value)`` 486 to be raised from the :meth:`~generator.__next__` method. Once this happens, or 487 the bottom of the function is reached, the procession of values ends and the 488 generator cannot yield any further values. 489 490 You could achieve the effect of generators manually by writing your own class 491 and storing all the local variables of the generator as instance variables. For 492 example, returning a list of integers could be done by setting ``self.count`` to 493 0, and having the :meth:`~iterator.__next__` method increment ``self.count`` and 494 return it. 495 However, for a moderately complicated generator, writing a corresponding class 496 can be much messier. 497 498 The test suite included with Python's library, 499 :source:`Lib/test/test_generators.py`, contains 500 a number of more interesting examples. Here's one generator that implements an 501 in-order traversal of a tree using generators recursively. :: 502 503 # A recursive generator that generates Tree leaves in in-order. 504 def inorder(t): 505 if t: 506 for x in inorder(t.left): 507 yield x 508 509 yield t.label 510 511 for x in inorder(t.right): 512 yield x 513 514 Two other examples in ``test_generators.py`` produce solutions for the N-Queens 515 problem (placing N queens on an NxN chess board so that no queen threatens 516 another) and the Knight's Tour (finding a route that takes a knight to every 517 square of an NxN chessboard without visiting any square twice). 518 519 520 521 Passing values into a generator 522 ------------------------------- 523 524 In Python 2.4 and earlier, generators only produced output. Once a generator's 525 code was invoked to create an iterator, there was no way to pass any new 526 information into the function when its execution is resumed. You could hack 527 together this ability by making the generator look at a global variable or by 528 passing in some mutable object that callers then modify, but these approaches 529 are messy. 530 531 In Python 2.5 there's a simple way to pass values into a generator. 532 :keyword:`yield` became an expression, returning a value that can be assigned to 533 a variable or otherwise operated on:: 534 535 val = (yield i) 536 537 I recommend that you **always** put parentheses around a ``yield`` expression 538 when you're doing something with the returned value, as in the above example. 539 The parentheses aren't always necessary, but it's easier to always add them 540 instead of having to remember when they're needed. 541 542 (:pep:`342` explains the exact rules, which are that a ``yield``-expression must 543 always be parenthesized except when it occurs at the top-level expression on the 544 right-hand side of an assignment. This means you can write ``val = yield i`` 545 but have to use parentheses when there's an operation, as in ``val = (yield i) 546 + 12``.) 547 548 Values are sent into a generator by calling its :meth:`send(value) 549 <generator.send>` method. This method resumes the generator's code and the 550 ``yield`` expression returns the specified value. If the regular 551 :meth:`~generator.__next__` method is called, the ``yield`` returns ``None``. 552 553 Here's a simple counter that increments by 1 and allows changing the value of 554 the internal counter. 555 556 .. testcode:: 557 558 def counter(maximum): 559 i = 0 560 while i < maximum: 561 val = (yield i) 562 # If value provided, change counter 563 if val is not None: 564 i = val 565 else: 566 i += 1 567 568 And here's an example of changing the counter: 569 570 >>> it = counter(10) #doctest: +SKIP 571 >>> next(it) #doctest: +SKIP 572 0 573 >>> next(it) #doctest: +SKIP 574 1 575 >>> it.send(8) #doctest: +SKIP 576 8 577 >>> next(it) #doctest: +SKIP 578 9 579 >>> next(it) #doctest: +SKIP 580 Traceback (most recent call last): 581 File "t.py", line 15, in <module> 582 it.next() 583 StopIteration 584 585 Because ``yield`` will often be returning ``None``, you should always check for 586 this case. Don't just use its value in expressions unless you're sure that the 587 :meth:`~generator.send` method will be the only method used to resume your 588 generator function. 589 590 In addition to :meth:`~generator.send`, there are two other methods on 591 generators: 592 593 * :meth:`throw(type, value=None, traceback=None) <generator.throw>` is used to 594 raise an exception inside the generator; the exception is raised by the 595 ``yield`` expression where the generator's execution is paused. 596 597 * :meth:`~generator.close` raises a :exc:`GeneratorExit` exception inside the 598 generator to terminate the iteration. On receiving this exception, the 599 generator's code must either raise :exc:`GeneratorExit` or 600 :exc:`StopIteration`; catching the exception and doing anything else is 601 illegal and will trigger a :exc:`RuntimeError`. :meth:`~generator.close` 602 will also be called by Python's garbage collector when the generator is 603 garbage-collected. 604 605 If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest 606 using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`. 607 608 The cumulative effect of these changes is to turn generators from one-way 609 producers of information into both producers and consumers. 610 611 Generators also become **coroutines**, a more generalized form of subroutines. 612 Subroutines are entered at one point and exited at another point (the top of the 613 function, and a ``return`` statement), but coroutines can be entered, exited, 614 and resumed at many different points (the ``yield`` statements). 615 616 617 Built-in functions 618 ================== 619 620 Let's look in more detail at built-in functions often used with iterators. 621 622 Two of Python's built-in functions, :func:`map` and :func:`filter` duplicate the 623 features of generator expressions: 624 625 :func:`map(f, iterA, iterB, ...) <map>` returns an iterator over the sequence 626 ``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``. 627 628 >>> def upper(s): 629 ... return s.upper() 630 631 >>> list(map(upper, ['sentence', 'fragment'])) 632 ['SENTENCE', 'FRAGMENT'] 633 >>> [upper(s) for s in ['sentence', 'fragment']] 634 ['SENTENCE', 'FRAGMENT'] 635 636 You can of course achieve the same effect with a list comprehension. 637 638 :func:`filter(predicate, iter) <filter>` returns an iterator over all the 639 sequence elements that meet a certain condition, and is similarly duplicated by 640 list comprehensions. A **predicate** is a function that returns the truth 641 value of some condition; for use with :func:`filter`, the predicate must take a 642 single value. 643 644 >>> def is_even(x): 645 ... return (x % 2) == 0 646 647 >>> list(filter(is_even, range(10))) 648 [0, 2, 4, 6, 8] 649 650 651 This can also be written as a list comprehension: 652 653 >>> list(x for x in range(10) if is_even(x)) 654 [0, 2, 4, 6, 8] 655 656 657 :func:`enumerate(iter, start=0) <enumerate>` counts off the elements in the 658 iterable returning 2-tuples containing the count (from *start*) and 659 each element. :: 660 661 >>> for item in enumerate(['subject', 'verb', 'object']): 662 ... print(item) 663 (0, 'subject') 664 (1, 'verb') 665 (2, 'object') 666 667 :func:`enumerate` is often used when looping through a list and recording the 668 indexes at which certain conditions are met:: 669 670 f = open('data.txt', 'r') 671 for i, line in enumerate(f): 672 if line.strip() == '': 673 print('Blank line at line #%i' % i) 674 675 :func:`sorted(iterable, key=None, reverse=False) <sorted>` collects all the 676 elements of the iterable into a list, sorts the list, and returns the sorted 677 result. The *key* and *reverse* arguments are passed through to the 678 constructed list's :meth:`~list.sort` method. :: 679 680 >>> import random 681 >>> # Generate 8 random numbers between [0, 10000) 682 >>> rand_list = random.sample(range(10000), 8) 683 >>> rand_list #doctest: +SKIP 684 [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207] 685 >>> sorted(rand_list) #doctest: +SKIP 686 [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878] 687 >>> sorted(rand_list, reverse=True) #doctest: +SKIP 688 [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769] 689 690 (For a more detailed discussion of sorting, see the :ref:`sortinghowto`.) 691 692 693 The :func:`any(iter) <any>` and :func:`all(iter) <all>` built-ins look at the 694 truth values of an iterable's contents. :func:`any` returns ``True`` if any element 695 in the iterable is a true value, and :func:`all` returns ``True`` if all of the 696 elements are true values: 697 698 >>> any([0, 1, 0]) 699 True 700 >>> any([0, 0, 0]) 701 False 702 >>> any([1, 1, 1]) 703 True 704 >>> all([0, 1, 0]) 705 False 706 >>> all([0, 0, 0]) 707 False 708 >>> all([1, 1, 1]) 709 True 710 711 712 :func:`zip(iterA, iterB, ...) <zip>` takes one element from each iterable and 713 returns them in a tuple:: 714 715 zip(['a', 'b', 'c'], (1, 2, 3)) => 716 ('a', 1), ('b', 2), ('c', 3) 717 718 It doesn't construct an in-memory list and exhaust all the input iterators 719 before returning; instead tuples are constructed and returned only if they're 720 requested. (The technical term for this behaviour is `lazy evaluation 721 <https://en.wikipedia.org/wiki/Lazy_evaluation>`__.) 722 723 This iterator is intended to be used with iterables that are all of the same 724 length. If the iterables are of different lengths, the resulting stream will be 725 the same length as the shortest iterable. :: 726 727 zip(['a', 'b'], (1, 2, 3)) => 728 ('a', 1), ('b', 2) 729 730 You should avoid doing this, though, because an element may be taken from the 731 longer iterators and discarded. This means you can't go on to use the iterators 732 further because you risk skipping a discarded element. 733 734 735 The itertools module 736 ==================== 737 738 The :mod:`itertools` module contains a number of commonly-used iterators as well 739 as functions for combining several iterators. This section will introduce the 740 module's contents by showing small examples. 741 742 The module's functions fall into a few broad classes: 743 744 * Functions that create a new iterator based on an existing iterator. 745 * Functions for treating an iterator's elements as function arguments. 746 * Functions for selecting portions of an iterator's output. 747 * A function for grouping an iterator's output. 748 749 Creating new iterators 750 ---------------------- 751 752 :func:`itertools.count(start, step) <itertools.count>` returns an infinite 753 stream of evenly spaced values. You can optionally supply the starting number, 754 which defaults to 0, and the interval between numbers, which defaults to 1:: 755 756 itertools.count() => 757 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 758 itertools.count(10) => 759 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 760 itertools.count(10, 5) => 761 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, ... 762 763 :func:`itertools.cycle(iter) <itertools.cycle>` saves a copy of the contents of 764 a provided iterable and returns a new iterator that returns its elements from 765 first to last. The new iterator will repeat these elements infinitely. :: 766 767 itertools.cycle([1, 2, 3, 4, 5]) => 768 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ... 769 770 :func:`itertools.repeat(elem, [n]) <itertools.repeat>` returns the provided 771 element *n* times, or returns the element endlessly if *n* is not provided. :: 772 773 itertools.repeat('abc') => 774 abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ... 775 itertools.repeat('abc', 5) => 776 abc, abc, abc, abc, abc 777 778 :func:`itertools.chain(iterA, iterB, ...) <itertools.chain>` takes an arbitrary 779 number of iterables as input, and returns all the elements of the first 780 iterator, then all the elements of the second, and so on, until all of the 781 iterables have been exhausted. :: 782 783 itertools.chain(['a', 'b', 'c'], (1, 2, 3)) => 784 a, b, c, 1, 2, 3 785 786 :func:`itertools.islice(iter, [start], stop, [step]) <itertools.islice>` returns 787 a stream that's a slice of the iterator. With a single *stop* argument, it 788 will return the first *stop* elements. If you supply a starting index, you'll 789 get *stop-start* elements, and if you supply a value for *step*, elements 790 will be skipped accordingly. Unlike Python's string and list slicing, you can't 791 use negative values for *start*, *stop*, or *step*. :: 792 793 itertools.islice(range(10), 8) => 794 0, 1, 2, 3, 4, 5, 6, 7 795 itertools.islice(range(10), 2, 8) => 796 2, 3, 4, 5, 6, 7 797 itertools.islice(range(10), 2, 8, 2) => 798 2, 4, 6 799 800 :func:`itertools.tee(iter, [n]) <itertools.tee>` replicates an iterator; it 801 returns *n* independent iterators that will all return the contents of the 802 source iterator. 803 If you don't supply a value for *n*, the default is 2. Replicating iterators 804 requires saving some of the contents of the source iterator, so this can consume 805 significant memory if the iterator is large and one of the new iterators is 806 consumed more than the others. :: 807 808 itertools.tee( itertools.count() ) => 809 iterA, iterB 810 811 where iterA -> 812 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 813 814 and iterB -> 815 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 816 817 818 Calling functions on elements 819 ----------------------------- 820 821 The :mod:`operator` module contains a set of functions corresponding to Python's 822 operators. Some examples are :func:`operator.add(a, b) <operator.add>` (adds 823 two values), :func:`operator.ne(a, b) <operator.ne>` (same as ``a != b``), and 824 :func:`operator.attrgetter('id') <operator.attrgetter>` 825 (returns a callable that fetches the ``.id`` attribute). 826 827 :func:`itertools.starmap(func, iter) <itertools.starmap>` assumes that the 828 iterable will return a stream of tuples, and calls *func* using these tuples as 829 the arguments:: 830 831 itertools.starmap(os.path.join, 832 [('/bin', 'python'), ('/usr', 'bin', 'java'), 833 ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')]) 834 => 835 /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby 836 837 838 Selecting elements 839 ------------------ 840 841 Another group of functions chooses a subset of an iterator's elements based on a 842 predicate. 843 844 :func:`itertools.filterfalse(predicate, iter) <itertools.filterfalse>` is the 845 opposite of :func:`filter`, returning all elements for which the predicate 846 returns false:: 847 848 itertools.filterfalse(is_even, itertools.count()) => 849 1, 3, 5, 7, 9, 11, 13, 15, ... 850 851 :func:`itertools.takewhile(predicate, iter) <itertools.takewhile>` returns 852 elements for as long as the predicate returns true. Once the predicate returns 853 false, the iterator will signal the end of its results. :: 854 855 def less_than_10(x): 856 return x < 10 857 858 itertools.takewhile(less_than_10, itertools.count()) => 859 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 860 861 itertools.takewhile(is_even, itertools.count()) => 862 0 863 864 :func:`itertools.dropwhile(predicate, iter) <itertools.dropwhile>` discards 865 elements while the predicate returns true, and then returns the rest of the 866 iterable's results. :: 867 868 itertools.dropwhile(less_than_10, itertools.count()) => 869 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 870 871 itertools.dropwhile(is_even, itertools.count()) => 872 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... 873 874 :func:`itertools.compress(data, selectors) <itertools.compress>` takes two 875 iterators and returns only those elements of *data* for which the corresponding 876 element of *selectors* is true, stopping whenever either one is exhausted:: 877 878 itertools.compress([1, 2, 3, 4, 5], [True, True, False, False, True]) => 879 1, 2, 5 880 881 882 Combinatoric functions 883 ---------------------- 884 885 The :func:`itertools.combinations(iterable, r) <itertools.combinations>` 886 returns an iterator giving all possible *r*-tuple combinations of the 887 elements contained in *iterable*. :: 888 889 itertools.combinations([1, 2, 3, 4, 5], 2) => 890 (1, 2), (1, 3), (1, 4), (1, 5), 891 (2, 3), (2, 4), (2, 5), 892 (3, 4), (3, 5), 893 (4, 5) 894 895 itertools.combinations([1, 2, 3, 4, 5], 3) => 896 (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), 897 (2, 3, 4), (2, 3, 5), (2, 4, 5), 898 (3, 4, 5) 899 900 The elements within each tuple remain in the same order as 901 *iterable* returned them. For example, the number 1 is always before 902 2, 3, 4, or 5 in the examples above. A similar function, 903 :func:`itertools.permutations(iterable, r=None) <itertools.permutations>`, 904 removes this constraint on the order, returning all possible 905 arrangements of length *r*:: 906 907 itertools.permutations([1, 2, 3, 4, 5], 2) => 908 (1, 2), (1, 3), (1, 4), (1, 5), 909 (2, 1), (2, 3), (2, 4), (2, 5), 910 (3, 1), (3, 2), (3, 4), (3, 5), 911 (4, 1), (4, 2), (4, 3), (4, 5), 912 (5, 1), (5, 2), (5, 3), (5, 4) 913 914 itertools.permutations([1, 2, 3, 4, 5]) => 915 (1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (1, 2, 4, 3, 5), 916 ... 917 (5, 4, 3, 2, 1) 918 919 If you don't supply a value for *r* the length of the iterable is used, 920 meaning that all the elements are permuted. 921 922 Note that these functions produce all of the possible combinations by 923 position and don't require that the contents of *iterable* are unique:: 924 925 itertools.permutations('aba', 3) => 926 ('a', 'b', 'a'), ('a', 'a', 'b'), ('b', 'a', 'a'), 927 ('b', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'a') 928 929 The identical tuple ``('a', 'a', 'b')`` occurs twice, but the two 'a' 930 strings came from different positions. 931 932 The :func:`itertools.combinations_with_replacement(iterable, r) <itertools.combinations_with_replacement>` 933 function relaxes a different constraint: elements can be repeated 934 within a single tuple. Conceptually an element is selected for the 935 first position of each tuple and then is replaced before the second 936 element is selected. :: 937 938 itertools.combinations_with_replacement([1, 2, 3, 4, 5], 2) => 939 (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), 940 (2, 2), (2, 3), (2, 4), (2, 5), 941 (3, 3), (3, 4), (3, 5), 942 (4, 4), (4, 5), 943 (5, 5) 944 945 946 Grouping elements 947 ----------------- 948 949 The last function I'll discuss, :func:`itertools.groupby(iter, key_func=None) 950 <itertools.groupby>`, is the most complicated. ``key_func(elem)`` is a function 951 that can compute a key value for each element returned by the iterable. If you 952 don't supply a key function, the key is simply each element itself. 953 954 :func:`~itertools.groupby` collects all the consecutive elements from the 955 underlying iterable that have the same key value, and returns a stream of 956 2-tuples containing a key value and an iterator for the elements with that key. 957 958 :: 959 960 city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), 961 ('Anchorage', 'AK'), ('Nome', 'AK'), 962 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), 963 ... 964 ] 965 966 def get_state(city_state): 967 return city_state[1] 968 969 itertools.groupby(city_list, get_state) => 970 ('AL', iterator-1), 971 ('AK', iterator-2), 972 ('AZ', iterator-3), ... 973 974 where 975 iterator-1 => 976 ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL') 977 iterator-2 => 978 ('Anchorage', 'AK'), ('Nome', 'AK') 979 iterator-3 => 980 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ') 981 982 :func:`~itertools.groupby` assumes that the underlying iterable's contents will 983 already be sorted based on the key. Note that the returned iterators also use 984 the underlying iterable, so you have to consume the results of iterator-1 before 985 requesting iterator-2 and its corresponding key. 986 987 988 The functools module 989 ==================== 990 991 The :mod:`functools` module in Python 2.5 contains some higher-order functions. 992 A **higher-order function** takes one or more functions as input and returns a 993 new function. The most useful tool in this module is the 994 :func:`functools.partial` function. 995 996 For programs written in a functional style, you'll sometimes want to construct 997 variants of existing functions that have some of the parameters filled in. 998 Consider a Python function ``f(a, b, c)``; you may wish to create a new function 999 ``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for 1000 one of ``f()``'s parameters. This is called "partial function application". 1001 1002 The constructor for :func:`~functools.partial` takes the arguments 1003 ``(function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2)``. The resulting 1004 object is callable, so you can just call it to invoke ``function`` with the 1005 filled-in arguments. 1006 1007 Here's a small but realistic example:: 1008 1009 import functools 1010 1011 def log(message, subsystem): 1012 """Write the contents of 'message' to the specified subsystem.""" 1013 print('%s: %s' % (subsystem, message)) 1014 ... 1015 1016 server_log = functools.partial(log, subsystem='server') 1017 server_log('Unable to open socket') 1018 1019 :func:`functools.reduce(func, iter, [initial_value]) <functools.reduce>` 1020 cumulatively performs an operation on all the iterable's elements and, 1021 therefore, can't be applied to infinite iterables. *func* must be a function 1022 that takes two elements and returns a single value. :func:`functools.reduce` 1023 takes the first two elements A and B returned by the iterator and calculates 1024 ``func(A, B)``. It then requests the third element, C, calculates 1025 ``func(func(A, B), C)``, combines this result with the fourth element returned, 1026 and continues until the iterable is exhausted. If the iterable returns no 1027 values at all, a :exc:`TypeError` exception is raised. If the initial value is 1028 supplied, it's used as a starting point and ``func(initial_value, A)`` is the 1029 first calculation. :: 1030 1031 >>> import operator, functools 1032 >>> functools.reduce(operator.concat, ['A', 'BB', 'C']) 1033 'ABBC' 1034 >>> functools.reduce(operator.concat, []) 1035 Traceback (most recent call last): 1036 ... 1037 TypeError: reduce() of empty sequence with no initial value 1038 >>> functools.reduce(operator.mul, [1, 2, 3], 1) 1039 6 1040 >>> functools.reduce(operator.mul, [], 1) 1041 1 1042 1043 If you use :func:`operator.add` with :func:`functools.reduce`, you'll add up all the 1044 elements of the iterable. This case is so common that there's a special 1045 built-in called :func:`sum` to compute it: 1046 1047 >>> import functools, operator 1048 >>> functools.reduce(operator.add, [1, 2, 3, 4], 0) 1049 10 1050 >>> sum([1, 2, 3, 4]) 1051 10 1052 >>> sum([]) 1053 0 1054 1055 For many uses of :func:`functools.reduce`, though, it can be clearer to just 1056 write the obvious :keyword:`for` loop:: 1057 1058 import functools 1059 # Instead of: 1060 product = functools.reduce(operator.mul, [1, 2, 3], 1) 1061 1062 # You can write: 1063 product = 1 1064 for i in [1, 2, 3]: 1065 product *= i 1066 1067 A related function is :func:`itertools.accumulate(iterable, func=operator.add) 1068 <itertools.accumulate>`. It performs the same calculation, but instead of 1069 returning only the final result, :func:`accumulate` returns an iterator that 1070 also yields each partial result:: 1071 1072 itertools.accumulate([1, 2, 3, 4, 5]) => 1073 1, 3, 6, 10, 15 1074 1075 itertools.accumulate([1, 2, 3, 4, 5], operator.mul) => 1076 1, 2, 6, 24, 120 1077 1078 1079 The operator module 1080 ------------------- 1081 1082 The :mod:`operator` module was mentioned earlier. It contains a set of 1083 functions corresponding to Python's operators. These functions are often useful 1084 in functional-style code because they save you from writing trivial functions 1085 that perform a single operation. 1086 1087 Some of the functions in this module are: 1088 1089 * Math operations: ``add()``, ``sub()``, ``mul()``, ``floordiv()``, ``abs()``, ... 1090 * Logical operations: ``not_()``, ``truth()``. 1091 * Bitwise operations: ``and_()``, ``or_()``, ``invert()``. 1092 * Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``. 1093 * Object identity: ``is_()``, ``is_not()``. 1094 1095 Consult the operator module's documentation for a complete list. 1096 1097 1098 Small functions and the lambda expression 1099 ========================================= 1100 1101 When writing functional-style programs, you'll often need little functions that 1102 act as predicates or that combine elements in some way. 1103 1104 If there's a Python built-in or a module function that's suitable, you don't 1105 need to define a new function at all:: 1106 1107 stripped_lines = [line.strip() for line in lines] 1108 existing_files = filter(os.path.exists, file_list) 1109 1110 If the function you need doesn't exist, you need to write it. One way to write 1111 small functions is to use the :keyword:`lambda` expression. ``lambda`` takes a 1112 number of parameters and an expression combining these parameters, and creates 1113 an anonymous function that returns the value of the expression:: 1114 1115 adder = lambda x, y: x+y 1116 1117 print_assign = lambda name, value: name + '=' + str(value) 1118 1119 An alternative is to just use the ``def`` statement and define a function in the 1120 usual way:: 1121 1122 def adder(x, y): 1123 return x + y 1124 1125 def print_assign(name, value): 1126 return name + '=' + str(value) 1127 1128 Which alternative is preferable? That's a style question; my usual course is to 1129 avoid using ``lambda``. 1130 1131 One reason for my preference is that ``lambda`` is quite limited in the 1132 functions it can define. The result has to be computable as a single 1133 expression, which means you can't have multiway ``if... elif... else`` 1134 comparisons or ``try... except`` statements. If you try to do too much in a 1135 ``lambda`` statement, you'll end up with an overly complicated expression that's 1136 hard to read. Quick, what's the following code doing? :: 1137 1138 import functools 1139 total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1] 1140 1141 You can figure it out, but it takes time to disentangle the expression to figure 1142 out what's going on. Using a short nested ``def`` statements makes things a 1143 little bit better:: 1144 1145 import functools 1146 def combine(a, b): 1147 return 0, a[1] + b[1] 1148 1149 total = functools.reduce(combine, items)[1] 1150 1151 But it would be best of all if I had simply used a ``for`` loop:: 1152 1153 total = 0 1154 for a, b in items: 1155 total += b 1156 1157 Or the :func:`sum` built-in and a generator expression:: 1158 1159 total = sum(b for a, b in items) 1160 1161 Many uses of :func:`functools.reduce` are clearer when written as ``for`` loops. 1162 1163 Fredrik Lundh once suggested the following set of rules for refactoring uses of 1164 ``lambda``: 1165 1166 1. Write a lambda function. 1167 2. Write a comment explaining what the heck that lambda does. 1168 3. Study the comment for a while, and think of a name that captures the essence 1169 of the comment. 1170 4. Convert the lambda to a def statement, using that name. 1171 5. Remove the comment. 1172 1173 I really like these rules, but you're free to disagree 1174 about whether this lambda-free style is better. 1175 1176 1177 Revision History and Acknowledgements 1178 ===================================== 1179 1180 The author would like to thank the following people for offering suggestions, 1181 corrections and assistance with various drafts of this article: Ian Bicking, 1182 Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro 1183 Lameiro, Jussi Salmela, Collin Winter, Blake Winton. 1184 1185 Version 0.1: posted June 30 2006. 1186 1187 Version 0.11: posted July 1 2006. Typo fixes. 1188 1189 Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one. 1190 Typo fixes. 1191 1192 Version 0.21: Added more references suggested on the tutor mailing list. 1193 1194 Version 0.30: Adds a section on the ``functional`` module written by Collin 1195 Winter; adds short section on the operator module; a few other edits. 1196 1197 1198 References 1199 ========== 1200 1201 General 1202 ------- 1203 1204 **Structure and Interpretation of Computer Programs**, by Harold Abelson and 1205 Gerald Jay Sussman with Julie Sussman. Full text at 1206 https://mitpress.mit.edu/sicp/. In this classic textbook of computer science, 1207 chapters 2 and 3 discuss the use of sequences and streams to organize the data 1208 flow inside a program. The book uses Scheme for its examples, but many of the 1209 design approaches described in these chapters are applicable to functional-style 1210 Python code. 1211 1212 http://www.defmacro.org/ramblings/fp.html: A general introduction to functional 1213 programming that uses Java examples and has a lengthy historical introduction. 1214 1215 https://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry 1216 describing functional programming. 1217 1218 https://en.wikipedia.org/wiki/Coroutine: Entry for coroutines. 1219 1220 https://en.wikipedia.org/wiki/Currying: Entry for the concept of currying. 1221 1222 Python-specific 1223 --------------- 1224 1225 http://gnosis.cx/TPiP/: The first chapter of David Mertz's book 1226 :title-reference:`Text Processing in Python` discusses functional programming 1227 for text processing, in the section titled "Utilizing Higher-Order Functions in 1228 Text Processing". 1229 1230 Mertz also wrote a 3-part series of articles on functional programming 1231 for IBM's DeveloperWorks site; see 1232 `part 1 <https://www.ibm.com/developerworks/linux/library/l-prog/index.html>`__, 1233 `part 2 <https://www.ibm.com/developerworks/linux/library/l-prog2/index.html>`__, and 1234 `part 3 <https://www.ibm.com/developerworks/linux/library/l-prog3/index.html>`__, 1235 1236 1237 Python documentation 1238 -------------------- 1239 1240 Documentation for the :mod:`itertools` module. 1241 1242 Documentation for the :mod:`functools` module. 1243 1244 Documentation for the :mod:`operator` module. 1245 1246 :pep:`289`: "Generator Expressions" 1247 1248 :pep:`342`: "Coroutines via Enhanced Generators" describes the new generator 1249 features in Python 2.5. 1250 1251 .. comment 1252 1253 Handy little function for printing part of an iterator -- used 1254 while writing this document. 1255 1256 import itertools 1257 def print_iter(it): 1258 slice = itertools.islice(it, 10) 1259 for elem in slice[:-1]: 1260 sys.stdout.write(str(elem)) 1261 sys.stdout.write(', ') 1262 print(elem[-1]) 1263