1 .. _sortinghowto: 2 3 Sorting HOW TO 4 ************** 5 6 :Author: Andrew Dalke and Raymond Hettinger 7 :Release: 0.1 8 9 10 Python lists have a built-in :meth:`list.sort` method that modifies the list 11 in-place. There is also a :func:`sorted` built-in function that builds a new 12 sorted list from an iterable. 13 14 In this document, we explore the various techniques for sorting data using Python. 15 16 17 Sorting Basics 18 ============== 19 20 A simple ascending sort is very easy: just call the :func:`sorted` function. It 21 returns a new sorted list:: 22 23 >>> sorted([5, 2, 3, 1, 4]) 24 [1, 2, 3, 4, 5] 25 26 You can also use the :meth:`list.sort` method. It modifies the list 27 in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 28 than :func:`sorted` - but if you don't need the original list, it's slightly 29 more efficient. 30 31 >>> a = [5, 2, 3, 1, 4] 32 >>> a.sort() 33 >>> a 34 [1, 2, 3, 4, 5] 35 36 Another difference is that the :meth:`list.sort` method is only defined for 37 lists. In contrast, the :func:`sorted` function accepts any iterable. 38 39 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 40 [1, 2, 3, 4, 5] 41 42 Key Functions 43 ============= 44 45 Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 46 function to be called on each list element prior to making comparisons. 47 48 For example, here's a case-insensitive string comparison: 49 50 >>> sorted("This is a test string from Andrew".split(), key=str.lower) 51 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 52 53 The value of the *key* parameter should be a function that takes a single argument 54 and returns a key to use for sorting purposes. This technique is fast because 55 the key function is called exactly once for each input record. 56 57 A common pattern is to sort complex objects using some of the object's indices 58 as keys. For example: 59 60 >>> student_tuples = [ 61 ... ('john', 'A', 15), 62 ... ('jane', 'B', 12), 63 ... ('dave', 'B', 10), 64 ... ] 65 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 66 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 67 68 The same technique works for objects with named attributes. For example: 69 70 >>> class Student: 71 ... def __init__(self, name, grade, age): 72 ... self.name = name 73 ... self.grade = grade 74 ... self.age = age 75 ... def __repr__(self): 76 ... return repr((self.name, self.grade, self.age)) 77 78 >>> student_objects = [ 79 ... Student('john', 'A', 15), 80 ... Student('jane', 'B', 12), 81 ... Student('dave', 'B', 10), 82 ... ] 83 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 84 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 85 86 Operator Module Functions 87 ========================= 88 89 The key-function patterns shown above are very common, so Python provides 90 convenience functions to make accessor functions easier and faster. The 91 :mod:`operator` module has :func:`~operator.itemgetter`, 92 :func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 93 94 Using those functions, the above examples become simpler and faster: 95 96 >>> from operator import itemgetter, attrgetter 97 98 >>> sorted(student_tuples, key=itemgetter(2)) 99 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 100 101 >>> sorted(student_objects, key=attrgetter('age')) 102 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 103 104 The operator module functions allow multiple levels of sorting. For example, to 105 sort by *grade* then by *age*: 106 107 >>> sorted(student_tuples, key=itemgetter(1,2)) 108 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 109 110 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 111 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 112 113 Ascending and Descending 114 ======================== 115 116 Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 117 boolean value. This is used to flag descending sorts. For example, to get the 118 student data in reverse *age* order: 119 120 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 121 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 122 123 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 124 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 125 126 Sort Stability and Complex Sorts 127 ================================ 128 129 Sorts are guaranteed to be `stable 130 <https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 131 when multiple records have the same key, their original order is preserved. 132 133 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 134 >>> sorted(data, key=itemgetter(0)) 135 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 136 137 Notice how the two records for *blue* retain their original order so that 138 ``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 139 140 This wonderful property lets you build complex sorts in a series of sorting 141 steps. For example, to sort the student data by descending *grade* and then 142 ascending *age*, do the *age* sort first and then sort again using *grade*: 143 144 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 145 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 146 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 147 148 The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 149 does multiple sorts efficiently because it can take advantage of any ordering 150 already present in a dataset. 151 152 The Old Way Using Decorate-Sort-Undecorate 153 ========================================== 154 155 This idiom is called Decorate-Sort-Undecorate after its three steps: 156 157 * First, the initial list is decorated with new values that control the sort order. 158 159 * Second, the decorated list is sorted. 160 161 * Finally, the decorations are removed, creating a list that contains only the 162 initial values in the new order. 163 164 For example, to sort the student data by *grade* using the DSU approach: 165 166 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 167 >>> decorated.sort() 168 >>> [student for grade, i, student in decorated] # undecorate 169 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 170 171 This idiom works because tuples are compared lexicographically; the first items 172 are compared; if they are the same then the second items are compared, and so 173 on. 174 175 It is not strictly necessary in all cases to include the index *i* in the 176 decorated list, but including it gives two benefits: 177 178 * The sort is stable -- if two items have the same key, their order will be 179 preserved in the sorted list. 180 181 * The original items do not have to be comparable because the ordering of the 182 decorated tuples will be determined by at most the first two items. So for 183 example the original list could contain complex numbers which cannot be sorted 184 directly. 185 186 Another name for this idiom is 187 `Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 188 after Randal L. Schwartz, who popularized it among Perl programmers. 189 190 Now that Python sorting provides key-functions, this technique is not often needed. 191 192 193 The Old Way Using the *cmp* Parameter 194 ===================================== 195 196 Many constructs given in this HOWTO assume Python 2.4 or later. Before that, 197 there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword 198 arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to 199 handle user specified comparison functions. 200 201 In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to 202 simplify and unify the language, eliminating the conflict between rich 203 comparisons and the :meth:`__cmp__` magic method). 204 205 In Py2.x, sort allowed an optional function which can be called for doing the 206 comparisons. That function should take two arguments to be compared and then 207 return a negative value for less-than, return zero if they are equal, or return 208 a positive value for greater-than. For example, we can do: 209 210 >>> def numeric_compare(x, y): 211 ... return x - y 212 >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP 213 [1, 2, 3, 4, 5] 214 215 Or you can reverse the order of comparison with: 216 217 >>> def reverse_numeric(x, y): 218 ... return y - x 219 >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP 220 [5, 4, 3, 2, 1] 221 222 When porting code from Python 2.x to 3.x, the situation can arise when you have 223 the user supplying a comparison function and you need to convert that to a key 224 function. The following wrapper makes that easy to do:: 225 226 def cmp_to_key(mycmp): 227 'Convert a cmp= function into a key= function' 228 class K: 229 def __init__(self, obj, *args): 230 self.obj = obj 231 def __lt__(self, other): 232 return mycmp(self.obj, other.obj) < 0 233 def __gt__(self, other): 234 return mycmp(self.obj, other.obj) > 0 235 def __eq__(self, other): 236 return mycmp(self.obj, other.obj) == 0 237 def __le__(self, other): 238 return mycmp(self.obj, other.obj) <= 0 239 def __ge__(self, other): 240 return mycmp(self.obj, other.obj) >= 0 241 def __ne__(self, other): 242 return mycmp(self.obj, other.obj) != 0 243 return K 244 245 To convert to a key function, just wrap the old comparison function: 246 247 .. testsetup:: 248 249 from functools import cmp_to_key 250 251 .. doctest:: 252 253 >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) 254 [5, 4, 3, 2, 1] 255 256 In Python 3.2, the :func:`functools.cmp_to_key` function was added to the 257 :mod:`functools` module in the standard library. 258 259 Odd and Ends 260 ============ 261 262 * For locale aware sorting, use :func:`locale.strxfrm` for a key function or 263 :func:`locale.strcoll` for a comparison function. 264 265 * The *reverse* parameter still maintains sort stability (so that records with 266 equal keys retain the original order). Interestingly, that effect can be 267 simulated without the parameter by using the builtin :func:`reversed` function 268 twice: 269 270 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 271 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 272 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 273 >>> assert standard_way == double_reversed 274 >>> standard_way 275 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 276 277 * The sort routines are guaranteed to use :meth:`__lt__` when making comparisons 278 between two objects. So, it is easy to add a standard sort order to a class by 279 defining an :meth:`__lt__` method:: 280 281 >>> Student.__lt__ = lambda self, other: self.age < other.age 282 >>> sorted(student_objects) 283 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 284 285 * Key functions need not depend directly on the objects being sorted. A key 286 function can also access external resources. For instance, if the student grades 287 are stored in a dictionary, they can be used to sort a separate list of student 288 names: 289 290 >>> students = ['dave', 'john', 'jane'] 291 >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 292 >>> sorted(students, key=newgrades.__getitem__) 293 ['jane', 'dave', 'john'] 294