:mod:`csv` --- CSV File Reading and Writing
===========================================

.. module:: csv
   :synopsis: Write and read tabular data to and from delimited files.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. versionadded:: 2.3

.. index::
   single: csv
   pair: data; tabular

The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. There is no "CSV standard", so
the format is operationally defined by the many applications which read and
write it. The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications. These differences can
make it annoying to process CSV files from multiple sources. Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.

The :mod:`csv` module implements classes to read and write tabular data in CSV
format. It allows programmers to say, "write this data in the format preferred
by Excel," or "read data from this file which was generated by Excel," without
knowing the precise details of the CSV format used by Excel. Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.

The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
write sequences. Programmers can also read and write data in dictionary form
using the :class:`DictReader` and :class:`DictWriter` classes.

.. note::

   This version of the :mod:`csv` module doesn't support Unicode input. Also,
   there are currently some issues regarding ASCII NUL characters.
   Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the
   examples in section :ref:`csv-examples`.


.. seealso::

   :pep:`305` - CSV File API
      The Python Enhancement Proposal which proposed this addition to Python.


.. _csv-contents:

Module Contents
---------------

The :mod:`csv` module defines the following functions:


.. function:: reader(csvfile, dialect='excel', **fmtparams)

   Return a reader object which will iterate over lines in the given *csvfile*.
   *csvfile* can be any object which supports the :term:`iterator` protocol and
   returns a string each time its :meth:`!next` method is called --- file objects
   and list objects are both suitable. If *csvfile* is a file object, it must be
   opened with the 'b' flag on platforms where that makes a difference. An
   optional *dialect* parameter can be given which is used to define a set of
   parameters specific to a particular CSV dialect. It may be an instance of a
   subclass of the :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparams* keyword
   arguments can be given to override individual formatting parameters in the
   current dialect. For full details about the dialect and formatting
   parameters, see section :ref:`csv-fmt-params`.

   Each row read from the csv file is returned as a list of strings. No
   automatic data type conversion is performed.

   A short usage example::

      >>> import csv
      >>> with open('eggs.csv', 'rb') as csvfile:
      ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
      ...     for row in spamreader:
      ...         print ', '.join(row)
      ...
      Spam, Spam, Spam, Spam, Spam, Baked Beans
      Spam, Lovely Spam, Wonderful Spam

   .. versionchanged:: 2.5
      The parser is now stricter with respect to multi-line quoted fields.
      Previously, if a line ended within a quoted field without a terminating
      newline character, a newline would be inserted into the returned field.
      This behavior caused problems when reading files which contained carriage
      return characters within fields. The behavior was changed to return the
      field without inserting newlines. As a consequence, if newlines embedded
      within fields are important, the input should be split into lines in a
      manner which preserves the newline characters.


.. function:: writer(csvfile, dialect='excel', **fmtparams)

   Return a writer object responsible for converting the user's data into
   delimited strings on the given file-like object. *csvfile* can be any object
   with a :func:`write` method. If *csvfile* is a file object, it must be opened
   with the 'b' flag on platforms where that makes a difference. An optional
   *dialect* parameter can be given which is used to define a set of parameters
   specific to a particular CSV dialect. It may be an instance of a subclass of
   the :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparams* keyword
   arguments can be given to override individual formatting parameters in the
   current dialect. For full details about the dialect and formatting
   parameters, see section :ref:`csv-fmt-params`.

   To make it as easy as possible to interface with modules which implement the
   DB API, the value :const:`None` is written as the empty string. While this
   isn't a reversible transformation, it makes it easier to dump SQL NULL data
   values to CSV files without preprocessing the data returned from a
   ``cursor.fetch*`` call. Floats are stringified with :func:`repr` before being
   written. All other non-string data are stringified with :func:`str` before
   being written.
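
   The following is a minimal sketch of those conversion rules, not taken from
   the module's own examples. It relies only on the documented fact that
   *csvfile* may be any object with a :func:`write` method, so the output goes to
   a hypothetical in-memory ``ListSink`` class rather than a file; that class
   name is an assumption made for illustration. The code avoids Python 2-only
   syntax so that it also runs on later interpreters.

   ```python
   import csv


   class ListSink(object):
       """Hypothetical in-memory sink: csv.writer only needs a write() method."""

       def __init__(self):
           self.chunks = []

       def write(self, data):
           # Collect whatever text the writer emits.
           self.chunks.append(data)


   sink = ListSink()
   writer = csv.writer(sink)  # default 'excel' dialect, QUOTE_MINIMAL
   # None -> empty field, float -> repr(), int -> str()
   writer.writerow(['spam', None, 1.0 / 3, 42])
   line = ''.join(sink.chunks)
   print(line.rstrip('\r\n'))
   ```

   With the default ``'excel'`` dialect the collected text should be
   ``spam,,0.3333333333333333,42`` followed by ``'\r\n'``: the ``None`` becomes
   an empty field, the integer goes through :func:`str`, and :func:`repr` keeps
   every digit of the float (under Python 2, :func:`str` would round it to 12
   significant digits).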

   A short usage example::

      import csv
      with open('eggs.csv', 'wb') as csvfile:
          spamwriter = csv.writer(csvfile, delimiter=' ',
                                  quotechar='|', quoting=csv.QUOTE_MINIMAL)
          spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
          spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])


.. function:: register_dialect(name[, dialect], **fmtparams)

   Associate *dialect* with *name*. *name* must be a string or Unicode object.
   The dialect can be specified either by passing a subclass of
   :class:`Dialect`, or by *fmtparams* keyword arguments, or both, with keyword
   arguments overriding parameters of the dialect. For full details about the
   dialect and formatting parameters, see section :ref:`csv-fmt-params`.


.. function:: unregister_dialect(name)

   Delete the dialect associated with *name* from the dialect registry. An
   :exc:`Error` is raised if *name* is not a registered dialect name.


.. function:: get_dialect(name)

   Return the dialect associated with *name*. An :exc:`Error` is raised if
   *name* is not a registered dialect name.

   .. versionchanged:: 2.5
      This function now returns an immutable :class:`Dialect`. Previously an
      instance of the requested dialect was returned. Users could modify the
      underlying class, changing the behavior of active readers and writers.

.. function:: list_dialects()

   Return the names of all registered dialects.


.. function:: field_size_limit([new_limit])

   Return the current maximum field size allowed by the parser. If *new_limit*
   is given, this becomes the new limit.

   .. versionadded:: 2.5

The :mod:`csv` module defines the following classes:


.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, \
                      dialect='excel', *args, **kwds)

   Create an object which operates like a regular reader but maps the
   information read into a dict whose keys are given by the optional
   *fieldnames* parameter. The *fieldnames* parameter is a :ref:`sequence
   <collections-abstract-base-classes>` whose elements are associated with the
   fields of the input data in order. These elements become the keys of the
   resulting dictionary. If the *fieldnames* parameter is omitted, the values
   in the first row of the *csvfile* will be used as the fieldnames. If the
   row read has more fields than the fieldnames sequence, the remaining data is
   added as a list keyed by the value of *restkey*. If the row read has
   fewer fields than the fieldnames sequence, the remaining keys take the value
   of the optional *restval* parameter. Any other optional or keyword
   arguments are passed to the underlying :class:`reader` instance.

   A short usage example::

      >>> import csv
      >>> with open('names.csv', 'rb') as csvfile:
      ...     reader = csv.DictReader(csvfile)
      ...     for row in reader:
      ...         print row['first_name'], row['last_name']
      ...
      Baked Beans
      Lovely Spam
      Wonderful Spam


.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', \
                      dialect='excel', *args, **kwds)

   Create an object which operates like a regular writer but maps dictionaries
   onto output rows. The *fieldnames* parameter is a :ref:`sequence
   <collections-abstract-base-classes>` of keys that identify the order in
   which values in the dictionary passed to the :meth:`writerow` method are
   written to the *csvfile*. The optional *restval* parameter specifies the
   value to be written if the dictionary is missing a key in *fieldnames*.
   If the dictionary passed to the :meth:`writerow` method contains a key not
   found in *fieldnames*, the optional *extrasaction* parameter indicates what
   action to take. If it is set to ``'raise'``, the default, a
   :exc:`ValueError` is raised. If it is set to ``'ignore'``, extra values in
   the dictionary are ignored. Any other optional or keyword arguments are
   passed to the underlying :class:`writer` instance.

   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
   of the :class:`DictWriter` is not optional. Since Python's :class:`dict`
   objects are not ordered, there is not enough information available to deduce
   the order in which the row should be written to the *csvfile*.

   A short usage example::

      import csv

      with open('names.csv', 'wb') as csvfile:
          fieldnames = ['first_name', 'last_name']
          writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

          writer.writeheader()
          writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
          writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
          writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})


.. class:: Dialect

   The :class:`Dialect` class is a container class relied on primarily for its
   attributes, which are used to define the parameters for a specific
   :class:`reader` or :class:`writer` instance.


.. class:: excel()

   The :class:`excel` class defines the usual properties of an Excel-generated
   CSV file. It is registered with the dialect name ``'excel'``.


.. class:: excel_tab()

   The :class:`excel_tab` class defines the usual properties of an
   Excel-generated TAB-delimited file. It is registered with the dialect name
   ``'excel-tab'``.


.. class:: Sniffer()

   The :class:`Sniffer` class is used to deduce the format of a CSV file.

   The :class:`Sniffer` class provides two methods:

   .. method:: sniff(sample, delimiters=None)

      Analyze the given *sample* and return a :class:`Dialect` subclass
      reflecting the parameters found. If the optional *delimiters* parameter
      is given, it is interpreted as a string containing possible valid
      delimiter characters.


   .. method:: has_header(sample)

      Analyze the sample text (presumed to be in CSV format) and return
      :const:`True` if the first row appears to be a series of column headers.

An example for :class:`Sniffer` use::

   import csv
   with open('example.csv', 'rb') as csvfile:
       dialect = csv.Sniffer().sniff(csvfile.read(1024))
       csvfile.seek(0)
       reader = csv.reader(csvfile, dialect)
       # ... process CSV file contents here ...


The :mod:`csv` module defines the following constants:

.. data:: QUOTE_ALL

   Instructs :class:`writer` objects to quote all fields.


.. data:: QUOTE_MINIMAL

   Instructs :class:`writer` objects to only quote those fields which contain
   special characters such as *delimiter*, *quotechar* or any of the characters
   in *lineterminator*.


.. data:: QUOTE_NONNUMERIC

   Instructs :class:`writer` objects to quote all non-numeric fields.

   Instructs the reader to convert all non-quoted fields to type *float*.


.. data:: QUOTE_NONE

   Instructs :class:`writer` objects to never quote fields. When the current
   *delimiter* occurs in output data it is preceded by the current *escapechar*
   character. If *escapechar* is not set, the writer will raise :exc:`Error` if
   any characters that require escaping are encountered.

   Instructs :class:`reader` to perform no special processing of quote
   characters.

The :mod:`csv` module defines the following exception:


.. exception:: Error

   Raised by any of the functions when an error is detected.


.. _csv-fmt-params:

Dialects and Formatting Parameters
----------------------------------

To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects. A dialect is a
subclass of the :class:`Dialect` class containing a set of attributes that
describe the format. When creating :class:`reader` or :class:`writer` objects,
the programmer can specify a string or a subclass of the :class:`Dialect` class
as the dialect parameter. In addition to, or instead of, the *dialect*
parameter, the programmer can also specify individual formatting parameters,
which have the same names as the attributes defined below for the
:class:`Dialect` class.

Dialects support the following attributes:


.. attribute:: Dialect.delimiter

   A one-character string used to separate fields. It defaults to ``','``.


.. attribute:: Dialect.doublequote

   Controls how instances of *quotechar* appearing inside a field should
   themselves be quoted. When :const:`True`, the character is doubled. When
   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
   defaults to :const:`True`.

   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
   :exc:`Error` is raised if a *quotechar* is found in a field.


.. attribute:: Dialect.escapechar

   A one-character string used by the writer to escape the *delimiter* if
   *quoting* is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote*
   is :const:`False`. On reading, the *escapechar* removes any special meaning
   from the following character. It defaults to :const:`None`, which disables
   escaping.


.. attribute:: Dialect.lineterminator

   The string used to terminate lines produced by the :class:`writer`. It
   defaults to ``'\r\n'``.

   .. note::

      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'``
      as end-of-line, and ignores *lineterminator*. This behavior may change in
      the future.


.. attribute:: Dialect.quotechar

   A one-character string used to quote fields containing special characters,
   such as the *delimiter* or *quotechar*, or which contain new-line characters.
   It defaults to ``'"'``.


.. attribute:: Dialect.quoting

   Controls when quotes should be generated by the writer and recognised by the
   reader. It can take on any of the :const:`QUOTE_\*` constants (see section
   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.


.. attribute:: Dialect.skipinitialspace

   When :const:`True`, whitespace immediately following the *delimiter* is
   ignored. The default is :const:`False`.


.. attribute:: Dialect.strict

   When ``True``, raise exception :exc:`Error` on bad CSV input.
   The default is ``False``.

Reader Objects
--------------

Reader objects (:class:`DictReader` instances and objects returned by the
:func:`reader` function) have the following public methods:


.. method:: csvreader.next()

   Return the next row of the reader's iterable object as a list, parsed
   according to the current dialect.

Reader objects have the following public attributes:


.. attribute:: csvreader.dialect

   A read-only description of the dialect in use by the parser.


.. attribute:: csvreader.line_num

   The number of lines read from the source iterator. This is not the same as
   the number of records returned, as records can span multiple lines.

   .. versionadded:: 2.5


DictReader objects have the following public attribute:


.. attribute:: csvreader.fieldnames

   If not passed as a parameter when creating the object, this attribute is
   initialized upon first access or when the first record is read from the
   file.

   .. versionchanged:: 2.6


Writer Objects
--------------

Writer objects (:class:`DictWriter` instances and objects returned by the
:func:`writer` function) have the following public methods. A *row* must be a
sequence of strings or numbers for writer objects, and a dictionary mapping
fieldnames to strings or numbers for :class:`DictWriter` objects; numbers are
converted to strings before being written. Note that complex numbers are
written out surrounded by parens. This may cause some problems for other
programs which read CSV files (assuming they support complex numbers at all).


.. method:: csvwriter.writerow(row)

   Write the *row* parameter to the writer's file object, formatted according
   to the current dialect.


.. method:: csvwriter.writerows(rows)

   Write all elements in *rows* (an iterable of *row* objects as described
   above) to the writer's file object, formatted according to the current
   dialect.

Writer objects have the following public attribute:


.. attribute:: csvwriter.dialect

   A read-only description of the dialect in use by the writer.


DictWriter objects have the following public method:


.. method:: DictWriter.writeheader()

   Write a row with the field names (as specified in the constructor).

   .. versionadded:: 2.7


.. _csv-examples:

Examples
--------

The simplest example of reading a CSV file::

   import csv
   with open('some.csv', 'rb') as f:
       reader = csv.reader(f)
       for row in reader:
           print row

Reading a file with an alternate format::

   import csv
   with open('passwd', 'rb') as f:
       reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
       for row in reader:
           print row

The corresponding simplest possible writing example is::

   import csv
   with open('some.csv', 'wb') as f:
       writer = csv.writer(f)
       writer.writerows(someiterable)

Registering a new dialect::

   import csv
   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
   with open('passwd', 'rb') as f:
       reader = csv.reader(f, 'unixpwd')

A slightly more advanced use of the reader --- catching and reporting errors::

   import csv, sys
   filename = 'some.csv'
   with open(filename, 'rb') as f:
       reader = csv.reader(f)
       try:
           for row in reader:
               print row
       except csv.Error as e:
           sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

And while the module doesn't directly support parsing strings, it can easily be
done::

   import csv
   for row in csv.reader(['one,two,three']):
       print row

The :mod:`csv` module doesn't directly support reading and writing Unicode, but
it is 8-bit-clean save for some problems with ASCII NUL characters. So you can
write functions or classes that handle the encoding and decoding for you as long
as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

:func:`unicode_csv_reader` below is a :term:`generator` that wraps
:class:`csv.reader` to handle Unicode CSV data (a list of Unicode strings).
:func:`utf_8_encoder` is a :term:`generator` that encodes the Unicode strings as
UTF-8, one string (or row) at a time.
The encoded strings are parsed by the CSV reader, and
:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::

   import csv

   def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
       # csv.py doesn't do Unicode; encode temporarily as UTF-8:
       csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                               dialect=dialect, **kwargs)
       for row in csv_reader:
           # decode UTF-8 back to Unicode, cell by cell:
           yield [unicode(cell, 'utf-8') for cell in row]

   def utf_8_encoder(unicode_csv_data):
       for line in unicode_csv_data:
           yield line.encode('utf-8')

For all other encodings the following :class:`UnicodeReader` and
:class:`UnicodeWriter` classes can be used. They take an additional *encoding*
parameter in their constructor and make sure that the data passes through the
underlying reader or writer encoded as UTF-8::

   import csv, codecs, cStringIO

   class UTF8Recoder:
       """
       Iterator that reads an encoded stream and reencodes the input to UTF-8
       """
       def __init__(self, f, encoding):
           self.reader = codecs.getreader(encoding)(f)

       def __iter__(self):
           return self

       def next(self):
           return self.reader.next().encode("utf-8")

   class UnicodeReader:
       """
       A CSV reader which will iterate over lines in the CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           f = UTF8Recoder(f, encoding)
           self.reader = csv.reader(f, dialect=dialect, **kwds)

       def next(self):
           row = self.reader.next()
           return [unicode(s, "utf-8") for s in row]

       def __iter__(self):
           return self

   class UnicodeWriter:
       """
       A CSV writer which will write rows to CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           # Redirect output to a queue
           self.queue = cStringIO.StringIO()
           self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
           self.stream = f
           self.encoder = codecs.getincrementalencoder(encoding)()

       def writerow(self, row):
           self.writer.writerow([s.encode("utf-8") for s in row])
           # Fetch UTF-8 output from the queue ...
           data = self.queue.getvalue()
           data = data.decode("utf-8")
           # ... and reencode it into the target encoding
           data = self.encoder.encode(data)
           # write to the target stream
           self.stream.write(data)
           # empty queue
           self.queue.truncate(0)

       def writerows(self, rows):
           for row in rows:
               self.writerow(row)
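
A further sketch, not one of the module's original examples, shows the writer
and reader halves of :data:`QUOTE_NONNUMERIC` working together in a round trip.
It writes to a hypothetical in-memory ``ListSink`` object (any object with a
:func:`write` method is accepted as *csvfile*) and reads the result back by
passing a one-element list to :func:`reader`, as in the string-parsing example
earlier in this section. The ``ListSink`` name is illustrative only, and the
code sticks to syntax shared by Python 2 and later versions.

```python
import csv


class ListSink(object):
    """Hypothetical in-memory sink: csv.writer only needs a write() method."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Collect whatever text the writer emits.
        self.chunks.append(data)


# Write one row: only the non-numeric field gets quoted.
sink = ListSink()
writer = csv.writer(sink, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(['Spam', 3, 2.5])
text = ''.join(sink.chunks)

# Read it back; csv.reader accepts any iterable of lines, here a one-item list.
reader = csv.reader([text], quoting=csv.QUOTE_NONNUMERIC)
row = next(reader)

print(repr(text))
print(row)
```

The writer should produce ``'"Spam",3,2.5\r\n'`` and the reader should return
``['Spam', 3.0, 2.5]``: the unquoted ``3`` comes back as a float, so code that
needs integers has to convert them explicitly.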