1 2 :mod:`csv` --- CSV File Reading and Writing 3 =========================================== 4 5 .. module:: csv 6 :synopsis: Write and read tabular data to and from delimited files. 7 .. sectionauthor:: Skip Montanaro <skip (a] pobox.com> 8 9 10 .. versionadded:: 2.3 11 12 .. index:: 13 single: csv 14 pair: data; tabular 15 16 The so-called CSV (Comma Separated Values) format is the most common import and 17 export format for spreadsheets and databases. There is no "CSV standard", so 18 the format is operationally defined by the many applications which read and 19 write it. The lack of a standard means that subtle differences often exist in 20 the data produced and consumed by different applications. These differences can 21 make it annoying to process CSV files from multiple sources. Still, while the 22 delimiters and quoting characters vary, the overall format is similar enough 23 that it is possible to write a single module which can efficiently manipulate 24 such data, hiding the details of reading and writing the data from the 25 programmer. 26 27 The :mod:`csv` module implements classes to read and write tabular data in CSV 28 format. It allows programmers to say, "write this data in the format preferred 29 by Excel," or "read data from this file which was generated by Excel," without 30 knowing the precise details of the CSV format used by Excel. Programmers can 31 also describe the CSV formats understood by other applications or define their 32 own special-purpose CSV formats. 33 34 The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and 35 write sequences. Programmers can also read and write data in dictionary form 36 using the :class:`DictReader` and :class:`DictWriter` classes. 37 38 .. note:: 39 40 This version of the :mod:`csv` module doesn't support Unicode input. Also, 41 there are currently some issues regarding ASCII NUL characters. Accordingly, 42 all input should be UTF-8 or printable ASCII to be safe; see the examples in 43 section :ref:`csv-examples`. 44 45 46 .. seealso:: 47 48 :pep:`305` - CSV File API 49 The Python Enhancement Proposal which proposed this addition to Python. 50 51 52 .. _csv-contents: 53 54 Module Contents 55 --------------- 56 57 The :mod:`csv` module defines the following functions: 58 59 60 .. function:: reader(csvfile, dialect='excel', **fmtparams) 61 62 Return a reader object which will iterate over lines in the given *csvfile*. 63 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a 64 string each time its :meth:`!next` method is called --- file objects and list 65 objects are both suitable. If *csvfile* is a file object, it must be opened 66 with the 'b' flag on platforms where that makes a difference. An optional 67 *dialect* parameter can be given which is used to define a set of parameters 68 specific to a particular CSV dialect. It may be an instance of a subclass of 69 the :class:`Dialect` class or one of the strings returned by the 70 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments 71 can be given to override individual formatting parameters in the current 72 dialect. For full details about the dialect and formatting parameters, see 73 section :ref:`csv-fmt-params`. 74 75 Each row read from the csv file is returned as a list of strings. No 76 automatic data type conversion is performed. 77 78 A short usage example:: 79 80 >>> import csv 81 >>> with open('eggs.csv', 'rb') as csvfile: 82 ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 83 ... for row in spamreader: 84 ... print ', '.join(row) 85 Spam, Spam, Spam, Spam, Spam, Baked Beans 86 Spam, Lovely Spam, Wonderful Spam 87 88 .. versionchanged:: 2.5 89 The parser is now stricter with respect to multi-line quoted fields. Previously, 90 if a line ended within a quoted field without a terminating newline character, a 91 newline would be inserted into the returned field. This behavior caused problems 92 when reading files which contained carriage return characters within fields. 93 The behavior was changed to return the field without inserting newlines. As a 94 consequence, if newlines embedded within fields are important, the input should 95 be split into lines in a manner which preserves the newline characters. 96 97 98 .. function:: writer(csvfile, dialect='excel', **fmtparams) 99 100 Return a writer object responsible for converting the user's data into delimited 101 strings on the given file-like object. *csvfile* can be any object with a 102 :func:`write` method. If *csvfile* is a file object, it must be opened with the 103 'b' flag on platforms where that makes a difference. An optional *dialect* 104 parameter can be given which is used to define a set of parameters specific to a 105 particular CSV dialect. It may be an instance of a subclass of the 106 :class:`Dialect` class or one of the strings returned by the 107 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments 108 can be given to override individual formatting parameters in the current 109 dialect. For full details about the dialect and formatting parameters, see 110 section :ref:`csv-fmt-params`. To make it 111 as easy as possible to interface with modules which implement the DB API, the 112 value :const:`None` is written as the empty string. While this isn't a 113 reversible transformation, it makes it easier to dump SQL NULL data values to 114 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call. 115 Floats are stringified with :func:`repr` before being written. 116 All other non-string data are stringified with :func:`str` before being written. 117 118 A short usage example:: 119 120 import csv 121 with open('eggs.csv', 'wb') as csvfile: 122 spamwriter = csv.writer(csvfile, delimiter=' ', 123 quotechar='|', quoting=csv.QUOTE_MINIMAL) 124 spamwriter.writerow(['Spam'] * 5 + ['Baked Beans']) 125 spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam']) 126 127 128 .. function:: register_dialect(name[, dialect], **fmtparams) 129 130 Associate *dialect* with *name*. *name* must be a string or Unicode object. The 131 dialect can be specified either by passing a sub-class of :class:`Dialect`, or 132 by *fmtparams* keyword arguments, or both, with keyword arguments overriding 133 parameters of the dialect. For full details about the dialect and formatting 134 parameters, see section :ref:`csv-fmt-params`. 135 136 137 .. function:: unregister_dialect(name) 138 139 Delete the dialect associated with *name* from the dialect registry. An 140 :exc:`Error` is raised if *name* is not a registered dialect name. 141 142 143 .. function:: get_dialect(name) 144 145 Return the dialect associated with *name*. An :exc:`Error` is raised if *name* 146 is not a registered dialect name. 147 148 .. versionchanged:: 2.5 149 This function now returns an immutable :class:`Dialect`. Previously an 150 instance of the requested dialect was returned. Users could modify the 151 underlying class, changing the behavior of active readers and writers. 152 153 .. function:: list_dialects() 154 155 Return the names of all registered dialects. 156 157 158 .. function:: field_size_limit([new_limit]) 159 160 Returns the current maximum field size allowed by the parser. If *new_limit* is 161 given, this becomes the new limit. 162 163 .. versionadded:: 2.5 164 165 The :mod:`csv` module defines the following classes: 166 167 168 .. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \ 169 dialect='excel', *args, **kwds) 170 171 Create an object which operates like a regular reader but maps the 172 information read into a dict whose keys are given by the optional 173 *fieldnames* parameter. The *fieldnames* parameter is a :ref:`sequence 174 <collections-abstract-base-classes>` whose elements are associated with the 175 fields of the input data in order. These elements become the keys of the 176 resulting dictionary. If the *fieldnames* parameter is omitted, the values 177 in the first row of the file *f* will be used as the fieldnames. If the 178 row read has more fields than the fieldnames sequence, the remaining data is 179 added as a sequence keyed by the value of *restkey*. If the row read has 180 fewer fields than the fieldnames sequence, the remaining keys take the value 181 of the optional *restval* parameter. Any other optional or keyword 182 arguments are passed to the underlying :class:`reader` instance. 183 184 A short usage example:: 185 186 >>> import csv 187 >>> with open('names.csv') as csvfile: 188 ... reader = csv.DictReader(csvfile) 189 ... for row in reader: 190 ... print(row['first_name'], row['last_name']) 191 ... 192 Baked Beans 193 Lovely Spam 194 Wonderful Spam 195 196 197 .. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \ 198 dialect='excel', *args, **kwds) 199 200 Create an object which operates like a regular writer but maps dictionaries 201 onto output rows. The *fieldnames* parameter is a :ref:`sequence 202 <collections-abstract-base-classes>` of keys that identify the order in 203 which values in the dictionary passed to the :meth:`writerow` method are 204 written to the file *f*. The optional *restval* parameter specifies the 205 value to be written if the dictionary is missing a key in *fieldnames*. If 206 the dictionary passed to the :meth:`writerow` method contains a key not 207 found in *fieldnames*, the optional *extrasaction* parameter indicates what 208 action to take. If it is set to ``'raise'`` a :exc:`ValueError` is raised. 209 If it is set to ``'ignore'``, extra values in the dictionary are ignored. 210 Any other optional or keyword arguments are passed to the underlying 211 :class:`writer` instance. 212 213 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter 214 of the :class:`DictWriter` is not optional. Since Python's :class:`dict` 215 objects are not ordered, there is not enough information available to deduce 216 the order in which the row should be written to the file *f*. 217 218 A short usage example:: 219 220 import csv 221 222 with open('names.csv', 'w') as csvfile: 223 fieldnames = ['first_name', 'last_name'] 224 writer = csv.DictWriter(csvfile, fieldnames=fieldnames) 225 226 writer.writeheader() 227 writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'}) 228 writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'}) 229 writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'}) 230 231 232 .. class:: Dialect 233 234 The :class:`Dialect` class is a container class relied on primarily for its 235 attributes, which are used to define the parameters for a specific 236 :class:`reader` or :class:`writer` instance. 237 238 239 .. class:: excel() 240 241 The :class:`excel` class defines the usual properties of an Excel-generated CSV 242 file. It is registered with the dialect name ``'excel'``. 243 244 245 .. class:: excel_tab() 246 247 The :class:`excel_tab` class defines the usual properties of an Excel-generated 248 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``. 249 250 251 .. class:: Sniffer() 252 253 The :class:`Sniffer` class is used to deduce the format of a CSV file. 254 255 The :class:`Sniffer` class provides two methods: 256 257 .. method:: sniff(sample, delimiters=None) 258 259 Analyze the given *sample* and return a :class:`Dialect` subclass 260 reflecting the parameters found. If the optional *delimiters* parameter 261 is given, it is interpreted as a string containing possible valid 262 delimiter characters. 263 264 265 .. method:: has_header(sample) 266 267 Analyze the sample text (presumed to be in CSV format) and return 268 :const:`True` if the first row appears to be a series of column headers. 269 270 An example for :class:`Sniffer` use:: 271 272 with open('example.csv', 'rb') as csvfile: 273 dialect = csv.Sniffer().sniff(csvfile.read(1024)) 274 csvfile.seek(0) 275 reader = csv.reader(csvfile, dialect) 276 # ... process CSV file contents here ... 277 278 279 The :mod:`csv` module defines the following constants: 280 281 .. data:: QUOTE_ALL 282 283 Instructs :class:`writer` objects to quote all fields. 284 285 286 .. data:: QUOTE_MINIMAL 287 288 Instructs :class:`writer` objects to only quote those fields which contain 289 special characters such as *delimiter*, *quotechar* or any of the characters in 290 *lineterminator*. 291 292 293 .. data:: QUOTE_NONNUMERIC 294 295 Instructs :class:`writer` objects to quote all non-numeric fields. 296 297 Instructs the reader to convert all non-quoted fields to type *float*. 298 299 300 .. data:: QUOTE_NONE 301 302 Instructs :class:`writer` objects to never quote fields. When the current 303 *delimiter* occurs in output data it is preceded by the current *escapechar* 304 character. If *escapechar* is not set, the writer will raise :exc:`Error` if 305 any characters that require escaping are encountered. 306 307 Instructs :class:`reader` to perform no special processing of quote characters. 308 309 The :mod:`csv` module defines the following exception: 310 311 312 .. exception:: Error 313 314 Raised by any of the functions when an error is detected. 315 316 317 .. _csv-fmt-params: 318 319 Dialects and Formatting Parameters 320 ---------------------------------- 321 322 To make it easier to specify the format of input and output records, specific 323 formatting parameters are grouped together into dialects. A dialect is a 324 subclass of the :class:`Dialect` class having a set of specific methods and a 325 single :meth:`validate` method. When creating :class:`reader` or 326 :class:`writer` objects, the programmer can specify a string or a subclass of 327 the :class:`Dialect` class as the dialect parameter. In addition to, or instead 328 of, the *dialect* parameter, the programmer can also specify individual 329 formatting parameters, which have the same names as the attributes defined below 330 for the :class:`Dialect` class. 331 332 Dialects support the following attributes: 333 334 335 .. attribute:: Dialect.delimiter 336 337 A one-character string used to separate fields. It defaults to ``','``. 338 339 340 .. attribute:: Dialect.doublequote 341 342 Controls how instances of *quotechar* appearing inside a field should 343 themselves be quoted. When :const:`True`, the character is doubled. When 344 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It 345 defaults to :const:`True`. 346 347 On output, if *doublequote* is :const:`False` and no *escapechar* is set, 348 :exc:`Error` is raised if a *quotechar* is found in a field. 349 350 351 .. attribute:: Dialect.escapechar 352 353 A one-character string used by the writer to escape the *delimiter* if *quoting* 354 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is 355 :const:`False`. On reading, the *escapechar* removes any special meaning from 356 the following character. It defaults to :const:`None`, which disables escaping. 357 358 359 .. attribute:: Dialect.lineterminator 360 361 The string used to terminate lines produced by the :class:`writer`. It defaults 362 to ``'\r\n'``. 363 364 .. note:: 365 366 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as 367 end-of-line, and ignores *lineterminator*. This behavior may change in the 368 future. 369 370 371 .. attribute:: Dialect.quotechar 372 373 A one-character string used to quote fields containing special characters, such 374 as the *delimiter* or *quotechar*, or which contain new-line characters. It 375 defaults to ``'"'``. 376 377 378 .. attribute:: Dialect.quoting 379 380 Controls when quotes should be generated by the writer and recognised by the 381 reader. It can take on any of the :const:`QUOTE_\*` constants (see section 382 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`. 383 384 385 .. attribute:: Dialect.skipinitialspace 386 387 When :const:`True`, whitespace immediately following the *delimiter* is ignored. 388 The default is :const:`False`. 389 390 391 .. attribute:: Dialect.strict 392 393 When ``True``, raise exception :exc:`Error` on bad CSV input. 394 The default is ``False``. 395 396 Reader Objects 397 -------------- 398 399 Reader objects (:class:`DictReader` instances and objects returned by the 400 :func:`reader` function) have the following public methods: 401 402 403 .. method:: csvreader.next() 404 405 Return the next row of the reader's iterable object as a list, parsed according 406 to the current dialect. 407 408 Reader objects have the following public attributes: 409 410 411 .. attribute:: csvreader.dialect 412 413 A read-only description of the dialect in use by the parser. 414 415 416 .. attribute:: csvreader.line_num 417 418 The number of lines read from the source iterator. This is not the same as the 419 number of records returned, as records can span multiple lines. 420 421 .. versionadded:: 2.5 422 423 424 DictReader objects have the following public attribute: 425 426 427 .. attribute:: csvreader.fieldnames 428 429 If not passed as a parameter when creating the object, this attribute is 430 initialized upon first access or when the first record is read from the 431 file. 432 433 .. versionchanged:: 2.6 434 435 436 Writer Objects 437 -------------- 438 439 :class:`Writer` objects (:class:`DictWriter` instances and objects returned by 440 the :func:`writer` function) have the following public methods. A *row* must be 441 a sequence of strings or numbers for :class:`Writer` objects and a dictionary 442 mapping fieldnames to strings or numbers (by passing them through :func:`str` 443 first) for :class:`DictWriter` objects. Note that complex numbers are written 444 out surrounded by parens. This may cause some problems for other programs which 445 read CSV files (assuming they support complex numbers at all). 446 447 448 .. method:: csvwriter.writerow(row) 449 450 Write the *row* parameter to the writer's file object, formatted according to 451 the current dialect. 452 453 454 .. method:: csvwriter.writerows(rows) 455 456 Write all elements in *rows* (an iterable of *row* objects as described 457 above) to the writer's file object, formatted according to the current 458 dialect. 459 460 Writer objects have the following public attribute: 461 462 463 .. attribute:: csvwriter.dialect 464 465 A read-only description of the dialect in use by the writer. 466 467 468 DictWriter objects have the following public method: 469 470 471 .. method:: DictWriter.writeheader() 472 473 Write a row with the field names (as specified in the constructor). 474 475 .. versionadded:: 2.7 476 477 478 .. _csv-examples: 479 480 Examples 481 -------- 482 483 The simplest example of reading a CSV file:: 484 485 import csv 486 with open('some.csv', 'rb') as f: 487 reader = csv.reader(f) 488 for row in reader: 489 print row 490 491 Reading a file with an alternate format:: 492 493 import csv 494 with open('passwd', 'rb') as f: 495 reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE) 496 for row in reader: 497 print row 498 499 The corresponding simplest possible writing example is:: 500 501 import csv 502 with open('some.csv', 'wb') as f: 503 writer = csv.writer(f) 504 writer.writerows(someiterable) 505 506 Registering a new dialect:: 507 508 import csv 509 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE) 510 with open('passwd', 'rb') as f: 511 reader = csv.reader(f, 'unixpwd') 512 513 A slightly more advanced use of the reader --- catching and reporting errors:: 514 515 import csv, sys 516 filename = 'some.csv' 517 with open(filename, 'rb') as f: 518 reader = csv.reader(f) 519 try: 520 for row in reader: 521 print row 522 except csv.Error as e: 523 sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e)) 524 525 And while the module doesn't directly support parsing strings, it can easily be 526 done:: 527 528 import csv 529 for row in csv.reader(['one,two,three']): 530 print row 531 532 The :mod:`csv` module doesn't directly support reading and writing Unicode, but 533 it is 8-bit-clean save for some problems with ASCII NUL characters. So you can 534 write functions or classes that handle the encoding and decoding for you as long 535 as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended. 536 537 :func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader` 538 to handle Unicode CSV data (a list of Unicode strings). :func:`utf_8_encoder` 539 is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at 540 a time. The encoded strings are parsed by the CSV reader, and 541 :func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode:: 542 543 import csv 544 545 def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs): 546 # csv.py doesn't do Unicode; encode temporarily as UTF-8: 547 csv_reader = csv.reader(utf_8_encoder(unicode_csv_data), 548 dialect=dialect, **kwargs) 549 for row in csv_reader: 550 # decode UTF-8 back to Unicode, cell by cell: 551 yield [unicode(cell, 'utf-8') for cell in row] 552 553 def utf_8_encoder(unicode_csv_data): 554 for line in unicode_csv_data: 555 yield line.encode('utf-8') 556 557 For all other encodings the following :class:`UnicodeReader` and 558 :class:`UnicodeWriter` classes can be used. They take an additional *encoding* 559 parameter in their constructor and make sure that the data passes the real 560 reader or writer encoded as UTF-8:: 561 562 import csv, codecs, cStringIO 563 564 class UTF8Recoder: 565 """ 566 Iterator that reads an encoded stream and reencodes the input to UTF-8 567 """ 568 def __init__(self, f, encoding): 569 self.reader = codecs.getreader(encoding)(f) 570 571 def __iter__(self): 572 return self 573 574 def next(self): 575 return self.reader.next().encode("utf-8") 576 577 class UnicodeReader: 578 """ 579 A CSV reader which will iterate over lines in the CSV file "f", 580 which is encoded in the given encoding. 581 """ 582 583 def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): 584 f = UTF8Recoder(f, encoding) 585 self.reader = csv.reader(f, dialect=dialect, **kwds) 586 587 def next(self): 588 row = self.reader.next() 589 return [unicode(s, "utf-8") for s in row] 590 591 def __iter__(self): 592 return self 593 594 class UnicodeWriter: 595 """ 596 A CSV writer which will write rows to CSV file "f", 597 which is encoded in the given encoding. 598 """ 599 600 def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): 601 # Redirect output to a queue 602 self.queue = cStringIO.StringIO() 603 self.writer = csv.writer(self.queue, dialect=dialect, **kwds) 604 self.stream = f 605 self.encoder = codecs.getincrementalencoder(encoding)() 606 607 def writerow(self, row): 608 self.writer.writerow([s.encode("utf-8") for s in row]) 609 # Fetch UTF-8 output from the queue ... 610 data = self.queue.getvalue() 611 data = data.decode("utf-8") 612 # ... and reencode it into the target encoding 613 data = self.encoder.encode(data) 614 # write to the target stream 615 self.stream.write(data) 616 # empty queue 617 self.queue.truncate(0) 618 619 def writerows(self, rows): 620 for row in rows: 621 self.writerow(row) 622 623