Home | History | Annotate | Download | only in library
      1 
      2 :mod:`csv` --- CSV File Reading and Writing
      3 ===========================================
      4 
      5 .. module:: csv
      6    :synopsis: Write and read tabular data to and from delimited files.
      7 .. sectionauthor:: Skip Montanaro <skip (a] pobox.com>
      8 
      9 
     10 .. versionadded:: 2.3
     11 
     12 .. index::
     13    single: csv
     14    pair: data; tabular
     15 
     16 The so-called CSV (Comma Separated Values) format is the most common import and
     17 export format for spreadsheets and databases.  There is no "CSV standard", so
     18 the format is operationally defined by the many applications which read and
     19 write it.  The lack of a standard means that subtle differences often exist in
     20 the data produced and consumed by different applications.  These differences can
     21 make it annoying to process CSV files from multiple sources.  Still, while the
     22 delimiters and quoting characters vary, the overall format is similar enough
     23 that it is possible to write a single module which can efficiently manipulate
     24 such data, hiding the details of reading and writing the data from the
     25 programmer.
     26 
     27 The :mod:`csv` module implements classes to read and write tabular data in CSV
     28 format.  It allows programmers to say, "write this data in the format preferred
     29 by Excel," or "read data from this file which was generated by Excel," without
     30 knowing the precise details of the CSV format used by Excel.  Programmers can
     31 also describe the CSV formats understood by other applications or define their
     32 own special-purpose CSV formats.
     33 
     34 The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
     35 write sequences.  Programmers can also read and write data in dictionary form
     36 using the :class:`DictReader` and :class:`DictWriter` classes.
     37 
     38 .. note::
     39 
     40    This version of the :mod:`csv` module doesn't support Unicode input.  Also,
     41    there are currently some issues regarding ASCII NUL characters.  Accordingly,
     42    all input should be UTF-8 or printable ASCII to be safe; see the examples in
     43    section :ref:`csv-examples`.
     44 
     45 
     46 .. seealso::
     47 
     48    :pep:`305` - CSV File API
     49       The Python Enhancement Proposal which proposed this addition to Python.
     50 
     51 
     52 .. _csv-contents:
     53 
     54 Module Contents
     55 ---------------
     56 
     57 The :mod:`csv` module defines the following functions:
     58 
     59 
     60 .. function:: reader(csvfile, dialect='excel', **fmtparams)
     61 
     62    Return a reader object which will iterate over lines in the given *csvfile*.
     63    *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
     64    string each time its :meth:`!next` method is called --- file objects and list
     65    objects are both suitable.   If *csvfile* is a file object, it must be opened
     66    with the 'b' flag on platforms where that makes a difference.  An optional
     67    *dialect* parameter can be given which is used to define a set of parameters
     68    specific to a particular CSV dialect.  It may be an instance of a subclass of
     69    the :class:`Dialect` class or one of the strings returned by the
     70    :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
     71    can be given to override individual formatting parameters in the current
     72    dialect.  For full details about the dialect and formatting parameters, see
     73    section :ref:`csv-fmt-params`.
     74 
     75    Each row read from the csv file is returned as a list of strings.  No
     76    automatic data type conversion is performed.
     77 
     78    A short usage example::
     79 
     80       >>> import csv
     81       >>> with open('eggs.csv', 'rb') as csvfile:
     82       ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
     83       ...     for row in spamreader:
     84       ...         print ', '.join(row)
     85       Spam, Spam, Spam, Spam, Spam, Baked Beans
     86       Spam, Lovely Spam, Wonderful Spam
     87 
     88    .. versionchanged:: 2.5
     89       The parser is now stricter with respect to multi-line quoted fields. Previously,
     90       if a line ended within a quoted field without a terminating newline character, a
     91       newline would be inserted into the returned field. This behavior caused problems
     92       when reading files which contained carriage return characters within fields.
     93       The behavior was changed to return the field without inserting newlines. As a
     94       consequence, if newlines embedded within fields are important, the input should
     95       be split into lines in a manner which preserves the newline characters.
     96 
     97 
     98 .. function:: writer(csvfile, dialect='excel', **fmtparams)
     99 
    100    Return a writer object responsible for converting the user's data into delimited
    101    strings on the given file-like object.  *csvfile* can be any object with a
    102    :func:`write` method.  If *csvfile* is a file object, it must be opened with the
    103    'b' flag on platforms where that makes a difference.  An optional *dialect*
    104    parameter can be given which is used to define a set of parameters specific to a
    105    particular CSV dialect.  It may be an instance of a subclass of the
    106    :class:`Dialect` class or one of the strings returned by the
    107    :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
    108    can be given to override individual formatting parameters in the current
    109    dialect.  For full details about the dialect and formatting parameters, see
    110    section :ref:`csv-fmt-params`. To make it
    111    as easy as possible to interface with modules which implement the DB API, the
    112    value :const:`None` is written as the empty string.  While this isn't a
    113    reversible transformation, it makes it easier to dump SQL NULL data values to
    114    CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
    115    Floats are stringified with :func:`repr` before being written.
    116    All other non-string data are stringified with :func:`str` before being written.
    117 
    118    A short usage example::
    119 
    120       import csv
    121       with open('eggs.csv', 'wb') as csvfile:
    122           spamwriter = csv.writer(csvfile, delimiter=' ',
    123                                   quotechar='|', quoting=csv.QUOTE_MINIMAL)
    124           spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    125           spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
    126 
    127 
    128 .. function:: register_dialect(name[, dialect], **fmtparams)
    129 
    130    Associate *dialect* with *name*.  *name* must be a string or Unicode object. The
    131    dialect can be specified either by passing a sub-class of :class:`Dialect`, or
    132    by *fmtparams* keyword arguments, or both, with keyword arguments overriding
    133    parameters of the dialect. For full details about the dialect and formatting
    134    parameters, see section :ref:`csv-fmt-params`.
    135 
    136 
    137 .. function:: unregister_dialect(name)
    138 
    139    Delete the dialect associated with *name* from the dialect registry.  An
    140    :exc:`Error` is raised if *name* is not a registered dialect name.
    141 
    142 
    143 .. function:: get_dialect(name)
    144 
    145    Return the dialect associated with *name*.  An :exc:`Error` is raised if *name*
    146    is not a registered dialect name.
    147 
    148    .. versionchanged:: 2.5
    149       This function now returns an immutable :class:`Dialect`.  Previously an
    150       instance of the requested dialect was returned.  Users could modify the
    151       underlying class, changing the behavior of active readers and writers.
    152 
    153 .. function:: list_dialects()
    154 
    155    Return the names of all registered dialects.
    156 
    157 
    158 .. function:: field_size_limit([new_limit])
    159 
    160    Returns the current maximum field size allowed by the parser. If *new_limit* is
    161    given, this becomes the new limit.
    162 
    163    .. versionadded:: 2.5
    164 
    165 The :mod:`csv` module defines the following classes:
    166 
    167 
    168 .. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, \
    169                       dialect='excel', *args, **kwds)
    170 
    171    Create an object which operates like a regular reader but maps the
    172    information read into a dict whose keys are given by the optional
    173    *fieldnames* parameter.  The *fieldnames* parameter is a :ref:`sequence
    174    <collections-abstract-base-classes>` whose elements are associated with the
    175    fields of the input data in order. These elements become the keys of the
    176    resulting dictionary.  If the *fieldnames* parameter is omitted, the values
    177    in the first row of the *csvfile* will be used as the fieldnames.  If the
    178    row read has more fields than the fieldnames sequence, the remaining data is
    179    added as a sequence keyed by the value of *restkey*.  If the row read has
    180    fewer fields than the fieldnames sequence, the remaining keys take the value
    181    of the optional *restval* parameter.  Any other optional or keyword
    182    arguments are passed to the underlying :class:`reader` instance.
    183 
    184    A short usage example::
    185 
    186        >>> import csv
    187        >>> with open('names.csv') as csvfile:
    188        ...     reader = csv.DictReader(csvfile)
    189        ...     for row in reader:
    190        ...         print(row['first_name'], row['last_name'])
    191        ...
    192        Baked Beans
    193        Lovely Spam
    194        Wonderful Spam
    195 
    196 
    197 .. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', \
    198                       dialect='excel', *args, **kwds)
    199 
    200    Create an object which operates like a regular writer but maps dictionaries
    201    onto output rows.  The *fieldnames* parameter is a :ref:`sequence
    202    <collections-abstract-base-classes>` of keys that identify the order in
    203    which values in the dictionary passed to the :meth:`writerow` method are
    204    written to the *csvfile*.  The optional *restval* parameter specifies the
    205    value to be written if the dictionary is missing a key in *fieldnames*.  If
    206    the dictionary passed to the :meth:`writerow` method contains a key not
    207    found in *fieldnames*, the optional *extrasaction* parameter indicates what
    208    action to take.  If it is set to ``'raise'`` a :exc:`ValueError` is raised.
    209    If it is set to ``'ignore'``, extra values in the dictionary are ignored.
    210    Any other optional or keyword arguments are passed to the underlying
    211    :class:`writer` instance.
    212 
    213    Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
    214    of the :class:`DictWriter` is not optional.  Since Python's :class:`dict`
    215    objects are not ordered, there is not enough information available to deduce
    216    the order in which the row should be written to the *csvfile*.
    217 
    218    A short usage example::
    219 
    220        import csv
    221 
    222        with open('names.csv', 'w') as csvfile:
    223            fieldnames = ['first_name', 'last_name']
    224            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    225 
    226            writer.writeheader()
    227            writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    228            writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    229            writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
    230 
    231 
    232 .. class:: Dialect
    233 
    234    The :class:`Dialect` class is a container class relied on primarily for its
    235    attributes, which are used to define the parameters for a specific
    236    :class:`reader` or :class:`writer` instance.
    237 
    238 
    239 .. class:: excel()
    240 
    241    The :class:`excel` class defines the usual properties of an Excel-generated CSV
    242    file.  It is registered with the dialect name ``'excel'``.
    243 
    244 
    245 .. class:: excel_tab()
    246 
    247    The :class:`excel_tab` class defines the usual properties of an Excel-generated
    248    TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
    249 
    250 
    251 .. class:: Sniffer()
    252 
    253    The :class:`Sniffer` class is used to deduce the format of a CSV file.
    254 
    255    The :class:`Sniffer` class provides two methods:
    256 
    257    .. method:: sniff(sample, delimiters=None)
    258 
    259       Analyze the given *sample* and return a :class:`Dialect` subclass
    260       reflecting the parameters found.  If the optional *delimiters* parameter
    261       is given, it is interpreted as a string containing possible valid
    262       delimiter characters.
    263 
    264 
    265    .. method:: has_header(sample)
    266 
    267       Analyze the sample text (presumed to be in CSV format) and return
    268       :const:`True` if the first row appears to be a series of column headers.
    269 
    270 An example for :class:`Sniffer` use::
    271 
    272    with open('example.csv', 'rb') as csvfile:
    273        dialect = csv.Sniffer().sniff(csvfile.read(1024))
    274        csvfile.seek(0)
    275        reader = csv.reader(csvfile, dialect)
    276        # ... process CSV file contents here ...
    277 
    278 
    279 The :mod:`csv` module defines the following constants:
    280 
    281 .. data:: QUOTE_ALL
    282 
    283    Instructs :class:`writer` objects to quote all fields.
    284 
    285 
    286 .. data:: QUOTE_MINIMAL
    287 
    288    Instructs :class:`writer` objects to only quote those fields which contain
    289    special characters such as *delimiter*, *quotechar* or any of the characters in
    290    *lineterminator*.
    291 
    292 
    293 .. data:: QUOTE_NONNUMERIC
    294 
    295    Instructs :class:`writer` objects to quote all non-numeric fields.
    296 
    297    Instructs the reader to convert all non-quoted fields to type *float*.
    298 
    299 
    300 .. data:: QUOTE_NONE
    301 
    302    Instructs :class:`writer` objects to never quote fields.  When the current
    303    *delimiter* occurs in output data it is preceded by the current *escapechar*
    304    character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
    305    any characters that require escaping are encountered.
    306 
    307    Instructs :class:`reader` to perform no special processing of quote characters.
    308 
    309 The :mod:`csv` module defines the following exception:
    310 
    311 
    312 .. exception:: Error
    313 
    314    Raised by any of the functions when an error is detected.
    315 
    316 
    317 .. _csv-fmt-params:
    318 
    319 Dialects and Formatting Parameters
    320 ----------------------------------
    321 
    322 To make it easier to specify the format of input and output records, specific
    323 formatting parameters are grouped together into dialects.  A dialect is a
    324 subclass of the :class:`Dialect` class having a set of specific methods and a
    325 single :meth:`validate` method.  When creating :class:`reader` or
    326 :class:`writer` objects, the programmer can specify a string or a subclass of
    327 the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
    328 of, the *dialect* parameter, the programmer can also specify individual
    329 formatting parameters, which have the same names as the attributes defined below
    330 for the :class:`Dialect` class.
    331 
    332 Dialects support the following attributes:
    333 
    334 
    335 .. attribute:: Dialect.delimiter
    336 
    337    A one-character string used to separate fields.  It defaults to ``','``.
    338 
    339 
    340 .. attribute:: Dialect.doublequote
    341 
    342    Controls how instances of *quotechar* appearing inside a field should
    343    themselves be quoted.  When :const:`True`, the character is doubled. When
    344    :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
    345    defaults to :const:`True`.
    346 
    347    On output, if *doublequote* is :const:`False` and no *escapechar* is set,
    348    :exc:`Error` is raised if a *quotechar* is found in a field.
    349 
    350 
    351 .. attribute:: Dialect.escapechar
    352 
    353    A one-character string used by the writer to escape the *delimiter* if *quoting*
    354    is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
    355    :const:`False`. On reading, the *escapechar* removes any special meaning from
    356    the following character. It defaults to :const:`None`, which disables escaping.
    357 
    358 
    359 .. attribute:: Dialect.lineterminator
    360 
    361    The string used to terminate lines produced by the :class:`writer`. It defaults
    362    to ``'\r\n'``.
    363 
    364    .. note::
    365 
    366       The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
    367       end-of-line, and ignores *lineterminator*. This behavior may change in the
    368       future.
    369 
    370 
    371 .. attribute:: Dialect.quotechar
    372 
    373    A one-character string used to quote fields containing special characters, such
    374    as the *delimiter* or *quotechar*, or which contain new-line characters.  It
    375    defaults to ``'"'``.
    376 
    377 
    378 .. attribute:: Dialect.quoting
    379 
    380    Controls when quotes should be generated by the writer and recognised by the
    381    reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
    382    :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
    383 
    384 
    385 .. attribute:: Dialect.skipinitialspace
    386 
    387    When :const:`True`, whitespace immediately following the *delimiter* is ignored.
    388    The default is :const:`False`.
    389 
    390 
    391 .. attribute:: Dialect.strict
    392 
    393    When ``True``, raise exception :exc:`Error` on bad CSV input.
    394    The default is ``False``.
    395 
    396 Reader Objects
    397 --------------
    398 
    399 Reader objects (:class:`DictReader` instances and objects returned by the
    400 :func:`reader` function) have the following public methods:
    401 
    402 
    403 .. method:: csvreader.next()
    404 
    405    Return the next row of the reader's iterable object as a list, parsed according
    406    to the current dialect.
    407 
    408 Reader objects have the following public attributes:
    409 
    410 
    411 .. attribute:: csvreader.dialect
    412 
    413    A read-only description of the dialect in use by the parser.
    414 
    415 
    416 .. attribute:: csvreader.line_num
    417 
    418    The number of lines read from the source iterator. This is not the same as the
    419    number of records returned, as records can span multiple lines.
    420 
    421    .. versionadded:: 2.5
    422 
    423 
    424 DictReader objects have the following public attribute:
    425 
    426 
    427 .. attribute:: csvreader.fieldnames
    428 
    429    If not passed as a parameter when creating the object, this attribute is
    430    initialized upon first access or when the first record is read from the
    431    file.
    432 
    433    .. versionchanged:: 2.6
    434 
    435 
    436 Writer Objects
    437 --------------
    438 
    439 :class:`Writer` objects (:class:`DictWriter` instances and objects returned by
    440 the :func:`writer` function) have the following public methods.  A *row* must be
    441 a sequence of strings or numbers for :class:`Writer` objects and a dictionary
    442 mapping fieldnames to strings or numbers (by passing them through :func:`str`
    443 first) for :class:`DictWriter` objects.  Note that complex numbers are written
    444 out surrounded by parens. This may cause some problems for other programs which
    445 read CSV files (assuming they support complex numbers at all).
    446 
    447 
    448 .. method:: csvwriter.writerow(row)
    449 
    450    Write the *row* parameter to the writer's file object, formatted according to
    451    the current dialect.
    452 
    453 
    454 .. method:: csvwriter.writerows(rows)
    455 
    456    Write all the *rows* parameters (a list of *row* objects as described above) to
    457    the writer's file object, formatted according to the current dialect.
    458 
    459 Writer objects have the following public attribute:
    460 
    461 
    462 .. attribute:: csvwriter.dialect
    463 
    464    A read-only description of the dialect in use by the writer.
    465 
    466 
    467 DictWriter objects have the following public method:
    468 
    469 
    470 .. method:: DictWriter.writeheader()
    471 
    472    Write a row with the field names (as specified in the constructor).
    473 
    474    .. versionadded:: 2.7
    475 
    476 
    477 .. _csv-examples:
    478 
    479 Examples
    480 --------
    481 
    482 The simplest example of reading a CSV file::
    483 
    484    import csv
    485    with open('some.csv', 'rb') as f:
    486        reader = csv.reader(f)
    487        for row in reader:
    488            print row
    489 
    490 Reading a file with an alternate format::
    491 
    492    import csv
    493    with open('passwd', 'rb') as f:
    494        reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    495        for row in reader:
    496            print row
    497 
    498 The corresponding simplest possible writing example is::
    499 
    500    import csv
    501    with open('some.csv', 'wb') as f:
    502        writer = csv.writer(f)
    503        writer.writerows(someiterable)
    504 
    505 Registering a new dialect::
    506 
    507    import csv
    508    csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
    509    with open('passwd', 'rb') as f:
    510        reader = csv.reader(f, 'unixpwd')
    511 
    512 A slightly more advanced use of the reader --- catching and reporting errors::
    513 
    514    import csv, sys
    515    filename = 'some.csv'
    516    with open(filename, 'rb') as f:
    517        reader = csv.reader(f)
    518        try:
    519            for row in reader:
    520                print row
    521        except csv.Error as e:
    522            sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
    523 
    524 And while the module doesn't directly support parsing strings, it can easily be
    525 done::
    526 
    527    import csv
    528    for row in csv.reader(['one,two,three']):
    529        print row
    530 
    531 The :mod:`csv` module doesn't directly support reading and writing Unicode, but
    532 it is 8-bit-clean save for some problems with ASCII NUL characters.  So you can
    533 write functions or classes that handle the encoding and decoding for you as long
    534 as you avoid encodings like UTF-16 that use NULs.  UTF-8 is recommended.
    535 
    536 :func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader`
    537 to handle Unicode CSV data (a list of Unicode strings).  :func:`utf_8_encoder`
    538 is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at
    539 a time.  The encoded strings are parsed by the CSV reader, and
    540 :func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::
    541 
    542    import csv
    543 
    544    def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    545        # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    546        csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
    547                                dialect=dialect, **kwargs)
    548        for row in csv_reader:
    549            # decode UTF-8 back to Unicode, cell by cell:
    550            yield [unicode(cell, 'utf-8') for cell in row]
    551 
    552    def utf_8_encoder(unicode_csv_data):
    553        for line in unicode_csv_data:
    554            yield line.encode('utf-8')
    555 
    556 For all other encodings the following :class:`UnicodeReader` and
    557 :class:`UnicodeWriter` classes can be used. They take an additional *encoding*
    558 parameter in their constructor and make sure that the data passes the real
    559 reader or writer encoded as UTF-8::
    560 
    561    import csv, codecs, cStringIO
    562 
    563    class UTF8Recoder:
    564        """
    565        Iterator that reads an encoded stream and reencodes the input to UTF-8
    566        """
    567        def __init__(self, f, encoding):
    568            self.reader = codecs.getreader(encoding)(f)
    569 
    570        def __iter__(self):
    571            return self
    572 
    573        def next(self):
    574            return self.reader.next().encode("utf-8")
    575 
    576    class UnicodeReader:
    577        """
    578        A CSV reader which will iterate over lines in the CSV file "f",
    579        which is encoded in the given encoding.
    580        """
    581 
    582        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
    583            f = UTF8Recoder(f, encoding)
    584            self.reader = csv.reader(f, dialect=dialect, **kwds)
    585 
    586        def next(self):
    587            row = self.reader.next()
    588            return [unicode(s, "utf-8") for s in row]
    589 
    590        def __iter__(self):
    591            return self
    592 
    593    class UnicodeWriter:
    594        """
    595        A CSV writer which will write rows to CSV file "f",
    596        which is encoded in the given encoding.
    597        """
    598 
    599        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
    600            # Redirect output to a queue
    601            self.queue = cStringIO.StringIO()
    602            self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
    603            self.stream = f
    604            self.encoder = codecs.getincrementalencoder(encoding)()
    605 
    606        def writerow(self, row):
    607            self.writer.writerow([s.encode("utf-8") for s in row])
    608            # Fetch UTF-8 output from the queue ...
    609            data = self.queue.getvalue()
    610            data = data.decode("utf-8")
    611            # ... and reencode it into the target encoding
    612            data = self.encoder.encode(data)
    613            # write to the target stream
    614            self.stream.write(data)
    615            # empty queue
    616            self.queue.truncate(0)
    617 
    618        def writerows(self, rows):
    619            for row in rows:
    620                self.writerow(row)
    621 
    622