Home | History | Annotate | Download | only in library
      1 :mod:`csv` --- CSV File Reading and Writing
      2 ===========================================
      3 
      4 .. module:: csv
      5    :synopsis: Write and read tabular data to and from delimited files.
      6 
      7 .. sectionauthor:: Skip Montanaro <skip (a] pobox.com>
      8 
      9 **Source code:** :source:`Lib/csv.py`
     10 
     11 .. index::
     12    single: csv
     13    pair: data; tabular
     14 
     15 --------------
     16 
     17 The so-called CSV (Comma Separated Values) format is the most common import and
     18 export format for spreadsheets and databases.  CSV format was used for many
     19 years prior to attempts to describe the format in a standardized way in
     20 :rfc:`4180`.  The lack of a well-defined standard means that subtle differences
     21 often exist in the data produced and consumed by different applications.  These
     22 differences can make it annoying to process CSV files from multiple sources.
     23 Still, while the delimiters and quoting characters vary, the overall format is
     24 similar enough that it is possible to write a single module which can
     25 efficiently manipulate such data, hiding the details of reading and writing the
     26 data from the programmer.
     27 
     28 The :mod:`csv` module implements classes to read and write tabular data in CSV
     29 format.  It allows programmers to say, "write this data in the format preferred
     30 by Excel," or "read data from this file which was generated by Excel," without
     31 knowing the precise details of the CSV format used by Excel.  Programmers can
     32 also describe the CSV formats understood by other applications or define their
     33 own special-purpose CSV formats.
     34 
     35 The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
     36 write sequences.  Programmers can also read and write data in dictionary form
     37 using the :class:`DictReader` and :class:`DictWriter` classes.
     38 
     39 .. seealso::
     40 
     41    :pep:`305` - CSV File API
     42       The Python Enhancement Proposal which proposed this addition to Python.
     43 
     44 
     45 .. _csv-contents:
     46 
     47 Module Contents
     48 ---------------
     49 
     50 The :mod:`csv` module defines the following functions:
     51 
     52 
     53 .. index::
     54    single: universal newlines; csv.reader function
     55 
     56 .. function:: reader(csvfile, dialect='excel', **fmtparams)
     57 
     58    Return a reader object which will iterate over lines in the given *csvfile*.
     59    *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
     60    string each time its :meth:`!__next__` method is called --- :term:`file objects
     61    <file object>` and list objects are both suitable.   If *csvfile* is a file object,
     62    it should be opened with ``newline=''``. [1]_  An optional
     63    *dialect* parameter can be given which is used to define a set of parameters
     64    specific to a particular CSV dialect.  It may be an instance of a subclass of
     65    the :class:`Dialect` class or one of the strings returned by the
     66    :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
     67    can be given to override individual formatting parameters in the current
     68    dialect.  For full details about the dialect and formatting parameters, see
     69    section :ref:`csv-fmt-params`.
     70 
     71    Each row read from the csv file is returned as a list of strings.  No
     72    automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
     73    option is specified (in which case unquoted fields are transformed into floats).
     74 
     75    A short usage example::
     76 
     77       >>> import csv
     78       >>> with open('eggs.csv', newline='') as csvfile:
     79       ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
     80       ...     for row in spamreader:
     81       ...         print(', '.join(row))
     82       Spam, Spam, Spam, Spam, Spam, Baked Beans
     83       Spam, Lovely Spam, Wonderful Spam
     84 
     85 
     86 .. function:: writer(csvfile, dialect='excel', **fmtparams)
     87 
     88    Return a writer object responsible for converting the user's data into delimited
     89    strings on the given file-like object.  *csvfile* can be any object with a
     90    :func:`write` method.  If *csvfile* is a file object, it should be opened with
     91    ``newline=''`` [1]_.  An optional *dialect*
     92    parameter can be given which is used to define a set of parameters specific to a
     93    particular CSV dialect.  It may be an instance of a subclass of the
     94    :class:`Dialect` class or one of the strings returned by the
     95    :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
     96    can be given to override individual formatting parameters in the current
     97    dialect.  For full details about the dialect and formatting parameters, see
     98    section :ref:`csv-fmt-params`. To make it
     99    as easy as possible to interface with modules which implement the DB API, the
    100    value :const:`None` is written as the empty string.  While this isn't a
    101    reversible transformation, it makes it easier to dump SQL NULL data values to
    102    CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
    103    All other non-string data are stringified with :func:`str` before being written.
    104 
    105    A short usage example::
    106 
    107       import csv
    108       with open('eggs.csv', 'w', newline='') as csvfile:
    109           spamwriter = csv.writer(csvfile, delimiter=' ',
    110                                   quotechar='|', quoting=csv.QUOTE_MINIMAL)
    111           spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    112           spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
    113 
    114 
    115 .. function:: register_dialect(name[, dialect[, **fmtparams]])
    116 
    117    Associate *dialect* with *name*.  *name* must be a string. The
    118    dialect can be specified either by passing a sub-class of :class:`Dialect`, or
    119    by *fmtparams* keyword arguments, or both, with keyword arguments overriding
    120    parameters of the dialect. For full details about the dialect and formatting
    121    parameters, see section :ref:`csv-fmt-params`.
    122 
    123 
    124 .. function:: unregister_dialect(name)
    125 
    126    Delete the dialect associated with *name* from the dialect registry.  An
    127    :exc:`Error` is raised if *name* is not a registered dialect name.
    128 
    129 
    130 .. function:: get_dialect(name)
    131 
    132    Return the dialect associated with *name*.  An :exc:`Error` is raised if
    133    *name* is not a registered dialect name.  This function returns an immutable
    134    :class:`Dialect`.
    135 
    136 .. function:: list_dialects()
    137 
    138    Return the names of all registered dialects.
    139 
    140 
    141 .. function:: field_size_limit([new_limit])
    142 
    143    Returns the current maximum field size allowed by the parser. If *new_limit* is
    144    given, this becomes the new limit.
    145 
    146 
    147 The :mod:`csv` module defines the following classes:
    148 
    149 .. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
    150                       dialect='excel', *args, **kwds)
    151 
    152    Create an object that operates like a regular reader but maps the
    153    information in each row to an :mod:`OrderedDict <collections.OrderedDict>`
    154    whose keys are given by the optional *fieldnames* parameter.
    155 
    156    The *fieldnames* parameter is a :term:`sequence`.  If *fieldnames* is
    157    omitted, the values in the first row of file *f* will be used as the
    158    fieldnames.  Regardless of how the fieldnames are determined, the ordered
    159    dictionary preserves their original ordering.
    160 
    161    If a row has more fields than fieldnames, the remaining data is put in a
    162    list and stored with the fieldname specified by *restkey* (which defaults
    163    to ``None``).  If a non-blank row has fewer fields than fieldnames, the
    164    missing values are filled-in with ``None``.
    165 
    166    All other optional or keyword arguments are passed to the underlying
    167    :class:`reader` instance.
    168 
    169    .. versionchanged:: 3.6
    170       Returned rows are now of type :class:`OrderedDict`.
    171 
    172    A short usage example::
    173 
    174        >>> import csv
    175        >>> with open('names.csv', newline='') as csvfile:
    176        ...     reader = csv.DictReader(csvfile)
    177        ...     for row in reader:
    178        ...         print(row['first_name'], row['last_name'])
    179        ...
    180        Eric Idle
    181        John Cleese
    182 
    183        >>> print(row)
    184        OrderedDict([('first_name', 'John'), ('last_name', 'Cleese')])
    185 
    186 
    187 .. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
    188                       dialect='excel', *args, **kwds)
    189 
    190    Create an object which operates like a regular writer but maps dictionaries
    191    onto output rows.  The *fieldnames* parameter is a :mod:`sequence
    192    <collections.abc>` of keys that identify the order in which values in the
    193    dictionary passed to the :meth:`writerow` method are written to file
    194    *f*.  The optional *restval* parameter specifies the value to be
    195    written if the dictionary is missing a key in *fieldnames*.  If the
    196    dictionary passed to the :meth:`writerow` method contains a key not found in
    197    *fieldnames*, the optional *extrasaction* parameter indicates what action to
    198    take.
    199    If it is set to ``'raise'``, the default value, a :exc:`ValueError`
    200    is raised.
    201    If it is set to ``'ignore'``, extra values in the dictionary are ignored.
    202    Any other optional or keyword arguments are passed to the underlying
    203    :class:`writer` instance.
    204 
    205    Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
    206    of the :class:`DictWriter` class is not optional.
    207 
    208    A short usage example::
    209 
    210        import csv
    211 
    212        with open('names.csv', 'w', newline='') as csvfile:
    213            fieldnames = ['first_name', 'last_name']
    214            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    215 
    216            writer.writeheader()
    217            writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    218            writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    219            writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
    220 
    221 
    222 .. class:: Dialect
    223 
    224    The :class:`Dialect` class is a container class relied on primarily for its
    225    attributes, which are used to define the parameters for a specific
    226    :class:`reader` or :class:`writer` instance.
    227 
    228 
    229 .. class:: excel()
    230 
    231    The :class:`excel` class defines the usual properties of an Excel-generated CSV
    232    file.  It is registered with the dialect name ``'excel'``.
    233 
    234 
    235 .. class:: excel_tab()
    236 
    237    The :class:`excel_tab` class defines the usual properties of an Excel-generated
    238    TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
    239 
    240 
    241 .. class:: unix_dialect()
    242 
    243    The :class:`unix_dialect` class defines the usual properties of a CSV file
    244    generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
    245    all fields.  It is registered with the dialect name ``'unix'``.
    246 
    247    .. versionadded:: 3.2
    248 
    249 
    250 .. class:: Sniffer()
    251 
    252    The :class:`Sniffer` class is used to deduce the format of a CSV file.
    253 
    254    The :class:`Sniffer` class provides two methods:
    255 
    256    .. method:: sniff(sample, delimiters=None)
    257 
    258       Analyze the given *sample* and return a :class:`Dialect` subclass
    259       reflecting the parameters found.  If the optional *delimiters* parameter
    260       is given, it is interpreted as a string containing possible valid
    261       delimiter characters.
    262 
    263 
    264    .. method:: has_header(sample)
    265 
    266       Analyze the sample text (presumed to be in CSV format) and return
    267       :const:`True` if the first row appears to be a series of column headers.
    268 
    269 An example for :class:`Sniffer` use::
    270 
    271    with open('example.csv', newline='') as csvfile:
    272        dialect = csv.Sniffer().sniff(csvfile.read(1024))
    273        csvfile.seek(0)
    274        reader = csv.reader(csvfile, dialect)
    275        # ... process CSV file contents here ...
    276 
    277 
    278 The :mod:`csv` module defines the following constants:
    279 
    280 .. data:: QUOTE_ALL
    281 
    282    Instructs :class:`writer` objects to quote all fields.
    283 
    284 
    285 .. data:: QUOTE_MINIMAL
    286 
    287    Instructs :class:`writer` objects to only quote those fields which contain
    288    special characters such as *delimiter*, *quotechar* or any of the characters in
    289    *lineterminator*.
    290 
    291 
    292 .. data:: QUOTE_NONNUMERIC
    293 
    294    Instructs :class:`writer` objects to quote all non-numeric fields.
    295 
    296    Instructs the reader to convert all non-quoted fields to type *float*.
    297 
    298 
    299 .. data:: QUOTE_NONE
    300 
    301    Instructs :class:`writer` objects to never quote fields.  When the current
    302    *delimiter* occurs in output data it is preceded by the current *escapechar*
    303    character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
    304    any characters that require escaping are encountered.
    305 
    306    Instructs :class:`reader` to perform no special processing of quote characters.
    307 
    308 The :mod:`csv` module defines the following exception:
    309 
    310 
    311 .. exception:: Error
    312 
    313    Raised by any of the functions when an error is detected.
    314 
    315 .. _csv-fmt-params:
    316 
    317 Dialects and Formatting Parameters
    318 ----------------------------------
    319 
    320 To make it easier to specify the format of input and output records, specific
    321 formatting parameters are grouped together into dialects.  A dialect is a
    322 subclass of the :class:`Dialect` class having a set of specific methods and a
    323 single :meth:`validate` method.  When creating :class:`reader` or
    324 :class:`writer` objects, the programmer can specify a string or a subclass of
    325 the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
    326 of, the *dialect* parameter, the programmer can also specify individual
    327 formatting parameters, which have the same names as the attributes defined below
    328 for the :class:`Dialect` class.
    329 
    330 Dialects support the following attributes:
    331 
    332 
    333 .. attribute:: Dialect.delimiter
    334 
    335    A one-character string used to separate fields.  It defaults to ``','``.
    336 
    337 
    338 .. attribute:: Dialect.doublequote
    339 
    340    Controls how instances of *quotechar* appearing inside a field should
    341    themselves be quoted.  When :const:`True`, the character is doubled. When
    342    :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
    343    defaults to :const:`True`.
    344 
    345    On output, if *doublequote* is :const:`False` and no *escapechar* is set,
    346    :exc:`Error` is raised if a *quotechar* is found in a field.
    347 
    348 
    349 .. attribute:: Dialect.escapechar
    350 
    351    A one-character string used by the writer to escape the *delimiter* if *quoting*
    352    is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
    353    :const:`False`. On reading, the *escapechar* removes any special meaning from
    354    the following character. It defaults to :const:`None`, which disables escaping.
    355 
    356 
    357 .. attribute:: Dialect.lineterminator
    358 
    359    The string used to terminate lines produced by the :class:`writer`. It defaults
    360    to ``'\r\n'``.
    361 
    362    .. note::
    363 
    364       The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
    365       end-of-line, and ignores *lineterminator*. This behavior may change in the
    366       future.
    367 
    368 
    369 .. attribute:: Dialect.quotechar
    370 
    371    A one-character string used to quote fields containing special characters, such
    372    as the *delimiter* or *quotechar*, or which contain new-line characters.  It
    373    defaults to ``'"'``.
    374 
    375 
    376 .. attribute:: Dialect.quoting
    377 
    378    Controls when quotes should be generated by the writer and recognised by the
    379    reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
    380    :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
    381 
    382 
    383 .. attribute:: Dialect.skipinitialspace
    384 
    385    When :const:`True`, whitespace immediately following the *delimiter* is ignored.
    386    The default is :const:`False`.
    387 
    388 
    389 .. attribute:: Dialect.strict
    390 
    391    When ``True``, raise exception :exc:`Error` on bad CSV input.
    392    The default is ``False``.
    393 
    394 Reader Objects
    395 --------------
    396 
    397 Reader objects (:class:`DictReader` instances and objects returned by the
    398 :func:`reader` function) have the following public methods:
    399 
    400 .. method:: csvreader.__next__()
    401 
    402    Return the next row of the reader's iterable object as a list (if the object
    403    was returned from :func:`reader`) or a dict (if it is a :class:`DictReader`
    404    instance), parsed according to the current dialect.  Usually you should call
    405    this as ``next(reader)``.
    406 
    407 
    408 Reader objects have the following public attributes:
    409 
    410 .. attribute:: csvreader.dialect
    411 
    412    A read-only description of the dialect in use by the parser.
    413 
    414 
    415 .. attribute:: csvreader.line_num
    416 
    417    The number of lines read from the source iterator. This is not the same as the
    418    number of records returned, as records can span multiple lines.
    419 
    420 
    421 DictReader objects have the following public attribute:
    422 
    423 .. attribute:: csvreader.fieldnames
    424 
    425    If not passed as a parameter when creating the object, this attribute is
    426    initialized upon first access or when the first record is read from the
    427    file.
    428 
    429 
    430 
    431 Writer Objects
    432 --------------
    433 
    434 :class:`Writer` objects (:class:`DictWriter` instances and objects returned by
    435 the :func:`writer` function) have the following public methods.  A *row* must be
    436 an iterable of strings or numbers for :class:`Writer` objects and a dictionary
    437 mapping fieldnames to strings or numbers (by passing them through :func:`str`
    438 first) for :class:`DictWriter` objects.  Note that complex numbers are written
    439 out surrounded by parens. This may cause some problems for other programs which
    440 read CSV files (assuming they support complex numbers at all).
    441 
    442 
    443 .. method:: csvwriter.writerow(row)
    444 
    445    Write the *row* parameter to the writer's file object, formatted according to
    446    the current dialect.
    447 
    448    .. versionchanged:: 3.5
    449       Added support of arbitrary iterables.
    450 
    451 .. method:: csvwriter.writerows(rows)
    452 
    453    Write all elements in *rows* (an iterable of *row* objects as described
    454    above) to the writer's file object, formatted according to the current
    455    dialect.
    456 
    457 Writer objects have the following public attribute:
    458 
    459 
    460 .. attribute:: csvwriter.dialect
    461 
    462    A read-only description of the dialect in use by the writer.
    463 
    464 
    465 DictWriter objects have the following public method:
    466 
    467 
    468 .. method:: DictWriter.writeheader()
    469 
    470    Write a row with the field names (as specified in the constructor).
    471 
    472    .. versionadded:: 3.2
    473 
    474 
    475 .. _csv-examples:
    476 
    477 Examples
    478 --------
    479 
    480 The simplest example of reading a CSV file::
    481 
    482    import csv
    483    with open('some.csv', newline='') as f:
    484        reader = csv.reader(f)
    485        for row in reader:
    486            print(row)
    487 
    488 Reading a file with an alternate format::
    489 
    490    import csv
    491    with open('passwd', newline='') as f:
    492        reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    493        for row in reader:
    494            print(row)
    495 
    496 The corresponding simplest possible writing example is::
    497 
    498    import csv
    499    with open('some.csv', 'w', newline='') as f:
    500        writer = csv.writer(f)
    501        writer.writerows(someiterable)
    502 
    503 Since :func:`open` is used to open a CSV file for reading, the file
    504 will by default be decoded into unicode using the system default
    505 encoding (see :func:`locale.getpreferredencoding`).  To decode a file
    506 using a different encoding, use the ``encoding`` argument of open::
    507 
    508    import csv
    509    with open('some.csv', newline='', encoding='utf-8') as f:
    510        reader = csv.reader(f)
    511        for row in reader:
    512            print(row)
    513 
    514 The same applies to writing in something other than the system default
    515 encoding: specify the encoding argument when opening the output file.
    516 
    517 Registering a new dialect::
    518 
    519    import csv
    520    csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
    521    with open('passwd', newline='') as f:
    522        reader = csv.reader(f, 'unixpwd')
    523 
    524 A slightly more advanced use of the reader --- catching and reporting errors::
    525 
    526    import csv, sys
    527    filename = 'some.csv'
    528    with open(filename, newline='') as f:
    529        reader = csv.reader(f)
    530        try:
    531            for row in reader:
    532                print(row)
    533        except csv.Error as e:
    534            sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
    535 
    536 And while the module doesn't directly support parsing strings, it can easily be
    537 done::
    538 
    539    import csv
    540    for row in csv.reader(['one,two,three']):
    541        print(row)
    542 
    543 
    544 .. rubric:: Footnotes
    545 
    546 .. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
    547    will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
    548    on write an extra ``\r`` will be added.  It should always be safe to specify
    549    ``newline=''``, since the csv module does its own
    550    (:term:`universal <universal newlines>`) newline handling.
    551