Home | History | Annotate | Download | only in library
      1 :mod:`csv` --- CSV File Reading and Writing
      2 ===========================================
      3 
      4 .. module:: csv
      5    :synopsis: Write and read tabular data to and from delimited files.
      6 
      7 .. sectionauthor:: Skip Montanaro <skip (a] pobox.com>
      8 
      9 **Source code:** :source:`Lib/csv.py`
     10 
     11 .. index::
     12    single: csv
     13    pair: data; tabular
     14 
     15 --------------
     16 
     17 The so-called CSV (Comma Separated Values) format is the most common import and
     18 export format for spreadsheets and databases.  CSV format was used for many
     19 years prior to attempts to describe the format in a standardized way in
     20 :rfc:`4180`.  The lack of a well-defined standard means that subtle differences
     21 often exist in the data produced and consumed by different applications.  These
     22 differences can make it annoying to process CSV files from multiple sources.
     23 Still, while the delimiters and quoting characters vary, the overall format is
     24 similar enough that it is possible to write a single module which can
     25 efficiently manipulate such data, hiding the details of reading and writing the
     26 data from the programmer.
     27 
     28 The :mod:`csv` module implements classes to read and write tabular data in CSV
     29 format.  It allows programmers to say, "write this data in the format preferred
     30 by Excel," or "read data from this file which was generated by Excel," without
     31 knowing the precise details of the CSV format used by Excel.  Programmers can
     32 also describe the CSV formats understood by other applications or define their
     33 own special-purpose CSV formats.
     34 
     35 The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
     36 write sequences.  Programmers can also read and write data in dictionary form
     37 using the :class:`DictReader` and :class:`DictWriter` classes.
     38 
     39 .. seealso::
     40 
     41    :pep:`305` - CSV File API
     42       The Python Enhancement Proposal which proposed this addition to Python.
     43 
     44 
     45 .. _csv-contents:
     46 
     47 Module Contents
     48 ---------------
     49 
     50 The :mod:`csv` module defines the following functions:
     51 
     52 
     53 .. index::
     54    single: universal newlines; csv.reader function
     55 
     56 .. function:: reader(csvfile, dialect='excel', **fmtparams)
     57 
     58    Return a reader object which will iterate over lines in the given *csvfile*.
     59    *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
     60    string each time its :meth:`!__next__` method is called --- :term:`file objects
     61    <file object>` and list objects are both suitable.   If *csvfile* is a file object,
     62    it should be opened with ``newline=''``. [1]_  An optional
     63    *dialect* parameter can be given which is used to define a set of parameters
     64    specific to a particular CSV dialect.  It may be an instance of a subclass of
     65    the :class:`Dialect` class or one of the strings returned by the
     66    :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
     67    can be given to override individual formatting parameters in the current
     68    dialect.  For full details about the dialect and formatting parameters, see
     69    section :ref:`csv-fmt-params`.
     70 
     71    Each row read from the csv file is returned as a list of strings.  No
     72    automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
     73    option is specified (in which case unquoted fields are transformed into floats).
     74 
     75    A short usage example::
     76 
     77       >>> import csv
     78       >>> with open('eggs.csv', newline='') as csvfile:
     79       ...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
     80       ...     for row in spamreader:
     81       ...         print(', '.join(row))
     82       Spam, Spam, Spam, Spam, Spam, Baked Beans
     83       Spam, Lovely Spam, Wonderful Spam
     84 
     85 
     86 .. function:: writer(csvfile, dialect='excel', **fmtparams)
     87 
     88    Return a writer object responsible for converting the user's data into delimited
     89    strings on the given file-like object.  *csvfile* can be any object with a
     90    :func:`write` method.  If *csvfile* is a file object, it should be opened with
     91    ``newline=''`` [1]_.  An optional *dialect*
     92    parameter can be given which is used to define a set of parameters specific to a
     93    particular CSV dialect.  It may be an instance of a subclass of the
     94    :class:`Dialect` class or one of the strings returned by the
     95    :func:`list_dialects` function.  The other optional *fmtparams* keyword arguments
     96    can be given to override individual formatting parameters in the current
     97    dialect.  For full details about the dialect and formatting parameters, see
     98    section :ref:`csv-fmt-params`. To make it
     99    as easy as possible to interface with modules which implement the DB API, the
    100    value :const:`None` is written as the empty string.  While this isn't a
    101    reversible transformation, it makes it easier to dump SQL NULL data values to
    102    CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
    103    All other non-string data are stringified with :func:`str` before being written.
    104 
    105    A short usage example::
    106 
    107       import csv
    108       with open('eggs.csv', 'w', newline='') as csvfile:
    109           spamwriter = csv.writer(csvfile, delimiter=' ',
    110                                   quotechar='|', quoting=csv.QUOTE_MINIMAL)
    111           spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    112           spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
    113 
    114 
    115 .. function:: register_dialect(name[, dialect[, **fmtparams]])
    116 
    117    Associate *dialect* with *name*.  *name* must be a string. The
    118    dialect can be specified either by passing a sub-class of :class:`Dialect`, or
    119    by *fmtparams* keyword arguments, or both, with keyword arguments overriding
    120    parameters of the dialect. For full details about the dialect and formatting
    121    parameters, see section :ref:`csv-fmt-params`.
    122 
    123 
    124 .. function:: unregister_dialect(name)
    125 
    126    Delete the dialect associated with *name* from the dialect registry.  An
    127    :exc:`Error` is raised if *name* is not a registered dialect name.
    128 
    129 
    130 .. function:: get_dialect(name)
    131 
    132    Return the dialect associated with *name*.  An :exc:`Error` is raised if
    133    *name* is not a registered dialect name.  This function returns an immutable
    134    :class:`Dialect`.
    135 
    136 .. function:: list_dialects()
    137 
    138    Return the names of all registered dialects.
    139 
    140 
    141 .. function:: field_size_limit([new_limit])
    142 
    143    Returns the current maximum field size allowed by the parser. If *new_limit* is
    144    given, this becomes the new limit.
    145 
    146 
    147 The :mod:`csv` module defines the following classes:
    148 
    149 .. class:: DictReader(f, fieldnames=None, restkey=None, restval=None, \
    150                       dialect='excel', *args, **kwds)
    151 
    152    Create an object that operates like a regular reader but maps the
    153    information in each row to an :mod:`OrderedDict <collections.OrderedDict>`
    154    whose keys are given by the optional *fieldnames* parameter.
    155 
    156    The *fieldnames* parameter is a :term:`sequence`.  If *fieldnames* is
    157    omitted, the values in the first row of file *f* will be used as the
    158    fieldnames.  Regardless of how the fieldnames are determined, the ordered
    159    dictionary preserves their original ordering.
    160 
    161    If a row has more fields than fieldnames, the remaining data is put in a
    162    list and stored with the fieldname specified by *restkey* (which defaults
    163    to ``None``).  If a non-blank row has fewer fields than fieldnames, the
    164    missing values are filled-in with ``None``.
    165 
    166    All other optional or keyword arguments are passed to the underlying
    167    :class:`reader` instance.
    168 
    169    .. versionchanged:: 3.6
    170       Returned rows are now of type :class:`OrderedDict`.
    171 
    172    A short usage example::
    173 
    174        >>> import csv
    175        >>> with open('names.csv') as csvfile:
    176        ...     reader = csv.DictReader(csvfile)
    177        ...     for row in reader:
    178        ...         print(row['first_name'], row['last_name'])
    179        ...
    180        Eric Idle
    181        John Cleese
    182 
    183        >>> print(row)
    184        OrderedDict([('first_name', 'John'), ('last_name', 'Cleese')])
    185 
    186 
    187 .. class:: DictWriter(f, fieldnames, restval='', extrasaction='raise', \
    188                       dialect='excel', *args, **kwds)
    189 
    190    Create an object which operates like a regular writer but maps dictionaries
    191    onto output rows.  The *fieldnames* parameter is a :mod:`sequence
    192    <collections.abc>` of keys that identify the order in which values in the
    193    dictionary passed to the :meth:`writerow` method are written to file
    194    *f*.  The optional *restval* parameter specifies the value to be
    195    written if the dictionary is missing a key in *fieldnames*.  If the
    196    dictionary passed to the :meth:`writerow` method contains a key not found in
    197    *fieldnames*, the optional *extrasaction* parameter indicates what action to
    198    take.
    199    If it is set to ``'raise'``, the default value, a :exc:`ValueError`
    200    is raised.
    201    If it is set to ``'ignore'``, extra values in the dictionary are ignored.
    202    Any other optional or keyword arguments are passed to the underlying
    203    :class:`writer` instance.
    204 
    205    Note that unlike the :class:`DictReader` class, the *fieldnames* parameter
    206    of the :class:`DictWriter` is not optional.  Since Python's :class:`dict`
    207    objects are not ordered, there is not enough information available to deduce
    208    the order in which the row should be written to file *f*.
    209 
    210    A short usage example::
    211 
    212        import csv
    213 
    214        with open('names.csv', 'w') as csvfile:
    215            fieldnames = ['first_name', 'last_name']
    216            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    217 
    218            writer.writeheader()
    219            writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    220            writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    221            writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
    222 
    223 
    224 .. class:: Dialect
    225 
    226    The :class:`Dialect` class is a container class relied on primarily for its
    227    attributes, which are used to define the parameters for a specific
    228    :class:`reader` or :class:`writer` instance.
    229 
    230 
    231 .. class:: excel()
    232 
    233    The :class:`excel` class defines the usual properties of an Excel-generated CSV
    234    file.  It is registered with the dialect name ``'excel'``.
    235 
    236 
    237 .. class:: excel_tab()
    238 
    239    The :class:`excel_tab` class defines the usual properties of an Excel-generated
    240    TAB-delimited file.  It is registered with the dialect name ``'excel-tab'``.
    241 
    242 
    243 .. class:: unix_dialect()
    244 
    245    The :class:`unix_dialect` class defines the usual properties of a CSV file
    246    generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
    247    all fields.  It is registered with the dialect name ``'unix'``.
    248 
    249    .. versionadded:: 3.2
    250 
    251 
    252 .. class:: Sniffer()
    253 
    254    The :class:`Sniffer` class is used to deduce the format of a CSV file.
    255 
    256    The :class:`Sniffer` class provides two methods:
    257 
    258    .. method:: sniff(sample, delimiters=None)
    259 
    260       Analyze the given *sample* and return a :class:`Dialect` subclass
    261       reflecting the parameters found.  If the optional *delimiters* parameter
    262       is given, it is interpreted as a string containing possible valid
    263       delimiter characters.
    264 
    265 
    266    .. method:: has_header(sample)
    267 
    268       Analyze the sample text (presumed to be in CSV format) and return
    269       :const:`True` if the first row appears to be a series of column headers.
    270 
    271 An example for :class:`Sniffer` use::
    272 
    273    with open('example.csv') as csvfile:
    274        dialect = csv.Sniffer().sniff(csvfile.read(1024))
    275        csvfile.seek(0)
    276        reader = csv.reader(csvfile, dialect)
    277        # ... process CSV file contents here ...
    278 
    279 
    280 The :mod:`csv` module defines the following constants:
    281 
    282 .. data:: QUOTE_ALL
    283 
    284    Instructs :class:`writer` objects to quote all fields.
    285 
    286 
    287 .. data:: QUOTE_MINIMAL
    288 
    289    Instructs :class:`writer` objects to only quote those fields which contain
    290    special characters such as *delimiter*, *quotechar* or any of the characters in
    291    *lineterminator*.
    292 
    293 
    294 .. data:: QUOTE_NONNUMERIC
    295 
    296    Instructs :class:`writer` objects to quote all non-numeric fields.
    297 
    298    Instructs the reader to convert all non-quoted fields to type *float*.
    299 
    300 
    301 .. data:: QUOTE_NONE
    302 
    303    Instructs :class:`writer` objects to never quote fields.  When the current
    304    *delimiter* occurs in output data it is preceded by the current *escapechar*
    305    character.  If *escapechar* is not set, the writer will raise :exc:`Error` if
    306    any characters that require escaping are encountered.
    307 
    308    Instructs :class:`reader` to perform no special processing of quote characters.
    309 
    310 The :mod:`csv` module defines the following exception:
    311 
    312 
    313 .. exception:: Error
    314 
    315    Raised by any of the functions when an error is detected.
    316 
    317 .. _csv-fmt-params:
    318 
    319 Dialects and Formatting Parameters
    320 ----------------------------------
    321 
    322 To make it easier to specify the format of input and output records, specific
    323 formatting parameters are grouped together into dialects.  A dialect is a
    324 subclass of the :class:`Dialect` class having a set of specific methods and a
    325 single :meth:`validate` method.  When creating :class:`reader` or
    326 :class:`writer` objects, the programmer can specify a string or a subclass of
    327 the :class:`Dialect` class as the dialect parameter.  In addition to, or instead
    328 of, the *dialect* parameter, the programmer can also specify individual
    329 formatting parameters, which have the same names as the attributes defined below
    330 for the :class:`Dialect` class.
    331 
    332 Dialects support the following attributes:
    333 
    334 
    335 .. attribute:: Dialect.delimiter
    336 
    337    A one-character string used to separate fields.  It defaults to ``','``.
    338 
    339 
    340 .. attribute:: Dialect.doublequote
    341 
    342    Controls how instances of *quotechar* appearing inside a field should
    343    themselves be quoted.  When :const:`True`, the character is doubled. When
    344    :const:`False`, the *escapechar* is used as a prefix to the *quotechar*.  It
    345    defaults to :const:`True`.
    346 
    347    On output, if *doublequote* is :const:`False` and no *escapechar* is set,
    348    :exc:`Error` is raised if a *quotechar* is found in a field.
    349 
    350 
    351 .. attribute:: Dialect.escapechar
    352 
    353    A one-character string used by the writer to escape the *delimiter* if *quoting*
    354    is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
    355    :const:`False`. On reading, the *escapechar* removes any special meaning from
    356    the following character. It defaults to :const:`None`, which disables escaping.
    357 
    358 
    359 .. attribute:: Dialect.lineterminator
    360 
    361    The string used to terminate lines produced by the :class:`writer`. It defaults
    362    to ``'\r\n'``.
    363 
    364    .. note::
    365 
    366       The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
    367       end-of-line, and ignores *lineterminator*. This behavior may change in the
    368       future.
    369 
    370 
    371 .. attribute:: Dialect.quotechar
    372 
    373    A one-character string used to quote fields containing special characters, such
    374    as the *delimiter* or *quotechar*, or which contain new-line characters.  It
    375    defaults to ``'"'``.
    376 
    377 
    378 .. attribute:: Dialect.quoting
    379 
    380    Controls when quotes should be generated by the writer and recognised by the
    381    reader.  It can take on any of the :const:`QUOTE_\*` constants (see section
    382    :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
    383 
    384 
    385 .. attribute:: Dialect.skipinitialspace
    386 
    387    When :const:`True`, whitespace immediately following the *delimiter* is ignored.
    388    The default is :const:`False`.
    389 
    390 
    391 .. attribute:: Dialect.strict
    392 
    393    When ``True``, raise exception :exc:`Error` on bad CSV input.
    394    The default is ``False``.
    395 
    396 Reader Objects
    397 --------------
    398 
    399 Reader objects (:class:`DictReader` instances and objects returned by the
    400 :func:`reader` function) have the following public methods:
    401 
    402 .. method:: csvreader.__next__()
    403 
    404    Return the next row of the reader's iterable object as a list, parsed according
    405    to the current dialect.  Usually you should call this as ``next(reader)``.
    406 
    407 
    408 Reader objects have the following public attributes:
    409 
    410 .. attribute:: csvreader.dialect
    411 
    412    A read-only description of the dialect in use by the parser.
    413 
    414 
    415 .. attribute:: csvreader.line_num
    416 
    417    The number of lines read from the source iterator. This is not the same as the
    418    number of records returned, as records can span multiple lines.
    419 
    420 
    421 DictReader objects have the following public attribute:
    422 
    423 .. attribute:: csvreader.fieldnames
    424 
    425    If not passed as a parameter when creating the object, this attribute is
    426    initialized upon first access or when the first record is read from the
    427    file.
    428 
    429 
    430 
    431 Writer Objects
    432 --------------
    433 
    434 :class:`Writer` objects (:class:`DictWriter` instances and objects returned by
    435 the :func:`writer` function) have the following public methods.  A *row* must be
    436 an iterable of strings or numbers for :class:`Writer` objects and a dictionary
    437 mapping fieldnames to strings or numbers (by passing them through :func:`str`
    438 first) for :class:`DictWriter` objects.  Note that complex numbers are written
    439 out surrounded by parens. This may cause some problems for other programs which
    440 read CSV files (assuming they support complex numbers at all).
    441 
    442 
    443 .. method:: csvwriter.writerow(row)
    444 
    445    Write the *row* parameter to the writer's file object, formatted according to
    446    the current dialect.
    447 
    448    .. versionchanged:: 3.5
    449       Added support of arbitrary iterables.
    450 
    451 .. method:: csvwriter.writerows(rows)
    452 
    453    Write all the *rows* parameters (a list of *row* objects as described above) to
    454    the writer's file object, formatted according to the current dialect.
    455 
    456 Writer objects have the following public attribute:
    457 
    458 
    459 .. attribute:: csvwriter.dialect
    460 
    461    A read-only description of the dialect in use by the writer.
    462 
    463 
    464 DictWriter objects have the following public method:
    465 
    466 
    467 .. method:: DictWriter.writeheader()
    468 
    469    Write a row with the field names (as specified in the constructor).
    470 
    471    .. versionadded:: 3.2
    472 
    473 
    474 .. _csv-examples:
    475 
    476 Examples
    477 --------
    478 
    479 The simplest example of reading a CSV file::
    480 
    481    import csv
    482    with open('some.csv', newline='') as f:
    483        reader = csv.reader(f)
    484        for row in reader:
    485            print(row)
    486 
    487 Reading a file with an alternate format::
    488 
    489    import csv
    490    with open('passwd', newline='') as f:
    491        reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    492        for row in reader:
    493            print(row)
    494 
    495 The corresponding simplest possible writing example is::
    496 
    497    import csv
    498    with open('some.csv', 'w', newline='') as f:
    499        writer = csv.writer(f)
    500        writer.writerows(someiterable)
    501 
    502 Since :func:`open` is used to open a CSV file for reading, the file
    503 will by default be decoded into unicode using the system default
    504 encoding (see :func:`locale.getpreferredencoding`).  To decode a file
    505 using a different encoding, use the ``encoding`` argument of open::
    506 
    507    import csv
    508    with open('some.csv', newline='', encoding='utf-8') as f:
    509        reader = csv.reader(f)
    510        for row in reader:
    511            print(row)
    512 
    513 The same applies to writing in something other than the system default
    514 encoding: specify the encoding argument when opening the output file.
    515 
    516 Registering a new dialect::
    517 
    518    import csv
    519    csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
    520    with open('passwd', newline='') as f:
    521        reader = csv.reader(f, 'unixpwd')
    522 
    523 A slightly more advanced use of the reader --- catching and reporting errors::
    524 
    525    import csv, sys
    526    filename = 'some.csv'
    527    with open(filename, newline='') as f:
    528        reader = csv.reader(f)
    529        try:
    530            for row in reader:
    531                print(row)
    532        except csv.Error as e:
    533            sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
    534 
    535 And while the module doesn't directly support parsing strings, it can easily be
    536 done::
    537 
    538    import csv
    539    for row in csv.reader(['one,two,three']):
    540        print(row)
    541 
    542 
    543 .. rubric:: Footnotes
    544 
    545 .. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
    546    will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
    547    on write an extra ``\r`` will be added.  It should always be safe to specify
    548    ``newline=''``, since the csv module does its own
    549    (:term:`universal <universal newlines>`) newline handling.
    550