Table of Contents
=================

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`Curl_do`](#Curl_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`Curl_done`](#Curl_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
   - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [hostip.c explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)

<a name="intro"></a>
curl internals
==============

 This project is split in two: the library and the client. The client part
 uses the library, but the library is designed to allow other applications to
 use it.

 The largest amount of code and complexity is in the library part.

<a name="git"></a>
git
===

 All changes to the sources are committed to the git repository as soon as
 they're somewhat verified to work. Changes shall be committed as independently
 as possible so that individual changes can be more easily spotted and tracked
 afterwards.

 Tagging shall be used extensively, and by the time we release new archives we
 should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

 We write curl and libcurl to compile with C89 compilers on 32-bit and larger
 machines. Most of libcurl assumes more or less POSIX compliance, but that's
 not a requirement.

 We write libcurl to build and work with lots of third party tools, and we
 want it to remain functional and buildable with these and later versions
 (older versions may still work, but that is not what we work hard to
 maintain):

Dependencies
------------

 - OpenSSL      0.9.7
 - GnuTLS       1.2
 - zlib         1.1.4
 - libssh2      0.16
 - c-ares       1.6.0
 - libidn       0.4.1
 - cyassl       2.0.0
 - openldap     2.0
 - MIT Kerberos 1.2.4
 - GSKit        V5R3M0
 - NSS          3.14.x
 - axTLS        1.2.7
 - PolarSSL     1.3.0
 - Heimdal      ?
 - nghttp2      1.0.0

Operating Systems
-----------------

 On systems where configure runs, we aim to work on all of them, provided they
 have a suitable C compiler. On systems that don't run configure, we strive to
 keep curl running fine on:

 - Windows      98
 - AS/400       V5R3M0
 - Symbian      9.1
 - Windows CE   ?
 - TPF          ?

Build tools
-----------

 When writing code (mostly for generating stuff included in release tarballs)
 we use a few "build tools" and we make sure that we remain functional with
 these versions:

 - GNU Libtool  1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4       1.4
 - perl         5.004
 - roffit       0.5
 - groff        ? (any version that supports "groff -Tps -man [in] [out]")
 - ps2pdf (gs)  ?

<a name="winvsunix"></a>
Windows vs Unix
===============

 There are a few differences in how to program curl the Unix way compared to
 the Windows way. Perhaps the four most notable details are:

 1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are sclose(), sread() and swrite().

 2. Windows requires a couple of init calls for the socket stuff.

   That's taken care of by the `curl_global_init()` call, but if other libs
   also do it there might be reasons for applications to alter that
   behaviour.

 3. The file descriptors for network communication and file operations are
    not as easily interchangeable as in Unix.

   We avoid this by not trying any funny tricks on file descriptors.

 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
    destroying binary data, although you do want that conversion if it is
    text coming through... (sigh)

   We set stdout to binary under Windows.
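
 As a rough illustration of the first point, the sketch below shows how such
 macros could be defined so that calling code looks the same on both
 platforms. The macro names come from the text above; their bodies, the
 `sock_t` typedef and the `echo_once()` helper are assumptions made for this
 example and are not copied from the curl headers.

```c
#include <assert.h>
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#define sclose(s)        closesocket(s)
#define sread(s, b, n)   recv((s), (char *)(b), (int)(n), 0)
#define swrite(s, b, n)  send((s), (const char *)(b), (int)(n), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define sclose(s)        close(s)
#define sread(s, b, n)   recv((s), (b), (n), 0)
#define swrite(s, b, n)  send((s), (b), (n), 0)
#endif

/* Hypothetical helper: write 'msg' to one socket and read it back from
   the connected peer socket. Returns the number of bytes read, or -1 if
   the write was incomplete. No platform #ifdef is needed here. */
static long echo_once(sock_t out, sock_t in, const char *msg,
                      char *reply, size_t replysize)
{
  size_t len = strlen(msg);
  long got;
  assert(replysize > len);
  if((long)swrite(out, msg, len) != (long)len)
    return -1;
  got = (long)sread(in, reply, replysize - 1);
  if(got >= 0)
    reply[got] = '\0';
  return got;
}
```

 Only the header-like part at the top differs between platforms; the helper
 below it is written once.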

 Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
 conditionals that deal with features *should* instead be in the format
 `#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
 we maintain a `curl_config-win32.h` file in the lib directory that is supposed
 to look exactly like a `curl_config.h` file would have looked on a Windows
 machine!
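
 A minimal sketch of the feature-test style described above. The
 `HAVE_STRCASECMP` symbol and the `equal_nocase()` function are hypothetical
 names picked for this example; the point is only that the conditional asks
 "does this function exist?" rather than "which OS is this?".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* HAVE_STRCASECMP is assumed to come from configure, or from a
   hand-maintained config header on systems that cannot run configure. */
#ifdef HAVE_STRCASECMP
#include <strings.h>
#endif

/* Returns 1 if the two strings are equal ignoring ASCII case, else 0. */
static int equal_nocase(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
#ifdef HAVE_STRCASECMP
  return strcasecmp(a, b) == 0;
#else
  /* Portable fallback for platforms without strcasecmp(). */
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
#endif
}
```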

 Generally speaking: always remember that this will be compiled on dozens of
 operating systems. Don't walk on the edge.

<a name="Library"></a>
Library
=======

 (See [Structs in libcurl](#structs) for the separate section describing all
 major internal structs and their purposes.)

 There are plenty of entry points to the library, namely each publicly defined
 function that libcurl offers to applications. All of those functions are
 rather small and easy to follow. All the ones prefixed with `curl_easy` are
 put in the lib/easy.c file.

 `curl_global_init()` and `curl_global_cleanup()` should be called by the
 application to initialize and clean up global stuff in the library. As of
 today, they can handle the global SSL initing if SSL is enabled and they can
 init the socket layer on Windows machines. libcurl itself has no "global"
 scope.

 All printf()-style functions use the supplied clones in lib/mprintf.c. This
 makes sure we stay absolutely platform independent.

 [`curl_easy_init()`][2] allocates an internal struct and makes some
 initializations. The returned handle does not reveal internals. This is the
 'Curl_easy' struct which works as an "anchor" struct for all `curl_easy`
 functions. All connections performed will get connect-specific data allocated
 that should be used for things related to particular connections/requests.

 [`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
 be passed in pairs: the parameter-ID and the parameter-value. The list of
 options is documented in the man page. This function mainly sets things in
 the 'Curl_easy' struct.

 `curl_easy_perform()` is just a wrapper function that makes use of the multi
 API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
 `curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
 and then returns.

 Some of the most important key functions in url.c are called from multi.c
 when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

   Analyzes the URL, separates its different components and connects to the
   remote host. This may involve using a proxy and/or using SSL. The
   `Curl_resolv()` function in lib/hostip.c is used for looking up host names
   (it then uses the proper underlying method, which may vary between
   platforms and builds).

   When `Curl_connect` is done, we are connected to the remote site. Then it
   is time to tell the server to get a document/file. `Curl_do()` arranges
   this.

   This function makes sure there's an allocated and initiated 'connectdata'
   struct that is used for this particular connection only (although there may
   be several requests performed on the same connection). A bunch of things
   are inited/inherited from the Curl_easy struct.

<a name="Curl_do"></a>
Curl_do()
---------

   `Curl_do()` makes sure the proper protocol-specific function is called. The
   functions are named after the protocols they handle.

   The protocol-specific functions of course deal with protocol-specific
   negotiations and setup. They have access to the `Curl_sendf()` (from
   lib/sendf.c) function to send printf-style formatted data to the remote
   host and when they're ready to make the actual file transfer they call the
   `Curl_setup_transfer()` function (in lib/transfer.c) to set up the transfer
   and return.

   If this DO function fails and the connection is being re-used, libcurl will
   then close this connection, set up a new connection and re-issue the DO
   request on that. This is because there is no way to be perfectly sure that
   we have discovered a dead connection before the DO function and thus we
   might wrongly be re-using a connection that was closed by the remote peer.

   Some time during the DO function, the `Curl_setup_transfer()` function must
   be called with some basic info about the upcoming transfer: what socket(s)
   to read/write and the expected file transfer sizes (if known).

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

   Called during the transfer of the actual protocol payload.

   During transfer, the progress functions in lib/progress.c are called at a
   frequent interval (or at the user's choice, a specified callback might get
   called). The speedcheck functions in lib/speedcheck.c are also used to
   verify that the transfer is as fast as required.

<a name="Curl_done"></a>
Curl_done()
-----------

   Called after a transfer is done. This function takes care of everything
   that has to be done after a transfer. This function attempts to leave
   matters in a state so that `Curl_do()` can be called again on the same
   connection (in a persistent connection case). The connection might also
   soon be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

   When doing normal connections and transfers, no one ever tries to close any
   connections so this is not normally called when `curl_easy_perform()` is
   used. This function is only used when we are certain that no more transfers
   are going to be made on the connection. It can also be called to close a
   connection by force, or to make sure that libcurl doesn't keep too many
   connections alive at the same time.

   This function cleans up all resources that are associated with a single
   connection.

<a name="http"></a>
HTTP(S)
=======

 HTTP offers a lot and is the protocol in curl that uses the most lines of
 code. There is a special file (lib/formdata.c) that offers all the multipart
 post functions.

 base64 functions for user+password stuff (and more) are in lib/base64.c and
 all functions for parsing and sending cookies are found in lib/cookie.c.

 HTTPS uses almost exactly the same procedure as HTTP, with only two
 exceptions: the connect procedure is different and the function used to read
 or write from the socket is different, although the latter fact is hidden in
 the source by the use of `Curl_read()` for reading and `Curl_write()` for
 writing data to the remote server.

 `http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
 encoding.
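
 To make the term concrete, here is a small sketch of what decoding a chunked
 body involves. It is not the code from `http_chunks.c`: the `dechunk()` name
 is made up for this example, and it assumes the complete body is already in
 memory as a NUL-terminated string, without chunk extensions or trailers. The
 real decoder has to cope with data arriving piece by piece.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Decode a complete chunked body. Returns the number of decoded bytes
   written to 'out', or -1 on malformed input or if 'out' is too small.
   Size parsing is as lenient as strtoul(), which a strict parser would
   not be. */
static long dechunk(const char *in, char *out, size_t outsize)
{
  size_t used = 0;
  assert(in != NULL && out != NULL);
  for(;;) {
    char *end;
    unsigned long size = strtoul(in, &end, 16); /* hex chunk size */
    if(end == in || end[0] != '\r' || end[1] != '\n')
      return -1;
    in = end + 2;
    if(size == 0)
      break; /* a zero-sized chunk ends the body */
    if(strlen(in) < size + 2 || used + size > outsize)
      return -1;
    memcpy(out + used, in, size);
    used += size;
    in += size;
    if(in[0] != '\r' || in[1] != '\n')
      return -1; /* every chunk must be followed by CRLF */
    in += 2;
  }
  return (long)used;
}
```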

 An interesting detail of the HTTP(S) request is the `Curl_add_buffer()`
 series of functions we use. They append data to one single buffer, and when
 the building is done the entire request is sent off in one single write. This
 is done to overcome problems with flawed firewalls and lame servers.
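
 The idea of collecting the whole request before sending it can be sketched
 as below. The `struct reqbuf` type and the `reqbuf_add()` function are
 hypothetical stand-ins written for this example; they do not mirror the
 actual `Curl_add_buffer()` signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not libcurl's actual struct. */
struct reqbuf {
  char *data;
  size_t len;   /* bytes used, excluding the terminating NUL */
  size_t cap;   /* bytes allocated */
};

/* Append a string, growing the buffer as needed. Returns 0 on success
   and -1 if memory runs out. */
static int reqbuf_add(struct reqbuf *b, const char *s)
{
  size_t n;
  assert(b != NULL && s != NULL);
  n = strlen(s);
  if(b->len + n + 1 > b->cap) {
    size_t newcap = (b->len + n + 1) * 2;
    char *p = realloc(b->data, newcap);
    if(!p)
      return -1;
    b->data = p;
    b->cap = newcap;
  }
  memcpy(b->data + b->len, s, n + 1); /* copies the NUL too */
  b->len += n;
  return 0;
}
```

 Once all headers have been appended, a caller would pass `data` and `len`
 to a single write call, so that the request leaves in one piece.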

<a name="ftp"></a>
FTP
===

 The `Curl_if2ip()` function can be used for getting the IP number of a
 specified network interface, and it resides in lib/if2ip.c.

 `Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
 was made a separate function to prevent us programmers from forgetting that
 commands must be CRLF terminated. They must also be sent in one single
 write() to make firewalls and similar happy.
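
 A sketch of why a dedicated helper is useful here: the CRLF is appended in
 exactly one place, and the finished command sits in one buffer ready for a
 single write. `ftp_format_cmd()` is a hypothetical name for this example and
 only does the formatting step; it is not the signature of `Curl_ftpsendf()`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format one FTP command into 'out' and append CRLF. Returns the total
   length including the CRLF, or -1 if the command does not fit. */
static int ftp_format_cmd(char *out, size_t outsize, const char *fmt, ...)
{
  va_list ap;
  int n;
  assert(out != NULL && outsize > 2);
  va_start(ap, fmt);
  n = vsnprintf(out, outsize - 2, fmt, ap); /* leave room for CRLF */
  va_end(ap);
  if(n < 0 || (size_t)n >= outsize - 2)
    return -1;
  memcpy(out + n, "\r\n", 3); /* CRLF plus the terminating NUL */
  return n + 2;
}
```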

<a name="kerberos"></a>
Kerberos
--------

 Kerberos support is mainly in lib/krb5.c and lib/security.c but also
 `curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
 `socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

 Telnet is implemented in lib/telnet.c.

<a name="file"></a>
FILE
====

 The file:// protocol is dealt with in lib/file.c.

<a name="smb"></a>
SMB
===

 The smb:// protocol is dealt with in lib/smb.c.

<a name="ldap"></a>
LDAP
====

 Everything LDAP is in lib/ldap.c and lib/openldap.c.

<a name="email"></a>
E-mail
======

 The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.

<a name="general"></a>
General
=======

 URL encoding and decoding, called escaping and unescaping in the source code,
 is found in lib/escape.c.

 While transferring data in Transfer() a few functions might get used.
 `curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).

 lib/getenv.c offers `curl_getenv()` which is for reading environment
 variables in a neat platform independent way. That's used in the client, but
 also in lib/url.c when checking the proxy environment variables. Note that
 contrary to the normal Unix getenv(), this returns an allocated buffer that
 must be free()ed after use.
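
 The ownership rule in the last sentence can be sketched like this.
 `getenv_dup()` is a hypothetical function written for this example to show
 the "caller frees the result" contract; it is not the body of
 `curl_getenv()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of an environment variable, or NULL if it
   is unset or memory runs out. The caller owns the result and must
   free() it. */
static char *getenv_dup(const char *name)
{
  const char *value;
  char *copy;
  assert(name != NULL);
  value = getenv(name);
  if(!value)
    return NULL;
  copy = malloc(strlen(value) + 1);
  if(copy)
    strcpy(copy, value);
  return copy;
}
```

 Unlike a pointer returned by plain getenv(), the copy stays valid even if
 the environment is modified later.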

 lib/netrc.c holds the .netrc parser.

 lib/timeval.c features replacement functions for systems that don't have
 gettimeofday() and a few support functions for timeval conversions.

 A function named `curl_version()` that returns the full curl version string
 is found in lib/version.c.

<a name="persistent"></a>
Persistent Connections
======================

 The persistent connection support in libcurl requires some considerations on
 how to do things inside of the library.

 - The 'Curl_easy' struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root data
   as well as all the options etc that the library-user may choose.

 - The 'Curl_easy' struct holds the "connection cache" (an array of
   pointers to 'connectdata' structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already
   existing connection in the cache that we can use. Otherwise it creates a
   new one and adds that to the cache. If the cache is already full when a new
   connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to, and protocols may signal
   closure on connections and then they won't be kept open of course.

 - When `curl_easy_cleanup()` is called, we close all still opened connections,
   unless of course the multi interface "owns" the connections.

 The curl handle must be re-used in order for the persistent connections to
 work.
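
 The lookup-or-create rule from the list above can be sketched with a tiny
 fixed-size cache. Everything here is hypothetical and heavily simplified
 (the struct layouts, `CACHE_SIZE`, matching on host name only, and
 `cache_get()` itself); it only illustrates re-using an idle connection and
 replacing the oldest idle one when the cache is full.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 2

/* Hypothetical stand-ins for libcurl's structs. */
struct conn {
  char host[64];
  int open;            /* non-zero once the slot holds a connection */
  int inuse;           /* non-zero while a transfer uses the connection */
  unsigned long age;   /* creation order; a lower value means older */
};

struct conncache {
  struct conn slot[CACHE_SIZE];
  unsigned long clock;
};

/* Return an idle open connection to 'host' if there is one. Otherwise
   set up a new one, taking an empty slot or else replacing the oldest
   idle connection. Returns NULL if every slot is busy. */
static struct conn *cache_get(struct conncache *c, const char *host)
{
  struct conn *victim = NULL;
  int i;
  assert(c != NULL && strlen(host) < sizeof(victim->host));
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *k = &c->slot[i];
    if(k->open && !k->inuse && !strcmp(k->host, host)) {
      k->inuse = 1;    /* re-use the existing connection */
      return k;
    }
    if(k->inuse)
      continue;
    /* prefer an empty slot, otherwise the oldest idle connection */
    if(!victim || (victim->open && (!k->open || k->age < victim->age)))
      victim = k;
  }
  if(!victim)
    return NULL;
  strcpy(victim->host, host);  /* "close" the old one, "connect" anew */
  victim->open = 1;
  victim->inuse = 1;
  victim->age = ++c->clock;
  return victim;
}
```

 When a transfer completes, the caller would clear `inuse` and leave `open`
 set, which is what keeps the connection available for the next transfer.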

<a name="multi"></a>
multi interface/non-blocking
============================

 The multi interface is a non-blocking interface to the library. To make that
 interface work as well as possible, no low-level functions within libcurl
 must be written to work in a blocking manner. (There are still a few spots
 violating this rule.)

 One of the primary reasons we introduced c-ares support was to allow the name
 resolve phase to be perfectly non-blocking as well.

 The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
 the code to allow non-blocking operations even on multi-stage
 command-response protocols. They are built around state machines that return
 when they would otherwise block waiting for data. The DICT, LDAP and TELNET
 protocols are crappy examples and they are subject to rewrite in the future
 to better fit the libcurl protocol family.
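
 The "return instead of block" pattern can be sketched with a toy state
 machine for a three-step login exchange. The state names, `struct machine`
 and `machine_step()` are invented for this example and do not correspond to
 the actual FTP code; the sketch also accepts any non-zero reply code rather
 than validating it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for a login sequence. */
enum state {
  ST_WAIT_GREETING,
  ST_WAIT_USER_REPLY,
  ST_WAIT_PASS_REPLY,
  ST_DONE
};

struct machine {
  enum state state;
  int steps;          /* number of transitions made, for illustration */
};

/* Advance the machine by at most one step. 'reply' is the next server
   response code, or 0 if nothing has arrived yet. Returns 1 when the
   sequence is complete and 0 when the caller should come back later,
   instead of blocking here. */
static int machine_step(struct machine *m, int reply)
{
  assert(m != NULL);
  if(m->state == ST_DONE)
    return 1;
  if(reply == 0)
    return 0;              /* would block: hand control back */
  switch(m->state) {
  case ST_WAIT_GREETING:   /* greeting seen, USER would be sent here */
    m->state = ST_WAIT_USER_REPLY;
    break;
  case ST_WAIT_USER_REPLY: /* USER answered, PASS would be sent here */
    m->state = ST_WAIT_PASS_REPLY;
    break;
  case ST_WAIT_PASS_REPLY: /* PASS answered, login complete */
    m->state = ST_DONE;
    break;
  case ST_DONE:
    break;
  }
  m->steps++;
  return m->state == ST_DONE;
}
```

 Because all progress is kept in the struct, the caller is free to service
 other transfers between calls.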

<a name="ssl"></a>
SSL libraries
=============

 Originally libcurl supported SSLeay for SSL/TLS transports. That support was
 then extended to its successor OpenSSL, and has since been extended to
 several other SSL/TLS libraries. We expect and hope to further extend the
 support in future libcurl versions.

 To deal with this internally in the best way possible, we have a generic SSL
 function API as provided by the vtls/vtls.[ch] system, and they are the only
 SSL functions we must use from within libcurl. vtls is then crafted to use
 the appropriate lower-level function calls to whatever SSL library is in
 use. For example vtls/openssl.[ch] for the OpenSSL library.

<a name="symbols"></a>
Library Symbols
===============

 All symbols used internally in libcurl must use a `Curl_` prefix if they're
 used in more than a single file. Single-file symbols must be made static.
 Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
 but they are to be changed to follow this pattern in future versions.) Public
 API functions are marked with `CURL_EXTERN` in the public header files so
 that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

 I've made things simple. Almost every function in libcurl returns a CURLcode,
 which must be `CURLE_OK` if everything is OK or otherwise a suitable error
 code as the curl/curl.h include file defines. The very spot that detects an
 error must use the `Curl_failf()` function to set the human-readable error
 description.

 To help the user understand what's happening and to debug curl usage, we
 must supply a fair amount of informational messages by using the
 `Curl_infof()` function. Those messages are only displayed when the user
 explicitly asks for them. They are best used when revealing information that
 isn't otherwise obvious.

<a name="abi"></a>
API/ABI
=======

 We make an effort to not export or show internals or how internals work, as
 that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
 for our promise to users.

<a name="client"></a>
Client
======

 main() resides in `src/tool_main.c`.

 `src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
 to display the complete "manual" and the src/tool_urlglob.c file holds the
 functions used for the URL-"globbing" support. Globbing in the sense that the
 {} and [] expansion stuff is there.

 The client mostly messes around to set up its 'config' struct properly, then
 it calls the `curl_easy_*()` functions of the library and when it gets back
 control after the `curl_easy_perform()` it cleans up the library, checks
 status and exits.

 When the operation is done, the ourWriteOut() function in src/writeout.c may
 be called to report about the operation. That function uses the
 `curl_easy_getinfo()` function to extract useful information from the curl
 session.

 It may loop and do all this several times if many URLs were specified on the
 command line or config file.

<a name="memorydebug"></a>
Memory Debugging
================

 The file lib/memdebug.c contains debug versions of a few functions: those
 such as malloc, free, fopen and fclose that deal with resources that might
 give us problems if we "leak" them. The functions in the memdebug system do
 nothing fancy, they do their normal function and then log information about
 what they just did. The logged data can then be analyzed after a complete
 session.
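
 A minimal sketch of the "do the normal job, then log it" idea. The function
 names, the `memlog` and `live_allocs` variables and the log line format are
 all assumptions made for this example; they are not the names or the log
 format used by lib/memdebug.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *memlog;      /* log destination; NULL disables logging */
static long live_allocs;  /* allocations that have not been freed yet */

static void *dbg_malloc(size_t size, const char *file, int line)
{
  void *p;
  assert(file != NULL);
  p = malloc(size);       /* do the normal job first */
  if(p)
    live_allocs++;
  if(memlog)
    fprintf(memlog, "MEM %s:%d malloc(%lu) = %p\n",
            file, line, (unsigned long)size, p);
  return p;
}

static void dbg_free(void *p, const char *file, int line)
{
  assert(file != NULL);
  if(p)
    live_allocs--;
  if(memlog)
    fprintf(memlog, "MEM %s:%d free(%p)\n", file, line, p);
  free(p);
}

/* Call sites stay unchanged; the macros add the source location. */
#define malloc(size) dbg_malloc(size, __FILE__, __LINE__)
#define free(ptr)    dbg_free(ptr, __FILE__, __LINE__)
```

 A non-zero `live_allocs` at exit, or an unmatched malloc line in the log,
 points at a leak and at the source line that allocated it.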

 memanalyze.pl is the perl script present in tests/ that analyzes a log file
 generated by the memory tracking system. It detects if resources are
 allocated but never freed and other kinds of errors related to resource
 management.

 Internally, the preprocessor symbol DEBUGBUILD guards code that is only
 compiled for debug-enabled builds, and the symbol CURLDEBUG marks code that
 is _only_ used for memory tracking/debugging.

 Use -DCURLDEBUG when compiling to enable memory debugging; this is also
 switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
 when compiling to enable a debug build or run configure with --enable-debug.

 curl --version will list the 'Debug' feature for debug-enabled builds, and
 will list the 'TrackMemory' feature for builds capable of curl debug memory
 tracking. These features are independent and can be controlled when running
 the configure script. When --enable-debug is given both features will be
 enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

 The test suite is placed in its own subdirectory directly off the root in the
 curl archive tree, and it contains a bunch of scripts and a lot of test case
 data.

 The main test script is runtests.pl, which invokes test servers like
 httpserver.pl and ftpserver.pl before all the test cases are performed. The
 test suite currently only runs on Unix-like platforms.

 You'll find a description of the test suite in the tests/README file, and a
 description of the test case data files in the tests/FILEFORMAT file.

 The test suite automatically detects if curl was built with memory
 debugging enabled, and if it was it will detect memory leaks, too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

 libcurl can be built to do name resolves asynchronously, using either the
 normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
-----------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

 First I compiled c-ares. I changed the default C runtime library to be the
 single-threaded rather than the multi-threaded (this seems to be required to
 prevent linking errors later on). Then I simply built the areslib project
 (the other projects adig/ahost seem to fail under MSVC).

 Next was libcurl. I opened lib/config-win32.h and I added a:
 `#define USE_ARES 1`

 Next thing I did was I added the path for the ares includes to the include
 path, and the libares.lib to the libraries.

 Lastly, I also changed libcurl to be single-threaded rather than
 multi-threaded, again this was to prevent some duplicate symbol errors. I'm
 not sure why I needed to change everything to single-threaded, but when I
 didn't I got redefinition errors for several CRT functions (malloc, stricmp,
 etc.)

<a name="curl_off_t"></a>
`curl_off_t`
============

 curl_off_t is a data type provided by the external libcurl include
 headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
 options that end with LARGE. The type is 64 bits wide on most modern
 platforms.

<a name="curlx"></a>
curlx
=====

 The libcurl source code offers a few functions by source only. They are not
 part of the official libcurl API, but the source files might be useful for
 others so apps can optionally compile/build with these sources to gain
 additional functions.

 We provide them through a single header file for easy access for apps:
 "curlx.h"

`curlx_strtoofft()`
-------------------
   A macro that converts a string containing a number to a curl_off_t number.
   This might use the curlx_strtoll() function which is provided as source
   code in strtoofft.c. Note that the function is only provided if no
   strtoll() (or equivalent) function exists on your platform. If curl_off_t
   is only a 32 bit number on your platform, this macro uses strtol().
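
 The selection logic described above can be sketched as follows. `off_type`
 stands in for curl_off_t, and `str_to_off()` and `parse_size()` are
 hypothetical names for this example. Testing `LLONG_MAX` is only one
 plausible way to detect a 64-bit type; the real headers decide this per
 platform.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Pick the conversion function that matches the width of the type. */
#ifdef LLONG_MAX
typedef long long off_type;
#define str_to_off(str, endp, base) strtoll((str), (endp), (base))
#else
typedef long off_type;
#define str_to_off(str, endp, base) strtol((str), (endp), (base))
#endif

/* Parse a decimal size such as a Content-Length value. Returns 0 on
   success and -1 if the string does not start with a number. */
static int parse_size(const char *str, off_type *out)
{
  char *end;
  assert(str != NULL && out != NULL);
  *out = str_to_off(str, &end, 10);
  return (end == str) ? -1 : 0;
}
```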

`curlx_tvnow()`
---------------
   Returns a struct timeval for the current time.

`curlx_tvdiff()`
----------------
   Returns the difference between two timeval structs, in number of
   milliseconds.

`curlx_tvdiff_secs()`
---------------------
   Returns the same as curlx_tvdiff but with full usec resolution (as a
   double).
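
 The arithmetic behind the two difference functions can be sketched as
 below. `tv_diff_ms()` and `tv_diff_secs()` are hypothetical names for this
 example; they illustrate the computation and are not the curlx sources.

```c
#include <assert.h>
#include <sys/time.h>

/* Difference newer - older in whole milliseconds. */
static long tv_diff_ms(struct timeval newer, struct timeval older)
{
  /* Both inputs are expected to be normalized. */
  assert(newer.tv_usec >= 0 && newer.tv_usec < 1000000);
  assert(older.tv_usec >= 0 && older.tv_usec < 1000000);
  return (long)(newer.tv_sec - older.tv_sec) * 1000 +
         (long)(newer.tv_usec - older.tv_usec) / 1000;
}

/* The same difference in seconds, keeping microsecond resolution. */
static double tv_diff_secs(struct timeval newer, struct timeval older)
{
  return (double)(newer.tv_sec - older.tv_sec) +
         (double)(newer.tv_usec - older.tv_usec) / 1000000.0;
}
```

 Note that the microsecond part may be negative when the newer time has a
 smaller tv_usec; adding it to the seconds part still gives the right total.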

Future
------

 Several functions will be removed from the public curl_ name space in a
 future libcurl release. They will then only become available as curlx_
 functions instead. To make the transition easier, we already today provide
 these functions with the curlx_ prefix to allow sources to get built properly
 with the new function names. The functions this concerns are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

    653 <a name="contentencoding"></a>
    654 Content Encoding
    655 ================
    656 
    657 ## About content encodings
    658 
    659  [HTTP/1.1][4] specifies that a client may request that a server encode its
    660  response. This is usually used to compress a response using one of a set of
    661  commonly available compression techniques. These schemes are 'deflate' (the
    662  zlib algorithm), 'gzip' and 'compress'. A client requests that the sever
    663  perform an encoding by including an Accept-Encoding header in the request
    664  document. The value of the header should be one of the recognized tokens
    665  'deflate', ... (there's a way to register new schemes/tokens, see sec 3.5 of
    666  the spec). A server MAY honor the client's encoding request. When a response
    667  is encoded, the server includes a Content-Encoding header in the
    668  response. The value of the Content-Encoding header indicates which scheme was
    669  used to encode the data.
    670 
    671  A client may tell a server that it can understand several different encoding
    672  schemes. In this case the server may choose any one of those and use it to
    673  encode the response (indicating which one using the Content-Encoding header).
    674  It's also possible for a client to attach priorities to different schemes so
    675  that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
    676  information on the Accept-Encoding header.
    677 
    678 ## Supported content encodings
    679 
    680  The 'deflate' and 'gzip' content encoding are supported by libcurl. Both
    681  regular and chunked transfers work fine.  The zlib library is required for
    682  this feature.
    683 
    684 ## The libcurl interface
    685 
    686  To cause libcurl to request a content encoding use:
    687 
    688   [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
    689 
    690  where string is the intended value of the Accept-Encoding header.
    691 
    692  Currently, libcurl only understands how to process responses that use the
    693  "deflate" or "gzip" Content-Encoding, so the only values for
    694  [`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides "identity," which does
    695  nothing) are "deflate" and "gzip" If a response is encoded using the
    696  "compress" or methods, libcurl will return an error indicating that the
    697  response could not be decoded.  If <string> is NULL no Accept-Encoding header
    698  is generated.  If <string> is a zero-length string, then an Accept-Encoding
    699  header containing all supported encodings will be generated.
    700 
 The [`CURLOPT_ACCEPT_ENCODING`][5] option must be set to any non-NULL value for
    702  content to be automatically decoded.  If it is not set and the server still
    703  sends encoded content (despite not having been asked), the data is returned
    704  in its raw form and the Content-Encoding type is not checked.
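
 The three cases for the option value described above can be modelled in a
 few lines. This is a minimal sketch rather than libcurl's own code:
 `demo_accept_encoding_header()` and `demo_supported` are hypothetical names,
 and the list of supported encodings is assumed to be just the two named in
 this section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the only supported encodings are the two the text names. */
static const char *const demo_supported = "deflate, gzip";

/*
 * Hypothetical helper: writes the Accept-Encoding header line that would
 * result from the option value into buf. Returns 1 if a header is
 * generated, 0 if none is (NULL value, or buf too small to hold it).
 */
int demo_accept_encoding_header(const char *value, char *buf, size_t len)
{
  int n;
  assert(buf && len);
  if(!value)
    return 0;                 /* NULL: no Accept-Encoding header at all */
  if(strlen(value) == 0)
    value = demo_supported;   /* "": advertise every supported encoding */
  n = snprintf(buf, len, "Accept-Encoding: %s", value);
  return (n >= 0 && (size_t)n < len) ? 1 : 0;
}
```

 In a real application, the value is simply passed to `curl_easy_setopt()` as
 shown above and libcurl builds the header itself.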
    705 
    706 ## The curl interface
    707 
    708  Use the [--compressed][6] option with curl to cause it to ask servers to
    709  compress responses using any format supported by curl.
    710 
    711 <a name="hostip"></a>
    712 hostip.c explained
    713 ==================
    714 
 The main compile-time defines to keep in mind when reading the host*.c source
 files are these:
    717 
    718 ## `CURLRES_IPV6`
    719 
    720  this host has getaddrinfo() and family, and thus we use that. The host may
    721  not be able to resolve IPv6, but we don't really have to take that into
    722  account. Hosts that aren't IPv6-enabled have CURLRES_IPV4 defined.
    723 
    724 ## `CURLRES_ARES`
    725 
    726  is defined if libcurl is built to use c-ares for asynchronous name
    727  resolves. This can be Windows or *nix.
    728 
    729 ## `CURLRES_THREADED`
    730 
    731  is defined if libcurl is built to use threading for asynchronous name
    732  resolves. The name resolve will be done in a new thread, and the supported
    733  asynch API will be the same as for ares-builds. This is the default under
    734  (native) Windows.
    735 
 If either of the two previous is defined, `CURLRES_ASYNCH` is defined too. If
    737  libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
    738  defined.
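
 The relationship between these defines can be sketched with the
 preprocessor. The fragment below is an illustration only: the `DEMO_`
 prefixed names are hypothetical stand-ins, and the two input switches are
 assumptions that take the place of whatever the real build configuration
 provides.

```c
/* Hypothetical build switches; enable at most one to try the variants. */
/* #define DEMO_USE_ARES 1 */
/* #define DEMO_USE_THREADS 1 */

#if defined(DEMO_USE_ARES)
#  define DEMO_CURLRES_ARES       /* asynchronous resolves via c-ares */
#  define DEMO_CURLRES_ASYNCH
#elif defined(DEMO_USE_THREADS)
#  define DEMO_CURLRES_THREADED   /* asynchronous resolves in a thread */
#  define DEMO_CURLRES_ASYNCH
#else
#  define DEMO_CURLRES_SYNCH      /* no asynchronous resolver selected */
#endif

/* Exactly one of the two modes must end up defined. */
#if defined(DEMO_CURLRES_ASYNCH) == defined(DEMO_CURLRES_SYNCH)
#  error "exactly one of ASYNCH and SYNCH must be defined"
#endif
```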
    739 
    740 ## host*.c sources
    741 
 The host*.c source files are split up like this:
    743 
    744  - hostip.c      - method-independent resolver functions and utility functions
    745  - hostasyn.c    - functions for asynchronous name resolves
    746  - hostsyn.c     - functions for synchronous name resolves
    747  - asyn-ares.c   - functions for asynchronous name resolves using c-ares
    748  - asyn-thread.c - functions for asynchronous name resolves using threads
    749  - hostip4.c     - IPv4 specific functions
    750  - hostip6.c     - IPv6 specific functions
    751 
    752  The hostip.h is the single united header file for all this. It defines the
    753  `CURLRES_*` defines based on the config*.h and curl_setup.h defines.
    754 
    755 <a name="memoryleak"></a>
    756 Track Down Memory Leaks
    757 =======================
    758 
    759 ## Single-threaded
    760 
  Please note that this memory leak system is not adjusted to work in more
  than one thread. If you want or need to use it in a multi-threaded app,
  please adjust accordingly.
    764 
    765 
    766 ## Build
    767 
    768   Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
    769   --enable-debug fixes this). 'make clean' first, then 'make' so that all
    770   files actually are rebuilt properly. It will also make sense to build
    771   libcurl with the debug option (usually -g to the compiler) so that debugging
    772   it will be easier if you actually do find a leak in the library.
    773 
    774   This will create a library that has memory debugging enabled.
    775 
    776 ## Modify Your Application
    777 
    778   Add a line in your application code:
    779 
    780        `curl_memdebug("dump");`
    781 
    782   This will make the malloc debug system output a full trace of all resource
    783   using functions to the given file name. Make sure you rebuild your program
    784   and that you link with the same libcurl you built for this purpose as
    785   described above.
    786 
    787 ## Run Your Application
    788 
    789   Run your program as usual. Watch the specified memory trace file grow.
    790 
  Make your program exit and use the proper libcurl cleanup functions etc., so
  that all non-leaks are returned/freed properly.
    793 
    794 ## Analyze the Flow
    795 
    796   Use the tests/memanalyze.pl perl script to analyze the dump file:
    797 
    798     tests/memanalyze.pl dump
    799 
  This now outputs a report on which resources were allocated but never
  freed etc. This report is well suited for posting to the mailing list!
    802 
    803   If this doesn't produce any output, no leak was detected in libcurl. Then
  the leak is most likely to be in your code.
    805 
    806 <a name="multi_socket"></a>
    807 `multi_socket`
    808 ==============
    809 
    810  Implementation of the `curl_multi_socket` API
    811 
    812   The main ideas of this API are simply:
    813 
    814    1 - The application can use whatever event system it likes as it gets info
    815        from libcurl about what file descriptors libcurl waits for what action
    816        on. (The previous API returns `fd_sets` which is very select()-centric).
    817 
    818    2 - When the application discovers action on a single socket, it calls
    819        libcurl and informs that there was action on this particular socket and
    820        libcurl can then act on that socket/transfer only and not care about
    821        any other transfers. (The previous API always had to scan through all
    822        the existing transfers.)
    823 
    824   The idea is that [`curl_multi_socket_action()`][7] calls a given callback
    825   with information about what socket to wait for what action on, and the
    826   callback only gets called if the status of that socket has changed.
    827 
    828   We also added a timer callback that makes libcurl call the application when
    829   the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
  and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
  internally there's an added struct in each easy handle in which we store
    832   an "expire time" (if any). The structs are then "splay sorted" so that we
    833   can add and remove times from the linked list and yet somewhat swiftly
    834   figure out both how long time there is until the next nearest timer expires
    835   and which timer (handle) we should take care of now. Of course, the upside
    836   of all this is that we get a [`curl_multi_timeout()`][8] that should also
    837   work with old-style applications that use [`curl_multi_perform()`][11].
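
  The "how long until the nearest timer expires" question can be illustrated
  with a much simpler structure than the one libcurl uses. The sketch below
  swaps the splay tree for a sorted singly linked list to stay short;
  `struct demo_timer` and both functions are hypothetical names, and the -1
  result for "no timer" is an assumption chosen to resemble how
  [`curl_multi_timeout()`][8] reports that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-handle timer node. libcurl keeps these splay sorted;
   this sketch uses a sorted singly linked list instead. */
struct demo_timer {
  long expire_ms;            /* absolute expiry time in milliseconds */
  int handle_id;             /* stands in for the easy handle */
  struct demo_timer *next;
};

/* Insert node so the list stays ordered by expire_ms; returns the head. */
struct demo_timer *demo_timer_add(struct demo_timer *head,
                                  struct demo_timer *node)
{
  struct demo_timer **pp = &head;
  assert(node);
  while(*pp && (*pp)->expire_ms <= node->expire_ms)
    pp = &(*pp)->next;
  node->next = *pp;
  *pp = node;
  return head;
}

/* Milliseconds until the nearest timer fires: 0 if it is already due,
   -1 if no timer is set at all. */
long demo_timeout_ms(const struct demo_timer *head, long now_ms)
{
  if(!head)
    return -1;
  return head->expire_ms > now_ms ? head->expire_ms - now_ms : 0;
}
```

  The head of the list is both the next timeout and the handle to take care
  of first, which is the pair of answers the text asks for.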
    838 
  We created an internal "socket to easy handles" hash table that, given
  a socket (file descriptor), returns the easy handle that waits for action on
    841   that socket.  This hash is made using the already existing hash code
    842   (previously only used for the DNS cache).
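
  A lookup of that kind can be sketched as a small chained hash table keyed
  on the socket number. Everything here is hypothetical (`demo_sockhash`,
  `demo_sockentry`, the fixed bucket count and the modulo hash); it does not
  reproduce libcurl's generic hash code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUCKETS 16

/* One entry per socket that some transfer is waiting on. */
struct demo_sockentry {
  int sockfd;                    /* key: the socket (file descriptor) */
  void *easy;                    /* value: stands in for the easy handle */
  struct demo_sockentry *next;   /* chain of entries sharing a bucket */
};

struct demo_sockhash {
  struct demo_sockentry *bucket[DEMO_BUCKETS];
};

static size_t demo_slot(int sockfd)
{
  return (size_t)((unsigned int)sockfd % DEMO_BUCKETS);
}

/* Register the caller-owned entry e in the table. */
void demo_sockhash_add(struct demo_sockhash *h, struct demo_sockentry *e)
{
  size_t i;
  assert(h && e);
  i = demo_slot(e->sockfd);
  e->next = h->bucket[i];
  h->bucket[i] = e;
}

/* Return the handle waiting on sockfd, or NULL if there is none. */
void *demo_sockhash_find(const struct demo_sockhash *h, int sockfd)
{
  const struct demo_sockentry *e;
  assert(h);
  for(e = h->bucket[demo_slot(sockfd)]; e; e = e->next)
    if(e->sockfd == sockfd)
      return e->easy;
  return NULL;
}
```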
    843 
    844   To make libcurl able to report plain sockets in the socket callback, we had
    845   to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
    846   the conversion from sockets to `fd_sets` for that function is only done in
    847   the last step before the data is returned. I also had to extend c-ares to
    848   get a function that can return plain sockets, as that library too returned
    849   only `fd_sets` and that is no longer good enough. The changes done to c-ares
    850   are available in c-ares 1.3.1 and later.
    851 
    852 <a name="structs"></a>
    853 Structs in libcurl
    854 ==================
    855 
    856 This section should cover 7.32.0 pretty accurately, but will make sense even
    857 for older and later versions as things don't change drastically that often.
    858 
    859 ## Curl_easy
    860 
    861   The Curl_easy struct is the one returned to the outside in the external API
  as a "CURL *". This is usually known as an easy handle in API documentation
    863   and examples.
    864 
    865   Information and state that is related to the actual connection is in the
    866   'connectdata' struct. When a transfer is about to be made, libcurl will
    867   either create a new connection or re-use an existing one. The particular
    868   connectdata that is used by this handle is pointed out by
    869   Curl_easy->easy_conn.
    870 
  Data and information regarding this particular single transfer are put in
  the SingleRequest sub-struct.
    873 
    874   When the Curl_easy struct is added to a multi handle, as it must be in order
    875   to do any transfer, the ->multi member will point to the `Curl_multi` struct
    876   it belongs to. The ->prev and ->next members will then be used by the multi
    877   code to keep a linked list of Curl_easy structs that are added to that same
    878   multi handle. libcurl always uses multi so ->multi *will* point to a
    879   `Curl_multi` when a transfer is in progress.
    880 
    881   ->mstate is the multi state of this particular Curl_easy. When
    882   `multi_runsingle()` is called, it will act on this handle according to which
    883   state it is in. The mstate is also what tells which sockets to return for a
    884   specific Curl_easy when [`curl_multi_fdset()`][12] is called etc.
    885 
  The libcurl source code generally uses the name 'data' for the variable that
    887   points to the Curl_easy.
    888 
    889   When doing multiplexed HTTP/2 transfers, each Curl_easy is associated with
    890   an individual stream, sharing the same connectdata struct. Multiplexing
    891   makes it even more important to keep things associated with the right thing!
    892 
    893 ## connectdata
    894 
  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used in case they will be used again, and then
  re-use an existing one instead of creating a new one, as that gives a
  significant performance boost.
    899 
    900   Each 'connectdata' identifies a single physical connection to a server. If
    901   the connection can't be kept alive, the connection will be closed after use
    902   and then this struct can be removed from the cache and freed.
    903 
    904   Thus, the same Curl_easy can be used multiple times and each time select
    905   another connectdata struct to use for the connection. Keep this in mind, as
    906   it is then important to consider if options or choices are based on the
    907   connection or the Curl_easy.
    908 
    909   Functions in libcurl will assume that connectdata->data points to the
    910   Curl_easy that uses this connection (for the moment).
    911 
    912   As a special complexity, some protocols supported by libcurl require a
    913   special disconnect procedure that is more than just shutting down the
    914   socket. It can involve sending one or more commands to the server before
    915   doing so. Since connections are kept in the connection cache after use, the
    916   original Curl_easy may no longer be around when the time comes to shut down
    917   a particular connection. For this purpose, libcurl holds a special dummy
    918   `closure_handle` Curl_easy in the `Curl_multi` struct to use when needed.
    919 
    920   FTP uses two TCP connections for a typical transfer but it keeps both in
    921   this single struct and thus can be considered a single connection for most
    922   internal concerns.
    923 
  The libcurl source code generally uses the name 'conn' for the variable that
    925   points to the connectdata.
    926 
    927 ## Curl_multi
    928 
    929   Internally, the easy interface is implemented as a wrapper around multi
    930   interface functions. This makes everything multi interface.
    931 
    932   `Curl_multi` is the multi handle struct exposed as "CURLM *" in external APIs.
    933 
    934   This struct holds a list of Curl_easy structs that have been added to this
    935   handle with [`curl_multi_add_handle()`][13]. The start of the list is
    936   ->easyp and ->num_easy is a counter of added Curl_easys.
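
  The list bookkeeping described here and in the Curl_easy section can be
  sketched with pared-down structs. The `demo_` types and functions are
  hypothetical, only the members named in the text are modelled, and adding
  at the front of the list is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct demo_multi;

/* Stand-in for Curl_easy: just the list links and the multi pointer. */
struct demo_easy {
  struct demo_easy *prev;
  struct demo_easy *next;
  struct demo_multi *multi;   /* set while added to a multi handle */
};

/* Stand-in for Curl_multi: start of the list and a counter. */
struct demo_multi {
  struct demo_easy *easyp;
  int num_easy;
};

/* Link data at the front of the multi handle's list. */
void demo_multi_add(struct demo_multi *multi, struct demo_easy *data)
{
  assert(multi && data && !data->multi);
  data->prev = NULL;
  data->next = multi->easyp;
  if(multi->easyp)
    multi->easyp->prev = data;
  multi->easyp = data;
  data->multi = multi;
  multi->num_easy++;
}

/* Unlink data again, as when a handle is removed from the multi handle. */
void demo_multi_remove(struct demo_multi *multi, struct demo_easy *data)
{
  assert(multi && data && data->multi == multi);
  if(data->prev)
    data->prev->next = data->next;
  else
    multi->easyp = data->next;
  if(data->next)
    data->next->prev = data->prev;
  data->prev = data->next = NULL;
  data->multi = NULL;
  multi->num_easy--;
}
```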
    937 
    938   ->msglist is a linked list of messages to send back when
    939   [`curl_multi_info_read()`][14] is called. Basically a node is added to that
    940   list when an individual Curl_easy's transfer has completed.
    941 
    942   ->hostcache points to the name cache. It is a hash table for looking up name
    943   to IP. The nodes have a limited life time in there and this cache is meant
    944   to reduce the time for when the same name is wanted within a short period of
    945   time.
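
  The limited lifetime of cache nodes can be illustrated as follows. This is
  a sketch under stated assumptions: it uses a linear scan over an array
  where libcurl uses a hash table, and `struct demo_dns_entry`,
  `demo_dns_lookup()` and the `max_age` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache node: a resolved name and when it was stored. */
struct demo_dns_entry {
  const char *name;
  const char *addr;
  long stored_at;     /* seconds on some steadily increasing clock */
};

/*
 * Return the cached address for name if the entry is younger than
 * max_age seconds, otherwise NULL so that the caller resolves again.
 */
const char *demo_dns_lookup(const struct demo_dns_entry *cache, size_t count,
                            const char *name, long now, long max_age)
{
  size_t i;
  assert(name && (cache || !count));
  for(i = 0; i < count; i++) {
    if(!strcmp(cache[i].name, name)) {
      if(now - cache[i].stored_at < max_age)
        return cache[i].addr;
      return NULL; /* the entry has outlived its time in the cache */
    }
  }
  return NULL; /* the name was never cached */
}
```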
    946 
    947   ->timetree points to a tree of Curl_easys, sorted by the remaining time
    948   until it should be checked - normally some sort of timeout. Each Curl_easy
    949   has one node in the tree.
    950 
    951   ->sockhash is a hash table to allow fast lookups of socket descriptor to
    952   which Curl_easy that uses that descriptor. This is necessary for the
    953   `multi_socket` API.
    954 
    955   ->conn_cache points to the connection cache. It keeps track of all
    956   connections that are kept after use. The cache has a maximum size.
    957 
    958   ->closure_handle is described in the 'connectdata' section.
    959 
  The libcurl source code generally uses the name 'multi' for the variable that
    961   points to the Curl_multi struct.
    962 
    963 ## Curl_handler
    964 
    965   Each unique protocol that is supported by libcurl needs to provide at least
    966   one `Curl_handler` struct. It defines what the protocol is called and what
    967   functions the main code should call to deal with protocol specific issues.
    968   In general, there's a source file named [protocol].c in which there's a
    969   "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
    970   then the main array with all individual `Curl_handler` structs pointed to
    971   from a single array which is scanned through when a URL is given to libcurl
    972   to work with.
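
  The "single array which is scanned through" idea can be sketched like
  this. The struct is a reduced, hypothetical stand-in with only a scheme
  name and a default port, and `demo_find_handler()` is an illustrative name
  rather than the function url.c actually uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Reduced stand-in for Curl_handler. */
struct demo_handler {
  const char *scheme;
  int defport;
};

static const struct demo_handler demo_handler_http = { "HTTP", 80 };
static const struct demo_handler demo_handler_https = { "HTTPS", 443 };
static const struct demo_handler demo_handler_ftp = { "FTP", 21 };

/* The single array pointing to every individual handler. */
static const struct demo_handler *const demo_protocols[] = {
  &demo_handler_http, &demo_handler_https, &demo_handler_ftp, NULL
};

/* Case-insensitive comparison of two scheme names. */
static int demo_scheme_equal(const char *a, const char *b)
{
  while(*a && *b) {
    if(toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b; /* equal only if both strings ended together */
}

/* Scan the array for the handler of the given URL scheme. */
const struct demo_handler *demo_find_handler(const char *scheme)
{
  size_t i;
  assert(scheme);
  for(i = 0; demo_protocols[i]; i++)
    if(demo_scheme_equal(demo_protocols[i]->scheme, scheme))
      return demo_protocols[i];
  return NULL; /* unsupported protocol */
}
```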
    973 
  ->scheme is the URL scheme name, usually spelled out in uppercase. That's
  "HTTP" or "FTP" etc. SSL versions of a protocol need their own
  `Curl_handler` setup, so HTTPS is separate from HTTP.
    977 
    978   ->setup_connection is called to allow the protocol code to allocate protocol
    979   specific data that then gets associated with that Curl_easy for the rest of
    980   this transfer. It gets freed again at the end of the transfer. It will be
    981   called before the 'connectdata' for the transfer has been selected/created.
  Most protocols will allocate their private 'struct [PROTOCOL]' here and assign
    983   Curl_easy->req.protop to point to it.
    984 
    985   ->connect_it allows a protocol to do some specific actions after the TCP
    986   connect is done, that can still be considered part of the connection phase.
    987 
    988   Some protocols will alter the connectdata->recv[] and connectdata->send[]
    989   function pointers in this function.
    990 
    991   ->connecting is similarly a function that keeps getting called as long as the
    992   protocol considers itself still in the connecting phase.
    993 
    994   ->do_it is the function called to issue the transfer request. What we call
    995   the DO action internally. If the DO is not enough and things need to be kept
    996   getting done for the entire DO sequence to complete, ->doing is then usually
  also provided. Each protocol that needs to do multiple commands or similar
  for do/doing needs to implement its own state machine (see SCP, SFTP,
  FTP). Some protocols (only FTP and only due to historical reasons) have a
  separate piece of the DO state called `DO_MORE`.
   1001 
  ->doing keeps getting called while issuing the transfer request command(s).
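
  How do_it and doing cooperate can be sketched as a tiny state machine.
  This is a simplified model, not the multi code itself: the states, the
  `demo_transfer` struct and the "commands left" counter are all
  hypothetical, and only the `bool *done` reporting style is loosely
  modelled on the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states: issue the request, keep issuing it, then transfer. */
enum demo_state { DEMO_DO, DEMO_DOING, DEMO_PERFORM };

struct demo_transfer {
  enum demo_state state;
  int commands_left;   /* protocol commands still to be sent */
};

/* Stand-in for do_it: sends the first command of the request. */
static void demo_do_it(struct demo_transfer *t, bool *done)
{
  t->commands_left--;
  *done = (t->commands_left == 0);
}

/* Stand-in for doing: sends one more command each time it is called. */
static void demo_doing(struct demo_transfer *t, bool *done)
{
  t->commands_left--;
  *done = (t->commands_left == 0);
}

/* One step of the engine for a single transfer. */
void demo_run_step(struct demo_transfer *t)
{
  bool done = false;
  assert(t && (t->state == DEMO_PERFORM || t->commands_left > 0));
  switch(t->state) {
  case DEMO_DO:
    demo_do_it(t, &done);
    t->state = done ? DEMO_PERFORM : DEMO_DOING;
    break;
  case DEMO_DOING:
    demo_doing(t, &done);
    if(done)
      t->state = DEMO_PERFORM;
    break;
  case DEMO_PERFORM:
    break; /* request fully issued; the data transfer takes over */
  }
}
```

  A protocol whose request is a single command never enters the DOING state,
  which is why a doing function is only provided when it is needed.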
   1003 
   1004   ->done gets called when the transfer is complete and DONE. That's after the
   1005   main data has been transferred.
   1006 
   1007   ->do_more gets called during the `DO_MORE` state. The FTP protocol uses this
   1008   state when setting up the second connection.
   1009 
   1010   ->`proto_getsock`
   1011   ->`doing_getsock`
   1012   ->`domore_getsock`
   1013   ->`perform_getsock`
   1014   Functions that return socket information. Which socket(s) to wait for which
   1015   action(s) during the particular multi state.
   1016 
  ->disconnect is called immediately before the TCP connection is shut down.
   1018 
  ->readwrite gets called during transfer to allow the protocol to do extra
  reads/writes.

  ->defport is the default TCP or UDP port this protocol uses.
   1023 
  ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions
  have their "base" protocol set and then the SSL variation, like
  "HTTP|HTTPS".
   1027 
   1028   ->flags is a bitmask with additional information about the protocol that will
   1029   make it get treated differently by the generic engine:
   1030 
   1031   - `PROTOPT_SSL` - will make it connect and negotiate SSL
   1032 
   1033   - `PROTOPT_DUAL` - this protocol uses two connections
   1034 
  - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
    connection. This flag is no longer used by code, yet still set for a bunch
    of protocol handlers.
   1038 
   1039   - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
   1040     limit which "direction" of socket actions that the main engine will
   1041     concern itself about.
   1042 
   1043   - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:)
   1044 
   1045   - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
   1046     one unless one is provided
   1047 
   1048   - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
   1049     (?foo=bar)
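
  Testing such a bitmask is a plain bitwise AND. In the sketch below the
  `DEMO_PROTOPT_*` bit values are illustrative assumptions (the real values
  live in libcurl's internal headers and may differ), and the struct and
  helper functions are hypothetical.

```c
#include <assert.h>

/* Illustrative bit values, not copied from libcurl. */
#define DEMO_PROTOPT_SSL        (1u << 0)
#define DEMO_PROTOPT_DUAL       (1u << 1)
#define DEMO_PROTOPT_NONETWORK  (1u << 4)
#define DEMO_PROTOPT_NEEDSPWD   (1u << 5)

struct demo_proto {
  const char *scheme;
  unsigned int flags;
};

/* Does the generic engine need to negotiate SSL for this protocol? */
int demo_wants_ssl(const struct demo_proto *p)
{
  assert(p);
  return (p->flags & DEMO_PROTOPT_SSL) != 0;
}

/* Can the engine skip the network entirely (as for file:)? */
int demo_skips_network(const struct demo_proto *p)
{
  assert(p);
  return (p->flags & DEMO_PROTOPT_NONETWORK) != 0;
}
```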
   1050 
   1051 ## conncache
   1052 
  This is a hash table with connections for later re-use. Each Curl_easy has a
   1054   pointer to its connection cache. Each multi handle sets up a connection
   1055   cache that all added Curl_easys share by default.
   1056 
   1057 ## Curl_share
   1058 
   1059   The libcurl share API allocates a `Curl_share` struct, exposed to the
   1060   external API as "CURLSH *".
   1061 
  The idea is that the struct can have its own set of caches and pools, and
  by providing this struct in the `CURLOPT_SHARE` option, those specific
  Curl_easys will use the caches/pools that this share handle holds.
   1066 
   1067   Then individual Curl_easy structs can be made to share specific things
   1068   that they otherwise wouldn't, such as cookies.
   1069 
   1070   The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
   1071   session cache.
   1072 
   1073 ## CookieInfo
   1074 
   1075   This is the main cookie struct. It holds all known cookies and related
   1076   information. Each Curl_easy has its own private CookieInfo even when
   1077   they are added to a multi handle. They can be made to share cookies by using
   1078   the share API.
   1079 
   1080 
   1081 [1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
   1082 [2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
   1083 [3]: http://c-ares.haxx.se/
   1084 [4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
   1085 [5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
   1086 [6]: https://curl.haxx.se/docs/manpage.html#--compressed
   1087 [7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
   1088 [8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
   1089 [9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
   1090 [10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
   1091 [11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
   1092 [12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
   1093 [13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
   1094 [14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
   1095