curl internals
==============

 - [Intro](#intro)
 - [git](#git)
 - [Portability](#Portability)
 - [Windows vs Unix](#winvsunix)
 - [Library](#Library)
   - [`Curl_connect`](#Curl_connect)
   - [`Curl_do`](#Curl_do)
   - [`Curl_readwrite`](#Curl_readwrite)
   - [`Curl_done`](#Curl_done)
   - [`Curl_disconnect`](#Curl_disconnect)
 - [HTTP(S)](#http)
 - [FTP](#ftp)
 - [Kerberos](#kerberos)
 - [TELNET](#telnet)
 - [FILE](#file)
 - [SMB](#smb)
 - [LDAP](#ldap)
 - [E-mail](#email)
 - [General](#general)
 - [Persistent Connections](#persistent)
 - [multi interface/non-blocking](#multi)
 - [SSL libraries](#ssl)
 - [Library Symbols](#symbols)
 - [Return Codes and Informationals](#returncodes)
 - [API/ABI](#abi)
 - [Client](#client)
 - [Memory Debugging](#memorydebug)
 - [Test Suite](#test)
 - [Asynchronous name resolves](#asyncdns)
   - [c-ares](#cares)
 - [`curl_off_t`](#curl_off_t)
 - [curlx](#curlx)
 - [Content Encoding](#contentencoding)
 - [hostip.c explained](#hostip)
 - [Track Down Memory Leaks](#memoryleak)
 - [`multi_socket`](#multi_socket)
 - [Structs in libcurl](#structs)

<a name="intro"></a>
Intro
=====

This project is split in two: the library and the client. The client part
uses the library, but the library is designed to allow other applications to
use it.

The largest amount of code and complexity is in the library part.


<a name="git"></a>
git
===

All changes to the sources are committed to the git repository as soon as
they are somewhat verified to work. Changes shall be committed as
independently as possible so that individual changes can be easily spotted
and tracked afterwards.

Tagging shall be used extensively, and by the time we release new archives we
should tag the sources with a name similar to the released version number.

<a name="Portability"></a>
Portability
===========

We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
machines. Most of libcurl assumes more or less POSIX compliance, but that is
not a requirement.

We write libcurl to build and work with lots of third party tools, and we
want it to remain functional and buildable with these and later versions
(older versions may still work but are not what we work hard to maintain):

Dependencies
------------

 - OpenSSL 0.9.7
 - GnuTLS 1.2
 - zlib 1.1.4
 - libssh2 0.16
 - c-ares 1.6.0
 - libidn2 2.0.0
 - cyassl 2.0.0
 - openldap 2.0
 - MIT Kerberos 1.2.4
 - GSKit V5R3M0
 - NSS 3.14.x
 - axTLS 2.1.0
 - PolarSSL 1.3.0
 - Heimdal ?
 - nghttp2 1.0.0

Operating Systems
-----------------

On systems where configure runs, we aim at working on them all - if they have
a suitable C compiler. On systems that don't run configure, we strive to keep
curl running correctly on:

 - Windows 98
 - AS/400 V5R3M0
 - Symbian 9.1
 - Windows CE ?
 - TPF ?

Build tools
-----------

When writing code (mostly for generating stuff included in release tarballs)
we use a few "build tools" and we make sure that we remain functional with
these versions:

 - GNU Libtool 1.4.2
 - GNU Autoconf 2.57
 - GNU Automake 1.7
 - GNU M4 1.4
 - perl 5.004
 - roffit 0.5
 - groff ? (any version that supports "groff -Tps -man [in] [out]")
 - ps2pdf (gs) ?
<a name="winvsunix"></a>
Windows vs Unix
===============

There are a few differences in how to program curl the Unix way compared to
the Windows way. Perhaps the four most notable details are:

1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are sclose(), sread() and swrite().

2. Windows requires a couple of init calls for the socket stuff.

   That's taken care of by the `curl_global_init()` call, but if other libs
   also do it etc there might be reasons for applications to alter that
   behaviour.

3. The file descriptors for network communication and file operations are
   not as easily interchangeable as in Unix.

   We avoid this by not trying any funny tricks on file descriptors.

4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
   destroying binary data, although you do want that conversion if it is
   text coming through... (sigh)

   We set stdout to binary under Windows.

Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
conditionals that deal with features *should* instead be in the format
`#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
we maintain a `curl_config-win32.h` file in the lib directory that is
supposed to look exactly like a `curl_config.h` file would have looked like
on a Windows machine.

Generally speaking: always remember that this will be compiled on dozens of
operating systems. Don't walk on the edge!

<a name="Library"></a>
Library
=======

(See [Structs in libcurl](#structs) for the separate section describing all
major internal structs and their purposes.)

There are plenty of entry points to the library, namely each publicly defined
function that libcurl offers to applications. All of those functions are
rather small and easy to follow. All the ones prefixed with `curl_easy` are
put in the lib/easy.c file.

`curl_global_init()` and `curl_global_cleanup()` should be called by the
application to initialize and clean up global stuff in the library. As of
today, it can handle the global SSL initing if SSL is enabled and it can init
the socket layer on Windows machines. libcurl itself has no "global" scope.

All printf()-style functions use the supplied clones in lib/mprintf.c. This
makes sure we stay absolutely platform independent.

[`curl_easy_init()`][2] allocates an internal struct and makes some
initializations. The returned handle does not reveal internals. This is the
`Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
functions. All connections performed will get connect-specific data allocated
that should be used for things related to particular connections/requests.

[`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
be passed in pairs: the parameter-ID and the parameter-value. The list of
options is documented in the man page. This function mainly sets things in
the `Curl_easy` struct.

`curl_easy_perform()` is just a wrapper function that makes use of the multi
API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
`curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
and then returns.
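To make that wrapper behaviour concrete, here is a minimal sketch of what a
roughly equivalent loop looks like when written by an application against the
public multi API (error handling trimmed, the helper name is made up for the
example):

    #include <curl/curl.h>

    /* Rough application-level equivalent of curl_easy_perform(), built on
       the public multi API. Error handling is trimmed for brevity. */
    static CURLcode perform_via_multi(CURL *easy)
    {
      CURLM *multi = curl_multi_init();
      CURLcode result = CURLE_OK;
      int still_running = 0;
      int msgs_left;
      CURLMsg *msg;

      curl_multi_add_handle(multi, easy);

      do {
        int numfds;
        curl_multi_perform(multi, &still_running);
        /* wait for activity on the transfer's socket(s), or a timeout */
        curl_multi_wait(multi, NULL, 0, 1000, &numfds);
      } while(still_running);

      /* pick up the transfer's result code */
      while((msg = curl_multi_info_read(multi, &msgs_left))) {
        if(msg->msg == CURLMSG_DONE)
          result = msg->data.result;
      }

      curl_multi_remove_handle(multi, easy);
      curl_multi_cleanup(multi);
      return result;
    }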
Some of the most important key functions in url.c are called from multi.c
when certain key steps are to be made in the transfer operation.

<a name="Curl_connect"></a>
Curl_connect()
--------------

Analyzes the URL, separates the different components and connects to the
remote host. This may involve using a proxy and/or using SSL. The
`Curl_resolv()` function in lib/hostip.c is used for looking up host names
(it then uses the proper underlying method, which may vary between
platforms and builds).

When `Curl_connect` is done, we are connected to the remote site. Then it
is time to tell the server to get a document/file. `Curl_do()` arranges
this.

This function makes sure there's an allocated and initiated 'connectdata'
struct that is used for this particular connection only (although there may
be several requests performed on the same connect). A bunch of things are
inited/inherited from the `Curl_easy` struct.

<a name="Curl_do"></a>
Curl_do()
---------

`Curl_do()` makes sure the proper protocol-specific function is called. The
functions are named after the protocols they handle.

The protocol-specific functions of course deal with protocol-specific
negotiations and setup. They have access to the `Curl_sendf()` (from
lib/sendf.c) function to send printf-style formatted data to the remote
host and when they're ready to make the actual file transfer they call the
`Curl_Transfer()` function (in lib/transfer.c) to set up the transfer and
return.

If this DO function fails and the connection is being re-used, libcurl will
then close this connection, set up a new connection and re-issue the DO
request on that. This is because there is no way to be perfectly sure that
we have discovered a dead connection before the DO function and thus we
might wrongly be re-using a connection that was closed by the remote peer.

Some time during the DO function, the `Curl_setup_transfer()` function must
be called with some basic info about the upcoming transfer: what socket(s)
to read/write and the expected file transfer sizes (if known).

<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------

Called during the transfer of the actual protocol payload.

During transfer, the progress functions in lib/progress.c are called at
frequent intervals (or at the user's choice, a specified callback might get
called). The speedcheck functions in lib/speedcheck.c are also used to
verify that the transfer is as fast as required.

<a name="Curl_done"></a>
Curl_done()
-----------

Called after a transfer is done. This function takes care of everything
that has to be done after a transfer. This function attempts to leave
matters in a state so that `Curl_do()` should be possible to call again on
the same connection (in a persistent connection case). It might also soon
be closed with `Curl_disconnect()`.

<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------

When doing normal connections and transfers, no one ever tries to close any
connections so this is not normally called when `curl_easy_perform()` is
used. This function is only used when we are certain that no more transfers
are going to be made on the connection. It can also be closed by force, or
it can be called to make sure that libcurl doesn't keep too many connections
alive at the same time.

This function cleans up all resources that are associated with a single
connection.
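From the application side, a couple of public options influence when this
teardown happens. The option names below are real libcurl options; the
helper function and values are just an illustration:

    #include <curl/curl.h>

    /* Illustration only: CURLOPT_FORBID_REUSE closes the connection right
       after the transfer (force-close), while CURLOPT_MAXCONNECTS limits
       how many connections the handle's cache keeps alive at once. */
    static void tune_connection_teardown(CURL *easy)
    {
      curl_easy_setopt(easy, CURLOPT_FORBID_REUSE, 1L); /* close after use */
      curl_easy_setopt(easy, CURLOPT_MAXCONNECTS, 3L);  /* small cache */
    }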
<a name="http"></a>
HTTP(S)
=======

HTTP offers a lot and is the protocol in curl that uses the most lines of
code. There is a special file (lib/formdata.c) that offers all the multipart
post functions.

base64 functions for user+password stuff (and more) are in lib/base64.c and
all functions for parsing and sending cookies are found in lib/cookie.c.

HTTPS uses in almost every case the same procedure as HTTP, with only two
exceptions: the connect procedure is different and the function used to read
or write from the socket is different, although the latter fact is hidden in
the source by the use of `Curl_read()` for reading and `Curl_write()` for
writing data to the remote server.

`http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
encoding.

An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
series of functions we use. They append data to one single buffer, and when
the building is finished the entire request is sent off in one single write.
This is done this way to overcome problems with flawed firewalls and lame
servers.

<a name="ftp"></a>
FTP
===

The `Curl_if2ip()` function can be used for getting the IP number of a
specified network interface, and it resides in lib/if2ip.c.

`Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
was made a separate function to prevent us programmers from forgetting that
they must be CRLF terminated. They must also be sent in one single write() to
make firewalls and similar happy.

<a name="kerberos"></a>
Kerberos
--------

Kerberos support is mainly in lib/krb5.c and lib/security.c but also
`curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
`socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.

<a name="telnet"></a>
TELNET
======

Telnet is implemented in lib/telnet.c.

<a name="file"></a>
FILE
====

The file:// protocol is dealt with in lib/file.c.

<a name="smb"></a>
SMB
===

The smb:// protocol is dealt with in lib/smb.c.

<a name="ldap"></a>
LDAP
====

Everything LDAP is in lib/ldap.c and lib/openldap.c.

<a name="email"></a>
E-mail
======

The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.

<a name="general"></a>
General
=======

URL encoding and decoding, called escaping and unescaping in the source code,
is found in lib/escape.c.

While transferring data in Transfer() a few functions might get used.
`curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).

lib/getenv.c offers `curl_getenv()` which is for reading environment
variables in a neat platform independent way. That's used in the client, but
also in lib/url.c when checking the proxy environment variables. Note that
contrary to the normal unix getenv(), this returns an allocated buffer that
must be freed after use.
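A minimal usage sketch of `curl_getenv()` follows; the variable name is just
an example, and the returned string is released with `curl_free()` as the
public documentation recommends:

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
      /* curl_getenv() returns an allocated copy of the variable's value,
         or NULL if it is not set. "http_proxy" is just an example name. */
      char *proxy = curl_getenv("http_proxy");
      if(proxy) {
        printf("proxy in environment: %s\n", proxy);
        curl_free(proxy); /* release the allocated buffer when done */
      }
      return 0;
    }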
lib/netrc.c holds the .netrc parser.

lib/timeval.c features replacement functions for systems that don't have
gettimeofday() and a few support functions for timeval conversions.

A function named `curl_version()` that returns the full curl version string
is found in lib/version.c.

<a name="persistent"></a>
Persistent Connections
======================

The persistent connection support in libcurl requires some considerations on
how to do things inside of the library.

 - The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
   must never hold connection-oriented data. It is meant to hold the root
   data as well as all the options etc that the library-user may choose.

 - The `Curl_easy` struct holds the "connection cache" (an array of
   pointers to 'connectdata' structs).

 - This enables the 'curl handle' to be reused on subsequent transfers.

 - When libcurl is told to perform a transfer, it first checks for an already
   existing connection in the cache that we can use. Otherwise it creates a
   new one and adds that to the cache. If the cache is full already when a
   new connection is added, it will first close the oldest unused one.

 - When the transfer operation is complete, the connection is left
   open. Particular options may tell libcurl not to, and protocols may signal
   closure on connections and then they won't be kept open, of course.

 - When `curl_easy_cleanup()` is called, we close all still opened
   connections, unless of course the multi interface "owns" the connections.

The curl handle must be re-used in order for the persistent connections to
work.

<a name="multi"></a>
multi interface/non-blocking
============================

The multi interface is a non-blocking interface to the library. To make that
interface work as well as possible, no low-level functions within libcurl
must be written to work in a blocking manner. (There are still a few spots
violating this rule.)

One of the primary reasons we introduced c-ares support was to allow the name
resolve phase to be perfectly non-blocking as well.

The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
the code to allow non-blocking operations even on multi-stage command-
response protocols. They are built around state machines that return when
they would otherwise block waiting for data. The DICT, LDAP and TELNET
protocols are crappy examples and they are subject for rewrite in the future
to better fit the libcurl protocol family.

<a name="ssl"></a>
SSL libraries
=============

Originally libcurl supported SSLeay for SSL/TLS transports, but that was then
extended to its successor OpenSSL and has since also been extended to several
other SSL/TLS libraries and we expect and hope to further extend the support
in future libcurl versions.

To deal with this internally in the best way possible, we have a generic SSL
function API as provided by the vtls/vtls.[ch] system, and they are the only
SSL functions we must use from within libcurl. vtls is then crafted to use
the appropriate lower-level function calls to whatever SSL library is in
use. For example vtls/openssl.[ch] for the OpenSSL library.
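The backend selection can be pictured as a table of operations that the
generic code calls through. The sketch below is purely illustrative: the
type and names are hypothetical, not the actual vtls symbols, and it only
shows the dispatch idea, not the real structs:

    /* Hypothetical sketch of SSL backend dispatch - NOT the real vtls API.
       The generic code only calls through a backend-neutral table like
       this; only the backend file knows the actual SSL library's symbols. */
    struct ssl_backend_ops {
      int (*connect_step)(void *ssl_ctx, int sockfd);        /* handshake */
      long (*recv_data)(void *ssl_ctx, char *buf, long len);
      long (*send_data)(void *ssl_ctx, const char *buf, long len);
      void (*shut_down)(void *ssl_ctx);
    };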
<a name="symbols"></a>
Library Symbols
===============

All symbols used internally in libcurl must use a `Curl_` prefix if they're
used in more than a single file. Single-file symbols must be made static.
Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
but they are to be changed to follow this pattern in future versions.) Public
API functions are marked with `CURL_EXTERN` in the public header files so
that all others can be hidden on platforms where this is possible.

<a name="returncodes"></a>
Return Codes and Informationals
===============================

I've made things simple. Almost every function in libcurl returns a CURLcode,
which must be `CURLE_OK` if everything is OK or otherwise a suitable error
code as the curl/curl.h include file defines. The very spot that detects an
error must use the `Curl_failf()` function to set the human-readable error
description.

In aiding the user to understand what's happening and to debug curl usage, we
must supply a fair number of informational messages by using the
`Curl_infof()` function. Those messages are only displayed when the user
explicitly asks for them. They are best used when revealing information that
isn't otherwise obvious.

<a name="abi"></a>
API/ABI
=======

We make an effort to not export or show internals or how internals work, as
that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
for our promise to users.

<a name="client"></a>
Client
======

main() resides in `src/tool_main.c`.

`src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
to display the complete "manual" and the `src/tool_urlglob.c` file holds the
functions used for the URL-"globbing" support. Globbing in the sense that the
{} and [] expansion stuff is there.

The client mostly sets up its 'config' struct properly, then
it calls the `curl_easy_*()` functions of the library and when it gets back
control after the `curl_easy_perform()` it cleans up the library, checks
status and exits.

When the operation is done, the ourWriteOut() function in src/writeout.c may
be called to report about the operation. That function is using the
`curl_easy_getinfo()` function to extract useful information from the curl
session.

It may loop and do all this several times if many URLs were specified on the
command line or config file.

<a name="memorydebug"></a>
Memory Debugging
================

The file lib/memdebug.c contains debug versions of a few functions. Functions
such as malloc, free, fopen and fclose that somehow deal with resources that
might give us problems if we "leak" them. The functions in the memdebug
system do nothing fancy, they do their normal function and then log
information about what they just did. The logged data can then be analyzed
after a complete session.

memanalyze.pl is the perl script present in tests/ that analyzes a log file
generated by the memory tracking system. It detects if resources are
allocated but never freed and other kinds of errors related to resource
management.

Internally, the definition of the preprocessor symbol DEBUGBUILD restricts
code to debug-enabled builds, and the symbol CURLDEBUG is used to
differentiate code which is _only_ used for memory tracking/debugging.
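A schematic illustration of how these two symbols are typically used in the
sources; the helper function below is invented for the example and is not
taken from the curl code:

    #include <stdlib.h>

    /* Illustrative only: an invented helper showing the two conditionals. */
    static void *example_alloc(size_t size)
    {
    #ifdef DEBUGBUILD
      /* extra sanity checking that only exists in debug-enabled builds */
      if(!size)
        abort();
    #endif
    #ifdef CURLDEBUG
      /* code that exists purely for the memory tracking system goes here */
    #endif
      return malloc(size);
    }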
Use -DCURLDEBUG when compiling to enable memory debugging; this is also
switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
when compiling to enable a debug build, or run configure with --enable-debug.

curl --version will list the 'Debug' feature for debug-enabled builds, and
will list the 'TrackMemory' feature for builds capable of memory tracking.
These features are independent and can be controlled when running the
configure script. When --enable-debug is given, both features will be
enabled, unless some restriction prevents memory tracking from being used.

<a name="test"></a>
Test Suite
==========

The test suite is placed in its own subdirectory directly off the root in the
curl archive tree, and it contains a bunch of scripts and a lot of test case
data.

The main test script is runtests.pl that will invoke test servers like
httpserver.pl and ftpserver.pl before all the test cases are performed. The
test suite currently only runs on Unix-like platforms.

You'll find a description of the test suite in the tests/README file, and the
test case data files in the tests/FILEFORMAT file.

The test suite automatically detects if curl was built with the memory
debugging enabled, and if it was, it will detect memory leaks, too.

<a name="asyncdns"></a>
Asynchronous name resolves
==========================

libcurl can be built to do name resolves asynchronously, using either the
normal resolver in a threaded manner or by using c-ares.

<a name="cares"></a>
[c-ares][3]
------

### Build libcurl to use c-ares

1. ./configure --enable-ares=/path/to/ares/install
2. make

### c-ares on win32

First I compiled c-ares. I changed the default C runtime library to be the
single-threaded rather than the multi-threaded (this seems to be required to
prevent linking errors later on). Then I simply built the areslib project
(the other projects adig/ahost seem to fail under MSVC).

Next was libcurl. I opened lib/config-win32.h and I added a:
`#define USE_ARES 1`

Next thing I did was add the path for the ares includes to the include
path, and the libares.lib to the libraries.

Lastly, I also changed libcurl to be single-threaded rather than
multi-threaded, again this was to prevent some duplicate symbol errors. I'm
not sure why I needed to change everything to single-threaded, but when I
didn't I got redefinition errors for several CRT functions (malloc, stricmp,
etc.)

<a name="curl_off_t"></a>
`curl_off_t`
==========

`curl_off_t` is a data type provided by the external libcurl include
headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
options that end with LARGE. The type is 64 bits large on most modern
platforms.
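For example, resuming a download from a 64-bit offset uses `curl_off_t` with
the matching `_LARGE` option. A hedged sketch; the helper name and offset
value are just for illustration:

    #include <curl/curl.h>

    /* Options whose names end in _LARGE take a curl_off_t argument, so
       large sizes and offsets work even where 'long' is only 32 bits. */
    static void resume_at(CURL *easy, curl_off_t offset)
    {
      curl_easy_setopt(easy, CURLOPT_RESUME_FROM_LARGE, offset);
    }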
<a name="curlx"></a>
curlx
=====

The libcurl source code offers a few functions by source only. They are not
part of the official libcurl API, but the source files might be useful for
others so apps can optionally compile/build with these sources to gain
additional functions.

We provide them through a single header file for easy access for apps:
"curlx.h"

`curlx_strtoofft()`
-------------------
A macro that converts a string containing a number to a `curl_off_t` number.
This might use the `curlx_strtoll()` function which is provided as source
code in strtoofft.c. Note that the function is only provided if no
strtoll() (or equivalent) function exists on your platform. If `curl_off_t`
is only a 32-bit number on your platform, this macro uses strtol().

Future
------

Several functions will be removed from the public `curl_` name space in a
future libcurl release. They will then only become available as `curlx_`
functions instead. To make the transition easier, we already today provide
these functions with the `curlx_` prefix to allow sources to be built
properly with the new function names. The concerned functions are:

 - `curlx_getenv`
 - `curlx_strequal`
 - `curlx_strnequal`
 - `curlx_mvsnprintf`
 - `curlx_msnprintf`
 - `curlx_maprintf`
 - `curlx_mvaprintf`
 - `curlx_msprintf`
 - `curlx_mprintf`
 - `curlx_mfprintf`
 - `curlx_mvsprintf`
 - `curlx_mvprintf`
 - `curlx_mvfprintf`

<a name="contentencoding"></a>
Content Encoding
================

## About content encodings

[HTTP/1.1][4] specifies that a client may request that a server encode its
response. This is usually used to compress a response using one (or more)
encodings from a set of commonly available compression techniques. These
schemes include 'deflate' (the zlib algorithm), 'gzip', 'br' (brotli) and
'compress'. A client requests that the server perform an encoding by
including an Accept-Encoding header in the request document. The value of
the header should be one of the recognized tokens 'deflate', ... (there's a
way to register new schemes/tokens, see sec 3.5 of the spec). A server MAY
honor the client's encoding request. When a response is encoded, the server
includes a Content-Encoding header in the response. The value of the
Content-Encoding header indicates which encodings were used to encode the
data, in the order in which they were applied.

It's also possible for a client to attach priorities to different schemes so
that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
information on the Accept-Encoding header. See sec [3.1.2.2 of RFC 7231][15]
for more information on the Content-Encoding header.

## Supported content encodings

The 'deflate', 'gzip' and 'br' content encodings are supported by libcurl.
Both regular and chunked transfers work fine. The zlib library is required
for the 'deflate' and 'gzip' encodings, while the brotli decoding library is
required for the 'br' encoding.

## The libcurl interface

To cause libcurl to request a content encoding use:

  [`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)

where string is the intended value of the Accept-Encoding header.

Currently, libcurl does support multiple encodings but only
understands how to process responses that use the "deflate", "gzip" and/or
"br" content encodings, so the only values for [`CURLOPT_ACCEPT_ENCODING`][5]
that will work (besides "identity", which does nothing) are "deflate",
"gzip" and "br". If a response is encoded using the "compress" method, or
any other unsupported method, libcurl will return an error indicating that
the response could not be decoded. If string is NULL no Accept-Encoding
header is generated. If string is a zero-length string, then an
Accept-Encoding header containing all supported encodings will be generated.

The [`CURLOPT_ACCEPT_ENCODING`][5] option must be set to any non-NULL value
for content to be automatically decoded. If it is not set and the server
still sends encoded content (despite not having been asked), the data is
returned in its raw form and the Content-Encoding type is not checked.
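A short sketch of the two common ways to set it (handle setup and error
handling are elided, and the helper name is invented for the example):

    #include <curl/curl.h>

    /* Sketch: request compressed responses and let libcurl decode them. */
    static void enable_compression(CURL *easy)
    {
      /* "" = offer every encoding this libcurl build supports */
      curl_easy_setopt(easy, CURLOPT_ACCEPT_ENCODING, "");

      /* or ask for one specific encoding: */
      /* curl_easy_setopt(easy, CURLOPT_ACCEPT_ENCODING, "gzip"); */
    }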
## The curl interface

Use the [--compressed][6] option with curl to cause it to ask servers to
compress responses using any format supported by curl.

<a name="hostip"></a>
hostip.c explained
==================

The main compile-time defines to keep in mind when reading the host*.c source
file are these:

## `CURLRES_IPV6`

this host has getaddrinfo() and family, and thus we use that. The host may
not be able to resolve IPv6, but we don't really have to take that into
account. Hosts that aren't IPv6-enabled have `CURLRES_IPV4` defined.

## `CURLRES_ARES`

is defined if libcurl is built to use c-ares for asynchronous name
resolves. This can be Windows or *nix.

## `CURLRES_THREADED`

is defined if libcurl is built to use threading for asynchronous name
resolves. The name resolve will be done in a new thread, and the supported
asynch API will be the same as for ares-builds. This is the default under
(native) Windows.

If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
defined.

## host*.c sources

The host*.c source files are split up like this:

 - hostip.c - method-independent resolver functions and utility functions
 - hostasyn.c - functions for asynchronous name resolves
 - hostsyn.c - functions for synchronous name resolves
 - asyn-ares.c - functions for asynchronous name resolves using c-ares
 - asyn-thread.c - functions for asynchronous name resolves using threads
 - hostip4.c - IPv4 specific functions
 - hostip6.c - IPv6 specific functions

The hostip.h is the single united header file for all this. It defines the
`CURLRES_*` defines based on the config*.h and `curl_setup.h` defines.

<a name="memoryleak"></a>
Track Down Memory Leaks
=======================

## Single-threaded

Please note that this memory leak system is not adjusted to work in more
than one thread. If you want/need to use it in a multi-threaded app, please
adjust accordingly.


## Build

Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
--enable-debug fixes this). 'make clean' first, then 'make' so that all
files are actually rebuilt properly. It will also make sense to build
libcurl with the debug option (usually -g to the compiler) so that debugging
it will be easier if you actually do find a leak in the library.

This will create a library that has memory debugging enabled.

## Modify Your Application

Add a line in your application code:

`curl_memdebug("dump");`

This will make the malloc debug system output a full trace of all resource
using functions to the given file name. Make sure you rebuild your program
and that you link with the same libcurl you built for this purpose as
described above.
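A hedged sketch of where that call typically goes in a test program. It
assumes a libcurl built with -DCURLDEBUG as described above; the symbol only
exists in such builds, which is why the prototype is declared by hand here:

    #include <curl/curl.h>

    /* Only meaningful when linking against a libcurl built with -DCURLDEBUG;
       in such builds curl_memdebug() is provided by lib/memdebug.c. */
    extern void curl_memdebug(const char *logname);

    int main(void)
    {
      curl_memdebug("dump");      /* start logging before any allocations */
      curl_global_init(CURL_GLOBAL_ALL);

      /* ... perform transfers as usual ... */

      curl_global_cleanup();
      return 0;
    }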
## Run Your Application

Run your program as usual. Watch the specified memory trace file grow.

Make your program exit and use the proper libcurl cleanup functions etc, so
that all non-leaks are returned/freed properly.

## Analyze the Flow

Use the tests/memanalyze.pl perl script to analyze the dump file:

    tests/memanalyze.pl dump

This now outputs a report on what resources were allocated but never freed
etc. This report is very fine for posting to the list!

If this doesn't produce any output, no leak was detected in libcurl. Then
the leak is most likely to be in your code.

<a name="multi_socket"></a>
`multi_socket`
==============

Implementation of the `curl_multi_socket` API

The main ideas of this API are simply:

1. The application can use whatever event system it likes as it gets info
   from libcurl about what file descriptors libcurl waits for what action
   on. (The previous API returns `fd_sets` which is very select()-centric.)

2. When the application discovers action on a single socket, it calls
   libcurl and informs it that there was action on this particular socket
   and libcurl can then act on that socket/transfer only and not care about
   any other transfers. (The previous API always had to scan through all
   the existing transfers.)

The idea is that [`curl_multi_socket_action()`][7] calls a given callback
with information about what socket to wait for what action on, and the
callback only gets called if the status of that socket has changed.

We also added a timer callback that makes libcurl call the application when
the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
internally there's an added struct to each easy handle in which we store
an "expire time" (if any). The structs are then "splay sorted" so that we
can add and remove times from the linked list and yet somewhat swiftly
figure out both how long there is until the next nearest timer expires
and which timer (handle) we should take care of now. Of course, the upside
of all this is that we get a [`curl_multi_timeout()`][8] that should also
work with old-style applications that use [`curl_multi_perform()`][11].

We created an internal "socket to easy handles" hash table that given
a socket (file descriptor) returns the easy handle that waits for action on
that socket. This hash is made using the already existing hash code
(previously only used for the DNS cache).

To make libcurl able to report plain sockets in the socket callback, we had
to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
the conversion from sockets to `fd_sets` for that function is only done in
the last step before the data is returned. I also had to extend c-ares to
get a function that can return plain sockets, as that library too returned
only `fd_sets` and that is no longer good enough. The changes done to c-ares
are available in c-ares 1.3.1 and later.
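A compact sketch of the application side of this API. The event-loop
integration (epoll, libevent, ...) is reduced to stub functions here, so
treat it as an outline of the call pattern rather than a complete program;
the stub and helper names are invented for the example:

    #include <curl/curl.h>

    /* Stubs standing in for a real event library */
    static void watch_socket(curl_socket_t s, int what) { (void)s; (void)what; }
    static void unwatch_socket(curl_socket_t s) { (void)s; }
    static void arm_timer(long timeout_ms) { (void)timeout_ms; }

    /* CURLMOPT_SOCKETFUNCTION: libcurl says which sockets to watch for what */
    static int socket_cb(CURL *easy, curl_socket_t s, int what,
                         void *userp, void *socketp)
    {
      (void)easy; (void)userp; (void)socketp;
      if(what == CURL_POLL_REMOVE)
        unwatch_socket(s);
      else
        watch_socket(s, what);   /* CURL_POLL_IN, CURL_POLL_OUT or both */
      return 0;
    }

    /* CURLMOPT_TIMERFUNCTION: libcurl says when it next needs a timeout */
    static int timer_cb(CURLM *multi, long timeout_ms, void *userp)
    {
      (void)multi; (void)userp;
      arm_timer(timeout_ms);     /* -1 means "delete the timer" */
      return 0;
    }

    static void setup_callbacks(CURLM *multi)
    {
      curl_multi_setopt(multi, CURLMOPT_SOCKETFUNCTION, socket_cb);
      curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);
    }

    /* Called from the event loop when socket 's' shows activity, or with
       CURL_SOCKET_TIMEOUT when the armed timer fires. */
    static void on_event(CURLM *multi, curl_socket_t s, int ev_bitmask)
    {
      int still_running;
      curl_multi_socket_action(multi, s, ev_bitmask, &still_running);
    }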
<a name="structs"></a>
Structs in libcurl
==================

This section should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

## Curl_easy

The `Curl_easy` struct is the one returned to the outside in the external API
as a "CURL *". This is usually known as an easy handle in API documentation
and examples.

Information and state that is related to the actual connection is in the
'connectdata' struct. When a transfer is about to be made, libcurl will
either create a new connection or re-use an existing one. The particular
connectdata that is used by this handle is pointed out by
`Curl_easy->easy_conn`.

Data and information that regard this particular single transfer is put in
the SingleRequest sub-struct.

When the `Curl_easy` struct is added to a multi handle, as it must be in
order to do any transfer, the ->multi member will point to the `Curl_multi`
struct it belongs to. The ->prev and ->next members will then be used by the
multi code to keep a linked list of `Curl_easy` structs that are added to
that same multi handle. libcurl always uses multi so ->multi *will* point to
a `Curl_multi` when a transfer is in progress.

->mstate is the multi state of this particular `Curl_easy`. When
`multi_runsingle()` is called, it will act on this handle according to which
state it is in. The mstate is also what tells which sockets to return for a
specific `Curl_easy` when [`curl_multi_fdset()`][12] is called etc.

The libcurl source code generally uses the name 'data' for the variable that
points to the `Curl_easy`.

When doing multiplexed HTTP/2 transfers, each `Curl_easy` is associated with
an individual stream, sharing the same connectdata struct. Multiplexing
makes it even more important to keep things associated with the right thing!

## connectdata

A general idea in libcurl is to keep connections around in a connection
"cache" after they have been used in case they will be used again, and then
re-use an existing one instead of creating a new one, as it gives a
significant performance boost.

Each 'connectdata' identifies a single physical connection to a server. If
the connection can't be kept alive, the connection will be closed after use
and then this struct can be removed from the cache and freed.

Thus, the same `Curl_easy` can be used multiple times and each time select
another connectdata struct to use for the connection. Keep this in mind, as
it is then important to consider if options or choices are based on the
connection or the `Curl_easy`.

Functions in libcurl will assume that connectdata->data points to the
`Curl_easy` that uses this connection (for the moment).

As a special complexity, some protocols supported by libcurl require a
special disconnect procedure that is more than just shutting down the
socket. It can involve sending one or more commands to the server before
doing so. Since connections are kept in the connection cache after use, the
original `Curl_easy` may no longer be around when the time comes to shut down
a particular connection. For this purpose, libcurl holds a special dummy
`closure_handle` `Curl_easy` in the `Curl_multi` struct to use when needed.
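From the application side, this cache is what makes back-to-back transfers
cheap: a second request to the same host can re-use the previous
connectdata. A minimal sketch (the URLs are placeholders and error handling
is omitted):

    #include <curl/curl.h>

    /* Sketch: two transfers on the same easy handle; the second one will
       try to re-use the connection kept in the cache from the first,
       provided the server kept it alive. */
    static void two_transfers(void)
    {
      CURL *easy = curl_easy_init();
      if(easy) {
        curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/first");
        curl_easy_perform(easy);

        curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/second");
        curl_easy_perform(easy); /* likely re-uses the cached connection */

        curl_easy_cleanup(easy); /* closes connections still in the cache */
      }
    }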
FTP uses two TCP connections for a typical transfer but it keeps both in
this single struct and thus can be considered a single connection for most
internal concerns.

The libcurl source code generally uses the name 'conn' for the variable that
points to the connectdata.

## Curl_multi

Internally, the easy interface is implemented as a wrapper around multi
interface functions. This makes everything multi interface.

`Curl_multi` is the multi handle struct exposed as "CURLM *" in external
APIs.

This struct holds a list of `Curl_easy` structs that have been added to this
handle with [`curl_multi_add_handle()`][13]. The start of the list is
`->easyp` and `->num_easy` is a counter of added `Curl_easy`s.

`->msglist` is a linked list of messages to send back when
[`curl_multi_info_read()`][14] is called. Basically a node is added to that
list when an individual `Curl_easy`'s transfer has completed.

`->hostcache` points to the name cache. It is a hash table for looking up
name to IP. The nodes have a limited life time in there and this cache is
meant to reduce the time for when the same name is wanted within a short
period of time.

`->timetree` points to a tree of `Curl_easy`s, sorted by the remaining time
until it should be checked - normally some sort of timeout. Each `Curl_easy`
has one node in the tree.

`->sockhash` is a hash table to allow fast lookups of socket descriptor for
which `Curl_easy` uses that descriptor. This is necessary for the
`multi_socket` API.

`->conn_cache` points to the connection cache. It keeps track of all
connections that are kept after use. The cache has a maximum size.

`->closure_handle` is described in the 'connectdata' section.

The libcurl source code generally uses the name 'multi' for the variable that
points to the `Curl_multi` struct.

## Curl_handler

Each unique protocol that is supported by libcurl needs to provide at least
one `Curl_handler` struct. It defines what the protocol is called and what
functions the main code should call to deal with protocol specific issues.
In general, there's a source file named [protocol].c in which there's a
"struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's
then the main array with all individual `Curl_handler` structs pointed to
from a single array which is scanned through when a URL is given to libcurl
to work with.

`->scheme` is the URL scheme name, usually spelled out in uppercase. That's
"HTTP" or "FTP" etc. SSL versions of the protocol need their own
`Curl_handler` setup so HTTPS is separate from HTTP.

`->setup_connection` is called to allow the protocol code to allocate
protocol specific data that then gets associated with that `Curl_easy` for
the rest of this transfer. It gets freed again at the end of the transfer.
It will be called before the 'connectdata' for the transfer has been
selected/created. Most protocols will allocate their private
'struct [PROTOCOL]' here and assign `Curl_easy->req.protop` to point to it.

`->connect_it` allows a protocol to do some specific actions after the TCP
connect is done, that can still be considered part of the connection phase.

Some protocols will alter the `connectdata->recv[]` and
`connectdata->send[]` function pointers in this function.
`->connecting` is similarly a function that keeps getting called as long as
the protocol considers itself still in the connecting phase.

`->do_it` is the function called to issue the transfer request. What we call
the DO action internally. If the DO is not enough and things need to be kept
getting done for the entire DO sequence to complete, `->doing` is then
usually also provided. Each protocol that needs to do multiple commands or
similar for do/doing needs to implement its own state machine (see SCP,
SFTP, FTP). Some protocols (only FTP and only due to historical reasons)
have a separate piece of the DO state called `DO_MORE`.

`->doing` keeps getting called while issuing the transfer request
command(s).

`->done` gets called when the transfer is complete and DONE. That's after the
main data has been transferred.

`->do_more` gets called during the `DO_MORE` state. The FTP protocol uses
this state when setting up the second connection.

`->proto_getsock`
`->doing_getsock`
`->domore_getsock`
`->perform_getsock`
are functions that return socket information: which socket(s) to wait for
which action(s) during the particular multi state.

`->disconnect` is called immediately before the TCP connection is shut down.

`->readwrite` gets called during transfer to allow the protocol to do extra
reads/writes.

`->defport` is the default TCP or UDP port this protocol uses.

`->protocol` is one or more bits in the `CURLPROTO_*` set. The SSL versions
have their "base" protocol set and then the SSL variation. Like
"HTTP|HTTPS".

`->flags` is a bitmask with additional information about the protocol that
will make it get treated differently by the generic engine:

 - `PROTOPT_SSL` - will make it connect and negotiate SSL

 - `PROTOPT_DUAL` - this protocol uses two connections

 - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the
   connection. This flag is no longer used by code, yet still set for a bunch
   of protocol handlers.

 - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to
   limit which "direction" of socket actions that the main engine will
   concern itself with.

 - `PROTOPT_NONETWORK` - a protocol that doesn't use the network (read:
   file://)

 - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default
   one unless one is provided

 - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL
   (?foo=bar)

## conncache

Is a hash table with connections for later re-use. Each `Curl_easy` has a
pointer to its connection cache. Each multi handle sets up a connection
cache that all added `Curl_easy`s share by default.

## Curl_share

The libcurl share API allocates a `Curl_share` struct, exposed to the
external API as "CURLSH *".

The idea is that the struct can have a set of its own versions of caches and
pools and then by providing this struct in the `CURLOPT_SHARE` option, those
specific `Curl_easy`s will use the caches/pools that this share handle
holds.

Then individual `Curl_easy` structs can be made to share specific things
that they otherwise wouldn't, such as cookies.

The `Curl_share` struct can currently hold cookies, DNS cache and the SSL
session cache.
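A brief sketch of the share API from the application side, tying together
the caches mentioned above (error checks skipped, and no locking callbacks
are installed since this example is single-threaded):

    #include <curl/curl.h>

    /* Sketch: let two easy handles share one cookie set and one DNS cache. */
    static void shared_handles(void)
    {
      CURLSH *share = curl_share_init();
      CURL *first;
      CURL *second;

      curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_COOKIE);
      curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);

      first = curl_easy_init();
      second = curl_easy_init();
      curl_easy_setopt(first, CURLOPT_SHARE, share);
      curl_easy_setopt(second, CURLOPT_SHARE, share);

      /* ... perform transfers; both handles now see the same cookies/DNS ... */

      curl_easy_cleanup(first);
      curl_easy_cleanup(second);
      curl_share_cleanup(share); /* only after no handle uses it anymore */
    }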
## CookieInfo

This is the main cookie struct. It holds all known cookies and related
information. Each `Curl_easy` has its own private CookieInfo even when
they are added to a multi handle. They can be made to share cookies by using
the share API.


[1]: https://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: https://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: https://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: https://curl.haxx.se/docs/manpage.html#--compressed
[7]: https://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: https://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: https://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: https://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: https://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
[13]: https://curl.haxx.se/libcurl/c/curl_multi_add_handle.html
[14]: https://curl.haxx.se/libcurl/c/curl_multi_info_read.html
[15]: https://tools.ietf.org/html/rfc7231#section-3.1.2.2