                                  _   _ ____  _
                              ___| | | |  _ \| |
                             / __| | | | |_) | |
                            | (__| |_| |  _ <| |___
                             \___|\___/|_| \_\_____|


The Art Of Scripting HTTP Requests Using Curl

 1. HTTP Scripting
 1.1 Background
 1.2 The HTTP Protocol
 1.3 See the Protocol
 1.4 See the Timing
 1.5 See the Response
 2. URL
 2.1 Spec
 2.2 Host
 2.3 Port number
 2.4 User name and password
 2.5 Path part
 3. Fetch a page
 3.1 GET
 3.2 HEAD
 3.3 Multiple URLs in a single command line
 3.4 Multiple HTTP methods in a single command line
 4. HTML forms
 4.1 Forms explained
 4.2 GET
 4.3 POST
 4.4 File Upload POST
 4.5 Hidden Fields
 4.6 Figure Out What A POST Looks Like
 5. HTTP upload
 5.1 PUT
 6. HTTP Authentication
 6.1 Basic Authentication
 6.2 Other Authentication
 6.3 Proxy Authentication
 6.4 Hiding credentials
 7. More HTTP Headers
 7.1 Referer
 7.2 User Agent
 8. Redirects
 8.1 Location header
 8.2 Other redirects
 9. Cookies
 9.1 Cookie Basics
 9.2 Cookie options
 10. HTTPS
 10.1 HTTPS is HTTP secure
 10.2 Certificates
 11. Custom Request Elements
 11.1 Modify method and headers
 11.2 More on changed methods
 12. Web Login
 12.1 Some login tricks
 13. Debug
 13.1 Some debug tricks
 14. References
 14.1 Standards
 14.2 Sites

==============================================================================

1. HTTP Scripting

 1.1 Background

 This document assumes that you're familiar with HTML and general networking.

 The increasing number of applications moving to the web has made "HTTP
 Scripting" more frequently requested and wanted. Being able to automatically
 extract information from the web, to fake users, and to post or upload data
 to web servers are all important tasks today.

 Curl is a command line tool for doing all sorts of URL manipulations and
 transfers, but this particular document will focus on how to use it when
 doing HTTP requests for fun and profit. I'll assume that you know how to
 invoke 'curl --help' or 'curl --manual' to get basic information about it.

 Curl is not written to do everything for you. It makes the requests, it gets
 the data, it sends data and it retrieves the information. You probably need
 to glue everything together using some kind of script language or repeated
 manual invocations.

 1.2 The HTTP Protocol

 HTTP is the protocol used to fetch data from web servers. It is a very simple
 protocol that is built upon TCP/IP. The protocol also allows information to
 get sent to the server from the client using a few different methods, as will
 be shown here.

 In HTTP/1.x, the client sends plain ASCII text lines to a server to request a
 particular action, and the server replies with a few text lines before the
 actual requested content is sent to the client.

 The client, curl, sends an HTTP request. The request contains a method (like
 GET, POST, HEAD etc), a number of request headers and sometimes a request
 body. The HTTP server responds with a status line (indicating if things went
 well), response headers and most often also a response body. The "body" part
 is the plain data you requested, like the actual HTML or the image etc.

 1.3 See the Protocol

 Using curl's option --verbose (-v as a short option) will display what kind
 of commands curl sends to the server, as well as a few other informational
 texts.

 --verbose is the single most useful option when it comes to debugging or even
 understanding the curl<->server interaction.

 Sometimes even --verbose is not enough. Then --trace and --trace-ascii offer
 even more details as they show EVERYTHING curl sends and receives.
 Use it like this:

        curl --trace-ascii debugdump.txt http://www.example.com/

 1.4 See the Timing

 Many times you may wonder what exactly is taking all the time, or you just
 want to know the number of milliseconds between two points in a
 transfer. For those, and other similar situations, the --trace-time option
 is what you need. It'll prepend the time to each trace output line:

        curl --trace-ascii d.txt --trace-time http://example.com/

 1.5 See the Response

 By default curl sends the response to stdout. To store it somewhere else
 instead, use -o <file> to pick a file name, or -O to save it under the file
 name part of the remote URL.

2. URL

 2.1 Spec

 The Uniform Resource Locator format is how you specify the address of a
 particular resource on the Internet. You know these, you've seen URLs like
 https://curl.haxx.se or https://yourbank.com a million times. RFC 3986 is the
 canonical spec. And yeah, the formal name is not URL, it is URI.

 2.2 Host

 The host name is usually resolved to an IP address using DNS or your
 /etc/hosts file, and that's what curl will communicate with. Alternatively,
 you can specify the IP address directly in the URL instead of a name.

 For development and other testing situations, you can make curl use a
 different IP address for a host name than what would otherwise be used, by
 using curl's --resolve option:

        curl --resolve www.example.org:80:127.0.0.1 http://www.example.org/

 2.3 Port number

 Each protocol curl supports operates on a default port number, be it over TCP
 or in some cases UDP. Normally you don't have to take that into
 consideration, but at times you run test servers on other ports or
 similar. Then you can specify the port number in the URL with a colon and a
 number immediately following the host name.
 For example, when doing HTTP to port 1234:

        curl http://www.example.org:1234/

 The port number you specify in the URL is the number that the server uses to
 offer its services. Sometimes you may use a proxy, and then you need to
 specify that proxy's port number separately from the port number in the URL.
 For example, when using an HTTP proxy on port 4321:

        curl --proxy http://proxy.example.org:4321 http://remote.example.org/

 2.4 User name and password

 Some services are set up to require HTTP authentication, and then you need to
 provide a name and password, which is then transferred to the remote site in
 various ways depending on the exact authentication protocol used.

 You can either insert the user and password in the URL or provide them
 separately:

        curl http://user:password@example.org/

 or

        curl -u user:password http://example.org/

 Note that this kind of HTTP authentication is not what user-oriented web
 sites usually ask for these days. They tend to use forms and cookies instead.

 2.5 Path part

 The path part is just sent off to the server to request that it sends back
 the associated response. The path is the part of the URL that starts at the
 slash following the host name (and possibly port number).

3. Fetch a page

 3.1 GET

 The simplest and most common request/operation made using HTTP is to GET a
 URL. The URL could itself refer to a web page, an image or a file. The client
 issues a GET request to the server and receives the document it asked for.
 If you issue the command line

        curl https://curl.haxx.se

 you get a web page returned in your terminal window: the entire HTML document
 that the URL holds.
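 To make the request/response description concrete, here is a minimal Python
 sketch (standard library only) of the text an HTTP/1.1 GET consists of,
 built from the URL parts described in the URL chapter. It is an illustration
 under assumptions, not what curl literally sends: it only handles plain
 http:// URLs, and real curl adds more headers, such as User-Agent: and
 Accept:. Run curl with --verbose to see the exact request. The function name
 build_get_request is hypothetical.

```python
import base64
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Sketch the request text an HTTP/1.1 client sends to GET a plain http:// URL."""
    parts = urlsplit(url)

    # The path part (plus any query string) goes into the request line as-is;
    # an empty path means "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # The Host: header names the host, plus the port when it isn't HTTP's default 80.
    host = parts.hostname
    if parts.port is not None and parts.port != 80:
        host = f"{host}:{parts.port}"
    lines = [f"GET {target} HTTP/1.1", f"Host: {host}"]

    # A user name and password from the URL never appear in the request line;
    # with Basic authentication they travel base64-encoded in a header instead.
    if parts.username is not None:
        credentials = f"{parts.username}:{parts.password or ''}".encode()
        lines.append("Authorization: Basic " + base64.b64encode(credentials).decode())

    # Lines end in CRLF, and an empty line ends the headers. A GET has no body.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_get_request("http://user:password@www.example.org:1234/when/birth.html"))
```

 Note how the credentials are only base64-encoded, not encrypted. This is why
 Basic authentication is described as readable by anyone sniffing the network.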

 All HTTP replies contain a set of response headers that are normally hidden;
 use curl's --include (-i) option to display them as well as the rest of the
 document.

 3.2 HEAD

 You can ask the remote server for ONLY the headers by using the --head (-I)
 option, which will make curl issue a HEAD request. In some special cases
 servers deny the HEAD method while other methods still work, which is a
 particular kind of annoyance.

 The HEAD method is defined so that the server returns the headers exactly
 the way it would do for a GET, but without a body. This means that you may
 see a Content-Length: in the response headers, but there must not be an
 actual body in the HEAD response.

 3.3 Multiple URLs in a single command line

 A single curl command line may involve one or many URLs. The most common case
 is probably to just use one, but you can specify any number of URLs. Yes,
 any. No limits. You'll then get requests repeated over and over for all the
 given URLs.

 Example, send two GETs:

        curl http://url1.example.com http://url2.example.com

 If you use --data to POST to the URL, using multiple URLs means that you send
 that same POST to all the given URLs.

 Example, send two POSTs:

        curl --data name=curl http://url1.example.com http://url2.example.com

 3.4 Multiple HTTP methods in a single command line

 Sometimes you need to operate on several URLs in a single command line and do
 different HTTP methods on each. For this, you'll enjoy the --next option. It
 is basically a separator that separates a bunch of options from the next. All
 the URLs before --next will get the same method and will get all the POST
 data merged into one.

 When curl reaches the --next on the command line, it'll sort of reset the
 method and the POST data and allow a new set.

 Perhaps this is best shown with a few examples.
 To send first a HEAD and then a GET:

        curl -I http://example.com --next http://example.com

 To first send a POST and then a GET:

        curl -d score=10 http://example.com/post.cgi --next http://example.com/results.html

4. HTML forms

 4.1 Forms explained

 Forms are the general way a web site can present an HTML page with fields for
 the user to enter data in, and then press some kind of 'OK' or 'submit'
 button to get that data sent to the server. The server then typically uses
 the posted data to decide how to act. For example, it may use the entered
 words to search in a database, add the info to a bug tracking system,
 display the entered address on a map, or use the info as a login prompt
 verifying that the user is allowed to see what it is about to see.

 Of course there has to be some kind of program on the server end to receive
 the data you send. You cannot just invent something out of the air.

 4.2 GET

 A GET-form uses the method GET, as specified in HTML like:

        <form method="GET" action="junk.cgi">
        <input type=text name="birthyear">
        <input type=submit name=press value="OK">
        </form>

 In your favorite browser, this form will appear with a text box to fill in
 and a press-button labeled "OK". If you fill in '1905' and press the OK
 button, your browser will then create a new URL to get for you. The new URL
 is made from the previous one by replacing the last part of its path with
 "junk.cgi?birthyear=1905&press=OK".

 If the original form was seen on the page "www.hotmail.com/when/birth.html",
 the second page you'll get will become
 "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK".

 Most search engines work this way.
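 The URL construction the browser performs can be sketched with Python's
 urllib.parse. The page URL www.example.com/when/birth.html is a hypothetical
 stand-in for the page above, and form_get_url is a made-up helper name. The
 sketch ignores details such as an action URL that already carries a query
 string.

```python
from urllib.parse import urlencode, urljoin


def form_get_url(page_url: str, action: str, fields: list) -> str:
    """Build the URL a browser fetches when a method="GET" form is submitted."""
    # The form's action is resolved relative to the page that holds the form...
    target = urljoin(page_url, action)
    # ...and the name=value pairs are url-encoded, joined with '&' and
    # appended after a '?'.
    return target + "?" + urlencode(fields)


print(form_get_url("http://www.example.com/when/birth.html", "junk.cgi",
                   [("birthyear", "1905"), ("press", "OK")]))
# http://www.example.com/when/junk.cgi?birthyear=1905&press=OK
```

 The printed URL is exactly what you would pass to curl on the command line.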

 To make curl do the GET form submission for you, just enter the expected
 created URL:

        curl "http://www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK"

 4.3 POST

 The GET method makes all input field names and values get displayed in the
 URL field of your browser. That's generally a good thing when you want to be
 able to bookmark that page with your given data, but it is an obvious
 disadvantage if you entered secret information in one of the fields or if
 there are a large number of fields creating a very long and unreadable URL.

 The HTTP protocol then offers the POST method. This way the client sends the
 data separated from the URL and thus you won't see any of it in the URL
 address field.

 The form would look very similar to the previous one:

        <form method="POST" action="junk.cgi">
        <input type=text name="birthyear">
        <input type=submit name=press value=" OK ">
        </form>

 And to use curl to post this form with the same data filled in as before, we
 could do it like:

        curl --data "birthyear=1905&press=%20OK%20" \
             http://www.example.com/when/junk.cgi

 This kind of POST will use the Content-Type
 application/x-www-form-urlencoded and is the most widely used POST kind.

 The data you send to the server with --data MUST already be properly
 encoded; curl will not do that for you. For example, if you want the data to
 contain a space, you need to replace that space with %20 etc. Failing to
 comply with this will most likely cause your data to be received wrongly and
 messed up.

 Recent curl versions can in fact url-encode POST data for you, like this:

        curl --data-urlencode "name=I am Daniel" http://www.example.com

 If you repeat --data several times on the command line, curl will
 concatenate all the given data pieces, putting an '&' symbol between each
 data segment.
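 The two encoding rules above can be sketched in Python. This assumes the
 "name=content" form of --data-urlencode, where only the part after '=' is
 encoded. urlencode_pair and join_data are hypothetical helper names, and the
 sketch shows the shape of the data rather than reproducing curl's encoder
 byte for byte.

```python
from urllib.parse import quote


def urlencode_pair(name: str, content: str) -> str:
    """Percent-encode only the content part of a "name=content" pair."""
    # safe="" makes quote() encode '/' too, so only letters, digits and
    # "_.-~" are left as they are; a space becomes %20.
    return name + "=" + quote(content, safe="")


def join_data(*pieces: str) -> str:
    """Concatenate data pieces the way repeated --data options are: with '&'."""
    return "&".join(pieces)


print(urlencode_pair("name", "I am Daniel"))
# name=I%20am%20Daniel
print(join_data("birthyear=1905", urlencode_pair("press", " OK ")))
# birthyear=1905&press=%20OK%20
```

 The second line matches the hand-encoded --data string used for the POST
 form above.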

 4.4 File Upload POST

 Back in late 1995 they defined an additional way to post data over HTTP. It
 is documented in RFC 1867, which is why this method is sometimes referred to
 as RFC1867-posting.

 This method is mainly designed to better support file uploads. A form that
 allows a user to upload a file could be written like this in HTML:

        <form method="POST" enctype='multipart/form-data' action="upload.cgi">
        <input type=file name=upload>
        <input type=submit name=press value="OK">
        </form>

 This clearly shows that the Content-Type about to be sent is
 multipart/form-data.

 To post to a form like this with curl, you enter a command line like the
 following, where the '@' prefix tells curl to upload the contents of the
 named local file as that field:

        curl --form upload=@localfilename --form press=OK [URL]

 4.5 Hidden Fields

 A very common way for HTML based applications to pass state information
 between pages is to add hidden fields to the forms. Hidden fields are
 already filled in, they aren't displayed to the user and they get passed
 along just as all the other fields.

 A similar example form with one visible field, one hidden field and one
 submit button could look like:

        <form method="POST" action="foobar.cgi">
        <input type=text name="birthyear">
        <input type=hidden name="person" value="daniel">
        <input type=submit name="press" value="OK">
        </form>

 To post this with curl, you won't have to think about whether the fields are
 hidden or not. To curl they're all the same:

        curl --data "birthyear=1905&press=OK&person=daniel" [URL]

 4.6 Figure Out What A POST Looks Like

 When you're about to fill in a form and send it to a server by using curl
 instead of a browser, you're of course very interested in sending a POST
 exactly the way your browser does.
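 Sending a POST exactly the way the browser does means sending every field,
 hidden ones included, in the order the form lists them. One way to collect
 them from a saved copy of the page is sketched below with Python's standard
 html.parser. FormFields and post_body are hypothetical names. The sketch only
 looks at <input> elements (no <select>, <textarea>, checkboxes or radio
 buttons) and assumes a single submit button, which is sent like any other
 field.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFields(HTMLParser):
    """Collect (name, value) pairs of <input> fields in document order."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        kind = (attrs.get("type") or "text").lower()
        # Unnamed inputs are never sent; file inputs need a multipart POST,
        # which this sketch does not build.
        if name is None or kind == "file":
            return
        self.fields.append((name, attrs.get("value") or ""))


def post_body(html: str, filled: dict) -> str:
    """Return the url-encoded POST body for the form, hidden fields included."""
    parser = FormFields()
    parser.feed(html)
    # Hidden fields keep their preset values; visible ones take what the "user" typed.
    pairs = [(name, filled.get(name, value)) for name, value in parser.fields]
    # urlencode() writes spaces as '+', as browsers do for this content type.
    return urlencode(pairs)


FORM = """
<form method="POST" action="foobar.cgi">
<input type=text name="birthyear">
<input type=hidden name="person" value="daniel">
<input type=submit name="press" value="OK">
</form>
"""
print(post_body(FORM, {"birthyear": "1905"}))
# birthyear=1905&person=daniel&press=OK
```

 The printed string can be handed to curl with --data.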

 An easy way to see this is to save the HTML page with the form on your local
 disk, change the 'method' to GET, and press the submit button (you could also
 change the action URL if you want to).

 You will then clearly see the data get appended to the URL, separated with a
 '?' character, as GET forms are supposed to.

5. HTTP upload

 5.1 PUT

 Perhaps the best way to upload data to an HTTP server is to use PUT. Then
 again, this of course requires that someone put a program or script on the
 server end that knows how to receive an HTTP PUT stream.

 Put a file to an HTTP server with curl:

        curl --upload-file uploadfile http://www.example.com/receive.cgi

6. HTTP Authentication

 6.1 Basic Authentication

 HTTP Authentication is the ability to tell the server your username and
 password so that it can verify that you're allowed to do the request you're
 doing. The Basic authentication used in HTTP (which is the type curl uses by
 default) is *plain* *text* based, which means it sends username and password
 only base64-encoded (slightly obfuscated), but still fully readable by anyone
 that sniffs on the network between you and the remote server.

 To tell curl to use a user and password for authentication:

        curl --user name:password http://www.example.com

 6.2 Other Authentication

 The site might require a different authentication method (check the
 WWW-Authenticate: headers returned by the server), and then --ntlm,
 --digest, --negotiate or even --anyauth might be options that suit you.

 6.3 Proxy Authentication

 Sometimes your HTTP access is only available through the use of an HTTP
 proxy. This seems to be especially common at various companies. An HTTP proxy
 may require its own user and password to allow the client to get through to
 the Internet.
 To specify those with curl, run something like:

        curl --proxy-user proxyuser:proxypassword curl.haxx.se

 If your proxy requires the authentication to be done using the NTLM method,
 use --proxy-ntlm; if it requires Digest, use --proxy-digest.

 If you use any one of these user+password options but leave out the password
 part, curl will prompt for the password interactively.

 6.4 Hiding credentials

 Do note that when a program is run, its parameters might be possible to see
 when listing the running processes of the system. Thus, other users may be
 able to watch your passwords if you pass them as plain command line
 options. There are ways to circumvent this: for example, leave out the
 password so that curl prompts for it, or keep the credentials in a .netrc
 file (used with --netrc) or in a config file read with --config (-K).

 It is worth noting that while this is how HTTP Authentication works, very
 many web sites will not use this concept when they provide logins etc. See
 the Web Login chapter further below for more details on that.

7. More HTTP Headers

 7.1 Referer

 An HTTP request may include a 'referer' field (yes it is misspelled), which
 can be used to tell from which URL the client got to this particular
 resource. Some programs/scripts check the referer field of requests to verify
 that this wasn't arriving from an external site or an unknown page. While
 this is a stupid way to check something so easily forged, many scripts still
 do it. Using curl, you can put anything you want in the referer field and
 thus more easily fool the server into serving your request.

 Use curl to set the referer field with:

        curl --referer http://www.example.com/ http://www.example.com

 7.2 User Agent

 Very similar to the referer field, all HTTP requests may set the User-Agent
 field. It names what user agent (client) is being used. Many applications
 use this information to decide how to display pages.
 Silly web programmers try to make different pages for users of different
 browsers to make them look the best possible for their particular browsers.
 They usually also serve different kinds of JavaScript, VBScript etc.

 At times, you will see that getting a page with curl will not return the same
 page that you see when getting the page with your browser. Then you know it
 is time to set the User Agent field to fool the server into thinking you're
 one of those browsers.

 To make curl look like Internet Explorer 5 on a Windows 2000 box:

        curl --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]

 Or why not look like you're using Netscape 4.73 on an old Linux box:

        curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]

8. Redirects

 8.1 Location header

 When a resource is requested from a server, the reply from the server may
 include a hint about where the browser should go next to find this page, or a
 new page keeping newly generated output. The header that tells the browser
 to redirect is Location:.

 Curl does not follow Location: headers by default, but will simply display
 such pages in the same manner it displays all HTTP replies. It does however
 feature an option that will make it attempt to follow the Location: pointers.

 To tell curl to follow a Location:

        curl --location http://www.example.com

 If you use curl to POST to a site that immediately redirects you to another
 page, you can safely use --location (-L) and --data/--form together. After a
 301, 302 or 303 response, curl only uses POST in the first request and then
 reverts to GET in the following requests; after other redirect codes, such
 as 307 and 308, it repeats the same method.
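 The method switching can be modelled in a few lines of Python. This is a
 sketch of the rule just described, not curl's code: `responses` is a
 hypothetical dictionary standing in for the server, follow is a made-up name,
 and the limit of 10 redirects is an arbitrary choice playing the role of
 curl's --max-redirs option.

```python
from urllib.parse import urljoin

# After these status codes a POST is followed up with a GET;
# any other 3xx repeats the same method.
SWITCH_TO_GET = {301, 302, 303}


def follow(method: str, url: str, responses: dict, max_redirects: int = 10) -> list:
    """Return the (method, URL) pairs a --location style client would request.

    `responses` is a hypothetical stand-in for the server: it maps each URL to
    a (status, location) tuple, where location is None for a final answer.
    """
    requests = []
    for _ in range(max_redirects + 1):
        requests.append((method, url))
        status, location = responses[url]
        if location is None or not 300 <= status < 400:
            return requests
        if method == "POST" and status in SWITCH_TO_GET:
            method = "GET"
        # A Location: value may be relative; resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")


site = {
    "http://www.example.com/login.cgi": (302, "/welcome.html"),
    "http://www.example.com/welcome.html": (200, None),
}
print(follow("POST", "http://www.example.com/login.cgi", site))
# [('POST', 'http://www.example.com/login.cgi'), ('GET', 'http://www.example.com/welcome.html')]
```

 Changing the 302 in `site` to 307 keeps both requests as POST.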

 8.2 Other redirects

 Browsers typically support at least two other ways of redirecting that curl
 doesn't: first, the HTML may contain a meta refresh tag that asks the browser
 to load a specific URL after a set number of seconds; second, the page may
 use JavaScript to do it.

9. Cookies

 9.1 Cookie Basics

 The way the web browsers do "client side state control" is by using
 cookies. Cookies are just names with associated contents. The cookies are
 sent to the client by the server. The server tells the client for what path
 and host name it wants the cookie sent back, and it also sends an expiration
 date and a few more properties.

 When a client communicates with a server whose host name and path match
 those specified in a previously received cookie, the client sends the
 cookies and their contents back to the server, unless they have expired.

 Many applications and servers use this method to connect a series of requests
 into a single logical session. To be able to use curl on such occasions, we
 must be able to record and send back cookies the way the web application
 expects them, the same way browsers deal with them.

 9.2 Cookie options

 The simplest way to send a few cookies to the server when getting a page with
 curl is to add them on the command line like:

        curl --cookie "name=Daniel" http://www.example.com

 Cookies are sent in common HTTP headers (Set-Cookie: from the server,
 Cookie: from the client). This is practical as it allows curl to record
 cookies simply by recording headers. Record cookies with curl by using the
 --dump-header (-D) option like:

        curl --dump-header headers_and_cookies http://www.example.com

 (Take note that the --cookie-jar option described below is a better way to
 store cookies.)
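 The Set-Cookie: lines in a file written by --dump-header can be turned back
 into the "name=value; name2=value2" string that --cookie accepts. Below is a
 Python sketch using the standard http.cookies module. The helper names and
 the sample header dump are hypothetical. The sketch parses attributes such as
 domain, path and expiry but then ignores them, unlike curl's own cookie
 engine, which is one reason --cookie-jar is the better tool.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(header_text: str) -> dict:
    """Pick the name=value pairs out of the Set-Cookie: lines of a header dump."""
    jar = SimpleCookie()
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "set-cookie":
            # load() parses "name=value; Path=/; ..." and keeps the attributes
            # (path, expires, ...) on the side, in each Morsel.
            jar.load(value.strip())
    return {key: morsel.value for key, morsel in jar.items()}


def cookie_option(cookies: dict) -> str:
    """Format cookies as one "name=value; name2=value2" string for --cookie."""
    return "; ".join(f"{key}={value}" for key, value in cookies.items())


dump = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Set-Cookie: session=abc123; Path=/\r\n"
    "Set-Cookie: lang=en\r\n"
    "\r\n"
)
print(cookie_option(cookies_from_headers(dump)))
# session=abc123; lang=en
```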

 Curl has a full-blown cookie parsing engine built in that comes into use if
 you want to reconnect to a server and use cookies that were stored from a
 previous connection (or hand-crafted manually to fool the server into
 believing you had a previous connection). To use previously stored cookies,
 you run curl like:

        curl --cookie stored_cookies_in_file http://www.example.com

 Curl's "cookie engine" gets enabled when you use the --cookie option. If you
 only want curl to understand received cookies, use --cookie with a file that
 doesn't exist. For example, if you want to let curl understand cookies from a
 page and follow a location (and thus possibly send back cookies it received),
 you can invoke it like:

        curl --cookie nada --location http://www.example.com

 Curl has the ability to read and write cookie files that use the same file
 format that Netscape and Mozilla once used. It is a convenient way to share
 cookies between scripts or invocations. The --cookie (-b) switch
 automatically detects if a given file is such a cookie file and parses it,
 and by using the --cookie-jar (-c) option you'll make curl write a new cookie
 file at the end of an operation:

        curl --cookie cookies.txt --cookie-jar newcookies.txt \
             http://www.example.com

10. HTTPS

 10.1 HTTPS is HTTP secure

 There are a few ways to do secure HTTP transfers. By far the most common
 protocol for doing this is what is generally known as HTTPS, HTTP over
 SSL/TLS. It encrypts all the data that is sent and received over the network
 and thus makes it harder for attackers to spy on sensitive information.

 SSL (or TLS, as its successor standard is called) offers a truckload of
 advanced features to allow all those encryptions and key infrastructure
 mechanisms encrypted HTTP requires.
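 Curl verifies the server's certificate by default, as the Certificates
 section describes. For comparison only, here is how the same "encrypt and
 verify" default looks with Python's ssl module. Curl uses its own TLS
 library, not this one. The cafile and insecure parameters are illustrative
 stand-ins for curl's --cacert and --insecure options, and
 make_client_context is a hypothetical name.

```python
import ssl


def make_client_context(cafile=None, insecure=False):
    """Return a client-side TLS context with curl-like verification choices.

    cafile   -- loosely like --cacert: trust the CAs in this PEM file instead of
                the system's default store.
    insecure -- loosely like --insecure (-k): skip all server verification.
    """
    # The default context verifies the certificate chain and the host name.
    ctx = ssl.create_default_context(cafile=cafile)
    if insecure:
        # check_hostname has to be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
# True True
```

 Such a context can be passed to urllib.request.urlopen(url, context=ctx).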

 Curl supports encrypted fetches when built to use a TLS library, and it can
 be built to use one out of a fairly large set of libraries; "curl -V" will
 show which one your curl was built to use (if any!). To get a page from an
 HTTPS server, simply run curl like:

        curl https://secure.example.com

 10.2 Certificates

 In the HTTPS world, you use certificates to validate that you are the one
 you claim to be, as an addition to normal passwords. Curl supports
 client-side certificates. A client certificate is typically locked with a
 pass phrase, which you need to enter before the certificate can be used by
 curl. The pass phrase can be specified on the command line or, if not,
 entered interactively when curl queries for it. Use a certificate with curl
 on an HTTPS server like:

        curl --cert mycert.pem https://secure.example.com

 curl also tries to verify that the server is who it claims to be, by
 verifying the server's certificate against a locally stored CA cert
 bundle. Failing the verification will cause curl to deny the connection. If
 you want curl to ignore that the server can't be verified, you must use
 --insecure (-k).

 More about server certificate verification and CA cert bundles can be read
 in the SSLCERTS document, available online here:

        https://curl.haxx.se/docs/sslcerts.html

 At times you may end up with your own CA cert store, and then you can tell
 curl to use that to verify the server's certificate:

        curl --cacert ca-bundle.pem https://example.com/

11. Custom Request Elements

 11.1 Modify method and headers

 Doing fancy stuff, you may need to add or change elements of a single curl
 request.

 For example, you can change the POST request to a PROPFIND and send the data
 as "Content-Type: text/xml" (instead of the default Content-Type) like this:

        curl --data "<xml>" --header "Content-Type: text/xml" \
             --request PROPFIND http://example.com/

 You can delete a default header by providing one without content. For
 example, you can ruin the request by chopping off the Host: header:

        curl --header "Host:" http://www.example.com

 You can add headers the same way. Your server may want a "Destination:"
 header, and you can add it:

        curl --header "Destination: http://nowhere" http://example.com

 11.2 More on changed methods

 It should be noted that curl selects which methods to use on its own
 depending on what action to ask for. -d will do POST, -I will do HEAD and so
 on. If you use the --request / -X option you can change the method keyword
 curl selects, but you will not modify curl's behavior. This means that if you
 for example use -d "data" to do a POST, you can modify the method to a
 PROPFIND with -X and curl will still think it sends a POST. You can change
 the normal GET to a POST method by simply adding -X POST in a command line
 like:

        curl -X POST http://example.org/

 ... but curl will still think and act as if it sent a GET, so it won't send
 any request body etc.

12. Web Login

 12.1 Some login tricks

 While not strictly just HTTP related, it still causes a lot of people
 problems, so here's the executive run-down of how the vast majority of all
 login forms work and how to log in to them using curl.

 It can also be noted that to do this properly in an automated fashion, you
 will most certainly need to script things and do multiple curl invocations
 etc.

 First, servers mostly use cookies to track the logged-in status of the
 client, so you will need to capture the cookies you receive in the
 responses.
 Then, many sites also set a special cookie on the login page (to
 make sure you got there through their login page), so you should make a
 habit of first getting the login-form page to capture the cookies set there.

 Some web-based login systems feature various amounts of JavaScript, and
 sometimes they use such code to set or modify cookie contents. Possibly they
 do that to prevent programmed logins, like the ones this manual describes...
 Anyway, if reading the code isn't enough to let you repeat the behavior
 manually, capturing the HTTP requests done by your browser and analyzing the
 sent cookies is usually a working method to figure out how to skip the
 JavaScript step.

 In the actual <form> tag for the login, lots of sites fill in random, session
 or otherwise secretly generated hidden fields, so you may need to first
 capture the HTML code for the login form and extract all the hidden fields
 to be able to do a proper login POST. Remember that the contents need to be
 URL encoded when sent in a normal POST.

13. Debug

 13.1 Some debug tricks

 Many times when you run curl on a site, you'll notice that the site doesn't
 seem to respond the same way to your curl requests as it does to your
 browser's.

 Then you need to start making your curl requests more similar to your
 browser's requests:

 * Use the --trace-ascii option to store fully detailed logs of the requests
   for easier analyzing and better understanding

 * Make sure you check for and use cookies when needed (both reading with
   --cookie and writing with --cookie-jar)

 * Set the User-Agent to the one a recent popular browser uses

 * Set the referer the way the browser sets it

 * If you use POST, make sure you send all the fields, in the same order as
   the browser does it.
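 Since curl is meant to be glued together with some script, the checklist
 above can be turned into a small command builder. Below is a Python sketch.
 browser_like_curl is a hypothetical name, and every value in the example (the
 user agent string, the URLs, cookies.txt, dump.txt, the form data) is a
 placeholder: take the real values from the request your browser sends.

```python
import shlex


def browser_like_curl(url, user_agent, referer, cookie_file, data=None,
                      trace_file="dump.txt"):
    """Return a curl argument list that follows the checklist above.

    Every value is supplied by the caller; the defaults are placeholders.
    """
    args = [
        "curl",
        "--trace-ascii", trace_file,   # full log of what was sent and received
        "--cookie", cookie_file,       # send cookies saved earlier...
        "--cookie-jar", cookie_file,   # ...and save the updated set afterwards
        "--user-agent", user_agent,
        "--referer", referer,
    ]
    if data is not None:
        args += ["--data", data]       # fields in the browser's order
    args.append(url)
    return args


cmd = browser_like_curl(
    "http://www.example.com/login.cgi",
    user_agent="Mozilla/5.0 (X11; Linux x86_64)",
    referer="http://www.example.com/login.html",
    cookie_file="cookies.txt",
    data="user=daniel&press=OK",
)
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

 Reusing one file for both --cookie and --cookie-jar lets consecutive
 invocations (say, fetching the login page and then posting the login form)
 share their cookies.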

 A very good helper to make sure you do this right is the LiveHTTPHeaders
 tool, which lets you view all headers you send and receive with
 Mozilla/Firefox (even when using HTTPS). Chrome features similar
 functionality out of the box among its developer tools.

 A more raw approach is to capture the HTTP traffic on the network with tools
 such as Wireshark (formerly known as Ethereal) or tcpdump and check what
 headers were sent and received by the browser. (HTTPS makes this technique
 ineffective, since the traffic is encrypted.)

14. References

 14.1 Standards

 RFC 7230 is a must-read if you want an in-depth understanding of the HTTP
 protocol

 RFC 3986 explains the URL syntax

 RFC 1867 defines the HTTP post upload format

 RFC 6265 defines how HTTP cookies work

 14.2 Sites

 https://curl.haxx.se is the home of the cURL project