/* stb_image - v2.08 - public domain image loader - http://nothings.org/stb_image.h
                                     no warranty implied; use at your own risk

   Do this:
      #define STB_IMAGE_IMPLEMENTATION
   before you include this file in *one* C or C++ file to create the implementation.

   // i.e. it should look like this:
   #include ...
   #include ...
   #include ...
   #define STB_IMAGE_IMPLEMENTATION
   #include "stb_image.h"

   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc, realloc, and free.
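
   For example, here is a sketch that routes assertions and allocations through
   your own functions. The names my_assert, my_malloc, my_realloc and my_free
   are hypothetical and are not part of stb_image:

      #define STBI_ASSERT(x)       my_assert(x)
      #define STBI_MALLOC(sz)      my_malloc(sz)
      #define STBI_REALLOC(p,sz)   my_realloc(p,sz)
      #define STBI_FREE(p)         my_free(p)
      #define STB_IMAGE_IMPLEMENTATION
      #include "stb_image.h"

   Define all three allocator macros or none of them. Defining only some of
   them triggers an #error in the implementation.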


   QUICK NOTES:
      Primarily of interest to game developers and other people who can
          avoid problematic images and only need the trivial interface

      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
      PNG 1/2/4/8-bit-per-channel (16 bpc not supported)

      TGA (not sure what subset, if a subset)
      BMP non-1bpp, non-RLE
      PSD (composited view only, no extra channels, 8/16 bit-per-channel)

      GIF (*comp always reports as 4-channel)
      HDR (radiance rgbE format)
      PIC (Softimage PIC)
      PNM (PPM and PGM binary only)

      Animated GIF still needs a proper API, but here's one way to do it:
          http://gist.github.com/urraka/685d9a6340b26b830d49

      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
      - decode from arbitrary I/O callbacks
      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)

   Full documentation under "DOCUMENTATION" below.


   Revision 2.00 release notes:

      - Progressive JPEG is now supported.

      - PPM and PGM binary formats are now supported, thanks to Ken Miller.

      - x86 platforms now make use of SSE2 SIMD instructions for
        JPEG decoding, and ARM platforms can use NEON SIMD if requested.
        This work was done by Fabian "ryg" Giesen. SSE2 is used by
        default, but NEON must be enabled explicitly; see docs.

        With other JPEG optimizations included in this version, we see a
        2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
        on a JPEG on an ARM machine, relative to previous versions of this
        library. The same results will not obtain for all JPGs and for all
        x86/ARM machines. (Note that progressive JPEGs are significantly
        slower to decode than regular JPEGs.) This doesn't mean that this
        is the fastest JPEG decoder in the land; rather, it brings it
        closer to parity with standard libraries. If you want the fastest
        decode, look elsewhere. (See "Philosophy" section of docs below.)

        See final bullet items below for more info on SIMD.

      - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
        the memory allocator. Unlike other STBI libraries, these macros don't
        support a context parameter, so if you need to pass a context in to
        the allocator, you'll have to store it in a global or a thread-local
        variable.

      - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
        STBI_NO_LINEAR.
            STBI_NO_HDR:     suppress implementation of .hdr reader format
            STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API

      - You can suppress implementation of any of the decoders to reduce
        your code footprint by #defining one or more of the following
        symbols before creating the implementation.

            STBI_NO_JPEG
            STBI_NO_PNG
            STBI_NO_BMP
            STBI_NO_PSD
            STBI_NO_TGA
            STBI_NO_GIF
            STBI_NO_HDR
            STBI_NO_PIC
            STBI_NO_PNM   (.ppm and .pgm)

      - You can request *only* certain decoders and suppress all other ones
        (this will be more forward-compatible, as addition of new decoders
        doesn't require you to disable them explicitly):

            STBI_ONLY_JPEG
            STBI_ONLY_PNG
            STBI_ONLY_BMP
            STBI_ONLY_PSD
            STBI_ONLY_TGA
            STBI_ONLY_GIF
            STBI_ONLY_HDR
            STBI_ONLY_PIC
            STBI_ONLY_PNM   (.ppm and .pgm)

        Note that you can define multiples of these, and you will get all
        of them ("only x" and "only y" is interpreted to mean "only x&y").

      - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
        want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB.

      - Compilation of all SIMD code can be suppressed with
            #define STBI_NO_SIMD
        It should not be necessary to disable SIMD unless you have issues
        compiling (e.g. using an x86 compiler which doesn't support SSE
        intrinsics or that doesn't support the method used to detect
        SSE2 support at run-time), and even those can be reported as
        bugs so I can refine the built-in compile-time checking to be
        smarter.

      - The old STBI_SIMD system which allowed installing a user-defined
        IDCT etc. has been removed. If you need this, don't upgrade. My
        assumption is that almost nobody was doing this, and those who
        were will find the built-in SIMD more satisfactory anyway.

      - RGB values computed for JPEG images are slightly different from
        previous versions of stb_image. (This is due to using less
        integer precision in SIMD.) The C code has been adjusted so
        that the same RGB values will be computed regardless of whether
        SIMD support is available, so your app should always produce
        consistent results. But these results are slightly different from
        previous versions. (Specifically, about 3% of available YCbCr values
        will compute different RGB results from pre-1.49 versions by +-1;
        most of the deviating values are one smaller in the G channel.)

      - If you must produce consistent results with previous versions of
        stb_image, #define STBI_JPEG_OLD and you will get the same results
        you used to; however, you will not get the SIMD speedups for
        the YCbCr-to-RGB conversion step (although you should still see
        significant JPEG speedup from the other changes).

        Please note that STBI_JPEG_OLD is a temporary feature; it will be
        removed in future versions of the library. It is only intended for
        near-term back-compatibility use.
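
      For example, a build that keeps only the JPEG and PNG decoders and
      compiles out all SIMD code could look like this. This is a sketch;
      keep only the flags you actually need:

            #define STBI_ONLY_JPEG
            #define STBI_ONLY_PNG
            #define STBI_NO_SIMD
            #define STB_IMAGE_IMPLEMENTATION
            #include "stb_image.h"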


   Latest revision history:
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) partial animated GIF support
                         limited 16-bit PSD support
                         minor bugs, code cleanup, and compiler warnings
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) additional corruption checking
                         stbi_set_flip_vertically_on_load
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
                         progressive JPEG
                         PGM/PPM support
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         STBI_NO_*, STBI_ONLY_*
                         GIF bugfix
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
                         optimize PNG
                         fix bug in interlaced PNG with user-specified channel count

   See end of file for full revision history.


 ============================    Contributors    =========================

 Image formats                                Bug fixes & warning fixes
    Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
    Nicolas Schulz (hdr, psd)                    Christpher Lloyd
    Jonathan Dummer (tga)                        Dave Moore
    Jean-Marc Lienher (gif)                      Won Chun
    Tom Seddon (pic)                             the Horde3D community
    Thatcher Ulrich (psd)                        Janez Zemva
    Ken Miller (pgm, ppm)                        Jonathan Blow
    urraka@github (animated gif)                 Laurent Gomila
                                                 Aruelien Pocheville
                                                 Ryamond Barbiero
                                                 David Woo
 Extensions, features                            Martin Golini
    Jetro Lauha (stbi_info)                      Roy Eltham
    Martin "SpartanJ" Golini (stbi_info)         Luke Graham
    James "moose2000" Brown (iPhone PNG)         Thomas Ruf
    Ben "Disch" Wenger (io callbacks)            John Bartholomew
    Omar Cornut (1/2/4-bit PNG)                  Ken Hamada
    Nicolas Guillemot (vertical flip)            Cort Stratton
    Richard Mitton (16-bit PSD)                  Blazej Dariusz Roszkowski
                                                 Thibault Reuille
                                                 Paul Du Bois
                                                 Guillaume George
                                                 Jerry Jansson
                                                 Hayaki Saito
                                                 Johan Duparc
                                                 Ronny Chevalier
 Optimizations & bugfixes                        Michal Cichon
    Fabian "ryg" Giesen                          Tero Hanninen
    Arseny Kapoulkine                            Sergio Gonzalez
                                                 Cass Everitt
                                                 Engin Manap
  If your name should be here but                Martins Mozeiko
  isn't, let Sean know.                          Joseph Thomson
                                                 Phil Jordan
                                                 Nathan Reed
                                                 Michaelangel007@github
                                                 Nick Verigakis

LICENSE

This software is in the public domain. Where that dedication is not
recognized, you are granted a perpetual, irrevocable license to copy,
distribute, and modify this file as you see fit.

*/

#ifndef STBI_INCLUDE_STB_IMAGE_H
#define STBI_INCLUDE_STB_IMAGE_H

// DOCUMENTATION
//
// Limitations:
//    - no 16-bit-per-channel PNG
//    - no 12-bit-per-channel JPEG
//    - no JPEGs with arithmetic coding
//    - no 1-bit BMP
//    - GIF always returns *comp=4
//
// Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
//
// Standard parameters:
//    int *x       -- outputs image width in pixels
//    int *y       -- outputs image height in pixels
//    int *comp    -- outputs # of image components in image file
//    int req_comp -- if non-zero, # of image components requested in result
//
// The return value from an image loader is an 'unsigned char *' which points
// to the pixel data, or NULL on an allocation failure or if the image is
// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
// with each pixel consisting of N interleaved 8-bit components; the first
// pixel pointed to is top-left-most in the image. There is no padding between
// image scanlines or between pixels, regardless of format. The number of
// components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
// If req_comp is non-zero, *comp has the number of components that _would_
// have been output otherwise. E.g. if you set req_comp to 4, you will always
// get RGBA output, but you can check *comp to see if it's trivially opaque
// because e.g. there were only 3 channels in the source image.
//
// An output image with N components has the following components interleaved
// in this order in each pixel:
//
//     N=#comp     components
//       1           grey
//       2           grey, alpha
//       3           red, green, blue
//       4           red, green, blue, alpha
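//
// For example, with the buffer from the stbi_load call above, you can locate
// the components of the pixel at column i, row j (both 0-based) as shown
// below. 'i', 'j', 'N' and 'px' are illustrative names, and req_comp stands
// for whatever you passed as the last argument:
//
//    int N = req_comp ? req_comp : n;                    // components per pixel
//    unsigned char *px = data + ((size_t)j * x + i) * N; // rows are x*N bytes, no padding
//    // px[0] .. px[N-1] hold the components in the order listed above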
//
// If image loading fails for any reason, the return value will be NULL,
// and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
// can be queried for an extremely brief, end-user unfriendly explanation
// of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
// compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
// more user-friendly ones.
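//
// Here is a sketch of the usual check. 'filename' is assumed to be a path
// you supply, and fprintf needs <stdio.h>:
//
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 4);
//    if (data == NULL) {
//       fprintf(stderr, "could not load %s: %s\n", filename, stbi_failure_reason());
//    } else {
//       // ... use the x*y*4 bytes of RGBA data ...
//       stbi_image_free(data);
//    }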
//
// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
//
// ===========================================================================
//
// Philosophy
//
// stb libraries are designed with the following priorities:
//
//    1. easy to use
//    2. easy to maintain
//    3. good performance
//
// Sometimes I let "good performance" creep up in priority over "easy to maintain",
// and for best performance I may provide less-easy-to-use APIs that give higher
// performance, in addition to the easy to use ones. Nevertheless, it's important
// to keep in mind that from the standpoint of you, a client of this library,
// all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
//
// Some secondary priorities arise directly from the first two, some of which
// show more explicitly why performance can't be emphasized.
//
//    - Portable ("ease of use")
//    - Small footprint ("easy to maintain")
//    - No dependencies ("ease of use")
//
// ===========================================================================
//
// I/O callbacks
//
// I/O callbacks allow you to read from arbitrary sources, like packaged
// files or some other source. Data read from callbacks are processed
// through a small internal buffer (currently 128 bytes) to try to reduce
// overhead.
//
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), and "eof" (reports if the stream is at the end).
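//
// Here is a minimal sketch of callbacks that read from a block of memory. The
// mem_stream type and the mem_read/mem_skip/mem_eof functions are
// hypothetical names, not part of stb_image. For plain memory,
// stbi_load_from_memory already does this for you:
//
//    typedef struct { const unsigned char *p; int len, pos; } mem_stream;
//
//    static int mem_read(void *user, char *data, int size)
//    {
//       mem_stream *s = (mem_stream *) user;
//       int n = s->len - s->pos;
//       if (n > size) n = size;
//       memcpy(data, s->p + s->pos, n);      // needs <string.h>
//       s->pos += n;
//       return n;                            // bytes actually read
//    }
//
//    static void mem_skip(void *user, int n)
//    {
//       mem_stream *s = (mem_stream *) user;
//       s->pos += n;                         // negative n 'ungets' bytes
//       if (s->pos < 0)      s->pos = 0;
//       if (s->pos > s->len) s->pos = s->len;
//    }
//
//    static int mem_eof(void *user)
//    {
//       mem_stream *s = (mem_stream *) user;
//       return s->pos >= s->len;             // nonzero at end of data
//    }
//
//    // usage; 'bytes' and 'nbytes' describe your encoded image
//    stbi_io_callbacks cb = { mem_read, mem_skip, mem_eof };
//    mem_stream s = { bytes, nbytes, 0 };
//    int x,y,n;
//    stbi_uc *img = stbi_load_from_callbacks(&cb, &s, &x, &y, &n, 0);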
//
// ===========================================================================
//
// SIMD support
//
// The JPEG decoder will try to automatically use SIMD kernels on x86 when
// supported by the compiler. For ARM NEON support, you must explicitly
// request it.
//
// (The old do-it-yourself SIMD API is no longer supported in the current
// code.)
//
// On x86, SSE2 will automatically be used when available based on a run-time
// test; if not, the generic C versions are used as a fall-back. On ARM targets,
// the typical path is to have separate builds for NEON and non-NEON devices
// (at least this is true for iOS and Android). Therefore, the NEON support is
// toggled by a build flag: define STBI_NEON to get NEON loops.
//
// The output of the JPEG decoder is slightly different from that of the versions
// before SIMD support was introduced (that is, versions before 1.49). The
// difference is only +-1 in the 8-bit RGB channels, and only on a small
// fraction of pixels. You can force the pre-1.49 behavior by defining
// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
// and hence cost some performance.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
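//
// For example, you would pick one of these two setups. This is a sketch, and
// the exact compiler flags depend on your toolchain:
//
//    // ARM build with NEON kernels (e.g. 32-bit ARM GCC also needs -mfpu=neon)
//    #define STBI_NEON
//    #define STB_IMAGE_IMPLEMENTATION
//    #include "stb_image.h"
//
//    // any build, with all SIMD code compiled out
//    #define STBI_NO_SIMD
//    #define STB_IMAGE_IMPLEMENTATION
//    #include "stb_image.h"
//
// Note that with GCC on x86, the SSE2 path is only compiled when __SSE2__ is
// defined (e.g. via -msse2). Otherwise STBI_NO_SIMD is defined automatically.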
//
// ===========================================================================
//
// HDR image support   (disable by defining STBI_NO_HDR)
//
// stb_image now supports loading HDR images in general, and currently
// the Radiance .HDR file format, although the support is provided
// generically. You can still load any file through the existing interface;
// if you attempt to load an HDR file, it will be automatically remapped to
// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
//
// Additionally, there is a new, parallel interface for loading files as
// (linear) floats to preserve the full dynamic range:
//
//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
//
// If you load LDR images through this interface, those images will
// be promoted to floating point values, run through the inverse of
// constants corresponding to the above:
//
//     stbi_ldr_to_hdr_scale(1.0f);
//     stbi_ldr_to_hdr_gamma(2.2f);
//
// Finally, given a filename (or an open file or memory block--see header
// file for details) containing image data, you can query for the "most
// appropriate" interface to use (that is, whether the image is HDR or
// not), using:
//
//     stbi_is_hdr(char const *filename);
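//
// For example, here is a sketch that picks the loader per file. 'filename'
// is assumed, and stbi_loadf requires that STBI_NO_LINEAR is not defined:
//
//    int x,y,n;
//    if (stbi_is_hdr(filename)) {
//       float *hdr = stbi_loadf(filename, &x, &y, &n, 0);  // linear floats
//       // ... use hdr ...
//       stbi_image_free(hdr);
//    } else {
//       stbi_uc *ldr = stbi_load(filename, &x, &y, &n, 0); // 8-bit components
//       // ... use ldr ...
//       stbi_image_free(ldr);
//    }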
//
// ===========================================================================
//
// iPhone PNG support:
//
// By default we convert iPhone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always just get the native iPhone "format" through (which
// is BGR stored in RGB).
//
// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
// pixel to remove any premultiplied alpha *only* if the image file explicitly
// says there's premultiplied data (currently only happens in iPhone images,
// and only if iPhone convert-to-rgb processing is on).
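//
// For example, here is a sketch in which 'filename' is assumed. Both settings
// are global, so they apply to every subsequent load:
//
//    stbi_convert_iphone_png_to_rgb(1);   // convert iPhone PNGs back to RGB
//    stbi_set_unpremultiply_on_load(1);   // divide out premultiplied alpha
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 4);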
//


#ifndef STBI_NO_STDIO
#include <stdio.h>
#endif // STBI_NO_STDIO

#define STBI_VERSION 1

enum
{
   STBI_default = 0, // only used for req_comp

   STBI_grey       = 1,
   STBI_grey_alpha = 2,
   STBI_rgb        = 3,
   STBI_rgb_alpha  = 4
};

typedef unsigned char stbi_uc;

#ifdef __cplusplus
extern "C" {
#endif

#ifdef STB_IMAGE_STATIC
#define STBIDEF static
#else
#define STBIDEF extern
#endif

//////////////////////////////////////////////////////////////////////////////
//
// PRIMARY API - works on images of any type
//

//
// load image by filename, open file, or memory buffer
//

typedef struct
{
   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
} stbi_io_callbacks;

STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);

#ifndef STBI_NO_STDIO
STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
// for stbi_load_from_file, file pointer is left pointing immediately after image
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);

   #ifndef STBI_NO_STDIO
   STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
   #endif
#endif

#ifndef STBI_NO_HDR
   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
#endif

#ifndef STBI_NO_LINEAR
   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR

// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename);
STBIDEF int      stbi_is_hdr_from_file(FILE *f);
#endif // STBI_NO_STDIO


// get a VERY brief reason for failure
// NOT THREADSAFE
STBIDEF const char *stbi_failure_reason  (void);

// free the loaded image -- this is just STBI_FREE, which defaults to free()
STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);

// get image dimensions & components without fully decoding
STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);

#endif
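
// For example, here is a sketch that reads an image's dimensions without
// decoding the pixels. 'filename' is assumed:
//
//    int w,h,c;
//    if (stbi_info(filename, &w, &h, &c)) {
//       // w = width, h = height, c = components in the file
//    } else {
//       // not a recognized image, or the file could not be opened
//    }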



// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);

// indicate whether we should process iphone images back to canonical format,
// or just pass them through "as-is"
STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);

// flip the image vertically, so the first pixel in the output array is the bottom left
STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
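
// This is useful when the consumer expects the bottom row first, as OpenGL's
// glTexImage2D does (its first row of texel data is the bottom of the
// texture). A sketch, with 'filename' assumed; the setting is global:
//
//    int x,y,n;
//    stbi_set_flip_vertically_on_load(1);
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 4);  // bottom row first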

// ZLIB client - used by PNG, available for other purposes

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);

STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
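
// For example, here is a sketch in which 'comp' and 'comp_len' describe your
// zlib-wrapped input:
//
//    int out_len;
//    char *out = stbi_zlib_decode_malloc(comp, comp_len, &out_len);
//    if (out) {
//       // ... out_len bytes of decompressed data ...
//       stbi_image_free(out);   // stbi_image_free is just STBI_FREE
//    }
//
// The _noheader_ variants decode a raw deflate stream with no zlib header.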
    520 
    521 
    522 #ifdef __cplusplus
    523 }
    524 #endif
    525 
    526 //
    527 //
    528 ////   end header file   /////////////////////////////////////////////////////
    529 #endif // STBI_INCLUDE_STB_IMAGE_H
    530 
    531 #ifdef STB_IMAGE_IMPLEMENTATION
    532 
    533 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
    534   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
    535   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
    536   || defined(STBI_ONLY_ZLIB)
    537    #ifndef STBI_ONLY_JPEG
    538    #define STBI_NO_JPEG
    539    #endif
    540    #ifndef STBI_ONLY_PNG
    541    #define STBI_NO_PNG
    542    #endif
    543    #ifndef STBI_ONLY_BMP
    544    #define STBI_NO_BMP
    545    #endif
    546    #ifndef STBI_ONLY_PSD
    547    #define STBI_NO_PSD
    548    #endif
    549    #ifndef STBI_ONLY_TGA
    550    #define STBI_NO_TGA
    551    #endif
    552    #ifndef STBI_ONLY_GIF
    553    #define STBI_NO_GIF
    554    #endif
    555    #ifndef STBI_ONLY_HDR
    556    #define STBI_NO_HDR
    557    #endif
    558    #ifndef STBI_ONLY_PIC
    559    #define STBI_NO_PIC
    560    #endif
    561    #ifndef STBI_ONLY_PNM
    562    #define STBI_NO_PNM
    563    #endif
    564 #endif
    565 
    566 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
    567 #define STBI_NO_ZLIB
    568 #endif
    569 
    570 
    571 #include <stdarg.h>
    572 #include <stddef.h> // ptrdiff_t on osx
    573 #include <stdlib.h>
    574 #include <string.h>
    575 
    576 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    577 #include <math.h>  // ldexp
    578 #endif
    579 
    580 #ifndef STBI_NO_STDIO
    581 #include <stdio.h>
    582 #endif
    583 
    584 #ifndef STBI_ASSERT
    585 #include <assert.h>
    586 #define STBI_ASSERT(x) assert(x)
    587 #endif
    588 
    589 
    590 #ifndef _MSC_VER
    591    #ifdef __cplusplus
    592    #define stbi_inline inline
    593    #else
    594    #define stbi_inline
    595    #endif
    596 #else
    597    #define stbi_inline __forceinline
    598 #endif
    599 
    600 
    601 #ifdef _MSC_VER
    602 typedef unsigned short stbi__uint16;
    603 typedef   signed short stbi__int16;
    604 typedef unsigned int   stbi__uint32;
    605 typedef   signed int   stbi__int32;
    606 #else
    607 #include <stdint.h>
    608 typedef uint16_t stbi__uint16;
    609 typedef int16_t  stbi__int16;
    610 typedef uint32_t stbi__uint32;
    611 typedef int32_t  stbi__int32;
    612 #endif
    613 
    614 // should produce compiler error if size is wrong
    615 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
    616 
    617 #ifdef _MSC_VER
    618 #define STBI_NOTUSED(v)  (void)(v)
    619 #else
    620 #define STBI_NOTUSED(v)  (void)sizeof(v)
    621 #endif
    622 
    623 #ifdef _MSC_VER
    624 #define STBI_HAS_LROTL
    625 #endif
    626 
    627 #ifdef STBI_HAS_LROTL
    628    #define stbi_lrot(x,y)  _lrotl(x,y)
    629 #else
    630    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
    631 #endif
    632 
    633 #if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
    634 // ok
    635 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
    636 // ok
    637 #else
    638 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
    639 #endif
    640 
    641 #ifndef STBI_MALLOC
    642 #define STBI_MALLOC(sz)    malloc(sz)
    643 #define STBI_REALLOC(p,sz) realloc(p,sz)
    644 #define STBI_FREE(p)       free(p)
    645 #endif

// x86/x64 detection
#if defined(__x86_64__) || defined(_M_X64)
#define STBI__X64_TARGET
#elif defined(__i386) || defined(_M_IX86)
#define STBI__X86_TARGET
#endif

#if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// NOTE: it's not clear whether we actually need this for the 64-bit path.
// gcc doesn't support SSE2 intrinsics unless you compile with -msse2,
// but compiling with -msse2 allows the compiler to use SSE2 everywhere,
// not just in the code guarded by a runtime CPU check. This is just
// broken, and gcc has not fixed it properly; see
// http://www.virtualdub.org/blog/pivot/entry.php?id=363
#define STBI_NO_SIMD
#endif

#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
//
// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
// Windows ABI, and neither VC++ nor Windows DLLs maintain that invariant.
// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
// simultaneously enabling "-mstackrealign".
//
// See https://github.com/nothings/stb/issues/81 for more information.
//
// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
#define STBI_NO_SIMD
#endif

#if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
#define STBI_SSE2
#include <emmintrin.h>

#ifdef _MSC_VER

#if _MSC_VER >= 1400  // not VC6
#include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);
   return info[3];
}
#else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
#endif

#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name

static int stbi__sse2_available(void)
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;  // CPUID leaf 1, EDX bit 26 = SSE2
}
#else // assume GCC-style if not VC++
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))

static int stbi__sse2_available(void)
{
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
   // GCC 4.8+ has a nice way to do this
   return __builtin_cpu_supports("sse2");
#else
   // TODO: is there a portable way to do this, preferably without GCC
   // inline asm? Just bail for now.
   return 0;
#endif
}
#endif
#endif

// ARM NEON
#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
#undef STBI_NEON
#endif

#ifdef STBI_NEON
#include <arm_neon.h>
// assume GCC or Clang on ARM targets
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
#endif

#ifndef STBI_SIMD_ALIGN
#define STBI_SIMD_ALIGN(type, name) type name
#endif
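
// Illustration (coeffs is an arbitrary name): STBI_SIMD_ALIGN declares a
// variable with 16-byte alignment where a SIMD path is enabled, so SSE2/NEON
// kernels can rely on aligned loads. For example,
//
//    STBI_SIMD_ALIGN(short, coeffs[64]);
//
// expands to "__declspec(align(16)) short coeffs[64]" under VC++ with SSE2,
// to "short coeffs[64] __attribute__((aligned(16)))" under GCC-style
// compilers with SSE2 or NEON, and to a plain "short coeffs[64]" otherwise.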

///////////////////////////////////////////////
//
//  stbi__context struct and start_xxx functions

// stbi__context structure is our basic context used by all images, so it
// contains all the IO context, plus some basic image information
typedef struct
{
   stbi__uint32 img_x, img_y;
   int img_n, img_out_n;

   stbi_io_callbacks io;
   void *io_user_data;

   int read_from_callbacks;
   int buflen;
   stbi_uc buffer_start[128];

   stbi_uc *img_buffer, *img_buffer_end;
   stbi_uc *img_buffer_original, *img_buffer_original_end;
} stbi__context;


static void stbi__refill_buffer(stbi__context *s);

// initialize a memory-decode context
static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
{
   s->io.read = NULL;
   s->read_from_callbacks = 0;
   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
}

// initialize a callback-based context
static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
{
   s->io = *c;
   s->io_user_data = user;
   s->buflen = sizeof(s->buffer_start);
   s->read_from_callbacks = 1;
   s->img_buffer_original = s->buffer_start;
   stbi__refill_buffer(s);
   s->img_buffer_original_end = s->img_buffer_end;
}

#ifndef STBI_NO_STDIO

static int stbi__stdio_read(void *user, char *data, int size)
{
   return (int) fread(data,1,size,(FILE*) user);
}

static void stbi__stdio_skip(void *user, int n)
{
   fseek((FILE*) user, n, SEEK_CUR);
}

static int stbi__stdio_eof(void *user)
{
   return feof((FILE*) user);
}

static stbi_io_callbacks stbi__stdio_callbacks =
{
   stbi__stdio_read,
   stbi__stdio_skip,
   stbi__stdio_eof,
};

static void stbi__start_file(stbi__context *s, FILE *f)
{
   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
}

//static void stop_file(stbi__context *s) { }

#endif // !STBI_NO_STDIO
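
// Usage sketch (hypothetical caller code): stbi_load_from_callbacks() accepts
// any source that implements the three stbi_io_callbacks members used by the
// stdio version above: read() returns the number of bytes copied, skip()
// moves by n bytes (negative n steps back), and eof() returns nonzero at end
// of input. mem_reader, mem_read, mem_skip, mem_eof and load_via_callbacks
// are illustrative names, not part of stb_image:
//
//    #include <string.h>   // memcpy
//    #include "stb_image.h"
//
//    typedef struct { const unsigned char *data; int size, pos; } mem_reader;
//
//    static int mem_read(void *user, char *out, int size)
//    {
//       mem_reader *r = (mem_reader *) user;
//       int left = r->size - r->pos;
//       if (size > left) size = left;
//       memcpy(out, r->data + r->pos, size);
//       r->pos += size;
//       return size;
//    }
//
//    static void mem_skip(void *user, int n)
//    {
//       mem_reader *r = (mem_reader *) user;
//       r->pos += n;                            // clamp to [0, size]
//       if (r->pos > r->size) r->pos = r->size;
//       if (r->pos < 0) r->pos = 0;
//    }
//
//    static int mem_eof(void *user)
//    {
//       mem_reader *r = (mem_reader *) user;
//       return r->pos >= r->size;
//    }
//
//    static unsigned char *load_via_callbacks(const unsigned char *bytes, int len,
//                                             int *w, int *h, int *n)
//    {
//       stbi_io_callbacks cb = { mem_read, mem_skip, mem_eof };
//       mem_reader reader;
//       reader.data = bytes; reader.size = len; reader.pos = 0;
//       return stbi_load_from_callbacks(&cb, &reader, w, h, n, 0);
//    }
//
// For data already in memory, stbi_load_from_memory() is simpler; callbacks
// are mainly useful for archives, network streams and similar sources.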

static void stbi__rewind(stbi__context *s)
{
   // conceptually rewind SHOULD rewind to the beginning of the stream,
   // but we just rewind to the beginning of the initial buffer, because
   // we only use it after doing 'test', which only ever looks at at most 92 bytes
   s->img_buffer = s->img_buffer_original;
   s->img_buffer_end = s->img_buffer_original_end;
}

#ifndef STBI_NO_JPEG
static int      stbi__jpeg_test(stbi__context *s);
static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNG
static int      stbi__png_test(stbi__context *s);
static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_BMP
static int      stbi__bmp_test(stbi__context *s);
static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_TGA
static int      stbi__tga_test(stbi__context *s);
static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PSD
static int      stbi__psd_test(stbi__context *s);
static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_HDR
static int      stbi__hdr_test(stbi__context *s);
static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PIC
static int      stbi__pic_test(stbi__context *s);
static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_GIF
static int      stbi__gif_test(stbi__context *s);
static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNM
static int      stbi__pnm_test(stbi__context *s);
static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
#endif

// the failure reason is a single global, so this is not thread-safe
static const char *stbi__g_failure_reason;

STBIDEF const char *stbi_failure_reason(void)
{
   return stbi__g_failure_reason;
}

static int stbi__err(const char *str)
{
   stbi__g_failure_reason = str;
   return 0;
}

static void *stbi__malloc(size_t size)
{
   return STBI_MALLOC(size);
}

// stbi__err - error
// stbi__errpf - error returning pointer to float
// stbi__errpuc - error returning pointer to unsigned char

#ifdef STBI_NO_FAILURE_STRINGS
   #define stbi__err(x,y)  0
#elif defined(STBI_FAILURE_USERMSG)
   #define stbi__err(x,y)  stbi__err(y)
#else
   #define stbi__err(x,y)  stbi__err(x)
#endif

#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))

STBIDEF void stbi_image_free(void *retval_from_stbi_load)
{
   STBI_FREE(retval_from_stbi_load);
}

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
#endif

#ifndef STBI_NO_HDR
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
#endif

static int stbi__vertically_flip_on_load = 0;

STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
{
   stbi__vertically_flip_on_load = flag_true_if_should_flip;
}

static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PNG
   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_BMP
   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_GIF
   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PSD
   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PIC
   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PNM
   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
      // on failure, don't convert a NULL buffer (and don't read *x/*y,
      // which may never have been written)
      if (hdr == NULL) return NULL;
      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   }
   #endif

   #ifndef STBI_NO_TGA
   // test TGA last because its test is weak (TGA files have no magic number)
   if (stbi__tga_test(s))
      return stbi__tga_load(s,x,y,comp,req_comp);
   #endif

   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
}

static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);

   if (stbi__vertically_flip_on_load && result != NULL) {
      int w = *x, h = *y;
      int depth = req_comp ? req_comp : *comp;
      int row,col,z;
      stbi_uc temp;

      // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
      for (row = 0; row < (h>>1); row++) {
         for (col = 0; col < w; col++) {
            for (z = 0; z < depth; z++) {
               temp = result[(row * w + col) * depth + z];
               result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
               result[((h - row - 1) * w + col) * depth + z] = temp;
            }
         }
      }
   }

   return result;
}
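
// Sketch of the @OPTIMIZE note above (not what stb_image currently does):
// swapping whole rows through a temporary row buffer with memcpy gives the
// same result as the per-byte loop. flip_rows_memcpy is a hypothetical
// helper; it assumes 8-bit samples packed as w*depth bytes per row and needs
// <stdlib.h> and <string.h>:
//
//    static int flip_rows_memcpy(unsigned char *px, int w, int h, int depth)
//    {
//       size_t stride = (size_t) w * depth;
//       unsigned char *tmp = (unsigned char *) malloc(stride);
//       int row;
//       if (tmp == NULL) return 0;            // caller decides how to report OOM
//       for (row = 0; row < h / 2; ++row) {
//          unsigned char *top = px + (size_t) row * stride;
//          unsigned char *bot = px + (size_t) (h - row - 1) * stride;
//          memcpy(tmp, top, stride);
//          memcpy(top, bot, stride);
//          memcpy(bot, tmp, stride);
//       }
//       free(tmp);
//       return 1;
//    }
//
// The trade-off is one extra allocation per image (or a fixed-size stack
// buffer copied in chunks) in exchange for far fewer loop iterations.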

#ifndef STBI_NO_HDR
static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
{
   if (stbi__vertically_flip_on_load && result != NULL) {
      int w = *x, h = *y;
      int depth = req_comp ? req_comp : *comp;
      int row,col,z;
      float temp;

      // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
      for (row = 0; row < (h>>1); row++) {
         for (col = 0; col < w; col++) {
            for (z = 0; z < depth; z++) {
               temp = result[(row * w + col) * depth + z];
               result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
               result[((h - row - 1) * w + col) * depth + z] = temp;
            }
         }
      }
   }
}
#endif

#ifndef STBI_NO_STDIO

static FILE *stbi__fopen(char const *filename, char const *mode)
{
   FILE *f;
#if defined(_MSC_VER) && _MSC_VER >= 1400
   if (0 != fopen_s(&f, filename, mode))
      f=0;
#else
   f = fopen(filename, mode);
#endif
   return f;
}


STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   FILE *f = stbi__fopen(filename, "rb");
   unsigned char *result;
   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   result = stbi_load_from_file(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}

STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *result;
   stbi__context s;
   stbi__start_file(&s,f);
   result = stbi__load_flip(&s,x,y,comp,req_comp);
   if (result) {
      // need to 'unget' all the characters in the IO buffer
      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   }
   return result;
}
#endif //!STBI_NO_STDIO

STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__load_flip(&s,x,y,comp,req_comp);
}

STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__load_flip(&s,x,y,comp,req_comp);
}
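
// Usage sketch (hypothetical file name "sprite.png"): each stbi_load*()
// entry point above returns a buffer allocated with STBI_MALLOC holding
// w*h*channels bytes, or NULL on failure. Passing req_comp = 4 forces RGBA
// output, while *comp still receives the channel count stored in the file.
// stbi_failure_reason() can be NULL (for example when STBI_NO_FAILURE_STRINGS
// is defined), so the sketch guards against that before printing:
//
//    #include <stdio.h>
//    #include "stb_image.h"
//
//    int main(void)
//    {
//       int w, h, n;
//       unsigned char *rgba = stbi_load("sprite.png", &w, &h, &n, 4);
//       if (rgba == NULL) {
//          const char *why = stbi_failure_reason();
//          fprintf(stderr, "load failed: %s\n", why ? why : "unknown");
//          return 1;
//       }
//       // pixel (px,py) starts at rgba[(py*w + px)*4]; channels are R,G,B,A
//       printf("%dx%d, %d channel(s) in file\n", w, h, n);
//       stbi_image_free(rgba);
//       return 0;
//    }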

#ifndef STBI_NO_LINEAR
static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *data;
   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
      if (hdr_data)
         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
      return hdr_data;
   }
   #endif
   data = stbi__load_flip(s, x, y, comp, req_comp);
   if (data)
      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
}

STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__loadf_main(&s,x,y,comp,req_comp);
}

STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__loadf_main(&s,x,y,comp,req_comp);
}

#ifndef STBI_NO_STDIO
STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   float *result;
   FILE *f = stbi__fopen(filename, "rb");
   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}

STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_file(&s,f);
   return stbi__loadf_main(&s,x,y,comp,req_comp);
}
#endif // !STBI_NO_STDIO

#endif // !STBI_NO_LINEAR

// The is-hdr-or-not functions below are defined regardless of whether
// STBI_NO_LINEAR is defined, for API simplicity; if STBI_NO_HDR is defined,
// they always report false!

STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(buffer);
   STBI_NOTUSED(len);
   return 0;
   #endif
}

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename)
{
   FILE *f = stbi__fopen(filename, "rb");
   int result=0;
   if (f) {
      result = stbi_is_hdr_from_file(f);
      fclose(f);
   }
   return result;
}

STBIDEF int      stbi_is_hdr_from_file(FILE *f)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_file(&s,f);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(f);
   return 0;
   #endif
}
#endif // !STBI_NO_STDIO

STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(clbk);
   STBI_NOTUSED(user);
   return 0;
   #endif
}
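
// Usage sketch (hypothetical path "sky.hdr"): a caller that wants to keep
// full dynamic range can check the format first and choose the float or
// 8-bit loader. stbi_loadf() also accepts LDR files, converting them with the
// stbi_ldr_to_hdr_gamma/scale settings below, while stbi_load() tone-maps HDR
// files with the stbi_hdr_to_ldr_gamma/scale settings. stbi_loadf() exists
// only when STBI_NO_LINEAR is not defined.
//
//    const char *path = "sky.hdr";
//    int w, h, n;
//    if (stbi_is_hdr(path)) {
//       float *texels = stbi_loadf(path, &w, &h, &n, 0);
//       if (texels) stbi_image_free(texels);   // after uploading as float data
//    } else {
//       unsigned char *texels = stbi_load(path, &w, &h, &n, 0);
//       if (texels) stbi_image_free(texels);   // after uploading as 8-bit data
//    }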

static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;

#ifndef STBI_NO_LINEAR
STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
#endif

STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }


//////////////////////////////////////////////////////////////////////////////
//
// Common code used by all image loaders
//

enum
{
   STBI__SCAN_load=0,
   STBI__SCAN_type,
   STBI__SCAN_header
};

static void stbi__refill_buffer(stbi__context *s)
{
   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   if (n == 0) {
      // at end of file, treat same as if from memory, but need to handle case
      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
      s->read_from_callbacks = 0;
      s->img_buffer = s->buffer_start;
      s->img_buffer_end = s->buffer_start+1;
      *s->img_buffer = 0;
   } else {
      s->img_buffer = s->buffer_start;
      s->img_buffer_end = s->buffer_start + n;
   }
}

stbi_inline static stbi_uc stbi__get8(stbi__context *s)
{
   if (s->img_buffer < s->img_buffer_end)
      return *s->img_buffer++;
   if (s->read_from_callbacks) {
      stbi__refill_buffer(s);
      return *s->img_buffer++;
   }
   return 0;
}

stbi_inline static int stbi__at_eof(stbi__context *s)
{
   if (s->io.read) {
      if (!(s->io.eof)(s->io_user_data)) return 0;
      // if feof() is true, check if buffer = end
      // special case: we've only got the special 0 character at the end
      if (s->read_from_callbacks == 0) return 1;
   }

   return s->img_buffer >= s->img_buffer_end;
}

static void stbi__skip(stbi__context *s, int n)
{
   if (n < 0) {
      s->img_buffer = s->img_buffer_end;
      return;
   }
   if (s->io.read) {
      int blen = (int) (s->img_buffer_end - s->img_buffer);
      if (blen < n) {
         s->img_buffer = s->img_buffer_end;
         (s->io.skip)(s->io_user_data, n - blen);
         return;
      }
   }
   s->img_buffer += n;
}

static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
{
   if (s->io.read) {
      int blen = (int) (s->img_buffer_end - s->img_buffer);
      if (blen < n) {
         int res, count;

         memcpy(buffer, s->img_buffer, blen);

         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
         res = (count == (n-blen));
         s->img_buffer = s->img_buffer_end;
         return res;
      }
   }

   if (s->img_buffer+n <= s->img_buffer_end) {
      memcpy(buffer, s->img_buffer, n);
      s->img_buffer += n;
      return 1;
   } else
      return 0;
}

static int stbi__get16be(stbi__context *s)
{
   int z = stbi__get8(s);
   return (z << 8) + stbi__get8(s);
}

static stbi__uint32 stbi__get32be(stbi__context *s)
{
   stbi__uint32 z = stbi__get16be(s);
   return (z << 16) + stbi__get16be(s);
}

#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
// nothing
#else
static int stbi__get16le(stbi__context *s)
{
   int z = stbi__get8(s);
   return z + (stbi__get8(s) << 8);
}
#endif

#ifndef STBI_NO_BMP
static stbi__uint32 stbi__get32le(stbi__context *s)
{
   stbi__uint32 z = stbi__get16le(s);
   // cast before shifting: a 16-bit int shifted left by 16 can overflow int
   return z + ((stbi__uint32) stbi__get16le(s) << 16);
}
#endif
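
// Worked example: given the four input bytes 0x12 0x34 0x56 0x78 (and a
// fresh stream for each call), stbi__get16be() returns 0x1234 and
// stbi__get32be() returns 0x12345678 (big-endian, as in PNG and JPEG),
// whereas stbi__get16le() returns 0x3412 and stbi__get32le() returns
// 0x78563412 (little-endian, as in BMP, TGA and GIF). The same logic on a
// plain byte array, with read_be32/read_le32 as hypothetical names and
// unsigned int assumed to be 32 bits:
//
//    static unsigned int read_be32(const unsigned char *p)
//    {
//       return ((unsigned int) p[0] << 24) | ((unsigned int) p[1] << 16) |
//              ((unsigned int) p[2] << 8)  |  (unsigned int) p[3];
//    }
//
//    static unsigned int read_le32(const unsigned char *p)
//    {
//       return  (unsigned int) p[0]        | ((unsigned int) p[1] << 8) |
//              ((unsigned int) p[2] << 16) | ((unsigned int) p[3] << 24);
//    }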

#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings


//////////////////////////////////////////////////////////////////////////////
//
//  generic converter from built-in img_n to req_comp
//    individual types do this automatically as much as possible (e.g. jpeg
//    does all cases internally since it needs to colorspace convert anyway,
//    and it never has alpha, so there are very few cases). png can automatically
//    interleave an alpha=255 channel, but falls back to this for other cases
//
//  assume data buffer is malloced, so malloc a new one and free that one;
//  the only failure mode is malloc failing

static stbi_uc stbi__compute_y(int r, int g, int b)
{
   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
}
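
// Worked example: the weights 77, 150 and 29 sum to 256 and approximate the
// familiar ITU-R BT.601 luma coefficients 0.299, 0.587 and 0.114 in 8.8 fixed
// point, so the final >> 8 divides by 256. White stays white and pure red
// maps to 76:
//
//    stbi__compute_y(255, 255, 255) == (255*256) >> 8 == 255
//    stbi__compute_y(255,   0,   0) == (255*77)  >> 8 == 19635 >> 8 == 76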

static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
{
   int i,j;
   unsigned char *good;

   if (req_comp == img_n) return data;
   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);

   good = (unsigned char *) stbi__malloc(req_comp * x * y);
   if (good == NULL) {
      STBI_FREE(data);
      return stbi__errpuc("outofmem", "Out of memory");
   }

   for (j=0; j < (int) y; ++j) {
      unsigned char *src  = data + j * x * img_n   ;
      unsigned char *dest = good + j * x * req_comp;

      #define COMBO(a,b)  ((a)*8+(b))
      #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
      // convert source image with img_n components to one with req_comp components;
      // avoid switch per pixel, so use switch per scanline and massive macros
      switch (COMBO(img_n, req_comp)) {
         CASE(1,2) dest[0]=src[0], dest[1]=255; break;
         CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
         CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
         CASE(2,1) dest[0]=src[0]; break;
         CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
         CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
         CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
         CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
         CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
         CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
         CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
         CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
         default: STBI_ASSERT(0);
      }
      #undef CASE
   }

   STBI_FREE(data);
   return good;
}

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
{
   int i,k,n;
   float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   // compute number of non-alpha components
   if (comp & 1) n = comp; else n = comp-1;
   for (i=0; i < x*y; ++i) {
      for (k=0; k < n; ++k) {
         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
      }
      if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
   }
   STBI_FREE(data);
   return output;
}
#endif

#ifndef STBI_NO_HDR
#define stbi__float2int(x)   ((int) (x))
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
{
   int i,k,n;
   stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   // compute number of non-alpha components
   if (comp & 1) n = comp; else n = comp-1;
   for (i=0; i < x*y; ++i) {
      for (k=0; k < n; ++k) {
         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
         if (z < 0) z = 0;
         if (z > 255) z = 255;
         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
      }
      if (k < comp) {
         float z = data[i*comp+k] * 255 + 0.5f;
         if (z < 0) z = 0;
         if (z > 255) z = 255;
         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
      }
   }
   STBI_FREE(data);
   return output;
}
#endif
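
// Worked example of stbi__hdr_to_ldr() with the default settings (scale 1,
// gamma 2.2): a linear color channel of 0.5 becomes
//
//    z = pow(0.5 * 1, 1/2.2) * 255 + 0.5 ~= 0.7297 * 255 + 0.5 ~= 186.58
//
// which is truncated to 186. Alpha channels skip pow() and are scaled
// linearly, so an alpha of 0.5 becomes (int)(0.5*255 + 0.5) = 128.
// stbi__ldr_to_hdr() applies the inverse mapping pow(v/255, 2.2) * scale to
// color channels, so 186 maps back to roughly 0.4995.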

//////////////////////////////////////////////////////////////////////////////
//
//  "baseline" JPEG/JFIF decoder
//
//    simple implementation
//      - doesn't support delayed output of y-dimension
//      - simple interface (only one output format: 8-bit interleaved RGB)
//      - doesn't try to recover corrupt jpegs
//      - doesn't allow partial loading, loading multiple at once
//      - still fast on x86 (copying globals into locals doesn't help x86)
//      - allocates lots of intermediate memory (full size of all components)
//        - non-interleaved case requires this anyway
//        - allows good upsampling (see next)
//    high-quality
//      - upsampled channels are bilinearly interpolated, even across blocks
//      - quality integer IDCT derived from IJG's 'slow'
//    performance
//      - fast huffman; reasonable integer IDCT
//      - some SIMD kernels for common paths on targets with SSE2/NEON
//      - uses a lot of intermediate memory, could cache poorly

#ifndef STBI_NO_JPEG

// huffman decoding acceleration
#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache

typedef struct
{
   stbi_uc  fast[1 << FAST_BITS];
   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   stbi__uint16 code[256];
   stbi_uc  values[256];
   stbi_uc  size[257];
   unsigned int maxcode[18];
   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
} stbi__huffman;

typedef struct
{
   stbi__context *s;
   stbi__huffman huff_dc[4];
   stbi__huffman huff_ac[4];
   stbi_uc dequant[4][64];
   stbi__int16 fast_ac[4][1 << FAST_BITS];

// sizes for components, interleaved MCUs
   int img_h_max, img_v_max;
   int img_mcu_x, img_mcu_y;
   int img_mcu_w, img_mcu_h;

// definition of jpeg image component
   struct
   {
      int id;
      int h,v;
      int tq;
      int hd,ha;
      int dc_pred;

      int x,y,w2,h2;
      stbi_uc *data;
      void *raw_data, *raw_coeff;
      stbi_uc *linebuf;
      short   *coeff;   // progressive only
      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   } img_comp[4];

   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   int            code_bits;   // number of valid bits
   unsigned char  marker;      // marker seen while filling entropy buffer
   int            nomore;      // flag if we saw a marker so must stop

   int            progressive;
   int            spec_start;
   int            spec_end;
   int            succ_high;
   int            succ_low;
   int            eob_run;

   int scan_n, order[4];
   int restart_interval, todo;

// kernels
   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
} stbi__jpeg;
   1520 
   1521 static int stbi__build_huffman(stbi__huffman *h, int *count)
   1522 {
   1523    int i,j,k=0,code;
   1524    // build size list for each symbol (from JPEG spec)
   1525    for (i=0; i < 16; ++i)
   1526       for (j=0; j < count[i]; ++j)
   1527          h->size[k++] = (stbi_uc) (i+1);
   1528    h->size[k] = 0;
   1529 
   1530    // compute actual symbols (from jpeg spec)
   1531    code = 0;
   1532    k = 0;
   1533    for(j=1; j <= 16; ++j) {
   1534       // compute delta to add to code to compute symbol id
   1535       h->delta[j] = k - code;
   1536       if (h->size[k] == j) {
   1537          while (h->size[k] == j)
   1538             h->code[k++] = (stbi__uint16) (code++);
   1539          if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
   1540       }
   1541       // compute largest code + 1 for this size, preshifted as needed later
   1542       h->maxcode[j] = code << (16-j);
   1543       code <<= 1;
   1544    }
   1545    h->maxcode[j] = 0xffffffff;
   1546 
   1547    // build non-spec acceleration table; 255 is flag for not-accelerated
   1548    memset(h->fast, 255, 1 << FAST_BITS);
   1549    for (i=0; i < k; ++i) {
   1550       int s = h->size[i];
   1551       if (s <= FAST_BITS) {
   1552          int c = h->code[i] << (FAST_BITS-s);
   1553          int m = 1 << (FAST_BITS-s);
   1554          for (j=0; j < m; ++j) {
   1555             h->fast[c+j] = (stbi_uc) i;
   1556          }
   1557       }
   1558    }
   1559    return 1;
   1560 }
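
/* Editor's sketch, not part of stb_image: the code-assignment loop in
   stbi__build_huffman follows the canonical scheme of the JPEG spec
   (ITU-T T.81): codes of one length are consecutive integers, and moving
   to the next length appends a 0 bit (code <<= 1). The hypothetical helper
   sketch_canonical_codes isolates just that step, without the maxcode,
   delta or fast tables, assuming count[] means the same as the count
   argument above (count[i] = number of codes of length i+1).

```c
#include <stdint.h>

// Hypothetical helper: assign canonical JPEG Huffman codes.
// Fills size[] and code[] in symbol order and returns the number of
// symbols, or -1 if the counts overflow the code space.
int sketch_canonical_codes(const int count[16], uint8_t size[256], uint16_t code[256])
{
   int len, i, k = 0;
   unsigned int next = 0;                 // next unused code at this length
   for (len = 1; len <= 16; ++len) {
      for (i = 0; i < count[len-1]; ++i) {
         if (k >= 256) return -1;         // more symbols than a table holds
         size[k] = (uint8_t) len;
         code[k++] = (uint16_t) next++;
      }
      if (next > (1u << len)) return -1;  // codes no longer fit in len bits
      next <<= 1;                         // append a 0 bit for the next length
   }
   return k;
}
```

   With the luminance DC counts {0,1,5,1,1,1,1,1,1,0,...} from the spec's
   Annex K examples, this yields 00, then 010 through 110, then 1110, 11110
   and so on, ending with the 9-bit code 111111110.
*/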

// build a table that decodes the zero run, the magnitude bits and the
// value of small AC coefficients in a single lookup.
static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
{
   int i;
   for (i=0; i < (1 << FAST_BITS); ++i) {
      stbi_uc fast = h->fast[i];
      fast_ac[i] = 0;
      if (fast < 255) {
         int rs = h->values[fast];
         int run = (rs >> 4) & 15;
         int magbits = rs & 15;
         int len = h->size[fast];

         if (magbits && len + magbits <= FAST_BITS) {
            // magnitude code followed by receive_extend code
            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
            int m = 1 << (magbits - 1);
            if (k < m) k -= (1 << magbits) - 1; // same as k += (-1 << magbits) + 1, without shifting a negative value
            // if the result is small enough, we can fit it in fast_ac table
            if (k >= -128 && k <= 127)
               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits)); // multiply: k may be negative
         }
      }
   }
}
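
/* Editor's sketch, not part of stb_image: each nonzero fast_ac entry built
   above packs three fields into 16 bits: the signed coefficient value
   (-128..127) in the high byte, the zero run in bits 7..4, and the total
   number of bits consumed (Huffman code plus magnitude bits) in bits 3..0.
   The hypothetical helpers below pack and unpack that layout with
   multiplication and masking instead of shifts of negative numbers; they
   assume the length fits in four bits, which holds as long as FAST_BITS
   is at most 15, since len + magbits <= FAST_BITS.

```c
#include <stdint.h>

// Hypothetical: build an entry in the fast_ac layout.
int16_t sketch_pack_fast_ac(int value, int run, int len)
{
   return (int16_t) (value * 256 + run * 16 + len);
}

// Hypothetical: split an entry back into its three fields.
void sketch_unpack_fast_ac(int16_t entry, int *value, int *run, int *len)
{
   int r = entry;
   int low = r & 255;           // run and length nibbles
   *len   = low & 15;
   *run   = low >> 4;
   *value = (r - low) / 256;    // exact division recovers the signed value
}
```

   The block decoders read the same fields back as r >> 8, (r >> 4) & 15
   and r & 15; a zero entry means "take the slow path".
*/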

static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
{
   do {
      unsigned int b = j->nomore ? 0 : stbi__get8(j->s); // unsigned, so b << 24 cannot overflow
      if (b == 0xff) {
         int c = stbi__get8(j->s);
         if (c != 0) {
            j->marker = (unsigned char) c;
            j->nomore = 1;
            return;
         }
      }
      j->code_buffer |= b << (24 - j->code_bits);
      j->code_bits += 8;
   } while (j->code_bits <= 24);
}
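
/* Editor's sketch, not part of stb_image: inside entropy-coded data a
   literal 0xFF byte is always followed by a stuffed 0x00, while 0xFF
   followed by any other byte starts a marker. stbi__grow_buffer_unsafe
   applies this rule one byte at a time and, once it sees a marker, keeps
   feeding zero bits (nomore). The hypothetical sketch_unstuff applies the
   same rule to a whole buffer, which can help when checking a scan's
   payload by hand.

```c
#include <stddef.h>

// Hypothetical: copy entropy-coded bytes, turning each 0xFF 0x00 pair into
// one 0xFF and stopping at the first marker (0xFF + nonzero byte).
// Returns the number of bytes written; *marker receives the marker code,
// or 0 if the input ran out first.
size_t sketch_unstuff(const unsigned char *in, size_t n, unsigned char *out, int *marker)
{
   size_t i = 0, o = 0;
   *marker = 0;
   while (i < n) {
      unsigned char b = in[i++];
      if (b == 0xff) {
         if (i >= n) break;           // truncated: 0xFF with nothing after it
         if (in[i] != 0) {            // a real marker, e.g. RSTn or EOI
            *marker = in[i];
            break;
         }
         ++i;                         // skip the stuffed 0x00
      }
      out[o++] = b;
   }
   return o;
}
```
*/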

// (1 << n) - 1
static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};

// decode a jpeg huffman value from the bitstream
stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
{
   unsigned int temp;
   int c,k;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   // look at the top FAST_BITS and determine what symbol ID it is,
   // if the code is at most FAST_BITS bits long
   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   k = h->fast[c];
   if (k < 255) {
      int s = h->size[k];
      if (s > j->code_bits)
         return -1;
      j->code_buffer <<= s;
      j->code_bits -= s;
      return h->values[k];
   }

   // naive test is to shift the code_buffer down so k bits are
   // valid, then test against maxcode. To speed this up, we've
   // preshifted maxcode left so that it has (16-k) 0s at the
   // end; in other words, regardless of the number of bits, it
   // can be compared against the top 16 bits of the buffer,
   // so we don't need to shift inside the loop.
   temp = j->code_buffer >> 16;
   for (k=FAST_BITS+1 ; ; ++k)
      if (temp < h->maxcode[k])
         break;
   if (k == 17) {
      // error! code not found
      j->code_bits -= 16;
      return -1;
   }

   if (k > j->code_bits)
      return -1;

   // convert the huffman code to the symbol id
   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   if (c < 0 || c >= 256) // symbol id out of bounds: corrupt table or stream
      return -1;
   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);

   // convert the id to a symbol
   j->code_bits -= k;
   j->code_buffer <<= k;
   return h->values[c];
}
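
/* Editor's sketch, not part of stb_image: a standalone version of the
   slow path above. sketch_build computes maxcode and delta from per-length
   counts the way stbi__build_huffman does, and sketch_decode scans lengths
   until the left-aligned 16-bit window falls below maxcode[k]. Unlike
   stbi__jpeg_huff_decode it starts at length 1 (there is no fast table),
   takes the window as a plain uint16_t, and returns the symbol index
   rather than the symbol value. All names here are hypothetical.

```c
#include <stdint.h>

typedef struct {
   uint32_t maxcode[18];   // [1..16] used, [17] is a sentinel
   int      delta[17];     // [1..16] used
} sketch_huff;

void sketch_build(sketch_huff *h, const int count[16])
{
   int len, k = 0, code = 0;
   for (len = 1; len <= 16; ++len) {
      h->delta[len] = k - code;            // symbol index = code + delta
      k += count[len-1];
      code += count[len-1];                // one past the last code of this length
      h->maxcode[len] = (uint32_t) code << (16 - len);
      code <<= 1;
   }
   h->maxcode[17] = 0xffffffffu;           // always stops the search
}

// Decode one symbol index from the top 16 bits (MSB first) of 'window'.
// Stores the code length in *len; returns -1 if no code matches.
int sketch_decode(const sketch_huff *h, uint16_t window, int *len)
{
   uint32_t top = window;
   int k = 1;
   while (top >= h->maxcode[k])
      ++k;
   if (k == 17) return -1;                 // no code of length <= 16 matches
   *len = k;
   return (int) (top >> (16 - k)) + h->delta[k];
}
```

   With the Annex K luminance DC counts, the window 0x4000 (bits 010...)
   decodes to index 1 with length 3, and an all-ones window is rejected.
*/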

// stbi__jbias[n] = (-1<<n) + 1, i.e. 1 - 2^n
static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};

// combined JPEG 'receive' and JPEG 'extend', since baseline
// always extends everything it receives.
stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
{
   unsigned int k;
   int sgn;
   // n indexes stbi__jbias, so it must be in 0..15
   STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__jbias)/sizeof(*stbi__jbias)));
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);

   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB: -1 if the first received bit is 1 (positive value), else 0
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k + (stbi__jbias[n] & ~sgn);
}
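
/* Editor's sketch, not part of stb_image: the EXTEND procedure from the
   JPEG spec written out directly. A received n-bit value whose top bit is
   0 stands for a negative number, obtained by subtracting 2^n - 1; that is
   exactly the stbi__jbias[n] term, which stbi__extend_receive adds only
   when the sign test on the buffer's top bit says the value is negative.
   sketch_extend is a hypothetical name.

```c
// Hypothetical: EXTEND from the JPEG spec. v holds the n magnitude bits
// just received (0 <= n <= 15); a top bit of 0 marks a negative value.
int sketch_extend(int v, int n)
{
   if (n == 0) return 0;
   if (v < (1 << (n - 1)))
      v -= (1 << n) - 1;    // equivalent to adding stbi__jbias[n]
   return v;
}
```

   For n = 3, the raw values 0..3 map to -7..-4 and 4..7 map to themselves.
*/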

// read the next n bits as an unsigned value
stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
{
   unsigned int k;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k;
}

// read a single bit; returns nonzero if it is 1
stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
{
   unsigned int k;
   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   k = j->code_buffer;
   j->code_buffer <<= 1;
   --j->code_bits;
   return k & 0x80000000;
}
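
/* Editor's sketch, not part of stb_image: stbi__jpeg_get_bits keeps the
   unread bits left-aligned in a 32-bit buffer, so rotating left by n moves
   the next n bits to the bottom, where stbi__bmask[n] isolates them, and
   clearing those low bits leaves the rest already aligned for the next
   read. The sketch assumes a portable rotate in place of stbi_lrot (whose
   definition lies outside this section); both function names are
   hypothetical.

```c
#include <stdint.h>

// Hypothetical portable rotate-left (n in 0..31).
static uint32_t sketch_rotl(uint32_t x, int n)
{
   return n == 0 ? x : (x << n) | (x >> (32 - n));   // avoid shifting by 32
}

// Hypothetical: take the next n bits (0 <= n <= 16) from a left-aligned buffer.
unsigned int sketch_take_bits(uint32_t *buffer, int n)
{
   uint32_t mask = (1u << n) - 1;   // same values as stbi__bmask[n]
   uint32_t k = sketch_rotl(*buffer, n);
   *buffer = k & ~mask;             // remaining bits, still left-aligned
   return k & mask;
}
```
*/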

// given a value that's at position X in the zigzag stream,
// where does it appear in the 8x8 matrix coded as row-major?
static stbi_uc stbi__jpeg_dezigzag[64+15] =
{
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63,
   // padding so that a corrupt run (up to 15 past index 63) still reads a valid entry
   63, 63, 63, 63, 63, 63, 63, 63,
   63, 63, 63, 63, 63, 63, 63
};
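
/* Editor's sketch, not part of stb_image: the first 64 entries of
   stbi__jpeg_dezigzag are the JPEG zigzag scan, which walks the
   anti-diagonals of the 8x8 block (row + col constant) in alternating
   directions. The hypothetical generator below rebuilds them as
   row*8 + col, one way to double-check the table; the 15 padding entries
   are not generated.

```c
// Hypothetical: rebuild the zigzag-to-row-major table.
void sketch_build_dezigzag(unsigned char out[64])
{
   int s, i = 0;
   for (s = 0; s < 15; ++s) {         // s = row + col
      int lo = s < 8 ? 0 : s - 7;     // smallest valid row on this diagonal
      int hi = s < 8 ? s : 7;         // largest valid row on this diagonal
      int row;
      if (s % 2 == 0)                 // even diagonals run bottom-left to top-right
         for (row = hi; row >= lo; --row) out[i++] = (unsigned char) (row * 8 + (s - row));
      else                            // odd diagonals run top-right to bottom-left
         for (row = lo; row <= hi; ++row) out[i++] = (unsigned char) (row * 8 + (s - row));
   }
}
```
*/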

// decode one baseline 8x8 block (64 coefficients), storing dequantized
// coefficients in row-major (de-zigzagged) order
static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
{
   int diff,dc,k;
   int t;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   t = stbi__jpeg_huff_decode(j, hdc);
   if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG"); // t indexes stbi__jbias

   // 0 all the ac values now so we can do it 32-bits at a time
   memset(data,0,64*sizeof(data[0]));

   diff = t ? stbi__extend_receive(j, t) : 0;
   dc = j->img_comp[b].dc_pred + diff;
   j->img_comp[b].dc_pred = dc;
   data[0] = (short) (dc * dequant[0]);

   // decode AC components, see JPEG spec
   k = 1;
   do {
      unsigned int zig;
      int c,r,s;
      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
      r = fac[c];
      if (r) { // fast-AC path
         k += (r >> 4) & 15; // run
         s = r & 15; // combined length
         j->code_buffer <<= s;
         j->code_bits -= s;
         // decode into unzigzag'd location
         zig = stbi__jpeg_dezigzag[k++];
         data[zig] = (short) ((r >> 8) * dequant[zig]);
      } else {
         int rs = stbi__jpeg_huff_decode(j, hac);
         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
         s = rs & 15;
         r = rs >> 4;
         if (s == 0) {
            if (rs != 0xf0) break; // end block
            k += 16;
         } else {
            k += r;
            // decode into unzigzag'd location
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
         }
      }
   } while (k < 64);
   return 1;
}
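
/* Editor's sketch, not part of stb_image: the run/size bookkeeping of the
   AC loop above, with Huffman decoding and EXTEND already done. Each
   symbol rs carries the zero run R in its high nibble and the magnitude
   size S in its low nibble; S == 0 means end-of-block (EOB) or, for
   rs == 0xF0, a run of 16 zeros (ZRL). The hypothetical sketch_place_ac
   stores values in zigzag order and reports an overrun past index 63,
   whereas stb_image tolerates that case through the padded
   stbi__jpeg_dezigzag table.

```c
// Hypothetical: place decoded AC values (zigzag order, index 0 is DC and
// is left untouched). Returns 0 if a run overshoots the block.
int sketch_place_ac(const unsigned char *rs, const int *values, int count, short zz[64])
{
   int i, k = 1;
   for (i = 1; i < 64; ++i) zz[i] = 0;
   for (i = 0; i < count && k < 64; ++i) {
      int r = rs[i] >> 4, s = rs[i] & 15;
      if (s == 0) {
         if (r != 15) break;           // EOB: the rest of the block is zero
         k += 16;                      // ZRL: skip 16 zeros
      } else {
         k += r;                       // skip r zero coefficients
         if (k > 63) return 0;         // corrupt: run past the end of the block
         zz[k++] = (short) values[i];
      }
   }
   return 1;
}
```
*/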

static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
{
   int diff,dc;
   int t;
   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   if (j->succ_high == 0) {
      // first scan for this DC coefficient; must precede any refinement scan
      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
      t = stbi__jpeg_huff_decode(j, hdc);
      if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG"); // t indexes stbi__jbias
      diff = t ? stbi__extend_receive(j, t) : 0;

      dc = j->img_comp[b].dc_pred + diff;
      j->img_comp[b].dc_pred = dc;
      data[0] = (short) (dc * (1 << j->succ_low)); // multiply: dc may be negative
   } else {
      // refinement scan for DC coefficient
      if (stbi__jpeg_get_bit(j))
         data[0] += (short) (1 << j->succ_low);
   }
   return 1;
}
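
/* Editor's sketch, not part of stb_image: in a progressive JPEG the first
   DC scan sends each DC value with its low Al bits dropped (Al is
   succ_low; for DC the point transform is an arithmetic shift right), and
   each refinement scan sends one more bit, which
   stbi__jpeg_decode_block_prog_dc adds as 1 << succ_low. The hypothetical
   sketch_refine_dc replays that sequence for one coefficient, taking the
   refinement bits in the order they arrive.

```c
// Hypothetical: rebuild one progressive DC coefficient.
// first_scan_value: value decoded in the first scan (DC >> al).
// bits: al refinement bits, most significant first (one per later scan).
int sketch_refine_dc(int first_scan_value, int al, const int *bits)
{
   int dc = first_scan_value * (1 << al);  // like data[0] = dc << succ_low
   int b;
   for (b = al - 1; b >= 0; --b)
      if (bits[al - 1 - b])
         dc += 1 << b;                     // like data[0] += 1 << succ_low
   return dc;
}
```

   For example, -13 with Al = 2 is sent as -4 followed by the refinement
   bits 1 and 1: -16 + 2 + 1 = -13.
*/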

// @OPTIMIZE: store non-zigzagged during the decode passes,
// and only de-zigzag when dequantizing
static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
{
   int k;
   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->succ_high == 0) {
      int shift = j->succ_low;

      if (j->eob_run) {
         --j->eob_run;
         return 1;
      }

      k = j->spec_start;
      do {
         unsigned int zig;
         int c,r,s;
         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
         r = fac[c];
         if (r) { // fast-AC path
            k += (r >> 4) & 15; // run
            s = r & 15; // combined length
            j->code_buffer <<= s;
            j->code_bits -= s;
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) ((r >> 8) * (1 << shift)); // multiply: value may be negative
         } else {
            int rs = stbi__jpeg_huff_decode(j, hac);
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r);
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  --j->eob_run;
                  break;
               }
               k += 16;
            } else {
               k += r;
               zig = stbi__jpeg_dezigzag[k++];
               data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift)); // multiply: value may be negative
            }
         }
      } while (k <= j->spec_end);
   } else {
      // refinement scan for these AC coefficients

      short bit = (short) (1 << j->succ_low);

      if (j->eob_run) {
         --j->eob_run;
         for (k = j->spec_start; k <= j->spec_end; ++k) {
            short *p = &data[stbi__jpeg_dezigzag[k]];
            if (*p != 0)
               if (stbi__jpeg_get_bit(j))
                  if ((*p & bit)==0) {
                     if (*p > 0)
                        *p += bit;
                     else
                        *p -= bit;
                  }
         }
      } else {
         k = j->spec_start;
         do {
            int r,s;
            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r) - 1;
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  r = 64; // force end of block
               } else {
                  // r=15 s=0 should write 16 0s, so we just do
                  // a run of 15 0s and then write s (which is 0),
                  // so we don't have to do anything special here
               }
            } else {
               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
               // sign bit
               if (stbi__jpeg_get_bit(j))
                  s = bit;
               else
                  s = -bit;
            }

            // advance by r
            while (k <= j->spec_end) {
               short *p = &data[stbi__jpeg_dezigzag[k++]];
               if (*p != 0) {
                  if (stbi__jpeg_get_bit(j))
                     if ((*p & bit)==0) {
                        if (*p > 0)
                           *p += bit;
                        else
                           *p -= bit;
                     }
               } else {
                  if (r == 0) {
                     *p = (short) s;
                     break;
                  }
                  --r;
               }
            }
         } while (k <= j->spec_end);
      }
   }
   return 1;
}
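
/* Editor's sketch, not part of stb_image: an EOBn symbol (s == 0, r < 15)
   in a progressive AC scan covers 2^r blocks plus an r-bit unsigned
   extension read from the stream. The current block is one of them,
   which is why both branches above store the count minus one in eob_run.
   sketch_eob_run_length is a hypothetical helper that computes the full
   count.

```c
// Hypothetical: number of blocks covered by an EOBn symbol.
// r is the run field (0..14); extra_bits holds the r bits read after it.
int sketch_eob_run_length(int r, unsigned int extra_bits)
{
   int run = 1 << r;
   if (r)
      run += (int) (extra_bits & ((1u << r) - 1));
   return run;
}
```
*/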

// clamp an int to 0..255 and convert it to stbi_uc
// (callers pass values that have already been shifted up by 128)
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
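
/* Editor's sketch, not part of stb_image: why one test suffices above.
   Converting a negative int to unsigned int yields a value far above 255,
   so a single unsigned comparison catches both out-of-range cases and the
   common in-range path costs one branch. The hypothetical functions below
   compare that form with a plain two-comparison clamp.

```c
// Hypothetical: straightforward clamp.
unsigned char sketch_clamp_simple(int x)
{
   if (x < 0) return 0;
   if (x > 255) return 255;
   return (unsigned char) x;
}

// Hypothetical: single-test clamp in the style of stbi__clamp.
unsigned char sketch_clamp_unsigned(int x)
{
   if ((unsigned int) x > 255u)     // true for x < 0 and for x > 255
      return x < 0 ? 0 : 255;
   return (unsigned char) x;
}
```
*/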

#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
#define stbi__fsh(x)  ((x) * 4096)   // same as (x) << 12, but defined for negative x
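
/* Editor's sketch, not part of stb_image: stbi__f2f turns a real constant
   into 12-bit fixed point (scaled by 4096) so the IDCT can use integer
   multiplies; a product x*c is then approximated by (x * f2f(c)) / 4096,
   and the IDCT defers that division to its final shifts. Because the
   macro adds 0.5 and then truncates toward zero, a negative constant can
   come out one above round-to-nearest: -1.847759065 * 4096 = -7568.42
   becomes -7567. The perturbation is tiny, and the C, SSE2 and NEON paths
   all share the macro, so they stay bit-identical. SKETCH_F2F and
   sketch_fixed_mul are hypothetical names.

```c
// Same formula as stbi__f2f, under a hypothetical name.
#define SKETCH_F2F(x)  ((int) (((x) * 4096 + 0.5)))

// Hypothetical: approximate x * c given c in 12-bit fixed point
// (truncating division, for illustration only).
int sketch_fixed_mul(int x, int c_fixed)
{
   return (x * c_fixed) / 4096;
}
```
*/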

// derived from jidctint -- DCT_ISLOW
#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   p2 = s2;                                    \
   p3 = s6;                                    \
   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   p2 = s0;                                    \
   p3 = s4;                                    \
   t0 = stbi__fsh(p2+p3);                      \
   t1 = stbi__fsh(p2-p3);                      \
   x0 = t0+t3;                                 \
   x3 = t0-t3;                                 \
   x1 = t1+t2;                                 \
   x2 = t1-t2;                                 \
   t0 = s7;                                    \
   t1 = s5;                                    \
   t2 = s3;                                    \
   t3 = s1;                                    \
   p3 = t0+t2;                                 \
   p4 = t1+t3;                                 \
   p1 = t0+t3;                                 \
   p2 = t1+t2;                                 \
   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   t0 = t0*stbi__f2f( 0.298631336f);           \
   t1 = t1*stbi__f2f( 2.053119869f);           \
   t2 = t2*stbi__f2f( 3.072711026f);           \
   t3 = t3*stbi__f2f( 1.501321110f);           \
   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   p3 = p3*stbi__f2f(-1.961570560f);           \
   p4 = p4*stbi__f2f(-0.390180644f);           \
   t3 += p1+p4;                                \
   t2 += p2+p3;                                \
   t1 += p2+p4;                                \
   t0 += p1+p3;

static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
{
   int i,val[64],*v=val;
   stbi_uc *o;
   short *d = data;

   // columns
   for (i=0; i < 8; ++i,++d, ++v) {
      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
           && d[40]==0 && d[48]==0 && d[56]==0) {
         //    no shortcut                 0     seconds
         //    (1|2|3|4|5|6|7)==0          0     seconds
         //    all separate               -0.047 seconds
         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
         int dcterm = d[0] * 4; // same as d[0] << 2, but defined for negative values
         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
      } else {
         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
         // constants scaled things up by 1<<12; let's bring them back
         // down, but keep 2 extra bits of precision
         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
         v[ 0] = (x0+t3) >> 10;
         v[56] = (x0-t3) >> 10;
         v[ 8] = (x1+t2) >> 10;
         v[48] = (x1-t2) >> 10;
         v[16] = (x2+t1) >> 10;
         v[40] = (x2-t1) >> 10;
         v[24] = (x3+t0) >> 10;
         v[32] = (x3-t0) >> 10;
      }
   }

   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
      // no fast case since the first 1D IDCT spread components out
      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
      // constants scaled things up by 1<<12, plus we had 1<<2 from the first
      // loop, plus horizontal and vertical each scale by sqrt(8), so together
      // we've got an extra 1<<3: 1<<17 in total that we need to remove.
      // To round rather than truncate, add 0.5 * (1<<17), aka 65536.
      // Also, we'll end up with -128 to 127, which we want to encode
      // as 0..255 by adding 128, so we add 128<<17 before the shift.
      x0 += 65536 + (128<<17);
      x1 += 65536 + (128<<17);
      x2 += 65536 + (128<<17);
      x3 += 65536 + (128<<17);
      // tried computing the shifts into temps, or'ing the temps to see
      // if any were out of range, but that was slower
      o[0] = stbi__clamp((x0+t3) >> 17);
      o[7] = stbi__clamp((x0-t3) >> 17);
      o[1] = stbi__clamp((x1+t2) >> 17);
      o[6] = stbi__clamp((x1-t2) >> 17);
      o[2] = stbi__clamp((x2+t1) >> 17);
      o[5] = stbi__clamp((x2-t1) >> 17);
      o[3] = stbi__clamp((x3+t0) >> 17);
      o[4] = stbi__clamp((x3-t0) >> 17);
   }
}
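
/* Editor's sketch, not part of stb_image: a worked check of the scaling
   comments in stbi__idct_block. For a block whose only nonzero coefficient
   is the dequantized DC term d0, the column pass takes the shortcut and
   stores d0 * 4. In the row pass only s0 is nonzero, so t2 = t3 = 0,
   x0..x3 all equal d0 * 4 * 4096, and the odd terms vanish. Every pixel is
   therefore clamp((d0 * 16384 + 65536 + (128 << 17)) >> 17), which is
   floor((d0 + 4) / 8) + 128. That matches the textbook 2-D IDCT, whose
   DC-only output is the constant d0 / 8, here rounded half up and
   level-shifted. The hypothetical sketch_dc_only_pixel evaluates the
   integer path for -1024 <= d0 <= 1024, where the intermediate value stays
   non-negative.

```c
// Hypothetical: integer path of stbi__idct_block for a DC-only block.
int sketch_dc_only_pixel(int d0)
{
   int v = d0 * 4;                          // column-pass shortcut (dcterm)
   int x = v * 4096 + 65536 + (128 << 17);  // row pass: x0 plus both biases
   int p = x >> 17;                         // x >= 0 for -1024 <= d0 <= 1024
   return p < 0 ? 0 : (p > 255 ? 255 : p);  // as stbi__clamp
}
```

   For example, d0 = 80 gives 138 (80 / 8 + 128) and d0 = -80 gives 118.
*/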

#ifdef STBI_SSE2
// sse2 integer IDCT. not the fastest possible implementation but it
// produces bit-identical results to the generic C version so it's
// fully "transparent".
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   // This is constructed to match our regular (generic) integer IDCT exactly.
   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   __m128i tmp;

   // dot product constant: even elems=x, odd elems=y
   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))

   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   // out(1) = c1[even]*x + c1[odd]*y
   #define dct_rot(out0,out1, x,y,c0,c1) \
      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)

   // out = in << 12  (in 16-bit, out 32-bit)
   #define dct_widen(out, in) \
      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)

   // wide add
   #define dct_wadd(out, a, b) \
      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)

   // wide sub
   #define dct_wsub(out, a, b) \
      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)

   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }

   // 8-bit interleave step (for transposes)
   #define dct_interleave8(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi8(a, b); \
      b = _mm_unpackhi_epi8(tmp, b)

   // 16-bit interleave step (for transposes)
   #define dct_interleave16(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi16(a, b); \
      b = _mm_unpackhi_epi16(tmp, b)

   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }

   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
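
   /* Editor's sketch, not part of stb_image: why the rot constants above
      are sums. _mm_madd_epi16 multiplies adjacent 16-bit lanes pairwise and
      adds each pair into a 32-bit lane, so after interleaving row2 with
      row6 the constant pair (c0, c0 + c1) yields row2*c0 + row6*(c0 + c1).
      That is the value the scalar even part of STBI__IDCT_1D computes as
      p1 + p3*c1 with p1 = (p2 + p3)*c0. The scalar model below uses
      hypothetical names and checks one lane, using the f2f values 2217 and
      -7567 for 0.5411961 and -1.847759065.

```c
#include <stdint.h>

// Hypothetical: one 32-bit lane of _mm_madd_epi16 (two products, summed).
int32_t sketch_madd_pair(int16_t x, int16_t y, int16_t cx, int16_t cy)
{
   return (int32_t) x * cx + (int32_t) y * cy;
}

// Hypothetical: the scalar even-part formula for t2 from STBI__IDCT_1D.
int32_t sketch_even_t2_scalar(int row2, int row6, int c0, int c1)
{
   int p1 = (row2 + row6) * c0;
   return p1 + row6 * c1;
}
```
   */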

   // rounding biases in column/row passes, see stbi__idct_block for explanation.
   __m128i bias_0 = _mm_set1_epi32(512);
   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));

   // load
   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   row7 = _mm_load_si128((const __m128i *) (data + 7*8));

   // column pass
   dct_pass(bias_0, 10);

   {
      // 16bit 8x8 transpose pass 1
      dct_interleave16(row0, row4);
      dct_interleave16(row1, row5);
      dct_interleave16(row2, row6);
      dct_interleave16(row3, row7);

      // transpose pass 2
      dct_interleave16(row0, row2);
      dct_interleave16(row1, row3);
      dct_interleave16(row4, row6);
      dct_interleave16(row5, row7);

      // transpose pass 3
      dct_interleave16(row0, row1);
      dct_interleave16(row2, row3);
      dct_interleave16(row4, row5);
      dct_interleave16(row6, row7);
   }

   // row pass
   dct_pass(bias_1, 17);

   {
      // pack
      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
      __m128i p1 = _mm_packus_epi16(row2, row3);
      __m128i p2 = _mm_packus_epi16(row4, row5);
      __m128i p3 = _mm_packus_epi16(row6, row7);

      // 8bit 8x8 transpose pass 1
      dct_interleave8(p0, p2); // a0e0a1e1...
      dct_interleave8(p1, p3); // c0g0c1g1...

      // transpose pass 2
      dct_interleave8(p0, p1); // a0c0e0g0...
      dct_interleave8(p2, p3); // b0d0f0h0...

      // transpose pass 3
      dct_interleave8(p0, p2); // a0b0c0d0...
      dct_interleave8(p1, p3); // a4b4c4d4...

      // store
      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   }

#undef dct_const
#undef dct_rot
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_interleave8
#undef dct_interleave16
#undef dct_pass
}

#endif // STBI_SSE2
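
/* Editor's sketch, not part of stb_image: a scalar model of the three-pass
   dct_interleave16 transpose used between the column and row passes.
   unpacklo/unpackhi interleave the low or high four 16-bit lanes of two
   8-lane vectors; applying them to the row pairs (0,4)(1,5)(2,6)(3,7),
   then (0,2)(1,3)(4,6)(5,7), then (0,1)(2,3)(4,5)(6,7) leaves row i
   holding what was column i. The vector type and function names are
   hypothetical stand-ins for __m128i and the intrinsics.

```c
#include <string.h>

typedef struct { short v[8]; } sketch_vec8;   // stands in for __m128i

// Hypothetical scalar version of dct_interleave16(a, b).
static void sketch_interleave16(sketch_vec8 *a, sketch_vec8 *b)
{
   sketch_vec8 lo, hi;
   int i;
   for (i = 0; i < 4; ++i) {
      lo.v[2*i] = a->v[i];     lo.v[2*i + 1] = b->v[i];     // _mm_unpacklo_epi16
      hi.v[2*i] = a->v[i + 4]; hi.v[2*i + 1] = b->v[i + 4]; // _mm_unpackhi_epi16
   }
   *a = lo;
   *b = hi;
}

// Hypothetical: transpose an 8x8 matrix with the same pairing as above.
void sketch_transpose8x8(short m[8][8])
{
   static const int pairs[3][4][2] = {
      { {0,4}, {1,5}, {2,6}, {3,7} },
      { {0,2}, {1,3}, {4,6}, {5,7} },
      { {0,1}, {2,3}, {4,5}, {6,7} },
   };
   sketch_vec8 row[8];
   int p, q;
   for (p = 0; p < 8; ++p) memcpy(row[p].v, m[p], sizeof(row[p].v));
   for (p = 0; p < 3; ++p)
      for (q = 0; q < 4; ++q)
         sketch_interleave16(&row[pairs[p][q][0]], &row[pairs[p][q][1]]);
   for (p = 0; p < 8; ++p) memcpy(m[p], row[p].v, sizeof(row[p].v));
}
```
*/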

#ifdef STBI_NEON

// NEON integer IDCT. should produce bit-identical
// results to the generic C version.
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;

   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));

#define dct_long_mul(out, inq, coeff) \
   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)

#define dct_long_mac(out, acc, inq, coeff) \
   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)

#define dct_widen(out, inq) \
   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)

// wide add
#define dct_wadd(out, a, b) \
   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)

// wide sub
#define dct_wsub(out, a, b) \
   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)

// butterfly a/b, then shift using "shiftop" by "s" and pack
#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   { \
      dct_wadd(sum, a, b); \
      dct_wsub(dif, a, b); \
      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   }

#define dct_pass(shiftop, shift) \
   { \
      /* even part */ \
      int16x8_t sum26 = vaddq_s16(row2, row6); \
      dct_long_mul(p1e, sum26, rot0_0); \
      dct_long_mac(t2e, p1e, row6, rot0_1); \
      dct_long_mac(t3e, p1e, row2, rot0_2); \
      int16x8_t sum04 = vaddq_s16(row0, row4); \
      int16x8_t dif04 = vsubq_s16(row0, row4); \
      dct_widen(t0e, sum04); \
      dct_widen(t1e, dif04); \
      dct_wadd(x0, t0e, t3e); \
      dct_wsub(x3, t0e, t3e); \
      dct_wadd(x1, t1e, t2e); \
      dct_wsub(x2, t1e, t2e); \
      /* odd part */ \
      int16x8_t sum15 = vaddq_s16(row1, row5); \
      int16x8_t sum17 = vaddq_s16(row1, row7); \
      int16x8_t sum35 = vaddq_s16(row3, row5); \
      int16x8_t sum37 = vaddq_s16(row3, row7); \
      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
      dct_long_mul(p5o, sumodd, rot1_0); \
      dct_long_mac(p1o, p5o, sum17, rot1_1); \
      dct_long_mac(p2o, p5o, sum35, rot1_2); \
      dct_long_mul(p3o, sum37, rot2_0); \
      dct_long_mul(p4o, sum15, rot2_1); \
      dct_wadd(sump13o, p1o, p3o); \
      dct_wadd(sump24o, p2o, p4o); \
      dct_wadd(sump23o, p2o, p3o); \
      dct_wadd(sump14o, p1o, p4o); \
      dct_long_mac(x4, sump13o, row7, rot3_0); \
      dct_long_mac(x5, sump24o, row5, rot3_1); \
      dct_long_mac(x6, sump23o, row3, rot3_2); \
      dct_long_mac(x7, sump14o, row1, rot3_3); \
      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
   2297       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
   2298       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   2299    }
   2300 
   2301    // load
   2302    row0 = vld1q_s16(data + 0*8);
   2303    row1 = vld1q_s16(data + 1*8);
   2304    row2 = vld1q_s16(data + 2*8);
   2305    row3 = vld1q_s16(data + 3*8);
   2306    row4 = vld1q_s16(data + 4*8);
   2307    row5 = vld1q_s16(data + 5*8);
   2308    row6 = vld1q_s16(data + 6*8);
   2309    row7 = vld1q_s16(data + 7*8);
   2310 
   2311    // add DC bias
   2312    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
   2313 
   2314    // column pass
   2315    dct_pass(vrshrn_n_s32, 10);
   2316 
   2317    // 16bit 8x8 transpose
   2318    {
   2319 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
   2320 // whether compilers actually get this is another story, sadly.
   2321 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
   2322 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
   2323 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
   2324 
   2325       // pass 1
   2326       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
   2327       dct_trn16(row2, row3);
   2328       dct_trn16(row4, row5);
   2329       dct_trn16(row6, row7);
   2330 
   2331       // pass 2
   2332       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
   2333       dct_trn32(row1, row3);
   2334       dct_trn32(row4, row6);
   2335       dct_trn32(row5, row7);
   2336 
   2337       // pass 3
   2338       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
   2339       dct_trn64(row1, row5);
   2340       dct_trn64(row2, row6);
   2341       dct_trn64(row3, row7);
   2342 
   2343 #undef dct_trn16
   2344 #undef dct_trn32
   2345 #undef dct_trn64
   2346    }
   2347 
   2348    // row pass
   2349    // vrshrn_n_s32 only supports shifts up to 16, we need
   2350    // 17. so do a non-rounding shift of 16 first then follow
   2351    // up with a rounding shift by 1.
   2352    dct_pass(vshrn_n_s32, 16);
   2353 
   2354    {
   2355       // pack and round
   2356       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
   2357       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
   2358       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
   2359       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
   2360       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
   2361       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
   2362       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
   2363       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
   2364 
   2365       // again, these can translate into one instruction, but often don't.
   2366 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
   2367 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
   2368 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
   2369 
   2370       // sadly can't use interleaved stores here since we only write
   2371       // 8 bytes to each scan line!
   2372 
   2373       // 8x8 8-bit transpose pass 1
   2374       dct_trn8_8(p0, p1);
   2375       dct_trn8_8(p2, p3);
   2376       dct_trn8_8(p4, p5);
   2377       dct_trn8_8(p6, p7);
   2378 
   2379       // pass 2
   2380       dct_trn8_16(p0, p2);
   2381       dct_trn8_16(p1, p3);
   2382       dct_trn8_16(p4, p6);
   2383       dct_trn8_16(p5, p7);
   2384 
   2385       // pass 3
   2386       dct_trn8_32(p0, p4);
   2387       dct_trn8_32(p1, p5);
   2388       dct_trn8_32(p2, p6);
   2389       dct_trn8_32(p3, p7);
   2390 
   2391       // store
   2392       vst1_u8(out, p0); out += out_stride;
   2393       vst1_u8(out, p1); out += out_stride;
   2394       vst1_u8(out, p2); out += out_stride;
   2395       vst1_u8(out, p3); out += out_stride;
   2396       vst1_u8(out, p4); out += out_stride;
   2397       vst1_u8(out, p5); out += out_stride;
   2398       vst1_u8(out, p6); out += out_stride;
   2399       vst1_u8(out, p7);
   2400 
   2401 #undef dct_trn8_8
   2402 #undef dct_trn8_16
   2403 #undef dct_trn8_32
   2404    }
   2405 
   2406 #undef dct_long_mul
   2407 #undef dct_long_mac
   2408 #undef dct_widen
   2409 #undef dct_wadd
   2410 #undef dct_wsub
   2411 #undef dct_bfly32o
   2412 #undef dct_pass
   2413 }
   2414 
   2415 #endif // STBI_NEON
   2416 
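The row-pass comment in stbi__idct_simd says a truncating shift by 16 followed by a rounding shift by 1 stands in for a single rounding shift by 17. Below is a minimal scalar check of that claim. It ignores the 16-bit narrowing and the unsigned saturation that vshrn_n_s32 and vqrshrun_n_s16 also perform. The helper names are hypothetical, and floor_shift spells out an arithmetic right shift so the sketch does not depend on implementation-defined `>>` of negative values.

```c
#include <stdint.h>

/* Hypothetical helper: floor division by 2^s (an arithmetic right shift),
   written without relying on implementation-defined >> of negative values. */
static int64_t floor_shift(int64_t v, int s)
{
   int64_t d = (int64_t)1 << s;
   return v >= 0 ? v / d : -((-v + d - 1) / d);
}

/* Row-pass sequence: truncating shift by 16 (like vshrn_n_s32), then a
   rounding shift by 1 (the rounding part of vqrshrun_n_s16). */
static int64_t two_step_shift(int64_t x)
{
   return floor_shift(floor_shift(x, 16) + 1, 1);
}

/* The single rounding shift by 17 that the row pass actually wants. */
static int64_t rounding_shift17(int64_t x)
{
   return floor_shift(x + ((int64_t)1 << 16), 17);
}
```

Writing x = q*2^16 + r with 0 <= r < 2^16, both functions reduce to floor((q+1)/2). That is why splitting the shift loses no precision.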
#define STBI__MARKER_none  0xff
// if there's a pending marker from the entropy stream, return that;
// otherwise, fetch from the stream and get a marker. if there's no
// marker, return 0xff, which is never a valid marker value
static stbi_uc stbi__get_marker(stbi__jpeg *j)
{
   stbi_uc x;
   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   x = stbi__get8(j->s);
   if (x != 0xff) return STBI__MARKER_none;
   while (x == 0xff)
      x = stbi__get8(j->s);
   return x;
}

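The marker logic in stbi__get_marker works as follows: a marker is a 0xff byte, optionally followed by more 0xff fill bytes, and then the marker code. The sketch below restates that logic over an in-memory buffer. next_marker and MARKER_NONE are hypothetical names. The sketch omits the pending-marker cache, and at the end of the buffer it returns MARKER_NONE, whereas the original relies on stbi__get8's end-of-stream behavior.

```c
#include <stddef.h>
#include <stdint.h>

#define MARKER_NONE 0xff  /* hypothetical; mirrors STBI__MARKER_none */

/* Hypothetical buffer-based analogue of stbi__get_marker: read one byte;
   if it is not 0xff there is no marker here. Otherwise skip any run of
   0xff fill bytes and return the byte that follows. */
static uint8_t next_marker(const uint8_t *buf, size_t len, size_t *pos)
{
   uint8_t x;
   if (*pos >= len) return MARKER_NONE;
   x = buf[(*pos)++];
   if (x != 0xff) return MARKER_NONE;
   while (x == 0xff && *pos < len)
      x = buf[(*pos)++];
   return x;
}
```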
// in each scan, we'll have scan_n components, and the order
// of the components is specified by order[]
#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)

// after a restart interval, reset the entropy decoder and
// the dc prediction
static void stbi__jpeg_reset(stbi__jpeg *j)
{
   j->code_bits = 0;
   j->code_buffer = 0;
   j->nomore = 0;
   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
   j->marker = STBI__MARKER_none;
   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   j->eob_run = 0;
   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
   // since we don't even allow 1<<30 pixels
}

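stbi__jpeg_reset loads `todo` with the restart interval, or with 0x7fffffff when no DRI segment was seen. The decoder then runs `if (--todo <= 0)` after every MCU. The hypothetical helper below simulates only that countdown and reports how often the restart check fires. It assumes every check finds a valid RSTn marker; the real decoder instead returns early when the marker is missing.

```c
/* Hypothetical simulation of the restart countdown: how many times does
   the "--todo <= 0" check fire while decoding total_mcus MCUs? */
static int count_restart_checks(int total_mcus, int restart_interval)
{
   int todo = restart_interval ? restart_interval : 0x7fffffff;
   int checks = 0, mcu;
   for (mcu = 0; mcu < total_mcus; ++mcu) {
      if (--todo <= 0) {
         ++checks;
         /* what stbi__jpeg_reset does to the counter */
         todo = restart_interval ? restart_interval : 0x7fffffff;
      }
   }
   return checks;
}
```

With an interval of 4 and 10 MCUs, the check fires after MCUs 4 and 8. With no interval, it never fires for any realistic image size.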
static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
{
   stbi__jpeg_reset(z);
   if (!z->progressive) {
      if (z->scan_n == 1) {
         int i,j;
         STBI_SIMD_ALIGN(short, data[64]);
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               int ha = z->img_comp[n].ha;
               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  // if it's NOT a restart, then just bail, so we get corrupt data
                  // rather than no data
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         STBI_SIMD_ALIGN(short, data[64]);
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x)*8;
                        int y2 = (j*z->img_comp[n].v + y)*8;
                        int ha = z->img_comp[n].ha;
                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
   } else {
      if (z->scan_n == 1) {
         int i,j;
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               if (z->spec_start == 0) {
                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
                     return 0;
               } else {
                  int ha = z->img_comp[n].ha;
                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
                     return 0;
               }
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x);
                        int y2 = (j*z->img_comp[n].v + y);
                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
                           return 0;
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
   }
}

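The comments in stbi__parse_entropy_coded_data contrast non-interleaved scans, which decode just enough blocks to cover a component's real pixels, with interleaved scans, which decode h*v blocks for every MCU in the padded grid. The hypothetical helpers below reproduce that arithmetic (the component-size formula is taken from stbi__process_frame_header). They compare the two counts for a 33x17 image with 4:2:0 sampling (h_max = v_max = 2).

```c
/* Hypothetical helper: a component's effective extent, ceil-scaled by its
   sampling factor, as in stbi__process_frame_header. */
static int comp_extent(int img, int samp, int samp_max)
{
   return (img * samp + samp_max - 1) / samp_max;
}

/* Blocks for one component in a non-interleaved scan: just enough 8x8
   blocks to cover its effective pixels. */
static int blocks_noninterleaved(int img_x, int img_y, int h, int v, int h_max, int v_max)
{
   int bw = (comp_extent(img_x, h, h_max) + 7) >> 3;
   int bh = (comp_extent(img_y, v, v_max) + 7) >> 3;
   return bw * bh;
}

/* Blocks for the same component in an interleaved scan: h*v blocks per
   MCU, over the whole (padded) MCU grid. */
static int blocks_interleaved(int img_x, int img_y, int h, int v, int h_max, int v_max)
{
   int mcu_x = (img_x + h_max * 8 - 1) / (h_max * 8);
   int mcu_y = (img_y + v_max * 8 - 1) / (v_max * 8);
   return mcu_x * mcu_y * h * v;
}
```

For the luma plane, the interleaved scan decodes 24 blocks where 15 would cover the real pixels. The chroma planes happen to need 6 blocks either way.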
static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
{
   int i;
   for (i=0; i < 64; ++i)
      data[i] *= dequant[i];
}

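The table passed to stbi__jpeg_dequantize is filled by the DQT case of stbi__process_marker. That case reads 64 values in zigzag order and stores each one through stbi__jpeg_dezigzag, a precomputed table defined elsewhere in the file, so the table ends up in natural row-major order. The sketch below derives the same mapping algorithmically by walking the 8x8 block's anti-diagonals in alternating directions. build_dezigzag, load_table, and dequantize are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical: build the zigzag-index -> natural-index table by walking
   the 15 anti-diagonals (row + col = s) in alternating directions. */
static void build_dezigzag(int dezigzag[64])
{
   int k = 0, s, row;
   for (s = 0; s < 15; ++s) {
      int lo = s < 8 ? 0 : s - 7;
      int hi = s < 8 ? s : 7;
      if (s & 1)   /* odd diagonal: walk down-left (row increases) */
         for (row = lo; row <= hi; ++row) dezigzag[k++] = row * 8 + (s - row);
      else         /* even diagonal: walk up-right (row decreases) */
         for (row = hi; row >= lo; --row) dezigzag[k++] = row * 8 + (s - row);
   }
}

/* Store a DQT payload (64 bytes in zigzag order) in natural order. */
static void load_table(uint8_t table[64], const uint8_t zz[64])
{
   int dz[64], i;
   build_dezigzag(dz);
   for (i = 0; i < 64; ++i) table[dz[i]] = zz[i];
}

/* Same element-wise multiply as stbi__jpeg_dequantize. */
static void dequantize(short *data, const uint8_t *dequant)
{
   int i;
   for (i = 0; i < 64; ++i) data[i] *= dequant[i];
}
```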
static void stbi__jpeg_finish(stbi__jpeg *z)
{
   if (z->progressive) {
      // dequantize and idct the data
      int i,j,n;
      for (n=0; n < z->s->img_n; ++n) {
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
            }
         }
      }
   }
}

static int stbi__process_marker(stbi__jpeg *z, int m)
{
   int L;
   switch (m) {
      case STBI__MARKER_none: // no marker found
         return stbi__err("expected marker","Corrupt JPEG");

      case 0xDD: // DRI - specify restart interval
         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
         z->restart_interval = stbi__get16be(z->s);
         return 1;

      case 0xDB: // DQT - define quantization table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            int q = stbi__get8(z->s);
            int p = q >> 4;
            int t = q & 15,i;
            if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
            for (i=0; i < 64; ++i)
               z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
            L -= 65;
         }
         return L==0;

      case 0xC4: // DHT - define huffman table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            stbi_uc *v;
            int sizes[16],i,n=0;
            int q = stbi__get8(z->s);
            int tc = q >> 4;
            int th = q & 15;
            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
            for (i=0; i < 16; ++i) {
               sizes[i] = stbi__get8(z->s);
               n += sizes[i];
            }
            L -= 17;
            if (tc == 0) {
               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
               v = z->huff_dc[th].values;
            } else {
               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
               v = z->huff_ac[th].values;
            }
            for (i=0; i < n; ++i)
               v[i] = stbi__get8(z->s);
            if (tc != 0)
               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
            L -= n;
         }
         return L==0;
   }
   // check for comment block or APP blocks
   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
      stbi__skip(z->s, stbi__get16be(z->s)-2);
      return 1;
   }
   return 0;
}

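The DHT case reads only sixteen length counts (`sizes[i]` = number of codes of length i+1) and hands them to stbi__build_huffman, which is outside this excerpt. To show what those counts encode, here is a sketch of the canonical code assignment from the JPEG specification (ITU-T T.81, Annex C): codes of each length are consecutive integers, and the running code is shifted left by one bit when moving to the next length. canonical_codes is a hypothetical name, and the sketch does not claim to match how stbi__build_huffman lays out its tables.

```c
#include <stdint.h>

/* Hypothetical: assign canonical Huffman codes from a DHT "sizes" array.
   codes[]/lengths[] get one entry per symbol, in the order the symbol
   values appear in the DHT segment. Returns the symbol count, or -1 if
   the counts oversubscribe the code space or exceed 256 symbols. */
static int canonical_codes(const int sizes[16], uint16_t codes[256], uint8_t lengths[256])
{
   int len, i, n = 0;
   unsigned code = 0;
   for (len = 1; len <= 16; ++len) {
      for (i = 0; i < sizes[len - 1]; ++i) {
         if (n >= 256) return -1;
         codes[n] = (uint16_t)code;
         lengths[n] = (uint8_t)len;
         ++n;
         ++code;
      }
      if (code > (1u << len)) return -1;  /* more codes than fit in len bits */
      code <<= 1;                         /* next length: append a 0 bit */
   }
   return n;
}
```

With the counts of the typical luminance DC table, {0,1,5,1,1,1,1,1,1,0,...}, the first symbol gets the 2-bit code 00, the next five get the 3-bit codes 010 through 110, and the seventh gets 1110.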
// after we see SOS
static int stbi__process_scan_header(stbi__jpeg *z)
{
   int i;
   int Ls = stbi__get16be(z->s);
   z->scan_n = stbi__get8(z->s);
   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   for (i=0; i < z->scan_n; ++i) {
      int id = stbi__get8(z->s), which;
      int q = stbi__get8(z->s);
      for (which = 0; which < z->s->img_n; ++which)
         if (z->img_comp[which].id == id)
            break;
      if (which == z->s->img_n) return 0; // no match
      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
      z->order[i] = which;
   }

   {
      int aa;
      z->spec_start = stbi__get8(z->s);
      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
      aa = stbi__get8(z->s);
      z->succ_high = (aa >> 4);
      z->succ_low  = (aa & 15);
      if (z->progressive) {
         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
            return stbi__err("bad SOS", "Corrupt JPEG");
      } else {
         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
         z->spec_end = 63;
      }
   }

   return 1;
}

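In progressive mode, the SOS fields parsed above decide what kind of pass follows. stbi__parse_entropy_coded_data dispatches on `spec_start == 0` (DC versus AC). Under the JPEG progressive rules, a nonzero high nibble (Ah, stored in `succ_high`) marks a refinement pass over bits already sent. The hypothetical classifier below applies the same range checks as the code above and then labels the scan. The enum and function names are illustrative only.

```c
/* Hypothetical scan classification for progressive JPEG. */
typedef enum {
   SCAN_DC_FIRST, SCAN_DC_REFINE, SCAN_AC_FIRST, SCAN_AC_REFINE, SCAN_INVALID
} scan_kind;

static scan_kind classify_scan(int spec_start, int spec_end, int ah_al)
{
   int succ_high = ah_al >> 4;   /* Ah: high nibble of the last SOS byte */
   int succ_low  = ah_al & 15;   /* Al: low nibble */
   /* same checks stbi__process_scan_header applies to progressive scans */
   if (spec_start > 63 || spec_end > 63 || spec_start > spec_end || succ_high > 13 || succ_low > 13)
      return SCAN_INVALID;
   if (spec_start == 0)
      return succ_high == 0 ? SCAN_DC_FIRST : SCAN_DC_REFINE;
   return succ_high == 0 ? SCAN_AC_FIRST : SCAN_AC_REFINE;
}
```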
static int stbi__process_frame_header(stbi__jpeg *z, int scan)
{
   stbi__context *s = z->s;
   int Lf,p,i,q, h_max=1,v_max=1,c;
   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   c = stbi__get8(s);
   if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
   s->img_n = c;
   for (i=0; i < c; ++i) {
      z->img_comp[i].data = NULL;
      z->img_comp[i].linebuf = NULL;
   }

   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");

   for (i=0; i < s->img_n; ++i) {
      z->img_comp[i].id = stbi__get8(s);
      if (z->img_comp[i].id != i+1)   // JFIF requires
         if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
            return stbi__err("bad component ID","Corrupt JPEG");
      q = stbi__get8(s);
      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   }

   if (scan != STBI__SCAN_load) return 1;

   if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");

   for (i=0; i < s->img_n; ++i) {
      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   }

   // compute interleaved mcu info
   z->img_h_max = h_max;
   z->img_v_max = v_max;
   z->img_mcu_w = h_max * 8;
   z->img_mcu_h = v_max * 8;
   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;

   for (i=0; i < s->img_n; ++i) {
      // number of effective pixels (e.g. for non-interleaved MCU)
      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
      // to simplify generation, we'll allocate enough memory to decode
      // the bogus oversized data from using interleaved MCUs and their
      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
      // discard the extra data until colorspace conversion
      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
      z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);

      if (z->img_comp[i].raw_data == NULL) {
         for(--i; i >= 0; --i) {
            STBI_FREE(z->img_comp[i].raw_data);
            z->img_comp[i].raw_data = NULL;
         }
         return stbi__err("outofmem", "Out of memory");
      }
      // align blocks for idct using mmx/sse
      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
      z->img_comp[i].linebuf = NULL;
      if (z->progressive) {
         z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
         z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
         z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
         if (z->img_comp[i].raw_coeff == NULL) return stbi__err("outofmem", "Out of memory");
         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
      } else {
         z->img_comp[i].coeff = 0;
         z->img_comp[i].raw_coeff = 0;
      }
   }

   return 1;
}

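stbi__process_frame_header sizes each plane to whole MCUs (w2, h2). It over-allocates by 15 bytes so the pointer can be rounded up to a 16-byte boundary for the SIMD IDCT. The hypothetical helpers below repeat both steps for the "16x16 iMCU on an image of width 33" case that the comment mentions. The sketch casts through uintptr_t rather than the original's size_t. The names padded_plane and align16 are illustrative.

```c
#include <stdint.h>

/* Hypothetical: padded plane size covering whole MCUs, so w2/h2 can
   exceed the component's effective extent. */
static void padded_plane(int img_x, int img_y, int h, int v, int h_max, int v_max,
                         int *w2, int *h2)
{
   int mcu_x = (img_x + h_max * 8 - 1) / (h_max * 8);
   int mcu_y = (img_y + v_max * 8 - 1) / (v_max * 8);
   *w2 = mcu_x * h * 8;
   *h2 = mcu_y * v * 8;
}

/* Round a pointer up to the next 16-byte boundary; allocating 15 extra
   bytes guarantees the result still points inside the allocation. */
static unsigned char *align16(unsigned char *raw)
{
   return (unsigned char *)(((uintptr_t)raw + 15) & ~(uintptr_t)15);
}
```

For a 33x17 image with 4:2:0 sampling, the luma plane is allocated 48x32 even though only 33x17 of it holds real pixels.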
// use comparisons since in some cases we handle more than one case (e.g. SOF)
#define stbi__DNL(x)         ((x) == 0xdc)
#define stbi__SOI(x)         ((x) == 0xd8)
#define stbi__EOI(x)         ((x) == 0xd9)
#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
#define stbi__SOS(x)         ((x) == 0xda)

#define stbi__SOF_progressive(x)   ((x) == 0xc2)

static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
{
   int m;
   z->marker = STBI__MARKER_none; // initialize cached marker to empty
   m = stbi__get_marker(z);
   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   if (scan == STBI__SCAN_type) return 1;
   m = stbi__get_marker(z);
   while (!stbi__SOF(m)) {
      if (!stbi__process_marker(z,m)) return 0;
      m = stbi__get_marker(z);
      while (m == STBI__MARKER_none) {
         // some files have extra padding after their blocks, so ok, we'll scan
         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
         m = stbi__get_marker(z);
      }
   }
   z->progressive = stbi__SOF_progressive(m);
   if (!stbi__process_frame_header(z, scan)) return 0;
   return 1;
}

// decode image to YCbCr format
static int stbi__decode_jpeg_image(stbi__jpeg *j)
{
   int m;
   for (m = 0; m < 4; m++) {
      j->img_comp[m].raw_data = NULL;
      j->img_comp[m].raw_coeff = NULL;
   }
   j->restart_interval = 0;
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   m = stbi__get_marker(j);
   while (!stbi__EOI(m)) {
      if (stbi__SOS(m)) {
         if (!stbi__process_scan_header(j)) return 0;
         if (!stbi__parse_entropy_coded_data(j)) return 0;
         if (j->marker == STBI__MARKER_none ) {
            // handle 0s at the end of image data from IP Kamera 9060
            while (!stbi__at_eof(j->s)) {
               int x = stbi__get8(j->s);
               if (x == 255) {
                  j->marker = stbi__get8(j->s);
                  break;
               } else if (x != 0) {
                  return stbi__err("junk before marker", "Corrupt JPEG");
               }
            }
            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
         }
      } else {
         if (!stbi__process_marker(j, m)) return 0;
      }
      m = stbi__get_marker(j);
   }
   if (j->progressive)
      stbi__jpeg_finish(j);
   return 1;
}

// static jfif-centered resampling (across block boundaries)

typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
                                    int w, int hs);

#define stbi__div4(x) ((stbi_uc) ((x) >> 2))

static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   STBI_NOTUSED(out);
   STBI_NOTUSED(in_far);
   STBI_NOTUSED(w);
   STBI_NOTUSED(hs);
   return in_near;
}

static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples vertically for every one in input
   int i;
   STBI_NOTUSED(hs);
   for (i=0; i < w; ++i)
      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   return out;
}

static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples horizontally for every one in input
   int i;
   stbi_uc *input = in_near;

   if (w == 1) {
      // if only one sample, can't do any interpolation
      out[0] = out[1] = input[0];
      return out;
   }

   out[0] = input[0];
   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   for (i=1; i < w-1; ++i) {
      int n = 3*input[i]+2;
      out[i*2+0] = stbi__div4(n+input[i-1]);
      out[i*2+1] = stbi__div4(n+input[i+1]);
   }
   // left-phase sample of the last pixel: weight 3 on the pixel itself,
   // mirroring out[1] at the left edge
   out[i*2+0] = stbi__div4(input[w-1]*3 + input[w-2] + 2);
   out[i*2+1] = input[w-1];

   STBI_NOTUSED(in_far);
   STBI_NOTUSED(hs);

   return out;
}

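stbi__resample_row_h_2 implements a 3:1 "triangle" filter. Each input pixel produces two outputs, weighted 3/4 toward itself and 1/4 toward its left or right neighbor, and the two edge samples are copied. The scalar sketch below applies those weights symmetrically at both ends and can serve as a reference when checking a row by hand. upsample_h2 is a hypothetical name, not part of stb_image.

```c
#include <stdint.h>

/* Hypothetical reference 2x horizontal upsampler with 3:1 weights.
   out must hold 2*w bytes; +2 before >>2 rounds to nearest. */
static void upsample_h2(uint8_t *out, const uint8_t *in, int w)
{
   int i;
   if (w == 1) { out[0] = out[1] = in[0]; return; }
   out[0] = in[0];
   out[1] = (uint8_t)((3 * in[0] + in[1] + 2) >> 2);
   for (i = 1; i < w - 1; ++i) {
      out[2 * i]     = (uint8_t)((3 * in[i] + in[i - 1] + 2) >> 2);
      out[2 * i + 1] = (uint8_t)((3 * in[i] + in[i + 1] + 2) >> 2);
   }
   out[2 * w - 2] = (uint8_t)((3 * in[w - 1] + in[w - 2] + 2) >> 2);
   out[2 * w - 1] = in[w - 1];
}
```

A linear ramp such as {10, 50, 90} upsamples to the evenly spaced {10, 20, 40, 60, 80, 90}, which is a quick way to spot swapped weights at either edge.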
#define stbi__div16(x) ((stbi_uc) ((x) >> 4))

static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i,t0,t1;
   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   out[0] = stbi__div4(t1+2);
   for (i=1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}

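stbi__resample_row_hv_2 keeps the vertical blend `t = 3*near + far` at 4x scale and rounds only once, with stbi__div16, after the horizontal blend. The sketch below contrasts that fused form with a two-pass version that rounds the vertical result to 8 bits first and then filters horizontally. Rounding twice can drift by one level. For near = {2, 0} and far = {0, 0}, the exact second output is 1.125; the fused form gives 1 while the two-pass form gives 2. Both function names are hypothetical, and the 64-pixel limit in the two-pass version is a sketch simplification.

```c
#include <stdint.h>

/* Hypothetical fused 2x2 upsample, same arithmetic as
   stbi__resample_row_hv_2: one rounding step at the end. */
static void hv2_fused(uint8_t *out, const uint8_t *in_near, const uint8_t *in_far, int w)
{
   int i, t0, t1;
   if (w == 1) { out[0] = out[1] = (uint8_t)((3 * in_near[0] + in_far[0] + 2) >> 2); return; }
   t1 = 3 * in_near[0] + in_far[0];
   out[0] = (uint8_t)((t1 + 2) >> 2);
   for (i = 1; i < w; ++i) {
      t0 = t1;
      t1 = 3 * in_near[i] + in_far[i];
      out[2 * i - 1] = (uint8_t)((3 * t0 + t1 + 8) >> 4);
      out[2 * i]     = (uint8_t)((3 * t1 + t0 + 8) >> 4);
   }
   out[2 * w - 1] = (uint8_t)((t1 + 2) >> 2);
}

/* Hypothetical two-pass variant: round the vertical blend to 8 bits,
   then apply a separate 3:1 horizontal blend. Sketch limit: w <= 64. */
static void hv2_two_pass(uint8_t *out, const uint8_t *in_near, const uint8_t *in_far, int w)
{
   uint8_t mid[64];
   int i;
   if (w < 1 || w > 64) return;
   for (i = 0; i < w; ++i) mid[i] = (uint8_t)((3 * in_near[i] + in_far[i] + 2) >> 2);
   if (w == 1) { out[0] = out[1] = mid[0]; return; }
   out[0] = mid[0];
   out[1] = (uint8_t)((3 * mid[0] + mid[1] + 2) >> 2);
   for (i = 1; i < w - 1; ++i) {
      out[2 * i]     = (uint8_t)((3 * mid[i] + mid[i - 1] + 2) >> 2);
      out[2 * i + 1] = (uint8_t)((3 * mid[i] + mid[i + 1] + 2) >> 2);
   }
   out[2 * w - 2] = (uint8_t)((3 * mid[w - 1] + mid[w - 2] + 2) >> 2);
   out[2 * w - 1] = mid[w - 1];
}
```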
#if defined(STBI_SSE2) || defined(STBI_NEON)
static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i=0,t0,t1;

   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   // process groups of 8 pixels for as long as we can.
   // note we can't handle the last pixel in a row in this loop
   // because we need to handle the filter boundary conditions.
   for (; i < ((w-1) & ~7); i += 8) {
#if defined(STBI_SSE2)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      __m128i zero  = _mm_setzero_si128();
      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
      __m128i diff  = _mm_sub_epi16(farw, nearw);
      __m128i nears = _mm_slli_epi16(nearw, 2);
      __m128i curr  = _mm_add_epi16(nears, diff); // current row

      // the horizontal filter works the same way, based on shifted versions of
      // the current row. "prev" is the current row shifted right by 1 pixel; we
      // need to insert the previous pixel value (from t1).
      // "next" is the current row shifted left by 1 pixel, with the first pixel
      // of the next block of 8 pixels added in.
      __m128i prv0 = _mm_slli_si128(curr, 2);
      __m128i nxt0 = _mm_srli_si128(curr, 2);
      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      __m128i bias  = _mm_set1_epi16(8);
      __m128i curs = _mm_slli_epi16(curr, 2);
      __m128i prvd = _mm_sub_epi16(prev, curr);
      __m128i nxtd = _mm_sub_epi16(next, curr);
      __m128i curb = _mm_add_epi16(curs, bias);
      __m128i even = _mm_add_epi16(prvd, curb);
      __m128i odd  = _mm_add_epi16(nxtd, curb);

      // interleave even and odd pixels, then undo scaling.
      __m128i int0 = _mm_unpacklo_epi16(even, odd);
      __m128i int1 = _mm_unpackhi_epi16(even, odd);
      __m128i de0  = _mm_srli_epi16(int0, 4);
      __m128i de1  = _mm_srli_epi16(int1, 4);

      // pack and write output
      __m128i outv = _mm_packus_epi16(de0, de1);
      _mm_storeu_si128((__m128i *) (out + i*2), outv);
#elif defined(STBI_NEON)
      // load and perform the vertical filtering pass
   2996       // this uses 3*x + y = 4*x + (y - x)
   2997       uint8x8_t farb  = vld1_u8(in_far + i);
   2998       uint8x8_t nearb = vld1_u8(in_near + i);
   2999       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
   3000       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
   3001       int16x8_t curr  = vaddq_s16(nears, diff); // current row
   3002 
   3003       // horizontal filter works the same based on shifted vers of current
   3004       // row. "prev" is current row shifted right by 1 pixel; we need to
   3005       // insert the previous pixel value (from t1).
   3006       // "next" is current row shifted left by 1 pixel, with first pixel
   3007       // of next block of 8 pixels added in.
   3008       int16x8_t prv0 = vextq_s16(curr, curr, 7);
   3009       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
   3010       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
   3011       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
   3012 
   3013       // horizontal filter, polyphase implementation since it's convenient:
   3014       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3015       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3016       // note the shared term.
   3017       int16x8_t curs = vshlq_n_s16(curr, 2);
   3018       int16x8_t prvd = vsubq_s16(prev, curr);
   3019       int16x8_t nxtd = vsubq_s16(next, curr);
   3020       int16x8_t even = vaddq_s16(curs, prvd);
   3021       int16x8_t odd  = vaddq_s16(curs, nxtd);
   3022 
   3023       // undo scaling and round, then store with even/odd phases interleaved
   3024       uint8x8x2_t o;
   3025       o.val[0] = vqrshrun_n_s16(even, 4);
   3026       o.val[1] = vqrshrun_n_s16(odd,  4);
   3027       vst2_u8(out + i*2, o);
   3028 #endif
   3029 
   3030       // "previous" value for next iter
   3031       t1 = 3*in_near[i+7] + in_far[i+7];
   3032    }
   3033 
   3034    t0 = t1;
   3035    t1 = 3*in_near[i] + in_far[i];
   3036    out[i*2] = stbi__div16(3*t1 + t0 + 8);
   3037 
   3038    for (++i; i < w; ++i) {
   3039       t0 = t1;
   3040       t1 = 3*in_near[i]+in_far[i];
   3041       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3042       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3043    }
   3044    out[w*2-1] = stbi__div4(t1+2);
   3045 
   3046    STBI_NOTUSED(hs);
   3047 
   3048    return out;
   3049 }
   3050 #endif
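The SIMD loop bound `(w-1) & ~7` rounds `w-1` down to a multiple of 8, so every 8-pixel group can read its lookahead sample at `in_near[i+8]` and the last input pixel is always left to the scalar tail, which handles the right-edge boundary. The hypothetical helpers below model only that index arithmetic (no intrinsics) and check that the lookahead never reads past `w-1`.

```c
#include <assert.h>

// Number of input pixels the 8-wide loop consumes for a row of width w
// (w >= 2; the kernel returns early for w == 1).
static int hv2_simd_pixels(int w)
{
   assert(w >= 2);
   return (w - 1) & ~7;
}

// Returns 1 if, for every width in [2, max_w], each 8-wide group starting
// at i reads its lookahead sample i+8 at or before w-1, and the scalar tail
// still owns at least the last pixel.
static int hv2_simd_bounds_ok(int max_w)
{
   int w, i;
   for (w = 2; w <= max_w; ++w) {
      int end = hv2_simd_pixels(w);
      for (i = 0; i < end; i += 8)
         if (i + 8 > w - 1) return 0;   // lookahead would overrun the row
      if (end > w - 1) return 0;        // tail must handle the last pixel
   }
   return 1;
}
```

So an 8-pixel row is handled entirely by the scalar tail, while a 9-pixel row gets one SIMD group plus one scalar pixel.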

static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}
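For sampling factors the specialized kernels don't cover (for instance a horizontal factor of 3 or 4), the generic path replicates each input sample `hs` times and ignores `in_far`, so there is no blending at all. A standalone sketch of the same replication, under a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

// Nearest-neighbor horizontal upsample: each of the w input samples is
// written hs times in a row. out must hold w*hs bytes.
static void replicate_row(uint8_t *out, const uint8_t *in, int w, int hs)
{
   int i, j;
   assert(w >= 0 && hs >= 1);
   for (i = 0; i < w; ++i)
      for (j = 0; j < hs; ++j)
         out[i*hs + j] = in[i];
}
```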

#ifdef STBI_JPEG_OLD
// this is the YCbCr-to-RGB calculation that stb_image used historically,
// before the algorithm changes in 1.49
#define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 16) + 32768; // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr*float2fixed(1.40200f);
      g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
      b = y_fixed                            + cb*float2fixed(1.77200f);
      r >>= 16;
      g >>= 16;
      b >>= 16;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#else
// this is a reduced-precision YCbCr-to-RGB calculation, introduced to make
// sure the scalar and SIMD paths produce the same results
#define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed +  cr* float2fixed(1.40200f);
      g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                               +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
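The reduced-precision path stores each coefficient as a 12-bit fraction (`x * 4096`, rounded) shifted into a 20-bit fixed-point frame and rounds by adding `1<<19` before the final `>> 20`. Rounding a coefficient to the nearest 1/4096 costs at most 1/64 per chroma term for |Cb|, |Cr| <= 128, and the green-channel mask drops less than 1/16, so the total error stays below 0.1 and the result should never differ from a double-precision reference by more than 1. The sketch below checks that claim over every (Y, Cb, Cr) triple; the names are hypothetical, and like the original it assumes arithmetic right shift of negative `int`s.

```c
#include <assert.h>

// 12-bit coefficient placed in a 20-bit fixed-point frame, as in the
// non-STBI_JPEG_OLD scalar path.
#define FIX20(x) (((int) ((x) * 4096.0f + 0.5f)) << 8)

static int clamp255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

// Reduced-precision integer conversion of one pixel.
static void ycc_fixed(int y, int cb, int cr, int rgb[3])
{
   int y_fixed = (y << 20) + (1 << 19);   // + 0.5 for rounding
   int g_cb;
   cb -= 128;
   cr -= 128;
   g_cb = cb * -FIX20(0.34414f);
   g_cb -= g_cb & 0xffff;                 // clear the low 16 bits, like & 0xffff0000
   rgb[0] = clamp255((y_fixed + cr *  FIX20(1.40200f)) >> 20);
   rgb[1] = clamp255((y_fixed + cr * -FIX20(0.71414f) + g_cb) >> 20);
   rgb[2] = clamp255((y_fixed + cb *  FIX20(1.77200f)) >> 20);
}

// Double-precision reference with round-half-up and clamping.
static int round_clamp(double v)
{
   if (v <= 0.0) return 0;
   if (v >= 255.0) return 255;
   return (int) (v + 0.5);
}

static void ycc_float(int y, int cb, int cr, int rgb[3])
{
   double fcb = cb - 128.0, fcr = cr - 128.0;
   rgb[0] = round_clamp(y + 1.40200 * fcr);
   rgb[1] = round_clamp(y - 0.71414 * fcr - 0.34414 * fcb);
   rgb[2] = round_clamp(y + 1.77200 * fcb);
}

// Largest per-channel difference between the two over all 2^24 inputs.
static int ycc_max_diff(void)
{
   int y, cb, cr, c, worst = 0;
   for (y = 0; y < 256; ++y)
      for (cb = 0; cb < 256; ++cb)
         for (cr = 0; cr < 256; ++cr) {
            int a[3], b[3];
            ycc_fixed(y, cb, cr, a);
            ycc_float(y, cb, cr, b);
            for (c = 0; c < 3; ++c) {
               int d = a[c] > b[c] ? a[c] - b[c] : b[c] - a[c];
               if (d > worst) worst = d;
            }
         }
   return worst;
}
```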

#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and I'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example),
   // so only the step == 4 case is accelerated.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step == 3 support would be easy to add, but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* float2fixed(1.40200f);
      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                             +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
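The SSE2 loop works in 16-bit lanes: interleaving the constant 128 below each Y byte gives `y*256 + 128`, which `>> 4` turns into `y*16 + 8` (the rounding bias), and placing the biased chroma in the high byte gives `c*256`, so `_mm_mulhi_epi16` against a 12-bit coefficient `k` yields `floor(c*k/256)`, already in 1/16 units. The sketch below emulates one lane in portable C (no intrinsics; names are hypothetical, arithmetic right shift assumed) and compares it with the reduced-precision scalar formula for every input. The two should agree exactly, because `floor((n + x)/16) == floor((n + floor(x))/16)` for integer `n`; the `& 0xffff0000` mask on the Cb term of green floors that term to whole 1/16 units, leaving only one fractional term for the final shift to absorb. This reasoning covers the SSE2 path only, not NEON, which rounds differently.

```c
#include <assert.h>
#include <stdint.h>

// One lane of _mm_mulhi_epi16: high 16 bits of the signed 32-bit product.
static int16_t mulhi16(int16_t a, int16_t b)
{
   return (int16_t) (((int32_t) a * b) >> 16);
}

// Signed 16-bit to unsigned 8-bit saturation, as in _mm_packus_epi16.
static int sat_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

// Scalar model of one lane of the SSE2 loop.
static void ycc_sse2_lane(int y, int cb, int cr, int rgb[3])
{
   const int16_t cr0 = 5743, cr1 = -2925, cb0 = -1410, cb1 = 7258; // round(k*4096)
   int16_t yws = (int16_t) (((y << 8) | 128) >> 4);   // y*16 + 8
   int16_t crw = (int16_t) ((cr - 128) * 256);         // biased chroma in the high byte
   int16_t cbw = (int16_t) ((cb - 128) * 256);
   int rws = yws + mulhi16(cr0, crw);
   int gws = yws + mulhi16(cb0, cbw) + mulhi16(crw, cr1);
   int bws = yws + mulhi16(cbw, cb1);
   rgb[0] = sat_u8(rws >> 4);                          // _mm_srai_epi16 by 4, then pack
   rgb[1] = sat_u8(gws >> 4);
   rgb[2] = sat_u8(bws >> 4);
}

// Reduced-precision scalar formula with the same constants.
static void ycc_scalar(int y, int cb, int cr, int rgb[3])
{
   int y_fixed = (y << 20) + (1 << 19);
   int g_cb;
   cb -= 128;
   cr -= 128;
   g_cb = cb * -(1410 << 8);
   g_cb -= g_cb & 0xffff;                              // same effect as & 0xffff0000
   rgb[0] = sat_u8((y_fixed + cr * (5743 << 8)) >> 20);
   rgb[1] = sat_u8((y_fixed + cr * -(2925 << 8) + g_cb) >> 20);
   rgb[2] = sat_u8((y_fixed + cb * (7258 << 8)) >> 20);
}

// Returns 1 if the lane model and the scalar formula agree on every input.
static int ycc_lane_matches_scalar(void)
{
   int y, cb, cr;
   for (y = 0; y < 256; ++y)
      for (cb = 0; cb < 256; ++cb)
         for (cr = 0; cr < 256; ++cr) {
            int a[3], b[3];
            ycc_sse2_lane(y, cb, cr, a);
            ycc_scalar(y, cb, cr, b);
            if (a[0] != b[0] || a[1] != b[1] || a[2] != b[2]) return 0;
         }
   return 1;
}
```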

// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      #ifndef STBI_JPEG_OLD
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      #endif
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   #ifndef STBI_JPEG_OLD
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   #endif
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}
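`stbi__setup_jpeg` chooses each kernel once and stores it as a function pointer in the decoder, so the per-row loops never re-test CPU features. A minimal sketch of that pattern, with hypothetical types and a stubbed capability flag (in the code above, SSE2 kernels are gated on `stbi__sse2_available()`, while NEON kernels are installed whenever `STBI_NEON` is defined):

```c
#include <assert.h>

typedef int (*row_kernel)(int x);   // hypothetical kernel signature

static int row_scalar(int x) { return x + 1; }
static int row_simd(int x)   { return x + 1; }   // stand-in; must match row_scalar's results

typedef struct {
   row_kernel row;
} decoder_kernels;

// Install the portable default first, then override it when the (stubbed)
// capability check reports that a faster version is usable.
static void setup_kernels(decoder_kernels *k, int simd_available)
{
   assert(k);
   k->row = row_scalar;
   if (simd_available)
      k->row = row_simd;
}
```

Assigning the scalar kernel first means a build without SIMD support, or a CPU that fails the check, still ends up with a working kernel.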

// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   int i;
   for (i=0; i < j->s->img_n; ++i) {
      if (j->img_comp[i].raw_data) {
         STBI_FREE(j->img_comp[i].raw_data);
         j->img_comp[i].raw_data = NULL;
         j->img_comp[i].data = NULL;
      }
      if (j->img_comp[i].raw_coeff) {
         STBI_FREE(j->img_comp[i].raw_coeff);
         j->img_comp[i].raw_coeff = 0;
         j->img_comp[i].coeff = 0;
      }
      if (j->img_comp[i].linebuf) {
         STBI_FREE(j->img_comp[i].linebuf);
         j->img_comp[i].linebuf = NULL;
      }
   }
}

typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n;

   if (z->s->img_n == 3 && n < 3)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4];

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // nothing after this allocation can fail, so the output buffer never
      // needs to be freed on an error path
      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
      for (j=0; j < z->s->img_y; ++j) {
         stbi_uc *out = output + n * z->s->img_x * j;
         for (k=0; k < decode_n; ++k) {
            stbi__resample *r = &res_comp[k];
            int y_bot = r->ystep >= (r->vs >> 1);
            coutput[k] = r->resample(z->img_comp[k].linebuf,
                                     y_bot ? r->line1 : r->line0,
                                     y_bot ? r->line0 : r->line1,
                                     r->w_lores, r->hs);
            if (++r->ystep >= r->vs) {
               r->ystep = 0;
               r->line0 = r->line1;
               if (++r->ypos < z->img_comp[k].y)
                  r->line1 += z->img_comp[k].w2;
            }
         }
         if (n >= 3) {
            stbi_uc *y = coutput[0];
            if (z->s->img_n == 3) {
               z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            stbi_uc *y = coutput[0];
            if (n == 1)
               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
            else
               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp  = z->s->img_n; // report original components, not output
      return output;
   }
}
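The `ystep`/`line0`/`line1` bookkeeping decides which two stored rows of a subsampled component feed each output row. The sketch below replays the same state machine with row indices in place of pointers (the function name is hypothetical) so the pairing can be inspected: for `vs == 2`, each stored row is the "near" row for two consecutive output rows, and the first and last output rows reuse the edge row as their "far" neighbor.

```c
#include <assert.h>

// Replays the per-component row bookkeeping from load_jpeg_image.
// comp_rows is the number of stored rows for the component (img_comp[k].y);
// near_out[j] and far_out[j] receive the row indices used for output row j.
static void replay_rows(int vs, int comp_rows, int out_rows,
                        int *near_out, int *far_out)
{
   int ystep = vs >> 1, ypos = 0, line0 = 0, line1 = 0, j;
   assert(vs >= 1 && comp_rows >= 1);
   for (j = 0; j < out_rows; ++j) {
      int y_bot = ystep >= (vs >> 1);
      near_out[j] = y_bot ? line1 : line0;
      far_out[j]  = y_bot ? line0 : line1;
      if (++ystep >= vs) {
         ystep = 0;
         line0 = line1;
         if (++ypos < comp_rows)
            line1 += 1;   // the decoder advances by img_comp[k].w2 bytes instead
      }
   }
}
```

With `vs == 2` and two stored rows, the four output rows get (near, far) pairs (0,0), (0,1), (1,0), (1,1).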

static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   return load_jpeg_image(&j, x,y,comp,req_comp);
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   stbi__rewind(s);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__jpeg j;
   j.s = s;
   return stbi__jpeg_info_raw(&j, x, y, comp);
}
#endif

// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// the fast path is quicker to check than the JPEG Huffman decoder's, but the slow path is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (JPEG packs bits starting from the most significant end, zlib from the
// least significant end, so the two can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;

stbi_inline static int stbi__bitreverse16(int n)
{
   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
   return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
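DEFLATE stores Huffman codes most-significant bit first inside a stream that is otherwise read least-significant bit first, so the low bits of the bit buffer hold a code in reversed order. `stbi__bitreverse16` swaps adjacent bits, then bit pairs, nibbles and bytes; reversing fewer bits is a 16-bit reverse followed by a right shift. A standalone copy for experimentation, under hypothetical names:

```c
#include <assert.h>

// Reverse the low 16 bits of n with four swap stages.
static int rev16(int n)
{
   n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);   // swap adjacent bits
   n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);   // swap bit pairs
   n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);   // swap nibbles
   n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);   // swap bytes
   return n;
}

// Reverse only the low `bits` bits of v (1 <= bits <= 16).
static int rev_bits(int v, int bits)
{
   assert(bits >= 1 && bits <= 16);
   return rev16(v) >> (16 - bits);
}
```

For example, the 3-bit code `110` reverses to `011`, so `rev_bits(6, 3)` is 3.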

static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      if (sizes[i] > (1 << i))
         return stbi__err("bad sizes", "Corrupt PNG");
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int j = stbi__bit_reverse(next_code[s],s);
            while (j < (1 << STBI__ZFAST_BITS)) {
               z->fast[j] = fastv;
               j += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
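The `next_code` loop is the canonical-code construction from RFC 1951, section 3.2.2: codes of each length are consecutive integers, and the first code of length `i+1` is `(first code of length i + number of length-i codes) << 1`. The sketch below (a hypothetical helper, not the library's API) assigns codes the same way and reproduces the RFC's worked example, in which symbols A..H have lengths 3, 3, 3, 3, 3, 2, 4, 4 and receive codes 010, 011, 100, 101, 110, 00, 1110, 1111.

```c
#include <assert.h>
#include <string.h>

// Assign canonical codes (MSB-first values) to num symbols whose lengths
// are 0..15, with 0 meaning "unused" (codes[i] = -1). Returns 0 if the
// lengths oversubscribe the code space.
static int canonical_codes(const unsigned char *lengths, int num, int *codes)
{
   int count[16], next_code[16], i, code = 0;
   memset(count, 0, sizeof(count));
   for (i = 0; i < num; ++i) {
      assert(lengths[i] < 16);
      ++count[lengths[i]];
   }
   count[0] = 0;
   for (i = 1; i < 16; ++i) {
      next_code[i] = code;                              // first code of length i
      code += count[i];
      if (count[i] && code - 1 >= (1 << i)) return 0;  // more codes than i bits allow
      code <<= 1;                                       // first code of length i+1
   }
   for (i = 0; i < num; ++i)
      codes[i] = lengths[i] ? next_code[lengths[i]]++ : -1;
   return 1;
}
```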

// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG to read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   if (z->zbuffer >= z->zbuffer_end) return 0;
   return *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}
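`stbi__fill_bits` appends whole bytes above the bits already buffered, and `stbi__zreceive` takes values from the bottom, which matches DEFLATE's least-significant-bit-first packing; past the end of input, `stbi__zget8` supplies zeros. A reduced sketch with a hypothetical `lsb_reader` type that refills one byte at a time rather than topping up past 24 bits:

```c
#include <assert.h>

typedef struct {
   const unsigned char *p, *end;
   unsigned int buf;   // buffered bits; the next bit to read is bit 0
   int nbits;          // number of valid bits in buf
} lsb_reader;

// Read n (0..24) bits, least significant first.
static unsigned int lsb_get(lsb_reader *r, int n)
{
   unsigned int v;
   assert(n >= 0 && n <= 24);
   while (r->nbits < n) {
      unsigned int byte = r->p < r->end ? *r->p++ : 0;  // zero past the end
      r->buf |= byte << r->nbits;                        // new byte goes above buffered bits
      r->nbits += 8;
   }
   v = r->buf & ((1u << n) - 1);
   r->buf >>= n;
   r->nbits -= n;
   return v;
}
```

Reading 3 bits and then 5 bits from the byte `0xB5` (binary `10110101`) yields 5 (`101`) and then 22 (`10110`).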

static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s == 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   STBI_ASSERT(z->size[b] == s);
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) stbi__fill_bits(a);
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}
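The fast table is indexed by the next `STBI__ZFAST_BITS` (9) bits of the stream. Each entry packs the code length above bit 9 and the symbol in the low 9 bits, so an entry of 0 can safely mean "not here, take the slow path". Because codes arrive bit-reversed, a code of length `s` fills every slot congruent to its reversed value modulo `1 << s`. The sketch below (hypothetical names; it assumes all codes fit in 9 bits, so there is no slow path) builds such a table for the RFC 1951 example code (A..H with codes 010, 011, 100, 101, 110, 00, 1110, 1111) and decodes a hand-packed stream holding F, A, H.

```c
#include <assert.h>
#include <string.h>

#define FAST_BITS 9

// Reverse the low `bits` bits of v.
static int rev_n(int v, int bits)
{
   int r = 0, i;
   for (i = 0; i < bits; ++i)
      r = (r << 1) | ((v >> i) & 1);
   return r;
}

// Fill a 512-entry table: entry = (length << 9) | symbol, stored at every
// slot whose low `length` bits equal the bit-reversed code.
static void build_fast(unsigned short *fast, const int *codes,
                       const unsigned char *lengths, int num)
{
   int i, j;
   memset(fast, 0, sizeof(unsigned short) << FAST_BITS);
   for (i = 0; i < num; ++i) {
      int s = lengths[i];
      assert(s >= 1 && s <= FAST_BITS);   // this sketch has no slow path
      for (j = rev_n(codes[i], s); j < (1 << FAST_BITS); j += 1 << s)
         fast[j] = (unsigned short) ((s << 9) | i);
   }
}

// Decode one symbol from the low bits of *buf and consume its length.
static int decode_fast(const unsigned short *fast, unsigned int *buf)
{
   int e = fast[*buf & ((1u << FAST_BITS) - 1)];
   assert(e != 0);                       // 0 would mean "use the slow path"
   *buf >>= e >> 9;
   return e & 511;
}
```

The test stream is 488: F (`00`) in bits 0-1, A (`010`) in bits 2-4, and H (`1111`) in bits 5-8, each code written MSB-first into the LSB-first stream.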

static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   int cur, limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (int) (z->zout     - z->zout_start);
   limit = (int) (z->zout_end - z->zout_start);
   while (cur + n > limit)
      limit *= 2;
   q = (char *) STBI_REALLOC(z->zout_start, limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}
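`stbi__zexpand` doubles the capacity until the pending write fits, reallocates once, and rebuilds the write pointer from the saved offset `cur`, because `realloc` may move the block. The same growth policy in a self-contained sketch with a hypothetical `growbuf` type, which keeps the write position as an offset (`used`) from the start; it begins doubling from 1 when the capacity is zero, since doubling zero would never terminate.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
   char *start;
   size_t used, cap;
} growbuf;

// Make room for n more bytes by doubling the capacity, then reallocating
// once. Returns 0 on allocation failure and leaves the buffer unchanged.
static int growbuf_reserve(growbuf *b, size_t n)
{
   size_t cap = b->cap ? b->cap : 1;
   char *q;
   assert(b->used <= b->cap);
   while (b->used + n > cap)
      cap *= 2;
   if (cap == b->cap) return 1;
   q = (char *) realloc(b->start, cap);
   if (q == NULL) return 0;
   b->start = q;                       // any old pointer into the block is now stale
   b->cap = cap;
   return 1;
}
```

Doubling keeps the total copying cost of repeated appends linear in the final size.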

static int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
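Length symbols 257..285 index `stbi__zlength_base` and `stbi__zlength_extra` after subtracting 257, distance symbols 0..29 index the distance tables directly, and the value of the extra bits read from the stream is added to the base. Symbol 265, for example, has base 11 and one extra bit, so it covers lengths 11 and 12. A sketch with the four tables copied from above, trimmed to their valid entries (the helper names are hypothetical):

```c
#include <assert.h>

static const int len_base[29] = {
   3,4,5,6,7,8,9,10,11,13,15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258 };
static const int len_extra[29] = {
   0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0 };
static const int dist_base[30] = {
   1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
   257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577 };
static const int dist_extra[30] = {
   0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13 };

// Match length for symbol 257..285 given the value of its extra bits;
// -1 for the reserved symbols 286/287 or an out-of-range extra value.
static int deflate_length(int sym, int extra)
{
   int k = sym - 257;
   if (k < 0 || k >= 29) return -1;
   if (extra < 0 || extra >= (1 << len_extra[k])) return -1;
   return len_base[k] + extra;
}

// Match distance for symbol 0..29; -1 for the reserved symbols 30/31.
static int deflate_distance(int sym, int extra)
{
   if (sym < 0 || sym >= 30) return -1;
   if (extra < 0 || extra >= (1 << dist_extra[sym])) return -1;
   return dist_base[sym] + extra;
}
```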

static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // length codes 286/287 never occur in valid data
         z -= 257;
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // distance codes 30/31 never occur in valid data
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            if (len) { do *zout++ = v; while (--len); }
         } else {
            if (len) { do *zout++ = *p++; while (--len); }
         }
      }
   }
}
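The back-reference copy runs byte by byte on purpose: when `dist < len` the source overlaps bytes the same copy is still writing, which is how DEFLATE encodes repeated patterns, and `memcpy` is undefined for overlapping regions. A minimal sketch with a hypothetical helper that also rejects distances reaching before the start of the output, like the "bad dist" check above:

```c
#include <assert.h>
#include <string.h>

// Append len bytes copied from dist bytes back, one byte at a time so an
// overlapping source repeats the pattern. out must have room for pos+len
// bytes. Returns the new write position, or -1 if dist reaches before the
// start of the output.
static int lz77_copy(char *out, int pos, int dist, int len)
{
   int i;
   assert(out != NULL && len >= 0);
   if (dist < 1 || dist > pos) return -1;
   for (i = 0; i < len; ++i, ++pos)
      out[pos] = out[pos - dist];   // may read a byte written earlier in this loop
   return pos;
}
```

Starting from `"ab"`, a copy with distance 2 and length 5 produces `"abababa"`; distance 1 repeats the last byte, the common run case the original special-cases.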

static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < hlit + hdist) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else {
         stbi_uc fill = 0;
         if (c == 16) {
            c = stbi__zreceive(a,2)+3;
            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG"); // nothing to repeat yet
            fill = lencodes[n-1];
         } else if (c == 17) {
            c = stbi__zreceive(a,3)+3;
         } else {
            STBI_ASSERT(c == 18);
            c = stbi__zreceive(a,7)+11;
         }
         if (hlit + hdist - n < c) return stbi__err("bad codelengths", "Corrupt PNG"); // run would overrun lencodes
         memset(lencodes+n, fill, c);
         n += c;
      }
   }
   if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}
   3735 
static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   STBI_ASSERT(a->num_bits == 0);
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}

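/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * a stored (BTYPE=00) DEFLATE block begins, after byte alignment, with LEN
 * and NLEN as little-endian 16-bit values, and NLEN must be the one's
 * complement of LEN. This repeats the check from the block parser above on
 * two 4-byte headers. The example_* name is hypothetical.
 */
#if 0
#include <assert.h>

/* returns LEN, or -1 if NLEN is not the one's complement of LEN */
static int example_stored_block_len(const unsigned char h[4])
{
   int len  = h[1] * 256 + h[0];
   int nlen = h[3] * 256 + h[2];
   return (nlen == (len ^ 0xffff)) ? len : -1;
}

int main(void)
{
   const unsigned char ok[4]  = { 0x05, 0x00, 0xfa, 0xff }; /* LEN=5, NLEN=0xfffa */
   const unsigned char bad[4] = { 0x05, 0x00, 0x00, 0x00 }; /* NLEN does not match */
   assert(example_stored_block_len(ok) == 5);
   assert(example_stored_block_len(bad) == -1);
   return 0;
}
#endif
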
static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}

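/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * the three checks from stbi__parse_zlib_header (RFC 1950 FCHECK, FDICT bit,
 * CM == 8) applied to common two-byte zlib headers. For 0x78 0x9c,
 * 0x789c = 30876 = 31 * 996, so FCHECK passes. The example_* name is
 * hypothetical.
 */
#if 0
#include <assert.h>

static int example_zlib_header_ok(int cmf, int flg)
{
   if ((cmf*256 + flg) % 31 != 0) return 0; /* FCHECK */
   if (flg & 32) return 0;                  /* FDICT: preset dictionary */
   if ((cmf & 15) != 8) return 0;           /* CM must be 8 (DEFLATE) */
   return 1;
}

int main(void)
{
   assert(example_zlib_header_ok(0x78, 0x9c));
   assert(example_zlib_header_ok(0x78, 0x01));
   assert(!example_zlib_header_ok(0x78, 0x9d)); /* 30877 % 31 == 1 */
   return 0;
}
#endif
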
// @TODO: should statically initialize these for optimal thread safety
static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
static void stbi__init_zdefaults(void)
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}

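/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * the fixed literal/length code lengths set up by stbi__init_zdefaults
 * (RFC 1951 section 3.2.6) form a complete prefix code. The sum of
 * 2^-length over all 288 symbols, counted in units of 2^-9, is exactly
 * 2^9 = 512: 144*2 + 112*1 + 24*4 + 8*2.
 */
#if 0
#include <assert.h>

int main(void)
{
   unsigned char len[288];
   int i;
   long kraft = 0;
   for (i=0; i <= 143; ++i) len[i] = 8;
   for (   ; i <= 255; ++i) len[i] = 9;
   for (   ; i <= 279; ++i) len[i] = 7;
   for (   ; i <= 287; ++i) len[i] = 8;
   for (i=0; i < 288; ++i)
      kraft += 1L << (9 - len[i]);  /* 2^-len in units of 2^-9 */
   assert(kraft == 512);
   return 0;
}
#endif
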
static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncomperssed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}

static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
{
   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}

STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(16384);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer+len;
   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}
#endif

// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
//    simple implementation
//      - only 1/2/4/8-bit samples (no 16-bit)
//      - no CRC checking
//      - allocates lots of intermediate memory
//        - avoids problem of streaming data between subsystems
//        - avoids explicit window management
//    performance
//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding

#ifndef STBI_NO_PNG
typedef struct
{
   stbi__uint32 length;
   stbi__uint32 type;
} stbi__pngchunk;

static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
{
   stbi__pngchunk c;
   c.length = stbi__get32be(s);
   c.type   = stbi__get32be(s);
   return c;
}

static int stbi__check_png_header(stbi__context *s)
{
   static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   int i;
   for (i=0; i < 8; ++i)
      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   return 1;
}

typedef struct
{
   stbi__context *s;
   stbi_uc *idata, *expanded, *out;
} stbi__png;


enum {
   STBI__F_none=0,
   STBI__F_sub=1,
   STBI__F_up=2,
   STBI__F_avg=3,
   STBI__F_paeth=4,
   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static stbi_uc first_row_filter[5] =
{
   STBI__F_none,
   STBI__F_sub,
   STBI__F_none,
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static int stbi__paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p-a);
   int pb = abs(p-b);
   int pc = abs(p-c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

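/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * reconstructing a single byte with each of the five PNG filter types, where
 * a = byte to the left, b = byte above, c = byte above-left. Results wrap
 * modulo 256, as STBI__BYTECAST does. The Paeth predictor picks whichever of
 * a, b, c is closest to p = a + b - c, preferring a, then b, on ties.
 * The example_* names are hypothetical.
 */
#if 0
#include <assert.h>
#include <stdlib.h>

/* same predictor as stbi__paeth */
static int example_paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p-a), pb = abs(p-b), pc = abs(p-c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

static unsigned char example_unfilter(int filter, int raw, int a, int b, int c)
{
   switch (filter) {
      case 0:  return (unsigned char) raw;                             /* none  */
      case 1:  return (unsigned char) (raw + a);                       /* sub   */
      case 2:  return (unsigned char) (raw + b);                       /* up    */
      case 3:  return (unsigned char) (raw + ((a + b) >> 1));          /* avg   */
      case 4:  return (unsigned char) (raw + example_paeth(a, b, c));  /* paeth */
      default: return 0; /* stb_image rejects filter > 4 before this point */
   }
}

int main(void)
{
   assert(example_paeth(10, 20, 20) == 10);  /* p = 10: a is exact */
   assert(example_paeth(10, 20, 10) == 20);  /* p = 20: b is exact */
   assert(example_paeth(10, 20, 15) == 15);  /* p = 15: c is exact */
   assert(example_unfilter(3, 1, 10, 20, 0) == 16);  /* 1 + (30 >> 1) */
   assert(example_unfilter(1, 250, 10, 0, 0) == 4);  /* (250 + 10) mod 256 */
   return 0;
}
#endif
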
static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };

// create the png image from the zlib-decompressed (inflated) data
static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
{
   stbi__context *s = a->s;
   stbi__uint32 i,j,stride = x*out_n;
   stbi__uint32 img_len, img_width_bytes;
   int k;
   int img_n = s->img_n; // copy it into a local for later

   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
   if (!a->out) return stbi__err("outofmem", "Out of memory");

   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   img_len = (img_width_bytes + 1) * y;
   if (s->img_x == x && s->img_y == y) {
      if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
   } else { // interlaced:
      if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   }

   for (j=0; j < y; ++j) {
      stbi_uc *cur = a->out + stride*j;
      stbi_uc *prior = cur - stride;
      int filter = *raw++;
      int filter_bytes = img_n;
      int width = x;
      if (filter > 4)
         return stbi__err("invalid filter","Corrupt PNG");

      if (depth < 8) {
         STBI_ASSERT(img_width_bytes <= x);
         cur += x*out_n - img_width_bytes; // store output to the rightmost img_width_bytes bytes, so we can decode in place
         filter_bytes = 1;
         width = img_width_bytes;
      }
      prior = cur - stride; // recompute after the 'cur +=' above, so 1/2/4-bit rows read the previous row's packed bytes

      // if first row, use special filter that doesn't sample previous row
      if (j == 0) filter = first_row_filter[filter];

      // handle first byte explicitly
      for (k=0; k < filter_bytes; ++k) {
         switch (filter) {
            case STBI__F_none       : cur[k] = raw[k]; break;
            case STBI__F_sub        : cur[k] = raw[k]; break;
            case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
            case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
            case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
            case STBI__F_avg_first  : cur[k] = raw[k]; break;
            case STBI__F_paeth_first: cur[k] = raw[k]; break;
         }
      }

      if (depth == 8) {
         if (img_n != out_n)
            cur[img_n] = 255; // first pixel
         raw += img_n;
         cur += out_n;
         prior += out_n;
      } else {
         raw += 1;
         cur += 1;
         prior += 1;
      }

      // this is a little gross, so that we don't switch per-pixel or per-component
      if (depth < 8 || img_n == out_n) {
         int nk = (width - 1)*img_n;
         #define CASE(f) \
             case f:     \
                for (k=0; k < nk; ++k)
         switch (filter) {
            // "none" filter turns into a memcpy here; make that explicit.
            case STBI__F_none:         memcpy(cur, raw, nk); break;
            CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
            CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
            CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
            CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
            CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
            CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
         }
         #undef CASE
         raw += nk;
      } else {
         STBI_ASSERT(img_n+1 == out_n);
         #define CASE(f) \
             case f:     \
                for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
                   for (k=0; k < img_n; ++k)
         switch (filter) {
            CASE(STBI__F_none)         cur[k] = raw[k]; break;
            CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
            CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
            CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
            CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
            CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
            CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
         }
         #undef CASE
      }
   }

   // we make a separate pass to expand bits to pixels; for performance,
   // this could run two scanlines behind the above code, so it won't
   // interfere with filtering but will still be in the cache.
   if (depth < 8) {
      for (j=0; j < y; ++j) {
         stbi_uc *cur = a->out + stride*j;
         stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
         // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
         // png guarantees byte alignment; if width is not a multiple of 8/4/2, we'll decode dummy trailing data that will be skipped in the later loop
         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range

         // note that the final byte might overshoot and write more data than desired.
         // we can allocate enough data that this never writes out of memory, but it
         // could also overwrite the next scanline. can it overwrite non-empty data
         // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
         // so we need to explicitly clamp the final ones

         if (depth == 4) {
            for (k=x*img_n; k >= 2; k-=2, ++in) {
               *cur++ = scale * ((*in >> 4)       );
               *cur++ = scale * ((*in     ) & 0x0f);
            }
            if (k > 0) *cur++ = scale * ((*in >> 4)       );
         } else if (depth == 2) {
            for (k=x*img_n; k >= 4; k-=4, ++in) {
               *cur++ = scale * ((*in >> 6)       );
               *cur++ = scale * ((*in >> 4) & 0x03);
               *cur++ = scale * ((*in >> 2) & 0x03);
               *cur++ = scale * ((*in     ) & 0x03);
            }
            if (k > 0) *cur++ = scale * ((*in >> 6)       );
            if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
            if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
         } else if (depth == 1) {
            for (k=x*img_n; k >= 8; k-=8, ++in) {
               *cur++ = scale * ((*in >> 7)       );
               *cur++ = scale * ((*in >> 6) & 0x01);
               *cur++ = scale * ((*in >> 5) & 0x01);
               *cur++ = scale * ((*in >> 4) & 0x01);
               *cur++ = scale * ((*in >> 3) & 0x01);
               *cur++ = scale * ((*in >> 2) & 0x01);
               *cur++ = scale * ((*in >> 1) & 0x01);
               *cur++ = scale * ((*in     ) & 0x01);
            }
            if (k > 0) *cur++ = scale * ((*in >> 7)       );
            if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
            if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
            if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
            if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
            if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
            if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
         }
         if (img_n != out_n) {
            int q;
            // insert alpha = 255
            cur = a->out + stride*j;
            if (img_n == 1) {
               for (q=x-1; q >= 0; --q) {
                  cur[q*2+1] = 255;
                  cur[q*2+0] = cur[q];
               }
            } else {
               STBI_ASSERT(img_n == 3);
               for (q=x-1; q >= 0; --q) {
                  cur[q*4+3] = 255;
                  cur[q*4+2] = cur[q*3+2];
                  cur[q*4+1] = cur[q*3+1];
                  cur[q*4+0] = cur[q*3+0];
               }
            }
         }
      }
   }

   return 1;
}

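/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * unpacking one byte of 2-bit grayscale samples, most significant bits
 * first, and scaling each sample to 0..255 with the depth-2 entry of
 * stbi__depth_scale_table (0x55, because 3 * 0x55 == 0xff). The same idea
 * gives 0x11 for depth 4 (15 * 0x11 == 0xff) and 0xff for depth 1.
 */
#if 0
#include <assert.h>

int main(void)
{
   unsigned char in = 0x1b;           /* binary 00 01 10 11 -> samples 0,1,2,3 */
   const unsigned char scale = 0x55;  /* stbi__depth_scale_table[2] */
   unsigned char out[4];
   int k;
   for (k=0; k < 4; ++k)
      out[k] = (unsigned char) (scale * ((in >> (6 - 2*k)) & 0x03));
   assert(out[0] == 0x00 && out[1] == 0x55 && out[2] == 0xaa && out[3] == 0xff);
   return 0;
}
#endif
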
static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
{
   stbi_uc *final;
   int p;
   if (!interlaced)
      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);

   // de-interlacing
   final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
   if (!final) return stbi__err("outofmem", "Out of memory");
   for (p=0; p < 7; ++p) {
      int xorig[] = { 0,4,0,2,0,1,0 };
      int yorig[] = { 0,0,4,0,2,0,1 };
      int xspc[]  = { 8,8,4,4,2,2,1 };
      int yspc[]  = { 8,8,8,4,4,2,2 };
      int i,j,x,y;
      // e.g. pass 1 (xorig=4, xspc=8) has 0 columns for img_x=4, and 1 column for img_x=5 or img_x=12
      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
      if (x && y) {
         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
            STBI_FREE(final);
            return 0;
         }
         for (j=0; j < y; ++j) {
            for (i=0; i < x; ++i) {
               int out_y = j*yspc[p]+yorig[p];
               int out_x = i*xspc[p]+xorig[p];
               memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
                      a->out + (j*x+i)*out_n, out_n);
            }
         }
         STBI_FREE(a->out);
         image_data += img_len;
         image_data_len -= img_len;
      }
   }
   a->out = final;

   return 1;
}

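/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * the per-pass sub-image sizes computed by stbi__create_png_image, using the
 * same Adam7 origin and spacing tables. The seven passes together cover
 * every pixel exactly once. For example, a 10x10 image splits into 2x2, 1x2,
 * 3x1, 2x3, 5x2, 5x5 and 10x5 sub-images, 100 pixels in total.
 */
#if 0
#include <assert.h>

int main(void)
{
   static const int xorig[] = { 0,4,0,2,0,1,0 };
   static const int yorig[] = { 0,0,4,0,2,0,1 };
   static const int xspc[]  = { 8,8,4,4,2,2,1 };
   static const int yspc[]  = { 8,8,8,4,4,2,2 };
   int w = 10, h = 10, p, total = 0;
   for (p=0; p < 7; ++p) {
      int x = (w - xorig[p] + xspc[p]-1) / xspc[p];  /* columns in this pass */
      int y = (h - yorig[p] + yspc[p]-1) / yspc[p];  /* rows in this pass */
      total += x * y;
   }
   assert(total == w * h);
   return 0;
}
#endif
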
static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
{
   stbi__context *s = z->s;
   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   stbi_uc *p = z->out;

   // compute color-based transparency, assuming we've
   // already got 255 as the alpha value in the output
   STBI_ASSERT(out_n == 2 || out_n == 4);

   if (out_n == 2) {
      for (i=0; i < pixel_count; ++i) {
         p[1] = (p[0] == tc[0] ? 0 : 255);
         p += 2;
      }
   } else {
      for (i=0; i < pixel_count; ++i) {
         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
            p[3] = 0;
         p += 4;
      }
   }
   return 1;
}

static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
{
   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   stbi_uc *p, *temp_out, *orig = a->out;

   p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
   if (p == NULL) return stbi__err("outofmem", "Out of memory");

   // between here and free(out) below, exiting would leak
   temp_out = p;

   if (pal_img_n == 3) {
      for (i=0; i < pixel_count; ++i) {
         int n = orig[i]*4;
         p[0] = palette[n  ];
         p[1] = palette[n+1];
         p[2] = palette[n+2];
         p += 3;
      }
   } else {
      for (i=0; i < pixel_count; ++i) {
         int n = orig[i]*4;
         p[0] = palette[n  ];
         p[1] = palette[n+1];
         p[2] = palette[n+2];
         p[3] = palette[n+3];
         p += 4;
      }
   }
   STBI_FREE(a->out);
   a->out = temp_out;

   STBI_NOTUSED(len);

   return 1;
}

static int stbi__unpremultiply_on_load = 0;
static int stbi__de_iphone_flag = 0;

STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
{
   stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
}

STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
{
   stbi__de_iphone_flag = flag_true_if_should_convert;
}

static void stbi__de_iphone(stbi__png *z)
{
   stbi__context *s = z->s;
   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   stbi_uc *p = z->out;

   if (s->img_out_n == 3) {  // convert bgr to rgb
      for (i=0; i < pixel_count; ++i) {
         stbi_uc t = p[0];
         p[0] = p[2];
         p[2] = t;
         p += 3;
      }
   } else {
      STBI_ASSERT(s->img_out_n == 4);
      if (stbi__unpremultiply_on_load) {
         // convert bgr to rgb and unpremultiply
         for (i=0; i < pixel_count; ++i) {
            stbi_uc a = p[3];
            stbi_uc t = p[0];
            if (a) {
               p[0] = p[2] * 255 / a;
               p[1] = p[1] * 255 / a;
               p[2] =  t   * 255 / a;
            } else {
               p[0] = p[2];
               p[2] = t;
            }
            p += 4;
         }
      } else {
         // convert bgr to rgb
         for (i=0; i < pixel_count; ++i) {
            stbi_uc t = p[0];
            p[0] = p[2];
            p[2] = t;
            p += 4;
         }
      }
   }
}

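/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * the BGR-to-RGB swap plus the integer unpremultiply (c * 255 / a) that
 * stbi__de_iphone applies to one premultiplied BGRA pixel. The division
 * truncates, e.g. 64 * 255 / 128 gives 127, not 128.
 */
#if 0
#include <assert.h>

int main(void)
{
   unsigned char p[4] = { 32, 64, 128, 128 };  /* premultiplied B,G,R,A */
   unsigned char a = p[3], t = p[0];
   p[0] = (unsigned char) (p[2] * 255 / a);    /* R: 128*255/128 = 255 */
   p[1] = (unsigned char) (p[1] * 255 / a);    /* G:  64*255/128 = 127 */
   p[2] = (unsigned char) (t    * 255 / a);    /* B:  32*255/128 = 63  */
   assert(p[0] == 255 && p[1] == 127 && p[2] == 63 && p[3] == 128);
   return 0;
}
#endif
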
#define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))

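/* Illustrative sketch, not part of stb_image and compiled out with #if 0:
 * STBI__PNG_TYPE packs the four ASCII bytes of a chunk name big-endian, so
 * it compares directly with the chunk type read by stbi__get32be. Bit 29 is
 * bit 5 of the first byte, the lowercase "ancillary" flag that the default
 * case of stbi__parse_png_file tests before rejecting an unknown chunk.
 * EXAMPLE_PNG_TYPE is a hypothetical copy of the macro.
 */
#if 0
#include <assert.h>

#define EXAMPLE_PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))

int main(void)
{
   assert(EXAMPLE_PNG_TYPE('I','H','D','R') == 0x49484452);
   assert((EXAMPLE_PNG_TYPE('I','D','A','T') & (1 << 29)) == 0); /* critical  */
   assert((EXAMPLE_PNG_TYPE('t','E','X','t') & (1 << 29)) != 0); /* ancillary */
   return 0;
}
#endif
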
static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
{
   stbi_uc palette[1024], pal_img_n=0;
   stbi_uc has_trans=0, tc[3];
   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
   stbi__context *s = z->s;

   z->expanded = NULL;
   z->idata = NULL;
   z->out = NULL;

   if (!stbi__check_png_header(s)) return 0;

   if (scan == STBI__SCAN_type) return 1;

   for (;;) {
      stbi__pngchunk c = stbi__get_chunk_header(s);
      switch (c.type) {
         case STBI__PNG_TYPE('C','g','B','I'):
            is_iphone = 1;
            stbi__skip(s, c.length);
            break;
         case STBI__PNG_TYPE('I','H','D','R'): {
            int comp,filter;
            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
            first = 0;
            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
            s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
            s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
            depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
            if (!pal_img_n) {
               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
               if (scan == STBI__SCAN_header) return 1;
            } else {
               // if paletted, then pal_img_n is our final components, and
               // img_n is # components to decompress/filter.
               s->img_n = 1;
               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
               // if SCAN_header, have to scan to see if we have a tRNS
            }
            break;
         }

         case STBI__PNG_TYPE('P','L','T','E'):  {
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
            pal_len = c.length / 3;
            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
            for (i=0; i < pal_len; ++i) {
               palette[i*4+0] = stbi__get8(s);
               palette[i*4+1] = stbi__get8(s);
               palette[i*4+2] = stbi__get8(s);
               palette[i*4+3] = 255;
            }
            break;
         }

         case STBI__PNG_TYPE('t','R','N','S'): {
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
            if (pal_img_n) {
               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
               pal_img_n = 4;
               for (i=0; i < c.length; ++i)
                  palette[i*4+3] = stbi__get8(s);
            } else {
               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
               has_trans = 1;
               for (k=0; k < s->img_n; ++k)
                  tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
            }
            break;
         }

         case STBI__PNG_TYPE('I','D','A','T'): {
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
            if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
            if ((int)(ioff + c.length) < (int)ioff) return 0;
            if (ioff + c.length > idata_limit) {
               stbi_uc *p;
               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
               while (ioff + c.length > idata_limit)
                  idata_limit *= 2;
               p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
               z->idata = p;
            }
            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
            ioff += c.length;
            break;
         }

         case STBI__PNG_TYPE('I','E','N','D'): {
            stbi__uint32 raw_len, bpl;
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (scan != STBI__SCAN_load) return 1;
            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
            // initial guess for decoded data size to avoid unnecessary reallocs
            bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
            if (z->expanded == NULL) return 0; // zlib should set error
            STBI_FREE(z->idata); z->idata = NULL;
            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
               s->img_out_n = s->img_n+1;
            else
               s->img_out_n = s->img_n;
            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
            if (has_trans)
               if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
               stbi__de_iphone(z);
            if (pal_img_n) {
               // pal_img_n == 3 or 4
               s->img_n = pal_img_n; // record the actual colors we had
               s->img_out_n = pal_img_n;
               if (req_comp >= 3) s->img_out_n = req_comp;
               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
                  return 0;
            }
            STBI_FREE(z->expanded); z->expanded = NULL;
            return 1;
         }

         default:
            // if critical, fail
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if ((c.type & (1 << 29)) == 0) {
               #ifndef STBI_NO_FAILURE_STRINGS
               // not threadsafe
               static char invalid_chunk[] = "XXXX PNG chunk not known";
               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
   4464                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
   4465                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
   4466                #endif
   4467                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
   4468             }
   4469             stbi__skip(s, c.length);
   4470             break;
   4471       }
   4472       // end of PNG chunk, read and skip CRC
   4473       stbi__get32be(s);
   4474    }
   4475 }
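The `default:` branch above relies on a PNG rule: bit 5 of the first chunk-type byte is the ancillary flag. An uppercase first letter (bit clear) marks a critical chunk that a decoder must understand. With the four type bytes packed big-endian into `c.type`, that flag is bit 29. The sketch below restates the test in isolation. `png_chunk_type` and `png_chunk_is_critical` are hypothetical helpers for illustration, not stb_image functions.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four chunk-type characters big-endian, the same byte order
   STBI__PNG_TYPE uses (hypothetical helper). */
static uint32_t png_chunk_type(char a, char b, char c, char d)
{
   return ((uint32_t)(unsigned char)a << 24) | ((uint32_t)(unsigned char)b << 16) |
          ((uint32_t)(unsigned char)c <<  8) |  (uint32_t)(unsigned char)d;
}

/* Bit 5 of the first type byte (bit 29 of the packed value) is the
   ancillary flag: clear (uppercase letter) means the chunk is critical. */
static int png_chunk_is_critical(uint32_t type)
{
   return (type & (UINT32_C(1) << 29)) == 0;
}
```

For example, `IDAT` is critical, while `tEXt` can be skipped safely when unknown.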

static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
{
   unsigned char *result=NULL;
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
      result = p->out;
      p->out = NULL;
      if (req_comp && req_comp != p->s->img_out_n) {
         result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
         p->s->img_out_n = req_comp;
         if (result == NULL) return result;
      }
      *x = p->s->img_x;
      *y = p->s->img_y;
      if (n) *n = p->s->img_out_n;
   }
   STBI_FREE(p->out);      p->out      = NULL;
   STBI_FREE(p->expanded); p->expanded = NULL;
   STBI_FREE(p->idata);    p->idata    = NULL;

   return result;
}

static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__png p;
   p.s = s;
   return stbi__do_png(&p, x,y,comp,req_comp);
}

static int stbi__png_test(stbi__context *s)
{
   int r;
   r = stbi__check_png_header(s);
   stbi__rewind(s);
   return r;
}
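`stbi__png_test` only probes the stream. It calls `stbi__check_png_header` (defined elsewhere in the file) and then rewinds, so the next format test starts from the first byte. Every PNG file begins with the same 8-byte signature, `137 80 78 71 13 10 26 10`. Below is a minimal in-memory version of that probe. `png_has_signature` is a hypothetical name; because it works on a buffer rather than an `stbi__context`, no rewind is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the 8-byte PNG signature, else 0.
   Hypothetical buffer-based helper, not part of stb_image. */
static int png_has_signature(const unsigned char *buf, size_t len)
{
   static const unsigned char sig[8] = { 137, 80, 78, 71, 13, 10, 26, 10 };
   assert(buf != NULL || len == 0);
   return len >= sizeof sig && memcmp(buf, sig, sizeof sig) == 0;
}
```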

static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
{
   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
      stbi__rewind( p->s );
      return 0;
   }
   if (x) *x = p->s->img_x;
   if (y) *y = p->s->img_y;
   if (comp) *comp = p->s->img_n;
   return 1;
}

static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__png p;
   p.s = s;
   return stbi__png_info_raw(&p, x, y, comp);
}
#endif

// Microsoft/Windows BMP image

#ifndef STBI_NO_BMP
static int stbi__bmp_test_raw(stbi__context *s)
{
   int r;
   int sz;
   if (stbi__get8(s) != 'B') return 0;
   if (stbi__get8(s) != 'M') return 0;
   stbi__get32le(s); // discard filesize
   stbi__get16le(s); // discard reserved
   stbi__get16le(s); // discard reserved
   stbi__get32le(s); // discard data offset
   sz = stbi__get32le(s);
   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   return r;
}

static int stbi__bmp_test(stbi__context *s)
{
   int r = stbi__bmp_test_raw(s);
   stbi__rewind(s);
   return r;
}
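`stbi__bmp_test_raw` accepts a file when two conditions hold. It must start with `BM`, and the DIB header size must be one of the sizes the loader handles: 12, 40, 56, 108 or 124. That size is stored right after the 14-byte file header, at byte offset 14, little-endian. The sketch below repeats the check on a memory buffer. `read_le32` and `bmp_header_looks_valid` are hypothetical helpers, assumed only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value (hypothetical helper). */
static uint32_t read_le32(const unsigned char *p)
{
   return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
          ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Same rules as stbi__bmp_test_raw, on a memory buffer: "BM" magic,
   then a DIB header size (offset 14) the loader understands. */
static int bmp_header_looks_valid(const unsigned char *buf, size_t len)
{
   uint32_t hsz;
   assert(buf != NULL || len == 0);
   if (len < 18 || buf[0] != 'B' || buf[1] != 'M') return 0;
   hsz = read_le32(buf + 14);
   return hsz == 12 || hsz == 40 || hsz == 56 || hsz == 108 || hsz == 124;
}
```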


// returns 0..31 for the highest set bit, or -1 if no bit is set
static int stbi__high_bit(unsigned int z)
{
   int n=0;
   if (z == 0) return -1;
   if (z >= 0x10000) n += 16, z >>= 16;
   if (z >= 0x00100) n +=  8, z >>=  8;
   if (z >= 0x00010) n +=  4, z >>=  4;
   if (z >= 0x00004) n +=  2, z >>=  2;
   if (z >= 0x00002) n +=  1, z >>=  1;
   return n;
}
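`stbi__high_bit` finds the highest set bit with a five-step binary search. When you need to check it, a slow but obvious loop makes a handy reference. The sketch below is a hypothetical `high_bit_ref` that assumes nothing beyond standard C. It should return the same value for every input, including -1 for 0. For the 5-5-5 red mask `31u << 10` it gives 14, so the loader's `rshift` becomes 14 - 7 = 7.

```c
#include <assert.h>

/* Reference for stbi__high_bit: shift right until the value is
   exhausted, counting shifts. Returns -1 for 0, else the index of the
   highest set bit. Hypothetical helper for cross-checking. */
static int high_bit_ref(unsigned int z)
{
   int n = -1;
   while (z) {
      ++n;
      z >>= 1;
   }
   return n;
}
```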

// counts the set bits in a (SWAR popcount: adjacent bit fields are summed in parallel)
static int stbi__bitcount(unsigned int a)
{
   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   a = (a + (a >> 8)); // max 16 per 8 bits
   a = (a + (a >> 16)); // max 32 per 8 bits
   return a & 0xff;
}
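`stbi__bitcount` is a SWAR population count. It adds adjacent 1-, 2-, 4-, 8- and 16-bit fields in parallel until the low byte holds the number of set bits. A simpler cross-check is Kernighan's method, which clears the lowest set bit on each pass and so loops once per set bit. This is a different technique from the one above, shown only as a hypothetical reference (`bitcount_ref`).

```c
#include <assert.h>
#include <stdint.h>

/* Kernighan's popcount: a &= a - 1 clears the lowest set bit, so the
   loop body runs once per set bit. Should agree with stbi__bitcount
   for every 32-bit input (hypothetical reference helper). */
static int bitcount_ref(uint32_t a)
{
   int n = 0;
   while (a) {
      a &= a - 1;
      ++n;
   }
   return n;
}
```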

// shifts v right by 'shift' (left if shift is negative), then replicates the
// 'bits'-wide value downward so that a full-scale channel reaches 255
static int stbi__shiftsigned(int v, int shift, int bits)
{
   int result;
   int z=0;

   if (shift < 0) v <<= -shift;
   else v >>= shift;
   result = v;

   z = bits;
   while (z < 8) {
      result += v >> z;
      z += bits;
   }
   return result;
}
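`stbi__shiftsigned` first moves a masked BMP channel so its top bit sits at bit 7. The `while (z < 8)` loop then copies the value's own bits into the vacated low bits. As a result, a full-scale 5-bit field maps to 255 rather than 248. The standalone sketch below applies the same replication to an already-extracted field. `expand_to_8bit` is a hypothetical helper and assumes `bits` is between 1 and 8.

```c
#include <assert.h>

/* Expand an unsigned value 'bits' wide (1..8) to the 0..255 range by
   replicating its bit pattern downward. The shifted copies occupy
   disjoint bit positions, so '+' behaves like '|'. Hypothetical helper. */
static int expand_to_8bit(unsigned int value, int bits)
{
   int result, z;
   assert(bits >= 1 && bits <= 8 && value < (1u << bits));
   value <<= 8 - bits;              /* put the field's high bit at bit 7 */
   result = (int)value;
   for (z = bits; z < 8; z += bits)
      result += (int)(value >> z);  /* fill the vacated low bits */
   return result;
}
```

For a 5-bit field this works out to `(x << 3) | (x >> 2)`.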

static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   unsigned int mr=0,mg=0,mb=0,ma=0, all_a=255;
   stbi_uc pal[256][4];
   int psize=0,i,j,compress=0,width;
   int bpp, flip_vertically, pad, target, offset, hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   stbi__get32le(s); // discard filesize
   stbi__get16le(s); // discard reserved
   stbi__get16le(s); // discard reserved
   offset = stbi__get32le(s);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   if (hsz == 12) {
      s->img_x = stbi__get16le(s);
      s->img_y = stbi__get16le(s);
   } else {
      s->img_x = stbi__get32le(s);
      s->img_y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   bpp = stbi__get16le(s);
   if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
   flip_vertically = ((int) s->img_y) > 0;
   s->img_y = abs((int) s->img_y);
   if (hsz == 12) {
      if (bpp < 24)
         psize = (offset - 14 - 24) / 3;
   } else {
      compress = stbi__get32le(s);
      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
      stbi__get32le(s); // discard image data size
      stbi__get32le(s); // discard hres
      stbi__get32le(s); // discard vres
      stbi__get32le(s); // discard colorsused
      stbi__get32le(s); // discard max important
      if (hsz == 40 || hsz == 56) {
         if (hsz == 56) {
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
         }
         if (bpp == 16 || bpp == 32) {
            mr = mg = mb = 0;
            if (compress == 0) {
               if (bpp == 32) {
                  mr = 0xffu << 16;
                  mg = 0xffu <<  8;
                  mb = 0xffu <<  0;
                  ma = 0xffu << 24;
                  all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
               } else {
                  mr = 31u << 10;
                  mg = 31u <<  5;
                  mb = 31u <<  0;
               }
            } else if (compress == 3) {
               mr = stbi__get32le(s);
               mg = stbi__get32le(s);
               mb = stbi__get32le(s);
               // not documented, but generated by photoshop and handled by mspaint
               if (mr == mg && mg == mb) {
                  // ?!?!?
                  return stbi__errpuc("bad BMP", "bad BMP");
               }
            } else
               return stbi__errpuc("bad BMP", "bad BMP");
         }
      } else {
         STBI_ASSERT(hsz == 108 || hsz == 124);
         mr = stbi__get32le(s);
         mg = stbi__get32le(s);
         mb = stbi__get32le(s);
         ma = stbi__get32le(s);
         stbi__get32le(s); // discard color space
         for (i=0; i < 12; ++i)
            stbi__get32le(s); // discard color space parameters
         if (hsz == 124) {
            stbi__get32le(s); // discard rendering intent
            stbi__get32le(s); // discard offset of profile data
            stbi__get32le(s); // discard size of profile data
            stbi__get32le(s); // discard reserved
         }
      }
      if (bpp < 16)
         psize = (offset - 14 - hsz) >> 2;
   }
   s->img_n = ma ? 4 : 3;
   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
      target = req_comp;
   else
      target = s->img_n; // if they want monochrome, we'll post-convert
   out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   if (bpp < 16) {
      int z=0;
      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
      for (i=0; i < psize; ++i) {
         pal[i][2] = stbi__get8(s);
         pal[i][1] = stbi__get8(s);
         pal[i][0] = stbi__get8(s);
         if (hsz != 12) stbi__get8(s);
         pal[i][3] = 255;
      }
      stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
      if (bpp == 4) width = (s->img_x + 1) >> 1;
      else if (bpp == 8) width = s->img_x;
      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
      pad = (-width)&3;
      for (j=0; j < (int) s->img_y; ++j) {
         for (i=0; i < (int) s->img_x; i += 2) {
            int v=stbi__get8(s),v2=0;
            if (bpp == 4) {
               v2 = v & 15;
               v >>= 4;
            }
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
            if (i+1 == (int) s->img_x) break;
            v = (bpp == 8) ? stbi__get8(s) : v2;
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
         }
         stbi__skip(s, pad);
      }
   } else {
      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
      int z = 0;
      int easy=0;
      stbi__skip(s, offset - 14 - hsz);
      if (bpp == 24) width = 3 * s->img_x;
      else if (bpp == 16) width = 2*s->img_x;
      else /* bpp = 32 and pad = 0 */ width=0;
      pad = (-width) & 3;
      if (bpp == 24) {
         easy = 1;
      } else if (bpp == 32) {
         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
            easy = 2;
      }
      if (!easy) {
         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
         // right shift amt to put high bit in position #7
         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
      }
      for (j=0; j < (int) s->img_y; ++j) {
         if (easy) {
            for (i=0; i < (int) s->img_x; ++i) {
               unsigned char a;
               out[z+2] = stbi__get8(s);
               out[z+1] = stbi__get8(s);
               out[z+0] = stbi__get8(s);
               z += 3;
               a = (easy == 2 ? stbi__get8(s) : 255);
               all_a |= a;
               if (target == 4) out[z++] = a;
            }
         } else {
            for (i=0; i < (int) s->img_x; ++i) {
               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
               int a;
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
               all_a |= a;
               if (target == 4) out[z++] = STBI__BYTECAST(a);
            }
         }
         stbi__skip(s, pad);
      }
   }

   // if alpha channel is all 0s, replace with all 255s
   if (target == 4 && all_a == 0)
      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
         out[i] = 255;

   if (flip_vertically) {
      stbi_uc t;
      for (j=0; j < (int) s->img_y>>1; ++j) {
         stbi_uc *p1 = out +      j     *s->img_x*target;
         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
         for (i=0; i < (int) s->img_x*target; ++i) {
            t = p1[i], p1[i] = p2[i], p2[i] = t;
         }
      }
   }

   if (req_comp && req_comp != target) {
      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n;
   return out;
}
#endif
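Two layout rules drive the loops in `stbi__bmp_load`. First, each stored row is padded to a multiple of 4 bytes, which is what `pad = (-width) & 3` computes. Second, a positive height means rows are stored bottom-up, hence `flip_vertically`. The sketch below computes the padding and the full row stride. `bmp_row_padding` and `bmp_row_stride` are hypothetical names. The sketch uses `(4 - (n & 3)) & 3` instead of `(-n) & 3` only to avoid relying on two's-complement negation.

```c
#include <assert.h>

/* Padding bytes after a row of row_bytes so the next row starts on a
   4-byte boundary; equivalent to (-row_bytes) & 3 above. */
static int bmp_row_padding(int row_bytes)
{
   assert(row_bytes >= 0);
   return (4 - (row_bytes & 3)) & 3;
}

/* Bytes occupied by one stored row for 4/8/16/24/32-bit pixels
   (hypothetical helper; 1-bit and RLE BMPs are rejected by the loader). */
static int bmp_row_stride(int width, int bpp)
{
   int row_bytes = (width * bpp + 7) / 8;
   return row_bytes + bmp_row_padding(row_bytes);
}
```

For 4-bit images, `(width * 4 + 7) / 8` equals the loader's `(img_x + 1) >> 1`.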

// Targa Truevision - TGA
// by Jonathan Dummer
#ifndef STBI_NO_TGA
static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
    int tga_w, tga_h, tga_comp;
    int sz;
    stbi__get8(s);                   // discard ID field length
    sz = stbi__get8(s);              // color-map type
    if( sz > 1 ) {
        stbi__rewind(s);
        return 0;      // only RGB or indexed allowed
    }
    sz = stbi__get8(s);              // image type
    // only color-mapped, RGB or grey allowed, +/- RLE
    if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) {
        stbi__rewind(s);
        return 0;
    }
    stbi__skip(s,9);
    tga_w = stbi__get16le(s);
    if( tga_w < 1 ) {
        stbi__rewind(s);
        return 0;   // test width
    }
    tga_h = stbi__get16le(s);
    if( tga_h < 1 ) {
        stbi__rewind(s);
        return 0;   // test height
    }
    sz = stbi__get8(s);               // bits per pixel
    // only RGB or RGBA or grey allowed
    if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
        stbi__rewind(s);
        return 0;
    }
    tga_comp = sz;
    if (x) *x = tga_w;
    if (y) *y = tga_h;
    if (comp) *comp = tga_comp / 8;
    return 1;                   // seems to have passed everything
}
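`stbi__tga_info` walks the fixed 18-byte TGA header:

- byte 0 is the image ID length;
- byte 1 is the color-map type;
- byte 2 is the image type;
- `stbi__skip(s,9)` then passes over the 5-byte color-map specification and the x/y origin;
- width and height (little-endian 16-bit) follow at bytes 12 and 14, and bits per pixel at byte 16.

The sketch below decodes those fields from a memory buffer and applies the same acceptance rules. `tga_header` and `tga_parse_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Fields of the 18-byte TGA header that stbi__tga_info examines
   (hypothetical struct for illustration). */
typedef struct {
   int id_length;      /* byte 0: length of the image ID field        */
   int colormap_type;  /* byte 1: 0 = no palette, 1 = palette present */
   int image_type;     /* byte 2: 1/2/3 raw, 9/10/11 RLE              */
   int width, height;  /* bytes 12-13 and 14-15, little-endian        */
   int bits_per_pixel; /* byte 16                                     */
} tga_header;

static int tga_parse_header(const unsigned char *b, size_t len, tga_header *h)
{
   assert(h != NULL);
   if (len < 18) return 0;
   h->id_length      = b[0];
   h->colormap_type  = b[1];
   h->image_type     = b[2];
   h->width          = b[12] | (b[13] << 8);
   h->height         = b[14] | (b[15] << 8);
   h->bits_per_pixel = b[16];
   /* same acceptance rules as stbi__tga_info */
   if (h->colormap_type > 1) return 0;
   switch (h->image_type) {
      case 1: case 2: case 3: case 9: case 10: case 11: break;
      default: return 0;
   }
   if (h->width < 1 || h->height < 1) return 0;
   return h->bits_per_pixel == 8 || h->bits_per_pixel == 16 ||
          h->bits_per_pixel == 24 || h->bits_per_pixel == 32;
}
```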

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz;
   stbi__get8(s);      //   discard ID field length
   sz = stbi__get8(s);   //   color-map type
   if ( sz > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   sz = stbi__get8(s);   //   image type
   if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) goto errorEnd;   //   only color-mapped, RGB or grey allowed, +/- RLE
   stbi__get16le(s);      //   discard palette start
   stbi__get16le(s);      //   discard palette length
   stbi__get8(s);         //   discard bits per palette color entry
   stbi__get16le(s);      //   discard x origin
   stbi__get16le(s);      //   discard y origin
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (sz == 8) || (sz == 16) || (sz == 24) || (sz == 32) )
      res = 1;

errorEnd:
   stbi__rewind(s);   //   always rewind so the next format probe starts at byte 0
   return res;
}

static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   //   read in the TGA header stuff
   int tga_offset = stbi__get8(s);
   int tga_indexed = stbi__get8(s);
   int tga_image_type = stbi__get8(s);
   int tga_is_RLE = 0;
   int tga_palette_start = stbi__get16le(s);
   int tga_palette_len = stbi__get16le(s);
   int tga_palette_bits = stbi__get8(s);
   int tga_x_origin = stbi__get16le(s);
   int tga_y_origin = stbi__get16le(s);
   int tga_width = stbi__get16le(s);
   int tga_height = stbi__get16le(s);
   int tga_bits_per_pixel = stbi__get8(s);
   int tga_comp = tga_bits_per_pixel / 8;
   int tga_inverted = stbi__get8(s);
   //   image data
   unsigned char *tga_data;
   unsigned char *tga_palette = NULL;
   int i, j;
   unsigned char raw_data[4];
   int RLE_count = 0;
   int RLE_repeating = 0;
   int read_next_pixel = 1;

   //   do a tiny bit of processing
   if ( tga_image_type >= 8 )
   {
      tga_image_type -= 8;
      tga_is_RLE = 1;
   }
   /* int tga_alpha_bits = tga_inverted & 15; */
   tga_inverted = 1 - ((tga_inverted >> 5) & 1);

   //   error check
   if ( //(tga_indexed) ||
      (tga_width < 1) || (tga_height < 1) ||
      (tga_image_type < 1) || (tga_image_type > 3) ||
      ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
      (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
      )
   {
      return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
   }

   //   If I'm paletted, then I'll use the number of bits from the palette
   if ( tga_indexed )
   {
      tga_comp = tga_palette_bits / 8;
      //   a palette entry must fit in raw_data[4]
      if ( tga_comp < 1 || tga_comp > 4 )
         return stbi__errpuc("bad palette", "Corrupt TGA");
   }

   //   tga info
   *x = tga_width;
   *y = tga_height;
   if (comp) *comp = tga_comp;

   tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");

   // skip the image ID field; its length is the first header byte (usually 0)
   stbi__skip(s, tga_offset );

   if ( !tga_indexed && !tga_is_RLE) {
      for (i=0; i < tga_height; ++i) {
         int row = tga_inverted ? tga_height -i - 1 : i;
         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
         stbi__getn(s, tga_row, tga_width * tga_comp);
      }
   } else  {
      //   do I need to load a palette?
      if ( tga_indexed)
      {
         //   a color-mapped image needs at least one palette entry
         if ( tga_palette_len == 0 ) {
            STBI_FREE(tga_data);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }
         //   any data to skip? (offset usually = 0)
         stbi__skip(s, tga_palette_start );
         //   load the palette
         tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
         if (!tga_palette) {
            STBI_FREE(tga_data);
            return stbi__errpuc("outofmem", "Out of memory");
         }
         if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
            STBI_FREE(tga_data);
            STBI_FREE(tga_palette);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }
      }
      //   load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         //   if I'm in RLE mode, do I need to read a new RLE packet header?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               //   yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         //   OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            //   load however much data we did have
            if ( tga_indexed )
            {
               //   read in 1 byte, then perform the lookup
               int pal_idx = stbi__get8(s);
               if ( pal_idx >= tga_palette_len )
               {
                  //   invalid index
                  pal_idx = 0;
               }
               //   each palette entry is tga_comp (tga_palette_bits / 8) bytes wide
               pal_idx *= tga_comp;
               for (j = 0; j < tga_comp; ++j)
               {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else
            {
               //   read in the data raw
               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
               {
                  raw_data[j] = stbi__get8(s);
               }
            }
            //   clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
            tga_data[i*tga_comp+j] = raw_data[j];

         //   in case we're in RLE mode, keep counting down
         --RLE_count;
      }
      //   do I need to invert the image?
      if ( tga_inverted )
      {
         for (j = 0; j*2 < tga_height; ++j)
         {
            int index1 = j * tga_width * tga_comp;
            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
            for (i = tga_width * tga_comp; i > 0; --i)
            {
               unsigned char temp = tga_data[index1];
               tga_data[index1] = tga_data[index2];
               tga_data[index2] = temp;
               ++index1;
               ++index2;
            }
         }
      }
      //   clear my palette, if I had one
      if ( tga_palette != NULL )
      {
         STBI_FREE( tga_palette );
      }
   }

   // swap RGB
   if (tga_comp >= 3)
   {
      unsigned char* tga_pixel = tga_data;
      for (i=0; i < tga_width * tga_height; ++i)
      {
         unsigned char temp = tga_pixel[0];
         tga_pixel[0] = tga_pixel[2];
         tga_pixel[2] = temp;
         tga_pixel += tga_comp;
      }
   }

   // convert to target component count
   if (req_comp && req_comp != tga_comp)
      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);

   //   the things I do to get rid of an error message, and yet keep
   //   Microsoft's C compilers happy... [8^(
   tga_palette_start = tga_palette_len = tga_palette_bits =
         tga_x_origin = tga_y_origin = 0;
   //   OK, done
   return tga_data;
}
#endif
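The RLE branch of `stbi__tga_load` decodes TGA packets one pixel at a time. In each header byte, the low 7 bits hold the pixel count minus one. The high bit selects the packet kind: a run-length packet (one pixel value repeated) or a raw packet (that many literal pixels). The sketch below decodes a whole buffer of such packets and, unlike the loader, checks for truncated input. `tga_rle_decode` is a hypothetical helper that handles neither palettes nor bottom-up flipping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode TGA RLE packets from src into dst (npixels * bpp bytes).
   Header byte: low 7 bits = count - 1; high bit set = one pixel
   repeated count times, clear = count literal pixels follow.
   Returns 1 on success, 0 if src is truncated. Hypothetical helper. */
static int tga_rle_decode(const unsigned char *src, size_t srclen,
                          unsigned char *dst, size_t npixels, int bpp)
{
   size_t si = 0, done = 0;
   assert(bpp >= 1 && bpp <= 4);
   while (done < npixels) {
      size_t count, k;
      int repeat;
      if (si >= srclen) return 0;
      repeat = src[si] >> 7;
      count = (size_t)(src[si] & 127) + 1;
      ++si;
      if (count > npixels - done) count = npixels - done; /* never overrun dst */
      for (k = 0; k < count; ++k) {
         if (!repeat || k == 0) {           /* read a pixel from the stream */
            if (srclen - si < (size_t)bpp) return 0;
            memcpy(dst + (done + k) * bpp, src + si, bpp);
            si += bpp;
         } else {                           /* repeat the previous pixel */
            memcpy(dst + (done + k) * bpp, dst + (done + k - 1) * bpp, bpp);
         }
      }
      done += count;
   }
   return 1;
}
```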

// *************************************************************************************************
// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB

#ifndef STBI_NO_PSD
static int stbi__psd_test(stbi__context *s)
{
   int r = (stbi__get32be(s) == 0x38425053);
   stbi__rewind(s);
   return r;
}
   5081 
   5082 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   5083 {
   5084    int   pixelCount;
   5085    int channelCount, compression;
   5086    int channel, i, count, len;
   5087    int bitdepth;
   5088    int w,h;
   5089    stbi_uc *out;
   5090 
   5091    // Check identifier
   5092    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
   5093       return stbi__errpuc("not PSD", "Corrupt PSD image");
   5094 
   5095    // Check file type version.
   5096    if (stbi__get16be(s) != 1)
   5097       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
   5098 
   5099    // Skip 6 reserved bytes.
   5100    stbi__skip(s, 6 );
   5101 
   5102    // Read the number of channels (R, G, B, A, etc).
   5103    channelCount = stbi__get16be(s);
   5104    if (channelCount < 0 || channelCount > 16)
   5105       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
   5106 
   5107    // Read the rows and columns of the image.
   5108    h = stbi__get32be(s);
   5109    w = stbi__get32be(s);
   5110 
   5111    // Make sure the depth is 8 bits.
   5112    bitdepth = stbi__get16be(s);
   5113    if (bitdepth != 8 && bitdepth != 16)
   5114       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
   5115 
   5116    // Make sure the color mode is RGB.
   5117    // Valid options are:
   5118    //   0: Bitmap
   5119    //   1: Grayscale
   5120    //   2: Indexed color
   5121    //   3: RGB color
   5122    //   4: CMYK color
   5123    //   7: Multichannel
   5124    //   8: Duotone
   5125    //   9: Lab color
   5126    if (stbi__get16be(s) != 3)
   5127       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
   5128 
   5129    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   5130    stbi__skip(s,stbi__get32be(s) );
   5131 
   5132    // Skip the image resources.  (resolution, pen tool paths, etc)
   5133    stbi__skip(s, stbi__get32be(s) );
   5134 
   5135    // Skip the reserved data.
   5136    stbi__skip(s, stbi__get32be(s) );
   5137 
   5138    // Find out if the data is compressed.
   5139    // Known values:
   5140    //   0: no compression
   5141    //   1: RLE compressed
   5142    compression = stbi__get16be(s);
   5143    if (compression > 1)
   5144       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
   5145 
   5146    // Create the destination image.
   5147    out = (stbi_uc *) stbi__malloc(4 * w*h);
   5148    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5149    pixelCount = w*h;
   5150 
   5151    // Initialize the data to zero.
   5152    //memset( out, 0, pixelCount * 4 );
   5153 
   5154    // Finally, the image data.
   5155    if (compression) {
   5156       // RLE as used by .PSD and .TIFF
   5157       // Loop until you get the number of unpacked bytes you are expecting:
   5158       //     Read the next source byte into n.
   5159       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
   5160       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
   5161       //     Else if n is 128, noop.
   5162       // Endloop
   5163 
   5164       // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
   5165       // which we're going to just skip.
   5166       stbi__skip(s, h * channelCount * 2 );
   5167 
   5168       // Read the RLE data by channel.
   5169       for (channel = 0; channel < 4; channel++) {
   5170          stbi_uc *p;
   5171 
   5172          p = out+channel;
   5173          if (channel >= channelCount) {
   5174             // Fill this channel with default data.
   5175             for (i = 0; i < pixelCount; i++, p += 4)
   5176                *p = (channel == 3 ? 255 : 0);
   5177          } else {
   5178             // Read the RLE data.
   5179             count = 0;
   5180             while (count < pixelCount) {
   5181                len = stbi__get8(s);
   5182                if (len == 128) {
   5183                   // No-op.
   5184                } else if (len < 128) {
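   5185                   len++; // copy the next len+1 bytes literally
   5186                   if (len > pixelCount - count) { STBI_FREE(out); return stbi__errpuc("corrupt", "bad RLE data"); }
   5187                   count += len;
   5188                   while (len) {
   5189                      *p = stbi__get8(s);
   5190                      p += 4;
   5191                      len--;
   5192                   }
   5193                } else if (len > 128) {
   5194                   stbi_uc   val;
   5195                   // Interpreting len as a signed 8-bit n, the next source byte is replicated 1-n times,
   5196                   // which for len in 129..255 is 257-len.
   5197                   len = 257 - len;
   5198                   if (len > pixelCount - count) { STBI_FREE(out); return stbi__errpuc("corrupt", "bad RLE data"); }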
   5199                   val = stbi__get8(s);
   5200                   count += len;
   5201                   while (len) {
   5202                      *p = val;
   5203                      p += 4;
   5204                      len--;
   5205                   }
   5206                }
   5207             }
   5208          }
   5209       }
   5210 
   5211    } else {
   5212       // We're at the raw (uncompressed) image data. It is planar: each channel in order (Red, Green, Blue, Alpha, ...),
   5213       // where each channel holds one 8-bit value per pixel (16-bit big-endian when bitdepth is 16).
   5214 
   5215       // Read the data by channel.
   5216       for (channel = 0; channel < 4; channel++) {
   5217          stbi_uc *p;
   5218 
   5219          p = out + channel;
   5220          if (channel >= channelCount) {
   5221             // Fill this channel with default data.
   5222             stbi_uc val = channel == 3 ? 255 : 0;
   5223             for (i = 0; i < pixelCount; i++, p += 4)
   5224                *p = val;
   5225          } else {
   5226             // Read the data.
   5227             if (bitdepth == 16) {
   5228                for (i = 0; i < pixelCount; i++, p += 4)
   5229                   *p = (stbi_uc) (stbi__get16be(s) >> 8);
   5230             } else {
   5231                for (i = 0; i < pixelCount; i++, p += 4)
   5232                   *p = stbi__get8(s);
   5233             }
   5234          }
   5235       }
   5236    }
   5237 
   5238    if (req_comp && req_comp != 4) {
   5239       out = stbi__convert_format(out, 4, req_comp, w, h);
   5240       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5241    }
   5242 
   5243    if (comp) *comp = 4;
   5244    *y = h;
   5245    *x = w;
   5246 
   5247    return out;
   5248 }
   5249 #endif
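The RLE branch of the PSD loader implements PackBits. As a minimal sketch, assuming the compressed bytes are already in memory, the same loop can be written as a standalone function that decodes `count` samples spaced `stride` bytes apart (4 when filling one channel of an RGBA buffer, as the loader does) and rejects runs that would overflow the destination. The name `packbits_decode` and its return convention are hypothetical, not part of stb_image.

```c
#include <stddef.h>

/* Hypothetical helper (not part of stb_image): decode PackBits data from
 * src into `count` destination samples spaced `stride` bytes apart,
 * mirroring the PSD RLE loop. dst must hold (count - 1) * stride + 1 bytes.
 * Returns the number of source bytes consumed, or -1 if the input is
 * truncated or a literal/run would overflow the destination. */
static long packbits_decode(const unsigned char *src, size_t src_len,
                            unsigned char *dst, size_t count, size_t stride)
{
   size_t in = 0, out = 0;
   while (out < count) {
      int n;
      if (in >= src_len) return -1;            /* truncated input */
      n = src[in++];
      if (n == 128) {
         continue;                              /* -128: no-op */
      } else if (n < 128) {
         size_t len = (size_t) n + 1;           /* literal: copy n+1 bytes */
         if (len > count - out || len > src_len - in) return -1;
         while (len--) { dst[out++ * stride] = src[in++]; }
      } else {
         size_t len = 257 - (size_t) n;         /* run: 1 - (signed n) copies */
         unsigned char v;
         if (len > count - out || in >= src_len) return -1;
         v = src[in++];
         while (len--) { dst[out++ * stride] = v; }
      }
   }
   return (long) in;
}
```

A stride of 1 decodes into a contiguous buffer; checking each run against the remaining space is what keeps corrupt files from writing past the end of the image.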
   5250 
   5251 // *************************************************************************************************
   5252 // Softimage PIC loader
   5253 // by Tom Seddon
   5254 //
   5255 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
   5256 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
   5257 
   5258 #ifndef STBI_NO_PIC
   5259 static int stbi__pic_is4(stbi__context *s,const char *str)
   5260 {
   5261    int i;
   5262    for (i=0; i<4; ++i)
   5263       if (stbi__get8(s) != (stbi_uc)str[i])
   5264          return 0;
   5265 
   5266    return 1;
   5267 }
   5268 
   5269 static int stbi__pic_test_core(stbi__context *s)
   5270 {
   5271    int i;
   5272 
   5273    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
   5274       return 0;
   5275 
   5276    for(i=0;i<84;++i)
   5277       stbi__get8(s);
   5278 
   5279    if (!stbi__pic_is4(s,"PICT"))
   5280       return 0;
   5281 
   5282    return 1;
   5283 }
   5284 
   5285 typedef struct
   5286 {
   5287    stbi_uc size,type,channel;
   5288 } stbi__pic_packet;
   5289 
   5290 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
   5291 {
   5292    int mask=0x80, i;
   5293 
   5294    for (i=0; i<4; ++i, mask>>=1) {
   5295       if (channel & mask) {
   5296          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
   5297          dest[i]=stbi__get8(s);
   5298       }
   5299    }
   5300 
   5301    return dest;
   5302 }
   5303 
   5304 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
   5305 {
   5306    int mask=0x80,i;
   5307 
   5308    for (i=0;i<4; ++i, mask>>=1)
   5309       if (channel&mask)
   5310          dest[i]=src[i];
   5311 }
   5312 
   5313 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
   5314 {
   5315    int act_comp=0,num_packets=0,y,chained;
   5316    stbi__pic_packet packets[10];
   5317 
   5318    // this will (should...) cater for even some bizarre stuff like having data
   5319     // for the same channel in multiple packets.
   5320    do {
   5321       stbi__pic_packet *packet;
   5322 
   5323       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   5324          return stbi__errpuc("bad format","too many packets");
   5325 
   5326       packet = &packets[num_packets++];
   5327 
   5328       chained = stbi__get8(s);
   5329       packet->size    = stbi__get8(s);
   5330       packet->type    = stbi__get8(s);
   5331       packet->channel = stbi__get8(s);
   5332 
   5333       act_comp |= packet->channel;
   5334 
   5335       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
   5336       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   5337    } while (chained);
   5338 
   5339    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
   5340 
   5341    for(y=0; y<height; ++y) {
   5342       int packet_idx;
   5343 
   5344       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
   5345          stbi__pic_packet *packet = &packets[packet_idx];
   5346          stbi_uc *dest = result+y*width*4;
   5347 
   5348          switch (packet->type) {
   5349             default:
   5350                return stbi__errpuc("bad format","packet has bad compression type");
   5351 
   5352             case 0: {//uncompressed
   5353                int x;
   5354 
   5355                for(x=0;x<width;++x, dest+=4)
   5356                   if (!stbi__readval(s,packet->channel,dest))
   5357                      return 0;
   5358                break;
   5359             }
   5360 
   5361             case 1://Pure RLE
   5362                {
   5363                   int left=width, i;
   5364 
   5365                   while (left>0) {
   5366                      stbi_uc count,value[4];
   5367 
   5368                      count=stbi__get8(s);
   5369                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
   5370 
   5371                      if (count > left)
   5372                         count = (stbi_uc) left;
   5373 
   5374                      if (!stbi__readval(s,packet->channel,value))  return 0;
   5375 
   5376                      for(i=0; i<count; ++i,dest+=4)
   5377                         stbi__copyval(packet->channel,dest,value);
   5378                      left -= count;
   5379                   }
   5380                }
   5381                break;
   5382 
   5383             case 2: {//Mixed RLE
   5384                int left=width;
   5385                while (left>0) {
   5386                   int count = stbi__get8(s), i;
   5387                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
   5388 
   5389                   if (count >= 128) { // Repeated
   5390                      stbi_uc value[4];
   5391 
   5392                      if (count==128)
   5393                         count = stbi__get16be(s);
   5394                      else
   5395                         count -= 127;
   5396                      if (count > left)
   5397                         return stbi__errpuc("bad file","scanline overrun");
   5398 
   5399                      if (!stbi__readval(s,packet->channel,value))
   5400                         return 0;
   5401 
   5402                      for(i=0;i<count;++i, dest += 4)
   5403                         stbi__copyval(packet->channel,dest,value);
   5404                   } else { // Raw
   5405                      ++count;
   5406                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
   5407 
   5408                      for(i=0;i<count;++i, dest+=4)
   5409                         if (!stbi__readval(s,packet->channel,dest))
   5410                            return 0;
   5411                   }
   5412                   left-=count;
   5413                }
   5414                break;
   5415             }
   5416          }
   5417       }
   5418    }
   5419 
   5420    return result;
   5421 }
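In the PIC loader each packet's `channel` byte is a bit mask read from the top bit down, so 0x80, 0x40, 0x20 and 0x10 select red, green, blue and alpha (the `mask` loop in `stbi__readval`). The mixed-RLE case (type 2) reads a count byte c: c < 128 means c+1 raw values, c == 128 means a 16-bit big-endian repeat count followed by one value, and c > 128 means c-127 repeats. The sketch below decodes one scanline of a single-channel packet from memory; `pic_mixed_rle_line` is a hypothetical name, and unlike the loader it writes a contiguous one-byte-per-pixel buffer.

```c
#include <stddef.h>

/* Hypothetical sketch (not stb_image API): decode one scanline of a
 * single-channel PIC "mixed RLE" packet into dst[0..width-1].
 * Returns source bytes consumed, or -1 on truncation or scanline overrun. */
static long pic_mixed_rle_line(const unsigned char *src, size_t src_len,
                               unsigned char *dst, int width)
{
   size_t in = 0;
   int left = width, x = 0;
   while (left > 0) {
      int count, i;
      if (in >= src_len) return -1;
      count = src[in++];
      if (count >= 128) {                   /* repeated value */
         unsigned char v;
         if (count == 128) {                /* 16-bit big-endian count */
            if (src_len - in < 2) return -1;
            count = (src[in] << 8) | src[in + 1];
            in += 2;
         } else {
            count -= 127;
         }
         if (count > left || in >= src_len) return -1;
         v = src[in++];
         for (i = 0; i < count; ++i) dst[x++] = v;
      } else {                              /* count+1 raw values */
         ++count;
         if (count > left || src_len - in < (size_t) count) return -1;
         for (i = 0; i < count; ++i) dst[x++] = src[in++];
      }
      left -= count;
   }
   return (long) in;
}
```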
   5422 
   5423 static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
   5424 {
   5425    stbi_uc *result;
   5426    int i, x,y;
   5427 
   5428    for (i=0; i<92; ++i)
   5429       stbi__get8(s);
   5430 
   5431    x = stbi__get16be(s);
   5432    y = stbi__get16be(s);
   5433    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   5434    if (x != 0 && (1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
   5435 
   5436    stbi__get32be(s); //skip `ratio'
   5437    stbi__get16be(s); //skip `fields'
   5438    stbi__get16be(s); //skip `pad'
   5439 
   5440    result = (stbi_uc *) stbi__malloc(x*y*4); // intermediate buffer is RGBA
   5441    if (!result) return stbi__errpuc("outofmem", "Out of memory");
   5442    memset(result, 0xff, x*y*4);
   5443 
   5444    if (!stbi__pic_load_core(s,x,y,comp, result)) {
   5445       STBI_FREE(result);
   5446       return 0; // failure reason already set by stbi__pic_load_core; *comp may be unset
   5447    }
   5448    *px = x;
   5449    *py = y;
   5450    if (req_comp == 0) req_comp = *comp;
   5451    result=stbi__convert_format(result,4,req_comp,x,y);
   5452 
   5453    return result;
   5454 }
   5455 
   5456 static int stbi__pic_test(stbi__context *s)
   5457 {
   5458    int r = stbi__pic_test_core(s);
   5459    stbi__rewind(s);
   5460    return r;
   5461 }
   5462 #endif
   5463 
   5464 // *************************************************************************************************
   5465 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
   5466 
   5467 #ifndef STBI_NO_GIF
   5468 typedef struct
   5469 {
   5470    stbi__int16 prefix;
   5471    stbi_uc first;
   5472    stbi_uc suffix;
   5473 } stbi__gif_lzw;
   5474 
   5475 typedef struct
   5476 {
   5477    int w,h;
   5478    stbi_uc *out, *old_out;             // output buffer (always 4 components)
   5479    int flags, bgindex, ratio, transparent, eflags, delay;
   5480    stbi_uc  pal[256][4];
   5481    stbi_uc lpal[256][4];
   5482    stbi__gif_lzw codes[4096];
   5483    stbi_uc *color_table;
   5484    int parse, step;
   5485    int lflags;
   5486    int start_x, start_y;
   5487    int max_x, max_y;
   5488    int cur_x, cur_y;
   5489    int line_size;
   5490 } stbi__gif;
   5491 
   5492 static int stbi__gif_test_raw(stbi__context *s)
   5493 {
   5494    int sz;
   5495    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   5496    sz = stbi__get8(s);
   5497    if (sz != '9' && sz != '7') return 0;
   5498    if (stbi__get8(s) != 'a') return 0;
   5499    return 1;
   5500 }
   5501 
   5502 static int stbi__gif_test(stbi__context *s)
   5503 {
   5504    int r = stbi__gif_test_raw(s);
   5505    stbi__rewind(s);
   5506    return r;
   5507 }
   5508 
   5509 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
   5510 {
   5511    int i;
   5512    for (i=0; i < num_entries; ++i) {
   5513       pal[i][2] = stbi__get8(s);
   5514       pal[i][1] = stbi__get8(s);
   5515       pal[i][0] = stbi__get8(s);
   5516       pal[i][3] = transp == i ? 0 : 255;
   5517    }
   5518 }
   5519 
   5520 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
   5521 {
   5522    stbi_uc version;
   5523    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
   5524       return stbi__err("not GIF", "Corrupt GIF");
   5525 
   5526    version = stbi__get8(s);
   5527    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   5528    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
   5529 
   5530    stbi__g_failure_reason = "";
   5531    g->w = stbi__get16le(s);
   5532    g->h = stbi__get16le(s);
   5533    g->flags = stbi__get8(s);
   5534    g->bgindex = stbi__get8(s);
   5535    g->ratio = stbi__get8(s);
   5536    g->transparent = -1;
   5537 
   5538    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the Graphic Control Extension
   5539 
   5540    if (is_info) return 1;
   5541 
   5542    if (g->flags & 0x80)
   5543       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
   5544 
   5545    return 1;
   5546 }
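stbi__gif_header reads the 6-byte signature, then the logical screen descriptor: 16-bit little-endian width and height, a packed flags byte, the background color index and the pixel aspect ratio. When bit 0x80 of the flags is set, a global color table of `2 << (flags & 7)` RGB triples follows. A minimal sketch of the same parse over an in-memory buffer, with a hypothetical `gif_screen` struct standing in for `stbi__gif`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct mirroring the screen fields of stbi__gif. */
typedef struct {
   int w, h, flags, bgindex, ratio;
   int global_table_entries;   /* 0 when there is no global color table */
} gif_screen;

/* Hypothetical sketch (not stb_image API): parse the 6-byte signature and
 * 7-byte logical screen descriptor at the start of a GIF held in memory.
 * Returns 1 on success, 0 if the buffer is too short or not a GIF. */
static int gif_parse_screen(const unsigned char *b, size_t len, gif_screen *g)
{
   if (len < 13) return 0;
   if (memcmp(b, "GIF87a", 6) != 0 && memcmp(b, "GIF89a", 6) != 0) return 0;
   g->w = b[6] | (b[7] << 8);              /* little-endian 16-bit */
   g->h = b[8] | (b[9] << 8);
   g->flags = b[10];
   g->bgindex = b[11];
   g->ratio = b[12];
   /* flags bit 0x80: a global color table of 2^(N+1) RGB triples follows */
   g->global_table_entries = (g->flags & 0x80) ? 2 << (g->flags & 7) : 0;
   return 1;
}
```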
   5547 
   5548 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
   5549 {
   5550    stbi__gif g;
   5551    if (!stbi__gif_header(s, &g, comp, 1)) {
   5552       stbi__rewind( s );
   5553       return 0;
   5554    }
   5555    if (x) *x = g.w;
   5556    if (y) *y = g.h;
   5557    return 1;
   5558 }
   5559 
   5560 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
   5561 {
   5562    stbi_uc *p, *c;
   5563 
   5564    // recurse to decode the prefixes, since the linked-list is backwards,
   5565    // and working backwards through an interleaved image would be nasty
   5566    if (g->codes[code].prefix >= 0)
   5567       stbi__out_gif_code(g, g->codes[code].prefix);
   5568 
   5569    if (g->cur_y >= g->max_y) return;
   5570 
   5571    p = &g->out[g->cur_x + g->cur_y];
   5572    c = &g->color_table[g->codes[code].suffix * 4];
   5573 
   5574    if (c[3] >= 128) {
   5575       p[0] = c[2];
   5576       p[1] = c[1];
   5577       p[2] = c[0];
   5578       p[3] = c[3];
   5579    }
   5580    g->cur_x += 4;
   5581 
   5582    if (g->cur_x >= g->max_x) {
   5583       g->cur_x = g->start_x;
   5584       g->cur_y += g->step;
   5585 
   5586       while (g->cur_y >= g->max_y && g->parse > 0) {
   5587          g->step = (1 << g->parse) * g->line_size;
   5588          g->cur_y = g->start_y + (g->step >> 1);
   5589          --g->parse;
   5590       }
   5591    }
   5592 }
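For interlaced images, stbi__out_gif_code starts with a step of 8 lines and `parse = 3`; each time the current row runs past the frame it sets the step to `1 << parse` lines and restarts at half that step. This produces the four GIF89a passes: every 8th row from 0, every 8th from 4, every 4th from 2, then every 2nd from 1. The sketch below replays the same state machine in row units rather than byte offsets; `gif_interlace_order` is a hypothetical helper.

```c
/* Hypothetical sketch: fill rows[0..height-1] with the order in which an
 * interlaced GIF delivers its rows, using the same step/parse state machine
 * as stbi__out_gif_code (row units instead of byte offsets).
 * Returns the number of rows written, which is always height. */
static int gif_interlace_order(int height, int *rows)
{
   int parse = 3, step = 8, y = 0, n = 0;
   while (n < height) {
      rows[n++] = y;
      y += step;
      while (y >= height && parse > 0) {   /* start the next pass */
         step = 1 << parse;
         y = step >> 1;
         --parse;
      }
      if (y >= height) break;               /* all passes exhausted */
   }
   return n;
}
```

For a 10-row image this yields rows 0, 8, 4, 2, 6, 1, 3, 5, 7, 9.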
   5593 
   5594 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
   5595 {
   5596    stbi_uc lzw_cs;
   5597    stbi__int32 len, init_code;
   5598    stbi__uint32 first;
   5599    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   5600    stbi__gif_lzw *p;
   5601 
   5602    lzw_cs = stbi__get8(s);
   5603    if (lzw_cs > 12) return NULL;
   5604    clear = 1 << lzw_cs;
   5605    first = 1;
   5606    codesize = lzw_cs + 1;
   5607    codemask = (1 << codesize) - 1;
   5608    bits = 0;
   5609    valid_bits = 0;
   5610    for (init_code = 0; init_code < clear; init_code++) {
   5611       g->codes[init_code].prefix = -1;
   5612       g->codes[init_code].first = (stbi_uc) init_code;
   5613       g->codes[init_code].suffix = (stbi_uc) init_code;
   5614    }
   5615 
   5616    // initialize as if a clear code had just been read (a stream that does not start with one is still rejected below via 'first')
   5617    avail = clear+2;
   5618    oldcode = -1;
   5619 
   5620    len = 0;
   5621    for(;;) {
   5622       if (valid_bits < codesize) {
   5623          if (len == 0) {
   5624             len = stbi__get8(s); // start new block
   5625             if (len == 0)
   5626                return g->out;
   5627          }
   5628          --len;
   5629          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
   5630          valid_bits += 8;
   5631       } else {
   5632          stbi__int32 code = bits & codemask;
   5633          bits >>= codesize;
   5634          valid_bits -= codesize;
   5635          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
   5636          if (code == clear) {  // clear code
   5637             codesize = lzw_cs + 1;
   5638             codemask = (1 << codesize) - 1;
   5639             avail = clear + 2;
   5640             oldcode = -1;
   5641             first = 0;
   5642          } else if (code == clear + 1) { // end of stream code
   5643             stbi__skip(s, len);
   5644             while ((len = stbi__get8(s)) > 0)
   5645                stbi__skip(s,len);
   5646             return g->out;
   5647          } else if (code <= avail) {
   5648             if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
   5649 
   5650             if (oldcode >= 0) {
   5651                p = &g->codes[avail++];
   5652                if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
   5653                p->prefix = (stbi__int16) oldcode;
   5654                p->first = g->codes[oldcode].first;
   5655                p->suffix = (code == avail) ? p->first : g->codes[code].first;
   5656             } else if (code == avail)
   5657                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   5658 
   5659             stbi__out_gif_code(g, (stbi__uint16) code);
   5660 
   5661             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
   5662                codesize++;
   5663                codemask = (1 << codesize) - 1;
   5664             }
   5665 
   5666             oldcode = code;
   5667          } else {
   5668             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   5669          }
   5670       }
   5671    }
   5672 }
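stbi__process_gif_raster reads LZW codes from a bit stream packed least-significant-bit first: each new byte is ORed into `bits` above the `valid_bits` already pending, and codes are taken from the low end. A minimal sketch of just that accumulator, assuming a fixed code width and one contiguous data buffer (the real decoder also walks the sub-block lengths and grows `codesize` as the dictionary fills); `gif_read_codes` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: extract up to `ncodes` codes of `codesize` bits each
 * from an LSB-first packed byte stream, using the same accumulator scheme
 * as stbi__process_gif_raster. Returns the number of codes extracted. */
static int gif_read_codes(const unsigned char *src, size_t len, int codesize,
                          int *codes, int ncodes)
{
   unsigned long bits = 0;        /* pending bits, oldest in the low end */
   int valid_bits = 0, n = 0;
   int mask = (1 << codesize) - 1;
   size_t in = 0;
   while (n < ncodes) {
      if (valid_bits < codesize) {
         if (in >= len) break;                        /* out of input */
         bits |= (unsigned long) src[in++] << valid_bits;
         valid_bits += 8;
      } else {
         codes[n++] = (int) (bits & mask);
         bits >>= codesize;
         valid_bits -= codesize;
      }
   }
   return n;
}
```

For example, the bytes 0x8C 0x0B read with a 3-bit width give the codes 4, 1, 6, 5; with `lzw_cs = 2` that is a clear code, two data codes and the end-of-information code.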
   5673 
   5674 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
   5675 {
   5676    int x, y;
   5677    stbi_uc *c = g->pal[g->bgindex];
   5678    for (y = y0; y < y1; y += 4 * g->w) {
   5679       for (x = x0; x < x1; x += 4) {
   5680          stbi_uc *p  = &g->out[y + x];
   5681          p[0] = c[2];
   5682          p[1] = c[1];
   5683          p[2] = c[0];
   5684          p[3] = 0;
   5685       }
   5686    }
   5687 }
   5688 
   5689 // this function is designed to support animated gifs, although stb_image's public API only returns the first frame
   5690 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
   5691 {
   5692    int i;
   5693    stbi_uc *prev_out = 0;
   5694 
   5695    if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
   5696       return 0; // stbi__g_failure_reason set by stbi__gif_header
   5697 
   5698    prev_out = g->out;
   5699    g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
   5700    if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
   5701 
   5702    switch ((g->eflags & 0x1C) >> 2) {
   5703       case 0: // unspecified (also always used on 1st frame)
   5704          stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
   5705          break;
   5706       case 1: // do not dispose
   5707          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
   5708          g->old_out = prev_out;
   5709          break;
   5710       case 2: // dispose to background
   5711          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
   5712          stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
   5713          break;
   5714       case 3: // dispose to previous
   5715          if (g->old_out) {
   5716             for (i = g->start_y; i < g->max_y; i += 4 * g->w)
   5717                memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
   5718          }
   5719          break;
   5720    }
   5721 
   5722    for (;;) {
   5723       switch (stbi__get8(s)) {
   5724          case 0x2C: /* Image Descriptor */
   5725          {
   5726             int prev_trans = -1;
   5727             stbi__int32 x, y, w, h;
   5728             stbi_uc *o;
   5729 
   5730             x = stbi__get16le(s);
   5731             y = stbi__get16le(s);
   5732             w = stbi__get16le(s);
   5733             h = stbi__get16le(s);
   5734             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
   5735                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
   5736 
   5737             g->line_size = g->w * 4;
   5738             g->start_x = x * 4;
   5739             g->start_y = y * g->line_size;
   5740             g->max_x   = g->start_x + w * 4;
   5741             g->max_y   = g->start_y + h * g->line_size;
   5742             g->cur_x   = g->start_x;
   5743             g->cur_y   = g->start_y;
   5744 
   5745             g->lflags = stbi__get8(s);
   5746 
   5747             if (g->lflags & 0x40) {
   5748                g->step = 8 * g->line_size; // first interlaced spacing
   5749                g->parse = 3;
   5750             } else {
   5751                g->step = g->line_size;
   5752                g->parse = 0;
   5753             }
   5754 
   5755             if (g->lflags & 0x80) {
   5756                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
   5757                g->color_table = (stbi_uc *) g->lpal;
   5758             } else if (g->flags & 0x80) {
   5759                if (g->transparent >= 0 && (g->eflags & 0x01)) {
   5760                   prev_trans = g->pal[g->transparent][3];
   5761                   g->pal[g->transparent][3] = 0;
   5762                }
   5763                g->color_table = (stbi_uc *) g->pal;
   5764             } else
   5765                return stbi__errpuc("missing color table", "Corrupt GIF");
   5766 
   5767             o = stbi__process_gif_raster(s, g);
   5768             if (o == NULL) return NULL;
   5769 
   5770             if (prev_trans != -1)
   5771                g->pal[g->transparent][3] = (stbi_uc) prev_trans;
   5772 
   5773             return o;
   5774          }
   5775 
   5776          case 0x21: // Extension Introducer (Graphic Control, Comment, Application, ...)
   5777          {
   5778             int len;
   5779             if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
   5780                len = stbi__get8(s);
   5781                if (len == 4) {
   5782                   g->eflags = stbi__get8(s);
   5783                   g->delay = stbi__get16le(s);
   5784                   g->transparent = stbi__get8(s);
   5785                } else {
   5786                   stbi__skip(s, len); // nonstandard block size: skip its data;
   5787                   // the sub-block loop below then consumes the block terminator
   5788                }
   5789             }
   5790             while ((len = stbi__get8(s)) != 0)
   5791                stbi__skip(s, len);
   5792             break;
   5793          }
   5794 
   5795          case 0x3B: // gif stream termination code
   5796             return (stbi_uc *) s; // using '1' causes warning on some compilers
   5797 
   5798          default:
   5799             return stbi__errpuc("unknown code", "Corrupt GIF");
   5800       }
   5801    }
   5802 
   5803    STBI_NOTUSED(req_comp);
   5804 }
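The Graphic Control Extension parsed in the 0x21 case stores a packed byte in `g->eflags`: bits 2-4 hold the disposal method switched on at the top of stbi__gif_load_next, and bit 0 says whether `g->transparent` is meaningful. A minimal sketch decoding a well-formed GCE from memory; the `gif_gce` struct and function are hypothetical, and folding the transparency flag into `transparent = -1` is a simplification of how the loader tests `eflags & 0x01` separately.

```c
#include <stddef.h>

/* Hypothetical struct: the fields stbi__gif keeps from a GCE. */
typedef struct {
   int disposal;     /* 0 unspecified, 1 keep, 2 restore background, 3 restore previous */
   int delay_cs;     /* frame delay in hundredths of a second */
   int transparent;  /* palette index, or -1 if the transparency flag is clear */
} gif_gce;

/* Hypothetical sketch: decode an 8-byte Graphic Control Extension
 * (0x21 0xF9 0x04 packed delay_lo delay_hi index 0x00). Returns 1 on success. */
static int gif_parse_gce(const unsigned char *b, size_t len, gif_gce *out)
{
   if (len < 8 || b[0] != 0x21 || b[1] != 0xF9 || b[2] != 4 || b[7] != 0)
      return 0;
   out->disposal = (b[3] & 0x1C) >> 2;      /* same extraction as the loader */
   out->delay_cs = b[4] | (b[5] << 8);      /* little-endian 16-bit */
   out->transparent = (b[3] & 0x01) ? b[6] : -1;
   return 1;
}
```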
   5805 
   5806 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   5807 {
   5808    stbi_uc *u = 0;
   5809    stbi__gif g;
   5810    memset(&g, 0, sizeof(g));
   5811 
   5812    u = stbi__gif_load_next(s, &g, comp, req_comp);
   5813    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   5814    if (u) {
   5815       *x = g.w;
   5816       *y = g.h;
   5817       if (req_comp && req_comp != 4)
   5818          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
   5819    }
   5820    else if (g.out)
   5821       STBI_FREE(g.out);
   5822 
   5823    return u;
   5824 }
   5825 
   5826 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
   5827 {
   5828    return stbi__gif_info_raw(s,x,y,comp);
   5829 }
   5830 #endif
   5831 
   5832 // *************************************************************************************************
   5833 // Radiance RGBE HDR loader
   5834 // originally by Nicolas Schulz
   5835 #ifndef STBI_NO_HDR
   5836 static int stbi__hdr_test_core(stbi__context *s)
   5837 {
   5838    const char *signature = "#?RADIANCE\n";
   5839    int i;
   5840    for (i=0; signature[i]; ++i)
   5841       if (stbi__get8(s) != signature[i])
   5842          return 0;
   5843    return 1;
   5844 }
   5845 
   5846 static int stbi__hdr_test(stbi__context* s)
   5847 {
   5848    int r = stbi__hdr_test_core(s);
   5849    stbi__rewind(s);
   5850    return r;
   5851 }
   5852 
   5853 #define STBI__HDR_BUFLEN  1024
   5854 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
   5855 {
   5856    int len=0;
   5857    char c = '\0';
   5858 
   5859    c = (char) stbi__get8(z);
   5860 
   5861    while (!stbi__at_eof(z) && c != '\n') {
   5862       buffer[len++] = c;
   5863       if (len == STBI__HDR_BUFLEN-1) {
   5864          // flush to end of line
   5865          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
   5866             ;
   5867          break;
   5868       }
   5869       c = (char) stbi__get8(z);
   5870    }
   5871 
   5872    buffer[len] = 0;
   5873    return buffer;
   5874 }
   5875 
   5876 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
   5877 {
   5878    if ( input[3] != 0 ) {
   5879       float f1;
   5880       // Exponent
   5881       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
   5882       if (req_comp <= 2)
   5883          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
   5884       else {
   5885          output[0] = input[0] * f1;
   5886          output[1] = input[1] * f1;
   5887          output[2] = input[2] * f1;
   5888       }
   5889       if (req_comp == 2) output[1] = 1;
   5890       if (req_comp == 4) output[3] = 1;
   5891    } else {
   5892       switch (req_comp) {
   5893          case 4: output[3] = 1; /* fallthrough */
   5894          case 3: output[0] = output[1] = output[2] = 0;
   5895                  break;
   5896          case 2: output[1] = 1; /* fallthrough */
   5897          case 1: output[0] = 0;
   5898                  break;
   5899       }
   5900    }
   5901 }
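stbi__hdr_convert decodes Radiance RGBE: the three 8-bit mantissas share the exponent byte E, and each component is mantissa * 2^(E - 136), where 136 is the exponent bias of 128 plus 8 for the mantissa's 8 bits; E == 0 encodes black. A minimal sketch of the 3-component case, without the luminance averaging the loader applies when `req_comp <= 2`:

```c
#include <math.h>

/* Sketch of the RGBE -> linear float conversion in stbi__hdr_convert,
 * restricted to the 3-component case. An exponent byte of 0 encodes black. */
static void rgbe_to_float(const unsigned char rgbe[4], float out[3])
{
   if (rgbe[3] == 0) {
      out[0] = out[1] = out[2] = 0.0f;
   } else {
      float f = (float) ldexp(1.0, rgbe[3] - (128 + 8));  /* 2^(E-136) */
      out[0] = rgbe[0] * f;
      out[1] = rgbe[1] * f;
      out[2] = rgbe[2] * f;
   }
}
```

For instance, {128, 64, 0, 129} decodes to (1.0, 0.5, 0.0), since 2^(129-136) = 1/128.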
   5902 
   5903 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   5904 {
   5905    char buffer[STBI__HDR_BUFLEN];
   5906    char *token;
   5907    int valid = 0;
   5908    int width, height;
   5909    stbi_uc *scanline;
   5910    float *hdr_data;
   5911    int len;
   5912    unsigned char count, value;
   5913    int i, j, k, c1,c2, z;
   5914 
   5915 
   5916    // Check identifier
   5917    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
   5918       return stbi__errpf("not HDR", "Corrupt HDR image");
   5919 
   5920    // Parse header
   5921    for(;;) {
   5922       token = stbi__hdr_gettoken(s,buffer);
   5923       if (token[0] == 0) break;
   5924       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   5925    }
   5926 
   5927    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
   5928 
   5929    // Parse width and height
   5930    // can't use sscanf() if we're not using stdio!
   5931    token = stbi__hdr_gettoken(s,buffer);
   5932    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   5933    token += 3;
   5934    height = (int) strtol(token, &token, 10);
   5935    while (*token == ' ') ++token;
   5936    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   5937    token += 3;
   5938    width = (int) strtol(token, NULL, 10);
   5939 
   5940    *x = width;
   5941    *y = height;
   5942 
   5943    if (comp) *comp = 3;
   5944    if (req_comp == 0) req_comp = 3;
   5945 
   5946    // Read data
   5947    hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
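   5948    if (hdr_data == NULL) return stbi__errpf("outofmem", "Out of memory");
   5949    // Load image data
   5950    // image data is stored as height scanlines of width pixels, either as flat RGBE or in new-style run-length encoding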
   5951    if ( width < 8 || width >= 32768) {
   5952       // Read flat data
   5953       for (j=0; j < height; ++j) {
   5954          for (i=0; i < width; ++i) {
   5955             stbi_uc rgbe[4];
   5956            main_decode_loop:
   5957             stbi__getn(s, rgbe, 4);
   5958             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
   5959          }
   5960       }
   5961    } else {
   5962       // Read RLE-encoded data
   5963       scanline = NULL;
   5964 
   5965       for (j = 0; j < height; ++j) {
   5966          c1 = stbi__get8(s);
   5967          c2 = stbi__get8(s);
   5968          len = stbi__get8(s);
   5969          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
   5970             // not run-length encoded, so we have to actually use THIS data as a decoded
   5971             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
   5972             stbi_uc rgbe[4];
   5973             rgbe[0] = (stbi_uc) c1;
   5974             rgbe[1] = (stbi_uc) c2;
   5975             rgbe[2] = (stbi_uc) len;
   5976             rgbe[3] = (stbi_uc) stbi__get8(s);
   5977             stbi__hdr_convert(hdr_data, rgbe, req_comp);
   5978             i = 1;
   5979             j = 0;
   5980             STBI_FREE(scanline);
   5981             goto main_decode_loop; // not RLE after all: fall back to the flat decoder, resuming after pixel 0 of row 0
   5982          }
   5983          len <<= 8;
   5984          len |= stbi__get8(s);
   5985          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
   5986          if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
   5987          if (scanline == NULL) { STBI_FREE(hdr_data); return stbi__errpf("outofmem", "Out of memory"); }
   5988          for (k = 0; k < 4; ++k) {
   5989             i = 0;
   5990             while (i < width) {
   5991                count = stbi__get8(s);
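   5992                if (count > 128) { // Run
   5993                   value = stbi__get8(s);
   5994                   count -= 128;
   5995                   if (count > width - i) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   5996                   for (z = 0; z < count; ++z)
   5997                      scanline[i++ * 4 + k] = value;
   5998                } else { // Dump
   5999                   if (count > width - i) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }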
   6000                   for (z = 0; z < count; ++z)
   6001                      scanline[i++ * 4 + k] = stbi__get8(s);
   6002                }
   6003             }
   6004          }
   6005          for (i=0; i < width; ++i)
   6006             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
   6007       }
   6008       STBI_FREE(scanline);
   6009    }
   6010 
   6011    return hdr_data;
   6012 }
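In the run-length branch of stbi__hdr_load, each scanline starts with the bytes 2, 2 and a 16-bit big-endian width whose high byte is below 128; the four components are then stored one after another, each as runs (a count above 128 repeats the next byte count-128 times) and dumps (a count of 128 or less copies that many literal bytes). The sketch below decodes one such scanline from memory into interleaved RGBE and rejects any count that would overrun the scanline. `hdr_rle_scanline` is hypothetical; the loader only tries this layout for widths from 8 to 32767.

```c
#include <stddef.h>

/* Hypothetical sketch: decode one new-style RLE Radiance scanline from src
 * into out (width*4 bytes, interleaved RGBE). Returns bytes consumed, or -1
 * if the header is not RLE, the width differs, the data is truncated, or a
 * run/dump would overrun the scanline. */
static long hdr_rle_scanline(const unsigned char *src, size_t len,
                             unsigned char *out, int width)
{
   size_t in = 4;
   int k;
   if (len < 4 || src[0] != 2 || src[1] != 2 || (src[2] & 0x80)) return -1;
   if (((src[2] << 8) | src[3]) != width) return -1;
   for (k = 0; k < 4; ++k) {                 /* one plane per component */
      int i = 0;
      while (i < width) {
         int count;
         if (in >= len) return -1;
         count = src[in++];
         if (count > 128) {                  /* run of count-128 copies */
            count -= 128;
            if (count > width - i || in >= len) return -1;
            while (count--) out[i++ * 4 + k] = src[in];
            ++in;
         } else {                            /* dump of count literals */
            if (count > width - i || len - in < (size_t) count) return -1;
            while (count--) out[i++ * 4 + k] = src[in++];
         }
      }
   }
   return (long) in;
}
```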
   6013 
   6014 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
   6015 {
   6016    char buffer[STBI__HDR_BUFLEN];
   6017    char *token;
   6018    int valid = 0;
   6019 
   6020    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
   6021        stbi__rewind( s );
   6022        return 0;
   6023    }
   6024 
   6025    for(;;) {
   6026       token = stbi__hdr_gettoken(s,buffer);
   6027       if (token[0] == 0) break;
   6028       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   6029    }
   6030 
   6031    if (!valid) {
   6032        stbi__rewind( s );
   6033        return 0;
   6034    }
   6035    token = stbi__hdr_gettoken(s,buffer);
   6036    if (strncmp(token, "-Y ", 3)) {
   6037        stbi__rewind( s );
   6038        return 0;
   6039    }
   6040    token += 3;
   6041    *y = (int) strtol(token, &token, 10);
   6042    while (*token == ' ') ++token;
   6043    if (strncmp(token, "+X ", 3)) {
   6044        stbi__rewind( s );
   6045        return 0;
   6046    }
   6047    token += 3;
   6048    *x = (int) strtol(token, NULL, 10);
   6049    *comp = 3;
   6050    return 1;
   6051 }
   6052 #endif // STBI_NO_HDR
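Both stbi__hdr_load and stbi__hdr_info accept only the standard resolution line `-Y <height> +X <width>` (rows top to bottom, pixels left to right) and parse it with strtol rather than sscanf so that the code works without stdio. A minimal sketch of that parse as a standalone, hypothetical function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the resolution-line parse used by stbi__hdr_load
 * and stbi__hdr_info: only "-Y <height> +X <width>" is accepted.
 * Returns 1 on success, 0 for any other layout. */
static int hdr_parse_resolution(const char *line, int *w, int *h)
{
   char *end;
   if (strncmp(line, "-Y ", 3) != 0) return 0;
   *h = (int) strtol(line + 3, &end, 10);
   while (*end == ' ') ++end;
   if (strncmp(end, "+X ", 3) != 0) return 0;
   *w = (int) strtol(end + 3, NULL, 10);
   return 1;
}
```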

#ifndef STBI_NO_BMP
static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
{
   int hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s,12);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
       stbi__rewind( s );
       return 0;
   }
   if (hsz == 12) {
      *x = stbi__get16le(s);
      *y = stbi__get16le(s);
   } else {
      *x = stbi__get32le(s);
      *y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   *comp = stbi__get16le(s) / 8;
   return 1;
}
#endif
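
/* Illustrative sketch, not part of stb_image: stbi__bmp_info reads only a
   handful of little-endian fields: the "BM" signature, the header size at
   byte offset 14 (after the 12 skipped bytes), then width, height, plane
   count and bits per pixel. bmp_info_mem and its readers are hypothetical
   names; unlike the stream version, this one also checks the buffer length.

```c
#include <stddef.h>

// Hypothetical little-endian readers (stand-ins for stbi__get16le/32le).
static unsigned rd16le(const unsigned char *p)
{
   return (unsigned) p[0] | ((unsigned) p[1] << 8);
}

static unsigned long rd32le(const unsigned char *p)
{
   return (unsigned long) p[0]         | ((unsigned long) p[1] << 8) |
          ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}

// Hypothetical in-memory version of the checks in stbi__bmp_info.
static int bmp_info_mem(const unsigned char *buf, size_t len,
                        int *x, int *y, int *comp)
{
   unsigned long hsz;
   const unsigned char *p;

   if (len < 18 || buf[0] != 'B' || buf[1] != 'M') return 0;
   hsz = rd32le(buf + 14);              // 14 = 2 (magic) + 12 skipped bytes
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124)
      return 0;
   p = buf + 18;                        // first field after the header size
   if (hsz == 12) {                     // 12-byte core header: 16-bit sizes
      if (len < 26) return 0;
      *x = (int) rd16le(p);
      *y = (int) rd16le(p + 2);
      p += 4;
   } else {                             // larger headers: 32-bit sizes
      if (len < 30) return 0;
      *x = (int) rd32le(p);
      *y = (int) rd32le(p + 4);
      p += 8;
   }
   if (rd16le(p) != 1) return 0;        // plane count must be 1
   *comp = (int) rd16le(p + 2) / 8;     // bits per pixel to bytes per pixel
   return 1;
}
```
*/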

#ifndef STBI_NO_PSD
static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
{
   int channelCount;
   if (stbi__get32be(s) != 0x38425053) {
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s, 6);
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16) {
       stbi__rewind( s );
       return 0;
   }
   *y = stbi__get32be(s);
   *x = stbi__get32be(s);
   if (stbi__get16be(s) != 8) {
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 3) {
       stbi__rewind( s );
       return 0;
   }
   *comp = 4;
   return 1;
}
#endif
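
/* Illustrative sketch, not part of stb_image: the PSD fields tested by
   stbi__psd_info sit at fixed big-endian offsets in a 26-byte header, so
   the same checks can be written against a buffer. psd_info_mem and its
   readers are hypothetical names. Like the original, it reports 4
   components for an accepted 8-bit RGB file.

```c
#include <stddef.h>

// Hypothetical big-endian readers (stand-ins for stbi__get16be/32be).
static unsigned rd16be(const unsigned char *p)
{
   return ((unsigned) p[0] << 8) | (unsigned) p[1];
}

static unsigned long rd32be(const unsigned char *p)
{
   return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16) |
          ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
}

// Hypothetical in-memory version of the checks in stbi__psd_info.
// Header layout: signature(4) version(2) reserved(6) channels(2)
//                rows(4) columns(4) depth(2) color mode(2) = 26 bytes.
static int psd_info_mem(const unsigned char *buf, size_t len,
                        int *x, int *y, int *comp)
{
   if (len < 26) return 0;
   if (rd32be(buf) != 0x38425053UL) return 0;   // "8BPS"
   if (rd16be(buf + 4) != 1) return 0;          // version 1
   if (rd16be(buf + 12) > 16) return 0;         // channel count
   *y = (int) rd32be(buf + 14);                 // rows come before columns
   *x = (int) rd32be(buf + 18);
   if (rd16be(buf + 22) != 8) return 0;         // 8 bits per channel only
   if (rd16be(buf + 24) != 3) return 0;         // color mode 3 = RGB
   *comp = 4;                                   // RGB is reported as RGBA
   return 1;
}
```
*/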

#ifndef STBI_NO_PIC
static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
{
   int act_comp=0,num_packets=0,chained;
   stbi__pic_packet packets[10];

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
      stbi__rewind(s);
      return 0;
   }

   stbi__skip(s, 88);

   *x = stbi__get16be(s);
   *y = stbi__get16be(s);
   if (stbi__at_eof(s)) {
      stbi__rewind( s);
      return 0;
   }
   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
      stbi__rewind( s );
      return 0;
   }

   stbi__skip(s, 8);

   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
         stbi__rewind( s );   // like every other failure path, so the next probe starts at byte 0
         return 0;
      }

      packet = &packets[num_packets++];
      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);
      act_comp |= packet->channel;

      if (stbi__at_eof(s)) {
          stbi__rewind( s );
          return 0;
      }
      if (packet->size != 8) {
          stbi__rewind( s );
          return 0;
      }
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3);

   return 1;
}
#endif
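
/* Illustrative sketch, not part of stb_image: after the fixed header, a
   Softimage PIC file carries a chain of 4-byte channel packets (chained
   flag, size, type, channel mask). stbi__pic_info ORs the channel masks
   together, accepts only size 8, allows at most 10 packets, and reports 4
   components when bit 0x10 is set. The hypothetical function
   pic_comp_from_packets performs the same walk over a buffer that starts at
   the first packet.

```c
#include <stddef.h>

// Hypothetical sketch of the packet walk in stbi__pic_info, over a buffer
// that starts at the first channel packet. Each packet is 4 bytes:
// chained flag, size (bits per channel), type, channel mask.
// Returns the component count (3 or 4), or 0 if the packets are rejected.
static int pic_comp_from_packets(const unsigned char *p, size_t len)
{
   int act_comp = 0, num_packets = 0, chained;
   size_t off = 0;

   do {
      if (num_packets == 10) return 0;       // same limit as packets[10]
      if (off + 4 > len) return 0;           // truncated packet list
      chained = p[off];
      if (p[off + 1] != 8) return 0;         // only 8-bit channels accepted
      act_comp |= p[off + 3];                // accumulate channel masks
      off += 4;
      ++num_packets;
   } while (chained);

   return (act_comp & 0x10) ? 4 : 3;         // 0x10 marks an alpha channel
}
```
*/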

// *************************************************************************************************
// Portable Gray Map and Portable Pixel Map loader
// by Ken Miller
//
// PGM: http://netpbm.sourceforge.net/doc/pgm.html
// PPM: http://netpbm.sourceforge.net/doc/ppm.html
//
// Known limitations:
//    Does not support comments in the header section
//    Does not support ASCII image data (formats P2 and P3)
//    Does not support 16-bit-per-channel

#ifndef STBI_NO_PNM

static int      stbi__pnm_test(stbi__context *s)
{
   char p, t;
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }
   return 1;
}

static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
      return 0;
   *x = s->img_x;
   *y = s->img_y;
   *comp = s->img_n;

   out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   stbi__getn(s, out, s->img_n * s->img_x * s->img_y);

   if (req_comp && req_comp != s->img_n) {
      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }
   return out;
}

static int      stbi__pnm_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
{
   while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
      *c = (char) stbi__get8(s);
}

static int      stbi__pnm_isdigit(char c)
{
   return c >= '0' && c <= '9';
}

static int      stbi__pnm_getinteger(stbi__context *s, char *c)
{
   int value = 0;

   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
      value = value*10 + (*c - '0');
      *c = (char) stbi__get8(s);
   }

   return value;
}

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv;
   char c, p, t;

   stbi__rewind( s );

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
#endif
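
/* Illustrative sketch, not part of stb_image: the header grammar handled by
   stbi__pnm_info is small: the magic "P5" or "P6", then width, height and
   maximum value as decimal integers separated by whitespace. The
   hypothetical function pnm_info_mem parses the same header from a byte
   buffer. It is slightly stricter than the stream version: it fails when a
   number is missing, rejects a maximum value of 0, and caps digit runs to
   avoid overflow. Like the original, it does not handle '#' comments.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical buffer-based version of the header parse in stbi__pnm_info.
// Accepts "P5" (1-component PGM) or "P6" (3-component PPM), followed by
// whitespace-separated width, height and maxval; maxval must be 1..255.
static int pnm_info_mem(const unsigned char *buf, size_t len,
                        int *x, int *y, int *comp)
{
   long vals[3];
   size_t i = 2;
   int k;

   if (len < 2 || buf[0] != 'P' || (buf[1] != '5' && buf[1] != '6'))
      return 0;
   *comp = (buf[1] == '6') ? 3 : 1;

   for (k = 0; k < 3; ++k) {
      long v = 0;
      while (i < len && isspace(buf[i])) ++i;          // skip separators
      if (i >= len || !isdigit(buf[i])) return 0;      // number required
      while (i < len && isdigit(buf[i])) {
         v = v * 10 + (buf[i++] - '0');
         if (v > 99999999L) return 0;                  // keep v far from overflow
      }
      vals[k] = v;
   }

   *x = (int) vals[0];
   *y = (int) vals[1];
   return vals[2] >= 1 && vals[2] <= 255;              // 8-bit data only
}
```
*/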

static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}
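
/* Illustrative sketch, not part of stb_image: stbi__info_main relies on a
   simple contract. Each format probe either succeeds, or rewinds the stream
   before returning 0 so that the next probe sees the data from the start;
   probes with distinctive signatures run first, and TGA runs last. A
   minimal model of that pattern over an in-memory cursor; the cursor type,
   both probes and info_dispatch are hypothetical stand-ins, not stb_image
   APIs.

```c
#include <stddef.h>

// Hypothetical stand-in for stbi__context: a read cursor over memory.
typedef struct { const unsigned char *buf; size_t len, pos; } cursor;

static int next8(cursor *c) { return c->pos < c->len ? c->buf[c->pos++] : 0; }
static void rewind_cursor(cursor *c) { c->pos = 0; }

// Hypothetical probe: "BM" magic, reports 3 components.
static int probe_bm(cursor *c, int *x, int *y, int *comp)
{
   if (next8(c) != 'B' || next8(c) != 'M') { rewind_cursor(c); return 0; }
   *x = *y = 1;
   *comp = 3;
   return 1;
}

// Hypothetical probe: "P5" magic, reports 1 component.
static int probe_p5(cursor *c, int *x, int *y, int *comp)
{
   if (next8(c) != 'P' || next8(c) != '5') { rewind_cursor(c); return 0; }
   *x = *y = 1;
   *comp = 1;
   return 1;
}

typedef int (*probe_fn)(cursor *, int *, int *, int *);

// Try each probe in order; a failed probe has already rewound the cursor.
static int info_dispatch(cursor *c, int *x, int *y, int *comp)
{
   static const probe_fn probes[] = { probe_bm, probe_p5 };
   size_t i;
   for (i = 0; i < sizeof probes / sizeof probes[0]; ++i)
      if (probes[i](c, x, y, comp)) return 1;
   return 0;
}
```

   Without the rewind in probe_bm, probe_p5 would start reading at the
   second byte of "P5" and miss a valid match.
*/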

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO
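
/* Illustrative sketch, not part of stb_image: stbi_info_from_file saves the
   caller's position with ftell and restores it with fseek, so probing an
   open stream does not disturb later reads from it. The same save/restore
   pattern in isolation; peek_magic is a hypothetical helper that reads two
   bytes and puts the stream back where it was.

```c
#include <stdio.h>

// Hypothetical helper: reads the first two bytes at the current position
// and restores the position afterwards, as stbi_info_from_file does.
// Returns 1 if both bytes were read, 0 otherwise.
static int peek_magic(FILE *f, int *b0, int *b1)
{
   long pos = ftell(f);         // remember where the caller was
   int ok;

   if (pos < 0) return 0;       // position unknown: cannot restore it
   *b0 = fgetc(f);
   *b1 = fgetc(f);
   ok = (*b0 != EOF && *b1 != EOF);
   fseek(f, pos, SEEK_SET);     // put the stream back
   return ok;
}
```

   The pos < 0 check is an addition in this sketch; stbi_info_from_file
   itself does not test the ftell result.
*/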

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) fix compiler warnings
                         partial animated GIF support
                         limited 16-bit PSD support
                         #ifdef unused functions
                         bug with < 92 byte PIC,PNM,HDR,TGA
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) extra corruption checking (mmozeiko)
                         stbi_set_flip_vertically_on_load (nguillemot)
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50  (2006-11-19)
              first released version
*/
