# Stream

In RapidJSON, `rapidjson::Stream` is a concept for reading/writing JSON. This section first shows how to use the streams provided by the library, and then how to create a custom stream.

[TOC]

# Memory Streams {#MemoryStreams}

Memory streams store JSON in memory.

## StringStream (Input) {#StringStream}

`StringStream` is the most basic input stream. It represents a complete, read-only JSON text stored in memory. It is defined in `rapidjson/rapidjson.h`.

~~~~~~~~~~cpp
#include "rapidjson/document.h" // will include "rapidjson/rapidjson.h"

using namespace rapidjson;

// ...
const char json[] = "[1, 2, 3, 4]";
StringStream s(json);

Document d;
d.ParseStream(s);
~~~~~~~~~~

Since this usage is very common, `Document::Parse(const char*)` is provided to do exactly the same as above:

~~~~~~~~~~cpp
// ...
const char json[] = "[1, 2, 3, 4]";
Document d;
d.Parse(json);
~~~~~~~~~~

Note that `StringStream` is a typedef of `GenericStringStream<UTF8<> >`. Users may specify another encoding to represent the character set of the stream.

## StringBuffer (Output) {#StringBuffer}

`StringBuffer` is a simple output stream. It allocates a memory buffer for writing the whole JSON. Use `GetString()` to obtain the buffer.

~~~~~~~~~~cpp
#include "rapidjson/stringbuffer.h"
#include "rapidjson/writer.h"   // Writer

StringBuffer buffer;
Writer<StringBuffer> writer(buffer);
d.Accept(writer);

const char* output = buffer.GetString();
~~~~~~~~~~

When the buffer is full, it increases its capacity automatically. The default capacity is 256 characters (256 bytes for UTF8, 512 bytes for UTF16, etc.). The user can provide an allocator and an initial capacity.

~~~~~~~~~~cpp
StringBuffer buffer1(0, 1024); // Use its allocator, initial size = 1024
StringBuffer buffer2(allocator, 1024);
~~~~~~~~~~

By default, `StringBuffer` will instantiate an internal allocator.

Similarly, `StringBuffer` is a typedef of `GenericStringBuffer<UTF8<> >`.

# File Streams {#FileStreams}

When parsing JSON from a file, you may read the whole JSON into memory and use `StringStream` above.

However, if the JSON is big, or memory is limited, you can use `FileReadStream`. It reads only a part of the JSON from the file into a buffer and lets that part be parsed. When the buffer runs out of characters, it reads the next part from the file.

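The refill behaviour described above can be pictured with a small sketch. This is not RapidJSON's `FileReadStream`; `ChunkedFileReader` and `ReadAllInChunks` are hypothetical names used only for illustration, and the 4-byte buffer is deliberately tiny so that several refills happen.

~~~~~~~~~~cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical sketch of a chunked reader; not the library's implementation.
class ChunkedFileReader {
public:
    ChunkedFileReader(std::FILE* fp, char* buffer, std::size_t bufferSize)
        : fp_(fp), buffer_(buffer), bufferSize_(bufferSize),
          current_(buffer), end_(buffer), count_(0), eof_(false) {
        assert(fp_ != 0 && buffer_ != 0 && bufferSize_ > 0);
        Refill(); // load the first chunk
    }

    // Current character, or '\0' once the file is exhausted.
    char Peek() const { return current_ < end_ ? *current_ : '\0'; }

    // Return the current character and advance; refill when the chunk runs out.
    char Take() {
        char c = Peek();
        if (current_ < end_) {
            ++current_;
            ++count_;
            if (current_ == end_)
                Refill();
        }
        return c;
    }

    // Number of characters consumed so far.
    std::size_t Tell() const { return count_; }

private:
    // Read the next chunk from the file into the caller-provided buffer.
    void Refill() {
        if (eof_)
            return;
        std::size_t n = std::fread(buffer_, 1, bufferSize_, fp_);
        current_ = buffer_;
        end_ = buffer_ + n;
        if (n < bufferSize_)
            eof_ = true;
    }

    std::FILE* fp_;
    char* buffer_;
    std::size_t bufferSize_;
    char* current_;
    char* end_;
    std::size_t count_;
    bool eof_;
};

// Drain a whole file through a deliberately tiny 4-byte buffer.
std::string ReadAllInChunks(std::FILE* fp) {
    char chunk[4];
    ChunkedFileReader reader(fp, chunk, sizeof(chunk));
    std::string out;
    while (reader.Peek() != '\0')
        out += reader.Take();
    return out;
}
~~~~~~~~~~

Because the caller only sees `Peek()` and `Take()`, the buffer size in this sketch affects how often the file is read, not the characters that come out.
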
## FileReadStream (Input) {#FileReadStream}

`FileReadStream` reads the file via a `FILE` pointer. The user needs to provide a buffer.

~~~~~~~~~~cpp
#include "rapidjson/document.h"
#include "rapidjson/filereadstream.h"
#include <cstdio>

using namespace rapidjson;

FILE* fp = fopen("big.json", "rb"); // non-Windows use "r"

char readBuffer[65536];
FileReadStream is(fp, readBuffer, sizeof(readBuffer));

Document d;
d.ParseStream(is);

fclose(fp);
~~~~~~~~~~

Unlike string streams, `FileReadStream` is a byte stream. It does not handle encodings. If the file is not UTF-8, the byte stream can be wrapped in an `EncodedInputStream`, which is discussed below.

Apart from reading files, `FileReadStream` can also be used to read `stdin`.

## FileWriteStream (Output) {#FileWriteStream}

`FileWriteStream` is a buffered output stream. Its usage is very similar to `FileReadStream`.

~~~~~~~~~~cpp
#include "rapidjson/document.h"
#include "rapidjson/filewritestream.h"
#include "rapidjson/writer.h"   // Writer
#include <cstdio>

using namespace rapidjson;

Document d;
d.Parse(json);
// ...

FILE* fp = fopen("output.json", "wb"); // non-Windows use "w"

char writeBuffer[65536];
FileWriteStream os(fp, writeBuffer, sizeof(writeBuffer));

Writer<FileWriteStream> writer(os);
d.Accept(writer);

fclose(fp);
~~~~~~~~~~

It can also direct the output to `stdout`.

# Encoded Streams {#EncodedStreams}

Encoded streams do not contain JSON themselves; they wrap byte streams to provide basic encoding/decoding functions.

As mentioned above, UTF-8 byte streams can be read directly. However, UTF-16 and UTF-32 have endianness issues. To handle endianness correctly, bytes must be converted into characters (e.g. `wchar_t` for UTF-16) while reading, and characters into bytes while writing.

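As a minimal sketch of what this conversion involves, the helpers below assemble one UTF-16 code unit from two bytes in either byte order, and split a code unit back into little-endian bytes. The names `Utf16LeUnit`, `Utf16BeUnit` and `PutUtf16Le` are hypothetical and are not part of RapidJSON's API.

~~~~~~~~~~cpp
#include <cassert>

// Hypothetical helper: little-endian input stores the low byte first.
inline unsigned Utf16LeUnit(unsigned char b0, unsigned char b1) {
    return static_cast<unsigned>(b0) | (static_cast<unsigned>(b1) << 8);
}

// Hypothetical helper: big-endian input stores the high byte first.
inline unsigned Utf16BeUnit(unsigned char b0, unsigned char b1) {
    return (static_cast<unsigned>(b0) << 8) | static_cast<unsigned>(b1);
}

// Hypothetical helper: split a code unit into bytes for little-endian output.
inline void PutUtf16Le(unsigned unit, unsigned char* out) {
    assert(out != 0);
    out[0] = static_cast<unsigned char>(unit & 0xFF);
    out[1] = static_cast<unsigned char>((unit >> 8) & 0xFF);
}
~~~~~~~~~~
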
Besides, they also need to handle the [byte order mark (BOM)](http://en.wikipedia.org/wiki/Byte_order_mark). When reading from a byte stream, the BOM must be detected, or simply consumed, if it exists. When writing to a byte stream, a BOM can optionally be written.

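The following sketch shows one way BOM detection could look. It only illustrates the standard BOM byte sequences; `BomType` and `DetectBom` are hypothetical names and do not reflect how RapidJSON implements this internally.

~~~~~~~~~~cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum BomType { kNoBom, kBomUtf8, kBomUtf16LE, kBomUtf16BE, kBomUtf32LE, kBomUtf32BE };

// Inspect the first bytes of a stream. On return, *bomLength holds the number
// of bytes the reader should skip (0 when no BOM is present).
inline BomType DetectBom(const unsigned char* p, std::size_t n, std::size_t* bomLength) {
    assert(bomLength != 0);
    *bomLength = 0;
    // Check UTF-32 first: FF FE is also the prefix of the UTF-32LE BOM.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) {
        *bomLength = 4;
        return kBomUtf32LE;
    }
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) {
        *bomLength = 4;
        return kBomUtf32BE;
    }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        *bomLength = 3;
        return kBomUtf8;
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        *bomLength = 2;
        return kBomUtf16LE;
    }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        *bomLength = 2;
        return kBomUtf16BE;
    }
    return kNoBom;
}
~~~~~~~~~~
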
If the encoding of the stream is known at compile time, you may use `EncodedInputStream` and `EncodedOutputStream`. If the stream can be UTF-8, UTF-16LE, UTF-16BE, UTF-32LE or UTF-32BE JSON, and the encoding is only known at runtime, you may use `AutoUTFInputStream` and `AutoUTFOutputStream`. These streams are defined in `rapidjson/encodedstream.h`.

Note that these encoded streams can be applied to streams other than files. For example, a file in memory or a custom byte stream can be wrapped in encoded streams.

## EncodedInputStream {#EncodedInputStream}

`EncodedInputStream` has two template parameters. The first is an `Encoding` class, such as `UTF8` or `UTF16LE`, defined in `rapidjson/encodings.h`. The second is the class of the stream to be wrapped.

~~~~~~~~~~cpp
#include "rapidjson/document.h"
#include "rapidjson/filereadstream.h"   // FileReadStream
#include "rapidjson/encodedstream.h"    // EncodedInputStream
#include <cstdio>

using namespace rapidjson;

FILE* fp = fopen("utf16le.json", "rb"); // non-Windows use "r"

char readBuffer[256];
FileReadStream bis(fp, readBuffer, sizeof(readBuffer));

EncodedInputStream<UTF16LE<>, FileReadStream> eis(bis);  // wraps bis into eis

Document d; // Document is GenericDocument<UTF8<> >
d.ParseStream<0, UTF16LE<> >(eis);  // Parses UTF-16LE file into UTF-8 in memory

fclose(fp);
~~~~~~~~~~

## EncodedOutputStream {#EncodedOutputStream}

`EncodedOutputStream` is similar, but its constructor has a `bool putBOM` parameter that controls whether to write a BOM into the output byte stream.

~~~~~~~~~~cpp
#include "rapidjson/document.h"
#include "rapidjson/writer.h"           // Writer
#include "rapidjson/filewritestream.h"  // FileWriteStream
#include "rapidjson/encodedstream.h"    // EncodedOutputStream
#include <cstdio>

using namespace rapidjson;

Document d;         // Document is GenericDocument<UTF8<> >
// ...

FILE* fp = fopen("output_utf32le.json", "wb"); // non-Windows use "w"

char writeBuffer[256];
FileWriteStream bos(fp, writeBuffer, sizeof(writeBuffer));

typedef EncodedOutputStream<UTF32LE<>, FileWriteStream> OutputStream;
OutputStream eos(bos, true);   // Write BOM

// Source encoding (in memory) is UTF-8; target encoding (output) is UTF-32LE
Writer<OutputStream, UTF8<>, UTF32LE<> > writer(eos);
d.Accept(writer);   // This generates UTF32-LE file from UTF-8 in memory

fclose(fp);
~~~~~~~~~~

## AutoUTFInputStream {#AutoUTFInputStream}

Sometimes an application may want to handle all supported JSON encodings. `AutoUTFInputStream` first detects the encoding by BOM. If a BOM is unavailable, it uses characteristics of valid JSON to make the detection. If neither method succeeds, it falls back to the UTF type provided in the constructor.

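One well-known characteristic of this kind comes from RFC 4627, section 3: when the first two characters of a JSON text are ASCII, the pattern of zero bytes in the first four octets identifies the encoding. The sketch below illustrates that heuristic only; whether RapidJSON uses exactly this rule is an assumption, and `GuessedUtf` and `GuessUtfFromNulls` are hypothetical names. The "unknown" result corresponds to the case where a caller would fall back to a default.

~~~~~~~~~~cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum GuessedUtf { kGuessUnknown, kGuessUtf8, kGuessUtf16LE, kGuessUtf16BE, kGuessUtf32LE, kGuessUtf32BE };

// Guess the encoding of BOM-less JSON from which of the first four bytes are zero.
inline GuessedUtf GuessUtfFromNulls(const unsigned char* p, std::size_t n) {
    assert(p != 0 || n == 0);
    if (n < 4)
        return kGuessUnknown;
    // Bit i is set when byte i is non-zero.
    unsigned pattern = (p[0] ? 1u : 0u) | (p[1] ? 2u : 0u) | (p[2] ? 4u : 0u) | (p[3] ? 8u : 0u);
    switch (pattern) {
        case 0x08: return kGuessUtf32BE; // 00 00 00 xx
        case 0x0A: return kGuessUtf16BE; // 00 xx 00 xx
        case 0x01: return kGuessUtf32LE; // xx 00 00 00
        case 0x05: return kGuessUtf16LE; // xx 00 xx 00
        case 0x0F: return kGuessUtf8;    // xx xx xx xx
        default:   return kGuessUnknown; // no match: caller falls back to a default
    }
}
~~~~~~~~~~
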
Since the characters (code units) may be 8-bit, 16-bit or 32-bit, `AutoUTFInputStream` requires a character type which can hold at least 32 bits. We may use `unsigned`, as in the template parameter:

~~~~~~~~~~cpp
#include "rapidjson/document.h"
#include "rapidjson/filereadstream.h"   // FileReadStream
#include "rapidjson/encodedstream.h"    // AutoUTFInputStream
#include <cstdio>

using namespace rapidjson;

FILE* fp = fopen("any.json", "rb"); // non-Windows use "r"

char readBuffer[256];
FileReadStream bis(fp, readBuffer, sizeof(readBuffer));

AutoUTFInputStream<unsigned, FileReadStream> eis(bis);  // wraps bis into eis

Document d;         // Document is GenericDocument<UTF8<> >
d.ParseStream<0, AutoUTF<unsigned> >(eis); // This parses any UTF file into UTF-8 in memory

fclose(fp);
~~~~~~~~~~

When specifying the encoding of the stream, use `AutoUTF<CharType>` as in `ParseStream()` above.

You can obtain the type of UTF via `UTFType GetType()`, and check whether a BOM was found via `HasBOM()`.

## AutoUTFOutputStream {#AutoUTFOutputStream}

Similarly, to choose the encoding for output at runtime, we can use `AutoUTFOutputStream`. This class is not automatic *per se*. You need to specify the UTF type and whether to write a BOM at runtime.

~~~~~~~~~~cpp
using namespace rapidjson;

void WriteJSONFile(FILE* fp, UTFType type, bool putBOM, const Document& d) {
    char writeBuffer[256];
    FileWriteStream bos(fp, writeBuffer, sizeof(writeBuffer));

    typedef AutoUTFOutputStream<unsigned, FileWriteStream> OutputStream;
    OutputStream eos(bos, type, putBOM);

    // The writer must be bound to the output stream; AutoUTF needs a character type
    Writer<OutputStream, UTF8<>, AutoUTF<unsigned> > writer(eos);
    d.Accept(writer);
}
~~~~~~~~~~

`AutoUTFInputStream` and `AutoUTFOutputStream` are more convenient than `EncodedInputStream` and `EncodedOutputStream`. They just incur a little runtime overhead.

# Custom Stream {#CustomStream}

In addition to memory/file streams, users can create their own stream classes which fit RapidJSON's API. For example, you may create a network stream, a stream from a compressed file, etc.

RapidJSON combines different types using templates. A class containing all required interfaces can be a stream. The Stream interface is defined in comments of `rapidjson/rapidjson.h`:

~~~~~~~~~~cpp
concept Stream {
    typename Ch;    //!< Character type of the stream.

    //! Read the current character from stream without moving the read cursor.
    Ch Peek() const;

    //! Read the current character from stream and moving the read cursor to next character.
    Ch Take();

    //! Get the current read cursor.
    //! \return Number of characters read from start.
    size_t Tell();

    //! Begin writing operation at the current read pointer.
    //! \return The begin writer pointer.
    Ch* PutBegin();

    //! Write a character.
    void Put(Ch c);

    //! Flush the buffer.
    void Flush();

    //! End the writing operation.
    //! \param begin The begin write pointer returned by PutBegin().
    //! \return Number of characters written.
    size_t PutEnd(Ch* begin);
}
~~~~~~~~~~

Input streams must implement `Peek()`, `Take()` and `Tell()`.
Output streams must implement `Put()` and `Flush()`.
There are two special interfaces, `PutBegin()` and `PutEnd()`, which are only for *in situ* parsing. Normal streams do not implement them. However, even if an interface is not needed for a particular stream, the stream still needs a dummy implementation of it; otherwise a compilation error is generated.

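To illustrate how a template can consume any class that provides this interface, here is a self-contained sketch that does not depend on RapidJSON. `CStringStream` and `SkipWhitespace` are hypothetical names chosen for this example; the function template compiles against any type that offers `Peek()` and `Take()`.

~~~~~~~~~~cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal input stream over a null-terminated string.
class CStringStream {
public:
    typedef char Ch;

    explicit CStringStream(const Ch* src) : src_(src), head_(src) {}

    Ch Peek() const { return *src_; }
    Ch Take() { return *src_ ? *src_++ : '\0'; }    // never moves past the terminator
    std::size_t Tell() const { return static_cast<std::size_t>(src_ - head_); }

    // Output interface: dummy implementations, never called for an input stream.
    Ch* PutBegin() { assert(false); return 0; }
    void Put(Ch) { assert(false); }
    void Flush() { assert(false); }
    std::size_t PutEnd(Ch*) { assert(false); return 0; }

private:
    const Ch* src_;     // current read position
    const Ch* head_;    // start of the string
};

// Generic consumer: no base class is required, only matching member functions.
template <typename InputStream>
std::size_t SkipWhitespace(InputStream& is) {
    std::size_t skipped = 0;
    while (is.Peek() == ' ' || is.Peek() == '\n' || is.Peek() == '\r' || is.Peek() == '\t') {
        is.Take();
        ++skipped;
    }
    return skipped;
}
~~~~~~~~~~

Passing a type that lacks `Peek()` or `Take()` to `SkipWhitespace()` fails at compile time, which is the same kind of error a stream with a missing interface would trigger.
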
## Example: istream wrapper {#ExampleIStreamWrapper}

The following example is a wrapper of `std::istream`, which only implements 3 functions.

~~~~~~~~~~cpp
#include <cassert>
#include <cstddef>
#include <istream>

class IStreamWrapper {
public:
    typedef char Ch;

    IStreamWrapper(std::istream& is) : is_(is) {
    }

    Ch Peek() const { // 1
        int c = is_.peek();
        return c == std::char_traits<char>::eof() ? '\0' : (Ch)c;
    }

    Ch Take() { // 2
        int c = is_.get();
        return c == std::char_traits<char>::eof() ? '\0' : (Ch)c;
    }

    size_t Tell() const { return (size_t)is_.tellg(); } // 3

    // Output interface: dummy implementations, never called for an input stream
    Ch* PutBegin() { assert(false); return 0; }
    void Put(Ch) { assert(false); }
    void Flush() { assert(false); }
    size_t PutEnd(Ch*) { assert(false); return 0; }

private:
    // Copying is disabled
    IStreamWrapper(const IStreamWrapper&);
    IStreamWrapper& operator=(const IStreamWrapper&);

    std::istream& is_;
};
~~~~~~~~~~

Users can use it to wrap instances of `std::stringstream` or `std::ifstream`.

~~~~~~~~~~cpp
const char* json = "[1,2,3,4]";
std::stringstream ss(json);
IStreamWrapper is(ss);

Document d;
d.ParseStream(is);
~~~~~~~~~~

Note that this implementation may not be as efficient as RapidJSON's memory or file streams, due to the internal overhead of the standard library.

## Example: ostream wrapper {#ExampleOStreamWrapper}

The following example is a wrapper of `std::ostream`, which only implements 2 functions.

~~~~~~~~~~cpp
#include <cassert>
#include <cstddef>
#include <ostream>

class OStreamWrapper {
public:
    typedef char Ch;

    OStreamWrapper(std::ostream& os) : os_(os) {
    }

    // Input interface: dummy implementations, never called for an output stream
    Ch Peek() const { assert(false); return '\0'; }
    Ch Take() { assert(false); return '\0'; }
    size_t Tell() const { assert(false); return 0; }

    Ch* PutBegin() { assert(false); return 0; }
    void Put(Ch c) { os_.put(c); }                  // 1
    void Flush() { os_.flush(); }                   // 2
    size_t PutEnd(Ch*) { assert(false); return 0; }

private:
    // Copying is disabled
    OStreamWrapper(const OStreamWrapper&);
    OStreamWrapper& operator=(const OStreamWrapper&);

    std::ostream& os_;
};
~~~~~~~~~~

Users can use it to wrap instances of `std::stringstream` or `std::ofstream`.

~~~~~~~~~~cpp
Document d;
// ...

std::stringstream ss;
OStreamWrapper os(ss);

Writer<OStreamWrapper> writer(os);
d.Accept(writer);
~~~~~~~~~~

Note that this implementation may not be as efficient as RapidJSON's memory or file streams, due to the internal overhead of the standard library.

# Summary {#Summary}

This section describes the stream classes available in RapidJSON. Memory streams are simple. File streams can reduce the memory required during JSON parsing and generation, if the JSON is stored in a file system. Encoded streams convert between byte streams and character streams. Finally, users may create custom streams using a simple interface.