# Tutorial

This tutorial introduces the basics of the Document Object Model (DOM) API.

As shown in [Usage at a glance](@ref index), JSON can be parsed into a DOM, and then the DOM can be queried and modified easily, and finally converted back to JSON.

[TOC]

# Value & Document {#ValueDocument}

Each JSON value is stored in a type called `Value`. A `Document`, representing the DOM, contains the root `Value` of the DOM tree. All public types and functions of RapidJSON are defined in the `rapidjson` namespace.

# Query Value {#QueryValue}

In this section, we will use excerpts of `example/tutorial/tutorial.cpp`.

Assume we have JSON stored in a C string (`const char* json`):
~~~~~~~~~~js
{
    "hello": "world",
    "t": true ,
    "f": false,
    "n": null,
    "i": 123,
    "pi": 3.1416,
    "a": [1, 2, 3, 4]
}
~~~~~~~~~~

Parse it into a `Document`:
~~~~~~~~~~cpp
#include "rapidjson/document.h"

using namespace rapidjson;

// ...
Document document;
document.Parse(json);
~~~~~~~~~~

The JSON is now parsed into `document` as a *DOM tree*.

Since the update to RFC 7159, the root of a conforming JSON document can be any JSON value. In the earlier RFC 4627, only objects or arrays were allowed as root values. In this case, the root is an object.
~~~~~~~~~~cpp
assert(document.IsObject());
~~~~~~~~~~

Let's query whether a `"hello"` member exists in the root object. Since a `Value` can contain different types of value, we may need to verify its type and use a suitable API to obtain the value. In this example, the `"hello"` member is associated with a JSON string.
~~~~~~~~~~cpp
assert(document.HasMember("hello"));
assert(document["hello"].IsString());
printf("hello = %s\n", document["hello"].GetString());
~~~~~~~~~~

~~~~~~~~~~
hello = world
~~~~~~~~~~

JSON true/false values are represented as `bool`.
~~~~~~~~~~cpp
assert(document["t"].IsBool());
printf("t = %s\n", document["t"].GetBool() ? "true" : "false");
~~~~~~~~~~

~~~~~~~~~~
t = true
~~~~~~~~~~

JSON null can be queried by `IsNull()`.
~~~~~~~~~~cpp
printf("n = %s\n", document["n"].IsNull() ? "null" : "?");
~~~~~~~~~~

~~~~~~~~~~
n = null
~~~~~~~~~~

The JSON number type represents all numeric values. However, C++ needs a more specific type for manipulation.

~~~~~~~~~~cpp
assert(document["i"].IsNumber());

// In this case, IsUint()/IsInt64()/IsUint64() also return true.
assert(document["i"].IsInt());
printf("i = %d\n", document["i"].GetInt());

assert(document["pi"].IsNumber());
assert(document["pi"].IsDouble());
printf("pi = %g\n", document["pi"].GetDouble());
~~~~~~~~~~

~~~~~~~~~~
i = 123
pi = 3.1416
~~~~~~~~~~

A JSON array contains a number of elements.
~~~~~~~~~~cpp
// Using a reference for consecutive access is handy and faster.
const Value& a = document["a"];
assert(a.IsArray());
for (SizeType i = 0; i < a.Size(); i++) // Uses SizeType instead of size_t
    printf("a[%u] = %d\n", i, a[i].GetInt()); // SizeType is unsigned, hence %u
~~~~~~~~~~

~~~~~~~~~~
a[0] = 1
a[1] = 2
a[2] = 3
a[3] = 4
~~~~~~~~~~

Note that RapidJSON does not automatically convert values between JSON types. If a value is a string, it is invalid to call `GetInt()`, for example. In debug mode it will fail an assertion. In release mode, the behavior is undefined.

In the following, details about querying individual types are discussed.

## Query Array {#QueryArray}

By default, `SizeType` is a typedef of `unsigned`. On most systems, an array is therefore limited to storing up to 2^32-1 elements.

You may access the elements of an array by integer literal, for example, `a[0]`, `a[1]`, `a[2]`.

An array is similar to `std::vector`: instead of using indices, you may also use iterators to access all the elements.
~~~~~~~~~~cpp
for (Value::ConstValueIterator itr = a.Begin(); itr != a.End(); ++itr)
    printf("%d ", itr->GetInt());
~~~~~~~~~~

And other familiar query functions:
* `SizeType Capacity() const`
* `bool Empty() const`

## Query Object {#QueryObject}

Similar to arrays, we can access all object members by iterator:

~~~~~~~~~~cpp
static const char* kTypeNames[] =
    { "Null", "False", "True", "Object", "Array", "String", "Number" };

for (Value::ConstMemberIterator itr = document.MemberBegin();
    itr != document.MemberEnd(); ++itr)
{
    printf("Type of member %s is %s\n",
        itr->name.GetString(), kTypeNames[itr->value.GetType()]);
}
~~~~~~~~~~

~~~~~~~~~~
Type of member hello is String
Type of member t is True
Type of member f is False
Type of member n is Null
Type of member i is Number
Type of member pi is Number
Type of member a is Array
~~~~~~~~~~

Note that when `operator[](const char*)` cannot find the member, it will fail an assertion.

If we are unsure whether a member exists, we need to call `HasMember()` before calling `operator[](const char*)`. However, this incurs two lookups. A better way is to call `FindMember()`, which can check the existence of a member and obtain its value at once:

~~~~~~~~~~cpp
Value::ConstMemberIterator itr = document.FindMember("hello");
if (itr != document.MemberEnd())
    printf("%s\n", itr->value.GetString());
~~~~~~~~~~

## Querying Number {#QueryNumber}

JSON provides a single numerical type called Number. A Number can be an integer or a real number. RFC 4627 says the range of Number is specified by the parser.
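
Because the range is left to the parser, a C++ parser has to map each Number onto concrete types with fixed ranges. The following is a minimal sketch, using only the standard library, of two checks that make those limits visible. The helper names `FitsInInt32` and `RoundTripsThroughDouble` are hypothetical and are not part of the RapidJSON API.

```cpp
#include <cstdint>
#include <limits>

// Illustration only: hypothetical helpers, no RapidJSON calls involved.

// True if v fits in a 32-bit signed integer.
inline bool FitsInInt32(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max();
}

// True if v is unchanged after converting to double and back.
inline bool RoundTripsThroughDouble(int64_t v) {
    const double d = static_cast<double>(v);
    // 2^63 does not fit in int64_t, so reject it before casting back.
    if (d >= 9223372036854775808.0) return false;
    return static_cast<int64_t>(d) == v;
}
```

For instance, `FitsInInt32(-3000000000LL)` is false, so such a value needs a 64-bit type, and `RoundTripsThroughDouble(9007199254740993LL)` (2^53 + 1) is false because a `double` carries only 53 bits of precision.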

As C++ provides several integer and floating point number types, the DOM tries to handle these with the widest possible range and good performance.

When a Number is parsed, it is stored in the DOM as one of the following types:

Type       | Description
-----------|---------------------------------------
`unsigned` | 32-bit unsigned integer
`int`      | 32-bit signed integer
`uint64_t` | 64-bit unsigned integer
`int64_t`  | 64-bit signed integer
`double`   | 64-bit double precision floating point

When querying a number, you can check whether the number can be obtained as the target type:

Checking          | Obtaining
------------------|---------------------
`bool IsNumber()` | N/A
`bool IsUint()`   | `unsigned GetUint()`
`bool IsInt()`    | `int GetInt()`
`bool IsUint64()` | `uint64_t GetUint64()`
`bool IsInt64()`  | `int64_t GetInt64()`
`bool IsDouble()` | `double GetDouble()`

Note that an integer value may be obtained in various ways without conversion. For example, a value `x` containing 123 will make `x.IsInt() == x.IsUint() == x.IsInt64() == x.IsUint64() == true`. But a value `y` containing -3000000000 will only make `y.IsInt64() == true`.

When obtaining numeric values, `GetDouble()` will convert the internal integer representation to a `double`. Note that `int` and `unsigned` can be safely converted to `double`, but `int64_t` and `uint64_t` may lose precision (since the mantissa of `double` stores only 52 bits, giving 53 bits of precision).

## Query String {#QueryString}

In addition to `GetString()`, the `Value` class also contains `GetStringLength()`. Here is why.

According to RFC 4627, JSON strings can contain the Unicode character `U+0000`, which must be escaped as `"\u0000"`. The problem is that C/C++ often uses null-terminated strings, which treat `'\0'` as the terminator symbol.

To conform to RFC 4627, RapidJSON supports strings containing `U+0000`.
If you need to handle this, you can use the `GetStringLength()` API to obtain the correct length of the string.

For example, after parsing the following JSON to `Document d`:

~~~~~~~~~~js
{ "s" : "a\u0000b" }
~~~~~~~~~~
The correct length of the value `"a\u0000b"` is 3. But `strlen()` returns 1.

`GetStringLength()` can also improve performance, as users often need to call `strlen()` to allocate a buffer.

Besides, `std::string` also supports a constructor:

~~~~~~~~~~cpp
string(const char* s, size_t count);
~~~~~~~~~~

which accepts the length of the string as a parameter. This constructor supports storing null characters within the string, and should also provide better performance.

## Comparing values

You can use `==` and `!=` to compare values. Two values are equal if and only if they have the same type and contents. You can also compare values with primitive types. Here is an example.

~~~~~~~~~~cpp
if (document["hello"] == document["n"]) /*...*/;    // Compare values
if (document["hello"] == "world") /*...*/;          // Compare value with literal string
if (document["i"] != 123) /*...*/;                  // Compare with integers
if (document["pi"] != 3.14) /*...*/;                // Compare with double.
~~~~~~~~~~

Arrays and objects compare their elements/members in order. They are equal if and only if their whole subtrees are equal.

Note that, currently, if an object contains members with duplicated names, comparing it for equality with any object always yields `false`.

# Create/Modify Values {#CreateModifyValues}

There are several ways to create values. After a DOM tree is created and/or modified, it can be saved as JSON again using `Writer`.

## Change Value Type {#ChangeValueType}
When creating a Value or Document by default constructor, its type is Null.
To change its type, call `SetXXX()` or use the assignment operator, for example:

~~~~~~~~~~cpp
Document d; // Null
d.SetObject();

Value v;    // Null
v.SetInt(10);
v = 10;     // Shortcut, same as above
~~~~~~~~~~

### Overloaded Constructors
There are also overloaded constructors for several types:

~~~~~~~~~~cpp
Value b(true);    // calls Value(bool)
Value i(-123);    // calls Value(int)
Value u(123u);    // calls Value(unsigned)
Value d(1.5);     // calls Value(double)
~~~~~~~~~~

To create an empty object or array, you may use `SetObject()`/`SetArray()` after the default constructor, or use `Value(Type)` in one shot:

~~~~~~~~~~cpp
Value o(kObjectType);
Value a(kArrayType);
~~~~~~~~~~

## Move Semantics {#MoveSemantics}

A very special decision in the design of RapidJSON is that assignment of a value does not copy the source value to the destination value. Instead, the value from the source is moved to the destination. For example,

~~~~~~~~~~cpp
Value a(123);
Value b(456);
b = a;         // a becomes a Null value, b becomes number 123.
~~~~~~~~~~

Why? What is the advantage of these semantics?

The simple answer is performance. For fixed-size JSON types (Number, True, False, Null), copying them is fast and easy. However, for variable-size JSON types (String, Array, Object), copying them incurs a lot of overhead, and this overhead often goes unnoticed, especially when we need to create a temporary object, copy it to another variable, and then destruct it.

For example, if normal *copy* semantics were used:

~~~~~~~~~~cpp
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    // ...
    o.AddMember("contacts", contacts, d.GetAllocator());  // deep clone contacts (may be with lots of allocations)
    // destruct contacts.
}
~~~~~~~~~~

The object `o` needs to allocate a buffer of the same size as `contacts`, make a deep clone of it, and then finally `contacts` is destructed. This incurs a lot of unnecessary allocations/deallocations and memory copying.

There are solutions to prevent actually copying these data, such as reference counting and garbage collection (GC).

To make RapidJSON simple and fast, we chose to use *move* semantics for assignment. It is similar to `std::auto_ptr`, which transfers ownership during assignment. Move is much faster and simpler: it just destructs the destination's original value, `memcpy()`s the source to the destination, and finally sets the source to Null type.

So, with move semantics, the above example becomes:

~~~~~~~~~~cpp
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    o.AddMember("contacts", contacts, d.GetAllocator());  // just memcpy() of contacts itself to the value of new member (16 bytes)
    // contacts became Null here. Its destruction is trivial.
}
~~~~~~~~~~

This is called the move assignment operator in C++11. As RapidJSON supports C++03, it implements move semantics in the assignment operator and in all other modifying functions such as `AddMember()` and `PushBack()`.

### Move semantics and temporary values {#TemporaryValues}

Sometimes, it is convenient to construct a Value in place, before passing it to one of the "moving" functions, like `PushBack()` or `AddMember()`.
As temporary objects can't be converted to proper Value references, the convenience function `Move()` is available:

~~~~~~~~~~cpp
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();
// a.PushBack(Value(42), allocator);       // will not compile
a.PushBack(Value().SetInt(42), allocator); // fluent API
a.PushBack(Value(42).Move(), allocator);   // same as above
~~~~~~~~~~

## Create String {#CreateString}
RapidJSON provides two strategies for storing strings.

1. copy-string: allocates a buffer, and then copies the source data into it.
2. const-string: simply stores a pointer to the string.

Copy-string is always safe because it owns a copy of the data. Const-string can be used for storing string literals, and for in-situ parsing, which is mentioned in the Document section.

To make memory allocation customizable, RapidJSON requires the user to pass an instance of an allocator whenever an operation may require allocation. This design avoids storing an allocator (or Document) pointer per Value.

Therefore, when we assign a copy-string, we call this overloaded `SetString()` with an allocator:

~~~~~~~~~~cpp
Document document;
Value author;
char buffer[10];
int len = sprintf(buffer, "%s %s", "Milo", "Yip"); // dynamically created string.
author.SetString(buffer, len, document.GetAllocator());
memset(buffer, 0, sizeof(buffer));
// author.GetString() still contains "Milo Yip" after buffer is cleared
~~~~~~~~~~

In this example, we get the allocator from a `Document` instance. This is a common idiom when using RapidJSON. But you may use other instances of an allocator.

Besides, the above `SetString()` requires the length. This can handle null characters within a string. There is another `SetString()` overloaded function without the length parameter.
It assumes the input is null-terminated and calls a `strlen()`-like function to obtain the length.

Finally, for a string literal or a string with a safe life-cycle, you can use the const-string version of `SetString()`, which lacks the allocator parameter. For string literals (or constant character arrays), simply passing the literal as a parameter is safe and efficient:

~~~~~~~~~~cpp
Value s;
s.SetString("rapidjson");    // can contain null character, length derived at compile time
s = "rapidjson";             // shortcut, same as above
~~~~~~~~~~

For a character pointer, RapidJSON requires you to mark it as safe before using it without copying. This can be achieved by using the `StringRef` function:

~~~~~~~~~cpp
const char * cstr = getenv("USER");
size_t cstr_len = ...;                 // in case length is available
Value s;
// s.SetString(cstr);                  // will not compile
s.SetString(StringRef(cstr));          // ok, assume safe lifetime, null-terminated
s = StringRef(cstr);                   // shortcut, same as above
s.SetString(StringRef(cstr,cstr_len)); // faster, can contain null character
s = StringRef(cstr,cstr_len);          // shortcut, same as above
~~~~~~~~~

## Modify Array {#ModifyArray}
A Value of array type provides APIs similar to those of `std::vector`.

* `Clear()`
* `Reserve(SizeType, Allocator&)`
* `Value& PushBack(Value&, Allocator&)`
* `template <typename T> GenericValue& PushBack(T, Allocator&)`
* `Value& PopBack()`
* `ValueIterator Erase(ConstValueIterator pos)`
* `ValueIterator Erase(ConstValueIterator first, ConstValueIterator last)`

Note that `Reserve(...)` and `PushBack(...)` may allocate memory for the array elements, and therefore require an allocator.

Here is an example of `PushBack()`:

~~~~~~~~~~cpp
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();

for (int i = 5; i <= 10; i++)
    a.PushBack(i, allocator);   // allocator is needed for potential realloc().

// Fluent interface
a.PushBack("Lua", allocator).PushBack("Mio", allocator);
~~~~~~~~~~

Unlike the STL, `PushBack()`/`PopBack()` return the array reference itself. This is called a _fluent interface_.

If you want to add a non-constant string or a string without sufficient lifetime (see [Create String](#CreateString)) to the array, you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a [temporary value](#TemporaryValues) in place:

~~~~~~~~~~cpp
// in-place Value parameter
contact.PushBack(Value("copy", document.GetAllocator()).Move(), // copy string
                 document.GetAllocator());

// explicit parameters
Value val("key", document.GetAllocator()); // copy string
contact.PushBack(val, document.GetAllocator());
~~~~~~~~~~

## Modify Object {#ModifyObject}
An object is a collection of key-value pairs (members). Each key must be a string value. To modify an object, either add or remove members. The following APIs are for adding members:

* `Value& AddMember(Value&, Value&, Allocator& allocator)`
* `Value& AddMember(StringRefType, Value&, Allocator&)`
* `template <typename T> Value& AddMember(StringRefType, T value, Allocator&)`

Here is an example.

~~~~~~~~~~cpp
Value contact(kObjectType);
contact.AddMember("name", "Milo", document.GetAllocator());
contact.AddMember("married", true, document.GetAllocator());
~~~~~~~~~~

The name parameter with `StringRefType` is similar to the interface of the `SetString` function for string values.
These overloads are used to avoid the need for copying the `name` string, as constant key names are very common in JSON objects.

If you need to create a name from a non-constant string or a string without sufficient lifetime (see [Create String](#CreateString)), you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a [temporary value](#TemporaryValues) in place:

~~~~~~~~~~cpp
// in-place Value parameter
contact.AddMember(Value("copy", document.GetAllocator()).Move(), // copy string
                  Value().Move(),                                // null value
                  document.GetAllocator());

// explicit parameters
Value key("key", document.GetAllocator()); // copy string name
Value val(42);                             // some value
contact.AddMember(key, val, document.GetAllocator());
~~~~~~~~~~

For removing members, there are several choices:

* `bool RemoveMember(const Ch* name)`: removes a member by searching for its name (linear time complexity).
* `bool RemoveMember(const Value& name)`: same as above but `name` is a Value.
* `MemberIterator RemoveMember(MemberIterator)`: removes a member by iterator (_constant_ time complexity).
* `MemberIterator EraseMember(MemberIterator)`: similar to the above but it preserves the order of members (linear time complexity).
* `MemberIterator EraseMember(MemberIterator first, MemberIterator last)`: removes a range of members, preserving order (linear time complexity).

`MemberIterator RemoveMember(MemberIterator)` uses a "move-last" trick to achieve constant time complexity. Basically, the member at the iterator is destructed, and then the last element is moved to that position. So the order of the remaining members is changed.

## Deep Copy Value {#DeepCopyValue}
If we really need to copy a DOM tree, we can use two APIs for deep copy: the constructor with an allocator, and `CopyFrom()`.

~~~~~~~~~~cpp
Document d;
Document::AllocatorType& a = d.GetAllocator();
Value v1("foo");
// Value v2(v1); // not allowed

Value v2(v1, a);                      // make a copy
assert(v1.IsString());                // v1 untouched
d.SetArray().PushBack(v1, a).PushBack(v2, a);
assert(v1.IsNull() && v2.IsNull());   // both moved to d

v2.CopyFrom(d, a);                    // copy whole document to v2
assert(d.IsArray() && d.Size() == 2); // d untouched
v1.SetObject().AddMember("array", v2, a);
d.PushBack(v1, a);
~~~~~~~~~~

## Swap Values {#SwapValues}

`Swap()` is also provided.

~~~~~~~~~~cpp
Value a(123);
Value b("Hello");
a.Swap(b);
assert(a.IsString());
assert(b.IsInt());
~~~~~~~~~~

Swapping two DOM trees is fast (constant time), regardless of the complexity of the trees.

# What's next {#WhatsNext}

This tutorial shows the basics of DOM tree query and manipulation. There are several important concepts in RapidJSON:

1. [Streams](doc/stream.md) are channels for reading/writing JSON, which can be an in-memory string, a file stream, etc. Users can also create their own streams.
2. [Encoding](doc/encoding.md) defines which character encoding is used in streams and memory. RapidJSON also provides Unicode conversion/validation internally.
3. [DOM](doc/dom.md)'s basics are already covered in this tutorial. Uncover more advanced features such as *in situ* parsing, other parsing options and advanced usages.
4. [SAX](doc/sax.md) is the foundation of the parsing/generating facility in RapidJSON. Learn how to use `Reader`/`Writer` to implement even faster applications. Also try `PrettyWriter` to format the JSON.
5. [Performance](doc/performance.md) shows some in-house and third-party benchmarks.
6. [Internals](doc/internals.md) describes some internal designs and techniques of RapidJSON.

You may also refer to the [FAQ](doc/faq.md), API documentation, examples and unit tests.