/** @mainpage

<h1> TinyXml </h1>

TinyXml is a simple, small, C++ XML parser that can be easily
integrated into other programs.

<h2> What it does. </h2>

In brief, TinyXml parses an XML document, and builds from that a
Document Object Model (DOM) that can be read, modified, and saved.

XML stands for "eXtensible Markup Language." It allows you to create
your own document markups. Where HTML does a very good job of marking
documents for browsers, XML allows you to define any kind of document
markup, for example a document that describes a "to do" list for an
organizer application. XML is a very structured and convenient format.
All those random file formats created to store application data can
all be replaced with XML. One parser for everything.

The best place for the complete, correct, and quite frankly hard to
read spec is at <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>. An intro to XML
(that I really like) can be found at
<a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>.

There are different ways to access and interact with XML data.
TinyXml uses a Document Object Model (DOM), meaning the XML data is parsed
into C++ objects that can be browsed and manipulated, and then
written to disk or another output stream. You can also construct an XML document from
scratch with C++ objects and write this to disk or another output
stream.

TinyXml is designed to be easy and fast to learn. It is two headers
and four cpp files. Simply add these to your project and off you go.
There is an example file - xmltest.cpp - to get you started.

TinyXml is released under the ZLib license,
so you can use it in open source or commercial code. The details
of the license are at the top of every source file.

TinyXml attempts to be a flexible parser, but with truly correct and
compliant XML output. TinyXml should compile on any reasonably C++
compliant system. It does not rely on exceptions or RTTI. It can be
compiled with or without STL support. TinyXml fully supports
the UTF-8 encoding, and the first 64k character entities.


<h2> What it doesn't do. </h2>

It doesn't parse or use DTDs (Document Type Definitions) or XSLs
(eXtensible Stylesheet Language). There are other parsers out there
(check out www.sourceforge.org, search for XML) that are much more fully
featured. But they are also much bigger, take longer to set up in
your project, have a steeper learning curve, and often have a more
restrictive license. If you are working with browsers or have more
complete XML needs, TinyXml is not the parser for you.

The following DTD syntax will not parse at this time in TinyXml:

@verbatim
	<!DOCTYPE Archiv [
	 <!ELEMENT Comment (#PCDATA)>
	]>
@endverbatim

because TinyXml sees this as a !DOCTYPE node with an illegally
embedded !ELEMENT node. This may be addressed in the future.

<h2> Tutorials. </h2>

For the impatient, here is a tutorial to get you going. It is a great way to get started,
but it is worth your time to read this (very short) manual completely.

- @subpage tutorial0

<h2> Code Status.  </h2>

TinyXml is mature, tested code. It is very stable. If you find
bugs, please file a bug report on the sourceforge web site
(www.sourceforge.net/projects/tinyxml).
We'll get them straightened out as soon as possible.

There are some areas of improvement; please check sourceforge if you are
interested in working on TinyXml.


<h2> Features </h2>

<h3> Using STL </h3>

TinyXml can be compiled to use or not use STL. When using STL, TinyXml
uses the std::string class, and fully supports std::istream, std::ostream,
operator<<, and operator>>. Many API methods have both 'const char*' and
'const std::string&' forms.

When STL support is compiled out, no STL files are included whatsoever. All
the string classes are implemented by TinyXml itself. API methods
all use the 'const char*' form for input.

Use the compile time #define:

	TIXML_USE_STL

to compile one version or the other. This can be passed by the compiler,
or set as the first line of "tinyxml.h".

Note: If compiling the test code in Linux, setting the environment
variable TINYXML_USE_STL=YES/NO will control STL compilation. In the
Windows project file, STL and non STL targets are provided. In your project,
it's probably easiest to add the line "#define TIXML_USE_STL" as the first
line of tinyxml.h.

<h3> UTF-8 </h3>

TinyXml supports UTF-8, allowing you to manipulate XML files in any language. TinyXml
also supports "legacy mode" - the encoding used before UTF-8 support and
probably best described as "extended ascii".

Normally, TinyXml will try to detect the correct encoding and use it. However,
by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXml
can be forced to always use one encoding.

TinyXml will assume Legacy Mode until one of the following occurs:
<ol>
	<li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf)
		 begin the file or data stream, TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="UTF-8", then
		 TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has no encoding specified, then
		 TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="something else", then
		 TinyXml will read it as Legacy Mode. In legacy mode, TinyXml will
		 work as it did before. It's not clear what that mode does exactly, but
		 old content should keep working.</li>
	<li> Until one of the above criteria is met, TinyXml runs in Legacy Mode.</li>
</ol>
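
The rules above can be sketched in a few lines of standard C++. This is only an
illustration of the decision order, not TinyXml's actual detection code: the
names Encoding and detectEncoding are made up for the example, the declaration
is found with a plain substring search, and the case-insensitive comparison
against "utf-8" is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical names for this sketch only.
enum class Encoding { Legacy, Utf8 };

// Returns the mode the rules above would select for the start of a document.
Encoding detectEncoding(const std::string& data) {
    typedef std::string::size_type Pos;
    const Pos npos = std::string::npos;

    // Rule 1: the UTF-8 lead bytes 0xef 0xbb 0xbf begin the data.
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xef &&
        static_cast<unsigned char>(data[1]) == 0xbb &&
        static_cast<unsigned char>(data[2]) == 0xbf) {
        return Encoding::Utf8;
    }

    // Rules 2 to 4 need a declaration tag; without one, stay in legacy mode.
    const Pos start = data.find("<?xml");
    if (start == npos) return Encoding::Legacy;
    const Pos end = data.find('>', start);
    const std::string decl = data.substr(start, end == npos ? npos : end - start);

    // Rule 3: a declaration with no encoding is read as UTF-8.
    const Pos enc = decl.find("encoding");
    if (enc == npos) return Encoding::Utf8;

    // Rules 2 and 4: look at the quoted value after "encoding".
    const Pos open = decl.find_first_of("\"'", enc);
    if (open == npos) return Encoding::Legacy;
    const Pos close = decl.find(decl[open], open + 1);
    if (close == npos) return Encoding::Legacy;

    std::string value = decl.substr(open + 1, close - open - 1);
    for (char& c : value) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return value == "utf-8" ? Encoding::Utf8 : Encoding::Legacy;
}
```

Data with no lead bytes and no declaration falls through to Encoding::Legacy,
which matches the last rule in the list.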

What happens if the encoding is incorrectly set or detected? TinyXml will try
to read and pass through text seen as improperly encoded. You may get some strange
results or mangled characters. You may want to force TinyXml to the correct mode.

<b> You may force TinyXml to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or
LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all
the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may
force it to TIXML_ENCODING_UTF8 with the same technique.</b>

For English users, using English XML, UTF-8 is the same as low-ASCII. You
don't need to be aware of UTF-8 or change your code in any way. You can think
of UTF-8 as a "superset" of ASCII.

UTF-8 is not a double byte format - but it is a standard encoding of Unicode!
TinyXml does not use or directly support wchar, TCHAR, or Microsoft's _UNICODE at this time.
It is common to see the term "Unicode" improperly refer to UTF-16, a wide byte encoding
of Unicode. This is a source of confusion.
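
To make the "superset of ASCII" and "not a double byte format" points concrete,
here is a small sketch of how a Unicode code point becomes UTF-8 bytes. The
function name encodeUtf8 is invented for the example and is not part of
TinyXml. The sketch does no validation (surrogates and values above 0x10FFFF
are not rejected).

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8. Code points below 0x80 (ASCII)
// come out as the same single byte; everything else takes 2 to 4 bytes.
std::string encodeUtf8(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

So 'A' (0x41) stays one byte, the non-breaking space (0xA0) takes two bytes,
and a Chinese character such as 0x4E2D takes three.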

For "high-ascii" languages - everything not English, pretty much - TinyXml can
handle all languages, at the same time, as long as the XML is encoded
in UTF-8. That can be a little tricky: older programs and operating systems
tend to use the "default" or "traditional" code page. Many apps (and almost all
modern ones) can output UTF-8, but older or stubborn (or just broken) ones
still output text in the default code page.

For example, Japanese systems traditionally use SHIFT-JIS encoding.
Text encoded as SHIFT-JIS cannot be read by TinyXml.
A good text editor can import SHIFT-JIS and then save as UTF-8.

The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great
job covering the encoding issue.

The test file "utf8test.xml" is an XML file containing English, Spanish, Russian,
and Simplified Chinese. (Hopefully they are translated correctly). The file
"utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that
if you don't have the correct fonts (Simplified Chinese or Russian) on your
system, you won't see output that matches the GIF file even if you can parse
it correctly. Also note that (at least on my Windows machine) console output
is in a Western code page, so that Print() or printf() cannot correctly display
the file. This is not a bug in TinyXml - just an OS issue. No data is lost or
destroyed by TinyXml. The console just doesn't render UTF-8.


<h3> Entities </h3>
TinyXml recognizes the pre-defined "character entities", meaning special
characters. Namely:

@verbatim
	&amp;	&
	&lt;	<
	&gt;	>
	&quot;	"
	&apos;	'
@endverbatim

These are recognized when the XML document is read, and translated to their
UTF-8 equivalents. For instance, text with the XML of:

@verbatim
	Far &amp; Away
@endverbatim

will have the Value() of "Far & Away" when queried from the TiXmlText object,
and will be written back to the XML stream/file as an ampersand. Older versions
of TinyXml "preserved" character entities, but the newer versions will translate
them into characters.

Additionally, any character can be specified by its Unicode code point:
the syntaxes "&#xA0;" and "&#160;" both refer to the non-breaking space character.
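
As a rough illustration of the translation described above, the sketch below
decodes the five pre-defined entities and parses a numeric character reference
into its code point. It is standalone C++ written for this page, not TinyXml's
parser; the names decodePredefinedEntities and parseCharRef are invented for
the example.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Replaces the five pre-defined entities with their characters in one pass.
// Anything else that starts with '&' is copied through unchanged.
std::string decodePredefinedEntities(const std::string& text) {
    static const struct { const char* name; char ch; } table[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' },
    };
    std::string out;
    std::string::size_type i = 0;
    while (i < text.size()) {
        bool matched = false;
        if (text[i] == '&') {
            for (const auto& e : table) {
                const std::string name(e.name);
                if (text.compare(i, name.size(), name) == 0) {
                    out += e.ch;
                    i += name.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += text[i++];
    }
    return out;
}

// Parses a reference such as "&#xA0;" or "&#160;" and returns the code point,
// or -1 if the string is not a well-formed numeric reference.
long parseCharRef(const std::string& ref) {
    if (ref.size() < 4 || ref.compare(0, 2, "&#") != 0 || ref.back() != ';') {
        return -1;
    }
    const bool hex = (ref[2] == 'x');
    const std::string digits = ref.substr(hex ? 3 : 2, ref.size() - (hex ? 4 : 3));
    if (digits.empty() || !std::isxdigit(static_cast<unsigned char>(digits[0]))) {
        return -1;
    }
    char* end = nullptr;
    const long value = std::strtol(digits.c_str(), &end, hex ? 16 : 10);
    return (*end == '\0') ? value : -1;
}
```

Because the decoding is a single pass, the text "&amp;lt;" comes out as the
five characters "&lt;" rather than being decoded twice.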


<h3> Streams </h3>
With TIXML_USE_STL on,
TinyXml has been modified to support both C (FILE) and C++ (operator <<,>>)
streams. There are some differences that you may need to be aware of.

C style output:
	- based on FILE*
	- the Print() and SaveFile() methods

	Generates formatted output, with plenty of white space, intended to be as
	human-readable as possible. They are very fast, and tolerant of ill-formed
	XML documents. For example, an XML document that contains 2 root elements
	and 2 declarations, will still print.

C style input:
	- based on FILE*
	- the Parse() and LoadFile() methods

	A fast, tolerant read. Use whenever you don't need the C++ streams.

C++ style output:
	- based on std::ostream
	- operator<<

	Generates condensed output, intended for network transmission rather than
	readability. Depending on your system's implementation of the ostream class,
	these may be somewhat slower. (Or may not.) Not tolerant of ill-formed XML:
	a document should contain exactly one root element. Additional root level
	elements will not be streamed out.

C++ style input:
	- based on std::istream
	- operator>>

	Reads XML from a stream, making it useful for network transmission. The tricky
	part is knowing when the XML document is complete, since there will almost
	certainly be other data in the stream. TinyXml will assume the XML data is
	complete after it reads the root element. Put another way, documents that
	are ill-constructed with more than one root element will not read correctly.
	Also note that operator>> is somewhat slower than Parse, due to both
	implementation of the STL and limitations of TinyXml.

<h3> White space </h3>
The world simply does not agree on whether white space should be kept, or condensed.
For example, pretend the '_' is a space, and look at "Hello____world". HTML, and
at least some XML parsers, will interpret this as "Hello_world". They condense white
space. Some XML parsers do not, and will leave it as "Hello____world". (Remember
to keep pretending the _ is a space.) Others suggest that __Hello___world__ should become
Hello___world.

It's an issue that hasn't been resolved to my satisfaction. TinyXml supports the
first 2 approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior.
The default is to condense white space.

If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool )
before making any calls to parse XML data, and I don't recommend changing it after
it has been set.
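
The difference between the approaches is easy to see in code. The sketch below
is standalone C++ for illustration, not TinyXml's implementation: the function
names are invented, and dropping leading and trailing white space when
condensing is an assumption of this example.

```cpp
#include <string>

static bool isSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Condensing: every run of white space between words becomes one space.
// Leading and trailing runs are dropped (an assumption of this sketch).
std::string condenseWhiteSpace(const std::string& text) {
    std::string out;
    bool pendingSpace = false;
    for (char c : text) {
        if (isSpace(c)) {
            // Only remember the run if some text has already been emitted.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace) out += ' ';
            pendingSpace = false;
            out += c;
        }
    }
    return out;
}

// Trimming only: the third approach mentioned above. Interior white space
// is kept exactly as written.
std::string trimWhiteSpace(const std::string& text) {
    const char* ws = " \t\n\r";
    const std::string::size_type first = text.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = text.find_last_not_of(ws);
    return text.substr(first, last - first + 1);
}
```

With real spaces in place of the underscores, condenseWhiteSpace turns
"Hello____world" into "Hello_world", while trimWhiteSpace turns
"__Hello___world__" into "Hello___world".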


<h3> Handles </h3>

When browsing an XML document in a robust way, it is important to check
for null returns from method calls. An error-safe implementation can
generate a lot of code like:

@verbatim
TiXmlElement* root = document.FirstChildElement( "Document" );
if ( root )
{
	TiXmlElement* element = root->FirstChildElement( "Element" );
	if ( element )
	{
		TiXmlElement* child = element->FirstChildElement( "Child" );
		if ( child )
		{
			TiXmlElement* child2 = child->NextSiblingElement( "Child" );
			if ( child2 )
			{
				// Finally do something useful.
@endverbatim

Handles have been introduced to clean this up. Using the TiXmlHandle class,
the previous code reduces to:

@verbatim
TiXmlHandle docHandle( &document );
TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).Element();
if ( child2 )
{
	// do something useful
@endverbatim

Which is much easier to deal with. See TiXmlHandle for more information.
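
If you are curious why the chained calls are safe, the idea can be shown with
a toy tree. The Node and Handle types below are invented for this example and
are not the TinyXml classes; they only demonstrate the pattern of wrapping a
pointer that may be null so that every lookup returns another wrapper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A toy tree node, standing in for an element in a DOM.
struct Node {
    std::string name;
    std::vector<Node> children;

    // Returns the index-th child called childName, or nullptr if there is none.
    const Node* child(const std::string& childName, std::size_t index = 0) const {
        for (const Node& c : children) {
            if (c.name == childName) {
                if (index == 0) return &c;
                --index;
            }
        }
        return nullptr;
    }
};

// Wraps a pointer that may be null. A lookup on a null handle simply yields
// another null handle, so a chain needs only one check at the very end.
class Handle {
public:
    explicit Handle(const Node* node) : node_(node) {}

    Handle child(const std::string& name, std::size_t index = 0) const {
        return Handle(node_ ? node_->child(name, index) : nullptr);
    }

    const Node* get() const { return node_; }

private:
    const Node* node_;
};
```

A chain such as Handle(&doc).child("Document").child("Element").child("Child", 1).get()
returns either the node or nullptr, no matter which step failed to find a match.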


<h3> Row and Column tracking </h3>
Being able to track nodes and attributes back to their origin location
in source files can be very important for some applications. Additionally,
knowing where parsing errors occurred in the original source can save
a lot of time.

TinyXml tracks the row and column origin of all nodes and attributes
in a text file. The TiXmlBase::Row() and TiXmlBase::Column() methods return
the origin of the node in the source text. The tab size can be
configured in TiXmlDocument::SetTabSize().
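
The tab size matters because a tab moves the column to the next tab stop
rather than by one. The sketch below shows one way to compute a position from
a byte offset. It is standalone C++ for illustration only: the Location struct
and locate function are invented names, and the 1-based numbering, the default
tab size of 4, and counting a multi-byte UTF-8 character as one column are
assumptions of this example.

```cpp
#include <cstddef>
#include <string>

struct Location {
    int row;
    int column;
};

// Returns the row and column (both starting at 1) of text[offset].
Location locate(const std::string& text, std::size_t offset, int tabSize = 4) {
    if (tabSize < 1) tabSize = 1;
    Location loc = { 1, 1 };
    for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (c == '\n') {
            ++loc.row;
            loc.column = 1;
        } else if (c == '\t') {
            // Jump to the first column after the next tab stop.
            loc.column = ((loc.column - 1) / tabSize + 1) * tabSize + 1;
        } else if ((c & 0xC0) != 0x80) {
            // UTF-8 continuation bytes do not start a new column.
            ++loc.column;
        }
    }
    return loc;
}
```

For the text "<a>\n\t<b/>\n</a>", the element b starts at offset 5, which
is row 2, column 5 with a tab size of 4, and row 2, column 9 with a tab size of 8.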


<h2> Using and Installing </h2>

To Compile and Run xmltest:

A Linux Makefile and a Windows Visual C++ .dsw file are provided.
Simply compile and run. It will write the file demotest.xml to your
disk and generate output on the screen. It also tests walking the
DOM by printing out the number of nodes found using different
techniques.

The Linux makefile is very generic and will
probably run on other systems, but is only tested on Linux. You no
longer need to run 'make depend'. The dependencies have been
hard coded.

<h3>Windows project file for VC6</h3>
<ul>
<li>tinyxml:		tinyxml library, non-STL </li>
<li>tinyxmlSTL:		tinyxml library, STL </li>
<li>tinyXmlTest:	test app, non-STL </li>
<li>tinyXmlTestSTL: test app, STL </li>
</ul>

<h3>Linux Make file</h3>
At the top of the makefile you can set:

PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in
the makefile.

In the tinyxml directory, type "make clean" then "make". The executable
file 'xmltest' will be created.



<h3>To Use in an Application:</h3>

Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp, and tinystr.h to your
project or make file. That's it! It should compile on any reasonably
compliant C++ system. You do not need to enable exceptions or
RTTI for TinyXml.


<h2> How TinyXml works.  </h2>

An example is probably the best way to go. Take:
@verbatim
	<?xml version="1.0" standalone="no" ?>
	<!-- Our to do list data -->
	<ToDo>
		<Item priority="1"> Go to the <bold>Toy store!</bold></Item>
		<Item priority="2"> Do bills</Item>
	</ToDo>
@endverbatim

It's not much of a To Do list, but it will do. To read this file
(say "demo.xml") you would create a document, and parse it in:
@verbatim
	TiXmlDocument doc( "demo.xml" );
	doc.LoadFile();
@endverbatim

And it's ready to go. Now let's look at some lines and how they
relate to the DOM.

@verbatim
<?xml version="1.0" standalone="no" ?>
@endverbatim

	The first line is a declaration, and gets turned into the
	TiXmlDeclaration class. It will be the first child of the
	document node.

	This is the only directive/special tag parsed by TinyXml.
	Generally directive tags are stored in TiXmlUnknown so the
	commands won't be lost when it is saved back to disk.

@verbatim
<!-- Our to do list data -->
@endverbatim

	A comment. Will become a TiXmlComment object.

@verbatim
<ToDo>
@endverbatim

	The "ToDo" tag defines a TiXmlElement object. This one does not have
	any attributes, but does contain 2 other elements.

@verbatim
<Item priority="1">
@endverbatim

	Creates another TiXmlElement which is a child of the "ToDo" element.
	This element has 1 attribute, with the name "priority" and the value
	"1".

Go to the

	A TiXmlText. This is a leaf node and cannot contain other nodes.
	It is a child of the "Item" TiXmlElement.

@verbatim
<bold>
@endverbatim


	Another TiXmlElement, this one a child of the "Item" element.

Etc.

Looking at the entire object tree, you end up with:
@verbatim
TiXmlDocument				"demo.xml"
	TiXmlDeclaration		"version='1.0'" "standalone=no"
	TiXmlComment			" Our to do list data"
	TiXmlElement			"ToDo"
		TiXmlElement		"Item"		Attributes: priority = 1
			TiXmlText		"Go to the "
			TiXmlElement    "bold"
				TiXmlText	"Toy store!"
		TiXmlElement			"Item"		Attributes: priority=2
			TiXmlText			"Do bills"
@endverbatim

<h2> Documentation </h2>

The documentation is built with Doxygen, using the 'dox'
configuration file.

<h2> License </h2>

TinyXml is released under the zlib license:

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source
distribution.

<h2> References  </h2>

The World Wide Web Consortium is the definitive standards body for
XML, and their web pages contain huge amounts of information.

The definitive spec: <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>

I also recommend "XML Pocket Reference" by Robert Eckstein and published by
O'Reilly...the book that got the whole thing started.

<h2> Contributors, Contacts, and a Brief History </h2>

Thanks very much to everyone who sends suggestions, bugs, ideas, and
encouragement. It all helps, and makes this project fun. A special thanks
to the contributors on the web pages that keep it lively.

So many people have sent in bugs and ideas, that rather than list them here
we try to give credit due in the "changes.txt" file.

TinyXml was originally written by Lee Thomason. (Often the "I" still
in the documentation.) Lee reviews changes and releases new versions,
with the help of Yves Berquin and the tinyXml community.

We appreciate your suggestions, and would love to know if you
use TinyXml. Hopefully you will enjoy it and find it useful.
Please post questions, comments, file bugs, or contact us at:

www.sourceforge.net/projects/tinyxml

Lee Thomason,
Yves Berquin
*/