<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>The parser interfaces</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>The parser interfaces</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Developer Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." 
/></form><ul><li><a href="index.html" style="font-weight:bold">Main Menu</a></li><li><a href="html/index.html" style="font-weight:bold">Reference Manual</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="tutorial/index.html">Tutorial</a></li><li><a href="xmlreader.html">The Reader Interface</a></li><li><a href="ChangeLog.html">ChangeLog</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="python.html">Python and bindings</a></li><li><a href="architecture.html">libxml2 architecture</a></li><li><a href="tree.html">The tree output</a></li><li><a href="interface.html">The SAX interface</a></li><li><a href="xmlmem.html">Memory Management</a></li><li><a href="xmlio.html">I/O Interfaces</a></li><li><a href="library.html">The parser interfaces</a></li><li><a href="entities.html">Entities or no entities</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="upgrade.html">Upgrading 1.x code</a></li><li><a href="threads.html">Thread safety</a></li><li><a href="DOM.html">DOM Principles</a></li><li><a href="example.html">A real example</a></li><li><a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="APIchunk0.html">Alphabetic</a></li><li><a href="APIconstructors.html">Constructors</a></li><li><a href="APIfunctions.html">Functions/Types</a></li><li><a href="APIfiles.html">Modules</a></li><li><a href="APIsymbols.html">Symbols</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail 
archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://opencsw.org/packages/libxml2">Solaris binaries</a></li><li><a href="http://www.explain.com.au/oss/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://lxml.de/">lxml Python bindings</a></li><li><a href="http://cpan.uwinnipeg.ca/dist/XML-LibXML">Perl bindings</a></li><li><a href="http://libxmlplusplus.sourceforge.net/">C++ bindings</a></li><li><a href="http://www.zend.com/php5/articles/php5-xmlphp.php#Heading4">PHP bindings</a></li><li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://libxml.rubyforge.org/">Ruby bindings</a></li><li><a href="http://tclxml.sourceforge.net/">Tcl bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>This section is directly intended to help programmers getting bootstrapped
using the XML toolkit from the C language. It is not intended to be
exhaustive. I hope the automatically generated documents will provide the
completeness required, but as a separate set of documents. The interfaces of
the XML parser are by design low level. Those interested in a higher-level
API should <a href="#DOM">look at DOM</a>.</p><p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separate from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's look at how the XML parser can be called:</p><h3><a name="Invoking" id="Invoking">Invoking the parser: the pull method</a></h3><p>Usually, the first thing to do is to read an XML document. The parser accepts
documents either from in-memory buffers or from files. The functions are
declared in "parser.h":</p><dl>
  <dt><code>xmlDocPtr xmlParseMemory(const char *buffer, int size);</code></dt>
    <dd><p>Parse an in-memory buffer of <code>size</code> bytes containing the
      document.</p>
    </dd>
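    <dd><p>As a minimal sketch of the call above, the following helper parses a
      C string and checks the name of its root element. The helper name
      <code>root_is()</code> is hypothetical and not part of libxml2; the
      sketch assumes the libxml2 headers are on the include path.</p>
<pre>#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Hypothetical helper: returns 1 if xml parses as a well-formed document
 * whose root element is named expected, and 0 otherwise. */
static int root_is(const char *xml, const char *expected)
{
    xmlDocPtr doc = xmlParseMemory(xml, (int) strlen(xml));
    xmlNodePtr root;
    int ok;

    if (doc == NULL)
        return 0;                 /* parse failure */
    root = xmlDocGetRootElement(doc);
    ok = (root != NULL) &amp;&amp; xmlStrEqual(root-&gt;name, BAD_CAST expected);
    xmlFreeDoc(doc);              /* the caller owns the returned tree */
    return ok;
}</pre>
      <p>The size passed is the byte length of the buffer, and the document
      is released with <code>xmlFreeDoc()</code> once it is no longer
      needed.</p>
    </dd>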
</dl><dl>
  <dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
    <dd><p>Parse an XML document contained in a (possibly compressed)
      file.</p>
    </dd>
</dl><p>The parser returns a pointer to the document structure (or NULL in case of
failure). The caller releases the document with <code>xmlFreeDoc()</code>.</p><h3 id="Invoking1">Invoking the parser: the push method</h3><p>So that the application keeps control while the document is
being fetched (which is common for GUI-based programs), libxml2 also provides a
push interface, as of version 1.8.3. Here are the interface
functions:</p><pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre><p>Here is a simple example showing how to use the interface (<code>filename</code> is assumed to be defined by the surrounding code):</p><pre>FILE *f;
xmlDocPtr doc = NULL;

f = fopen(filename, "rb");
if (f != NULL) {
    int res, size = 1024;
    char chars[1024];
    xmlParserCtxtPtr ctxt;

    /* Read the first 4 bytes so the parser can detect the encoding. */
    res = (int) fread(chars, 1, 4, f);
    if (res &gt; 0) {
        ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                    chars, res, filename);
        if (ctxt != NULL) {
            /* Feed the rest of the file chunk by chunk. */
            while ((res = (int) fread(chars, 1, size, f)) &gt; 0) {
                xmlParseChunk(ctxt, chars, res, 0);
            }
            /* An empty chunk with terminate set to 1 ends the parse. */
            xmlParseChunk(ctxt, chars, 0, 1);
            doc = ctxt-&gt;myDoc;   /* the tree outlives the context */
            xmlFreeParserCtxt(ctxt);
        }
    }
    fclose(f);
}</pre><p>The HTML parser embedded into libxml2 also has a push interface; the
functions are just prefixed by "html" rather than "xml".</p><h3 id="Invoking2">Invoking the parser: the SAX interface</h3><p>The tree-building interface makes the parser memory-hungry: it first loads
the document into memory and then builds the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p><h3><a name="Building" id="Building">Building a tree from scratch</a></h3><p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in &lt;libxml/tree.h&gt;.) For example, here is a piece of
code that produces the XML document used in the previous examples:</p><pre>    #include &lt;libxml/tree.h&gt;
    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    /* BAD_CAST converts C string literals to the xmlChar * the API expects. */
    doc = xmlNewDoc(BAD_CAST "1.0");
    root = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlDocSetRootElement(doc, root);   /* doc-&gt;children now points to root */
    xmlSetProp(root, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(root, BAD_CAST "prop2", BAD_CAST "&amp; linux too");
    tree = xmlNewChild(root, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "Welcome to Gnome");
    tree = xmlNewChild(root, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");</pre><p>Not really rocket science ...</p><h3><a name="Traversing" id="Traversing">Traversing the tree</a></h3><p>Basically by <a href="html/libxml-tree.html">including "tree.h"</a> your
code has access to the internal structure of all the elements of the tree.
The field names are straightforward: <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc. For example, still with the previous
example:</p><pre><code>doc-&gt;children-&gt;children-&gt;children</code></pre><p>points to the title element,</p><pre>doc-&gt;children-&gt;children-&gt;next-&gt;children-&gt;children</pre><p>points to the text node containing the chapter title "The Linux
adventure".</p><p><strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc-&gt;children</code> may point
to a node which is not the document root element; the function
<code>xmlDocGetRootElement()</code> was added for this purpose.</p><h3><a name="Modifying" id="Modifying">Modifying the tree</a></h3><p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p><dl>
  <dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
  xmlChar *value);</code></dt>
    <dd><p>This sets (or changes) an attribute carried by an ELEMENT node.
      The value can be NULL.</p>
    </dd>
</dl><dl>
  <dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
  *name);</code></dt>
    <dd><p>This function returns a pointer to a new copy of the property
      content. Note that the caller must deallocate the result with
      <code>xmlFree()</code>.</p>
    </dd>
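    <dd><p>The ownership rule above can be sketched as follows. The helper
      <code>prop_round_trip()</code> is a hypothetical name used only for
      illustration: it sets an attribute on a standalone element, reads it
      back, and frees the copy returned by <code>xmlGetProp()</code>.</p>
<pre>#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xmlmemory.h&gt;

/* Hypothetical helper: returns 1 if the attribute value read back with
 * xmlGetProp() equals the value stored with xmlSetProp(), 0 otherwise. */
static int prop_round_trip(const char *name, const char *value)
{
    xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "image");
    xmlChar *copy;
    int same;

    if (node == NULL)
        return 0;
    xmlSetProp(node, BAD_CAST name, BAD_CAST value);
    copy = xmlGetProp(node, BAD_CAST name);   /* newly allocated copy */
    same = (copy != NULL) &amp;&amp; xmlStrEqual(copy, BAD_CAST value);
    if (copy != NULL)
        xmlFree(copy);        /* the caller owns the returned string */
    xmlFreeNode(node);        /* also frees the attribute */
    return same;
}</pre>
    </dd>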
</dl><p>Two functions are provided for reading and writing the text associated
with elements:</p><dl>
  <dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
  *value);</code></dt>
    <dd><p>This function takes an "external" string and converts it to one
      text node or possibly to a list of entity and text nodes. All
      non-predefined entity references like &amp;Gnome; will be stored
      internally as entity nodes, hence the result of the function may not be
      a single node.</p>
    </dd>
</dl><dl>
  <dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
  inLine);</code></dt>
    <dd><p>This function is the inverse of
      <code>xmlStringGetNodeList()</code>. It generates a new string
      containing the content of the text and entity nodes, which the caller
      must free with <code>xmlFree()</code>. Note the extra
      argument <code>inLine</code>. If this argument is set to 1, the function will expand
      entity references. For example, instead of returning the &amp;Gnome;
      XML encoding in the string, it will substitute it with its value (say,
      "GNU Network Object Model Environment").</p>
    </dd>
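    <dd><p>A minimal sketch of reading element text, assuming plain text
      without entity references. The helper name
      <code>text_round_trip()</code> is hypothetical: it builds a
      <code>title</code> element in a throwaway document and compares what
      <code>xmlNodeListGetString()</code> returns with the text that was
      stored.</p>
<pre>#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xmlmemory.h&gt;

/* Hypothetical helper: returns 1 if the text of a freshly built title
 * element is read back unchanged, 0 otherwise. */
static int text_round_trip(const char *text)
{
    xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
    xmlNodePtr title;
    xmlChar *content;
    int same;

    if (doc == NULL)
        return 0;
    title = xmlNewDocNode(doc, NULL, BAD_CAST "title", BAD_CAST text);
    if (title == NULL) {
        xmlFreeDoc(doc);
        return 0;
    }
    xmlDocSetRootElement(doc, title);

    /* The list argument is the element's children, not the element itself. */
    content = xmlNodeListGetString(doc, title-&gt;children, 1);
    same = (content != NULL) &amp;&amp; xmlStrEqual(content, BAD_CAST text);
    if (content != NULL)
        xmlFree(content);     /* the returned string is a new allocation */
    xmlFreeDoc(doc);          /* frees the title element as well */
    return same;
}</pre>
    </dd>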
</dl><h3><a name="Saving" id="Saving">Saving a tree</a></h3><p>There are three basic options:</p><dl>
  <dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int
  *size);</code></dt>
    <dd><p>Serializes the document into a newly allocated buffer returned
      through <code>mem</code>, with its length stored in <code>size</code>.
      The caller must free the buffer with <code>xmlFree()</code>.</p>
    </dd>
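    <dd><p>As a rough illustration of the buffer handling, the sketch below
      serializes a one-element document and looks for the element name in
      the output. The helper name <code>dump_contains_root()</code> is
      hypothetical, and the sketch assumes a libxml2 build with output
      support enabled.</p>
<pre>#include &lt;string.h&gt;
#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xmlmemory.h&gt;

/* Hypothetical helper: returns 1 if the serialized document contains
 * root_name, 0 otherwise. */
static int dump_contains_root(const char *root_name)
{
    xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
    xmlNodePtr root;
    xmlChar *mem = NULL;
    int size = 0;
    int found;

    if (doc == NULL)
        return 0;
    root = xmlNewDocNode(doc, NULL, BAD_CAST root_name, NULL);
    if (root == NULL) {
        xmlFreeDoc(doc);
        return 0;
    }
    xmlDocSetRootElement(doc, root);

    xmlDocDumpMemory(doc, &amp;mem, &amp;size);   /* allocates mem */
    found = (mem != NULL) &amp;&amp; (size &gt; 0) &amp;&amp;
            (strstr((const char *) mem, root_name) != NULL);
    if (mem != NULL)
        xmlFree(mem);         /* the caller owns the buffer */
    xmlFreeDoc(doc);
    return found;
}</pre>
    </dd>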
</dl><dl>
  <dt><code>int xmlDocDump(FILE *f, xmlDocPtr cur);</code></dt>
    <dd><p>Dumps a document to an open <code>FILE</code> stream.</p>
    </dd>
</dl><dl>
  <dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
    <dd><p>Saves the document to a file. In this case, the compression
      interface is triggered if it has been turned on.</p>
    </dd>
</dl><h3><a name="Compressio" id="Compressio">Compression</a></h3><p>The library transparently handles compression when doing file-based
accesses. The compression level used on saves can be set either globally
or individually for one document:</p><dl>
  <dt><code>int  xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
    <dd><p>Gets the document compression level (0 for uncompressed, up to 9
      for maximum compression).</p>
    </dd>
</dl><dl>
  <dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
    <dd><p>Sets the document compression level.</p>
    </dd>
</dl><dl>
  <dt><code>int  xmlGetCompressMode(void);</code></dt>
    <dd><p>Gets the default compression level.</p>
    </dd>
</dl><dl>
  <dt><code>void xmlSetCompressMode(int mode);</code></dt>
    <dd><p>Sets the default compression level.</p>
    </dd>
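    <dd><p>A small sketch of the per-document setting, using a hypothetical
      helper named <code>doc_compression()</code>. It only sets and reads
      back the level; whether a later <code>xmlSaveFile()</code> actually
      compresses is assumed to depend on the library having been built with
      compression support.</p>
<pre>#include &lt;libxml/tree.h&gt;

/* Hypothetical helper: sets the compression level on a new document and
 * returns the level reported back, or -1 if the document cannot be created. */
static int doc_compression(int level)
{
    xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
    int reported;

    if (doc == NULL)
        return -1;
    xmlSetDocCompressMode(doc, level);
    reported = xmlGetDocCompressMode(doc);
    xmlFreeDoc(doc);
    return reported;
}</pre>
    </dd>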
</dl><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>