<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <style type="text/css">
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
-->
  </style>
  <title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips to
help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and might be helpful more generally. I welcome <a
href="mailto:veillard@redhat.com">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, as a storage format, for data
exchange, checking or transformations. An increasing number of new XML
formats have been defined, but not all the steps needed to truly gain the
benefits of XML have been taken, possibly because of a lack of
documentation. These guidelines hope to improve matters and provide a better
overview of XML processing and of the associated steps needed to deploy it
successfully.</p>

<p>Table of contents:</p>
<ol>
  <li><a href="#Design">Design guidelines</a></li>
  <li><a href="#Canonical">Canonical URL</a></li>
  <li><a href="#Catalog">Catalog setup</a></li>
  <li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the design of the XML format itself. It may arrive
a bit too late, since the structure of the document may already be cast in
existing and deployed code. Still, here are a few rules which might be helpful
when designing a new XML vocabulary or revising an existing format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
try to look up existing XML vocabularies for similar data. Ideally this allows
you to reuse them, in which case a lot of existing tools like DTDs, schemas
and stylesheets may already be available. If you are looking for a
documentation format, <a href="http://www.docbook.org/">DocBook</a> should
handle your needs. If reuse is not possible because some semantic or use case
aspects are too different, studying existing vocabularies will still help you
avoid design errors like targeting the vocabulary at the wrong abstraction
level. In this format design phase, try to be synthetic: be sure to express
the real content of your data, and use the XML structure to express the
semantics and context of those data.</p>

<h3>DTD rules:</h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed in instances is the core of the design process of the
vocabulary. Here are a few tips:</p>
<ul>
  <li>use meaningful words for element and attribute names.</li>
  <li>do not use attributes for general textual content: attribute values
    are normalized by the parser before reaching the application, so
    spaces and line breaks will be modified.</li>
  <li>use a single element for every string that might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:
    <pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
  </li>
  <li>use attributes to refine the content of an element but avoid them for
    more complex tasks. Parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex
    later, while attributes have to remain very simple.</li>
</ul>
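
<p>The attribute and localization tips above can be illustrated with a
minimal Python sketch based on the standard
<code>xml.etree.ElementTree</code> module. The <code>note</code> element, its
<code>text</code> attribute and the <code>pick_message</code> helper are
hypothetical names chosen for this illustration; only the
<code>welcome</code>/<code>msg</code> structure comes from the example
above.</p>

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace; ElementTree exposes it
# under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: the same two-line string stored once as an
# attribute value and once as element content.
doc = ET.fromstring('<note text="line one\nline two">line one\nline two</note>')
attr_value = doc.get("text")   # the line break is normalized to a space
elem_value = doc.text          # element content keeps the line break

WELCOME = """<welcome>
  <msg xml:lang="en">hello</msg>
  <msg xml:lang="fr">bonjour</msg>
</welcome>"""


def pick_message(root, lang, fallback="en"):
    """Return the text of the msg sibling matching lang, else the fallback."""
    by_lang = {msg.get(XML_LANG): msg.text for msg in root.findall("msg")}
    return by_lang.get(lang, by_lang.get(fallback))


welcome = ET.fromstring(WELCOME)
print(repr(attr_value), repr(elem_value))
print(pick_message(welcome, "fr"), pick_message(welcome, "de"))
```

<p>Falling back to English when a translation is missing is a design choice
of this sketch, not something the format imposes.</p>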

<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define will be usable
for future extensions that you may not have considered for the current
version. There are two parts to this:</p>
<ul>
  <li>Make sure the instance contains a version number, which makes
    backward compatibility easier to handle. Something as simple as having a
    <code>version="1.0"</code> attribute on the root element of the instance
    is sufficient.</li>
  <li>While designing the code that analyzes the data provided by the
    XML parser, make sure it can work with unknown versions: generate a UI
    warning and process only the tags recognized by your version. Keep in
    mind that you should not break on unknown elements if the version
    attribute is not in the recognized set.</li>
</ul>
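
<p>A version-tolerant reader along the lines of the two points above might
look like the following Python sketch. The <code>list</code>,
<code>title</code> and <code>item</code> element names and the
<code>read_instance</code> function are assumptions made for the example, and
the returned list of messages stands in for the UI warning.</p>

```python
import xml.etree.ElementTree as ET

KNOWN_VERSIONS = {"1.0"}          # versions this reader was written for
KNOWN_TAGS = {"title", "item"}    # hypothetical element names


def read_instance(xml_text):
    """Return (items, warnings) for one instance document."""
    root = ET.fromstring(xml_text)
    warnings = []
    version = root.get("version")
    if version not in KNOWN_VERSIONS:
        # Stand-in for the UI warning; processing continues regardless.
        warnings.append(f"unknown format version {version!r}")
    items = []
    for child in root:
        if child.tag in KNOWN_TAGS:
            items.append((child.tag, child.text))
        # Any other element is skipped instead of being treated as an error.
    return items, warnings


newer = '<list version="2.0"><title>t</title><blink>x</blink><item>a</item></list>'
print(read_instance(newer))
```

<p>With the newer instance above, the reader keeps the <code>title</code> and
<code>item</code> entries, ignores the unknown <code>blink</code> element and
reports a single warning about the version.</p>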

<h3>Other design parts:</h3>

<p>While defining your vocabulary, try to think in terms of other uses of
your data, for example how XSLT stylesheets could be used to produce an HTML
view of your data, or to convert it into a different format. Looking at XML
Schemas and defining an XML Schema with more complete validation and
datatyping of your data structures is also worthwhile, since it helps avoid
some mistakes in the design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example binding a specific processing from a graphic shell
like Nautilus to an instance of your data) then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project. See the next section about this. In practice this
means that XML parsers will not handle your element names as-is but as a
pair made of the namespace name and the element name. This allows tools to
recognize your elements and disambiguate their processing. Uniqueness of the
namespace name can, for the most part, be guaranteed by the use of the DNS
registry. Namespaces can also be used to carry versioning information,
like:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use them is to make them the default namespace on the
root element of the XML instance, like:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, structure and all descendant elements like data are in
the given namespace.</p>
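
<p>To see the (namespace name, element name) pair from the application side,
here is a small Python sketch using the standard
<code>xml.etree.ElementTree</code> module, which reports each name in the form
<code>{namespace}localname</code>. Other parsers expose the pair differently;
the document is the example above with placeholder content.</p>

```python
import xml.etree.ElementTree as ET

NS = "http://www.gnome.org/project/projectname/1.0/"

DOC = f"""<structure xmlns="{NS}">
  <data>...</data>
</structure>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as "{namespace name}local name".
root_name = root.tag
child_names = [child.tag for child in root]

# Lookups must use the qualified name; the bare local name does not match.
qualified_hit = root.find(f"{{{NS}}}data")
bare_hit = root.find("data")

print(root_name)
print(child_names)
print(qualified_hit is not None, bare_hit is None)
```

<p>An application that only compared the local name <code>data</code> would
confuse this vocabulary with any other format using the same element
name.</p>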

<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web, there is a natural synergy between the two. XML was designed
to be available on the Web, and keeping the infrastructure that way helps
deploying XML resources. The core of this issue is the notion of the
"Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource. The canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually when processing XML a
copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt, or even in C:\projectname\
(horror!). The key point is that the way to name that resource should be
independent of the actual place where it resides on disk, if it is available
there, and that the processing should still work if there is no local copy
(provided that the machine where the processing takes place is connected to
the Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it but always use the canonical URL. For example, in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                         "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL for the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                         "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>

<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
generate HTML, and the canonical URL should be used:</p>
<pre>&lt;?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?&gt;</pre>

<p>Defining the canonical URL for the resources needed should obey a few
simple rules similar to those used to design namespace names:</p>
<ul>
  <li>use a DNS name you know is associated with the project and will be
    available in the long term</li>
  <li>within that server space, reserve the subtree where you
    intend to keep those data</li>
  <li>version the URL so that multiple concurrent versions of the resources
    can be hosted simultaneously</li>
</ul>

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism which allows XML processing
tools to use a local copy of a resource, if one is available, even when the
instance document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code> or
defined by the user). They are a tree of XML documents defining the mappings
between the canonical naming space and the locally installed copies; this can
be seen as a static cache structure.</p>

<p>When the XML processor is asked to process a resource it will
automatically test for a locally available version in the catalog, starting
from the root catalog, and possibly fetching sub-catalog resources until it
determines whether the catalog has that resource or not. If not, the default
processing of fetching the resource from the Web is done, which in most
cases allows recovery from a catalog miss. The key point is that the document
instances are totally independent of the availability of a catalog and of
the actual place where the local resources they reference may be installed.
This greatly improves the management of the documents in the long run, making
them independent of the platform or toolchain used to process them. The
figure below tries to express that mechanism:<img src="catalog.gif"
alt="Picture describing the catalog"></p>

<h3>Usual catalog setup:</h3>

<p>Usually catalogs for a project are set up as a two-level hierarchical
cache, the root catalog containing only "delegates" indicating a separate
subcatalog dedicated to the project. The goal is to keep the root catalog
clean and simplify the maintenance of the catalog by using separate catalogs
per project. For example when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>  &lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
  &lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference corresponding to them, it has to look up a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or on a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses
a PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the given
subcatalog. Similarly the second and third entries indicate those
delegation rules for SYSTEM, DOCTYPE or normal URI references when the URL
starts with the <code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring,
which indicates the location on the W3C server where the XHTML1 resources are
stored. This is the beginning of all Canonical URLs for XHTML1 resources.
Those three rules are sufficient in practice to capture all references to
XHTML1 resources and direct the processing tools to the right subcatalog.</p>
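
<p>The delegation step described above can be sketched in a few lines of
Python. This is a simplified model, not the libxml2 implementation: the
<code>ROOT_DELEGATES</code> table and the <code>delegated_catalogs</code>
function are hypothetical names, and ordering matches by longest prefix first
follows my reading of the OASIS XML Catalogs specification.</p>

```python
SUBCATALOG = "file:///usr/share/sgml/xhtml1/xmlcatalog"

# The three delegate entries of the root catalog, modeled as
# (kind, start string, subcatalog) tuples.
ROOT_DELEGATES = [
    ("public", "-//W3C//DTD XHTML 1.0", SUBCATALOG),
    ("system", "http://www.w3.org/TR/xhtml1/DTD", SUBCATALOG),
    ("uri", "http://www.w3.org/TR/xhtml1/DTD", SUBCATALOG),
]


def delegated_catalogs(kind, identifier, delegates=ROOT_DELEGATES):
    """Return the subcatalogs to consult for identifier, longest prefix first."""
    matches = [(prefix, catalog) for k, prefix, catalog in delegates
               if k == kind and identifier.startswith(prefix)]
    matches.sort(key=lambda match: len(match[0]), reverse=True)
    return [catalog for _, catalog in matches]


print(delegated_catalogs("public", "-//W3C//DTD XHTML 1.0 Strict//EN"))
print(delegated_catalogs("system", "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
```

<p>An empty result means that no delegate applies, so the lookup continues
with the other catalog entries and eventually falls back to fetching the
canonical URL.</p>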

<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;</pre>

<p>There are a few things to notice:</p>
<ul>
  <li>this is an XML resource: it points to the DTD using Canonical URLs, and
    the root element defines a namespace (but based on a URN, not an HTTP
    URL).</li>
  <li>it contains 5 rules: the first 3 are direct mappings for the 3
    PUBLIC identifiers defined by the XHTML1 specification, associating
    them with the local resource containing the DTD; the last 2 are
    rewrite rules allowing to build the local filename for any URL based on
    "http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules
    by keeping the same structure as the on-line server at the Canonical
    URL.</li>
  <li>the local resources are designated using URI references (the uri or
    rewritePrefix attributes), the base being the containing sub-catalog URL,
    which means that in practice the copy of the XHTML1 strict DTD is stored
    locally in
    <code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code></li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>
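
<p>How the relative <code>uri</code> and <code>rewritePrefix</code> values
turn into local file URLs can be checked with a short Python sketch. It
models the subcatalog as plain Python data and uses the standard
<code>urllib.parse.urljoin</code> for base resolution; the
<code>resolve</code> function is a hypothetical helper that ignores details of
the real algorithm such as the <code>prefer</code> setting.</p>

```python
from urllib.parse import urljoin

# URL of the subcatalog itself; relative references resolve against it.
CATALOG_URL = "file:///usr/share/sgml/xhtml1/xmlcatalog"

# The three "public" entries and the "rewriteSystem" entry shown above.
PUBLIC = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xhtml1-20020801/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "xhtml1-20020801/DTD/xhtml1-transitional.dtd",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "xhtml1-20020801/DTD/xhtml1-frameset.dtd",
}
REWRITE_SYSTEM = [("http://www.w3.org/TR/xhtml1/DTD", "xhtml1-20020801/DTD")]


def resolve(public_id=None, system_id=None):
    """Map an identifier to a local URL, or return system_id unchanged."""
    if public_id in PUBLIC:
        return urljoin(CATALOG_URL, PUBLIC[public_id])
    for start, rewrite_prefix in REWRITE_SYSTEM:
        if system_id and system_id.startswith(start):
            # Replace the matching prefix by the resolved rewrite prefix.
            return urljoin(CATALOG_URL, rewrite_prefix) + system_id[len(start):]
    return system_id  # catalog miss: keep the canonical URL


print(resolve(public_id="-//W3C//DTD XHTML 1.0 Strict//EN"))
print(resolve(system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"))
```

<p>Both calls yield URLs under
<code>file:///usr/share/sgml/xhtml1/xhtml1-20020801/DTD/</code>: the last path
segment of the catalog URL (<code>xmlcatalog</code>) is dropped during
resolution, as for any relative URI reference.</p>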

<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. The catalog files, being XML
resources, should be processed with XML based tools to avoid problems with
the generated files. The xmlcatalog command coming with libxml2 allows you to
create catalogs, and add or remove rules at that time. Here is a complete
example coming from the RPM for the XHTML1 DTDs post install script. While
this example is platform and packaging specific, it can be useful as an
example in other contexts:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
fi

if [ -w $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --add "delegatePublic" \
        "-//W3C//DTD XHTML 1.0" \
        "file://$CATALOG" $ROOTCATALOG
    /usr/bin/xmlcatalog --noout --add "delegateSystem" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" $ROOTCATALOG
    /usr/bin/xmlcatalog --noout --add "delegateURI" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" $ROOTCATALOG
fi</pre>

<p>The XHTML1 subcatalog is not created on-the-fly in that case; it is
installed as part of the files of the package. So the only work needed is to
make sure the root catalog exists and register the delegate rules.</p>
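
<p>For contexts where the xmlcatalog command is not at hand, the same
registration can be modeled with an XML-aware library rather than text
editing. The Python sketch below builds the root catalog in memory with the
standard <code>xml.etree.ElementTree</code> module. It is an illustration
only, not a replacement for xmlcatalog: the <code>add_delegate</code> helper
is a hypothetical name, and skipping entries that already exist is a choice
made for this example.</p>

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)   # serialize with a default namespace

# Attribute carrying the start string for each delegate element type.
START_ATTR = {
    "delegatePublic": "publicIdStartString",
    "delegateSystem": "systemIdStartString",
    "delegateURI": "uriStartString",
}


def add_delegate(root, kind, start, catalog):
    """Add a delegate entry unless an identical one is already present."""
    tag = f"{{{CATALOG_NS}}}{kind}"
    attr = START_ATTR[kind]
    for entry in root.findall(tag):
        if entry.get(attr) == start and entry.get("catalog") == catalog:
            return False
    ET.SubElement(root, tag, {attr: start, "catalog": catalog})
    return True


# An empty catalog, then the three delegates registered by the %post script.
root = ET.Element(f"{{{CATALOG_NS}}}catalog")
sub = "file:///usr/share/sgml/xhtml1/xmlcatalog"
add_delegate(root, "delegatePublic", "-//W3C//DTD XHTML 1.0", sub)
add_delegate(root, "delegateSystem", "http://www.w3.org/TR/xhtml1/DTD", sub)
add_delegate(root, "delegateURI", "http://www.w3.org/TR/xhtml1/DTD", sub)

print(ET.tostring(root, encoding="unicode"))
```

<p>Working on the parsed tree keeps the file well-formed whatever its
previous content, which is the reason for preferring XML based tools over
shell text manipulation here.</p>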

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w $ROOTCATALOG ]
    then
        /usr/bin/xmlcatalog --noout --del \
            "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
    fi
fi</pre>

<p>Note the test against $1. In an RPM %postun script, $1 is the number of
instances of the package that remain installed after the operation, so it is
0 only for a real removal. The test is needed to avoid removing the delegate
rules when the package is merely upgraded.</p>

<p>Following the set of guidelines and tips provided in this document should
help deploy XML resources in the GNOME framework without much pain and
ensure a smooth evolution of the resources and instances.</p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>
    375