<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css"> TD {font-family: Verdana,Arial,Helvetica} BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em} H1 {font-family: Verdana,Arial,Helvetica} H2 {font-family: Verdana,Arial,Helvetica} H3 {font-family: Verdana,Arial,Helvetica} A:link, A:visited, A:active { text-decoration: underline } </style><title>Catalog support</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>Catalog support</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" 
bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." /></form><ul><li><a href="index.html">Home</a></li><li><a href="html/index.html">Reference Manual</a></li><li><a href="intro.html">Introduction</a></li><li><a href="FAQ.html">FAQ</a></li><li><a href="docs.html" style="font-weight:bold">Developer Menu</a></li><li><a href="bugs.html">Reporting bugs and getting help</a></li><li><a href="help.html">How to help</a></li><li><a href="downloads.html">Downloads</a></li><li><a href="news.html">Releases</a></li><li><a href="XMLinfo.html">XML</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="xmldtd.html">Validation & DTDs</a></li><li><a href="encoding.html">Encodings support</a></li><li><a href="catalog.html">Catalog support</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="contribs.html">Contributions</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="html/index.html" style="font-weight:bold">API Menu</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="ChangeLog.html">Recent Changes</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a 
href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://www.blastwave.org/packages.php/libxml2">Solaris binaries</a></li><li><a href="http://www.explain.com.au/oss/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://libxmlplusplus.sourceforge.net/">C++ bindings</a></li><li><a href="http://www.zend.com/php5/articles/php5-xmlphp.php#Heading4">PHP bindings</a></li><li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://libxml.rubyforge.org/">Ruby bindings</a></li><li><a href="http://tclxml.sourceforge.net/">Tcl bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>Table of Contents:</p><ol><li><a href="#General2">General overview</a></li> <li><a href="#definition">The definitions</a></li> <li><a href="#Simple">Using catalogs</a></li> <li><a href="#Some">Some examples</a></li> <li><a href="#reference">How to tune catalog usage</a></li> <li><a href="#validate">How to debug catalog processing</a></li> <li><a href="#Declaring">How to create and maintain catalogs</a></li> <li><a href="#implemento">The implementor corner: quick review of the API</a></li> <li><a href="#Other">Other resources</a></li> </ol><h3><a name="General2" id="General2">General overview</a></h3><p>What is a catalog? Basically, it's a lookup mechanism used when an entity (a file or a remote resource) references another entity.
 The catalog lookup is inserted between the moment the reference is recognized by the software (XML parser, stylesheet processing, or even images referenced for inclusion in a rendering) and the time when loading that resource actually starts.</p><p>It is basically used for three things:</p><ul><li>mapping from "logical" names, the public identifiers, to a more concrete name usable for download (a URI). For example, it can associate the logical name <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p> <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be downloaded</p> <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p> </li> <li>remapping from a given URL to another one, like an HTTP indirection saying that <p>"http://www.oasis-open.org/committes/tr.xsl"</p> <p>should really be looked up at</p> <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p> </li> <li>providing a local cache mechanism that allows the entities associated with public identifiers or remote resources to be loaded locally. This is a really important feature for any significant deployment of XML or SGML, since it avoids the hazards and delays associated with fetching remote resources.</li> </ul><h3><a name="definition" id="definition">The definitions</a></h3><p>Libxml, as of 2.4.3, implements two kinds of catalogs:</p><ul><li>the older SGML catalogs: the official spec is SGML Open Technical Resolution TR9401:1997, but it is better understood by reading <a href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from James Clark. This format is relatively old and is not the preferred mode of operation of libxml.</li> <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML Catalogs</a> are far more flexible and more recent, use an XML syntax, and should scale much better.
 This is the default option of libxml.</li> </ul><h3><a name="Simple" id="Simple">Using catalogs</a></h3><p>In a normal environment libxml2 will by default check for the presence of a catalog in /etc/xml/catalog and, assuming it has been correctly populated, the processing is completely transparent to the document user. To take a concrete example, suppose you are authoring a DocBook document that starts with the following DOCTYPE definition:</p><pre><?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"></pre><p>When validating the document with libxml, the catalog will be automatically consulted to look up the public identifier "-//Norman Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have been installed on your system and the catalogs actually point to them, libxml will fetch them from the local disk.</p><p style="font-size: 10pt"><strong>Note</strong>: Really don't use this DOCTYPE; it's a really old version, but it is fine as an example.</p><p>Libxml2 will check the catalog each time it is requested to load an entity; this includes DTDs, external parsed entities, stylesheets, etc. If your system is correctly configured, the whole authoring and processing phase should use only local files, while your document stays portable because it uses the canonical public and system IDs, referencing the remote document.</p><h3><a name="Some" id="Some">Some examples:</a></h3><p>Here are a couple of fragments from XML Catalogs used in libxml2's early regression tests in <code>test/catalogs</code>:</p><pre><?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC 
   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
...</pre><p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are written in XML, and there is a specific namespace for catalog elements: "urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this catalog is a <code>public</code> mapping, which associates a Public Identifier with a URI.</p><pre>...
  <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                 rewritePrefix="file:///usr/share/xml/docbook/"/>
...</pre><p>A <code>rewriteSystem</code> is a very powerful instruction. It says that any URI starting with a given prefix should be looked up at another URI, constructed by replacing the prefix with a new one. In effect this acts like a cache system for a full area of the Web. In practice it is extremely useful with a file prefix if you have installed a copy of those resources on your local system.</p><pre>...
<delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
...</pre><p>Delegation is the core feature that allows building a tree of catalogs, which is easier to maintain than a single catalog. Based on Public Identifier, System Identifier or URI prefixes, it instructs the catalog software to look up entries in another resource.
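</p><p>The prefix-based choice of a delegate catalog can be illustrated with a small, self-contained C sketch. It is not part of the libxml2 API. The <code>delegate_entry</code> type and the <code>select_delegates()</code> function are hypothetical names used only for this illustration. The sketch assumes the XML Catalogs rule that every <code>delegatePublic</code> entry whose <code>publicIdStartString</code> is a prefix of the public identifier is kept, and that the resulting catalogs are tried longest prefix first. Public identifier normalization and duplicate catalogs are ignored here.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one delegatePublic entry (not a libxml2 type). */
typedef struct {
    const char *prefix;   /* value of publicIdStartString */
    const char *catalog;  /* value of the catalog attribute */
} delegate_entry;

/*
 * Collect every entry whose prefix starts the public identifier 'pubid'
 * into 'out', longest prefix first. Entries with equal prefix lengths
 * keep their document order. 'out' must have room for 'n' pointers.
 * Returns the number of matching entries.
 */
size_t select_delegates(const delegate_entry *entries, size_t n,
                        const char *pubid, const delegate_entry **out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(entries[i].prefix);

        if (strncmp(pubid, entries[i].prefix, len) != 0)
            continue; /* this entry does not apply to pubid */

        /* Insertion sort: shift strictly shorter prefixes to the right. */
        size_t j = count;
        while (j > 0 && strlen(out[j - 1]->prefix) < len) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = &entries[i];
        count++;
    }
    return count;
}
```

<p>With the <code>delegatePublic</code> entries shown above, the identifier "-//OASIS//DTD DocBook XML V4.1.2//EN" matches only the "-//OASIS//DTD DocBook XML" prefix, so the lookup would continue in <code>/usr/share/xml/docbook.xml</code>.</p><p>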
 This feature allows building hierarchies of catalogs. The set of entries presented should be sufficient to redirect the resolution of all DocBook references to the specific catalog in <code>/usr/share/xml/docbook.xml</code>, and that catalog could in turn delegate all references for DocBook 4.1.2 to a specific catalog installed at the same time as the DocBook resources on the local machine.</p><h3><a name="reference" id="reference">How to tune catalog usage:</a></h3><p>The user can change the default catalog behaviour by redirecting queries to their own set of catalogs. This can be done by setting the <code>XML_CATALOG_FILES</code> environment variable to a space-separated list of catalogs; an empty value should deactivate loading the default <code>/etc/xml/catalog</code> catalog.</p><h3><a name="validate" id="validate">How to debug catalog processing:</a></h3><p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will make libxml2 output debugging information for each catalog operation, for example:</p><pre>orchis:~/XML -> xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -> export XML_DEBUG_CATALOG=
orchis:~/XML -> xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -> </pre><p>The test/ent2 file references an entity. Running the parser from memory makes the base URI unavailable, so the "title.xml" entity cannot be loaded. Setting the debug environment variable shows that an attempt is made to load <code>/etc/xml/catalog</code>, but since that file is not present the resolution fails.</p><p>But the most advanced way to debug XML catalog processing is to use the <strong>xmlcatalog</strong> command shipped with libxml2. It allows loading catalogs and making resolution queries to see what is going on.
 This is also used for the regression tests:</p><pre>orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -> </pre><p>To see what is going on, add one -v flag to increase the verbosity level and show the processing done. A second flag also shows which elements are recognized during parsing:</p><pre>orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -> </pre><p>A shell interface is also available to debug and process multiple queries (and for regression tests):</p><pre>orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
> help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
> quit
orchis:~/XML -> </pre><p>This should be sufficient for most debugging purposes. It was actually used heavily to debug the XML Catalog implementation itself.</p><h3><a name="Declaring" id="Declaring">How to create and maintain</a> catalogs:</h3><p>Basically, XML Catalogs are XML files: you can either use XML tools to manage them or use <strong>xmlcatalog</strong> for this. The basic step is to
The basic stepisto create a catalog the -create option provide this facility:</p><pre>orchis:~/XML -> ./xmlcatalog --create tst.xml <?xml version="1.0"?> <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/> orchis:~/XML -> </pre><p>By default xmlcatalog does not overwrite the original catalog and savetheresult on the standard output, this can be overridden using the-nooutoption. The <code>-add</code>command allows to add entries inthecatalog:</p><pre>orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \ "-//OASIS//DTD DocBook XML V4.1.2//EN" \ http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml orchis:~/XML -> cat tst.xml <?xml version="1.0"?> <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" \ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN" uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/> </catalog> orchis:~/XML -> </pre><p>The <code>-add</code>option will always take 3 parameters even if someofthe XML Catalog constructs (like nextCatalog) will have only asingleargument, just pass a third empty string, it will be ignored.</p><p>Similarly the <code>-del</code>option remove matching entries fromthecatalog:</p><pre>orchis:~/XML -> ./xmlcatalog --del \ "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml <?xml version="1.0"?> <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/> orchis:~/XML -> </pre><p>The catalog is now empty. 
 Note that the matching of <code>--del</code> is exact and would have worked in a similar fashion with the Public ID string.</p><p>This is rudimentary but should be sufficient to manage a not-too-complex catalog tree of resources.</p><h3><a name="implemento" id="implemento">The implementor corner: quick review of the API:</a></h3><p>First, as for every other module of libxml, there is an automatically generated <a href="html/libxml-catalog.html">API page for catalog support</a>.</p><p>The header for the catalog interfaces should be included as:</p><pre>#include <libxml/catalog.h></pre><p>The API is deliberately kept very simple. First, it is not obvious that applications really need access to it, since catalog resolution is the default behaviour of libxml2. (Note: it is possible to completely override libxml2's default catalog by using <a href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to plug in an application-specific resolver.)</p><p>Basically, libxml2 supports two catalog lists:</p><ul><li>the default one, global and shared by the whole application</li> <li>a per-document catalog: this one is built if the document uses the <code>oasis-xml-catalog</code> PIs to specify its own catalog list. It is associated with the parser context and destroyed when the parsing context is destroyed.</li> </ul><p>The document one will be used first if it exists.</p><h4>Initialization routines:</h4><p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be used at startup to initialize the catalog. If the catalog should be initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs() should be called before xmlInitializeCatalog(), which would otherwise do a default initialization first.</p><p>The xmlCatalogAddLocal() call is used by the parser to grow the document's own catalog list if needed.</p><h4>Preferences setup:</h4><p>The XML Catalog spec requires the possibility to select default preferences between public and system delegation, and xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults()
and xmlCatalogGetDefaults() allow controlling whether XML Catalogs resolution should be forbidden, allowed for the global catalog, allowed for the document catalog, or allowed for both; the default is to allow both.</p><p>And of course xmlCatalogSetDebug() allows generating debug messages (through the xmlGenericError() mechanism).</p><h4>Querying routines:</h4><p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic() and xmlCatalogResolveURI() are fairly self-explanatory if you have read the XML Catalog specification: they correspond to the section 7 algorithms. They should also work if you have loaded an SGML catalog, with simplified semantics.</p><p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but operate on the document catalog list.</p><h4>Cleanup and Miscellaneous:</h4><p>xmlCatalogCleanup() frees up the global catalog; xmlCatalogFreeLocal() is the per-document equivalent.</p><p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the first catalog in the global list, and xmlCatalogDump() allows dumping a catalog state. Those routines are primarily designed for xmlcatalog. I'm not sure that exposing more complex interfaces (like navigation ones) would be really useful.</p><p>xmlParseCatalogFile() is a function used to load XML Catalog files. It's similar to xmlParseFile() except that it bypasses all catalog lookups. It is provided because this functionality may be useful for client tools.</p><h4>Threaded environments:</h4><p>Since the catalog tree is built progressively, some care has been taken to try to avoid trouble in multithreaded environments.
 The code is now thread-safe, assuming that the libxml2 library has been compiled with thread support.</p><h3><a name="Other" id="Other">Other resources</a></h3><p>The XML Catalog specification is relatively recent, so there isn't much literature to point at:</p><ul><li>You can find a good rant from Norm Walsh about <a href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the need for catalogs</a>. It provides a lot of context information, even if I don't agree with everything presented. Norm also wrote a more recent article, <a href="http://wwws.sun.com/software/xml/developers/resolver/article/">XML entities and URI resolvers</a>, describing them.</li> <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML catalog proposal</a> from John Cowan</li> <li>The <a href="http://www.rddl.org/">Resource Directory Description Language</a> (RDDL), another catalog system but more oriented toward providing metadata for XML namespaces.</li> <li>the page from the OASIS Technical <a href="http://www.oasis-open.org/committees/entity/">Committee on Entity Resolution</a>, which maintains XML Catalog. There you will find pointers to the specification update, some background, and pointers to other tools providing XML Catalog support</li> <li>There is a <a href="buildDocBookCatalog">shell script</a> to generate XML Catalogs for DocBook 4.1.2. If it can write to the /etc/xml/ directory, it will set up /etc/xml/catalog and /etc/xml/docbook based on the resources found on the system.
 Otherwise it will just create ~/xmlcatalog and ~/dbkxmlcatalog, and doing: <p><code>export XML_CATALOG_FILES=$HOME/xmlcatalog</code></p> <p>should allow processing DocBook documentation without requiring network access for the DTD or stylesheets.</p> </li> <li>I have uploaded <a href="ftp://xmlsoft.org/libxml2/test/dbk412catalog.tar.gz">a small tarball</a> containing XML Catalogs for DocBook 4.1.2, which seems to work fine for me too</li> <li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog manual page</a></li> </ul><p>If you have suggestions for corrections or additions, simply contact me:</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>