From 35a201cc8ef0c3f5b2df88d2e528aabee1048348 Mon Sep 17 00:00:00 2001 From: Wojtek Kosior Date: Fri, 30 Apr 2021 18:47:09 +0200 Subject: Initial/Final commit --- libxml2-2.9.10/doc/catalog.html | 261 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 261 insertions(+) create mode 100644 libxml2-2.9.10/doc/catalog.html (limited to 'libxml2-2.9.10/doc/catalog.html') diff --git a/libxml2-2.9.10/doc/catalog.html b/libxml2-2.9.10/doc/catalog.html new file mode 100644 index 0000000..42be5e7 --- /dev/null +++ b/libxml2-2.9.10/doc/catalog.html @@ -0,0 +1,261 @@ + + +Catalog support
Action against software patentsGnome2 LogoW3C LogoRed Hat Logo
Made with Libxml2 Logo

The XML C parser and toolkit of Gnome

Catalog support

Main Menu
Related links

Table of Content:

    +
  1. General overview
  2. +
  3. The definition
  4. +
  5. Using catalogs
  6. +
  7. Some examples
  8. +
  9. How to tune catalog usage
  10. +
  11. How to debug catalog processing
  12. +
  13. How to create and maintain catalogs
  14. +
  15. The implementor corner quick review of the + API
  16. +
  17. Other resources
  18. +

General overview

What is a catalog? Basically it's a lookup mechanism used when an entity +(a file or a remote resource) references another entity. The catalog lookup +is inserted between the moment the reference is recognized by the software +(XML parser, stylesheet processing, or even images referenced for inclusion +in a rendering) and the time where loading that resource is actually +started.

It is basically used for 3 things:

    +
  • mapping from "logical" names, the public identifiers and a more + concrete name usable for download (and URI). For example it can associate + the logical name +

    "-//OASIS//DTD DocBook XML V4.1.2//EN"

    +

    of the DocBook 4.1.2 XML DTD with the actual URL where it can be + downloaded

    +

    http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd

    +
  • +
  • remapping from a given URL to another one, like an HTTP indirection + saying that +

    "http://www.oasis-open.org/committes/tr.xsl"

    +

    should really be looked at

    +

    "http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"

    +
  • +
  • providing a local cache mechanism allowing to load the entities + associated to public identifiers or remote resources, this is a really + important feature for any significant deployment of XML or SGML since it + allows to avoid the aleas and delays associated to fetching remote + resources.
  • +

The definitions

Libxml, as of 2.4.3 implements 2 kind of catalogs:

    +
  • the older SGML catalogs, the official spec is SGML Open Technical + Resolution TR9401:1997, but is better understood by reading the SP Catalog page from + James Clark. This is relatively old and not the preferred mode of + operation of libxml.
  • +
  • XML + Catalogs is far more flexible, more recent, uses an XML syntax and + should scale quite better. This is the default option of libxml.
  • +

Using catalog

In a normal environment libxml2 will by default check the presence of a +catalog in /etc/xml/catalog, and assuming it has been correctly populated, +the processing is completely transparent to the document user. To take a +concrete example, suppose you are authoring a DocBook document, this one +starts with the following DOCTYPE definition:

<?xml version='1.0'?>
+<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
+          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">

When validating the document with libxml, the catalog will be +automatically consulted to lookup the public identifier "-//Norman Walsh//DTD +DocBk XML V3.1.4//EN" and the system identifier +"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd", and if these entities have +been installed on your system and the catalogs actually point to them, libxml +will fetch them from the local disk.

Note: Really don't use this +DOCTYPE example it's a really old version, but is fine as an example.

Libxml2 will check the catalog each time that it is requested to load an +entity, this includes DTD, external parsed entities, stylesheets, etc ... If +your system is correctly configured all the authoring phase and processing +should use only local files, even if your document stays portable because it +uses the canonical public and system ID, referencing the remote document.

Some examples:

Here is a couple of fragments from XML Catalogs used in libxml2 early +regression tests in test/catalogs :

<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC 
+   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
+  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
+...

This is the beginning of a catalog for DocBook 4.1.2, XML Catalogs are +written in XML, there is a specific namespace for catalog elements +"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this +catalog is a public mapping it allows to associate a Public +Identifier with an URI.

...
+    <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+                   rewritePrefix="file:///usr/share/xml/docbook/"/>
+...

A rewriteSystem is a very powerful instruction, it says that +any URI starting with a given prefix should be looked at another URI +constructed by replacing the prefix with an new one. In effect this acts like +a cache system for a full area of the Web. In practice it is extremely useful +with a file prefix if you have installed a copy of those resources on your +local system.

...
+<delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
+                catalog="file:///usr/share/xml/docbook.xml"/>
+<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
+                catalog="file:///usr/share/xml/docbook.xml"/>
+<delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
+                catalog="file:///usr/share/xml/docbook.xml"/>
+<delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+                catalog="file:///usr/share/xml/docbook.xml"/>
+<delegateURI uriStartString="http://www.oasis-open.org/docbook/"
+                catalog="file:///usr/share/xml/docbook.xml"/>
+...

Delegation is the core features which allows to build a tree of catalogs, +easier to maintain than a single catalog, based on Public Identifier, System +Identifier or URI prefixes it instructs the catalog software to look up +entries in another resource. This feature allow to build hierarchies of +catalogs, the set of entries presented should be sufficient to redirect the +resolution of all DocBook references to the specific catalog in +/usr/share/xml/docbook.xml this one in turn could delegate all +references for DocBook 4.2.1 to a specific catalog installed at the same time +as the DocBook resources on the local machine.

How to tune catalog usage:

The user can change the default catalog behaviour by redirecting queries +to its own set of catalogs, this can be done by setting the +XML_CATALOG_FILES environment variable to a list of catalogs, an +empty one should deactivate loading the default /etc/xml/catalog +default catalog

How to debug catalog processing:

Setting up the XML_DEBUG_CATALOG environment variable will +make libxml2 output debugging information for each catalog operations, for +example:

orchis:~/XML -> xmllint --memory --noout test/ent2
+warning: failed to load external entity "title.xml"
+orchis:~/XML -> export XML_DEBUG_CATALOG=
+orchis:~/XML -> xmllint --memory --noout test/ent2
+Failed to parse catalog /etc/xml/catalog
+Failed to parse catalog /etc/xml/catalog
+warning: failed to load external entity "title.xml"
+Catalogs cleanup
+orchis:~/XML -> 

The test/ent2 references an entity, running the parser from memory makes +the base URI unavailable and the the "title.xml" entity cannot be loaded. +Setting up the debug environment variable allows to detect that an attempt is +made to load the /etc/xml/catalog but since it's not present the +resolution fails.

But the most advanced way to debug XML catalog processing is to use the +xmlcatalog command shipped with libxml2, it allows to load +catalogs and make resolution queries to see what is going on. This is also +used for the regression tests:

orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
+                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+orchis:~/XML -> 

For debugging what is going on, adding one -v flags increase the verbosity +level to indicate the processing done (adding a second flag also indicate +what elements are recognized at parsing):

orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
+                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
+Parsing catalog test/catalogs/docbook.xml's content
+Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+Catalogs cleanup
+orchis:~/XML -> 

A shell interface is also available to debug and process multiple queries +(and for regression tests):

orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
+                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
+> help   
+Commands available:
+public PublicID: make a PUBLIC identifier lookup
+system SystemID: make a SYSTEM identifier lookup
+resolve PublicID SystemID: do a full resolver lookup
+add 'type' 'orig' 'replace' : add an entry
+del 'values' : remove values
+dump: print the current catalog state
+debug: increase the verbosity level
+quiet: decrease the verbosity level
+exit:  quit the shell
+> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+> quit
+orchis:~/XML -> 

This should be sufficient for most debugging purpose, this was actually +used heavily to debug the XML Catalog implementation itself.

How to create and maintain catalogs:

Basically XML Catalogs are XML files, you can either use XML tools to +manage them or use xmlcatalog for this. The basic step is +to create a catalog the -create option provide this facility:

orchis:~/XML -> ./xmlcatalog --create tst.xml
+<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
+orchis:~/XML -> 

By default xmlcatalog does not overwrite the original catalog and save the +result on the standard output, this can be overridden using the -noout +option. The -add command allows to add entries in the +catalog:

orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \
+  "-//OASIS//DTD DocBook XML V4.1.2//EN" \
+  http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
+orchis:~/XML -> cat tst.xml
+<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" \
+  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
+<public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
+</catalog>
+orchis:~/XML -> 

The -add option will always take 3 parameters even if some of +the XML Catalog constructs (like nextCatalog) will have only a single +argument, just pass a third empty string, it will be ignored.

Similarly the -del option remove matching entries from the +catalog:

orchis:~/XML -> ./xmlcatalog --del \
+  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
+<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+    "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
+orchis:~/XML -> 

The catalog is now empty. Note that the matching of -del is +exact and would have worked in a similar fashion with the Public ID +string.

This is rudimentary but should be sufficient to manage a not too complex +catalog tree of resources.

The implementor corner quick review of the +API:

First, and like for every other module of libxml, there is an +automatically generated API page for +catalog support.

The header for the catalog interfaces should be included as:

#include <libxml/catalog.h>

The API is voluntarily kept very simple. First it is not obvious that +applications really need access to it since it is the default behaviour of +libxml2 (Note: it is possible to completely override libxml2 default catalog +by using xmlSetExternalEntityLoader to +plug an application specific resolver).

Basically libxml2 support 2 catalog lists:

    +
  • the default one, global shared by all the application
  • +
  • a per-document catalog, this one is built if the document uses the + oasis-xml-catalog PIs to specify its own catalog list, it is + associated to the parser context and destroyed when the parsing context + is destroyed.
  • +

the document one will be used first if it exists.

Initialization routines:

xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be +used at startup to initialize the catalog, if the catalog should be +initialized with specific values xmlLoadCatalog() or xmlLoadCatalogs() +should be called before xmlInitializeCatalog() which would otherwise do a +default initialization first.

The xmlCatalogAddLocal() call is used by the parser to grow the document +own catalog list if needed.

Preferences setup:

The XML Catalog spec requires the possibility to select default +preferences between public and system delegation, +xmlCatalogSetDefaultPrefer() allows this, xmlCatalogSetDefaults() and +xmlCatalogGetDefaults() allow to control if XML Catalogs resolution should +be forbidden, allowed for global catalog, for document catalog or both, the +default is to allow both.

And of course xmlCatalogSetDebug() allows to generate debug messages +(through the xmlGenericError() mechanism).

Querying routines:

xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic() +and xmlCatalogResolveURI() are relatively explicit if you read the XML +Catalog specification they correspond to section 7 algorithms, they should +also work if you have loaded an SGML catalog with a simplified semantic.

xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but +operate on the document catalog list

Cleanup and Miscellaneous:

xmlCatalogCleanup() free-up the global catalog, xmlCatalogFreeLocal() is +the per-document equivalent.

xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the +first catalog in the global list, and xmlCatalogDump() allows to dump a +catalog state, those routines are primarily designed for xmlcatalog, I'm not +sure that exposing more complex interfaces (like navigation ones) would be +really useful.

The xmlParseCatalogFile() is a function used to load XML Catalog files, +it's similar as xmlParseFile() except it bypass all catalog lookups, it's +provided because this functionality may be useful for client tools.

threaded environments:

Since the catalog tree is built progressively, some care has been taken to +try to avoid troubles in multithreaded environments. The code is now thread +safe assuming that the libxml2 library has been compiled with threads +support.

Other resources

The XML Catalog specification is relatively recent so there isn't much +literature to point at:

    +
  • You can find a good rant from Norm Walsh about the + need for catalogs, it provides a lot of context information even if + I don't agree with everything presented. Norm also wrote a more recent + article XML + entities and URI resolvers describing them.
  • +
  • An old XML + catalog proposal from John Cowan
  • +
  • The Resource Directory Description + Language (RDDL) another catalog system but more oriented toward + providing metadata for XML namespaces.
  • +
  • the page from the OASIS Technical Committee on Entity + Resolution who maintains XML Catalog, you will find pointers to the + specification update, some background and pointers to others tools + providing XML Catalog support
  • +
  • There is a shell script to generate + XML Catalogs for DocBook 4.1.2 . If it can write to the /etc/xml/ + directory, it will set-up /etc/xml/catalog and /etc/xml/docbook based on + the resources found on the system. Otherwise it will just create + ~/xmlcatalog and ~/dbkxmlcatalog and doing: +

    export XML_CATALOG_FILES=$HOME/xmlcatalog

    +

    should allow to process DocBook documentations without requiring + network accesses for the DTD or stylesheets

    +
  • +
  • I have uploaded a + small tarball containing XML Catalogs for DocBook 4.1.2 which seems + to work fine for me too
  • +
  • The xmlcatalog + manual page
  • +

If you have suggestions for corrections or additions, simply contact +me:

Daniel Veillard

-- cgit v1.2.3