From 35a201cc8ef0c3f5b2df88d2e528aabee1048348 Mon Sep 17 00:00:00 2001
From: Wojtek Kosior
Date: Fri, 30 Apr 2021 18:47:09 +0200
Subject: Initial/Final commit
---
 libxml2-2.9.10/doc/entities.html | 64 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 libxml2-2.9.10/doc/entities.html

diff --git a/libxml2-2.9.10/doc/entities.html b/libxml2-2.9.10/doc/entities.html
new file mode 100644
index 0000000..e4433f2
--- /dev/null
+++ b/libxml2-2.9.10/doc/entities.html
@@ -0,0 +1,64 @@

The XML C parser and toolkit of Gnome

Entities or no entities


Entities in principle are similar to simple C macros. An entity defines an abbreviation for a given string that you can reuse many times throughout the content of your document. Entities are especially useful when a given string occurs frequently within a document, or when you want to confine a future change to one restricted area: the internal subset at the beginning of the document. Example:

1 <?xml version="1.0"?>
2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 <!ENTITY xml "Extensible Markup Language">
4 ]>
5 <EXAMPLE>
6    &xml;
7 </EXAMPLE>

Line 3 declares the xml entity. Line 6 uses it by prefixing its name with '&' and following it with ';', without any added spaces. libxml2 recognizes five predefined entities that let you escape characters with a special meaning in parts of the XML document content: &lt; for the character '<', &gt; for the character '>', &apos; for the character "'", &quot; for the character '"', and &amp; for the character '&'.
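As a rough illustration of the five predefined entities, here is a minimal hand-written escaper. The function name `escape_predefined` and its buffer-based interface are hypothetical and not part of libxml2; the library does its own escaping when it saves a document, so this sketch only shows the character-to-entity mapping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT a libxml2 function): copy `in` to `out`,
 * replacing each character that has a predefined entity by that entity.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if `out` is too small. */
size_t escape_predefined(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return (size_t)-1;

    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };   /* fallback: the character itself */
        const char *rep;

        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   rep = single;   break;
        }

        size_t len = strlen(rep);
        if (used + len + 1 > out_size)    /* keep room for the NUL */
            return (size_t)-1;
        memcpy(out + used, rep, len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

Note that '&' has to be escaped like any other special character; otherwise the escaped output could not be told apart from text that already contained entity references.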

One of the problems related to entities is that you may want the parser to substitute an entity's content so that you can see the replacement text in your application. Or you may prefer to keep entity references as such in the content, so that you can save the document back without losing this usually precious information (a user who went through the pain of explicitly defining entities may have a rather negative attitude if you blindly substitute them at save time). By default libxml2 does not substitute entities; the xmlSubstituteEntitiesDefault() function allows you to check and change this behaviour.

Here is the DOM tree built by libxml2 for the previous document in the default case:

/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=

And here is the result when substituting entities:

/gnome/src/gnome-xml -> ./tester --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=     Extensible Markup Language

So, entities or no entities? Basically, it depends on your use case. I suggest that you keep the non-substituting default behaviour and avoid using entities in your XML document or data if you are not willing to handle the entity reference nodes in the DOM tree.

Note that at save time libxml2 enforces the conversion of the predefined entities where necessary to prevent well-formedness problems. On input, it transparently replaces them with the corresponding characters (i.e. it will not generate entity reference nodes in the DOM tree or call the reference() SAX callback when finding them in the input).

WARNING: handling entities on top of the libxml2 SAX interface is difficult!!! If you plan to use non-predefined entities in your documents, then the learning curve to handle them using the SAX API may be long. If you plan to use complex documents, I strongly suggest you consider using the DOM interface instead and let libxml deal with the complexity rather than trying to do it yourself.

Daniel Veillard

-- cgit v1.2.3