XML Basics:
The eXtensible Markup Language (XML) was created to define and store
complex, hierarchically structured data for exchange and storage.
The hierarchy of an XML document begins at a single root node and branches
out from this document root.
The Document Type Definition (DTD) is optional and defines the data to be
presented in an XML document. It is often used to verify the data for
completeness and adherence to rules.
XML Schema (XSD) is a newer and more complete data definition language with
definable data types. XSD competes with DTD as the format for data definition,
especially when defining complex relationships and data types.
XML parsers fall into three major categories:
- DOM: Import/parse all data into a data structure in memory for query.
  The data is held as nodes in a data tree which can be traversed.
  While this is often easier to program than SAX invocations, it uses
  more memory and runs slower.
- SAX: Parse on the fly to look for the data requested.
  This is event driven: callbacks are invoked as elements are
  encountered during parsing. The programmer writes the callbacks, so a
  custom handler is written for each document type. This is considered to
  be the fastest way to parse a file.
- XPath: (XML Path) Search data with a path expression. Very easy to use.
  Usage is similar to a query: the expression describes a path through the
  node tree (with wildcards and predicates) rather than a regular expression
  over text. A node list is returned which matches the XPath expression.
  It is usually implemented as an extension to DOM.
DTD:
Number of children:
- ? Zero or one element permitted (the element is optional).
- * Zero or more elements permitted, e.g.: <!ELEMENT name (first, middle*, last?)>
- + One or more elements permitted.
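Attributes:
Keyword         | Description
CDATA #REQUIRED | Character data. The attribute must be supplied.
CDATA #IMPLIED  | Character data. The attribute is optional.
CDATA           | Character data
PCDATA          | Parsed character data (element content, written as #PCDATA)
NMTOKEN         | A name token. No whitespace.
NMTOKENS        | One or more name tokens separated by white space
ENUMERATION     | Value restricted to a declared list of choices. Example element: <date month="January" day="27" year="2004"/>
ENTITY          | Name of an unparsed entity declared in the DTD
ENTITIES        | One or more entity names separated by white space
ID              | A unique XML name. In <!ATTLIST xml_name1 xml_name2 ID #REQUIRED> the attribute xml_name2 of element xml_name1 is required.
IDREF           | Attribute refers to an ID
IDREFS          | One or more ID references separated by white space
NOTATION        | Name of a notation declared in the DTD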
- XML names may include _-.
- When HTML text is included use &lt;, &amp;, &gt; and &quot; to represent <, &, > and " respectively.
The XML file and the DTD:
File: testLibXml2.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AppConfigData [
  <!ELEMENT AppConfigData (DisplayX+)>

  <!ELEMENT DisplayX (AlternateName*,FieldLength,TextFont?)>
  <!ATTLIST DisplayX name CDATA #REQUIRED>
  <!ATTLIST DisplayX type CDATA #REQUIRED>

  <!ELEMENT AlternateName (#PCDATA)>
  <!ATTLIST AlternateName type CDATA #REQUIRED>
  <!ELEMENT FieldLength (#PCDATA)>
  <!ELEMENT TextFont (#PCDATA)>
  <!ATTLIST TextFont type CDATA #REQUIRED>
]>
<AppConfigData>
  <DisplayX name="DisplayText_A" type="Type1">
    <AlternateName type="Type1">DisplayText_a</AlternateName>
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
  <DisplayX name="DisplayText_B" type="Type2">
    <FieldLength>30</FieldLength>
    <TextFont type="Arial"/>
  </DisplayX>
  <DisplayX name="DisplayText_C" type="Type1">
    <AlternateName type="Type1">DisplayText_c</AlternateName>
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
  <DisplayX name="DisplayText_D" type="Type2">
    <FieldLength>30</FieldLength>
    <TextFont type="Courier"/>
  </DisplayX>
</AppConfigData>
Note: The DTD is not required for use with the Gnome LibXml2 API. If using
this API to generate XML, the DTD will not be generated.
Parsing the XML file using the Gnome libXML2 API:
Prerequisite (RPM) packages: pkgconfig, libxml2-devel, gnome-libs-devel
#include <stdio.h>
#include <stdlib.h>
#include <gtk/gtk.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(int argc, char **argv)
{
    xmlNode *cur_node, *child_node;
    xmlChar *fieldLength, *alternateName;
    // xmlGetProp() returns xmlChar *, released with xmlFree()
    xmlChar *DisplayXName, *DisplayXType, *altProp, *textFont;

    // --------------------------------------------------------------------------
    // Open XML document
    // --------------------------------------------------------------------------
    xmlDocPtr doc;
    doc = xmlParseFile("testLibXml2.xml");

    if (doc == NULL)
    {
        printf("error: could not parse file testLibXml2.xml\n");
        return EXIT_FAILURE;
    }

    // --------------------------------------------------------------------------
    // XML root.
    // --------------------------------------------------------------------------
    /* Get the root element node */
    xmlNode *root = NULL;
    root = xmlDocGetRootElement(doc);

    // --------------------------------------------------------------------------
    // Must have root element, a name and the name must be "AppConfigData"
    // --------------------------------------------------------------------------
    if( !root ||
        !root->name ||
        xmlStrcmp(root->name, (const xmlChar *) "AppConfigData") )
    {
        xmlFreeDoc(doc);
        return EXIT_FAILURE;
    }

    // --------------------------------------------------------------------------
    // AppConfigData children: For each DisplayX
    // --------------------------------------------------------------------------
    for(cur_node = root->children; cur_node != NULL; cur_node = cur_node->next)
    {
        if ( cur_node->type == XML_ELEMENT_NODE &&
             !xmlStrcmp(cur_node->name, (const xmlChar *) "DisplayX" ) )
        {
            printf("Element: %s \n", cur_node->name);

            DisplayXName = xmlGetProp(cur_node, (const xmlChar *) "name");
            if(DisplayXName) printf(" name=%s\n", DisplayXName);

            DisplayXType = xmlGetProp(cur_node, (const xmlChar *) "type");
            if(DisplayXType) printf(" type=%s\n", DisplayXType);

            // For each child of DisplayX: i.e. AlternateName, FieldLength, TextFont
            for(child_node = cur_node->children; child_node != NULL; child_node = child_node->next)
            {
                // Skip text nodes (whitespace between elements)
                if ( child_node->type == XML_ELEMENT_NODE &&
                     !xmlStrcmp(child_node->name, (const xmlChar *)"FieldLength") )
                {
                    printf(" Child=%s\n", child_node->name);
                    fieldLength = xmlNodeGetContent(child_node);
                    if(fieldLength) printf(" Length: %s\n", fieldLength);
                    xmlFree(fieldLength);
                }

                if ( child_node->type == XML_ELEMENT_NODE &&
                     !xmlStrcmp(child_node->name, (const xmlChar *)"AlternateName") )
                {
                    printf(" Child=%s\n", child_node->name);
                    alternateName = xmlNodeGetContent(child_node);
                    if(alternateName) printf(" Name: %s\n", alternateName);
                    altProp = xmlGetProp(child_node, (const xmlChar *) "type");
                    if(altProp) printf(" type=%s\n", altProp);
                    xmlFree(altProp);
                    xmlFree(alternateName);
                }

                if ( child_node->type == XML_ELEMENT_NODE &&
                     !xmlStrcmp(child_node->name, (const xmlChar *)"TextFont") )
                {
                    printf(" Child=%s\n", child_node->name);
                    textFont = xmlGetProp(child_node, (const xmlChar *) "type");
                    if(textFont) printf(" type=%s\n", textFont);
                    xmlFree(textFont);
                }
            }
            xmlFree(DisplayXType);
            xmlFree(DisplayXName);
        }
    }

    // --------------------------------------------------------------------------
    /* free the document */
    xmlFreeDoc(doc);

    /*
     * Free the global variables that may
     * have been allocated by the parser.
     */
    xmlCleanupParser();

    return 0;
}
Compile:
gcc -g -Wall `xml2-config --cflags --libs` `gnome-config --cflags --libs gnome gnomeui xml` -o testLibXml2 testLibXml2.c
[Potential Pitfall]: The order of the directory
paths referenced matters. Reference the libxml2 include path directories
before the gnome directory paths. The following will result in a
compilation error:
gcc -g -Wall `gnome-config --cflags --libs gnome gnomeui xml` `xml2-config --cflags --libs` -o testLibXml2 testLibXml2.c
This is due to different structure definitions of xmlDocPtr
(struct _xmlDoc) and xmlNodePtr (struct _xmlNode) in libxml/tree.h.
The reference to the subdirectory libxml/ should have differentiated the
two versions of the include file but that is not the case with the GNU
compiler. The proper file is /usr/include/libxml2/libxml/tree.h and not
the file /usr/include/gnome-xml/tree.h.
Components:
- LibXML: xml2-config --cflags --libs
(Reference this first.)
- Gtk: pkg-config --cflags --libs gtk+-2.0
- Gnome: gnome-config --cflags --libs gnome gnomeui xml
Results:
$ testLibXml2
Element: DisplayX
name=DisplayText_A
type=Type1
Child=AlternateName
Name: DisplayText_a
type=Type1
Child=FieldLength
Length: 30
Child=TextFont
type=Courier
Element: DisplayX
name=DisplayText_B
type=Type2
Child=FieldLength
Length: 30
Child=TextFont
type=Arial
Element: DisplayX
name=DisplayText_C
type=Type1
Child=AlternateName
Name: DisplayText_c
type=Type1
Child=FieldLength
Length: 30
Child=TextFont
type=Courier
Element: DisplayX
name=DisplayText_D
type=Type2
Child=FieldLength
Length: 30
Child=TextFont
type=Courier
Terms:
- XSL family: has various subsets to describe XML encoded data.
W3C: XSL family
- XSL: (Extensible Stylesheet Language) describes XML encoded data.
W3C: XSL
- XSLT: (XSL Transformations) maps an XML document from one form to another.
XSLT stylesheets are not procedural and often include a template to
define the output.
W3C: XSLT
- XSL-FO: (XSL Formatting Objects) define visual formatting of XML document.
XML.com: Using XSL-FO
- XPath: (XML Path Language) non-XML language used to find data (XML query) within an XML document.
e.g.
- Find root element: /*
- Find all elements: //*
W3C: XPath 1.0,
W3C: XPath 2.0
- XQuery: XML query language which includes XPath and procedural programming features.
W3C XQuery
- XPointer: addresses components of an XML document, e.g. element(el1/2/1)
Sun patent.
Links: