XML Syntax Rules
The body of an XML document is a tree of element nodes with a single root.
Each element node is a tagged structure of Unicode characters.
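A minimal sketch of the single-root rule, assuming Python's standard-library xml.etree.ElementTree parser. The parser accepts one root with any number of children but rejects a second top-level element. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if ElementTree can parse text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Any well-formedness error lands here, including a second
        # top-level element after the root has been closed.
        return False
    return True


print(is_well_formed("<library><book/><book/></library>"))  # True: one root
print(is_well_formed("<book/><book/>"))                      # False: two roots
```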
-
Element syntax is:
<tagName *[attributeName="value"]> element body </tagName>
where *[...] stands for zero or more attributeName="value" pairs.
TagNames are user defined (with two exceptions we will discuss later),
as are attributes.
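The element syntax can be seen from both sides with Python's xml.etree.ElementTree, used here as an illustrative parser: parsing exposes the tagName, the attributes, and the body text, and building the same element and serializing it reproduces the markup. The book element and its id and lang attributes are made-up examples.

```python
import xml.etree.ElementTree as ET

# A hypothetical element following the pattern
# <tagName attributeName="value" ...> element body </tagName>
source = '<book id="42" lang="en">XML Basics</book>'

book = ET.fromstring(source)
print(book.tag)     # book
print(book.attrib)  # {'id': '42', 'lang': 'en'}
print(book.text)    # XML Basics

# Build the same element programmatically and serialize it back to text.
built = ET.Element("book", {"id": "42", "lang": "en"})
built.text = "XML Basics"
print(ET.tostring(built, encoding="unicode"))
```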
-
Unlike HTML, tagNames and attributes are case sensitive.
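Case sensitivity can be checked directly against a parser. This sketch assumes Python's xml.etree.ElementTree; the Note and item elements are illustrative only. A start tag and an end tag that differ only in case do not match, and attribute names that differ only in case are separate attributes.

```python
import xml.etree.ElementTree as ET

# Matching start and end tags parse normally.
note = ET.fromstring("<Note>hi</Note>")
print(note.tag)  # Note

# A start tag and end tag that differ only in case do not match.
try:
    ET.fromstring("<Note>hi</note>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)  # True

# Attribute names that differ only in case are distinct attributes.
item = ET.fromstring('<item Size="L" size="XL"/>')
print(item.attrib)  # {'Size': 'L', 'size': 'XL'}
```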
-
XML names are composed of Unicode characters.
-
TagNames must begin with a letter or underscore.
-
The remaining characters of a tagName may be letters, underscores, digits,
hyphens, and periods.
-
Names may not start with the letters xml in any combination of upper and
lower case (xml, XML, Xml, and so on). These names are reserved for use by
the W3C, as in the document header at the top of this page's source file.
-
Attributes describe properties of the element itself, as opposed to the
data it contains, which is carried in the element body. Attribute names
follow the same rules as tagNames and must also be unique within the tag
in which they appear. That is, an attribute name may not appear more than
once in a tag.
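The uniqueness rule is enforced by the parser itself. A short sketch, assuming Python's xml.etree.ElementTree, whose ParseError carries a position attribute giving the (line, column) where parsing stopped; the point element is a made-up example.

```python
import xml.etree.ElementTree as ET

ok = ET.fromstring('<point x="1" y="2"/>')
print(ok.attrib)  # {'x': '1', 'y': '2'}

try:
    ET.fromstring('<point x="1" x="2"/>')  # attribute x appears twice
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("rejected at", err.position)  # (line, column) of the error

print(rejected)  # True
```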
-
Element bodies contain character data, child elements, or both.
-
Character data is any text that is not markup, i.e., the text between
tags rather than the tags themselves.
-
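The characters &, <, >, ', and " are markup delimiters. The & and <
characters may never appear literally in character data, and the others
must be escaped wherever they could be read as markup (for example, a
quote inside an attribute value delimited by the same kind of quote).
All five may be represented by the escape sequences (entity references)
predefined by XML: &amp;, &lt;, &gt;, &apos;, and &quot;, respectively.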
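Python's standard library can produce these escape sequences. In this sketch, xml.sax.saxutils.escape always replaces &, <, and >, and the optional mapping adds the two quote characters; parsing the result with xml.etree.ElementTree turns the entity references back into the original characters. The sample string is illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b && c > d: print("done")'

# escape() replaces &, < and >; the extra mapping handles the quotes.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)
# if a &lt; b &amp;&amp; c &gt; d: print(&quot;done&quot;)

# The parser converts the entity references back to the original text.
elem = ET.fromstring("<code>" + escaped + "</code>")
print(elem.text == raw)  # True
```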
-
CDATA sections are a way to pass data that contains markup delimiters
without using escape sequences. The XML parser does not interpret the
characters in a CDATA section, but simply passes them along to the
application. The syntax for a CDATA section is:
<![CDATA[...]]>
Note that CDATA sections will not work for passing binary data to an
application, because the binary data may contain the byte sequence the
parser reads as "]]>". That would terminate the CDATA section before the
binary data was completely digested. Binary data may also contain bytes,
such as NUL, that are not legal XML characters at all.
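A minimal sketch of wrapping text in a CDATA section, assuming Python's xml.etree.ElementTree as the parser. The helper to_cdata is hypothetical; it refuses any text containing "]]>", since that sequence would end the section early. Text without that sequence reaches the application unchanged, markup delimiters included.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    Text containing "]]>" is refused, because that sequence would end the
    section early and the remainder would be parsed as ordinary content.
    """
    if "]]>" in text:
        raise ValueError('text contains "]]>" and cannot go in one CDATA section')
    return "<![CDATA[" + text + "]]>"


snippet = "if (a < b && c > d) { run(); }"
elem = ET.fromstring("<script>" + to_cdata(snippet) + "</script>")
print(elem.text == snippet)  # True: delivered unchanged, no escapes needed
```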
To pass binary data, you must convert it to a character representation.
Converting to hexadecimal is easy, but doubles the memory required to
hold the data. The Simple Object Access Protocol (SOAP) uses base64
encoding instead, which expands binary data by about a third (four
characters for every three bytes).
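The size difference between the two character representations is easy to measure. This sketch uses Python's bytes.hex() and the standard base64 module on 3000 bytes of sample data covering every byte value, including bytes such as 0x00 that could never appear in XML character data.

```python
import base64

# 3000 bytes covering every byte value, including ones that are not legal
# XML characters (such as 0x00).
data = bytes(i % 256 for i in range(3000))

hex_text = data.hex()                              # 2 characters per byte
b64_text = base64.b64encode(data).decode("ascii")  # 4 characters per 3 bytes

print(len(hex_text))  # 6000
print(len(b64_text))  # 4000

# Both encodings round-trip back to the original bytes.
assert bytes.fromhex(hex_text) == data
assert base64.b64decode(b64_text) == data
```

Both outputs use only letters, digits, and a few punctuation characters, so either can be placed in an element body without escaping.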
-
A well-formed XML document has:
-
An optional prolog:
-
Starts with the line:
<?xml version="1.0"?>
This identifies the file contents as belonging to an XML document.
-
Processing instructions (more on this later).
-
A reference to a DTD or schema, used for validation.
Stand-alone documents should include the first item. The other two
are optional. XML data islands, used as part of a larger
document (perhaps HTML), will not use the first item either.
-
A body with a single root node. The body of this root node may
contain one or more elements, as may the bodies of any descendant
node.
-
An optional epilogue consisting of comments and processing
instructions.
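A small document with all three parts (prolog, single-root body, epilogue) can be run through Python's xml.etree.ElementTree as a sketch. The app-hint processing instructions, the notes elements, and the bare DOCTYPE are made-up examples; ElementTree does not validate, so the DOCTYPE here is only parsed. The parser checks the whole document for well-formedness but returns only the root element, so the prolog and epilogue are not kept in the resulting tree.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<?app-hint mode="compact"?>
<!DOCTYPE notes>
<notes>
  <note id="1">First</note>
  <note id="2">Second</note>
</notes>
<!-- epilogue: only comments and processing instructions may follow -->
<?app-hint done?>
"""

root = ET.fromstring(document)
print(root.tag)                                  # notes
print([n.get("id") for n in root.iter("note")])  # ['1', '2']
```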