XML Prologue

One thing I come across frequently is incorrect terminology. I’ve written about this topic once before (see HTML Tags) and others have discussed similar topics as well, particularly relating to elements, attributes and tags. But a more specific area that deserves a little more attention is the distinction between the DOCTYPE, the XML declaration and the XML prolog and other things within it.

The XML Prolog is the section at the beginning of an XML document which includes everything that appears before the document’s root element. The XML declaration, the DOCTYPE and any processing instructions or comments may all be a part of it. The following figure illustrates this concept.

The diagram highlights the XML Prolog at the beginning of a sample XHTML 1.0 document containing the XML declaration, a processing instruction, a comment and the DOCTYPE.

The XML Prolog is present in every XML document, though it may be empty, because each of those components is optional in some circumstances.
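As a rough illustration of what belongs to the prolog, here is a small Python sketch that uses the standard library's expat bindings to record everything the parser reports before the root element's start tag. The function name `prolog_items` and the `SAMPLE` document are made up for this example; the sample is only modelled on the XHTML 1.0 document described above.

```python
import xml.parsers.expat

# Hypothetical sample, modelled on the XHTML 1.0 document described above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/style/design"?>
<!-- This is a comment -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><!-- not part of the prolog --></body>
</html>
"""


def prolog_items(data):
    """Return (kind, detail) pairs for everything reported before the root element."""
    items = []
    root_seen = False

    def xml_decl(version, encoding, standalone):
        items.append(("xml-declaration", version))

    def doctype(name, system_id, public_id, has_internal_subset):
        items.append(("doctype", name))

    def pi(target, pi_data):
        # PIs may also occur later in the document; only keep the early ones.
        if not root_seen:
            items.append(("processing-instruction", target))

    def comment(text):
        # Same for comments: only those before the root element count.
        if not root_seen:
            items.append(("comment", text.strip()))

    def start_element(name, attrs):
        nonlocal root_seen
        root_seen = True

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = xml_decl
    parser.StartDoctypeDeclHandler = doctype
    parser.ProcessingInstructionHandler = pi
    parser.CommentHandler = comment
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return items
```

Calling `prolog_items(b"<root/>")` returns an empty list: the prolog is still there, it just has nothing in it.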

The XML Declaration

<?xml version="1.0" encoding="UTF-8"?>

The XML declaration, if present, must occur at the very beginning of the file. It may not be preceded by anything except a possible Byte Order Mark (depending on the character encoding). It is mostly used to provide XML version information and to declare the character encoding of the document. It can also carry a standalone document declaration, but since that is rarely needed or used and its purpose is not easy to explain, you can safely ignore it.
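The position rule is easy to check with a parser. The sketch below uses Python's standard `xml.etree.ElementTree` and a helper I have called `is_well_formed` (my own name, not part of any library) to try the declaration at the start of the file, after a UTF-8 Byte Order Mark, after a single space and after a comment.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if the standard-library parser accepts the bytes as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
UTF8_BOM = b"\xef\xbb\xbf"

at_start = DECL + b"<root/>"                        # declaration first
after_bom = UTF8_BOM + DECL + b"<root/>"            # only a BOM before it
after_space = b" " + DECL + b"<root/>"              # whitespace before it
after_comment = b"<!-- hi -->" + DECL + b"<root/>"  # a comment before it

for label, data in [
    ("at start", at_start),
    ("after BOM", after_bom),
    ("after space", after_space),
    ("after comment", after_comment),
]:
    print(label, is_well_formed(data))
```

Only the first two variants should be accepted; the parser rejects the other two because something other than a BOM precedes the declaration.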

Presently, only XML 1.0 and XML 1.1 are defined. Either may be used, but the decision should not be made lightly. Do not use version="1.1" just because it is the higher version number. Most authors should use version="1.0"; unless you have a specific reason that requires XML 1.1 features, stick with 1.0.

The encoding declaration, if present, must declare the encoding of the document. Authors may use any encoding supported by user agents, but are encouraged to use charsets registered with IANA (preferably UTF-8 or UTF-16). If the declaration is not present, the document must be encoded as UTF-8 or UTF-16 (unless the encoding is specified by a higher-level protocol, like HTTP).
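To see why the encoding declaration matters, consider the same ISO-8859-1 bytes with and without it. This is only a sketch using Python's standard `xml.etree.ElementTree`; the helper name `read_text` and the two sample documents are mine. No higher-level protocol is involved here, so the parser has only the bytes themselves to go on.

```python
import xml.etree.ElementTree as ET

# "café" encoded as ISO-8859-1: the é is the single byte 0xE9.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
undeclared = b"<p>caf\xe9</p>"


def read_text(data):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


print(read_text(declared))
print(read_text(undeclared))
```

With the declaration, the byte 0xE9 is decoded as é. Without it, the parser assumes UTF-8, where a lone 0xE9 is not a valid sequence, so the document is rejected.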

Processing Instructions

<?xml-stylesheet type="text/css" href="/style/design"?>

Processing instructions (PIs) are used to provide instructions to applications processing the document. The xml-stylesheet PI shown in the diagram above, for example, instructs an application to apply a stylesheet to the document.

PIs can be used almost anywhere within the document, though only those that appear before the root element are considered part of the prolog.
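The following sketch lists every PI in a document and flags whether it was seen before the root element. It uses Python's standard expat bindings; the function name `list_pis` and the `app-hint` target are hypothetical and exist only for this example.

```python
import xml.parsers.expat

# "app-hint" is an invented PI target used purely for illustration.
DOC = b"""<?xml-stylesheet type="text/css" href="/style/design"?>
<doc>
  <?app-hint inside the root element?>
</doc>
<?app-hint after the root element?>
"""


def list_pis(data):
    """Return (target, in_prolog) for every processing instruction in the document."""
    found = []
    root_seen = False

    def pi(target, pi_data):
        # A PI is in the prolog only if no element has started yet.
        found.append((target, not root_seen))

    def start_element(name, attrs):
        nonlocal root_seen
        root_seen = True

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = pi
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return found


for target, in_prolog in list_pis(DOC):
    print(target, "prolog" if in_prolog else "not prolog")
```

All three PIs are legal where they stand, but only the xml-stylesheet one is flagged as part of the prolog.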

Comments

<!-- This is a comment -->

Most people know what comments are, so there’s not much I need to say about them. However, like PIs, they’re only considered part of the prolog if they appear before the root element.

The Document Type Declaration

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Many authors will have seen and used a DOCTYPE in their documents, although there are still many who don’t. The DOCTYPE is used to reference a Document Type Definition and is mostly used for validation purposes.
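If you want to see what a DOCTYPE actually references, a DOM parser will expose its parts. The sketch below uses Python's standard `xml.dom.minidom`; the helper name `doctype_info` and the sample document are mine. minidom is a non-validating parser, so this only reads the identifiers and does not validate the document against the DTD.

```python
from xml.dom import minidom

# Hypothetical sample using the XHTML 1.0 Strict DOCTYPE shown above.
XHTML = b"""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Example</title></head><body/></html>
"""


def doctype_info(data):
    """Return (name, public_id, system_id), or None if there is no DOCTYPE."""
    doc = minidom.parseString(data)
    dt = doc.doctype
    if dt is None:
        return None
    return (dt.name, dt.publicId, dt.systemId)


print(doctype_info(XHTML))
print(doctype_info(b"<root/>"))
```

The second call returns None, since a document without a DOCTYPE is still perfectly well-formed XML.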

Many people know that using specific DOCTYPEs will trigger standards mode in browsers, but this does not apply to XML documents. DOCTYPE sniffing only applies to HTML documents (i.e. any document served as text/html). Browsers have, thankfully, not introduced it into XML processing. Henri Sivonen explains more about this in Activating the Right Layout Mode Using the Doctype Declaration.
