It is common practice, when including inline scripts within HTML markup, to surround the entire script with what appears to be an HTML comment.
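A minimal sketch of the pattern, assuming a hypothetical one-line script (the `alert` call is only a placeholder):

```html
<script type="text/javascript"><!--
    // Placeholder script body; any inline script could appear here.
    alert("Hello, world!");
//--></script>
```

The closing `-->` is prefixed with `//` so that the scripting engine reads that line as a script comment rather than as code.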
This technique is discussed in HTML 4.01, section 18.3.2, Hiding script data from user agents. However, few people really understand its purpose, what it really is, the problems it creates within XHTML documents, or the theoretical problems it creates for HTML.
In this article, the term legacy user agents, legacy UAs (or
equivalent) is used to refer only to user agents implemented before the
script and style elements were introduced as placeholders in HTML 3.2.
Note: Although this document focuses primarily on the script element,
the concepts presented apply equally to the
style element.
Purpose of the Comment
The purpose of the comment is to allow documents to degrade gracefully in legacy user agents by preventing them from outputting the script as content for the user to read. Since HTML user agents output the content of any unknown elements and because it would be unwise to do so for a script, the use of the comment is designed to ensure that this does not occur within legacy user agents.
It should be noted that there are no user agents in use today that don’t support the
script element (regardless of whether they support the actual
script or not), so using this technique on the web today seems rather superfluous.
Yet it’s interesting to note that so many sites still make use of this old technique
that was designed for accessibility reasons, despite the fact that so many
of these sites choke in many other ways in browsers with scripts disabled or unsupported.
It’s Not Really a Comment
The content model of the
script element in HTML is declared as CDATA, which
stands for character data and means that the content within the element is not
processed as markup, but as plain text. The only piece of markup which is recognised
is the end-tag open (ETAGO) delimiter: </. Where the
ETAGO occurs, it must
be for the element’s end-tag (
</script> in this case). This is actually
the cause of a really common
validation error in scripts for people who write markup from within a script,
for example with the document.write() function.
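As an illustration of that validation error, here is a sketch assuming a hypothetical script that writes a paragraph into the page. In the first script, the `</` inside the string is an ETAGO that does not begin `</script>`, which is what validators report. The second script uses the commonly recommended workaround of escaping the slash, which the scripting engine reads as an ordinary `/`:

```html
<!-- Invalid in HTML 4.01: "</p>" contains an ETAGO that is not the script's end-tag -->
<script type="text/javascript">
    document.write("<p>Hello</p>");
</script>

<!-- Valid: "<\/p>" contains no ETAGO, and the script still writes "</p>" -->
<script type="text/javascript">
    document.write("<p>Hello<\/p>");
</script>
```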
Because no other markup is recognised, the comment declaration is not really
a comment, but rather plain text that looks like a comment. It’s designed for
legacy UAs that don’t read the DTD and are, therefore, unaware that the script
element actually contains
CDATA. Because it is designed for backwards compatibility
with legacy UAs processing it as markup, there are certain considerations that
should be made as a result of this.
There are two small theoretical problems of which most people are unaware but, since no browser has ever been a strictly conforming SGML parser, neither is of any practical concern. As you will see, it is in fact the bugs in the legacy UAs for which this technique is designed that ensure it always works as intended.
Legacy browsers encountering the unknown
script element may or may
not know the content model, depending on whether or not they’ve read the DTD
or obtained the information from elsewhere.
In the case of the unknown content model, the parser would treat the element’s content as markup and hide the comment as expected, in most cases. However, there may be a problem caused by the presence of two hyphens within the script that is contained within the comment. Consider, for example, a script that uses the decrement operator (--) inside the comment.
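A minimal sketch of such a script, assuming a hypothetical countdown loop:

```html
<script type="text/javascript"><!--
    // Hypothetical countdown; the decrement operator below puts two hyphens inside the pseudo-comment.
    for (var i = 10; i > 0; i--) {
        // do something with i
    }
//--></script>
```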
Although that is perfectly valid HTML 4.01, given that it is not really a comment,
this creates a problem for a legacy user agent that processes it as a comment.
A comment delimiter in SGML is a pair of hyphens (
--) and only white
space may occur between the second comment delimiter and the markup declaration
close delimiter (>). See the WDG’s
explanation of HTML comments for more information.
For an SGML parser treating the element’s content as markup, the invalid comment syntax may potentially cause a problem and result in part of the script being output. As we will see later, this example will actually cause a fatal error in XHTML documents.
In the case of the known content model, since the parser is aware that the
content model is
CDATA, yet the
script element is still an unknown element for
the UA, it would process it as such and actually end up outputting the entire
content of the element as text, thus defeating the purpose of attempting to
hide the script.
For backwards compatibility, this technique depends upon legacy UAs processing
the element’s content as markup in order to hide it but, for the above reasons,
it is not backwards compatible in all cases with any hypothetical
legacy user agent using a strictly conforming SGML parser.
However, there are no browsers that do read the DTD and, due to the bugs in
all real legacy browsers, neither of these issues has ever caused any real
world problems in HTML. Nor will they cause any problems in the future,
since all future implementations will support the script element.
This technique does in fact cause real problems for XHTML documents that many authors are unaware of. It seems that Microsoft have fallen into this trap with their new Visual Web Developer 2005 Express application, as discussed by Charl van Niekerk in ASP.NET 2.0 – Part 2. It also seems that the Movable Type developers have made this mistake too, as Jacques Distler pointed out recently.
In XHTML, the content model of the
script element is declared as #PCDATA (parsed
character data), not
CDATA, thus the content of the
script element is supposed
to be parsed as markup and the comment declaration really is a comment. Because
of this, XHTML UAs will (when the document is served as XML) ignore the content
of the comment, and thus ignore the script entirely.
However, because the HTML
script element contains
CDATA and user agents treat
XHTML documents served as
text/html as HTML, they also treat the content of the
script element as
CDATA, and thus the comment is not treated as a comment
by HTML 4 UAs. This is one of the many problems with serving XHTML as text/html.
As I mentioned earlier, example 2, which contains the extra pair of hyphens, will actually cause a fatal error in XHTML. In SGML, if it were a real comment, it would also be invalid, yet it would not be fatal since UAs employ error handling techniques to continue processing the document. In XML, however, it is a well-formedness error, which is fatal. Thus not only would the script be ignored because of the comment, but the entire document would be rendered totally useless for the user.
However, because most authors do incorrectly serve their XHTML documents as
text/html, and the UAs parse it as HTML, authors are generally not aware of
these issues. There are many other problems with using scripts for both HTML
and XHTML, but those issues are out of scope for this article and best left
for another day.
The Correct Method for XHTML
The correct way to use an inline script within XHTML is to escape it as character
data using a CDATA section. The
CDATA section is necessary to ensure that special characters such as < and & are
not treated as markup, which would otherwise result in a well-formedness error
if they were not escaped as character references. The alternative is to encode
such characters with character references like &lt; and &amp;; however,
the readability of the script would be reduced and the result is not backwards compatible
with HTML UAs when XHTML is incorrectly served as text/html.
If your script doesn’t make use of either of those special characters, then the
CDATA section is not necessary, but it’s a good habit to always
include it anyway.
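A minimal sketch of an inline XHTML script escaped with a CDATA section, assuming a hypothetical comparison that happens to use both special characters:

```html
<script type="text/javascript"><![CDATA[
    // Both "<" and "&&" would be well-formedness errors outside a CDATA section.
    var a = 1, b = 2;
    if (a < b && b > 0) {
        alert("a is less than b");
    }
]]></script>
```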
Ignoring the fact that XHTML should not be served as
text/html and accepting
that it does happen in the real world, for an HTML 4 UA processing the above
script element, the CDATA section delimiters will cause a script
error since the only markup-like syntax that scripting engines allow is an SGML
comment declaration at the beginning of the first line. In order to allow the
script to be correctly recognised and processed by current HTML and XHTML UAs,
while still hiding it from legacy UAs, a clever combination of comments and a CDATA
section needs to be used.
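One widely circulated form of that combination is sketched below; the script body is a hypothetical placeholder:

```html
<script type="text/javascript"><!--//--><![CDATA[//><!--
    // Placeholder script body.
    var a = 1, b = 2;
    if (a < b && b > 0) {
        alert("a is less than b");
    }
//--><!]]></script>
```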
An HTML UA which correctly treats the
script element as CDATA is
supposed to ignore everything following the comment declaration open delimiter
on the first line, i.e. everything after the first “
<!--” should be
ignored. An XHTML UA will treat it as a comment followed by a CDATA section
to escape the entire script.
In summary, an HTML 4 UA will pass the entire content of the script element
to the scripting engine, which ignores the first line
of markup, whereas an XHTML UA will only pass everything between <![CDATA[ and
]]> (not inclusive). The end result is that both HTML and XHTML UAs treat
the script as
CDATA, as intended, with neither of them ignoring any of the actual
script, while providing some level of backwards compatibility with legacy user agents.
This does, however, open up a whole new theoretical problem for any hypothetical
legacy UA that is unaware of the
script element and processes it
as markup with a conforming SGML parser. When example 4 is
processed as markup within SGML, it should be parsed identically to XML.
So the unknown
script element in a
legacy user agent with a strictly conforming SGML parser would output the
content of the script (within the
CDATA section) as text. However,
the only user agent I know of that supports the
CDATA section for HTML
documents is the new Opera 8. Thus, the above syntax is really only backwards
compatible with the non-conformant parsing behaviour of real world legacy
HTML UAs, which does not cause any practical problems.
Avoiding All of These Problems
To avoid all of these problems with scripts, and the decision of whether or not to include the pseudo-comment declaration in HTML documents at all, the best solution is to always include scripts as external files. This has the advantage of not being unintentionally ignored by XHTML UAs, and not being erroneously processed in any way by legacy HTML/SGML UAs, regardless of whether they are conforming or not. It also helps to better separate the markup from the script, and makes the script more easily reusable in multiple documents.
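For instance, assuming the script has been saved to a hypothetical file named `script.js` alongside the document, the same element can be used unchanged in both HTML and XHTML:

```html
<!-- script.js is a hypothetical file name; the explicit end-tag is kept for text/html compatibility -->
<script type="text/javascript" src="script.js"></script>
```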
However, you must remember that simply solving all of these markup issues for your scripts in an XHTML document doesn’t necessarily mean that the script will work correctly for both XHTML and HTML when served with the correct MIME types. There are more issues, such as document.write() not working for XML, the need to use the namespace-aware DOM methods in XHTML, and many more related issues.
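As a rough sketch of that last point, a script that would otherwise have called document.write() can build the same content with namespace-aware DOM methods. The `output` id and the paragraph text are hypothetical, and the script deliberately avoids the two special characters so that it needs no escaping in either HTML or XHTML:

```html
<div id="output"></div>
<script type="text/javascript">
    // The XHTML namespace URI, passed to createElementNS.
    var XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Build a paragraph with DOM methods rather than document.write().
    var p = document.createElementNS(XHTML_NS, "p");
    p.appendChild(document.createTextNode("Hello"));
    // "output" is a hypothetical container element declared above.
    document.getElementById("output").appendChild(p);
</script>
```

Note that createElementNS is not available in every older browser, so a script along these lines may still need a fallback when the same page is served as text/html.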