HTML Comments in Scripts

It is common practice, when including inline scripts within HTML markup, to surround the entire script with what appears to be an HTML comment, like this:

Example 1:

<script type="text/javascript"><!--
    ...
//--></script> 

This technique is discussed in HTML 4.01, section 18.3.2 Hiding script data from user agents. However, few people really understand its purpose, what it really is, the problems it creates within XHTML documents, or the theoretical problems it creates for HTML.

In this article, the term legacy user agents, legacy UAs (or equivalent) is used to refer only to user agents implemented before the script and style elements were introduced as place holders in HTML 3.2.

Note: Although this document focuses primarily on the script element, the concepts presented apply equally to the style element.

Purpose of the Comment

The purpose of the comment is to allow documents to degrade gracefully in legacy user agents by preventing them from outputting the script as content for the user to read. Since HTML user agents output the content of any unknown elements and because it would be unwise to do so for a script, the use of the comment is designed to ensure that this does not occur within legacy user agents.

It should be noted that there are no user agents in use today that don’t support the script element (regardless of whether they support the actual script or not), so using this technique on the web today seems rather superfluous. Yet it’s interesting to note that so many sites still make use of this old technique that was designed for accessibility reasons, despite the fact that so many of these sites choke in many other ways in browsers with scripts disabled or unsupported.

It’s Not Really a Comment

The content model of the script element in HTML is declared as CDATA, which stands for character data and means that the content within the element is not processed as markup, but as plain text. The only piece of markup which is recognised is the end-tag open (ETAGO) delimiter: </. Where the ETAGO occurs, it must be for the element’s end-tag ( </script> in this case). This is actually the cause of a really common validation error in scripts for people that use the document.write() function or the innerHTML property.
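As a rough illustration of that validation error and its usual workaround, the sketch below rewrites every </ in a piece of script source as <\/ so that no ETAGO delimiter appears before the real end-tag. The helper name escapeEtagoInSource is hypothetical, and the sketch assumes that </ only ever occurs inside string literals, where \/ means exactly the same thing as /.

```javascript
// Hypothetical helper (not part of any library): rewrites "</" as "<\/" in
// script source that is about to be placed inline in an HTML document.
// Assumption: "</" only occurs inside string literals in the source.
function escapeEtagoInSource(source) {
    // The replacement is the three characters  <  \  /
    return source.replace(/<\//g, "<\\/");
}

// Script source as an author might first write it.
const original = 'document.write("<p>Hello</p>");';

// The same source with the ETAGO delimiter broken up.
const escaped = escapeEtagoInSource(original);

// Inside a string literal the two spellings are the same string.
const sameValue = ("<\/p>" === "</p>");
```

Here escaped holds document.write("<p>Hello<\/p>"); which writes exactly the same markup at run time, while the HTML parser no longer sees </ inside the script element.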

Because no other markup is recognised, the comment declaration is not really a comment, but rather plain text that looks like a comment. It’s designed for legacy UAs that don’t read the DTD and are, therefore, unaware that the script element actually contains CDATA. Because it is designed for backwards compatibility with legacy UAs processing it as markup, there are certain considerations that should be made as a result of this.

There are two small theoretical problems which most people are unaware of but, since no browser has ever been a strictly conforming SGML parser, neither is of any practical concern. As you will see, it is in fact the bugs in the legacy UAs for which this technique is designed that ensure it always works as intended.

A legacy browser encountering the unknown script element may or may not know the content model, depending on whether or not it has read the DTD or obtained the information from elsewhere.

In the case of the unknown content model, the parser would treat the element’s content as markup and hide the comment as expected, in most cases. However, there may be a problem caused by the presence of two hyphens within the script that is contained within the comment. Consider the following example:

Example 2:

<script type="text/javascript"><!--
    var i;
    for (i = 10; i > 0; i--) {
        // do something
    }
//--></script>

Although that is perfectly valid HTML 4.01, given that it is not really a comment, it creates a problem for a legacy user agent that processes it as a comment. A comment delimiter in SGML is a pair of hyphens (--), and only white space may occur between the second comment delimiter and the markup declaration close (MDC) delimiter: >. The decrement operator in i-- is therefore read as a comment delimiter that ends the comment early. See the WDG’s explanation of HTML comments for more information.

For an SGML parser treating the element’s content as markup, the invalid comment syntax may potentially cause a problem and result in part of the script being output. As we will see later, this example will actually cause a fatal error in XHTML documents.
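One way to catch this before it bites is to scan a script body for pairs of hyphens before wrapping it in the pseudo-comment. The following is only a minimal sketch; the function name findDoubleHyphens is hypothetical, and it deliberately makes no attempt to understand JavaScript syntax.

```javascript
// Hypothetical helper: returns the index of every "--" found in a script body
// that is about to be wrapped in <!-- ... //-->. A parser treating the wrapper
// as a real comment would read each one as a comment delimiter.
function findDoubleHyphens(scriptBody) {
    const positions = [];
    let index = scriptBody.indexOf("--");
    while (index !== -1) {
        positions.push(index);
        index = scriptBody.indexOf("--", index + 1);
    }
    return positions;
}

// The body of example 2, which uses the decrement operator.
const example2Body = "var i;\nfor (i = 10; i > 0; i--) {\n    // do something\n}\n";
const problems = findDoubleHyphens(example2Body);

// The same loop written without a pair of hyphens.
const saferBody = "var i;\nfor (i = 10; i > 0; i -= 1) {\n    // do something\n}\n";
const saferProblems = findDoubleHyphens(saferBody);
```

For the body of example 2, problems contains a single position (the decrement operator), whereas saferProblems is empty because i -= 1 avoids the pair of hyphens altogether.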

In the case of the known content model, since the parser is aware that the content model is CDATA, yet the script element is still an unknown element for the UA, it would process it as such and actually end up outputting the entire content of the element as text, thus defeating the purpose of attempting to hide the script.

For backwards compatibility, the technique depends on legacy UAs processing the element’s content as markup in order to hide it. For the above reasons, that means it is not backwards compatible, in all cases, with any hypothetical legacy user agent using a strictly conforming SGML parser. However, there are no browsers that do read the DTD and, due to the bugs in all real legacy browsers, neither of these issues has ever caused any real world problems in HTML. Nor will they cause any problems in the future, since all future implementations will support the script element.

XHTML Problems

This technique does in fact cause real problems for XHTML documents that many authors are unaware of. It seems that Microsoft have fallen into this trap with their new Visual Web Developer 2005 Express application, as discussed by Charl van Niekerk in ASP.NET 2.0 – Part 2. It also seems that the Movable Type developers have made this mistake too, as Jacques Distler pointed out recently.

In XHTML, the content model of the script element is declared as #PCDATA (parsed character data), not CDATA, thus the content of the script element is supposed to be parsed as markup and the comment declaration really is a comment. Because of this, XHTML UAs will (when the document is served as XML) ignore the content of the comment, and thus ignore the script entirely.

However, because the HTML script element contains CDATA and user agents treat XHTML documents served as text/html as HTML, they also treat the content of the script element as CDATA, and thus the comment is not treated as a comment by HTML 4 UAs. This is one of the many problems with serving XHTML as text/html.

As I mentioned earlier, example 2, which contains the extra pair of hyphens, will actually cause a fatal error in XHTML. In SGML, if it were a real comment, it would also be invalid, yet it would not be fatal since UAs employ error handling techniques to continue processing the document. In XML, however, it is a well-formedness error, which is fatal. Thus not only would the script be ignored because of the comment, but the entire document would be rendered totally useless for the user.

However, because most authors do incorrectly serve their XHTML documents as text/html, and the UAs parse it as HTML, authors are generally not aware of these issues. There are many other problems with using scripts for both HTML and XHTML, but those issues are out of scope for this article and best left for another day.

The Correct Method for XHTML

The correct way to use an inline script within XHTML is to escape it as character data using a CDATA section:

Example 3:

<script type="text/javascript"><![CDATA[
    var i = 5;
    if (i < 10) {
        // do something
    }
//]]></script> 

The CDATA section is necessary to ensure that special characters such as < and & are not treated as markup, which would otherwise result in a well-formedness error unless they were escaped as character references. The alternative is to encode such characters with character references like &lt; and &amp;. However, that reduces the readability of the script and is not backwards compatible with HTML UAs when XHTML is incorrectly served as text/html.
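To make the trade-off concrete, here is a small sketch of both options as string transformations. The function names escapeAsCharacterReferences and wrapInCdataSection are hypothetical, and the second one also guards against the one sequence a CDATA section cannot contain, namely ]]>.

```javascript
// Hypothetical helper: the "alternative" approach, encoding & and < as references.
function escapeAsCharacterReferences(script) {
    // Replace & first so the ampersands of the new references are not escaped again.
    return script.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Hypothetical helper: wraps a script in a CDATA section as in example 3.
function wrapInCdataSection(script) {
    if (script.includes("]]>")) {
        // "]]>" would close the CDATA section early, so refuse to wrap it.
        throw new Error('script contains "]]>"');
    }
    return "<![CDATA[\n" + script + "\n//]]>";
}

const condition = "if (i < 10 && j) {";
const escapedCondition = escapeAsCharacterReferences(condition);
const wrappedScript = wrapInCdataSection("var i = 5;");
```

The escaped form of the condition is if (i &lt; 10 &amp;&amp; j) {, which shows how quickly readability suffers, whereas the CDATA section leaves the script text itself untouched.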

If your script doesn’t make use of either of those special characters, then the CDATA section is not necessary, but it’s a good habit to always include it anyway.

Backwards Compatibility

Ignoring the fact that XHTML should not be served as text/html and accepting that it does happen in the real world, for an HTML 4 UA processing the above XHTML script element, the CDATA section markup results in a JavaScript syntax error since the only markup-like syntax that scripting engines allow is an SGML comment declaration at the beginning of the first line. In order to allow the script to be correctly recognised and processed by current HTML and XHTML UAs, while still hiding from legacy UAs, a clever combination of comments and a CDATA section needs to be used.

Example 4:

<script type="text/javascript"><!--//--><![CDATA[//><!--
    ...
//--><!]]></script> 

An HTML UA which correctly treats the script element as CDATA is supposed to ignore everything following the comment declaration open delimiter on the first line, i.e. everything after the first “<!--” should be ignored. An XHTML UA will treat it as a comment followed by a CDATA section to escape the entire script.

In summary, an HTML 4 UA will pass the entire content of the script element off to the JavaScript engine, which will quite happily ignore the first line of markup, whereas an XHTML UA will only pass everything between <![CDATA[ and ]]> (not inclusive). The end result is that both HTML and XHTML UAs treat the script as CDATA, as intended, with neither of them ignoring any of the actual script, while providing some level of backwards compatibility with legacy user agents.
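The XHTML side of that summary can be checked with a deliberately simplified model of an XML parser that only knows about comments and CDATA sections. This is a sketch, not a real parser; the function name scriptTextSeenByXml and the placeholder statement doSomething(); are both hypothetical.

```javascript
// Simplified model of how an XML parser reads the text inside a script element:
// comments are dropped and CDATA sections contribute their content verbatim.
// Assumption: the input contains only comments, CDATA sections and plain text.
function scriptTextSeenByXml(content) {
    let result = "";
    let pos = 0;
    while (pos < content.length) {
        if (content.startsWith("<!--", pos)) {
            const end = content.indexOf("-->", pos + 4);
            if (end === -1) throw new Error("unterminated comment");
            pos = end + 3; // skip the whole comment
        } else if (content.startsWith("<![CDATA[", pos)) {
            const end = content.indexOf("]]>", pos + 9);
            if (end === -1) throw new Error("unterminated CDATA section");
            result += content.slice(pos + 9, end); // keep the section's content
            pos = end + 3;
        } else {
            result += content[pos];
            pos += 1;
        }
    }
    return result;
}

// The content of the script element from example 4, with a placeholder statement.
const example4Content = "<!--//--><![CDATA[//><!--\n    doSomething();\n//--><!]]>";
const xmlView = scriptTextSeenByXml(example4Content);
```

In this model, xmlView consists of the line //><!-- followed by the script and then the line //--><! so the leftover markup sits entirely inside JavaScript single-line comments and the script itself is passed on intact.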

This does, however, open up a whole new theoretical problem for any hypothetical legacy UA that is unaware of the script element and processes it as markup with a conforming SGML parser. When example 4 is processed as markup within SGML, it should be parsed identically to XML. So the unknown script element in a legacy user agent with a strictly conforming SGML parser would output the content of the script (within the CDATA section) as text. However, the only user agent I know of that supports the CDATA section for HTML documents is the new Opera 8. Thus, the above syntax is really only backwards compatible with the non-conformant parsing behaviour of real world legacy HTML UAs, which does not cause any practical problems.

Avoiding All of These Problems

To avoid all of these problems with scripts and the decision of whether or not to include the pseudo-comment declaration in HTML documents at all, the best solution is to always include scripts as external files. This has the advantage of not being unintentionally ignored by XHTML UAs, and not erroneously processed in any way by legacy HTML/SGML UAs, regardless of whether they are conforming or not. It also helps to better separate the markup from the script, and makes the script more easily reusable in multiple documents.

However, you must remember that simply solving all of these markup issues for your scripts in an XHTML document doesn’t necessarily mean that the script will work correctly for both XHTML and HTML when served with the correct MIME types. There are more issues, such as document.write() not working for XML, the need to use the namespace-aware DOM methods in XHTML, and many more related issues.
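As a brief sketch of what the namespace-aware approach looks like, the function below builds a paragraph with createElementNS() and createTextNode() instead of document.write(). The function name appendParagraph is hypothetical, and the Document is passed in as a parameter so the same code can be pointed at any DOM implementation; in a browser you would pass the global document.

```javascript
// The XHTML namespace, needed when creating elements in an XML document.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: appends <p>text</p> to `parent` using DOM methods
// rather than document.write(). `doc` is any DOM Document.
function appendParagraph(doc, parent, text) {
    const paragraph = doc.createElementNS(XHTML_NS, "p");
    paragraph.appendChild(doc.createTextNode(text));
    parent.appendChild(paragraph);
    return paragraph;
}
```

In a browser this might be called as appendParagraph(document, document.body, "Hello"), assuming the document has a body element to append to.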

9 thoughts on “HTML Comments in Scripts”

  1. Interesting article. I’ve hit up against these problems a lot because we use XSL to output XHTML, so inline scripts have to be valid XML inside the system. As you mentioned, the only easy way to solve this has been to move the scripts to an external location.

  2. I had honestly forgotten about that practice until now. I haven’t done anything of the sort for years now; except on those rare occasions when I serve up ‘real’ XHTML, when I of course use CDATA wrappers.
    A user agent that doesn’t support the script element is practically non-existent, so why bother worrying about it? Your example 4 is absolutely awful! This to me would be like omitting the DOCTYPE declaration because ‘legacy’ browsers will render it as text!
    And you are of course correct, it’s best to use external script files whenever possible.

  3. Good post!

    the best solution is to always include scripts as external files

    Absolutely, but there are cases when that’s not an option (for instance, when one will use variables whose value will be dynamically generated, from, for example, C# code in .NET).

    How should we solve that problem? Have an .aspx page that returns itself as a JavaScript file while having generated variables?
    Or use the commenting function you proposed above?

    I don’t know myself…
    Any good ideas?

  4. - It seems that innerHTML is not working in an XHTML context. Do you know what kind of alternatives could be used instead? Will innerHTML be implemented in one way or another in XHTML in the future?

    - I assume that the namespace-aware functions in JavaScript are those to create and manipulate DOM nodes… Does someone have a good pointer for such techniques?

    Thanks !

  5. Hi Lachy!

    I was searching for explanations of the errors delivered by the W3C validation service. They don’t give much in the way of human-readable help :)

    Your article on the matter is really beautiful.
    I learned a lot.

    Great job!
