{"id":72,"date":"2005-05-04T10:46:57","date_gmt":"2005-05-04T10:46:57","guid":{"rendered":"http:\/\/lachy.id.au\/log\/2005\/05\/72"},"modified":"2006-04-30T23:46:15","modified_gmt":"2006-04-30T23:46:15","slug":"script-comments","status":"publish","type":"post","link":"https:\/\/lachy.id.au\/log\/2005\/05\/script-comments","title":{"rendered":"HTML Comments in Scripts"},"content":{"rendered":"<p>It is common practice when including inline scripts within HTML markup, to\r\n\tsurround the entire script within what appears to be an HTML comment like this:<\/p>\r\n\r\n<div class=\"example\" id=\"example-1\">\r\n<h4>Example 1:<\/h4>\r\n<pre><code>&lt;script type=\"text\/javascript\"&gt;&lt;!--\r\n    ...\r\n\/\/--&gt;&lt;\/script&gt; <\/code><\/pre>\r\n<\/div>\r\n\r\n<p>This technique is discussed in HTML 4.01, section 18.3.2 <a href=\"http:\/\/www.w3.org\/TR\/html401\/interact\/scripts.html#h-18.3.2\">Hiding\r\n\t\tscript data from user agents<\/a>. However, few people really understand its purpose, what it\r\n\treally is, the problems it creates within XHTML documents, nor the theoretical\r\n\tproblems it creates for HTML.<\/p>\r\n<p>In this article, the term <dfn>legacy user agents<\/dfn>, <dfn>legacy UAs<\/dfn> (or\r\n\tequivalent) is used to refer only to user agents implemented before <a href=\"http:\/\/www.w3.org\/TR\/REC-html32#script\">the <code>script<\/code>\tand <code>style<\/code> elements\r\n\twere introduced as place holders in HTML 3.2<\/a>.<\/p>\r\n<p>Note: Although this document will be primarily focussing on the <code>script<\/code> element,\r\n\tthe concepts presented apply equally to the <code>style<\/code> element as well.<\/p>\r\n\r\n<h3 id=\"purpose\">Purpose of the Comment<\/h3>\r\n<p>The purpose of the comment is to allow documents to degrade gracefully in\r\n\tlegacy user agents by preventing them from outputting the script as content\r\n\tfor the user to read. 
Since HTML user agents output the content of any unknown\r\n\telements and because it would be unwise to do so for a script, the use of the\r\n\tcomment is designed to ensure that this does not occur within legacy user agents.<\/p>\r\n<p>It should be noted that there are no user agents in use today that don\u2019t support\r\n\tthe <code>script<\/code> element (regardless of whether they support the actual\r\n\tscript or not), so using this technique on the web today seems rather superfluous.\r\n\tYet it\u2019s interesting to note that so many sites still make use of this old technique\r\n\tthat was designed for accessibility reasons, despite the fact that so many\r\n\tof these sites choke in many other ways in browsers with scripts disabled or\r\n\tunsupported.<\/p>\r\n\r\n<h3 id=\"not-a-comment\">It\u2019s Not Really a Comment<\/h3>\r\n<p>The content model of the <code>script<\/code> element in HTML is declared as <code>CDATA<\/code>, which\r\n\tstands for character data and means that the content within the element is not\r\n\tprocessed as markup, but as plain text. The only piece of markup which is recognised\r\n\tis the end-tag open (<code>ETAGO<\/code>) delimiter: <code>&lt;\/<\/code>. Where the <code>ETAGO<\/code> occurs, it must\r\n\tbe for the element\u2019s end-tag ( <code>&lt;\/script&gt;<\/code> in this case). This is actually\r\n\tthe cause of a really <a href=\"http:\/\/htmlhelp.com\/tools\/validator\/problems.html.en#script\">common\r\n\tvalidation error in scripts<\/a> for people that use\r\n\tthe <code>document.write()<\/code> function or the <code>innerHTML<\/code> property.<\/p>\r\n<p>Because no other markup is recognised, the comment declaration is not really\r\n\ta comment; but rather plain text that looks like a comment. It\u2019s designed for\r\n\tlegacy UAs that don\u2019t read the DTD and are, therefore, unaware that the script\r\n\telement actually contains <code>CDATA<\/code>. 
Because it is designed for backwards compatibility\r\n\twith legacy UAs that process it as markup, there are some consequences that\r\n\tneed to be considered.<\/p>\r\n<p>There are two small theoretical problems that most people are unaware of\r\n\tbut, since no browser has ever been a strictly conforming SGML parser, neither\r\n\tis of any practical concern. As you will see, it is in fact the bugs\r\n\tin the legacy UAs for which this technique is designed that ensure it always\r\n\tworks as intended.<\/p>\r\n<p>Legacy browsers encountering the unknown <code>script<\/code> element may or may\r\n\tnot know the content model, depending on whether or not they\u2019ve read the DTD\r\n\tor obtained the information from elsewhere.<\/p>\r\n<p>In the case of the unknown content model, the parser would treat the element\u2019s\r\n\tcontent as markup and hide the comment as expected, in most cases. However,\r\n\tthere may be a problem caused by the presence of two hyphens within the script\r\n\tthat is contained within the comment. Consider the following example:<\/p>\r\n\r\n<div class=\"example\" id=\"example-2\">\r\n<h4>Example 2:<\/h4>\r\n<pre><code>&lt;script type=\"text\/javascript\"&gt;&lt;!--\r\n var i;\r\n for (i = 10; i &gt; 0; i--) {\r\n \/\/ do something\r\n }\r\n \/\/--&gt;&lt;\/script&gt;<\/code><\/pre>\r\n <\/div>\r\n \r\n<p>Although that is perfectly valid HTML 4.01, given that it is not really a comment,\r\n\tit creates a problem for a legacy user agent that processes it as a comment.\r\n\tA comment delimiter in SGML is a pair of hyphens (<code>--<\/code>) and only white\r\n\tspace may occur between the second comment delimiter and the markup declaration\r\n\tclose (<code>MDC<\/code>) delimiter: <code>&gt;<\/code>. The decrement operator in <code>i--<\/code>\r\n\tis therefore read as the closing comment delimiter. 
See the <a href=\"http:\/\/www.htmlhelp.com\/reference\/wilbur\/misc\/comment.html\">WDG\u2019s\r\n\texplanation of HTML comments<\/a> for more\r\n\tinformation.<\/p>\r\n<p>For an SGML parser treating the element\u2019s content as markup, the invalid comment\r\n\tsyntax may cause a problem and result in part of the script being\r\n\toutput. As we will see later, this example will actually cause a fatal error\r\n\tin XHTML documents.<\/p>\r\n<p>In the case of the known content model, the parser is aware that the\r\n\tcontent model is <code>CDATA<\/code>, yet the <code>script<\/code> element is still an unknown element for\r\n\tthe UA, so it would process it as such and end up outputting the entire\r\n\tcontent of the element as text, thus defeating the purpose of attempting to\r\n\thide the script.<\/p>\r\n<p>The technique depends on legacy UAs processing the element\u2019s content as\r\n\tmarkup in order to hide that content, but for the above reasons it\r\n\tis not backwards compatible, in all cases, with any hypothetical\r\n\tlegacy user agent using a strictly conforming SGML parser.\r\n\tHowever, there are no browsers that do read the DTD and, due to the bugs in\r\n\tall real legacy browsers, neither of these issues has ever caused any real\r\n\tworld problems in HTML. Nor will they cause any problems in the future,\r\n\tsince all future implementations will support the <code>script<\/code> element.<\/p>\r\n\r\n<h3 id=\"xhtml-problems\">XHTML Problems<\/h3>\r\n<p>This technique does in fact cause real problems for XHTML documents that many\r\n\tauthors are unaware of. It seems that Microsoft have fallen into this trap with\r\n\ttheir new Visual Web Developer 2005 Express application, as discussed by <a href=\"http:\/\/charlvn.blogspot.com\/\">Charl\r\n\tvan Niekerk<\/a> in <a href=\"http:\/\/charlvn.blogspot.com\/2005\/04\/aspnet-20-part-1.html\">ASP.NET\r\n\t2.0 \u2013 Part 2<\/a>. 
It also seems that the <a href=\"http:\/\/www.sixapart.com\/movabletype\/\">Movable\r\n\tType<\/a> developers\r\n\thave made this mistake too, as <a href=\"http:\/\/golem.ph.utexas.edu\/~distler\/blog\/archives\/000564.html\">Jacques\r\n\tDistler pointed out<\/a> recently.<\/p>\r\n<p>In XHTML, the content model of the <code>script<\/code> element is declared as <code>#PCDATA<\/code> (parsed\r\n\tcharacter data), not <code>CDATA<\/code>; thus the content of the <code>script<\/code> element is supposed\r\n\tto be parsed as markup and the comment declaration really is a comment. Because\r\n\tof this, XHTML UAs will (when the document is served as XML) ignore the content\r\n\tof the comment, and thus ignore the script entirely.<\/p>\r\n<p>However, because the HTML <code>script<\/code> element contains <code>CDATA<\/code> and user agents treat\r\n\tXHTML documents served as <code>text\/html<\/code> as HTML, they also treat the content of\r\n\tthe <code>script<\/code> element as <code>CDATA<\/code>, and thus the comment is not treated as a comment\r\n\tby HTML 4 UAs. This is one of the many problems with serving XHTML as <code>text\/html<\/code>.<\/p>\r\n<p>As I mentioned earlier, <a href=\"#example-2\">example 2<\/a>, which contains the extra pair of hyphens,\r\n\twill actually cause a fatal error in XHTML. In SGML, if it were a real comment,\r\n\tit would also be invalid, yet it would not be fatal since UAs employ error handling\r\n\ttechniques to continue processing the document. In XML, however, it is a well-formedness\r\n\terror, which is fatal. Thus not only would the script be ignored because of\r\n\tthe comment, but the entire document would be rendered totally useless for the\r\n\tuser.<\/p>\r\n<p>However, because most authors do incorrectly serve their XHTML documents as\r\n\t<code>text\/html<\/code>, and the UAs parse them as HTML, authors are generally not aware of\r\n\tthese issues. 
There are many other problems with using scripts for both HTML\r\n\tand XHTML, but those issues are out of scope for this article and best left\r\n\tfor another day.<\/p>\r\n\r\n<h3 id=\"correct-method\">The Correct Method for XHTML<\/h3>\r\n<p>The correct way to use an inline script within XHTML is to escape it as character\r\n\tdata using a <code>CDATA<\/code> section:<\/p>\r\n\r\n<div class=\"example\" id=\"example-3\">\r\n<h4>Example 3:<\/h4>\r\n<pre><code>&lt;script type=\"text\/javascript\"&gt;&lt;![CDATA[\r\n    var i = 5;\r\n    if (i &lt; 10) {\r\n        \/\/ do something\r\n    }\r\n\/\/]]&gt;&lt;\/script&gt; <\/code><\/pre>\r\n<\/div>\r\n \r\n<p>The <code>CDATA<\/code> section is necessary to ensure that special characters such as <code>&lt;<\/code> and <code>&amp;<\/code> are\r\n\tnot treated as markup; left unescaped, they would result in a well-formedness\r\n\terror. The alternative is to encode\r\n\tsuch characters with entity references such as <code>&amp;lt;<\/code> and <code>&amp;amp;<\/code>; however,\r\n\tthis reduces the readability of the script and is not backwards compatible\r\n\twith HTML UAs when XHTML is incorrectly served as <code>text\/html<\/code>.<\/p>\r\n<p>If your script doesn\u2019t make use of either of those special characters, then\r\n\tthe <code>CDATA<\/code> section is not necessary, but it\u2019s a good habit to always\r\n\tinclude it anyway.<\/p>\r\n\r\n<h3 id=\"back-compat\">Backwards Compatibility<\/h3>\r\n<p>Ignoring the fact that XHTML should not be served as <code>text\/html<\/code> and accepting\r\n\tthat it does happen in the real world, for an HTML 4 UA processing the above\r\n\tXHTML <code>script<\/code> element, the <code>CDATA<\/code> section markup results in a JavaScript syntax\r\n\terror, since the only markup-like syntax that scripting engines allow is an SGML\r\n\tcomment declaration at the beginning of the first line. 
In order to allow the\r\n\tscript to be correctly recognised and processed by current HTML and XHTML UAs,\r\n\twhile still hiding it from legacy UAs, a clever combination of comments and a <code>CDATA<\/code>\r\n\tsection needs to be used.<\/p>\r\n\r\n<div class=\"example\" id=\"example-4\">\r\n<h4>Example 4:<\/h4>\r\n<pre><code>&lt;script type=\"text\/javascript\"&gt;&lt;!--\/\/--&gt;&lt;![CDATA[\/\/&gt;&lt;!--\r\n    ...\r\n\/\/--&gt;&lt;!]]&gt;&lt;\/script&gt; <\/code><\/pre>\r\n<\/div>\r\n\r\n<p>An HTML UA which correctly treats the <code>script<\/code> element as <code>CDATA<\/code> is\r\n\tsupposed to ignore everything following the comment declaration open delimiter\r\n\ton the first line; i.e. the rest of the line after the first \u201c<code>&lt;!--<\/code>\u201d should be\r\n\tignored. An XHTML UA will treat it as a comment followed by a <code>CDATA<\/code> section\r\n\tto escape the entire script.<\/p>\r\n<p>In summary, an HTML 4 UA will pass the entire content of the <code>script<\/code> element\r\n\toff to the JavaScript engine, which will quite happily ignore the first line\r\n\tof markup, whereas an XHTML UA will only pass everything between <code>&lt;![CDATA[<\/code> and <code>]]&gt;<\/code> (not inclusive). The end result is that both HTML and XHTML UAs treat\r\n\tthe script as <code>CDATA<\/code>, as intended, with neither of them ignoring any of the actual\r\n\tscript, while providing some level of backwards compatibility with legacy user\r\n\tagents.<\/p>\r\n<p>This does, however, open up a whole new theoretical problem for any hypothetical\r\n\tlegacy UA that is unaware of the <code>script<\/code> element and processes it\r\n\tas markup with a conforming SGML parser. 
When <a href=\"#example-4\">example 4<\/a> is\r\n\tprocessed as markup within SGML, it should be parsed identically to XML.\r\n\tSo the unknown <code>script<\/code> element in a\r\n\tlegacy user agent with a strictly conforming SGML parser would output the\r\n\tcontent of the script (within the <code>CDATA<\/code> section) as text. However,\r\n\tthe only user agent I know of that supports the <code>CDATA<\/code> section for HTML\r\n\tdocuments is the new Opera 8. Thus, the above syntax is really only backwards\r\n\tcompatible with the non-conformant parsing behaviour of real world legacy\r\n\tHTML UAs, which does not cause any practical problems.<\/p>\r\n\r\n<h3 id=\"avoid-problems\">Avoiding All of These Problems<\/h3>\r\n<p>To avoid all of these problems with scripts and the decision of whether or\r\n\tnot to include the pseudo-comment declaration in HTML documents at all, the\r\n\tbest solution is to always include scripts as external files. This has the advantage\r\n\tof not being unintentionally ignored by XHTML UAs, and not erroneously processed\r\n\tin any way by legacy HTML\/SGML UAs, regardless of whether they are conforming\r\n\tor not. It also helps to better separate the markup from the script, and makes\r\n\tthe script more easily reusable in multiple documents.<\/p>\r\n<p>However, you must remember that simply solving all of these markup issues\r\n\tfor your scripts in an XHTML document doesn&#8217;t necessarily mean the script\r\n\twill work correctly for both XHTML and HTML when served with the correct\r\n\tMIME types. 
There are more issues such as document.write() not working for\r\n\tXML, the need to use the namespace aware DOM methods in XHTML and many more\r\n\trelated issues.<\/p>","protected":false},"excerpt":{"rendered":"It is common practice when including inline scripts within HTML markup, to surround the entire script within what appears to be an HTML comment but few people really understand its purpose, what it really is, the problems it creates.","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[5,2,4,7],"tags":[],"_links":{"self":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/72"}],"collection":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/comments?post=72"}],"version-history":[{"count":0,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/72\/revisions"}],"wp:attachment":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/media?parent=72"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/categories?post=72"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/tags?post=72"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}