{"id":75,"date":"2005-05-19T10:36:47","date_gmt":"2005-05-19T10:36:47","guid":{"rendered":"http:\/\/lachy.id.au\/log\/2005\/05\/validation-quiz-explanation"},"modified":"2006-04-30T23:46:03","modified_gmt":"2006-04-30T23:46:03","slug":"validation-quiz-explanation","status":"publish","type":"post","link":"https:\/\/lachy.id.au\/log\/2005\/05\/validation-quiz-explanation","title":{"rendered":"Validation Quiz Explanation"},"content":{"rendered":"<p>Last week, I\u2019m sure you all had fun trying to get your mind around\r\n    the incredibly complex, yet almost entirely valid markup in the\r\n    <a href=\"http:\/\/lachy.id.au\/log\/2005\/05\/validation-quiz\">validation quiz<\/a>.\r\n    It was solved a lot sooner than I had expected by both <a href=\"http:\/\/annevankesteren.nl\/\">Anne\r\n    van Kesteren<\/a> and\r\n    <a href=\"http:\/\/hzr.dzygn.com\/\">David H\u00e5s\u00e4ther<\/a>. Anne was correct, but his explanation wasn\u2019t quite satisfactory\r\n    enough to win. <a href=\"http:\/\/lachy.id.au\/log\/2005\/05\/validation-quiz#comment-289\">David\u2019s\r\n    explanation<\/a> was spot on. Well done to both of them.<\/p>\r\n<p>For the rest of you who aren\u2019t SGML experts, and are still trying to figure out\r\n    how a non-conformant XML declaration in an HTML document with 2 DOCTYPEs can\r\n    be valid, read on to find out.<\/p>\r\n<h3 id=\"xml-declaration\">The XML Declaration<\/h3>\r\n<p>Despite appearances to the contrary, the first line is not an XML declaration\r\n    at all.<\/p>\r\n\r\n<pre><code class=\"markup\">&lt;?xml version=\"1.0\" comment=\"Find the Error!\" ?&gt;<\/code><\/pre>\r\n\r\n<p>In SGML, it is a Processing Instruction, which just happens to look somewhat like\r\n    an XML declaration. Although the meaning of this PI is undefined in SGML and HTML,\r\n    it still passes validation. 
If the document were served with an XML MIME type,\r\n    rather than text\/html, then an XML parser would try to process it as an XML\r\n    declaration, although it would be non-conformant since there is no comment attribute\r\n    defined for it in the XML recommendation.<\/p>\r\n<p>It is actually included as the first of two exploits for known validator bugs.\r\n    This bug \u2014 <a href=\"http:\/\/www.w3.org\/Bugs\/Public\/show_bug.cgi?id=14\" title=\"XHTML Detection is over-eager.\">bug\r\n    14<\/a>\r\n    to be precise \u2014 prematurely sets the validator to XML parsing mode. Because\r\n    of this, the validator incorrectly parses the comments, DOCTYPEs and everything\r\n    else that follows as ill-formed and invalid XHTML. This bug is the cause\r\n    of the 80 spurious errors issued for the entire document.<\/p>\r\n<h3 id=\"xhtml-10-strict-doctype\">The XHTML 1.0 Strict DOCTYPE<\/h3>\r\n<p>Since the pseudo-XML-declaration above is really a valid SGML PI and\r\n    the document is served as text\/html, the comment declarations and DOCTYPEs\r\n    should be parsed with SGML rules, not with XML rules as the validator incorrectly\r\n    does.<\/p>\r\n<pre><code class=\"markup\">&lt;!-- -- --&gt;\r\n&lt;!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD XHTML 1.0 Strict\/\/EN\"\r\n\"http:\/\/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-strict.dtd\"&gt;\r\n&lt;!-- -- --&gt;<\/code><\/pre>\r\n<p>As I briefly mentioned in <a href=\"http:\/\/lachy.id.au\/log\/2005\/05\/script-comments\">HTML\r\n    Comments in Scripts<\/a>, and as is <a href=\"http:\/\/www.htmlhelp.com\/reference\/wilbur\/misc\/comment.html\" title=\"HTML Comments\">discussed\r\n    in more detail by the WDG<\/a>, a comment declaration starts with a <dfn>markup\r\n    declaration open<\/dfn> (<code class=\"markup\">MDO<\/code>): <code class=\"markup\">&lt;!<\/code>, ends with a <dfn>markup declaration close<\/dfn> (<code class=\"markup\">MDC<\/code>) <code class=\"markup\">&gt;<\/code> and 
contains\r\n    one or more comments. Each comment within the comment declaration starts with\r\n    and ends with a matching pair of hyphens. Because of this, despite appearances,\r\n    there is actually only one comment declaration that surrounds the XHTML 1.0\r\n    DOCTYPE and contains 3 separate comments.<\/p>\r\n<p>The first comment, between the first and second pair of hyphens, and the third\r\n    comment, between the fifth and sixth pair, each only contain a space. The second\r\n    comment, between the third and fourth pair, contains everything in between,\r\n    including the XHTML DOCTYPE. The entire comment declaration ends just before\r\n    the HTML 4.01 DOCTYPE, making it an HTML 4.01 document.<\/p>\r\n<p>This section is essentially the same as the following:<\/p>\r\n<pre><code class=\"markup\">&lt;!--\r\n&lt;!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD XHTML 1.0 Strict\/\/EN\"\r\n\"http:\/\/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-strict.dtd\"&gt;\r\n--&gt;<\/code> <\/pre>\r\n\r\n\r\n<h3 id=\"html-401-doctype\">The HTML 4.01 DOCTYPE<\/h3>\r\n<p>The second exploit for a known bug in the validator \u2014 <a href=\"http:\/\/www.w3.org\/Bugs\/Public\/show_bug.cgi?id=24\" title=\"HTML::Parser in XML mode doesn't work with lowercase doctypes\">bug\r\n    24<\/a> \u2014 is actually used in the HTML 4.01 DOCTYPE.<\/p>\r\n\r\n<pre><code class=\"markup\">&lt;!doctype html public \"-\/\/W3C\/\/DTD HTML 4.01\/\/EN\" [\r\n&lt;!ENTITY smile CDATA \"?\" -- U+263A WHITE SMILING FACE --&gt;\r\n]&gt;<\/code><\/pre>\r\n\r\n<p>In SGML, it is perfectly valid to write <code class=\"markup\">doctype html public<\/code> in\r\n    lowercase, although it is conventional to use uppercase. 
With a lowercase <code class=\"markup\">DOCTYPE<\/code>,\r\n    the validator does not identify the document as HTML 4.01, and (if it were\r\n    valid) would only state <dfn>This Page Is Valid!<\/dfn>, rather than <dfn>This Page\r\n    Is Valid HTML 4.01 Strict!<\/dfn><\/p>\r\n<p>The HTML 4.01 <code class=\"markup\">DOCTYPE<\/code> also includes an internal subset, i.e. everything between\r\n    the square brackets. This, in addition to everything defined in the HTML 4.01\r\n    Strict DTD, defines a new entity \u201c<code class=\"markup\">smile<\/code>\u201d, representing the character <code class=\"unicode\">U+263A<\/code>,\r\n    a white smiling face (<samp>\u263a<\/samp>). This entity may be referenced using the entity reference <code class=\"markup\">&amp;smile;<\/code>,\r\n    as is done later in the document.<\/p>\r\n<p>This is equivalent to the following:<\/p>\r\n<pre><code class=\"markup\">&lt;!DOCTYPE HTML PUBLIC \"-\/\/W3C\/\/DTD HTML 4.01\/\/EN\" [\r\n&lt;!ENTITY smile CDATA \"?\" -- U+263A WHITE SMILING FACE --&gt;\r\n]&gt;<\/code><\/pre>\r\n<p>However, because internal subsets are unsupported in browsers, it will be omitted from the final document and the entity reference will be replaced with the real character later.<\/p>\r\n<pre><code class=\"markup\">&lt;!DOCTYPE HTML PUBLIC \"-\/\/W3C\/\/DTD HTML 4.01\/\/EN\"&gt;<\/code><\/pre>\r\n<h3 id=\"document-head\">The Document Head<\/h3>\r\n<pre><code class=\"markup\">&lt;html lang=\"en\"&gt;\r\n&lt;title\/validation quiz\/\r\n&lt;\/head&gt;<\/code><\/pre>\r\n<p>There is nothing strange about the <code class=\"markup\">html<\/code> start-tag; it\u2019s fairly standard and\r\n    just describes the document\u2019s language as being English (<code class=\"lang\">en<\/code>).<\/p>\r\n<p>The start-tag for the <code class=\"markup\">head<\/code> element has been omitted. 
This is perfectly valid\r\n    since both the start- and end-tags for the <code class=\"markup\">html<\/code>, <code class=\"markup\">head<\/code>, <code class=\"markup\">body<\/code> and <code class=\"markup\">tbody<\/code> elements\r\n    are optional. Even though it is omitted, the <code class=\"markup\">head<\/code> element\u2019s start-tag\r\n    is still implied by the presence of the <code class=\"markup\">title<\/code> element and, despite the missing\r\n    start-tag, it is still valid to include the end-tag.<\/p>\r\n<p>The <code class=\"markup\">title<\/code> element uses a special syntax, known as <dfn>SHORTTAG NET<\/dfn> (Null\r\n    End Tag). Many of the SGML SHORTTAG features are unsupported in real-world\r\n    browsers, but that doesn\u2019t make this any less valid. The first solidus closes\r\n    the start-tag (known as a <dfn>net-enabling start-tag close delimiter<\/dfn>) and begins\r\n    the element\u2019s content. The second solidus is the null end-tag, which closes\r\n    the element.<\/p>\r\n<p>The markup for this section is exactly equivalent to this:<\/p>\r\n<pre><code class=\"markup\">&lt;html lang=\"en\"&gt;\r\n&lt;head&gt;\r\n  &lt;title&gt;validation quiz&lt;\/title&gt;\r\n&lt;\/head&gt;<\/code> <\/pre>\r\n<h3 id=\"document-body\">The Document Body<\/h3>\r\n<p>Like the <code class=\"markup\">head<\/code> element, the start-tag for the <code class=\"markup\">body<\/code> element\r\n    has been omitted. However, as we will see later, the end-tag hasn\u2019t been,\r\n    although its presence is not immediately obvious. 
The <code class=\"markup\">body<\/code> element\r\n    is implied by the first paragraph, immediately following the end-tag for\r\n    the <code class=\"markup\">head<\/code> element.\r\n    (Note: It is not implied simply by the end-tag for the <code class=\"markup\">head<\/code> element,\r\n    even though the next element must be a <code class=\"markup\">body<\/code> element.)<\/p>\r\n<h3 id=\"p1\">Paragraph 1<\/h3>\r\n<p>The first paragraph is quite straightforward.<\/p>\r\n\r\n<pre><code class=\"markup\">&lt;p&gt;In this document, there &amp;exist;s a single validation error.\r\n   It makes use of some &lt;strong&lt;em\/very\/&lt;\/strong&gt; uncommon &amp;\r\n   unsupported markup techniques designed to fool the faint\r\n   hearted.<\/code><\/pre>\r\n\r\n<p>It starts with the <code class=\"markup\">p<\/code> start-tag, which is required, but the optional\r\n    end-tag is omitted (which is important, as we will see for paragraph 2). It\r\n    contains an entity reference, <code class=\"markup\">&amp;exist;<\/code>, which is defined in the character\r\n    entity references for symbols, mathematical symbols, and Greek letters.<\/p>\r\n<p>The <code class=\"markup\">strong<\/code> and <code class=\"markup\">em<\/code> elements in this paragraph may look invalid, but they are not. The\r\n    <code class=\"markup\">em<\/code> element makes use of the same SHORTTAG NET syntax used for the <code class=\"markup\">title<\/code> element.\r\n    The <code class=\"markup\">strong<\/code> element, however, is a little more confusing. The start-tag is unclosed\r\n    \u2014 it omits the <dfn>tag close delimiter<\/dfn> (<code class=\"markup\">TAGC<\/code>) <code class=\"markup\">&gt;<\/code>. In SGML, <code class=\"markup\">TAGC<\/code> may be omitted\r\n    when the next non-white-space character is a <dfn>tag open delimiter<\/dfn> (<code class=\"markup\">TAGO<\/code>) <code class=\"markup\">&lt;<\/code>.<\/p>\r\n<p>There is also a lone ampersand within this paragraph. 
Because the ampersand\r\n    usually represents the start of an entity reference, ampersands are, in many\r\n    circumstances, required to be written using the entity reference <code class=\"markup\">&amp;amp;<\/code>.\r\n    However, there are valid cases in SGML where this is not required, such as when\r\n    the ampersand is immediately followed by white space or another character that may not be part\r\n    of an entity name.<\/p>\r\n<p>As stated previously, because this is the first paragraph, it implies the\r\n    presence of the <code class=\"markup\">body<\/code> start-tag. The paragraph\u2019s end-tag is actually implied by the start\r\n    of the next paragraph, but I\u2019ve included it in this section because it\u2019s easier\r\n    and makes no difference to the end result. Thus the markup for this section\r\n    is equivalent to this:<\/p>\r\n\r\n<pre><code class=\"markup\">&lt;body&gt;\r\n  &lt;p&gt;In this document, there &amp;exist;s a single validation error.\r\n     It makes use of some &lt;strong&gt;&lt;em&gt;very&lt;\/em&gt;&lt;\/strong&gt; uncommon\r\n     &amp;amp; unsupported markup techniques designed to fool the\r\n     faint hearted.&lt;\/p&gt;<\/code><\/pre>\r\n\r\n<h3 id=\"p2\">Paragraph 2<\/h3>\r\n<p>The second paragraph is a little more complex. It starts and ends with tags\r\n    that are missing the tag name. In SGML terms, these are respectively known as\r\n    <dfn>empty start-tags<\/dfn> and <dfn>empty end-tags<\/dfn>.<\/p>\r\n\r\n<pre><code class=\"markup\">&lt;&gt;This exploits some known bugs in &lt;a href=http:\/\/validator.w3.org\/\r\n  to both help prevent cheaters and confuse even the most experienced\r\n  authors.&lt;\/&gt;<\/code><\/pre>\r\n\r\n<p>According to the rules of SGML, an empty start-tag represents the same element\r\n    as the most recently opened element within the tree. There is a condition in\r\n    SGML that changes this rule, but that small detail will be omitted because it\r\n    is not relevant to HTML. 
For a full explanation, see <a href=\"http:\/\/www.is-thought.co.uk\/book\/sgml-9.htm#Empty\">section\r\n    9.3.1 Empty Tags<\/a> in Martin Bryan\u2019s <a href=\"http:\/\/www.is-thought.co.uk\/book\/home.htm\">SGML\r\n    and HTML Explained<\/a>.<\/p>\r\n<p>Because the end-tag for the first paragraph was omitted (recall that I said\r\n    it was important), the paragraph element is still open, and thus the empty start-tag\r\n    is recognised as a paragraph start-tag as well. As I stated previously, this\r\n    start-tag also implies the end-tag for the previous paragraph. The empty end-tag\r\n    simply closes the most recently opened element within the tree, which is this\r\n    paragraph.<\/p>\r\n<p>The <code class=\"markup\">a<\/code> element in this paragraph also looks invalid because it appears to be\r\n    missing both the <code class=\"markup\">TAGC<\/code> and an end-tag, yet neither is\r\n    really missing and it is not invalid. However, the markup does not mean what\r\n    it appears to mean at first glance. As explained above for the SHORTTAG NET\r\n    syntax used for the <code class=\"markup\">title<\/code> element, the first solidus is the net-enabling\r\n    start-tag close delimiter and the second is the null end-tag. These are processed\r\n    in this way because the attribute value is not quoted. If it were quoted,\r\n    the solidus would not represent the NET syntax.<\/p>\r\n<p>The markup for this section is equivalent to this:<\/p>\r\n<pre><code class=\"markup\">&lt;p&gt;This exploits some known bugs in &lt;a href=\"http:\"&gt;&lt;\/a&gt;validator.w3.org\/\r\n   to both help prevent cheaters and confuse even the most experienced\r\n   authors.&lt;\/p&gt;<\/code><\/pre>\r\n\r\n<h3 id=\"form\">The Form<\/h3>\r\n<p>This section is quite easy, given that most of the concepts used have been\r\n    covered earlier. 
To make it easier to explain, I\u2019ve indented the lines a little,\r\n    but the markup is otherwise unchanged.<\/p>\r\n<pre><code class=\"markup\">&lt;form method=\"get\" action=\"http:\/\/validator.w3.org\/check\"\r\n  &lt;table\r\n      &lt;tr\r\n        &lt;td&lt;input text checked id=uri name=uri size=40\/&gt;\r\n        &lt;&gt;&lt;label for=uri&gt;Is this test too hard?&lt;\/label&gt;&lt;\/&gt;\r\n      &lt;&gt;&lt;td&lt;button button&gt;Don't Cheat!&lt;\/&gt;\r\n  &lt;\/tbody\r\n &gt;&lt;\/table&gt;<\/code><\/pre>\r\n<p>The start-tags for the <code class=\"markup\">form<\/code>, <code class=\"markup\">table<\/code>, <code class=\"markup\">tr<\/code> and <code class=\"markup\">td<\/code> elements are, once again, unclosed\r\n    start-tags. The empty start-tags and end-tags should be fairly self-explanatory.<\/p>\r\n<p>The <code class=\"markup\">tbody<\/code> element, like the <code class=\"markup\">head<\/code> and <code class=\"markup\">body<\/code> elements, is missing its start-tag\r\n    but not its end-tag. The <code class=\"markup\">tbody<\/code> end-tag is not, in this case, an unclosed tag,\r\n    because it is closed on the following line with the <code class=\"markup\">TAGC<\/code> delimiter just before\r\n    the <code class=\"markup\">table<\/code> end-tag.<\/p>\r\n<p>The attributes for the <code class=\"markup\">input<\/code> and <code class=\"markup\">button<\/code> elements make use of a feature called\r\n    <dfn>attribute minimisation<\/dfn>. Contrary to popular belief, attribute minimisation allows\r\n    the omission of the attribute name, not the value, where the attribute value may be unambiguously\r\n    associated with a particular attribute. The <code class=\"markup\">text<\/code> attribute is not actually\r\n    a <code class=\"markup\">text<\/code> attribute. 
It is a value that may be unambiguously associated with the\r\n    <code class=\"markup\">type<\/code> attribute and is, therefore, the minimised form of <code class=\"markup\">type=\"text\"<\/code>.\r\n    The <code class=\"markup\">checked<\/code> attribute is more commonly known as the short form of <code class=\"markup\">checked=\"checked\"<\/code>.\r\n    This is exactly the same for the <code class=\"markup\">button<\/code> element, where the value <code class=\"markup\">button<\/code> is unambiguously\r\n    associated with <code class=\"markup\">type=\"button\"<\/code>.<\/p>\r\n<p>The <code class=\"markup\">input<\/code> element also uses a net-enabling start-tag. Because it is\r\n    an empty element, for which end-tags are forbidden, it does not need the\r\n    null end-tag to be present. The net-enabling start-tag is also followed by\r\n    a greater-than symbol (<code class=\"markup\">&gt;<\/code>), which is designed to make it look like XHTML syntax.\r\n    However, because the start-tag and element are ended with the net-enabling\r\n    start-tag close delimiter, the greater-than symbol actually follows the element,\r\n    and should be treated as character data, not markup.<\/p>\r\n<p>Finally, the required end-tag for the <code class=\"markup\">form<\/code> element is not in this section,\r\n    but it does appear later in the document, which will be discussed when we\r\n    get to it. 
The markup for this section is equivalent to: <\/p>\r\n<pre><code class=\"markup\">&lt;form method=\"get\" action=\"http:\/\/validator.w3.org\/check\"&gt;\r\n  &lt;table&gt;\r\n    &lt;tbody&gt;\r\n      &lt;tr&gt;\r\n        &lt;td&gt;&lt;input type=\"text\" checked=\"checked\" id=\"uri\"\r\n                   name=\"uri\" size=\"40\"&gt;&amp;gt;&lt;\/td&gt;\r\n        &lt;td&gt;&lt;label for=\"uri\"&gt;Is this test too hard?&lt;\/label&gt;&lt;\/td&gt;\r\n      &lt;\/tr&gt;\r\n      &lt;tr&gt;\r\n        &lt;td&gt;&lt;button type=\"button\"&gt;Don't Cheat!&lt;\/button&gt;&lt;\/td&gt;\r\n      &lt;\/tr&gt;\r\n    &lt;\/tbody&gt;\r\n  &lt;\/table&gt;<\/code>\r\n<\/pre>\r\n<h3 id=\"list\">The List<\/h3>\r\n<p>The list is made up of several sections: first some list items, then\r\n    a processing instruction and, lastly, a really complicated comment declaration\r\n    containing what appears to be invalid markup.<\/p>\r\n<pre><code class=\"markup\">&lt;ul\/\r\n&lt;li&gt;&lt;![CDATA[\r\n&lt;li Oops&lt;!-- ?]]&gt; --&gt;\r\n&lt;li&gt;There are &lt; 2 validation errors in this document&lt;\/li&gt;\r\n&lt;?hello comment=\"What's this doing here?\"?&gt;\r\n&lt;!--- Found the error yet? ----&gt;\r\n&lt;blink&gt;I'll bet this is &amp;#147;annoying&amp;#148;!&lt;\/blink&gt;\r\n&lt;p align=\"right\"&gt;Remeber, it's a Strict DOCTYPE!\r\n&lt;!-- ------ Don't give up now! ----- &gt;\r\n&lt;meta http-equiv=\"Content-Type\" content=\"text\/html;charset=UTF-8\"&gt;\r\n&lt;p&gt;Is the error here --&gt;&lt;li&gt;?\/<\/code><\/pre>\r\n<p> The <code class=\"markup\">ul<\/code> element starts with a net-enabling start-tag, as discussed previously.\r\n    The null end-tag appears at the very end of this section. <\/p>\r\n<p> The first list item contains a <code class=\"markup\">CDATA<\/code> section. Many people know this syntax\r\n    from XML, but it is also valid in SGML, though it is unsupported in most browsers.\r\n    Opera 8 is the only browser I know of that supports it for HTML. 
The <code class=\"markup\">CDATA<\/code> section\r\n    means that its content (everything up to the <code class=\"markup\">CDATA<\/code> section end <code class=\"markup\">]]&gt;<\/code>) should\r\n    be treated as character data. <\/p>\r\n<p> Thus, the second list item, which looks like an invalid unclosed start-tag\r\n    containing an invalid attribute, is not really markup. It is character data\r\n    that should be output as such. If it were markup, it would be an error because\r\n    \u201c<code class=\"markup\">Oops<\/code>\u201d is not a valid attribute or value. For the same reason, the comment\r\n    declaration surrounding the CDATA section end is not really a comment. Because\r\n    no markup is recognised, it\u2019s also treated as character data and output as text.<\/p>\r\n<p> The second real list item (following the <code class=\"markup\">CDATA<\/code> section) contains an\r\n    unencoded less-than symbol. In most cases, this should be encoded as <code class=\"markup\">&amp;lt;<\/code>.\r\n    While this is compulsory in XHTML, in HTML (as with the unencoded ampersand\r\n    in paragraph 1) there are circumstances where it is valid to leave it unencoded.<\/p>\r\n<p> The processing instruction <code class=\"markup\">&lt;?hello ... ?&gt;<\/code> is also valid. These may\r\n    appear almost anywhere within an SGML document. Although this PI has no defined\r\n    meaning, it does not affect validation in any way. <\/p>\r\n<p> The big comment declaration is actually fairly complicated, but easy to follow\r\n    with a basic understanding of SGML comment syntax. If you recall the discussion\r\n    of the comment syntax above in the DOCTYPE section, you\u2019ll remember that a comment\r\n    starts and ends with matching pairs of hyphens. 
By counting the number of hyphen\r\n    pairs, you should notice that the <code class=\"markup\">blink<\/code> element, the <code class=\"markup\">p<\/code> elements and the <code class=\"markup\">meta<\/code> element are actually commented out. <\/p>\r\n<p> Within the commented-out <code class=\"markup\">blink<\/code> element, there are also some invalid numeric\r\n    character references. The decimal code points 147 and 148 are actually <code class=\"charset\">Windows-1252<\/code> code points which are commonly used and supported in web browsers. However,\r\n    because numeric character references are supposed to use Unicode code points,\r\n    and in Unicode these are control characters, these character references are invalid. <\/p>\r\n<p> The whole comment is actually closed on the last line of this section, just\r\n    before a new list item is opened. This is valid because the unordered list\r\n    has not yet been closed. The final list item simply contains a question mark,\r\n    and is immediately followed by the null end-tag for the <code class=\"markup\">ul<\/code> element.<\/p>\r\n<pre>\r\n\r\n<code class=\"markup\">&lt;ul&gt;\r\n  &lt;li&gt;\r\n  &amp;lt;li Oops&amp;lt;!-- ? --&amp;gt;&lt;\/li&gt;\r\n  &lt;li&gt;There are &amp;lt; 2 validation errors in this document&lt;\/li&gt;\r\n  &lt;?hello comment=\"What's this doing here?\"?&gt;\r\n  &lt;!--- Found the error yet? ...\r\n    &lt;p&gt;Is the error here --&gt; &lt;li&gt;?&lt;\/li&gt;\r\n&lt;\/ul&gt;<\/code>\r\n\r\n<\/pre>\r\n<h3 id=\"p3\">Paragraph 3<\/h3>\r\n<p> The final paragraph, which is also the location of the only validation error\r\n    within the document, should be extremely easy to understand if you have successfully\r\n    read and understood the entire explanation so far. <\/p>\r\n<pre><code class=\"markup\">&lt;p\/&gt;The question is: Is this&lt;br&gt;HTML or&lt;br\/&gt;XHTML\r\n   served as text\/html? 
&amp;smile&lt;\/&gt;&lt;\/&gt;&lt;\/&gt;<\/code><\/pre>\r\n<p> The <code class=\"markup\">p<\/code> element starts with a net-enabling start-tag, followed by a greater-than\r\n    symbol. Again, this was designed to look like XHTML\u2019s empty element syntax, but\r\n    it is not. <\/p>\r\n<p> The two <code class=\"markup\">br<\/code> elements are actually both valid HTML, despite the second\r\n    also appearing to use XHTML empty element syntax. The second <code class=\"markup\">br<\/code> element\r\n    is closed by the net-enabling start-tag and also followed by a greater-than\r\n    symbol. <\/p>\r\n<p> The <code class=\"markup\">p<\/code> element is actually closed by the solidus in <code class=\"mime\">text\/html<\/code>. Thus, everything\r\n    following the solidus is outside of the <code class=\"markup\">p<\/code> element and a direct child of the\r\n    <code class=\"markup\">form<\/code> element, which, as you should recall, is still open. Because a <code class=\"markup\">form<\/code> element in HTML 4.01 Strict\r\n    may only contain block-level elements (or <code class=\"markup\">script<\/code>), this stray character data is the document\u2019s\r\n    single validation error. <\/p>\r\n<p> The entity reference <code class=\"markup\">&amp;smile<\/code> is missing the <dfn>reference close<\/dfn> (<code class=\"markup\">REFC<\/code>)\r\n    delimiter (semi-colon <code class=\"markup\">;<\/code>). This is valid in HTML where the entity reference\r\n    is followed by any non-name character. However, because the entity declaration\r\n    was removed from the DOCTYPE above due to lack of browser support, the reference will\r\n    be replaced with the real white smiling face character. <\/p>\r\n<p> The first empty end-tag is for the <code class=\"markup\">form<\/code> element, which is still open. The\r\n    second is for the <code class=\"markup\">body<\/code> element and the third is for the <code class=\"markup\">html<\/code> element. <\/p>\r\n<pre>    <code class=\"markup\">&lt;p&gt;&amp;gt;The question is: Is this&lt;br&gt;HTML or&lt;br&gt;&amp;gt;XHTML\r\n       served as text&lt;\/p&gt;html? 
\u263a\r\n  &lt;\/form&gt;\r\n&lt;\/body&gt;\r\n&lt;\/html&gt;<\/code><\/pre>\r\n<h3 id=\"all-together\">Putting it All Together<\/h3>\r\n<p> After we combine each of the newly marked up sections using the commonly\r\n    supported syntax, we end up with the entire document looking like this: <\/p>\r\n<pre><code class=\"markup\">&lt;?xml version=\"1.0\" comment=\"Find the Error!\" ?&gt;\r\n&lt;!--\r\n&lt;!DOCTYPE html PUBLIC \"-\/\/W3C\/\/DTD XHTML 1.0 Strict\/\/EN\"\r\n\"http:\/\/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-strict.dtd\"&gt;\r\n--&gt;\r\n&lt;!DOCTYPE HTML PUBLIC \"-\/\/W3C\/\/DTD HTML 4.01\/\/EN\"&gt;\r\n&lt;html lang=\"en\"&gt;\r\n&lt;head&gt;\r\n  &lt;title&gt;validation quiz&lt;\/title&gt;\r\n&lt;\/head&gt;\r\n&lt;body&gt;\r\n  &lt;p&gt;In this document, there &amp;exist;s a single validation error.\r\n     It makes use of some &lt;strong&gt;&lt;em&gt;very&lt;\/em&gt;&lt;\/strong&gt; uncommon\r\n     &amp;amp; unsupported markup techniques designed to fool the faint\r\n     hearted.&lt;\/p&gt;\r\n  &lt;p&gt;This exploits some known bugs in &lt;a href=\"http:\"&gt;&lt;\/a&gt;validator.w3.org\/\r\n     to both help prevent cheaters and confuse even the most experienced\r\n     authors.&lt;\/p&gt;\r\n\r\n  &lt;form method=\"get\" action=\"http:\/\/validator.w3.org\/check\"&gt;\r\n    &lt;table&gt;\r\n      &lt;tbody&gt;\r\n        &lt;tr&gt;\r\n          &lt;td&gt;&lt;input type=\"text\" checked=\"checked\" id=\"uri\"\r\n                     name=\"uri\" size=\"40\"&gt;&amp;gt;&lt;\/td&gt;\r\n          &lt;td&gt;&lt;label for=\"uri\"&gt;Is this test too hard?&lt;\/label&gt;&lt;\/td&gt;\r\n        &lt;\/tr&gt;\r\n        &lt;tr&gt;\r\n          &lt;td&gt;&lt;button type=\"button\"&gt;Don't Cheat!&lt;\/button&gt;&lt;\/td&gt;\r\n        &lt;\/tr&gt;\r\n      &lt;\/tbody&gt;\r\n    &lt;\/table&gt;\r\n\r\n    &lt;ul&gt;\r\n      &lt;li&gt;\r\n      &amp;lt;li Oops&amp;lt;!-- ? 
--&amp;gt;&lt;\/li&gt;\r\n      &lt;li&gt;There are &amp;lt; 2 validation errors in this document&lt;\/li&gt;\r\n      &lt;?hello comment=\"What's this doing here?\"?&gt;\r\n      &lt;!--- Found the error yet? ...\r\n        &lt;p&gt;Is the error here --&gt;\r\n      &lt;li&gt;?&lt;\/li&gt;\r\n    &lt;\/ul&gt;\r\n\r\n    &lt;p&gt;&amp;gt;The question is: Is this&lt;br&gt;HTML or&lt;br&gt;&amp;gt;XHTML\r\n       served as text&lt;\/p&gt;html? \u263a\r\n  &lt;\/form&gt;\r\n&lt;\/body&gt;\r\n&lt;\/html&gt;<\/code><\/pre>\r\n\r\n<p>Well, that&#8217;s it.  If you have any questions or need clarification of anything\r\n    I&#8217;ve discussed, please don&#8217;t hesitate to ask me.<\/p>\r\n","protected":false},"excerpt":{"rendered":"Last week, I\u2019m sure you all had fun trying to get your mind around understanding the incredibly complex, yet almost entirely valid markup in the validation quiz.  Those of you who aren\u2019t SGML experts, and are still trying to figure how a non-conformant XML declaration in an HTML document with 2 DOCTYPEs can be valid, read on to find 
out.","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2,7],"tags":[],"_links":{"self":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/75"}],"collection":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/comments?post=75"}],"version-history":[{"count":0,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/75\/revisions"}],"wp:attachment":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/media?parent=75"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/categories?post=75"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/tags?post=75"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}