{"id":185,"date":"2009-02-24T23:35:30","date_gmt":"2009-02-24T23:35:30","guid":{"rendered":"http:\/\/lachy.id.au\/log\/?p=185"},"modified":"2013-09-07T16:36:58","modified_gmt":"2013-09-07T16:36:58","slug":"markup-spec","status":"publish","type":"post","link":"https:\/\/lachy.id.au\/log\/2009\/02\/markup-spec","title":{"rendered":"HTML 5: The Markup Language"},"content":{"rendered":"<p>A relatively new editor\u2019s draft entitled\r\n   <a href=\"http:\/\/www.w3.org\/html\/wg\/markup-spec\/\"\r\n      title=\"HTML 5: The Markup Language\">HTML 5: The Markup Language<\/a>\r\n   has been proposed by Mike Smith.  This draft is an attempt to define the\r\n   vocabulary and syntax of HTML, without any implementation conformance\r\n   criteria or associated DOM APIs.  It\u2019s being positioned as a replacement,\r\n   normative definition of the language over the existing HTML 5 spec, and its\r\n   proponents claim that it\u2019s better for authors.  But don\u2019t be deceived; this\r\n   draft isn\u2019t what it claims to be, and is not really beneficial for the vast\r\n   majority of web developers.<\/p>\r\n\r\n\r\n<h3>About the Draft<\/h3>\r\n\r\n<p>The document itself is largely generated from two primary sources, with\r\n   some additional explanatory material included manually.  It incorporates\r\n   selected statements and conformance criteria from the spec itself, which is\r\n   fine.  This is a useful technique to help ensure that it and the spec stay\r\n   relatively in sync with each other.  But it also incorporates the RelaxNG\r\n   schemas and regular expressions that are being developed for the HTML 5\r\n   Validator.  
This is part of the source code from one particular validator\r\n   implementation, and it\u2019s important to note that this code was not primarily\r\n   written for human consumption, but rather for machine processing.<\/p>\r\n\r\n<p>Yet, despite this, it is being pushed as a suitable, human-readable method\r\n   for describing the conforming syntax and element content models of HTML.\r\n   In a sense, it\u2019s analogous to the DTDs used within the HTML 4.01\r\n   specification, except that it\u2019s more difficult to read.<\/p>\r\n\r\n<p>From past experience, we know that many web developers were not comfortable\r\n   reading the DTD syntax, and preferred to check reference guides, tutorials,\r\n   or ask others on mailing lists or forums to explain things.  So the notion\r\n   that such a document would be useful for the majority of web developers is,\r\n   frankly, absurd.<\/p>\r\n\r\n<p>But don\u2019t just take my word for it.  Let\u2019s take a look at some examples of\r\n   this notation and see for ourselves.  This is the regular expression that\r\n   describes the conforming DOCTYPE syntax:<\/p>\r\n\r\n<p><code>doctype = &lt;![dD][oO][cC][tT][yY][pP][eE]\\s+[hH][tT][mM][lL]\\s*&gt;<\/code><\/p>\r\n\r\n<p>If that\u2019s not scary enough, how about this, which defines the conforming values for the target attribute:<\/p>\r\n\r\n<p><code>browsing-context-or-keyword = ()|([^_].*)|(_[bB][lL][aA][nN][kK])|(_[sS][eE][lL][fF])|(_[pP][aA][rR][eE][nN][tT])|(_[tT][oO][pP])<\/code><\/p>\r\n\r\n<p>To be fair, it is accompanied by a plain text list of examples of the four\r\n   predefined values, but simply looking at the examples alone doesn\u2019t tell the\r\n   reader anything about case insensitivity, nor indicate that other custom\r\n   values are not allowed to begin with an underscore.  
The only way to deduce\r\n   that is from the above RegExp.<\/p>\r\n\r\n<p>Finally, take a look at\r\n   <a href=\"http:\/\/www.w3.org\/html\/wg\/markup-spec\/#a\">the definition of the\r\n      a element<\/a>, or any other, and see if you can understand what it\r\n   means.  Personally, I know how the a element is defined in the spec, but\r\n   even I can\u2019t easily figure out what the schemas are trying to say.<\/p>\r\n\r\n<p>The a element\u2019s content model is actually defined as Transparent in the\r\n   spec, which you can think of as basically meaning that its content model is\r\n   inherited from the parent element.  (This is a slight oversimplification\r\n   of its actual meaning, but we can ignore the subtleties for now.)\r\n   That is, when it\u2019s included as a child of an element that only permits phrasing\r\n   content, that applies to the a element too.  But when its parent permits\r\n   flow content, so does the a element.  If you were able to decipher that on\r\n   your own from the proposed draft, then well done.  I couldn\u2019t.<\/p>\r\n\r\n<p>By now, you may be asking, if this proposal isn\u2019t really suitable for web\r\n   developers, then who is it suitable for?  It\u2019s a question that has been\r\n   asked several times on the mailing list, yet one that has not been\r\n   adequately answered.  I\u2019ll do my best to explain how I see it shortly.  But\r\n   first, there\u2019s a little background to cover.<\/p>\r\n\r\n\r\n<h3>The Spec Splitters<\/h3>\r\n\r\n<p>Within the working group, as expected, people hold a very diverse\r\n   range of opinions.  In particular, a number of individuals share the\r\n   opinion that the current HTML 5 spec is far too monolithic and that it\r\n   should be split.  There\u2019s nothing inherently wrong with that position,\r\n   per se.  
There are indeed sections of the spec that nearly everyone agrees\r\n   should be, or have already been, separated out into their own\r\n   specifications.<\/p>\r\n\r\n<p>For instance, XMLHttpRequest was, at one time, part of HTML5.  This was\r\n   taken out a long time ago and moved to the WebApps working group, where it\r\n   has thrived independently from HTML5 ever since.  More recently, the web\r\n   sockets protocol and API have also been split into their own specs, as have\r\n   content sniffing, the HTTP Origin header, <a href=\"http:\/\/lists.w3.org\/Archives\/Public\/www-html\/2009Jan\/0049.html\" title=\"Re: HTML5 and XHTML2 combined (a new approach) from Ian Hickson on 2009-01-22 (www-html@w3.org from January 2009)\">and more<\/a>.<\/p>\r\n\r\n<p>The issue is that a number of individuals want the spec split in ways that\r\n   aren\u2019t entirely sensible.  This includes the idea of splitting the spec\r\n   along the lines of a conforming, declarative language definition and\r\n   separate implementation requirements.  There are even those who would go so\r\n   far as to say that only the former should be defined, effectively leaving\r\n   the implementers to fend for themselves.  But I\u2019ll spare you from the\r\n   horror of such extremes, as the group moved beyond that debate long ago,\r\n   and merely deal with those who want to split the spec.<\/p>\r\n\r\n<p>From a high-level perspective, the concept of splitting the spec along those\r\n   lines looks reasonable.  These two seemingly independent components\r\n   intuitively feel like they could be defined separately.  That is, until you\r\n   start to appreciate just how intertwined these sections are, and where\r\n   exactly the proponents want to draw the line.<\/p>\r\n\r\n<p>It is argued that the language spec should describe only the conforming\r\n   syntax and content models of the HTML markup.  
This would omit any\r\n   details about how such features are processed and provide limited\r\n   information about what they do.  It would also omit any and all details\r\n   about the associated DOM APIs.<\/p>\r\n\r\n<p>The semantics of elements and attributes are closely related to what\r\n   functionality they provide, which is itself closely related to the\r\n   implementation requirements.  Consider, for example, the heading and\r\n   sectioning elements.  Their semantics are useful for providing hierarchical\r\n   document structures, with varying levels of headings.  This is very closely\r\n   related to the processing requirements for creating an outline.  Authors\r\n   need to know how to mark up their heading structures, and implementers need\r\n   to know how to interpret them.<\/p>\r\n\r\n<p>Consider also that the DOM APIs for many elements reflect the values of\r\n   the content attributes.  The processing requirements for getting and\r\n   setting such properties are very dependent upon the processing requirements\r\n   for the attributes themselves, which are themselves dependent upon the\r\n   conforming values of those attributes.<\/p>\r\n\r\n<p>There are many more examples of such interconnected dependencies, but I\r\n   won\u2019t try to list them all.  Suffice it to say that splitting the spec\r\n   makes it much harder to manage the integration points\r\n   between these highly interconnected sections, and creates a greater risk of\r\n   things not being defined well.  
Such a situation would inevitably lead to\r\n   interoperability problems, which end up hurting not only implementers,\r\n   but everyone involved, including authors and users.<\/p>\r\n\r\n\r\n<h3>The Wedge Strategy<\/h3>\r\n\r\n<p>Despite the significant resistance to splitting out the language\r\n   definition, there has still been a strong push for there to be a\r\n   document that normatively defines it separately from the implementation\r\n   requirements, and this draft has been put forth with the intention of doing\r\n   just that.<\/p>\r\n\r\n<p>However, since the spec has not been split in the way described above, and\r\n   hopefully won\u2019t be, we are left with a situation where we have two drafts,\r\n   the HTML5 spec itself and this proposal, each claiming to normatively\r\n   define the language.<\/p>\r\n\r\n<p>But some people seem to be willing to use this to get their way, even if it\r\n   means normatively defining the language twice, in two separate specs.  This\r\n   is of course absurd.  Having two normative documents, each defining things in\r\n   its own way, will inevitably lead to conflicts between the two specs,\r\n   which then raises the question of which takes precedence.<\/p>\r\n\r\n<p>While people claim that it\u2019s possible to define things normatively in two\r\n   separate specs and keep them in sync, there is no evidence to support that\r\n   claim and plenty of evidence against it.  Suffice it to say that it\r\n   won\u2019t work and will lead to one of two possible outcomes:<\/p>\r\n\r\n<ol>\r\n\t<li>The conforming language definition is split from the main spec,\r\n\t    leaving it to be defined only in this proposal.  This, as I explained\r\n\t    above, would be bad.<\/li>\r\n\t<li>The proposal becomes non-normative, leaving the spec itself as the\r\n\t    single authoritative normative source.  
This is what I have been pushing for, and\r\n\t    will continue to push for.<\/li>\r\n<\/ol>\r\n\r\n\r\n<h3>The Audience of the Proposal<\/h3>\r\n\r\n<p>As I briefly explained above, given the content of the draft, it is not\r\n   really suitable for the vast majority of web developers.  In fact, its\r\n   audience is, in practice, despite claims to the contrary, severely limited\r\n   in scope to a small minority of people who are comfortable with reading\r\n   complicated schemas and regular expressions, and who actually have some\r\n   use for them.<\/p>\r\n\r\n<p>Schemas are primarily designed for the purpose of conformance checking:\r\n   specifically, for tools that read a document and compare it with the grammar\r\n   described in the schema.  This is effectively what validators do, although\r\n   it should be noted that schemas are not the only means of achieving this\r\n   goal.<\/p>\r\n\r\n<p>So the draft is somewhat useful for people writing tools with conformance checking\r\n   features, since they can, if they choose, incorporate the schemas from the\r\n   spec into their own tools, or use them as a guide for creating their own.\r\n   However, it doesn\u2019t provide all the information necessary for such\r\n   developers, as they will still need to turn to the main spec for many\r\n   implementation requirements, particularly parsing.<\/p>\r\n\r\n\r\n<h3>What about Web Developers?<\/h3>\r\n\r\n<p>Web developers certainly haven\u2019t been forgotten.  Their needs are just as\r\n   important to address as those of implementers.  But I and many others recognise that\r\n   such developers, many of whom aren\u2019t comfortable with normative spec\r\n   language, need something specifically targeted at them.  
For this, there\r\n   are now two separate, non-normative drafts under development.<\/p>\r\n\r\n<p>The first, currently entitled the\r\n   <a href=\"http:\/\/dev.w3.org\/html5\/html-author\/\"\r\n      title=\"HTML 5 Reference\">HTML 5 Reference<\/a>, is really a reference guide\r\n   for web developers that will explain the elements, attributes and their\r\n   semantics, the syntax and DOM APIs, and provide plenty of explanatory\r\n   material and examples showing how and why to use each feature.  This is a\r\n   draft that I\u2019m working on and have recently started to make some\r\n   significant progress with.<\/p>\r\n\r\n<p>The second is a new proposal by Dan Connolly, for which there is currently\r\n   no draft available.  This document is intended to be more of a\r\n   step-by-step, cookbook-style guide to writing pages using HTML5, with a big\r\n   focus on the multimedia aspects.  For example, it will cover things like:<\/p>\r\n\r\n<ul>\r\n\t<li>How to embed a video within a page and provide customised controls\r\n\t    using the DOM API.<\/li>\r\n\t<li>How to indicate the completion status of a web application using a\r\n\t    progress bar.<\/li>\r\n\t<li>How to mark up images with captions.<\/li>\r\n\t<li>etc.<\/li>\r\n<\/ul>\r\n","protected":false},"excerpt":{"rendered":"A relatively new editor\u2019s draft entitled HTML 5: The Markup Language has been proposed in the HTML working group as an attempt to define the vocabulary and syntax of HTML, without any implementation conformance criteria or associated DOM 
APIs.","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2,7,21],"tags":[],"_links":{"self":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/185"}],"collection":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/comments?post=185"}],"version-history":[{"count":4,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/185\/revisions"}],"predecessor-version":[{"id":213,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/185\/revisions\/213"}],"wp:attachment":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/categories?post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/tags?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}