This was originally presented by Lachlan Hunt at the Web Standards Group meeting in Sydney on 2007-01-25.
You may also download the presentation slides (PowerPoint) and audio recording (Ogg Vorbis) of the following transcript.
This presentation is in the public domain. If you wish to convert the audio or slides into a different format, please do so.
In the early ’90s, Tim Berners-Lee conceived HTML, but no formal HTML 1.0 specification was ever written and, despite the similarities in syntax, it was not formally based on SGML.
Work continued over the next few years and in 1995, HTML 2.0 was published as RFC 1866 which formally defined HTML as an application of SGML. However, browsers still didn’t bother to implement SGML parsers and, even at this early stage, many proprietary extensions were starting to appear.
From around 1996, the browser wars were in full swing. There were proprietary extensions flying in from all directions and an abundance of broken pages relying on browser bugs to work. This eventually became widely known as “Tag Soup”. In an effort to standardise this mess, the W3C published HTML 3.2 in early ’97 and HTML 4.0 at the end of that year, the latter formally deprecating many of the presentational features that had crept in.
By now it seemed that the life of HTML was coming to an end and work on XHTML began. After HTML 4.01 was published at the end of ’99 to resolve a few minor issues, work on HTML as an application of SGML ceased and the HTML Working Group have been pushing ahead with XHTML ever since.
In what seemed like an effort to further distance themselves from these huge mistakes of the past, the HTML Working Group began work on XHTML 2.0 in 2002. However, it has not been designed with backwards compatibility in mind. Instead, it has been designed as a way to start afresh with a new markup language, and many see this as a major barrier to XHTML 2.0’s chances of ever taking off.
Over the years, Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML and apparent disregard for the needs of real-world authors. So, in 2004, led by Ian Hickson, these organisations set out with a mission to meet the needs of both users and developers, and the Web Hypertext Application Technology Working Group was born.
The goals of the WHATWG include documenting existing, real-world browser behaviour; standardising widely supported and useful proprietary extensions; and developing practical new features that meet the demands of both users and developers, whilst ensuring backwards compatibility and defining robust error handling techniques.
Over the past 2 years, they've been planning and working on 3 separate specifications: Web Applications 1.0, Web Forms 2.0 and Web Controls 1.0. Together, these 3 specs form what is collectively known as HTML 5.
The Web Apps 1.0 spec is redefining the syntax and parsing requirements of HTML to match the way existing browsers handle tag soup, introducing new document structure and semantics, and DOM APIs, many of which are designed specifically for building applications.
The Web Forms 2.0 spec aims to extend the HTML 4 forms with new controls, a repetition model, improved client side validation and new DOM APIs for working with forms and controls.
Lastly, the Web Controls 1.0 spec aims to further enhance CSS and the DOM for building customised controls and widgets. However, at present, not much work has been done in this area and so there isn't much to say about it yet.
HTML5 introduces the concept of serialisations for an HTML document. A serialisation in this context refers to its physical representation. HTML5 uses the HTML serialisation and XHTML5 uses the XML serialisation. Because of this, the distinction between an HTML and an XHTML document is reduced.
In most cases, either serialisation can be used to represent exactly the same document. Although they will be parsed according to different rules, browsers will create a DOM, which is simply another way of representing the document.
There are, however, some features that cannot be represented in all of these. For instance, namespaces can be used in the DOM and in the XHTML serialisation, but cannot be used in the HTML serialisation.
As a consequence, this resolves the HTML vs. XHTML debate once and for all. These days, many authors use an XHTML 1.0 DOCTYPE and then proceed to claim they’re using XHTML, but in reality, they’re using HTML because browsers make the decision about whether to treat a document as HTML or XHTML based on the MIME type.
So, unlike previous versions, the choice of using either HTML or XHTML is not dependent upon the DOCTYPE used. It is solely dependent upon the MIME type. If the document is served as text/html, it is HTML and gets parsed as such; but if it is served with an XML MIME type, like application/xhtml+xml, it is XHTML and gets parsed as XML.
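The rule just described can be sketched in a few lines of Python. This is only an illustration of the decision, not how any particular browser implements it; the function name `parser_for` and the exact set of types treated as XML are assumptions made for the example.

```python
def parser_for(content_type: str) -> str:
    """Pick a parser from the Content-Type alone; the DOCTYPE plays no part."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    if mime_type == "text/html":
        return "html"
    # Assumption: the generic XML types, and anything ending in "+xml"
    # (such as application/xhtml+xml), are handed to the XML parser.
    if mime_type in ("application/xml", "text/xml") or mime_type.endswith("+xml"):
        return "xml"
    return "other"
```

So a page with an XHTML 1.0 DOCTYPE that is served as text/html still ends up with `parser_for(...) == "html"`.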
In reality, parsing HTML is a nightmare. The web is filled with a practically unlimited number of pages, growing every day, and browsers are forced to handle it all gracefully. They can't allow themselves to choke on invalid HTML, regardless of how broken it is.
The major problem is that there is a serious lack of interoperability, which is a direct result of the fact that parsing and error handling were not well defined in HTML, and most certainly not defined in a way that is compatible with the web.
There are also many proprietary extensions out there that are both widely used and supported. The problem with this is that these features aren't well-defined and browser vendors have spent years reverse engineering them from each other.
While reverse engineering has gone some way in fostering interoperability between browsers, the process is far from perfect and it would be much better if the widely used and deployed extensions could be thoroughly documented and interoperably implemented; which is exactly what the WHATWG is attempting to do.
To illustrate the lack of interoperability, let’s take a look at a simple, yet very common, markup error and show how it is handled by different browsers. In this example, the strong and em elements have been badly nested.
In this case, Firefox and Safari produce the same result, although they use different parsing algorithms to do so. In the DOM representation, notice that there are 2 em elements in the DOM, yet only one appears in the markup. To work around the error, they’ve effectively closed the em element when its parent element closed, and created a new one immediately afterwards.
Compare this with IE, however, which, instead of creating 2 em elements, creates a broken DOM that isn’t strictly a tree. Notice how the em element has 2 child text nodes, b and c, but the text node c references the p element as its parent, rather than the em.
Lastly, Opera creates a DOM similar to that in IE, except that it is a proper tree structure. But the problem with this approach is that the text node c is a descendant of the strong element, but it is not rendered as such. By default, it is only rendered in italics, not in bold, as you would expect with this DOM.
So you can see, with just this one simple example, that browsers do handle markup differently. And keep in mind that the web is filled with a practically unlimited number of pages, with errors far more complicated than that.
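To make this kind of error concrete, here is a minimal Python sketch that spots badly nested elements using the standard library's `html.parser`. The markup used in the usage note is an assumption chosen to fit the description above, and the `NestingChecker` class is hypothetical. It only detects the mis-nesting; it does not reproduce any browser's recovery behaviour, nor the algorithm the WHATWG is specifying.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track open elements and record end tags that close the wrong element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.errors = []         # (end tag seen, element that was innermost)

    def handle_starttag(self, tag, attrs):
        # Simplification: void elements such as <br> are not special-cased.
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
            return
        innermost = self.open_elements[-1] if self.open_elements else None
        self.errors.append((tag, innermost))
        if tag in self.open_elements:
            # Remove the most recently opened element with this name.
            position = len(self.open_elements) - 1 - self.open_elements[::-1].index(tag)
            del self.open_elements[position]


def find_misnested(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors
```

For example, `find_misnested("<p><strong>a<em>b</strong>c</em></p>")` reports that `</strong>` arrived while `em` was still the innermost open element, whereas correctly nested markup yields an empty list.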
The WHATWG is attempting to resolve this situation by thoroughly documenting and defining the parsing requirements for handling HTML. They are achieving this goal by analysing the behaviour of current browsers–primarily IE, Firefox, Opera and Safari–and defining an algorithm that will be compatible with the web, in the hope that it will be implemented by all future browsers.
To help ensure full interoperability between browsers, one of the most important issues to deal with is error handling. We can never expect all web pages to be error free, but, as users, we should always expect browsers to cope with those errors. So the algorithm has been specced, at least in theory, to deal with every possible error condition.
In HTML, DOCTYPEs serve 2 practical purposes: validation and DOCTYPE sniffing. These days, most standards-aware developers use either an HTML or XHTML, Strict or Transitional DOCTYPE. Since HTML 5 is no longer formally based on SGML and because DTD based validation has many limitations with respect to conformance checking, HTML 5 will no longer recommend the use of a DTD. Rather, conformance checkers will be free to use whatever methodology they like to check the document for validity and conformance, so long as the end result is the same.
However, there is still the practical issue of triggering standards mode and some form of DOCTYPE is required for that in HTML.
In HTML 4, the DOCTYPE was long and complicated, and very few people could actually remember it all. The complex PUBLIC and SYSTEM identifiers are used to refer to the DTD. But because there is no DTD in HTML5, we’ve taken out the PUBLIC and SYSTEM identifiers and left the minimal amount of code that is both easy to remember and triggers standards mode. Thus, in HTML 5, the DOCTYPE will simply be <!DOCTYPE html>.
This does not apply to XHTML 5, for which there is no DOCTYPE sniffing and no need for any DOCTYPE at all.
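As a small illustration of how short the new DOCTYPE is, the following Python sketch checks whether a document begins with it. The helper name `has_short_doctype` is hypothetical, and the sketch assumes the DOCTYPE is matched case-insensitively and may be preceded by whitespace; it says nothing about how browsers actually decide on a rendering mode.

```python
import re

# The short DOCTYPE at the very start of the document, ignoring case
# and any leading whitespace (assumptions made for this sketch).
SHORT_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def has_short_doctype(document: str) -> bool:
    """Return True if the document starts with <!DOCTYPE html>."""
    return SHORT_DOCTYPE.match(document) is not None
```

An HTML 4 DOCTYPE, with its PUBLIC and SYSTEM identifiers, does not match, because the identifiers sit between `html` and the closing `>`.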
These days, it’s fairly common to use div elements for the major structures on the page, such as headers, footers and columns, giving each one a descriptive id or class. But the use of divs is simply because current versions of HTML lack the necessary semantics for describing these sections.
In extreme cases, the overuse of the non-semantic div element can lead to a syndrome, which is common amongst beginners, known as either divitis or div-mania. HTML 5 is attempting to cure this condition by introducing new elements that provide the semantics for representing each of these different sections.
There are new header and footer elements, for marking up the header and footer of a page or section.
The new nav element has been introduced for marking up navigation links; either site navigation or page navigation.
The new aside element is for content that is tangentially related to the content around it, and is typically useful for marking up side bars.
The new section element represents a generic section of a document or application, such as a chapter, for example.
The article element is like section, but is specifically for marking up content such as a news article or blog entry.
When used in conjunction with the heading elements, all of these elements provide a way to mark up nested sections with heading levels, beyond the 6 levels possible with previous versions of HTML.
HTML 5 is also introducing many other new elements for a wide range of semantic purposes, ranging from simple metadata to cool new widgets.
The new meter element provides a widget for representing scalar measurements or fractional values. For example, you could use it to show a quality rating, disk quota usage or the current temperature.
The progress element is designed to show the completion progress of a task. It has been designed to work with scripted applications that can dynamically update the progress. For example, you could use it to show the loading progress in an Ajax application, or to illustrate the user’s progress through a series of forms.
The canvas element is designed to provide a 2D drawing API, specifically for use with scripts. It can be used to render anything from simple artwork or graphs drawn from tables of data, to fancy animations or interactive applications, such as a game. There has even been some talk of introducing a 3D drawing API.
The new datagrid element provides an interactive representation of tree, list, or tabular data, with a widget for the user, and a rich DOM API for scripts, to work with the data.
There is a new time element for marking up dates and times, and m for highlighting text. The revitalised menu element is back with improvements, in conjunction with the new command element, for providing toolbars and context menus.
The widely implemented, yet previously undocumented, embed element has been introduced, and the figure element provides a way for adding captions to images.
The details element can be used to represent additional information, available on request, and the new dialog element is for marking up conversations.
Over the years, it’s become clear that the types of controls available in HTML4 are quite limited and have forced many sites to work around these limitations with varying degrees of complexity. Dates, for example, are often requested using 3 separate fields–one each for the day, month and year. Web Forms 2 has introduced a number of new controls for a wide range of additional datatypes.
There are several new controls for dates and times. This is the new widget that Opera provides for the datetime control. It provides a calendar for selecting the date and a clock for entering the time. Similar controls are also available for just the date, or just the time.
The new number control is for any numeric value. The advantage of this control is that, in this implementation, it provides a spin control for incrementing and decrementing the value, as well as ensuring that only numbers are entered. It doesn’t allow any non-numeric characters to be entered, so it’s one less thing for client side validation to worry about.
There’s also a new slider control available. Its value is also numeric, but it’s designed for cases where the exact value is relatively unimportant. For example, it could be used as a volume control or brightness control.
The new email control is designed specifically for e-mail addresses. The advantage of this control is that browsers could provide access to the user’s address book and also verify that a valid e-mail address has been entered.
There is also a new URL control for URIs. In this example, the browser has listed some matching addresses from the user’s browsing history.
There are often times when you need to collect an arbitrary number of values for a set of data. For example, a ticket booking form may ask you to list the names of all the people for whom you are purchasing tickets, you may need to add multiple contacts to an address book, or, as in this example, list all the members of SG-1.
In current sites, this usually requires the user to submit the form to the server, using the Add button provided on the page, and the server to respond with a new page updated with additional rows. With this new model, the addition and removal of rows can be handled entirely on the client side.
Web Forms 2.0 has introduced new template features and buttons for replicating form controls. The add button can be used to add a new set of controls. In this example, a fourth set of fields for name and rank has been added and filled out.
Values can also be easily removed using the Remove button. When the user has completed the form, it can be submitted just like any other, with a regular submit button.
The way it works is by marking up a template in the page. Almost any element can be used as a template; you are not restricted to using table rows, as in this example. The new repeat attribute indicates that the element and its content are a template that can be replicated.
The repeat-start attribute indicates how many copies of the template should be generated when the page loads. In this example, 2 rows will be generated.
When a template is replicated, a few things need to occur. The repeat attribute is given a unique index and the repeat-template attribute is used to refer to the ID of the template from which it was created.
Also notice the name attributes in the template row at the end. The square brackets are a special syntax: the template’s ID, enclosed in square brackets, gets replaced with the value of the repetition index. In this way, each control can be given a unique name for sending to the server.
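The name substitution can be sketched in Python as follows. The template ID `member` and the control names are hypothetical, and the sketch covers only the bracket replacement described here, not the rest of the repetition model.

```python
from typing import List


def expand_name(name: str, template_id: str, index: int) -> str:
    """Replace "[template_id]" in a control name with the repetition index."""
    return name.replace(f"[{template_id}]", str(index))


def replicate_names(template_id: str, control_names: List[str], index: int) -> List[str]:
    """Names for one new repetition block generated from the template."""
    return [expand_name(name, template_id, index) for name in control_names]
```

With a template whose ID is `member`, the names `name_[member]` and `rank_[member]` become `name_3` and `rank_3` for the block with index 3, so every block submits distinct names to the server.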
For removing repetition blocks, a new remove button has been defined. When activated, it causes its nearest ancestor repetition block to be removed.
Similarly, for adding new repetition blocks, a new add button is available. When it is activated, it generates a new repetition block from the template and inserts it into the page.
HTML5 has introduced some new attributes on form controls for describing the expected value to enable the browser to assist with the validation. The new required attribute can be used to indicate that a value is required.
Regular Expressions, which are typically embedded in form validation scripts, can now be used with the new pattern attribute for describing the exact format allowed. For instance, you could use this pattern for a username field to restrict it to alphanumeric characters only.
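Here is a rough Python sketch of what such a check amounts to. The pattern shown is an assumed example for an alphanumeric username, and Python's `re` module stands in for the browser's own regular expression engine, whose syntax differs in places. The sketch assumes the pattern has to match the whole value, not just part of it.

```python
import re

# Assumed example pattern: one or more ASCII letters or digits.
USERNAME_PATTERN = "[A-Za-z0-9]+"


def matches_pattern(value: str, pattern: str) -> bool:
    """Return True only if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None
```

For instance, `matches_pattern("oneill42", USERNAME_PATTERN)` succeeds, while a value containing a space or hyphen does not.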
For numeric controls, such as number and range, it will be possible to restrict the allowable values to be within a certain range using the min and max attributes.
And, although this has been available on text boxes since the beginning, maxlength can now be specified on textareas too.
Browsers that support these features can notify the user of any mistakes and automatically prevent submission until they are corrected; or they can be used in conjunction with scripts to enhance the user experience.
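To give a feel for the checks a supporting browser could run before allowing submission, here is a minimal Python sketch for a numeric control. The function name, the parameter names and the wording of the messages are all assumptions made for illustration; the real behaviour is whatever the spec and each browser define.

```python
from typing import List, Optional


def check_number(value: str,
                 required: bool = False,
                 minimum: Optional[float] = None,
                 maximum: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    if value == "":
        # An empty value is only a problem when the control is required.
        return ["a value is required"] if required else []
    try:
        number = float(value)
    except ValueError:
        return ["not a number"]
    problems = []
    if minimum is not None and number < minimum:
        problems.append("value is below the minimum")
    if maximum is not None and number > maximum:
        problems.append("value is above the maximum")
    return problems
```

A form would be held back while any control reports a non-empty list, for example `check_number("11", minimum=1, maximum=10)`.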
Along with the new markup features that have been introduced, HTML5 is also including many new features in the DOM. The DOM is a browser’s internal representation of the page, and the APIs are provided to allow scripts to work with it.
There are many widely supported APIs in browsers that were previously undocumented, known as DOM Level 0. These include interfaces like Window, History and Location, and the many widely supported and used methods that aren’t defined in a current DOM spec.
In recognising the fact that these APIs are both widely used and supported, it is considered far better to document, standardise, and improve them where possible, so that they can become interoperably implemented.
Along with these, there are also many new features that are being developed. The client-side storage APIs are designed to allow scripts to store data on the client side. In a way, they are similar to cookies, but with a much richer API and enhancements.
The new Audio interface is being designed for playing small sound effects.
There are several new communication APIs, including server-sent events, which allow a page to receive notifications from the server when an event occurs. For example, they could be used for a stock ticker to be updated with new values as they change.
The network connection APIs are being designed to allow scripts to make TCP connections directly with a server. This is similar to XMLHttpRequest, but you are not restricted to just HTTP requests.
And finally, the cross-document messaging APIs are designed to allow one document to communicate with another, without the hassle of cross-domain security issues.
If you would like more information, you can check out the WHATWG website, read the specs, blog, wiki or FAQ, or feel free to ask questions in the new mailing list or forums aimed at designers and developers.