HTML5 Accepted by HTMLWG

It’s about time! After more than 2 months and dozens of pointless debates on public-html, the W3C’s HTMLWG mailing list, we have finally achieved something productive. The HTML5 spec, which we have been working on in the WHATWG for the past 3 years, has finally been accepted for review as the basis for the official HTML 5.

This is great news! It’s certainly better than the alternative that was proposed: starting with HTML4 and dropping the transitional features (yes, some people actually did suggest that!).

The charter says we’re supposed to have a first public working draft published by June this year, and it looks like we will actually be able to achieve that.

The <b> and <i> Elements

Over on public-html (and cross-posted to www-html), there is some intense bikeshedding going on over the b and i elements and whether they should be retained in HTML5. Some have argued that b and i are strictly presentational elements and should therefore be dropped.

Others have argued that b and i are, in practice, virtually synonymous with strong and em, so trying to define them otherwise is pointless. That argument rests on the fact that authors use i and b largely interchangeably with em and strong, respectively, and has very little to do with the elements’ actual definitions.

However, it’s reasonable to hypothesise that i is used for purposes other than emphasis significantly more often than it is used for emphasis, so defining every use of i as emphasis would be a mistake (and similarly for b and strong).

Therefore, the pragmatic approach is to specify that i and b convey unspecified semantics, which are to be determined by the reader from the context of their use. In other words, although they don’t convey specific semantics by themselves, they indicate that the content is somehow distinct from its surroundings and leave the interpretation of the semantics up to the reader.

That is effectively the approach taken by the HTML5 spec. These are the current definitions for the b and i elements.

The b element:

The b element represents a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as key words in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is boldened.

The b element should be used as a last resort when no other element is more appropriate. In particular, headers should use the h1 to h6 elements, stress emphasis should use the em element, importance should be denoted with the strong element, and text marked or highlighted should use the m element.
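To make that definition concrete, here is a rough sketch of markup that seems to fit it. It is my own illustration, not an example from the spec, and the product names are made up.

```html
<!-- Product names in a review: offset typographically, no extra importance.
     "FrobCam 100" and "FrobCam 200" are hypothetical names. -->
<p>The <b>FrobCam 200</b> takes sharper photos than the <b>FrobCam 100</b>,
but it costs twice as much.</p>

<!-- Importance is a job for strong, not b. -->
<p><strong>Warning:</strong> do not open the battery cover while the camera
is switched on.</p>

<!-- A header is a job for h1 to h6, not a bold paragraph. -->
<h2>Battery life</h2>
```

In each of the last two cases a more specific element exists, so b would be the wrong choice under the "last resort" rule quoted above.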

The i element:

The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized.

Terms in languages different from the main text should be annotated with lang attributes (xml:lang in XML).

The i element should be used as a last resort when no other element is more appropriate. In particular, citations should use the cite element, defining instances of terms should use the dfn element, stress emphasis should use the em element, importance should be denoted with the strong element, quotes should be marked up with the q element, and small print should use the small element.
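As with b, a short sketch may help show where i seems to fit and where a more specific element should take over. Again, this is my own illustration rather than an example taken from the spec.

```html
<!-- A taxonomic designation. -->
<p>The domestic cat (<i>Felis catus</i>) sleeps for most of the day.</p>

<!-- An idiomatic phrase from another language, annotated with lang. -->
<p>The design has a certain <i lang="fr">je ne sais quoi</i>.</p>

<!-- A thought, offset from the surrounding narration. -->
<p>She looked at the markup. <i>This will never validate</i>, she thought.</p>

<!-- Stress emphasis is a job for em, not i. -->
<p>I <em>really</em> did mean to close that tag.</p>

<!-- A defining instance of a term is a job for dfn, not i. -->
<p>A <dfn>user agent</dfn> is software that acts on behalf of a user.</p>
```

All five would typically be rendered in italics, which is exactly why the reader has to rely on context to tell them apart.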

Context is effectively how users distinguish between italics used for emphasis and the variety of other uses, so providing a couple of catch-all elements for the remaining cases that don’t have specific elements isn’t all that bad.

The separation of presentation and semantics isn’t a goal in and of itself that needs to be strictly adhered to. Rather, it’s just a means to an end, and if that end can be reached without a strict separation, then so be it.