Over on public-html (and cross-posted to www-html), there is some intense bikeshedding going on pertaining to the use of the b and i elements and why they should or should not be retained in HTML5. Some have argued that b and i are strictly presentational elements, and that they should not be retained in HTML5.
Others have argued that the b and i elements are virtually synonymous with strong and em in practice, so trying to define them otherwise is pointless. The reason that i and b could be considered synonymous with em and strong, respectively, has to do with them being used largely interchangeably in practice, and very little to do with their actual definitions.
However, it’s reasonable to hypothesise that there is significantly more non-emphasis usage of i than usage for emphasis, and so defining all uses of i as representing emphasis would be a mistake (similarly for b and strong).
Therefore, the pragmatic approach is to specify that i and b convey unspecified semantics, which are to be determined by the reader in the context of their use. In other words, although they don’t convey specific semantics by themselves, they indicate that the content is somehow distinct from its surroundings and leave the interpretation of the semantics up to the reader.
That is effectively the approach taken by the HTML5 spec. These are the current definitions for the b and i elements.
The b element represents a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as key words in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is boldened.
The b element should be used as a last resort when no other element is more appropriate. In particular, headers should use the h1 to h6 elements, stress emphasis should use the em element, importance should be denoted with the strong element, and text marked or highlighted should use the m element.
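As a rough illustration of the definition above, the fragment below contrasts b with the more specific elements the spec names. The product names and wording are invented for the example, and the markup is only one plausible reading of the draft text, not an excerpt from the spec.

```html
<!-- Hypothetical review snippet: product names are offset with b,
     which conveys no extra importance. -->
<p>In our tests the <b>FrobozzCo Lamp</b> outlasted the <b>Acme Lantern</b>.</p>

<!-- Importance uses strong and stress emphasis uses em, not b. -->
<p><strong>Warning:</strong> do <em>not</em> submerge either product.</p>

<!-- A heading uses h1 to h6 rather than a bolded paragraph. -->
<h2>Battery life</h2>
```

In a typical browser both b and strong render bold by default, so the difference lies only in what the markup says about the text.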
The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized.
Terms in languages different from the main text should be annotated with lang attributes (xml:lang in XML).
The i element should be used as a last resort when no other element is more appropriate. In particular, citations should use the cite element, defining instances of terms should use the dfn element, stress emphasis should use the em element, importance should be denoted with the strong element, quotes should be marked up with the q element, and small print should use the small element.
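The cases listed in that definition might be marked up roughly as follows. The sentences are invented for illustration, and the choice of lang="la" for the taxonomic name is an assumption of this sketch rather than something the quoted text requires.

```html
<!-- Taxonomic designation (language annotation is an assumption here). -->
<p>The domestic cat, <i lang="la">Felis catus</i>, sleeps most of the day.</p>

<!-- Idiomatic phrase from another language, annotated with lang. -->
<p>The design has a certain <i lang="fr">je ne sais quoi</i>.</p>

<!-- A thought and a ship name: italic by convention, not emphasis. -->
<p><i>This can't be right</i>, she thought as the <i>Cutty Sark</i> left port.</p>

<!-- Cases that have a more specific element should use it instead. -->
<p>I <em>really</em> mean it.</p>
<p>A <dfn>bikeshed</dfn> is a trivial issue that attracts outsized debate.</p>
<p>She said <q>ship it</q> and went home.</p>
```

None of the i uses above would read correctly as em, which is the gap the catch-all definition is meant to cover.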
Context is effectively how users distinguish between italics used for emphasis and the variety of other uses, so providing a couple of catch-all elements for the remaining cases that don’t have specific elements isn’t all that bad.
The separation of presentation and semantics isn’t a goal in and of itself that needs to be strictly adhered to. Rather, it’s just a means to an end and if that end can be reached without a strict separation, then so be it.
I occasionally find out that people are convinced all Standards Guys truly care about b/i vs strong/em. Which is a pity. It’s just a convenient example used for explaining the difference between semantic and presentational markup, after all…
As for keeping them in the spec: well, ultimately, if people want to add presentation without meaning then a span and a class works a treat. People just use b and i because they’re a shorthand and faster to type, not because they’re superior 🙂
Besides which, I don’t really buy the idea that something can truly be both semantically neutral AND still require a visual difference from other content. If it’s visually different the page is communicating “hey this bit’s significant in some small way”.
Lachlan:
I am hearing you loud and clear. I use the strong tag to make something bold where it has importance beyond the block of text in which it is contained. On the other hand, there are places where I find I want to highlight some text that really has no importance outside of the context in which it is written. That’s where the use of the bold tag makes sense.
The more I read about semantic markup versus presentational markup, the more I believe people have lost the meaning of ‘semantic’. All markup is semantic. If something is written in HTML to serve the function/purpose of HTML, it’s semantic HTML. I think the distinction needs to return to structural versus presentational. Nothing is semantically neutral; the question that needs to be asked is, is it contextually proper?
I don’t buy the fact that using structural HTML eliminates tag soup. Having to use span/class to define presentation can make a horrific soup; it’s just been moved over to the Cascading Style Sheet. The b and i tags are prime examples where their use avoids this. Also, take for example a two or three column layout: one table, one row, two or three cells, easily dimensioned and styled in CSS, instead of having to float this left, the other right, a wrapper to contain two of the columns in case I want to add a third, which order do I write them in, how will they degrade, and so on and so forth, ad nauseam. The tag soup is not the table, it’s styling the table in CSS. But, since a table is not structurally proper, how about proper implementation of display: inline-block?
Personally, I think more emphasis needs to be placed on the browser manufacturers to adhere to already well defined standards, while continuing to get the styling moved over to CSS without losing sight of the fact that tags like b and i, and the occasional inline style, have a useful place and purpose.
Cheers,
Peter
I meant to say, “The tag soup is not in the table, it’s styling the table in the HTML”. The tag soup can be eliminated by styling the table in CSS.
Whew, how embarrassing.
Peter
I am a solid believer in separation of structure/semantics, presentation and behavior. I have embraced the ’em’ and ‘strong’ tags as the preferred means of indicating an emphatic voice. But there is still the issue of typographic convention to imply the meaning or description of a snippet of text. Using a span of class (??) seems purely presentational and makes no implications of what is being delimited.
Back in the day, when I was learning these things, the titles of articles, and bits of a foreign language phrase were indicated by italics in print, and by quote marks when typing (yeah, I go back to the days of manual typewriters—they’re called boat anchors today). Book or magazine/journal titles were indicated by bold or underscoring in print, or underscoring in type-written text.
There are probably any number of other typographical conventions that I’ve forgotten or never knew. The point is that a description of what the element is is implied by the italicizing or bolding of the text. It is up to the reader to infer that meaning from the context. For this purpose, the b and i tags are perfectly usable as implied-semantic markers. They have the added capability of indicating their presence to people using assistive technology, which a purely presentational solution would not do. If the underscore hadn’t been preempted for indicating links (another widely understood convention), it too would deserve a place in this debate.
Is there another way to do it? A plethora of specific tags for every conceivable usage? No, the application of b and i tags for longstanding convention is a sane approach.
Content on the web is not and should not be intended or left for human interpretation alone. The reason the web is what it is nowadays, is because developers have found ways to interconnect information and/or make it more accessible. This should be kept in mind when defining the specifications for the evolution of the web standards.
This is why semantics play [or at least should play] such an important role when defining new standards.
In my view, the <b> and <i> tags are purely presentational, non-semantic tags that should be deprecated and here’s why:
From your post
Notice how the words “stylistically” and “typographic presentation” are used in the definition. These clearly show that this element provides a purely presentational differentiation from the rest of the document.
Moreover, you say:
This is precisely everything that’s wrong with the <b> element. You’re relying on the reader to interpret meaning, instead of providing it. Remember the reader is not always going to be a person. In fact, it is more likely for it to be a machine that will process the information before serving it to the end user. Meaning should be indicated in as many ways as possible.
The fact that you are italicizing something to differentiate it from the rest of the content means you intend to give it emphasis; therefore the <em> element is more semantically appropriate if no other, more semantic element applies. The reason why you’re giving it emphasis should be stated through the use of the title attribute or, in the case of indicating the use of another language, when using XHTML at least, with the use of the lang attribute.
Ultimately, people can find any sort of argument to justify their laziness when writing code; some will go as far as to redefine the meaning of words, or to point fingers as to whose responsibility it is to correctly interpret the meaning of their content.
All I know is that the W3C shouldn’t encourage this sort of behaviour if it were really committed to its mantra: To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.
@Jorge: No, that's not what's wrong with the <b> element.
If you want to, give the <b> element the meaning of <strong> and the <i> element the meaning of <cite> or <em>, whichever it is. I find it hard enough since I already work with Java, C++ and PHP to remember each language-specific quirk. I sometimes don't remember what <em> is because it's counter-intuitive. And although I've been speaking English for the past 10 years, I sometimes consider emphasis to be in bold as well.
It's a matter of perception. If you prefer it that way, then fine. I'm all for the new additions such as and to HTML5 because it removes all the clutter both in the HTML and the associated CSS. But… what good does removing <b> do for me?
Ok, so let's say we use <strong>. But <strong> could possibly mean any font-weight above 300 for one person, and any font-weight above 500 for another one. While <b> is already very clear and people have been using bold and italics for ages.
Not to mention the HUGE overhead this can cause. Just 10 more characters for one use of <b>. Add 5 more in the CSS, if the <strong> is styled. Yes, most of us may very well live in the age of T3 and DSL connections, but there are still many more people using dial-up, cable or something else. The web is for EVERYONE, not just those privileged few.
And the overhead does not only apply for poorer internet connections, what about mobile devices? It would take longer to load and it would cost you more. Who wins now? Nokia?
In my view, the <strong>, <em> and <cite> tags are purely presentational, just like <b> and <i>. Want to give something the meaning of the citation? Fine, then add an attribute to the span. Or the whatever tag it is placed in. But don't go self-righteous on everyone by saying your beloved <strong> tag is not presentational. It's just as presentational as <b>.
The tag “em” does not mean the full range of things that italics does. HTML does not define a tag for all of those possible meanings. That was the entire point of the article. The most common cases have certainly been accounted for by adding the language reference and emphasis formats.
That still leaves a good number of other, more esoteric, uses of italics that have meaning apart from emphasis with no tags attached. It’s better to know what is unknown than to be incorrect. Marking up with incorrect tags is a very poor approach to dealing with these sorts of gaps. Using the wrong tag is certainly not more semantically appropriate than using one with less intrinsic meaning.
Span and Div have no meaning at all. They have no standardized and interpretable meaning to a computer, where the tag “I” at least provides a hint.
If you want to get to a point where everyone can use their own defined stylesheets for navigating through the web, while using other automated tools, you need a richly defined set of tags. That includes some fall-backs for those areas where the standard couldn’t cover everything. That is the direction of growth for the web.
I agree with “Lachy” on this.
I really hate it when a “smart” WYSIWYG editor turns everything you want to be italic into em. I didn’t want to use italics to emphasize! I wanted to mark up a thought, for example. I just wanted it to be italic, or off-colored in Lynx. It’s a special sentence, but not an emphasized one, maybe even the opposite of emphasized. Browsers, even Lynx, have great support for it.
It would be a shame to lose i and b. And the wrong use of em and strong is already a shame today: people who depend on WYSIWYG editors add the wrong semantics to their writing without wanting to or even knowing it, just because the editor is so “smart”.
Jorge,
No. Very definitely not.
Consider,
What’s emphasised there? Italic text has a wide variety of grammatical meanings (not least the typographic habit of setting the entire preface to a book in italic type), and emphasis is just one of them.
TRiG.