Category Archives: MarkUp

SGML, (X)HTML, XML and other markup languages.

Web Forms 2.0 W3C Working Draft

The Web Forms 2.0 specification that was drafted by the WHATWG has been published as a W3C Working Draft. This spec provides many enhancements to traditional HTML forms including new form controls, a repetition model, XML submission, and DOM interface and event enhancements.

The spec is still being developed by the WHATWG, but has been published by the Web Application Formats Working Group, part of the W3C’s Rich Web Clients activity.

Accessible Alternate Content

For accessibility reasons, it’s important to specify alternate content for any multimedia including images, audio, video, etc. When a text alternative is provided for an image, it should serve the same purpose as the image itself and not just describe what it looks like; the same applies to other multimedia content. However, there are some cases where it’s important to know that the alternate content is replacing an image in order to give some context, even if the user can’t actually see the image (either by choice or physical limitation).

For example, take a recent post from Joe Clark – a well-respected accessibility consultant – entitled Kills Bugs Dead. When I read this article in my feed reader, which is configured to show plain text only (no images), this is what I read:

Truck fender has illustrations of six green-and-yellow grasshoppers in workboots and shades striking different poses, with the last two seated and stretched out on its back

After reading that, I had no clue what he was talking about. I had no idea which truck fender he meant, let alone anything about grasshoppers. Once I loaded the article in my browser and saw a photograph of a truck with grasshoppers as described, it suddenly became clear: the article was about the image itself. Since I didn’t know there was an image, the text was read out of context and didn’t make any sense.

A few weeks later, in an instant messaging (IM) discussion with Charl van Niekerk, we were discussing his alternate content (or lack thereof) for an image he’d used in an article entitled The guilty one at Koeberg. As in Joe Clark’s case, Charl’s article was also about the image and when I first read it in my feed reader, this is part of what I read:

South Africans might enjoy this one. 🙂

Now we know! 🙂

In this case, not only is it important to know what the image says, it’s important to know that there is an image between those two lines because it gives some context to the other content around it. Additionally, just because the user may be viewing the content without images enabled in one environment, that doesn’t mean the user can’t see the image at all and so there should be a way for the user to access it, if desired.

To summarise, these are the requirements for a solution to this problem:

  • Make it clear that there is an image being discussed.
  • Provide suitable alternate content for where the image is not available.
  • Provide easy access to the image for the user to view.

The first issue may be easily addressed by prepending the phrase “Image:” or “Photograph of…” to the alternate content, and the second point is addressed by actually providing alternate content. But the third point is a little trickier.

One way to do this is with a hyperlink, but unfortunately most images are included using the img element, whose alt attribute only allows plain text with no markup. In Charl’s case, he made use of the object element, which has a much richer content model and allows things like hyperlinks within it. His markup now looks like this (URIs modified to make the example shorter):

<object type="image/jpeg" data="homer.jpg">
  <a href="homer.jpg" type="image/jpeg">Image: Eskom –
     Koeberg Nuclear Power Plant - Reactor Maintainance,
     Head of Maintainance: Homer J. Simpson</a>
</object>

As you can see, this now clearly addresses all three requirements. But what about the average author who uses the good old img element? One alternative is to make the image itself a hyperlink, like this:

<a href="homer.jpg" type="image/jpeg">
  <img src="homer.jpg" alt="Image: Eskom –
     Koeberg Nuclear Power Plant - Reactor Maintainance,
     Head of Maintainance: Homer J. Simpson"></a>

Another alternative is to place a hyperlink as text after the image, like this:

<img src="homer.jpg" alt="Image: Eskom –
    Koeberg Nuclear Power Plant - Reactor Maintainance,
    Head of Maintainance: Homer J. Simpson">
<a href="homer.jpg" type="image/jpeg">(view image)</a>

There may be other, possibly more accessible, alternatives I couldn’t think of. Let me know. What do you think of these methods and how would you improve them?

Content-Type

When it comes to the web, one of the most important yet least understood concepts is the media type of a file and, for text files, the character encoding. Raise your hands now if you’ve ever been guilty of including the following meta element (or equivalent) in an HTML or XHTML document:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

Anyone who has ever created an HTML document and did not raise their hand to that question is a liar: every single HTML author in the world has used it. Today, I am going to explain what it does and does not do, and what you should use instead.

HTTP Response Headers

HTTP response headers are sent along with every single HTTP response and contain metadata about the file being sent. The response header contains a number of header fields used to specify a variety of information such as the last modified date, content length, encoding information and, in particular, the Content-Type.

Each header field appears on a new line and takes the following format (white space is optional):

Header-Field: value; parameter=parameter-value
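To make that format concrete, here is a minimal Python sketch that splits one header line into its field name, value and parameters. The function name parse_header_field is my own invention for illustration, and the splitting is deliberately naive: it assumes no semicolons inside quoted parameter values, which a real HTTP parser would have to handle.

```python
def parse_header_field(line):
    """Split 'Name: value; param=pvalue' into (name, value, params)."""
    name, _, rest = line.partition(":")
    parts = [part.strip() for part in rest.split(";")]
    value = parts[0]
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, pvalue = part.partition("=")
        # Parameter names are case-insensitive; values may be quoted.
        params[key.strip().lower()] = pvalue.strip().strip('"')
    return name.strip(), value, params


print(parse_header_field("Content-Type: text/html; charset=UTF-8"))
# ('Content-Type', 'text/html', {'charset': 'UTF-8'})
```

Because white space is optional, the compact form "Content-Type:text/html;charset=UTF-8" yields the same result.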

There are various tools available for you to examine the HTTP headers sent by your server, such as the Web Developer toolbar, the Live HTTP Headers extension, Fiddler or an online tool like the W3C’s HTTP HEAD service.

What is Content-Type?

Content-Type is an HTTP header field that is used by the server to specify, and by the browser to determine, what type of file has been sent and received, respectively, in order to know how to process it. The field value is a MIME type, preferably one registered with IANA, followed by zero or more parameters.

For HTML documents, this value is text/html with an optional charset parameter. Take a look at the meta element above and you will see that the value of the content attribute contains this MIME type and the charset parameter, separated by a semicolon, which matches the format of the HTTP header field value. Thus, the HTTP Content-Type header field should look something like this:

Content-Type: text/html; charset=UTF-8

Although the charset parameter is technically optional, it should always be included and it should always name the encoding the document actually uses.

The Meta Element

The meta element in HTML has two attributes of interest in this case: http-equiv and content. The http-equiv attribute, which was designed as a method to include HTTP header information within the document, contains the name of the header field and the content attribute contains its value.

The intention was that HTTP servers would use it to create or set real HTTP response headers before sending the document, but in reality none (at least none that I’m aware of) ever do this. It was not really intended for processing by user agents on the client side, although the HTML 4.01 section on specifying the character encoding does say that user agents should, in the absence of information from a higher-level protocol, observe the meta element when determining the character encoding.

It is, however, not used by any user agent for determining any other HTTP header information and thus including it for anything but Content-Type is nothing short of completely useless, regardless of the examples given in the HTML 4.01 recommendation.

The content Attribute

When the meta element is used to specify the Content-Type, the content attribute includes both the media type and the charset parameter, yet browsers only ever use it to determine the character encoding. Despite the popular misconception, it is not used to determine the MIME type: the MIME type needs to be known before parsing of the file can begin and (as always) the information specified by a higher-level protocol (like HTTP) takes precedence.
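The precedence just described can be sketched roughly as follows. This is only an illustration of the ordering (HTTP header first, then the meta element, then a default), not how any particular browser implements it; choose_encoding, META_CHARSET and the 1024-byte scan window are assumptions of mine, and the regular expression is far cruder than a real HTML parser.

```python
import re
from email.message import Message

# Crude scan for a charset inside a <meta> tag; a stand-in for real parsing.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def choose_encoding(content_type_header, body, default="ISO-8859-1"):
    """Return (encoding, source) for an HTML response body given as bytes."""
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset, "http"
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    return default.lower(), "default"
```

With a header of "text/html; charset=UTF-8" the meta element is never consulted; with a bare "text/html" the function falls back to whatever the document declares inline, and only then to the default. Note that the media type inside the meta element plays no part at any step.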

The Content-Type header should always be included for HTML files sent over HTTP, and it must at least contain the MIME type: text/html. In the absence of this header, HTTP provides some guidance on how to handle the file, but it will likely end up being treated as application/octet-stream, which typically results in the user agent prompting the user for what to do with it.

Therefore, regardless of the MIME type included within the meta element, the MIME type used for HTML documents will always be text/html. (XHTML documents served as text/html are considered to be HTML documents for the purpose of this discussion). This makes the practice of using the following within XHTML documents completely useless for specifying the MIME type:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />

In fact, for XHTML served as XML, this meta element is not used at all – not even for the character encoding. In such cases, XML rules apply and the encoding is determined based on protocol information (e.g. HTTP headers), the XML declaration or the Byte Order Mark.
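As a rough sketch of those XML rules, the following Python function looks at the three sources in turn. It is a simplification under stated assumptions: xml_encoding is a hypothetical helper, it only recognises UTF-8 and UTF-16 Byte Order Marks, it only reads an XML declaration written in an ASCII-compatible encoding, and it falls back to UTF-8, which is what XML assumes when nothing else is declared.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def xml_encoding(body, http_charset=None):
    """Guess the encoding of an XML document given as bytes."""
    if http_charset:  # protocol information wins
        return http_charset.lower()
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML's default when nothing is declared
```

Notice that nothing here ever inspects a meta element: an XHTML document served as application/xhtml+xml gets its encoding from these sources alone.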

Determining Character Encoding

As mentioned, browsers do make use of the meta element for determining the encoding in HTML. However, when the document is served over HTTP, this is in direct violation of the HTTP 1.1 protocol [RFC 2616] which specifies a default value of ISO-8859-1 for text/* subtypes. This too is in violation of RFC 2046, which specifies US-ASCII, but the discussion of this issue is best saved for another post.

Additionally, for text/* subtypes, web intermediaries are allowed to transcode the file (i.e., convert one character encoding to another). If the intermediary assumes the default encoding while a different one is declared inline (which such an intermediary would not parse), the results may not be good. For these reasons, it is not recommended that inline encoding information be relied upon in text/html. (Interestingly, these same reasons apply to the use of text/xml, which is partly why text/xml is discouraged in favour of application/xml.)
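To see why “not good” is an understatement, here is a small Python sketch of what happens when an intermediary assumes ISO-8859-1 for a document that is really UTF-8. The transcode function is a hypothetical stand-in for such an intermediary, not the behaviour of any specific proxy.

```python
def transcode(body, assumed, target):
    """Convert bytes from the encoding we assume to the target encoding."""
    return body.decode(assumed).encode(target)


# The author saved the page as UTF-8 and declared that only inline.
original = "caf\u00e9".encode("utf-8")

# The intermediary never parses the markup, so it assumes the HTTP default.
transcoded = transcode(original, "iso-8859-1", "utf-8")

print(transcoded.decode("utf-8"))  # cafÃ©
```

The two bytes of the UTF-8 “é” are each treated as a separate ISO-8859-1 character and then re-encoded, so the text is corrupted before it ever reaches the browser, whatever the meta element says.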

Setting HTTP Headers

Although it may seem much easier to copy and paste the meta element into every HTML document published, it is almost as trivial to configure the server to send the correct HTTP headers. The method to do so will vary depending on the server or server-side technology used, but specific information can usually be found in the appropriate documentation. The W3C’s I18N activity has provided a useful summary of how to specify the encoding information using various servers and languages.
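As one illustration of setting the header from a server-side language, here is a minimal sketch using Python’s standard http.server module. The names PAGE, Utf8Handler and make_server, and the choice of port 8000, are mine and purely illustrative; a production server would be configured through its own documentation instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = "<!DOCTYPE html><title>caf\u00e9</title><p>Served as UTF-8.</p>"


class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")  # the bytes must match the declared charset
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for this example


def make_server(port=8000):
    """Build (but do not start) a local server that uses Utf8Handler."""
    return HTTPServer(("127.0.0.1", port), Utf8Handler)
```

Calling make_server().serve_forever() starts it; any of the header inspection tools mentioned earlier should then show the Content-Type line with its charset parameter, with no meta element in the document at all.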