As web standards advocates, many of us participate in numerous online communities
such as mailing lists, forums, newsgroups and even blogs (both our own and
the comments on others’). In these communities, we often encounter beginners who
are either just starting out with HTML, or have been writing HTML for a while
but are new to the concept of developing with standards.
Invariably, such beginners face the eternal question of HTML or XHTML, and
today I intend to answer this question (as it applies to beginners) once and
for all. For experienced users the answer may be different; this answer applies
only to beginners and to those of us teaching them.
I don’t particularly want to start up the XHTML vs. HTML debate again, nor
simply reiterate that XHTML as text/html is extremely harmful; and I must stress
that both HTML and XHTML have their uses and it’s important to use the right
tool for the job. But for beginners, there needs to be a clear answer with a
clear learning path, and those of us teaching them need to be united in our
position. If beginners hear different answers from different parties, the
only result will be confusion, and we may end up losing them to the dark side
of the force forever.
Let me start off by saying that XHTML is not for beginners. We must start
with HTML and have a clear learning path towards the future with XHTML. It
has been argued that since the future lies with XHTML (although that is yet
to be seen), we should be teaching XHTML from the ground up. That sounds nice
in theory, but the reality is that we’re still teaching in a predominantly
text/html environment, and trying to teach XHTML under HTML (tag-soup)
conditions is like trying to teach a child to swim by throwing them in the
deep end and not realising they’re drowning until it’s too late. When it comes
to XHTML, there is so much for a beginner to learn, not to mention the
significant issues of browser support, that we must simply accept that
beginners are not ready for it and teach them HTML instead.
XHTML is not merely HTML 4 in XML syntax; it comes packaged with all the XML
handling requirements as well, with great big “Fragile” and “Handle with Care”
stickers on the front of the box. Despite all the myths surrounding the ability
to use XHTML as text/html and then simply switch to XML when browser support
improves, there is significant evidence that XHTML developed in a text/html
environment will not survive the transition to XML.
The sheer number of tag-soup pages claiming to be XHTML is a direct result
of pushing it upon newcomers while leaving out all the extremely important
details, most of which they won’t understand yet but do actually need to
learn before using it. I won’t go into the details here, but these issues
with XHTML include, among others, the following. I guarantee that if you ask
a beginner who learned XHTML under HTML conditions about any of them, they’ll
look at you blankly, without a clue what you’re talking about.
General Markup Issues
- Internet Explorer 7 and below do not support XHTML at all; not even limited
support. Anyone who says otherwise is either ignorant or lying.
(It is expected, but not guaranteed, that IE8 will finally support it.)
- Well-formedness errors are fatal.
- The namespace (the xmlns attribute) must be declared on the root element,
even though the validator does not issue an error if it’s omitted.
- Use of named entity references may be fatal for non-validating parsers
(except for amp, lt, gt, quot and apos).
- Use xml:lang instead of lang.
- The XML empty-element syntax (as in <br/>) has a different meaning in
SGML and HTML, though browsers don’t support that meaning.
- DTDs do not support validation of mixed namespace documents very well.
- When served as XML, the DOCTYPE is not required to trigger standards
mode in browsers.
- The XML declaration will trigger quirks mode in IE6 when served as
text/html, so it should be omitted in such cases (but see the next few points).
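Python’s standard library offers a quick way to see a few of these markup points in action. In this minimal sketch, xml.etree.ElementTree stands in for a browser’s XML parser and html.parser for its forgiving tag-soup parser; neither is what browsers actually use, and the helper names and sample strings are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample documents, kept on one line for brevity.
WELL_FORMED = f'<html xmlns="{XHTML_NS}"><body><p>Hello</p></body></html>'
UNCLOSED_P = f'<html xmlns="{XHTML_NS}"><body><p>Hello</body></html>'
NAMED_ENTITY = f'<html xmlns="{XHTML_NS}"><body><p>a&nbsp;b</p></body></html>'
PREDEFINED_ENTITY = f'<html xmlns="{XHTML_NS}"><body><p>a &amp; b</p></body></html>'
NO_NAMESPACE = "<html><body><p>Hello</p></body></html>"


def parses_as_xml(markup: str) -> bool:
    """True if the XML parser accepts the markup; any well-formedness error is fatal."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class StartTagCounter(HTMLParser):
    """A tag-soup parser never rejects its input; it just reports what it finds."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_html_start_tags(markup: str) -> int:
    parser = StartTagCounter()
    parser.feed(markup)
    parser.close()
    return parser.count


if __name__ == "__main__":
    print(parses_as_xml(WELL_FORMED))         # True
    print(parses_as_xml(UNCLOSED_P))          # False: mismatched tag
    print(parses_as_xml(NAMED_ENTITY))        # False: undefined entity &nbsp;
    print(parses_as_xml(PREDEFINED_ENTITY))   # True: &amp; is built into XML
    print(count_html_start_tags(UNCLOSED_P))  # 3: the HTML parser carries on
    # The root element is only in the XHTML namespace when xmlns is declared.
    print(ET.fromstring(WELL_FORMED).tag)     # {http://www.w3.org/1999/xhtml}html
    print(ET.fromstring(NO_NAMESPACE).tag)    # html
```

The HTML parser happily accepts the unclosed paragraph and the &nbsp; reference, which is exactly why mistakes like these go unnoticed while XHTML is served as text/html.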
MIME and Encoding
- The MIME type must be declared appropriately in the HTTP headers:
application/xhtml+xml (preferred), application/xml (acceptable) or text/xml (not recommended).
- Encoding should be declared within the XML declaration, rather than in the
HTTP headers, since XML is a self-describing format. (This does not apply to
text/xml.)
- For text/xml, US-ASCII must be used unless another encoding is specified
at the protocol level.
- When the XML declaration is omitted, UTF-8 or UTF-16 must be used, unless
another encoding is specified by a higher-level protocol.
- The meta element is useless for specifying the character encoding and
MIME type.
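The MIME and encoding rules above mostly come down to what the server sends. Here is a minimal sketch assuming a Python server using the standard WSGI interface; the function names and page content are hypothetical. Following the advice above, it sends application/xhtml+xml with no charset parameter and declares UTF-8 in the XML declaration instead, so the document describes its own encoding.

```python
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_xhtml_response(title: str, text: str):
    """Hypothetical helper: return (status, headers, body) for a page served as XML."""
    markup = (
        # The encoding is declared in the XML declaration, not in the HTTP header.
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<html xmlns="{XHTML_NS}" xml:lang="en">'
        f"<head><title>{escape(title)}</title></head>"
        f"<body><p>{escape(text)}</p></body></html>"
    )
    body = markup.encode("utf-8")  # must match the encoding declared above
    headers = [
        # The HTTP header, not a <meta> element, decides how the page is parsed.
        ("Content-Type", "application/xhtml+xml"),
        ("Content-Length", str(len(body))),
    ]
    return "200 OK", headers, body


def xhtml_app(environ, start_response):
    """Minimal WSGI application that serves one XHTML page as XML."""
    status, headers, body = build_xhtml_response("Example", "Served as XML, not text/html.")
    start_response(status, headers)
    return [body]
```

To try it locally, pass xhtml_app to wsgiref.simple_server.make_server("localhost", 8000, xhtml_app) and call serve_forever() on the result. Escaping the title and text matters here: a stray & or < that a tag-soup parser would shrug off becomes a fatal error once the page is parsed as XML.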
Scripts and Stylesheets
- The script and style elements are parsed differently: their contents are
ordinary XML rather than raw text, so the traditional HTML comment-like syntax
must not be used to hide them from obsolete browsers (in XML it really is a
comment, and whatever it contains is ignored).
- document.write() and document.writeln() do not work.
- innerHTML (a non-standard property) is not supported by some XHTML user agents.
- DOM requires the use of namespace aware methods, where applicable.
- DOM methods are case sensitive.
- DOM methods expose element and attribute names case-sensitively
(in lowercase), whereas HTML exposes them in uppercase.
- CSS is applied according to the XML rules, which differ significantly
from the HTML rules; for example, the body element receives no special
treatment.
- The case sensitivity of CSS selectors depends on the markup language, so
selectors are case-sensitive for XHTML.
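Several of the DOM points can be tried out with Python’s xml.dom.minidom, which implements the W3C DOM Core namespace methods such as getElementsByTagNameNS and createElementNS. It is not a browser DOM (there is no document.write here at all), so treat this only as a sketch of XML DOM behaviour under that assumption; the sample document and variable names are made up.

```python
from xml.dom.minidom import parseString

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A made-up sample document in the XHTML namespace.
doc = parseString(
    f'<html xmlns="{XHTML_NS}" xml:lang="en">'
    "<head><title>DOM demo</title></head>"
    "<body><p>One</p><p>Two</p></body></html>"
)

# Namespace-aware lookup matches on (namespace URI, local name).
paragraphs = doc.getElementsByTagNameNS(XHTML_NS, "p")

# Names are case-sensitive in an XML DOM, so "P" matches nothing.
uppercase_matches = doc.getElementsByTagName("P")

# tagName comes back exactly as written (lowercase), not as "BODY".
body = doc.getElementsByTagNameNS(XHTML_NS, "body")[0]
body_tag_name = body.tagName

print(len(paragraphs), len(uppercase_matches), body_tag_name)  # 2 0 body

# createElementNS puts the new element in the XHTML namespace...
new_p = doc.createElementNS(XHTML_NS, "p")
new_p.appendChild(doc.createTextNode("Three"))
body.appendChild(new_p)

# ...whereas in minidom plain createElement leaves it in no namespace at all.
orphan_p = doc.createElement("p")

print(new_p.namespaceURI)     # http://www.w3.org/1999/xhtml
print(orphan_p.namespaceURI)  # None
```

Code written against an HTML DOM, with uppercase name comparisons and non-namespaced methods, can quietly stop matching anything once the same page is parsed as XML.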
I’m quite sure that isn’t a complete list of
differences between HTML and XHTML, but each and every one of them (plus
any that I’ve missed) needs to be learned by anyone who is learning XHTML properly.
The vast majority of them do not apply, or are at least not exposed well,
under HTML conditions. Because of all this, and because most beginners will
be learning under HTML conditions, XHTML is not safe for beginners to learn.
By teaching XHTML to beginners, we’re really only teaching a new form of tag
soup under the guise of “standards-based development”, and it is doing
significantly more harm than good.
Experienced users who are competent enough to understand all of these issues
and make an informed decision about whether to use HTML or XHTML may do so,
but we cannot expect the same from beginners. So, let me reiterate that we must
be united on this issue and we must encourage beginners to start with HTML,
not XHTML.