As web standards advocates, many of us participate in numerous online communities such as mailing lists, forums, newsgroups and even blogs (both our own and comments on others’). In these communities, we often encounter beginners who are either just starting out with HTML, or have been writing HTML for a while but are new to the concept of developing with standards.
Invariably, such beginners face the eternal question of HTML or XHTML, and today I intend to answer this question (as it applies to beginners) once and for all. For experienced users, the answer may be different; this only applies to beginners and to those of us teaching them.
I don’t particularly want to start up the XHTML vs. HTML debate again, nor simply reiterate that XHTML as text/html is extremely harmful; and I must stress that both HTML and XHTML have their uses and it’s important to use the right tool for the job. But for beginners, there needs to be a clear answer with a clear learning path, and those of us teaching them need to be united in our position. For if beginners are hearing different answers from different parties, only confusion will result and we may end up losing them to the dark side of the force forever.
Let me start off by saying that XHTML is not for beginners. We must start
with HTML and have a clear learning path towards the future with XHTML. It
has been argued that, since the future lies with XHTML (although that is yet
to be seen), we should be teaching XHTML from the ground up. That sounds nice
in theory, but the reality is that we’re still teaching in a predominately
text/html environment, and the fact is: trying to teach XHTML under HTML
(tag-soup) conditions is like trying to teach a child to swim by throwing
them in the deep end and not realising they’re drowning until it’s too late.
When it comes to XHTML, there is so much for a beginner to learn, not to
mention the significant issues of browser support, that we must simply accept
that they’re not ready and teach them HTML instead.
XHTML is not merely HTML 4 in XML syntax; it comes packaged with all the XML
handling requirements as well, with great big “Fragile” and “Handle with
Care” stickers on the front of the box. Despite all the myths surrounding the
ability to use XHTML as text/html and then simply make the switch to XML when
browser support improves, there is significant evidence to show that XHTML
developed in a text/html environment will not survive the transition to XML.
The sheer number of tag-soup pages claiming to be XHTML is a direct result of pushing it upon newcomers while leaving out all the extremely important details, most of which they won’t understand yet anyway, but do actually need to learn before using it. I won’t go into the details here, but the issues with XHTML include, among others, the following; and I guarantee that if you ask a beginner (who learned XHTML under HTML conditions) about any of them, they’ll look at you blankly, without a clue what you’re talking about.
General Markup Issues
- Internet Explorer 7 and below do not support XHTML at all, not even limited support. Anyone who says otherwise is either ignorant or lying. (It is expected, but not guaranteed, that IE8 will finally support it).
- Well-formedness errors are fatal.
- The namespace (xmlns attribute) must be declared in the root element, despite the validator not issuing an error if it’s omitted.
- Use of named entity references may be fatal for non-validating parsers.
- The XML empty element syntax has a different meaning in SGML and HTML, though browsers don’t support it.
- DTDs do not support validation of mixed namespace documents very well.
- When served as XML, the DOCTYPE is not required to trigger standards mode in browsers.
- The XML declaration will trigger quirks mode in IE6 when served as text/html; it should be omitted in such cases (but see the next few points).
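A few of the points above can be seen with any non-validating XML parser. The following is a minimal sketch, assuming Python's standard library parsers as stand-ins for a browser's HTML and XML parsers (real browsers differ in detail); the names xml_parses and TagCollector are hypothetical helpers invented for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xml_parses(markup):
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Any well-formedness error aborts the whole parse.
        return False


class TagCollector(HTMLParser):
    """Tolerant HTML parser that simply records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# An unclosed <br> is tolerated by the HTML parser...
unclosed = "<p>Hello<br>world</p>"
collector = TagCollector()
collector.feed(unclosed)
collector.close()
html_tags = collector.tags

# ...but is a fatal error for the XML parser.
unclosed_ok = xml_parses(unclosed)

# A named entity is undefined when no DTD is read; the numeric form is safe.
entity_ok = xml_parses("<p>a&nbsp;b</p>")
numeric_ok = xml_parses("<p>a&#160;b</p>")

# Without xmlns, the root element is simply not in the XHTML namespace.
with_ns = ET.fromstring('<html xmlns="%s"><body/></html>' % XHTML_NS)
without_ns = ET.fromstring("<html><body/></html>")

print(html_tags, unclosed_ok, entity_ok, numeric_ok)
print(with_ns.tag, without_ns.tag)
```

The same markup that an HTML parser quietly accepts is rejected outright by the XML parser, which is exactly the behaviour a beginner never encounters while their pages are served as text/html.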
MIME and Encoding
- MIME type must be declared appropriately in the HTTP headers (
- Encoding should be declared within the XML declaration, rather than the
HTTP headers, since XML is a self describing format. (This does not apply to
text/xml, unless specified at the protocol level, US-ASCII must be used.
- When the XML declaration is omitted, UTF-8 or UTF-16 must be used, unless specified in a higher level protocol.
- The meta element is useless for specifying the character encoding and MIME type.
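The encoding points can be illustrated in the same way. This is a rough sketch, again assuming Python's xml.etree.ElementTree as a stand-in for a browser's XML parser and assuming no encoding is supplied by a higher level protocol; parse_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_text(data):
    """Return the text of the first <p> element, or None if parsing fails."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    paragraph = root if root.tag == "p" else root.find(".//p")
    return paragraph.text


# The same paragraph, encoded as ISO-8859-1 bytes.
latin1_body = "<p>caf\u00e9</p>".encode("iso-8859-1")

# With an XML declaration, the parser learns the encoding from the document.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body
declared_text = parse_text(declared)

# Without one, the bytes are read as UTF-8 and the document is rejected.
undeclared_text = parse_text(latin1_body)

# A meta element does not help: to an XML parser it is just another element.
with_meta = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=ISO-8859-1"/></head><body>'
    + latin1_body
    + b"</body></html>"
)
meta_text = parse_text(with_meta)

# Genuine UTF-8 needs no declaration at all.
utf8_text = parse_text("<p>caf\u00e9</p>".encode("utf-8"))

print(declared_text, undeclared_text, meta_text, utf8_text)
```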
Scripts and Stylesheets
- script and style elements are parsed differently; the traditional HTML comment-like syntax within script and style elements must not be used for the purpose of hiding from obsolete browsers.
- document.write() and document.writeln() do not work.
- innerHTML (non-standard property) is not supported by some XHTML UAs.
- DOM requires the use of namespace aware methods, where applicable.
- DOM methods are case sensitive.
- Element and attribute names from DOM methods are exposed case sensitively (lowercase), compared with uppercase in HTML.
- XML rules for CSS Stylesheets are applied and they differ significantly
from HTML rules. e.g. No special treatment for the
- Case sensitivity of CSS selectors depends on the markup language, so selectors are case sensitive for XHTML.
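Some of the DOM and script points can be sketched outside the browser as well. The example below assumes Python's xml.dom.minidom as a stand-in for a browser's XML DOM, so it only approximates what a script running in an XHTML page would see; the sample document is invented for illustration.

```python
from xml.dom import Node, minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

source = (
    '<html xmlns="%s"><head>'
    "<script><!-- alert('hi'); --></script>"
    "</head><body><p>text</p></body></html>"
) % XHTML_NS

doc = minidom.parseString(source)

# Element names are exposed exactly as written (lowercase for XHTML).
root_name = doc.documentElement.tagName

# DOM lookups are case sensitive: "P" does not match <p>.
lower_matches = len(doc.getElementsByTagName("p"))
upper_matches = len(doc.getElementsByTagName("P"))

# Namespace aware lookup, as required for documents with namespaces.
ns_matches = len(doc.getElementsByTagNameNS(XHTML_NS, "p"))
wrong_ns_matches = len(doc.getElementsByTagNameNS("", "p"))

# The "hidden" script is a real XML comment, so the element holds no code.
script = doc.getElementsByTagName("script")[0]
script_node_types = [node.nodeType for node in script.childNodes]
script_is_only_comment = script_node_types == [Node.COMMENT_NODE]

print(root_name, lower_matches, upper_matches, ns_matches, wrong_ns_matches)
print(script_is_only_comment)
```

Under XML rules the comment-wrapped script is simply a comment, which is why the old hiding trick must not be carried over from HTML.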
I’m quite sure that isn’t a complete list of differences between HTML and XHTML, but each and every one of them (plus any that I’ve missed) needs to be learned by anyone who is learning XHTML properly.
The vast majority of those do not apply, or are at least not exposed well, under HTML conditions. Therefore, because of all of this and the fact that most beginners will be learning under HTML conditions, XHTML is not safe for beginners to learn. By teaching XHTML to beginners, we’re really only teaching a new form of tag soup under the guise of “standards based development” and it is doing significantly more harm than good.
Experienced users who are competent enough to understand all of these issues and make an informed decision about whether to use HTML or XHTML may do so, but we cannot expect the same from beginners. So, let me reiterate that we must be united on this issue and we must encourage beginners to start with HTML, not XHTML.