XHTML is not for Beginners

2005-12-05 | MarkUp, Standards | Lachlan Hunt

As web standards advocates, many of us participate in numerous online communities such as mailing lists, forums, newsgroups and even blogs (both our own and comments on others). In these communities, we often encounter beginners who are either just starting out with HTML, or have been doing HTML for a while, but are new to the concept of developing with standards.

Invariably, such beginners face the eternal question of HTML or XHTML, and today I intend to answer this question (as it applies to beginners) once and for all. For experienced users, the answer may be different; this applies only to beginners and to those of us teaching them.

I don’t particularly want to start up the XHTML vs. HTML debate again, nor simply reiterate that XHTML as text/html is extremely harmful; and I must stress that both HTML and XHTML have their uses and it’s important to use the right tool for the job. But for beginners, there needs to be a clear answer with a clear learning path, and those of us teaching them need to be united in our position. For if beginners are hearing different answers from different parties, only confusion will result and we may end up losing them to the dark side of the force forever.

Let me start off by saying that XHTML is not for beginners. We must start with HTML and have a clear learning path towards the future with XHTML. It has been argued that, since the future lies with XHTML (although that is yet to be seen), we should be teaching XHTML from the ground up. That sounds nice in theory, but the reality is that we’re still teaching in a predominately text/html environment, and the fact is: trying to teach XHTML under HTML (tag-soup) conditions is like trying to teach a child to swim by throwing them in the deep end and not realising they’re drowning until it’s too late. When it comes to XHTML, there is so much for a beginner to learn, not to mention the significant issues of browser support, that we must simply accept that they’re not ready and teach them HTML instead.

XHTML is not merely HTML 4 in XML syntax; it comes packaged with all the XML handling requirements as well, with great big “Fragile” and “Handle with Care” stickers on the front of the box. Despite all the myths surrounding the ability to use XHTML as text/html and then simply make the switch to XML when browser support improves, there is significant evidence to show that XHTML developed in a text/html environment will not survive the transition to XML.

The sheer number of tag-soup pages claiming to be XHTML is a direct result of pushing it upon newcomers while leaving out all the extremely important details, most of which they won’t understand yet anyway, but do actually need to learn before using it. I won’t go into the details here, but these issues with XHTML include, among others, the following; and I guarantee that if you ask a beginner (who learned XHTML under HTML conditions) about any of them, they’ll look at you blankly, without a clue what you’re talking about.

General Markup Issues

Internet Explorer 7 and below do not support XHTML at all, not even limited support. Anyone who says otherwise is either ignorant or lying. (It is expected, but not guaranteed, that IE8 will finally support it).
Well-formedness errors are fatal.
The namespace (xmlns attribute) must be declared in the root element, despite the validator not issuing an error if it’s omitted.
Use of named entity references may be fatal for non-validating parsers (except for amp, lt, gt, quot and apos).
Use xml:lang instead of lang.
The XML empty element syntax has a different meaning in SGML and HTML, though browsers don’t implement the SGML meaning.
DTDs do not support validation of mixed namespace documents very well.
When served as XML, the DOCTYPE is not required to trigger standards mode in browsers.
The XML declaration will trigger quirks mode in IE6 when served as text/html; it should be omitted in such cases (but see the next few points).
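
To make a few of these points concrete, here is a minimal sketch in Python. It assumes that Python's standard expat-based parser (a non-validating XML parser) is a fair stand-in for a browser's XML parser; the sample documents and the `check` helper are made up for illustration. It shows that a single unclosed element or a named entity such as `&nbsp;` is fatal, that the five predefined entities are safe, and that a missing namespace declaration parses without complaint even though the elements are then no longer XHTML elements.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def check(label, markup):
    """Parse markup as XML and return (label, ok, detail)."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Any well-formedness error aborts parsing; there is no recovery.
        return (label, False, str(err))
    # On success, report the root tag, which includes the namespace (if any).
    return (label, True, root.tag)

# Hypothetical sample documents, one per issue.
samples = {
    "well-formed": f'<html xmlns="{XHTML_NS}"><body><p>Hi<br/></p></body></html>',
    "unclosed br": f'<html xmlns="{XHTML_NS}"><body><p>Hi<br></p></body></html>',
    "named entity": f'<html xmlns="{XHTML_NS}"><body><p>a&nbsp;b</p></body></html>',
    "predefined entity": f'<html xmlns="{XHTML_NS}"><body><p>a &amp; b</p></body></html>',
    "no namespace": "<html><body><p>Hi</p></body></html>",
}

results = {label: check(label, markup) for label, markup in samples.items()}

for label, ok, detail in results.values():
    print(f"{label:18} {'ok' if ok else 'FATAL'}: {detail}")
```

The "no namespace" case is the subtle one: the parser is happy, but the root element is a plain `html` element in no namespace, which an XML-aware browser has no reason to treat as XHTML.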

MIME and Encoding

MIME type must be declared appropriately in the HTTP headers (application/xhtml+xml (preferred), application/xml (acceptable) or text/xml (not recommended)).
Encoding should be declared within the XML declaration, rather than the HTTP headers, since XML is a self-describing format. (This does not apply to text/xml).
For text/xml, unless specified at the protocol level, US-ASCII must be used.
When the XML declaration is omitted, UTF-8 or UTF-16 must be used, unless specified in a higher level protocol.
The meta element is useless for specifying the character encoding and MIME type.
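
Since the MIME type lives in the HTTP headers and not in the markup, choosing it is a server-side decision. The following is a rough sketch, in Python, of one way a server might pick the Content-Type from the request's Accept header. The function names `parse_accept` and `choose_content_type` are hypothetical, the Accept parsing is deliberately simplified (no wildcard handling, no ordering by specificity), and the sample header strings are only illustrative of the kind of values browsers send.

```python
def parse_accept(header):
    """Return {media_type: q} for an HTTP Accept header value (simplified)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs[media_type] = q
    return prefs

def choose_content_type(accept_header):
    """Send XHTML as XML only to clients that explicitly ask for it."""
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    # Everyone else gets text/html, which means HTML parsing rules apply.
    return "text/html"

# Illustrative Accept headers (assumed values, not captured from real browsers).
xml_capable = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
wildcard_only = "image/gif, image/jpeg, */*"

print(choose_content_type(xml_capable))
print(choose_content_type(wildcard_only))
```

Note what the fallback branch implies: any client that receives text/html will parse the document as HTML, so every issue in these lists still has to be handled for both modes.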

Scripts and Stylesheets

script and style elements are parsed differently; the traditional HTML comment-like syntax within script and style elements must not be used for the purpose of hiding their contents from obsolete browsers.
document.write() and document.writeln() do not work.
innerHTML (non-standard property) is not supported by some XHTML UAs.
DOM requires the use of namespace aware methods, where applicable.
DOM methods are case sensitive.
Element and attribute names from DOM methods are exposed case sensitively (lowercase), compared with uppercase in HTML.
XML rules for CSS Stylesheets are applied and they differ significantly from HTML rules. e.g. No special treatment for the body element.
Case sensitivity of CSS selectors depends on the markup language, so selectors are case sensitive for XHTML.
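
The DOM points can be illustrated without a browser. This is a small sketch using Python's `xml.dom.minidom`, on the assumption that it behaves like a browser's XML DOM for these particular methods; the sample document and the second namespace URI are made up. It shows that lookups are case sensitive, that element names are exposed in lowercase as written, and that the namespace-aware methods only match when the right namespace is given.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = minidom.parseString(
    f'<html xmlns="{XHTML_NS}"><body><p>one</p><p>two</p></body></html>'
)

# Case sensitivity: "p" matches, "P" does not.
lower = doc.getElementsByTagName("p")
upper = doc.getElementsByTagName("P")

# Namespace awareness: the element must be in the XHTML namespace to match.
namespaced = doc.getElementsByTagNameNS(XHTML_NS, "p")
wrong_ns = doc.getElementsByTagNameNS("http://example.org/other", "p")

# New elements should be created with the namespace-aware method too.
new_p = doc.createElementNS(XHTML_NS, "p")
new_p.appendChild(doc.createTextNode("three"))
doc.getElementsByTagNameNS(XHTML_NS, "body")[0].appendChild(new_p)

print(len(lower), len(upper), len(namespaced), len(wrong_ns))
print(lower[0].tagName, new_p.namespaceURI)
```

A script written for HTML that compares `tagName` against `"P"`, or that creates elements without a namespace, is exactly the kind of code that stops working once the same page is parsed as XML.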

I’m quite sure that isn’t a complete list of differences between HTML and XHTML, but each and every one of them (plus any that I’ve missed) needs to be learned by anyone who is learning XHTML properly.

The vast majority of those do not apply, or are at least not exposed well, under HTML conditions. Therefore, because of all of this and the fact that most beginners will be learning under HTML conditions, XHTML is not safe for beginners to learn. By teaching XHTML to beginners, we’re really only teaching a new form of tag soup under the guise of “standards based development” and it is doing significantly more harm than good.

Experienced users who are competent enough to understand all of these issues and make an informed decision about whether to use HTML or XHTML may do so, but we cannot expect the same from beginners. So, let me reiterate that we must be united on this issue and we must encourage beginners to start with HTML, not XHTML.

53 thoughts on “XHTML is not for Beginners”

Sébastien says:

2005-12-05 at 12:03

Thanks for another very useful article Lachlan!

And guess what? I’m a beginner too, I didn’t know about this one: “When served as XML, the DOCTYPE is not required to trigger standards mode in browsers.”

I’ll be adding this article to the collection to which I point “wannabe XHTML advocates” (to paraphrase a well-known article).
Rimantas says:

2005-12-05 at 12:30

At first glance the only thing missing from the list is html vs. body treatment differences.
And I think you can write another post about Appendix C in the XHTML 1.0 spec.
Lachlan Hunt says:

2005-12-05 at 12:37

Rimantas, I’m not exactly sure what you’re referring to, could you please elaborate?
minghong says:

2005-12-05 at 13:05

I thought all XML used UTF-8 unless specified otherwise.
Rimantas says:

2005-12-05 at 16:14

Lachlan, I mean 14.2 from CSS2.1. If you specify background properties only for the body element, they apply to the entire canvas when the document is treated as HTML. In the case of XHTML that does not happen – you must style the html element instead.
Pingback: En webbplats på svenska om xhtml » Är XHTML lämpligt för nybörjare egentligen ?
Lachlan Hunt says:

2005-12-05 at 22:48

Rimantas, I already mentioned that, although I didn’t go into the specific details:

XML rules for CSS Stylesheets are applied and they differ significantly from HTML rules. e.g. No special treatment for the body element.
Sebastian, orsus says:

2005-12-07 at 01:26

The real question is what you need; a beginner has no need of the fanciness of XHTML. A beginner will make tag soup, and it will crash (if he actually uses the correct MIME type).
Getting a beginner to learn HTML over XHTML will make the web a better place to be.
You are not cool just coz you master XHTML if you don’t need it.
Yannick says:

2005-12-07 at 02:04

Very nice and interesting article Lachlan. Thanks.
Devon says:

2005-12-07 at 03:54

Not so fast. A few years ago I taught my 10-year-old niece XHTML + CSS when she wanted to know how to make a webpage. It didn’t take her long to learn the basics (and she has ADHD) and she didn’t make mistakes (except mistypes). It’s not nearly as tough as people like to claim. I think it’s only tough for those who’ve set themselves in the valley of HTML and now have a tough time thinking outside that box.
Michael Newton says:

2005-12-07 at 05:17

I’m quite sure that isn’t a complete list of differences between HTML and XHTML, but each and every one of them (plus any that I’ve missed) needs to be learned by anyone who is learning XHTML properly.

When one learns PHP, does one need to learn the differences between PHP and Perl? For people who learned HTML originally, there is certainly a learning curve. But if that’s all you’ve ever known, I don’t see it being a problem. I think one of the advantages of using XHTML is that it forces you to be careful.

That said, if you don’t need XHTML, you may as well use HTML. Then you get to save time and bandwidth by cutting out half your tags if you want! Oh wait, we probably don’t want to get beginners into that habit…
Lachlan Hunt says:

2005-12-07 at 05:45

Michael, Perl and PHP are completely different languages and one never expects a PHP processor to run a Perl script, whereas people do expect an HTML parser to handle XHTML.

If you just learned XHTML in an XML environment and you never need to serve it as text/html (thus preventing IE, Google and other older browsers from accessing the page), then you can learn XHTML from scratch without learning the differences between HTML and XHTML.

However, the reason people need to learn the differences between HTML and XHTML is because we’re still building in a predominately text/html environment and the reality is that XHTML pages will invariably be served as text/html to some (if not all) clients or transformed to HTML on the server side.
Harold Teener says:

2005-12-07 at 10:52

I’m a beginner. [swear word removed] YOU Lachy
Lachlan Hunt says:

2005-12-07 at 11:05

Harold, such language is not appreciated. I also fail to understand why you are so agitated by my article. If you’d care to elaborate, I may be able to address your concerns.
Robert Wellock says:

2005-12-07 at 11:23

Interestingly, in the past I have had some people tell me Internet Explorer 6.0 understands XHTML perfectly well if sent as application/xml, but people say strange things.
Lachlan Hunt says:

2005-12-07 at 11:39

Robert, there is an XSLT workaround that can make IE convert an XHTML document served as application/xml into HTML, but it’s still effectively treating it as HTML once it’s converted and uses up a lot of resources in IE, which causes an unnecessary delay for IE users, so it’s pointless. There’s definitely no native support for it. To test it, make an XHTML document, give it a .xml extension and open it from your local file system. The result will be the same if served as application/xml from the web.
Robert Wellock says:

2005-12-07 at 12:53

Yes, that’s what I thought. The guy uses XSLT a lot, which probably explains his odd response.
Martin Smales says:

2005-12-07 at 13:52

Regardless of whether or not pages should be served as application/xhtml+xml (which is beside the point in terms of learning markup), people should always learn and use XHTML because XHTML is the future and HTML is long dead, and rightly so for many reasons.

SVG is now supported in Firefox 1.5. Please tell me learning HTML allows me to quickly learn SVG, for mobile audiences and implementing microformats?
Lachlan Hunt says:

2005-12-07 at 14:03

Martin, the MIME type is certainly not beside the point; it is in fact one of the major reasons. There are very different parsing requirements for XML and HTML, and trying to learn XHTML markup with HTML parsing rules is not learning correctly at all.

The use of SVG certainly requires XML conditions and if, as a beginner, you’re learning XHTML with SVG, you must be learning under XHTML and that is OK. The article is mostly addressing the vast majority of cases where beginners learn under HTML conditions and therefore may as well be using HTML.
Martin Smales says:

2005-12-07 at 14:23

I guess learning HTML is best if pages are to be served in text/html in browsers, which have good recovery mechanisms dealing with HTML errors.

I can see beginners will have no trouble with HTML because of that. How about the long-term effect of learning especially if they are not beginners anymore?

The problem I see is that I think XHTML is equally as good as HTML when serving text/html despite XHTML having certain parsing problems serving under that MIME type.

Firstly, XHTML is the future. There are many other XML-flavoured markups like SVG.

Secondly, when the future comes, people will have to un-learn everything about HTML and re-learn many XML languages including XHTML, MathML and SVG.

Thirdly, pages in HTML is not always well-formed even if they are valid. XHTML handles interoperability of systems a lot better because it is XML.

Hope I am not rambling on, just let it be known.
Lachlan Hunt says:

2005-12-07 at 14:36

Secondly, when the future comes, people will have to un-learn everything about HTML and re-learn many XML languages including XHTML, MathML and SVG.

When the future does come for people currently learning XHTML as text/html, they will still need to unlearn many things they have learned wrongly, so that argument simply doesn’t hold.

pages in HTML is not always well-formed even if they are valid.

There is no formal concept of well-formedness in SGML, so it’s difficult to understand precisely what you mean by it. If you mean that all non-empty elements have end-tags, all entity/character references have semi-colons, all ampersands are encoded (even for the cases where they don’t have to be), etc. then most of that doesn’t matter in the slightest (except insofar as browser bugs are concerned), as SGML has different parsing rules from XML.

The only concept of well-formedness that can be carried over from XML is that all elements are properly nested and elements with required start-tags and required end-tags have them, but that is all handled by validation, so there cannot be a document that is valid but not “well-formed” in this pseudo-SGML sense of the term.
Jules says:

2005-12-07 at 16:48

Lachlan:

I don’t believe that discussions like this will encourage the development of the “new amateurs” but instead, may impede it. If any teacher/instructor/student were to read posts like this, they may be discouraged from learning/teaching XHTML and that will perpetuate HTML when the new amateurs should be moving on to XHTML.

While I do agree that the PHP/Perl comparison is not valid for this discussion, I also don’t agree with your shallow end/deep end swimming lessons comparison either: failing to use the correct MIME type is not quite as serious as drowning in a pool.

There are two things here as far as I am concerned: the code and the delivery method and the two are separate. To repeat the same example I just wrote in Roger’s blog, if I emailed you valid XHTML 1.1 code, would you be able to tell me that it was tag-soup because it was not delivered using application/xhtml+xml? What about a printout, or having you look over my shoulder at the page I was putting together on my computer? Code is code, delivery method is delivery method and valid XHTML code is valid XHTML code.

If someone who produced valid XHTML (and their server delivered it using the correct MIME type) had a problem with their server and had to switch to another that was configured for text/html, have they created tag soup? No, their code is just as good as before.

XHTML may be harder for us to learn because we have to unlearn HTML but if a student was taught XHTML properly, they shouldn’t have the same difficulties. Therefore a properly-trained, new student of XHTML will be able to create valid XHTML documents. It is then up to the teacher to ensure that the student knows about MIME types so that the valid XHTML can be delivered properly.

Therefore, I don’t think that learning XHTML should be tied to the delivery of the XHTML pages although the student should also learn about the delivery method too.
Mislav says:

2005-12-07 at 19:03

Secondly, when the future comes, people will have to un-learn everything about HTML and re-learn many XML languages including XHTML, MathML and SVG.

You don’t have to un-learn anything. The XHTML 1.0 specification is based on HTML 4, so it doesn’t define any new elements and their semantics.

A switcher to XHTML only needs to learn the differences described here, and that is not much as you can see. For me the most important part is setting up the server to do content negotiation and serve application/xhtml+xml to compliant browsers – that covers the author’s part of the responsibility. Of course that includes fixing scripts (serves you right when you use proprietary JavaScript).

My compliments to the author – great article. This is the most complete overview of XHTML issues I have ever seen. No doubt a lot of people will benefit from reading and learning it…
nortypig says:

2005-12-07 at 21:54

I don’t think there is a right or wrong answer to which way people should go in the beginning. I’d say it depends on personal doggedness, personality, their ability and interest in the subject matter, level of intelligence, and probably whether they are alone or know others who can help. I don’t think using HTML Strict or XHTML Strict makes much difference in the end if they’re just making pages that fulfil a basic learning function. Why not learn to use lowercase and close tags? The world doesn’t stop does it?

I mean why not have a bit of faith that people will adapt and overcome to learn this stuff eventually, incrementally? Others have and probably both ways.

I don’t particularly see why the average Joe who isn’t a professional and just wants to whack up a few pages in the beginning would really care either way. Mostly they just want to get the image to show on the screen. I remember it was from that elation of having made something work that I actually got ‘interested’ and wanted to learn.

So basically I don’t see the issue with learning XHTML first unless the person in question in advance thought that they were going to do this for a career and then the argument probably has some merit, although still I’d say it’s their call.

The worst thing we can do as technologists is instill a bad-guy image onto XHTML. We’d never move forward in the world if everyone just skipped the new stuff cos it was too hard and unsupported.

Also, I guess this comes back to a case by case basis. I can’t see right or wrong too much either way in any of this… but interesting article Lachlan.

I think the emphasis should be on understanding semantics and more rudimentary stuff in the early stages but that’s just me I guess. Cheers.
Gunlaug Sørtun says:

2005-12-08 at 02:30

Beginners come in all colors and sizes. I expect that most of them can learn XHTML correctly from the start, if the correct information is made available to them. Whether or not to choose HTML or XHTML should be based on knowledge – not on discouragement and “you can’t master XHTML and you don’t need XHTML” arguments.

You are in no position to decide what is best for any beginner, even if you have seen a lot of them get it wrong. So have I. It won’t look any better on the web in the future if all beginners are told that they should just stick to “tag soup” since they can’t master anything else.

Don’t discourage beginners from learning and using XHTML. Encourage them to get it right from the start instead, and inform them so they can make their own, informed, choices. There’s a real lack of in-depth information out here that’s readily available and digestible for beginners. This article has some – but the headline is wrong.
Pingback: Web Coder Plus » XHTML is not for Beginners?
Jim says:

2005-12-09 at 00:17

You forgot one: XML doesn’t have implied elements, so in HTML, every table has a tbody element whether or not you explicitly include one, but in XHTML, tables only have tbody elements if you explicitly include them.

This primarily causes problems when manipulating tables via the DOM, but it can also cause problems with CSS if tbody appears in any selectors (or otherwise affects the selection, e.g. with child selectors).
Jason Mobley says:

2005-12-09 at 03:24

I disagree. A beginner should learn the markup language with the fewest surprises (XHTML). If you write a C program with syntax errors in the source will the compiler just try to guess what you mean and compile it anyway? No! Neither should the parser of markup for the web! HTML is so encumbered with workarounds and quirks that wouldn’t be necessary if the parser would simply let the developer know that they made a mistake (thanks Netscape, Microsoft)! Beginners need to know about their mistakes more than anyone. XHTML has strict rules that are easy to follow. Teach people to use XHTML-only and the UAs will come around.
Pingback: Effair | Billet | Les débutants devraient se tenir loins du XHTML
Lachlan Hunt says:

2005-12-09 at 03:41

Jason, in principle I agree, but for that to work, it requires that XHTML be taught under XML conditions. But the fact is most beginners learn under HTML conditions, in which case everything you say about errors being reported to the author does not apply. If we could work properly in XML conditions and IE, Google and every other browser supported XHTML properly, I agree that we could and should teach XHTML to beginners properly from the start. But until such a time comes and while we’re still teaching and building in an HTML environment, we should be teaching HTML.

The most important issue is that beginners learn and understand semantics and the separation of structure, style and behaviour layers. Once they have such a foundation, it’s relatively easy to learn the additional requirements for XHTML when it is taught properly.
Victor Engmark says:

2005-12-09 at 09:46

To me it seems like you have a point only in case the beginner has to use a CMS which doesn’t support XHTML completely, or will have to maintain HTML tag soup made by others. Then learning and using XHTML will be extra work, but even so it can be very useful, for reasons mentioned below.

I’ve gone from presentational HTML tag soup to semantic XHTML 1.1 and CSS over the last three years, and the latter has a lot of advantages over the former:
– Easier to ensure that it will work the same everywhere; i.e., either it “just works”, or browsers such as Firefox show an error message.
– Shorter markup, which is easier to debug, and ready to be styled by others using CSS.
– No learning of the three different rules of closing tags; i.e., mandatory, optional, or forbidden.
– Easier to create accessible markup
– Future-proof web pages
– Enables manipulation using XSLT if that should ever be interesting

Besides, many of the “issues” mentioned are IMO not valid:
– IE7 may not support XHTML completely, but no browser in existence even supports HTML 4 perfectly. My site works, all the pages I develop at work are just dandy, and I don’t see why lack of perfect compliance should be a barrier (unless you want to develop 1996-style pages)
– It’s great that well-formedness errors are fatal, for the same reason that missing semicolons lead to compilation errors – It teaches you to write unambiguous code/markup.
– Many of the issues (such as encoding) are moot since beginner tutorials typically start with presenting a typical file framework. The finer points, such as doctype differences, are left for the experts.
– The scripting rules are different, but that doesn’t mean they are more difficult. I’m in no position to elaborate on that, but it would be interesting to know from someone who has learned both.

Shameless plug: My personal website uses valid XHTML 1.1, with the only non-standard behavior being that content is served as application/xhtml+xml to browsers which support it (otherwise the server default is used). And it works fine.
Jim says:

2005-12-09 at 20:30

I guess learning HTML is best if pages are to be served in text/html in browsers, which have good recovery mechanisms dealing with HTML errors.

I can see beginners will have no trouble with HTML because of that.

I can see the exact opposite. If the browser tells you immediately every time you make a mistake, then you learn not to make mistakes very quickly. If the browser lets you get away with it, then it will be very hard to get rid of your bad habits.

I’m in two minds about whether beginners should start with HTML or XHTML. XHTML from an authoring perspective is definitely advantageous for the added strictness. But as far as serving is concerned, you need additional expertise to avoid some of the obscure XHTML pitfalls, so HTML is the better choice.

I’ve recently picked up the Kid templating system, and it occurred to me that this would be the best approach for beginners: they author in XHTML (and thus any errors are caught immediately), but you can easily set the final publishing format to HTML, so they don’t have to worry about the problems XHTML causes. If they ever want to switch to XHTML, then a one-line change can do this.

There are three downsides:

1. Added processing requirements: the XSLT transformation doesn’t come for free, it takes time and memory. Still, most beginners have the luxury of low amounts of traffic, so I don’t see this being a problem.

2. Pretty printing. I’m not sure if this has been added to Kid yet, but by default, it generates HTML with no linebreaks, indentation, or anything like that, which could make debugging difficult. If this hasn’t already been added to Kid, then I’m sure it will be soon.

3. Tool complexity. Good luck finding a host with Kid installed. Or, if you pre-process it before uploading, you have to learn how to do that.

So unfortunately, Kid isn’t ideal. But the roadblocks are mainly packaging and support. In a learning environment, these aren’t problems, so I’d probably recommend this approach to somebody teaching a course.
Roger says:

2005-12-12 at 20:51

Wouldn’t it make more sense just to encourage beginners to study under XML conditions, instead of letting them start out at the roots of tag soup?

Firefox, W3’s validator and files named .xml should do the job, right?
Pingback: Bien batido y revuelto » Inconvenientes de XHTML
Pingback: Pervasive Smothering » Blog Archive » links for 2006-01-08
George Mikos says:

2006-01-09 at 01:39

OK. I’m a beginner. I missed the ending tag on the 1st URI.

I disagree.

The W3C wrote HTML’s epitaph in December 1999, more than 6 years ago.

All W3C pages validate as XHTML 1.0 Strict.

It seems the major criticism leveled against using XHTML is that the majority of Web pages are tag soup, anathema with XHTML & its requirement for well-formedness. Well, what if all one’s Web pages are well-formed? What if all pages pass W3C validation? If so, then this criticism becomes moot.

The next major criticism is that beginners don’t understand all the technical concepts elite gurus do: XML, MIME types, DOMs, namespaces, etc. OK, so we don’t. However, the W3C says to beginners: Just do what this short Recommended DTDs to use in your Web document tells you to do, and everything will be OK. Guess what? They’re right. Everything is OK.

The next major criticism is that IE6 does not do XHTML, & that IE7 won’t, either. So what? We beginners won’t attempt 3-column float layouts & therefore need to use IE6-specific hacks due to IE6’s non-compliance with the CSS2 box model. You gurus may need to, but we beginners won’t.

If a document is declared as XHTML 1.0 Strict, then all browsers besides IE6 are forced to comply with that W3C-defined open standard. Thus, I need not concern myself with whether or not Konqueror, Opera, Safari, etc. will render my documents correctly.

Of course, testing in both IE6 and Firefox is a necessity.

I am Tidy’ing my 100+ Web pages, and upgrading them all to XHTML 1.0 Strict. They will all pass the W3C’s HTML validation markup, CSS validation, & link checker.

I’ll not be using any of the sophisticated facilities that XML offers. None. I simply want to publish Web pages that render the same cross-browser. XHTML 1.0 Strict fulfills this purpose.

You should retitle your article: XHTML is for beginners, but HTML is for gurus.
Lachlan Hunt says:

2006-01-09 at 02:03

George, there is far more to it than just ensuring well-formedness and validation. This is one of the major points of the article. Using that list of valid DTDs, which is nothing more than an informative document, as evidence for the W3C officially recommending XHTML is laughable and simply ignoring the technical details about XML, MIME types, the DOM, etc. is a very good argument for why you should not attempt XHTML yet; at least until you’ve decided to learn about them.

As for not supporting Konqueror, Opera, Safari, etc. and only testing in IE and Firefox, that’s a mistake. Those 3 browsers, and many others, have excellent support for standards and, in some ways, far superior support than Firefox does.

Lastly, if you’re not going to be using any XML-only features, what on earth is the point of wasting your time by transforming 100+ documents to XHTML 1?
George Mikos says:

2006-01-09 at 18:29

Lachlan,

The iconic Hixie says: “Unfortunately, IE6 does not support application/xhtml+xml (in fact, it does not support XHTML at all).”

True. IE6 doesn’t support XHTML. IE8 may.

False. IE6 correctly renders one of my documents validated as XHTML 1.1 that uses the application/xhtml+xml MIME type.

We need a common definition of well-formedness. Wikipedia’s XML article, Correctness in an XML document section provides this definition.

You missed my point re well-formedness.

The iconic Hixie says: “Why trying to use XHTML and then sending it as text/html is bad. These are not likely to be problems for authors who regularly validate their pages, but other authors will run into these problems.” Lachy echoes this point: “(W)e should be teaching XHTML from the ground up. That sounds nice in theory, but the reality is that we’re still teaching in a predominately text/html environment, and the fact is: trying to teach XHTML under HTML (tag-soup) conditions is like trying to teach a child to swim by throwing them in the deep end and not realising they’re drowning until it’s too late.”

The primary criticism of XHTML among gurus, it seems, is that documents declared as XHTML but rendered as text/html may be tag soup (ie, not well-formed and valid), & thus anathema.

But, what if my documents validate? This criticism becomes void. Even the iconic Hixie has the integrity to state that his criticisms apply mainly (only?) to the tag soup environment.

So, I have a large (187KB) document that validates as XHTML 1.1 using the application/xhtml+xml MIME type. Both IE6 & FF 1.5 render it just fine. As far as I could determine, they render it exactly the same.

Yes, I am aware of the other pitfalls Hixie and others detail; ergo, I avoid these pitfalls.

Hixie also says this: “Scripts that use document.write() will not work in XHTML contexts. (You have to use DOM Core methods.)”

False. I have a document that validates as XHTML 1.1 using the application/xhtml+xml MIME type that also uses document.write(). It works just fine in IE6 & FF 1.5.

You also missed my point about browsers other than IE6 & FF, which is this: If my documents are declared as XHTML 1.0 Strict, then I can be assured without testing in those browsers that those browsers will render my documents correctly. Yes, IE6 does not do XHTML, but all other browsers do, right?

I disagree with your point re needing to learn all about XHTML before using it. All I needed to learn was the URI of the W3C site that says: To declare your document as XHTML 1.0, add this. That’s what I did to upgrade 2 of my documents to XHTML 1.1: I copied-and-pasted the 1.1 !DOCTYPE, eliminated the lang=”en” from the html tag, and replaced all occurrences of “a name=…” with “a id=…”

Let’s assume something changes, & my cut-and-paste !DOCTYPE, html, & meta tags are no longer valid. Well, I go into my global-search-and-replace HTML editor, open all my documents, and replace the invalid tags with valid ones. This 5-minute exercise is not onerous. Granted, the upload will take a few more minutes, but I suppose I could find the time.

You say: Why use XHTML? My response is: Why not? The rigor that XHTML imposes I orgasmically embrace. Declaring my documents as XHTML 1.0 or 1.1 Strict does no harm. They all work fine in IE6 & FF. What’s the big deal?

My pragmatic opinion based upon decades of experience within this industry is that the virulence directed against Microsoft by the anti-IE6 faction within the guru community tsunamis the technical reasons explicated by Lachy, Hixie, & legions of others why beginners should not use XHTML.

Rest assured: I’m not saying you’re a charter member of the anti-IE6 faction. What I’m saying is that I think the anti-IE6 faction sucks; ergo, I’ve chosen to upgrade to XHTML to spurn them.
Lachlan Hunt says:

2006-01-09 at 22:51

George, please provide some evidence that you have an XHTML 1.1 document served as application/xhtml+xml which is accepted by IE and contains working document.write() statements – I do not believe you. I suspect you’ve simply done as many beginners commonly do and changed the meta element to read <meta http-equiv="Content-Type" content="application/xhtml+xml" />, although that has no effect whatsoever upon the MIME type of the document.

There is only one condition under which I’m aware that IE will appear to accept documents really served as application/xhtml+xml, but even then it treats them as text/html. There is, however, absolutely no condition under which any known browser will support document.write() in XML.
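
To make the point about the meta element concrete, here is a minimal sketch in Python (chosen purely for illustration). It models the decision described above: the parser is selected from the HTTP Content-Type header, and the markup is never consulted. The function names and the sample headers are hypothetical, and real browsers add sniffing rules that this simplified model ignores.

```python
# Simplified model (an assumption, not any browser's actual code):
# the HTTP Content-Type header decides which parser handles the page.

def served_mime_type(http_headers):
    """Return the media type from the Content-Type header, lowercased,
    with parameters such as charset stripped; None if the header is absent."""
    for name, value in http_headers.items():
        if name.lower() == "content-type":
            return value.split(";", 1)[0].strip().lower()
    return None


def parser_for(http_headers):
    """Pick 'xml' for XML media types, 'html' otherwise.
    Falling back to 'html' when no header is sent is an assumption of this sketch."""
    mime = served_mime_type(http_headers)
    if mime in ("application/xml", "text/xml") or (mime and mime.endswith("+xml")):
        return "xml"
    return "html"


# A page whose markup claims application/xhtml+xml in a meta element,
# but which the server labels as text/html.
headers = {"Content-Type": "text/html; charset=ISO-8859-1"}
body = '<meta http-equiv="Content-Type" content="application/xhtml+xml" />'

print(parser_for(headers))  # the body is never looked at
```

In this model the only way to get the XML parser is to change what the server sends, for example `{"Content-Type": "application/xhtml+xml"}`.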
George Mikos says:

2006-01-10 at 01:25

Lachlan,

The Tooltip Demo – Map document is declared as XHTML 1.1, & uses the application/xhtml+xml content-type. Scroll down to the bottom & click the W3C icon: The document validates as XHTML 1.1.

tooltip_gm_map.js uses document.write.

document.write works fine with both IE6 & FF 1.5. There is a problem in FF, however, but not a disabling one. I haven’t yet bugged my ISP to add “text/css css” to their Apache mime.types file. Ergo, FF does not recognize the CSS, but tooltips work OK.

I would guess the reason document.write works is that there is no <?xml version="1.0"?> declaration at the beginning of the document. Is this guess correct?
Lachlan Hunt says:

2006-01-10 at 01:42

George, this is a perfect illustration of why I say XHTML is not for beginners: you have clearly misunderstood several aspects I mentioned in the article and are doing things wrongly. In particular, that page is not served as application/xhtml+xml; it is served as text/html. I intend to post an article about this topic in the next few days, and I suggest you read it when I publish it.

Whether or not document.write() works has absolutely nothing to do with the XML declaration being present or not; it has everything to do with the MIME type used.
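
For readers wondering what "DOM Core methods" look like in practice, here is a rough sketch. It uses Python's xml.dom.minidom, which implements the same W3C DOM Core calls (createElementNS, createTextNode, appendChild) that a browser script would use in place of document.write(). The sample document and the helper name append_paragraph are made up for illustration; this is not the tooltip script discussed above.

```python
from xml.dom.minidom import parseString

XHTML_NS = "http://www.w3.org/1999/xhtml"


def append_paragraph(doc, text):
    """Add a <p> to the document body through DOM Core calls,
    instead of writing a string of markup into the page."""
    body = doc.getElementsByTagName("body")[0]
    p = doc.createElementNS(XHTML_NS, "p")    # lowercase element name, XHTML namespace
    p.appendChild(doc.createTextNode(text))   # text is escaped when serialized
    body.appendChild(p)
    return p


# Hypothetical minimal document, parsed as XML.
doc = parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)
append_paragraph(doc, "Tooltip & more")
result = doc.documentElement.toxml()
print(result)
```

Because the new node is built as a tree node and not as a string, the result stays well-formed whatever the text contains, which is the property document.write() cannot guarantee.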
George Mikos says:

2006-01-10 at 02:01

Well, Lachlan, I am gratified that I stoked your creative juices.

Insofar as your arguments are concerned, I tend to dismiss elitism under any guise. We would all be better served if you stopped trying to prove your superiority.
Lachlan Hunt says:

2006-01-10 at 02:23

I am not trying to prove my superiority, nor do I consider myself an elitist. I was merely disputing your arguments on technical grounds and if you choose to see such arguments as a form of elitism, fine; but if you wish to learn, I suggest you don’t simply dismiss them.
George Mikos says:

2006-01-10 at 17:42

Better, Lachlan.

Let me reiterate my main argument.

I’ve read the iconic Hixie article & others, but find his & other such arguments uncompelling. The primary objections, it seems, are 2:

(1) Documents can be declared as XHTML yet be non-validated, malformed, & tag-souped; &
(2) Facilities that work in HTML might not work in XHTML.

Both these arguments are easily dismissable:

(1) I’ve Tidy’ed & W3C-validated my documents; &
(2) I use none of these facilities; eg, hiding script & style elements from legacy UAs.

Remember, Hixie said this: “These are not likely to be problems for authors who regularly validate their pages, but other authors will run into these problems.”

Not using XHTML because documents might be tag soup is ultimately a hollow argument.

The iconic Hixie seemed to think he needed to rely upon disingenuous arguments, perhaps to pad his article & make a stronger case. For example, he states this:

“In addition, currently, the majority (over 90% by most counts) of the UA market is unable to correctly render real XHTML content sent as text/xml (or other XML MIME types). For example, point IE at:

http://www.mozillaquestquest.com/

Only Mozilla, Mozilla-based browsers such as Netscape 6 and 7, recent versions of Opera, and Safari, are able to correctly render that site. (IE6 shows a DOM tree!)”

Indeed, Hixie is correct. I pointed IE6 at the document and it showed a DOM tree. However, when I added a !DOCTYPE to it, IE6 rendered the document properly. Try it for yourself if you don’t believe me. Both the W3C-Recommended XHTML 1.0 & 1.1 !DOCTYPE’s fix the problem.

Of course, IE6 sucks, & doesn’t do XHTML. Those indisputable facts are givens.

I ran across a problem with IE6 that nobody else has documented — at least, insofar as my search was concerned; ie, it may be documented somewhere but the usual sites that detail IE6 bugs didn’t mention it. But, I fixed it, & moved on.

The situation with IE6 is resoluble: Simply test all documents in IE6 & Firefox, which I do. The nice thing about XHTML 1.0 Strict is that, if documents render acceptably in Firefox, then they should also work OK in all other browsers that do XHTML correctly.

From a pragmatic standpoint, XHTML 1.0 Strict does no harm despite IE6, & does some good.

Your argument that, as a beginner, I should learn all about XHTML before using it is also uncompelling. The simple fact is that my current implementation of XHTML is 100% OK. However, when I finally upgrade all my documents to XHTML, I’ll then have a sound foundation for advancing to the next level & learning all about XHTML & its requisite, concomitant technologies.

October of 2007 will be my 40th anniversary as a software professional. I’ve been around the block more than a few times. Along the way, that harsh mistress called reality has transformed me into a pragmatist.

Sorry, but arguments like yours simply don’t resonate with me, anymore. My jousting at windmills phase is history.
Lachlan Hunt says:

2006-01-10 at 23:45

George, each and every one of your arguments is flawed, though I don’t have the time to painstakingly go through each one. One point I will address: the presence or absence of the DOCTYPE is irrelevant as far as rendering XHTML served as XML in IE is concerned.

My guess is that when you saved the MozillaQuestQuest site to add the DOCTYPE, you gave the file a .html extension, in which case the DOCTYPE is irrelevant insofar as handling as HTML is concerned (ignoring quirks/standards mode). .html extensions are, by default, associated with text/html and when you opened the file in IE, that is the MIME type used. If you use a .xml extension instead, which is associated with either application/xml or text/xml, then you will see a DOM tree in IE regardless of the DOCTYPE. If you’re testing from a server, ensure the MIME type sent is application/xml to test it properly.
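
The extension-to-MIME-type association can be illustrated with a small sketch. Python's standard mimetypes table is used here only as a stand-in for the association the operating system or server applies; the file names are hypothetical, and the exact XML type returned may vary between systems.

```python
import mimetypes


def mime_type_for(filename):
    """Return the MIME type associated with the file's extension,
    ignoring any DOCTYPE or other content inside the file."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


# Same markup, saved under two different extensions.
for name in ("mozillaquestquest.html", "mozillaquestquest.xml"):
    print(name, "->", mime_type_for(name))
```

The lookup never opens the file, so adding or removing a DOCTYPE cannot change the result; only the extension (or, on a server, the Content-Type header) does.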

As for your claim that your XHTML is 100% ok, I find that hard to believe in light of all your flawed arguments and just by looking at the sample you linked to earlier. These are just some of the problems I found:

- Use of named entity references (considered bad practice because it requires a validating parser).
- MIME type declared as text/html in the HTTP headers.
- Attempted use of the meta element for specifying the character encoding and MIME type – this doesn’t work.
- Use of ISO-8859-1 without specifying it in the HTTP headers or in the XML declaration; you need to use UTF-8 or UTF-16 in such cases.
- Use of HTML comments to hide javascript (1 instance).
- Use of document.write() in scripts (plus, they’re outputting uppercase tag-names).
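
Two of the items above, named entity references and undeclared ISO-8859-1, can be reproduced with a short sketch. It uses Python's ElementTree, whose underlying expat parser does not fetch the external DTD, as a stand-in for a non-validating XML parser. That stand-in is an assumption: real browsers may handle these cases differently. The helper names and sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def page(body_text, xml_declaration=""):
    """Build a tiny XHTML document around body_text (illustrative only)."""
    return (
        xml_declaration + DOCTYPE
        + '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>'
        + body_text + "</p></body></html>"
    )


def is_parsed_by_xml_parser(document_bytes):
    """True if the non-validating parser accepts the bytes, False on a parse error."""
    try:
        ET.fromstring(document_bytes)
        return True
    except ET.ParseError:
        return False


# Named entity: defined only in the DTD, which this parser never reads.
named_entity = page("a&nbsp;b").encode("ascii")
# Numeric character reference: needs no DTD.
numeric_reference = page("a&#160;b").encode("ascii")

# ISO-8859-1 bytes with no encoding declared: the parser assumes UTF-8.
latin1_undeclared = page("caf\u00e9").encode("iso-8859-1")
# Same bytes, with the encoding stated in the XML declaration.
latin1_declared = page(
    "caf\u00e9", '<?xml version="1.0" encoding="ISO-8859-1"?>'
).encode("iso-8859-1")
# UTF-8 bytes need no declaration.
utf8_undeclared = page("caf\u00e9").encode("utf-8")

for label, data in [
    ("named entity", named_entity),
    ("numeric reference", numeric_reference),
    ("ISO-8859-1, undeclared", latin1_undeclared),
    ("ISO-8859-1, declared", latin1_declared),
    ("UTF-8, undeclared", utf8_undeclared),
]:
    print(label, "->", is_parsed_by_xml_parser(data))
```

None of these failures show up while the page is served as text/html, because the HTML parser recovers silently; they only surface once the document is actually handled as XML.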
George Mikos says:

2006-01-12 at 16:51
Lachlan,

Sorry. Hixie was disingenuous re http://www.mozillaquestquest.com.

Let’s cite the iconic Activating the Right Layout Mode Using the Doctype Declaration article by Henri Sivonen. Note the link to the iconic Hixie article.

“I am not recommending any of the XHTML doctypes, because serving XHTML as text/html is considered harmful.
- "Document contains tag soup and/or relies on quirks. (Authoring new pages like this is a bad thing.) No doctype
- Document validates as HTML 4.01 Transitional (may contain deprecated markup) and works with the CSS2 box layout model (for compatibility with those Mozilla-based browsers that activate the full standards mode)… HTML 4.01 Transitional …
- Document validates as HTML 4.01 Strict and works with the CSS2 box layout model… HTML 4.01 … strict"
If the reverse is true, then, since the above page has no doctype, it must be tag soup.

You keep saying things like, "Each and every one of your arguments is flawed, though I don’t have the time to painstakingly go through each one."

Well, I’ve expressed the same argument 3 times, and you haven’t responded. Let me paraphrase it for the 4th time: The iconic guru or iconic guru-in-training (Which are you?) iconic argument that "Sending XHTML as text/html Considered Harmful" is valid only if the XHTML is tag soup.

Your article paraphrases: "(T)rying to teach XHTML under HTML (tag-soup) conditions is like trying to teach a child to swim by throwing them in the deep end and not realising they’re drowning until it’s too late."

However, I contend that this argument is meritless if the XHTML is not tag soup; ie, is W3C-validated.

Let’s reboot. What I find offensive in your article is your self-proclaimed guru status: "… this only applies to beginners and to those of us teaching them… we must simply accept that they’re not ready and teach them HTML instead".

Sorry, but nobody appointed you my teacher, least of all me.

OK. The above comments re your hubris were necessary. Decades ago, I went thru my hubristic self-appointed guru phase, too. Now that I’m an adult, one of my missions is to try to teach others like you what I’ve learned re this posture.

Carrot and stick. Stick and carrot. Carrot and stick.

I couldn’t agree more strongly with your statement: "… but are new to the concept of developing with standards."

The key is standards, not HTML vs XHTML.

I’ve been at it since June upgrading my site to XHTML 1.0 Strict, but XHTML is only part of the upgrade. Best Practices are, too. Among the ones I’ve implemented are:
- I use no tags or attributes deprecated in HTML 4.01.
- Tables are not used for layout, only float and absolute positioning.
- All presentation elements are defined using CSS.
- All CSS styles are defined in external style sheets and not in individual documents; ie, the style tag appears nowhere, either in the head or body section.
- All documents use a strictly defined template.
- Although font sizes defined in my style sheets use pixels, I could easily upgrade to dimensionless font sizes. In fact, I have a document that uses the same styles I use in all other documents, but where the font sizes are dimensionless.
There are exceptions to the above rules, but there are cogent rationales for each. For example, the Demo document I cited in an earlier post defines styles in the document’s head section, but this document is intended as a tutorial & has the relevant styles included within a textarea box.

By the way, did you know that W3School’s main page for their XHTML Tutorial uses tables for layout?

Thanks for pointing out 2 problems in the aforecited Demo program: the use of comments to hide scripts, and the fact that the HTML created by the JavaScript is essentially tag soup.

Here’s the deal. Since all my documents use a strict template, I can fix the problem with the comments hiding scripts using global search-and-replace, a trivial process; ie, every one of my documents uses the exact same syntax for these comments. In 10 minutes, the corrected documents will be uploaded. Plus, I plan to rewrite the JavaScript. It does suck.

The fact that I use a strict document template bears upon another point I made: If the cut-and-paste !DOCTYPE and other statements in the head section need to be changed, I can use global search-and-replace here, too.

Remember: I’ve made the point repeatedly that my documents conform to W3C-defined standards, and W3C-validate.

Sure, I’ve C&P’ed what the W3C recommends in their Recommended DTDs to use in your Web document article. You may object to what they recommend, and may even be able to prove that the W3C errs. However, I consider the W3C to have more gravitas than you.

I may have ranted and railed against Microsoft longer than you’ve been alive. I still do. However, Microsoft contributes positively to the US trade deficit. I can live with IE6 not doing XHTML. The workarounds are trivial.

Of all the posts here, Victor Engmark’s rocks the most. Why? Because he is a pragmatist like me rather than an anti-Microsoft utopian Marxist.

Carrot and stick.

You and your fellow iconic or iconic-in-training gurus would be better served if you lost your condescending and patronizing attitude. If the intent of your article was to prove to your fellow iconic or iconic-in-training gurus that you are a certified master of XML, you’ve proven that point beyond dispute. However, if your intent was to teach us the-masses-are-asses beginners, you’ve failed.

In my opinion, being ruthlessly OCD about developing and using rigorous standards is far more important to ‘teach’ than teaching “XHTML is not for beginners”. Standards should be a precursor for everything.
Lachlan Hunt says:

2006-01-12 at 22:45

George, IMHO, you would do better to listen than to rant and rave about topics on which you think you know more than you actually do. Also, I’d appreciate it if you kept your insulting comments and attitude to yourself.

MozillaQuestQuest doesn’t need a DOCTYPE to trigger standards mode, it’s delivered as XML and DOCTYPE switching only applies to text/html. XML documents always use standards mode.

Sending XHTML as text/html is harmful regardless of whether it validates or not. I repeat: there is more to XHTML than just ensuring well-formedness and validity! That is why your rebuttal, that the argument “is valid only if the XHTML is tag soup”, is wrong.

In fact, much of the damage has already been done. For example, browsers will never support SGML SHORTTAG syntax in HTML because of the number of authors sending XHTML as text/html and thus using XML empty element syntax in HTML.

Did you know that W3Schools is not affiliated with the W3C and that both their site and their tutorials are often criticised in web standards circles!?

Regarding the recommended DTD list, that list contains HTML 4, XHTML 1.x and many others, and just because their template is an XHTML document doesn’t mean that the W3C strongly recommends XHTML over HTML.

“Remember: I’ve made the point repeatedly that my documents conform to W3C-defined standards, and W3C-validate.”

While your documents may validate, they certainly don’t conform to all the standards that you so wildly proclaim. In fact, I’m yet to see an XHTML document you’ve published that is served as XML; they’ve all been text/html!

You’re right about the key being standards, but when it comes to XHTML vs HTML, that is all about standards. There’s no point in trying to teach standards with XHTML if XHTML is being taught wrongly, which in most cases it is.
Henri Sivonen says:

2006-01-13 at 23:02

“If the reverse is true, then, since the above page has no doctype, it must be tag soup.”

The advice on choosing a doctype is scoped to text/html: “Here are simple guidelines for choosing a doctype for a new text/html document” (emphasis added). Therefore, the advice does not apply to MozillaQuestQuest nor does doctypelessness make XML tag soup. (I guess I should make the scope of my page even clearer.)
By the way, did you read all the way to the end including the addendum?
Iconic?
George Mikos says:

2006-01-14 at 19:02

Henri,

Your article is iconic. It pops up near the top of sundry Google searches: #8 on “DOCTYPE”, #2 on “DOCTYPE quirks standard”, #8 on “DOCTYPE HTML XHTML”, etc.

Plus, expressed in the most earnest and guileless way possible, it is simply an outstanding article.

Yes, what I extrapolated from your article is febrile. Of course, the reverse is not true. Your article is fine as is.

Yes, I had read your addendum but its total import escaped me until now. You’re right. We need a break with the past and, as you might say, doing stochastic rather than heuristic.
George Mikos says:

2006-01-14 at 20:48

OK, Lachlan, time to tone it down.

A definition is in order:

“Sandbag: Downplay one’s ability (towards others) in a game in order to deceive, as in gambling.”

Since last June, I’ve been educating myself on XHTML, and upgrading my Web site to it. From a technical standpoint, my Web site is unflourished. On a guru scale from 0 to 100, 100 being best, it might rate single-digit, maybe even low single-digit.

About 2 months ago, I encountered a vexing problem, so I posted it on some forum. One of the left-field responses was: “Why are you using XHTML?” I responded “Why not?” The response was: “Because Hixie says not to.” Hixie’s article started a cascade of events that ended on your blog. More later.

Although I couldn’t express my overriding goal for my upgrade to XHTML 2 months ago in simple terms, I can do so now: What I’m doing is future-proofing my documents when XML is the universe. That had been my goal all along, but I couldn’t express it in these terms until now.

I’ve made a concerted effort to do the upgrade the right way; specifically, using industry-accepted standards and Best Practices. I perused hundreds of articles to determine what those standards and practices are. Slowly and incrementally, I embraced them and tried to implement them.

I like to keep things clean and simple. In fact, I try to use the minimum subset of software technology to get the job done. However, what I do use, I like to use extraordinarily well. Doing so requires adopting ruthless & pitiless standards. No problem. I am nano-OCD on standards. Re standards, Monk on the USA network has no idiosyncratic quirks compared to me.

Now, back to you and Hickson and Henri. You and your cohorts comprise a large (majority?) block within the senior Web developer community that strongly recommends against using XHTML. Your arguments seemed defensible, but I wasn’t convinced. However, our dialogue has been clarifying, not specifically for its content but mainly for its catalytic nature. It made me think. Writing makes me think. Exchanging written thoughts with a real person makes me think even harder. Now, I understand this issue far better.

What I did yesterday was convert one of my documents to XML to vet the guts of my upgrade to XHTML and use Firefox to render it. The document rendered just fine. I encountered only 3 problems: (1) UTF-8; (2) The slightly different way XML treats the body tag and the CSS margin attribute; and (3) A JavaScript execution error in other-authored code. The first 2 problems took about 15 minutes to correct, and the last should be easy to fix using the JavaScript debugger in the Web Developer toolbar.

While the XML document rendered OK locally, it didn’t after I uploaded it. Unfortunately, my ISP’s Apache server has misconfiguration problems: Firefox won’t render even DOCTYPE-less documents since their server doesn’t access the CSS. Mozilla has documented the fix here in their Incorrect MIME type for CSS files article.

Bottom line: What I proved to myself is that I can do XML and XHTML without learning all about XHTML. However, I do need to implement standards.

For example, based upon Mozilla’s Properly Using JavaScript and CSS in XHTML documents article, I plan to externalize all CSS and JavaScript. This article references the W3C HTML Compatibility Guidelines article. That article recommends using UTF-8 encoding. I trial-validated a few of my documents in UTF-8. There were errors, but they were easily correctable.

Finally, even though I tried to completely separate content from presentation using CSS, there were exceptions; eg, bolding and italicizing. Not good. I plan to check all unique HTML tags I use to make sure that any that do presentation are replaced with CSS.

So, does my experience invalidate your article? Yes, but only in my case. What I do and how I do it is certainly not extrapolatable to the world.
chris bennett says:

2006-01-29 at 04:38

Does anyone have any suggestions on helpful sites to actually learn best practices or is this only a head-butting session?