Category Archives: User Agents

Browsers, search engines and other user agents

Validating (X)HTML With IE Using File Upload

Warning: The following describes how to modify the registry in order to trick Windows XP SP2 into allowing text/html to be sent with file uploads. This hack has known side effects which may affect other applications running on your system, some of which are discussed in the comments. As a result, I accept no responsibility for damage caused to your system as a result of applying this hack, and this solution is provided as-is, with no guarantee, warranty or support. If you do not understand the registry, or how to reverse any change, then do not apply these changes. Use them at your own risk.

Update: This technique is no longer required for HTML. Please see Validation by file upload and Internet Explorer on WinXP SP2

After downloading Windows XP Service Pack 2 recently, I was shocked to find that IE was now sending HTML documents with a .htm or .html extension as text/plain, causing the W3C Markup Validator to issue this warning message:

Sorry, I am unable to validate this document because its content type is text/plain, which is not currently supported by this service.

The Content-Type field is sent by your web server (or web browser if you use the file upload interface) and depends on its configuration. Commonly, web servers will have a mapping of filename extensions (such as “.html”) to MIME Content-Type values (such as text/html).

That you received this message can mean that your server is not configured correctly, that your file does not have the correct filename extension, or that you are attempting to validate a file type that we do not support yet. In the latter case you should let us know that you need us to support that content type (please include all relevant details, including the URL to the standards document defining the content type) using the instructions on the Feedback Page.

This essentially means that it was impossible to validate any local HTML document using IE. This is really annoying, especially for any unfortunate developers who are forced to develop using only IE at work. Although I do pity anyone in that situation, there is now some relief!
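
To make the validator's message concrete: with the file upload interface, the browser labels the uploaded file with its own Content-Type header inside a multipart/form-data body. The following is a rough Python sketch of that labelling, not a description of what IE does internally. The helper name `upload_part`, the field name `uploaded_file` and the boundary string are all assumptions made up for illustration.

```python
import mimetypes


def upload_part(field, filename, body, boundary="example-boundary"):
    """Build one multipart/form-data part the way a browser might label it.

    The content type is guessed from the filename extension, which is the
    kind of extension-to-MIME mapping the validator message refers to.
    """
    guessed = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {guessed}\r\n"
        "\r\n"
        f"{body}\r\n"
        f"--{boundary}--\r\n"
    )


# Hypothetical field name and document body, for illustration only.
part = upload_part("uploaded_file", "index.html", "<p>Hello</p>")
print(part)
```

The validator only sees the `Content-Type` line inside that part, so if the browser writes text/plain there, the file is rejected no matter what it actually contains.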

After spending about half an hour searching through the registry for any setting that could be causing .html files to be sent as text/plain, I realised that it would be easier to find the setting for other content types that do work, such as CSS. So, I found the setting for that, modified it, and tested. When the CSS Content Type value was set to anything but text/html, IE uploaded the file with that MIME type. Thus, I came to the conclusion that it was not that the setting was incorrect, but that something in Windows security was preventing any text/html content from being sent, by changing it to text/plain on the way.

After that, I tried setting the value for .html files to another type that the validator may support, such as text/sgml or application/sgml, but sadly, without luck! But, just before giving up all hope, I realised that perhaps Windows security, being as insecure as ever, was only checking for an exact match on the content type being set by IE with file uploads. I was correct!

In a normal HTTP header, the Content-Type can also include a charset parameter. For example:

Content-Type: text/html; charset=UTF-8

So, I figured, what if I want IE to send a charset parameter as well? I set the Content Type value in the registry to the one above, and it worked perfectly: the file validated! However, the charset will not always be UTF-8, or any other particular charset for that matter, so I removed the charset parameter and was left with the value text/html; (note the trailing semicolon). That extra little semicolon on the end is enough to bypass Windows security and validate any HTML file.
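
Why the trailing semicolon works can be sketched in a few lines of Python. This is only a toy model of the behaviour I inferred above: `exact_match_filter` is a hypothetical stand-in for whatever SP2 actually does, written on the assumption that it compares the whole string. The parsing side uses Python's standard `email.message.Message`, which, like most header parsers, ignores everything after the first semicolon when reading the media type.

```python
from email.message import Message


def parsed_media_type(header_value):
    """Media type as a lenient header parser sees it (parameters ignored)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type()


def exact_match_filter(header_value):
    """Hypothetical stand-in for the SP2 check.

    Assumption: only a value that is exactly 'text/html' gets downgraded.
    """
    return "text/plain" if header_value == "text/html" else header_value


for value in ("text/html", "text/html; charset=UTF-8", "text/html;"):
    sent = exact_match_filter(value)
    print(f"{value!r} is sent as {sent!r} and parsed as {parsed_media_type(sent)}")
```

Under that assumption, only the bare text/html value is rewritten. The other two slip past the string comparison, yet a receiving parser still reads them as text/html.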

Then, I remembered that XHTML documents could not be validated with IE either. So, I went to the registry key for .xhtml files, added the application/xhtml+xml MIME type, tested and, guess what, it worked!

I have exported the required settings from the registry and they are available here. IE6-SP2-Content-Type-text-html.reg will fix the value for text/html, and IE6-SP2-Content-Type-application-xhtml+xml.reg will add the MIME type for XHTML documents. Download them both, inspect their contents to ensure that they are safe, and apply them by launching them. You will be prompted by Windows to confirm that you want to apply the settings.
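
If you would rather see roughly what such a file contains before launching anything, here is a small Python sketch that builds the text of a .reg file. It does not reproduce the files linked above. It assumes the setting is the `Content Type` string value under the per-extension key in `HKEY_CLASSES_ROOT`, and the helper name `content_type_reg` is made up for this example. Generating a second file with the plain value gives you a way to reverse the change.

```python
def content_type_reg(extension, content_type):
    """Return .reg file text that sets the 'Content Type' value for one extension.

    Assumption: the value lives under HKEY_CLASSES_ROOT\\<extension>.
    """
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[HKEY_CLASSES_ROOT\\{extension}]\r\n"
        f'"Content Type"="{content_type}"\r\n'
    )


# The workaround value, and the plain value to restore afterwards.
apply_fix = content_type_reg(".html", "text/html;")
undo_fix = content_type_reg(".html", "text/html")
xhtml_fix = content_type_reg(".xhtml", "application/xhtml+xml")

print(apply_fix)
print(undo_fix)
print(xhtml_fix)
```

Compare the printed text with the downloaded files before applying either, and check the current value in your own registry first so that you know what to restore.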

Update: For any users of ICQ: If you change the text/html value to text/html; then each time the ICQ advertisement rotates, you may be prompted to save the file, because it is an unknown file type. I don’t know why this happens, because IE still works the same as always (full of bugs!), but for some reason it affects ICQ. I recommend you only apply that workaround on computers that you do not use ICQ on, or else change it each time you need to validate with IE.

Safari’s Pseudo-Solution

In light of the recent backlash against Safari’s HTML extensions, Dave Hyatt has come up with what he considers to be a reasonable solution that addresses most of the concerns and potential solutions raised by many, such as Eric Meyer and Tim Bray.

However, his solution involves extending HTML by adding an xmlns attribute, which is supposed to only be used in real XML, such as XHTML.

Seriously, what is the point of adding it to HTML? Why not just do it correctly with XHTML? I’ve heard the arguments that it’s not as easy to learn, and authors are already familiar with HTML 4.01, but I disagree.

Safari supports XHTML, and since these extensions are aimed at being used in Apple’s new Dashboard, there is no reason to follow the WHAT WG decision to be bugwards-compatible with IE, and thus extend HTML, especially when these additions are presentational, as I discussed earlier, and commented on again in Eric Meyer’s latest post on the topic.

Although Dave Hyatt does mention:

…the benefit comes when you switch to real XML. In the XML implementation, the namespace is completely real and effectively maps to a new language…

I think the Safari team should just go all the way, and implement these extensions purely as an XHTML module. However, I would prefer that the presentational additions, such as the new composite attribute for the <img/> element, were actually done as proprietary extensions to CSS, e.g. -safari-composite: source-over;.

In conclusion, this is a quick fix to address recent concerns, which just isn’t quite good enough. It’s a pseudo-solution that’s come out of almost complete laziness to do things correctly!

Exploring Safari’s HTML Tag-Soup Extensions

Like everyone else, I thought the days of browser vendors adding proprietary extensions to HTML were over. Sadly, I was wrong: Safari has decided to join in on the game. They’ve now introduced a <canvas> element as well as a new composite attribute for the <img> element. Not only are both of these presentational, they’ve been added to HTML while still using the HTML 4.01 doctypes.

Of course, Dave Hyatt attempted to explain, in a follow-up post, the reasons for making these extensions to HTML rather than using XHTML and adding them in a different namespace. Also, as he points out, others suggested that they should have used SVG, yet he fails to give any valid reason for not doing so, except to say that it would basically have been too difficult and time consuming. IMO, that’s just plain lazy, and I would have expected better from Apple. Sorry Dave, I have to agree with Eric Meyer on this one: this is a big mistake, and I think the Safari team should hang their heads in shame!