{"id":102,"date":"2006-01-11T13:06:21","date_gmt":"2006-01-11T13:06:21","guid":{"rendered":"http:\/\/lachy.id.au\/log\/2006\/01\/content-type"},"modified":"2006-04-30T23:26:00","modified_gmt":"2006-04-30T23:26:00","slug":"content-type","status":"publish","type":"post","link":"https:\/\/lachy.id.au\/log\/2006\/01\/content-type","title":{"rendered":"Content-Type"},"content":{"rendered":"<p>When it comes to the web, one of the most important yet least understood concepts\r\n\tis the media type of a file and, for text files, the character encoding.  Raise\r\n\tyour hands now if you\u2019ve ever been guilty of including the following <code>meta<\/code> element\r\n\t(or equivalent) in an HTML or XHTML document:<\/p>\r\n\r\n<pre><code>&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text\/html;charset=ISO-8859-1&quot;&gt;<\/code><\/pre>\r\n\r\n<p>Anyone who has ever created an HTML document and did not raise their hand\r\n\tto that question is a liar \u2014 every single HTML author in the world has used\r\n\tit and, today, I am going to explain what it does and <em>does not<\/em> do,\r\n\tand explain what you <em>should<\/em> use instead.<\/p>\r\n\r\n<h3 id=\"ctype-httpresp\">HTTP Response Headers<\/h3>\r\n<p>HTTP response headers are sent along with every single HTTP response and contain\r\n\tmetadata about the file being sent.  
The response header contains a number of\r\n\theader fields used to specify a variety of information such as the last-modified\r\n\tdate, content length, encoding information and, in particular, the <code>Content-Type<\/code>.<\/p>\r\n<p>Each header field appears on a new line and takes the following format (whitespace\r\n\tis optional):<\/p>\r\n<pre><code>Header-Field: value; parameter=parameter-value<\/code><\/pre>\r\n<p>There are various tools available for you to examine the HTTP headers sent\r\n\tby your server, such as the <a href=\"http:\/\/chrispederick.com\/work\/webdeveloper\/\">Web\r\n\tDeveloper toolbar<\/a>, the <a href=\"http:\/\/livehttpheaders.mozdev.org\/\">Live\r\n\tHTTP Headers extension<\/a>,\r\n\t<a href=\"https:\/\/www.fiddlertool.com\/fiddler\/\">Fiddler<\/a> or an online tool like the <a href=\"http:\/\/cgi.w3.org\/cgi-bin\/headers\">W3C\u2019s\r\n\tHTTP HEAD service<\/a>.<\/p>\r\n\r\n<h3 id=\"ctype-what\">What is <code>Content-Type<\/code>?<\/h3>\r\n<p><code>Content-Type<\/code> is an HTTP header field that the server uses to specify\r\n\tthe type of file being sent, and that the browser uses to determine how to\r\n\tprocess the file it receives.  The field\r\n\tvalue is a MIME type, preferably one registered with IANA, followed by zero\r\n\tor more parameters.<\/p>\r\n<p>For HTML documents, this value is <code>text\/html<\/code> with an optional <code>charset<\/code> parameter. \r\n\tTake a look at the <code>meta<\/code> element above and you will see the value of the <code>content<\/code>\tattribute contains this MIME type and the <code>charset<\/code> parameter, separated by a\r\n\tsemicolon, which matches the format of the HTTP header field value.  
Thus,\r\n\tthe HTTP <code>Content-Type<\/code> header field should look something like this:<\/p>\r\n<pre><code>Content-Type: text\/html; charset=UTF-8<\/code><\/pre>\r\n<p>Although, technically, the <code>charset<\/code> parameter is optional, it should always\r\n\tbe included and set to the correct value.<\/p>\r\n\r\n<h3 id=\"ctype-metaelement\">The <code>meta<\/code> Element<\/h3>\r\n<p>The <code>meta<\/code> element in HTML has two attributes of interest in this case: <code>http-equiv<\/code>\tand <code>content<\/code>.  The <code>http-equiv<\/code> attribute, which was designed as a method to include\r\n\tHTTP header information within the document, contains the name of the header\r\n\tfield and the <code>content<\/code> attribute contains its value.<\/p>\r\n<p>The intention was that it be used by HTTP servers to create\/set real HTTP\r\n\tresponse headers prior to sending the document, but the reality is that there\r\n\tare none (at least none that I\u2019m aware of) that ever do this.  It was not really\r\n\tintended for processing by user agents on the client side, although the HTML 4.01\r\n\tsection on <a href=\"http:\/\/www.w3.org\/TR\/html401\/charset.html#h-5.2.2\">specifying\r\n\tthe character encoding<\/a> states that user agents should,\r\n\tin the absence of information from a higher-level protocol, observe the\r\n\t<code>meta<\/code> element for determining the character encoding.<\/p>\r\n<p>It is, however, not used by any user agent for determining any other HTTP\r\n\theader information and thus including it for anything but <code>Content-Type<\/code> is nothing\r\n\tshort of completely useless, regardless of the examples given in the HTML 4.01\r\n\trecommendation.<\/p>\r\n\r\n<h3 id=\"ctype-contentattr\">The <code>content<\/code> Attribute<\/h3>\r\n<p>When used for specifying the <code>Content-Type<\/code>, the <code>content<\/code> attribute\r\n\tincludes both the media type and the <code>charset<\/code> parameter, yet it is only ever\r\n\tused by browsers to determine the character 
encoding.  Despite the popular misconception, <em>it\r\n\tis not used to determine the MIME type<\/em>, as the MIME type needs to be known\r\n\tbefore parsing the file can begin and (as always) the information specified\r\n\tby a higher-level protocol (like HTTP) takes precedence.<\/p>\r\n<p>The <code>Content-Type<\/code> header is always included for HTML files sent over HTTP and\r\n\tit must at least contain the MIME type: <code>text\/html<\/code>.  In the absence of this header,\r\n\tHTTP provides some guidance on how to handle the file, but it will likely\r\n\tend up being treated as <code>application\/octet-stream<\/code>, which typically results in\r\n\tthe user agent prompting the user for what to do with the file.<\/p>\r\n<p>Therefore, regardless of the MIME type included within the <code>meta<\/code> element, the\r\n\tMIME type used for HTML documents will always be <code>text\/html<\/code>.  (XHTML documents\r\n\tserved as <code>text\/html<\/code> are considered to be HTML documents for the purpose of this\r\n\tdiscussion).  This makes the practice of using the following within XHTML documents\r\n\tcompletely useless for specifying the MIME type:<\/p>\r\n\r\n<pre><code>&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;application\/xhtml+xml; charset=UTF-8&quot; \/&gt;<\/code><\/pre>\r\n\r\n<p>In fact, for XHTML served as XML, this <code>meta<\/code> element is not used at all \u2013 not\r\n\teven for the character encoding.  In such cases, XML rules apply and the encoding\r\n\tis determined based on protocol information (e.g. HTTP headers), the XML declaration\r\n\tor the Byte Order Mark.<\/p>\r\n\r\n<h3 id=\"ctype-determineenc\">Determining Character Encoding<\/h3>\r\n<p>As mentioned, browsers do make use of the <code>meta<\/code> element for determining the\r\n\tencoding in HTML.  
However, when the document is served over HTTP, this is in\r\n\tdirect violation of the HTTP\/1.1 protocol [<a href=\"http:\/\/www.ietf.org\/rfc\/rfc2616\" title=\"Hypertext Transfer Protocol -- HTTP\/1.1\">RFC\r\n\t2616<\/a>] which specifies a default\r\n\tvalue of <code>ISO-8859-1<\/code> for <code>text\/*<\/code> subtypes.  That default is itself in violation of\r\n\t<a href=\"http:\/\/www.ietf.org\/rfc\/rfc2046\" title=\"Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types\">RFC 2046<\/a>,\r\n\twhich specifies <code>US-ASCII<\/code>, but the discussion of this issue is best saved for\r\n\tanother post.<\/p>\r\n<p>Additionally, for <code>text\/*<\/code> subtypes, web intermediaries are allowed to transcode\r\n\tthe file (i.e., convert one character encoding to another).  If such an intermediary\r\n\tassumes the default encoding while a different one is declared inline (which\r\n\tthe intermediary would not parse), the text may end up corrupted.  For these reasons,\r\n\tit is not recommended that inline encoding information be relied upon in <code>text\/html<\/code>. \r\n\t(Interestingly, these same reasons apply to the use of <code>text\/xml<\/code>, which is partly\r\n\twhy <code>application\/xml<\/code> is recommended over <code>text\/xml<\/code>.)<\/p>\r\n\r\n<h3 id=\"ctype-sethttp\">Setting HTTP Headers<\/h3>\r\n<p>Although it may seem much easier to copy and paste the <code>meta<\/code> element\r\n\tinto every HTML document published, it is almost as trivial to configure\r\n\tthe server to send the correct HTTP headers.  The method to do so will vary\r\n\tdepending on the server or server-side technology used, but specific information\r\n\tcan usually be found in the appropriate documentation.  
The <a href=\"http:\/\/www.w3.org\/International\/\">W3C\u2019s\r\n\t<abbr title=\"Internationalisation\">I18N<\/abbr> activity<\/a> has provided\r\n\ta useful summary of <a href=\"http:\/\/www.w3.org\/International\/O-HTTP-charset\" title=\"The HTTP charset parameter\">how\r\n\tto specify the encoding information<\/a> using various servers\r\n\tand languages.<\/p>\r\n","protected":false},"excerpt":{"rendered":"When it comes to the web, one of the most important yet least understood concepts is the media type of a file and, for text files, the character encoding. Every single HTML author in the world has used the meta element for specifying this information and, today, I am going to explain what it does and does not do, and explain what you should use instead.","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[2,8],"tags":[],"_links":{"self":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/102"}],"collection":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/comments?post=102"}],"version-history":[{"count":0,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/posts\/102\/revisions"}],"wp:attachment":[{"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/media?parent=102"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/categories?post=102"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lachy.id.au\/log\/wp-json\/wp\/v2\/tags?post=102"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}