This rationale is written in defence of a technically sound and reasoned approach to dealing with the Content-Language pragma directive issue within the HTML Working Group. ISSUE-88 is a request for permitting multiple language tags to be used as the value of the Content-Language pragma directive. This article argues that this change proposal is unsupported by logic or reason, and that resolving in its favour would have an overall negative effect for both authors and implementers.
Summary
This summary provides an overview of the arguments made throughout this article. The supporting rationale for these arguments is presented later.
- The change proposal is based upon the false premise that the Content-Language HTTP header and pragma directive are equivalent.
- The HTTP header is used to declare the languages of the intended audience; the only defined function of the pragma directive is to be used as a fallback language in the absence of the lang attribute.
- The use of the pragma directive as part of server configuration is out of scope for HTML. Specific server side implementation choices need not affect the conformance definition.
- The pragma directive only fulfils its purpose of providing a fallback language when one language tag is specified. Multiple language tags are, by definition of the implementation requirements, not useful or beneficial.
- There are no reasons given for why it is beneficial to leave the pragma directive in the document when the lang attribute is present on the root element.
- Failing to offer a warning about its presence in all cases would continue to mislead the author about its legitimacy.
- The inconsistency of when warnings are issued would be confusing to authors. It is better to offer a consistent warning about the presence of a redundant feature.
- The defined effect, per the implementation requirements, of declaring multiple language tags is identical to that of omitting the pragma directive entirely. No reasons are given to explain why declaring multiple language tags is useful.
- The syntax of the Content-Language HTTP header field is not affected by the definition of the distinct Content-Language pragma directive in HTML, with which it only shares a common name and does not share significant functionality. It is reasonable for this distinct feature to use a distinct conforming syntax that is suitable for its purpose.
- No reason is given explaining why only emitting the warning under specific circumstances, as opposed to the current specification requirement, would serve better in encouraging authors to use the lang attribute instead.
- The proposed replacement specification text contains unjustified changes, inconsistencies and unimplementable requirements, and is overall inappropriate for use in the specification.
- The claimed positive effects are unsupported by evidence and, in several cases, blatantly incorrect.
- In practice, very few authors use multiple language tags in the pragma directive, and doing so is not useful. Restricting the syntax to one language would not have a significant negative impact.
Difference Between Content-Language HTTP Header and Pragma Directive
The premise of the change proposal is that the Content-Language HTTP header field is functionally equivalent to the Content-Language pragma directive using the meta element. This premise is used to support the idea that both should share the same syntax and client side processing requirements. However, this premise is demonstrably wrong, and thus the change proposal is unsupported by evidence and must be rejected.
In order to demonstrate the differences between the HTTP header and the pragma directive, it is necessary to analyse the purpose and functionality of each and see how they compare.
Declaring the Language of the Intended Audience
The HTTP Content-Language header field is used by HTTP servers to announce the language of the intended audience for a given resource representation. This and other related information exchanged between the client and server can be used for content negotiation based on language. When the server does this, it is important for this information to be included in the HTTP header, where it can be seen by both the client and other intermediary servers.
The information declared within the document using the pragma directive is unsuitable for this purpose, as it will not be parsed by intermediary servers that would otherwise utilise the information for caching purposes.
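To illustrate why this information belongs in the HTTP response rather than in the document, here is a rough Python sketch of server side language negotiation. The function names (parse_accept_language, range_matches, negotiate) are hypothetical, and the matching is simplified: it takes the highest matching quality value rather than fully implementing the RFC 2616 matching rules, and it ignores most syntax edge cases. The point is only that the chosen language ends up in the Content-Language response header, next to Vary: Accept-Language, where caches can see it without parsing the body.

```python
from typing import Dict, List, Tuple

def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value such as 'fr;q=0.9, en;q=0.5'
    into (language-range, quality) pairs. Simplified parsing."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        language_range, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        ranges.append((language_range.strip().lower(), quality))
    return ranges

def range_matches(language_range: str, tag: str) -> bool:
    # A range matches a tag if it equals it, or is a prefix of it followed
    # by "-"; "*" matches anything.
    tag = tag.lower()
    return (language_range == "*" or tag == language_range
            or tag.startswith(language_range + "-"))

def negotiate(accept_language: str, available: List[str]) -> Dict[str, str]:
    """Choose a representation (assumes at least one is available) and build
    the response headers that describe it to clients and caches."""
    best, best_quality = available[0], 0.0
    ranges = parse_accept_language(accept_language)
    for tag in available:
        for language_range, quality in ranges:
            if quality > best_quality and range_matches(language_range, tag):
                best, best_quality = tag, quality
    return {"Content-Language": best, "Vary": "Accept-Language"}

print(negotiate("fr;q=0.9, en;q=0.5", ["en", "fr"])["Content-Language"])  # fr
```

A meta element inside the chosen representation could carry the same value, but an intermediary cache deciding which stored response to serve only ever looks at the headers.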
Server Configuration
It has been claimed that the information declared using a pragma directive within the document may be parsed by some server implementations, which subsequently process and echo the value in the Content-Language HTTP header field. Since this header field is allowed to contain multiple language values, it is claimed that permitting only one language in the pragma directive limits this ability. However, no evidence has been presented to demonstrate how widely used this feature is, nor why such a feature should even be defined within HTML.
This is a layering violation because information intended for server side processing, and specific implementation details thereof, should not unnecessarily affect the conformance definition of client side HTML. That is, it is out of scope for HTML, as a client side markup language, to define specific processing requirements or features to be used by servers for implementing HTTP features. There is also no inherent need for interoperability between different back end implementation details.
Defining the pragma directive in a way that is optimised for specific server implementation details would be analogous to, for example, defining an ASP specific feature within HTML for use on Microsoft IIS platforms. While server implementations are otherwise free to make any design decision, those design decisions need not affect HTML conformance requirements.
Default Document Language
In practice, Content-Language used within the meta element in HTML serves as client side metadata. The functionality of Content-Language in this case is restricted entirely to the purpose of specifying a fallback language, to be used in the absence of the lang attribute. This purpose differs significantly from the purpose of declaring the languages of the intended audience.
Declaring multiple languages for the document’s intended audience makes sense in some cases. However, there can only be one default language. Thus, for this purpose, the functionality as defined requires that only a single language value be specified. While the HTTP Content-Language header field is also used for determining the fallback language when it contains a single language value, that is not its primary purpose, and is thus not a significant similarity between these two independent features.
Permitting multiple language values to be specified in the pragma directive is at odds with its implementation requirements. Thus, for the client-side metadata functionality of the pragma directive, it is not at all useful to have multiple languages specified, and so it does not make sense for multiple languages to be considered conforming.
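As a rough illustration, the following Python sketch models the steps the HTML specification gives for setting the pragma-set default language from a meta element with http-equiv="Content-Language", as I understand them. The function name is hypothetical and only the content attribute is considered. Note how any comma causes the steps to abort, so a list of languages never yields a default language.

```python
from typing import Optional

# HTML "space characters": space, tab, LF, FF and CR.
SPACE_CHARACTERS = " \t\n\f\r"

def pragma_set_default_language(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language for a
    <meta http-equiv="Content-Language"> element, or None if none is set."""
    # No content attribute, or an empty one: nothing is set.
    if not content:
        return None
    # A comma means a list of languages: the steps abort here.
    if "," in content:
        return None
    # Skip leading whitespace, then collect characters up to the next whitespace.
    candidate = ""
    for ch in content.lstrip(SPACE_CHARACTERS):
        if ch in SPACE_CHARACTERS:
            break
        candidate += ch
    return candidate or None

print(pragma_set_default_language("en"))      # en
print(pragma_set_default_language("en, fr"))  # None
```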
These three aspects of the functionality (declaring the language of the intended audience, server side configuration and default document language) clearly illustrate that the premise of this change proposal, the shared functionality between the two features, is fundamentally flawed. The reality is that the in-document Content-Language pragma directive only shares its name with the HTTP header field, while its functionality is closer to that of the lang attribute. And since server side implementation details are out of scope for HTML, there is no need for the document conformance definition to permit multiple language values. The solution chosen for addressing this issue must take this into account, and thus reject this change proposal.
Arguments Against the Rationale
The rationale for this change proposal states:
[The current specification] offers no carrot for doing the right thing. while the fallback language effect stops as soon as the author adds lang on the root element, the spec requires conformance checker to continue whining until the http-equiv="Content-Language" meta element has been removed.
The rationale fails to explain the benefit gained by leaving the pragma directive in the document when a lang attribute has been specified on the root element. While leaving it in the document under those circumstances is mostly harmless, it is redundant metadata that the author does not need to include in their document. Failing to offer a warning would continue to mislead the author into thinking that the pragma directive is both acceptable and useful, which it is not.
That it prevents authors from legally using multiple values to replicate the language fallback effect of doing the same thing in a HTTP header — whether they want to replicate the effect of multiple tags or a single tag.
The language fallback effect from using multiple language tags within the value is that there is no default language. This is exactly the same effect as would be achieved by omitting the pragma directive, and so the given reason is blatantly wrong.
That is, the effect of including a value with multiple languages, like the following:
<meta http-equiv="Content-Language" content="en, fr">
is identical to that of omitting this pragma directive entirely. This rationale also fails to provide a reason for wanting to replicate this effect by copying the same syntax.
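The equivalence can be checked with a small Python model of the fallback order in the HTML specification: a lang attribute on the root element, then the pragma-set default language, then the HTTP Content-Language header (only if it names exactly one language), otherwise unknown. The helper names are hypothetical, the pragma parsing is simplified, and xml:lang is left out.

```python
from typing import List, Optional

def pragma_default(content: Optional[str]) -> Optional[str]:
    # Simplified pragma parsing: a comma aborts, otherwise take the first token.
    if not content or "," in content:
        return None
    tokens = content.split()
    return tokens[0] if tokens else None

def root_language(root_lang: Optional[str],
                  meta_content: Optional[str],
                  http_languages: List[str]) -> str:
    """Language of the root element; "" means unknown."""
    if root_lang is not None:          # 1. lang on the root element
        return root_lang
    default = pragma_default(meta_content)
    if default is not None:            # 2. pragma-set default language
        return default
    if len(http_languages) == 1:       # 3. HTTP header, single language only
        return http_languages[0]
    return ""                          # 4. unknown

# A list in the pragma behaves exactly as if the meta element were absent.
for http in ([], ["de"], ["en", "fr"]):
    assert root_language(None, "en, fr", http) == root_language(None, None, http)
```

Whatever the HTTP header says, content="en, fr" produces the same result as having no meta element at all.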
That it underlines the confusion that may exist today, about the nature of lang versus Content-Language, by requiring:
- different syntax rules for features that are expected to be identical (HTTP and http-equiv)
- similar syntax rules for features that are different (http-equiv and lang)
- a warning message which asks authors to “use lang instead” – as if they were juxtaposable alternatives.
In actual fact, the confusion surrounding this issue is the idea that the HTTP header and pragma directive are equivalent, as clearly illustrated by this misguided change proposal. They are different. The HTTP header is used for declaring the languages of the intended audience, the pragma directive is used for specifying a default language.
The lang attribute, on the other hand, is an alternative to the pragma directive when a single language is specified. When multiple languages are specified, the pragma directive has absolutely no defined effect, and so it serves no valid purpose at all. Therefore, the pragma directive is much closer in functionality to the lang attribute than it is to the HTTP header, with which it shares its name.
Instead of the above, this change proposal propose:
- the Zero-edit proposal’s warning about using lang instead of Content-Language should be changed into a warning which informs that a fallback language measure has kicked in, and recommend that authors create a language declaration (via lang) rather than relying on the fallback feature. This warning should be shown regardless of whether the fallback comes from http-equiv or from the higher level (HTTP). Justification: Since it is a fallback feature, and with other semantics, there is no guarantee that the author has used it for the language effect.
From the author’s perspective, the inconsistency of issuing the warning about the use of the pragma directive only when the lang attribute is absent would be confusing. The better alternative is to issue a consistent warning (or error) that simply says to remove the pragma directive and use lang instead.
- to hold the syntax rules of HTTP (which permits multiple language tags) as the conforming ones (rather than those of lang, which forbids multiple languages), will have the effect of underlining that lang and Content-Language have different purposes. For instance, since the fallback algorithm doesn’t kick in whenever multiple languages are used in the pragma or on the server, there would not be any warning in these cases.
The syntax requirements for the HTTP Content-Language header are not affected by the HTML implementation requirements. Since the lang attribute on the root element and the Content-Language pragma directive with a single language value do have the same effect, which differs significantly from the purpose of the HTTP Content-Language header, and because it is misleading to pretend otherwise, the syntax of the former does not need to match the syntax of the latter.
- a carrot: what we want from authors is that they rely on lang (and xml:lang) for specifying the language — when the author does that, he/she should get immediate reward in the form of removal of conformance warning.
This rationale fails to explain why that same effect of encouraging authors to use the lang attribute would not be achieved by a more consistent warning that tells them to use the lang attribute and remove the pragma directive. There is no benefit gained by leaving the directive in, and merely silencing the validator by inserting a lang attribute does little to discourage the use of the redundant and totally unnecessary pragma directive.
Arguments Against the Proposal Details
The change proposal suggests replacing the terminology for “pragma-set default language” with “pragma-set locale language”. None of the given rationale explains the need for this change in terminology.
The proposed specification text states:
This pragma contains a Content-Language list, whose semantics and syntax is defined in the HTTP spec.
RFC 2616 defines the semantics of the Content-Language header field as follows:
The Content-Language entity-header field describes the natural language(s) of the intended audience for the enclosed entity. Note that this might not be equivalent to all the languages used within the entity-body.
This semantic definition does not match the actual purpose of the Content-Language pragma directive, for specifying a “pragma-set locale language”. Therefore, referring to RFC 2616 for this semantic definition is inappropriate. The syntax requirements from RFC 2616 are also inappropriate, as it defines the following ABNF, which is not directly compatible with the syntax of the meta element with http-equiv and content attributes.
Content-Language = "Content-Language" ":" 1#language-tag
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA
For these syntax requirements to be applicable at all, the specification would have to state that the value of the content attribute must match the ABNF production for language-tag. However, see below regarding the syntax defined in BCP 47.
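For reference, the RFC 2616 productions can be checked with regular expressions. This Python sketch is simplified (it ignores null list elements and header line folding), and the function names are hypothetical. It shows that en-US satisfies the single language-tag production while en, fr only satisfies the list form, and that a tag containing digits, such as the BCP 47 tag es-419, does not satisfy RFC 2616 at all, since its subtags may only contain letters.

```python
import re

# RFC 2616, section 3.10: a primary tag and any subtags of 1-8 letters each.
LANGUAGE_TAG = r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*"

# 1#language-tag: one or more tags separated by commas, with optional
# spaces or tabs around each comma (null elements and folding are ignored).
OWS = r"[ \t]*"
FIELD_VALUE = re.compile(
    OWS + LANGUAGE_TAG + "(?:" + OWS + "," + OWS + LANGUAGE_TAG + ")*" + OWS
)

def is_language_tag(value: str) -> bool:
    """True if value matches the single language-tag production."""
    return re.fullmatch(LANGUAGE_TAG, value) is not None

def is_content_language_value(value: str) -> bool:
    """True if value matches 1#language-tag (the header field value)."""
    return FIELD_VALUE.fullmatch(value) is not None

print(is_language_tag("en-US"))             # True
print(is_language_tag("en, fr"))            # False
print(is_content_language_value("en, fr"))  # True
print(is_language_tag("es-419"))            # False
```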
An HTML5 parser processes this list into a known or unknown pragma-set locale language… The Content-Language list may also be defined in a HTTP header, and will then result in a known or unknown HTTP header-set locale language.
The proposed text fails to define what “known or unknown” means in that context. It is not clear how the implementation determines whether a value is known or unknown. The phrasing of the requirement seems to indicate that it would depend upon the result of parsing the value, rather than just the presence or absence of said value. But the parsing requirements do not use such terminology, and so there is no way to determine whether a given value qualifies as known or unknown.
The parsing requirements for the value of this pragma directive are not specified by the change proposal. However, the change proposal also does not state that the existing parsing requirements in the specification are to be removed, replaced or modified in any way. Thus, by adopting the details of this change proposal, the specification would be left in an inconsistent state which says that multiple language values are supported, but where the parsing requirements abort when more than one value is used.
The aforementioned parsing requirements only focus on parsing the value of the pragma directive, and as such, there is no implementation requirement that sets the “HTTP header-set locale language”.
When a document is lacking a language declaration in the form of the lang or xml:lang attribute on the root element, the document’s locale language (pragma-set or HTTP-set) is consulted by the user agent and used as fallback value for the primary document language.
Assuming the value of the “HTTP header-set locale language” comes from the HTTP Content-Language header, this proposed text fails to specify the order of precedence between the values specified in the pragma directive and the HTTP header.
The use of the term “locale language” in this context clashes with the existing use of the term in the specification to refer to the language set by the user in the user agent’s preferences. This term is used in the table within step 7 of the algorithm to determine the character encoding.
The proposed text then goes on to state:
The following info about the HTTP semantics and Content-Language usage, is informative:
However, in the non-normative list given following that statement, RFC 2119 terminology is incorrectly used to describe what appear to be authoring requirements. In particular:
… authors should not define the Content-Language list according to its parser effect, but according to it semantics.
This non-normative example text also incorrectly states that “en-US” would not be parsed into a useful value. However, this value complies with the syntax requirements specified in RFC 2616, BCP 47 and also with the existing parsing requirements in the HTML5 specification.
The proposal states that the following requirement is to be removed:
Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the lang attribute instead.
The rationale provided does not adequately justify the removal of this warning, nor does it adequately justify replacing it with a more limited warning to be issued only when the pragma directive is used in the absence of the lang attribute.
The proposal then states to amend this requirement as follows:
the content attribute must have a value consisting of a valid BCP 47 language tag, or a comma separated list of two or more BCP 47 language tags.
However, the proposal stated earlier that the syntax for the value was defined by RFC 2616. This requirement now conflicts with that by stating that the syntax of the content attribute’s value is defined by BCP 47. This inconsistency negatively affects the quality of the specification.
The proposal states that this note is to be removed:
This pragma is not exactly equivalent to the HTTP Content-Language header, for instance it only supports one language.
The removal of this note would be misleading, because the note itself is factually correct as-is with the current specification, and with the details of this proposal, which, as stated above, leave the parsing requirements unchanged. The proposal fails to include any implementation requirements that actually permit multiple language tags to be used.
It has now been clearly demonstrated that the proposed specification text provided by this change proposal is thoroughly inadequate for its intended purpose. If the specification were to be amended as required by this change proposal, the inconsistency and lack of clarity would negatively affect the ability to read, understand and implement this specification. As such, this proposal should also be rejected on the basis that its proposal details are inadequate. However, if this working group does make the wrong decision to permit multiple language tags, then I ask that the editor be given full editorial discretion to phrase the requirements in a way that more clearly expresses the requirements, rather than being asked to accept the details of this proposal as written.
Arguments Against the Claimed Positive and Negative Effects
More positive: authors can get rid of the warning by adding something — <html lang="*"> — this is better than a focus on removal of the (over all) harmless Content-Language meta element.
Likewise, authors can get rid of the warning as required by the current specification by removing the meta element. No rationale is provided to explain why the act of removing the pragma directive is significantly more difficult than adding the lang attribute to the root element. Depending on the authoring tool or CMS, both of these actions are likely to be just as easy or just as difficult to perform. This purported benefit is thus unsubstantiated and invalid.
More stable: same syntax as before continue to be permitted.
As documented by the null change proposal, observation of the use of this pragma directive shows that only a very small minority of authors use multiple language values. However, the claimed benefit of continuing to use this syntax is nullified by the fact that, due to the implementation requirements, multiple language values are not at all useful.
More permissive: authors, CMS-es and browsers can continue to take advantage of HTTP-EQUIV’s ability to reference what the HTTP header is/was supposed to be, including replicating its fallback effect.
No rationale is provided to explain why that ability is in any way beneficial.
More correct: the difference between lang and Content-Language is pointed out, while the link between http-equiv and HTTP is emphasized.
As has been demonstrated, this is blatantly wrong. The lang attribute and the Content-Language pragma directive have more in common in terms of functionality than do the pragma directive and the Content-Language HTTP header field.
More useful: a warning that a fallback feature has kicked in, is more useful than a warning which focuses on one of the places where the fallback language could potentially kick in from. Why tell the author to “please use lang instead” if the author has already made sure that the lang attribute is in place?
It seems more useful for authors to be informed about the presence of a redundant and useless feature, than to have them continue to mistakenly believe that the pragma directive is in any way useful. However, either way, both of these are highly subjective claims about what may or may not be useful to authors, which cannot be objectively evaluated without supporting data.
Has positive side effect: Encouragement to place a lang attribute on the starttag of the html element will lead authors to actually type in the html root element, instead of relying on the parser to generate it for them.
Relative to the status quo, the zero edit change proposal, or the proposal to make Content-Language non-conforming, the above is not a unique benefit. Both this and the other change proposals require validators to notify the author about the issue and encourage the use of the lang attribute.
More accurate because it does not conceal the problems by introducing an artificial technical and semantic difference between Content-Language from the HTTP header and Content-Language inside the http-equiv meta element.
This accuracy claim is undeniably wrong, given that the significant differences between the HTTP header and pragma directive have already been explained.
Conclusion
Based on the arguments presented in this article, it is clear that the change proposal arguing for multiple language tags to be permitted is misguided, and lacks any significant or valid supporting arguments. The overall effect of the group accepting this change proposal would be a serious negative impact upon the quality of the specification. It is therefore my strongly reasoned opinion that the HTMLWG must reject this change proposal either in favour of the status quo, or in favour of making Content-Language entirely non-conforming.
Positive benefits?
Ms2ger, oops. That should have said “positive effects”. I think I was tossing up between writing that or “benefits”, and ended up with the tautology.
Very timely: http://trac.tools.ietf.org/wg/httpbis/trac/ticket/214
“The HTTP header is used for declaring the languages of the intended audience”
The current wording in 2616 may be due to poor choice of words. See http://trac.tools.ietf.org/wg/httpbis/trac/ticket/214
Mark, Sabbu, RFC 2616 is very clear in its meaning. Not only does it say it means the languages of the intended audience, but it proceeds to give very clear examples illustrating the fact that Content-Language is not meant to indicate the languages of the resource itself. That bug you’re referring to is basically asking to alter the semantics of Content-Language.
I never said it reflected the language of the resource; it’s that of the entity/representation. This is very clear in the examples.
Mark, let’s not get nit picky about precise terminology here. It’s unnecessary and does nothing to refute my argument. In fact, if you just imagine I said “representation” instead of “resource”, my argument still holds.
214 was raised by Norbert Lindenberg, who has considerable experience in i18n. My interest is in getting people in the HTML, HTTP and i18n communities talking so that we can find the right answer to what Content-Lang means in HTTP, not to refute your arguments.
Personally, I tend to agree with your conclusion (although not some of your arguments; talk about nit-picking!). However, HTML5 really needs to put big, blinking red lights around http-equiv saying that it’s not equivalent at all, for this and many other reasons.
P.S. Terminology is important, it’s how we communicate.