Well, after taking a long break from writing anything on this blog, I’m back and better than ever. I’ll try to post more regularly from now on, with much better content. I hope you, my loyal readers, didn’t miss me too much while I was gone, but anyway, let’s get on with the good stuff. ☺
One thing I’ve come to notice a lot of people believe is that in HTML, everything is a tag (or at least can be called a tag). This is most certainly not the case. The most recent offender I’ve seen, and the reason I decided to write this, is the author of Firefox, ALT Tags, and Tooltips, which, as you can see by the title, incorrectly refers to attributes as tags. The article itself is quite good, and I fully agree with its message about tooltips for alt attributes; it’s just the incorrect references to the attributes as tags that bug me. This author is neither the first nor the last to make the mistake, but it is about time people learned to call things by their real names.
If you read part 5, Terminology, of Joe English’s humorous document, “Not the comp.text.sgml Frequently Asked Questions List”, you will see that the common name for everything except a tag is a tag. The common name for a tag is a command, which, of course, makes perfect sense! ☺
--------------------------------------------------
ISO/W3C terminology                 Common name
--------------------------------------------------
attribute                           tag
attribute value                     tag
attribute value literal             tag
attribute value specification       tag
character reference                 tag
comment                             tag
comment declaration                 tag
declaration                         tag
document type declaration           tag
document type definition            tag
element                             tag
element type                        tag
element type name                   tag
entity                              tag
entity reference                    tag
general entity                      tag
generic identifier                  tag
literal                             tag
numeric character reference         tag
parameter entity                    tag
parameter literal                   tag
processing instruction              tag
tag                                 command
--------------------------------------------------
So what exactly is a tag then? Well, before I get to that, I’ll just explain what some of the more common SGML and XML terminology means and what a tag is not.
Firstly, tags are not commands. People believe they are commands because of the misconception that HTML is a presentational language, or even a programming language. HTML is certainly not a programming language, and while it is true that presentational features have crept in, they have already been deprecated and/or removed from (X)HTML, or at least will be in future versions.
It is the presentational elements and attributes that could be seen as commands or instructions to display the content in a certain way; however, they are in fact suggestions, just like CSS properties – the only difference being that these presentational suggestions are mixed in with the markup, and have no real semantics that indicate what the content is, only what the author wants it to look like, usually in a visual medium. Any presentational feature, whether done with CSS or the presentational elements and attributes, can be overridden by a user with a user stylesheet (assuming the user agent supports that facility). They are therefore only suggestions that a user does not have to accept, not commands that a user agent or user must obey.
HTML, since it has been formally based on SGML, is intended to mark up the structure and semantics of the content by saying what it is, not what it does, nor how it looks (with the exception of the aforementioned presentational features). Basically, HTML is not a procedural programming language; it is a descriptive markup language, so tags are not commands.
There’s no excuse for calling attributes tags, other than complete laziness
and/or ignorance, but as already shown, calling attributes tags is a common
mistake. An attribute is a property of an element that is written within
the start-tag of an element, and should be referred to as simply an attribute
(for example, the alt attribute). That is
the simplest way of referring to an attribute, and is only slightly longer
than writing tag. However, a shorthand method of referring to attributes, which
I occasionally see within plain text e-mails, is to write it within vertical
bars, or some other delimiter, e.g. |alt|.
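To make the distinction concrete, here is a minimal sketch using Python's standard html.parser module. The class name StartTagInspector and the sample img markup are made up for illustration. Note that even the parser's callback receives the element name and the attributes as two separate things found inside one start-tag.

```python
from html.parser import HTMLParser


class StartTagInspector(HTMLParser):
    """Records, for each start-tag, the element name and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # `tag` is the element name written in the start-tag;
        # `attrs` is a list of (attribute name, attribute value) pairs.
        self.seen.append((tag, dict(attrs)))


inspector = StartTagInspector()
inspector.feed('<img src="logo.png" alt="Company logo">')
inspector.close()

# One start-tag, carrying two attributes: src and alt.
print(inspector.seen)
```

The alt text lives in an attribute of the img element; the only tag in that markup is the img start-tag itself.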
Character Tags (or Entities)
Character references are sometimes called tags, but are more often called entities. Just like attributes, they are not tags either, but what’s wrong with calling them entities?
Character references in HTML may appear in two forms:
- Numeric character references (either decimal or hexadecimal).
- Character entity references.
The numeric character references take the form &#nnnn; (decimal) or
&#xnnnn; (hex). Character entity references
are the named entities for the ISO-8859-1
characters (from 160 to 255), symbols,
mathematical symbols and Greek letters, and finally, markup-significant
and internationalization characters.
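As a rough illustration of the three spellings, the sketch below uses Python's standard html.unescape function to resolve a decimal reference, a hexadecimal reference and a character entity reference. The variable names are arbitrary, and é was picked only because it falls in the ISO-8859-1 range mentioned above.

```python
import html

decimal_ref = "&#233;"     # numeric character reference, decimal
hex_ref = "&#xE9;"         # numeric character reference, hexadecimal
named_ref = "&eacute;"     # character entity reference

# All three references resolve to the same character, U+00E9.
resolved = [html.unescape(ref) for ref in (decimal_ref, hex_ref, named_ref)]
print(resolved)
```

All three are references to a character; none of them is a tag.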
Based on that, you may think that it is only the numeric references that are
incorrectly referred to as entities; however, it is indeed both forms. In SGML
and XML there are several types
of entities, and the simplest explanation of
what an entity is, is that which comes from ISO-8879 itself, the SGML specification:
an entity is
a collection of characters that can be referenced as a unit.
The purpose of entities can be easily understood, but understanding exactly
what an entity is and separating that concept from the markup, is more difficult.
An entity is a concept that is defined in a DTD using an entity
declaration defining both the name, and the replacement text. The entities are referred
to within a document using an entity
reference in the form: &name;
The entity declaration and the entity reference are just the markup for the
entity, but they are not the entity itself.
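The split between declaration, reference and the entity itself can be sketched with an XML document that declares an entity in its internal DTD subset. The entity name author and its replacement text are invented for this example; the parsing is done with Python's standard xml.etree.ElementTree, which expands internally declared entities.

```python
import xml.etree.ElementTree as ET

document = (
    "<!DOCTYPE note [\n"
    '  <!ENTITY author "Jane Doe">\n'      # entity declaration: name + replacement text
    "]>\n"
    "<note>Written by &author;.</note>"    # entity reference: &name;
)

note = ET.fromstring(document)

# The parser has substituted the replacement text for the reference.
print(note.text)
```

After parsing, neither the declaration nor the reference remains in the content; only the characters the entity stood for are left.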
Generally, when people say entities in regard to an HTML document, they are actually referring to the character entity references and/or the numeric character references, not the entities themselves. This is not always the case, and SGML and XML experts will usually get it right. Luckily, the intended meaning can generally be understood from the context.
The DOCTYPE Tag
The Document Type Declaration, or simply DOCTYPE, is often referred to as
the DTD, or the
DOCTYPE tag. The acronym, DTD, can be mistakenly
used to refer to the Document Type Declaration, since it has the same initials
as the acronym’s defined meaning: Document Type Definition.
The DOCTYPE is not a tag either; it is a declaration, so calling it the
DOCTYPE tag is incorrect. However, more often than not, it is easier to simply refer to it
as just the DOCTYPE.
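Parsers tend to keep this distinction too. In the sketch below, Python's standard html.parser reports the DOCTYPE through a separate handle_decl callback rather than through the tag callbacks. The class name MarkupKinds and the sample markup are illustrative assumptions only.

```python
from html.parser import HTMLParser


class MarkupKinds(HTMLParser):
    """Logs which kind of markup each construct is reported as."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        # Called for declarations such as the DOCTYPE, never for tags.
        self.events.append(("declaration", decl))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start-tag", tag))

    def handle_endtag(self, tag):
        self.events.append(("end-tag", tag))


kinds = MarkupKinds()
kinds.feed("<!DOCTYPE html><p>Hello</p>")
kinds.close()

for kind, value in kinds.events:
    print(kind, value)
```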
The <?xml?> Tag
The XML declaration, often referred to as a Processing Instruction or Prolog, is also sometimes called the <?xml?> tag. As you can probably guess, it is not a tag. It is not a processing instruction either, but that, at least, is forgivable, since it does have the appearance of an XML PI, though it is defined separately as the XML Declaration. Nor is it the prolog, although it is part of the prolog.
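One way to see the difference is to parse a small document with Python's standard xml.dom.minidom and list the processing instruction nodes it produces. The document below is made up for the example; the xml-stylesheet line is a genuine processing instruction placed next to the XML declaration for comparison.

```python
from xml.dom import minidom, Node

source = (
    '<?xml version="1.0"?>'                                 # XML declaration
    '<?xml-stylesheet type="text/css" href="style.css"?>'   # processing instruction
    "<root/>"
)

doc = minidom.parseString(source)

# Collect the targets of every processing instruction node in the prolog.
pi_targets = [
    node.target
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(pi_targets)
```

Only the stylesheet instruction shows up as a processing instruction node; the XML declaration is handled separately and does not appear among the document's child nodes.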
Elements and Tags
An element is not a tag, as noted at the end of section 3.2.1 Elements, in the HTML 4.01 recommendation:
Elements are not tags. Some people refer to elements as tags (e.g., “the P tag”). Remember that the element is one thing, and the tag (be it start or end tag) is another. For instance, the HEAD element is always present, even though both start and end HEAD tags may be missing in the markup.
Tag only refers to either the start- or end-tags. Every element has a start-tag
(e.g. <p>) and, with the exception of empty elements, an end-tag
(e.g. </p>). Empty elements never have an end-tag in HTML,
though one is required in XML, and thus XHTML (which can use the special
empty element tag syntax). As noted, in HTML, the start- or end-tags may
be omitted for some elements, but those elements are still present.
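The following sketch counts the tags actually written in a fragment where the optional li end-tags are left out. It uses Python's standard html.parser, which only reports tags as they appear and does not build the element tree; the TagCounter class and the list markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collects the start-tags and end-tags that literally appear in the markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


counter = TagCounter()
counter.feed("<ul><li>One<li>Two</ul>")
counter.close()

print(counter.start_tags)
print(counter.end_tags)
```

The markup contains two li start-tags and no li end-tags at all, yet the document it describes still has two complete li elements. The tags are just how the elements were written down.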
An element is more of a concept that is defined using an element declaration, and comprises an element name, that appears within the start- and end-tags, any attributes within the start-tag, and (with the exception of empty elements) its content model and finally, its content. An element is included in a document by writing its start and end tags, as needed, but (like entity declarations and references) the element declaration and tags are only the markup for an element; they are not the element itself. It is important that this distinction be made and understood by authors – I just hope I’ve explained it well enough.