All posts by Lachlan Hunt

Link Relationships Revisited, Part 2

As I mentioned in part 1, I plan to discuss major modifications to my original link relationships proposal, taking into account feedback and ideas received from a variety of sources. To this end, I am currently writing, and will soon publish, the first working draft of a link relationship profile called “Web Communication Link Relationships”. The aim is to define link relationships that will facilitate web communication through increased linking semantics. Rather than focussing on rating systems, this proposal focuses on the semantics of links, to be used by user agents in the ways that will most benefit the user.

The criteria I will use to determine which relationships are appropriate, and to make improvements, include:

  • Link Relationships should express semantics to indicate at least one of:
    • The semantic relationship between resources.
    • The type of resource.
    • The purpose of the link.
  • Relationship semantics should make sense in the context of any number of user agents.
  • Relationship names should accurately represent the semantics of the relationship, and
  • Relationship semantics should avoid expressing user-agent-implementation specific functionality.

From those criteria, I determined that the number and type of categories I originally included (user feedback, quality, accuracy, accessibility, rating and endorsement) were mostly inappropriate. I decided that the quality, accuracy, accessibility and rating categories should be removed entirely, because they don’t meet the criteria, nor were they particularly useful. This left only the values from user feedback and endorsement; however, user feedback is being renamed to user contribution, because a contribution may not necessarily be the result of feedback, but rather, for example, the start of a new thread.

User Contribution

The user contribution category is designed to identify links that have been published as a result of user contribution. Several different kinds of links are commonly published as a result of user contribution, including user identification (eg. the user’s homepage or e-mail address), the use of a referral mechanism (eg. Pingback or Trackback) or links contained within the user’s contribution. As a result, I have defined the following relationships for this category:

  • user
  • referral
  • pingback
  • trackback
  • contribution

Note: Both pingback and trackback are implementation-specific referral mechanisms, but they are included because they are so widespread and several interoperable implementations of each already exist. See the Pingback 1.0 and Trackback specifications.

Endorsement

The original endorsement category is being included with the same relationships, though their definitions are being revised. The relationships in this category will be:

  • advert
  • endorsed
  • unendorsed

Communication Tracking

A few months ago, Derek Featherstone published an interesting discussion about tracking the spread of ideas. In it, he discusses the common attribution method called the “via” link, used among web-logs (in particular, link-logs) and other similar sites to indicate where an interesting resource was found. The aim is to find out not only who is linking to a particular resource, but also where people located it.

This is an interesting concept because news travels rather quickly around the blogosphere, but with little indication of the communication paths. As a result, I am including the via link within this proposal, but I’m also extending it slightly to address some other related issues, including related links.

In the article, Derek mentions that it is difficult to semantically associate a “via” link with the resource, and illustrates a possible solution using the for attribute, while noting that it is invalid markup according to the HTML specification. To address this problem, I have decided that rather than trying to explicitly associate the “via” link with the resource, it may be possible to do so implicitly. The full details of how will be discussed in the draft specification, but the relationships included within this category are:

  • resource
  • related
  • related-to
  • via

Resource Tracking

The final category includes relationships for identifying and tracking resources commonly used in web communication, such as web-logs and news sites. Usually, each article has a permanent link associated with it, and archival indexes are provided for the articles. Many sites also have facilities, and designate pages or areas, for user contributions (comments). Finally, most blogs, and increasingly news sites, provide syndication feeds for their articles, link logs and occasionally comments. For these purposes, I’ve defined the following relationships:

  • permalink
  • comments
  • archive
  • feed

Other Suggestions

Several other relationships have been suggested and considered, but so far rejected for various reasons, including failure to meet the criteria or doubtful usefulness. Some of these include:

tag
Technorati defines and uses it for user-agent-implementation-specific functionality, and it seems to be inappropriately named.
category
Generic version of tag, but not sure how it could be defined and used more appropriately.
external
Suggested to indicate links to external sites, but not sure how useful it is.

Feedback is welcome, and I will continue to incorporate suggestions and revisions into the Web Communication Link Relationships working draft before I publish it in a few days.

Link Relationships Revisited, Part 1

A while ago, I wrote an entry entitled Link Relationships that detailed my ideas for additions to the values for the rev and rel attributes. It described many relationships in several categories, including user feedback, quality, accuracy, accessibility, rating and endorsement.

Having since received feedback from many sources regarding the value of all these relationships, and given that Tantek had implemented an alpha system for Vote Links in Technorati and that Google, MSN Search and Yahoo have now implemented rel="nofollow", I intend to revisit the idea and explain exactly why I don’t think Technorati’s vote-for, vote-against and vote-abstain relationships, or Google’s nofollow relationship, are suitable. Following this, in part 2, I will discuss the major modifications to my original ideas that will both simplify and reduce the number of relationships, yet provide increased semantics in a way that implements the desired functionality of vote links (including nofollow) with many more advantages that are applicable to a wider variety of user agents.

Relationship Semantics

Firstly, I would like to explain a little about the semantics and purpose behind link relationships, which indicate the role of a link. Each relationship should meet one or more of these criteria, indicating:

  • The semantic relationship from one resource to another (eg. Next or Prev in sequence),
  • What the resource is (eg. Contents, Stylesheet, etc.), and/or
  • The purpose of the link (eg. To provide Help for the user).

Additionally the name of the relationship should accurately represent its semantics, and all relationships should make some sense in the context of any number of user agents and for any user — the relationship should be named and designed independent of any particular type of UA. However, this does not mean that every UA should be able to do something sensible with the link; nor that every UA that does, should do the same thing with it.

All of the above is true for all HTML 4 link types, somewhat true for XFN, and true for most of the relationships I suggested previously. However, the same is not entirely the case for either the vote-links or nofollow relationships.

In fairness, vote links (unlike nofollow) do express some semantics: a vote-for relationship expresses that the author of document A approves of document B. In this way, a search engine, for example, may make use of the relationship during page-rank calculations, and another UA may simply gather and list all such links for the user to select from easily. However, the relationships do not indicate any semantic relationship between the resources, what the resource is, or the purpose of the link.
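To make the point about different user agents concrete, here is a minimal TypeScript sketch of the second use: a UA that gathers every vote-for link so the user can select from them. The `Link` shape and all function names are hypothetical, and the page is assumed to have been parsed already; the one detail taken from HTML itself is that rel holds a space-separated, case-insensitive list of link types.

```typescript
// Hypothetical shape for a hyperlink after the page has been parsed.
interface Link {
  href: string;
  rel: string; // raw value of the rel attribute ("" if absent)
}

// rel is a space-separated list of link types, compared case-insensitively.
function relTokens(rel: string): string[] {
  return rel
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase());
}

function hasRel(link: Link, type: string): boolean {
  return relTokens(link.rel).indexOf(type.toLowerCase()) !== -1;
}

// One possible UA behaviour: collect, in document order, the targets
// the author voted for, so they can be listed for the user.
function listVotesFor(links: Link[]): string[] {
  return links.filter((l) => hasRel(l, "vote-for")).map((l) => l.href);
}

const sample: Link[] = [
  { href: "http://example.com/a", rel: "vote-for" },
  { href: "http://example.com/b", rel: "Vote-For external" },
  { href: "http://example.com/c", rel: "vote-against" },
  { href: "http://example.com/d", rel: "" },
];
const voted = listVotesFor(sample);
// voted is ["http://example.com/a", "http://example.com/b"]
```

A search engine could reuse the same `hasRel` check for ranking instead; the relationship is the same, only the UA’s response to it differs.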

In the Technorati Vote Links Wiki, the reasons for using the yes/no/abstain model are given. However, although they raise some good points that I will be incorporating into my link relationship suggestions later on, they fail to make a solid case for using these values specifically. The general reasoning is that this model has served well for politics, committees and user rating systems, such as that used on eBay, and will thus, due to the democratic nature of the web, serve well for page-rank calculations; and that the values were designed to be easy to remember when typed manually in a hand-coding environment, and easy for authoring tools to implement.

However, the main reason vote links are unsuitable is not just their lack of useful semantics; it is that they provide too much potential for abuse. Mark Pilgrim has given his reasons why vote-for is useless and vote-against is harmful:

  • rel="vote-for" -> increase PageRank (the default, this is what all links do now)
  • rel="vote-abstain" -> ignore for PageRank (like a Javascript link that Google can’t follow, or a hypothetical link-level NOFOLLOW meta tag)
  • rel="vote-against" -> decrease PageRank
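Read as a scoring rule, the list above amounts to a weight per link. The TypeScript sketch below is a toy model only, not Technorati’s or Google’s actual algorithm: it tallies +1, 0 or -1 per link, which makes visible why vote-for is redundant, since a plain link scores exactly the same. The list does not say how a link carrying both vote-for and vote-against should score; letting vote-against win is an assumption of this sketch.

```typescript
interface Link {
  href: string;
  rel: string;
}

// Weights from the list above. Treating vote-against as overriding any
// other vote token on the same link is an assumption, not part of the list.
function voteWeight(rel: string): number {
  const tokens = rel.trim().toLowerCase().split(/\s+/);
  if (tokens.indexOf("vote-against") !== -1) return -1;
  if (tokens.indexOf("vote-abstain") !== -1) return 0;
  return 1; // vote-for and an ordinary link both count as +1
}

// Sum the weights for each target URL found on a page.
function tally(links: Link[]): { [href: string]: number } {
  const scores: { [href: string]: number } = {};
  for (const link of links) {
    scores[link.href] = (scores[link.href] || 0) + voteWeight(link.rel);
  }
  return scores;
}

const votes = tally([
  { href: "http://example.com/x", rel: "" },
  { href: "http://example.com/x", rel: "vote-against" },
  { href: "http://example.com/y", rel: "VOTE-FOR" },
  { href: "http://example.com/y", rel: "vote-abstain" },
]);
// votes["http://example.com/x"] is 0; votes["http://example.com/y"] is 1
```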

He also states that vote-abstain is the only useful value and compares it with the idea for a nofollow value which he approves of, and which Google was soon to implement.

Because vote-abstain is essentially equivalent to nofollow in that it states that the link should not be counted for page-rank calculations, it is harmful for the same reasons given below for nofollow, though it is better because its name more accurately represents its meaning.

The nofollow Relationship

The nofollow relationship has no semantics relating to the documents in any way whatsoever: its name simply states, in no uncertain terms, that a user agent should not follow the link; however, its intended meaning is that the link should not be counted towards page-rank (ie. this link is abstaining from voting). Compare this with the similarly named, yet semantically different, nofollow value for the robots meta tag. For the robots meta tag, this value means that no links within the document should be followed or indexed in any way; whereas the nofollow relationship, according to Google’s announcement, may be followed and indexed but shall not receive any additional credit.

From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results.
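The difference between the two nofollow values can be expressed as two separate decisions per link: whether a crawler follows it, and whether it receives ranking credit. The following is a minimal TypeScript sketch under the reading of Google’s announcement given above (a rel="nofollow" link may still be followed and indexed); the `Page` and `LinkDecision` shapes are hypothetical, and real crawlers may behave differently.

```typescript
interface Link {
  href: string;
  rel: string;
}

// Hypothetical parsed page: the robots meta content (e.g. "index, nofollow")
// and the hyperlinks the page contains.
interface Page {
  robotsMeta: string;
  links: Link[];
}

interface LinkDecision {
  href: string;
  follow: boolean; // may the crawler fetch and index the target?
  credit: boolean; // does the link count towards page rank?
}

function decide(page: Page): LinkDecision[] {
  // Robots meta directives are comma-separated; nofollow covers every link.
  const directives = page.robotsMeta
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  const pageNofollow = directives.indexOf("nofollow") !== -1;

  return page.links.map((link) => {
    const relNofollow =
      link.rel.toLowerCase().split(/\s+/).indexOf("nofollow") !== -1;
    return {
      href: link.href,
      follow: !pageNofollow, // rel="nofollow" alone does not block crawling
      credit: !pageNofollow && !relNofollow, // but it does withhold credit
    };
  });
}

const decisions = decide({
  robotsMeta: "index, follow",
  links: [
    { href: "http://example.com/friend", rel: "" },
    { href: "http://example.com/commenter", rel: "nofollow" },
  ],
});
// decisions[1] is { href: "http://example.com/commenter", follow: true, credit: false }
```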

Thus, as stated earlier, vote-abstain is a better name; though both are considered harmful for the following reasons.

nofollow makes no sense at all in any user agent, other than a search engine that uses the page-rank (or equivalent) algorithm; and is, in fact, a case where the relationship is not named or designed to be independent of a particular type of UA.

Additionally, this relationship opens up the potential for abuse. Just as an untrusted content element (and other similar suggestions) would be abused by marking up the entire content of every page, the complete lack of semantics means that uneducated authors (who make up a large proportion of the web community) may begin to include the relationship on all external links so as not to help increase the page rank of any external site they link to, and may even do so as a result of company policy. Of course, its appearance in company policies may seem like a wild assumption, but consider other linking policies that some companies have introduced. I am therefore stating it as a possibility, not a certainty.

Steven Garrity has raised another interesting point in his Thoughts on Weblog Comment Spam Prevention:

Tagging all links in comments left by weblog readers means that none of these links will contribute to the great hive mind that is Google PageRank. There are loads of great and valuable links in weblog comments.

He goes on to say that some software may not apply the value to links added by registered users, but most comments left on blogs are not from registered users at all. Thus, nofollow will in fact inadvertently harm legitimate users, and indeed search results, by forfeiting the benefit of mass communication.

The real question, however, which everyone will be asking is: will this stop comment spam? On the surface, it seems like a brilliant idea. It attacks the very reason behind a lot of comment spam, which is designed to increase the page rank of spammers’ web sites. However, to answer this question, one really needs to look at the current methods used to fulfil the same function. Not all spam prevention methods need to be looked at, just those that prevent a search engine from indexing the link and counting it towards page rank. These methods include:

  • A link redirector, such as that used by Blogger.
  • Plain text-links.
  • Moderation and deleting spammer comments.
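The first of these methods can be sketched in a few lines of TypeScript: each URL in a comment is rewritten to point at the site’s own redirect script, which then sends the visitor on to the real destination. The `/out?to=` path is hypothetical (it is not Blogger’s actual scheme), and the sketch assumes the site also excludes that path from crawling, for example with a `Disallow: /out` line in robots.txt, so that no ranking credit passes through the redirect.

```typescript
// Hypothetical redirect endpoint on the blog's own domain.
const REDIRECT_PATH = "/out";
const PREFIX = REDIRECT_PATH + "?to=";

// Rewrite a commenter's link so that it points at the local redirector.
// encodeURIComponent escapes ":", "/", "?", "&" and "=", so the original
// URL survives intact as a single query parameter.
function viaRedirector(href: string): string {
  return PREFIX + encodeURIComponent(href);
}

// Server side: recover the destination before issuing an HTTP redirect.
// Returns null when the request is not for the redirector or is unusable.
function redirectTarget(requestPath: string): string | null {
  if (requestPath.indexOf(PREFIX) !== 0) return null;
  let target: string;
  try {
    target = decodeURIComponent(requestPath.slice(PREFIX.length));
  } catch (e) {
    return null; // malformed percent-encoding
  }
  // Only pass on ordinary web links (rejects javascript: URLs and the like).
  return /^https?:\/\//i.test(target) ? target : null;
}
```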

I have only listed these techniques, and avoided others like spam filters, black lists, etc., because, while those are designed to stop spam before it is even published, these are designed (like nofollow) to reduce the benefit of the link for the spammer in the event that the comment is published.

It’s not hard to see that, while each of these methods has been and will continue to be used in many blogs, none of them has succeeded in stopping the flood of spam. Again, Mark Pilgrim has explained this phenomenon, in regard to the use of plain text links, by saying that spammers don’t care.

Spammers have it in their heads now that weblog comments are a vector to exploit.  They don’t look at individual results and tweak their software to stop bothering individuals.  They write generic software that works with millions of sites and goes after them en masse.  So you would end up with just as much spam, it would just be displayed with unlinked URLs.

That point does not only apply to plain text links, but to any method with the same effect. Spammers don’t read blogs, they don’t know who implements what features and who doesn’t; and they don’t know which spam attempts will be successful, yet they continue to automate and spam anyone and everyone they can, in the hope that some small percentage will get through.

For this method to be successful, it would require every web site that provides user feedback mechanisms to implement it. If only some do, spammers won’t care. If most do, but some don’t, spammers will just try harder to hit those that don’t, by striking more blogs, harder and faster than ever. However, I guarantee that no method will ever be completely foolproof or succeed in being used by 100% of web sites; and as a result, spam will continue. Of course, there’s no need to be completely pessimistic about all this: it is another tool in a spam-fighter’s toolbox that may have some impact; it just won’t stop spam entirely.

So, in summary, nofollow is harmful because it:

Update: changed the following list to an ordered list for easier referencing.

  1. Fails to indicate any semantic relationship between resources.
  2. Does not indicate what the resource is.
  3. Does not state the purpose of the link.
  4. Makes no sense for any user agent or user, other than a search engine using the page-rank (or equivalent) algorithm.
  5. May be abused to prevent any external links, regardless of their context, from a web site being counted towards page-rank, perhaps due to company policy.
  6. Has the potential to harm the benefits of mass communication among weblogs and their comments.
  7. Has a name that does not accurately represent its semantics.
  8. Will fail to stop comment spam completely.

Because of all this, I will neither endorse nor use this relationship for any links on my site (assuming Blogger gives me a choice in the matter), and I encourage all authors, weblog CMS vendors and search engines to seriously consider whether the small possibility of a slight reduction in comment spam is really worth causing so much devastation to the web community as a whole. Next, in part 2, I plan to discuss major revisions to my earlier link relationships proposal, incorporating all of the feedback I received and the ideas I have presented here, to meet the criteria for well-designed and useful relationships.

Semantics of <span>

The span element in HTML is widely regarded as a semantically empty, inline element which can only be used as a hook for styling purposes: it has no meaning whatsoever. Thus, many semantic purists believe that its use should be avoided. Others are more lenient and believe that since it has no semantics, nor any default presentation (except display: inline;), it does not hurt to use it; though they advise that it should still be used sparingly (ie. avoid including an extraneous span within every element, as done for the CSS Zen Garden).

It is true that a span element on its own with no semantic attributes, or perhaps just a presentational class name and/or style attribute, has no semantics. There are many examples of using the element for purely presentational purposes (eg. Image replacement techniques); however, there are cases where span is the most appropriate element to use.

Definition of Span

The HTML 4.01 specification states in section 7.5.4 Grouping elements: the DIV and SPAN elements:

The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents. These elements define content to be inline (SPAN) or block-level (DIV) but impose no other presentational idioms on the content. Thus, authors may use these elements in conjunction with style sheets, the lang attribute, etc., to tailor HTML to their own needs and tastes.

Basically, span (like div) is a structural element intended for applying author-defined semantics where there is no other suitable semantic element available or as a generic container for semantics expressed through semantic attributes, such as an alternate language; though the element is often used for presentational purposes with little regard for either structure or semantics.

Semantic Elements

As discussed by Evolt in Guidelines for the use of <span>, it is often more appropriate to use other semantic elements instead. Before marking up the content, it is important to consider what the content is and its purpose. You may believe that the content, in some cases, is only being marked up to receive additional presentation (eg. bold font and/or a different colour), hence the very common use of elements like <font>; but there has to be a reason why the presentation is required. It is the reason for the presentation that should be expressed by the markup, not the presentation itself.

Take, for example, marking up a warning to the reader. The site designer has decided that warnings should be displayed as red text in a visual medium. However, being somewhat educated, the CSS author understands the importance of semantic class names and has allocated a warning class for such purposes, and the font colour is applied using CSS — no font element required. The markup author simply needs to decide upon the most appropriate element for the class to be applied.

Some people may believe that the span element is the most appropriate since the semantics are expressed by the value of the class attribute. However, this is not entirely the case. Remember that the class attribute is for author-defined semantics, which are mostly (but not entirely) meaningless to the reader in a non-CSS environment. In general, a warning should be emphasised and is, in this case, being emphasised in a visual medium through the use of red text. Thus it makes sense to use either of the emphasis elements: em or strong. Because it is a warning, and red text suggests a rather strong emphasis, I believe strong is the most appropriate; though, depending on the context, others may have completely different, yet valid, opinions and therefore reach a different conclusion. However, by using <strong class="warning">, the semantics are expressed by the element and extended by the class, so it still makes some sense in a non-CSS environment.

Semantic Attributes

There are many cases where some, or all, of the semantics required may be expressed through the use of attributes such as title, lang, or any other semantic attributes. The applicable attributes, in this case, include those defined by the %coreattrs, %i18n and %events parameter entities in the HTML 4.01 DTD. These comprise attributes such as id, class, title, lang, dir and the intrinsic event attributes (onclick, onmouseover, etc.). For example, applying a lang attribute to an element states that the element’s content is in a different language from the parent element (assuming the ISO-639 language codes used do not match).
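Because lang is inherited, whether a span with lang actually marks a change of language depends on the nearest ancestor that declares one. A small TypeScript model of that rule follows; the `El` tree type is hypothetical (a stand-in for a parsed document), and it compares language codes case-insensitively, since language tags are case-insensitive.

```typescript
// Hypothetical minimal element tree: only what the lang rule needs.
interface El {
  tag: string;
  lang?: string;
  parent?: El;
}

// The language in effect for an element: its own lang, or else the
// nearest ancestor's; undefined if no ancestor declares one.
function effectiveLang(el: El): string | undefined {
  for (let cur: El | undefined = el; cur !== undefined; cur = cur.parent) {
    if (cur.lang !== undefined) return cur.lang;
  }
  return undefined;
}

// True when the element's own lang differs from what it would inherit.
function changesLanguage(el: El): boolean {
  const own = el.lang;
  if (own === undefined) return false;
  const inherited = el.parent ? effectiveLang(el.parent) : undefined;
  return inherited === undefined || own.toLowerCase() !== inherited.toLowerCase();
}

const root: El = { tag: "html", lang: "en" };
const para: El = { tag: "p", parent: root };
const quote: El = { tag: "span", lang: "fr", parent: para };
const redundant: El = { tag: "span", lang: "EN", parent: para };
// effectiveLang(para) is "en"; changesLanguage(quote) is true;
// changesLanguage(redundant) is false
```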

Because these attributes are defined in entities shared by most (not all) other elements in HTML, exactly the same semantics that may be applied to span with attributes may also be applied to those other elements. However, there are indeed many cases where no other element is appropriate and span is the best choice.

The aforementioned alternate language markup is the simplest example. In the cases where an alternate language is being used, yet requires no other semantics that may be expressed by other inline elements, span is generally the most appropriate element to use.

Another example is markup for a date and/or time. You may wish to use a date-time class for the date and time of your blog posts; however, they may not necessarily require any alternate presentation to make sense to the reader. Assuming the date is not being used as a link and the semantics of other inline markup are inappropriate, span may be the best choice.

In this case, although the class may not necessarily be used for presentational purposes, it is still possible for other non-CSS related processing, such as scripting, to make use of the class attribute. For example, you may wish to have some JavaScript convert the date and time presented on your blog, marked up within a <span class="date-time"> element, from UTC to the reader’s local time for convenience. Hixie’s Natural Log does this; though his markup is different, the processing concept is the same.
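A minimal sketch of that conversion in TypeScript (not Hixie’s code, whose markup differs): it assumes each `<span class="date-time">` contains a machine-readable UTC timestamp such as `2005-01-25T10:30:00Z`, parses it, and formats it for a given or default time zone. The function names are hypothetical. In a browser, a script would select those spans and replace their text with the result of `formatLocal`; that DOM wiring is left out so the functions stay self-contained.

```typescript
// Parse the UTC timestamp assumed to be the span's text content.
// Returns null for text that is not a valid date.
function parseUtc(text: string): Date | null {
  const d = new Date(text.trim());
  return isNaN(d.getTime()) ? null : d;
}

// Format a Date for the reader. With no timeZone argument, the runtime's
// own zone is used, which in a browser is the visitor's local time.
function formatLocal(d: Date, locale?: string, timeZone?: string): string {
  return d.toLocaleString(locale, {
    timeZone: timeZone,
    year: "numeric",
    month: "short",
    day: "numeric",
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  });
}

const posted = parseUtc("2005-01-25T10:30:00Z");
if (posted !== null) {
  // Sydney is on daylight time (UTC+11) in January, so the time shown is 21:30.
  console.log(formatLocal(posted, "en-AU", "Australia/Sydney"));
}
```

Keeping the UTC value in the markup and converting only in script means readers without scripting still see an unambiguous time.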

Other Non-Semantic Elements

It is important to note that span carries neither more nor less semantics than presentational elements such as b and i. These elements, however, although they have no semantic meaning whatsoever, do have a default presentation in a visual medium which may (depending on the context) convey semantic information other than emphasis to the reader, and may be suitable where strong and em are not. Because this additional presentation, especially in a non-CSS environment, may be used to convey some semantics, some authors, such as Eric Meyer, believe that it may be advantageous to use these elements where span would ordinarily do the trick. Eric Meyer has previously explained his use of the b element as a presentational hook for styling purposes, in place of the span element. His reasons ranged from file size, which I previously questioned, to the advantage of the default bold font for expressing author-defined semantics, other than emphasis, in a non-CSS environment.

In Eric’s case, the extraneous element, be it span or b, was entirely presentational, since the semantics of the content were expressed by the parent element, not by the element itself. However, there may be many cases where it is considered useful to revert to these other, often disregarded, presentational elements to help convey semantics to some readers, usually in a visual medium, where no other semantic element is appropriate. For a similar reason, authors must be careful, because these elements may convey semantics that they do not have (eg. <b> may, depending on the context, inadvertently convey a form of strong emphasis in a visual medium). Therefore, although some presentational elements are not deprecated, I recommend that they not be used often, and that you carefully weigh up your other options before doing so.

The a element, which is designed to be used either as a hyperlink using the href attribute or as the destination of a fragment identifier using the name or id attributes, is generally considered semantic; but in the absence of these or any other attributes, it is essentially as meaningless as span. For this reason, the current XHTML 2 working draft states in the hypertext module that, other than for the explicit markup of links, the element’s semantics are identical to span; this is also one of the reasons why attributes like href may now be applied to nearly every element. So, technically speaking, it is valid to use the a element in place of any span element; however, authors should still be cautious about doing so, because most authors generally perceive the element as being only for links.

In conclusion, although span is a semantically empty element whose use should generally be avoided in favour of more semantic elements, there may be cases where other, more semantic elements are entirely inappropriate: for example, where the markup serves only as a presentational hook for styling, or where all the required semantics may be expressed through attributes. Lastly, it may be advantageous to make use of the default visual presentation provided by some non-semantic, presentational elements in place of span, in order to help express semantics that cannot be expressed by other semantic elements. My advice is to use the non-semantic elements sparingly, but don’t be afraid to do so when required.