A while ago, I wrote an entry entitled Link Relationships that detailed my
ideas for additions to the values for the rev and rel attributes. It went a
long way into describing many relationships for several categories, including
user feedback, quality, accuracy, accessibility, rating and endorsement.
I have since received feedback from many sources regarding the value of all
these relationships, Tantek has implemented an alpha system for Vote Links in
Technorati, and Google, MSN Search and Yahoo have now implemented
rel="nofollow". I therefore intend to revisit the idea and explain exactly why
I don’t think Technorati’s vote-for, vote-against and vote-abstain
relationships, and Google’s nofollow relationship, are suitable. Following
this, in part 2, I will discuss the major modifications to my original ideas
that will both simplify and reduce the number of relationships, yet provide
increased semantics in a way that implements the desired functionality of vote
links (including nofollow), with many more advantages that are applicable to a
wider variety of user agents.
Relationship Semantics
Firstly, I would like to explain a little about the semantics and purpose
of link relationships, which indicate the role of a link. Each relationship
should indicate one or more of the following:
- The semantic relationship from one resource to another (e.g. Next or Prev
in a sequence),
- What the resource is (e.g. Contents, Stylesheet, etc.), and/or
- The purpose of the link (e.g. to provide Help for the user).
Additionally, the name of the relationship should accurately represent its
semantics, and all relationships should make some sense in the context of
any number of user agents and for any user; that is, the relationship should
be named and designed independently of any particular type of UA. However,
this does not mean that every UA should be able to do something sensible
with the link, nor that every UA that does should do the same thing with it.
All of the above is true for all HTML 4 link types, somewhat true for XFN,
and true for most of the relationships I suggested previously. However, the
same is not entirely the case for either the vote-links or nofollow
relationships.
Vote Links
In fairness, vote links (unlike nofollow) do express some semantics; i.e. a
vote-for relationship expresses that the author of document A approves of
document B. In this way, a search engine, for example, may make use of the
relationship during page-rank calculations, and another UA may simply gather
and list all such links for the user to select from easily. However, the
relationships do not meet any of the criteria above: they do not indicate a
semantic relationship between the resources, what the resource is, or the
purpose of the link.
The Technorati Vote Links Wiki gives the reasons for using the yes/no/abstain
model. Although these raise some good points that I will be incorporating
into my link relationship suggestions later on, they fail to make a solid
case for using this model specifically. The general reasoning given is that
this model has served well for politics, committees and user rating systems,
such as that used on eBay, and will thus, due to the democratic nature of the
web, serve well for page rank calculations; and that the values were designed
to be easy to remember and type when hand-coding, and easy for authoring
tools to implement.
However, the main reason why vote links are unsuitable is not just their lack
of useful semantics; it is that they provide too much potential for abuse.
Mark Pilgrim has given his reasons why vote-for is useless and why
vote-against is harmful.
- rel="vote-for" --> increase PageRank (the default, this is
what all links do now)
- rel="vote-abstain" --> ignore for PageRank (like a Javascript
link that Google can’t follow, or a hypothetical link-level NOFOLLOW
meta tag)
- rel="vote-against" --> decrease PageRank
He also states that vote-abstain is the only useful value and compares it
with the idea for a nofollow value, which he approves of, and which Google
was soon to implement.
Because vote-abstain is essentially equivalent to nofollow, in that it
states that the link should not be counted in page-rank calculations, it
is harmful for the same reasons given below for nofollow, though it is
better because its name more accurately represents its meaning.
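To illustrate Pilgrim's reading of the three values, here is a minimal sketch of how a link-counting user agent might weigh them. The numeric weights and the function names link_weight and score are my own assumptions for illustration; no search engine has published such a scheme.

```python
# Hypothetical weights following Mark Pilgrim's reading of vote links.
# The numbers are illustrative assumptions, not any search engine's values.
VOTE_WEIGHTS = {
    "vote-for": 1,       # same as an ordinary, untagged link
    "vote-abstain": 0,   # ignored for ranking
    "vote-against": -1,  # counts against the target
}


def link_weight(rel: str) -> int:
    """Return the ranking weight implied by a link's rel attribute value."""
    # rel holds space-separated tokens; compare them case-insensitively.
    tokens = rel.lower().split()
    if "nofollow" in tokens:
        # Treated here as equivalent to vote-abstain.
        return VOTE_WEIGHTS["vote-abstain"]
    for token in tokens:
        if token in VOTE_WEIGHTS:
            return VOTE_WEIGHTS[token]
    # An untagged link already counts as a vote for its target.
    return VOTE_WEIGHTS["vote-for"]


def score(links):
    """Sum the weights per target for an iterable of (target, rel) pairs."""
    totals = {}
    for target, rel in links:
        totals[target] = totals.get(target, 0) + link_weight(rel)
    return totals
```

Under these assumptions, vote-for adds nothing over a plain link, vote-abstain and nofollow are interchangeable, and a single vote-against link cancels out a legitimate one, which is where the potential for abuse comes from.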
The nofollow Relationship
The nofollow relationship has no semantics relating to the documents in any
way whatsoever. Its name simply states, in no uncertain terms, that a user
agent should not follow the link; however, its intended meaning is that the
link should not be counted towards page-rank (i.e. this link is abstaining
from voting). Compare this with the similarly named, yet semantically
different, nofollow value for the robots meta tag. For the robots meta tag,
this value means that no links within the document should be followed or
indexed in any way; whereas a link with the nofollow relationship, according
to Google’s announcement, may be followed and indexed but shall not receive
any additional credit.
From now on, when Google sees the attribute (rel="nofollow") on hyperlinks,
those links won’t get any credit when we rank websites in our search results.
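To make the difference between the two nofollow values concrete, here is a minimal sketch that reads a page and reports the page-level robots meta value separately from the link-level relationship. The class name NofollowAudit, its attribute names and the example.com URLs are placeholders of my own; it uses only Python's standard html.parser module and does not reflect how any real crawler works.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Collect page-level and link-level nofollow signals separately."""

    def __init__(self):
        super().__init__()
        # True if <meta name="robots"> lists nofollow: applies to every link.
        self.page_nofollow = False
        # hrefs of individual links carrying rel="nofollow".
        self.uncredited_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            content = attrs.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a" and "href" in attrs:
            # The rel attribute is a space-separated list of link types.
            if "nofollow" in (attrs.get("rel") or "").lower().split():
                self.uncredited_links.append(attrs["href"])


SAMPLE = """
<html><head><meta name="robots" content="index, nofollow"></head>
<body>
<a href="http://example.com/a">plain</a>
<a href="http://example.com/b" rel="nofollow">comment link</a>
</body></html>
"""

audit = NofollowAudit()
audit.feed(SAMPLE)
```

The same word is found in two places with two meanings: the meta value tells a robot not to follow any link in the document, while the relationship only marks one link as earning no credit.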
Thus, as stated earlier, vote-abstain is a better name, though both are
considered harmful for the following reasons.
nofollow makes no sense at all in any user agent other than a search engine
that uses the page-rank (or equivalent) algorithm; it is, in fact, a case
where the relationship is not named or designed to be independent of a
particular type of UA.
Additionally, this relationship opens up the potential for abuse, much like
an untrusted content element (and other similar suggestions) would be abused
by marking the entire content of every page. Due to the complete lack of
semantics, uneducated authors (who make up a large proportion of the web
community) may begin to include the relationship within all external links
so as not to help increase the page rank of any external site they link
to, and may even do so as a result of company policy. Of course, its
appearance in company policies may seem like a wild assumption, but just
consider other such linking policies that some companies have introduced.
I am therefore stating it as a possibility, not a certainty.
Steven Garrity has raised another interesting point in his Thoughts
on Weblog Comment Spam Prevention:
Tagging all links in comments left by weblog readers means that none of these
links will contribute to the great hive mind that is Google PageRank.
There are loads of great and valuable links in weblog comments.
He then continues to say that it is likely that some software may not use
the value for links added by registered users; but most comments left on
blogs are not from registered users at all. Thus, applying nofollow to all
comment links will, in fact, inadvertently harm legitimate users, and indeed
search results, by losing the benefit of mass communication.
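As a rough illustration of the policy Garrity describes, weblog software might decide per comment whether to add the relationship. This is only a sketch under my own assumptions: the function render_comment_link and its registered flag are hypothetical and not taken from any particular weblog package.

```python
from html import escape


def render_comment_link(url: str, text: str, registered: bool) -> str:
    """Render a commenter-supplied link as an HTML anchor.

    Links from unregistered commenters get rel="nofollow"; links from
    registered users are left as ordinary links that still earn credit.
    """
    rel = "" if registered else ' rel="nofollow"'
    # Escape both the URL and the link text so the comment cannot inject markup.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'
```

For example, render_comment_link("http://example.com/", "my site", False) returns an anchor carrying rel="nofollow", while the same call with True returns a plain anchor. Since most commenters are unregistered, nearly every legitimate comment link would fall into the first case.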
The real question, however, which everyone will be asking is: will this
stop comment spam? On the surface, it seems like a brilliant idea. It attacks
the very reason behind a lot of comment spam, which is designed to increase
page rank for spammers’ web sites. However, to answer this question, one
really needs to look at the current methods used to fulfil the same function.
Not all spam prevention methods need to be looked at, just those that prevent
a search engine from indexing the link and counting it towards page rank.
These methods include:
- A link redirector, such as that used by Blogger.
- Plain-text links.
- Moderation and deleting spammer comments.
I have only listed these techniques, and avoided others like spam filters,
blacklists, etc., because, while those are designed to stop spam before it
even gets published, these are designed (like nofollow) to reduce the benefit
of the link for the spammer in the event that the comment is published.
It’s not hard to see that while each of these methods has been and will
continue to be used in many blogs, none of them has succeeded in stopping
the flood of spam. Again, Mark Pilgrim has explained this phenomenon
in regard to the use of plain-text links by saying that spammers don’t
care.
Spammers have it in their heads now that weblog
comments are a vector to exploit. They don’t look at individual results
and tweak their software to stop bothering individuals. They write generic
software that works with millions of sites and goes after them en masse. So
you would end up with just as much spam, it would just be displayed with unlinked
URLs.
That point does not only apply to plain text links, but to any method with
the same effect. Spammers don’t read blogs, they don’t know who implements what
features and who doesn’t; and they don’t know which spam attempts will be successful,
yet they continue to automate and spam anyone and everyone they can, in the
hope that some small percentage will get through.
For this method to be successful, every single web site that provides user
feedback mechanisms would have to implement it. If only some do, spammers
won’t care. If most do, but some don’t, spammers will just try harder to hit
those that don’t, by striking more blogs harder, and faster, than ever.
However, I guarantee that no method will be completely foolproof or ever
succeed in being used by 100% of web sites; and as a result, spam will
continue. Of course, there’s no need to be completely pessimistic about all
this: it is another tool in a spam-fighter’s toolbox that may have some
impact; it just won’t stop spam entirely.
Update: changed the following list to an ordered list for easier referencing.
So, in summary, nofollow is harmful because it:
1. Fails to indicate any semantic relationship between resources.
2. Does not indicate what the resource is.
3. Does not state the purpose of the link.
4. Makes no sense for any user agent or user, other than a search engine using
the page-rank (or equivalent) algorithm.
5. May be abused to prevent any external links, regardless of their context,
from a web site being counted towards page-rank, perhaps due to company
policy.
6. Has the potential to harm the benefits of mass communication among weblogs
and their comments.
7. Has a name that does not accurately represent its semantics.
8. Will fail to stop comment spam completely.
Because of all this, I will be neither endorsing nor using this relationship
for any links on my site (assuming Blogger gives me a choice in the matter),
and I encourage all authors, weblog CMS vendors and search engines to
seriously consider whether the small possibility of a slight reduction in
comment spam is really worth causing so much devastation to the web community
as a whole. Next, in part 2, I plan to discuss major revisions to my earlier
link relationships proposal, incorporating all of the feedback I received and
the ideas I have presented here, to meet the criteria for well designed and
useful relationships.