Link Relationships

This is in response to Ian Hickson’s recent article, “Strange spam to W3C lists”. In it he writes:

I’m thinking that HTML should have an element that basically says “content within this section may contain links from external sources; just because they are here does not mean we are endorsing them” which Google could then use to block Google rank whoring. I know a bunch of people being affected by Web log spam would jump at that chance to use this element if it was put into a spec.

This is my proposed solution to his request. It involves defining a profile of values for the rev and rel attributes in (X)HTML. All the link relationships described here provide additional semantic information. Search engines could use it to improve the accuracy of their page ranking algorithms, and other user agents could use it to enhance the accessibility and usability of a document. This both makes use of and strengthens the democratic nature of the web, which was the foundation of Google’s page rank algorithm, by giving authors more control over the pages their links vote for.

Update: Following a discussion between Peter Janes and me on his blog entry, I realized I should clarify that the intention is to create an XMDP profile, similar to XFN.

Currently, the only way to vote for a page is to link to it; essentially, each link counts as one vote. That vote is then weighted by several different algorithms. These take into account many factors, ranging from the page rank of the linking document to the similarity of the two pages’ content. There is currently no way for an author to say that the linked page is the result of user feedback. Nor can an author specify its quality, accuracy or accessibility, provide a rating, or say whether or not the link is an endorsement or advertisement of any kind.

For this reason, comment spam has become something of a problem. It is posted with the intent of increasing the number of links to a site, and thus its page rank. It occurs mostly on blogs and, as Ian pointed out, in archived mailing lists. To combat this problem, authors need to be able to specify which links from their pages should count towards page rank, and which should not. Additionally, there should be a way to provide further semantic information that any user agent could use in a variety of ways to filter and access resources based on user settings. For example, a user agent may harvest all the pingback and trackback URIs in a document, or the links the author considers to be of high quality, and list them for the user’s quick access. These relationships have been broken into the categories mentioned earlier: user feedback, quality, accuracy, accessibility, rating and endorsement.
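The harvesting idea above could look something like the following minimal Python sketch. It uses the standard library’s html.parser to collect every a or link element whose rel or rev attribute contains pingback or trackback. The class and function names are hypothetical. The sketch assumes that rel and rev hold space-separated, case-insensitive tokens, as link types do in HTML 4.

```python
from html.parser import HTMLParser

# Relationship values that mark a link as pingback/trackback feedback.
FEEDBACK_VALUES = {"pingback", "trackback"}


class FeedbackLinkHarvester(HTMLParser):
    """Collects (attribute, value, href) for every <a> or <link> element
    whose rel or rev attribute contains a feedback value."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        for attr in ("rel", "rev"):
            # rel and rev hold space-separated tokens; compare case-insensitively.
            for token in (attributes.get(attr) or "").lower().split():
                if token in FEEDBACK_VALUES:
                    self.links.append((attr, token, href))


def harvest_feedback_links(html):
    """Return the pingback and trackback links found in an HTML string."""
    parser = FeedbackLinkHarvester()
    parser.feed(html)
    parser.close()
    return parser.links
```

The tuples keep the attribute name. This matters because the two attributes point in opposite directions. rel="pingback" identifies the pingback URI of a resource. rev="pingback" identifies a resource that has pinged the document.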

User Feedback

These relationships indicate a link to a resource owned by a reader of the document who has provided feedback to the author in the form of a pingback, trackback or comment. These relationships are mutually exclusive and, therefore, should not be used on the same link.

pingback

This relationship has already been defined in the Pingback 1.0 specification. It is included here to further describe its potential use by search engines in determining page rank.

When used with the rel attribute, this relationship has been defined to indicate the pingback URI of a resource. When used with the rev attribute, however, it should indicate the reverse situation, i.e., that the resource designated by the link URI has pinged the document.

trackback

This is similar to pingback and may be considered a synonym for it; the only difference lies in how the feedback mechanism is implemented. Both are included because authors may find it useful to make a semantic distinction between the two.

comment

When used with the rel attribute, this indicates that the resource identified by the URI is a document owned, or simply referenced, by a person who has left feedback. That feedback may be a pingback, a trackback, or a comment submitted via a form. A resource linked with this relationship should not be considered directly related to the document, and thus the presence of the link alone should not increase the page rank of the linked resource. However, search engines may further analyse the content of the resource to determine whether it is in fact related and, if so, increase its page rank accordingly. Where the content is completely unrelated, the page rank should not be reduced, because people can and do leave legitimate comments despite having vastly different content on their own sites.
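A search engine following these rules might weight a comment link roughly as in this sketch. The relatedness score stands in for whatever content analysis the search engine already performs. Both that score and the threshold are hypothetical. The point is only that an unrelated commenter’s site receives no vote rather than a negative one.

```python
def comment_link_weight(rel, relatedness, threshold=0.5):
    """Return the vote weight a search engine might give one link.

    rel: the link's rel attribute value.
    relatedness: hypothetical 0..1 score from the engine's own content
        analysis of the linked resource (not part of this profile).
    threshold: hypothetical cut-off for treating the resource as related.
    """
    if "comment" not in rel.lower().split():
        return 1.0  # an ordinary link counts as one normal vote
    if relatedness >= threshold:
        return relatedness  # analysis shows the resource is related
    return 0.0  # unrelated: no vote, but never a negative one
```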

Examples:

In this example, Eric Meyer has been marked as the person who commented, and his article has been marked as the resource that pinged the document.

1. <a href="http://www.meyerweb.com/eric/thoughts/2004/08/24/more-markup/"
      rev="pingback">Pingback</a> by
   <a href="http://www.meyerweb.com/"
      rel="comment">Eric Meyer</a>

Quality

These relationships indicate the author’s opinion about the quality of the resource. They should be considered either a positive or a negative vote from the author. A resource with many links that mostly indicate poor or awful quality should not rank as high as a resource with fewer links that mostly indicate good or excellent quality. These relationships are mutually exclusive and should not appear within the same attribute. A quality value in the rel attribute indicates that the author of the document is rating the linked resource. A quality value in the rev attribute indicates that the author of the linked resource has rated the document containing the link. Thus, a link may carry two quality values, one in each attribute.
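Quality values are exclusive within an attribute but not across rel and rev, so a validator has to check each attribute separately. Here is a minimal sketch, assuming a hypothetical quality_conflicts helper that reports problems rather than rejecting the markup:

```python
QUALITY_VALUES = {"excellent", "good", "average", "poor", "awful"}


def quality_conflicts(rel="", rev=""):
    """Return a list of problems with the quality values on one link.

    Each attribute may hold at most one quality value. rel and rev are
    checked separately because they describe opposite directions.
    """
    problems = []
    for name, value in (("rel", rel), ("rev", rev)):
        found = [t for t in value.lower().split() if t in QUALITY_VALUES]
        if len(found) > 1:
            problems.append(f"{name} has conflicting quality values: {' '.join(found)}")
    return problems
```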

excellent

The author of the document believes the resource to be of excellent quality. This is the highest quality and should be given the highest weighting of all the quality relationships. A link with this relationship should be considered a recommendation for others to read the linked resource, and should thus be counted as a positive vote from the author.

good

The author believes the resource is of reasonably good quality and may interest many people. This should be counted as a positive vote, but not weighted as highly as excellent.

average

This value indicates that the author considers the resource neither good nor bad. Although it should be counted as a positive vote, page rank for the resource should be determined mostly in the normal way. Using this value differs from using no value at all: here the author has explicitly declared the resource to be of average quality. Without a value, its quality is undefined and may in fact be anything. This value may be of more use to other user agents, in the ways mentioned earlier, or for styling with CSS.

poor

The author does not consider the resource useful and would recommend it to few people. This should be counted as a negative vote, but not weighted as heavily as awful.

awful

The author believes that this resource is not good at all, and it should be given the lowest page rank weighting. The author considers the resource to contain very poor or offensive content that is not worth reading by anyone.
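One way to turn these values into votes is a simple weighted sum. The weights below are hypothetical. The profile only fixes their order and sign, not the numbers themselves. Excellent ranks above good, average is slightly positive, and poor is negative but lighter than awful.

```python
# Hypothetical weights: only their order and sign come from the profile.
QUALITY_WEIGHTS = {
    "excellent": 2.0,
    "good": 1.0,
    "average": 0.25,  # positive, but close to an unannotated link
    "poor": -1.0,
    "awful": -2.0,
}


def quality_score(votes):
    """Sum the quality votes from incoming links.

    votes: one quality value (or None, for a link without one) per link.
    Unannotated links contribute nothing here; they are left to the
    ordinary ranking algorithm.
    """
    return sum(QUALITY_WEIGHTS.get(v, 0.0) for v in votes)
```

With these weights, a resource with three poor votes scores -3.0. A resource with one good vote and one excellent vote scores 3.0. This matches the rule that fewer good links should outrank many poor ones.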

Examples:

In example 1, the author has declared the content of the CSS Zen Garden to be excellent, and has thus voted positively for it. In example 2, Eric Meyer has pinged the document, declaring its content to be good. Additionally, the author of the document has specified that the content of Eric Meyer’s article is average (no offense Eric, this is just an example), but considers the overall content of Eric Meyer’s site to be excellent.

1: <p>Dave Shea is the creator of the
      <a href="http://www.csszengarden.com/"
         rel="excellent">CSS Zen Garden</a></p>

2: <a href="http://www.meyerweb.com/eric/thoughts/2004/08/24/more-markup/"
      rev="pingback good" rel="average">Pingback</a> by
   <a href="http://www.meyerweb.com/"
      rel="excellent comment">Eric Meyer</a>

Accessibility

These values indicate the accessibility of a resource according to various accessibility initiatives. They do not indicate whether the quality of the resource is good or bad. They indicate only whether it can easily be read and accessed by users of assistive technologies, or indeed by a search engine.

wcag-A, wcag-AA and wcag-AAA

These indicate that the linked resource conforms to the W3C’s WAI-WCAG Level A, Double-A or Triple-A accessibility requirements, depending on the value. They may not be used together in the same attribute, but may be combined with section-508. These values should count toward increasing the page rank of the resource.

section-508

This indicates that the resource conforms to the Section 508 accessibility guidelines. It should count towards increasing the page rank of the resource, and is of equal importance to the WAI-WCAG values.

non-conformant

This indicates that the resource does not conform to all the requirements of any accessibility guideline specification, although it may still be somewhat accessible to users of assistive technologies. Such resources should not be given a significantly lowered page rank weighting. However, they should rank lower than a resource of equivalent or similar content and equal quality that conforms to section-508, wcag-A or a higher level.

inaccessible

The author believes that this resource is not easily accessible to all users. It may depend on proprietary technologies, use plugins without good alternative content, or be completely inaccessible to many users in some other way. Such documents should be given a lowered page rank weighting. However, any resource marked with this value is likely to be inaccessible to a search engine too, and thus will not receive a high page rank anyway.
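The ordering described for non-conformant and inaccessible resources can be expressed as a tie-breaker. It applies only among results already judged equal in content and quality. This sketch assumes that unannotated links sit in the same tier as non-conformant ones, which the profile does not say. The function names are hypothetical.

```python
CONFORMANCE_VALUES = {"wcag-a", "wcag-aa", "wcag-aaa", "section-508"}


def accessibility_tier(rel):
    """Rank a link's accessibility claim: 2 = conforms to a guideline,
    1 = non-conformant or unannotated (an assumption), 0 = inaccessible."""
    tokens = set(rel.lower().split())
    if tokens & CONFORMANCE_VALUES:
        return 2
    if "inaccessible" in tokens:
        return 0
    return 1


def order_equal_quality(results):
    """Order (url, rel) pairs already judged equal in content and quality,
    most accessible first. sorted() is stable, so ties keep their order."""
    return sorted(results, key=lambda r: accessibility_tier(r[1]), reverse=True)
```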

Examples:

Example 1 extends the CSS Zen Garden link by further declaring that the content conforms to both the WAI-WCAG Level Triple-A and Section 508 guidelines. In example 2, the author has declared Google’s home page to be non-conformant.

1: <p>Dave Shea is the creator of the
      <a href="http://www.csszengarden.com/"
         rel="excellent wcag-AAA section-508">CSS Zen Garden</a></p>

2: <p>The world's most popular search engine is
      <a href="http://www.google.com/"
         rel="non-conformant">Google</a></p>

Accuracy

These values indicate how accurate the author of the document believes the content of the resource to be. The more votes for accuracy a resource receives, the higher its page rank should be. These values will be most useful for linking to news and information sites, but can apply to any resource an author wishes.

accurate

This is the highest level of accuracy. The author believes that the information contained within the resource is very accurate.

believable

The author believes that the content contained within the resource is accurate, but cannot verify that; or that the resource is mostly accurate but contains some information known to be inaccurate.

unsure

The author cannot verify or dispute the accuracy of the content within the resource. There is no evidence either way, and readers should be wary of any facts and claims made.

unbelievable

The content of the resource makes claims or states facts that are unlikely to be true, or contains several claims that are known to be incorrect.

inaccurate

The resource contains a large proportion of information that the author knows to be incorrect.
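The accuracy values form an ordered scale. A user agent could therefore offer a setting such as “only show links the author considers at least believable”. Here is a minimal sketch; the enum and function names are hypothetical illustrations, and links without an accuracy value are simply dropped.

```python
from enum import IntEnum


class Accuracy(IntEnum):
    """The accuracy values, ordered from least to most accurate."""
    INACCURATE = 0
    UNBELIEVABLE = 1
    UNSURE = 2
    BELIEVABLE = 3
    ACCURATE = 4


def accuracy_of(rel):
    """Return the Accuracy declared in a rel value, or None if absent."""
    for token in rel.lower().split():
        name = token.upper()
        if name in Accuracy.__members__:
            return Accuracy[name]
    return None


def links_at_least(links, minimum):
    """Keep (href, rel) pairs whose declared accuracy is >= minimum."""
    kept = []
    for href, rel in links:
        level = accuracy_of(rel)
        if level is not None and level >= minimum:
            kept.append((href, rel))
    return kept
```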

Example:

In example 1, the author is cheering for his country, declaring the article about Grant Hackett winning the 1500m swimming final to be both accurate and excellent news. In example 2, the author has declared Microsoft’s article about conformance to standards to be unbelievable. The resource is also declared non-conformant to accessibility requirements.

1. <p><a href="http://www.olympics.com/swimming/news/hackett-wins"
         rel="accurate excellent">Grant Hackett wins 1500m Final</a> (Go Aussies!)</p>
2. <p><a href="http://www.microsoft.com/conformance/"
         rel="unbelievable non-conformant">Microsoft will Conform to Standards Soon</a></p>

Rating

These values indicate the recommended audience for a resource, and could potentially be used by user agents to help enforce parental restrictions. For example, a user agent could prevent a child from accessing content labelled as restricted, mature-adult, etc., or warn users about it. This would depend on the settings of the user agent or other parental control software.

restricted

This indicates that the resource contains illicit, extremely offensive, or excessively pornographic content that is not suitable for most users. Search engines and user agents should not give access to resources of this nature unless the user fully accepts the responsibility, and explicitly agrees to view such content.

mature-adult

This indicates that the resource contains material that could be considered offensive, or inappropriate for many users. Users should take caution before visiting the resource, and user agents and/or search engines may filter content of this kind based on user settings.

mature

The content of this resource is recommended for mature audiences. User agents may or may not allow access to this content, depending on the user’s settings. Page rank should not be affected, as a typical adult would consider this content acceptable, though might restrict access for minors.

parental-guidance

For most users, this content will be acceptable. Parental guidance may be sought, depending on the user agent’s settings.

general

The content is acceptable for the widest audience. There is no reason to restrict access to these resources, as they are considered appropriate for any person. However, a user agent should still allow this content to be restricted by the user for any reason.

children

The content is most suited to, and targeted at children.

password

The resource requires a password to access its full content. This indicates that registration may be available, or that some content is available to any user, possibly depending on other restrictions in place.

member-only

The resource indicated only allows access to members. No content is available to non-members, and membership registration may be restricted.
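A parental-control filter could treat the content ratings as an ordered scale. It would compare each link against the most mature rating the user’s settings allow. This is only a sketch, and several parts of it are assumptions: the ordering of children below general, the max_rating setting, and letting unrated links through. password and member-only describe access rather than audience, so they are deliberately left out of the scale.

```python
# Audience ratings from least to most mature. Placing "children" before
# "general" is an assumption, so a children-only setting excludes general content.
RATING_ORDER = ["children", "general", "parental-guidance",
                "mature", "mature-adult", "restricted"]


def rating_of(rel):
    """Return the first audience rating declared in a rel value, or None."""
    for token in rel.lower().split():
        if token in RATING_ORDER:
            return token
    return None


def allow_link(rel, max_rating, accepted_restricted=False):
    """Decide whether a hypothetical parental-control setting permits a link.

    max_rating: the most mature rating the user's settings allow.
    accepted_restricted: restricted content additionally requires the user
        to have explicitly accepted responsibility for viewing it.
    """
    rating = rating_of(rel)
    if rating is None:
        return True  # unrated links are left to other filters (an assumption)
    if rating == "restricted" and not accepted_restricted:
        return False
    return RATING_ORDER.index(rating) <= RATING_ORDER.index(max_rating)
```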

Example:

In this example, the member archives have been declared to be accessible to members only and also Level Triple-A conformant.

1. <p><a href="http://lists.w3.org/Archives/Member/"
         rel="member-only wcag-AAA">Member Archives</a></p>

Endorsement

advertisement

The resource is being linked to because of an agreement between parties to advertise the resource within the document.

endorsed

The resource carries an endorsement from the author, indicating that the author approves of and recommends it.

unendorsed

The resource is not in any way endorsed, or to be considered approved, by the author. It may be there as a result of an advertising agreement, but the link provides no other guarantee about, or recommendation for, the resource.

Update: This value may also be used to indicate that a link comes from a third party and is not endorsed by the document’s author. Links in an e-mail sent to, and archived by, a mailing list are one example. Marking them with this value prevents the page rank of the archive, which is likely to be quite high, from increasing the page rank of the linked resource. This should help to stem the never-ending flow of spam to mailing lists that Ian Hickson mentioned in his article. This value may be used instead of, or in conjunction with, comment for links included within the body text of a comment. comment should still be used for links to the commenter’s home page, as demonstrated in the example for comment.
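A mailing-list archiver or blog could apply this automatically when rendering third-party links. Here is a minimal sketch, assuming a hypothetical archive_link helper. It marks links in body text as unendorsed and a commenter’s home-page link as comment, and escapes both the URL and the link text with the standard library’s html.escape.

```python
from html import escape


def archive_link(href, text, is_author_home_page=False):
    """Render a third-party link from an archived message or comment.

    Links in the body text are marked unendorsed so they pass no page rank;
    a commenter's home-page link uses comment, as in the comment example.
    """
    rel = "comment" if is_author_home_page else "unendorsed"
    return f'<a href="{escape(href)}" rel="{rel}">{escape(text)}</a>'
```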

Example:

In this example, the author has indicated that the link is an advertisement. However, the author also endorses the resource because it is considered useful and to have good content and/or products.

1. <p><a href="http://www.example.com/"
         rel="advertisement endorsed good">FooBar</a></p>

That’s about all I can think of for now, and I would really appreciate any comments and feedback from anyone.

3 thoughts on “Link Relationships”

  1. What we really should demand, when it comes to potential spam links, is something like .

    Link spam has become a serious problem for the Wiki community (and the rest of the open web), and the problem was caused by Google Inc. They still refuse to provide us with that fix, and I don’t think they are as smart as everyone says.

    Your link meta information profile goes beyond the spam problem, it is nice. But I don’t expect any major search engine to care, if they won’t even implement the basic stuff.

  2. The reason I don’t think we should have a method that applies to the whole page is that it makes abuse far too easy. People might have it in their templates by default, and we could end up with millions of websites blocking page rank for absolutely no reason. My method forces the author to think about the purpose of each individual link, which reduces the potential for abuse. If there is ever a standardised method, I will insist that it focus on the semantics of the relationship. It should not take an all-in or all-out approach, as the meta element and other similar proposals do.

  3. I think that typed links are underused and underspecified, and unfortunately we are now heading into big trouble in the whuffie world of tomorrow, when rel attribute values start to collide.

    We need to start using namespaces, and QNames may be the simplest solution. I have implemented QName-based linking in my information space manager. For details see:
    http://is2.xspaces.org/TypedLinks
