Monthly Archives: August 2004

Link Relationships

2004-08-25 | MarkUp | Lachlan Hunt

This is in response to Ian Hickson’s recent article, “Strange spam to W3C lists”. In it he writes:

I’m thinking that HTML should have an element that basically says “content within this section may contain links from external sources; just because they are here does not mean we are endorsing them” which Google could then use to block Google rank whoring. I know a bunch of people being affected by Web log spam would jump at that chance to use this element if it was put into a spec.

This is my idea for a solution to his request. It involves defining a profile of values for the rev and rel attributes in (X)HTML. All link relationships described here provide additional semantic information that search engines could use to increase the accuracy of page ranking algorithms, and that other user agents could use to enhance the accessibility and usability of a document. This builds on the democratic nature of the web, which was the foundation of Google’s PageRank algorithm, by giving authors more control over the pages their links vote for.

Update: In response to a discussion between myself and Peter Janes in his blog entry, I realized that I should clarify that the intention of this is to create an XMDP, similar to XFN.
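
As a rough illustration of what such a profile might look like, here is a minimal sketch in the XMDP style, defining just two of the values described in this post. The profile URI (http://example.org/link-profile) is a placeholder rather than a published profile, and the definitions are abbreviated.

```html
<!-- Hypothetical profile document, e.g. served at http://example.org/link-profile -->
<dl class="profile">
  <dt id="rel">rel</dt>
  <dd>
    <p>Additional values for the rel attribute:</p>
    <dl>
      <dt id="excellent">excellent</dt>
      <dd>The author believes the resource to be of excellent quality.</dd>
      <dt id="unendorsed">unendorsed</dt>
      <dd>The resource is not endorsed by the author.</dd>
    </dl>
  </dd>
</dl>

<!-- A document using these values would then refer to the profile from its head -->
<head profile="http://example.org/link-profile">
  <title>Example document</title>
</head>
```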

Currently, the only way to vote for a page is to link to it. Essentially, each link counts as one vote. It is then weighted according to several different factors, ranging from the page rank of the linking document to the similarity of the two documents’ content, among many other things. There is currently no way for an author to say that a link is the result of user feedback; to specify the quality, accuracy and/or accessibility of the linked page; to provide a rating; or to say whether or not the link is an endorsement or advertisement of any kind.

For this reason, comment spam, posted with the intent of increasing the number of links to a site and thus its page rank, has become somewhat of a problem. This occurs mostly on blogs and, as pointed out by Ian, also in archived mailing lists. In order to combat this problem, authors need to be able to specify which links from their pages should count towards page rank, and which should not. Additionally, there should be a way to provide additional semantic information that any user agent could use in a variety of ways to filter and access resources based on user settings. For example, a user agent may harvest all the pingback and trackback URIs in a document, or those that the author considers to be high quality, and list them for the user for quick access. These relationships have been broken into the categories I mentioned earlier: user feedback, quality, accuracy, accessibility, rating and endorsement.

User Feedback

These relationships indicate a link to a resource owned by a reader of the document who has provided feedback to the author in the form of a pingback, trackback or comment. These relationships are mutually exclusive and, therefore, should not be used on the same link.

pingback

This relationship has already been defined in the Pingback 1.0 specification but is included here to further describe its potential use as a means for search engines to determine page rank information.

When used with the rel attribute, this relationship has been defined to indicate the pingback URI of a resource. However, when used with the rev attribute, it should indicate the reverse situation, i.e. that the resource designated by the link URI has pinged the document.
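
To make the two directions concrete, here is a small sketch. The first element is the autodiscovery link as defined in Pingback 1.0; the second is the reverse use proposed here. Both URIs are placeholders.

```html
<!-- rel: points to this document's own pingback server (as in Pingback 1.0) -->
<link rel="pingback" href="http://www.example.com/xmlrpc">

<!-- rev: the linked resource has pinged this document (the use proposed here) -->
<a href="http://www.example.org/blog/reply" rev="pingback">Pingback</a>
```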

trackback

This is similar to pingback, and may be considered to be a synonym for pingback. The difference is only in the implementation of the feedback mechanisms. Both are included because it may be useful for authors to have a semantic difference between the two for whatever reason.

comment

This can be used to indicate, using the rel attribute, that the resource indicated by the URI is ~~a document~~ owned by, or has just been referenced by, a person who has left a pingback, trackback or comment provided via a form. The resource indicated by a link with this relationship should not be considered directly related to the document, and thus the presence of the link alone should not increase the page rank of the linked resource. However, search engines may further analyse the content of the resource to determine if it is in fact related and, if so, increase the page rank accordingly. In cases where the content is completely unrelated, the page rank should not be reduced. This is because people can, and do, leave legitimate comments despite having vastly different content on their own sites.

Examples:

In this example, Eric Meyer has been marked as the person who commented, and his article has been marked as the resource that pinged the document.

1. <a href="http://www.meyerweb.com/eric/thoughts/2004/08/24/more-markup/"
      rev="pingback">Pingback</a> by
   <a href="http://www.meyerweb.com/"
      rel="comment">Eric Meyer</a>

Quality

These relationships indicate the author’s opinion about the quality of the resource. They should be considered to be either a positive or negative vote from the author. A resource with many links that mostly indicate poor or awful quality should not rank as high as a resource that may have fewer links to it, but that mostly indicate good or excellent quality. These relationships are mutually exclusive, and should not appear within the same attribute. A rating appearing in the rel attribute indicates that the author of the document is rating the linked resource, whereas a rating in the rev attribute indicates that the author of the linked resource has rated the document containing the link. Thus, it is allowable to have two quality values on the same link, one in each attribute.

excellent: The author of the document believes the resource to be of excellent quality. This is the highest quality, and should be given the highest weighting compared with the other quality relationships. A link with this relationship should be considered a recommendation for others to read the linked resource, and thus should be counted as a positive vote from the author.
good: The author believes the resource is of reasonably good quality, and may be interesting for many people. This should be counted as a positive vote, but not rated as highly as excellent.
average: This value indicates that the author does not consider the resource to be good or bad. Although this should be counted as a positive vote, page rank for this resource should be determined mostly in the normal way. The difference between using this value and no value at all is that the author has explicitly declared the resource to be of average quality, whereas it is undefined otherwise and may in fact be of any quality. This value may be of more use to other user agents, in the ways mentioned earlier, or for the purpose of styling with CSS.
poor: The author does not consider the resource to be useful, and does not recommend it to many people. This should be counted as a negative vote, but not weighted as heavily as awful.
awful: The author believes that this resource is not good at all, and should be given the lowest page rank weighting. The author considers the resource to contain very poor or offensive content that is not worth reading by anyone.

Examples:

In example 1, the author has declared the content of the CSS Zen Garden to be excellent, and thus voted positively for it. In example 2, Eric Meyer has pinged the document, declaring it to have good content. Additionally, the author of the document has specified that the content of Eric Meyer’s article is average (no offense Eric, this is just an example) but considers the overall content of Eric Meyer’s site to be excellent.

1: <p>Dave Shea is the creator of the
      <a href="http://www.csszengarden.com/"
         rel="excellent">CSS Zen Garden</a></p>

2: <a href="http://www.meyerweb.com/eric/thoughts/2004/08/24/more-markup/"
      rev="pingback good" rel="average">Pingback</a> by
   <a href="http://www.meyerweb.com/"
      rel="excellent comment">Eric Meyer</a>

Accessibility

These values indicate the accessibility rating of a document according to various accessibility initiatives. These do not indicate that the quality of the document is good or bad in any way; just that the document may or may not be easily read and accessed by users of assistive technologies, or in fact by a search engine.

wcag-A, wcag-AA and wcag-AAA: These indicate that the linked resource conforms to the W3C’s WAI–WCAG Level A, Double-A or Triple-A accessibility requirements, depending on the level indicated by the value. These may not be used together in the same attribute, but may be used with section-508. These values should count toward increasing the page rank of the resource.
section-508: This indicates that the resource conforms to the Section 508 accessibility guidelines. This should count towards increasing the page rank of the resource. This is of equal importance to the WAI–WCAG values.
non-conformant: This indicates that the resource does not conform to all the requirements of any accessibility guideline specification; however, the resource may still be somewhat accessible to users of assistive technologies. These resources should not be given a significantly lowered page rank weighting, but should rate lower than a resource of equivalent or similar content, and of equal quality, that conforms to the section-508, wcag-A or higher guidelines.
inaccessible: The author believes that this resource is not easily accessible to all users, and may depend on the use of proprietary technologies and/or plugins without good alternate content, or is, in any other way, completely inaccessible to many users. These documents should be given a lowered page rank weighting; however, it should be noted that any resource marked with this value is likely to be inaccessible to a search engine, and thus will not receive a high page rank anyway.

Examples:

Example 1 has extended the CSS Zen Garden link by further declaring that the content is conformant to both the WAI–WCAG Level Triple-A and Section 508 guidelines. In example 2, the author has declared Google to be non-conformant.

1: <p>Dave Shea is the creator of the
      <a href="http://www.csszengarden.com/"
         rel="excellent wcag-AAA section-508">CSS Zen Garden</a></p>

2: <p>The world's most popular search engine is
      <a href="http://www.google.com/"
         rel="non-conformant">Google</a></p>

Accuracy

These values indicate how accurate the author of the document believes the content of the resource to be. The more accurate votes received by the resource, the higher the page rank should be. These values will be most useful for linking to news and information sites, but can apply to any other resource that an author wishes.

accurate: This is the highest level of accuracy. The author believes that the information contained within this resource is very accurate.
believable: The author believes that the content contained within the resource is accurate, but cannot verify that; or that the resource is mostly accurate but contains some information known to be inaccurate.
unsure: The author cannot verify or dispute the accuracy of the content within the resource. There is no evidence either way, and readers should be wary of any facts and claims made.
unbelievable: The content of the resource makes claims or states facts that are unlikely to be true, or contains several claims that are known to be incorrect.
inaccurate: The resource contains a large portion of information that is known by the author to be incorrect.

Examples:

In example 1, the author is cheering for his country, and declaring the article about Grant Hackett winning the 1500m swimming final to be both accurate and excellent news. In example 2, the author has declared that Microsoft’s article about conformance to standards is unbelievable and that the resource is non-conformant to accessibility requirements.

1. <p><a href="http://www.olympics.com/swimming/news/hackett-wins"
         rel="accurate excellent">Grant Hackett wins 1500m Final</a> (Go Aussies!)</p>
2. <p><a href="http://www.microsoft.com/conformance/"
         rel="unbelievable non-conformant">Microsoft will Conform to Standards Soon</a></p>

Rating

These values indicate the recommended audience for a resource. These could potentially be used by user agents to help enforce parental restrictions. For example, a user agent could prevent a child from accessing, or provide a warning to users about, content that has been labelled as restricted, mature-adult, etc., depending on the settings of the user agent or other parental control software.

restricted: This indicates that the resource contains illicit, extremely offensive, or excessively pornographic content that is not suitable for most users. Search engines and user agents should not give access to resources of this nature unless the user fully accepts the responsibility, and explicitly agrees to view such content.
mature-adult: This indicates that the resource contains material that could be considered offensive or inappropriate for many users. Caution should be taken by a user before visiting the resource, and user agents and/or search engines may filter content of this kind based on user settings.
mature: The content of this resource is recommended for mature audiences. User agents may or may not allow access to this content depending on the user’s settings. Page rank should not be affected, as a normal adult would consider this content acceptable, though would perhaps restrict access to minors.
parental-guidance: For most users, this content will be acceptable. Parental guidance may be sought, depending on the user agent’s settings.
general: The content is acceptable for the widest audience. There is no reason to restrict access to these resources, as they are considered appropriate for any person. However, a user agent should still allow this content to be restricted by the user for any reason.
children: The content is most suited to, and targeted at, children.
password: The resource indicated requires a password to access the full content. This indicates that registration may be available, or that some content is available to any user, possibly depending on other restrictions in place.
member-only: The resource indicated only allows access to members. No content is available to non-members, and membership registration may be restricted.

Example:

In this example, the member archives have been declared to be accessible to members only and also Level Triple-A conformant.

1. <p><a href="http://lists.w3.org/Archives/Member/"
         rel="member-only wcag-AAA">Member Archives</a></p>

Endorsement

advert or advertisement

The resource is being linked to because of an agreement between parties to advertise the resource within the document.

endorsed

The resource indicated carries with it an endorsement, indicating that the author approves of, and recommends, the resource.

unendorsed

The resource is not in any way endorsed by, or to be considered approved by, the author. It may be there as a result of an advertising agreement, but the link does not provide any other guarantee about, or recommendation for, the resource.

Update: This value may also be used to indicate that a link is from a third party that is not endorsed by the document author. For example, links posted in an e-mail to, and archived in, a mailing list can be marked with this to prevent the page rank of the mailing list archive, which is likely to be quite high, from increasing the page rank of the resource. This should help to reduce the never-ending flow of spam often distributed to mailing lists, as Ian Hickson mentioned in his article. This value may be used instead of, or in conjunction with, comment for links that have been included within the body text of a comment. comment should still be used for links to the author’s home page, as demonstrated in the example for comment.

Example:

In this example, the author has indicated that the link is an advertisement; however, the author also endorses the resource because it may be useful and have good content and/or products.

1. <p><a href="http://www.example.com/"
         rel="advertisement endorsed good">FooBar</a></p>
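
The third-party case described in the update for unendorsed could be marked up along these lines. This is only a sketch: the URIs, link text and commenter name are placeholders.

```html
<!-- Hypothetical archived e-mail: the list archive does not vouch for the link -->
<p>Check out this great offer:
   <a href="http://www.example.com/offer"
      rel="unendorsed">http://www.example.com/offer</a></p>

<!-- Hypothetical blog comment: a link in the comment body... -->
<p>Nice post, see
   <a href="http://www.example.net/page"
      rel="unendorsed comment">my page</a>.</p>

<!-- ...and the link to the commenter's home page, which still uses comment alone -->
<p>Comment by
   <a href="http://www.example.org/"
      rel="comment">A. Reader</a></p>
```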

That’s about all I can think of now, and I would really appreciate any comment and feedback from anyone.

CSS Guru Explains A<b>use

2004-08-24 | MarkUp, Standards | Lachlan Hunt

A while back, Eric Meyer wrote an article regarding heading levels in which he mentioned the use of a <b> element within his markup. He has finally written up his reasons for doing so.

In Eric’s current design, they are not much more than a leftover from a previous design. He states:

The boldface element is actually a holdover from the previous designs of meyerweb. … The original idea was to provide an inline element as a hook on which I could hang some styles.

Then he goes on to explain how he’s using it to apply some styles which would not work well had they been applied to the heading containing it. No-one has complained about the usefulness of having an extra element to play with for styling purposes; however, most of us would have used a <span> element instead. But Eric defends his decision by saying that:

… the element name is three letters shorter, so for every hook, I’m saving six characters. If there are, say, twenty such hooks on a page, that saves me 120 characters. It’s a small consideration, but by such incremental savings are document weights reduced.

The document weight may be reduced, but there are other ways to do that without sacrificing semantic purity. To show this, I did a little experiment. I saved the markup for the article (after 10 comments). Firstly, I converted the document to UTF-8, saved without the BOM, which required changing all the character entities used in the file to literal UTF-8 characters. Each entity changed was documented in this comparison chart, which lists, for each entity, the Unicode character name, hexadecimal character reference, original markup entity used, original byte count, new markup, UTF-8 hexadecimal bytes, new byte count, saving per character, number of instances of the entity and, finally, the total savings for the document. In total, this saved 439 bytes. Afterwards, the <b> elements were converted to <span> elements. There were two in this document, adding 12 bytes.

Note: There were originally 2 additional opening tags, with no closing tags, for <b> elements. However, these were in the comments and were meant to be encoded using &lt; and &gt;. Thus, these two markup errors were corrected before this experiment was started.

Following this, the markup errors resulting from the XML syntax used on empty elements were corrected. There were a total of 23 errors of this kind in the document, each comprising an unnecessary space (U+0020) and solidus (aka. slash – U+002F). This removed an additional 46 characters from the files.
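
The three kinds of change in this experiment can be illustrated with a few lines of markup. This is only a sketch: the heading and paragraph text are placeholders, and &rsquo; is assumed as a representative entity rather than taken from the comparison chart.

```html
<!-- 1. Character entity vs. literal UTF-8 character (assuming &rsquo;) -->
<p>Eric&rsquo;s design</p>   <!-- &rsquo; is 7 bytes -->
<p>Eric’s design</p>         <!-- U+2019 is 3 bytes in UTF-8 (E2 80 99): saves 4 -->

<!-- 2. Styling hook: b vs. span -->
<h3><b>Title</b></h3>        <!-- start and end tags: 3 + 4 = 7 bytes -->
<h3><span>Title</span></h3>  <!-- start and end tags: 6 + 7 = 13 bytes, 6 more per hook -->

<!-- 3. Empty element: XML syntax vs. HTML syntax -->
<br />                       <!-- 6 bytes -->
<br>                         <!-- 4 bytes: saves 2, so 23 instances save 46 -->
```

Two hooks at 6 bytes each give the 12 bytes mentioned for the <span> version.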

The sample files after these corrections were made are available. The markup errors were corrected in all files because the main focus of this file size comparison is the character encoding savings and the use of <span> instead of <b>. The files are:

Example	Encoding	File Size (Bytes)
Example 1	ISO-8859-1	22,375
Example 2	UTF-8 using `<b>`	21,936
Example 3	UTF-8 using `<span>`	21,948

That is a saving of 427 bytes (Example 1 compared with Example 3). Additionally, by checking the HTTP response headers, it’s easy to see that the documents are being served compressed using gzip encoding. The small number of bytes saved by using <b> over <span> is minuscule compared with the saving achieved by gzip. For that article, the original file size was 24,576 bytes (as served from meyerweb, including markup errors), while the Content-Length indicated by the HTTP response headers is 7726. Thus, even the file size saved by converting to UTF-8 is small by comparison, but much more than the difference of using <span> instead of <b>.

Also, I should point out that many have commented that Eric has used a superfluous number of classes throughout his markup. I’m not going to judge his use of classes because it would take far too long to analyse how and why each class is used. I just wanted to point out that removing some could also reduce the file size.

“What about semantic purity?” you may ask. In my view, b and span have the same semantic value, which is to say basically none. They’re both purely presentational elements, with the difference that span doesn’t have any expected presentational effects in HTML.

The problem is not the fact that b has no semantics, but the fact that, in visual user agents, it portrays semantics that it does not have. It relies on style sheets to remove that perception so that the semantics of actually having no additional semantics is perceived (well, that is, not perceived) correctly by the user in the absence of style sheets. Thus, the style sheet in this case is, in a kind of backwards way, being used for semantic purposes. This, as everyone should know, breaks the rules of separating structure, content and presentation.

Finally, in an ideal, CSS3, non-IE world, he could use ::outside to provide the extra box and apply the styles directly to the containing element. But those days are still a long way off, so as an interim solution, I recommend the use of <span> for such purposes.

15 Minutes of Fame

2004-08-23 | Personal | Lachlan Hunt

Today, I got the biggest surprise of my life since I started blogging just a few months ago. Well… perhaps not quite the biggest, but still something I’m quite proud of, and worth blogging about. Today, 2004-08-22, an example CSS technique I published, was featured in Dave Shea’s Dailies list.

I know it’s only there for a day, before it’s hidden away in the archives, but everyone has 15 minutes of fame in their life, and today I got mine. I think this is a great achievement; even though it’s only one link, it is the first. It just goes to show that even the big guns, such as Dave, haven’t lost sight of the little people, and respect good quality work no matter who it’s from. Of course, I already knew this about Dave, since he accepted and published my CSS Zen Garden design, Office 2003, as he does for many other designers, whether they’re well known or not. (Note to self: I must stop writing about Dave in every post.)

Anyway, I must admit that this has been one of my goals since I became a blogger, i.e. to write something good enough that it is considered worth linking to from one of the well known blogs. So, this got me thinking. What are my other goals? What do I want to achieve from my site and from my blogs? How many readers do I have, and how many do I consider to be a respectable number?

One thing that I feel is important to becoming a good blogger is receiving lots of feedback and comments about what I write, or general comments about my site design, usability and functionality, etc. So, if you’re reading this, let me know what you think.

I have had some comments previously, and most have been quite positive. But I really want to get more. That way, I can improve the content I publish, which will benefit me because I’ll get more readers, but more importantly, it will benefit my readers because they’ll have something good to read. What good is a blog if no-one enjoys reading it? So, do yourself a favour. Leave me a comment: criticise or compliment me. It’s your choice; any feedback is good feedback.

One final note: please at least leave your name and/or URI at the end of your comment. It’s nice to know names, and to see the sites of my readers.

Lachy’s Log

If I start now, I'll be finished later!
