Category Archives: MarkUp

SGML, (X)HTML, XML and other markup languages.

Web Developer Quiz Answers

These are the answers to last week’s Web Developer Quiz. If you have not attempted the quiz yourself, I recommend you do so before reading the following answers. All the responses to the quiz from earlier this week were made public earlier today.

Validation

There is only one error within the sample document; validate it and see for yourself:

Line 4, column 11: there is no attribute “ALIGN”

The align attribute is not valid in HTML 4.01 Strict because it is deprecated. It is valid in HTML 4.01 Transitional. For information about why line 7 isn’t an error, refer to the validation quiz and associated answers I published earlier.

Elements in the DOM

There are 3 p elements within the document. The syntax <> is an empty start-tag, an unsupported SHORTTAG feature from SGML. It basically means to open a new element of the same type as the most recent unclosed element (here, another p). Similarly, </> is an empty end-tag, which ends the most recent open element.

The em element will not be present because, despite appearances to the contrary, it is actually commented out. The head and body elements will still be present, even though their start- and end-tags have been omitted.
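To see why line 6 is one long comment rather than two short ones, here is a minimal sketch in Python of the SGML comment rule as I understand it: a comment declaration starts with <!, contains zero or more comments that each begin and end with --, allows white space after each comment, and ends with >. The function name is made up for illustration, and this handles only comment declarations that begin with <!--; it is not a real SGML parser.

```python
def sgml_comment_declaration_end(text: str, start: int = 0) -> int:
    """Return the index just past the comment declaration at `start`.

    Hypothetical helper: walks '<!', then any number of '--...--'
    comments separated by white space, then the closing '>'.
    """
    if not text.startswith("<!--", start):
        raise ValueError("no comment declaration at this position")
    pos = start + 2  # skip '<!'
    while True:
        if text.startswith("--", pos):
            # Opening '--' found: the comment runs to the next '--'.
            close = text.find("--", pos + 2)
            if close == -1:
                raise ValueError("unterminated comment")
            pos = close + 2
        elif pos < len(text) and text[pos].isspace():
            pos += 1  # white space is allowed between comments
        elif pos < len(text) and text[pos] == ">":
            return pos + 1  # end of the whole declaration
        else:
            raise ValueError(f"unexpected content at index {pos}")


# Line 6 of the sample document, without the line number.
line6 = "<!-- -- --> <em>It’s not hard!</em> <!-- -- -->"

# The declaration only ends at the end of the line, so the em
# element sits inside the second comment.
print(sgml_comment_declaration_end(line6) == len(line6))  # True
```

The first > on the line falls inside a comment that was opened by the third --, which is why it does not close the declaration.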

Validate it and look at the Parse Tree to confirm these answers.

Semantics

The unordered list (option 3) is the most semantically correct. A stylesheet may be used to style it in any way desired.

The <h1> element without the style attribute or the class attribute with a presentational class name is the most appropriate markup for a document title. An external stylesheet may be used, and is the recommended way, to horizontally centre it in a visual medium using a large, bold font. The use of the style attribute or the presentational class name is not recommended because it fails to separate the markup from the presentation.

Everyone got these 2 questions correct. Well done. In hindsight, I wish I had made these more difficult, but since semantics is not an exact science, I found that (in general) the more complicated the question, the less specific the answer could be. So, I settled for relatively easy questions for things that beginners tend to mark up poorly.

Character References

  • For an HTML 4.01 document: the numeric character reference: &#146; and the character entity reference: &apos; are invalid.
  • For an XHTML 1.0 document: technically, none of them are invalid; however, the numeric character reference &#146;, while it is not prohibited in XML, refers to a Unicode control character and should not be used anyway.
  • For a generic XML document with no DTD, only the character entity reference: &rsquo; is invalid. &apos; is valid because it is one of the 5 predefined entities in XML.
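As a rough check of the points above, the following Python sketch looks up the code points behind the three numeric references and tries each reference in a tiny XML document with no DTD. It uses only the standard library (unicodedata and xml.etree.ElementTree); the helper names and the <r> wrapper element are made up for illustration. It says nothing about HTML 4.01 validity, which depends on the HTML DTD and SGML declaration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The three numeric character references, as the characters they name.
hex_ref = chr(0x2019)  # &#x2019;
dec_ref = chr(8217)    # &#8217;
c1_ref = chr(146)      # &#146;


def describe(ch: str) -> tuple:
    """Return (general category, character name) for one character."""
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")


def parses_without_dtd(reference: str) -> bool:
    """True if an XML parser accepts the reference with no DTD present."""
    try:
        ET.fromstring(f"<r>{reference}</r>")
        return True
    except ET.ParseError:
        return False


print(hex_ref == dec_ref)  # same code point, written in hex and decimal
print(describe(hex_ref))   # a punctuation character
print(describe(c1_ref))    # category "Cc" marks a control character

for ref in ("&#x2019;", "&#8217;", "&#146;", "&rsquo;", "&apos;"):
    print(ref, parses_without_dtd(ref))
```

Only &rsquo; should be rejected here, since nothing declares it; &#146; is accepted by the parser even though it names a control character.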

Since few people correctly answered these questions, I will be providing more information about this in tomorrow’s post.

Media Types (MIME)

An XHTML 1.1 document SHOULD NOT be served with the text/html MIME type. See the XHTML Media Types Note for more information.

An XHTML 1.0 document MAY be served as text/html when the document conforms to the Appendix C HTML Compatibility Guidelines in the XHTML 1.0 Recommendation. Those who pointed out that this is ludicrous get a bonus point.
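The two rules above could be sketched as a small server-side decision in Python. Everything here is hypothetical: the function name, its parameters, and the choice to return None when no suitable type exists are my own assumptions, and the substring test is a deliberate simplification that ignores q-values in the Accept header.

```python
from typing import Optional

XHTML_TYPE = "application/xhtml+xml"


def pick_media_type(xhtml_version: str,
                    follows_appendix_c: bool,
                    accept_header: str) -> Optional[str]:
    """Pick a Content-Type for an XHTML document (illustrative only)."""
    # Simplification: a real server would parse the Accept header
    # properly, including q-values, rather than search for a substring.
    if XHTML_TYPE in accept_header:
        return XHTML_TYPE
    # text/html is only an option for XHTML 1.0 that follows the
    # Appendix C compatibility guidelines.
    if xhtml_version == "1.0" and follows_appendix_c:
        return "text/html"
    # Assumption: treat SHOULD NOT as "do not" and report no match.
    return None


print(pick_media_type("1.0", True, "text/html"))   # text/html
print(pick_media_type("1.1", True, "text/html"))   # None
```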

If any of you have any questions or comments regarding this quiz, please feel free to let me know. The feedback I have received, or will receive, regarding this quiz will help me a lot with the next one I’m planning, which will most likely be a CSS quiz of some kind, possibly followed by a JavaScript/DOM quiz if I have time. Beyond that, well, you’ll have to wait and see.

Web Developer Quiz Update

I’ve received quite a few responses to yesterday’s Web Developer Quiz, including some feedback about the type of questions I asked and criticism that they were too much about SGML, which I’d like to take the opportunity to address.

Firstly, out of all the responses received in the last 24 hours (although they’re not yet published), not one person has answered all questions correctly. Indeed, there are questions in there that no-one has answered correctly yet, which I am very surprised about: I was expecting to receive, at least collectively, the correct answers for all questions.

Secondly, I’m going to go through each section and explain, without giving away the answers just yet, why I asked each question and why it’s important for authors to know the answers to them.

Validation

Looking at the sample document, it’s not hard to see that it makes use of unsupported SGML features that cannot be used in the real world. However, this does not mean that authors do not need to be aware of them.

In fact, the document demonstrates just how easy it is to make unintentional use of such features. While that may not be what the author intended, it will result in one of two possibilities:

  1. Completely unexpected errors that don’t seem to make sense: a problem I see a lot of beginners struggle with.
  2. As is the case with this document, the combination of 2 specific authoring errors results in no validation error being reported at all for the mistakes.

At this point, I’d like to point out that there is just 1 error within the document (most people have picked it so far), but it has nothing to do with the unsupported SGML features, and everything to do with the declared DOCTYPE. This will, perhaps, become more apparent to you when I reveal the answers and explain the reasons for the errors, or lack thereof, in more detail next week.

Elements in the DOM

The first of these questions is very much related to an unsupported SGML feature, rather than real world, practical HTML, and I admit, I just threw it in as a challenge for the more advanced authors. It is, however, important to be aware of the syntax and that it is unsupported, and thus cannot be used, even inadvertently.

The second question is testing your knowledge of real world, supported markup. You need to be aware that start-tags and end-tags can be omitted for some elements, yet the elements will still be present in the DOM. You also need to be aware of the HTML/SGML comment syntax and, although it wasn’t really tested with these questions, the syntactic differences between SGML and XML comments.

Semantics

These are, perhaps, the easiest and most practical questions in the quiz. So far, nearly everybody has answered these questions correctly, and I don’t feel I need to explain why they were included; it seems quite obvious to everyone.

Character References

Surprisingly, nobody has correctly answered any of these 3 questions. Yet it is important, from both a practical point of view and a validation point of view, to understand the similarities and differences between HTML, XHTML and XML with respect to character references. It is also important to have an understanding of the Unicode character repertoire and code points, which is what everyone has failed on so far.

Media Types

Again, this is important from a practical perspective. Authors need to understand that they should not use XHTML with the wrong media type, and also understand the practical limitations of doing so. Conversely, although this was not tested with these questions, it is important to understand the current practical limitations of using the correct media type for XHTML.

I’ll be revealing the answers, including all the responses to the quiz, on Sunday evening (local time). Until then, tell others who haven’t seen it yet about the quiz. I’m interested in finding out how much an average web developer really knows about the technologies they use every day.

Web Developer Quiz

This quiz is designed to test whether or not web developers have an understanding of the basic technologies used on the web, primarily HTML, HTTP, Media Types (MIME) and character repertoires and encodings. Personally, I expect every single web developer to pass this quiz with flying colours, yet reality tells me that a large proportion will struggle. So, in the interests of finding out exactly how much web developers in general do and do not know, and for your own personal benefit, I decided to publish this quiz (or survey, if you like).

Firstly, a few ground rules. Please don’t cheat. I expect all web developers to know the answers to these questions without the need for reference material or the use of automated tools. That means, please don’t make use of the validator or look up the specifications to answer these questions. They’re designed to be easy enough to answer without such tools, yet still provide enough of a challenge for all but the most knowledgeable authors. Secondly, in order to give everyone a fair go and avoid the chance of having all the correct answers given away in the first response, I’ve temporarily enabled comment moderation and no comments will be appearing until I publish the results and answers next week. Ok, so on with the quiz…

This sample document applies to the first 3 questions. You may assume the HTTP headers contain:

Content-Type: text/html;charset=UTF-8

Note: This document uses some special syntax that is not widely supported in existing browsers; it is only designed to test your knowledge of HTML.

1. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
2. <html lang="en">
3.   <title/Sample HTML 4.01 Document/
4.   <p align="right">This is a sample HTML 4.01 Strict document.
5.   <>How much do you know about HTML?</>
6.   <!-- -- --> <em>It’s not hard!</em> <!-- -- -->
7.   <p>Created by <a href=http://lachy.id.au/">Lachlan Hunt
8. </html>

Validation

Which lines in the above HTML document contain validation errors, if any? Note: I’m only looking for those errors that will be reported by a conforming SGML based validator.

Elements in the DOM

  1. How many p elements are there within the above document?
  2. Which of these elements, if any, will not be present within the Document Object Model of the above document?
    • <head>
    • <body>
    • <em>

Semantics

  1. Which markup structure is the most semantically correct for a navigational link menu, regardless of how it will be presented visually?
    1. <div class="menu">
          <a href="…">Link  1</a> |
          <a href="…">Link 2</a> |
          <a href="…">Link 3</a>
      </div>
    2. <div class="menu">
          <a href="…">Link  1</a><br>
          <a href="…">Link  2</a><br>
          <a href="…">Link  3</a><br>
      </div>
    3. <ul class="menu">
          <li><a  href="…">Link 1</a></li>
          <li><a  href="…">Link 2</a></li>
          <li><a  href="…">Link 3</a></li>
      </ul>
  2. Which markup structure is the most semantically correct for a title within the document body that may be horizontally centred in a visual medium (e.g. screen) using a large, bold font?
    1. <div class="title">Document Title</div>
    2. <h1>Document Title</h1>
    3. <p align="center"><font size="+3"><b>Document Title</b></font></p>
    4. <h1 style="font-weight:bold;font-size:large;text-align:center;">Document Title</h1>
    5. <h1 class="LargeBoldCenterHeading">Document Title</h1>

Character References

Given these three numeric character references, and two character entity references:

  • &#x2019;
  • &#8217;
  • &#146;
  • &rsquo;
  • &apos;
  1. Which ones are invalid for an HTML 4.01 document?
  2. Which ones are invalid for an XHTML 1.0 document?
  3. Which ones are invalid for a generic XML document? (assume no DTD or Schema)

Media Types (MIME)

  1. Which of these MIME types SHOULD NOT be used for an XHTML 1.1 document?
    • application/xhtml+xml
    • text/html
    • application/xml
    • text/xml
  2. Using the answer from the previous question, under what conditions MAY (according to the recommendation) an XHTML 1.0 document use that MIME type?