
Web Camp Day 2: (More) Advanced HTML

This is a summary write-up of material and concepts covered in Day 2 of the Spiders at Work Web Camp. On day 2, we talked about more advanced HTML.

Review

On day 1, we talked about the structure of an HTML document and a handful of basic tags for presenting basic markup.

Please be sure you are comfortable with the use of these tags before continuing.

Positioning: using tables for layout

You will soon discover that as you write your content, your document will grow vertically, as you would expect. And then, it's only a matter of time before you start wondering about doing layout; for example, putting things next to each other. What if you want to put a paragraph next to an image? You might first try something like this:

<img src="my_image.gif" alt="My image">
<p>Here is a paragraph describing this wonderful image.  It truly is a
wonderful image, don't you think?  I sure think so.</p>

But this gives you:

My image

Here is a paragraph describing this wonderful image. It truly is a wonderful image, don't you think? I sure think so.

...which probably isn't what you wanted. Maybe then we try putting the image at the beginning of the paragraph, as in:

<p><img src="my_image.gif" alt="My image">
Here is a paragraph describing this wonderful image.  It truly is a
wonderful image, don't you think?  I sure think so.</p>

Which gives you:

My image Here is a paragraph describing this wonderful image. It truly is a wonderful image, don't you think? I sure think so.

This is a little closer, but we probably want the top edges to be vertically aligned, so this method isn't right either.

This brings us to the emotionally-charged and infinitely-abusable notion of using tables for layout. Before we head down that dangerous path, it's worthwhile introducing tables for what they were originally intended for: the tabular representation of data, spreadsheet-style.

Tables as they were intended

Here is a sample table:

<table border="1">
<tr><th>Team</th><th>Wins</th><th>Losses</th></tr>
<tr><td>Yankees</td><td>60</td><td>49</td></tr>
<tr><td>Red Sox</td><td>56</td><td>50</td></tr>
<tr><td>Blue Jays</td><td>51</td><td>57</td></tr>
</table>

It renders like this:

Team       Wins  Losses
Yankees    60    49
Red Sox    56    50
Blue Jays  51    57

(Please note that the data in the table above was made up, in painful recognition of reality, by a rabid and perpetually-disappointed Yankee-hating Red Sox fan.)

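The best way to think of tables is as stacks of horizontal rows of data contained by a <table> tag. Rows are broken down into one or more cells, which are the individual boxes shown above. The tags relating to tables, as used in the sample, are:

<table>: the container for the whole table; the "border" attribute draws lines around the cells.
<tr>: a table row.
<th>: a header cell within a row.
<td>: a data cell within a row.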

If not given "width" attributes, <table> and <td> will constrict as tightly as possible around the data inside them, making the table as visually small as possible for purposes of efficiency.
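A quick way to see this shrink-to-fit behavior is to render the same row twice, once without and once with "width" attributes. This is only a sketch; the specific widths (100% and 25%) are arbitrary values I picked for illustration.

```html
<!-- No widths: the table shrinks to fit its contents -->
<table border="1">
<tr><td>Yankees</td><td>60</td></tr>
</table>

<!-- With widths: the table spans the window, and the first cell
     is asked to take about a quarter of it -->
<table border="1" width="100%">
<tr><td width="25%">Yankees</td><td>60</td></tr>
</table>
```

Resize the browser window and the second table should stretch with it, while the first stays as small as its data allows.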

Tables were not part of the original, first version of HTML, nor of HTML 2.0; they were standardized in version 3.2 of the language. It didn't take long for HTML authors to figure out that tables could be (ab)used for layout purposes.

Tables as they are now usually used

Tables are now almost always used to effect page layout, rather than to represent tabular data. In other words, it was figured out early on that you could put images in table cells, and bingo! You've got layout control. Consider the following example:

<table>
<tr>
<td><img src="my_image.gif" alt="my image"></td>
<td><p>Here is a paragraph describing this wonderful image.  It truly is a
wonderful image, don't you think?  I sure think so.</p></td>
</tr>
</table>

This gives you:

my image    Here is a paragraph describing this wonderful image. It truly is a wonderful image, don't you think? I sure think so.

At last count, roughly fifteen zillion people use tables for layout on their websites.

So?

So what? What's the problem with doing that? Well, it's really only of concern to purists (like this author): using tables for layout contributes to the problem of mixing form and content, discussed in the day 1 writeup. It uses a tag meant to introduce a logical structure into your document to achieve a visual effect instead. People who are visually disabled have no easy way of knowing whether a table their speech-enabled browser is describing to them is going to present them with data, or is just being used to put things next to each other.

Unfortunately, the alternatives to using tables for layout are still very few, so most people grumble and go along with it since there's little choice. In fact, this author thinks that the grid method for layout via tables is a fine way to do things; he just wishes there were tags for it that did the same things but that were called things like <layout-grid> instead of <table>, <grid-row> instead of <tr>, <grid-cell> instead of <td>, and so forth. But there aren't - at least not until XML is widely used and rendered correctly, which is a few years off still.

If you look at the source for almost any web page out there, you will find pages consisting almost entirely of tables, often nested several levels deep. Table cells (<td>) can contain entire tables, as in the following example:

<table border="2">
<tr><th>Col 1</th><th>Col 2</th><th>Col 3</th></tr>
<tr><td>some data</td><td>some data</td><td>some data</td></tr>
<tr><td>some data</td><td>
<table border="1">
<tr><td>topleft</td><td>topright</td></tr>
<tr><td>bottomleft</td><td>bottomright</td></tr>
</table>
</td><td>some data</td></tr>
<tr><td>some data</td><td>some data</td><td>some data</td></tr>
</table>

It renders like this:

Col 1      Col 2                    Col 3
some data  some data                some data
some data  topleft     topright     some data
           bottomleft  bottomright
some data  some data                some data

This is a real headache to maintain after a while, so be warned: if you use tables for layout, be prepared to put a good chunk of effort into keeping them as simple as you can.

One more warning about tables: you must be sure to properly balance and close your tags. If you forget to close your table with </table>, your table might not be shown at all! Netscape is particularly likely to not show a table with structural errors in it. See the discussion of validation, below.

Cascading Style Sheets: the right way to achieve visual effects cleanly

There's been a fair amount of discussion so far regarding the mixing of form and content, and the difficulties that result from this entanglement. How is it to be done properly?

The answer - and the problem

The answer -- kind of -- is Cascading Style Sheets (CSS), which HTML has officially supported since version 4.0. "Kind of" because although they were, and are, an elegant solution to the problem of separating form and content in HTML, they were, and are, very unevenly supported by most browsers as of this writing (September 2000). Although the CSS specification is very clear and precise, most browsers fail - and often spectacularly - when trying to render pages that use it.

It turns out that there is a small subset of CSS that works pretty reliably on most of the "4.0" browsers, that is, the 4.0 and higher versions of Netscape Navigator and Internet Explorer. Full compliance has yet to be reached by any browser, although Internet Explorer version 5.0 for Macintosh comes the closest (the Windows version is a little off, and version 5.5 is actually less compliant). The much-ballyhooed, eagerly-awaited, and very-late open-source Mozilla browser promises 100% compliance when it's finally delivered, but it's still at least a few months off.

This means that if you're writing with Cascading Stylesheets to control the look of your documents, you must be aware that people using older versions of the popular browsers will have difficulty seeing the documents as you write them. At some point, probably by 2002 or 2003, authors will be able to reasonably count on browsers correctly rendering CSS; until then, it's touch-and-go.

This site, for example, is being written with as-of-this-writing bleeding-edge XHTML 1.0, which also uses CSS for formatting. Probably, most of you reading it in late 2000 are not seeing it correctly, although it should be perfectly readable. Want to test your browser to see if it renders CSS correctly? Take a look at this image first, and then look at this page. They should be identical. If they aren't, your browser cannot handle CSS and correct HTML. But don't feel bad; most can't.

So why use CSS?

Three reasons:

Furthermore, there's an important concept called graceful degradation that ensures that when you write HTML with CSS, users with browsers that can't handle it will at least be able to render the document well enough to be read. In a nutshell, graceful degradation means sticking to the logical tags to mark up your text, so that if the styles can't be rendered by the browser, the default styles will kick in. For example, if you want to change what level 1 headers look like, then define styles for the <h1> element. Don't use styles just to format ordinary text to be big and however else you want it to look. If you rely on styles alone without the logical structure, then browsers unable to render the styles will have no logical structure to fall back on.
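Here's a rough sketch of the difference, using syntax that is explained in the sections that follow. The class name "fake-header", the header text, and the particular colors and sizes are made up for illustration.

```html
<head>
<title>Graceful degradation</title>
<style type="text/css">
<!--
h1 { color: navy; font-size: 2em; }
.fake-header { color: navy; font-size: 2em; font-weight: bold; }
-->
</style>
</head>
<body>
<!-- Degrades gracefully: without CSS, this is still a level 1 header,
     and the browser's default header style kicks in -->
<h1>Season Standings</h1>

<!-- Does not degrade: without CSS, this is just ordinary text in an
     ordinary paragraph -->
<p><span class="fake-header">Season Standings</span></p>
</body>
```

With CSS working, both lines should look about the same; without it, only the first is still recognizably a header.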

But we're getting ahead of ourselves; first, we need to look at what CSS is and how it works.

Simple styles with CSS

Cascading Stylesheets achieve the separation of form from content by defining (and optionally naming) styles that may be arbitrarily attached to tags via the new "style" or "class" attributes. A very simple example looks like this:

<p style="color: yellow; background: black; font-size: 1.5em">This is large
yellow text on a black background</p>

This is large yellow text on a black background

Note: This and all subsequent examples are rendering real CSS on the page, as is the case with the whole document. If your browser does not render CSS correctly, then you might not see the full effect! We'll stick to CSS that's likely to be rendered correctly on most modern browsers.

In the example above, we see the familiar <p> tag, but with a new attribute: a long "style" attribute. This is CSS in action! The "style" attribute contains information about how the text within the <p> tag should look. In this case, we're specifying three CSS style attributes (the CSS specification calls them "properties"): color, background, and font-size. The colors are straightforward enough; font-size is set to 1.5 "em", where one "em" is the current font size. This means increase the current font size by a factor of 1.5 (a 50% increase) for this element only.

Having styles on all of your tags isn't really much of an improvement over older ways of doing things, though; you still have style (form) information scattered through your document (content). That's what we're trying to get away from. The real strength of CSS lies in its ability to define and name styles that can be attached to elements. You can change the way all of your <p> tags look at once, for example, by defining a style for <p> at the start of your document, in a <style> tag within the <head> section. This is best shown by example. The following is a sample <head> section of an HTML page defining styles:

<head>
<title>My Stylish Document</title>
<style type="text/css">
<!--
p { color: yellow; background: black; font-size: 1.5em; }
-->
</style>
</head>

Since there's a fair amount here that's new, let's look it over line-by-line.

  1. <head>

    We've seen this; it's the opening of the <head> section of the document, before the <body> section starts.

  2. <title>My Stylish Document</title>

    The standard <title> element for the document, with the name of the web page as it should be displayed in the browser's title bar.

  3. <style type="text/css">

    This is new: the <style> tag, which marks the opening of a section defining CSS styles. It is marked as CSS by the "type" attribute, whose value is "text/css"; there are actually other kinds of stylesheets besides CSS, but most of them are theoretical. Just copy this line as it stands to be safe.

  4. <!--

    This is the beginning of an HTML comment, which means hide everything between it and the closing -->. This is purely for backwards compatibility with older browsers that don't understand the <style> tag; browsers are instructed to ignore tags they don't recognize but try to display the content as best they can. Putting the style definitions in an HTML comment like this ensures that older browsers won't display the style definitions in the document, since they don't know what else to do with it. Clever, eh?

  5. p { color: yellow; background: black; font-size: 1.5em; }

    This is a style definition. Since it begins with a "p" (the part before the curly-brackets says which elements the style applies to), it means we're defining a style for the document that should be used for every <p> tag.

    The style data is contained between { curly-brackets }; each style attribute is given in the form name: value, and they are separated by semi-colons. Styles do not have to be combined into one line like this; as long as they are properly enclosed by the { curly-brackets } and separated by semi-colons, they can span multiple lines.

    The color attribute specifies the color of the element's data (for a paragraph, that means the text); the background is the element's background color; the font-size attribute specifies the size of the text within the style, in this case making it 1.5 times larger than it would normally be.

  6. -->

    This closes the comment started above, after all of the styles have been defined (in this case, just one).

  7. </style>

    This closes the <style> tag.

  8. </head>

    This closes out the <head> tag. The document would continue at this point with a <body> tag and the document's content.

This is pretty nice; with this definition, every <p> tag in the document will inherit the style described in the document's <style> tag. You don't need to do or say anything; plain old <p> tags will now have big yellow text on black backgrounds. If you decide that you want red text instead of yellow, all you have to do is change the style definition. Even if you have 100 paragraphs (especially if you have 100 paragraphs!), they all instantly inherit the new style. The separation of form from content is complete here, because we've separated what the data is (it's a paragraph, as indicated by the <p> tag) from what it looks like (the style information is in one place, in the header of the document).

It gets better, though. The "Cascade" part of Cascading Stylesheets is very important and powerful, and it kicks in like this: any style can be overridden by more specific styles as needed. Every element inherits the styles of every element that contains it, starting at the site-wide level, proceeding to the document level, and then into the cascade of block and inline tags (see the section on block and inline elements, below). For example, we have defined a style for <p> tags for the whole document in the example above. We could still apply styles to individual <p> tags within the document itself, and those styles, being further down the cascade, would override the document-level ones. So if we wanted to have a single paragraph with green text, we could just write this:

<p style="color: green">This paragraph will have green text on a black background.</p>

The paragraph above inherits the big-yellow-black background style from the document-level specification in the <style> tag; then the tag-level "color: green" style kicks in and overrides just the "color" part of the document-level specification, leaving us with big-green-black background.
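Putting the two pieces together, a complete page might look something like the following sketch. The paragraph text is invented; the style definition is the same as before, just spread over several lines.

```html
<html>
<head>
<title>My Stylish Document</title>
<style type="text/css">
<!--
/* Document-level style: applies to every paragraph */
p {
  color: yellow;
  background: black;
  font-size: 1.5em;
}
-->
</style>
</head>
<body>
<!-- Uses the document-level style as-is -->
<p>This paragraph has large yellow text on a black background.</p>

<!-- The tag-level style overrides only "color"; background and
     font-size still come from the document-level style -->
<p style="color: green">This paragraph has large green text on a black
background.</p>
</body>
</html>
```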

It gets better still: it's possible to define styles and not attach them to any particular element, but have them available for use wherever it's appropriate. These are called class selectors, and have a similar syntax to style declarations, except that they start with a period and a name, rather than a tag name, like so:

.urgent { font-weight: bold; color: black; background: red; }

You can then attach this style to any element, giving it a "class" of "urgent", like so:

<p class="urgent">This paragraph is saying something really urgent.</p>
<p>This is a normal paragraph <span class="urgent">with an urgent section</span>.</p>

This paragraph is saying something really urgent.

This is a normal paragraph with an urgent section.

Notice how for the first paragraph, the whole paragraph is done in the "urgent" style; in the second, only the phrase surrounded by the <span> tag is marked with that style.

Note: Avoid the temptation to name your styles after what they look like. For example, you might initially think to name the style in the above example something like "boldred". This is a bad idea because we're back into the form-content muddle again; the whole point of having styles is to separate your form from your content so that you can easily change the form. If you decide, after writing 20 pages of HTML, that the bold red is too intense, and you want to go with italic green instead, you could change the style easily enough, but it would still be called "boldred", which doesn't make sense for something italic and green. Name your styles after their function, not their appearance.
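To make that concrete, here is a sketch of such a redesign. The new look (italic green on white) is just an example; the point is that only the style definition changes, while the name and the markup that uses it stay put.

```html
<style type="text/css">
<!--
/* Was: .urgent { font-weight: bold; color: black; background: red; } */
/* Now italic green, but the name still describes what it is for */
.urgent { font-style: italic; color: green; background: white; }
-->
</style>

<!-- The markup in the document body is untouched by the redesign -->
<p class="urgent">This paragraph is saying something really urgent.</p>
```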

To make things even more interesting, you can define styles within styles, to match elements that occur only in other elements. For example, in this document, <code> tags have a style of "color: purple; background: white". <pre> tags have a style of "background: #CCCCCC" (which defines a light grey background). If I put a <code> tag inside a <pre> section, the white background of the code section overrides the grey background of the pre section, which is ugly and not at all what I want.

To fix this, you can define styles that say "apply this style to all code tags within pre tags". The definition looks like this:

pre code { background: #CCCCCC; }

This sets the background for <code> tags inside <pre> tags to #CCCCCC, the same value as the style for <pre> tags.
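As a sketch, the three definitions described above could sit together in a document like this (the title and the sample text are invented for illustration):

```html
<head>
<title>Code samples</title>
<style type="text/css">
<!--
code { color: purple; background: white; }
pre { background: #CCCCCC; }
/* Only code tags inside pre tags: match the grey background */
pre code { background: #CCCCCC; }
-->
</style>
</head>
<body>
<p>In a paragraph, <code>code</code> keeps its white background.</p>
<pre>
Inside a pre section, <code>code</code> gets the grey background.
</pre>
</body>
```

The "pre code" definition wins inside <pre> sections because it is more specific than the plain "code" definition; everywhere else, the plain one still applies.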

Finally, perhaps the greatest power of Cascading Style Sheets is the ability to keep all of your styles in one central document and have all of your pages refer to it via a <link> tag in the header, so your whole site can share styles without having to describe them repeatedly in each document's <style> section. You can then make a change in your central stylesheet document, and every document in your site that links to it will be instantly updated. Stylesheet documents are typically named with a ".css" extension, and are referred to like this:

<link rel="stylesheet" type="text/css" href="my_styles.css">

The <link> tag goes in the document's <head> section, and should precede the <style> tag if you have one, so that the document's declared styles can correctly override the stylesheet's if necessary.
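Here's a minimal sketch of that ordering. The contents of my_styles.css shown in the comment are assumed for the sake of the example, as is the page-specific override.

```html
<!-- Assumed contents of the shared file my_styles.css:

     p { color: yellow; background: black; }
     .urgent { font-weight: bold; color: black; background: red; }
-->
<head>
<title>A page sharing site-wide styles</title>
<link rel="stylesheet" type="text/css" href="my_styles.css">
<style type="text/css">
<!--
/* Declared after the link, so it overrides the shared paragraph
   color for this page only */
p { color: white; }
-->
</style>
</head>
```

On this page, paragraphs would keep the black background from the shared stylesheet but get white text; every other page linking to my_styles.css is unaffected.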

The world of CSS

We've really done a whirlwind tour here, and haven't gone into great detail on any of the points. That's far beyond the scope of this training and this document. I can specifically recommend one book:

Cascading Style Sheets: The Definitive Guide, by Eric A. Meyer, published by O'Reilly.

It's the best treatment I know of on the whole subject, including the limitations of current browsers, bugs in current implementations, and using stylesheets in far more advanced ways than it's been possible to show here. It's a long road, but extremely worthwhile. Enjoy!

Block and inline elements: what's the difference?

One other point it's important to touch on regarding HTML structure is the difference between block and inline elements. There are some problems you'll run into if you aren't clear on the differences.

The actual precise technical definitions are extremely complicated, but the rule-of-thumb versions that will get you by are much simpler, and they run like this:

Block elements are typically rendered with one blank line between them, and may generally contain other block elements and inline elements (there are exceptions: a paragraph, for instance, may contain only inline elements). Examples include paragraphs, headers, lists, and tables.

Inline elements are typically rendered with no vertical space between them, and may only exist within block elements, and can only contain other inline elements. They cannot contain block elements. Examples include images, line breaks, links, and bold/italic formatting.

This means, for example, that the following HTML is wrong:

<b><p>This paragraph will be bold.</p></b>

It's wrong (although most browsers will probably render it as a bold paragraph) because you have a block tag (<p>) inside an inline tag (<b>). You can't have a bold section include a paragraph. That's against the rules. (What rules? See "validation" in the next section.) A paragraph can contain a bold section, though.

The correct sequence would be:

<p><b>This paragraph will be bold.</b></p>

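More importantly, it means you can't have something like an image just hanging out between paragraphs, because it's an inline element. It has to be inside a container. (This is the rule under the Strict document types; the more lenient Transitional ones let it slide.) So the following is wrong: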

<p>A paragraph.</p>
<img src="my_image.gif" alt="my image">
<p>Another paragraph.</p>

It would have to be:

<p>A paragraph.</p>
<p><img src="my_image.gif" alt="my image"></p>
<p>Another paragraph.</p>

This is important mostly because some of the errors you will encounter when validating your pages will be of this nature. So let's talk about validation now.

Validation and accessibility issues

Validation is the universally-neglected process of making sure your HTML is syntactically valid, or correct. HTML, like all computer-based languages, has a syntax that must be adhered to, or weird things happen. Valid HTML will display predictably on all browsers that can display correct HTML. Invalid HTML might display predictably, unpredictably, or not at all. If you want to make sure the content you're working so hard to get on the web will be accessible and readable as widely and broadly as possible, you should validate all of your pages to make sure they're correct.

You validate your pages by running them through a program called a validator. There are a variety of validators out there; one of the best is the W3C's, at http://validator.w3.org. It also allows you to upload pages directly from your computer for checking.

Be prepared! Virtually any page you upload will have errors in it - some will have hundreds. It can seem overwhelming; it's much easier to start with a valid structure and build up from there than to try to correct a document that has errors and has been worked on a lot.
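As a starting point, a skeleton along these lines should validate. It's a sketch that assumes the HTML 4.01 Strict document type and the ISO-8859-1 character set, neither of which this write-up prescribes; the title and content are placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>My Valid Document</title>
</head>
<body>
<h1>My Valid Document</h1>
<p>Content goes here.</p>
</body>
</html>
```

The first line tells the validator which version of HTML to check the page against. From here, add a little content at a time and revalidate as you go.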

Here are some of the more common errors you'll run into when validating:

My pages don't validate, but they look fine!

Maybe they do look fine on your browser. How do you know how they look on mine? I'm probably using a different browser and version, different fonts, different platform, different screen size, and different window size than you are. The little HTML error your browser is tolerating may be choking my browser. If you validate your HTML, you know it has no errors and it will render fine on my machine.

(Giant Company X)'s pages don't validate. Why should I worry about it if they don't?

If two million people do a foolish thing, it is still a foolish thing.

- Opus, Bloom County

Giant Company X probably spends unthinkable amounts of money trying to detect your browser and sending you HTML that displays properly on your exact version. How they curse when a new version comes out! How they rant and rave when a new browser is released and they have to tune their website again for it! (Or how they don't, and just disregard market segments that can't render their pages.)

What's this about Bobby and disabled users?

The Bobby validator, at http://www.cast.org/bobby/, checks your pages for accessibility issues. It has lots of good information about writing HTML that is accessible to users with disabilities. If you are concerned about ADA compliance, you should validate with the Bobby validator.

You should also visit the Web Accessibility Initiative at http://www.w3.org/WAI/. The WAI, a project of the World Wide Web Consortium (the body that maintains the HTML specification), has its own rating system, which specifies three levels of accessibility: levels A, AA, and AAA. Level A is the minimal baseline for designing for users with disabilities; the Bobby validator corresponds approximately to this level. Level AA requires valid HTML and is increasingly required for projects that are federally funded; Bobby compliance, although an important start, does not necessarily meet federal guidelines for accessibility.

Does this mean that you can write invalid HTML and still be Bobby-compliant? Yes, although I'm not sure I'd see the point of doing so. Invalid HTML may also hinder Bobby's ability to check your pages for compliance. Plus, if you are interested in full accessibility for your pages, or if you are receiving federal funding and are required to reach ADA-levels of compliance (speaking generally; as far as I know, the ADA has not released official statements to date on the issue), then Bobby is a first step, but WAI level AA is what you should be meeting minimally, which does require valid HTML.

More practically, it's much easier to run your pages through Bobby and get a simple "everything's fine" than to have to go through individual errors in HTML and decide case-by-case whether they affect usability.

The bottom line is that unless you have a very good reason not to (and this reason should be good enough to explain to a disabled user of your site), your site should contain valid, Bobby-compliant HTML.

Why HTML editors are the bane of humanity and the web

I'm not going to name any of the point-and-click, WYSIWYG (What You See Is What You Get) HTML editors that are so rampantly popular. They hold out the promise of insulating users from the need to learn and write HTML, presenting a word processor-like interface in place of text and tags. Some word processors even offer a "Save as HTML" option for files, as the web continues to grow and expand.

There are several reasons why HTML editors are the bane of humanity and the web.

OK, clearly this is all a matter of opinion. My opinions on these matters are admittedly way out there at one end of the spectrum. It's hard being a purist in a complex world.

The truth is, most people do use graphical HTML editors. I personally think they cause far, far more problems than they solve. I think the current messy state of the web and HTML is due in large part to the use of these editors; bad HTML begets browsers with the ability to display bad HTML which begets worse HTML and so forth.

HTML is not a terribly difficult language as languages go. It has some nuances which are important to understand and which cannot be handled by graphical editors: the importance of the separation of form and content; the nature of HTML as a markup language and not a layout language. Even under the best of circumstances, editors mangle these concepts; at worst, they generate invalid HTML that displays unpredictably, disadvantage users with disabilities (whose browsers now have the added challenge of trying to disentangle the meaning intended behind non-structural markup created purely for visual effect), and needlessly distance authors from the language they're writing in.

Bottom line: use an editor if it improves your work. But be sure you understand what's happening under the hood as well.

Summary

In this section, we covered some more advanced HTML relating to how your site looks, and quality and accessibility issues. Next in the day 3 writeup is multimedia - images, sound and video.



Copyright (c) 2000-2001 Steve Linberg, Silicon Goblin Technologies. All rights reserved.