This is a summary write-up of material and concepts covered in Day 1 of the Spiders at Work Web Camp. On day 1, we talked about beginning HTML.
Web pages are built by combining the content of the page -- the headers, text, links, images, multimedia, and whatever else you may have -- with markup tags in the HyperText Markup Language, or HTML, which describe your document's structure and appearance.
HTML is a markup language used to
structure your text for display in a web browser. HTML tags are enclosed
between angle brackets, like this: <strong>. Most tags have
an opening and a closing version that mark the beginning and end of their
effect. The opening version has the tag name between the angle brackets
(perhaps followed by one or more attributes that give additional detail,
like an image's dimensions, or modify the tag's effect, like alignment for
text), and the closing version has just the tag name preceded by a
forward slash ("/"). For example, a simple paragraph tag with its enclosed
text might look like this (color added for emphasis):
<p>This is a simple paragraph, marked with the HTML 'p' tag.</p>
It is important to note that the <p> tag is a container, whose end is marked with the </p> tag;
the idea is that the <p> tag contains
everything between it and the closing </p> (in this case,
the text "This is a simple paragraph, marked with the HTML 'p' tag.").
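As a minimal sketch of the opening/closing pattern described above, here is a paragraph whose opening tag carries an attribute. The align attribute and its value are chosen only for illustration (it is one of the presentation attributes that later versions of HTML discourage); the point is that the closing tag repeats just the tag name.

```html
<!-- Opening tag with one attribute; the closing tag has the tag name only -->
<p align="center">This paragraph is centered.</p>
```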
Note: There are many cases where HTML is
used imprecisely (or just plain incorrectly) by popular web-design programs
-- and by humans. Browsers are very forgiving in that they will do their
best to display incorrect HTML, and often can do so well enough that people
don't notice that it's incorrect. This has actually created more problems
than it solves, as we will learn. As an example, many people incorrectly
consider the <p> to be a separator
rather than a container, and write HTML like this:
<p>This is a simple paragraph.<p>This is another paragraph.<p>(etc...)
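Although this produces much the same result visually in most browsers as the container approach does, it misrepresents the structure: each <p> is still a container, and the browser has to guess where it ends. HTML 4 lets the browser infer a missing </p>, but XHTML (now an official W3C recommendation) requires every element to be closed explicitly and will not tolerate this habit.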
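For comparison, the same content written with <p> as a container might look like the sketch below; the only change is that each paragraph is explicitly closed.

```html
<!-- Each paragraph is opened and closed explicitly -->
<p>This is a simple paragraph.</p>
<p>This is another paragraph.</p>
```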
A small number of tags are stand-alone, rather than containers, and do not have a closing tag. The main ones are:
<img> (image)
<br> (line break)
<hr> (horizontal rule)
These tags may stand alone between other elements, as in:
<p>Here is yet another paragraph, followed by a horizontal rule.</p><hr><p>And here is still another (short) paragraph.</p>
Note: Beginning with XHTML, it is permissible to use the XML syntax of closing a stand-alone element with a trailing slash. For compatibility with current browsers, you should put a space between the tag name and the trailing slash, as in:
<hr />
In XML, this kind of closure is required, so it's not a bad thing to be aware of as the web continues to creep in that direction.
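Applied to the other stand-alone tags listed earlier, the same trailing-slash form might look like this. It is a sketch in XHTML-style syntax, and the image file name is a placeholder.

```html
<!-- Stand-alone tags closed XHTML-style, with a space before the slash -->
<p>First line<br />
second line</p>
<hr />
<img src="my_image.gif" alt="My image" />
```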
As a final preliminary note about HTML structures, HTML tags can contain other HTML tags, and this can go to any depth (theoretically). For example, this is how we might render a paragraph, which has a part marked for emphasis, and a part of the emphasis marked for extra strength:
Power corrupts, <em>and <strong>absolute power</strong> corrupts absolutely</em>.
Your browser renders the above HTML like this (color added for emphasis):
Power corrupts, and absolute power corrupts absolutely.
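One way to check nesting is to confirm that tags close in the reverse of the order in which they opened. The sketch below contrasts the properly nested sentence with an overlapping version; the overlapping line is an illustrative mistake added here, not something from the camp material.

```html
<!-- Properly nested: <strong> opens and closes inside <em> -->
<p>Power corrupts, <em>and <strong>absolute power</strong> corrupts absolutely</em>.</p>

<!-- Incorrect: <em> closes while <strong> is still open, so the tags overlap -->
<p>Power corrupts, <em>and <strong>absolute power</em> corrupts absolutely</strong>.</p>
```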
There are 10 or 15 tags that are commonly used in basic HTML; learning HTML includes learning these tags and how they are used. Before we go on to specific tags, we'll continue with a few more general observations.
One of the biggest difficulties (if not the biggest difficulty) that has emerged over the first ten or so years in authoring for the World Wide Web is the separation of form and content.
The form of an HTML document is the markup that describes its visual appearance: fonts, colors, spacing, sizes, margins, borders, and other presentation information. The content is the combination of the logical structure of the document (body paragraphs, headers, bulleted and numbered lists, block quotes) and the document's actual text content (the writing).
The separation of form and content refers to the effort to keep what your document looks like separate from what it says. An extreme example of confusion between form and content is a student who hands in a paper with different fonts and margins, but the same content, and calls it a new draft. In this case, the form has been altered, but the content is unchanged.
When form and content become tangled in HTML documents, the documents become increasingly difficult to work on, because the actual content can get buried so deeply in presentation markup that it becomes difficult to see. It also gets very difficult to make changes as a site grows, and more and more pages are added; the result is sometimes referred to as "spaghetti code" that makes you go crosseyed when you try to look at it and figure out what parts you need to change.
As an illustration, let's pretend that you're using the COLOR attribute of the FONT tag (form) to make all of your level 2 headers (content) green. You might use something like this in early versions of HTML:
<h2><font color="green">My level 2 header</font></h2>
While syntactically legal, this will get you into trouble down the road if
you ever want to change things, because we have a tangle of form (the
appearance, which is the <font color="green">...</font>) and content (which is <h2>My level 2
header</h2> - the text plus its logical structure as an
<h2>). Although it works to present the effect you
desire, you have to type it out every time you want to use it, and if you
decide later on that you want your headers to be blue instead of green, you
have a long and tedious task ahead of you, changing every instance by hand.
This is a developer's nightmare.
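As a rough preview of the alternative (style sheets are covered in the Day 2 writeup), a single style rule can carry the color while the header markup stays purely structural. The rule and page below are an assumed example, not the camp's actual stylesheet.

```html
<html>
<head>
  <title>My Web Page</title>
  <!-- Form: one rule colors every level 2 header -->
  <style type="text/css">
    h2 { color: green; }
  </style>
</head>
<body>
  <!-- Content: the headers carry no presentation markup -->
  <h2>My level 2 header</h2>
  <h2>Another level 2 header</h2>
</body>
</html>
```

Changing green to blue in that one rule would recolor every level 2 header on the page, with no header edited by hand.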
In this training, we're going to take the "purist's" approach to design and try to keep a clean separation of form and content. While requiring more discipline at first, it makes the "down-the-road" maintenance of the site far, far easier.
This approach means that the first step of the page-creation process is to create your content and use HTML to mark up its logical structure, without worrying about the way it looks. After the content and structure are complete, you can work on the way it looks. (OK, in all honesty, it's rarely that clean a separation, but that's what we strive for. Try to keep what your web page is and says separate from how it looks in your thinking; the two can be joined later.)
There's enough of a similar structure in web documents that it's useful to present a simple template you can always start with. Here's a basic one:
<html>
  <head>
    <title>(your title here)</title>
  </head>
  <body>
    (your content here)
  </body>
</html>
Replace the "(your title here)" phrase with the title of the page as you would like it to appear in the browser window's title bar, and the "(your content here)" with the marked-up content of your page. Here's a simple example:
<html>
  <head>
    <title>My Web Page</title>
  </head>
  <body>
    <h1>My Web Page</h1>
    <p>This is my web page. It's very basic, but it works!</p>
  </body>
</html>
That's enough for a complete web page! Let's look at the sections.
The "outermost" tag is <html>, which encloses the entire
document. This tells the browser that the contents of the file are HTML
text. It might seem redundant to tell a web browser that it's rendering
HTML, but there are devices other than browsers (like cell phones, handheld
computers, and eventually your blenders and toasters) that can connect to
the web, and the <html> tag sets the context for the
document.
Within the <html> tag are two sections, <head> and <body>. The <head> section
contains information about the document, most of which is not displayed.
This includes the document's title as it should show in the browser
window's title bar, information about any stylesheet(s) it may use, "meta"
tags that encode information about the document that might be used by
search engines for retrieval (like author, description, keywords), and
other information related to the document's content. The <title> tag is the only required element of the <head> section; everything else is optional.
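A <head> section holding the kinds of information listed above might look like the following sketch. The stylesheet file name, author, and other values are placeholders invented for illustration.

```html
<head>
  <!-- Required: shown in the browser window's title bar -->
  <title>My Web Page</title>
  <!-- Optional: an external stylesheet (hypothetical file name) -->
  <link rel="stylesheet" type="text/css" href="style.css">
  <!-- Optional: "meta" information that search engines may use -->
  <meta name="author" content="A. Student">
  <meta name="description" content="A very basic example web page.">
  <meta name="keywords" content="example, html, basics">
</head>
```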
The <body> section is where the content of your document
goes. The <body> section can therefore be quite long, if
the content of your page is long. That's fine; it can be as long as it
needs to be. In the example above, the body consists of a level 1 header
with the text "My Web Page", and the short paragraph following it. Not
very exciting, but functional and correct, and you can use this template
with confidence for any web page you create.
Note: While we're on the topic of strict correctness, there's actually one more piece that you need at the beginning of your document if you want it to be completely correct, which is the somewhat-obscure-and-often-omitted DOCTYPE declaration. The DOCTYPE declaration is a complicated-looking thing that resembles an HTML tag (it's not, actually; it's a declaration from SGML, the older markup standard that HTML is defined in, but we're not going there right now; it's enough to know that it's there in a correct document). There are different versions of this declaration for different versions of HTML; it's a copy-and-paste operation for whichever one you use. The DOCTYPE declaration's purpose is to tell the browser what kind of markup is being used in the page, so that the browser can make any internal technical adjustments needed to display the page as intended.
The DOCTYPE declaration is the first line of the page, before the <html> tag. The example below shows the
same document with the HTML 4.0 strict DOCTYPE declaration.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
  <head>
    <title>My Web Page</title>
  </head>
  <body>
    <h1>My Web Page</h1>
    <p>This is my web page. It's very basic, but it works!</p>
  </body>
</html>
The basic tags you will use most often, in all likelihood, include the following:
<p>
<em> / <i>
<strong> / <b>
<font> (size, face, color)
<br>
<h1> ... <h6>
<ul> and <ol>
<img>
<a>

<p>...</p>
The <p> tag is the building block of most HTML documents.
It delimits a body paragraph; paragraphs are generally rendered with one blank
line between them (or between them and other basic tag structures). Each
paragraph should be in its own <p>... </p> structure, as in the following example:
<p>There can be no doubt that vanilla ice cream is hardly worth one's time. I would rather suck an ice cube than eat plain vanilla ice cream; indeed, I can hardly see the difference. Take chocolate ice cream, subtract the flavor, and vanilla is what you're left with.</p>
<p>Even worse than the blandness of vanilla, though, is the abomination of any kind of nuts in any kind of ice cream. Nuts are generally bitter, and their texture is entirely wrong in the context of cold, sweet ice cream; they stick in one's teeth, which can be pleasant enough when salt is the context, but in the case of ice cream serves only to distort the fundamental experience, which is that everything in it should either melt or dissolve naturally.</p>
<p>Ice cream should be rich in chocolate, coffee, candies, chips, caramels and sweet sauces; it should be devastating to the senses. The notion of "refined" or even "harmless" ice creams is surely alien and should be resisted by all right-thinking people.</p>
The example above renders like this in your browser (color added for emphasis):
There can be no doubt that vanilla ice cream is hardly worth one's time. I would rather suck an ice cube than eat plain vanilla ice cream; indeed, I can hardly see the difference. Take chocolate ice cream, subtract the flavor, and vanilla is what you're left with.
Even worse than the blandness of vanilla, though, is the abomination of any kind of nuts in any kind of ice cream. Nuts are generally bitter, and their texture is entirely wrong in the context of cold, sweet ice cream; they stick in one's teeth, which can be pleasant enough when salt is the context, but in the case of ice cream serves only to distort the fundamental experience, which is that everything in it should either melt or dissolve naturally.
Ice cream should be rich in chocolate, coffee, candies, chips, caramels and sweet sauces; it should be devastating to the senses. The notion of "refined" or even "harmless" ice creams is surely alien and should be resisted by all right-thinking people.
<em>...</em> and <i>...</i>
The <em> tag is used to indicate emphasis; emphasis is typically (although not always) rendered in italics. It is
used to mark a section of text that the browser should render in a manner
that connotes emphasis. For example:
There can be no doubt that vanilla ice cream is <em>hardly worth one's time</em>.
The example above renders like this in your browser (color added for emphasis):
There can be no doubt that vanilla ice cream is hardly worth one's time.
<em> is known as a logical tag, in that it describes its content's logical
structure. It tells the browser, "Render me in a way that indicates
emphasis". Note that it does not say "Render me in
italics," although most browsers do; some do not, such as older text-only
browsers that can't do italics but may do color or underlines instead, or
speech-readers for the blind that will speak the words louder or with some
other appropriate indicator of emphasis.
The tag <i>, for "italics", is known as a physical tag; it tells the browser "Render me in
italics." It does not connote any information about why
its contents should be rendered in italics; only that it should be
presented in an italic typeface. Most conventional uses of italics are to
connote emphasis, and <i> is often considered a form of
shorthand for <em>, but it has the important distinction
of being a physical tag rather than a logical one.
The following HTML:
This sentence shows some text <i>marked up with the "i" tag</i>, and some text <em>marked up with the "em" tag</em>.
...generates this:
This sentence shows some text marked up with the "i" tag, and some text marked up with the "em" tag.
Which should you use? It depends on your preference, style, and purpose.
Generally speaking, if what you want is to render text physically in
italics, then use the physical <i> tag, but be aware that
people who are visually disabled may not understand the meaning if they are
unable to perceive the visual style of italics. If you want to put
emphasis on the, well, emphasis of the text, then you may be better served
by the logical <em> tag.
(This site uses the logical <em> for emphasis throughout.)
<strong>...</strong> and <b>...</b>
Similar to the logical / physical application of <em> and
<i> for italics is the logical / physical application of
<strong> and <b> for boldface text.
The logical <strong> tag tells the browser to render the
enclosed text strongly, which most browsers do using boldface text. The
physical <b> tag explicitly tells the browser to render the
enclosed text using boldface text.
Again, the HTML:
This sentence shows some text <b>marked up with the "b" tag</b>, and some text <strong>marked up with the "strong" tag</strong>.
...generates this:
This sentence shows some text marked up with the "b" tag, and some text marked up with the "strong" tag.
Which to use? As with <em> and <i>
above, go with what makes sense according to your application.
<font>...</font>
The <font> tag allows you to achieve some visual font
effects, such as changing the size, color, and typeface of text.
Note:
The <font> tag is already deprecated in the most recent versions of HTML, meaning its use is
discouraged (even forbidden outright when you get right down to it) in
favor of Cascading Style Sheets, which
we will see in the Day 2 writeup. It is presented here for historical
reasons, and because you will undoubtedly encounter it. It is being
dropped because it is one of the main culprits in the tangle of form and
content referred to earlier; it persists to some degree today because many
of the old browsers still in use do not support the new recommended
methods. More on all this later on.
With this description of the <font> tag comes something
we haven't seen yet: attributes.
Attributes are properties of tags that modify their function or appearance.
Attributes appear inside the opening definition for a tag, and should
appear as an attribute name, an equals sign ("="), and a value in quotes.
There can be any number of attributes for a tag, and the order doesn't
matter. Although the quotes are optional in earlier versions of HTML,
modern versions require them, so it's good to be in the habit of providing
them.
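To illustrate the point about ordering, the two lines below should be equivalent: the same two attributes appear in a different order, each written as a name, an equals sign, and a quoted value. The "size" and "color" attributes are described in the text that follows; the enclosed text is placeholder content.

```html
<!-- Same attributes, different order: the effect should be identical -->
<font size="+1" color="green">bigger green text</font>
<font color="green" size="+1">bigger green text</font>
```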
The <font> tag by itself doesn't do anything; it requires
at least one attribute. Here is an example of the use of the color
attribute to turn some text green:
We're using the "color" attribute of the "font" tag to turn <font color="green">this text green</font>.
Here's how the above is rendered in your browser:
We're using the "color" attribute of the "font" tag to turn this text green.
You can specify the following colors by name: black, green, silver, lime, gray, olive, white, yellow, maroon, navy, red, blue, purple, teal, fuchsia, and aqua. Note again that support for this in older browsers is inconsistent, and it is strongly discouraged in modern HTML.
You can change the font's size with the "size" attribute, which may take either numeric values from 1 to 7 or relative adjustments such as "+1" or "-1". These numbers are steps on a scale, not point sizes: you can't set an absolute point size, but you can size text relative to the default font the user has configured in their browser. The default font size is 3. To make the font bigger, you can do this:
We're using the "size" attribute of the "font" tag to make <font size="+1">this text bigger</font>.
Here's how the above is rendered in your browser:
We're using the "size" attribute of the "font" tag to make this text bigger.
(Note how in both examples, the terminating tag is simply </font>. You do not need to mention the attributes in the closing tag.)
You can also combine attributes, like so:
We're using the "color" and "size" attributes of the "font" tag to make <font size="+1" color="green">this text bigger AND green</font>.
We're using the "color" and "size" attributes of the "font" tag to make this text bigger AND green.
Another common attribute to use with <font> is "face",
which allows you to request a specific font or font
family for display. You are not guaranteed that the font
you want will be available, so it's not wise to count on this.
Technically speaking, "face" began life as a browser extension: it was not
part of HTML 3.2, and HTML 4 accepts it only in its Transitional flavor,
where it is deprecated along with the rest of the <font> tag. Your pages
cannot be syntactically valid under the strict DOCTYPE (more on
this later) if you use it, and its use is discouraged in favor of
Cascading Style Sheets. (In fact, my showing it here means that this
page itself cannot be made to validate; we'll make this sacrifice in the
name of education.)
Here's how it's used:
We're using the "face" (pseudo-)attribute of the "font" tag to request <font face="arial, helvetica">a sans-serif font for this text</font>.
We're using the "face" (pseudo-)attribute of the "font" tag to request a sans-serif font for this text.
<br>
As you may have noticed, HTML treats whitespace (characters that you can type but that are invisible and used for spacing: tab, space, and return/enter) a little strangely: it collapses any sequence of one or more whitespace characters, in any combination, into a single space when displayed. (This is actually done for very good reasons and is very useful.) However, if you do a paragraph like this in HTML:
<p>Halfway down the stairs is a stair where I sit

There isn't any other stair
quite like
it.</p>
You get this in your browser:
Halfway down the stairs is a stair where I sit There isn't any other stair quite like it.
You might think about making each line its own paragraph:
<p>Halfway down the stairs is a stair where I sit</p><p></p><p>There isn't any other stair</p><p>quite like</p><p>it.</p>
...but that yields this:
Halfway down the stairs is a stair where I sit
There isn't any other stair
quite like
it.
...which, although it's what you asked for by making each line into a
paragraph, probably isn't what you wanted, because of the blank lines placed
between the paragraphs (and isn't what you meant really, because lines in a poem
usually aren't really paragraphs themselves). What you
really want is the ability to force the browser to break the lines where
you want them, and that's what the <br> tag is for:
<p>Halfway down the stairs is a stair where I sit<br><br>There isn't any other stair<br>quite like<br>it.</p>
Halfway down the stairs is a stair where I sit
There isn't any other stair
quite like
it.
Each <br> tag has the effect of pressing the "return" or
"enter" key at the end of a typed line: it moves down one line and resumes
displaying at that point.
Remember, as the temptation to go hog-wild with line breaks grows in you,
that you as the author have no idea what fonts are
available on your reader's computer, how big their fonts are set in
their preferences, how big their screen is, or how big their browser
window is, so you can only take rough guesses as to what it will look like.
Using <br> for heavy formatting control is a bad idea.
<h1>...<h6>
The last of the basic tags for this discussion are the header tags <h1> through <h6>. They are used to introduce
sections of documents, and are similar to what you might see in an outline
processor or a table of contents. Strictly speaking in the "Information
Mapping" sense, each "proper" document or page should have exactly one <h1> tag as its title, followed by one or more <h2> headers, which can contain <h3> headers,
and so on. Using header tags to achieve font effects (like "big") is
discouraged. Like paragraphs, headers are usually rendered with blank
space between them. Here is a sample:
<h1>My document</h1>
<p>This is my document. I hope you like it.</p>
<h2>Part 1</h2>
<p>Here's a section called "part 1," which has two sub-sections: A and B.</p>
<h3>Subsection A</h3>
<p>Here's subsection A, and thrilling it is!</p>
<h3>Subsection B</h3>
<p>Here's subsection B, and I never thought we'd make it this far.</p>
<h2>Part 2</h2>
<p>Now we're done with part 1 and on to part 2, which starts with another level 2 header. (and so on...)</p>
My document
This is my document. I hope you like it.
Part 1
Here's a section called "part 1," which has two sub-sections: A and B.
Subsection A
Here's subsection A, and thrilling it is!
Subsection B
Here's subsection B, and I never thought we'd make it this far.
Part 2
Now we're done with part 1 and on to part 2, which starts with another level 2 header. (and so on...)
One strategy for writing HTML documents that works for some people is to begin by structuring your document only in headers, and then fill in the content below them once you have the whole document mapped out.
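A headers-only first pass, as suggested above, might look like this sketch. It reuses the section names from the earlier sample; the indentation is only there to make the outline easier to read, and the paragraphs would be filled in under each header afterwards.

```html
<!-- Outline first: headers only, content to be added later -->
<h1>My document</h1>
  <h2>Part 1</h2>
    <h3>Subsection A</h3>
    <h3>Subsection B</h3>
  <h2>Part 2</h2>
```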
<ul> and <ol>
You may be wondering at this point what I've been typing to get those little bullet points in the lists you've seen in this document so far. The answer is, nothing; I'm using HTML lists that do the bullets automatically.
Lists are very handy for enumerating items with either bullets or numbers.
They are a two-part HTML structure that consists of an outer tag describing
the list structure, and then inner tags for each item in the list. The <ul> tag describes an unordered list, which uses bullets;
the <ol> tag describes an ordered list, which uses
numbers (or letters). Each item in the list should be enclosed in a <li> tag, which stands for "list item"; they are used in both
ordered and unordered lists.
<ul>
  <li>apples</li>
  <li>oranges</li>
  <li>cherries</li>
  <li>boomerangs</li>
</ul>
The above list renders as:
• apples
• oranges
• cherries
• boomerangs
Simply changing the <ul> tag to <ol>
(don't forget the closing </ul> which must also change to
</ol>) turns the list into a numbered list:
<ol>
  <li>apples</li>
  <li>oranges</li>
  <li>cherries</li>
  <li>boomerangs</li>
</ol>
Now we get this:
1. apples
2. oranges
3. cherries
4. boomerangs
Furthermore, with ordered lists, you can change from numbers to letters by adding a "type" attribute with a value of "a" (for lower-case letters) or "A" (for upper-case letters), as in:
<ol type="A">
  <li>apples</li>
  <li>oranges</li>
  <li>cherries</li>
  <li>boomerangs</li>
</ol>
Here's our lettered list:
A. apples
B. oranges
C. cherries
D. boomerangs
Finally, note that lists can nest, and as they do, they indent to show their level. Study this example:
<ul>
  <li>something</li>
  <li>something else:
    <ol>
      <li>sub-something else 1</li>
      <li>sub-something else 2</li>
      <li>another sub-something</li>
    </ol>
  </li>
  <li>back to the first list</li>
</ul>
<img>
All this text is fine so far, but what about graphics? Plain-text pages are fine and fast-loading, but we all want our pages to look a bit snazzier. One inescapable element of this is the use of images within your pages.
Images are included with the use of the <img> tag. A
sample image tag looks like this:
<img src="my_image.gif" alt="My image" height="80" width="60" border="1">
Of course, using the <img> tag assumes you actually have
an image to use! Here is the result, including the image that I prepared
for this discussion at great expense:

The important attributes for the <img> tag include:
src specifies the location of the image file. You may use absolute addressing if you want to "borrow" an image on an external website, as in "http://www.yahoo.com/images/swiped_image.gif", or relative addressing for an image on your website, located relative to the web page you're writing. For example, an image in the same folder or directory as the web page can simply have a src attribute of "my_image.gif".
Planning the storage of your images takes some thinking; you may wish to have a directory of images where they are all contained, and then refer to them with a src attribute of, for example, "images/my_image.gif".
The alt attribute gives a short description of the image, and is required. It will be written within an outline of the image frame if the image is not found, or if the user viewing the page has images turned off, and it will be spoken aloud by speech-reading browsers for the visually disabled.
height gives the height of the image in pixels. It doesn't hurt to specify this if you know it; the browser will figure it out if you omit it. You may also use this attribute to force a specific height and scale the image; for example, a height of "200" on an image whose actual height is 100 will double the height.
width gives the width of the image in pixels; it works exactly as height does above, but for the width of the image.
The border attribute specifies the width, in pixels, of the border to use for the image. The border is not part of the image itself. In modern versions of HTML, the border attribute is deprecated in favor of style sheets (more about that in the day 2 writeup).
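Putting these attributes together, the sketch below loads the sample image from an images directory and scales it to twice the size given in the tag shown earlier. It assumes the image's actual size is 80 by 60 pixels; the directory name follows the storage suggestion above, and the doubled dimensions are purely illustrative.

```html
<!-- Relative address: the file sits in an "images" folder next to this page -->
<!-- height and width are doubled from 80 x 60, so the browser scales the image up -->
<img src="images/my_image.gif" alt="My image" height="160" width="120">
```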
<a>
What makes the web the web is the ability to use hyperlinks, which connect pages to each other, to other
resources on the web, and even to other resources outside the web. They
are created via the <a> tag, which stands for "anchor".
The basic premise is that you define a piece of text, an image, or
something to "click" as an "anchor" to another object, and you surround
that text / image / whatever with an anchor tag describing the location of
the object to connect to. For example, to make a link to the web page
whose address is http://www.yahoo.com, you would use
the following:
<a href="http://www.yahoo.com">this text is a link to yahoo.</a>
This will cause the phrase "this text is a link to yahoo" to be rendered as a link (by default, blue underlined text, which is pretty familiar to us all by now) to the named site, and when the user clicks the link, they are taken to that address.
The "href" attribute gives the location of the page or object being linked to.
If the link begins with "http://", it is taken to be a web page on another
domain, whose full address must follow. Links can also be made to local
pages via relative addressing, as discussed in the <img>
section above, by simply specifying the address of the page relative to the
current page. For example, linking to the next page in this discussion
looks like this, since the pages are next to each other, in the same folder:
The <a href="day2.html">day 2 writeup</a> has a lot more information.
It is rendered like this:
The day 2 writeup has a lot more information.
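Because an anchor can surround an image as well as text, a clickable image might be written as in this sketch. The image file name is hypothetical; the alt text also serves as the link's description when images are turned off.

```html
<!-- The image, rather than a phrase of text, is the clickable part of the link -->
<a href="day2.html"><img src="images/next.gif" alt="Go to the day 2 writeup"></a>
```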
In the day 2 writeup, we'll look at more advanced HTML, including: