HTML5 Shaping Up

It's been a while since I last wrote about the upcoming HTML5 standard (two years, to be precise) and a lot has happened since then. Not only has the W3C (World Wide Web Consortium) draft moved closer to the finishing line, but quite a bit of the HTML5 package is already implemented and ready for deployment in modern browsers. Regarding the completeness of HTML5 support, Google Chrome is currently leading the pack, followed closely by Firefox, Opera and Safari. Even Microsoft seems to have discovered the advantage of standards compliance, as its upcoming IE9 includes support for several new HTML5 features. If you are a web developer and haven't delved into the details of HTML5 yet, now is the time.

As many web developers spend more time coding in server-side languages than coding HTML, they might think of HTML programming as a secondary skill. However, this rests on the misconception that the new HTML5 standard is just about angle-bracket tags. It is not. HTML is the heartwood of web programming and the upcoming HTML5 standard is the most comprehensive update that web developers have seen since the days of Mosaic. In this article, I am going to summarise some important points about HTML5 that every web developer should understand before moving on to the technical details.

HTML5 is not just about markup. Although the new HTML5 standard contains new tag definitions and deprecates old ones, the package goes far beyond markup definition. It does not just define new tags with new functionality, but it also defines the accompanying APIs in unprecedented detail. It contains diverse features for audio and video playback, 3D imaging, drag-and-drop, new form elements, a canvas element for 2D drawing, offline database storage, document editing, geolocation, microdata for semantic markup embedding, and CSS3, the next level of the cascading style sheets standard.

HTML5 is not going to be released with a drum-roll. Work on the HTML5 specifications was started in 2004 by the Web Hypertext Application Technology Working Group (WHATWG), an independent group formed by browser vendors, and has been carried out jointly with the W3C's HTML Working Group since 2007. The first public working draft was published in 2008. The specifications are considered an ongoing work and are expected to reach candidate recommendation stage within the next two years. In the meantime, the parts of the specification which are considered stable are being implemented by browser developers. Thus HTML5 is expected to reach the market in gradual steps over a number of years.

HTML5 is not all-or-nothing. Indeed, there was never an all-or-nothing scenario, even with prior versions of the HTML standard. Browser detection software typically doesn't test for HTML version support, but for individual features, such as support for a certain DOM level, API constructs, or specific feature implementations. Because HTML5 is a bundle of (largely independent) features and APIs, it will be no different with HTML5. For example, geolocation does not rely in any way on 3D imaging, the canvas does not rely on drag-and-drop, and so on. Application developers can make use of these features without having to worry about HTML5 support as a whole.
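
To make the idea of per-feature testing a little more concrete, here is a minimal sketch in JavaScript. The function name detectFeatures and the choice of three features are my own for illustration; the checks themselves (looking for navigator.geolocation, a getContext method on a canvas element, and a draggable property on an element) are commonly used detection idioms, not an official test suite.

```javascript
// Minimal feature-detection sketch. `win` is any window-like object;
// passing it in (rather than using the global) keeps the function testable.
// The name detectFeatures is hypothetical, not part of any standard API.
function detectFeatures(win) {
  const doc = win.document;
  const nav = win.navigator || {};
  const canvas = doc.createElement('canvas');
  const div = doc.createElement('div');
  return {
    // Geolocation API: exposed as a property of navigator.
    geolocation: 'geolocation' in nav,
    // Canvas: supporting browsers give the element a getContext() method.
    canvas: typeof canvas.getContext === 'function',
    // Drag-and-drop: supporting browsers expose a draggable property.
    dragAndDrop: 'draggable' in div,
  };
}
```

In a browser one would call detectFeatures(window) and enable each feature independently, depending on the flags that come back.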

HTML5 is designed with backwards compatibility. Upgrading your web pages to HTML5 might be as easy as replacing the HTML4 doctype declaration with the HTML5 doctype. Chances are that all tags in a typical page (if they weren't already deprecated in HTML4) will still work in HTML5. Furthermore, the new standard enhances rather than replaces existing functionality. For example, the <input> tags in an HTML5 form may use the new input types for email, date, and numeric data entry. On older browsers without support for HTML5 tags, these are rendered as regular text input fields. The HTML5 form validation functionality, designed to simplify routine JavaScript data validation, is also designed to degrade gracefully in older browsers.
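
The fallback to plain text fields can also be used for detection. The sketch below relies on the widely used trick that a browser which does not know an input type reports "text" when the type property is read back. The helper name supportsInputType is my own, and the document object is passed in as a parameter only so that the function can be tried outside a browser.

```javascript
// Sketch: test whether a browser knows a given <input> type.
// `doc` is a document-like object; supportsInputType is a hypothetical name.
function supportsInputType(doc, type) {
  const input = doc.createElement('input');
  input.setAttribute('type', type);
  // Browsers that do not recognise the type fall back to "text".
  return input.type === type;
}
```

In a page, supportsInputType(document, 'date') could decide whether a script-based date picker needs to be loaded as a substitute.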

HTML5 is already here. Since the market introduction of HTML5 occurs incrementally, many features are already available in up-to-date web browsers. For instance, semantic markup, canvas, and basic audio and video playback are already supported by the latest browser versions. One can safely assume that the upcoming Firefox 4 and IE 9 releases will put even more HTML5 features at the developer's disposal. Go to http://html5test.com to check your browser and find out which new features it already supports. See http://www.html5rocks.com for an interactive presentation, as well as in-depth tutorials and code examples of the new HTML5 features. Finally, a list of websites that already make use of HTML5 (and ideas for what it may be used for) can be found at http://html5gallery.com.

Gimme Gadgets

There are quite a few reasons to like gadgets. They are usually free and open source by design. They use standard web technologies, such as HTML, CSS, and JavaScript. They are, at least in principle, platform-independent and portable. Perhaps most importantly, they are easy to program and deploy, which makes them ideal for small personal applications. I am thinking about keeping oneself informed about the scores of one's favourite sports team, displaying local bus schedules, or aggregating social network feeds into a custom-designed widget that sits on the desktop. I am sure that every computer user can come up with an idea for a mini-application that they always wanted but never found. Gadgets are the obvious solution, as they have web connectivity and web technology built in.

The first question for the budding gadget developer is then which gadget technology to choose. In an ideal world, there would only be a single standardised package format and only a single standardised API. This would allow gadgets to be used on any platform and the question of choosing a format would not even arise. Alas, we don't live in an ideal world and therefore different platforms and markets have produced different gadget formats. For example, there are Windows desktop gadgets, Linux desktop gadgets, Google gadgets, and gadgets designed to be integrated into web portals. We will look at the different types of gadgets and their use in brief.

Windows Gadgets

Formerly known as Windows Sidebar, Windows gadgets are based on the widget engine for Microsoft gadgets, which runs on the Windows platform only. A minimal gadget contains an XML configuration file (gadget.xml) and an HTML file (main.html). Other web files can be added. These are zipped for distribution and the resulting file is renamed to *.gadget. Windows gadgets have access to a special API divided into three parts: (1) gadget objects provide gadget state and event handling, (2) system objects provide access to files, network and OS functions, and (3) presentation objects provide visual functionality, namely background, image, and text handling. Many Windows desktop gadgets can also be run (with slight modifications) inside a Windows Live homepage. Gadgets running there don't have access to the system API and cannot modify the page's DOM object tree.

Apple Dashboard

Dashboard is an application that hosts widgets on a Mac. The widgets are contained in an invisible layer that is activated by clicking on a dock icon, or by pressing a key. Like Windows gadgets, Dashboard widgets are based on standard web technologies. A typical Dashboard widget contains six files: a property list, a JavaScript file containing the interactive functionality, an HTML file and a CSS file, and a background image and an icon for the visual design. Dashboard implements a client-server architecture with widgets running as clients. There are three classes of Dashboard widgets: accessory widgets that are self-contained mini-applications like clocks, calculators, etc., application widgets that interact with an existing Mac application, and information widgets that retrieve information from the Internet.

Google Gadgets

As is the case with Windows gadgets, Google gadgets come in different flavours. They are based on the Google Gadget API and run inside an iGoogle page or can be embedded into any web page, usually by loading content from a remote server. Google gadgets can also be run on the desktop if the Google Desktop product is installed, which is a bit of a downer, because Google Desktop also contains desktop search functionality that constantly indexes your PC's filesystem and allows text searches on all of your files. The good news is that the latter functionality can be disabled. Furthermore, there are Google gadgets with enhanced capabilities for the (recently decommissioned) Google Wave application. Like Windows gadgets, Google gadgets consist of XML, HTML, JavaScript (lots of it) and other web files. The advantage over their Windows cousins is that Google gadgets are more platform-independent, since Google Desktop is available for Windows, Linux and Mac. Reusing web gadgets for the desktop (or vice versa) is also easier. The Java-like Google Gadgets API provides methods in the gadgets.* namespace for IO, string and JSON processing, skinning, and other functions. Developers can use the iGoogle gadget editor and gadget testing environment for creating gadgets.

Yahoo Widgets

Google's competitor Yahoo also offers a gadget technology, called Yahoo Widgets, which is based on the Konfabulator product. Yahoo widgets are primarily intended to run on the desktop rather than inside a web page and to that end, users must install the Yahoo widget engine. Unfortunately, this product is closed source and only available for Windows and Mac. Like their cousins, Yahoo widgets consist of XML, HTML, JavaScript, CSS (and optionally Flash) and are zipped into a single *.widget file for distribution. The comprehensive Yahoo Widgets API includes functions for event-driven GUI programming, DOM processing, downloading web pages, and access to Yahoo services. It is even possible to create and use an SQLite database with Yahoo widgets or access OS-specific functions by running shell scripts on Windows or AppleScript on the Mac.

Linux/Unix Gadgets

There is a variety of widget engines available for Linux and the market seems to be highly fragmented. For the already mentioned Google gadgets, Linux users can download the open source Google-Gadgets-For-Linux software that allows Google gadgets to be run without Google Desktop. In addition, there are the following widget engines, among others, for which a limited choice of existing widgets is available:

gDesklets is a GNOME program for running gadgets on a Linux desktop. Despite its name, it can also be used with desktop environments other than GNOME, such as KDE or Xfce. Desklets are applets programmed in the Python language.

SuperKaramba is a widget engine for the KDE desktop. The visual aspects of a SuperKaramba widget are specified in a text file, while its functionality can be programmed in Python, Ruby, or JavaScript.

Screenlets is an X11/Compiz-based widget engine that is independent of the desktop environment. It supports Python applets with skins drawn in SVG and, more recently, web widgets written in HTML, CSS and JavaScript.

CSS Grid Layouts Brittle

Recently I changed parts of the HTML template for this blog from CSS divs to tables. Gasp, tables? That's so nineties. Indeed, it is. However, the CSS floating divs were just too brittle. An occasional wide image or wide block of <pre> text would mess up the sidebar badly. Also, the visual results were different in different browsers. The problem puppy was a browser whose name shall not be mentioned (but I can tell you it starts with “I” and ends with “6.0”). Call me old-fashioned, but I think that a table-based design often beats CSS in terms of robustness. Why spend hours testing a complex CSS design if the same job can be accomplished with tables in a few minutes? Tables are especially handy with multiple columns, nested columns and rows, and elastic designs. I would still use CSS in most situations, but you can't beat tables for robust grid layouts.

HTML 5 Preview

Because HTML is at the very core of the World Wide Web, you would expect it to be a mature and refined technology. You would also expect it to provide a flexible platform for Web application development and deployment. As most web developers know, the reality is a bit different. HTML started out as a rather simple SGML application for creating hyperlinked documents. It originally provided a basic set of elements for data viewing, data input, and formatting, doing a little bit of everything, yet nothing quite right. While this was practical for whipping up quick-and-dirty websites, it proved to be inadequate for more demanding presentation tasks and fine-tuned user interaction. Thus a whole bunch of supplemental technologies came into being, including CSS, JavaScript, Flash and finally AJAX. You know the story. All of this was quite a messy affair and unfortunately it still is.

While the HTML 4.01 specification has ruled the Web since 1999, the fifth incarnation of HTML was released by the W3C as a working draft earlier this year and has been updated constantly since then. The HTML 5 specification is supposed to pave the way for future Web standards. It incorporates an older draft dubbed “Web Forms 2.0”, which is the W3C’s answer to Web 2.0 and to the World Wide Web becoming a platform for distributed applications. Don’t expect anything too radical, though. It neither delivers the hailed “rich GUI” for the Internet, nor will it replace current technologies like AJAX. Rather, it is designed as a natural extension of what is already there. It provides good backward compatibility while smoothing some of the rough edges of HTML. No more, no less. Let’s have a look at the new features in more detail.

HTML 5 mends the split between the preceding HTML 4 and XHTML 1.0 specifications. Rather than being defined in terms of syntactical rules, it makes the DOM tree its conceptual basis. Thus HTML 5 can be expressed in two similar syntaxes, the “traditional” one and the XML syntax, which both result in the same DOM tree. It goes far beyond the scope of previous specifications, for example by spelling out how markup errors are handled, rather than leaving it to browser vendors, and by specifying APIs for new and old elements. These APIs describe how scripting languages interact with HTML. So, what’s new? The following elements have been dropped from the specification:

  • <acronym>
  • <applet>
  • <basefont>
  • <center>
  • <dir>
  • <font>
  • <frame>
  • <frameset>
  • <isindex>
  • <noframes>
  • <s>
  • <big>
  • <strike>
  • <tt>
  • <u>
  • <xmp>

The following attributes are also goners:

    abbr, accesskey, align, alink, axis, background, bgcolor, border, cellpadding and cellspacing, char, charoff, charset, classid, clear, compact, codebase, codetype, coords, declare, frame, frameborder, headers, height, hspace, language, link, marginheight and marginwidth, name, nohref, noshade, nowrap, profile, rules, rev, scope, scrolling, shape, scheme, size, standby, summary, target, text, type, valuetype, valign, version, vlink, width.

Some of these elements and attributes are quite obscure, so perhaps they won’t be missed. Others like <center>, align, background, and <u> were heavily used in the past, although most of these were already deprecated in HTML 4. The message here is clear: get rid of presentational markup and use CSS instead. The <b>, <i>, <em> and <strong> tags have miraculously survived, however. Although primarily used for text formatting in the past, these tags have been assigned new (non-presentational) semantics to make them respectable. Another conspicuous omission is frames. Yes, frames are gone! But you might breathe a sigh of relief to know that <iframe> is still there. Speaking of presentational versus semantic HTML, there are quite a few additions to HTML 5 in the latter category. The new semantic tags are designed to aid HTML authors in structuring text and to make it easier for search engine crawlers to parse information in web pages. Here they are (explanations provided by W3C):

  • <section> represents a generic document or application section. It can be used together with h1-h6 to indicate the document structure.
  • <article> represents an independent piece of content of a document, such as a blog entry or newspaper article.
  • <aside> represents a piece of content that is only slightly related to the rest of the page.
  • <header> represents the header of a section.
  • <footer> represents a footer for a section and can contain information about the author, copyright information, et cetera.
  • <nav> represents a section of the document intended for navigation.
  • <dialog> can be used to mark up a conversation in conjunction with the <dt> and <dd> elements.
  • <figure> can be used to associate a caption together with some embedded content, such as a graphic or video.
  • <details> represents additional information or controls which the user can obtain on demand.

Most of these, except the last two, behave like the <div> element, which means their primary use is to identify a block of content that belongs together. Unlike <div>, however, each of these elements carries special semantics. Not very exciting? HTML 5 also introduces the following new elements (explanations again from the W3C document):

  • <audio> and <video> for multimedia content. Both provide an API so application authors can script their own user interface, but there is also a way to trigger a user interface provided by the user agent. Source elements are used together with these elements if there are multiple streams available of different types.
  • <embed> is used for plugin content.
  • <mark> represents a run of marked (highlighted) text.
  • <meter> represents a measurement, such as disk usage.
  • <time> represents a date and/or time.
  • <canvas> is used for rendering dynamic bitmap graphics on the fly, such as graphs, games, et cetera.
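
To give an impression of what scripting the <canvas> element looks like, here is a small sketch that draws a bar chart with the 2D context's fillRect method. The function name drawBars and its parameters are made up for this illustration, and the drawing context is passed in as an argument so the logic can be tried without a browser.

```javascript
// Sketch: draw one filled bar per value on a 2D canvas context.
// drawBars, barWidth and chartHeight are hypothetical names for illustration.
function drawBars(ctx, values, barWidth, chartHeight) {
  values.forEach((value, i) => {
    // Canvas y-coordinates grow downwards, so a bar of height `value`
    // starts at chartHeight - value. One pixel is left as a gap between bars.
    ctx.fillRect(i * barWidth, chartHeight - value, barWidth - 1, value);
  });
}
```

In a page, the context would come from a canvas element, for example document.getElementById('chart').getContext('2d'), where the id chart is again just an assumption.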

The <embed> tag takes over from the dropped <applet> tag and sits alongside <object>, which remains in the specification. It defines some sort of embedded content that doesn’t expose its internal structure to the DOM tree. The content is typically rendered by a browser plugin. The <audio> and <video> tags are perhaps more interesting, because they make it possible to include multimedia files or streams directly in the HTML document without having to specify a vendor-specific plugin for playing the content. Granted, this could previously be done with the <embed> tag, but the <embed> tag was never a W3C standard and it isn’t supported by all browsers. Obviously, W3C has decided not to follow the mainstream browser implementations and added the <audio> and <video> tags instead, while reserving the <embed> tag for the above named purpose.
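
Since browsers differ in the media formats they can play, a script usually has to pick one of several streams. The following sketch uses the canPlayType method of media elements, which in the current specification returns an empty string, "maybe" or "probably". The function name pickPlayableSource and the shape of the sources list are my own assumptions for the example.

```javascript
// Sketch: choose the first source that the media element claims it can play.
// `media` is an <audio> or <video> element (or any object with canPlayType);
// `sources` is a hypothetical list of { url, type } objects.
function pickPlayableSource(media, sources) {
  for (const source of sources) {
    // canPlayType returns "" when the type is definitely not supported.
    if (media.canPlayType(source.type) !== '') {
      return source.url;
    }
  }
  return null; // nothing playable, fall back to a download link or plugin
}
```

The same selection happens declaratively when several source elements are nested inside <audio> or <video>; the script version is only needed when the choice must be made in code.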

Arguably the most exciting additions to HTML 5, at least from the perspective of a web developer, are the extensions to form processing and data rendering, and the related APIs, such as the editing API or the drag-and-drop API. These additions previously evolved as a separate standard under the term Web Forms 2.0 and are now incorporated into HTML 5. The <input> element has been enhanced to support several new data types. New elements for user interface components have been defined, similar to those that can be found in GUI applications. For example, HTML 5 finally features the long-awaited combo box, a combination of text input and drop-down list, which has been a standard component in GUIs for decades. A new <datagrid> element for the interactive/editable representation of data in tabular, list, or tree form, is also present. Here are the new <input> types:

  • type="datetime" – a date and time (year, month, day, hour, minute, second, fraction of a second) with the time zone set to UTC.
  • type="datetime-local" – a date and time (year, month, day, hour, minute, second, fraction of a second) with no time zone.
  • type="date" – a date (year, month, day) with no time zone.
  • type="month" – a date consisting of a year and a month with no time zone.
  • type="week" – a date consisting of a year and a week number with no time zone.
  • type="time" – a time (hour, minute, seconds, fractional seconds) with no time zone.
  • type="number" – a numerical value.
  • type="range" – a numerical value, with the extra semantic that the exact value is not important.
  • type="email" – an e-mail address.
  • type="url" – an internationalised resource identifier.

The input element also has several new attributes in HTML 5 that enhance its functionality (many of these also apply to other form controls such as <select>, <textarea>, etc.):

  • list="listname" – used in conjunction with the <datalist> element to create a combobox.
  • required – indicates that the user must provide an input value.
  • autofocus – automatically focuses the control upon page load.
  • form – allows a single control to be associated with multiple forms.
  • inputmode – gives a hint to the user interface as to what kind of input is expected.
  • autocomplete – tells the browser to remember the value when the user returns to the page.
  • min – minimum value constraint.
  • max – maximum value constraint.
  • pattern – specifies pattern constraint.
  • step – specifies step constraint.
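
To illustrate how the constraint attributes work together, here is a simplified sketch of the kind of check they describe. It is my own approximation, not the browser's validation algorithm: the function name checkConstraints and the rules object are invented for the example, the pattern is matched against the whole value, and the step is counted from min (or from 0 if no min is given).

```javascript
// Simplified sketch of required/pattern/min/max/step checking.
// checkConstraints and the `rules` object are hypothetical; real browsers
// implement a more detailed algorithm per input type.
function checkConstraints(value, rules) {
  const errors = [];
  if (value === '') {
    if (rules.required) errors.push('required');
    return errors; // an empty optional field skips the remaining checks
  }
  if (rules.pattern !== undefined) {
    // The pattern must match the entire value, not just a part of it.
    const re = new RegExp('^(?:' + rules.pattern + ')$');
    if (!re.test(value)) errors.push('pattern');
  }
  const hasNumericRule =
    rules.min !== undefined || rules.max !== undefined || rules.step !== undefined;
  if (hasNumericRule) {
    const n = Number(value);
    if (Number.isNaN(n)) {
      errors.push('number');
      return errors;
    }
    if (rules.min !== undefined && n < rules.min) errors.push('min');
    if (rules.max !== undefined && n > rules.max) errors.push('max');
    if (rules.step !== undefined) {
      // Steps are counted from min; this remainder test is only exact
      // for integer values and steps.
      const base = rules.min !== undefined ? rules.min : 0;
      if ((n - base) % rules.step !== 0) errors.push('step');
    }
  }
  return errors;
}
```

For an input declared with min 1, max 10 and step 2, the values 1, 3, 5, 7 and 9 would pass this check, whereas 8 would be reported with a step error.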

The following new elements provide additional user interface components for web applications. The last three entries are not UI components themselves: <ruby> annotates text, while <eventsource> and the templating elements help to drive the UI from events and data supplied by the server:

  • <command> represents a command the user can invoke (e.g. toolbar button or icon).
  • <datalist> together with the new list attribute for input is used to create comboboxes.
  • <output> represents some type of output, such as from a calculation done through scripting.
  • <progress> represents a completion of a task, such as downloading or when performing a series of expensive operations.
  • <menu> represents a menu. The element has three new attributes: type, label and autosubmit. They allow the element to transform into a menu as found in typical user interfaces as well as providing for context menus in conjunction with the global contextmenu attribute.
  • <datagrid> represents an interactive representation of a tree list or tabular data.
  • <ruby>, <rt> and <rb> allow for marking up Ruby annotations.
  • <eventsource> represents a target that “catches” remote server events.
  • <datatemplate>, <rule> and <nest> provide a templating mechanism for HTML.

Let’s briefly look at the new <datagrid> element. <datagrid> usually has a <table> child element, although <select> and <datalist> children are also possible, for instance to create a tree control. The columns in the datagrid can have clickable captions for sorting. Columns, rows, and cells can each have specific flags, known as classes, which affect the functionality of the datagrid element. Rows are selectable and single cells (or all cells) can be made editable. A cell can contain a checkbox or values that can be cycled. Rows can also be separator rows. Datagrids have a DOM API for updating, inserting, and deleting rows or columns. They also have a data provider API that controls grid data content and editing.

I hope you found this brief overview useful. Please note that the features mentioned here don’t cover everything that is new in HTML 5, but hopefully they catch the essence. The HTML 5 specification is a work in progress; it is still changing and evolving. You can find the latest editor’s draft at http://www.w3.org/html/wg/html5/. An overview of the changes from HTML 4 is available at http://www.w3.org/TR/html5-diff/.

Semantic vs. presentational HTML

Today I debated with my colleagues the differences and merits of semantic HTML versus presentational HTML. This may seem a fairly esoteric topic to non-developers. However, for web developers it touches upon a fundamental issue, namely that of best coding practices. Should HTML be coded with semantic or presentational preference? Are there different situations where one coding style is more appropriate than the other? And what constitutes semantic versus presentational HTML in the first place?

Since my colleagues and I left these issues sort of unresolved, I am going to consider them in some more detail here. Web developers are divided into two camps, the semantic HTML advocates and the presentational HTML advocates. My colleague seemed to be arguing for a presentational approach. Before we look at the reasoning that backs each of the two positions, let us define these terms first.

Semantic HTML is the subset of HTML that describes the content and structure of a document, whereas presentational HTML is the subset of HTML used to determine the appearance of the document. While this definition is straightforward and unambiguous, in practice it is often difficult to point out the exact range of these sets. In other words, it’s not always easy to tell whether a given tag belongs in the semantic or in the presentational category.

Some HTML tags can be assigned quite easily, however. For example <address>, <abbr>, <body>, <code>, <kbd> are all semantic tags while <center>, <font>, <hr>, <b> and <br> (“bed and breakfast markup”) are all presentational. The same categorisation can be expanded to distinguish between presentational and semantic attributes. In some cases, HTML offers semantic and presentational alternatives that achieve the same thing. For example, most browsers render <i> (presentational) exactly as <em> (semantic), and <b> (presentational) exactly as <strong> (semantic).

To make things even more complicated, there are HTML tags which have both presentational and semantic aspects and other tags which have neither. Tags like <button>, <caption>, and <table> are examples of hybrids, whereas <script>, <applet>, <object> are neither semantic nor presentational but constitute containers for other types of content.

There are two principal arguments for preferring semantic markup over presentational markup. The first is that semantic markup helps to make documents easier to understand for machine parsers, such as search engine robots, agents, screen readers, accessibility software and the like. The second argument is that it is always a good idea to separate content from presentation, because it aids automation and it helps to simplify maintenance. This argument gained momentum with the introduction of style sheets, which allow the appearance aspects to be moved to an external document.

There are also good arguments for preferring presentational markup over semantic markup. For example, there is the ease-of-use aspect. It is simply easier to write <b> than <strong> or <span style="font-weight:bold">. Then there is the backward-compatibility aspect. Most if not all of the presentational HTML markup is understood by even the most outdated browsers. The first pro-semantic argument can also be called into question, because today’s robots and search engine spiders are sophisticated enough to interpret the presentational aspects of a document and derive document structure from it.

Finally, the strongest point for giving presentational HTML preference is that HTML itself is designed for document presentation, not for document storage or structuring. My own point of view is that the distinction between presentational and semantic HTML is quite academic and probably irrelevant. We have to live with the fact that HTML is a bit messy by design. In practice, presentational elements are often (ab-)used to create document structure, for example by using <br> for paragraph separation. The reverse is also the case. Semantic and structural elements are often (ab-)used to create a certain visual appearance, as for instance the <blockquote> tag or the various tags used in conjunction with tables.

I tend to see HTML as a language that is chiefly concerned with presentation. In this capacity it has been extremely practical and successful. Ideally, HTML takes care of the document structure whereas CSS takes care of the finer aspects of visual appearance. In practice, however, it is rather difficult to achieve a complete separation. Therefore I suggest abandoning the attempt to rigidly structure content with semantic markup at the expense of visual definition.

If semantic structuring is a design goal, then choose a fitting XML format. XML is much better suited to that task and HTML can be generated quite easily from XML. The semantic approach only makes sense in those cases where rather simple documents are created in HTML and where HTML is the primary format. Otherwise semantics would have to be foisted onto the limited HTML vocabulary. Since the number of dynamically generated pages is outgrowing the number of static pages on the Internet, and since the use of XML is increasing, the distinction becomes less and less important.