While the HTML 4.01 specification has ruled the Web since 1999, the fifth incarnation of HTML was released by the W3C as a working draft earlier this year and has been updated continually since then. The HTML 5 specification is supposed to pave the way for future Web standards. It incorporates an earlier draft dubbed “Web Forms 2.0”, which is W3C’s answer to Web 2.0 and to the World Wide Web becoming a platform for distributed applications. Don’t expect anything too radical, though. It neither delivers the hailed “rich GUI” for the Internet, nor will it replace current technologies like AJAX. Rather, it is designed as a natural extension of its predecessor. It provides good backward compatibility while smoothing some of the rough edges of HTML. No more, no less. Let’s have a look at the new features in more detail.
HTML 5 mends the split between the preceding HTML 4 and XHTML 1.0 specifications. Rather than being defined in terms of syntactical rules, it makes the DOM tree its conceptual basis. Thus HTML 5 can be expressed in two similar syntaxes, the “traditional” one and the XML syntax, which both result in the same DOM tree. It goes far beyond the scope of previous specifications, for example by spelling out how markup errors are handled, rather than leaving it to browser vendors, and by specifying APIs for new and old elements. These APIs describe how scripting languages interact with HTML. So, what’s new? The following elements have been dropped from the specification:
- <acronym>, <applet>, <basefont>, <big>, <center>, <dir>, <font>, <frame>, <frameset>, <isindex>, <noframes>, <s>, <strike>, <tt>, <u>.
The following attributes are also goners:
- abbr, accesskey, align, alink, axis, background, bgcolor, border, cellpadding and cellspacing, char, charoff, charset, classid, clear, compact, codebase, codetype, coords, declare, frame, frameborder, headers, height, hspace, language, link, marginheight and marginwidth, name, nohref, noshade, nowrap, profile, rules, rev, scope, scrolling, shape, scheme, size, standby, summary, target, text, type, valuetype, valign, version, vlink, width.
Some of these elements and attributes are quite obscure, so perhaps they won’t be missed. Others, like <center>, align, background, and <u>, were heavily used in the past, although most of these were already deprecated in HTML 4. The message here is clear: get rid of presentational markup and use CSS instead. The <b>, <i>, <em> and <strong> tags have miraculously survived, however. Although primarily used for text formatting in the past, these tags have been assigned new (non-presentational) semantics to make them respectable. Another conspicuous omission is frames. Yes, frames are gone! But you might breathe a sigh of relief to know that <iframe> is still there. Speaking of presentational versus semantic HTML, there are quite a few additions to HTML 5 in the latter category. The new semantic tags are designed to aid HTML authors in structuring text and to make it easier for search engine crawlers to parse information in web pages. Here they are (explanations provided by W3C):
- <section> represents a generic document or application section. It can be used together with h1-h6 to indicate the document structure.
- <article> represents an independent piece of content of a document, such as a blog entry or newspaper article.
- <aside> represents a piece of content that is only slightly related to the rest of the page.
- <header> represents the header of a section.
- <footer> represents a footer for a section and can contain information about the author, copyright information, et cetera.
- <nav> represents a section of the document intended for navigation.
- <dialog> can be used to mark up a conversation in conjunction with the <dt> and <dd> elements.
- <figure> can be used to associate a caption together with some embedded content, such as a graphic or video.
- <details> represents additional information or controls which the user can obtain on demand.
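As a rough illustration of how some of these structural elements might fit together, here is a minimal sketch of a blog page. The titles, link targets, and text are invented for the example, and a browser without HTML 5 support may need extra CSS before it treats the new elements as blocks.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Example blog</title>
</head>
<body>
  <!-- Page-level header; the headline text is a placeholder -->
  <header>
    <h1>Example blog</h1>
  </header>

  <!-- Site navigation; the link targets are hypothetical -->
  <nav>
    <a href="/">Home</a>
    <a href="/archive">Archive</a>
  </nav>

  <!-- One self-contained blog entry -->
  <article>
    <header>
      <h2>First impressions of HTML 5</h2>
    </header>
    <section>
      <h3>What is new</h3>
      <p>Body text of the entry goes here.</p>
    </section>
    <aside>
      <p>A loosely related side note.</p>
    </aside>
    <footer>
      <p>Posted by the author. Copyright information goes here.</p>
    </footer>
  </article>
</body>
</html>
```

The same page could be written with nested <div> elements. The difference is that each element name here says what the block is for.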
Most of these, except the last two, behave like the <div> element, which means their primary use is to identify a block of content that belongs together. Unlike <div>, each of these elements carries its own semantics. Not very exciting? HTML 5 also introduces the following new elements (explanations again from the W3C document):
- <audio> and <video> for multimedia content. Both provide an API so application authors can script their own user interface, but there is also a way to trigger a user interface provided by the user agent. Source elements are used together with these elements if there are multiple streams available of different types.
- <embed> is used for plugin content.
- <mark> represents a run of marked (highlighted) text.
- <meter> represents a measurement, such as disk usage.
- <time> represents a date and/or time.
- <canvas> is used for rendering dynamic bitmap graphics on the fly, such as graphs, games, et cetera.
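To make a few of these elements more concrete, the following sketch uses <mark>, <meter>, <time>, and <canvas> on one page. All values, the date, and the id "chart" are made up for the example. Attribute details may differ between drafts of the specification, and browser support varies.

```html
<!DOCTYPE html>
<html>
<head>
  <title>New elements sketch</title>
</head>
<body>
  <!-- Highlighted run of text -->
  <p>The search term <mark>canvas</mark> appears in this sentence.</p>

  <!-- A measurement within a known range: 6 of 8 units used (made-up values) -->
  <p>Disk usage: <meter value="6" min="0" max="8">6 of 8 GB</meter></p>

  <!-- Machine-readable date with a human-readable label (arbitrary date) -->
  <p>Published on <time datetime="2008-01-22">22 January 2008</time>.</p>

  <!-- Bitmap drawing surface; the fallback text shows if canvas is unsupported -->
  <canvas id="chart" width="200" height="100">A simple bar chart.</canvas>

  <script>
    // Draw three bars whose heights come from a hypothetical data set.
    var canvas = document.getElementById("chart");
    var ctx = canvas.getContext("2d");
    var values = [30, 60, 90];
    for (var i = 0; i < values.length; i++) {
      // Arguments: x, y (measured from the top edge), width, height
      ctx.fillRect(20 + i * 60, 100 - values[i], 40, values[i]);
    }
  </script>
</body>
</html>
```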
The <embed> tag takes the place of the dropped <applet> tag, while <object> remains available. It defines embedded content that doesn’t expose its internal structure to the DOM tree and is typically rendered by a browser plugin. The <audio> and <video> tags are perhaps more interesting, because they make it possible to include multimedia files or streams directly in the HTML document without having to specify a vendor-specific plugin for playing the content. Granted, this could previously be done with the <embed> tag, but <embed> was never a W3C standard and isn’t supported consistently by all browsers. Evidently, W3C has decided not to simply follow the mainstream browser implementations and has added the <audio> and <video> tags instead, while reserving the <embed> tag for the purpose named above.
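A minimal sketch of plugin-free media might look like the following. The file names are placeholders, and which formats actually play depends on the browser, which is why each element offers more than one <source>.

```html
<!-- Video with the browser's built-in controls.
     The browser picks the first source it is able to play. -->
<video controls width="320" height="240">
  <source src="clip.ogv" type="video/ogg">
  <source src="clip.mp4" type="video/mp4">
  Your browser does not support the video element.
</video>

<!-- Audio works the same way; the text is shown only as a fallback. -->
<audio controls>
  <source src="track.ogg" type="audio/ogg">
  <source src="track.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```

Leaving out the controls attribute hides the built-in user interface, so a page can supply its own through the scripting API.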
Arguably the most exciting additions to HTML 5, at least from the perspective of a web developer, are the extensions to form processing and data rendering, and the related APIs, such as the editing API or the drag-and-drop API. These additions previously evolved as a separate standard under the name Web Forms 2.0 and are now incorporated into HTML 5. The <input> element has been enhanced to support several new data types. New elements for user interface components have been defined, similar to those found in GUI applications. For example, HTML 5 finally features the long-awaited combo box, a combination of text input and drop-down list, which has been a standard component in GUIs for decades. A new <datagrid> element for the interactive, editable representation of data in tabular, list, or tree form is also present. Here are the new <input> types:
- type="datetime" – a date and time (year, month, day, hour, minute, second, fraction of a second) with the time zone set to UTC.
- type="datetime-local" – a date and time (year, month, day, hour, minute, second, fraction of a second) with no time zone.
- type="date" – a date (year, month, day) with no time zone.
- type="month" – a date consisting of a year and a month with no time zone.
- type="week" – a date consisting of a year and a week number with no time zone.
- type="time" – a time (hour, minute, seconds, fractional seconds) with no time zone.
- type="number" – a numerical value.
- type="range" – a numerical value, with the extra semantic that the exact value is not important.
- type="email" – an e-mail address.
- type="url" – an internationalised resource identifier.
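Here is a small sketch of a form using several of these types. The action URL and field names are hypothetical. A browser that does not recognise a type treats the field as an ordinary text input, so the form should still be usable there.

```html
<form action="/register" method="post">
  <!-- Each input declares what kind of value it expects -->
  <label>Birthday <input type="date" name="birthday"></label>
  <label>Preferred call time <input type="time" name="calltime"></label>
  <label>Tickets <input type="number" name="tickets"></label>
  <label>Volume <input type="range" name="volume"></label>
  <label>E-mail <input type="email" name="email"></label>
  <label>Homepage <input type="url" name="homepage"></label>
  <input type="submit" value="Send">
</form>
```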
The input element also has several new attributes in HTML 5 that enhance its functionality (many of these also apply to other form controls such as <select>, <textarea>, etc.):
- list="listname" – used in conjunction with the <datalist> element to create a combobox.
- required – indicates that the user must provide an input value.
- autofocus – automatically focuses the control upon page load.
- form – allows a single control to be associated with multiple forms.
- inputmode – gives a hint to the user interface as to what kind of input is expected.
- autocomplete – tells the browser to remember the value when the user returns to the page.
- min – minimum value constraint.
- max – maximum value constraint.
- pattern – specifies pattern constraint.
- step – specifies step constraint.
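The attributes above can be combined as in the following sketch, which includes the combo box built from list and <datalist>. The action URL, field names, option values, and the five-digit postal code format are assumptions made for the example.

```html
<form action="/order" method="post">
  <!-- Combo box: free text input with suggestions taken from the datalist -->
  <label>Browser
    <input type="text" name="browser" list="browsers" required autofocus>
  </label>
  <datalist id="browsers">
    <option value="Firefox">
    <option value="Opera">
    <option value="Safari">
  </datalist>

  <!-- Numeric constraints: whole numbers from 1 to 10 -->
  <label>Quantity
    <input type="number" name="quantity" min="1" max="10" step="1">
  </label>

  <!-- Pattern constraint: exactly five digits; autocomplete is switched off -->
  <label>Postal code
    <input type="text" name="zip" pattern="[0-9]{5}" autocomplete="off">
  </label>

  <input type="submit" value="Order">
</form>
```

In a browser that implements these constraints, the form is not submitted until the required field is filled in and the other values pass their checks. The server should still validate everything it receives.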
The following new elements provide additional user interface components for web applications. The last three are not UI components themselves, but supporting elements for annotating text, receiving server-side events, and templating:
- <command> represents a command the user can invoke (e.g. toolbar button or icon).
- <datalist> together with the new list attribute for input is used to create comboboxes.
- <output> represents some type of output, such as from a calculation done through scripting.
- <progress> represents the completion of a task, such as a download or a series of expensive operations.
- <menu> represents a menu. The element has three new attributes: type, label and autosubmit. They allow the element to transform into a menu as found in typical user interfaces as well as providing for context menus in conjunction with the global contextmenu attribute.
- <datagrid> represents an interactive representation of a tree, list, or tabular data.
- <ruby>, <rt> and <rb> allow for marking up Ruby annotations.
- <eventsource> represents a target that “catches” remote server events.
- <datatemplate>, <rule> and <nest> provide a templating mechanism for HTML.
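Two of the simpler elements in this list, <output> and <progress>, can be sketched as follows. The field names, ids, and values are invented for the example. The markup follows the form these elements take in browsers that implement them, which may differ in detail from the current draft.

```html
<!-- The form's oninput handler recalculates the sum whenever a field changes.
     Inside the handler, a, b, and total refer to the form controls by name. -->
<form oninput="total.value = Number(a.value) + Number(b.value)">
  <input type="number" id="a" name="a" value="2"> +
  <input type="number" id="b" name="b" value="3"> =
  <!-- The for attribute lists the ids of the inputs that feed the result -->
  <output name="total" for="a b">5</output>
</form>

<!-- A task that is 30 percent complete; the text is a fallback -->
<p>Upload: <progress value="30" max="100">30%</progress></p>
```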
Let’s briefly look at the new <datagrid> element. A <datagrid> usually has a <table> child element, although a <select> or <datalist> child is also possible, for example to create a tree control. The columns in the datagrid can have clickable captions for sorting. Columns, rows, and cells can each have specific flags, known as classes, which affect the functionality of the datagrid element. Rows are selectable, and single cells (or all cells) can be made editable. A cell can contain a checkbox or values that can be cycled. Rows can also be separator rows. Datagrids have a DOM API for updating, inserting, and deleting rows or columns. They also have a data provider API that controls grid data content and editing.
I hope you found this brief overview useful. Please note that the features mentioned here don’t cover everything that is new in HTML 5, but hopefully they capture the essence. The HTML 5 specification is a work in progress; it is still changing and evolving. You can find the latest editor’s draft at http://www.w3.org/html/wg/html5/. An overview of the changes from HTML 4 is available at http://www.w3.org/TR/html5-diff/.