Digital Attrition

Naively, one might assume that digital artifacts, such as software, are not subject to decay and attrition, processes that affect physical objects. After all, any digital artifact reduces to a sequence of ones and zeros that -given durable storage- remains completely unaltered and would therefore function in exactly the same way even in ten, hundred, or a thousand years. However, this notion disregards an important aspect of digital products, namely that they don’t exist on their own. A digital artifact almost always exists as part of a digital ecosystem requiring other components to be available to fulfill its function. At the very least, it requires a set of conventions and standards. For example, even a simple text file requires a standard of how to encode letters.

This became once again painfully clear to me, when the WordPress software, on which this blog runs, suddenly started behaving erratically a few weeks ago. It produced 404 page-not-found errors that were impossible to diagnose and fix. I had not changed anything, being happy with the look and functionality of the blog, so the WordPress installation had reached the ripe old age of three and a half years. The cause had to be sought somewhere in the operating platform, which in this case means the web server configuration. Upon contacting the hosting provider, I was told that this problem had been diagnosed with older versions of WordPress and could only be cured by an upgrade.

Previous Blog ThemeI had no choice but to upgrade WordPress and the result is before you. Since the old theme, which can still be seen in the thumbnail image, is not compatible with the latest WordPress version, I derived a new theme from the included twentyeleven package. It takes into consideration that screen resolution has increased over the last few years and it also provides a display theme for mobile devices. Curiously, while still offering the same set of functions and features as version 2.3.1, the WordPress software version 3.2.1 has increased significantly in complexity. I ran a quick sloccount analysis, which told me that its codebase increased from 36,895 lines to 92,141 lines, not counting plugins and themes, and the average theme has roughly doubled in code size.

I am sure that this phenomenon is not unfamiliar to anyone who has worked with computers over a number of years. Remember how MS Office 97 contained every feature you would ever need? Since text processing and spreadsheets reached maturity quite early in the game, some people would even say this for the prior versions of Office. Yet, Microsoft has successfully marketed five successor versions of MS Office since then, the latest one being Office 2010. Needless to say that the more recent versions have gained significantly in complexity and size. But who needs it? Studies have shown that most people only use a small core set of features. Unless you are a Visual Basic programmer or have specific uncommon requirements, you would probably still do well with Office 97. Or would you not?

Upon closer look, you would probably not, and this is where the attrition factor comes into play. In case of Microsoft, it is safe to say that this effect has been engineered for the sake of continued profits. Not only are older version not supported any longer, but they do actually become incompatible with current versions. The change of file formats is a case in point. For example, do you know the differences/advantages of the Office x-fomats (such as .docx and .xlsx) over the older .doc/.xls formats? The new ones are zipped XML-based and as such easier to process automatically. However, most people using older office versions of Office or competing products cannot read these formats and are thus forced to upgrade or obtain software extensions for compatibility.

This does not only apply to Microsoft products, but -as previously mentioned- to digital artifacts in general. Remember floppy disks? Not long ago I found a box of them in the storage room. They contained sundry programs and files, reaching back into the Atari and MS-DOS era. Not only don’t I possess a floppy drive any longer, but even if I had one, I could not read these files. To access my earliest attempts at digital art and programming, for instance, I would have to read .PC2 and .GFA files on the GEMDOS file system, which would constitute a major archival effort. Perhaps I should keep the them until I am retired and find some time for such projects. The surprising thing is how fast attrition has rendered digital works useless in the past decades. While I can still find ways to play an old vinyl record from the eighties, for example, it’s almost impossible to access my digital records from the same era.

The Agile Samurai

The Agile Samurai

The Agile Samurai
by Jonathan Rasmusson
1st edition, 280 pages
Pragmatic Bookshelf

Book Review

Over the last ten years, I've been working with teams with different degrees of commitment to the agile process, ranging from non-existing to quite strong. I was looking for a text that summarises agile methodology to help me formalise and articulate my own experiences, and of course to enhance my knowledge of some of the finer points of agile practices. I have to admit that this book did not meet my expectations. The first eighty pages up to chapter six are mostly about project inception and read like a prolonged introduction. From chapter six onwards, the author finally comes to the point and discusses the core concepts of agile processes, so the book does get better with increasing page numbers. Unfortunately, Scrum isn't discussed at all, instead Kanban is introduced in chapter eight. The discussion of typical technical processes, such as refactoring, TDD, and continuous integration is compacted into several brief chapters at the end of the book.

The writing style is very informal; the author uses a conversational tone throughout the book. Almost every page contains illustrations, which makes it an easy and quick read. The style of the book is comparable to the Head First books. It left me with the the impression that I sat in an all-day meeting where someone said a lot of intelligent things to which everyone else agreed. Unfortunately, not many of these things seemed radically new or thought-provoking, so I fear I won't remember many of them next month. Of course, this may be entirely my own fault. I prefer a more formal, concise, old-school language. I also prefer dense and meaty text books with lots of diagrams, numbers and formulas. In return, I can dispense with stick figures, pictograms, and even with Master Sensei (a guru character used in the book). I feel that a lot of the deeper and more complex issues of agile project management have simply been left out.

To be fair, it must be mentioned that I probably do not fall into the target group for which this book was written. It is more appropriate as an introductory text for people who are new to agile project management, or even new to the entire business of project management. Think "trial lesson" and "starter course".

PHP is not Java

The fact that PHP is not Java is self-evident and does not need much emphasizing. Nevertheless, developers who come from a Java/OOP background sometimes treat PHP as if it were Java with dollar signs. For example, one finds not just generic design patterns such as singletons, factories, decorators, etc. in contemporary PHP applications, but also patterns directly taken from the JEE world, such as transfer objects, domain stores, filters, session facades, dispatchers. Even typical Java architectures involving separate business, service, persistence, presentation and integration tiers has been brought to the world of PHP application development. This trend began when PHP became a fully fledged OOP language with version 5. PHP developers seem to have looked to the Java world for inspiration ever since. Possibly it’s a new generation of graduates that was taught to think in Java, or maybe it’s just the overwhelming influence of the mainstream OOP body of thought. I can’t say for sure.

I like Java, but I don’t like PHP code that is written as if it were Java. PHP is great because it makes easy things easy. I believe that this simplicity is an asset, not a flaw. PHP provides a straightforward approach to web development, and I dare say that its success is founded on this principle. PHP gives you productivity, ease-of-use, a high level of abstraction, powerful well-tested libraries and backward compatibility. It’s a pragmatic language, not necessarily a beautiful one. PHP’s share-nothing approach may limit its use in specific areas such as concurrency and parallel processing, but it is ideally suited to the request life cycle model of the web, and it frees the programmer from a lot of potential headaches that come with concurrency. This share-nothing approach also implies that scripts are running in a virtual bubble. A malfunctioning or malicious script doesn’t bring down the entire virtual machine like on the Java platform. This property makes it perfect for shared hosting, and consequently, PHP hosting is ubiquitous and cheap.

PHP != Java

Now, let’s look at some Java-esque PHP code:

abstract class Observable {

  private $observers = array();

  public function addObserver(Observer & $observer) {
         array_push($this->observers, $observer);
  }

  public function notifyObservers() {
         for ($i = 0; $i < count($this->observers); $i++) {
                 $widget = $this->observers[$i];
                 $widget->update($this);
         }
     }
}

class DataSource extends Observable {

  private $names;
  private $prices;
  private $years;

  function __construct() {
         $this->names = array();
         $this->prices = array();
         $this->years = array();
  }

  public function addRecord($name, $price, $year) {
         array_push($this->names, $name);
         array_push($this->prices, $price);
         array_push($this->years, $year);
         $this->notifyObservers();
  }

  public function getData() {
         return array($this->names, $this->prices, $this->years);
  }
}

This is an attempt at the observer pattern taken straight from a popular PHP5 textbook. It is obvious that the implementation doesn’t do anything useful. Since the code is meant for illustration, this is forgiveable. The principal flaw is that in PHP, the observer pattern is for the birds. There are two reasons. First, the observer pattern is fundamentally a Java crutch that compensates for the lack of function arguments in Java. If you find yourself writing an observer in PHP, it might have escaped you that PHP supports function parameters and that you can implement the same functionality more concisely with a simple callback function. The second reason goes somewhat deeper into software design considerations. The observer pattern is typically used in an event-driven design, for example for maintaining the state of GUI widgets or in an event dispatcher for container objects. Such designs are rarely ever useful in PHP, because all PHP objects live in the request scope by default. This means that after the request cycle is completed, the objects -including infrastructure objects such as observers, are deallocated.

The fact that PHP objects don’t survive requests has another important implication: the amount of objects in an application is proportional to CPU consumption, because all objects have to be rebuilt at every request, and unlike in Java, there is no class loader that takes care of these things for you. Therefore, incorporating plenty of Java-esque OOP designs into PHP applications has an adverse impact on performance and scalability. Given that the performance of interpreted PHP is already a magnitude below that of JVM bytecode, you can quickly run into scalability issues with an application that serves a large user base. Putting objects into session scope is no solution, because PHP sessions are serialised/deserialised upon each request. The performance penalty of this operation is likely to be worse than explicit object construction.

The solution to the PHP scalability problem lies in using sensible caching, which is of course what every heavy traffic application does. Products such as memcache and APC provide in-memory data stores for objects. Caching, however, bumps up the complexity of persistence logic in your application considerably. It’s not a plug-and-play solution and it doesn’t provide access to shared data. On a personal note, if I was at the point where I had to consider PHP caching, I would probably already regret not having used a JVM based language in the first place. But let’s get back to the topic. Java developers are used to manipulate data with special purpose data structures called collections. PHP developers, on the other hand, generally use only one bread-and-butter data structure: the associative array. This data structure is an ordered map with either integer or string keys. It’s values can be of any type, and of course, arrays can contain mixed value types. Associative arrays can be used to mimic any collection type, such as lists, maps, queues, stacks, etc., even recursive structures such as trees, because array values can be arrays themselves. So there is no need for special purpose collections and this obviously makes things really simple. As a Java programmer, one should probably resist the temptation to define special purpose collections on top of associative arrays. Why? – It’s not the PHP way.

This statement may require some explanation. You might ask: what keeps you from inserting an element at the head or in the middle of a queue if you use associative arrays? Well, technically nothing. The issue is not about what can be done, but what should be done. It’s has to do with the PHP philosophy. PHP is NOT about enforcing programming by contract. It is a dynamic language, that treats data very loosely. This approach has its benefits and its dangers. If you don’t like it, it may be better to use a statically typed language. What I am saying here is, that it is generally wiser to leverage the strengths of a programming language to the greatest degree possible, rather than trying to compensate for its weaknesses. Dynamic languages have much to offer, but forcing a programming by contract approach onto PHP is painful. Consider the following example:

function recordVote($ballotId, $vote, DateTime $timestamp)
{
   if (!is_int($ballotId))
    throw new InvalidArgumentException(
      "ballotId must be an integer");
  if (!is_string($vote))
    throw new InvalidArgumentException(
      "vote must be a string");
  ...
  return $something;
}

This method is doing type checking the hard way. Version 5 of the language introduced a partial type checking syntax called type hinting. Unfortunately, this works only for objects, such as the $timestamp parameter in the above code snippet. The introduction of PHP type hinting did not only dissolve the dynamic paradigm of the PHP language, but it’s implementation is also pitifully inconsistent, because it does not provide for the checking of scalar types, such as int or string. In practice, it often the wrong scalar arguments that cause bugs which are difficult to detect. Wrong object arguments are detected easily in 90% of all cases when references to non-existing methods or properties are invoked. The recordVote() method tries to compensate for the former lack by explicitly checking the arguments using is_int() and is_string(). This produces explicit error messages at run time, but the approach is painful and entails some serious trade-offs.

Most importantly, the caller is now forced to pass specific types which require type casts in all places where the method is called. These casts are easily forgotten and therefore can cause bugs themselves. Did I mention they are also cumbersome to insert? Secondly, the method loses the capacity of implicit overloading, which means that its reusability is seriously curtailed. For example, if I wanted the method to accept timestamps in other formats, I would have to either add additional methods with different names and signatures, or create an adapter. Adding different methods for the same task results in code duplication. Creating an adapter makes the code harder to understand. Both solutions increase verbosity and reduce clarity. A more elegant and more PHP-like solution to this problem would be to replace syntactic checks by semantic checks, make the method signature more dynamic, which gives caller a richer yet still exact API with mixed type parameters:

function recordVote($ballotId, $vote, $timestamp)
{
  if (!is_numeric($ballotId))
    throw new InvalidArgumentException(
      "ballotId must be numeric");
  $ballotId = (int) $ballotId;
  if (empty($vote))
    throw new InvalidArgumentException(
      "vote must not be empty");
  $vote = (string) $vote;
  if ($timestamp instanceof DateTime)
    $timestamp = doSomething($timestamp);
  else if (is_string($timestamp))
    $timestamp = doSomethingElse($timestamp);
  else throw new InvalidArgumentException(
    "timestamp not valid");
  ...
  return $something;
}

Selector Subtleties

If you have no idea what the title of this blog entry means, chances are that you are not a web developer. Fee free to skip to the next article in this case. We are going to discuss CSS selectors and how they may be used to streamline web development. Traditionally, CSS selectors are used to create logical groups of visual styles which are applied to HTML elements. You could think of selectors as patterns that are matched against a set of tags in a web document. We are considering both CSS 2.1 and CSS 3 selector syntax and functionality, while keeping in mind that the latter is not fully supported by all current browsers. Notably, the Internet Explorer took 10 years from the publication of the CSS 2 specifications in 1998 until the release of IE8 in 2008 with full support for CSS 2.

A proper understanding of selector pattern matching has become more important with the advent of jQuery, the most widely used Javascript library today, that employs CSS selector syntax for DOM element addressing, traversal, and manipulation. Once you have worked with jQuery and its powerful "query by selector" engine, using the conventional DOM document methods, such as getElementById(), feels a bit like operating an unwieldy steam engine. You would not want to go back. It also brings CSS syntax to the world of scripting. Using jQuery effectively, however, requires a fairly good command of some of the more complex CSS selector expressions, understanding the new CSS 3 selectors, and learning a few selector expressions specific to jQuery. But let's start at the beginning with the three most simple and most commonly used CSS selectors:

a { color: orange; }
#orange { color: orange; }
.orange { color: orange; }

The first is a type selector. It assigns the colour orange to the text colour of all anchor tags (links). The second is an id selector. It assigns the colour orange to the HTML element with the id "orange". The third is a class selector. It assigns the colour orange to all elements which have an "orange" class attribute. These three are the bread-and-butter selectors that probably make up 90% of all selectors in CSS style sheets. Tag selectors are typically used for global style settings that apply to a large number of web pages, for example to set the font type for a website. The id selector is typically used in combination with specific layout elements, such as containers or form fields. The class selector is typically used to style elements repetitively in the same manner. These selectors can be combined. For example:

a.orange { color: orange; }

This will set only anchor tags with the CSS class orange to orange text colour. Other anchor tags and other types of tags with the CSS class "orange" set are not affected. Combining multiple selectors in order to match document structure is an important skill that -if mastered- allows you to create efficient and maintainable style sheets. Incidentally, it also makes a good recruiting question for web developer candidates. Can you tell the difference between the following three combinations?

.footer.orange { color: orange; }
.footer .orange { color: orange; }
.footer, .orange { color: orange; }

The first selector matches elements that have two class attributes named "footer" and "orange". The second selector matches elements with the class "orange" that are descendants of an element with the class "footer". The third matches all elements with either the class "footer" or the class "orange". In other words, these selectors express three different types of relationships: logical and (written together), descendant (separated by blanks), and logical or (separated by comma). In geek speak, these are called combinators. There are a few more combinator thingies, although the following are lesser known and used:

#footer > .orange { color: orange; }
#footer + .orange { color: orange; }
#footer ~ .orange { color: orange; }

The angle bracket denotes child relationship. The first example matches all elements with the class "orange" that are children (=immediate descendants) of the element with the id "footer". The plus sign means adjacent sibling. Line 2 matches elements with the class "orange" that are immediately preceded by an element with the id "footer". The tilde denotes a general sibling combinator. The third example matches all elements with the "orange" class that are siblings of the element with the id "footer". These combinators are sometimes handy for formatting lists. The next group of selectors we are going to take a look are attribute selectors. They match elements by attributes and are equally useful for processing HTML and XML documents:

div[class] { color: orange; }
div[class="orange"] { color: orange; }
div[class~="orange"] { color: orange; }
div[class^="orange"] { color: orange; }
div[class$="orange"] { color: orange; }
div[class*="orange"] { color: orange; }

Line 1 selects all div elements that have a class attribute. Line 2 selects all div elements whose class attribute is set to "orange". Another good test question: how is this different from the class selector .orange? Answer: It only selects those elements where the class attribute exactly matches the word "orange". For example, the element <div class="orange fruit"> is not matched. This one is matched by the selector in line 3, however, because ~= matches all div elements where the class attribute contains a whitespace-separated list of words, one of which is "orange". The latter is functionally equivalent to the class selector .orange. Line 4 matches div elements whose class attribute begins with "orange", line 5 matches div elements whose class attribute ends with "orange", and line 6 matches div elements whose class attribute contains the string "orange", such as <div class="all-orange-fruits">.

We go on to the so-called pseudo classes and pseudo elements that provide useful matching techniques for dynamic manipulation in response to user interaction:

div:hover { color: orange; }
input:focus { color: red; }
input:enabled { color: green }
input:disabled { color: grey }
input:checked { color: blue }

Line 1 matches a div element during the time the mouse  pointer is on it (the "rollover" effect). Line 2 matches a form input element that has the focus (the time during which data can be manipulated with the element). Line 3 matches all enabled input elements and line 4 matches all disabled input elements. Line 5 matches a checked form element, such as a radio button or a checkbox. Question: how can you match an unchecked element? The answer is: you need an additional selector. Unchecked elements can be matched by combining the :checked selector with the :not selector, which -strictly speaking- operates like a combinator:

input:not(:checked) { color: yellow; }

You can of course put any valid CSS selector or selector combination into the parentheses of the :not() selector to create more specific and more complex expressions. Finally, there are a number of pseudo class selectors that relate to the DOM tree structure which may be useful for manipulating DOM (in combination with jQuery, for example). The three following lines  match the first, last, and third children of all div elements, respectively:

div:first-child { color: pink }
div:last-child { color: silver }
div:nth-child(3) { color: purple }

Although we have not enumerated all CSS selectors here, the ones we have covered are probably the the most useful and most often used ones. For a complete reference, see the W3C specification for CSS selectors.

To inline or not to inline

As every web developer knows, there are three ways to apply CSS styles to a document. One can use a separate style sheet in connection with the <link> tag, one can embed style definition block(s) directly in the HTML document, or one can inline CSS using the style attribute with single HTML tags. These three approaches represent different (descending) levels of separation. The last method, inline CSS, is somewhat peculiar, because it appears to defeat the principle that CSS is founded upon, namely the separation of content of presentation. After all, writing something like <span style=”font-style: bold”>something</span> ist just another -more cumbersome- way of writing <b>something<b>.

Hence, the question arises: why use inline CSS at all? The naive understanding is that only external style sheets are “good” and that inline CSS and embedded CSS styles are “bad”. It may not be a bad idea to reflect on the criticism and ask the obvious counter question: why did the designers of CSS provide for the possibility of inline styles if it was evil to begin with? Are CSS inline styles the equivalent of the infamous “goto” command?

The truth is that, despite the overarching goal of high level of abstraction and presentation separation, there are some legitimate uses for inline styles. To be precise, there are two such cases. First, inline CSS is appropriate when the default styling, as specified by one or more style sheets, needs to be overridden in specific instances. Second, inline CSS is appropriate for micro-styling issues that relate to specific HTML structure of a document. The first use case is easy to understand. For example, you have all your links defined to be blue and non-underlined using the appropriate definitions in a style sheet. There is one point, however, where you want the link to be green rather than blue. Use an inline style in this case.

The other case requires some explanation, and to be frank, some experience in web design as well. What I have called micro-styling issues are layout issues related to the specific sequence of HTML, text, and images in a given document. These are instance-specific issues that don’t repeat. The vast majority -probably over 90%- of these issues concern spacing and text flow. A typical example would be to adjust the vertical and horizontal spacing between adjacent elements using the float, margin, and padding styles. A somewhat less typical example would be setting the width of columns or text paragraphs in specific instances, or make the corners of an outlined box rounded using CSS 3 styles.

In summary, inline CSS is appropriate wherever styling is specific to the document. This could be the case if default styles must be overridden or if custom layouts require micro-styling. However, if you find yourself repeating the same style tags in many different places, there is something going wrong. As soon as a you see a pattern emerge, refactor! Withstand the copy-paste coding temptation. It is easy to generate code that way, but very hard to maintain. Create classes or appropriate selectors instead and move the definitions to an external style sheet.

Let’s look at a typical border case. Say, we use tables to contain form elements in our web application in order to arrange form controls and labels into columns and rows. Typical micro-styling issues, such as managing white space between specific columns can be solved with inline CSS. If we find that certain spacing definitions repeat, for example if we want all form rows to have a five pixel bottom padding, then this situation would call for a CSS style sheet definition.

The quintessence is: the DRY principle also applies to CSS. While the overall goals of CSS are separation of content and presentation, abstraction, and ease of maintenance, and while the use of inline CSS generally counters these goals, inline CSS is appropriate for the mentioned purposes. It takes a bit of experience to know when it’s OK to break the rules and when it is not.