Object cloning in PHP

In any complex object-oriented PHP program, there are situations that require copies of objects. Objects are often designed to be mutable, which means they contain state information that can change. Consider a bank account object, for example, which contains state information about balance, credit limit, and the account holder. Let’s assume that there are withdraw() and deposit() methods that change the state of this object. By contrast, an immutable design would require that withdraw() and deposit() return new account objects with updated balance information. This may sound like an irrelevant distinction, but the implications are actually far-reaching, because mutable objects tend to increase complexity in subtle ways. Copying objects is a good example.

$object1 = new Account();
$object2 = $object1;

By assigning an object instance to a new variable, as above, one creates only a new reference; the object’s state information is shared by both variables. Sometimes, this is all a program needs. If withdraw() is called on $object2, both $object1->getBalance() and $object2->getBalance() return the same value. On other occasions, this behaviour is not desirable. For instance, consider displaying the results of a withdrawal operation on an ATM before the transaction is executed. In this case, we can make a copy of the account object, execute the withdrawal operation, and display the new balance or an overdraft message to the user without affecting the actual account. For this we need a copy of the object rather than a copy of the reference. PHP provides an intrinsic operation for this, using the keyword clone:

$account = new Account();
$clonedAccount = clone $account;

The $clonedAccount variable contains a copy of the original object. We can now invoke $clonedAccount->withdraw() to display the results and -with a bit of luck- the original $account object remains unaffected. With a bit of luck? Yes, unfortunately things aren’t quite that straightforward. The clone operation creates a so-called shallow copy of the original instance, which means that it constructs a new object with all fields duplicated. Any field that contains data of a primitive type, such as an integer, string, float, or array, is copied. If the balance is of type float, for example, we should be fine. If the balance field happens to be an object, however, we have a problem, because the clone operation does not copy composite objects but only their references. If the account class uses a balance object, a call to the $clonedAccount->withdraw() method would still affect the state of the original $account object, which is clearly not the desired behaviour.
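The pitfall is easy to demonstrate. The Balance and Account classes below are hypothetical stand-ins for the ones discussed in this article; the point is that after a plain clone, both account objects still share one balance object:

```php
<?php

// Hypothetical classes for illustration only; the real Account class
// in this article would look different.
class Balance {
    public $amount;
    public function __construct(float $amount) {
        $this->amount = $amount;
    }
}

class Account {
    public $balance;
    public function __construct(float $amount) {
        $this->balance = new Balance($amount);
    }
    public function withdraw(float $amount) {
        $this->balance->amount -= $amount;
    }
    public function getBalance(): float {
        return $this->balance->amount;
    }
}

$account = new Account(100.0);
$clonedAccount = clone $account;   // shallow copy: the Balance object is shared

$clonedAccount->withdraw(30.0);

echo $account->getBalance(), "\n"; // 70 -- the original account changed, too!
```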

This can be remedied by adding a magic method named __clone() to the original object. The __clone() method defines what happens if the object is cloned:

class Account {

  protected $balance;

  function __clone() {
    $this->balance = clone $this->balance;
  }
  …
}

The somewhat odd-looking syntax of the __clone() method above instructs PHP to make a copy of the balance object that the field $balance refers to when the object is cloned. Thus not only the account object itself is copied, but also the balance object that it contains. While this should be fine for our stated purposes, note that it only copies the balance object, and not any other composite objects that the account object might contain. It is not difficult to generalise the code, however. The following even odder-looking syntax makes copies of all composite objects of the account object. It does so by iterating over all fields of the current instance referred to by $this, where $key takes the names of the fields and $value their values:

class Account {

  function __clone()
  {
    foreach ($this as $key => $value) {
      if (is_object($value)) {
        $this->$key = clone $this->$key;
      }
    }
  }

}

The is_object() test in the above code is necessary to avoid attempting to clone non-existent composite objects, i.e. fields whose value is set to null, which would result in an error. Yet, this code still has a minor flaw. What if our object contains array fields whose elements are objects? While the array itself would be copied, its elements would still contain references and thus would point to the same objects as the corresponding elements in the original object. This flaw can be eliminated by adding a few more lines of code that make explicit copies of the objects held in array fields:

function __clone()
{
  foreach ($this as $key => $value) {
    if (is_object($value)) {
      $this->$key = clone $this->$key;
    }
    else if (is_array($value)) {
      $newArray = array();
      foreach ($value as $arrayKey => $arrayValue) {
        $newArray[$arrayKey] = is_object($arrayValue) ?
          clone $arrayValue : $arrayValue;
      }
      $this->$key = $newArray;
    }
  }
}

This already looks fairly complicated, but unfortunately it is not the end of our troubles. We also have to consider the hierarchical structure of composite objects: the objects in object fields may contain object fields themselves, which may in turn contain objects with yet other object fields. Thus, creating a clone from scratch requires recursive copying of the object structure, otherwise known as making a “deep” copy. Obviously, the above method already gives us a way of implicit recursion if all of our objects implement it: the __clone() method of object A is implicitly invoked by the __clone() method of object B when B, containing A, is cloned. We could use a base class for all of our objects to provide deep copying functionality. Although this works only with our own objects, and not with objects from third-party libraries, it would provide a comprehensive method for object copying. Unfortunately, the recursive approach still contains a flaw. Consider the following object structure:

class Employee {
  public $name         = null; /** @var string employee name */
  public $superior     = null; /** @var Employee employee's superior */
  public $subordinates = null; /** @var array of Employee, subordinates */
}

This is an example of a class that represents a hierarchical graph of instances in memory. The Employee class defines a tree structure with the variable $superior containing a reference to the ancestor node and the variable $subordinates containing references to the child nodes. Because of this double linking, the graph contains cycles, and because of these cycles, the above clone method will run into infinite recursion and cause a stack overflow. Cycles are fairly common in object graphs, though they are not necessarily as obvious as in the above example. In order to prevent the clone method from running into a cycle death trap, we need to add a cycle detection algorithm, for example by keeping track of objects that have already been copied. How exactly this is implemented is beyond the scope of this article. Let’s just say it’s not that trivial.
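For completeness, the base-class idea mentioned earlier, minus cycle detection, might be sketched as follows. The name DeepCopyable is my own invention, not part of PHP or any library, and on a cyclic graph like the Employee example this sketch would still recurse until the stack overflows:

```php
<?php

// Sketch of a deep-copying base class; "DeepCopyable" is a hypothetical
// name. Note: no cycle detection, and nested arrays are not handled.
abstract class DeepCopyable {
    public function __clone() {
        foreach ($this as $key => $value) {
            if (is_object($value)) {
                // Implicit recursion: if the contained object also
                // extends DeepCopyable, its __clone() runs in turn.
                $this->$key = clone $value;
            } elseif (is_array($value)) {
                $copy = array();
                foreach ($value as $k => $v) {
                    $copy[$k] = is_object($v) ? clone $v : $v;
                }
                $this->$key = $copy;
            }
        }
    }
}

// Minimal usage example with hypothetical classes:
class Money extends DeepCopyable {
    public $amount = 100.0;
}

class Account extends DeepCopyable {
    public $balance;
    public function __construct() {
        $this->balance = new Money();
    }
}

$account = new Account();
$copy = clone $account;
$copy->balance->amount = 50.0;

echo $account->balance->amount, "\n"; // 100 -- the deep copy is independent
```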

If you can do without cycle detection, there is a simple alternative for creating deep copies of an object, one which does not require a clone method implementation:

$object1 = new Account();
$object2 = unserialize(serialize($object1));

This takes advantage of PHP’s serialize() and unserialize() library functions, which convert an object to a string representation and back. These functions take nested object structures, including cycles, into account. However, they are expensive operations, both in terms of CPU and memory, and they should therefore be used with discretion.
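A quick sketch shows that the round trip yields a genuinely independent copy and even preserves cycles; the Node class here is hypothetical, standing in for any doubly-linked structure like the Employee example:

```php
<?php

// Hypothetical Node class to demonstrate that the serialize()/unserialize()
// round trip copes with a cyclic structure and yields an independent copy.
class Node {
    public $label;
    public $peer;   // back-reference, creating a cycle

    public function __construct(string $label) {
        $this->label = $label;
    }
}

$a = new Node('a');
$b = new Node('b');
$a->peer = $b;
$b->peer = $a;                       // cycle: a -> b -> a

$copy = unserialize(serialize($a));  // deep copy, cycle intact

$copy->peer->label = 'changed';

echo $b->label, "\n";                                  // "b" -- original untouched
echo $copy->peer->peer === $copy ? "yes" : "no", "\n"; // "yes" -- cycle preserved
```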

Pogoplugged

Everyone seems to agree that the outgoing year 2011 was the year of the cloud. Judging by how often the word “cloud” was thrown at us by computer vendors, hosting companies, and service providers, it sounds like the greatest innovation since sliced bread. Of course, nothing could be farther from the truth. Cloud computing is not new at all. It has been around since the days of Multics and the ARPANET, at least conceptually. It is neither an invention nor a product, but an application of existing computer technologies, no matter how many companies now try to productise it. The fuzzy term includes everything from network storage, utility computing, and virtual server hosting to service-oriented architectures, typically delivered via the Internet (i.e. the cloud). In fact, the term is so blurry that even the vendors themselves often disagree about what it means, as Larry Ellison, CEO of Oracle, famously pointed out before his company jumped on the bandwagon.

Most people associate the word cloud with file hosting services such as Dropbox, Windows Azure, or Apple’s iCloud. Today, I want to talk about a product that provides an alternative to these network storage services and, in my opinion, a superior solution. It’s called Pogoplug, and it is a box that comes in flashy pink. The idea is simple enough. You connect this box to your Wifi router on one side and to your storage media on the other side, and voilà, you get networked storage, aka your own “personal cloud”, which is accessible on your LAN as well as from outside via the Internet. Besides connecting the box, you have to get an account with pogoplug.com and register your device. Optionally, you can install software that makes the attached storage available on your LAN as an external mass storage device. There are also free apps for iOS and Android that allow you to access Pogoplug-managed storage from your tablet and/or phone.

Why is this such a clever product? Well, for two reasons. First, the Pogoplug is low-cost and easy to use. Second, it provides solutions to multiple problems. Let’s start with the first. The basic Pogoplug device costs 50 USD, and the web account is free. You can plug up to four external hard disks or flash memory sticks into the four USB ports, so one could easily realise four or eight terabytes of total capacity. External hosting is expensive by comparison; for example, a 50 GB Dropbox account costs 10 USD per month, and with Apple’s iCloud it’s 100 USD per year for the same size. There are cheaper alternatives, such as livedrive.com or justcloud.com, but the annual expense still exceeds the cost of a Pogoplug device. What’s the catch? The download speed via the Internet is limited to the upload speed of your Internet connection, which for the average DSL user is typically lower than the access speed of an external file storage service. Filling the Pogoplug devices with data, on the other hand, is much faster, because you can access the drives locally.

Now, about the multiple solutions aspect. What I like about the Pogoplug device is that I can reuse my external backup disks as network storage. I work with redundant pairs of disks, where one disk is plugged into the Pogoplug at all times and the other disk is used to create backups from my computers. In a second step, I mount the Pogoplug on my Linux workstation and synchronise the online storage with the fresh backups via rsync. In addition, I use my Pogoplug as a household NAS and media server. This comes in very handy for viewing my photo library on a tablet, or for streaming audio from my music collection to my phone. As long as I stay within my house and garden’s Wifi range, the data transfer happens at Wifi speed. Streaming movies is a little trickier. Usually I download movies from the Pogoplug to the mobile device before viewing.

In summary, the product offers a miniature file server for local access via LAN/Wifi and remote access via the Internet, plus some streaming services. Authentication is provided by the pogoplug.com web server. As of late, you also get 5 GB of free cloud storage space externally hosted by pogoplug.com, which is likewise accessible via the mobile apps and can even be mounted into your local network. The Pogoplug device itself consumes only 5 W, less than most NAS or mini-PC servers. Obviously, the power consumption increases when connected USB hard disks draw power from it, so the most energy-efficient solution is probably to use either flash memory sticks or USB-powered disks that stop spinning in idle mode. Additionally, the Pogoplug device can be deployed as a LAN print server. Those who are comfortable with Unix administration and scripting can program the Pogoplug device to do even more.

Website: www.pogoplug.com

Specifications:
1.2GHz ARM CPU with 256MB RAM plus 512MB Flash storage,
4 x USB2 ports, 1 x 10/100/1000Mbps Ethernet port, integrated DC power supply
Supported Filesystems: NTFS, FAT32, Mac OS Extended (HFS+, journaled and non-journaled), EXT2/EXT3
Supported Browsers: Safari, Firefox 3, IE7, IE8, Chrome
Supported AV File Formats: H.264, MP4, AVI with motion JPEG, MP3

Digital Attrition

Naively, one might assume that digital artifacts, such as software, are not subject to the decay and attrition that affect physical objects. After all, any digital artifact reduces to a sequence of ones and zeros that -given durable storage- remains completely unaltered and would therefore function in exactly the same way even in ten, a hundred, or a thousand years. However, this notion disregards an important aspect of digital products, namely that they don’t exist on their own. A digital artifact almost always exists as part of a digital ecosystem, requiring other components to be available to fulfill its function. At the very least, it requires a set of conventions and standards. For example, even a simple text file requires a standard for how to encode letters.

This became painfully clear to me once again when the WordPress software on which this blog runs suddenly started behaving erratically a few weeks ago. It produced 404 page-not-found errors that were impossible to diagnose and fix. I had not changed anything, being happy with the look and functionality of the blog, so the WordPress installation had reached the ripe old age of three and a half years. The cause had to be sought somewhere in the operating platform, which in this case means the web server configuration. Upon contacting the hosting provider, I was told that this problem had been diagnosed with older versions of WordPress and could only be cured by an upgrade.

I had no choice but to upgrade WordPress, and the result is before you. Since the old theme, which can still be seen in the thumbnail image, is not compatible with the latest WordPress version, I derived a new theme from the included twentyeleven package. It takes into consideration that screen resolutions have increased over the last few years, and it also provides a display theme for mobile devices. Curiously, while still offering essentially the same set of functions and features as version 2.3.1, the WordPress software version 3.2.1 has increased significantly in complexity. I ran a quick sloccount analysis, which told me that its codebase has grown from 36,895 lines to 92,141 lines, not counting plugins and themes, and the average theme has roughly doubled in code size.

I am sure that this phenomenon is not unfamiliar to anyone who has worked with computers over a number of years. Remember how MS Office 97 contained every feature you would ever need? Since text processing and spreadsheets reached maturity quite early in the game, some people would even say this of prior versions of Office. Yet, Microsoft has successfully marketed five successor versions of MS Office since then, the latest one being Office 2010. Needless to say, the more recent versions have gained significantly in complexity and size. But who needs it? Studies have shown that most people use only a small core set of features. Unless you are a Visual Basic programmer or have specific uncommon requirements, you would probably still do well with Office 97. Or would you not?

On closer inspection, you probably would not, and this is where the attrition factor comes into play. In the case of Microsoft, it is safe to say that this effect has been engineered for the sake of continued profits. Not only are older versions no longer supported, but they actually become incompatible with current versions. The change of file formats is a case in point. For example, do you know the differences and advantages of the Office x-formats (such as .docx and .xlsx) over the older .doc/.xls formats? The new ones are zipped and XML-based, and as such easier to process automatically. However, most people using older versions of Office or competing products cannot read these formats and are thus forced to upgrade or obtain software extensions for compatibility.

This applies not only to Microsoft products, but -as previously mentioned- to digital artifacts in general. Remember floppy disks? Not long ago I found a box of them in the storage room. They contained sundry programs and files, reaching back into the Atari and MS-DOS era. Not only do I no longer possess a floppy drive, but even if I had one, I could not read these files. To access my earliest attempts at digital art and programming, for instance, I would have to read .PC2 and .GFA files on the GEMDOS file system, which would constitute a major archival effort. Perhaps I should keep them until I am retired and find some time for such projects. The surprising thing is how fast attrition has rendered digital works useless in the past decades. While I can still find ways to play an old vinyl record from the eighties, for example, it’s almost impossible to access my digital records from the same era.

Android versus iOS


The heyday of the personal computer is over. The fastest growth is no longer in the traditional PC segment, but in tablet computers and mobile devices. The technological advances in this field have been phenomenal during the past few years. When I attempted an outlook into the mobile future four years ago in this blog entry, I had a time frame of ten years in mind. But it seems that the technology enabling the described functionality is already available.

I got my last mobile gadget in 2008, which -being based on Windows Mobile 6- was outdated only a year later. Though I was determined to keep the phone as long as possible, it gave up the ghost last month, after little more than three years. First the power button stopped working, and then the audio failed. Multiple organ failure, so to speak. The time for an upgrade had come. Since I had promised my wife an iPad for her birthday, I got to buy two gadgets at the same time: an iOS-based iPad 2 tablet and an Android-based Samsung Galaxy S2 smartphone. Of course, these are not mere consumer items for me; I am interested in studying and evaluating the available software development tools.

At this time, app development for either platform does not look like a lucrative proposition per se, unless one has access to marketing channels that enable economies of scale. However, it may be worthwhile to acquire the technical know-how nevertheless. For me, mobile app development is interesting because it can be used to leverage existing web services and server applications. People want to use web-based services on the go with their mobile devices. Demand in this area is growing rapidly, and it's probably just a question of time until proprietary corporate applications move in the same direction.

I have to admit that I am more drawn towards the Android platform, not just because the SDK is Java-based, but because it is an open platform. Apple currently has a unique position in the market as innovator and technology leader, but I doubt that the company can sustain its dominance in the long run. Aggressive vendor lock-in might have worked for Microsoft in the nineties, but Apple's exclusionist strategies are more likely to annoy people. They definitely annoyed me. While I consider the absence of a file manager and Flash support a minor disadvantage on the iPad, the big pain points are iTunes and the lack of seamless data exchange.

I can connect my Android phone to my PC and fill it with music, video clips, photos, or whatever I desire, using a simple USB file-level utility. On the iPad, I am forced to use a synchronisation process controlled by iTunes, and since the iTunes software is not available for Linux, I have to shovel my data onto a Windows PC first, just like in the bad old days of Microsoft ActiveSync. In addition, iTunes dictates which formats it is willing to accept. The height of my vexation, however, was reached when I found that I could not register with the Apple store unless I submitted my credit card data, even though I did not intend to buy anything at the time. Since the iPad is totally dependent on the app store for software updates, I grudgingly complied, but it definitely left the unpleasant impression that Apple is grabbing for my purse prematurely.

Fortunately, the iPad is such a great piece of hardware that it stands to reason people are putting up with Apple's snappishness for now. It's still one of the best, if not the best, tablet PC on the market. To be fair, one must also mention that iTunes has some good points, particularly the iTunes U area, a part of the iTunes store where educational institutions publish free audio and video lectures. You could probably get a lifetime's worth of high-quality lectures out of iTunes U, if life were indeed long enough to learn about every imaginable topic.

For precisely this reason, because time is a limited resource, I have decided to take a closer look at Android, before I dabble in any other mobile OS, unless someone convinces me otherwise. As the market for smartphones and tablet OS is still dynamic and continues to evolve, it would be too early to draw final conclusions.

The Agile Samurai


The Agile Samurai
by Jonathan Rasmusson
1st edition, 280 pages
Pragmatic Bookshelf

Book Review

Over the last ten years, I've been working with teams with different degrees of commitment to the agile process, ranging from non-existent to quite strong. I was looking for a text that summarises agile methodology, to help me formalise and articulate my own experiences, and of course to enhance my knowledge of some of the finer points of agile practices. I have to admit that this book did not meet my expectations. The first eighty pages, up to chapter six, are mostly about project inception and read like a prolonged introduction. From chapter six onwards, the author finally comes to the point and discusses the core concepts of agile processes, so the book does get better as the page numbers increase. Unfortunately, Scrum isn't discussed at all; instead, Kanban is introduced in chapter eight. The discussion of typical technical practices, such as refactoring, TDD, and continuous integration, is compacted into several brief chapters at the end of the book.

The writing style is very informal; the author uses a conversational tone throughout the book. Almost every page contains illustrations, which makes it an easy and quick read. The style of the book is comparable to the Head First books. It left me with the impression that I had sat in an all-day meeting where someone said a lot of intelligent things to which everyone else agreed. Unfortunately, not many of these things seemed radically new or thought-provoking, so I fear I won't remember many of them next month. Of course, this may be entirely my own fault. I prefer a more formal, concise, old-school language. I also prefer dense and meaty textbooks with lots of diagrams, numbers, and formulas. In return, I can dispense with stick figures, pictograms, and even with Master Sensei (a guru character used in the book). I feel that a lot of the deeper and more complex issues of agile project management have simply been left out.

To be fair, it must be mentioned that I probably do not fall into the target group for which this book was written. It is more appropriate as an introductory text for people who are new to agile project management, or even new to the entire business of project management. Think "trial lesson" and "starter course".