Grid Computing For A Cause

A few months ago I wondered what to do with the computing power of my new quad-core PC. My daily compiler runs, virtual machines, and the occasional game session don’t make full use of this machine’s capacity. The CPU meter rarely exceeds the 30% mark on these mundane tasks. It doesn’t even break a sweat when compressing MPEG data. In principle, this is a good thing, of course. Yet the thought that the CPU cores remain underutilised for most of their lifetime struck me as slightly wasteful. What to do with them? Well, I have found the answer to that question. The answer is BOINC.

BOINC stands for Berkeley Open Infrastructure for Network Computing, which is quite a mouthful, but the program’s purpose is easy to explain: it lets you donate computing resources to the research sector. With BOINC your computer becomes part of a research network. You can choose one or more research projects from a list to which you want to donate computing resources. The BOINC software downloads tasks from these projects, which are then executed on your machine. When the computing tasks are completed, the results are sent back to the project’s host computer. Downloading and uploading happens automatically via the Internet. The project host computer distributes tasks to hundreds or possibly thousands of PCs and coordinates all computing tasks.
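To make the cycle concrete, here is a minimal Python sketch of the fetch-compute-report loop described above. It is a conceptual outline only, not the real BOINC client: fetch_work_unit(), crunch(), and report_result() are invented stand-ins for the client’s scheduler requests and the project’s science application.

```python
def fetch_work_unit():
    # The real client downloads input files and an application binary from the
    # project server over HTTP; here we just fabricate a toy work unit.
    return {"id": 42, "data": range(1, 1_000_001)}

def crunch(work_unit):
    # Stand-in for the project's science application (the "number crunching").
    return sum(work_unit["data"])

def report_result(work_unit, result):
    # The real client uploads output files and reports completion to the
    # project's scheduler; here we just print the outcome.
    print(f"work unit {work_unit['id']} finished, result = {result}")

if __name__ == "__main__":
    wu = fetch_work_unit()
    report_result(wu, crunch(wu))
```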

This is fashionably called “grid computing”. In BOINC’s case, the grid is made up of the volunteers, or rather their computers, which are located all over the world. BOINC has more than half a million participants who together contribute a whopping 900 to 1,000 teraflops from their desktops. This is more computing power than the world’s largest supercomputer, the IBM Blue Gene, currently offers. Unsurprisingly, this quasi-supercomputing platform is used for computationally intensive, or “number crunching”, tasks. The best thing about BOINC, however, is that it doesn’t take CPU cycles away from your applications. The BOINC computing tasks run as low-priority processes in the background and thus only use CPU cycles when no other program needs them. Hence, there is no noticeable performance decrease.
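The “low priority” part is ordinary operating-system scheduling rather than anything BOINC-specific. The following sketch assumes a Unix-like system (Windows uses priority classes instead) and shows the general idea: a process lowers its own “nice” level before crunching, so the scheduler only hands it CPU time that foreground programs leave idle.

```python
import os

# POSIX-only sketch: drop this process to the weakest scheduling priority
# (nice 19) before it starts crunching, so it only consumes otherwise idle
# CPU cycles. BOINC achieves a similar effect for its worker processes.
os.nice(19)

total = sum(i * i for i in range(5_000_000))  # stand-in for number crunching
print("done:", total)
```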

You might wonder at this point what the BOINC projects are about and why you should donate computing resources to them. There are plenty of projects with different aims and scopes, but it all began with a single project: SETI@home, where SETI stands for Search for Extraterrestrial Intelligence. The SETI@home project began in 1999. It differs from other SETI projects in that it relies almost exclusively on donated computing power. The software analyses data from the Arecibo radio telescope and tries to identify potential ETI signals. Although no such signals have been found yet, the project has been a big success and it still draws new volunteers. As one of the first volunteer-based grid computing projects, it has demonstrated that the approach is not only viable but that the results generally exceed expectations. It has also given people a better understanding of some of the challenges that anonymous grid computing entails.

As mentioned, today there are many different research projects that make use of BOINC. The list has been growing since BOINC was released under the GPL in 2003. I am sure you will find many worthy causes among them. For example, in the medical sector there is cancer and HIV research as well as malaria control and human genome research. The World Community Grid, which uses BOINC as one type of client software, specialises in research projects that benefit humanity directly. Then there is climateprediction.net, which tries to produce a forecast of the climate in the 21st century. There are a number of biology and bioinformatics projects, such as Rosetta@home, which develops computational methods to accurately predict and design proteins and protein complexes. This may ultimately help to find cures for diseases. Finally, there is a growing number of science projects ranging from quantum physics and astronomy to mathematics.

I have been running BOINC for a week now, and my computer is happily plodding away at a constant 100% CPU load. The resource usage graph shows all four CPU cores at maximum. It doesn’t seem to affect the system negatively, although I have to say the computer does get noticeably hotter under this load. This definitely means higher energy consumption and thus a higher electricity bill. According to the BOINC Wiki at Berkeley, the increase in power consumption is around 50%. Admittedly, I was a bit concerned about overheating, because this is the hot season in Thailand and room temperature is often around 30 degrees Celsius. However, my computer has borne up bravely so far. To reduce the heat problem, BOINC allows you to throttle CPU usage to a certain percentage, say 70%, which results in a square-pulse resource usage graph. I might try that if it gets any hotter.
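That square pulse is the visible effect of simple duty-cycle throttling: the client works for a fixed fraction of each short interval and sleeps for the rest. Below is a single-threaded toy sketch of the idea, not BOINC’s actual throttling code; the parameter names are my own.

```python
import time

def throttled_crunch(cpu_fraction=0.7, period=1.0, cycles=5):
    """Busy-work for cpu_fraction of each period, then sleep for the rest.
    The resulting on/off alternation draws the square-pulse CPU graph."""
    for _ in range(cycles):
        busy_until = time.monotonic() + cpu_fraction * period
        while time.monotonic() < busy_until:
            pass  # stand-in for a slice of number crunching
        time.sleep((1.0 - cpu_fraction) * period)

throttled_crunch()
```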

BOINC can be downloaded from boinc.berkeley.edu.

Sun acquires MySQL

It’s been on the news wire for two or three weeks already, but I just learned today that Sun is going to buy MySQL. My first thought was: “Oh, that’s great news.” Now MySQL can put a Sun logo on their product, which will finally allow them to enter the Fortune 500 stratosphere. Wow! MySQL has really come a long way. Who would have thought it in the late nineties? My kudos to Michael Widenius (Monty), the programmer who started this thing and who currently serves as CTO at MySQL AB. I hope some of the one billion dollars Sun is ready to pay will go to Monty. This would prove that you can actually get rich from giving away software. It would also prove that a company’s major assets are its people and its innovation rather than bricks and mortar. The execs at both MySQL AB and Sun seem to be quite upbeat about the deal (see Jonathan Schwartz’s blog, for example) and are generous with praise (who would be surprised?). I wonder what will happen to the dolphin logo. It’s sort of cute – a bit like Sun’s GlassFish logo.

Choosing a content management system

If you are playing with the idea of using a content management system (CMS), or if your organisation has already decided to deploy one, then you are facing an important but difficult decision. On the one hand, you know that a CMS is the best way to handle your ever-growing content. On the other hand, you are confronted with a bewildering variety of products that leaves you at a complete loss. To make things worse, you know that the choice of a CMS has far-reaching implications for business processes. Choosing a CMS is not an easy task, and it is imperative to choose wisely. Deploying an inappropriate product may thwart your project, and it may even be worse than deploying no CMS at all.

In the pioneer days of the Web, there was only one way of publishing information: coding it in HTML and uploading it. The extreme simplicity of this approach was offset by its laboriousness. Developing, updating, and maintaining a medium-scale website, say a hundred pages or more, required an insane number of developer hours, and to make things worse, those hours were insanely expensive. The software industry soon responded to this dilemma by offering WYSIWYG editors and HTML code generators. With these tools it was possible to design and author websites graphically without having to care about the nitty-gritty coding details.

The more advanced editors offered design templates, code snippets, plug-ins, and ready-made page elements. They could generate the required set of HTML, JavaScript, and graphics files at the click of a mouse. These files then had to be uploaded one by one. Although this method is more efficient than manual coding, it still has several drawbacks. Whenever something is changed, pages must be generated and uploaded again, which is time-consuming. Sometimes a small change in the design template means that hundreds of files need to be replaced. Moreover, the uploaded content is static: it cannot change according to defined parameters, such as user preferences, sort order, date, and so on. Hence, static pages offer limited possibilities for interactive features. This drawback is overcome by the concept of dynamic web pages.

Dynamic pages are generated at request time. A dynamic web page is not a sequence of HTML tags but an interpreted computer program (a script) that generates an HTML sequence according to predefined rules. This script is typically executed by a script language interpreter, which passes the resulting HTML on to the web server. Dynamic web page scripting unfolds its full potential in combination with an information repository, such as a relational database system, which holds the actual text and media content. HTML code and information are merged when a user requests a page, and the result changes depending on defined conditions. Today, almost all large websites are based on this principle.
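As a toy illustration of the principle (a sketch using only Python’s standard library, not any particular web framework or CMS), the “page” below is a small program that merges content from a database into an HTML template at request time:

```python
import sqlite3
from string import Template

# The HTML skeleton; the actual text lives in the database, not in the page.
PAGE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (slug TEXT, title TEXT, body TEXT)")
db.execute("INSERT INTO articles VALUES ('hello', 'Hello', 'Generated on request.')")

def render(slug):
    # Fetch the stored content and merge it into the template when requested.
    title, body = db.execute(
        "SELECT title, body FROM articles WHERE slug = ?", (slug,)
    ).fetchone()
    return PAGE.substitute(title=title, body=body)

print(render("hello"))
```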

The CMS principle

A content management system (CMS) is a computer program that facilitates the collaborative creation, storage, delivery, distribution, and maintenance of “content”, that is, documents, images, and other information. Typically the CMS is a web application whose content is distributed via the Internet or a private intranet. A CMS exploits the principle of dynamic page generation and adds a further abstraction layer. It streamlines the process of website creation by automating page generation and by applying templates and predefined features to an entire website. This allows the webmaster to focus on actual content creation and management. A CMS either comes with special client software that allows webmasters to edit content and construct web pages, or it provides a web-based administrator interface for this purpose. The tasks of creating page layouts, navigation, and scripts, and of adding modules, are left to the CMS. At the heart of every CMS is a database, usually a relational DBMS, which holds the information that constitutes the online content.
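To make that extra abstraction layer concrete, here is another small sketch (again a toy with invented table and template names, not any real CMS): one site-wide layout and an automatically generated navigation bar are applied to every record in the content repository, so adding a page is just adding a row.

```python
import sqlite3
from string import Template

# One site-wide layout applied to all pages; navigation is generated, not hand-coded.
LAYOUT = Template(
    "<html><body><nav>$nav</nav><article><h1>$title</h1>$body</article></body></html>"
)

repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE pages (slug TEXT, title TEXT, body TEXT)")
repo.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("home", "Home", "<p>Welcome.</p>"), ("about", "About", "<p>Who we are.</p>")],
)

def build_site():
    pages = repo.execute("SELECT slug, title, body FROM pages").fetchall()
    nav = " | ".join(f'<a href="/{slug}">{title}</a>' for slug, title, _ in pages)
    return {slug: LAYOUT.substitute(nav=nav, title=title, body=body)
            for slug, title, body in pages}

for slug, html in build_site().items():
    print(slug, "=>", html)
```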

Types of CMS

Besides general-purpose CMS that facilitate general website creation, there are a number of specialised CMS. For example, wikis (or wikiwebs) are CMS for the collaborative creation of knowledge bases, such as encyclopaedias, travel guides, directories, etc. These systems typically make it easy for anyone to change or add information. Publication CMS (PCMS) allow publishers to deliver massive amounts of content online. They are frequently used by media organisations and publishing houses to create web versions of their print media or broadcasts. Transactional CMS couple e-commerce functions with rich content. As in the case of amazon.com, they are used for applications that go beyond standard shopping cart functionality. Integrated CMS (ICMS) are systems that combine document management with content management. Frequently, the CMS part is an extension of a conventional document management application. Enterprise CMS (ECMS) are large applications that add a variety of specialised functions to the CMS core, such as document management, team collaboration, issue tracking, business process management, workflow management, customer relationship management, and so on.

It is also possible to define market segments by licensing cost. In this case, we can distinguish the following types:

  1. Free open-source CMS (no licensing cost). These products are typically quite simple and focus on general-purpose and publishing functionality. Portals and wikis also belong to this category.
  2. Boxed solutions (up to $3,000 USD). These products typically allow non-technical users to create and manage websites collaboratively.
  3. Midrange solutions ($3,001 to $30,000 USD). These commonly have a greatly extended set of functions in comparison to boxed solutions, although scope and philosophy may vary significantly. For example, there are web development platforms as well as powerful ICMS in this category.
  4. High-end solutions ($30,001 USD and up). These are usually targeted at the enterprise market. Solutions in this class are often designed to handle massive amounts and types of documents and to automate business processes.
  5. Hosted solutions (for a monthly subscription fee). These can be found in all three of the previous paid categories. Instead of a one-time license cost, there is a monthly fee.

The market is highly fragmented and there is a great variety of products in every segment. The largest segment is general-purpose CMS, with a multitude of proprietary and open-source, commercial and non-commercial solutions. The sheer number of products makes a comprehensive review practically impossible. It is vital to narrow down the selection by compiling a list of requirements beforehand. In particular, the requirements should specify what sort of content you wish to manage, which category of CMS you are likely to prefer, and what its key features and capabilities should be. For example, if you wish to maintain documents and web pages in multiple languages, it is important to look for software that supports this from the outset. Although many CMS can be adapted to handle multilingual content, they do this in different ways, and some of them may not be satisfactory for your purposes.

CMS Selection Checklist

Checklists can be a useful way to compare product features. They can help to narrow down the number of products you might want to review more closely.

Commercial checklist

  • Availability
  • Price
  • Licensing model
  • Total cost of ownership

Technical checklist

  • Supported operating systems
  • Supported web servers
  • Supported browsers
  • Supported database systems
  • Required hardware
  • Programming language
  • System architecture

Functionality checklist

  • Content organisation model (hierarchic/segmented, centralised/decentralised, etc.)
  • Content generation features (editors, spell checkers, etc.)
  • Content attributes (author, publication date, expiry date, etc.)
  • Content delivery (presentation, layout, visualisation, etc.)
  • Content management (moving, deleting, archiving, etc.)
  • Content versioning (multilingual, multiple versions)
  • Media management (images, animations, audio, etc.)
  • Link management (automatic navigation, link consistency checks, etc.)
  • User management (authentication, security, granularity of access privileges, etc.)
  • Template management (design, installation, maintenance)
  • Features for searching and browsing content
  • Special features (email forms, feedback lists, discussion boards, etc.)
  • Extensibility (plug-ins, add-ons, third party modules, etc.)

Integration checklist

  • Integration with external text editors
  • Integration with external image and media editors
  • Integration with external data
  • Integration with static website content
  • Integration with legacy systems

Helpful websites

There are a number of websites that offer CMS comparisons, descriptions, tests, and reviews. These may be helpful in the second phase of selection: once the requirements have been gathered and the desired key features have been defined, these websites can help you identify concrete products for closer review.

  • www.opensourcecms.com
  • www.cmsmatrix.org
  • www.cmsjournal.com
  • www.cmswatch.com
  • www.contentmanager.net

The final step in CMS selection is to review and evaluate concrete products. This step may be fairly labour-intensive. Vendors must be invited. A trial version of the product must be obtained. It must be installed and configured properly. Its basic functions and features must be learned. Test data must be entered. Meetings and group reviews must be scheduled and held. The whole process may have to be repeated with a number of different products. This may sound off-putting, but the do-it-yourself approach is really the only way to ensure that you get the right product.

Management involvement

As always, management involvement is crucial. The decision-making process cannot be delegated entirely to IT, because in the end the job of the CMS is to automate a business function, not an IT function. Depending on the nature of your content, it may be a marketing function, an R&D function, a human relations function, or even a production function, as in the case of publishing houses. Depending on how you use the CMS, it may also have a large impact on organisational communication. Therefore, management should be involved in phases one and three of the selection process. At the very least, management should review and approve the requirements specification and join the final review meetings. Often it is important to get an idea of the “look and feel” of a product beforehand.

After the acquisition

Once the chosen CMS is acquired and properly installed, people may create and publish content as they wish and live happily ever after. Well, not quite. If users are happy with the system, there may be a quick and uncontrolled growth of content. If they aren’t, the system may gather dust and the electronic catalogues may remain empty. The usual approach to regulate this is to put a content manager in charge of the system. The role of the content manager is different from that of a traditional webmaster. While a webmaster needs to be very tech-savvy, a content manager merely needs to be computer literate. The main responsibility is content editing and organisation. Hence, the role of a typical content manager is that of an editor and librarian.

Long term perspectives

Proprietary content management systems are currently expensive, especially in the enterprise (ECMS) segment. The overall market will remain fragmented in the medium term. In the long term, however, the CMS market is likely to become commoditised, which means that free open-source systems are likely to dominate it. Open-source products are already encroaching on the “boxed solution” and “midrange” markets. There are even a number of powerful open-source CMS with a web delivery focus, such as TYPO3, that are comparable to proprietary high-performance products. As open-source solutions become more powerful, this trend is likely to continue. Extensibility, a large user base, and commercial support will be crucial for a system to assume a market-leading position. At the moment, however, there are no clear candidates in sight.

Freebie of the Month: PSPad

A good plain text editor is the Swiss army knife of every programmer. Unfortunately, the Windows operating system offers only the “Notepad” program in this category, which is the equivalent of a $1.50 plastic knife. If you want to do more than open the occasional README.TXT, then Notepad is definitely underpowered. This situation has created a market for commercial text editors such as UltraEdit, CodeWright, EditPlus, and others. These are excellent products; however, they are not free. In the open-source arena there are well-known editors such as GNU Emacs and Vim, which evolved on the Unix platform. These editors are very powerful, but they are quirky and not exactly easy to learn and use. Why put up with a learning curve when more user-friendly products are available? A multitude of freeware text editors with varying features is available for the Windows platform.

When I searched the Internet for a freeware editor, I was looking for raw power, speed, and features. In that order. The PSPad editor, written by the Czech author Jan Fiala, fits the bill perfectly. First of all, it is fast. Even on a modest Pentium IV computer, it starts up in less than two seconds. This is an important characteristic, since a text editor may get loaded dozens of times in succession for viewing or changing different files. It also makes PSPad convenient to use when I don’t want to fire up a “heavy duty” IDE such as Eclipse.

PSPad’s look is neat and functional. It presents itself with customisable toolbars, tabbed editor windows, and a logically structured menu. Text windows can also be floated or tiled. PSPad’s feature set can compete with commercial high-end products. It includes syntax highlighting for dozens of programming languages, automatic backups, macros, a hex edit mode, integrated diff comparisons, pluggable text converters, a customisable shortcut key map, a spell checker, support for Windows, Unix, and Mac line endings, support for different character sets, and HTML formatting and validation through Tidy. This makes it ideal for editing a wide variety of file types, from C++ source files to HTML pages, SQL statements, XML files, and shell scripts.

One feature I really liked is the multi-language code explorer, a feature otherwise only found in high-end IDEs. The code explorer seems to be capable of displaying almost anything, from the DOM tree of an HTML document to a PHP or Java class. However, the most important aspect of a text editor for me is a powerful search and replace capability. In this area, PSPad once again delivers. It supports Perl-compatible regular expressions for search and replace operations, which is a make-or-break criterion for automated text processing. It also supports search and replace in multiple files, even recursively in subdirectories, which again is great for automated processing. The only limitation is that it cannot do both at the same time: it processes either regular expressions or multiple files, but not both. I am not sure why this limitation exists. Without it, PSPad would be pretty close to perfection.
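For the cases where you need both at once, a small script can fill the gap. The helper below is my own hypothetical stand-in, not part of PSPad: it applies a regular-expression replacement recursively to every matching file under a directory.

```python
import re
from pathlib import Path

def regex_replace_tree(root, glob_pattern, regex, replacement):
    """Apply a regex replacement to every file matching glob_pattern under root."""
    pattern = re.compile(regex)
    for path in Path(root).rglob(glob_pattern):
        text = path.read_text(encoding="utf-8")
        new_text, count = pattern.subn(replacement, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            print(f"{path}: {count} replacement(s)")

# Example (hypothetical paths): normalise a tag attribute in all HTML files below ./site
# regex_replace_tree("site", "*.html", r'align="center"', 'style="text-align:center"')
```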

Open source on the rise, says IDC

According to a recent IDC survey based on over 5,000 developer interviews in 116 countries, open source is gaining momentum. The phenomenon extends well beyond the traditional Linux user groups and computer hobbyists. IDC comes to the conclusion that open source software ought to be viewed as the most significant, all-encompassing, and long-term trend the software industry has seen since the early 1980s.

At present, open source products are used in three quarters of all organisations, and several hundred thousand open source projects are under development. IDC says that the pervasive influence of open source will ultimately impact the software industry on a large scale and fundamentally change the value proposition of packaged software for customers. Open source products are already beginning to play a prominent role in the life cycle of major software categories.

IDC’s research indicates that open source software is currently deployed by 71% of developers worldwide, and 50% state that the use of open source products in their organisations is growing. Finally, 54% of the surveyed organisations are themselves currently working on some type of open source product.

The study offers additional insights into the proliferation of open source software:

  • Over the coming decade, open source will capture a low double-digit percentage of the software market and elicit fierce price competition

  • The effect of open source on the software life-cycle and on software innovation will outweigh the importance of price effects in the market

  • Three different business models will be vital for vendor success in the software industry: the software revenue model, the public collective model, and the service broker model

  • Core competencies other than traditional software production and marketing will determine vendor success in markets dominated by open source software

Dr. Anthony Picardi, senior vice president of Global Software Research at IDC, explains: “Although open source will significantly reduce the industry opportunity over the next ten years, the real impact of open source is to sustain innovations in mature software markets, thus extending the useful life of software assets and saving customers money.”

Picardi concluded that as “business requirements shift from acquiring new customers to sustaining existing ones, the competitive landscape will move towards cost savings and serving up sustaining innovations to savvy customers, along with providing mainstream software to new market segments that are willing to pay only a fraction of conventional software license fees. Open source software is ultimately a resource for sustaining innovators.”