Laments of a would-be Ubuntuist

I have been a Linux fan for more than a decade. I have used Linux in my own company and projects since 1996, and I was also one of the founding members of the Bangkok Linux User Group. Oddly, however, the computer on my desktop still runs Windows. It’s a glaring contradiction. I’ve wanted to replace Windows for years, but there has always been a reason not to, mainly because I need to test software under Windows for my customers. Last weekend, the XP installation on my laptop “forgot” my user account and with it all account data. Simultaneously, the file system started to behave strangely. “Ah, a sign from above,” I thought. “Finally the day has come; I will install Ubuntu on my laptop.” So I did. Ubuntu Desktop 9.04 installed with ease and, even more impressively, it recognised all of my Thinkpad hardware. Even the Wifi connection was up and running without any fiddling about.

I should have said “almost all” hardware. Unfortunately, one piece of hardware refused to cooperate with Linux, namely my Novatel USB modem. Since I’ve come to rely on 3G mobile Internet, this is a knockout criterion: no modem, no Internet. After hours of scouring the Web for possible solutions and trying out various settings, I gave up in frustration. There wasn’t anything I could do except zap the Linux partition and reinstall my old friend XP. To soften my disappointment, I will at least make it a dual-boot machine. Note to hardware vendors: please take Linux seriously and provide drivers for your nifty electronics. That would make life much easier. I guess I have to postpone my switch-over to Linux for another year. Hopefully I will be able to resist the urge to buy another piece of exotic hardware in the meantime.

Make WAR with Eclipse

No, it has nothing to do with armed conflict. Making WAR files is the Java way of packaging, distributing, and deploying web applications. While JAR stands for “Java archive”, WAR stands for “Web application archive”, or simply “Web archive”. In fact, JAR and WAR files are both ZIP archives that include a manifest. While a JAR file typically contains a collection of class files, a WAR file contains the entire content that goes into a Java web application. More precisely, a WAR file contains all the static content, directories, JSPs, beans and classes, libraries, as well as the web.xml deployment descriptor. If you unpack a WAR file, you get a directory structure that mirrors the document root of a deployed application in a web container, such as Tomcat. I recently had to create a web application in Eclipse. I realised that, despite having worked with Eclipse for five years, this was something I had never done before, because in the past I used Netbeans for creating web applications. But it’s just as easy in Eclipse. Here is how:

To create a Java web project, you need to have the following software installed: a Java JDK, a recent version of Eclipse that contains the Web Tools Platform (WTP) module, and a web container or application server, such as Tomcat, JBoss, WebSphere, etc.

1…Select File/New/Project from the menu. The following dialogue appears:

webapp-img01.png

2…Select Dynamic Web Project from the list and click on the Next button.
webapp-img02.png

3…Type a name for the new project and select a file system location for it. In the Target Runtime option, specify the web container or application server you are using. This server is used to build and deploy your web application. If the drop-down box does not contain the desired server, click New… and select one of the predefined configurations (see Step 4). If you have already defined a Target Runtime, you can skip ahead to Step 6. The Dynamic Web Module version option specifies the architecture you are going to use in the web project. Select the latest version for a new project; unfortunately, this cannot be changed later. By clicking the Modify… button in the Configuration section, you can select “facets” for your web application. What Eclipse calls “facets” are various building blocks and APIs, such as JavaServer Faces, the Java Persistence API, etc., that add functionality to your application.

webapp-img03.png

4…The New… button in the Target Runtime section opens a dialogue that lets you select the server on which the application is developed and deployed, which is probably the most important aspect of your configuration. Eclipse offers a number of common configurations for popular servers. If you cannot find your server in this list, click on the Download additional server adapters link and chances are that your server is listed. Make sure that the Create a new local server option is checked, so that you can find the server in the Eclipse server view later on.

webapp-img04.png

5…Once you have specified the server type, you need to provide some details about it, such as the server’s installation directory (the server root) and the JRE you want the server to run on. Click Finish when done.

webapp-img05.png

6…Finally, the dynamic web project wizard prompts you for some basic configuration data. The Context Root is the name that the web container maps to the location where the application is deployed; it also forms the root URL of the web application. The Content Directory specifies the name of the directory that contains the web application files. The Java Source Directory specifies the name of the directory that contains Java source code files. These settings are only relevant to the development machine. Make sure that the Generate deployment descriptor option is checked in order to create the web.xml file automatically. In most cases, you can probably accept the default settings and click Finish.

webapp-img06.png

7…Voilà. You have created a web application, or rather the framework for its development in Eclipse. The new project should now be visible in the Navigator view. There aren’t any files yet, except the ones that were generated automatically by Eclipse. The next step would be to write your web application, and possibly to incorporate the application framework of your choice. Piece of cake.

webapp-img07.png
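Once the project skeleton exists, a minimal servlet is enough for a first smoke test. The sketch below is only an illustration: the class name, URL pattern, and greeting are arbitrary examples, and it assumes the Servlet API is supplied by the target runtime you selected in Step 3. The servlet also needs a <servlet> and <servlet-mapping> entry in web.xml before the container will serve it.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal servlet for a first smoke test; class name and output are arbitrary.
// Declare it in web.xml with a <servlet> and <servlet-mapping> entry
// (for example under the URL pattern /hello) so the container can route requests to it.
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body><h1>Hello from the new web project!</h1></body></html>");
    }
}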

8…The Server view should display the server you have chosen for your project. If everything went OK, you can start and stop the server from this view. The server can be started in normal mode, debug mode, or profiling mode. You need to start it in debug mode if you want to set breakpoints in your Java code. While you edit sources, such as JSP files, servlets, bean classes, static content, etc., Eclipse automatically redeploys these resources to the running server as soon as you save them. You can view your web application in a separate browser window and receive debug output in Eclipse’s Console view.

webapp-img08.png

9…After you have written your formidable web application, it’s time to share it with the world, or in more technical terms, to distribute and deploy it. The process of creating a distributable WAR file is extremely simple. Select File/Export from the Eclipse menu and click on the WAR file option in the Web category.

webapp-img09.png

10…After clicking the Next button, specify the web project to be packaged, the file destination, and the target server. Although the latter is not a mandatory option, it is probably an important one. The selected server is likely to be the same as the one chosen in Step 3. Click Finish and there you have your masterpiece in a handy WAR format.
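Incidentally, since a WAR file is just a ZIP archive, you can peek inside the exported file outside of Eclipse as well. The following throwaway snippet is merely a sketch; the file name myapp.war is a placeholder for whatever you exported in Step 10.

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Lists the entries of an exported WAR file to show that it is a plain ZIP archive.
// "myapp.war" is a placeholder; substitute the name of the file you exported.
public class WarInspector {
    public static void main(String[] args) throws Exception {
        ZipFile war = new ZipFile("myapp.war");
        Enumeration<? extends ZipEntry> entries = war.entries();
        while (entries.hasMoreElements()) {
            // Typical entries: META-INF/MANIFEST.MF, WEB-INF/web.xml,
            // WEB-INF/classes/... and WEB-INF/lib/..., plus JSPs and static files.
            System.out.println(entries.nextElement().getName());
        }
        war.close();
    }
}

If the listing shows your classes under WEB-INF/classes and your libraries under WEB-INF/lib, the archive is ready to be dropped into any compatible web container.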

Grid Computing For A Cause

A few months ago I wondered what to do with the computing power of my new quad-core PC. It seemed that my daily compiler runs, virtual machines, and the occasional game session didn’t make full use of the machine’s capacity. The CPU meter rarely exceeds the 30% mark on these mundane tasks. It doesn’t even break a sweat when compressing MPEG data. In principle, this is a good thing, of course. Yet the thought that the CPU cores remain underutilised for most of their lifetime struck me as slightly wasteful. What to do with all that idle capacity? Well, I have found the answer to that question. The answer is BOINC.

BOINC stands for Berkeley Open Infrastructure for Network Computing, which is quite a mouthful, but the program’s purpose is easy to explain: it lets you donate computing resources to the research sector. With BOINC, your computer becomes part of a research network. From a list of research projects, you choose one or more to which you want to donate computing resources. The BOINC software downloads tasks from these projects, which are then executed on your machine. When the computing tasks are completed, the results are sent back to the project’s host computer. Downloading and uploading happen automatically via the Internet. The project host computer distributes tasks to hundreds or possibly thousands of PCs and coordinates all computing tasks.

This is fashionably called “grid computing”. In the case of BOINC, the grid is made up of the volunteers, or rather their computers, which are located all over the world. BOINC has more than half a million participants, who together contribute a whopping 900 to 1,000 teraflops from their desktops. This is more computing power than the world’s largest supercomputer, the IBM Blue Gene, currently offers. Unsurprisingly, this quasi-supercomputing platform is used for computationally intensive, or “number crunching”, tasks. The best thing about BOINC, however, is that it doesn’t take away CPU cycles from your applications. The BOINC computing tasks run as low-priority processes in the background and thus only use CPU cycles when no other program needs them. Hence, there is no noticeable performance decrease.

You might wonder at this point what the BOINC projects are about and why you should donate computing resources to them. There are plenty of projects with different aims and scopes, but it all began with one single project: SETI@home, where SETI stands for Search for Extraterrestrial Intelligence. The SETI@home project began in 1999. It is different from other SETI projects in that it relies almost exclusively on donated computing power. The software analyses data from the Arecibo radio telescope and tries to identify potential ETI signals. Although no such signal has been found yet, the project has been a big success and it still draws new volunteers. As one of the first volunteer-based grid computing projects, it has demonstrated that the approach is not only viable, but that its results generally exceed expectations. It has also given people a better understanding of some of the challenges that anonymous grid computing entails.

As mentioned, today there are many different research projects that make use of BOINC. The list has been growing since BOINC was released under the GPL in 2003. I am sure you will find many worthy causes among them. For example, in the medical sector, there is cancer and HIV research as well as malaria control and human genome research. The World Community Grid, which uses BOINC as one type of client software, specialises in research projects that benefit humanity directly. Then there is climateprediction.net, which tries to produce a forecast of the climate in the 21st century. There are a number of biology and bioinformatics projects, such as Rosetta@home, which develops computational methods to accurately predict and design proteins and protein complexes. This may ultimately help to find cures for diseases. Finally, there is a growing number of science projects, from quantum physics and astronomy to mathematics.

I have been running BOINC for a week, and my computer is happily plodding away at a constant 100% CPU load. The resource usage graph shows all four CPU cores at max. It doesn’t seem to affect the system negatively, although I have to say the computer does get noticeably hotter under this load. This definitely means higher energy consumption and thus a higher electricity bill. According to the BOINC Wiki at Berkeley, the increase in power consumption is around 50%. Admittedly, I was a bit concerned about overheating, because this is the hot season in Thailand and room temperature is often around 30 degrees Celsius. However, my computer has borne up bravely so far. To reduce the heat problem, BOINC allows you to throttle CPU usage to a certain percentage, say 70%, which results in a square-pulse resource usage graph. I might try that if it gets any hotter.

BOINC can be downloaded from the project website at http://boinc.berkeley.edu.

Sun acquires MySQL

It’s been on the news wire for two or three weeks already, but I just learned today that Sun is going to buy MySQL. My first thought was: “Oh, that’s great news.” Now MySQL can put a Sun logo on their product. That will finally allow them to enter the Fortune 500 stratosphere. Wow! MySQL really has come a long way. Who would have thought so in the late nineties? My kudos to Michael Widenius (Monty), the programmer who started this thing and who is currently serving as CTO at MySQL AB. I hope some of the one billion dollars that Sun is ready to pay will go to Monty. This would prove that you can actually get rich from giving away software. It would also prove that a company’s major assets are its people and its innovation rather than bricks and mortar. The execs at both MySQL AB and Sun seem to be quite upbeat about the deal (see Jonathan Schwartz’s blog, for example) and are generous with praise (who would be surprised?). I wonder what will happen to the Dolphin logo. It’s sort of cute – a bit like Sun’s Glassfish logo.

Choosing a content management system

If you are playing with the idea of using a content management system (CMS), or if your organisation has already decided to deploy a CMS, then you are facing an important but difficult decision. On the one hand, you know that a CMS is the best way to handle your ever-growing content. On the other hand, you are confronted with a bewildering variety of products that leaves you at a complete loss. To make things worse, you know that the choice of a CMS has far-reaching implications for business processes. Choosing a CMS is not an easy task, and it is imperative to choose wisely. Deploying an inappropriate product may thwart your project, and it may even be worse than deploying no CMS at all.

In the pioneer days of the Web, there was only one way of publishing information: coding it in HTML and uploading it. The extreme simplicity of this approach was offset by its laboriousness. Developing, updating, and maintaining a medium-scale website, say a hundred pages or more, required an insane number of developer hours, and to make things worse, these were insanely expensive. The software industry soon responded to the dilemma by offering WYSIWYG editors and HTML code generators. With these tools it was possible to design and author websites graphically without having to care about nitty-gritty coding details.

The more advanced editors offered design templates, code snippets, plug-ins, and readymade sequences. They could generate the required set of HTML, JavaScript, and graphic files at a mouse click. These files then had to be uploaded one by one. Although this method is more efficient than manual coding, it still has several drawbacks. Whenever something is changed, pages must be generated and uploaded again, which is time consuming. Sometimes a small change in the design template can mean that hundreds of files need to be replaced. Moreover, the uploaded content is static. This means that it cannot change according to defined parameters, such as user preferences, sort order, date, and so on. Hence, static pages offer limited possibilities for interactive features. This drawback is overcome by the concept of dynamic web pages.

Dynamic pages are generated at request time. A dynamic web page is not a sequence of HTML tags but an interpreted computer program (a script) that generates an HTML sequence according to predefined rules. This script is typically executed by a script language interpreter, which passes the resulting HTML sequence on to the web server. Dynamic web page scripting unfolds its full potential in combination with an information repository, such as a relational database system, which holds the actual text and media contents. HTML code and information are merged when a user requests a page, and the result changes depending on defined conditions. Today, almost all large websites are based on this principle.
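To make the principle concrete, here is a deliberately simplified sketch in plain Java; the data and method names are invented for illustration only. The point is that the HTML never exists as a finished file: it is assembled from repository data each time the page is requested.

import java.util.Arrays;
import java.util.List;

// Illustrates dynamic page generation: markup is assembled from data at request time
// instead of being stored as a finished HTML file.
public class DynamicPage {

    // Stand-in for an information repository; in a real system this would be
    // a database query rather than a hard-coded list.
    static List<String> fetchHeadlines() {
        return Arrays.asList("New product released", "Office closed on Friday");
    }

    static String renderPage(List<String> headlines) {
        StringBuilder html = new StringBuilder("<html><body><ul>");
        for (String headline : headlines) {
            html.append("<li>").append(headline).append("</li>");
        }
        return html.append("</ul></body></html>").toString();
    }

    public static void main(String[] args) {
        // Every "request" regenerates the markup, so a change in the data
        // shows up immediately without regenerating or re-uploading HTML files.
        System.out.println(renderPage(fetchHeadlines()));
    }
}

A real dynamic page would of course pull its data from a database and run inside a web server, but the mechanism is the same: change the data, and the next request already reflects it.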

The CMS principle

A content management system (CMS) is a computer program that facilitates the collaborative creation, storage, delivery, distribution, and maintenance of “content”, that is, documents, images, and other information. Typically the CMS is a web application and its content is distributed via the Internet or via a private intranet. A CMS exploits the principle of dynamic page generation and adds a further abstraction layer. It streamlines the process of website creation by automating page generation and by applying templates and predefined features to an entire website. This allows the webmaster to focus on actual content creation and management. A CMS either comes with special client software that allows webmasters to edit content and construct web pages, or it provides a web-based administrator interface for this purpose. The tasks of creating page layout, navigation, and scripts and of adding modules are left to the CMS. At the heart of every CMS is a database, usually a relational DBMS, which holds the information that constitutes the online content.

Types of CMS

Besides general-purpose CMS that facilitate ordinary website creation, there are a number of specialised CMS. For example, Wikis or Wikiwebs are CMS for the collaborative creation of knowledge bases, such as encyclopaedias, travel guides, directories, etc. These systems typically make it easy for anyone to change or add information. Publication CMS (PCMS) allow publishers to deliver massive amounts of content online. They are frequently used by media organisations and publishing houses to create web versions of their print media or broadcasts. Transactional CMS couple e-commerce functions with rich content. As in the case of amazon.com, they are used for applications that go beyond standard shopping cart functionality. Integrated CMS (ICMS) are systems that combine document management with content management. Frequently, the CMS part is an extension of a conventional document management application. Enterprise CMS (ECMS) are large applications that add a variety of specialised functions to the CMS core, such as document management, team collaboration, issue tracking, business process management, workflow management, customer relationship management, and so on.

It is also possible to define market segments by licensing cost. In this case, we can distinguish the following types:

  1. Free open-source CMS (no licensing cost). These products are typically quite simple and focus on general-purpose and publishing functionality. Portals and Wikis also belong to this category.
  2. Boxed solutions (up to USD 3,000). These products typically offer solutions that allow non-technical users to create and manage websites collaboratively.
  3. Midrange solutions (USD 3,001 to 30,000) commonly have a greatly extended set of functions in comparison to boxed solutions, although scope and philosophy may vary significantly. For example, there are web development platforms as well as powerful ICMS in this category.
  4. High-end solutions (USD 30,001 and up) are usually targeted at the enterprise market. Solutions in this class are often designed to handle massive amounts and types of documents and to automate business processes.
  5. Hosted solutions (for a monthly subscription fee) can be found in all of the three previous categories. Instead of a one-time licensing cost, there is a monthly fee.

The market is highly fragmented and there is a great variety of products in every segment. The largest segment is general-purpose CMS, with a multitude of proprietary and open-source, commercial and non-commercial solutions. The sheer number of products makes a comprehensive review practically impossible. It is vital to narrow down the selection of CMS by compiling a list of requirements beforehand. In particular, the requirements should specify what sort of content you wish to manage, which category of CMS you are likely to prefer, and what its key features and capabilities should be. For example, if you wish to maintain documents and web pages in multiple languages, it is important to look for software that supports this from the outset. Although many CMS can be adapted to handle multilingual content, they do this in different ways, and some may prove unsatisfactory to you.

CMS Selection Checklist

Checklists can be useful for determining product features. They can help to narrow down the number of products you might want to review more closely.

Commercial checklist

  • Availability
  • Price
  • Licensing model
  • Total cost of ownership

Technical checklist

  • Supported operating systems
  • Supported web servers
  • Supported browsers
  • Supported database systems
  • Required hardware
  • Programming language
  • System architecture

Functionality checklist

  • Content organisation model (hierarchic/segmented, centralised/decentralised, etc.)
  • Content generation features (editors, spell checkers, etc.)
  • Content attributes (author, publication date, expiry date, etc.)
  • Content delivery (presentation, layout, visualisation, etc.)
  • Content management (moving, deleting, archiving, etc.)
  • Content versioning (multilingual, multiple versions)
  • Media management (images, animations, audio, etc.)
  • Link management (automatic navigation, link consistency checks, etc.)
  • User management (authentication, security, granularity of access privileges, etc.)
  • Template management (design, installation, maintenance)
  • Features for searching and browsing content
  • Special features (email forms, feedback lists, discussion boards, etc.)
  • Extensibility (plug-ins, add-ons, third party modules, etc.)

Integration checklist

  • Integration with external text editors
  • Integration with external image and media editors
  • Integration with external data
  • Integration with static website content
  • Integration with legacy systems

Helpful websites

There are a number of websites that offer CMS comparisons, descriptions, tests, and reviews. These may be helpful in the second phase of selection: after requirements have been gathered and desired key features have been defined, these websites can help you identify concrete products for closer review.

  • www.opensourcecms.com
  • www.cmsmatrix.org
  • www.cmsjournal.com
  • www.cmswatch.com
  • www.contentmanager.net

The final step in CMS selection is to review and evaluate concrete products. This step may be fairly labour-intensive. Vendors must be invited. A trial version of the product must be obtained. It must be installed and configured properly. Its basic functions and features must be learned. Test data must be entered. Meetings and group reviews must be scheduled and held. The whole process may have to be repeated with a number of different products. This may sound off-putting, but the do-it-yourself approach is really the only way to ensure that you get the right product.

Management involvement

As always, management involvement is crucial. The decision-making process cannot be completely delegated to IT, because in the end, the job of the CMS is to automate a business function, not an IT function. Depending on the nature of your content, it may be a marketing function, an R&D function, a human relations function, or even a production function, as in the case of publishing houses. Depending on how you use the CMS, it may also have a large impact on organisational communication. Therefore, management should be involved in phases one and three of the selection process. At the very least, management should review and approve the requirements specification and join the final review meetings. Often it is important to get an idea of the “look and feel” of a product beforehand.

After the acquisition

Once the chosen CMS is acquired and properly installed, people may create and publish content as they wish and live happily ever after. Well, not quite. If users are happy with the system, there may be quick and uncontrolled growth of content. If they aren’t, the system may gather dust and the electronic catalogues may remain empty. The usual approach to regulating this is to put a content manager in charge of the system. The role of the content manager is different from that of a traditional webmaster. While a webmaster needs to be very tech-savvy, a content manager merely needs to be computer literate. The content manager’s main responsibility is content editing and organisation; hence, the role is essentially that of an editor and librarian.

Long term perspectives

Proprietary content management systems are currently expensive, especially in the enterprise (ECM) segment. The overall market will remain fragmented in the medium term. In the long term, however, the CMS market is likely to be commoditised, which means free open-source systems are likely to dominate the market. Open-source products are currently encroaching on the “boxed solution” and “midrange” markets. There are even a number of powerful open-source CMS with a web delivery focus, such as TYPO3, which are comparable to proprietary high-performance products. As open-source solutions become more powerful, this trend is likely to continue. Extensibility, a large user base, and commercial support will be crucial for a system to assume a market leader position. At this moment, however, there are no clear candidates in sight.