Naming Conventions (3)

2009-06-14

Today I am going to talk about best practices in identifier naming. Before thinking about concrete strategies for devising identifier names, one should answer the following three questions: 1. What (human) language to use. 2. Whether to use a naming scheme or not. 3. How platform and language choices affect identifier naming. While I generally don’t like to use naming schemes for the reasons mentioned in the last article, it’s almost always a good idea to adopt existing conventions for the specific platform, language and domain one is working in. Certain languages have established strong conventions about how identifiers are formed. For example, nothing keeps you from assigning lower-case names to classes in Java, but it surely raises eyebrows among Java programmers. Likewise, Java enthusiasts would probably look down their noses at code that uses underscores instead of lower CamelCase.

Other programming languages, like PHP or Perl, have less strict conventions for identifier names. Whatever the case may be, it’s invariably useful to honour widely used conventions and apply them consistently, as it avoids misunderstandings and thus reduces the chance for errors. Apart from consistency and conventions, the most important point to get right is semantic precision. The identifier name should be stated as clearly and as intelligible as possible, without being overly wordy. Ambiguity is the greatest enemy here, especially when there are several identifiers floating around that have similar, yet subtly different meanings. Consider the following example:

for (Event event: events) {
  trainees += event.trainees.size();
  if (event.status == 1 && event.end >= time) {
    for (Person train : event.trainees)
      if (train.passed) {
        trainee.add(train);
        trained++;
    }
    else training.add(train);
  }
  else if (event.status == 0 || event.end < time) {
    training.addAll(event.trainees);
}

This code looks convoluted because of the overuse of the word “train”. What it does is this: it iterates over all workshop events and creates a collection of trainees that have successfully completed a workshop and another of trainees that did not complete a workshop or are still being trained. It also counts the total number of trainees and the number of successful ones. Apart from the ambiguous “train…”, readability suffers because of non-descriptive variable names and the use of constant integer literals. Here is a more readable version:

for (Event workshop: allWorkshops) {
  numOfTrainees += workshop.trainees.size();
  if (workshop.status == COMPLETED && workshop.end >= today) {
    for (Person trainee : workshop.trainees)
      if (trainee.passed) {
        graduates.add(trainee);
        numOfGraduates++;
    }
    else peopleInTraining.add(trainee);
  }
  else if (workshop.status == CANCELLED || workshop.end < today) {
    peopleInTraining.addAll(event.trainees);
}

This brief example shows how to transform an ambiguous piece of code into one that can be read and understood quite easily. The question is: what are the underlying principles? Unfortunately, the answer to this question is complicated. Rather than attempting an analysis, I suggest to look at existing best practices that are widely adopted. Steven McConnel describes such a set of practices in his acclaimed book Code Complete (2nd Edition) in Chapter 11, The Power Of Variable Names. He writes: “An effective technique for coming up with a good name is to state in words what the variable represents. Often that statement itself is the best variable name. It’s easy to read because it doesn’t contain cryptic abbreviations, and it’s unambiguous.”

In the remainder of this article, I will try to present a summary of Steve McConnell’s set of best practices. Describe the “what”, not the “how”. A good identifier name relates to the problem domain rather than to the solution approach or algorithm. For example, the name tmpStateArray might hold an array of communication states of different chat users in a chat application. The name is clearly computerish and algorithm-oriented; a better name would probably be chatStates or communicationStates. Likewise, names such as bitFlag, statusFlag, optionString, etc. should be avoided unless it is totally clear what they mean. Aim for optimum identifier length. Identifier length is important. It is almost always a compromise between conveying sufficient meaning and maintaining readability; in other words, it’s always a compromise between ambiguity and length.

Consider the number of graduated trainees in the sales department. The name numberOfGraduatedTraineesInSalesDepartment is most unambiguous, but rather unwieldy. The names trainees, salesTrainees, gradSalTrn on the other hand, are more convenient but too short and too ambiguous. A reasonable compromise would be numGradTraineesInSales.

Qualify computed values. - If a variable is used for a computed numeric value, such as totals, averages, etc. qualify the variable name accordingly. Examples are: num…, count…, sum…, average…, max…, min… and so on. Use common opposite names for variables that express the boundaries of an interval, or an operation that involves opposites, such as begin/end, first/last, min/max, old/new, lowest/highest, next/previous, source/target, sender/recipient and so on.

Use letters i, j, k for loop index variables and iterators. - This convention is probably as old as C-programming and it is a good practice, since there is no point in inventing fancy names for loop index variables. As always, there is an exception to the rule. If the index is used outside the loop, for example to count records, then a more descriptive name such as recordCount makes the reference outside the loop more readable.

Use is… or has… prefixes for boolean variables and methods that return boolean values. For example, the variable name isCustomer expresses boolean semantics much clearer than just customer. An exception to this rule are words that clearly express a yes/no status, such as ready, done, failed, found, etc. In an object-oriented language, use nouns for classes, objects and interfaces. Use verbs for class methods and functions. Sometimes, it is also meaningful to use adjectives for interfaces or methods, for example Quadratic, Sharable, isQuadratic(), or isShared().

Avoid scope prefixes unless your language forces you to do so. For example, it was customary in the past to prefix class names or functions with module names, for example: Payroll_EmployeeRecord, Payroll_Allowances, Payroll_Reimbursements, and so on. This practice battles namespace pollution, but it is a very poor replacement for actual namespaces, because it leads to incredibly long and cluttered identifiers. Use this only if your language does not support the notions of modules, packages, or namespaces.

Avoid shadowing. If you use identifiers in nested scopes, don’t use the same names to avoid shadowing. For example, if you declare a local variable with the same name as a field/member of the class, then the former shadows the latter. This is not a neat programming trick, but a recipe for disaster. It is quite confusing to read, and even more confusing to debug. An exception to this rule would be getters and setters, where the field/member is explicitly qualified with ‘this’.

Avoid reusing variables, unless the semantics are maintained. Imagine you have a variable named count which is used in one loop to count records, and in another to count errors. It may be tempting to reuse the integer already declared, but it is far more readable and maintainable to use two separate integer variables appropriately named recordCount and errorCount.

Make abbreviations readable and unambiguous. For example, the abbreviated prefix rnd is often used to denote a random value, but it could also mean rounded. If it isn’t clear from the context, write it out; for instance, write randomAmount instead of rndAmount. As a general rule, source code is read more often than it is written, thus readability is more important than writing convenience. With this in mind, if an abbreviation just saves one or two characters, always write out the word. Create abbreviations that can be pronounced. For example, use stringPos rather than stringPstn and recCount rather than rcrdCount. It should be possible to read program code aloud without twisting your tongue. Apart from the advantage of being easier to communicate, pronounceable identifiers are also easier to remember.

best practices