Coding interviews – rad or fad?

As a developer, you have probably gone through one or more coding interviews. When applying for a developer job, you are expected to demonstrate some level of programming chops by solving algorithmic problems on a whiteboard or in an online code editor. This practice is not new. My first coding interview took place more than 25 years ago, sitting at a lovely green text terminal. However, it has recently proliferated and become an integral and often time-consuming part of the interview process. Big American tech companies like Amazon, Apple, Facebook, Google and Microsoft in particular are pushing the envelope. To land a developer job at one of these companies, you are now expected to be able to recite the finer points of an algorithms textbook and produce solutions with O(log n) time complexity in a trice.

Does that sound just slightly ridiculous to you? Perhaps it is. The question one must ask is whether having candidates produce algorithms is actually a good indicator of success in the software development profession. I have sat on both sides of the interview table many times, and I am quite ready to doubt this implication. There isn’t much empirical evidence to support the thesis that people who score high on coding interviews also do well in their jobs. Actually, there is none. Even Google’s own study concluded that interview scores don’t correlate with job performance. Gayle Laakmann McDowell, the author of the book “Cracking the Coding Interview”, says: “No one really knows, but it’s very reasonable to assume that there is a link,” and “Absence of evidence isn’t evidence of absence.”

Sorry, that’s not good enough for me. Let’s be a little more scientific.

Perhaps we should begin by determining what is actually measured by solving an algorithmic problem. The coding interview is akin to a school exam. It measures how good you are at applying textbook solutions to a given task. Just as in an exam situation, you can improve your score by studying hard. The domain is broad enough to provide sufficient variation across different fields, from number theory and graph theory to combinatorics and set theory. Yet, algorithms themselves are very narrow, as each deals with a specific solution to a specific problem. A successful coding interview demonstrates three things: 1. the candidate can write code, 2. the candidate has a grasp of algorithms and can memorise textbook solutions, 3. the candidate can reproduce and explain these solutions.

At first glance, this seems useful. These traits are surely expedient in typical development work; they’re kind of expected from developers. But does the coding interview also measure how pronounced these traits are in a particular applicant? Can standardised interviews establish a framework for comparison? The answer is sadly no. One applicant might receive a familiar problem that he can solve and pass with flying colours. Another, more talented applicant might never have encountered the same problem before and fail to solve it. A third applicant may get bogged down by anxiety. Yet another applicant may solve the problem, but produce bugs because of an unfamiliar programming language. The result is influenced by many factors and is notoriously unreliable.

This is not the only problem. There is a more serious one: the coding interview only probes a very narrow set of skills. Being able to produce algorithms is not sufficient to become a successful developer. For example, a developer also has to read and interpret requirements, communicate within a team and with business people, have knowledge of technologies and best practices, and perhaps also have an understanding of certain industries. In this light, being able to memorise textbook algorithms seems somewhat less relevant. After all, you can always consult a manual when in need of an optimised algorithm for a specific problem. This is probably what most developers would do in their day-to-day work.

I remember a peculiar coding interview that took place 10 years ago on the phone, before shared online code editors became a thing. I was asked to spell out an algorithm that produces the series of prime numbers. “No problem,” I said, “just let me google the sieve of Eratosthenes.” “No no, let’s just improvise,” said the interviewer, “it doesn’t have to be perfect.” So I scribbled down some similar, yet doubtless inferior code that mimicked Eratosthenes’ idea and read it out loud (awkwardly) on the phone. Something I would never do in an actual work situation. Yet, I passed the interview and got the assignment.
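
For reference, the textbook idea is compact enough. Here is a minimal sketch of the sieve in Java (certainly not the improvised version from that phone call):

    import java.util.ArrayList;
    import java.util.List;

    public class Sieve {
        // Sieve of Eratosthenes: collect all primes up to n (inclusive).
        static List<Integer> primesUpTo(int n) {
            boolean[] composite = new boolean[n + 1];
            List<Integer> primes = new ArrayList<>();
            for (int i = 2; i <= n; i++) {
                if (composite[i]) continue;
                primes.add(i);
                // Mark the multiples of i, starting at i*i; smaller
                // multiples were already marked by smaller primes.
                for (long j = (long) i * i; j <= n; j += i) {
                    composite[(int) j] = true;
                }
            }
            return primes;
        }

        public static void main(String[] args) {
            System.out.println(primesUpTo(30)); // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
        }
    }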

Is this a good predictor of professional performance? No, it isn’t. It is way too arbitrary and readily produces false negatives as well as false positives. There is a considerable amount of HR management literature that has identified good and bad predictors of work performance. Among the worst predictors are first impressions, school grades, and brain-teaser questions, such as “how many golf balls can you fit into a phone booth?” Finding the answer to that is a lot like devising an algorithm for a geometry problem. You get the idea. Case in point: Max Howell, the author of the popular “Homebrew” package manager for macOS, was (infamously) rejected in a Google interview because he was not able to produce an algorithm for inverting a binary tree. Howell’s software is used by millions of Mac users and is now maintained by a team of 21 people. Yet, he wasn’t good enough for Google.
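
For the record, the problem that felled Howell has a short recursive solution. A sketch:

    public class TreeInversion {
        // A minimal binary tree node.
        static class Node {
            int value;
            Node left, right;
            Node(int value, Node left, Node right) {
                this.value = value;
                this.left = left;
                this.right = right;
            }
        }

        // Recursively swap the left and right subtrees of every node.
        static Node invert(Node node) {
            if (node == null) return null;
            Node oldLeft = node.left;
            node.left = invert(node.right);
            node.right = invert(oldLeft);
            return node;
        }

        public static void main(String[] args) {
            Node root = new Node(1, new Node(2, null, null), new Node(3, null, null));
            invert(root);
            System.out.println(root.left.value + " " + root.right.value); // prints: 3 2
        }
    }

Knowing this by heart tells you very little about whether someone can build a package manager used by millions.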

One of the best predictors of future job performance is past performance. Duh. It’s kind of obvious and also backed up by evidence in the form of empirical studies in management. Had Google considered past performance instead of algorithms, the company would now be richer by one accomplished engineer who, by his own account, cares more about user experience than about computer science. People often have special talents that evade the standardised interview process. They will not escape an experienced interviewer, however. The only problem with past performance is that data is not always available. For example, what about grad students, career changers, and people with little work experience? Well, the second most reliable predictor of job performance is cognitive ability, also known as the “g factor” or general intelligence. The literature is again quite solid on this point.

It stands to reason that a standardised and culturally neutral IQ test would provide a much better predictor of future performance. And by “much better” I mean an order of magnitude better. It is also less time-consuming than going through a series of algorithm/coding problems. So why are companies not using them? One reason is that in certain countries, such as the USA, IQ tests are seen as potentially discriminatory and could therefore be legally contestable. This is obviously a matter of policy and, well, idiocy. Encountering an IQ test in a job interview appears to raise more eyebrows than encountering an esoteric programming problem. Another reason is that high-quality standardised and culturally neutral IQ tests are somewhat difficult to develop. And new ones would have to be developed all the time to ensure high-quality results and to prevent abuse. Yet another predictor of career success was suggested fairly recently (2013) by Angela Lee Duckworth in a TED talk. She called it “grit”, or the “power of passion and perseverance”, and the concept subsequently received a lot of attention from the tech industry. The problem with “grit” is that, as of now, no reliable way of measuring it has been devised.

So, the question that springs to mind is: why do companies use coding interviews in the selection process if they already know that the resulting score is a poor predictor of career success? Frankly, I don’t know. If you’re a big tech HR manager, please comment and enlighten us. I can only speculate. Companies like Facebook, Google or Apple are swamped with applications and might have to create artificial barriers to filter applications and reduce the number of candidates to a manageable amount for round two. What does this artificial barrier achieve? Perhaps it is about grit. Only the ambitious candidates will pass, namely the ones who are willing to learn and memorise algorithm textbooks just for the interview, and just for getting the chance to land a job with that big tech company. It is reasonable to assume that someone who has the motivation to put in three or four weeks of cramming for an interview is also motivated to put hard work into their job. In this case, developer interviews could just as well require candidates to recite Latin declensions and cite Roman philosophers. The effect would be the same, although it would certainly raise many eyebrows.

Another plausible explanation is that companies nowadays tend to put processes over people. The decision-making process, especially in large enterprises, is highly formalised. Instead of leaving the hiring decision to an experienced HR manager who relies on their personal knowledge and skills, it is relegated to an impersonal procedure that produces data. On one hand, this frees decision makers from personal accountability. On the other hand, it provides an instrument that top management can manipulate at will. For example, if management should want to steer their engineering staff in a new direction, it could replace algorithm questions with machine learning questions. Whatever the reasons may be, we can be sure that thousands of smaller tech companies will follow suit and imitate the example of Google, Microsoft and others, even if they don’t understand their reasons and even if the adopted procedures don’t fit their own true needs.

In the end, I believe coding interviews are not totally useless. They are an effective means to engage the candidate in his field and simulate a work situation. The coding interview gives the interviewer a chance to observe how a candidate approaches technical problems. For example, it may reveal whether the candidate’s course of action is methodical, or whether they have exaggerated their familiarity with a certain technology. Under ideal circumstances, it can reveal a lot about how a developer thinks. However, since an interview is a special situation, these indicators need to be evaluated carefully. For the benefit of all involved parties, the coding interview is best conducted in a casual way. An expanded problem space leaves the candidate with more choices and is likely to provide more information. I don’t see any good reason why coding interviews should be conducted like a university exam, however. If you can think of any, please let us know.

The code is the documentation

The first time I heard someone say “the code is the documentation”, I thought it sounded completely wrong, like a lazy excuse for not producing documentation. However, it kept me thinking, and I realised that there is also truth in this statement. This paradoxical, thought-provoking quality makes it a proper mantra for agile practitioners, because it expresses a fundamental agile value: “working software over comprehensive documentation”.

Before we go into further detail, I should dismiss the notion that agile developers don’t write documentation or misconceive its worth. Nope. We still produce documentation. However, we also apply the following agile principle to it: “Simplicity, the art of maximising the amount of work not done, is essential.” It is expedient to minimise documentation by adhering to practices that reduce the need for it. In agile development, documentation takes on a supporting role, whereas the primary attention is given to the codebase. Or more succinctly:

“Truth can only be found in one place: the code.” (Robert C. Martin, Clean Code)

The codebase is the ultimate source of truth. It is referenced in case of doubt, when a question is either too detailed for the documentation or when the documentation is out of date. The code provides insight where no other method of reference is available. With this in mind, it is evident that code should be written in a way that is well-structured and understandable. Self-documenting code reduces or eliminates the need for external documentation. Knowledge is represented in a single artefact and there is no need to synchronise multiple sources. Unfortunately, most real-world codebases do not have these ideal qualities. There could be many reasons for that, but the most common one is that software entropy has taken its toll over time, and that too little attention was paid to refactoring.

So, code quality, self-documenting code, and continuous refactoring go hand in hand. It is important to understand that the highest code quality is achieved through conceptual clarity. Conceptual clarity comes from good naming and good structure. This is far more important than coding style, naming conventions, formatting and other external features, although the latter do of course contribute to code quality. Naming and structure cannot be tested automatically. Unlike code style, conventions, and formatting, they require human perception and intelligence, which is one more reason to adopt such practices as code reviews and pair programming.
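
To illustrate the difference that naming alone makes, here is a contrived before-and-after sketch (the pricing domain and all identifiers are invented for the example); both methods behave identically, but only the second one documents itself:

    public class Pricing {
        // Opaque: the reader has to reverse-engineer the intent.
        static double f(double a, double b, double c) {
            return a * b * (1 + c);
        }

        // Self-documenting: the names carry the concept, no comment needed.
        static double grossPrice(double unitPrice, double quantity, double taxRate) {
            return unitPrice * quantity * (1 + taxRate);
        }
    }

No linter can tell these two apart; only a human reviewer can.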

Coming back to “the code is the documentation”, I think the best way to understand this phrase is as an abstract ideal that ought to be worked towards. In an ideal world, code doesn’t need additional documentation, because it is so beautifully clear that it can be understood by anyone without any prior knowledge. It answers all questions that might arise about the software. It clarifies the intentions of the programmer and is therefore also easy to maintain and change. Obviously, this is really difficult to achieve in the real world, especially across a large codebase, but a codebase’s self-documenting properties are quite likely the best measure of its quality.

Code Review Antipatterns

Don’t you love code reviews? Does having your grand solution scrutinised, criticised, and optimised by your colleagues make you feel awkward? It doesn’t have to be that way. Code reviews are actually fun if done right and if the participants are aware of the pitfalls. They are not just conducive to a high level of code quality; they are also an ongoing learning experience.

There are three basic forms of code review: informal reviews (“hey, can you have a look at this?”), formal reviews (a formal task assigned to one or more peers), and pair programming. The latter form has the advantage that it is done at the time the code is created. It is also the only form where two individuals engage in a direct conversation. There is no time lag. This makes pair programming the preferred form of review for high-priority tasks.

Agile teams working with git (or any other version control system) often encounter code reviews as a step in their code merging workflow. In this setting, the typical unit of code being reviewed is a merge request, aka pull request. So, here are a number of things to avoid when doing a code review.

Not doing code reviews – Let’s begin with the obvious. Four eyes (or six or eight…) see more than two. Errors, problems, vulnerabilities, and bad design get spotted early on. Corrective measures are still inexpensive at this stage. There is a dual benefit: code quality increases and knowledge is shared. At least two people on a team are familiar with each feature or function, namely the author and the reviewer.

Confusing code reviews with functional reviews – Code reviews focus on code. Duh! The review is primarily (but not exclusively) concerned with the non-functional properties of the software. Functional reviews, on the other hand, ensure that functional requirements are met. The latter activity is closer to testing, therefore it makes sense to perform functional reviews in a separate procedure, probably as part of testing.

Bitching, nitpicking, fault-finding, patronising – It goes without saying that treating colleagues with kindness and respect is necessary for productive collaboration. Code reviews are no exception. Offensive behaviour only derails the process, while courteous and supportive comments go a long way. The reviewer must guard against coming across as a know-it-all and keep office politics out of the dialogue. In written reviews, annotators should aim to be helpful without being wordy.

Discussing code style conventions – Code reviews should not be about lexical code style, formatting and programming conventions. There are automated tools for this. If you care about formatting and code style, use a linter. More importantly, decide on a set of programming style conventions and configure the linter accordingly. It is totally OK if the team agrees to leave certain options open to individual preference. The formatting and style of code that passes the linter should not be argued over; that is simply a waste of time.

Ignoring naming – Programming conventions may or may not encompass identifier naming conventions. However, linters cannot determine whether a given variable name is good or bad. Machines don’t care about names, only humans do. Phil Karlton has described naming as one of the two hard things in computer science. By all means, do review the identifier naming in the code review. As a rule of thumb, if an identifier name leaves you puzzled the first time you read it, it is probably not well chosen.

Debate – Code reviews aren’t meant as a podium for discussion. The objective is to improve the resulting artifact, not to find out who has the better arguments. Sometimes, people do have different opinions about practices and solutions. It may be helpful to resolve these by giving examples and references to commonly accepted best practices. If that is to no avail, a third party (or fourth or fifth…) should be involved. In a written review, consider involving a third party whenever the dialogue drifts into debate.

Not getting a third opinion – Involving a third party in the review is not only useful in cases where two people cannot agree. See the first antipattern: four eyes are better than two, and six are better than four. If an important architectural question is at stake, why not involve the whole team? See: mob programming.

Reviewing huge chunks of code – Code reviews should be limited in scope, ideally to a few pages of code that can be completely read and understood in 20 minutes. Studies have shown that thoroughness and the number of spotted issues are inversely proportional to the amount of code being reviewed. Therefore, results improve if smaller units of code are reviewed. Including time for reflection and annotations or discussion, the entire review should not take longer than 45 minutes. Anything exceeding one hour is questionable.

Unidirectional reviewing – The general idea is that everyone should review everyone’s code. Most teams have both senior and junior developers on board, or perhaps a team lead or an architect. This often leads to a situation where seniors only review juniors, but not vice versa. This is wrong. First of all, seniors can make mistakes, too. Second, juniors can benefit from reading code written by senior developers. Third, varying fields of specialisation can be leveraged. A junior developer might have special knowledge or skills in a certain narrow area.

Rewriting code – It’s called code review, not code rewrite! Rewriting someone’s code is just not appropriate in most situations. It’s akin to pair programming where the navigator just grabs the keyboard from the driver whenever he feels like it. It’s simply rude. If the reviewer has a better idea, he should annotate the merge request with code examples.

Overdoing it – Programming is not a beauty contest. Code does not need to be perfect to be functional and maintainable. For example, even though I might prefer a functional loop construct over an imperative one, it’s just fine if the imperative one works. Perfectionism sometimes gets in the way and leads to other antipatterns such as premature optimisation and presumptive features (YAGNI). Also: A piece of code written for a one-time migration or ops procedure does not require the same high standards as production system code.

Underdoing it – Being too forgiving is likewise unhelpful. If the code review is executed as a mere approval formality, it’s a waste of time. Code should always be inspected for things like security holes, logical errors, code smells, technical debt and feature creep. It is important to strike a balance between clean maintainable code and reasonable effort.

Not reviewing tests – Unit tests are often mandatory, and sometimes functional and acceptance tests are also added to the artifact. These should also be reviewed to ensure that the tests are present, logically correct, and meet the team’s standards. It’s a good idea to verify an agreed-upon code coverage percentage as part of the review.

Not using a diff utility – Most teams probably use a tool set with built-in diff and annotation capability. In case you don’t, you make your work harder than it has to be.

The Agile Samurai

The Agile Samurai
by Jonathan Rasmusson
1st edition, 280 pages
Pragmatic Bookshelf

Book Review

Over the last ten years, I've been working with teams with varying degrees of commitment to the agile process, ranging from non-existent to quite strong. I was looking for a text that summarises agile methodology to help me formalise and articulate my own experiences, and of course to enhance my knowledge of some of the finer points of agile practices. I have to admit that this book did not meet my expectations. The first eighty pages up to chapter six are mostly about project inception and read like a prolonged introduction. From chapter six onwards, the author finally comes to the point and discusses the core concepts of agile processes, so the book does get better with increasing page numbers. Unfortunately, Scrum isn't discussed at all; instead, Kanban is introduced in chapter eight. The discussion of typical technical practices, such as refactoring, TDD, and continuous integration, is compressed into several brief chapters at the end of the book.

The writing style is very informal; the author uses a conversational tone throughout the book. Almost every page contains illustrations, which makes it an easy and quick read. The style of the book is comparable to the Head First books. It left me with the impression that I had sat in an all-day meeting where someone said a lot of intelligent things to which everyone else agreed. Unfortunately, not many of these things seemed radically new or thought-provoking, so I fear I won't remember many of them next month. Of course, this may be entirely my own fault. I prefer a more formal, concise, old-school language. I also prefer dense and meaty textbooks with lots of diagrams, numbers and formulas. In return, I can dispense with stick figures, pictograms, and even with Master Sensei (a guru character used in the book). I feel that a lot of the deeper and more complex issues of agile project management have simply been left out.

To be fair, it must be mentioned that I probably do not fall into the target group for which this book was written. It is more appropriate as an introductory text for people who are new to agile project management, or even new to the entire business of project management. Think "trial lesson" and "starter course".

NoSQL Databases

NoSQL databases have entered the radar of web application developers lately. While relational database management systems (RDBMS) have been powering almost every web application on the Internet for more than a decade, this is beginning to change. No longer is the selection of persistence technology a no-brainer. You have additional choices. Besides the old friend RDBMS, there are object-oriented databases, graph-oriented databases, key-value stores, column-oriented databases, and other options. Many of the newer products in this area are known as NoSQL databases. NoSQL is a movement that promotes persistence technologies that break with the conventional relational model. NoSQL databases typically lack table schemas and SQL support, and are designed to scale horizontally.

For those of you old enough to remember dBase, the NoSQL moniker may not be much of an attention grabber, because after all, products like dBase, FoxPro, Clipper and similar DB systems never had SQL support either. With these systems, relations had to be expressed implicitly in the application and “queries” had to be coded as retrieval sequences. By contrast, modern NoSQL systems depart from the relational model, and in many cases also from the tabular data structure, in order to serve use cases where traditional RDBMS fail in one way or another. A typical example would be a sparsely populated table that contains very little data in its rows and columns. Such a table, if it grows to a large size, presents an efficiency problem to most RDBMS, with a resulting loss of performance. In the remainder of this article, we will look at a few selected NoSQL databases and see which use cases they cater to.

CouchDB

Apache CouchDB is a document-oriented database that represents documents as JSON objects. CouchDB supports all data types supported by JSON, and hence by JavaScript. The JSON objects are not required to comply with schemas and can therefore be defined freely, which means that each JSON object can have a different structure. CouchDB supports queries by views. Views are aggregate functions and filters programmed in JavaScript that follow the MapReduce pattern. Views are stored and indexed in the database. CouchDB provides a RESTful API where every object (and any other item) in the database can be retrieved by a URL. It uses the HTTP POST, GET, PUT, and DELETE methods for CRUD operations. Other features include ACID semantics on the basis of multi-version concurrency control (similar to RDBMS) optimised for a high number of concurrent reads, and a distributed architecture that allows for easy bidirectional replication and offline usage. CouchDB is thus designed from the ground up for Internet use.
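
To give a flavour of the RESTful API, here is a minimal sketch in Java using the standard HttpClient. The database name books and the document are invented for the example, and a default CouchDB instance is assumed to listen on localhost:5984 without authentication:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CouchDbDemo {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            String base = "http://localhost:5984/books"; // hypothetical database

            // Create the database: a PUT on the database URL.
            client.send(HttpRequest.newBuilder(URI.create(base))
                            .PUT(HttpRequest.BodyPublishers.noBody()).build(),
                    HttpResponse.BodyHandlers.ofString());

            // Store a schema-free JSON document under a chosen id.
            String doc = "{\"title\":\"Clean Code\",\"author\":\"Robert C. Martin\"}";
            client.send(HttpRequest.newBuilder(URI.create(base + "/clean-code"))
                            .header("Content-Type", "application/json")
                            .PUT(HttpRequest.BodyPublishers.ofString(doc)).build(),
                    HttpResponse.BodyHandlers.ofString());

            // Retrieve it again with a plain GET.
            HttpResponse<String> response = client.send(
                    HttpRequest.newBuilder(URI.create(base + "/clean-code")).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }

A view’s map function is itself plain JavaScript along the lines of function(doc) { if (doc.author) emit(doc.author, doc.title); }, stored in a design document and queried through the same REST interface.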

Neo4J

Neo4J is a graph database. As the name suggests, it is intended for use with the Java platform, which includes any language that runs on the JVM. Neo4J stores information in nodes and edges; the latter are called relationships in Neo4J. Relationships are always of a defined type. Both nodes and relationships can store properties, i.e. data. The Neo4J database is thus optimised for representing complex graph and network structures, such as a hierarchical object repository or a social network. It offers high-performance graph traversal operations for data access. Nodes can also be indexed and retrieved by key, which enables more conventional queries. Additional features include ACID transactions and transaction recovery, based on the Java Transaction API (JTA). Optional libraries can expose a Neo4J database as an RDF store where the node space can be queried using SPARQL. Neo4J is an embedded database with a small footprint that runs in the same JVM as the application.
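
A minimal sketch of the embedded API, following the classic Neo4J 1.x interfaces (class and method names vary between versions, so take this as an illustration of the programming model rather than a reference):

    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class Neo4jDemo {
        public static void main(String[] args) {
            GraphDatabaseService graphDb = new EmbeddedGraphDatabase("var/graphdb");
            Transaction tx = graphDb.beginTx();
            try {
                // Both nodes and relationships can carry properties.
                Node alice = graphDb.createNode();
                alice.setProperty("name", "Alice");
                Node bob = graphDb.createNode();
                bob.setProperty("name", "Bob");
                // Relationships are directed and always have a type.
                alice.createRelationshipTo(bob, DynamicRelationshipType.withName("KNOWS"));
                tx.success();
            } finally {
                tx.finish();
            }
            graphDb.shutdown();
        }
    }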

Redis

Redis is a modern implementation of a persistent key-value store for general purpose use. Key-value store is a name for a simple key-based access mechanism that basically implements a dictionary (or map) data structure. Traditionally, such systems were used for caching, and Redis holds its entire database in memory, which makes it ideal for applications that require ultra-fast data access. Redis allows not just plain string data, but also sets and lists of strings in the data space. The system offers a number of special commands, such as atomic push/pop and add/remove operations for lists, and set operations such as building unions, intersections, and differences. Redis persists data either by asynchronously writing memory to disk, or by appending to a journalling file as data is written by clients. Additional features include easy master-slave replication and rudimentary sharding. Redis offers support for various languages, such as C/C++, Java, Scala, PHP and others through native drivers and APIs.
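
To illustrate, a short sketch using the Jedis Java client (a Redis server is assumed to run on localhost; method names follow the Jedis API, though details may vary between client versions):

    import java.util.Set;
    import redis.clients.jedis.Jedis;

    public class RedisDemo {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("localhost"); // default port 6379

            // Plain string keys and values.
            jedis.set("greeting", "hello");

            // Atomic list operations: push and pop.
            jedis.lpush("queue", "job-1", "job-2");
            String job = jedis.rpop("queue");

            // Set operations: union, intersection, difference.
            jedis.sadd("a", "x", "y");
            jedis.sadd("b", "y", "z");
            Set<String> common = jedis.sinter("a", "b"); // {y}

            System.out.println(job + " " + common);
            jedis.close();
        }
    }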

HBase

HBase is a free implementation of Google’s BigTable written in Java. It is not the type of database you would use for a blog or a forum software. HBase is a tabular data store designed for massive tables in the petabyte range with billions of rows distributed over a number of physical machines, and is thus optimised for horizontal scaling. HBase is part of the Apache Hadoop project, a framework for data-intensive distributed applications, inspired by Google’s MapReduce and GFS technologies. Hadoop supports the database through its distributed filesystem HDFS, which provides built-in replication and MapReduce traversal for HBase tables of arbitrary size. Features include optimised query push-down via server-side scan and get filters, a high-performance Thrift gateway, an XML-based RESTful web service gateway, Hadoop cascading, per-column probabilistic Bloom filters, as well as data warehousing and data analysis modules. Since HBase saves column families rather than columns, and since empty columns are not stored, it is ideal for sparse tables with semi-structured data. Typical use cases are cloud computing and applications that require massive storage on cheap commodity hardware.
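
For a feel of the programming model, here is a sketch using the HBase Java client. The table webtable with a column family content is assumed to already exist, and the code follows the newer client interfaces (older versions use HTable directly):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("webtable"))) {

                // Write one cell: row key, column family "content", column "html".
                Put put = new Put(Bytes.toBytes("com.example/index"));
                put.addColumn(Bytes.toBytes("content"), Bytes.toBytes("html"),
                        Bytes.toBytes("<html>...</html>"));
                table.put(put);

                // Read it back; empty columns are simply absent, not stored.
                Result result = table.get(new Get(Bytes.toBytes("com.example/index")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("content"), Bytes.toBytes("html"))));
            }
        }
    }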

Db4o

Db4o is an open-source object-oriented database system targeted at OOP developers. The idea behind Db4o is to enable programmers to create and persist a representation of the application object model directly in the database, without the need for an object-relational mapping layer. Object instances can then be stored and retrieved with a single line of code. Db4o provides a query mechanism called Native Queries (NQ), which allows querying data with native OOP language constructs, thus offering type safety for query expressions while eliminating the need to build query strings. Db4o is available for the Java and .NET platforms. If used with .NET languages, data can alternatively be queried with LINQ (language integrated query). The Db4o database is embeddable, with a small footprint suitable for deployment on mobile devices. Additional features include semi-automatic schema versioning, transaction support with ACID semantics, and synchronisation/replication mechanisms that allow synchronisation between different Db4o instances and data export into SQL databases.
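
A sketch of storing and querying with a Native Query in Java (the Pilot class is the canonical Db4o tutorial example; API details may vary between versions):

    import com.db4o.Db4oEmbedded;
    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Predicate;

    public class Db4oDemo {
        // An ordinary application class; no mapping layer required.
        static class Pilot {
            String name;
            int points;
            Pilot(String name, int points) { this.name = name; this.points = points; }
        }

        public static void main(String[] args) {
            ObjectContainer db = Db4oEmbedded.openFile(
                    Db4oEmbedded.newConfiguration(), "pilots.db4o");
            try {
                // Storing an object is a single call.
                db.store(new Pilot("Michael Schumacher", 100));

                // A Native Query: plain, type-safe Java instead of a query string.
                ObjectSet<Pilot> result = db.query(new Predicate<Pilot>() {
                    public boolean match(Pilot pilot) {
                        return pilot.points > 99;
                    }
                });
                while (result.hasNext()) {
                    System.out.println(result.next().name);
                }
            } finally {
                db.close();
            }
        }
    }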