I was recently involved in a project to acquire a software company and its product. One of the requirements from our directors was some due diligence on the product. We didn’t want to get lumbered with some piece of junk that we couldn’t maintain if the original developers quit. I figured that a good starting point before a formal code review would be to run some automated tools on the code to get some metrics on the quality.
These are industry standard packages for automated Java code review based on certain rule sets. The difficulty is in interpreting the reports they generate. If you run CheckStyle on any piece of code it will spit out a lot of information, not all of it pertinent. You also need to relativize the data. Does the fact that 5% of the methods in the product have an excessive Cyclomatic Complexity mean that the product has been written by cowboys?
I decided to compare the product with the Catalina core of the Tomcat Webserver version 5.5.26 and the latest version (5th May 2008, svn HEAD) of the Alfresco document management suite. Two best of breed Java open source packages. Tomcat was chosen as it is a mature and relatively stable package of a similar age to the product (which we will call Kiki). Alfresco is a document management system, so in the same category as Kiki. It has a more modern code base so should have benefited from some of the developments in Java software engineering (Patterns etc). The website SourceKibitzer http://sourcekibitzer.org/ maintains metrics for a number of OSS projects and could be a useful resource for this kind data.
The tools give an indication of code complexity and poor software practices. They don’t say if the code contains logic errors (aka bugs). However more complex, poor written code is more likely to have errors and be harder to maintain and extend.
JavaNCSS is a utility for measuring the number of non commenting source statements in a file, package or project (aka actual lines of codes). It can be run from the command line (note I'm using cygwin under Windows XP):
$ java -cp "ccl.jar;javancss.jar;jhbasic.jar" javancss.Main -gui -recursive "c:/usr/src/product/"
JavaNCSS gives the following figures for the packages tested:
| Project | Packages | Classes | Methods | NCSS | Methods/Class | Lines/Method |
|---|---|---|---|---|---|---|
| Kiki | 42 | 427 | 5720 | 51424 | 13.4 | 8.99 |
| Tomcat 5.5 Catalina | 24 | 365 | 4937 | 42025 | 13.5 | 8.51 |
| Alfresco 2.9 Web Framework | 14 | 172 | 1806 | 9088 | 10.5 | 5.03 |
| Alfresco 2.9 Web Client | 74 | 741 | 6689 | 62941 | 9.0 | 9.41 |
Kiki has similar figures for Methods/Class and Lines/Method to the reference programs. As a general rule methods should not exceed 50 NCSS lines and classes 1000.
Checkstyle automates the processes of checking that Java code adheres to a coding standard and quality metrics. It is configured through an XML file. It can be invoked from ant or maven as part of the build process or directly from the command line:
$ java -jar checkstyle-all-4.4.jar -c sesame-checks.xml -r "c:/usr/src/product/"
Checkstyle was used to measure some standard quality metrics.
Cyclomatic Complexity is a measure of the number of possible code paths through a method and therefore the number of separate test cases required to test the method. It is a measure of the effort required to fully test a method. By implication the higher the number the harder a method is to understand for a programmer.
CCN It is basically a count of the number of if, while, do, for, ?:, catch, switch, case statements, and operators && and || (plus one). The concept, although not the method, is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test.
Various authors and studies have, in fact, suggested that a cyclomatic complexity value of 10 or higher for a particular method is considered complex. Methods with CCN’s of between 8 to 10 are candidates for refactoring. The following table documents the number of methods which reach a certain cyclomatic complexity and the percentage of those methods compared to the total (figure in parenthesis).
| Project | CCN > 7 | CCN > 10 |
|---|---|---|
| Kiki | 346 (6%) | 212 (3.7%) |
| Tomcat 5.5 Catalina | 355 (7.1%) | 227 (4.65) |
| Alfresco 2.9 Web Framework | 36 (2.0%) | 21 (1.2%) |
| Alfresco 2.9 Web Client | 374 (5.6%) | 211 (3.2%) |
Kiki has figures similar to Tomcat’s Catalina Core and the Alfresco Web Client. Outlier methods are particular candidates for refactoring and where effort should be focused initially. There are 70 methods in the Product with CCN’s above 20. One class has five such methods alone.
Figure 1: Outlier methods on the right of graph
Cyclomatic complexity says that testing effort is related to the number code paths through the method. The NPATH complexity metric expands on this and evaluated the number of acyclic execution paths, taking into account the nesting of conditional statements (nesting depth) and multi-part boolean expressions (e.g., A && B, C || D, etc.) but ignoring loops. It is considered to be a more complete measure of method complexity.
Complex methods are hard to understand. They should be refactored into several shorter methods focused on specific tasks. Methods with NPATH’s greater than 200 are candidates for further decomposition.
| Project | >200 |
|---|---|
| Kiki | 114 (2.0%) |
| Tomcat 5.5 Catalina | 151 (3.1%) |
| Alfresco 2.9 Web Framework | 11 (0.6%) |
| Alfresco 2.9 Web Client | 92 (1.4%) |
Kiki falls someway short of Alfresco’s score but score better than Tomcat.
We used Checkstyle to determine the JavaNCSS counts. This score evaluates the complexity of methods, classes and files by counting the Non Commenting Source Statements (NCSS).
Large methods and classes are hard to read and costly to maintain. A large NCSS number often means that a method or class has too much functionally which should be decomposed into smaller units.
| Project | Class > 1500 | Method > 50 |
|---|---|---|
| Kiki | 3 (0.7%) | 116 (2.0% |
| Tomcat 5.5 Catalina | 1 (0.3%) | 81 (1.6%) |
| Alfresco 2.9 Web Framework | 0 (0.0%) | 6 (0.3%) |
| Alfresco 2.9 Web Client | 0 (0.0%) | 111 (1.7%) |
Kiki has some large classes and methods that should be refactored.
This is a measure of the number of other classes a given class relies on. It indicates the amount of maintenence a program requires. Classes with a score greater than 20 should be examined.
| Project | CFOC>20 |
|---|---|
| Kiki | 39 (9.1%) |
| Tomcat 5.5 Catalina | 40 (11.0%) |
| Alfresco 2.9 Web Framework | 6 (3.5%) |
| Alfresco 2.9 Web Client | 70 (9.4%) |
This metric measures the number of instantiations of other classes within the given class. This type of coupling is not caused by inheritance. If a class has a local variable that is an instantiation (object) of another class, there is data abstraction coupling. DACs above 7 indicate an overly complicated class structure.
| Project | DAC>7 |
|---|---|
| Kiki | 53 (12.4%) |
| Tomcat 5.5 Catalina | 42 (11.5%) |
| Alfresco 2.9 Web Framework | 7 (4.1%) |
| Alfresco 2.9 Web Client | 41 (5.5%) |
Kiki performs poorly on this score compared to Alfresco but is not significantly worse than Tomcat.
This is a measure of the number Boolean operators (&&, ||, &, | and ^) in an expression. Too many conditions leads to code that is difficult to read, debug and maintain.
| Project | BEC>3 |
|---|---|
| Kiki | 21 (0.4%) |
| Tomcat 5.5 Catalina | 16 (0.3%) |
| Alfresco 2.9 Web Framework | 1 (0.1%) |
| Alfresco 2.9 Web Client | 16 (0.2%) |
There is a significant overlap between CheckStyle and PMD. However PMD is more flexible for finding potential problems including:
It can be integrated into the Ant build task or run from the command line:
$ java -cp "lib/pmd-4.2.1.jar;lib/jaxen-1.1.1.jar;lib/asm-3.1.jar" net.sourceforge.pmd.PMD "c:/sezame/sezame/java/src/com" text basic
Running every rule-set will result in a huge number of rule violations, most of which will be unimportant. As a general strategy unusedcode should be run to remove any unused locals and fields. Then basic for standard programming errors. Design will pick up design issues.
For this test we ran the basic rule-set that evaluates a set of common best practices for Java coding. This table contains a selection of the results.
| Project | Empty Catch | Empty If/While | Boolean Instantiation | Unecessary Return | RedundantNested Ifs | Useless Overides | Call on null Object |
|---|---|---|---|---|---|---|---|
| Kiki | 59 | 41 | 19 | 10 | 105 | 7 | 0 |
| Tomcat 5.5 Catalina | 93 | 15 | 47 | 1 | 59 | 6 | 0 |
| Alfresco 2.9 Web Framework | 22 | 7 | 1 | 0 | 6 | 5 | 2 |
| Alfresco 2.9 Web Client | 156 | 6 | 14 | 0 | 47 | 10 | 3 |
This ruleset finds code size related issues. We have ignored the metrics already tested by CodeStyle (CCN, NPath etc).
| Project | Long Classes | Long Methods | Excessive Methods | Excessive Fields | Excessive Publics | Long Param Lists |
|---|---|---|---|---|---|---|
| Kiki | 12 | 89 | 39 | 35 | 26 | 1 |
| Tomcat 5.5 Catalina | 26 | 54 | 69 | 20 | 14 | 0 |
| Alfresco 2.9 Web Framework | 1 | 10 | 16 | 1 | 7 | 1 |
| Alfresco 2.9 Web Client | 14 | 119 | 46 | 29 | 13 | 7 |
This rule set identifies questionable design decisions. Only a selection of the problems are shown in this table – PMD is capable of identifying a wide range of design problems. These should be considered as warnings rather than errors. Some of behaviour may be correct in the particular context. For example a class with just static methods should be refactored to a Singleton except for abstract classes with subclasses. PMD cannot differentiate between these cases.
| Project | Switch Issues | Refactor to Singleton | Rethrown Exception | Deeply Nested ifs | String Comparisons | Synchroniza-tion | Localization |
|---|---|---|---|---|---|---|---|
| Kiki | 27 | 15 | 94 | 93 | 47 | 84 | 107 |
| Tomcat 5.5 Catalina | 8 | 18 | 89 | 18 | 127 | 0 | 54 |
| Alfresco 2.9 Web Framework | 10 | 24 | 1 | 12 | 7 | 14 | 3 |
| Alfresco 2.9 Web Client | 26 | 9 | 18 | 108 | 18 | 5 | 11 |
The main issues with Kiki is are
The automated tools have identified a number of problems with the Kiki that should be investigated. The code base is generally better than the Tomcat Catalina core and comparable with the newer Alfresco Web Client.
The main issues for improvement are a decomposition of certain large classes and methods. In particular, to reduce method complexity. Better exception handling. A reduction in the amount of code synchronized on Object monitors.
The code base could benefit from better programming practices: application of patterns, understanding of object orientation issues such as programming to interfaces, method and class length, coupling of objects, composition rather than inheritance.
A typical case is that of Kiki struts classes. These have a high CCN but low NPath value. One paradigm for programming to the Struts package is to use a lot of switch statements. This increases the CCN. Switch statements can be refactored using the strategy pattern. However in this particular limited case a decision may be taken to ignore the CCN value, at least for existing code.
The three tools that were used in this exercise can be integrated into the build process to produce automated reports on code quality issues and trends. However teams should be wary about enforcing particular measures as programmers can find ways to code around particular metrics which may, in fact, increase code complexity and maintenance costs. For example an if/for/while statement can be extracted into another method but this has made the code more complex. In the end you can’t replace a human code review with an automated tool.