Translations of this page:

A strategy for automated Java code review

I was recently involved in a project to acquire a software company and its product. One of the requirements from our directors was some due diligence on the product. We didn’t want to get lumbered with some piece of junk that we couldn’t maintain if the original developers quit. I figured that a good starting point before a formal code review would be to run some automated tools on the code to get some metrics on the quality.

  1. JavaNCSS
  2. CheckStyle
  3. PMD

These are industry standard packages for automated Java code review based on certain rule sets. The difficulty is in interpreting the reports they generate. If you run CheckStyle on any piece of code it will spit out a lot of information, not all of it pertinent. You also need to relativize the data. Does the fact that 5% of the methods in the product have an excessive Cyclomatic Complexity mean that the product has been written by cowboys?

I decided to compare the product with the Catalina core of the Tomcat Webserver version 5.5.26 and the latest version (5th May 2008, svn HEAD) of the Alfresco document management suite. Two best of breed Java open source packages. Tomcat was chosen as it is a mature and relatively stable package of a similar age to the product (which we will call Kiki). Alfresco is a document management system, so in the same category as Kiki. It has a more modern code base so should have benefited from some of the developments in Java software engineering (Patterns etc). The website SourceKibitzer http://sourcekibitzer.org/ maintains metrics for a number of OSS projects and could be a useful resource for this kind data.

The tools give an indication of code complexity and poor software practices. They don’t say if the code contains logic errors (aka bugs). However more complex, poor written code is more likely to have errors and be harder to maintain and extend.

JavaNCSS

JavaNCSS is a utility for measuring the number of non commenting source statements in a file, package or project (aka actual lines of codes). It can be run from the command line (note I'm using cygwin under Windows XP):

$ java -cp "ccl.jar;javancss.jar;jhbasic.jar" javancss.Main -gui -recursive "c:/usr/src/product/"

JavaNCSS gives the following figures for the packages tested:

Project Packages Classes Methods NCSS Methods/Class Lines/Method
Kiki 42 427 5720 51424 13.4 8.99
Tomcat 5.5 Catalina 24 365 4937 42025 13.5 8.51
Alfresco 2.9 Web Framework 14 172 1806 9088 10.5 5.03
Alfresco 2.9 Web Client 74 741 6689 62941 9.0 9.41

Kiki has similar figures for Methods/Class and Lines/Method to the reference programs. As a general rule methods should not exceed 50 NCSS lines and classes 1000.

CheckStyle

Checkstyle automates the processes of checking that Java code adheres to a coding standard and quality metrics. It is configured through an XML file. It can be invoked from ant or maven as part of the build process or directly from the command line:

$ java -jar checkstyle-all-4.4.jar -c sesame-checks.xml  -r "c:/usr/src/product/"

Checkstyle was used to measure some standard quality metrics.

Cyclomatic Complexity (CCN)

Cyclomatic Complexity is a measure of the number of possible code paths through a method and therefore the number of separate test cases required to test the method. It is a measure of the effort required to fully test a method. By implication the higher the number the harder a method is to understand for a programmer.

CCN It is basically a count of the number of if, while, do, for, ?:, catch, switch, case statements, and operators && and || (plus one). The concept, although not the method, is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test.

Various authors and studies have, in fact, suggested that a cyclomatic complexity value of 10 or higher for a particular method is considered complex. Methods with CCN’s of between 8 to 10 are candidates for refactoring. The following table documents the number of methods which reach a certain cyclomatic complexity and the percentage of those methods compared to the total (figure in parenthesis).

Project CCN > 7 CCN > 10
Kiki 346 (6%) 212 (3.7%)
Tomcat 5.5 Catalina 355 (7.1%) 227 (4.65)
Alfresco 2.9 Web Framework 36 (2.0%) 21 (1.2%)
Alfresco 2.9 Web Client 374 (5.6%) 211 (3.2%)

Kiki has figures similar to Tomcat’s Catalina Core and the Alfresco Web Client. Outlier methods are particular candidates for refactoring and where effort should be focused initially. There are 70 methods in the Product with CCN’s above 20. One class has five such methods alone.

CCN Outliers

Figure 1: Outlier methods on the right of graph

Npath Complexity

Cyclomatic complexity says that testing effort is related to the number code paths through the method. The NPATH complexity metric expands on this and evaluated the number of acyclic execution paths, taking into account the nesting of conditional statements (nesting depth) and multi-part boolean expressions (e.g., A && B, C || D, etc.) but ignoring loops. It is considered to be a more complete measure of method complexity.

Complex methods are hard to understand. They should be refactored into several shorter methods focused on specific tasks. Methods with NPATH’s greater than 200 are candidates for further decomposition.

Project >200
Kiki 114 (2.0%)
Tomcat 5.5 Catalina 151 (3.1%)
Alfresco 2.9 Web Framework 11 (0.6%)
Alfresco 2.9 Web Client 92 (1.4%)

Kiki falls someway short of Alfresco’s score but score better than Tomcat.

JavaNCSS

We used Checkstyle to determine the JavaNCSS counts. This score evaluates the complexity of methods, classes and files by counting the Non Commenting Source Statements (NCSS).

Large methods and classes are hard to read and costly to maintain. A large NCSS number often means that a method or class has too much functionally which should be decomposed into smaller units.

Project Class > 1500 Method > 50
Kiki 3 (0.7%) 116 (2.0%
Tomcat 5.5 Catalina 1 (0.3%) 81 (1.6%)
Alfresco 2.9 Web Framework 0 (0.0%) 6 (0.3%)
Alfresco 2.9 Web Client 0 (0.0%) 111 (1.7%)

Kiki has some large classes and methods that should be refactored.

Class Fan Out Complexity (CFOC)

This is a measure of the number of other classes a given class relies on. It indicates the amount of maintenence a program requires. Classes with a score greater than 20 should be examined.

Project CFOC>20
Kiki 39 (9.1%)
Tomcat 5.5 Catalina 40 (11.0%)
Alfresco 2.9 Web Framework 6 (3.5%)
Alfresco 2.9 Web Client 70 (9.4%)

Class Data Abstraction Coupling (DAC)

This metric measures the number of instantiations of other classes within the given class. This type of coupling is not caused by inheritance. If a class has a local variable that is an instantiation (object) of another class, there is data abstraction coupling. DACs above 7 indicate an overly complicated class structure.

Project DAC>7
Kiki 53 (12.4%)
Tomcat 5.5 Catalina 42 (11.5%)
Alfresco 2.9 Web Framework 7 (4.1%)
Alfresco 2.9 Web Client 41 (5.5%)

Kiki performs poorly on this score compared to Alfresco but is not significantly worse than Tomcat.

Boolean Expression Complexity (BEC)

This is a measure of the number Boolean operators (&&, ||, &, | and ^) in an expression. Too many conditions leads to code that is difficult to read, debug and maintain.

Project BEC>3
Kiki 21 (0.4%)
Tomcat 5.5 Catalina 16 (0.3%)
Alfresco 2.9 Web Framework 1 (0.1%)
Alfresco 2.9 Web Client 16 (0.2%)

PMD

There is a significant overlap between CheckStyle and PMD. However PMD is more flexible for finding potential problems including:

  • Possible bugs - empty try/catch/finally/switch statements
  • Dead code - unused local variables, parameters and private methods
  • Suboptimal code - wasteful String/StringBuffer usage
  • Duplicate code - copied/pasted code means copied/pasted bugs

It can be integrated into the Ant build task or run from the command line:

$ java -cp "lib/pmd-4.2.1.jar;lib/jaxen-1.1.1.jar;lib/asm-3.1.jar" net.sourceforge.pmd.PMD "c:/sezame/sezame/java/src/com" text  basic

Running every rule-set will result in a huge number of rule violations, most of which will be unimportant. As a general strategy unusedcode should be run to remove any unused locals and fields. Then basic for standard programming errors. Design will pick up design issues.

Basic Ruleset

For this test we ran the basic rule-set that evaluates a set of common best practices for Java coding. This table contains a selection of the results.

Project Empty Catch Empty If/While Boolean Instantiation Unecessary Return RedundantNested Ifs Useless Overides Call on null Object
Kiki 59 41 19 10 105 7 0
Tomcat 5.5 Catalina 93 15 47 1 59 6 0
Alfresco 2.9 Web Framework 22 7 1 0 6 5 2
Alfresco 2.9 Web Client 156 6 14 0 47 10 3
  • Empty Catch Block places where an exception is caught but nothing is done. The exception may be swallowed without being reported.
  • Empty if/while conditions is checked but there is no action.
  • Redundant Nested Ifs can be collapsed
  • Useless overrides are methods that just call the super() method

Code Size

This ruleset finds code size related issues. We have ignored the metrics already tested by CodeStyle (CCN, NPath etc).

Project Long Classes Long Methods Excessive Methods Excessive Fields Excessive Publics Long Param Lists
Kiki 12 89 39 35 26 1
Tomcat 5.5 Catalina 26 54 69 20 14 0
Alfresco 2.9 Web Framework 1 10 16 1 7 1
Alfresco 2.9 Web Client 14 119 46 29 13 7
  • Long classes and methods should be decomposed
  • A large number of public methods and attributes increases testing effort exponentially
  • Excessive fields indicate that the Class should be decomposed further
  • Classes with excessive methods should be refactored to be more fine grained
  • Long parameter lists indicate a new Class should be created to wrap parameters.

Design

This rule set identifies questionable design decisions. Only a selection of the problems are shown in this table – PMD is capable of identifying a wide range of design problems. These should be considered as warnings rather than errors. Some of behaviour may be correct in the particular context. For example a class with just static methods should be refactored to a Singleton except for abstract classes with subclasses. PMD cannot differentiate between these cases.

Project Switch Issues Refactor to Singleton Rethrown Exception Deeply Nested ifs String Comparisons Synchroniza-tion Localization
Kiki 27 15 94 93 47 84 107
Tomcat 5.5 Catalina 8 18 89 18 127 0 54
Alfresco 2.9 Web Framework 10 24 1 12 7 14 3
Alfresco 2.9 Web Client 26 9 18 108 18 5 11

The main issues with Kiki is are

  • eeply nested if statements which are hard to understand
  • Re-thrown exceptions where the original stack trace is lost – we recently fixed an issue directly related to this rule
  • Switch issues require a redesign of the code.
  • Synchronization issues where a whole class is locked for a synchronized code block
  • Localization issues for Date/String comparisons

Conclusion

The automated tools have identified a number of problems with the Kiki that should be investigated. The code base is generally better than the Tomcat Catalina core and comparable with the newer Alfresco Web Client.

The main issues for improvement are a decomposition of certain large classes and methods. In particular, to reduce method complexity. Better exception handling. A reduction in the amount of code synchronized on Object monitors.

The code base could benefit from better programming practices: application of patterns, understanding of object orientation issues such as programming to interfaces, method and class length, coupling of objects, composition rather than inheritance.

A typical case is that of Kiki struts classes. These have a high CCN but low NPath value. One paradigm for programming to the Struts package is to use a lot of switch statements. This increases the CCN. Switch statements can be refactored using the strategy pattern. However in this particular limited case a decision may be taken to ignore the CCN value, at least for existing code.

The three tools that were used in this exercise can be integrated into the build process to produce automated reports on code quality issues and trends. However teams should be wary about enforcing particular measures as programmers can find ways to code around particular metrics which may, in fact, increase code complexity and maintenance costs. For example an if/for/while statement can be extracted into another method but this has made the code more complex. In the end you can’t replace a human code review with an automated tool.

tech/java/java-code-metrics.txt · Last modified: 2008/05/08 15:57 by davidof
Recent changes RSS feed