Test Smells and Test Code Quality Metrics
09 Dec 2009
The major highlight at XP Day 2009 was Mark Striebeck’s talk on unit testing practices at Google. What makes a good test depends on experience, skill and school of thought. I had to agree when he said that developers can be almost religious about what makes a good test. This led them to solve the problem the Google way, by gathering data. Let the data speak.
He went on to describe the metrics they were collecting on tests and test code. A test that has never failed is likely to be a bad test. If a failing test was fixed by changing the test itself, that is also a sign of a bad test. If the production code was fixed to make the test pass, it is probably a good test.
This got me thinking. I generally haven’t gathered metrics on test code, though we have a pretty good metrics dashboard for production code. What metrics can I gather on test code?
Metrics on test code should also focus on readability. Test methods can grow a little, but not too much; my opinion is that a test method with more than 20 lines is too big.
Tests should be concise, and the assert should be obvious. Some code duplication is fine if it makes the test more readable. This is all well and good, but how can I turn these into metrics? The only way to judge them is to eyeball the tests, and opinions differ.
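As a rough sketch of what I mean (the Order and DiscountCalculator classes are made up for illustration), a readable test looks something like this:

[Test]
public void Orders_over_one_hundred_pounds_get_a_ten_percent_discount()
{
    // Arrange: one object, one obvious input
    var order = new Order(150m);

    // Act
    var price = new DiscountCalculator().PriceFor(order);

    // Assert: the expectation is stated in a single, obvious line
    Assert.AreEqual(135m, price);
}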
However, there are ways to measure what a test should not be. These are test smells, and they are described in xUnit Test Patterns.
I’ve listed a few test smells below, along with NDepend CQL queries that find them. These can be run as part of the build process and flagged up automatically.
Large Test Methods
Large test methods are a chore to read; tests should be written as simply as possible. They also point to too many responsibilities and dependencies in the code under test, since most of the test code ends up being setup.
SELECT METHODS WHERE HasAttribute "NUnit.Framework.TestAttribute" AND NbLinesOfCode > 20
Large setup methods
When several tests exercise the same code, we tend to use a common setup method to make each test more readable. What happens is that more and more code gets moved into the common setup method. We go blind to this after a while, and all the dependencies of the test end up hidden away. If you do have [SetUp] methods, keep them small.
SELECT METHODS WHERE HasAttribute "NUnit.Framework.SetUpAttribute" AND NbLinesOfCode > 10
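A sketch of how this creeps in, with made-up classes; each test reads cleanly on its own, but everything it depends on is buried in the setup method:

private FakeCustomerRepository repository;
private FakeMailer mailer;
private CustomerService service;

[SetUp]
public void SetUp()
{
    // The dependencies keep accumulating here...
    repository = new FakeCustomerRepository();
    repository.Add(new Customer("Alice"));
    mailer = new FakeMailer();
    service = new CustomerService(repository, mailer);
}

[Test]
public void Sends_a_welcome_mail_to_new_customers()
{
    // ...so reading this test alone tells you very little about
    // the state it relies on.
    service.Register("Bob");
    Assert.AreEqual("Bob", mailer.LastRecipient);
}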
Deep inheritance trees in test fixtures
Again, common test code is moved up to a base class, and that base class is used across many tests. Then more base classes are created. This creates tight coupling between test classes, which makes tests harder to change. Low coupling and high cohesion apply to test code as well. Make each unit test class as independent as possible.
SELECT TYPES WHERE HasAttribute "NUnit.Framework.TestFixtureAttribute" AND DepthOfInheritance > 2
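One way out is to put the shared code in a small helper object that each fixture uses, rather than a base class it inherits from. A sketch, with invented names:

// Instead of an inheritance chain like
// CustomerServiceTests -> ServiceTestBase -> DatabaseTestBase
[TestFixture]
public class CustomerServiceTests
{
    private TestDatabase database;   // a plain helper object, not a base class

    [SetUp]
    public void SetUp()
    {
        database = TestDatabase.CreateEmpty();
    }

    [Test]
    public void Saves_new_customers()
    {
        var service = new CustomerService(database.CustomerRepository);
        service.Register("Alice");
        Assert.AreEqual(1, database.CustomerRepository.Count);
    }
}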
Test fixture setup
TestFixtureSetUp is bad. It is run once before all the tests in a fixture, which leads to fragile tests because they inadvertently end up sharing state. Use SetUp instead.
SELECT METHODS WHERE HasAttribute "NUnit.Framework.TestFixtureSetUpAttribute"
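A sketch of how the shared state bites (ShoppingCart is invented for the example); with [SetUp] instead, each test would get a fresh cart and the order would not matter:

private ShoppingCart cart;

[TestFixtureSetUp]
public void FixtureSetUp()
{
    // Runs once: both tests below share this cart instance.
    cart = new ShoppingCart();
}

[Test]
public void Adding_an_item_increases_the_count()
{
    cart.Add("book");
    Assert.AreEqual(1, cart.Count);
}

[Test]
public void A_new_cart_is_empty()
{
    // Passes or fails depending on whether the other test ran first.
    Assert.AreEqual(0, cart.Count);
}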
Tests that fail when they are run in a different order
The xUnit test runner helps find these by randomizing the order in which tests are run.
Ignored tests
Ignored tests are like commented-out code: dead code that doesn’t do anything. Either fix them or delete them.
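These should be straightforward to flag with a similar query (untested, but it follows the same pattern as the ones above):

SELECT METHODS WHERE HasAttribute "NUnit.Framework.IgnoreAttribute"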
I have yet to find a way of detecting duplicated tests, shared state in tests, and multiple asserts in a test. What other ways are there to find test smells?