Clean code. Make it easier to read and understand

21.11.2014 - Lesezeit ca. 16 Min - 3400 Wörter

Refactoring and cleaning

In my first genetics article , I introduced you to the basic terminology and brought some examples. In the second one I implemented some of the needed feaures using TDD. Now I would like to clean a little bit the implementation using refactoring.

Cleaning crew. Incoming

The first one will be to remove the magic numbers. As you have seen, I have used the numbers 0.25, 0.5 and 1.0 all over the code. They are the probability of an outcome, respectively a twenty five percent probability, a fifty percent probability and a hundred percent probability. Also 0.001 can be seen in the tests, meaning the allowed delta by comparing doubles. It is better to use constants for these values.

I will use the extract constant refactoring. It allows me to define a new constant and automatically replaces every occurrence of the magic number with the defined constant. I will do this for all the magic number in the code.

work, work, work, …

After I did this stuff, I realized that I have defined the same constants in many files. I don’t want it this way. I create a constants class for the project and avoid to repeat myself. In this class I define only the constants that are used in several other classes.

I also did some renaming work and a little bit Java stuff as declaring final, etc …

Of course, I ran my test suite repeatedly to ensure that everything still works.

Here is the constants class:

 1public final class Constants {
 2
 3    public static final double FIFTYPERCENT = 0.5;
 4    public static final double HUNDREDPERCENT = 1.0;
 5    public static final double TWENTYFIVEPERCENT = 0.25;
 6
 7    private Constants() {
 8
 9    }
10
11}

I show you also two of the test, so that you can see the changes:

 1private final void assertProbability(final double expected,
 2    final Double actual) {
 3    assertEquals(expected, actual.doubleValue(), ALLOWEDDELTA);
 4
 5}
 6
 7@Test
 8public void combineDominantHomozygousAndHeterozygousReturns50PercentDominantHomozygous() {
 9    final AllelePair paternalAllelePair = new DominantHomozygous("A");
10    final AllelePair maternalAllelePair = new Heterozygous("A");
11    
12    final Map<String, Double> probabilities = paternalAllelePair
13        .combineWith(maternalAllelePair);
14    
15    assertProbability(Constants.FIFTYPERCENT, probabilities.get("AA"));
16
17}
18
19@Test
20public void combineDominantHomozygousAndRecessiveHomozygousReturns100PercentHeterozygous() {
21    final AllelePair paternalAllelePair = new DominantHomozygous("A");
22    final AllelePair maternalAllelePair = new RecessiveHomozygous("A");
23    
24    final Map<String, Double> probabilities = paternalAllelePair
25        .combineWith(maternalAllelePair);
26    
27    assertProbability(Constants.HUNDREDPERCENT, probabilities.get("Aa"));
28
29}

And also one of the classes that inherits from AllelePair:

 1public class DominantHomozygous extends AllelePairImpl implements AllelePair {
 2
 3    public DominantHomozygous(final String allele) {
 4    super(allele.toUpperCase(), allele.toUpperCase());
 5    }
 6
 7    @Override
 8    public Map<String, Double> combineWith(final AllelePair otherAllele) {
 9    final Map<String, Double> genotypeProbablilities = new HashMap<String, Double>();
10    if (otherAllele.isRecessiveHomozygous()) {
11        genotypeProbablilities.put(
12            getFirstAllele() + otherAllele.getSecondAllele(),
13            Constants.HUNDREDPERCENT);
14    } else {
15        if (otherAllele.isHeterozygous()) {
16        genotypeProbablilities.put(
17            getFirstAllele() + getSecondAllele(),
18            Constants.FIFTYPERCENT);
19        genotypeProbablilities.put(
20            getFirstAllele() + otherAllele.getSecondAllele(),
21            Constants.FIFTYPERCENT);
22
23        } else {
24        genotypeProbablilities.put(
25            getFirstAllele() + getSecondAllele(),
26            Constants.HUNDREDPERCENT);
27        }
28    }
29
30    return genotypeProbablilities;
31    }
32
33    @Override
34    public boolean isDominantHomozygous() {
35    return true;
36    }
37
38}

Outsiders see more than they need. That’s not nice.

I want to hide the concrete implementations of AllelePair. DominantHomozygous, RecessiveHomozygous and Heterozygous are implementation details, which I do not want to expose. The easiest way I can think to do that is to define a factory to create the objects . Let me start with DominantHomozygous.

1public final class AllelePairFactory {
2    public static AllelePair createDominantHomozygous(final String allele) {
3    return new DominantHomozygous(allele);
4    }
5    
6    private AllelePairFactory() {
7    
8    }
9}

As you can see, the static function creates a new DominantHomozygous object and returns it as an AllelePair. Now I replace all DominantHomozygous object creations outside the factory by calling the factory method. You can see here an example of the replacement.

 1@Test
 2public void combineDominantHomozygousAndHeterozygousReturns50PercentDominantHomozygous() {
 3    final AllelePair paternalAllelePair = AllelePairFactory
 4            .createDominantHomozygous("A");
 5    final AllelePair maternalAllelePair = new Heterozygous("A");
 6    
 7    final Map<String, Double> probabilities = paternalAllelePair
 8        .combineWith(maternalAllelePair);
 9    
10    assertProbability(Constants.FIFTYPERCENT, probabilities.get("AA"));
11
12}

Doing so I can make the DominantHomozygous class package-visible instead of public.

1class DominantHomozygous extends AllelePairImpl implements AllelePair {
2
3...
4
5}

Next, I do the same for the two other classes. Then I have successfully “hidden” the three subclasses of AllelePair. I’ll save you the time to read the changed code. Exactly the same changes as in the DominantHomozygous example above: I add a static function in the factory, then replace the usages of the class outside the factory and last I reduce the visibility of the class.

Testing, testing, testing

I need to write a couple of test to improve the certainty with the implentation. Until now, I’ve only tested the positive cases. That’s not enough certainty. I should test the boundaries. I should also consider that I do not get more results, as I expect.

The Boundaries

What do I expect when creating a AllelePair if an empty string or null is passed as a parameter? I can not create an AllelePair with such a parameter, so I should expect an exception.

I need to add new functionality to achieve that and of course the appropriate tests, because that is supposed to happen on the TDD way.

Test Expected
Create a dominant homozygous with a null string IllegalArgumentException
Create a dominant homozygous with a empty string IllegalArgumentException
Create a recessive homozygous with a null string IllegalArgumentException
Create a recessive homozygous with a empty string IllegalArgumentException
Create an heterozygous with a null string IllegalArgumentException
Create an heterozygous with a empty string IllegalArgumentException

Due to the previous refactorings, there is fortunately a single place in the code where the AllelePair objects are created. Exactly, the AllelePairFactory.

So first the test:

1public final class AllelePairFactoryTest {
2    
3    @Test(expected = IllegalArgumentException.class)
4    public void createDominantHomozygousWithNullThrowsArgumentNullException() {
5    AllelePairFactory.createDominantHomozygous(null);
6    }
7}

It fails as expected. Now I implement the functionality. Just checking for null and throwing the exception as needed.

1public static AllelePair createDominantHomozygous(final String allele) {
2    if (allele == null) {
3        throw new IllegalArgumentException("Null reference.");
4    }
5    return new DominantHomozygous(allele);
6}

Test is green.

The same way, I write the test for passing an empty string as parameter and then implement the check and throw the exception.

 1@Test(expected = IllegalArgumentException.class)
 2public void createDominantHomozygousWithEmptyStringThrowsArgumentNullException() {
 3    AllelePairFactory.createDominantHomozygous("");
 4}
 5
 6public static AllelePair createDominantHomozygous(final String allele) {
 7    if (allele == null || allele.equals("")) {
 8        throw new IllegalArgumentException("Null reference.");
 9    }
10    return new DominantHomozygous(allele);
11}

I must do the same with the other functions. I think it’s better when I extract the check in a separate function.

1private static void throwExceptionIfNullOrEmpty(final String allele) {
2    if (allele == null || allele.equals("")) {
3        throw new IllegalArgumentException("Allele is null or empty string.");
4    }
5}

So I write the other four test and extend the functions in the factory to pass the tests.

Do I get only the expected results?

I also want to check that I get only the expected results and no more ones. These are extra checks so that we can have a higher certainty about the correctness of the implementation. I have already checked that the correct values ​​are returned. Now I check also that no additional combinations are returned. The simplest method is to check that the return has only so much different kinds of combinations as expected.

The additional tests are:

Test Expected
Both parents are dominant homozygous Calculation returns only one item.
Both parents are recessive homozygous Calculation returns only one item.
One parent is dominant homozygous and the other is heterozygous Calculation returns two items.
One parent is recessive homozygous and the other is heterozygous Calculation returns two items.
One parent is dominant homozygous and the other is recessive homozygous Calculation returns only one item.
Both parents are heterozygous Calculation returns three items.

Here are some of the tests:

 1@Test
 2public void combineTwoRecessiveHomozygousReturnsOnlyOneItem() {
 3    final AllelePair paternalAllelePair = AllelePairFactory
 4        .createRecessiveHomozygous("A");
 5    final AllelePair maternalAllelePair = AllelePairFactory
 6        .createRecessiveHomozygous("A");
 7
 8    final Map<String, Double> probabilities = paternalAllelePair
 9        .combineWith(maternalAllelePair);
10
11    assertEquals(ONEITEMAVAILABLE, probabilities.size());
12
13}
14@Test
15public void combineDominantHomozygousAndHeterozygousReturnsTwoItems() {
16    final AllelePair paternalAllelePair = AllelePairFactory
17        .createDominantHomozygous("A");
18    final AllelePair maternalAllelePair = AllelePairFactory
19        .createHeterozygous("A");
20
21    final Map<String, Double> probabilities = paternalAllelePair
22        .combineWith(maternalAllelePair);
23
24    assertEquals(TWOITEMSAVAILABLE, probabilities.size());
25
26}
27@Test
28public void combineHeterozygousAndHeterozygousReturnsThreeItems() {
29    final AllelePair paternalAllelePair = AllelePairFactory
30        .createHeterozygous("A");
31    final AllelePair maternalAllelePair = AllelePairFactory
32        .createHeterozygous("A");
33
34    final Map<String, Double> probabilities = paternalAllelePair
35        .combineWith(maternalAllelePair);
36
37    assertEquals(THREEITEMSAVAILABLE, probabilities.size());
38
39}

These last tests are not necessary according to TDD because they are green immediately. This means they do not serve the further development. This is clear to me but I wanted to have them, because they are still useful to verify the correctness. The test list has grown. Here it is again, updated with the new tests:

Test Expected
Both parents are dominant homozygous All hatchlings are dominant homozygous
Both parents are dominant homozygous Calculation returns only one item.
Both parents are recessive homozygous All hatchlings are recessive homozygous
Both parents are recessive homozygous Calculation returns only one item.
One parent is dominant homozygous and the other is heterozygous 50% are dominant homozygous, the other 50% are heterozygous.
One parent is dominant homozygous and the other is heterozygous Calculation returns two items.
One parent is recessive homozygous and the other is heterozygous 50% are recessive homozygous, the other 50% heterozygous.
One parent is recessive homozygous and the other is heterozygous Calculation returns two items.
One parent is dominant homozygous and the other is recessive homozygous All hatchlings are heterozygous.
One parent is dominant homozygous and the other is recessive homozygous Calculation returns only one item.
Both parents are heterozygous 25% are dominant homozygous, 50% are heterozygous and 25% are recessive homozygous.
Both parents are heterozygous Calculation returns three items.
Create a dominant homozygous with a null string IllegalArgumentException
Create a dominant homozygous with a empty string IllegalArgumentException
Create a recessive homozygous with a null string IllegalArgumentException
Create a recessive homozygous with a empty string IllegalArgumentException
Create an heterozygous with a null string IllegalArgumentException
Create an heterozygous with a empty string IllegalArgumentException

Now I have all the test green and need a pause to think about how to do the change that is still open:

How can I make the combineWith function more readable and cleaner?

After careful consideration, I have decided to make some design changes. These changes will hopefully allow me to achieve a better implementation of the combineWith function.

My first idea is that both alleles of a pair are always represented by the same letter. Sometimes uppercase and sometimes lowercase, but the same one. Currently, I hold both alleles in different variables. This is actually unnecessary and a violation against the DRY principle. My first step will be to remove this duplication. By the way, I have to make sure that my tests, during this whole change process, continue to work.

The first thing I’ll do is to add a locus variable and provide a getter function for it, I’ll also initialize it in the constructor. I’ll do that in the AllelePairImpl class.

 1abstract class AllelePairImpl implements AllelePair {
 2
 3    private final String firstAllele;
 4    private final String secondAllele;
 5    private final String locus;
 6
 7    public AllelePairImpl(final String theFirstAllele,
 8        final String theSecondAllele) {
 9    firstAllele = theFirstAllele;
10    secondAllele = theSecondAllele;
11    locus = theFirstAllele.toUpperCase();
12    }
13
14    protected String getLocus() {
15    return locus;
16    }
17    
18    ...
19}

After this little change all my tests are still running. So let me go on. Now I’ll make the functions getFirstAllele() and getSecondAllele() abstract and define them in the child classes using the getLocus() function.

 1abstract class AllelePairImpl implements AllelePair {
 2    ...
 3    @Override
 4    public abstract String getFirstAllele();
 5
 6    @Override
 7    public abstract String getSecondAllele();
 8    ...
 9}
10
11class DominantHomozygous extends AllelePairImpl implements AllelePair {
12    ...
13    @Override
14    public String getFirstAllele() {
15    return getLocus();
16    }
17
18    @Override
19    public String getSecondAllele() {
20    return getLocus();
21    }
22
23}
24
25class RecessiveHomozygous extends AllelePairImpl implements AllelePair {
26    ...
27    @Override
28    public String getFirstAllele() {
29    return getLocus().toLowerCase();
30    }
31
32    @Override
33    public String getSecondAllele() {
34    return getLocus().toLowerCase();
35    }
36}
37
38class Heterozygous extends AllelePairImpl implements AllelePair {
39    ...
40    @Override
41    public String getFirstAllele() {
42    return getLocus();
43    }
44
45    @Override
46    public String getSecondAllele() {
47    return getLocus().toLowerCase();
48    }
49}

Just a simple change. Depending on which child class is used, we get the locus as uppercase or lowercase.

Tests are still green.

After these changes, the variables firstAllele and secondAllele have become superfluous in AllelePairImpl and can be removed. In addition to that, I also can customize the signature of the constructor, so that only one parameter is used. By using the appropriate refactoring, the subclasses are adjusted automatically. Then I adjust the parameter names so that they are more informative.

 1abstract class AllelePairImpl implements AllelePair {
 2
 3    private final String locus;
 4
 5    public AllelePairImpl(final String theLocus) {
 6    locus = theLocus.toUpperCase();
 7    }
 8    ...
 9}
10
11class DominantHomozygous extends AllelePairImpl implements AllelePair {
12
13    public DominantHomozygous(final String locus) {
14    super(locus);
15    }
16    ...
17}
18
19class RecessiveHomozygous extends AllelePairImpl implements AllelePair {
20
21    public RecessiveHomozygous(final String locus) {
22    super(locus);
23    }
24    ...
25}
26
27class Heterozygous extends AllelePairImpl implements AllelePair {
28    public Heterozygous(final String locus) {
29    super(locus);
30    }
31    ...
32}

I have removed the redundant stuff and all my tests run to green. Now I can deal with the combineWith function. The is … functions are used to distinguish with which type of AllelePair should be combined. This is actually not necessary, because I can figure out the type of the AllelePair, based on the type of the variable. So I can use instanceof to determine the Art of AllelePair I have to combine with.

Here the combineWith function in the DominantHomozigous class:

 1@Override
 2public Map<String, Double> combineWith(final AllelePair otherAllele) {
 3final Map<String, Double> genotypeProbablilities = new HashMap<String, Double>();
 4if (otherAllele instanceof RecessiveHomozygous) {
 5    genotypeProbablilities.put(
 6        getFirstAllele() + otherAllele.getSecondAllele(),
 7        Constants.HUNDREDPERCENT);
 8} else {
 9    if (otherAllele instanceof Heterozygous) {
10    genotypeProbablilities.put(
11        getFirstAllele() + getSecondAllele(),
12        Constants.FIFTYPERCENT);
13    genotypeProbablilities.put(
14        getFirstAllele() + otherAllele.getSecondAllele(),
15        Constants.FIFTYPERCENT);
16
17    } else {
18    genotypeProbablilities.put(
19        getFirstAllele() + getSecondAllele(),
20        Constants.HUNDREDPERCENT);
21    }
22}

As soon as I perform the changes to the other two classes ( RecessiveHomozygous and Heterozygous), I can remove the is… functions, because I do not need it anymore. So I remove them from the interface, from the AlleleImpl class and from the subclasses.

With these changes, I have reduced the complexity a little bit further. But I can do even more.

Until now, I have used a string Double Hash Table to represent the probability of the potential offspring. The string is the representation of the possible allele. I could use the AllelePair class instead of this string.

In order to do this the DominantHomozygous, RecessiveHomozygous and Heterozygous must be accessed from outside the package. So I change the classes to public again (In a previous article I made them package visible).

The interface changes to:

1public interface AllelePair {
2
3    Map<Class<?>, Double> combineWith(AllelePair otherAllele);
4
5    String getFirstAllele();
6
7    String getSecondAllele();
8}

The AllelePairImpl class declares the new abstract function.

1@Override
2public abstract Map<Class<?>, Double> combineWith(AllelePair otherAllele);

And the subclasses are again public and implement the refactored combineWith function. As an example, I will show you the function in the DominantHomozygous class:

 1@Override
 2public Map<Class<?>, Double> combineWith(final AllelePair otherAllele) {
 3    final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
 4    if (otherAllele instanceof RecessiveHomozygous) {
 5        probabilities.put(Heterozygous.class, Constants.HUNDREDPERCENT);
 6    } else {
 7        if (otherAllele instanceof Heterozygous) {
 8            probabilities.put(DominantHomozygous.class,
 9                    Constants.FIFTYPERCENT);
10            probabilities.put(Heterozygous.class, Constants.FIFTYPERCENT);
11
12        } else {
13            probabilities.put(DominantHomozygous.class,
14                    Constants.HUNDREDPERCENT);
15        }
16    }
17
18    return probabilities;
19}

Of course, all these changes must be done carefully, otherwise the tests do not work anymore.

I do now a few cosmetic changes, extract a few functions and change the structure a little bit, so the combineWith function is more readable.

Again , I will show you the changes on the basis of the DominantHomozygous class, but obviously I make the changes on the three subclasses of AllelePair

 1@Override
 2public Map<Class<?>, Double> combineWith(final AllelePair otherAllele) {
 3    if (otherAllele instanceof RecessiveHomozygous) {
 4        return combinationsWithRecessiveHomozygous();
 5    }
 6    if (otherAllele instanceof Heterozygous) {
 7        return combinationsWithHeterozygous();
 8    }
 9    if (otherAllele instanceof DominantHomozygous) {
10        return combinationsWithDominantHomozygous();
11    }
12
13    return null;
14}
15
16private Map<Class<?>, Double> combinationsWithDominantHomozygous() {
17    final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
18    probabilities.put(DominantHomozygous.class, Constants.HUNDREDPERCENT);
19    return probabilities;
20}
21
22private Map<Class<?>, Double> combinationsWithRecessiveHomozygous() {
23    final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
24    probabilities.put(Heterozygous.class, Constants.HUNDREDPERCENT);
25    return probabilities;
26}
27
28private Map<Class<?>, Double> combinationsWithHeterozygous() {
29    final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
30    probabilities.put(DominantHomozygous.class, Constants.FIFTYPERCENT);
31    probabilities.put(Heterozygous.class, Constants.FIFTYPERCENT);
32    return probabilities;
33}

The test are still running to green. And I now see that the combineWith functions of the three possible AllelePair subclasses have become the same. I can avoid this duplication by refactoring the base class. I will define the combineWith function as final in it and I also will add the needed three abstract functions.

Here, the last versions of AllelePairImpl and DominantHomozygous

 1abstract class AllelePairImpl implements AllelePair {
 2
 3    private final String locus;
 4
 5    public AllelePairImpl(final String theLocus) {
 6        locus = theLocus.toUpperCase();
 7    }
 8
 9    protected String getLocus() {
10        return locus;
11    }
12
13    @Override
14    public final Map<Class<?>, Double> combineWith(final AllelePair otherAllele) {
15        if (otherAllele instanceof RecessiveHomozygous) {
16            return combinationsWithRecessiveHomozygous();
17        }
18        if (otherAllele instanceof Heterozygous) {
19            return combinationsWithHeterozygous();
20        }
21        if (otherAllele instanceof DominantHomozygous) {
22            return combinationsWithDominantHomozygous();
23        }
24
25        return null;
26    }
27
28    protected abstract Map<Class<?>, Double> combinationsWithRecessiveHomozygous();
29
30    protected abstract Map<Class<?>, Double> combinationsWithHeterozygous();
31
32    protected abstract Map<Class<?>, Double> combinationsWithDominantHomozygous();
33
34    @Override
35    public abstract String getFirstAllele();
36
37    @Override
38    public abstract String getSecondAllele();
39
40}
41
42public class DominantHomozygous extends AllelePairImpl implements AllelePair {
43
44    public DominantHomozygous(final String locus) {
45        super(locus);
46    }
47
48    @Override
49    protected Map<Class<?>, Double> combinationsWithDominantHomozygous() {
50        final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
51        probabilities.put(DominantHomozygous.class, Constants.HUNDREDPERCENT);
52        return probabilities;
53    }
54
55    @Override
56    protected Map<Class<?>, Double> combinationsWithRecessiveHomozygous() {
57        final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
58        probabilities.put(Heterozygous.class, Constants.HUNDREDPERCENT);
59        return probabilities;
60    }
61
62    @Override
63    protected Map<Class<?>, Double> combinationsWithHeterozygous() {
64        final Map<Class<?>, Double> probabilities = new HashMap<Class<?>, Double>();
65        probabilities.put(DominantHomozygous.class, Constants.FIFTYPERCENT);
66        probabilities.put(Heterozygous.class, Constants.FIFTYPERCENT);
67        return probabilities;
68    }
69
70    @Override
71    public String getFirstAllele() {
72        return getLocus();
73    }
74
75    @Override
76    public String getSecondAllele() {
77        return getLocus();
78    }
79}

Obviously my tests are still running. :-)

I now realize that the getFirstAllele and getSecondAllele functions aren’t used anywhere. So I can remove them from interface and from the implementations. As they aren’t used, the tests still run after removing it.

The actual implementation is not complete. You can see that the AllelePair are not able to handle different alleles per locus. But for the first approximation this is good enough for me.

Although there is much to do until the library is finished, I hope I could provide an impression of what TDD is. Of course this is not the only way to solve the problem and I’m not saying it is the best.


If you enjoyed reading the article, I would be glad if you share it. :-)