Code comments are (mostly) a violation of DRY

php, javascript, programming, documentation, dry

16 Jan 2021

“Don’t repeat yourself” is such an important and widely taught concept in programming, that it has its own acronym (DRY).

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system

— Wikipedia

DRY is a very powerful idea, and avoids a lot of issues, like having to fix the same bug in multiple places, because the same code has been duplicated. Many voices say that it is often overused leading to a wrong abstraction, and I tend to agree with that statement.

duplication is far cheaper than the wrong abstraction

— Sandi Metz

People often overdo the DRY principle by building abstractions the first time a problem occurs. Instead, the problem should not be abstracted before it has occurred multiple times, since it might easily lead to a wrong abstraction that might not live up to its responsibilities and ultimately causing more problems than it solves. There are already some principles like WET (Write everything twice) and AHA (Avoid hasty abstractions) that kind of contradict the DRY principle respectively limit its applicability.

While I welcome the recognition of DRY overuse in many situations, I think this principle tends to be underused when it comes to code comments, which is the topic of this blog post.

Comments often violate the DRY principle

In their fantastic book The Pragmatic Programmer David Thomas and Andrew Hunt have coined the DRY principle and they have explicitly listed that comments are a possible violation of this principle. When people are learning to code, they often get taught that good code needs lots of comments, which is absolutely not true in my opinion. Very often good code that is self-explanatory does not need any comments at all and if it does, the comment should describe why it has been implemented this way instead of just repeating what the code already says.

My favourite stack overflow question of all time deals with code comments and lists some really good examples of how not to do it (especially if you skip the funny ones, which unfortunately for this blog post are the majority).

There is one very obvious example of a bad comment:

return 1; # returns 1

This is a very obvious violation of the DRY principle, whenever the return value changes, the comment also has to be updated. But there are other not as obvious examples:

$i++; // increase by one

This is only acceptable as an explanatory comment in teaching material, but it should never make its way to a production codebase.

The fall of doc blocks

Especially in languages with weak typing documentation comments are very popular. Since these languages often don’t allow to specify types in code, people have invented ways to move that information to comments, which allows for a better understanding of the code when reading it. The alternative would be to read the code and try to find out based on how these variables are used what type needs to be passed. Popular libraries include PHPDoc and JSDoc.

/**
 * Adds two numbers
 *
 * @param int $a
 * @param int $b
 */
function add($a, $b) {
    // ...
}

Especially the @param made a lot of sense because the code itself does not expose that information in a very accessible way. But recent PHP versions improved the type system a lot and also in JavaScript technologies allowing to add type information like TypeScript get a lot more popular (compared it to Flow in another article ), which makes these doc blocks obsolete in many cases.

function add(int $a, int $b) {
    // ...
}

As a bonus, these type systems will also yell at you if the type is not correctly set, something a pure comment cannot really help with. So adding another comment just with the type annotation would duplicate that information with no real value unless the parameter is explained in more detail.

Comments tend to be ignored by developers too

The reason comments exist is to allow adding additional information to the source code in natural language. Whatever is added as a comment will be ignored by the compiler or interpreter. Developers know that, so many of them learned to ignore them to a certain degree. That’s especially true if they have ever worked with a codebase that contained outdated comments. I am always very skeptical when reading comments and double-check with the actual implementation if the statement of the comment is true because I have experienced too often that the code didn’t behave as the comment suggested.

Again, there is an answer in the already mentioned Stack Overflow question:

/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}

That might look like a really stupid example because it is so terribly obvious. But I totally believe that something like this can easily happen in a real codebase. Since developers tend to ignore code as well, it is not very unlikely that they don’t update the comment when changing the code for some reason.

The worst thing is that the above example is not even that bad, because after a second you’ll realize that the comment is wrong. More detailed errors in a comment are much harder to recognize because more complex code usually justifies comments, but they are only helpful if they are actually up to date. If developers don’t read comments in the first place, they are at the same time much more likely to not update them if they change something, giving them again less reason to believe in them. I would say this is a vicious circle.

Comments should add something

As already mentioned more complex code often justifies comments, at least if they describe reasons or thoughts that are not obvious from just looking at the code. But if it is considered very strict, this is already a violation of the DRY principle, because the comment needs an update too when the code changes. But it might be worth the tradeoff if the code is hard to understand.

A rule I am following is that a comment should not just repeat what the code is already saying. Another phrasing would be to say that comment must always add values, that would be missing if they weren’t there. Just recently there was a discussion in Austria about some JavaScript code for a covid-19 vaccination forecast because the code just seemed to make up some numbers. But the more interesting part of that code was the usage of comments in it:

if(now.hour() < 6) {
    estimated = ausgeliefert; // hour is before 6am
} else if(now.hour() > 17) { // hour is after 6pm
    // ...
}

The first comment basically just repeats what the line before is doing. If we need to describe what the line now.hour() < 6 is doing, then we would basically have to comment every single line in our code. The same is partially true for the next comment. It was probably written to indicate that although the code says now.hour() > 17 does not include times like 17:01. It might be a little bit better than the first comment, but I still don’t think that it is worth the tradeoff of duplicating the same information in two different places.

Another tradeoff is the doc block of the add function from above. As long as the int type hints are not part of the code itself, it makes sense to add this information, because it is much easier to find out what types have to be passed this way. If that information is not there, it might be quite hard and even need some debugging to be sure about the types that the function accepts. I guess this improvement in developer experience justifies the potential risk of the comment being outdated. But as already said above, the latest PHP versions support the type hints in code, making the comments obsolete and guaranteeing the type of the variable.

Good naming can often replace comments at all

Finally, I want to show some code, that might get rid of some comments by writing it in a self-explanatory way. This makes the code more obvious to read and since it is real code and not just comments, it is much less likely that developers won’t read it.

Let’s start with the JavaScript example from the previous section. We’ve already said that the first comment is kind of unnecessary, so we can safely omit it. The second comment kind of had a point because it was explaining in a hidden way that the hour has to be after 18:00, and even though 17:01 is after 17:00, it would not be accepted by the if statement. Another way to make this more clear is to use the >= operator instead. It removes that ambiguity and reads nicer.

if(now.hour() < 6) {
    estimated = ausgeliefert;
} else if(now.hour() >= 18) {
    // ...
}

Now the code itself is more clear and the comments could be removed, just by using a different operator.

The other two examples I am showing are real-world examples I’ve run into during my work as a software engineer. The first one is an if statement, that tries to find out if a given node represents a document that is a new one or if it has already existed before. The logic to do so was a bit cryptic, so it made sense to use a comment to explain what was happening here:

// Check if the document is a new document
if (
    !$node->hasProperty(
        $this->propertyEncoder->encode(
            'system_localized',
            StructureSubscriber::STRUCTURE_TYPE_FIELD,
            $event->getLocale()
        )
    )
) {
    // ...
}

A very easy way to avoid this comment, is to store the result of the if statement in a separate variable and give it a meaningful name:

$isNewDocument = !$node->hasProperty(
    $this->propertyEncoder->encode(
        'system_localized',
        StructureSubscriber::STRUCTURE_TYPE_FIELD,
        $event->getLocale()
    )
);

if ($isNewDocument) {
    // ...
}

This avoids the need for the above comment, and developers cannot really skip the variable name, because it needs to be referenced later. The comment would have been written in gray by the IDE, kind of telling the developer that these lines don’t really matter. By skipping reading that part of the code, it is also more likely that the comment does not get updated when the code changes.

It would be even better if this check would be part of a class so that it could be called like $document->isNew(), but that’s beyond the scope of this article.

Another example I’ve stumbled upon is the following code:

// remove the "sec:role-" prefix
$roleId = \substr($property->getName(), 9);

The above code will remove the prefix sec:role- of a string to retrieve the ID based on the name of a property. The code works, but the number 9 is a so-called magic number, so it needs some explanation, so it somehow feels natural to just add a comment afterwards. Sometimes constants are used to give such magic constants a name that better explains what it should be doing. But in this very specific example, there is also a different solution.

$roleId = \str_replace('sec:role-', '', $property->getName());

This example does not make use of code that counts the number of characters, but we are replacing the sec:role- prefix with an empty string. This way it is clear that the sec:role- prefix is removed, without the need of a comment violating the DRY principle.

I really like finding ways to write code in a way that better explains itself. Very often these changes are really subtle, but they change the way code is read fundamentally and avoiding comments altogether. I hope that these examples helped you to find some motivation to do so too!