Documentation of software, particularly of code, is something most developers lack enthusiasm for. We know we should write it, yet we are reluctant to, and when we do write it we do so sporadically. Typically a poor job is done in a rush at the end of a project. Soon after, when we start work on a new project, we lament the lack of it!

Types of documentation

Technical documentation of software can be done at multiple levels, from the high-level architecture and design down to single-line code comments.

Code commenting

From a developer’s perspective, the lowest level form of documentation is comments in code. Sandcastle/JavaDoc/RDoc style comments are barely any higher level than this.

This type of documentation is often useful when reading the code to find out what it does or is supposed to do. However, it is notorious for being inaccurate or out of date. The compiler does not check comments, of course! Personally I read code comments as a last resort. I prefer to read the code itself, as I trust it.

Years ago I would typically turn on the compiler feature that raises a warning for any missing Sandcastle documentation on a public class or method. I found myself using GhostDoc a lot to generate the comments and using pragmas to stop missing-documentation warnings on generated code. I eventually realised that many or most of the comments we were putting in were worthless and unnecessary, and cluttered the code. I became accustomed to skipping over them when scanning or reading code. Once readers skip comments by habit, the one really helpful comment you write might as well be invisible. I still write XML code documentation, but only where it adds value. Its main purpose for me is Visual Studio’s IntelliSense.

Time and time again, when I go to write a code comment, I find that I should have expressed my intent better in the code itself through better naming, better code structure, etc. I will often write a comment, then rearrange the code so that the comment becomes redundant, and then remove the comment I just wrote!
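A minimal sketch of that comment-then-refactor cycle, in Python. The invoice rule, the 30-day term and all of the names are hypothetical and chosen only for illustration; both versions are meant to behave identically.

```python
from datetime import date, timedelta


# Before: the comment carries the intent that the names fail to express.
def check(d, today):
    # overdue if more than 30 days have passed since the invoice date
    return (today - d).days > 30


# After: the names carry the intent, so the comment is no longer needed.
PAYMENT_TERM = timedelta(days=30)


def is_overdue(invoice_date: date, today: date) -> bool:
    return today - invoice_date > PAYMENT_TERM
```

In the second version there is nothing left for a comment to say, and nothing that can drift out of date when the payment term changes.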

Tests as documentation

Test Driven Development advocates emphasise “tests as documentation”. If written well, tests are very effective as documentation because they are both guaranteed to be up to date (otherwise they are failing tests that break the build) and they express the intent of code from the outside. Of course writing code test-first tends to improve the quality and hence readability of the code in the first place.
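As a rough illustration of tests that read as documentation, here is a small sketch using Python's standard unittest module. The apply_discount function and its loyalty rule are invented for this example; the point is that the test names alone describe the behaviour from the outside.

```python
import unittest


def apply_discount(total: float, loyalty_years: int) -> float:
    """Hypothetical pricing rule, used only for illustration."""
    if total < 0:
        raise ValueError("total must not be negative")
    rate = min(loyalty_years, 5) * 0.02  # 2% per year, capped at 5 years
    return round(total * (1 - rate), 2)


class ApplyDiscountBehaviour(unittest.TestCase):
    def test_new_customers_pay_full_price(self):
        self.assertEqual(apply_discount(100.0, loyalty_years=0), 100.0)

    def test_each_loyalty_year_earns_two_percent(self):
        self.assertEqual(apply_discount(100.0, loyalty_years=3), 94.0)

    def test_discount_is_capped_at_five_years(self):
        self.assertEqual(apply_discount(100.0, loyalty_years=10), 90.0)

    def test_negative_totals_are_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(-1.0, loyalty_years=1)
```

Saved in a test module, this can be run with python -m unittest. If the rule changes and a test is not updated, the build fails, which is exactly what keeps this form of documentation honest.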

Automated acceptance tests can document behaviour at a higher level, describing what the system does (not just what it did when the acceptance tests were written).

Design documentation

Coming into a new code base, if there was only one type of documentation I could have, I would choose high-level design documentation. I can read the code but it will probably take me weeks or months to get a feel for how it all holds together. Until then I am not very productive in that code base. Even after I have been working in a code base for a long time, I still may not be aware of the core design philosophies and may start making changes that compromise those philosophies, corrupting the structural integrity of the code base.

Unfortunately, in my experience of joining existing projects, code comments are most likely the only form of documentation available. Design documentation of the type I would like is rare. If there is any it is likely out of date (although this is still better than nothing). Despite this, organisations typically have a policy around code commenting but little or no mention of higher level documentation.

High level or design documentation should emphasise the thought processes and philosophies that went into the design. Why did we do it this way? Why did we not just do this? Where were we intending to take the design in the future? It is this sort of documentation that has the potential to keep a piece of software maintainable for a long period despite having gone through many hands. Without it, it is almost inevitable that the code will morph into something completely lacking cohesion (e.g. where a developer makes a “harmless” change to get something done that breaks a key design philosophy, then the floodgates open as the code base descends towards a ball of mud).

Diagrams, though time-consuming and awkward to keep up to date, can make a big difference. Even if it is just a photo taken of a whiteboard drawing, it is better than just text.

How-to documentation

Sometimes tutorial style documentation is needed to walk through a process that is done on a regular basis. It does not explain the why or the design philosophies but just how to go about something. Commonly screenshots are used to make it easier to follow.

Having been both the recipient and author of how-to documentation, the main thing I would emphasise is testing. Once you have authored some how-to documentation, make sure you get at least one person with limited knowledge of the area in question to run through it, making notes and asking questions where they get stuck. This feedback should go back in to correct errors, ambiguities, omitted steps, etc. Don’t unleash this how-to until you have had someone walk through it without issues. I don’t know how many frustrating developer hours I’ve seen wasted due to errors in how-to documentation.

Ideally a process that is often repeated should be automated. The automation should be written in a simple, highly readable programming language and style. This way the documentation is largely the script itself and, like automated tests, is verified to be up to date by breaking the build if it is not. Automation also all but removes the human error element.
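To make this concrete, here is a sketch of a how-to turned into a script. It assumes a hypothetical project layout (an app folder and an INSTALL.txt under the source directory); the function name and the steps are illustrative rather than a recommended release process. Each commented step stands in for what would otherwise be a numbered line in a how-to document.

```python
import shutil
from pathlib import Path


def build_release_package(source_dir: Path, release_dir: Path) -> Path:
    """Assemble a release folder and zip it (hypothetical layout)."""
    # Step 1: start from an empty release folder so stale files never ship.
    if release_dir.exists():
        shutil.rmtree(release_dir)
    release_dir.mkdir(parents=True)

    # Step 2: copy the application files.
    shutil.copytree(source_dir / "app", release_dir / "app")

    # Step 3: include the installation notes as plain text, since a word
    # processor may not be installed in the target environment.
    shutil.copy(source_dir / "INSTALL.txt", release_dir / "INSTALL.txt")

    # Step 4: zip the folder; the archive is created next to release_dir.
    archive = shutil.make_archive(str(release_dir), "zip", root_dir=release_dir)
    return Path(archive)
```

If a step stops working, the script fails where a written how-to would silently go stale.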

Discoverability

It is no use having documentation if it is not found when it is needed! Improving discoverability requires thinking about the intended audience and where they will expect to find the documentation.

The best approach, where possible, is putting it in a place where they can’t miss it!

A developer is inevitably going to be looking at the code, so putting the high-level documentation in a document somewhere obscure on the network or wiki is not great for discoverability. Best to put that document in the same folder as the source code itself! Alternatively, a document with links to the full documents on a wiki or elsewhere can also be useful, bearing in mind that code-related documentation is bound to the code and should ideally be versioned with it.

Installation documentation should be included in the release package, ideally in text format because Word, for example, is not necessarily installed in the target environment.

Keeping it updated

Any documentation that is executable (i.e. in the form of automated tests) should by definition be up to date. Everything else requires discipline to keep up, because automated tools cannot check that your code-level documentation is still accurate. One way to instil that discipline is to build documentation updates into your “definition of done”.

For higher-level design documentation it is helpful to keep the documentation close to the code, ideally in source control. When a new chunk of code is checked in that outdates a section of the design documentation (or leaves a new area of code undocumented), ideally it would break the build! Failing that, we have to fall back on human reviewers (i.e. code review).
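A build can only approximate this. As one crude sketch, assuming a hypothetical convention in which every folder directly under the source root is a component that the design document should at least mention by name, a build step could fail when a new component appears without any mention. It cannot tell whether an existing section has gone stale, so it complements code review rather than replacing it. The function names are invented for this example.

```python
from pathlib import Path


def undocumented_components(source_root: Path, design_doc: Path) -> list[str]:
    """Return source sub-folders that the design document never mentions."""
    doc_text = design_doc.read_text(encoding="utf-8").lower()
    components = sorted(p.name for p in source_root.iterdir() if p.is_dir())
    return [name for name in components if name.lower() not in doc_text]


def main(source_root: Path, design_doc: Path) -> int:
    missing = undocumented_components(source_root, design_doc)
    for name in missing:
        print(f"{design_doc.name} does not mention component '{name}'")
    # A non-zero exit status is what makes a build step fail.
    return 1 if missing else 0
```

A build script could call main with the source folder and the design document path and pass the result to sys.exit.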

There is no substitute for peer code review when it comes to documentation. The code review checklist should include high-level as well as code-level documentation. However, it can be counter-productive to write comprehensive documentation too early (i.e. before the code to be documented stabilises).

It comes back to discoverability: the reviewer and the code author both need to know that the documentation exists in order to keep it up to date and to verify that it was updated.

Improving Quality

Documentation that is done all at once at the end of a project is inevitably going to be patchy. It is essential that documentation is written or updated during or soon after each piece of development work. Notes should be written during development so that items don’t get forgotten. Both high- and low-level documentation need to be integrated into the definition of done.

When a gotcha is discovered, it should be documented as soon as possible with discoverability paramount. This may mean a comment in the code in question, perhaps with a link to a blog post. It may take the form of a simple readme.txt next to the group of source code files in question (“don’t make the mistake I just did”).

Reducing the need for documentation

In an ideal world there would be minimal need for documentation. There are practices that can help make the thing that is documented self-documenting.

Some examples:

  • Readable code that expresses intent well
  • Consistency of language/terminology
  • Automated tests that express expected behaviour well
  • Well structured code bases that follow common/familiar patterns
  • Manual processes automated with simple, easy-to-read scripts
  • Use of common (hence recognisable) industry idioms

Internal Software Documentation