For the book club this time, Mark, Daniel, and I read Program Comprehension and Code Complexity Metrics: An fMRI Study. (Vince wasn’t able to join us this time.)

In summary, the researchers measured how well various code complexity metrics correlate with how well programmers actually understand code of various complexity. This was done partly by asking questions to verify comprehension, partly by measuring how long it took for the subjects to form a conclusion, and partly by monitoring them with an fMRI brain scanner.

The complexity metrics here are computed mechanically from the source code. There are many such metrics, and so far there is no clear consensus on which of them actually work. Having a metric that actually works would be useful: it would allow a compiler, for example, to warn that some code may be hard to understand.
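To give a sense of what "computed mechanically" means, here is a minimal sketch in Python. It is our own illustration, not code from the paper: the function names `lines_of_code` and `approx_cyclomatic_complexity` and the `SAMPLE` snippet are made up for this post, and the McCabe calculation is a simplified approximation (one plus the number of decision points), not the exact definition any particular tool uses.

```python
import ast

# Node types counted as decision points in this simplified sketch.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def lines_of_code(source: str) -> int:
    """Count lines that are neither blank nor comment-only."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has two operators, so it adds two paths.
            decisions += len(node.values) - 1
    return 1 + decisions


# Hypothetical toy function, used only to exercise the two metrics.
SAMPLE = '''
def classify(n):
    # Toy function used only to exercise the metrics.
    if n < 0 and n % 2 == 0:
        return "negative even"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

if __name__ == "__main__":
    print("LOC:", lines_of_code(SAMPLE))
    print("Approx. McCabe:", approx_cyclomatic_complexity(SAMPLE))
```

For the toy snippet this reports 7 lines of code and an approximate McCabe complexity of 5. Neither number says anything by itself about how hard the code is to read, which is exactly the question the study looks at.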

We found the paper and results to be interesting. I personally found it to be a vindication for the times I was forced to change code to placate an arbitrary McCabe measurement: “McCabe has no correlation to neither response time nor correctness”. Take that, boss of younger me. However, this is a fairly small study, and will need to be replicated and repeated with larger sets of subjects before it can be trusted. (Vindication delayed.)

In this research, they did find some metrics that show some signs of working. For example, the number of lines of code has a medium correlation with programmers correctly understanding the code. “LOC, Halstead, and DepDegree all show a small correlation with response time and a medium correlation with correctness.” We find that this meshes with our own experience, which is nice.

Unfortunately, even medium correlation seems a bit weak. However, studies like this one may act as a guide towards developing a strong metric.

In our discussion about the paper, we concluded that we would very much like to see further research into this area. For example, while understanding individual functions is important, we find that much of our time is actually spent reading and trying to understand change sets (“diffs”, “patches”, “pull and merge requests”). How can changes be presented in ways that maximize correct understanding while minimizing time to comprehend?

Overall, while we are not competent to judge this research scientifically, we conclude that brains are complicated things. The paper also seems to be written with non-expert readers in mind. We recommend it to all working programmers.