Comparing the Effects of Programming Practices on Code Quality
Written by Annika Diener
(This work is also available online and interactively at https://nikadev.net/r/code-quality)
Introduction
Goals and Criteria
The goal of this text is to investigate the effects that certain programming concepts ("paradigms") have on code bases.
Unless stated otherwise, counting is used as the basic metric for evaluation. Simply counting work has already been shown to be an accurate enough first indicator in the context of work planning (Vasco Duarte, "NO ESTIMATES").
The criteria by which each technique gets measured are:
- Performance
- Editability
- Coupling
- Cohesion
- Complexity
- Mental Load
- Testability
This work tries to evaluate each concept holistically, since much common programming advice focuses on only one area, often to the detriment of others; "clean code" is one example.
Performance
To enable comparison of the different techniques, each technique was implemented once in C and compiled with the same compiler (gcc, at -O3 and -O0) on the same machine using unity builds. Run times were measured with hyperfine, using the flags -w 5 -r 100 -u millisecond -N.
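For illustration, a single measurement with the flags above might look like the following invocation (the binary name is a placeholder, not one of the project's actual targets):

hyperfine -w 5 -r 100 -u millisecond -N ./technique_o3

The flags request 5 warmup runs and 100 measured runs, report results in milliseconds, and execute the command without an intermediate shell (-N).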
A full table of the measurements taken is provided for every language used. In addition to the time measured on the test system, the number of assembly instructions was recorded for each language that offers this feature on Compiler Explorer.
Editability
Editability is defined by the number of edits that have to be made to a given project to perform a change. Edits are changes, additions, or removals of tokens, as defined by the lexer of the language's compiler. A token is considered changed if it still belongs to the same class (integer stays integer, variable stays variable, ...). Otherwise it is counted as one removal of the old token and one addition of the new token. The higher the number of edits, the worse the editability. The best editability is "1". Since some might consider newlines cosmetic, measurements will be given both excluding and including them; a newline counts as "1" edit when included. The tables will list the values in the format <without newlines>/<with newlines>.
// Old
x = a + 1;
// New
x = a + 2;

The code example above has an editability of 1/1, since a single token got changed.
// Old
x = a + 1;
// New
x = 2 * (a + 1);

The code example above has an editability of 6/6, since 2 tokens got changed and 4 tokens got added.
// Old
x = vec.x * (vec.y + vec.z);
// New
x = sqrt( sum_vec( vec ) );

The final code example above has an editability of 6/10, since no token got changed, 4 tokens got added, and 10 tokens got removed. There were 4 newlines introduced, which puts the second editability score at 10.
Coupling
If a change in one part of the code requires a change in a different part of the code, the two pieces of code are coupled. This contrasts with editability insofar as editability counts the total number of changes needed, while coupling is merely concerned with the fact that a change had to happen in the first place.
The coupling relations form a graph. To capture not only the coupling relations themselves but also the strength of the coupling, the coupling score for a particular change X is the sum, over all changes Y caused by X, of the distance to Y plus the coupling score of Y: coupling(X) = sum over Y of (distance(X, Y) + coupling(Y)).
The following are some example dependency graphs and their resulting coupling. Since coupling will be measured on the levels of functions, files/classes, and modules, feel free to interpret each box as any of those. The X marks the element that changed. The other elements that had to change because of the change in X are annotated with their respective distance. Where there was no change, the element receives a 0. In cases where the coupling score cannot be calculated with a single summation, the series of graphs depicts the intermediate summations until the final score can be calculated.
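As a worked illustration (not one of the original graphs): suppose the change in X forces a change in A at distance 1, and the change in A in turn forces a change in B at distance 1 from A.

X --1--> A --1--> B

Then coupling(B) = 0, coupling(A) = 1 + coupling(B) = 1, and coupling(X) = 1 + coupling(A) = 2.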
Cohesion
Cohesion describes the proximity of changes, when changes take place. It is measured in this text by the average and the maximum of the distance between changes, in lines. A change in another file/class counts as 100 lines.
I'll be following the common advice to have each class (if needed) in its own file. Hence the distinction between classes and files is not needed for this analysis.
Established metrics, as presented in "Class Cohesion Metrics for Software Engineering: A Critical Review", were considered but not adopted. Some of them are only applicable insofar as they warn the programmer about low cohesion, but can't assess cohesion meaningfully once a certain threshold has been reached. Many metrics also seem to measure coupling instead of cohesion.
Let's consider the following example, in which a constant that is used across multiple files gets renamed.
// inside definitions.h
const Vec ZERO_VEC = {0};

// inside file1.c
Vec a = ZERO_VEC;
a.x += 10;
// ...

// inside file2.c
Vec b = ZERO_VEC;
b.y += 20;
// ...
If the above example renamed ZERO_VEC to VEC_ZERO, the resulting change would span 2 files, not counting the definition. The average distance is hence (100+100)/2 = 100. The maximum distance is 100, meaning the resulting cohesion score is 100/100.
// inside file1.c
const Vec ZERO_VEC = {0};
// ... 20 other lines ...
Vec a = ZERO_VEC;
a.x += 10;
// ... 20 more lines ...
Vec b = ZERO_VEC;
b.y += 20;
// ...
In this example, the change is contained to one file. The changes happen on relative lines 21 and 42. The average distance is hence (21+42)/2 = 31.5. The maximum distance is 42, resulting in a cohesion score of 31.5/42.
// inside definitions.h
const Vec ZERO_VEC = {0};

// inside file1.c
Vec a = ZERO_VEC;
a.x += 10;
// ...

// inside file2.c
Vec b = ZERO_VEC;
b.y += 20;
// ... 20 more lines ...
Vec c = ZERO_VEC;
c.z += 30;
// ...
This last example combines changes within one file with the distance across files. The average distance is hence (100+100+121)/3 = 107. The maximum distance is 121, resulting in a cohesion score of 107/121.
Complexity
Complexity is not formulated in this text using metrics like Cyclomatic Complexity. Instead, each operation (function call, arithmetic operation, comparison, ...) is represented as a function that takes inputs from a set I and returns outputs from a set O. Each element x has its own complexity number, determined by the inputs used in its construction and the number of places it is used in.
Consider the following example:

int x = a + b * c;
int y = x + 1;
int z = x - 1;
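Applied to this example: x is constructed from the 3 inputs a, b, and c, and is then used in 2 places (the constructions of y and z). Under the definition above, these two counts, 3 inputs and 2 usage sites, determine the complexity number of x.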
This method of measuring complexity also works for functional concepts like partial function application. It also accounts for value complecting through control structures like for loops:

int sum = 0;
for (int i = 0; i < 3; i++) {
    sum += i;
}
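In this loop, every iteration constructs a new value of sum from the previous sum and the counter i, complecting the two values; the set of inputs feeding into the final sum therefore grows with the number of iterations.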
For brevity's sake, the outputs will be ignored in the formulas, since they are "1" throughout this example.
As is apparent in this example, the complexity of code that works on arbitrary lengths of input has to be measured in relation to the length of the input and cannot be resolved to a single number. The same applies to concepts like recursion.
Mental Load
Mental Load is defined by the amount of context that a programmer needs to remember to actively work with a piece of code.
The load is formed by the sum, over all items, of the number of changes each item goes through (load = sum over items i of c_i, where c_i counts the changes of item i). An item is a named element which is not itself a literal (for example: function, variable, constant, ...).
As an example, consider the mental load for the following function:

int my_fun(int a, int b, int c) {
    a = a * a + b * b + c * c;
    int ans = a / 2;
    ans = a / (c == 0 ? 1 : c);
    return ans;
}
The mental load would be:
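Assuming that an item's initial binding counts as its first change: a goes through 2 changes (parameter binding and the squaring reassignment), b and c through 1 each, ans through 2 (initialization and reassignment), and my_fun through none. Under this assumed counting, the load is 2 + 1 + 1 + 2 = 6.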
Testability
Testability is measured by the amount of work that has to be done to test the functionality of a piece of code. This involves counting the steps in each test. Each line only contains one action. So even if

my_function(1, a++);

could be written in one line, it will be separated into two:

my_function(1, a);
a += 1;

This enables comparisons with languages like C, where some initialization shorthands are only available as separate statements.
The lower the score, the better. The best possible score is 3 in manually memory-managed languages: one line of setup, one line of test, and one line of teardown. The best possible score is 2 in automatically memory-managed languages: one line of setup and one line of test.
The following test in C tests the function all_charDa. It has setup on lines 1, 2, and 3, one assertion (test_char) on line 4, followed by one line of teardown.

char is_zero(const char* x) { return 0 == *x; }
charDa da_res = new_charDa(0);
charDa da = da_res.result;
test_char(all_charDa(da, is_zero), 1, "all succeeds for the empty case");
free_Da(&da);

To test this function, 3 lines/actions of setup and 1 line of teardown are required. The testability of this function is hence 4.
Abilities of Large Language Models
Advances in the field of Artificial Intelligence (AI) have created Large Language Models (LLMs) with high fidelity. The applicability of those models to perform analysis along the outlined criteria is therefore of interest, as is their ability to suggest code improvements that better the metrics above.
For evaluation, models with a provided free tier were used on that tier. The models used are [TODO: model list].
I hope that this analysis can further illuminate the findings of [TODO: insert github code quality LLM studies].
Side-Effects & Pure Code
A function is considered pure if it doesn't affect anything on the outside and doesn't rely upon information obtained from the outside. This means a pure function's scope is defined entirely by its signature. A side-effect is an action performed by a function that is visible from outside the function's own scope. The most common side-effects are IO operations.
While side-effects and pure functions are not opposites of each other, they are mutually exclusive: any function with a side-effect can't be pure, and any pure function can't have side-effects. Consider the following three implementations of an addition function as an illustration of both concepts.
// pure function
int add(int x, int y) {
    return x + y;
}

// impure function without side-effect
const int x = 4;
const int y = 2;
int add() {
    return x + y;
}

// function with side-effect
int x = 4;
int y = 2;
int add() {
    x += y;
    return x;
}
The performance of all 3 examples above is the same. This can easily be concluded without looking at the assembly: to perform the calculation, both values as well as the operation have to be loaded into the CPU registers. The best case, in all 3 examples, is that all values are already in cache; the worst case is that all values have to be fetched. The order of operations does not impact performance in this trivial case. Most compilers for most languages will inline such a function for this exact reason.
The C Problem
This analysis is only partially possible in C and requires some assumptions to be generalized. Arbitrary memory allocation and de-allocation is hard to get right in C. Hence it became standard for C code to pass the memory used for operations as an argument into functions. This is a side-effect by design.
As an alternative, "handles" to pieces of memory can be passed around as values. While the memory is still managed by a function scope further up in the call stack, the handles get treated as if they were a full representation of the memory. This handle-based approach enables us to write "pure" functions while maintaining the advantages of centralized memory management, at the price of (often) tripling the stack memory footprint compared to the original pointer.
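A minimal sketch of what such a handle could look like; the struct layout and all names are illustrative assumptions, not this project's actual implementation:

#include <stddef.h>

// A "handle": a by-value view of memory that is owned further up the call stack.
// Three words instead of one pointer, hence the (often) tripled stack footprint.
typedef struct {
    char *data;      // start of the owned buffer
    size_t len;      // number of elements currently in use
    size_t capacity; // total number of elements available
} CharHandle;

// Treated as "pure": the function only reads its input value (the handle)
// and writes nothing outside its own scope.
size_t count_zeros(CharHandle h) {
    size_t count = 0;
    for (size_t i = 0; i < h.len; i++) {
        if (h.data[i] == 0) count++;
    }
    return count;
}

Allocation and the final free remain the responsibility of the caller, so the function itself never manages memory.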
Code
The displayed code examples in the normal document flow are shortened to preserve the readability of the sections. The full code examples used are linked to or can be found in this project's GitLab.
While the "pure" example looks reasonable to many, the side-effect version doesn't. While I want to examine both extremes to illustrate my point, I'll add a second version that is more condensed, since this will more closely resemble the coding style of some programmers.
Analysis
Performance
The performance of all 3 versions is similar enough that no significant difference could be measured. The increased assembly count for the pure version in C is explained by the function call overhead. Inlining the function calls reduced the assembly to 199 lines (-31 lines).
Editability
Considered were the following scenarios:
- Additionally calculating the "minimum" and "maximum"
- Parallelizing the code
- Switching from Colors to integers
Coupling
Cohesion
Complexity
Mental Load
Testability
Values & References
Global Constants
With the examples it becomes apparent that there is an adversarial relationship between cohesion and coupling on the one side and editability on the other. It is common wisdom to factor repeated usages of constant literals into named constants. While this improves editability for the case that the value changes, the increased coupling and lessened cohesion result in larger amounts of work if the constant itself changes through a rename or removal.
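A brief sketch of this trade-off, reusing the vector example from the cohesion section (the Vec type and all names are illustrative):

typedef struct { float x, y, z; } Vec;

const Vec ZERO_VEC = {0};

void variant_a(void) {
    // Repeated literal: changing the value means one edit per usage site
    // (worse editability), but no site is coupled to a shared definition.
    Vec a = {0};
    Vec b = {0};
    (void)a; (void)b;
}

void variant_b(void) {
    // Named constant: changing the value is a single edit (better editability),
    // but renaming or removing ZERO_VEC forces a change at every usage site
    // (more coupling), possibly across files (less cohesion).
    Vec c = ZERO_VEC;
    Vec d = ZERO_VEC;
    (void)c; (void)d;
}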
LLMs and Quality
Conclusion
Discussion
The metric of counting might give a first indication of the usefulness of the discussed techniques, but could be improved in future work. One approach to finding a better metric could be to relate each aspect to time. This would require the analysis of many commits, and of the time spent per commit, on a large enough code base.
Categories like "Writability", or a metric of "Agreeableness", could likewise complement the criteria used here.
One flaw of the analysis and comparison between programming languages and techniques is the reliance on libraries and external modules. Due to time constraints, I decided based on personal experience when usage of libraries was appropriate and when not. As a further improvement on this work, additional implementations under similar constraints would be recommended.
Bibliography
- Robert W. Floyd, "The Paradigms of Programming". 1978 ACM Turing Award Lecture. Available: https://dl.acm.org/doi/pdf/10.1145/1283920.1283934 [Accessed May 25, 2025].
- Vasco Duarte, "NO ESTIMATES". Oikosofy Series, 2015 [E-book]. Available: https://oikosofyseries.com/
- Kitware, Inc. and Contributors, "CMake UNITY_BUILD". cmake.org, 2025. [Online]. Available: https://cmake.org/cmake/help/latest/prop_tgt/UNITY_BUILD.html [Accessed May 25, 2025].
- Matt Godbolt, "Compiler Explorer". godbolt.org, 2025. [Online]. Available: https://godbolt.org/ [Accessed May 25, 2025].
- "GCC, the GNU Compiler Collection 13.3.0", [Software]. Free Software Foundation, Inc., 2025. Available: https://gcc.gnu.org/
- "hyperfine 1.19.0", [Software]. David Peter, 2025. Available: https://github.com/sharkdp/hyperfine
- Rich Hickey. Presentation: "Simple Made Easy". Strange Loop, 2011. Available: https://youtu.be/SxdOUGdseq4?si=9Fdck6Y7jIblNJ_d [Accessed June 13, 2025].
- Habib Izadkhah, Maryam Hooshyar, "Class Cohesion Metrics for Software Engineering: A Critical Review". Computer Science Journal of Moldova, vol. 25, 2017. Available: https://ibn.idsi.md/sites/default/files/imag_file/44_74_Class%20Cohesion%20Metrics%20for%20Software%20Engineering_A%20Critical%20Review.pdf [Accessed June 5, 2025].
- Casey Muratori. Presentation: "Where Does Bad Code Come From?". Molly Rocket, November 2, 2021. Available: https://youtu.be/7YpFGkG-u1w?si=JvlKWZUOxfdjo1t3 [Accessed June 16, 2025].
- Casey Muratori. Presentation: "'Clean' Code, Horrible Performance". Molly Rocket, February 28, 2023. Available: https://youtu.be/tD5NrevFtbU?si=ky3lhyKeTbMxmP-z Transcript available: https://www.computerenhance.com/p/clean-code-horrible-performance [Accessed June 16, 2025].
- Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship. Pearson, August 2008. Available: https://www.oreilly.com/library/view/clean-code-a/9780136083238/
- T. J. McCabe, "A Complexity Measure". IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308-320, 1976. doi: 10.1109/TSE.1976.233837. Available: https://ieeexplore.ieee.org/abstract/document/1702388 [Accessed August 5, 2025].