Hello World

2021-04-19
12 min read

In this post I explain my motivation for starting this blog, and I discuss two papers I have read in the past weeks together with the insights I gained from studying them.

Motivation and Disclaimer

I just started a PhD program with the testShift project at TU Delft. During the next four years I will try to uncover the hows and whys of software testing. I want to find the reasons why software engineers are (not) testing their software in the development process. Software development is a very social and collaborative process which is influenced by many socio-technical factors. I am trying to fill a research gap that requires both a technical and a social understanding of the work of software engineers.

I strongly believe in free access to information and the unrestricted use of research material. Writing this blog stems from my idea of how transparent and open contributions to public knowledge can and should take shape. Apart from that, I want to write regularly about my work, not only to reflect on what I am doing but also to build up a log of the resources I am using and the ideas that emerge from them. Lastly, I hope to motivate other people to join the discussion and become involved in the world of software testing.

About protesting.tech

The name of this blog (protesting.tech) is of course a play on words. I believe that the use and development of technology is political; software engineers play an important role in shaping society's future. Their actions always carry an implicit political stance, and very often engineers are not aware of this. Even the choice of which software we use makes a difference in my eyes. Choosing community-driven software like Mastodon is often motivated by a political drive to challenge existing systems and to create sustainable alternatives. Establishing something often goes hand in hand with protesting against what we believe to be wrong.

Just like the choice of technology, the choice not to test software can have (mostly unforeseen) drastic consequences. Software failures which could have been avoided by proper testing can lead to tragic incidents and even the loss of life1. protesting.tech is my way to position myself and advocate for a constructive shift in the testing mindset. I am protesting against irresponsible software development, and I want to advocate for testing with a positive attitude, focusing not only on the cons of not testing but also emphasising the pros of testing technology.

Towards a mental model of the state of software testing

When you ask different software engineers why they are not developing a test suite for their code, you will probably get a wide variety of answers. Software development is a complex matter, so it is not surprising that individuals follow different approaches for various reasons. What I have learned in the last two weeks is that, more often than not, developers base their reasoning on misconceptions. The two papers that I am going to introduce in this post explore, among other things, this phenomenon from two different angles. The first paper, by Beller et al.3, published in 2019, looks at the behaviour of developers during coding sessions and then relates their analysis to claims the participants made about themselves when filling out a survey. The second paper, by Aniche et al.2, focuses more on the reasoning of developers while they are writing test code. Both papers investigate how developers test their software, and they motivate the development of better tools, the adoption of new development methodologies, further research on the hows of software testing, and future research on external socio-technical factors of software engineering.

Patterns and Behaviour

The quantitative study Developer Testing in the IDE: Patterns, Beliefs, and Behavior, conducted over a timespan of two years by Beller et al.3, is based on observations seen through the eyes of the integrated development environment (IDE). Over 2000 software engineers took part in the study by installing a program called WatchDog which observed every click and keystroke they performed while coding. Observing the developer through the eyes of the IDE allowed the researchers to get a very intimate view of the patterns and behaviour of the practitioners without disturbing their flow. To validate and contextualise the observations, practitioners had to answer questions about their testing behaviour before the recording started. The answers were later used to compare the actual behaviour of developers with their self-assessment. The premise of the authors is that

before we are able to answer how we should test, we must first know how we are testing.

By looking at how developers behave, the authors came to a better understanding of the process and the challenges of software testing in practice, and they were able to debunk a lot of claims made by other researchers. The insights obtained from their observations led to some very interesting suggestions, not only on how to improve software development tools but also on what the next steps in the evolution of development methodologies could be.

One takeaway of the study is that software developers overestimate the time they spend on testing almost twofold, which surely has an impact on developers' estimation of cost vs. value when choosing development methodologies that do (not) involve testing. However, even when practitioners claim to use development methodologies like TDD which rely on testing, more often than not their behaviour diverges a lot from the best practices those methodologies involve. The authors found that almost no one follows the guidelines strictly, and they suggest that instead of advocating TDD, we should consider a different approach which they call Test-Guided Development (TGD). The discrepancy between the survey answers and the recorded behaviour was very striking to me. The authors suggest that a diverse set of factors, including psychological ones, causes this divergence, and I am terribly curious to see what these factors are.

The Mental Model

⚠︎ This section discusses a paper that has not yet been peer-reviewed at the time of writing

The paper about testing in the IDE by Beller et al.3 motivates digging deeper into the thought process of developers. Aniche et al.2 tackled exactly this topic with their study How Developers Engineer Test Cases: An Observational Study. They studied how developers reason while they are in the act of developing tests and what they evaluate while planning a coding session.

The eyes of the IDE that were utilised in Beller et al.'s3 study can only see a small fraction of the practice of software testing. They are fixed on the tool and not on the developer. By focusing on the tool we see the actions but not the process in which ideas emerge. Aniche et al.2 shift the focus onto the developer's mind to observe how ideas emerge and how they are turned into actions. In their qualitative study they worked with recordings of developers who filmed their screens while verbalising their thoughts during coding sessions.

By looking over the shoulder of developers in this way, the authors uncover multiple aspects of test development practice and propose a framework that maps out the different components that are connected to a test case written by a developer.

```mermaid
graph TB
    A[Documentation] -->|guides| B[Test case]
    B -->|in form of| C[Test code]
    D[Source code] -->|guides| B
    B -->|satisfies| E[Adequacy criterion]
    M[Mental model] -->|guides| B
    M -->|learns from failures| B
    M -->|derives from| A
    M -->|derives from| D
```

The framework shows the connection of the developer to the Test code by emphasising the role of the Mental model, which evolves as the developer reads source code and documentation and interacts with test code. Comments from the developers, claiming that

in real life, whenever the ticket comes with a nice documentation, testing gets easier

reveal once again the huge impact of artefacts and external factors on software testing. But the authors did not only observe that documentation and source code are leveraged to build a mental model for writing tests. The study also showed that running tests, seeing them fail, and changing them can guide developers as well and help them gain an understanding of the program they are dealing with.
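To make this concrete, here is a toy sketch of my own (not an example from the study) of how a failing test can correct a developer's mental model; I am using pytest simply because it is my tool of choice:

```python
def test_rounding_assumption():
    # Mental model before running the test: "round() always rounds .5 upwards".
    # This assertion fails, because Python rounds half to even (round(2.5) == 2),
    # and it is exactly this failure that updates the developer's understanding.
    assert round(2.5) == 3
```

The code itself is trivial; the point is the feedback loop: the test encodes an assumption, the failure contradicts it, and the mental model is revised.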

What I really like about this paper is the personal touch given by the practitioners as they verbalise what they are doing. I have the impression that this very unmediated look into the developers' practice reveals many insights that cannot be seen when only looking at the usage of tools like the IDE, CI/CD pipelines, or artefacts like the test code itself. I am 100% with the authors when they suggest that

An interesting future work would be to repeat our observational study, but inside of different software companies, where different social factors may also play a role in how developers test.

I would really like to explore this in the future, but as the pandemic is still raging around the world, I am afraid that I first have to focus on online research and on research methods which are compatible with the current state of the world.

Reflection and takeaway

The two papers discussed in this post complement each other very nicely. They provoke a lot of questions, but they also provide answers to very fundamental questions like:

  • How do developers manage failing tests?
  • How do developers reason about what test cases to write?
  • How do they know when to stop?
  • Which testing patterns are common in the IDE?

As Aniche et al.2 suggest in their paper, researching these things ultimately leads to better insights, which we need in order to improve software development practices.

While our body of knowledge on software testing is already quite significant, we again argue that developers are still the ones responsible for putting all these techniques together. Therefore, understanding the developers’ thought- and decision-making processes on, e.g., how they reason about what test cases to write, which techniques to apply, what types of questions they face when testing, and how they decide it is time to stop, is a fundamental step in making developers better at testing software.

Personally, as a software engineer, I learned a couple of things from reading the papers that I will try to bring into the developer teams I am working with, now and in the future.

Developers largely do not run tests in the IDE. However, when they do, they do it extensively

It seems that making the effort to render test execution easy and accessible pays off. I can imagine that running tests together with someone who is new to a project during their onboarding can really improve their testing behaviour right from the start. This finding also motivates me to provide easy-to-understand guides on how to run tests as part of the contribution guide of software projects.
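As a sketch of what "making test execution easy" could look like in practice (my own idea, not a recommendation from the paper), a tiny wrapper script can give contributors a single, obvious command, assuming the project uses pytest:

```python
# run_tests.py -- hypothetical convenience wrapper; the contribution guide can
# then simply say: "run `python run_tests.py` before opening a pull request".
import subprocess
import sys

if __name__ == "__main__":
    # Delegate to pytest (assumed to be installed); extra command-line
    # arguments such as -k or -x are passed straight through.
    sys.exit(subprocess.call([sys.executable, "-m", "pytest", *sys.argv[1:]]))
```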

We need to convene on a generally agreed-upon, formal definition of TDD

I am a big fan of TDD, but after reading the paper I feel more encouraged to step a little out of line when it comes to the best practices associated with TDD and to consciously establish my own way of test-guided development. I feel this is also something worth discussing when collaborating with other developers who lean towards the TDD methodology.
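For readers unfamiliar with the rhythm in question, here is a minimal red-green sketch of my own (a toy example, not code from either paper): the test is written first and fails, then just enough production code is added to make it pass:

```python
def test_slugify():
    # Step 1 ("red"): this test was written before slugify() existed and failed first.
    assert slugify("  Hello World ") == "hello-world"

def slugify(title: str) -> str:
    # Step 2 ("green"): the simplest implementation that satisfies the test above.
    return title.strip().lower().replace(" ", "-")
```

Refactoring would be the third step of the cycle; what Beller et al.3 observed is that almost no one follows this cycle strictly in practice.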

Developers spend a quarter of their time engineering tests in the IDE. They overestimated this number nearly twofold.

As mentioned before, this finding was very striking to me, and I will try to use tools like WatchDog in the future to become more aware of what I spend my time on when I develop software. I hope this will enable me to make more precise claims and educated decisions when it comes to testing and other development practices.

Emerging ideas

After reading the papers I want to dig deeper into the methods used by the authors to observe developer behaviour from different perspectives. This can also mean taking into account things that were not observed by the authors. What comes to mind is the usage of CI pipelines, which execute tests automatically whenever new source code is pushed to a repository. Analogous to Beller et al.3, one could ask:

  • Which testing patterns are common when using CI pipelines?

What I also wonder, after reading that developers overestimate the time they invest in testing efforts and that

Roughly half of projects and users do not practice testing in the IDE actively

is what the reasons for not testing software really are. If developers generally overestimate the time (and hence the resources) required to properly test software, there is a chance that they make an uneducated decision when they skip writing tests, arguing it is too expensive. If this holds, educating developers about the real costs of testing would be a big step towards making them better testers.

Another thing that I am curious about now is how different development frameworks and practices affect testing. TDD as a methodology to write code was investigated by the authors, and a lot of research has been done on the effects TDD can have on developer teams. I wonder how frameworks and practices like pair programming, agile, XP, or DevOps affect testing behaviour.

Already after two weeks many ideas are popping up, and I am curious to see how they will develop in the coming time. After reading a paper by Hoda4 about Socio-Technical Grounded Theory (STGT), which I am planning to write about in my next post, I feel I still have a lot to learn before planning my first research project. And as my mind is still wide open to new ideas, please reach out to me if you have suggestions!


  1. Andrew J. Ko, Bryan Dosono, and Neeraja Duriseti: Thirty years of software problems in the news. In Proceedings of the 7th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2014). Association for Computing Machinery, New York, NY, USA, 32–39. doi.org/10.1145/2593702.2593719 ↩︎

  2. Maurício Aniche, Christoph Treude, Andy Zaidman: How Developers Engineer Test Cases: An Observational Study. (2021) arXiv:2103.01783 ↩︎

  3. Moritz Beller, Georgios Gousios, Annibale Panichella, Sebastian Proksch, Sven Amann, Andy Zaidman: Developer Testing in the IDE: Patterns, Beliefs, and Behavior. IEEE Trans. Software Eng. 45(3): 261-284 (2019) ieeexplore.ieee.org/document/8116886/ ↩︎

  4. Rashina Hoda: Socio-Technical Grounded Theory for Software Engineering. (2021) arXiv:2103.14235 ↩︎

Mark
PhD student researching socio-technical enablers/inhibitors of software testing