What is Computational Reproducibility?

Last updated on 2025-04-15

Estimated time: 10 minutes

Overview

Questions

What is computational reproducibility?

Objectives

Learn about computational reproducibility

Making your results and code reproducible

You have just finished your latest research project. The paper has been accepted by the journal (only minor revisions, yay!), the data is organised and ready to be placed in a repository, and your code is under version control and ready to be made public. So you’ve done everything to make your results reproducible, right?

Discussion

Does simply providing your source code equate to reproducibility?
If it doesn’t, what do you think can go wrong?

Solution

The code won’t run
The code runs but doesn’t produce the same result


Your code is just the tip of the pyramid of your computational environment, and to ensure that your results are computationally reproducible you will need to capture some of that computational environment.

Your computational environment

A simplified way to think about your computational environment is to divide it into 5 layers, ordered from the most specific to the most general:

5 layers within a computational environment

At the top is the most specific level: ‘your code’. This is the code that you produced to analyse your data and get the final published result.
Below this is the ‘packages’ layer, containing the packages you used within your code. These are also bundles of code, but they serve a more general purpose and are used in multiple different pieces of (research) software. For example: numpy, pandas, etc.
Next is the ‘language’ layer. This is the specific programming language and version you used. Typically, both your code and the packages you have used are written in this language, and it consists of the language syntax as well as some built-in packages (in some cases called the standard library).
The next layer is the ‘operating system’ layer. This is a very simplified way of describing all the code that sits between the programming language you are using and the actual electronic hardware that makes up a computer.
Finally, you have the actual computer hardware.
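To make the layers above a little more concrete, here is a minimal sketch of how you might record something about each layer below ‘your code’ from within Python. It relies only on the standard library (`importlib.metadata`, `platform`). The function name `describe_environment` and the choice of which fields to record are illustrative assumptions, not part of this lesson's material, and a real project would probably want to capture more detail than this.

```python
import json
import platform
from importlib import metadata


def describe_environment(package_names):
    """Collect a small snapshot of the layers beneath 'your code'.

    `package_names` is a list of installed distribution names to look up.
    This is a hypothetical helper written for illustration only.
    """
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            packages[name] = None

    return {
        # 'packages' layer: which versions of which packages are installed
        "packages": packages,
        # 'language' layer: interpreter implementation and version
        "language": {
            "implementation": platform.python_implementation(),
            "version": platform.python_version(),
        },
        # 'operating system' layer
        "operating_system": {
            "system": platform.system(),
            "release": platform.release(),
        },
        # 'hardware' layer: only the CPU architecture is recorded here
        "hardware": {
            "machine": platform.machine(),
        },
    }


# Example: look up the two packages mentioned in the text above.
print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

The output will differ from machine to machine, which is exactly the point: two people running the same code may be doing so on top of quite different lower layers.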

What is computational reproducibility?

Discussion

How would you define computational reproducibility?

Solution

A suggested definition:
Computational reproducibility is the degree to which your code can be run in a different computational context (i.e. by a different person, at a different time, on a different machine, or any combination of these three) and will produce the same or equivalent outcome.

What counts as the same or equivalent will vary between research contexts. In some cases precise byte-for-byte reproducibility is essential, while in others getting results that fall in the same range will be suitable.
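As a rough illustration of the difference between ‘the same’ and ‘equivalent’, the sketch below compares two results in two ways: byte-for-byte (via a SHA-256 hash of the stored value) and within a numerical tolerance. The helper names `same_bytes` and `equivalent`, and the tolerance of `1e-9`, are assumptions chosen for this example; the appropriate tolerance depends on your research context.

```python
import hashlib
import math
import struct


def same_bytes(a: float, b: float) -> bool:
    """Strict check: are the two values stored as identical bytes?"""
    digest_a = hashlib.sha256(struct.pack("<d", a)).hexdigest()
    digest_b = hashlib.sha256(struct.pack("<d", b)).hexdigest()
    return digest_a == digest_b


def equivalent(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Looser check: do the two values agree within a relative tolerance?"""
    return math.isclose(a, b, rel_tol=rel_tol)


# The same three numbers added in a different order. Floating-point
# addition is not associative, so the two results differ very slightly.
original = (0.1 + 0.2) + 0.3
rerun = 0.1 + (0.2 + 0.3)

print(same_bytes(original, rerun))  # False: not byte-for-byte identical
print(equivalent(original, rerun))  # True: equal within the tolerance
```

Whether the second check is good enough is a decision for you and your field; the code only makes the two notions explicit.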

The more layers of your computational environment that you are able to capture (going from the top to the bottom layer), the more reproducible your results will be. However, capturing more layers also brings increasing technical complexity, so choosing the right degree of computational reproducibility for your project is key.

In the next section we’ll look in a bit more detail at package managers, virtual environments, and how we can use them in conjunction to capture the ‘packages’ layer of a computational environment.

Key Points

Typically, simply providing your source code does not allow others to reproduce your work.
Computational reproducibility is the degree to which code can be run in a different context and produce the same or equivalent outcome.
Improving computational reproducibility relies on capturing information about your computational environment.