What is Computational Reproducibility?
Last updated on 2025-04-15
Overview
Questions
- What is computational reproducibility?
Objectives
- Learn about computational reproducibility
Making your results and code reproducible
You have just finished your latest research project. The paper has been accepted by the journal (only minor revisions, yay!), the data is organised and ready to be placed in a repository, and your code is under version control and ready to be made public. So you’ve done everything to make your results reproducible, right?
Discussion
Does simply providing your source code equate to reproducibility?
If it doesn’t, what do you think can go wrong?
The code won’t run
The code runs but doesn’t produce the same result
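The second case can happen even when nothing in your own code changes. For example, a package or the language may perform floating-point arithmetic in a different order from one version to the next, and floating-point addition is not associative. The minimal Python illustration below groups the same three numbers in two ways and gets two different answers.

```python
# Floating-point addition is not associative: grouping the same
# numbers differently can change the last few digits of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

Differences this small are often harmless, but they can grow through long computations or flip the outcome of a threshold test.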
Your code is just the tip of the pyramid of your computational environment, and to ensure that your results are computationally reproducible you will need to capture some of that computational environment.
Your computational environment
A simplified way to think about your computational environment is to divide it into five layers, each more general than the one above it:

- At the top is the most specific level: ‘your code’. This is the code that you produced to analyse your data and get the final published result.
- Below this is the ‘packages’ layer, containing the packages you used within your code. These are also bundles of code, but they serve a more general purpose, being used in multiple different pieces of (research) software. For example: numpy, pandas, etc.
- Next is the ‘language’ layer. This is the specific programming language and version you used. Typically, both your code and the packages you have used are written in this language, and it consists of the language syntax as well as some built-in packages (in some cases called the standard library).
- The next layer is the ‘operating system’ layer. This is a very simplified way of describing all the code that sits between the programming language you are using and the actual electronic hardware that makes up a computer.
- Finally, you have the actual computer hardware.
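To make the layers concrete, the sketch below records some basic information about the bottom four layers using only the Python standard library. The function name describe_environment and the package list passed to it are hypothetical, chosen for illustration. The ‘your code’ layer is left out because version control already records it. A real record of the ‘packages’ layer would need to list every package your code depends on, not just two.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Hypothetical helper: summarise each layer of the computational environment.

    `packages` is an assumed list of distribution names; in practice you
    would list every package your code depends on.
    """
    # 'packages' layer: the installed version of each named package
    package_versions = {}
    for name in packages:
        try:
            package_versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            package_versions[name] = None  # not installed in this environment

    return {
        "packages": package_versions,
        # 'language' layer: which Python implementation and version is running
        "language": {
            "implementation": platform.python_implementation(),
            "version": platform.python_version(),
        },
        # 'operating system' layer
        "operating_system": {
            "system": platform.system(),
            "release": platform.release(),
        },
        # 'hardware' layer (processor() may be an empty string on some systems)
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
    }


if __name__ == "__main__":
    print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

Saving this output alongside your results documents the environment, but it does not recreate that environment. Rebuilding it on another machine is what the tools introduced later in the lesson are for.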
What is computational reproducibility?
Discussion
How would you define computational reproducibility?
A suggested definition:
Computational reproducibility is the degree to which your code can be run in a different computational context (i.e. either by a different person, at a different time, on a different machine, or any combination of these three) and will produce the same or equivalent outcome.
What counts as the same or equivalent will vary between research contexts. In some cases precise byte-for-byte reproducibility is essential, while in others results that fall within an acceptable range will be sufficient.
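The two standards can be expressed as two different checks. In the sketch below, both function names are hypothetical and the default relative tolerance of 1e-9 is an arbitrary assumption. The right tolerance depends on your research context. The first check compares two output files byte for byte by hashing them. The second accepts numbers that agree within a relative tolerance.

```python
import hashlib
import math


def identical_files(path_a, path_b):
    """Strict check: the two output files contain exactly the same bytes."""
    def sha256_of(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return sha256_of(path_a) == sha256_of(path_b)


def equivalent_values(values_a, values_b, rel_tol=1e-9):
    """Looser check: same number of values, each pair equal within rel_tol."""
    return len(values_a) == len(values_b) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(values_a, values_b)
    )
```

For example, equivalent_values([0.1 + 0.2 + 0.3], [0.6]) returns True, even though the two sums are not byte-for-byte identical.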
The more layers of your computational environment that you are able to capture (going from the top to the bottom layer), the more reproducible your results will be. However, this comes with increasing technical complexity, so choosing the right degree of computational reproducibility for your project is key.
In the next section we’ll look in a bit more detail at package managers and virtual environments, and at how we can use them together to capture the ‘packages’ layer of a computational environment.
Key Points
- Typically, simply providing your source code does not allow others to reproduce your work.
- Computational reproducibility is the degree to which code can be run in a different context and still produce the same or equivalent outcome.
- Improving computational reproducibility relies on capturing information about your computational environment.