What is Computational Reproducibility?


  • Typically, simply providing your source code does not allow others to reproduce your work.
  • Computational reproducibility is the degree to which code can be run in a different context.
  • Improving computational reproducibility relies on capturing information about your computational environment.
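
As a rough illustration of what "information about your computational environment" can mean in practice, the sketch below records a few facts about the interpreter and operating system using only the standard library. The function name describe_environment and the choice of fields are assumptions made for this example, not a prescribed format.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect a few facts about the context the code is running in."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


# Print one "key: value" line per recorded fact.
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Saving output like this alongside your results gives a reader a starting point for judging how different their own context is from yours.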

Capturing computational environments


  • Package managers are used to install, remove, upgrade, and track software.
  • In the context of Python and other programming languages, this software usually consists of bundles (packages) of other people’s code.
  • However, installing all packages to the same place can cause dependency clashes and makes recreating a computational environment difficult.
  • Virtual environments are used to deal with this problem by creating isolated spaces where packages can be installed without interfering with one another.
  • Used together, these two tools capture the ‘package’ level of your computational environment.
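
To make the ‘package’ level more concrete, the following sketch lists the packages visible to the running interpreter and the directory they are installed into, using the standard library's importlib.metadata and sysconfig modules. The helper name installed_packages is an assumption for this example; a package manager such as pip tracks the same information for you.

```python
import sysconfig
from importlib import metadata


def installed_packages() -> dict:
    """Map each installed distribution's name to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            packages[name] = dist.version
    return packages


# The single location that pure-Python packages are installed into
# for this interpreter (it differs inside a virtual environment).
print("packages are installed under:", sysconfig.get_paths()["purelib"])

for name, version in sorted(installed_packages().items(), key=lambda item: item[0].lower()):
    print(f"{name}=={version}")
```

Running this inside and outside a virtual environment should show a different install location and, usually, a different set of packages.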

Getting started with venv and pip


  • You can create, activate, and deactivate virtual environments using the venv module.
  • You can install packages in a virtual environment using pip install, and view installed packages with pip list.
  • Python and venv create a directory on your computer that contains your virtual environment (a separate Python interpreter and library).
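  • Activating and deactivating the virtual environment modifies the PATH environment variable to add/remove the directory containing the virtual environment's executables (and sets/unsets VIRTUAL_ENV); PYTHONPATH is not changed, because the environment's interpreter locates its own library.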
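
The points above can be sketched from within Python itself. The example below creates a throwaway virtual environment in a temporary directory with the standard library's venv module, checks that the directory was populated, and mimics the main effect of activation on PATH. The helper names create_env and path_after_activation and the directory name demo-env are assumptions for this illustration; in day-to-day use you would normally run python -m venv and the activate script from a terminal instead.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the directory holding its executables."""
    # with_pip=False keeps this sketch fast and offline; the command-line
    # form `python -m venv <dir>` installs pip into the environment by default.
    venv.create(env_dir, with_pip=False)
    return env_dir / ("Scripts" if os.name == "nt" else "bin")


def path_after_activation(bin_dir: Path, current_path: str) -> str:
    """Mimic activation: put the environment's executables first on PATH."""
    return os.pathsep.join([str(bin_dir), current_path])


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"
    bin_dir = create_env(env_dir)
    new_path = path_after_activation(bin_dir, os.environ.get("PATH", ""))
    print("pyvenv.cfg created:", (env_dir / "pyvenv.cfg").exists())
    print("first PATH entry after activation:", new_path.split(os.pathsep)[0])
    # sys.prefix differs from sys.base_prefix only when running inside a venv.
    print("this interpreter is in a venv:", sys.prefix != sys.base_prefix)
```

Because the environment's directory comes first on PATH, typing python or pip in an activated shell finds the environment's copies before any system-wide ones.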

Using venv and pip to capture a computational environment


  • Package versions usually follow the semantic versioning syntax (or, less commonly, the calendar versioning syntax), and a specific version can be requested when installing, e.g. pip install package==1.2.3.
  • pip freeze can be used to get a list of installed packages, and these can be written to a file.
  • Packages can be restored from a file produced by pip freeze by using the --requirement option with pip install.
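
The capture-and-restore workflow above could be scripted roughly as follows. This is a minimal sketch that assumes pip is available to the running interpreter; the function names freeze_to_file, install_from_file and parse_pinned are made up for this example, and the parser only understands simple name==version lines, not every form that pip freeze can emit.

```python
import subprocess
import sys
from pathlib import Path


def freeze_to_file(path: Path) -> None:
    """Write the output of `pip freeze` for the current interpreter to a file."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    )
    path.write_text(result.stdout)


def install_from_file(path: Path) -> None:
    """Install the packages listed in a requirements file."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--requirement", str(path)],
        check=True,
    )


def parse_pinned(line: str):
    """Split a 'name==version' line into a tuple; return None for anything else."""
    line = line.strip()
    if not line or line.startswith("#") or "==" not in line:
        return None
    name, version = line.split("==", 1)
    return name.strip(), version.strip()
```

A typical use would be to call freeze_to_file(Path("requirements.txt")) in the original environment, share that file with your code, and call install_from_file(Path("requirements.txt")) inside a fresh virtual environment elsewhere. The file name requirements.txt is a common convention rather than a requirement.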

Limitations


  • pip and venv provide the most basic functionality for capturing the package level of your computational environment.
  • Other tools (e.g. uv, pixi) build on this basic functionality and are worth investigating and incorporating.
  • Virtual environments can only capture a portion of the computational environment.
  • Projects that require more of the computational environment to be captured may need more advanced tools (e.g. VMs, containers, Nix/Guix) to achieve this.