What is Computational Reproducibility?


  • Simply providing your source code is typically not enough to allow others to reproduce your work.
  • Computational reproducibility is the degree to which code can be run in a different context.
  • Improving computational reproducibility relies on capturing information about your computational environment.
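
As a rough illustration of what "capturing information about your computational environment" can mean at the most basic level, the sketch below records a few facts about the running interpreter and operating system using only the standard library. The function name describe_environment and the choice of fields are assumptions made for this example, not a prescribed format.

```python
import json
import platform
import sys


def describe_environment() -> dict:
    """Collect basic facts about the interpreter and operating system."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


# Print the record as JSON so it could be saved alongside the source code.
print(json.dumps(describe_environment(), indent=2))
```

A record like this only covers the interpreter and operating system; the installed packages are a separate layer of the environment.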

Capturing computational environments


  • Package managers are used to install, remove, upgrade, and track software.
  • In the context of Python and other programming languages, this software usually consists of bundles (packages) of other people’s code.
  • However, installing all packages to the same place causes dependency clashes and makes recreating a computational environment difficult.
  • Virtual environments are used to deal with this problem by creating isolated spaces where packages can be installed without interfering with one another.
  • Using these two tools together allows you to capture the ‘package’ level of your computational environment.
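
To make the idea of isolation a little more concrete, here is a minimal sketch that checks whether the current interpreter is running inside a virtual environment and counts the packages it can see. It relies on the standard library only; the helper names in_virtual_environment and installed_packages are hypothetical and chosen for this example.

```python
from __future__ import annotations

import sys
from importlib import metadata


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return packages


packages = installed_packages()
print(f"Inside a virtual environment: {in_virtual_environment()}")
print(f"{len(packages)} packages visible to this interpreter")
```

Running the same script with the interpreters of two different environments would normally report different package sets, which is the isolation described above.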

Getting started with venv and pip


  • You can create, activate, and deactivate virtual environments using venv.
  • You can install packages in a virtual environment using pip install, and view installed packages with pip list.
  • Python and venv create a directory on your computer that contains your virtual environment (a separate Python interpreter and library).
  • Activating a virtual environment prepends the directory containing its executables to the PATH environment variable and sets VIRTUAL_ENV; deactivating it restores the previous values.
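
The points above can be sketched from within Python using the standard library venv module, which is what python -m venv runs. This is a minimal sketch under a few assumptions: the environment name example-env and the helper names are hypothetical, the environment is created in a temporary directory so nothing is left behind, and with_pip=False is used to keep the example fast and offline (the command-line tool installs pip by default).

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def executables_dir(env_dir: Path) -> Path:
    """Directory holding the environment's executables (platform dependent)."""
    return env_dir / ("Scripts" if sys.platform == "win32" else "bin")


def create_environment(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=False)
    name = "python.exe" if sys.platform == "win32" else "python"
    return executables_dir(env_dir) / name


def activated_path(env_dir: Path, current_path: str) -> str:
    """Imitate the PATH change made on activation: prepend the executables directory."""
    return os.pathsep.join([str(executables_dir(env_dir)), current_path])


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "example-env"
    interpreter = create_environment(env_dir)
    # The environment is an ordinary directory with its own interpreter and config file.
    config_exists = (env_dir / "pyvenv.cfg").exists()
    interpreter_exists = interpreter.exists()
    print(sorted(p.name for p in env_dir.iterdir()))
    print(activated_path(env_dir, os.environ.get("PATH", "")))
```

Because the environment's directory comes first on PATH after activation, typing python or pip in that shell finds the environment's copies before any system-wide ones.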

Using venv and pip to capture a computational environment


  • Package versions can be specified by using the semantic versioning syntax (or, less commonly, the calendar versioning syntax).
  • pip freeze can be used to get a list of installed packages, and these can be written to a file.
  • Packages can be restored from a file produced by pip freeze by using the --requirement option with pip install.
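
The capture and restore steps above can be sketched as a pair of small helpers that call pip through the current interpreter. This is only an illustration: the function names are hypothetical, requirements.txt is a conventional but assumed file name, and the helpers assume pip is installed in the active environment.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def freeze_command() -> list[str]:
    """Command equivalent to running `pip freeze` with the current interpreter."""
    return [sys.executable, "-m", "pip", "freeze"]


def restore_command(requirements: Path) -> list[str]:
    """Command equivalent to `pip install --requirement <file>`."""
    return [sys.executable, "-m", "pip", "install", "--requirement", str(requirements)]


def capture_environment(requirements: Path) -> None:
    """Write the output of `pip freeze` to the given file."""
    result = subprocess.run(
        freeze_command(), capture_output=True, text=True, check=True
    )
    requirements.write_text(result.stdout)


def restore_environment(requirements: Path) -> None:
    """Install the packages listed in a file produced by `pip freeze`."""
    subprocess.run(restore_command(requirements), check=True)
```

Calling capture_environment(Path("requirements.txt")) in one environment and restore_environment(Path("requirements.txt")) in a freshly created one would be one way to move the package list between them. Using sys.executable -m pip rather than a bare pip command is a design choice that ensures the pip belonging to the running interpreter is the one invoked.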

Limitations