What is Computational Reproducibility?
- Typically, simply providing your source code does not allow others to reproduce your work.
- Computational reproducibility is the degree to which code can be run in a different context.
- Improving computational reproducibility relies on capturing information about your computational environment.
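As an illustration of what "capturing information about your computational environment" can mean in practice, here is a minimal sketch that records the interpreter version, the platform, and the installed packages using only the standard library. The function name `describe_environment` and the dictionary keys are assumptions made for this example, not part of any tool described in the lesson.

```python
import platform
import sys
from importlib import metadata


def describe_environment():
    """Return a dictionary describing the current Python environment."""
    # Map each installed distribution name to its version, skipping
    # entries whose metadata has no name.
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": sorted(packages.items()),
    }


environment = describe_environment()
print("Python", environment["python_version"], "on", environment["platform"])
print(len(environment["packages"]), "installed packages found")
```

A record like this covers only the Python level of the environment. It says nothing about system libraries or hardware, which may also affect whether code runs elsewhere.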
Capturing computational environments
- Package Managers are used to install, remove, upgrade, and track software.
- In the context of Python and other programming languages, this software consists of bundles of other people’s code, known as packages.
- However, installing all packages to the same place causes dependency clashes and makes recreating a computational environment difficult.
- Virtual environments are used to deal with this problem by creating isolated spaces where packages can be installed without interfering with one another.
- Using these two tools together allows you to capture the ‘package’ level of your computational environment.
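To see where packages are installed for the interpreter you are running, and whether that interpreter belongs to a virtual environment, a short standard-library check along these lines may help. The two function names are hypothetical helpers introduced for this sketch.

```python
import sys
import sysconfig


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


def package_install_directory():
    """Return the directory where pure-Python packages are installed."""
    return sysconfig.get_paths()["purelib"]


print("Inside a virtual environment:", in_virtual_environment())
print("Packages are installed to:", package_install_directory())
```

Outside a virtual environment, every project on the machine shares the directory printed on the second line, which is how dependency clashes arise.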
Getting started with venv and pip
- You can create, activate, and deactivate virtual environments using venv.
- You can install packages in a virtual environment using pip install, and view installed packages with pip list.
- Python and venv create a directory on your computer that contains your virtual environment (a separate Python interpreter and library).
- Activating and deactivating the virtual environment modifies the PATH environment variable to add or remove the path to the directory containing the virtual environment's executables.
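The points above can be explored from Python itself with the standard-library `venv` module. The following is a rough sketch rather than a replacement for the usual command-line workflow: it creates an environment in a temporary directory without pip (to stay fast and offline), and only imitates the effect that activation has on PATH. The environment name `example-env` and the helper names are assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment and return its executable directory."""
    # with_pip=False keeps this sketch fast and offline; "python -m venv"
    # on the command line installs pip into the environment by default.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / ("Scripts" if os.name == "nt" else "bin")


def activated_path(bin_dir, current_path):
    """Return a PATH value with the environment's executables placed first."""
    return os.pathsep.join([str(bin_dir), current_path])


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "example-env"  # hypothetical environment name
    bin_dir = create_environment(env_dir)

    # The environment is an ordinary directory; pyvenv.cfg records which
    # Python installation it was created from.
    config_exists = (env_dir / "pyvenv.cfg").is_file()
    print("pyvenv.cfg present:", config_exists)

    # Roughly what activation does to PATH.
    new_path = activated_path(bin_dir, os.environ.get("PATH", ""))
    print("PATH now starts with:", new_path.split(os.pathsep)[0])

    # The environment's own interpreter reports the environment as its prefix.
    python = bin_dir / ("python.exe" if os.name == "nt" else "python")
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    reported_prefix = result.stdout.strip()
    print("Interpreter prefix:", reported_prefix)
```

Because the environment's directory comes first on PATH, typing `python` or `pip` in an activated shell finds the environment's copies before any system-wide ones.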
Using venv and pip to capture a computational environment
- Package versions can be specified by using the semantic versioning syntax (or, less commonly, the calendar versioning syntax).
- pip freeze can be used to get a list of installed packages, and these can be written to a file.
- Packages can be restored from a file produced by pip freeze by using the --requirement option with pip install.
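The freeze-and-restore workflow above can be sketched in Python as follows. The `freeze` and `restore` functions simply wrap the `pip freeze` and `pip install --requirement` commands for the running interpreter. `parse_semver` and `parse_pins` are hypothetical helpers added to show what a pinned `name==version` line contains, and the example file contents are made up for illustration.

```python
import subprocess
import sys


def parse_semver(version):
    """Split a MAJOR.MINOR.PATCH string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def parse_pins(requirements_text):
    """Return {package: version} for every "name==version" line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything not pinned with "==".
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def freeze(requirements_file):
    """Write the output of "pip freeze" for this interpreter to a file."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    )
    with open(requirements_file, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)


def restore(requirements_file):
    """Install the packages listed in a requirements file."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--requirement", requirements_file],
        check=True,
    )


# Hypothetical requirements file contents, made up for illustration.
example = "numpy==1.26.4\n# a comment\nrequests==2.31.0\n"
pins = parse_pins(example)
print(pins)
print(parse_semver(pins["numpy"]))
```

Calling `freeze("requirements.txt")` inside an activated environment and `restore("requirements.txt")` inside a fresh one mirrors the two bullet points above. Note that `parse_pins` deliberately ignores lines that are not simple `==` pins, such as editable installs or direct URL references, and that `parse_semver` assumes a plain three-part version number.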