What is Computational Reproducibility?
- Typically, simply providing your source code is not enough for others to reproduce your work.
- Computational reproducibility is the degree to which code can be run in a different context.
- Improving computational reproducibility relies on capturing information about your computational environment.
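One way to start capturing this information is to record it from inside Python itself. The sketch below uses only the standard library (platform and importlib.metadata) to collect the interpreter version, the operating system, and the installed package versions. The function name describe_environment is hypothetical, and the result is only a partial snapshot of an environment, not a complete description of it.

```python
import json
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Collect a few facts about the running Python environment (hypothetical helper)."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            packages[name] = meta.get("Version")
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        # Sort by package name so the snapshot is stable between runs.
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output alongside your results records at least part of the context in which the code was run.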
Capturing computational environments
- Package managers are used to install, remove, upgrade, and track software.
- In the context of Python and other programming languages, this software consists of packages: bundles of code written by other people.
- However, installing all packages to the same place can cause dependency clashes (for example, two projects that need different versions of the same package) and makes recreating a computational environment difficult.
- Virtual environments are used to deal with this problem by creating isolated spaces where packages can be installed without interfering with one another.
- Using these two tools together lets you capture the ‘package’ level of your computational environment.
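A quick way to see this isolation from inside Python is to compare sys.prefix with sys.base_prefix. This is a minimal sketch that assumes an environment created by the standard venv module. The function name in_virtual_environment is hypothetical.

```python
import sys
import sysconfig


def in_virtual_environment() -> bool:
    """Return True if this interpreter is running inside a venv (hypothetical helper).

    In a venv, sys.prefix points at the environment directory, while
    sys.base_prefix still points at the Python installation it was created
    from. Outside a virtual environment the two are equal.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_environment())
    # Directory where pip installs pure-Python packages for this interpreter.
    print("site-packages directory:", sysconfig.get_paths()["purelib"])
```

Running this once with the environment activated and once without shows the site-packages directory changing, which is how packages in different environments stay apart.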
Getting started with venv and pip
- You can create, activate, and deactivate virtual environments using venv.
- You can install packages in a virtual environment using pip install, and view installed packages with pip list.
- Python's venv module creates a directory on your computer that contains your virtual environment (a separate Python interpreter and package library).
- Activating a virtual environment adds the environment's bin directory (Scripts on Windows) to the front of the PATH environment variable, so that commands such as python and pip run from the environment. Deactivating it restores the previous PATH.
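The same steps can be performed from Python with the standard-library venv module. This sketch assumes you only want to inspect the result, so it creates an environment without pip inside a temporary directory. The helper activated_path is hypothetical and only imitates what the activate script does to PATH. The real script also sets VIRTUAL_ENV and changes the shell prompt.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path

# Executables live in bin/ on Linux and macOS, and in Scripts\ on Windows.
BIN_NAME = "Scripts" if sys.platform == "win32" else "bin"
PYTHON_NAME = "python.exe" if sys.platform == "win32" else "python"


def create_environment(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path."""
    # with_pip=False keeps the sketch fast and offline; running
    # "python -m venv <dir>" on the command line installs pip by default.
    # symlinks mirrors the command-line default (symlink on POSIX, copy on Windows).
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir / BIN_NAME / PYTHON_NAME


def activated_path(env_dir: Path, old_path: str) -> str:
    """Return the PATH an activate script would set (hypothetical helper).

    The environment's bin/Scripts directory goes in front of the existing
    PATH, so "python" and "pip" resolve to the environment's copies.
    Deactivating simply restores old_path.
    """
    return str(env_dir / BIN_NAME) + os.pathsep + old_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"
        python = create_environment(env_dir)
        print("interpreter exists:", python.exists())
        print("pyvenv.cfg exists:", (env_dir / "pyvenv.cfg").exists())
        new_path = activated_path(env_dir, os.environ.get("PATH", ""))
        print("first PATH entry after activation:", new_path.split(os.pathsep)[0])
```

The pyvenv.cfg file written into the environment directory records which base Python installation the environment was created from.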
Using venv and pip to capture a computational environment
- Package versions are specified with version specifiers such as ==, >= or ~= followed by a version number. Version numbers usually follow semantic versioning (MAJOR.MINOR.PATCH) or, less commonly, calendar versioning.
- pip freeze can be used to get a list of installed packages, and this list can be written to a file.
- Packages can be restored from a file produced by pip freeze by using the --requirement option with pip install.
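The freeze-and-restore round trip can be sketched by driving pip through subprocess, so the commands are the same ones you would type in a shell. The function names freeze_requirements and restore_requirements are hypothetical. Passing a virtual environment's interpreter as the python argument targets that environment instead of the one running the script.

```python
import subprocess
import sys
from pathlib import Path


def freeze_requirements(requirements: Path, python: str = sys.executable) -> list[str]:
    """Write pip freeze output for an interpreter to a file (hypothetical helper).

    Equivalent to: python -m pip freeze > requirements.txt
    """
    result = subprocess.run(
        [python, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pip fails
    )
    requirements.write_text(result.stdout)
    return result.stdout.splitlines()


def restore_requirements(requirements: Path, python: str = sys.executable) -> None:
    """Install every package listed in a requirements file (hypothetical helper).

    Equivalent to: python -m pip install --requirement requirements.txt
    """
    subprocess.run(
        [python, "-m", "pip", "install", "--requirement", str(requirements)],
        check=True,
    )
```

Running pip as python -m pip, rather than as a bare pip command, ensures that packages are listed from and installed into the interpreter you pass in.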