Software Packaging
Reproducibility is an integral concept in the FAIR4RS principles. Appropriate software packaging, which involves collecting and configuring software components into a format deployable across different computer systems, is one way to make research software reproducible.
Software packaging is akin to packaging a box for shipment. Components such as the source code, installation instructions, user documentation, and test scripts all help to ensure reproducibility.
The purpose of a software package is to install source code for execution on various systems, with considerations including target users, dependencies, testability and scalability.
Package File History
- Python packages make code easier to install, reuse and maintain.
- A single pyproject.toml file is all the configuration required to package your Python project.
- There are multiple standards out there for Python packaging, but pyproject.toml is the current recommended way.
Accessing Packages
- pip is the most common tool used to download and install Python packages from PyPI.
- PyPI is an online package repository to which users can upload their packages for others to use.
- pip can also install a package from source code held on your local system (installing from source).
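The two uses of pip described above can be sketched as a small helper that builds the command line. This is only an illustration: the helper name pip_install_command and the example package name are assumptions made here, and only "pip install <name>" and "pip install <path>" are pip's own interface.

```python
import sys


def pip_install_command(target):
    """Build a pip command for `target` (hypothetical helper, not a pip API).

    `target` is either a package name on PyPI (for example "requests") or a
    path to a local project directory containing a pyproject.toml (".").
    """
    # Running pip as "python -m pip" installs into the same interpreter
    # that is running this script.
    return [sys.executable, "-m", "pip", "install", target]


if __name__ == "__main__":
    # Install from PyPI by name.
    print(" ".join(pip_install_command("requests")))
    # Install from source: "." is the project in the current directory.
    print(" ".join(pip_install_command(".")))
```

The commands are only printed here; passing one of the lists to subprocess.run(cmd, check=True) would actually run it.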
Creating Packages
- A package can be built with as few as two files: a Python script and a configuration file.
- pyproject.toml files have two key tables, [build-system] and [project].
- Editable installs allow for quick and easy package development.
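The two-file layout described above can be sketched as a short script that writes both files into a directory. The project name example-pkg, the version 0.1.0, the module contents, and the choice of setuptools as the build backend are all assumptions made for illustration; other backends would use different [build-system] values.

```python
import tempfile
from pathlib import Path

# Minimal configuration with the two key tables. The name, version, and
# backend below are example values, not requirements.
PYPROJECT_TOML = """\
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "example-pkg"
version = "0.1.0"
"""

# A single-module "package": one Python script with one function.
MODULE_SOURCE = '''\
def greet(name):
    """Return a short greeting."""
    return f"Hello, {name}!"
'''


def scaffold(root):
    """Write the two files of a minimal package into `root`."""
    root = Path(root)
    (root / "pyproject.toml").write_text(PYPROJECT_TOML, encoding="utf-8")
    (root / "example_pkg.py").write_text(MODULE_SOURCE, encoding="utf-8")
    # Return the file names so the caller can see what was created.
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold(tmp))
```

Running "pip install -e ." in a directory laid out like this would perform an editable install, so changes to example_pkg.py take effect without reinstalling.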
Versioning
Versioning is crucial for tracking the development, improvements, and bug fixes of a software package over time. It ensures that changes are documented and managed systematically, aiding in reproducibility and reliability of the software.
Tools like setuptools_scm help automate the version bumping process, reducing manual errors and ensuring that version numbers are updated consistently across all project files. Versioning enables users to track code changes and dependencies, allowing reliable recreation of specific software versions, and further aiding the reproducibility of your software.
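Once a version is recorded in a package's metadata, whether written by hand or derived from Git tags by a tool such as setuptools_scm, it can be read back at runtime with the standard library. The sketch below assumes a hypothetical distribution name, example-pkg, and an arbitrary fallback string.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name, fallback="unknown"):
    """Return the installed version of `dist_name`, or `fallback`."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return fallback


if __name__ == "__main__":
    # "example-pkg" is a hypothetical name used only for illustration.
    print(installed_version("example-pkg"))
```

Recording this value alongside research outputs is one way to note exactly which version of the software produced them.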
Releasing Python Packages
Git tags, which GitHub uses as the basis for releases, provide a way to manage specific software versions, enabling developers to easily reference and distribute stable versions of their software for their users.
Releases allow your software to be quickly and easily installed across different systems.
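As a rough sketch of the tagging step, the helper below builds the Git commands that create an annotated tag and push it to a remote. The helper name, the "v1.2.3" tag format, and the remote name origin are assumptions for illustration; Git itself does not require any particular tag format.

```python
import re

# Accept tags such as "v1.2.3". The "v" prefix is a common convention
# assumed here, not a Git requirement.
TAG_PATTERN = re.compile(r"v\d+\.\d+\.\d+")


def tag_commands(tag, remote="origin"):
    """Return the Git commands that create and publish an annotated tag."""
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"unexpected tag format: {tag!r}")
    return [
        # Create an annotated tag on the current commit.
        ["git", "tag", "-a", tag, "-m", f"Release {tag}"],
        # Push only this tag to the remote.
        ["git", "push", remote, tag],
    ]


if __name__ == "__main__":
    for cmd in tag_commands("v0.1.0"):
        print(" ".join(cmd))
```

Once the tag has been pushed, a GitHub release can be created from it in the repository's web interface.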
Publishing Packages
You can easily publish your package on PyPI for the wider Python community, allowing your users to simply install your software using pip install.
The University of Sheffield’s ORDA repository is another valuable platform to upload your software, further enabling software reproducibility, transparency, and research impact for all project collaborators involved.
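The PyPI route can be sketched as two commands: one to build the distribution files and one to upload them. The sketch assumes the third-party tools build and twine are installed, and the helper name publish_commands is hypothetical. It defaults to TestPyPI, a separate index intended for trial uploads.

```python
import sys


def publish_commands(test_index=True):
    """Return commands to build and upload a package (not executed here)."""
    # "python -m build" needs the third-party "build" package and writes
    # a source distribution and a wheel into dist/.
    build = [sys.executable, "-m", "build"]
    # twine, also third-party, uploads the built files. With
    # "--repository testpypi" the upload goes to TestPyPI instead of PyPI.
    upload = [sys.executable, "-m", "twine", "upload"]
    if test_index:
        upload += ["--repository", "testpypi"]
    upload.append("dist/*")
    return [build, upload]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and run in a shell.
    for cmd in publish_commands():
        print(" ".join(cmd))
```

Uploading requires an account and credentials on the chosen index; calling publish_commands(test_index=False) gives the command for PyPI itself.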