FAIR4RS Tools for Reproducibility: Packaging: All in One View

Content from Software Packaging

Last updated on 2025-12-09

Estimated time: 12 minutes

Overview

Questions

What is software packaging?
How is packaging related to reproducibility and the FAIR4RS principles?
What does packaging a Python project look like?

Objectives

Recognise the importance of software packaging to ensure reproducibility.
Understand the basic building blocks of a Python package.

Introduction

One of the most challenging aspects of research is reproducibility. To support it, both research data and research software need to adhere to a set of guidelines that better enable open research practices across all disciplines. The recent adaptation of the original FAIR principles (Findable, Accessible, Interoperable, Reusable) means that research software can now benefit from the same general framework as research data, whilst accounting for the inherent differences between the two, including software versioning, dependency management, writing documentation, and choosing an appropriate license. This adaptation is commonly referred to as the FAIR4RS principles.

Discussion

Can you recall a time when you have used someone else’s software but encountered difficulties in reproducing their results? What challenges did you face and how did you overcome them?

Software packaging is one of the core elements of reproducible research software. In general, software packaging encompasses the process of collecting and configuring software components into a format that can be easily deployed on different computing environments.

Figure 1: A software package is like a box containing all the items you need for a particular activity, neatly packed together to transport to someone else.

Callout

Think about what a package is in general; you typically have a box of items that you want to post to someone else in the world. But before you post it for others to use, you need to make sure the package has things like: an address label, an instruction manual, and protective material.

Challenge

Challenge 1: Packaging Analogy

Using the analogy in the callout above, provide an example for each package attribute in terms of the software attribute.

Give me a hint

The above callout should help you think about a few different possible analogies.

Show me the solution

Box of items: The software itself (source code, data, images).
Address label: Installation instructions specifying the target system requirements (operating system, hardware compatibility).
Instruction manual: User documentation explaining how to use the software effectively.
Protective materials: Error handling routines, data validation checks to safeguard the software from misuse or unexpected situations.

Why Should You Package Your Software?

As we’ve touched on above, there are several benefits to packaging your software:

Ease of installation: Instead of manually copying individual files and setting configurations, users can install the collective package using a package manager with very few steps involved.
Standardisation: Packaging enforces a standard structure and format for software, making it easier for users and developers to understand.
Research impact and collaboration: From a research impact perspective, software packaging ensures reproducibility, accessibility, and ease of dissemination to the wider research community.

If you decide that packaging your software is right for your project, there are some important questions that you, as the developer, should consider before getting started:

Target Users: Who are you building this package for? Beginners, experienced users, or a specific domain? This will influence the level of detail needed in the documentation and the complexity of dependencies you may need to include.
Dependencies: What other Python libraries does your package rely on to function? What about hardware dependencies? Finding the right balance between including everything a user may need and keeping the package lightweight is important.
Testability: How will users test your package? You should consider including unit tests and examples to demonstrate usage and ensure your code functions as expected.
Scalability: As your project and software grows in size and complexity, how can you effectively handle the increased modules, dependencies and distribution requirements?

Of course, this is not an exhaustive list. However, once you have thought about candidate solutions for these questions, you’re in a good position to start packaging your software.

Packaging in Python

The most basic directory structure of a Python package looks something like:

📦 my_project/
├── 📂 my_package/
└── 📄 pyproject.toml

where

- 📦 `my_project/` is the root directory of the project.
- 📂 `my_package/` is the package directory containing the source code.
- 📄 `pyproject.toml` is a configuration file for setting up the package, containing basic metadata.

Tools such as setuptools and pip use the pyproject.toml file to configure how the package is built, distributed, and installed. We’ll cover pyproject.toml in more depth in the next couple of episodes.

Optional: What is __init__.py?

At this point, it’s worth discussing the __init__.py file, which you might have come across before when packaging. Historically, the __init__.py script has been used to mark a directory as a Python package, allowing the contained modules to be imported. (Note: the double underscores in the name, often abbreviated to “dunder”, signal that the file has a special meaning to Python, helping distinguish it from ordinary scripts.) It can also contain any initialisation code for the package. The directory’s structure might look something like:

📦 my_project/
├── 📂 my_package/
│   └── 📄 __init__.py
└── 📄 pyproject.toml

For instance, consider the times you have imported a package, such as numpy. The ability to write:

PYTHON

import numpy

is enabled by the modular structure of the numpy package. This includes the presence of the __init__.py file, which signals to Python that the directory is a package, allowing you to import its contents using the import statement. When the import numpy statement runs, Python searches for the numpy package on its search path (sys.path) and loads its contents into the namespace under the name numpy. Packages that follow the folder structure above are often referred to as regular packages.
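As a minimal sketch of that lookup, the snippet below asks Python's import system where a package would be loaded from. It uses the standard library's json package as a stand-in for numpy (an assumption made so that the example runs without any third-party install); for a regular package, the reported origin should be its __init__.py file.

```python
import importlib.util
import os
import sys

# Ask the import system to locate a package without importing it.
# "json" is a stand-in for numpy; any installed regular package would do.
spec = importlib.util.find_spec("json")

# For a regular package, the spec's origin is the path to its __init__.py.
init_file = os.path.basename(spec.origin)
package_dir = os.path.dirname(spec.origin)

print("Number of entries on sys.path:", len(sys.path))
print("Package directory:", package_dir)
print("Initialisation file:", init_file)
```

Swapping "json" for "numpy" should behave the same way in an environment where numpy is installed.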

However, Python 3.3 introduced the concept of implicit namespace packages (see PEP 420). Namespace packages are commonly used to split a single Python package across multiple directories, and as a consequence the __init__.py file is technically no longer required for Python to import a directory as a package. For the purposes of this course, we will omit the __init__.py script.
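To see this behaviour in practice, the sketch below builds a throwaway package directory without an __init__.py file and imports a module from it. The names demo_namespace_pkg and greetings are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a package directory that deliberately has no __init__.py.
    pkg_dir = Path(tmp) / "demo_namespace_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "greetings.py").write_text("def hello():\n    return 'hello'\n")

    # Make the temporary directory importable, then import the module.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_namespace_pkg.greetings")
        message = module.hello()
        # A namespace package has no __init__.py, so it has no file of its own.
        package = sys.modules["demo_namespace_pkg"]
        package_file = getattr(package, "__file__", None)
    finally:
        sys.path.remove(tmp)

print("Imported without __init__.py:", message)
print("Package __file__:", package_file)
```

The import succeeds even though the directory holds no __init__.py, which is why the rest of this course can leave the file out.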

Challenge

Challenge 2: Improving your project’s packaging

The directory structure of the basic Python package shown above is a good starting point, but it can be improved. From what you have learned so far, what other files and folders could you include in your package to provide better organisation, readability, and compatibility?

Give me a hint

Use the emoji folder structure above to get you started.

Show me the solution

A possible improvement could be to add the following to your package:

📦 my_project/
├── 📂 my_package/
├── 📂 tests/
├── 📄 pyproject.toml
├── 📄 README.md
└── 📄 LICENSE

The most obvious way to improve the package structure is to include a series of unit tests in a tests directory to demonstrate usage and ensure your code functions as expected. The main benefit of a README.md file is to provide essential information and guidance about a project to users, contributors, and maintainers in a concise and easily accessible format. Similarly, the purpose of a LICENSE file is to specify the licensing terms and conditions under which the package’s code and associated assets are made available to others for use, modification, and distribution.

Although we have touched on the core concepts of packaging in Python, including how to set one up using the pyproject.toml configuration file, we still need to learn how to write the metadata and logic for building a package. The next episode of this course provides a brief overview of the history of Python packaging, and what is required to turn your own project into a package.

Key Points

Reproducibility is an integral concept in the FAIR4RS principles. Appropriate software packaging is one way to account for reproducible research software, which involves collecting and configuring software components into a format deployable across different computer systems.
Software packaging is akin to packing a box for shipment. Attributes such as the software source code, installation instructions, user documentation, and test scripts all help to ensure reproducibility.
The purpose of a software package is to install source code for execution on various systems, with considerations including target users, dependencies, testability and scalability.

Content from Package File History

Last updated on 2025-12-09

Estimated time: 11 minutes

Overview

Questions

What is required to turn your Python project into a package?
Why are there so many file types you can use to create packages in Python?
Which file type is the most appropriate for my project?

Objectives

Learn the difference between a Python project and a Python package
Understand the prerequisites for turning your project into a package
Explain the different ways of creating a Python Package
Understand the shortfalls of the previous packaging standards

Introduction

In this episode we are going to look at what turns your project of Python code into a Python package. Throughout Python’s development there have been many different ways of doing this, and we will explore some of them. This will both build an understanding of why the current standard is what it is and give you some context if you ever come across the other methods when looking at other projects.

Callout

What Python packaging files exist?

requirements.txt
setup.py
pyproject.toml

Requirements.txt

A requirements.txt is a text file where each line represents a package or library that your project depends on. A package management tool like pip can use this file to install all the necessary dependencies.

While a requirements.txt file isn’t normally used directly for packaging, it’s a simple and common file type that offers some of the features that the packaging files do.

requests==2.26.0
numpy>=1.21.0
matplotlib<4.0
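To make the structure of each line concrete, here is a rough sketch that splits the three requirement lines above into a package name and a version specifier. The function parse_requirements and its regular expression are hypothetical simplifications for illustration; pip's own parser handles many more cases (extras, environment markers, URLs) than this does.

```python
import re

# A simplified pattern: a package name followed by an optional version specifier.
REQUIREMENT_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*?)\s*$")


def parse_requirements(text):
    """Return (name, version_specifier) pairs from requirements-style text."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = REQUIREMENT_PATTERN.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


example = """requests==2.26.0
numpy>=1.21.0
matplotlib<4.0
"""

parsed = parse_requirements(example)
for name, specifier in parsed:
    print(f"{name}: {specifier}")
```

Each specifier constrains which versions may be installed: == pins an exact version, >= sets a minimum, and < sets an upper bound.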

Setup.py

Before the introduction of pyproject.toml, the main file used for building and installing packages was setup.py. As the extension suggests, a setup.py is a Python file that contains the metadata and logic for building your package.

Challenge

Setup.py problems

Q: Discuss with each other what problems, if any, there may be with using a Python file to create Python packages.

Give me a hint

Think about the differences between a code file and a text file

Show me the solution

As setup.py is a code file, there is a potential for malicious code to be hidden in it if the file comes from an untrusted source
There is quite a bit of ‘boilerplate’ in each file
The syntax has to be precise and may be confusing to those not familiar with Python

PYTHON

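from pathlib import Path

from setuptools import setup

# Read the long description from the README, closing the file afterwards.
long_description = Path('README.md').read_text(encoding='utf-8')

setup(
  name='fibonacci',
  version='1.0.0',
  description='A package to calculate the fibonacci sequence',
  long_description=long_description,
  author='John Doe',
  author_email='john.doe@example.com',
  license='MIT',
)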

Pyproject.toml

Introduced in PEP 518, pyproject.toml is the latest file for packaging a Python project. Like a .cfg file, a TOML file is designed to be easy to read and declarative. It is the current recommended way to package your Python project.

Callout

TOML stands for Tom’s Obvious Minimal Language!

When originally introduced, the pyproject.toml file aimed to replace setup.py as the place to declare build system dependencies. For example, the most basic pyproject.toml would look like this:

TOML

[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools", "wheel"]

Project metadata, however, was still specified in either a setup.py or a setup.cfg file, the latter being preferred.

With the introduction of PEP 621 in 2020, project metadata could also be stored in the pyproject.toml file, meaning you now only need a single file to specify all the build requirements and metadata required for your package! This is still the preferred way in the community.

We will be going into how to make a pyproject.toml file in more detail in one of the next episodes.

TOML

#Build system information
[build-system]
requires = ["setuptools", "wheel"]

#Project Metadata
[project]
name = "my_cool_package"
version = "0.0.0"
description = "A package to do awesome things"
dependencies = ["pandas", "numpy"]

#Config for an external tool
[tool.black]
line-length = 98

Key Points

Python packages make code easier to install, reuse and maintain.
A single pyproject.toml file is all that is required to package your Python project.
There are multiple standards out there for Python packaging, but pyproject.toml is the current recommended way.

Content from Accessing Packages

Last updated on 2025-12-09

Estimated time: 32 minutes

Overview

Questions

What are the different ways of downloading Python packages?
What are package managers?
How can I access my own package?

Objectives

Learn about package managers such as pip
Install packages using pip
Install packages from source

Introduction

Due to Python’s popularity as a language, it is quite likely that you won’t be the first person to set off on solving any particular task. Many others have worked on common problems and then shared their solution in the form of a package, which you can conveniently integrate into your own code and use!

Callout

Popular Packages

Some of the most popular packages you may have heard of are:

NumPy
pandas
TensorFlow
Matplotlib
Requests

To use a package that is installed, you use the import keyword in Python.

PYTHON

# This imports the pandas package and gives it a new name 'pd'.
import pandas as pd 

# Use the package to read a CSV file into a DataFrame
data = pd.read_csv("my_data.csv")

Python Package Index (PyPI)

The Python Package Index or PyPI is an online repository of Python packages hosting over 500,000 packages! While there are alternatives such as conda-forge, PyPI is by far the most commonly used and likely to have all you need.

Discussion

Exercise 1: Explore PyPI

Explore PyPI to get familiar with it. Try searching for packages that are relevant to your research domain or role!

Callout

pip

pip (package installer for Python) is the standard tool for installing packages from PyPI. You can think of PyPI being the supermarket full of packages and pip being the delivery van bringing it to you.

Using pip

pip itself is a Python package that can be found on PyPI. However, it comes preinstalled with most Python installations, for example those from python.org, and is available inside virtual environments.

The most common way to use pip is from the command line. At the top of a package page on PyPI you will find the example line you need to install the package:

python -m pip install numpy

The above will install numpy from PyPI, a popular scientific computing package enabling a wide range of mathematical and scientific functions.

Callout

You may notice a wheel file download during the pip install, for example Downloading numpy-2.0.0-cp312-cp312-win_amd64.whl. A wheel in Python is a prebuilt package format that allows for quicker and more efficient installation, so when it is downloaded your local computer doesn’t need to do any building. The alternative is source files which often take the form .zip or .tar.gz, which when downloaded will then need to be built then installed, which is often far slower.
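The wheel filename itself describes what the file was built for. As an illustrative sketch, the hypothetical helper parse_wheel_filename below splits the example filename into its parts, assuming the common five-part form without the optional build tag.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its parts (assumes no optional build tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # Parts are separated by hyphens: name, version, then three compatibility tags.
    distribution, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "distribution": distribution,
        "version": version,
        "python_tag": python_tag,      # e.g. cp312 means CPython 3.12
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,  # e.g. win_amd64 means 64-bit Windows
    }


info = parse_wheel_filename("numpy-2.0.0-cp312-cp312-win_amd64.whl")
for key, value in info.items():
    print(f"{key}: {value}")
```

pip uses these tags to choose a prebuilt wheel that matches your interpreter and operating system, falling back to a source distribution when none fits.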

Discussion

Exercise 2: Create venv and install NumPy

Step 1: Create a venv in the .venv directory using the command python -m venv .venv and activate it with the command for your operating system.

Windows:

.\.venv\Scripts\activate

macOS / Linux:

source .venv/bin/activate

When activated, you should see the name of your environment in brackets at the start of your terminal line.

Step 2: Install NumPy into your new environment

Step 3: Check your results with python -m pip list

Step 4: Deactivate your environment with deactivate
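Besides the prompt prefix, you can check from within Python whether an environment is active. The sketch below is one common way to do so, comparing sys.prefix with sys.base_prefix; the function name in_virtual_environment is our own.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Running this before and after Step 4 should show the result change once the environment is deactivated.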

Virtual Environments

Check out this documentation or the FAIR4RS course on virtual environments to learn more!

pip can also be used to install packages from source. This means that the package file structure (source) is on your local computer and pip installs it using the instructions from the setup.py or pyproject.toml file. This is especially handy for packages that are not on PyPI, like ones downloaded from GitHub, or for your own packages that you’re developing.

Windows:

python -m pip install .

macOS / Linux:

python3 -m pip install .

Instructor Note

It may be worth checking with the class at this point that they are all familiar with the -m notation, and what the above command does exactly

Here the . means to install your current directory as a Python package. For this to work the directory your command line interface is currently in needs to have a packaging file, i.e. setup.py or pyproject.toml.
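Before running the install command, it can help to confirm that the directory really contains a packaging file. The following is a small sketch of such a check; find_packaging_files is a hypothetical helper and not part of pip.

```python
from pathlib import Path

# The packaging files that pip looks for when installing from a directory.
PACKAGING_FILES = ("pyproject.toml", "setup.py")


def find_packaging_files(directory="."):
    """Return the names of the packaging files present in `directory`."""
    root = Path(directory)
    return [name for name in PACKAGING_FILES if (root / name).is_file()]


found = find_packaging_files(".")
if found:
    print("Packaging file(s) found:", ", ".join(found))
else:
    print("No pyproject.toml or setup.py here, so 'pip install .' would fail.")
```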

Key Points

pip is the most common tool used to download and install Python packages from PyPI.
PyPI is an online package repository which users can choose to upload their packages to for others to use.
pip can also be used to install packages that are on your local system (installing from source).

Content from Creating Packages

Last updated on 2025-12-09

Estimated time: 12 minutes

Overview

Questions

Where do I start if I want to make a Python package?
What will I need / want in my package?
What’s considered good practice with packaging?

Objectives

Create and build a basic example Python package
Understand all the parts and decisions in making the package

Introduction

This episode will see us creating our own Python project from scratch and installing it into a Python virtual environment ready for use. If you’re feeling adventurous, feel free to create your own package content; otherwise, follow along with this example of a Fibonacci counter.

Fibonacci Counter

This package will allow a user to find any value from the Fibonacci sequence. The Fibonacci sequence is a series of whole numbers where each number is the sum of the two previous numbers. The first 8 numbers of the sequence are 0, 1, 1, 2, 3, 5, 8, 13.

Callout

Reinventing the wheel

It is good to ask yourself if the package or features you are designing have been done before. Obviously we have chosen a simple function as the focus of this episode is on packaging code rather than developing novel code.

Creating the package contents

In this section we will go through creating everything required for the package.

Challenge

What files and content go into a package?

Think back to the earlier episodes and try to recall all the things that can go into a package.

Show me the solution

Python Module - This is the directory with the Python code that does the work.
Configuration File - e.g. your pyproject.toml file
Other metadata files - e.g. LICENSE, README.md, citation.cff
Python Tests - A directory full of unit-tests and other tests

In this episode we will only be creating a minimal example, so many of the files you have thought of won’t be included. Next we will create our directory structure. In your documents folder if you are on Windows, or your home directory if you are on macOS or Linux, create a folder called my_project with the following structure:

📦 my_project/
├── 📂 my_package/
│   └── 📄 fibonacci.py
├── 📄 pyproject.toml
└── 📂 tests/
    └── 📄 test_fibonacci.py

The first thing we will do in this project is create the Python module (the actual code!).

Discussion

Creating Python module

Create a Python file called fibonacci.py as shown in the structure above.
Add the following code, which contains a function that prints the first n_terms values of the Fibonacci sequence

PYTHON

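def fibonacci(n_terms):
    """Print the first n_terms values of the Fibonacci sequence."""
    num1 = 0        # current value in the sequence
    num2 = 1        # value after the current one
    next_num = 1    # sum of the two values above
    count = 0       # number of values printed so far

    while count < n_terms:
        print(num1)
        count += 1
        # Shift along by one position in the sequence.
        num1, num2 = num2, next_num
        next_num = num1 + num2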

Challenge

Using your Python module

Create a script in your project directory that imports and uses your Fibonacci script. This will serve as a good quick test that it works.

Show me the solution

Create the file in /my_project, for example fibonacci_test.py.
Import and run the Fibonacci function:

PYTHON

from my_package.fibonacci import fibonacci

fibonacci(5)
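The directory structure above also contains tests/test_fibonacci.py. A function that prints its values is awkward to test, so the sketch below shows one possible variant that returns a list instead, together with two small test functions. The name fibonacci_list and both tests are hypothetical and not part of the lesson's package.

```python
def fibonacci_list(n_terms):
    """Return the first `n_terms` Fibonacci numbers as a list."""
    sequence = []
    current, following = 0, 1
    for _ in range(n_terms):
        sequence.append(current)
        current, following = following, current + following
    return sequence


def test_first_eight_terms():
    # Matches the first eight values quoted earlier in this episode.
    assert fibonacci_list(8) == [0, 1, 1, 2, 3, 5, 8, 13]


def test_zero_terms():
    # Asking for no terms should give an empty list.
    assert fibonacci_list(0) == []


# Run the checks directly; a test runner such as pytest could collect them too.
test_first_eight_terms()
test_zero_terms()
print(fibonacci_list(8))
```

Returning a value rather than printing keeps the function reusable by other code and lets a test compare the result directly.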

Configuration File

In this section we are going to look deeper into the pyproject.toml. Sections in a .toml file are called tables. In a pyproject.toml file there are 2 tables required for a minimum working pyproject.toml: a [build-system] table and a [project] table. Take a look at the minimum example pyproject.toml below.

TOML


[build-system]
requires = ["setuptools"]


[project]
name = "fibonacci"
version = "0.1.0"
description = "A package to calculate the fibonacci sequence"
dependencies = ["pandas", "numpy"]

[build-system]

The [build-system] table specifies the information required to build your project directory into a package. The main key in this table is requires, which states what build tool(s) should be used to do this building. There are multiple popular build tools that can be used to build your project; in this tutorial we will use setuptools, as it is simple and very popular.

[project]

The [project] table is where your package’s core metadata is declared.

Callout

pyproject.toml documentation

The full list of accepted keys can be found here in the documentation

Challenge

Create your configuration file

Create a pyproject.toml file with the two required tables. In the [project] table include the following keys:

name
version
description
authors
keywords

Show me the solution

TOML

[build-system]
requires = ["setuptools"]

[project]
name = "fibonacci"
version = "0.1.0"
description = "A package which can produce the Fibonacci sequence"
authors = [{name = "your_name", email="youremail@email.com"}]
keywords = ["fibonacci", "maths"]

Running python -m pip install . will install your package. Just ensure your terminal’s working directory is the directory that contains the pyproject.toml file!

Callout

Editable Install

When installing your own package locally, there is an option called editable, or -e for short: python -m pip install -e .

With a default installation (without -e), any changes to your source package will only appear in your Python environment when your package is rebuilt and reinstalled. The editable option allows for quick development of a package by removing the need to reinstall it after each change. For this reason it is sometimes called development mode!

Key Points

A package can be built with as few as two files: a Python script and a configuration file
pyproject.toml files have 2 key tables, [build-system] and [project]
Editable installs allow for quick and easy package development

Content from Versioning

Last updated on 2025-12-09

Estimated time: 12 minutes

Overview

Questions

Why is versioning essential in software development? What problems can arise if versioning is not properly managed?
How can automation tools, such as those for version bumping, improve the software development process?
Why is it important to maintain consistency and transparency in software releases?

Objectives

Explain why versioning is crucial for software development, particularly in maintaining reproducibility and ensuring consistent behaviour of the code after changes.
Understand how to use setuptools_scm for automating version bumping in Python projects.

Introduction

In previous episodes, we developed a basic Python package to demonstrate the importance of software reproducibility. However, a crucial question that we haven’t addressed yet is: how can we, as the developers, ensure that a change in our package’s source code does not result in the code failing or behaving incorrectly? This is also an important consideration for when you are releasing your package.

Discussion

One of the pitfalls of packaging is to fall into poor naming conventions, even for scripts. For instance, how many times have you worked on scripts that were named my_script_v1.py or my_script_final_version.py? What were your main challenges with this approach, and what alternative solutions can you think of to avoid it?

Semantic Versioning

The answer to the question above is based on a concept called versioning. Versioning is the practice of assigning unique version numbers to different states or releases of a given package to track its development, improvements, and bug fixes over time. The most popular approach for Python packaging is to use the Semantic Versioning framework, which can be summarised as follows:

Given a version number X.Y.Z, where X is the major version, Y is the minor version and Z is the patch version, you increment:

X when you make incompatible API changes,

Y when you add functionality in a backwards compatible manner,

Z when you make backwards compatible bug fixes.
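The three rules above can be sketched as a small function. This is an illustration only: bump_version is a hypothetical helper that handles plain X.Y.Z strings and ignores pre-release or build suffixes.

```python
def bump_version(version, part):
    """Return `version` (an 'X.Y.Z' string) with the given part incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("major", "minor", "patch"):
    print(part, "->", bump_version("1.2.3", part))
```

Note that incrementing a number resets every number to its right to zero.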

Callout

Recall: API

An Application Programming Interface (API) is the name given to the way different programs, or parts of a program, communicate with each other. It provides a set of functions and methods that can be used to interact with a piece of software or a data service. Commonly, APIs are used within web-based applications to enable users to receive information from a given service, such as logging into social media accounts, creating weather widgets, or finding geographical locations.

The first version of any package typically starts at 0.1.0, and any changes following the semantic versioning rules above result in an increment to the appropriate version number. For example, updating a package from version 0.1.0 to 1.0.0 is called a major release. Version 1.0.0 is commonly referred to as the first stable release of the package.

An important point to highlight is that the semantic versioning guidance above is a general rule of thumb. Exactly when to bump the version of your package is a decision for you, as the developer. Developers typically take the size of the project into account; for example, small packages may have a patch release for every individual bug that is fixed. On the other hand, larger packages often group multiple bug fixes into a single patch release, because making a release for every fix would result in a myriad of releases, which can be confusing for users and other developers. The table below shows 3 examples of major, minor and patch releases developers made for Python.

| Release Type | Version Change | Description |
|---|---|---|
| Major Release | 2.0.0 to 3.0.0 | Introduced significant and incompatible changes, such as the print function and new syntax. |
| Minor Release | 3.7.0 to 3.8.0 | Added new features like the walrus operator and positional-only parameters, backward-compatible. |
| Patch Release | 3.8.0 to 3.8.1 | Fixed bugs and made performance improvements without adding new features or breaking changes. |

Table 1: Examples of major, minor and patch releases of Python.

Callout

Pre-release Versions

Pre-release versions in semantic versioning are versions of the software that are still in development or testing before a stable release. They are denoted by appending a hyphen and a series of dot-separated identifiers to the version number, such as 1.0.0-alpha or 1.0.0-beta.1. These versions allow developers to release early versions for testing and feedback while clearly indicating their status.
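As a rough illustration of that format, the sketch below separates a version string into its release numbers and its pre-release identifiers. The helper split_prerelease is hypothetical and deliberately simple: it does not handle build metadata or validate its input.

```python
def split_prerelease(version):
    """Split 'X.Y.Z-identifiers' into a release tuple and a tuple of identifiers."""
    # Everything after the first hyphen is the pre-release part.
    core, _, prerelease = version.partition("-")
    release = tuple(int(piece) for piece in core.split("."))
    identifiers = tuple(prerelease.split(".")) if prerelease else ()
    return release, identifiers


for example in ("1.0.0-alpha", "1.0.0-beta.1", "1.0.0"):
    release, identifiers = split_prerelease(example)
    status = "pre-release" if identifiers else "stable release"
    print(example, "->", release, identifiers, status)
```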

Callout

Once we publicly release a version of our software, it is crucial to maintain consistency and avoid altering it retroactively. Any necessary fixes need to be addressed through subsequent releases, typically indicated by an increment in the patch number. For instance, Python 2 reached its final version, 2.7.18, in 2020, more than a decade after the release of Python 3.0. If the developers decided to discontinue support for an older version, leaving vulnerabilities unresolved, they would have to transparently communicate this to their users and encourage them to upgrade.

Challenge

Challenge 1: Semantic Versioning Decision Making

Imagine you are a developer working on a Python library called DataTools, which provides various utilities for data manipulation. The library uses semantic versioning and is currently at version 1.2.3. You have implemented a new feature that adds support for reading and writing CSV files with custom delimiters.

According to semantic versioning, should you bump the version to 1.3.0, 1.2.4, or 2.0.0? Explain your reasoning.

Give me a hint

Think about whether the new feature introduces any breaking changes for existing users.

Show me the solution

According to semantic versioning, since the new feature adds functionality in a backward-compatible manner, the version should be bumped to 1.3.0. This signifies a minor version increase.

Callout

Versioning vs Version Control

Note: although they share similarities, you should not confuse software versioning with version controlling your software. The table below outlines some similarities and differences to help you differentiate them:

| Aspect | Version Control | Versioning |
|---|---|---|
| Purpose | Tracking changes, enhancing collaboration, and maintaining a history of revisions | Differentiating between various stages of software development or releases, ensuring clear identification of updates and changes |
| Features | Branching, conflict resolution, merging | Version numbering, compatibility guidelines, and release notes |
| Example | Git | Semantic Versioning |
| Benefits | Collaboration, code integrity, and project management | Communication of changes (major, minor, patch), transparency, and compatibility |
| Challenges | Managing conflicts and merges with multiple contributors, ensuring training for teams, and integrating within existing processes | Ensuring backward compatibility and avoiding confusion with version numbers that accurately reflect the changes |

Dynamic Versioning using setuptools_scm

At this point, you might be thinking: “Do I have to manually update the version number everywhere for every release?” Thankfully, the answer is no. The version number associated with your package will typically appear in multiple locations within your project, for example, in your .toml file and separately in your documentation. Manually updating every location for every release is cumbersome and prone to human error, so you should avoid it. Fortunately, the setuptools library we looked at in previous episodes can help us automate these tasks.

The most natural and simplest solution is to use setuptools_scm (where scm stands for source control management), which is a plugin for the setuptools library. setuptools_scm simplifies versioning by dynamically generating version numbers based on your version control system (e.g. git). It can extract Python package versions from git metadata, so you do not have to declare them manually.

There are 3 changes to your pyproject.toml file required to get your project to build using the version number from the git tag instead of the manual version value:

Add setuptools-scm to the requires list in the [build-system] table
Replace the manual version with a dynamic one
Add the [tool.setuptools_scm] table header to instruct the build process to use setuptools-scm

TOML


[build-system]
requires = ["setuptools", "setuptools-scm"]

[project]
name = "fibonacci"
description = "A package to calculate the fibonacci sequence"
dependencies = ["pandas", "numpy"]
dynamic = ["version"]

[tool.setuptools_scm]

Once you’ve set up setuptools_scm in your pyproject.toml file, the workflow to version bump your package will look like:

Commit your changes:
- Add your changes and commit them in your Git repository.
  BASH
```
git add .
git commit -m "Your changes"
```
Tag a new version:
- Create a new Git tag to bump the version (e.g., from v1.0.0 to v1.1.0):
  BASH
```
git tag v1.1.0
```
Push the tag to your remote repo (e.g. GitHub):
- Push the new tag to your remote Git repository, if applicable:
  BASH
```
git push origin v1.1.0
```

Rebuild/reinstall your package:
- When you build or install the package, setuptools-scm will automatically pick up the new version from the Git tag.
  BASH
```
pip install .
```
Following this, you can confirm the new version by checking what pip reports for your package (for example with pip show fibonacci), or by printing your package’s __version__ attribute if it defines one. Ultimately, this means that the version recorded in your package metadata is derived from your Git tags, so the two stay synchronised without manually editing pyproject.toml or any other file. When users or other developers work with your package, they can accurately track code changes and dependencies, allowing them to reliably recreate specific versions of your software at any point in time.
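One way to confirm which version was installed from Python itself is the standard library's importlib.metadata module, which reads the metadata written at build time. The sketch below assumes the distribution is named fibonacci, as in the pyproject.toml above; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return "not installed"


# "fibonacci" is the project name used in this episode's pyproject.toml.
print(installed_version("fibonacci"))
```

Because this reads the installed metadata rather than a hard-coded string, it reports whatever version was derived from the Git tag at the time the package was built.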

Callout

Although setuptools_scm is the most common and simplest tool for dynamic versioning, there are many alternatives that you may consider in your own project. These include:

PDM - this is a Python package and dependency manager, which also supports the latest PEP standards.
Hatch - used for managing and publishing Python projects, and handling virtual environments and versioning. It’s also best suited for multi-environment and cross-version testing setups.
Rye - a lightweight Python project manager designed to simplify dependency management and virtual environments. This is generally ideal for users looking for a streamlined workflow.

Key Points

Versioning is crucial for tracking the development, improvements, and bug fixes of a software package over time. It ensures that changes are documented and managed systematically, aiding in reproducibility and reliability of the software.
Tools like setuptools_scm help automate the version bumping process, reducing manual errors and ensuring that version numbers are updated consistently across all project files.
Versioning enables users to track code changes and dependencies, allowing reliable recreation of specific software versions, and further aiding the reproducibility of your software.

Content from Releasing Python Packages

Last updated on 2025-12-09

Estimated time: 12 minutes

Overview

Questions

What does releasing software mean?
How can you prepare your software for releasing and publishing on different platforms?

Objectives

Understand the significance of releasing and publishing your software in the context of FAIR4RS.
Learn what other files and data can be released alongside the software.

Preparing to Publish

Now that we have covered the fundamentals of packaging in Python, we can start preparing to publish the package online for others to use. But before we do, we need to make sure our package contains the necessary files. To recap, let’s review the basic directory we created back in episode one, which had the following structure:

📦 my_project/
├── 📂 my_package/
│ └── 📄 __init__.py
├── 📂 tests/
├── 📂 docs/
│ └── 📄 documentation.md
├── 📄 pyproject.toml
├── 📄 README.md
└── 📄 LICENSE

README

Firstly, all packages must contain a README.md file that explains what the project is, how users can install it and how they can use it. A good example of a README.md file may look something like:


# My Python Project

My Python Project is a simple utility tool designed to perform basic operations on text files. Whether you need to count words, find specific phrases, or extract data, this tool has you covered.

## Installation

You can install My Python Project via pip:

$ pip install my-python-project

## Usage

from my_python_project import text_utils

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
word_count = text_utils.count_words(text)
print("Word count:", word_count)

This will output:

Word count: 8

Notice that the README.md should be included at the top level of our project directory. If your package is configured with a pyproject.toml file, the README should also be declared in the metadata by adding the following line:

TOML


[project]
readme = "README.md"

Callout

In the README.md file, developers also usually include a “contributing” section for new contributors, who are typically outside of the project. The purpose of this section is to encourage new developers to work on the project, while ensuring they follow the etiquette set by the project developers. This may look something like:

### Contributing

Contributions to My Python Project are welcome! If you'd like to contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature (git checkout -b feature/new-feature).
3. Make your changes and ensure tests pass.
4. Commit your changes (git commit -am 'Add new feature').
5. Push to the branch (git push origin feature/new-feature).
6. Create a new Pull Request.

Licensing

Following this, it is essential for your software to have a license that makes clear to users what their rights are with regard to usage and redistribution. A license also provides the developer with some legal protection, if needed. There are many different open source licenses available, and it is up to the developer(s) to choose the appropriate one. You can explore alternative open source licenses at www.choosealicense.com. It is important to note that your choice of license may be constrained by the licenses of your dependencies.

The most common license used in open source projects is the MIT license. The MIT license is permissive, which allows users to freely use, modify, and distribute software while providing a disclaimer of liability.

Callout

The MIT License has the following terms:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

As with the README.md, you can also add the license to your pyproject.toml file as:

TOML


[project]
license = {file = "LICENSE"}

Creating Releases of your Software

Once you have prepared all of the material above, you will be in a good position to release your software to an online repository. The most common platform for hosting your software packages is GitHub, which uses git as the underlying version control tool (alternatives include GitLab, Bitbucket and SourceForge).

While the terms releasing and publishing are commonly used interchangeably, in this course, releasing refers to making a version of the software available for download and use, whereas publishing refers to the formal announcement and distribution of the software to a wider audience on a platform or marketplace.

Manual Releasing using Git Tags

On GitHub, it is a relatively simple process to create a release of your software by using Git tags. Git tags are a way of marking a specific point in your repository’s history, which can be used to denote a version that is suitable for others to use. A tag is a fixed reference to a single commit, making it simple to identify specific versions of your software, and tags are commonly named following the Semantic Versioning framework (e.g. v1.0.0). For more information about how GitHub uses tags for software releases, see GitHub’s documentation on releases.
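Since release tags are expected to follow the vMAJOR.MINOR.PATCH convention, it can be useful to check a tag name before creating it. The following sketch shows one way to do that; the function name parse_release_tag is hypothetical, and pre-release suffixes (such as -rc1) are deliberately not handled to keep the example short.

```python
import re

# A release tag of the form vMAJOR.MINOR.PATCH, e.g. v1.0.0
RELEASE_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> tuple[int, int, int]:
    """Split a tag such as 'v1.0.0' into its numeric components."""
    match = RELEASE_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not of the form vMAJOR.MINOR.PATCH")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


print(parse_release_tag("v1.0.0"))  # → (1, 0, 0)
```

Returning integers rather than strings means versions compare correctly, for instance (1, 10, 0) > (1, 9, 0), which is not true of the equivalent string comparison.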

In general, tagging a release is a 2-step process using Git:

Create a tag of a specific point in your software package’s history using the git tag command that is denoted by a specific version, and upload it to your remote repository using git push.
Based on your tag, create a release on GitHub of the relevant files in your repository (usually a zip or tar.gz file), which allows users to download the specific release of your software that corresponds to the time you created your tag.

Collectively, the first step would look something like:

BASH

$ git tag v1.0.0

$ git push --tags

Once you’ve pushed your tag, you can create a release from it on GitHub by following the workflow shown in Figure 1:

*Figure 1: Workflow describing how to release a package on GitHub.*

Callout

Deleting a Release

You can also delete a release if you make an error using the following commands:

BASH

$ git tag -d v1.0.0
$ git push origin :refs/tags/v1.0.0

The first line simply deletes the tag v1.0.0 in your local repository, whereas the second line deletes the v1.0.0 tag from the remote repository named origin. Note that the colon indicates that you are not pushing any new content to replace the tag; instead, you are specifying that the tag should be deleted. Once you have run the lines above, you will receive confirmation that the tag has been deleted.

Challenge

Challenge 1: Should you always delete a release?

Why might it not be advisable to delete a tag or release, and what alternative actions could you consider instead?

Give me a hint

Think about the impact of deleting a tag or release in version control. How might you preserve historical data while managing updates to tags and releases?

Show me the solution

Deleting a tag or release in a version control system can disrupt historical tracking and cause confusion for current and future collaborators. Instead of outright deletion, consider tagging the correct commit with a new version number or marking the tag as deprecated with clear documentation. This maintains historical integrity while clarifying the correct state of the codebase. Additionally, communicating changes effectively with team members ensures everyone understands the correct usage of tags and releases for your project.

Automated Releases using Actions

Before wrapping up this section, it is important to highlight that you can also automate your releases on GitHub using Actions, saving you time and helping you release new versions of your package quickly. GitHub Actions is a CI/CD platform that allows developers to automate parts of their workflows (such as builds, tests and deployments), so we can configure a release pipeline as a workflow file (in YAML format) that runs whenever a matching change, such as a new tag, is pushed to your repository.

Callout

Recall that GitHub uses the .github directory to store configuration files that are specific to GitHub features and integrations, and keeps the repository organised by separating these files from the main source code. Notice that it is common convention that the .github folder is a hidden directory.

The .github/workflows directory is the designated place where GitHub looks for workflow files. By placing your workflow files in .github/workflows, you enable GitHub Actions to automatically detect and run the workflows based on the triggers you specify (such as a push, pull request, or tag creations).

Callout

As a reminder, here are some of the common variables used in GitHub Actions workflow files:

Variable	Description
`name`	Specifies the name of the workflow. It helps identify the workflow in the GitHub Actions UI and in logs.
`on`	Defines the event that triggers the workflow, such as `push`, `pull_request`, `schedule`, or custom events like `workflow_dispatch`.
`jobs`	Contains one or more jobs to be executed in parallel or sequentially. Each job represents a set of steps that run on the same runner.
`runs-on`	Specifies the type of machine or virtual environment where jobs will run, such as `ubuntu-latest`, `windows-latest`, or `macos-latest`.
`steps`	Defines the sequence of tasks to be executed within a job. Each step can be a shell command, an action, or a series of commands separated by newlines (`run`).
`env`	Sets environment variables that will be available to all steps in a job.
`with`	Specifies inputs or parameters for an action or a specific step.
`uses`	Specifies the action to be used in a step. It can refer to an action in a public repository, a published Docker container, or a specific path in the repository.
`id`	Specifies a unique identifier for a step or action output, which can be referenced in subsequent steps or actions (`outputs`).
`secrets`	Allows access to encrypted secrets, such as `GITHUB_TOKEN`, which is automatically generated and scoped to the repository, used for authenticating GitHub API requests.

There are several workflow extensions already present on GitHub that you can use in your configuration file to automate your releases (e.g. action-gh-release). An example workflow file to automatically trigger a new release based on a push could look something like:

YAML


name: Create Release

on:
  push:
    tags:
      - 'v*'  # Trigger on tags starting with 'v'

permissions:
  contents: write  # Allow the workflow to create releases

jobs:
  release:
    name: Create GitHub Release
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4  # Check out the repository's code

      - name: Create Release
        uses: softprops/action-gh-release@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Pass GitHub token to the action
        with:
          tag_name: ${{ github.ref_name }}  # The short tag name, e.g., v1.0.0
          name: Release ${{ github.ref_name }}  # Release name based on the tag
          body: Automated release created by GitHub Actions.  # Release description

Let’s break down what is happening in the above workflow file.

The on.push.tags trigger with the pattern 'v*' ensures the workflow runs only when a tag starting with v is pushed, which matches the v-prefixed Semantic Versioning tags used earlier (e.g. v1.0.0). The permissions block then grants the workflow the write access to repository contents that it needs to create a release.
Next, after setting up the runner environment (ubuntu-latest), the job carries out 2 steps: first, Checkout code uses the actions/checkout action to fetch the repository’s code into the workflow environment. Second, Create Release uses the softprops/action-gh-release action to automate the creation of a GitHub release. The GITHUB_TOKEN environment variable, provided automatically by GitHub (${{ secrets.GITHUB_TOKEN }}), allows the action to perform repository operations like creating releases. The action is configured with the inputs tag_name and name, both derived from the pushed tag (${{ github.ref_name }}), and body, which sets the release description (Automated release created by GitHub Actions.), so that each release is appropriately named and described.

Ultimately, a workflow like this streamlines the process of managing releases by automating tasks that would otherwise require manual intervention as we have demonstrated above. Once you have created a file similar to the one above, you can view the status of the workflow in the Actions tab as usual.
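It is worth being aware of how permissive the 'v*' tag filter is. The sketch below approximates the filter with Python's fnmatch module; this is only an approximation of GitHub's filter pattern syntax (which differs in some details, such as how / is treated), and the function name triggers_release and the example tags are illustrative.

```python
from fnmatch import fnmatchcase


def triggers_release(tag: str, pattern: str = "v*") -> bool:
    """Approximate whether a pushed tag matches the workflow's tag filter."""
    return fnmatchcase(tag, pattern)


for tag in ("v1.0.0", "1.0.0", "v2.0.0-rc1", "vacation-notes"):
    print(f"{tag}: {triggers_release(tag)}")
# v1.0.0: True
# 1.0.0: False
# v2.0.0-rc1: True
# vacation-notes: True
```

Any tag beginning with v triggers the workflow, including ones that are not versions at all. A stricter pattern such as 'v[0-9]*' would exclude the last tag here, at the cost of a slightly less readable filter.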

Challenge

Challenge 2: Automating Releases

You have been tasked with setting up a GitHub Actions workflow to automate the release process whenever a tag is pushed to your repository. Despite configuring the workflow correctly (on: push: tags: - 'v*'), you notice that the release is not being created. Provide a systematic approach to troubleshoot and resolve this issue.

Give me a hint

Does GitHub provide a way to view the output logs for a given workflow?

Show me the solution

There are several different approaches to debug this workflow. The first place to check would be the output log files from the workflow for any errors or warnings related to event triggers - this will give you a good idea where in your file the error may be arising. Since the error in question is likely due to the push itself, the most obvious line to check is on: push: tags: - 'v*' to ensure it correctly triggers on tag pushes starting with v, and ensure Semantic Versioning practices are being followed, and that there are no typographical errors. The second most common fault is that the GitHub token (secrets.GITHUB_TOKEN) used in your workflow has been incorrectly inputted, and/or may have insufficient permissions (permissions: contents: write) to create releases and perform other necessary actions in your repository.

Callout

Remember to never publish any sensitive information, such as passwords, directly on GitHub. Storing sensitive data in your repository makes it publicly accessible (if your repository is public) or easily accessible to anyone with repository access (if private). This can lead to unauthorised access, security breaches, and potential misuse of your code. Instead, you should use GitHub Secrets or environment variables to securely manage sensitive information, ensuring it is kept safe and only accessible by authorised collaborators or workflows.

Key Points

GitHub tags provide a way to manage specific software versions via releases, enabling developers to easily reference and distribute stable versions of their software for their users.
Releases allow your software to be quickly and easily installed across different systems.

Content from Publishing Packages

Last updated on 2025-12-09

Estimated time: 12 minutes

Overview

Questions

How can GitHub’s automation tools help with publishing your software?
What are the benefits of publishing your software on PyPI and ORDA?

Objectives

Learn how to publish your software to PyPI and The University of Sheffield’s ORDA repository.

Publishing your Software

Python Package Index

*Figure 2: Screenshot of the main landing page of PyPI. GitHub.*.

Now that we have covered how to release specific versions of your software, we will turn to how to publish your package on an online repository that allows others to easily install and use your software. PyPI (the Python Package Index) is the official package repository for the Python community. It serves as the central location where developers can publish and share their packages, making them easily accessible to the wider community. When we use pip to install packages from the command line, it fetches them from PyPI by default. Uploading your packages to PyPI is recommended if you want to distribute your projects widely, as it allows other developers to easily find, install, and use your software.

Callout

Developers often use TestPyPI for testing and validating packages before they are officially published on PyPI.

To build and upload our package, there are 2 tools that we need to install and use. The first is build, a command-line tool that builds source distributions and wheel distributions of Python projects based on the metadata specified in the pyproject.toml. The second is twine, the tool we use to securely upload the built distributions to PyPI; it handles tasks like authentication and transfer of the package files.

In practice, the installation and usage of these tools would look something like:

BASH


pip install build 

python -m build

This will create dist/your_project_name-1.0.0.tar.gz (source distribution) and dist/your_project_name-1.0.0-py3-none-any.whl (wheel distribution) in the dist directory; note that recent versions of setuptools replace hyphens in the project name with underscores in these file names. Next, we can use twine to securely upload the built distributions to PyPI:

BASH


pip install twine

We can also run:

BASH


# Check that our files are ready to be uploaded using twine check

twine check dist/*

# Upload our package to TestPyPI first to confirm that everything works

twine upload --repository testpypi dist/*

Once we have confirmed that everything works as expected on TestPyPI, we may proceed with uploading our package to PyPI:

BASH


twine upload dist/*

Finally, once our package is available on PyPI this means that other users can install the package using the command:

BASH


pip install your-project-name
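Looking back at the files that python -m build placed in dist/, the wheel's file name is not arbitrary: it encodes the distribution name, the version, and three compatibility tags that pip uses to decide whether the wheel can be installed on a given system. The sketch below splits a wheel file name into those parts. The names WheelName and parse_wheel_name are hypothetical, and the example file name assumes the project is called your-project-name, with hyphens written as underscores as they are in real wheel file names.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"{filename!r} is not a wheel file")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag may sit between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelName(*parts)


wheel = parse_wheel_name("your_project_name-1.0.0-py3-none-any.whl")
print(wheel.version)       # → 1.0.0
print(wheel.platform_tag)  # → any
```

The tags py3, none and any indicate a pure-Python wheel that works with any Python 3 interpreter on any platform, which is what makes it installable across different systems.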

Automating Publishing to PyPI

Challenge

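Challenge 1: Automate Publishing to TestPyPI

Write a GitHub Actions workflow that automatically builds your package and publishes it to TestPyPI whenever a new tag starting with “v” is pushed to the repository.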

Give me a hint

There are several pre-existing GitHub Actions that you could use, for example: gh-action-pypi-publish

Show me the solution

Create a workflow file in .github/workflows/ called release-to-pypi.yml with the following content:

YAML


name: Publish Python distribution to TestPyPI

on:
  push:
    tags:
      - 'v*'

jobs:
  build:
    name: Build distribution 📦
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        persist-credentials: false
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.x"

    - name: Install pypa/build
      run: >-
        python3 -m
        pip install
        build
        --user
    - name: Install build dependencies
      run: python3 -m pip install build setuptools setuptools_scm
    - name: Build a binary wheel and a source tarball
      run: python3 -m build
    - name: Store the distribution packages
      uses: actions/upload-artifact@v4
      with:
        name: python-package-distributions
        path: dist/

  publish-to-testpypi:
    name: Publish Python 🐍 distribution 📦 to TestPyPI
    needs:
    - build
    runs-on: ubuntu-latest

    environment:
      name: testpypi
      url: https://test.pypi.org/p/<package-name>

    permissions:
      id-token: write  # IMPORTANT: mandatory for trusted publishing

    steps:
    - name: Download all the dists
      uses: actions/download-artifact@v4
      with:
        name: python-package-distributions
        path: dist/
    - name: Publish distribution 📦 to TestPyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        repository-url: https://test.pypi.org/legacy/

The workflow above automatically builds the Python package and publishes it to TestPyPI whenever a new tag starting with “v” is pushed to the repository.

Trusted Publishers

This method makes use of the latest recommended authentication method, called Trusted Publishers. Trusted Publishers is a security enhancement for automating the publishing of Python packages to PyPI, particularly from continuous integration systems like GitHub Actions. Instead of relying on long-lived API tokens, Trusted Publishers uses the OpenID Connect (OIDC) authentication protocol to authenticate and authorise CI/CD workflows, so that no stored token or password is needed for package uploads. When a trusted workflow runs, it generates short-lived OIDC tokens that PyPI verifies, ensuring that only authorised workflows from a specific repository or organisation can publish. Although the details of this method are outside the scope of this episode, we recommend that you read about the Trusted Publishers approach and consider its advantages and disadvantages before applying it to your workflows.

Publishing to ORDA

*Figure 3: Screenshot of the main landing page of Figshare. GitHub.*.

At The University of Sheffield, researchers also use another popular repository called ORDA. ORDA is the University’s main research data repository, facilitating the sharing, discovery, and preservation of the university’s research data. Managed by the University Library’s Research Data Management team and provided by Figshare, ORDA assigns a DOI (Digital Object Identifier) to each deposited record, ensuring its accessibility and citability. Researchers are encouraged to use ORDA unless a subject-specific repository or data center is more commonly used in their field. ORDA supports the FAIR principles, and by extension the FAIR4RS principles, by making research outputs citable, discoverable, and easily accessible to a wider research community. As with PyPI, you should first sign up to ORDA, using your university credentials to create your account.

Demonstration of how to upload software packages to ORDA. — *Figure 4: Demonstration of how to upload data or software sources to ORDA. GitHub.*.

Figure 4 demonstrates how to upload your project to ORDA using their graphical user interface functionality. There is also the option to connect your project’s GitHub account to ORDA, allowing further portability between the two platforms. Once you have published your software on ORDA, it is readily available for other researchers to use and cite in their own research.

Challenge

Challenge 2: DOI and Reproducibility

In a research context, why is it important to cite software releases (for example, via a DOI) alongside academic papers?

Show me the solution

Citing software releases via a DOI alongside academic papers in research is crucial for several reasons. Firstly, it enhances reproducibility by providing clear references to the specific versions of software used in research. This allows other researchers to replicate and verify findings, ensuring the reliability of published results. Secondly, it promotes transparency by documenting the tools and methods used in studies, which is essential for research validation and building upon existing work. Also, citing software releases acknowledges the contributions of software developers and teams, ensuring they receive proper credit for their work, much like authors of academic papers.

Another useful integration feature ORDA enables is the use of their API, which allows developers to further automate their software publishing. As with the example above using PyPI, we can also use GitHub to create a CI/CD workflow that triggers a release to ORDA once we have officially published a release of our software. An example file for this workflow could look something like:

YAML


name: Upload to ORDA
on:
  release:
    types: [published]

jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      - name: Prepare Data Folder
        run: mkdir 'data'
      
      - name: Download Archive
        run: |
          curl -sL "${{ github.event.release.zipball_url }}" > "${{ github.event.repository.name }}-${{ github.event.release.tag_name }}.zip"
          curl -sL "${{ github.event.release.tarball_url }}" > "${{ github.event.repository.name }}-${{ github.event.release.tag_name }}.tar.gz"
      
      - name: Move Archive
        run: |
          mv "${{ github.event.repository.name }}-${{ github.event.release.tag_name }}.zip" data/
          mv "${{ github.event.repository.name }}-${{ github.event.release.tag_name }}.tar.gz" data/
      
      - name: Upload to Figshare
        uses: figshare/github-upload-action@v1.1
        with:
          FIGSHARE_TOKEN: ${{ secrets.FIGSHARE_TOKEN }}
          FIGSHARE_ENDPOINT: 'https://api.figshare.com/v2'
          FIGSHARE_ARTICLE_ID: YOUR_ID_NUMBER
          DATA_DIR: 'data'

The workflow above uploads data to ORDA whenever a release is published. It begins by preparing a data folder, then downloads the zip and tar.gz archives associated with the release tag, moves the downloaded files into the data directory, and finally uses the figshare/github-upload-action integration to upload the data to ORDA using the specified token, endpoint, article ID, and data directory. Note that you can create your own personal Figshare token in your account settings. Importantly, remember that, as with your PyPI credentials, your Figshare token is sensitive and must be passed in as a secret or environment variable.

Finally, once you have uploaded your file sources to ORDA, your software will be readily available for other researchers to use, allowing significant progress towards building a transparent and reproducible research software environment for all involved.

Key Points

You can easily publish your package on PyPI for the wider Python community, allowing your users to simply install your software using pip install.
The University of Sheffield’s ORDA repository is another valuable platform to upload your software, further enabling software reproducibility, transparency, and research impact for all project collaborators involved.
