Documentation strings
Last updated on 2025-03-05
Overview
Questions
- How do we describe our code?
- How can we annotate functions in our research code?
- Why are documentation strings useful for research software?
Objectives
- Understand the purpose of documentation strings
- Learn how to write documentation strings that will be useful for other researchers
- Introduce ways to describe the parameters and return values of functions
How do we describe our code?
Describing functions
If you’re publishing a research software package, one of the most common ways that its users will learn to interact with the code is by reading the documentation for each individual function. We learned about functions in an earlier module on software design principles. Functions help us to break our code into smaller units that have a single purpose.
By documenting those functions effectively, we aim to explain their purpose to future users and maintainers of that code. We also need to describe all the expected inputs and outputs of the functions.
Documentation strings
We describe functions using a feature of many programming languages called documentation strings, often abbreviated to “docstrings”. A documentation string is a piece of text that describes a part of your code and helps other people to use it effectively.
To make a docstring, we write special comments in our code using syntax which is specific to each programming language, although the principle is the same.
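In Python, for instance, a docstring is a string literal written as the first statement inside a function body, and Python stores it on the function's __doc__ attribute. (In R, the usual equivalent is a block of roxygen2 comments beginning with #' placed above the function.) The function below is a hypothetical example chosen only to show the syntax:

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Python keeps the docstring on the function's __doc__ attribute.
print(celsius_to_fahrenheit.__doc__)
```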
Whenever you add functionality to a code project, consider wrapping it up in a function. Writing the docstring first can help you work out the purpose of your new code before you start!
Challenge
Write a documentation string for a function. Create a script called oddsong and define a function named identify() that will be used to identify bird songs: it inspects an audio file and returns the name of the species singing in it.
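One possible answer is sketched below. The file name oddsong.py and the audio_file parameter are assumptions made for illustration, and the function body is deliberately left unimplemented, because the exercise is about the docstring rather than the bird song analysis itself.

```python
"""oddsong.py: a hypothetical script for identifying bird songs."""


def identify(audio_file):
    """
    Identify the bird species singing in an audio recording.

    Inspect the sounds in the given audio file and report which bird
    species the song most likely belongs to. This is intended as a
    starting point for researchers who want to label field recordings.

    audio_file is the path to the recording to analyse, and the function
    gives back the common name of the species as a string.
    """
    # Placeholder: the identification method has not been written yet,
    # so calling this function always raises NotImplementedError.
    raise NotImplementedError("Bird song identification is not implemented yet.")
```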
Viewing docstrings
We can view the documentation string for a function using the ? operator or the help() function in R, and the built-in help() function in Python.
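Besides help(), Python exposes the raw docstring through the __doc__ attribute, and the standard-library function inspect.getdoc() returns it with the indentation tidied up. The mean() function here is a hypothetical example:

```python
import inspect


def mean(values):
    """
    Calculate the arithmetic mean of a list of numbers.

    The values are summed and divided by how many there are.
    """
    return sum(values) / len(values)


# The raw docstring, exactly as written, including indentation.
print(mean.__doc__)

# inspect.getdoc() strips the leading blank line and common indentation.
print(inspect.getdoc(mean))
```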
Challenge
Use the help() function to view the documentation string for a function.
Let’s view the help text for the built-in function abs(), which finds the absolute value of a number.
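In Python this takes a single call (in R, ?abs or help(abs) opens the equivalent help page). The exact wording printed may vary between Python versions:

```python
# Print the help text for the built-in abs() function:
# its signature followed by its docstring.
help(abs)

# The docstring itself is stored on the function's __doc__ attribute.
print(abs.__doc__)
```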
The most important thing to include in a docstring is an explanation of the purpose of the code. To write a useful docstring, put yourself in the shoes of someone who encounters your code for the first time. They need a simple introduction that doesn’t assume too much prior knowledge. The explanation may seem obvious to you, but it may help a new user greatly.
Discussion
How can we tailor our documentation strings to different audiences, such as new users and experienced developers?
Arguments
Next, we describe the inputs to the function: its arguments, also known as parameters.
To do this, we add an “arguments” (often abbreviated to “args”) section to the docstring, which lists each input parameter by name with a brief description.
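As a sketch of what this can look like, here is a hypothetical dilute() function documented in the Google style used by the other examples in this lesson, with each argument listed under an Args: heading and its expected type given in brackets:

```python
def dilute(stock_concentration, dilution_factor):
    """
    Calculate the concentration of a solution after dilution.

    Args:
        stock_concentration (float): Concentration of the stock solution,
            in any unit (for example mol/L).
        dilution_factor (float): How many times the stock is diluted,
            e.g. 10 for a ten-fold dilution. Must be greater than zero.
    """
    # Diluting n-fold divides the concentration by n.
    return stock_concentration / dilution_factor
```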
Challenge
Add a description of each argument to a function in your code.
Run help()
and evaluate the output.
Return values
Finally, we describe the output of the function. The return value is defined by the return statement in our function code block.
This will help the user to understand what the function does and what they can expect to receive back when they call it.
It can also be useful to describe any errors or exceptions that the function may raise if its inputs aren’t as expected, and how to deal with them.
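Extending the same hypothetical dilute() example, a Returns: section describes the output and a Raises: section describes the error raised for invalid input. The input check itself is an assumption added for illustration:

```python
def dilute(stock_concentration, dilution_factor):
    """
    Calculate the concentration of a solution after dilution.

    Args:
        stock_concentration (float): Concentration of the stock solution.
        dilution_factor (float): How many times the stock is diluted.
            Must be greater than zero.

    Returns:
        float: The concentration after dilution, in the same unit as
        stock_concentration.

    Raises:
        ValueError: If dilution_factor is not greater than zero.
    """
    # Reject inputs that would divide by zero or make no physical sense.
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be greater than zero")
    return stock_concentration / dilution_factor
```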
Challenge
Describe the return value of a function in a documentation string. Run help() and evaluate the output.
Usage examples
We can also demonstrate how to use our code by including short code snippets that show how to call the function in different scenarios.
To do this, let’s add an examples section to our documentation string.
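Continuing the hypothetical dilute() example, the Examples: section below is written like an interactive Python session: each line starting with >>> is code to type, and the line after it is the output the user should expect.

```python
def dilute(stock_concentration, dilution_factor):
    """
    Calculate the concentration of a solution after dilution.

    Args:
        stock_concentration (float): Concentration of the stock solution.
        dilution_factor (float): How many times the stock is diluted.

    Returns:
        float: The concentration after dilution.

    Examples:
        >>> dilute(1.0, 10)
        0.1
        >>> dilute(50.0, 2)
        25.0
    """
    return stock_concentration / dilution_factor
```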
Challenge
Write a brief code example within the documentation string of a function in your code.
We can use the code examples inside docstrings to define test cases that are used in automatic software testing.
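In Python, the standard-library doctest module does this: it finds every >>> example in your docstrings, runs it, and reports any example whose actual output differs from the output written in the docstring. A minimal sketch, reusing the hypothetical dilute() function:

```python
import doctest


def dilute(stock_concentration, dilution_factor):
    """
    Calculate the concentration of a solution after dilution.

    Examples:
        >>> dilute(1.0, 10)
        0.1
        >>> dilute(50.0, 2)
        25.0
    """
    return stock_concentration / dilution_factor


if __name__ == "__main__":
    # Find and run every >>> example in this module's docstrings, then
    # print how many examples were attempted and how many failed.
    print(doctest.testmod())
```

The same checks can be run from the command line with python -m doctest your_script.py -v, where your_script.py stands for your own file.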
Best practices
This section contains some tips for writing useful documentation strings.
Prioritisation
Focus on the purpose and functionality of the code, rather than getting bogged down in the details of how it works. Explain what the function does rather than its specific implementation, because the implementation might change over time. A function encapsulates an isolated part of a system, which can be used as a black box by other parts of the system or by the end user, who in many cases only needs to understand its inputs and outputs.
Tips:
- It’s a good idea to start your docstring with a high-level summary of the function.
- If the function is a major one, include a simple introduction for the new user.
Clarity is key
Be concise. Describe the essential information that the user needs to know first, and be brief but clear.
As with any software documentation, avoid jargon where possible.
Discussion
Read the following documentation string, which is very wordy:
PYTHON
def add(x, y):
    """
    Adds two numbers together, which are the x and y arguments of this function.
    This function takes two numbers as input and returns their sum.
    The addition is performed using the built-in `+` operator.
    Args:
        x: The first number to add to the second number, y.
        y: The second number to add to the first number, x.
    Returns:
        The sum of x and y, which are summed using the addition operator.
    """
    return x + y
Discuss how we can effectively convey the purpose and functionality of a function in a docstring without going into excessive detail about its implementation.
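For comparison, one possible more concise version is sketched below. It keeps the summary, arguments and return value but drops the repetition and the implementation detail about the + operator; the (float) type labels are an assumption that the function is meant for numbers.

```python
def add(x, y):
    """
    Add two numbers.

    Args:
        x (float): The first number.
        y (float): The second number.

    Returns:
        float: The sum of x and y.
    """
    return x + y
```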
Don’t reinvent the wheel. Provide links to further resources for users to take a deep dive into more complicated topics.
Discussion
How can we link to external resources that can provide more in-depth information?
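One option is to put a link in the docstring itself, next to the part of the explanation it supports. In the hypothetical sketch below, the docstring names the method and points readers to an external description of the haversine formula instead of explaining the mathematics in full:

```python
import math


def great_circle_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """
    Calculate the distance between two points on the Earth's surface.

    The calculation treats the Earth as a sphere and uses the haversine
    formula. For background on the method, see:
    https://en.wikipedia.org/wiki/Haversine_formula

    Args:
        lat1 (float): Latitude of the first point, in degrees.
        lon1 (float): Longitude of the first point, in degrees.
        lat2 (float): Latitude of the second point, in degrees.
        lon2 (float): Longitude of the second point, in degrees.
        radius_km (float): Radius of the sphere in kilometres. Defaults
            to 6371.0, a commonly used mean radius of the Earth.

    Returns:
        float: The great-circle distance between the points, in kilometres.
    """
    # Convert degrees to radians for the trigonometric functions.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    # Haversine formula for the central angle between the two points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius_km * c
```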
Be consistent. Decide on a docstring style and use it everywhere across your software project. If you’re working on a larger project with multiple developers, refer to its coding conventions and, if in doubt, follow the style of the existing code.
There are several different standards for documentation strings. A standard is a convention that determines how the docstrings will be organised and the syntax that is used to represent the arguments, data types, etc.
A list of documentation string standards in Python:
- The PEP 257 docstring standard was designed by the maintainers of the Python programming language.
- The Google Python Style Guide sets out a docstring format.
- The Sphinx (reStructuredText) docstring format; the numpydoc Sphinx extension adds support for the NumPy style, which was designed for scientific use.
It doesn’t matter which one you select, as long as it’s used consistently across a project and it’s clear what the syntax means. Some standards are better-supported by other tools such as IDEs and documentation generators.
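To make the differences concrete, here is a small hypothetical function documented in the NumPy style (the format that numpydoc understands). In the Google style used elsewhere in this lesson, the same information would sit under Args: and Returns: headings instead:

```python
def add(x, y):
    """
    Add two numbers.

    Parameters
    ----------
    x : float
        The first number.
    y : float
        The second number.

    Returns
    -------
    float
        The sum of x and y.
    """
    return x + y
```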
Automatically generate docstrings
Generative large language model (LLM) services such as Google Gemini can read your code and write docstrings automatically, to a certain extent.
To do this, paste your code into the prompt text box and ask the system to create a docstring. Below is an example prompt and the reply generated by Google Gemini:
Please generate a docstring for this Python function:
def calculate_rectangle_area(width, height):
    area = width * height
    return area
The result is the following docstring, in addition to some helpful descriptions of the content that it generated.
PYTHON
def calculate_rectangle_area(width, height):
    """
    This function calculates the area of a rectangle.
    Args:
        width (float): The width of the rectangle. Must be a positive number.
        height (float): The height of the rectangle. Must be a positive number.
    Returns:
        float: The area of the rectangle. This will be a positive number.
    """
    # Calculate the area
    area = width * height
    return area
This AI-generated content contains a function summary, argument descriptions, and explains the return value as we discussed previously.
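Checking generated content matters even for a small example like this one. The generated docstring says that width and height must be positive, but the function never checks this, so calculate_rectangle_area(-2, 3) would quietly return -6. One way to make the code honour its docstring, sketched here under the assumption that non-positive inputs should be rejected, is to validate the inputs:

```python
def calculate_rectangle_area(width, height):
    """
    Calculate the area of a rectangle.

    Args:
        width (float): The width of the rectangle. Must be a positive number.
        height (float): The height of the rectangle. Must be a positive number.

    Returns:
        float: The area of the rectangle. This will be a positive number.

    Raises:
        ValueError: If width or height is not positive.
    """
    # Enforce the constraint that the docstring promises.
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive numbers")
    return width * height
```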
Challenge
Try asking a generative artificial intelligence service such as Google Gemini to read your code.
- Ask it to generate documentation of different kinds.
- Request a review of your code. What does the bot think?
- Can the chatbot create a diagram to illustrate a concept that is relevant to your research software?
This can save you a lot of time, but as with any LLM-generated content, always check the output and ensure it’s correct!
Discussion
What are the benefits and risks of using a large language model (LLM) service such as Google Gemini or OpenAI ChatGPT to interpret your code and produce content that you use in your research?
How should we critically evaluate this material so that it can be used appropriately to improve the productivity of our research teams without jeopardising our ethics or integrity or causing security risks?
Conclusion
Documentation strings make your code clearer to read and easier for other researchers to use. Also, they make your research software easier to maintain in the long run, saving time and resources. Good docstrings use a clear writing style and everyday language.
Well-documented, reusable research code depends upon good documentation strings. Research collaborators will benefit from clear explanations of the purpose of each function.
Key Points
- Docstrings are special comments that describe the purpose of a function and its inputs and outputs.
- Structure your docstrings to convey more information, with a concise introduction.
- Documentation strings allow you to break your documentation into bite-size chunks, with one overview comment per function.
Further resources
To find out more about documentation strings, please refer to the following resources:
Python
- Python PEP 8 Documentation Strings
- NumPy style guide describes the syntax and best practices for docstrings in the NumPy project.
R
- Function documentation in R Packages by Hadley Wickham