Documentation strings

Last updated on 2025-03-05

Overview

Questions

  • How do we describe our code?
  • How can we annotate functions in our research code?
  • Why are documentation strings useful for research software?

Objectives

  • Understand the purpose of documentation strings
  • Learn how to write documentation strings that will be useful for other researchers
  • Introduce ways to describe the parameters and return values of functions

How do we describe our code?


Describing functions

If you’re publishing a research software package, one of the most common ways that its users will learn to interact with the code is by reading the documentation for each individual function. We learned about functions in an earlier module on software design principles. Functions help us to break our code into smaller units that have a single purpose.

By documenting those functions effectively, we aim to explain their purpose to future users and maintainers of that code. We also need to describe all the expected inputs and outputs of the functions.

Documentation strings


We describe functions by using a feature of many programming languages called documentation strings, often abbreviated to “docstrings”. A documentation string is a piece of text that describes a part of your code and helps other people to use it effectively.

To make a docstring, we write special comments in our code using syntax which is specific to each programming language, although the principle is the same.
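As a minimal sketch in Python, a docstring is a string literal placed as the first statement of a function body. The function `celsius_to_kelvin` is a hypothetical example and is not part of this lesson's code.

```python
def celsius_to_kelvin(temperature):
    """Convert a temperature from degrees Celsius to kelvin."""
    # Hypothetical example function used only for illustration.
    return temperature + 273.15


# Python stores the docstring on the function object itself.
print(celsius_to_kelvin.__doc__)
```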

Whenever you add functionality to a code project, consider wrapping it up into a function. It may help to write the docstring first, so that you work out the purpose of your new code before you start writing it!

Challenge

Write a documentation string for a function. Create a script called oddsong and define a function named identify() that will be used to identify bird songs by inspecting an audio file to provide the name of that species.

Viewing docstrings

We can view the documentation string for a function by using the ? operator or the help() function in R, or the built-in help() function in Python.

Challenge

Use the help() function to view the documentation string for a function.

Let’s view the help text for the built-in function abs(), which finds the absolute value of a number.
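In Python, this might look like the following sketch. The `help()` function prints the help page to the screen, and the same text is stored in the function's `__doc__` attribute.

```python
# Show the full help page for the built-in abs() function.
help(abs)

# The underlying docstring is also available as plain text.
abs_doc = abs.__doc__
print(abs_doc)
```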

The most important thing to include in a docstring is an explanation of the purpose of this piece of code. To write a useful docstring, put yourself in the shoes of someone who encounters your code for the first time. They need a simple introduction that doesn’t assume too much implied knowledge. The explanation may seem obvious to you, but it may help a new user greatly.

Discussion

How can we tailor our documentation strings to different audiences, such as new users and experienced developers?

Arguments

Next, we must describe the inputs to the function, its arguments or parameters.

We list the input parameters in the code examples below. Each argument has a name and a brief description.

We have added an “arguments” (abbreviated to “args”) section to our docstring which lists the input parameters of the function and describes each one.
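A sketch of what such a docstring might look like in Python, using the "Args" layout described above. The function `scale_readings` and its parameters are hypothetical and chosen only for illustration.

```python
def scale_readings(readings, factor):
    """
    Multiply each sensor reading by a calibration factor.

    Args:
        readings: A list of numeric measurements.
        factor: The number that each measurement is multiplied by.
    """
    # Hypothetical example: build a new list rather than changing the input.
    return [reading * factor for reading in readings]


# The "Args" section now appears in the help text for the function.
help(scale_readings)
```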

Challenge

Add a description of each argument to a function in your code.

Run help() and evaluate the output.

Return values

Finally, we describe the output of the function. The return value is defined by the return statement in our function code block.

This will help the user to understand what the function does and what they can expect to receive back when they call it.

It can also be useful to explain any potential errors or exceptions that the function will raise if the inputs aren’t as expected, and how to deal with them.
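For illustration, the sketch below adds "Returns" and "Raises" sections to a docstring, laid out in the same way as the "Args" section. The function `mean_value` is a hypothetical example.

```python
def mean_value(values):
    """
    Calculate the arithmetic mean of a collection of numbers.

    Args:
        values: A non-empty list of numbers.

    Returns:
        The sum of the values divided by how many there are.

    Raises:
        ValueError: If the list is empty. Check that your data loaded
            correctly before calling this function.
    """
    if len(values) == 0:
        raise ValueError("values must contain at least one number")
    return sum(values) / len(values)
```

The "Raises" entry tells the user both which error to expect and what to do about it.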

Challenge

Describe the return value of a function in a documentation string.

Run help() and evaluate the output.

Usage examples

We can also demonstrate how to use our code by providing code snippets: short pieces of sample code that show how to use functions effectively in different scenarios.

To do this, let’s add an examples section to our documentation string.
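A minimal sketch of an examples section, written in the interactive-prompt (`>>>`) notation that is common in Python docstrings. The function `clamp` is hypothetical.

```python
def clamp(value, lower, upper):
    """
    Restrict a number to lie within a range.

    Args:
        value: The number to restrict.
        lower: The smallest allowed result.
        upper: The largest allowed result.

    Returns:
        The value if it is inside the range, otherwise the nearest limit.

    Examples:
        >>> clamp(5, 0, 10)
        5
        >>> clamp(-3, 0, 10)
        0
        >>> clamp(42, 0, 10)
        10
    """
    return max(lower, min(value, upper))
```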

Challenge

Write a brief code example within the documentation string in a function in your code.

We can use the code examples inside docstrings to define test cases that are used in automatic software testing.
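In Python, the standard-library `doctest` module can do this. The sketch below assumes a hypothetical `double` function and a hypothetical helper named `check_examples`; it runs the examples in one docstring and reports how many of them failed.

```python
import doctest


def double(number):
    """
    Multiply a number by two.

    Examples:
        >>> double(4)
        8
        >>> double(-1.5)
        -3.0
    """
    return number * 2


def check_examples(func):
    """Run the examples in a function's docstring and return the results."""
    parser = doctest.DocTestParser()
    # The examples may only use the names that we provide here.
    namespace = {func.__name__: func}
    test = parser.get_doctest(func.__doc__, namespace, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


results = check_examples(double)
print(f"{results.attempted} examples run, {results.failed} failed")
```

For a whole script, running `python -m doctest` on the file is a common alternative to calling the module from your own code.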

Best practices


This section contains some tips for writing useful documentation strings.

Prioritisation

Focus on the purpose and functionality of the code, rather than getting bogged down in the details of how it works. Explain what the function does, rather than the specific implementation, because the implementation might change over time. A function encapsulates an isolated part of a system, which can be used as a black box by other parts of the system or by the end user, who in many cases only needs to understand its inputs and outputs.

Tips:

  • It’s a good idea to start your docstring with a high-level summary of the function.
  • If the function is a major one, include a simple introduction for the new user.

Discussion

Consider this documentation string:

PYTHON

def identify(audio_file):
    """
    Process sound recording.
    """
    ...

What problems do you notice? How could we improve this?

Clarity is key

Be concise. Put the essential information that the user needs to know first, and be brief but clear.

As with any software documentation, avoid jargon where possible.

Discussion

Read the following documentation string, which is very wordy:

PYTHON

def add(x, y):
    """
    Adds two numbers together, which are the x and y arguments of this function.

    This function takes two numbers as input and returns their sum.
    The addition is performed using the built-in `+` operator.

    Args:
        x: The first number to add to the second number, y.
        y: The second number to add to the first number, x.

    Returns:
        The sum of x and y, which are summed using the addition operator.
    """
    return x + y

How can we effectively convey the purpose and functionality of a function in a docstring without going into excessive detail about its implementation?

Don’t reinvent the wheel. Provide links to further resources for users to take a deep dive into more complicated topics.
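One possible approach, sketched below, is to end the docstring with a short pointer to further reading. The function `z_score` is hypothetical, and the link was chosen purely for illustration.

```python
def z_score(value, mean, std_dev):
    """
    Calculate how many standard deviations a value lies from the mean.

    Args:
        value: The observation to standardise.
        mean: The mean of the population.
        std_dev: The standard deviation of the population (non-zero).

    Returns:
        The standard score of the value.

    See also:
        https://en.wikipedia.org/wiki/Standard_score
    """
    # Hypothetical example: the background theory lives behind the link,
    # so the docstring itself can stay short.
    return (value - mean) / std_dev
```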

Discussion

How can we link to external resources that can provide more in-depth information?

Be consistent. Decide on a docstring style and use it everywhere across your software project. If you’re working on a larger project with multiple developers, refer to the coding conventions and, if in doubt, follow the style of the existing code.

There are several different standards for documentation strings. A standard is a convention that determines how the docstrings will be organised and the syntax that is used to represent the arguments, data types, etc.

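Commonly used documentation string standards in Python include:

  • Google style (the “Args” and “Returns” layout used in the examples in this lesson)
  • NumPy style (numpydoc)
  • reStructuredText style, as used by Sphinx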

It doesn’t matter which one you select, as long as it’s used consistently across a project and it’s clear what the syntax means. Some standards are better supported than others by tools such as IDEs and documentation generators.
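To illustrate how standards can differ, the sketch below documents the `add` function from earlier in the NumPy style, which uses underlined section headings instead of the "Args" and "Returns" labels. The wording of the descriptions and the `float` types are assumptions made for this example.

```python
def add(x, y):
    """
    Add two numbers.

    Parameters
    ----------
    x : float
        The first number.
    y : float
        The second number.

    Returns
    -------
    float
        The sum of x and y.
    """
    return x + y
```

The information is the same in both styles; only the layout changes.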

Automatically generate docstrings


Generative large language model (LLM) services such as Google Gemini can read your code and write docstrings automatically, to a certain extent.

To do this, ask the system to create a docstring and copy your code into the prompt text box. Below is an example prompt and the reply generated by Google Gemini:

Please generate a docstring for this Python function:

def calculate_rectangle_area(width, height):
    area = width * height
    return area

The result is the following docstring, in addition to some helpful descriptions of the content that it generated.

PYTHON

def calculate_rectangle_area(width, height):
  """
  This function calculates the area of a rectangle.

  Args:
      width (float): The width of the rectangle. Must be a positive number.
      height (float): The height of the rectangle. Must be a positive number.

  Returns:
      float: The area of the rectangle. This will be a positive number.
  """

  # Calculate the area
  area = width * height

  return area

This AI-generated content contains a function summary, argument descriptions, and an explanation of the return value, as we discussed previously.

Challenge

Try asking a generative artificial intelligence service such as Google Gemini to read your code.

  • Ask it to generate documentation of different kinds.
  • Request a review of your code. What does the bot think?
  • Can the chatbot create a diagram to illustrate a concept that is relevant to your research software?

This can save you a lot of time, but as with any LLM-generated content, always check the output and ensure it’s correct!
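One way to check a generated docstring is to test its claims against the code. For example, the generated docstring above states that the inputs must be positive and that the result will be positive, but nothing in the function enforces this. A minimal sketch:

```python
def calculate_rectangle_area(width, height):
    # Same calculation as in the prompt above; nothing checks the inputs.
    area = width * height
    return area


# The generated docstring promises a positive result, but a negative
# width is accepted without complaint and produces a negative "area".
area = calculate_rectangle_area(-2, 3)
print(area)
```

In a case like this, either the docstring should be reworded or the function should validate its inputs.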

Discussion

What are the benefits and risks of using a large language model (LLM) service such as Google Gemini or OpenAI ChatGPT to interpret your code and produce content that you use in your research?

How should we critically evaluate this material so that it can be used appropriately to improve the productivity of our research teams without jeopardising our ethics or integrity or causing security risks?

Conclusion


Documentation strings make your code clearer to read and easier for other researchers to use. Also, they make your research software easier to maintain in the long run, saving time and resources. Good docstrings use a clear writing style and everyday language.

Well-documented, reusable research code depends upon good documentation strings. Research collaborators will benefit from clear explanations of the purpose of each function.

Key Points

  • Docstrings are special comments that describe the purpose of a function and its inputs and outputs.
  • Structure your docstrings to convey more information, with a concise introduction.
  • Documentation strings allow you to break your documentation into bite-size chunks, with one overview comment per function.

Further resources


To find out more about documentation strings, please refer to the following resources:

Python

R