Documentation sites

Last updated on 2024-09-25

Overview

Questions

  • How do I present comprehensive information to users of my research software?
  • How do I generate a website containing a user guide to my code?
  • What should a good documentation website contain?
  • How do I publish my software documentation on the internet?

Objectives

  • Learn about documentation websites for software packages.
  • Gain basic familiarity with some common website generation tools.
  • Understand the basics of structuring a documentation website.
  • Be able to set up a static site deployment workflow.

Documentation websites


A documentation website is a user guide and reference manual for a library of research code. Up to now, we’ve looked at ways to put helpful notes in our code, but now we’ll learn how to write a longer, more complete guide to the research tools you create.

A documentation site brings all your user guidance into one place. This kind of resource may be prepared for research software and will usually contain an introduction, installation instructions, a user guide, troubleshooting tips, and an in-depth reference section.

To get an idea of this, here are some links to documentation websites for widely-used data analysis and research software packages:

  • pandas is a data processing library for the Python programming language.
  • ggplot2 is a plotting package for the R statistical language.
  • scikit-learn is a machine learning library for the Python programming language.

Discussion

Evaluate these documentation sites.

  • What do you like about them?
  • How approachable are they as a new user?
  • What do you find difficult to understand in this material?

Why create a website?


There are many advantages to building a documentation site to provide an information-rich resource for researchers who use your code at institutions all around the world.

Advantages

These sites can work as hubs for collaboration, sharing the latest updates, and encouraging people to take up your system and get involved in improving it. The effort of setting one up will be rewarded in the long run because you will have created a valuable asset that will foster collaboration and knowledge sharing in your research community.

A key foundation stone of modern digital research practices is the ability to replicate results by reproducing analytic workflows. Clear, thorough documentation of the research code ensures that researchers can repeat processes and verify results and other people’s outputs.

Documentation sites are really useful for introducing new users to your software. They make it much easier and faster for new users to get started using your software to boost their research. They’re one of the most effective ways to create a user base that has a sophisticated understanding of the research code, which is essential for them to adapt it to the complex problems that often arise in research contexts.

They’re also a valuable resource for your existing user base, enabling them to look up reference material or search the manual to find new capabilities they weren’t aware of before. This increases the potential for your software to improve the productivity of other research teams.

When to use one

Although the advantages are numerous, not all software packages require a comprehensive documentation website. However, for any code project that is growing in the number of collaborators, users, and technical complexity, consider coordinating the team to write one as soon as possible to help the project continue its healthy growth.

Discussion

When is it appropriate to establish a documentation website? Consider the following factors:

  • How many resources will it take to write and maintain?
  • How many end-users need the information?
  • Is there a simpler format that can convey the same information?

Contents


Documentation pages contain comprehensive information about a particular piece of research software. Think of them like a user manual for your car or an instruction guide for building a piece of furniture.

Research context

For research software, it may be important to explain the theoretical background or statistical methods that are used and explain the domain-specific assumptions that were made when the code was designed and written. It’s good practice to provide a concise summary of the relevant concepts and link to external sources such as papers, books, and other websites for users to take a deeper dive into the principles and algorithms used.

Installation instructions

This section provides a detailed walkthrough of the steps users need to follow to install the package on their computer, with details that are specific to their operating system.

Tutorials

It can be very useful to include an in-depth “Getting Started” guide that provides step-by-step instructions to introduce a new user to your software package. It might guide the user through each aspect of the tool’s functionality and features so they’re able to become familiar with it in a more approachable way.

A series of code examples to demonstrate how to use the software in different contexts can be very useful for users to get off the ground in implementing common research workflows to achieve their specific goals.

User reference

If you have written functions that are intended to be used in other researchers’ code, then an in-depth explanation of these procedures is essential reference material. In the world of software engineering, these detailed appendices are called API (application programming interface) references, which list each function and describe how the arguments may be used to control how the code works. This content may be automatically generated from the documentation strings.
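
To make this concrete, here is a minimal sketch of a documentation string that an API reference could be generated from. The function name identify_species, its arguments, and its placeholder return value are hypothetical and exist only for illustration. The standard-library inspect module is used simply to show that the docstring and signature can be read programmatically, which is the raw material that documentation generators work from.

```python
import inspect


def identify_species(recording_path, threshold=0.5):
    """Identify the bird species heard in an audio recording.

    This is a hypothetical function used only to illustrate a docstring.

    Args:
        recording_path: Path to the audio file to analyse.
        threshold: Minimum confidence (0 to 1) needed to report a match.

    Returns:
        The name of the best-matching species as a string.
    """
    # Placeholder logic: a real implementation would analyse the audio.
    return "unknown"


# Read the signature and the first line of the docstring
signature = inspect.signature(identify_species)
summary = inspect.getdoc(identify_species).splitlines()[0]
print(f"identify_species{signature}")
print(summary)
```

The summary line, argument descriptions, and return value are the pieces a reader of a reference page is most likely to look for, so they are worth writing carefully.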

Troubleshooting

As issues come up with your research code, and are eventually resolved and clarified, make a note of the causes of these troubles and make them available to the entire user base in your documentation site. This will help users to identify and fix common misunderstandings and technical problems they may run into when utilising your code.

This prevents a situation where potential solutions to common issues do exist, but are hard to find because they are scattered around the internet or are the exclusive knowledge of a few individuals.

FAQs

An appendix containing frequently asked questions (FAQs) is very useful to save yourself time in responding to common queries from the users of your code.

Writing style


As we discussed in the episode on READMEs, it’s important to strive to use everyday, jargon-free language. It helps to set an approachable tone that encourages others to use the software and get involved with the project. This will ensure that the code is accessible to the widest possible range of people in the research community and foster collaboration.

Always consider the target audience of your documentation, because your user base may be unaware of some of the unstated assumptions and technical background knowledge that you take for granted.

Tools


There are various tools available to build documentation sites for your research software.

GitHub Wiki

If you are publishing your code on GitHub, which is a web service that hosts code repositories, then one of the easiest ways to create a documentation site is to use the wiki feature on that platform. This is a great way to write detailed, structured documents containing long-form content that describes aspects of your software. What’s more, it’s available alongside your code so your documentation and software are located in one place.

As with readme files, the text that appears on GitHub is formatted using Markdown syntax.

Getting started

To create a wiki, which is a simple, easy-to-edit website, go to the main page of your code repository on GitHub and click on the Wiki button on the top menu. For a detailed walkthrough of this process, please read adding or editing wiki pages in the GitHub documentation.

GitHub Wikis

For more information about the wiki feature on GitHub, see Documenting your project with wikis in the GitHub documentation.

Documentation sites for R packages

It’s also possible to generate a documentation site to accompany R packages that you create. For more information about this, please refer to the book R Packages by Hadley Wickham, which has a chapter on documentation websites.

Sphinx

Sphinx is a tool for building documentation websites that is commonly used amongst developers of Python packages, although it’s also compatible with other programming languages. It doesn’t currently support packages written using the R statistical language.

Sphinx is a documentation generator tool that takes plain text files that use a markup syntax (such as reStructuredText or Markdown) for formatting the content of your documentation site and transforms them into various output formats, ready to be published on the internet. It has a number of useful features, but in this module we’ll learn the basics to document our research code.

Callout

For a more in-depth guide, please see Build your first project in the Sphinx documentation.

Getting started

Let’s use Sphinx to create a documentation site for our Python code.

Installing Sphinx

Navigate to the root folder of your code project. Create a virtual environment using venv, which provides a separate area in which to install the Sphinx package. Running python -m venv .venv will create a virtual environment in a directory called .venv/

This subdirectory will hold the packages we’ll need to complete the exercises in this section.

Run the activation script to enable the virtual environment. The specific command depends on the operating system you are using: for example, source .venv/bin/activate on Linux and macOS, or .venv\Scripts\activate on Windows.

Use the Python package manager pip to install Sphinx.

BASH

pip install sphinx

Start a new Sphinx project

Sphinx includes a command called sphinx-quickstart that sets up a new project. Navigate to your project’s root folder and run the following command.

BASH

sphinx-quickstart docs --no-sep --ext-autodoc

This will initialise the configuration files for a new Sphinx site in a subdirectory called docs/ and prompt you to enter the following options:

  • Project name: Birdsong Identifier
  • Author name(s): Bill Oddie
  • Project release []: 1.0

Sphinx options

To find out more about the Sphinx configuration files, please read the guide to defining document structure in the Sphinx documentation.
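
As a rough sketch, the docs/conf.py file created by the quickstart command above should contain settings along these lines. The values come from the answers given at the prompts. The exact layout, comments, and defaults depend on your Sphinx version, so treat this as an illustration rather than the literal generated file.

```python
# docs/conf.py (abridged sketch; the generated file has more comments and settings)

# Project information, taken from the sphinx-quickstart prompts
project = "Birdsong Identifier"
author = "Bill Oddie"
release = "1.0"

# Extensions to enable; the --ext-autodoc flag adds this entry
extensions = [
    "sphinx.ext.autodoc",
]

# Theme used for the HTML output (alabaster is the usual default)
html_theme = "alabaster"
```

Because conf.py is ordinary Python, you can edit these values by hand at any time and rebuild the site to see the effect.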

Building the site

In this context, building means taking our collection of Sphinx files and converting them into the source code files that define a website. Sphinx will create HyperText Markup Language (HTML) files. HTML is the markup language used for pages that are displayed in a web browser.

To build our site, we run the sphinx-build command using the -M html option to select HTML as the output format.

BASH

sphinx-build -M html docs docs/_build

Sphinx will load our files from the docs/ directory and write the built HTML files to the docs/_build/html directory.

The file docs/_build/html/index.html contains the home page of your new documentation site! Open that file to view your handiwork.
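
If you prefer to open the page from Python rather than from your file manager, a small sketch like the one below may help. It assumes the site was built into docs/_build as in the command above and that you run it from the project's root folder. The helper names index_uri and open_docs are made up for this example.

```python
import webbrowser
from pathlib import Path


def index_uri(build_dir="docs/_build"):
    """Return a file:// address for the home page of the built site."""
    index_file = Path(build_dir) / "html" / "index.html"
    # as_uri() needs an absolute path, so make it absolute first
    return index_file.absolute().as_uri()


def open_docs(build_dir="docs/_build"):
    """Open the built home page in the default web browser."""
    return webbrowser.open(index_uri(build_dir))


# Show the address that would be opened
print(index_uri())
```

Calling open_docs() should then display the home page in your default browser.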

The Sphinx homepage for our documentation site

Autodoc

It can be useful to automatically populate our documentation sites by converting our documentation strings into formatted text. We can achieve this using the autodoc plugin for Sphinx.

Configuring Autodoc

Let’s set up the options for autodoc. (If you struggle with these steps, please refer to the template project.)

Add the following lines to docs/conf.py:

PYTHON

# Our Python code may be imported from the parent directory
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

This ensures that Sphinx can access our Python code by pointing at the root directory of our project. The .. syntax means “one folder up”, which means autodoc will search in the root directory for code to import.

The Python code uses sys.path, a list of locations to search for code. By modifying the Python module search path, we allow autodoc to locate and import our code modules from a specific directory that is not in the default search path.

This is often necessary when working with project structures that involve multiple directories, helping the interpreter to find code that isn’t installed in the standard library location.
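
The effect of changing sys.path can be seen in a short, self-contained sketch. It writes a tiny module into a temporary folder (the module name birdcall_demo and its greet function are invented for this example), shows that the import fails while the folder is not on the search path, and then imports it after adding the folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder containing a one-function module
project_dir = Path(tempfile.mkdtemp())
(project_dir / "birdcall_demo.py").write_text(
    "def greet():\n    return 'tweet'\n"
)

# Before the folder is on the search path, the import fails
try:
    importlib.import_module("birdcall_demo")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Put the folder at the front of the search path, as conf.py does for '..'
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

# Now the same import succeeds
module = importlib.import_module("birdcall_demo")
print(found_before, module.greet())
```

Inserting at position 0 means the folder is searched first, so your project's code takes priority over any installed package with the same name.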

Next, edit docs/index.rst and add the following lines to instruct Sphinx to automatically generate documentation for our Python module.

RST

.. automodule:: oddsong.song
    :members:

This reStructuredText (reST) markup has the following elements:

  • .. indicates a directive within a reST document that is used to configure Sphinx.
  • automodule:: indicates a specific directive to use autodoc to automatically generate documentation for a module.
  • oddsong.song is the path to our Python module, for which documentation will be created.
  • :members: is an option for the automodule directive that instructs Sphinx to include documentation for the members (functions, classes, variables) defined within the specified module. By default, only public members that have a documentation string are included.

For more information about reST, please read the Introduction to reStructuredText by Write The Docs.
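
To give a feel for what the directive picks up, the sketch below builds a small stand-in module and lists the functions that have a public name and carry a documentation string. This is only an approximation of autodoc's default behaviour for :members:, written with the standard-library inspect module rather than Sphinx itself. The module contents (count_notes, undocumented, _helper) are invented for illustration.

```python
import inspect
import types

# Source code for a hypothetical module, standing in for oddsong/song.py
source = '''
def count_notes(song):
    """Return the number of notes in a song."""
    return len(song)

def undocumented(song):
    return song

def _helper(song):
    """Private helper, hidden by its leading underscore."""
    return song
'''

# Build a module object from the source so we can inspect it
song = types.ModuleType("song")
exec(source, song.__dict__)

# Keep public functions that have a docstring
documented = [
    name
    for name, obj in inspect.getmembers(song, inspect.isfunction)
    if not name.startswith("_") and inspect.getdoc(obj)
]
print(documented)
```

Sphinx offers further options, such as :undoc-members: and :private-members:, to widen this selection if you need to.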

Now, when we build our site, Sphinx will scan the contents of the oddsong Python module and automatically generate a useful reference guide to our functions.

BASH

sphinx-build -M html docs docs/_build

The result looks something like this:

Python documentation string rendered as HTML

Automatically generate content

Try using autodoc to analyse your own code and build a documentation site by following the steps above.

After the sphinx-build command has completed successfully, browse the contents of the docs/_build/html folder and discuss what you find.

Publishing


Now that you’ve started writing your documentation website, there are various ways to upload it to the internet so that others can read it.

There are several hosting services that can be used to publish your documentation site, such as GitHub Pages and Read the Docs.

The details of setting up the deployment of your site to these platforms are beyond the scope of this course.

Key Points

  • Structured documentation websites help users learn to use research software, supporting its successful adoption by the wider research community.
  • Documentation sites contain comprehensive installation instructions, user guides, and troubleshooting tips.
  • There are several libraries that may be used to generate documentation sites.
  • Documentation websites may be deployed to a hosting platform.

Further resources


Please review the following material which provides more information about some of the topics covered in this episode.