Introduction

Last updated on 2024-10-15

Overview

Questions

  • How do we provide information to users of our research software?
  • Why is documenting code useful for researchers?
  • What does well-documented software look like?

Objectives

  • Understand the basic purpose of this course
  • Learn why documenting software is worth the effort
  • Be introduced to good software documentation practices

Why document code?


No code is self-explanatory. It’s a tool we design, or, more often, a complex organism that develops as we use it, such as a library of functions used within a research team to perform certain kinds of analysis. To explain our code we must write software documentation.

These documents provide information about our programs for everyone involved in their development, use, and future re-use. Documentation may consist of text, tips within a computer environment, and diagrams that guide the user in using a (potentially complex) software tool. It explains how the software works and why it behaves the way it does.
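In Python, for instance, many of these in-environment tips come from docstrings: a string written as the first statement of a function, which Python stores alongside the function so that interactive help and many code editors can display it to the user. Here is a minimal sketch, using a hypothetical `convert_celsius_to_kelvin()` function chosen only for illustration:

```python
import inspect


# Hypothetical function, written only to illustrate a docstring.
def convert_celsius_to_kelvin(temperature_celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return temperature_celsius + 273.15


# The docstring is stored with the function object, so tools can retrieve it.
# In an interactive session, help(convert_celsius_to_kelvin) shows the same text.
print(inspect.getdoc(convert_celsius_to_kelvin))
```

Because the explanation lives next to the code, it is available to anyone who imports the function, without their having to open the source file.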

Why document software?


It’s common to come across a software package, whether it’s written by ourselves, a colleague, or someone else, that’s near-impossible to use because it’s unclear what each function or tool does. Maybe we look at the source code itself, but we can’t make head nor tail of it. Maybe the only person who can use this software is the person who wrote it, and even that fails if you wrote it yourself and have forgotten what you were thinking at the time!

Challenge

Discuss positive or negative experiences with using research software:

  • What documentation was available?
  • What challenges did you have picking up a new tool?
  • What useful software-related information do you often refer to in your research projects?

Advantages of good documentation

There are many advantages to writing guidance to go along with your research software. Software documentation helps you and others use the software successfully and understand its code in the future, ensuring that its value is sustained.

Research outputs often depend upon the code used to generate them. Clarity and confidence are essential in using code to perform calculations, simulations, or data analysis. All kinds of research processes and analysis pipelines can be made more reproducible by providing clear context and instructions for using them.

Making your code more readable brings further advantages. Well-documented software is easier to maintain and is more sustainable, which means it can continue to be used and modified for a longer period of time, despite changes in technology. More reusable software encourages others to use it in their research, which can increase the number of citations of that software and its overall research impact.

Challenge

Discuss the benefits of writing documentation for your research software.

  • How will it help you and your work?
  • What benefits will these provide to your collaborators?
  • In what ways does documentation contribute to the wider research community?

In the long run, it can help you to develop your own software engineering practice by getting into the habit of reflecting on the purpose of the software and articulating what each component or module is for.

Writing a useful software package that is well-documented and can be reused in the future means that your code could take on a life of its own, with benefits that extend beyond yourself to your collaborators and other researchers in the future. High-quality documentation is a key part of ensuring a healthy software lifecycle. It can make the difference between accidentally creating an abandoned piece of “gradware” (a slang term for mysterious code that a former student wrote and nobody else can use) and a successful long-term software project with lasting impact.

When should I write documentation?


Now! Start writing and sharing documentation for your research code from the beginning of your project. It doesn’t have to be perfect straight away, but a first draft is more useful than nothing. Documentation should be a consideration in your software management plan, a concept discussed in Module 1a on Software Lifecycle Planning. Also, it’s never too late to start documenting an old code project.

This might include design notes, diagrams, or the various kinds of software documentation we’ll discuss in this module. The best practice for modern, collaborative research involving digital methods and tools is to document your processes early and often. Not only will writing notes about your code help other people to read and use that code, it will clarify your thought process as you design your system, focussing your work on the important parts of the task at hand.

Keep in touch with other developers and users of the research code and make a note of their feedback. Common questions and problems are a sign that some topics need to be covered more clearly and in greater depth in the software documentation. Incorporate this feedback into the documentation using whichever method is most appropriate, following the guidance in this module.

Research software papers


You may decide to publish a description of your software as a paper in an academic journal. This is a kind of methods paper, which provides more detail on your research process than is possible in your main paper.

A research software paper should provide a concise introduction to your code and explain how and why it was written. It may contain a detailed description of the technical design and how algorithms are implemented, providing transparency to other researchers and enabling better replicability of your results.

For more information about writing these papers, which is beyond the scope of this course, please read “Ten simple rules for writing a paper about scientific software” by Joseph Romano.

An increasing number of journals allow and encourage the publication of research software and open data. Some journals focus on a specific field, while others primarily publish research software of any kind. Some relevant journals include:

For more information, please read “In which journals should I publish my software?” by Neil Chue Hong, the Director of the Software Sustainability Institute.

Examples


Here are two examples of code that performs a geometry calculation. The first could be improved in terms of its documentation and readability, while the second is much clearer.

Example of no documentation

Here’s an example of some code that does… something. It’s not clear what this code is for or why it was written.
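As an illustration, here is a hypothetical undocumented function. It is a sketch written for this page, not necessarily the lesson’s own listing, and it uses the names x and weird_num discussed below.

```python
# Hypothetical sketch, deliberately left undocumented for this exercise.
def fun(x):
    weird_num = 1103
    t = 0
    for k in range(x):
        a = 1
        for i in range(1, 4 * k + 1):
            a *= i
        b = 1
        for i in range(1, k + 1):
            b *= i
        t += a * (weird_num + 26390 * k) / (b ** 4 * 396 ** (4 * k))
    return 9801 / (2 * 2 ** 0.5 * t)
```

(For the record, this sketch sums the first x terms of Ramanujan’s series for 1/π and returns an estimate of π, but nothing in the code tells the reader that.)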

Challenge

Read and evaluate this code.

  • Can you tell what the purpose of the function is?
  • What is the meaning of the variables?
  • Would you rely on this code in your research? Why, or why not?

This is a function with a name that doesn’t explain what the code will do. There are no comments or notes to explain what the author intended to achieve. The variable names don’t clarify anything either: what does x mean in this context? Where would I go to find out more about weird_num? This is effectively a “magic” number that is arbitrarily stated but unexplained.

The logic of the calculation is also… rather cryptic.

Maybe the code works, maybe it doesn’t; but it could be made clearer and easier to maintain and modify in the future.

Well-documented example

Now let’s look at an example of best practices in documenting research software. (These code snippets are part of the end-product of this course, so don’t worry if they don’t make sense yet!)
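Here is a hypothetical well-documented function that estimates π using Ramanujan’s 1914 series. The name calculate_pi, its n_terms parameter, and its docstring are assumptions written for this page rather than the course’s own code.

```python
from math import factorial, sqrt


# Hypothetical sketch for illustration.
def calculate_pi(n_terms: int) -> float:
    """
    Estimate the value of pi using Ramanujan's 1914 series.

    Each term of the series adds roughly eight correct decimal digits,
    so a small number of terms is enough for double-precision floats.

    :param n_terms: Number of series terms to sum (must be at least 1).
    :returns: An approximation of pi.
    """
    if n_terms < 1:
        raise ValueError("n_terms must be at least 1")

    # Ramanujan's formula:
    # 1/pi = (2 * sqrt(2) / 9801) * sum_k (4k)! * (1103 + 26390k) / ((k!)^4 * 396^(4k))
    series_sum = 0.0
    for k in range(n_terms):
        numerator = factorial(4 * k) * (1103 + 26390 * k)
        denominator = factorial(k) ** 4 * 396 ** (4 * k)
        series_sum += numerator / denominator

    # The series gives 1/pi, so invert it to obtain pi.
    inverse_pi = 2 * sqrt(2) / 9801 * series_sum
    return 1 / inverse_pi
```

Calling calculate_pi(3) returns a value that agrees with math.pi to double precision.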

Discussion

Read and evaluate this code.

  • Can you tell what the purpose of the function is?
  • What is the meaning of the variables?
  • Which code would you prefer to use?

This time, the function name is a verb that describes what the code will attempt to do. The description of the function is also written out clearly in a note for the user. There are comment lines (starting with #) that explain the mathematical method used. Each variable has a descriptive, human-readable name, making the code more intuitive to read. An existing library is used to calculate the factorial, which means we can look up the usage for the factorial() function elsewhere.

This approach means that our code is much easier to interpret, maintain, and make changes to in the future.

Of course, some of the syntax in this example may be unfamiliar to you, but don’t worry: we’ll learn the basics in this course!

Real-world examples


Let’s review real-world examples of the documentation for software packages that are used in research.

NumPy user guide

NumPy is a mathematical package for the Python programming language that’s used for numerical computing, including array operations and linear algebra. The NumPy User Guide is a thorough website that is organised into sections covering the different aspects of using the package.

It includes a beginner’s guide, tutorials for different use-cases, and in-depth write-ups of technical details of certain aspects of the code. Some of the content is written for a target audience with no assumed knowledge, while other parts are written as a reference for people with some background in mathematics and computer programming.

ggplot2 documentation site

ggplot2 is a package for the R statistical language that generates data visualisations and graphics. The ggplot2 documentation has a simple, accessible layout and walks a new user through installing and getting up-and-running with the tool. The page provides a “cheat sheet”, a reference guide that lists commonly used commands in an attractive two-page layout. The documentation site is moderate in scope and links to several external resources, such as online courses hosted elsewhere.

The source code is neatly organised into R code files in the repository. For example, the function geom_point() includes an extensive description of the purpose and operation of that code, including a list of the parameters and examples of how to use it.
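The geom_point() documentation is written in R, but the same idea carries over to Python. The sketch below uses a hypothetical calculate_circle_area() function and follows the section layout of the NumPy docstring convention (Parameters, Returns, Raises, Examples). The examples are written so that Python’s built-in doctest module can check them automatically.

```python
import math


# Hypothetical function, written for illustration using the NumPy docstring layout.
def calculate_circle_area(radius: float) -> float:
    """
    Calculate the area of a circle.

    Parameters
    ----------
    radius : float
        Radius of the circle. Must be non-negative.

    Returns
    -------
    float
        Area of the circle, in the square of the radius's units.

    Raises
    ------
    ValueError
        If ``radius`` is negative.

    Examples
    --------
    >>> calculate_circle_area(1.0)
    3.141592653589793
    >>> round(calculate_circle_area(2.0), 4)
    12.5664
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return math.pi * radius ** 2


if __name__ == "__main__":
    import doctest

    # Run the examples in the docstring and report any that fail.
    doctest.testmod()
```

Running the file as a script prints nothing when every example passes, so the usage examples double as a small test suite that warns you when the documentation drifts out of date.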

Key Points

  • Reproducibility: Well-documented software is easier for other researchers to understand and use with confidence. It enables them to reproduce your results and replicate your findings, which lets others validate them and builds trust in your research outputs.
  • Collaboration: Clear instructions enable other researchers to use your software and collaborate on your research projects.
  • Knowledge transfer: Your software package will be easier to maintain in the long term if others are able to learn about it and look after it after the original developers move on.