Content from Introduction
Last updated on 2024-10-15
Overview
Questions
- How do we provide information to users of our research software?
- Why is documenting code useful for researchers?
- What does well-documented software look like?
Objectives
- Understand the basic purpose of this course
- Learn why it is worth documenting software
- Be introduced to good software documentation practices
Why document code?
No code is self-explanatory. It’s a tool we design, or, more often, a complex organism that develops as we use it, such as a library of functions used within a research team to perform certain kinds of analysis. To explain our code we must write software documentation.
These documents provide information about our programs for everyone involved in their development, use, and future re-use. Documentation may consist of text, tips within a computer environment, and diagrams that guide the user in using a (potentially complex) software tool. It explains how the software works and why it behaves the way it does.
Why document software?
It’s a common occurrence to get a software package, whether it’s written by ourselves, a colleague, or someone else, that’s near-impossible to use because it’s unclear what each function or tool does. Maybe we look at the source code itself, but we can’t make head nor tail of it. Maybe the only person who can use this software is the person who wrote it, unless you wrote it and forgot what you were thinking when you did!
Challenge
Discuss positive or negative experiences with using research software:
- What documentation was available?
- What challenges did you have picking up a new tool?
- What useful software-related information do you often refer to in your research projects?
Advantages of good documentation
There are many advantages to writing guidance to go along with your research software. Software documentation helps you and others to read your code and use it successfully in the future, ensuring that its value is sustained.
Research outputs often depend upon the code used to generate them. Clarity and confidence are essential in using code to perform calculations, simulations, or data analysis. All kinds of research processes and analysis pipelines can be made more reproducible by providing clear context and instructions for using them.
There are many advantages to making your code more readable. Well-documented software is easier to maintain and has greater sustainability, which means it can continue to be used and modified for a longer period of time, despite changes in technology. If software is more reusable then it encourages others to use it for their research, increasing the number of citations of that software and its overall research impact.
Challenge
Discuss the benefits of writing documentation for your research software.
- How will it help you and your work?
- What benefits will these provide to your collaborators?
- In what ways does documentation contribute to the wider research community?
In the long run, it can help you to develop your own software engineering practice by getting into the habit of reflecting on what the purpose of the software is and articulating what each component or module is for.
Writing a useful software package that is well-documented and can be reused in the future means that your code could take on a life of its own, with benefits that extend beyond yourself to your collaborators and other researchers in the future. High-quality documentation is a key part of ensuring a healthy software lifecycle. It can make the difference between accidentally creating an abandoned piece of “gradware” (a slang term for mysterious code that a former student wrote and nobody else can use) and a successful long-term software project with lasting impact.
When should I write documentation?
Now! Start writing and sharing documentation for your research code from the beginning of your project. It doesn’t have to be perfect straight away, but a first draft is more useful than nothing. It should be a consideration in your software management plan, which is a concept discussed in Module 1a on Software Lifecycle Planning. Also, it’s never too late to start documenting an old code project.
This might include design notes, diagrams, or the various kinds of software documentation we’ll discuss in this module. The best practice for modern, collaborative research involving digital methods and tools is to document your processes early and often. Not only will writing notes about your code help other people to read and use that code, it will clarify your thought process as you design your system, focussing your work on the important parts of the task at hand.
Keep in touch with other developers and users of the research code and make a note of their feedback. Common questions and problems are a sign that there are issues that must be covered more clearly and in greater depth in the software documentation. Incorporate this feedback into the software documentation using whichever method is most appropriate, following the guidance in this module.
Research software papers
You may decide to publish a description of your software as a paper in an academic journal. This is a kind of methods paper, which provides more detail on your research process than is possible in your main paper.
A research software paper should provide a concise introduction to your code and explain how and why it was written. It may contain a detailed description of the technical design and how algorithms are implemented, providing transparency to other researchers and enabling better replicability of your results.
For more information about writing these papers, which is beyond the scope of this course, please read Ten simple rules for writing a paper about scientific software by Joseph Romano.
An increasing number of journals allow and encourage the publication of research software and open data. Some journals focus on a specific field, while others primarily publish research software of any kind. Some relevant journals include:
- The Journal of Open Source Software is a peer-reviewed publication that provides academic citations for research code;
- Nature has a category of Toolbox articles that cover the technical side of research;
- Journal of Open Research Software is a peer-reviewed repository run by the Software Sustainability Institute.
For more information, please read In which journals should I publish my software? by Neil Chue Hong, the Director of the Software Sustainability Institute.
Examples
Here are two examples of code that performs some geometry. The first could be improved in terms of its documentation and readability, while the second is much clearer.
Example of no documentation
Here’s an example of some code that does… something. It’s not clear what this code is for or why it was written.
Challenge
Read and evaluate this code.
- Can you tell what the purpose of the function is?
- What is the meaning of the variables?
- Would you rely on this code in your research? Why, or why not?
This is a function with a name that doesn’t explain what the code will do. There are no comments or notes to explain what the author intended to achieve. The variable names don’t clarify anything either: what does x mean in this context? Where would I go to find out more about weird_num? This is effectively a “magic” number that is arbitrarily stated but unexplained.
The logic of the calculation is also… rather cryptic.
Maybe the code works, maybe it doesn’t; but it could be made clearer and easier to maintain and modify in the future.
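The listing under discussion is not reproduced in this text. As a purely hypothetical sketch of the kind of code described above, the snippet below reuses the names x and weird_num from the discussion; the function name do_thing and the calculation itself are assumptions made for illustration only.

```python
# Hypothetical illustration of undocumented code; not the course's own listing.
def do_thing(x):
    weird_num = 3.14159
    return weird_num * x * x
```

Apart from the labelling comment on the first line, nothing here tells the reader what the function is for, what x represents, or where the value of weird_num comes from.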
Well-documented example
Now let’s look at an example of best practices in documenting research software. (These code snippets are part of the end-product of this course, so don’t worry if they don’t make sense yet!)
Discussion
Read and evaluate this code.
- Can you tell what the purpose of the function is?
- What is the meaning of the variables?
- Which code would you prefer to use?
This time, the function name is a verb that describes what the code will attempt to do. The description of the function is also written out clearly in a note for the user. There are comment lines (starting with #) that explain the mathematical method used. Each variable has a descriptive, human-readable name, making the code more intuitive to read. An existing library is used to calculate the factorial, which means we can look up the usage for the factorial() function elsewhere.
This approach means that our code is much easier to interpret, maintain, and make changes to in the future.
Of course, there may be some syntax in this example that is unfamiliar to you—but don’t worry, we’ll learn the basics in this course!
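To make these features concrete, here is a hypothetical sketch (not the course's own listing) that shows a verb-based function name, a docstring, a comment explaining the mathematical method, descriptive variable names, and the factorial() function from Python's standard math module. The function count_triangles and its parameter name are assumptions chosen for illustration.

```python
from math import factorial


def count_triangles(point_count):
    """Count the triangles that can be drawn between a set of points.

    Assumes that no three points lie on the same straight line.

    Args:
        point_count: The number of points, as a non-negative integer.

    Returns:
        The number of distinct triangles, as an integer.
    """
    corners_per_triangle = 3

    # Fewer than three points cannot form a triangle.
    if point_count < corners_per_triangle:
        return 0

    # Binomial coefficient "n choose 3": n! / (3! * (n - 3)!)
    return factorial(point_count) // (
        factorial(corners_per_triangle)
        * factorial(point_count - corners_per_triangle)
    )


print(count_triangles(5))
```

Because the calculation relies on math.factorial, a reader who wants more detail can look up that function in the Python documentation rather than working through a hand-written loop.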
Real-world examples
Let’s review real-world examples of the documentation for software packages that are used in research.
NumPy user guide
NumPy is a mathematical package for the Python programming language that’s used for numerical work such as linear algebra. The NumPy User Guide is a thorough website that is organised into sections covering the different aspects of using that package.
It includes a beginner’s guide, tutorials for different use-cases, and in-depth write-ups of technical details of certain aspects of the code. Some of the content is written for a target audience with no assumed knowledge, while other parts are written as a reference for people with some background in mathematics and computer programming.
ggplot2 documentation site
ggplot2 is a package for the R statistical language that generates data visualisations and graphics. The ggplot2 documentation has a simple, accessible layout and walks a new user through installing and getting up-and-running with the tool. The page provides a “cheat sheet”, which is a reference guide that lists commonly-used commands in an attractive two-page layout. The documentation site is moderate in scope and links to several external resources, such as online courses hosted elsewhere.
The source code is neatly organised into R code files in the repository. For example, the function geom_point() includes an extensive description of the purpose and operation of that code, including a list of the parameters and examples of how to use it.
Key Points
- Reproducibility: Well-documented software is easier for other researchers to understand and use with confidence. It enables them to reproduce your results and validate your findings, building trust in your research outputs.
- Collaboration: Clear instructions enable other researchers to use and collaborate with your software and research projects.
- Knowledge transfer: Your software package will be easier to maintain in the long term if others are able to learn about it and look after it after the original developers move on.
Content from Writing README files
Last updated on 2024-10-14
Overview
Questions
- How do we introduce our software to new researchers and developers?
- How do I structure the basic notes for my research code?
- What are the contents of good documentation?
Objectives
- Explain why and how to write a README file for research software
- Learn how to structure documentation into sections
- Understand the important components of a good README
What is a README file?
A README file is the first thing a user sees when they find your software. It should give them an approachable overview of the package, define what’s possible to achieve with this code, and get them started on the right track to use the software effectively for their research.
A README contains a brief introduction to the code and shows users how to get started using it. For larger packages, the README forms a concise beginner guide and might link to a more detailed user guide that is located elsewhere.
The audience for a README file is the end user, such as a researcher. It’s important to consider the person who will read your documentation, and to see things from their point of view. It may be someone who is unfamiliar with certain technical terms, or a researcher with less experience of advanced computing. A suitable approach is to imagine writing a manual for a new user who has never seen this software before.
How to write a README
To start writing a README file, the simplest way is to create an empty text file called README.txt and start writing. This file should be located in the directory (or folder) that contains your software project.
Challenge
Let’s create a new code project. Create a new, empty directory to contain your work. Then, start writing your README!
Follow these general steps to create a README file. The specific details for each operating system are detailed below.
- Create a directory to contain your project. We call this the root directory;
- In that directory, create a new text file;
- Name the file README.txt;
- Open the file for editing and start writing your documentation!
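These steps can also be scripted. The following is a minimal sketch using Python's standard pathlib module; the function name create_readme and the directory name birdsong-project are assumptions for illustration, and a temporary directory is used so that the demonstration leaves no files behind.

```python
from pathlib import Path
import tempfile


def create_readme(root_directory):
    """Create the project root directory and an empty README.txt inside it."""
    root = Path(root_directory)
    root.mkdir(parents=True, exist_ok=True)  # Create the root directory.
    readme = root / "README.txt"
    readme.touch(exist_ok=True)  # Create the empty text file.
    return readme


# Demonstration inside a temporary directory (hypothetical project name).
with tempfile.TemporaryDirectory() as scratch:
    readme_path = create_readme(Path(scratch) / "birdsong-project")
    readme_created = readme_path.exists()
    print(readme_path.name)
```

In practice you would pass the path of your own project directory and then open README.txt in a text editor.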
The essential contents of a README file are:
- The name of the software. This seems trivial, but a clear title and description of a piece of software will be essential for others to identify your software and differentiate it from others.
- A brief introduction to your code, including links to relevant websites or contact details for the maintainers.
- It should be clear who the target audience is for the software package.
- Installation instructions or a link to further information published elsewhere.
- Usage instructions, ideally including a “quick start” guide with a few simple examples to get people up and running with your software package.
It can also be useful to signpost related methods and software tools: provide links and explain how other software relates to, or differs from, this project in addressing similar research problems.
Walk a mile in the user’s shoes
Put yourself in the position of a researcher who has encountered your software for the first time. Imagine that you had to start from square one: how would you like the code to be introduced to you?
Discussion
Consider your field of research and the technologies you commonly use.
- What things are obvious to you that may not be clear to others?
- What assumed knowledge must you explain to new colleagues to get them up to speed?
For research code, it’s often important to explain the context in which the software was written and the theory behind it. For example, many researchers write analysis packages or workflows that are based on previously-published research, statistical methods, or theoretical models for which citations can be provided. By including references to research papers we better help the users to understand the methods that are implemented by our software, which enables its users to properly cite their sources and increases the users’ confidence that you have applied those methods correctly.
Installation instructions
Provide instructions for installing your research software. These steps should be laid out in simple, clear language and organised in a step-by-step manner.
Discussion
Consider a research code project you’ve worked on. Discuss the technical prerequisites for that software or system. What would someone need to do, when starting from a blank slate, to recreate that environment?
Think about:
- What hardware and software did you need?
- What drivers and libraries were required?
- What software setup, calibration, and configuration is required?
Installing prerequisites
Most research code has several dependencies, such as libraries. The user will need to install the programming language onto their computer, such as R or Python, so it’s useful to link to the download pages and provide a link to the package manager tools that are commonly used in those ecosystems. This might also include listing any prerequisites such as hardware or software that must be installed first, such as device drivers.
Consider how the installation method might differ for users of other common operating systems, such as Windows, Linux, and macOS.
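One way to help users confirm that the prerequisites are in place is a short check script. The sketch below uses only the Python standard library; the minimum Python version and the package names are hypothetical placeholders, not requirements of any particular project.

```python
import importlib.util
import sys

# Hypothetical requirements, chosen only for illustration.
MINIMUM_PYTHON = (3, 9)
REQUIRED_PACKAGES = ["numpy", "scipy"]


def find_missing_packages(package_names):
    """Return the top-level package names that cannot be found for import."""
    return [
        name
        for name in package_names
        if importlib.util.find_spec(name) is None
    ]


if sys.version_info < MINIMUM_PYTHON:
    version_text = ".".join(str(part) for part in MINIMUM_PYTHON)
    print("Please install Python", version_text, "or newer.")

missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

A script like this complements the written installation instructions; it does not replace them, because it cannot check hardware or device drivers.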
User guide
All software should include some short guidance on how to use it and what the main options and features are. This might be a “quick start” guide with simple examples of common use-cases, or a walkthrough that uses a sample data set.
Explain how the software can be configured or customised, including examples of commonly-used options. If the software integrates with other tools or uses specific file formats for its input and output, it’s useful to explain this here too. It’s a good idea to include links to further documentation if available.
Many users will benefit from a frequently asked questions (FAQ) section or troubleshooting notes that describe common error messages, explain why they occur, and set out the steps to resolve them.
Writing style
The writing style should be concise, jargon-free, consistent, and pitched at the appropriate level for the intended target audience. Key technical terms and acronyms should be explained. However, don’t reinvent the wheel by defining every term used; instead, link to a reliable external source or journal article.
For more information about the broad topic of improving your writing style, please review these style guides.
Diagrams can be particularly useful to explain complex concepts and workflows. Screenshots may also provide a visual demonstration of how the software will work.
Discussion
Discuss with the group:
- Reflecting on your past experiences, what software or systems have you used that included excellent diagrams and illustrations to help you learn to use them as a new user?
- Have you ever watched a tutorial video online that explained a software tool or process? What did you like and dislike about the walkthrough?
Not all READMEs must follow this structure. Always adapt the format of your documentation to suit the specific needs of your audience.
Accessibility
Accessibility means reducing barriers to the use of your research software, or to participation in its development community, that arise from experiencing a disability or other social factors. When writing documentation for your code, consider how you can adapt your writing style and present information in a way that means that everyone can interact with it by expending the same amount of time and energy, regardless of their relative abilities.
While this is a broad topic, some general tips to consider when authoring software documentation in a research context are:
- Global audience: Explain ideas in a way that can be understood by people anywhere in the world, regardless of background. Be sensitive to cultural differences and avoid offensive language;
- Inclusivity: Avoid biased language and value diversity, e.g. when writing examples;
- Navigation: Ensure that the documentation is compatible with assistive technologies like screen readers and keyboard navigation.
For more information on this topic, please see the following resources:
- Alistair Duggin, What we mean when we talk about accessibility defines the core concepts of accessibility.
- Google developer documentation style guide, Write accessible documentation provides helpful examples.
- Write the Docs, Accessibility guidelines: for writing and beyond lists many useful materials.
Text formatting
Most people prefer to use a file format that allows you to format text and create headers to organise the content into sections or chapters, which makes the content more comprehensible for the reader.
In this case, a Markdown document may be used. Markdown is a simple markup language that adds semantic labelling (such as emphasis and structure) and visual styles that make your documentation more aesthetically pleasing and easier to navigate. It allows you to format your text using symbols to represent headers, bold text, bullet lists, etc. that are displayed to the user using their screen or other device, depending upon accessibility requirements.
A markup language is a system of special characters that are used to decorate or format pieces of plain text. The syntax normally consists of symbols or tags that are used to encode text, which means adding meaning to make it more information-rich. It can be used to structure a document into sections to provide logical organisation so that it’s easier to navigate.
Typically, a markup language is edited in a similar way to a computer programming language, and is rendered into a document with various rich text formatting such as headers, bold face fonts, etc.
Challenge
Convert your README file to Markdown format to enable more advanced formatting options.
Follow these steps to rename README.txt to README.md.
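The rename can be done in a file manager or on the command line. As one possible sketch, it can also be done with Python's standard pathlib module; the function name convert_readme_to_markdown is an assumption for illustration, and the demonstration runs in a temporary directory.

```python
from pathlib import Path
import tempfile


def convert_readme_to_markdown(project_directory):
    """Rename README.txt to README.md inside the given directory."""
    old_path = Path(project_directory) / "README.txt"
    new_path = old_path.with_suffix(".md")
    old_path.rename(new_path)
    return new_path


# Demonstration with a throwaway README file.
with tempfile.TemporaryDirectory() as scratch:
    (Path(scratch) / "README.txt").write_text("Birdsong identification tool\n")
    renamed = convert_readme_to_markdown(scratch)
    renamed_name = renamed.name
    old_file_remains = (Path(scratch) / "README.txt").exists()
    print(renamed_name)
```

Only the file extension changes; the existing text remains valid, because plain text is also valid Markdown.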
An example README file in Markdown format is shown below, in a file called README.md, where “.md” is the file extension for Markdown files.
Section headers
You can separate your document into hierarchical sections with headings using the # symbol. This makes your README easier to navigate. For example:
MARKDOWN
# Birdsong identification tool
This user guide provides instructions on how to use this birdsong
identifier. The software is designed to assist users in
identifying bird species based on their vocalisations.
# Installation
To install this software, follow the steps below...
# Usage
To use this package, start by configuring...
The hash symbol (#) means that the line will be converted into a header and displayed to the reader in a large, bold font. This makes it easier for the reader to find the part of your text they’re looking for, just like having chapters in a book.
Challenge
Create suitable headers in your document.
How would you organise your document by dividing up the text into subsections by adding further subheadings?
We can create the headers commonly used in READMEs by using the Markdown syntax shown below:
MARKDOWN
# Title
Brief introduction to the tool...
# Installation
To get started...
# Usage
To use this tool...
This gives some basic structure to the document, which we’ll flesh out later.
We can further subdivide the content by using header levels, where each subheading uses an additional # symbol. For example, # is a top-level heading, ## is a section header, ### is a subsection header, etc.
MARKDOWN
# Title
Brief introduction to the tool...
# Installation
To get started...
## Prerequisites
...
## Drivers
...
# Usage
To use this tool...
## Quick start
...
## Examples
...
These subheadings help the users to navigate the document.
If your code is published on GitHub, the home page of your code repository will display the README file, including a table of contents that is automatically created to easily select the section of the document to view.
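To illustrate how header levels give a document a hierarchy that tools can turn into a table of contents, here is a rough sketch in Python. It is not how GitHub builds its table of contents; the function name outline is hypothetical, and the sketch assumes that every line starting with one or more # characters followed by a space is a heading (so it would be misled by # comments inside code examples).

```python
def outline(markdown_text):
    """Return the headings of a Markdown document, indented by level."""
    entries = []
    for line in markdown_text.splitlines():
        hashes, _, title = line.strip().partition(" ")
        # A heading is a run of "#" characters, a space, then the title.
        if hashes and set(hashes) == {"#"} and title.strip():
            indent = "  " * (len(hashes) - 1)
            entries.append(indent + title.strip())
    return entries


readme = """\
# Title
Brief introduction to the tool...
# Installation
## Prerequisites
## Drivers
# Usage
## Quick start
## Examples
"""

for entry in outline(readme):
    print(entry)
```

Each additional # moves the entry one level deeper in the outline, which mirrors the way the rendered subheadings nest.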
Text formatting
Here are some commonly-used text formatting options that can be used with Markdown syntax:
Meaning | Example | Syntax |
---|---|---|
Strong text | Eastern towhee | **Eastern towhee** |
Emphasised text | Pipilo erythrophthalmus | *Pipilo erythrophthalmus* |
Inline code | name = "Pipilo erythrophthalmus" | `name = "Pipilo erythrophthalmus"` |
Hyperlink | Eastern towhee | [Eastern towhee](https://en.wikipedia.org/wiki/Eastern_towhee) |
These may be used to add emphasis to parts of the text or highlight key words and phrases. Using text formatting makes your software documentation easier to skim-read, so researchers can quickly find the part of the text that’s relevant for what they’re working on.
Challenge
Identify several key words in your README file. Highlight them using a bold font face.
The Markdown syntax for bold font is to wrap the text in two asterisks (**). This may be applied to single words or to phrases.
For example, we can strongly emphasise a single word:
Identify a bird based on the sound of its call.
Or emphasise a phrase:
Identify a bird based on the sound of its call.
Block quotes
We can create a quotation with appealing formatting by using the blockquote syntax in Markdown, which is similar to the method used in email.
MARKDOWN
> The eastern towhee (Pipilo erythrophthalmus) is a large New World
> sparrow. The taxonomy of the towhees has been under debate in
> recent decades, and formerly this bird and the spotted towhee
> were considered a single species, the rufous-sided towhee.
This will be rendered with the following appearance:
The eastern towhee (Pipilo erythrophthalmus) is a large New World sparrow. The taxonomy of the towhees has been under debate in recent decades, and formerly this bird and the spotted towhee were considered a single species, the rufous-sided towhee.
(This text was retrieved from the Wikipedia page on the Eastern towhee.)
Code blocks
If you’d like to present the user with examples of source code, use code fences to display the code in a special text box with syntax highlighting. To do this, wrap the code in three backtick characters (`) placed on the lines before and after it. For example:
If you include the name of a programming language then the syntax will be highlighted appropriately, for example:
This makes your code examples easier to read.
Markdown
You can learn more about writing documents using Markdown at Markdown Guide, a reference for using this syntax.
Remember, the README file is the first impression that research users will receive of your software. A README contains a brief description of the software, installation instructions, and a usage guide. Make it informative and user-friendly to enhance the research experience for others and foster collaboration. The writing style should be concise and clear, and should explain technical terms. Use diagrams and screenshots for clarity.
Key Points
- A README file serves as an introduction to your software, guiding users on installation, usage, and understanding its capabilities.
- Consider the user’s technical background; write clearly and avoid jargon.
- Markdown is a recommended format for creating headers, bold text, bullet points, etc.
Further resources
For more information about writing basic software documentation, please review the following materials:
- Raphael Pierzina Hi, my name is README!
- Kira Oakley The Art of README
- Aleksandra Pawlik Five top tips on documentation
Content from Documentation strings
Last updated on 2024-09-12
Overview
Questions
- How do we describe our code?
- How can we annotate functions in our research code?
- Why are documentation strings useful for research software?
Objectives
- Understand the purpose of documentation strings
- Learn how to write documentation strings that will be useful for other researchers
- Introduce ways to describe the parameters and return values of functions
How do we describe our code?
If you’re publishing a research software package, one of the most common ways that its users will learn to interact with the code is by reading the documentation for each individual function.
We learned about functions in an earlier module on software design principles. Functions help us to break our code into smaller units that have a single purpose. By documenting those functions effectively, we aim to explain their purpose to future users and maintainers of that code. We also need to describe all the expected inputs and outputs of the function.
Documentation strings
We describe functions by using a feature of many programming languages called documentation strings, usually abbreviated to docstring. A documentation string is a piece of text that describes that piece of code and helps people to use it.
To make a docstring, we write special comments in our code using syntax which is specific to each programming language, although the principle is the same.
Whenever you add functionality to a code project, consider wrapping it up into a function. It may help to write the docstring first to help work through what the purpose of your new code is before you start!
Challenge
Write a documentation string for a function. Create a script called oddsong and define a function named identify() that’ll be used to identify bird songs by inspecting an audio file to provide the name of that species.
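As a minimal sketch of what a docstring looks like, here is a small add() function whose name, arguments, and description match the help output shown in this section. The string directly below the def line is the docstring.

```python
def add(x, y):
    """Calculate the sum of two numbers."""
    return x + y


# Display the docstring in the same way a user would.
help(add)
```

Python stores this string on the function itself, as add.__doc__, which is where help() and many development environments read it from.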
In this code, the function uses the normal Python syntax, except a string has been included below the function definition. The contents of that string will be displayed to users in their development environment or by running the help function like so:
OUTPUT
>>> help(add)
Help on function add in module __main__:
add(x, y)
Calculate the sum of two numbers.
Challenge
Use the help() function to view the documentation string for a function.
Let’s view the help text for an in-built function abs() that finds the absolute value of a number.
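A minimal sketch of the call is shown here. The exact layout of the help text may vary slightly between Python versions.

```python
# Print the help text for the built-in abs() function.
help(abs)

# The same docstring is stored on the function's __doc__ attribute.
print(abs.__doc__)
```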
The following text will be printed to the screen:
OUTPUT
Help on built-in function abs in module builtins:
abs(x, /)
Return the absolute value of the argument.
The most important thing to include in a docstring is an explanation of the purpose of this piece of code. To write a useful docstring, put yourself in the shoes of someone who encounters your code for the first time and needs a simple introduction that doesn’t assume any implied knowledge. The explanation will be very basic and seem obvious to you, but it may help a new user greatly.
Discussion
How can we tailor our docstrings to different audiences, such as new users and experienced developers?
Arguments
Next, we must describe the inputs of the function: its arguments.
We list all the arguments, or input parameters, as shown in the code examples below. Each argument has a name and a brief description.
We have added an “arguments” (abbreviated to “args”) section to our docstring which lists the input parameters of the function and describes each one.
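The following sketch shows what such a section could look like, assuming the Google-style "Args:" layout that is used elsewhere in this module and reusing the add() function from the earlier help output. The wording of each description is illustrative.

```python
def add(x, y):
    """Calculate the sum of two numbers.

    Args:
        x: The first number.
        y: The second number.
    """
    return x + y


help(add)
```

Each argument gets one line: its name, a colon, and a brief description.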
Challenge
Add a description of each argument to a function in your code.
Run help()
and evaluate the output.
Return values
Finally, we describe the result of the function that is output by the return statement.
This will help the user to understand what the function does and what they can expect to receive back when they call it. It can also be useful to explain any potential errors or exceptions that the function will raise if the inputs aren’t as expected, and how to deal with them.
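As a sketch of how return values and errors might be described, the example below uses a hypothetical divide() function (not part of the course code) with Google-style "Returns:" and "Raises:" sections.

```python
def divide(numerator, denominator):
    """Divide one number by another.

    Args:
        numerator: The number to be divided.
        denominator: The number to divide by. Must not be zero.

    Returns:
        The quotient, as a float.

    Raises:
        ZeroDivisionError: If the denominator is zero.
    """
    return numerator / denominator


print(divide(6, 3))
```

Listing the exception tells the user what to expect when the inputs are invalid, so that they can handle it in their own code.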
Challenge
Describe the return value of a function in a documentation string.
Run help()
and evaluate the output.
Usage examples
We can also include demonstrations of how to use our code by providing code snippets: a collection of sample code that demonstrates how to use functions effectively in different scenarios.
To do this, let’s add an examples section to our documentation string. Each code example has a prefix of >>> which represents the input prompt of the Python interpreter. Some code editors will provide syntax highlighting of these code snippets.
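Here is a minimal sketch of an examples section, again using the add() function. The final lines go one step beyond the text above: they use Python's standard doctest module, which can run the >>> examples and compare the results with the expected values written beneath them. That check is an optional extra and is included here as an assumption about how you might want to verify your examples.

```python
import doctest


def add(x, y):
    """Calculate the sum of two numbers.

    Examples:
        >>> add(2, 3)
        5
        >>> add(-1.5, 1.5)
        0.0
    """
    return x + y


# Run the examples in the docstring; mismatches are reported on screen,
# and nothing is printed when every example behaves as written.
doctest.run_docstring_examples(add, {"add": add})
```

Keeping the examples runnable in this way helps to stop them drifting out of date as the code changes.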
Challenge
Write a brief code example within the documentation string in a function in your code.
Best practices
This section contains some tips for writing useful documentation strings.
Prioritisation
Focus on the purpose and functionality of the code, rather than getting bogged down in the details of how it works. Explain what the function does, rather than the specific implementation, because this might change over time. A function encapsulates an isolated part of a system, which can be used as a black box by other parts of the system or the end user, who in many cases only needs to understand its inputs and outputs.
Tips:
- It’s a good idea to start your docstring with a high-level summary of the function.
- If the function is a major one, include a simple introduction for the new user.
Clarity is key
Be concise. Describe the essential information that the user needs to know first, and be brief but clear.
As with any software documentation, avoid jargon where possible.
Discussion
Read the following documentation string:
PYTHON
def add(x, y):
    """Adds two numbers together, which are the x and y arguments of this function.

    This function takes two numbers as input and returns their sum.
    The addition is performed using the built-in `+` operator.

    Args:
        x: The first number to add to the second number, y.
        y: The second number to add to the first number, x.

    Returns:
        The sum of x and y, which are summed using the addition operator.
    """
    return x + y
Discuss: how can we effectively convey the purpose and functionality of a function in a docstring, without going into excessive detail about its implementation?
Don’t reinvent the wheel. Provide links to further resources for users to take a deep dive into more complicated topics.
Discussion
How can we link to external resources that can provide more in-depth information?
Be consistent. Decide a style of docstring and use that everywhere across your software project. If you’re working on a larger project with multiple developers, refer to the coding conventions and, if in doubt, follow the style of existing code.
There are several different standards for documentation strings. A standard is a convention that determines how the docstrings will be organised and the syntax that is used to represent the arguments, data types, etc.
A list of documentation string standards in Python:
- The PEP 257 docstring standard was designed by the maintainers of the Python programming language.
- The Google Style Guide sets out a docstring format.
- Sphinx docstring format, which has a NumpyDoc extension designed for scientific use.
It doesn’t matter which one you select, as long as it’s used consistently across a project and it’s clear what the syntax means. Some standards are better-supported by other tools such as IDEs and documentation generators.
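To make the difference between standards concrete, here is a rough sketch of the same small function documented in two of the styles listed above. The function names are made up for this comparison, and the layouts follow the general shape of the Google and NumPy conventions rather than every detail of either guide.

```python
def add_google_style(x, y):
    """Add two numbers.

    Args:
        x (float): The first number.
        y (float): The second number.

    Returns:
        float: The sum of x and y.
    """
    return x + y


def add_numpy_style(x, y):
    """Add two numbers.

    Parameters
    ----------
    x : float
        The first number.
    y : float
        The second number.

    Returns
    -------
    float
        The sum of x and y.
    """
    return x + y
```

Both docstrings carry the same information; only the layout differs, which is why consistency within a project matters more than the choice itself.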
Automatically generate docstrings
Generative AI services such as Google Gemini can read your code and write docstrings automatically, to a certain extent.
To do this, ask the system to create a docstring and copy your code into the prompt text box. Below is an example prompt and the reply generated by the Google Gemini algorithm:
Please generate a docstring for this Python function:
def calculate_rectangle_area(width, height):
    area = width * height
    return area
The result is the following docstring, in addition to some helpful descriptions of the content that it generated.
PYTHON
def calculate_rectangle_area(width, height):
    """
    This function calculates the area of a rectangle.
    Args:
        width (float): The width of the rectangle. Must be a positive number.
        height (float): The height of the rectangle. Must be a positive number.
    Returns:
        float: The area of the rectangle. This will be a positive number.
    """
    # Calculate the area
    area = width * height
    return area
This AI-generated content contains a function summary, argument descriptions, and explains the return value as we discussed previously.
Challenge
Try asking a generative AI service such as Google Gemini to read your code.
- Ask it to generate documentation of different kinds.
- Request a review of your code. What does the bot think?
- Can the chat-bot create a diagram to illustrate a concept that is relevant to your research software?
This can save you a lot of time, but as with any LLM-generated content, always check the output and ensure it’s correct!
Discussion
What are the benefits and risks of using a large language model (LLM) service such as Google Gemini or OpenAI ChatGPT to interpret your code and produce content that you use in your research?
How should we critically evaluate this material so that it can be used appropriately to improve the productivity of our research teams without jeopardising our ethics or integrity or causing security risks?
Documentation strings make your code clearer to read and easier for other researchers to use. Also, they make your research software easier to maintain in the long run, saving time and resources. Good docstrings are clear and use everyday language.
Well-documented, reusable research code depends upon good documentation strings. Research collaborators will benefit from clear explanations of the purpose of each function.
Key Points
- Docstrings are special comments that describe the purpose of a function and its inputs and outputs.
- Structure your docstrings to convey more information, with a concise introduction.
- Documentation strings allow you to break your documentation into bite-size chunks, with one overview comment per function.
Further resources
To find out more about documentation strings, please refer to the following resources:
- Python PEP 8 Documentation Strings
- Numpy style guide describes the syntax and best practices for docstrings in the Numpy project.
Content from Code readability
Last updated on 2024-09-25
Overview
Questions
- What is code readability?
- How do I make my code easier to interpret?
- How do I explain the purpose of my code?
Objectives
- Understand the common ways to make code easy to read
- Learn how to write code comments
- Learn to document variable types in Python and R
It’s a common trope in the software engineering world that code is read much more often than it is written. It’s important that our code is approachable for new people to use with confidence, as they might want to review the code itself to understand what it does. Also, when you maintain your code, or come back to it in the future, you’ll be grateful for the effort you made in making it easy to interpret and follow its logic.
Syntax highlighting
Many text editors use syntax highlighting to display parts of your source code using different colours or fonts to signify the meaning of each word or symbol. For example, variable names may be given a bright blue colour, strings highlighted in green, and numbers shown in a red font.
Let’s take a look to see its benefits:
Which bit of code is easier to read? What a difference a splash of colour makes! I know which development environment I’d rather work in.
Code editors
To work with our source code in a colourised way like this, use a text editor or IDE with a syntax highlighting feature such as Notepad++, VSCode, PyCharm, or RStudio.
Challenge
Try using some code editing software to apply syntax highlighting to your code.
If you don’t have access to an IDE, you could try the Online syntax highlighting tool by Oleg Parashchenko which can colourise R scripts and Python code.
Meaningful names
Our code should convey as much meaning as possible to the user or developer that’s trying to interpret it.
Variable naming
Every variable has a name and a value. For example, the code x = 42 creates a variable named x that has the numerical value of forty-two. But what does x mean? Is it the number of swallows required to carry a coconut? In this case, we have no idea.
That’s where meaningful variable names come in. Always try to name each variable using a noun that describes its contents. For example, in our case we’d use laden_coconut_capacity = 42 which is much clearer.
Function names
A function contains code that defines the performance of an action. As with variables, the name of a function should describe its behaviour so that the user of that code can anticipate what it will do when they run it. A vague function name, such as calc(a, b) will be mysterious without any more explanation. Name your functions using a simple verb phrase such as calculate_area(width, height) so it’s easy to interpret their purpose.
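The contrast can be sketched as follows. The plot measurements and their variable names are invented for illustration. Both functions do the same arithmetic, but only the second tells the reader what it is for.

```python
# Vague: the reader has to guess what a, b and the result represent
def calc(a, b):
    return a * b


# Clearer: a verb phrase for the function, nouns for the arguments
def calculate_area(width, height):
    return width * height


# Hypothetical field-plot measurements, with the unit in each name
plot_width_m = 12.5
plot_height_m = 4.0
plot_area_m2 = calculate_area(plot_width_m, plot_height_m)
```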
Discussion
Try modifying your example code by renaming the variables and functions.
- How much meaning can you include in these object names?
- What are the limitations of this approach?
Naming conventions
The communities of developers that use each programming language usually follow a conventional approach when naming objects in their code.
It’s also a good idea not to use single-letter names such as x or T because it may not be clear to someone else what these represent. Also, avoid the common pitfall of naming a variable with the same name as an in-built function such as sum().
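Here is a small sketch of the built-in pitfall, using made-up rainfall measurements. The problematic assignment is left commented out so that the example still runs.

```python
rainfall_measurements_mm = [2.5, 3.0, 4.5]

# Pitfall: assigning to the name "sum" hides the built-in function, so any
# later call to sum(...) in the same scope would fail with a TypeError.
# sum = sum(rainfall_measurements_mm)

# Clearer: a descriptive name leaves the built-in function untouched
total_rainfall_mm = sum(rainfall_measurements_mm)
```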
Try writing a simple example of a research-related script using the style conventions discussed above.
Although these rules aren’t strict, because your code will still run without error, following them does help to clarify your intentions by describing what type of variable or object is being referred to. Whatever you do, please try to follow a consistent style with your collaborators to avoid confusion.
Comments
Code comments allow us to annotate any part of our software with a human-readable description of the expected behaviour of the code or our general intentions to aid the reader in their interpretation. Start writing these as soon as you begin development work, as they’ll capture your thought process while the knowledge is fresh in your mind, avoiding the risk of forgetting important details.
To add comments to your code, use the # symbol at the start of a new line.
It’s best practice to use a very concise style when writing code comments. I recommend starting each comment with an active verb.
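As a minimal sketch of this style, the snippet below uses short comments that each begin with a verb. The temperature readings are invented for illustration.

```python
# Convert temperature readings from Fahrenheit to Celsius
readings_fahrenheit = [32.0, 212.0]
readings_celsius = []

for reading in readings_fahrenheit:
    # Subtract the freezing-point offset, then rescale
    readings_celsius.append((reading - 32.0) * 5.0 / 9.0)
```

Each comment states the intention of the code that follows rather than restating the syntax.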
Discussion
Try adding comments to your code.
- Which parts of the code will most benefit from comments?
- How long and detailed should comments be?
- How would you refer someone to an external website for more information?
Type hints
Type hints display the expected type of each object in your code. They are a kind of “documentation as code” that annotate the code that’s already there, rather than being written as separate documentation. While they don’t change the way the software works, they can help to improve code clarity and may be used to catch errors early in the development process.
Type hints for variables
When reading source code, it can be useful to know the type of each variable so we get an idea of what possible values they might contain as they move through the system.
Using type hints will make your code much easier to read and provide helpful documentation for others, and for yourself in the future.
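A brief sketch of variable annotations is shown below. The variable names and values are hypothetical, and the list[float] form assumes Python 3.9 or later.

```python
# name: type = value
sample_count: int = 42
species_name: str = "Turdus merula"
recording_lengths_s: list[float] = [12.5, 30.0, 7.25]
is_validated: bool = False
```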
Function argument type hints
They can also be used to label the input and output types of functions. They are not strictly enforced, but act as a guide to the reader.
Code that uses type hints will not raise an error when a value of a different type is passed, because type hints are just passive labels that document our code. They don’t enforce any type checking or rules that are asserted when the code is executed. This means that type hints are most useful for static analysis of code, where we learn something about a piece of software without running it.
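The following sketch illustrates both points, using an illustrative function. The hints label the expected inputs and output, yet Python itself does not check them at run time.

```python
def calculate_area(width: float, height: float) -> float:
    """Return the area of a rectangle."""
    return width * height


# Intended use: numbers in, number out
area = calculate_area(2.0, 3.5)

# Not enforced at run time: a string and an integer are accepted too,
# and the * operator simply repeats the string
repeated_text = calculate_area("ab", 2)
```

A static type checker or an IDE would be expected to flag the second call, even though the interpreter runs it without complaint.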
This is just a brief introduction to code annotation. For the keen coder, there are many more features and tools available to make your software easier for other people to understand and use.
It will take some time and effort to write these labels, but it will pay off in the long run to think about variable types and make it easier to interpret how the code will behave as it operates. It’s best practice to use an integrated development environment (IDE) that will check your type hints and inform you if it detects a problem with your source code.
Key Points
- Try to inject as much meaning into your source code as possible by naming things clearly and succinctly.
- Use comments to explain your rationale. Even if the code seems obvious to you now, think of the future benefits!
- Label functions and variables with type hints to tell the user what data types are expected.
Further resources
To find out more about the topics covered in this episode, please refer to the following pages:
- The Hitchhiker’s Guide to Python Code Style
- The tidyverse style guide for R
Content from Contributor guidance
Last updated on 2024-10-15
Overview
Questions
- How do I introduce new contributors to my research software project?
- What is the best way to communicate processes such as bug reporting?
- Where should I write up the design and structure of the system?
Objectives
- Learn to write a contribution guide for research code
- Learn about software coding standards
- Implement ways to facilitate communication between researchers that are engaged in the project
- Provide a high-level understanding of an existing codebase
Collaborative research software development
Often, in today’s research environment, much analytics software is written in a collaborative manner, involving multiple specialists from within a team, or from multiple institutions. For the long-term health of a software package, it’s important to encourage potential contributors to get in touch and feel welcome to take part. Useful research software can take on a life of its own.
Research software project management
For more information on planning the development of research software and project governance, see Module 1a.
Research software is often published using an open source licence, which means that all the code is publicly available and may be used and modified by anyone, within certain conditions (see Module 1b to learn more about software licensing).
There’s a lot more to creating and managing a sustainable community around a research software project, but having a central piece of documentation for contributors is a great start!
Discussion
Consider these questions amongst the group:
- How can we effectively foster a collaborative environment for research software development?
- How can barriers to participation be removed for a diverse range of individuals and institutions?
- What strategies can be implemented to ensure that all contributors feel valued and included?
Contribution guides
Contribution guidelines help users and potential contributors understand how they can help to improve the software, whether that’s by submitting bug reports, suggesting new features, or writing better code and documentation. All of these aspects are vital to produce reusable research software.
Potential collaborators should be able to easily find out how to take part and contribute. Developers should be encouraged to use appropriate communication channels to ask questions and inform others of proposed software changes. The contact details for the project administrator or committee should be available and they should be welcoming and responsive to any queries.
It’s important to explain how the project is managed so the process for evaluating new features and getting them implemented is clear, such as the code review and approval process. For many projects, a ticket system may be used to raise issues and suggest new features. Software developers often propose new code by creating a branch on the version control system (such as Git) and requesting that those changes be merged into the main codebase.
A contribution guide will save you time in the long run, because it provides an on-ramp for people to get involved, prevents them from getting confused, and reduces the number of incorrectly submitted bug reports, requests for change, etc.
Discussion
Discuss these issues amongst the group:
- What essential components should be included in comprehensive documentation for research software contributors?
- How can we make onboarding new contributors a smooth and welcoming process, ensuring they have the necessary information and support to be successful?
- How can we balance the need for clear guidelines with the desire to encourage creativity and innovation?
How to write contributor guidance
The standard practice for authoring a contribution guide for a software project is to create a file called CONTRIBUTING.md in the root folder of your project. This is a Markdown file that introduces new people to the project. It lets people know the ways they can take part in the research software project and what to do to get involved.
The specific contents of this file depend upon the kind of research project, but some useful information to provide typically includes:
- An introduction to the organisation and structure of the code, possibly including diagrams.
- Instructions for raising issues, suggesting new features, and proposing code changes.
- Links to additional documentation that’s hosted elsewhere, such as a code of conduct or discussion forum.
- A walkthrough of setting up a development environment, such as guidance on installing developer tools or other prerequisites.
On code repository hosting platforms such as GitHub, the contribution guide will be created automatically using this CONTRIBUTING.md Markdown file.
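One way to get started is to generate a skeleton with a heading for each of the topics listed above. The sketch below is only a suggestion: the function name and the section titles are assumptions, not a required layout, and the project name is a placeholder.

```python
def build_contributing_guide(project_name: str) -> str:
    """Return skeleton Markdown text for a CONTRIBUTING.md file."""
    sections = [
        f"# Contributing to {project_name}",
        "## Code structure",
        "## Reporting bugs and suggesting features",
        "## Proposing code changes",
        "## Setting up a development environment",
        "## Further documentation",
    ]
    # Separate the headings with blank lines, ready to be filled in
    return "\n\n".join(sections) + "\n"


guide_text = build_contributing_guide("Birdsong Identifier")

# To save the skeleton in the project's root folder:
# from pathlib import Path
# Path("CONTRIBUTING.md").write_text(guide_text, encoding="utf-8")
```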
Challenge
Create a new file called CONTRIBUTING.md and populate it with a few sentences.
- What are the most important things for a new contributor to know?
- What should a user do if they encounter a bug?
- What are the common questions that a new developer might have when they work on this research software?
Software project governance
Project governance defines the scope and aims of a research software engineering project, and determines how decisions will be made and carried out. It sets out the processes and responsibilities that collaborators must understand to take part. This is something that should be considered when preparing a software management plan, as discussed in Module 1a of this course. This is important to make sure that questions of who does what, and how, are stated clearly so that everyone can understand and collaborate effectively to produce excellent research software. It’s worthwhile to think about this early on in a project to avoid potential pitfalls later on!
Code of conduct
A code of conduct provides guidelines for the expected behaviour of people who are involved in the project. You may want to provide some general tips to create a productive community of researchers around the software, such as encouraging positive interactions between contributors, treating others with respect and dignity, and recommending processes for handling differences of opinion.
This has the following advantages:
- Fosters a healthy, collaborative working environment where people feel respected, included, and can freely share ideas.
- Manages expectations: clear rules reduce the amount of time wasted due to misunderstandings and conflicts.
- Builds a community: an ethically-run and transparent project will encourage contributors to share the values of the project and remain engaged.
For many working in a research context, there are additional considerations to ensure that institutional policies, ethics, and data protection regulations are carefully observed. These protocols are outside the scope of this document, but these factors should be clearly communicated to all contributors.
Contributor Covenant
Many open-source research software projects adopt the Contributor Covenant, which is a template charter that may be customised to suit the needs of your collaborators.
Developer notes
People who are contributing code to the project will need the following information:
- Which version control system is being used. Typically, this will be git or a similar tool, as discussed in Module 2 of this course.
- How to add automatic tests and whether a testing framework is in place.
- How the code is organised and how the package is structured.
Technical documentation
System documentation is important for new contributors to familiarise themselves with the codebase and as a reference for existing engineers. There should be a concise description of how the system works from a more technical perspective, with the intended audience being software developers, rather than the research users.
An architecture diagram is an efficient way to provide a “map” to help developers to understand and navigate a complex system.
Coding conventions
Many projects follow a set of programming standards to manage code quality. A coding style guide will help to ensure consistency across all the code written as part of a collaborative project, which helps others to read and interpret the code, making it easier to maintain in the long run. The code style rules should cover things like the way to describe functions, how to indent code, and naming conventions for variables.
This might include guidance and advice, or more strict rules as standards that are checked by a code linter. A code linter is an analysis tool that inspects code and checks for common errors and problems, producing a report for the developer to read and act upon. Common coding style standards include the PEP 8 style guide for the Python programming language and the tidyverse style guide in the R statistical language.
Discussion
Discuss these issues as a group:
- Why are coding conventions important for collaborative research projects?
- How can we establish and enforce coding style guidelines that promote consistency and readability?
Key Points
- Encourage collaboration: There are many ways to contribute to a research software project, including bug reports, feature suggestions, design discussions, documentation, and software engineering.
- Clear processes: Explain the process for making changes and having them included into the code.
- Bug reports: Create simple ways for users to report issues and have these problems resolved in a timely manner.
- Communication: Create appropriate communication channels so that design discussions and proposed changes may be worked through transparently.
Further resources
To find out more about creating healthy communities of developers to collaborate on research software engineering projects, please visit the following resources:
- GitHub Docs Setting guidelines for repository contributors
- H. Gruson and H. Turner Software Sustainability Institute Opening the door to new contributors in open source projects
- Stephan Druskat And then there were users: Designing governance for open research software projects Talk at RSECon23 in Swansea.
Content from Documentation sites
Last updated on 2024-09-25
Overview
Questions
- How do I present comprehensive information to users of my research software?
- How do I generate a website containing a user guide to my code?
- What should a good documentation website contain?
- How do I publish my software documentation on the internet?
Objectives
- Learn about documentation websites for software packages.
- Gain basic familiarity with some common website generation tools.
- Understand the basics of structuring a documentation website.
- Be able to set up a static site deployment workflow.
Documentation websites
A documentation website is a user guide and reference manual for a library of research code. Up to now, we’ve looked at ways to put helpful notes in our code, but now we’ll learn how to write a longer, more complete guide to the research tools you create.
A documentation site brings all your user guidance into one place. This kind of resource may be prepared for research software and will usually contain an introduction, installation instructions, a user guide, troubleshooting tips, and an in-depth reference section.
To get an idea of this, here are some links to documentation websites for widely-used data analysis and research software packages:
- pandas is a data processing library for the Python programming language.
- ggplot2 is a plotting package for the R statistical language.
- scikit-learn is a machine learning library for the Python programming language.
Discussion
Evaluate these documentation sites.
- What do you like about them?
- How approachable are they as a new user?
- What do you find difficult to understand in this material?
Why create a website?
There are many advantages to building a documentation site to provide an information-rich resource for researchers who use your code at institutions all around the world.
Advantages
These sites can work as hubs for collaboration, sharing the latest updates, and encouraging people to take up your system and get involved in improving it. The effort of setting one up will be rewarded in the long run because you will have created a valuable asset that will foster collaboration and knowledge sharing in your research community.
A key foundation stone of modern digital research practices is the ability to replicate results by reproducing analytic workflows. Clear, thorough documentation of the research code ensures that researchers can repeat processes and verify results and other people’s outputs.
Documentation sites are really useful for introducing new users to your software. They make it much easier and faster for new users to get started using your software to boost their research. A documentation site is one of the most effective ways to create a user base that has a sophisticated understanding of the research code, which is essential for them to adapt it to the complex problems that often arise in research contexts.
They’re also a valuable resource for your existing user base, enabling them to look up reference material or search the manual to find new capabilities they weren’t aware of before. This gives your software more potential to boost the productivity of other research teams.
When to use one
Although the advantages are numerous, not all software packages require a comprehensive documentation website. However, for any code project that is growing in the number of collaborators, users, and technical complexity, consider coordinating the team to write one as soon as possible to help the project continue its healthy growth.
Discussion
When is it appropriate to establish a documentation website? Consider the following factors:
- How many resources will it take to write and maintain?
- How many end-users need the information?
- Is there a simpler format that can convey the same information?
Contents
Documentation pages contain comprehensive information about a particular piece of research software. Think of it like a user manual for your car or an instruction guide for building a piece of furniture.
Research context
For research software, it may be important to explain the theoretical background or statistical methods that are used and explain the domain-specific assumptions that were made when the code was designed and written. It’s good practice to provide a concise summary of the relevant concepts and link to external sources such as papers, books, and other websites for users to take a deeper dive into the principles and algorithms used.
Installation instructions
This section provides a detailed walkthrough of the steps required for users to install the package onto their computer, with details that are specific to their operating system.
Tutorials
It can be very useful to include an in-depth “Getting Started” guide that provides step-by-step instructions to introduce a new user to your software package. It might guide the user through each aspect of the tool’s functionality and features so they’re able to become familiar with it in a more approachable way.
A series of code examples to demonstrate how to use the software in different contexts can be very useful for users to get off the ground in implementing common research workflows to achieve their specific goals.
User reference
If you have written functions that are intended to be used in other researchers’ code, then an in-depth explanation of these procedures is essential reference material. In the world of software engineering, these detailed appendices are called API references, which list each function and describe how the arguments may be used to control how the code works. This content may be automatically generated from the documentation strings.
Troubleshooting
As issues come up with your research code, and are eventually resolved and clarified, make a note of the causes of these troubles and make them available to the entire user base in your documentation site. This will help users to identify and fix common misunderstandings and technical problems they may run into when utilising your code.
This prevents a situation where potential solutions to common issues do exist, but are hard to find because they are scattered around the internet or are the exclusive knowledge of a few individuals.
Writing style
As we discussed in the episode on READMEs, it’s important to strive to use everyday, jargon-free language. It helps to set an approachable tone that encourages others to use the software and get involved with the project. This will ensure that the code is accessible to the widest possible layers of the research community and foster collaboration.
Always consider the target audience of your documentation, because your user base may be unaware of some of the unstated assumptions and technical background knowledge that you take for granted.
Tools
There are various tools available to build documentation sites for your research software.
GitHub Wiki
If you are publishing your code on GitHub, which is a web service that hosts code repositories, then one of the easiest ways to create a documentation site is to use the wiki feature on that platform. This is a great way to write detailed, structured documents containing long-form content that describes aspects of your software. What’s more, it’s available alongside your code so your documentation and software are located in one place.
As with readme files, the text that appears on GitHub is formatted using Markdown syntax.
Getting started
To create a wiki, which is a simple, easy-to-edit web site, go to the main page of your code repository on GitHub and click on the Wiki button on the top menu. For a detailed walkthrough of this process, please read adding or editing wiki pages on the GitHub documentation.
GitHub Wikis
For more information about the wiki feature on GitHub, see Documenting your project with wikis on the GitHub documentation.
Documentation sites for R packages
It’s also possible to generate a documentation site to accompany R packages that you create. For more information about this, please refer to the book R Packages by Hadley Wickham, which has a chapter on documentation websites.
Sphinx
Sphinx is a tool for building documentation websites that is commonly used amongst developers of Python packages, although it’s also compatible with other programming languages. It doesn’t currently support packages written using the R statistical language.
Sphinx is a documentation generator that takes plain text files that use a markup syntax (such as reStructuredText or Markdown) for formatting the content of your documentation site and transforms them into various output formats, ready to be published on the internet. It has a number of useful features, but in this module we’ll learn the basics to document our research code.
Callout
For a more in-depth guide, please see Build your first project in the Sphinx documentation.
Getting started
Let’s use Sphinx to create a documentation site for our Python code.
Installing Sphinx
Navigate to the root folder of your code project. Create a virtual environment using venv, which is a separate area in which to install the Sphinx package. This command will create a virtual environment in a directory called .venv/
This will create a subdirectory that contains the packages we’ll need to complete the exercises in this section.
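The creation step can also be carried out from within Python using the standard-library venv module, which is what the command-line tool uses behind the scenes. This is a sketch of an alternative route rather than the lesson's own command; the function name is an assumption, and activating the environment and installing Sphinx are still done on the command line afterwards.

```python
import venv
from pathlib import Path


def create_virtual_environment(project_root, with_pip=True):
    """Create a virtual environment in <project_root>/.venv and return its path."""
    env_dir = Path(project_root) / ".venv"
    # with_pip=True also installs pip into the new environment
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Usage, from the project's root folder:
# create_virtual_environment(".")
```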
Run the activation script to enable the virtual environment. The specific command needed to activate the virtual environment depends on the operating system you are using.
Use the Python package manager pip to install Sphinx.
Start a new Sphinx project
Sphinx includes a command to set up a new project called sphinx-quickstart. Navigate to your project’s root folder and run the following command.
This will initialise the configuration files for a new Sphinx site in a subdirectory called docs/ and prompt you to enter the following options:
- Project name: Birdsong Identifier
- Author name(s): Bill Oddie
- Project release []: 1.0
Sphinx options
To find out more about the Sphinx configuration files, please read their guide to defining document structure on the Sphinx documentation.
Building the site
In this context, building means taking our collection of Sphinx files and converting them into the source code files that define a website. Sphinx will create HyperText Markup Language (HTML) files. HTML is the markup language used for pages that are displayed in a web browser.
To build our site, we run the sphinx-build command using the -M option to select HTML syntax as the output format.
Sphinx will load our files from the docs/ directory and output the built HTML files in the docs/_build directory.
The file docs/_build/html/index.html contains the home page of your new documentation site! Open that file to view your handiwork.
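If you would rather trigger the build from a Python script, one possible sketch is to wrap the sphinx-build command with the standard-library subprocess module. The helper names are assumptions, the directories follow the layout described above, and Sphinx must already be installed in the active environment for build_site() to succeed.

```python
import subprocess


def sphinx_build_command(source_dir="docs", build_dir="docs/_build"):
    """Return the argument list for an HTML build with sphinx-build."""
    # -M html selects the HTML builder; the output lands in <build_dir>/html
    return ["sphinx-build", "-M", "html", source_dir, build_dir]


def build_site(source_dir="docs", build_dir="docs/_build"):
    """Run the build, raising CalledProcessError if sphinx-build fails."""
    subprocess.run(sphinx_build_command(source_dir, build_dir), check=True)


# Usage, from the project's root folder:
# build_site()
```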
Autodoc
It can be useful to automatically populate our documentation sites by converting our documentation strings into formatted text. We can achieve this using the autodoc plugin for Sphinx.
Configuring Autodoc
Let’s set up the options for autodoc
. (If you struggle
with these steps, please refer to the template
project.)
Add the following lines to docs/conf.py:
PYTHON
# Our Python code may be imported from the parent directory
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
This ensures that Sphinx can access our Python code by pointing at
the root directory of our project. The ..
syntax means “one
folder up”, which means autodoc
will search in the root
directory for code to import.
The Python code uses sys.path
,
a list of locations to search for code. By modifying the Python
module search path, we allow autodoc
to locate and
import our code modules from a specific directory that is not in the
default search path.
This is often necessary when working with project structures that involve multiple directories, helping the interpreter to find code that isn’t installed in the standard library location.
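The path setup described above can be sketched as a conf.py fragment like the one below. It assumes that conf.py lives in docs/ and that the code sits in the project root one level up. The `project_root` variable name is our own choice. The `extensions` line is included because autodoc is a Sphinx extension (`sphinx.ext.autodoc`) that must be enabled in conf.py before its directives can be used.

```python
import os
import sys

# Assumption: this file is docs/conf.py, so '..' is the project root
project_root = os.path.abspath('..')

# Put the project root at the front of the module search path,
# unless it is already listed
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Enable the autodoc extension that ships with Sphinx
extensions = ['sphinx.ext.autodoc']
```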
Next, edit docs/index.rst and add the following lines to instruct Sphinx to automatically generate documentation for our Python module.
This reStructuredText (reST) markup language has the following elements:
- .. indicates a directive within a reST document that is used to configure Sphinx.
- automodule:: indicates a specific directive to use autodoc to automatically generate documentation for a module.
- oddsong.song is the path to our Python module, for which documentation will be created.
- :members: is an optional argument for the automodule directive that instructs Sphinx to include documentation for all members (functions, classes, variables) defined within the specified module.
For more information about reST, please read the Introduction to reStructuredText by Write The Docs.
Now, when we build our site, Sphinx will scan the contents of the
oddsong
Python module and automatically generate a useful
reference guide to our functions.
The result looks something like this:
Automatically generate content
Try using autodoc to analyse your own code and build a documentation site by following the steps above.
After the sphinx-build
command has completed
successfully, browse the contents of the docs/_build/html
folder and discuss what you find.
Publishing
Now that you’ve started writing your documentation website, there are various ways to upload it to the internet so that others can read it.
There are several hosting services that can be used to publish your documentation site, such as GitHub Pages and Read the Docs.
The details of setting up the deployment of your site to these platforms are beyond the scope of this course.
Key Points
- Structured documentation websites help users learn to use all kinds of digital systems, supporting the successful adoption of research software by the wider research community.
- Documentation sites contain comprehensive installation instructions, user guides, and troubleshooting tips.
- There are several libraries that may be used to generate documentation sites.
- Documentation websites may be deployed to a hosting platform.
Further resources
Please review the following material which provides more information about some of the topics covered in this episode.
- Sphinx Getting Started
- Write the Docs Introduction to reStructuredText
- GitHub documentation About wikis
- Write the Docs Tools for documentation writing
Content from Command line interfaces
Last updated on 2024-09-12
Overview
Questions
- What is a command-line interface (CLI)?
- Why are CLIs useful for making software easier for researchers to use?
- How do I create a CLI for my research code?
Objectives
- Learn what a command-line interface is
- Understand the benefits of CLIs for making research code more accessible
- Gain a basic familiarity with the argparse module in Python
Command line interfaces
A command-line interface, usually abbreviated to CLI, is a terminal or prompt that accepts text input that instructs a computer what to do. CLIs are used to start programs and perform actions within the computer’s operating system.
In this section, we’ll introduce the concept of providing a command-line interface to our research code to make it easier to use and provide a well-documented “entry point” to our software.
Advantages of CLIs for research tools
Command lines are a way of interacting with a digital system that goes back to the early history of computing. They might seem old-fashioned because typing out commands means that there is no graphical component. They may seem restrictive because your mouse isn’t used, but terminals have a lot of power because we can formulate our instructions to the computer by writing commands. We have a direct line to control our computer’s operating system.
It’s a great way to “talk” to your computer because you can record the commands that you’ve run to provide a documented history of a research process. (We could record a video screen capture of your working procedure, but that’s much less efficient.)
Terminals are more efficient for running repetitive tasks and provide extra functionality for advanced users. They are a cost-effective way to provide a user interface for research software, as research teams often lack the resources and know-how to produce sophisticated graphical user interfaces.
Using the terminal
There are many powerful commands that can be learned to take full advantage of the command line, but here we’ll just address the basics to help us make our research software easier to use by providing a well-documented CLI.
This section will briefly introduce you to using the terminal to achieve simple tasks. For an in-depth course on using the command line, please study The Unix Shell Software Carpentry course.
How to open the command line
Each operating system has a slightly different terminal interface, but they work in basically the same way.
Example commands
A CLI command is a short piece of text that performs some action or interacts with the computer’s operating system.
Let’s examine a simple one-word command that lists the files in the current directory.
Arguments
Commands have options that allow the user to choose what the tool will do.
When using shell commands, we use the words option, flag, and argument to describe parameters that modify the operation of a command and the inputs used to initialise our code.
Challenge
Try the command line statements described above.
- How would you seek further help if you encounter an error?
- What response does the terminal provide? Is this what you expect?
CLIs in R
The rest of this episode focuses on the Python programming language.
R, while a powerful statistical computing language, doesn’t have a built-in module specifically designed for creating CLIs. Unlike in Python, this means that you’ll need to use external packages or write your own functions to handle command-line arguments and options.
However, there are several packages that can help you to create CLIs in R:
These packages create CLIs for your R scripts, making them easier to distribute for others to use.
CLIs in Python
We can add a command-line interface to our Python code using the methods and tools that are included in the Python programming language.
Getting started
Let’s continue working on our birdsong identification software project and create an entry-point to our code.
To create an executable script that will run from the command line, create a file called oddsong/__main__.py. When a user runs our code from the terminal, this __main__.py file will be executed first.
This is a mechanism that tells Python how we want users to interact with our software.
To find out more, please read the __main__.py section in the Python documentation.
To run our code as a script we use the Python
-m
option that runs a module as a script.
This will execute the oddsong
module by running our
oddsong/__main__.py
file.
main() functions
A main() function is used as the primary “starting point” for a command-line interface, otherwise known as an “entry point” for our scripted sequence of commands.
Inside this file, create a function called main()
and an
if
statement as shown below.
When the user executes our CLI, Python will know to run the
main()
function and execute our research code. In this
case, our research code hasn’t been written yet, so we’ll just show a
message on the screen for now.
The logical statement if __name__ == "__main__" means that the main() function will only run when the code is run from the command line as the top-level code environment.
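The structure described above can be sketched as follows. This is a minimal example of what oddsong/__main__.py might contain at this stage, using a placeholder message in place of the research code.

```python
def main():
    """Entry point for the command-line interface."""
    # Placeholder until the research code is written
    print("Identifying bird vocalisation...")


# Only call main() when this file is run as the top-level script,
# not when it is imported by other code
if __name__ == "__main__":
    main()
```

Keeping the logic inside main() rather than at the top level of the file means the module can also be imported, for example by tests, without immediately running the program.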
CLI documentation
Python has a useful inbuilt module called argparse to quickly create a command line interface that follows the standard conventions of the Linux software ecosystem.
To get started, attempt the challenge below.
Challenge
In this exercise, we’ll create an instance of the argument parser tool. Let’s edit our Python script.
First, load the argparse
library using the import
keyword, which is conventionally done at the top of the script.
Then, we’ll add the argument parser to our main()
function
so it loads when the script runs.
PYTHON
import argparse


def main():
    # Define command-line interface
    parser = argparse.ArgumentParser()
    parser.parse_args()

    print("Identifying bird vocalisation...")


if __name__ == "__main__":
    main()
This creates a basic command line interface. Let’s try it out.
What do you expect to see? What actually happens?
Now let’s ask for help! Run the following command to view the usage instructions:
What should we see when using the --help flag? What happens in your terminal?
When we run our script as before, it will run like normal with no change in behaviour.
But, if we invoke the command-line interface using any arguments, then this new functionality kicks in.
BASH
$ python -m oddsong --help
usage: test.py [-h]

options:
  -h, --help  show this help message and exit
This is the default output of a CLI with no additional arguments specified. The first line displays the usage instructions. This means that we may execute test.py with an optional help option using --help or -h for short. Optional flags are denoted with square brackets like this: [-h].
The parse_args()
method runs the parser and makes our
arguments available to the user on the command line. This makes the
default --help
flag available which displays instructions
and notes that we can customise. As we continue to develop our CLI by
adding arguments, each one will be listed and described in this help
page. This is an excellent way to document our software and make it
available to researchers!
Arguments
But what if we want to take an input from the user? We add arguments to our CLI using the following syntax.
This will create an argument that the user can specify when they run our script. Once the arguments have been parsed, its value is available as args.category, which we can use in our code to do something useful.
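To illustrate how a parsed value reaches our code, here is a minimal sketch. It assumes the result of parse_args() is stored in a variable named `args` and that the argument is the `--category` option used in this episode. Passing a list to parse_args() is done here only to simulate what a user would type; in a real script we would call parse_args() with no arguments so that it reads the actual command line.

```python
import argparse

# Define command-line interface
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--category')

# Simulate the user typing: python -m oddsong --category alarm
args = parser.parse_args(['--category', 'alarm'])

# The long option name becomes the attribute name on the result
print(f"Category: {args.category}")

# Optional arguments that are not supplied default to None
no_args = parser.parse_args([])
print(f"Category when omitted: {no_args.category}")
```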
Challenge
Add this argument to our script and note the changes to the user interface.
The code now looks something like that shown below.
PYTHON
import argparse


def main():
    # Define command-line interface
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--category')
    parser.parse_args()

    print("Identifying bird vocalisation...")


if __name__ == "__main__":
    main()
Note that we add the arguments before we parse them, which makes them available to use.
Now, when we invoke the help screen, we see our new “category” argument listed.
BASH
$ python -m oddsong --help
usage: oddsong.py [-h] [-c CATEGORY]

options:
  -h, --help            show this help message and exit
  -c CATEGORY, --category CATEGORY
The layout of this text is done for us and follows the standard conventions of terminal tools.
Of course, if you’ve imbibed the spirit of the course, you’ll notice that our new category parameter is completely undocumented! It’s unclear what it is or how to use this option.
Argument descriptions
To provide a concise explanation for each parameter we use the help argument of the add_argument() method as shown below.
PYTHON
# Add the category argument
parser.add_argument('-c', '--category',
                    help="The type of bird call e.g. alarm, contact, flight")
This text should briefly describe the purpose of the argument, without going into too much detail (which should be covered in the user guide).
Challenge
Add a description of the --category
argument using the
add_argument()
function. What change do you expect to
happen in your CLI?
We can achieve this in our example script by adding a
help
string.
PYTHON
import argparse


def main():
    # Define command-line interface
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--category',
                        help="The type of bird call e.g. alarm, contact, flight")
    parser.parse_args()

    print("Identifying bird vocalisation...")


if __name__ == "__main__":
    main()
Now, when we call the --help
option, we see this
description as an annotation to that argument.
There’s a lot
more to learn about command line arguments, including several
powerful features of the argparse
library, but these are
beyond the scope of this course.
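If you would like to inspect the help text without leaving Python, one option is the parser’s format_help() method, which returns the same text that --help prints. The sketch below sets `prog="oddsong"` as an assumption to give the program a stable name in the output. The exact layout of the help text can vary between Python versions.

```python
import argparse

# prog is set explicitly here (an assumption) so the program name
# in the help text does not depend on how the script was started
parser = argparse.ArgumentParser(prog="oddsong")
parser.add_argument('-c', '--category',
                    help="The type of bird call e.g. alarm, contact, flight")

# format_help() returns the help screen as a string instead of printing it
help_text = parser.format_help()
print(help_text)
```

This can be handy for checking that every argument has a description, for example in an automated test.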
Description
We can provide a simple summary of the software that will be
displayed on the --help
screen of our CLI by using the description
argument when creating our argument parser object. This should
concisely inform the user about the purpose of the tool and how it
works.
PYTHON
# Describe the software
parser = argparse.ArgumentParser(
    description="A tool to identify bird vocalisations.")
Challenge
Write your own description for our software. Where does it display on our help screen?
We define the description when creating our argument parser object.
PYTHON
import argparse


def main():
    # Define command-line interface
    parser = argparse.ArgumentParser(
        description="A tool to identify bird vocalisations.")
    parser.add_argument('-c', '--category',
                        help="The type of bird call e.g. alarm, contact, flight")
    parser.parse_args()

    print("Identifying bird vocalisation...")


if __name__ == "__main__":
    main()
This text is displayed after the usage instruction.
Usage
By default, the usage message is generated automatically based on the arguments of our script. For our example, the usage instructions look like this:
usage: oddsong.py [-h] [-c CATEGORY]
In most cases, this will do the job. If you want to override this message then use the usage parameter when creating the argument parser object.
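As a minimal sketch of a custom usage message, the example below passes a usage string when creating the parser. The wording of the string is our own assumption, chosen to match how the module is run with python -m. Note that a custom usage string is not updated automatically, so it needs to be kept in step with the arguments by hand.

```python
import argparse

# Override the automatically generated usage message
parser = argparse.ArgumentParser(
    usage="python -m oddsong [-h] [-c CATEGORY]",
    description="A tool to identify bird vocalisations.")
parser.add_argument('-c', '--category',
                    help="The type of bird call e.g. alarm, contact, flight")

# format_usage() returns just the usage line as a string
usage_text = parser.format_usage()
print(usage_text)
```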
There are several other options to customise your CLI, but here we’ve covered the primary ways to document your research software so that it is easier for your collaborators and other researchers to use.
Key Points
- Command line interfaces (CLIs) are terminal commands that provide an easy-to-use entry point to a software package.
- Researchers can use CLIs to make their research code easier to use by providing well-documented options, hiding the complexity of the software.
- Most programming languages offer frameworks for creating CLIs. In Python, we do this using the argparse library.
Further resources
To find out more about command-line interfaces and using the terminal to improve your productivity for research computing, please refer to the following resources:
- Learn more about using the terminal in the Software Carpentry Unix Shell course.
- There are Python packages such as Click that provide a framework for building bigger, more complex command-line interfaces.
- To learn about distributing your CLI so others can easily install and use it, please see the packaging module in this course series.