Content from Introduction
Last updated on 2024-12-09
Overview
Questions
- How can code design make your code more FAIR?
Objectives
- Understand the definition of software in research.
- Understand the FAIR principles as applied to research software.
- Understand how code design is related to the FAIR principles.
What is software in academia?
It is not always easy to define what constitutes software in a research setting. Projects can range in size from a small script of a few dozen lines to a massive project with millions of lines. If you are interested in a discussion of how to define research software, a good starting point is Defining Research Software: a controversial discussion. An abbreviated summary of this paper, and the pragmatic definition we use as a basis for this course, is…
Key Points
Research Software includes source code files, algorithms, scripts, computational workflows and executables that were created during the research process or for a research purpose. Software components (e.g., operating systems, libraries, dependencies, packages, scripts, etc.) that are used for research but were not created during or with a clear research intent should be considered software in research and not Research Software. This differentiation may vary between disciplines.
Reminder: The FAIR principles applied to research software
The FAIR principles (Findable, Accessible, Interoperable and Reusable) were originally designed for research data in order to “enhance their re-usability” (see Wilkinson et al. (2016)). This seminal paper made clear that while data is a central aspect of research, the principles should also apply to the algorithms, tools and workflows that led to the production of that data. A few years later, in 2022, a set of recommendations was published (Chue Hong et al. (2022); Barker et al. (2022)) in order to apply these FAIR principles to research software. The FAIR principles adapted to research software can be summarised as follows…
Key Points
Findable: Software, and its associated metadata, is easy for both humans and machines to find.
Accessible: Software, and its metadata, is retrievable via standardised protocols.
Interoperable: Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via Application Programming Interfaces (APIs), described through standards.
Reusable: Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software).
How can Code Design help the FAIR principles?
By designing your code efficiently you will make it FAIRer. Code design is about making your code easy to read, adapt, maintain and share. Here’s how some of the principles benefit from good design:
Interoperability: Writing code in a modular way and using standard data formats allows other systems to communicate with it.
Reusability: Documenting your code with docstrings and comments makes it easier for others to understand, use, and modify. A modular design built from single-task blocks also increases reusability, because small, simple pieces can be transferred to other projects or extended without significant refactoring.
The goal of this lecture is to dive into these practices and learn a little bit more about how the way you code will greatly enhance how maintainable, adaptable, and sustainable your software is in the long run.
Content from Why should you care?
Overview
Questions
- Why should you know about code design?
Objectives
- Understand the 4 main concepts developed in this course: maintainability, readability, reusability and scalability
Why should you care?
Reproducibility and Reliability
Good code practices ensure that research results are reproducible and reliable. Research findings are often scrutinized and validated by others in the field, and well-written code facilitates this process. Clean, well-documented, and well-tested code allows other researchers to replicate experiments, verify results, and build upon existing work, thus advancing scientific knowledge.
Efficiency and Maintainability
Writing good code enhances efficiency and maintainability. Research projects can span several years and involve multiple collaborators. Readable and well-structured code makes it easier for current and future researchers to understand, modify, and extend the software. This reduces the time and effort required to troubleshoot issues, implement new features, or adapt the code for different datasets or experiments.
Collaboration and Community Contribution
Good coding practices facilitate collaboration and contribution from the wider research community. Open-source research software, written with clear, standardized coding practices, attracts contributions from other researchers and developers. This collaborative environment can lead to improvements in the software, innovative uses, and more robust and versatile tools, ultimately benefiting the entire research community.
Readability
Definition and key aspects
Readability in software refers to how easily a human reader can understand the purpose, control flow, and operation of the code. High readability means that the code is clear, easy to follow, and well-organized, which greatly enhances maintainability, collaboration, and reduces the likelihood of bugs.
Key aspects:
Descriptive Naming: Use meaningful and descriptive names that convey the purpose of the variable.
Consistent Formatting: Consistent indentation improves the visual structure of the code. Keeping lines of code within a reasonable length (usually 80-100 characters) prevents horizontal scrolling and improves readability.
Comments and documentation: Brief comments within the code explaining non-obvious parts. Detailed documentation at the beginning of modules, classes, and functions explaining their purpose, parameters, and return values.
Code structure: Breaking down code into functions, classes, and modules that each handle a specific task. Group related pieces of code together, and separate different functionalities clearly.
Benefits:
1 - Maintainability: Your code will be easier to understand and modify, which also greatly reduces the risk of errors when introducing changes.
2 - Collaboration: writing readable code will enhance teamwork and make it easy for others to contribute. Code reviews will be easy!
3 - Efficiency: You are going to save a LOT of time. You will waste less time deciphering your code. That saved time will be used to develop the code.
4 - Quality: Reduces the likelihood of bugs and errors, leading to more reliable code
Reusability
Definition and Key aspects
Reusability in software refers to the ability to use existing software components (such as functions, classes, modules, or libraries) across multiple projects or in different parts of the same project without significant modification. Reusable code is designed to be generic and flexible, promoting efficiency, reducing redundancy, and enhancing maintainability.
Key aspects:
Modularity: Encapsulate functionality within well-defined modules or classes that can be independently reused.
Abstraction: Provide simple interfaces while hiding the complex implementation details.
Parametrization: Design functions and methods that accept parameters to make them adaptable to different situations.
Generic and Reusable Components: Develop generic libraries and utility functions that can be reused across multiple projects.
Documentation and Naming: Provide comprehensive documentation for modules, classes, and functions to explain their usage.
Avoid hardcoding values: Instead, use constants or configuration files.
Benefits:
Time saving: Reusable components save development time. You don’t need to rewrite from scratch! Using existing solutions for common tasks avoids duplication of effort.
Consistency: Using the same code components across projects ensures consistency in functionality and behavior.
Maintainability: Reusable components can be maintained and updated independently, making it easier to manage large codebases.
Quality: Reusable components are often well-tested, leading to more reliable and bug-free software
Maintainability
Definition and key aspects
Maintainability in software refers to the ease with which a software system can be modified to correct faults, improve performance or other attributes, or adapt to a changed environment. Highly maintainable software is designed to be easily understood, tested, and updated by developers, ensuring that the software can evolve over time with minimal effort and cost.
Key aspects:
Code readability: Your code should be organized logically, with meaningful names for variables, functions and classes.
Modularity: If you divide your software into distinct modules or components, each responsible for a specific functionality, you will greatly reduce dependencies.
Documentation: The documentation of the code should be continuously updated to reflect the latest state of the software.
Automated testing: Testing your software is important to make sure that modifications and the implementation of new functionality do not break it.
Benefits
Reduce technical debt: Maintainable code is easier to refactor and improve over time, reducing the accumulation of technical debt. The cost and effort to maintain the software will be significantly reduced.
Faster development: If your code is maintainable, it will be easier to understand, modify and extend. It will also be easier to identify and fix bugs.
Increase collaboration: Having maintainable code will make it easier for people to join you!
Adaptability to new requirements: If your code is maintainable it will be easier to adapt it to changing (or new) requirements, as is often the case in research.
Quiz
The question for each code is ‘Is this code readable, reusable and maintainable’?
Reusable: The function can be used with any list of integers to filter and transform the data.
Partially Readable: The code is readable because it uses a simple structure that is easy to follow. But it is impossible to understand its purpose. There are no comments explaining what the function is doing or why it’s doing it.
However, the code will be difficult to maintain because:
- Constraints are not explained.
- The logic includes “magic numbers” (2 and 3) without any explanation or named constants.
- There is no error handling, which makes it harder to maintain when unexpected inputs occur.
Challenge
Code #2:
PYTHON
def calculate_statistics():
    data = [23, 45, 12, 67, 34, 89, 23, 45, 23, 34]
    total_sum = sum(data)
    count = len(data)
    average = total_sum / count
    data_sorted = sorted(data)
    if count % 2 == 0:
        median = (data_sorted[count // 2 - 1] + data_sorted[count // 2]) / 2
    else:
        median = data_sorted[count // 2]
    occurrences = {}
    for num in data:
        if num in occurrences:
            occurrences[num] += 1
        else:
            occurrences[num] = 1
    mode = max(occurrences, key=occurrences.get)
    print("Sum:", total_sum)
    print("Average:", average)
    print("Median:", median)
    print("Mode:", mode)

# Calculate statistics for the specific data set
calculate_statistics()
- Maintainable: The code is well-structured, with clear variable names and straightforward logic. It’s easy to understand and modify if needed.
- Readable: The code uses descriptive variable names and simple constructs, making it easy to follow.
However, the code is not reusable because the function calculate_statistics is hardcoded to work with a specific dataset defined within the function. It cannot be easily reused with different datasets without modifying the function itself.
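One possible way to address the reusability problem is sketched below. It is only an illustration, not the lesson's reference answer: the dataset becomes a parameter and the results are returned instead of printed. The dictionary keys and the use of collections.Counter for the mode are choices made for this sketch.

```python
from collections import Counter


def calculate_statistics(data):
    """Return the sum, average, median and mode of a non-empty list of numbers."""
    if not data:
        raise ValueError('data must contain at least one value')

    total_sum = sum(data)
    count = len(data)
    average = total_sum / count

    data_sorted = sorted(data)
    middle = count // 2
    if count % 2 == 0:
        median = (data_sorted[middle - 1] + data_sorted[middle]) / 2
    else:
        median = data_sorted[middle]

    # most_common(1) returns a list holding the (value, occurrences) pair
    mode = Counter(data).most_common(1)[0][0]

    return {'sum': total_sum, 'average': average, 'median': median, 'mode': mode}


# The same function now works with any dataset passed in by the caller
stats = calculate_statistics([23, 45, 12, 67, 34, 89, 23, 45, 23, 34])
print(stats)
```

Returning the values rather than printing them also lets other code reuse the results, for example in a test or a plot.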
Content from Code structure
Overview
Questions
- How to structure code in a scalable and reusable way?
Objectives
- Learn to use functions and classes
- Understand how to organise your code in modules and packages
Introduction
When you’re writing code, making it consistent and well-structured is just as essential as ensuring it produces the correct result. You should think about how you structure your code as you write it, as this will make it easier to read and maintain in the future both by yourself and others.
It’s important to follow the design principles of the programming language you are using. Python is known as an object-oriented programming language, which means Python code is structured around creating, using, and interacting with code objects. There are many different types of objects in Python, such as basic integers or text strings, lists or dictionaries which contain multiple objects, and functions that operate on objects. It’s also encouraged to create your own classes of objects, to make your code more modular and reusable.
When your code grows in size and complexity, it’s a good idea to split it into multiple files, known as modules, and organise these modules into packages. This makes your code easier to manage and maintain, and allows you to reuse code across multiple projects.
Functions
Functions are a way to group code together that performs a specific task. There are many built-in functions in Python, such as print() to output values or len() to get the length of a list. But you can also create your own functions, to perform tasks that you need to do multiple times.
Functions are defined using the def keyword, followed by the function name and a set of parentheses containing any parameters the function takes. The function body is then indented and contains the code that the function will execute. The return keyword is used to specify what the function should output.
Below is a very basic function that simply takes two parameters and returns their sum. Once you have defined a function, you can call it by using its name and passing in the required parameters as arguments to the function. In this case the parameters a and b are set to 2 and 6 respectively, and the result of the function is stored in the result variable:
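A minimal sketch of such a function, assuming the name add() that later sections refer to and the argument values given in the description:

```python
def add(a, b):
    # Return the sum of the two parameters
    return a + b


# Call the function with a=2 and b=6 and store the returned value
result = add(2, 6)
print(result)
```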
OUTPUT
8
Whitespace
If you’re used to other programming languages like C or R, you might be surprised to see that Python does not use curly braces {} to define blocks like functions. Instead, Python uses indentation to define blocks of code, either with spaces or tabs (known as whitespace).
It’s important to be consistent with your indentation, as mixing different numbers of spaces or tabs can cause errors. Most code editors will automatically indent and convert tabs to spaces when writing Python.
This is obviously a very simple example, but functions can be much more complex and can take multiple parameters, return multiple values, or raise errors if something goes wrong.
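As a rough illustration of these points, the sketch below uses a hypothetical function (not part of the lesson material) that takes two parameters, returns two values at once, and raises an error for invalid input:

```python
def divide_with_remainder(dividend, divisor):
    """Return the whole-number quotient and the remainder of a division."""
    if divisor == 0:
        # Raising an error stops the function and reports the problem to the caller
        raise ValueError('divisor must not be zero')
    # Two values separated by a comma are returned together as a tuple
    return dividend // divisor, dividend % divisor


# The two returned values can be unpacked into separate variables
quotient, remainder = divide_with_remainder(17, 5)
print(quotient, remainder)
```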
Any function acts as a reusable block of code that can be called multiple times within your program. Building your code out of small, modular functions makes it easier to read and maintain: because you do not repeat the same code in several places, you save space and only need to edit the code in one place when making changes. It’s also easier and more reliable to test the output of individual functions to make sure they work correctly, rather than having to run the entire program. There will be a future session on Testing and Continuous Integration which will cover this in more detail.
Scope
When you define a variable inside a function, it is only accessible within that function. This is known as the scope of the variable. If you try to access a variable that is defined inside a function from outside the function, you will get an error:
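A minimal sketch of this situation, assuming a hypothetical function called set_value(). The try/except block is only there so that the sketch runs to the end; without it the script would stop with the NameError.

```python
def set_value():
    x = 10  # x only exists while set_value() is running
    return x * 2


set_value()

try:
    print(x)  # x was never defined outside the function
except NameError as error:
    print('NameError:', error)
```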
OUTPUT
NameError: name 'x' is not defined
Anything defined outside of a function is said to be in the global scope, and can be accessed from anywhere in the program (including within functions). However, it’s generally considered good practice to avoid using global variables, as they can make your code harder to understand and debug.
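The interplay between local and global variables can be sketched as follows. The function name add_numbers() and the values 20 and 3 are assumptions made for this sketch, chosen to be consistent with the output shown; the try/except block only keeps the sketch running after the expected error.

```python
x = 5
y = 10


def add_numbers(z):
    x = 20  # creates a new local x; the global x is left untouched
    return x + y + z  # y is read from the global scope


result = add_numbers(3)

print('x:', x)
print('y:', y)
print('Result:', result)

try:
    print('z:', z)  # z only exists inside add_numbers()
except NameError as error:
    print('NameError:', error)
```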
OUTPUT
x: 5
y: 10
Result: 33
NameError: name 'z' is not defined
- The x variable inside the function is a different variable to the x variable outside, so changing it inside the function does not affect the global x variable.
- The y variable is accessible inside the function because it is defined in the global scope.
- The result variable is the sum of the function’s x variable, the global y variable and the argument z.
- The z variable is not defined outside the function, so trying to print it in the main body of the script will raise an error.
Classes
Classes are a way to group functions and data together into a single object. Classes act as a blueprint for creating objects, which are then called instances of the class.
Similar to functions, classes are defined using the class keyword, followed by the class name. The class body is then indented and contains any properties or methods (i.e. functions) that the class has.
Below is a very simple class for a Rectangle object, which has properties for its width and height, as well as a method to calculate its area:
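A minimal sketch of such a class, assuming the fixed width of 5 and height of 3 that appear in the output further down and storing them as class-level properties:

```python
class Rectangle:
    # Properties shared by every instance of this simple class
    width = 5
    height = 3

    def get_area(self):
        # self refers to the instance the method is called on
        return self.width * self.height
```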
The self parameter
Methods in a class always define self as the first parameter, which is used as a reference to the instance of the class that the method is being called on. In this case the get_area method uses the width and height properties of the Rectangle instance. You can actually call this parameter anything you like, but self is the convention in Python.
You can then create an instance of this class by calling the class name as if it were a function, and you can access its properties and methods using the . operator:
PYTHON
my_rectangle = Rectangle()
print('This rectangle has a width of', my_rectangle.width, 'and a height of', my_rectangle.height)
print('Its area is', my_rectangle.get_area())
OUTPUT
This rectangle has a width of 5 and a height of 3
Its area is 15
In this case the class is not very reusable, as the width and height are fixed. You can pass arguments when creating an instance of the class by defining a special __init__() method. The __init__() method is called whenever a new instance of the class is created, and it can take values to set the initial state of the object. In the example below, the Rectangle class takes width and height parameters and stores them as properties of the self object (i.e. the instance of the class that is being created):
PYTHON
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def get_area(self):
        return self.width * self.height

my_rectangle = Rectangle(10, 20)
print('This rectangle has a width of', my_rectangle.width, 'and a height of', my_rectangle.height)
print('Its area is', my_rectangle.get_area())
OUTPUT
This rectangle has a width of 10 and a height of 20
Its area is 200
“Dunder” methods
In Python, methods that start and end with double underscores are called “dunder” (short for “double underscore”) or “magic” methods. These methods are special and have specific built-in meanings, such as __init__() being called when an instance of a class is initialised. We’ll see more examples of “dunders” later in this section.
Classes are a powerful way to structure your code, as they allow you to group related functions and data together in a way that is reusable and easy to understand.
Challenge 2
In the code below, we have multiple lists representing items in a grocery store. Each fruit has a name, price, and the number in stock.
PYTHON
fruits = ['apple', 'banana', 'orange']
fruit_prices = [1.00, 0.50, 0.75]
fruit_count = [10, 20, 15]
Create a class called Stock that has properties called name, price, count, and two methods: display() that prints out the name and unit price of the fruit, and get_total_value() that returns the total value of the stock.
Then create a new list of Stock objects, and call the display() method on each object.
PYTHON
class Stock:
    def __init__(self, name, price, count):
        self.name = name
        self.price = price
        self.count = count

    def display(self):
        print(f'Each {self.name} costs £{self.price}')

    def get_total_value(self):
        return self.price * self.count

fruits = [
    Stock('apple', 1.00, 10),
    Stock('banana', 0.50, 20),
    Stock('orange', 0.75, 15)
]

for fruit in fruits:
    fruit.display()
OUTPUT
Each apple costs £1.0
Each banana costs £0.5
Each orange costs £0.75
Challenge 3
Now, create a Shop class that has a property called stock that is a list of Stock objects. The Shop class should have a method called display_stock() that calls the display() method on each item in the stock list, and a method called get_total_stock_value() that returns the total value of all items in the stock list.
Then create a new Shop object with the fruits list as the input, and call the display_stock() and get_total_stock_value() methods.
PYTHON
class Shop:
    def __init__(self, stock):
        self.stock = stock

    def display_stock(self):
        for item in self.stock:
            item.display()

    def get_total_stock_value(self):
        total = 0
        for item in self.stock:
            total += item.get_total_value()
        return total

shop = Shop(fruits)
shop.display_stock()
print('Total stock value:', shop.get_total_stock_value())
OUTPUT
Each apple costs £1.0
Each banana costs £0.5
Each orange costs £0.75
Total stock value: 31.25
Naming conventions
Note that in each of these examples, the variables, properties and class names are written as nouns (e.g. result, Rectangle, width or Shop), while the functions and class methods are lowercase verbs or short phrases (e.g. add(), get_area() or display_stock()). This is a common convention in Python, and following it can help you write more readable code and understand the difference between objects and functions more easily.
We will see more examples of coding conventions like this in later sections.
Scripts and Modules
A Python script is a file containing Python code that can be executed by the Python interpreter. You can run a script by calling the Python interpreter with the script file as an argument, for example python my_script.py (where my_script.py is a placeholder for the name of your own script).
When you’re writing a large program, it’s a good idea to split your code into multiple files to make it easier to manage, and to make it easier to reuse code in other projects. Each file containing Python code is called a module, and you can import modules into scripts and other modules to use the functions and classes they contain.
For example, you could create a file called calculator.py that contains the add() function we defined earlier, as well as other functions for subtraction, multiplication, and division.
PYTHON
"""calculator.py"""
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

def divide(a, b):
    return a / b
Now, in another script or module, you can import the calculator module and use its functions by using the import keyword:
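A sketch of what this might look like is given below. The argument values 5 and 3 are an assumption chosen to match the output shown. The first few lines are setup only: they write a small calculator.py into a temporary folder so that the sketch runs on its own; in a real project the file would simply sit next to your script.

```python
import sys
import tempfile
from pathlib import Path

# Setup only: create a stand-in calculator.py and make its folder importable
module_dir = tempfile.mkdtemp()
Path(module_dir, 'calculator.py').write_text(
    'def add(a, b):\n    return a + b\n'
)
sys.path.insert(0, module_dir)

# Import the whole module, then reach its functions through the module name
import calculator

result = calculator.add(5, 3)
print(result)
```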
OUTPUT
8
You can also import specific functions or classes from a module, rather than importing the whole module. This can be useful if you only need one or two functions, as it saves typing the module name on every call and can make your code more readable:
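A sketch of this form of import, again with a few setup lines that create a stand-in calculator.py in a temporary folder purely so that the example runs on its own:

```python
import sys
import tempfile
from pathlib import Path

# Setup only: create a stand-in calculator.py and make its folder importable
module_dir = tempfile.mkdtemp()
Path(module_dir, 'calculator.py').write_text(
    'def add(a, b):\n    return a + b\n'
)
sys.path.insert(0, module_dir)

# Import just the add() function; it can then be called without a prefix
from calculator import add

total = add(5, 3)
print(total)
```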
Python comes with a lot of built-in modules that you can import and use in your code, collectively known as the Standard Library. These work in the same way, for instance if you wanted trigonometric functions you can do from math import sin, cos, tan and then use those functions in your code.
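A short sketch of this import in use, with an arbitrary angle of 45 degrees (pi / 4 radians) picked for illustration:

```python
from math import sin, cos, tan, pi

# The math functions work in radians; pi / 4 radians is 45 degrees
angle = pi / 4

print('sin:', sin(angle))
print('cos:', cos(angle))
print('tan:', tan(angle))
```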
Running a module as a script
In some cases, you might want a mixture of code that executes when the module is run as a script as well as functions and classes that can be imported into other modules. To keep the two separate, you can use another special “dunder” called __name__. This is a built-in variable that is set to '__main__' when the module is run as a script, and is set to the module name when the module is imported into another module.
As such, if you include a check for if __name__ == '__main__': in your module, you can define code that only runs when the module is run as a script:
PYTHON
"""calculator.py"""
def add(a, b):
    return a + b

if __name__ == '__main__':
    result = add(5, 3)
    print('Test result:', result)
If you didn’t include the if __name__ == '__main__': check, then every time you tried to import the add() function from the calculator module, the test code would run as well and print out the result.
Packages
Once you have a collection of modules that you want to reuse across multiple projects, you can organise them into a package. A package is a directory containing multiple modules, along with a special __init__.py file that tells Python that the directory is a package (this is yet another example of a “dunder” being used as a special marker in Python, in this case being used to signify that a directory is importable).
For example, you could create a package called my_package that contains the calculator module we defined earlier, as well as a new module called geometry that contains functions for working with shapes. When organising these modules into a package, the directory structure would look like this:
my_package/
    __init__.py
    calculator.py
    geometry.py
The __init__.py file can be empty, but it can also contain code that runs when the package is imported.
You can then import the modules from the package in the same way as before, but you need to include the package name either as a prefix or by using the from keyword:
PYTHON
import my_package.calculator
my_package.calculator.add(5, 3)

# OR
from my_package import calculator
calculator.add(5, 3)

# OR
from my_package.calculator import add
add(5, 3)
If your package grows large enough, you can also create sub-packages within your package by creating subdirectories with their own __init__.py files. This allows you to organise your code into a hierarchical structure that makes it easier to manage and understand.
Challenge 4
A researcher has all of their code for a project in a single Python file, and they want to split it into multiple modules within a package. Here is a list of the functions and classes they have defined in their script:
- load_data(): a function that reads data from a file
- clean_data(): a function that removes any missing values from the data
- plot_data(): a function that plots the data
- Data: a class that holds the data, returned by load_data()
- Model: a class that represents a machine learning model created from the data
- test_data(): a function that tests the data class is working correctly
- test_model(): a function that tests the model class is working correctly
- run_experiment(): a function that runs the entire experiment, taking a data file as an input
There is also a test file called test_data.csv that contains some example data, and is used by the test_data() and test_model() functions.
How would you organise this code into a package?
Here is an example of how you could organise the code into a package:
research_project/
    __init__.py
    data/
        __init__.py
        data.py
    model/
        __init__.py
        model.py
    plot/
        __init__.py
        plot.py
    scripts/
        __init__.py
        run_experiment.py
    tests/
        __init__.py
        test_data.csv
        test_data.py
        test_model.py
- The data module would contain the load_data() and clean_data() functions and the Data class.
- The model module would contain the Model class definition.
- The plot module would contain the plot_data() function.
- The scripts module would contain the run_experiment() function in a standalone script.
- The tests module would contain the test_data() and test_model() functions, as well as the test_data.csv file.
For instance, if you wanted to load and plot the data in a different script, you could import the data and plot modules like this:
PYTHON
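# These imports assume that data/__init__.py and plot/__init__.py re-export
# the functions defined in data.py and plot.py
from research_project.data import load_data, clean_data
from research_project.plot import plot_data

data = load_data('data.csv')
cleaned_data = clean_data(data)  # remove missing values before plotting
plot_data(cleaned_data)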
Although it seems like a lot of files for a small amount of code, this structure makes it easier to manage and maintain the project over time, and will make it easier to reuse the code in other projects in the future.
Once you have your package organised, you can share it with others by using a code hosting platform like GitHub, or uploading it to the Python Package Index (PyPI, https://pypi.org/). If you do this there are some additional files you should include to make your package more user-friendly, such as a README file that explains what the package does and how to use it, and a LICENSE file that specifies the terms under which the code can be used. There is another session on Packaging which will go into more detail on how to create and share Python packages.
Final exercise : Rewriting a Python Script
In this final exercise, you will rewrite a Python script that is poorly structured and difficult to read, with a large amount of repeated code.
Challenge 5
You can find the script here: student_scores.py.
This Python script is designed to read in a data file containing student names and scores (all generated randomly!), although here we just include the data as a text string in the script to save having to download a separate file.
After checking there are no errors with the input, the script then goes through each student, calculates their total score across the three exams and prints out a summary of the data. It then calculates the average score for each assignment, and prints out the student with the highest total score in the class.
You can download it and run it on your local computer using python student_scores.py. The output should look like this:
OUTPUT
studentid firstname surname score1 score2 score3 total
39816 Fiona Ellis 15 18 16 49
40859 Philip Holdcroft 12 17 15 44
71625 Kathleen Ingram 20 19 19 58
91462 David Nicholson 14 16 18 48
97297 Mark Walch 18 20 17 55
Average score1: 15.80
Average score2: 18.00
Average score3: 18.00
Student with highest total: Kathleen Ingram (58.00)
Your task is to rewrite this script using functions and classes to make it more modular and reusable. You can also move code out of the script into separate files and modules if you think it will make the code easier to manage.
Make sure when you’re done that the output of the script is the same as the original.
Content from The Zen of Python
Overview
Questions
- What are PEPs?
- How to write clean code?
- How can I do this efficiently with Pylint?
Objectives
- Understand why it is important to write good code
- Write PEP8 compliant code
- Use Pylint to help with code formatting and programmatic errors
Python Enhancement Proposals and the Zen of Python
Python Enhancement Proposals (PEPs) are documents that provide information to the Python community, or describe a new feature for Python or its processes or environment. Some of them focus on design and style:
- The main one is PEP8. It lays out rules to write clean code in Python.
- Docstring conventions are given in PEP257.
- The Zen of Python in PEP20 gives guiding principles for Python’s design. It is accessible in any Python distribution with:
In [1]: import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Readability counts
As Guido van Rossum (Python’s creator and former Benevolent Dictator For Life) once said: “Code is read much more often than it is written.”
While coding you may spend a few hours (or days) on a piece of code, and once you are done with it you will not write it again. Nevertheless, there is a very high chance that you will read it again. If the piece of code is part of an ongoing project, you will have to remember what that code does and why you actually wrote it. Hence, readability counts! Remembering what code does after a few weeks or months is not easy. Following the standard guidelines will help you greatly (and save you a lot of time!).
In addition, if multiple people are looking at the code and developing with you, writing readable code is paramount. If people have to decipher your coding style before they can even try to understand what the code does, working together becomes very difficult for everybody. PEP8 provides a standardisation of the Python coding style.
Explicit is better than implicit.
Writing clear code is not complicated. It starts by giving meaningful names to variables, functions and classes. Avoid single-letter names like x or y. For example:
PYTHON
# This is bad:
x = 5
y = 10
z = 2*x + 2*y
# This is much better:
width = 5
height = 10
perimeter = 2*height + 2*width
Just by using descriptive names we can understand what the code is trying to do.
In addition, everything that you write (variables, constants, functions, classes…) comes with a naming convention. The main conventions are:
- Variables, functions and methods use the snake_case convention. It means that they should use lowercase letters and that words should be separated by underscores:
PYTHON
# This is bad
def ComputePerimeter(width, height):
    return 2*width + 2*height
# This is good
def compute_perimeter(width, height):
    return 2*width + 2*height
- Class names follow the PascalCase convention (also known as CamelCase). In that convention, each word starts with a capital letter and there are NO underscores between words.
- Constant names follow the UPPER_SNAKE_CASE convention. Constants, or variables that are intended to remain unchanged, should be written in all uppercase letters, with words separated by underscores.
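The last two conventions can be illustrated with a short sketch. The names SPEED_OF_LIGHT and ParticleTracker are hypothetical and were chosen only for this illustration:

```python
# Constant: UPPER_SNAKE_CASE.
SPEED_OF_LIGHT = 299_792_458  # Speed of light in vacuum, in m/s.


# Class: PascalCase, with no underscores between words.
class ParticleTracker:
    def __init__(self, initial_position):
        # Attributes follow the snake_case convention, like variables.
        self.initial_position = initial_position


tracker = ParticleTracker(initial_position=0.0)
print(tracker.initial_position, SPEED_OF_LIGHT)
```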
Beautiful is better than ugly
In the context of Python, beautiful means that the code is clean, readable and well structured. Beautiful code is easy to understand, not only for you but also for other people who might have to maintain the code in the future. It uses meaningful names and a clear logic and structure.
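As a counter-example, consider a dense one-liner such as the sketch below. The exact expression is an assumption made for illustration; it packs filtering and transformation into a single list comprehension:

```python
# A deliberately dense one-liner: filtering and transformation are mixed together.
result = [x * 2 if x % 2 == 0 else x * 3 for x in range(10) if x % 2 == 0 and x != 4]
print(result)
```

Note that, in this sketch, odd numbers are filtered out before the transformation is applied, so the "triple" branch is never reached. This is exactly the kind of detail that is hard to spot in a dense expression.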
This one-liner tries to:
- Filter out odd numbers and the number 4.
- For even numbers, double them.
- For the rest, triple them.
‘Beautiful is better than ugly’ means that developers should aim for simple and elegant solutions. Code becomes very difficult to maintain when the author tries to cram as much functionality as possible into a single line or function. Always try to break the logic down into clear, single-purpose components.
Beautiful code is aesthetically pleasing because it follows good design principles (see next chapter). It is modular, reusable, and adheres to the DRY (Don’t Repeat Yourself) principle. It avoids unnecessary complexity and focuses on clarity.
Challenge
Rewrite the previous one-liner to make it more understandable.
PYTHON
result = []
for x in range(10):
    if x % 2 == 1 or x == 4:
        continue
    if x % 2 == 0:
        result.append(x * 2)
    else:
        result.append(x * 3)
print(result)
The advantages of this version:
- Clarity: Each part of the logic is isolated—first filtering, then applying the transformation based on conditions.
- Step-by-Step: It’s clear what’s happening at each step without trying to parse it all at once.
- Debuggable: It’s easier to debug and modify, especially if you need to change one part of the logic.
- Maintainability: Each step is explicit, making it easier for others (or yourself in the future) to understand.
Nevertheless, this does not mean that you should make your code more verbose than necessary. As you get to know the language in more detail, you will learn how it works, and this will help you to be concise and efficient.
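A typical case is checking whether a list is empty. The sketch below compares a verbose check with a concise one; both function names are hypothetical and only serve the comparison:

```python
def is_empty_verbose(values):
    """Verbose version: explicit length comparison and an if/else."""
    if len(values) == 0:
        return True
    else:
        return False


def is_empty_concise(values):
    """Concise version: an empty list is falsy, so `not` is enough."""
    return not values


print(is_empty_verbose([]), is_empty_concise([]))
print(is_empty_verbose([1, 2]), is_empty_concise([1, 2]))
```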
The advantages of this version:
- Readability: It’s immediately clear that the code checks if the list is empty.
- Conciseness: The not operator works directly with lists in Python, making the code more succinct.
- Simplicity: Eliminates unnecessary conditional checks and additional code.
Sparse is better than dense.
When you write your code it is important to make it readable. Avoiding cluttered code by making it sparse and spaced out makes it easier to read and increases clarity. Using whitespace, correct indentation and separation between blocks will make your code quicker to understand. Moreover, when code is spread out with proper comments and breaks, it is easier to modify or debug. Let’s see an example:
Challenge
What is wrong with this code? Is it actually working?
PYTHON
def example_function(param1,param2):print(param1+param2*2)
def another_function(x,y):return x+y
class MyClass:
def __init__(self,param): self.param=param
def method(self):
if self.param >10: print("Value is greater than 10")
else: print("Value 10")
my_list=[1,2,3,4,5]
dictionary={'key1':'value1','key2':'value2'}
result=another_function(5,10)
print(result)
So what are the rules?
- Indentation: The convention is to use 4 spaces. Tabs are not recommended as they can lead to inconsistencies.
- Whitespace around operators: A single space should be included on both sides of binary operators (+, -, *, /, =, ==, !=, <, >, <=, >=, etc.).
- Comma and colon spacing: you should include a single space after a comma and a single space after the colon in a dictionary:
PYTHON
# This is bad
dictionary={'key1':'value1','key2':'value2'}
# This is good
dictionary = {'key1': 'value1', 'key2': 'value2'}
- Blank lines: Use two blank lines before a top-level function or class definition and use a single blank line between method definitions inside a class.
PYTHON
# This is bad
class MyClass:
    def method_one(self):
        pass
    def method_two(self):
        pass


# This is good
class MyClass:
    def method_one(self):
        pass

    def method_two(self):
        pass
Challenge
Based on what we saw up to now, rewrite this code to make it easier to understand.
PYTHON
def example_function(param1,param2):print(param1+param2*2, end=' ')
print("The result is:", param1,param2)
def another_function(x,y):return x+y
class MyClass: def __init__(self,param):self.param=param
def method(self):if self.param >10:print("Value is greater than 10")
else:print("Value is 10 or less")
my_list=[1,2,3,4,5]
dictionary={'key1':'value1','key2':'value2'}
result=another_function(5,10)
print(result)
PYTHON
def calculate_adjusted_sum(base_value, multiplier):
    """
    Calculate and print the sum of the base_value and twice the multiplier.

    Args:
        base_value (int or float): The base value to which the adjusted multiplier will be added.
        multiplier (int or float): The value that will be doubled and added to the base value.
    """
    adjusted_sum = base_value + (multiplier * 2)
    print(adjusted_sum, end=' ')
    print("The adjusted sum is:", base_value, multiplier)


def add_two_numbers(x, y):
    """
    Return the sum of two numbers.

    Args:
        x (int or float): The first number.
        y (int or float): The second number.

    Returns:
        int or float: The sum of x and y.
    """
    return x + y


class ValueChecker:
    def __init__(self, value):
        """
        Initialize with a specific value.

        Args:
            value (int or float): The value to be checked.
        """
        self.value = value

    def check_and_print_message(self):
        """
        Print a message based on whether the value is greater than 10 or not.
        """
        if self.value > 10:
            print("The value is greater than 10.")
        else:
            print("The value is 10 or less.")


# Example usage
numbers_list = [1, 2, 3, 4, 5]
key_value_pairs = {'key1': 'value1', 'key2': 'value2'}

# Add two numbers and print the result
result = add_two_numbers(5, 10)
print("Sum of numbers:", result)
We can now see that neither the class nor the first function is used in the rest of the code. If the code stays like this, they can be removed.
If the implementation is hard to explain, it’s a bad idea…If the implementation is easy to explain, it may be a good idea.
If you follow this FAIR training program you might be interested in sharing your code with the wider research community. If that is the case, people might want to have a look at your code. This aphorism tells you that how you implement your code matters! Code should always be easy to understand. If you are unable to explain what your code is doing, then you should not leave it in your software. Conversely, if you are able to explain in a simple way what your piece of code is doing, it is probably a good implementation.
In addition to writing simpler and more logical code, commenting your code is important. For more complex operations it is often useful to explain the logic behind the reasoning and why a particular approach has been chosen.
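The contrast can be sketched with two hypothetical functions that both test whether a number is a power of two. The first relies on operator precedence and a bit trick that takes a while to explain; the second can be described in one sentence ("halve the number until it is odd, then check that 1 is left"). Both functions and their names are assumptions made for this illustration:

```python
def p(n):
    # Hard to explain: relies on operator precedence and a bit-level trick.
    return n > 0 and not n & n - 1


def is_power_of_two(number):
    """Return True if number is 1, 2, 4, 8, ... by repeatedly halving it."""
    if number < 1:
        return False
    while number % 2 == 0:
        number //= 2
    return number == 1


for candidate in (0, 1, 6, 8):
    print(candidate, p(candidate), is_power_of_two(candidate))
```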
There are a few rules for writing comments in Python:
- Comments should be complete sentences and start with a capital letter.
- Block comments apply to the code coming after them and are indented to the same level as that code. Each line should start with a # followed by a single space.
- Inline comments should be separated by at least two spaces from the piece of code they relate to.
- Comments should not state the obvious (it is distracting).
Finally, when you update your code you should always update the comment. ‘Comments that contradict the code are worse than no comments’ [PEP8].
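These rules can be seen together in a small sketch. The function mean_speed and its behaviour for a zero duration are hypothetical choices made for the example:

```python
def mean_speed(distance_km, duration_h):
    # Guard against a zero duration so that the division below is always
    # defined. Returning 0.0 is an arbitrary choice made for this sketch.
    if duration_h == 0:
        return 0.0
    return distance_km / duration_h  # Speed in km/h.


print(mean_speed(10, 2))
```

The block comment is a complete sentence, sits at the same indentation level as the code it describes, and explains why the check exists rather than restating what the code does. The inline comment is separated from the code by two spaces.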
PyLint
PyLint is a tool that analyzes Python code to find programming errors, enforce a coding standard, and look for improvements. It provides a score based on the number of issues detected, helping you write clean and readable code.
Key Features of PyLint
Error Detection: Detects issues such as using undefined variables, unnecessary imports, and more.
Coding Standard Enforcement: Checks the code against PEP 8. Flags violations such as incorrect indentation, naming conventions, and line length.
Code Quality Metrics: Provides a detailed report with metrics like code complexity, number of lines, and number of classes. Offers a score that reflects the overall quality of the code.
Refactoring Suggestions: Suggests improvements to make the code cleaner and more efficient. Highlights duplicated code, unused variables, and functions that can be simplified.
Running pylint
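To analyse a Python file, run the pylint command followed by the name of the file, for example pylint my_script.py (where my_script.py is a placeholder for your own file).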
When you run PyLint on a Python file, it provides an output with the following components:
- Messages: Each detected issue is reported with a message ID, type, line number, and a brief description.
- Statistics: Provides a summary of the issues found, such as the number of errors, warnings, and refactor suggestions.
- Score: An overall score out of 10, reflecting the code quality based on the issues detected.
Challenge
Let’s have a look at an example: Consider that file here and run PyLint on it. Try to clean up the code according to the error messages you see.
Content from Principles of Code design
Last updated on 2024-12-04
Overview
Questions
- How to write maintainable, readable, reusable and scalable code?
Objectives
- Be familiar with standard principles of code design
- Understand what they mean and how to apply them
Coding principles are guidelines and best practices that anybody writing code should follow to write clean, maintainable and efficient code. They enhance code quality and ensure it is readable, reusable and less prone to errors.
You aren’t gonna need it (YAGNI)
![xkcd.com](fig/xkcd_meme.png)
Introduction
The principle YAGNI stands for “You Aren’t Gonna Need It”. This principle encourages you to build only what is needed right now, avoiding adding features for hypothetical future needs. It comes from Agile programming and aims to reduce spending time and resources on unnecessary code and keep the code clean and understandable.
Why YAGNI is important:
- Simplicity: By avoiding unnecessary code you will reduce complexity, making it easier to read, maintain, and debug code.
- Saving Time: Don’t waste time building features that may never be used.
- Flexibility: Writing only what is needed makes any changes in requirements easier to implement.
Applying YAGNI
Let’s consider the following instruction: create a function that implements a percentage discount price. Here is a solution that does not respect the YAGNI principle:
PYTHON
def calculate_discount(price, discount_type="percentage", value=10.0):
    '''
    This function applies a discount to a price

    Parameters
    ----------
    price : float
        Original price
    discount_type: str
        type of discount [percentage or fixed]
    value: float
        discount to be applied

    Return
    ------
    discounted_price: float
        final price after applying discount

    Raises
    ------
    ValueError
        if the discount type is not 'percentage' or 'fixed'
    '''
    if discount_type == "percentage":
        return price - (price * (value / 100))
    elif discount_type == "fixed":
        return price - value
    else:
        raise ValueError("Invalid discount type")
In that example, the software engineer has planned for other possible use cases (different types of discount) although they were not required. This is an example of over-engineering. A better implementation would be:
PYTHON
def calculate_discount(price, discount_percentage):
    '''
    Function that applies a discount. The discount is given as a percentage of the original price.

    Parameters
    ----------
    price: float
        original price
    discount_percentage: float
        discount to apply, as a percentage of the original price

    Return
    ------
    final_price: float
        final price after applying discount
    '''
    final_price = price - (price * (discount_percentage / 100))
    return final_price
Exercise
Challenge
Context: You’re working on a feature to calculate the final price of items in a shopping cart. Right now, the only two requirements are (1) to apply a fixed 10% discount to the total cart price and (2) to return the final price with a $ sign in front of the total price (e.g. $42.2). However, the initial implementation includes additional features that anticipate potential, but not confirmed, future requirements.
PYTHON
def calculate_final_price(prices, currency="USD", discount_type="percentage", discount_value=0.1, include_shipping=False, shipping_cost=5.0):
    # Calculate the initial total price
    total = sum(prices)
    # Apply discount based on type
    if discount_type == "percentage":
        total -= total * discount_value
    elif discount_type == "fixed":
        total -= discount_value
    # Include shipping if specified
    if include_shipping:
        total += shipping_cost
    # Format total with currency symbol
    if currency == "USD":
        return f"${total:.2f}"
    elif currency == "EUR":
        return f"€{total:.2f}"
    else:
        raise ValueError("Unsupported currency")
Work on the calculate_final_price function to apply the YAGNI principle by removing unnecessary parameters and logic, focusing only on the known requirements.
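One possible solution, given as a sketch rather than the definitive answer, keeps only the two stated requirements (a fixed 10% discount and a $ sign in front of the total):

```python
def calculate_final_price(prices):
    """Return the cart total after a fixed 10% discount, formatted in dollars."""
    total = sum(prices)
    # Apply the only discount that is currently required: 10% of the total.
    total -= total * 0.1
    # Format with a leading $ sign and two decimals.
    return f"${total:.2f}"


print(calculate_final_price([10, 20, 30]))
```

Currency selection, fixed discounts and shipping costs have been removed because none of them is a confirmed requirement. They can be added later if the need actually arises.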
Keep it simple, Stupid (KISS) & Curly’s Law
Introduction
The KISS Principle stands for “Keep It Simple, Stupid” and points out that writing simple code should be a primary goal in design. Complex structure often leads to unreadable and error-prone code. This is especially important in research where maintaining code over a long time period is essential.
Why is KISS important?
- Readability: Simple code is easier to understand. There is a high chance that the person who will read your code the most is yourself, so help your future self.
- Maintainability: Bugs are easier to find and fix when each component is simple.
- Upgrade: Simple code is easier to adapt to changes in the requirements.
It is easy to recognize complex code. When you have too many nested loops or if statements, it is a sign that your code could be simpler. In such cases, take a step back and try to simplify the structure.
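As an illustration, the sketch below shows the same hypothetical classification logic written twice: once with nested if statements and once flattened with early returns. The function names, labels and thresholds are assumptions made for the example:

```python
def classify_nested(value):
    # Three levels of nesting: the reader has to keep every branch in mind.
    if value is not None:
        if value >= 0:
            if value > 100:
                return "high"
            else:
                return "normal"
        else:
            return "invalid"
    else:
        return "missing"


def classify_flat(value):
    # Same logic with early returns: each case is handled and then forgotten.
    if value is None:
        return "missing"
    if value < 0:
        return "invalid"
    if value > 100:
        return "high"
    return "normal"


for sample in (None, -1, 50, 150):
    print(sample, classify_nested(sample), classify_flat(sample))
```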
Curly’s Law states that a function should focus on a single task. Each function should “do one thing” and “do it well”: if a function performs multiple tasks, consider breaking it down.
Why is Curly’s Law important?
- Reusability: Simple single-task functions are easier to reuse.
- Bug fixing: When your code is composed of simple functions, potential issues are easier to localise.
- Testing: Simple single-task functions are easier to test.
- Modularity: Code becomes more modular and organized.
Applying KISS and Curly’s Law : Simplifying a Complex Function
Let’s consider a function that computes the area of circles, rectangles and triangles:
PYTHON
def calculate_area(shape, dimensions):
    '''
    This function computes the area of a given geometrical shape

    Parameters
    ----------
    shape : str
        shape to consider. Can be rectangle, circle or triangle
    dimensions: list
        of dimensions to consider. For rectangle and triangle you need to give a list
        of 2 numbers. For circle, you need to pass a list of one quantity (radius).

    Return
    ------
    area : float
        area of the shape

    Raises
    ------
    ValueError
        if the shape is not recognised
    '''
    if shape == "rectangle":
        area = dimensions[0] * dimensions[1]
    elif shape == "circle":
        area = 3.14159 * (dimensions[0] ** 2)
    elif shape == "triangle":
        area = 0.5 * dimensions[0] * dimensions[1]
    else:
        raise ValueError("Unsupported shape!")
    return area


area = calculate_area("rectangle", [10, 20])
This function is able to compute the area of each shape. To apply KISS and Curly’s Law, you can split this function into three simple independent functions:
PYTHON
def rectangle_area(length, width):
    return length * width


def circle_area(radius):
    return 3.14159 * radius ** 2


def triangle_area(base, height):
    return 0.5 * base * height


# Simple and clear usage
area = rectangle_area(10, 20)
In that version, functions are specific and easy to understand and there is no unnecessary complexity in shape management. It is easier to maintain and extend.
Exercise
Challenge
Let’s consider a function that processes data by removing missing values, calculating the average and returning a formatted result:
PYTHON
def process_data(data):
    cleaned_data = [x for x in data if x is not None]  # Remove missing values
    average = sum(cleaned_data) / len(cleaned_data)  # Calculate average
    return f"Average: {average:.2f}"
Using KISS and Curly’s Law, rewrite this code.
PYTHON
def remove_missing(data):
    '''
    This function removes missing data from a list

    Parameters
    ----------
    data : list
        list of numbers

    Return
    ------
    cleaned_data: list
        of data without missing values
    '''
    cleaned_data = [x for x in data if x is not None]
    return cleaned_data


def calculate_average(data):
    '''
    This function computes the average of the input data

    Parameters
    ----------
    data : list
        of numbers

    Return
    ------
    average : float
        average of the data
    '''
    average = sum(data) / len(data)
    return average


def format_average(average):
    '''
    Format the number given as parameter as a string.

    Parameters
    ----------
    average : float
        number to format

    Return
    ------
    formatted_string : str
        of the form 'Average: X.YZ'
    '''
    return f"Average: {average:.2f}"
Don’t repeat yourself (DRY) - Rule of three
Introduction
The DRY Principle states: “Don’t Repeat Yourself.” It encourages you to minimize duplication by refactoring similar code patterns. This leads to more readable, maintainable, and scalable code.
Why DRY is important:
- Improves Readability: Code is clearer when it’s not cluttered with repeated logic.
- Reduces Bugs: If you need to make changes, you only do it in one place, reducing the chance of errors.
- Saves Time: Updating and testing code is faster when code is organized with minimal duplication.
Using functions to avoid repeating code
Instead of writing the same code in multiple places in your script, create a function. This makes updates easier and avoids errors. For example consider the following code:
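A sketch of what such repetitive code might look like is given below. The Celsius to Fahrenheit conversion and the temperature values are illustrative assumptions:

```python
# The same conversion formula is written out three times.
temperature_1 = 20 * 9 / 5 + 32
temperature_2 = 25 * 9 / 5 + 32
temperature_3 = 30 * 9 / 5 + 32

print(temperature_1, temperature_2, temperature_3)
```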
The same operation is repeated three times with a different value. If you create a function that performs this operation, you can refactor your code:
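Assuming the repeated operation is a Celsius to Fahrenheit conversion, a refactored version might look like this sketch (celsius_to_fahrenheit is a hypothetical name):

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# The formula now lives in one place only.
temperature_1 = celsius_to_fahrenheit(20)
temperature_2 = celsius_to_fahrenheit(25)
temperature_3 = celsius_to_fahrenheit(30)

print(temperature_1, temperature_2, temperature_3)
```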
Using loops instead of manual repetition
In the previous example we still call the function three times, which is not optimal. In general, if you are applying the same operation to multiple elements, use a loop to avoid repeated code blocks:
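A minimal sketch of the loop-based version, again assuming a hypothetical celsius_to_fahrenheit function and illustrative input values:

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


temperatures_celsius = [20, 25, 30]

# One call site, however many values there are.
temperatures_fahrenheit = []
for celsius in temperatures_celsius:
    temperatures_fahrenheit.append(celsius_to_fahrenheit(celsius))

print(temperatures_fahrenheit)
```

Adding a fourth temperature now only requires adding a value to the list.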
Using constants for common values
When a value is repeated in multiple places, declare it as a constant variable. This way, you only need to change it once if necessary. Consider the following code:
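For instance, the following sketch applies the same rate in three places. The prices, and the interpretation of 0.1 as a discount rate, are assumptions made for illustration:

```python
# The literal 0.1 appears three times, with no hint of what it means.
book_price = 20.0 - 20.0 * 0.1
pen_price = 2.0 - 2.0 * 0.1
bag_price = 50.0 - 50.0 * 0.1

print(book_price, pen_price, bag_price)
```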
The value 0.1 is repeated three times. If you want to change it, you will need to do it three times. To save time and to add some clarity to your code, you may want to declare the value 0.1 as a constant, as follows:
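A sketch of this approach, assuming the repeated value 0.1 is a discount rate applied to three hypothetical prices:

```python
# Declared once, with a name that says what the value means.
DISCOUNT_RATE = 0.1

book_price = 20.0 - 20.0 * DISCOUNT_RATE
pen_price = 2.0 - 2.0 * DISCOUNT_RATE
bag_price = 50.0 - 50.0 * DISCOUNT_RATE

print(book_price, pen_price, bag_price)
```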
Now if you want to change 0.1 to 0.2 you need to do it only once. In addition, you now have a better idea of what that constant is! The code is already clearer.
Challenge
Write a code, without repetition, that produces the following output:
Hello, Alice!
Hello, Bob!
Hello, Charlie!
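One possible solution, as a sketch, combines a function and a loop (the names greet and names are arbitrary choices):

```python
def greet(name):
    """Build the greeting for a single person."""
    return f"Hello, {name}!"


names = ["Alice", "Bob", "Charlie"]

for name in names:
    print(greet(name))
```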
Summary
DRY helps you write clear, efficient, and error-resistant code. Use functions, loops, and constants to reduce repetition. A DRY approach saves time and effort in the long run, especially when scaling or debugging code.
It is important to note that prematurely refactoring code might lead to unnecessary complexity. This is why DRY is often associated with the Rule of Three. The latter is a guideline suggesting that you should wait until a piece of code is repeated three times before refactoring it. It ensures that you only refactor when a pattern is stable and repeated often enough.
Principle of least astonishment (POLA)
Introduction
The Principle of Least Astonishment (POLA) states that code should work in a way that does not surprise its users and maintainers. POLA encourages you to design code that aligns with common expectations.
Why POLA is important:
- Usability: When code works as expected, users and maintainers are less likely to misuse or misunderstand it.
- Maintainability: Familiar and predictable patterns make the code easier to maintain and upgrade.
- Collaboration: Consistent and intuitive code makes it easier for multiple people to work with and develop.
Common violations
Here are three common violations of POLA:
Naming Conventions: Function or variable names that don’t align with their purpose often lead to problems.
Unexpected Return Types: Functions that return types users wouldn’t expect, such as a function sometimes returning an integer and other times returning None.
Multiple Functionalities: Using functions for multiple unrelated tasks often leads to unexpected behaviors.
Applying POLA
Example 1: Consider a function that returns different types based on a condition, which could confuse users who expect one type.
PYTHON
def calculate_total(items):
    if not items:
        return None  # If no items, return None
    return sum(items)
The problem in that function is that depending on a condition, the returned value has a different type. To overcome this problem a potential solution is to return a number anyway:
PYTHON
def calculate_total(items):
    if not items:
        return 0  # Return 0 instead of None for consistency
    return sum(items)
With this solution, the user of the code will always get the same type out of that function.
Example 2: Consider a function that does two different tasks: processing some data and saving them in a file.
PYTHON
def process_data(data, save=False):
    cleaned_data = [d.strip() for d in data]
    if save:
        with open('data.txt', 'w') as f:
            f.write('\n'.join(cleaned_data))
    return cleaned_data
The user may not expect that processing data can also save them to a file. This can lead to data being overwritten. To overcome this potential problem, you might want to separate the two functionalities into two different functions:
PYTHON
def process_data(data):
    return [d.strip() for d in data]


def save_data(data, filename='data.txt'):
    with open(filename, 'w') as f:
        f.write('\n'.join(data))
This solution keeps each function’s purpose clear.
Exercise
Challenge
Refactor calculate_area to make it more predictable and intuitive.
PYTHON
from math import pi


def calculate_area(shape, a, b=0):
    if shape == "rectangle":
        return a * b  # Expects both `a` and `b`
    elif shape == "circle":
        return pi * (a ** 2)  # Ignores `b`
    elif shape == "triangle":
        return 0.5 * a * b  # Expects `a` as base and `b` as height
    else:
        return "Unknown shape"


# Example usage:
print(calculate_area("rectangle", 5))
print(calculate_area("circle", 3, 4))
print(calculate_area("triangle", 6, 3))
print(calculate_area("hexagon", 5, 5))
PYTHON
from math import pi


# Specific functions for each shape
def rectangle_area(length, width):
    if length <= 0 or width <= 0:
        return "Error: Length and width must be positive numbers."
    return length * width


def circle_area(radius):
    if radius <= 0:
        return "Error: Radius must be a positive number."
    return pi * radius ** 2


def triangle_area(base, height):
    if base <= 0 or height <= 0:
        return "Error: Base and height must be positive numbers."
    return 0.5 * base * height


# Example usage
rect_area = rectangle_area(10, 5)  # Expected: Valid rectangle area
circle_area_invalid = circle_area(-3)  # Expected: Error message
tri_area = triangle_area(6, 3)  # Expected: Valid triangle area
rect_invalid = rectangle_area(10, -5)  # Expected: Error message

# Output results
print(f"Rectangle Area: {rect_area}")
print(f"Circle Area: {circle_area_invalid}")
print(f"Triangle Area: {tri_area}")
print(f"Invalid Rectangle Area: {rect_invalid}")
Content from Don't touch your code anymore!
Last updated on 2024-12-10
Overview
Questions
- How can you modify your code configuration without touching it?
Objectives
- Understand the purpose and usage of configuration files and command line interfaces.
- Create, read, and write configuration files in different formats (INI, JSON, YAML)
- Learn to design and implement command-line interfaces (CLI) using argparse
- Integrate configuration files with CLI arguments for dynamic applications.
Until now, we have dealt with the design of the code itself and how to make it cleaner, more readable and more maintainable. Now we are going to see how to reduce the number of times you have to change the code while still being able to modify its parameters. Research is often based on a trial-and-error loop. You will often find yourself rerunning a code with different parameters to try different configurations. Hard-coding these values leads to inflexible and error-prone code, because you need to change the code itself to change the configuration. In addition, unless you track all your trials very carefully, you will probably lose track of some of them.
Configuration files
Why would you need them?
Configuration files will allow you to adjust some parameters of the code (it can be filenames, directories, values, etc) while actually leaving the code untouched.
Benefits:
- Easier Reproducibility: By simply changing configuration files, you can reproduce the same results or adjust parameters for new experiments.
- Collaboration: Configuration files allow collaborators to use the same script but adjust settings for their own environment. It is also easier to share configurations between collaborators.
- Minimizing Code Modifications: Parameters are externalized, making the core code cleaner and more maintainable.
- Documentation: Well-structured configuration files serve as documentation for your run. They provide a clear and organized record of the settings used, which is crucial for understanding and interpreting results.
- Version Control: Configuration files can be versioned alongside the code using version control systems like Git.
Types of configuration files
As it is often the case in Python, multiple options are available:
- INI files are easy to read and parse. The module used to load these files is configparser, which is part of the Python standard library.
[section1]
key1 = value1
key2 = value2
#Comments
[section2]
key1 = value1
[Section3]
key = value3
    multiline
INI files are structured as sections (whose names are case sensitive) in which you can list key/value pairs (as in a dictionary), separated by either the = or the : sign. Comments are accepted, and values can span multiple lines as long as the extra lines are indented with respect to the first line. All data are parsed as strings.
- JSON files were originally developed for JavaScript and are very popular in web applications. The module to read these files is json, which is also part of the standard library.
{
    "section1": {
        "key1": "value1",
        "key2": "value2"
    },
    "section2": {
        "key1": "value1"
    }
}
JSON files are also structured as sections and key/value pairs. A JSON file starts with an opening brace { and ends with a closing brace }. Each section is introduced by its name followed by :, and its key/value pairs are listed within braces (one pair of braces per section). However, comments are not allowed, and JSON files can be a little more complex to write by hand.
- YAML files are also a popular format (used for GitHub Actions, for example). In order to read (and write) YAML files, you will need to install a third-party package called PyYAML.
section1:
  key1: value1
  key2: value2
section2:
  key1: value1
# Comments
YAML files also work with sections and key/value pairs.
- TOML files are more recent than the other formats but are becoming widely used in Python (for example, the setup.py file used for installation has largely been replaced by a pyproject.toml file in recent years). They support structured data and typed values, and their syntax is quite similar to that of INI files. It is worth mentioning that the tomllib library, which reads TOML files, is part of the Python standard library from Python version 3.11.
[section1]
key1 = "value1"
key2 = "value2"
[section2]
key1 = "value1"
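Although the rest of this chapter focuses on INI files, the following sketch shows how a JSON configuration like the one above can be parsed with the standard json module. The configuration is given as a string to keep the example self-contained; with a file on disk you would use json.load and json.dump on an open file object instead:

```python
import json

# Same structure as the JSON example above, embedded as a string.
json_text = """
{
    "section1": {"key1": "value1", "key2": "value2"},
    "section2": {"key1": "value1"}
}
"""

# Parsing returns nested dictionaries.
config = json.loads(json_text)
print(config["section1"]["key2"])

# Serialising the configuration back to JSON text.
new_text = json.dumps(config, indent=4)
print(new_text)
```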
Loading and writing INI files: Configparser
In the following we will be using INI files. We will start with a simple exercise: writing a configuration file manually.
Challenge
Using the text editor of your choice, create an INI file with three sections: simulation, environment and initial conditions. In the first section, two parameters are given: time_step set at 0.01 s and total_time set at 100.0 s. The environment section also has two parameters, with gravity at 9.81 and air_resistance at 0.02. Finally, the initial conditions are: velocity at 10.0 km/s, angle at 45 degrees and height at 1 m.
Create a file 'config.ini' with the following content:
[simulation]
time_step = 0.01
total_time = 100.0
[environment]
gravity = 9.81
air_resistance = 0.02
[initial_conditions]
velocity = 10.0
angle = 45.0
height = 1.0
Reading configuration files: INI
Reading an INI file is very easy. It requires the configparser module. You do not need to install it because it is part of the standard library. To read a config file, import the module and create a parser object, which is then used to read the file we created just above, as follows:
PYTHON
##Import the library
import configparser
##Create the parser object
parser = configparser.ConfigParser()
##Read the configuration file
parser.read('config.ini')
From there you can access everything that is in the configuration file. Firstly you can access the section names and check if sections are there or not (useful to check that the config file is compliant with what you would expect):
PYTHON
>>> print(parser.sections())
['simulation', 'environment', 'initial_conditions']
>>> print(parser.has_section('simulation'))
True
>>> print(parser.has_section('Finalstate'))
False
Eventually, you will need to extract the values in the configuration file. You can get all the keys inside a section at once:
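A minimal sketch using the options() method of the parser. To keep the example self-contained, the simulation section is read from a string with read_string() rather than from the config.ini file:

```python
import configparser

# Same content as the [simulation] section of config.ini.
config_text = """
[simulation]
time_step = 0.01
total_time = 100.0
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# options() returns the list of keys defined in a section.
keys_in_simulation = parser.options('simulation')
print(keys_in_simulation)
```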
You can also extract everything at once. In that case each key/value pair is called an item:
PYTHON
>>> items_in_simulation = parser.items('simulation')
>>> print(items_in_simulation)
[('time_step', '0.01'), ('total_time', '100.0')]
That method returns a list of tuples, each containing a key/value pair. Values are always of type string.
Alternatively, you can turn each section to a dictionary:
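A short sketch, again reading the simulation section from a string so that the example is self-contained:

```python
import configparser

config_text = """
[simulation]
time_step = 0.01
total_time = 100.0
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# A section behaves like a mapping, so dict() converts it directly.
simulation_as_dict = dict(parser['simulation'])
print(simulation_as_dict)
```

As with items(), all the values in the resulting dictionary are strings.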
Finally, you can access directly values of keys inside a given section like this:
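A sketch of this dictionary-style access, with the simulation section read from a string to keep the example self-contained:

```python
import configparser

config_text = """
[simulation]
time_step = 0.01
total_time = 100.0
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# First index: section name. Second index: key inside that section.
time_step = parser['simulation']['time_step']
print(time_step, type(time_step))
```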
By default, ALL values will be strings. Another option is to use the .get() method:
PYTHON
>>> time_step_with_get = parser.get('simulation', 'time_step')
>>> print(time_step_with_get)
0.01
This also returns a string, which can be annoying when you need other types, because you have to convert everything to the right type yourself. Fortunately, other methods are available:
- .getint() will extract the value and convert it to an integer.
- .getfloat() will extract the value and convert it to a float.
- .getboolean() will extract the value and convert it to a boolean. It returns True if the value is 1, yes, true or on, and False if the value is 0, no, false or off.
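The typed getters can be sketched as follows. The keys n_steps and verbose are hypothetical additions to the simulation section, introduced only to have an integer and a boolean to read:

```python
import configparser

# n_steps and verbose are hypothetical keys added for this illustration.
config_text = """
[simulation]
time_step = 0.01
n_steps = 1000
verbose = yes
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

time_step = parser.getfloat('simulation', 'time_step')
n_steps = parser.getint('simulation', 'n_steps')
verbose = parser.getboolean('simulation', 'verbose')

print(time_step, n_steps, verbose)
```

If a value cannot be converted (for example calling getint() on 0.01), a ValueError is raised, which is a useful early check that the configuration file contains what the code expects.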
Writing configuration files
On some occasions it is also useful to write a configuration file programmatically. configparser can write INI files as well. As for reading them, everything starts by importing the module and creating an object:
PYTHON
# Let's import the ConfigParser class directly from the configparser module
from configparser import ConfigParser
# And create a config object
config = ConfigParser()
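Creating a configuration is similar to filling a dictionary of dictionaries, with one dictionary per section (the values are converted to strings when they are stored):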
PYTHON
config['simulation'] = {'time_step': 1.0, 'total_time': 200.0}
config['environment'] = {'gravity': 9.81, 'air_resistance': 0.02}
config['initial_conditions'] = {'velocity': 5.0, 'angle': 30.0, 'height': 0.5}
And finally you will have to save it:
PYTHON
# This opens config_file_program.ini in write mode
with open('config_file_program.ini', 'w') as configfile:
    config.write(configfile)
After running that piece of code, you will end up with a new file called config_file_program.ini with the following content:
[simulation]
time_step = 1.0
total_time = 200.0
[environment]
gravity = 9.81
air_resistance = 0.02
[initial_conditions]
velocity = 5.0
angle = 30.0
height = 0.5
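One detail worth keeping in mind when writing configurations: assigning a whole dictionary to a section converts the values to strings, but assigning a single value to an existing key requires a string. The sketch below illustrates this and writes to an in-memory buffer (io.StringIO) instead of a file, which is a choice made here only to keep the example self-contained:

```python
import configparser
import io

config = configparser.ConfigParser()

# Assigning a whole section: the float is converted to a string for us
config['simulation'] = {'time_step': 1.0}

# Assigning a single value: non-string values are rejected
type_error_raised = False
try:
    config['simulation']['total_time'] = 200.0
except TypeError:
    type_error_raised = True

# Converting explicitly to a string works
config['simulation']['total_time'] = str(200.0)

# Write to an in-memory buffer rather than a file on disk
buffer = io.StringIO()
config.write(buffer)
written_text = buffer.getvalue()
print(written_text)
```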
Challenge
Consider the following INI file:
[fruits]
oranges = 3
lemons = 6
apples = 5
[vegetables]
onions = 1
asparagus = 2
beetroots = 4
Read it using the configparser library. Then change the number of beetroots to 2 and the number of oranges to 5, and add a section ‘pastries’ with 5 croissants. Finally, save it back on disk in a different file.
PYTHON
## Import the package
import configparser
## Create the parser object
config = configparser.ConfigParser()
## Read the file
config.read('conf_fruit.ini')
## Change the values (values must be strings)
config['fruits']['oranges'] = str(5)
config['vegetables']['beetroots'] = str(2)
## Add a section with a new key/value pair
config['pastries'] = {'croissants': '5'}
## Save it back to a different file
with open('new_conf_fruits', 'w') as openconfig:
    config.write(openconfig)
Using command line interfaces
Definition & advantages
A Command Line Interface (CLI) is a text-based interface used to interact with software and operating systems. It allows users to type commands into a terminal or command prompt to perform specific tasks, ranging from file manipulation to running scripts or programs.
When writing research software, CLIs are particularly suitable for several reasons:
Configuration: Using CLI it is easy to modify the configuration of a software without having to touch the source code.
Batch Processing: Researchers often need to process large datasets or run simulations multiple times. CLI commands can be easily scripted to automate these tasks, saving time and reducing the potential for human error.
Quick Execution: Experienced users can perform complex tasks more quickly with a CLI compared to navigating through a GUI.
Adding New Features: Adding new arguments and options is straightforward, making it easy to extend the functionality of your software as requirements evolve.
Documentation: CLI helps document the functionality of your script through the help command, making it clearer how different options and parameters affect the outcome.
Use in HPCs: HPCs are often accessible through a terminal, making command line interfaces particularly useful to start codes on HPCs.
Creating a command line interface in Python
In Python, there is a very nice module called argparse. It allows you to write a command line interface in very few lines. Again, that module is part of the standard library so you do not need to install anything.
As for the configuration files, we must start by importing the module and creating a parser object. The parser object can take a few arguments, the main ones are:
- prog: The name of the program
- description: A short description of the program
- epilog: Text displayed at the bottom of the help
We would proceed as follows:
PYTHON
###import the library
import argparse
###create the parser object
parser = argparse.ArgumentParser(description='This program is an example of command line interface in Python',
                                 epilog='Author: R. Thomas, 2024, UoS')
Once this is written, you need to tell the program to analyse (parse) the arguments passed to the program. This is done with the parse_args() method:
args = parser.parse_args()
If you save everything in a Python file (e.g. cli_course.py) and run python cli_course.py --help, you will see the following in the terminal:
usage: cli_course.py [-h]
This program is an example of command line interface in Python
options:
-h, --help show this help message and exit
Author: R. Thomas, 2024, UoS
You can see that the only option implemented so far is the help. It is added automatically when the command line interface is constructed, so you do not need to implement it yourself. Now let’s add extra arguments!
Define command line arguments
The argparse module implements the add_argument method to add arguments. Based on the code we prepared before, you would use it this way:
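As a minimal sketch, add_argument is called on the parser object once per argument. The argument names file and --count are only illustrative, and an explicit list is passed to parse_args() here so that the example runs without a terminal (by default, parse_args() reads the arguments typed on the command line):

```python
import argparse

parser = argparse.ArgumentParser(description='Minimal add_argument example')

# One call to add_argument per argument of the interface
parser.add_argument('file')           # must always be given
parser.add_argument('-c', '--count')  # may be omitted

# Explicit list of arguments, used instead of the real command line
args = parser.parse_args(['data.txt', '--count', '3'])

print(args.file)   # data.txt
print(args.count)  # 3 (as a string)
```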
Two main types of arguments are possible:
- Optional arguments: their names start with - or -- and they are called in the terminal by their name. They can be omitted by the user.
- Positional arguments: their names DO NOT start with - or --, the user cannot omit them, and they are not called by their name (only the value needs to be passed).
For example, you can add these five lines before the line args = parser.parse_args() in the file cli_course.py that you created before:
PYTHON
parser.add_argument('file') # positional argument (mandatory)
parser.add_argument('file2') # positional argument (mandatory)
parser.add_argument('-c', '--count') # option that takes a value
parser.add_argument('-n') # option that takes a value
parser.add_argument('--max') # option that takes a value
If you once again print the help in the terminal with python cli_course.py --help, you will see the following being displayed:
usage: cli_course.py [-h] [-c COUNT] [-n N] [--max MAX] file file2
This program is an example of command line interface in Python
positional arguments:
file
file2
options:
-h, --help show this help message and exit
-c COUNT, --count COUNT
-n N
--max MAX
Author: R. Thomas, 2024, UoS
The help tells you that file and file2 are positional arguments. The user has to provide a value for each of them (in the right order!). In the next section, named ‘options’ (or ‘optional arguments’ in Python versions before 3.10), we find the help, which was already there before, and the count, n and max options:
- The count option can be called using -c OR --count, followed by a value COUNT that the user will need to provide.
- The n option is called using -n plus a value.
- The max option is called using --max plus a value.
Now that we have defined a few arguments, we can tune them a little bit. The first thing you will want to do is to provide the user of your program with a short help text. As it stands now, displaying the help tells you which arguments can be used, but nothing tells you what they actually are. To prevent any confusion, add a one-line help to each argument:
PYTHON
parser.add_argument('file', help='input data file to the program') # positional argument
parser.add_argument('file2', help='Configuration file to the program') # positional argument
parser.add_argument('-c', '--count', help='Number of counts per iteration') # option that takes a value
parser.add_argument('-n', help='Number of iteration') # option that takes a value
parser.add_argument('--max', help='Maximum population per iteration') # option that takes a value
These short descriptions will be displayed when using the help:
usage: cli_course.py [-h] [-c COUNT] [-n N] [--max MAX] file file2
This program is an example of command line interface in Python
positional arguments:
file input data file to the program
file2 Configuration file to the program
options:
-h, --help show this help message and exit
-c COUNT, --count COUNT
Number of counts per iteration
-n N Number of iteration
--max MAX Maximum population per iteration
Author: R. Thomas, 2024, UoS
It is possible to use extra options to define arguments. We list a few here:
- action: this option defines what happens when the argument is used. For example, action='store_true' stores a boolean value: True if the argument is used, False otherwise.
- default: this allows you to define a default value for the argument. If the argument is not used by the user, the default value will be selected: parser.add_argument('--color', default='blue').
- type: by default, arguments are extracted as strings. Nevertheless, it is possible to have them interpreted as other types using the type argument: parser.add_argument('-i', type=int). If the user passes a value that cannot be converted to the expected type, an error will be returned.
- choices: if you want to restrict the values an argument can take, you can use the choices option to add this constraint: parser.add_argument('--color', choices=['blue', 'red', 'green']). If the user passes ‘purple’ as value, an error will be raised.
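The options listed above can be combined in a small sketch. The argument names --verbose, --color and -i are only illustrative, and explicit lists are passed to parse_args() so that the example runs without a terminal:

```python
import argparse

parser = argparse.ArgumentParser(description='Sketch of extra argument options')

# action='store_true': True when the flag is present, False otherwise
parser.add_argument('--verbose', action='store_true')
# default and choices: 'blue' if omitted, only three values are accepted
parser.add_argument('--color', default='blue', choices=['blue', 'red', 'green'])
# type=int: the value is converted from string to integer
parser.add_argument('-i', type=int)

# The user passes --verbose and -i, but not --color
args = parser.parse_args(['--verbose', '-i', '4'])
print(args.verbose, args.color, args.i)

# The user passes nothing at all
no_args = parser.parse_args([])
print(no_args.verbose, no_args.color, no_args.i)
```

In the second call, the flag is False, the color falls back to its default and -i, which has no default, is None.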
Finally you must be able to retrieve all the argument values:
How to get values from the CLI?
To get values from the command line interface you need to look into the args variable that you defined with the line args = parser.parse_args(). Each argument can be accessed via the structure ‘args.’ + argument name:
args = parser.parse_args()
print(args) # Gives the namespace content
print(args.file) #direct access to the 1st positional argument
print(args.max) #direct access to the max optional argument
Below we give a couple of examples of calls to the program with different configurations:
[user@user]$ python cli_course.py file1path/file.py -c 3 --max 5
usage: cli_course.py [-h] [-c COUNT] [-n N] [--max MAX] file file2
cli_course.py: error: the following arguments are required: file2 ####<---One positional argument is missing.
[user@user]$ python cli_course.py file1path/file.py file2path/file2.py -c 3
Namespace(count='3', file='file1path/file.py', file2='file2path/file2.py', max=None, n=None)
file1path/file.py
None
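When a required argument is missing, argparse prints the usage and the error message and then stops the program. The sketch below reproduces the first call above with an explicit argument list and catches the resulting SystemExit only to make that behaviour visible; in a real script you would normally let the program exit:

```python
import argparse

parser = argparse.ArgumentParser(prog='cli_course.py')
parser.add_argument('file')
parser.add_argument('file2')
parser.add_argument('-c', '--count')
parser.add_argument('--max')

exit_code = None
try:
    # Only one positional argument is given: file2 is missing
    parser.parse_args(['file1path/file.py', '-c', '3', '--max', '5'])
except SystemExit as exit_info:
    # argparse has already printed the usage and the error on stderr
    exit_code = exit_info.code

print(f'argparse stopped the program with exit code {exit_code}')
```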
Challenge
Create a Python script called basic_cli.py that:
- Accepts two arguments: --input_file (path to the data file) and --output_dir (directory for saving results).
- Prints out the values of these arguments.
Expected output:
Input file: /data/input.txt
Output directory: /results/
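One possible solution is sketched below; it is only one way to organise the script. The helper names build_parser and main are choices made for this sketch, and main accepts an optional list of arguments so that it can also be called from Python:

```python
import argparse


def build_parser():
    """Create the command line interface of basic_cli.py."""
    parser = argparse.ArgumentParser(description='Basic CLI example')
    parser.add_argument('--input_file', help='Path to the data file')
    parser.add_argument('--output_dir', help='Directory for saving results')
    return parser


def main(argv=None):
    """Parse the arguments (from the terminal if argv is None) and print them."""
    args = build_parser().parse_args(argv)
    print(f'Input file: {args.input_file}')
    print(f'Output directory: {args.output_dir}')
    return args


if __name__ == '__main__':
    main()
```

Running python basic_cli.py --input_file /data/input.txt --output_dir /results/ should then print the two lines of the expected output.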
Final exercise: Mixing everything
For this last part of the final lecture we will combine a bit of everything we have seen during the module (you should plan for an hour for this exercise).
Challenge
It all starts with a configuration file that you should download.
You will create five Python files in a directory called ‘final’:
- main.py: it will contain the main code of the program
- cli.py: it will contain the command line interface
- conf.py: it will handle the configuration file
- simulation.py: it will handle the simulation that we are going to fake
- __init__.py: it will stay empty
You will start by creating the command line interface in the cli.py file with the following optional arguments:
- --config: takes a string value; the user will use it to pass the configuration file.
- --timestamp: takes a float as value.
- --save: an action argument. If used it should be true, false otherwise.
You should wrap this up in a function called command_line.
In main.py, you will import the file cli.py and call the command_line function. You should get the values of all the arguments, which we are then going to analyse. If the argument --config is empty (=None), you will close the program with a message printed in the terminal (‘No config file passed…exit’).
If something is passed to --config, the code will continue and you will read the configuration file. This will be done by calling the conf.py file, where you will create a function called read_conf that takes the file as argument. This function will return a dictionary with the complete configuration. In main.py, you must retrieve this complete configuration.
Once you are there, check the --timestamp option from the command line interface. If something has been given, you should replace, in the configuration, the value under Parameters/time_step by the value given by the user.
With this final configuration (updated or not, depending on the user’s request) you will create a simulation. This will be done in simulation.py. In that file you will create a Simulation class that takes the three parameters L, M and H from the configuration file as parameters. These parameters will become properties of that class. You will create a method (a function inside that class) called get_total() that computes the sum L+M.
Back in main.py, you will create a Simulation object and then print the result of the get_total() method.
Finally, if the user called the --save argument, you will use a function write_conf(), which you will create in the conf.py file, to write the final configuration to a file called final_conf.ini.
The solution of the code can be found in the GitHub repository.