4. Lecture 2 - The 5Rs

This section of the course looks at Premise 4 and Premise 5 in more detail and examines how the 5Rs can be used in practice as a guide to choosing best practice.

4.1. Overview

“Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort.”

Wilson et al. 2014. “Best Practices for Scientific Computing”.

Most practitioners would broadly agree with the preceding claim. Consequently, there is a large body of high-quality, readily available work providing guidance to programmers with a scientific background. Most of it focuses almost entirely on the details of how to do this, e.g. commenting and documenting code, choosing sensible variable names, organising code across multiple files, even rules for how to lay your code out; the list goes on. However, it tends not to discuss why this should be done.

In contrast, this course first sets out to examine what it is that determines the quality of scientific software and then uses this as a basis for exploring how to achieve this quality. The rationale for this approach is simple: the desired qualities of the software do not change over time but, as we are all aware, the technology we use to create and maintain software does change, and often very rapidly. It is better, then, to fully understand the basic requirements so that we can translate them into ways of working with the current technology and adapt our working practices as new technology emerges.

What follows is based on the assumption that the following two premises form, at least for the purposes of scientific software development, a good basis to examine, quantify, and develop best practice.

4.1.1. Premise 4

When it is part of a scientific endeavour, the quality of computing should be quantified with respect to the same metrics as the overall endeavour. These metrics naturally include Replicability, Reproducibility, and Repeatability.

4.1.2. Premise 5

In addition to Replicability, Reproducibility, and Repeatability, the quality of computing should be quantified with respect to the intrinsic software-related metrics of Re-usability and Re-runability.

This viewpoint has been developed over time by a number of researchers. Some of them are scientists working in a variety of disciplines, some are computer scientists, some are software engineers, and many are a bit of all of these.

A common theme that emerges from this research is the following set of concepts:

  • Replicability

  • Reproducibility

  • Repeatability

  • Re-usability

  • Re-runability

These will be referred to as The 5Rs.

Of course, to make any progress in employing these concepts in practice, it makes sense to have a good working definition for each one. It turns out that this is a little more difficult than might be first thought. However, a good starting point is to consider these concepts in terms of the more traditional scientific setting of a laboratory. The following basic description of what it means to do each of the 5Rs is taken from Goble 2016.

4.1.3. Re-run

Variations on experiment and setup

Let’s re-run the experiment, but this time we will test people in France instead of Germany, and sample twice as many individuals.

4.1.4. Repeat

Same experiment, same setup

Let’s repeat our experiment to check we followed the procedure correctly.

4.1.5. Replicate

Same experiment, same setup, independent lab

Let’s repeat the experiment of Fleischmann and Pons, but using our own equipment and researchers.

4.1.6. Reproduce

Variations of experiment, independent lab

Let’s adapt the experiment of Fleischmann and Pons to see if we can get it to work using our own equipment and researchers.

4.1.7. Reuse

Variations of experiment, independent lab

Let’s use John F. Shepard’s maze techniques, but with cats instead of rats and with our own maze design.

4.1.8. Exercise

We would like to use the 5Rs within the domain of scientific software. Try translating each of the above descriptions into a scientific computing setting.

If you have time, have a quick look at the links provided in the descriptions of Replicate and Reproduce.

4.2. The 5Rs - a road map for scientific software development

Now have a look at this online presentation.

4.3. The 5Rs - Roadmap and Signposts

There are many techniques, tools, and practices that can be adopted to help produce good-quality scientific software when assessed with respect to the 5Rs. The table below might serve as a reasonable starting point for a road map towards this.


| Quality | Description | Methods | Documentation |
|---|---|---|---|
| Reusable | code can be easily adapted to variations in problem specification and readily integrated into other code | design patterns, packages / libraries | in-code documentation / API documentation |
| Re-runnable | someone else can run the code | packages / libraries / repositories, unit tests / forums / user groups, training / instruction | installation instructions and scripts, technical manuals and documentation |
| Repeatable | same results over time | unit tests / maintenance / version control | use cases |
| Reproducible | same results for a given problem | unit tests | case studies / articles / reports |
| Replicable | algorithm / solution can be recoded by someone else | pseudo code | articles / books / reports |
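As a small illustration of the "unit tests" entries in the table above, the following Python sketch shows one possible way to check that a stochastic calculation is repeatable and reproducible. It is not taken from the course material: the function `estimate_pi`, the seeds, the sample sizes, and the tolerance are all hypothetical choices made for this example, and treating "same results for a given problem" as "agreement within a tolerance" is an assumption about how the table might be applied.

```python
import math
import random


def estimate_pi(n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of pi (hypothetical example problem)."""
    # A private, seeded generator avoids hidden global state,
    # so the same arguments always give the same answer.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_repeatable():
    # Repeatable: same code, same inputs, same seed -> identical result.
    assert estimate_pi(10_000, seed=42) == estimate_pi(10_000, seed=42)


def test_reproducible():
    # Reproducible (assumed reading): different seeds give results that
    # differ in detail but agree on the answer within a tolerance.
    estimates = [estimate_pi(20_000, seed=s) for s in range(5)]
    assert all(abs(e - math.pi) < 0.1 for e in estimates)


if __name__ == "__main__":
    test_repeatable()
    test_reproducible()
    print("repeatability and reproducibility checks passed")
```

Keeping tests like these under version control alongside the code is one way to notice when a later change stops the software giving the same results over time.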

4.4. Best Practice

The following is reproduced from Best Practices for Scientific Computing.

Remember that you should regularly reflect on your code and programming and try to quantify (so far as is practically possible) its quality when measured against the 5Rs as outlined in the presentation.

Measuring the 5Rs

The scientific software development life cycle (SSDLC) is generally (very) iterative. Keep reflecting!

The Scientific Software Development Life Cycle (SSDLC)

4.5. Further Reading

The following works are referenced in the presentation and make for some interesting and thought-provoking reading.

The 5Rs

Best Practices for Scientific Computing

Developing Scientific Software

Reproducible Research for Scientific Computing

Pandemic Simulation Verification