Lecture 2 - The 5Rs

4. Lecture 2 - The 5Rs

This section of the course looks at Premise 4 and Premise 5 in more detail and examines how the 5Rs can be used in practice as a guide to choosing best practice.

4.1. Overview

“Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort.”

Wilson et al. 2014. “Best Practices for Scientific Computing”.

Most practitioners would broadly agree with the preceding claim. Consequently, there is a large body of high-quality, readily available guidance for programmers with a scientific background. Most of it focuses almost entirely on the details of how to write good code, e.g. commenting and documenting code, choosing sensible variable names, organising code across multiple files, and even rules for how to lay your code out; the list goes on. However, it tends not to discuss why these things should be done.

In contrast, this course first sets out to examine what it is that determines the quality of scientific software and then uses this as a basis for exploring how to achieve this quality. The rationale for this approach is simple - the desired qualities of the software do not change over time but, as we are all aware, the technology we use to create and maintain software does change, and often very rapidly. It is better, then, to understand the basic requirements fully, so that we can translate them into ways of working with the current technology and also adapt our working practices as new technology emerges.

What follows is based on the assumption that the following premises

4.1.1. Premise 4

When it is part of a scientific endeavour, the quality of computing should be quantified with respect to the same metrics as the overall endeavour. These metrics naturally include Replicability, Reproducibility, and Repeatability.

4.1.2. Premise 5

In addition to Replicability, Reproducibility, and Repeatability, the quality of computing should be quantified with respect to the intrinsic software-related metrics of Re-usability and Re-runability.

form, at least for the purposes of scientific software development, a good basis on which to examine, quantify, and develop best practice. This viewpoint has been developed over time by a number of researchers: some are scientists working in a variety of disciplines, some are computer scientists, some are software engineers, and many are a bit of all of these.

A common theme emerging from this research is the following set of concepts:

Replicability
Reproducibility
Repeatability
Re-usability
Re-runability

These will be referred to as The 5Rs.

Of course, to make any progress in employing these concepts in practice, it makes sense to have a good working definition of each one. It turns out that this is a little more difficult than might first be thought. However, a good starting point is to consider these concepts in terms of the more traditional scientific setting of a laboratory. The following basic description of what it means to do each of the 5Rs is taken from Goble 2016.

4.1.3. Re-run

Variations on experiment and setup

“Let’s re-run the experiment, but this time we will test people in France instead of Germany, and sample twice as many individuals.”

4.1.4. Repeat

Same experiment, same setup

“Let’s repeat our experiment to check we followed the procedure correctly.”

4.1.5. Replicate

Same experiment, same setup, independent lab

“Let’s repeat the experiment of Fleischmann and Pons, but using our own equipment and researchers.”

4.1.6. Reproduce

Variations of experiment, independent lab

“Let’s adapt the experiment of Fleischmann and Pons to see if we can get it to work using our own equipment and researchers.”

4.1.7. Reuse

Variations of experiment, independent lab

“Let’s use John F. Shepard’s maze techniques, but using cats instead of rats and our own maze design.”

4.1.8. Exercise

We would like to use the Rs within the domain of scientific software. Try translating each of the above descriptions into a scientific computing setting.

If you have time, have a quick look at the links provided in the descriptions of Replicate and Reproduce.

4.2. The 5Rs - a road map for scientific software development

Now have a look at this online presentation.

4.3. The 5Rs - Roadmap and Signposts

There are many techniques, tools and practices that can be adopted to guide and assist in realising good quality scientific software when assessed with respect to the 5Rs. The table below might serve as a reasonable starting point for providing a road map for achieving this.

Quality	Description	Methods	Documentation
Reusable	code can be easily adapted to variations in problem specification and readily integrated into other code	design patterns, packages / libraries	in-code documentation / API documentation
Re-runnable	someone else can run the code	packages / libraries / repositories, unit tests, forums / user groups, training / instruction	installation instructions and scripts, technical manuals and documentation
Repeatable	same results over time	unit tests / maintenance / version control	use cases
Reproducible	same results for a given problem	unit tests	case studies / articles / reports
Replicable	algorithm / solution can be recoded by someone else	pseudo code	articles / books / reports
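
As one possible illustration of the "unit tests" entries in the table, the sketch below shows how tests might check that a small piece of scientific code is repeatable and reproducible in the sense used above. The Monte Carlo estimate of pi and the names `estimate_pi`, `test_repeatable` and `test_reproducible` are assumptions made purely for illustration; they are not taken from the course material, and the tolerance is an arbitrary but generous choice.

```python
import math
import random


def estimate_pi(n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of pi from random points in the unit square."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_repeatable() -> None:
    # Repeatable: same code, same inputs, same seed give an identical result.
    assert estimate_pi(10_000, seed=42) == estimate_pi(10_000, seed=42)


def test_reproducible() -> None:
    # Reproducible (one reading): different seeds still give the same answer
    # to the same problem, within a tolerance chosen for this illustration.
    for seed in (1, 2, 3):
        assert math.isclose(estimate_pi(100_000, seed=seed), math.pi, abs_tol=0.05)


if __name__ == "__main__":
    test_repeatable()
    test_reproducible()
    print("repeatability and reproducibility checks passed")
```

Tests like these are usually collected and run automatically by a test runner, and keeping them under version control alongside the code is what allows "same results over time" to be checked after every change.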

4.4. Best Practice

The following is reproduced from Best Practices for Scientific Computing.

Remember that you should regularly reflect on your code and programming and try to quantify (so far as is practically possible) its quality when measured against the 5Rs, as outlined in the presentation.
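
One lightweight way to make this reflection concrete is to record a rough score for each of the 5Rs whenever you review a piece of code. The sketch below is only a suggestion: the class `FiveRsScore`, its 0 to 2 scale, and the example scores are hypothetical and are not part of the course material or of the cited paper.

```python
from dataclasses import dataclass, fields


@dataclass
class FiveRsScore:
    """Hypothetical self-assessment: one score per R, from 0 (absent) to 2 (good)."""

    reusable: int
    rerunnable: int
    repeatable: int
    reproducible: int
    replicable: int

    def weakest(self) -> list[str]:
        """Return the names of the qualities with the lowest score."""
        scores = {f.name: getattr(self, f.name) for f in fields(self)}
        lowest = min(scores.values())
        return [name for name, value in scores.items() if value == lowest]


# Example review of an imaginary analysis script (scores are made up).
review = FiveRsScore(reusable=1, rerunnable=2, repeatable=2, reproducible=1, replicable=0)
print(review.weakest())
```

Keeping such records over time gives a simple, if subjective, indication of which of the 5Rs most needs attention next.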