1. Module Outline
1.1. Overview
An essential component of Statistics and Operational Research today is the computer implementation of models and algorithms. Although modern high-level programming languages such as Python and R have greatly facilitated coding for scientists and analysts, the nature of research often means coding projects are complicated, requiring many source code files, potentially thousands of lines of code, and a precise set of dependencies. For scientific research, where accuracy, transparency and reproducibility are key tenets, code must be particularly reliable, and academic journals are increasingly requiring the submission of source code alongside published articles.
Unfortunately, doctoral students in Statistics and Operational Research (and scientists more generally) usually have little or no formal training in computer science or experience of software engineering, and so are often ill-equipped to develop and maintain research code. As a result, research code is often seen as unreliable.
This module aims to address this potential deficiency by training you to produce scientific software that is replicable, reproducible, reusable, re-runnable, and repeatable.
The course is divided into three parts. The first part introduces foundational computer science concepts. In particular, we will cover the analysis of algorithms and data structures, different models of programming, and design patterns. The second part shows how these ideas are applied in practice, with particular reference to the software engineering practices associated with collaborative programming, software maintenance and support, testing, and distribution of software. The final part consists of a larger group programming project, for which you will be required to draw on the knowledge and skills developed in the rest of the course to reproduce an established piece of contemporary research.
During the lectures and workshops, emphasis is placed on problems and techniques associated with the fields of statistics and operational research. Although programming concepts will be taught in a language-agnostic way, we will primarily use Python for teaching, as this is a popular language for scientific computing and it supports the required programming paradigms.
1.2. Prerequisites
Basic knowledge of Python, such as that provided by the short course “Introductory Python”
Basic Unix skills, such as those acquired from the short course “Introduction to Unix”
Access to Windows and Unix-based systems with installations of Python, Jupyter Notebook, and Spyder
1.3. Learning Outcomes
Students who pass this module should be able to:
analyse an algorithm in terms of its computational complexity and determine appropriate data structures for its implementation in software
understand the difference between declarative and imperative programming styles and how, in practice, they can be used together to design and implement software
use three common models of programming (procedural, object-oriented and functional), and identify applicable generic design patterns
use software engineering tools such as profilers, debuggers, testing frameworks and environment management systems
understand basic computer architecture and operating systems and be able to parallelise code where appropriate
use tools and mechanisms for collaborative programming, and distribution and support of code within the wider research community
produce scientific software that is replicable, reproducible, reusable, re-runnable, and repeatable
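As an informal illustration of the distinction between imperative and declarative styles mentioned in the outcomes above, the following minimal sketch computes the same quantity in both ways. The function names and the example task (summing the squares of the even numbers in a list) are hypothetical and are not taken from the module materials.

```python
def sum_even_squares_imperative(values):
    """Imperative style: spell out, step by step, how the result is built."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


def sum_even_squares_declarative(values):
    """Declarative style: describe what the result is, not how to loop."""
    return sum(v * v for v in values if v % 2 == 0)


data = [1, 2, 3, 4, 5, 6]
# Both styles describe the same computation and agree on the result.
assert sum_even_squares_imperative(data) == sum_even_squares_declarative(data)
```

In practice the two styles are often mixed: a declarative expression such as the one above can sit inside a larger procedural or object-oriented program.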
1.4. Teaching
This module will be taught by a mixture of lectures and computer workshops (34 hours total).
1.5. Assessment
There are three main assessments for this course. Two smaller assessments will take place while the module is being taught, to reinforce fundamental concepts and materials introduced in the earlier parts of the course and to provide an opportunity for formative feedback. The final assessment, which has the highest weighting, will take the form of a large coding project of the type you may have to undertake in your future research.
Details of each assessment are as follows:
Programming Assessment (Weighting: 25%)
The first assessment requires students to demonstrate an application of fundamental concepts regarding standard patterns and data structures to make appropriate software design decisions for a given algorithm. Students will then implement their design using an appropriate programming language. The implementation should include an element of parallelisation.
Software Engineering Assessment (Weighting: 25%)
The second assignment requires students to study and implement a given algorithm. The implementation should be engineered so as to be suitable for hosting on a public repository, such as PyPI or CRAN, and be under publicly accessible version control using, for example, GitHub. The software should have supporting material including, but not restricted to, documentation, use cases, and information regarding how it can be re-used and extended by other researchers.
Reproduction Group Project (Weighting: 50%)
The final coursework will consist of a programming reproduction exercise in which groups implement a methodology from the statistics or operational research literature as a package. This will require students to draw on all concepts and practices covered during the course. Besides producing a code repository and report, this component will also include a group presentation and a peer assessment component. The peer assessment component is included to ensure that each team member is fairly rewarded for their contribution to the project.
1.6. Timeline
The following dates and timings are provisional and may be updated over the course of the module.
Staff Abbreviations:
JF = Jamie Fairbrother
DG = Daniel Grose
DB = Dylan Bahia
HE = Harry Ellingham
In addition to the sessions detailed below, there will be a session for group presentations of Assessment 3.
1.6.1. Part 1 : Foundations
Session 1 : 19/01 - Introduction to the Course - The 5Rs - Programming Challenges : (2 hours) JF, DG, DB : Lab2 PSC
Session 2 : 21/01 - Iterated Prisoner's Dilemma - Declarative Programming - Functional Design Patterns : (2 hours) DG, DB : LT16 LUMS
Session 3 : 26/01 - Software Development Environments - Functional Design Patterns - Iterated Prisoner's Dilemma : (2 hours) DG, HE : Lab2 PSC
Session 4 : 28/01 - Iterated Prisoner's Dilemma : (2 hours) DG, DB : LT16 LUMS
Session 5 : 02/02 - Pseudo Code - Iterated Prisoner's Dilemma : (2 hours) DG, DB : Lab2 PSC
Session 6 : 04/02 - Data Structures - Computational Complexity : (2 hours) DG, DB : LT16 LUMS
Session 7 : 09/02 - Data Structures - Computational Complexity : (3 hours) DG, DB : Lab2 PSC
Session 8 : 11/02 - The Game of Pig - Assessment 1 : (2 hours) DG, DB : LT16 LUMS
1.6.2. Part 2 : Software Engineering
Session 9 : 02/03 - Version Control - Debugging : (3 hours) JF, HE : Lab2 PSC
Session 10 : 04/03 - Profiling : (1 hour) DG, HE : LT16 LUMS
Session 11 : 09/03 - Packaging and Serialisation : (3 hours) DG, HE : Lab2 PSC
Session 12 : 11/03 - Unit Tests : (1 hour) DG, HE : LT16 LUMS
Session 13 : 20/04 - Advanced Version Control : (2 hours) JF, HE : Lab2 PSC
1.6.3. Part 3 : Replication / Reproduction Project
Session 14 : 22/04 - Assessment 3 Support Tutorial : (1 hour) JF, DG, HE : LT16 LUMS
Session 15 : 29/04 - Assessment 3 Support Tutorial : (3 hours) JF, DG, HE : LT16 LUMS
Session 16 : 08/05 - Assessment 3 Support Tutorial : (3 hours) JF, DG, HE : LT16 LUMS
1.7. Teaching Staff and Contacts
This course is administered, organised, and led by the course convenor, Jamie Fairbrother.
Jamie Fairbrother (module convenor)
Office: D32, Charles Carter
Dan Grose
Announcements for the module will be made in the Teams group.
Additional resources such as code may also be shared here.
You may also post your questions here.
1.8. Course Materials and Further Reading
All lecture notes and tutorial worksheets will be provided through the course website. Although the lectures and workshops are self-contained, the following references may prove useful:
Benureau, F. C. Y. and Rougier, N. P. (2018). Re-run, Repeat, Reproduce, Reuse, Replicate: Transforming Code into Scientific Contributions. Frontiers in Neuroinformatics. 11. https://doi.org/10.3389/fninf.2017.00069
Cormen, T. H., Leiserson, C. E., Rivest, R. L. and Stein, C. (2009). Introduction to Algorithms (3rd Edition). MIT Press.
Van Roy, P. and Haridi, S. (2004). Concepts, Techniques, and Models of Computer Programming. MIT Press.
Gamma, E., Helm, R., Johnson, R. and Vlissides, J. M. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional.