Assessment

This document contains the assessed component for STOR-601 module “Scientific Computing with Python”.

Instructions

What can/cannot be used

To achieve the tasks in this assessment you can use :

  • Any of the code you wrote during the course and any of the examples provided in the course materials

  • Any standard python data structures and functions

  • The matplotlib library

  • The numpy library

  • The sortedcontainers library

  • Any other librarues availableas part of SciPy

If you want other libraries then please check there suiatable with
Daniel Grose first.

However, you should not use :

  • Any code that is NOT python 3

  • Any code that is not open source. (If it has a copyright, check you can use it)

You can use code examples from other sources, but a complete solution to a task that has only minor differences to a published solution will be considered plagiarism. Be sensible.

Submitting the assessment

Your assessment should be submitted as a Jupyter Notebook. You can use any of the notebooks used to create the course material. Your notebook must include your name, data of submission, and each of the Task details from this document should be placed above each of your solution to the Task. Your notebook should be e-mailed directly to
Daniel Grose

The submission date for this assignment is 30/04/2021

Advice

The marks for each component of each task are shown in square brackets. In some of the tasks you have to provide descriptions, justifictions and supporting material such as charts, figures and possibly some simple analysis. In such cases some of the marks will be allocated according to how creative and novel your answers are. Try and make as good as use as you can of the facilities supported by Jupyter Notebooks. For example, LaTeX, markdown, charts, html etc.

Task 1

a) Write a python function that accepts two sorted lists of n numbers as arguments and returns the value of the median of the combined data from both lists. [5]

b) Determine how the time taken for your function to calculate the median varies with n. Demonstrate this visually using matplotlib. [5]

c) What is the complexity of the algorithm you used in part a) ? [2]

d) Is it possible to write a function for part a) that has better than O(n) worst case complexity ? Research this problem, and if it is possible, write a python function that realises better than O(n) worst case complexity. If you think that it is not possible, then write a short explanation as to why this is so. [5]

Note - if your answer to part a) is better then O(n) worst case, then this constitutes an answer to part d) as well.

Task 2

a) Write methods for calculating the following statistics on a moving window of length \(K\)

i) max [3]

ii) min [3]

iii) mean [3]

iv) median [3]

v) standard deviation [3]

vi) median absolute deviation [6]

You functions should take a list as an argument and return a list and can use any standard python functions (such as min) if you wish. However, you can only use the types of data structures described in the Appendix i.e. lists, strings, tuples, dictionaries, and sets. You can also use the array data structure from the numpy library if you wish.

b) Augment your methods from part a) to implement padding using

i) pad zero [3]

ii) pad value [3]

c) Generate some random sequences (of your choice) that include “outliers” and use them to visualise your methods developed in parts a) and 2b) with matplotlib. Use at least two distributions from the numpy package. Marks will be awarded for creativity both with respect to the structure of the data and the features in the plots. [10]

d) In terms of \(K\) (i.e not considering the length of the data \(N\)) determine the worst case complexity for each of your methods from part a). You should quote the complexity using “Big O” notation. Justify your answers using timing data produced using your methods and summarise this data graphically. Make sure you account for variation in the run time due to the structure of the data and random fluctuations in the avaialibity of CPU allocation on your computer system. [15]

Task 3

This question requires that you have access to the sortedcontainers library available from PyPI.

a) Enter the worst case complexity for each method in Table 1. Leave the cell blank if the method is not supported by the data structure.

Table 1

method

list

dictionary

set

tuple

sorted list

sorted dictionary

sorted set

sort

add

append

bisect

remove

pop

b) For each data structure in Table 1 provide a short description of a simple use case that exploits a feature of that particular data structure. In each case, justify the choice of the data structure in relation to the use case. [10]

c) Can you change the way SortedList keeps its data sorted ? If so, give a short example of how this can be acheived. [5]

Task 4

a) Where possible, use the SortedList container to improve the worst case run time complexity for your solutions to Task 2 part a). Where this is not possible, explain why you think this to be the case. [20]

b) For each of your solutions in part a), use suitable data in conjunction with matplotlib to demonstrate how the complexity of your solution compares to the one you obtained in Task 2 part a). [15]

Task 5

a) Write a python function to generate a list of data corresponding to \(n\) samples of \(X_{i}^{2}\)where \(X_{i}\) is distributed according to \(X_{i} \sim \mathcal{N}(0,\sigma_{1}) + \lambda B(1,p)\) for \(a \leq i \leq b\) and \(X_{i} \sim \mathcal{N}(0,\sigma_{2}) + \lambda B(1,p)\) otherwise. Your function should accept \(n\), \(\sigma_{1}\), \(\sigma_{2}\), \(\lambda\), \(p\), \(a\), and \(b\) as parameters. Here, \(\mathcal{N}(\mu,\sigma)\) indicates a normal distribution and \(B(n,p)\) a binomial distibution. [5]

b) Use matplotlib to visualise some examples of the output from your solution to part a) [5]

c)

(i) Use your solutions from Task 2 part a) to develop a method for predicting \(a\) and \(b\) from data produced by your solution to part a). [10]

(ii) Provide an example showing how your method is used. [2]

(iii) Describe how the size of the sliding window effects the prediction of \(a\) and \(b\) ? [15]

(iv) Describe how the parameters in your solution to part a) effect the ability of your solution to a) (i) to predict \(a\) and \(b\). [15]

Note 1 - Your answers to part c) (iii) and c) (iv) can include plots, code examples, analysis, and so on. Be creative.