More Data Structures

Summarising the basic python data structures

Here are the basic python data structures again

set

S = {1,2,7,3,2,5}
print(S)
{1, 2, 3, 5, 7}

list

L = [1,2,7,3,2,5]
print(L)
[1, 2, 7, 3, 2, 5]

dictionary

D = {"first" : 1,"second" : 2,"third" : 7,"fourth" : 3,"fifth" : 2,"sixth" : 5}
print(D)
{'first': 1, 'second': 2, 'third': 7, 'fourth': 3, 'fifth': 2, 'sixth': 5}

tuple

T = (1,2,7,3,2,5)
print(T)
(1, 2, 7, 3, 2, 5)

In what way(s) could these data structures be characterised ? Here is one possibility. It characterises the data structures according to what methods can be (meaningfully) applied to them.

method

list

dictionary

set

tuple

sort

add

append

bisect

remove

pop

Exercise 1

Complete the above table. Bear in mind, the method name is meant to be descriptive. The actual method (or technique) you use in python to acheive it may not have this name.

Challenge 1

Replace each tick in the table from exercise 1 with the complexity of the method.

Sorted data structures

Sometimes you may encounter an algorithm which requires a data structure that does not fit the characterisation provided by table 1. For example, which data structure(s) did you use for exercises 2, 3, and 4 in the Moving Windows chapter ? Are you happy with the worst case complexity you acheived for each of your solutions ? If so, which ones are you not happy with ?

Fortunately, there are more data structures available, but they are not part of most standard python distributions. Quite often, these data structures seem quite familiar and just seem to have some extra methods available. So why are they not in standard python ?

The answer comes back to the table you compiled in challenge 1. The inclusion of additional methods comes with a cost. Part of the cost is that the complexity of some of the methods changes, and for the worse. You have to have a good reason to use them - they are not “off the shelf” solutions for most problems. They provide well considered trade offs in terms of complexity which make them suitable for certain, perhaps more specialist, types of problems.

The types of data structures which extend the standard ones are often refered to as augmented data structures. One useful family of augmented data structures are called “sorted” containers. They organise the data using an order operator \(<\) to provide a trade off between the time required to add or remove data, and the time required to find data. This is acheived by efficiently keeping the data sorted each time data is added or removed.

Exercise 2

Install the sorted containers library from PyPI using pip.

Solution 2

!python3 -m pip install sortedcontainers
Requirement already satisfied: sortedcontainers in /home/grosedj/python-envs/further-python-env/env/lib/python3.9/site-packages (2.3.0)
WARNING: You are using pip version 20.2.3; however, version 21.0.1 is available.
You should consider upgrading via the '/home/grosedj/python-envs/further-python-env/env/bin/python3 -m pip install --upgrade pip' command.

Example 1

The following example takes a quick look at the SortedList data structure.

from sortedcontainers import SortedList
L = [1,2,7,3,2,5]
print(L[2])
SL = SortedList([1,2,7,3,2,5])
print(SL[2])
7
2

Exercise 3

Find out how to add data to sorted list ?

Solution 3

SL.add(4)
print(SL)
SortedList([1, 2, 2, 3, 4, 5, 7])

Exercise 4

%%html
Use the following <a href="http://www.grantjenks.com/docs/sortedcontainers/index.html">link</a> to find out what methods
the SortedList data supports and add the SortedList data structure to your table from exercise 1.
Use the following link to find out what methods the SortedList data supports and add the SortedList data structure to your table from exercise 1.

Exercise 5

Can you use a SortedList to reduce the complexity of your solution to exercise 3 in the “Moving Windows” chapter ?

Assessment

This question requires that you have access to the sortedcontainers library available from PyPI.

a) Enter the worst case complexity for each method in Table 1. Leave the cell blank if the method is not supported by the data structure.

Table 1

method

list

dictionary

set

tuple

sorted list

sorted dictionary

sorted set

sort

add

append

bisect

remove

pop

b) For each data structure in Table 1 provide a short description of a simple use case that exploits a feature of that particular data structure. In each case, justify the choice of the data structure in relation to the use case.

c) Can you change the way SortedList keeps its data sorted ? If so, give a short example of how this can be acheived.

Assessment

a) Where possible, use the SortedList container to improve the worst case run time complexity for your solutions to part a) of the Assessment section in the “Moving Windows” chapter. When it is not possible, explain why you think this is the case. [20]

b) For each of your solutions in part a), use suitable data in conjunction with matplotlib to demonstrate how the complexity of your solution compares to the one you obtained in part Aa of the Assessment section of the “Moving Windows” chapter. [15]