8. Session 6: Data Structures - Computational Complexity#

8.1. Session Overview#

The objectives of this session are:

  • Understand data structures in a declarative sense, as mappings between sets

  • Understand computational complexity in terms of asymptotics

  • Understand the difference between several common asymptotic measures of computational complexity

  • Understand the difference between best, worst, and average cases

  • Know the basic data structures provided by Python, the mappings they correspond to, and how to use them

  • Know the computational complexity of popular Python implementations of common data structures

  • Know how to choose data structures according to use case

8.2. Data structures as mappings#

At a useful level of abstraction, a data structure is a way of representing and organizing data so that certain operations can be performed efficiently.

Most data structures can be viewed as implementations of one or more mathematical mappings between sets.

At minimum, a data structure represents a (possibly partial) function

\[f : K \to V\]

where \(K\) is a set of keys (indices, positions, labels, states) and \(V\) is a set of values.

8.2.1. Examples#

  • Arrays / vectors: a mapping from a finite index set \(\{0,\dots,n-1\}\) to values.

  • Hash tables / dictionaries: a mapping from an arbitrary key set to values, typically approximating constant-time access.

  • Trees and graphs: mappings from nodes to collections of nodes (adjacency), or from keys to values with additional structural constraints.

  • Matrices and tensors: mappings from multi-dimensional index tuples to values.

What distinguishes data structures is not the mapping itself, but the constraints imposed on how the mapping is stored and traversed (ordering, hierarchy, sparsity, locality).
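This mapping view is directly visible in Python (the names below are illustrative):

```python
# A list realizes a total function f : {0, ..., n-1} -> V.
values = [2.0, 3.5, 5.0]
assert values[1] == 3.5           # evaluate f(1)

# A dict realizes a (possibly partial) function f : K -> V over an arbitrary key set.
grades = {"alice": "A", "bob": "B"}
assert grades["alice"] == "A"     # evaluate f("alice")
assert "carol" not in grades      # f is partial: "carol" is outside the domain

# An adjacency dict realizes a mapping from nodes to collections of nodes.
graph = {"u": ["v", "w"], "v": ["w"], "w": []}
assert graph["u"] == ["v", "w"]   # neighbours of node "u"
```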

8.2.2. Operations and abstract interfaces#

Each data structure supports a set of operations that correspond to queries or transformations of the mapping:

  • lookup: evaluate \(f(k)\)

  • insert / delete: modify the domain or codomain

  • iterate: traverse the domain in a prescribed order

  • aggregate: compute functions over subsets of the mapping

Formally, a data structure can be seen as an algorithmic representation of an abstract data type (ADT): a set plus a collection of operations with specified semantics.

8.2.3. Computational complexity as the defining trade-off#

The central purpose of a data structure is to control the computational complexity of these operations.

  • Complexity is typically measured asymptotically in terms of:

    • time (e.g. \(O(1), O(\log n), O(n)\))

    • space (memory usage as a function of input size)

  • Different data structures represent the same abstract mapping but optimize different operations.

8.2.4. Examples of trade-offs#

  • An array gives \(O(1)\) access by index but \(O(n)\) insertion.

  • A balanced binary search tree gives \(O(\log n)\) lookup, insertion, and deletion while maintaining order.

  • A hash table sacrifices ordering to achieve expected \(O(1)\) lookup.

Choosing a data structure is therefore equivalent to choosing a complexity profile over a set of operations.

8.2.5. Summary#

  • Data structures mediate between mathematical objects (functions, relations, graphs, tensors) and computational reality (finite memory, cache behavior, parallelism).

  • They determine whether an algorithm is feasible at scale, not just correct.

  • Many performance bottlenecks arise not from the algorithmic idea, but from a mismatch between the structure of the data and the structure imposed by the data structure.

In short: a data structure is a concrete, complexity-aware realization of a mapping, designed to make specific operations fast while accepting trade-offs elsewhere.

8.3. Computational Complexity#

Computational complexity studies how the resources required by an algorithm scale with the size of the input. The two dominant resources are:

  • Time complexity: number of elementary operations executed

  • Space complexity: amount of memory used

Formally, if an algorithm takes an input of size \( n \), its time complexity is a function

\[T(n) = \text{number of steps executed on inputs of size } n.\]

The key idea is scaling behavior, not exact runtime. Complexity theory abstracts away:

  • hardware details,

  • constant factors,

  • implementation quirks,

and focuses on growth rates as \( n \to \infty \).

This is important because:

  • datasets grow,

  • asymptotic behavior dominates practical feasibility,

  • algorithm choice often beats micro-optimisation by orders of magnitude.
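A small timing sketch illustrates the last point: replacing a list membership test (O(n)) with a set membership test (expected O(1)) dwarfs any micro-optimisation of the scan itself. Exact timings vary by machine; the sizes here are illustrative.

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1                      # worst case for the list: scan all n elements

# Time 100 membership tests against each structure.
t_list = timeit.timeit(lambda: missing in data_list, number=100)
t_set = timeit.timeit(lambda: missing in data_set, number=100)

print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
assert t_set < t_list             # hashing beats scanning by orders of magnitude
```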

8.3.1. Asymptotic analysis#

8.3.1.1. Big-O notation (upper bounds)#

We say \(T(n) \in O(f(n))\) if there exist constants \( c > 0 \) and \( n_0 \) such that \(T(n) \le c f(n) \quad \forall n \ge n_0\).

Interpretation:

  • \( f(n) \) is an asymptotic upper bound on the algorithm’s growth.

Example:

  • \(T(n) = 3n^2 + 5n + 20 \in O(n^2)\).
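The claim can be checked mechanically by exhibiting witness constants; \( c = 4 \) and \( n_0 = 8 \) below are one valid choice among many.

```python
def T(n):
    # The running time from the example above.
    return 3 * n**2 + 5 * n + 20

c, n0 = 4, 8
# 3n^2 + 5n + 20 <= 4n^2  iff  n^2 - 5n - 20 >= 0, which holds for all n >= 8.
assert all(T(n) <= c * n**2 for n in range(n0, 10_000))
# The threshold n0 is genuinely needed: the bound fails just below it.
assert T(7) > c * 7**2
```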

8.3.1.2. Big-Ω notation (lower bounds)#

\( T(n) \in \Omega(f(n)) \quad \Longleftrightarrow \quad T(n) \ge c f(n) \text{ eventually}. \)

This gives a guaranteed minimum growth rate.

8.3.1.3. Big-Θ notation (tight bounds)#

\( T(n) \in \Theta(f(n)) \quad \Longleftrightarrow \quad T(n) \in O(f(n)) \cap \Omega(f(n)). \)

This means \( f(n) \) characterises the true asymptotic order.

8.3.1.4. Overview#

The above definitions are taken from Introduction to Algorithms.

In practice, Big-O is most commonly used, even when tighter bounds exist.

8.3.2. Best case, worst case, and average case#

An algorithm’s runtime may depend not just on input size \( n \), but on the structure of the input. We therefore distinguish cases.

8.3.2.1. Worst-case complexity#

\( T_{\text{worst}}(n) = \max_{\text{inputs}} T(\text{input}) \) where all of the inputs are of size \(n\).

  • Provides a guarantee

  • Common in algorithm analysis and systems design

  • Often pessimistic but robust

Example:
Linear search in an unsorted array

  • Worst case: element not present → \( O(n) \)

8.3.2.2. Best-case complexity#

\( T_{\text{best}}(n) = \min_{\text{inputs}} T(\text{input}) \) where all of the inputs are of size \(n\).

  • Rarely used alone

  • Can be misleading

  • Useful when best-case happens frequently or is structurally meaningful

Example:

  • Linear search: target is first element → \( O(1) \)
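Both cases become visible by instrumenting linear search to count comparisons (the helper below is illustrative):

```python
def linear_search(a, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for i, x in enumerate(a):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons

a = list(range(1000))
# Best case: target is the first element -> 1 comparison, O(1).
assert linear_search(a, 0) == (0, 1)
# Worst case: target absent -> n comparisons, O(n).
assert linear_search(a, -5) == (-1, 1000)
```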

8.3.2.3. Average-case complexity#

\( T_{\text{avg}}(n) = \mathbb{E}[T(X_n)] \) where \( X_n \) is a random input of size \( n \) drawn from some probability distribution.

  • Average case is only meaningful relative to an assumed input distribution.

Example:

  • Quicksort:

    • Worst case: \( O(n^2) \)

    • Average case (under random permutations): \( \Theta(n \log n) \)

Importantly, this connects naturally to:

  • probabilistic modelling of data,

  • assumptions about randomness,

  • distribution-sensitive algorithms.
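A minimal quicksort sketch makes the worst-versus-average gap concrete. The first-element pivot rule and the counting scheme (one comparison counted per element partitioned) are illustrative choices; on already-sorted input every pivot is the minimum, so partitioning degenerates.

```python
import random

def quicksort(a, stats):
    # First-element pivot: O(n^2) on sorted input, Theta(n log n) on average.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    stats["comparisons"] += len(rest)       # one count per element partitioned
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, stats) + [pivot] + quicksort(right, stats)

n = 400
worst = {"comparisons": 0}
assert quicksort(list(range(n)), worst) == list(range(n))   # sorted input
avg = {"comparisons": 0}
assert quicksort(random.sample(range(n), n), avg) == list(range(n))

assert worst["comparisons"] == n * (n - 1) // 2   # quadratic: 79800 for n = 400
assert avg["comparisons"] < worst["comparisons"] // 5
```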

8.3.2.4. Amortised analysis: averaging over sequences, not inputs#

Amortised complexity analyzes the average cost per operation over a sequence of operations, even if individual operations are expensive.

Crucially:

  • No probability is involved

  • Guarantees hold for every sequence

Let a sequence of \( m \) operations take total time \( T(m) \).
The amortised cost per operation is: \( \frac{T(m)}{m}. \)
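A sketch of the textbook doubling dynamic array shows the idea. Individual appends occasionally trigger an O(n) copy, but the total copy work over \( m \) appends stays below \( 2m \), so the amortised cost per append is O(1). (This class is illustrative; it is not CPython's actual list implementation, which over-allocates less aggressively.)

```python
class DynArray:
    """Doubling dynamic array that counts how many element copies it performs."""

    def __init__(self):
        self.capacity, self.size, self.copies = 1, 0, 0
        self.data = [None]

    def append(self, x):
        if self.size == self.capacity:       # full: double capacity, copy all
            self.data = self.data + [None] * self.capacity
            self.capacity *= 2
            self.copies += self.size
        self.data[self.size] = x
        self.size += 1

m = 10_000
arr = DynArray()
for i in range(m):
    arr.append(i)

# Copies happen at sizes 1, 2, 4, ..., 8192, so total copy work is < 2m.
assert arr.copies < 2 * m
assert arr.data[:m] == list(range(m))
```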

8.4. Computational Complexity of Common Python Data Structures#

8.4.1. list#

| Operation | Time |
|---|---|
| Index access a[i] | O(1) |
| Append append(x) | O(1) amortized |
| Pop end pop() | O(1) |
| Insert / delete middle | O(n) |
| Membership x in a | O(n) |
| Sort | O(n log n) |

8.4.2. dict#

| Operation | Time |
|---|---|
| Lookup d[k] | O(1) avg |
| Insert / update | O(1) avg |
| Delete | O(1) avg |
| Membership k in d | O(1) avg |
| Iterate | O(n) |

  • Worst-case O(n) (hash collisions, rare)

8.4.3. set#

| Operation | Time |
|---|---|
| Add | O(1) avg |
| Remove | O(1) avg |
| Membership | O(1) avg |
| Union / intersect | O(n) |

8.4.4. deque#

| Operation | Time |
|---|---|
| Append right / left | O(1) |
| Pop right / left | O(1) |
| Index access d[i] | O(n) |
| Insert / delete middle | O(n) |

  • Best for queues, BFS, sliding windows
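The sliding-window use case can be sketched in a few lines with maxlen (stream values are illustrative):

```python
from collections import deque

window = deque(maxlen=3)          # fixed-size window: old items drop out in O(1)
means = []
for x in [10, 20, 30, 40, 50]:
    window.append(x)              # appending past maxlen evicts the oldest item
    means.append(sum(window) / len(window))

assert list(window) == [30, 40, 50]
assert means == [10.0, 15.0, 20.0, 30.0, 40.0]
```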

8.4.5. namedtuple#

| Operation | Time |
|---|---|
| Field access p.x | O(1) |
| Index access | O(1) |
| Creation | O(n) |
| Mutation | ❌ (immutable) |

  • Same performance as tuple, just nicer syntax

8.4.6. SortedList#

| Operation | Time |
|---|---|
| Index access | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
| Membership | O(log n) |
| Bisect | O(log n) |

8.4.7. SortedDict#

| Operation | Time |
|---|---|
| Lookup | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
| Min / max key | O(1) |

8.4.8. SortedSet#

| Operation | Time |
|---|---|
| Add | O(log n) |
| Remove | O(log n) |
| Membership | O(log n) |
| Iterate (sorted) | O(n) |

8.5. Exercises#

8.5.1. list#

8.5.1.1. Key ideas#

  • Ordered, mutable, allows duplicates

  • Fast append, slow insert/delete in the middle

8.5.1.2. Exercise 1 - Indexing and slicing#

```python
data = [3, 5, 7, 11, 13, 17]
```

  1. Extract the last three elements using slicing

  2. Reverse the list using slicing (no loops)

  3. What happens if you try to access data[10]?

8.5.1.3. Exercise 2: Mutation#

```python
data = [10, 20, 30]
```

  1. Append 40

  2. Insert 15 at index 1

  3. Remove 20 using two different methods

  4. Observe the return values of append, insert, and remove

8.5.1.4. Exercise 3: Performance intuition#

  1. Time how long it takes to:

    • append() 100,000 items

    • insert(0, x) 100,000 items

  2. Explain the difference.

8.5.1.5. Use cases#

  • Storing raw observations

  • Feature vectors

  • Sequential data (time series before indexing)

  • Mini-batches during model training

8.5.2. tuple#

8.5.2.1. Key ideas#

  • Ordered, immutable

  • Hashable (if contents are hashable)

8.5.2.2. Exercise 1: Immutability#

```python
point = (3, 4)
```

  1. Try changing point[0]

  2. Create a new tuple with x doubled

8.5.2.3. Exercise 2: Tuple unpacking#

```python
record = ("Alice", 22, "Biology")
```

  1. Unpack into three variables

  2. Swap two variables without using a temp variable

8.5.2.4. Exercise 3: Tuples as keys#

```python
locations = {
    (40.7, -74.0): "New York",
    (34.0, -118.2): "Los Angeles"
}
```

  1. Look up a city

  2. Explain why lists can’t be used as keys

8.5.2.5. Use cases#

  • Fixed records (rows)

  • Coordinates (lat, long)

  • Dictionary keys for multi-dimensional indexing

  • Lightweight, memory-efficient records

8.5.3. set#

8.5.3.1. Key ideas#

  • Unordered, mutable

  • No duplicates

  • Fast membership tests

8.5.3.2. Exercise 1: Deduplication#

```python
ids = [101, 102, 101, 103, 102]
```

  1. Convert to a set

  2. Convert back to a list

  3. What information was lost?

8.5.3.3. Exercise 2: Set operations#

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
```

  1. Compute union, intersection, difference

  2. Use both operators and methods

8.5.3.4. Exercise 3: Membership timing#

  1. Time x in list vs x in set for large collections

  2. Explain why the set is faster

8.5.3.5. Use cases#

  • Unique identifiers

  • Feature vocabularies

  • Removing duplicates

  • Fast filtering and membership checks

8.5.5. dict#

8.5.5.1. Key ideas#

  • Key–value mapping

  • Insertion-ordered (Python ≥3.7)

  • Fast lookups

8.5.5.2. Exercise 1: Basic operations#

```python
counts = {}
```

  1. Count word frequencies from a list of strings

  2. Rewrite using dict.get

  3. Rewrite using collections.defaultdict

8.5.5.3. Exercise 2: Iteration#

```python
data = {"A": 10, "B": 20, "C": 30}
```

  1. Iterate over keys

  2. Iterate over values

  3. Iterate over key–value pairs

8.5.5.4. Exercise 3: Dictionary views#

  1. Store keys(), values(), items() in variables

  2. Modify the dictionary

  3. Observe how the views change

8.5.5.5. Use cases#

  • Feature mappings

  • Label encoding

  • Aggregations

  • Sparse representations

8.5.6. str#

8.5.6.1. Key ideas#

  • Immutable sequence of characters

  • Rich API for text processing

8.5.6.2. Exercise 1: Indexing and slicing#

```python
text = "Data Science"
```

  1. Extract "Science"

  2. Reverse the string

  3. Get every second character

8.5.6.3. Exercise 2: String methods#

  1. Convert to lowercase

  2. Replace spaces with _

  3. Split into words

8.5.6.4. Exercise 3: Immutability#

  1. Try modifying a character

  2. Create a modified copy instead

8.5.6.5. Use cases#

  • Text preprocessing

  • Tokenization

  • Feature extraction

  • Labels and metadata

8.5.7. deque#

8.5.7.1. Key ideas#

  • Double-ended queue

  • Fast appends and pops from both ends

8.5.7.2. Exercise 1: Basic operations#

```python
from collections import deque
dq = deque([1, 2, 3])
```

  1. Append to the right

  2. Append to the left

  3. Pop from both ends

8.5.7.3. Exercise 2: Sliding window#

  1. Use a deque of max length 5

  2. Simulate streaming data

  3. Observe how old values are discarded

8.5.7.4. Exercise 3: Compare with list#

  1. Time pop(0) on a list

  2. Time popleft() on a deque

8.5.7.5. Use cases#

  • Streaming data

  • Rolling averages

  • Buffers and queues

  • Time-windowed analytics

8.5.8. namedtuple#

8.5.8.1. Key ideas#

  • Tuple with named fields

  • Immutable

  • Lightweight alternative to classes

8.5.8.2. Exercise 1: Creation#

```python
from collections import namedtuple
Student = namedtuple("Student", ["name", "age", "major"])
```

  1. Create a student

  2. Access fields by name and index

8.5.8.3. Exercise 2: Immutability#

  1. Try modifying a field

  2. Use _replace to create a new version

8.5.8.4. Exercise 3: Comparison#

  1. Compare a namedtuple to:

    • tuple

    • dictionary

  2. Discuss readability vs flexibility

8.5.8.5. Use cases#

  • Structured records

  • Configuration objects

  • Model outputs

  • Lightweight row representations

8.5.9. SortedList#

8.5.9.1. Key ideas#

  • Maintains sorted order automatically

  • Fast lookup and slicing

8.5.9.2. Exercise 1: Automatic sorting#

```python
from sortedcontainers import SortedList
sl = SortedList()
```

  1. Add numbers in random order

  2. Observe ordering after each insert

8.5.9.3. Exercise 2: Indexing and slicing#

  1. Access smallest and largest values

  2. Slice the middle range

8.5.9.4. Exercise 3: Comparison#

  1. Compare with:

    • sorting a list repeatedly

    • heapq

  2. Discuss trade-offs

8.5.9.5. Use cases#

  • Online statistics

  • Maintaining quantiles

  • Ranked scores

  • Real-time leaderboards

8.5.10. SortedSet#

8.5.10.1. Key ideas#

  • Sorted + unique

  • Combines set semantics with order

8.5.10.2. Exercise 1: Deduplication + order#

```python
from sortedcontainers import SortedSet
ss = SortedSet([5, 1, 3, 3, 2])
```

  1. Observe ordering

  2. Try adding duplicates

8.5.10.3. Exercise 2: Set operations#

  1. Intersection and union with another SortedSet

  2. Compare with regular set

8.5.10.4. Data science use cases#

  • Unique sorted categories

  • Feature vocabularies with order

  • Ranked identifiers

8.5.11. SortedDict#

8.5.11.1. Key ideas#

  • Dictionary sorted by keys

  • Efficient range queries

8.5.11.2. Exercise 1: Ordered keys#

```python
from sortedcontainers import SortedDict
sd = SortedDict()
```

  1. Insert keys out of order

  2. Iterate over keys and values

8.5.11.3. Exercise 2: Range queries#

  1. Retrieve items with keys between two values

  2. Compare with normal dict

8.5.11.4. Use cases#

  • Time-indexed data

  • Ordered categorical mappings

  • Efficient range filtering

  • Event logs

8.5.12. Summary Exercise#

  1. Which structures preserve order? Which don’t?

  2. Which are mutable vs immutable?

  3. When would sortedcontainers outperform built-in types?

  4. Which data structures would you choose for:

    • Streaming data

    • Text analysis

    • Real-time rankings

    • Sparse features

8.6. Data Structure Choice#

8.6.1. Example 1#

You need to store a growing sequence of observations in arrival order

Choice: list

Why:

  • Preserves insertion order

  • Fast append(): O(1) amortized

  • Simple indexing and slicing

Why not others:

  • deque is unnecessary unless removing from the front

  • SortedList adds overhead if order of arrival matters

8.6.2. Example 2#

You need a fixed record representing a row of data (name, age, major)

Choice: namedtuple

Why:

  • Immutable → safer for records

  • Clear, readable field access (student.age)

  • More memory-efficient than a dictionary

Why not others:

  • dict is flexible but less structured

  • tuple lacks readability

8.6.3. Example 3#

You need fast membership tests for millions of IDs

Choice: set

Why:

  • Average-case O(1) membership tests

  • Automatically removes duplicates

Why not others:

  • list membership is O(n)

  • SortedSet adds overhead if order is irrelevant

8.6.4. Example 4#

You need to count frequencies of categories or words

Choice: dict (or defaultdict)

Why:

  • Fast updates and lookups → O(1) average

  • Natural key → count mapping

  • Scales well for sparse data

Why not others:

  • list would require searching

  • set cannot store counts
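The counting idioms referenced above can be sketched side by side (the word list is illustrative):

```python
from collections import Counter, defaultdict

words = ["cat", "dog", "cat", "bird", "cat"]

# Plain dict with get(): O(1) average per update.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# defaultdict(int) supplies the missing-key default automatically.
dcounts = defaultdict(int)
for w in words:
    dcounts[w] += 1

assert counts == {"cat": 3, "dog": 1, "bird": 1}
assert dict(dcounts) == counts
assert Counter(words) == counts    # Counter is the idiomatic one-liner
```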

8.6.5. Example 5#

You need to remove duplicates but also keep values sorted

Choice: SortedSet

Why:

  • Enforces uniqueness like a set

  • Maintains sorted order automatically

  • Efficient iteration in order

Why not others:

  • set is unordered

  • list requires manual sorting and duplicate removal

8.6.6. Example 6#

You need a rolling window over streaming data

Choice: deque

Why:

  • Fast insertion/removal at both ends → O(1)

  • Supports fixed-size windows

  • Designed for queue-like behavior

Why not others:

  • list.pop(0) is O(n)

  • SortedList is unnecessary if only arrival order matters

8.6.7. Example 7#

You need to compute rolling medians in real time

Choice: SortedList

Why:

  • Maintains sorted order on insertion → O(log n)

  • Median retrieval is O(1)

  • Ideal for online statistics

Why not others:

  • Sorting a list repeatedly is expensive

  • heapq is more complex for median logic
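The rolling-median logic can be sketched with the standard-library bisect module; with SortedList the insert and remove steps would be O(log n) rather than the O(n) shifts of insort on a plain list, but the algorithm is the same. The function names below are illustrative.

```python
from bisect import bisect_left, insort

def median(a):
    # Median of an already-sorted list.
    n = len(a)
    return a[n // 2] if n % 2 else (a[n // 2 - 1] + a[n // 2]) / 2

def rolling_medians(stream, k):
    """Median of every full window of size k over the stream."""
    window = sorted(stream[:k])
    out = [median(window)]
    for i in range(k, len(stream)):
        window.pop(bisect_left(window, stream[i - k]))  # remove outgoing value
        insort(window, stream[i])                       # insert incoming value
        out.append(median(window))
    return out

assert rolling_medians([5, 2, 8, 3, 9, 1], 3) == [5, 3, 8, 3]
```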

8.6.8. Example 8#

You need to store time-indexed data and query ranges

Choice: SortedDict

Why:

  • Keys always sorted

  • Efficient range queries

  • Natural fit for timestamps

Why not others:

  • Regular dict requires manual sorting

  • list loses key–value semantics
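The range-query idea can be sketched with the standard library: keep the keys sorted and bisect for the slice. SortedDict maintains the sorted-keys invariant automatically and exposes irange() for exactly this query; the event data below is illustrative.

```python
from bisect import bisect_left, bisect_right

events = {3: "boot", 7: "login", 12: "query", 20: "logout"}  # timestamp -> event
keys = sorted(events)             # SortedDict keeps keys in this order for us

def range_query(lo, hi):
    # All events with lo <= timestamp <= hi, found via two binary searches.
    i, j = bisect_left(keys, lo), bisect_right(keys, hi)
    return {k: events[k] for k in keys[i:j]}

assert range_query(5, 15) == {7: "login", 12: "query"}
```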

8.6.9. Example 9#

You need to store coordinates or multi-dimensional keys

Choice: tuple (as dictionary keys)

Why:

  • Immutable and hashable

  • Lightweight

  • Safe as dictionary keys

Why not others:

  • list is mutable and unhashable

  • a nested dict adds unnecessary indirection

8.6.10. Example 10#

You need to preprocess and clean text data

Choice: str (with list/dict support)

Why:

  • Rich built-in API

  • Immutable → safe transformations

  • Works well with pipelines

Why not others:

  • list of characters is harder to manage

  • Mutability adds bugs without benefit

8.7. Summary#

8.7.1. Cheat Sheet#

Choose the simplest structure that:

  • supports the required operations efficiently

  • matches the data’s mutability and ordering needs

  • minimizes unnecessary overhead

| Need | Best Choice |
|---|---|
| Fast append | list |
| Fast front removal | deque |
| Unique items | set |
| Key → value | dict |
| Fixed record | namedtuple |
| Sorted data | SortedList |
| Sorted + unique | SortedSet |
| Sorted keys | SortedDict |