9. Lecture 4b - Packaging#
9.1. Background#
9.1.1. What#
A software package is a collection of software-related components, such as source code, data, and documentation, stored on a file system. The collection is organised in a specific way, such that the type of each component, and how the components relate to each other, can be understood.
A package also has metadata which describes the package in terms of its functionality and curation. This data is often human readable, but is usually also designed to be processed by other software. This enables tools to be designed and used for managing common and often complicated workflows.
When a packaging structure and its metadata are combined with an associated tool set the result is usually referred to as a software packaging system.
9.1.2. Why#
Grouping logically connected components in a manner that reflects how they are related to each other facilitates their distribution, installation, re-use and maintenance. It also facilitates the use of standardised tools to manage a wide variety of (often complicated) workflows, e.g.
re-running software with multiple dependencies across a range of heterogeneous environments
re-using software components in other software
running tests
reproducing studies
debugging
9.1.3. How#
Software packages are collections of files. As such, they can be created and managed using standard software (terminals, shell scripts, editors, file managers etc). However, popular package formats are often integrated as extensions into tools such as version management systems and integrated development environments (e.g. RStudio, VS Code etc).
9.1.4. When#
Best practice suggests that some form of packaging should be adopted at the outset of any software related project when
there are multiple users / stakeholders
version control is required
unit tests are used
software is to be distributed
software is to be re-used
9.2. The Basics - Python#
The following diagram shows the organisation of the files in a small example python package.

As an exercise, we are going to create and use this package from scratch. The function choH is defined in a notebook which also shows how it is used. The notebook is provided in the Appendix under the heading Appendix 1 - choH (python version)
9.2.1. Source code#
The methods that the package provides need to be available in files (not a notebook), so the first task is to create the choH.py file with the choH function in it. Note that the code usually imports other modules that it is dependent on.
9.2.1.1. Exercise#
Create a Python script with the choH function in it.
9.2.2. Modules#
A Python package has one or more modules. Each module is associated with a directory, the name of which defines the name of the module. Primarily, modules contain code, but they can also contain data, documentation, and other modules (sub-modules).
9.2.2.1. __init__.py#
Each module should have an __init__.py file, even if it is empty (many Python systems rely on the presence of this file to determine that the files in the associated directory constitute a module). The __init__.py file can contain lots of information, including (but not limited to) documentation, references, module level global names, and so on. Importantly, it is used to load the Python files in this (and/or other) modules. The mechanism is the same as is used for importing modules into a Python script.
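The role of __init__.py can be sketched with a throwaway package built in a temporary directory. All the names used here (demo_pkg, core.py, double) are hypothetical and are not part of the choH package; the sketch only assumes a standard Python 3 installation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names here are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()

# The source file that holds the function.
(pkg / "core.py").write_text("def double(x):\n    return 2 * x\n")

# __init__.py runs when the package is imported. The relative import
# makes the function available directly at package level.
(pkg / "__init__.py").write_text("from .core import double\n")

# Make the parent directory importable, then import the package by name.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

result = demo_pkg.double(21)
print(result)  # 42
```

Without the import line in __init__.py the function would only be reachable under its full path, demo_pkg.core.double, and only after demo_pkg.core had itself been imported.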
9.2.2.2. Exercise#
9.2.3. Project file#
There are a number of ways of specifying the package metadata for a python package. One such way is via a pyproject.toml file. See here for more information on toml files and here for domain specific information about the pyproject.toml file.
Here is a basic template for the pyproject.toml file.
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "<project-name>"
version = "<version>"
maintainers = [
{ name="Daniel Grose", email="dan.grose@lancaster.ac.uk" },
]
authors = [
{ name="Daniel Grose", email="dan.grose@lancaster.ac.uk"},
]
description = "<description>"
readme = "README.md"
requires-python = "<version>"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
"Operating System :: OS Independent",
]
dependencies = [
"<dependency-1>",
"<dependency-2>"
]
[project.urls]
homepage = "<website URL>"
There are lots of good tools to help you generate and maintain a pyproject.toml file. Details for some of these can be found here.
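Because pyproject.toml is plain TOML, the metadata can also be read by your own code. The following is a minimal sketch, assuming Python 3.11 or later (where tomllib is in the standard library) or an installed tomli backport. The package name, version and dependency are placeholder values chosen for illustration, not the real choH metadata.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the third-party backport is installed

# Hypothetical values filled into the template, for illustration only.
PYPROJECT = """
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "example-package"
version = "0.0.1"
requires-python = ">=3.8"
dependencies = [
    "numpy",
]
"""

# Parse the TOML text into nested dictionaries and lists.
config = tomllib.loads(PYPROJECT)
project = config["project"]

print(project["name"], project["version"])
print(project["dependencies"])
print(config["build-system"]["build-backend"])
```

Each table in the file, such as [project], becomes a dictionary, which is what allows build and installation tools to process the metadata automatically.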
9.2.3.1. Exercise#
Create a pyproject.toml file for the choH package
9.2.4. Installation#
There are multiple tools to help you install and use Python packages, for example pip and conda.
9.2.4.1. Install a local package using pip#
Assume that the file structure shown in the figure above is located in path_to_package, then this will install the package
python -m pip install <path_to_package>
9.2.4.2. Exercise#
Determine the path_to_package for your choH package and install it using pip.
9.2.5. Install a package from github#
pip has the facility to install from a wide range of different sources. One such source is github. Imagine you have set up the following repository using your choH package.

The package can be installed from this github repository using
python -m pip install 'git+https://github.com/grosed/choH'
9.2.6. Removing a package using pip#
Once a package has been installed it can be removed using pip
python -m pip uninstall <package-name>
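To check that installing or uninstalling had the intended effect, you can ask Python whether a package can currently be imported. This is a minimal sketch using only the standard library; the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(name):
    """Return True if a top-level module or package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# json is part of the standard library, so it should always be found.
print(is_importable("json"))

# The result for choH depends on whether you have installed the package.
print(is_importable("choH"))
```

The check looks the package up without actually importing it, so it is safe to run before and after the pip commands above.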
9.2.7. Using the package#
Once the package has been installed with pip it can be used in a python script by importing it.
import choH
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(0)
X = [float(x) for x in list(numpy.random.normal(0,1,1000)) + list(numpy.random.normal(0.3,1,1000))]
plt.plot(X)
plt.plot(choH.choH(X))
plt.show()
9.2.7.1. Exercise#
Try running the above script on your own system
Why does the above example use choH.choH?
9.2.7.2. Exercise#
Add a README.md file to your choH repository
Can you add the above example to your repository?
9.3. The Basics - R#
The following diagram shows the organisation of the files in a small example R package.

As an exercise, we are going to create and use this package from scratch. The function choH is defined in a notebook which also shows how it is used. The notebook is provided in the Appendix under the heading Appendix 2 - choH (R version)
9.3.1. Source code#
The methods that the package provides need to be available in files (not a notebook), so the first task is to create the choH.R file with the choH function in it. Note that the code does not use the library function to load any of its dependencies.
9.3.1.1. Exercise#
Create an R script with the choH function in it.
9.3.2. Exercise#
The source code for the package resides in a directory named R. Create this directory and add your choH.R file to it.
9.3.3. NAMESPACE#
The NAMESPACE file imports any dependencies that your R code requires and exports the R functions you want to expose from your package. Here is a template for a basic NAMESPACE file.
import(<package-name>)
importFrom(<package-name>, <method-name>)
importFrom(<package-name>, <data-name>)
export(<method-name>)
export(<data-name>)
The file can import whole libraries, individual methods from other libraries, data, and even documentation from other libraries. There can be multiple import directives.
The file can export methods and data. There can be multiple export directives.
9.3.3.1. Exercise#
Create a NAMESPACE file for your choH package.
9.3.4. DESCRIPTION#
The DESCRIPTION file describes the package. This description includes details about the authors, the purpose of the package, and its dependencies. It is human readable but can be processed by various tools available in most common R environments.
Here is a template for the DESCRIPTION file.
Package: <package-name>
Type: Package
Title: <top-level-description-of-package>
Version: <version-number>
Date: <yyyy-mm-dd>
Authors@R: c(person("Daniel","Grose",email="dan.grose@lancaster.ac.uk",role=c("aut","cre")))
Description: <more-detailed-description>
License: GPL
Imports: <imported-packages>
LinkingTo:
Depends: R (>= 3.5.0)
NeedsCompilation: no
Suggests:
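To illustrate that this format is easy for software to process, here is a minimal sketch of a reader for "Field: value" files, written in Python to match the earlier examples. It assumes the simple layout used in the template, where an indented line continues the previous field. The version number and description text are hypothetical, and in practice you would use the tools that come with R rather than a hand-written parser.

```python
def parse_description(text):
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and key is not None:
            # Continuation line: append to the field that is currently open.
            fields[key] += " " + line.strip()
        else:
            # Split at the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


# Hypothetical field values, for illustration only.
DESCRIPTION = """\
Package: choH
Type: Package
Version: 0.1.0
Description: A longer description that
    continues on a second line.
Imports: purrr
Depends: R (>= 3.5.0)
Suggests:
"""

fields = parse_description(DESCRIPTION)
print(fields["Package"], fields["Version"])
print(fields["Description"])
print(fields["Depends"])
```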
9.3.4.1. Exercise#
Create a DESCRIPTION file for your choH package.
9.3.5. Installing#
There are multiple tools to help you install and use R packages. The most fundamental tool is R itself.
9.3.5.1. Install a local package using R#
There are multiple ways of using R to install a package from a wide variety of sources. Some of these (but not all) automate the installation of dependencies. A versatile tool for installing a package and all of its dependencies, from local file systems and remote systems, such as github, is remotes.
If you do not already have the remotes package installed, you can install it from within an R session using
install.packages( "remotes" )
9.3.5.2. Local installation#
Assume that the file structure shown in the figure above is located in path_to_package, then this will install the package using the remotes package
library(remotes)
remotes::install_local( <path_to_package> )
9.3.5.3. Exercise#
Determine the path_to_package for your choH package and install it using R.
9.3.5.4. Install a package from github using R#
There are various libraries available for extending base R functionality so that you can install libraries directly from various sources, including github.
Imagine you have set up the following repository using your choH package.

This can be installed using the remotes package as follows
library(remotes)
remotes::install_github( "grosed/choH" )
9.3.5.5. Exercise#
Check that remotes is installed in your current R environment. If not, install it.
Install a version (yours or someone else's) from github using remotes.
9.3.6. Using the package#
Once the library is installed it can be imported and used in other R scripts using the library function. All of the methods you exported using the NAMESPACE file will become available for use.
library(choH)
library(purrr)
X <- c(rnorm(1000,0,1),rnorm(1000,0.3,1))
X %>% choH %>% as.numeric %>% plot
9.3.6.1. Exercise#
Install your choH package from github and test the above example.
The choH package imports purrr. Why do you think it is necessary to import it in the local R script?