9. Plotting with ggplot - the plotnine package#

ggplot is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

To use ggplot in Python you first have to install the plotnine package.

9.1. Installing plotnine#

! python -m pip install plotnine

9.2. Importing plotnine#

import plotnine

9.3. Exercise 1#

Install and import plotnine.

ggplot works very well with pandas data frames. The column headers can be used to clearly specify details of the plot.

import pandas as pd
import numpy as np
X = np.arange(0,10,0.01)
Y = np.random.normal(X,0.5)
df = pd.DataFrame({"X" : X,"Y" : Y})
df
        X          Y
0    0.00   0.554466
1    0.01  -0.092952
2    0.02   0.470739
3    0.03   0.209256
4    0.04  -0.359167
..    ...        ...
995  9.95  10.277712
996  9.96   9.783064
997  9.97  10.037314
998  9.98  10.042017
999  9.99  10.077377

1000 rows × 2 columns
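
The values of Y above change every time the cell is run because they are random. If you want the same data (and therefore the same plots) on every run, one option is to use a seeded NumPy generator. This is a sketch only; the seed value 42 is an arbitrary choice and is not part of the original example.

```python
import numpy as np
import pandas as pd

# A seeded generator gives the same "random" numbers on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary seed

X = np.arange(0, 10, 0.01)
Y = rng.normal(X, 0.5)  # noise with standard deviation 0.5 around the line Y = X
df = pd.DataFrame({"X": X, "Y": Y})

print(df.describe())
```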

9.4. Example 1 - an empty plot#

9.4.1. import the things we need#

from plotnine import ggplot, geom_point, aes, stat_smooth, facet_wrap

9.4.2. create a plot#

p = ggplot(df,aes(x="X",y="Y"))
#### have a look 
print(p)
_images/ggplot-plotnine_16_0.png

All this has done is create a plot from some data and specify an aesthetic using aes. In this case, the aesthetic associates the x axis with the X data and the y axis with the Y data. The plot exists and can be displayed, although it is still empty because no layers have been added. Notice that the plot is assigned to a variable. This is useful because the plot can then be modified through that variable (many plotting facilities do not support this), which allows you to program with plots.

Notice also that to display the plot you can use the print function.

Features are added to the plot using layers. A plot can have many layers.

9.5. Example 2 - add some points#

p1 = p + geom_point()
print(p1)
_images/ggplot-plotnine_19_0.png

Notice how a new plot p1 was created from the original plot p using +.

9.6. Example 3 - add some lines#

from plotnine import geom_line
p2 = p + geom_line()
print(p2)
_images/ggplot-plotnine_23_0.png

9.7. Exercise 2#

Create a plot with both points and lines.

p3 = p + geom_line() + geom_point()
print(p3)
_images/ggplot-plotnine_25_0.png

9.8. Customising appearance#

Each layer can be customised in a way appropriate for that layer.

9.8.1. Example 4 - Customise the lines#

p2 = p + geom_line(color="red")
print(p2)
_images/ggplot-plotnine_29_0.png

9.8.2. Exercise 3#

Experiment with the following code to change the appearance of the plot. The following is a good source of information.

# p2 = p + geom_line(color="red",size=2,alpha=0.3,linetype=3)
p2 = p + geom_line(color="red",size=2,alpha=0.3,linetype="solid")
print(p2)
_images/ggplot-plotnine_31_0.png

9.9. Themes#

The “style” of a plot can be modified and controlled through the use of themes.

9.9.1. Example 5 - A black and white theme.#

from plotnine.themes import theme_bw
p2 = p2 + theme_bw()
print(p2)
_images/ggplot-plotnine_35_0.png

There are many themes to choose from. You can find out about them here.

9.9.2. Exercise 4#

Experiment with some of the themes documented on the plotnine website.

9.10. Labels#

It is easy to add labels to a plot.

9.10.1. Example 6 - Adding Labels#

from plotnine.labels import labs
p2 = p2 + labs(
    title = "Main Title",
    caption = "A caption",
    x = "x-axis",
    y = "y-axis"
  )
print(p2)

In the plotnine version used here (0.9.0), labs does not accept subtitle or tag, so they are omitted above. Passing subtitle = "a subtitle" and tag = "A tag" as well raises the following error:

PlotnineError: "Cannot deal with these labels: {'subtitle', 'tag'}"

9.11. More aesthetics#

It is possible to add aesthetics to a layer. You can think of an aesthetic as a mapping between the data and features of the layer. For example, you can make the colour change with the data.

9.11.1. Example 7 - Colour as a function of value#

p3 = p + geom_point(aes(color="Y"))
print(p3)
_images/ggplot-plotnine_45_0.png

9.11.2. Example 8 - Transparency as a function of value#

p3 = p + geom_point(aes(alpha="Y"))
print(p3)
_images/ggplot-plotnine_47_0.png

9.11.3. Example 9 - Smoothing the data#

from plotnine import geom_smooth
p3 = p + geom_point(aes(alpha="Y")) + geom_smooth()
print(p3)
_images/ggplot-plotnine_49_0.png

9.12. Plot types#

There are lots of different plot types. A good way to find out about them is to look through some of the online galleries. A good starting point, with a wide range of examples including code and data, is the R graph library. I have picked a few and added them as examples.

#### data
xa = np.random.normal(np.ones(20000)*10,1.2)
xb = np.random.normal(np.ones(20000)*14.5,1.2)
xc = np.random.normal(np.ones(20000)*9.5,1.2)
X = np.hstack([xa,xb,xc])
ya = np.random.normal(np.ones(20000)*10,1.2)
yb = np.random.normal(np.ones(20000)*14.5,1.2)
yc = np.random.normal(np.ones(20000)*15.5,1.2)
Y = np.hstack([ya,yb,yc])

df = pd.DataFrame({"X" : X,"Y" : Y})

df
               X          Y
0      11.249070  10.135695
1       9.667877   8.102466
2      11.291468   9.986724
3       9.363721  10.621342
4       9.639383  10.266212
...          ...        ...
59995   9.102634  16.842442
59996   9.555181  13.630412
59997   8.158613  14.854846
59998   7.458265  14.704807
59999  10.410255  15.411592

60000 rows × 2 columns

9.12.1. Example 10 - Simple contour plot#

from plotnine import geom_density_2d
p = ggplot(df,aes(x="X",y="Y"))
p1 = p + geom_density_2d()
print(p1)
_images/ggplot-plotnine_55_0.png

from plotnine import stat_density_2d
#p2 <- p + stat_density_2d(aes(fill='..level..'), geom='polygon')
p2 = p + stat_density_2d(aes(fill='..level..'),geom='polygon')
print(p2)
_images/ggplot-plotnine_56_0.png

from plotnine import scale_x_continuous,scale_y_continuous
from plotnine.themes import theme
# draw the estimated density as a raster image instead of contour lines
p4 = p + stat_density_2d(aes(fill='..density..'), geom='raster', contour=False)
# remove the padding around the data on both axes
p4 = p4 + scale_x_continuous(expand = (0, 0))
p4 = p4 + scale_y_continuous(expand = (0, 0))
# hide the legend
p4 = p4 + theme(legend_position='none')
print(p4)
_images/ggplot-plotnine_57_0.png
