2. Marshalling data - Rcpp and the STL

2.1. Toton Marshalling Yard - Nottinghamshire, c1951.

2.2. Why C++?

C++ is a big and complicated language, and there are many reasons not to use it unless you really have to. On the other hand, there are some good reasons for using it.

2.2.1. Exercise

Can you think of any?

2.3. Marshalling

“In computer science, marshalling is the process of transforming the memory representation of an object into a data format suitable for storage or transmission, especially between different software components. It is typically used when data must be moved between different parts of a computer program or from one program to another.

Marshalling simplifies complex communications, because it allows using composite objects instead of being restricted to primitive objects.”

Derived from the Wikipedia entry for [Marshalling (computer science)](https://en.wikipedia.org/wiki/Marshalling_(computer_science)).
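To make the definition concrete, here is a minimal sketch of marshalling a small composite object into a flat byte buffer and back again. The `Reading` struct and the `marshal` / `unmarshal` function names are hypothetical and exist only for this illustration. The sketch also assumes both ends use the same byte order and type sizes, which a real transmission format would not take for granted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical composite object, used only for illustration.
struct Reading
{
    std::int32_t id;
    double value;
};

// Marshal: copy each field into a flat byte buffer (host byte order assumed).
std::vector<unsigned char> marshal(const Reading& r)
{
    std::vector<unsigned char> buffer(sizeof(r.id) + sizeof(r.value));
    std::memcpy(buffer.data(), &r.id, sizeof(r.id));
    std::memcpy(buffer.data() + sizeof(r.id), &r.value, sizeof(r.value));
    return buffer;
}

// Unmarshal: rebuild the object from the byte buffer.
Reading unmarshal(const std::vector<unsigned char>& buffer)
{
    Reading r{};
    std::memcpy(&r.id, buffer.data(), sizeof(r.id));
    std::memcpy(&r.value, buffer.data() + sizeof(r.id), sizeof(r.value));
    return r;
}

// Usage: a round trip should give back the original field values.
void demo_round_trip()
{
    Reading in{42, 3.14};
    Reading out = unmarshal(marshal(in));
    assert(out.id == 42);
    assert(out.value == 3.14);
}
```

Copying field by field, rather than copying the whole struct, keeps any padding bytes out of the buffer.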

2.3.1. Marshalling atomic types with Rcpp

Different computer systems and languages represent data structures in different ways. For example, in R a list can have a name for each entry. In Python this would be achieved by using a dictionary (in R, an environment can also be used as a dictionary in the same way as in Python, but only with character-valued keys).
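For comparison, the closest STL counterpart to a named collection is probably `std::map` with string keys. The following is a plain C++ sketch, not Rcpp code; the function names `make_named_values` and `lookup` and the keys used are made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Build a small collection whose entries are addressed by name.
std::map<std::string, double> make_named_values()
{
    std::map<std::string, double> values;
    values["alpha"] = 1.5;
    values["beta"] = 2.5;
    return values;
}

// Look an entry up by name, returning a fallback when the name is absent.
double lookup(const std::map<std::string, double>& values,
              const std::string& key,
              double fallback)
{
    auto it = values.find(key);
    return it == values.end() ? fallback : it->second;
}

// Usage: present names return their value, absent names return the fallback.
void demo_lookup()
{
    std::map<std::string, double> values = make_named_values();
    assert(lookup(values, "alpha", 0.0) == 1.5);
    assert(lookup(values, "gamma", -1.0) == -1.0);
}
```

Unlike an R list, every value in this map has the same type (`double`).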

Between R and C++ the basic types are quite similar. However, a number of automatic conversion rules apply when values cross the boundary. The following example highlights this.

library(Rcpp)
marshalling_code <- '
#include "Rcpp.h"
using namespace Rcpp;

#include <string>
#include <iostream>
#include <complex>


double marshall_double(const double& X)
{
    Rcout << X << std::endl;
    double Y {3.14};
    return Y;
}

int marshall_integer(const int& X)
{
    Rcout << X << std::endl;
    int Y {3};
    return Y;
}

bool marshall_logical(const bool& X)
{
    Rcout << X << std::endl;
    bool Y = false;
    return Y;
}

char marshall_character(const char& X)
{
    Rcout << X << std::endl;
    char Y = \'A\';
    return Y;
}

std::complex<double> marshall_complex(const std::complex<double>& X)
{
    Rcout << X <<  std::endl;
    std::complex<double> Y {3.14,7.2};
    return Y;
}

RCPP_MODULE(marshalling) 
{
function("rcpp_marshall_double", &marshall_double);
function("rcpp_marshall_integer", &marshall_integer);
function("rcpp_marshall_logical", &marshall_logical);
function("rcpp_marshall_character", &marshall_character);
function("rcpp_marshall_complex", &marshall_complex);
}
'
sourceCpp(code = marshalling_code)

2.3.1.1. Marshall double from R to C++

X <- 3.14
Y <- rcpp_marshall_double(X)
Y <- rcpp_marshall_integer(X)
Y <- rcpp_marshall_logical(X)
# Y <- rcpp_marshall_character(X) - fails
Y <- rcpp_marshall_complex(X)
3.14
3
1
(3.14,0)
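The output above (3.14 becoming 3, 1 and (3.14,0)) matches what explicit conversions do in plain C++. The sketch below reproduces those three results without Rcpp. It illustrates the observed behaviour and is not a description of how Rcpp performs the coercion internally; the function names are hypothetical.

```cpp
#include <cassert>
#include <complex>

// double -> int: the fractional part is truncated toward zero.
int to_integer(double x)
{
    return static_cast<int>(x);
}

// double -> bool: any non-zero value counts as true.
bool to_logical(double x)
{
    return x != 0.0;
}

// double -> complex: the value becomes the real part, the imaginary part is zero.
std::complex<double> to_complex(double x)
{
    return std::complex<double>(x, 0.0);
}

// Usage: the same values as in the R session above.
void demo_conversions()
{
    assert(to_integer(3.14) == 3);
    assert(to_logical(3.14) == true);
    assert(to_complex(3.14) == std::complex<double>(3.14, 0.0));
}
```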

2.3.1.2. Marshall integer from R to C++

X <- 7L
Y <- rcpp_marshall_double(X)
Y <- rcpp_marshall_integer(X)
Y <- rcpp_marshall_logical(X)
# Y <- rcpp_marshall_character(X) - fails
Y <- rcpp_marshall_complex(X)
7
7
1
(7,0)

2.3.1.3. Marshall logical from R to C++

X <- TRUE
Y <- rcpp_marshall_double(X)
Y <- rcpp_marshall_integer(X)
Y <- rcpp_marshall_logical(X)
# Y <- rcpp_marshall_character(X) - fails
Y <- rcpp_marshall_complex(X)
1
1
1
(1,0)

2.3.1.4. Marshall complex from R to C++

X <- 1.3+2.1i
Y <- rcpp_marshall_double(X)
Y <- rcpp_marshall_integer(X)
Y <- rcpp_marshall_logical(X)
# Y <- rcpp_marshall_character(X) - fails
Y <- rcpp_marshall_complex(X)
Warning message in rcpp_marshall_double(X):
“imaginary parts discarded in coercion”
1.3
Warning message in rcpp_marshall_integer(X):
“imaginary parts discarded in coercion”
1
1
(1.3,2.1)

2.3.1.5. Marshall character from R to C++

X <- 'a'
# Y <- rcpp_marshall_double(X) - fails
# Y <- rcpp_marshall_integer(X) - fails
# Y <- rcpp_marshall_logical(X) - fails
Y <- rcpp_marshall_character(X)
# Y <- rcpp_marshall_complex(X) - fails

X <- "123"
# Y <- rcpp_marshall_double(X) - fails
# Y <- rcpp_marshall_integer(X) - fails
# Y <- rcpp_marshall_logical(X) - fails
Y <- rcpp_marshall_character(X)
# Y <- rcpp_marshall_complex(X) - fails
a
1
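The two printed values (a and 1) suggest that when a character string is passed to a `char` parameter, only its first character arrives. The plain C++ sketch below shows the equivalent operation on a `std::string`. The function name `first_character` is hypothetical, and returning the null character for an empty string is an assumption of this sketch, not documented Rcpp behaviour.

```cpp
#include <cassert>
#include <string>

// Keep only the first character of a string.
// Assumption of this sketch: an empty string maps to the null character.
char first_character(const std::string& s)
{
    return s.empty() ? '\0' : s[0];
}

// Usage: the same inputs as in the R session above.
void demo_first_character()
{
    assert(first_character("a") == 'a');
    assert(first_character("123") == '1');
}
```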

2.3.1.6. Marshall from C++ to R

X <- 3.14 
Y <- rcpp_marshall_double(X)
cat(typeof(Y),'\n',sep="")
X <- 3L
Y <- rcpp_marshall_integer(X)
cat(typeof(Y),'\n',sep="")
X <- FALSE 
Y <- rcpp_marshall_logical(X)
cat(typeof(Y),'\n',sep="")
X <- 'a'
Y <- rcpp_marshall_character(X)
cat(typeof(Y),'\n',sep="")
X <- 1.0 + 2.0i
Y <- rcpp_marshall_complex(X)
cat(typeof(Y),'\n',sep="")
3.14
double
3
integer
0
logical
a
character
(1,2)
complex

2.3.2. Marshalling and the STL

One of the outstanding successes of C++ is the availability (through C++ templates) of “type variables”. Using type variables well typically requires a lot of C++ experience. However, a direct consequence of C++ templates is the Standard Template Library (STL).

The STL provides a rich library of data structures and generic algorithms to operate on these data structures. It provides a compelling reason for using C++ over (most) other programming languages that are amenable to compilation, particularly for scientific computing. Because of the value of the STL, Rcpp would be quite limited if it did not provide a means of “marshalling” R data structures to their STL counterparts, and vice versa. Consequently, a lot of hard work has been done in Rcpp on your behalf. Most of this you never directly see, and that’s the good bit about it!
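As a small illustration of a “type variable” combined with a generic STL algorithm, the sketch below defines one function template that sums the elements of any STL container whose elements can be added. The name `total` is hypothetical; `std::accumulate` is the standard algorithm from `<numeric>`.

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Container is a type variable: the same code is compiled for
// std::vector<int>, std::list<double>, and so on.
template <typename Container>
typename Container::value_type total(const Container& xs)
{
    using T = typename Container::value_type;
    // Start from a zero-initialised T so the sum has the element type.
    return std::accumulate(xs.begin(), xs.end(), T{});
}

// Usage: one template, two different containers and element types.
void demo_total()
{
    std::vector<int> ints {1, 2, 3, 4};
    std::list<double> doubles {0.5, 1.5, 2.0};
    assert(total(ints) == 10);
    assert(total(doubles) == 4.0);
}
```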

However, marshalling data structures between different languages and systems always has some limitations.

Here is an example of how Rcpp quite seamlessly marshalls vectors between R and C++.

library(Rcpp)
marshalling_code <- '
#include "Rcpp.h"
using namespace Rcpp;

#include<vector> 

std::vector<double> marshall_vectors(const std::vector<double> X)
{
    auto n = X.size();
    std::vector<double> Y(n);
    for(std::size_t i = 0; i < n; i++)  // unsigned index matches the type of X.size()
    {
       Y[i] = 3.14*X[i];
    }
    return Y;
}


RCPP_MODULE(marshalling) 
{
function("rcpp_marshall_vectors", &marshall_vectors);
}
'

sourceCpp(code = marshalling_code)

X <- c(1L,2L,3L,4L,5L)
Y <- rcpp_marshall_vectors(X)
Y
[1]  3.14  6.28  9.42 12.56 15.70

Note that the automatic conversion rules are still at play: X is an integer vector in R, but it arrives in C++ as a std::vector<double>. Also, the type stored by the STL container (vector in this case) can be any of the atomic types.
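The same two steps, converting integers to doubles and then scaling each element, can be written in plain C++ with the STL alone. This is a sketch for comparison with the Rcpp example, using the hypothetical function names `to_doubles` and `scale`; it does not show what Rcpp does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Convert a vector of ints to a vector of doubles, element by element.
std::vector<double> to_doubles(const std::vector<int>& xs)
{
    return std::vector<double>(xs.begin(), xs.end());
}

// Multiply every element by a constant factor using a generic algorithm.
std::vector<double> scale(const std::vector<double>& xs, double factor)
{
    std::vector<double> ys(xs.size());
    std::transform(xs.begin(), xs.end(), ys.begin(),
                   [factor](double x) { return factor * x; });
    return ys;
}

// Usage: the same input as the R session above.
void demo_scale()
{
    std::vector<int> x {1, 2, 3, 4, 5};
    std::vector<double> y = scale(to_doubles(x), 3.14);
    assert(y.size() == 5);
    assert(y[0] == 3.14);
    assert(y[1] == 3.14 * 2.0);
}
```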

2.3.2.1. Exercise

Check this.

2.3.2.2. Marshalling lists

Marshalling lists between R and C++ is a bit more complicated. The main reason for this is that R lists are heterogeneous (each entry can have a different type), whereas STL lists are homogeneous (every entry has the same type). This mismatch is compounded by the fact that lists are recursive data structures in both R and C++.

library(Rcpp)
marshalling_code <- '
#include "Rcpp.h"
using namespace Rcpp;

#include <list>
#include <vector>

std::list<double> marshall_lists(const std::list<std::vector<double> > X)
{
    
    std::list<double> Y;
    for(const auto& x : X)
    {
       Y.push_back(3.14*x[0]);
    }
    return Y;
}


RCPP_MODULE(marshalling) 
{
function("rcpp_marshall_lists", &marshall_lists);
}
'

sourceCpp(code = marshalling_code)

X <- list()
X[[1]] <- 1.2
X[[2]] <- 2.2
X[[3]] <- 3.2
Y <- rcpp_marshall_lists(X)
print(Y)

X[[1]] <- c(1,2,3,4)
Y <- rcpp_marshall_lists(X)
print(Y)
[1]  3.768  6.908 10.048
[1]  3.140  6.908 10.048
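The second call gives 3.140 for the first entry because that entry is now a vector of four numbers and the C++ function reads only its first element. The plain C++ sketch below mirrors that logic on a `std::list<std::vector<double>>`. The function name `first_elements` is hypothetical, and skipping empty vectors is a choice made in this sketch (the Rcpp example above reads `x[0]` without checking).

```cpp
#include <cassert>
#include <list>
#include <vector>

// Collect the first element of every vector in a homogeneous list.
std::list<double> first_elements(const std::list<std::vector<double>>& xs)
{
    std::list<double> firsts;
    for (const auto& x : xs)
    {
        if (!x.empty())  // skip empty entries rather than read out of bounds
        {
            firsts.push_back(x.front());
        }
    }
    return firsts;
}

// Usage: a four-element vector followed by two one-element vectors.
void demo_first_elements()
{
    std::list<std::vector<double>> xs {{1.0, 2.0, 3.0, 4.0}, {2.2}, {3.2}};
    std::list<double> firsts = first_elements(xs);
    assert(firsts.size() == 3);
    assert(firsts.front() == 1.0);
    assert(firsts.back() == 3.2);
}
```

Every entry here has the same type, `std::vector<double>`, so scalars and longer vectors can sit side by side only because a scalar is treated as a vector of length one.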

2.4. Workshop activity

Experiment by writing some simple test code to see if it is possible to marshall the following data structures between R and C++ using Rcpp.

| R | | C++ |
|---|---|---|
| vector | -> | ? |
| list | -> | ? |
| dataframe | -> | ? |
| matrix | -> | ? |
| vector | <- | ? |
| list | <- | ? |
| dataframe | <- | ? |
| matrix | <- | ? |

Here, the ? indicates an STL data structure or a composition of two (or more) STL data structures. Think about which STL data structures might be suitable candidates and experiment with them to see if you can get them to work. Work together as a team to cover as many possibilities as you can.