I’ve recently been spending a lot of time running various simulations in **R**. Because I often use `snow` to perform simulations across several computers/cores, results typically come back in the form of a list object. Summarizing the results from a list is simple enough using a `for`-loop, but it’s much “sexier” to use a functional style of programming that takes advantage of higher-order functions or the `*apply` family of functions (**R** is, after all, a functional language at its core). But what’s the performance cost of using, for instance, `lapply` with a large list, particularly when the results from `lapply` have to be further manipulated to get them into a useful form? I recently did some performance testing on three of the `*apply` functions and compared them to an equivalent `for`-loop approach. The results follow.

First, I set up a large list with some fake data. Here I make a list, `Sims`, containing 30000 \(5 \times 5\) matrices. (This is a data structure similar to one I recently found myself analyzing.)

```r
set.seed(12345)
S <- 30000
Sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:S)
```

Now I define four functions that extract the fifth column of each matrix in `Sims` and create a \(30000 \times 5\) matrix of results.

```r
for_method <- function(S, Sims) {
  R <- matrix(NA, ncol=5, nrow=S)
  for (i in 1:S) R[i,] <- Sims[[i]][,5]
  R
}
sapply_method <- function(Sims) {
  t(sapply(Sims, function(x) x[,5]))
}
lapply_method <- function(Sims) {
  do.call(rbind, lapply(Sims, function(x) x[,5]))
}
vapply_method <- function(Sims) {
  t(vapply(Sims, function(x) x[,5], rep(0, 5)))
}
```

Notice that the `for_method` first creates a container matrix and inserts the results directly into it. Using something like `rbind` instead would force **R** to copy the matrix on every iteration, which is very inefficient. Also notice that in each of the apply methods, the result has to be manipulated before it is returned. It’s often possible to eliminate this step during the initial simulation process. But if you’re like me, you sometimes start your simulations before you realize what you intend to do with the results.
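To make the copying cost concrete, here is a minimal sketch (a toy example of my own, using a much smaller list than `Sims`) contrasting the growing-by-`rbind` pattern with preallocation. Both produce the same matrix, but the `rbind` version reallocates and copies the accumulated result on every pass:

```r
# Small toy list for illustration (100 matrices rather than 30000)
set.seed(1)
small_sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:100)

# Growing with rbind(): R is reallocated and copied on every iteration,
# so total work is quadratic in the number of rows
grow_rbind <- function(S, Sims) {
  R <- NULL
  for (i in 1:S) R <- rbind(R, Sims[[i]][, 5])
  R
}

# Preallocating: the container is created once and filled in place
grow_prealloc <- function(S, Sims) {
  R <- matrix(NA_real_, nrow = S, ncol = 5)
  for (i in 1:S) R[i, ] <- Sims[[i]][, 5]
  R
}

stopifnot(identical(grow_rbind(100, small_sims),
                    grow_prealloc(100, small_sims)))
```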

Now for some timing results:

```
> system.time(replicate(20, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  4.340   0.072   4.414
> system.time(replicate(20, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  3.452   0.076   3.528
> system.time(replicate(20, R.3 <<- lapply_method(Sims)))
   user  system elapsed
  4.393   0.044   4.433
> system.time(replicate(20, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  2.588   0.040   2.628
> all(identical(R.1, R.2), identical(R.1, R.3), identical(R.1, R.4))
[1] TRUE
```

These results surprised me—I never expected `vapply` to be so much quicker than the other three methods. This may have to do with the fact that, since `vapply` requires you to explicitly specify the resulting data structure, **R** is able to allocate the memory ahead of time.
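A small illustration of my own makes this visible: the third argument to `vapply` (`FUN.VALUE`) is a template that fixes the type and length of every result, so the shape of the output is known before iteration begins, and a mismatch is an error rather than a silent reshape:

```r
m <- matrix(as.numeric(1:25), nrow = 5, ncol = 5)

# FUN.VALUE = numeric(5) promises each result is a length-5 double vector,
# so vapply can allocate the full result matrix up front
v <- vapply(list(m, m, m), function(x) x[, 5], numeric(5))
stopifnot(identical(dim(v), c(5L, 3L)))  # results come back as columns,
                                         # hence the t() in vapply_method

# A template of the wrong length is caught immediately
bad <- try(vapply(list(m), function(x) x[, 5], numeric(4)), silent = TRUE)
stopifnot(inherits(bad, "try-error"))
```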

As a simple extension, I thought I would retest taking advantage of byte-compiled code:

```r
library(compiler)
for_method <- cmpfun(for_method)
sapply_method <- cmpfun(sapply_method)
lapply_method <- cmpfun(lapply_method)
vapply_method <- cmpfun(vapply_method)
```

And here are the results:

```
> system.time(replicate(10, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  2.084   0.024   2.110
> system.time(replicate(10, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  1.624   0.016   1.642
> system.time(replicate(10, R.3 <<- lapply_method(Sims)))
   user  system elapsed
  2.152   0.012   2.164
> system.time(replicate(10, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  1.204   0.004   1.207
> all(identical(R.1, R.2), identical(R.1, R.3), identical(R.1, R.4))
[1] TRUE
```

These results are also surprising. I really thought byte-compiling the functions would bring the results much closer together (doesn’t each function compile down to what is effectively a `for`-loop?). However, in this simple example at least, that isn’t the case. `vapply` still has, by a significant margin, the best performance. `sapply` also performs noticeably better than `lapply` and the `for`-loop, which is an important result since `sapply` doesn’t require a pre-specified data structure. Assuming these results hold in the presence of more complicated data structures (and I don’t see why they wouldn’t), they seem to suggest that a functional approach can provide benefits in both coding efficiency and run-time performance.
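One detail worth noting (my own aside, not part of the timings above): `sapply` is essentially `lapply` followed by a simplification step, so the gap between the two methods here really comes down to `simplify2array` versus the explicit `do.call(rbind, ...)`:

```r
set.seed(1)
small_sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:50)

# sapply(X, f) is, in effect, simplify2array(lapply(X, f))
res_sapply <- sapply(small_sims, function(x) x[, 5])
res_lapply <- simplify2array(lapply(small_sims, function(x) x[, 5]))
stopifnot(identical(res_sapply, res_lapply))
```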

I welcome any feedback or suggestions on how these tests could be improved or extended.

### Update

A couple of things have come up in the comments. First, Henrik provided a more elegant (and more efficient) version of `vapply_method`:

```r
vapply_method2 <- function(Sims) {
  t(vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5))
}
```
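The trick here is easy to miss: `[` is an ordinary function, and the extra arguments `i = 1:5` and `j = 5` are forwarded by `vapply` through `...`, so each call is equivalent to `x[1:5, 5]` without the overhead of invoking an anonymous function per element. A quick check of my own:

```r
m <- matrix(rnorm(25), nrow = 5, ncol = 5)

# `[`(m, i, j) is the same call as m[i, j]
stopifnot(identical(`[`(m, i = 1:5, j = 5), m[1:5, 5]))

# So vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5) extracts column 5
# from each matrix with no per-element closure
v <- vapply(list(m, m), `[`, rep(0, 5), i = 1:5, j = 5)
stopifnot(identical(v[, 1], m[1:5, 5]))
```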

Second, several people have reported that the `for_method` on their system is just as fast, if not faster, than the `vapply_method` when compiled. I have retested my results, adding Henrik’s `vapply_method2`. Here are the results for compiled code (note that I have increased the number of replications):

```
> library(compiler)
> set.seed(12345)
> S <- 30000
> Sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:S)
>
> for_method <- function(S, Sims) {
+   R <- matrix(NA, ncol=5, nrow=S)
+   for (i in 1:S) R[i,] <- Sims[[i]][,5]
+   R
+ }
>
> sapply_method <- function(Sims) {
+   t(sapply(Sims, function(x) x[,5]))
+ }
>
> lapply_method <- function(Sims) {
+   do.call(rbind, lapply(Sims, function(x) x[,5]))
+ }
>
> vapply_method <- function(Sims) {
+   t(vapply(Sims, function(x) x[,5], rep(0, 5)))
+ }
>
> vapply_method2 <- function(Sims) {
+   t(vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5))
+ }
> for_method <- cmpfun(for_method)
> sapply_method <- cmpfun(sapply_method)
> lapply_method <- cmpfun(lapply_method)
> vapply_method <- cmpfun(vapply_method)
> vapply_method2 <- cmpfun(vapply_method2)
> system.time(replicate(50, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  6.052   0.136   6.191
> system.time(replicate(50, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  8.361   0.116   8.474
> system.time(replicate(50, R.3 <<- lapply_method(Sims)))
   user  system elapsed
 10.548   0.144  10.692
> system.time(replicate(50, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  5.813   0.124   5.936
> system.time(replicate(50, R.5 <<- vapply_method2(Sims)))
   user  system elapsed
  4.268   0.084   4.352
```

For whatever reason, the `vapply` methods remain the fastest on my system (particularly Henrik’s version). I also tested on another Linux machine and got the same results (I have not yet tested on Windows). Of course, this is a very simple example, so the performance differences may not hold up as the input function grows more complex. I’d be interested to know if anyone has an idea as to why `vapply` would perform better on Linux than on Windows/Mac. A library difference, perhaps?