# Chapter 2 Vectorization, *apply and for loops

This section will cover the basics of vectorization, the `*apply` family of functions and `for` loops.

## 2.1 Vectorization

Almost everything in `R` is a vector. A scalar is really a vector of length 1 and a `data.frame` is a collection of vectors. A nice feature of `R` is its vectorized capabilities. Vectorization indicates that a function operates on a whole vector of values at the same time and not just on a single value^{1}. If you have ever taken a basic linear algebra course, this concept will be familiar to you. Take for example the sum of two vectors:

\[
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} +
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}
\]

The corresponding `R` code is given by:

```
a <- c(1, 2, 3)
b <- c(1, 2, 3)
a + b
```

`## [1] 2 4 6`

Many of the `base` functions in `R` are already vectorized. Here are some common examples:

```
# generate a sequence of numbers from 1 to 10
(a <- 1:10)
```

`## [1] 1 2 3 4 5 6 7 8 9 10`

```
# sum the numbers from 1 to 10
sum(a)
```

`## [1] 55`

```
# calculate sums of each column
colSums(iris[, -5])
```

```
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 876.5 458.6 563.7 179.9
```

Exercise: What happens when you sum two vectors of different lengths?

## 2.2 Family of `*apply` functions

- `apply`, `lapply` and `sapply` are some of the most commonly used functions in `R`
- `*apply` functions are not necessarily faster than loops, but can be easier to read (and vice versa)
- `apply` is used when you need to perform an operation on every row or column of a matrix or `data.frame`
- `lapply` and `sapply` differ in the format of the output. The former returns a list while the latter simplifies the result to a vector (or matrix) when possible
- There are other `*apply` functions such as `tapply`, `vapply` and `mapply` with similar functionality and purpose
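The difference in output format between `lapply` and `sapply` is easiest to see by running both on the same input. The following is a minimal sketch; the object names (`squaresList`, `squaresVec`, `squaresChecked`) are made up for illustration.

```r
# Square the numbers 1 to 3 with each function
squaresList <- lapply(1:3, function(i) i^2)
squaresVec <- sapply(1:3, function(i) i^2)

# lapply always returns a list, sapply simplifies to a vector when it can
class(squaresList)
class(squaresVec)

# vapply also returns a vector, but checks every result against a template
# (here: a single numeric value)
squaresChecked <- vapply(1:3, function(i) i^2, numeric(1))
identical(squaresVec, squaresChecked)
```

Because `vapply` fails loudly when a result does not match the template, it can be a safer choice than `sapply` inside functions.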

### 2.2.1 Loops vs. Apply

```
# Getting the row means of two columns
# Generate data
N <- 10000
x1 <- runif(N)
x2 <- runif(N)
d <- as.data.frame(cbind(x1, x2))
head(d)
```

```
## x1 x2
## 1 0.93196866 0.81751342
## 2 0.14861694 0.47933846
## 3 0.64465639 0.09915633
## 4 0.31383613 0.38192113
## 5 0.28983386 0.42311260
## 6 0.09529535 0.49011556
```

```
# Loop: create a vector to store the results in
rowMeanFor <- vector("double", N)
for (i in seq_len(N)) {
  rowMeanFor[[i]] <- mean(c(d[i, 1], d[i, 2]))
}
# Apply: compute the mean of each row (MARGIN = 1)
rowMeanApply <- apply(d, 1, mean)
# are the results equal?
all.equal(rowMeanFor, rowMeanApply)
```

`## [1] TRUE`
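For this particular task, `base` `R` also ships a vectorized function, `rowMeans`, which avoids both the loop and `apply`. The sketch below regenerates a smaller data set so that it runs on its own; the seed and the name `rowMeanVec` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
N <- 1000
d <- data.frame(x1 = runif(N), x2 = runif(N))

# Vectorized: one call computes the mean of every row
rowMeanVec <- rowMeans(d)

# Same computation with apply, for comparison
rowMeanApply <- apply(d, 1, mean)

# unname() drops any row labels so that only the values are compared
all.equal(unname(rowMeanVec), unname(rowMeanApply))
```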

### 2.2.2 Descriptive Statistics using `*apply`

```
data(women)
# data structure
str(women)
```

```
## 'data.frame': 15 obs. of 2 variables:
## $ height: num 58 59 60 61 62 63 64 65 66 67 ...
## $ weight: num 115 117 120 123 126 129 132 135 139 142 ...
```

```
# calculate the mean for each column
apply(women, 2, mean)
```

```
## height weight
## 65.0000 136.7333
```

```
# apply 'fivenum' function to each column
vapply(women, fivenum,
       c(Min. = 0, `1st Qu.` = 0, Median = 0, `3rd Qu.` = 0, Max. = 0))
```

```
## height weight
## Min. 58.0 115.0
## 1st Qu. 61.5 124.5
## Median 65.0 135.0
## 3rd Qu. 68.5 148.0
## Max. 72.0 164.0
```

### 2.2.3 Creating new columns using `sapply`

You can apply a *user-defined function* to columns or the entire data frame:

```
# the output of sapply is a vector
# the 's' in sapply stands for 'simplified apply'
mtcars$gear2 <- sapply(mtcars$gear, function(i) if (i == 4) "alot" else "some")
head(mtcars)[, c("gear", "gear2")]
```

```
## gear gear2
## Mazda RX4 4 alot
## Mazda RX4 Wag 4 alot
## Datsun 710 4 alot
## Hornet 4 Drive 3 some
## Hornet Sportabout 3 some
## Valiant 3 some
```

### 2.2.4 Applying functions to subsets using `tapply`

```
# Fisher's famous dataset
data(iris)
str(iris)
```

```
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```

```
# mean sepal length by species
tapply(iris$Sepal.Length, iris$Species, mean)
```

```
## setosa versicolor virginica
## 5.006 5.936 6.588
```

### 2.2.5 Nested for loops using `mapply`

`mapply` is my favorite `base` `R` function and here are some reasons why:

- Combined with `expand.grid`, `mapply` can replace nested `for` loops while being more readable and less prone to errors
- It is an effective way of conducting simulations because it iterates over many arguments at once

Let’s say you want to generate random samples from a normal distribution with varying means and standard deviations. Of course the brute force way would be to write out the command once, copy and paste it as many times as you want, and then manually change the arguments for `mean` and `sd` in the `rnorm` function like so:

```
v1 <- rnorm(100, mean = 5, sd = 1)
v2 <- rnorm(100, mean = 10, sd = 5)
v3 <- rnorm(100, mean = -3, sd = 10)
```

This isn’t too bad for three vectors. But what if you want to generate many more combinations of means and sds? Furthermore, how can you keep track of the parameters you used? Now let’s consider the `mapply` function:

```
means <- c(5, 10, -3)
sds <- c(1, 5, 10)
# MoreArgs is a list of arguments that don't change
randomNormals <- mapply(rnorm, mean = means, sd = sds, MoreArgs = list(n = 100))
head(randomNormals)
```

```
## [,1] [,2] [,3]
## [1,] 5.400492 3.606588 -10.544957
## [2,] 4.025367 4.395509 1.248023
## [3,] 5.001900 8.994643 -10.234892
## [4,] 5.004534 2.210005 -10.172234
## [5,] 4.004708 5.368140 -6.539932
## [6,] 4.478162 14.107530 6.502228
```

The following diagram (from r4ds) describes exactly what is going on in the above function call to `mapply`:

Advantages:

- Result is automatically stored in a matrix
- The parameters are also saved in `R` objects so that they can be easily manipulated and/or recovered
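To make the `mapply` call above concrete, here is a sketch of a `for` loop that should produce the same matrix: `mapply` walks through `means` and `sds` in parallel and passes the i-th element of each to `rnorm`. The seed and the names `viaMapply` and `viaLoop` are assumptions made for illustration.

```r
means <- c(5, 10, -3)
sds <- c(1, 5, 10)

set.seed(42)  # arbitrary seed so both versions draw the same random numbers
viaMapply <- mapply(rnorm, mean = means, sd = sds, MoreArgs = list(n = 100))

set.seed(42)
# One column per (mean, sd) pair, filled in the same order mapply uses
viaLoop <- matrix(NA_real_, nrow = 100, ncol = length(means))
for (j in seq_along(means)) {
  viaLoop[, j] <- rnorm(n = 100, mean = means[j], sd = sds[j])
}

all.equal(viaMapply, viaLoop)
```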

Consider a more complex scenario where you want to consider many possible combinations of means and sds. We take advantage of the `expand.grid` function to create a `data.frame` of simulation parameters:

```
simParams <- expand.grid(means = 1:10, sds = 1:10)
randomNormals <- mapply(rnorm, mean = simParams$means, sd = simParams$sds, MoreArgs = list(n = 100))
dim(randomNormals)
```

`## [1] 100 100`
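This is where `mapply` stands in for nested `for` loops: `expand.grid` enumerates every combination of the parameters and `mapply` iterates over the rows. The sketch below uses a smaller grid and compares the result against explicit nested loops; the seed, the grid size and the names `viaMapply`, `viaLoops` and `col` are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed so both versions draw the same random numbers
simParams <- expand.grid(means = 1:3, sds = 1:2)
viaMapply <- mapply(rnorm, mean = simParams$means, sd = simParams$sds,
                    MoreArgs = list(n = 5))

set.seed(123)
viaLoops <- matrix(NA_real_, nrow = 5, ncol = nrow(simParams))
col <- 1
# expand.grid varies its first argument fastest,
# so sds is the outer loop and means the inner loop
for (s in 1:2) {
  for (m in 1:3) {
    viaLoops[, col] <- rnorm(5, mean = m, sd = s)
    col <- col + 1
  }
}

all.equal(viaMapply, viaLoops)
```

Column `j` of the result corresponds to row `j` of `simParams`, so the parameters behind each sample can be looked up directly.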

## 2.3 Creating dynamic documents with `mapply`

`mapply` together with the `rmarkdown` package (Allaire et al. 2016) can be very useful to create dynamic documents for exploratory analysis. We illustrate this using the Motor Trend Car Road Tests data which comes pre-loaded in `R`.

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Copy the code below into a file called `mapplyRmarkdown.Rmd`:

Copy the code below into a file called `boxplotTemplate`: