Chapter 2 Arrays

In R, arrays are commonly called “vectors”. R likes to be special.

2.1 Everything is an array

In R, even single values are arrays. That’s why you see [1] in front of results: even single values are the first item in an array of length one.
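A quick way to convince yourself, as a minimal sketch (the name x is just for illustration): a lone number has a length and can be indexed like any other array.

```r
x = 42                 # looks like a single value
length(x)              # but it has a length, like any array
#> [1] 1
x[1]                   # and it can be indexed
#> [1] 42
identical(x, c(42))    # same thing as an explicit one-item array
#> [1] TRUE
```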

2.2 Creation

c() is a holdover from the S language. The name stands for “combine”: it combines its arguments into a single array, and it can hold numbers and booleans as well as characters.

I pronounce it “CAW”. Like the sound a crow makes.

Simple array

c(8, 6, 7, 5)
#> [1] 8 6 7 5

For multiple types, R converts every element to the most flexible type present (usually a string). For a real multi-typed collection, see lists.

c(9, 'hello', 7)
#> [1] "9"     "hello" "7"
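To see which type wins, here is a small sketch that asks class() about a few mixed arrays. The pairings are only illustrative; the general order is boolean, then integer, then decimal number, then string.

```r
class(c(TRUE, 2L))        # boolean + integer
#> [1] "integer"
class(c(2L, 3.5))         # integer + decimal number
#> [1] "numeric"
class(c(3.5, 'hello'))    # anything + string
#> [1] "character"
c(TRUE, 2L)               # TRUE became 1
#> [1] 1 2
```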

2.3 Array generators

R has a cultural fear of complete words. Many terms are shortcuts or acronyms.

Repeat

rep(0, 4)
#> [1] 0 0 0 0
rep(c(1,2,3), 4) # repeat the whole array
#>  [1] 1 2 3 1 2 3 1 2 3 1 2 3
rep(c(1,2,3), each=4) # repeat each item in the array before moving to the next
#>  [1] 1 1 1 1 2 2 2 2 3 3 3 3

Sequence

# increment by 1
4:10
#> [1]  4  5  6  7  8  9 10
# increment by any other value
seq(from=10, to=50, by=5)
#> [1] 10 15 20 25 30 35 40 45 50
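A few more ways to build a sequence, as a sketch. The particular numbers are arbitrary; length.out and seq_len() are part of base R.

```r
# ask for a number of evenly spaced values instead of a step size
seq(from=0, to=1, length.out=5)
#> [1] 0.00 0.25 0.50 0.75 1.00
# count down with a negative step
seq(from=10, to=1, by=-3)
#> [1] 10  7  4  1
# 1 up to n
seq_len(4)
#> [1] 1 2 3 4
```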

Randomly sample from a given distribution

# uniform distribution (not 'run if')
runif(n=5, min=0, max=1)
#> [1] 0.1594836 0.4781883 0.7647987 0.7696877 0.2685485
# normal distribution
rnorm(n=5, mean=0, sd=1)
#> [1]  0.4483395  1.0208067 -0.1378989  0.2103863 -0.6428271
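The numbers above will differ every time you run the code. If you need the same "random" values twice, one option is to set the seed first. A minimal sketch, where 42 is an arbitrary seed and first and second are made-up names:

```r
set.seed(42)               # any whole number works as a seed
first = runif(n=3)
set.seed(42)               # reset to the same seed
second = runif(n=3)
identical(first, second)   # same seed, same values
#> [1] TRUE
```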

2.4 Combining arrays

Passing arrays to c() concatenates them into one flat array. An array can’t hold other arrays as elements. For nesting, you need a list.

x = 1:3
y = c(10, 11)
z = 500

c(x, y, z) # arrays of arrays get flattened
#> [1]   1   2   3  10  11 500

Note: z is technically an array of length 1
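If you really do want to keep the pieces separate, a list will hold them without flattening. A rough sketch (nested is a made-up name):

```r
x = 1:3
y = c(10, 11)
nested = list(x, y)    # a list keeps each array as its own element
length(nested)
#> [1] 2
nested[[2]]            # double brackets pull out one element
#> [1] 10 11
length(c(x, y))        # c() flattens instead
#> [1] 5
```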

Collapse an array into a string

paste(1:5, collapse=", ")
#> [1] "1, 2, 3, 4, 5"

2.5 Indexing

a = 10:20

Get the first value. Indices start at 1, not 0.

a[1]
#> [1] 10

2nd and 6th values

a[c(2,6)]
#> [1] 11 15

Exclude the 2nd and 6th values

a[c(-2,-6)]
#> [1] 10 12 13 14 16 17 18 19 20

Range of values

a[2:6]
#> [1] 11 12 13 14 15

Any order or number of repetitions

a[c(2, 4, 6, 6, 6)]
#> [1] 11 13 15 15 15

Specify values to keep or drop using booleans, one per element (keep this in mind for the “Array operations” section)

a[c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)]
#> [1] 10 12 14 16 18 20
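Typing out one boolean per element gets old fast. As a sketch, rep() from the generators section can build the same alternating pattern at exactly the right length, and which() reports the positions that are TRUE. The name keep is made up for this example.

```r
a = 10:20
keep = rep(c(TRUE, FALSE), length.out=length(a))  # one boolean per element
length(keep) == length(a)
#> [1] TRUE
a[keep]
#> [1] 10 12 14 16 18 20
which(keep)                                       # positions of the TRUE values
#> [1]  1  3  5  7  9 11
```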

2.6 Sampling from an Array

Randomly sample from an array. Elements may repeat.

sample(1:3, size=10, replace=TRUE)
#>  [1] 1 1 2 3 2 2 2 2 3 1

replace=TRUE means “sample with replacement”, so an element can be sampled more than once.

Sample without replacement. Elements will not repeat.

sample(1:5, size=4, replace=FALSE)
#> [1] 4 3 1 5

Shuffle the order of an array

sample(a, size=length(a), replace=FALSE)
#>  [1] 15 13 17 20 18 10 12 16 14 11 19
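There is a shorter spelling of the shuffle: size defaults to the length of the array and replace defaults to FALSE. A small sketch that checks the shuffled copy still holds the same values (shuffled is a made-up name):

```r
a = 10:20
shuffled = sample(a)            # same as sample(a, size=length(a), replace=FALSE)
length(shuffled) == length(a)
#> [1] TRUE
identical(sort(shuffled), a)    # same values, just in a different order
#> [1] TRUE
```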

Without replacement, you can’t sample more elements than the array has

sample(1:5, size=10, replace=FALSE)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'

2.7 Array constants

The letters and LETTERS constants hold lower and upper case letters

letters[1:5]
#> [1] "a" "b" "c" "d" "e"
LETTERS[1:5]
#> [1] "A" "B" "C" "D" "E"

2.8 Array operations

Many functions and operators in R are vectorized: they apply to every element of an array at once.

a * 2
#>  [1] 20 22 24 26 28 30 32 34 36 38 40

Compare individual elements

a > 15
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Compare each element across arrays

a == c(10, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20)
#>  [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Select elements using boolean array

a[a>15]
#> [1] 16 17 18 19 20
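Conditions can be combined, too. A small sketch, assuming a is still 10:20 as in the indexing section:

```r
a = 10:20
a[a > 12 & a < 16]    # both conditions must hold
#> [1] 13 14 15
a[a < 12 | a > 18]    # either condition may hold
#> [1] 10 11 19 20
sum(a > 15)           # TRUE counts as 1, so this counts the matches
#> [1] 5
```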

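You can perform operations on the elements of two arrays even if they are different lengths. The shorter one wraps around (R calls this “recycling”), and R warns you when the longer length is not a multiple of the shorter one.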

a = 1:5
b = rep(1, 8)
a + b
#> Warning in a + b: longer object length is not a multiple of shorter object
#> length
#> [1] 2 3 4 5 6 2 3 4
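When the longer length is a multiple of the shorter one, the wrap-around happens with no warning. A minimal sketch, with x and y as made-up names:

```r
x = 1:6
y = c(10, 20)    # length 2 divides length 6 evenly
x + y            # y is reused as 10 20 10 20 10 20
#> [1] 11 22 13 24 15 26
```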

2.9 Array functions

Length

length(20:50)
#> [1] 31

Reverse

rev(1:5)
#> [1] 5 4 3 2 1

Math

sum(1:5)
#> [1] 15
min(1:5)
#> [1] 1
max(1:5)
#> [1] 5

min() and max() are not vectorized. They return a single value, even when you give them several arrays.

max(1:5, 11:15, 21:25)
#> [1] 25

For a vectorized min and max, use pmin() and pmax(). The p stands for “parallel”, not “vectorized”, but feel free to pretend.

pmax(1:5, 11:15, 21:25)
#> [1] 21 22 23 24 25
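One place these come in handy is clamping values into a range. This is only a sketch: the array x and the limits 0 and 10 are invented for the example.

```r
x = c(-5, 3, 12, 7, 20)
pmax(x, 0)              # the 0 wraps around, so negatives become 0
#> [1]  0  3 12  7 20
pmin(pmax(x, 0), 10)    # then cap everything at 10
#> [1]  0  3 10  7 10
```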

2.10 Array sorting

Sort

a = c(70, 20, 80, 20, 10, 40)
sort(a)
#> [1] 10 20 20 40 70 80

Sort in descending order

sort(a, decreasing=TRUE)
#> [1] 80 70 40 20 20 10

Get the indices of the sorted values

order(a)
#> [1] 5 2 4 6 1 3
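Those indices are useful because you can feed them back into the brackets. A sketch of two uses: reproducing sort(), and reordering a second array to match. The array tags is hypothetical.

```r
a = c(70, 20, 80, 20, 10, 40)
a[order(a)]                               # same result as sort(a)
#> [1] 10 20 20 40 70 80
tags = c('p', 'q', 'r', 's', 't', 'u')    # one tag per value in a
tags[order(a)]                            # tags rearranged by a's sort order
#> [1] "t" "q" "s" "u" "p" "r"
```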

2.11 Test membership

To see if an item is in an array, use %in%

9 %in% 1:10
#> [1] TRUE
9:11 %in% 1:10
#> [1]  TRUE  TRUE FALSE
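Since %in% returns booleans, it also works as an index. A short sketch that keeps or drops a set of values (wanted is a made-up name):

```r
a = 10:20
wanted = c(12, 15, 99)
a[a %in% wanted]       # keep only the elements found in wanted
#> [1] 12 15
a[!(a %in% wanted)]    # drop them instead
#> [1] 10 11 13 14 16 17 18 19 20
```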