Chapter 2 Arrays
In R, one-dimensional arrays are called “vectors”. R likes to be special. (R does have an array type too, but it means data with dimensions, like a matrix.)
2.1 Everything is an array
In R, even a single value is an array. That’s why you see [1] in front of results: [1] is the index of the first item printed on that line, and a single value is item 1 of an array of length one.
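You can check this yourself. A sketch (the variable name x is just for illustration): a lone number reports a length of 1 and can be indexed like any longer array.

```r
x = 42
length(x)      # a single value is an array of length 1
#> [1] 1
is.vector(x)
#> [1] TRUE
x[1]           # so indexing it works too
#> [1] 42
```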
2.2 Creation
c() is a bit of legacy from the S language. The c stands for “combine” (R’s help page title is “Combine Values into a Vector or List”), which is why it isn’t limited to characters.
I pronounce it “CAW”. Like the sound a crow makes.
Simple array
c(8, 6, 7, 5)
#> [1] 8 6 7 5
When you mix types, R converts every element to the most general type among them (a string, if any element is a string). For a collection that really holds mixed types, see lists.
c(9, 'hello', 7)
#> [1] "9" "hello" "7"
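The conversion follows a fixed ladder: logical, then integer, then double (which class() reports as “numeric”), then character. This sketch uses class() to show where a combined array lands; the L suffix in 2L marks an integer literal. A list, by contrast, keeps each element’s own type.

```r
class(c(TRUE, 2L))          # logical + integer -> integer
#> [1] "integer"
class(c(2L, 3.5))           # integer + double -> numeric
#> [1] "numeric"
class(c(3.5, "hello"))      # anything + character -> character
#> [1] "character"
mixed = list(9, 'hello', 7) # a list does not convert its elements
class(mixed[[1]])
#> [1] "numeric"
```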
2.3 Array generators
R has a cultural fear of complete words. Many terms are shortcuts or acronyms.
Repeat
rep(0, 4)
#> [1] 0 0 0 0
rep(c(1,2,3), 4) # repeat the whole array 4 times
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
rep(c(1,2,3), each=4) # repeat each item 4 times before moving to the next
#> [1] 1 1 1 1 2 2 2 2 3 3 3 3
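rep() has two more arguments worth knowing. A short sketch: times can take one count per item, and length.out repeats until the result reaches a given length (trimming the last pass if needed).

```r
rep(c(1, 2, 3), times=c(3, 1, 2))  # a different count for each item
#> [1] 1 1 1 2 3 3
rep(c(1, 2, 3), length.out=7)      # repeat until the result has 7 items
#> [1] 1 2 3 1 2 3 1
```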
Sequence
# increment by 1
4:10
#> [1] 4 5 6 7 8 9 10
# increment by any other value
seq(from=10, to=50, by=5)
#> [1] 10 15 20 25 30 35 40 45 50
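When you know how many values you want rather than the step size, seq() takes length.out instead of by. The sketch below also shows a common trap with the colon operator: 1:0 counts down rather than producing nothing, so seq_len() is the safer way to build 1 to n when n might be 0.

```r
seq(from=0, to=1, length.out=5)  # 5 evenly spaced values
#> [1] 0.00 0.25 0.50 0.75 1.00
seq_len(3)
#> [1] 1 2 3
1:0         # counts down: not an empty array
#> [1] 1 0
seq_len(0)  # an empty array
#> integer(0)
```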
Randomly sample from a given distribution (the numbers change on every run)
# uniform distribution (not 'run if')
runif(n=5, min=0, max=1)
#> [1] 0.1594836 0.4781883 0.7647987 0.7696877 0.2685485
# normal distribution
rnorm(n=5, mean=0, sd=1)
#> [1] 0.4483395 1.0208067 -0.1378989 0.2103863 -0.6428271
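To make random draws repeatable, for example in a script you want to rerun, set.seed() fixes the starting point of R’s random number generator. A minimal sketch; the seed 42 is arbitrary, and any fixed integer works.

```r
set.seed(42)                      # fix the generator's starting point
first = runif(n=3, min=0, max=1)
set.seed(42)                      # same seed again...
second = runif(n=3, min=0, max=1)
identical(first, second)          # ...gives the same "random" numbers
#> [1] TRUE
```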
2.4 Combining arrays
Combining arrays with c() concatenates them into one flat array. An array can’t contain other arrays; for that, use a list.
x = 1:3
y = c(10, 11)
z = 500
c(x, y, z) # arrays of arrays get flattened
#> [1] 1 2 3 10 11 500
Note: z is technically an array of length 1
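If you do want a collection of arrays, a list keeps each one intact. In this sketch, [[ ]] pulls out a single element, and unlist() flattens the list when you later want one array after all.

```r
x = 1:3
y = c(10, 11)
z = 500
nested = list(x, y, z)  # a list keeps each array intact
length(nested)          # three elements, not six
#> [1] 3
nested[[2]]             # double brackets pull out one element
#> [1] 10 11
unlist(nested)          # flatten it when you do want one array
#> [1]   1   2   3  10  11 500
```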
Collapse an array into a string
paste(1:5, collapse=", ")
#> [1] "1, 2, 3, 4, 5"
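paste() has two separators that are easy to mix up. A short sketch: sep joins the arguments element by element and still returns an array, while collapse then joins that array into a single string. paste0() is paste() with sep="".

```r
paste("x", 1:3, sep="_")                  # one string per element
#> [1] "x_1" "x_2" "x_3"
paste("x", 1:3, sep="_", collapse=" + ")  # then join them into one string
#> [1] "x_1 + x_2 + x_3"
paste0("x", 1:3)                          # no separator at all
#> [1] "x1" "x2" "x3"
```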
2.5 Indexing
a = 10:20
Get the first value. Indices start at 1, not 0.
a[1]
#> [1] 10
2nd and 6th values
a[c(2,6)]
#> [1] 11 15
Exclude the 2nd and 6th values
a[c(-2,-6)]
#> [1] 10 12 13 14 16 17 18 19 20
Range of positions
a[2:6]
#> [1] 11 12 13 14 15
Any order or number of repetitions
a[c(2, 4, 6, 6, 6)]
#> [1] 11 13 15 15 15
Specify values to keep or drop using booleans (keep this in mind for the “Array operations” section). The boolean array below has 10 values but a has 11, so R recycles it from the start: the 11th value gets the first TRUE.
a[c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)]
#> [1] 10 12 14 16 18 20
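A few more indexing behaviors, sketched on a fresh copy of a: which() turns a boolean array into the positions of its TRUE values, indexing past the end returns NA rather than an error, and indexing also works on the left side of an assignment.

```r
a = 10:20
which(a > 15)  # positions of the TRUE values
#> [1]  7  8  9 10 11
a[12]          # past the end: NA, not an error
#> [1] NA
a[1] = 99      # replace the first value in place
a[1:3]
#> [1] 99 11 12
```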
2.6 Sampling from an Array
Randomly sample from an array. Elements may repeat.
sample(1:3, size=10, replace=TRUE)
#> [1] 1 1 2 3 2 2 2 2 3 1
replace=TRUE means “sample with replacement”, so an element can be drawn more than once.
Sample without replacement. Elements will not repeat.
sample(1:5, size=4, replace=FALSE)
#> [1] 4 3 1 5
Shuffle the order of an array
sample(a, size=length(a), replace=FALSE)
#> [1] 15 13 17 20 18 10 12 16 14 11 19
Without replacement, you can’t draw more elements than the array has
sample(1:5, size=10, replace=FALSE)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
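Two more sample() details, sketched below. The prob argument weights the draw (here a hypothetical coin that lands heads 70% of the time). And watch out when the array is a single number: sample(5) does not return 5, it treats the 5 as 1:5 and shuffles those values. Outputs are omitted because they are random.

```r
set.seed(1)  # arbitrary seed, only so reruns match
flips = sample(c("H", "T"), size=10, replace=TRUE, prob=c(0.7, 0.3))  # weighted draw
table(flips)  # count of each outcome
sample(5)     # a single number n is treated as 1:n, so this shuffles 1 to 5
```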
2.7 Array constants
The letters and LETTERS constants hold the 26 lowercase and uppercase English letters.
letters[1:5]
#> [1] "a" "b" "c" "d" "e"
LETTERS[1:5]
#> [1] "A" "B" "C" "D" "E"
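These constants combine well with the other tools in this chapter. A sketch: month.abb is another built-in constant (month.name holds the full names), which() finds a letter’s position in the alphabet, and paste() with collapse glues letters into one string.

```r
month.abb[1:3]
#> [1] "Jan" "Feb" "Mar"
which(letters == "r")             # position of a letter in the alphabet
#> [1] 18
paste(LETTERS[1:3], collapse="")  # glue letters into one string
#> [1] "ABC"
```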
2.8 Array operations
Many functions and operators in R are vectorized: they work on every element of an array at once, with no loop needed.
a * 2
#> [1] 20 22 24 26 28 30 32 34 36 38 40
Compare individual elements
a > 15
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Compare each element across arrays, position by position
a == c(10, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20)
#> [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Select elements using a boolean array
a[a>15]
#> [1] 16 17 18 19 20
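To combine conditions, use the element-wise operators & (and) and | (or). The double forms && and || are meant for single TRUE/FALSE values, and recent versions of R raise an error if you pass them longer arrays. Since TRUE counts as 1, sum() of a boolean array counts the matches. A sketch using the same a:

```r
a = 10:20
a[a > 12 & a < 16]  # & : both conditions must hold
#> [1] 13 14 15
a[a < 12 | a > 18]  # | : either condition may hold
#> [1] 10 11 19 20
sum(a > 15)         # count how many elements match
#> [1] 5
```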
You can combine two arrays element by element even when their lengths differ: R recycles the shorter one, starting over from its first element. If the longer length isn’t a multiple of the shorter one, R warns you but still returns a result.
a = 1:5
b = rep(1, 8)
a + b
#> Warning in a + b: longer object length is not a multiple of shorter object length
#> [1] 2 3 4 5 6 2 3 4
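When the longer length is a multiple of the shorter one, recycling happens silently. That is also what makes arithmetic with a single value work: the value is just an array of length 1, recycled across every element.

```r
1:6 + c(0, 100)  # 6 is a multiple of 2: no warning
#> [1]   1 102   3 104   5 106
1:6 * 2          # a single value is recycled across all 6 elements
#> [1]  2  4  6  8 10 12
```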
2.9 Array functions
Length
length(20:50)
#> [1] 31
Reverse
rev(1:5)
#> [1] 5 4 3 2 1
Math
sum(1:5)
#> [1] 15
min(1:5)
#> [1] 1
max(1:5)
#> [1] 5
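A few other built-in math functions follow the same pattern. In this sketch, mean(), range() and prod() reduce the array to a summary, while cumsum() returns a running total the same length as its input.

```r
mean(1:5)
#> [1] 3
range(1:5)   # min and max together
#> [1] 1 5
prod(1:5)    # product of all elements
#> [1] 120
cumsum(1:5)  # running total
#> [1]  1  3  6 10 15
```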
min() and max() are not vectorized in that sense: they reduce everything you pass them, across all arguments, to a single value.
max(1:5, 11:15, 21:25)
#> [1] 25
For an element-wise min and max across several arrays, use pmin() and pmax(). The p actually stands for “parallel”, but feel free to pretend it stands for “vectorized”.
pmax(1:5, 11:15, 21:25)
#> [1] 21 22 23 24 25
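A handy use of the pair is clamping values into a range. A sketch, assuming a range of 0 to 10: pmax() raises anything below the lower bound, then pmin() caps anything above the upper bound. Each single-value bound is recycled across the array.

```r
x = c(-5, 3, 12)
pmin(pmax(x, 0), 10)  # clamp every element into [0, 10]
#> [1]  0  3 10
```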
2.10 Array sorting
Sort
a = c(70, 20, 80, 20, 10, 40)
sort(a)
#> [1] 10 20 20 40 70 80
Sort in decreasing order
sort(a, decreasing=TRUE)
#> [1] 80 70 40 20 20 10
Get the positions of the values in sorted order (a[order(a)] gives the same result as sort(a))
order(a)
#> [1] 5 2 4 6 1 3
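The main use of order() is sorting one array by the values of another. In this sketch, players and scores are hypothetical parallel arrays. The two tied scores of 20 keep their original order (bob before dan), because order() leaves unresolved ties as they were.

```r
scores = c(70, 20, 80, 20, 10, 40)
players = c("ann", "bob", "cat", "dan", "eve", "fay")
players[order(scores)]  # players from lowest to highest score
#> [1] "eve" "bob" "dan" "fay" "ann" "cat"
```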
2.11 Test membership
To see if an item is in an array, use %in%. It checks each item on the left against the whole array on the right.
9 %in% 1:10
#> [1] TRUE
9:11 %in% 1:10
#> [1] TRUE TRUE FALSE
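Some related tools, sketched below: negate %in% with ! to keep items that are not in the other array, use match() when you need the position of each item (NA if it is missing), and use intersect() for the items the two arrays share.

```r
x = c(3, 9, 12, 15)
x[!(x %in% 1:10)]            # keep items NOT in 1:10
#> [1] 12 15
match(c("c", "z"), letters)  # position of each item, NA if missing
#> [1]  3 NA
intersect(x, 1:10)           # items found in both arrays
#> [1] 3 9
```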