Chapter 2 Arrays
In R, arrays are commonly called “vectors”. R likes to be special.
2.1 Everything is an array
In R, even single values are arrays. That’s why you see [1] in front of results: even single values are the first item in an array of length one.
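A quick way to convince yourself, sketched with base R only: a bare number reports a length of 1 and can be indexed like any longer array. The name x is just for illustration.

```r
x = 42               # a single value
length(x)            # 1: it is an array of length one
x[1]                 # indexing works just like on longer arrays
identical(x, c(42))  # TRUE: wrapping it in c() changes nothing
is.vector(x)         # TRUE
```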
2.2 Creation
c() is a holdover from the S language. The c stands for “combine”, and it can hold things other than characters.
I pronounce it “CAW”. Like the sound a crow makes.
Simple array
c(8, 6, 7, 5)
#> [1] 8 6 7 5
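When types are mixed, R converts every element to the most general type present (logical, then integer, then double, then character), so a single string turns the whole array into strings. For a real multi-typed collection, see lists.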
c(9, 'hello', 7)
#> [1] "9" "hello" "7"
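To see which type wins when types are mixed, here is a small sketch that uses class() to inspect what c() produced. The variable names are only for illustration.

```r
mixed_lgl_int = c(TRUE, 2L)       # logical + integer
mixed_int_dbl = c(2L, 3.5)        # integer + double
mixed_dbl_chr = c(3.5, "hello")   # double + character

class(mixed_lgl_int)   # "integer"
class(mixed_int_dbl)   # "numeric"
class(mixed_dbl_chr)   # "character"
mixed_lgl_int          # TRUE became 1
```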
2.3 Array generators
R has a cultural fear of complete words. Many terms are shortcuts or acronyms.
Repeat
rep(0, 4)
#> [1] 0 0 0 0
rep(c(1,2,3), 4) # repeat the whole array
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
rep(c(1,2,3), each=4) # repeat each item in the array before moving to the next
#> [1] 1 1 1 1 2 2 2 2 3 3 3 3
Sequence
#increment by 1
4:10
#> [1] 4 5 6 7 8 9 10
#increment by any other value
seq(from=10, to=50, by=5)
#> [1] 10 15 20 25 30 35 40 45 50
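seq() can also be told how many values you want instead of the step size, through its length.out argument, and both seq() and the colon can count down. A brief sketch:

```r
seq(from=0, to=1, length.out=5)   # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
seq(from=10, to=1, by=-3)         # counting down: 10 7 4 1
10:4                              # the colon also counts down
```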
Randomly sample from a given distribution
# uniform distribution (not 'run if')
runif(n=5, min=0, max=1)
#> [1] 0.1594836 0.4781883 0.7647987 0.7696877 0.2685485
# normal distribution
rnorm(n=5, mean=0, sd=1)
#> [1] 0.4483395 1.0208067 -0.1378989 0.2103863 -0.6428271
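These values change on every run. If you need repeatable results, base R's set.seed() fixes the random number generator's starting point. A minimal sketch, using an arbitrary seed value of 123:

```r
set.seed(123)                      # 123 is an arbitrary choice
first = runif(n=3, min=0, max=1)

set.seed(123)                      # same seed again
second = runif(n=3, min=0, max=1)

identical(first, second)           # TRUE: same seed, same draws
```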
2.4 Combining arrays
Combining arrays with c() concatenates them into one flat array. Arrays can’t be nested inside each other; for a collection of arrays, use a list.
x = 1:3
y = c(10, 11)
z = 500
c(x, y, z) # arrays of arrays get flattened
#> [1] 1 2 3 10 11 500
Note: z is technically an array of length 1
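If you do need to keep the pieces separate, a list can hold whole arrays as its elements. A small sketch (the names flat and nested are just for illustration):

```r
flat = c(1:3, c(10, 11), 500)        # one flat array of 6 values
nested = list(1:3, c(10, 11), 500)   # a list holding 3 separate arrays

length(flat)     # 6
length(nested)   # 3
nested[[2]]      # the second array: 10 11
unlist(nested)   # flattens the list back into one array
```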
Collapse an array into a string
paste(1:5, collapse=", ")
#> [1] "1, 2, 3, 4, 5"
2.5 Indexing
a = 10:20
Get the first value - Indices start at 1, not 0
a[1]
#> [1] 10
2nd and 6th values
a[c(2,6)]
#> [1] 11 15
Exclude the 2nd and 6th values
a[c(-2,-6)]
#> [1] 10 12 13 14 16 17 18 19 20
Range of values
a[2:6]
#> [1] 11 12 13 14 15
Any order or number of repetitions
a[c(2, 4, 6, 6, 6)]
#> [1] 11 13 15 15 15
Specify values to keep or drop using booleans (keep this in mind for the “Array operations” section). The boolean array here has 10 values but a has 11, so it wraps around and the 11th value is kept.
a[c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)]
#> [1] 10 12 14 16 18 20
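Typing out a long boolean array by hand is error-prone. One possible shortcut, sketched here with rep() and its length.out argument, builds a mask that matches the length of a exactly. The name keep is only for illustration.

```r
a = 10:20
keep = rep(c(TRUE, FALSE), length.out=length(a))  # TRUE FALSE TRUE ... (11 values)

a[keep]       # every other value, starting with the first
which(keep)   # the positions being kept: 1 3 5 7 9 11
```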
2.6 Sampling from an Array
Randomly sample from an array. Elements may repeat.
sample(1:3, size=10, replace=TRUE)
#> [1] 1 1 2 3 2 2 2 2 3 1
replace=TRUE means “sample with replacement”, so an element can be sampled more than once.
Sample without replacement. Elements will not repeat.
sample(1:5, size=4, replace=FALSE)
#> [1] 4 3 1 5
Shuffle the order of an array
sample(a, size=length(a), replace=FALSE)
#> [1] 15 13 17 20 18 10 12 16 14 11 19
Make sure you have enough elements. Without replacement, size can’t exceed the length of the array.
sample(1:5, size=10, replace=FALSE)
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
2.7 Array constants
The letters and LETTERS constants hold lower and upper case letters
letters[1:5]
#> [1] "a" "b" "c" "d" "e"
LETTERS[1:5]
#> [1] "A" "B" "C" "D" "E"
2.8 Array operations
Many functions and operators in R are vectorized, so they apply to every element of an array.
a * 2
#> [1] 20 22 24 26 28 30 32 34 36 38 40
Compare individual elements
a > 15
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
Compare each element across arrays
a == c(10, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20)
#> [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Select elements using boolean array
a[a > 15]
#> [1] 16 17 18 19 20
You can perform operations on the elements of two arrays even if they are different sizes. The smaller one wraps around.
a = 1:5
b = rep(1, 8)
a + b
#> Warning in a + b: longer object length is not a multiple of shorter object
#> length
#> [1] 2 3 4 5 6 2 3 4
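The warning only appears when the longer length isn't a multiple of the shorter one. When it is, R wraps around silently, which can hide mistakes. A short sketch with made-up arrays named long and short:

```r
long = 1:6
short = c(10, 20)   # length 2 divides length 6 evenly

long + short        # no warning: 11 22 13 24 15 26
long + 100          # a single value wraps around to every element
```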
2.9 Array functions
Length
length(20:50)
#> [1] 31
Reverse
rev(1:5)
#> [1] 5 4 3 2 1
Math
sum(1:5)
#> [1] 15
min(1:5)
#> [1] 1
max(1:5)
#> [1] 5
min() and max() are not vectorized. They return a single value, even when given several arrays.
max(1:5, 11:15, 21:25)
#> [1] 25
For a vectorized min and max, use pmin() and pmax(). The p actually stands for “parallel”, but let’s pretend it means “vectorized”.
pmax(1:5, 11:15, 21:25)
#> [1] 21 22 23 24 25
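One common use of the pair, shown here as a sketch, is clamping values to a range. The single bounds wrap around against every element. The names values, lower, upper, and clamped are only for illustration.

```r
values = c(-5, 3, 12, 7, 40)
lower = 0
upper = 10

clamped = pmin(pmax(values, lower), upper)
clamped              # 0 3 10 7 10

max(values, lower)   # 40: one value for everything, not one per element
```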
2.10 Array sorting
Sort
a = c(70, 20, 80, 20, 10, 40)
sort(a)
#> [1] 10 20 20 40 70 80
Reverse
sort(a, decreasing=TRUE)
#> [1] 80 70 40 20 20 10
Get the indices of the sorted values
order(a)
#> [1] 5 2 4 6 1 3
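Indexing an array by its own order() gives the same result as sort(). The indices become more useful when you want to rearrange a second array to match. A sketch, where the tags array is hypothetical:

```r
a = c(70, 20, 80, 20, 10, 40)
tags = c("a", "b", "c", "d", "e", "f")   # hypothetical labels, one per value

idx = order(a)
a[idx]      # same as sort(a)
tags[idx]   # labels follow the sorted values: "e" "b" "d" "f" "a" "c"
```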
2.11 Test membership
To see if an item is in an array, use %in%
9 %in% 1:10
#> [1] TRUE
9:11 %in% 1:10
#> [1] TRUE TRUE FALSE
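Because %in% returns a boolean array, it combines naturally with indexing. A small sketch (the arrays are made up for illustration):

```r
a = 10:20
wanted = c(12, 15, 99)

a[a %in% wanted]      # values of a that appear in wanted: 12 15
a[!(a %in% wanted)]   # everything else
wanted %in% a         # TRUE TRUE FALSE
```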