Chapter 3 Types
For this chapter, you’ll need to load the tidyverse library. More on libraries.
R is dramatic and gives “conflict” messages that look like errors. Ignore them.
library(tidyverse)
3.1 Numbers
R has an integer type, but number literals default to numeric, which is a double-precision float.
x = 5 # no decimal but still a double
y = x + 1
Most types can be checked with is.[type]() and converted with as.[type]().
is.numeric(5)
#> [1] TRUE
is.numeric("5") # a string is not a numeric
#> [1] FALSE
as.numeric("5") # convert the string to a number
#> [1] 5
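To make the integer versus double distinction concrete, here is a small sketch in base R. The L suffix and typeof() are not used elsewhere in this chapter, and the variable names are only for illustration.

```r
a = 5    # a plain literal is stored as a double
b = 5L   # the L suffix asks for an integer instead

typeof(a)
#> [1] "double"
typeof(b)
#> [1] "integer"

is.numeric(b)   # integers count as numeric too
#> [1] TRUE

n = as.integer("5")   # convert a string straight to an integer
typeof(n)
#> [1] "integer"
```

class() reports "numeric" for a double, while typeof() shows the underlying storage type.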
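Good ol’ floating-point comparison: values like .58 and .08 have no exact binary representation, so an exact equality check can fail.
x = .58
y = .08
x - y == 0.5
#> [1] FALSE
near(x - y, 0.5) # from dplyr: checks if numbers are very close
#> [1] TRUE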
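If the tidyverse is not loaded, a similar check can be sketched in base R. This uses all.equal() and a hand-rolled tolerance; the 1e-8 threshold is an arbitrary choice for illustration, not a recommended value.

```r
x = .58
y = .08
d = x - y

exact = d == 0.5                        # exact comparison, fails here
close = isTRUE(all.equal(d, 0.5))       # base R comparison with a tolerance
manual = abs(d - 0.5) < 1e-8            # the same idea written by hand

exact
#> [1] FALSE
close
#> [1] TRUE
manual
#> [1] TRUE
```

all.equal() returns a description of the difference rather than FALSE when values differ, which is why it is wrapped in isTRUE().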
Numeric division returns a double
9 / 2 # double precision float
#> [1] 4.5
9 %/% 2 # integer division: rounds the quotient down to a whole number
#> [1] 4
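Integer division pairs with the modulo operator %%, and the two behave differently from truncation when a number is negative. A short sketch (variable names are illustrative):

```r
q_pos = 9 %/% 2        # quotient
r_pos = 9 %% 2         # remainder
q_pos
#> [1] 4
r_pos
#> [1] 1

q_neg = (-9) %/% 2     # rounds toward negative infinity, not toward zero
r_neg = (-9) %% 2      # remainder takes the sign of the divisor
q_neg
#> [1] -5
r_neg
#> [1] 1

t_neg = trunc(-9 / 2)  # use trunc() to drop the fractional part instead
t_neg
#> [1] -4
```

In both cases the quotient times 2 plus the remainder gives back the original number.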
3.2 Strings
Single and double quotes are equivalent in R, but a string must begin and end with the same kind of quote.
"hello world"
#> [1] "hello world"
'hello world'
#> [1] "hello world"
"single quote ' in a string"
#> [1] "single quote ' in a string"
'double quote " in a string'
#> [1] "double quote \" in a string"
R calls a string a “character”. Notice that it doesn’t call a number a “digit”.
is.character("hello world")
#> [1] TRUE
Strings are not character arrays in R, so array techniques may not work as expected.
String length
length('hello world') # R sees this as one string, not many characters
#> [1] 1
str_length('hello world') # the actual length of the string
#> [1] 11
Substring
str_sub('hello world', 2, 10)
#> [1] "ello worl"
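str_length() and str_sub() come from stringr, which the tidyverse loads. For comparison, here is a sketch of base R equivalents, plus one way to get an actual array of characters. The variable names are illustrative.

```r
s = 'hello world'

n_chars = nchar(s)                 # base R counterpart of str_length()
n_chars
#> [1] 11

piece = substr(s, 2, 10)           # base R counterpart of str_sub()
piece
#> [1] "ello worl"

chars = strsplit(s, "")[[1]]       # split into one-character strings
length(chars)                      # now length() counts characters
#> [1] 11
```

strsplit() returns a list with one element per input string, hence the [[1]].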
Comparison
'hello' == "hello"
#> [1] TRUE
3.2.1 Strings with special characters
To use a special character in a string, “escape” it with a preceding backslash (\).
"string with backslashes \\, double quote \", and unicode \u263A"
#> [1] "string with backslashes \\, double quote \", and unicode <U+263A>"
Or you can use the raw string literal r"(text)" (available since R 4.0.0), which is useful for a Windows path or regular expression.
r"(c:\hello\world)"
#> [1] "c:\\hello\\world"
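The doubled backslashes in printed output are only how R displays the string; the string itself holds single backslashes. A small sketch, assuming R 4.0.0 or later for the raw literal (variable names are illustrative):

```r
escaped = "c:\\hello\\world"      # classic escaped form
raw_path = r"(c:\hello\world)"    # raw literal, no escaping needed

same = identical(escaped, raw_path)   # both spell the same string
same
#> [1] TRUE

n = nchar(escaped)    # each backslash counts as one character
n
#> [1] 14

writeLines(escaped)   # shows the actual content, unlike print()
#> c:\hello\world
```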
3.2.2 String Concatenation
Concatenate with a space in between
paste('hello', 'world')
#> [1] "hello world"
Use a different separator
paste('hello', 'world', sep='_')
#> [1] "hello_world"
No separator
paste('hello', 'world', sep='')
#> [1] "helloworld"
paste0('hello', 'world') # a shortcut function for no separator
#> [1] "helloworld"
Combine a set of strings into one
paste(c("apple", "orange", "banana"), collapse = ", ")
#> [1] "apple, orange, banana"
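paste() also works element by element on arrays, and sep and collapse can be combined. A brief sketch with made-up values:

```r
ids = paste0("x", 1:3)    # the single string "x" is reused for each number
ids
#> [1] "x1" "x2" "x3"

tags = paste(c("a", "b"), 1:2, sep = "-")    # pairs up elements by position
tags
#> [1] "a-1" "b-2"

joined = paste(c("a", "b"), 1:2, sep = "-", collapse = "+")   # then joins the results
joined
#> [1] "a-1+b-2"
```

sep goes between the arguments of each element; collapse goes between the resulting elements.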
See the strings chapter in R4DS for more.
3.3 Dates
See the dates chapter in R4DS.
3.4 Checking the type
What’s the type?
class(5)
#> [1] "numeric"
Remember, a single value is just an array of length one, so class() works the same way on arrays.
class(1:5) # the : operator produces integers
#> [1] "integer"
An array with multiple types converts its elements to the most general type (here, character).
class(c(5, 'hi', TRUE))
#> [1] "character"
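The conversion follows a fixed order: logical, then integer, then double, then character. The sketch below steps through it and shows that a list, unlike an array, keeps each element's type. Variable names are illustrative.

```r
class(c(TRUE, 2L))      # logical + integer
#> [1] "integer"
class(c(2L, 2.5))       # integer + double
#> [1] "numeric"
class(c(2.5, "hi"))     # double + character
#> [1] "character"

mixed = c(5, 'hi', TRUE)    # everything becomes a string
mixed
#> [1] "5"    "hi"   "TRUE"

kept = sapply(list(5, 'hi', TRUE), class)   # a list preserves each type
kept
#> [1] "numeric"   "character" "logical"
```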
Test if numeric
is.numeric(5)
#> [1] TRUE
Test if string
is.character('hi')
#> [1] TRUE
Test if boolean
is.logical(TRUE)
#> [1] TRUE
3.5 Converting and parsing
Parse or convert to numeric. Values that cannot be parsed become NA, with a warning.
as.numeric(c("5", TRUE, 1:3, "abc"))
#> Warning: NAs introduced by coercion
#> [1] 5 NA 1 2 3 NA
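TRUE turns into NA in that example because c() first converts every element to a string, and the string "TRUE" is not a number. The sketch below shows the difference and one possible way to find the values that failed to parse; the variable names are illustrative.

```r
alone = as.numeric(TRUE)    # a logical on its own converts cleanly
alone
#> [1] 1

mixed = c("5", TRUE, "abc")   # c() turns TRUE into the string "TRUE"
mixed
#> [1] "5"    "TRUE" "abc"

parsed = suppressWarnings(as.numeric(mixed))   # silence the coercion warning
parsed
#> [1]  5 NA NA

failed = mixed[is.na(parsed)]   # which inputs could not be parsed
failed
#> [1] "TRUE" "abc"
```

Suppressing the warning is only reasonable when the NAs are checked afterwards, as done here.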
To string
as.character(5)
#> [1] "5"
format(1/3)
#> [1] "0.3333333"
format(1/3 , digits = 16)
#> [1] "0.3333333333333333"
as.character(TRUE)
#> [1] "TRUE"
Convert to boolean. Zero is false. Other numbers are true.
as.logical(0:2)
#> [1] FALSE TRUE TRUE
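Strings convert to boolean by different rules than numbers: only a few spellings are recognized and everything else becomes NA. A short sketch with made-up inputs:

```r
from_text = as.logical(c("TRUE", "true", "T", "false", "yes", "0"))
from_text   # "yes" and the string "0" are not recognized
#> [1]  TRUE  TRUE  TRUE FALSE    NA    NA

from_num = as.logical(c(0, 1, -3.5))   # any non-zero number is TRUE
from_num
#> [1] FALSE  TRUE  TRUE

back = as.numeric(c(TRUE, FALSE))   # going the other way gives 1 and 0
back
#> [1] 1 0

n_true = sum(c(TRUE, FALSE, TRUE))  # so sum() counts the TRUE values
n_true
#> [1] 2
```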
3.6 Missing values (NA)
Any type can have missing values.
class(c(1, 2, 3, NA, 5))
#> [1] "numeric"
Missing values are very common in datasets. Test for them with is.na(); note that an empty string is not missing.
is.na(c(NA, 1, ""))
#> [1] TRUE FALSE FALSE
Most operations on NA also yield NA, so you can operate on arrays with missing values and the gaps carry through.
c(5, NA, 7) + 1
#> [1] 6 NA 8
Be careful with aggregation functions like min(), max(), and mean(): they return NA if any element is NA. To ignore NAs, use the na.rm parameter.
mean(c(5, NA, 7), na.rm=TRUE)
#> [1] 6
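A few more NA behaviors worth seeing side by side, sketched with the same array. The variable names are illustrative.

```r
x = c(5, NA, 7)

plain = mean(x)                  # without na.rm the result is NA
plain
#> [1] NA

clean = mean(x, na.rm = TRUE)    # NAs dropped before averaging
clean
#> [1] 6

n_missing = sum(is.na(x))        # count the missing values
n_missing
#> [1] 1

present = x[!is.na(x)]           # keep only the non-missing values
present
#> [1] 5 7

NA == NA    # comparing with == yields NA, so use is.na() to test
#> [1] NA
```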
3.7 Factor
A factor is like an enum in other languages. It encodes strings as integers via a dictionary.
Create an array with many repeating values
data = sample(c("hello", "cruel", "world"), 12, replace=TRUE)
data # sample() is random, so your values will differ
#> [1] "world" "cruel" "world" "hello" "world" "hello" "cruel" "world" "world"
#> [10] "hello" "cruel" "world"
Make it into a factor
data = factor(data)
data
#> [1] world cruel world hello world hello cruel world world hello cruel world
#> Levels: cruel hello world
The array is now stored as integer codes plus a dictionary of levels
as.numeric(data)
#> [1] 3 1 3 2 3 2 1 3 3 2 1 3
data[1]
#> [1] world
#> Levels: cruel hello world
See the distinct values (levels) in the array
levels(data)
#> [1] "cruel" "hello" "world"
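Because a factor stores integer codes, converting one that holds number-like strings can surprise you. The sketch below uses small made-up factors to show the pitfall, a common workaround, and how to set the level order yourself.

```r
f = factor(c("10", "20", "10"))

codes = as.integer(f)                   # the internal codes, not the numbers
codes
#> [1] 1 2 1

values = as.numeric(as.character(f))    # go through character to get the numbers
values
#> [1] 10 20 10

counts = table(f)                       # count occurrences of each level
counts[["10"]]
#> [1] 2

# levels default to sorted order; pass levels= to choose the order
sizes = factor(c("small", "large", "small"),
               levels = c("small", "medium", "large"))
as.integer(sizes)
#> [1] 1 3 1
```

A level can exist without appearing in the data, as "medium" does here.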
For more info, see the factors chapter in R4DS.