Chapter 3 Types
For this chapter, you’ll need to load the tidyverse library. More on libraries.
R is dramatic and gives “conflict” messages that look like errors. Ignore them.
library(tidyverse)
3.1 Numbers
R has integers, but number literals default to numeric, which is a double-precision float.
x = 5 # no decimal but still a double
y = x + 1
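To get an actual integer, base R uses an `L` suffix on the literal. The sketch below is illustrative only; the variable names `d` and `n` are not from the original chapter.

```r
d = 5    # a double, even without a decimal point
n = 5L   # the L suffix makes an integer

typeof(d)
#> [1] "double"
typeof(n)
#> [1] "integer"
is.integer(d)
#> [1] FALSE
class(n + 1L) # integer + integer stays integer
#> [1] "integer"
class(n + 1)  # mixing in a double gives a double
#> [1] "numeric"
```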
Most types can be checked with is.[type]() and converted with as.[type]().
is.numeric(5)
#> [1] TRUE
is.numeric("5") # a string is not a numeric
#> [1] FALSE
as.numeric("5") # convert the string to a number
#> [1] 5
Good ol’ floating-point comparison: decimal fractions are not stored exactly, so == can fail.
x = .58
y = .08
x - y == 0.5
#> [1] FALSE
near(x-y, 0.5) # checks if numbers are very close
#> [1] TRUE
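near() comes from the tidyverse (dplyr). If the tidyverse is not loaded, roughly the same check can be sketched in base R with an explicit tolerance or with all.equal(). The tolerance value 1e-8 here is an arbitrary choice for illustration.

```r
x = .58
y = .08

x - y == 0.5                    # exact comparison fails
#> [1] FALSE
abs((x - y) - 0.5) < 1e-8       # compare against a small tolerance instead
#> [1] TRUE
isTRUE(all.equal(x - y, 0.5))   # base R helper with a built-in tolerance
#> [1] TRUE
```

all.equal() returns a description of the difference rather than FALSE when values differ, which is why it is wrapped in isTRUE().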
Numeric division returns a double
9 / 2 # double precision float
#> [1] 4.5
9 %/% 2 # integer division: rounds down to a whole number
#> [1] 4
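A few related cases may be worth seeing side by side: the remainder operator %%, how %/% behaves with negative numbers, and what happens when both operands are integers. This is a small base R sketch, not part of the original examples.

```r
9 %% 2          # remainder after integer division
#> [1] 1
(-9) %/% 2      # rounds toward negative infinity, not toward zero
#> [1] -5
trunc(-9 / 2)   # truncation toward zero, for comparison
#> [1] -4
class(9L %/% 2L) # stays an integer when both operands are integers
#> [1] "integer"
class(9L / 2L)   # / returns a double even for integer operands
#> [1] "numeric"
```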
3.2 Strings
Single and double quotes are interchangeable in R, but a string must start and end with the same kind of quote.
"hello world"
#> [1] "hello world"
'hello world'
#> [1] "hello world"
"single quote ' in a string"
#> [1] "single quote ' in a string"
'double quote " in a string'
#> [1] "double quote \" in a string"
R calls a string a “character”. Notice that it doesn’t call a number a “digit”.
is.character("hello world")
#> [1] TRUE
Strings are not character arrays in R, so array techniques may not work as expected.
String length
length('hello world') # R sees this as one string, not many characters
#> [1] 1
str_length('hello world') # the actual length of the string
#> [1] 11
Substring
str_sub('hello world', 2, 10)
#> [1] "ello worl"
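str_length() and str_sub() come from the tidyverse (stringr). For reference, here is a sketch of the base R counterparts, plus one way to get a true array of characters when array techniques are needed. The names `s` and `chars` are illustrative.

```r
s = 'hello world'

nchar(s)                      # base R counterpart of str_length()
#> [1] 11
substr(s, 2, 10)              # base R counterpart of str_sub() for positive positions
#> [1] "ello worl"
chars = strsplit(s, "")[[1]]  # split into a vector of one-character strings
length(chars)                 # now length() counts characters
#> [1] 11
chars[1:5]
#> [1] "h" "e" "l" "l" "o"
```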
Comparison
'hello' == "hello"
#> [1] TRUE
3.2.1 Strings with special characters
To use special characters in a string, escape them with a backslash (\).
"string with backslashes \\, double quote \", and unicode \u263A"
#> [1] "string with backslashes \\, double quote \", and unicode <U+263A>"
Or you can use the raw string literal r"(text)", which is useful for a Windows path or regular expression
r"(c:\hello\world)"
#> [1] "c:\\hello\\world"
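The doubled backslashes in the printed output are only how R displays the string; the stored string has single backslashes. A short sketch to confirm this, assuming R 4.0.0 or later (when raw string literals were introduced). The variable names are illustrative.

```r
escaped  = "c:\\hello\\world"    # every backslash doubled
raw_path = r"(c:\hello\world)"   # raw string literal, needs R 4.0.0 or later

identical(escaped, raw_path)     # both spellings produce the same string
#> [1] TRUE
writeLines(raw_path)             # writeLines shows the string as stored
#> c:\hello\world
nchar(raw_path)                  # each backslash counts as one character
#> [1] 14
```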
3.2.2 String Concatenation
Concatenate with a space in between
paste('hello', 'world')
#> [1] "hello world"
Use a different separator
paste('hello', 'world', sep='_')
#> [1] "hello_world"
No separator
paste('hello', 'world', sep='')
#> [1] "helloworld"
paste0('hello', 'world') # a shortcut function for no separator
#> [1] "helloworld"
Combine a set of strings into one
paste(c("apple", "orange", "banana"), collapse = ", ")
#> [1] "apple, orange, banana"
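paste() and paste0() also work element by element on arrays, which is often handy for building labels. The sketch below shows that behavior alongside sprintf() as a template-style alternative; the strings are made up for illustration.

```r
paste0("item", 1:3)                         # "item" is recycled across the array
#> [1] "item1" "item2" "item3"
paste(c("a", "b"), c("x", "y"), sep = "-")  # pairs elements position by position
#> [1] "a-x" "b-y"
paste0("item", 1:3, collapse = "+")         # build the pieces, then collapse into one string
#> [1] "item1+item2+item3"
sprintf("%s has %d items", "cart", 3L)      # template-style formatting
#> [1] "cart has 3 items"
```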
See the strings chapter in R4DS for more.
3.3 Dates
See the dates chapter in R4DS.
3.4 Checking the type
What’s the type?
class(5)
#> [1] "numeric"
Remember, a single value is just an array of length one, so class() works the same way on both. Note that 1:5 produces integers.
class(1:5)
#> [1] "integer"
An array can hold only one type, so mixing types converts every element to the most general one (logical, then integer, then numeric, then character).
class(c(5, 'hi', TRUE))
#> [1] "character"
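The conversion can be seen step by step by mixing two types at a time. A minimal base R sketch:

```r
class(c(TRUE, 2L))    # logical + integer -> integer
#> [1] "integer"
class(c(2L, 2.5))     # integer + double -> numeric
#> [1] "numeric"
class(c(2.5, "hi"))   # anything + string -> character
#> [1] "character"
c(5, 'hi', TRUE)      # every element becomes a string
#> [1] "5"    "hi"   "TRUE"
```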
Test if numeric
is.numeric(5)
#> [1] TRUE
Test if string
is.character('hi')
#> [1] TRUE
Test if boolean
is.logical(TRUE)
#> [1] TRUE
3.5 Converting and parsing
Parse or convert to numeric
as.numeric(c("5", TRUE, 1:3, "abc"))
#> Warning: NAs introduced by coercion
#> [1] 5 NA 1 2 3 NA
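When parsing messy input, it can help to find out which values failed rather than only seeing the warning. One possible approach, sketched with made-up input (`raw_values` and `parsed` are illustrative names):

```r
raw_values = c("5", "7.5", "abc")
parsed = suppressWarnings(as.numeric(raw_values)) # silence the expected coercion warning

parsed
#> [1] 5.0 7.5  NA
raw_values[is.na(parsed)]   # which inputs failed to parse
#> [1] "abc"
```

This assumes the input had no missing values to begin with; otherwise an original NA and a failed parse look the same.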
To string
as.character(5)
#> [1] "5"
format(1/3)
#> [1] "0.3333333"
format(1/3 , digits = 16)
#> [1] "0.3333333333333333"
as.character(TRUE)
#> [1] "TRUE"
Convert to boolean. Zero is false. Other numbers are true.
as.logical(0:2)
#> [1] FALSE TRUE TRUE
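Strings follow different rules from numbers: only a few spellings are recognized, and everything else becomes NA. A short base R sketch of both directions:

```r
as.logical(c("TRUE", "true", "T", "FALSE"))  # recognized spellings
#> [1]  TRUE  TRUE  TRUE FALSE
as.logical(c("yes", "0", "abc"))             # other strings become NA
#> [1] NA NA NA
as.logical(-3.5)                             # any nonzero number is TRUE
#> [1] TRUE
as.numeric(c(TRUE, FALSE))                   # the reverse: TRUE is 1, FALSE is 0
#> [1] 1 0
sum(c(TRUE, FALSE, TRUE))                    # so summing a logical array counts the TRUEs
#> [1] 2
```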
3.6 Missing values (NA)
Any type can have missing values.
class(c(1, 2, 3, NA, 5))
#> [1] "numeric"
Missing values are very common in datasets.
is.na(c(NA, 1, ""))
#> [1] TRUE FALSE FALSE
Most operations on NA also yield NA, so you can operate on arrays with missing values and the gaps carry through.
c(5, NA, 7) + 1
#> [1] 6 NA 8
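Because NA propagates through comparisons too, testing for missing values with == does not work; is.na() is the tool for that. A small sketch using the same array as above:

```r
x = c(5, NA, 7)

x == NA          # comparing with NA is itself unknown, so every result is NA
#> [1] NA NA NA
is.na(x)         # the reliable way to find missing values
#> [1] FALSE  TRUE FALSE
sum(is.na(x))    # count the missing values
#> [1] 1
x[!is.na(x)]     # keep only the non-missing values
#> [1] 5 7
NA & FALSE       # an exception: FALSE regardless of what the NA stands for
#> [1] FALSE
```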
Be careful with aggregation functions like min(), max(), and mean(): they return NA if any value is missing. To ignore NAs, use the na.rm parameter.
mean(c(5, NA, 7), na.rm=TRUE)
#> [1] 6
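For contrast, here is a sketch of the same array with and without na.rm, across a few aggregation functions:

```r
x = c(5, NA, 7)

mean(x)                 # without na.rm, a single NA makes the result NA
#> [1] NA
max(x, na.rm = TRUE)
#> [1] 7
sum(x, na.rm = TRUE)
#> [1] 12
# NAs are dropped from the count as well, so the mean is 12 / 2, not 12 / 3
mean(x, na.rm = TRUE) == sum(x, na.rm = TRUE) / sum(!is.na(x))
#> [1] TRUE
```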
3.7 Factor
A factor is like an enum in other languages. It encodes strings as integers via a dictionary.
Create an array with many repeating values
data = sample(c("hello", "cruel", "world"), 12, replace=TRUE)
data
#> [1] "world" "cruel" "world" "hello" "world" "hello" "cruel" "world" "world"
#> [10] "hello" "cruel" "world"
Make it into a factor
data = factor(data)
data
#> [1] world cruel world hello world hello cruel world world hello cruel world
#> Levels: cruel hello world
The array is now an integer array with a dictionary
as.numeric(data)
#> [1] 3 1 3 2 3 2 1 3 3 2 1 3
data[1]
#> [1] world
#> Levels: cruel hello world
See the different values in the array
levels(data)
#> [1] "cruel" "hello" "world"
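One consequence of the integer encoding is a common gotcha: when the strings look like numbers, as.numeric() returns the internal codes rather than the original values. A sketch with made-up data (`sizes` and `priority` are illustrative names), which also shows how to set the level order explicitly:

```r
sizes = factor(c("10", "20", "10", "5"))

levels(sizes)                     # levels are sorted as strings, so "5" comes last
#> [1] "10" "20" "5"
as.numeric(sizes)                 # the internal codes, not the original numbers
#> [1] 1 2 1 3
as.numeric(as.character(sizes))   # go through character to recover the values
#> [1] 10 20 10  5

# Levels can be given explicitly to control their order
priority = factor(c("low", "high", "low"), levels = c("low", "high"))
levels(priority)
#> [1] "low"  "high"
as.numeric(priority)
#> [1] 1 2 1
```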
For more info, see the factors chapter in R4DS.