Chapter 3 Types

For this chapter, you’ll need to load the tidyverse library. More on libraries.

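R is dramatic: loading the tidyverse prints “conflict” messages that look like errors. They only tell you that a few tidyverse functions, such as dplyr's filter() and lag(), mask functions with the same names that come with R. Ignore them.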

library(tidyverse)

3.1 Numbers

R has an integer type, but number literals default to numeric, which is a double-precision float. Add an L suffix (5L) to get an integer.

x = 5 # no decimal but still a double
y = x + 1
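A quick way to see the difference, using only base R: typeof() reports the underlying storage type, and the L suffix makes an integer literal. The variable names d and i are just for illustration.

```r
d = 5           # a plain number literal is a double
i = 5L          # the L suffix makes an integer
is.integer(d)   # FALSE
is.integer(i)   # TRUE
typeof(d)       # "double"
typeof(i)       # "integer"
typeof(i + 1)   # "double": mixing in a double promotes the result
typeof(i + 1L)  # "integer": integer plus integer stays integer
```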

Most types can be checked with is.[type]() and can be converted with as.[type]().

is.numeric(5)
#> [1] TRUE
is.numeric("5") # a string is not a numeric
#> [1] FALSE
as.numeric("5") # convert the string to a number
#> [1] 5

Good ol’ floating-point comparison. Decimals like 0.58 can't be stored exactly, so a tiny rounding error sneaks in.

x = .58
y = .08
x - y == 0.5
#> [1] FALSE
near(x - y, 0.5) # dplyr's near() checks if numbers are equal within a small tolerance
#> [1] TRUE
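To see what is going on, print the difference with more digits. This is a base R sketch: all.equal() also compares within a small tolerance, but it returns a description instead of FALSE when the values differ, so wrap it in isTRUE(). The 1e-8 tolerance in the hand-written check is an arbitrary choice.

```r
x = 0.58
y = 0.08
print(x - y, digits = 17)       # prints a value slightly less than 0.5
abs((x - y) - 0.5) < 1e-8       # TRUE: compare within a tolerance by hand
isTRUE(all.equal(x - y, 0.5))   # TRUE: base R's tolerance-based comparison
```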

Division with / returns a double, even when both numbers are whole. Use %/% for integer division.

9 / 2 # double precision float
#> [1] 4.5
9 %/% 2 # integer division: rounds the result down
#> [1] 4
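%/% rounds down rather than chopping off the decimal, which only makes a difference for negative numbers. A small sketch with its partner %%, the remainder operator, and trunc() for comparison:

```r
9 %% 2            # 1: the remainder
(-9) %/% 2        # -5: %/% rounds down (floor), not toward zero
(-9) %% 2         # 1: quotient * 2 + remainder gives back -9
trunc(-9 / 2)     # -4: trunc() drops the part after the decimal
class(9 %/% 2)    # "numeric": %/% on doubles still returns a double
class(9L %/% 2L)  # "integer"
```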

3.2 Strings

Single and double quotes mean the same thing in R, but a string must start and end with the same kind of quote. That lets you put one kind of quote inside a string wrapped in the other.

"hello world"
#> [1] "hello world"
'hello world'
#> [1] "hello world"
"single quote ' in a string"
#> [1] "single quote ' in a string"
'double quote " in a string'
#> [1] "double quote \" in a string"

R calls a string a “character”. Notice that it doesn’t call a number a “digit”.

is.character("hello world")
#> [1] TRUE

A string in R is a single value, not an array of characters, so array techniques like length() or indexing with [ ] work on whole strings, not on individual characters.

String length

length('hello world') # R sees this as one string, not many characters
#> [1] 1
str_length('hello world') # the actual length of the string
#> [1] 11
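Because length() counts strings, not characters, it helps to try both on an array of several strings. This sketch uses base R's nchar(), which counts characters like str_length() does; strsplit() with an empty separator is one way to get the individual characters when you really need them. The words are illustrative.

```r
words = c("hello", "cruel", "world!")
length(words)               # 3: the number of strings
nchar(words)                # 5 5 6: characters in each string
strsplit("hello", "")[[1]]  # "h" "e" "l" "l" "o": split into single characters
```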

Substring

str_sub('hello world', 2, 10)
#> [1] "ello worl"
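The positions are 1-based and both ends are included. Base R's substr() takes the same arguments, and substring() lets you leave off the end position. A sketch:

```r
s = "hello world"
substr(s, 2, 10)   # "ello worl": same result as str_sub(s, 2, 10)
substr(s, 1, 5)    # "hello"
substring(s, 7)    # "world": from position 7 to the end
```

str_sub() also accepts negative positions counted from the end, so str_sub(s, -5, -1) returns "world" as well.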

Comparison

'hello' == "hello"
#> [1] TRUE
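== compares strings exactly, including case. A short sketch of a case-insensitive check using base R's tolower() (stringr's str_to_lower() does the same job):

```r
"Hello" == "hello"                       # FALSE: comparison is case-sensitive
tolower("Hello") == "hello"              # TRUE
c("apple", "Apple", "APPLE") == "apple"  # TRUE FALSE FALSE: compares element by element
```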

3.2.1 Strings with special characters

If you want to use special characters in a string, you need to “escape” them by putting a backslash (\) in front.

"string with backslashes \\, double quote \", and unicode \u263A"
#> [1] "string with backslashes \\, double quote \", and unicode <U+263A>"
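Printing a string shows it with the escapes still in place. To see the text itself, write it out with writeLines() (or cat()). A sketch:

```r
s = "backslash \\ and double quote \""
print(s)        # shows the escaped form, with \\ and \"
writeLines(s)   # shows the actual text: backslash \ and double quote "
nchar("\\")     # 1: an escape sequence is a single character
nchar("\n")     # 1: \n is the newline character
```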

Or, in R 4.0.0 and later, you can use a raw string, r"(text)", where backslashes are kept as-is. This is useful for Windows paths and regular expressions.

r"(c:\hello\world)"
#> [1] "c:\\hello\\world"
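A raw string and its escaped version are the same string, just written differently. Raw strings need R 4.0.0 or later. The sketch below also uses one as a regular expression with base R's grepl(), where \d matches a digit; the test text is made up.

```r
identical(r"(c:\hello\world)", "c:\\hello\\world")  # TRUE: same string
grepl(r"(\d+)", "order 66")   # TRUE: the pattern is \d+, no doubling needed
grepl("\\d+", "order 66")     # TRUE: the same pattern as an ordinary string
```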

3.2.2 String Concatenation

Concatenate with a space in between

paste('hello', 'world')
#> [1] "hello world"

Use a different separator

paste('hello', 'world', sep='_')
#> [1] "hello_world"

No separator

paste('hello', 'world', sep='')
#> [1] "helloworld"
paste0('hello', 'world') # a shortcut function for no separator
#> [1] "helloworld"

Combine an array of strings into one string with collapse

paste(c("apple", "orange", "banana"), collapse = ", ")
#> [1] "apple, orange, banana"
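paste() also works element by element on arrays: sep goes between the pieces of each element, and collapse then joins the results into one string. A sketch with made-up values:

```r
paste(c("a", "b", "c"), 1:3)                              # "a 1" "b 2" "c 3"
paste(c("a", "b", "c"), 1:3, sep = "-", collapse = ", ")  # "a-1, b-2, c-3"
paste0("item", 1:3)   # "item1" "item2" "item3": the single string is reused
```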

See the strings chapter in R4DS for more.

3.3 Dates

See the dates chapter in R4DS.

3.4 Checking the type

What’s the type?

class(5)
#> [1] "numeric"

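Remember, R has no separate type for single values: 5 is just an array of length 1, so class() reports the type of the elements whether you pass one value or many.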

class(1:5)
#> [1] "integer"
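A sketch of the same idea with a few more checks: a single value has length 1, c() of number literals gives doubles, and the : operator gives integers. typeof() shows the storage type when class() isn't specific enough.

```r
length(5)            # 1: a single value is an array of length 1
identical(5, c(5))   # TRUE
class(c(1, 2, 3))    # "numeric": number literals are doubles
class(1:3)           # "integer": the : operator makes integers
typeof(5)            # "double": the storage type behind "numeric"
```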

An array holds only one type, so an array built from mixed types converts every element to the most general type, in the order logical → integer → numeric → character.

class(c(5, 'hi', TRUE))
#> [1] "character"
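Adding one more general type at a time shows the order of the conversions. A sketch:

```r
class(c(TRUE, FALSE))         # "logical"
class(c(TRUE, 2L))            # "integer": TRUE becomes 1L
class(c(TRUE, 2L, 3.5))       # "numeric"
class(c(TRUE, 2L, 3.5, "a"))  # "character"
c(5, "hi", TRUE)              # "5" "hi" "TRUE": every element is now a string
```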

Test if numeric

is.numeric(5)
#> [1] TRUE

Test if string

is.character('hi')
#> [1] TRUE

Test if boolean (R calls booleans “logical”)

is.logical(TRUE)
#> [1] TRUE

3.5 Converting and parsing

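Parse or convert to numeric. Strings that aren't numbers become NA, with a warning. Note that TRUE also ends up NA below: c() first converts every element to character, and the string "TRUE" isn't a number.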

as.numeric(c("5", TRUE, 1:3, "abc"))
#> Warning: NAs introduced by coercion
#> [1]  5 NA  1  2  3 NA
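Converting values one at a time avoids that detour through character. A sketch; suppressWarnings() only hides the coercion warning for the string that can't be parsed:

```r
as.numeric(TRUE)                      # 1: a logical converts directly
suppressWarnings(as.numeric("TRUE"))  # NA: the string "TRUE" is not a number
as.numeric("1e3")                     # 1000: scientific notation is parsed
as.integer("3.7")                     # 3: as.integer() drops the decimal part
```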

To string

as.character(5)
#> [1] "5"
format(1/3)
#> [1] "0.3333333"
format(1/3 , digits = 16)
#> [1] "0.3333333333333333"
as.character(TRUE)
#> [1] "TRUE"
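For more control over how a number looks as a string, base R's sprintf() and the big.mark argument of format() are handy. A sketch:

```r
sprintf("%.2f", 1/3)             # "0.33": two decimal places
sprintf("%d items", 5L)          # "5 items"
format(1234567, big.mark = ",")  # "1,234,567": thousands separators
```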

Convert to boolean. Zero is false. Other numbers are true.

as.logical(0:2)
#> [1] FALSE  TRUE  TRUE
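Strings convert too, but only a few spellings count as true or false; anything else becomes NA. Going the other way, TRUE counts as 1 in arithmetic, which makes sum() a quick way to count. A sketch:

```r
as.logical(c("TRUE", "true", "T", "False", "yes"))  # TRUE TRUE TRUE FALSE NA
as.logical(-1.5)                                    # TRUE: any non-zero number
sum(c(TRUE, FALSE, TRUE))                           # 2: TRUE counts as 1
```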

3.6 Missing values (NA)

An array of any type can contain missing values, written NA.

class(c(1, 2, 3, NA, 5))
#> [1] "numeric"
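NA itself adapts to the array it sits in. A bare NA is logical, and there are typed versions such as NA_character_ when you need a specific one. A sketch:

```r
typeof(NA)             # "logical": a bare NA is logical
typeof(c(1, NA))       # "double": NA takes the type of the array
typeof(c("a", NA))     # "character"
typeof(NA_character_)  # "character": a typed missing value
```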

Missing values are very common in datasets. Use is.na() to find them. Note that an empty string is not missing.

is.na(c(NA, 1, ""))
#> [1]  TRUE FALSE FALSE

Most operations on NA also yield NA, so you can still do arithmetic on arrays with missing values: the missing elements just stay missing.

c(5, NA, 7) + 1
#> [1]  6 NA  8
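This is also why you test with is.na() instead of ==: comparing with NA gives NA. The sketch below also shows the exceptions to the rule, where the answer doesn't depend on the missing value:

```r
x = c(5, NA, 7)
x == NA      # NA NA NA: comparing with an unknown value is unknown
is.na(x)     # FALSE TRUE FALSE
NA^0         # 1: anything to the power 0 is 1
NA & FALSE   # FALSE: false whatever NA turns out to be
NA | TRUE    # TRUE: true whatever NA turns out to be
```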

Be careful with aggregation functions like min(), max(), mean(), and sum(): a single NA makes the result NA. To ignore NAs, set the na.rm parameter to TRUE.

mean(c(5, NA, 7), na.rm=TRUE)
#> [1] 6
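A sketch of the difference na.rm makes, plus two common ways to count or drop the missing values yourself:

```r
x = c(5, NA, 7)
mean(x)                # NA: one missing value spoils the result
mean(x, na.rm = TRUE)  # 6
sum(is.na(x))          # 1: the number of missing values
x[!is.na(x)]           # 5 7: keep only the values that are present
```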

3.7 Factor

A factor is like an enum in other languages. It stores each value as an integer code plus a dictionary of labels, called levels.

Create an array with many repeating values. sample() picks values at random, so your output will differ.

data = sample(c("hello", "cruel", "world"), 12, replace=TRUE)
data
#>  [1] "world" "cruel" "world" "hello" "world" "hello" "cruel" "world" "world"
#> [10] "hello" "cruel" "world"

Make it into a factor

data = factor(data)
data
#>  [1] world cruel world hello world hello cruel world world hello cruel world
#> Levels: cruel hello world

The factor is now an integer array with a dictionary. The levels are sorted alphabetically by default, so cruel = 1, hello = 2, and world = 3.

as.numeric(data)
#>  [1] 3 1 3 2 3 2 1 3 3 2 1 3
data[1]
#> [1] world
#> Levels: cruel hello world
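The codes matter when a factor holds numbers stored as text: as.numeric() returns the codes, not the values. A sketch of the usual fix, converting to character first (the values are illustrative):

```r
f = factor(c("10", "20", "10"))
as.numeric(f)                # 1 2 1: the integer codes
as.numeric(as.character(f))  # 10 20 10: the actual values
```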

See the distinct values (the levels)

levels(data)
#> [1] "cruel" "hello" "world"
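Alphabetical levels are just the default. Passing levels to factor() sets the order yourself, which matters for ordered categories, and any value not in levels becomes NA. A sketch with made-up sizes:

```r
sizes = factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))
levels(sizes)      # "small" "medium" "large": your order, not alphabetical
as.integer(sizes)  # 1 3 2 1
table(sizes)       # how many times each level appears
factor(c("small", "huge"), levels = c("small", "medium", "large"))  # "huge" becomes NA
```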

For more info, see the factors chapter in R4DS.