Chapter 3 Types

For this chapter, you’ll need to load the tidyverse library. More on libraries.

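R is dramatic and gives “conflict” messages that look like errors. They only report that some tidyverse functions share a name with functions already loaded, so you can ignore them.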

library(tidyverse)

3.1 Numbers

R has integers, but number literals default to numeric, which is a double-precision float.

x = 5 # no decimal but still a double
y = x + 1
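If you do want a real integer, the L suffix is the usual way to ask for one. A minimal sketch (the variable name n is only for illustration):

```r
x = 5          # plain number literal
typeof(x)
#> [1] "double"

n = 5L         # the L suffix makes an integer literal
typeof(n)
#> [1] "integer"

is.integer(x)  # no decimal, but still not an integer
#> [1] FALSE

is.numeric(n)  # integers count as numeric too
#> [1] TRUE
```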

Most types can be checked with is.[type]() and can be converted with as.[type]().

is.numeric(5)

#> [1] TRUE

is.numeric("5") # a string is not a numeric

#> [1] FALSE

as.numeric("5") # convert the string to a number

#> [1] 5

Good ol’ floating-point comparison

x = .58
y = .08
x - y == 0.5

#> [1] FALSE

near(x-y, 0.5) # checks if numbers are very close

#> [1] TRUE
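near() comes from the tidyverse. If it isn’t loaded, base R can do roughly the same check. A small sketch, where the 1e-8 tolerance is an arbitrary choice for illustration:

```r
x = .58
y = .08

isTRUE(all.equal(x - y, 0.5))  # base R approximate equality
#> [1] TRUE

abs((x - y) - 0.5) < 1e-8      # manual check against a chosen tolerance
#> [1] TRUE
```

all.equal() returns a description of the difference instead of FALSE when the values differ, which is why it is wrapped in isTRUE().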

Numeric division returns a double

9 / 2 # double precision float

#> [1] 4.5

9 %/% 2 # integer division: rounds down to a whole number

#> [1] 4
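The companion operator is %% for the remainder. With negative numbers, %/% rounds down rather than toward zero, which the following sketch illustrates:

```r
9 %% 2          # remainder
#> [1] 1

(-9) %/% 2      # rounds down, not toward zero
#> [1] -5

(-9) %% 2       # remainder takes the sign of the divisor
#> [1] 1

trunc(-9 / 2)   # use trunc() to round toward zero instead
#> [1] -4
```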

Single and double quotes are interchangeable in R, but a string must open and close with the same kind of quote

"hello world"

#> [1] "hello world"

'hello world'

#> [1] "hello world"

"single quote ' in a string"

#> [1] "single quote ' in a string"

'double quote " in a string'

#> [1] "double quote \" in a string"

R calls a string a “character”. Notice that it doesn’t call a number a “digit”.

is.character("hello world")

#> [1] TRUE

Strings are not character arrays in R, so array techniques may not work as expected.

String length

length('hello world') # R sees this as one string, not many characters

#> [1] 1

str_length('hello world') # the actual length of the string

#> [1] 11

Substring

str_sub('hello world', 2, 10)

#> [1] "ello worl"
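If you really do need an array of single characters, one option is to split the string first. This is a base R sketch, and chars is just an illustrative name:

```r
chars = strsplit("hello world", "")[[1]]  # split into single characters
length(chars)
#> [1] 11

chars[1]
#> [1] "h"

paste(rev(chars), collapse = "")          # reverse, then glue back together
#> [1] "dlrow olleh"
```

strsplit() returns a list with one element per input string, hence the [[1]].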

Comparison

'hello' == "hello"

#> [1] TRUE

If you want to use special characters in a string, you need to “escape” them by adding a backslash (\) in front

"string with backslashes \\, double quote \", and unicode \u263A"

#> [1] "string with backslashes \\, double quote \", and unicode <U+263A>"

Or you can use the raw string literal r"(text)" (available since R 4.0.0), which is useful for a Windows path or regular expression

r"(c:\hello\world)"

#> [1] "c:\\hello\\world"
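The doubled backslashes in the printed output are only how R displays the string. A short sketch, assuming R 4.0.0 or later, shows that the stored text has single backslashes (path is an illustrative name):

```r
path = r"(c:\hello\world)"

identical(path, "c:\\hello\\world")  # same string, two ways of typing it
#> [1] TRUE

writeLines(path)                     # shows the actual contents
#> c:\hello\world

nchar(path)                          # each backslash counts once
#> [1] 14

grepl(r"(\d+)", "abc123")            # regex without double escaping
#> [1] TRUE
```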

Concatenate with a space in between

paste('hello', 'world')

#> [1] "hello world"

Use a different separator

paste('hello', 'world', sep='_')

#> [1] "hello_world"

No separator

paste('hello', 'world', sep='')

#> [1] "helloworld"

paste0('hello', 'world') # a shortcut function for no separator

#> [1] "helloworld"

Combine a set of strings into one

paste(c("apple", "orange", "banana"), collapse = ", ")

#> [1] "apple, orange, banana"
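paste() also works element by element on arrays, and sep and collapse can be combined. A brief sketch:

```r
paste0("item", 1:3)                                 # the single string is recycled
#> [1] "item1" "item2" "item3"

paste(c("a", "b"), 1:2, sep = "-")                  # pairs up the elements
#> [1] "a-1" "b-2"

paste(c("a", "b"), 1:2, sep = "-", collapse = "+")  # then joins the results
#> [1] "a-1+b-2"
```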

What’s the type?

class(5)

#> [1] "numeric"

Remember, a single value is just an array of length one, so class() works on arrays too. Note that the : operator produces integers.

class(1:5)

#> [1] "integer"

An array with multiple types converts the elements to the most general type (logical, then numeric, then character)

class(c(5, 'hi', TRUE))

#> [1] "character"
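To see the conversion step by step, here is a small sketch that mixes two types at a time:

```r
class(c(TRUE, 5L))     # logical + integer
#> [1] "integer"

class(c(5L, 2.5))      # integer + double
#> [1] "numeric"

class(c(2.5, "hi"))    # double + string
#> [1] "character"

c(5, 'hi', TRUE)       # everything ends up as a string
#> [1] "5"    "hi"   "TRUE"
```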

Test if numeric

is.numeric(5)

#> [1] TRUE

Test if string

is.character('hi')

#> [1] TRUE

Test if boolean

is.logical(TRUE)

#> [1] TRUE

Parse or convert to numeric

as.numeric(c("5", TRUE, 1:3, "abc"))

#> Warning: NAs introduced by coercion

#> [1]  5 NA  1  2  3 NA
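The NA in the second position may look surprising. The likely explanation is that c() turns everything into strings before as.numeric() sees it, so TRUE arrives as "TRUE". A sketch to check this (mixed is an illustrative name):

```r
mixed = c("5", TRUE, 1:3, "abc")
class(mixed)                          # already strings before conversion
#> [1] "character"

as.numeric(TRUE)                      # a real boolean converts fine
#> [1] 1

suppressWarnings(as.numeric("TRUE"))  # the string "TRUE" does not
#> [1] NA
```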

To string

as.character(5)

#> [1] "5"

format(1/3)

#> [1] "0.3333333"

format(1/3 , digits = 16)

#> [1] "0.3333333333333333"

as.character(TRUE)

#> [1] "TRUE"

Convert to boolean. Zero is false. Other numbers are true.

as.logical(0:2)

#> [1] FALSE  TRUE  TRUE
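Strings are stricter than numbers: only a few spellings of true and false are recognized, and anything else becomes NA. A sketch of that behavior, plus the reverse direction where booleans act as 0 and 1:

```r
as.logical(c("TRUE", "true", "T", "yes", "0"))
#> [1] TRUE TRUE TRUE   NA   NA

as.logical(-1.5)            # any non-zero number is true
#> [1] TRUE

sum(c(TRUE, FALSE, TRUE))   # TRUE counts as 1, FALSE as 0
#> [1] 2
```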

Any type can have missing values.

class(c(1, 2, 3, NA, 5))

#> [1] "numeric"
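A plain NA is a boolean by default, but it adapts to the array it sits in, and R also has typed versions. A quick sketch:

```r
class(NA)              # on its own, NA is logical
#> [1] "logical"

class(NA_integer_)     # typed missing values exist too
#> [1] "integer"

class(NA_character_)
#> [1] "character"

class(c("a", NA))      # NA takes on the type of its neighbors
#> [1] "character"
```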

Missing values are very common in datasets.

is.na(c(NA, 1, ""))

#> [1]  TRUE FALSE FALSE
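Note that comparing with == does not find missing values, because the comparison itself comes back missing. A minimal sketch:

```r
x = c(5, NA, 7)

x == NA          # always NA, never TRUE
#> [1] NA NA NA

is.na(x)         # the right way to test
#> [1] FALSE  TRUE FALSE

sum(is.na(x))    # count the missing values
#> [1] 1
```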

Most operations performed on NA also yield NA, so you can operate on arrays with missing values.

c(5, NA, 7) + 1

#> [1]  6 NA  8
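There are a few exceptions where the answer does not depend on the missing value, so R can still give a result. A short sketch:

```r
NA > 1       # unknown, so NA
#> [1] NA

NA & FALSE   # FALSE no matter what the NA is
#> [1] FALSE

NA | TRUE    # TRUE no matter what the NA is
#> [1] TRUE

NA^0         # anything to the power 0 is 1
#> [1] 1
```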

Be careful about aggregation functions like min(), max(), and mean(): by default they return NA if any value is missing. To ignore NAs, use the na.rm parameter.

mean(c(5, NA, 7), na.rm=TRUE)

#> [1] 6
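For comparison, here is a sketch of what happens without na.rm, along with a couple of other ways to skip the missing values:

```r
x = c(5, NA, 7)

mean(x)                # one NA poisons the result
#> [1] NA

max(x, na.rm = TRUE)
#> [1] 7

sum(x, na.rm = TRUE)
#> [1] 12

mean(x[!is.na(x)])     # or filter the NAs out yourself
#> [1] 6
```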

A factor is like an enum in other languages. It encodes strings as integers via a dictionary.

Create an array with many repeating values

data = sample(c("hello", "cruel", "world"), 12, replace=TRUE)
data

#>  [1] "world" "cruel" "world" "hello" "world" "hello" "cruel" "world" "world"
#> [10] "hello" "cruel" "world"

Make it into a factor

data = factor(data)
data

#>  [1] world cruel world hello world hello cruel world world hello cruel world
#> Levels: cruel hello world

The array is now an integer array with a dictionary (the levels)

as.numeric(data)

#>  [1] 3 1 3 2 3 2 1 3 3 2 1 3

data[1]

#> [1] world
#> Levels: cruel hello world

See the different values in the array

levels(data)

#> [1] "cruel" "hello" "world"
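Two things worth knowing about the integer codes. This sketch uses made-up values; f and sizes are illustrative names. Converting a factor of number-like strings with as.numeric() returns the codes rather than the numbers, and you can set the level order yourself instead of taking the alphabetical default:

```r
f = factor(c("10", "20", "10"))
as.numeric(f)                  # level codes, not the original numbers
#> [1] 1 2 1

as.numeric(as.character(f))    # go through strings to recover the values
#> [1] 10 20 10

sizes = factor(c("low", "high", "low"), levels = c("low", "medium", "high"))
levels(sizes)                  # order as given, including an unused level
#> [1] "low"    "medium" "high"

as.integer(sizes)              # codes follow that order
#> [1] 1 3 1
```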

For more info, see the factors chapter in R4DS.