Chapter 1 Introduction to R
1.1 Basic arithmetic
# An addition
5 + 5
## [1] 10
# A subtraction
5 - 5
## [1] 0
# A multiplication
3 * 5
## [1] 15
# A division
(5 + 5) / 2
## [1] 5
# Exponentiation (raising to a power)
2 ^ 5
## [1] 32
# Modulo (remainder after division)
28 %% 5
## [1] 3
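A small sketch of two points from the examples above: parentheses change the order of evaluation, and the remainder from %% pairs with integer division. The %/% operator and the variable names are not part of the original notes and are used here only for illustration.

```r
# Without parentheses, division is evaluated before addition
no_parens   <- 5 + 5 / 2     # 7.5
with_parens <- (5 + 5) / 2   # 5

# %/% is integer division; %% is the remainder
quotient  <- 28 %/% 5        # 5
remainder <- 28 %% 5         # 3

# Together the two results reconstruct the original number
5 * quotient + remainder     # 28
```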
1.2 Basic data types
- numerics
  - decimal values, e.g. 4.5
  - integers, e.g. 4
- logical: Boolean values, TRUE or FALSE
- character: text values
Use the class() function to check the data type of a variable.
# Declare variables of different types
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE
# Check class of my_numeric
class(my_numeric)
## [1] "numeric"
# Check class of my_character
class(my_character)
## [1] "character"
# Check class of my_logical
class(my_logical)
## [1] "logical"
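As a hedged aside on how class() treats numbers: a number typed without a suffix is stored as a double and reported as "numeric", while the L suffix (not used elsewhere in these notes) asks R for an integer. The variable names below are illustrative only.

```r
# Sketch: how R classifies numbers
x_double  <- 4     # no suffix: stored as a double
x_integer <- 4L    # the L suffix requests an integer

class(x_double)        # "numeric"
class(x_integer)       # "integer"
is.numeric(x_integer)  # TRUE: integers are numerics too
```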
1.3 Vector
Vectors (one dimensional array): can hold numeric, character or logical values. The elements in a vector all have the same data type.
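Because all elements of a vector share one data type, c() converts mixed inputs to a common type. The following sketch, with made-up values, shows what that looks like in practice.

```r
# Sketch: mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
mixed           # "1" "a" "TRUE"
class(mixed)    # "character"

# Logical values become 1 and 0 when combined with numbers
num_lgl <- c(1.5, TRUE, FALSE)
num_lgl         # 1.5 1.0 0.0
class(num_lgl)  # "numeric"
```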
1.3.1 Useful function
c(), names(), sum()
Here is an example:
# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector  # names(): labels for the vector elements
names(roulette_vector) <- days_vector
# Total winnings with poker
total_poker <- sum(poker_vector)
total_poker
## [1] 230
# Total winnings with roulette
total_roulette <- sum(roulette_vector)
total_roulette
## [1] -314
# Total winnings overall
total_week <- sum(total_poker, total_roulette)
# Comparison: did poker earn more than roulette?
total_poker > total_roulette
## [1] TRUE
# Print out total_week
total_week
## [1] -84
1.3.2 Selection - by brackets or names
To select elements of a vector (and later of matrices, data frames, …), you can use square brackets with either the positions or the names of the elements.
# Define a new variable based on a selection
poker_wednesday <- poker_vector[3]; poker_wednesday
## Wednesday
## 20
# To select multiple elements from a vector, put a vector of positions between the square brackets.
# Define a new variable based on a selection
# Select by brackets
poker_two <- poker_vector[c(1, 5)]; poker_two
## Monday Friday
## 140 240
poker_midweek <- poker_vector[2:4]; poker_midweek
## Tuesday Wednesday Thursday
## -50 20 -120
# Select by names
poker_two_name <- poker_vector[c("Monday", "Friday")]; poker_two_name
## Monday Friday
## 140 240
poker_midweek_name <- poker_vector[c("Tuesday", "Wednesday", "Thursday")]; poker_midweek_name
## Tuesday Wednesday Thursday
## -50 20 -120
1.3.3 Selection - by comparison - operators
The (logical) comparison operators known to R are:
- < for less than
- > for greater than
- <= for less than or equal to
- >= for greater than or equal to
- == for equal to each other
- != for not equal to each other
Each comparison returns TRUE or FALSE.
# Which days did you make money on poker?
selection_vector <- poker_vector > 0
# Print out selection_vector
selection_vector
## Monday Tuesday Wednesday Thursday Friday
## TRUE FALSE TRUE FALSE TRUE
1.3.4 Selection - by comparison - values
When you pass a logical vector in square brackets, R selects only the elements that correspond to TRUE in that vector.
# Select from poker_vector these days
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days
## Monday Wednesday Friday
## 140 20 240
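The two steps above (build a logical vector, then index with it) can also be written in one expression. The sketch below redefines poker_vector so it runs on its own; the names winning and n_winning_days are illustrative.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Same selection without storing the logical vector first
winning <- poker_vector[poker_vector > 0]
winning

# sum() on a logical vector counts the TRUE values
n_winning_days <- sum(poker_vector > 0)
n_winning_days   # 3
```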
1.4 Matrix
Matrices (two dimensional array): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
1.4.1 Construct function
In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.
You can construct a matrix in R with the matrix() function. Consider the following example:
matrix(1:9, byrow = TRUE, nrow = 3)
In the matrix() function:
- The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9, which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9).
- The argument byrow indicates that the matrix is filled by rows. If we want the matrix to be filled by columns, we just set byrow = FALSE.
- The third argument, nrow, indicates that the matrix should have three rows.
# Construct a matrix with 3 rows that contain the numbers 1 up to 9, filled by rows
matrix(1:9, byrow = T, nrow = 3)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
# Construct a matrix with 3 rows that contain the numbers 1 up to 9, but filled by the columns
matrix(1:9, byrow = F, nrow = 3)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
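A short sketch of two related details: the shape can also be given with ncol, and when byrow is omitted the matrix is filled by columns. The matrix m below is a made-up example.

```r
# Six elements in two columns; byrow is omitted, so filling is by column
m <- matrix(1:6, ncol = 2)
m

dim(m)    # 3 2 (rows, then columns)
nrow(m)   # 3
m[1, 2]   # 4: the second column starts with the fourth element
```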
1.4.2 Construct by vector
Three vectors are defined below. Each one represents the box office numbers of one of the first three Star Wars movies. The first element of each vector is the US box office revenue, and the second element is the non-US box office revenue (source: Wikipedia).
In this exercise, you’ll combine all these figures into a single vector. Next, you’ll build a matrix from this vector.
# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Create box_office
box_office <- c(new_hope, empire_strikes, return_jedi)
box_office
## [1] 460.998 314.400 290.475 247.900 309.306 165.800
# Construct a matrix with 3 rows, where each row represents a movie.
star_wars_matrix <- matrix(box_office, byrow = T, nrow = 3)
star_wars_matrix
## [,1] [,2]
## [1,] 460.998 314.4
## [2,] 290.475 247.9
## [3,] 309.306 165.8
1.4.3 Naming a matrix
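Naming the rows and columns of a matrix not only helps you read the data, but is also useful for selecting certain elements from the matrix. You can set the names with rownames() and colnames(), or pass them through the dimnames argument of matrix():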
rownames(my_matrix) <- row_names_vector
colnames(my_matrix) <- col_names_vector
matrix(vec, byrow, nrow, dimnames = list(rownames, columnnames))
# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
# Name the columns with region
colnames(star_wars_matrix) <- region
# Name the rows with titles
rownames(star_wars_matrix) <- titles
# Print out star_wars_matrix
star_wars_matrix
## US non-US
## A New Hope 460.998 314.4
## The Empire Strikes Back 290.475 247.9
## Return of the Jedi 309.306 165.8
# Construct by matrix argument "dimnames"
star_wars_matrix_dim <- matrix(box_office,
                               nrow = 3, byrow = TRUE,
                               dimnames = list(titles, region))
star_wars_matrix_dim
## US non-US
## A New Hope 460.998 314.4
## The Empire Strikes Back 290.475 247.9
## Return of the Jedi 309.306 165.8
1.4.4 Manipulating - sum of each row & column
In R, the function rowSums() conveniently calculates the totals for each row of a matrix, and colSums() calculates the totals for each column. Each of these functions returns a new vector.
# Calculate worldwide box office figures for each row
worldwide_vector <- rowSums(star_wars_matrix); worldwide_vector
## A New Hope The Empire Strikes Back Return of the Jedi
## 775.398 538.375 475.106
# Calculate total box office figures for each column (region)
worldwide_vector_col <- colSums(star_wars_matrix); worldwide_vector_col
## US non-US
## 1060.779 728.100
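To make concrete what rowSums() computes, the sketch below compares it with adding the two columns by hand. It rebuilds the box office matrix without names so that it runs on its own; the names m, row_totals and manual are illustrative.

```r
m <- matrix(c(460.998, 314.4,
              290.475, 247.9,
              309.306, 165.8),
            nrow = 3, byrow = TRUE)

row_totals <- rowSums(m)
manual     <- m[, 1] + m[, 2]   # add the two columns element by element

all.equal(row_totals, manual)   # TRUE
```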
1.4.5 Manipulating - add columns
You can add a column or multiple columns to a matrix with the cbind() function, which merges matrices and/or vectors together by column. For example:
big_matrix <- cbind(matrix1, matrix2, vector1 ...)
# Bind the new variable worldwide_vector as a column to star_wars_matrix
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)
all_wars_matrix
## US non-US worldwide_vector
## A New Hope 460.998 314.4 775.398
## The Empire Strikes Back 290.475 247.9 538.375
## Return of the Jedi 309.306 165.8 475.106
1.4.6 Manipulating - add rows
You can add a row or multiple rows to a matrix with the rbind() function, which merges matrices and/or vectors together by row.
# Construct another matrix for merging.
matrix2_rowname <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
star_wars_matrix2 <- matrix(c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5),
                            byrow = T, nrow = 3,
                            dimnames = list(matrix2_rowname, region))
star_wars_matrix2
## US non-US
## The Phantom Menace 474.5 552.5
## Attack of the Clones 310.7 338.7
## Revenge of the Sith 380.3 468.5
# Combine both Star Wars trilogies in one matrix
all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)
all_wars_matrix
## US non-US
## A New Hope 460.998 314.4
## The Empire Strikes Back 290.475 247.9
## Return of the Jedi 309.306 165.8
## The Phantom Menace 474.500 552.5
## Attack of the Clones 310.700 338.7
## Revenge of the Sith 380.300 468.5
1.4.7 Selection of matrix elements
You can use the square brackets [ ] to select one or multiple elements from a matrix. Whereas vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns. For example:
- my_matrix[1,2] selects the element at the first row and second column.
- my_matrix[1:3,2:4] results in a matrix with the data on the rows 1, 2, 3 and columns 2, 3, 4.
If you want to select all elements of a row or a column, no number is needed before or after the comma, respectively:
- my_matrix[,1] selects all elements of the first column.
- my_matrix[1,] selects all elements of the first row.
# Select the non-US revenue for all movies
non_us_all <- all_wars_matrix[, 2]; non_us_all
## A New Hope The Empire Strikes Back Return of the Jedi
## 314.4 247.9 165.8
## The Phantom Menace Attack of the Clones Revenge of the Sith
## 552.5 338.7 468.5
# Select the non-US revenue for first two movies
non_us_some <- all_wars_matrix[1:2, 2]; non_us_some
## A New Hope The Empire Strikes Back
## 314.4 247.9
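Once a matrix has row and column names, the same selections can be made by name instead of by position. The sketch below uses a two-movie matrix so that it is self-contained; drop = FALSE is a standard argument of [ that keeps the result a matrix, and is not used elsewhere in these notes.

```r
m <- matrix(c(460.998, 314.4,
              290.475, 247.9),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A New Hope", "The Empire Strikes Back"),
                            c("US", "non-US")))

m["A New Hope", "non-US"]   # a single value: 314.4
m[, "US"]                   # one column, returned as a named vector
m[, "US", drop = FALSE]     # the same column, kept as a 2 x 1 matrix
```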
1.4.8 Arithmetic - 1
Similar to what you have learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R.
For example, 2 * my_matrix multiplies each element of my_matrix by two.
# Estimate the visitors. Assume that the price of a ticket was 5 dollars. Simply dividing the box office numbers by this ticket price gives you the number of visitors.
visitors <- all_wars_matrix / 5
visitors
## US non-US
## A New Hope 92.1996 62.88
## The Empire Strikes Back 58.0950 49.58
## Return of the Jedi 61.8612 33.16
## The Phantom Menace 94.9000 110.50
## Attack of the Clones 62.1400 67.74
## Revenge of the Sith 76.0600 93.70
1.4.9 Arithmetic - 2
Just like 2 * my_matrix multiplied every element of my_matrix by two, my_matrix1 * my_matrix2 creates a matrix where each element is the product of the corresponding elements in my_matrix1 and my_matrix2.
Note that this is not the standard matrix multiplication, for which you should use %*% in R.
# Construct another matrix
ticket_rowname <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi", "The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
ticket_prices_matrix <- matrix(c(5,5,6,6,7,7,4,4,4.5,4.5,4.9,4.9),
                               byrow = T, nrow = 6,
                               dimnames = list(ticket_rowname, region))
ticket_prices_matrix
## US non-US
## A New Hope 5.0 5.0
## The Empire Strikes Back 6.0 6.0
## Return of the Jedi 7.0 7.0
## The Phantom Menace 4.0 4.0
## Attack of the Clones 4.5 4.5
## Revenge of the Sith 4.9 4.9
# Estimated number of visitors
visitors <- all_wars_matrix / ticket_prices_matrix
# US visitors
us_visitors <- visitors[, 1]
# Average number of US visitors
mean(us_visitors)
## [1] 75.01339
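The difference between element-wise multiplication (*) and matrix multiplication (%*%) is easiest to see on small matrices. The matrices a and b below are made up for illustration.

```r
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

elementwise <- a * b     # each element times its counterpart
matprod     <- a %*% b   # rows of a combined with columns of b

elementwise
##      [,1] [,2]
## [1,]    5   21
## [2,]   12   32
matprod
##      [,1] [,2]
## [1,]   23   31
## [2,]   34   46
```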
1.5 Factor
The term factor refers to a statistical data type used to store categorical variables.
The difference between a categorical variable and a continuous variable is that a categorical variable can take only a limited number of categories.
A continuous variable, on the other hand, can take an infinite number of values.
A good example of a categorical variable is sex (“Male” or “Female”).
1.5.1 Useful function
The function factor() or as.factor() will encode a vector as a factor.
# Sex vector
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
# Convert sex_vector to a factor
factor_sex_vector <- factor(sex_vector); factor_sex_vector
## [1] Male Female Female Male Male
## Levels: Female Male
1.5.2 Nominal & Ordinal categorical variable
A nominal variable is a categorical variable without an implied order. This means that it is impossible to say that ‘one is worth more than the other’. For example, think of the categorical variable animals_vector with the categories “Elephant”, “Giraffe”, “Donkey” and “Horse”.
In contrast, ordinal variables do have a natural ordering. Consider for example the categorical variable temperature_vector with the categories: “Low”, “Medium” and “High”. Here it is obvious that “Medium” stands above “Low”, and “High” stands above “Medium”.
# Animals
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
factor_animals_vector
## [1] Elephant Giraffe Donkey Horse
## Levels: Donkey Elephant Giraffe Horse
# Temperature (ordinal): set ordered = TRUE and list the levels from lowest to highest
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector
## [1] High Low High Low Medium
## Levels: Low < Medium < High
1.5.3 Factor levels
Sometimes you will want to change the names of a factor’s levels for clarity or other reasons. R allows you to do this with the function levels():
levels(factor_vector) <- c("name1", "name2",...)
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector
## [1] M F F M M
## Levels: F M
# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector
## [1] Male Female Female Male Male
## Levels: Female Male
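One point worth checking when renaming levels: the new names are matched to the existing levels by position, and the existing levels of a character vector are sorted alphabetically ("F" before "M"). The sketch below shows this, plus an alternative using the levels and labels arguments of factor(), which makes the mapping explicit. The names f and g are illustrative.

```r
survey_vector <- c("M", "F", "F", "M", "M")

f <- factor(survey_vector)
levels(f)                          # "F" "M" (alphabetical)
levels(f) <- c("Female", "Male")   # by position: F -> Female, M -> Male

# Alternative: state the mapping explicitly when building the factor
g <- factor(survey_vector,
            levels = c("F", "M"),
            labels = c("Female", "Male"))

all(as.character(f) == as.character(g))   # TRUE
```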
1.5.4 Summarizing a factor
The function summary() will give you a quick overview of the contents of a variable.
# Generate summary for survey_vector
summary(survey_vector)
## Length Class Mode
## 5 character character
# Generate summary for factor_survey_vector
summary(factor_survey_vector)
## Female Male
## 2 3
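See the difference between the outputs? For the character vector, summary() only reports the length and type; for the factor, it counts the observations in each level. So convert such a vector to a factor before summarizing it.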
1.5.5 Ordered factors
R returns NA (with a warning) when you try to compare values of an unordered factor with operators such as > or <, since the comparison doesn’t make sense. With ordered factors, more meaningful comparisons are possible.
# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
By default, the function factor() transforms speed_vector into an unordered factor. To create an ordered factor, you have to add two additional arguments: ordered and levels.
Function:
factor(some_vector,
ordered = TRUE,
levels = c("lev1", "lev2" ...))
# Convert speed_vector to ordered factor vector
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
# Print factor_speed_vector
factor_speed_vector
## [1] medium slow slow medium fast
## Levels: slow < medium < fast
summary(factor_speed_vector)
## slow medium fast
## 2 2 1
Comparing ordered factors
# Factor value for second data analyst
da2 <- factor_speed_vector[2]; da2
## [1] slow
## Levels: slow < medium < fast
# Factor value for fifth data analyst
da5 <- factor_speed_vector[5]; da5
## [1] fast
## Levels: slow < medium < fast
# Is data analyst 2 faster than data analyst 5?
da2 > da5
## [1] FALSE
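For contrast, the sketch below runs the same comparison on an unordered and an ordered version of the factor. The warning from the unordered comparison is suppressed here only to keep the output short; the names unordered and ordered_f are illustrative.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")

unordered <- factor(speed_vector)
ordered_f <- factor(speed_vector, ordered = TRUE,
                    levels = c("slow", "medium", "fast"))

# Unordered: > is not meaningful, so R warns and returns NA
suppressWarnings(unordered[2] > unordered[5])   # NA

# Ordered: the comparison follows the level order
ordered_f[2] > ordered_f[5]   # FALSE (slow is not above fast)
ordered_f[5] > ordered_f[2]   # TRUE
```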
1.6 Data Frame
Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type.
1.6.1 Quick look at dataset
The function head() shows the first observations of a data frame. Similarly, the function tail() prints out the last observations in your dataset.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
tail(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
?mtcars
## starting httpd help server ... done
The function str() shows you the structure of your dataset.
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
1.6.2 Creating a data frame
Function: data.frame()
# Definition of vectors
name <- c("Mercury", "Venus", "Earth",
          "Mars", "Jupiter", "Saturn",
          "Uranus", "Neptune")
type <- c("Terrestrial planet",
          "Terrestrial planet",
          "Terrestrial planet",
          "Terrestrial planet", "Gas giant",
          "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532,
              11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03,
              0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
# Create a data frame from the vectors
planets_df <- data.frame(name, type, diameter, rotation, rings)
head(planets_df)
## name type diameter rotation rings
## 1 Mercury Terrestrial planet 0.382 58.64 FALSE
## 2 Venus Terrestrial planet 0.949 -243.02 FALSE
## 3 Earth Terrestrial planet 1.000 1.00 FALSE
## 4 Mars Terrestrial planet 0.532 1.03 FALSE
## 5 Jupiter Gas giant 11.209 0.41 TRUE
## 6 Saturn Gas giant 9.449 0.43 TRUE
str(planets_df)
## 'data.frame': 8 obs. of 5 variables:
## $ name : chr "Mercury" "Venus" "Earth" "Mars" ...
## $ type : chr "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" ...
## $ diameter: num 0.382 0.949 1 0.532 11.209 ...
## $ rotation: num 58.64 -243.02 1 1.03 0.41 ...
## $ rings : logi FALSE FALSE FALSE FALSE TRUE TRUE ...
1.6.3 Selection - brackets
You select elements from a data frame with the help of square brackets [ ]. By using a comma, you can indicate what to select from the rows and the columns respectively. For example:
- my_df[1,2] selects the value at the first row and second column in my_df.
- my_df[1:3,2:4] selects rows 1, 2, 3 and columns 2, 3, 4 in my_df.
Sometimes you want to select all elements of a row or column. For example, my_df[1, ] selects all elements of the first row.
# Print out diameter of Mercury (row 1, column 3)
planets_df[1, 3]
## [1] 0.382
# Print out data for Mars (entire fourth row)
planets_df[4, ]
## name type diameter rotation rings
## 4 Mars Terrestrial planet 0.532 1.03 FALSE
1.6.4 Selection - names
You can also use the variable names to select columns of a data frame.
# Select first 5 values of diameter column
planets_df[1:5, "diameter"]
## [1] 0.382 0.949 1.000 0.532 11.209
1.6.5 Selection - $
There is a short-cut. If your columns have names, you can use the $ sign.
# Select the rings variable from planets_df
rings_vector <- planets_df$rings; rings_vector
## [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
# Select all columns for planets with rings (only rows where rings_vector is TRUE)
planets_df[rings_vector, ]
## name type diameter rotation rings
## 5 Jupiter Gas giant 11.209 0.41 TRUE
## 6 Saturn Gas giant 9.449 0.43 TRUE
## 7 Uranus Gas giant 4.007 -0.72 TRUE
## 8 Neptune Gas giant 3.883 0.67 TRUE
1.6.6 Selection - subset
The first argument of subset() specifies the dataset for which you want a subset. By adding the second argument, you give R the necessary information and conditions to select the correct subset.
subset(my_df, subset = some_condition)
You should see the subset() function as a short-cut. The code below gives exactly the same result as the previous selection, but this time you don’t need rings_vector!
subset(planets_df, subset = rings)
# Select planets with diameter < 1
subset(planets_df, subset = diameter < 1)
## name type diameter rotation rings
## 1 Mercury Terrestrial planet 0.382 58.64 FALSE
## 2 Venus Terrestrial planet 0.949 -243.02 FALSE
## 4 Mars Terrestrial planet 0.532 1.03 FALSE
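To show what subset() saves you from typing, the sketch below selects the same rows with a logical condition inside square brackets. It uses a reduced, hypothetical data frame planets_small (four planets, two columns) so that it runs on its own.

```r
planets_small <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars"),
  diameter = c(0.382, 0.949, 1, 0.532),
  stringsAsFactors = FALSE
)

# With subset(): column names can be used directly in the condition
by_subset <- subset(planets_small, subset = diameter < 1)

# With brackets: the column must be spelled out with $
by_brackets <- planets_small[planets_small$diameter < 1, ]

by_subset$name     # "Mercury" "Venus" "Mars"
by_brackets$name   # the same three planets
```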
1.6.7 Sorting
order() is a function that returns the positions of the elements of a variable in sorted order (ascending by default). For example, applied to a vector:
a <- c(100, 10, 1000)
order(a)
## [1] 2 1 3
a[order(a)]
## [1] 10 100 1000
Suppose you would like to rearrange your data frame so that it starts with the smallest planet and ends with the largest one, that is, sort it on the diameter column.
# Use order() to create positions
positions <- order(planets_df$diameter)
# Use positions to sort planets_df
planets_df[positions, ]
## name type diameter rotation rings
## 1 Mercury Terrestrial planet 0.382 58.64 FALSE
## 4 Mars Terrestrial planet 0.532 1.03 FALSE
## 2 Venus Terrestrial planet 0.949 -243.02 FALSE
## 3 Earth Terrestrial planet 1.000 1.00 FALSE
## 8 Neptune Gas giant 3.883 0.67 TRUE
## 7 Uranus Gas giant 4.007 -0.72 TRUE
## 6 Saturn Gas giant 9.449 0.43 TRUE
## 5 Jupiter Gas giant 11.209 0.41 TRUE
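order() can also sort from largest to smallest through its decreasing argument. A minimal sketch, reusing the small vector from above (the name desc_positions is illustrative):

```r
a <- c(100, 10, 1000)

# Positions of the elements from largest to smallest
desc_positions <- order(a, decreasing = TRUE)
desc_positions      # 3 1 2

# Use the positions to reorder the vector
a[desc_positions]   # 1000 100 10
```

The same positions can be used to index the rows of a data frame, exactly as with planets_df[positions, ].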
1.7 List
Lists: a list is a kind of super data type in which you can store practically any piece of information. It allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc.
1.7.1 Creating a list
Function: list()
my_list <- list(comp1, comp2 ...)
The arguments to the list function are the list components. These components can be matrices, vectors, other lists, etc.
# Vector with numerics from 1 up to 10
my_vector <- 1:10
# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)
# First 10 rows of the built-in data frame mtcars
my_df <- mtcars[1:10,]
# Construct list with these different elements:
my_list <- list(my_vector, my_matrix, my_df)
my_list
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
##
## [[3]]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
1.7.2 Creating a named list
Just like on your to-do list, you want to avoid not knowing or remembering what the components of your list stand for. That is why you should give names to them. The following creates a list with components named name1, name2, and so on:
my_list <- list(name1 = your_comp1,
name2 = your_comp2)
If you want to name the components of your list after you’ve created it, you can use the names() function as you did with vectors. The following commands are fully equivalent to the assignment above:
my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")
# Name the components after the list has been created
names(my_list) <- c("vec", "mat", "df")
my_list
## $vec
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $mat
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
##
## $df
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# Name the components when creating the list
my_list <- list("vec" = my_vector,
                "mat" = my_matrix,
                "df" = my_df)
my_list
## $vec
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $mat
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
##
## $df
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
1.7.3 Selection
One way to select a component of a list is by its position, using double square brackets, as in my_list[[1]]. You can also refer to the names of the components, with [[ ]] or with the $ sign.
Note: to select elements from vectors, you use single square brackets: [ ]. Don’t mix them up!
# Print the first element of the list, which is a vector
my_list[[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
my_list[["vec"]]
## [1] 1 2 3 4 5 6 7 8 9 10
my_list$vec
## [1] 1 2 3 4 5 6 7 8 9 10
# Print the second element of the vector
my_list[[1]][2]
## [1] 2
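The reason not to mix up [ ] and [[ ]] on a list is that they return different things. The sketch below uses a smaller, hypothetical list so that it runs on its own; the names single and double are illustrative.

```r
my_list <- list(vec = 1:10, mat = matrix(1:9, ncol = 3))

single <- my_list["vec"]     # single brackets: a list of length 1
double <- my_list[["vec"]]   # double brackets: the component itself

class(single)   # "list"
class(double)   # "integer"

# Only the extracted component can be indexed like a vector
double[2]       # 2
```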
1.8 Bring it all together
# Construct vectors
movie_title <- "The Departed"
movie_actors <- c("Leonardo DiCaprio", "Matt Damon", "Jack Nicholson",
                  "Mark Wahlberg", "Vera Farmiga", "Martin Sheen")
scores <- c(4.6, 5, 4.8, 5, 4.2)
comments <- c("I would watch it again", "Amazing!", "I liked it", "One of the best movies", "Fascinating plot")
# Save the average of the scores vector as avg_review
avg_review <- mean(scores)
# Combine scores and comments into the reviews_df data frame
reviews_df <- data.frame(scores, comments)
# Sort reviews_df by scores
reviews_df <- reviews_df[order(reviews_df$scores), ]
# Create and print out a list, called departed_list
departed_list <- list("title" = movie_title,
                      "actors" = movie_actors,
                      "reviews" = reviews_df,
                      "average" = avg_review)
departed_list
## $title
## [1] "The Departed"
##
## $actors
## [1] "Leonardo DiCaprio" "Matt Damon" "Jack Nicholson"
## [4] "Mark Wahlberg" "Vera Farmiga" "Martin Sheen"
##
## $reviews
## scores comments
## 5 4.2 Fascinating plot
## 1 4.6 I would watch it again
## 3 4.8 I liked it
## 2 5.0 Amazing!
## 4 5.0 One of the best movies
##
## $average
## [1] 4.72