Chapter 1 Introduction to R

1.1 Basic arithmetic

# An addition
5 + 5

## [1] 10

# A subtraction
5 - 5

## [1] 0

# A multiplication
3 * 5

## [1] 15

# A division
(5 + 5) / 2

## [1] 5

# Exponentiation (raising to a power)
2 ^ 5

## [1] 32

# Modulo (remainder after division)
28 %% 5

## [1] 3
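Modulo is often used together with integer division, %/%, an R operator not covered above. The short sketch below (the variable names are illustrative) shows how the two relate:

```r
# %/% gives the whole-number quotient, %% gives the remainder
quotient  <- 28 %/% 5
remainder <- 28 %% 5
quotient
## [1] 5
remainder
## [1] 3

# Quotient times divisor plus remainder recovers the original number
quotient * 5 + remainder
## [1] 28
```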

1.2 Basic data types

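numeric: numbers

decimals (doubles), e.g. 4.5
integers, e.g. 4L (written without the L suffix, 4 is stored as a double)

logical: Boolean values, TRUE or FALSE
character: text (string) values, e.g. "universe"

To check the data type of a variable, use the function class().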

# Declare variables of different types
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE 

# Check class of my_numeric
class(my_numeric)

## [1] "numeric"

# Check class of my_character
class(my_character)

## [1] "character"

# Check class of my_logical
class(my_logical)

## [1] "logical"
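As noted in the list above, a whole number typed without a suffix is still a double. A quick check, assuming nothing beyond base R (my_integer is an illustrative name):

```r
# A number typed without a suffix is stored as a double ("numeric")
class(4)
## [1] "numeric"

# Adding the L suffix creates a genuine integer
my_integer <- 4L
class(my_integer)
## [1] "integer"

# Integers still count as numbers
is.numeric(my_integer)
## [1] TRUE
```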

1.3 Vector

Vectors (one-dimensional arrays) can hold numeric, character or logical values. All elements of a vector have the same data type.

1.3.1 Useful functions

c(), names(), sum()

Here is an example:

# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector # names() labels each element with its day
names(roulette_vector) <- days_vector

# Total winnings with poker
total_poker <- sum(poker_vector)
total_poker

## [1] 230

# Total winnings with roulette
total_roulette <- sum(roulette_vector)
total_roulette

## [1] -314

# Total winnings overall
total_week <- sum(total_poker, total_roulette)

# Comparison
total_poker > total_roulette

## [1] TRUE

# Print out total_week
total_week

## [1] -84
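Arithmetic operators also work element by element on whole vectors, which gives another route to the weekly total. A minimal sketch reusing the same winnings (total_daily is an illustrative name):

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# + works element by element: Monday + Monday, Tuesday + Tuesday, ...
# The result keeps the day names from poker_vector
total_daily <- poker_vector + roulette_vector
total_daily   # 116, -100, 120, -470, 250 for Monday to Friday

# Adding up the daily totals gives the same weekly total as before
sum(total_daily)
## [1] -84
```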

1.3.2 Selection - by brackets or names

To select elements of a vector (and later of matrices, data frames, …), you use square brackets, either with the positions of the elements or with their names.

# Define a new variable based on a selection
poker_wednesday <- poker_vector[3]; poker_wednesday

## Wednesday 
##        20

# To select multiple elements from a vector, put a vector of positions (or names) inside the square brackets.

# Define a new variable based on a selection
# Select by brackets
poker_two <- poker_vector[c(1, 5)]; poker_two

## Monday Friday 
##    140    240

poker_midweek <- poker_vector[2:4]; poker_midweek

##   Tuesday Wednesday  Thursday 
##       -50        20      -120

# Select by names
poker_two_name <- poker_vector[c("Monday", "Friday")]; poker_two_name

## Monday Friday 
##    140    240

poker_midweek_name <- poker_vector[c("Tuesday", "Wednesday", "Thursday")]; poker_midweek_name

##   Tuesday Wednesday  Thursday 
##       -50        20      -120

1.3.3 Selection - by comparison - operators

The (logical) comparison operators known to R are:

< for less than
> for greater than
<= for less than or equal to
>= for greater than or equal to
== for equal to
!= for not equal to

Each comparison returns TRUE or FALSE. Applied to a vector, it compares every element and returns a logical vector.

# Which days did you make money on poker?
selection_vector <- poker_vector > 0
  
# Print out selection_vector
selection_vector

##    Monday   Tuesday Wednesday  Thursday    Friday 
##      TRUE     FALSE      TRUE     FALSE      TRUE

1.3.4 Selection - by comparison - values

When you pass a logical vector inside square brackets, R selects only the elements that correspond to TRUE in that vector.

# Select from poker_vector these days
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days

##    Monday Wednesday    Friday 
##       140        20       240
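Because TRUE counts as 1 and FALSE as 0, logical vectors are also handy for counting and summing. A short sketch on the same poker data (base R only):

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
selection_vector <- poker_vector > 0

# sum() on a logical vector counts the TRUE values, i.e. the winning days
sum(selection_vector)
## [1] 3

# Total amount won on the winning days only
sum(poker_vector[selection_vector])
## [1] 400

# which() returns the (named) positions of the TRUE elements: 1, 3 and 5
which(selection_vector)
```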

1.4 Matrix

Matrices (two-dimensional arrays) can hold numeric, character or logical values. All elements of a matrix have the same data type.

1.4.1 Construct function

In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.

You can construct a matrix in R with the matrix() function. Consider the following example: matrix(1:9, byrow = TRUE, nrow = 3)

In the matrix() function:

The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9 which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9).
The argument byrow = TRUE indicates that the matrix is filled row by row. To fill it column by column instead, set byrow = FALSE, which is the default.
The third argument nrow indicates that the matrix should have three rows.

# Construct a matrix with 3 rows that contain the numbers 1 up to 9, filled by rows
matrix(1:9, byrow = TRUE, nrow = 3)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

# Construct a matrix with 3 rows that contain the numbers 1 up to 9, but filled by the columns
matrix(1:9, byrow = FALSE, nrow = 3)

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
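Since byrow defaults to FALSE, omitting it fills by columns, and R infers the number of columns from the length of the data. A small sketch with illustrative names:

```r
# byrow defaults to FALSE, so leaving it out fills the matrix by columns
by_column <- matrix(1:6, nrow = 2)
by_row    <- matrix(1:6, nrow = 2, byrow = TRUE)

# With 6 elements and nrow = 2, R infers 3 columns
dim(by_column)
## [1] 2 3

# Same position, different value, because the fill order differs
by_column[2, 1]
## [1] 2
by_row[2, 1]
## [1] 4
```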

1.4.2 Construct by vector

Below, three vectors are defined. Each one holds the box office numbers of one of the first three Star Wars movies: the first element is the US box office revenue, the second the non-US box office revenue (source: Wikipedia).

Next, we combine all these figures into a single vector and then build a matrix from that vector.

# Box office Star Wars (in millions!)
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# Create box_office
box_office <- c(new_hope, empire_strikes, return_jedi)
box_office

## [1] 460.998 314.400 290.475 247.900 309.306 165.800

# Construct a matrix with 3 rows, where each row represents a movie.
star_wars_matrix <- matrix(box_office, byrow = TRUE, nrow = 3)
star_wars_matrix

##         [,1]  [,2]
## [1,] 460.998 314.4
## [2,] 290.475 247.9
## [3,] 309.306 165.8

1.4.3 Naming a matrix

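Naming the rows and columns of a matrix not only helps you read the data, but also lets you select elements from the matrix by name. You can set the names afterwards with rownames() and colnames(), or directly in matrix() with the dimnames argument: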

rownames(my_matrix) <- row_names_vector
colnames(my_matrix) <- col_names_vector
matrix(vec, byrow = TRUE, nrow = n, dimnames = list(row_names_vector, col_names_vector))

# Vectors region and titles, used for naming
region <- c("US", "non-US")
titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")

# Name the columns with region
colnames(star_wars_matrix) <- region

# Name the rows with titles
rownames(star_wars_matrix) <- titles

# Print out star_wars_matrix
star_wars_matrix

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8

# Construct by matrix argument "dimnames"
star_wars_matrix_dim <- matrix(box_office, 
                               nrow = 3, byrow = TRUE,
                               dimnames = list(titles, region))
star_wars_matrix_dim

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8

1.4.4 Manipulating - sum of each row & column

In R, the function rowSums() conveniently calculates the totals for each row of a matrix, and colSums() calculates the totals for each column. Both functions return a new vector, named after the rows or columns.

# Calculate worldwide box office figures for each row
worldwide_vector <- rowSums(star_wars_matrix); worldwide_vector

##              A New Hope The Empire Strikes Back      Return of the Jedi 
##                 775.398                 538.375                 475.106

# Calculate worldwide box office figures for each column
worldwide_vector_col <- colSums(star_wars_matrix); worldwide_vector_col

##       US   non-US 
## 1060.779  728.100
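rowSums() and colSums() are specialised versions of a more general pattern: apply() runs any function over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix. A sketch on the same box office figures, for comparison only:

```r
box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back",
                                             "Return of the Jedi"),
                                           c("US", "non-US")))

# apply(X, MARGIN, FUN): MARGIN = 1 applies FUN to each row, MARGIN = 2 to each column
row_totals <- apply(star_wars_matrix, 1, sum)
col_totals <- apply(star_wars_matrix, 2, sum)

# Same results as rowSums() and colSums()
all.equal(row_totals, rowSums(star_wars_matrix))
## [1] TRUE
all.equal(col_totals, colSums(star_wars_matrix))
## [1] TRUE

# Adding up either set of totals gives the same grand total
sum(row_totals)
## [1] 1788.879
```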

1.4.5 Manipulating - add columns

You can add a column or multiple columns to a matrix with the cbind() function, which merges matrices and/or vectors together by column. For example:

big_matrix <- cbind(matrix1, matrix2, vector1, ...)

# Bind the new variable worldwide_vector as a column to star_wars_matrix
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)
all_wars_matrix

##                              US non-US worldwide_vector
## A New Hope              460.998  314.4          775.398
## The Empire Strikes Back 290.475  247.9          538.375
## Return of the Jedi      309.306  165.8          475.106

1.4.6 Manipulating - add rows

You can add a row or multiple rows to a matrix with the rbind() function, which merges matrices and/or vectors together by row.

# Construct another matrix for merging.
matrix2_rowname <- c("The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
star_wars_matrix2 <- matrix(c(474.5, 552.5, 310.7, 338.7, 380.3, 468.5),
                            byrow = TRUE, nrow = 3,
                            dimnames = list(matrix2_rowname, region))
star_wars_matrix2

##                         US non-US
## The Phantom Menace   474.5  552.5
## Attack of the Clones 310.7  338.7
## Revenge of the Sith  380.3  468.5

# Combine both Star Wars trilogies in one matrix
all_wars_matrix <- rbind(star_wars_matrix, star_wars_matrix2)
all_wars_matrix

##                              US non-US
## A New Hope              460.998  314.4
## The Empire Strikes Back 290.475  247.9
## Return of the Jedi      309.306  165.8
## The Phantom Menace      474.500  552.5
## Attack of the Clones    310.700  338.7
## Revenge of the Sith     380.300  468.5
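rbind() also accepts a plain vector, and naming the argument sets the row name of the new row. For matrices, rows are combined by position, so the vector must list the columns in the same order. A sketch that appends a row of column totals (with_total is an illustrative name):

```r
star_wars_matrix <- matrix(c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8),
                           nrow = 3, byrow = TRUE,
                           dimnames = list(c("A New Hope", "The Empire Strikes Back",
                                             "Return of the Jedi"),
                                           c("US", "non-US")))

# A named argument to rbind() becomes the row name of the new row
with_total <- rbind(star_wars_matrix, Total = colSums(star_wars_matrix))

# The last row, "Total", holds 1060.779 (US) and 728.1 (non-US)
with_total["Total", ]
```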

1.4.7 Selection of matrix elements

You can use square brackets [ ] to select one or multiple elements from a matrix. Whereas vectors have one dimension, matrices have two, so you use a comma to separate the rows you want to select from the columns. For example:

my_matrix[1,2] selects the element at the first row and second column.
my_matrix[1:3,2:4] returns a matrix with the data in rows 1, 2, 3 and columns 2, 3, 4.

To select a whole row, leave the column position (after the comma) empty; to select a whole column, leave the row position (before the comma) empty:

my_matrix[,1] selects all elements of the first column.
my_matrix[1,] selects all elements of the first row.

# Select the non-US revenue for all movies
non_us_all <- all_wars_matrix[, 2]; non_us_all

##              A New Hope The Empire Strikes Back      Return of the Jedi 
##                   314.4                   247.9                   165.8 
##      The Phantom Menace    Attack of the Clones     Revenge of the Sith 
##                   552.5                   338.7                   468.5

# Select the non-US revenue for first two movies
non_us_some <- all_wars_matrix[1:2, 2]; non_us_some

##              A New Hope The Empire Strikes Back 
##                   314.4                   247.9
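Row and column names can be used inside the brackets just like positions. Note also that selecting a single column drops the result to a plain vector, as in the output above; the drop = FALSE argument keeps it as a matrix. A sketch on the first trilogy only:

```r
all_wars_matrix <- matrix(c(460.998, 314.4, 290.475, 247.9, 309.306, 165.8),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(c("A New Hope", "The Empire Strikes Back",
                                            "Return of the Jedi"),
                                          c("US", "non-US")))

# Row and column names work inside the brackets just like positions
all_wars_matrix["A New Hope", "US"]
## [1] 460.998

# Selecting a single column normally drops to a plain vector...
is.matrix(all_wars_matrix[, "non-US"])
## [1] FALSE

# ...while drop = FALSE keeps the result as a one-column matrix
dim(all_wars_matrix[, "non-US", drop = FALSE])
## [1] 3 1
```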

1.4.8 Arithmetic - 1

Similar to what you have learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R.

For example, 2 * my_matrix multiplies each element of my_matrix by two.

# Estimate the number of visitors, assuming a ticket price of 5 dollars:
# dividing the box office numbers (in millions) by the ticket price gives
# the number of visitors (in millions).
visitors <- all_wars_matrix / 5
visitors

##                              US non-US
## A New Hope              92.1996  62.88
## The Empire Strikes Back 58.0950  49.58
## Return of the Jedi      61.8612  33.16
## The Phantom Menace      94.9000 110.50
## Attack of the Clones    62.1400  67.74
## Revenge of the Sith     76.0600  93.70

1.4.9 Arithmetic - 2

Just like 2 * my_matrix multiplied every element of my_matrix by two, my_matrix1 * my_matrix2 creates a matrix where each element is the product of the corresponding elements in my_matrix1 and my_matrix2.

Note that this is not standard matrix multiplication; for that, R provides the %*% operator.
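The difference between * and %*% is easiest to see on a tiny matrix. A minimal sketch (A is an illustrative name):

```r
A <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)

# Element-wise product: each entry is multiplied by itself
A * A
##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16

# Matrix product: each entry is a row-by-column sum of products
A %*% A
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22
```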

# Construct another matrix
ticket_rowname <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi", "The Phantom Menace", "Attack of the Clones", "Revenge of the Sith")
ticket_prices_matrix <- matrix(c(5, 5, 6, 6, 7, 7, 4, 4, 4.5, 4.5, 4.9, 4.9),
                               byrow = TRUE, nrow = 6,
                               dimnames = list(ticket_rowname, region))
ticket_prices_matrix

##                          US non-US
## A New Hope              5.0    5.0
## The Empire Strikes Back 6.0    6.0
## Return of the Jedi      7.0    7.0
## The Phantom Menace      4.0    4.0
## Attack of the Clones    4.5    4.5
## Revenge of the Sith     4.9    4.9

# Estimated number of visitors
visitors <- all_wars_matrix / ticket_prices_matrix

# US visitors
us_visitors <- visitors[, 1]

# Average number of US visitors
mean(us_visitors)

## [1] 75.01339

1.5 Factor

The term factor refers to a statistical data type used to store categorical variables. The difference between a categorical variable and a continuous variable is that a categorical variable can take only a limited number of values (categories), whereas a continuous variable can take an infinite number of values. A good example of a categorical variable is sex (“Male” or “Female”).

1.5.1 Useful functions

The functions factor() and as.factor() both encode a vector as a factor.

# Sex vector
sex_vector <- c("Male", "Female", "Female", "Male", "Male")

# Convert sex_vector to a factor
factor_sex_vector <- factor(sex_vector); factor_sex_vector

## [1] Male   Female Female Male   Male  
## Levels: Female Male
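Internally, a factor stores one integer code per element plus the set of levels those codes refer to. A short sketch inspecting the factor above with base R functions:

```r
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_sex_vector <- factor(sex_vector)

# The levels, sorted alphabetically by default
levels(factor_sex_vector)
## [1] "Female" "Male"

# The integer codes stored for each element (1 = Female, 2 = Male)
as.integer(factor_sex_vector)
## [1] 2 1 1 2 2

# Number of observations in each category
table(factor_sex_vector)
```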

1.5.2 Nominal & Ordinal categorical variable

A nominal variable is a categorical variable without an implied order. This means that it is impossible to say that ‘one is worth more than the other’. For example, think of the categorical variable animals_vector with the categories “Elephant”, “Giraffe”, “Donkey” and “Horse”.

In contrast, ordinal variables do have a natural ordering. Consider for example the categorical variable temperature_vector with the categories: “Low”, “Medium” and “High”. Here it is obvious that “Medium” stands above “Low”, and “High” stands above “Medium”.

# Animals
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
factor_animals_vector

## [1] Elephant Giraffe  Donkey   Horse   
## Levels: Donkey Elephant Giraffe Horse

# Temperature is ordinal: set ordered = TRUE and list the levels from lowest to highest
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector

## [1] High   Low    High   Low    Medium
## Levels: Low < Medium < High
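The levels argument matters for ordinal data: without it, R sorts the levels alphabetically, which here would rank "Low" above "High". A sketch of both cases:

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")

# Without levels, R sorts them alphabetically, which is the wrong order here
levels(factor(temperature_vector, ordered = TRUE))
## [1] "High"   "Low"    "Medium"

# Supplying levels fixes the ranking
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))
is.ordered(factor_temperature_vector)
## [1] TRUE

# max() follows the level order, so this returns High
max(factor_temperature_vector)
```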

1.5.3 Factor levels

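By default, factor() sorts the levels alphabetically and names them after the values in the data. Sometimes you will want to change the names of these levels for clarity or other reasons. R allows you to do this with the function levels(). The new names replace the existing levels in order, so list them in the same order as the current levels: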

levels(factor_vector) <- c("name1", "name2", ...)

# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector

## [1] M F F M M
## Levels: F M

# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female", "Male")

factor_survey_vector

## [1] Male   Female Female Male   Male  
## Levels: Female Male
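Because levels() renames by position, listing the new names in a different order would silently mislabel the data. An alternative is to state the mapping when creating the factor, using the labels argument of factor(). A sketch comparing the two (f1 and f2 are illustrative names):

```r
survey_vector <- c("M", "F", "F", "M", "M")

# levels<- renames by position: the existing levels are c("F", "M"),
# so the new names must be given in that same order
f1 <- factor(survey_vector)
levels(f1) <- c("Female", "Male")

# Alternatively, state the mapping up front with the labels argument
f2 <- factor(survey_vector, levels = c("F", "M"), labels = c("Female", "Male"))

identical(f1, f2)
## [1] TRUE
```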

1.5.4 Summarizing a factor

The function summary() will give you a quick overview of the contents of a variable.

# Generate summary for survey_vector
summary(survey_vector)

##    Length     Class      Mode 
##         5 character character

# Generate summary for factor_survey_vector
summary(factor_survey_vector)

## Female   Male 
##      2      3

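Notice the difference between the outputs: for a character vector, summary() reports only its length, class and mode, whereas for a factor it counts how many elements fall into each level. This is one reason to store categorical data as factors.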

1.5.5 Ordered factors

Comparing elements of an unordered factor with operators such as > or < makes no sense, so R returns NA (with a warning). Ordered factors, on the other hand, support meaningful comparisons.

# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")

By default, the function factor() transforms speed_vector into an unordered factor. To create an ordered factor, you have to add two additional arguments: ordered and levels.

Function:

factor(some_vector, 
       ordered = TRUE, 
       levels = c("lev1", "lev2", ...))

# Convert speed_vector to ordered factor vector
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))

# Print factor_speed_vector
factor_speed_vector

## [1] medium slow   slow   medium fast  
## Levels: slow < medium < fast

summary(factor_speed_vector)

##   slow medium   fast 
##      2      2      1

Comparing ordered factors

# Factor value for second data analyst
da2 <- factor_speed_vector[2]; da2

## [1] slow
## Levels: slow < medium < fast

# Factor value for fifth data analyst
da5 <- factor_speed_vector[5]; da5

## [1] fast
## Levels: slow < medium < fast

# Is data analyst 2 faster than data analyst 5?
da2 > da5

## [1] FALSE
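Comparisons on an ordered factor also work on the whole vector at once, including against a single level given as a string, and min() and max() respect the level order. For contrast, the same comparison on an unordered factor yields NA. A sketch (unordered is an illustrative name):

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Compare every element against one level at once
factor_speed_vector > "slow"
## [1]  TRUE FALSE FALSE  TRUE  TRUE

# max() follows the level order, so this returns fast
max(factor_speed_vector)

# An unordered factor does not support >: R warns and returns NA
unordered <- factor(speed_vector)
suppressWarnings(unordered[2] > unordered[5])
## [1] NA
```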

1.6 Data Frame

Data frames (two-dimensional objects) can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can have different data types.

1.6.1 Quick look at dataset

The function head() enables you to show the first observations of a data frame. Similarly, the function tail() prints out the last observations in your dataset.

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

tail(mtcars)

##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

# Open the help page for the built-in mtcars dataset
?mtcars

The function str() shows you the structure of your dataset.

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
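A few more base R functions give a quick sense of a data frame's size and columns. A short sketch on mtcars:

```r
# Number of rows and columns (observations and variables)
dim(mtcars)
## [1] 32 11

# head() and tail() accept n to control how many rows are shown
head(mtcars, n = 3)

# Column (variable) names
names(mtcars)
```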

1.6.2 Creating a data frame

Function: data.frame()

# Definition of vectors
name <- c("Mercury", "Venus", "Earth", 
          "Mars", "Jupiter", "Saturn", 
          "Uranus", "Neptune")
type <- c("Terrestrial planet", 
          "Terrestrial planet", 
          "Terrestrial planet", 
          "Terrestrial planet", "Gas giant", 
          "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 
              11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 
              0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create a data frame from the vectors
planets_df <- data.frame(name, type, diameter, rotation, rings)


head(planets_df)

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 5 Jupiter          Gas giant   11.209     0.41  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE

str(planets_df)

## 'data.frame':    8 obs. of  5 variables:
##  $ name    : chr  "Mercury" "Venus" "Earth" "Mars" ...
##  $ type    : chr  "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" ...
##  $ diameter: num  0.382 0.949 1 0.532 11.209 ...
##  $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
##  $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...
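The str() output above shows name and type as chr. That reflects the default since R 4.0.0; older versions converted character columns to factors. The stringsAsFactors argument of data.frame() controls this explicitly. A sketch using the first four planets only:

```r
name <- c("Mercury", "Venus", "Earth", "Mars")
type <- c("Terrestrial planet", "Terrestrial planet",
          "Terrestrial planet", "Terrestrial planet")

# stringsAsFactors = FALSE (the default since R 4.0.0) keeps text as character
class(data.frame(name, type, stringsAsFactors = FALSE)$type)
## [1] "character"

# stringsAsFactors = TRUE converts character columns to factors
class(data.frame(name, type, stringsAsFactors = TRUE)$type)
## [1] "factor"
```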

1.6.3 Selection - brackets

You select elements from a data frame with square brackets [ ]. By using a comma, you indicate what to select from the rows and from the columns, respectively. For example:

my_df[1,2] selects the value at the first row and second column in my_df.
my_df[1:3,2:4] selects rows 1, 2, 3 and columns 2, 3, 4 in my_df.

Sometimes you want to select all elements of a row or column. For example, my_df[1, ] selects all elements of the first row.

# Print out diameter of Mercury (row 1, column 3)
planets_df[1, 3]

## [1] 0.382

# Print out data for Mars (entire fourth row)
planets_df[4, ]

##   name               type diameter rotation rings
## 4 Mars Terrestrial planet    0.532     1.03 FALSE

1.6.4 Selection - names

You can also use the variable names to select columns of a data frame.

# Select first 5 values of diameter column
planets_df[1:5, "diameter"]

## [1]  0.382  0.949  1.000  0.532 11.209

1.6.5 Selection - $

There is a shortcut: if your columns have names, you can select a column with the $ sign, as in my_df$column_name.

# Select the rings variable from planets_df
rings_vector <- planets_df$rings; rings_vector

## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

# Select all columns for planets with rings (keeps only the rows where rings_vector is TRUE)
planets_df[rings_vector, ]

##      name      type diameter rotation rings
## 5 Jupiter Gas giant   11.209     0.41  TRUE
## 6  Saturn Gas giant    9.449     0.43  TRUE
## 7  Uranus Gas giant    4.007    -0.72  TRUE
## 8 Neptune Gas giant    3.883     0.67  TRUE
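Logical vectors can be combined with & (and) and | (or) before being used for selection. A sketch on a reduced version of planets_df (only the name, diameter and rings columns; big_ringed is an illustrative name):

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# & combines two logical vectors element by element:
# planets with rings AND a diameter above 5 (relative to Earth)
big_ringed <- planets_df$rings & planets_df$diameter > 5
planets_df[big_ringed, "name"]   # Jupiter, Saturn

# | keeps rows that satisfy either condition:
# planets without rings OR with a diameter above 10
planets_df[(!planets_df$rings) | (planets_df$diameter > 10), "name"]
```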

1.6.6 Selection - subset

The first argument of subset() specifies the dataset for which you want a subset. By adding the second argument, you give R the necessary information and conditions to select the correct subset.

subset(my_df, subset = some_condition)

You can think of subset() as a shortcut. The code below gives exactly the same result as planets_df[rings_vector, ] in the previous section, but this time you don't need rings_vector:

subset(planets_df, subset = rings)

# Select planets with diameter < 1
subset(planets_df, subset = diameter < 1)

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE

1.6.7 Sorting

order() returns the positions that would sort a variable, such as a vector: its first element is the position of the smallest value, its second the position of the next smallest, and so on. For example:

a <- c(100, 10, 1000)
order(a)

## [1] 2 1 3

a[order(a)]

## [1]   10  100 1000
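In the example above, order(a) happens to coincide with rank(a), the rank of each element in its original place. They differ in general, as this sketch with a different vector shows (b is an illustrative name); the decreasing argument reverses the sort:

```r
b <- c(30, 10, 20)

# order(): positions that sort b (smallest at position 2, then 3, then 1)
order(b)
## [1] 2 3 1

# rank(): the rank of each element, in its original place
rank(b)
## [1] 3 1 2

# Only order() is suitable for sorting
b[order(b)]
## [1] 10 20 30

# decreasing = TRUE sorts from largest to smallest
b[order(b, decreasing = TRUE)]
## [1] 30 20 10
```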

Suppose you want to rearrange the data frame so that it starts with the smallest planet and ends with the largest one, that is, sort it by the diameter column.

# Use order() to create positions
positions <- order(planets_df$diameter)

# Use positions to sort planets_df
planets_df[positions, ]

##      name               type diameter rotation rings
## 1 Mercury Terrestrial planet    0.382    58.64 FALSE
## 4    Mars Terrestrial planet    0.532     1.03 FALSE
## 2   Venus Terrestrial planet    0.949  -243.02 FALSE
## 3   Earth Terrestrial planet    1.000     1.00 FALSE
## 8 Neptune          Gas giant    3.883     0.67  TRUE
## 7  Uranus          Gas giant    4.007    -0.72  TRUE
## 6  Saturn          Gas giant    9.449     0.43  TRUE
## 5 Jupiter          Gas giant   11.209     0.41  TRUE

1.7 List

Lists: a list is a kind of super data type; you can store practically any piece of information in it. It allows you to gather a variety of objects under one name (the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc.

1.7.1 Creating a list

Function: list()

my_list <- list(comp1, comp2, ...)

The arguments to the list function are the list components. These components can be matrices, vectors, other lists, etc.

# Vector with numerics from 1 up to 10
my_vector <- 1:10 

# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)

# First 10 rows of the built-in data frame mtcars
my_df <- mtcars[1:10,]

# Construct list with these different elements:
my_list <- list(my_vector, my_matrix, my_df)
my_list

## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## [[3]]
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

1.7.2 Creating a named list

Just like on your to-do list, you want to avoid not knowing or remembering what the components of your list stand for. That is why you should give them names. The following creates a list whose components are named name1, name2, and so on:

my_list <- list(name1 = your_comp1, 
                name2 = your_comp2)

If you want to name your lists after you’ve created them, you can use the names() function as you did with vectors. The following commands are fully equivalent to the assignment above:

my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")

# Name the components after the list has been created
names(my_list) <- c("vec", "mat", "df")
my_list

## $vec
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## $df
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

# Name the components directly in the list() call
my_list <- list("vec" = my_vector, 
                "mat" = my_matrix, 
                "df" = my_df)
my_list

## $vec
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## $df
##                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

1.7.3 Selection

To select a single component of a list, use double square brackets [[ ]] with its position or name, or the $ sign with its name.

Note: to select elements from vectors, you use single square brackets: [ ]. Don’t mix them up!

# Print the first element of the list, which is a vector
my_list[[1]]

##  [1]  1  2  3  4  5  6  7  8  9 10

my_list[["vec"]]

##  [1]  1  2  3  4  5  6  7  8  9 10

my_list$vec

##  [1]  1  2  3  4  5  6  7  8  9 10

# Print the second element of the vector stored in the first component
my_list[[1]][2]

## [1] 2
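The warning about single versus double brackets matters because, on a list, [ ] returns a smaller list rather than the component itself. A sketch on a reduced my_list, which also shows chaining into a matrix component and adding a new component (note is an illustrative name):

```r
my_list <- list(vec = 1:10, mat = matrix(1:9, ncol = 3))

# Single brackets return a sub-list that still wraps the component
class(my_list[1])
## [1] "list"

# Double brackets (or $) return the component itself
class(my_list[[1]])
## [1] "integer"

# Chain selections: row 2, column 3 of the matrix component
my_list$mat[2, 3]
## [1] 8

# Assigning to a new name adds a component
my_list$note <- "built in chapter 1"
names(my_list)
## [1] "vec"  "mat"  "note"
```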

1.8 Bring it all together

# Construct vectors
movie_title <- "The Departed"
movie_actors <- c("Leonardo DiCaprio", "Matt Damon", "Jack Nicholson", 
                  "Mark Wahlberg", "Vera Farmiga", "Martin Sheen")
scores <- c(4.6, 5, 4.8, 5, 4.2)
comments <- c("I would watch it again", "Amazing!", "I liked it", "One of the best movies", "Fascinating plot") 

# Save the average of the scores vector as avg_review
avg_review <- mean(scores)

# Combine scores and comments into the reviews_df data frame
reviews_df <- data.frame(scores, comments)

# Sort reviews_df by scores
reviews_df <- reviews_df[order(reviews_df$scores), ]

# Create and print out a list, called departed_list
departed_list <- list("title" = movie_title, 
                      "actors" = movie_actors, 
                      "reviews" = reviews_df, 
                      "average" = avg_review)
departed_list

## $title
## [1] "The Departed"
## 
## $actors
## [1] "Leonardo DiCaprio" "Matt Damon"        "Jack Nicholson"   
## [4] "Mark Wahlberg"     "Vera Farmiga"      "Martin Sheen"     
## 
## $reviews
##   scores               comments
## 5    4.2       Fascinating plot
## 1    4.6 I would watch it again
## 3    4.8             I liked it
## 2    5.0               Amazing!
## 4    5.0 One of the best movies
## 
## $average
## [1] 4.72
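Selections can be chained to reach inside nested objects, for example from the list into the data frame and then into one of its columns. A sketch that rebuilds a reduced departed_list (without the actors) to stay self-contained:

```r
scores <- c(4.6, 5, 4.8, 5, 4.2)
comments <- c("I would watch it again", "Amazing!", "I liked it",
              "One of the best movies", "Fascinating plot")
reviews_df <- data.frame(scores, comments)
reviews_df <- reviews_df[order(reviews_df$scores), ]
departed_list <- list(title = "The Departed", reviews = reviews_df,
                      average = mean(scores))

# $ can be chained: list component, then data frame column, then element
departed_list$reviews$comments[1]
## [1] "Fascinating plot"

# [[ ]] followed by data frame indexing reaches the same value
departed_list[["reviews"]][1, "comments"]
## [1] "Fascinating plot"

departed_list$average
## [1] 4.72
```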