Chapter 12 Introduction to Writing Functions
12.1 Write a Function
12.1.1 Why write functions?
Benefits of writing functions:
Functions eliminate repetition from your code, which can reduce your workload and help avoid errors.
Functions also allow code reuse and sharing.
12.1.2 Convert scripts into functions
Make a template
Paste your script into the body
Choose the arguments
Replace specific values with arguments
Generalize variable names
Remove the final assignment
12.1.2.1 First function
# Your script, from a previous step
# Sample from coin_sides once
<- c("head", "tail")
coin_sides sample(coin_sides, 1)
## [1] "tail"
# Paste your script into the function body
toss_coin <- function() {
  coin_sides <- c("head", "tail")
  sample(coin_sides, 1)
}
# Call your function
toss_coin()
## [1] "head"
12.1.2.2 Inputs to functions
The inputs to functions are called arguments.
# Update the function to return n coin tosses
toss_coin <- function(n_flips) {
  coin_sides <- c("head", "tail")
  sample(coin_sides, size = n_flips, replace = TRUE)
}
# Generate 10 coin tosses
toss_coin(10)
## [1] "head" "head" "tail" "tail" "tail" "tail" "tail" "tail" "tail" "head"
12.1.2.3 Multiple inputs to functions
If a function should have more than one argument, list them in the function signature, separated by commas.
Bias the coin by weighting the sampling: specify the prob argument so that heads are sampled with probability p_head (and tails with probability 1 - p_head).
<- c("head", "tail")
coin_sides <- 10
n_flips <- 0.8
p_head
# Define a vector of weights
<- c(p_head, 1 - p_head)
weights
# Update so that heads are sampled with prob p_head
sample(coin_sides, n_flips, replace = TRUE, prob = weights)
## [1] "head" "head" "head" "head" "head" "head" "head" "head" "head" "head"
Update the definition of toss_coin() so it accepts an argument, p_head, and weights the samples using the code you wrote in the previous step.
# Update the function so heads have probability p_head
toss_coin <- function(n_flips, p_head) {
  coin_sides <- c("head", "tail")
  # Define a vector of weights
  weights <- c(p_head, 1 - p_head)
  # Modify the sampling to be weighted
  sample(coin_sides, n_flips, replace = TRUE, prob = weights)
}
# Generate 10 coin tosses with an 80% chance of each head
toss_coin(10, 0.8)
## [1] "head" "head" "head" "head" "tail" "head" "head" "head" "tail" "head"
12.1.3 Readable code
Function names should contain a verb
Readability vs. typeability
Understanding code >> typing code
Code editors have autocomplete
You can alias common functions
Types of argument
Data arguments: what you compute on
Detail arguments: how you perform the computation
Data args should precede detail args (see the sketch below)
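A minimal sketch of these conventions (hypothetical function and arguments, not part of the exercise): the name contains a verb, the data argument comes first so the function fits into a pipeline, and the detail argument follows with a default.
# Hypothetical example: data argument first, detail argument after,
# and a verb in the function name
summarize_column <- function(data, column, na.rm = FALSE) {
  mean(data[[column]], na.rm = na.rm)
}
# The data-first signature makes calls read naturally
summarize_column(cars, "dist")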
Renaming GLM
R’s generalized linear regression function, glm(), suffers the same usability problems as lm(): its name is an acronym, and its formula and data arguments are in the wrong order.
To solve this exercise, you need to know two things about generalized linear regression:
glm() formulas are specified like lm() formulas: the response is on the left, and explanatory variables are added on the right.
To model count data, set glm()’s family argument to poisson, making it a Poisson regression.
library(tidyverse)
<- read_rds("data/snake_river_visits.rds")
snake_river_visits
# Run a generalized linear regression
glm(
# Model no. of visits vs. gender, income, travel
~ gender + income + travel,
n_visits # Use the snake_river_visits dataset
data = snake_river_visits,
# Make it a Poisson regression
family = poisson
)
##
## Call: glm(formula = n_visits ~ gender + income + travel, family = poisson,
## data = snake_river_visits)
##
## Coefficients:
## (Intercept) genderfemale income($25k,$55k] income($55k,$95k]
## 4.0864 0.3740 -0.0199 -0.5807
## income($95k,$Inf) travel(0.25h,4h] travel(4h,Infh)
## -0.5782 -0.6271 -2.4230
##
## Degrees of Freedom: 345 Total (i.e. Null); 339 Residual
## (64 observations deleted due to missingness)
## Null Deviance: 18850
## Residual Deviance: 11530 AIC: 12860
Define a function, run_poisson_regression(), to run a Poisson regression. This should take two arguments, data and formula, and call glm().
# Write a function to run a Poisson regression
run_poisson_regression <- function(data, formula) {
  glm(formula, data, family = poisson)
}

# Re-run the Poisson regression, using your function
model <- snake_river_visits %>%
  run_poisson_regression(n_visits ~ gender + income + travel)

# Run this to see the predictions
# snake_river_explanatory %>%
#   mutate(predicted_n_visits = predict(model, ., type = "response")) %>%
#   arrange(desc(predicted_n_visits))
12.2 All About Arguments
12.2.1 Default arguments
12.2.1.1 Numeric defaults
Only set defaults for numeric detail arguments, not data arguments.
cut_by_quantile() converts a numeric vector into a categorical variable where quantiles define the cut points. By specifying default arguments, you can make it easier to use. Let’s start with n, which specifies how many categories to cut x into.
# A numeric vector of the number of visits to Snake River
n_visits <- snake_river_visits$n_visits

# Set the default for n to 5
cut_by_quantile <- function(x, n = 5, na.rm, labels, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the n argument from the call
cut_by_quantile(
  n_visits, na.rm = FALSE,
  labels = c("very low", "low", "medium", "high", "very high"),
  interval_type = "(lo, hi]"
)
## [1] very low very low very low very low very low very low very low
## [8] very low very low very low very low very low very low very low
## [15] very low very low very low very low very low high very high
## [22] high very low medium low very low very low very low
## [29] very low very low very low very high very high very high very high
## [36] very high high very high very high very high very high very high
## [43] medium very high very high very high medium medium low
## [50] high high high very high very high high high
## [57] very high medium very high high medium high very high
## [64] very high very high very high high high very high high
## [71] very low very high high high medium high high
## [78] high medium very high very high very high high high
## [85] high very low very high medium high very high high
## [92] high very high high very low very low medium very low
## [99] medium medium very high medium medium medium high
## [106] low high very high medium very high medium very high
## [113] low very high low very high high very low very low
## [120] very low very low low very low very low very low very low
## [127] very low very low medium very low very low low low
## [134] very low very low low very low very low very low low
## [141] low medium medium medium medium medium very low
## [148] very low low very low low medium very low very low
## [155] very low very low very high high very high high medium
## [162] very high medium very low high medium high high
## [169] very high high high very high very high high very high
## [176] high high medium very high high high high
## [183] very high very high very low high very high high high
## [190] medium very high high very high high very high high
## [197] very high high very high very low high very high very high
## [204] very low very low medium very high medium low medium
## [211] high medium very low medium very high high very high
## [218] high very high high low high medium very high
## [225] medium high high high very low high high
## [232] high very high high medium medium very low very low
## [239] very low very low medium low very low very low very low
## [246] medium high very low very low medium very low very low
## [253] very low very low very low very low very low very low very low
## [260] very low very high medium very low very high medium very high
## [267] medium low very high medium medium medium low
## [274] high medium high very high medium very high very high
## [281] medium medium very high high medium very high high
## [288] medium low very low medium very low very low very low
## [295] very low very low low very low very low very low very low
## [302] very low very low very low very low low very low very low
## [309] very low very low low very low very low low very low
## [316] very low very low very low low very low very low very low
## [323] very low very low low very low very low very low very low
## [330] very low very low very low very low very low very low very low
## [337] very low very low very low very low very low very low very low
## [344] very low very low medium very low very low very low very low
## [351] very low very low very low very low very low very low very low
## [358] very low low very low very low very low very low very low
## [365] very low very low very low very low very low very low low
## [372] very low very low very low very high high very high very high
## [379] very high high very high very high very high very high medium
## [386] medium medium high very high high high high
## [393] high high high high very high high very high
## [400] medium high low high very high low very low
## [407] medium very low medium low
## Levels: very low low medium high very high
12.2.1.2 Logical defaults
cut_by_quantile() is now slightly easier to use, but you still always have to specify the na.rm argument, which removes missing values.
Where functions have an argument for removing missing values, the best practice is to not remove them by default (in case you hadn’t spotted that you had missing values). That means that the default for na.rm should be FALSE.
# Set the default for na.rm to FALSE
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the na.rm argument from the call
cut_by_quantile(
  n_visits, labels = c("very low", "low", "medium", "high", "very high"),
  interval_type = "(lo, hi]"
)
## [1] very low very low very low very low very low very low very low
## [8] very low very low very low very low very low very low very low
## [15] very low very low very low very low very low high very high
## [22] high very low medium low very low very low very low
## [29] very low very low very low very high very high very high very high
## [36] very high high very high very high very high very high very high
## [43] medium very high very high very high medium medium low
## [50] high high high very high very high high high
## [57] very high medium very high high medium high very high
## [64] very high very high very high high high very high high
## [71] very low very high high high medium high high
## [78] high medium very high very high very high high high
## [85] high very low very high medium high very high high
## [92] high very high high very low very low medium very low
## [99] medium medium very high medium medium medium high
## [106] low high very high medium very high medium very high
## [113] low very high low very high high very low very low
## [120] very low very low low very low very low very low very low
## [127] very low very low medium very low very low low low
## [134] very low very low low very low very low very low low
## [141] low medium medium medium medium medium very low
## [148] very low low very low low medium very low very low
## [155] very low very low very high high very high high medium
## [162] very high medium very low high medium high high
## [169] very high high high very high very high high very high
## [176] high high medium very high high high high
## [183] very high very high very low high very high high high
## [190] medium very high high very high high very high high
## [197] very high high very high very low high very high very high
## [204] very low very low medium very high medium low medium
## [211] high medium very low medium very high high very high
## [218] high very high high low high medium very high
## [225] medium high high high very low high high
## [232] high very high high medium medium very low very low
## [239] very low very low medium low very low very low very low
## [246] medium high very low very low medium very low very low
## [253] very low very low very low very low very low very low very low
## [260] very low very high medium very low very high medium very high
## [267] medium low very high medium medium medium low
## [274] high medium high very high medium very high very high
## [281] medium medium very high high medium very high high
## [288] medium low very low medium very low very low very low
## [295] very low very low low very low very low very low very low
## [302] very low very low very low very low low very low very low
## [309] very low very low low very low very low low very low
## [316] very low very low very low low very low very low very low
## [323] very low very low low very low very low very low very low
## [330] very low very low very low very low very low very low very low
## [337] very low very low very low very low very low very low very low
## [344] very low very low medium very low very low very low very low
## [351] very low very low very low very low very low very low very low
## [358] very low low very low very low very low very low very low
## [365] very low very low very low very low very low very low low
## [372] very low very low very low very high high very high very high
## [379] very high high very high very high very high very high medium
## [386] medium medium high very high high high high
## [393] high high high high very high high very high
## [400] medium high low high very high low very low
## [407] medium very low medium low
## Levels: very low low medium high very high
12.2.1.3 NULL defaults
The cut() function used by cut_by_quantile() can automatically provide sensible labels for each category. The code to generate these labels is pretty complicated, so rather than appearing in the function signature directly, the labels argument defaults to NULL, and the calculation details are shown on the ?cut help page.
(If you use this capability, make sure to document how the argument behaves in the function’s help page.)
# Set the default for labels to NULL
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL, interval_type) {
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}

# Remove the labels argument from the call
cut_by_quantile(
  n_visits, interval_type = "(lo, hi]"
)
## [1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [9] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [17] [0,1] [0,1] [0,1] (10,35] (35,350] (10,35] [0,1] (2,10]
## [25] (1,2] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] (35,350]
## [33] (35,350] (35,350] (35,350] (35,350] (10,35] (35,350] (35,350] (35,350]
## [41] (35,350] (35,350] (2,10] (35,350] (35,350] (35,350] (2,10] (2,10]
## [49] (1,2] (10,35] (10,35] (10,35] (35,350] (35,350] (10,35] (10,35]
## [57] (35,350] (2,10] (35,350] (10,35] (2,10] (10,35] (35,350] (35,350]
## [65] (35,350] (35,350] (10,35] (10,35] (35,350] (10,35] [0,1] (35,350]
## [73] (10,35] (10,35] (2,10] (10,35] (10,35] (10,35] (2,10] (35,350]
## [81] (35,350] (35,350] (10,35] (10,35] (10,35] [0,1] (35,350] (2,10]
## [89] (10,35] (35,350] (10,35] (10,35] (35,350] (10,35] [0,1] [0,1]
## [97] (2,10] [0,1] (2,10] (2,10] (35,350] (2,10] (2,10] (2,10]
## [105] (10,35] (1,2] (10,35] (35,350] (2,10] (35,350] (2,10] (35,350]
## [113] (1,2] (35,350] (1,2] (35,350] (10,35] [0,1] [0,1] [0,1]
## [121] [0,1] (1,2] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [129] (2,10] [0,1] [0,1] (1,2] (1,2] [0,1] [0,1] (1,2]
## [137] [0,1] [0,1] [0,1] (1,2] (1,2] (2,10] (2,10] (2,10]
## [145] (2,10] (2,10] [0,1] [0,1] (1,2] [0,1] (1,2] (2,10]
## [153] [0,1] [0,1] [0,1] [0,1] (35,350] (10,35] (35,350] (10,35]
## [161] (2,10] (35,350] (2,10] [0,1] (10,35] (2,10] (10,35] (10,35]
## [169] (35,350] (10,35] (10,35] (35,350] (35,350] (10,35] (35,350] (10,35]
## [177] (10,35] (2,10] (35,350] (10,35] (10,35] (10,35] (35,350] (35,350]
## [185] [0,1] (10,35] (35,350] (10,35] (10,35] (2,10] (35,350] (10,35]
## [193] (35,350] (10,35] (35,350] (10,35] (35,350] (10,35] (35,350] [0,1]
## [201] (10,35] (35,350] (35,350] [0,1] [0,1] (2,10] (35,350] (2,10]
## [209] (1,2] (2,10] (10,35] (2,10] [0,1] (2,10] (35,350] (10,35]
## [217] (35,350] (10,35] (35,350] (10,35] (1,2] (10,35] (2,10] (35,350]
## [225] (2,10] (10,35] (10,35] (10,35] [0,1] (10,35] (10,35] (10,35]
## [233] (35,350] (10,35] (2,10] (2,10] [0,1] [0,1] [0,1] [0,1]
## [241] (2,10] (1,2] [0,1] [0,1] [0,1] (2,10] (10,35] [0,1]
## [249] [0,1] (2,10] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [257] [0,1] [0,1] [0,1] [0,1] (35,350] (2,10] [0,1] (35,350]
## [265] (2,10] (35,350] (2,10] (1,2] (35,350] (2,10] (2,10] (2,10]
## [273] (1,2] (10,35] (2,10] (10,35] (35,350] (2,10] (35,350] (35,350]
## [281] (2,10] (2,10] (35,350] (10,35] (2,10] (35,350] (10,35] (2,10]
## [289] (1,2] [0,1] (2,10] [0,1] [0,1] [0,1] [0,1] [0,1]
## [297] (1,2] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [305] [0,1] (1,2] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1]
## [313] [0,1] (1,2] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1]
## [321] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1] [0,1] [0,1]
## [329] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [337] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [345] [0,1] (2,10] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [353] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1]
## [361] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [369] [0,1] [0,1] (1,2] [0,1] [0,1] [0,1] (35,350] (10,35]
## [377] (35,350] (35,350] (35,350] (10,35] (35,350] (35,350] (35,350] (35,350]
## [385] (2,10] (2,10] (2,10] (10,35] (35,350] (10,35] (10,35] (10,35]
## [393] (10,35] (10,35] (10,35] (10,35] (35,350] (10,35] (35,350] (2,10]
## [401] (10,35] (1,2] (10,35] (35,350] (1,2] [0,1] (2,10] [0,1]
## [409] (2,10] (1,2]
## Levels: [0,1] (1,2] (2,10] (10,35] (35,350]
12.2.1.4 Categorical defaults
Pass a character vector of choices in the signature.
Call match.arg() in the body.
The pattern for categorical defaults is:
function(cat_arg = c("choice1", "choice2")) {
<- match.arg(cat_arg)
cat_arg }
When cutting up a numeric vector, you need to worry about what happens if a value lands exactly on a boundary. You can either put this value into a category of the lower interval or the higher interval. That is, you can choose your intervals to include values at the top boundary but not the bottom.
In mathematical terminology, “open on the left, closed on the right” is written (lo, hi], and “closed on the left, open on the right” is written [lo, hi).
cut_by_quantile() should allow these two choices.
# Take "rank()" for example, look at the ties.method argument
head(rank, 7)
##
## 1 function (x, na.last = TRUE, ties.method = c("average", "first",
## 2 "last", "random", "max", "min"))
## 3 {
## 4 stopifnot(length(na.last) == 1L)
## 5 nas <- is.na(x)
## 6 nm <- names(x)
## 7 ties.method <- match.arg(ties.method)
# Set the categories for interval_type to "(lo, hi]" and "[lo, hi)"
cut_by_quantile <- function(x, n = 5, na.rm = FALSE, labels = NULL,
                            interval_type = c("(lo, hi]", "[lo, hi)")) {
  # Match the interval_type argument
  interval_type <- match.arg(interval_type)
  probs <- seq(0, 1, length.out = n + 1)
  qtiles <- quantile(x, probs, na.rm = na.rm, names = FALSE)
  right <- switch(interval_type, "(lo, hi]" = TRUE, "[lo, hi)" = FALSE)
  cut(x, qtiles, labels = labels, right = right, include.lowest = TRUE)
}
# Remove the interval_type argument from the call
cut_by_quantile(n_visits)
## [1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [9] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [17] [0,1] [0,1] [0,1] (10,35] (35,350] (10,35] [0,1] (2,10]
## [25] (1,2] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] (35,350]
## [33] (35,350] (35,350] (35,350] (35,350] (10,35] (35,350] (35,350] (35,350]
## [41] (35,350] (35,350] (2,10] (35,350] (35,350] (35,350] (2,10] (2,10]
## [49] (1,2] (10,35] (10,35] (10,35] (35,350] (35,350] (10,35] (10,35]
## [57] (35,350] (2,10] (35,350] (10,35] (2,10] (10,35] (35,350] (35,350]
## [65] (35,350] (35,350] (10,35] (10,35] (35,350] (10,35] [0,1] (35,350]
## [73] (10,35] (10,35] (2,10] (10,35] (10,35] (10,35] (2,10] (35,350]
## [81] (35,350] (35,350] (10,35] (10,35] (10,35] [0,1] (35,350] (2,10]
## [89] (10,35] (35,350] (10,35] (10,35] (35,350] (10,35] [0,1] [0,1]
## [97] (2,10] [0,1] (2,10] (2,10] (35,350] (2,10] (2,10] (2,10]
## [105] (10,35] (1,2] (10,35] (35,350] (2,10] (35,350] (2,10] (35,350]
## [113] (1,2] (35,350] (1,2] (35,350] (10,35] [0,1] [0,1] [0,1]
## [121] [0,1] (1,2] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [129] (2,10] [0,1] [0,1] (1,2] (1,2] [0,1] [0,1] (1,2]
## [137] [0,1] [0,1] [0,1] (1,2] (1,2] (2,10] (2,10] (2,10]
## [145] (2,10] (2,10] [0,1] [0,1] (1,2] [0,1] (1,2] (2,10]
## [153] [0,1] [0,1] [0,1] [0,1] (35,350] (10,35] (35,350] (10,35]
## [161] (2,10] (35,350] (2,10] [0,1] (10,35] (2,10] (10,35] (10,35]
## [169] (35,350] (10,35] (10,35] (35,350] (35,350] (10,35] (35,350] (10,35]
## [177] (10,35] (2,10] (35,350] (10,35] (10,35] (10,35] (35,350] (35,350]
## [185] [0,1] (10,35] (35,350] (10,35] (10,35] (2,10] (35,350] (10,35]
## [193] (35,350] (10,35] (35,350] (10,35] (35,350] (10,35] (35,350] [0,1]
## [201] (10,35] (35,350] (35,350] [0,1] [0,1] (2,10] (35,350] (2,10]
## [209] (1,2] (2,10] (10,35] (2,10] [0,1] (2,10] (35,350] (10,35]
## [217] (35,350] (10,35] (35,350] (10,35] (1,2] (10,35] (2,10] (35,350]
## [225] (2,10] (10,35] (10,35] (10,35] [0,1] (10,35] (10,35] (10,35]
## [233] (35,350] (10,35] (2,10] (2,10] [0,1] [0,1] [0,1] [0,1]
## [241] (2,10] (1,2] [0,1] [0,1] [0,1] (2,10] (10,35] [0,1]
## [249] [0,1] (2,10] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [257] [0,1] [0,1] [0,1] [0,1] (35,350] (2,10] [0,1] (35,350]
## [265] (2,10] (35,350] (2,10] (1,2] (35,350] (2,10] (2,10] (2,10]
## [273] (1,2] (10,35] (2,10] (10,35] (35,350] (2,10] (35,350] (35,350]
## [281] (2,10] (2,10] (35,350] (10,35] (2,10] (35,350] (10,35] (2,10]
## [289] (1,2] [0,1] (2,10] [0,1] [0,1] [0,1] [0,1] [0,1]
## [297] (1,2] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [305] [0,1] (1,2] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1]
## [313] [0,1] (1,2] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1]
## [321] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1] [0,1] [0,1]
## [329] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [337] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [345] [0,1] (2,10] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [353] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] (1,2] [0,1]
## [361] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1] [0,1]
## [369] [0,1] [0,1] (1,2] [0,1] [0,1] [0,1] (35,350] (10,35]
## [377] (35,350] (35,350] (35,350] (10,35] (35,350] (35,350] (35,350] (35,350]
## [385] (2,10] (2,10] (2,10] (10,35] (35,350] (10,35] (10,35] (10,35]
## [393] (10,35] (10,35] (10,35] (10,35] (35,350] (10,35] (35,350] (2,10]
## [401] (10,35] (1,2] (10,35] (35,350] (1,2] [0,1] (2,10] [0,1]
## [409] (2,10] (1,2]
## Levels: [0,1] (1,2] (2,10] (10,35] (35,350]
match.arg() handles throwing an error if the user types a value that wasn’t specified.
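For instance, a misspelled interval type is rejected (illustrative call; the exact error wording depends on your R version):
cut_by_quantile(n_visits, interval_type = "(lo, hi)")
Error in match.arg(interval_type) :
  'arg' should be one of "(lo, hi]", "[lo, hi)"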
12.2.2 Pass arguments between functions
12.2.2.1 Harmonic mean
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data. The harmonic mean is often used to average ratio data.
You’ll be using it on the price/earnings ratio of stocks in the Standard and Poor’s 500 index, provided as std_and_poor500. The price/earnings ratio is a measure of how expensive a stock is.
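As a quick worked example (toy numbers, not from the dataset): the harmonic mean of 1, 2, and 4 is the reciprocal of the mean of 1, 0.5, and 0.25.
1 / mean(1 / c(1, 2, 4))
## [1] 1.714286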
<- read_rds("data/std_and_poor500_with_pe_2019-06-21.rds")
std_and_poor500 glimpse(std_and_poor500)
## Rows: 505
## Columns: 5
## $ symbol <chr> "MMM", "ABT", "ABBV", "ABMD", "ACN", "ATVI", "ADBE", "AMD", "…
## $ company <chr> "3M Company", "Abbott Laboratories", "AbbVie Inc.", "ABIOMED …
## $ sector <chr> "Industrials", "Health Care", "Health Care", "Health Care", "…
## $ industry <chr> "Industrial Conglomerates", "Health Care Equipment", "Pharmac…
## $ pe_ratio <dbl> 18.31678, 57.66621, 22.43805, 45.63993, 27.00233, 20.13596, 5…
# Write a function to calculate the reciprocal
get_reciprocal <- function(x) {
  1 / x
}

# Write a function to calculate the harmonic mean
calc_harmonic_mean <- function(x) {
  x %>%
    get_reciprocal() %>%
    mean() %>%
    get_reciprocal()
}

# Calculate the harmonic mean of the P/E ratio for each sector
std_and_poor500 %>%
  # Group by sector
  group_by(sector) %>%
  # Summarize, calculating harmonic mean of P/E ratio
  summarise(hmean_pe_ratio = calc_harmonic_mean(pe_ratio))
## # A tibble: 11 × 2
## sector hmean_pe_ratio
## <chr> <dbl>
## 1 Communication Services NA
## 2 Consumer Discretionary NA
## 3 Consumer Staples NA
## 4 Energy NA
## 5 Financials NA
## 6 Health Care NA
## 7 Industrials NA
## 8 Information Technology NA
## 9 Materials NA
## 10 Real Estate 32.5
## 11 Utilities NA
It looks like we have a problem though: most sectors have missing values.
12.2.2.2 Handling missing values
It would be useful for your function to be able to remove missing values before calculating.
Rather than writing your own code for this, you can outsource this functionality to mean()
.
# Add an na.rm arg with a default, and pass it to mean()
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

std_and_poor500 %>%
  # Group by sector
  group_by(sector) %>%
  # Summarize, calculating harmonic mean of P/E ratio, removing missing values
  summarise(hmean_pe_ratio = calc_harmonic_mean(pe_ratio, na.rm = TRUE))
## # A tibble: 11 × 2
## sector hmean_pe_ratio
## <chr> <dbl>
## 1 Communication Services 17.5
## 2 Consumer Discretionary 15.2
## 3 Consumer Staples 19.8
## 4 Energy 13.7
## 5 Financials 12.9
## 6 Health Care 26.6
## 7 Industrials 18.2
## 8 Information Technology 21.6
## 9 Materials 16.3
## 10 Real Estate 32.5
## 11 Utilities 23.9
12.2.2.3 Using ...
The tradeoff
Benefits
Less typing for you
No need to match signatures
Drawbacks
You need to trust the inner function
The interface is not as obvious to users
Rather than explicitly giving calc_harmonic_mean() an na.rm argument, you can use ... to simply “pass other arguments” to mean().
# Swap na.rm arg for ... in signature and body
calc_harmonic_mean <- function(x, ...) {
  x %>%
    get_reciprocal() %>%
    mean(...) %>%
    get_reciprocal()
}

std_and_poor500 %>%
  # Group by sector
  group_by(sector) %>%
  # Summarize, calculating harmonic mean of P/E ratio, removing missing values
  summarise(hmean_pe_ratio = calc_harmonic_mean(pe_ratio, na.rm = TRUE))
## # A tibble: 11 × 2
## sector hmean_pe_ratio
## <chr> <dbl>
## 1 Communication Services 17.5
## 2 Consumer Discretionary 15.2
## 3 Consumer Staples 19.8
## 4 Energy 13.7
## 5 Financials 12.9
## 6 Health Care 26.6
## 7 Industrials 18.2
## 8 Information Technology 21.6
## 9 Materials 16.3
## 10 Real Estate 32.5
## 11 Utilities 23.9
Using ... doesn’t change how people use your function; it just means the function is more flexible.
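For instance, any other mean() argument now passes straight through ... (illustrative call, assuming the function above; result not shown):
# trim is an argument of mean(), forwarded via ...
calc_harmonic_mean(std_and_poor500$pe_ratio, trim = 0.1, na.rm = TRUE)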
12.2.3 Checking arguments
12.2.3.1 Throwing errors
If a user provides a bad input to a function, the best course of action is to throw an error letting them know. The two rules are:
Throw the error message as soon as you realize there is a problem (typically at the start of the function).
Make the error message easily understandable.
You can use assert() from the assert package to check inputs and throw errors when they fail. (No output means no problem.)
library(assert)
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Assert that x is numeric
  assert(is.numeric(x))
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it strings
calc_harmonic_mean(std_and_poor500$sector)
Error: in calc_harmonic_mean(x = std_and_poor500$sector)
  Failed checks: is.numeric(x)
12.2.3.2 Custom error logic
Sometimes the assert() checks in the assert package don’t give the most informative error message: they report that a check failed, but won’t say why that’s a problem. In that case, you can use the is_*() functions in conjunction with messages, warnings, or errors to define custom feedback.
The harmonic mean only makes sense when x has all positive values.
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  assert(is.numeric(x))
  # Check if any values of x are non-positive
  if (any(x <= 0, na.rm = TRUE)) {
    # Throw an error
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it negative numbers
calc_harmonic_mean(std_and_poor500$pe_ratio - 20)
Error in calc_harmonic_mean(std_and_poor500$pe_ratio - 20) :
  x contains non-positive values, so the harmonic mean makes no sense.
Explaining what went wrong is helpful to users. Explaining why it is wrong is even better!
12.2.3.3 Fixing function arguments
You still need to provide some checks on the na.rm argument. This time, rather than throwing errors when the input is in an incorrect form, you are going to try to fix it.
na.rm should be a logical vector with one element.
Fix the na.rm argument by using use_first() to select the first na.rm element, and coerce_to() to change it to logical.
library(assert)
## Warning: package 'assert' was built under R version 4.3.1
library(assertive.base)
## Warning: package 'assertive.base' was built under R version 4.3.2
# Update the function definition to fix the na.rm argument
calc_harmonic_mean <- function(x, na.rm = FALSE) {
  assert(is.numeric(x))
  if (any(x <= 0, na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Use the first value of na.rm, and coerce to logical
  na.rm <- coerce_to(use_first(na.rm), target_class = "logical")
  x %>%
    get_reciprocal() %>%
    mean(na.rm = na.rm) %>%
    get_reciprocal()
}

# See what happens when you pass it malformed na.rm
calc_harmonic_mean(std_and_poor500$pe_ratio, na.rm = 1:5)
## Warning: Only the first value of na.rm (= 1) will be used.
## Warning: Coercing use_first(na.rm) to class 'logical'.
## [1] 18.23871
12.3 Return Values and Scope
12.3.1 Returning values
Reasons for returning early
You already know the answer.
The input is an edge case.
12.3.1.1 Returning early
Sometimes, you don’t need to run through the whole body of a function to get the answer. In that case you can return early from that function using return().
Returning early can often improve the performance of your functions considerably for some input values.
You need to know that a leap year is every 400th year (like the year 2000) or every 4th year that isn’t a century (like 1904 but not 1900 or 1905).
# checking for the cases of year being divisible by 400, then 100, then 4, returning early from the function in each case.
is_leap_year <- function(year) {
  # If year is div. by 400 return TRUE
  if (year %% 400 == 0) {
    return(TRUE)
  }
  # If year is div. by 100 return FALSE
  if (year %% 100 == 0) {
    return(FALSE)
  }
  # If year is div. by 4 return TRUE
  if (year %% 4 == 0) {
    return(TRUE)
  }
  # Otherwise return FALSE
  return(FALSE)
}
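A quick sanity check against the leap-year rules stated above (assuming the function as defined):
is_leap_year(2000)
## [1] TRUE
is_leap_year(1900)
## [1] FALSE
is_leap_year(1904)
## [1] TRUE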
12.3.1.2 Returning invisibly
When the main purpose of a function is to generate output, like drawing a plot or printing something in the console, you may not want a return value to be printed as well. In that case, the value should be returned invisibly, using invisible().
The base R plot function returns NULL, since its main purpose is to draw a plot. Recall that plot() has a formula interface: plot(y ~ x, data = data).
# Using cars, draw a scatter plot of dist vs. speed
plt_dist_vs_speed <- base::plot(dist ~ speed, data = cars)
# Oh no! The plot object is NULL
plt_dist_vs_speed
This isn’t helpful if you want to use it in piped code: instead it should invisibly return the plot data to be piped on to the next step.
# Define a pipeable plot fn with data and formula args
pipeable_plot <- function(data, formula) {
  # Call plot() with the formula interface
  base::plot(formula, data)
  # Invisibly return the input dataset
  invisible(data)
}

# Draw the scatter plot of dist vs. speed again
plt_dist_vs_speed <- cars %>%
  pipeable_plot(dist ~ speed)
# Now the plot object has a value
plt_dist_vs_speed
12.3.2 Return multiple values
R functions can only return a single value, but there are two ways to get around this rule:
return several objects in a list
store objects as attributes
When to use each technique:
If you need the result to have a particular type, add additional return values as attributes.
Otherwise, collect all return values into a list (as sketched below).
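A minimal sketch of the two patterns (toy values, hypothetical function names):
# 1. Return several objects in a list
describe_result_list <- function() {
  list(result = 42, log = "converged")
}

# 2. Keep the result's original type, and attach extras as attributes
describe_result_attr <- function() {
  result <- 42
  attr(result, "log") <- "converged"
  result
}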
12.3.2.1 Returning many things
Functions can only return one value. If you want to return multiple things, then you can store them all in a list.
Multi-assignment
If users want to have the list items as separate variables, they can assign each list element to its own variable.
Using zeallot’s multi-assignment operator, %<-%, e.g., c(a, b) %<-% list(1, 2) assigns 1 to a and 2 to b (see the sketch after this list).
The broom package provides three tidying functions:
glance(): model-level values
tidy(): coefficient-level values
augment(): observation-level values
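A minimal zeallot sketch (toy values) before applying it to the model:
library(zeallot)
c(a, b) %<-% list(1, "two")
a
## [1] 1
b
## [1] "two"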
library(broom)
library(zeallot)
## Warning: package 'zeallot' was built under R version 4.3.2
# Look at the structure of model (it's a mess!)
str(model)
## List of 31
## $ coefficients : Named num [1:7] 4.0864 0.374 -0.0199 -0.5807 -0.5782 ...
## ..- attr(*, "names")= chr [1:7] "(Intercept)" "genderfemale" "income($25k,$55k]" "income($55k,$95k]" ...
## $ residuals : Named num [1:346] -0.535 -0.768 -0.944 -0.662 -0.767 ...
## ..- attr(*, "names")= chr [1:346] "25" "26" "27" "29" ...
## $ fitted.values : Named num [1:346] 4.3 4.3 17.83 2.96 4.29 ...
## ..- attr(*, "names")= chr [1:346] "25" "26" "27" "29" ...
## $ effects : Named num [1:346] -360 -29.2 20.3 -10 23.4 ...
## ..- attr(*, "names")= chr [1:346] "(Intercept)" "genderfemale" "income($25k,$55k]" "income($55k,$95k]" ...
## $ R : num [1:7, 1:7] -97.4 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:7] "(Intercept)" "genderfemale" "income($25k,$55k]" "income($55k,$95k]" ...
## .. ..$ : chr [1:7] "(Intercept)" "genderfemale" "income($25k,$55k]" "income($55k,$95k]" ...
## $ rank : int 7
## $ qr :List of 5
## ..$ qr : num [1:346, 1:7] -97.3861 0.0213 0.0434 0.0177 0.0213 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:346] "25" "26" "27" "29" ...
## .. .. ..$ : chr [1:7] "(Intercept)" "genderfemale" "income($25k,$55k]" "income($55k,$95k]" ...
## ..$ rank : int 7
## ..$ qraux: num [1:7] 1.02 1.02 1.04 1.01 1 ...
## ..$ pivot: int [1:7] 1 2 3 4 5 6 7
## ..$ tol : num 1e-11
## ..- attr(*, "class")= chr "qr"
## $ family :List of 13
## ..$ family : chr "poisson"
## ..$ link : chr "log"
## ..$ linkfun :function (mu)
## ..$ linkinv :function (eta)
## ..$ variance :function (mu)
## ..$ dev.resids:function (y, mu, wt)
## ..$ aic :function (y, n, mu, wt, dev)
## ..$ mu.eta :function (eta)
## ..$ initialize: expression({ if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") n <- rep.int(1, nobs| __truncated__
## ..$ validmu :function (mu)
## ..$ valideta :function (eta)
## ..$ simulate :function (object, nsim)
## ..$ dispersion: num 1
## ..- attr(*, "class")= chr "family"
## $ linear.predictors: Named num [1:346] 1.46 1.46 2.88 1.09 1.46 ...
## ..- attr(*, "names")= chr [1:346] "25" "26" "27" "29" ...
## $ deviance : num 11529
## $ aic : num 12864
## $ null.deviance : num 18850
## $ iter : int 6
## $ weights : Named num [1:346] 4.3 4.3 17.83 2.96 4.29 ...
## ..- attr(*, "names")= chr [1:346] "25" "26" "27" "29" ...
## $ prior.weights : Named num [1:346] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:346] "25" "26" "27" "29" ...
## $ df.residual : int 339
## $ df.null : int 345
## $ y : Named num [1:346] 2 1 1 1 1 1 80 104 55 350 ...
## ..- attr(*, "names")= chr [1:346] "25" "26" "27" "29" ...
## $ converged : logi TRUE
## $ boundary : logi FALSE
## $ model :'data.frame': 346 obs. of 4 variables:
## ..$ n_visits: num [1:346] 2 1 1 1 1 1 80 104 55 350 ...
## ..$ gender : Factor w/ 2 levels "male","female": 2 2 1 1 2 1 2 2 1 2 ...
## ..$ income : Factor w/ 4 levels "[$0,$25k]","($25k,$55k]",..: 4 4 4 4 3 1 1 4 2 2 ...
## ..$ travel : Factor w/ 3 levels "[0h,0.25h]","(0.25h,4h]",..: 3 3 2 3 3 1 1 1 2 1 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language n_visits ~ gender + income + travel
## .. .. ..- attr(*, "variables")= language list(n_visits, gender, income, travel)
## .. .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:4] "n_visits" "gender" "income" "travel"
## .. .. .. .. ..$ : chr [1:3] "gender" "income" "travel"
## .. .. ..- attr(*, "term.labels")= chr [1:3] "gender" "income" "travel"
## .. .. ..- attr(*, "order")= int [1:3] 1 1 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: 0x0000029a03797748>
## .. .. ..- attr(*, "predvars")= language list(n_visits, gender, income, travel)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "factor" "factor" "factor"
## .. .. .. ..- attr(*, "names")= chr [1:4] "n_visits" "gender" "income" "travel"
## ..- attr(*, "na.action")= 'omit' Named int [1:64] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..- attr(*, "names")= chr [1:64] "1" "2" "3" "4" ...
## $ na.action : 'omit' Named int [1:64] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr [1:64] "1" "2" "3" "4" ...
## $ call : language glm(formula = formula, family = poisson, data = data)
## $ formula :Class 'formula' language n_visits ~ gender + income + travel
## .. ..- attr(*, ".Environment")=<environment: 0x0000029a03797748>
## $ terms :Classes 'terms', 'formula' language n_visits ~ gender + income + travel
## .. ..- attr(*, "variables")= language list(n_visits, gender, income, travel)
## .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:4] "n_visits" "gender" "income" "travel"
## .. .. .. ..$ : chr [1:3] "gender" "income" "travel"
## .. ..- attr(*, "term.labels")= chr [1:3] "gender" "income" "travel"
## .. ..- attr(*, "order")= int [1:3] 1 1 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: 0x0000029a03797748>
## .. ..- attr(*, "predvars")= language list(n_visits, gender, income, travel)
## .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "factor" "factor" "factor"
## .. .. ..- attr(*, "names")= chr [1:4] "n_visits" "gender" "income" "travel"
## $ data :'data.frame': 410 obs. of 4 variables:
## ..$ n_visits: num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ gender : Factor w/ 2 levels "male","female": 1 1 1 2 1 2 2 2 1 1 ...
## ..$ income : Factor w/ 4 levels "[$0,$25k]","($25k,$55k]",..: 4 2 4 2 4 2 4 4 4 4 ...
## ..$ travel : Factor w/ 3 levels "[0h,0.25h]","(0.25h,4h]",..: NA NA NA NA NA NA NA NA NA NA ...
## $ offset : NULL
## $ control :List of 3
## ..$ epsilon: num 1e-08
## ..$ maxit : num 25
## ..$ trace : logi FALSE
## $ method : chr "glm.fit"
## $ contrasts :List of 3
## ..$ gender: chr "contr.treatment"
## ..$ income: chr "contr.treatment"
## ..$ travel: chr "contr.treatment"
## $ xlevels :List of 3
## ..$ gender: chr [1:2] "male" "female"
## ..$ income: chr [1:4] "[$0,$25k]" "($25k,$55k]" "($55k,$95k]" "($95k,$Inf)"
## ..$ travel: chr [1:3] "[0h,0.25h]" "(0.25h,4h]" "(4h,Infh)"
## - attr(*, "class")= chr [1:2] "glm" "lm"
# Use broom tools to get a list of 3 data frames
list(
# Get model-level values
model = glance(model),
# Get coefficient-level values
coefficients = tidy(model),
# Get observation-level values
observations = augment(model)
)
## $model
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 18850. 345 -6425. 12864. 12891. 11529. 339 346
##
## $coefficients
## # A tibble: 7 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.09 0.0279 146. 0
## 2 genderfemale 0.374 0.0212 17.6 2.18e- 69
## 3 income($25k,$55k] -0.0199 0.0267 -0.746 4.56e- 1
## 4 income($55k,$95k] -0.581 0.0343 -16.9 3.28e- 64
## 5 income($95k,$Inf) -0.578 0.0310 -18.7 6.88e- 78
## 6 travel(0.25h,4h] -0.627 0.0217 -28.8 5.40e-183
## 7 travel(4h,Infh) -2.42 0.0492 -49.3 0
##
## $observations
## # A tibble: 346 × 11
## .rownames n_visits gender income travel .fitted .resid .hat .sigma
## <chr> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 25 2 female ($95k,$Inf) (4h,Inf… 1.46 -1.24 0.0109 5.84
## 2 26 1 female ($95k,$Inf) (4h,Inf… 1.46 -1.92 0.0109 5.84
## 3 27 1 male ($95k,$Inf) (0.25h,… 2.88 -5.28 0.0129 5.83
## 4 29 1 male ($95k,$Inf) (4h,Inf… 1.09 -1.32 0.00711 5.84
## 5 30 1 female ($55k,$95k] (4h,Inf… 1.46 -1.92 0.0121 5.84
## 6 31 1 male [$0,$25k] [0h,0.2… 4.09 -10.4 0.0465 5.81
## 7 33 80 female [$0,$25k] [0h,0.2… 4.46 -0.710 0.0479 5.84
## 8 34 104 female ($95k,$Inf) [0h,0.2… 3.88 6.90 0.0332 5.83
## 9 35 55 male ($25k,$55k] (0.25h,… 3.44 3.85 0.0153 5.84
## 10 36 350 female ($25k,$55k] [0h,0.2… 4.44 21.5 0.0360 5.72
## # ℹ 336 more rows
## # ℹ 2 more variables: .cooksd <dbl>, .std.resid <dbl>
# Wrap this code into a function, groom_model
groom_model <- function(model) {
  list(
    model = glance(model),
    coefficients = tidy(model),
    observations = augment(model)
  )
}
# Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model)
# See these individual variables
mdl; cff; obs
## # A tibble: 1 × 8
## null.deviance df.null logLik AIC BIC deviance df.residual nobs
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 18850. 345 -6425. 12864. 12891. 11529. 339 346
## # A tibble: 7 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4.09 0.0279 146. 0
## 2 genderfemale 0.374 0.0212 17.6 2.18e- 69
## 3 income($25k,$55k] -0.0199 0.0267 -0.746 4.56e- 1
## 4 income($55k,$95k] -0.581 0.0343 -16.9 3.28e- 64
## 5 income($95k,$Inf) -0.578 0.0310 -18.7 6.88e- 78
## 6 travel(0.25h,4h] -0.627 0.0217 -28.8 5.40e-183
## 7 travel(4h,Infh) -2.42 0.0492 -49.3 0
## # A tibble: 346 × 11
## .rownames n_visits gender income travel .fitted .resid .hat .sigma
## <chr> <dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 25 2 female ($95k,$Inf) (4h,Inf… 1.46 -1.24 0.0109 5.84
## 2 26 1 female ($95k,$Inf) (4h,Inf… 1.46 -1.92 0.0109 5.84
## 3 27 1 male ($95k,$Inf) (0.25h,… 2.88 -5.28 0.0129 5.83
## 4 29 1 male ($95k,$Inf) (4h,Inf… 1.09 -1.32 0.00711 5.84
## 5 30 1 female ($55k,$95k] (4h,Inf… 1.46 -1.92 0.0121 5.84
## 6 31 1 male [$0,$25k] [0h,0.2… 4.09 -10.4 0.0465 5.81
## 7 33 80 female [$0,$25k] [0h,0.2… 4.46 -0.710 0.0479 5.84
## 8 34 104 female ($95k,$Inf) [0h,0.2… 3.88 6.90 0.0332 5.83
## 9 35 55 male ($25k,$55k] (0.25h,… 3.44 3.85 0.0153 5.84
## 10 36 350 female ($25k,$55k] [0h,0.2… 4.44 21.5 0.0360 5.72
## # ℹ 336 more rows
## # ℹ 2 more variables: .cooksd <dbl>, .std.resid <dbl>
12.3.2.2 Returning metadata
Sometimes you want to return multiple things from a function, but you want the result to have a particular class (for example, a data frame or a numeric vector), so returning a list isn’t appropriate.
This is common when you have a result plus metadata about the result. (Metadata is “data about the data”. For example, it could be the file a dataset was loaded from)
In that case, you can store the metadata in attributes.
attr(object, "attribute_name") <- attribute_value
pipeable_plot <- function(data, formula) {
  plot(formula, data)
  # Add a "formula" attribute to data
  attr(data, "formula") <- formula
  invisible(data)
}

# From previous exercise
plt_dist_vs_speed <- cars %>%
  pipeable_plot(dist ~ speed)
# Examine the structure of the result
str(plt_dist_vs_speed)
12.3.3 Environments
Environments are a type of variable that is used to store other variables. Mostly, you can think of them as lists.
Every environment has a parent environment (except the empty environment, at the root of the environment tree). This determines which variables R knows about at different places in your code.
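A minimal sketch of that idea (toy variable names): create an environment, store a variable in it, and ask who its parent is.
env <- new.env()
env$x <- 1

ls.str(env)
## x :  num 1

environmentName(parent.env(env))
## [1] "R_GlobalEnv"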
12.3.3.1 Create & explore environments
ls.str(list): list the elements, giving a compact summary of multiple variables at once.
list2env(list): convert the list to an environment.
environmentName(parent.env(env)): find the parent of an environment, and display its name.
Facts about the Republic of South Africa are contained in capitals, national_parks, and population.
# create capitals
<- c("Cape Town", "Bloemfontein", "Pretoria")
city <- c("Legislative", "Judicial", "Administrative")
type_of_capital <- data.frame(city, type_of_capital)
capitals
# create national_parks
<- c(
national_parks "Addo Elephant National Park",
"Agulhas National Park",
"Ai-Ais/Richtersveld Transfrontier Park",
"Augrabies Falls National Park",
"Bontebok National Park",
"Camdeboo National Park",
"Golden Gate Highlands National Park",
"Hluhluwe–Imfolozi Park",
"Karoo National Park",
"Kgalagadi Transfrontier Park",
"Knysna National Lake Area",
"Kruger National Park",
"Mapungubwe National Park",
"Marakele National Park",
"Mokala National Park",
"Mountain Zebra National Park",
"Namaqua National Park",
"Table Mountain National Park",
"Tankwa Karoo National Park",
"Tsitsikamma National Park",
"West Coast National Park",
"Wilderness National Park")
# create population
population <- ts(
  c(40583573, 44819778, 47390900, 51770560, 55908900),
  start = 1996, end = 2016, frequency = 0.2
)
population
## Time Series:
## Start = 1996
## End = 2016
## Frequency = 0.2
## [1] 40583573 44819778 47390900 51770560 55908900
# Add capitals, national_parks, & population to a named list
rsa_lst <- list(
  capitals = capitals,
  national_parks = national_parks,
  population = population
)
# List the structure of each element of rsa_lst
ls.str(rsa_lst)
## capitals : 'data.frame': 3 obs. of 2 variables:
## $ city : chr "Cape Town" "Bloemfontein" "Pretoria"
## $ type_of_capital: chr "Legislative" "Judicial" "Administrative"
## national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
## population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
# Convert the list to an environment
rsa_env <- list2env(rsa_lst)
# List the structure of each variable
ls.str(rsa_env)
## capitals : 'data.frame': 3 obs. of 2 variables:
## $ city : chr "Cape Town" "Bloemfontein" "Pretoria"
## $ type_of_capital: chr "Legislative" "Judicial" "Administrative"
## national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
## population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
# Find the parent environment of rsa_env
parent <- parent.env(rsa_env)
# Print its name
environmentName(parent)
## [1] "R_GlobalEnv"
12.3.3.2 Do variables exist?
If R cannot find a variable in the current environment, it will look in the parent environment, then the grandparent environment, and so on until it finds it.
exists("variable", envir = env)
You can force R to only look in the environment you asked for by setting inherits to FALSE.
exists("variable", envir = env, inherits = FALSE)
# Compare the contents of the global environment and rsa_env
ls.str(globalenv())
## a : num 5
## account_offices : spc_tbl_ [98 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## accounts : tibble [97 × 5] (S3: tbl_df/tbl/data.frame)
## accounts_clean : tibble [88 × 6] (S3: tbl_df/tbl/data.frame)
## airquality : spc_tbl_ [153 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## akl_daily : tibble [3,661 × 10] (S3: tbl_df/tbl/data.frame)
## akl_daily_raw : spc_tbl_ [3,661 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## akl_day : tibble [10,905 × 15] (S3: tbl_df/tbl/data.frame)
## akl_hourly : tibble [17,454 × 13] (S3: tbl_df/tbl/data.frame)
## akl_hourly_raw : spc_tbl_ [17,454 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## all_deals : 'data.frame': 10548 obs. of 2 variables:
## $ product : int 3544 5073 6149 7863 14 10247 3544 5073 6149 7863 ...
## $ num_users: int 19 43 87 83 17 2 29 13 80 23 ...
## all_wars_matrix : num [1:6, 1:2] 461 290 309 474 311 ...
## amir_deals : 'data.frame': 178 obs. of 5 variables:
## $ product : Factor w/ 14 levels "Product A","Product B",..: 6 3 2 9 5 2 3 13 6 2 ...
## $ client : Factor w/ 3 levels "Current","New",..: 1 2 2 1 1 2 1 1 1 1 ...
## $ status : Factor w/ 2 levels "Lost","Won": 2 2 2 2 2 2 2 2 2 2 ...
## $ amount : num 7390 4493 5738 2591 6623 ...
## $ num_users: num 19 43 87 83 17 2 29 13 80 23 ...
## animals_vector : chr [1:4] "Elephant" "Giraffe" "Donkey" "Horse"
## answer_counts : spc_tbl_ [227,452 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## answers : spc_tbl_ [451,118 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## answers_with_tags : spc_tbl_ [800,378 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## april_10_2014 : Date[1:1], format: "2014-04-10"
## astro : Named chr [1:4] "20-Mar-2015" "25-Jun-2015" "23-Sep-2015" "22-Dec-2015"
## astro_dates : Date[1:4], format: "2015-03-20" "2015-06-25" "2015-09-23" "2015-12-22"
## avg_review : num 4.72
## avg_sum : num 22.3
## avg_sum_trimmed : num 22.6
## awards : chr [1:6] "Won 1 Oscar." "Won 1 Oscar. Another 9 wins & 24 nominations." ...
## babynames_fraction : tibble [1,924,665 × 7] (S3: tbl_df/tbl/data.frame)
## babynames_ratios_filtered : gropd_df [541,903 × 8] (S3: grouped_df/tbl_df/tbl/data.frame)
## basics : function (x)
## batman : tibble [10,111 × 13] (S3: tbl_df/tbl/data.frame)
## batman_colors : tibble [56 × 3] (S3: tbl_df/tbl/data.frame)
## batman_parts : tibble [3,455 × 3] (S3: tbl_df/tbl/data.frame)
## batmobile : tibble [173 × 5] (S3: tbl_df/tbl/data.frame)
## batwing : tibble [323 × 5] (S3: tbl_df/tbl/data.frame)
## below_zero : function (x)
## bike_share_rides : spc_tbl_ [35,231 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## bike_share_rides_past : tibble [34,025 × 10] (S3: tbl_df/tbl/data.frame)
## bike_share_rides_unique : tibble [35,229 × 6] (S3: tbl_df/tbl/data.frame)
## blue_range : function (n)
## blues : chr [1:9] "#F7FBFF" "#DEEBF7" "#C6DBEF" "#9ECAE1" "#6BAED6" "#4292C6" ...
## box_office : num [1:6] 461 314 290 248 309 ...
## breaks : num [1:4] 2.27 0 1440 1668
## by_continent : tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
## by_continent_2007 : tibble [5 × 3] (S3: tbl_df/tbl/data.frame)
## by_type_year_tag : tibble [57,331 × 4] (S3: tbl_df/tbl/data.frame)
## by_type_year_tag_filtered : tibble [38 × 4] (S3: tbl_df/tbl/data.frame)
## by_year : tibble [12 × 2] (S3: tbl_df/tbl/data.frame)
## by_year_continent : gropd_df [60 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
## c : int 3
## calc_harmonic_mean : function (x, na.rm = FALSE)
## capitals : 'data.frame': 3 obs. of 2 variables:
## $ city : chr "Cape Town" "Bloemfontein" "Pretoria"
## $ type_of_capital: chr "Legislative" "Judicial" "Administrative"
## cff : tibble [7 × 5] (S3: tbl_df/tbl/data.frame)
## char : chr "u"
## chars : chr [1:40] "r" "'" "s" " " "i" "n" "t" "e" "r" "n" "a" "l" "s" " " "a" "r" ...
## cities : 'data.frame': 5 obs. of 1 variable:
## $ city_actual: chr "new york" "los angeles" "atlanta" "san francisco" ...
## city : chr [1:3] "Cape Town" "Bloemfontein" "Pretoria"
## coin_sides : chr [1:2] "head" "tail"
## color_palette : Named chr [1:42] "#05131D" "#0055BF" "#237841" "#C91A09" "#F2CD37" ...
## colors : spc_tbl_ [262 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## colors_joined : tibble [42 × 10] (S3: tbl_df/tbl/data.frame)
## cols : chr [1:8] "country" "year_1960" "year_1961" "year_1962" "year_1963" ...
## comments : chr [1:5] "I would watch it again" "Amazing!" "I liked it" ...
## counties : spc_tbl_ [3,138 × 40] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## counties_selected : tibble [3,138 × 5] (S3: tbl_df/tbl/data.frame)
## countries : 'data.frame': 209 obs. of 1 variable:
## $ country: chr "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## cut_by_quantile : function (x, n = 5, na.rm = FALSE, labels = NULL, interval_type = c("(lo, hi]",
## "[lo, hi)"))
## da2 : Ord.factor w/ 3 levels "slow"<"medium"<..: 1
## da5 : Ord.factor w/ 3 levels "slow"<"medium"<..: 3
## datacamp_light_blue : chr "#51A8C9"
## date_difference : Formal class 'Interval' [package "lubridate"] with 3 slots
## date_landing : Date[1:1], format: "1969-07-20"
## date_stamp : function (x, locale = "C")
## date1 : Date[1:1], format: "1996-05-23"
## date2 : Date[1:1], format: "2012-03-15"
## date3 : Date[1:1], format: "2006-01-30"
## dates : chr [1:17454] "2016-01-01T00:00:00Z" "2016-01-01T00:30:00Z" ...
## day_diff : 'difftime' num [1:4] 2 5 6 5
## day1 : Date[1:1], format: "2023-09-15"
## day2 : Date[1:1], format: "2023-09-17"
## day3 : Date[1:1], format: "2023-09-22"
## day4 : Date[1:1], format: "2023-09-28"
## day5 : Date[1:1], format: "2023-10-03"
## days_vector : chr [1:5] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday"
## deals : int [1:52] 1 1 0 0 0 0 1 1 1 1 ...
## departed_list : List of 4
## $ title : chr "The Departed"
## $ actors : chr [1:6] "Leonardo DiCaprio" "Matt Damon" "Jack Nicholson" "Mark Wahlberg" ...
## $ reviews:'data.frame': 5 obs. of 2 variables:
## $ average: num 4.72
## dest_size : chr [1:4] "Small" "Medium" "Large" "Hub"
## dest_sizes : 'data.frame': 4 obs. of 2 variables:
## $ dest_size : chr "Small" "Medium" "Large" "Hub"
## $ passengers_per_day: chr "0-20K" "20K-70K" "70K-100K" "100K+"
## diameter : num [1:8] 0.382 0.949 1 0.532 11.209 ...
## dims : int [1:2, 1:3] 209 8 209 9 209 38
## eclipse_2017 : POSIXct[1:1], format: "2017-08-21 18:26:40"
## emails : chr [1:6] "john.doe@ivyleague.edu" "education@world.gov" ...
## emissions_by_country : tibble [130 × 2] (S3: tbl_df/tbl/data.frame)
## empire_strikes : num [1:2] 290 248
## errors : num [1:6] 1.9 -2.6 4 -9.5 -3.4 7.3
## europe_categories : chr [1:3] "EU" "eur" "Europ"
## every_two_weeks : Formal class 'Period' [package "lubridate"] with 6 slots
## expected_val : num 2.9
## extremes : function (x)
## extremes_avg : function (x)
## fac : List of 3
## $ levels : chr [1:3] "Beef" "Meat" "Poultry"
## $ ordered : logi FALSE
## $ include_na: logi FALSE
## facebook : List of 7
## $ : num 17
## $ : num 7
## $ : num 5
## $ : num 16
## $ : num 8
## $ : num 13
## $ : num 14
## factor_animals_vector : Factor w/ 4 levels "Donkey","Elephant",..: 2 3 1 4
## factor_sex_vector : Factor w/ 2 levels "Female","Male": 2 1 1 2 2
## factor_speed_vector : Ord.factor w/ 3 levels "slow"<"medium"<..: 2 1 1 2 3
## factor_survey_vector : Factor w/ 2 levels "Female","Male": 2 1 1 2 2
## factor_temperature_vector : Ord.factor w/ 3 levels "Low"<"Medium"<..: 3 1 3 1 2
## fb : num 9
## fb_vec : num [1:7] 17 7 5 16 8 13 14
## finished : chr "I finished 'Dates and Times in R' on Thursday, September 4, 2017!"
## fish : chr [1:3] "fish.species" "fish.year" "fish.tidy"
## fish.species : 'data.frame': 61 obs. of 8 variables:
## $ Year : int 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
## $ Pink : int 100600 259000 132600 235900 123400 244400 203400 270119 200798 200085 ...
## $ Chum : int 139300 155900 113800 99800 148700 143700 158480 125377 132407 113114 ...
## $ Sockeye : int 64100 51200 58200 66100 83800 72000 84800 69676 100520 62472 ...
## $ Coho : int 30500 40900 33600 32400 38300 45100 40000 39900 39200 32865 ...
## $ Rainbow : int 0 100 100 100 100 100 100 100 100 100 ...
## $ Chinook : int 23200 25500 24900 25300 24500 27700 25300 21200 20900 20335 ...
## $ Atlantic: int 10800 9701 9800 8800 9600 7800 8100 9000 8801 8700 ...
## fish.tidy : 'data.frame': 427 obs. of 3 variables:
## $ Species: Factor w/ 7 levels "Pink","Chum",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Year : int 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
## $ Capture: int 100600 259000 132600 235900 123400 244400 203400 270119 200798 200085 ...
## fish.year : 'data.frame': 7 obs. of 62 variables:
## $ Species: Factor w/ 7 levels "Pink","Chum",..: 1 2 3 4 5 6 7
## $ 1950 : int 100600 139300 64100 30500 0 23200 10800
## $ 1951 : int 259000 155900 51200 40900 100 25500 9701
## $ 1952 : int 132600 113800 58200 33600 100 24900 9800
## $ 1953 : int 235900 99800 66100 32400 100 25300 8800
## $ 1954 : int 123400 148700 83800 38300 100 24500 9600
## $ 1955 : int 244400 143700 72000 45100 100 27700 7800
## $ 1956 : int 203400 158480 84800 40000 100 25300 8100
## $ 1957 : int 270119 125377 69676 39900 100 21200 9000
## $ 1958 : int 200798 132407 100520 39200 100 20900 8801
## $ 1959 : int 200085 113114 62472 32865 100 20335 8700
## $ 1960 : int 111880 120760 83171 26898 100 17412 8900
## $ 1961 : int 178133 109053 93311 37524 200 18130 9400
## $ 1962 : int 163558 110707 66126 39883 300 17489 11500
## $ 1963 : int 222477 106491 53140 41547 1600 18928 10766
## $ 1964 : int 143623 94498 56189 38948 2300 20932 14342
## $ 1965 : int 161131 76503 94286 44207 1500 21197 12363
## $ 1966 : int 174844 93949 81080 46874 900 21549 12337
## $ 1967 : int 168960 88027 75582 43396 1700 21375 14411
## $ 1968 : int 155729 95624 113332 46463 11200 20720 12823
## $ 1969 : int 195169 67888 65106 30973 3000 21915 13316
## $ 1970 : int 133466 113650 106436 44050 4400 24000 13457
## $ 1971 : int 178823 105879 79160 43110 4600 25800 12397
## $ 1972 : int 93852 137655 43932 33880 4900 24000 12665
## $ 1973 : int 151180 125364 55637 38550 6900 27400 15387
## $ 1974 : int 94026 121052 53027 42631 8099 24148 15158
## $ 1975 : int 171012 127561 38615 31782 10302 24137 14736
## $ 1976 : int 146895 125110 60357 36965 9492 24984 11372
## $ 1977 : int 224475 118677 64651 30777 6166 26351 11491
## $ 1978 : int 174314 129513 75737 30712 7219 25417 8948
## $ 1979 : int 249558 150176 109439 34279 7036 25452 10259
## $ 1980 : int 226191 166803 111830 30557 9027 23023 12965
## $ 1981 : int 264855 186550 132651 28702 8819 22789 12425
## $ 1982 : int 170373 182561 128176 42281 6839 25147 10326
## $ 1983 : int 255129 196414 163790 30291 7473 18926 10862
## $ 1984 : int 210511 210507 126834 40834 7977 18449 10562
## $ 1985 : int 300987 267794 150860 38823 9373 20318 12422
## $ 1986 : int 211685 239065 136452 42825 9709 21187 11739
## $ 1987 : int 218121 217129 131135 28173 12196 25475 11413
## $ 1988 : int 165236 286572 107361 32245 11872 31338 10093
## $ 1989 : int 363456 243728 169410 31614 12249 20725 15795
## $ 1990 : int 235190 299438 198168 34635 12529 18249 10860
## $ 1991 : int 438998 266940 161218 38078 12545 15373 9793
## $ 1992 : int 216110 238427 199709 38433 9731 15093 9308
## $ 1993 : int 302568 286996 242613 25451 19091 15036 8161
## $ 1994 : int 326065 328765 183686 45602 12624 13623 7231
## $ 1995 : int 394735 424595 189612 28922 13450 13801 6819
## $ 1996 : int 294915 411395 188584 28201 12407 10509 6493
## $ 1997 : int 318717 347560 132075 13191 13147 13000 5579
## $ 1998 : int 371552 311965 78972 19386 18940 9840 4744
## $ 1999 : int 386928 281260 130128 15449 15874 8735 4107
## $ 2000 : int 285338 276355 124782 18035 14918 8437 4710
## $ 2001 : int 360973 307662 108618 20006 14927 8771 4772
## $ 2002 : int 268544 314098 103325 20757 12241 13911 4139
## $ 2003 : int 377749 360429 109822 16995 17128 15046 3648
## $ 2004 : int 266554 351188 142385 24546 16601 15899 4082
## $ 2005 : int 456350 318389 147151 18791 16886 13571 3727
## $ 2006 : int 316205 361561 151523 18275 17079 10497 3087
## $ 2007 : int 506343 331266 164609 17228 14844 8900 3014
## $ 2008 : int 294876 295819 138896 21280 16819 6443 3002
## $ 2009 : int 591654 359908 150040 19725 17884 6303 2373
## $ 2010 : int 370044 143959 170972 20770 20754 7968 2363
## fodors : 'data.frame': 533 obs. of 7 variables:
## $ id : int 0 1 2 3 4 5 6 7 8 9 ...
## $ name : chr "arnie morton's of chicago" "art's delicatessen" "hotel bel-air" "cafe bizou" ...
## $ addr : chr "435 s. la cienega blv ." "12224 ventura blvd." "701 stone canyon rd." "14016 ventura blvd." ...
## $ city : Factor w/ 5 levels "atlanta","los angeles",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ phone: chr "310-246-1501" "818-762-1221" "310-472-1211" "818-788-3536" ...
## $ type : chr "american" "american" "californian" "french" ...
## $ class: int 0 1 2 3 4 5 6 7 8 9 ...
## food_consumption : spc_tbl_ [1,430 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## formats : chr [1:2] "%Y-%m-%d" "%B %d, %Y"
## freezing_l : List of 7
## $ : num -1
## $ : num(0)
## $ : num [1:2] -1 -3
## $ : num -2
## $ : num(0)
## $ : num -3
## $ : num(0)
## freezing_s : List of 7
## $ : num -1
## $ : num(0)
## $ : num [1:2] -1 -3
## $ : num -2
## $ : num(0)
## $ : num -3
## $ : num(0)
## game2 : POSIXct[1:1], format: "2015-06-11 19:00:00"
## game2_local : POSIXct[1:1], format: "2015-06-11 19:00:00"
## game3 : POSIXct[1:1], format: "2015-06-15 18:30:00"
## game3_local : POSIXct[1:1], format: "2015-06-15 18:30:00"
## gapminder_1952 : tibble [142 × 6] (S3: tbl_df/tbl/data.frame)
## get_reciprocal : function (x)
## global_mean : num 67
## gm2007 : spc_tbl_ [20 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## gm2007_full : spc_tbl_ [142 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## groom_model : function (model)
## group_id : chr [1:10] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
## group_size : num [1:10] 2 4 6 2 2 2 3 2 4 2
## growth_by_dose : List of 9
## $ data : spc_tbl_ [60 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ layers :List of 2
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme :List of 97
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## halleys : tibble [27 × 7] (S3: tbl_df/tbl/data.frame)
## halleys_1066 : tibble [1 × 7] (S3: tbl_df/tbl/data.frame)
## hello : function ()
## hits : int [1:2] 1 5
## hits_w : int [1:4] 1 2 4 5
## hotdogs : 'data.frame': 54 obs. of 3 variables:
## $ type : chr "Beef" "Beef" "Beef" "Beef" ...
## $ calories: int 186 181 176 149 184 190 158 139 175 148 ...
## $ sodium : int 495 477 425 322 482 587 370 322 479 375 ...
## hotdogs_factor : spc_tbl_ [54 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## hotdogs2 : 'data.frame': 54 obs. of 2 variables:
## $ type : Factor w/ 3 levels "Beef","Meat",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ sodium: num 495 477 425 322 482 587 370 322 479 375 ...
## i : int 6
## index2 : int 7
## int : list()
## interpret : function (num_views)
## interpret_all : function (views, return_sum = T)
## inventories : spc_tbl_ [37,233 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## inventory_parts : spc_tbl_ [1,179,869 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## inventory_parts_joined : tibble [1,179,869 × 6] (S3: tbl_df/tbl/data.frame)
## inventory_parts_themes : tibble [1,113,729 × 13] (S3: tbl_df/tbl/data.frame)
## inventory_sets_themes : tibble [1,113,729 × 13] (S3: tbl_df/tbl/data.frame)
## inventory_version_1 : spc_tbl_ [35,612 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## iqr : Named num 664
## is_leap_year : function (year)
## jan_31 : Date[1:1], format: "2023-01-31"
## last : num 14
## last_release : spc_tbl_ [1 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## last_release_date : Date[1:1], format: "2017-06-30"
## last2 : num [1:2] 17 14
## li : num 17
## li_vec : num [1:7] 16 9 13 5 2 17 14
## lily : 'data.frame': 1 obs. of 3 variables:
## $ type : chr "Poultry"
## $ calories: int 86
## $ sodium : int 358
## linkedin : List of 7
## $ : num 16
## $ : num 9
## $ : num 13
## $ : num 5
## $ : num 2
## $ : num 17
## $ : num 14
## login : POSIXct[1:5], format: "2023-09-19 10:18:04" "2023-09-24 09:14:18" "2023-09-24 12:21:51" ...
## logout : POSIXct[1:5], format: "2023-09-19 10:56:29" "2023-09-24 09:14:52" "2023-09-24 12:35:48" ...
## logs : spc_tbl_ [100,000 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## lower : Named num -550
## mar_11 : POSIXct[1:1], format: "2017-03-11 12:00:00"
## mar_12 : POSIXct[1:1], format: "2017-03-12 12:00:00"
## mar_13 : POSIXct[1:1], format: "2017-03-13 12:00:00"
## matrix2_rowname : chr [1:3] "The Phantom Menace" "Attack of the Clones" ...
## max : num 30
## mdl : tibble [1 × 8] (S3: tbl_df/tbl/data.frame)
## medium : chr "LinkedIn"
## meteo : Named chr [1:4] "March 1, 15" "June 1, 15" "September 1, 15" ...
## meteo_dates : Date[1:4], format: "2015-03-01" "2015-06-01" "2015-09-01" "2015-12-01"
## millennium_falcon : tibble [239 × 6] (S3: tbl_df/tbl/data.frame)
## millennium_falcon_colors : tibble [19 × 2] (S3: tbl_df/tbl/data.frame)
## min : num 0
## model : List of 31
## $ coefficients : Named num [1:7] 4.0864 0.374 -0.0199 -0.5807 -0.5782 ...
## $ residuals : Named num [1:346] -0.535 -0.768 -0.944 -0.662 -0.767 ...
## $ fitted.values : Named num [1:346] 4.3 4.3 17.83 2.96 4.29 ...
## $ effects : Named num [1:346] -360 -29.2 20.3 -10 23.4 ...
## $ R : num [1:7, 1:7] -97.4 0 0 0 0 ...
## $ rank : int 7
## $ qr :List of 5
## $ family :List of 13
## $ linear.predictors: Named num [1:346] 1.46 1.46 2.88 1.09 1.46 ...
## $ deviance : num 11529
## $ aic : num 12864
## $ null.deviance : num 18850
## $ iter : int 6
## $ weights : Named num [1:346] 4.3 4.3 17.83 2.96 4.29 ...
## $ prior.weights : Named num [1:346] 1 1 1 1 1 1 1 1 1 1 ...
## $ df.residual : int 339
## $ df.null : int 345
## $ y : Named num [1:346] 2 1 1 1 1 1 80 104 55 350 ...
## $ converged : logi TRUE
## $ boundary : logi FALSE
## $ model :'data.frame': 346 obs. of 4 variables:
## $ na.action : 'omit' Named int [1:64] 1 2 3 4 5 6 7 8 9 10 ...
## $ call : language glm(formula = formula, family = poisson, data = data)
## $ formula :Class 'formula' language n_visits ~ gender + income + travel
## $ terms :Classes 'terms', 'formula' language n_visits ~ gender + income + travel
## $ data :'data.frame': 410 obs. of 4 variables:
## $ offset : NULL
## $ control :List of 3
## $ method : chr "glm.fit"
## $ contrasts :List of 3
## $ xlevels :List of 3
## moment_step : POSIXct[1:1], format: "1969-07-20 02:56:15"
## mon_2pm : POSIXct[1:1], format: "2018-08-27 14:00:00"
## monarchs : tibble [131 × 7] (S3: tbl_df/tbl/data.frame)
## month_seq : Formal class 'Period' [package "lubridate"] with 6 slots
## movie_actors : chr [1:6] "Leonardo DiCaprio" "Matt Damon" "Jack Nicholson" ...
## movie_title : chr "The Departed"
## mtcars : 'data.frame': 32 obs. of 16 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : num 4 4 1 1 2 1 4 2 2 4 ...
## $ fcyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ fam : Factor w/ 2 levels "manual","automatic": 1 1 1 2 2 2 2 2 2 2 ...
## $ car : chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## $ fcyl_fam: Factor w/ 6 levels "4:0","6:0","8:0",..: 5 5 4 2 3 2 3 1 1 2 ...
## $ fvs : Factor w/ 2 levels "V-shaped","straight": 1 1 2 2 1 2 1 2 2 2 ...
## mtcars_by_cyl : tibble [3 × 5] (S3: tbl_df/tbl/data.frame)
## mtcars_by_cyl1 : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
## mtcars_by_cyl2 : 'data.frame': 3 obs. of 3 variables:
## $ cyl : num 4 6 8
## $ n_wt: int 11 7 14
## $ prop: num 0.344 0.219 0.438
## my_blue : chr "#4ABEFF"
## my_book : Formal class 'workbook' [package "XLConnect"] with 2 slots
## my_character : chr "universe"
## my_df : 'data.frame': 10 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
## $ cyl : num 6 6 4 6 8 6 8 4 4 6
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1
## $ am : num 1 1 1 0 0 0 0 0 0 0
## $ gear: num 4 4 4 3 3 3 3 4 4 4
## $ carb: num 4 4 1 1 2 1 4 2 2 4
## my_list : List of 3
## $ vec: int [1:10] 1 2 3 4 5 6 7 8 9 10
## $ mat: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
## $ df :'data.frame': 10 obs. of 11 variables:
## my_logical : logi FALSE
## my_matrix : int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
## my_numeric : num 42
## my_vector : int [1:10] 1 2 3 4 5 6 7 8 9 10
## n_flips : num 10
## n_visits : num [1:410] 0 0 0 0 0 0 0 0 0 0 ...
## name : chr [1:8] "Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" ...
## names : List of 4
## $ : chr "gauss"
## $ : chr "bayes"
## $ : chr "pascal"
## $ : chr "pearson"
## names_ad : chr [1:4] "gauss" "bayes" "pascal" "pearson"
## names_an : chr [1:4] "gauss" "bayes" "pascal" "pearson"
## names_filtered : tibble [414 × 8] (S3: tbl_df/tbl/data.frame)
## names_normalized : tibble [1,924,665 × 8] (S3: tbl_df/tbl/data.frame)
## national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
## new_hope : num [1:2] 461 314
## new_mean : num 6000
## new_sales : 'data.frame': 36 obs. of 2 variables:
## $ sale_num: int 1 2 3 4 5 6 7 8 9 10 ...
## $ amount : num 3732 5735 5340 1180 5797 ...
## new_sd : num 2600
## non_us_all : Named num [1:6] 314 248 166 552 339 ...
## non_us_some : Named num [1:2] 314 248
## now : POSIXct[1:1], format: "2024-02-12 15:34:01"
## num : num 2500
## num_chars : int [1:6] 8 5 6 5 14 9
## num_views : num 14
## number : num 2500
## nyc : List of 3
## $ pop : num 8405837
## $ boroughs: chr [1:5] "Manhattan" "Bronx" "Brooklyn" "Queens" ...
## $ capital : logi FALSE
## nyc_ind : int 3
## nyc_val : logi FALSE
## obs : tibble [346 × 11] (S3: tbl_df/tbl/data.frame)
## oceania_1952 : tibble [2 × 6] (S3: tbl_df/tbl/data.frame)
## p_head : num 0.8
## p_wt_vs_fcyl_by_fam : List of 9
## $ data :'data.frame': 32 obs. of 13 variables:
## $ layers : list()
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: NULL
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## p_wt_vs_fcyl_by_fam_jit : List of 9
## $ data :'data.frame': 32 obs. of 13 variables:
## $ layers :List of 1
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## pairs : Classes 'pairs', 'data.table' and 'data.frame': 40532 obs. of 5 variables:
## $ .x : int 125 125 125 125 125 125 125 125 125 125 ...
## $ .y : int 75 76 77 78 79 80 81 82 83 84 ...
## $ name : num 0 0 0.489 0.381 0.281 ...
## $ addr : num 0.562 0.56 0.537 0.553 0.507 ...
## $ weights: num -1.3486 -1.3558 0.0967 -0.1716 -0.6351 ...
## palette : chr [1:2] "#D7191C" "#2C7BB6"
## parent : <environment: R_GlobalEnv>
## part_categories : spc_tbl_ [66 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## parts : spc_tbl_ [52,600 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## parts_joined : tibble [9,964 × 4] (S3: tbl_df/tbl/data.frame)
## passengers_per_day : chr [1:4] "0-20K" "20K-70K" "70K-100K" "100K+"
## path : chr "data/hotdogs.txt"
## phone_no_parens : chr [1:2809] "858 990 5153" "731-813-2043" "563-732-6802" "145 725 4021" ...
## pioneers : chr [1:4] "GAUSS:1777" "BAYES:1702" "PASCAL:1623" "PEARSON:1857"
## pizza : Date[1:5], format: "2023-09-15" "2023-09-17" "2023-09-22" "2023-09-28" "2023-10-03"
## planets_df : 'data.frame': 8 obs. of 5 variables:
## $ name : chr "Mercury" "Venus" "Earth" "Mars" ...
## $ type : chr "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" ...
## $ diameter: num 0.382 0.949 1 0.532 11.209 ...
## $ rotation: num 58.64 -243.02 1 1.03 0.41 ...
## $ rings : logi FALSE FALSE FALSE FALSE TRUE TRUE ...
## plt_country_vs_lifeExp : List of 9
## $ data : spc_tbl_ [20 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ layers :List of 3
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 8
## plt_mpg_vs_fcyl_by_fam : List of 9
## $ data :'data.frame': 32 obs. of 13 variables:
## $ layers : list()
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: NULL
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## plt_mpg_vs_wt : List of 9
## $ data :'data.frame': 32 obs. of 13 variables:
## $ layers : list()
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: NULL
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 2
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 2
## plt_mpg_vs_wt_by_cyl : List of 9
## $ data :'data.frame': 32 obs. of 13 variables:
## $ layers :List of 1
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme :List of 3
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## plt_price_vs_carat : List of 9
## $ data : tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ layers : list()
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: NULL
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 2
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 2
## plt_price_vs_carat_by_clarity : List of 9
## $ data : tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ layers : list()
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: NULL
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 3
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 3
## plt_price_vs_carat_transparent : List of 9
## $ data : tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ layers :List of 1
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 2
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 2
## plt_prop_unemployed_over_time : List of 9
## $ data : spc_tbl_ [574 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ layers :List of 1
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 2
## $ theme :List of 1
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 4
## poker_midweek : Named num [1:3] -50 20 -120
## poker_midweek_name : Named num [1:3] -50 20 -120
## poker_two : Named num [1:2] 140 240
## poker_two_name : Named num [1:2] 140 240
## poker_vector : Named num [1:5] 140 -50 20 -120 240
## poker_wednesday : Named num 20
## poker_winning_days : Named num [1:3] 140 20 240
## pools : 'data.frame': 20 obs. of 4 variables:
## $ Name : chr "Acacia Ridge Leisure Centre" "Bellbowrie Pool" "Carole Park" "Centenary Pool (inner City)" ...
## $ Address : chr "1391 Beaudesert Road, Acacia Ridge" "Sugarwood Street, Bellbowrie" "Cnr Boundary Road and Waterford Road Wacol" "400 Gregory Terrace, Spring Hill" ...
## $ Latitude : num -27.6 -27.6 -27.6 -27.5 -27.4 ...
## $ Longitude: num 153 153 153 153 153 ...
## pop_1 : tibble [209 × 8] (S3: tbl_df/tbl/data.frame)
## pop_2 : tibble [209 × 9] (S3: tbl_df/tbl/data.frame)
## pop_3 : tibble [209 × 38] (S3: tbl_df/tbl/data.frame)
## pop_a : tibble [209 × 8] (S3: tbl_df/tbl/data.frame)
## pop_b : tibble [209 × 8] (S3: tbl_df/tbl/data.frame)
## pop_list : List of 3
## $ : tibble [209 × 8] (S3: tbl_df/tbl/data.frame)
## $ : tibble [209 × 9] (S3: tbl_df/tbl/data.frame)
## $ : tibble [209 × 38] (S3: tbl_df/tbl/data.frame)
## population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
## positions : int [1:8] 1 4 2 3 8 7 6 5
## posn_d : Classes 'PositionDodge', 'Position', 'ggproto', 'gg' <ggproto object: Class PositionDodge, Position, gg>
## compute_layer: function
## compute_panel: function
## preserve: total
## required_aes:
## setup_data: function
## setup_params: function
## width: 0.9
## super: <ggproto object: Class PositionDodge, Position, gg>
## posn_j : Classes 'PositionJitter', 'Position', 'ggproto', 'gg' <ggproto object: Class PositionJitter, Position, gg>
## compute_layer: function
## compute_panel: function
## height: NULL
## required_aes: x y
## seed: NA
## setup_data: function
## setup_params: function
## width: 0.2
## super: <ggproto object: Class PositionJitter, Position, gg>
## posn_jd : Classes 'PositionJitterdodge', 'Position', 'ggproto', 'gg' <ggproto object: Class PositionJitterdodge, Position, gg>
## compute_layer: function
## compute_panel: function
## dodge.width: 0.1
## jitter.height: 0
## jitter.width: 0.2
## required_aes: x y
## seed: 1840911448
## setup_data: function
## setup_params: function
## super: <ggproto object: Class PositionJitterdodge, Position, gg>
## posts_with_tags : tibble [1,546,174 × 7] (S3: tbl_df/tbl/data.frame)
## potatoes : Classes 'data.table' and 'data.frame': 160 obs. of 8 variables:
## $ area : int 1 1 1 1 1 1 1 1 1 1 ...
## $ temp : int 1 1 1 1 1 1 1 1 1 1 ...
## $ size : int 1 1 1 1 1 1 1 1 1 1 ...
## $ storage : int 1 1 1 1 1 2 2 2 2 2 ...
## $ method : int 1 2 3 4 5 1 2 3 4 5 ...
## $ texture : num 2.9 2.3 2.5 2.1 1.9 1.8 2.6 3 2.2 2 ...
## $ flavor : num 3.2 2.5 2.8 2.9 2.8 3 3.1 3 3.2 2.8 ...
## $ moistness: num 3 2.6 2.8 2.4 2.2 1.7 2.4 2.9 2.5 1.9 ...
## potatoes_char : spc_tbl_ [160 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## potatoes_fragment : spc_tbl_ [5 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## potatoes_fread : Classes 'data.table' and 'data.frame': 160 obs. of 2 variables:
## $ texture : num 2.9 2.3 2.5 2.1 1.9 1.8 2.6 3 2.2 2 ...
## $ moistness: num 3 2.6 2.8 2.4 2.2 1.7 2.4 2.9 2.5 1.9 ...
## pow_two : function (x, print_info = T)
## print_info : function (x)
## prob_between_10_and_20 : num 0.333
## prob_greater_than_5 : num 0.833
## prob_less_than_5 : num 0.167
## properties : chr [1:8] "area" "temp" "size" "storage" "method" "texture" "flavor" ...
## q1 : Named num 447
## q3 : Named num 1111
## question_answer_counts : spc_tbl_ [394,732 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## question_tags : spc_tbl_ [572,153 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## questions : spc_tbl_ [394,732 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## questions_with_tags : spc_tbl_ [745,796 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## r : int 3
## r_3_4_1 : POSIXct[1:1], format: "2016-05-03 07:13:28"
## rainy_days : gropd_df [366 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
## rcount : num 5
## red_brewer_palette : chr [1:9] "#FFF5F0" "#FEE0D2" "#FCBBA1" "#FC9272" "#FB6A4A" "#EF3B2C" ...
## region : chr [1:2] "US" "non-US"
## release_time : POSIXct[1:105], format: "1997-12-04 08:47:58" "1997-12-21 13:09:22" "1998-01-10 00:31:55" ...
## releases : spc_tbl_ [105 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## restaurant_groups : 'data.frame': 10 obs. of 2 variables:
## $ group_id : chr "A" "B" "C" "D" ...
## $ group_size: num 2 4 6 2 2 2 3 2 4 2
## result : chr "large"
## return_jedi : num [1:2] 309 166
## reviews_df : 'data.frame': 5 obs. of 2 variables:
## $ scores : num 4.2 4.6 4.8 5 5
## $ comments: chr "Fascinating plot" "I would watch it again" "I liked it" "Amazing!" ...
## rings : logi [1:8] FALSE FALSE FALSE FALSE TRUE TRUE ...
## rings_vector : logi [1:8] FALSE FALSE FALSE FALSE TRUE TRUE ...
## rotation : num [1:8] 58.64 -243.02 1 1.03 0.41 ...
## roulette_vector : Named num [1:5] -24 -50 100 -350 10
## rquote : chr "r's internals are irrefutably intriguing"
## rsa_env : <environment: 0x0000029a46965448>
## rsa_lst : List of 3
## $ capitals :'data.frame': 3 obs. of 2 variables:
## $ national_parks: chr [1:22] "Addo Elephant National Park" "Agulhas National Park" "Ai-Ais/Richtersveld Transfrontier Park" "Augrabies Falls National Park" ...
## $ population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
## run_poisson_regression : function (data, formula)
## sample_means : num [1:30] 49.2 43 26.6 45.3 41 ...
## samples : 'data.frame': 100 obs. of 1 variable:
## $ mean: num 39.6 42.4 27.5 46.2 37.4 ...
## saros : Formal class 'Duration' [package "lubridate"] with 1 slot
## scores : num [1:5] 4.6 5 4.8 5 4.2
## secondary_y_axis : Classes 'AxisSecondary', 'ggproto', 'gg' <ggproto object: Class AxisSecondary, gg>
## axis: NULL
## break_info: function
## breaks: 59 68 77 86 95 104
## create_scale: function
## detail: 1000
## empty: function
## guide: waiver
## init: function
## labels: 15 20 25 30 35 40
## make_title: function
## mono_test: function
## name: Celsius
## trans: function
## transform_range: function
## super: <ggproto object: Class AxisSecondary, gg>
## select_el : function (x, index)
## select_first : function (x)
## select_second : function (x)
## selected_names : tibble [281 × 5] (S3: tbl_df/tbl/data.frame)
## selection : 'data.frame': 209 obs. of 4 variables:
## $ country: chr "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## $ X1968 : num 1182159 639964 5017299 17996 16727 ...
## $ X1969 : num 1248901 658853 5219332 18619 18088 ...
## $ X1970 : num 1319849 677839 5429743 19206 19529 ...
## selection_vector : Named logi [1:5] TRUE FALSE TRUE FALSE TRUE
## sep_10_2009 : chr [1:4] "September 10 2009" "2009-09-10" "10 Sep 2009" "09-10-2009"
## seq1 : num [1:167] 1 4 7 10 13 16 19 22 25 28 ...
## seq2 : num [1:43] 1200 1193 1186 1179 1172 ...
## sets : spc_tbl_ [21,857 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## sex_vector : chr [1:5] "Male" "Female" "Female" "Male" "Male"
## sfo_survey : tibble [2,809 × 6] (S3: tbl_df/tbl/data.frame)
## sheets : chr [1:3] "1960-1966" "1967-1974" "1975-2011"
## short_dates : chr [1:3] "11 December 1282" "May 1372" "1253"
## size_distribution : 'data.frame': 4 obs. of 3 variables:
## $ group_size : num 2 3 4 6
## $ n : int 6 1 2 1
## $ probability: num 0.6 0.1 0.2 0.1
## sms : num 24
## snake_river_visits : 'data.frame': 410 obs. of 4 variables:
## $ n_visits: num 0 0 0 0 0 0 0 0 0 0 ...
## $ gender : Factor w/ 2 levels "male","female": 1 1 1 2 1 2 2 2 1 1 ...
## $ income : Factor w/ 4 levels "[$0,$25k]","($25k,$55k]",..: 4 2 4 2 4 2 4 4 4 4 ...
## $ travel : Factor w/ 3 levels "[0h,0.25h]","(0.25h,4h]",..: NA NA NA NA NA NA NA NA NA NA ...
## social_vec : num [1:14] 16 9 13 5 2 17 14 17 7 5 ...
## speed : num 88
## speed_vector : chr [1:5] "medium" "slow" "slow" "medium" "fast"
## split_low : List of 4
## $ : chr [1:2] "gauss" "1777"
## $ : chr [1:2] "bayes" "1702"
## $ : chr [1:2] "pascal" "1623"
## $ : chr [1:2] "pearson" "1857"
## split_math : List of 4
## $ : chr [1:2] "GAUSS" "1777"
## $ : chr [1:2] "BAYES" "1702"
## $ : chr [1:2] "PASCAL" "1623"
## $ : chr [1:2] "PEARSON" "1857"
## star_destroyer : tibble [293 × 6] (S3: tbl_df/tbl/data.frame)
## star_destroyer_colors : tibble [16 × 2] (S3: tbl_df/tbl/data.frame)
## star_wars : tibble [80,843 × 13] (S3: tbl_df/tbl/data.frame)
## star_wars_colors : tibble [81 × 3] (S3: tbl_df/tbl/data.frame)
## star_wars_matrix : num [1:3, 1:2] 461 290 309 314 248 ...
## star_wars_matrix_dim : num [1:3, 1:2] 461 290 309 314 248 ...
## star_wars_matrix2 : num [1:3, 1:2] 474 311 380 552 339 ...
## star_wars_parts : tibble [9,081 × 3] (S3: tbl_df/tbl/data.frame)
## std_and_poor500 : 'data.frame': 505 obs. of 5 variables:
## $ symbol : chr "MMM" "ABT" "ABBV" "ABMD" ...
## $ company : chr "3M Company" "Abbott Laboratories" "AbbVie Inc." "ABIOMED Inc" ...
## $ sector : chr "Industrials" "Health Care" "Health Care" "Health Care" ...
## $ industry: chr "Industrial Conglomerates" "Health Care Equipment" "Pharmaceuticals" "Health Care Equipment" ...
## $ pe_ratio: num 18.3 57.7 22.4 45.6 27 ...
## str1 : chr "May 23, '96 hours:23 minutes:01 seconds:45"
## str2 : chr "2012-3-12 14:23:08"
## str3 : chr "30/January/2006"
## sum_abs : function (b, c)
## summ : 'data.frame': 3 obs. of 3 variables:
## $ sheets: chr "1960-1966" "1967-1974" "1975-2011"
## $ nrows : int 209 209 209
## $ ncols : int 8 9 38
## sun_plot : List of 9
## $ data :'data.frame': 3177 obs. of 2 variables:
## $ layers :List of 1
## $ scales :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: list
## super: <ggproto object: Class ScalesList, gg>
## $ mapping :List of 2
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
## aspect: function
## backtransform_range: function
## clip: on
## default: TRUE
## distance: function
## expand: TRUE
## is_free: function
## is_linear: function
## labels: function
## limits: list
## modify_scales: function
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## setup_data: function
## setup_layout: function
## setup_panel_guides: function
## setup_panel_params: function
## setup_params: function
## train_panel_guides: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord, gg>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels :List of 2
## sunspots.m : 'data.frame': 3177 obs. of 2 variables:
## $ year : num 1749 1749 1749 1749 1749 ...
## $ value: Time-Series from 1749 to 2014: 58 62.6 70 55.7 85 83.5 94.8 66.3 75.9 75.5 ...
## survey_vector : chr [1:5] "M" "F" "F" "M" "M"
## synodic : Formal class 'Duration' [package "lubridate"] with 1 slot
## tagged_answers : spc_tbl_ [745,796 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## tags : spc_tbl_ [48,299 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## temp : List of 7
## $ : num [1:5] 3 7 9 6 -1
## $ : num [1:5] 6 9 12 13 5
## $ : num [1:5] 4 8 3 -1 -3
## $ : num [1:5] 1 4 7 2 -2
## $ : num [1:5] 5 7 9 4 2
## $ : num [1:5] -3 5 8 9 4
## $ : num [1:5] 3 6 9 4 1
## temperature_vector : chr [1:5] "High" "Low" "High" "Low" "Medium"
## TG : spc_tbl_ [60 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## theme_recession : List of 8
## $ rect :List of 5
## $ axis.text :List of 11
## $ axis.ticks : list()
## $ legend.key :List of 5
## $ legend.position : num [1:2] 0.6 0.1
## $ panel.grid : list()
## $ panel.grid.major.y:List of 6
## $ plot.title :List of 11
## theme_tufte_recession : List of 97
## $ line :List of 6
## $ rect :List of 5
## $ text :List of 11
## $ title : NULL
## $ aspect.ratio : NULL
## $ axis.title : NULL
## $ axis.title.x :List of 11
## $ axis.title.x.top :List of 11
## $ axis.title.x.bottom : NULL
## $ axis.title.y :List of 11
## $ axis.title.y.left : NULL
## $ axis.title.y.right :List of 11
## $ axis.text :List of 11
## $ axis.text.x :List of 11
## $ axis.text.x.top :List of 11
## $ axis.text.x.bottom : NULL
## $ axis.text.y :List of 11
## $ axis.text.y.left : NULL
## $ axis.text.y.right :List of 11
## $ axis.ticks : list()
## $ axis.ticks.x : NULL
## $ axis.ticks.x.top : NULL
## $ axis.ticks.x.bottom : NULL
## $ axis.ticks.y : NULL
## $ axis.ticks.y.left : NULL
## $ axis.ticks.y.right : NULL
## $ axis.ticks.length : 'simpleUnit' num 2.75points
## $ axis.ticks.length.x : NULL
## $ axis.ticks.length.x.top : NULL
## $ axis.ticks.length.x.bottom: NULL
## $ axis.ticks.length.y : NULL
## $ axis.ticks.length.y.left : NULL
## $ axis.ticks.length.y.right : NULL
## $ axis.line : list()
## $ axis.line.x : NULL
## $ axis.line.x.top : NULL
## $ axis.line.x.bottom : NULL
## $ axis.line.y : NULL
## $ axis.line.y.left : NULL
## $ axis.line.y.right : NULL
## $ legend.background : list()
## $ legend.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
## $ legend.spacing : 'simpleUnit' num 11points
## $ legend.spacing.x : NULL
## $ legend.spacing.y : NULL
## $ legend.key :List of 5
## $ legend.key.size : 'simpleUnit' num 1.2lines
## $ legend.key.height : NULL
## $ legend.key.width : NULL
## $ legend.text :List of 11
## $ legend.text.align : NULL
## $ legend.title :List of 11
## $ legend.title.align : NULL
## $ legend.position : num [1:2] 0.6 0.1
## $ legend.direction : NULL
## $ legend.justification : chr "center"
## $ legend.box : NULL
## $ legend.box.just : NULL
## $ legend.box.margin : 'margin' num [1:4] 0cm 0cm 0cm 0cm
## $ legend.box.background : list()
## $ legend.box.spacing : 'simpleUnit' num 11points
## $ panel.background : list()
## $ panel.border : list()
## $ panel.spacing : 'simpleUnit' num 5.5points
## $ panel.spacing.x : NULL
## $ panel.spacing.y : NULL
## $ panel.grid : list()
## $ panel.grid.major : NULL
## $ panel.grid.minor :List of 6
## $ panel.grid.major.x : NULL
## $ panel.grid.major.y :List of 6
## $ panel.grid.minor.x : NULL
## $ panel.grid.minor.y : NULL
## $ panel.ontop : logi FALSE
## $ plot.background : list()
## $ plot.title :List of 11
## $ plot.title.position : chr "panel"
## $ plot.subtitle :List of 11
## $ plot.caption :List of 11
## $ plot.caption.position : chr "panel"
## $ plot.tag :List of 11
## $ plot.tag.position : chr "topleft"
## $ plot.margin : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
## $ strip.background : list()
## $ strip.background.x : NULL
## $ strip.background.y : NULL
## $ strip.clip : chr "inherit"
## $ strip.placement : chr "inside"
## $ strip.text :List of 11
## $ strip.text.x : NULL
## $ strip.text.x.bottom : NULL
## $ strip.text.x.top : NULL
## $ strip.text.y :List of 11
## $ strip.text.y.left :List of 11
## $ strip.text.y.right : NULL
## $ strip.switch.pad.grid : 'simpleUnit' num 2.75points
## $ strip.switch.pad.wrap : 'simpleUnit' num 2.75points
## themes : spc_tbl_ [467 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## throw_die : function ()
## ticket_prices_matrix : num [1:6, 1:2] 5 6 7 4 4.5 4.9 5 6 7 4 ...
## ticket_rowname : chr [1:6] "A New Hope" "The Empire Strikes Back" "Return of the Jedi" ...
## time_online : 'difftime' num [1:5] 2305 34 837 2398 ...
## time1 : POSIXct[1:1], format: "1996-05-23 23:01:45"
## time2 : POSIXct[1:1], format: "2012-03-12 14:23:08"
## titles : chr [1:3] "A New Hope" "The Empire Strikes Back" "Return of the Jedi"
## today : Date[1:1], format: "2024-02-12"
## today_8am : POSIXct[1:1], format: "2024-02-12 08:00:00"
## tom : 'data.frame': 1 obs. of 3 variables:
## $ type : chr "Beef"
## $ calories: int 190
## $ sodium : int 645
## toss_coin : function (n_flips, p_head)
## total_poker : num 230
## total_roulette : num -314
## total_week : num -84
## triple : function (x)
## ttt : chr [1:3, 1:3] "O" NA "X" NA "O" NA "X" "O" "X"
## tue_9am : POSIXct[1:1], format: "2018-08-28 09:00:00"
## two_orders : chr [1:5] "October 7, 2001" "October 13, 2002" "April 13, 2003" ...
## type : chr [1:8] "Terrestrial planet" "Terrestrial planet" "Terrestrial planet" ...
## type_of_capital : chr [1:3] "Legislative" "Judicial" "Administrative"
## upper : Named num 2108
## urban : 'data.frame': 209 obs. of 53 variables:
## $ country: chr "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## $ 1960 : num 769308 494443 3293999 NA NA ...
## $ 1961 : num 814923 511803 3515148 13660 8724 ...
## $ 1962 : num 858522 529439 3739963 14166 9700 ...
## $ 1963 : num 903914 547377 3973289 14759 10748 ...
## $ 1964 : num 951226 565572 4220987 15396 11866 ...
## $ 1965 : num 1000582 583983 4488176 16045 13053 ...
## $ 1966 : num 1058743 602512 4649105 16693 14217 ...
## $ 1967 : num 1119067 621180 4826104 17349 15440 ...
## $ 1968 : num 1182159 639964 5017299 17996 16727 ...
## $ 1969 : num 1248901 658853 5219332 18619 18088 ...
## $ 1970 : num 1319849 677839 5429743 19206 19529 ...
## $ 1971 : num 1409001 698932 5619042 19752 20929 ...
## $ 1972 : num 1502402 720207 5815734 20263 22406 ...
## $ 1973 : num 1598835 741681 6020647 20742 23937 ...
## $ 1974 : num 1696445 763385 6235114 21194 25482 ...
## $ 1975 : num 1793266 785350 6460138 21632 27019 ...
## $ 1976 : num 1905033 807990 6774099 22047 28366 ...
## $ 1977 : num 2021308 830959 7102902 22452 29677 ...
## $ 1978 : num 2142248 854262 7447728 22899 31037 ...
## $ 1979 : num 2268015 877898 7810073 23457 32572 ...
## $ 1980 : num 2398775 901884 8190772 24177 34366 ...
## $ 1981 : num 2493265 927224 8637724 25173 36356 ...
## $ 1982 : num 2590846 952447 9105820 26342 38618 ...
## $ 1983 : num 2691612 978476 9591900 27655 40983 ...
## $ 1984 : num 2795656 1006613 10091289 29062 43207 ...
## $ 1985 : num 2903078 1037541 10600112 30524 45119 ...
## $ 1986 : num 3006983 1072365 11101757 32014 46254 ...
## $ 1987 : num 3113957 1109954 11609104 33548 47019 ...
## $ 1988 : num 3224082 1146633 12122941 35095 47669 ...
## $ 1989 : num 3337444 1177286 12645263 36618 48577 ...
## $ 1990 : num 3454129 1198293 13177079 38088 49982 ...
## $ 1991 : num 3617842 1215445 13708813 39600 51972 ...
## $ 1992 : num 3788685 1222544 14248297 41049 54469 ...
## $ 1993 : num 3966956 1222812 14789176 42443 57079 ...
## $ 1994 : num 4152960 1221364 15322651 43798 59243 ...
## $ 1995 : num 4347018 1222234 15842442 45129 60598 ...
## $ 1996 : num 4531285 1228760 16395553 46343 60927 ...
## $ 1997 : num 4722603 1238090 16935451 47527 60462 ...
## $ 1998 : num 4921227 1250366 17469200 48705 59685 ...
## $ 1999 : num 5127421 1265195 18007937 49906 59281 ...
## $ 2000 : num 5341456 1282223 18560597 51151 59719 ...
## $ 2001 : num 5564492 1315690 19198872 52341 61062 ...
## $ 2002 : num 5795940 1352278 19854835 53583 63212 ...
## $ 2003 : num 6036100 1391143 20529356 54864 65802 ...
## $ 2004 : num 6285281 1430918 21222198 56166 68301 ...
## $ 2005 : num 6543804 1470488 21932978 57474 70329 ...
## $ 2006 : num 6812538 1512255 22625052 58679 71726 ...
## $ 2007 : num 7091245 1553491 23335543 59894 72684 ...
## $ 2008 : num 7380272 1594351 24061749 61118 73335 ...
## $ 2009 : num 7679982 1635262 24799591 62357 73897 ...
## $ 2010 : num 7990746 1676545 25545622 63616 74525 ...
## $ 2011 : num 8316976 1716842 26216968 64817 75207 ...
## urban_clean : 'data.frame': 197 obs. of 53 variables:
## $ country: chr "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ 1960 : num 769308 494443 3293999 521205 21699 ...
## $ 1961 : num 814923 511803 3515148 548265 21635 ...
## $ 1962 : num 858522 529439 3739963 579695 21664 ...
## $ 1963 : num 903914 547377 3973289 612087 21741 ...
## $ 1964 : num 951226 565572 4220987 645262 21830 ...
## $ 1965 : num 1000582 583983 4488176 679109 21909 ...
## $ 1966 : num 1058743 602512 4649105 717833 22003 ...
## $ 1967 : num 1119067 621180 4826104 757496 22086 ...
## $ 1968 : num 1182159 639964 5017299 798459 22149 ...
## $ 1969 : num 1248901 658853 5219332 841262 22183 ...
## $ 1970 : num 1319849 677839 5429743 886402 22181 ...
## $ 1971 : num 1409001 698932 5619042 955010 22561 ...
## $ 1972 : num 1502402 720207 5815734 1027397 22908 ...
## $ 1973 : num 1598835 741681 6020647 1103830 23221 ...
## $ 1974 : num 1696445 763385 6235114 1184486 23503 ...
## $ 1975 : num 1793266 785350 6460138 1269606 23753 ...
## $ 1976 : num 1905033 807990 6774099 1372800 23977 ...
## $ 1977 : num 2021308 830959 7102902 1481598 24168 ...
## $ 1978 : num 2142248 854262 7447728 1597267 24304 ...
## $ 1979 : num 2268015 877898 7810073 1721590 24358 ...
## $ 1980 : num 2398775 901884 8190772 1855983 24312 ...
## $ 1981 : num 2493265 927224 8637724 2015833 24164 ...
## $ 1982 : num 2590846 952447 9105820 2188351 23929 ...
## $ 1983 : num 2691612 978476 9591900 2371047 23632 ...
## $ 1984 : num 2795656 1006613 10091289 2559552 23306 ...
## $ 1985 : num 2903078 1037541 10600112 2750310 22979 ...
## $ 1986 : num 3006983 1072365 11101757 2955174 22654 ...
## $ 1987 : num 3113957 1109954 11609104 3161499 22345 ...
## $ 1988 : num 3224082 1146633 12122941 3373207 22100 ...
## $ 1989 : num 3337444 1177286 12645263 3597090 21982 ...
## $ 1990 : num 3454129 1198293 13177079 3838852 22036 ...
## $ 1991 : num 3617842 1215445 13708813 4102967 22047 ...
## $ 1992 : num 3788685 1222544 14248297 4388137 22229 ...
## $ 1993 : num 3966956 1222812 14789176 4690892 22538 ...
## $ 1994 : num 4152960 1221364 15322651 5004756 22914 ...
## $ 1995 : num 4347018 1222234 15842442 5324794 23312 ...
## $ 1996 : num 4531285 1228760 16395553 5602207 23667 ...
## $ 1997 : num 4722603 1238090 16935451 5882843 24031 ...
## $ 1998 : num 4921227 1250366 17469200 6173329 24379 ...
## $ 1999 : num 5127421 1265195 18007937 6483827 24690 ...
## $ 2000 : num 5341456 1282223 18560597 6822112 24949 ...
## $ 2001 : num 5564492 1315690 19198872 7190985 25197 ...
## $ 2002 : num 5795940 1352278 19854835 7589585 25383 ...
## $ 2003 : num 6036100 1391143 20529356 8012295 25520 ...
## $ 2004 : num 6285281 1430918 21222198 8449762 25623 ...
## $ 2005 : num 6543804 1470488 21932978 8894673 25706 ...
## $ 2006 : num 6812538 1512255 22625052 9326818 25896 ...
## $ 2007 : num 7091245 1553491 23335543 9764679 26066 ...
## $ 2008 : num 7380272 1594351 24061749 10210317 26223 ...
## $ 2009 : num 7679982 1635262 24799591 10667670 26369 ...
## $ 2010 : num 7990746 1676545 25545622 11139829 26508 ...
## $ 2011 : num 8316976 1716842 26216968 11602851 26771 ...
## urbanpop_sel : 'data.frame': 209 obs. of 3 variables:
## $ X1968: num 1182159 639964 5017299 17996 16727 ...
## $ X1969: num 1248901 658853 5219332 18619 18088 ...
## $ X1970: num 1319849 677839 5429743 19206 19529 ...
## us_visitors : Named num [1:6] 92.2 48.4 44.2 118.6 69 ...
## val : num 14
## version_1_inventories : spc_tbl_ [35,612 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## view_true : logi [1:2, 1:7] FALSE FALSE FALSE FALSE TRUE FALSE ...
## views : num [1:2, 1:7] 16 17 9 7 13 5 5 16 2 8 ...
## visitors : num [1:6, 1:2] 92.2 48.4 44.2 118.6 69 ...
## Vocab : tibble [21,638 × 5] (S3: tbl_df/tbl/data.frame)
## wait_times : 'data.frame': 1000 obs. of 1 variable:
## $ simulation_nb: int 1 2 3 4 5 6 7 8 9 10 ...
## weights : num [1:2] 0.8 0.2
## wind : 'data.frame': 8753 obs. of 3 variables:
## $ date: POSIXct, format: "2003-01-01 00:00:00" "2003-01-01 01:00:00" ...
## $ ws : Factor w/ 7 levels "0 - 2","10 - 12",..: 5 5 4 5 5 5 5 5 4 4 ...
## $ wd : Factor w/ 16 levels "E","ENE","ESE",..: 11 10 10 10 10 10 11 11 11 11 ...
## won_25pct : num 0.75
## won_30pct : num 0.9
## won_35pct : num 1.05
## world_happiness : 'data.frame': 133 obs. of 10 variables:
## $ country : chr "Finland" "Denmark" "Norway" "Iceland" ...
## $ social_support : int 2 4 3 1 15 13 25 5 20 31 ...
## $ freedom : int 5 6 3 7 19 11 10 8 9 26 ...
## $ corruption : int 4 3 8 45 12 7 6 5 11 19 ...
## $ generosity : int 47 22 11 3 7 16 17 8 14 25 ...
## $ gdp_per_cap : int 42400 48300 66300 47900 50500 59000 47200 36500 44200 46900 ...
## $ life_exp : num 81.8 81 82.6 83 81.8 84.3 82.8 81.9 82.2 82 ...
## $ happiness_score : num 155 154 153 152 151 150 149 148 147 146 ...
## $ grams_sugar_per_day: num 86.8 152 120 132 122 166 115 162 132 124 ...
## $ log_gdp_per_cap : num 10.7 10.8 11.1 10.8 10.8 ...
## worldwide_vector : Named num [1:3] 775 538 475
## worldwide_vector_col : Named num [1:2] 1061 728
## x : chr "Monday June 1st 2010 at 4pm"
## x_date : Date[1:1], format: "2013-04-03"
## x_end : num 67
## x_start : num 71
## y : chr "02.01.2010"
## y_breaks : num [1:6] 59 68 77 86 95 104
## y_end : num 7.5
## y_labels : num [1:6] 15 20 25 30 35 40
## y_start : num 5.5
## years : List of 4
## $ : chr "1777"
## $ : chr "1702"
## $ : chr "1623"
## $ : chr "1857"
## years_ad : chr [1:4] "1777" "1702" "1623" "1857"
## years_an : chr [1:4] "1777" "1702" "1623" "1857"
## z : chr "Sep, 12th 2010 14:00"
## zagat : 'data.frame': 310 obs. of 7 variables:
## $ id : int 0 1 2 3 4 5 6 8 9 11 ...
## $ name : chr "apple pan the" "asahi ramen" "baja fresh" "belvedere the" ...
## $ addr : chr "10801 w. pico blvd." "2027 sawtelle blvd." "3345 kimber dr." "9882 little santa monica blvd." ...
## $ city : Factor w/ 26 levels "atlanta","los angeles",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ phone: Factor w/ 329 levels "100-813-8212",..: 143 145 312 155 139 96 323 125 150 132 ...
## $ type : chr "american" "noodle shops" "mexican" "pacific new wave" ...
## $ class: int 534 535 536 537 538 539 540 542 543 545 ...
ls.str(rsa_env)
## capitals : 'data.frame': 3 obs. of 2 variables:
## $ city : chr "Cape Town" "Bloemfontein" "Pretoria"
## $ type_of_capital: chr "Legislative" "Judicial" "Administrative"
## national_parks : chr [1:22] "Addo Elephant National Park" "Agulhas National Park" ...
## population : Time-Series [1:5] from 1996 to 2016: 40583573 44819778 47390900 51770560 55908900
# Does groom_model exist in rsa_env?
exists("groom_model", envir = rsa_env)
## [1] TRUE
# Does groom_model exist in rsa_env, ignoring inheritance?
exists("groom_model", envir = rsa_env, inherits = FALSE)
## [1] FALSE
R searches for variables in all the parent environments, unless you explicitly tell it not to (inherits = FALSE).
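For example (a minimal sketch with made-up names, not from the exercise), a variable defined in the calling environment is visible from a child environment unless you set inherits = FALSE:
# Create a child environment whose parent is the current environment
child_env <- new.env()
a_global <- 42

# Found by searching the parent environments
exists("a_global", envir = child_env)
## [1] TRUE

# Not found when the search is restricted to child_env itself
exists("a_global", envir = child_env, inherits = FALSE)
## [1] FALSE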
12.3.4 Scope and precedence
When you call a function, R gives it an environment to store its variables.
Accessing variables outside functions
If R can't find a variable inside the function's environment, it looks in the parent environment. A variable defined in the parent environment can still be used inside the function.
x_times_y <- function(x) {
  x * y
}

x_times_y(10)
Error in x_times_y(10) : object 'y' not found
x_times_y <- function(x) {
  x * y
}
y <- 4

x_times_y(10)
## [1] 40
Accessing function variables from outside
But you can’t look inside the function’s environment from outside.
x_times_y <- function(x) {
  x * y
}
y <- 4

x_times_y(10)
print(x)
Error: object 'x' not found
Although x is defined inside the function, you can’t access it from outside the function.
Variable precedence
Variables inside functions take precedence over variables outside functions.
So, R will search for variables inside the function first. If it finds y there, it doesn’t need to look in the parent environment.
x_times_y <- function(x) {
  y <- 6
  x * y
}
y <- 4

x_times_y(10)
## [1] 60
Values defined inside the function take precedence over values passed into the function.
x_times_y <- function(x) {
  x <- 9
  y <- 6
  x * y
}
y <- 4

x_times_y(10)
## [1] 54
12.4 Case Study on Grain Yields
12.4.1 Grain yields & unit conversion
12.4.1.1 Convert areas to metric 1
In this chapter, you’ll be working with grain yield data from the United States Department of Agriculture, National Agricultural Statistics Service. Unfortunately, they report all areas in acres.
So, the first thing you need to do is write some utility functions to convert areas in acres to areas in hectares.
To solve this exercise, you need to know the following:
There are 4840 square yards in an acre.
There are 36 inches in a yard and one inch is 0.0254 meters.
There are 10000 square meters in a hectare.
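Putting these facts together gives a quick sanity check on the conversion (a sketch, not part of the exercise):
# 1 yard = 36 * 0.0254 m, so 1 sq. yard = (36 * 0.0254)^2 sq. meters;
# 1 acre = 4840 sq. yards, and 1 hectare = 10000 sq. meters
4840 * (36 * 0.0254)^2 / 10000
## [1] 0.4046856
So one acre is roughly 0.405 hectares, which is a useful number for spot-checking the functions below.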
# Convert areas in acres to areas in square yards
# Write a function to convert acres to sq. yards
acres_to_sq_yards <- function(acres) {
  acres * 4840
}

# Convert distances in yards to distances in meters
# Write a function to convert yards to meters
yards_to_meters <- function(yards) {
  yards * 36 * 0.0254
}

# Convert areas in square meters to areas in hectares
# Write a function to convert sq. meters to hectares
sq_meters_to_hectares <- function(sq_meters) {
  sq_meters / 10000
}
12.4.1.2 Convert areas to metric 2
You need one more utility function to get from square yards to square meters. Then you can bring everything together into the overall acres-to-hectares conversion function.
magrittr’s pipeable operator replacements:
x * y    is equivalent to   x %>% multiply_by(y)
x ^ y    is equivalent to   x %>% raise_to_power(y)
x[[y]]   is equivalent to   x %>% extract2(y)
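As a quick illustration (not part of the exercise), the aliases behave exactly like the operators they replace:
library(magrittr)
3 %>% multiply_by(4)
## [1] 12
2 %>% raise_to_power(10)
## [1] 1024
c(barley = 48, corn = 56) %>% extract2("corn")
## [1] 56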
library(magrittr)
# Write a function to convert areas in square yards to square meters
sq_yards_to_sq_meters <- function(sq_yards) {
  sq_yards %>%
    # Take the square root
    sqrt() %>%
    # Convert yards to meters
    yards_to_meters() %>%
    # Square it
    raise_to_power(2)
}
# Write a function to convert areas in acres to hectares
acres_to_hectares <- function(acres) {
  acres %>%
    # Convert acres to sq yards
    acres_to_sq_yards() %>%
    # Convert sq yards to sq meters
    sq_yards_to_sq_meters() %>%
    # Convert sq meters to hectares
    sq_meters_to_hectares()
}
# Harmonically convert areas in acres to hectares
# Define a harmonic acres to hectares function
harmonic_acres_to_hectares <- function(acres) {
  acres %>%
    # Get the reciprocal
    get_reciprocal() %>%
    # Convert acres to hectares
    acres_to_hectares() %>%
    # Get the reciprocal again
    get_reciprocal()
}
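Note that get_reciprocal() is a small helper assumed to have been defined earlier in the course; if it isn't in your session, a one-line version is enough (a sketch):
# Assumed helper: the reciprocal of a number
get_reciprocal <- function(x) {
  1 / x
}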
By breaking the conversion down into lots of simple functions, you get easy-to-read code, which helps guard against bugs.
12.4.1.3 Convert yields to metric
The yields in the NASS corn data are also given in US units, namely bushels per acre.
You’ll need to write some more utility functions to convert this unit to the metric unit of kg per hectare.
Bushels historically meant a volume of 8 gallons, but in the context of grain, they are now defined as masses. This mass differs for each grain! To solve this exercise, you need to know these facts.
- One pound (lb) is 0.45359237 kilograms (kg).
- One bushel is 48 lbs of barley, 56 lbs of corn, or 60 lbs of wheat.
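Before writing the functions, a quick arithmetic check (a sketch, not part of the exercise) shows the size of the numbers involved:
# Mass of one bushel of corn, in kg
56 * 0.45359237                                      # about 25.4 kg
# One bushel per acre of corn, expressed in kg per hectare
56 * 0.45359237 / (4840 * (36 * 0.0254)^2 / 10000)   # about 62.8 kg/ha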
# Write a function to convert lb to kg
lbs_to_kgs <- function(lbs) {
  lbs * 0.45359237
}

# Write a function to convert bushels to lbs
bushels_to_lbs <- function(bushels, crop) {
  # Define a lookup table of scale factors
  c(barley = 48, corn = 56, wheat = 60) %>%
    # Extract the value for the crop
    extract2(crop) %>%
    # Multiply by the no. of bushels
    multiply_by(bushels)
}
# Write a function to convert bushels to kg
bushels_to_kgs <- function(bushels, crop) {
  bushels %>%
    # Convert bushels to lbs for this crop
    bushels_to_lbs(crop) %>%
    # Convert lbs to kgs
    lbs_to_kgs()
}
# Write a function to convert bushels/acre to kg/ha
bushels_per_acre_to_kgs_per_hectare <- function(bushels_per_acre,
                                                crop = c("barley", "corn", "wheat")) {
  # Match the crop argument
  crop <- match.arg(crop)
  bushels_per_acre %>%
    # Convert bushels to kgs for this crop
    bushels_to_kgs(crop) %>%
    # Convert harmonic acres to ha
    harmonic_acres_to_hectares()
}
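A quick check of the finished converter (assuming the functions above are defined; the expected value follows from the earlier arithmetic):
bushels_per_acre_to_kgs_per_hectare(10, crop = "corn")   # about 628 kg/ha
# Thanks to match.arg(), passing a crop outside the choices, e.g. "rice",
# throws an error instead of silently returning a wrong answer.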
You now have a full stack of functions for converting the units.
12.4.1.4 Apply the unit conversion
Now that you’ve written some functions, it’s time to apply them! The NASS corn dataset is available, and you can fortify it (jargon for “adding new columns”) with metric areas and yields.
This fortification process can also be turned into a function, so you’ll define a function for it and test it on the NASS wheat dataset.
corn <- read_rds("data/nass.corn.rds")
wheat <- read_rds("data/nass.wheat.rds")
# View the corn dataset
glimpse(corn)
## Rows: 6,381
## Columns: 4
## $ year <int> 1866, 1866, 1866, 1866, 1866, 1866, 1866, 1866,…
## $ state <chr> "Alabama", "Arkansas", "California", "Connectic…
## $ farmed_area_acres <dbl> 1050000, 280000, 42000, 57000, 200000, 125000, …
## $ yield_bushels_per_acre <dbl> 9.0, 18.0, 28.0, 34.0, 23.0, 9.0, 6.0, 29.0, 36…
corn <- corn %>%
  # Add some columns
  mutate(
    # Convert farmed area from acres to ha
    farmed_area_ha = acres_to_hectares(farmed_area_acres),
    # Convert yield from bushels/acre to kg/ha
    yield_kg_per_ha = bushels_per_acre_to_kgs_per_hectare(
      yield_bushels_per_acre, crop = "corn"
    )
  )
glimpse(corn)
## Rows: 6,381
## Columns: 6
## $ year <int> 1866, 1866, 1866, 1866, 1866, 1866, 1866, 1866,…
## $ state <chr> "Alabama", "Arkansas", "California", "Connectic…
## $ farmed_area_acres <dbl> 1050000, 280000, 42000, 57000, 200000, 125000, …
## $ yield_bushels_per_acre <dbl> 9.0, 18.0, 28.0, 34.0, 23.0, 9.0, 6.0, 29.0, 36…
## $ farmed_area_ha <dbl> 424919.92, 113311.98, 16996.80, 23067.08, 80937…
## $ yield_kg_per_ha <dbl> 564.9090, 1129.8180, 1757.4946, 2134.1006, 1443…
# Wrap this code into a function
fortify_with_metric_units <- function(data, crop) {
  data %>%
    mutate(
      farmed_area_ha = acres_to_hectares(farmed_area_acres),
      yield_kg_per_ha = bushels_per_acre_to_kgs_per_hectare(
        yield_bushels_per_acre, crop = crop
      )
    )
}
# Try it on the wheat dataset
wheat <- fortify_with_metric_units(wheat, "wheat")
glimpse(wheat)
## Rows: 5,963
## Columns: 6
## $ year <int> 1866, 1866, 1866, 1866, 1866, 1866, 1866, 1866,…
## $ state <chr> "Alabama", "Arkansas", "California", "Connectic…
## $ farmed_area_acres <dbl> 125000, 50000, 650000, 2000, 59000, 245000, 230…
## $ yield_bushels_per_acre <dbl> 5.0, 6.5, 18.0, 17.5, 11.0, 4.0, 10.5, 10.0, 13…
## $ farmed_area_ha <dbl> 50585.7053, 20234.2821, 263045.6675, 809.3713, …
## $ yield_kg_per_ha <dbl> 336.2553, 437.1320, 1210.5192, 1176.8937, 739.7…
12.4.2 Visualizing grain yields
12.4.2.1 Plot yields over time
An obvious question to ask about each crop is, “how do the yields change over time in each US state?”
# Using corn, plot yield (kg/ha) vs. year
ggplot(corn, aes(year, yield_kg_per_ha)) +
# Add a line layer, grouped by state
geom_line(aes(group = state)) +
# Add a smooth trend layer
geom_smooth()
## `geom_smooth()` using method = 'gam' and
## formula = 'y ~ s(x, bs = "cs")'
Turn the plotting code into a function.
# Wrap this plotting code into a function
plot_yield_vs_year <- function(data) {
  ggplot(data, aes(year, yield_kg_per_ha)) +
    geom_line(aes(group = state)) +
    geom_smooth()
}
# Test it on the wheat dataset
plot_yield_vs_year(wheat)
## `geom_smooth()` using method = 'gam' and
## formula = 'y ~ s(x, bs = "cs")'
Look at the huge increase in yields from the time of the Green Revolution in the 1950s.
12.4.2.2 A nation divided
The USA has a varied climate, so we might expect yields to differ between states. Rather than trying to reason about 50 states separately, we can use the USA Census Regions to get 9 groups.
The “Corn Belt”, where most US corn is grown, is in the “West North Central” and “East North Central” regions. The “Wheat Belt” is in the “West South Central” region.
usa_census_regions <- read_delim("data/usa_census_regions.txt", delim = ",")
usa_census_regions
## # A tibble: 51 × 2
## census_region state
## <chr> <chr>
## 1 New England Connecticut
## 2 New England Maine
## 3 New England Massachusetts
## 4 New England New Hampshire
## 5 New England Rhode Island
## 6 New England Vermont
## 7 Mid-Atlantic New Jersey
## 8 Mid-Atlantic New York
## 9 Mid-Atlantic Pennsylvania
## 10 East North Central Illinois
## # ℹ 41 more rows
# Inner join the corn dataset to usa_census_regions by state
corn <- corn %>%
  inner_join(usa_census_regions, by = "state")
glimpse(corn)
## Rows: 6,381
## Columns: 7
## $ year <int> 1866, 1866, 1866, 1866, 1866, 1866, 1866, 1866,…
## $ state <chr> "Alabama", "Arkansas", "California", "Connectic…
## $ farmed_area_acres <dbl> 1050000, 280000, 42000, 57000, 200000, 125000, …
## $ yield_bushels_per_acre <dbl> 9.0, 18.0, 28.0, 34.0, 23.0, 9.0, 6.0, 29.0, 36…
## $ farmed_area_ha <dbl> 424919.92, 113311.98, 16996.80, 23067.08, 80937…
## $ yield_kg_per_ha <dbl> 564.9090, 1129.8180, 1757.4946, 2134.1006, 1443…
## $ census_region <chr> "East South Central", "West South Central", "Pa…
Turn the code into a function.
# Wrap this code into a function
fortify_with_census_region <- function(data) {
  data %>%
    inner_join(usa_census_regions, by = "state")
}
# Try it on the wheat dataset
wheat <- fortify_with_census_region(wheat)
glimpse(wheat)
## Rows: 5,963
## Columns: 7
## $ year <int> 1866, 1866, 1866, 1866, 1866, 1866, 1866, 1866,…
## $ state <chr> "Alabama", "Arkansas", "California", "Connectic…
## $ farmed_area_acres <dbl> 125000, 50000, 650000, 2000, 59000, 245000, 230…
## $ yield_bushels_per_acre <dbl> 5.0, 6.5, 18.0, 17.5, 11.0, 4.0, 10.5, 10.0, 13…
## $ farmed_area_ha <dbl> 50585.7053, 20234.2821, 263045.6675, 809.3713, …
## $ yield_kg_per_ha <dbl> 336.2553, 437.1320, 1210.5192, 1176.8937, 739.7…
## $ census_region <chr> "East South Central", "West South Central", "Pa…
With the census data incorporated into the crop datasets, you can now look at yield differences between the regions.
12.4.2.3 Plot yields over time by region
Now you are ready to look at how the yields change over time in each region of the USA.
Use the function you wrote to plot yield versus year for the corn dataset, then facet the plot, wrapped by census_region.
# Plot yield vs. year for the corn dataset
plot_yield_vs_year(corn) +
# Facet, wrapped by census region
facet_wrap(vars(census_region))
## `geom_smooth()` using method = 'gam' and
## formula = 'y ~ s(x, bs = "cs")'
The corn yields are highest in the West North Central region, the heart of the Corn Belt.
Turn the code into a function.
# Wrap this code into a function
plot_yield_vs_year_by_region <- function(data) {
  plot_yield_vs_year(data) +
    facet_wrap(vars(census_region))
}
# Try it on the wheat dataset
plot_yield_vs_year_by_region(wheat)
## `geom_smooth()` using method = 'gam' and
## formula = 'y ~ s(x, bs = "cs")'
For wheat, it looks like the yields in the Wheat Belt (the West South Central region) have been overtaken by some other regions.
12.4.3 Modeling grain yields
12.4.3.1 Running a model
Notice that the lines on the plot aren’t straight. This means we need a model that can handle nonlinear responses. geom_smooth() creates its smooth trend lines using generalized additive models (GAMs).
To run a generalized additive model yourself, use gam() from the mgcv package. The GAM syntax is:
gam(response ~ s(explanatory_var1) + explanatory_var2, data = dataset)
Here, s() means “make the variable smooth”, where smooth very roughly means nonlinear.
library(mgcv)

# Run a generalized additive model of yield vs. smoothed year and census region
corn_model <- gam(yield_kg_per_ha ~ s(year) + census_region, data = corn)
corn_model
##
## Family: gaussian
## Link function: identity
##
## Formula:
## yield_kg_per_ha ~ s(year) + census_region
##
## Estimated degrees of freedom:
## 7.64 total = 16.64
##
## GCV score: 836963
Wrap the modeling code into a function.
# Wrap the model code into a function
run_gam_yield_vs_year_by_region <- function(data) {
  gam(yield_kg_per_ha ~ s(year) + census_region, data = data)
}

# Try it on the wheat dataset
wheat_model <- run_gam_yield_vs_year_by_region(wheat)
wheat_model
##
## Family: gaussian
## Link function: identity
##
## Formula:
## yield_kg_per_ha ~ s(year) + census_region
##
## Estimated degrees of freedom:
## 7.03 total = 16.03
##
## GCV score: 316685.5
12.4.3.2 Making yield predictions
Now use the models to make predictions. You can do this with a call to predict(), in the following form:
predict(model, cases_to_predict, type = "response")
GAMs of the corn and wheat datasets are available as corn_model and wheat_model. A character vector of census regions is stored as census_regions.
census_regions <- c("New England", "Mid-Atlantic", "East North Central",
                    "West North Central", "South Atlantic", "East South Central",
                    "West South Central", "Mountain", "Pacific")
# Make predictions in 2050
predict_this <- data.frame(
  year = 2050,
  census_region = census_regions
)

# Predict the yield
pred_yield_kg_per_ha <- predict(corn_model, predict_this, type = "response")

predict_this %>%
  # Add the prediction as a column of predict_this
  mutate(pred_yield_kg_per_ha = pred_yield_kg_per_ha) %>%
  arrange(desc(pred_yield_kg_per_ha))
## year census_region pred_yield_kg_per_ha
## 1 2050 Pacific 14396.33
## 2 2050 New England 13971.80
## 3 2050 East North Central 13858.74
## 4 2050 Mid-Atlantic 13483.19
## 5 2050 Mountain 13360.67
## 6 2050 West North Central 13262.68
## 7 2050 West South Central 12616.83
## 8 2050 South Atlantic 12517.50
## 9 2050 East South Central 12481.83
Wrap the script into a function.
# Wrap this prediction code into a function
predict_yields <- function(model, year) {
  predict_this <- data.frame(
    year = year,
    census_region = census_regions
  )
  pred_yield_kg_per_ha <- predict(model, predict_this, type = "response")
  predict_this %>%
    mutate(pred_yield_kg_per_ha = pred_yield_kg_per_ha) %>%
    arrange(desc(pred_yield_kg_per_ha))
}
# Try it on the wheat dataset
predict_yields(wheat_model, 2050)
## year census_region pred_yield_kg_per_ha
## 1 2050 Pacific 5707.724
## 2 2050 New England 5506.308
## 3 2050 East North Central 5470.007
## 4 2050 Mountain 5457.741
## 5 2050 Mid-Atlantic 5425.554
## 6 2050 South Atlantic 5079.523
## 7 2050 East South Central 4985.757
## 8 2050 West North Central 4984.050
## 9 2050 West South Central 4764.289
The models predict that in 2050, the highest yields will be in the Pacific region for both corn and wheat.
12.4.3.3 Do it all over again
Now you are going to rerun the whole analysis from this chapter on a new crop, barley.
Barley prefers a cooler climate compared to corn and wheat and is commonly grown in the US mountain states of Idaho and Montana.
Since all the infrastructure (fortify_with_metric_units(), fortify_with_census_region(), plot_yield_vs_year_by_region(), run_gam_yield_vs_year_by_region(), and predict_yields()) is already in place, that’s less effort than it sounds!
barley <- read_rds("data/nass.barley.rds")
head(barley)
## year state farmed_area_acres yield_bushels_per_acre
## 1 1866 Connecticut 1000 22.5
## 2 1866 Illinois 96000 23.4
## 3 1866 Indiana 11000 23.0
## 4 1866 Iowa 66000 22.0
## 5 1866 Kansas 2000 23.0
## 6 1866 Kentucky 10000 23.5
Fortify the barley data with metric units, then with census regions.
fortified_barley <- barley %>%
  # Fortify with metric units
  fortify_with_metric_units("barley") %>%
  # Fortify with census regions
  fortify_with_census_region()
# See the result
glimpse(fortified_barley)
## Rows: 4,839
## Columns: 7
## $ year <int> 1866, 1866, 1866, 1866, 1866, 1866, 1866, 1866,…
## $ state <chr> "Connecticut", "Illinois", "Indiana", "Iowa", "…
## $ farmed_area_acres <dbl> 1000, 96000, 11000, 66000, 2000, 10000, 34000, …
## $ yield_bushels_per_acre <dbl> 22.5, 23.4, 23.0, 22.0, 23.0, 23.5, 21.5, 25.5,…
## $ farmed_area_ha <dbl> 404.6856, 38849.8217, 4451.5421, 26709.2524, 80…
## $ yield_kg_per_ha <dbl> 1210.5192, 1258.9400, 1237.4197, 1183.6188, 123…
## $ census_region <chr> "New England", "East North Central", "East Nort…
Plot the yield versus year by census region.
# Plot yield vs. year by region
fortified_barley %>%
  plot_yield_vs_year_by_region()
## `geom_smooth()` using method = 'gam' and
## formula = 'y ~ s(x, bs = "cs")'
Run a GAM of yield versus year by census region, then predict the yields in 2050.
fortified_barley %>%
  # Run a GAM of yield vs. year by region
  run_gam_yield_vs_year_by_region() %>%
  # Make predictions of yields in 2050
  predict_yields(2050)
## year census_region pred_yield_kg_per_ha
## 1 2050 Mountain 4638.785
## 2 2050 Pacific 4267.937
## 3 2050 South Atlantic 4248.425
## 4 2050 Mid-Atlantic 4247.439
## 5 2050 New England 4229.927
## 6 2050 East North Central 4210.503
## 7 2050 East South Central 4015.834
## 8 2050 West North Central 3843.945
## 9 2050 West South Central 3638.759
Since all your analysis code was contained in functions, it was really simple to apply it to another dataset. Here you can see that yields are highest in the Mountain region, and the model predicts that this will still be the case in 2050.
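If you wanted to go one step further, the whole analysis could itself be bundled into a single function (a sketch built from the functions above; analyze_crop is a hypothetical name, not part of the exercise):
analyze_crop <- function(data, crop, year = 2050) {
  fortified <- data %>%
    # Add metric areas and yields
    fortify_with_metric_units(crop) %>%
    # Add census regions
    fortify_with_census_region()
  # Draw the faceted yield-vs-year plot
  print(plot_yield_vs_year_by_region(fortified))
  # Model the yields and predict them for the chosen year
  fortified %>%
    run_gam_yield_vs_year_by_region() %>%
    predict_yields(year)
}

# For example: analyze_crop(barley, "barley")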