Matrices, Data frames, the function-function, and control flow ... flow control, lists?
18 Feb 2022
The code we produced in class this week and last can be downloaded and reviewed here:
This and last week’s problem set and required R script and data can be found here:
Below is a recap of the highlights from the last two weeks.
Exporting R objects
In previous meetings we discussed the load() and save() functions. Now consider the read.csv() and write.csv() functions. These allow us to export R objects (i.e., vectors, matrices, and data frames) to a plain-text format that can be read by MS Excel.
# write.csv() saves R objects as comma-separated value files
# read.csv() reads data from comma-separated value files into R
write.csv(x, file = "x.csv")
x <- read.csv(file = "x.csv")
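One detail worth knowing: by default write.csv() also writes the row names as an unnamed first column, which read.csv() then reads back as an extra column (typically named X). The sketch below shows one way to avoid this. The data frame and the temporary file path are made up for illustration and are not from class.

```r
# Hypothetical example data frame (not from class)
dat <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# tempfile() gives a throwaway path, so nothing in the
# working directory gets overwritten
path <- tempfile(fileext = ".csv")

# row.names = FALSE stops write.csv() from writing the
# row names as an extra first column
write.csv(dat, file = path, row.names = FALSE)

# read the file back in and compare it with the original
dat2 <- read.csv(file = path)
names(dat2)            # same column names as dat
all.equal(dat, dat2)   # checks whether the round trip preserved the data
```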
Matrices
We barely scratched the surface with matrices. We discussed three functions to generate matrices: rbind(), cbind(), and matrix().
x <- 1:10
y <- 10:1
# binds the vectors x and y together (as rows)
# returns a 2 by 10 matrix
rbind(x, y)
# binds the vectors x and y together (as columns)
# returns a 10 by 2 matrix
cbind(x, y)
# returns a 5 by 5 matrix containing 0's
matrix(data = 0, nrow = 5, ncol = 5)
# returns a 4 by 10 matrix containing a
# sequence of integers from 1 to 40, filled by row
matrix(data = 1:40, nrow = 4, ncol = 10, byrow = TRUE)
Indexing matrices works similarly to indexing vectors via the [] notation. The difference is that matrices are two-dimensional objects, so we need to specify not just one location (in one dimension) but two, like so: some_object[row-location, column-location].
x <- matrix(data = 1:40, nrow = 4, ncol = 10, byrow = TRUE)
# returns the value stored in row 2, column 3 of matrix x
x[2,3]
# returns a sub-matrix with 2 rows and 2 columns
x[2:3,4:5]
# returns all of row 3
x[3, ]
# returns all of column 4
x[ ,4]
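Two follow-ups that can be surprising at first: extracting a single row or column returns a plain vector rather than a matrix, and the same [row, column] notation can be used to assign values. The sketch below illustrates both. The drop = FALSE argument was not covered in class and is shown here only as an optional extra.

```r
x <- matrix(data = 1:40, nrow = 4, ncol = 10, byrow = TRUE)

dim(x)              # number of rows and columns: 4 10

row3 <- x[3, ]      # a plain vector; the matrix structure is dropped
is.matrix(row3)     # FALSE

# drop = FALSE keeps the result as a 1 by 10 matrix
row3_mat <- x[3, , drop = FALSE]
dim(row3_mat)       # 1 10

# the same [row, column] notation also assigns values
x[2, 3] <- 0
x[2, 3]             # 0
```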
Data frames
We began discussing data frames (i.e., 2D heterogeneous objects). These objects can most easily be constructed from atomic vectors via the data.frame() function.
# 5 vectors of different types (character, integer, logical, numeric, factor)
x <- letters[1:5]
y <- 1:5
z <- c(TRUE, TRUE, NA, FALSE, FALSE)
pies <- rep(x = pi, times = 5)
v <- as.factor(c("yes", "yes", "no", "yes", "no"))
dat <- data.frame(x, y, z, pies, v)
# named vectors stored in data frames can be extracted via the $ operator
dat$pies # extracts the pies vector from the data frame
dat$x[1:2] # extracts the first and second element of x stored in dat
# code below creates new named vector (called random)
# inside data frame dat
# the vector contains 5 random normal variates
dat$random <- rnorm(n = 5, mean = 0, sd = 1)
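Because data frames are two-dimensional, the [row, column] notation from matrices also works on them, and columns can be selected by name. The following is a small sketch with a made-up data frame. The str() function is an additional helper not covered above.

```r
# Hypothetical data frame for illustration
dat <- data.frame(x = letters[1:5], y = 1:5)

dat[2, "y"]           # row 2 of column y: 2
dat[dat$y > 3, ]      # all columns, only the rows where y is greater than 3
dat[ , c("y", "x")]   # all rows, columns selected (and reordered) by name
str(dat)              # compact overview of each column and its type
```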
More on indexing by location, both for standalone vectors and for vectors stored in data frames.
x <- c(1, 9, 8, 1)
y <- c(2, 0, 2, 0)
dat <- data.frame(x, y)
x[2] # returns 9
dat$x[2] # returns 9
dat$y[1+2] # returns 2
dat$y[36/9] # returns 0
New functions to explore data frames and edit or create vectors included:
summary(): basic summary statistics of an object
names(): names of vectors stored in a data frame
names(dat) # evaluates to x and y
names(dat)[1] <- "XXX"
# the line above changes the name of the first
# vector stored in dat to XXX
names(dat) # returns the new names
head() and tail(): get the first and last few rows of a data frame, respectively
ifelse(): basic if-else construct
The ifelse() function can be very useful to edit or create vectors.
x <- c(1, 9, 8, 1)
ifelse(test = x > 1, yes = 2, no = x)
# the line above returns a new vector in which all elements
# of x that are greater than 1 are replaced with 2s
# and those smaller than or equal to 1 are kept
# as they are (x itself is not changed unless we assign the result)
# returns: [1] 1 2 2 1
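A common use of ifelse() is to create a new vector inside a data frame based on an existing one. The sketch below uses made-up scores and a made-up cutoff of 50. It also shows that a missing value in the test produces a missing value in the result.

```r
# Hypothetical scores, not data from class
dat <- data.frame(score = c(35, 80, 55, 90))

# new vector "result" inside dat: "pass" if score is at least 50
dat$result <- ifelse(test = dat$score >= 50, yes = "pass", no = "fail")
dat$result   # "fail" "pass" "pass" "pass"

# an NA in the test yields an NA in the output
with_na <- ifelse(test = c(1, NA, 3) > 1, yes = "big", no = "small")
with_na      # "small" NA "big"
```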
The Control-Flow Construct if () {} else {}
We introduced more flexible flow control via if and else, which allow us to instruct R to do different things (i.e., follow instructions, evaluate code) based on whether some logical condition holds or not.
# Simple example (condition is TRUE)
if (1 + 1 == 2) {
  print("True")
} else {
  print("False")
}
# Simple example (condition is FALSE)
if (1 + 1 == 3) {
  print("True")
} else {
  print("False")
}
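# another example getting R to do a bunch of things
# (at the top level only the value of the last expression in the
# branch that runs is printed; wrap earlier lines in print() to see them)
if (0.5 > 1) {
  x <- rnorm(n = 10, mean = 0, sd = 1)
  x[1]
  sum(x)
} else {
  y <- runif(n = 10, min = 0, max = 1)
  y[1]
  sum(y)
}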
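When there are more than two cases, else if can be used to chain conditions. Note that if () expects a single TRUE or FALSE; to test every element of a vector, ifelse() is the right tool. The sketch below uses a made-up temperature value and made-up cutoffs.

```r
temperature <- 12   # hypothetical value, in degrees Celsius

# conditions are checked from top to bottom; the first one
# that is TRUE decides which branch runs
if (temperature < 0) {
  label <- "freezing"
} else if (temperature < 20) {
  label <- "mild"
} else {
  label <- "warm"
}
label   # "mild"

# for a whole vector of values, use ifelse() instead of if ()
temps <- c(-5, 12, 25)
frozen <- ifelse(test = temps < 0, yes = "freezing", no = "not freezing")
frozen  # "freezing" "not freezing" "not freezing"
```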
Another big concept introduced involved the creation of custom functions via the function() function.
# A function to compute the standard deviation of an arbitrary
# numeric input vector that gives the user the option to choose between
# the sample and population variants (the default returns the
# standard deviation of a sample)
my_sd <- function(input, population = FALSE) {
  n <- length(x = input)
  x_bar <- mean(x = input)
  if (population == FALSE) {
    output <- sqrt(x = sum(x = (input - x_bar)^2)/(n-1))
    return(output)
  } else {
    output <- sqrt(x = sum(x = (input - x_bar)^2)/n)
    return(output)
  }
}
# Try it out and compare to canned sd() function
x <- rnorm(n = 10, mean = 0, sd = 1)
x
sd(x = x)
my_sd(input = x)
my_sd(input = x, population = TRUE)
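To convince ourselves that the function behaves as intended, we can run it on fixed numbers where the answer is easy to verify by hand. The sketch below restates my_sd() in a more compact (but equivalent) form so that the block runs on its own. The input values are made up.

```r
# compact restatement of my_sd() so this block is self-contained
my_sd <- function(input, population = FALSE) {
  n <- length(input)
  denominator <- if (population) n else n - 1
  sqrt(sum((input - mean(input))^2) / denominator)
}

# fixed values: mean is 5, squared deviations sum to 32
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# the sample version should match the canned sd() function
all.equal(my_sd(input = x), sd(x))      # TRUE

# population version: sqrt(32 / 8)
my_sd(input = x, population = TRUE)     # 2

# the two versions differ only by the factor sqrt((n - 1) / n)
n <- length(x)
all.equal(my_sd(input = x, population = TRUE),
          sd(x) * sqrt((n - 1) / n))    # TRUE
```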
Below is the code for a function to convert temperature in degrees Fahrenheit to degrees Celsius (and vice versa). Note that this function differs from the one we created in class. Unlike the class version, which returned a vector of converted temperatures, this function does not create a vector but just “cat-calls” the conversion to the screen via the cat() function. The cat() function concatenates its arguments (vectors, including character strings) and prints the result to the screen. (The \n in the last string to be concatenated forces a line break so that the prompt appears on a new line.)
convert_temp <- function(input, C.to.F = TRUE) {
  # C.to.F is a logical, so test it directly rather than against "TRUE"
  if (C.to.F) {
    output <- input * 9/5 + 32
    return(
      cat(input, "degrees Celsius is equal to", output,
          "degrees Fahrenheit.\n")
    )
  } else {
    output <- (input - 32) * 5/9
    return(
      cat(input, "degrees Fahrenheit is equal to", output,
          "degrees Celsius.\n")
    )
  }
}
convert_temp(input = 0, C.to.F = TRUE)
convert_temp(input = 32, C.to.F = FALSE)
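For comparison, here is a sketch of a variant that returns the converted values instead of printing them, so the result can be stored and reused. This is not the version written in class. The name convert_temp_values is made up for illustration.

```r
# Hypothetical variant (not the class version): returns a numeric
# vector instead of printing a sentence
convert_temp_values <- function(input, C.to.F = TRUE) {
  if (C.to.F) {
    input * 9/5 + 32
  } else {
    (input - 32) * 5/9
  }
}

celsius <- c(0, 37, 100)
fahrenheit <- convert_temp_values(input = celsius)
fahrenheit   # 32.0 98.6 212.0

# converting back should recover the original values
back <- convert_temp_values(input = fahrenheit, C.to.F = FALSE)
all.equal(back, celsius)   # TRUE
```

Because the function returns a vector, its output can be used in further computations, which is not possible with the cat() version.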
Lists
As a last thing we briefly discussed lists. Lists are one-dimensional heterogeneous storage containers. Whereas data frames are two-dimensional storage containers for vectors, lists can store vectors, matrices, and data frames.
# two vectors
x <- 1:10
y <- x^2
# one data-frame
df <- data.frame(x, y)
# one matrix
my_matrix <- matrix(data = 1:12, nrow = 3, ncol = 4)
# all of the above and more added to a list
my_list <- list(x, y, df, my_matrix, letters, pi, 9001)
# We have a list of length 7. Seven things have been put into
# our list. Individual elements in the list can be accessed or
# extracted using the double-square-bracket notation [[item]]
my_list[[7]] # returns the seventh item in the list which is
# 9001
my_list[[2]] # returns the second item in the list which is
# the vector y
my_list[[3]]$y[10] # returns the 10th element of the vector y
# stored in our data-frame df, stored in the
# third spot in our list
my_list[[4]][1:2, 3] # what about this?
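List elements can also be given names, which makes them easier to extract than by position. The sketch below uses a small made-up list. It also shows the difference between single and double square brackets, which we did not cover in detail.

```r
# Hypothetical named list for illustration
small_list <- list(numbers = 1:3,
                   chars = c("a", "b"),
                   table = data.frame(x = 1:2))

names(small_list)        # "numbers" "chars" "table"
length(small_list)       # 3

small_list$numbers       # extract by name with $
small_list[["chars"]]    # extract by name with [[ ]]

# single brackets return a (shorter) list,
# double brackets return the element itself
class(small_list[1])     # "list"
class(small_list[[1]])   # "integer"
```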