Matrices, data frames, the function() function, control flow, and lists

The code we produced in class this and last week can be downloaded and reviewed here:

This and last week’s problem set and required R script and data can be found here:

Below is a recap of the highlights from the last two weeks.

Exporting R objects

In previous meetings we discussed the load() and save() functions. Now consider the read.csv() and write.csv() functions. write.csv() lets us export R objects (e.g., vectors, matrices, and data frames) to a plain-text format that can be read by MS Excel, and read.csv() reads such files back into R as data frames.

# write.csv() saves R objects as comma-separated value files
# read.csv() reads data from comma-separated value files into R

write.csv(x, file = "x.csv")
x <- read.csv(file = "x.csv")
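
The round trip can be sketched as follows. This is only an illustration: the data frame scores and its columns are made up, and the file is written to a temporary location via tempfile() rather than to the working directory. By default write.csv() also writes the row names as an extra first column; row.names = FALSE leaves them out, so the data frame comes back with the same columns it went out with.

```r
# hypothetical example data (not from class)
scores <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# temporary file so that nothing is left behind in the working directory
path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row names out of the file
write.csv(scores, file = path, row.names = FALSE)

# read.csv() always returns a data frame
scores_back <- read.csv(file = path)

names(scores_back)  # same column names as scores
nrow(scores_back)   # same number of rows as scores
```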

Matrices

We barely scratched the surface with matrices. We discussed three functions to generate matrices: rbind(), cbind(), and matrix().


x <- 1:10
y <- 10:1

# binds the vectors x and y together (as rows)
# returns a 2 by 10 matrix

rbind(x, y)

# binds the vectors x and y together (as columns)
# returns a 10 by 2 matrix

cbind(x, y)

# returns a 5 by 5 matrix filled with zeros

matrix(data = 0, nrow = 5, ncol = 5)

# returns a 4 by 10 matrix containing a 
# sequence of integers from 1 to 40, filled by row

matrix(data = 1:40, nrow = 4, ncol = 10, byrow = TRUE)
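
To see what byrow does, here is a small sketch that builds the same data into a matrix twice, once with the default fill order (by column) and once by row. The object names by_col and by_row are just for illustration.

```r
# same data, two fill orders
by_col <- matrix(data = 1:6, nrow = 2, ncol = 3)                # default: filled by column
by_row <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled by row

by_col[1, ]  # first row is 1 3 5
by_row[1, ]  # first row is 1 2 3

dim(by_row)  # number of rows, then number of columns: 2 3
```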

Indexing matrices works similarly to indexing vectors via the [] notation. The difference is that matrices are two-dimensional objects, so we need to specify not just one location (in one dimension) but two, like so: some_object[row-location, column-location].


x <- matrix(data = 1:40, nrow = 4, ncol = 10, byrow = TRUE)

# returns the value stored in row 2, column 3 of matrix x

x[2,3]

# returns a sub-matrix with 2 rows (2 and 3) and 2 columns (4 and 5)

x[2:3,4:5]

# returns all of row 3

x[3, ]

# returns all of column 4

x[ ,4]
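
For reference, here is a sketch of the same indexing calls with the values they return. The last two lines go slightly beyond what we covered in class: extracting a single row or column returns a plain vector, and the drop = FALSE argument asks R to keep the result as a matrix.

```r
x <- matrix(data = 1:40, nrow = 4, ncol = 10, byrow = TRUE)

x[2, 3]       # 13 (row 2 holds 11 to 20)
x[2:3, 4:5]   # rows 2-3 of columns 4-5: 14 15 / 24 25
x[3, ]        # 21 22 23 24 25 26 27 28 29 30
x[ , 4]       # 4 14 24 34

is.matrix(x[3, ])            # FALSE: a single row comes back as a vector
dim(x[3, , drop = FALSE])    # 1 10: still a matrix with one row
```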

Data frames

We began discussing data frames (i.e., two-dimensional heterogeneous objects). These objects can most easily be constructed from atomic vectors via the data.frame() function.


# 5 vectors of different types (character, integer, logical, double, factor)

x <- letters[1:5]
y <- 1:5
z <- c(TRUE, TRUE, NA, FALSE, FALSE)
pies <- rep(x = pi, times = 5)
v <- as.factor(c("yes", "yes", "no", "yes", "no"))

dat <- data.frame(x, y, z, pies, v)

# named vectors stored in data frames can be extracted via the $ operator

dat$pies # extracts the pies vector from the data frame

dat$x[1:2] # extracts the first and second element of x stored in dat

# code below creates new named vector (called random) 
# inside data frame dat
# the vector contains 5 random normal variates

dat$random <- rnorm(n = 5, mean = 0, sd = 1)
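
Because a data frame is two-dimensional, the [row, column] notation from the matrix section works on it as well. The sketch below uses a smaller, made-up data frame; selecting rows with a logical condition (the last line) may go beyond what we did in class.

```r
# small example data frame (values made up for illustration)
dat <- data.frame(x = letters[1:3], y = 1:3, z = c(TRUE, NA, FALSE))

str(dat)          # one line per column, showing its type

dat[2, "y"]       # row 2 of column y: 2
dat[ , "x"]       # the whole column x, same as dat$x
dat[dat$y > 1, ]  # only the rows where y is greater than 1 (rows 2 and 3)
```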

More on indexing by location, both for stand-alone vectors and for vectors stored in data frames.

x <- c(1, 9, 8, 1)
y <- c(2, 0, 2, 0)

dat <- data.frame(x, y)

x[2] # returns 9

dat$x[2] # returns 9

dat$y[1+2] # returns 2

dat$y[36/9] # returns 0

New functions to explore data frames and to edit or create vectors included:

  • summary() basic summary statistics of an object
  • names() names of vectors stored in data frame
names(dat) # evaluates to x and y

names(dat)[1] <- "XXX" 

# the line above changes the name of the first 
# vector stored in dat to XXX

names(dat) # returns the new names
  • head() and tail() get the first and last few rows of a data frame, respectively
  • ifelse() basic if-else construct

The ifelse() function can be very useful to edit or create vectors.

x <- c(1, 9, 8, 1)

ifelse(test = x > 1, yes = 2, no = x) 

# the line above returns a new vector in which all elements
# of x that are greater than 1 are replaced with 2
# and those smaller than or equal to 1 are kept
# as they are (x itself is not modified)
# returns: [1] 1 2 2 1
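
A common use is to create a new vector inside a data frame based on an existing one. A minimal sketch, with a made-up column name (size) and made-up labels:

```r
dat <- data.frame(x = c(1, 9, 8, 1))

# new vector in dat: "big" where x is greater than 1, "small" otherwise
dat$size <- ifelse(test = dat$x > 1, yes = "big", no = "small")

dat$size  # "small" "big" "big" "small"
dat$x     # unchanged: 1 9 8 1
```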

The Control-Flow Construct if () {} else {}

We introduced more flexible flow control via if and else, which allow us to instruct R to do different things (i.e., evaluate different code) depending on whether some logical condition holds.

# Simple example (condition is TRUE)

if (1 + 1 == 2) {
	print("True")
} else {
	print("False")
}

# Simple example (condition is FALSE)

if (1 + 1 == 3) {
	print("True")
} else {
	print("False")
}

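# another example getting R to do a bunch of things
# (when run at the prompt, only the value of the last line of
# the branch that is executed, here sum(y), gets printed)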

if (0.5 > 1) {
	x <- rnorm(n = 10, mean = 0, sd = 1)
	x[1]
	sum(x)
} else {
	y <- runif(n = 10, min = 0, max = 1)
	y[1]
	sum(y)
}
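
When there are more than two cases, conditions can be chained with else if. This is a small extension of the construct above rather than something we covered in class; the variable temp_c, its value, and the labels are made up for illustration. Note that, unlike ifelse(), if expects a single TRUE or FALSE, not a whole vector of them.

```r
temp_c <- 25  # hypothetical value

# conditions are checked from top to bottom; the first TRUE one wins
if (temp_c < 0) {
	label <- "freezing"
} else if (temp_c < 20) {
	label <- "mild"
} else {
	label <- "warm"
}

label  # "warm"
```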

Another big concept introduced involved the creation of custom functions via the function() function.

# A function to compute the standard deviation of an arbitrary 
# numeric input vector that gives the user the option to choose between 
# the sample and population variants (the default returns the
# standard deviation of a sample)

my_sd <- function(input, population = FALSE) {

	n <- length(x = input)
	x_bar <- mean(x = input)

	if (!population) {

		# sample standard deviation: divide by n - 1
		output <- sqrt(x = sum((input - x_bar)^2)/(n - 1))
		return(output)

	} else {

		# population standard deviation: divide by n
		output <- sqrt(x = sum((input - x_bar)^2)/n)
		return(output)

	}

}

# Try it out and compare to canned sd() function

x <- rnorm(n = 10, mean = 0, sd = 1)

x

sd(x = x)

my_sd(input = x)

my_sd(input = x, population = TRUE)
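
Rather than comparing the printed numbers by eye, we can let R do the comparison. The sketch below restates my_sd() in compact form so that it runs on its own, and uses all.equal(), which checks equality up to a small floating-point tolerance. The call to set.seed() and the seed value are additions made only so the random draws are reproducible.

```r
# compact restatement of my_sd() so that this block runs on its own
my_sd <- function(input, population = FALSE) {
	n <- length(x = input)
	ss <- sum((input - mean(x = input))^2)  # sum of squared deviations
	if (population) sqrt(x = ss/n) else sqrt(x = ss/(n - 1))
}

set.seed(1)  # arbitrary seed, only so the draws are reproducible
x <- rnorm(n = 10, mean = 0, sd = 1)
n <- length(x = x)

# TRUE if the two results agree up to floating-point tolerance
all.equal(my_sd(input = x), sd(x = x))

# the population version equals the sample version times sqrt((n - 1)/n)
all.equal(my_sd(input = x, population = TRUE), sd(x = x) * sqrt((n - 1)/n))
```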

Below is the code for a function that converts temperatures in degrees Fahrenheit to degrees Celsius (and vice versa). Note that this function differs from the one we created in class. The version from class returned a vector of converted temperatures; this one does not return the converted values but just “cat-calls” the conversion to the screen via the cat() function. The cat() function concatenates its arguments (vectors, including character strings) into a single string and prints it to the screen. (The \n at the end of the last string forces a line break so that the prompt appears on a new line.)

convert_temp <- function(input, C.to.F = TRUE) {

	if (C.to.F) {

		output <- input * 9/5 + 32
		return(

		  cat(input, "degrees Celsius is equal to", output,
		  "degrees Fahrenheit.\n")

		)

	} else {
		
		output <- (input - 32) * 5/9
		return(

		  cat(input, "degrees Fahrenheit is equal to", output,
		  "degrees Celsius.\n")

		)

	}

}


convert_temp(input = 0, C.to.F = TRUE)
convert_temp(input = 32, C.to.F = FALSE)
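
For comparison, here is a sketch of a variant that returns the converted values instead of printing them, so the result can be stored and reused. It is not the exact code from class; the name convert_temp_value is made up to avoid overwriting convert_temp().

```r
# hypothetical variant: returns the converted temperatures as a vector
convert_temp_value <- function(input, C.to.F = TRUE) {

	if (C.to.F) {
		output <- input * 9/5 + 32
	} else {
		output <- (input - 32) * 5/9
	}

	return(output)

}

# the arithmetic is vectorized, so a whole vector can be converted at once
temps_f <- convert_temp_value(input = c(0, 100, 37))
temps_f  # 32.0 212.0 98.6

convert_temp_value(input = 212, C.to.F = FALSE)  # 100
```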

Lists

As a last thing we briefly discussed lists. Lists are one-dimensional heterogeneous storage containers. Whereas data frames are two-dimensional storage containers for vectors, lists can store vectors, matrices, and data frames.


# two vectors

x <- 1:10 
y <- x^2

# one data-frame

df <- data.frame(x, y)

# one matrix

my_matrix <- matrix(data = 1:12, nrow = 3, ncol = 4)

# all of the above and more added to a list

my_list <- list(x, y, df, my_matrix, letters, pi, 9001)

# We have a list of length 7. Seven things have been put into
# our list. Individual elements in the list can be accessed or 
# extracted using the double-square-bracket notation [[item]]

my_list[[7]] # returns the seventh item in the list which is
	     # 9001

my_list[[2]] # returns the second item in the list which is 
	     # the vector y

my_list[[3]]$y[10] # returns the 10th element of the vector y
 		   # stored in our data-frame df, stored in the
 		   # third spot in our list

my_list[[4]][1:2, 3] # what about this?
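
List elements can also be given names, in which case they can be extracted with $ just like the vectors in a data frame. We may not have gotten to this in class, so treat the following as a sketch; the list and element names are made up.

```r
# a list with named elements
my_named_list <- list(numbers = 1:10, squares = (1:10)^2, constant = pi)

length(my_named_list)        # 3
names(my_named_list)         # "numbers" "squares" "constant"

my_named_list$squares[3]     # 9, the third element of squares
my_named_list[["constant"]]  # same as my_named_list$constant
```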