ggplot2 ...

The code we produced over the last week can be downloaded and reviewed here: Code-Day15.R and Code-Day16.R.

ggplot2 - data and geoms

Over the last week we covered the ggplot2 package and its role in producing data visualizations, graphs, and figures. We began with the notion that a data visualization consists, at a minimum, of three components: data, geometric objects (geoms), and a coordinate system.

In the context of the ggplot2 package these three components constitute layers that we iteratively add to produce a plot or graph.

the ggplot() function

The ggplot() function takes one important argument, data, and constitutes the base layer of our plot.

library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
y <- sin(x)
z <- ifelse(y > 0, "positive", "negative")

z <- as.factor(z)

my_data <- data.frame(x, y, z)

ggplot(data = my_data)

The above code generates an empty plot. It initiates a canvas, if you will.
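To see that the base layer really is just a canvas plus data, the plot can be stored in a variable and inspected instead of printed. This is a minimal sketch; the variable name p is hypothetical, and it assumes that the plot's internals can be read with $, which is an implementation detail that may differ between ggplot2 versions.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x = x, y = sin(x))

# Store the base layer in a variable instead of drawing it right away
p <- ggplot(data = my_data)

inherits(p, "ggplot")  # check that p is a ggplot object
length(p$layers)       # number of geom layers added so far
nrow(p$data)           # the data frame travels with the plot object
```

No geoms have been added yet, so the list of layers is empty even though the data are already attached.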

Let’s add a geometric idea … how about a scatter plot. To set geometric ideas we use one of the many geom_() functions. There exist many such as:

One Variable:

Two Variables:

Scatterplots are basically points so here we’d use geom_point() to instruct R to draw a scatterplot.

ggplot(data = my_data) + geom_point()

The problem with the above code is that R does not know which components of the data are supposed to be mapped to which aesthetic attributes of the points in our scatterplot. In fact, R stops with an error (not merely a warning):

Error: geom_point requires the following missing aesthetics: x, y

All geometric ideas or geoms require some kind of mapping from data to an aesthetic attribute. Points have many aesthetic attributes such as:

Importantly one of their aesthetic characteristics is the location on our canvas: here in terms of x and y coordinates.

To map data to these aesthetic attributes of our geometric ideas we use the aes() function. The aes() function can be used inside of the ggplot() function or inside of the geom_point() function.

ggplot(data = my_data) + 
	geom_point(aes(x = x, y = y))
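The two placements of aes() mentioned above can be compared side by side. The sketch below assumes the same sine data as before; the names p_global and p_local are hypothetical. A mapping given to ggplot() is inherited by every geom, whereas a mapping given to a geom applies to that layer only.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x = x, y = sin(x))

# Mapping set once in ggplot(): inherited by both geoms
p_global <- ggplot(data = my_data, aes(x = x, y = y)) +
	geom_point() +
	geom_line()

# Mapping set per layer: each geom has to be told about x and y
p_local <- ggplot(data = my_data) +
	geom_point(aes(x = x, y = y)) +
	geom_line(aes(x = x, y = y))
```

Both objects draw the same picture. The first form saves typing when all layers share a mapping; the second is handy when layers need different mappings.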




mapping data or constants to aesthetic attributes

To change the size of all of our dots or points we set aesthetic attributes to constants. Importantly we have to do so outside of the aes() function but inside of geom_point():

ggplot(data = my_data) + 
	geom_point(aes(x = x, y = y), size = 4)




If we want the aesthetic attributes to vary with the data (i.e., map the aesthetic property to data) we need to do so inside of the aes() function. Let’s map the variable z to the color aesthetic. Let’s also set the opacity to a constant by means of the alpha aesthetic.

ggplot(data = my_data) + 
	geom_point(aes(x = x, y = y, color = z), size = 4, alpha = 0.4) 
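A common slip is to put a constant inside of aes(). The sketch below contrasts the two cases under the same sine data; p_constant and p_mapped are hypothetical names, and layer_data() is used only to look at the colors ggplot2 ends up drawing.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x = x, y = sin(x))

# Constant outside of aes(): every point is drawn in blue, no legend
p_constant <- ggplot(data = my_data) +
	geom_point(aes(x = x, y = y), color = "blue")

# Same string inside of aes(): "blue" is treated as a data value with a
# single level, so the default discrete color scale picks the color
# and a legend is added
p_mapped <- ggplot(data = my_data) +
	geom_point(aes(x = x, y = y, color = "blue"))

unique(layer_data(p_constant)$colour)  # the color actually drawn
unique(layer_data(p_mapped)$colour)    # not "blue"
```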




coordinate systems

The final component of a graph or visualization in the context of ggplot2 is the coordinate system. If not explicitly added to a plot as a layer, ggplot will default to the Cartesian coordinate system. For laughs let’s change the default and set our plot to the polar coordinate system.

ggplot(data = my_data) + 
	geom_point(
		aes(x = x, y = y, color = z),
		size = 4, 
		alpha = 0.4
	) +
	coord_polar(theta = "y")
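The default can be verified by looking at the coordinate system stored in a plot object. This is a sketch only: p_default and p_polar are hypothetical names, and reading $coordinates relies on ggplot2 internals that may change between versions.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x = x, y = sin(x))

# No coordinate layer added: ggplot2 falls back to its default
p_default <- ggplot(data = my_data) +
	geom_point(aes(x = x, y = y))

# Adding coord_polar() replaces the default coordinate system
p_polar <- p_default + coord_polar(theta = "y")

inherits(p_default$coordinates, "CoordCartesian")
inherits(p_polar$coordinates, "CoordPolar")
```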




additional layers

Additional layers can be added to a plot. For example, you may want to add another geometric component: say lines in addition to points. Let’s also revert to the Cartesian coordinate system and explicitly set it by way of the coord_cartesian() function. In the code below do note that some aesthetic features are set to constants outside of the aes() function while others are mapped to data inside of it.

ggplot(data = my_data) + 
	geom_line(
		aes(x = x, y = y), 
		color = "blue",
		size = 1,  # ggplot2 >= 3.4.0 prefers linewidth = 1 for lines
		linetype = "solid"
	) +
	geom_point(
		aes(x = x, y = y, color = z),
		size = 4, 
		alpha = 0.4
	) +
	geom_hline(yintercept = 0, color = "red") +
	coord_cartesian()
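Because a plot is an ordinary R object, the layers above can also be added one step at a time. The sketch below rebuilds the same figure incrementally; the variable name p is hypothetical, and counting layers with length(p$layers) relies on ggplot2 internals.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
y <- sin(x)
z <- as.factor(ifelse(y > 0, "positive", "negative"))
my_data <- data.frame(x, y, z)

# Start with the empty canvas and add one layer per step
p <- ggplot(data = my_data)
p <- p + geom_line(aes(x = x, y = y), color = "blue")
p <- p + geom_point(aes(x = x, y = y, color = z), size = 4, alpha = 0.4)
p <- p + geom_hline(yintercept = 0, color = "red")
p <- p + coord_cartesian()

length(p$layers)  # only the three geoms count as layers here

# Typing p (or print(p)) at the console draws the plot
```

Layers are drawn in the order they are added, so the points sit on top of the line and the horizontal reference line sits on top of both.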




beautification

axis labels, titles, subtitles, captions

Features such as titles or axis labels are added by means of layers as well. Consider the layer labs() which allows us to set labels for the x and y axes, as well as add a title, a subtitle, and a caption.

ggplot(data = my_data) + 
	geom_line(
		aes(x = x, y = y), 
		color = "blue",
		size = 1,
		linetype = "solid"
	) +
	geom_point(
		aes(x = x, y = y, color = z),
		size = 4, 
		alpha = 0.4
	) +
	geom_hline(yintercept = 0, color = "red") +
	labs(
		x = "X",
		y = "Sine of X",
		title = "The Sine of X",
		subtitle = "Not the Cosine of X",
		caption = "This figure was created in R."
	) +
	coord_cartesian()
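labs() also accepts the names of other mapped aesthetics, which gives a quick way to rename a legend. A minimal sketch, assuming the same sine data; the object name p is hypothetical.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
y <- sin(x)
z <- as.factor(ifelse(y > 0, "positive", "negative"))
my_data <- data.frame(x, y, z)

p <- ggplot(data = my_data) +
	geom_point(aes(x = x, y = y, color = z), size = 4, alpha = 0.4) +
	labs(
		x = "X",
		y = "Sine of X",
		color = "Sign"  # title of the color legend
	)
```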




scales

All aesthetic attributes are associated with a scale. You can think of x and y being scaled either linearly and continuously, or discretely, or in some kind of transformation – say a logarithmic scale. The same is true for other aesthetic attributes. Colors could vary discretely or smoothly and continuously. Let’s manipulate the x scale as well as the color scale. The most common scale functions you will encounter are:

ggplot(data = my_data) + 
	geom_line(
		aes(x = x, y = y), 
		color = "blue",
		size = 1,
		linetype = "solid"
	) +
	geom_point(
		aes(x = x, y = y, color = z),
		size = 4, 
		alpha = 0.4
	) +
	geom_hline(yintercept = 0, color = "red") +
	labs(
		x = "X",
		y = "Sine of X",
		title = "The Sine of X",
		subtitle = "Not the Cosine of X",
		caption = "This figure was created in R."
	) +
	coord_cartesian() +
	scale_x_continuous(
		breaks = seq(from = -5, to = 5, by = 1)
	) + 
	scale_color_manual(
		name = "Sign", 
		values = c("red", "green"), 
		labels = c("negative sine", "positive sine")
	)
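The example above uses a discrete color scale. For contrast, here is a sketch of a color that varies smoothly and continuously: y itself is mapped to color and scale_color_gradient2() blends from one color through a midpoint to another. The color choices and the object name p are assumptions made for illustration.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x = x, y = sin(x))

p <- ggplot(data = my_data) +
	geom_point(aes(x = x, y = y, color = y), size = 4) +
	scale_x_continuous(
		breaks = seq(from = -5, to = 5, by = 1)
	) +
	scale_color_gradient2(
		name = "Sine of X",
		low = "red",       # color at the most negative values
		mid = "grey80",    # color at the midpoint
		high = "green",    # color at the most positive values
		midpoint = 0
	)
```

Because y is numeric rather than a factor, the legend becomes a color bar instead of a list of discrete keys.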




a brief word on color

A number of colors are predefined in R. See R-Colors.pdf for a listing of them all. If you’d like to express a color not found on this list you can define it yourself as one of over 16 million combinations (256 × 256 × 256) of red, green, and blue, where each channel ranges from 0 to 255 and is written as a two-digit number in the hexadecimal number system.

Black, which is zero parts red, zero parts green, and zero parts blue, would be expressed as "#000000" in hexadecimal notation. White, which is 255 parts each of red, green, and blue, would be expressed as "#ffffff". To verify, you could let R convert 255 to hexadecimal notation via as.hexmode(255). The translation to hexadecimal notation can be automated via the rgb() function.

x <- rnorm(n = 100, mean = 0, sd = 1)
y <- rnorm(n = 100, mean = 0, sd = 1)

dat <- data.frame(x, y)

# maxColorValue spelled out in full rather than relying on partial matching of "max"
my_mystery_color <- rgb(red = 16, green = 128, blue = 64, maxColorValue = 255)

ggplot(data = dat, aes(x = x, y = y)) +
	geom_point(color = my_mystery_color, size = 5)
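The hexadecimal translation described above can be checked directly at the console. The sketch below uses only base R; the names black_hex and white_hex are hypothetical. Note that rgb() returns upper-case hex digits, which R treats the same as lower-case ones.

```r
# Black and white, built from their red, green, and blue parts
black_hex <- rgb(red = 0, green = 0, blue = 0, maxColorValue = 255)
white_hex <- rgb(red = 255, green = 255, blue = 255, maxColorValue = 255)
black_hex  # "#000000"
white_hex  # "#FFFFFF"

# The mystery color from above
my_mystery_color <- rgb(red = 16, green = 128, blue = 64, maxColorValue = 255)
my_mystery_color  # "#108040"

# Each channel converted on its own: 16 -> "10", 128 -> "80", 64 -> "40"
format(as.hexmode(c(16, 128, 64)), width = 2)

# col2rgb() goes the other way, from hex notation back to the three parts
col2rgb(my_mystery_color)

# Number of distinct colors with 256 levels per channel
256^3  # 16777216
```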




I really like these

library(ggthemes)

black <- "#073642"
blue <- "#268bd2"
cyan <- "#2aa198"
green <- "#859900"
magenta <- "#d33682"
orange <- "#cb4b16"
red <- "#dc322f"
violet <- "#6c71c4"
white <- "#eee8d5"
yellow <- "#b58900"

my_colors_values <- c(black, blue, cyan, green, magenta, 
    orange, red, violet, white, yellow)
my_colors_names <- c("black", "blue", "cyan", "green", "magenta",
    "orange", "red", "violet", "white", "yellow")

y <- rnorm(1000)
x <- runif(1000, 1, 100)
z <- rep(my_colors_names, each = 100)

dat <- data.frame(x, y, z)

ggplot(data = dat, aes(x = x, y = y, color = z)) +
	geom_point(size = 12, alpha = 0.1) +
	geom_point(size = 6, alpha = 0.3) +
	geom_point(size = 3, alpha = 0.6) +
	geom_point(size = 1, alpha = 1) +
	geom_smooth(alpha = 0, span = 0.2, size = 1.5) +
	scale_x_continuous(breaks = c(2, 5, seq(0, 100, 10))) +
	scale_y_continuous(breaks = seq(-5, 5, 0.5)) +
	scale_color_manual(name = "My Colors", 
            values = my_colors_values) +
	coord_trans(x = "log") +
	theme_solarized_2() +
	ggtitle("Confetti!")
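The confetti plot works because the values in z happen to sort alphabetically in the same order as my_colors_values. A slightly more defensive variant, sketched below, passes scale_color_manual() a named vector so that each value of z is matched to its color by name rather than by position. The name my_colors is hypothetical, the vector is deliberately shuffled to show that order no longer matters, and the theme and smoother are left out to keep the example short.

```r
library(ggplot2)

# Named vector: names are the data values, elements are the hex colors
my_colors <- c(
	yellow = "#b58900", red = "#dc322f", black = "#073642",
	blue = "#268bd2", cyan = "#2aa198", green = "#859900",
	magenta = "#d33682", orange = "#cb4b16", violet = "#6c71c4",
	white = "#eee8d5"
)

dat <- data.frame(
	x = runif(1000, 1, 100),
	y = rnorm(1000),
	z = rep(names(my_colors), each = 100)
)

p <- ggplot(data = dat, aes(x = x, y = y, color = z)) +
	geom_point(size = 3, alpha = 0.6) +
	scale_color_manual(name = "My Colors", values = my_colors)
```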