# ggplot2 ...

29 Mar 2022

## ggplot2 - data and geoms

Last week we covered the ggplot2 package and its role in producing data visualizations, graphs, and figures. We began with the notion that a data visualization consists at a minimum of three components:

- data
- some geometric idea or concept to visualize/represent the data
- a coordinate system

In the context of the ggplot2 package, these three components constitute layers that we iteratively add to produce a plot or graph.

### the `ggplot()` function

The function takes one important argument, `data`, and constitutes the base layer of our plot.

```
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
y <- sin(x)
z <- ifelse(y > 0, "positive", "negative")
z <- as.factor(z)
my_data <- data.frame(x, y, z)
ggplot(data = my_data)
```

The above code generates an empty plot. It initiates a canvas, if you will.

Let’s add a geometric idea … how about a scatter plot. To set geometric ideas we use one of the many `geom_*()` functions, such as:

**One Variable:**

- `geom_histogram()`
- `geom_density()`
- `geom_bar()`

**Two Variables:**

- `geom_point()`
- `geom_line()`
- `geom_boxplot()`

Scatterplots are basically points, so here we’d use `geom_point()` to instruct R to draw a scatterplot.

```
ggplot(data = my_data) + geom_point()
```

The problem with the above code is that R does not know which components of the data are supposed to be mapped to which aesthetic attributes of the points in our scatterplot. In fact R produces an error:

```
Error: geom_point requires the following missing aesthetics: x, y
```

All geometric ideas or **geoms** require some kind of mapping from data to an aesthetic attribute. Points have many aesthetic attributes such as:

- `color`
- `size`
- `shape`
- `alpha` (opacity)

Importantly one of their aesthetic characteristics is the location on our canvas: here in terms of x and y coordinates.

To map data to these aesthetic attributes of our geometric ideas we use the `aes()` function. The `aes()` function can be used inside of the `ggplot()` function or inside of the `geom_point()` function.

```
ggplot(data = my_data) +
  geom_point(aes(x = x, y = y))
```
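Because `aes()` can live in either place, the two calls below should draw the same scatterplot. This is a minimal sketch that rebuilds `my_data` so it runs on its own; the object names `p_local` and `p_global` are made up for illustration. Mappings given to `ggplot()` are inherited by every geom added afterwards, which saves typing once a plot has several layers.

```r
library(ggplot2)

# Rebuild the example data so this block is self-contained
x <- seq(from = -5, to = 5, by = 0.1)
y <- sin(x)
my_data <- data.frame(x, y)

# Mapping declared inside the geom: applies to this layer only
p_local <- ggplot(data = my_data) +
  geom_point(aes(x = x, y = y))

# Mapping declared inside ggplot(): inherited by all layers added later
p_global <- ggplot(data = my_data, aes(x = x, y = y)) +
  geom_point()

# Compare the coordinates that each plot hands to geom_point()
all.equal(layer_data(p_local)[, c("x", "y")],
          layer_data(p_global)[, c("x", "y")])
```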

### mapping data or constants to aesthetic attributes

To change the size of *all* of our dots or points we set aesthetic attributes to constants. Importantly we have to do so outside of the `aes()` function but inside of `geom_point()`:

```
ggplot(data = my_data) +
  geom_point(aes(x = x, y = y), size = 4)
```

If we want the aesthetic attributes to vary with the data (i.e., map the aesthetic property to data) we need to do so inside of the `aes()` function. Let’s map the variable `z` to the `color` aesthetic. Let’s also set the opacity to a constant by means of the `alpha` aesthetic.

```
ggplot(data = my_data) +
  geom_point(aes(x = x, y = y, color = z), size = 4, alpha = 0.4)
```
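A common slip is to put a constant inside `aes()`. The sketch below contrasts the two placements; it rebuilds a small version of `my_data`, and the names `p_constant` and `p_mapped` are only illustrative. Inside `aes()`, the string `"blue"` is treated as a one-level variable rather than as a color, so ggplot2 assigns a color from its default palette and adds a legend.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x, y = sin(x))

# Constant set outside aes(): every point is drawn in blue
p_constant <- ggplot(data = my_data) +
  geom_point(aes(x = x, y = y), color = "blue")

# Constant placed inside aes(): "blue" is treated as data, so the
# points get a default palette color and a legend appears
p_mapped <- ggplot(data = my_data) +
  geom_point(aes(x = x, y = y, color = "blue"))

unique(layer_data(p_constant)$colour)  # the literal color we asked for
unique(layer_data(p_mapped)$colour)    # a palette color, not "blue"
```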

### coordinate systems

The final component of a graph or visualization in the context of ggplot2 is the coordinate system. If not explicitly added to a plot as a layer, ggplot2 will default to the Cartesian coordinate system. For laughs let’s change the default and set our plot to the polar coordinate system.

```
ggplot(data = my_data) +
  geom_point(
    aes(x = x, y = y, color = z),
    size = 4,
    alpha = 0.4
  ) +
  coord_polar(theta = "y")
```

### additional layers

Additional layers can be added to a plot. For example, you may want to add another geometric component: say lines in addition to points. Let’s also revert to the Cartesian coordinate system and explicitly set it by way of the `coord_cartesian()` function. In the code below, note that some aesthetic attributes are set to constants outside of `aes()` while others are mapped to data inside of the `aes()` function.

```
ggplot(data = my_data) +
  geom_line(
    aes(x = x, y = y),
    color = "blue",
    size = 1,
    linetype = "solid"
  ) +
  geom_point(
    aes(x = x, y = y, color = z),
    size = 4,
    alpha = 0.4
  ) +
  geom_hline(yintercept = 0, color = "red") +
  coord_cartesian()
```
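`coord_cartesian()` also accepts `xlim` and `ylim` arguments, which the Congress script further down relies on. The sketch below, with illustrative object names, contrasts zooming via the coordinate system with setting limits on the scale: the former keeps all data and only changes the visible window, while the latter turns out-of-range values into missing values before anything is drawn.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
my_data <- data.frame(x, y = sin(x))

base_plot <- ggplot(data = my_data, aes(x = x, y = y)) +
  geom_point()

# Zoom: all 101 points are kept, only the visible window changes
p_zoom <- base_plot + coord_cartesian(xlim = c(-2, 2))

# Scale limits: points outside the limits become missing values
# (ggplot2 warns about this when the plot is drawn)
p_limits <- base_plot + scale_x_continuous(limits = c(-2, 2))

sum(!is.na(layer_data(p_zoom)$x))    # points retained when zooming
sum(!is.na(layer_data(p_limits)$x))  # fewer points survive scale limits
```

The difference matters most for layers that compute something from the data, such as `geom_smooth()`, because dropped points no longer enter the computation.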

### beautification

#### axis labels, titles, subtitles, captions

Features such as titles or axis labels are added by means of layers as well. Consider the layer `labs()`, which allows us to set labels for the `x` and `y` axes, as well as add a `title`, a `subtitle`, and a `caption`.

```
ggplot(data = my_data) +
  geom_line(
    aes(x = x, y = y),
    color = "blue",
    size = 1,
    linetype = "solid"
  ) +
  geom_point(
    aes(x = x, y = y, color = z),
    size = 4,
    alpha = 0.4
  ) +
  geom_hline(yintercept = 0, color = "red") +
  labs(
    x = "X",
    y = "Sine of X",
    title = "The Sine of X",
    subtitle = "Not the Cosine of X",
    caption = "This figure was created in R."
  ) +
  coord_cartesian()
```

#### scales

*All* aesthetic attributes are associated with a scale. You can think of x and y being scaled either linearly and continuously, or discretely, or in some kind of transformation – say a logarithmic scale. The same is true for other aesthetic attributes. Colors could vary discretely or smoothly and continuously. Let’s manipulate the x scale as well as the color scale. The most common scale functions you will encounter are:

- `scale_*_continuous()`: map continuous data values to aesthetic attributes (e.g., `scale_x_continuous()`)
- `scale_*_discrete()`: map discrete data values such as factors to aesthetic attributes (e.g., `scale_y_discrete()`)
- `scale_*_manual()`: map discrete values to manually chosen aesthetic attributes (e.g., `scale_color_manual()`)

```
ggplot(data = my_data) +
  geom_line(
    aes(x = x, y = y),
    color = "blue",
    size = 1,
    linetype = "solid"
  ) +
  geom_point(
    aes(x = x, y = y, color = z),
    size = 4,
    alpha = 0.4
  ) +
  geom_hline(yintercept = 0, color = "red") +
  labs(
    x = "X",
    y = "Sine of X",
    title = "The Sine of X",
    subtitle = "Not the Cosine of X",
    caption = "This figure was created in R."
  ) +
  coord_cartesian() +
  scale_x_continuous(
    breaks = seq(from = -5, to = 5, by = 1)
  ) +
  scale_color_manual(
    name = "Sign",
    values = c("red", "green"),
    labels = c("negative sine", "positive sine")
  )
```
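In the code above, the unnamed `values` are matched to the levels of `z` in order, so `"red"` lands on `"negative"` only because that level sorts first. A slightly more defensive variant, sketched here with the illustrative name `sign_colors`, passes a named vector so that each color is tied to its level by name.

```r
library(ggplot2)

x <- seq(from = -5, to = 5, by = 0.1)
y <- sin(x)
z <- as.factor(ifelse(y > 0, "positive", "negative"))
my_data <- data.frame(x, y, z)

# Factor levels are sorted alphabetically by default
levels(my_data$z)  # "negative" "positive"

# Named values are matched by level name, so their order no longer matters
sign_colors <- c(positive = "green", negative = "red")

p_named <- ggplot(data = my_data) +
  geom_point(aes(x = x, y = y, color = z), size = 4, alpha = 0.4) +
  scale_color_manual(name = "Sign", values = sign_colors)

p_named
```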

### a brief word on color

A number of colors are predefined in R. See R-Colors.pdf for a listing of them all. If you’d like to express a color not found on this list, you can define it yourself as one of over 16.7 million (256 × 256 × 256) combinations of red, green, and blue, where each channel ranges from 0 to 255 and is written in the hexadecimal number system.

Black, which is zero parts red, zero parts green, and zero parts blue, would be expressed as `"#000000"` in hexadecimal notation. White, which is 255 parts each of red, green, and blue, would be expressed as `"#ffffff"`. To verify, you could let R convert `255` to hexadecimal notation via `as.hexmode(255)`. The translation to hexadecimal notation can be automated via the `rgb()` function.

```
x <- rnorm(n = 100, mean = 0, sd = 1)
y <- rnorm(n = 100, mean = 0, sd = 1)
dat <- data.frame(x, y)
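# maxColorValue is spelled out here; the shorter `max` only works through partial argument matching
my_mystery_color <- rgb(red = 16, green = 128, blue = 64, maxColorValue = 255)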
ggplot(data = dat, aes(x = x, y = y)) +
geom_point(color = my_mystery_color, size = 5)
```
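The arithmetic behind the hexadecimal notation can be checked in base R alone. The short sketch below uses illustrative variable names (`hex_black`, `hex_white`, `hex_mystery`) and converts in both directions; note that `rgb()` returns upper-case hex digits, which R treats the same as lower-case ones.

```r
# 256 levels per channel give 256^3 distinct colors
256^3  # 16777216

# Decimal 255 is "ff" in hexadecimal
format(as.hexmode(255))  # "ff"

# rgb() assembles the three channels into a hex string
hex_black   <- rgb(red = 0,   green = 0,   blue = 0,   maxColorValue = 255)
hex_white   <- rgb(red = 255, green = 255, blue = 255, maxColorValue = 255)
hex_mystery <- rgb(red = 16,  green = 128, blue = 64,  maxColorValue = 255)

hex_black    # "#000000"
hex_white    # "#FFFFFF"
hex_mystery  # "#108040"

# col2rgb() goes the other way: hex string back to decimal channels
col2rgb(hex_mystery)
```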

### Try to replicate this figure in Excel …

```
library(ggthemes)
black <- "#073642"
blue <- "#268bd2"
cyan <- "#2aa198"
green <- "#859900"
magenta <- "#d33682"
orange <- "#cb4b16"
red <- "#dc322f"
violet <- "#6c71c4"
white <- "#eee8d5"
yellow <- "#b58900"
my_colors_values <- c(black, blue, cyan, green, magenta,
                      orange, red, violet, white, yellow)
my_colors_names <- c("black", "blue", "cyan", "green", "magenta",
                     "orange", "red", "violet", "white", "yellow")
y <- rnorm(n = 1000, mean = 0, sd = 1)
x <- runif(n = 1000, min = 1, max = 100)
z <- rep(my_colors_names, each = 100)
dat <- data.frame(x, y, z)
ggplot(data = dat, aes(x = x, y = y, color = z)) +
  geom_point(size = 12, alpha = 0.1) +
  geom_point(size = 6, alpha = 0.3) +
  geom_point(size = 3, alpha = 0.6) +
  geom_point(size = 1, alpha = 1) +
  geom_smooth(alpha = 0, span = 0.2, size = 1.5) +
  scale_x_continuous(
    breaks = c(2, 5, seq(from = 0, to = 100, by = 10))
  ) +
  scale_y_continuous(
    breaks = seq(from = -5, to = 5, by = 0.5)
  ) +
  scale_color_manual(
    name = "My Colors",
    values = my_colors_values
  ) +
  coord_trans(x = "log") +
  theme_solarized_2() +
  ggtitle("Confetti!")
```

Below find the code we produced in class last week as well as our snazzy animation.

```
# ------------------------------------------------------- #
# installing packages and loading libraries
# ------------------------------------------------------- #
# install.packages("tidyverse")
# install.packages("ggthemes")
# install.packages("R.utils")
# install.packages("gganimate")
# install.packages("gifski")
library(ggplot2)
library(ggthemes)
library(scales)
library(tidyr)
library(R.utils)
library(gganimate)
library(gifski)
# ------------------------------------------------------- #
# Congress - DW-Nominate data wrangling
# ------------------------------------------------------- #
cong <- read.csv("Congress.csv")
cong$party <- ifelse(cong$party_code == 100,
  yes = "Democrat",
  no = ifelse(cong$party_code == 200,
    yes = "Republican",
    no = "other"
  )
)
HR <- subset(cong, chamber == "House")
year <- seq(from = 1789, to = 2021, by = 2)
df <- data.frame(
year,
session = NA,
n_rep = NA,
n_dem = NA,
n_all = NA,
mean_rep = NA,
mean_dem = NA,
mean_all = NA,
sd_rep = NA,
sd_dem = NA,
sd_all = NA
)
for (i in 1:117) {
  tmp.rep <- subset(HR, congress == i & party == "Republican")
  tmp.dem <- subset(HR, congress == i & party == "Democrat")
  tmp.all <- subset(HR, congress == i & party != "other")
  df$session[i] <- i
  df$n_rep[i] <- dim(tmp.rep)[1]
  df$n_dem[i] <- dim(tmp.dem)[1]
  df$n_all[i] <- dim(tmp.all)[1]
  df$mean_rep[i] <- mean(tmp.rep$nominate_dim1, na.rm = TRUE)
  df$mean_dem[i] <- mean(tmp.dem$nominate_dim1, na.rm = TRUE)
  df$mean_all[i] <- mean(tmp.all$nominate_dim1, na.rm = TRUE)
  df$sd_rep[i] <- sd(tmp.rep$nominate_dim1, na.rm = TRUE)
  df$sd_dem[i] <- sd(tmp.dem$nominate_dim1, na.rm = TRUE)
  df$sd_all[i] <- sd(tmp.all$nominate_dim1, na.rm = TRUE)
}
df <- pivot_longer(df,
cols = -c(year,session),
names_sep = "_",
names_to = c(".value", "party")
)
df <- data.frame(df)
df$se <- df$sd/sqrt(df$n)
my_red <- rgb(red = 220, green = 50, blue = 47, alpha = 255, maxColorValue = 255)
my_blue <- rgb(red = 38, green = 139, blue = 210, alpha = 255, maxColorValue = 255)
my_black <- rgb(red = 0, green = 43, blue = 54, alpha = 255, maxColorValue = 255)
my_yellow <- "#b58900"
pres <- data.frame(
Name = c("Reagan\ntakes office", "Obama\ntakes office"),
Party = c("rep", "dem"),
Year = c(1981, 2009)
)
# ------------------------------------------------------- #
# Congress Polarization Plot
# ------------------------------------------------------- #
ggplot(data = subset(df, year > 1865)) +
coord_cartesian(xlim = c(1865, 2021), ylim = c(-1, 1)) +
geom_line(aes(x = year, y = mean, color = party)) +
labs(x = "Year", y = "Average Ideology", color = "Party", fill = "Party", title = "House of Representatives: 1865 to present") +
scale_x_continuous(breaks = seq(from = 1865, to = 2021, by = 10)) +
scale_y_continuous(breaks = seq(from = -1, to = 1, by = 0.25 )) +
scale_color_manual(values = c(my_black, my_blue, my_red), breaks = c("all", "dem", "rep")) +
scale_fill_manual(values = c(my_black, my_blue, my_red), breaks = c("all", "dem", "rep")) +
geom_ribbon(aes(x = year, ymin = mean - 2 * se,
ymax = mean + 2 * se, fill = party), alpha = 0.25) +
geom_vline(data = pres, aes(xintercept = Year), color = c(my_red, my_blue)) +
annotate("text", x = pres$Year, y = 0.75, label = pres$Name, color = c(my_red, my_blue), hjust = 1.1) +
theme_solarized()
# ------------------------------------------------------- #
# saving a plot via pdf() see ?png for alternatives
# ------------------------------------------------------- #
pdf(file = "Polarization.pdf", width = 10, height = 6)
ggplot(data = subset(df, year > 1865)) +
coord_cartesian(xlim = c(1865, 2021), ylim = c(-1, 1)) +
geom_line(aes(x = year, y = mean, color = party)) +
labs(x = "Year", y = "Average Ideology", color = "Party", fill = "Party", title = "House of Representatives: 1865 to present") +
scale_x_continuous(breaks = seq(from = 1865, to = 2021, by = 10)) +
scale_y_continuous(breaks = seq(from = -1, to = 1, by = 0.25 )) +
scale_color_manual(values = c(my_black, my_blue, my_red), breaks = c("all", "dem", "rep")) +
scale_fill_manual(values = c(my_black, my_blue, my_red), breaks = c("all", "dem", "rep")) +
geom_ribbon(aes(x = year, ymin = mean - 2 * se,
ymax = mean + 2 * se, fill = party), alpha = 0.25) +
geom_vline(data = pres, aes(xintercept = Year), color = c(my_red, my_blue)) +
annotate("text", x = pres$Year, y = 0.75, label = pres$Name, color = c(my_red, my_blue), hjust = 1.1) +
theme_solarized()
dev.off()
# ------------------------------------------------------- #
# Histogram inside for-loop
# ------------------------------------------------------- #
Year <- seq(from = 1789, to = 2021, by = 2)
for(i in 1:117) {
title <- paste("Session:", i, "-- Year:", Year[i])
p <-ggplot(data = subset(HR, congress == i)) +
coord_cartesian(xlim = c(-1,1), ylim = c(0, 50)) +
geom_histogram(aes(x = nominate_dim1, fill = party, color = party), alpha = 0.3, position = "dodge", binwidth = 0.05) +
scale_color_manual(values = c(my_black, my_blue, my_red), breaks = c("other", "Democrat", "Republican")) +
scale_fill_manual(values = c(my_black, my_blue, my_red), breaks = c("other", "Democrat", "Republican")) +
geom_vline(aes(xintercept = median(nominate_dim1, na.rm = TRUE)), color = "red", linetype = "solid") +
geom_vline(xintercept = 0, color = "black", linetype = "dashed") +
labs(y = "Number of Representatives", x = "Average Ideology", color = "Party", fill = "Party", title = title) +
scale_x_continuous(breaks = seq(from = -1, to = 1, by = 0.2)) +
theme_solarized()
print(p)
Sys.sleep(time = 2)
}
# ------------------------------------------------------- #
# Scatterplot inside for-loop
# ------------------------------------------------------- #
Year <- seq(from = 1789, to = 2021, by = 2)
for (i in 40:117) {
plot_i <- ggplot(data = subset(HR, congress == i)) +
geom_point(aes(x = nominate_dim1, y = nominate_dim2, color = party)) +
labs(title = paste("Session:", i, "-- Year:", Year[i])) +
coord_cartesian(xlim = c(-1, 1), ylim = c(-1,1)) +
scale_colour_manual(values = c(my_black, my_red, my_blue), drop = FALSE, breaks = c("other", "Republican", "Democrat"), name = "Party")
print(plot_i)
Sys.sleep(time = 1)
}
# ------------------------------------------------------- #
# Animate via gganimate
# ------------------------------------------------------- #
# you may have to install a renderer for this to work
# install.packages("gifski")
myplot <- ggplot(data = HR, aes(x = nominate_dim1, y = nominate_dim2, color = party)) +
geom_point() +
labs(title = "Session: {closest_state}", x = "Ideology (Dimension I)", y = "Ideology (Dimension II)") +
coord_cartesian(xlim = c(-1, 1), ylim = c(-1,1)) +
scale_colour_manual(values = c(my_yellow, my_red, my_blue), drop = FALSE, breaks = c("other", "Republican", "Democrat"), name = "Party") +
transition_states(congress, transition_length = 117, state_length = 1) +
theme_solarized_2(light = FALSE) +
enter_fade() +
exit_fade()
animate(myplot, nframes = 234)
anim_save("foo.gif", animation = last_animation())
```
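The `pivot_longer()` step in the script above is the least obvious part of the data wrangling, so here is a toy version of it. The data frame `toy` and its numbers are invented purely for illustration; only the column naming pattern (`n_rep`, `mean_dem`, and so on) mirrors the summary table `df`. The special name `".value"` tells tidyr that the first piece of each column name should become an output column, while the second piece is stored in `party`.

```r
library(tidyr)

# Hypothetical two-row summary in the same wide layout as `df`
toy <- data.frame(
  year     = c(1789, 1791),
  n_rep    = c(10, 12),
  n_dem    = c(20, 22),
  mean_rep = c(0.3, 0.4),
  mean_dem = c(-0.3, -0.2)
)

# "n_rep" is split at "_": the first piece ("n") names the output
# column (".value"), the second piece ("rep") is stored in `party`
toy_long <- pivot_longer(
  toy,
  cols = -year,
  names_sep = "_",
  names_to = c(".value", "party")
)

toy_long  # one row per year and party, with columns n and mean
```

With the data in this long layout, a single `aes(color = party)` mapping is enough to draw one line per party, which is what the polarization plot does.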