Lecture 9: Structuring ggplot2

Nick Huntington-Klein

13 June, 2022

Structuring ggplot

Last time we went through the basics of the grammar of graphics
And how to put together a ggplot graph:
Data, aesthetics, geometry

Today!

We’ll talk about different ways of changing the structure of your ggplot
Scales and coordinates, labels and legends, new geometries
There are infinite possibilities here, we can’t cover everything
The goal is to reduce unknown unknowns so you know where to look/Google
Also, just start typing in RStudio and see what autocompletes!

The Key Pieces

A common structure for a ggplot() command with a fair amount of customization might be:

ggplot(data, aes(x = xvar, y = yvar, color = colorvar)) + 
  geom_mygeometry(otheraesthetic = value) + 
  scale_aesthetic_type(labels = something) + 
  labs(x = 'My X Var', y = 'My Y Var', title = 'My Title') + 
  guides(color = 'none')

Scales

Every axis (aesthetic element) has a scale
This determines how a value in the data (1, 2, 3) turns into a value on the graph (100 pixels to the right, 200 pixels, 300 pixels; or “red”, “blue”, “green”)
These scales are fully manipulable and label-able!
scale_axisname_type

Discrete and Continuous Scales

For example:
scale_x_continuous for a continuous x-axis
scale_x_discrete for a discrete one
Or similarly scale_y_continuous, scale_color_continuous, scale_color_gradient, scale_fill_discrete, and so on and so on

Discrete and Continuous Scales

Use to:
Set colors, set limits (want the axis to extend beyond its values? limits)
Name the scale
Label values (for dicrete values, or perhaps as percents?)

Setting Values Manually

scale_x_manual for discrete plots will let you set things by hand
For example if x is different companies and you want to pick each company’s brand color for its bar

Examples

data <- tibble(category = c('Apple','Banana','Carrot','Apple','Banana','Carrot'),
               person = c('Me','Me','Me','You','You','You'),
               quality = c(.06,.04,.03,.01,.06,.03))
ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge')

Setting Scales

library(scales)
ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_manual(values = c('Apple'='red','Banana'='yellow','Carrot'='orange'))

Setting Scales

limits in the scale function can be used to specify the vector of values to be included, or as a range for continuous variables
labels is also handy. labels = c('Red Apple', 'Yellow Banana','Orange Carrot') would relabel the legend (or axis labels)
In a continuous function, say for dollars, scale_x_continuous(labels = scales::label_dollar()) would put it in dollar terms. More on this in a moment
Or dates: scale_x_date(labels = function(x) paste(month.abb[month(x)],year(x)))

Setting Color or Fill Scales

Selecting a color scale is important - we’ve discussed discrete and continuous palettes
There are a bunch of scale_color_/scale_fill_ functions that solely exist to help with this!
(not to mention entire packages like paletteer)

Setting Color or Fill Scales

Especially useful are:

scale_color_gradient() for gradient scales (or _gradient2() for diverging scales with a “middle” in them), scale_color_viridis() also has some great gradient scales (either discrete or continuous!)
scale_color_brewer()/scale_fill_brewer() functions for discrete values, or _distiller() for continuous values, or _fermenter() for binned

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_brewer(palette = 'Dark2')

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, group = category, fill = quality)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_viridis_c()

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, group = category, fill = quality)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_gradient2(midpoint = .03)

Transformations

We can transform values as we plot them
Many scale_something_continuous entries have a trans option, set to date, log, probability, reciprocal, sqrt, reverse, etc. etc. to perform that transformation before plotting
You can also do things like scale_x_log10 or scale_y_reverse for some common transforms
scale_something_binned() takes a continuous value and puts it in bins, handy sometimes for simplification

Transformations

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + 
  geom_point()

Transformations

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + 
  geom_point() + 
  scale_x_log10() + 
  scale_y_continuous(trans='reverse') + 
  scale_color_binned()

Log Scales

When to use log scales?

When data is highly skewed, so a few huge observations are drawing all focus
When the relationship is multiplicative so you want to show, say, percentage growth

Labeling Scales for Continuous Variables

Scale functions have a labels= option.

USE IT!!
This is the quickest and easiest way to make your graph look a zillion times better and more professional
We already covered this for categorical variables. labels = c('A','B') gives the categories the names A and B, in that order (tip: check the order first)
For continuous variables, the scales package helps us immensely here

scales

Two main types of functions in scales:

Transformation functions like dollar(): dollar(10) creates $10 (NOTE: handy sometimes in RMarkdown text! Also note this creates text, not numbers, so don’t use them in aes() unless you want the variable to be a string)
Labeling functions like label_dollar() designed to slot directly into the labels= argument. scale_y_continuous(labels = label_dollar()) turns all your y-axis labels into the dollar equivalent

scales

ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1))

Scales

The label_ functions have lots of options! You can set the accuracy (precision), decide how to break up big numbers (big.mark) or scale things down to, say, thousands! (scale=1/1000, suffix = 'k')

data(gapminder, package = 'gapminder')
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + 
  scale_x_log10(labels = label_dollar(accuracy = 1, scale = 1/1000, suffix = 'k'))

Documentation

There are a zillion options
Be sure to use help(whatever) before using something
Not sure what the function is? Check the ggplot2 documentation page
Or just start typing scales_ or whatever and see what pops up in autocomplete in RStudio

Labels with labs()

Please label your axes and graphs!!
Label legends too by naming their axes
Captions and subcaptions available too
Annotation and labeling data points we’ll do next time

All Kinds of Labels

data(mtcars)
mtcars <- mtcars %>% mutate(CarName = row.names(mtcars))
ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight',
       title = 'Title', subtitle = 'Subtitle', caption = 'Caption')

Moving the Legend

Working with ggplot2 legends is one of the trickier things
we’ll do a lot more with aesthetic details next time but for now let’s just move it and outline it.

Moving the Legend

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight',
       title = 'Title', subtitle = 'Subtitle', caption = 'Caption') +
  theme(legend.position = c(.9,.3), legend.box.background = element_rect(color='black'))

Or Removing It

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  guides(color = 'none')

Overlaid Graphs

In addition to adding chart elements, we can add whole new geometries over the top
Some stack naturally (adding a geom_smooth best fit line over top, for example)
For others we may have to change the data or aesthetic as we go

Overlaid Graphs

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  geom_smooth(method='lm', se = FALSE)

Overlaid Graphs

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  geom_text_repel(data = mtcars %>% slice(1:5),aes(label = CarName),hjust=-1)

Facets

Separate graphs by group!

ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point() + 
  facet_wrap('cyl') + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', title = 'Horsepower vs. MPG by Cylinders')

Multiple-Graph Notes

For overlaying the same type of graph, say two line graphs for two variables, don’t + different geometries, you’re almost always better off doing a pivot_longer() first and just using aes().
For sticking together multiple completely disparate graphs, don’t use facets, instead load patchwork and combine them as desired - we’ll discuss this later

Imitation and Flattery

Pick one of these graphs or from the previous lecture
Go into the geometry’s help file and recreate an example, or the graph in these slides
Mess with the scales and labels! Try to get it to work!

Lecture 9: Structuring ggplot2

Nick Huntington-Klein

13 June, 2022

Structuring ggplot

Today!

The Key Pieces

Scales

Discrete and Continuous Scales

Discrete and Continuous Scales

Setting Values Manually

Examples

Setting Scales

Setting Scales

Setting Color or Fill Scales

Setting Color or Fill Scales

Setting Color or Fill Scales

Setting Color or Fill Scales

Setting Color or Fill Scales

Transformations

Transformations

Transformations

Log Scales

Labeling Scales for Continuous Variables

scales

scales

Scales

Documentation

Labels with labs()

All Kinds of Labels

Moving the Legend

Moving the Legend

Or Removing It

Overlaid Graphs

Overlaid Graphs

Overlaid Graphs

Facets

ggforce’s facet_zoom

Multiple-Graph Notes

Imitation and Flattery