Lecture 9: Structuring ggplot2

Nick Huntington-Klein

13 June, 2022

Structuring ggplot

  • Last time we went through the basics of the grammar of graphics
  • And how to put together a ggplot graph:
  • Data, aesthetics, geometry

Today!

  • We’ll talk about different ways of changing the structure of your ggplot
  • Scales and coordinates, labels and legends, new geometries
  • There are infinite possibilities here, we can’t cover everything
  • The goal is to reduce unknown unknowns so you know where to look/Google
  • Also, just start typing in RStudio and see what autocompletes!

The Key Pieces

A common structure for a ggplot() command with a fair amount of customization might be:

ggplot(data, aes(x = xvar, y = yvar, color = colorvar)) + 
  geom_mygeometry(otheraesthetic = value) + 
  scale_aesthetic_type(labels = something) + 
  labs(x = 'My X Var', y = 'My Y Var', title = 'My Title') + 
  guides(color = 'none')

Scales

  • Every axis (aesthetic element) has a scale
  • This determines how a value in the data (1, 2, 3) turns into a value on the graph (100 pixels to the right, 200 pixels, 300 pixels; or “red”, “blue”, “green”)
  • These scales are fully manipulable and label-able!
  • Scale functions are named scale_aesthetic_type, for example scale_x_continuous

Discrete and Continuous Scales

  • For example:
  • scale_x_continuous for a continuous x-axis
  • scale_x_discrete for a discrete one
  • Or similarly scale_y_continuous, scale_color_continuous, scale_color_gradient, scale_fill_discrete, and so on and so on

Discrete and Continuous Scales

  • Use to:
  • Set colors, set limits (want the axis to extend beyond its values? limits)
  • Name the scale
  • Label values (for discrete values, or perhaps as percents?)

Setting Values Manually

  • scale_aesthetic_manual (scale_fill_manual, scale_color_manual, and so on) for discrete aesthetics will let you set values by hand
  • For example if fill is different companies and you want to pick each company’s brand color for its bar
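As a minimal sketch of the brand-color idea: the data frame `sales`, the company names, the revenue numbers, and the color choices below are all invented for illustration. The named vector passed to scale_fill_manual() ties each company to its color.

```r
library(ggplot2)

# Hypothetical data: company names and revenue figures are made up
sales <- data.frame(company = c('Alpha', 'Beta', 'Gamma'),
                    revenue = c(12, 7, 9))

# Named vector: the names must match the values in the data,
# the colors themselves are arbitrary picks
brand_colors <- c('Alpha' = 'firebrick', 'Beta' = 'steelblue', 'Gamma' = 'darkgreen')

p_brand <- ggplot(sales, aes(x = company, y = revenue, fill = company)) +
  geom_col() +
  scale_fill_manual(values = brand_colors)
p_brand
```

Using a named vector (rather than an unnamed one) means the colors stay attached to the right company even if the order of the categories changes.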

Examples

library(tidyverse)
data <- tibble(category = c('Apple','Banana','Carrot','Apple','Banana','Carrot'),
               person = c('Me','Me','Me','You','You','You'),
               quality = c(.06,.04,.03,.01,.06,.03))
ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge')

Setting Scales

library(scales)
ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_manual(values = c('Apple'='red','Banana'='yellow','Carrot'='orange'))

Setting Scales

  • limits in the scale function can be used to specify the vector of values to be included, or as a range for continuous variables
  • labels is also handy. labels = c('Red Apple', 'Yellow Banana','Orange Carrot') would relabel the legend (or axis labels)
  • In a continuous function, say for dollars, scale_x_continuous(labels = scales::label_dollar()) would put it in dollar terms. More on this in a moment
  • Or dates: scale_x_date(labels = function(x) paste(month.abb[month(x)], year(x))), where month() and year() come from lubridate
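Putting the date and dollar labels together might look like the sketch below. The `prices` data frame and its values are invented for illustration, and `month_year` is just a helper name chosen here.

```r
library(ggplot2)
library(lubridate)  # month() and year()
library(scales)     # label_dollar()

# Hypothetical monthly prices, made up for this example
prices <- data.frame(date = as.Date(c('2022-01-01', '2022-02-01',
                                      '2022-03-01', '2022-04-01')),
                     price = c(1200, 1350, 1280, 1500))

# Labeling function: turns a vector of dates into strings like "Jan 2022"
month_year <- function(x) paste(month.abb[month(x)], year(x))

p_prices <- ggplot(prices, aes(x = date, y = price)) +
  geom_line() +
  scale_x_date(labels = month_year) +
  # limits as a range: extend the y-axis beyond the data
  scale_y_continuous(labels = label_dollar(), limits = c(0, 2000))
p_prices
```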

Setting Color or Fill Scales

  • Selecting a color scale is important - we’ve discussed discrete and continuous palettes
  • There are a bunch of scale_color_/scale_fill_ functions that solely exist to help with this!
  • (not to mention entire packages like paletteer)

Setting Color or Fill Scales

Especially useful are:

  • scale_color_gradient() for gradient scales (or _gradient2() for diverging scales with a “middle” in them). scale_color_viridis_c() and scale_color_viridis_d() also offer some great palettes (continuous and discrete, respectively)
  • scale_color_brewer()/scale_fill_brewer() functions for discrete values, or _distiller() for continuous values, or _fermenter() for binned
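To see the continuous and binned ColorBrewer variants side by side, here is a small sketch using mtcars. The 'Blues' palette is an arbitrary choice.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) +
  geom_point()

# Continuous ColorBrewer gradient
p_distiller <- base_plot + scale_color_distiller(palette = 'Blues')

# Same palette, but weight is grouped into bins first
p_fermenter <- base_plot + scale_color_fermenter(palette = 'Blues')

p_distiller
p_fermenter
```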

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_brewer(palette = 'Dark2')

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, group = category, fill = quality)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_viridis_c()

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, group = category, fill = quality)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_gradient2(midpoint = .03)

Transformations

  • We can transform values as we plot them
  • Many scale_something_continuous functions have a trans option (renamed transform in ggplot2 3.5.0 and later), set to date, log, probability, reciprocal, sqrt, reverse, etc. to perform that transformation before plotting
  • You can also do things like scale_x_log10 or scale_y_reverse for some common transforms
  • scale_something_binned() takes a continuous value and puts it in bins, handy sometimes for simplification

Transformations

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + 
  geom_point()

Transformations

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + 
  geom_point() + 
  scale_x_log10() + 
  scale_y_continuous(trans='reverse') + 
  scale_color_binned()

Log Scales

When to use log scales?

  • When data is highly skewed, so a few huge observations are drawing all focus
  • When the relationship is multiplicative so you want to show, say, percentage growth
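A quick way to see the multiplicative case is a made-up series that grows 10% every period (the `growth` data below is invented for illustration). On a linear axis it curves upward; on a log axis it becomes a straight line, because equal percentage growth means equal steps in log terms.

```r
library(ggplot2)

# Hypothetical series: starts at 100 and grows 10% per period
growth <- data.frame(period = 0:30)
growth$value <- 100 * 1.10 ^ growth$period

# Linear y-axis: the line bends upward
p_linear <- ggplot(growth, aes(x = period, y = value)) + geom_line()

# Log y-axis: constant percentage growth shows up as a straight line
p_log <- p_linear + scale_y_log10()

p_linear
p_log
```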

Labeling Scales for Continuous Variables

Scale functions have a labels= option.

  • USE IT!!
  • This is the quickest and easiest way to make your graph look a zillion times better and more professional
  • We already covered this for categorical variables. labels = c('A','B') gives the categories the names A and B, in that order (tip: check the order first)
  • For continuous variables, the scales package helps us immensely here

scales

Two main types of functions in scales:

  • Transformation functions like dollar(): dollar(10) creates $10 (NOTE: handy sometimes in RMarkdown text! Also note this creates text, not numbers, so don’t use them in aes() unless you want the variable to be a string)
  • Labeling functions like label_dollar() designed to slot directly into the labels= argument. scale_y_continuous(labels = label_dollar()) turns all your y-axis labels into the dollar equivalent
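The difference between the two types is easiest to see in the console. A short sketch (the name `fmt` is just chosen for this example):

```r
library(scales)

# dollar() formats numbers directly and returns text, not numbers
dollar(10)
is.character(dollar(10))

# label_dollar() returns a *function* that does the formatting,
# which is why it can be handed to labels= in a scale
fmt <- label_dollar()
is.function(fmt)
fmt(c(10, 2500))
```

Because label_dollar() gives back a function, ggplot2 can call it on whatever axis breaks it ends up choosing.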

scales

ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1))

scales

The label_ functions have lots of options! You can set the accuracy (precision), decide how to break up big numbers (big.mark) or scale things down to, say, thousands! (scale=1/1000, suffix = 'k')

data(gapminder, package = 'gapminder')
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + 
  scale_x_log10(labels = label_dollar(accuracy = 1, scale = 1/1000, suffix = 'k'))

Documentation

  • There are a zillion options
  • Be sure to use help(whatever) before using something
  • Not sure what the function is? Check the ggplot2 documentation page
  • Or just start typing scale_ or label_ or whatever and see what pops up in autocomplete in RStudio
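If you are not in RStudio, one rough substitute for autocomplete is to list a package's functions by name pattern. The sketch below does this for the fill scales in ggplot2; `fill_scales` is just a name chosen here.

```r
library(ggplot2)

# Every exported ggplot2 function whose name starts with scale_fill_
fill_scales <- ls('package:ggplot2', pattern = '^scale_fill_')
head(fill_scales)

# Then open the help page for whichever one looks promising
# (run interactively):
# help(scale_fill_brewer)
```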

Labels with labs()

  • Please label your axes and graphs!!
  • Label legends too by naming their aesthetics (e.g. color = 'Car Weight')
  • Titles, subtitles, and captions available too
  • Annotation and labeling data points we’ll do next time

All Kinds of Labels

data(mtcars)
mtcars <- mtcars %>% mutate(CarName = row.names(mtcars))
ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight',
       title = 'Title', subtitle = 'Subtitle', caption = 'Caption')

Moving the Legend

  • Working with ggplot2 legends is one of the trickier things
  • We’ll do a lot more with aesthetic details next time, but for now let’s just move it and outline it.

Moving the Legend

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight',
       title = 'Title', subtitle = 'Subtitle', caption = 'Caption') +
  theme(legend.position = c(.9,.3), legend.box.background = element_rect(color='black'))

Or Removing It

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  guides(color = 'none')

Overlaid Graphs

  • In addition to adding chart elements, we can add whole new geometries over the top
  • Some stack naturally (adding a geom_smooth best fit line over top, for example)
  • For others we may have to change the data or aesthetic as we go

Overlaid Graphs

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  geom_smooth(method='lm', se = FALSE)

Overlaid Graphs

library(ggrepel)
ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  geom_text_repel(data = mtcars %>% slice(1:5),aes(label = CarName),hjust=-1)

Facets

  • Separate graphs by group!

ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point() + 
  facet_wrap('cyl') + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', title = 'Horsepower vs. MPG by Cylinders')

ggforce’s facet_zoom

library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  facet_zoom(x = Species == 'versicolor')

Multiple-Graph Notes

  • For overlaying the same type of graph, say two line graphs for two variables, don’t add a separate geometry for each variable. You’re almost always better off doing a pivot_longer() first and mapping the variable name to an aesthetic in aes().
  • For sticking together multiple completely disparate graphs, don’t use facets, instead load patchwork and combine them as desired - we’ll discuss this later
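The pivot_longer() approach might look like the sketch below. It uses the economics data that ships with ggplot2 and two of its columns, psavert and uempmed, picked only for illustration (they are in different units, so this is about the mechanics rather than a good graph).

```r
library(ggplot2)
library(tidyr)

# Reshape so both variables share one value column,
# with a 'series' column saying which variable each row came from
econ_long <- pivot_longer(economics, cols = c(psavert, uempmed),
                          names_to = 'series', values_to = 'value')

# One geom_line(), two lines: color is mapped to the series name
p_two_lines <- ggplot(econ_long, aes(x = date, y = value, color = series)) +
  geom_line() +
  labs(x = 'Date', y = 'Value', color = 'Series')
p_two_lines
```

The payoff is that the legend, colors, and labels all come for free from the usual scale and labs() tools.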

Imitation and Flattery

  • Pick one of these graphs or from the previous lecture
  • Go into the geometry’s help file and recreate an example, or the graph in these slides
  • Mess with the scales and labels! Try to get it to work!