Lecture 9: How ggplot2 Graphs are Built and Modified

Nick Huntington-Klein

03 March, 2026

Structuring ggplot

  • Last time we went through the basics of the grammar of graphics
  • And how to put together a ggplot graph:
  • Data, aesthetics, geometry

Today!

  • We’ll talk about different ways of changing the structure of your ggplot
  • Scales and coordinates, labels and legends, new geometries
  • There are infinite possibilities here, we can’t cover everything
  • The goal is to reduce unknown unknowns so you know where to look/Google

Why?

  • If we’re going to be using AI to edit and make our graphs look really good, why learn this?
  • This will let you know what to ask for and how to be specific about asking
  • And also will help you know how to validate the code you’re seeing and be sure you know what it’s doing
  • Plus, these are the kinds of structures you’ll be looking for if you switch languages / graphing packages

The Key Pieces

A common structure for a ggplot() command with a fair amount of customization might be:

ggplot(data, aes(x = xvar, y = yvar, color = colorvar)) + 
  geom_mygeometry(otheraesthetic = value) + 
  scale_aesthetic_type(labels = something) + 
  labs(x = 'My X Var', y = 'My Y Var', title = 'My Title') + 
  guides(color = 'none')

Scales

  • Every axis (aesthetic element) has a scale
  • This determines how a value in the data (1, 2, 3) turns into a value on the graph (100 pixels to the right, 200 pixels, 300 pixels; or “red”, “blue”, “green”)
  • These scales are fully manipulable and label-able!
  • scale_axisname_type

Discrete and Continuous Scales

  • For example:
  • scale_x_continuous for a continuous x-axis
  • scale_x_discrete for a discrete one
  • Or similarly scale_y_continuous, scale_color_continuous, scale_color_gradient, scale_fill_discrete, and so on and so on

Discrete and Continuous Scales

  • Use to:
  • Set colors, set limits (want the axis to extend beyond its values? limits)
  • Name the scale
  • Label values (for discrete values, or perhaps as percents?)

Examples

# tibble(), %>%, mutate(), and ggplot() all come from the tidyverse
library(tidyverse)
data <- tibble(category = c('Apple','Banana','Carrot','Apple','Banana','Carrot'),
               person = c('Me','Me','Me','You','You','You'),
               quality = c(.06,.04,.03,.01,.06,.03))
ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge')

Setting Scales

library(scales)
ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_manual(values = c('Apple'='red','Banana'='yellow','Carrot'='orange'))

Setting Scales

  • Please customize your scales to the kind of data you’re showing
  • labels is handy. labels = c('Red Apple', 'Yellow Banana','Orange Carrot') would relabel the legend (or axis labels)
  • On a continuous scale, say for dollars, scale_x_continuous(labels = scales::label_dollar()) would put the labels in dollar terms. More on this in a moment
  • Or setting date formats: scale_x_date(date_labels = '%m/%Y')
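
A minimal sketch of the date-format option. It assumes the economics data set that ships with ggplot2, which is not used elsewhere in these slides; the object names are just for illustration.

```r
library(ggplot2)

# economics ships with ggplot2; its date column is of class Date
p_dates <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  # '%m/%Y' asks for a two-digit month, a slash, and a four-digit year
  scale_x_date(date_labels = '%m/%Y') +
  labs(x = 'Month', y = 'Unemployed (thousands)')
p_dates

# The same format codes work in format(), which is handy for previewing a label
example_label <- format(as.Date('2015-04-01'), '%m/%Y')
example_label
```

Previewing with format() first can save a few rounds of re-rendering the graph while you settle on a date format.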

Setting Color or Fill Scales

  • Selecting a color scale is important - we’ve discussed discrete and continuous palettes
  • There are a bunch of scale_color_/scale_fill_ functions that solely exist to help with this!
  • (not to mention entire packages like paletteer)
  • You might browse the palettes yourself - AI isn’t great at picking palettes to match an intent yet

Setting Color or Fill Scales

Especially useful are:

  • scale_color_gradient() for gradient scales (or _gradient2() for diverging scales with a “middle” in them). ggplot2's scale_color_viridis_c() and scale_color_viridis_d() also have some great gradient scales (continuous and discrete, respectively)
  • scale_color_brewer()/scale_fill_brewer() functions for discrete values, or _distiller() for continuous values, or _fermenter() for binned
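
To see how _distiller() and _fermenter() differ, here is a rough sketch using faithfuld, a data set bundled with ggplot2 (an assumption of this example, not something from earlier slides). The palette name 'YlGnBu' is one arbitrary ColorBrewer choice.

```r
library(ggplot2)

# faithfuld ships with ggplot2; density is a continuous variable
base_plot <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster()

# _distiller() stretches a ColorBrewer palette into a smooth continuous gradient
p_distiller <- base_plot + scale_fill_distiller(palette = 'YlGnBu')

# _fermenter() uses the same palette but cuts the values into discrete bins
p_fermenter <- base_plot + scale_fill_fermenter(palette = 'YlGnBu')

p_distiller
p_fermenter
```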

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_brewer(palette = 'Dark2')

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, group = category, fill = quality)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_viridis_c()

Setting Color or Fill Scales

ggplot(data, aes(x = person, y = quality, group = category, fill = quality)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1)) +
  scale_x_discrete(position = 'top') + 
  scale_fill_gradient2(midpoint = .03)

Transformations

  • We can transform values as we plot them
  • Many scale_something_continuous functions have a trans option (renamed transform as of ggplot2 3.5.0), set to date, log, probability, reciprocal, sqrt, reverse, etc. etc. to perform that transformation before plotting
  • Some transformations have special scale_ functions. TBH the only transformations I see frequently are scale_something_log10() or scale_something_binned()

Transformations

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + 
  geom_point()

Transformations

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + 
  geom_point() + 
  scale_x_log10() + 
  scale_y_continuous(trans='reverse') + 
  scale_color_binned()

Log Scales

When to use log scales?

  • When data is highly skewed, so a few huge observations are drawing all focus
  • When the relationship is multiplicative so you want to show, say, percentage growth
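
A quick numeric sketch of the multiplicative point, using made-up values: a series that doubles each period has ever-larger gaps on the raw scale but evenly spaced steps on a log scale.

```r
# A hypothetical series that doubles every period
sales <- 100 * 2^(0:4)   # 100, 200, 400, 800, 1600

# On the raw scale the gap between periods keeps growing
raw_gaps <- diff(sales)

# On the log scale every doubling covers the same distance
log_gaps <- diff(log10(sales))

raw_gaps
log_gaps
```

That equal spacing is why constant percentage growth shows up as a straight line on a log-scaled axis.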

scales

Two main types of functions in scales:

  • Transformation functions like dollar(): dollar(10) creates $10 (NOTE: handy sometimes in RMarkdown text! Also note this creates text, not numbers, so don’t use them in aes() unless you want the variable to be a string)
  • Labeling functions like label_dollar() designed to slot directly into the labels= argument. scale_y_continuous(labels = label_dollar()) turns all your y-axis labels into the dollar equivalent
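
The difference between the two kinds of function is easiest to see outside of a graph. A small sketch (the object names are just for illustration):

```r
library(scales)

# Direct formatting: numbers in, text out
price_text <- dollar(10)
price_text

# Labeling function: label_dollar() returns a *function*,
# which is what the labels= argument of a scale expects
dollar_labeller <- label_dollar()
axis_text <- dollar_labeller(c(10, 2500))
axis_text

# Both results are character strings, not numbers
class(price_text)
class(axis_text)
```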

scales

ggplot(data, aes(x = person, y = quality, fill = category)) + geom_col(position = 'dodge') + 
  scale_y_continuous(labels = label_percent(), limits = c(0,.1))

Scales

The label_ functions have lots of options! You can set the accuracy (precision), decide how to break up big numbers (big.mark) or scale things down to, say, thousands! (scale=1/1000, suffix = 'k')

data(gapminder, package = 'gapminder')
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + 
  scale_x_log10(labels = label_dollar(accuracy = 1, scale = 1/1000, suffix = 'k'))
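
You can try out label_ options on a few numbers before putting them in a graph. A sketch with arbitrary test values:

```r
library(scales)

# Show dollars in thousands, rounded to the nearest whole 'k'
in_thousands <- label_dollar(accuracy = 1, scale = 1/1000, suffix = 'k')
thousands_text <- in_thousands(c(1000, 25000, 110000))
thousands_text

# Break up big numbers with a comma
with_commas <- label_number(big.mark = ',')
commas_text <- with_commas(1234567)
commas_text

# Percentages with one decimal place
one_decimal <- label_percent(accuracy = 0.1)
percent_text <- one_decimal(0.064)
percent_text
```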

Documentation

  • There are a zillion options
  • Be sure to use help(whatever) before using something
  • When graphing with AI, the documentation is mostly useful for figuring out what is possible and whether we can guide the AI to implement something in a specific way

Labels with labs()

  • Please label your axes and graphs!!
  • Label legends too by naming the aesthetic they represent (for example, color = 'Car Weight')
  • Subtitles and captions are available too
  • Annotation and labeling data points we’ll do next time

All Kinds of Labels

data(mtcars)
mtcars <- mtcars %>% mutate(CarName = row.names(mtcars))
ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight',
       title = 'Title', subtitle = 'Subtitle', caption = 'Caption')

Overlaid Graphs

  • In addition to adding chart elements, we can add whole new geometries over the top
  • Some stack naturally (adding a geom_smooth best fit line over top, for example)
  • For others we may have to change the data or aesthetic as we go
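
As a sketch of changing the data as we go, a layer can be handed its own data = argument. Here the cutoff of wt > 5 is an arbitrary choice for illustration.

```r
library(ggplot2)

# A subset of the data that only the second layer will use
heavy_cars <- mtcars[mtcars$wt > 5, ]

p_overlay <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  # First layer: all cars, in gray
  geom_point(color = 'gray50') +
  # Second layer: same x and y mapping, but drawn only for the heavy cars
  geom_point(data = heavy_cars, color = 'red', size = 3)
p_overlay
```

The second layer inherits the x and y mapping from ggplot() and only swaps out the data.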

Overlaid Graphs

ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  geom_smooth(method='lm', se = FALSE)

Overlaid Graphs

# geom_text_repel() comes from the ggrepel package
library(ggrepel)
ggplot(mtcars, aes(x = mpg, y = hp, color = wt)) + geom_point() +
  scale_x_log10() + scale_y_reverse() + scale_color_binned() + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', color = 'Car Weight') +
  geom_text_repel(data = mtcars %>% slice(1:5),aes(label = CarName),hjust=-1)

Facets

  • Separate graphs by group!
  • For sticking together multiple completely disparate graphs, don’t use facets, instead load patchwork and combine them as desired - we’ll discuss this later

ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point() + 
  facet_wrap('cyl') + 
  labs(x = 'Miles per Gallon', y = 'Horsepower', title = 'Horsepower vs. MPG by Cylinders')

ggforce’s facet_zoom

library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  facet_zoom(x = Species == 'versicolor')

Imitation and Flattery

  • Pick one of these graphs or from the previous lecture
  • Go into the geometry’s help file and recreate an example, or the graph in these slides
  • Mess with the scales and labels! Try to get it to work!

Make it Look Good

  • Let’s talk about how to set aesthetic attributes along every axis
  • As well as overall styling and theming
  • Highlighting, putting graphs together, making them interactive, annotating…
  • Think about this stuff. Your graphs will look unprofessional if you don’t, and I’ll start grading down all-default styling or missed opportunities for improvement.

Aesthetic Characteristics

  • As you’d expect, we’ll be setting the values of certain aesthetic properties
  • Which ones apply depend on the geometry. See this page. Some common ones:
  • color, linetype, fill, size, shape, and linewidth (which took over from size for line thickness in ggplot2 3.4.0)
  • Text aesthetics: family (font), fontface, hjust, vjust (see extrafont package to use all your system fonts)

Decorating Axes vs. Decorating Geometries

  • When a characteristic is set inside of an aes(), it becomes an axis and must be mapped to a variable name
  • Outside of an aes() (and in the geometry), it’s a setting applied to the entire geometry
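
A minimal side-by-side sketch of the distinction, using mtcars (the object names are just for illustration):

```r
library(ggplot2)

# Inside aes(): color is mapped to a variable, so it varies by cyl and gets a legend
p_mapped <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)))

# Outside aes(): color is a fixed setting, so every point is blue and there is no legend
p_fixed <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(color = 'blue')

p_mapped
p_fixed
```

A common mistake is writing aes(color = 'blue'), which treats 'blue' as a one-value variable rather than as a color setting.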

Aesthetic Characteristics

# We've seen this before
mtcars <- mtcars %>%
  mutate(Transmission = factor(am, labels = c('Automatic','Manual')))
ggplot(mtcars, aes(x = mpg, y = hp, color = Transmission)) + 
  geom_point()

Aesthetic Characteristics

ggplot(mtcars, aes(x = mpg, y = hp, color = Transmission, 
                   size = wt, shape = Transmission)) + 
  geom_point()

Aesthetic Characteristics

ggplot(mtcars, aes(x = mpg, y = hp, color = Transmission, 
                   size = wt)) + 
  geom_point(shape = 10)

Aesthetic Characteristics

ggplot(mtcars, aes(x = Transmission, fill = factor(cyl))) + 
  geom_bar(position = 'dodge', linetype = 'dashed', color = 'black')

Theming

  • Pretty much all elements of presentation that aren’t in a geometry can be controlled with theme()
  • The options are endless! See help(theme)
  • We set different aspects of the theme using element_ functions like element_text(), element_line(), element_rect() which take aesthetic settings like size, color, etc.
  • This stuff can be used on its own or in conjunction with prepackaged themes like theme_classic() or theme_minimal() or theme_void()
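
A rough sketch of combining a prepackaged theme with a few element_ tweaks. The specific sizes and choices here are arbitrary, just to show the pattern.

```r
library(ggplot2)

p_themed <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  labs(title = 'Horsepower vs. MPG') +
  # Start from a prepackaged theme...
  theme_minimal() +
  # ...then override individual pieces with element_ functions
  theme(plot.title = element_text(face = 'bold', size = 16),
        axis.text.x = element_text(face = 'bold'),
        panel.grid.minor = element_blank())
p_themed
```

Order matters: theme() should come after theme_minimal(), since a prepackaged theme added later would overwrite the individual settings.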

Axes

  • Change axis and tick lines with axis.line and element_line
  • Most everything can be changed generally (axis.line or even line) or specifically (axis.ticks.x)

library(ggalt)
mtcars2 <- mtcars %>% group_by(Transmission) %>% summarize(count = n())
ggplot(mtcars2, aes(x = Transmission,y=count)) + 
  geom_lollipop(color = 'red', size = 2) + coord_flip() + 
  labs(x = '', y = '') + 
  scale_y_continuous(breaks = c(10,20), limits = c(0,20)) + 
  theme(axis.line = element_line(color = 'red'),
        axis.ticks.x = element_line(linewidth = 5))

Background

  • Work on the panel to change what goes behind that geometry! element_rect might come up!
  • Pretty much ANY element can be eliminated with element_blank()
  • This gonna be ugly

ggplot(mtcars2, aes(x = Transmission,y=count)) + 
  geom_lollipop(color = 'red', size = 2) + coord_flip() + 
  theme(panel.background = element_rect(color = 'blue', fill = 'yellow'),
        panel.grid.major.x = element_line(color = 'green'),
        panel.grid.major.y = element_blank())

Legends

  • Think carefully about your legend!
  • We’ll leave the legend coding to AI - it’s unnecessarily complicated. But ask yourself questions like:
  • Do I need a legend?
  • Where should the legend be placed?
  • How can I make the legend easy to read?
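
Even if AI writes most legend code, two simple moves are worth recognizing when you see them. A sketch under the assumption that the legend belongs to the color aesthetic:

```r
library(ggplot2)

p_legend <- ggplot(mtcars, aes(x = mpg, y = hp, color = factor(cyl))) +
  geom_point() +
  # Naming the aesthetic in labs() titles the legend
  labs(color = 'Cylinders')

# Move the legend below the plot
p_bottom <- p_legend + theme(legend.position = 'bottom')

# Or drop it entirely, for example when the lines are labeled directly
p_no_legend <- p_legend + guides(color = 'none')

p_bottom
p_no_legend
```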

General Theming Notes

  • How can we think about theming generally?
  • We want, as always, to drive focus towards our story
  • This means not making the theme distracting, but using it to drive attention to what we want
  • Need focus on the x-axis labels? Make em bold!
  • Often we don’t need distracting background color (as there is in the default theme). Usually get rid of it!

Other Styling

  • Let’s move on to some other ways we can build focus and clarity
  • Highlighting and annotations!

Highlighting

  • Highlight certain values and gray others with the gghighlight package

library(gghighlight); data(gapminder, package = 'gapminder')
# linewidth (rather than size) sets line thickness as of ggplot2 3.4.0
ggplot(gapminder, aes(x = year, y = lifeExp, color=country)) + geom_line(linewidth = 1.5) + 
  labs(x = NULL, y = "Life Expectancy", title = "North America Only") + 
  scale_x_continuous(limits=c(1950,2015),
                     breaks = c(1950,1970,1990,2010))+
  gghighlight(country %in% c('United States','Canada','Mexico'),
              unhighlighted_params = list(linewidth = .1), 
              label_params=list(direction='y',nudge_x=10)) + 
  theme_minimal(base_family='serif') 

Highlighting

  • Or do it manually with scale_X_manual()

gapminder %>% mutate(color_name = ifelse(country %in% c('United States','Canada','Mexico'), as.character(country), 'Other')) %>%
ggplot(aes(x = year, y = lifeExp, group = country, color=color_name, linewidth = color_name, alpha = color_name)) + geom_line() + 
  geom_text(aes(label = ifelse(year == 2007 & color_name != 'Other', color_name,'')), 
                           hjust = 0, size = 13/.pt) +
  labs(x = NULL, y = "Life Expectancy", title = "North America Only") + 
  scale_x_continuous(limits=c(1950,2025),
                     breaks = c(1950,1970,1990,2010))+
  # Unnamed values are matched alphabetically: Canada, Mexico, Other, United States
  scale_color_manual(values = c('red','forestgreen','gray','blue')) + 
  # linewidth (rather than size) sets line thickness as of ggplot2 3.4.0
  scale_linewidth_manual(values = c(1.5,1.5,.1,1.5)) +
  scale_alpha_manual(values = c(1,1,.2,1)) +
  theme_minimal(base_family='serif') + 
  guides(color = 'none', linewidth = 'none', alpha = 'none')

Annotation

  • There are also many options for annotating our data
  • For direct data point annotation, especially of line graphs, you can do this with some data manipulation and geom_text (see previous slide)
  • AI should be able to put labels and arrows one by one on your graph; you may have to twiddle on your own with coordinates to make it line up properly
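
For a sense of what one-by-one annotation code looks like, here is a sketch using annotate(). The coordinates were picked by hand to point at the Maserati Bora in mtcars (mpg 15, hp 335), which is exactly the kind of twiddling you may need to do yourself.

```r
library(ggplot2)

p_annotated <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  # A text label placed at hand-picked coordinates
  annotate('text', x = 25, y = 300, label = 'Maserati Bora', hjust = 0) +
  # An arrow running from just left of the label toward the data point
  annotate('segment', x = 24.5, y = 300, xend = 15.5, yend = 333,
           arrow = arrow(length = unit(0.2, 'cm')))
p_annotated
```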

Patchwork

library(patchwork)
# Named and saved these earlier
(p1 | p2 - p3)/p4
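
A self-contained sketch of the same idea, with three stand-in plots built from mtcars. These are hypothetical, not the p1 through p4 saved earlier.

```r
library(ggplot2)
library(patchwork)

# Three stand-in plots
plot_a <- ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()
plot_b <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
plot_c <- ggplot(mtcars, aes(x = wt)) + geom_histogram(bins = 10)

# | places plots side by side, / stacks them vertically
combined <- (plot_a | plot_b) / plot_c
combined
```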

What Did We Learn?

  • Scales, setting labels, values, and transformations
  • Axis labels
  • Facets
  • Theming
  • Thinking about legends
  • Annotation and highlighting (do it!)