Lecture 8: Summarizing Data Part 2: Plots

Nick Huntington-Klein

January 29, 2019

Recap

  • Summary statistics are ways of describing the distribution of a variable
  • Examples include the mean, SD, and percentiles, especially specific percentiles of interest (median, min, max)
  • Another very good way to describe a variable is to actually show its distribution with a graph
  • I showed you plenty of graphs last time; this time I’m going to show you how to make them yourself!

Note about ggplot

  • As I mentioned before, we’re keeping things straightforward by doing our plotting in base R
  • For plotting, base R is easier to use, but its graphs don’t look as good as those from the alternative, ggplot2, which is part of the tidyverse
  • ggplot2 code for the plots will be available in the slides. Just search for ‘ggplot’
  • It has also been in most previous slides; I’ve been making most of the graphs with it!
  • Don’t forget the extra credit opportunity for learning ggplot2

Plot types we will cover:

  • Density plots (for continuous)
  • Histograms (for continuous)
  • Box plots (for continuous)
  • Bar plots (for categorical)
  • Plus:
    • Adding plots together
    • Putting lines on plots
    • Makin ’em look good!

Plots we won’t cover:

  • Lots and lots and lots of plot options
  • Mosaics, Sankey plots, pie graphs
  • Some aren’t common in Econ but could be!
  • Others are just too tricky (like maps)
  • See The R Graph Gallery

Density plots and histograms

  • Density plots and histograms will show you the full distribution of the variable
  • Values along the x-axis, and how often those values show up on y
  • The difference: a density plot draws a smooth curve by averaging over nearby values
  • A histogram instead creates “bins” and tells you how many observations fall into each
  • (don’t forget we can save plots using Export in the Plots pane)
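To see the bins-versus-smoothing difference in numbers as well as pictures, here is a minimal sketch on simulated data (the variable `x` and its mean and SD are made up for illustration). `hist(..., plot = FALSE)` returns the bin counts without drawing, `density()` returns the smooth curve, and the last lines overlay the two.

```r
# Simulated data: a made-up stand-in for a real continuous variable
set.seed(1)
x <- rnorm(500, mean = 700, sd = 20)

# Histogram: sort observations into bins and count them (plot = FALSE draws nothing)
h <- hist(x, breaks = 20, plot = FALSE)
h$counts   # number of observations in each bin

# Density: a smooth curve built by averaging over nearby observations
d <- density(x)
head(d$y)  # height of the curve at the first few x values

# Draw both on one graph: freq = FALSE puts the histogram on the density scale
hist(x, breaks = 20, freq = FALSE,
     main = "Histogram vs. Density", xlab = "Simulated Values")
lines(d)
```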

Density plots

  • To make a density plot, we take our variable and make a density() object out of it, then plot() that thing!
  • Let’s play around with data from the Ecdat package
install.packages('Ecdat')   # only needs to be run once
library(Ecdat)
data(MCAS)                  # Massachusetts test score data
plot(density(MCAS$totsc4))  # density of fourth grade total scores
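The `density()` object is more than a picture: it stores the curve and the bandwidth, which controls how wide a window of nearby values gets averaged. A minimal sketch, using R’s built-in `faithful` data (Old Faithful eruption lengths) instead of MCAS so it runs without Ecdat; the `adjust` values 0.5 and 2 are arbitrary choices for illustration.

```r
# faithful is a built-in R dataset: eruption lengths (minutes) at Old Faithful
d <- density(faithful$eruptions)
d$bw         # the bandwidth: how wide a window of nearby values is averaged
length(d$x)  # by default the curve is evaluated at 512 points

# adjust multiplies the default bandwidth: smaller is bumpier, larger is smoother
d_bumpy  <- density(faithful$eruptions, adjust = 0.5)
d_smooth <- density(faithful$eruptions, adjust = 2)
plot(d_bumpy, main = "Same Data, Different Bandwidths")
lines(d_smooth, lty = 2)  # dashed line for the smoother curve
```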

Titles and labels

  • Readability is super important in graphs
  • Add labels and titles! Titles with ‘main’ and axis labels with ‘xlab’ and ‘ylab’
plot(density(MCAS$totsc4),main='Massachusetts Test Scores',
     xlab='Fourth Grade Test Scores')
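The example above sets ‘main’ and ‘xlab’ but not ‘ylab’. A minimal sketch that sets all three, again on the built-in `faithful` data so it runs on its own (the label text is just an example):

```r
d <- density(faithful$eruptions)   # built-in Old Faithful data
plot(d,
     main = "Old Faithful Eruption Lengths",  # title
     xlab = "Eruption Length (Minutes)",      # x-axis label
     ylab = "Density")                        # y-axis label
```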

Histograms

  • Histograms require a little less typing - just hist()
hist(MCAS$totsc4)
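`hist()` does more than draw: it also returns the bins it used, which is handy for checking what the plot is showing. A minimal sketch with the built-in `faithful` data (`plot = FALSE` computes the bins without drawing anything):

```r
# waiting: minutes between Old Faithful eruptions (built-in data)
h <- hist(faithful$waiting, plot = FALSE)
h$breaks   # the bin edges R chose
h$counts   # how many observations fall in each bin
sum(h$counts) == nrow(faithful)  # every observation lands in exactly one bin
```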

Histograms

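  • These need labels too! Other important options:
  • Plot densities instead of counts with freq=FALSE (the bar areas then sum to 1), and change how many bins there are, or where they are, with breaks (a single number is only a suggestion; R picks “pretty” cut points near it)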
hist(MCAS$totsc4,xlab="Fourth Grade Test Scores",
     main="Test Score Histogram",freq=FALSE,breaks=50)
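The `breaks` option works two ways: a single number is only a suggestion, while a vector of edges sets the bins exactly. A minimal sketch on the built-in `faithful` data; the 10-minute bin width is an arbitrary choice, and the edges are computed from the data so they always cover its full range.

```r
x <- faithful$waiting   # built-in data: minutes between eruptions

# A single number is only a suggestion; R picks "pretty" edges near it
h1 <- hist(x, breaks = 50, plot = FALSE)
length(h1$counts)   # not necessarily exactly 50

# A vector of edges sets the bins exactly: here, 10-minute bins
edges <- seq(floor(min(x) / 10) * 10, ceiling(max(x) / 10) * 10, by = 10)
h2 <- hist(x, breaks = edges, freq = FALSE,
           main = "Waiting Time Histogram", xlab = "Minutes Between Eruptions")

# With freq = FALSE, bar height is a density: height times width sums to 1
sum(h2$density * diff(h2$breaks))
```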

Box plots

  • Less common in economics, but a good way to look at your data
  • And check for outliers!
  • Basically a summary table in graph form
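  • Shows the 25th, 50th, and 75th percentiles (the box and the line in the middle)
  • Plus whiskers reaching out to the most extreme observations within 1.5 IQR of the box (IQR = 75th minus 25th percentile)
  • And dots for anything outside that range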
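Here is a minimal sketch of those pieces computed by hand, on made-up data with two planted outliers (5 and 110). One caveat: `boxplot()` draws the box from Tukey’s hinges (see `fivenum()`), which can differ slightly from `quantile()`, so treat this as an approximation of what the plot shows.

```r
set.seed(2)
x <- c(rnorm(200, mean = 50, sd = 10), 5, 110)  # made-up data, two planted outliers

q   <- quantile(x, c(0.25, 0.50, 0.75))   # the box: 25th, 50th, 75th percentiles
iqr <- unname(q[3] - q[1])                # interquartile range (same as IQR(x))

# Anything more than 1.5 IQR beyond the box gets drawn as a dot
lower <- unname(q[1]) - 1.5 * iqr
upper <- unname(q[3]) + 1.5 * iqr
outliers <- x[x < lower | x > upper]
outliers

# The whiskers stop at the most extreme points still inside that range
c(min(x[x >= lower]), max(x[x <= upper]))
```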

Box plots

  • Simple!
  • Note the outliers on the low end: not necessarily a problem, but good to know!
boxplot(MCAS$totsc4,main="Box Plot of 4th Grade Scores")
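`boxplot()` also returns the numbers behind the picture, including the values it drew as dots in `$out`. A minimal sketch on made-up scores with two planted low values (standing in for low outliers like MCAS’s, not reproducing them):

```r
set.seed(3)
scores <- c(rnorm(200, mean = 700, sd = 15), 620, 630)  # made-up scores, two planted low values

b <- boxplot(scores, main = "Box Plot of Simulated Scores")
b$out                       # the values drawn as dots
which(scores %in% b$out)    # their positions in the data
```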

Box plots

  • Easy to look at multiple variables at once, too
library(dplyr)  # select() comes from dplyr, part of the tidyverse
boxplot(select(MCAS,totsc4,totsc8),main="Box Plot of 4th and 8th Grade Scores")
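If dplyr isn’t loaded, plain base R column selection works the same way. A minimal sketch on a made-up data frame (the column names `grade4` and `grade8` are hypothetical stand-ins for `totsc4` and `totsc8`); `names` relabels the boxes.

```r
set.seed(4)
# Hypothetical stand-in for MCAS: two made-up score columns
scores <- data.frame(grade4 = rnorm(100, mean = 700, sd = 15),
                     grade8 = rnorm(100, mean = 700, sd = 20))

# Base R column selection instead of dplyr's select(): one box per column
b <- boxplot(scores[, c("grade4", "grade8")],
             names = c("4th Grade", "8th Grade"),
             main = "Box Plot of 4th and 8th Grade Scores")
b$stats   # one column per variable: lower whisker, lower hinge, median, upper hinge, upper whisker
```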