- Summary statistics are ways of describing the *distribution* of a variable
- Examples include mean, SD, and percentiles, including specific percentiles of interest (median, min, max)
- Another, very good way to describe a variable is by actually showing that distribution with a graph
- I showed you plenty of graphs last time. This time I’m going to show you how to make them yourself!

- As I mentioned before, we’re going straightforward, doing our plotting with base R
- In the case of plotting, it’s going to be easier to use base R, but also not look as good as the alternative, `ggplot2`, which is also in the tidyverse package
- `ggplot2` code for plots will be available in the slides. Just search for ‘ggplot’
- And it has also been there for most previous slides - I’ve been making most graphs with it!
- Don’t forget the extra credit opportunity for learning `ggplot2`

- Density plots (for continuous)
- Histograms (for continuous)
- Box plots (for continuous)
- Bar plots (for categorical)
- Plus:
- Adding plots together
- Putting lines on plots
- Makin ’em look good!

- Lots and lots and lots of plot options
- Mosaics, Sankey plots, pie graphs
- Some aren’t common in Econ but could be!
- Others are just too tricky (like *maps*)
- See The R Graph Gallery

- Density plots and histograms will show you the full distribution of the variable
- Values along the x-axis, and how often those values show up on y
- The difference - density plots will present a smooth line by averaging nearby values
- A histogram will create “bins” and tell you how many observations fall into each
- (don’t forget we can save plots using Export in the Plots pane)
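To see the difference between the two, here is a minimal sketch that draws both on the same plot. It uses simulated scores (the `scores` vector is made up for illustration, not real data) so it runs without any packages.

```r
# Simulated test scores (hypothetical data, for illustration only)
set.seed(123)
scores <- rnorm(200, mean = 700, sd = 15)

# Histogram: sorts observations into bins
# freq = FALSE puts the bars on the same scale as the density curve
h <- hist(scores, freq = FALSE,
          main = 'Histogram vs. Density', xlab = 'Simulated Scores')

# Density: a smooth curve, added on top of the histogram with lines()
d <- density(scores)
lines(d, lwd = 2)
```

The bars and the curve trace out the same shape. The curve just smooths over the jumps between bins.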

- To make a density plot, we take our variable and make a `density()` object out of it, then `plot()` that thing!
- Let’s play around with data from the `Ecdat` package

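```
# Install once; no need to re-run this line every session
install.packages('Ecdat')
library(Ecdat)
# Load the Massachusetts test score data
data(MCAS)
# Estimate the density of 4th grade scores, then plot it
plot(density(MCAS$totsc4))
```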

- Readability is super important in graphs
- Add labels and titles! Titles with ‘main’ and axis labels with ‘xlab’ and ‘ylab’

```
plot(density(MCAS$totsc4),main='Massachusetts Test Scores',
xlab='Fourth Grade Test Scores')
```

- Histograms require a little less typing - just `hist()`

`hist(MCAS$totsc4)`

- These need labels too! Other important options:
- Show densities instead of counts (so the bar areas add up to 1) with `freq=FALSE`, or change how many bins there are, or where they are, with `breaks`

```
hist(MCAS$totsc4,xlab="Fourth Grade Test Scores",
main="Test Score Histogram",freq=FALSE,breaks=50)
```
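Here is a small sketch of what `freq=FALSE` and `breaks` actually do, again on simulated scores (the `scores` and `edges` names are just for illustration). Note that a single number for `breaks` is only a suggestion to R, while a vector of bin edges is used exactly as given.

```r
# Simulated test scores (hypothetical data, for illustration only)
set.seed(123)
scores <- rnorm(200, mean = 700, sd = 15)

# plot = FALSE returns the histogram's numbers without drawing anything
h <- hist(scores, breaks = 20, plot = FALSE)

# With freq = FALSE the bar heights are densities,
# so height times bin width adds up to 1 across all bins
total_area <- sum(h$density * diff(h$breaks))
print(total_area)

# breaks can also be a vector of exact bin edges (here, bins 5 points wide)
# The edges must cover the full range of the data
edges <- seq(floor(min(scores)), ceiling(max(scores)) + 5, by = 5)
h2 <- hist(scores, breaks = edges, freq = FALSE,
           main = 'Bins 5 Points Wide', xlab = 'Simulated Scores')
```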

- Box plots are less common in economics, but a good way to look at your data
- And check for outliers!
- Basically a summary table in graph form
- Shows the 25th, 50th, and 75th percentiles
- Plus lines (whiskers) that reach out to the most extreme values within 1.5 times the IQR (75th minus 25th) of the box
- And dots for anything outside that range

- Simple!
- Note the outliers on the low end - not necessarily a problem but good to know!

`boxplot(MCAS$totsc4,main="Box Plot of 4th Grade Scores")`
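To see where the dots come from, here is a rough sketch that works out the 1.5 IQR cutoffs by hand and compares them to what `boxplot()` flags. The data is simulated, with one low score added on purpose, and names like `lower_fence` are just for illustration.

```r
# Simulated test scores (hypothetical data), plus one very low score
set.seed(123)
scores <- c(rnorm(200, mean = 700, sd = 15), 620)

# The box: 25th, 50th, and 75th percentiles
q <- unname(quantile(scores, c(.25, .5, .75)))
iqr <- q[3] - q[1]

# Anything beyond 1.5 IQR from the box gets drawn as a dot
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr
outliers <- scores[scores < lower_fence | scores > upper_fence]

# boxplot() returns its numbers invisibly; $out holds the plotted dots
# (it uses Tukey's hinges, which can differ very slightly from quantile())
b <- boxplot(scores, main = 'Box Plot of Simulated Scores')
print(b$out)
```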

- Easy to look at multiple variables at once, too

`boxplot(select(MCAS,totsc4,totsc8),main="Box Plot of 4th and 8th Grade Scores")`
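`select()` comes from dplyr, so the line above needs the tidyverse loaded. If you want to stay in base R, a sketch of the same idea is to pick the columns with square brackets. The data frame `df` and its column names here are made up for illustration.

```r
# Simulated data frame (hypothetical columns, for illustration only)
set.seed(123)
df <- data.frame(grade4 = rnorm(200, mean = 700, sd = 15),
                 grade8 = rnorm(200, mean = 690, sd = 20),
                 other  = runif(200))

# Pick the two score columns by name; boxplot() draws one box per column
# names= sets the label under each box
b <- boxplot(df[, c('grade4', 'grade8')],
             names = c('4th Grade', '8th Grade'),
             main = 'Box Plot of 4th and 8th Grade Scores')
```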