Lecture 4: R and Quarto

Nick Huntington-Klein

25 February, 2023

Tech!

  • As we go through the course, we’re going to need to keep data visualization and communication principles close at hand
  • But it’s also unavoidable that actually using them will require some technical skill on our part
  • In this class we’ll be using R
  • Specifically, the tidyverse set of packages, and Quarto
  • I don’t expect you to become R experts. You’re not learning R so much as learning dplyr and ggplot2 and some extra details

R

  • Why R instead of Python?
  • It’s good to know more than one language (nothing ever lasts!)
  • ggplot2 is a top-tier visualization package and makes it much easier to implement our principles
  • Data-cleaning is a bit more intuitive than in pandas (and faster too if you’re using data.table but we aren’t)
  • Python > R in designing software for production, or in some machine learning apps. But we aren’t doing those!

R - The Basics

  • Let’s switch over to an RStudio walkthrough. We’ll cover:
  • Getting set up
  • The RStudio window
  • Using basic R elements
  • I’ll also show you how to set up the swirl package so you can do some walkthroughs on your own

Loading in data

  • For the most part the tidyverse package will cover all our needs
  • For loading in (and saving) data, I strongly suggest the use of the rio package
  • This package is new, so you won’t find it on a lot of Google searches for “R load CSV file”
  • But it’s super easy. Doesn’t even matter what file format it is. import('filename.csv') or .Rdata or .xlsx etc. all work immediately. import_list() can import all the sheets of an Excel workbook, or a bunch of files, all at once
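As a rough sketch of what that looks like in practice (assuming the rio package is installed), the snippet below writes two small example files to a temporary folder and reads them back. The file names and object names are made up for illustration; only import(), export(), and import_list() come from rio.

```r
library(rio)

# Hypothetical example files, written to a temp folder so this runs anywhere
csv_path <- file.path(tempdir(), "cars_example.csv")
rds_path <- file.path(tempdir(), "iris_example.rds")
export(head(mtcars), csv_path)  # format is picked from the file extension
export(head(iris), rds_path)

# Same function for either format
cars <- import(csv_path)
flowers <- import(rds_path)

# Several files at once: returns a list with one data frame per file
both <- import_list(c(csv_path, rds_path))
length(both)
```

The point is that the call looks the same no matter what the extension is, so there is no need to remember a different loading function for each file type.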

My Promise

  • There are many ways to do anything in R
  • But if I’ve shown it in class, almost certainly the way I’m showing it is the easiest, especially in the context of this class, as I’ve chosen stuff that makes sense for us specifically
  • Don’t make yourself work too hard! If you’re trying to figure something out, ask “did we cover this in class?” If so, go back to this material - it will probably be less work than whatever Google solution you find
  • Like, seriously. The number of times students spend hours and hours trying to stitch together a solution from dozens of StackExchange answers, when it’s something I’ve given a one-line solution to in class, is a lot

Quarto

  • Quarto is a text and code processing system
  • Text and code in -> Formatted output out (HTML, Word, PDF, slides, books, etc.)
  • These slides are made in Quarto. I write books and papers in it too.

Markdown

  • Markdown is a “markup language” like HTML or LaTeX. All your formatting goes in as raw text, and then you run it through an interpreter to get output
  • Markdown is super super simple
  • Let’s walk through the Markdown cheatsheet
  • Note HTML code also works!

Markdown

  • Why Markdown instead of, like, Word? or JuPyteR?
  • Can be automated. Includes code and output directly in the document, including for in-line numbers
  • Much easier to share via something like, e.g., GitHub
  • Output is in standard formats like Word or HTML, so readers don’t need any sort of setup to see it

The YAML

  • The YAML is the block of text at the top of a Quarto document that tells Quarto how to process everything below it. Here’s an example YAML:
---
title: "Untitled"
format: 
  html:
    embed-resources: true
    toc: true
editor: visual
---

Code Chunks

  • Include code in-line (fantastic for memos) with `r mean(1:3)` which makes 2
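  • In RStudio’s visual editor, start a new code chunk by typing a forward slash, /, on a blank line (in the source editor, a chunk opens with ```{r} and closes with ```). Let’s take a look at an example Quarto doc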
  • Code goes inside the chunks, writing goes outside
  • Note: want a data frame to format nicely as a table? Use knitr::kable().
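To make the in-line and kable() points concrete, here is a minimal sketch of the kind of R code that could go inside a chunk. It assumes knitr is installed; the small table and the object names are invented for illustration.

```r
library(knitr)

# The value an in-line `r mean(1:3)` would drop into your sentence
inline_value <- mean(1:3)

# A small, made-up example table built from a built-in data set
small_table <- head(mtcars[, c("mpg", "cyl", "hp")], 3)

# kable() turns the data frame into a formatted table in the rendered document
formatted <- kable(small_table, digits = 1, caption = "First three cars")
formatted
```

Run in the console, this prints a plain-text table; rendered in a Quarto document, the same call produces a properly formatted table in the output.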

Code Chunks

  • Common options to set in each code chunk, or all together in the YAML:
  • echo: Show code or not?
  • eval: Evaluate the code inside or not?
  • warning and message: Show warnings and messages from the code?
  • include: Include any output at all or not?
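One way to set these for a single chunk, sketched below, is with `#|` comment lines at the top of the chunk's code. Because they are ordinary R comments, the code also runs unchanged in the console. This assumes dplyr (part of the tidyverse) is installed; the particular option values are just an example.

```r
#| echo: false
#| warning: false
#| message: false

# With the options above, the rendered document would show the output
# of this chunk but hide the code and any warnings or messages
library(dplyr)                    # storms ships with dplyr
data(storms, package = "dplyr")

wind_summary <- summary(storms$wind)
wind_summary
```

Swapping `echo: false` for `include: false` would run the code but keep both the code and its output out of the document, which is handy for setup chunks.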

Common Mistakes

  • Let’s take a look at the Common Errors and Mistakes Cheatsheet
  • Common Quarto mistakes: forgetting to make sure that all the code needed to run from a fresh session is in the Quarto document, putting install.packages() in the document (don’t do it!), working directory issues, using the “Import Data” wizard

Let’s do it

  • Make a new Quarto HTML document
  • YAML: Turn the toc on, title it and add your name
  • Make one big section and title it (#) and two small (##)
  • In the setup chunk, load the tidyverse and load data(storms)
  • Write some text in the first small one, including a link to something
  • In the second, add a code chunk that does a summary() of storms
  • Then, some more text, with in-line code to get the mean of storms$wind.
  • Bonus: load the scales package and use number() to format the in-line text.