Lecture 1: Data Communication

Nick Huntington-Klein

04 March, 2023

Data Communication

Welcome to Data Communication!

Data Communication

This class is about

  • How to turn data into a message
  • How to cleanly communicate that message
  • Technically, how to create that communication

Media

This will largely focus on data visualization but will also cover:

  • Exploratory data analysis
  • Tables
  • Notebooks
  • Workflow
  • Data cleaning and manipulation

What We’re Doing

  • This is sort of a mutt of a class, covering coding, communications concepts, data management and cleaning, and even some analytical stuff
  • That’s because this is really a course about preparing you to go out into the real world and work with data
  • The thing I really want you to come away with after this class is this:
  • When you are working with data, think carefully about what you’re doing, and make sure you’re doing it right.
  • It really is just thinking carefully and attending to detail, not technical skill. This matters more than coding. We’ll have to code, but code can only be as good as what you’re trying to make it do.

What We’re Doing

  • You’re going to leave the world of a classroom where the project is set up so your prof knows there is a right answer, what that right answer is, and they can check your work
  • In the real world, you will be expected to produce good work on your own, which means you’ll be checking your work, and there won’t be a “wrong answer” buzzer that sounds if you make a mistake
  • I want you to build enough confidence in working with data that you can look at something the computer has spit out and think “the computer only does what it’s told. I know what this should be. This is not yet right.”

What We’re Doing

So whenever you do anything with data (load a file, create a variable, make a graph, do an analysis), ask:

  1. What am I trying to do?
  2. Did it work properly? How can I CHECK if it worked properly?
  3. If it’s a form of communication, have I made it as clear as possible? Try to find places where people might get confused.

Never think “I pushed the button/wrote the line that made the thing go. I’m done.” It’s on you to fulfill the requirement in a way that’s good. Take pride in your work, or it will be bad.

Noticing

More than anything else, this class is about noticing things. I’m serious.

  • Data analysis is full of opportunities to mess up. Mistakes are unavoidable, and results that make no sense just happen
  • Being a good analyst means noticing mistakes, and noticing what your output looks like, and taking responsibility for fixing it
  • Be careful in your work. Take time. Review.
  • That’s what I’m hoping to teach, and something I will be assessing you on

What We’re Doing

It takes no technical/coding skill, or extensive explanation from me, to see why this graph is bad. If you produce this graph, you should take it upon yourself to fix it. Ask: how can I improve this?

Admin

Let’s do some housekeeping:

  • Course website
  • Syllabus
  • Expectations and assignments

Resources

In addition to the course website and Healy’s Data Visualization:

What is Data Communication?

  • There is a lot of information in the world
  • And a lot of information at your fingertips
  • Too much
  • And so we simplify to tell the story underlying the data

What is Data Communication?

  • I have a result from my data
  • I want you to understand and believe my result
  • How can I demonstrate this result to you so you’ll understand it?
  • How can I present the data so that you understand where the result came from and why you should agree that the result is accurate?

The Map and the Territory

  • Someone asks you for directions to Dick’s on Broadway
  • Do you hand them your 3.2GB perfectly detailed shapefile of Capitol Hill?
  • The answer is in there, and much more precisely than you could possibly tell them
  • But it doesn’t really answer their question, right?

The Map and the Territory

  • The goal of a data analyst is to take that shapefile and figure out how to get to Dick’s
  • The goal of a data communicator is to take what the data analyst figured out and figure out what part of the map to show you to help you understand how to get to Dick’s
  • A good data communicator will make understanding the directions easy and obvious

Storytelling With Data

  1. Understand the context
  2. Figure out the story (what you want the reader to understand, and why)
  3. Choose an appropriate visual display
  4. Eliminate clutter
  5. Focus attention where you want it
  6. Think like a designer
  7. Tell the story

Examples Outside of Data

  • Let’s consider some examples of effective communication of information outside the narrow range of “data communication”

How Does This Faucet Work?

Understand the context

The person who designed this faucet understands, hopefully, how water pipes work and how opening a valve can allow water to flow

Figure out the story

What is important for the audience to know?

I don’t care if the user understands how pipes work, or their history of water usage.

I need the user to know how to properly turn the water on

Choose an appropriate visual display

We have a handle close to the source of the water

It implies the ways the handle can be turned - towards us or away, left and right

The display doesn’t give us any information about how those directions relate to pressure or temperature

(what version of a faucet might?)

Eliminate clutter

Nothin’ but handle

We could have other stuff here - sink stopper, an LCD with the weather report, but do we need it?

Focus attention where you want it

There’s nowhere to look but the handle (other than the spigot, not pictured)

The shape and design pushes you towards it - it calls for a hand!

Think like a designer

We want the user to understand that they can pull or rotate the handle to affect the water flow

This design affords both of those uses

And nothing else

There aren’t a lot of ways to use this wrong, other than messing up pressure vs. temperature

Tell the story

Water flow can be controlled by twisting this handle

If you were a monkey who had never experienced plumbing, it would only take you about ten seconds to follow the design to that handle, pull it, and learn about the connection between handles and water flow

Gapminder

  • Let’s move into some data
  • Gapminder (from the Gapminder Foundation) is a data set that, among other things, shows how differences between countries change over time
  • One thing it is commonly used to show is that economic development aids health development
  • GDP per capita → life expectancy
  • Also, generally, both of those things have improved over time

Gapminder

## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows

Understand the Context

How do things work here?

  • We know that life expectancy and GDP per capita go together closely

Figure out the story

What do we want people to learn?

  • GDP per capita and life expectancy go together strongly
  • Both GDP per capita and life expectancy have increased a lot over time (i.e. things on this front are getting better!)

This is useful information and I can see why someone would want to understand this for its own sake, to understand the world better (and perhaps have some actionable takeaway as a result!)

Choose an appropriate visual display

  • We want something that will show a relationship between two variables with many observations
  • NOW WE HAVE SOME OUTPUT. Put on those “how can we make it better?” goggles: the first output we get is NOT DONE
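A minimal sketch of what that first-pass display might look like in R, assuming the `ggplot2` and `gapminder` packages are installed (the `gapminder` package ships the same 1,704-row table printed above). It is deliberately the unpolished "first output," the thing to critique rather than the finished graph:

```r
library(ggplot2)
library(gapminder)

# First pass: every country-year is one point, with default styling.
# Deliberately unpolished; this is the output to critique, not the final graph.
first_pass <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

print(first_pass)
```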

Eliminate clutter

  • That’s a lot of dots! Can we tell the same story by focusing on just a few countries?
  • Also, that’s a lot of background ink…
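One way to act on both points, sketched under stated assumptions: the four countries below are an arbitrary, hypothetical selection for illustration (not the ones used in the lecture), and `theme_minimal()` is just one of ggplot2's built-in themes that drops the grey background panel:

```r
library(ggplot2)
library(gapminder)

# Hypothetical selection of a few countries, purely for illustration.
few_countries <- c("Afghanistan", "Brazil", "Japan", "Nigeria")
subset_data <- subset(gapminder, country %in% few_countries)

# Fewer dots, plus a theme with less background ink (no grey panel).
declutter <- ggplot(subset_data, aes(x = gdpPercap, y = lifeExp, color = country)) +
  geom_point() +
  theme_minimal()

print(declutter)
```

Each country appears once per survey year, so four countries give 48 points instead of 1,704.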

Focus attention where you want it

  • Those few high-GDP observations are taking up a LOT of space, while the blob on the left gets squeezed. Let’s put the x-axis on a log scale
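A sketch of the log-scale fix in R, again assuming `ggplot2` and `gapminder` are installed. `scale_x_log10()` is ggplot2's built-in log-10 axis transform; the transparency level and axis labels are illustrative choices, not taken from the lecture:

```r
library(ggplot2)
library(gapminder)

# Log-scale x axis: equal distances now mean equal *ratios* of GDP per capita,
# so the crowded low-income observations spread out and the few very rich
# observations no longer dominate the width of the plot.
logged <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")

print(logged)
```

Saying "log scale" in the axis label matters: a reader who misses it will misjudge how far apart the countries really are.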

Think like a designer

  • Why make the reader work?
  • Also, realize this graph sort of feels like it’s moving forward in time. Uh-oh…

Tell the story

  • Realize that we’ve lost the “things get better over time” angle
  • And also lost the part where we want to talk about the whole world!
  • Use what we’ve done so far to think about how we can show the dual GDP-and-life-expectancy improvements over time for everyone
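One possible sketch, not necessarily the design the lecture arrives at: keep every country and the log scale, but compare only the first and last years in the data (1952 and 2007) by color, so improvement over time shows up as one cloud of points sitting up and to the right of the other. This assumes `ggplot2` and `gapminder` are installed:

```r
library(ggplot2)
library(gapminder)

# Keep all countries, but only the first and last years in the data,
# so the change over time appears as a shift between two colored clouds.
ends <- subset(gapminder, year %in% c(1952, 2007))

over_time <- ggplot(ends, aes(x = gdpPercap, y = lifeExp, color = factor(year))) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)",
       color = "Year") +
  theme_minimal()

print(over_time)
```

Using `factor(year)` gives two distinct colors rather than a continuous gradient, which suits a two-point "before vs. after" comparison.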

What could still be improved?

Let’s See What We Can Get