Lecture 16: Intro to Tableau
Nick Huntington-Klein
23 June, 2022
Graphing Software
- A lot of this class has been spent going over
ggplot2 in R
- Today we’ll be going over Tableau
- (not to mention all we leave out - PowerBI,
matplotlib/Seaborn in Python, every
stats package other than R, automated stuff like Azure, etc. etc.)
- Why introduce these different tools, and what do they do?
Graphing Software
- It’s most important that we understand the underlying concepts of
data communication
- But also we have to know how to implement those concepts
- And since so much of this is subjective, we have to practice,
practice, practice. Working with the software is a requirement!
- Plus, stuff like learning the grammar of graphics, baked into
ggplot2, can help strengthen those concepts
Graphing Software
- R: High-power but with a learning curve. Great for any size
data sets, great for summarizing,
endless customization. Great for contrasting
groups
- If the data isn’t already in graphable format, R itself is
crucial for cleaning/formatting it
- Supports dashboards, notebooks,
interactivity, and animation
- Until you’re used to it, a lot of work to get going
though!
- Because there is direct access to the underlying pieces of
the graph, there is no better tool for making one
beautiful image
- Requires additional work for working with enormous databases
Graphing Software
- Tableau: High-power, with less learning curve, but more constrained.
Great for any size data sets, great for
summarizing, great for contrating groups
- Great at making a host of graphs at once to explore a data
set thoroughly
- It can smartly suggest for you what kind of graph to
use
- Built-in easy to work with tools for enormous
databases and dashboards
- Everything is interactive by default
- Easier than R and also less work but deep
customization a bit more difficult
The Canned-Coded Scale
- It is extremely common in software to have to choose between a
deeper, more customizable option (such as coding something in
ggplot2) or a more canned, one-click-does-it option
(like Tableau or Excel)
- Until you’re very familiar with the coding option, the canned option
will be faster and easier…
- IF the thing you want to do is something the
canned-software authors anticipated and decided to make easy
- If it’s not, it will usually be much harder than just
coding the thing up
The Canned-Coded Scale
- Tableau is a very cool tool and I encourage you to use it (as are
other more canned systems like PowerBI, Azure ,etc.)
- But don’t be surprised when you start thinking “hmm but what if my
visualization was different in THIS nonstandard way” and start Googling,
only to find it’s a fourteen-step process that doesn’t actually
work
- Using Tableau for the simple stuff, and immediately shifting to code
for anything that’s not stock-standard, is a decent strategy!
- That said, let’s explore Tableau
Tableau
- As opposed to (the way we’ve used) R, Tableau uses database
logic - it sends the calculation to the data rather
than bringing the data to the calculation
- So we’ll have “connections” to data rather than bringing data in;
also, if data updates, so will Tableau
- And the output we have will include underlying data, but only our
calculations of the data.
- Let’s walk through some usage of Tableau
Importing Data
- A connection can be “Live” (database-like) or an “Extract” (bring it
in!)
- Specify how variables should be read by clicking on them
- Tableau will guess as much as possible
- Unlike R we do filtering, selection, and recoding at THIS step
- Strongly recommended to do all data-cleaning in R BEFORE importing
to Tableau - Tableau can clean data but it’s a real drag
Importing Data
Importing Data
- NAs can make it think it’s a string, let’s tell it earnings is a
number
- Let’s group
pred_degree_awarded_ipeds
: 1 =
less-than-two-year college, 2 = two-year, 3 = four-year+
- Could also do a filter here, or only select certain variables
- Or create variables calculated from others - let’s make an
employment rate
The Tableau Screen
- Dimensions (categorical) and measures (continuous numeric) on the
left. Tableau really cares about discrete vs. continuous.
- Columns and Rows areas to drag to
- (single image export: Worksheet \(\rightarrow\) Copy image)
- “Show me” default graphs on the right. Decent for initial
exploration but generally not what you want to work with
Working with Variables
- To start making a graph, think about what you want on the x-axis
(column) or y-axis (row)
- How it treats a variable can then be manipulated: discrete
vs. continuous and measure vs. dimension. Measures get
aggregated/summarized in some specified way, dimensions don’t!
- Plenty of aggregation options for continuous variables
Working with Variables
- Also click the variables to do things like sort the order or turn
them “dual axis” (two variables on the same axis)
- In general, like Excel, click what you want to edit, like axis
titles
- Note with a graph we can find the data that goes into it
- Some variables like “Measure Names” can become their own “shelf”
like column/row for more complex diagrams
Common Problems in Tableau
- If something’s not working, check whether your variable has the
correct continuous/discrete measure/dimension settings
- Note that sometimes “grouped” variables don’t work the same as
regular categorical variables - make life easier by just making them
strings grouped as you like in R and import them
- Did you try Googling to see how you can do (nonstandard thing
X)?
Basic Example
Basic Example
Basic Example
- Let’s do this one together: what should be in rows and columns
here?
Another Example
- We can also set labels by clicking-and-labeling
- Let’s replicate the graph on the next page
And Explore
- Click around and make your own graph
- See what kinds of customizations you can do