If X is high, would we expect Y to also be high, or to be low?
Some terms:
Y with another X means predicting your Y by looking at the distribution of Y for your value of X
table(wage1$numdep,wage1$smsa,dnn=c('Num. Dependents','Lives in Metropolitan Area'))
##                Lives in Metropolitan Area
## Num. Dependents   0   1
##               0  60 192
##               1  27  78
##               2  38  61
##               3  13  32
##               4   3  13
##               5   3   4
##               6   2   0
prop.table(table(wage1$numdep,wage1$smsa,dnn=c('Num. Dependents','Lives in Metropolitan Area')),margin=2)
##                Lives in Metropolitan Area
## Num. Dependents          0          1
##               0 0.41095890 0.50526316
##               1 0.18493151 0.20526316
##               2 0.26027397 0.16052632
##               3 0.08904110 0.08421053
##               4 0.02054795 0.03421053
##               5 0.02054795 0.01052632
##               6 0.01369863 0.00000000
prop.table(table(wage1$numdep,wage1$smsa,dnn=c('Number of Dependents','Lives in Metropolitan Area')),margin=1)
##                     Lives in Metropolitan Area
## Number of Dependents         0         1
##                    0 0.2380952 0.7619048
##                    1 0.2571429 0.7428571
##                    2 0.3838384 0.6161616
##                    3 0.2888889 0.7111111
##                    4 0.1875000 0.8125000
##                    5 0.4285714 0.5714286
##                    6 1.0000000 0.0000000
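To see what the margin argument is doing, here is a minimal sketch that rebuilds the counts from the first table by hand (so it does not need the wage1 data loaded) and checks which direction sums to 1. The object names counts, by_col and by_row are just illustrative.

```r
# Counts copied from the table() output above:
# rows = number of dependents (0 to 6), columns = smsa (0 or 1)
counts <- matrix(
  c(60, 27, 38, 13, 3, 3, 2,     # smsa = 0
    192, 78, 61, 32, 13, 4, 0),  # smsa = 1
  ncol = 2,
  dimnames = list("Num. Dependents" = 0:6,
                  "Lives in Metropolitan Area" = 0:1)
)

by_col <- prop.table(counts, margin = 2)  # proportions within each column
by_row <- prop.table(counts, margin = 1)  # proportions within each row

colSums(by_col)  # every column of by_col adds up to 1
rowSums(by_row)  # every row of by_row adds up to 1
```

So margin=2 answers "among people with this smsa value, what share have each number of dependents?", while margin=1 answers "among people with this many dependents, what share live in a metropolitan area?"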
group_by() to organize the data into groups
summarize() the data within those groups
wage1 %>%
  group_by(smsa) %>%
  summarize(numdep=mean(numdep))
## # A tibble: 2 x 2
##    smsa numdep
##   <dbl>  <dbl>
## 1     0  1.24
## 2     1  0.968
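The same group means can be recovered from the cross-tab counts alone. The sketch below expands the counts back into one row per person and uses base R's tapply() as a stand-in for group_by() plus summarize(); the data frame name people is an assumption made for illustration, not part of wage1.

```r
# Counts from the earlier table: how many people have 0..6 dependents,
# separately for smsa = 0 and smsa = 1
numdep_vals <- 0:6
n_smsa0 <- c(60, 27, 38, 13, 3, 3, 2)
n_smsa1 <- c(192, 78, 61, 32, 13, 4, 0)

# Expand the counts into one row per person
people <- data.frame(
  numdep = c(rep(numdep_vals, n_smsa0), rep(numdep_vals, n_smsa1)),
  smsa   = c(rep(0, sum(n_smsa0)), rep(1, sum(n_smsa1)))
)

# Mean of numdep within each value of smsa
group_means <- tapply(people$numdep, people$smsa, mean)
round(group_means, 3)
```

The two values should line up with the tibble above: the mean of numdep is just the distribution of numdep within each smsa group boiled down to one number.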
When smsa is high, numdep tends to be low - negative correlation!
“A one-standard-deviation increase in X is associated with a correlation-standard-deviation increase in Y”
cor(wage1$numdep,wage1$smsa)
## [1] -0.09636769
cor(wage1$smsa,wage1$numdep)
## [1] -0.09636769
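As a rough check on the standard-deviation reading of correlation, the sketch below rebuilds the two variables from the cross-tab counts (so x and y here are reconstructions, not the wage1 columns themselves), computes the correlation three ways, and confirms that the order of the arguments does not matter.

```r
# Rebuild numdep (x) and smsa (y) from the cross-tab counts shown earlier
numdep_vals <- 0:6
n_smsa0 <- c(60, 27, 38, 13, 3, 3, 2)
n_smsa1 <- c(192, 78, 61, 32, 13, 4, 0)
x <- c(rep(numdep_vals, n_smsa0), rep(numdep_vals, n_smsa1))
y <- c(rep(0, sum(n_smsa0)), rep(1, sum(n_smsa1)))

# 1. Built-in correlation; swapping the arguments gives the same number
r <- cor(x, y)
r_swapped <- cor(y, x)

# 2. By hand: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# 3. Standardize both variables, then regress one on the other:
#    the slope is the change in Y (in SDs) per one-SD change in X
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
slope <- unname(coef(lm(zy ~ zx))[2])

c(r = r, r_swapped = r_swapped, r_manual = r_manual, slope = slope)
```

All four numbers should agree, which is why a one-standard-deviation increase in numdep goes with roughly a 0.096-standard-deviation decrease in smsa.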
Let’s go back to those different means:
## # A tibble: 2 x 2
##    smsa numdep
##   <dbl>  <dbl>
## 1     0  1.24
## 2     1  0.968
table(df$var1,df$var2) to look at two variables together
prop.table(table(df$var1,df$var2)) for the proportion in each cell
prop.table(table(df$var1,df$var2),margin=2) to get proportions within each column
prop.table(table(df$var1,df$var2),margin=1) to get proportions within each row
df %>% group_by(var1) %>% summarize(mean(var2)) to get mean of var2 for each value of var1
cor(df$var1,df$var2) to calculate correlation
plot(xvar,yvar) with two variables
plot(wage1$educ,wage1$wage,xlab="Years of Education",ylab="Wage")
As we look at different values of educ, what changes about the values of wage we see?
plot(wage1$educ,wage1$wage,xlab="Years of Education",ylab="Wage")
abline(-.9,.5,col='red')
plot(function(x) 5.4-.6*x+.05*(x^2),0,18,add=TRUE,col='blue')
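To compare the two overlaid shapes without looking at the plot, this small sketch evaluates both at a few education levels. The coefficients are copied from the abline() and plot() calls above (abline takes the intercept first, then the slope); how those numbers were chosen is not stated, and the names line_fit, curve_fit and educ_grid are only illustrative.

```r
# Red straight line: intercept -0.9, slope 0.5
line_fit <- function(x) -0.9 + 0.5 * x

# Blue curve: quadratic in years of education
curve_fit <- function(x) 5.4 - 0.6 * x + 0.05 * x^2

# Evaluate both shapes on a few values inside the plotted range (0 to 18)
educ_grid <- c(0, 6, 12, 18)
comparison <- data.frame(
  educ  = educ_grid,
  line  = line_fit(educ_grid),
  curve = curve_fit(educ_grid)
)
comparison
```

The line adds the same amount of wage for every extra year of education, while the curve dips slightly and then rises faster at high education levels, so the two disagree most at the extremes.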
plot(xvar,yvar) is extremely powerful, and will show you relationships at a glance
cor(wage1$wage,wage1$educ) = 0.4059033
library(Ecdat)
data(Clothing)
plot(Clothing$sales,Clothing$margin,xlab="Gross Sales",ylab="Margin")
library(Ecdat)
data(Diamond)
plot(Diamond$carat,Diamond$price,xlab="Number of Carats",ylab="Price")