Last time, we took a look at an interactive scatter plot that suggested how certain stats in college basketball might be related to winning percentage. Today, we’re going to push that further in several ways:
- We’ll explore what correlation is in more detail,
- We’ll build a chart with Observable, and
- We’ll use a different data set with more variables and a different focus.
The end product
Here’s a refined version of the chart we saw last time:
Correlation?
In statistics, the correlation between two numeric variables is a single number intended to quantify the strength of the linear relationship between those variables. It satisfies the following properties:
- The correlation is always between -1 and 1.
- The endpoints ±1 indicate a perfect linear relationship.
- A positive number indicates a positive linear relationship; thus, when one quantity increases, the other also generally increases.
- A negative number indicates a negative linear relationship; when one quantity increases, the other generally decreases.
- A number close to zero indicates a weak linear relationship.
These ideas are illustrated in the interactive figure below.
It’s worth mentioning that correlation is an example of a summary statistic: it captures some big-picture characteristic of the data in a single quantity, and it necessarily misses some details in doing so. The differences between the big picture and the minute details are illustrated nicely in this recent article from Scientific American.
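To make the definition a little more concrete, here’s a minimal sketch of how the correlation could be computed by hand in JavaScript, using the standard (Pearson) formula. The function name `correlation` and the numbers are made up for illustration; they are not from our basketball data.

```javascript
// Pearson correlation of two equal-length arrays of numbers.
// It is undefined (NaN) if either array is constant.
function correlation(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two arrays of the same length with at least 2 points");
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;

  let sxy = 0; // how x and y vary together
  let sxx = 0; // how much x varies on its own
  let syy = 0; // how much y varies on its own
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up illustrative data.
const x = [1, 2, 3, 4, 5];
const rising = correlation(x, [2, 4, 6, 8, 10]);  // points on a rising line
const falling = correlation(x, [10, 8, 6, 4, 2]); // points on a falling line
const curved = correlation(x, [4, 1, 0, 1, 4]);   // points on a parabola
```

The first two results sit at the endpoints 1 and -1. The third is 0 even though the points follow a perfect curve, which is exactly the kind of detail a single summary number can miss.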
An activity
OK, it’s your turn to build a scatter plot that looks something like the one at the top of this page. To do so:
- Log in to Observable and create a new notebook.
- Download this data and attach it to your notebook.
- Create a chart cell to generate a scatter plot.
- Customize the scatter plot.
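If you’d rather see what a scatter plot looks like in code, here’s a rough sketch using Observable Plot, the charting library that Observable notebooks make available as `Plot`. The helper name `scatterPlot` and its parameters are my own invention for illustration, and the column names you pass in depend on whatever the attached file actually contains.

```javascript
// Sketch only. Assumptions:
//   Plot    - the Observable Plot library (available by default in a notebook)
//   rows    - an array of objects, one per row of the attached data
//   xColumn - name of the column for the horizontal axis (placeholder)
//   yColumn - name of the column for the vertical axis (placeholder)
function scatterPlot(Plot, rows, xColumn, yColumn) {
  return Plot.plot({
    grid: true,               // light grid lines behind the points
    x: { label: xColumn },    // axis labels default to the column names here
    y: { label: yColumn },
    marks: [
      Plot.dot(rows, { x: xColumn, y: yColumn }) // one dot per row
    ]
  });
}
```

In a notebook, the rows would typically come from something like `FileAttachment("your-file.csv").csv({typed: true})`, where the file name is a placeholder for the file you attached, and you would then call `scatterPlot(Plot, rows, "some_column", "another_column")` in its own cell.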
An assignment
I’ll post more formal details on this soon, but I’d like you to create a new WordPress post that embeds scatter plots of this same data, one built with Datawrapper and one with Observable. I’d love to hear your preferences!