Hey folks, Before digging into this week’s data visualization, I wanted to give you all a heads-up about some learning activities I’m currently developing. First, in the next month or so I will be hosting a one-day, online workshop on the basics of For this week, I’d like to share with you a newsletter that I subscribe to from Philip Bump of the Washington Post. Each week he explores some data using a variety of visualizations and also provides critiques about visualizations he’s found online. Earlier this week his newsletter talked about the success of different seeds in the men’s NCAA basketball tournament. Further into the newsletter he described a “bivariate choropleth plot” showing the correlation between drinking and smoking levels within each county across the US. Those maps are cool. But I (and I think Bump also) find them a bit difficult to decipher. Choropleth plots suffer from the instinct to assume that counties with large land area also have many people. This is very much not true. For his take, he used data from the University of Wisconsin’s Population Health Institute (UWPHI). The UWPHI has data on many variables related to health - including smoking and drinking - for each county. Here was his version of the bivariate choropleth plot rendered as a scatter plot: The five-digit numbers aren’t zip codes, they are FIPS codes. The first two digits indicate the state and the other digits the county in the state. What stands out to you about this visualization? What correlation do you see between the two variables? What questions do you have about the relationship? About how to make the plot? What would you like to learn to implement in R? What do you already feel comfortable executing? Again, this is a scatter plot. My assumption is that when we download the data from UWPHI we’ll get a data frame that we can simplify to three columns - the FIPS code, the percent of smokers, and the percent of drinkers or the number of drinks in some time period. Once we have it in this format we can map the drinking metric to the Second, there are clearly a finite number of possible values since the data already appear to be discretized. This causes a lot of over plotting of the data. I notice that the points are different shades of green. No doubt the more intense shade is where there’s more overlap. We could pull this off by using the Third, there is a fitted line through the data. We can pull this off using Fourth, he creates four quadrants that roughly align with the bivariate nature of the choropleth plot. We could achieve this by drawing light gray horizontal and vertical lines that intersect at the median smoking and drinking levels. Then we could use Finally, around the outside of the cloud of points he includes 20 solid green points with their FIPS codes. I would implement this in three steps. First, I’d use Along the way there are other interesting things we’d want to implement in this figure. Those include removing the x and y-axis text and ticks and customizing the placement of the x and y-axis titles. A more advanced move would be to think about how we might make a function to automatically generate this type of figure for any pair of data columns in the UWPHI dataset. Let me know what interests you about this figure! I’ll be sure to work your feedback into the video when I post it to YouTube in a few weeks.
|
Hey folks! As I’m writing this newsletter the US government is in shutdown mode with no clear signs that things will get going anytime soon. I’ll withhold my own political take except to say that my family has been running without an official budget for about 25 years. I don’t recommend it, but we know basically how much money goes to our mortgage, insurance, groceries, charities, etc. and how much money we generally have left over. Somehow we still are able to spend money on living a pretty...
Hey folks! This week I have a figure for you from the New York Times based on a poll they did with Siena that describes Americans’ sentiments concerning Israel’s actions in their war with Gaza. What does it say to me? This plot is saying that more Americans think that Israel is intentionally killing civilians than they did in December 2023. The change in percentage of people in the other categories seems to decrease accordingly. What do you like? I love slope plots! I think they’re a great...
Hey folks, This week I have an interesting figure for you from the Financial Times from an e-mail newsletter they distribute each week describing some visualization related to climate change. Before reading further, go ahead and spend a few minutes with the image. What does it say to you? What do you like? What don’t you like about it? How do you think you would go about making it in R? I’d encourage you to write down any of your answers to these questions before reading what I have to say....