Hey folks,

Before digging into this week’s data visualization, I wanted to give you all a heads-up about some learning activities I’m currently developing. First, in the next month or so I will be hosting a one-day, online workshop on the basics of

For this week, I’d like to share with you a newsletter that I subscribe to from Philip Bump of the Washington Post. Each week he explores some data using a variety of visualizations and also critiques visualizations he’s found online. Earlier this week his newsletter talked about the success of different seeds in the men’s NCAA basketball tournament. Further into the newsletter he described a “bivariate choropleth plot” showing the correlation between drinking and smoking levels within each county across the US.

Those maps are cool. But I (and I think Bump also) find them a bit difficult to decipher. Choropleth plots suffer from the instinct to assume that counties with large land area also have many people. This is very much not true. For his take, he used data from the University of Wisconsin’s Population Health Institute (UWPHI). The UWPHI has data on many variables related to health - including smoking and drinking - for each county. Here was his version of the bivariate choropleth plot rendered as a scatter plot:

The five-digit numbers aren’t zip codes; they are FIPS codes. The first two digits indicate the state and the remaining three digits indicate the county within that state.

What stands out to you about this visualization? What correlation do you see between the two variables? What questions do you have about the relationship? About how to make the plot? What would you like to learn to implement in R? What do you already feel comfortable executing?

Again, this is a scatter plot. My assumption is that when we download the data from UWPHI we’ll get a data frame that we can simplify to three columns - the FIPS code, the percent of smokers, and the percent of drinkers or the number of drinks in some time period.
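To make that three-column idea concrete, here is a minimal sketch using dplyr. Everything about the input is an assumption: the `raw` data frame, its column names (`pct_smokers`, `pct_excessive_drinking`, `pct_obese`), and the values are invented stand-ins, not the real UWPHI download. It also shows how a FIPS code splits into its state and county parts.

```r
library(dplyr)

# Invented stand-in for the UWPHI download. The column names and the
# values are hypothetical; the real file will look different.
raw <- data.frame(
  fips = c("26161", "26163", "55025"),
  pct_smokers = c(14, 20, 12),
  pct_excessive_drinking = c(21, 18, 26),
  pct_obese = c(28, 36, 27)
)

# Keep only the three columns the scatter plot needs, renaming as we go.
smoke_drink <- raw %>%
  select(fips, smoking = pct_smokers, drinking = pct_excessive_drinking)

# A FIPS code is two state digits followed by three county digits.
# Keeping it as a character string preserves any leading zeros.
fips_parts <- smoke_drink %>%
  mutate(
    state = substr(fips, 1, 2),
    county = substr(fips, 3, 5)
  )

print(fips_parts)
```

Storing the FIPS code as text rather than a number is a deliberate choice here, since codes for some states start with a zero.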
Once we have it in this format we can map the drinking metric to the

Second, there are clearly a finite number of possible values since the data already appear to be discretized. This causes a lot of overplotting. I notice that the points are different shades of green. No doubt the more intense shades mark where more points overlap. We could pull this off by using the

Third, there is a fitted line through the data. We can pull this off using

Fourth, he creates four quadrants that roughly align with the bivariate nature of the choropleth plot. We could achieve this by drawing light gray horizontal and vertical lines that intersect at the median smoking and drinking levels. Then we could use

Finally, around the outside of the cloud of points he includes 20 solid green points with their FIPS codes. I would implement this in three steps. First, I’d use

Along the way there are other interesting things we’d want to implement in this figure. Those include removing the x- and y-axis text and ticks and customizing the placement of the x- and y-axis titles. A more advanced move would be to think about how we might write a function to automatically generate this type of figure for any pair of data columns in the UWPHI dataset.

Let me know what interests you about this figure! I’ll be sure to work your feedback into the video when I post it to YouTube in a few weeks.
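P.S. For anyone who wants to start tinkering, here is a rough sketch of how those pieces might fit together in ggplot2. Treat all of it as assumptions rather than the final recipe: the data are simulated (not the UWPHI values, and the FIPS codes are fake), the column names are hypothetical, putting drinking on the x-axis is my guess, and using `alpha`, `geom_smooth()`, `geom_vline()`/`geom_hline()`, and `geom_text()` is just one reasonable way to get each effect. Picking the 20 labeled counties by distance from the center of the cloud is also my own stand-in rule.

```r
library(ggplot2)

set.seed(1)

# Simulated stand-in data: rounded values mimic the discretized look
# of the original plot. None of these numbers are real.
n <- 500
smoking <- round(rnorm(n, mean = 18, sd = 4))
drinking <- round(20 - 0.3 * (smoking - 18) + rnorm(n, sd = 3))
counties <- data.frame(
  fips = sprintf("%05d", sample(1001:56045, n)),  # fake five-digit codes
  smoking = smoking,
  drinking = drinking
)

# Medians define where the quadrant lines cross.
med_smoking <- median(counties$smoking)
med_drinking <- median(counties$drinking)

# Choose the 20 points farthest from the center of the cloud,
# measured in standardized units, to highlight and label.
z_smoking <- as.numeric(scale(counties$smoking))
z_drinking <- as.numeric(scale(counties$drinking))
counties$dist <- sqrt(z_smoking^2 + z_drinking^2)
outliers <- counties[order(-counties$dist)[1:20], ]

p <- ggplot(counties, aes(x = drinking, y = smoking)) +
  # light gray quadrant lines at the medians
  geom_vline(xintercept = med_drinking, color = "gray80") +
  geom_hline(yintercept = med_smoking, color = "gray80") +
  # transparent points: overlapping points appear as darker green
  geom_point(color = "darkgreen", alpha = 0.2) +
  # straight-line fit through the data
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  # solid points and FIPS labels for the 20 outer counties
  geom_point(data = outliers, color = "darkgreen") +
  geom_text(data = outliers, aes(label = fips), size = 3, vjust = -0.8) +
  labs(x = "Drinking", y = "Smoking") +
  theme_classic() +
  # drop the axis text and ticks
  theme(axis.text = element_blank(), axis.ticks = element_blank())

print(p)
```

With real data you would swap `counties` for the three-column data frame and likely need to nudge the labels so they don't overlap.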