Hey folks,

Before digging into this week’s data visualization, I wanted to give you all a heads-up about some learning activities I’m currently developing. First, in the next month or so I will be hosting a one-day, online workshop on the basics of …

For this week, I’d like to share with you a newsletter that I subscribe to from Philip Bump of the Washington Post. Each week he explores some data using a variety of visualizations and also provides critiques about visualizations he’s found online. Earlier this week his newsletter talked about the success of different seeds in the men’s NCAA basketball tournament. Further into the newsletter he described a “bivariate choropleth plot” showing the correlation between drinking and smoking levels within each county across the US. Those maps are cool. But I (and I think Bump also) find them a bit difficult to decipher. Choropleth plots suffer from the instinct to assume that counties with large land area also have many people. This is very much not true.

For his take, he used data from the University of Wisconsin’s Population Health Institute (UWPHI). The UWPHI has data on many variables related to health - including smoking and drinking - for each county. Here was his version of the bivariate choropleth plot rendered as a scatter plot:

The five-digit numbers aren’t zip codes, they are FIPS codes. The first two digits indicate the state and the other digits the county in the state.

What stands out to you about this visualization? What correlation do you see between the two variables? What questions do you have about the relationship? About how to make the plot? What would you like to learn to implement in R? What do you already feel comfortable executing?

Again, this is a scatter plot. My assumption is that when we download the data from UWPHI we’ll get a data frame that we can simplify to three columns - the FIPS code, the percent of smokers, and the percent of drinkers or the number of drinks in some time period.
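To make the three-column idea concrete, here is a minimal sketch of that simplification step. I haven’t looked at the actual UWPHI file yet, so the column names (`pct_adults_smoking`, `pct_excessive_drinking`, `other_measure`) are hypothetical and the values are invented purely for illustration. The one thing I’d be careful about is keeping the FIPS code as a character string so that leading zeros aren’t lost.

```r
library(dplyr)

# Toy stand-in for the UWPHI download. The column names are hypothetical
# and the values are made up; they are NOT real county statistics.
raw <- tibble(
  fips = c("01001", "01003", "26161"),   # stored as text to keep leading zeros
  pct_adults_smoking = c(18.1, 16.4, 12.9),
  pct_excessive_drinking = c(15.2, 17.8, 21.3),
  other_measure = c(1, 2, 3)
)

# Keep only the three columns we need and give them shorter names
smoke_drink <- raw |>
  select(fips,
         smoking = pct_adults_smoking,
         drinking = pct_excessive_drinking) |>
  # The first two digits of a FIPS code identify the state
  mutate(state_fips = substr(fips, 1, 2))

smoke_drink
```

The extra `state_fips` column isn’t required for the scatter plot, but it shows how the state portion of the code can be pulled out if we want it later.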
Once we have it in this format we can map the drinking metric to the …

Second, there are clearly a finite number of possible values since the data already appear to be discretized. This causes a lot of overplotting of the data. I notice that the points are different shades of green. No doubt the more intense shade is where there’s more overlap. We could pull this off by using the …

Third, there is a fitted line through the data. We can pull this off using …

Fourth, he creates four quadrants that roughly align with the bivariate nature of the choropleth plot. We could achieve this by drawing light gray horizontal and vertical lines that intersect at the median smoking and drinking levels. Then we could use …

Finally, around the outside of the cloud of points he includes 20 solid green points with their FIPS codes. I would implement this in three steps. First, I’d use …

Along the way there are other interesting things we’d want to implement in this figure. Those include removing the x- and y-axis text and ticks and customizing the placement of the x- and y-axis titles. A more advanced move would be to think about how we might make a function to automatically generate this type of figure for any pair of data columns in the UWPHI dataset.

Let me know what interests you about this figure! I’ll be sure to work your feedback into the video when I post it to YouTube in a few weeks.
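If you’d like something to tinker with in the meantime, here is a rough sketch of one way the pieces described above could fit together in ggplot2. It is not Bump’s code and may not match what ends up in the video. The data are randomly generated stand-ins (the FIPS-style codes are fake), the column names are hypothetical, putting drinking on the x-axis is my assumption, the quadrant labels are my own wording, and picking the 20 labeled points by their standardized distance from the medians is just one plausible rule.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the three-column data frame. Everything here is
# randomly generated; the codes are fake and the values are not real.
set.seed(1)
n <- 500
smoke_drink <- tibble(
  fips = sprintf("%05d", sample(1001:56045, n)),
  smoking = round(rnorm(n, mean = 18, sd = 4)),   # rounding mimics discretized data
  drinking = round(rnorm(n, mean = 18, sd = 3))
)

med_smoking <- median(smoke_drink$smoking)
med_drinking <- median(smoke_drink$drinking)

# Assumed rule: label the 20 points farthest from the medians,
# measured in standard deviation units on each axis
labeled <- smoke_drink |>
  mutate(distance = sqrt(((smoking - med_smoking) / sd(smoking))^2 +
                         ((drinking - med_drinking) / sd(drinking))^2)) |>
  slice_max(distance, n = 20, with_ties = FALSE)

p <- ggplot(smoke_drink, aes(x = drinking, y = smoking)) +
  # Light gray lines crossing at the medians create the four quadrants
  geom_hline(yintercept = med_smoking, color = "gray80") +
  geom_vline(xintercept = med_drinking, color = "gray80") +
  # Semi-transparent points: stacked points appear as a more intense green
  geom_point(color = "darkgreen", alpha = 0.2) +
  # Straight-line fit through all of the points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  # Solid points and FIPS labels for the 20 selected counties
  geom_point(data = labeled, color = "darkgreen") +
  geom_text(data = labeled, aes(label = fips), size = 2.5, vjust = -0.8) +
  # Quadrant labels pinned to the four corners of the panel
  annotate("text",
           x = c(-Inf, Inf, -Inf, Inf),
           y = c(Inf, Inf, -Inf, -Inf),
           label = c("Less drinking, more smoking",
                     "More drinking, more smoking",
                     "Less drinking, less smoking",
                     "More drinking, less smoking"),
           hjust = c(-0.05, 1.05, -0.05, 1.05),
           vjust = c(1.5, 1.5, -0.5, -0.5),
           color = "gray50", size = 3) +
  labs(x = "Drinking", y = "Smoking") +
  theme_classic() +
  # Drop the axis text and ticks, and push the axis titles to one end
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_text(hjust = 0),
        axis.title.y = element_text(hjust = 0))

p
```

With real data the FIPS labels will likely collide with each other; the ggrepel package’s `geom_text_repel()` is one option for nudging them apart.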