Hey folks,

Before digging into this week’s data visualization, I wanted to give you all a heads-up about some learning activities I’m currently developing. First, in the next month or so I will be hosting a one-day, online workshop on the basics of...

For this week, I’d like to share with you a newsletter that I subscribe to from Philip Bump of the Washington Post. Each week he explores some data using a variety of visualizations and also provides critiques about visualizations he’s found online. Earlier this week his newsletter talked about the success of different seeds in the men’s NCAA basketball tournament. Further into the newsletter he described a “bivariate choropleth plot” showing the correlation between drinking and smoking levels within each county across the US.

Those maps are cool. But I (and I think Bump also) find them a bit difficult to decipher. Choropleth plots suffer from the instinct to assume that counties with large land area also have many people. This is very much not true.

For his take, he used data from the University of Wisconsin’s Population Health Institute (UWPHI). The UWPHI has data on many variables related to health - including smoking and drinking - for each county. Here was his version of the bivariate choropleth plot rendered as a scatter plot:

The five-digit numbers aren’t zip codes; they are FIPS codes. The first two digits indicate the state and the other digits the county in the state.

What stands out to you about this visualization? What correlation do you see between the two variables? What questions do you have about the relationship? About how to make the plot? What would you like to learn to implement in R? What do you already feel comfortable executing?

Again, this is a scatter plot. My assumption is that when we download the data from UWPHI we’ll get a data frame that we can simplify to three columns - the FIPS code, the percent of smokers, and the percent of drinkers or the number of drinks in some time period.
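To make that three-column idea concrete, here is a minimal sketch using {dplyr}. The data frame `uwphi_raw`, its column names, and every value in it are invented stand-ins for illustration; the real UWPHI download will almost certainly be named and shaped differently. The one detail worth noticing is that FIPS codes are kept as character strings so that a leading zero in the state portion isn't lost.

```r
library(dplyr)

# Toy stand-in for the UWPHI download. Column names and values are invented
# for illustration and are NOT taken from the real file.
uwphi_raw <- tibble(
  fips = c("06037", "26161", "36067"),
  pct_smokers = c(11, 14, 17),
  pct_excessive_drinking = c(18, 21, 19),
  pct_uninsured = c(10, 5, 6) # an extra column we don't need
)

# Keep only the three columns of interest, then split the FIPS code into
# its state (first two digits) and county (remaining three digits) parts.
smoke_drink <- uwphi_raw %>%
  select(fips, smoking = pct_smokers, drinking = pct_excessive_drinking) %>%
  mutate(
    state_fips = substr(fips, 1, 2),
    county_fips = substr(fips, 3, 5)
  )

print(smoke_drink)
```

If the FIPS column were read in as a number, "06037" would become 6037 and the state digits would no longer line up, so reading that column as text is the safer choice.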
Once we have it in this format we can map the drinking metric to the...

Second, there are clearly a finite number of possible values since the data already appear to be discretized. This causes a lot of overplotting of the data. I notice that the points are different shades of green. No doubt the more intense shade is where there’s more overlap. We could pull this off by using the...

Third, there is a fitted line through the data. We can pull this off using...

Fourth, he creates four quadrants that roughly align with the bivariate nature of the choropleth plot. We could achieve this by drawing light gray horizontal and vertical lines that intersect at the median smoking and drinking levels. Then we could use...

Finally, around the outside of the cloud of points he includes 20 solid green points with their FIPS codes. I would implement this in three steps. First, I’d use...

Along the way there are other interesting things we’d want to implement in this figure. Those include removing the x and y-axis text and ticks and customizing the placement of the x and y-axis titles. A more advanced move would be to think about how we might make a function to automatically generate this type of figure for any pair of data columns in the UWPHI dataset.

Let me know what interests you about this figure! I’ll be sure to work your feedback into the video when I post it to YouTube in a few weeks.
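In the meantime, here is a rough sketch of how the pieces described above might fit together in {ggplot2}. Everything specific in it is an assumption rather than a confirmed recipe: the data are simulated (including the fake FIPS codes), drinking is put on the x-axis, overlap is shown with a low `alpha`, the fitted line is a straight `lm` fit, and the 20 labeled points are simply the ones farthest from the medians in standardized units. The helper names `farthest_points()` and `plot_pair()` are hypothetical. Treat it as one plausible set of choices, not necessarily the ones used in the original figure.

```r
library(ggplot2)

# Simulated, discretized data standing in for real county values.
set.seed(1)
n <- 500
smoking <- round(rnorm(n, mean = 18, sd = 4))
drinking <- round(20 - 0.3 * (smoking - 18) + rnorm(n, sd = 3))
counties <- data.frame(
  fips = sprintf("%05d", sample(1001:56045, n)), # fake codes, not real counties
  smoking = smoking,
  drinking = drinking
)

# Hypothetical helper: the n rows farthest from the medians, measuring
# distance in standard deviations so both variables count equally.
farthest_points <- function(df, x_col, y_col, n = 20) {
  x <- df[[x_col]]
  y <- df[[y_col]]
  d <- sqrt(((x - median(x)) / sd(x))^2 + ((y - median(y)) / sd(y))^2)
  df[order(d, decreasing = TRUE)[seq_len(n)], ]
}

# Hypothetical helper: the same figure for any pair of numeric columns.
plot_pair <- function(df, x_col, y_col, n_labels = 20) {
  outer_pts <- farthest_points(df, x_col, y_col, n_labels)

  ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    # light gray lines crossing at the medians form the four quadrants
    geom_hline(yintercept = median(df[[y_col]]), color = "gray80") +
    geom_vline(xintercept = median(df[[x_col]]), color = "gray80") +
    # transparent points: stacked points look more intensely green
    geom_point(color = "darkgreen", alpha = 0.2) +
    # straight-line fit through all of the points
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
    # solid points plus FIPS labels for the outermost counties
    geom_point(data = outer_pts, color = "darkgreen") +
    geom_text(data = outer_pts, aes(label = fips), size = 2.5, vjust = -0.8) +
    labs(x = x_col, y = y_col) +
    theme_classic() +
    theme(
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      axis.title.x = element_text(hjust = 0),
      axis.title.y = element_text(hjust = 0)
    )
}

p <- plot_pair(counties, "drinking", "smoking")
```

Printing `p` (or calling `ggsave()` on it) renders the figure. Because the column names are passed as strings, something like `plot_pair(counties, "smoking", "drinking")` flips the axes without touching the function body.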
Hey folks! I just got back from a seminar. I’m still trying to stretch out my eyes from straining to see the small text on each slide! If you don’t know why I’m bringing this up, then you must have missed the videos I posted earlier this week. I was discussing the factors we should consider when converting figures designed for papers to figures designed for a slide deck. You can see me critique a figure from my own lab here and the livestream where I refactor the figure can be found here. I’d...
Hey folks, I was a student-invited speaker at the Syracuse University Biology department this week. It was great to meet with them and hear how they are benefiting from these newsletters and my videos. As much as I love posting newsletters and videos, seeing people light up at ideas, laugh at my jokes, and tell me how they are using what I teach them is like jet fuel. I actually gave two talks. One talk covered what I’ve learned about data visualization by critiquing, recreating, and remaking...
Hey folks, If you missed Wednesday’s livestream, I encourage you to go back and check it out. I recreated a panel from a paper published in Nature that is pretty typical. It was made up entirely of photographs. Sometimes I feel like I’m the only PI who doesn’t merge panels into figures using Illustrator or PowerPoint. I prefer to use R with some help from {cowplot} or {patchwork} to do this for me. That way I can write a single script to generate the entire set of panels. The result is a...