Hey folks,

I hope you enjoyed thinking last week about how you would recreate Plate 12 from the collection of visuals that W.E.B. DuBois showed at the 1900 Paris Exhibition, using ggplot2 and related R tools. You can find the entire collection of “data portraits” in a book assembled by Whitney Battle-Baptiste and Britt Rusert or as a collection of plates through the Library of Congress. I won’t reshare all the resources describing the collection, but I do encourage you to check out last week’s newsletter.

As you look through those data portraits, you might wonder, “Why would Pat suggest we think about how to generate these figures? People tell us that a lot of what’s in them is bad practice.” There are a few reasons.

First, my original motivation for recreating other people’s figures came from seeing my son’s replications of other artworks. Artists make those recreations to explore their technique. I thought we could do the same with data visualization. If I only ever make line plots that look like something generated with ggplot2's defaults, then I’ll never develop the skills to make scatter plots, do weird things with axis titles, or use my own font choices. I can’t tell you how much I’ve learned about R over the past 6 months by recreating the visuals we have covered. Are they all great visuals? No. But by trying to faithfully recreate them, my technique has really developed.

Second, the DuBois data portraits are radically different from the types of plots we usually make. My understanding is that this was intentional. Imagine walking by a poster at a conference with plots that look wildly different from everyone else's. That poster will get my attention, and I’ll be more likely to stop and have a look. That was what DuBois was trying to do in Paris. He wanted people to stop and see a story about the life of Black people in the US in 1900. There were a lot of negative stories being told by others, but he wanted to tell his own community’s story.
So there’s value in learning to make plots that are radically different, because it will force us to use our tools to do unconventional things. In the process we’ll learn to use our tools better.

Consider Plate 27... Before you clutch your pearls and shriek, “PIE CHART!”, give it some time. Again, there are other ways of presenting the same data. How would you present them? Later you could try that on your own. For now, let’s try to do it like DuBois did. Here are the data:
As always, a few things stand out to me that would direct my approach to recreating this “fan plot”.

First, it’s a pie chart. Pie charts are best thought of as stacked bar charts drawn in polar coordinates. Something I’ve learned from working with polar coordinates is to get things looking right in Cartesian coordinates before converting them to polar. It’s too hard to wrap my mind around what’s going on in polar coordinates. We’ll want a single stacked bar. To remove the pie pieces that are on the sides, I’d insert a fake category that is about 60% for both races. Later, we can make this transparent or the color of the background. When we convert this to a pie chart, we’ll use […]

Second, something to consider is that if the occupation category is mapped to the fill, then the same category in each race will get merged if we set […]

Third, with a “fake” category to provide space on the sides, we’ll have 12 category-race combinations. We’ll want to use […]

Fourth, there are two types of labels. We can add the numeric labels using […]

Fifth, I love incorporating subtle points in figures. I noticed that both fans have a black line as their edge. Of course, the fake category shouldn’t have an edge. I think we can pull this off using […]

At each stage, I’d encourage you to see what the plot looks like in both coordinate systems by flipping back and forth between […]

Finally, if you thought this was fun, I’d encourage you to check out Plate 22. How would you go about generating that unique “bulls eye plot”?
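The five considerations for the fan plot could be sketched roughly as follows. This is a minimal sketch and not necessarily how the figure will end up being built: the percentages are made-up placeholders (not the values from Plate 27), the occupation names and colors are hypothetical, and the specific choices of `coord_polar()`, the `group` aesthetic, `geom_text()` with `position_stack()`, and `scale_color_identity()` are assumptions about one way to get each effect.

```r
library(ggplot2)

# PLACEHOLDER data: these are NOT the Plate 27 values. Swap in the real ones.
occupations <- c("Agriculture", "Domestic service", "Manufacturing",
                 "Trade", "Professions")
spacer <- 60  # fake category that opens up the gaps between the two fans

fan <- data.frame(
  race       = rep(c("Black", "White"), each = 6),
  occupation = rep(c(occupations, "spacer"), times = 2),
  percent    = c(60, 25, 7, 5, 3, spacer,
                 60, 8, 15, 12, 5, spacer)
)

# One level per category-race combination (12 in total), in stacking order,
# so the same occupation in the two races is not placed side by side
slice_id  <- paste(fan$race, fan$occupation, sep = "_")
fan$slice <- factor(slice_id, levels = slice_id)

# No label and no outline for the fake category
fan$label <- ifelse(fan$occupation == "spacer", "", paste0(fan$percent, "%"))
fan$edge  <- ifelse(fan$occupation == "spacer", "transparent", "black")

# Hypothetical colors; the spacer is drawn transparent
fills <- setNames(
  c("firebrick", "gold", "steelblue", "tan", "grey30", "transparent"),
  c(occupations, "spacer")
)

# Step 1: a single stacked bar in Cartesian coordinates
bar_plot <- ggplot(fan, aes(x = 1, y = percent, group = slice)) +
  geom_col(aes(fill = occupation, color = edge)) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = fills, breaks = occupations) +
  scale_color_identity() +
  theme_void()

# Step 2: the same plot in polar coordinates. The start angle is meant to put
# the spacers on the left and right; adjust it by eye if the fans look tilted.
start_angle <- pi / 2 - pi * spacer / sum(fan$percent)
fan_plot <- bar_plot + coord_polar(theta = "y", start = start_angle)
```

Printing `bar_plot` and then `fan_plot` is one way to flip between the two coordinate systems at each stage: get the stacked bar looking right first, then check how the same layers wrap into the fans.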