Making pie charts for my biggest fans...


Hey folks,

I hope you enjoyed thinking last week about how you would use ggplot2 and related R tools to recreate Plate 12 from the collection of visuals that W.E.B. DuBois showed at the 1900 Paris Exhibition. You can find the entire collection of “data portraits” in a book assembled by Whitney Battle-Baptiste and Britt Rusert (here) or as a collection of plates through the Library of Congress (here). I won’t reshare all the resources describing the collection, but I do encourage you to check out last week’s newsletter.

As you look through those data portraits, you might wonder, “Why would Pat suggest we think about how to generate these figures? A lot of what’s in them is what people tell us are bad practices.” There are a few reasons. First, my original motivation for recreating other people’s figures came from seeing my son’s replications of other artworks. Artists make those recreations to explore their technique.

I thought we could do the same with data visualization. If I only ever make line plots that look like something generated with ggplot2, then I’ll never develop the skills to make scatter plots, do weird things with axis titles, or use my own font choices. I can’t tell you how much I’ve learned about R over the past 6 months by recreating the visuals we have covered. Are they all great visuals? No. But by trying to faithfully recreate them, my technique has really developed.

The DuBois data portraits are radically different from the types of plots we make. My understanding is that this was intentional. Imagine walking by a poster at a conference with plots that look wildly different from everyone else’s. That poster will get my attention and I’ll be more likely to stop and have a look. That was what DuBois was trying to do in Paris. He wanted people to stop and see a story about the life of Black people in the US in 1900. There were a lot of negative stories being told by others, but he wanted to tell his own community’s story. So there’s value in learning to make plots that are radically different, because it will force us to use our tools to do unconventional things. In the process we’ll learn to use our tools better.

Consider Plate 27...

Before you clutch your pearls and shriek, “PIE CHART!”, give it some time. Again, there are other ways of presenting the same data - how would you present them? Later you could try that on your own. Let’s try to do it like DuBois did. Here are the data:


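library(tidyverse)  # provides tribble() and ggplot2

# Percent of each race working in each occupation category
occupations <- tribble(
  ~category,       ~negroes, ~whites,
  "agriculture",   62,       64,
  "manufacturing", 5,        13.5,
  "services",      28,       5.5,
  "professions",   0.5,      4,
  "trade",         4.5,      13
)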

As always, a few things stand out to me that would direct my approach to recreating this “fan plot”.

First, it’s a pie chart. Pie charts are best thought of as stacked bar charts drawn in polar coordinates. Something I’ve learned working with polar coordinates is to get things looking right in Cartesian coordinates before switching to polar. It’s too hard to wrap my mind around what’s going on in polar coordinates. We’ll want a single stacked bar. To create the empty space on either side of the two fans, I’d insert a fake category that is about 60% for both races. Later, we can make this transparent or the color of the background. When we convert this to a pie chart, we’ll use coord_radial() with theta = "y" so that the bar heights become angles. This function allows us to set the starting position relative to 12 o’clock in radians (360 degrees = 2 * pi). Perhaps start = -pi / 3? We’ll also want to set expand = FALSE so that our pie closes.
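Here is a minimal sketch of that first step, not a finished figure. The spacer width of 60 comes from the text; the names gap, fan_data, wedge, and start_angle are my own, and I’m assuming ggplot2 3.5.0 or later for coord_radial(). It also works out the start angle that would center the first fan on 12 o’clock, which lands close to the -pi / 3 guess (about -56 degrees rather than -60).

```r
library(tidyverse)  # coord_radial() assumes ggplot2 >= 3.5.0

occupations <- tribble(
  ~category,       ~negroes, ~whites,
  "agriculture",   62,       64,
  "manufacturing", 5,        13.5,
  "services",      28,       5.5,
  "professions",   0.5,      4,
  "trade",         4.5,      13
)

gap <- 60  # assumed width of each spacer wedge, in percentage units

fan_data <- occupations %>%
  add_row(category = "fake", negroes = gap, whites = gap) %>%
  pivot_longer(-category, names_to = "race", values_to = "percent") %>%
  mutate(category = fct_inorder(category)) %>%
  arrange(race, category) %>%      # one race's wedges, then its spacer, then the other's
  mutate(wedge = row_number())     # hypothetical id so every row is its own block

total <- sum(fan_data$percent)     # 100 + 60 + 100 + 60
start_angle <- -pi * 100 / total   # half the angular width of the first fan

# reverse = TRUE is meant to stack the wedges in row order starting at y = 0
bar <- ggplot(fan_data, aes(x = 1, y = percent, group = wedge)) +
  geom_col(position = position_stack(reverse = TRUE),
           fill = "grey85", color = "black")

bar                                # check the single stacked bar in Cartesian coordinates

bar + coord_radial(theta = "y", start = start_angle, expand = FALSE)
```

The fills are all grey here on purpose; the point is only to see whether the two fans and two spacers land where you expect before worrying about colors.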

Second, if the occupation category is mapped to the fill and we set x = 1 in our aes() function, then the same category from each race will get merged into one block. Alternatively, if you set x = race, then you’ll get two bars in Cartesian coordinates and concentric circles when you put it in polar coordinates. So, we’ll have to make a column that concatenates the category and race. Then we’ll have to convert that column to a factor and set the order of its levels, since the default would be to order the category-race values alphabetically.
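One way this could look, as a sketch: the column name category_race and the "_" separator are my own choices, and the level order (all of one race, then its spacer, then the other race) is an assumption about how the wedges should be laid out around the circle.

```r
library(tidyverse)  # coord_radial() assumes ggplot2 >= 3.5.0

occupations <- tribble(
  ~category,       ~negroes, ~whites,
  "agriculture",   62,       64,
  "manufacturing", 5,        13.5,
  "services",      28,       5.5,
  "professions",   0.5,      4,
  "trade",         4.5,      13
)

fan_data <- occupations %>%
  add_row(category = "fake", negroes = 60, whites = 60) %>%
  pivot_longer(-category, names_to = "race", values_to = "percent") %>%
  mutate(category = fct_inorder(category)) %>%
  arrange(race, category) %>%
  # fct_inorder() fixes the levels in row order instead of alphabetical order
  mutate(category_race = fct_inorder(paste(category, race, sep = "_")))

levels(fan_data$category_race)

p <- ggplot(fan_data, aes(x = 1, y = percent, fill = category_race)) +
  geom_col(position = position_stack(reverse = TRUE), show.legend = FALSE)

p                                  # twelve separate blocks in one bar

p + coord_radial(theta = "y", start = -pi * 100 / 320, expand = FALSE)
```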

Third, with a “fake” category to provide space on the sides, we’ll have 12 category-race combinations. We’ll want to use scale_fill_manual() to get those colors right, repeating the same set of 6 colors for each of the two races. I think we can use NA (or "transparent") as the fill for the fake category so that it is invisible. In the occupations tibble above, I used abbreviated names for the categories. You could write them out in full in your own version, but I find that long names cause more problems than they’re worth down the road. Typically, I’d either use the labels argument in factor() when I reorder the levels or I’d use the labels argument in scale_fill_manual() to set the long names for the legend. This brings us to the legend… I think I can insert a legend using the guides() and theme() functions. I’d basically make it two columns, change the glyph, make it really big and transparent, and center it. Alternatively, I could use annotate() to place the circles and labels. I’d like to push myself and not use annotate() for the legend on this figure.
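A rough sketch of the fill scale and legend follows. The colors and the legend text are placeholders I made up rather than values taken from the plate, and fill_values and long_names are hypothetical names. I use "transparent" for the spacer because I believe an NA entry in a manual scale can fall back to the scale’s na.value; treat that as an assumption to test. The legend settings assume ggplot2 3.5.0 or later and will need adjusting by eye.

```r
library(tidyverse)  # legend.position = "inside" assumes ggplot2 >= 3.5.0

occupations <- tribble(
  ~category,       ~negroes, ~whites,
  "agriculture",   62,       64,
  "manufacturing", 5,        13.5,
  "services",      28,       5.5,
  "professions",   0.5,      4,
  "trade",         4.5,      13
)

fan_data <- occupations %>%
  add_row(category = "fake", negroes = 60, whites = 60) %>%
  pivot_longer(-category, names_to = "race", values_to = "percent") %>%
  mutate(category = fct_inorder(category)) %>%
  arrange(race, category) %>%
  mutate(category_race = fct_inorder(paste(category, race, sep = "_")))

# Placeholder colors, not sampled from the plate
category_colors <- c(
  agriculture = "firebrick", manufacturing = "steelblue", services = "gold",
  professions = "tan4", trade = "wheat", fake = "transparent"
)

# One color per category-race level; both races share a category's color
fill_values <- setNames(
  unname(category_colors[as.character(fan_data$category)]),
  as.character(fan_data$category_race)
)

# Placeholder legend text; swap in the wording from the plate
long_names <- c(
  agriculture_negroes = "Agriculture", manufacturing_negroes = "Manufacturing",
  services_negroes = "Services", professions_negroes = "Professions",
  trade_negroes = "Trade"
)

ggplot(fan_data, aes(x = 1, y = percent, fill = category_race)) +
  geom_col(position = position_stack(reverse = TRUE), key_glyph = "point") +
  scale_fill_manual(values = fill_values,
                    breaks = names(long_names),      # one legend key per category
                    labels = unname(long_names)) +
  guides(fill = guide_legend(title = NULL, ncol = 2,
                             override.aes = list(shape = 21, size = 8,
                                                 color = "black"))) +
  coord_radial(theta = "y", start = -pi * 100 / 320, expand = FALSE) +
  theme_void() +
  theme(legend.position = "inside", legend.position.inside = c(0.5, 0.5))
```

Listing only one race’s levels in breaks keeps the legend to five keys while all twelve levels still get a fill.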

Fourth, there are two types of labels. We can add the numeric labels using geom_text(). In polar coordinates, the position around the fan is set by the y value (the angle) and the distance from the center is set by the x value (the radius). It looks like we’ll need to do some manual adjustment to place the percentage in each wedge. We’ll have to figure out something special for the fake wedges. Perhaps when we format the numbers to have a percent sign, we can set the label for the blank wedges to "". The second type of label is for the two races at the top and bottom of the fans. This I would likely do with annotate(). Illustrating my point from earlier about what x and y represent, I’d likely use x = 1.1 and y = c(50, 210), the midpoints of fans that run from 0 to 100 and from 160 to 260. I’m not sure if the labels will come out reading right side up or upside down, but if we need to rotate any, we can use angle as an aesthetic for geom_text(). Of course, there’s also a title for the plot that should be easy enough to add.
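As a starting point for both kinds of labels, here is a hedged sketch. The label column and label_radius are hypothetical names, the race labels sit at y = 50 and 210 (the midpoints of fans spanning 0 to 100 and 160 to 260), and the radius of 1.6 is just a first guess that places them outside a default-width bar. The label text is a placeholder too.

```r
library(tidyverse)  # coord_radial() assumes ggplot2 >= 3.5.0

occupations <- tribble(
  ~category,       ~negroes, ~whites,
  "agriculture",   62,       64,
  "manufacturing", 5,        13.5,
  "services",      28,       5.5,
  "professions",   0.5,      4,
  "trade",         4.5,      13
)

fan_data <- occupations %>%
  add_row(category = "fake", negroes = 60, whites = 60) %>%
  pivot_longer(-category, names_to = "race", values_to = "percent") %>%
  mutate(category = fct_inorder(category)) %>%
  arrange(race, category) %>%
  mutate(
    category_race = fct_inorder(paste(category, race, sep = "_")),
    # blank label for the spacer wedges, percent sign for everything else
    label = if_else(category == "fake", "", paste0(percent, "%"))
  )

label_radius <- 1.6  # hypothetical; adjust by eye

ggplot(fan_data, aes(x = 1, y = percent, fill = category_race)) +
  geom_col(position = position_stack(reverse = TRUE), show.legend = FALSE) +
  # vjust = 0.5 centers each label within its own wedge along y (the angle)
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5, reverse = TRUE)) +
  annotate("text", x = label_radius, y = c(50, 210),
           label = c("NEGROES", "WHITES")) +
  coord_radial(theta = "y", start = -pi * 100 / 320, expand = FALSE)
```

The small wedges will almost certainly need their labels nudged by hand, for example by mapping x to a per-row value instead of leaving every label at x = 1.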

Fifth, I love incorporating subtle points in figures. I noticed that both fans have a black line as their edge. Of course, the fake category shouldn’t have an edge. I think we can pull this off using geom_segment() with start and end positions of 0 and 100 for one segment and 160 and 260 for the other. I think this should work.
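A sketch of how those edge positions could be derived from the data rather than typed in. The names edges and outer_edge are mine, and I’m assuming the line belongs on the outer edge of a bar of width 0.9 centered at x = 1. My understanding is that geom_segment() gets bent into an arc in polar coordinates, but check that in your own plot.

```r
library(tidyverse)  # coord_radial() assumes ggplot2 >= 3.5.0

occupations <- tribble(
  ~category,       ~negroes, ~whites,
  "agriculture",   62,       64,
  "manufacturing", 5,        13.5,
  "services",      28,       5.5,
  "professions",   0.5,      4,
  "trade",         4.5,      13
)

fan_data <- occupations %>%
  add_row(category = "fake", negroes = 60, whites = 60) %>%
  pivot_longer(-category, names_to = "race", values_to = "percent") %>%
  mutate(category = fct_inorder(category)) %>%
  arrange(race, category) %>%
  mutate(category_race = fct_inorder(paste(category, race, sep = "_")))

# Where each fan starts and stops along the stacked bar, skipping the spacers
edges <- fan_data %>%
  mutate(yend = cumsum(percent), y = yend - percent) %>%
  filter(category != "fake") %>%
  group_by(race) %>%
  summarize(y = min(y), yend = max(yend))

outer_edge <- 1 + 0.9 / 2  # outer edge of a bar centered at x = 1 with width 0.9

# Fill colors are ggplot2 defaults here, so the spacers are still visible
ggplot(fan_data, aes(x = 1, y = percent, fill = category_race)) +
  geom_col(position = position_stack(reverse = TRUE), width = 0.9,
           show.legend = FALSE) +
  geom_segment(data = edges,
               aes(x = outer_edge, xend = outer_edge, y = y, yend = yend),
               inherit.aes = FALSE, linewidth = 1) +
  coord_radial(theta = "y", start = -pi * 100 / 320, expand = FALSE)
```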

At each stage, I’d encourage you to see what the plot looks like in both coordinate systems by flipping back and forth between coord_radial() and coord_cartesian(). I think this will give you a better sense of what is going on in your figure.

Finally, if you thought this was fun, I’d encourage you to check out Plate 22. How would you go about generating that unique “bulls eye plot”?

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
