My second least favorite data visualization type


Hey folks,

This has been a busy week! I’ve been on campus teaching a three-day, all-day R class. It’s been a while since I’ve done one of these live workshops off campus. If you’re interested in me coming to your campus, you coming to Michigan, or being in a Zoom-based workshop, please let me know! I really love being able to interact with you all in workshops.


If your experience has been at all like my own the past month or so, your conversations have all had a tinge of anxiety about the future. Hearing that former trainees who went for a safe government job have been laid off is pretty depressing. Remembering pulling an all-nighter to finish a grant proposal, only to see study sections cancelled or proposals withdrawn because they use the word “diversity” - regardless of the context - is crushing. Needless to say, we’re a bit preoccupied with things going on in DC.

With that as some context, I was interested to find this visualization from the Pew Research Center for a recent report they published on what different sectors of Americans expect over the next four years.

Of course, this is survey data, and there is unlikely to be any follow-up in 4 years to ask people how they fared or to quantify anything objective to see how people actually fared relative to their expectations. Regardless, it’s pretty hard to see how science is going to be well-served by this administration. But back to the visualization!

This visualization stood out to me because it’s my second least favorite way to visualize data. It’s a stacked bar chart laid on its side. My least favorite way to visualize data is a stacked bar chart, but in polar coordinates… a pie chart. There are two general problems with stacked bar charts. First, in most uses there are more than a dozen categories, each with its own color. The colors become impossible to distinguish from each other and there are too many to remember. Second, we need an anchor point to make comparisons between things on an axis. Typically, that would be zero on the value axis (in this case, the x-axis, since the bars lie on their side). But in most stacked bar plots there are many categories, and the ones in the middle of the stack don’t share a common reference point. In this example, consider the gray rectangles and how they move around left and right. It’s hard to compare their size across rows because they don’t have the same anchor on the left (or right). Another anchor point on the axis can be at the maximum value if the numbers add up to a common value like 100%.

Thinking about this visual, it’s almost the least bad version of a stacked bar plot. First, there are only 3 categories and it’s easy to distinguish and remember what the colors represent. Second, there is an anchor point on the left to compare how the groups anticipated gaining influence. There’s kind of an anchor point on the right. But, if you add up the values and look closely at the right edge of the plot, you’ll see they don’t all add up to 100. The differences aren’t huge, so it’s still possible to easily compare the “lose influence” sentiment. But I’m left wondering why they didn’t put the “no answer” responses between the “not be affected” and “lose influence” bars to make the right edge flush.

Those standard critiques aside, I also wonder why they didn’t use more opposing colors for the gain and lose influence categories. They’re both shades of an orangish color. Why not orange and green, or colors that we associate with gaining and losing? Perhaps because those would be red and blue, which are associated with the two political parties. That’s fair. But I wonder why gaining influence is on the left rather than on the right. We normally put positively associated things on the right and negatively associated things on the left. That seems odd to me.

Regardless of my critiques, how would I make this plot?

By default, ggplot2 makes a stacked bar plot when you map the category (e.g. men, women) to the x aesthetic and another category to the fill aesthetic. So that’s easy enough. Something that stood out to me was that the bars lie horizontally rather than in the traditional vertical stacks. I’m a fan of the horizontal orientation with data visualization - it makes it easier to read the category labels. To achieve this effect, we can map the categories to the y aesthetic rather than to the x. It’s that simple!
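Here is a minimal sketch of that horizontal stacked bar. The data frame, its column names (group, response, percent), and the percentages are placeholders I made up for illustration; they are not the Pew values. I’m also assuming we want “Gain influence” closest to zero, which is why the sketch reverses the default stacking order.

```r
library(ggplot2)

# Placeholder percentages for illustration only; these are NOT the Pew values
survey <- data.frame(
  group = rep(c("Men", "Women"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"),
                 times = 2),
  percent = c(40, 35, 25, 30, 40, 30)
)

# Fix the order of the fill categories
survey$response <- factor(
  survey$response,
  levels = c("Gain influence", "Not be affected", "Lose influence")
)

# Mapping the category to y (rather than x) lays the bars on their side.
# geom_col() stacks the fill groups by default; reverse = TRUE puts the
# first factor level next to zero (the left edge).
p <- ggplot(survey, aes(x = percent, y = group, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE))

p
```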

This plot also includes the numbers in the rectangles. I find that to be busy, but accept that many people like this look. Certainly if there’s no x-axis, the numbers are helpful. But it would be interesting to see the appearance with an x-axis and no numbers in the bars. We can add the numbers in the bars using geom_text(). I think we can also use a position argument, but I’m not entirely sure which one. This will require some experimentation.
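One candidate for that position argument is position_stack(), which accepts a vjust value; setting vjust = 0.5 should center each label within its segment. Treat this as a starting point for the experimentation rather than a finished answer. As before, the data are made-up placeholders, not the Pew values.

```r
library(ggplot2)

# Placeholder percentages for illustration only; these are NOT the Pew values
survey <- data.frame(
  group = rep(c("Men", "Women"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"),
                 times = 2),
  percent = c(40, 35, 25, 30, 40, 30)
)
survey$response <- factor(
  survey$response,
  levels = c("Gain influence", "Not be affected", "Lose influence")
)

p <- ggplot(survey, aes(x = percent, y = group, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  # Use the same stacking settings as the bars so labels and segments line up;
  # vjust = 0.5 places each label at the midpoint of its segment
  geom_text(
    aes(label = percent, group = response),
    position = position_stack(vjust = 0.5, reverse = TRUE)
  )

p
```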

An interesting feature in this visualization is that the bars are clustered by broad category. There are groups by gender, race, age, education, and politics. Then there’s a gap between each group. I can think of two ways to create that gap. First, I could create dummy categories with no label and no data. Second, I could make a variable for each broad category and use that with facet_wrap() to make a one column and six row faceted plot. Then we’d remove all the facet ornamentation to make it more attractive. I think I’d use the faceted approach since it’s more elegant.
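A rough sketch of the faceted approach follows. The broad column, the age labels, and the percentages are all hypothetical placeholders, and I’ve only included two broad categories to keep it short.

```r
library(ggplot2)

# Placeholder groups and percentages for illustration only; NOT the Pew values
survey <- data.frame(
  broad = rep(c("Gender", "Gender", "Age", "Age", "Age"), each = 3),
  group = rep(c("Men", "Women", "Younger", "Middle", "Older"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"),
                 times = 5),
  percent = rep(c(40, 35, 25), times = 5)
)
survey$response <- factor(
  survey$response,
  levels = c("Gain influence", "Not be affected", "Lose influence")
)
# Keep the broad categories in the order they should appear top to bottom
survey$broad <- factor(survey$broad, levels = c("Gender", "Age"))

p <- ggplot(survey, aes(x = percent, y = group, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  # One column of panels; free_y drops groups that don't belong to a panel
  facet_wrap(~broad, ncol = 1, scales = "free_y") +
  # Strip away the facet ornamentation so only the gaps remain
  theme(
    strip.text = element_blank(),
    strip.background = element_blank()
  )

p
```

One thing to watch: facet_wrap() gives every panel the same height, so bars in panels with fewer groups will come out thicker. If that bothers you, facet_grid(rows = vars(broad), scales = "free_y", space = "free") is an alternative that sizes each panel by its number of groups.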

One more interesting feature is that the sub-categories under each political party use gray text rather than black text. That’s a pretty cool effect. How would we do that? I can’t vectorize axis.text.y = element_text(), but I could use {ggtext}’s element_markdown() and add an HTML span with CSS styling to make those four categories gray. Alternatively, I could plot the category labels with geom_text() and create a variable for the color I want for the text. That seems harder than it probably should be when {ggtext} is available and is pretty awesome.
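Here is a minimal sketch of the {ggtext} route. The group names (“Party A”, “Subgroup A1”, “Subgroup A2”), the label column, and the gray hex code are my own placeholders, not anything taken from the Pew plot.

```r
library(ggplot2)
library(ggtext)

# Placeholder labels and percentages for illustration only; NOT the Pew values
survey <- data.frame(
  group = rep(c("Party A", "Subgroup A1", "Subgroup A2"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"),
                 times = 3),
  percent = c(40, 35, 25, 45, 30, 25, 35, 40, 25)
)
survey$response <- factor(
  survey$response,
  levels = c("Gain influence", "Not be affected", "Lose influence")
)

# Flag the sub-categories that should be drawn in gray
is_sub <- survey$group %in% c("Subgroup A1", "Subgroup A2")

# Wrap the flagged labels in an HTML span with CSS color styling
survey$label <- ifelse(
  is_sub,
  paste0("<span style='color:#808080'>", survey$group, "</span>"),
  survey$group
)
# Preserve the row order so the first group appears at the top
survey$label <- factor(survey$label, levels = rev(unique(survey$label)))

p <- ggplot(survey, aes(x = percent, y = label, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  # element_markdown() renders the span instead of printing it literally
  theme(axis.text.y = element_markdown())

p
```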

Finally, I notice that the legend is laid out across the top of the plot. We can do this using the legend.position = "top" argument in theme(). We’ll also need to play with the legend.key.size argument to make the square smaller than the text.
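A quick sketch of those two theme() settings is below. The key size of 0.6 lines is a guess that would need tuning by eye, and the data are again made-up placeholders rather than the Pew values.

```r
library(ggplot2)

# Placeholder percentages for illustration only; these are NOT the Pew values
survey <- data.frame(
  group = rep(c("Men", "Women"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"),
                 times = 2),
  percent = c(40, 35, 25, 30, 40, 30)
)
survey$response <- factor(
  survey$response,
  levels = c("Gain influence", "Not be affected", "Lose influence")
)

p <- ggplot(survey, aes(x = percent, y = group, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  # Drop the legend title so only the keys and labels remain
  labs(fill = NULL) +
  theme(
    legend.position = "top",               # lay the legend across the top
    legend.key.size = unit(0.6, "lines")   # shrink the squares; tune by eye
  )

p
```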

I also notice that there’s a variety of font families and faces in the title, subtitle, and caption. I think we could do all of this with {showtext} to bring in Google fonts and {ggtext} to do any special styling. Anything else stand out to you in this visualization?

If you look through the Pew report, you’ll see other ways they could have plotted the same data. Check out this plot for similar data. How does this compare to a stacked bar plot? What do you like better or less than the one I just described? See if you can come up with a 30,000-foot view of how you’d generate this visual.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
