Hey folks,

This has been a busy week! I’ve been on campus teaching a 3-day, all-day R class. It’s been a while since I’ve done one of these live workshops off campus. If you’re interested in having me come to your campus, coming to Michigan, or joining a Zoom-based workshop, please let me know! I really love being able to interact with you all in workshops.

If your experience has been at all like my own the past month or so, your conversations have all had a tinge of anxiety about the future. Hearing that former trainees who went for a safe government job have been laid off is pretty depressing. Pulling an all-nighter to finish a grant proposal, only to see study sections cancelled or proposals withdrawn because they use the word “diversity”, regardless of the context, is crushing. Needless to say, we’re a bit preoccupied with things going on in DC.

With that as some context, I was interested to find this visualization from the Pew Research Center for a recent report they published on what different sectors of Americans expect over the next four years. Of course, this is survey data, and there is unlikely to be any follow-up in 4 years to ask people how they fared or to quantify anything objective to see how people actually fared relative to their expectations. Regardless, it’s pretty hard to see how science is going to be well served by this administration.

But back to the visualization! This visualization stood out to me because it’s my second least favorite way to visualize data: a stacked bar chart laid on its side. My least favorite way to visualize data is a stacked bar chart in polar coordinates… a pie chart. There are two general problems with stacked bar charts. First, in most uses there are more than a dozen categories, and so just as many colors. The colors become impossible to distinguish from each other, and there are too many to remember. Second, we need an anchor point to make comparisons between things on an axis.
Typically, that would be zero on the axis (in this case, the y-axis). But in most stacked bar plots, there are so many categories that most of them don’t share a common reference point. In this example, consider the gray rectangles and how they move around left and right. It’s hard to compare their sizes across lines because they don’t have the same anchor on the left (or right). Another anchor point on the axis can be the maximum value, if the numbers add up to a common value like 100%.

Thinking about this visual, it’s almost the least bad version of a stacked bar plot. First, there are only 3 categories, and it’s easy to distinguish and remember what the colors represent. Second, there is an anchor point on the left to compare how the groups anticipated gaining influence. There’s kind of an anchor point on the right. But if you add up the values and look closely at the right edge of the bars, you’ll see they don’t all add up to 100. The differences aren’t huge, so it’s still possible to easily compare the “lose influence” sentiment. But I’m left wondering why they didn’t put the “no answer” responses between the “not be affected” and “lose influence” bars to make the right edge flush.

Those standard critiques aside, I also wonder why they didn’t use more contrasting colors for the gain and lose influence categories. They’re both shades of an orangish color. Why not orange and green, or other colors that we associate with gaining and losing? Perhaps because those would be red and blue, which are associated with the two political parties. That’s fair. But I also wonder why gaining influence is on the left rather than on the right. We normally put positively associated things on the right and negatively associated things on the left, so that seems odd to me.

Regardless of my critiques, how would I make this plot? By default, ggplot2 makes a stacked bar plot when you map one category (e.g., men, women) to a position aesthetic (x or y) and another category to the fill aesthetic.
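Here’s a minimal ggplot2 sketch of that default stacking, which also tries out my suggestion of tucking the “no answer” responses between “not be affected” and “lose influence”. It assumes ggplot2 3.3.0 or newer, so that mapping the percentages to x and the groups to y lays the bars on their side without `coord_flip()`. The data frame, its column names, and the percentages are hypothetical placeholders, not the Pew numbers.

```r
library(ggplot2)

# Hypothetical placeholder percentages (NOT the Pew survey values);
# each group deliberately adds up to a bit less than 100
pew <- data.frame(
  group = rep(c("Men", "Women"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"), times = 2),
  percent = c(30, 40, 28, 25, 45, 27)
)

# Whatever is missing from 100% becomes an explicit "No answer" row
totals <- aggregate(percent ~ group, data = pew, FUN = sum)
no_answer <- data.frame(
  group = totals$group,
  response = "No answer",
  percent = 100 - totals$percent
)
pew <- rbind(pew, no_answer)

# The factor levels set the stacking order, so "No answer" sits
# between "Not be affected" and "Lose influence"
pew$response <- factor(
  pew$response,
  levels = c("Gain influence", "Not be affected", "No answer", "Lose influence")
)

# One category on a position aesthetic plus another on fill gives a stacked bar;
# reverse = TRUE stacks the first level ("Gain influence") against zero on the left
p <- ggplot(pew, aes(x = percent, y = group, fill = response)) +
  geom_col(position = position_stack(reverse = TRUE))

p
```

Because the four values in each bar now sum to 100, the right edge of every bar lands in the same place, which gives that second anchor point. Dropping `reverse = TRUE` would stack the first level farthest from zero instead, putting “gain influence” on the right.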
So that’s easy enough. Something that stood out to me was that the bars lie horizontally rather than as the traditional vertical stacked bars. I’m a fan of the horizontal orientation in data visualization because it makes the category labels easier to read. To achieve this effect, we can map the categories to the

This plot also includes the numbers in the rectangles. I find that to be busy, but I accept that many people like this look. Certainly, if there’s no x-axis, the numbers are helpful. But it would be interesting to see how it looks with an x-axis and no numbers in the bars. We can add the numbers in the bars using

An interesting feature in this visualization is that the bars are clustered by broad category. There are groups by gender, race, age, education, and politics, with a gap between each group. I can think of two ways to create that gap. First, I could create dummy categories with no label and no data. Second, I could make a variable for each broad category and use that with

One more interesting feature is that the sub-categories under each political party use gray text rather than black text. That’s a pretty cool effect. How would we do that? I can’t vectorize

Finally, I notice that the legend is laid out across the top of the plot. We can do this using the

I also notice that there’s a variety of font families and faces in the title, subtitle, and caption. I think we could do all of this with

If you look through the Pew report, you’ll see other ways they could have plotted the same data. Check out this plot for similar data. How does this compare to a stacked bar plot? What do you like better or less than the one I just described? See if you can come up with a 30,000-foot view of how you’d generate this visual.
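Coming back to the stacked bar plot itself, here’s one rough way to pull several of the features described above together in ggplot2: horizontal bars, the numbers printed in the rectangles, gaps between the broad groups, and the legend across the top. Treat it as a sketch under assumptions, not as the specific functions the text refers to. `geom_text()` with `position_stack(vjust = 0.5)`, `facet_grid()` for the “variable for each broad category” route, and `theme(legend.position = "top")` are my choices, and the column names and percentages are hypothetical placeholders rather than the Pew data. The gray sub-category text and the mix of fonts are left out.

```r
library(ggplot2)

# Hypothetical placeholder data (NOT the Pew survey values): each
# sub-group belongs to a broad category such as gender or age
pew <- data.frame(
  broad = rep(c("Gender", "Age"), each = 6),
  subgroup = rep(c("Men", "Women", "Ages 18-49", "Ages 50+"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"), times = 4),
  percent = c(30, 40, 28, 25, 45, 27, 35, 38, 25, 22, 44, 32)
)

pew$response <- factor(pew$response,
                       levels = c("Gain influence", "Not be affected", "Lose influence"))
pew$broad <- factor(pew$broad, levels = c("Gender", "Age"))
# A discrete y axis draws the first level at the bottom, so reverse the
# order to list "Men" at the top
pew$subgroup <- factor(pew$subgroup, levels = rev(unique(pew$subgroup)))

p <- ggplot(pew, aes(x = percent, y = subgroup, fill = response, group = response)) +
  # percent on x and subgroup on y lays the bars on their side
  geom_col(position = position_stack(reverse = TRUE)) +
  # center each number in its rectangle, using the same stacking order
  geom_text(aes(label = percent),
            position = position_stack(vjust = 0.5, reverse = TRUE)) +
  # one panel per broad category: the space between panels is the gap,
  # and free_y keeps each panel to its own sub-groups
  facet_grid(rows = vars(broad), scales = "free_y", space = "free_y") +
  labs(x = NULL, y = NULL, fill = NULL) +
  theme_minimal() +
  theme(legend.position = "top")

p
```

The faceting route avoids padding the data with empty dummy categories. The trade-off is that each broad group now gets a strip label, which you may or may not want.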