Hey folks,

This has been a busy week! I’ve been on campus teaching a three-day, all-day R class. It’s been a while since I’ve done one of these live workshops off campus. If you’re interested in me coming to your campus, you coming to Michigan, or joining a Zoom-based workshop, please let me know! I really love being able to interact with you all in workshops.

If your experience has been at all like mine over the past month or so, your conversations have all had a tinge of anxiety about the future. Hearing that former trainees who went for a safe government job have been laid off is pretty depressing. Pulling an all-nighter to finish a grant proposal, only to see study sections canceled or proposals withdrawn because they use the word “diversity”, regardless of the context, is crushing. Needless to say, we’re a bit preoccupied with things going on in DC.

With that as some context, I was interested to find this visualization from the Pew Research Center for a recent report they published on what different sectors of Americans expect over the next four years. Of course, this is survey data, and there is unlikely to be any follow-up in four years to ask people how they fared, or to quantify anything objective to see how people actually fared relative to their expectations. Regardless, it’s pretty hard to see how science is going to be well served by this administration.

But back to the visualization! This visualization stood out to me because it’s my second least favorite way to visualize data: a stacked bar chart laid on its side. My least favorite way to visualize data is a stacked bar chart in polar coordinates… a pie chart.

There are two general problems with stacked bar charts. First, they are often used with a dozen or more categories, each with its own color. The colors become impossible to distinguish from each other, and there are too many to remember. Second, we need an anchor point to make comparisons between things on an axis.
Typically, that would be zero on the axis showing the values. But in most stacked bar plots there are many categories, and only the first one sits on that common reference point. In this example, consider the gray rectangles and how they move around left and right. It’s hard to compare their sizes across lines because they don’t have the same anchor on the left (or right). Another anchor point on the axis can be the maximum value, if the numbers add up to a common value like 100%.

Thinking about it, this visual is almost the least bad version of a stacked bar plot. First, there are only three categories, and it’s easy to distinguish and remember what the colors represent. Second, there is an anchor point on the left to compare how the groups anticipated gaining influence. There’s kind of an anchor point on the right, too. But if you add up the values and look closely at the right edge of the bars, you’ll see they don’t all add up to 100. The differences aren’t huge, so it’s still possible to easily compare the “lose influence” sentiment. But I’m left wondering why they didn’t put the “no answer” responses between the “not be affected” and “lose influence” bars to make the right edge flush.

Those standard critiques aside, I also wonder why they didn’t use more opposing colors for the gain and lose influence categories. They’re both shades of an orangish color. Why not orange and green, or colors that we associate with gaining and losing? Perhaps because those would be red and blue, which are associated with the two political parties. That’s fair. But I also wonder why gaining influence is on the left rather than on the right. We normally put positively associated things on the right and negatively associated things on the left, so that seems odd to me.

Regardless of my critiques, how would I make this plot? By default, ggplot2 makes a stacked bar plot when you map one categorical variable (e.g., men, women) to an axis aesthetic and another categorical variable to the fill aesthetic.
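As a minimal sketch of that default, assume a long-format data frame with hypothetical columns `group`, `response`, and `percent`; the percentages below are made-up placeholders, not the Pew numbers. `geom_col()` stacks the fill categories without any extra arguments, and peeking at the built layer with `layer_data()` shows the anchor problem described above: in each bar, only one rectangle starts at zero.

```r
library(ggplot2)

# Made-up placeholder percentages for illustration only;
# these are NOT the values from the Pew Research Center report.
responses <- data.frame(
  group = rep(c("Men", "Women"), each = 3),
  response = factor(
    rep(c("Gain influence", "Not be affected", "Lose influence"), times = 2),
    levels = c("Gain influence", "Not be affected", "Lose influence")
  ),
  percent = c(30, 25, 43, 25, 28, 45)
)

# One categorical variable on an axis (x) and another on fill. geom_col()
# defaults to position = "stack", so each group's responses stack into one bar.
p <- ggplot(responses, aes(x = group, y = percent, fill = response)) +
  geom_col()

print(p)

# Where each rectangle starts (ymin) and ends (ymax): only one segment
# per bar has ymin == 0, i.e., sits on the shared zero anchor.
layer_data(p)[, c("x", "fill", "ymin", "ymax")]
```

The placeholder rows sum to 98 rather than 100, so the top edges line up here only because both made-up totals happen to match.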
So that’s easy enough. Something that stood out to me was that the bars lie horizontally rather than vertically, as in traditional stacked bar plots. I’m a fan of the horizontal orientation in data visualization because it makes the category labels easier to read. To achieve this effect, we can map the categories to the

This plot also includes the numbers in the rectangles. I find that to be busy, but I accept that many people like this look. Certainly if there’s no x-axis, the numbers are helpful. But it would be interesting to see how it looks with an x-axis and no numbers in the bars. We can add the numbers in the bars using

An interesting feature of this visualization is that the bars are clustered by broad category. There are groups by gender, race, age, education, and politics, with a gap between each group. I can think of two ways to create that gap. First, I could create dummy categories with no label and no data. Second, I could make a variable for each broad category and use that with

One more interesting feature is that the sub-categories under each political party use gray text rather than black text. That’s a pretty cool effect. How would we do that? I can’t vectorize

Finally, I notice that the legend is laid out across the top of the plot. We can do this using the

I also notice that there’s a variety of font families and faces in the title, subtitle, and caption. I think we could do all of this with

If you look through the Pew report, you’ll see other ways they could have plotted the same data. Check out this plot for similar data. How does this compare to a stacked bar plot? What do you like better or less than the one I just described? See if you can come up with a 30,000-foot view of how you’d generate this visual.
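Pulling several of these features together, here is one possible sketch under stated assumptions, not a reconstruction of how Pew built their figure. It assumes ggplot2 3.3.0 or later (so `geom_col()` infers horizontal bars when the categorical variable is on `y`), made-up placeholder percentages, and a hypothetical `category` column for the broad groups. It uses `geom_text()` with `position_stack(vjust = 0.5)` for the numbers in the rectangles, `facet_grid()` with `space = "free_y"` as one way to get the gaps between groups, `theme(legend.position = "top")` for the legend, and `element_text(face = ...)` for mixed title faces. It does not attempt the gray sub-category labels.

```r
library(ggplot2)

# Made-up placeholder percentages for illustration only; NOT the Pew values.
# `category` is a hypothetical column naming each broad group.
responses <- data.frame(
  category = rep(c("Gender", "Age"), times = c(6, 9)),
  group = rep(c("Men", "Women", "18-29", "30-49", "50+"), each = 3),
  response = rep(c("Gain influence", "Not be affected", "Lose influence"),
                 times = 5),
  percent = c(30, 25, 43, 25, 28, 45, 20, 35, 43, 27, 30, 41, 33, 24, 41)
)
responses$category <- factor(responses$category, levels = c("Gender", "Age"))
responses$response <- factor(
  responses$response,
  levels = c("Gain influence", "Not be affected", "Lose influence")
)
# Discrete y values run bottom to top, so reverse the levels to read top-down
responses$group <- factor(responses$group, levels = rev(unique(responses$group)))

p <- ggplot(responses, aes(x = percent, y = group, fill = response)) +
  # Numeric x and categorical y: geom_col() draws horizontal bars.
  # reverse = TRUE stacks the first level ("Gain influence") against zero.
  geom_col(position = position_stack(reverse = TRUE)) +
  # Write each value at the middle of its rectangle, using the same stacking
  geom_text(aes(label = percent),
            position = position_stack(vjust = 0.5, reverse = TRUE)) +
  # One panel per broad category; space = "free_y" keeps bar thickness equal,
  # and the spacing between panels acts as the gap between groups
  facet_grid(rows = vars(category), scales = "free_y", space = "free_y") +
  labs(x = NULL, y = NULL, fill = NULL,
       title = "Placeholder title", subtitle = "Placeholder subtitle") +
  theme(
    legend.position = "top",                   # legend across the top
    plot.title = element_text(face = "bold"),  # mix font faces in the titles
    plot.subtitle = element_text(face = "italic")
  )

print(p)
```

Dropping the `geom_text()` layer while keeping the x-axis is a quick way to compare the two looks mentioned above; the dummy-category approach would replace the `facet_grid()` call.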