What is the most common form of "chartjunk" in the scientific literature?


Hey folks,

If you missed Wednesday’s livestream, I encourage you to go back and check it out. I recreated a pretty typical panel from a paper published in Nature. It was made up entirely of photographs. Sometimes I feel like I’m the only PI who doesn’t merge panels into figures using Illustrator or PowerPoint. I prefer to use R with some help from {cowplot} or {patchwork} to do this for me. That way I can write a single script to generate the entire set of panels. The result is a reproducible workflow that easily regenerates a figure if I need to change something in one of the panels. Since pictures often accompany plots, being able to use R to generate figures with both is critical.
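To make the single-script idea concrete, here is a minimal sketch using {patchwork}. The two panels, their labels, and the output file name are hypothetical stand-ins built from R’s built-in mtcars data, not the panels from the livestream. Because the whole figure is rebuilt by the script, changing one panel and rerunning regenerates the finished figure file.

```r
library(ggplot2)
library(patchwork)

# Hypothetical panel a: number of cars with 4, 6, or 8 cylinders
counts <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Cylinders", y = "Number of cars") +
  theme_classic()

# Hypothetical panel b: fuel efficiency versus weight
mileage <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_classic()

# Place the panels side by side and tag them a and b
figure <- counts + mileage + plot_annotation(tag_levels = "a")

# Write the assembled figure to a file (hypothetical name, in a temporary directory)
out_file <- file.path(tempdir(), "figure1.pdf")
ggsave(out_file, figure, width = 8, height = 4)
```

{cowplot}’s plot_grid() can do the same job; {patchwork} is shown here only because its `+` syntax keeps the example short.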

The reproducibility benefit wasn’t what I loved most about this livestream. I was all set to use the {ggplot2} helper package {ggh4x} to make nested facets of pictures. A viewer asked if a different approach that didn’t rely on {ggh4x} might work as well. That question really caught me off guard! So I spent the first hour or so making the panel with {ggh4x} and the rest of the livestream making the same panel using their suggestion. This type of live interaction is one of the main reasons I have grown to love doing livestreams. To be completely honest, although the differences in the finished products were small, I really preferred the plain {ggplot2} approach. Really, go watch the livestream :)


This week I’m going to share panels i and j from Figure 1 of a recently published paper in Nature, “A mechanism to initiate emergency type 2 myelopoiesis”. This paper has a lot of panels like this one. Based on the text of the paper and the appearance of the panels, I’m pretty sure these were made with R’s {ggplot2} package.

These panels have different data but are the same basic design. There’s a lot going on in these plots that I could comment on (wait for Monday’s video!). But for today, I’d like to make a couple of general points.

First, check out where the x-axis crosses the y-axis in both panels. By default, {ggplot2} adds padding, or what it calls “expansion”, to the left and right of the data on the x-axis and above and below the data on the y-axis. This padding is one of the features I look for that tells me someone made a plot with {ggplot2}. It isn’t a problem for most types of plots and is often kind of nice. But I’m starting to think it is a problem for bar plots. If we look at panel j, the bars appear to float above the x-axis. That’s fine on its own, but because there’s room for negative values, it gives the impression that the data could be negative. In these panels we’re looking at percentages of counts, and those can’t be negative. Now look at panel i. Do you see what they did there? Their bars start at the x-axis, which sits in negative space. I have ideas about how they did this, but I’m not sure why they would.

I consider the expansion below bar plots one of the few “bad practices” that {ggplot2} reinforces. So how do we get rid of it and have the x-axis meet the y-axis at zero? Here’s an example of a bar plot with negative space on the y-axis:

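```r
library(tidyverse)  # loads ggplot2 and the %>% pipe

# Count cars by number of cylinders; the default expansion leaves space below zero
mtcars %>%
  ggplot(aes(x = factor(cyl))) +
  geom_bar() +
  theme_classic()
```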

I think the easiest way to get rid of it is to add scale_y_continuous() and modify its expand argument:

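```r
library(tidyverse)  # loads ggplot2 and the %>% pipe

# No padding below the bars; keep 5% of headroom above the tallest bar
mtcars %>%
  ggplot(aes(x = factor(cyl))) +
  geom_bar() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme_classic()
```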

The expansion() function takes two arguments, add and mult. Values given to add are added to the ends of the axis in data units, while values given to mult are multiplied by the range of the data to decide how much padding to add. You can do the same thing on the x-axis with scale_x_continuous(). The two values I’m giving mult tell scale_y_continuous() not to add anything below the bars and to extend the y-axis 5% beyond my data (5% is the default on both ends of continuous data).
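If you want to see the arithmetic, you can ask {ggplot2} for the axis range it actually computed. This is a sketch; the y_range() helper is my own illustration, not part of {ggplot2}, and it reads the expanded range from the object returned by ggplot_build(). In mtcars there are 11, 7, and 14 cars with 4, 6, and 8 cylinders, so the bars span 0 to 14 and 5% of that range is 0.7.

```r
library(ggplot2)

# Bar plot of how many cars have 4, 6, or 8 cylinders (11, 7, and 14 cars)
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  theme_classic()

# Hypothetical helper: return the expanded y-axis range ggplot2 computed for the panel
y_range <- function(plot) {
  ggplot_build(plot)$layout$panel_params[[1]]$y.range
}

default_range <- y_range(p)
trimmed_range <- y_range(p + scale_y_continuous(expand = expansion(mult = c(0, 0.05))))

default_range  # -0.7 14.7: 5% of the 0 to 14 range added to both ends
trimmed_range  #  0.0 14.7: nothing added below zero
```

Passing add instead, for example expansion(add = c(0, 1)), would pad the top by one count rather than by a fraction of the range.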

Second, I think the bars in these plots are an example of “chartjunk”. They are so common in the figures I look at that I think I must be missing something. Yet I have not found a journal’s instructions to authors that indicates a preference for these plots, and I don’t see published reviewer comments asking for them either. Why do I think they’re chartjunk? They don’t add anything to the plot and can actually hinder the interpretation of the data.

The bars extend to the mean of the data. That’s a lot of ink to represent something that could have been indicated by a horizontal line. Although it’s not the case here, I often see bars drawn when there are only 2 or 3 points (for an example, see last week’s critique video).

How can they hinder interpretation? Without thinking about it, we compare the lengths of things, including the lengths of those bars. Because of that, it is critical that the bars start at zero. If they don’t, very small differences between bars can look quite large. In last week’s critique video I showed several examples from the scientific literature of plots that violate this “rule”. If we show only the points, we go from comparing the lengths of things to comparing their relative positions. Because of this, zooming in on the y-axis isn’t such a problem once the bars are removed.
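Here is a minimal sketch of that alternative, again using mtcars as stand-in data rather than the data from the paper: every observation is drawn as a point and each group’s mean is marked with a short horizontal line instead of a bar. The crossbar trick (setting the mean as the middle, minimum, and maximum) is one common way to draw that line in {ggplot2}, not the only one. Because there are no bars, the default y-axis is free to zoom in on the data rather than start at zero.

```r
library(ggplot2)

set.seed(1)  # jitter is random; fix the seed so the figure is reproducible

# Show every observation, then mark each group's mean with a horizontal line
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.15, height = 0) +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.5) +
  labs(x = "Cylinders", y = "Miles per gallon") +
  theme_classic()
```

Print `p` to draw the plot. The horizontal lines carry the same information as the tops of the bars, with far less ink.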

Finally, I’ll point you to this paper that gives compelling reasons why bar plots are almost never the right answer. I encourage you to give it a read. I think it would be an interesting paper to talk about at a future journal club or lab meeting. If you do, please let me know how your discussion goes!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
