What is the most common form of "chartjunk" in the scientific literature?


Hey folks,

If you missed Wednesday’s livestream, I encourage you to go back and check it out. I recreated a panel from a paper published in Nature that is pretty typical. It was made up entirely of photographs. Sometimes I feel like I’m the only PI who doesn’t merge panels into figures using Illustrator or PowerPoint. I prefer to use R with some help from {cowplot} or {patchwork} to do this for me. That way I can write a single script to generate the entire set of panels. The result is a reproducible workflow that easily regenerates a figure if I need to change something in one of the panels. Since pictures often accompany plots, being able to use R to generate figures with both is critical.
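Here’s a minimal sketch of that kind of script using {patchwork}. This is not the code from the livestream. The object names are made up for illustration, and a small grayscale gradient stands in for a real photograph, which you would first need to read in from a file.

```r
library(ggplot2)
library(patchwork)

# Stand-in for a photograph: a small grayscale gradient.
# With a real image you would read the file into a raster first.
fake_photo <- as.raster(matrix(seq(0, 1, length.out = 100), nrow = 10))
photo_panel <- wrap_elements(full = grid::rasterGrob(fake_photo))

# An ordinary data panel
scatter_panel <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_classic()

# Place the panels side by side and tag them with lowercase letters
figure <- (photo_panel | scatter_panel) +
  plot_annotation(tag_levels = "a")

figure
```

Because the whole figure comes from one script, changing either panel and re-running it regenerates everything.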

The benefit for reproducibility wasn’t what I loved most about this livestream. I was all set to use the {ggplot2} helper package {ggh4x} to make nested facets of pictures. A viewer asked if a different approach that didn’t rely on {ggh4x} might work as well. That question really caught me off guard! So I spent the first hour or so making the panel with {ggh4x} and the rest of the livestream making the same panel using their suggestion. This type of live interaction is one of the main reasons I have grown to love doing livestreams. To be completely honest, although the differences in the finished products were small, I really preferred the straight {ggplot2} approach. Really, go watch the livestream :)


This week I’m going to share panels i and j from Figure 1 of a recently published paper in Nature, “A mechanism to initiate emergency type 2 myelopoiesis”. This paper has a lot of panels like this one. Based on the text of the paper and the appearance of the panels, I’m pretty sure these were made with R’s {ggplot2} package.

These panels have different data but are the same basic design. There’s a lot going on in these plots that I could comment on (wait for Monday’s video!). But for today, I’d like to make a couple of general points.

First, check out where the x-axis crosses the y-axis in both panels. By default, {ggplot2} adds padding, or what it calls “expansion”, to the left and right of the data on the x-axis and to the top and bottom of the y-axis limits. This is a feature I look for that tells me someone made a plot with {ggplot2}. This isn’t a problem for most types of plots and is often kind of nice. But I’m starting to think it is a problem for bar plots. If we look at panel j, those bars appear to be floating. That’s fine. But because there’s room for negative values, it gives the impression that we could have negative values. In these panels we’re looking at percentages of counts. Those can’t be negative. Now look at panel i. Do you see what they did there? Their bars start on the x-axis, which is in negative space. I have ideas on how they did this, but I’m not sure why they would do this.

I consider the expansion below bar plots one of the few “bad practices” that {ggplot2} reinforces. So how do we get rid of it and have the x-axis hit the y-axis at zero? Here’s an example of a bar plot with negative space on the y-axis:

R
library(tidyverse)  # loads ggplot2 and the %>% pipe

mtcars %>%
  ggplot(aes(x = factor(cyl))) +
  geom_bar() +
  theme_classic()

I think that the easiest way to get rid of it would be to add scale_y_continuous() and modify the expand argument:

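R
library(tidyverse)  # loads ggplot2 and the %>% pipe

mtcars %>%
  ggplot(aes(x = factor(cyl))) +
  geom_bar() +
  # no padding below the bars, 5% padding above them
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme_classic()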

The expansion() function takes two arguments: add and mult. The add values pad the axis by a fixed amount in the units of the data, and the mult values pad it by a fraction of the range of the data. Each takes a pair of values, one for the lower end of the axis and one for the upper end. You can do the same thing on a continuous x-axis with scale_x_continuous(). The two values I’m giving mult tell scale_y_continuous() to not add anything below the bottom of the bars and to have the y-axis limit go 5% beyond my data (5% on both sides is the default for continuous data).
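To see what those values do to the axis, here’s a minimal sketch that builds the same bar plot three ways and pulls out the y-axis range that {ggplot2} computes for each. The object names are my own for illustration, and reading the range from the panel_params element of ggplot_build() assumes a reasonably recent {ggplot2} with the default Cartesian coordinates.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  theme_classic()

# Default: 5% padding on both ends of the y-axis
default_range <- ggplot_build(base_plot)$layout$panel_params[[1]]$y.range

# mult: no padding below the bars, 5% of the data range above them
flush_plot <- base_plot +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
flush_range <- ggplot_build(flush_plot)$layout$panel_params[[1]]$y.range

# add: no padding below the bars, one count of headroom above them
add_plot <- base_plot +
  scale_y_continuous(expand = expansion(add = c(0, 1)))
add_range <- ggplot_build(add_plot)$layout$panel_params[[1]]$y.range

default_range
flush_range
add_range
```

The tallest bar has 14 cars, so the default range dips below zero by 5% of 14, while both modified versions start at exactly zero.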

Second, I think these bar plots are an example of “chartjunk”. They are so common in the figures I look at that I think I must be missing something. Still, I have yet to find an instructions-to-authors document for a journal that indicates a preference for these plots. I also don’t see examples of published reviewer comments asking for them. Why do I think they’re chartjunk? They don’t add anything to the plot and can actually hinder the interpretation of the data.

The bars extend to the mean of the data. That’s a lot of ink to represent something that could have been indicated by a horizontal line. Although it’s not the case here, I often see bars drawn when there are only 2 or 3 points (for an example see last week’s critique video).

How can they hinder interpretation? Without thinking about it, we compare the lengths of things. We compare the lengths of those bars. Because of that, it is critical that the bars start at zero. If they don’t, very small differences between bars can look quite large. I showed several examples of plots violating this “rule” in the scientific literature in last week’s critique video. If one were to show only the points, then we go from comparing the lengths of things to comparing their relative positions. Because of this, zooming in on the y-axis isn’t such a problem once the bars are removed.
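As a rough sketch of that alternative, the code below shows every point and marks each group’s mean with a short horizontal line instead of a bar. It uses mtcars as a stand-in dataset rather than the data from the paper, and the stat_summary() crossbar with its minimum and maximum both set to the mean is just one of several ways to draw the line.

```r
library(ggplot2)

points_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # Show every observation; jitter sideways only so the y values stay exact
  geom_jitter(width = 0.1, height = 0) +
  # Mark the mean of each group with a horizontal line instead of a bar
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.5) +
  theme_classic()

points_plot
```

With no bars to compare by length, the y-axis here doesn’t need to start at zero.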

Finally, I’ll roll out this paper that gives compelling reasons why bar plots are almost never the right answer. I encourage you to give it a read. I think it would be an interesting paper to talk about at a future journal club or lab meeting. If you do, please let me know how your discussion goes.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
