What is the most common form of "chartjunk" in the scientific literature?


Hey folks,

If you missed Wednesday’s livestream, I encourage you to go back and check it out. I recreated a pretty typical panel from a paper published in Nature. It was made up entirely of photographs. Sometimes I feel like I’m the only PI who doesn’t merge panels into figures using Illustrator or PowerPoint. I prefer to use R with some help from {cowplot} or {patchwork} to do this for me. That way I can write a single script to generate the entire set of panels. The result is a reproducible workflow that easily regenerates a figure if I need to change something in one of the panels. Since pictures often accompany plots, being able to use R to generate figures with both is critical.

The benefit for reproducibility wasn’t what I loved most about this livestream. I was all set to use the {ggplot2} helper package {ggh4x} to make nested facets of pictures. A viewer asked if a different approach that didn’t rely on {ggh4x} might work as well. That question really caught me off guard! So I spent the first hour or so making the panel with {ggh4x} and the rest of the livestream making the same panel using their suggestion. This type of live interaction is one of the main reasons I have grown to love doing livestreams. To be completely honest, although the differences in the finished products were small, I really preferred the straight {ggplot2} approach. Really, go watch the livestream :)


This week I’m going to share panels i and j from Figure 1 of a recently published paper in Nature, “A mechanism to initiate emergency type 2 myelopoiesis”. This paper has a lot of panels like this one. Based on the text of the paper and the appearance of the panels, I’m pretty sure these were made with R’s {ggplot2} package.

These panels have different data but are the same basic design. There’s a lot going on in these plots that I could comment on (wait for Monday’s video!). But for today, I’d like to make a couple of general points.

First, check out where the x-axis crosses the y-axis in both panels. By default, {ggplot2} adds padding or what it calls “expansion” to the left and right of the data on the x-axis and to the top and bottom of the y-axis limits. This is a feature I look for that tells me someone made a plot with {ggplot2}. This isn’t a problem for most types of plots and is often kind of nice. But I’m starting to think it is a problem for bar plots. If we look at panel j, those bars appear to be floating. That’s fine. But because there’s room below zero, it gives the impression that we could have negative values. In these panels we’re looking at percentages of counts. Those can’t be negative. Now look at panel i. Do you see what they did there? Their bars start on the x-axis, which is in negative space. I have ideas on how they did this, but I’m not sure why they would do this.

I consider the expansion below bar plots one of the few “bad practices” that {ggplot2} reinforces. So how do we get rid of it and have the x-axis hit the y-axis at zero? Here’s an example of a bar plot with negative space on the y-axis:

R
library(tidyverse) # provides ggplot2 and the %>% pipe

# default expansion leaves a gap between the bars and the x-axis
mtcars %>%
  ggplot(aes(x = factor(cyl))) +
  geom_bar() +
  theme_classic()

I think that the easiest way to get rid of it would be to add scale_y_continuous() and modify the expand argument:

R
mtcars %>%
  ggplot(aes(x = factor(cyl))) +
  geom_bar() +
  # no padding below the bars, 5% padding above them
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme_classic()

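The expansion() function takes two arguments: mult and add. Each takes a pair of values, the first for the lower end of the axis and the second for the upper end. The mult values pad the axis by a fraction of the data range, and the add values pad it by a fixed amount in the units of the data. You can do the same thing on the x-axis with scale_x_continuous() (or scale_x_discrete() when the x-axis is categorical, as it is in this example). The two values I’m giving mult tell scale_y_continuous() to not add anything below the bars and to have the y-axis limit go 5% beyond my data at the top (5% is the default on both sides of continuous data).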
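To make the difference between mult and add concrete, here is a minimal sketch using the same mtcars bar plot. The object names (pad_mult, pad_add, p) and the choice of 1 unit of padding are mine and only for illustration. The sketch relies on the fact that expansion() simply returns a numeric vector of four padding values.

```r
library(ggplot2)

# expansion() returns the padding as a plain numeric vector ordered as
# c(lower mult, lower add, upper mult, upper add)
pad_mult <- expansion(mult = c(0, 0.05)) # nothing below, 5% of the range above
pad_add  <- expansion(add = c(0, 1))     # nothing below, 1 count above (arbitrary choice)

# same bar plot as above, but padded by a fixed amount instead of a fraction
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  scale_y_continuous(expand = pad_add) +
  theme_classic()

p
```

With either version the bars sit directly on the x-axis. Which one to use is mostly a matter of taste: mult scales with the data, while add keeps the same amount of headroom regardless of how tall the bars are.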

Second, I think these bar plots are an example of “chartjunk”. They are so common in the figures I look at that I think I must be missing something. Still, I have yet to find an instructions-to-authors document for a journal that indicates a preference for these plots. I also don’t see examples of published reviewer comments asking for them. Why do I think they’re chartjunk? They don’t add anything to the plot and can actually hinder the interpretation of the data.

The bars extend to the mean of the data. That’s a lot of ink to represent something that could have been indicated by a horizontal line. Although it’s not the case here, I often see bars drawn when there are only 2 or 3 points (for an example see last week’s critique video).

How can they hinder interpretation? Without thinking about it, we compare the lengths of things. We compare the lengths of those bars. Because of that, it is critical that the bars start at zero. If they don’t, very small differences between bars can look quite large. I showed several examples of plots violating this “rule” in scientific literature in last week’s critique video. If one were to show only the points, then we go from comparing the lengths of things to comparing their relative positions. Because of this, zooming in on the y-axis isn’t such a problem once the bars are removed.
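Here is a rough sketch of what dropping the bars could look like. It is not a recreation of the panels from the paper. It uses mtcars as a stand-in dataset, and the jitter and line widths are arbitrary choices of mine. Each observation is drawn as a point and the mean of each group is drawn as a short horizontal line.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # every observation, jittered horizontally only so the y values stay exact
  geom_jitter(width = 0.15, height = 0) +
  # a crossbar whose top, middle, and bottom are all the mean is just a line
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.5) +
  labs(x = "Cylinders", y = "Miles per gallon") +
  theme_classic()

p
```

Notice that the y-axis in this version doesn’t start at zero. Since there are no bars whose lengths we would compare, that is much less of a concern.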

Finally, I’ll roll out this paper that gives compelling reasons why bar plots are rarely the right answer. I encourage you to give it a read. I think it would be an interesting paper to talk about at a future journal club or lab meeting. If you do, please let me know how your discussion goes.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
