Some thoughts on how to recreate a descending bar plot with {ggplot2}


Hey folks,

This week I hosted the first live ensemble programming session. It went really well. We had fun and learned a lot. If you’d like to get in on these types of sessions, let me know and I’ll be sure you get a special invitation for the next series. I really believe that this form of instruction is critical to making the material learned in compact workshops stick for the long term.


I hope you had fun working with the broken axis chart last week! This week I want you to look at Figure 5 of “Strategies for effective high pressure germination or inactivation of Bacillus spores involving nisin” by Rosa Heydenreich and colleagues, which was recently published in Applied and Environmental Microbiology.

You probably would like a little context. This is from a paper looking at using pressure to get bacteria to form spores or leave the spore state. The analysis was done before and after a heat treatment (as indicated in the legend) using four different methods (across the x-axis). They measured the number of spores observed for each condition and expressed it as the log fraction of the number of spores put into the experiment (N0 = 10^9). The error bars indicate the standard deviation across at least three independent experiments.

What type of plot is this? What stands out to you about this figure? What do you like about it? What don’t you like about it? Can you outline the steps you would take to generate the figure? What are some of the steps you aren’t sure about and would like to learn? These are questions that I’d strongly encourage you to ask about any visual you are looking at because I think they’ll help you to develop your “taste” in data visualizations and strengthen your skills in generating those visualizations.

This is a bar plot. Here are five things that caught my eye. First, this bar plot has its x-axis at the top and descends into negative log values. Second, they have hashing in the bars for the “after heat” category. Third, their legend is below the plot, has italics, and has a box around it. Fourth, they only have horizontal grid lines, with a thicker, dashed grid line to indicate the limit of detection at -8. Finally, I noticed that the tick marks point into the plot rather than out of the plot, which is the default.

Here’s some data for you to experiment with:


library(tidyverse)

heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

First, let’s talk about the bar plot. You may be tempted to use geom_bar() to generate your bars. Unfortunately, that will get you an error. That is because geom_bar() counts the number of observations in each category for you, so it won’t accept the values we have mapped to the y-axis. Instead, you want to use the similar function geom_col(), which will plot the data mapped to the y-axis as a bar starting at zero. This plot is a little more complicated than a typical bar plot because each value on the x-axis is represented by two bars. You can put the treatments across the x-axis by mapping treatment to x and log_fraction to y. To get the two bars for each x-axis value, you need two steps. First, map heat to the fill aesthetic. But this will give you a stacked bar plot (booo!). Second, to unstack the bars, you’ll need to set the position argument within geom_col(). Look at the help documentation and see if you can figure out which value to use. I’ll tell you below if you read all the way to the end! A subtle touch to indicate that the data start at 0 and descend is to draw a solid black line that crosses the y-axis at 0. You can achieve this look using geom_hline(). Finally, you can add the error bars with geom_errorbar() by mapping log_fraction - sd to ymin and log_fraction + sd to ymax. You might have to play with the width and position arguments to get their placement correct. Remember that wherever I talk about “mapping” data from a column to an aesthetic, this should typically be done within the aes() function, and I normally put that as an argument to ggplot(). Do you know which argument aes() is assigned to?
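Here’s a minimal sketch of the starting point described above, assuming {ggplot2} and {tibble} are installed. It only covers geom_col() with heat mapped to fill and the baseline from geom_hline(); the position argument is left at its default, so the bars are still stacked. The object name p_stacked is just something I made up for illustration.

```r
library(ggplot2)
library(tibble)

# Same practice data as above
heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

# geom_col() draws each bar from 0 to the value mapped to y.
# With fill mapped to heat and no position argument, the two bars
# for each treatment are stacked on top of each other.
p_stacked <- ggplot(heat_kill, aes(x = treatment, y = log_fraction, fill = heat)) +
  geom_col() +
  geom_hline(yintercept = 0, color = "black")  # solid baseline at y = 0

p_stacked
```

Swapping geom_col() for geom_bar() in this sketch is a quick way to see the error for yourself.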

The second eye catcher is that they have diagonal lines for the bars representing what happened after the heat treatment. I think this general look comes to us from many years of using M$Excel. My personal preference would be to leave out the diagonal hashing since I think it unnecessarily clutters the bars. Why not use the two shades of blue and call it a day? Anyway, there is a cool looking {ggpattern} package that will add patterns to your plots including diagonals to bar plots. They have a special geom to use in place of geom_col(), which is geom_col_pattern(). If you give that package a try, let me know how it goes!
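If you want a starting point for {ggpattern}, here is a rough sketch. It assumes the package is installed, and the specific pattern settings (density, spacing, colors) are my guesses at a reasonable look rather than anything taken from the paper. I’ve left the position argument alone, so the bars will be stacked until you set it the same way you would for geom_col().

```r
library(ggplot2)
library(ggpattern)  # assumption: {ggpattern} is installed
library(tibble)

heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

# Map heat to both fill and pattern, then choose a pattern per level:
# diagonal stripes for "after heat" and no pattern for "before heat"
p_pattern <- ggplot(heat_kill,
                    aes(x = treatment, y = log_fraction,
                        fill = heat, pattern = heat)) +
  geom_col_pattern(
    color = "black",           # outline of each bar
    pattern_fill = "black",    # fill of the stripes
    pattern_colour = "black",  # outline of the stripes
    pattern_density = 0.1,     # illustrative values; adjust to taste
    pattern_spacing = 0.03
  ) +
  scale_pattern_manual(
    values = c("before heat" = "none", "after heat" = "stripe")
  )

p_pattern
```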

Third, they were able to format their treatment categories so that they could nicely tuck the legend on the left side of the axis. How’d they do that? I’d likely use scale_x_discrete() with the breaks argument set to the values in treatment and the labels argument set to written out labels that match what is in the figure (use \n for line breaks, and remember you can use unicode for the degree sign). To move the legend, I would likely make use of legend.position = "inside" and then use the legend.position.inside argument in theme(). The legend.position.inside argument allows you to provide a vector of x and y positions to place the legend. You’ll have to play around with x and y values to get a look you like. You can use negative values too. To get rid of the legend title, you can either do that in scale_fill_manual() when you fix the colors of the bars or you can use fill = NULL in the labs() function when you rename the x and y-axes. To put a box around the legend, I would modify the legend.background argument in theme() and give it element_rect(color = "black"). To make sure the padding is the same all the way around the legend, you can use the legend.margin argument in theme() and give it margin(). The margin() function takes four numbers for the top, right, bottom, and left margins (remember: TRouBLe). Play with those numbers to get the right look - maybe start with 1 and go from there. Finally, to get the italic font face on the legend labels, you can again use theme() and modify legend.text with element_text() and give it face = "italic". That’s a lot of manipulating the theme() function! I commend these authors for doing their best to conserve space by putting the legend in some extra whitespace.
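Here’s a sketch that pulls those legend and label pieces together, assuming {ggplot2} 3.5.0 or later (needed for legend.position = "inside"). The x-axis labels are made-up placeholders, not the labels from the paper, and the legend coordinates are just a first guess that you’ll want to adjust. The bars are left at the default position so the focus stays on the scales and theme.

```r
library(ggplot2)
library(tibble)

heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

p_legend <- ggplot(heat_kill, aes(x = treatment, y = log_fraction, fill = heat)) +
  geom_col() +
  scale_x_discrete(
    breaks = c("A", "B", "C", "D"),
    # Placeholder labels (hypothetical); \n breaks the line and
    # \u00B0 is the unicode degree sign
    labels = c("Method A\n(80\u00B0C)", "Method B\n(80\u00B0C)",
               "Method C\n(80\u00B0C)", "Method D\n(80\u00B0C)")
  ) +
  labs(fill = NULL) +                                   # drop the legend title
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.15, 0.15),             # x, y; tweak to taste
    legend.background = element_rect(color = "black"),  # box around the legend
    legend.margin = margin(1, 1, 1, 1),                 # top, right, bottom, left
    legend.text = element_text(face = "italic")
  )

p_legend
```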

Fourth, they have done some interesting things with their grid lines. If you use theme_classic(), you’ll get a plot that has no grid lines. That is a good starting point for this figure. Now we’re going to add gray horizontal grid lines. We can do this by using the theme() function. This time we’ll use the panel.grid.major.y argument set to element_line(). You can play with the parameters of element_line() to get the look you like. For the limit of detection, similar to how we placed the line at 0, you can use geom_hline(). To give the line a dashed appearance, you can use the linetype argument and try different integer values. You’ll see the gray grid line showing underneath the dashed line. See if you can figure out how to remove the theme() argument and make your own grid lines to stop this from happening.
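A minimal sketch of the theme()-based version of the grid lines might look like this. The gray shade and line widths are my own choices, and I’m assuming {ggplot2} 3.4.0 or later for the linewidth argument. This is the version where the grid line still shows under the dashed line, so the challenge above is left for you.

```r
library(ggplot2)
library(tibble)

heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

p_grid <- ggplot(heat_kill, aes(x = treatment, y = log_fraction, fill = heat)) +
  geom_col() +
  # Limit of detection at -8: thicker, dashed line (linetype = 2 is dashed)
  geom_hline(yintercept = -8, linetype = 2, linewidth = 1) +
  theme_classic() +  # starts with no grid lines at all
  theme(
    # Add back only the horizontal major grid lines, in gray
    panel.grid.major.y = element_line(color = "gray80", linewidth = 0.3)
  )

p_grid
```

Note that theme() comes after theme_classic() here; in the other order, theme_classic() would wipe out the grid line setting.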

Finally, the plot is doing interesting things with the x-axis ticks by having them go into the plot and by removing them from the y-axis. How would you do that? If your mind went to theme(), you’re catching on! There is a suite of axis.ticks.length arguments that take the unit() function. To move the x-axis ticks to the inside of the plot, you can use axis.ticks.length.x and set it equal to unit(-4, "pt"). That negative sign is giving it a negative length. Normally we’d use a positive value to have the tick on the outside of the plot. How would you use unit() to remove the ticks from the y-axis? Can you think of another function you might use to remove the ticks from the y-axis?
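Here’s a small sketch of the inward-pointing x-axis ticks, assuming theme_classic() as the base theme. It only handles the x-axis, so the y-axis questions above are still yours to answer.

```r
library(ggplot2)
library(tibble)

heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

p_ticks <- ggplot(heat_kill, aes(x = treatment, y = log_fraction, fill = heat)) +
  geom_col() +
  theme_classic() +
  theme(
    # A negative length makes the ticks point into the plot
    axis.ticks.length.x = unit(-4, "pt")
  )

p_ticks
```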

There’s a lot of cool stuff going on in a relatively simple plot! I’m not sure what software they used to make this plot, but it has some really nice points. The more I looked at this figure, the more things I noticed that are different from the default {ggplot2} appearance. I’ll leave these to you to figure out on your own. I’m pretty sure I’ve touched on these in past newsletters: the order of the heat variable values is switched, the y-axis title has two subscripts and there is no x-axis title, the y-axis goes above 0 and down to -10 with no expansion on it at the bottom, and the fill colors are two nice shades of blue.

As always, if you have a cool plot you’d like to share with me for a future newsletter, feel free to reply to this email. Oh yeah, that geom_col() argument that will “dodge” the stacked bars is position = "dodge" or position = position_dodge().
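Now that the answer is out, here’s a minimal sketch of the dodged bars with their error bars. The dodge width of 0.9 and the error bar width of 0.3 are my own starting values, not anything from the paper. Sharing one position_dodge() object between the two geoms is one way to keep the error bars centered on their bars.

```r
library(ggplot2)
library(tibble)

heat_kill <- tibble(
  treatment = rep(c("A", "B", "C", "D"), each = 2),
  heat = rep(c("before heat", "after heat"), 4),
  log_fraction = c(-1, -1, -3.75, -2.33, -5.9, -5.9, -8.2, -8),
  sd = rep(0.1, 8)
)

# Use the same dodge for bars and error bars so they line up
dodge <- position_dodge(width = 0.9)

p_dodged <- ggplot(heat_kill, aes(x = treatment, y = log_fraction, fill = heat)) +
  geom_col(position = dodge) +
  geom_errorbar(
    aes(ymin = log_fraction - sd, ymax = log_fraction + sd),
    position = dodge,
    width = 0.3
  ) +
  geom_hline(yintercept = 0, color = "black")

p_dodged
```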

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
