Visualizing the timeline of antibiotic discovery and resistance with ggplot2


Hey folks,

I’m really excited to announce a new program to help you improve the design of your data visualizations. I emailed you about this earlier in the week, so I’ll keep this reminder brief. This data visualization makeover program will last 5 weeks starting at the beginning of September. Each two-hour session will include a discussion of data visualization principles and strategies followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in this program. Why not? Well, I find that people get too hung up on tools. When they get frustrated with the tools they revert to their previous practices. By focusing on concepts, you’ll be able to design and critique any visualization. From there, you can use any tool - even a pencil and piece of paper - to implement your design.


This week, I want to talk about a data visualization that I saw in a presentation I attended a few days ago. This plot shows the discovery, first clinical use, and first report of resistance for 38 classes of antibiotics.

This is Figure 3 of the article, “Derivation of a Precise and Consistent Timeline for Antibiotic Development” by Stennett, Back, and Race, which was published in the journal Antibiotics. It’s an open access journal, so be sure to read the whole thing. Conveniently, the data are provided in Table 1, although its caption says it’s for Figures 1 and 2 - it’s actually for Figures 2 and 3.

What stands out about this figure? Well, it was published in 2022 and no new classes of antibiotics had come to the clinic in the previous 15 years. Also, resistance has been found to nearly every class of antibiotics. Yikes!

Beyond those scary stories, what stands out about the design of the figure? First, the orange bars are the “development windows” indicating the time between the discovery and first clinical use. The blue bars are the “resistance windows” indicating the time between the first clinical use and finding resistance. I would likely create those bars using geom_segment(). The y-axis would be the class of antibiotic and the x-axis would be the year.

To pull this off, we’d need four columns: (1) the class of antibiotic, (2) whether the row was in the development or resistance window, (3) the initial year of the window, and (4) the final year of the window. To mark the start and end year of each window we’ll likely need to do some work with pivot_longer(). The data in Table 1 have four columns - “Antibiotic Class”, “Discovery Date”, “Clinical Use Date”, “Resistance Date”. I’d likely rename those “date” columns to be development_start, development_end, and resistance_end. I’d make a copy of development_end and name it resistance_start. Then we could use pivot_longer() with two values in the names_to argument, splitting the column names on the underscore with names_sep, to create columns for the window and the period. This will create a window column with values of “development” and “resistance” and a period column with values of “start” and “end”. It will also create a column called value that has the year. Then I’d use pivot_wider() with the period column in the names_from argument and the value column in the values_from argument. That leaves one row per class and window with start and end columns. Got it? :)
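Here’s a rough sketch of how that reshaping might look. The rows are placeholders I made up for illustration (they are not the values from Table 1), and the short column names discovery, clinical_use, and resistance are my own stand-ins for the table’s headers.

```r
library(tidyverse)

# Placeholder rows for illustration only; NOT values from Table 1.
# The column names are assumed stand-ins for the table's headers.
antibiotics <- tribble(
  ~class,    ~discovery, ~clinical_use, ~resistance,
  "class A", 1930,       1940,          1950,
  "class B", 1945,       1955,          1975,
  "class C", 1960,       1985,          1990
)

windows <- antibiotics %>%
  # rename the date columns so each name reads <window>_<period>
  rename(development_start = discovery,
         development_end = clinical_use,
         resistance_end = resistance) %>%
  # the resistance window starts when the development window ends
  mutate(resistance_start = development_end) %>%
  # split each column name at the underscore into window and period
  pivot_longer(-class,
               names_to = c("window", "period"),
               names_sep = "_") %>%
  # spread the period values back out into start and end columns
  pivot_wider(names_from = period, values_from = value)

windows
```

The result should have one row per class and window, with columns class, window, start, and end. If a class has no reported resistance date, its resistance row would carry an NA in the end column, so that case would need separate handling.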

With geom_segment(), I’d then map the start column to the x aesthetic, the end column to the xend aesthetic, the class column to the y aesthetic, and the window column to the color aesthetic. We’ll likely need to order the classes, which we can do with fct_reorder() using the development start date. To make the segments thick, I'd use the linewidth argument.
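A minimal sketch of that plotting step is below. Again, the data are made-up placeholders in the class/window/start/end layout, and the colors and linewidth value are guesses on my part rather than what the authors used.

```r
library(tidyverse)

# Placeholder data in the long layout; NOT values from Table 1
windows <- tribble(
  ~class,    ~window,       ~start, ~end,
  "class A", "development", 1930,   1940,
  "class A", "resistance",  1940,   1950,
  "class B", "development", 1945,   1955,
  "class B", "resistance",  1955,   1975,
  "class C", "development", 1960,   1985,
  "class C", "resistance",  1985,   1990
)

# Order classes by their earliest year, which is the development start
plot_data <- windows %>%
  mutate(class = fct_reorder(class, start, .fun = min))

p <- ggplot(plot_data,
            aes(x = start, xend = end, y = class, yend = class,
                color = window)) +
  geom_segment(linewidth = 4) +  # thick segments read as bars
  scale_color_manual(values = c(development = "orange",
                                resistance = "blue")) +
  labs(x = "Year", y = NULL, color = NULL)

p
```

With a discrete y-axis the first factor level is drawn at the bottom, so adding .desc = TRUE to fct_reorder() would put the earliest class at the top instead.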

The next challenge will be adding the class name to either the left or right side of the bars. I’d likely create x_label and x_hjust columns to set the position of the label. If it’s on the left side of the bars it will be right justified (hjust = 1) and if it’s on the right side of the bars it will be left justified (hjust = 0). To set the x-axis position I’d add a nudge factor to the date. This will probably take some fiddling to make it look right.
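Something like the following could be a starting point for the labels. The side column, the first_year and last_year columns, and the nudge value of 2 years are all hypothetical choices of mine, and the rows are placeholders rather than data from the paper.

```r
library(tidyverse)

# Placeholder data; NOT from Table 1. `side` is a hypothetical column
# saying which side of the bars the label should sit on.
labels <- tribble(
  ~class,    ~first_year, ~last_year, ~side,
  "class A", 1930,        1950,       "right",
  "class B", 1945,        1975,       "right",
  "class C", 1960,        1990,       "left"
)

nudge <- 2  # years of padding between bar and label; tune by eye

labels <- labels %>%
  mutate(
    # left-side labels sit before the first year, right-side after the last
    x_label = if_else(side == "left", first_year - nudge, last_year + nudge),
    # left-side labels are right justified, right-side are left justified
    x_hjust = if_else(side == "left", 1, 0)
  )

p <- ggplot(labels, aes(y = class)) +
  geom_segment(aes(x = first_year, xend = last_year, yend = class),
               linewidth = 4, color = "gray60") +
  geom_text(aes(x = x_label, label = class, hjust = x_hjust)) +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

p
```

Because hjust is mapped as an aesthetic, each label picks up its own justification from the x_hjust column, which avoids needing separate geom_text() layers for the left and right labels.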

Another interesting element of this figure is that the authors put the x-axis on the top and bottom of the plot. In my opinion, this design choice is odd. No doubt they wanted to make it easier for us to see the dates. But they made the size of the font so small that it’s pretty hard to read. I’d prefer including fewer year labels (maybe every 20 years?), but making the font size larger, and adding vertical gridlines. The larger font and gridlines should do a better job of making the dates easier to interpret.
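Here’s a sketch of what I mean for the x-axis. The break range, font size, and gridline color are assumptions that would need adjusting for the real data, and the rows are placeholders.

```r
library(tidyverse)

# Placeholder data; NOT values from Table 1
windows <- tribble(
  ~class,    ~window,       ~start, ~end,
  "class A", "development", 1930,   1940,
  "class A", "resistance",  1940,   1950,
  "class B", "development", 1945,   1955,
  "class B", "resistance",  1955,   1975
)

# Fewer year labels: one every 20 years (the range here is a guess)
year_breaks <- seq(1920, 2000, by = 20)

p <- ggplot(windows,
            aes(x = start, xend = end, y = class, yend = class,
                color = window)) +
  geom_segment(linewidth = 4) +
  # adding sec.axis = dup_axis() here would mirror the axis on top,
  # which is closer to what the authors did
  scale_x_continuous(breaks = year_breaks,
                     limits = range(year_breaks)) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(size = 14),                 # larger year labels
    panel.grid.major.x = element_line(color = "gray80"),   # vertical gridlines
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_blank()
  )

p
```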

Finally, the authors used a serif font - Times? - in this figure. It looks pretty weird to my eye. The font of the text in the PDF version of the paper is also a serif font, but the font of the text in the HTML version is a sans serif font (WHY!?!). Thinking back to how I discovered this figure, I think it’s useful to know how to customize these types of figures for your own use so that the font and style choices don’t look weird when you include them in your own materials.

What do you think of this figure? Feel free to email me back and let me know your thoughts!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
