Visualizing the timeline of antibiotic discovery and resistance with ggplot2


Hey folks,

I’m really excited to announce a new program to help you improve the design of your data visualizations. I emailed you about this earlier in the week, so I’ll keep this reminder brief. This data visualization makeover program will last 5 weeks starting at the beginning of September. Each two-hour session will include a discussion of data visualization principles and strategies followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in this program. Why not? Well, I find that people get too hung up on tools. When they get frustrated with the tools they revert to their previous practices. By focusing on concepts, you’ll be able to design and critique any visualization. From there, you can use any tool - even a pencil and piece of paper - to implement your design. Click this button to learn more.


This week, I want to talk about a data visualization that I saw in a presentation I attended earlier this week. This plot shows the discovery, first clinical use, and first report of resistance for 38 classes of antibiotics.

This is Figure 3 of the article “Derivation of a Precise and Consistent Timeline for Antibiotic Development” by Stennett, Back, and Race, which was published in the journal Antibiotics. It’s an open access journal, so be sure to read the whole thing. Conveniently, the data are provided in Table 1. Although its caption says the table is for Figures 1 and 2, it’s actually for Figures 2 and 3.

What stands out about this figure? Well, it was published in 2022, and no new class of antibiotics had come to the clinic in the previous 15 years. Also, resistance has been found to nearly every class of antibiotics. Yikes!

Beyond those scary stories, what stands out about the design of the figure? First, the orange bars are the “development windows” indicating the time between the discovery and first clinical use. The blue bars are the “resistance windows” indicating the time between the first clinical use and finding resistance. I would likely create those bars using geom_segment(). The y-axis would be the class of antibiotic and the x-axis would be the year.

To pull this off, we’d need four columns: (1) the class of antibiotic, (2) whether the row is in the development or resistance window, (3) the first year of the window, and (4) the last year of the window. Getting the start and end years of each window into their own columns will take some work with pivot_longer() and pivot_wider(). The data in Table 1 have four columns: “Antibiotic Class”, “Discovery Date”, “Clinical Use Date”, and “Resistance Date”. I’d likely rename those “date” columns development_start, development_end, and resistance_end. I’d make a copy of development_end and name it resistance_start. Then we could use pivot_longer() with two values in the names_to argument, window and period, along with names_sep = "_" so that each column name gets split at the underscore. This will create a window column with values of “development” and “resistance” and a period column with values of “start” and “end”. It will also create a column called value that holds the year. Then I’d use pivot_wider(), giving the period column to the names_from argument and the value column to the values_from argument. Got it? :)
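Here’s a minimal sketch of that reshaping, assuming the dplyr and tidyr packages and R 4.1 or later (for the native |> pipe). The three rows are hypothetical placeholders with made-up years, not the real values from Table 1; only the column names follow the table.

```r
library(dplyr)
library(tidyr)

# Hypothetical placeholder rows: NOT the real values from Table 1.
# Only the column names mirror the ones in the table.
table1 <- tibble(
  `Antibiotic Class` = c("Class A", "Class B", "Class C"),
  `Discovery Date` = c(1930, 1960, 1945),
  `Clinical Use Date` = c(1940, 1975, 1955),
  `Resistance Date` = c(1945, 1980, 1970)
)

windows <- table1 |>
  # give the date columns window_period names so they can be split later
  rename(
    class = `Antibiotic Class`,
    development_start = `Discovery Date`,
    development_end = `Clinical Use Date`,
    resistance_end = `Resistance Date`
  ) |>
  # the resistance window starts when the development window ends
  mutate(resistance_start = development_end) |>
  # "development_start" -> window = "development", period = "start"
  pivot_longer(
    cols = -class,
    names_to = c("window", "period"),
    names_sep = "_",
    values_to = "value"
  ) |>
  # back to one row per class and window, with start and end columns
  pivot_wider(names_from = period, values_from = value)

windows
```

The result has one row per class and window. If a class has no reported resistance, its resistance end year would be NA, and ggplot2 will drop that segment with a warning about missing values.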

With geom_segment(), I’d then map the start column to the x aesthetic, the end column to the xend aesthetic, the class column to both the y and yend aesthetics (so the bars stay horizontal), and the window column to the color aesthetic. We’ll likely need to order the classes, which we can do with fct_reorder() using the development start date. To make the segments thick, I'd use the linewidth argument.
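A rough sketch of the plot itself, assuming dplyr, forcats, and ggplot2 3.4.0 or later (when the linewidth argument was introduced). The long-format data are hypothetical and shaped like the output of the reshaping step; the bar thickness and the two colors are my guesses, not values from the paper.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical long-format data (one row per class and window)
windows <- tibble(
  class = rep(c("Class A", "Class B", "Class C"), each = 2),
  window = rep(c("development", "resistance"), times = 3),
  start = c(1930, 1940, 1960, 1975, 1945, 1955),
  end = c(1940, 1945, 1975, 1980, 1955, 1970)
)

# Order classes by development start. Discovery precedes clinical use, so the
# smallest start year within each class is its discovery year.
# The first factor level is drawn at the bottom of the y-axis.
ordered <- windows |>
  mutate(class = fct_reorder(class, start, .fun = min))

p <- ggplot(ordered,
            aes(x = start, xend = end, y = class, yend = class, color = window)) +
  geom_segment(linewidth = 4) +  # thick bars
  scale_color_manual(values = c(development = "orange", resistance = "steelblue")) +
  labs(x = "Year", y = NULL, color = NULL)
```

Typing p at the console draws the plot. Using .fun = min sidesteps the fact that fct_reorder() summarizes with the median by default, which would mix in the resistance start years.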

The next challenge will be adding the class name to either the left or right side of the bars. I’d likely create x_label and x_hjust columns to set the position of each label. If a label is on the left side of its bars, it will be right justified (hjust = 1), and if it’s on the right side, it will be left justified (hjust = 0). To set the label’s position along the x-axis, I’d add a nudge factor to the year (or subtract it for labels on the left). This will probably take some fiddling to make it look right.
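One way to sketch the labels, assuming dplyr and ggplot2. The rule for picking a side (classes discovered in or after 1950 get labels on the left) and the 2-year nudge are hypothetical values I made up for illustration; the real figure would need its own rule and some fiddling.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format data (one row per class and window)
windows <- tibble(
  class = rep(c("Class A", "Class B", "Class C"), each = 2),
  window = rep(c("development", "resistance"), times = 3),
  start = c(1930, 1940, 1960, 1975, 1945, 1955),
  end = c(1940, 1945, 1975, 1980, 1955, 1970)
)

nudge <- 2       # hypothetical nudge, in years
cutoff <- 1950   # hypothetical rule for choosing the label side

# One label per class
labels <- windows |>
  group_by(class) |>
  summarize(first_year = min(start), last_year = max(end), .groups = "drop") |>
  mutate(
    on_left = first_year >= cutoff,
    # left side: just before the first year, right justified
    # right side: just after the last year, left justified
    x_label = if_else(on_left, first_year - nudge, last_year + nudge),
    x_hjust = if_else(on_left, 1, 0)
  )

p <- ggplot(windows, aes(y = class)) +
  geom_segment(aes(x = start, xend = end, yend = class, color = window),
               linewidth = 4) +
  # hjust is mapped as an aesthetic so each label gets its own justification
  geom_text(data = labels, aes(x = x_label, label = class, hjust = x_hjust),
            size = 3) +
  # the class names now sit next to the bars, so drop the y-axis text
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
```

Mapping hjust inside aes() is what lets a single geom_text() layer handle both sides of the bars.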

Another interesting element of this figure is that the authors put the x-axis on both the top and bottom of the plot. In my opinion, this design choice is odd. No doubt they wanted to make it easier for us to see the dates. But they made the font so small that it’s pretty hard to read. I’d prefer fewer year labels (maybe every 20 years?) in a larger font, along with vertical gridlines. The larger font and gridlines should do a better job of making the dates easier to interpret.
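Here’s a sketch of the axis I’d prefer, assuming ggplot2 and the same kind of hypothetical data as above. The 20-year breaks come from the suggestion in the text; the base font size of 14 and the gray gridline color are my own guesses. If you did want to reproduce the authors’ top axis, scale_x_continuous() accepts sec.axis = dup_axis().

```r
library(ggplot2)

# Hypothetical long-format data (one row per class and window)
windows <- data.frame(
  class = rep(c("Class A", "Class B", "Class C"), each = 2),
  window = rep(c("development", "resistance"), times = 3),
  start = c(1930, 1940, 1960, 1975, 1945, 1955),
  end = c(1940, 1945, 1975, 1980, 1955, 1970)
)

p <- ggplot(windows,
            aes(x = start, xend = end, y = class, yend = class, color = window)) +
  geom_segment(linewidth = 4) +
  # one year label every 20 years; breaks outside the data range are dropped
  scale_x_continuous(breaks = seq(1900, 2020, by = 20)) +
  theme_minimal(base_size = 14) +  # larger text throughout
  theme(
    panel.grid.major.x = element_line(color = "gray80"),  # vertical gridlines
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
```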

Finally, the authors used a serif font - Times? - in this figure. It looks pretty weird to my eye. The text in the PDF version of the paper is also in a serif font, but the text in the HTML version is in a sans serif font (WHY!?!). Thinking back to how I came across this figure, in a presentation, I think it’s useful to know how to remake figures like this for your own use so that their font and style don’t look out of place in your own materials.
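Changing the font is a one-line tweak in ggplot2. This sketch assumes the same hypothetical data as above and uses "sans", R’s generic sans serif family; to match your own slides, you’d swap in the name of a font family installed on your system.

```r
library(ggplot2)

# Hypothetical long-format data (one row per class and window)
windows <- data.frame(
  class = rep(c("Class A", "Class B", "Class C"), each = 2),
  window = rep(c("development", "resistance"), times = 3),
  start = c(1930, 1940, 1960, 1975, 1945, 1955),
  end = c(1940, 1945, 1975, 1980, 1955, 1970)
)

p <- ggplot(windows,
            aes(x = start, xend = end, y = class, yend = class, color = window)) +
  geom_segment(linewidth = 4) +
  # base_family sets the font for all theme text (axes, legend, titles);
  # replace "sans" with the family used in the rest of your materials
  theme_minimal(base_family = "sans")
```

Adding theme(text = element_text(family = "sans")) to an existing plot does the same job if you’d rather not switch base themes.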

What do you think of this figure? Feel free to email me back and let me know your thoughts!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each, you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
