Visualizing the timeline of antibiotic discovery and resistance with ggplot2


Hey folks,

I’m really excited to announce a new program to help you improve the design of your data visualizations. I emailed you about this earlier in the week, so I’ll keep this reminder brief. This data visualization makeover program will last 5 weeks starting at the beginning of September. Each two-hour session will include a discussion of data visualization principles and strategies followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in this program. Why not? Well, I find that people get too hung up on tools. When they get frustrated with the tools they revert to their previous practices. By focusing on concepts, you’ll be able to design and critique any visualization. From there, you can use any tool - even a pencil and piece of paper - to implement your design. Click this button to learn more.


This week, I want to talk about a data visualization I saw in a presentation I attended earlier this week. This plot shows the discovery, first clinical use, and first report of resistance for 38 classes of antibiotics.

This is Figure 3 of the article, “Derivation of a Precise and Consistent Timeline for Antibiotic Development” by Stennett, Back, and Race, which was published in the journal Antibiotics. It’s in an open access journal, so be sure to read the whole thing. Conveniently, the data are provided in Table 1, although its caption says it’s for Figures 1 and 2. It’s actually for Figures 2 and 3.

What stands out about this figure? Well, it was published in 2022 and no new classes of antibiotics had come to the clinic in the previous 15 years. Also, resistance has been found to nearly every class of antibiotics. Yikes!

Beyond those scary stories, what stands out about the design of the figure? First, the orange bars are the “development windows” indicating the time between the discovery and first clinical use. The blue bars are the “resistance windows” indicating the time between the first clinical use and finding resistance. I would likely create those bars using geom_segment(). The y-axis would be the class of antibiotic and the x-axis would be the year.

To pull this off, we’d need four columns: (1) the class of antibiotic, (2) whether the row was in the development or resistance window, (3) the initial year of the window, and (4) the final year of the window. To mark the start and end year of each window we’ll likely need to do some work with pivot_longer(). The data in Table 1 has four columns - “Antibiotic Class”, “Discovery Date”, “Clinical Use Date”, “Resistance Date”. I’d likely rename those “date” columns to be development_start, development_end, and resistance_end. I’d make a copy of development_end and name it resistance_start. Then we could use pivot_longer() with two values in the names_to argument, splitting the column names at the underscore with names_sep, to create columns for the window and the period. This will create a window column with values of “development” and “resistance” and a period column with values of “start” and “end”. It will also create a column called value that has the year. Then I’d use pivot_wider() with the period column as the names_from argument and the value column as the values_from argument to get separate start and end columns. Got it? :)
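Here’s a rough sketch of how that reshaping might look. The two rows of data are placeholders that I made up to show the shape of the table; they are not the values from Table 1. The object names table_1 and windows are also just for illustration.

```r
library(dplyr)
library(tidyr)

# Placeholder rows, NOT the values from Table 1 of the paper.
# Swap in the real table once you've copied it out of the article.
table_1 <- tibble(
  `Antibiotic Class` = c("Class A", "Class B"),
  `Discovery Date` = c(1930, 1950),
  `Clinical Use Date` = c(1940, 1965),
  `Resistance Date` = c(1945, 1980)
)

windows <- table_1 %>%
  rename(
    class = `Antibiotic Class`,
    development_start = `Discovery Date`,
    development_end = `Clinical Use Date`,
    resistance_end = `Resistance Date`
  ) %>%
  # the resistance window starts when the development window ends
  mutate(resistance_start = development_end) %>%
  # split names like "development_start" into window and period
  pivot_longer(
    cols = -class,
    names_to = c("window", "period"),
    names_sep = "_"
  ) %>%
  # spread the period column back out into start and end columns
  pivot_wider(names_from = period, values_from = value)
```

This should leave one row per class and window, with class, window, start, and end columns.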

With geom_segment(), I’d then map the start column to the x aesthetic, the end column to the xend aesthetic, the class column to the y aesthetic, and the window column to the color aesthetic. We’ll likely need to order the classes, which we can do with fct_reorder() using the development start date. To make the segments thick, I'd use the linewidth argument.
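A minimal sketch of that plot is below. Again, the data are made-up placeholders in the shape described above, not the paper’s values. I’ve assumed ggplot2 3.4.0 or later for the linewidth argument, and I set yend explicitly because older versions of geom_segment() require it. The orange and blue are my guesses at the colors in the published figure.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Placeholder data (not from the paper): one row per class and window
windows <- tibble(
  class = rep(c("Class A", "Class B"), each = 2),
  window = rep(c("development", "resistance"), times = 2),
  start = c(1930, 1940, 1950, 1965),
  end = c(1940, 1945, 1965, 1980)
)

plot_data <- windows %>%
  # the earliest start for each class is its discovery year
  group_by(class) %>%
  mutate(discovery = min(start)) %>%
  ungroup() %>%
  # order the classes by discovery year
  mutate(class = fct_reorder(class, discovery))

p <- ggplot(
  plot_data,
  aes(x = start, xend = end, y = class, yend = class, color = window)
) +
  geom_segment(linewidth = 4) +
  scale_color_manual(
    values = c(development = "orange", resistance = "blue")
  )
```

With this ordering the earliest class ends up at the bottom of the y-axis. Adding .desc = TRUE to fct_reorder() would flip that if you’d rather read from the top down.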

The next challenge will be adding the class name to either the left or right side of the bars. I’d likely create x_label and x_hjust columns to set the position of the label. If it’s on the left side of the bars it will be right justified (hjust = 1) and if it’s on the right side of the bars it will be left justified (hjust = 0). To set the x-axis position I’d add a nudge factor to the date. This will probably take some fiddling to make it look right.
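Something like the following could be a starting point for the labels. The placeholder data, the nudge of 2 years, and the rule for deciding which side a label goes on (here, anything discovered after 1940 gets its label on the left) are all assumptions I made up for illustration. The real figure will need its own rule.

```r
library(dplyr)
library(ggplot2)

# Placeholder data (not from the paper): one row per class and window
windows <- tibble(
  class = rep(c("Class A", "Class B"), each = 2),
  window = rep(c("development", "resistance"), times = 2),
  start = c(1930, 1940, 1950, 1965),
  end = c(1940, 1945, 1965, 1980)
)

nudge <- 2 # years between the end of a bar and its label (a guess)

labels <- windows %>%
  group_by(class) %>%
  summarize(first = min(start), last = max(end)) %>%
  mutate(
    on_left = first > 1940, # hypothetical rule for picking a side
    x_label = if_else(on_left, first - nudge, last + nudge),
    x_hjust = if_else(on_left, 1, 0) # right justify on the left side
  )

p <- ggplot(windows, aes(y = class)) +
  geom_segment(
    aes(x = start, xend = end, yend = class, color = window),
    linewidth = 4
  ) +
  geom_text(
    data = labels,
    aes(x = x_label, label = class, hjust = x_hjust)
  ) +
  # the labels replace the y-axis text
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
```

Because hjust is mapped inside aes(), each label can get its own justification from the x_hjust column.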

Another interesting element of this figure is that the authors put the x-axis on the top and bottom of the plot. In my opinion, this design choice is odd. No doubt they wanted to make it easier for us to see the dates. But they made the size of the font so small that it’s pretty hard to read. I’d prefer including fewer year labels (maybe every 20 years?), but making the font size larger, and adding vertical gridlines. The larger font and gridlines should do a better job of making the dates easier to interpret.
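Here’s a sketch of what I have in mind for the x-axis, again with made-up placeholder data. The break positions, the font size of 14, and the gray gridline color are starting guesses that you’d tune for the real figure.

```r
library(ggplot2)

# Placeholder data (not from the paper): one row per class and window
windows <- data.frame(
  class = rep(c("Class A", "Class B"), each = 2),
  window = rep(c("development", "resistance"), times = 2),
  start = c(1930, 1940, 1950, 1965),
  end = c(1940, 1945, 1965, 1980)
)

p <- ggplot(
  windows,
  aes(x = start, xend = end, y = class, yend = class, color = window)
) +
  geom_segment(linewidth = 4) +
  # a label every 20 years on a single axis at the bottom
  scale_x_continuous(breaks = seq(1920, 1980, by = 20)) +
  labs(x = "Year", y = NULL) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(size = 14),               # larger year labels
    panel.grid.major.x = element_line(color = "gray80"), # vertical gridlines
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
```

If you did want the authors’ axis on both the top and bottom, adding sec.axis = dup_axis() to scale_x_continuous() would do it.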

Finally, the authors used a serif font - Times? - in this figure. It looks pretty weird to my eye. The font of the text in the PDF version of the paper is also a serif font, but the font of the text in the HTML version is a sans serif font (WHY!?!). Thinking back to how I discovered this figure, I think it’s useful to know how to customize these types of figures for your own use so that the font and style choice doesn’t look weird when you include it in your own materials.

What do you think of this figure? Feel free to email me back and let me know your thoughts!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
