Do you want to up your data visualization designs in 2026?


Hey folks,

As 2025 is winding down, I want to encourage you to think about your goals for 2026! For many people, designing an effective visualization and then implementing it with the tool of their choice is too much to take on at once. I think this is why many researchers recycle approaches that they see in the literature or that their mentors insist they use. Of course, this perpetuates problematic design practices. What if you could break out of these practices? What if you could tell your mentors, colleagues, reviewers, and anyone else what the strengths and weaknesses are of what you are trying to do versus what they are advising you to do?

I have spent a lot of time creating my own plots, critiquing those of others, and reading the ideas of leaders in the field of data visualization. As you know, I’ve shared many of these ideas in this newsletter and in my YouTube videos. I’m excited to work with you more directly. On January 9th (1-4 PM Eastern), I will be offering a 3-hour Zoom workshop introducing you to the principles that drive effective data visualizations in science. There will be no coding in this workshop. Aside from Zoom to watch along, all you’ll need is some paper and a pen - if you have different colored pens you’ll be in even better shape.

What will I talk about? I’ll tell you the importance of aligning your audience and the format with your data visualization. I’ll give you fancy language like pre-attentive attributes to help you talk with your colleagues about your visualizations. You’ll be (re)introduced to the grammar of graphics framework, which will enable you to dissect any data visualization. Finally, I’ll describe strategies to align the form and function of your visualizations.

Data visualization is hard! This interactive workshop will give you greater confidence to design your own visualizations that effectively convey your science to your audience. I’ll lead you through the material by sharing numerous examples from the popular media and scientific literature. You’re also encouraged to bring your favorite visualization to share with other participants and any visualizations you are already working on.

If this sounds like how you want to start your 2026, click the button below to learn more.

Because a single workshop isn’t enough to put the ideas into practice, I will also be making myself available for one-on-one and group coaching sessions. If you are interested in these sessions, please reply to this email.


Last week I introduced you to a cool microbial ecology paper recently published in Nature Microbiology by Bakkeren and colleagues, “Strain displacement in microbiomes via ecological competition”. On Monday I provided a critique of Figure 2 from this paper. As you may recall, in last week’s newsletter I discussed panels f and g from the figure. I also recreated these panels in Wednesday’s livestream. Today I want to talk about how I’d make panels b through e:

They’re all the same basic panel, each describing a different type of competition. What stands out to me about these panels is that they have a “cartoon” embedded in them to explain the panel’s experiment. I really thought this was slick. I especially liked how they used the same colors in the cartoon that they use for the points and the lines. It’s crystal clear to me that the red data is from the invading strain and the blue data is from the resident strain. How would we make this in R? For the sake of conversation, let’s just think about panel c.

This is actually a scatter plot. We could use geom_point() to plot the data. The time point being sampled would be mapped to the x-axis and the density to the y-axis. Whether the strain was the invader or the resident would be mapped to the color of the circle edge (shape = 21), and whether the strain was the wild type (WT) or the mutant (ΔsrlAEB) would be mapped to the fill color. As I mentioned in the critique video, I would actually prefer to use geom_jitter() to separate the points. It’s hard to tell, but some of the time points have 9 points and others 5. Jittering the data would make it easier to see the number of points. Another issue with the original plot is that the distance between the four time points is the same, although the number of hours between 8 and 24 is not the same as between 24 and 48 or 48 and 72 hours. You could get the original appearance by treating the time points as categories and using scale_x_discrete(). I’d prefer to use scale_x_continuous(). Of course, because the y-axis is on a log10 scale, we need to use scale_y_log10().
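Here is a minimal sketch of that scatter plot. I don't have the paper's data loaded here, so the data frame is simulated, and the column names (hours, role, genotype, density), the pairing of role with genotype, and the colors are all my own placeholders rather than anything from the paper.

```r
library(ggplot2)

# Simulated stand-in data: column names and values are hypothetical,
# not taken from the paper
set.seed(1)
panel_c <- expand.grid(
  hours = c(8, 24, 48, 72),
  role = c("invader", "resident"),
  replicate = 1:5
)
# Arbitrary pairing of role and genotype, just so there is something to fill by
panel_c$genotype <- ifelse(panel_c$role == "invader", "WT", "mutant")
panel_c$density <- 10^runif(nrow(panel_c), min = 4, max = 9)

p_points <- ggplot(
  panel_c,
  aes(x = hours, y = density, color = role, fill = genotype)
) +
  # shape 21 is a circle whose edge uses color and whose interior uses fill;
  # height = 0 jitters only left and right so the densities are not changed
  geom_jitter(shape = 21, width = 1.5, height = 0, size = 2, stroke = 1) +
  # a continuous axis keeps the true spacing between 8, 24, 48, and 72 hours
  scale_x_continuous(breaks = c(8, 24, 48, 72)) +
  scale_y_log10() +
  scale_color_manual(values = c(invader = "red", resident = "blue")) +
  scale_fill_manual(values = c(WT = "white", mutant = "gray"))

p_points
```

Setting height = 0 matters on a plot like this. By default geom_jitter() also nudges points vertically, which would quietly change the plotted densities.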

How about the line through the points? One approach would be to create a separate data frame that has the median density at each time point and then use that as the data for a call to geom_line(). But that’s tedious. Instead, we need to learn about stat_summary(). You give stat_summary() a fun argument that indicates the statistical summary to apply to the data. In our case, fun = median will work. Then we need a geometry to represent the summary on the plot. We’ll use geom = "line". This will get us a line connecting the medians at the various time points. We actually want to call stat_summary() before geom_point()/geom_jitter() so the line goes behind the points.
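As a rough sketch of the median line, here is stat_summary() on a small simulated data frame. The data and column names are made up for illustration.

```r
library(ggplot2)

# Simulated stand-in data: five hypothetical replicates per time point
set.seed(2)
toy <- data.frame(
  hours = rep(c(8, 24, 48, 72), each = 5),
  density = 10^runif(20, min = 4, max = 9)
)

p_line <- ggplot(toy, aes(x = hours, y = density)) +
  # added first so the line is drawn underneath the points
  stat_summary(fun = median, geom = "line") +
  geom_jitter(shape = 21, width = 1.5, height = 0) +
  scale_x_continuous(breaks = c(8, 24, 48, 72)) +
  scale_y_log10()

p_line
```

One thing to keep in mind: the scale transformation is applied before the summary is calculated, so the median here is taken on the log10 values. With an odd number of points that gives the same answer as the median of the raw values. With an even number it will fall at the geometric mean of the middle two points rather than their arithmetic mean. If you map color to the invader/resident column, you'll get one line per strain.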

Looking at the y-axis you might notice some interesting numbers. The title and the text both have numbers in superscripts. We can get that using the sup HTML tag. For example, 10<sup>6</sup> will render as 10⁶ when we use element_markdown() from {ggtext}. You can use the labels argument in scale_y_log10() to change 1e6 to 10<sup>6</sup>. Cool, eh?
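Here's one way the superscript labels could be wired up. The helper label_sup10() is a name I made up, and it assumes the breaks sit on whole powers of ten. The y-axis title text is also a placeholder, not the wording from the paper.

```r
library(ggplot2)
library(ggtext)

# Hypothetical helper: turn a power of ten such as 1e6 into "10<sup>6</sup>"
# Assumes every break is a whole power of ten; round() guards against
# floating point noise in the exponent
label_sup10 <- function(x) {
  paste0("10<sup>", round(log10(x)), "</sup>")
}

# Simulated stand-in data
toy <- data.frame(
  hours = c(8, 24, 48, 72),
  density = c(1e4, 1e6, 1e8, 1e9)
)

p_axis <- ggplot(toy, aes(x = hours, y = density)) +
  geom_point() +
  scale_y_log10(breaks = 10^(4:9), labels = label_sup10) +
  # placeholder axis title, just to show a superscript in the title
  labs(y = "Density (CFU g<sup>-1</sup>)") +
  theme(
    # without element_markdown() the tags would print as literal text
    axis.text.y = element_markdown(),
    axis.title.y = element_markdown()
  )

p_axis
```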

Now that I’ve mentioned {ggtext}, let’s turn to the title of each panel. There are several packages that are useful for inserting images into {ggplot2} figures. The easiest approach I’ve settled on is to use {ggtext} with the <img> tag. In this approach, you could use labs() and set the title= argument to be the string you want with the <img> tag. Assuming we have the image stored as cartoon_c.png, I could use the following string to set the title for panel c:

labs(title = "Invader private nutrient<br>No toxins<br><img src='cartoon_c.png' height='40'>")  # the height of 40 is a placeholder; adjust to fit

The height value will scale the height of the image in pixels, so finding the right size will require some fiddling. Of course, we’ll need to use plot.title = element_markdown() in the theme() function to render the HTML. Cool, eh?
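To see how the title string and the theme fit together, here is a small sketch. It assumes cartoon_c.png sits in your working directory (the plot object builds without it, but drawing the plot needs the file), and the height of 40 is a placeholder I picked, not a recommended value.

```r
library(ggplot2)
library(ggtext)

cartoon_file <- "cartoon_c.png" # file name from the text; must exist to render
cartoon_height <- 40            # placeholder value, adjust by eye

# Build the title as HTML: two lines of text, then the cartoon image
panel_title <- paste0(
  "Invader private nutrient<br>No toxins<br>",
  "<img src='", cartoon_file, "' height='", cartoon_height, "'>"
)

# Simulated stand-in data
toy <- data.frame(
  hours = c(8, 24, 48, 72),
  density = c(1e4, 1e6, 1e8, 1e9)
)

p_title <- ggplot(toy, aes(x = hours, y = density)) +
  geom_point() +
  scale_y_log10() +
  labs(title = panel_title) +
  # element_markdown() renders the <br> and <img> tags in the title
  theme(plot.title = element_markdown())

# Print p_title once cartoon_c.png is in place to draw the panel
```

Keeping the height in its own variable makes the fiddling a little less painful, since you only have to change one number and re-run.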

I’m planning on building this out in Wednesday’s livestream (9AM Eastern), so be on the lookout for that video. While I’m talking about livestreams… Can I tell you how much I’m learning by doing these? In each of these a viewer will make a comment like, “Why don’t you do it this way?” or “If you do it this way, then you can do this”. In each of these cases, “this way” never occurred to me. I never would have tried it “this way” if I had been recording and editing videos like I was a year ago. If I’m learning, then I’m sure others are too!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat
