Leverage your experimental design to improve the visualization of your data


Hey folks,

I’ve been getting asked to give more talks about data visualization and my experiences critiquing visualizations. It’s been a lot of fun to engage with live audiences. I enjoy learning about their experiences, motivations, and limitations. As much as I love this newsletter and the content I post to YouTube, it’s clear that they aren’t a substitute for talking to people without the filter of email or a chat box. So, if you’re interested in working with me on an individual or group level to improve your data visualizations, let me know. I provide free 30-minute exploratory meetings to discuss how we might work together to design the figures for your next paper, talk, or poster. You can sign up by clicking the button below!


The talks I’ve given have forced me to synthesize my observations about the common challenges people face when visualizing their data. I don’t just mean how to do A with tool X (or R!). Rather, I mean how to translate visually what they are trying to say with words. Surprisingly, many people find it challenging to arrange treatment groups in a way that facilitates the comparison they want to make. In last week’s panels I showed how the authors wanted us to compare panels i, k, and m to each other. Why not make those panels i, j, and k? Why not put the data from those three panels into one panel? Doing so would have made it so much easier to compare the data in those three panels.

This week I have a panel with a similar problem. The authors actually had a “built in” way to link their data but didn’t take advantage of it. Here’s panel d from Figure 3 of the paper “Acarbose redirects gut microbiome utilization of dietary carbohydrates to suppress anaphylaxis in mice” which was recently published in Nature Microbiology.

In this experiment the authors had four groups of 8 mice that they sensitized to ovalbumin for 21 days. For the next 7 days the mice were either (i) left untreated (i.e., Ctrl), (ii) treated with acarbose (i.e., Acr), (iii) treated with antibiotics (i.e., Abx), or (iv) treated with acarbose and antibiotics (i.e., Abx + Acr). The authors obtained fecal pellets from the 32 mice at 21 days (i.e., Before) and 28 days (i.e., After).
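To make that design concrete, here is a minimal sketch in Python of how the samples could be laid out as a long table. The field names (`mouse`, `group`, `time`) and the mouse ID scheme are my own placeholders, not anything from the paper. The only point is that every mouse appears twice, once per time point, and that link is what the rest of this newsletter leans on.

```python
from itertools import product

# Group labels and time points as described above.
GROUPS = ["Ctrl", "Acr", "Abx", "Abx + Acr"]
TIMES = ["Before", "After"]  # day 21 and day 28
MICE_PER_GROUP = 8

def build_design():
    """Return one record per fecal sample: 4 groups x 8 mice x 2 time points."""
    samples = []
    for group, mouse_number in product(GROUPS, range(1, MICE_PER_GROUP + 1)):
        mouse_id = f"{group}_{mouse_number}"  # hypothetical ID scheme
        for time in TIMES:
            samples.append({"mouse": mouse_id, "group": group, "time": time})
    return samples

samples = build_design()
n_mice = len({s["mouse"] for s in samples})
print(n_mice, "mice,", len(samples), "samples")  # prints: 32 mice, 64 samples
```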

As you can see from the comparison bars in this figure, they don’t really know what they should be comparing. They should be comparing the same group (e.g., Abx + Acr) between the two time points. The only group they do this for is the acarbose group. The three other comparisons they made were between the after control and the after acarbose, antibiotics, and acarbose + antibiotics groups.

I’m not sure why they did it this way. Their layout makes it harder to compare the same group across time points. Also, I’m positive they used the incorrect test. The caption indicates they used a “one-way ANOVA with Dunn’s multiple-comparison post hoc test”. They should have used a paired t-test. This likely would have given them smaller p-values. Let me explain each of these points.

First, the best control for each of the treatments in the after group was the same treatment in the before group. The samples were collected from the same mice. If I were in a study looking at the impact of antibiotics in humans, the best control for me after antibiotics would be me before antibiotics - not a separate group of individuals not receiving antibiotics. Sometimes a separate control group is necessary, as in a retrospective study. But not here, where they collected before and after samples from each animal. So put those data next to each other. It helps the audience, but it also does a better job of reflecting how the experiment was performed. If you are drawing comparison bars across multiple groups to indicate the comparison you want to show, consider whether those groups could actually be next to each other.

Second, a paired test is preferable to a one-way ANOVA. Again, because we have before and after data we can test the change in the Shannon index, which is what the authors are interested in. It’s subtle, but the test they performed tested for a difference in the Shannon index between groups rather than a change within each animal. The paired test would be more powerful because it effectively controls for the initial variation in Shannon indices across animals. Also, they don’t seem interested in the comparison across the four groups, so I’m not sure the post hoc test is necessary. The comparisons they made could be done with four paired t-tests. Alternatively, if they were interested in comparing the change in the Shannon index between the four treatment groups, then they could have done the one-way ANOVA with the test for multiple comparisons using the before and after differences in Shannon indices.
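Here is a small sketch of why pairing matters, using only the Python standard library. The Shannon indices below are numbers I made up for 8 hypothetical mice in one treatment group; they are not the paper's data. I also use an unpaired Welch t statistic as a simple stand-in for any test that ignores the pairing, so this is not a reproduction of the ANOVA the authors ran.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Made-up Shannon indices for 8 hypothetical mice in one treatment group.
# These are NOT the paper's data; they only illustrate the arithmetic.
before = [4.1, 3.2, 3.8, 2.9, 4.4, 3.5, 3.0, 4.0]
after = [3.7, 2.9, 3.3, 2.6, 3.9, 3.2, 2.5, 3.6]

def paired_t(x, y):
    """t statistic for a paired t-test: mean within-animal change over its standard error."""
    diffs = [b - a for a, b in zip(x, y)]  # y minus x for each animal
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def welch_t(x, y):
    """t statistic for an unpaired (Welch) test, which ignores which animal is which."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se

print(f"paired t   = {paired_t(before, after):.2f}")
print(f"unpaired t = {welch_t(before, after):.2f}")
```

With these invented numbers the mean change is the same in both calculations, but the paired statistic is far larger in magnitude because the animal-to-animal spread in starting values drops out of the denominator. To turn either statistic into a p-value you would need the t distribution, for example via `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`. The same per-animal differences are what you would feed into a one-way ANOVA if you wanted to compare the change across the four groups.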

My suggestion for refactoring this panel would include several steps. First, I would put the four treatment groups across the x-axis. Within each treatment group I would dodge the before and after points and likely jitter the individual points. Second, I would use a different shape or color for the before and the after points. Third, I would draw a segment connecting the before and after point for each animal. Aside from making it easier to see the comparisons, this design would better reflect the experimental design. Furthermore, by connecting the points, it would become much easier to see whether there was a downward trend in the Shannon index.

Dang. I think that's a much better figure. Wouldn't you know it, the p-values are smaller as well!
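If you want to try that layout yourself, here is a rough sketch in Python rather than the code I used. It computes the dodged and jittered x positions and the before-to-after segment for each animal. The dodge and jitter sizes, the function names, and the toy values are all my own assumptions for illustration, and the toy values are not the paper's data. The drawing step assumes matplotlib is installed; the layout step does not.

```python
import random

GROUPS = ["Ctrl", "Acr", "Abx", "Abx + Acr"]
DODGE = 0.2    # offset of Before (left) and After (right) from the group center
JITTER = 0.05  # maximum random horizontal nudge per animal

def layout_segments(data, seed=1):
    """Turn {group: [(before, after), ...]} into one segment per animal.

    Each segment is (x_before, y_before, x_after, y_after).
    """
    rng = random.Random(seed)  # fixed seed so the jitter is reproducible
    segments = []
    for center, group in enumerate(GROUPS):
        for y_before, y_after in data.get(group, []):
            # Same nudge on both ends, so jitter never changes a segment's slope.
            nudge = rng.uniform(-JITTER, JITTER)
            segments.append((center - DODGE + nudge, y_before,
                             center + DODGE + nudge, y_after))
    return segments

def draw(segments, path="shannon_paired.png"):
    """Optional rendering; matplotlib is imported here so the layout code runs without it."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file without needing a display
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    for x0, y0, x1, y1 in segments:
        ax.plot([x0, x1], [y0, y1], color="gray", linewidth=0.8, zorder=1)
    ax.scatter([s[0] for s in segments], [s[1] for s in segments],
               marker="o", label="Before", zorder=2)
    ax.scatter([s[2] for s in segments], [s[3] for s in segments],
               marker="^", label="After", zorder=2)
    ax.set_xticks(range(len(GROUPS)))
    ax.set_xticklabels(GROUPS)
    ax.set_ylabel("Shannon index")
    ax.legend()
    fig.savefig(path)

# Made-up values for two hypothetical mice per group; NOT the paper's data.
toy = {g: [(4.0, 3.6), (3.5, 3.0)] for g in GROUPS}
segments = layout_segments(toy)
```

Calling `draw(segments)` writes the figure to a PNG file. I used the same jitter for both ends of each segment so that the slope of a line only reflects the change in the Shannon index, which is the thing we want the eye to pick up.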

I’ve shared panel d from this figure with you to illustrate these points. But, there are several other panels consisting of an ordination, a stacked bar plot, and several box plots that all suffer from similar issues. I encourage you to check out the other panels and see if you can’t draw what these plots would look like by arranging the data to more easily compare the before and after points.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
