Data analysis enters the multiverse


Hey folks,

I’m not sure if you’re familiar with Brian Nosek - a psychology professor at the University of Virginia - but I find his work on reproducible research practices fascinating. Last week I asked you to read one of his papers, “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results” by Silberzahn et al. This study took a single complex dataset and recruited a diverse array of analysts to determine whether soccer referees were more likely to give “red cards” to players with darker skin tones.

They attracted 61 analysts who formed 29 teams, each of which took a different approach to testing this hypothesis. Of the 29 teams, 20 (69%) found a significant positive association between skin tone and receiving a red card; the other 9 found no significant association. The teams worked independently, but at several points in the project they were able to critique each other’s work to improve their analyses. In spite of their diversity, the analysis plans were all considered valid approaches. Surprisingly, whether a team found a significant association was not related to the rated quality of its analysis plan, the analysts’ prior beliefs about the question, or the analysts’ level of expertise. That’s pretty wild, eh?

Typically, in science we generate a dataset and then analyze it one way, using approaches we are familiar with. Sometimes we will take multiple datasets and analyze them the same way to see whether the important features of one dataset hold up in the others. So, it’s striking that the authors crowd-sourced the analysis of a single dataset to get a diversity of analysis plans.

It’s also jarring that they could get such different answers to their question. Thankfully, the overall trend across the tests was in a positive direction. The authors point out that they were running a second crowd-sourced study that looked at gender and intelligence. That study produced results showing both positive and negative associations, suggesting that there really was no association.

What are the takeaways from this crowdsourcing approach to science? Nosek and his colleagues suggest a few ideas that I find pretty compelling.

First, we should be publishing our data analysis plans. No, not when we publish our results, but before we analyze the data or even before we’ve generated the data. This would remove the temptation to p-hack or to let the data drive the analyses we run (i.e., the garden of forking paths problem). In my field of microbiome research, this is hardly ever, if ever, done. I think it would be fascinating to see this practice applied outside of psychology and clinical trials.

Second, and related to the previous point, we need greater transparency. As I frequently plead, we need to make our data and code publicly accessible so that others can see exactly what we’ve done. I was recently reminded of this by a paper I was reading that said, “mothur was used to align, classify, and assign 16S rRNA gene sequences to OTUs”. That could mean about a bajillion different things. But if the authors provided a script that laid out the exact commands and arguments they used, there would be no confusion.
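For example, here’s a sketch of the kind of mothur batch file that would settle the question. To be clear, the file names, reference databases, and cutoffs below are hypothetical placeholders, not the commands from that paper:

```
# Hypothetical mothur batch file; the input files, references, and
# cutoffs are placeholders, not taken from the paper in question
align.seqs(fasta=final.fasta, reference=silva.v4.fasta)
screen.seqs(fasta=current, start=1968, end=11550, maxhomop=8)
classify.seqs(fasta=current, count=final.count_table, reference=trainset.pds.fasta, taxonomy=trainset.pds.tax, cutoff=80)
dist.seqs(fasta=current, cutoff=0.03)
cluster(column=current, count=current, cutoff=0.03)
make.shared(list=current, count=current, label=0.03)
```

Even a short file like this tells a reader exactly which reference alignment, classifier, and clustering cutoff were used - something a one-sentence methods description never could.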

Finally, we should try to crowdsource our analyses. I won’t hold my breath that this will happen anytime soon in my field. My colleagues are pretty reluctant to release their data and code, much less give others a seat at the table when it comes time to analyze the data. Alternatively, Nosek and his team suggest that a lone analyst can brainstorm a “multiverse analysis” consisting of every defensible analysis, run each one on the dataset, and then attempt to synthesize the different results (see the sketch below). The benefit of crowdsourcing is that it would overcome my own biased approach to analyzing data and lend different perspectives and expertise to the problem.
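To make that concrete, here’s a minimal sketch of what a multiverse analysis could look like in R. The data frame, variables, and model specifications are all hypothetical; the point is just that every defensible specification gets fit to the same data so the estimates can be compared side by side:

```r
# Minimal multiverse sketch: fit every defensible model specification
# to the same data and compare the estimates side by side.
# `referee_data` and its columns are hypothetical examples.
library(purrr)

specifications <- list(
  base     = red_cards ~ skin_tone,
  position = red_cards ~ skin_tone + position,
  league   = red_cards ~ skin_tone + position + league
)

multiverse <- map_dfr(
  specifications,
  function(f) {
    fit <- glm(f, data = referee_data, family = poisson)
    coefs <- summary(fit)$coefficients["skin_tone", ]
    data.frame(estimate = coefs[["Estimate"]],
               p.value  = coefs[["Pr(>|z|)"]])
  },
  .id = "specification"
)

multiverse
```

Laying the specifications out as a list makes it obvious how many defensible forks exist before you ever look at a p-value, and the resulting table shows how much your conclusion depends on the fork you happen to pick.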

What do you think? How likely are you to implement any of these three approaches in your next research project? Feel free to reply to this email and let me know your thoughts!

For next week, I’d like you to read another one of Nosek’s papers with me. I’ll be re-reading “Using prediction markets to estimate the reproducibility of scientific research”, which was published in PNAS back in 2015. The authors use a brilliant approach to thinking about reproducibility that I think you’ll get a kick out of.

In case you missed it…

The YouTube channel is on pause so that I can finish some writing projects; once those are done, I’ll be able to put more of my attention back on the channel. If you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
