Data analysis enters the multiverse


Hey folks,

I’m not sure if you’re familiar with Brian Nosek - a psychology professor at the University of Virginia - but I find his work on reproducible research practices fascinating. Last week I asked you to read one of his papers, “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results” by Silberzahn et al. This study took a single complex dataset and recruited a diverse array of analysts to determine whether soccer referees were more likely to give “red cards” to players with darker skin tones.

They attracted 61 analysts who formed 29 teams, each of which took a different approach to testing this hypothesis. Of the 29 teams, 20 (69%) found a significant positive association between skin tone and receiving a red card; the other 9 found no significant association. The teams worked independently, but at several points in the project they were able to critique each other’s work to improve their analyses. In spite of their diversity, the analysis plans were all considered valid approaches. Surprisingly, whether a team found a significant association was not related to the rated quality of its analysis plan, the analysts’ prior beliefs about the question, or the analysts’ level of expertise. That’s pretty wild, eh?

Typically, in science we generate a dataset and then analyze it one way, using approaches we are familiar with. Sometimes we will take multiple datasets and analyze them the same way to see whether the important features of one dataset hold up in the others. So, it’s striking that the authors crowd-sourced the analysis of a single dataset to get a diversity of analysis plans.

It’s also jarring that they could get such different answers to their question. Thankfully, the overall trend across the tests was in a positive direction. The authors point out that they were running a second crowd-sourced study that looked at gender and intelligence. That study produced results showing both positive and negative associations, suggesting that there really was no association.

What are the takeaways from this crowdsourcing approach to science? Nosek and his colleagues suggest a few ideas that I find pretty compelling.

First, we should be publishing our data analysis plans. No, not when we publish our results, but before we analyze the data or even before we’ve generated the data. This would remove the temptation to p-hack or to let the data drive the analyses we run (i.e., the garden of forking paths problem). In my field of microbiome research, this is hardly ever, if ever, done. I think it would be fascinating to see this practice applied outside of psychology and clinical trials.

Second, and related to the previous point, we need greater transparency. As I frequently plead, we need to make our data and code publicly accessible so that others can see exactly what we’ve done. I was recently reminded of this by a paper I was reading that said, “mothur was used to align, classify, and assign 16S rRNA gene sequences to OTUs”. That could mean about a bajillion different things. But if the authors provided a script that laid out the exact commands and arguments they used, there would be no confusion.
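For example, here’s a sketch of the kind of mothur batch file that would settle the question. To be clear, the file names, reference databases, and cutoffs below are hypothetical placeholders, not the commands from that paper:

```
# Hypothetical mothur batch file; the input files, references, and
# cutoffs are placeholders, not taken from the paper in question
align.seqs(fasta=final.fasta, reference=silva.v4.fasta)
screen.seqs(fasta=current, start=1968, end=11550, maxhomop=8)
classify.seqs(fasta=current, count=final.count_table, reference=trainset.pds.fasta, taxonomy=trainset.pds.tax, cutoff=80)
dist.seqs(fasta=current, cutoff=0.03)
cluster(column=current, count=current, cutoff=0.03)
make.shared(list=current, count=current, label=0.03)
```

Even a short file like this tells a reader exactly which reference alignment, classifier, and clustering cutoff were used - something a one-sentence methods description never could.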

Finally, we should try to crowdsource our analyses. I won’t hold my breath that this will happen anytime soon in my field. My colleagues are pretty reluctant to release their data and code, much less give others a seat at the table when it comes time to analyze the data. Alternatively, Nosek and his team suggest that a lone analyst can brainstorm a “multiverse analysis” consisting of every defensible analysis, run each one on the dataset, and then attempt to synthesize the different results (see the sketch below). The benefit of crowdsourcing is that it would overcome my own biased approach to analyzing data and lend different perspectives and expertise to the problem.
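To make that concrete, here’s a minimal sketch of what a multiverse analysis could look like in R. The data frame, variables, and model specifications are all hypothetical; the point is just that every defensible specification gets fit to the same data so the estimates can be compared side by side:

```r
# Minimal multiverse sketch: fit every defensible model specification
# to the same data and compare the estimates side by side.
# `referee_data` and its columns are hypothetical examples.
library(purrr)

specifications <- list(
  base     = red_cards ~ skin_tone,
  position = red_cards ~ skin_tone + position,
  league   = red_cards ~ skin_tone + position + league
)

multiverse <- map_dfr(
  specifications,
  function(f) {
    fit <- glm(f, data = referee_data, family = poisson)
    coefs <- summary(fit)$coefficients["skin_tone", ]
    data.frame(estimate = coefs[["Estimate"]],
               p.value  = coefs[["Pr(>|z|)"]])
  },
  .id = "specification"
)

multiverse
```

Laying the specifications out as a list makes it obvious how many defensible forks exist before you ever look at a p-value, and the resulting table shows how much your conclusion depends on the fork you happen to pick.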

What do you think? How likely are you to implement any of these three approaches in your next research project? Feel free to reply to this email and let me know your thoughts!

For next week, I’d like you to read another one of Nosek’s papers with me. I’ll be re-reading “Using prediction markets to estimate the reproducibility of scientific research”, which was published in PNAS back in 2015. The authors use a brilliant approach to thinking about reproducibility that I think you’ll get a kick out of.

In case you missed it…

The YouTube channel is on pause so that I can finish some writing projects; once those are done, I’ll be able to put more of my attention back on the channel. If you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
