Hey folks,

As I mentioned last week, I’m exploring the possibility of holding live, in-person workshops again like I did before the pandemic. If this is something that interests you, please let me know. My thought would be to hold them at an affordable hotel near the Detroit airport (DTW). But if you would like to host me to teach a workshop, I would be open to that as well.

This week, I want to call your attention to a plot that I would not encourage you to make. This comes from “Targeted innate immune inhibition therapy compared with antibiotics for recurrent acute cystitis: a randomized, open-label phase 2 trial”, which was published recently in Nature Microbiology by Ambite and colleagues. To be perfectly honest, whenever I see a pie chart I tell myself, “Surely there was a better way…”

What would have been better? I’ll be talking more about this in the critique video next week, but how about taking some time to think about it now? As always, we need to think about the context. The study had 20 people receive anakinra (an immunologic) and 10 receive nitrofurantoin (an antibiotic) to compare the safety and efficacy of the drugs in treating recurrent cystitis, a type of bladder infection. Although I’m sure the developers of anakinra would have loved to see it perform better than nitrofurantoin, they also would have been happy to see the drugs perform equally. That’s because of concerns over antibiotic resistance.

Does this context help you think about the design of this panel? What is being compared? It isn’t the change in symptoms back to day 1. It’s the comparison between the two drugs at each of the four time points. Those P-values are the output of testing that comparison, and the fact that all four are non-significant is one indication that anakinra is not inferior to nitrofurantoin. The test compares the distribution of the change in symptoms for the patients on the two drugs. It’s basically asking whether each pair of pie charts is different.
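To make that kind of comparison concrete, here is a minimal sketch of a chi-squared test of homogeneity on a 2 × 3 table of drug by response at a single time point. The counts below are invented purely for illustration — they are not the values from the paper — and the authors’ actual analysis may well have used a different test, but the logic is the same: ask whether the two response distributions differ.

```python
# Hypothetical counts at one time point (NOT the paper's data):
# rows = drugs, columns = (all symptoms gone, some gone, same or worse)
observed = {
    "anakinra":       [12, 5, 3],   # n = 20
    "nitrofurantoin": [ 6, 2, 2],   # n = 10
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand = sum(row_totals)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi2 = 0.0
for i, r in enumerate(rows):
    for j, o in enumerate(r):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (o - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)  # = 2
print(round(chi2, 3), df)
```

With a statistic this far below the critical value for two degrees of freedom (5.99 at α = 0.05), we would fail to reject the hypothesis that the two distributions are the same — the same reasoning that sits behind each non-significant P-value in the panel.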
I can more or less see what they want me to see with the pie charts. But I think there’s a design that would make the similarity easier to see. Let’s think about a couple of other designs.

First, my go-to approach when pie charts are proposed is to break the response variable (e.g., all symptoms gone, some symptoms gone, same symptoms or worse) out along the x-axis and put the values we’d like to compare next to each other. In this case, I’d put time across the x-axis and the percentages on the y-axis. At each time point, I’d dodge the data by the three levels of response. Then I’d give a different color to the two drugs and plot the data as points. Something that gives me pause in thinking about this is that the comparison is across all the response variables, not each response individually.

Second, and I know this will shock many, what if we converted those pie charts into … stacked bar plots? Again, we’d put time on the x-axis and the response percentages on the y-axis, but we’d give each segment of the column a different color. We’d also put the stacked bars for each drug and time point next to each other. Aside from wanting to compare all three responses across the drugs together rather than individually, there’s another reason I’m open to this strategy: there are only three responses. We could put “all symptoms gone” on the bottom of the stack, “some symptoms gone” in the middle, and “same symptoms or worse” on top. If you wanted to compare individual responses as a secondary question, two of the three categories would have a fixed edge along the y-axis for making comparisons.

What do you think? Have I lost touch with reality?! Something that gives me a bit of pause about this approach is that we’re “hiding” the fact that there were 20 people in the anakinra group and 10 in the nitrofurantoin group. That’s why the comparison at 6 months looks different, but has a non-significant P-value.
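As a rough sketch of the bookkeeping behind the first (dodged) design, here is how I’d lay out an x-position for each response at every time point — again with made-up percentages, not the paper’s values. These x/y pairs are what you’d hand to a plotting call such as matplotlib’s `scatter`, one color per drug:

```python
# Hypothetical percentages (NOT from the paper), by drug, time point, and response
responses = ["all gone", "some gone", "same or worse"]
pct = {
    "anakinra":       {1: [60, 25, 15], 2: [65, 25, 10], 3: [70, 20, 10], 4: [75, 15, 10]},
    "nitrofurantoin": {1: [60, 20, 20], 2: [70, 20, 10], 3: [70, 20, 10], 4: [70, 20, 10]},
}

width = 0.2  # horizontal offset between the three dodged responses
points = []  # (x, y, drug, response) tuples ready for a scatter plot
for drug, by_time in pct.items():
    for t, values in by_time.items():
        for j, (resp, y) in enumerate(zip(responses, values)):
            x = t + (j - 1) * width  # dodge: middle response sits at the time point
            points.append((x, y, drug, resp))

# Every drug/time combination should account for 100% of patients
for drug, by_time in pct.items():
    for values in by_time.values():
        assert sum(values) == 100
```

Dodging by response while coloring by drug puts the two percentages we want to compare right next to each other at every time point, which is exactly the comparison the P-values are testing.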
Effectively, the error bars for nitrofurantoin are larger than those for anakinra. This is made clear in the figure caption, but I feel like it is lost in the visual itself. Perhaps instead of putting the percentages in the segments of the stacked bars, we could put the actual counts. Again, because we only have three categories, I think we could get away with adding those numbers without overwhelming the appearance of the figure.

Let me know what you think. Stay tuned until next week to see how they look when I give both approaches a try.
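Here’s a minimal sketch of the arithmetic behind the stacked-bar idea with raw counts as labels — again using invented counts, not the paper’s data. The `bottoms` values are what a call like matplotlib’s `bar(..., bottom=...)` would need, with “all symptoms gone” anchored at zero:

```python
# Hypothetical counts at one time point (NOT the paper's data)
counts = {"anakinra": [14, 4, 2], "nitrofurantoin": [6, 2, 2]}  # n = 20 and n = 10
responses = ["all symptoms gone", "some symptoms gone", "same symptoms or worse"]

stacks = {}
for drug, c in counts.items():
    n = sum(c)
    pct = [100 * x / n for x in c]          # segment heights, in percent
    bottoms = [0, pct[0], pct[0] + pct[1]]  # where each segment starts on the y-axis
    labels = [str(x) for x in c]            # raw counts keep n = 20 vs n = 10 visible
    stacks[drug] = {"heights": pct, "bottoms": bottoms, "labels": labels, "n": n}

for s in stacks.values():
    # each stack should top out at exactly 100%
    assert abs(s["bottoms"][-1] + s["heights"][-1] - 100) < 1e-9
```

Because “all symptoms gone” sits on the bottom and “same symptoms or worse” on top, those two categories get a fixed baseline for secondary comparisons, and the count labels keep the unequal group sizes from being hidden.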