Hey folks,

I was a student-invited speaker at the Syracuse University Biology department this week. It was great to meet with them and hear how they are benefiting from these newsletters and my videos. As much as I love posting newsletters and videos, seeing people light up at ideas, laugh at my jokes, and tell me how they are using what I teach them is like jet fuel.

I actually gave two talks. One talk covered what I’ve learned about data visualization by critiquing, recreating, and remaking other people’s figures. I set it up by relating an anecdote from when my son, Joe, was 13. He told me, “I can tell when words are misspelled.” I replied, “Then why is your spelling so bad?” “I can tell when words are misspelled.” I am a lot like Joe when it comes to data visualization. I can tell when there are problems in other people’s visualizations, but not always my own.

The other talk was on work my lab has done trying to use the gut microbiome to create a non-invasive diagnostic for colon cancer. I felt like a hypocrite. Somehow I had gotten enough distance from the plots that I was sharing to see how I was falling short of the advice I gave in the data visualization talk. I never said, “I know you can’t see this, but…”. Still, I did have plots that were copied and pasted from my papers.

When I do a critique, I have been emphasizing four stages. The first is “Description,” where we establish the context for a figure. Too often we generate figures for papers and then re-use them for a slide deck or poster. But those are three very different contexts!

Let’s think about what’s unique in a talk. First, we tend to have a very diverse audience. They’re smart, but they might not know the genes we’re studying, the disease we’re interested in, or the nuances of the methods we’re using. Second, they will see the figure for only as long as we show it to them. They can’t hit pause or turn back to the figure.
While they’re looking at the figure, they have to listen to me talk about the figure. As I’m talking, I’m describing what’s in the caption as well as what I want them to take away from the figure. Third, the size of the image and its parts likely need to be much larger to be seen by someone in the back or at the side of the room. All of this makes for a lot of cognitive overload for our audience. Ultimately, a talk is more about telling a diverse group of smart people an interesting story that will hopefully stay with them as they’re walking back to the lab.

So, how would you design a figure differently for a talk and a poster? You could think about a figure from your most recent or favorite paper. But here’s the figure that made me feel my hypocrisy as I was sharing it. This comes from a paper that Courtney Armour and other members of my lab published in mBio titled “A Goldilocks Principle for the gut microbiome: Taxonomic resolution matters for microbiome-based classification of colorectal cancer”. I really love this paper and the story we had to tell. The gist of the paper and this figure was that the best microbiome-based diagnostic of whether someone had colon cancer wasn’t super broad (e.g., phylum) or overly specific (e.g., amplicon sequence variants [ASVs]). Surprisingly, we had to explain who “Goldilocks” was to one of the reviewers!

Courtney and I went back and forth a few times on this figure and I take complete responsibility for the final design of the visual. Courtney is pretty awesome. I’m to blame for any problems in the figure. There are certainly things I’d change based on what I’ve been thinking about recently.

But, if I were to recreate the figure for a slide deck, what would I do differently? Let’s think about that in the context of what I outlined above as differences between slides and papers. First, the diverse audience.
The individual points within each cloud of points represent the AUROC and sensitivity of 100 random 80-20 splits of the data. Odds are good that you don’t even know what that sentence means or why it’s important to the Goldilocks story. I think I could certainly clean up the figure by removing those jittered points entirely and only displaying the mean and confidence interval.

Second, the short attention span. I’d actually spent several slides describing that “AUROC” is the area under the receiver operating characteristic curve and how it and sensitivity are calculated. But it likely would have been helpful to write out what AUROC, OTU, and ASV are. I could have also put the specificity for panel B next to the x-axis label in panel B.

Third, the size of the image. When I presented this slide, I mainly talked about panel A. I only mentioned panel B when I remembered that it was there as I was clicking on to the next slide. In hindsight, I probably should have shown only panel B because using a set specificity rather than all specificities (that’s what AUROC uses) is a better metric of performance, a stronger result, and tied in to another visualization I had shown them. Furthermore, by only showing one panel I would have helped the audience focus on what I thought was most important.

Thinking about these factors together, I suspect having different colors for each taxonomic level was distracting. They may have been left wondering, “what’s different between the colors that isn’t conveyed by the y-axis?” I think we picked this color scheme because we thought having black summary statistics on top of gray points would have been pretty boring. For the talk, I could have used color to highlight the data from the family, genus, and OTU taxonomic ranks to indicate they performed similarly and better than the other taxonomic ranks. That would have helped focus their attention and drive home the “not too hot, not too cold” message I was trying to send.
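
As a rough illustration of the two changes described above (dropping the jittered points in favor of a mean and interval, and using color only to highlight family, genus, and OTU), here is a minimal Python sketch. Everything in it is an assumption for illustration: the numbers are random placeholders rather than results from the paper, the interval is a simple normal-approximation 95% interval (the paper may have computed its interval differently), and the function names and colors are made up for this example.

```python
import random
import statistics

# Taxonomic levels named in the text, ordered from broad to specific.
LEVELS = ["phylum", "family", "genus", "OTU", "ASV"]

# Levels to emphasize on the slide (the "just right" middle of the range).
HIGHLIGHT = {"family", "genus", "OTU"}


def summarize(values):
    """Return (mean, lower, upper) for one level.

    Assumption: a normal-approximation 95% interval around the mean.
    """
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return mean, mean - half_width, mean + half_width


def slide_color(level):
    """One accent color for highlighted levels, gray for everything else."""
    return "tab:blue" if level in HIGHLIGHT else "lightgray"


def build_slide_rows(results):
    """Collapse per-split values into one summary row per taxonomic level."""
    rows = []
    for level in LEVELS:
        mean, lower, upper = summarize(results[level])
        rows.append({
            "level": level,
            "mean": mean,
            "lower": lower,
            "upper": upper,
            "color": slide_color(level),
        })
    return rows


if __name__ == "__main__":
    # Placeholder values only: 100 made-up "splits" per level, NOT real data.
    rng = random.Random(1)
    placeholder = {
        level: [rng.uniform(0.5, 0.8) for _ in range(100)] for level in LEVELS
    }
    for row in build_slide_rows(placeholder):
        print(f"{row['level']:>7}  {row['mean']:.3f} "
              f"[{row['lower']:.3f}, {row['upper']:.3f}]  {row['color']}")
```

Each row could then be handed to whatever plotting tool you use to draw a single point with an error bar per level, so the audience sees five marks and one accent color instead of hundreds of jittered points.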
If it wasn’t too much, I could have easily used color to indicate the significance groups in panel B. Perhaps a gradient of colors showing the order of the groups by sensitivity?

As a final point, I’m trying to develop a heuristic similar to how Roman Mars thinks a state flag should look good at half the size of a business card held at arm’s length. His idea is that if you can’t see the design of the flag when drawn on a 1 by 1.5 inch piece of paper then it’s too much. Michigan horrible. Ukraine awesome. For a slide, how about it needs to serve its function on a quarter of a piece of paper (4.25 by 5.5 inches) and you only have 15 seconds to interpret it?

What do you think? Do you have other suggestions for how you would convert my set of figures to display in a slide deck?