What is the value of recreating visuals you don't like?


Hey folks!

Before launching into this week’s visualization, I’m looking for a bit of feedback. Since November, I’ve settled into a new routine with this newsletter and the YouTube channel. Each week this newsletter introduces a visualization at a 30,000 ft view or discusses a specific topic in some depth (example). The following Monday I post a video critiquing the visualization (example). Then on Wednesday (or Tuesday like this past week), I livestream a video where I recreate the visualization and refactor or modify the visualization based on Monday’s critique (example). This is working well for me. I’m honestly surprised that I’m able to find new things each week in Nature and its upper echelon journals without too much effort. How is this flow working for you? Could you reply to this email and let me know?

I’m curious what you think of the livestreams. The metrics YouTube gives me for livestreams are not as strong as they were for my recorded and edited videos. But, I honestly don’t have the time to produce the more polished videos. What could I do to make the livestreams more effective and engaging? A question I asked in the community forum was whether people wanted me to only do the refactoring or if it was ok to do both the recreation and refactoring even if it was clear I didn’t like the plot. The feedback I received was 3 to 1 in favor of doing both rather than just the refactoring. What do you think?

Here’s my take. I think recreating the plot has value even if I don’t like it. There are two main reasons. As an example, consider the stacked bar plot I discussed last week.

First, recreating a visualization forces me to do things that I wouldn’t normally do. For example, in last week’s visualization the authors created what were effectively section headings within the legend for taxa that stained Gram-positive or Gram-negative. If I had gone directly to the line plots without recreating the stacked bar plot, I wouldn’t have had the chance to think about how to recreate that look. I think this blew people’s minds when I did it on the livestream.

The second reason is that recreating a visualization tells me a lot about the data and the approach the investigators took to creating the plot. For example, in the stacked bar plot, it was only by digging into the actual data that I was able to see that there were other minor populations not shown in the legend. It also helped me notice they were doing weird things with pooling time points in Figure 5. Even if I never publish a stacked bar plot of my own data, I learned a lot by recreating this one. Do you find my logic compelling? Let me know!

This leads in nicely to this week’s visualization. I’ll show my cards right away and tell you I’m not a fan of this set of panels from the paper, “Specialized RNA decay fine-tunes monogenic antigen expression in Trypanosoma brucei” published earlier this week in Nature Microbiology. I’ll say more on Monday, but it’s effectively a volcano plot in a polar coordinate system (WHY!?!?).

Even if I never use coord_radial() to circularize my data visualization, there are a number of things in this set of panels I’m curious how to achieve. For example, can I use geom_hline() to draw the dashed circle that indicates the significance threshold? Or could the arrow in the lower right corner of the panels be drawn using geom_segment() with an arrowhead?
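Here is a minimal sketch of both ideas, not the authors' code. It assumes ggplot2 3.5.0 or later (for coord_radial()) and uses made-up data with hypothetical column names (log2_fc, neg_log10_p). The arrow is placed at an arbitrary spot with annotate("segment"), which draws the same geom as geom_segment().

```r
library(ggplot2)

# Made-up data for illustration only; not the values from the paper.
set.seed(1)
toy <- data.frame(
  log2_fc = runif(200, min = -4, max = 4),
  neg_log10_p = rexp(200, rate = 0.8)
)

radial_volcano <- ggplot(toy, aes(x = log2_fc, y = neg_log10_p)) +
  geom_point(alpha = 0.5) +
  # A horizontal line at a constant y becomes a circle in polar coordinates.
  geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
  # A segment with an arrowhead; the position here is arbitrary.
  annotate(
    "segment",
    x = 2, xend = 3.5, y = 4, yend = 4,
    arrow = arrow(length = unit(0.1, "inches"))
  ) +
  coord_radial()
```

Printing radial_volcano should show the dashed threshold as a ring. Because the segment holds y constant while x changes, it will follow an arc in the radial layout rather than a straight line.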

The thing I’m most excited to figure out is the legend. You’ll notice that they have three colored circles under the title of “VEX known interactors” and the two other categories have a single circle. How would we recreate this look?

I think it’s actually a lot like the legend in last week’s visualization. As a reminder, here it is:

I recreated it by using a second aesthetic (e.g., alpha) in addition to fill. By using the guide_legend() function and its override.aes argument within scale_fill_manual() and scale_alpha_manual(), I could create two legends for the two types of Gram staining.
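A minimal sketch of that idea, assuming a hypothetical data frame with made-up taxa and abundances. It is one way to get two legends from fill and alpha, not necessarily the exact code from the livestream.

```r
library(ggplot2)

# Hypothetical taxa and abundances, for illustration only.
toy <- data.frame(
  sample = rep(c("s1", "s2"), each = 4),
  taxon = rep(c("pos_1", "pos_2", "neg_1", "neg_2"), times = 2),
  rel_abund = c(40, 30, 20, 10, 25, 25, 30, 20)
)

gram_pos <- c("pos_1", "pos_2")
gram_neg <- c("neg_1", "neg_2")
taxon_colors <- c(pos_1 = "#1b9e77", pos_2 = "#d95f02",
                  neg_1 = "#7570b3", neg_2 = "#e7298a")

grouped_legend <- ggplot(toy, aes(x = sample, y = rel_abund,
                                  fill = taxon, alpha = taxon)) +
  geom_col() +
  # The fill legend lists only the Gram-positive taxa.
  scale_fill_manual(name = "Gram-positive", values = taxon_colors,
                    breaks = gram_pos) +
  # Alpha is 1 for every taxon, so it changes nothing in the bars. It exists
  # only to produce a second legend, whose keys are recolored by override.aes.
  scale_alpha_manual(
    name = "Gram-negative",
    values = c(pos_1 = 1, pos_2 = 1, neg_1 = 1, neg_2 = 1),
    breaks = gram_neg,
    guide = guide_legend(
      override.aes = list(fill = unname(taxon_colors[gram_neg]))
    )
  )
```

The two legends stay separate because their titles and breaks differ; if they matched, ggplot2 would merge them into one.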

For this week’s legend, imagine we took the same legend we had last week and replaced the labels argument in scale_fill_manual() and scale_alpha_manual() with blank strings (""). That would remove the labels. Now if we used legend.position = "bottom" within theme(), the legend would lie horizontally along the bottom of the plot, with the title on the left. To get the title on the right of each set of symbols, we could use legend.title.position = "right" within theme(). Cool, eh?
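Here is a simplified sketch of that legend layout, assuming ggplot2 3.5.0 or later (for legend.title.position). To keep it short it uses two point layers with fill and color rather than fill and alpha, and the groups, categories, and colors are all hypothetical.

```r
library(ggplot2)

# Hypothetical categories, for illustration only.
set.seed(1)
group_a <- data.frame(
  x = rnorm(30), y = rnorm(30),
  category = rep(c("a1", "a2", "a3"), each = 10)
)
group_b <- data.frame(x = rnorm(10), y = rnorm(10), category = "b1")

legend_plot <- ggplot(mapping = aes(x = x, y = y)) +
  # shape = 21 is a fillable circle, so this layer gets a fill legend.
  geom_point(data = group_a, aes(fill = category), shape = 21, size = 3) +
  # This layer maps color instead, which yields a separate legend.
  geom_point(data = group_b, aes(color = category), size = 3) +
  scale_fill_manual(
    name = "Group A",
    values = c(a1 = "#1b9e77", a2 = "#d95f02", a3 = "#7570b3"),
    labels = rep("", 3)  # one blank label per key
  ) +
  scale_color_manual(
    name = "Group B",
    values = c(b1 = "#e7298a"),
    labels = ""
  ) +
  theme(
    legend.position = "bottom",
    legend.title.position = "right"  # title to the right of each set of keys
  )
```

The result should be a row of three circles followed by "Group A", then a single circle followed by "Group B", all along the bottom of the plot.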

The other added wrinkle here is that we effectively have three legends. For the radial volcano plot, I’ll likely use the color, fill, and alpha aesthetics. What do you think?

I’m not sure when I’d want to add this type of structure to my legends. But these types of recreation exercises allow me to do two things. First, they force me to do something that I wouldn’t naturally think of doing. If I jumped directly to the refactoring, this design element wouldn’t be necessary and I’d easily blow off doing it. Second, they force me to do some problem-solving using the powerful tools I have at my disposal. This reminds me of how I was bothered by a grid line bleeding through the gaps in a dashed line in a recent visualization. My solution was to put down a thick white line followed by a thin dashed line. I noticed that I still had bleed-through of the dashed line behind my jittered points because I was using alpha = 0.5. An insightful viewer suggested I repeat the approach I used with the line, but with a layer of white points above the background lines and below the data I was trying to show. Exactly.
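A minimal sketch of that masking trick, using made-up data and assuming ggplot2 3.4.0 or later (for the linewidth argument). The seeded position_jitter() is my assumption for keeping the white points and the translucent points in exactly the same places.

```r
library(ggplot2)

# Made-up data for illustration only.
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = rnorm(100)
)

# A seeded jitter so both point layers land on identical positions.
seeded_jitter <- position_jitter(width = 0.2, height = 0, seed = 42)

masked_plot <- ggplot(toy, aes(x = group, y = value)) +
  # Thick white line hides the grid line underneath.
  geom_hline(yintercept = 0, color = "white", linewidth = 2) +
  # Thin dashed line drawn on top of the white one.
  geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.5) +
  # Opaque white points hide the lines wherever data will be drawn.
  geom_point(color = "white", position = seeded_jitter) +
  # The translucent data points go on last.
  geom_point(alpha = 0.5, position = seeded_jitter) +
  theme_bw()
```

Layer order matters here: each layer is drawn over the previous one, so the masks have to come before whatever they are protecting.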

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
