Visualizing how you're most likely to die vs what the media wants you to think


Hey folks!

I’m appreciating the positive feedback on Monday critique videos. They’re a lot of fun to think through and make. I think I might start looking at figures that are drawn from the scientific literature since many of you found out about me from my science work. Let me know if there are plots or practices that you’d like to see me talk about. I’ll see if I can work them into the queue. Also, if you’re working on developing figures for a presentation, poster, or paper and would like to work with me to come up with more effective designs, please don’t hesitate to get in touch.


This week I’m returning to Our World in Data for a visualization. Someone sent me a similar plot from OWID from a story about vaping. I’m starting to notice that news outlets tend to have a set of visualization approaches that they use over and over. It becomes a bit like “if your only tool is a hammer, all your problems look like nails”. Something to think about…

I’m probably biting off more than I can chew with this plot. I think I probably picked it because I’m already thinking ahead to how I might refactor it :) This is a set of stacked bar plots showing the causes of death in 2023 and how three media outlets covered different types of deaths in the same year. Perhaps you’ve heard the adage, “if it bleeds, it leads”. I think that is what this plot is trying to show. I’m not sure that I see a consistent partisan bias here.

I was pleased to find both the data used in the story and the code used to collect it available elsewhere on their site. They have a nice Python notebook that walks through their data collection and curation steps. The CSV file is already in a tidy format with columns for the year (2023), source, cause of death, mentions, and single_mentions. The mentions data are the number of articles that mention the cause of death at least twice, which is what the caption indicates they used. So, I’d likely drop the single_mentions and year columns since they aren’t helpful. We’d then want to calculate the percentage of mentions for each cause within each source.
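Here is a minimal sketch of that wrangling step with dplyr. I'm assuming column names (year, source, cause, mentions, single_mentions) based on the description above, and the outlet names and counts are made up so the example runs on its own; the real CSV may be named and shaped differently.

```r
library(dplyr)

# Toy stand-in for the CSV: column names are assumed, values are invented
media <- data.frame(
  year = 2023,
  source = rep(c("Outlet A", "Outlet B"), each = 3),
  cause = rep(c("Heart disease", "Cancer", "Homicide"), times = 2),
  mentions = c(20, 20, 60, 10, 10, 30),
  single_mentions = c(50, 40, 90, 30, 25, 45)
)

media_pct <- media %>%
  # drop the columns that aren't needed for the plot
  select(-year, -single_mentions) %>%
  # percentages should sum to 100 within each source, not across the table
  group_by(source) %>%
  mutate(percent = 100 * mentions / sum(mentions)) %>%
  ungroup()
```

Grouping by source before the mutate() is the important part. Without it, the denominator would be the total across all outlets.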

Let’s start with stacked bar plots. These are straightforward to create using geom_col(). We can map the source to the x-axis, the percentage to the y-axis, and cause to the fill color. We’d use geom_text() to add the labels when the percentage is above a certain threshold. I’d use the position_stack() function to get those positioned correctly within the stack of bars. I’m afraid the labels for the actual causes of death might be more challenging. I think we can experiment with hjust and nudge_x to get those to work.
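As a rough sketch of the stacked bars with labels, something like the following should get us started. The data frame, the outlet names, and the 20% labeling threshold are all placeholders I made up for illustration.

```r
library(ggplot2)

# Hypothetical percentages that already sum to 100 within each source
pct <- data.frame(
  source = rep(c("Outlet A", "Outlet B"), each = 3),
  cause = rep(c("Heart disease", "Cancer", "Homicide"), times = 2),
  percent = c(20, 15, 65, 30, 25, 45)
)

# Only label segments at or above an arbitrary threshold; others get a blank
label_threshold <- 20
pct$label <- ifelse(pct$percent >= label_threshold, paste0(pct$percent, "%"), "")

p <- ggplot(pct, aes(x = source, y = percent, fill = cause)) +
  geom_col() +
  # vjust = 0.5 centers each label vertically within its own segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5))
```

Because fill is mapped in the top-level aes(), geom_text() inherits the same grouping as geom_col(), so the labels stack in the same order as the bars.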

We’ll need to use facets to get the separation of the actual and reported causes of death. I’d add another variable to my data frame to indicate whether the data were actual percentages or media-reported percentages. I would then use facet_wrap() to create two panels for these. We could use some of the new facet_wrap() features from ggplot2 v. 4.0.0 to get the spacing occupied by each facet to be proportional to the number of bars in it. This would likely cover the basics of the plot’s appearance. Then the challenge becomes matching the styling of the facet titles. The titles of the stacked bars in the regular font might be the x-axis labels moved to the top of the stack. Then the bolded text could be the text in the facet’s strip. The different colors could be set using element_markdown() with some HTML and CSS styling. I think that might actually work!
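Here's a sketch of how the facets and colored strip titles might come together. It assumes the ggtext package is installed, and the type column, strip wording, hex colors, and percentages are placeholders of my own. I've left out the proportional panel spacing so the code doesn't depend on a particular ggplot2 version.

```r
library(ggplot2)
library(ggtext)

# Hypothetical data with a `type` column separating actual from reported deaths
pct <- data.frame(
  type = c(rep("actual", 2), rep("media", 4)),
  source = c(rep("Deaths", 2), rep(c("Outlet A", "Outlet B"), each = 2)),
  cause = rep(c("Heart disease", "Homicide"), times = 3),
  percent = c(95, 5, 40, 60, 50, 50)
)

# Strip text written as markdown/HTML; wording and colors are made up
strip_labels <- c(
  actual = "<span style='color:#B13507'>**Actual causes of death**</span>",
  media = "<span style='color:#3360A9'>**Media coverage**</span>"
)

p <- ggplot(pct, aes(x = source, y = percent, fill = cause)) +
  geom_col() +
  facet_wrap(~type, scales = "free_x", labeller = as_labeller(strip_labels)) +
  # move the bar titles (x-axis labels) to the top of the stacks
  scale_x_discrete(position = "top") +
  theme(
    strip.text = element_markdown(),
    strip.placement = "outside"
  )
```

Using scales = "free_x" keeps each panel from showing empty slots for the bars that belong to the other panel.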

Now for the titles. It might be easiest to make the top line of the title the actual title argument in labs() and use the subtitle argument for the second line in the title. That would allow us to position them slightly differently and to color the lines separately (notice how the color matches the title of the facets?!). By now the caption should be straightforward to pull off using element_textbox_simple() with bolding of the “Note:” and “Data sources:” text.
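A minimal sketch of the title, subtitle, and caption styling might look like this. It assumes the ggtext package is installed, and the text, colors, and tiny data frame are placeholders rather than what's in the original figure.

```r
library(ggplot2)
library(ggtext)

# Placeholder data so the example runs on its own
toy <- data.frame(source = c("Outlet A", "Outlet B"), percent = c(40, 60))

# Markdown bolding handles the "Note:" and "Data sources:" lead-ins
caption_text <- paste(
  "**Note:** placeholder note text.",
  "**Data sources:** placeholder source names."
)

p <- ggplot(toy, aes(x = source, y = percent)) +
  geom_col() +
  labs(
    title = "First line of the title",
    subtitle = "Second line of the title",
    caption = caption_text
  ) +
  theme(
    # separate theme elements let each line get its own color and margin
    plot.title = element_text(color = "#B13507", face = "bold"),
    plot.subtitle = element_text(color = "#3360A9", face = "bold",
                                 margin = margin(b = 15)),
    # the textbox wraps long captions and renders the markdown
    plot.caption = element_textbox_simple(margin = margin(t = 10)),
    plot.title.position = "plot",
    plot.caption.position = "plot"
  )
```

Setting the title and caption positions to "plot" aligns them with the left edge of the whole figure rather than the panel.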

Those arrows might be a bit funky. I’d likely add them using annotate() with geom = "curve" and the arrow argument to create the arrowheads. The funkiness comes in with the curvature of the arrows. These always look weird when I try them. But, I think if I can get the blue arrow right, the red arrow would be its mirror.
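Here's a rough sketch of a mirrored pair of curved arrows. The coordinates, curvature value, and toy bars are made up; the real values would need a fair amount of trial and error.

```r
library(ggplot2)

# Placeholder bars to give the arrows something to point at
toy <- data.frame(x = c(1, 2), y = c(50, 50))

bend <- 0.3  # arbitrary amount of curvature
arrow_head <- arrow(length = unit(0.1, "inches"), type = "closed")

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_col() +
  # blue arrow: starts in the middle and points to the left bar
  annotate(
    geom = "curve", x = 1.5, y = 80, xend = 1.1, yend = 55,
    curvature = bend, color = "blue", arrow = arrow_head
  ) +
  # red arrow: mirror image, so flip the x direction and the sign of curvature
  annotate(
    geom = "curve", x = 1.5, y = 80, xend = 1.9, yend = 55,
    curvature = -bend, color = "red", arrow = arrow_head
  )
```

Storing the curvature in one variable and negating it for the second arrow keeps the two arrows symmetric while the value gets tweaked.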

Ok, I still think this is a lot of work. But, by breaking it down for you, I’m starting to believe that I can implement this figure in a two-hour livestream. What do you think? Be sure to tune in on Monday to see what I like about this visual and then again on Wednesday morning to see me implement it. Let me know if you’d like to see a better way of representing the same data using a dot plot. Have I mentioned how I hate stacked bar plots? :)

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
