Annotating heat maps to highlight effects of fentanyl overdoses using the ggplot2 R package


Hey folks,

I really hope you enjoyed the series of newsletters and videos of me recreating the visualizations presented by W.E.B. DuBois at the 1900 Paris Exposition. I can’t express how much I enjoyed making them. Some of them were pretty tricky and required a lot of work. But I think it was worth it! It definitely forced me to use some new-to-me tools like geom_polygon() and geom_sf(). Please let me know what you thought of the series! I wonder if there’d be any interest in a companion to the Battle-Baptiste & Rusert book on the visuals that showed how to make each of them in R. Email me back and let me know.

This week, I want to highlight a story in the NY Times’s “The Upshot” section from December 2024, “How Drug Overdose Deaths Have Plagued One Generation of Black Men for Decades”. The story describes how drug overdoses have had a disproportionate effect on Black men through the 1980s, 1990s, and now - these men are all from the same generation. One quote that stood out to me in the article was from Tracie Gardner, who said, “They were resilient enough to live through a bunch of other epidemics — H.I.V., crack, Covid, multi-drug-resistant tuberculosis — only to be killed by fentanyl.” Oof.

In this article, the team of journalists and data scientists present heatmaps with the year across the x-axis, the age across the y-axis, and each cell colored by the number of deaths per 100,000 people. They make plots for different diseases and U.S. cities. Here’s the heatmap for drug deaths among Black men in Chicago:

A few things stand out to me about this figure that I’d enjoy taking on in R with ggplot2.

First, it’s obviously a heatmap. I’d need to aggregate the mortality data from NCHS, but let’s assume that we can get a CSV or TSV with columns for the year, age, and number of deaths in each region. We should be able to map the year to the x, age to the y, and deaths to the fill aesthetics, which matches the axes in the original figure. Then we’d use geom_tile() to create the heatmap.
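Here’s a minimal sketch of that mapping. I don’t have the aggregated NCHS file in hand, so the data frame below is simulated, and the column names (year, age, rate) are my own assumptions rather than anything from the article.

```r
library(ggplot2)

# Simulated stand-in for the aggregated NCHS table: one row per year-age pair.
# The column names and the values are made up for illustration.
set.seed(2024)
deaths <- expand.grid(year = 1980:2022, age = 15:84)
deaths$rate <- runif(nrow(deaths), min = 0, max = 600) # deaths per 100,000

# Year on the x-axis, age on the y-axis, and the death rate as the fill color
heatmap <- ggplot(deaths, aes(x = year, y = age, fill = rate)) +
  geom_tile()
```

Running print(heatmap) draws the plot. With real data, the expand.grid() and runif() lines would be replaced by reading in the CSV or TSV.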

Second, I’m intrigued that they also included the dashed lines indicating the ages of a cohort of men born between 1951 and 1970 across the last 40 years. I’d likely create this using geom_abline(). This function allows us to create lines with specified slopes and y-intercepts. Of course, we’d want to use the “longdash” line type to match what is used in this visual.
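To work out the slope and intercept: a man born in a given year is roughly (year - birth year) years old, so each cohort boundary is a line with a slope of 1 and a y-intercept equal to the negative of the birth year. Here’s a rough sketch of that idea on the same simulated data as before (the data and column names are assumptions, not the real NCHS values).

```r
library(ggplot2)

# Simulated stand-in data (made up for illustration)
set.seed(2024)
deaths <- expand.grid(year = 1980:2022, age = 15:84)
deaths$rate <- runif(nrow(deaths), min = 0, max = 600)

# age = 1 * year + (-birth_year), so slope = 1 and intercept = -birth_year
birth_years <- c(1951, 1970)
cohort <- data.frame(slope = 1, intercept = -birth_years)

heatmap <- ggplot(deaths, aes(x = year, y = age, fill = rate)) +
  geom_tile() +
  geom_abline(
    data = cohort,
    aes(slope = slope, intercept = intercept),
    linetype = "longdash"
  )
```

As a sanity check, the 1951 line passes through age 69 in 2020 and the 1970 line passes through age 50.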

Third, the legend is a gradient going from a pale color to a dark purple color. This is reminiscent of one of the viridis color scales. Even if it isn’t exactly one of the built-in color scales, we could use scale_fill_gradient() to define a gradient between a low and a high color. Beyond the color, the legend is interesting because it has vertical grid lines at 200, 400, and 600 deaths per 100k. It’s also justified to the right side of the heatmap.
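Something like the following could get close to that legend. The two hex colors are my guesses at a pale-to-dark-purple gradient, not values taken from the original figure, and placing the legend above the plot is also an assumption. The breaks at 200, 400, and 600 put tick marks on the color bar at those values, which approximates the vertical lines in the original.

```r
library(ggplot2)

# Simulated stand-in data (made up for illustration)
set.seed(2024)
deaths <- expand.grid(year = 1980:2022, age = 15:84)
deaths$rate <- runif(nrow(deaths), min = 0, max = 600)

# Pale-to-dark-purple gradient; the hex codes are guesses, not the NYT colors
fill_scale <- scale_fill_gradient(
  name = "Deaths per 100,000",
  low = "#f2e5f7",
  high = "#3f007d",
  limits = c(0, 600),
  breaks = c(200, 400, 600)
)

heatmap <- ggplot(deaths, aes(x = year, y = age, fill = rate)) +
  geom_tile() +
  fill_scale +
  theme(
    legend.position = "top",          # assumed placement above the heatmap
    legend.direction = "horizontal",
    legend.justification = "right",   # push the legend to the right edge
    legend.key.width = unit(1.5, "cm")
  )
```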

Fourth, there’s no x-axis title, but there is a y-axis title - “AGE”. It’s rotated 90 degrees and located outside the top left corner of the heatmap. I’d likely put that there with some axis.title.y manipulation or using annotate(geom = "text"). I could go either way, but I think annotate() might be cleaner. The annotate() function would also be useful for adding the “Men born from ….” text to the heatmap.
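Here’s a sketch of the annotate() approach. The x and y positions, text sizes, and margins are placeholders that would need tuning against the real figure, and I’ve left the cohort label truncated because I haven’t copied its full wording. Drawing text outside the panel requires turning off clipping in coord_cartesian() and leaving some room in the left margin.

```r
library(ggplot2)

# Simulated stand-in data (made up for illustration)
set.seed(2024)
deaths <- expand.grid(year = 1980:2022, age = 15:84)
deaths$rate <- runif(nrow(deaths), min = 0, max = 600)

heatmap <- ggplot(deaths, aes(x = year, y = age, fill = rate)) +
  geom_tile() +
  # "AGE" rotated 90 degrees, just left of the panel near the top
  annotate(
    geom = "text", x = 1976, y = 84, label = "AGE",
    angle = 90, hjust = 1, size = 3
  ) +
  # Cohort label inside the panel; position and wording are placeholders
  annotate(
    geom = "text", x = 2000, y = 40, label = "Men born from ...",
    hjust = 0, size = 3
  ) +
  # clip = "off" lets the "AGE" text draw outside the panel area
  coord_cartesian(xlim = c(1980, 2022), clip = "off") +
  labs(x = NULL, y = NULL) +
  theme(plot.margin = margin(t = 10, r = 10, b = 10, l = 30))
```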

Finally, the plot has a title and caption. The title has two font faces - bold and regular - in a sans serif font. It’s likely a Libre Franklin-related font that we can get from Google Fonts. The caption at the bottom is small and gray.
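One way to mix bold and regular text in a single title without extra packages is a plotmath expression. This is just a sketch: the title and caption wording are placeholders, the caption size and color are guesses, and I’m using the default sans family rather than loading a Google font.

```r
library(ggplot2)

# Simulated stand-in data (made up for illustration)
set.seed(2024)
deaths <- expand.grid(year = 1980:2022, age = 15:84)
deaths$rate <- runif(nrow(deaths), min = 0, max = 600)

# plotmath: bold() applies only to the first part of the title
# (placeholder wording, not the title from the article)
title_text <- expression(
  paste(bold("Drug deaths"), " among Black men in Chicago")
)

heatmap <- ggplot(deaths, aes(x = year, y = age, fill = rate)) +
  geom_tile() +
  labs(
    title = title_text,
    caption = "Placeholder caption: data source goes here",
    x = NULL,
    y = NULL
  ) +
  theme(
    text = element_text(family = "sans"),
    plot.caption = element_text(size = 7, color = "gray50")
  )
```

An alternative would be a package that renders markdown in titles, but the plotmath route keeps the example to ggplot2 alone.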

What else catches your eye about this visual? Let me know!

Reading the comments of a NY Times article is rarely a good idea. But nestled in there are other plots that I’d be interested in seeing. For example, what do the data look like for Black women? Hispanics? Whites? What do they look like in Detroit? How do you think we’d need to alter the R code to look at these questions and compare them to Black men? I suspect the WONDER NCHS Data is a treasure trove for answering these and other questions.
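One possible answer, as a sketch: if the aggregated table had an extra column identifying the demographic group, facet_wrap() would draw one heatmap per group on a shared color scale. The group column and the simulated values here are assumptions on my part.

```r
library(ggplot2)

# Simulated stand-in data with a hypothetical "group" column
set.seed(2024)
deaths <- expand.grid(
  year = 1980:2022,
  age = 15:84,
  group = c("Black men", "Black women")
)
deaths$rate <- runif(nrow(deaths), min = 0, max = 600)

# One panel per group; the fill scale is shared so panels are comparable
heatmaps <- ggplot(deaths, aes(x = year, y = age, fill = rate)) +
  geom_tile() +
  facet_wrap(vars(group))
```

The same approach would work for comparing cities by faceting on a city column instead.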

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials.


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
