Data visualization as a vaccination against ignorance


Hey folks,

I hope you’ve noticed that this newsletter and the YouTube channel have nearly caught up. At this point there’s a 10-day lag between when I post a newsletter describing a data visualization and when I post the recreation video. I could possibly push that to a 3-day lag, but I’d like people to have a chance to work through the code on their own before I give my solution. After having existential dread last week that I’d never find another good plot to share, it appears my cup runneth over :) I’m pretty excited to share what I’ve been collecting!

As I mentioned about a month ago, many in the US are bracing themselves for the prospect of Robert Kennedy serving as Secretary of Health and Human Services. He’s been an outspoken opponent of vaccines. Combined with what many feel is an increase in “anti-vax” sentiment, there are fears that old diseases like measles, polio, or whooping cough might make a comeback.

This week, Francesca Paris, a data reporter at the New York Times, published an article about vaccination rates across the US and how they’ve been falling since the “COVID-times”. Here’s the first graph in her article.

The take-home message from this figure is that vaccination levels for measles, polio, and whooping cough hovered around 95% until the pandemic started. Then the rates declined. I’m sure there are many reasons why this happened and the article shares a few. For my family, I know it was nearly impossible to get our kids into the doctor’s office for well-child visits. Then no one was alerting us it was ok to come back. Then our providers dropped us because they hadn’t seen us in too long. Our story wasn’t unique. Combined with anti-science political winds around the COVID vaccine, it was a perfect storm for declining rates of vaccination.

Line plots are common in many fields and I was excited to see an attractive line plot that told an interesting story. What do you think of the plot? I’m repeating myself, but I really like the aesthetics of the NYT data journalism products. They’re very easy on the eyes. They use color for emphasis. The landmarks for the figure often fade into the background. There are good annotations telling you what is going on. This one is no exception. If you want to try to recreate this figure or any of the figures in the article, you can get the raw data from the CDC as an MS Excel spreadsheet through their website. A few things stood out to me about this figure.

First, it’s a line plot. We could map the school year to the x-axis, the national vaccination rate to the y-axis, and the disease being vaccinated against to color, then draw the lines with geom_line(). What’s interesting to me about this particular line plot is that they have indicated where they have data by including a circle plotting symbol. If you look closely, there is a space between each line segment and the plotting symbol. When I see these types of plots, I rarely see that type of spacing. I would create this effect by using plotting symbol 21, which is a circle with a border. Instead of using a black border, I’d use a white border to give the effect of the spacing. The points could be added by running geom_point() after geom_line().
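Here is a minimal sketch of that idea, not a recreation of the NYT figure. The data frame, its column names (school_year, disease, rate), and every value in it are placeholders I made up for illustration; they are not the CDC numbers. It also assumes ggplot2 3.4.0 or later for the linewidth argument.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CDC numbers
vax <- data.frame(
  school_year = rep(c("2018-19", "2019-20", "2020-21", "2021-22"), times = 3),
  disease = rep(c("Measles", "Polio", "Whooping cough"), each = 4),
  rate = c(95.0, 95.1, 94.0, 93.2,
           95.1, 95.0, 93.9, 93.1,
           94.8, 94.9, 93.7, 92.8)
)

p <- ggplot(vax, aes(x = school_year, y = rate,
                     color = disease, group = disease)) +
  geom_line(linewidth = 1) +
  # Shape 21 is a circle with a separate fill and border. The fill carries
  # the disease color; the white border (color + stroke) creates the gap
  # between the symbol and the line drawn underneath it.
  geom_point(aes(fill = disease), shape = 21, color = "white",
             size = 3, stroke = 1.5, show.legend = FALSE)

p
```

Because the points are drawn after the lines, the white border sits on top of the line and reads as empty space. Increasing stroke should widen the gap.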

Second, instead of a separate legend, they label the lines directly. The labels for each of the diseases being vaccinated against are near the ends of the lines and are the same color as the line. Because the ends of the lines for measles and polio overlap, they moved the measles label up and connected it to its line with a leader line that has a 90-degree turn in it. I think all of this is pretty slick. I’d likely create the labels using geom_text(). The line for measles could probably be created using annotate() with geom = "segment". We might also be able to create it using a function from {ggrepel}. Of course, I’m more familiar with annotate() and so getting something to work might be easier than figuring out how to use something from {ggrepel}.
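One way this could look in code is sketched below. Again, the data values, the column names, the ends helper data frame, and all of the label and segment coordinates are assumptions of mine chosen to make the example run; the real figure would need its positions tuned by eye.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CDC numbers
vax <- data.frame(
  year = rep(2019:2022, times = 2),
  disease = rep(c("Measles", "Polio"), each = 4),
  rate = c(95.0, 95.1, 94.0, 93.2,
           95.1, 95.0, 93.9, 93.1)
)

# One row per disease at the final year, used to position the labels
ends <- vax[vax$year == max(vax$year), ]
# Move the measles label up so it does not collide with the polio label
ends$label_y <- ifelse(ends$disease == "Measles", ends$rate + 0.5, ends$rate)

p <- ggplot(vax, aes(x = year, y = rate, color = disease)) +
  geom_line(show.legend = FALSE) +
  # Direct labels in the same color as their lines, replacing the legend
  geom_text(data = ends,
            aes(x = year + 0.1, y = label_y, label = disease),
            hjust = 0, show.legend = FALSE) +
  # Two segments make the leader line with a 90-degree turn:
  # straight up from the end of the measles line, then right to the label
  annotate("segment", x = 2022, xend = 2022, y = 93.2, yend = 93.7,
           color = "gray40") +
  annotate("segment", x = 2022, xend = 2022.08, y = 93.7, yend = 93.7,
           color = "gray40") +
  # Leave room on the right for the labels
  coord_cartesian(xlim = c(2019, 2022.8), clip = "off")

p
```

Hard-coding the segment coordinates is the low-tech route. It is tedious, but it keeps every piece of the annotation under direct control.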

Third, in typical NYT fashion, they don’t have a y-axis line and they have the y-axis values on the corresponding grid line with the unit for the axis next to the top number (e.g., 95%). Something we could debate is that the range goes from 91 to 95% rather than from 0 to 95 or 100%. By narrowing the range, we see the important small changes in the data. The downside is that we lose the sense that the changes are quite small albeit in the same direction. An alternative would be to instead have a y-axis indicating the difference from the “Federal Measles Target”. Then we’d have an axis from about -3.0 to 0.5. That might be too abstract for a lay audience. But if you’re a “must include zero purist”, that’s how I would do it. We’ve seen how to create our own y-axis using annotate() in recent videos.
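To make the difference-from-target alternative concrete, here is a small sketch. The rates are placeholder values rather than CDC data, and the names target and diff_from_target are mine. The axis limits follow the rough range of -3.0 to 0.5 mentioned above.

```r
library(ggplot2)

target <- 95  # the 95% target shown in the figure

# Placeholder values for illustration only; these are NOT the CDC numbers
vax <- data.frame(
  year = 2019:2022,
  rate = c(95.0, 95.1, 94.0, 93.2)
)

# Express each rate as percentage points above or below the target
vax$diff_from_target <- vax$rate - target

p <- ggplot(vax, aes(x = year, y = diff_from_target)) +
  geom_hline(yintercept = 0, color = "black") +  # zero is now the target
  geom_line() +
  scale_y_continuous(limits = c(-3, 0.5), breaks = seq(-3, 0, by = 1)) +
  labs(x = NULL, y = "Percentage points from target") +
  theme_minimal()

p
```

With this framing, zero sits on the axis and means something: it is the target itself.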

Fourth, I really like how they have indicated the target level of vaccination. Again, all the grid lines are a light gray color that really blends into the background. But the line at 95% is solid and black. That the label is in all capital letters drives home the message that this is the desired threshold. Because one line is different from the others, I’d likely create the black line using geom_hline() and the other grid lines using theme(). The text labelling the line is right-justified (hjust = 1) and could be included using annotate() with geom = "text".
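A rough sketch of that combination follows. The data are placeholders (not CDC values), and the gray shade, text size, and label position are guesses on my part rather than values taken from the NYT figure.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CDC numbers
vax <- data.frame(
  year = 2019:2022,
  rate = c(95.0, 95.1, 94.0, 93.2)
)

p <- ggplot(vax, aes(x = year, y = rate)) +
  # The target gets its own solid black line, drawn as a layer
  geom_hline(yintercept = 95, color = "black", linewidth = 0.5) +
  geom_line(color = "firebrick") +
  # Right-justified, all-caps label sitting just above the target line
  annotate("text", x = 2022, y = 95.15, label = "FEDERAL MEASLES TARGET",
           hjust = 1, vjust = 0, size = 3) +
  scale_y_continuous(limits = c(91, 95.5), breaks = 91:95) +
  # The remaining grid lines are styled through the theme so they fade back
  theme(
    panel.background = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_blank(),
    axis.ticks = element_blank()
  )

p
```

Drawing the target with geom_hline() rather than through theme() is what lets that single line have a different color and weight from the rest of the grid.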

Finally, earlier this week I posted a video recreating a bar plot from another NYT article. The “Libre Franklin” sans serif font is still on my mind and I’d like another try at using {showtext} to use that font from Google Fonts. As far as I can tell, all the text in this figure uses the Franklin font.
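For anyone who wants to try the font ahead of the video, a sketch of the usual {showtext} pattern is below. The family alias "franklin" is a name I picked, the tryCatch() fallback to "sans" is my addition so the code still runs without an internet connection, and mtcars is only a stand-in dataset.

```r
library(ggplot2)
library(showtext)  # attaches sysfonts, which provides font_add_google()

# Download Libre Franklin from Google Fonts and register it under a local
# alias. If the download fails (e.g., offline), fall back to the default sans.
font_family <- tryCatch({
  font_add_google(name = "Libre Franklin", family = "franklin")
  "franklin"
}, error = function(e) "sans")

showtext_auto()  # have showtext render text for subsequent plots

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Text rendered with showtext") +
  theme_minimal(base_family = font_family)

p
```

Setting base_family in the theme covers titles and axis text, but text drawn by geom_text() or annotate() takes its own family argument.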

If you scroll further down the article, you’ll see a few other plots are included. Which is your favorite? Which would you most like me to make a video of? Reply to this email and let me know!

For now, I’m leaning towards recreating this figure. We’ve done several of this type of plot in the past, but this has some unique features. See if you can think through how to recreate this figure on your own!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
