Data visualization as a vaccination against ignorance

Hey folks,

I hope you’ve noticed that this newsletter and the YouTube channel have nearly caught up. There’s now a 10-day lag between when I post a newsletter describing a data visualization and when I post the recreation video. I could probably shrink that to 3 days, but I’d like you to have a chance to work through the code on your own before I share my solution. Last week I had some existential dread that I’d never find another good plot to share; now it appears my cup runneth over :) I’m pretty excited to share what I’ve been collecting!

As I mentioned about a month ago, many in the US are bracing themselves for the prospect of Robert F. Kennedy Jr. serving as Secretary of Health and Human Services. He’s been an outspoken opponent of vaccines. Combined with what many feel is an increase in “anti-vax” sentiment, there are fears that old diseases like measles, polio, and whooping cough might make a comeback.

This week, Francesca Paris, a data reporter at the New York Times, published an article about vaccination rates across the US and how they’ve been falling since the start of the COVID pandemic. Here’s the first graph in her article.

The take-home message from this figure is that vaccination rates for measles, polio, and whooping cough hovered around 95% until the pandemic started and then declined. I’m sure there are many reasons why this happened, and the article shares a few. For my family, I know it was nearly impossible to get our kids into the doctor’s office for well-child visits. Then no one told us it was OK to come back. Then our providers dropped us because they hadn’t seen us in too long. Our story wasn’t unique. Combined with anti-science political winds around the COVID vaccine, it was a perfect storm for declining vaccination rates.

Line plots are common in many fields, and I was excited to see an attractive line plot that told an interesting story. What do you think of the plot? I’m repeating myself, but I really like the aesthetics of the NYT’s data journalism. Their figures are very easy on the eyes: they use color for emphasis, the landmarks like grid lines fade into the background, and good annotations tell you what is going on. This one is no exception. If you want to try to recreate this figure or any of the other figures in the article, you can download the raw data from the CDC’s website as an MS Excel spreadsheet. A few things stood out to me about this figure.

First, it’s a line plot. We could draw the lines with geom_line(), mapping the school year to the x-axis, the national vaccination rate to the y-axis, and the disease being vaccinated against to color. What’s interesting to me about this particular line plot is that they’ve marked each year where they have data with a circular plotting symbol. If you look closely, there is a small gap between each line segment and the plotting symbol. I rarely see that kind of spacing in these types of plots. I would create this effect using plotting symbol 21 (shape = 21 in ggplot2), a circle with a separate border and fill. Instead of a black border, I’d use a white border to create the illusion of a gap. The points could be added by calling geom_point() after geom_line() so that they’re drawn on top of the lines.
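Here’s a minimal sketch of the line-plus-gapped-points idea, assuming a tidy data frame with one row per school year and disease. The `vax` data frame, its column names, and its numbers are hypothetical placeholders generated by a formula, not the CDC values, so swap in the real data from the spreadsheet. The gap comes from giving shape 21 a fill that matches the line and a thick white border (`stroke`).

```r
library(ggplot2)

# Synthetic placeholder data, NOT the CDC values: every rate sits at 95%
# through the 2019 school year, then declines linearly at a made-up slope.
vax <- expand.grid(school_year = 2016:2023,
                   disease = c("Measles", "Polio", "Whooping cough"))
slope <- c(Measles = 0.5, Polio = 0.55, "Whooping cough" = 0.8)
vax$rate <- 95 - pmax(0, vax$school_year - 2019) *
  unname(slope[as.character(vax$disease)])

p <- ggplot(vax, aes(x = school_year, y = rate, color = disease)) +
  geom_line(linewidth = 0.8) +
  # Shape 21 has a separate border (color) and interior (fill): fill it with
  # the disease color and give it a thick white border, so a small gap shows
  # between each point and the line segments touching it.
  geom_point(aes(fill = disease), shape = 21, color = "white",
             size = 3, stroke = 1.5) +
  theme_minimal()

p
```

Because the white border is what fakes the gap, this trick only works on a white background; on a gray panel you’d set the border to the panel color instead.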

Second, instead of using a separate legend, they label the lines directly. Each disease’s label sits near the end of its line and is the same color as the line. Because the ends of the measles and polio lines overlap, they moved the measles label up and connected it to its line with a leader line that has a 90-degree bend. I think all of this is pretty slick. I’d likely create the labels using geom_text(). The measles leader line could probably be created using annotate() with geom = "segment"; since each segment is straight, the bend would take two of them. We might also be able to create it using a function from {ggrepel}. Of course, I’m more familiar with annotate(), so getting something to work that way might be easier than figuring out how to do it with {ggrepel}.
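A rough sketch of direct labels with an elbow leader line, using formula-generated placeholder data (not the CDC numbers). The 0.5-point nudge, the leader line’s coordinates, and the colors are all made-up choices for this fake data; with the real data you’d tune them by eye. Putting the colors in a named vector is a design choice that lets `annotate()`, which doesn’t use the color scale, reuse the measles color.

```r
library(ggplot2)

# Synthetic placeholder data, NOT the CDC values
vax <- expand.grid(school_year = 2016:2023,
                   disease = c("Measles", "Polio", "Whooping cough"))
slope <- c(Measles = 0.5, Polio = 0.55, "Whooping cough" = 0.8)
vax$rate <- 95 - pmax(0, vax$school_year - 2019) *
  unname(slope[as.character(vax$disease)])

# Made-up colors, not the NYT palette
line_colors <- c(Measles = "#D95F02", Polio = "#1B9E77",
                 "Whooping cough" = "#7570B3")

# One label per disease, just right of the last school year; only the
# measles label is nudged up so it doesn't collide with the polio label
ends <- vax[vax$school_year == max(vax$school_year), ]
ends$label_x <- ends$school_year + 0.3
ends$label_y <- ends$rate
is_measles <- ends$disease == "Measles"
ends$label_y[is_measles] <- ends$label_y[is_measles] + 0.5

x_end <- max(vax$school_year)
measles_end <- ends$rate[is_measles]

p <- ggplot(vax, aes(x = school_year, y = rate, color = disease)) +
  geom_line(linewidth = 0.8) +
  # Direct labels inherit color = disease, so they match their lines
  geom_text(data = ends, aes(x = label_x, y = label_y, label = disease),
            hjust = 0, size = 3.5) +
  # Elbow leader for measles as two segments: up from the end of the line,
  # then right toward the label
  annotate("segment",
           x = c(x_end, x_end), xend = c(x_end, x_end + 0.25),
           y = c(measles_end + 0.1, measles_end + 0.5),
           yend = c(measles_end + 0.5, measles_end + 0.5),
           color = line_colors[["Measles"]]) +
  scale_color_manual(values = line_colors) +
  # Extra room on the right for the labels
  scale_x_continuous(breaks = 2016:2023, limits = c(2016, 2025.5)) +
  theme_minimal() +
  theme(legend.position = "none")

p
```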

Third, in typical NYT fashion, they don’t include a y-axis line, they place each y-axis value on its grid line, and they add the unit only to the top number (i.e., 95%). Something we could debate is that the range goes from 91 to 95% rather than from 0 to 95 or 100%. By narrowing the range, we can see the small but important changes in the data. The downside is that we lose the sense that the changes, although all in the same direction, are quite small. An alternative would be a y-axis showing the difference from the “Federal Measles Target”, which would run from about -3.0 to 0.5. That might be too abstract for a lay audience, but if you’re a “must include zero” purist, that’s how I would do it. We’ve seen how to create our own y-axis using annotate() in recent videos.
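One way to sketch the NYT-style y-axis, assuming formula-generated placeholder data (not the CDC values): hide ggplot2’s own y-axis text and draw the values with `annotate()` so they sit just above their grid lines, with the % sign only on the top one. The label position (`x = 2015.6`) and the `vjust` nudge are guesses you’d adjust to taste.

```r
library(ggplot2)

# Synthetic placeholder data, NOT the CDC values
vax <- expand.grid(school_year = 2016:2023,
                   disease = c("Measles", "Polio", "Whooping cough"))
slope <- c(Measles = 0.5, Polio = 0.55, "Whooping cough" = 0.8)
vax$rate <- 95 - pmax(0, vax$school_year - 2019) *
  unname(slope[as.character(vax$disease)])

# Hand-made axis labels: plain numbers, with the unit only on the top one
y_breaks <- 91:95
y_labels <- c(91:94, "95%")

p <- ggplot(vax, aes(x = school_year, y = rate, color = disease)) +
  geom_line(linewidth = 0.8) +
  # Draw each value at the left edge, left-aligned (hjust = 0) and lifted
  # just above its grid line (negative vjust)
  annotate("text", x = 2015.6, y = y_breaks, label = y_labels,
           hjust = 0, vjust = -0.4, size = 3, color = "gray40") +
  scale_y_continuous(breaks = y_breaks, limits = c(91, 95.5)) +
  scale_x_continuous(breaks = 2016:2023, limits = c(2015.6, 2023)) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),   # hide the default tick labels
        axis.title.y = element_blank(),
        axis.line.y = element_blank(),   # no y-axis line
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank())

p
```

For the “must include zero” alternative, you could instead map `y = rate - 95` and put the reference at 0.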

Fourth, I really like how they indicate the target level of vaccination. Again, the other grid lines are a light gray that blends into the background, but the line at 95% is solid and black. Setting its label in all capital letters drives home the message that this is the desired threshold. Because this one line differs from the others, I’d likely create the black line using geom_hline() and style the other grid lines using theme(). The text labelling the line is right-justified (hjust = 1) and could be added using annotate() with geom = "text".
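A minimal sketch of the emphasized target line, again with formula-generated placeholder data (not the CDC values). The label follows the “Federal Measles Target” wording mentioned above; the gray shade and line widths are arbitrary choices, not values taken from the NYT figure.

```r
library(ggplot2)

# Synthetic placeholder data, NOT the CDC values
vax <- expand.grid(school_year = 2016:2023,
                   disease = c("Measles", "Polio", "Whooping cough"))
slope <- c(Measles = 0.5, Polio = 0.55, "Whooping cough" = 0.8)
vax$rate <- 95 - pmax(0, vax$school_year - 2019) *
  unname(slope[as.character(vax$disease)])

p <- ggplot(vax, aes(x = school_year, y = rate, color = disease)) +
  # The single emphasized reference line gets its own layer, drawn first so
  # the data lines sit on top of it
  geom_hline(yintercept = 95, color = "black", linewidth = 0.5) +
  geom_line(linewidth = 0.8) +
  # Right-justified (hjust = 1) so the label ends at the last school year,
  # lifted slightly above the target line (vjust = -0.5)
  annotate("text", x = 2023, y = 95, label = "FEDERAL MEASLES TARGET",
           hjust = 1, vjust = -0.5, size = 3) +
  theme_minimal() +
  # Every other grid line is styled once, through the theme
  theme(panel.grid.major.y = element_line(color = "gray90", linewidth = 0.3),
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank())

p
```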

Finally, earlier this week I posted a video recreating a bar plot from another NYT article. The “Libre Franklin” sans-serif font is still on my mind, and I’d like another try at using {showtext} to load that font from Google Fonts. As far as I can tell, all the text in this figure uses the Franklin font.
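A sketch of loading Libre Franklin with {showtext}, assuming the {showtext} package (which attaches {sysfonts}) is installed. `font_add_google()` downloads the font, so the code falls back to R’s generic `"sans"` family when there’s no internet connection; the family name `"franklin"` is an arbitrary label. The plot uses formula-generated placeholder data, not the CDC values.

```r
library(ggplot2)
library(showtext)  # attaches {sysfonts}, which provides font_add_google()

# Register Libre Franklin from Google Fonts under the family name "franklin".
# The download needs an internet connection, so fall back to "sans" if it fails.
font_family <- tryCatch({
  font_add_google("Libre Franklin", family = "franklin")
  "franklin"
}, error = function(e) "sans")

# Have showtext render text in graphics devices opened from now on
showtext_auto()

# Synthetic placeholder data, NOT the CDC values
vax <- expand.grid(school_year = 2016:2023,
                   disease = c("Measles", "Polio", "Whooping cough"))
slope <- c(Measles = 0.5, Polio = 0.55, "Whooping cough" = 0.8)
vax$rate <- 95 - pmax(0, vax$school_year - 2019) *
  unname(slope[as.character(vax$disease)])

p <- ggplot(vax, aes(x = school_year, y = rate, color = disease)) +
  geom_line(linewidth = 0.8) +
  labs(x = NULL, y = NULL, color = NULL) +
  # base_family sets the font for the theme's text (axis text, legend, titles)
  theme_minimal(base_family = font_family)

p
```

To be safe across ggplot2 versions, also pass `family = font_family` to any geom_text() or annotate() text layers you add.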

If you scroll further down the article, you’ll see a few other plots. Which is your favorite? Which would you most like me to make a video of? Reply to this email and let me know!

For now, I’m leaning towards recreating this figure. We’ve made several plots like this in the past, but this one has some unique features. See if you can think through how to recreate it on your own!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each, you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

minimalR Workshop

generalR Workshop

mothur Workshop

In case you missed it…

Here is a livestream I published this week that relates to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
