Representing time in a bubble plot with {ggplot2}

Hey folks,

Wow, I really didn’t expect my overview of Positron to resonate with so many people last week on YouTube! I’ll work on coming up with another video showing Visual Studio Code (VS Code) in action. As others have mentioned in the episode’s comments, I’m not really sure why Posit is building Positron instead of making things easier within VS Code for R users. To me the need for an IDE that allows people to use multiple programming languages is a red herring - VS Code does that already. Maybe I’m missing something. Ultimately, it’s important to remember that Posit is a company and they must have a business case for Positron. But… do they really want to take on Microsoft, the makers of VS Code?

After deleting my Twitter account a couple of months ago, I’ve been lurking on Bluesky waiting for it to take off. Honestly, I don’t really have the time to be scanning social media feeds. But something I miss from Twitter is the community that would post cool figures. I decided to go looking for some of those figures on Bluesky yesterday and found the feed of Tom Calver, who is a Data Editor at The Times (of London?). I found this cool figure, which was part of an article he wrote about health care spending in the UK’s National Health Service (NHS):

I thought this was a fascinating plot. Of course, any health care-related figure has to include the US since we are good at making others look better. It raises all sorts of interesting questions about why countries can improve their life expectancy without spending more money (e.g., Italy and Japan) or why other countries spend more with no or little benefit (e.g., US, Germany, and UK). Something I like about this plot is that Calver shows the passage of time with a trail of smaller points. On the web, this is an interactive figure: you can hover over a point and get a pop-up that tells you about it. I’ll focus on the static parts of the plot.

This is a bubble plot that you could generate using geom_point(). There are at least 5 different aesthetics. Can you spot them all? First, there is the per-capita level of spending on health care on the x-axis. Second, there is the life expectancy on the y-axis. Third, there are seven countries, each with a different color. Fourth, the size of each point varies by the year, with 2000 being the smallest and 2023 being the largest. Finally, the most recent year has a black border to the points while the other years have a white border. In addition to the points, it also has text labels for each of the countries. I’d include those using geom_text(). A few things stand out to me about the plot.
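To make those five aesthetics concrete, here is a minimal sketch of how they might map onto geom_point() and geom_text(). The data frame, its column names, and every value in it are invented placeholders (not the real OECD/WHO numbers), and I've only included three countries to keep it short.

```r
library(ggplot2)

# Invented placeholder values, NOT the real OECD/WHO figures
health <- data.frame(
  country  = rep(c("UK", "US", "Japan"), each = 4),
  year     = rep(c(2000, 2008, 2016, 2023), times = 3),
  spend    = c(2000, 3000, 4000, 4500,
               5000, 7500, 10000, 12000,
               2000, 3000, 4200, 4600),
  life_exp = c(78, 79.5, 81, 81.2,
               76.8, 78, 78.7, 78.4,
               81, 82.5, 84, 84.5)
)

# Flag the most recent year so it can get its own border color
health$latest <- health$year == max(health$year)
labels <- health[health$latest, ]

bubble <- ggplot(health, aes(x = spend, y = life_exp)) +
  # shape 21 has both a fill (country) and a border color (latest year or not)
  geom_point(
    aes(fill = country, size = year, color = latest),
    shape = 21, show.legend = FALSE
  ) +
  # one text label per country, placed next to its most recent point
  geom_text(
    data = labels, aes(label = country),
    hjust = 0, nudge_x = 300
  ) +
  scale_color_manual(values = c("FALSE" = "white", "TRUE" = "black")) +
  scale_size_continuous(range = c(1, 6))
```

Printing bubble draws the plot. Using shape 21 is my choice here because it is a point shape that takes separate fill and border colors.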

First, the size legend is built into the data. Do you see it? For the Germany series of points the smallest point has 2000 next to it and the largest has 2023 next to it. A common theme in recent newsletters has been looking for ways to build legends into the data so that the reader doesn’t have to look off to a margin to understand what they are seeing. I really like this effect. I’d probably add this legend with annotate(geom = "text").
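Here is a rough sketch of that built-in legend idea, assuming a single country's series with made-up values. The data frame and the vjust offset are placeholders you would tune by eye.

```r
library(ggplot2)

# Invented placeholder series standing in for one country (not real data)
series <- data.frame(
  year     = c(2000, 2010, 2023),
  spend    = c(3000, 4500, 6500),
  life_exp = c(78.0, 80.2, 81.0)
)

# Keep only the first and last years; these are the points that get a year label
ends <- series[series$year %in% range(series$year), ]

inline_legend <- ggplot(series, aes(spend, life_exp, size = year)) +
  geom_point(shape = 21, fill = "goldenrod", color = "white",
             show.legend = FALSE) +
  # the year labels double as the size legend
  annotate(geom = "text", x = ends$spend, y = ends$life_exp,
           label = as.character(ends$year), vjust = 2.5, size = 3)
```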

Second, it is interesting that the older points appear to fall on top of the newer points, except for the 2023 points, which fall on top of the 2019 points (the 2020-2022 data were excluded). Doing this with {ggplot2}, I’d likely have the older points fall under the newer points. I think geom_point() will do this by default, but we could alter the order using fct_reorder() to define the order of the factor. If I go the factor route, I’d likely need to convert the year to a character variable.
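An alternative to the factor route is to sort the rows before plotting. As far as I know, geom_point() draws points in the order the rows appear in the data, so later rows land on top. Treat that as an assumption to check against your own plot. The values below are invented and deliberately close together so the points overlap.

```r
library(ggplot2)

# Invented placeholder values (not real data), deliberately out of order
series <- data.frame(
  year     = c(2023, 2000, 2010),
  spend    = c(3300, 3000, 3150),
  life_exp = c(80.4, 80.0, 80.2)
)

# Oldest rows first, so newer points are drawn last and should land on top
newest_on_top <- series[order(series$year), ]

# Newest rows first, so older points are drawn last and should land on top
oldest_on_top <- series[order(series$year, decreasing = TRUE), ]

draw_order <- ggplot(oldest_on_top, aes(spend, life_exp, size = year)) +
  geom_point(shape = 21, fill = "firebrick", color = "white",
             show.legend = FALSE) +
  scale_size_continuous(range = c(4, 12))
```

Sorting keeps year numeric, so it can still be mapped to size without converting it to a character or factor.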

Third, this plot doesn’t have tick marks on the axes, but does have grid lines on both axes. This is another emerging theme. Why include tick marks if you have grid lines? I’m not generally a fan of grid lines, but I have to agree with these developers that if you’re using them you don’t need the tick marks. I think they’re extraneous.
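Dropping the tick marks while keeping the grid lines is a small theme() change. This sketch uses the built-in mtcars data as a stand-in, and the grid color is just my guess at something suitably light.

```r
library(ggplot2)

no_ticks <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(
    axis.ticks       = element_blank(),               # no tick marks
    panel.background = element_rect(fill = "white"),
    panel.grid.major = element_line(color = "grey85"), # keep major grid lines
    panel.grid.minor = element_blank()
  )
```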

Fourth, this plot puts the x and y-axis titles at the outer reach of the axes. We see “Life expectancy” in the upper left corner and “Per-capita spend” in the bottom right corner. I’m not 100% sure what I think of this yet. I kind of like it for the x-axis because it keeps the title from getting lost in the caption at the bottom. What do you think? I’d remove the typical titles with labs(x = NULL, y = NULL) and instead use annotate() to place both titles within the plot.
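A sketch of that approach, again with invented values. Setting x or y to -Inf or Inf pins the text to the edge of the panel. The hjust and vjust values are placeholders that nudge the text inward.

```r
library(ggplot2)

# Invented placeholder values (not real data)
series <- data.frame(
  spend    = c(2000, 3500, 5000),
  life_exp = c(78, 80, 81)
)

inner_titles <- ggplot(series, aes(spend, life_exp)) +
  geom_point() +
  labs(x = NULL, y = NULL) +  # drop the usual axis titles
  # y-axis title in the upper left corner of the panel
  annotate("text", x = -Inf, y = Inf, label = "Life expectancy",
           hjust = -0.05, vjust = 1.5, fontface = "bold") +
  # x-axis title in the bottom right corner of the panel
  annotate("text", x = Inf, y = -Inf, label = "Per-capita spend",
           hjust = 1.05, vjust = -0.5, fontface = "bold")
```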

Fifth, I like how “UK” has a red background with white font in the title. In the past, I’ve highlighted a variable by changing the title text to the relevant color. I suspect the background color helps “UK” pop more than if “UK” was written in red font, even if it were bolded. I would likely pull this off using element_textbox_simple() from {ggtext} by wrapping **UK** in a styled <span> tag. Of course, this is in the subtitle. There’s also a bolded title. It looks like the caption would use a <span> tag to change the color of the second line.
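Here is a hedged sketch of the {ggtext} side of this, using mtcars as a stand-in and placeholder text throughout. I'm not certain {ggtext} will render a background fill behind a single word, so this version falls back to the colored-text approach and only shows the <span> and ** mechanics. The hex colors are guesses.

```r
library(ggplot2)
library(ggtext)

styled_titles <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(
    title    = "**Placeholder title**",
    subtitle = "Placeholder subtitle that highlights the <span style='color:#C8102E;'>**UK**</span>",
    caption  = "First caption line<br><span style='color:#777777;'>Second caption line</span>"
  ) +
  theme(
    # element_textbox_simple() parses the markdown/HTML and wraps long text
    plot.title    = element_textbox_simple(),
    plot.subtitle = element_textbox_simple(margin = margin(t = 4, b = 8)),
    plot.caption  = element_textbox_simple()
  )
```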

Finally, there are a few odds and ends. Something that stands out as weird to me is how the US’s 2000 to 2002 points fall outside the plotting window. Why? Regardless, we could pull this off using coord_cartesian() by including the y-axis limits and setting clip = "off". Also, the titles and caption sit between the two positions {ggplot2} offers: they are to the left of the y-axis, where plot.title.position = "panel" would start them, and to the right of the left edge of the plot, where plot.title.position = "plot" would start them. I’d probably align to the plot and then use a left hand margin to move it to the right a smidge. In case you were wondering, the font appears to be Roboto.
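Those odds and ends might look something like this. The values are invented, with the first two rows placed below the y-axis limit on purpose. The margin sizes are placeholders, and whether the left margin on the title is honored may depend on your {ggplot2} version.

```r
library(ggplot2)

# Invented placeholder values; the first two rows sit below the y limit on purpose
series <- data.frame(
  year     = c(2000, 2001, 2010, 2023),
  spend    = c(4500, 4900, 8000, 12000),
  life_exp = c(76.6, 76.8, 78.5, 78.4)
)

clipped_off <- ggplot(series, aes(spend, life_exp, size = year)) +
  geom_point(shape = 21, fill = "steelblue", color = "white",
             show.legend = FALSE) +
  # zoom the y-axis without dropping data, and let points draw outside the panel
  coord_cartesian(ylim = c(77, 85), clip = "off") +
  labs(title = "Placeholder title", caption = "Placeholder caption") +
  theme(
    # align to the whole plot, then nudge right with a left margin
    plot.title.position   = "plot",
    plot.caption.position = "plot",
    plot.title   = element_text(margin = margin(l = 10, b = 6)),
    plot.caption = element_text(hjust = 0, margin = margin(l = 10, t = 6))
  )
```

Note that the caption has its own plot.caption.position setting, separate from plot.title.position.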

If you want to take a swing at making this figure, you should be able to get the life expectancy data from OECD and the healthcare expenditure data from the WHO.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

minimalR Workshop

generalR Workshop

mothur Workshop

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development