How would you make a labelled bar plot with positive and negative values?


Hey folks,

We had another great livestream on Wednesday building a figure from the Washington Post. I talked about this plot last month in the newsletter as being a faceted waffle plot. We had a lot of fun building the figure! I didn’t think we’d get to it, but we even came up with a clever approach to making the non-uniform circles to depict each response to the WP’s survey. You’ll have to watch the livestream to see how we did it.

I have really enjoyed the interaction with the people who are joining and chatting as I code. There were as many as 25 people logged in at any given time. From the people who told us where they were from, most were from outside the US: Sudan, Dominican Republic, the Philippines, and Greece were represented. It never ceases to amaze me how far a reach the channel has. I would love for that reach to also be deep. Please tell your friends about the channel and encourage them to participate.

I’m trying to do these every Wednesday morning at 9 AM EST.


In the US, Congress just passed a mega funding and budget bill. According to estimates from the Congressional Budget Office, the budget bill is “regressive”. This means that it hurts the poor and helps the wealthy. The New York Times had a plot illustrating this in an article from last month (free link!).

I suspect the specific estimates have changed between when this estimate came out and the final version, but I thought the figure was interesting. It would be fun to try to recreate this plot for a few reasons.

First off, it’s a bar plot. The bars go both above and below the zero point on the y-axis. The bars are also labelled at their furthest extent with the actual values. A second label for each bar indicates the decile that the data correspond to, with the poorest on the left and the wealthiest on the right.

To create the basic plot, we’d use geom_col() to generate the bars. I’d strip out the x- and y-axes, background, and grid lines to give the plot a minimal appearance. To indicate the zero y-axis intercept, I’d use geom_hline() to draw in the solid black line. I’d also create a column to indicate whether the percent change was positive or negative and then set the fill aesthetic to vary by the type of change.
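Here’s a rough sketch of how that basic plot might come together. The percent changes are placeholder values I made up for illustration (they are not the CBO estimates), and the column names and fill colors are my own assumptions rather than anything taken from the original figure.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CBO estimates
budget <- data.frame(
  decile = factor(1:10),
  pct_change = c(-4, -2, -1, -0.5, 0.2, 0.5, 0.8, 1, 1.5, 2)
)

# Column flagging the direction of the change, used for the fill aesthetic
budget$direction <- ifelse(budget$pct_change < 0, "negative", "positive")

p <- ggplot(budget, aes(x = decile, y = pct_change, fill = direction)) +
  geom_col(show.legend = FALSE) +
  # Solid black line at the zero intercept
  geom_hline(yintercept = 0, color = "black") +
  # Assumed colors; swap in whatever matches the original
  scale_fill_manual(values = c(negative = "orange", positive = "steelblue")) +
  # theme_void() drops the axes, background, and grid lines in one go
  theme_void()

p
```

I went with theme_void() here because it removes everything at once. You could instead start from another theme and blank out individual elements with theme().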

I’d use geom_text() to add two types of text data.

First, I’d use it to add the percent change. The y-aesthetic would be set by the percent change. I’d use one of the position argument functions or nudge arguments to move the location of the text further out from the bar. Perhaps I’d need to create a column in my data frame that indicates whether we need a positive or negative nudge depending on the direction of the data. Of course, we’d also want to change the color of the text to match the fill color of the bar.
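One way to sketch the nudge-column idea: store a signed offset in the data frame and add it to the y position of the label. Again, the values are made-up placeholders (not the CBO estimates), and the offset of 0.3, the label format, and the colors are assumptions you would tune by eye.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CBO estimates
budget <- data.frame(
  decile = factor(1:10),
  pct_change = c(-4, -2, -1, -0.5, 0.2, 0.5, 0.8, 1, 1.5, 2)
)
budget$direction <- ifelse(budget$pct_change < 0, "negative", "positive")

# Signed offset: push labels down for negative bars, up for positive bars
budget$nudge <- ifelse(budget$pct_change < 0, -0.3, 0.3)

# Formatted label with an explicit sign, e.g. "-4.0%" or "+0.2%"
budget$label <- sprintf("%+.1f%%", budget$pct_change)

p <- ggplot(budget, aes(x = decile, y = pct_change, fill = direction)) +
  geom_col(show.legend = FALSE) +
  geom_hline(yintercept = 0, color = "black") +
  geom_text(
    aes(y = pct_change + nudge, label = label, color = direction),
    show.legend = FALSE
  ) +
  # One manual scale applied to both fill and colour so text matches the bars
  scale_fill_manual(
    values = c(negative = "orange", positive = "steelblue"),
    aesthetics = c("fill", "colour")
  ) +
  theme_void()

p
```

Adding the offset inside aes() keeps the direction-dependent nudge in the data, which is easier than trying to pass a different nudge_y to each label.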

Second, I’d use geom_text() to also add the x-axis text. I’d likely make a separate column in my data frame to indicate the formatted decile (e.g., “10th-20th”) and another column for the y-axis position for the deciles. This font appears to be a dark gray - perhaps matching the subtitle and caption color - and a smidge smaller than the percentage text. We can change those with the size and color aesthetics.
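The decile labels could be sketched like this. The data are placeholders (not the CBO estimates), and the label wording, the y position below the lowest bar, the gray shade, and the text size are all guesses on my part rather than values measured from the figure.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CBO estimates
budget <- data.frame(
  decile = factor(1:10),
  pct_change = c(-4, -2, -1, -0.5, 0.2, 0.5, 0.8, 1, 1.5, 2)
)

# Formatted decile labels such as "10th-20th"
lower <- seq(0, 90, by = 10)
budget$decile_label <- paste0(lower, "th-", lower + 10, "th")

# Assumed y position for the decile labels: just below the lowest bar
budget$decile_y <- min(budget$pct_change) - 1

p <- ggplot(budget, aes(x = decile, y = pct_change)) +
  geom_col() +
  geom_hline(yintercept = 0, color = "black") +
  # Constant color and size go outside aes() since they don't vary by row
  geom_text(
    aes(y = decile_y, label = decile_label),
    color = "gray30", size = 3
  ) +
  theme_void()

p
```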

Finally, the title, subtitle, and caption are all relatively straightforward. We can put those in with the labs() function and adjust their styling using theme()’s plot.title, plot.subtitle, and plot.caption arguments.
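As a minimal sketch of the text elements, something like the following should work. The title, subtitle, and caption strings are placeholders rather than the wording from the original figure, the data are made up (not the CBO estimates), and the styling choices are assumptions.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT the CBO estimates
budget <- data.frame(
  decile = factor(1:10),
  pct_change = c(-4, -2, -1, -0.5, 0.2, 0.5, 0.8, 1, 1.5, 2)
)

p <- ggplot(budget, aes(x = decile, y = pct_change)) +
  geom_col() +
  geom_hline(yintercept = 0, color = "black") +
  # Placeholder strings; replace with the real title, subtitle, and caption
  labs(
    title = "Placeholder title",
    subtitle = "Placeholder subtitle",
    caption = "Placeholder caption"
  ) +
  theme_void() +
  # Styling applied after theme_void() so it isn't overwritten
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(color = "gray30"),
    plot.caption = element_text(color = "gray30", hjust = 0)
  )

p
```

The order matters: a complete theme like theme_void() replaces earlier theme() settings, so the theme() call goes last.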

An added challenge you might undertake is to generate the plot without writing out a specific tibble for these values. The underlying data from the CBO is available as an XLSX spreadsheet and you can find a slightly different version of this figure in their report as Figure 2. If you want to try reading the data in directly from the spreadsheet, you might try to use the {readxl} package. We’ll need to select the second sheet and deal with the extra lines at the top of that sheet.
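Since I’m not reproducing the CBO file’s name or layout here, this sketch shows the general pattern on deaths.xlsx, an example file that ships with {readxl} and that also has non-data rows above the header. The helper name read_sheet_skipping() is hypothetical, and skip = 4 is my assumption for the example file only. You would need to open the CBO spreadsheet to see how many lines to skip there.

```r
library(readxl)

# Hypothetical helper: read one sheet and skip the rows above the header
read_sheet_skipping <- function(path, sheet = 2, skip = 0) {
  read_excel(path, sheet = sheet, skip = skip)
}

# Demo on an example file bundled with readxl (not the CBO data)
demo_path <- readxl_example("deaths.xlsx")

# List the sheet names so you can confirm which one you want
excel_sheets(demo_path)

# Second sheet, skipping the assumed four rows of notes at the top
demo <- read_sheet_skipping(demo_path, sheet = 2, skip = 4)
head(demo)
```

If the sheet also has notes below the data, read_excel()’s n_max or range arguments can trim those off.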

Give this plot a try on your own. Be sure to tune in to a future livestream when I'll recreate this plot live!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat
