Plotting the US job creation numbers (and revisions) with ggplot2


Hey folks,

Are you interested in upping your data visualization skills? I’m rolling out a new program to help you improve the design of your data visualizations. This program will last 5 weeks starting at the beginning of September. Each session will be two hours long and include a discussion of data visualization principles followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in this program so you can focus more on concepts than implementation. I believe that once you understand the concepts, you can use any tool - even a pencil and piece of paper - to implement your design. Click this button to learn more.


In recent weeks there’s been a kerfuffle about data coming out of the government. Specifically, the US Bureau of Labor Statistics (BLS) revised its estimates of the number of jobs created in May and June 2025 downward by about 125,000 jobs per month. This led President Trump to fire the Commissioner of the BLS. Obviously, it’s critical to have trustworthy data from the government to understand the state of the economy, assess the effectiveness of programs, and track the overall health of society. Political firings of people who are following standard operating procedures are jarring.

Beyond the politics, the NY Times again had a visualization that caught my attention. There are a few things in this figure that I am curious to try out in R with {ggplot2} and the rest of the tidyverse. You can get the data from the first link on this page from the BLS.

First, this plot is a stacked bar chart. I really like the use of negative space for the number of jobs that were over-projected in May and June. I’d start by making sure the data are in a tidy format with a column for the date (the first of each month between July 2024 and July 2025), a column for the number of jobs, and a column to indicate whether the number of jobs is from the firm estimate or the over-projection. With this structure, we could use ggplot() with geom_col() and map the date to the x-axis, the number of jobs to the y-axis, and the indicator column to the fill aesthetic. We might need to play with factors or the position argument to get the ordering of the indicator column correct for May and June 2025.
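Here's a minimal sketch of that tidy structure and the stacked geom_col(). The numbers are placeholders I made up so the code runs, not the BLS values, and the column names (date, estimate, jobs) and the object name p_stacked are just assumptions for illustration.

```r
library(tidyverse)

# Placeholder values in thousands of jobs; NOT the real BLS numbers
jobs <- tribble(
  ~date,        ~estimate,        ~jobs,
  "2025-04-01", "firm",           150,
  "2025-05-01", "firm",            20,
  "2025-05-01", "over_projected", 125,
  "2025-06-01", "firm",            15,
  "2025-06-01", "over_projected", 130,
  "2025-07-01", "firm",            75
) %>%
  mutate(
    date = as.Date(date),
    # By default the first factor level is stacked on top, so listing
    # "over_projected" first places it above the firm estimate
    estimate = factor(estimate, levels = c("over_projected", "firm"))
  )

p_stacked <- ggplot(jobs, aes(x = date, y = jobs, fill = estimate)) +
  geom_col()
```

Reordering the factor levels is one way to control the stacking; passing position = position_stack(reverse = TRUE) to geom_col() would be another.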

Second, I really liked the use of color. For the preceding months the firm numbers are in gray while the projected extra numbers are white. But for July 2025 the number is in orange. This would mean that we need three values in the indicator column. We could use scale_fill_manual() to map the specific colors we want to those indicator values. Alternatively, instead of using indicator values, we could specify the fill color directly in the tibble and use scale_fill_identity(). To make it clear that the extra numbers for May and June are white, they use a gray border around all of the data preceding July 2025. Here again, I might add a column to my tibble for the border color to be gray for months before July 2025 and orange for July 2025. Then we could use scale_color_identity() to use those colors.
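A rough sketch of the identity-scale approach might look like this. Again, the values are made-up placeholders rather than BLS data, and the column names fill_color and border_color and the specific color names are my own assumptions, not what the NY Times used.

```r
library(tidyverse)

# Placeholder values in thousands of jobs; NOT the real BLS numbers
jobs <- tribble(
  ~date,        ~estimate,        ~jobs,
  "2025-04-01", "firm",           150,
  "2025-05-01", "firm",            20,
  "2025-05-01", "over_projected", 125,
  "2025-06-01", "firm",            15,
  "2025-06-01", "over_projected", 130,
  "2025-07-01", "firm",            75
) %>%
  mutate(
    date = as.Date(date),
    estimate = factor(estimate, levels = c("over_projected", "firm")),
    latest = date == max(date),
    # Store the literal colors in the tibble so the identity scales can use them
    fill_color = case_when(
      latest ~ "orange",
      estimate == "over_projected" ~ "white",
      TRUE ~ "gray70"
    ),
    border_color = if_else(latest, "orange", "gray70")
  )

# group = estimate keeps the stacking order tied to the factor levels
# rather than to the color columns
p_colored <- ggplot(
  jobs,
  aes(x = date, y = jobs, group = estimate,
      fill = fill_color, color = border_color)
) +
  geom_col() +
  scale_fill_identity() +
  scale_color_identity()
```

The identity scales don't draw a legend by default, which suits a figure like this one where the annotations do the explaining.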

Third, there are a couple of instances of annotation in the plot. The first that catches my eye is the italicized “REVISED DOWN” over the bars for May and June 2025. I’d likely do this with geom_label(), making the background slightly transparent, removing the border, and making the text italicized and gray. The second that catches my eye is the black text that describes the numbers. Here again I’d use geom_label() but with regular black font. Another part of the annotation is the line and curve that connect these text elements to the data. I’d make the lines using geom_curve() to get the rounded appearance for the line going to May and June.
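Something like the following could be a starting point for the labels and the connector. Everything here is an assumption for illustration: the bar heights are placeholders, the wording of the black label is hypothetical, and the positions were picked by eye. I'm also assuming label.size = 0 to drop the border, which newer ggplot2 releases may flag as deprecated.

```r
library(tidyverse)

# Placeholder values in thousands of jobs; NOT the real BLS numbers
jobs <- tibble(
  date = as.Date(c("2025-04-01", "2025-05-01", "2025-06-01", "2025-07-01")),
  jobs = c(150, 145, 145, 75)
)

# Hypothetical label text and positions, chosen by eye for the placeholder data
notes <- tibble(
  x = as.Date(c("2025-05-16", "2025-03-01")),
  y = c(160, 210),
  label = c("REVISED DOWN", "Initial estimates for\nMay and June"),
  fontface = c("italic", "plain"),
  color = c("gray50", "black")
)

# Hypothetical start and end points for the curved connector
connector <- tibble(
  x = as.Date("2025-03-20"), y = 200,
  xend = as.Date("2025-05-01"), yend = 150
)

p_annotated <- ggplot(jobs, aes(x = date, y = jobs)) +
  geom_col(fill = "gray80") +
  geom_curve(
    data = connector,
    aes(x = x, y = y, xend = xend, yend = yend),
    curvature = -0.3, color = "black"
  ) +
  geom_label(
    data = notes,
    aes(x = x, y = y, label = label, fontface = fontface, color = color),
    fill = alpha("white", 0.7),  # slightly transparent background
    label.size = 0               # no border around the label
  ) +
  scale_color_identity()
```

Expect to fiddle with the coordinates and the curvature value; a straight connector could use geom_segment() instead.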

Finally, in classic New York Times fashion the y-axis labels are on top of the horizontal grid lines. It feels like it’s been a while since I’ve done one of these. I suspect I’d use geom_text() to place the text on the grid lines. I’d use the theme() function to remove the typical y-axis text, titles, and ticks. Although we could make use of scale_x_date() for the x-axis, I think it might be easier to treat the x-axis as categorical data and use scale_x_discrete() to put specific labels on every other month.
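Here's one way the axis treatment might be sketched out. As before, the values are placeholders, and names like grid_labels and every_other, the label format, and the nudging values are assumptions that would need tweaking against the real figure.

```r
library(tidyverse)

# Placeholder values in thousands of jobs; NOT the real BLS numbers
jobs <- tibble(
  month = c("2025-03", "2025-04", "2025-05", "2025-06", "2025-07"),
  jobs = c(120, 150, 145, 145, 75)
)

# One text label per horizontal grid line
grid_labels <- tibble(
  y = c(50, 100, 150),
  label = paste0("+", y, ",000")
)

# Keep the 1st, 3rd, 5th, ... month as the labeled x-axis breaks
every_other <- jobs$month[c(TRUE, FALSE)]

p_axis <- ggplot(jobs, aes(x = month, y = jobs)) +
  geom_col(fill = "gray80") +
  # x = 0.4 sits just left of the first bar on the discrete axis;
  # vjust lifts the text so it rests on top of the grid line
  geom_text(
    data = grid_labels,
    aes(x = 0.4, y = y, label = label),
    hjust = 0, vjust = -0.3, color = "gray40", size = 3
  ) +
  scale_x_discrete(breaks = every_other) +
  scale_y_continuous(breaks = grid_labels$y) +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.title = element_blank(),
    axis.ticks.y = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank()
  )
```

Setting the y-axis breaks from grid_labels keeps the grid lines and the text labels in sync, so changing one vector updates both.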

What do you think of this visualization? Did you notice anything that I’ve glossed over? How would you go about implementing that flourish?

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
