Plotting the US job creation numbers (and revisions) with ggplot2


Hey folks,

Are you interested in upping your data visualization skills? I’m rolling out a new program to help you improve the design of your data visualizations. This program will last 5 weeks starting at the beginning of September. Each session will be two hours long and include a discussion of data visualization principles followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in this program so you can focus more on concepts than implementation. I believe that once you understand the concepts, you can use any tool - even a pencil and piece of paper - to implement your design. Click this button to learn more.


In recent weeks there’s been a kerfuffle about data coming out of the government. Specifically, the US Bureau of Labor Statistics (BLS) revised its estimates of the number of jobs created in May and June 2025 downward by about 125,000 jobs per month. This led President Trump to fire the Commissioner of the BLS. Obviously, it’s critical to have trustworthy data from the government to understand the state of the economy, assess the effectiveness of programs, and track the overall health of society. Political firings of people implementing standard operating procedures (SOPs) are jarring.

Beyond the politics, the NY Times again had a visualization that caught my attention. There are a few things in this figure that I am curious to try out in R with {ggplot2} and the rest of the tidyverse. You can get the data from the first link on this page from the BLS.

First, this plot is a stacked bar chart. I really like the use of negative space for the number of jobs that were overprojected in May and June. I’d start by making sure the data are in a tidy format with a column for the date (the first of each month between July 2024 and July 2025), a column for the number of jobs, and a column to indicate whether the number of jobs is from the firm estimate or the overprojection. With this structure, we could use ggplot() with geom_col() and map the date to the x-axis, the number of jobs to the y-axis, and the indicator column to the fill aesthetic. We might need to play with factors or the position argument to get the ordering of the indicator column correct for May and June 2025.
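A minimal sketch of that tidy structure and the stacked geom_col() call might look like the following. The numbers are made-up placeholders rather than the BLS values, and the column names (date, type, jobs) and the level names "firm" and "overprojected" are assumptions chosen for illustration.

```r
library(tidyverse)

# Placeholder values (thousands of jobs), NOT the actual BLS figures
jobs <- tibble(
  date = rep(seq(as.Date("2025-03-01"), as.Date("2025-07-01"), by = "month"),
             each = 2),
  type = rep(c("firm", "overprojected"), times = 5),
  jobs = c(120, 0, 150, 0, 20, 125, 15, 130, 75, 0)
) %>%
  # position_stack() puts the first factor level on top of the stack,
  # so listing "overprojected" first places it above the firm estimate
  mutate(type = factor(type, levels = c("overprojected", "firm")))

# One bar per month, split by the indicator column
p_stack <- ggplot(jobs, aes(x = date, y = jobs, fill = type)) +
  geom_col()

p_stack
```

If the overprojected piece ends up on the bottom instead, reordering the levels in factor() is probably the simplest fix.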

Second, I really liked the use of color. For the preceding months and years the firm numbers are in gray while the projected extra numbers are white. But for July 2025 the number is in orange. This would mean that we need three values in the indicator column. We could use scale_fill_manual() to map the specific colors we want to those indicator values. Alternatively, instead of using indicator values, we could specify the fill color directly in the tibble and use scale_fill_identity(). To make it clear that the extra numbers for May and June are white, they use a gray border around all of the data preceding July 2025. Here again, I might add a column to my tibble for the border color to be gray for months before July 2025 and orange for July 2025. Then we could use scale_color_identity() to use those colors.
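Here is a rough sketch of the identity-scale approach, with the fill and border colors stored directly in the tibble. The values are placeholders rather than the BLS numbers, and the level name "latest", the column names fill and border, and the specific color names are assumptions for illustration, not the colors the NY Times used.

```r
library(tidyverse)

# Placeholder values (thousands of jobs), NOT the actual BLS figures
jobs <- tibble(
  date = as.Date(c("2025-04-01", "2025-05-01", "2025-05-01",
                   "2025-06-01", "2025-06-01", "2025-07-01")),
  type = c("firm", "firm", "overprojected",
           "firm", "overprojected", "latest"),
  jobs = c(150, 20, 125, 15, 130, 75)
) %>%
  mutate(
    type = factor(type, levels = c("overprojected", "firm", "latest")),
    # store the colors themselves so the identity scales can use them as-is
    fill = case_when(type == "firm" ~ "gray80",
                     type == "overprojected" ~ "white",
                     type == "latest" ~ "orange"),
    border = if_else(type == "latest", "orange", "gray60")
  )

# group = type keeps the stacking order tied to the factor levels
p_color <- ggplot(jobs, aes(x = date, y = jobs, fill = fill,
                            color = border, group = type)) +
  geom_col() +
  scale_fill_identity() +
  scale_color_identity()

p_color
```

The scale_fill_manual() route would map fill = type instead and pass a named vector of colors to the values argument.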

Third, there are a couple of instances of annotation in the plot. The first that catches my eye is the italicized “REVISED DOWN” over the bars for May and June 2025. I’d likely do this with geom_label(), making the background slightly transparent, removing the border, and making the text italicized and gray. The second that catches my eye is the black text that describes the numbers. Here again I’d use geom_label() but with regular black font. Another part of the annotation is the line and curve that connect these text elements to the data. I’d make the lines using geom_curve() to get the rounded appearance for the line going to May and June.
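The annotation layers could be sketched roughly as below. The bar heights, the label positions, and the wording of the black note are all hypothetical placeholders to be tuned by eye against the real figure. One more assumption: label.size = 0 is the way older ggplot2 releases drop the label border, and newer releases may warn and steer you toward a different argument.

```r
library(tidyverse)

# Placeholder values (thousands of jobs), NOT the actual BLS figures
jobs <- tibble(
  date = as.Date(c("2025-04-01", "2025-05-01", "2025-06-01", "2025-07-01")),
  jobs = c(150, 145, 145, 75)
)

# Hypothetical label text and positions
revised <- tibble(date = as.Date("2025-05-16"), jobs = 165,
                  label = "REVISED DOWN")
note <- tibble(date = as.Date("2025-02-15"), jobs = 200,
               label = "Placeholder description\nof the revised months")
connector <- tibble(x = as.Date("2025-03-20"), y = 200,
                    xend = as.Date("2025-05-10"), yend = 172)

p_annotated <- ggplot(jobs, aes(x = date, y = jobs)) +
  geom_col(fill = "gray80") +
  # italic gray text on a slightly transparent background without a border
  geom_label(data = revised, aes(label = label), fontface = "italic",
             color = "gray50", fill = "white", alpha = 0.8, label.size = 0) +
  # regular black text for the descriptive note
  geom_label(data = note, aes(label = label), color = "black",
             fill = "white", label.size = 0, hjust = 0) +
  # curved connector from the note toward the revised bars;
  # flip the sign of curvature to bend it the other way
  geom_curve(data = connector,
             aes(x = x, y = y, xend = xend, yend = yend),
             curvature = -0.3, inherit.aes = FALSE)

p_annotated
```

A straight connector could use geom_segment() with the same x, y, xend, and yend aesthetics.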

Finally, in classic New York Times fashion the y-axis labels are on top of the horizontal grid lines. It feels like it’s been a while since I’ve done one of these. I suspect I’d use geom_text() to place the text on the grid lines. I’d use the theme() function to remove the typical y-axis text, titles, and ticks. Although we could make use of scale_x_date() for the x-axis, I think it might be easier to treat the x-axis as categorical data and use scale_x_discrete() to put specific labels on every other month.
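One way this could be sketched is shown below, treating the months as a factor and drawing the y-axis labels with geom_text(). The job numbers are placeholders rather than the BLS values, and the grid line positions, label format, nudges, and month column are assumptions made for illustration.

```r
library(tidyverse)

# Placeholder values (thousands of jobs), NOT the actual BLS figures
jobs <- tibble(
  date = seq(as.Date("2024-07-01"), as.Date("2025-07-01"), by = "month"),
  jobs = c(90, 80, 240, 45, 260, 320, 110, 100, 120, 160, 20, 15, 75)
) %>%
  # "YYYY-MM" strings sort chronologically, so the factor levels stay in order
  mutate(month = factor(format(date, "%Y-%m")))

# Show a label on every other month, starting with the first
every_other <- levels(jobs$month)[c(TRUE, FALSE)]

# Hypothetical grid line positions, labeled at the left edge of the plot
grid_labels <- tibble(
  month = factor(levels(jobs$month)[1], levels = levels(jobs$month)),
  y = c(100, 200, 300)
) %>%
  mutate(label = paste0(y, ",000"))

p_axis <- ggplot(jobs, aes(x = month, y = jobs)) +
  geom_col(fill = "gray80") +
  # text sits just above each horizontal grid line
  geom_text(data = grid_labels, aes(y = y, label = label),
            hjust = 0, vjust = -0.4, nudge_x = -0.45,
            color = "gray40", size = 3) +
  scale_x_discrete(
    breaks = every_other,
    labels = function(x) month.abb[as.integer(substr(x, 6, 7))]
  ) +
  scale_y_continuous(breaks = c(100, 200, 300)) +
  theme_minimal() +
  theme(
    axis.text.y = element_blank(),
    axis.title = element_blank(),
    axis.ticks.y = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank()
  )

p_axis
```

Setting the breaks in scale_y_continuous() to the same values used in grid_labels keeps the text and the grid lines aligned.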

What do you think of this visualization? Did you notice anything that I’ve glossed over? How would you go about implementing that flourish?

Workshops

I’m pleased to be able to offer you one of three recent workshops! With each you’ll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
