Thinking through how to recreate a "newsy" bar plot


Hey folks,

I’ve submitted (and re-submitted and re-re-submitted!) the {phylotypr} R package that I’ve been developing on the YouTube channel. I’m optimistic that {phylotypr} should be on CRAN in the next few days. With that, I’ll be moving on to a new strategy with my videos. My plan is to take the narrative that I present in these newsletters and go through the process of reproducing the figures I discuss. I recorded the first one yesterday and I think you’ll really like this new series.

Can you do something for me? If you’re like most of my colleagues, you probably have about 20 tabs open in your browser. I’d love it if you were to send me a copy of a figure in one of those tabs along with a link to the page. I really want to present figures that are relevant to your interests and work.


This week, I have a figure that was published within an article on the local economy in Bridge Michigan, a non-partisan, non-profit news outlet where I live.

As with last week, I want to encourage you to ask some questions about any plot you find to help you develop your taste and think through how you would recreate elements of a plot. What type of plot is this? Aside from the data story, what is interesting about this figure? What do you like about it? What don’t you like about it? Can you outline the steps you would take to generate the figure? What are some of the steps you aren’t sure about and would like to learn?

In case it helps, here’s some code to give you a data frame that you could use to play with some of our ideas.


library(tidyverse)

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023), # the original figure skips 2020
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
)

This is a bar plot, somewhat similar to what I showed you last week. Here are five things that caught my eye (in order of difficulty-ish). First, the image has a main title, a subtitle, and text at the bottom of the figure indicating the source. Second, the median earnings for each year are embedded within the bars. Third, the numbers on the y-axis are horizontal and sit on their grid line. Fourth, the x-axis has lines separating each year and the year 2020 is missing (sneaky!). Fifth, they have a legend directly above the bar plot, but rather than a square the symbol is a circle.

First, let’s rough in a plot with the various titles. To review from last week, we can make a bar plot like this one by using geom_col(). Naturally, we would want to map columns from mi_wages to our aesthetics. We’d use aes() within ggplot() to map year to x and earnings to y. We’ll probably have to change this later, but I’d likely use fill = "darkred" as an argument in geom_col() to make the bars dark red. To get the titles I’d use labs() with the title, subtitle, and caption arguments. By default, the caption is placed below the plot and is generally right justified. Once we get the text in place, my thoughts would turn to using theme() to get the fonts, sizes, and faces correct. Because the title and subtitle span multiple rows, I’d likely use element_textbox_simple() from {ggtext} in place of element_text() to get the wrapping correct. We might need to play with vjust, hjust, and margin in these functions to get the placement just right.
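Here’s a minimal sketch of that first step. The title, subtitle, and caption strings are placeholders (I’m not reproducing the original figure’s wording), and the sizes, margins, and left-justified caption are guesses you would tune against the original.

```r
library(ggplot2)
library(tibble)
library(ggtext) # provides element_textbox_simple()

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023),
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
)

p_titles <- ggplot(mi_wages, aes(x = year, y = earnings)) +
  geom_col(fill = "darkred") +
  labs(
    # Placeholder text: swap in the wording from the original figure
    title = "Placeholder title that is long enough to wrap onto a second line",
    subtitle = "Placeholder subtitle that also runs long enough to need wrapping",
    caption = "Source: placeholder"
  ) +
  theme(
    plot.title.position = "plot",
    # element_textbox_simple() wraps long text to the available width
    plot.title = element_textbox_simple(
      face = "bold", size = 18, margin = margin(b = 6)
    ),
    plot.subtitle = element_textbox_simple(
      size = 11, margin = margin(b = 12)
    ),
    # Assumed placement: hjust = 0 left-justifies the caption
    plot.caption = element_text(hjust = 0)
  )
```

Printing p_titles at a narrow width is a quick way to check that the wrapping behaves.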

A quick aside: One little hint that I usually forget is to go ahead and use ggsave to output the figure as a PNG file with predefined dimensions that match the original figure. It can get mighty frustrating to mess with theme arguments only to have everything move around when you output the file. Use the outputted file, not the plot in the lower right corner. Trust me.
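As a sketch of that habit, something like this writes the plot to a PNG with fixed dimensions. The file name, width, height, and dpi are all placeholders; you would measure the original figure and plug in its proportions.

```r
library(ggplot2)
library(tibble)

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023),
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
)

p <- ggplot(mi_wages, aes(x = year, y = earnings)) +
  geom_col(fill = "darkred")

# Hypothetical file name and dimensions: match these to the original figure
out_file <- "mi_wages_bar_plot.png"
ggsave(out_file, plot = p, width = 6, height = 5, units = "in", dpi = 300)
```

Keep the PNG open in an image viewer and re-run ggsave() after each theme tweak so you always judge the real output.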

Second, the median earnings for each year are embedded within the bars. This is a pretty cool alternative to placing the numbers in a small font above the bars, which is what I typically see. The values all have a dollar sign and a comma as the thousands separator. I generally make a column in my data frame called “pretty” for situations like this where I store the stylized value. You can use format() with paste() or glue::glue() to pull this off. Alternatively, you might want to learn how to use scales::label_currency(), which does the hard work for you. I’ll use the x-axis position of the bar, but I’ll overwrite the y aesthetic by setting y=0 or something close to zero as an argument in geom_text(). I’ll also need to include label = pretty as an argument within aes(). At this point the pretty value will be horizontal and probably pretty small. I’m pretty sure I’ll also need to adjust angle, size, hjust, and color within geom_text() to get the value to look right.
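A rough sketch of the embedded labels, using scales::label_currency() to build the pretty column. The y offset, text size, and white text color are my guesses rather than values measured from the original figure.

```r
library(ggplot2)
library(tibble)
library(dplyr)
library(scales)

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023),
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
) %>%
  # label_currency() returns a function; accuracy = 1 rounds to whole dollars
  mutate(pretty = label_currency(accuracy = 1)(earnings))

p_labels <- ggplot(mi_wages, aes(x = year, y = earnings)) +
  geom_col(fill = "darkred") +
  geom_text(
    aes(label = pretty),
    y = 1500,       # fixed y near zero overrides the mapped earnings value
    angle = 90,     # rotate the text to run up the bar
    hjust = 0,      # start the text at y and extend upward
    color = "white",
    size = 5
  )
```

Because y is set outside of aes(), every label starts at the same height regardless of the bar’s value.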

Third, I like the look of having the numbers on the y-axis be horizontal and sitting on their grid line. How would we pull this off? One thought was to use the axis.text.y argument to rotate the labels and then use values outside of zero and one for hjust and vjust to move the text to be over the tick since it will naturally be at the end of the tick. But this won’t work because we need the numbers to be left justified. As an alternative, I’ll use annotate() to create my own axis text values (surprise!). This means turning off axis.text.y and using negative values of vjust to get the values to be above the tick. We’ll also need to make those ticks longer. To plot data outside of the plotting area we’ll need to use coord_cartesian() to set clip = "off", turn off expand, and define my x and y-axis limits (bonus: why won’t doing this in scale_x/y_continuous() work?). Finally, we can go ahead and use theme_classic() and remove the y-axis line and add those horizontal grid lines.
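Here is one way the annotate() approach might look. The break values, axis limits, tick length, and the x position of the labels are all assumptions I picked to get something plausible; expect to nudge each of them once you compare against the saved PNG.

```r
library(ggplot2)
library(tibble)
library(scales)

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023),
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
)

# Assumed break positions for the homemade y-axis labels
y_breaks <- seq(0, 60000, by = 20000)

p_axis <- ggplot(mi_wages, aes(x = year, y = earnings)) +
  geom_col(fill = "darkred") +
  # Draw our own axis text to the left of the panel, sitting above each tick
  annotate(
    "text",
    x = 2016, y = y_breaks,
    label = label_currency(accuracy = 1)(y_breaks),
    hjust = 0, vjust = -0.4, size = 3.5
  ) +
  scale_y_continuous(breaks = y_breaks) +
  # clip = "off" lets the annotation draw outside of the plotting area
  coord_cartesian(
    xlim = c(2016.5, 2023.5), ylim = c(0, 62000),
    expand = FALSE, clip = "off"
  ) +
  theme_classic() +
  theme(
    axis.text.y = element_blank(),
    axis.line.y = element_blank(),
    axis.title = element_blank(),
    axis.ticks.y = element_line(color = "gray80"),
    axis.ticks.length.y = unit(30, "pt"), # longer ticks to sit under the text
    panel.grid.major.y = element_line(color = "gray80")
  )
```

The x value of 2016 is in data units, so how far the labels sit from the panel will change with the width you give ggsave().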

Fourth, you’ve probably noticed that the year 2020 is missing (sneaky!) and our tick marks are aligned with the numbers rather than sitting between the years. To close the gap, I’ll add a column to my table, index, that has the values of one through six. Then I’ll map index to x rather than year. Now I’ll use scale_x_continuous() to connect the years to the indices. Gap closed! How about those tick marks? First we should turn off the major tick marks in theme(). In scale_x_continuous() there is a minor_breaks argument that I’ll set as going from 0.5 to 6.5 by steps of 1.0. This could give us minor grid lines, which we want, but it won’t directly give us the tick marks. To get the minor tick marks, we’ll need to add something clever to scale_x_continuous(). We’ll use the guide argument like this: guide = guide_axis(minor.ticks = TRUE). The guide_axis() function allows us to add special stylings to our scales. I don’t know why minor.ticks isn’t an argument of scale_x_continuous(). Oh well. You’ll want to modify axis.minor.ticks.x.bottom and axis.minor.ticks.length.x to get the look you like. We’ll also have to modify the vjust argument of axis.text.x to push the values up to the axis and between the minor tick marks. I’ll let you see if you can use annotate() to add the solid line below the numbers.
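A sketch of the index trick and the minor tick marks. This assumes ggplot2 3.5.0 or later, which is where guide_axis(minor.ticks = TRUE) and the axis.minor.ticks.* theme elements come from. The tick length and the vjust value are placeholders to tune.

```r
library(ggplot2)
library(tibble)
library(dplyr)

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023),
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
) %>%
  # index runs 1 through 6, so the missing 2020 no longer leaves a gap
  mutate(index = seq_along(year))

p_ticks <- ggplot(mi_wages, aes(x = index, y = earnings)) +
  geom_col(fill = "darkred") +
  scale_x_continuous(
    breaks = mi_wages$index,              # one break per bar
    labels = mi_wages$year,               # show the year instead of the index
    minor_breaks = seq(0.5, 6.5, by = 1), # boundaries between the bars
    guide = guide_axis(minor.ticks = TRUE)
  ) +
  coord_cartesian(xlim = c(0.5, 6.5), expand = FALSE) +
  theme_classic() +
  theme(
    axis.ticks.x = element_blank(), # turn off the major ticks
    axis.minor.ticks.x.bottom = element_line(color = "black", linewidth = 0.4),
    axis.minor.ticks.length.x = unit(15, "pt"),
    axis.text.x = element_text(vjust = 5) # placeholder: nudges the years up
  )
```

If the years don’t land between the ticks in your saved PNG, adjust vjust and the tick length together.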

Finally, this figure has a legend. Why?! There’s one category. There’s no need for a legend! Regardless, there’s an opportunity to learn here. What I find interesting about the legend is that the symbol is a circle. Normally we get squares when we make a bar plot. There are at least three ways to do this. First, I could probably use annotate() to draw in the legend. That seems lame. Second, I would turn off the legend for geom_col() and use geom_point() to plot a round circle behind the bars. Then I’d use the legend.position argument in theme() to put it at the top. Elegant? Third, I could use a variation of the guide = guide_legend() trick I used above with the tick marks. To do this we’ll have to trick the system into thinking that the fill color of our bars is being mapped from a variable. Otherwise we won’t get a legend. We’ll have to do two more things. First, we’ll need to give geom_col() the key_glyph = "point" argument, which will change the typical square for a bar plot to a plotting symbol like you’d use with geom_point(). Now, to get that symbol styled correctly, we’ll need that guide argument in scale_fill_manual(). It’s a lot to explain and honestly, I always have to use Google to find it. Here you go.
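A minimal sketch of that third approach. The legend label "Median earnings" is a placeholder of mine, and the override.aes values (shape 21, size 4) are one way I’d style the symbol, not necessarily what the original figure used.

```r
library(ggplot2)
library(tibble)

mi_wages <- tibble(
  year = c(2017:2019, 2021:2023),
  earnings = c(46709, 47918, 48721, 52459, 55432, 57354)
)

# Mapping fill to a constant string inside aes() tricks ggplot2 into
# treating it as a variable, which is what generates the legend
p_legend <- ggplot(
  mi_wages,
  aes(x = year, y = earnings, fill = "Median earnings")
) +
  geom_col(key_glyph = "point") + # draw the legend key as a point
  scale_fill_manual(
    name = NULL,
    values = c("Median earnings" = "darkred"),
    # shape 21 is a circle that honors fill; size makes it visible
    guide = guide_legend(
      override.aes = list(shape = 21, size = 4, color = "darkred")
    )
  ) +
  theme(
    legend.position = "top",
    legend.justification = "left"
  )
```

Without the override, the point key inherits the bar’s missing outline color and can come out invisible, which is why the shape and color are set explicitly here.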

Phew. That’s a lot for what initially seemed like a pretty simple bar plot. There are a few other subtle things going on here that I’ll let you think about. First, I know there are ways of bringing in graphics like the watermark in the bottom left corner. I’m pretty sure we could use annotate() to put the watermark and the name of the data journalist who made this figure at the bottom of the figure. The only other difference I see between our figures is the fonts we’re using. I’ve messed with fonts in previous Code Club episodes, but I’ll let you explore importing Google’s fonts into R by following this tutorial. I think the title is “Abril Fatface” and the other fonts are “Open Sans”.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
