Making a basic line plot appear more sophisticated


Hey folks,

I’ve now produced three livestream videos. What do you think? Do you watch them live or watch them later? Or are they too long? I’m looking for honest feedback! I have to admit that if I hadn’t livestreamed these videos, they would not have been produced. It’s nice that I can more or less record and post without any editing. This is still a bit of an experiment. I think fewer people are watching the episodes, which makes me worry that this might be an overall step backwards for you all. I want what I do to have maximum benefit, so please don’t hesitate to respond to this email and let me know what you think.


Yesterday morning, I received a newsletter from Philip Bump, who writes a column for The Washington Post. He has a couple of newsletters, but this one is an “add-on” to his columns where he shares more of the data behind them. Although the plot is not overly complicated, I thought it would be a fun “basic” plot for beginners, with enough ornamentation for more advanced R users.

This plot was an add-on to his column on a generational rift in the Democratic Party in the aftermath of the New York City mayoral primary election. In this plot he uses March 2025 data from Gallup to compare how the two parties differ in their support for Israelis versus Palestinians. So, how would I go about making this plot?

We need the data. If you go to the Gallup article, the second plot has three tabs: one each for Democrats, Republicans, and Independents. The plots show the percent, by party, who support Israelis or Palestinians. In the lower left corner of the plot is a link to “Get the data”, which downloads a CSV-formatted file for the data in each plot. We’ll need to get both the Democrat and Republican datasets. Also, we’ll need to go back to the first plot and get the data for “All Americans”. We can read the three files into a single tibble using read_csv() and add a column indicating what group the data represent. We’ll then use mutate() to subtract the level of support for Palestinians from that for Israelis. Now we should be ready to plot the data.
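Here is a minimal sketch of that data step. The file names, the column names (year, israelis, palestinians), and the numbers written to the placeholder files are all my assumptions so the code runs on its own; they are not Gallup's actual export, so swap in the real downloaded files and column names.

```r
library(tidyverse)

# Placeholder CSV files so the sketch runs end to end.
# The values are invented and the column names are assumptions.
tmp <- tempdir()
files <- file.path(tmp, c("all.csv", "democrats.csv", "republicans.csv"))
names(files) <- c("All Americans", "Democrats", "Republicans")

for (f in files) {
  write_csv(
    tibble(year = c(2024, 2025), israelis = c(50, 45), palestinians = c(30, 35)),
    f
  )
}

# Read each file, stack them into one tibble, and record the group
# each row came from using the names of the `files` vector
support <- files |>
  map(\(f) read_csv(f, show_col_types = FALSE)) |>
  list_rbind(names_to = "group") |>
  mutate(diff = israelis - palestinians)
```

Positive values of diff mean more support for Israelis; negative values mean more support for Palestinians.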

Again, at the fundamental level, this is a line plot with three groups. We can do this in {ggplot2} by mapping the year to the x-axis, the difference to the y-axis, and the group to the color. We’ll use geom_line() to generate the lines and scale_color_manual() to customize the color of the lines. We can use labs() to add the title, subtitle, and caption. Of course, we’ll need to use theme() to left justify those three elements and to bold the title.
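A rough sketch of that base plot might look like the following. The data frame is invented placeholder data standing in for the combined tibble, and the colors and the title, subtitle, and caption text are my guesses rather than what appears in the original figure.

```r
library(ggplot2)

# Placeholder data standing in for the combined Gallup tibble (values invented)
support <- data.frame(
  year = rep(c(2015, 2020, 2025), times = 3),
  group = rep(c("All Americans", "Democrats", "Republicans"), each = 3),
  diff = c(40, 35, 15, 30, 20, -20, 60, 65, 60)
)

p <- ggplot(support, aes(x = year, y = diff, color = group)) +
  geom_line(linewidth = 1) +
  # Colors are assumptions; pick whatever matches the original
  scale_color_manual(
    values = c("All Americans" = "gray40",
               "Democrats" = "dodgerblue",
               "Republicans" = "red")
  ) +
  labs(
    title = "Placeholder title",
    subtitle = "Placeholder subtitle",
    caption = "Placeholder caption"
  ) +
  theme(
    # Align title, subtitle, and caption with the left edge of the whole plot
    plot.title.position = "plot",
    plot.caption.position = "plot",
    plot.title = element_text(face = "bold", hjust = 0),
    plot.subtitle = element_text(hjust = 0),
    plot.caption = element_text(hjust = 0)
  )
```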

Now for the ornamentation.

First, the axes will need some help. The original has no axis titles or ticks, so we can remove those with theme(). We will need breaks every 5 years on the x-axis and every 40 percentage points on the y-axis. This can be done using scale_x_continuous() and scale_y_continuous(). In labelling the y-axis, he put -40 at the bottom. I probably would have used +40 on the bottom (and top of the plot) since the text around the zero line indicates that the top half represents more support for Israelis and the bottom half more support for Palestinians. Whatever. Following his lead will give us an opportunity to look up the Unicode for the +/- character next to the zero.
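As a sketch of the axis work, something like this should get close. The data are invented, and the exact break positions and limits are my assumptions. The plus-minus sign is Unicode code point U+00B1, which R lets us write as "\u00b1".

```r
library(ggplot2)

# Invented values, only here so the plot can be built
support <- data.frame(year = c(2001, 2013, 2025), diff = c(35, 50, 10))

p <- ggplot(support, aes(x = year, y = diff)) +
  geom_line() +
  # A break every 5 years (start and end years are assumptions)
  scale_x_continuous(breaks = seq(2005, 2025, by = 5)) +
  # A break every 40 percentage points, with a plus-minus sign next to the zero
  scale_y_continuous(
    breaks = c(-40, 0, 40, 80),
    labels = c("-40", "\u00b10", "+40", "+80"),
    limits = c(-40, 80)
  ) +
  # Drop the axis titles and tick marks
  theme(
    axis.title = element_blank(),
    axis.ticks = element_blank()
  )
```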

Second, the gridline choices are “interesting”. The y-axis gridlines look fairly standard. However, we’ll have to add a thicker black line at zero. For the x-axis gridlines he has one at 2016 and one at October 7, 2023. We’ll have to make those x-axis gridlines and the zero line using geom_vline() and geom_hline(), respectively. These gridlines are annotated. I think the x-axis gridlines are going to be straightforward to implement using annotate(geom = "text"). The zero line will require some markdown since “Israelis” and “Palestinians” are bolded. We can do that with geom_richtext() (I recall from a recent episode that we can’t do annotate(geom = "richtext")).
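Here is one way the reference lines and their annotations could be sketched. geom_richtext() comes from the {ggtext} package. The data, the label wording, the label positions, and the conversion of October 7, 2023 to a decimal year (assuming year is stored as a number) are all my assumptions.

```r
library(ggplot2)
library(ggtext)

# Invented values, only here so the plot can be built
support <- data.frame(year = c(2001, 2013, 2025), diff = c(35, 50, 10))

# October 7 is the 280th day of 2023, so 279 days have elapsed (assumed encoding)
oct_7 <- 2023 + 279 / 365

# One-row data frame with a markdown label for the zero line (wording is a placeholder)
zero_label <- data.frame(
  year = 2001,
  diff = 0,
  label = "More support for **Israelis** above, **Palestinians** below"
)

p <- ggplot(support, aes(x = year, y = diff)) +
  geom_hline(yintercept = 0, color = "black", linewidth = 0.8) +
  geom_vline(xintercept = c(2016, oct_7), color = "gray60") +
  geom_line() +
  # Plain text labels for the two vertical lines
  annotate(geom = "text", x = 2016, y = 80, label = "2016",
           hjust = 1.1, size = 3) +
  annotate(geom = "text", x = oct_7, y = 80, label = "Oct. 7, 2023",
           hjust = 1.1, size = 3) +
  # Markdown label, drawn without the default box fill and border
  geom_richtext(
    data = zero_label, aes(label = label),
    hjust = 0, vjust = 0, size = 3,
    fill = NA, label.color = NA
  )
```

Because annotate() takes fixed values rather than a data frame, it suits the one-off labels on the vertical lines, while geom_richtext() gets its own small data frame.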

Finally, there is text in the right-hand margin indicating what each line represents. We can place the text using geom_text() and by filtering the overall dataset to position the label at the level of the most recent data. We’ll need to use coord_cartesian(clip = "off") to plot the text outside of the plotting panel. The last bit of ornamentation is a dashed line between the end of each line and the text label for the line.
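A sketch of the margin labels could look like this. The data are invented again, and using geom_segment() for the dashed connector, along with the offsets and margin size, is my guess at one way to do it rather than anything taken from the original plot.

```r
library(ggplot2)
library(dplyr)

# Placeholder data standing in for the combined Gallup tibble (values invented)
support <- data.frame(
  year = rep(c(2015, 2020, 2025), times = 3),
  group = rep(c("All Americans", "Democrats", "Republicans"), each = 3),
  diff = c(40, 35, 15, 30, 20, -20, 60, 65, 60)
)

# Keep only the most recent observation for each group
latest <- filter(support, year == max(year))

p <- ggplot(support, aes(x = year, y = diff, color = group)) +
  geom_line(show.legend = FALSE) +
  # Dashed connector from the end of each line toward its label
  geom_segment(
    data = latest,
    aes(xend = year + 1, yend = diff),
    linetype = "dashed", show.legend = FALSE
  ) +
  # Label placed just past the connector, left-aligned
  geom_text(
    data = latest,
    aes(x = year + 1.2, label = group),
    hjust = 0, show.legend = FALSE
  ) +
  # Fix the panel to the data range and allow drawing outside it
  coord_cartesian(xlim = c(2015, 2025), clip = "off") +
  # Widen the right margin so the labels have somewhere to go
  theme(plot.margin = margin(t = 5, r = 90, b = 5, l = 5))
```

Here geom_text() makes sense because the label positions come from the data, one row per group, instead of being typed in by hand.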

All in all, this should be a less intense plot than what I’ve been making lately. At the same time, we get to practice some fun stuff with text. I think it will also give an opportunity to compare how we use geom_text() and annotate(geom = "text"). That should help me think about why I would pick one over another.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
