Making a basic line plot appear more sophisticated


Hey folks,

I’ve now produced three livestream videos. What do you think? Do you watch them live or watch them later? Or are they too long? I’m looking for honest feedback! I have to admit that if I hadn’t livestreamed these videos, they would not have been produced. It’s nice that I can more or less record and post without any editing. This is still a bit of an experiment. I think fewer people are watching the episodes, which makes me worry that this might be an overall step backwards for you all. I want what I do to have maximum benefit, so please don’t hesitate to respond to this email and let me know what you think.


Yesterday morning, I received a newsletter from Philip Bump, who writes a column for The Washington Post. He has a couple of newsletters, but this one is an “add-on” to his columns where he shares more of the data behind them. Although the plot isn’t overly complicated, I thought it would be a fun “basic” plot for beginners with enough ornamentation to keep more advanced R users interested.

This plot was an add-on to his column on a generational rift in the Democratic Party in the aftermath of the New York City mayoral primary election. In this plot he uses March 2025 data from Gallup to compare how the two parties differ in their support for Israelis versus Palestinians. So, how would I go about making this plot?

First, we need the data. If you go to the Gallup article, the second plot has three tabs, one each for Democrats, Republicans, and Independents. The plots show the percent, by party, who support Israelis or Palestinians. In the lower left corner of the plot is a link to “Get the data”, which downloads a CSV-formatted file for the data in each plot. We’ll need to get both the Democrat and Republican datasets. Also, we’ll need to go back to the first plot and get the data for “All Americans”. We can read the three files into a single tibble using read_csv() and add a column indicating which group the data represent. We’ll then use mutate() to subtract the level of support for Palestinians from that for Israelis. Now we should be ready to plot the data.
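Here is a rough sketch of that data step. I don't want to guess at what the Gallup downloads look like, so the file names, the column names (year, israelis, palestinians), and every number below are placeholders that I made up. The sketch writes three toy CSV files to a temporary directory so that it runs on its own.

```r
library(tidyverse)

# Placeholder data standing in for the three Gallup downloads.
# File names, column names, and values are all made up for illustration.
toy <- tibble(year = c(2023, 2024, 2025),
              israelis = c(50, 48, 46),
              palestinians = c(30, 32, 34))

files <- file.path(tempdir(),
                   c("all_americans.csv", "democrats.csv", "republicans.csv"))
walk(files, \(f) write_csv(toy, f))

# Read the three files into one tibble; `id` records which file each row
# came from, which we turn into a group label
support <- read_csv(files, id = "file", show_col_types = FALSE) |>
  mutate(group = str_remove(basename(file), "\\.csv$"),
         difference = israelis - palestinians) |>
  select(group, year, israelis, palestinians, difference)

support
```

With the real files you would drop the toy-data lines and point `files` at wherever you saved the downloads. You will likely need to rename columns to match what Gallup provides.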

Again, at the fundamental level, this is a line plot with three groups. We can do this in {ggplot2} by mapping the year to the x-axis, the difference to the y-axis, and the group to the color. We’ll use geom_line() to generate the lines and scale_color_manual() to customize the color of the lines. We can use labs() to add the title, subtitle, and caption. Of course, we’ll need to use theme() to left-justify those three elements and to bold the title.
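A minimal sketch of that basic plot might look something like this. The data are toy values that I invented (not Gallup's numbers), and the colors and the title, subtitle, and caption text are placeholders rather than what's in the original figure.

```r
library(tidyverse)

# Toy values for illustration only; these are NOT the Gallup data
support <- tibble(
  group = rep(c("All Americans", "Democrats", "Republicans"), each = 3),
  year = rep(c(2015, 2020, 2025), times = 3),
  difference = c(40, 35, 15, 30, 20, -20, 60, 65, 60)
)

p <- ggplot(support, aes(x = year, y = difference, color = group)) +
  geom_line(linewidth = 1, show.legend = FALSE) +
  # placeholder colors; pick whatever matches the original
  scale_color_manual(values = c("All Americans" = "gray40",
                                "Democrats" = "dodgerblue",
                                "Republicans" = "firebrick")) +
  labs(title = "Placeholder title",
       subtitle = "Placeholder subtitle",
       caption = "Placeholder caption") +
  theme(plot.title = element_text(face = "bold", hjust = 0),
        plot.subtitle = element_text(hjust = 0),
        plot.caption = element_text(hjust = 0),
        # align the text with the left edge of the whole plot, not the panel
        plot.title.position = "plot",
        plot.caption.position = "plot")

p
```

Setting plot.title.position and plot.caption.position to "plot" is my choice for getting the text flush with the left edge of the figure. Leaving them at the default aligns the text with the panel instead.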

Now for the ornamentation.

First, the axes will need some help. There are no axis titles or ticks. Those can be removed with theme(). We will need to place the axis breaks every 5 years on the x-axis and every 40 percentage points on the y-axis. This can be done using scale_x_continuous() and scale_y_continuous(). In labelling the y-axis, he put -40 at the bottom. I probably would have used +40 on the bottom (and top of the plot) since the text around the zero line indicates that the top half shows more support for Israelis and the bottom half more support for Palestinians. Whatever. Following his lead will give us an opportunity to look up the Unicode for the +/- character next to the zero.
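Here is one way the axis work could go. Again, the data are toy values, and the break ranges and the exact label format (a leading "+" on positive values) are my assumptions about the original. The label_diff() helper is something I made up for this sketch. The plus-minus sign is Unicode U+00B1, which we can write in R as "\u00b1".

```r
library(tidyverse)

# Toy values for illustration only; these are NOT the Gallup data
support <- tibble(
  group = rep(c("All Americans", "Democrats", "Republicans"), each = 3),
  year = rep(c(2005, 2015, 2025), times = 3),
  difference = c(40, 35, 15, 30, 20, -20, 60, 65, 60)
)

# Hypothetical helper: "+" in front of positive values, the plus-minus
# sign (U+00B1) in front of zero, negative values left as they are
label_diff <- function(x) {
  case_when(x == 0 ~ "\u00b10",
            x > 0 ~ paste0("+", x),
            TRUE ~ as.character(x))
}

p <- ggplot(support, aes(x = year, y = difference, color = group)) +
  geom_line(show.legend = FALSE) +
  scale_x_continuous(breaks = seq(2005, 2025, by = 5)) +
  scale_y_continuous(breaks = seq(-40, 80, by = 40),
                     labels = label_diff,
                     limits = c(-40, 80)) +
  theme(axis.title = element_blank(),
        axis.ticks = element_blank())

p
```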

Second, the gridline choices are “interesting”. The y-axis gridlines look fairly standard. However, we’ll have to add a thicker black line at zero. For the x-axis gridlines he has one at 2016 and one at October 7, 2023. We’ll have to make those x-axis gridlines and the zero line using geom_vline() and geom_hline(), respectively. These gridlines are annotated. I think the annotations on the x-axis gridlines are going to be straightforward to implement using annotate(geom = "text"). The annotation on the zero line will require some markdown since “Israelis” and “Palestinians” are bolded. We can do that with geom_richtext() (I recall from a recent episode that we can’t do annotate(geom = "richtext")).
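A sketch of the gridlines and their annotations is below. It assumes the year is stored as a number, so October 7, 2023 gets converted to a decimal year. The data are toy values, and the wording and positions of the labels are my placeholders, not the text from the original plot. geom_richtext() comes from the {ggtext} package, so that needs to be installed.

```r
library(tidyverse)
library(ggtext)

# Toy values for illustration only; these are NOT the Gallup data
support <- tibble(
  group = rep(c("All Americans", "Democrats", "Republicans"), each = 3),
  year = rep(c(2005, 2015, 2025), times = 3),
  difference = c(40, 35, 15, 30, 20, -20, 60, 65, 60)
)

# October 7, 2023 as a decimal year (about 2023.76)
oct7 <- lubridate::decimal_date(as.Date("2023-10-07"))

# Placeholder wording for the labels that sit on either side of zero
zero_labels <- tibble(
  year = 2005,
  difference = c(5, -5),
  label = c("More support for **Israelis**",
            "More support for **Palestinians**")
)

p <- ggplot(support, aes(x = year, y = difference, color = group)) +
  geom_hline(yintercept = 0, color = "black", linewidth = 0.8) +
  geom_vline(xintercept = c(2016, oct7), color = "gray60") +
  geom_line(show.legend = FALSE) +
  # plain text labels for the two vertical lines
  annotate(geom = "text", x = c(2016, oct7), y = 78,
           label = c("2016", "Oct. 7, 2023"),
           hjust = 1.1, size = 3) +
  # markdown labels need their own data frame with geom_richtext()
  geom_richtext(data = zero_labels,
                aes(x = year, y = difference, label = label),
                inherit.aes = FALSE, hjust = 0, size = 3,
                fill = NA, label.color = NA)

p
```

Setting fill and label.color to NA removes the box that geom_richtext() draws around each label by default.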

Finally, there is text in the right-hand margin indicating what each line represents. We can place the text using geom_text() and by filtering the overall dataset to position the label at the level of the most recent data. We’ll need to use coord_cartesian(clip = "off") to plot the text outside of the plotting panel. The last bit of ornamentation is a dashed line between the end of each line and the text label for the line.
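Here is roughly how I'd expect the margin labels to come together. The data are toy values, and the offsets (1 and 1.2 years) and the size of the right margin are guesses that would need to be tuned against the real figure. I'm using geom_segment() for the dashed connector, which is one option among several.

```r
library(tidyverse)

# Toy values for illustration only; these are NOT the Gallup data
support <- tibble(
  group = rep(c("All Americans", "Democrats", "Republicans"), each = 3),
  year = rep(c(2015, 2020, 2025), times = 3),
  difference = c(40, 35, 15, 30, 20, -20, 60, 65, 60)
)

# One row per group at the most recent year
ends <- support |>
  filter(year == max(year))

p <- ggplot(support, aes(x = year, y = difference, color = group)) +
  geom_line() +
  # dashed connector from the end of each line toward its label
  geom_segment(data = ends,
               aes(xend = year + 1, yend = difference),
               linetype = "dashed") +
  geom_text(data = ends,
            aes(x = year + 1.2, label = group),
            hjust = 0) +
  # fix the panel limits and allow drawing outside of the panel
  coord_cartesian(xlim = c(2015, 2025), clip = "off") +
  theme(legend.position = "none",
        plot.margin = margin(t = 10, r = 100, b = 10, l = 10))

p
```

Fixing xlim in coord_cartesian() matters here. Without it, the x-axis would stretch to include the labels and they would end up inside the panel rather than in the margin.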

All in all, this should be a less intense plot than what I’ve been making lately. At the same time, we get to practice some fun stuff with text. I think it will also give an opportunity to compare how we use geom_text() and annotate(geom = "text"). That should help me think about why I would pick one over another.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
