Check out this scatter plot with annotation in Swedish


Hey folks,

I hope you all are doing well! Before digging into today’s figure, I wanted to remind you that I’m offering a new interactive training opportunity. The goal is to provide greater opportunities to hone your skills in a social setting. My experience with leading this approach has been excellent. I can’t wait to have you give it a try with me. Please let me know if you have any questions.


This week, I’m grateful to Georg Andersson for sharing an interesting figure with me from Sweden. Thank you, Georg! If you have a plot you’d like to share, please let me know. Georg included the following commentary:

Here is a figure on what different values people get from agricultural landscape from a survey. The values are plotted in how frequent it was reported from the persons in the survey and distinguished between two groups, farmers and non-farmers. The colour of the dots are according to how big the difference were between the group.
Sorry for the values being in Swedish. :-)
Biologisk mångfald = Biodiversity
Öppna landskap = Open landscapes
Rekreation/Naturupplevelse = Recreation / nature experience
Kultur/Historiska värden = Cultural and historical values
Estetiska värden = Aesthetical values
Lugn och ro = Peace and quiet

Looking at a broad overview of the figure, I first tried to interpret the visual myself. It appears that both farmers and non-farmers value biodiversity highly and ecosystem services (Ekosystemtjänster) quite low. Farmers tend to value open landscapes more than non-farmers do, while non-farmers tend to value recreation and nature experience more than farmers do. Living in a rural area on a farm, my experience resonates with these data!

Georg mentioned that a survey was given to farmers and non-farmers. I can imagine there’s a data frame somewhere with a value column plus farmer and non_farmer columns, which both indicate the percent of respondents who rated the entry in the value column positively. Beyond plotting the data as a “simple” scatter plot, the points are colored by the frequency difference between farmers and non-farmers. I’d use mutate() to make a column, difference, that is the absolute value of the difference between the values in farmer and non_farmer. With these four columns, I think we’re in good shape to generate the figure.
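
A minimal sketch of that mutate() step, assuming a data frame shaped the way I imagined it above. The three rows and their percentages are made up for illustration; they are not Georg's data.

```r
library(tibble)
library(dplyr)

# Hypothetical survey summary: percent of each group reporting each value.
# These numbers are invented for illustration, not taken from the figure.
survey <- tibble(
  value = c("Biodiversity", "Open landscapes", "Peace and quiet"),
  farmer = c(35, 30, 8),
  non_farmer = c(38, 18, 12)
)

# Add the fourth column: absolute difference between the two groups
survey_diff <- survey %>%
  mutate(difference = abs(farmer - non_farmer))

survey_diff
```

Using abs() means the color only tells you how far apart the groups are, not which group reported the value more often. The position relative to the diagonal carries that information.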

As I mentioned, for all the extra ornamentation, this is a scatter plot. Whenever I see a scatter plot, my mind immediately thinks, geom_point()! Whenever I see text labels on points, I think geom_text(). To start a figure with the {ggplot2} package, we need to use the ggplot() function with the aes() function to indicate which column will be mapped to each aesthetic in the figure. Naturally, the non_farmer column will be mapped to the x aesthetic and farmer to the y aesthetic. The label for each point can be set by mapping the value column to the label aesthetic. We can set the color of both the point and the label by mapping the difference column to the color aesthetic. There’s another geom that we could use in this plot, geom_abline(), which would allow us to plot the diagonal line with a slope of 1 and an intercept of 0. To get the dashed appearance we can use the linetype argument, likely with a value of "dashed". The line appears to be gray, so we can use color = "gray" to mute it a bit. I really like these annotation lines to help the audience interpret paired data like we have in this study. All of this is somewhat standard {ggplot2}. But there is a lot of non-standard stuff going on in this figure!
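
The pieces described above can be sketched roughly like this. The data are the same kind of made-up placeholder percentages (not Georg's values), and the object name p is just a convenience.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Made-up percentages for illustration only; these are not Georg's data
survey <- tibble(
  value = c("Biodiversity", "Open landscapes", "Peace and quiet"),
  farmer = c(35, 30, 8),
  non_farmer = c(38, 18, 12)
) %>%
  mutate(difference = abs(farmer - non_farmer))

p <- ggplot(survey,
            aes(x = non_farmer, y = farmer,
                label = value, color = difference)) +
  # Dashed gray 1:1 line: points above it were reported more by farmers
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", color = "gray") +
  geom_point() +
  # Labels are centered on each point by default
  geom_text()

p
```

Putting geom_abline() first draws the diagonal underneath the points and labels rather than on top of them.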

Before moving on to the elements of the figure that could be driven by the theme() function, I want to highlight something that stands out to me about this figure. I notice that not all of the points have a label next to them. By default, geom_text() will center the label at its x and y coordinate position. There are nudge_x and nudge_y arguments in geom_text() that will “nudge” the position of the label relative to the coordinates. But this will still show all of the labels. Also, it will not keep labels from overlapping each other. This would especially be a problem in the bottom right corner of the figure, where we see a number of points that do not have text near them. Another challenge with these types of plots (and perhaps this is an example) is that it is difficult to know which point a label corresponds to when there are a lot of points. For example, I’m not 100% clear which point “Ekosystemtjänster” corresponds to.
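
As a quick illustration of the nudge arguments, here is a sketch with made-up data (not Georg's values). The nudge amount of 1.5 is arbitrary and is measured in the units of the y-axis.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Made-up percentages for illustration only; these are not Georg's data
survey <- tibble(
  value = c("Biodiversity", "Open landscapes", "Peace and quiet"),
  farmer = c(35, 30, 8),
  non_farmer = c(38, 18, 12)
) %>%
  mutate(difference = abs(farmer - non_farmer))

p_nudged <- ggplot(survey,
                   aes(x = non_farmer, y = farmer,
                       label = value, color = difference)) +
  geom_point() +
  # Shift every label up by the same amount so it sits above its point
  geom_text(nudge_y = 1.5)

p_nudged
```

Every label gets the same shift, which is why nudging alone cannot resolve labels that collide with each other.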

To overcome these challenges, the {ggrepel} package is a great tool. It has some great tools to move the labels so they don’t overlap, and it can add arrows to indicate which point each label corresponds to. Longtime followers may recall two videos I made using {ggrepel} regarding opinions by country on a vaccine for COVID-19. One video was for a scatter plot similar to this one and another was for a slope plot version of the same data. This package also has features to allow you to only show labels for specific points. Beyond paired survey data like we have in this example, I’ve seen {ggrepel} also used in scatter plots commonly called “volcano plots” where an investigator wants to highlight specific genes. This is a great package to be familiar with!
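
Here is one way this could look with geom_text_repel(), using the simulated data shared later in this newsletter. The cutoff of 5 for deciding which points get a label is an arbitrary assumption, as is the arrow size. Setting a label to an empty string hides it while still letting the other labels steer around that point.

```r
library(tidyverse)
library(ggrepel)

set.seed(19760620)

# Simulated survey data (same recipe as the data shared in this newsletter)
survey <- tibble(value = LETTERS[1:20],
  no_noise = c(seq(1, 30, length.out = 18), 40, 50),
  farmer = abs(no_noise + runif(20, min = -10, max = 10)),
  non_farmer = abs(no_noise + runif(20, min = -10, max = 10))) %>%
  select(-no_noise) %>%
  mutate(difference = abs(farmer - non_farmer),
         # Only label points where the groups differ by more than 5
         # (arbitrary cutoff); "" hides the label for the rest
         label = if_else(difference > 5, value, ""))

p_repel <- ggplot(survey,
                  aes(x = non_farmer, y = farmer,
                      label = label, color = difference)) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", color = "gray") +
  geom_point() +
  # Always draw a connecting segment with a small arrow to the point
  geom_text_repel(min.segment.length = 0,
                  max.overlaps = Inf,
                  arrow = arrow(length = unit(0.01, "npc")),
                  show.legend = FALSE)

p_repel
```

Label placement in {ggrepel} involves some randomness, so the exact positions can change between runs unless you also set its seed argument.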

Let’s move to thinking about some of the theming options that could get a default {ggplot2} figure to more closely resemble this figure’s appearance. First, I notice that the x-axis and legend labels are tilted. The bottom axis could be modified using the axis.text.x argument with the element_text() function and its angle argument. Similarly, the values for the legend could be tilted using the legend.text argument. Second, I also notice that the developer of this visual likes to use bold text! This can be set by using face = "bold" in the element_text() function when given to arguments like axis.text.x. For example, theme(axis.text.x = element_text(face = "bold", angle = 45)) will give you bolded and tilted x-axis labels. Third, there are no ticks on the x- or y-axes. Do you remember how to remove a feature like the x-axis ticks? What function would you give axis.ticks? If you said, “element_blank()” you’re correct! A fourth feature of this plot that stands out to me is the pale green background in the plotting window. We can modify the background of the plotting window with the panel.background argument and the background for the entire plot with the plot.background argument. Both arguments take the element_rect() function, which defines the appearance of rectangles. So, the color argument would modify the border color and the fill argument would modify the fill color, like the pale green we see in this figure. Finally, this plot has grid lines! The theme() function has a panel.grid argument as well as panel.grid.major and panel.grid.minor arguments. These all take the element_line() function as their value. We should experiment, but it’s likely we want to use panel.grid.major, adjusting the color of the line to suit our tastes. There are a few other things like the position and orientation of the legend that I’ll leave for you to figure out based on your own intuition ;)
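
Pulling those theme() arguments together might look something like the sketch below. The data are made up (not Georg's values), and the specific choices of "honeydew" for the pale green, "gray80" for the grid lines, and 45 degrees for the tilt are guesses to experiment with rather than values read off the original figure.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Made-up percentages for illustration only; these are not Georg's data
survey <- tibble(
  value = c("Biodiversity", "Open landscapes", "Peace and quiet"),
  farmer = c(35, 30, 8),
  non_farmer = c(38, 18, 12)
) %>%
  mutate(difference = abs(farmer - non_farmer))

styled <- ggplot(survey,
                 aes(x = non_farmer, y = farmer, color = difference)) +
  geom_point() +
  theme(
    # Bold, tilted x-axis labels; hjust = 1 right-aligns them on the tick
    axis.text.x = element_text(face = "bold", angle = 45, hjust = 1),
    axis.text.y = element_text(face = "bold"),
    # Tilt the legend values as well
    legend.text = element_text(angle = 45),
    # Remove tick marks from both axes
    axis.ticks = element_blank(),
    # Pale green fill in the plotting window, with no border
    panel.background = element_rect(fill = "honeydew", color = NA),
    # Keep major grid lines in a light gray and drop the minor ones
    panel.grid.major = element_line(color = "gray80"),
    panel.grid.minor = element_blank()
  )

styled
```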

For a relatively “simple” figure, this actually has a lot going on. If you’d like to play around with generating your own version, here’s some data to play around with:

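library(tidyverse)

# Fix the seed so the simulated values are reproducible
set.seed(19760620)

# Simulate 20 values: a shared trend plus independent noise for each group
survey <- tibble(value = LETTERS[1:20],
  no_noise = c(seq(1, 30, length.out = 18), 40, 50),
  farmer = abs(no_noise + runif(20, min = -10, max = 10)),
  non_farmer = abs(no_noise + runif(20, min = -10, max = 10))) %>%
  select(-no_noise)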

Finally, I’d love to see what types of figures interest you. Please be like Georg and send me some examples of things that catch your eye. Also, as I come to the end of the current YouTube channel series building an R package, let me know whether you’d like me to take this verbal analysis of figures and translate it to real R code that I develop in video form.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more

In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat
