Visualizing the glass ceiling for Women's History Month


Hey folks,

Did you know that March is Women’s History Month? Each year The Economist updates what they call the “Glass Ceiling Index”. This is a measure of “the role and influence of women in the workforce”. It’s an aggregate of ten factors including the gender gap in wages, workforce participation, and higher education.

Sadly, the article is behind a paywall. They also haven’t made their data publicly available. Regardless, you can get a static copy of the article through archive.is. Here’s the graphic that appears to be the most popular when you google for the index.

What stands out to you about this figure? To me, it’s interesting that the countries at the top tend to stay at the top and those at the bottom tend to stay at the bottom. The countries in the middle are a bit of a jumbled mess. Poland has taken a nose dive since 2016 while Britain has climbed. The U.S. has been pretty steady between 18th and 20th place. One critique is that this shows the relative trends and not the absolute ones. All the countries could be getting better on each factor, but we wouldn’t see it here. We’d only see whether a country is improving at the same, a better, or a worse rate than other countries.

Graphically, what stands out to you? What would interest you most to see done in R? Here are my first thoughts…

At first glance, this is a line plot with 30 lines. Line plots can be generated using geom_line(). But, this isn’t a vanilla line plot. The lines have a white border to them. I’d be tempted to use geom_line() twice - the first with color = "white" and a thick linewidth value and the second mapping the final rank to color in the aes() function and a narrow linewidth value. The problem with this approach is that we wouldn’t get the depth effect that we see in this plot.
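Here's a minimal sketch of that two-layer geom_line() idea. The toy data frame, its column names (country, year, rank, final_rank), and the linewidth values are all made up for illustration; they are not The Economist's data. Because every white line is drawn before any colored line, the borders can't overlap the colored lines of other countries, which is why the depth effect goes missing.

```r
library(ggplot2)

# Toy ranks for three made-up countries; NOT The Economist's data
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year = rep(c(2016, 2020, 2024), times = 3),
  rank = c(1, 2, 3,  2, 1, 1,  3, 3, 2)
)

# Attach each country's final (2024) rank so it can drive the line color
final <- toy[toy$year == 2024, c("country", "rank")]
names(final)[2] <- "final_rank"
toy <- merge(toy, final, by = "country")

p_lines <- ggplot(toy, aes(x = year, y = rank, group = country)) +
  geom_line(color = "white", linewidth = 6) +          # wide white "border" layer
  geom_line(aes(color = final_rank), linewidth = 4) +  # narrower colored layer on top
  scale_y_reverse()                                    # put rank 1 at the top

p_lines
```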

Alternatively, we could try using geom_polygon() like I did in some of the W.E.B. DuBois figures last month. This would allow us to set separate fill and border colors. The downside of that approach is that the jogs in the lines would have sharp rather than rounded corners. Thankfully, the {ggforce} package, which is allied with the {ggplot2} package, has a geom_shape() function. It is a stand-in for geom_polygon() that also lets you round the line jogs. I think this is the approach I’d use. I’ll need to create a function that generates the polygons for each country, but if I can do that with a spiral, surely I can do that with a line :)
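As a rough sketch of what that polygon-generating function could look like: line_to_polygon() below is a hypothetical helper of my own naming, and the toy data, half_width, and radius values are assumptions. It traces the upper edge of a band from left to right and then the lower edge from right to left, so each country becomes one closed shape that geom_shape() can fill, outline, and round.

```r
library(ggplot2)
library(ggforce)

# Hypothetical helper: turn one country's (year, rank) line into a closed band
line_to_polygon <- function(year, rank, half_width = 0.3) {
  ord <- order(year)
  year <- year[ord]
  rank <- rank[ord]
  data.frame(
    x = c(year, rev(year)),                           # out along one edge, back along the other
    y = c(rank - half_width, rev(rank) + half_width)
  )
}

# Toy ranks for three made-up countries; NOT The Economist's data
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year = rep(c(2016, 2020, 2024), times = 3),
  rank = c(1, 2, 3,  2, 1, 1,  3, 3, 2)
)

# Build one polygon per country and stack them into a single data frame
polys <- do.call(rbind, lapply(split(toy, toy$country), function(d) {
  data.frame(country = d$country[1], line_to_polygon(d$year, d$rank))
}))

p_shape <- ggplot(polys, aes(x = x, y = y, group = country, fill = country)) +
  geom_shape(color = "white", radius = unit(1.5, "mm")) +  # white border, rounded corners
  scale_y_reverse()

p_shape
```

Note that the band has a constant vertical thickness, so steep segments will look thinner than flat ones. A more careful version would offset the edges perpendicular to each segment.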

A second interesting component to the figure is that the lines/polygons are colored according to the ranking from 2024. Normally, we could pull this off with scale_color_gradient2(). But I don’t think this is a normal situation. To use scale_color_gradient2(), we need to set the middle color to be part of the gradient. But here the OECD would be in the middle and it’s a dark gray color whereas the surrounding countries are grayish blue/red. I can think of two ways to try to deal with this. First, I could set the ranking of “OECD average” to NA and then use the na.value argument to have a darker gray. Second, I could create a column in my data frame that manually sets the fill color using scales::pal_gradient_n() and then substitute in a dark gray color for the “OECD average” line.
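Here's a small sketch of the second option. The country names, ranks, and the three gradient colors are placeholders I picked for illustration, not values taken from the figure. Also note that pal_gradient_n() is the name used in {scales} 1.3.0 and later; older versions call it gradient_n_pal().

```r
library(scales)

# Toy 2024 ranks; the names and colors below are assumptions for illustration
final <- data.frame(
  country = c("A", "B", "OECD average", "C", "D"),
  final_rank = 1:5
)

# Palette function that maps values in [0, 1] onto a blue-gray-red gradient
rank_pal <- pal_gradient_n(c("blue", "grey80", "red"))

# Rescale the ranks to [0, 1], look up each color, then override the OECD row
final$fill <- rank_pal(rescale(final$final_rank))
final$fill[final$country == "OECD average"] <- "grey30"

final
```

With the colors stored in a column like this, mapping fill = fill in aes() and adding scale_fill_identity() would tell ggplot2 to use the values as is.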

A third element that catches my eye is the order of the lines. They appear to have been laid down on the “plotting canvas” in ranked order. We’ll need to make sure this happens with our recreation. This is the type of thing I’d do with factor(). We don’t typically think of an order for lines, but this is a case where it is relevant. Because the order matters, we can’t just lay down the “OECD average” line on top of everything else with its own color.
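A quick sketch of the factor() idea, using toy data again. ggplot2 draws groups in the order of the factor levels, so whichever country comes last in the levels ends up on top. Which end of the ranking sits on top in the original figure is an assumption here; reverse draw_order to flip it.

```r
# Toy ranks; NOT The Economist's data
toy <- data.frame(
  country = rep(c("A", "B", "OECD average"), each = 2),
  year = rep(c(2016, 2024), times = 3),
  rank = c(2, 1,  1, 2,  3, 3)
)

# Order the countries by their 2024 rank, worst first, so rank 1 is drawn last
final <- toy[toy$year == 2024, ]
draw_order <- final$country[order(final$rank, decreasing = TRUE)]

# Factor levels control the order that the groups are laid down on the canvas
toy$country <- factor(toy$country, levels = draw_order)

levels(toy$country)
```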

A fourth element that stands out to me is that the countries are ordered on the left side for 2016 and the right side for 2024. The left side is easy enough to do by setting the y-axis text in scale_y_continuous(). I think we could set the right side using a secondary y-axis. Alternatively, and most likely easier, we could turn off the y-axis text and use geom_text() to add both sets of labels to the plot. Because they’d be outside of the plotting window, we’d need to set the x-axis limits using coord_cartesian() and set expand = FALSE and clip = "off". Of course, because “OECD average” is bolded and everything else has a regular font, we’ll label it with markdown to make it bold and then use ggtext::geom_richtext() rather than geom_text().
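Here's a sketch of the labeling approach with toy data. The label column, the nudge values, and the plot margins are assumptions I chose to leave room for the text; they'd need tuning for the real figure.

```r
library(ggplot2)
library(ggtext)

# Toy ranks; NOT The Economist's data
toy <- data.frame(
  country = rep(c("A", "B", "OECD average"), each = 2),
  year = rep(c(2016, 2024), times = 3),
  rank = c(2, 1,  1, 2,  3, 3)
)

# Wrap the OECD label in markdown asterisks so geom_richtext() renders it bold
toy$label <- ifelse(toy$country == "OECD average",
                    paste0("**", toy$country, "**"),
                    toy$country)

left <- toy[toy$year == 2016, ]   # labels for the 2016 side
right <- toy[toy$year == 2024, ]  # labels for the 2024 side

p_labels <- ggplot(toy, aes(x = year, y = rank, group = country)) +
  geom_line() +
  geom_richtext(data = left, aes(label = label), hjust = 1, nudge_x = -0.2,
                fill = NA, label.color = NA) +  # no box behind the text
  geom_richtext(data = right, aes(label = label), hjust = 0, nudge_x = 0.2,
                fill = NA, label.color = NA) +
  scale_y_reverse() +
  # clip = "off" lets the labels draw outside of the panel
  coord_cartesian(xlim = c(2016, 2024), expand = FALSE, clip = "off") +
  theme(
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.y = element_blank(),
    plot.margin = margin(10, 90, 10, 90)  # extra room on the left and right
  )

p_labels
```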

Finally, the x-axis has the four-digit year for 2016 and the last two digits of each year for the even years that follow. That’s easy enough to do with scale_x_continuous() by setting breaks = seq(2016, 2024, 2) and using the labels argument to specify the customized four- or two-digit numbers. Of course, they also have vertical grid lines that fall behind the data. We can set those using theme() by modifying the panel.grid.major.x argument using element_line().
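A minimal sketch of the axis and grid line settings follows. The toy data and the grid color are assumptions; the breaks come straight from the description above.

```r
library(ggplot2)

# Breaks at the even years, labeled "2016" and then the last two digits
year_breaks <- seq(2016, 2024, 2)
year_labels <- ifelse(year_breaks == 2016,
                      as.character(year_breaks),
                      sprintf("%02d", year_breaks %% 100))

# Toy ranks; NOT The Economist's data
toy <- data.frame(
  country = rep(c("A", "B"), each = 5),
  year = rep(year_breaks, times = 2),
  rank = c(1, 1, 2, 2, 1,  2, 2, 1, 1, 2)
)

p_axis <- ggplot(toy, aes(x = year, y = rank, group = country)) +
  geom_line() +
  scale_x_continuous(breaks = year_breaks, labels = year_labels) +
  scale_y_reverse() +
  theme(
    panel.grid.major.x = element_line(color = "grey70"),  # vertical lines at the breaks
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank()
  )

p_axis
```

Grid lines are part of the panel background, so they fall behind the data without any extra work.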

Oof. This is going to be challenging! But, I’m excited to learn more about {ggforce} by using its geom_shape() function. The first challenge will be manually creating the data frame to generate the plot since the data aren’t accessible.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
