Visualizing the glass ceiling for Women's History Month


Hey folks,

Did you know that March is Women’s History Month? Each year, The Economist updates what it calls the “Glass Ceiling Index”, a measure of “the role and influence of women in the workforce”. The index aggregates ten factors, including the gender gap in wages, workforce participation, and higher education.

Sadly, the article is behind a paywall, and The Economist hasn’t made the underlying data publicly available. Regardless, you can get a static copy of the article through archive.is. Here’s the graphic that appears to be the most popular when you google the index.

What stands out to you about this figure? To me, it’s interesting that the countries at the top tend to stay at the top and those at the bottom tend to stay at the bottom. The countries in the middle are a bit of a jumbled mess: Poland has taken a nosedive since 2016 while Britain has climbed, and the U.S. has held fairly steady between 18th and 20th place. One critique is that the figure shows relative rather than absolute trends. Every country could be improving on every factor and we wouldn’t see it here; we’d only see whether a country is improving faster than, slower than, or at the same rate as the others.
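To make the relative-versus-absolute point concrete, here’s a tiny R sketch with made-up scores (not the Economist’s data). If every country improves by the same amount, the ranks, which are all the figure shows, don’t budge.

```r
# Hypothetical index scores for four countries (made-up numbers, not real data)
scores_2016 <- c(A = 40, B = 55, C = 62, D = 70)

# Suppose every country improves by 10 points...
scores_2024 <- scores_2016 + 10

# ...the ranks (1 = highest score) are unchanged, so a rank-based plot stays flat
rank_2016 <- rank(-scores_2016)
rank_2024 <- rank(-scores_2024)
identical(rank_2016, rank_2024)
#> [1] TRUE
```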

Graphically, what stands out to you? What would interest you most to see done in R? Here are my first thoughts…

At first glance, this is a line plot with 30 lines, and line plots can be generated with geom_line(). But this isn’t a vanilla line plot: each line has a white border. I’d be tempted to call geom_line() twice, first with color = "white" and a thick linewidth value, and then with the final rank mapped to color in the aes() function and a narrower linewidth value. The problem with this approach is that we wouldn’t get the depth effect we see in this plot. Every white line would be drawn before every colored line, so where two lines cross, neither would have a white border separating it from the line beneath.
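Here’s a minimal sketch of that two-layer idea using three hypothetical countries with made-up ranks (not the Economist’s data). It runs, but it also shows the problem: all of the white “borders” live in one layer underneath all of the colored lines.

```r
library(ggplot2)

# Hypothetical ranks for three countries (made-up numbers, not real data)
ranks <- data.frame(
  country = rep(c("A", "B", "C"), each = 5),
  year    = rep(seq(2016, 2024, 2), times = 3),
  rank    = c(1, 2, 3, 3, 2,
              2, 1, 1, 2, 3,
              3, 3, 2, 1, 1)
)

# Attach each country's final (2024) rank for the color mapping
final_rank <- ranks[ranks$year == 2024, c("country", "rank")]
names(final_rank)[2] <- "final_rank"
ranks <- merge(ranks, final_rank, by = "country")

p_lines <- ggplot(ranks, aes(x = year, y = rank, group = country)) +
  # Layer 1: thick white lines act as the "border"
  geom_line(color = "white", linewidth = 4) +
  # Layer 2: thinner colored lines, drawn on top of ALL the white lines
  geom_line(aes(color = final_rank), linewidth = 2) +
  scale_y_reverse()
```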

Alternatively, we could use geom_polygon(), like I did in some of the W.E.B. DuBois figures last month. That would let us set separate fill and border colors. The downside is that the jogs in the lines would have sharp rather than rounded corners. Thankfully, the {ggforce} package, which extends {ggplot2}, has a geom_shape() function that works as a drop-in replacement for geom_polygon() but can round those corners through its radius argument. I think this is the approach I’d use. I’ll need to write a function that generates the polygon for each country, but if I can do that for a spiral, surely I can do it for a line :)
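Here’s a rough sketch of what that polygon-generating function might look like. line_to_ribbon() is a hypothetical helper of my own and the ranks are made up. It offsets each point of a country’s path up and down by half_width and stitches the two edges into one closed polygon that ggforce::geom_shape() can draw with rounded corners. A vertical offset is a simplification: steep segments come out thinner than flat ones, so a true perpendicular offset might look better.

```r
library(ggplot2)
library(ggforce)

# Hypothetical ranks for three countries (made-up numbers, not real data)
ranks <- data.frame(
  country = rep(c("A", "B", "C"), each = 5),
  year    = rep(seq(2016, 2024, 2), times = 3),
  rank    = c(1, 2, 3, 3, 2,
              2, 1, 1, 2, 3,
              3, 3, 2, 1, 1)
)

# Hypothetical helper: convert one country's (year, rank) path into a closed
# polygon by walking along one edge (rank - half_width) from left to right
# and back along the other edge (rank + half_width) from right to left
line_to_ribbon <- function(df, half_width = 0.3) {
  df <- df[order(df$year), ]
  edge_1 <- data.frame(x = df$year, y = df$rank - half_width)
  edge_2 <- data.frame(x = rev(df$year), y = rev(df$rank) + half_width)
  out <- rbind(edge_1, edge_2)
  out$country <- df$country[1]
  out
}

# One polygon per country, stacked into a single data frame
ribbons <- do.call(rbind, lapply(split(ranks, ranks$country), line_to_ribbon))

p_shapes <- ggplot(ribbons, aes(x = x, y = y, group = country)) +
  # White outline plus rounded corners via geom_shape()'s radius argument
  geom_shape(aes(fill = country), color = "white", radius = unit(2, "mm")) +
  scale_y_reverse()
```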

A second interesting component of the figure is that the lines/polygons are colored according to their 2024 ranking. Normally, we could pull this off with scale_color_gradient2() (or scale_fill_gradient2() once we’re working with polygons). But I don’t think this is a normal situation. With scale_color_gradient2(), the middle color has to be part of the gradient. Here, though, the OECD average would be in the middle and it’s dark gray, whereas the surrounding countries are grayish blue or grayish red. I can think of two ways to deal with this. First, I could set the ranking of “OECD average” to NA and then use the na.value argument to give it a darker gray. Second, I could create a column in my data frame that manually sets the fill color using scales::pal_gradient_n() and then substitute a dark gray for the “OECD average” line.
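Here’s a sketch of both options. The country names and ranks are hypothetical, and the blue/gray/red hex codes are my own guesses at the palette, not values taken from the figure. pal_gradient_n() is in {scales} 1.3.0 and later; older versions call it gradient_n_pal().

```r
library(ggplot2)
library(scales)

# Hypothetical 2024 ranks, with the OECD average sitting mid-table
final <- data.frame(
  country   = c("Country A", "Country B", "OECD average", "Country C", "Country D"),
  rank_2024 = 1:5
)
is_oecd <- final$country == "OECD average"

# Option 1: blank out the OECD rank and let na.value supply the dark gray
final$rank_for_scale <- ifelse(is_oecd, NA, final$rank_2024)
fill_option_1 <- scale_fill_gradient2(
  low = "#3d6e99", mid = "#c9ced6", high = "#b8423b",  # assumed colors
  midpoint = 3, na.value = "gray30"
)

# Option 2: compute the fill colors ourselves, then overwrite the OECD row
pal <- pal_gradient_n(c("#3d6e99", "#c9ced6", "#b8423b"))  # assumed colors
final$fill <- pal(rescale(final$rank_2024))
final$fill[is_oecd] <- "gray30"
fill_option_2 <- scale_fill_identity()  # use the stored colors as-is
```

With option 1 you’d map fill = rank_for_scale in aes() and add fill_option_1; with option 2 you’d map fill = fill and add fill_option_2.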

A third element that catches my eye is the order of the lines. They appear to have been laid down on the “plotting canvas” in ranked order, and we’ll need to make sure the same happens in our recreation. This is the type of thing I’d do with factor(), setting the levels of the country column in ranked order. We don’t typically think about the order of lines, but here it matters. And because it matters, we can’t just lay the “OECD average” line on top of everything else with its own color.
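A minimal base-R sketch of that factor() step, using made-up ranks. ggplot2 numbers groups following the factor levels and generally draws them in that order, so the last level lands on top. I’m assuming the top-ranked country should sit on top; if the original stacks them the other way, flip decreasing = TRUE.

```r
# Hypothetical ranks in 2016 and 2024 (made-up numbers, not real data)
ranks <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  year    = rep(c(2016, 2024), times = 3),
  rank    = c(1, 2, 2, 3, 3, 1)
)

# Each country's 2024 rank
rank_2024 <- ranks[ranks$year == 2024, ]

# Assumption: the best-ranked country is the last level (drawn last, on top);
# the worst-ranked country is the first level (drawn first, underneath)
draw_order <- rank_2024$country[order(rank_2024$rank, decreasing = TRUE)]
ranks$country <- factor(ranks$country, levels = draw_order)

levels(ranks$country)
#> [1] "B" "A" "C"
```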

A fourth element that stands out to me is that the country labels on the left side are ordered by their 2016 rank and those on the right side by their 2024 rank. The left side is easy enough to do by setting the breaks and labels in scale_y_continuous(). I think we could do the right side using a secondary y-axis. Alternatively, and most likely easier, we could turn off the y-axis text and use geom_text() to add both sets of labels to the plot. Because they’d be outside of the plotting window, we’d need to set the x-axis limits using coord_cartesian() with expand = FALSE and clip = "off". Of course, because “OECD average” is bold and everything else is in a regular font, we’ll wrap that label in markdown to make it bold and use ggtext::geom_richtext() rather than geom_text().
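Here’s a rough sketch of the geom_richtext() approach with three hypothetical countries. The label x positions (2015.8 and 2024.2) and the plot margins are guesses you’d tune by eye. geom_richtext() draws a label box by default, so fill = NA and label.color = NA make it behave more like geom_text().

```r
library(ggplot2)
library(ggtext)

# Hypothetical countries and ranks (made-up numbers, not real data)
labels_df <- data.frame(
  country   = c("Country A", "OECD average", "Country B"),
  rank_2016 = c(1, 2, 3),
  rank_2024 = c(2, 3, 1)
)

# Wrap only the OECD label in markdown bold
labels_df$label <- ifelse(labels_df$country == "OECD average",
                          paste0("**", labels_df$country, "**"),
                          labels_df$country)

p_labels <- ggplot(labels_df) +
  geom_segment(aes(x = 2016, xend = 2024, y = rank_2016, yend = rank_2024)) +
  # Left-hand labels at the 2016 ranks, right-justified outside the panel
  geom_richtext(aes(x = 2015.8, y = rank_2016, label = label),
                hjust = 1, fill = NA, label.color = NA) +
  # Right-hand labels at the 2024 ranks, left-justified outside the panel
  geom_richtext(aes(x = 2024.2, y = rank_2024, label = label),
                hjust = 0, fill = NA, label.color = NA) +
  scale_y_reverse() +
  # Fix the panel to the data years and let the labels spill past it
  coord_cartesian(xlim = c(2016, 2024), expand = FALSE, clip = "off") +
  theme(axis.text.y = element_blank(),
        plot.margin = margin(5.5, 90, 5.5, 90))
```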

Finally, the x-axis shows the four-digit year for 2016 and only the last two digits for each even year that follows. That’s easy enough to do with scale_x_continuous() by setting breaks = seq(2016, 2024, 2) and using the labels argument to specify the customized four- or two-digit labels. The figure also has vertical grid lines that fall behind the data. We can set those in theme() by modifying the panel.grid.major.x element with element_line().
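A small sketch of the axis labels and grid lines. The toy data are made up, and the grid color is my guess rather than the Economist’s value.

```r
library(ggplot2)

year_breaks <- seq(2016, 2024, 2)
# First break keeps all four digits; the rest keep only their last two digits
year_labels <- c(year_breaks[1], sprintf("%02d", as.integer(year_breaks[-1] %% 100)))

# Hypothetical ranks for a single line (made-up numbers, not real data)
toy <- data.frame(year = year_breaks, rank = c(3, 1, 2, 2, 1))

p_axis <- ggplot(toy, aes(x = year, y = rank)) +
  geom_line() +
  scale_x_continuous(breaks = year_breaks, labels = year_labels) +
  # Keep only the major vertical grid lines, drawn behind the data
  theme(panel.grid.major.x = element_line(color = "gray80"),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank())
```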

Oof. This is going to be challenging! But, I’m excited to learn more about {ggforce} by using its geom_shape() function. The first challenge will be manually creating the data frame to generate the plot since the data aren’t accessible.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
