Showing the effects of gerrymandering with ggplot2


Hey folks!

I’d love to have you join me in September for a new approach to teaching workshops that I will be rolling out. For five weeks I’ll be working with two cohorts of you all to improve our data visualization skills. Each week we’ll meet for a two-hour session. These sessions will include instruction on principles and concepts in data visualization and an opportunity to apply this information to visualizations we find in the wild or that you bring to the group. By not talking about coding, we’ll have an opportunity to focus on the big ideas that will allow us to design the most effective visualizations. If you have any questions, feel free to reply to this email.


Because it’s the only system I know and it still seems weird to me, I can only assume that our system of assigning regions to legislative representatives looks bizarre to everyone else in the world. Basically, state legislatures can modify the boundaries of a district so long as each district has the same number of people. Legislators can draw some pretty funky maps that have all sorts of twists and turns. The goal is to maximize the number of “safe” districts for their party and minimize the number of safe districts for the opposing party. The practice is called gerrymandering.

This summer, attention has gone to Texas. Texas has 38 districts. Under the current districts, Trump won 27 of them in 2024 and 25 of those are held by Republicans. Under a proposed redrawing of the districts, he would have won 30 districts by at least 10 percentage points. The logic goes that those 30 districts would be held by Republicans after the 2026 midterm election. Keep in mind that Trump won Texas with 56% of the vote. Based on that proportion, you might expect about 21 of the 38 seats (0.56 × 38 ≈ 21) to go to Republicans. Of course, Republicans aren’t the only party that engages in this type of behavior. Democrats do it too, and there are threats of other, Democratic-leaning states following Texas’s lead.

I am a fan of jitter plots, so a jitter plot in a NY Times article on the topic caught my eye.

A jitter plot randomizes the position of the points along one axis (the y-axis in this case) to prevent points from falling on top of each other. The other axis is on a continuous scale. In this case, the categorical variable (i.e. current or proposed districts) is on the y-axis and the results from the 2024 election for each set of districts are on the x-axis. A jitter plot can be created using geom_jitter(). One problem I foresee with using geom_jitter() is that the NYT plot doesn’t seem to be a perfect jitter plot. On the +40 or greater gridline there are points evenly aligned vertically. Also, the jittering increases where there are more points. You can see this between 20 and 30% Trump in the bottom panel. This makes me think of a “sina plot”, which is like a jitter plot but where the shape of the jitter looks like a violin plot. We could use geom_sina() from {ggforce} to get something that approaches the appearance of the NY Times version of the plot. I think the main difference would be that the points on the ends would have a jittered appearance, whereas they have a more orderly appearance in the original. If we wanted to be 100% faithful to the original, perhaps we could make the plot in parts, combining geom_sina() and geom_point().
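To make the difference between the two geoms concrete, here is a rough sketch that draws the same points both ways. The margins are random numbers I made up for illustration, not real election results, and the column names (scheme, margin) are my own. For the sina version I map the category to x and flip the coordinates, which is just one way to get the categories onto the y-axis.

```r
library(ggplot2)
library(ggforce)

# Made-up margins for illustration only; these are NOT real election results.
# Positive values stand for a Trump margin, negative values for a Harris margin.
set.seed(2024)
districts <- data.frame(
  scheme = rep(c("Current", "Proposed"), each = 38),
  margin = runif(76, min = -60, max = 60)
)

# Plain jitter: the same amount of random vertical noise everywhere
jitter_plot <- ggplot(districts, aes(x = margin, y = scheme)) +
  geom_jitter(width = 0, height = 0.2)

# Sina: the spread follows the local density of points, like a violin plot.
# The category goes on x and coord_flip() moves it to the y-axis.
sina_plot <- ggplot(districts, aes(x = scheme, y = margin)) +
  geom_sina() +
  coord_flip()
```

Printing jitter_plot and sina_plot side by side should show the sina points fanning out where the data are dense and collapsing toward a line where they are sparse.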

At first glance, I thought I might recreate this plot by making two separate plots that each have three facets. We could combine the plots with {patchwork} and put the facets close enough to each other that you wouldn’t notice they were facets. But on thinking about it more, I think I’d rather make it a faceted plot with two rows, one for each districting scheme. Within each facet I’d then have the three labels added with geom_richtext() from the {ggtext} package. I’d use geom_richtext() so I could combine bolded and regular fonts. Thinking about those labels, I’d likely use {glue} to insert the number of districts and the change in the number of districts.
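Here is a minimal sketch of how the label text might be assembled with {glue} and drawn with geom_richtext() in a two-row facet. The district counts, group names, and x positions are placeholders I invented so the code runs; they are not the numbers from the article.

```r
library(ggplot2)
library(ggtext)
library(glue)

# Placeholder counts for illustration only; NOT the values from the article.
labels <- data.frame(
  scheme = rep(c("Current", "Proposed"), each = 3),
  group = rep(c("Harris", "Competitive", "Trump"), times = 2),
  x = rep(c(-40, 0, 40), times = 2),
  n = c(10, 3, 25, 7, 1, 30)
)

# Change relative to the current scheme within each group (rows are ordered
# so that the "Current" value comes first in every group)
labels$change <- labels$n - ave(labels$n, labels$group, FUN = function(v) v[1])
labels$change_text <- sprintf("%+d", as.integer(labels$change))

# Markdown and HTML for geom_richtext(): bold count, regular group name
labels$label <- ifelse(
  labels$scheme == "Proposed",
  as.character(glue_data(labels, "**{n} districts** ({change_text})<br>{group}")),
  as.character(glue_data(labels, "**{n} districts**<br>{group}"))
)

label_plot <- ggplot(labels, aes(x = x, y = 1, label = label)) +
  geom_richtext(fill = NA, label.color = NA) +
  scale_x_continuous(limits = c(-60, 60)) +
  facet_wrap(~scheme, nrow = 2)
```

In a full figure this label layer would sit above the points in each facet, so the y value of 1 here would become whatever height is left free above the jittered points.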

Something I’m not sure about is how to keep the gridlines from going up through the labels. One option would be to make the background of each label wide enough that it covers the gridlines that would normally come up behind the text. With this approach the gridlines could be controlled with theme(). Another option would be to still use geom_richtext(), but insert the gridlines using geom_segment() and have them only span the jitter plot section of the figure. Another interesting element of the figure is the shading between 10% Harris and 10% Trump. I’d probably create that with geom_rect(). I’d need to select my y-axis boundaries depending on which gridline strategy I used.
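As a sketch of the second option, the code below turns off the theme gridlines, draws the shaded band with geom_rect(), and draws gridlines with geom_segment() that stop at the top of the point region. The data are random placeholders, and the y boundaries (-0.5 to 0.5 for the points, up to 1 for label space) are values I picked arbitrarily.

```r
library(ggplot2)

# Made-up margins and vertical offsets for illustration only
set.seed(1)
districts <- data.frame(
  scheme = rep(c("Current", "Proposed"), each = 38),
  margin = runif(76, min = -60, max = 60),
  y = runif(76, min = -0.4, max = 0.4)
)

# One row for the shaded band, one row per hand-drawn gridline
band <- data.frame(xmin = -10, xmax = 10, ymin = -0.5, ymax = 0.5)
gridlines <- data.frame(x = seq(-40, 40, by = 10))

shaded_plot <- ggplot(districts, aes(x = margin, y = y)) +
  geom_rect(
    data = band,
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    inherit.aes = FALSE, fill = "gray90"
  ) +
  geom_segment(
    data = gridlines,
    aes(x = x, xend = x, y = -0.5, yend = 0.5),
    inherit.aes = FALSE, color = "gray70"
  ) +
  geom_point() +
  # Extra room above 0.5 is left empty for the labels
  scale_y_continuous(limits = c(-0.5, 1)) +
  facet_wrap(~scheme, nrow = 2) +
  theme(panel.grid = element_blank())
```

Because band and gridlines have no scheme column, ggplot2 repeats those layers in both facets.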

Let’s think about the use of color for the points. I notice two things. First, there are different shades of blue and red for the points above 20%, those that fall between 10 and 20%, and those that fall between 0 and 10%. This could be implemented by creating a categorical variable that records which range each district falls in and then assigning a color to each level with scale_fill_manual(). Why fill and not color? That’s the second thing: the points between 0 and 10% favoring Trump in the top panel have a black ring around them. These symbols could be created using plotting symbol 21, setting the fill to a light pink and the ring (or stroke) to black. Most likely, I would use symbol 21 for everything and match the fill and color values in scale_*_manual() for points outside the 0 to 10% range.
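Here is one way the binning and the symbol 21 idea could look in code. The margins are random placeholders, the bin names are my own, and the hex colors are guesses rather than the palette used in the original figure. I am also assuming the black ring goes on both 0 to 10% bins, which may not match the original exactly.

```r
library(ggplot2)

# Made-up margins for illustration only; positive = Trump, negative = Harris
set.seed(2)
districts <- data.frame(
  margin = runif(38, min = -60, max = 60),
  y = runif(38, min = -0.4, max = 0.4)
)

# Categorical variable recording which range each district falls in
districts$bin <- cut(
  districts$margin,
  breaks = c(-Inf, -20, -10, 0, 10, 20, Inf),
  labels = c("harris_20", "harris_10", "harris_0",
             "trump_0", "trump_10", "trump_20")
)

# Placeholder colors, not the ones from the original figure
fills <- c(
  harris_20 = "#1f4e9c", harris_10 = "#6d9bd8", harris_0 = "#c9dcf2",
  trump_0 = "#f7cdd0", trump_10 = "#e0787f", trump_20 = "#b3202a"
)

# Rings match the fill except in the 0 to 10% bins.
# Assumption: both close bins get a black ring; adjust to match the original.
rings <- fills
rings[c("harris_0", "trump_0")] <- "black"

color_plot <- ggplot(districts, aes(x = margin, y = y, fill = bin, color = bin)) +
  geom_point(shape = 21, size = 3, stroke = 0.5) +
  scale_fill_manual(values = fills) +
  scale_color_manual(values = rings) +
  theme(legend.position = "none")
```

Using named vectors for the manual scales keeps each bin tied to its color even if a bin happens to be empty in one of the districting schemes.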

I think this should get us pretty close to a faithful representation of the original figure. Oh yeah, one small thing to consider is where to get the data! I noticed that the NY Times version isn’t interactive and doesn’t have data hiding in the source code. But I was able to track down an interactive map that does have the data hiding in its source. Also, we can get the actual margins from the 2024 election with the current districts from Wikipedia. We might need to use some tools from {rvest} to parse the table from Wikipedia.
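For the {rvest} step, here is a small sketch of pulling a table out of HTML and computing a margin. To keep it runnable without a network connection, it parses a tiny stand-in page that I wrote by hand. The district names, column names, percentages, and the "wikitable" selector are all assumptions, so the real page will likely need a different selector and some extra cleanup.

```r
library(rvest)

# A tiny stand-in page with made-up values; NOT the real Wikipedia table.
# For a real page, read_html("<url of the page>") would replace minimal_html().
page <- minimal_html('
  <table class="wikitable">
    <tr><th>District</th><th>Harris</th><th>Trump</th></tr>
    <tr><td>District A</td><td>45.0%</td><td>55.0%</td></tr>
    <tr><td>District B</td><td>60.5%</td><td>39.5%</td></tr>
  </table>
')

# Grab the first matching table and convert it to a data frame
results <- page |>
  html_element("table.wikitable") |>
  html_table()

# Percentages arrive as text such as "55.0%", so strip the sign and convert
to_number <- function(x) as.numeric(sub("%", "", x, fixed = TRUE))

# Positive margin = Trump ahead, negative margin = Harris ahead
results$margin <- to_number(results$Trump) - to_number(results$Harris)
```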

What do you think? See if you can give this figure your best effort and let me know how it goes!

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat
