Hey folks! I’d love to have you join me in September for a new approach to teaching workshops that I will be rolling out. For five weeks I’ll be working with two cohorts of you all to improve our data visualization skills. Each week we’ll meet for a two-hour session. These sessions will include instruction on principles and concepts in data visualization and an opportunity to apply this information to visualizations we find in the wild or that you bring to the group. By not talking about coding, we’ll have an opportunity to focus on the big ideas that will allow us to design the most effective visualizations. If you have any questions, feel free to reply to this email.
Because it’s the only system I know and it seems weird to me, I can only assume that our system of assigning regions to legislative representatives is bizarre to everyone else in the world. Basically, state legislatures can modify the boundaries of a district so long as each district has the same number of people. Legislators can draw some pretty funky maps that have all sorts of twists and turns, with the goal of maximizing the number of “safe” districts for their party and minimizing the number of safe districts for the opposing party. The result is what’s called gerrymandering.

This summer, attention has gone to Texas. Texas has 38 districts. Under the current map, Trump won 27 of the districts in 2024, and 25 of those are held by Republicans. Under a proposed redrawing of the map, he would have won 30 districts by at least 10 percentage points. The logic goes that those 30 districts would be held by Republicans after the 2026 midterm election. Keep in mind that Trump won Texas with 56% of the vote. Based on that proportion, you might expect about 21 of the 38 seats (0.56 × 38 ≈ 21) to go to Republicans. Of course, Republicans aren’t the only party that engages in this type of behavior. Democrats do it too, and there are threats of other, Democratic-leaning states following Texas’s lead.

I am a fan of jitter plots, and so a jitter plot in a NY Times article on the topic caught my eye. A jitter plot randomizes the position of the points along one axis (the y-axis in this case) to prevent points from falling on top of each other. The other axis is on a continuous scale. In this case, the categorical variable (i.e. current or proposed districts) is on the y-axis and the results from the 2024 election for each set of districts are on the x-axis. A jitter plot can be created using

At first glance, I thought I might recreate this plot by making two separate plots that each have three facets. We could combine the plots with

Something I’m not sure about is how to keep the gridlines from going up through the labels. One option would be to make the background of each label wide enough that it covers the gridlines that would normally come up behind the text. By this approach the gridlines could be controlled with

Let’s think about the use of color for the points. I notice two things. First, there are different shades of blue and red for the points that fall above 20%, those that fall between 10 and 20%, and those that fall between 0 and 10%. This could be implemented by creating a dummy variable for each of the ranges and then changing it with

I think this should get us pretty close to a faithful representation of the original figure. Oh yeah, one small thing to consider is where to get the data! I noticed that the NY Times version isn’t interactive and doesn’t have data hiding in the source code. But I was able to track down an interactive map that does have the data hiding in its source. Also, we can get the actual margins from the 2024 election with the current districts from Wikipedia. We might need to use some tools from

What do you think? See if you can give this figure your best effort and let me know how it goes!
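For anyone who wants to play with the two core ideas before opening a plotting tool, here is a minimal sketch in plain Python of the jittering and the margin bins described above. It is not the NY Times method or a finished figure. The function names, the sign convention (positive means a Republican lead), the placement of exactly 10 and 20 points in the higher bin, the hex colors, and the margins are all assumptions made up for illustration.

```python
import random

# Hypothetical hex colors for each bin; the original figure's shades are not known.
PALETTE = {
    "D 20+": "#1f4e9c", "D 10-20": "#6b93d6", "D 0-10": "#b8cdf0",
    "R 0-10": "#f0b8b8", "R 10-20": "#d66b6b", "R 20+": "#9c1f1f",
}


def margin_bin(margin):
    """Label a signed margin in percentage points.

    Assumed convention: positive = Republican lead, negative = Democratic lead.
    Bins follow the ranges in the text: 0-10, 10-20, and 20 or more.
    """
    party = "R" if margin >= 0 else "D"
    size = abs(margin)
    if size < 10:
        strength = "0-10"
    elif size < 20:
        strength = "10-20"
    else:
        strength = "20+"
    return f"{party} {strength}"


def jitter(base, n, width=0.3, seed=None):
    """Return n positions scattered uniformly within +/- width of base.

    This is the 'randomize the categorical axis' step of a jitter plot.
    """
    rng = random.Random(seed)
    return [base + rng.uniform(-width, width) for _ in range(n)]


# Made-up margins purely for illustration; these are not real election results.
margins = [-35.0, -12.5, -4.0, 3.2, 8.9, 15.0, 27.4]

# All points belong to one category (say, "current districts") centered at y = 1.
y_positions = jitter(base=1.0, n=len(margins), seed=1)

for m, y in zip(margins, y_positions):
    label = margin_bin(m)
    print(f"margin={m:+6.1f}  y={y:.2f}  bin={label}  color={PALETTE[label]}")
```

Fixing the seed keeps the jitter reproducible, which helps when comparing a recreation against the original from one run to the next.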
Hey folks, It has been great to see the high level of engagement with my weekly critique videos on YouTube. I have really enjoyed making them and have learned a lot about current practices in data visualization. The one problem with these videos is that they’re a bit like an autopsy. We can figure out what went well or what didn’t work in a published figure. But we can’t do much to improve the published figure. What if we could do critiques before submitting our papers, preparing a...
Hey folks, This week I want to share with you a figure that resembles a type of figure that I see in a lot of genomics papers. I’d consider it a data visualization meme - kind of like how you’re “required” to have a stacked bar plot if you’re doing microbiome research or a dynamite plot if you’re publishing in Nature :) This figure was included in the paper, “Impact of intensive control on malaria population genomics under elimination settings in Southeast Asia” that was published...
Hey folks! I hope you enjoyed last week’s series on the radial volcano plot (newsletter, critique video, livestream). I think it did a good job of illustrating the various reasons I think it’s valuable to recreate figures, even if we don’t like how they display the data. Something I didn’t really emphasize in last week’s newsletter was that by recreating a figure, we can make sure that the data are legit. I’m surprised by the number of signals I’ve been finding where authors using tools like...