Hey folks, Next week is Thanksgiving here in the US and I’ll skip sending you another newsletter. In exchange, you’ll get three videos on YouTube inspired by a newsletter post from October talking about a descending bar plot with a pattern in one of the bars. Before you thank me, you might want to check out today’s newsletter🤣! I’ve always enjoyed the old 538’s articles and appreciated the data centric point of view of its founder Nate Silver. He has a Substack newsletter, “Silver Bulletin”, that is very good. I’m too cheap to pay for a subscription, so I settle for the bread crumbs he includes on the free subscription. Last night I received his latest article, Hopium comes at a high price. The article is part of a debrief on the election and the state of polling and predictive models like his. His contention is that polls continues to underestimate Trump’s numbers, but within the margin of error of those polls. Regardless of what you think of Trump or Silver’s analysis, I was captivated by the visual that he included in the newsletter. As always, I encourage you to ask some questions about any plot you find to help you develop your taste and and think through how you would recreate elements of a plot. What type of plot is this? Aside from the data story, what is interesting about this figure? What do you like about it? What don’t you like about it? Can you outline the steps you would take to generate the figure? What are some of the steps you aren’t sure about and would like to learn? This plot was eerily reminiscent of a plot that I made back in 2021 showing the likelihood of people getting the COVID-19 vaccine at different times by country. I called this plot a “dumbbell” or “barbell” plot because for each entity (e.g., state or country) there is a ball connected by a line - it looks like a dumbbell. You might recall another set of videos I made recently based on paired data where I made a scatter plot and a slope plot inspired by sentiments of farmers and non-farmers in Sweden. A dumbbell plot is another way to show paired data for a handful of entities. If I were asked to recreate Silver’s figure, I’d expect to get a data frame with three columns -
At a basic level, a dumbbell plot can be made with with a combination of Let’s start with the handles. Using my What about the “bells”? For those, I need all of the polling data in a single column. I’d need to generate a second data frame using The labels are a bit more tricky. I’d use I was also struck by the “legend” across the top indicating the white point is the polling average margin and the green the actual margin. I’d probably use a few The axis labels also have some cool things going on. The x-axis text is a pretty slick way of embedding who was favored to the left and right of the black line at zero. I’d use Finally, the plot has vertical grid lines that are grey. There’s also one that is black at zero. We could do one or the other in the This figure shows the 7 “battleground” states from the election. Because of how our elections work, it’s the state one wins that matters, not the number of votes they get overall. So, although Harris won California by 21%, she still got the same number of electoral votes as if she had only one it by 1%. Ditto for Trump and Texas. Regardless, it would be interesting to see these types of data for the 43 other states. Beyond being more complete, I’m interested in this to whether the same ~2.5 percentage point difference holds up regardless of the state. Maybe I’ll see if I can track that data down between now and when I produce the remake video, likely in January. I’ll award bonus points if anyone does that for me :)
|
Hey folks, If you’re interested in participating in a 1-day (6 hours) data visualization workshop, you’re running out of time to register. I’ll be teaching this workshop on May 9th. I will cover an introduction to the ggplot2 package and will assume no prior R knowledge. My goal is to help you to understand the ggplot2 framework and begin to apply it to make some interesting and compelling visualizations. After this workshop, you should be able to learn more advanced topics on your own. You...
Hey folks, I’m gearing up to teach a 1-day (6 hours) data visualization workshop on May 9th. This workshop will cover an introduction to the ggplot2 package and will assume no prior R knowledge. My goal is to help you to understand the ggplot2 framework and begin to apply it to make some interesting and compelling visualizations. From this workshop, I hope that you would be able to go off on your own journey learning more advanced topics. You can learn more and register by clicking the button...
Hey folks, Long time friends of Riffomonas know that I’ve been teaching data science classes for close to 20 years. The hallmark of my teaching has been three-day workshops where I either teach R (here and here) or the mothur software package. I’ve gotten feedback that three days is just too much time for people to carve out of their busy schedules. So, I’m excited to be offering a 1-day (6 hours) data visualization workshop on May 9th. This will cover an introduction to the ggplot2 package....