Hey folks,

I’m really excited to be offering a 1-day (6 hours) data visualization workshop on May 9th. It will cover the basics of ggplot2. If you’ve been following along with this newsletter for any length of time, you know I’ve thought a lot about how we learn. A critical element of learning is creating a mental model that we can hang ideas on to flesh out our understanding of a concept. The “grammar of graphics” is one such mental model for building plots. It is instantiated in ggplot2 - that’s the “gg” in the name! My goal is to help you develop that mental model so that you leave the workshop understanding the ggplot2 framework and can add to your understanding of the model as you go off on your own journey learning more advanced topics. You can learn more and register by clicking the button below. Feel free to email me if you have any questions. Let me know if you’d like to see other one-day or part-day workshops related to the types of things I discuss in the newsletter or over on YouTube.

I’ve been swamped the last couple of weeks with a variety of things. This is keeping me from my regular posting of videos to YouTube - sorry! I do plan on getting back on track soon. However, I may be limited to one video a week rather than the recent cadence of two a week. Too many things have fallen by the wayside, and I need to get caught back up on them before putting more effort into the channel. Hopefully, you’ll understand.

If you’re like many of us in the US, you’ve been getting whiplash trying to understand the current status of the tariff war, why it’s happening, and what the effects are. This has been quite the week for the Trump administration, which seems to be trying to one-up itself each week with things it can do to be unpredictable. I’ve been struck by a few of the visualizations coming out of the New York Times describing the impact of the administration’s policies on the US and international stock markets.
Here are two visuals they posted in an article last week:

I’ll focus on the first plot and encourage you to think through the second on your own - they’re somewhat related. The first is a line plot showing the closing value of the S&P 500, a barometer of 500 leading companies listed on US stock exchanges. There are breaks in the lines indicating the weekends and holidays. Each week’s worth of data has a shaded rectangle in the background that is colored by whether the week ended lower than the previous week.

So how would I pull this off? I see three major components: the data, the lines, and the rectangles.

To get the data, I would use the {quantmod} package. This package has a function called ...

To generate the line plot with dots and breaks for weekends and holidays, there are two strategies I’d consider. Normally, ...

To generate the shaded background, I’d use ...

I think the data, the lines, and the rectangles are the big parts of the plot to figure out. Other things would include: (1) the horizontal grid lines running on top of the rectangles, but behind the lines; (2) commas in the y-axis values; (3) placement of the month below the first date of the month and showing dates every two weeks; and (4) getting “Orange bars” to be orange and bolded in the subtitle. If you’ve been following along over the past few months, you likely have an idea for how we can do each of these.

Don’t forget to give the second plot the same treatment! What do you think the hard parts are in that plot?
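As a rough sketch of how the three components could fit together, here is some R code. The closing values are made up for illustration (they are not real S&P 500 data, which you would pull with {quantmod}). Using `geom_rect()` for the shading and a per-week `group` aesthetic to break the line over weekends are assumptions about one way to do it, not necessarily the approach behind the original figure.

```r
library(ggplot2)

# Made-up closing values for illustration only (NOT real S&P 500 data)
dates <- seq(as.Date("2025-03-03"), as.Date("2025-03-28"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]  # drop Sat/Sun

sp <- data.frame(
  date = dates,
  close = c(5850, 5780, 5840, 5740, 5770,
            5615, 5570, 5600, 5520, 5640,
            5675, 5615, 5675, 5660, 5665,
            5765, 5775, 5710, 5695, 5580)
)

# Identify each trading week by the date of its Monday
sp$week <- sp$date - (as.integer(format(sp$date, "%u")) - 1)

# One row per week: first and last trading day, and the week-ending close
first_day <- sp[!duplicated(sp$week), ]
last_day <- sp[!duplicated(sp$week, fromLast = TRUE), ]

weeks <- data.frame(
  xmin = first_day$date - 0.5,
  xmax = last_day$date + 0.5,
  end_close = last_day$close
)

# Did the week end lower than the previous week? (NA for the first week)
weeks$down <- c(NA, diff(weeks$end_close) < 0)
rects <- weeks[!is.na(weeks$down), ]

p <- ggplot(sp, aes(x = date, y = close)) +
  # Shaded background, one rectangle per week
  geom_rect(
    data = rects,
    aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = down),
    inherit.aes = FALSE, alpha = 0.4
  ) +
  # Grouping by week leaves a gap in the line across each weekend
  geom_line(aes(group = week)) +
  geom_point() +
  scale_fill_manual(
    values = c(`TRUE` = "orange", `FALSE` = "gray85"),
    guide = "none"
  ) +
  # Commas in the y-axis values
  scale_y_continuous(labels = scales::label_comma()) +
  scale_x_date(date_breaks = "2 weeks", date_labels = "%b %d") +
  labs(x = NULL, y = NULL) +
  theme_minimal()

p
```

The first week gets no rectangle here because there is no earlier week to compare it to. A holiday in the middle of a week would need its own handling, for example by grouping on runs of consecutive trading days instead of on the week.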