Two ways to plot a dumpster fire


Hey folks,

I’m really excited to be offering a 1-day (6 hours) data visualization workshop on May 9th. It will cover the basics of ggplot2. If you’ve been following this newsletter for any length of time, you know I’ve thought a lot about how we learn. A critical element of learning is creating a mental model that we can hang ideas on as we flesh out our understanding of a concept. The “grammar of graphics” is one such mental model for building plots. It is instantiated in ggplot2 (that’s the “gg” in the name!). My goal is to help you develop that mental model so that you leave the workshop understanding the ggplot2 framework and can keep adding to that model as you go off on your own journey learning more advanced topics. You can learn more and register by clicking the button below. Feel free to email me if you have any questions.

Let me know if you’d like to see other one-day or partial-day workshops on the types of things I discuss in the newsletter or over on YouTube.


I’ve been swamped the last couple of weeks with a variety of things, which has kept me from regularly posting videos to YouTube. Sorry! I do plan on getting back on track soon, although I may be limited to one video a week rather than the recent two-a-week cadence. Too many things have fallen by the wayside, and I need to catch up on them before putting more effort into the channel. Hopefully, you’ll understand.


If you’re like many of us in the US, you’ve been getting whiplash trying to understand the current status of the tariff war, why it’s happening, and what its effects are. This has been quite the week for the Trump administration, which seems to be trying to one-up itself each week in how unpredictable it can be.

I’ve been struck by a few of the visualizations coming out of the New York Times describing the impact of the administration’s policies on the US and international stock markets. Here are two visuals they posted in an article last week:

I’ll focus on the first plot and encourage you to think through the second on your own; they’re somewhat related. The first is a line plot showing the closing value of the S&P 500, an index that tracks 500 leading companies listed on US stock exchanges. There are breaks in the line for weekends and holidays. Each week’s worth of data has a shaded rectangle in the background that is colored by whether the week ended lower than the previous week.

So how would I pull this off? I see three major components: the data, the lines, and the rectangles.

To get the data, I would use the {quantmod} package. This package has a function called getSymbols() that takes a vector of stock symbols and returns time series data for those symbols. For example, quantmod::getSymbols("^GSPC", auto.assign = FALSE, from = "2025-01-01") will get you the values of the S&P 500 going back to the start of the year. I would use the same approach to get the data for the other countries’ markets included in the second plot. To convert from a time series object to a data frame or tibble, we should be able to use as.data.frame() or as_tibble(). One hiccup with as_tibble() is that we’d have to find the argument that converts the row names (the dates) into a column. If you’re interested in stock price data, I’d encourage you to play around with this package. You can get intraday data at the one-minute interval!
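Here’s a minimal sketch of that data step. It assumes the {quantmod} and {tibble} packages are installed, and get_index() and xts_to_tibble() are hypothetical helper names I made up for this example. The as.data.frame() method for xts objects stores the dates as row names, which is why as_tibble() needs its rownames argument here.

```r
library(quantmod)  # getSymbols(); also loads {xts} and {zoo}
library(tibble)    # as_tibble()

# Download daily values for a ticker symbol (needs an internet connection).
# auto.assign = FALSE returns an xts object rather than creating a variable
# named after the symbol (e.g., GSPC) in your workspace.
get_index <- function(symbol = "^GSPC", from = "2025-01-01") {
  getSymbols(symbol, auto.assign = FALSE, from = from)
}

# as.data.frame() stores the xts dates as row names; as_tibble()'s rownames
# argument moves them into a column, which we then convert back to Date
xts_to_tibble <- function(x) {
  out <- as_tibble(as.data.frame(x), rownames = "date")
  out$date <- as.Date(out$date)
  out
}
```

Something like sp500 <- xts_to_tibble(get_index()) should then give you a date column alongside columns such as GSPC.Close and GSPC.Adjusted.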

To generate the line plot with dots and breaks for weekends and holidays, there are two strategies I’d consider. Normally, geom_line() will draw a line connecting all of the points, unless the data contain NA values or are split into groups. We could insert NA values for the dates corresponding to each weekend and holiday. I could easily do this for weekends using lubridate::wday(), but I think it would be hard (for me) to remember all the holidays when the markets are closed. Instead, I think I’ll use lubridate::week(), lubridate::isoweek(), or lubridate::epiweek() to get the week of the year. Then I could map that value to the group aesthetic in ggplot(). This should put a break in the line between each week.
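Here’s a rough sketch of the grouping approach using made-up closing values for the weekdays of January 2025 (holidays ignored). One thing I’d watch out for: isoweek() numbers weeks that start on Monday, so a Monday-to-Friday trading week always lands in a single group, while week() counts seven-day blocks from January 1 and can split a trading week in two. Pasting isoyear() onto the week number keeps late December and early January apart if the data span two years. The toy data and the wk column name are just for illustration.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Made-up data: every weekday in January 2025 (holidays ignored) with an
# invented closing value, just to show how the grouping works
all_days <- seq(as.Date("2025-01-01"), as.Date("2025-01-31"), by = "day")
toy <- tibble(date = all_days[wday(all_days, week_start = 1) <= 5]) |>
  mutate(
    close = 5900 + 10 * seq_along(date),
    # isoweek() weeks start on Monday, so each trading week gets one label;
    # isoyear() keeps weeks from different years apart
    wk = paste(isoyear(date), isoweek(date))
  )

# Mapping wk to the group aesthetic breaks the line between weeks
p <- ggplot(toy, aes(x = date, y = close, group = wk)) +
  geom_line() +
  geom_point()
```

Because the break comes from the grouping rather than from missing dates, holidays never need to be listed anywhere.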

To generate the shaded background, I’d use geom_rect(). For each week in the data, I’d need to tell geom_rect() the xmin, xmax, ymin, and ymax values. I’d get xmin and xmax by grouping the dates by week and returning the minimum and maximum dates using summarize(). I’ll likely look to see if there’s a {lubridate} function that takes the year, the week, and the day of the week and returns a date. That would be handy. I’d then set ymin and ymax to the limits of the y-axis. I’d also want the closing value at the end of each week, which I could again get with summarize(). With that data, I’d check whether the Friday close (or the last close of a holiday-shortened week) is higher or lower than the previous week’s and then map that logical variable to the fill color.
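A sketch of the weekly summary and rectangles might look like this. It assumes a data frame called daily with a Date column named date and a numeric column named close; summarize_weeks() and plot_weeks() are hypothetical names. Rather than hunting for a date-from-week function, it takes the first and last trading dates within each week, and it uses the last close of the week so that holiday-shortened weeks still work. Setting ymin and ymax to -Inf and Inf stretches each rectangle over the full height of the panel. The colors are my own placeholders, not the ones in the Times graphic.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# One row per ISO week: first/last trading date and the week's final close.
# `daily` is assumed to have a Date column `date` and a numeric column `close`.
summarize_weeks <- function(daily) {
  daily |>
    arrange(date) |>
    mutate(wk = paste(isoyear(date), isoweek(date))) |>
    group_by(wk) |>
    summarize(
      xmin = min(date),
      xmax = max(date),
      week_close = last(close),  # last trading day, so holiday weeks still work
      .groups = "drop"
    ) |>
    # wk is text, so re-sort chronologically before comparing weeks
    arrange(xmin) |>
    # TRUE if the week closed lower than the week before; NA for the first week
    mutate(lower = week_close < lag(week_close))
}

# Weekly rectangles behind a line that is broken between weeks
plot_weeks <- function(daily) {
  daily <- mutate(daily, wk = paste(isoyear(date), isoweek(date)))
  weeks <- summarize_weeks(daily)
  ggplot() +
    # -Inf/Inf stretch each rectangle over the full height of the panel
    geom_rect(data = weeks,
              aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = lower)) +
    geom_line(data = daily, aes(x = date, y = close, group = wk)) +
    scale_fill_manual(values = c(`TRUE` = "orange", `FALSE` = "grey90"),
                      na.value = "grey90")
}
```

As written, each rectangle spans Monday through Friday, so there will be thin gaps over the weekends; padding xmin and xmax would close them if the original plot doesn’t have gaps.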

I think the data, the lines, and the rectangles are the big parts of the plot to figure out. Other things would include: (1) the horizontal grid lines running on top of the rectangles, but behind the lines; (2) commas in the y-axis values; (3) placing the month below the first date label of each month and labeling dates every two weeks; and (4) getting “Orange bars” to be orange and bold in the subtitle. If you’ve been following along over the past few months, you likely have an idea for how we can do each of these.
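Here’s one way those finishing touches might come together, assuming the {scales} and {ggtext} packages are installed. By default, ggplot2 draws the panel grid underneath every layer (and panel.ontop = TRUE would put it over the line too), so this sketch turns the built-in grid off and draws its own grid lines with geom_hline() between the rectangles and the line. label_comma() adds the commas, the hypothetical day_month_labels() helper puts the month name under the first date label of each month, and ggtext::element_markdown() renders the orange, bold “Orange bars” text. The column names (date, close, wk, xmin, xmax, lower), the colors, and the subtitle wording are assumptions for illustration.

```r
library(ggplot2)
library(scales)  # label_comma()
library(ggtext)  # element_markdown()

# Hypothetical helper: day of the month for every break, plus the month name
# underneath the first break that falls in each month
day_month_labels <- function(breaks) {
  days <- as.integer(format(breaks, "%d"))
  first_in_month <- !duplicated(format(breaks, "%Y-%m"))
  ifelse(first_in_month,
         paste0(days, "\n", format(breaks, "%b")),
         as.character(days))
}

# `daily` needs date, close, and wk columns; `weeks` needs xmin, xmax, and
# lower columns (all assumed names)
finished_plot <- function(daily, weeks) {
  # Grid-line positions, kept within the range of the data
  y_breaks <- pretty(daily$close)
  y_breaks <- y_breaks[y_breaks >= min(daily$close) & y_breaks <= max(daily$close)]

  ggplot(daily, aes(x = date, y = close)) +
    # Bottom layer: the weekly rectangles
    geom_rect(data = weeks, inherit.aes = FALSE,
              aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = lower)) +
    # Middle layer: grid lines drawn over the rectangles...
    geom_hline(yintercept = y_breaks, color = "grey60", linewidth = 0.3) +
    # Top layers: ...but under the line and the points
    geom_line(aes(group = wk)) +
    geom_point() +
    scale_fill_manual(values = c(`TRUE` = "orange", `FALSE` = "grey90"),
                      na.value = "grey90", guide = "none") +
    scale_y_continuous(breaks = y_breaks, labels = label_comma()) +
    scale_x_date(date_breaks = "2 weeks", labels = day_month_labels) +
    labs(x = NULL, y = NULL,
         subtitle = "<span style='color:orange'>**Orange bars**</span> show weeks that ended lower than the previous week") +
    theme_minimal() +
    # Turn off the built-in grid, which would sit underneath the rectangles
    theme(panel.grid = element_blank(),
          plot.subtitle = element_markdown())
}
```

The layer order does the work here: ggplot2 draws layers in the order you add them, so putting geom_hline() between geom_rect() and geom_line() gives the sandwich in the original plot.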

Don’t forget to give the second plot the same treatment! What do you think the hard parts are in that plot?

Workshops

I'm pleased to be able to offer you any of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
