👟 Watching this data viz animation is making me exhausted


Hey folks,

I’m really grateful for the people who have emailed me recently to thank me for making the recreation and makeover videos. I’ve been excited to see the types of figures some of you are trying to make. It’s really been a great part of this work for me. Thank you!

Eric Hill is a loyal Riffomonas Channel viewer who recently sent me an animation he made using the p5.js platform. The animation shows his son’s performance relative to other runners in the prestigious Nike Cross Nationals (NXN) cross country race. Give it a watch - if you’re like me you’ll be exhausted just watching the video:

[video preview]

I have really been blown away by the increase in data collection for different sports. Perhaps you’re familiar with the movie/book, “Moneyball”, that describes how the Oakland A’s baseball team was managed using data rather than gut feelings. It’s cool to see new insights and predictions for sports that typically fly under the radar, like boys cross country!

Eric asked me how I would go about creating this animation in R. I instantly asked him if he could share some of the data. He was happy to share a CSV file with headings for the latitude, longitude, distance run, elevation, pace, elapsed time, cadence, and heart rate. All of the data is for his son, Peyton. I’ll focus on making the simulation for his component of the animation.

What do you see when you watch this video? What graphical elements stick out to you? I see a dashboard indicating different types of data about the race. Of course, it’s an animated dashboard. When I see animations, I think of the {gganimate} package. But we’ll get to that in a minute. As I watched the video I noticed that there are really three different panels to this video: the course, the change in elevation, and the rankings with how far each percentile has run. Let’s think about each of these separately.

The course is the most prominent part of the video. Initially I was a little overwhelmed wondering how I’d generate the gray course. But then I realized that the course is really the latitude and longitude Peyton ran. We could make the course by using geom_path() with a thick linewidth value and grayish color. Then we could plot Peyton’s position on top of the course with geom_point() but using a plotting symbol (perhaps 19?) that has a smaller diameter than the width of the course.
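Here’s a rough sketch of that idea. I’m using a made-up loop rather than Peyton’s actual track, and the column names (longitude, latitude) are my assumptions about how the CSV headings might be cleaned up.

```r
library(ggplot2)

# Made-up stand-in for the GPS track; NOT the real data.
# Column names are assumptions based on the headings described above.
n <- 200
angle <- seq(0, 2 * pi, length.out = n)
track <- data.frame(
  longitude = -122.60 + 0.004 * cos(angle),
  latitude  = 45.50 + 0.002 * sin(2 * angle)
)

course <- ggplot(track, aes(x = longitude, y = latitude)) +
  # A thick gray path stands in for the course itself
  geom_path(linewidth = 6, color = "gray40", lineend = "round") +
  # Solid circles (shape 19) that are narrower than the course
  geom_point(shape = 19, size = 2, color = "white")

course
```

As a static plot this draws every recorded position at once, so the white points cover the whole course. The sizes and colors are guesses that would need tuning against Eric’s video.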

The next part of the figure I noticed was the change in elevation across the course. Generating this plot is similar to the course. I’d draw the change in elevation based on all of Peyton’s elevations again using geom_path(), but this time with a thin white line. Then I’d plot each of his elevations on the line with geom_point().
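The elevation panel could be sketched the same way. Again, the data here are invented and the column names (distance, elevation) are assumptions. I’ve added a black panel background only so that the thin white line is visible.

```r
library(ggplot2)

# Made-up elevation profile; NOT the real data.
n <- 200
track <- data.frame(
  distance  = seq(0, 5000, length.out = n),  # meters run (assumed units)
  elevation = 60 + 8 * sin(seq(0, 4 * pi, length.out = n))
)

elevation_plot <- ggplot(track, aes(x = distance, y = elevation)) +
  # Thin white line for the full elevation profile
  geom_path(linewidth = 0.5, color = "white") +
  # One point per recorded elevation, drawn on top of the line
  geom_point(shape = 19, size = 2, color = "red") +
  theme(
    panel.background = element_rect(fill = "black"),
    panel.grid = element_blank()
  )

elevation_plot
```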

Next is the collection of text on the right side of the figure. It’s basically an animated legend. This could be generated using geom_text() to plot the text at specific x and y coordinates with the distance run by each percentile indicated. Because I don’t have the percentile data, I’ve been thinking about what else I could do with the data to replace the text. Perhaps I would make a speedometer type plot to indicate Peyton’s pace or heart rate. Or I could include a thermometer type of plot that shows the increase and decrease in his rates over time. What do you think? Email me back and share your ideas!

To build the dashboard, I would assemble the components using the {patchwork} package. I could easily put the elevation change under the course with the division operator (/). But what about the table of text under the course loop on the right side? I’m not really sure. One thought that occurs to me would be to plot that text using the latitude and longitude that corresponds to that area of the dashboard in my x and y aesthetics for geom_text().
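Here’s a minimal sketch of how that layout might come together. The track is made up, the column names are assumptions, and the text labels are placeholders rather than real race statistics. The label coordinates are simply values I picked to land below the right side of the fake loop.

```r
library(ggplot2)
library(patchwork)

# Made-up track; NOT the real data. Column names are assumptions.
n <- 200
angle <- seq(0, 2 * pi, length.out = n)
track <- data.frame(
  longitude = -122.60 + 0.004 * cos(angle),
  latitude  = 45.50 + 0.002 * sin(2 * angle),
  distance  = seq(0, 5000, length.out = n),
  elevation = 60 + 8 * sin(2 * angle)
)

# Placeholder text, positioned in the same longitude/latitude units as the course
stats <- data.frame(
  longitude = -122.599,
  latitude  = c(45.4975, 45.4970),
  label     = c("placeholder line 1", "placeholder line 2")
)

course <- ggplot(track, aes(x = longitude, y = latitude)) +
  geom_path(linewidth = 6, color = "gray40") +
  geom_text(data = stats, aes(label = label), hjust = 0, size = 3)

elevation_plot <- ggplot(track, aes(x = distance, y = elevation)) +
  geom_path(linewidth = 0.5)

# The division operator stacks the course on top of the elevation profile;
# heights gives the course three times the vertical space
dashboard <- course / elevation_plot + plot_layout(heights = c(3, 1))

dashboard
```

Putting the text inside the course panel keeps the layout to two panels. The other option would be a third plot that holds only the text, which {patchwork} could place beside the course.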

At this point, I would still have a static plot. It would also be pretty hideous looking since each individual point would be on top of the background lines. We only want one point per time step. As I mentioned earlier, I would do this with the {gganimate} package. This package works a lot like facet_wrap() where you make separate plots based on the values in one of your columns. The {gganimate} package turns those “facets” into a GIF or mp4 file. The package will also allow us to indicate the time somewhere on the plot like Eric has on his.
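Here’s a sketch of one way to get a single moving point, using transition_time() from {gganimate}. The data are made up and the column names are assumptions. As far as I know, {gganimate} keeps a layer static when its data don’t contain the transition variable, so I give geom_path() a copy of the data without the elapsed_time column. Treat that as a trick to verify rather than a guarantee.

```r
library(ggplot2)
library(gganimate)

# Made-up track; NOT the real data. Column names are assumptions.
n <- 50
angle <- seq(0, 2 * pi, length.out = n)
track <- data.frame(
  elapsed_time = seq(0, 980, length.out = n),  # seconds (assumed units)
  longitude    = -122.60 + 0.004 * cos(angle),
  latitude     = 45.50 + 0.002 * sin(2 * angle)
)

# Copy of the coordinates without the time column, for the background course
course_only <- track[, c("longitude", "latitude")]

anim <- ggplot(track, aes(x = longitude, y = latitude)) +
  # Background course: its data lack elapsed_time, so it should not animate
  geom_path(data = course_only, linewidth = 6, color = "gray40") +
  # The runner: one position per time step
  geom_point(shape = 19, size = 3, color = "white") +
  # frame_time is supplied by transition_time() for each frame
  labs(title = "Elapsed time: {round(frame_time)} s") +
  transition_time(elapsed_time)

# Rendering needs a renderer package such as {gifski} (GIF) or {av} (mp4),
# so it is left commented out here:
# animate(anim, nframes = 100, fps = 20)
```

I chose transition_time() because it exposes frame_time for the title. transition_reveal() would be another option if you wanted the point to leave a trail behind it.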

I realized as I was googling to see whether you can combine {patchwork} and {gganimate} that I already made a video that did this! Silly me. I had forgotten that I previously made an animation showing how a receiver operating characteristic (ROC) curve is made from biomarker data. Feel free to give that video a watch if you can’t wait for me to try to reproduce Eric’s video.

Of course, there are a number of other small elements in this plot that we could think about. Things like there not being any axes, the all black background, or the multi-line title in a small font. Hopefully, you’re getting more accustomed to using theme() to modify these various elements and can figure out how to match Eric’s style.
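As a starting point, here’s a sketch of a theme that covers those three elements. The title text is a placeholder, the track is made up, and the font size and colors are my guesses rather than values measured from Eric’s video.

```r
library(ggplot2)

# theme_void() drops the axes, grid, and labels; theme() then adds the
# black background and a small, white, multi-line title
dark_theme <- theme_void() +
  theme(
    plot.background = element_rect(fill = "black", color = NA),
    plot.title = element_text(color = "white", size = 9, lineheight = 1.1)
  )

# Made-up track; NOT the real data.
angle <- seq(0, 2 * pi, length.out = 200)
track <- data.frame(
  longitude = -122.60 + 0.004 * cos(angle),
  latitude  = 45.50 + 0.002 * sin(2 * angle)
)

styled <- ggplot(track, aes(x = longitude, y = latitude)) +
  geom_path(linewidth = 6, color = "gray40") +
  # "\n" splits the placeholder title across two lines
  labs(title = "Placeholder title line 1\nPlaceholder title line 2") +
  dark_theme

styled
```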

What do you think of Eric’s animation? I know that runners are collecting all sorts of data on themselves using GPS and heart monitors. Have any of you tried to visualize your own data? Let me know what you’ve done. I’d also love to hear if you’re visualizing other sports data.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.


In case you missed it…

Here are some videos that I published this week that relate to previous content from these newsletters. Enjoy!

[video previews]

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

