Hey folks,

If you missed it, on Wednesday I did a livestream where I made a stacked barplot and pronounced it good. No, I wasn’t drinking anything! But it’s a reminder to think about the question before finding the best data visualization strategy. I think this highlights the value of the constructive approach I’ve been trying to take to critiquing data visualizations. The first step is to establish the question. If you aren’t a “regular”, I think you’re really missing out by skipping the Monday critique videos. I’d love to do visualizations that are relevant to your work, so feel free to send me things that catch your eye - either good or bad - in your reading of the literature.

This week, I’m going to be critiquing, recreating, and refactoring panels c through j of Figure 1 from the paper “Rising atmospheric CO2 reduces nitrogen availability in boreal forests”, which was recently published in Nature. I’ll have more to say in the videos, but for now, I’d like to focus
on the statistical information in the upper right corner of each panel.
How did they generate that information? Many beginners (and more
advanced users too!) would have a single data frame that they filter to
the particular combination of variables they are analyzing. In this
case, the region and the tree species. Then they would run the code to
generate the statistical information eight times. That definitely works,
but it isn’t DRY (Don’t Repeat Yourself) and
leads to cumbersome code. I’d like to lead you through something you’ve
likely seen me do if you watch my videos and which has often left people
scratching their heads. It’s a powerful idiom. For discussion, assume
that we’re working with a data frame that we want to analyze separately
for each number of cylinders.
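Let’s think about the beginner approach. I’d filter the data frame to a
single number of cylinders and then run the statistical test on the
filtered data frame. That test can be written in more than one way.
Regardless, for most people that’s enough to generate the plot. They’d
generate two additional data frames for 6 and 8 cylinders and run the
same test on each of them.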
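
A minimal sketch of the beginner approach, under assumptions that are not from the analysis above: it uses R's built-in `mtcars` data frame and a correlation test between `mpg` and `wt` as stand-ins for the data and the statistical test.

```r
library(dplyr)

# Assumption: mtcars and a correlation of mpg vs. wt are illustrative stand-ins
four_cyl <- mtcars %>% filter(cyl == 4)

# One way to write the test: the formula interface
four_formula <- cor.test(~ mpg + wt, data = four_cyl)

# Another way to write the same test: pass the two vectors directly
four_vectors <- cor.test(four_cyl$mpg, four_cyl$wt)

# The beginner approach then repeats everything for the other groups
six_cyl <- mtcars %>% filter(cyl == 6)
six_formula <- cor.test(~ mpg + wt, data = six_cyl)

eight_cyl <- mtcars %>% filter(cyl == 8)
eight_formula <- cor.test(~ mpg + wt, data = eight_cyl)
```

Each result is a separate object with its own intermediate data frame, which is the repetition the rest of this walkthrough tries to remove.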
But let’s keep going to see if we can streamline the code some more.
First, we’ll join the two steps. We can pipe the output of the filtering
step directly into the statistical test, so the result comes out of a
single pipeline.
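
Joining the two steps might look like the sketch below. It again assumes `mtcars` and a correlation test as stand-ins, and it assumes `tidy()` from the {broom} package is an acceptable way to turn the test result into a one-row data frame.

```r
library(dplyr)
library(broom)

# Assumption: mtcars, mpg vs. wt, and broom::tidy() are illustrative choices
four_cyl_result <- mtcars %>%
  filter(cyl == 4) %>%                  # keep one number of cylinders
  cor.test(~ mpg + wt, data = .) %>%    # the dot stands for the filtered data
  tidy()                                # convert the test output to a tibble

four_cyl_result
```

The `.` placeholder is needed because `cor.test()` takes the formula as its first argument, so the piped data frame has to be routed to `data` explicitly.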
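Again, we could repeat this with the other numbers of cylinders. Or
we could use the confusing idiom. Instead of filtering to one number of
cylinders at a time, we keep all of the data and collapse it so that
each number of cylinders gets its own row.
This gets us a strange looking data frame with 3 rows and 2 columns.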
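
One way to get a data frame of that shape is `nest()` from {tidyr}. This is a sketch that assumes `mtcars` as the stand-in data. The column name `data` is just the name chosen here for the nested column.

```r
library(dplyr)
library(tidyr)

# Assumption: mtcars is an illustrative stand-in for the real data
nested <- mtcars %>%
  nest(data = -cyl)   # everything except cyl is packed into a list-column

nested
dim(nested)           # one row per number of cylinders; columns cyl and data
```

Each entry of the `data` column is itself a data frame holding the rows for that number of cylinders, which is why the printed result looks strange at first.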
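The tempting next step is to run the test directly on that data frame.
But the obvious ways of writing that give a variety of errors that are
frustrating. What we need to do is iterate over each value of the column
that holds the data and run the test on each one.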
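
The iteration can be sketched with `map()` from {purrr} inside `mutate()`. As before, `mtcars`, the correlation test, and `broom::tidy()` are assumptions for illustration, and `run_test()` is a hypothetical helper defined only for this example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical helper: run the stand-in test on one nested data frame
run_test <- function(df) {
  tidy(cor.test(~ mpg + wt, data = df))
}

with_tests <- mtcars %>%
  nest(data = -cyl) %>%
  # map() calls run_test() once per element of the data list-column
  mutate(test = map(data, run_test))

with_tests
```

Passing the whole `data` column to `cor.test()` without `map()` fails because the column is a list of data frames, not a single data frame.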
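Let me break down that step. We’ve added a column. Of course, that’s
what we told R to do. To remove the parts of the data frame that we no
longer need, we add one more cleanup step.
Finally, we get this beaut…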
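
Putting the pieces together, the whole idiom might look like this sketch. It assumes `mtcars`, a correlation test between `mpg` and `wt`, and `broom::tidy()` as stand-ins. `unnest()` and `select()` are one plausible way to do the cleanup, not necessarily the exact steps used for the figure.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumption: data, test, and column names are illustrative stand-ins
cyl_tests <- mtcars %>%
  nest(data = -cyl) %>%                                   # one row per cyl
  mutate(test = map(data,
                    function(df) tidy(cor.test(~ mpg + wt, data = df)))) %>%
  unnest(test) %>%                                        # spread results into columns
  select(-data)                                           # drop the nested data

cyl_tests
```

The result has one row per number of cylinders with the test statistics as ordinary columns, ready to be joined to a plot as annotation text.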
We could have repeated the same 3 or 4 lines for each number of
cylinders, but the idiom gets us there without repeating ourselves.
Got it?! Let me know if this did or didn’t make sense. Feel free to ask any questions that might help you understand this better. I suspect that if you can figure out this powerful tidyverse idiom you’ll be among about 5% of R users. I think it’s worth figuring it out to unlock the door to not only tidy output, but tidy code as well! I would love your feedback on this type of newsletter content. Do you like seeing code in the newsletter or do you prefer the higher level discussion I often provide?