Hey folks,

If you missed it, on Wednesday I did a livestream where I made a stacked barplot and pronounced it good. No, I wasn’t drinking anything! But it’s a reminder to think about the question before finding the best data visualization strategy. I think this highlights the value of the constructive approach I’ve been trying to take to critiquing data visualizations. The first step is always to establish the question. If you aren’t a “regular”, I think you’re really missing out by skipping the Monday critique videos. I’d love to do visualizations that are relevant to your work, so feel free to send me things that catch your eye - either good or bad - in your reading of the literature.

This week, I’m going to be critiquing, recreating, and refactoring panels c through j of Figure 1 from the paper “Rising atmospheric CO2 reduces nitrogen availability in boreal forests”, which was recently published in Nature. I’ll have more to say in the videos, but for now, I’d like to focus
on the statistical information in the upper right corner of each panel.
How did they generate that information? Many beginners (and more
advanced users too!) would have a single data frame that they filter to
the particular combination of variables they are analyzing: in this
case, the region and the tree species. Then they would run the code to
generate the statistical information eight times, once per panel. That definitely works,
but it isn’t DRY (it breaks the “Don’t Repeat Yourself” principle) and
leads to cumbersome code. I’d like to lead you through something you’ve
likely seen me do if you watch my videos and which has often left people
scratching their heads. It’s a powerful idiom. For discussion, assume that we’re working with the
Let’s think about the beginner approach. I’d filter
The data frame
or it could be written like this
Regardless,
For most people that’s enough to generate the plot. They’d generate
two additional data frames for 6 and 8 cylinders and run
But let’s keep going to see if we can streamline the code some more.
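As a baseline for that streamlining, here is a minimal sketch of the two-step, filter-then-test version. Everything specific in it is an assumption on my part: the built-in `mtcars` data frame stands in for the data, `mpg` and `wt` stand in for the variables of interest, and `cor.test()` stands in for whatever statistical test you actually need.

```r
library(dplyr)

# Assumption: mtcars stands in for the data frame discussed in the text
# Step 1: keep only the 4-cylinder cars
four_cyl <- mtcars %>%
  filter(cyl == 4)

# Step 2: run the test on the filtered data
# Assumption: a Pearson correlation between mpg and wt is the statistic we want
four_cyl_test <- cor.test(four_cyl$mpg, four_cyl$wt)
four_cyl_test
```

Repeating this for 6 and 8 cylinders means copying both steps twice more and changing one number each time, which is exactly the repetition worth removing.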
First, we’ll join the two steps. We can pipe the output of
With the
This gives us the following output:
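To make the piped version concrete, here is a hedged sketch under the same stand-in assumptions (`mtcars`, `mpg`, `wt`, and `cor.test()` are illustrative choices, not necessarily the ones used for the figure). It also assumes `tidy()` from the broom package to turn the test result into a one-row data frame.

```r
library(dplyr)
library(broom)

# Assumption: mtcars, mpg, wt, and cor.test() are illustrative stand-ins
four_cyl_stats <- mtcars %>%
  filter(cyl == 4) %>%
  # The `.` placeholder sends the filtered data to the `data` argument
  cor.test(~ mpg + wt, data = .) %>%
  # tidy() converts the htest object into a one-row tibble
  tidy()

four_cyl_stats
```

The result has one row with columns such as `estimate`, `statistic`, and `p.value`, which is much easier to plot or join than the printed test summary.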
Again, we could repeat this with the other numbers of cylinders. Or
we could use the confusing idiom. Instead of using
This gets us a strange looking data frame with 3 rows and 2 columns.
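Here is a sketch of how a data frame with that shape can arise, assuming the `mtcars` data and `nest()` from tidyr (both are my assumptions for illustration). Nesting everything except `cyl` leaves one row per cylinder count, plus a list column holding the rest of the data for that group.

```r
library(dplyr)
library(tidyr)

# Assumption: mtcars stands in for the data, and cyl is the grouping variable
nested <- mtcars %>%
  # Pack every column except cyl into a list column called `data`
  nest(data = -cyl)

nested

# Each element of the list column is itself a data frame
nested$data[[1]]
```

Each cell in the `data` column is a complete data frame for one value of `cyl`, which is why it looks strange when printed.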
The
or
But those give a variety of errors that are frustrating. What we need
to do is iterate over each value of
Let me break down that
We’ve added a column. Of course, that’s what
To remove the
Finally, we get this beaut…
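Putting the pieces together, a minimal end-to-end sketch of the nest, map, and unnest pattern might look like the following. As before, `mtcars`, `mpg`, `wt`, and `cor.test()` are assumptions chosen for illustration, and the column names `test` and `tidied` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumption: mtcars, mpg, wt, and cor.test() are illustrative stand-ins
cyl_stats <- mtcars %>%
  # One row per value of cyl, with the remaining columns in `data`
  nest(data = -cyl) %>%
  mutate(
    # Run the test once per nested data frame
    test = map(data, function(df) cor.test(~ mpg + wt, data = df)),
    # Convert each test result into a one-row tibble
    tidied = map(test, tidy)
  ) %>%
  # Drop the list columns we no longer need
  select(cyl, tidied) %>%
  # Expand the tidied results into ordinary columns
  unnest(tidied)

cyl_stats
```

The output has one row per value of `cyl`, and adding another group to the data would require no changes to the code.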
We could have repeated the same 3 or 4 lines using
Got it?! Let me know if this did or didn’t make sense. Feel free to ask any questions that might help you understand this better. I suspect that if you can figure out this powerful tidyverse idiom you’ll be among about 5% of R users. I think it’s worth figuring it out to unlock the door to not only tidy output, but tidy code as well! I would love your feedback on this type of newsletter content. Do you like seeing code in the newsletter or do you prefer the higher level discussion I often provide?