Hey folks, One of the benefits of sending out these newsletters and making my YouTube videos is that I get a ton of practice. I can’t emphasize how much practice has paid off in learning to use dplyr, ggplot2, and other packages. Reproducing published figures has really helped me to dive into parts of ggplot2 that I wouldn’t normally use because I make plots that use the features of ggplot2 that I know. By expanding my knowledge of ggplot2, I’m finding that the plots I make from scratch are more varied and sophisticated than they would normally be. I hope I’m not bragging here - my point is that there’s no reason you couldn’t be doing the same thing. Grab your favorite journal, find a figure, think through what’s standard, what’s novel, and how you would approach the figure. Then go through the process of creating it yourself. Finally, change what you don’t like in a new version of the plot. I’d love it if you shared with me any of your (re)creations! This week, a figure by Alessandro Lo Sciuto and colleagues caught my eye in their paper, “A molecular comparison of [Fe-S] cluster-based homeostasis in Escherichia coli and Pseudomonas aeruginosa”. The paper was published in mBio. The paper looks at the iron limitation physiology of E. coli and P. aeruginosa and how they use iron-sulfur clusters. The figure that I was interested in was Figure 2C This set of figures describes the sensitivity of P. aeruginosa to stress, specifically antibiotic stress. The wild type strain (PAO1) was sensitive to these drugs. They created a mutant that had the iscU gene deleted but added back with an arabinose-inducible promoter. When there was no arabinose in the media (ΔiscU ParaiscU) the strain was also sensitive. But when they added arabinose to the media, the strain was resistant (ΔiscU ParaiscU+ (+0.5% ARA)). That’s enough biology :) What type of figure is this? What do you think is interesting about how it was created? Is there any overlap with what these authors did with what you try to do with your figures? How much of this figure do you think you could figure out on your own? What would you be interested in learning to do? For discussion, let’s assume the data comes as a data frame with a column for the name of the First off, these four plots are all line plots with time in hours across the x-axis and cell density (CFU/mL) on the y-axis. They tested four antibiotics including gentamicin, ofloxacin, meropenem, and colistin. The legend tells us that the experiments were done in triplicate and that the mean is presented with error bars representing the standard deviation. For each plot in this panel, I’d create the general appearance using Second, I notice that the y-axis is on a log-10 scale. We can get a log scale by using Third, normally I would draw a line to indicate where the limit of detection was using Fourth, sticking with that y-axis, I noticed that the y-axis labels are 5 times ten to a power (e.g., 5x106) rather than the typical 1 times ten to a power (e.g., 106). They also have minor tick marks. The minor tick marks are a common feature in log scaled axes, but I realized I rarely see these in R plots. They certainly haven’t ever been on any of my plots. We can adjust both of these in Finally, they appear to have generated these plots as four separate plots. This gets us redundant x and y-axis titles and text. Regardless, we can use our friend the patchwork package to assemble the four plots into a single figure. It is interesting that they have a common legend for the four plots at the bottom of the figure. I know there’s a vignette on the patchwork website showing how to share legends across figures. I’d check that out to duplicate their legend. Of course, there’s a number of small details that I’m skipping over here. Things like how they have a greek letter, subscripts, italics in their strain names or how they title of each plot is set off to the right side of the plot. All of these things can be modified adjusting the
|
Hey folks! I’d love to have you join me in September for a new approach to teaching workshops that I will be rolling out. For five weeks I’ll be working with two cohorts of you all to improve our data visualization skills. Each week we’ll meet for a two-hour session. These sessions will include instruction on principles and concepts in data visualization and an opportunity to apply this information to visualizations we find in the wild or that you bring to the group. By not talking about...
Hey folks, Are you interested in uping your data visualisation skills? I’m rolling out a new program to help you improve the design of your data visualizations. This program will last 5 weeks starting at the beginning of September. Each session will be two hours long and include a discussion of data visualization principles followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in this program so you can focus more on concepts than implementation....
Hey folks, I’m really excited to announce a new program to help you improve the design of your data visualizations. I emailed you about this earlier in the week, so I’ll keep this reminder brief. This data visualization makeover program will last 5 weeks starting at the beginning of September. Each two-hour session will include a discussion of data visualization principles and strategies followed by an opportunity to apply these ideas to your own visualizations. There will be no coding in...