Hey folks!

The summer is nearly over - where did it go?! Many of us are getting ready to send our kids off to school and start a new academic year. If you’re subscribed to this newsletter, I suspect you are interested in improving your data visualization skills. You can certainly continue to receive this newsletter and watch my weekly livestreams on YouTube for free to help build those skills. If you want a more concentrated or personalized opportunity to develop your data visualization chops, I want to remind you of a few opportunities. First, starting in September I am going to be teaching a 5-part workshop that meets weekly to discuss and apply concepts of data visualization. Second, I have pre-recorded workshops teaching the fundamentals of the tidyverse using microbiome data and data of interest to a more general audience. Finally, I would love to work one-on-one with you or your research team to develop custom learning solutions. If any of these opportunities interest you, please click on the links above or reply to this email and let’s start talking.

This week reader and livestream viewer Mike Parrott from the UK forwarded a plot to me from the Pew Research Center. The plot was part of Pew’s overall effort to look at US media consumption by sex, age, race, politics, and education. Mike was happy to see that The Guardian and BBC News are relatively popular among college-educated people living in the US.

This plot reports results from a survey of 9,482 US adults that Pew conducted back in March 2025. The survey asked respondents where they get their news and what their level of education is.

One of my first questions when trying to recreate a plot is whether I can get the data from somewhere that will allow me to bring it into R easily. Yes, all of the names and numbers are in the plot, but manually typing them would be a pain. I did find the data, but sadly, the data are embedded in a PDF. Why do people do this?
It seems they want to be perceived as being transparent without actually having to be transparent. Someone on a recent livestream mentioned that there are R packages to extract tables from PDFs. I forget which package they mentioned. A quick Google search found a few options. First is […]

Back to the plot… Clearly this is a bar plot with the axes switched from what we traditionally see. This is helpful because it lets us read the names of the news outlets more easily than if they were along the x-axis and rotated to keep them from overlapping. I would use […]

The next notable element of the plot is the percentages of college graduates. I’d use […]

Let’s think about the text elements for a moment. There are two bits of text that help orient the reader to the plot. The first is the “62% of people regularly…” blurb that helps us interpret the first bar. I think that’s pretty helpful. There’s a downward pointing triangle there to connect the text to the bar for “The Atlantic”. I’d probably put the text and the triangle in with […]

Thinking about that second blurb, we see that the authors put a pink point at 36% on the x-axis for each media outlet. We could place that with […]

Now that I’ve started thinking about things I would change, let’s think more about the data being displayed. The story makes a point that the visual is basically flipped for people with a high school diploma or less. For example, “Univision” and “Telemundo” are most popular among these folks and “The Atlantic” is not popular with them. I could imagine changing the plot to be a dot plot instead of a bar plot. For each media outlet, I’d place a different colored point for each of the three education categories across the x-axis. I’d also like to add a vertical line for each education category showing its share of all US adults, with the line’s color matching the color of the corresponding points. Maybe that would be too busy? If so, we could drop the “some college” population to focus on the extremes. What do you think?
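To make the dot plot idea a bit more concrete, here is a minimal sketch in R, assuming ggplot2 is installed. It is one possible way to draw it, not a finished recreation. The numbers are placeholders: only the 62% for college graduates who regularly get news from The Atlantic and the 36% of US adults who are college graduates come from the plot described above. Every other value, and the object names `news`, `overall`, and `dot_plot`, are made up for illustration and are not Pew's data.

```r
library(ggplot2)

edu_levels <- c("High school or less", "Some college", "College graduate")

# PLACEHOLDER data: only The Atlantic's 62% for college graduates comes from
# the original plot; every other percentage here is invented for illustration.
news <- data.frame(
  outlet = rep(c("The Atlantic", "BBC News", "Univision"), each = 3),
  education = factor(rep(edu_levels, times = 3), levels = edu_levels),
  percent = c(12, 26, 62,
              25, 30, 45,
              55, 30, 15)
)

# Share of all US adults in each education category. The 36% for college
# graduates is from the original plot; the other two values are placeholders.
overall <- data.frame(
  education = factor(edu_levels, levels = edu_levels),
  percent = c(35, 29, 36)
)

# Order outlets by their college graduate share so that the outlet with the
# highest share ends up on the top row of the plot
grad <- news[news$education == "College graduate", ]
news$outlet <- factor(news$outlet,
                      levels = as.character(grad$outlet[order(grad$percent)]))

dot_plot <- ggplot(news, aes(x = percent, y = outlet, color = education)) +
  # one dashed reference line per education category, colored to match its points
  geom_vline(data = overall,
             aes(xintercept = percent, color = education),
             linetype = "dashed", show.legend = FALSE) +
  # one point per outlet and education category
  geom_point(size = 3) +
  scale_x_continuous(limits = c(0, 100),
                     labels = function(x) paste0(x, "%")) +
  labs(x = "Share of each outlet's regular audience", y = NULL, color = NULL) +
  theme_minimal()
```

Printing `dot_plot` at the console (or passing it to `ggsave()`) draws the figure. To focus on the extremes, filtering the "Some college" rows out of both `news` and `overall` before plotting would leave two points and two reference lines per outlet.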