Creating a dendrogram that spans multiple facets


Hey folks,

I appreciated the emails I received after last week’s newsletter. Even if you didn’t agree with what I had to say, I hope it was thought-provoking. Regardless of how a plot is made - R, Prism, Excel (gasp!), or AI (oh my!) - we need to train our eyes and our sense of taste to make the most compelling visualization of our data. If you’d like to work with me on an individual or group level to reach this goal, let me know. I am offering consultation sessions focused on improving your data visualizations. To learn more about what I can offer, please sign up for a free 30-minute exploratory meeting.


Several weeks ago, I recreated a relatively complicated figure that included four heatmaps and a dendrogram using the {patchwork} and {ggdendro} packages. This week I want to try something related. Check out this set of panels from a paper recently published in Nature Cancer titled “Reprogramming of stroma-derived chemokine networks drives the loss of tissue organization in nodal B cell lymphoma”.

This figure was created from single-cell sequencing data. I don’t know much about how these data are generated or the ins and outs of how they’re analyzed. Regardless, several things about this visualization caught my eye. First, there are three plots (I’m not sure why they’re grouped into 2 lettered panels rather than 1 or 3). That makes this a chance for more practice with {patchwork}, and I like to practice new things! Second, the legend in the horizontal stacked bar plot in panel f likely uses some of the concepts I discussed in last Monday’s how-to video on using multiple legend keys for the same category. More practice! Third, there are two dendrograms, one of which appears to span 5 facets. Oooohhhh… something new to try!

How would we make a faceted dendrogram? Yes, the {ComplexHeatmap} package appears to do exactly this, and that seems to be how the authors of this paper generated the image. But how could I create this effect with tools I already know?

You might recall from the last dendrogram I created that we used the dendro_data() function from the {ggdendro} package to extract the dendrogram data from the output of hclust(), the function we used to cluster the data. From the output of dendro_data(), we used the segment() function from {ggdendro} to obtain a data frame containing the starting and ending x and y coordinates for each segment in the dendrogram. Previously, we had to swap the x and y coordinates to make a vertical dendrogram. In this case, we want a horizontal dendrogram, so we can use the coordinates as they are. Finally, we used geom_segment() from {ggplot2} to draw the dendrogram.
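Here is a minimal sketch of that workflow, assuming {ggplot2} and {ggdendro} are installed. The built-in mtcars data set stands in for real data, and the object names (car_clusters, dendro_segments, dendro_plot) are hypothetical; this is not the code behind the published figure.

```r
library(ggplot2)
library(ggdendro)

# Stand-in data (an assumption): cluster the 32 cars in mtcars
# on their scaled measurements
car_matrix <- scale(mtcars)
car_clusters <- hclust(dist(car_matrix))

# Extract the dendrogram coordinates from the hclust object
dendro <- dendro_data(car_clusters, type = "rectangle")

# One row per line segment, with columns x, y, xend and yend
dendro_segments <- segment(dendro)

# Leaves sit along the x-axis and the root sits at the top, so the
# coordinates can be used as they are for a horizontal dendrogram
dendro_plot <- ggplot(dendro_segments) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  theme_void()
```

Printing dendro_plot draws the tree with one leaf per car along the x-axis.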

Here’s my idea: let’s manipulate the x and xend position values to add the gaps. Looking at panel e, there are gaps between the 5th and 6th positions and between the 6th and 7th positions on the x-axis (there are other gaps too, but let’s focus on these for this discussion). I can imagine using a case_when() statement that adds some increment - maybe 0.5 - to any x or xend value that is at least 6 but less than 7. We’ll add twice that increment to create the second gap, so that anything with an x coordinate of at least 7 but less than 11 will have 1 added to it. We can add 1.5 to create the third gap for positions from 11 up to (but not including) 18, and 2 to anything at 18 or greater to create the fourth gap. The y-axis positions for the faceted dendrogram aren’t affected by the facets, so they’ll stay the same.
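The shifting rule above can be sketched in base R. This version swaps in findInterval() for case_when() so it runs without {dplyr}; the helper name add_facet_gaps() and the made-up segment coordinates are hypothetical, and the facet starts (6, 7, 11, 18) and the 0.5 gap are taken from the example above.

```r
# Hypothetical helper: shift x positions to open gaps between "facets".
# starts = first x position of each new facet; gap = width of each gap.
add_facet_gaps <- function(x, starts = c(6, 7, 11, 18), gap = 0.5) {
  # findInterval() counts how many facet starts are <= each x, so
  # [6, 7) gets 1 gap, [7, 11) gets 2, [11, 18) gets 3, and >= 18 gets 4.
  # With {dplyr}, the same rule as case_when() would be:
  # case_when(x >= 18 ~ x + 2, x >= 11 ~ x + 1.5,
  #           x >= 7 ~ x + 1, x >= 6 ~ x + 0.5, TRUE ~ x)
  x + gap * findInterval(x, starts)
}

# A few dendrogram-style segments (made-up coordinates)
segments_df <- data.frame(
  x    = c(5, 5.5, 6.5, 10, 17.5),
  xend = c(6, 5.5, 7,   11, 18),
  y    = c(1, 1,   2,   3,  4),
  yend = c(1, 0,   2,   3,  4)
)

# Shift both ends of every segment; y and yend are left alone
segments_df$x    <- add_facet_gaps(segments_df$x)
segments_df$xend <- add_facet_gaps(segments_df$xend)
```

Because every endpoint with the same original x gets the same shift, branches that met before still meet afterward. One caveat: an internal node that straddles a gap (say, one joining leaves 6 and 7 at x = 6.5) is shifted by the rule for the interval it falls in, so it ends up slightly off-center between its shifted children. With a gap of 0.5 that is hard to notice.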

What about the heatmap? Same idea. Remember that we create a heatmap by mapping one variable to the x-axis and another to the y-axis. Here, we can create a variable to map to the x-axis that uses the same spacing we used in the dendrogram. Then we can use the breaks and labels arguments of scale_x_continuous() to place the text labels at those x-axis positions. When we combine the plots with {patchwork}, the leaves of the dendrogram will line up with the columns of the heatmap, as long as the columns follow the leaf order of the dendrogram and both plots use the same x-axis limits. Cool, eh?
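A sketch of the heatmap side, again assuming {ggplot2} and mtcars as stand-in data. It redefines the hypothetical add_facet_gaps() helper so the block runs on its own. The key step is ordering the columns by the dendrogram’s leaf order (car_clusters$order) before adding the gaps, so that column k sits under leaf k.

```r
library(ggplot2)

# Hypothetical helper, same rule as for the dendrogram segments
add_facet_gaps <- function(x, starts = c(6, 7, 11, 18), gap = 0.5) {
  x + gap * findInterval(x, starts)
}

# Stand-in data (an assumption): scaled mtcars, cars become columns
car_matrix <- scale(mtcars)
car_clusters <- hclust(dist(car_matrix))

# Car names in the left-to-right order of the dendrogram leaves
car_order <- car_clusters$labels[car_clusters$order]

# Long format: one row per car/variable cell (as.vector is column-major)
heat_df <- data.frame(
  car      = rep(rownames(car_matrix), times = ncol(car_matrix)),
  variable = rep(colnames(car_matrix), each = nrow(car_matrix)),
  value    = as.vector(car_matrix)
)

# Leaf rank (1, 2, ...) plus the same gaps used for the dendrogram
heat_df$x_pos <- add_facet_gaps(match(heat_df$car, car_order))

# One labelled break per car, and limits that include half a tile
# on either end
car_breaks <- add_facet_gaps(seq_along(car_order))
x_limits <- c(0.5, max(car_breaks) + 0.5)

heatmap_plot <- ggplot(heat_df, aes(x = x_pos, y = variable, fill = value)) +
  geom_tile() +
  scale_x_continuous(breaks = car_breaks, labels = car_order,
                     limits = x_limits, expand = c(0, 0)) +
  labs(x = NULL, y = NULL) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
```

To stack this under the dendrogram with {patchwork}, something like dendro_plot / heatmap_plot should work, provided the dendrogram’s segments were shifted with the same helper and its x scale uses the same x_limits with expand = c(0, 0).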

The key to all of this for me was remembering that we’re mapping data to the x- and y-axes. By modifying the x-axis positions, we can create horizontal facets. Similarly, modifying the y-axis positions would create vertical facets.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each one, you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
