How do you represent a limit of detection?


Hey folks!

I’m hoping to host two workshops in March and April. The first would be a Zoom-based workshop on the principles of data visualization (I taught a version of this last month). It would be code-free and run for about 3 hours. I don’t have a date yet, so if you are interested, please reply to this email and let me know which dates and times in March would work best for you. The second would be an in-person, 3-day workshop taught near the Detroit airport. I know it seems bonkers to meet in person, but this is how I got started teaching, and frankly, I miss these interactions. If you are interested in attending a live, in-person workshop learning R’s tidyverse applied to a variety of interesting datasets (an in-person version of this workshop), reply to this email and let me know when would work best for you. I need at least 15 people to register to make it work.


This week I found a paper recently published in Nature Microbiology, “Initial sites of SIV rebound after antiretroviral treatment cessation in rhesus macaques”. Don’t worry if you aren’t sure what most of those words mean. I often get confused about how authors represent a limit of detection. If there is a line indicating the limit of detection and points are sitting on top of the line, are they below or above the limit of detection? I prefer to remove any doubt and put the points below the line. I found a panel in this paper that showed the limit of detection in an interesting manner.

See what they did there? They actually have two plotting windows. The one on the top has a y-axis on a log scale going from 10⁻¹ to 10⁵. The one on the bottom has a label indicating “Below threshold”. I thought this was a unique strategy for indicating the number of observations that had densities below the limit of detection.

This approach is different from what I typically do, and so it caught my eye. My usual approach is to add a thin, horizontal, gray line (perhaps dashed) just above the limit of detection on the y-axis. Then any points below the line would be “below the limit of detection”. Often this is done as it is here, with a log-scaled y-axis. To pull off my approach, I have to add a small value to all the values so that I don’t have any zeroes (you can’t take the log of zero). I generally pick that value so that it has good spacing relative to the other data. We can get away with this because, in this case, adding 10⁻² to all the values wouldn’t meaningfully change the position of the other data points on the y-axis. If this worries you, you could instead add the value only to those observations that were below the limit of detection.
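As a sketch, my usual approach might look like this. The data frame, its column names, and the limit of detection of 1 are all hypothetical, made up for illustration:

```r
# A minimal sketch of the thin gray LOD line approach.
# The data frame, column names, and lod value are hypothetical.
library(ggplot2)

lod <- 1
pseudo <- 0.01 # i.e., 10^-2, small relative to the rest of the data

df <- data.frame(
  sample = c("a", "b", "c", "d", "e", "f"),
  density = c(0, 0, 12, 450, 3.2, 0)
)

# add a small value to everything so log10() is defined for the zeroes
df$density <- df$density + pseudo

p <- ggplot(df, aes(x = sample, y = density)) +
  geom_point() +
  geom_hline(yintercept = lod, linewidth = 0.25,
             linetype = "dashed", color = "gray") +
  scale_y_log10()
```

With the pseudocount added, the former zeroes land well below the gray line, and there is no ambiguity about which points are below the limit of detection.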

Take a moment and think about how these authors could have created these figures in R. What did you come up with? Facets? Me too! I’d likely create a variable - perhaps below_lod - that is TRUE if the value is below the limit of detection and FALSE if it’s above the limit. Then I’d facet on that variable, creating one column and two rows. We could play with the scales and space arguments to get the bottom panel to be shorter than the top.
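A sketch of that faceting idea, again assuming hypothetical day and density columns and a limit of detection of 1:

```r
# A sketch of the faceted approach. The data frame, column names, and
# lod value are hypothetical. Because FALSE sorts before TRUE, the
# above-LOD panel lands on top and the below-LOD panel on the bottom.
library(ggplot2)

lod <- 1

df <- data.frame(
  day = c(0, 7, 14, 21, 28, 35),
  density = c(2500, 320, 45, 0, 1.5, 0)
)

df$below_lod <- df$density < lod

p <- ggplot(df, aes(x = day, y = density)) +
  geom_point() +
  facet_grid(rows = vars(below_lod),
             scales = "free_y", space = "free_y")
```

With scales = "free_y" each panel gets its own y-axis range, and space = "free_y" lets the panel heights differ, so the “Below threshold” panel can be made much shorter than the main one.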

Using scale_y_continuous() or scale_y_log10(), we’d want to change the labels on the y-axis to be the powers of ten or the “Below threshold” label. I’d use the continuous function if I was plotting the log10 of the densities and the log10 function if I was using the raw density values. In this case, I could see transforming the data initially and using the continuous function. That’s because I would like to add some vertical jitter to the bottom panel, and I think that would be easier to do with scaled values. Because I’d relabel the y-axis values, I’d set the zero values at something outside the range of the y-axis. Perhaps I’d use -2. For the zero values, I’d use runif() to randomly draw values between -2.2 and -1.8 and use those in place of the zeroes. I could set the vertical jitter value in geom_jitter(), but that would also add a vertical jitter to the data above the limit of detection, and I don’t want to do that.
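A sketch of that relabeling and manual jitter, with the same hypothetical columns as before. The densities are log10-transformed up front, the zeroes are replaced with runif() draws between -2.2 and -1.8, and the y-axis label at -2 becomes “Below threshold”:

```r
# A sketch of manual jitter for below-LOD points. The data frame and
# column names are hypothetical; set.seed() makes the jitter repeatable.
library(ggplot2)

set.seed(19760620)

df <- data.frame(
  day = c(0, 7, 14, 21, 28, 35),
  density = c(2500, 320, 45, 0, 1.5, 0)
)

# work in log10 space; zeroes become -Inf and are then replaced
df$log_density <- log10(df$density)

zeroes <- df$density == 0
df$log_density[zeroes] <- runif(sum(zeroes), min = -2.2, max = -1.8)

p <- ggplot(df, aes(x = day, y = log_density)) +
  geom_point() +
  scale_y_continuous(
    breaks = -2:4,
    labels = expression("Below threshold",
                        10^-1, 10^0, 10^1, 10^2, 10^3, 10^4)
  )
```

Because only the zero values get the runif() treatment, the points above the limit of detection keep their true positions, which is exactly what geom_jitter() on the whole data set would not give us.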

Now that I think about it, I rarely add vertical jitter to my points below the limit of detection. Even if I used my approach rather than the faceted version, I could still add a vertical jitter to those points below the limit of detection. I’d use a similar approach, albeit in log10 space.

Which of these two approaches to indicating a limit of detection do you prefer? Reply to this email and let me know! Also, don’t forget to let me know if you’re interested in one of the upcoming workshops.

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!


Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development
