Visualizing how Americans feel about different card games

Hey folks,

Are you looking for more personalized support and coaching to help you develop your data analysis skills? Are you looking for help in leading a data science team where your folks aren’t super proficient in analyzing data? Let me know what you’re looking for and we can discuss how I might be able to help you. Unfortunately, this wouldn’t be a free service. But, I’m confident I can help you get over the challenges that are keeping you from creating data analyses and visualizations that you are proud of. Let me know by replying to this email.

I really hate stacked bar plots. Unfortunately, one of my most popular videos is how to make a stacked bar plot! I even tell people that there are better ways of representing data than with a stacked bar plot. Oh well. Today, I want to share a stacked bar plot that I think would be fun to recreate and think about how we could make it better. This visualization was published online two years ago and comes to us from YouGov.

This is a horizontal stacked bar plot showing whether people love, like, dislike, hate, or don’t know how they feel about each of 30 card games. It also has text annotations to indicate the size of each of the bars.

If you want the data, you can copy and paste it from a PDF with their data. Incidentally, embedding data in a PDF is a sure sign to me that people don’t want you to actually use the data for secondary purposes. Thankfully, this is a nice PDF that we can copy and paste from, and with some regular expressions in RStudio, we can convert the text to a tibble. The data will come in wide format with the different sentiment types across the columns, the games in the rows, and the level of sentiment for each game in the cells. We can tidy the data using pivot_longer().
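Here is a minimal sketch of that copy, paste, and tidy step. It assumes each pasted line looks like a game name followed by five percentages; the real PDF layout may differ, so the regular expression would need adjusting. The three lines of text and their numbers are invented placeholders, not YouGov's values, and the column names in `sentiments` are my own labels.

```r
library(tidyverse)

# Placeholder lines standing in for text pasted from the PDF.
# These numbers are invented for illustration; they are NOT YouGov's values.
pasted <- c(
  "Solitaire 30% 45% 10% 5% 10%",
  "Spit or Speed 10% 25% 15% 3% 47%",
  "Bridge 8% 20% 20% 10% 42%"
)

# Assumed column order in the pasted text
sentiments <- c("love", "like", "dislike", "hate", "not_sure")

wide <- tibble(raw = pasted) %>%
  mutate(
    # Everything before the trailing run of percentages is the game name
    game = str_remove(raw, "(\\s+\\d+%)+$"),
    # Keep only the trailing percentages, drop the % signs, tidy the spacing
    values = str_extract(raw, "(\\s+\\d+%)+$") %>%
      str_remove_all("%") %>%
      str_squish()
  ) %>%
  # One column per sentiment type; convert = TRUE turns the text into integers
  separate(values, into = sentiments, sep = " ", convert = TRUE) %>%
  select(-raw)

# Wide to long: one row per game and sentiment type
long <- wide %>%
  pivot_longer(-game, names_to = "sentiment", values_to = "percent")
```

The trailing `$` anchor in the pattern is what lets multi-word names like Spit or Speed survive intact, since only the run of percentages at the end of the line is split off.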

By default, geom_col() creates a stacked bar plot. Typically, we see these as vertical bars, but this is obviously horizontal. To get a horizontal plot, we can map the level of sentiment to the x-axis, the game to the y-axis, and the sentiment type to the fill color. Most likely the bars will be ordered dislike, hate, like, love, not sure because that is their alphabetical ordering. We can convert the sentiment type into a factor and give our desired ordering. It’s also likely that the games will come out in alphabetical order starting with Blackjack at the bottom going up to War at the top. Again, we can use factors to set the order. In this plot, the games are ordered by the percent of people who love a game. We can use fct_reorder() from the {forcats} package which is bundled with {tidyverse}. This function will allow us to set the levels based on a numeric column. We’ll likely want to make the game column a factor before we go from wide to long format with pivot_longer().
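A rough sketch of the ordering steps might look like the following. The percentages are invented placeholders, not YouGov's values. I'm also assuming that love should sit at the left end of each bar; because position_stack() stacks groups in reverse order by default, the sketch uses `reverse = TRUE` so the bars follow the factor levels from left to right.

```r
library(tidyverse)

# Invented placeholder percentages for illustration; NOT YouGov's values
wide <- tribble(
  ~game,           ~love, ~like, ~dislike, ~hate, ~not_sure,
  "Bridge",            8,    20,       20,    10,        42,
  "Solitaire",        30,    45,       10,     5,        10,
  "Spit or Speed",    10,    25,       15,     3,        47
)

long <- wide %>%
  # Set the game order while the love column still exists (wide format)
  mutate(game = fct_reorder(game, love)) %>%
  pivot_longer(-game, names_to = "sentiment", values_to = "percent") %>%
  # Replace the alphabetical sentiment order with the desired one
  mutate(sentiment = factor(
    sentiment,
    levels = c("love", "like", "dislike", "hate", "not_sure")
  ))

# Sentiment level on x and game on y gives horizontal stacked bars
p <- ggplot(long, aes(x = percent, y = game, fill = sentiment)) +
  geom_col(position = position_stack(reverse = TRUE))
```

Since the first factor level is drawn at the bottom of the y-axis, ordering by increasing love puts the most-loved game at the top.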

We’d also like to add the level of sentiment for each game to each of the bars. Well, except for those bars with less than 4% support. I’d start by making a label column that is the same as the level of sentiment, except for those situations with less than 4% (e.g. people who hate Spit or Speed). For those situations, I’d replace the number with “” so the value isn’t displayed. I’d use geom_text() to add the values to the bars. This is likely to get tricky since it may position each label at the raw value of its bar rather than inside the stacked segment. We can fix this by using the default position function used by geom_col(). This is position_stack(). Adding that function to geom_text() should get the number into the correct bars. Again, we may need to play with the ordering of things. We’ll also need to add some adjustment factors to get the number aligned to the left side of each bar. This might be the tricky part of the figure. Another tricky thing is that the color of the text needs to vary by the sentiment type. The darker fill colors have white text and the lighter fill colors have black text.
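One way the labeling could be sketched is below. The percentages are invented placeholders, not YouGov's values, and which fills count as "dark" (love and hate here) is purely my assumption. The leading space in each label is a simple stand-in for the adjustment factor that nudges the number off the left edge of its bar.

```r
library(tidyverse)

# Invented placeholder percentages for illustration; NOT YouGov's values
long <- tribble(
  ~game,           ~love, ~like, ~dislike, ~hate, ~not_sure,
  "Bridge",            8,    20,       20,    10,        42,
  "Solitaire",        30,    45,       10,     5,        10,
  "Spit or Speed",    10,    25,       15,     3,        47
) %>%
  mutate(game = fct_reorder(game, love)) %>%
  pivot_longer(-game, names_to = "sentiment", values_to = "percent") %>%
  mutate(sentiment = factor(
    sentiment,
    levels = c("love", "like", "dislike", "hate", "not_sure")
  ))

labelled <- long %>%
  mutate(
    # Blank out labels for bars under 4%; the space pads the text off the edge
    label = if_else(percent < 4, "", paste0(" ", percent)),
    # Assumption: love and hate get dark fills, so their text is white
    text_color = if_else(sentiment %in% c("love", "hate"), "white", "black")
  )

p <- ggplot(labelled, aes(x = percent, y = game, fill = sentiment)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  geom_text(
    # group = sentiment keeps the text stacked in the same order as the bars;
    # without it, the discrete color mapping would change the grouping
    aes(label = label, color = text_color, group = sentiment),
    # vjust = 0 places each label at the start (left edge) of its segment
    position = position_stack(vjust = 0, reverse = TRUE),
    hjust = 0
  ) +
  # Use the color names stored in text_color as-is
  scale_color_identity()
```

The explicit `group` is the "play with the ordering" part: ggplot2 builds groups from every discrete variable mapped in a layer, so adding a color mapping to the text can quietly reorder the stack relative to the bars.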

There are a number of interesting stylings that we’ll be able to implement in the theme() function. Starting at the top, the title has the colors for love and like embedded. We can achieve this with element_markdown() from {ggtext} using some HTML markup. The second styling to note is the location of the legend across the top of the figure. We can use the legend.position argument to put the legend at the "top".
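A sketch of those two theme settings follows. The title wording and the two hex colors are hypothetical stand-ins, since I'm not reproducing the exact YouGov title or palette, and the percentages are invented placeholders.

```r
library(tidyverse)
library(ggtext)

# Hypothetical colors for the love and like bars
love_col <- "#1B7837"
like_col <- "#7FBF7B"

# Invented placeholder percentages for illustration; NOT YouGov's values
long <- tribble(
  ~game,       ~love, ~like, ~dislike, ~hate, ~not_sure,
  "Bridge",        8,    20,       20,    10,        42,
  "Solitaire",    30,    45,       10,     5,        10
) %>%
  pivot_longer(-game, names_to = "sentiment", values_to = "percent")

# Hypothetical title text; <span> tags color individual words
title_html <- paste0(
  "Which card games do Americans ",
  "<span style='color:", love_col, "'>love</span> or ",
  "<span style='color:", like_col, "'>like</span>?"
)

p <- ggplot(long, aes(x = percent, y = game, fill = sentiment)) +
  geom_col() +
  labs(title = title_html, x = NULL, y = NULL, fill = NULL) +
  theme(
    plot.title = element_markdown(),  # render the HTML markup in the title
    legend.position = "top"           # legend across the top of the figure
  )
```

Without element_markdown(), the title would print the raw span tags as literal text.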

Now, how could we improve this figure? The main problem with stacked bar plots is that it is difficult to compare the internal bars across groups. Sure the numbers are there, but it’s not as efficient as comparing the length of a bar that is anchored on either side. One solution would be to convert this to a dot plot where we’d use the same x and y-axis aesthetic mappings, but we’d use geom_point() instead of geom_col(). Then you could easily compare across the games to see which game was the most or least liked or disliked. If we stayed with a stacked bar plot and considered the title of the plot, another improvement would be to change the ordering of the games to be based on the sum of the percent of people who love or like a game. This would more easily show that people really like solitaire and don’t care much for bridge. What other improvements would you consider to this plot?
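Both ideas can be sketched in a few lines. As before, the percentages are invented placeholders, not YouGov's values, and the point size is an arbitrary choice.

```r
library(tidyverse)

# Invented placeholder percentages for illustration; NOT YouGov's values
wide <- tribble(
  ~game,           ~love, ~like, ~dislike, ~hate, ~not_sure,
  "Bridge",            8,    20,       20,    10,        42,
  "Solitaire",        30,    45,       10,     5,        10,
  "Spit or Speed",    10,    25,       15,     3,        47
)

long <- wide %>%
  # Order games by the combined percent who love or like them
  mutate(game = fct_reorder(game, love + like)) %>%
  pivot_longer(-game, names_to = "sentiment", values_to = "percent")

# Same x and y mappings as the bar plot, but every point is read against
# the shared x-axis instead of against the end of the previous bar
dot_plot <- ggplot(long, aes(x = percent, y = game, color = sentiment)) +
  geom_point(size = 3)
```

With the placeholder numbers, the combined ordering puts Solitaire at the top and Bridge at the bottom.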

As an aside, I’m struck by the preference for solitaire and the overall dislike of bridge. Solitaire is a single-person game that at one point (perhaps still?) came on every Windows computer. There’s little strategy. Bridge is a very social game that I associate with the “greatest generation”. Couples would get together regularly to play with each other, and there were newspaper columns about bridge strategy alongside columns about chess strategy. It’s hard to not see this as some referendum on our social media world where we think we’re participating in a community, but really we’re growing more and more isolated.

What’s your favorite card game?

Workshops

I'm pleased to be able to offer you one of three recent workshops! With each you'll get access to 18 hours of video content, my code, and other materials. Click the buttons below to learn more.

minimalR Workshop

generalR Workshop

mothur Workshop

In case you missed it…

Here is a livestream that I published this week that relates to previous content from these newsletters. Enjoy!

Finally, if you would like to support the Riffomonas project financially, please consider becoming a patron through Patreon! There are multiple tiers and fun gifts for each. By no means do I expect people to become patrons, but if you need to be asked, there you go :)

I’ll talk to you more next week!

Pat

Riffomonas Professional Development

In case you missed it…

Plotting the US job creation numbers (and revisions) with ggplot2

Visualizing the timeline of antibiotic discovery and resistance with ggplot2

How would you make a labelled bar plot with positive and negative values?