“Raincloud Plots: a Multi-Platform Tool for Robust Data Visualization”, Micah Allen, Davide Poggiali, Kirstie Whitaker, Tom R. Marshall, Rogier Kievit, 2018-08-23:

Across scientific disciplines, there is a rapidly growing recognition of the need for more statistically robust, transparent approaches to data visualization. Complementary to this, many scientists have realized the need for plotting tools that accurately and transparently convey key aspects of statistical effects and raw data with minimal distortion.

Previously common approaches, such as plotting conditional mean or median barplots together with error-bars, have been criticized for distorting effect size, hiding underlying patterns in the raw data, and obscuring the assumptions upon which the most commonly used statistical tests are based.
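As a small illustration of how a mean-plus-error-bar summary can hide structure in the raw data, the sketch below builds two synthetic samples (one unimodal, one bimodal) and rescales both to the same mean and standard deviation. The sample sizes, the mode locations, and the helper names `standardize` and `fraction_near_center` are assumptions made for this example and do not come from the paper.

```python
import random
import statistics


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def fraction_near_center(values, half_width=0.5):
    """Share of observations lying within half_width of zero."""
    return sum(abs(v) <= half_width for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the example is reproducible

# One roughly normal sample and one sample with two well-separated modes.
unimodal = standardize([rng.gauss(0, 1) for _ in range(200)])
bimodal = standardize(
    [rng.gauss(-2, 0.5) for _ in range(100)]
    + [rng.gauss(2, 0.5) for _ in range(100)]
)

for name, sample in [("unimodal", unimodal), ("bimodal", bimodal)]:
    # A bar plot with error bars would show only the first two numbers.
    print(
        name,
        f"mean={statistics.fmean(sample):.2f}",
        f"sd={statistics.stdev(sample):.2f}",
        f"near_center={fraction_near_center(sample):.2f}",
    )
```

Both samples would produce the same bar and the same error bar, yet the share of observations near the mean differs sharply between them, which is the kind of pattern that only a view of the raw data or its density reveals.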

Here we describe a data visualization approach which overcomes these issues, providing maximal statistical information while preserving the desired ‘inference at a glance’ nature of barplots and other similar visualization devices. These “raincloud plots” [scatterplots + smoothed histograms/density plot + box plots] can visualize raw data, probability density, and key summary statistics such as median, mean, and relevant confidence intervals in an appealing and flexible format with minimal redundancy.

In this tutorial paper we provide basic demonstrations of the strength of raincloud plots and similar approaches, outline potential modifications for their optimal use, and provide open-source code for their streamlined implementation in R, Python and Matlab. Readers can investigate the R and Python tutorials interactively in the browser using Binder by Project Jupyter.

Figure 3: Example Raincloud plot. The raincloud plot combines an illustration of data distribution (the ‘cloud’), with jittered raw data (the ‘rain’). This can further be supplemented by adding box plots or other standard measures of central tendency and error. See figure3.Rmd for code to generate this figure.

…To remedy these shortcomings, a variety of visualization approaches have been proposed, illustrated in Figure 2, below. One simple improvement is to overlay individual observations (datapoints) beside the standard bar-plot format, typically with some degree of randomized jitter to improve visibility (Figure 2A). Complementary to this approach, others have advocated for more statistically robust illustrations such as box plots (Tukey 1970), which display the sample median alongside the interquartile range. Dot plots can be used to combine a histogram-like display of distribution with individual data observations (Figure 2B). In many cases, particularly when parametric statistics are used, it is desirable to plot the distribution of observations. This can reveal valuable information about how, eg. some condition may alter the skewness or overall shape of a distribution. In this case, the ‘violin plot’ (Figure 2C), which displays a probability density function of the data mirrored about the uninformative axis, is often preferred (Hintze & Nelson 1998). With the advent of increasingly flexible and modular plotting tools such as ggplot2 (Wickham 2010; Wickham & Chang 2008), all of the aforementioned techniques can be combined in a complementary fashion…Indeed, this combined approach is typically desirable, as each of these visualization techniques has various trade-offs.

…On the other hand, the interpretation of dot plots depends heavily on the choice of dot-bin and dot-size, and these plots can also become extremely difficult to read when there are many observations. Violin plots, in which the probability density function (PDF) of the observations is mirrored, combined with overlaid box plots, have recently become a popular alternative. This provides both an assessment of the data distribution and statistical inference at a glance (SIG) via the overlaid box plots. However, there is nothing to be gained, statistically speaking, by mirroring the PDF in the violin plot: the mirrored half spends ink on redundant information, and violin plots therefore violate the principle of maximizing the “data-ink ratio” (Tufte 1983).

To overcome these issues, we propose the use of the ‘raincloud plot’ (Neuroconscience 2018), illustrated in Figure 3. The raincloud plot combines a wide range of visualization suggestions, and similar precursors have been used in various publications (eg. Ellison 1993, Figure 2.4; Wilson et al 2018). The plot attempts to address the aforementioned limitations in an intuitive, modular, and statistically robust format. In essence, raincloud plots combine a ‘split-half violin’ (an un-mirrored PDF plotted against the redundant data axis), raw jittered data points, and a standard visualization of central tendency (ie. mean or median) and error, such as a boxplot. As such, the raincloud plot builds on code elements from multiple developers and scientific programming languages (Hintze & Nelson 1998; Patil 2018; Wickham & Chang 2008; Wilke 2017).
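The three layers described above can be sketched in Python as follows. This is a minimal illustration and not the authors' released code: the function names, the normal-reference bandwidth rule, the uniform jitter, and all layout offsets are assumptions chosen for the example. The layers are computed with the standard library only; the optional `draw_raincloud` helper assumes a matplotlib `Axes` object is passed in, and for brevity it draws the box (interquartile range and median) without whiskers.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (an assumed default)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)


def gaussian_kde(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of data at each grid point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        for x in grid
    ]


def raincloud_components(data, n_grid=200, jitter_width=0.1, seed=0):
    """Compute the cloud, rain, and box layers for one group of observations."""
    bandwidth = rule_of_thumb_bandwidth(data)
    # Pad the grid by three bandwidths so the density tails are not cut off.
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    grid = [lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)]
    q1, median, q3 = statistics.quantiles(data, n=4)
    rng = random.Random(seed)
    return {
        "grid": grid,                                    # x positions of the cloud
        "density": gaussian_kde(data, grid, bandwidth),  # un-mirrored PDF
        "jitter": [rng.uniform(-jitter_width, jitter_width) for _ in data],
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def draw_raincloud(ax, data, parts):
    """Draw a horizontal raincloud on a matplotlib Axes (offsets are arbitrary)."""
    peak = max(parts["density"])
    cloud = [d / peak for d in parts["density"]]  # rescale to height 1 for display
    # The 'cloud': half violin drawn above the baseline only.
    ax.fill_between(parts["grid"], 0, cloud, alpha=0.6)
    # The box: interquartile range as an unfilled bar, median as a vertical line.
    ax.barh(-0.15, parts["q3"] - parts["q1"], height=0.1, left=parts["q1"],
            facecolor="none", edgecolor="black")
    ax.vlines(parts["median"], -0.2, -0.1, colors="black")
    # The 'rain': raw observations, jittered along the uninformative axis.
    ax.scatter(data, [-0.4 + j for j in parts["jitter"]], s=8, alpha=0.5)
    ax.set_yticks([])
```

Keeping the computation separate from the drawing makes each layer easy to swap, for example replacing the box with a mean and confidence interval, which mirrors the modular design the text describes.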