How to use box plots and point clouds in Tableau



By Julien Godenir

July 23, 2019

This post is all about box plots and point clouds in Tableau. The idea is to create a visualization with information about the statistical distribution of your data. In this example I'll be using the Superstore dataset (included with Tableau Desktop). If you use Tableau Public, you will find the data here.


Box-plots and point clouds in Tableau

As you can see in the visualization above, each customer's sales appear as one point within a point cloud. I used a box plot to get the medians, quartiles, and deciles of our dataset. It is also possible to group customers by sales, for example into categories of one thousand.

In the following tutorial, you will learn how I arrived at this result. If you have questions or encounter problems, you can contact us at any time. Enjoy!

1. The foundation of box whisker plots and point clouds in Tableau: LOD expressions

To make my data easier to work with, I use level of detail expressions to calculate sales by customer, sales category, and number of customers by category.

Since I want to give the user the ability to resize sales categories, I'll create an integer parameter [Category Size]. Personally, I use the values 1000, 2000, 3000, 5000, and 10000, but you can use other values as well.

Setting a first parameter in Tableau

To determine the total sales for each customer, I use a "FIXED" calculation on [Customer Name] and calculate the sum of [Sales]. You can also use "INCLUDE" or "EXCLUDE" as LOD calculations if you want.


Adding a first FIXED LOD expression in Tableau
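To make the idea concrete outside Tableau, here is a minimal Python sketch of what a FIXED calculation on [Customer Name] with a sum of [Sales] computes: one total per customer, independent of whatever else is in the view. The order rows and the function name `sales_by_customer` are made up for illustration and are not taken from the Superstore data.

```python
from collections import defaultdict

# Hypothetical order rows: (customer name, sales amount). Not real Superstore data.
orders = [
    ("Alice", 1200.0),
    ("Bob", 300.0),
    ("Alice", 950.5),
    ("Carol", 4100.0),
    ("Bob", 200.0),
]

def sales_by_customer(rows):
    """Sum sales per customer, mimicking a FIXED LOD on the customer name."""
    totals = defaultdict(float)
    for customer, sales in rows:
        totals[customer] += sales
    return dict(totals)

totals = sales_by_customer(orders)
print(totals)
```

Every order row of the same customer would carry the same total, which is what makes the later per-customer calculations possible.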

Next, I work out which "class" a customer falls into by dividing sales per customer by [Category Size] and then rounding down to the nearest integer with FLOOR.


Using the FLOOR calculation
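The class calculation is plain arithmetic, so it can be sketched in a few lines of Python. The function name `category_index` and the sample values are assumptions for illustration only.

```python
import math

def category_index(sales_by_customer, category_size):
    """Divide by the category size and round down, like FLOOR in Tableau."""
    return math.floor(sales_by_customer / category_size)

category_size = 1000  # one of the [Category Size] values suggested in the text
indices = [category_index(s, category_size) for s in (500.0, 1999.99, 2000.0, 4100.0)]
print(indices)
```

With a category size of 1000, a customer with 1999.99 in sales still lands in class 1, while 2000.0 moves to class 2.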

To clarify the calculation, I add a calculated field, which I call [Label Category Sales by customer] and which indicates between which values the record lies.


Adding the categories

This field gives me simple legends of the style "1000 < x < 1999". Finally, I calculate the number of customers in a given category, [Number of Clients per Category], with a second level of detail calculation.
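As a rough sketch of how such a label can be derived from the class number, the Python below computes the lower and upper bound of a category and joins them into a string. The exact wording of the label and the name `category_label` are assumptions, since the original formula is only shown as a screenshot.

```python
def category_label(category_index, category_size):
    """Build a text label giving the bounds of a sales category."""
    lower = category_index * category_size
    upper = (category_index + 1) * category_size - 1
    # The "<lower> < x < <upper>" wording is an assumed format.
    return f"{lower} < x < {upper}"

label = category_label(1, 1000)
print(label)
```

Changing the category size changes the bounds automatically, so the legend stays in sync with the parameter.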


Determining the number of customers per category
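The customer count per category can be pictured as a second grouping step on top of the per-customer totals. The following Python sketch uses made-up totals; the variable names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical per-customer sales totals.
sales_per_customer = {
    "Alice": 2150.5,
    "Bob": 500.0,
    "Carol": 4100.0,
    "Dave": 2999.0,
    "Eve": 120.0,
}
category_size = 1000

# Class of each customer (sales divided by the category size, rounded down).
category_of = {
    name: math.floor(total / category_size)
    for name, total in sales_per_customer.items()
}

# Number of customers in each class, then attached back to every customer.
customers_per_category = Counter(category_of.values())
number_of_clients = {
    name: customers_per_category[cat] for name, cat in category_of.items()
}
print(number_of_clients)
```

Carol is alone in her class here, which is exactly the case the tutorial later treats specially when placing points.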

Here's the result we'll get afterwards:


The intermediate result

2. Creating a point cloud and box plots


To achieve a point cloud effect when you only have one measure, you need to spread the individual values at random along the other axis. Tableau provides the RANDOM() function, but I will not use it in this tutorial. Instead, I will create my own random function, because I want to be able to place a point at 0 on that axis when it is the only point in its category. We will see that this approach is useful for combining two aggregation levels into a single graph.

So I create an integer parameter [Seed], which I can set to any number (12, 24, 48, ...). Then I add a calculated field [Random] using the fields already calculated. The idea is that if the category contains only one customer, I set the point to zero. Otherwise, I divide the sales value by the parameter [Seed] and subtract the nearest integer (ROUND) from this value.


Recreating a random function in Tableau with an IIF calculation
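The logic of this home-made random function can be sketched in Python as follows. This is an illustration of the description above, not the exact Tableau formula. The name `pseudo_random` is hypothetical, and rounding halves upward is an assumption about how ROUND treats ties.

```python
import math

def pseudo_random(sales, customers_in_category, seed):
    """Return a deterministic offset used to spread points horizontally."""
    if customers_in_category == 1:
        # A lone customer in its category is placed exactly at zero.
        return 0.0
    scaled = sales / seed
    # Subtract the nearest integer (assumed tie rule: halves round up).
    return scaled - math.floor(scaled + 0.5)

offsets = [pseudo_random(s, 2, 12) for s in (2150.5, 2999.0, 500.0, 120.0)]
print(offsets)
```

Because the offset depends only on the sales value and the seed, the cloud looks the same on every refresh. With whole-number sales and a seed of 1, every offset is 0 and the cloud collapses into a line.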

The result of this calculation always lies between -0.5 and 0.5. The points spread across that range as long as [Seed] is small enough compared to the sales figures per customer.

Beware: if your data contains only integers, [Seed] must be set to a relatively high number to create a cloud-like effect. So there is a happy medium that depends on your data.

In a new worksheet, drag ATTR([Random]) to Columns and MIN([Sales by Customer]) to Rows. Add [Customer Name] and [Label Category by Customer] to the Marks card. Then all you have to do is drag a Box Plot from the Analytics pane onto the view, selecting MIN([Sales by Customer]) as shown below.


Creating a box plot with the Analytics pane in Tableau

Note that the ATTR function is really a MIN and a MAX function combined: it only returns a value if MIN and MAX are the same. This lets you verify that the data is aggregated correctly. However, if you are confident in the aggregation of your data, using MIN (or MAX) saves a calculation and can therefore make your dashboard a little faster. We'll see later why I'm forced to use ATTR for Columns.
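The behavior of ATTR described here can be mimicked in a few lines of Python. This is only a sketch of the semantics: the function name `attr` is made up, and the asterisk mirrors what Tableau displays when the values differ.

```python
def attr(values):
    """Return the single shared value, or "*" if the values are not all equal."""
    values = list(values)
    lowest, highest = min(values), max(values)
    return lowest if lowest == highest else "*"

same = attr([0.25, 0.25, 0.25])
mixed = attr([0.25, -0.1])
print(same, mixed)
```

The sketch also shows where the extra cost comes from: both a minimum and a maximum are computed, whereas MIN alone needs only one pass of work.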

3. Using the dual-axis feature in Tableau

Now that we've created our two views, we have to combine them. I use the Dual Axis feature in Tableau. I also calculate the ordinate of the aggregated points so that they sit at the midpoint of each category. If the category is "2000 < x < 2999", I usually set the point to 2500. Of course, you can also set it to 2000 if you prefer. I call this calculation [Ordinate], out of a severe lack of inspiration this weekend:


Creating the calculated field Ordinate
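The placement rule for the aggregated points boils down to one line of arithmetic. The Python sketch below assumes the category is identified by its class number; the function name `ordinate` and the `use_midpoint` switch are hypothetical.

```python
def ordinate(category_index, category_size, use_midpoint=True):
    """Vertical position of the aggregated point for a sales category."""
    offset = 0.5 if use_midpoint else 0.0
    return (category_index + offset) * category_size

middle = ordinate(2, 1000)
bottom = ordinate(2, 1000, use_midpoint=False)
print(middle, bottom)
```

For the category starting at 2000 with a size of 1000, the midpoint variant gives 2500 and the lower-bound variant gives 2000, matching the two options mentioned above.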

Then follow these steps:

  1. Place [Ordinate] on Rows.

  2. Remove [Customer Name] from the AGG([Ordinate]) Marks card.

  3. Instead, drag [Number of Clients per Category] to Size on the AGG([Ordinate]) Marks card.

  4. Select the Dual Axis option for [Ordinate].

  5. Synchronize the axis on the right side.

  6. Uncheck "Show Header" for the right axis.

  7. Make aesthetic adjustments (transparency, size of the elements).

The last four steps are also shown here:


Finalizing the point cloud and adjusting the box plot


And that's it! You can download the visualization on Tableau Public.