Fundamental charts

In this post we will take a look at some of the most fundamental charts that one encounters in data visualization ¹ ². These charts will be the basic building blocks for most visualizations, and they will allow us to visualize a wide range of datasets.

1-D Scatterplot

Although this is not very common in explanatory visualization, one may want to simply visualize a single value attribute across some items; in this case we can use a one-dimensional scatterplot. In this example we show the distribution of the sepal width in the well-known Iris dataset.

The one-dimensional scatterplot can be used to visualize how an attribute is distributed across the items or to spot possible outliers.
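A minimal sketch of such a plot, assuming Python with matplotlib (the post does not say which tool produced the figure). The function names are hypothetical, and the sample values are placeholders for illustration, not rows taken from the Iris dataset.

```python
def points_1d(values):
    """Map each value to an (x, y) pair: x is the value, y is fixed at 0."""
    return [(float(v), 0.0) for v in values]


def plot_1d_scatter(values, label):
    """Draw a one-dimensional scatterplot of `values` on a single axis."""
    import matplotlib.pyplot as plt  # imported here so points_1d works without matplotlib

    xs, ys = zip(*points_1d(values))
    fig, ax = plt.subplots(figsize=(6, 1.5))
    ax.scatter(xs, ys, alpha=0.4)  # transparency keeps overlapping items visible
    ax.set_yticks([])              # the vertical position carries no information
    ax.set_xlabel(label)
    return fig, ax


# Placeholder values for illustration only (assumption: not the real dataset).
sample_widths = [3.5, 3.0, 3.2, 3.1, 3.6, 2.2, 4.4]
```

Calling plot_1d_scatter(sample_widths, "sepal width") returns the figure, which can then be shown or saved. Isolated markers far from the bulk of the points are the candidates for outliers.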

2-D Scatterplot

In a two-dimensional scatterplot we show the joint distribution of two quantities across the items. In this case we have no key attribute. As an example, here we show how the sepal length and the sepal width vary across the items of the Iris dataset used above.

This visualization can help to determine the underlying distribution of our attributes, to check whether the two variables are correlated, or to look for clusters.
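To complement the visual check, one can also compute a correlation coefficient and show it next to the scatterplot. The sketch below assumes Python with matplotlib and uses the Pearson coefficient, which only measures linear correlation. This is our choice for illustration, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def plot_2d_scatter(xs, ys, x_label, y_label):
    """Scatter two quantitative attributes and report their correlation in the title."""
    import matplotlib.pyplot as plt  # imported here so pearson works without matplotlib

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, alpha=0.6)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(f"Pearson r = {pearson(xs, ys):.2f}")
    return fig, ax
```

A coefficient close to zero does not rule out clusters or a non-linear relation, which is exactly why looking at the plot remains useful.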

Bar chart

In a bar chart we show how a quantitative attribute changes across a set of categories, which represent our key attribute.

Here and in the future we will work under the hypothesis that there are no duplicates among the categories. In database language, we may say that our key is a primary key.

As an example, we can visualize the number of gold medals that each country won in the 2020 Olympic Games. Here we only plot a subsample of the dataset; the full dataset can be found in Mainak’s repository.

In this case the categorical variable is the team, while the quantitative variable is the number of gold medals.

The bar chart can be rotated by 90 degrees, but the vertical version (which we used) allows for a larger number of categories to be shown.

If the categories don’t have any natural order it may be a good idea to reorder the categories with respect to the plotted quantity to improve readability.

A bar chart can be very useful when one wants to compare the values of an attribute across the categories.
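The reordering suggested above can be sketched as follows, assuming Python with matplotlib. The function names are hypothetical. The four medal counts are hard-coded from the commonly reported Tokyo 2020 medal table, not read from the dataset mentioned above, so check them against the data before reusing them.

```python
def sort_by_value(totals):
    """Return (category, value) pairs ordered from the largest value to the smallest."""
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def plot_bar(totals, y_label):
    """Draw one bar per category, ordered by the plotted quantity."""
    import matplotlib.pyplot as plt  # imported here so sort_by_value works without matplotlib

    categories, values = zip(*sort_by_value(totals))
    fig, ax = plt.subplots()
    ax.bar(categories, values)
    ax.set_ylabel(y_label)
    return fig, ax


# A dict cannot hold duplicate keys, which matches the primary-key hypothesis.
# Hard-coded for illustration (assumption: not loaded from the post's dataset).
gold_medals = {"Japan": 27, "United States": 39, "Great Britain": 22, "China": 38}
```

Sorting only makes sense when the categories have no natural order; for ordered categories, such as age groups, the original order should be kept.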

Line chart

In a line chart we visualize how a quantitative variable, which represents our value attribute, changes with respect to another quantity, which is our key attribute and often represents time. To better explain this chart, let us take a look at the gold price in the period 1978-2021.

This visualization can be useful to understand how the value attribute depends on the key attribute, for example to spot trends over time.

The line chart is often abused: the line naturally encodes both an order and a notion of distance between the values on the x axis, so if x is not a quantitative variable one should never use a line chart.
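One way to make this rule explicit is a small guard in front of the plotting call. This is only a sketch, assuming Python with matplotlib; the function names are hypothetical, and treating dates as an acceptable x variable is our assumption, since dates are ordered and the gap between two dates is meaningful.

```python
from datetime import date
from numbers import Real


def has_order_and_distance(xs):
    """True if every x is a number or a date, so gaps between x values are meaningful."""
    return all(isinstance(x, (Real, date)) and not isinstance(x, bool) for x in xs)


def plot_line(xs, ys, x_label, y_label):
    """Draw a line chart, refusing x values that are not quantitative."""
    if not has_order_and_distance(xs):
        raise ValueError("x is not quantitative, so a line chart is not appropriate")

    import matplotlib.pyplot as plt  # imported here so the guard works without matplotlib

    # Connect the points in increasing x order, whatever the input order was.
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    fig, ax = plt.subplots()
    ax.plot([x for x, _ in pairs], [y for _, y in pairs])
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig, ax
```

With this guard, passing country names as x raises an error instead of silently drawing a line between unrelated categories.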

Matrix chart

In a matrix chart we visualize how a quantity (our value) is distributed across two categorical variables, which are our key attributes. As an example, here we visualize how many points each team of the Six Nations Championship scored against each opponent in the period 2016-2023.

As we will discuss in a future post, this representation is never optimal: the two spatial dimensions already encode the categorical variables, so one must rely on another channel, typically area or color, to encode the quantitative variable. The issue is that our perception of variations in both channels is prone to errors, so it can be difficult to decode the quantitative information correctly.

Matrix charts are typically used to find outliers or clusters.
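A minimal sketch of a matrix chart, assuming Python with matplotlib and color as the channel for the quantity. The function names and the input layout, a dict keyed by (row, column) pairs, are hypothetical choices made for this example.

```python
def to_matrix(values, keys):
    """Arrange {(row_key, col_key): value} into a square nested list.

    Missing pairs (for example a team against itself) become None.
    """
    return [[values.get((row, col)) for col in keys] for row in keys]


def plot_matrix(values, keys, value_label):
    """Draw the matrix with one colored cell per (row, column) pair."""
    import matplotlib.pyplot as plt  # imported here so to_matrix works without matplotlib

    nan = float("nan")  # NaN marks cells with no data so they are not colored as zero
    grid = [[nan if v is None else float(v) for v in row]
            for row in to_matrix(values, keys)]
    fig, ax = plt.subplots()
    image = ax.imshow(grid, cmap="viridis")  # color encodes the quantitative value
    ax.set_xticks(range(len(keys)))
    ax.set_xticklabels(keys, rotation=45, ha="right")
    ax.set_yticks(range(len(keys)))
    ax.set_yticklabels(keys)
    fig.colorbar(image, ax=ax, label=value_label)
    return fig, ax
```

The color bar is what lets the reader map a color back to a number, and it is also where the decoding errors mentioned above come from.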

Symbol map

The last type of visualization we will discuss here is the symbol map, where we show how a quantity varies across two spatial coordinates.

As an example, here I plot some of the places where I have lived, where the area of each symbol is proportional to the time I lived in that location.

Symbol maps can be used to determine the spatial distribution of a quantity.
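A minimal sketch of the area encoding, assuming Python with matplotlib. The symbols are drawn on plain longitude/latitude axes with no basemap, so this is not a full map. The function names and the max_area parameter are hypothetical.

```python
def marker_areas(values, max_area=300.0):
    """Scale values linearly so that the largest one gets `max_area` (in points squared)."""
    top = max(values)
    return [v / top * max_area for v in values]


def plot_symbol_map(lons, lats, values, max_area=300.0):
    """Draw one symbol per location, with area proportional to the value."""
    import matplotlib.pyplot as plt  # imported here so marker_areas works without matplotlib

    fig, ax = plt.subplots()
    # In matplotlib's scatter, `s` is the marker area (points squared), not the radius,
    # so passing the scaled values directly keeps the area proportional to the quantity.
    ax.scatter(lons, lats, s=marker_areas(values, max_area), alpha=0.5)
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    return fig, ax
```

Scaling the radius instead of the area would exaggerate large values, since the area grows with the square of the radius.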

Conclusions

We discussed some of the most relevant kinds of visualization, which will be the starting point for other kinds of visualization. The choice of the visualization depends both on the attribute types and on the attribute semantics.