The science behind data visualisation

Over the last couple of centuries, data visualisation has developed to the point where it is in everyday use across all walks of life. Many recognise it as an effective tool for both storytelling and analysis, overcoming most language and educational barriers. But why is this? How are abstract shapes and colours often able to communicate large amounts of data more effectively than a table of numbers or paragraphs of text? An understanding of human perception will not only answer this question, but will also provide clear guidance and tools for improving the design of your own visualisations.

In order to understand how we are able to interpret data visualisations so effectively, we must start by examining the basics of how we perceive and process information, in particular visual information.

To better understand the differences between System 1 and System 2, consider Figure 1. In the photograph on the left we immediately perceive an angry man and probably associate loud noise and aggressive movement with the depicted scene. This exceedingly sophisticated interpretation of mere pixels is almost immediate, requires no effort and comes completely naturally. Contrast that with the multiplication on the right. We instantly recognise what is being asked of us and that we are able to work it out, but most will not attempt the mental arithmetic involved because of the conscious effort required. The initial reactions in both cases are pure System 1, while the mental arithmetic is an example of System 2.

We have evolved these separate systems so that our conscious minds do not become swamped with mundane processing. Our System 2 can focus on more complex comprehension and calculation tasks, with System 1 feeding System 2 with the necessary information for such tasks. In data visualisation, we should seek to encode as much information and understanding as possible in a way that is perceived correctly by our System 1, which then frees up System 2 for more involved understanding and analysis of the data.

Having introduced a high-level, abstract view of how we process information, we can now turn attention to the problem of how the information to be processed enters our minds in the first place, via our senses. A significant amount of the human brain is dedicated to visual processing, resulting in our sight having a sharpness of perception far surpassing that of our other senses. As you can see from Figure 2, more information enters our minds at any given time through sight than through any of our other senses, both at the sub-conscious and conscious level. In fact, roughly 70% of the body's sense receptors relate to sight.

We can also see from Figure 2 that visual information, like all sensory information, is heavily reduced between our sub-conscious and conscious. This is not because information is simply discarded; rather, it is distilled by our System 1 so that our System 2 receives less, but richer, information that is more relevant to whatever task we are currently undertaking. Sight's combination of bandwidth and processing power is why it is better suited to comprehending data sets than our other senses.

To maximise the efficacy of System 1's distillation of raw visual information, we need to delve into the details of our visual processing, presented in Figure 3. Light entering our eyes stimulates our retina, causing massively parallel impulses to be sent on to iconic memory. Iconic memory serves as a very short-term buffer and processor that ensures we maintain a coherent picture of the world at all times. Iconic memory also enriches the information passing through it by perceiving basic visual attributes such as shapes, edges, relative sizes and patches of colour. These are referred to as pre-attentive attributes.

Iconic memory's basic visual information is passed on to visual working memory, another form of short-term storage whose remarkably limited capacity gives rise to the observed "seven, plus or minus two" limit on the number of things we can remember at any given time. In order for us to recognise objects and scenes, the pathway described so far ("bottom-up processing") converges in visual working memory with a pathway bringing items and associations retrieved from long-term memory ("top-down processing").

Figure 4 shows a selection of the pre-attentive visual attributes that can be used to encode data, as detailed by Colin Ware in Information Visualization: Perception for Design. Stephen Few states that only a handful of these are attributes that we naturally, and universally, interpret as quantitative. Of those, length and 2-dimensional location are perceived more precisely than other attributes. For example, with length, we perceive a clear scale that corresponds well with objective measurement: bigger is "more" and smaller is "less". By contrast, with shape, we cannot say whether a circle means more or less than a square without the introduction of an artificial scale using a key.

We can compare values using quantitatively perceived pre-attentive attributes, but cannot infer actual values. For example, we can easily see that one line is longer than another, and so represents a bigger value, but to perceive that a line represents a particular value (such as 100, rather than 200) we must add an explicit scale with numbers or text. Unfortunately, numbers and text are not pre-attentively perceived because they are learned symbols, requiring a degree of memory look-up. The result is that comparison of pre-attentive visual attributes falls within our System 1, but decoding of the encoded values requires light use of System 2.
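The distinction between comparing and decoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bar` and `render`, the `sales` figures and the 40-character width are all made up for the example, and text characters stand in for drawn bars. Without labels the bars can only be compared by length; appending the numbers supplies the explicit scale needed to read actual values.

```python
def bar(value, max_value, width=40):
    """Encode a value as a run of characters whose length is proportional to it."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return "#" * round(width * value / max_value)


def render(data, width=40, labelled=False):
    """Return one text line per (name, value) pair.

    With labelled=False only the bar lengths can be compared.
    With labelled=True the number is appended, acting as an explicit scale.
    """
    max_value = max(value for _, value in data)
    name_width = max(len(name) for name, _ in data)
    lines = []
    for name, value in data:
        line = f"{name:<{name_width}} {bar(value, max_value, width)}"
        if labelled:
            line += f" {value}"
        lines.append(line)
    return lines


# Hypothetical data, invented purely for illustration.
sales = [("North", 100), ("South", 200), ("East", 150)]

print("\n".join(render(sales)))                 # comparison only
print()
print("\n".join(render(sales, labelled=True)))  # comparison plus decoding
```

In the first printout it is obvious that South is the largest and roughly twice North, but nothing tells us whether North is 100 or 1,000. The second printout answers that, at the cost of reading numbers.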

Now let us explore the perception of relationships in data, usually best presented by the structure and grouping in visualisations. In Figure 4 we can see that pre-attentive attributes that are not perceived quantitatively are effective at differentiating, i.e. grouping. However, rather than focusing on individual shapes we can use for grouping, we shall consider patterns, the pre-attentive perception of which has been captured in the Gestalt laws of perception (named after the Gestalt school of psychology where they were first observed).

Besides grouping, another extremely powerful relationship in data visualisation is ordering. The questions of "best", "worst" and more general rankings are common when considering data sets, and the simple act of applying appropriate ordering in a visualisation ensures such insights are immediate and effortless. With a little creativity and thinking, ordering can be reinforced even in situations where it might at first not seem possible, as in the chart shown in Figure 6 where the Gestalt law of connectedness is used to great effect.
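As a rough sketch of how little work ordering requires, the Python below sorts a data set by value before drawing it, so that "best" and "worst" can be read straight off the top and bottom rows. The helper name `ordered_for_ranking` and the `scores` data are hypothetical, chosen only for this example.

```python
def ordered_for_ranking(data, descending=True):
    """Return (name, value) pairs sorted by value so rank reads top to bottom."""
    return sorted(data, key=lambda pair: pair[1], reverse=descending)


# Hypothetical scores in arbitrary (here, input) order.
scores = [("Delta", 42), ("Alpha", 87), ("Charlie", 15), ("Bravo", 63)]

ranked = ordered_for_ranking(scores)
for position, (name, value) in enumerate(ranked, start=1):
    # One character per three units keeps the bars short enough for a terminal.
    print(f"{position}. {name:<8} {'#' * (value // 3)}")
```

The same bars drawn in input order would force the reader to compare every pair of rows to find the ranking; sorted, the ranking is simply the reading order.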

In general it can be said that the intention of every visualisation falls somewhere on a spectrum between pure presentation, i.e. telling a known story in a data set, such as static charts in newspapers, and full-on exploration, i.e. analysis and examination of a not-yet-understood data set, such as interactive analytical charts on a financial research website. Research carried out by William Cleveland and Robert McGill can inform decisions about how best to represent data depending on which point of the presentation-exploration spectrum we want to target. Cleveland and McGill evaluated the relative efficacy of a number of basic visual encodings of data for comparison tasks. Their results imply a clear scale for the accuracy of comparison using the evaluated techniques, depicted in Figure 6.

From this scale you can see that it is no coincidence that we regularly see bar, line and scatter charts, given that all three use the visual form supporting the most accurate comparisons. Unfortunately, many consider these chart types to be "boring" and reach for more visually appealing chart forms such as pie charts. Cleveland and McGill's scale shows that the data encodings pie charts use, angle (and area, as a side-effect), do not support accurate comparison and, as such, pie charts are not a good choice in contexts where accurate comparison is required.
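One way to make the scale concrete is to write it down as data and look chart types up against it. The sketch below is an assumption-laden illustration, not a reproduction of the figure: the tiers follow the ordering as it is commonly summarised from Cleveland and McGill's 1984 paper and may not match the figure exactly, while the `CHART_ENCODINGS` mapping and the function names are hypothetical and reflect only what the text above says about each chart type.

```python
# Tiers from most to least accurate for comparison, as commonly summarised
# from Cleveland and McGill (1984). Treat this as an approximation.
ACCURACY_TIERS = [
    {"position on a common scale"},
    {"position on non-aligned scales"},
    {"length", "direction", "angle"},
    {"area"},
    {"volume", "curvature"},
    {"shading", "colour saturation"},
]

# Hypothetical mapping of chart types to their primary encoding.
CHART_ENCODINGS = {
    "bar": "position on a common scale",
    "line": "position on a common scale",
    "scatter": "position on a common scale",
    "pie": "angle",
}


def tier_of(encoding):
    """Return the 1-based tier of an encoding (1 is the most accurate)."""
    for tier, encodings in enumerate(ACCURACY_TIERS, start=1):
        if encoding in encodings:
            return tier
    raise KeyError(f"unknown encoding: {encoding}")


def more_accurate_chart(chart_a, chart_b):
    """Return the chart whose primary encoding ranks higher, or None on a tie."""
    tier_a = tier_of(CHART_ENCODINGS[chart_a])
    tier_b = tier_of(CHART_ENCODINGS[chart_b])
    if tier_a == tier_b:
        return None
    return chart_a if tier_a < tier_b else chart_b


print(more_accurate_chart("bar", "pie"))
print(more_accurate_chart("bar", "line"))
```

A lookup like this is deliberately crude, since a real chart often combines several encodings, but it captures the point of the scale: bar, line and scatter charts share the top tier, and a pie chart sits below them.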

How does this help in the context of the presentation-exploration spectrum? The more analytical and exploratory your visualisation needs to be, the further up Cleveland and McGill's scale you must go, since accurate comparison is probably more important. This is not to say that presentational visualisations should never use the more precise forms of visual encodings. Rather, in these situations we are able to choose the appropriate level of compromise between accuracy and visual interest required for the particular story we are trying to tell and its intended audience.


The Creative Bloq team is made up of a group of art and design enthusiasts, and has changed and evolved since Creative Bloq began back in 2012. The current website team consists of eight full-time members of staff: Editor Georgia Coggan, Deputy Editor Rosie Hilder, Ecommerce Editor Beren Neale, Senior News Editor Daniel Piper, Editor, Digital Art and 3D Ian Dean, Tech Reviews Editor Erlingur Einarsson, Ecommerce Writer Beth Nicholls and Staff Writer Natalie Fear, as well as a roster of freelancers from around the world. The ImagineFX magazine team also pitch in, ensuring that content from leading digital art publication ImagineFX is represented on Creative Bloq.
