olihawkins

Further adventures in animating uncertainty

17 Nov 2019 16:32 GMT

As I mentioned in my last post, I have been playing with uncertainty charts again. In that post, I wanted to simplify the task of creating animated bar charts so that I could easily create uncertainty bar charts with positive and negative values.

More recently, I have been exploring how the idea of animated uncertainty could be extended to other chart types. The following two experiments were inspired by suggestions from Paul Bolton and Harvey Goldstein.

Both examples are based on the same fundamental idea behind the uncertainty bar chart, which is to illustrate the statistical uncertainty in a set of estimates by generating alternative but equally plausible data using random values drawn from the error distribution for each estimate.
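A minimal Python sketch of that idea follows. It assumes each estimate is published with a 95% confidence interval half-width and that the sampling error is roughly normal. The figures and the name `draw_alternatives` are hypothetical placeholders, not APS values and not the code behind the charts.

```python
import random

# Hypothetical estimates (thousands) with 95% confidence interval half-widths.
# These are placeholder numbers, not figures from the Annual Population Survey.
ESTIMATES = [
    {"year": 2004, "value": 20.0, "ci": 6.0},
    {"year": 2005, "value": 24.0, "ci": 7.0},
    {"year": 2006, "value": 23.0, "ci": 7.0},
]

Z_95 = 1.96  # normal quantile for a 95% interval


def draw_alternatives(estimates, rng):
    """Return one alternative dataset, drawing each value from a normal
    distribution centred on the estimate (an assumed error model)."""
    drawn = []
    for est in estimates:
        standard_error = est["ci"] / Z_95
        drawn.append({"year": est["year"],
                      "value": rng.gauss(est["value"], standard_error)})
    return drawn


rng = random.Random(42)  # seeded so that the sketch is reproducible
for _ in range(3):
    print([round(d["value"], 1) for d in draw_alternatives(ESTIMATES, rng)])
```

Each call produces one alternative dataset, which is what a chart would animate towards on each click.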

In these examples, I use estimates of the number of EU nationals living in the London Borough of Haringey, which are taken from the Annual Population Survey. These figures are published by the ONS in their regular statistical release on the population of the UK by country of birth and nationality.

Haringey is the borough where I live. There has been an increase in the number of EU nationals living in Haringey since 2004, but the precise extent of the increase is uncertain due to sampling error in the APS. Coincidentally, I was also a migrant to Haringey during this period (from Wales).

I have embedded screenshots of the new charts below, but please follow the links, or click the screenshots, to see the live animated versions.

Uncertainty line chart

The uncertainty line chart starts by showing the trend for the estimates as a line. Clicking on the chart generates alternative lines for the same estimates. Each line is drawn in sequence, and then fades gradually over time until it disappears. In this way the chart builds up a constantly evolving representation of the uncertainty in the estimates, showing the range of possible trends.

An uncertainty line chart showing different possible trends for growth in the number of EU nationals living in Haringey between 2004 and 2018

Uncertainty level chart

The uncertainty level chart takes a slightly different approach. In this chart, each horizontal line represents an estimate of the number of EU nationals living in Haringey in each year.

When you click on the chart, these lines move to newly generated random values, leaving a translucent shadow of the value in each case. Over time these shadows build up to represent the density of the error distribution: the more values drawn in a given region, the darker that region becomes.

An uncertainty level chart showing the range of possible values for each estimate of the number of EU nationals living in Haringey in each year from 2004 to 2018
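The way translucent shadows build up into a picture of density can be sketched in Python. This is an illustration under stated assumptions rather than the chart's actual code: the per-shadow opacity `ALPHA`, the banding of the value axis, and the example estimate are all hypothetical. The only standard fact it relies on is that n overlapping layers of opacity a combine to 1 - (1 - a)^n.

```python
import random

ALPHA = 0.05     # assumed opacity of a single shadow
BIN_WIDTH = 1.0  # assumed width of each band on the value axis


def accumulate_shadows(value, standard_error, clicks, rng):
    """Count how many simulated draws land in each band of the value axis."""
    counts = {}
    for _ in range(clicks):
        draw = rng.gauss(value, standard_error)
        band = int(draw // BIN_WIDTH)
        counts[band] = counts.get(band, 0) + 1
    return counts


def combined_opacity(n, alpha=ALPHA):
    """Opacity of n overlapping shadows under standard alpha compositing."""
    return 1 - (1 - alpha) ** n


rng = random.Random(7)
counts = accumulate_shadows(value=20.0, standard_error=3.0, clicks=200, rng=rng)
for band in sorted(counts):
    print(f"{band * BIN_WIDTH:5.1f}  {combined_opacity(counts[band]):.2f}")
```

Bands near the estimate collect the most draws and so end up darkest, while the tails stay faint.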

Like the animated bar charts, these charts show different trends that are equally likely given the uncertainty in the estimates. But unlike the bar charts, these versions also build up a visual representation of the overall degree of uncertainty.

In some respects, I quite like the epistemological terror that the bar charts can induce. Seeing a trend erased by variation is a useful antidote to the apparent solidity of numbers. But it does mean those charts are missing some important context, which is made explicit in these versions of animated uncertainty.

Animated Bars

29 Sep 2019 16:08 GMT

Over the summer I was experimenting with animated uncertainty charts again. A few people have suggested I should go back to this idea and explore it further. I started thinking about how to make a general purpose uncertainty chart and realised that an uncertainty bar chart is just a particular kind of animated bar chart. So I have written a JavaScript library for just this purpose.

animated-bars is a JavaScript library for creating animated bar and column charts in webpages. You can find the source code on GitHub and the Node module on npm. It lets you create a bar or column chart with some initial data and a configuration, and then update the chart with new values. Animated transitions between the values are handled automatically according to the settings in the configuration.

The library uses D3 to make the charts but offers a simple higher-level interface. However, some of the D3 selections that are used to build and run the charts are exposed in the chart objects, so you can use D3 to extend the behaviour of a chart if you wish.

I won't run through all of the features of the library here — the documentation is fairly detailed. But I wanted to publish some examples of the library in action, so that people can see what it looks like before deciding if they want to try it. The following four examples show some of the most important features of the library.

Simple versions of these examples can be found in the example folder on GitHub, which you can use as a starting point if you want to make your own animated chart.

As I always say when I publish something completely new: this is work in progress, it's evolving, there may be bugs I haven't found yet, don't bank on interface stability until version 1.0 etc. But please do have a play and let me know if you have any suggestions or run into any problems.

Developing a vibrant accessible colour palette

7 Apr 2019 13:29 GMT

I have been working with ggplot2 themes lately, including developing a theme for the House of Commons Library. In my spare time I have also been working on a plotting theme for personal projects. For this theme I wanted a vibrant colour palette for representing discrete categories that is accessible to people with the most common kinds of colour blindness.[1]

I don't have any expertise in colour vision, but there are lots of good tools to help you test how a given set of colours look to different people, including Susie Lu and Elijah Meeks' Viz Palette and the Coblis colour blindness simulator.

I spent a while playing around with colours in Viz Palette, trying to come up with a set that are visually distinct for most people. I was interested to see how many categories it was possible to safely represent.

I managed to come up with a six colour palette that seems to work reasonably well in most cases. I stress, this was just the result of my own inexpert trial and error. I am sure there is room for improvement. The colours are:

  • blue [#0080e8]
  • sky [#70d0ff]
  • mint [#98f098]
  • yellow [#ffa000]
  • green [#009900]
  • magenta [#c00060]

There are two caveats to bear in mind. First, this palette still fails for people with the most severe and least common types of colour blindness, such as tritanopia and monochromacy.

Second, the palette takes advantage of differences in the perceived brightness of different colours, so colours that look different to people with no colour deficiency appear as different shades of the same colour to people with protanopia and deuteranopia.
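One rough way to inspect those brightness differences is to compute the WCAG relative luminance of each colour. The Python sketch below does that for the six hex values listed above. Relative luminance is only a proxy for perceived brightness and does not simulate any kind of colour blindness, so treat it as a quick check rather than the method used to build the palette.

```python
PALETTE = {
    "blue": "#0080e8",
    "sky": "#70d0ff",
    "mint": "#98f098",
    "yellow": "#ffa000",
    "green": "#009900",
    "magenta": "#c00060",
}


def relative_luminance(hex_colour):
    """WCAG 2.x relative luminance of an sRGB colour given as '#rrggbb'."""
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB gamma curve before weighting the channels.
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


for name, hex_colour in PALETTE.items():
    print(f"{name:8s} {hex_colour} {relative_luminance(hex_colour):.3f}")
```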

In practice, this means you need to take care if you interpolate between the colours in this palette to represent more than six categories. Doing this may reduce accessibility, as it risks creating colours that are not visually distinct to some people. The sequence of the colours in the palette has been chosen to reduce this risk, but it needs to be considered. To be on the safe side, use the same number of colours from the palette as there are categories and don't interpolate.
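The difference between selecting and interpolating can be sketched in Python. The helper names `pick_colours` and `midpoint` are hypothetical, and taking the first n colours in palette order is an assumption about how a subset would be chosen, not a description of the pilot package.

```python
PALETTE = ["#0080e8", "#70d0ff", "#98f098", "#ffa000", "#009900", "#c00060"]


def pick_colours(n):
    """Return the first n palette colours, refusing to interpolate."""
    if not 1 <= n <= len(PALETTE):
        raise ValueError(f"palette supports 1 to {len(PALETTE)} categories")
    return PALETTE[:n]


def midpoint(hex_a, hex_b):
    """Naive RGB midpoint of two colours: the kind of interpolated colour
    that may not be distinct for every viewer."""
    mixed = [(int(hex_a[i:i + 2], 16) + int(hex_b[i:i + 2], 16)) // 2
             for i in (1, 3, 5)]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


print(pick_colours(3))
print(midpoint(PALETTE[0], PALETTE[1]))
```

Any colour produced by `midpoint` would need to be tested in a simulator before use, whereas `pick_colours` only ever returns colours that have already been checked.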

Column charts

The following charts approximate how this palette looks as a basic column chart to people with different types of vision. The colours are based on the transformations applied by Viz Palette, which I assume are a good approximation.

If you are colour blind in one of the ways listed here, I would love to know if you can in fact distinguish between the colours in the first chart in the way the colour blindness simulators suggest you are able to.

A simple bar chart showing how the colours look to a person with normal vision

A simple bar chart showing how the colours look to a person with deuteranomaly

A simple bar chart showing how the colours look to a person with protanomaly

A simple bar chart showing how the colours look to a person with protanopia

A simple bar chart showing how the colours look to a person with deuteranopia

Scatter plots

I wanted to test the palette in the most difficult case. So here is the same comparison using a messy scatter plot with all six categories. The charts show turnout in Parliamentary constituencies at the 2017 General Election by median age, with each constituency shaded according to its settlement class, based on the Library's city and town classification of constituencies.

A scatter plot showing how the colours look to a person with normal vision

A scatter plot showing how the colours look to a person with deuteranomaly

A scatter plot showing how the colours look to a person with protanomaly

A scatter plot showing how the colours look to a person with protanopia

A scatter plot showing how the colours look to a person with deuteranopia

Line charts

And finally, here is the same comparison again with a line chart, showing an index of the value of various technology stocks over the last decade.

A line chart showing how the colours look to a person with normal vision

A line chart showing how the colours look to a person with deuteranomaly

A line chart showing how the colours look to a person with protanomaly

A line chart showing how the colours look to a person with protanopia

A line chart showing how the colours look to a person with deuteranopia

The palette is available as ggplot2 scales in the pilot package, which is my general purpose plotting theme. You are more than welcome to use or fork it, but I make no guarantees that the theme elements won't change in future.

Footnotes

1. A quick note about the spelling of colour. I'm British so I spell colour “colour”. But in most English-based programming languages the word colour is spelled “color”. Consequently, I've developed a habit of using “color” everywhere in source code, and “colour” in all other contexts. But it's leaky. Is a GitHub readme source code? Probably, if it includes function signatures.

Experimenting with grid histograms

11 Mar 2019 08:38 GMT

A few months ago, when ONS published their population estimates for Parliamentary constituencies in mid-2017, I worked on an analysis of constituencies by median age. This included some interactive charts produced with D3, which I was never able to publish as the online part of the project didn't get off the ground. Rather than see that work go to waste, I decided to publish the charts here for posterity.

The charts in question are grid histograms, which show the distribution of Parliamentary constituencies by median age at the 2017 General Election. In these charts, each square represents a single constituency, and each chart shows the squares shaded by another variable, such as party, turnout or majority. Click on the image below or the link above to see the interactive versions of these charts.

A set of four charts each showing the distribution of constituencies by median age and their relationship with another variable
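The layout behind a grid histogram can be sketched in a few lines of Python. This is an illustration, not the D3 code used for the charts: the function name `grid_layout`, the one-year bin width and the four records are all made up. Each constituency is placed in the column for its median-age bin and stacked on top of the squares already in that column.

```python
from collections import defaultdict


def grid_layout(records, bin_width=1.0):
    """Assign each record a (column, row) cell: the column is its median-age
    bin and the row is its position in the stack for that bin."""
    stacks = defaultdict(int)
    cells = []
    for rec in sorted(records, key=lambda r: r["median_age"]):
        column = int(rec["median_age"] // bin_width)
        cells.append({"name": rec["name"], "column": column,
                      "row": stacks[column], "shade": rec["shade"]})
        stacks[column] += 1
    return cells


# Made-up records for illustration only.
records = [
    {"name": "A", "median_age": 34.2, "shade": "Lab"},
    {"name": "B", "median_age": 34.8, "shade": "Con"},
    {"name": "C", "median_age": 41.5, "shade": "Con"},
    {"name": "D", "median_age": 34.5, "shade": "Lab"},
]
for cell in grid_layout(records):
    print(cell)
```

Sorting within each column by the shading variable instead of by age would group squares of the same shade together, which may suit a categorical variable such as party.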

On average, median age was lower in seats won by Labour and higher in seats won by the Conservatives. Turnout tended to be higher in seats with a higher median age. And some of Labour's biggest majorities were in seats with the lowest median age.

As a way of showing the strength of the relationship between two variables, I think this type of chart is probably less successful than a scatterplot. But as a way of showing the distribution of one variable within another, I think it works quite well, although it is perhaps better suited to the continuous variables than to the categorical one.

Web scraping for BBC More or Less

10 Feb 2019 17:27 GMT

During my career as a journalist I used to work for BBC Radio Current Affairs, where I would often work on Radio 4's pop stats programme More or Less. I don’t think anyone would be surprised to hear it was my favourite programme.

I am still in touch with the team that make the show, and every now and again the editor Richard Vadon sends me a message to chat about something statistical. Last year he sent me a DM with an interesting question.

A direct message from the editor of More or Less asking me to web scrape data on the position of the planets over time

More or Less wanted to work out which planet was closest to Earth on average, given how their relative positions change over time. They were exploring ways of getting the data and wanted to know if I could help. They did the story on the programme a couple of weeks ago, and they have produced a special version of that show for the BBC’s new interactive web player.

I want to be clear about how I helped with the story, because Tim Harford was very generous with his praise on the programme. That was kind of him and the team, but my contribution was essentially a web scraping exercise.

I'm not an astronomer. I do use computational statistics in my job but I work primarily with social, economic and political data. Richard asked if I could help them with the story by scraping the data from the web, which I did. I’m not even sure this is the best source for the data, but it is a source, and one that was relatively easy to use.

Here’s a chart of the data that I gathered for the story — these are the “wiggly lines” Professor David Rothery talked about during the piece. It shows that on average over the last fifty years Mercury was the planet that was closest to Earth.

A chart showing how the distances of Mercury, Venus and Mars from Earth vary over time
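For intuition about why Mercury comes out closest on average, here is a simplified Python sketch. It is not the scraping code from the gist and does not use the scraped data: it assumes circular, coplanar orbits with approximate radii in astronomical units, in which case the long-run time average equals the average over the relative angle between Earth and the planet.

```python
import math

# Approximate orbital radii in astronomical units, treated as circular,
# coplanar orbits. This is a simplification, not the scraped data.
RADII_AU = {"Mercury": 0.387, "Venus": 0.723, "Mars": 1.524}
EARTH_AU = 1.0


def mean_distance(radius, steps=3600):
    """Average Earth-planet distance over evenly spaced relative angles."""
    total = 0.0
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        # Law of cosines for the triangle Sun-Earth-planet.
        total += math.sqrt(EARTH_AU ** 2 + radius ** 2
                           - 2 * EARTH_AU * radius * math.cos(angle))
    return total / steps


means = {planet: mean_distance(r) for planet, r in RADII_AU.items()}
closest = min(means, key=means.get)
for planet, value in means.items():
    print(f"{planet:8s} {value:.2f} AU")
print("closest on average:", closest)
```

Venus gets nearer to Earth than Mercury ever does, but it also spends a long time on the far side of the Sun, which pulls its average up.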

If you want to reproduce the chart yourself, you can download the Python code to gather the data and generate the image from this gist.