29 Dec 2019 09:41 GMT
I had a busy end to the year. Parliament voted for a general election to be held two weeks before Christmas and I was once again running the House of Commons Library's data collection effort. This was my third general election working for the Library. We completed the data collection in record time and published the first edition of our briefing paper and datasets within a week of the polls closing.
In between the data collection, I found a bit of time to work on some data visualisation for the election. My colleague Carl Baker (who I worked with previously on MSOA Names) has designed a new constituency cartogram, which neatly balances equally sized constituencies with geographic groupings that make it easy to find particular constituencies and to see patterns within historic county areas.
I made an interactive version of the cartogram for showing the election results online on the morning after the vote. Embedding a small image of it doesn't do it justice; the interactive version of the election cartogram is better.
I think it is a really nice bit of visual design and I am happy with how we managed to make it look and work on the web in quite a short period of time. I hope we can do further work using a similar approach for other geographies next year.
17 Nov 2019 16:32 GMT
As I mentioned in my last post, I have been playing with uncertainty charts again. In that post, I wanted to simplify the task of creating animated bar charts so that I could easily create uncertainty bar charts with positive and negative values.
More recently, I have been exploring how the idea of animated uncertainty could be extended to other chart types. The following two experiments were inspired by suggestions from Paul Bolton and Harvey Goldstein.
Both examples are based on the same fundamental idea behind the uncertainty bar chart, which is to illustrate the statistical uncertainty in a set of estimates by generating alternative but equally plausible data using random values drawn from the error distribution for each estimate.
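A rough Python sketch of that idea follows. It is not the code behind the charts. It assumes each estimate has a 95% confidence interval and a normal error distribution, and the figures are made up for illustration rather than taken from the APS.

```python
import random

# Hypothetical estimates (thousands) with 95% confidence interval half-widths.
# These numbers are illustrative only, not the published APS figures.
ESTIMATES = [
    {"year": 2004, "estimate": 20.0, "ci": 6.0},
    {"year": 2008, "estimate": 31.0, "ci": 8.0},
    {"year": 2012, "estimate": 38.0, "ci": 9.0},
    {"year": 2016, "estimate": 47.0, "ci": 10.0},
]

Z_95 = 1.96  # standard errors spanned by a 95% interval half-width (normal assumption)


def draw_alternatives(estimates, rng):
    """Return one alternative data set: a random draw around each estimate."""
    alternatives = []
    for row in estimates:
        standard_error = row["ci"] / Z_95
        value = rng.gauss(row["estimate"], standard_error)
        alternatives.append({"year": row["year"], "value": value})
    return alternatives


if __name__ == "__main__":
    rng = random.Random()
    # Each call produces a different, equally plausible version of the series.
    for row in draw_alternatives(ESTIMATES, rng):
        print(row["year"], round(row["value"], 1))
```

Each call to `draw_alternatives` stands in for one click on a chart: the estimates stay fixed and only the random draws around them change.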
In these examples, I use estimates of the number of EU nationals living in the London Borough of Haringey, which are taken from the Annual Population Survey. These figures are published by the ONS in their regular statistical release on the population of the UK by country of birth and nationality.
Haringey is the borough where I live. There has been an increase in the number of EU nationals living in Haringey since 2004, but the precise extent of the increase is uncertain due to sampling error in the APS. Coincidentally, I was also a migrant to Haringey during this period (from Wales).
I have embedded screenshots of the new charts below, but please follow the links, or click the screenshots, to see the live animated versions.
Uncertainty line chart
The uncertainty line chart starts by showing the trend for the estimates as a line. Clicking on the chart generates alternative lines for the same estimates. Each line is drawn in sequence, and then fades gradually over time until it disappears. In this way the chart builds up a constantly evolving representation of the uncertainty in the estimates, showing the range of possible trends.
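The draw-then-fade behaviour can be sketched as a small bookkeeping routine. This is only an illustration: it assumes the fade happens in discrete steps (one per new line) and that opacity falls linearly, whereas the live chart fades continuously over time. The names `FADE_STEPS`, `add_line` and `opacity` are hypothetical.

```python
FADE_STEPS = 5  # assumed: a line disappears five steps after it is drawn


def add_line(lines, points):
    """Age every existing line by one step, drop faded ones, append the new line."""
    survivors = [
        {"points": line["points"], "age": line["age"] + 1}
        for line in lines
        if line["age"] + 1 < FADE_STEPS
    ]
    survivors.append({"points": points, "age": 0})
    return survivors


def opacity(line):
    """Linear fade: 1.0 when newly drawn, approaching 0 as the line ages."""
    return 1.0 - line["age"] / FADE_STEPS


if __name__ == "__main__":
    lines = []
    for click in range(8):
        # In a real chart the points would be a freshly generated random series.
        lines = add_line(lines, [click])
    for line in lines:
        print(line["points"], round(opacity(line), 2))
```

At any moment the chart shows only the most recent few lines, with the oldest drawn faintest, which is what gives the impression of a range of possible trends.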
Uncertainty level chart
The uncertainty level chart takes a slightly different approach. In this chart, each horizontal line represents the estimate of the number of EU nationals living in Haringey in a given year.
When you click on the chart, these lines move to newly generated random values, leaving a translucent shadow of the value in each case. Over time these shadows build up to represent the density of the error distribution: the more values drawn in a given region, the darker that region becomes.
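To see why overlapping shadows approximate the density, here is a small Python sketch. It assumes a normal error distribution, a fixed opacity for each shadow (`SHADOW_ALPHA`, a made-up value) and standard alpha compositing; the function names are hypothetical and this is not the chart's own code.

```python
import random
from collections import Counter

SHADOW_ALPHA = 0.1  # assumed opacity of a single shadow


def stacked_opacity(n_shadows, alpha=SHADOW_ALPHA):
    """Opacity of n overlapping translucent marks under standard alpha compositing."""
    return 1.0 - (1.0 - alpha) ** n_shadows


def shadow_density(estimate, standard_error, n_draws, bin_width, rng):
    """Count how many draws land in each bin, keyed by the bin's lower edge."""
    counts = Counter()
    for _ in range(n_draws):
        value = rng.gauss(estimate, standard_error)
        lower_edge = (value // bin_width) * bin_width
        counts[lower_edge] += 1
    return counts


if __name__ == "__main__":
    counts = shadow_density(30.0, 3.0, 200, 1.0, random.Random())
    for lower_edge in sorted(counts):
        n = counts[lower_edge]
        # More draws in a bin means more stacked shadows and a darker region.
        print(lower_edge, n, round(stacked_opacity(n), 2))
```

Regions near the estimate collect the most draws, so they darken fastest, while the tails stay faint.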
Like the animated bar charts, these charts show different trends that are equally likely given the uncertainty in the estimates. But unlike the bar charts, these versions also build up a visual representation of the overall degree of uncertainty.
In some respects, I quite like the epistemological terror that the bar charts can induce. Seeing a trend erased by variation is a useful antidote to the apparent solidity of numbers. But it does mean those charts are missing some important context, which is made explicit in these versions of animated uncertainty.
29 Sep 2019 16:08 GMT
The library uses D3 to make the charts but offers a simple higher-level interface. However, some of the D3 selections that are used to build and run the charts are exposed in the chart objects, so you can use D3 to extend the behaviour of a chart if you wish.
I won't run through all of the features of the library here — the documentation is fairly detailed. But I wanted to publish some examples of the library in action, so that people can see what it looks like before deciding if they want to try it. The following four examples show some of the most important features of the library.
Simple versions of these examples can be found in the example folder on GitHub, which you can use as a starting point if you want to make your own animated chart.
As I always say when I publish something completely new: this is work in progress, it's evolving, there may be bugs I haven't found yet, don't bank on interface stability until version 1.0 etc. But please do have a play and let me know if you have any suggestions or run into any problems.
7 Apr 2019 13:29 GMT
I have been working with ggplot2 themes lately, including developing a theme for the House of Commons Library. In my spare time I have also been working on a plotting theme for personal projects. For this theme I wanted a vibrant colour palette for representing discrete categories that is accessible to people with the most common kinds of colour blindness.¹
I don't have any expertise in colour vision, but there are lots of good tools to help you test how a given set of colours looks to different people, including Susie Lu and Elijah Meeks' Viz Palette and the Coblis colour blindness simulator.
I spent a while playing around with colours in Viz Palette, trying to come up with a set that is visually distinct for most people. I was interested to see how many categories it was possible to safely represent.
I managed to come up with a six-colour palette that seems to work reasonably well in most cases. I stress, this was just the result of my own inexpert trial and error. I am sure there is room for improvement. The colours are:
- blue [#0080e8]
- sky [#70d0ff]
- mint [#98f098]
- yellow [#ffa000]
- green [#009900]
- magenta [#c00060]
There are two caveats to bear in mind. First, this palette still fails for people with the most severe and least common types of colour blindness, such as tritanopia and monochromacy.
Second, the palette takes advantage of differences in the perceived brightness of different colours, so colours that look different to people with no colour deficiency appear as different shades of the same colour to people with protanopia and deuteranopia.
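One way to put numbers on those brightness differences is relative luminance, as defined in the WCAG accessibility guidelines. This is only one common proxy for perceived brightness, and not necessarily what Viz Palette or Coblis use. The sketch below computes it for the six colours listed above.

```python
PALETTE = {
    "blue": "#0080e8",
    "sky": "#70d0ff",
    "mint": "#98f098",
    "yellow": "#ffa000",
    "green": "#009900",
    "magenta": "#c00060",
}


def relative_luminance(hex_colour):
    """WCAG 2.x relative luminance of an sRGB hex colour (0 = black, 1 = white)."""
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB gamma encoding to get linear-light channel values.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


if __name__ == "__main__":
    # List the palette from darkest to lightest.
    for name in sorted(PALETTE, key=lambda n: relative_luminance(PALETTE[n])):
        print(name, round(relative_luminance(PALETTE[name]), 3))
```

Colours whose luminance values are well separated should stay distinguishable as lighter and darker shades even when their hues collapse together.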
In practice, this means you need to take care if you interpolate between the colours in this palette to represent more than six categories. Doing this may reduce accessibility, as it risks creating colours that are not visually distinct to some people. The sequence of the colours in the palette has been chosen to reduce this risk, but it needs to be considered. To be on the safe side, use the same number of colours from the palette as there are categories and don't interpolate.
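As a minimal sketch of the safe approach, the following Python helper hands out palette colours one per category and refuses to go beyond six instead of interpolating. The name `colours_for` is hypothetical, and taking the colours in palette order is an assumption; this is not how the pilot package implements its scales.

```python
PALETTE = ["#0080e8", "#70d0ff", "#98f098", "#ffa000", "#009900", "#c00060"]


def colours_for(categories):
    """Map each category to its own palette colour, without interpolating."""
    if len(categories) > len(PALETTE):
        raise ValueError(
            f"{len(categories)} categories but only {len(PALETTE)} colours; "
            "merge categories instead of interpolating"
        )
    # zip stops at the shorter input, so only as many colours as categories are used.
    return dict(zip(categories, PALETTE))


if __name__ == "__main__":
    print(colours_for(["first", "second", "third"]))
```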
The following charts approximate how this palette looks as a basic column chart to people with different types of vision. The colours shown are based on the transformations shown on Viz Palette, which I assume are a good approximation.
If you are colour blind in one of the ways listed here, I would love to know if you can in fact distinguish between the colours in the first chart in the way the colour blindness simulators suggest you are able to.
I wanted to test the palette in the most difficult case. So here is the same comparison using a messy scatter plot with all six categories. The charts show turnout in Parliamentary constituencies at the 2017 General Election by median age, with each constituency shaded according to its settlement class, based on the Library's city and town classification of constituencies.
And finally, here is the same comparison again with a line chart, showing an index of the value of various technology stocks over the last decade.
The palette is available as ggplot2 scales in the pilot package, which is my general purpose plotting theme. You are more than welcome to use or fork it, but I make no guarantees that the theme elements won't change in future.
1. A quick note about the spelling of colour. I'm British so I spell colour “colour”. But in most English-based programming languages the word colour is spelled “color”. Consequently, I've developed a habit of using “color” everywhere in source code, and “colour” in all other contexts. But it's leaky. Is a GitHub readme source code? Probably, if it includes function signatures.
11 Mar 2019 08:38 GMT
A few months ago, when ONS published their population estimates for Parliamentary constituencies in mid-2017, I worked on an analysis of constituencies by median age. This included some interactive charts produced with D3, which I was never able to publish as the online part of the project didn't get off the ground. Rather than see that work go to waste, I decided to publish the charts here for posterity.
The charts in question are grid histograms, which show the distribution of Parliamentary constituencies by median age at the 2017 General Election. In these charts, each square represents a single constituency, and each chart shows the squares shaded by another variable, such as party, turnout or majority. Click on the image below or the link above to see the interactive versions of these charts.
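The layout of a grid histogram can be sketched in a few lines of Python. This is an illustration only, not the D3 code behind the charts: the function name, the bin width and the constituency names and ages are all hypothetical.

```python
from collections import defaultdict


def grid_histogram(values, bin_width):
    """Assign each item a (column, row) cell: column = bin index, row = stack height."""
    heights = defaultdict(int)
    cells = {}
    # Process items in order of value so squares stack consistently within a bin.
    for name, value in sorted(values.items(), key=lambda item: item[1]):
        column = int(value // bin_width)
        cells[name] = (column, heights[column])
        heights[column] += 1
    return cells


if __name__ == "__main__":
    # Hypothetical constituencies and median ages, for illustration only.
    median_age = {"A": 29.5, "B": 38.2, "C": 38.9, "D": 41.0, "E": 45.7}
    for name, cell in grid_histogram(median_age, bin_width=2).items():
        print(name, cell)
```

Each square keeps its own identity in the grid, which is what allows it to be shaded by a second variable such as party or turnout.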
On average, median age was lower in seats won by Labour and higher in seats won by the Conservatives. Turnout tended to be higher in seats with a higher median age. And some of Labour's biggest majorities were in seats with the lowest median age.
As a way of showing the strength of the relationship between two variables, I think this type of chart is probably less successful than a scatterplot. But as a way of showing the distribution of one variable within another, I think it works quite well, although perhaps better for the continuous variables than for the categorical variable.