18 Sep 2020 14:28 GMT
I have always been skeptical about the value of writing weeknotes. There are a million and one things I want to get done at work and the idea of writing weeknotes always felt like not doing any of them.
However, in the last few months I have started to notice how working from home has subtly changed the way we work. Some of these changes are good: I get longer periods of uninterrupted time between meetings, which makes it easier to be productive with development work.
But some are less good. In particular, I've noticed that with many people working from home there is less opportunity for serendipity, when you find out through casual conversations that what you are doing joins up neatly with something one of your colleagues is doing and you both benefit.
So while we're all keeping our distance, I thought it might help colleagues to know what I'm up to.
House of Commons Library GitHub account
This week we launched our House of Commons Library GitHub account. Since we started the data science programme two years ago, researchers in the Library have been working on tools to make routine data analysis easier. Some of this work is potentially useful to researchers outside Parliament, and we share code that has a general purpose when we can.
Until now we have all been using personal GitHub accounts, but some of the tools we have developed have become important enough that we need to maintain and manage them in one place. I hope that having an organisational account will also make collaboration and training easier. It's been a real pleasure seeing statistical researchers with no previous programming experience discover the potential of programming and get really good at it. We are now in a position to work collaboratively on shared projects as a team.
I spent a good part of this week working on a new R package for downloading data from Parliament's Committees API. The package is called clcommittees and I've published an initial working version on GitHub. We use data on committees in enquiries, briefings and dashboards. The package currently focusses on committees, memberships and roles, but it is likely to grow to include other data as and when we need it.
This package is part of a wider programme of work to develop our capability with Parliamentary data. We are developing tools to work with a range of Parliamentary data sources (including the data platform, now that it has been unpaused), so watch this space.
New chart style for the Library
This week, the Library launched its new chart style. The style was developed by my colleagues Carl Baker and Paul Bolton. I've implemented the style as a ggplot2 theme in an R package called clcharts, so that Library researchers can easily produce charts in the new style when doing computational data analysis. To give you a flavour, here's an example of a ridgeline plot made with the package.
You can see more examples in the GitHub readme at the link above. I think the new style looks great, and thanks to patchwork we have been able to fully implement it in R, which wasn't the case with our old chart style.
MSOA Names 1.5.0
Carl and I updated the MSOA Names dataset and map to version 1.5.0 to fix a couple of errors people had spotted. The dataset has been turning up everywhere from Public Health England's map of coronavirus cases to the Doogal postcode lookup service.
29 Dec 2019 09:41 GMT
I had a busy end to the year. Parliament voted for a general election to be held two weeks before Christmas and I was once again running the House of Commons Library's data collection effort. This was my third general election working for the Library. We completed the data collection in record time and published the first edition of our briefing paper and datasets within a week of the polls closing.
In between the data collection, I found a bit of time to work on some data visualisation for the election. My colleague Carl Baker (who I worked with previously on MSOA Names) has designed a new constituency cartogram, which neatly balances equally sized constituencies with geographic groupings that make it easy to find particular constituencies and to see patterns within historic county areas.
I made an interactive version of the cartogram for showing the election results online on the morning after the vote. Embedding a small image of it doesn't do it justice; the interactive version of the election cartogram is better.
I think it is a really nice bit of visual design and I am happy with how we managed to make it look and work on the web in quite a short period of time. I hope we can do further work using a similar approach for other geographies next year.
17 Nov 2019 16:32 GMT
As I mentioned in my last post, I have been playing with uncertainty charts again. In that post, I wanted to simplify the task of creating animated bar charts so that I could easily create uncertainty bar charts with positive and negative values.
More recently, I have been exploring how the idea of animated uncertainty could be extended to other chart types. The following two experiments were inspired by suggestions from Paul Bolton and Harvey Goldstein.
Both examples are based on the same fundamental idea behind the uncertainty bar chart, which is to illustrate the statistical uncertainty in a set of estimates by generating alternative but equally plausible data using random values drawn from the error distribution for each estimate.
In these examples, I use estimates of the number of EU nationals living in the London Borough of Haringey, which are taken from the Annual Population Survey. These figures are published by the ONS in their regular statistical release on the population of the UK by country of birth and nationality.
Haringey is the borough where I live. There has been an increase in the number of EU nationals living in Haringey since 2004, but the precise extent of the increase is uncertain due to sampling error in the APS. Coincidentally, I was also a migrant to Haringey during this period (from Wales).
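As a rough illustration of the resampling idea described above, here is a minimal Python sketch. It is not the code behind the charts. It assumes each estimate's sampling error is normally distributed and that the published 95% confidence interval half-width can be converted to a standard error by dividing by 1.96. The figures in ESTIMATES are made-up placeholders, not the ONS numbers, and the name draw_alternative is hypothetical.

```python
import random

# Hypothetical placeholder figures, NOT the published APS estimates:
# (year, estimate in thousands, 95% confidence interval half-width in thousands)
ESTIMATES = [
    (2004, 20.0, 6.0),
    (2005, 24.0, 7.0),
    (2006, 23.0, 7.0),
    (2007, 30.0, 8.0),
]

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def draw_alternative(estimates, rng):
    """Return one alternative series, drawing each value from its error distribution.

    Assumes normally distributed sampling error (an assumption of this sketch).
    """
    series = []
    for year, estimate, half_width in estimates:
        standard_error = half_width / Z_95
        value = rng.gauss(estimate, standard_error)
        # A population count cannot be negative, so clamp at zero.
        series.append((year, max(value, 0.0)))
    return series


# Each call produces another equally plausible version of the trend.
rng = random.Random(42)
for _ in range(3):
    print(draw_alternative(ESTIMATES, rng))
```

Each printed series corresponds to one redraw of a chart: the years stay fixed while the values move within the range implied by the sampling error.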
I have embedded screenshots of the new charts below, but please follow the links, or click the screenshots, to see the live animated versions.
Uncertainty line chart
The uncertainty line chart starts by showing the trend for the estimates as a line. Clicking on the chart generates alternative lines for the same estimates. Each line is drawn in sequence, and then fades gradually over time until it disappears. In this way the chart builds up a constantly evolving representation of the uncertainty in the estimates, showing the range of possible trends.
Uncertainty level chart
The uncertainty level chart takes a slightly different approach. In this chart, each horizontal line represents an estimate of the number of EU nationals living in Haringey in each year.
When you click on the chart, these lines move to newly generated random values, leaving a translucent shadow of the value in each case. Over time these shadows build up to represent the density of the error distribution: the more values drawn in a given region, the darker that region becomes.
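To see why overlapping shadows approximate a density, here is a small Python sketch of the build-up. It assumes standard alpha compositing, where n overlapping layers of opacity a give a combined opacity of 1 - (1 - a)^n, and normally distributed errors. The opacity value, the binning, and the function names are all hypothetical and are not taken from the chart's source.

```python
import random
from collections import Counter

SHADOW_ALPHA = 0.1  # hypothetical opacity of a single shadow line


def stacked_opacity(count, alpha=SHADOW_ALPHA):
    """Combined opacity of `count` overlapping layers, each with opacity `alpha`."""
    return 1.0 - (1.0 - alpha) ** count


def shadow_opacities(estimate, standard_error, clicks, rng, bin_width=1.0):
    """Simulate `clicks` random draws and return the opacity built up in each bin."""
    counts = Counter()
    for _ in range(clicks):
        value = rng.gauss(estimate, standard_error)
        # Group nearby values into the same bin so their shadows overlap.
        counts[round(value / bin_width) * bin_width] += 1
    return {bin_: stacked_opacity(n) for bin_, n in sorted(counts.items())}


# Bins near the estimate collect more draws, so they end up darker.
for bin_, opacity in shadow_opacities(25.0, 3.0, 200, random.Random(7)).items():
    print(f"{bin_:6.1f}  {opacity:.2f}")
```

Under these assumptions the opacity rises quickly at first and then saturates, so the darkness shows where values are concentrated without being strictly proportional to the number of draws.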
Like the animated bar charts, these charts show different trends that are equally likely given the uncertainty in the estimates. But unlike the bar charts, these versions also build up a visual representation of the overall degree of uncertainty.
In some respects, I quite like the epistemological terror that the bar charts can induce. Seeing a trend erased by variation is a useful antidote to the apparent solidity of numbers. But it does mean those charts are missing some important context, which is made explicit in these versions of animated uncertainty.
29 Sep 2019 16:08 GMT
The library uses D3 to make the charts but offers a simple higher-level interface. However, some of the D3 selections that are used to build and run the charts are exposed in the chart objects, so you can use D3 to extend the behaviour of a chart if you wish.
I won't run through all of the features of the library here — the documentation is fairly detailed. But I wanted to publish some examples of the library in action, so that people can see what it looks like before deciding if they want to try it. The following four examples show some of the most important features of the library.
Simple versions of these examples can be found in the example folder on GitHub, which you can use as a starting point if you want to make your own animated chart.
As I always say when I publish something completely new: this is work in progress, it's evolving, there may be bugs I haven't found yet, don't bank on interface stability until version 1.0 etc. But please do have a play and let me know if you have any suggestions or run into any problems.
7 Apr 2019 13:29 GMT
I have been working with ggplot2 themes lately, including developing a theme for the House of Commons Library. In my spare time I have also been working on a plotting theme for personal projects. For this theme I wanted a vibrant colour palette for representing discrete categories that is accessible to people with the most common kinds of colour blindness.1
I don't have any expertise in colour vision, but there are lots of good tools to help you test how a given set of colours looks to different people, including Susie Lu and Elijah Meeks' Viz Palette and the Coblis colour blindness simulator.
I spent a while playing around with colours in Viz Palette, trying to come up with a set that is visually distinct for most people. I was interested to see how many categories it was possible to safely represent.
I managed to come up with a six colour palette that seems to work reasonably well in most cases. I stress, this was just the result of my own inexpert trial and error. I am sure there is room for improvement. The colours are:
- blue [#0080e8]
- sky [#70d0ff]
- mint [#98f098]
- yellow [#ffa000]
- green [#009900]
- magenta [#c00060]
There are two caveats to bear in mind. First, this palette still fails for people with the most severe and least common types of colour blindness, such as tritanopia and monochromacy.
Second, the palette takes advantage of differences in the perceived brightness of different colours, so colours that look different to people with no colour deficiency appear as different shades of the same colour to people with protanopia and deuteranopia.
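One rough way to inspect the brightness differences between the six colours is to compute the WCAG relative luminance of each. The Python sketch below does this. It is only a proxy for perceived brightness, and it is an assumption of this example rather than the method used to design the palette or the transformation Viz Palette applies. The function name is hypothetical.

```python
# The six palette colours listed above, in palette order.
PALETTE = {
    "blue": "#0080e8",
    "sky": "#70d0ff",
    "mint": "#98f098",
    "yellow": "#ffa000",
    "green": "#009900",
    "magenta": "#c00060",
}


def relative_luminance(hex_color):
    """WCAG relative luminance of an sRGB hex colour: 0.0 is black, 1.0 is white."""
    hex_color = hex_color.lstrip("#")
    # Split into red, green and blue channels scaled to the range 0..1.
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding to get linear light values.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Print the colours from darkest to lightest.
for name, hex_color in sorted(PALETTE.items(), key=lambda item: relative_luminance(item[1])):
    print(f"{name:8s} {hex_color}  {relative_luminance(hex_color):.3f}")
```

Colours with well-separated luminance values are more likely to stay distinguishable when hue information is lost, although this measure does not simulate any particular type of colour blindness.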
In practice, this means you need to take care if you interpolate between the colours in this palette to represent more than six categories. Doing this may reduce accessibility, as it risks creating colours that are not visually distinct to some people. The sequence of the colours in the palette has been chosen to reduce this risk, but it needs to be considered. To be on the safe side, use the same number of colours from the palette as there are categories and don't interpolate.
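The "don't interpolate" advice can be enforced in code. The following Python sketch is a hypothetical helper, not part of the pilot package: it returns exactly as many palette colours as there are categories and refuses to go beyond six. Taking the colours in palette order is an assumption of this sketch.

```python
# The six palette colours listed above, in palette order.
PALETTE = {
    "blue": "#0080e8",
    "sky": "#70d0ff",
    "mint": "#98f098",
    "yellow": "#ffa000",
    "green": "#009900",
    "magenta": "#c00060",
}


def pick_colors(n_categories):
    """Return the first `n_categories` palette colours without interpolating.

    Raises ValueError instead of inventing in-between colours when more
    categories are requested than the palette holds.
    """
    if not 1 <= n_categories <= len(PALETTE):
        raise ValueError(
            f"palette supports 1 to {len(PALETTE)} categories, got {n_categories}"
        )
    return list(PALETTE.values())[:n_categories]


print(pick_colors(3))
```

Failing loudly for a seventh category forces a deliberate choice, such as merging categories, rather than silently generating colours that may not be distinct for some readers.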
The following charts approximate how this palette looks as a basic column chart to people with different types of vision. The colours shown are based on the transformations shown on Viz Palette, which I assume are a good approximation.
If you are colour blind in one of the ways listed here, I would love to know if you can in fact distinguish between the colours in the first chart in the way the colour blindness simulators suggest you are able to.
I wanted to test the palette in the most difficult case. So here is the same comparison using a messy scatter plot with all six categories. The charts show turnout in Parliamentary constituencies at the 2017 General Election by median age, with each constituency shaded according to its settlement class, based on the Library's city and town classification of constituencies.
And finally, here is the same comparison again with a line chart, showing an index of the value of various technology stocks over the last decade.
The palette is available as ggplot2 scales in the pilot package, which is my general purpose plotting theme. You are more than welcome to use or fork it, but I make no guarantees that the theme elements won't change in future.
1. A quick note about the spelling of colour. I'm British so I spell colour “colour”. But in most English-based programming languages the word colour is spelled “color”. Consequently, I've developed a habit of using “color” everywhere in source code, and “colour” in all other contexts. But it's leaky. Is a GitHub readme source code? Probably, if it includes function signatures.