29 Sep 2019 16:08 GMT
The library uses D3 to make the charts but offers a simple higher-level interface. However, some of the D3 selections that are used to build and run the charts are exposed in the chart objects, so you can use D3 to extend the behaviour of a chart if you wish.
I won't run through all of the features of the library here — the documentation is fairly detailed. But I wanted to publish some examples of the library in action, so that people can see what it looks like before deciding if they want to try it. The following four examples show some of the most important features of the library.
Simple versions of these examples can be found in the example folder on GitHub, which you can use as a starting point if you want to make your own animated chart.
As I always say when I publish something completely new: this is work in progress, it's evolving, there may be bugs I haven't found yet, don't bank on interface stability until version 1.0 etc. But please do have a play and let me know if you have any suggestions or run into any problems.
7 Apr 2019 13:29 GMT
I have been working with ggplot2 themes lately, including developing a theme for the House of Commons Library. In my spare time I have also been working on a plotting theme for personal projects. For this theme I wanted a vibrant colour palette for representing discrete categories that is accessible to people with the most common kinds of colour blindness.[1]
I don't have any expertise in colour vision, but there are lots of good tools to help you test how a given set of colours look to different people, including Susie Lu and Elijah Meeks' Viz Palette and the Coblis colour blindness simulator.
I spent a while playing around with colours in Viz Palette, trying to come up with a set that are visually distinct for most people. I was interested to see how many categories it was possible to safely represent.
I managed to come up with a six-colour palette that seems to work reasonably well in most cases. I stress that this was just the result of my own inexpert trial and error, and I am sure there is room for improvement. The colours are:
- blue [#0080e8]
- sky [#70d0ff]
- mint [#98f098]
- yellow [#ffa000]
- green [#009900]
- magenta [#c00060]
There are two caveats to bear in mind. First, this palette still fails for people with the most severe and least common types of colour blindness, such as tritanopia and monochromacy.
Second, the palette takes advantage of differences in the perceived brightness of different colours, so colours that look different to people with no colour deficiency appear as different shades of the same colour to people with protanopia and deuteranopia.
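One rough way to see these brightness differences is to compute the WCAG relative luminance of each colour. This is only a sketch: relative luminance is one common proxy for perceived brightness, and it is not how the palette was chosen (that was trial and error in Viz Palette). The names `PALETTE` and `relative_luminance` are illustrative and are not part of any package.

```python
# The six palette colors from the list above, in palette order.
PALETTE = {
    "blue": "#0080e8",
    "sky": "#70d0ff",
    "mint": "#98f098",
    "yellow": "#ffa000",
    "green": "#009900",
    "magenta": "#c00060",
}


def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB hex color: 0 is black, 1 is white."""
    # Split "#rrggbb" into three channels scaled to the range 0..1.
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB gamma curve to get linear-light values.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    # Weight the channels by their contribution to perceived brightness.
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


luminance = {name: relative_luminance(code) for name, code in PALETTE.items()}

# Print the colors from darkest to brightest.
for name, value in sorted(luminance.items(), key=lambda item: item[1]):
    print(f"{name:8s} {value:.3f}")
```

Colours with similar luminance have to be told apart by hue alone, which is exactly what colour blindness makes harder.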
In practice, this means you need to take care if you interpolate between the colours in this palette to represent more than six categories. Doing this may reduce accessibility, as it risks creating colours that are not visually distinct to some people. The sequence of the colours in the palette has been chosen to reduce this risk, but it needs to be considered. To be on the safe side, use the same number of colours from the palette as there are categories and don't interpolate.
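The "don't interpolate" rule can be enforced in code. The sketch below is a hypothetical helper, not part of the pilot package. It hands out palette colours one per category and refuses to invent extra ones. Taking the colours in palette order is my assumption about what "use the same number of colours" means in practice.

```python
# The six palette colors, in the order listed above.
PALETTE = ["#0080e8", "#70d0ff", "#98f098", "#ffa000", "#009900", "#c00060"]


def assign_colors(categories):
    """Map each distinct category to one palette color, without interpolating."""
    # Keep the first occurrence of each category, preserving order.
    unique = list(dict.fromkeys(categories))
    if len(unique) > len(PALETTE):
        raise ValueError(
            f"{len(unique)} categories but only {len(PALETTE)} colors; "
            "merge some categories rather than interpolating new colors"
        )
    # zip stops at the shorter input, so only the colors needed are used.
    return dict(zip(unique, PALETTE))


# Hypothetical category labels, purely for illustration.
print(assign_colors(["A", "B", "A", "C"]))
```

Raising an error for a seventh category is deliberate: it forces a decision about merging categories instead of silently generating colours that may not be distinguishable.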
The following charts approximate how this palette looks as a basic column chart to people with different types of vision. The colours shown are based on the transformations shown on Viz Palette, which I assume are a good approximation.
If you are colour blind in one of the ways listed here, I would love to know if you can in fact distinguish between the colours in the first chart in the way the colour blindness simulators suggest you are able to.
I wanted to test the palette in the most difficult case. So here is the same comparison using a messy scatter plot with all six categories. The charts show turnout in Parliamentary constituencies at the 2017 General Election by median age, with each constituency shaded according to its settlement class, based on the Library's city and town classification of constituencies.
And finally, here is the same comparison again with a line chart, showing an index of the value of various technology stocks over the last decade.
The palette is available as ggplot2 scales in the pilot package, which is my general purpose plotting theme. You are more than welcome to use or fork it, but I make no guarantees that the theme elements won't change in future.
1. A quick note about the spelling of colour. I'm British so I spell colour “colour”. But in most English-based programming languages the word colour is spelled “color”. Consequently, I've developed a habit of using “color” everywhere in source code, and “colour” in all other contexts. But it's leaky. Is a GitHub readme source code? Probably, if it includes function signatures.
11 Mar 2019 08:38 GMT
A few months ago, when ONS published their population estimates for Parliamentary constituencies in mid-2017, I worked on an analysis of constituencies by median age. This included some interactive charts produced with D3, which I was never able to publish as the online part of the project didn't get off the ground. Rather than see that work go to waste, I decided to publish the charts here for posterity.
The charts in question are grid histograms, which show the distribution of Parliamentary constituencies by median age at the 2017 General Election. In these charts, each square represents a single constituency, and each chart shows the squares shaded by another variable, such as party, turnout or majority. Click on the image below or the link above to see the interactive versions of these charts.
On average, median age was lower in seats won by Labour and higher in seats won by the Conservatives. Turnout tended to be higher in seats with a higher median age. And some of Labour's biggest majorities were in seats with the lowest median age.
As a way of showing the strength of the relationship between two variables, I think this type of chart is probably less successful than a scatterplot. But as a way of showing the distribution of one variable within another, I think it works quite well, although perhaps better for the continuous variables than for the categorical variable.
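The layout behind a grid histogram is simple enough to sketch. The original charts were built with D3, so the Python below is not that code. It is a minimal illustration of the idea: bin each record by one variable, then stack one square per record in its bin. The function name, the bin width and the sample records are all hypothetical.

```python
import math
from collections import defaultdict


def grid_histogram_layout(records, value_key, bin_width=1.0):
    """Give every record a (column, row) cell: column is its bin, row its stack height."""
    next_row = defaultdict(int)  # next free row in each column
    cells = []
    # Sort by value so squares within a column are stacked in a stable order.
    for record in sorted(records, key=lambda r: r[value_key]):
        column = math.floor(record[value_key] / bin_width)
        cells.append({"column": column, "row": next_row[column], "record": record})
        next_row[column] += 1
    return cells


# Made-up records: one per constituency, with a second variable for shading.
sample = [
    {"name": "Seat A", "median_age": 30.2, "party": "X"},
    {"name": "Seat B", "median_age": 41.5, "party": "Y"},
    {"name": "Seat C", "median_age": 30.8, "party": "Y"},
]

for cell in grid_histogram_layout(sample, "median_age"):
    print(cell["column"], cell["row"], cell["record"]["name"])
```

Each cell keeps its full record, so a drawing layer can shade the square by any other field, such as party or turnout.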
10 Feb 2019 17:27 GMT
During my career as a journalist I used to work for BBC Radio Current Affairs, where I would often work on Radio 4's pop stats programme More or Less. I don’t think anyone would be surprised to hear it was my favourite programme.
I am still in touch with the team that make the show, and every now and again the editor Richard Vadon sends me a message to chat about something statistical. Last year he sent me a DM with an interesting question.
More or Less wanted to work out which planet was closest to Earth on average, given how their relative positions change over time. They were exploring ways of getting the data and wanted to know if I could help. They did the story on the programme a couple of weeks ago, and they have produced a special version of that show for the BBC’s new interactive web player.
I want to be clear about how I helped with the story, because Tim Harford was very generous with his praise on the programme, which was kind of him and the team, but it was essentially a web scraping exercise.
I'm not an astronomer. I do use computational statistics in my job but I work primarily with social, economic and political data. Richard asked if I could help them with the story by scraping the data from the web, which I did. I’m not even sure this is the best source for the data, but it is a source, and one that was relatively easy to use.
Here’s a chart of the data that I gathered for the story — these are the “wiggly lines” Professor David Rothery talked about during the piece. It shows that on average over the last fifty years Mercury was the planet that was closest to Earth.
If you want to reproduce the chart yourself, you can download the Python code to gather the data and generate the image from this gist.
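For intuition about why Mercury comes out on top, here is a back-of-the-envelope model. To be clear, this is not the method used for the story, which relied on scraped data. It assumes circular orbits in a single plane, with approximate orbital radii in astronomical units, and averages the Earth-to-planet distance over every relative position of the two planets. Only the three nearest candidates are included.

```python
import math

# Approximate orbital radii in astronomical units (AU), treated as circles.
ORBIT_RADIUS_AU = {"Mercury": 0.387, "Venus": 0.723, "Mars": 1.524}
EARTH_RADIUS_AU = 1.0


def mean_distance(r1, r2, steps=10_000):
    """Average distance between two bodies on circular coplanar orbits.

    Over a long period the angle between the two bodies, as seen from the
    Sun, is spread evenly around the circle, so averaging over that angle
    stands in for averaging over time.
    """
    total = 0.0
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        # Law of cosines: separation of two points at radii r1 and r2.
        total += math.sqrt(r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(theta))
    return total / steps


averages = {
    planet: mean_distance(EARTH_RADIUS_AU, radius)
    for planet, radius in ORBIT_RADIUS_AU.items()
}

for planet, distance in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{planet:8s} {distance:.2f} AU")
```

Venus gets closer to Earth than Mercury ever does, but it also spends long stretches on the far side of the Sun. Mercury's small orbit keeps it nearer on average.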
27 Jan 2019 18:26 GMT
I recently published two software packages for downloading and analysing data from the new Parliamentary data platform.
The data platform is an ambitious project, which aims to be a canonical source of integrated open data on Parliamentary activity. The data is stored in RDF and is available through a publicly accessible SPARQL endpoint. You can see the structure of the data stored in the platform visualised with WebVOWL.
These packages provide an easy way to use the data platform API in both R and Python. They are aimed at people who want to use Parliamentary data for research and analysis. Their main feature is that they let you easily download data in a structure and format that is suitable for analysis, preserving the links between data so that it is easy to combine the results of different queries.
The packages provide two different interfaces to the data platform:
- A low-level interface that takes a SPARQL SELECT query, sends it to the platform, and returns the result as a tibble (R) or a DataFrame (Python), with data types appropriately converted.
- A high-level interface comprising families of functions for downloading specific datasets. This currently focuses on key data about Members of both Houses of Parliament.
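To give a flavour of what "data types appropriately converted" involves, here is a minimal sketch. It is not the packages' actual code or API. It takes a response in the standard SPARQL JSON results format and turns it into a list of typed rows, which is the shape a DataFrame constructor accepts. The function name, the set of converters and the sample response are all my own illustrations.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

# Converters for a few common XSD datatypes; anything else stays a string.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "boolean": lambda v: v in ("true", "1"),
    # Keep only the YYYY-MM-DD part, dropping any timezone suffix.
    XSD + "date": lambda v: date.fromisoformat(v[:10]),
}


def bindings_to_rows(result):
    """Convert a SPARQL JSON SELECT result into a list of typed dicts."""
    columns = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        row = {}
        for column in columns:
            cell = binding.get(column)  # unbound variables are simply absent
            if cell is None:
                row[column] = None
            else:
                convert = CONVERTERS.get(cell.get("datatype"), str)
                row[column] = convert(cell["value"])
        rows.append(row)
    return rows


# A made-up response in the standard format, for illustration only.
sample_result = {
    "head": {"vars": ["person", "name", "start_date"]},
    "results": {"bindings": [
        {
            "person": {"type": "uri", "value": "https://example.org/people/1"},
            "name": {"type": "literal", "value": "Example Member"},
            "start_date": {"type": "literal", "datatype": XSD + "date",
                           "value": "2017-06-08+01:00"},
        },
        {
            "person": {"type": "uri", "value": "https://example.org/people/2"},
            "name": {"type": "literal", "value": "Another Member"},
        },
    ]},
}

for row in bindings_to_rows(sample_result):
    print(row)
```

Keeping identifiers such as the `person` URI as a column in every result is what makes it easy to join the results of different queries afterwards.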
I think the data platform is great. It's a really valuable piece of public data infrastructure that has the potential to become a comprehensive digital record of what Parliament does. I hope to expand these packages as more data is added to the platform in future.