olihawkins

Visualising migration between the countries of the UK

14 Mar 2017 21:55 GMT

I've been experimenting with Sankey diagrams using d3 and thought I'd share an example. This visualisation shows migration flows between the different countries of the UK in the year ending June 2015. The data comes from the Office for National Statistics annual release on internal migration. In this dataset, internal migration refers to people moving to a new home in a different part of the UK.

When it comes to migration between the countries of the UK, most of the flows are between England and each of the other countries. There is much less direct migration between Wales, Scotland, and Northern Ireland. This may be because of the geographical arrangement of the UK, the size of England relative to the other countries, the movement of people to and from England's major cities (especially London), or a combination of all those things.

The flows between England and each other part of the UK are fairly balanced, with a similar number of people moving in each direction. Interestingly, the flows between Wales and England are slightly larger than the flows between Scotland and England, even though the population of Scotland is larger than that of Wales. What's not shown in this visualisation is the large number of moves within England itself.

One aspect of these charts that I'm undecided on is how the links — the flowing bridges between the origin and destination nodes — should be shaded. Most of the examples of Sankey diagrams made with d3 use a single neutral colour for all the links (see Mike Bostock's example). In this case I have used asymmetric shading: the links are shaded according to their origin node. This lets you trace the flows from their origin, reading from left to right, while you can easily see the composition of the flows at the destination without having to trace them back.

Some thoughts on Constituency Boundaries

4 Feb 2017 15:04 GMT

A few months ago I launched Constituency Boundaries, a small web app that lets you explore proposed changes to the boundaries of Parliamentary constituencies, which are being put forward as part of the 2018 Boundary Review.

I won't go over the details of how the app works here (the homepage explains it), but I wanted to set down a few thoughts about the process of developing this kind of software, because I think it has some interesting characteristics that put it somewhere between data journalism and traditional web application development.

Constituency Boundaries was developed primarily as a form of data journalism: to help answer a specific set of questions about a particular subject. The app exists because in the House of Commons Library we need a tool that not only allows us to compare the current and proposed constituency boundaries, but also lets us explore how constituencies are composed, and discover what other possible arrangements of wards might produce alternative constituencies that also meet the boundary review criteria.

This requirement was essential, because if someone disagrees with some aspect of the proposed boundaries, it helps their case in the consultation process if they can show that a better alternative arrangement exists. (And making the app public means other people can use it for this purpose too.)

Because the app is designed to help people engage with a particular process, it doesn't have to be all things to all people. It's not a general tool for exploring theoretical arrangements of Parliamentary constituencies under every possible set of criteria; it directly addresses the questions raised by the 2018 review.

The development process was also journalistic in that the app was built to meet an event-driven news deadline — the announcement of the initial proposals for new constituencies — and to perform its role for the limited lifetime of that story, which concludes in 2018. This meant concentrating on core features and aiming for just a little more than the minimum viable product. The software is a living prototype rather than a fully evolved application.

And I'd argue that was the right approach because, as this chart from Mapbox's tile server shows, the peak of demand for the app was at its launch, on the day the initial proposals for England and Wales were published. So for this kind of software the features that matter are the features you can deliver on day one.

A chart showing a spike in website traffic at the time the app launched.

One way data journalism differs from application development is that you have more opportunity to iterate on your code between projects than within them. (Constituency Boundaries evolved from Population Builder, for example.)

At the same time, the deadline was known long enough in advance, and the boundary review process has a long enough lifetime, to make it worth some real effort. The app was built quickly, but only relative to the features that it provides. It was more work than is involved in a simple interactive chart or map, and it needed additional development on the server.

I think there is a role in data journalism for small standalone data-driven apps like this: apps that expose large datasets to the public in a way that makes them explorable and easy to understand. They can yield more than one story. They can even move the story on.

Getting data on Members of Parliament with Python

25 Aug 2016 09:27 GMT

The UK Parliament has a little-known but very useful web service called the Members Names Information Service. It provides a public interface to Parliament's Members Names database, which holds information on all MPs elected to Parliament since the 1983 General Election.[1]

Over the last two years I have been using MNIS to help answer questions about MPs, writing ad-hoc Python code to extract information as and when I needed it. It's a flexible and powerful API, but it's not very easy to use.

During my summer break I decided to write a dedicated library to make routine work with MNIS as easy as possible. I have published the code in a small Python package called mnis.

The mnis library

At the most basic level, the mnis library allows you to download key data on all MPs who were serving on a given date to a csv with a single line of code. It makes it easy to obtain the same summary data for MPs as a list of Python dictionaries, or alternatively to get the full data for each MP returned by the API. The library allows you to customise the parameters sent to the API through a simple interface and makes possible quite sophisticated analysis of MPs' careers.

The library's summary functions provide the following data on MPs by default:

  • Member ID
  • Name
  • Constituency
  • Party
  • Date of Birth
  • Gender
  • Date first became an MP
  • Number of days service (excluding dissolution periods)

The mnis library is a personal project and it is unofficial. I am sharing it ‘as is’ in case it is useful to others.

Python setup

The mnis library is written in Python 3. Install it into your chosen environment with: ‘pip install mnis’.

Downloading data on MPs

To download summary data on all MPs serving on a given date to a csv, pass a datetime.date object to the downloadMembers function. The constituency, party, and number of days served shown for each MP are as at the given date.


import mnis
import datetime

# Download data on current MPs into members.csv
mnis.downloadMembers(datetime.date.today(), 'members.csv')

# Download data on MPs serving on 25 Aug 2016 into members.csv
mnis.downloadMembers(datetime.date(2016, 8, 25), 'members.csv')

To do exactly the same thing step by step, giving you access to all the available data at each stage, do the following:


import mnis
import datetime

# Create a date for the analysis
d = datetime.date.today()

# Download full data for MPs serving on this date as a list
members = mnis.getCommonsMembersOn(d)

# Get the summary data for these members as a list
sd = mnis.getSummaryDataForMembers(members, d)

# Save the summary data into members.csv
mnis.saveSummaryDataForMembers(sd, 'members.csv')

Note that a date is passed both to functions for downloading member data (in this case getCommonsMembersOn) and to functions for extracting summary data (getSummaryDataForMembers). This is because the functions that extract summary data for each MP from their full record need to return the party, constituency, and number of days served for a particular date.

In many cases the date used to get members will be the same as the date used to extract summary data about those members, but it doesn't have to be.

This means you can get all MPs serving on a particular date, or between particular dates, and then find out which parties and constituencies they were representing on a different date. If an MP was not serving on the date used for summarising the data, the summary data will report that they weren't serving on that date, so you can do things like find out which of a group of MPs serving on one date were still serving at a later date.[2]
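As a rough sketch of that last idea, the snippet below filters a list of summary dictionaries down to the MPs who were still serving on the summary date. It runs on made-up rows rather than real API output, and two things in it are assumptions: the 'constituency' key name, and the empty string used as the value reported for an MP who was not serving. Check what the library actually returns before relying on either.

```python
# ASSUMPTION: the value the summary data reports in the constituency
# field for an MP who was not serving on the summary date
NOT_SERVING = ''

def still_serving(summary, not_serving_value=NOT_SERVING):
    """Return the summary rows for MPs who were serving on the summary date."""
    return [row for row in summary if row['constituency'] != not_serving_value]

# Hypothetical rows standing in for the output of getSummaryDataForMembers
later_summary = [
    {'list_name': 'Member A', 'constituency': 'Constituency A'},
    {'list_name': 'Member B', 'constituency': NOT_SERVING},
    {'list_name': 'Member C', 'constituency': 'Constituency C'},
]

for row in still_serving(later_summary):
    print(row['list_name'], '-', row['constituency'])
```

The same split between the request date and the summary date is also what lets you follow an MP's change of party over time.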

To give an example, the following code gets all MPs who served during the 2010-15 Parliament, including those elected at by-elections. If the date passed to getSummaryDataForMembers is for the start of the Parliament, Douglas Carswell MP is shown as a member of the Conservative Party; but if the date passed to getSummaryDataForMembers is for the end of the Parliament, his party is shown as the UK Independence Party.


import mnis
import datetime

# Create dates for the start and end of the 2010-15 Parliament
startDate = datetime.date(2010, 5, 7)
endDate = datetime.date(2015, 3, 30)

# Download full data for MPs serving between the dates as a list
members = mnis.getCommonsMembersBetween(startDate, endDate)

# Get the summary data for these members on the startDate
sd = mnis.getSummaryDataForMembers(members, startDate)

# Douglas Carswell's party is Conservative
print(sd[103]['list_name'], '-', sd[103]['party'])

# Get the summary data for these members on the endDate
sd = mnis.getSummaryDataForMembers(members, endDate)

# Douglas Carswell's party is UK Independence Party
print(sd[103]['list_name'], '-', sd[103]['party'])

In the above example, we requested all members who served at any point during the 2010-15 Parliament, but this wasn't strictly necessary. Douglas Carswell's record would have been returned in the results of any date-based request within that Parliament, except one that fell wholly within the period between his resignation as an MP on the 29th of August 2014 and his re-election at the Clacton by-election on the 9th of October 2014. But it was a good opportunity to show how you can request all members serving within a range of dates using getCommonsMembersBetween.
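The example above picks out Douglas Carswell by his position in the list, which will break if the API ever returns the records in a different order. A minimal sketch of a more robust lookup is to search on the 'list_name' field instead. The find_member helper is hypothetical (it is not part of mnis), and the rows below are made-up stand-ins for real summary data, including the assumed format of the names.

```python
def find_member(summary, name_fragment):
    """Return the first summary row whose 'list_name' contains name_fragment."""
    fragment = name_fragment.lower()
    for row in summary:
        if fragment in row['list_name'].lower():
            return row
    return None

# Hypothetical rows standing in for the output of getSummaryDataForMembers
start_summary = [
    {'list_name': 'Cameron, Mr David', 'party': 'Conservative'},
    {'list_name': 'Carswell, Mr Douglas', 'party': 'Conservative'},
]
end_summary = [
    {'list_name': 'Cameron, Mr David', 'party': 'Conservative'},
    {'list_name': 'Carswell, Mr Douglas', 'party': 'UK Independence Party'},
]

# Look the member up by name in each summary rather than by index
for label, summary in (('start', start_summary), ('end', end_summary)):
    row = find_member(summary, 'Carswell')
    print(label, '-', row['list_name'], '-', row['party'])
```

A substring match returns the first hit only, so for common surnames it is safer to match on the full list name.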

API gotchas

The Members Names database is an administrative system as well as a record of historical data, and there are some inconsistencies in recording practices to look out for. In particular, in some cases MPs are listed as serving up to the date of the general election at which they were defeated or stepped down, while in others they are listed as serving up to the date of dissolution before the general election at which they were defeated or stepped down.

This does not affect the calculation of the number of days served by a member, which excludes any period of dissolution irrespective of how the memberships are recorded. However, it does affect the MPs returned by date-based API requests.
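To see why the count is unaffected, here is a small sketch of one way to count days of service while excluding dissolution. It is not the library's own code: the function names are hypothetical, the dissolution periods are hard-coded, and the half-open counting convention (the start date counts, the end date does not) is an assumption.

```python
import datetime

# Dissolution periods, each running from the date of dissolution to the
# date of the following general election (hard-coded for this sketch)
DISSOLUTIONS = [
    (datetime.date(2010, 4, 12), datetime.date(2010, 5, 6)),
    (datetime.date(2015, 3, 30), datetime.date(2015, 5, 7)),
]

def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two half-open date ranges [start, end)."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return max((end - start).days, 0)

def days_service(start, end, dissolutions=DISSOLUTIONS):
    """Days between start and end, minus any days of dissolution."""
    total = (end - start).days
    for d_start, d_end in dissolutions:
        total -= overlap_days(start, end, d_start, d_end)
    return total

# An MP first elected in 2005 who left in 2010, recorded in the two ways
elected = datetime.date(2005, 5, 5)
to_dissolution = days_service(elected, datetime.date(2010, 4, 12))
to_election = days_service(elected, datetime.date(2010, 5, 6))

# Both recording practices give the same number of days
print(to_dissolution == to_election)
```

Both recording practices give the same count here, because the extra days fall wholly within the dissolution period. They still change which MPs a date-based request returns.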

For example, requesting all members serving on the date of the 2010 General Election with getCommonsMembersOn returns the 650 MPs elected on that date and the 225 MPs who were either defeated or stood down at that election. This is not the case for the 2015 General Election: a date-based request for members serving on the date of that election returns just those elected on that day.[3]

There are two simple solutions to this problem. First, if you are only interested in MPs returned at a particular general election you can use the function getCommonsMembersAtElection, which uses a different API call and only returns those MPs elected on that date. The function takes the year of the general election as a string and will return records for any general election since 1983.


import mnis

# Get only the MPs elected at the 2010 General Election
members = mnis.getCommonsMembersAtElection('2010')

Alternatively, if you want to request MPs based on a date range starting at a general election, use the day after the general election as the start date. The membership hasn't changed between election day and the following day at any of the general elections since 1983, so requesting the MPs serving on the day following a general election is equivalent to asking for the MPs elected at that election. This is how the data was requested in the above example showing Douglas Carswell's change of party.[4]
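A minimal sketch of the day-after approach is shown below. The election dates are hard-coded in a small dictionary purely for illustration. It is a stand-in, not the structure used by the mnis.housedata module, and day_after_election is a hypothetical helper.

```python
import datetime

# General election dates hard-coded for this sketch
GENERAL_ELECTIONS = {
    '2010': datetime.date(2010, 5, 6),
    '2015': datetime.date(2015, 5, 7),
}

def day_after_election(year):
    """Return the day after the general election held in the given year."""
    return GENERAL_ELECTIONS[year] + datetime.timedelta(days=1)

# Start date to use for a date range beginning at the 2010 General Election
start_date = day_after_election('2010')
print(start_date)  # 2010-05-07
```

The resulting date can then be passed as the start date to getCommonsMembersBetween, as in the Douglas Carswell example.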

A less simple solution, which provides the most fine-grained control, is to request the full data for all members with a date-based request and then filter the list using the dates of their House memberships. In most cases this sort of approach is not necessary, but it is wise to check the data returned by the API before automating any analysis.
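The filtering idea can be sketched as follows. The records here are simplified, made-up stand-ins: each member has a name and a list of (start, end) membership spells, which is not the structure of the full data returned by the API. Both helper functions are hypothetical.

```python
import datetime

def was_serving_on(memberships, date):
    """True if any spell covers the date (an end of None means still serving)."""
    return any(start <= date and (end is None or date <= end)
               for start, end in memberships)

def serving_beyond(memberships, date):
    """True if any spell covers the date and continues after it."""
    return any(start <= date and (end is None or date < end)
               for start, end in memberships)

election_day = datetime.date(2010, 5, 6)

# Hypothetical simplified records illustrating the two recording practices
members = [
    # Outgoing MP recorded as serving up to election day
    {'name': 'Member A',
     'memberships': [(datetime.date(2005, 5, 5), election_day)]},
    # Outgoing MP recorded as serving up to dissolution
    {'name': 'Member B',
     'memberships': [(datetime.date(2005, 5, 5), datetime.date(2010, 4, 12))]},
    # MP elected at the 2010 General Election
    {'name': 'Member C',
     'memberships': [(election_day, None)]},
]

# A plain date test picks up the outgoing MP recorded to election day
returned = [m['name'] for m in members
            if was_serving_on(m['memberships'], election_day)]

# Requiring the spell to continue past election day removes them
kept = [m['name'] for m in members
        if serving_beyond(m['memberships'], election_day)]

print(returned)  # ['Member A', 'Member C']
print(kept)      # ['Member C']
```

With real data the same tests would need to be applied to the House membership dates in each member's full record.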

Further information

This post covers the basics of using the mnis library to extract data from Members Names. In a future post I will take a deeper dive into the mnis library, showing how to customise API requests and write your own data extraction functions.

Footnotes

1. The API returns information on some members before 1983, but coverage is incomplete before then and becomes more sparse the further you go back in time.

2. Use the constituency field, rather than the party field, to test whether an MP was serving on a given date, as an MP who later served in the House of Lords will also have party memberships associated with their Lords membership.

3. The same issue does not appear to affect outgoing members at by-elections, whose end date is either the date of their death, or the date of their formal resignation as an MP under official Parliamentary procedures.

4. Dates of general elections and dissolutions are available as a dictionary in the mnis.housedata module.

An introduction to programming with Python for complete beginners

14 Apr 2016 21:14 GMT

Earlier this week I ran a workshop teaching basic Python to people with little or no programming experience. I promised on Twitter that if it went well I would share the slides, which you can download here in PDF or PowerPoint format.

This is the first time I have ever tried to teach anyone how to program from scratch, and designing the workshop was not easy. There is a basic minimum you need to know before you can do anything useful as a programmer, but expecting beginners to absorb a lot of theory before writing a single line of code seems like a mistake.

The joy of programming comes from seeing your code run. To this day I think there is something magical about typing a string of symbols and making something happen. So this workshop tries to give people that experience as quickly as possible.

It's built around typing very short code examples into an online Python interpreter in order to see the concepts being taught in action. When running the workshop I tried to give everyone enough time to absorb what they were doing, to experiment with the code, and to test their intuitions about how it worked.

I felt this approach worked reasonably well, but I need to reflect more before deciding whether to change anything in future. In the meantime, here are the slides in case they are useful to anyone teaching or learning to code for the first time.

Further adventures in treemapping the UK's migrant population

9 Apr 2016 14:05 GMT

Earlier this week I posted a treemap showing the UK's migrant population by country of birth. A common reaction among people who saw it was to wonder what the size of the UK's foreign-born population was relative to the size of the UK-born population.

To help put that in context I have produced a new nested treemap showing the population of the UK broken down by region and broad country of birth. The population in each region is grouped into those born in the UK, those born in other EU countries, and those born in countries outside the EU.

While it's interesting to see the data visualised in this way, the advantages of using a treemap rather than a traditional bar chart are much less obvious in this case. When showing the migrant population by individual country of birth, a treemap lets you compare data for a very large number of countries in a way that is much easier to gloss than a bar chart. It allows you to group countries into common geographical regions, which represent the group's aggregate size. And the arrangement of countries from largest to smallest in each group provides a good visual representation of the distribution of the population within the group.

In this case, the UK's migrant population is too small as a proportion of the total population to break down into individual countries of birth, or anything more than two or three groups. Arguably the most interesting thing about the visualisation is how small the migrant population appears relative to the size of the UK-born population in every region outside London. On the other hand, a treemap makes it harder to make exact comparisons between the size of the migrant population in each region.

In short, I don't think this treemap is as effective as the last one, probably because a treemap is less well suited to the data being presented. But the new version of D3 made it just as easy to produce this treemap as the last one, and I thought it was worth sharing for those who asked to see it.