4 Feb 2017 15:04 GMT
A few months ago I launched Constituency Boundaries, a small web app that lets you explore proposed changes to the boundaries of Parliamentary constituencies, which are being put forward as part of the 2018 Boundary Review.
I won't go over the details of how the app works here (the homepage explains it), but I wanted to set down a few thoughts about the process of developing this kind of software, because I think it has some interesting characteristics that put it somewhere between data journalism and traditional web application development.
Constituency Boundaries was developed primarily as a form of data journalism: to help answer a specific set of questions about a particular subject. The app exists because in the House of Commons Library we need a tool that not only allows us to compare the current and proposed constituency boundaries, but also lets us explore how constituencies are composed, and discover what other possible arrangements of wards might produce alternative constituencies that also meet the boundary review criteria.
This requirement was essential, because if someone disagrees with some aspect of the proposed boundaries, it helps their case in the consultation process if they can show that a better alternative arrangement exists. (And making the app public means other people can use it for this purpose too.)
Because the app is designed to help people engage with a particular process, it doesn't have to be all things to all people. It's not a general tool for exploring theoretical arrangements of Parliamentary constituencies under every possible set of criteria; it directly addresses the questions raised by the 2018 review.
The development process was also journalistic in that the app was built to meet an event-driven news deadline — the announcement of the initial proposals for new constituencies — and to perform its role for the limited lifetime of that story, which concludes in 2018. This meant concentrating on core features and aiming for just a little more than the minimum viable product. The software is a living prototype rather than a fully evolved application.
And I'd argue that was the right approach because, as this chart from Mapbox's tile server shows, the peak of demand for the app was at its launch, on the day the initial proposals for England and Wales were published. So for this kind of software the features that matter are the features you can deliver on day one.
One way data journalism differs from application development is that you have more opportunity to iterate on your code between projects than within them. (Constituency Boundaries evolved from Population Builder, for example.)
At the same time, the deadline was known long enough in advance, and the boundary review process has a long enough lifetime, to make it worth some real effort. The app was built quickly, but only relative to the features it provides. It was more work than is involved in a simple interactive chart or map, and it needed additional development on the server.
I think there is a role in data journalism for small standalone data-driven apps like this: apps that expose large datasets to the public in a way that makes them explorable and easy to understand. They can yield more than one story. They can even move the story on.
25 Aug 2016 09:27 GMT
The UK Parliament has a little-known but very useful web service called the Members Names Information Service. It provides a public interface to Parliament's Members Names database, which holds information on all MPs elected to Parliament since the 1983 General Election.[1]
Over the last two years I have been using MNIS to help answer questions about MPs, writing ad-hoc Python code to extract information as and when I needed it. It's a flexible and powerful API, but it's not very easy to use.
During my summer break I decided to write a dedicated library to make routine work with MNIS as easy as possible. I have published the code in a small Python package called mnis.
The mnis library
At the most basic level, the mnis library allows you to download key data on all MPs who were serving on a given date to a csv with a single line of code. It makes it easy to obtain the same summary data for MPs as a list of Python dictionaries, or alternatively to get the full data for each MP returned by the API. The library allows you to customise the parameters sent to the API through a simple interface and makes possible quite sophisticated analysis of MPs' careers.
The library's summary functions provide the following data on MPs by default:
- Member ID
- Date of Birth
- Date first became an MP
- Number of days service (excluding dissolution periods)
The mnis library is a personal project and it is unofficial. I am sharing it ‘as is’ in case it is useful to others.
The mnis library is written in Python 3. Install it into your chosen environment with: ‘pip install mnis’.
Downloading data on MPs
To download summary data on all MPs serving on a given date to a csv, pass a datetime.date object to the downloadMembers function. The constituency, party, and number of days served shown for each MP are as at the given date.
import datetime
import mnis
# Download data on current MPs into members.csv
mnis.downloadMembers(datetime.date.today(), 'members.csv')
# Download data on MPs serving on 25 Aug 2016 into members.csv
mnis.downloadMembers(datetime.date(2016, 8, 25), 'members.csv')
To do exactly the same thing step by step, giving you access to all the available data at each stage, do the following:
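import csv
import datetime
import mnis
# Create a date for the analysis
d = datetime.date.today()
# Download full data for MPs serving on this date as a list
members = mnis.getCommonsMembersOn(d)
# Get the summary data for these members as a list
sd = mnis.getSummaryDataForMembers(members, d)
# Save the summary data into members.csv (written here with Python's csv module)
with open('members.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(sd[0].keys()))
    writer.writeheader()
    writer.writerows(sd)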
Note that a date is passed both to functions for downloading member data (in this case getCommonsMembersOn) and to functions for extracting summary data (getSummaryDataForMembers). This is because the functions that extract summary data for each MP from their full record need to return the party, constituency, and number of days served for a particular date.
In many cases the date used to get members will be the same as the date used to extract summary data about those members, but it doesn't have to be.
This means you can get all MPs serving on a particular date, or between particular dates, and then find out which parties and constituencies they were representing on a different date. If an MP was not serving on the date used for summarising the data, the summary data will report that they weren't serving on that date. So you can, for example, find out which of a group of MPs serving on one date were still serving at a later date.[2]
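As a rough sketch of that last idea, the snippet below filters a list of summary dictionaries down to the MPs who were serving on the summary date. It tests the constituency field, and it assumes that a non-serving MP has an empty or missing constituency value. That is a guess on my part, so check what getSummaryDataForMembers actually reports before relying on it. The sample rows and the still_serving helper are hypothetical, made up for illustration.

```python
# Hypothetical summary rows, shaped like the dictionaries described above.
# ASSUMPTION: a non-serving MP has an empty (or missing) constituency value.
sample_rows = [
    {'list_name': 'Member A', 'constituency': 'Constituency X', 'party': 'Party 1'},
    {'list_name': 'Member B', 'constituency': '', 'party': 'Party 2'},
    {'list_name': 'Member C', 'constituency': 'Constituency Y', 'party': 'Party 2'},
]


def still_serving(summary_rows, not_serving_values=('', None)):
    """Return the rows whose constituency field shows the MP was serving."""
    return [row for row in summary_rows
            if row.get('constituency') not in not_serving_values]


# Print only the members who have a constituency on the summary date
for row in still_serving(sample_rows):
    print(row['list_name'], '-', row['constituency'])
```

With real data you would pass in the list returned by getSummaryDataForMembers for the later date, and adjust not_serving_values to match whatever placeholder the library uses.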
To give an example, the following code gets all MPs who served during the 2010-15 Parliament, including those elected at by-elections. If the date passed to getSummaryDataForMembers is for the start of the Parliament, Douglas Carswell MP is shown as a member of the Conservative Party; but if the date passed to getSummaryDataForMembers is for the end of the Parliament, his party is shown as the UK Independence Party.
import datetime
import mnis
# Create dates for the start and end of the 2010-15 Parliament
startDate = datetime.date(2010, 5, 7)
endDate = datetime.date(2015, 3, 30)
# Download full data for MPs serving between the dates as a list
members = mnis.getCommonsMembersBetween(startDate, endDate)
# Get the summary data for these members on the startDate
sd = mnis.getSummaryDataForMembers(members, startDate)
# Douglas Carswell's party is Conservative
for m in sd:
    if 'Carswell' in m['list_name']:
        print(m['list_name'], '-', m['party'])
# Get the summary data for these members on the endDate
sd = mnis.getSummaryDataForMembers(members, endDate)
# Douglas Carswell's party is UK Independence Party
for m in sd:
    if 'Carswell' in m['list_name']:
        print(m['list_name'], '-', m['party'])
In the above example, we requested all members who served at any point during the 2010-15 Parliament, but this wasn't strictly necessary. Douglas Carswell's record would have been returned in the results of any date-based request within that Parliament, except one that fell wholly within the period between his resignation as an MP on the 29th of August 2014 and his re-election at the Clacton by-election on the 9th of October 2014. But it was a good opportunity to show how you can request all members serving within a range of dates using getCommonsMembersBetween.
The Members Names database is an administrative system as well as a record of historical data, and there are some inconsistencies in recording practices to look out for. In particular, in some cases MPs are listed as serving up to the date of the general election at which they were defeated or stepped down, while in others they are listed as serving up to the date of dissolution before the general election at which they were defeated or stepped down.
This does not affect the calculation of the number of days served by a member, which excludes any period of dissolution irrespective of how the memberships are recorded. However, it does affect the MPs returned by date-based API requests.
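To illustrate why the recording practice makes no difference to the day count, here is a minimal sketch of a days-served calculation that subtracts any overlap with a dissolution period. It is not the library's own code. The days_served function is hypothetical, and the boundary convention (counting from the start date up to but not including the end date, and treating dissolution as running from the dissolution date up to election day) is my assumption.

```python
import datetime

# Dissolution periods as (dissolution date, general election date) pairs
DISSOLUTIONS = [
    (datetime.date(2010, 4, 12), datetime.date(2010, 5, 6)),
    (datetime.date(2015, 3, 30), datetime.date(2015, 5, 7)),
]


def days_served(start, end, dissolutions=DISSOLUTIONS):
    """Days between start and end, minus any days that fall in a dissolution."""
    total = (end - start).days
    for dissolved, elected in dissolutions:
        overlap_start = max(start, dissolved)
        overlap_end = min(end, elected)
        if overlap_end > overlap_start:
            total -= (overlap_end - overlap_start).days
    return total


# An MP elected in 2010 whose service is recorded to dissolution in 2015,
# and the same service recorded to the 2015 election date instead
start = datetime.date(2010, 5, 6)
recorded_to_dissolution = days_served(start, datetime.date(2015, 3, 30))
recorded_to_election = days_served(start, datetime.date(2015, 5, 7))

# Both recording practices give the same number of days
print(recorded_to_dissolution, recorded_to_election)
```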
For example, requesting all members serving on the date of the 2010 General Election with getCommonsMembersOn returns the 650 MPs elected on that date and the 225 MPs who were either defeated or stood down at that election. This is not the case for the 2015 General Election: a date-based request for members serving on the date of that election returns just those elected on that day.[3]
There are two simple solutions to this problem. First, if you are only interested in MPs returned at a particular general election you can use the function getCommonsMembersAtElection, which uses a different API call and only returns those MPs elected on that date. The function takes the year of the general election as a string and will return records for any general election since 1983.
members = mnis.getCommonsMembersAtElection('2010')
Alternatively, if you want to request MPs based on a date range starting at a general election, use the day after the general election as the start date. The membership hasn't changed between election day and the following day at any of the general elections since 1983, so requesting the MPs serving on the day following a general election is equivalent to asking for the MPs elected at that election. This is how the data was requested in the above example showing Douglas Carswell's change of party.[4]
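A small helper makes the day-after rule harder to get wrong. This is only a sketch: the GENERAL_ELECTIONS dictionary and the day_after_election function are my own illustrative names, written out here so the example is self-contained, and they are not part of the mnis library.

```python
import datetime

# Dates of UK general elections since 1983, keyed by year as a string
GENERAL_ELECTIONS = {
    '1983': datetime.date(1983, 6, 9),
    '1987': datetime.date(1987, 6, 11),
    '1992': datetime.date(1992, 4, 9),
    '1997': datetime.date(1997, 5, 1),
    '2001': datetime.date(2001, 6, 7),
    '2005': datetime.date(2005, 5, 5),
    '2010': datetime.date(2010, 5, 6),
    '2015': datetime.date(2015, 5, 7),
}


def day_after_election(year):
    """Return the day after the general election held in the given year."""
    return GENERAL_ELECTIONS[year] + datetime.timedelta(days=1)


# Start date for a request covering the 2010-15 Parliament
startDate = day_after_election('2010')
print(startDate)
```

The result for 2010 is the 7th of May 2010, which is the start date used in the Douglas Carswell example.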
A less simple solution, which provides the most fine-grained control, is to request the full data for all members with a date-based request and then filter the list using the dates of their House memberships. In most cases this sort of approach is not necessary, but it is wise to check the data returned by the API before automating any analysis.
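The filtering step might look something like the sketch below. It works on hypothetical simplified records, each with a name and a list of (start, end) membership dates, rather than on the records the API really returns, which are structured differently and would need converting first. The serving_beyond function and the sample members are invented for illustration.

```python
import datetime

# Hypothetical simplified records: each member has a name and a list of
# (start, end) House membership dates, where end is None if still serving.
election_day = datetime.date(2010, 5, 6)
sample_members = [
    {'name': 'Outgoing member',
     'memberships': [(datetime.date(2005, 5, 5), election_day)]},
    {'name': 'New member',
     'memberships': [(election_day, None)]},
    {'name': 'Re-elected member',
     'memberships': [(datetime.date(2005, 5, 5), datetime.date(2010, 4, 12)),
                     (election_day, None)]},
]


def serving_beyond(members, date):
    """Keep members with a membership that starts on or before the date
    and ends after it (or has not ended)."""
    return [m for m in members
            if any(start <= date and (end is None or end > date)
                   for start, end in m['memberships'])]


# The outgoing member, whose membership ends on election day, is dropped
for m in serving_beyond(sample_members, election_day):
    print(m['name'])
```

Requiring the end date to fall after the date of interest, rather than on or after it, is what removes MPs recorded as serving up to the election at which they left the House.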
This post covers the basics of using the mnis library to extract data from Members Names. In a future post I will take a deeper dive into the mnis library, showing how to customise API requests and write your own data extraction functions.
1. The API returns information on some members before 1983, but coverage is incomplete before then and becomes more sparse the further you go back in time.
2. Use the constituency field, rather than the party field, to test whether an MP was serving on a given date, as an MP who later served in the House of Lords will also have party memberships associated with their Lords membership.
3. The same issue does not appear to affect outgoing members at by-elections, whose end date is either the date of their death, or the date of their formal resignation as an MP under official Parliamentary procedures.
4. Dates of general elections and dissolutions are available as a dictionary in the mnis.housedata module.
9 Apr 2016 14:05 GMT
Earlier this week I posted a treemap showing the UK's migrant population by country of birth. A common reaction among people who saw it was to wonder what the size of the UK's foreign-born population was relative to the size of the UK-born population.
To help put that in context I have produced a new nested treemap showing the population of the UK broken down by region and broad country of birth. The population in each region is grouped into those born in the UK, those born in other EU countries, and those born in countries outside the EU.
While it's interesting to see the data visualised in this way, the advantages of using a treemap rather than a traditional bar chart are much less obvious in this case. When showing the migrant population by individual country of birth, a treemap lets you compare data for a very large number of countries in a way that is much easier to gloss than a bar chart. It allows you to group countries into common geographical regions, with the area of each group representing its aggregate size. And the arrangement of countries from largest to smallest in each group provides a good visual representation of the distribution of the population within the group.
In this case, the UK's migrant population is too small as a proportion of the total population to break down into individual countries of birth, or anything more than two or three groups. Arguably the most interesting thing about the visualisation is how small the migrant population appears relative to the size of the UK-born population in every region outside London. On the other hand, a treemap makes it harder to make exact comparisons between the size of the migrant population in each region.
In short, I don't think this treemap is as effective as the last one, probably because the form is less well suited to the data being presented. Still, the new version of D3 made it just as easy to produce this treemap as the last one, and I thought it was worth sharing for those who asked to see it.
5 Apr 2016 08:04 GMT
It's been exciting seeing version 4.0 of D3 develop, and last week Mike Bostock announced that d3-hierarchy was now included in the alpha. This module provides various ways of visualising hierarchical data, including treemaps.
I have wanted to explore using treemaps for some time, as much of the data I work with could potentially be presented in this way, so this seemed like an ideal opportunity to start experimenting.
My first attempt is a treemap showing the foreign-born population of the UK broken down by country of birth. The countries are grouped into broad global regions, using a particular arrangement of the new country groupings the Office for National Statistics has recently introduced in reporting migration statistics. Within the group for the European Union, the EU14, EU8, and EU2 are grouped separately.
The figures are taken from the most recent quarterly Labour Force Survey, which is for the fourth quarter of 2015. They are estimates of all people born abroad who were living in the UK at the time of the survey, excluding two small groups: those born in British overseas territories, and those who did not fully specify their country of birth. (The latter group consists of people who, for example, said they were born in the USSR but did not say which current country that would be.)
The treemap itself is relatively simple and leans heavily on Mike Bostock's example code, with just a few presentational tweaks. But it's the first time I have seen this data (which is very familiar to me) laid out in this way. It was extremely simple to get this working and I am looking forward to delving deeper into d3-hierarchy to explore what else is possible.