R and Python packages for the Parliamentary data platform

27 Jan 2019 18:26 GMT

I recently published two software packages for downloading and analysing data from the new Parliamentary data platform.

The data platform is an ambitious project that aims to be a canonical source of integrated open data on Parliamentary activity. The data is stored in RDF and is available through a publicly accessible SPARQL endpoint. You can see the structure of the data stored in the platform visualised with WebVOWL.

These packages provide a simple way to use the data platform API from both R and Python. They are aimed at people who want to use Parliamentary data for research and analysis. Their main feature is that they download data in a structure and format suitable for analysis, preserving the links between datasets so that the results of different queries are easy to combine.
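To illustrate what combining the results of different queries can look like, here is a minimal sketch in plain Python. It is not the packages' own code: the `join_on` helper, the `member_id` column, and the row values are all hypothetical placeholders, chosen only to show two result sets being joined on a shared identifier.

```python
def join_on(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row for every matching (left, right) pair.
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


# Placeholder rows standing in for the results of two separate queries.
members = [
    {"member_id": "m1", "name": "Member A"},
    {"member_id": "m2", "name": "Member B"},
]
memberships = [
    {"member_id": "m1", "party": "Party X"},
    {"member_id": "m1", "party": "Party Y"},
]

combined = join_on(members, memberships, "member_id")
```

With a tibble or a DataFrame the same step would normally be a single join or merge call; the point is only that a shared identifier column is what makes it possible.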

The packages provide two different interfaces to the data platform:

  • A low-level interface that takes a SPARQL SELECT query, sends it to the platform, and returns the result as a tibble (R) or a DataFrame (Python), with data types appropriately converted.
  • A high-level interface comprising families of functions for downloading specific datasets. This currently focuses on key data about Members of both Houses of Parliament.
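As a rough sketch of the idea behind the low-level interface, the following Python uses only the standard library to send a SELECT query to a SPARQL endpoint and convert the typed values in the response. It is not the packages' implementation: the function names, the set of datatypes handled, and the `endpoint_url` argument are assumptions made for illustration, and it assumes the endpoint accepts form-encoded POST requests and returns the standard SPARQL JSON results format.

```python
import json
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request, urlopen

XSD = "http://www.w3.org/2001/XMLSchema#"

# Converters for a few common XSD datatypes; anything else stays a string.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "boolean": lambda v: v in ("true", "1"),
    # Keep only the YYYY-MM-DD part, dropping any timezone suffix.
    XSD + "date": lambda v: date.fromisoformat(v[:10]),
}


def convert_term(term):
    """Convert one SPARQL JSON result term to a Python value."""
    if term is None:  # variable unbound in this row
        return None
    converter = CONVERTERS.get(term.get("datatype"))
    return converter(term["value"]) if converter else term["value"]


def parse_results(payload):
    """Turn a SPARQL JSON results document into a list of row dicts."""
    columns = payload["head"]["vars"]
    return [
        {col: convert_term(binding.get(col)) for col in columns}
        for binding in payload["results"]["bindings"]
    ]


def select(endpoint_url, query):
    """POST a SELECT query to a SPARQL endpoint and return typed rows."""
    request = Request(
        endpoint_url,  # assumption: supplied by the caller
        data=urlencode({"query": query}).encode("utf-8"),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return parse_results(json.load(response))
```

The list of row dicts returned by `select` can be passed straight to `pandas.DataFrame(rows)` to get a DataFrame with one column per query variable.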

I think the data platform is great. It's a really valuable piece of public data infrastructure that has the potential to become a comprehensive digital record of what Parliament does. I hope to expand these packages as more data is added to the platform in future.