Going for broke in King of Tokyo

23 Jul 2017 15:24 GMT

King of Tokyo is a lighthearted but engrossing board game that's based on the classic monster movie genre. You play a monster in the style of Godzilla or King Kong and your objective is to defeat the other monsters battling for control of Tokyo. Since my brother introduced me to the game last summer it has quickly become a family favourite. It's a game I can play with my seven year-old daughter and her 83 year-old grandad and everyone has a good time.

I won't run through all of the rules here. The only thing you need to understand to read this article is the game's central dice mechanism. Underneath the thematic flavour of giant monsters with superpowers, King of Tokyo is a dice rolling game with similar rules to Yahtzee. The dice are the traditional six-sided type, but with different faces. These are the digits 1, 2, and 3, a claw, a heart, and a lightning bolt.

On your turn you roll six dice up to three times. After each roll you can keep any of the dice that you like and continue rolling the remaining dice until you either have all the faces that you want, or you run out of rolls. The dice that are face up at the end of all your rolls are your hand for that turn, and you then resolve the outcomes.

Dice with numbers give you victory points if you have three or more of the same number. Claws deal damage to your opponents, hearts restore your health, and lightning gives you energy which you can use to buy power cards.

Quite often in the game you can benefit from getting dice of more than one type on the same turn: do some damage to an opponent and heal yourself a bit; collect some lightning for power cards and gain a few victory points.

But at other times you need to go for broke and try to get as many of one dice face as possible. This tends to happen at the most decisive moments in the game: when you need just a few more victory points to win outright, when you need to do enough damage to kill an opponent before they win, when you need to heal quickly to prevent your own impending death, or when you need to collect enough lightning to buy a vital power card before someone else.

You can model this strategy with a Python function that looks like this:

import numpy as np

def max_outcome_for_face(dice=6, rolls=3):

    hits = 0
    outcome = [0] * rolls

    for roll in range(rolls):

        # Get a random number from one to six for each dice
        results = np.random.randint(1, 7, size=dice)
        # Count the number of ones: a one is a hit
        numhits = np.count_nonzero(results == 1)
        # Add to hits and remove a dice for each hit
        hits = hits + numhits
        dice = dice - numhits 

        # Store the hits after each roll
        outcome[roll] = hits

    return outcome

This function simulates a turn in the game with the given number of dice and rolls — six dice with three rolls by default. After each roll, it counts the number of times the target face was rolled, adds that number to a running total for the number of dice with the target face, and removes those dice with the target face from subsequent rolls. After all the rolls have completed, it returns a list showing the cumulative number of dice with the target face after each roll.

If you call this function a large enough number of times and collect the results, you can obtain the probability distribution for the outcomes of this strategy for any given number of dice and rolls. The heatmap below shows the distribution of outcomes after ten million turns with six dice and up to four rolls.

A heatmap showing the probability distribution of outcomes for dice rolls in King of Tokyo where the player is trying to get as many of one dice face as possible.

In King of Tokyo you get three rolls with six dice by default, so while this chart shows the outcomes for up to four rolls, the third column is the most relevant. This shows that if you are trying to get as many dice as possible with one particular face, you have an 80% chance of getting two or more dice with the given face, and a 21% chance of getting four or more.

So why does the heatmap show probabilities for up to four rolls? Because some of the power cards you can buy in the game give you an extra roll of the dice, and the Giant Brain card in particular gives you an extra roll as a permanent effect. With a fourth roll of the dice the probability of getting two or more dice with the target face increases to 91%, and the probability of getting four or more rises to 38%.

There are also power cards in the game that give you an extra dice on each roll: the Extra Head card gives you this as a permanent effect. Here is the distrbution of outcomes for seven dice and up to four rolls.

A heatmap showing the probability distribution of outcomes for dice rolls in King of Tokyo where the player is trying to get as many of one dice face as possible.

This shows that an extra dice is worth less than an extra roll when you are going after a particular face. With seven dice and three rolls the probability of getting two or more of a given face is 87%, and the probability of getting four or more is 33%. That's better than six dice with three rolls, but not as good as six dice with four. Although it's worth noting that an extra dice confers other benefits — you get more stuff.

Of course, if you can get the combination of cards that gives you seven dice with four rolls, then getting four or more dice with the target face becomes more likely than not, with a 54% chance.

This analysis was done using numpy and pandas, and the heatmaps were produced with matplotlib and seaborn. The complete source code is available on GitHub.

Making hexmaps with D3

24 Jun 2017 14:27 GMT

I spend a lot of time working with data for Parliamentary constituencies. I often want to map constituency data in a way that gives each constituency equal weight, especially when mapping election data, but until recently I had never seen an interactive hexmap of constituencies online. So I was really pleased when two months ago the Open Data Institute released not just a hexmap of Parliamentary constituencies, but also a specification for describing hexmap data called HexJSON.

You can read more about the HexJSON spec on the ODI's website, but briefly: HexJSON describes a hexmap as a set of hexes with column (q) and row (r) coordinates within a given coordinate system, which is specified with the layout property.

HexJSON that looks like this:

  "hexes": {

Describes a hexmap that looks like this:

An example of a hexmap with four columns, four rows, and a pointy top layout.

I wanted a way to render hexjson data generally, so I wrote a small D3 plugin called d3-hexjson, which takes a hexjson object and generates the data necessary to render it easily with D3.

Code examples can be found in the GitHub readme, and are shown in two blocks by Henry Lau. Giuseppe Sollazzo used d3-hexjson to create a visualisation showing the potential impact of swing on the number of Conservative and Labour seats won at the 2017 General Elecion, and wrote an article explaining how he did it.

My first use of the plugin was to run live hexmaps of the 2017 General Election results as they came in overnight on polling day. We were recording the results as they were announced at work, so we could send the data to the hexmaps very easily. At one point we fell behind the announced results as most of our election volunteers did not start until 3:00am and the results were already coming in quickly at that point.

Here are links to all the completed hexmaps showing the 2017 results, along with the 2015 results for comparison.

Animating the difference between seats and votes in first past the post elections

9 May 2017 07:44 GMT

I've been playing with animated treemaps. This treemap illustrates the difference between the number of seats and votes won by political parties in each nation and region of Great Britain at the 2015 General Election. Northern Ireland is not shown as NI has its own distinct political parties, so comparisons with other parts of the UK are less meaningful.

I wanted to try presenting the data in this way because it helps address an interesting question: what is the role of animation in data visualisation? I've been thinking about this question ever since posting an animated uncertainty chart a few years ago.

Animation is superficially appealing because the eye loves motion, and interfaces that respond to user input feel alive. But sometimes animations in visualisations add little if anything to the reader's understanding of the data. They can be like animated transitions in PowerPoint, which tend to communicate that the presenter has spent more time thinking about the style of their presentation than the content.

I think animations can be valuable when they help show something about the data. The extent of change in a transition can be a dimension along which comparisons can be made. In this case, you could show two treemaps side by side (one for seats and one for votes) but the eye would have to move between the two to find the same region and make comparisons. Here you can hold the eye still and observe the magnitude of the transition from one state to the other. That comparison is meaningful in this context because in a perfectly proportional electoral system there would be no difference at all.

I don't think the effect works in all cases. In particular, when a party's position in a nation or region changes along with its size you lose the visual baseline for the comparison and it becomes harder to judge by how much a party's share has changed. Nevertheless, I thought this was an efficient way to summarise a large dataset and the comparison was worth sharing.

Visualising migration between the countries of the UK

14 Mar 2017 21:55 GMT

I've been experimenting with Sankey diagrams using d3 and thought I'd share an example. This visualisation shows migration flows between the different countries of the UK in the year ending June 2015. The data comes from the Office for National Statistics annual release on internal migration. In this dataset, internal migration refers to people moving to a new home in a different part of the UK.

When it comes to migration between the countries of the UK, most of the flows are between England and each of the other countries. There is much less direct migration between Wales, Scotland, and Northern Ireland. This may be because of the geographical arrangement of the UK, the size of England relative to the other countries, the movement of people to and from England's major cities (especially London), or a combination of all those things.

The flows between England and each other part of the UK are fairly balanced, with a similar number of people moving in each direction. Interestingly, the flows between Wales and England are slightly larger than the flows between Scotland and England, even though the population of Scotland is larger than that of Wales. What's not shown in this visualisation is the large number of moves within England itself.

One aspect of these charts that I'm undecided on is how the links — the flowing bridges between the origin and destination nodes — should be shaded. Most of the examples of Sankey diagrams made with d3 use a single neutral colour for all the links (see Mike Bostock's example). In this case I have used asymmetric shading: the links are shaded according to their origin node only. This lets you trace the flows from their origin, reading from left to right, while you can easily see the composition of the flows at the destination without having to trace them back.

Some thoughts on Constituency Boundaries

4 Feb 2017 15:04 GMT

A few months ago I launched Constituency Boundaries, a small web app that lets you explore proposed changes to the boundaries of Parliamentary constituencies, which are being put forward as part of the 2018 Boundary Review.

I won't go over the details of how the app works here (the homepage explains it), but I wanted to set down a few thoughts about the process of developing this kind of software, because I think it has some interesting characteristics that put it somewhere between data journalism and traditional web application development.

Constituency Boundaries was developed primarily as a form of data journalism: to help answer a specific set of questions about a particular subject. The app exists because in the HoC Library we need a tool that not only allows us to compare the current and proposed constituency boundaries, but lets us explore how constituencies are composed, and discover what other possible arrangements of wards might produce alternative constituencies that also meet the boundary review criteria.

This requirement was essential, because if someone disagrees with some aspect of the proposed boundaries, it helps their case in the consultation process if they can show that a better alternative arrangement exists. (And making the app public means other people can use it for this purpose too.)

Because the app is designed to help people engage with a particular process, it doesn't have to be all things to all people. It's not a general tool for exploring theoretical arrangements of Parliamentary constituencies under every possible set of criteria; it directly addresses the questions raised by the 2018 review.

The development process was also journalistic in that the app was built to meet an event-driven news deadline — the announcement of the initial proposals for new constituencies — and to perform its role for the limited lifetime of that story, which concludes in 2018. This meant concentrating on core features and aiming for just a little more than the minimum viable product. The software is a living prototype rather than a fully evolved application.

And I'd argue that was the right approach because, as this chart from Mapbox's tile server shows, the peak of demand for the app was at its launch, on the day the initial proposals for England and Wales were published. So for this kind of software the features that matter are the features you can deliver on day one.

A chart showing a spike in website traffic at the time the app launched.

One way data journalism differs from application development is that you have more opportunity to iterate on your code between projects than within them. (Constituency Boundaries evolved from Population Builder, for example.)

At the same time, the deadline was known long enough in advance, and the boundary review process has a long enough lifetime, to make it worth some real effort. The app was built quickly, but relative to the features that it provides. It was more work than is involved in a simple interactive chart or map, and it needed additional development on the server.

I think there is a role in data journalism for small standalone data-driven apps like this: apps that expose large datasets to the public in a way that makes them explorable and easy to understand. They can yield more than one story. They can even move the story on.