Portugal will be the 2022 World Cup grand champion!
Simulations… Simulations everywhere
I’m not sure about you, but I get bonus points for correctly predicting the winner of the World Cup. Let’s give this some extra attention.
Calculations or simulations
You have two major ways to get the odds:
- Calculate the odds exactly
- Approximate the odds by simulation
Calculation
Let’s look at calculation first. Say you want to know the odds of getting a total of exactly 11 when rolling two dice. You could create a table with all the options and do the math to find the exact chance of getting a sum of 11.
Out of 36 possible combinations, only two result in a sum of 11: 5 + 6 and 6 + 5.
This means the chance of getting a sum of 11 when rolling two dice is:
2/36, or ~5.56%.
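The table can also be built by a few lines of Python. This is a minimal sketch of the calculation approach that lists every combination of two dice and counts the ones that add up to 11; the variable names are only illustrative.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of two six-sided dice: (1, 1), (1, 2), ..., (6, 6)
outcomes = list(product(range(1, 7), repeat=2))

# Keep only the outcomes that add up to 11
hits = [roll for roll in outcomes if sum(roll) == 11]

# Exact chance as a fraction (Fraction reduces 2/36 to 1/18)
chance = Fraction(len(hits), len(outcomes))

print(hits)  # [(5, 6), (6, 5)]
print(f"{len(hits)}/{len(outcomes)} = {float(chance):.2%}")  # 2/36 = 5.56%
```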
That is what calculation looks like. Now let’s have a look at the simulation approach.
Simulation
Again, we want to know the odds of getting exactly 11 when rolling two dice.
This time, we won’t create a table. Instead, we will just roll two dice a hundred times. Even better, we let a computer do it using the following piece of Python code:
import numpy as np
from collections import defaultdict

counts = defaultdict(int)
for i in range(100):
    # roll the dice (the upper bound 7 is exclusive, so each die shows 1 to 6)
    dice1 = np.random.randint(1, 7)
    dice2 = np.random.randint(1, 7)
    # calculate the sum
    dice_sum = dice1 + dice2
    # note the sum
    counts[dice_sum] += 1

# print the occurrences
for i in range(2, 13):
    print(f"{i:>2}: {counts.get(i, 0)} times")
Which gives the following results:
2: 3 times
3: 2 times
4: 7 times
5: 19 times
6: 14 times
7: 12 times
8: 10 times
9: 12 times
10: 16 times
11: 3 times
12: 2 times
We rolled an 11 three times, giving 3/100 = 3%, not so close to the 5.56% we calculated before. Maybe we just need some more rolls. Let’s try a thousand:
2: 32 times
3: 49 times
4: 84 times
5: 99 times
6: 145 times
7: 161 times
8: 153 times
9: 109 times
10: 78 times
11: 64 times
12: 26 times
This time we get 64/1000 or 6.4%. Already a bit closer, but it looks like we need even more. When we ramp up the number of simulations we get the following graph:
When we roll a million times we get a total of 55657 rolls with a sum of 11, or 55657/1000000 = ~5.57%. Pretty close to the original 5.56%.
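To see the estimate settle as the number of rolls grows, the experiment can be wrapped in a function and repeated for increasing sample sizes. The sketch below is one way to do that, not the exact code behind the numbers above: it uses the standard library's `random` module instead of NumPy, and `estimate_chance` and its `seed` parameter are names made up for this example.

```python
import random

def estimate_chance(target, n_rolls, seed=None):
    """Estimate the chance that two dice sum to `target` from `n_rolls` rolls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rolls):
        # random.randint includes both ends, so this is a die showing 1 to 6
        if rng.randint(1, 6) + rng.randint(1, 6) == target:
            hits += 1
    return hits / n_rolls

# More rolls should bring the estimate closer to the exact 2/36
for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} rolls: {estimate_chance(11, n, seed=42):.2%}")
```

The exact percentages depend on the seed, so they will not match the counts listed above, but the larger runs should land near 5.56%.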
What about the World Cup champion?
I’m glad you asked.
Looking at the two approaches above, the calculation method is out: we cannot create a table of all possible outcomes of all upcoming World Cup matches.
But what we can do is simulate the entire World Cup tournament a million times! So let’s do that.
To simulate a game, we will draw a score from a distribution depending on both teams’ FIFA ranking. We group each country into a range of FIFA ranks, then we look at the distribution of the scores for matches between these groups. For example:
Monday, November 21, Senegal (rank 18) vs. The Netherlands (rank 8).
Senegal will be in group rank 11–20, and The Netherlands will be in group rank 1–10.
Then we look at the distribution, just like we did in the previous blog post. One addition: this time it does matter whether we draw 0–1 or 1–0, since we would like to know whether Senegal or The Netherlands wins.
Two things to note:
- The bar for “other” is much higher now because we no longer group mirrored scores together (for example, 0–1 and 1–0), which results in many more distinct options.
- The second team winning is more likely, with 0–1, 1–2, and 0–2 all appearing before the first victory bar for team 1 with 1–0. This makes sense because the second team has the better FIFA ranking.
We can create this distribution for each possible pair of FIFA rank groups and then draw from it, giving us a sample score for each match.
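Drawing a score for one match could look roughly like the sketch below. It assumes rank groups of ten (1–10, 11–20, and so on), as in the Senegal vs. The Netherlands example. The probabilities in `SCORE_DISTRIBUTIONS` are made up for illustration and are not the real historical distribution, the “other” bucket is left out for simplicity, and `rank_group` and `draw_score` are hypothetical helper names.

```python
import random

def rank_group(rank, size=10):
    """Map a FIFA rank to its group, e.g. 8 -> (1, 10) and 18 -> (11, 20)."""
    low = (rank - 1) // size * size + 1
    return (low, low + size - 1)

# Made-up distribution for "rank 11-20 team vs. rank 1-10 team".
# Keys are (goals team 1, goals team 2); values are illustrative probabilities.
SCORE_DISTRIBUTIONS = {
    ((11, 20), (1, 10)): {
        (0, 1): 0.14, (1, 2): 0.12, (0, 2): 0.11, (1, 0): 0.10,
        (1, 1): 0.10, (0, 0): 0.08, (2, 1): 0.07, (0, 3): 0.06,
        (2, 0): 0.05, (2, 2): 0.04, (1, 3): 0.04, (3, 1): 0.03,
        (3, 0): 0.03, (2, 3): 0.03,
    },
}

def draw_score(rank1, rank2, rng=random):
    """Draw one (goals team 1, goals team 2) score for the two teams' rank groups."""
    dist = SCORE_DISTRIBUTIONS[(rank_group(rank1), rank_group(rank2))]
    scores = list(dist)
    weights = list(dist.values())
    # random.choices picks each score with probability proportional to its weight
    return rng.choices(scores, weights=weights, k=1)[0]

# Senegal (rank 18) vs. The Netherlands (rank 8)
print(draw_score(18, 8))
```

A full version would need one such distribution for every pair of rank groups, in both orders, since 0–1 and 1–0 are no longer interchangeable.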
The first draw
Before we run it a million times, let’s show a sample for Group A first.
Using the technique above, we can now have a score randomly selected from our distribution for each match. The chance column indicates the chance of this particular score happening given the grouped FIFA ranks of the two teams.
The scores above result in the following standing:
Meaning in this scenario The Netherlands and Ecuador would continue to the round of 16.
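Turning a set of drawn scores into a standing is a small bookkeeping exercise. The sketch below uses invented Group A scores (not the sample drawn above) and a simplified ranking: 3 points for a win, 1 for a draw, with ties broken by goal difference and then goals scored. The real FIFA tie-breakers go further than that, and the function name `standings` is just for this example.

```python
from collections import defaultdict

# Invented Group A results as (team 1, team 2, goals team 1, goals team 2)
results = [
    ("Qatar", "Ecuador", 0, 2),
    ("Senegal", "Netherlands", 0, 1),
    ("Qatar", "Senegal", 1, 1),
    ("Netherlands", "Ecuador", 1, 1),
    ("Ecuador", "Senegal", 1, 0),
    ("Netherlands", "Qatar", 3, 0),
]

def standings(results):
    """Return (team, stats) pairs sorted from first to last place."""
    table = defaultdict(lambda: {"points": 0, "gf": 0, "ga": 0})
    for team1, team2, goals1, goals2 in results:
        # goals for / goals against
        table[team1]["gf"] += goals1
        table[team1]["ga"] += goals2
        table[team2]["gf"] += goals2
        table[team2]["ga"] += goals1
        # 3 points for a win, 1 each for a draw
        if goals1 > goals2:
            table[team1]["points"] += 3
        elif goals1 < goals2:
            table[team2]["points"] += 3
        else:
            table[team1]["points"] += 1
            table[team2]["points"] += 1
    # Simplified ranking: points, then goal difference, then goals scored
    return sorted(
        table.items(),
        key=lambda item: (
            item[1]["points"],
            item[1]["gf"] - item[1]["ga"],
            item[1]["gf"],
        ),
        reverse=True,
    )

ranked = standings(results)
qualified = [team for team, _ in ranked[:2]]
print(qualified)  # ['Netherlands', 'Ecuador']
```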
We are now calculating the percentages and simulating the tournament, the best of both worlds.
Playing a million World Cups
We repeat the above process for all groups, then for the entire round of 16, quarter-finals, semi-finals, and finally… the final.
We then run it many, many times: a million times, to be precise.
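The repeat-and-count part can be sketched as follows. This is a toy stand-in, not the actual simulation: the teams are placeholders, the `strength` values are made up, and each knockout match is decided by a single weighted coin flip rather than by drawing a score from a rank-group distribution. It only illustrates playing a bracket to the end and tallying the champions over many runs.

```python
import random
from collections import Counter

def play_knockout(teams, strength, rng):
    """Play single-elimination rounds until one team is left."""
    while len(teams) > 1:
        winners = []
        # Pair neighbours in bracket order: (0, 1), (2, 3), ...
        for team1, team2 in zip(teams[0::2], teams[1::2]):
            # Toy rule: the chance of winning is proportional to strength
            p_team1 = strength[team1] / (strength[team1] + strength[team2])
            winners.append(team1 if rng.random() < p_team1 else team2)
        teams = winners
    return teams[0]

def simulate(teams, strength, n_runs, seed=None):
    """Count how often each team wins the whole bracket over n_runs runs."""
    rng = random.Random(seed)
    return Counter(play_knockout(teams, strength, rng) for _ in range(n_runs))

# Placeholder teams with made-up strengths, listed in bracket order
strength = {"A": 5, "B": 3, "C": 4, "D": 2, "E": 6, "F": 1, "G": 3, "H": 2}
teams = list(strength)

n_runs = 10_000
titles = simulate(teams, strength, n_runs, seed=1)
for team, wins in titles.most_common(3):
    print(f"{team}: {wins / n_runs:.1%}")
```

Dividing each count by the number of runs gives the estimated chance of winning the title, which is the same idea as counting the rolls that summed to 11.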
Without further ado, here are your top 10 contestants for winning the 2022 World Cup in Qatar:
Portugal wins 84,832 of the one million simulated World Cups (about 8.48%), making it the most likely winner.