Predicting the 2022 World Cup games using simple statistics
Again we play for fame and glory
If you have read my blog series during the Euro Cup games, you know what to expect. I’ll work out some math and statistics and tell you which outcomes you should be predicting for the upcoming World Cup games.
In this edition: predicting winners with the most common outcome.
Alright, let’s dive in
As always with data science, we start with the data.
The data:
- All international football games played from 1872 to 2022
- FIFA rankings of each country over time
- Upcoming World Cup games
We will use historical data to understand what a sensible prediction means for football. For example, 105–92 isn’t a strange outcome in basketball, but it is in football.
Football outcome distributions
Which is just a fancy way of saying how often each outcome occurs.
You may feel like 2–2 is a reasonable outcome for a football match, but it would be a bad guess. The most common outcome, 1–0, appears more than four times as often!
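Counting how often each score occurs takes only a few lines. The sketch below is a minimal illustration, not the exact code behind the chart: the `matches` list is a small made-up sample, and `outcome_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Made-up sample of (home_goals, away_goals) results, for illustration only.
# The real data set contains tens of thousands of matches.
matches = [(1, 0), (2, 2), (1, 0), (0, 1), (1, 1), (1, 0), (2, 1), (0, 0)]


def outcome_distribution(results):
    """Return {(home, away): share of matches}, most common score first."""
    counts = Counter(results)
    total = sum(counts.values())
    return {score: n / total for score, n in counts.most_common()}


distribution = outcome_distribution(matches)
for (home, away), share in distribution.items():
    print(f"{home}-{away}: {share:.1%}")
```

Dividing by the total turns raw counts into shares, which makes it easy to compare outcomes such as 1–0 and 2–2 directly.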
The above bar chart includes all international football games, but our data has a wide variety of different types of international football games. It could be a friendly match but also a World Cup final.
Our data includes 44,059 matches across 139 different match types. Let’s have a look at the 20 most common types of games; the remaining types are grouped under “Small tournaments”.
+--------------------------------------+---------------------+
| Match type                           | Number of matches   |
+======================================+=====================+
| Friendly                             | 17425               |
| FIFA World Cup qualification         | 7774                |
| UEFA Euro qualification              | 2593                |
| African Cup of Nations qualification | 1932                |
| FIFA World Cup                       | 900                 |
| Copa América                         | 841                 |
| AFC Asian Cup qualification          | 764                 |
| African Cup of Nations               | 742                 |
| CECAFA Cup                           | 620                 |
| CFU Caribbean Cup qualification      | 606                 |
| Merdeka Tournament                   | 595                 |
| British Championship                 | 505                 |
| UEFA Nations League                  | 468                 |
| Gulf Cup                             | 380                 |
| AFC Asian Cup                        | 370                 |
| Gold Cup                             | 358                 |
| Island Games                         | 350                 |
| UEFA Euro                            | 337                 |
| COSAFA Cup                           | 309                 |
| AFF Championship                     | 293                 |
| Small tournaments                    | 5897                |
+--------------------------------------+---------------------+
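The counts in the table add up to the full 44059 matches, so the “Small tournaments” row appears to collect everything outside the top 20. A summary like this could be built roughly as follows. This is a sketch under assumptions: `summarize_match_types` is a hypothetical helper, and the sample list is made up.

```python
from collections import Counter


def summarize_match_types(match_types, top_n=20, other_label="Small tournaments"):
    """Count matches per type, keep the top_n types, and lump the rest together."""
    counts = Counter(match_types)
    top = counts.most_common(top_n)
    remainder = sum(counts.values()) - sum(n for _, n in top)
    rows = list(top)
    if remainder:
        rows.append((other_label, remainder))
    return rows


# Made-up sample: one match type label per match.
sample = (
    ["Friendly"] * 5
    + ["FIFA World Cup qualification"] * 3
    + ["FIFA World Cup"] * 2
    + ["Gulf Cup", "Island Games"]
)

summary = summarize_match_types(sample, top_n=3)
for match_type, n in summary:
    print(f"{match_type}: {n}")
```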
Is it all the same?
What we would like to know is: are the outcomes similar for friendly and World Cup games? If so, we can use all the data. If not, we should separate the data or make adjustments.
Below is a bar chart much like the one before, but this one has a bar for each different type of game and outcome.
It is not all the same, but it is close enough for me: 1–0 is still the most common outcome, so we will predict only 1–0 or 0–1.
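One way to make this comparison concrete is to compute the share of each score within every match type and then check which score comes out on top per type. The snippet below is a rough sketch on made-up rows; the `(match_type, home_goals, away_goals)` layout and the `shares_by_type` name are assumptions, not the actual data format.

```python
from collections import Counter, defaultdict

# Made-up rows: (match_type, home_goals, away_goals).
rows = [
    ("Friendly", 1, 0), ("Friendly", 1, 0), ("Friendly", 2, 2), ("Friendly", 0, 1),
    ("FIFA World Cup", 1, 0), ("FIFA World Cup", 2, 1), ("FIFA World Cup", 1, 0),
]


def shares_by_type(rows):
    """Return {match_type: {(home, away): share}}, most common score first."""
    counts = defaultdict(Counter)
    for match_type, home, away in rows:
        counts[match_type][(home, away)] += 1
    shares = {}
    for match_type, counter in counts.items():
        total = sum(counter.values())
        shares[match_type] = {
            score: n / total for score, n in counter.most_common()
        }
    return shares


shares = shares_by_type(rows)
for match_type, dist in shares.items():
    (home, away), share = next(iter(dist.items()))
    print(f"{match_type}: most common is {home}-{away} ({share:.0%})")
```

Using shares rather than raw counts matters here, because friendlies vastly outnumber World Cup games and would otherwise dominate the comparison.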
Who will win?
To decide whether to predict 1–0 or 0–1, we will use the FIFA ranking. Below is a snapshot of the most recent FIFA ranking from the 6th of October 2022.
It is going to be really simple: higher on the list means you will win.
Or well, that’s what we will predict.
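The rule is small enough to write down as a function. Here is a minimal sketch, using a handful of ranks taken from the group-stage table in this post; `predict_score` is a hypothetical name, and the sketch assumes two teams never share the same rank.

```python
def predict_score(home_rank, away_rank):
    """Predict 1-0 for the better-ranked side (a lower rank number is better)."""
    # Assumption: ranks are unique, so a tie between the two teams cannot occur.
    return "1-0" if home_rank < away_rank else "0-1"


# A few ranks copied from the group-stage table in this post.
ranking = {
    "Qatar": 50, "Ecuador": 44, "England": 5,
    "Iran": 20, "Senegal": 18, "Netherlands": 8,
}

fixtures = [("Qatar", "Ecuador"), ("England", "Iran"), ("Senegal", "Netherlands")]

predictions = [
    (home, away, predict_score(ranking[home], ranking[away]))
    for home, away in fixtures
]
for home, away, score in predictions:
    print(f"{home} vs {away}: {score}")
```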
Final predictions for the group stage
Applying the logic above results in the following predictions:
+--------------+--------+--------------+--------+--------------+
| Home         | Rank   | Away         | Rank   | Prediction   |
+==============+========+==============+========+==============+
| Qatar        | 50     | Ecuador      | 44     | 0-1          |
| England      | 5      | Iran         | 20     | 1-0          |
| Senegal      | 18     | Netherlands  | 8      | 0-1          |
| USA          | 16     | Wales        | 19     | 1-0          |
| Argentina    | 3      | Saudi Arabia | 51     | 1-0          |
| Denmark      | 10     | Tunisia      | 30     | 1-0          |
| Mexico       | 13     | Poland       | 26     | 1-0          |
| France       | 4      | Australia    | 38     | 1-0          |
| Morocco      | 22     | Croatia      | 12     | 0-1          |
| Germany      | 11     | Japan        | 24     | 1-0          |
| Spain        | 7      | Costa Rica   | 31     | 1-0          |
| Belgium      | 2      | Canada       | 41     | 1-0          |
| Switzerland  | 15     | Cameroon     | 43     | 1-0          |
| Uruguay      | 14     | South Korea  | 28     | 1-0          |
| Portugal     | 9      | Ghana        | 61     | 1-0          |
| Brazil       | 1      | Serbia       | 21     | 1-0          |
| Wales        | 19     | Iran         | 20     | 1-0          |
| Qatar        | 50     | Senegal      | 18     | 0-1          |
| Netherlands  | 8      | Ecuador      | 44     | 1-0          |
| England      | 5      | USA          | 16     | 1-0          |
| Tunisia      | 30     | Australia    | 38     | 1-0          |
| Poland       | 26     | Saudi Arabia | 51     | 1-0          |
| France       | 4      | Denmark      | 10     | 1-0          |
| Argentina    | 3      | Mexico       | 13     | 1-0          |
| Japan        | 24     | Costa Rica   | 31     | 1-0          |
| Belgium      | 2      | Morocco      | 22     | 1-0          |
| Croatia      | 12     | Canada       | 41     | 1-0          |
| Spain        | 7      | Germany      | 11     | 1-0          |
| Cameroon     | 43     | Serbia       | 21     | 0-1          |
| South Korea  | 28     | Ghana        | 61     | 1-0          |
| Brazil       | 1      | Switzerland  | 15     | 1-0          |
| Portugal     | 9      | Uruguay      | 14     | 1-0          |
| Netherlands  | 8      | Qatar        | 50     | 1-0          |
| Ecuador      | 44     | Senegal      | 18     | 0-1          |
| Wales        | 19     | England      | 5      | 0-1          |
| Iran         | 20     | USA          | 16     | 0-1          |
| Australia    | 38     | Denmark      | 10     | 0-1          |
| Tunisia      | 30     | France       | 4      | 0-1          |
| Poland       | 26     | Argentina    | 3      | 0-1          |
| Saudi Arabia | 51     | Mexico       | 13     | 0-1          |
| Croatia      | 12     | Belgium      | 2      | 0-1          |
| Canada       | 41     | Morocco      | 22     | 0-1          |
| Costa Rica   | 31     | Germany      | 11     | 0-1          |
| Japan        | 24     | Spain        | 7      | 0-1          |
| South Korea  | 28     | Portugal     | 9      | 0-1          |
| Ghana        | 61     | Uruguay      | 14     | 0-1          |
| Serbia       | 21     | Switzerland  | 15     | 0-1          |
| Cameroon     | 43     | Brazil       | 1      | 0-1          |
+--------------+--------+--------------+--------+--------------+
Next up
We can likely do much better. In the next blog we add more statistics to improve the predictions. Read more here!
World champion
Want to read about simulating the World Cup to find the most likely champion? You can find that here.