Much of the work we do is based on statistical models used to reproduce real-life processes, yet this is not the only option. When we know the rules and restrictions that govern a system, we can reproduce its behaviour using simulation techniques and, with a few simple statistical calculations, learn whatever we want to know about the system.
Coinciding with the start of the 2013-2014 Spanish League two weekends ago, a group of friends and I were wondering about the future results of the League:

Will we have a League of two (Barcelona and Madrid)?

What are the chances of my team winning the League (I am a Valencia supporter)?

Assuming that Madrid or Barcelona wins the League, what chances do the other teams have of finishing second?
To answer these questions, I was challenged to simulate the current League using information from the last five seasons:

I downloaded the general statistics of the last five seasons from the www.as.com website. With this information, I calculated the probabilities of a home win (HW), home draw (HD), home loss (HL), away win (AW), away draw (AD) and away loss (AL) for each of the 20 teams in the current Spanish League. (Note: Elche's probabilities are the average of the probabilities of the 10 teams that have played in the First Division in the last five seasons but are not in it in the 2013-2014 season.)
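The post does not show how these probabilities were computed. A natural reading is that each one is a relative frequency, e.g. HW = home wins / home games played over the five seasons. The sketch below assumes exactly that; the `Record` type, its field names and the counts in the example are hypothetical, and `average_probs` mirrors the averaging used for Elche.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical input: a team's results aggregated over the seasons used."""
    home_w: int
    home_d: int
    home_l: int
    away_w: int
    away_d: int
    away_l: int


def outcome_probs(r: Record) -> dict:
    """Relative frequency of each home and away outcome (assumed estimator)."""
    home_games = r.home_w + r.home_d + r.home_l
    away_games = r.away_w + r.away_d + r.away_l
    return {
        "HW": r.home_w / home_games,
        "HD": r.home_d / home_games,
        "HL": r.home_l / home_games,
        "AW": r.away_w / away_games,
        "AD": r.away_d / away_games,
        "AL": r.away_l / away_games,
    }


def average_probs(prob_dicts: list) -> dict:
    """Element-wise mean of several teams' probabilities (as done for Elche)."""
    n = len(prob_dicts)
    return {k: sum(p[k] for p in prob_dicts) / n for k in prob_dicts[0]}


# Made-up counts for illustration only: 95 home and 95 away games (5 x 19)
team = Record(home_w=50, home_d=25, home_l=20, away_w=30, away_d=25, away_l=40)
print(outcome_probs(team))
```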

Using the League schedule, which lists the home and away team for every fixture, I calculated the probability of a win, a draw or a loss for each League match.
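The post does not say exactly how a team's home record and its opponent's away record were combined for each fixture. One simple rule, shown here purely as an assumption, averages the home team's HW, HD and HL with the visitor's AL, AD and AW respectively. The function name `match_probs` and all the numbers are hypothetical.

```python
def match_probs(home: dict, away: dict) -> tuple:
    """Hypothetical combination rule: average the home team's home record
    with the visiting team's away record, read from the home side's view."""
    p_home_win = (home["HW"] + away["AL"]) / 2
    p_draw = (home["HD"] + away["AD"]) / 2
    p_away_win = (home["HL"] + away["AW"]) / 2
    return p_home_win, p_draw, p_away_win


# Made-up per-team probabilities, for illustration only
home_side = {"HW": 0.55, "HD": 0.25, "HL": 0.20, "AW": 0.30, "AD": 0.30, "AL": 0.40}
away_side = {"HW": 0.45, "HD": 0.30, "HL": 0.25, "AW": 0.20, "AD": 0.30, "AL": 0.50}
print(match_probs(home_side, away_side))
```

Because each input triple sums to 1, the averaged triple also sums to 1, so no renormalisation is needed. Other choices, such as modelling the goals each team scores, are equally possible.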
I simulated 10000 leagues and, for each team, estimated the following probabilities as the proportion of simulated leagues in which each event occurred: Pr(Win), the probability of winning the current League; Pr(Champ), the probability of finishing between first and fourth (Champions League places); Pr(Eur), the probability of qualifying for a European competition (Champions League or Europa League); and Pr(ncc), the probability of not changing category, i.e. avoiding relegation.
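The simulation step can be sketched in a few lines. This is a minimal Monte Carlo sketch, not the author's R code. It assumes a hypothetical `probs` dictionary holding one (home win, draw, away win) triple per fixture, awards 3 points for a win and 1 for a draw, breaks ties on points at random (a simplification; the real tie-breakers differ), and treats the top 4 as Champions League places, the top 6 as European places (an assumption, since Europa League spots also depend on the cup) and the bottom 3 as relegated.

```python
import random
from collections import Counter


def simulate_season(teams, probs, rng):
    """Play one double round-robin season; return teams ordered by points.

    probs[(home, away)] = (p_home_win, p_draw, p_away_win) for every fixture.
    """
    points = dict.fromkeys(teams, 0)
    for home in teams:
        for away in teams:
            if home == away:
                continue
            result = rng.choices(("H", "D", "A"), weights=probs[(home, away)])[0]
            if result == "H":
                points[home] += 3
            elif result == "D":
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
    # Simplification: ties on points are broken at random, not by League rules
    return sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)


def league_probabilities(teams, probs, n_sims=10000, champ=4, europe=6,
                         relegated=3, seed=2013):
    """Monte Carlo estimates of Pr(Win), Pr(Champ), Pr(Eur) and Pr(ncc)."""
    rng = random.Random(seed)
    counts = {t: Counter() for t in teams}
    safe = len(teams) - relegated  # lowest position that avoids relegation
    for _ in range(n_sims):
        table = simulate_season(teams, probs, rng)
        for pos, team in enumerate(table, start=1):
            counts[team]["Win"] += pos == 1
            counts[team]["Champ"] += pos <= champ
            counts[team]["Eur"] += pos <= europe
            counts[team]["ncc"] += pos <= safe
    keys = ("Win", "Champ", "Eur", "ncc")
    return {t: {k: counts[t][k] / n_sims for k in keys} for t in teams}
```

With 20 teams, `probs` needs 380 entries (20 × 19 fixtures), one per scheduled match; the estimates then come out as one dictionary of four probabilities per team.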
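The probability that Barcelona and Madrid take the top two places, in either order, is the sum of the probabilities of two mutually exclusive events:

Pr(Barcelona 1st and Madrid 2nd, or Madrid 1st and Barcelona 2nd) = Pr(Barcelona 1st and Madrid 2nd) + Pr(Madrid 1st and Barcelona 2nd) = 0.2930 + 0.2428 = 0.5358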
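Given the simulated final tables, this probability can be estimated directly by counting. A minimal sketch, assuming each table is a list of team names from first to last (a hypothetical format); `top_two_prob` is an illustrative name, not part of the original analysis.

```python
def top_two_prob(tables, a, b):
    """Share of simulated final tables in which a and b finish 1st and 2nd, in either order."""
    a_then_b = sum(1 for t in tables if t[0] == a and t[1] == b)
    b_then_a = sum(1 for t in tables if t[0] == b and t[1] == a)
    # The two orderings cannot happen in the same season, so their counts simply add
    return (a_then_b + b_then_a) / len(tables)


# Adding the two reported probabilities
print(round(0.2930 + 0.2428, 4))
```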
Figure 1: Boxplot of the Points per Team in the 10000 Leagues simulated.
Besides the obvious conclusions we can draw from these results, we can see that we are clearly in a league of two. This sort of procedure also lets us emulate complex systems whenever we know the rules for calculating the corresponding probabilities. For example, in biostatistics we could work out the probability of a child being affected by a genetic disease if we know the probability that each parent is a carrier.
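As a small illustration of the genetics example, assume an autosomal recessive disease, two unaffected parents who are each either carriers or non-carriers, and independence between them. A carrier passes the recessive allele with probability 1/2, so the child is affected with probability p_mother × 1/2 × p_father × 1/2. The sketch compares that exact value with a simulation; the function names and the carrier probabilities are hypothetical.

```python
import random


def affected_prob_exact(p_mother, p_father):
    """Autosomal recessive trait, both parents assumed unaffected.

    Each parent is a carrier with the given probability and, if so,
    passes the recessive allele with probability 1/2.
    """
    return (p_mother * 0.5) * (p_father * 0.5)


def affected_prob_sim(p_mother, p_father, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    affected = 0
    for _ in range(n):
        # Allele is transmitted only if the parent is a carrier AND passes it on
        from_mother = rng.random() < p_mother and rng.random() < 0.5
        from_father = rng.random() < p_father and rng.random() < 0.5
        affected += from_mother and from_father
    return affected / n


print(affected_prob_exact(0.5, 0.4))
print(affected_prob_sim(0.5, 0.4))
```

The simulated value should be close to the exact one; the point is that the same simulation approach still works when the rules become too involved for a closed-form answer.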
If you are interested in simulation and sports, I recommend reading the paper “On the use of simulation methods to compute probabilities: application to the Spanish first division soccer league” by Ignacio Díaz-Emparanza and Vicente Núñez-Antón, which explains in much more detail how to approach this problem from different points of view.
I implemented all the calculations for this post in the free software R.
Do you have experience in simulation? Tell us about it!!!
Great post Hector! It’s very interesting. I like sports and statistics 🙂
I also recommend this paper by Burton (http://www.soph.uab.edu/Statgenetics/Club_ssg/MPadilla_07.pdf) if you are interested in simulation.
Thanks a lot, Martí! I know about your interest in statistics and sports 😉 because I have read some of your posts and papers. Shall we write a future post about basketball?
We could talk about it. It’d be a pleasure to write a post together about basketball!!
Here are some other articles related to this subject:
http://www.sumsar.net/blog/2013/07/modelingmatchresultsinlaligapartone/
http://www.statistica.it/gianluca/Research/BaioBlangiardo.pdf
Nice blog 😉