Winning the Deloitte/FIDE chess comp

The following is a cross-post of my contribution to the Kaggle blog, made after winning the Deloitte/FIDE Chess Rating Challenge: a contest for developing a new predictive system for ranking chess players, sponsored by Deloitte Analytics Australia and the world chess federation FIDE.

My name is Tim Salimans and I am a PhD candidate in Econometrics at Erasmus University Rotterdam. For my job I constantly work with data, models, and algorithms, and the Kaggle competitions seemed like a fun way of using these skills in a competitive and social environment. The Deloitte/FIDE Chess Rating Challenge was the first Kaggle contest I entered and I was very fortunate to end up taking first place. During the same period I also used Kaggle-in-class to host a prediction contest for an undergraduate course in Econometrics for which I was the teaching assistant. Both proved to be a lot of fun.

Chess rating systems

The first thing to do when tackling a new problem is to look up what other people have done before you. Since Kaggle had already organized an earlier chess rating competition, the blog posts of the winners were a logical place to start. After reading those posts and some of the academic literature, I found that chess rating systems usually work by assuming that each player’s characteristics can be described by a single rating number. The predicted result for a match between two players is then taken to be some function of the difference between their ratings. Yannis Sismanis, the winner of the first competition, used a logistic curve for this purpose and estimated the rating numbers by minimizing a regularized measure of model fit. Jeremy Howard, the runner-up, instead used the TrueSkill model, which uses a Gaussian cumulative distribution function and estimates the ratings using approximate Bayesian inference.
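
As a rough illustration of this shared structure (and not the exact models either competitor used), the expected score of player A against player B can be written as a link function applied to the rating difference, with the logistic curve and the Gaussian CDF as two possible choices. A minimal Python sketch, with the scale parameters chosen only for the example:

    import math

    def expected_score_logistic(rating_a, rating_b, scale=400.0):
        # Elo-style logistic link: expected score of A against B.
        return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

    def expected_score_gaussian(rating_a, rating_b, beta=200.0):
        # TrueSkill-style link: each player's game performance is modelled as
        # N(rating, beta^2), so A's expected score is the Gaussian CDF of the
        # rating difference divided by beta * sqrt(2).
        return 0.5 * (1.0 + math.erf((rating_a - rating_b) / (2.0 * beta)))

    # A 100-point favourite scores roughly 0.64 under both links.
    print(expected_score_logistic(2100, 2000))
    print(expected_score_gaussian(2100, 2000))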

I decided to start out with the TrueSkill model and to extend it by shrinking each player’s rating to that of their recent opponents, similar to what Yannis Sismanis had done in the first competition. In addition, I introduced weights into the algorithm which allowed me to put most emphasis on the matches that were played most recently. After some initial experimentation using the excellent Infer.NET package, I programmed everything in Matlab.
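
My actual implementation ran approximate Bayesian inference in this extended TrueSkill model, first prototyped with Infer.NET and then rewritten in Matlab. The sketch below is only meant to illustrate the two modifications (recency weights and shrinkage towards the opponents' ratings) in a much simpler penalized-likelihood setting; the function, its constants, and the toy data are invented for the example.

    import numpy as np

    def fit_ratings(matches, n_players, n_iter=300, lr=0.1,
                    shrink=0.2, half_life_months=12.0):
        # Illustrative fit: recency-weighted logistic likelihood on the rating
        # difference, plus a penalty pulling each rating towards the weighted
        # average rating of that player's opponents.
        # `matches` holds (player_i, player_j, score_i, months_ago) tuples,
        # with score_i in {0, 0.5, 1}.
        ratings = np.zeros(n_players)
        weights = np.array([0.5 ** (m / half_life_months) for *_, m in matches])

        for _ in range(n_iter):
            grad = np.zeros(n_players)
            opp_sum = np.zeros(n_players)
            opp_wt = np.zeros(n_players)
            for (i, j, score, _), w in zip(matches, weights):
                # Recency-weighted logistic likelihood gradient.
                p = 1.0 / (1.0 + np.exp(-(ratings[i] - ratings[j])))
                grad[i] += w * (score - p)
                grad[j] -= w * (score - p)
                # Track opponents' ratings for the shrinkage target.
                opp_sum[i] += w * ratings[j]; opp_wt[i] += w
                opp_sum[j] += w * ratings[i]; opp_wt[j] += w
            target = opp_sum / np.maximum(opp_wt, 1e-12)
            grad -= shrink * (ratings - target)             # shrink towards opponents
            ratings += lr * grad / np.maximum(opp_wt, 1.0)  # normalized gradient step
        return ratings

    # Toy usage with three hypothetical players (indices 0, 1, 2).
    demo = [(0, 1, 1.0, 1.0), (1, 2, 0.5, 2.0), (0, 2, 1.0, 3.0)]
    print(fit_ratings(demo, n_players=3).round(2))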

Using the match schedule

The predictions of my base model scored very well on the leaderboard of the competition, but they were not yet good enough to put me in first place. It was at this time that I realized that the match schedule itself contained useful information for predicting the results, something that had already been noticed by some of the other competitors. In chess, most tournaments are organized according to the Swiss system, in which players are paired in each round with opponents who have achieved comparable results in earlier rounds. If, in a Swiss-system tournament, player A has encountered better opponents than player B, this most likely means that player A has won a larger percentage of his/her matches in that tournament.
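
To make that intuition concrete, a feature such as the average rating of the opponents a player faced in a given month can be read straight off the match list and acts as a rough proxy for how well the player was scoring under Swiss pairing. A small sketch of the idea (the DataFrame and its column names are invented for the example):

    import pandas as pd

    # Hypothetical match table: one row per game, with the opponent's rating
    # taken from the base model.
    games = pd.DataFrame({
        "month":      [1, 1, 1, 1],
        "player":     [10, 20, 10, 30],
        "opponent":   [20, 10, 30, 10],
        "opp_rating": [2310.0, 2450.0, 2190.0, 2450.0],
    })

    # Average opponent strength per player per month: under Swiss pairing, a
    # tougher average opponent suggests the player was winning more games.
    schedule_features = (
        games.groupby(["month", "player"])["opp_rating"]
             .agg(["mean", "std", "count"])
             .rename(columns={"mean": "avg_opp_rating",
                              "std": "opp_rating_spread",
                              "count": "games_played"})
             .reset_index()
    )
    print(schedule_features)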

In order to incorporate the information present in the match schedule, I generated out-of-sample predictions for the last 1.5 years of the data using a rolling 3-month prediction window. I then performed two post-processing steps using these predictions and the realized match outcomes. The first step used standard logistic regression and the second step used a locally weighted variant of logistic regression. The most important variables used in the post-processing procedure were as follows; a rough sketch of the setup appears after the list:

  •  the predictions of the base model
  •  the ratings of the players
  •  the number of matches played by each player
  •  the ratings of the opponents encountered by each player
  •  the variation in the quality of the opponents encountered
  •  the average predicted win percentage over all matches in the same month for each player
  •  the predictions of a random forest using these variables

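The sketch below shows how such a stacked post-processing step can be wired together, using the base-model predictions and schedule features as inputs and a random forest's predictions as an extra feature. Everything here (the feature names, the synthetic data, the single plain logistic stage) is a simplified stand-in for the actual two-stage procedure, and the locally weighted stage is omitted.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000

    # Synthetic stand-ins for the inputs; in the real procedure these came
    # from rolling 3-month out-of-sample predictions and schedule features.
    X = np.column_stack([
        rng.uniform(0, 1, n),         # base model's predicted score
        rng.normal(2200, 150, n),     # player's rating
        rng.normal(2200, 150, n),     # average rating of opponents faced
        rng.integers(1, 13, n),       # number of matches played
    ])
    y = (rng.uniform(0, 1, n) < X[:, 0]).astype(int)  # toy realized result

    # Add a random forest's predictions as an extra feature...
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    X_aug = np.column_stack([X, rf.predict_proba(X)[:, 1]])

    # ...then blend everything with a logistic regression on top. (In practice
    # the forest's predictions should also be out-of-sample to avoid leakage.)
    blender = LogisticRegression(max_iter=1000).fit(X_aug, y)
    print(blender.predict_proba(X_aug[:5])[:, 1].round(3))
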
This post-processing dramatically improved my score and put me well ahead of the competition for some time. Later, other competitors made similar improvements and the final weeks of the competition were very exciting. After a long weekend away towards the end of the competition I came back to find that I had been surpassed on the leaderboard by team PlanetThanet. By tweaking my approach I was able to claw my way back up during the next few days, after which I had to leave for a conference in the USA. Upon arrival I learned that I had again been surpassed, this time by Shang Tsung. Only by making my last submissions from my hotel room in St. Louis was I finally able to secure first place.

Conclusions

Much of the contest came down to how to use the information in the match schedule. Although interesting in its own right, this was less than ideal for the original goal of finding a good rating system. To my relief, the follow-up data set that Jeff Sonas made available showed that my model also makes good predictions without using this information. Finally, I would like to thank both the organizers and the competitors for a great competition!