July 19, 2010

SPI in review: How did it do?

By Nate Silver
Special to

It was almost two tournaments in one. This year's World Cup abruptly shifted gears from an unpredictable affair dominated by underdogs and South American teams in the group stage to a more efficiently played one dominated by favorites and European teams in the knockout stages. The Las Vegas odds correctly predicted the outcome of 12 of the 16 matches in the knockout stage, missing only U.S.-Ghana, England-Germany, Argentina-Germany and Brazil-Netherlands. The SPI match predictor got 13 of 16 right, having correctly deemed Ghana to be a very slight favorite over the U.S.

Still, let's not forget entirely about the lessons learned in the early stages of the tournament. Parity is increasing in world football. Although the best European teams obviously had few problems, the more vulnerable ones -- including traditional powers Italy, France and England -- are at risk of falling behind up-and-coming teams from elsewhere on the planet like Uruguay, Paraguay, Ghana and Chile and, to a more debatable extent, Japan, South Korea, Mexico and the United States. Even Spain -- for all its tremendous ability to come through in the clutch -- did not play a flawless tournament, as it lost to Switzerland in its opening match and had real scares against Paraguay and the Netherlands.

One way to evaluate the tournament as a whole is to apply the formula FIFA uses to rate the teams from No. 1 through No. 32 based on their performance in the World Cup; this strikes a nice balance between knockout- and group-stage play. We can then compare these standings against the order in which the teams were rated by various systems heading into the tournament. The systems we'll look at are SPI, Elo, the Voros McCracken ratings and the FIFA/Coca-Cola World Ranking.

Generally speaking, there was relatively little difference in how the systems rated the top clubs: Everyone had Brazil No. 1, Spain No. 2, the Netherlands No. 3 or 4 and Germany No. 5 or 6. It was lower down the table where some differences emerged. I have designated a team's rating as "hot" when one of the ratings systems rated it at least three positions higher than an average of the other three systems. Chile, for instance, was one of SPI's hot teams; we had it rated No. 8 heading into the tournament, whereas the consensus of the other three systems had it in the 13th position, on average, of the 32 sides. These teams are shown in pink in the table. There are also, of course, "cold" teams, which are designated in blue; this occurs when a team is rated at least three positions behind the average of the other three systems. Greece was one of SPI's cold teams; we had them as just the 26th-best side in South Africa, whereas the other systems had them 17th on average. As you can see from the table, SPI's hot teams -- those that it regarded more favorably than the consensus of other systems -- were Chile, Uruguay, Cameroon, Ivory Coast and South Korea. SPI's cold teams, on the other hand, were Italy, France, Greece, Mexico, Algeria and Australia.
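The hot/cold rule described above is mechanical enough to sketch in a few lines of code. The function and the individual Elo, Voros and FIFA ranks for Chile below are illustrative assumptions (the article gives only SPI's rank of 8 and the other systems' average of 13), but the logic follows the stated rule: a system is "hot" on a team when its rank is at least three positions better than the average of the other three systems, and "cold" when it is at least three positions worse.

```python
# Sketch of the "hot"/"cold" designation described above.
# Individual ranks for systems other than SPI are hypothetical.

def hot_cold(ranks, threshold=3):
    """ranks: dict mapping system name -> pre-tournament rank (1-32) for one team.
    A system is 'hot' if its rank is at least `threshold` positions higher
    (numerically lower) than the average of the other systems, 'cold' if it
    is at least `threshold` positions lower."""
    labels = {}
    for system, r in ranks.items():
        others = [v for s, v in ranks.items() if s != system]
        avg = sum(others) / len(others)
        if avg - r >= threshold:
            labels[system] = "hot"
        elif r - avg >= threshold:
            labels[system] = "cold"
        else:
            labels[system] = "-"
    return labels

# Chile, per the article: SPI had it 8th, the other three systems
# averaged 13th (the individual values below are made up to match).
chile = {"SPI": 8, "Elo": 13, "Voros": 12, "FIFA": 14}
print(hot_cold(chile))  # SPI comes out "hot" on Chile
```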

Three of these picks stand out as being especially strong: Uruguay, which advanced to the semifinals (albeit in controversial fashion) on the hot side, and Italy and France, which played scandalously bad football, on the cold side. SPI also looks fairly wise to have wagered some of its credibility on Chile and South Korea on the upside, and Greece on the downside.

Another of SPI's calls was poor: winless Cameroon, which it had rated higher than most of the other systems (although FIFA also regarded Cameroon highly). The other picks, like Ivory Coast, had mixed results and are harder to evaluate.

So, SPI might not have been quite as good as Paul the Octopus. But overall, this looks like a rather strong performance: somewhere between three and six very good-looking calls, depending on how generous you're being, and only one really bad-looking one in Cameroon.

We also can study this in a slightly more systematic way by means of a mathematical technique known as the correlation coefficient, which is a measure of the strength of the relationship between two sets of data. A system that perfectly predicted the final order of the teams No. 1 through 32 would have a correlation coefficient of 1.0; this is the highest possible score. One that demonstrated no skill at all -- just picked the teams at random -- would have a correlation coefficient of 0. (A system could have a negative correlation coefficient if it tended to rate the worst-performing teams more highly.)
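The article doesn't say exactly which coefficient was used, but since both inputs are rank orders 1 through 32, the standard Pearson coefficient computed on the ranks reduces to Spearman's rank-correlation formula (assuming no ties). A minimal sketch, with made-up five-team rankings rather than the actual World Cup data:

```python
# Spearman rank correlation between a predicted order and the final
# order, both given as rank positions 1..n with no ties.
# The five-team rankings below are hypothetical, for illustration only.

def spearman(rank_a, rank_b):
    """Returns 1.0 for a perfect match, -1.0 for a fully reversed order."""
    n = len(rank_a)
    d_sq = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

final = [1, 2, 3, 4, 5]
print(spearman(final, [1, 2, 3, 4, 5]))  # 1.0  (perfect prediction)
print(spearman(final, [5, 4, 3, 2, 1]))  # -1.0 (rated the worst teams highest)
```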

The correlation coefficients for the four systems were as follows:

SPI -- .58
Elo -- .57
Voros -- .54
FIFA -- .43

SPI, with a correlation coefficient of .58, did the best of the four systems, although only by a trivial (and certainly not statistically significant) margin over Elo at .57. Voros McCracken's ratings, with a correlation coefficient of .54, were also essentially within the margin of error of the top two systems. The FIFA rankings, on the other hand, were less accurate, with a correlation coefficient of just .43. Although the FIFA ratings made one or two good calls -- they were pretty down on England, for instance -- there were far more problematic ones. They were too optimistic about Italy, Greece, Nigeria, Cameroon and Portugal, for example, and too pessimistic on Paraguay, Japan and South Korea.

I don't want to make too much of any one tournament, especially a somewhat strange, Jekyll-and-Hyde tournament. But it does look as though the SPI ratings can potentially give you a pretty good idea about which teams are playing the best soccer at any given moment. We'll look forward to seeing how they do in Poland and Ukraine in 2012, and Brazil in 2014.

Nate Silver is a renowned statistical analyst who was named one of "The World's 100 Most Influential People" by Time Magazine in 2009. He gained acclaim for outperforming the polls in the 2008 U.S. presidential election and created baseball's popular predictive system, PECOTA.