My understanding was that the system consists of using the historical odds of wi...

BSTRhino · on July 22, 2020

Yes, the best fit for the data is the data itself; it's a tautology. There's nothing wrong with Elo's exponential curve; it just can't beat the actual data.
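To make the tautology concrete, here is a minimal Python sketch (not the game's actual code). It assumes each match is recorded as a hypothetical `(rating_diff, won)` pair from one player's point of view, and compares the standard Elo expected-score curve against a table of observed win fractions per rating-difference bucket. On the same matches the table was built from, the per-bucket frequency is the maximum-likelihood choice for any predictor that only looks at the bucket, so with one bucket per distinct rating difference it cannot lose to Elo's curve.

```python
import math
from collections import defaultdict

def elo_expected(diff):
    """Standard Elo expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def empirical_table(matches, bucket_width=1):
    """Observed win fraction per rating-difference bucket (hypothetical helper)."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for diff, won in matches:
        b = math.floor(diff / bucket_width)
        games[b] += 1
        wins[b] += won
    return {b: wins[b] / games[b] for b in games}

def table_predictor(table, bucket_width=1):
    """Predict from the table, falling back to Elo for buckets never seen."""
    def predict(diff):
        b = math.floor(diff / bucket_width)
        return table[b] if b in table else elo_expected(diff)
    return predict

def log_loss(matches, predict, eps=1e-12):
    """Average negative log-likelihood of the observed outcomes; lower is better."""
    total = 0.0
    for diff, won in matches:
        p = min(max(predict(diff), eps), 1.0 - eps)
        total -= math.log(p) if won else math.log(1.0 - p)
    return total / len(matches)

# Made-up matches: (rating difference from player A's side, 1 if player A won)
matches = [(0, 1), (0, 0), (0, 1), (100, 1), (100, 1), (100, 0),
           (-100, 0), (-100, 1), (-100, 0), (-100, 0)]
table = empirical_table(matches)
print(log_loss(matches, table_predictor(table)), log_loss(matches, elo_expected))
```

The fallback to Elo for unseen buckets is a design choice of this sketch, not something described in the thread.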

You raise a good point: I could have created a training set and a test set, which would probably have been a better validation. But I don't know; I'm not doing science, I'm making a game.
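A holdout check along those lines could look like the sketch below. It is an illustration under stated assumptions: the matches are simulated from the Elo curve itself (so the result says nothing about the real game), the split is chronological (train on older matches, test on the newest 20%), and the 50-point bucket width is arbitrary. Out of sample, the empirical table no longer wins by construction; whichever predictor has the lower test log loss generalizes better.

```python
import math
import random
from collections import defaultdict

def elo_expected(diff):
    # Standard Elo expected score for a rating difference `diff`.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def simulate_matches(n, seed=0):
    # Hypothetical data: outcomes drawn from the Elo curve, in chronological order.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        diff = rng.randint(-400, 400)
        out.append((diff, 1 if rng.random() < elo_expected(diff) else 0))
    return out

def chronological_split(matches, test_frac=0.2):
    # Older matches train the model; the most recent ones test it.
    cut = int(len(matches) * (1.0 - test_frac))
    return matches[:cut], matches[cut:]

def fit_table(matches, bucket_width=50):
    # Observed win fraction per bucket, learned from the training matches only.
    games, wins = defaultdict(int), defaultdict(int)
    for diff, won in matches:
        b = math.floor(diff / bucket_width)
        games[b] += 1
        wins[b] += won
    table = {b: wins[b] / games[b] for b in games}
    # Unseen buckets fall back to Elo (a choice made for this sketch).
    return lambda d: table.get(math.floor(d / bucket_width), elo_expected(d))

def log_loss(matches, predict, eps=1e-12):
    # Average negative log-likelihood of the outcomes; lower is better.
    total = 0.0
    for diff, won in matches:
        p = min(max(predict(diff), eps), 1.0 - eps)
        total -= math.log(p) if won else math.log(1.0 - p)
    return total / len(matches)

matches = simulate_matches(10_000)
train, test = chronological_split(matches)
print("table:", log_loss(test, fit_table(train)))
print("elo:  ", log_loss(test, elo_expected))
```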

On the topic of whether the future matches the past: the predictions were based on a rolling database of the past 100,000 matches, which is approximately the number of matches played over 7 days. So my theory is that the data is recent enough that it should, in general, match.
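One way to keep such a rolling database cheap is to hold the last N matches in a queue and update per-bucket counts incrementally, so each new match costs O(1) instead of a full recount. This is a hypothetical sketch: the class name, the 50-point bucket width, and the `(rating_diff, won)` record format are assumptions; only the 100,000-match window comes from the comment above.

```python
import math
from collections import defaultdict, deque

class RollingWinTable:
    """Win fraction per rating-difference bucket over the most recent `window` matches."""

    def __init__(self, window=100_000, bucket_width=50):
        self.window = window
        self.bucket_width = bucket_width
        self._matches = deque()            # oldest match on the left
        self._games = defaultdict(int)     # bucket -> matches in window
        self._wins = defaultdict(int)      # bucket -> wins in window

    def _bucket(self, diff):
        return math.floor(diff / self.bucket_width)

    def add(self, diff, won):
        # Evict the oldest match first so the window never exceeds `window`.
        if len(self._matches) == self.window:
            old_diff, old_won = self._matches.popleft()
            b = self._bucket(old_diff)
            self._games[b] -= 1
            self._wins[b] -= old_won
            if self._games[b] == 0:
                del self._games[b]
                del self._wins[b]
        self._matches.append((diff, won))
        b = self._bucket(diff)
        self._games[b] += 1
        self._wins[b] += won

    def win_rate(self, diff):
        # None when no recent match falls in this bucket.
        b = self._bucket(diff)
        n = self._games.get(b, 0)
        return self._wins[b] / n if n else None

    def __len__(self):
        return len(self._matches)

# Usage with a tiny window so the eviction is visible.
t = RollingWinTable(window=3)
for diff, won in [(10, 1), (20, 0), (30, 1), (40, 0)]:
    t.add(diff, won)
print(len(t), t.win_rate(25))
```

Adding the fourth match evicts the first one, so the bucket's win rate drops from 2/3 to 1/3.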

Of course, I never tested this. In the end, I'm not doing science, I'm making a game. If retention goes up and complaints go down, then I can't keep working on the rating system; there are 1000 other things to do.

mcnamaratw · on July 22, 2020

Yeah, I'm not giving advice on how you should do it. I was just unsure whether critics here had understood that measured data is probably better than any theoretical fit, even the revered Elo.

roenxi · on July 22, 2020

> I think it is by definition the most accurate system

By gum, an opportunity to quibble about semantics on the internet. That is true if "benchmark using" means "only admit to knowing" and "accuracy" means "must be numerically quantifiable given existing data". It is false otherwise, especially if "accuracy" means "conforming to truth" and we have a model for how the numbers are being generated.

Obviously, if I generate a set of numbers by sampling a normal distribution, then the most accurate model is that normal distribution, no matter what empirical data I use for benchmarking.

That is to say, if we know how the data was generated (sans noise), we can reject empirical distributions as the most accurate, because we know the distribution of the data directly.
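roenxi's point can be checked numerically. The sketch below swaps the normal distribution for the simpler case that matters in this thread: a single matchup whose true win probability `p_true` is known (the value 0.64, roughly Elo's prediction for a 100-point gap, is an assumption). The expected log loss on future games of predicting `q` is -(p*ln(q) + (1-p)*ln(1-q)), which is minimized exactly at q = p, so a win frequency measured from a handful of games can only tie or lose against the generating model.

```python
import math
import random

def expected_log_loss(p_true, q, eps=1e-12):
    """Expected log loss per future game when the truth is p_true and we predict q."""
    q = min(max(q, eps), 1.0 - eps)
    return -(p_true * math.log(q) + (1.0 - p_true) * math.log(1.0 - q))

p_true = 0.64                      # assumed known generating probability
rng = random.Random(1)
games = [1 if rng.random() < p_true else 0 for _ in range(20)]
q_empirical = sum(games) / len(games)  # "the data itself" from 20 games

print(expected_log_loss(p_true, p_true), expected_log_loss(p_true, q_empirical))
```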

mcnamaratw · on July 22, 2020

OK, that is a legitimate ... quibble. Let's assume that we don't already know the correct distribution. In that case we're going to judge each theoretical fit by how close it comes to the historical data. (Or else we're going to get that wrong, which is another common approach.) Elo is much more prestigious and credible than some guy who made a game, but it is less credible than the data once the number of data points N is large enough. (Although I think a theory can be more prestigious than data almost independent of N.)

ponker · on July 22, 2020

Well, it’s like the question of which is better: a restaurant with 4.5 stars on 4 reviews or one with 4.2 stars on 1,500 reviews?
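One standard way to answer the restaurant question is a Bayesian (shrunk) average: pretend each restaurant also has `m` extra reviews at some prior mean, so a few reviews get pulled toward the prior while many reviews barely move. The prior mean of 4.0 stars and `m = 20` below are assumed values for illustration, not figures from the thread.

```python
def bayesian_average(mean_stars, n_reviews, prior_mean=4.0, m=20):
    """Shrink an observed average toward prior_mean, weighting the prior like m reviews."""
    return (m * prior_mean + n_reviews * mean_stars) / (m + n_reviews)

few = bayesian_average(4.5, 4)       # 4.5 stars on 4 reviews
many = bayesian_average(4.2, 1500)   # 4.2 stars on 1,500 reviews
print(f"{few:.3f} {many:.3f}")       # 4.083 4.197
```

Under these assumptions the 1,500-review restaurant ranks higher; a small enough `m` would let the 4-review restaurant come out ahead.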

mcnamaratw · on July 22, 2020

Sure. If there's enough data, then the data becomes more credible than even the most popular theoretical fit. If all I have is four games I played with my nephew, then people should probably go with Elo.
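This rule of thumb can be written as a blend in which Elo acts as a prior worth `m` pseudo-games: with few observed games the estimate stays near Elo's prediction, and as real games accumulate the observed win rate takes over. This is one possible formulation (beta-binomial style shrinkage), not something either commenter proposed; the pseudo-count `m = 50` is an arbitrary assumption.

```python
def elo_expected(diff):
    # Standard Elo expected score for a rating difference `diff`.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def blended_win_prob(wins, games, diff, m=50):
    """Observed wins plus m pseudo-games played at Elo's predicted rate."""
    return (wins + m * elo_expected(diff)) / (games + m)

# Four games with a nephew, all won, at equal ratings: Elo's 0.5 still dominates.
print(round(blended_win_prob(4, 4, 0), 3))          # 0.537
# 10,000 games at a 70% win rate: the data dominates.
print(round(blended_win_prob(7000, 10_000, 0), 3))  # 0.699
```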