Topic: Rating system
king_of_nowhere
Posted at: 2019-07-08, 03:08
Sorry to be contrary here, but I question the premises of your mathematical model. While I do believe that the mathematics is sound, I think the model would need experimental validation. You can't assign a value for how big a rating difference must be before the weaker player becomes virtually negligible, because that is different for every game and cannot be calculated a priori. I think it boils down to how much difference is enough to win 1v2, and how much damage a weaker player can inflict on the stronger one. In chess it is very easy to force a 1-for-1 exchange, and that's why a world champion would never be able to 1v2 even mediocre club players. In Widelands, hero soldiers can face a lot of weaker enemies with no losses, so it's much easier to face multiple weak enemies. The difficulty also depends on the map (on a small map there is no time to upgrade soldiers, and fights between untrained soldiers mean weaker players can still deal lots of damage; on a large map, weaker players would have time to upgrade soldiers; on a medium map it's easier for the stronger player). Ultimately, I don't think the question of 2v2 can be tackled without some hard experimental data to set boundaries for a mathematical model.

hessenfarmer
Posted at: 2019-07-08, 07:05
Although I am not very keen to be ranked in any ranking, I have no objections against it. The only thing that would really be interesting is getting some statistics from the system about used maps, used tribes, and so on. We could use them to assess or prove the balance of the tribes, among other things.

einstein13
Posted at: 2019-07-08, 10:24
I have to say: "You're 100% right!" And it is not contrary here, but rather complementary. I agree that we need experimental situations to see if any model works, but from our (Widelands) experience, we need a pretty good model to start with. Remember the tree growth model? It was introduced and mathematically well designed, but it didn't work in the game. We (you) have changed some values, but the major model stayed. Now the trees are growing OK. I think that here is a similar situation: somebody proposed a change, an addition to the game, and somebody else is trying to solve the problems that occur with it. If it doesn't work, we can stick to any model Widelands needs. My model doesn't affect Glicko or Elo; it only solves the problem of more than 2 players at once. And I am trying to get the proper values for that. If it does not work, we can just change the model OR prohibit games other than 1 vs 1 for ranking.
einstein13

einstein13
Posted at: 2019-07-09, 00:51
Today I was able to expand the model a bit: now it covers calculating R and RD for all games with 2 teams only. New file available on my site:
I know that some of you may be sceptical about it, but hey, if it is possible, we can try it and then decide if it is OK or not. The next step is to think about three or more teams... Is it possible to make a model for that?
einstein13

WorldSavior
Posted at: 2019-07-09, 23:14
Quite impressive, but is it just me, or is the most important information missing: how will the rating and the RD change for each player? And another thing looks weird: a 50 RD guy and a 130 RD guy form a team with less than 50 RD? Shouldn't it be in between?
No, why should it be?
That's questionable
Yes, why not. This doesn't have to be the limit forever.
On the other hand, it's not so good to declare games as rated after they have been played, especially if not every player agrees to be ranked.
Nice
Ah, thanks
If the opponents are a team but not completely incompetent, they might exchange most pieces and leave the GM no chance.
Wanted to save the world, then I got widetracked

einstein13
Posted at: 2019-07-10, 00:06
As simple as possible: according to the Glicko-2 system, you calculate the gains and losses for the game from the team scores. Then you add the gains to all winners and subtract the losses from the opposite side. If you recalculate the team ranks again (with the new scores), they will be as expected: higher or lower by the given values. This behaviour was proven in the document in point 4. d).
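To make the bookkeeping concrete, here is a minimal Python sketch of that step: the same gain is added to every winner and the same loss is subtracted from every loser, and the recomputed team ratings shift by exactly those amounts. It assumes the team rating is the arithmetic mean of the members' ratings, which may differ from the combination used in the document; the player names and the `gain`/`loss` values are placeholders, since the real values would come from the Glicko-2 calculation on the team scores.

```python
from statistics import mean


def apply_team_result(ratings, winners, losers, gain, loss):
    """Return new ratings: every winner gets +gain, every loser gets -loss."""
    updated = dict(ratings)
    for name in winners:
        updated[name] += gain
    for name in losers:
        updated[name] -= loss
    return updated


def team_rating(ratings, team):
    # ASSUMPTION: a team's rating is the plain mean of its members' ratings.
    return mean(ratings[name] for name in team)


# Hypothetical players and placeholder gain/loss values.
ratings = {"a": 1500.0, "b": 1700.0, "c": 1550.0, "d": 1600.0}
winners, losers = ["a", "b"], ["c", "d"]
gain, loss = 12.5, 12.5

new_ratings = apply_team_result(ratings, winners, losers, gain, loss)
print(team_rating(ratings, winners), "->", team_rating(new_ratings, winners))
print(team_rating(ratings, losers), "->", team_rating(new_ratings, losers))
```

Under the mean assumption the shift is exact for any team size, because adding a constant to every member adds the same constant to the mean.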
Yes, I was a bit surprised too, but the standard deviation is sometimes counterintuitive. Let's make an example. Take a wooden stick 1 meter long. Then pick something big, like a stadium (football, baseball, whatever). Try to measure the size of this stadium with the stick. You will get some number (let it be 535 sticks) with quite a high possible error. But you can measure the same thing again (the new result would be 523 sticks). And if you collect many of those measurements with a high standard deviation, you will be pretty sure that the stadium has a length of 530 m with a standard deviation of less than 1 meter. That is the power of collecting many (independent!) data points. That is also why in our case we get a lower RD than the initial RDs: the system is pretty sure that the new value is correct. I have experimented with the second RD and found that if it is very high (e.g. 300), the resulting RD is higher than 50.
einstein13
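The stick example can be checked numerically. This sketch only illustrates the general statistical rule that the standard deviation of the mean of n independent measurements shrinks by a factor of sqrt(n); it is not the formula the team model uses to combine RDs. The single-measurement error of 10 m is an assumed value.

```python
import math


def standard_error(single_sd, n):
    """SD of the mean of n independent measurements, each with SD single_sd."""
    return single_sd / math.sqrt(n)


SINGLE_SD_M = 10.0  # ASSUMED error of one stick measurement, in meters

for n in (1, 4, 25, 100, 400):
    print(f"{n:4d} measurements -> SD of the mean: {standard_error(SINGLE_SD_M, n):.2f} m")
```

With that assumed error, more than 100 independent measurements are needed before the standard deviation of the mean drops below 1 meter.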

trimard (Topic Opener)
Posted at: 2019-07-12, 16:24
Glicko test in 1vs1
I used the data from the 2017 tournament. I don't think it's really necessary to ask for permission, because the results are already available to everyone and it's only for a test. It's not the data that will actually be used for the rating system.
Procedure
I didn't want to redo the calculus, because I'm not as good as einstein at this kind of thing. So I used this script. I didn't compact series of games together, as is recommended in the Glicko-2 paper; I actually calculated each map one by one. It's not yet clear to me how to do otherwise. Does anyone know, btw?
About the data, a few "problems" I had were that:
Constants used (recommended in the initial glicko2 paper):
Result
Non 1vs1 game
Einstein13
I don't know what to say, I'm so happy you were able to do that. I really want to test these equations. You're totally right, it will easily be done by a computer! I agree with your whole reasoning, though I haven't done math in so long that I can't comment on your equations. Yes, it's exponential and not linear, that's for sure.
I'm so hyped to test these too
King of Nowhere
Yes, totally, we need A LOT of tests. But the problem is that we currently have no data to test with, because people don't play and then report their results (except during tournaments). So integrating this system, even if it uses "false" assumptions, will give us enough data to make better equations. It's a first stage, and it's good to have some equations to help with this first stage.
Storing data
Yes, and yes hessenfarmer, we totally should use these data for balance discussions. That would be super useful!
Edited: 2019-07-12, 18:58
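On the rating-period question raised under Procedure: one way to compact games is to collect all games of a period (for example one tournament round, or the whole tournament) and update every player once, using only the ratings from before the period. The sketch below shows that idea with the simpler Glicko-1 update formulas rather than Glicko-2, so it is not the script used for the test above. `rate_period`, the data layout, and the player names are hypothetical, and the RD growth for inactive players is left out.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Weight that discounts an opponent whose rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, opp_r, opp_rd):
    """Expected score against one opponent."""
    return 1 / (1 + 10 ** (-g(opp_rd) * (r - opp_r) / 400))


def update_period(r, rd, games):
    """Glicko-1 update from all games of one rating period.

    games: list of (opp_rating, opp_rd, score), score 1.0 for a win, 0.0 for a loss.
    """
    if not games:
        return r, rd  # sketch only: no RD growth for inactivity
    inv_d2 = 0.0
    surprise = 0.0
    for opp_r, opp_rd, score in games:
        e = expected(r, opp_r, opp_rd)
        inv_d2 += Q**2 * g(opp_rd) ** 2 * e * (1 - e)
        surprise += g(opp_rd) * (score - e)
    denom = 1 / rd**2 + inv_d2
    return r + Q / denom * surprise, math.sqrt(1 / denom)


def rate_period(players, results):
    """players: {name: (rating, rd)}; results: [(winner, loser), ...]."""
    games = {name: [] for name in players}
    for winner, loser in results:
        games[winner].append((*players[loser], 1.0))
        games[loser].append((*players[winner], 0.0))
    # Every update reads the pre-period values stored in `players`.
    return {name: update_period(*players[name], games[name]) for name in players}


# Hypothetical two-player period with a single game.
players = {"p1": (1500.0, 200.0), "p2": (1400.0, 30.0)}
print(rate_period(players, [("p1", "p2")]))
```

The difference from game-by-game updating is that a player's second opponent in the same period is still rated against the player's pre-period values, not against values already changed by the first game.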

king_of_nowhere
Posted at: 2019-07-12, 17:26
There are no missing data. Probably you only looked at the pairings and rankings, not all of which are present in the tournament thread. The table shows, for every player, all his opponents and results, and it is complete: https://www.widelands.org/forum/topic/2912/?page=1#post-21254

trimard (Topic Opener)
Posted at: 2019-07-12, 17:54
Damn, I didn't scroll to that point; I knew you had a more detailed table! I'll check back later. Do you have an idea about the added matches?
Really easy to integrate, but I prefer to be sure when these matches were played.

king_of_nowhere
Posted at: 2019-07-12, 18:13
No. The 1/1, 1/2, 0/1 are the direct-match tiebreak. It means that when two or more players have equal score and Buchholz, the one who won a direct game (if one was played) is first in the ranking. After round 5 there were 3 people with equal score and Buchholz, so I looked at all their matches. einstein had played against both mars and nemesis, won one game and lost one, so I gave him 1 out of 2 in direct match. mars had only played against einstein by that point, and he lost, so he got 0/1. nemesis had defeated einstein, so he got 1/1. But no additional matches were played.
No, I played against WorldSavior in round 6. The table shows that I faced nemesis in round 1.
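The direct-match count described above can be sketched in a few lines of Python. The two results are the ones mentioned in the post (einstein13 beat mars, nemesis beat einstein13); the function names and the choice to order tied players by win ratio are assumptions, not the tournament's actual tooling.

```python
def direct_match(tied, results):
    """Count wins and games among the tied players only.

    tied: names with equal score and Buchholz.
    results: list of (winner, loser) for games played so far.
    Returns {name: (wins, games)}.
    """
    record = {name: [0, 0] for name in tied}
    for winner, loser in results:
        if winner in record and loser in record:
            record[winner][0] += 1
            record[winner][1] += 1
            record[loser][1] += 1
    return {name: tuple(counts) for name, counts in record.items()}


def tiebreak_order(record):
    # ASSUMPTION: rank by win ratio; a player with no direct games counts as 0.
    def ratio(name):
        wins, games = record[name]
        return wins / games if games else 0.0
    return sorted(record, key=ratio, reverse=True)


tied = ["einstein13", "mars", "nemesis"]
results = [("einstein13", "mars"), ("nemesis", "einstein13")]

record = direct_match(tied, results)
for name in tiebreak_order(record):
    wins, games = record[name]
    print(f"{name}: {wins}/{games}")
```

Games against players outside the tied group are ignored by the membership check, which is what makes this a direct-match count rather than an overall score.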