Pickup Fortress Matchmaking (new algo)

Kronkleberry
On Lightcycle Grid
Posts: 18
Joined: Fri Apr 16, 2010 3:29 pm

Post by Kronkleberry »

Hello gamers. For the last few weeks I have been running my new fort matchmaking algorithm for pickup, and I have also created a Discord bot so it runs automatically. The reception has been quite good, and the gameplay it has produced has, in my opinion, been better than ever from a matchmaking standpoint. I’m here to explain a little about how it works and address some questions/misconceptions I have heard and expect to hear.

The System:
For full transparency, I ranked the entire active player base and most inactive players on a 7-tier skill hierarchy: S, A, B, C, D, E, F. I then asked a few other high-level players for input to finalize what we thought would be a good dataset to run simulations on.

This process of ranking everyone was completely subjective, and I understand that might seem a little dubious. I was worried at first about how this kind of thing would be received, as it is pretty flawed in principle. I ended up running hundreds of simulations on subsets of 12 players from our community, and the results seemed more accurate than our past data-driven system. Since testing this system on pickup, we’ve had fort lobbies balanced consistently enough that I think even the most skeptical are willing to overlook the lack of actual data to support this structure, which is the most I was hoping for.

I used some criteria that do grant players mobility within tiers should they improve, and I am also very open to expanding this system if a good suggestion comes my way.

The Math:
Each player is given a numerical value based on their corresponding tier (S=7, A=6, … F=1). The algorithm finds the 4 highest rated players in the queue and splits them randomly into two teams.

The numerical values are tallied across both teams so far. The team with the greater score is given the lowest ranked available player, and the team with the lower score is granted the highest ranked available player. If the teams have an equal score, they are each granted the highest ranked available player. This step is repeated until both teams are full.
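For anyone who wants to see the steps above in code, here is a minimal Python sketch. It is not the bot's actual source. The names `TIER_VALUE` and `draft_teams` are made up for illustration, and it assumes the top four are split two per team and that, on a tie, the first team simply receives the higher of the two highest available players.

```python
import random

# Tier values as described in the post: S=7 down to F=1.
TIER_VALUE = {"S": 7, "A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "F": 1}


def draft_teams(queue, team_size=6, rng=random):
    """Split a queue of (name, tier) pairs into two teams.

    Illustrative only: the 2/2 split of the top four and the tie
    handling are assumptions, not confirmed details of the bot.
    """
    # Strongest first; sorted() is stable, so ties keep queue order.
    pool = sorted(queue, key=lambda p: TIER_VALUE[p[1]], reverse=True)

    # The four highest rated players are split randomly, two per team.
    top_four, pool = pool[:4], pool[4:]
    rng.shuffle(top_four)
    team_a, team_b = top_four[:2], top_four[2:]

    def score(team):
        return sum(TIER_VALUE[tier] for _, tier in team)

    while pool and (len(team_a) < team_size or len(team_b) < team_size):
        score_a, score_b = score(team_a), score(team_b)
        if score_a == score_b:
            # Equal scores: each team takes the highest available player.
            team_a.append(pool.pop(0))
            if pool:
                team_b.append(pool.pop(0))
        else:
            if score_a > score_b:
                stronger, weaker = team_a, team_b
            else:
                stronger, weaker = team_b, team_a
            # Lower-scoring team gets the highest available player,
            # higher-scoring team gets the lowest available player.
            weaker.append(pool.pop(0))
            if pool:
                stronger.append(pool.pop())
    return team_a, team_b
```

Passing a seeded `random.Random(...)` as `rng` makes a roll reproducible, which is handy when comparing simulated lobbies.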

There are also some checks in place: if the difference between the teams’ tier rating scores passes a certain threshold, the queue is rerolled and the algorithm starts over with fresh teams. This measure keeps the initial split of players from being so lopsided that the rest of the queue cannot be balanced around it.
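The threshold check could look something like the sketch below. The threshold value, the attempt limit, and the function names are all assumptions for illustration; the post does not say what values the bot uses or at which point in the draft the check runs. Players here are `(name, value)` pairs and `draft` is any function that turns a queue into two teams.

```python
def team_score(team):
    """Sum of the numeric tier values of a team of (name, value) pairs."""
    return sum(value for _, value in team)


def roll_with_threshold(draft, queue, threshold=2, max_attempts=20):
    """Redraft until the score gap is within the threshold.

    Falls back to the closest roll seen if no attempt passes, so the
    function always returns teams. Both defaults are hypothetical.
    """
    best = None
    for _ in range(max_attempts):
        team_a, team_b = draft(queue)
        gap = abs(team_score(team_a) - team_score(team_b))
        if best is None or gap < best[0]:
            best = (gap, team_a, team_b)
        if gap <= threshold:
            break  # balanced enough, stop rerolling
    return best[1], best[2]
```

Capping the number of attempts is a design choice in this sketch: it avoids looping forever on a queue that simply cannot be balanced within the threshold.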

The benefit of this system is that there is more variability at the top end of teams. S/A tier players now have a greater chance of being on the same team as each other, while the rest of the queue is distributed evenly around that strength. A problem with the last fortauto system was that if two S tier players added, there was a 0% chance of them ever being on the same team.
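To put a number on that, assume both S tier players are among the top four and that the top four are split two per team uniformly at random (the post does not state the split size, so this is an assumption). Enumerating the possible splits in Python gives the chance they end up together:

```python
from fractions import Fraction
from itertools import combinations


def same_team_probability(top_four, p1, p2):
    """Chance that p1 and p2 share a team under a random 2v2 split."""
    # Each pair is one possible "team one"; the other two form team two.
    splits = list(combinations(top_four, 2))
    # Together means both are in team one, or both are left for team two.
    together = sum(1 for team in splits if (p1 in team) == (p2 in team))
    return Fraction(together, len(splits))


print(same_team_probability(["S1", "S2", "A1", "A2"], "S1", "S2"))  # 1/3
```

So under these assumptions two S tier players share a team about one roll in three, before any threshold reroll is taken into account.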

The Logistics:
The entire architecture of this code is predicated on your Discord account. I painstakingly went through every single player’s unique ID and attached a tier to it. This means that if you use the primary Discord account you’ve always used, this algorithm will work. If you use a different account, the algorithm will not recognize you and will put you in “E-tier”. This default is intended for new or returning players who might not have used our Discord before. If you’re a veteran player and use a burner Discord account, this will likely hurt the balance this system gives, and it counts as smurfing.
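In code terms, the lookup described above amounts to a table keyed by Discord user ID with a fallback tier. A minimal sketch, with made-up IDs and hypothetical names (`TIERS_BY_DISCORD_ID`, `tier_for`):

```python
# Hypothetical table: Discord user ID -> tier. These IDs are invented.
TIERS_BY_DISCORD_ID = {
    111111111111111111: "S",
    222222222222222222: "C",
}

# Unknown accounts fall back to E tier, as described in the post.
DEFAULT_TIER = "E"


def tier_for(discord_id):
    """Return the stored tier for an account, or the default if unknown."""
    return TIERS_BY_DISCORD_ID.get(discord_id, DEFAULT_TIER)
```

This also shows why a burner account skews a roll: the lookup misses, so a veteran is treated as E tier no matter how strong they are.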

The Commands:
!roll - any player can do this after pickup pops. It is only available once. There is a bug where if two people enter !roll simultaneously, it runs the algorithm on a double-queue. In that case just…
!reroll - this will run the algorithm again on the queue. Intended for when a roll looks imbalanced or when teams are similar to a previous game. As of now this command is available to a small list of ‘trustees’. (Windrider, Ampz, Desolate, delinquent, vov, kronkleberry)
!clear - will clear the teams and queue. Only available to trustees.
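One common way to guard against the simultaneous !roll bug is to serialize the command behind a lock and a "already rolled" flag. The sketch below uses plain asyncio and does not reflect how the bot is actually written; `PickupState` and `try_roll` are hypothetical names.

```python
import asyncio


class PickupState:
    """Tracks whether the current pickup has already been rolled."""

    def __init__(self):
        # Create this inside a running event loop (e.g. in a coroutine).
        self._lock = asyncio.Lock()
        self.rolled = False

    async def try_roll(self, run_algorithm):
        """Run the algorithm once; later or simultaneous calls get None."""
        async with self._lock:
            if self.rolled:
                return None  # someone else already rolled this queue
            self.rolled = True
            return await run_algorithm()
```

With this in place the second of two near-simultaneous !roll calls waits for the first to finish, sees the flag, and does nothing, so the queue is only processed once.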

THINGS TO CONSIDER BEFORE YOU COMPLAIN:
  • This algorithm merely suggests teams. I worked to make it create teams as balanced as possible while still incorporating elements of randomness so as to not spit out the same teams every time.
  • Balancing 2 teams of 6 in a community of our size with these limitations is insanely difficult. There are so few players that consistently add for pickup, there is a massive skill gap from top to bottom, even from tier to tier, and more often than not, that skill variety is represented in a pickup lobby. Most every other game will separate a MASTER player from a BRONZE player, but in this game we are tasked with putting them in the same game and somehow making it work.
  • The algorithm does not select positions for each team. This has a massive and direct influence on how a match can play out. Some players excel at certain positions and are weaker at others; it is your job as a team to figure out something that works for everyone and to understand the impact this can have on the match. A certain lineup of 6 players can be much stronger than those same 6 players in a different order. The algorithm doesn’t directly account for that. (Although players who are well rounded at all positions are generally tiered higher.)
  • Aside from positions, there are numerous other variables that dictate the outcome of a fort match beyond team balance. Someone can be lagging badly and not playing well. Someone might sub out for a player of different skill level. Someone could make a poor decision that leads to what could be a 20-point swing in the match. Someone can die/troll on the grind and throw the entire round for your team. Just to name a few.
I’ve seen people blame the algorithm for a “bad match” when the team selection was one of the few things that actually went right. Use your !reroll if you think you need it, and play matches with consistency. I’m not claiming the matchmaking is perfect; if anything it’s very imperfect, as balancing in this game is a seemingly impossible task. I know this because I’ve put a ton of time into theory-crafting and fine-tuning different balancing algorithms. It’s important to be realistic about why a match with a big score difference happens. I am open to quality suggestions and observations from your experience with the algorithm.

Another point I’d like to make is that having balanced teams is great and makes the games more worth playing, but the real reason a lot of us requested an automated system is to bypass the amount of time it takes to pick teams. Teams are now made within seconds of the queue being filled, go to the server immediately!!!!!
delinquent
Match Winner
Posts: 750
Joined: Sat Jul 07, 2012 3:07 am

Post by delinquent »

I might consider adding the command

!merge <@player> <@player>
to the bot, to reduce the impact (and, indeed, the possibility) of smurfing. That would work in the case of, say, Subliminal, who cycles throwaway Discord accounts frequently (Just to note - she's not smurfing here). Apple, Mr, Magi, Johnny, Ghostly and Nelg are all players that I, off the top of my head, recall having switched accounts at some point in the past. You don't need to replace those userids either, which means that those individuals can use any of their accounts to get the same rating.
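A rough Python sketch of what such a merge could do behind the scenes: keep the tier on one primary ID and record every other account as an alias of it. `AccountRegistry` and its methods are hypothetical names, and the IDs are placeholders.

```python
class AccountRegistry:
    """Maps several Discord accounts of one player to a single tier."""

    def __init__(self, tiers):
        self.tiers = dict(tiers)  # primary id -> tier
        self.alias_of = {}        # alternate id -> primary id

    def resolve(self, discord_id):
        """Follow alias links until reaching an id that is not an alias."""
        while discord_id in self.alias_of:
            discord_id = self.alias_of[discord_id]
        return discord_id

    def merge(self, primary_id, alt_id):
        """Make alt_id (and anything merged into it) point at primary_id."""
        root = self.resolve(primary_id)
        alt_root = self.resolve(alt_id)
        if root != alt_root:  # skip merges that would create a cycle
            self.alias_of[alt_root] = root

    def tier_for(self, discord_id, default="E"):
        return self.tiers.get(self.resolve(discord_id), default)
```

For example, after `registry.merge(1, 2)` the account with ID 2 gets whatever tier is stored for ID 1, and no existing userid has to be replaced.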

On the subject of the old fortauto ranking system, I think there may have been a fundamental flaw in its original design. I did, at one point, jot down a rough outline for captured datapoints but iirc there's not much in the way of a timestamp which makes some of my intended datapoints impossible to catch. The crux of it, though, is that a poor team with one or two excellent players can often have a far more negative impact on those excellent players, even if the opposing team is made up of poor to average players, by the range of available tactics within fortress explicitly, and armagetron overall. The most obvious answer is holing - three extremely poor players can quickly overwhelm a single excellent player - which is, of course, the expected outcome. My immediate answer to that is to base the scoring meta on whether or not the excellent player achieves an additional kill (so, two of the three poor players) after the hole is made, whether or not the hole is a no-point hole, how long it takes for the hole to be made and the zone captured, and whether or not the defender attempted to zone-save. Again, the latter are nigh impossible to measure on the basis of ladderlogs alone, because of the lack of a timestamp and the inability to evaluate entirely whether or not a death is intentional or mistaken.

All of this leads me to an idea. What sort of interest is there in anonymised judging? By this, I mean a selection of two to three games is put to a rotating panel of, say, four volunteer judges on a weekly or monthly basis. Their role is to anonymously evaluate the performance of each player in those games, taking into account any exacerbating factors, matching those performances against a set number of criteria which carry a set number of points. This brings a human into the equation, a necessary factor in evaluating things like trapping players in boxes and escaping from the traps of other players. That final score, and its reasoning, is used to influence the score of those players within your metric. By the same facility, and given that the judges and games are both somewhat randomised each session, no player can claim to be unfairly ranked.
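As a sketch of how panel sheets could be turned into a single number, the Python below checks each judge's points against a fixed maximum per criterion and averages the judges' totals. The criteria names and point caps are invented for illustration; no actual list has been agreed on.

```python
from statistics import mean

# Hypothetical criteria and the maximum points each one carries.
CRITERIA_MAX = {"trapping": 5, "escaping": 5, "zone_defense": 10}


def panel_score(judge_sheets):
    """Average the total points given to one player across all judges.

    judge_sheets is a list of dicts, one per judge, mapping a criterion
    name to the points that judge awarded.
    """
    totals = []
    for sheet in judge_sheets:
        for criterion, points in sheet.items():
            if not 0 <= points <= CRITERIA_MAX[criterion]:
                raise ValueError(f"{criterion}: {points} is out of range")
        totals.append(sum(sheet.values()))
    return mean(totals)
```

Averaging across judges is only one option; a median would be less sensitive to a single judge scoring far away from the rest of the panel.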

It would require z-man or another volunteer to run a presence in every server carrying the pickup name, but that might be easily automated - dump any games overnight (or probably better to do so late morning), and check to see the recording bot is still running. That's not too great an ask, though, I don't think, and the results would (in my opinion) more than justify the effort. We could also set requirements for the judges - active within the past x weeks, not a judge on the last x panels, etc etc.
delinquent
Match Winner
Posts: 750
Joined: Sat Jul 07, 2012 3:07 am

Post by delinquent »

Oh I forgot to mention it in that post, I got far too distracted by hey, new ideas! but thanks for doing this kronk
sinewav
Graphic Artist
Posts: 6390
Joined: Wed Jan 23, 2008 3:37 am

Post by sinewav »

Just wanted to say the newauto system is fantastic and I have no complaints. Games are better now than they have been in years and I think a major factor is the human sort and larger number of tiers.
delinquent wrote: Sun Jul 24, 2022 11:59 pm What sort of interest is there in anonymised judging?
I see in-game that some people are bothered by the private dataset (though it doesn't bother me at all because it's clearly a good sort). If the community wants something less opaque, then I want to offer a suggestion I had before I learned about armarankings. Basically, we allow the community to sort players using a simple interface, either a webpage or a Discord bot that offers a choice: "is player 1 better|worse|equal to player 2?" Or, simply, "rank player 1 on a scale of 1-8." We all intuitively know everyone's skill level, and over time we will have a good sort that is continually updated as long as people vote regularly. (This tool could be expanded for individual positions: is player 1 better than player 2 at pos x?)
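One simple way to turn "better|worse|equal" votes into a sort is to give each player the share of comparisons they won, counting a tie as half a win. This Python sketch is only one of many possible aggregation schemes (an Elo-style update would be another), and `rank_from_votes` is a hypothetical name.

```python
from collections import defaultdict

# Share of the comparison credited to player 1 for each verdict.
SHARE = {"better": 1.0, "equal": 0.5, "worse": 0.0}


def rank_from_votes(votes):
    """Order players by the fraction of comparisons they won.

    votes is an iterable of (player1, verdict, player2) where verdict
    answers "is player1 better|worse|equal to player2?".
    """
    points = defaultdict(float)
    comparisons = defaultdict(int)
    for player1, verdict, player2 in votes:
        share = SHARE[verdict]
        points[player1] += share
        points[player2] += 1.0 - share
        comparisons[player1] += 1
        comparisons[player2] += 1
    return sorted(
        comparisons,
        key=lambda p: points[p] / comparisons[p],
        reverse=True,
    )
```

A win fraction treats every vote equally regardless of who the opponent was, so it needs regular voting across many pairings to settle into a sensible order.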
delinquent
Match Winner
Posts: 750
Joined: Sat Jul 07, 2012 3:07 am

Post by delinquent »

sinewav wrote: Mon Jul 25, 2022 8:18 pm "is player 1 better|worse|equal to player 2?"
I don't really like this sort of approach, it's a bit too vague for me. The reason I made the above suggestion is because we can create a list of capabilities and successes, and base the results on tangible evidence. That means no personal impressions can impact the rankings (or, at least, those impressions are less impactful), and implies that the ability of players to learn new skills is thus reflected in their next ranking - whether or not they are successful. We could also have a bonus score for a new technique learned. Point being, we can score players very finely and accurately, without a great deal of effort.
Kronkleberry
On Lightcycle Grid
Posts: 18
Joined: Fri Apr 16, 2010 3:29 pm

Post by Kronkleberry »

I am interested in the idea of accurately scoring players with tangible evidence. Nanu has an SBT server that harvests statistics about players' behavior in a sumo round. He shared some analysis of the data and it was pretty compelling. I think he may or may not be in the process of creating a version of that server adapted for fortress. If/when that kind of data is available, it would be interesting to see a list of attributes, or perhaps a calculated skill rating, much like players in FIFA have an overall score derived from pace, dribbling, shooting, physicality, etc. This would let us analyze a much larger dataset than having a panel anonymously spectate a few fort matches.
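A derived overall score of that kind is usually just a weighted average of the individual attributes. The Python sketch below shows the arithmetic; the attribute names and weights are invented placeholders, since no fortress stats exist yet to base them on.

```python
# Hypothetical attributes and weights; none of these come from real data.
WEIGHTS = {"defense": 0.4, "attack": 0.3, "sweeping": 0.2, "survival": 0.1}


def overall(attributes, weights=WEIGHTS):
    """Weighted average of attribute ratings, rounded to a whole number."""
    total_weight = sum(weights.values())
    weighted = sum(attributes[name] * w for name, w in weights.items())
    return round(weighted / total_weight)
```

For example, ratings of 90 defense, 70 attack, 60 sweeping and 50 survival give 36 + 21 + 12 + 5 = 74 overall with these weights.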

I am aware that some people expressed apprehension about a private dataset, and I totally expected that, and fully understand why too. My hope was that the matchmaking would speak for itself and people would share your take on it, sine: that it would be good enough that people are willing to overlook having such an opaque system. Which I think is the case aside from a select few. Because this genuinely is a massive upgrade from all the bullshit that came with manually picking teams. I do believe that re-educating the tierings by more democratic means would likely sacrifice accuracy for transparency. There's a reason they hire former professional athletes to become analysts of the same sport. There are so many dimensions to becoming skilled in this game, and it's hard to account for those which you might not have or know what to look for. This is why I asked some of the best players to proof-read my tier listings. Those that have climbed the ranks have greater insight into the intricacies of fortress and play-style than those that have not.
Nanu Nanu
Core Dumper
Posts: 189
Joined: Wed Jul 27, 2011 3:20 am
Location: Witty comment about location here

Post by Nanu Nanu »

I think because of the small size of the community and the difficulty of categorizing players based on a few data points or W/L ratio, a few top tier players coming up with tiers and placements is as close as we can get to balanced matchmaking. Reception seems pretty good since the introduction, so the tiers look to be accurate. I would be interested in the option to suggest positions in the future, but that might require more data and maybe shouldn't be included anyway so we aren't stuck playing the same pos every game.

I'll probably make a more in-depth post about it when I have more details, but I will put up a server sometime in the next few weeks for gathering fort data like Kronk said. I'm reworking my stats gathering scripts to automatically send data to my server so I can display it on my site, but I'm starting with TST since it requires fewer players to test everything. The front end for all pickup modes will take a little longer, but for TST at least we can start testing and gathering data in probably a couple of days.

Combining all the gathered data with the tiers that Kronk has could be really interesting. We know that the tiers are accurate, but we don't always know for sure what data points correspond to better play. Finding the differences between the tiers based on data would be fun, though maybe if we did something like that, we'd have to find a way to publish the results while keeping the tiers private.

Links to previous data in case anyone missed it at the time: TST and SBT
sinewav
Graphic Artist
Posts: 6390
Joined: Wed Jan 23, 2008 3:37 am

Post by sinewav »

Kronkleberry wrote: Tue Jul 26, 2022 2:27 am I do believe that re-educating the tierings by more democratic means would likely sacrifice accuracy for transparency.
Maybe. There is probably a way to add transparency to the current system of "cloaked committee creating a secret database" and maybe we should think of how to do that if only to appease people uncomfortable with it (not me, I don't care about my rank or who does the ranking). Transparency is good when things are broken because it's easier to fix. Right now things are great so I assume there won't be much pressure to uncover the inner workings of newauto. Regarding stat collecting, this could potentially create a meta-game with negative effects. Some people LOVE stats and their pride sometimes overrides their ability to be a good teammate. Hell, even knowing your tier makes people act shitty to each other. Plus, stats can't account for a person's Fort IQ or situational awareness. I'm not convinced stats would be useful or better than a human sort.
delinquent
Match Winner
Posts: 750
Joined: Sat Jul 07, 2012 3:07 am

Post by delinquent »

Kronkleberry wrote: Tue Jul 26, 2022 2:27 am There's a reason they hire former professional athletes to become analysts of the same sport...I do believe that re-educating the tierings by more democratic means would likely sacrifice accuracy for transparency


That's also partly why I suggested a rotational panel, and why the criteria should ideally be pre-determined. A set of pre-determined criteria limits the potential for players to be "wowed" by a better player. And don't forget, they have to justify their decision-making to the rest of the panel. They can't just turn around and say "Well, I can't do that, so it must be good". Hell, we could even record those panels and put them up for review by the community at large.

The thing with the automated scoring system is that it doesn't reflect gameplay, it only reflects scoring. It's easy to abuse it by volunteering for defense and letting the opposing team die on you, whilst never displaying any other skills. Obviously this is an exaggerated scenario, but it's not like this doesn't have a certain impact.

This is the problem I encountered the first time I tried building an algorithm myself - I wanted to look at the flow of a game, get some rough timings together, and try to put together a rough idea of what was actually happening by examining when exactly a death occurred, and what position that player was in. With the level of output in the ladderlog, this simply isn't possible. Of course, I could try and fudge positioning with complicated scripts, but that would get farcically complex very quickly. This is why I suggested expanding Kronk's approach. It's not that I distrust your assessment of the ability of others (in actual fact, I would say I prefer your approach thus far), but I'm interested in making those judgements harder to refute by updating them with clear, concise information on a regular basis, from a diverse repertoire of sources.