Guest Post: All Diplomacy Scoring Stinks

Introduction

There has been a lot of debate in the Diplomacy community on scoring systems. Look at how just this one blog has an entire subject matter section devoted to the topic! These discussions have strong points in favor of “what is most ‘pure’ ” or “what rewards what style of play,” but none of them seem to address the issue of the general purpose of a scoring system. There are multiple competing purposes, and we have to acknowledge that before it can be addressed.

There are three main purposes of a scoring system in Diplomacy. Skill points are assigned to roughly gauge relative abilities of players. Utility points are assigned to motivate a player’s in-game goals. Tournament points are similar to utility points, but motivate play across multiple games. 

In the Diplomacy community, we often conflate all three of these main purposes. Other games keep them apart. Chess has well-established, distinct points for each: Elo for skill points, and usually 0/.5/1 tournament points for a loss/draw/win. In tournament bridge, there are multiple ways of awarding all three types of points, but they are never conflated.

An Analysis of Tournament Bridge

A large conference hall full of people sitting at tables playing bridge.
A bridge tournament stock photo. The dividers at the table are a response to previous famous cheating scandals.

For those unfamiliar with tournament bridge, I describe the common methods in detail here.

For utility points there are two main methods of scoring in bridge. Each hand is scored on hand points and then converted to utility points in one of two main ways: a 0/.5/1 score based on who has more hand points, or a conversion to an amount of utility points based on how many more hand points were scored. In bracket tournaments, there is no further conversion (the team with more utility points advances), but in round robins or Swiss-style events tournament points are assigned based on the difference in utility points for each match. Finally, skill points are assigned by placement at the end of a tournament. These are usually assigned based on the size of the event, prestige of the event, and rank at the end of the event (but NOT based on difference in tournament points). Some skill points accumulate and are never lost; some “decay” over time. It is widely acknowledged in the bridge community that outside the top tier of players, these skill points do a suboptimal job of ranking players, but for various reasons no better system has emerged.

But let’s look closely at the utility points in bridge. Whether you care about “more/less” or the “size of the difference” in your hand points makes a big difference in how you play. In the latter case, you’ll often happily surrender a small number of hand points for the chance at a large payout. Judging these gambles is very difficult, and accurately finding and protecting against large losses is among the hardest skills in the game. In the “more/less” case, you’ll try hard to consider how often a play works, and care less about how many hand points the play is potentially worth. Accurately assessing frequency is another difficult skill, and many players cannot judge when they are already likely to be ahead and should “play safe” and when they are likely to be behind and need to “swing away.”

The choice of utility points obviously affects outcomes of matches. Imagine a 12-hand match where most hands are worth 50-200 hand points, but one hand has a 50-50 guess to be worth either 1430 or 100 hand points. In the system that just cares about “more/less,” all the hands contribute to which team wins the match. In the system that cares about how much, a team can win all the other hands, but lose the 50-50 guess, and lose the match.
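The 12-hand match described above can be put into numbers. What follows is a minimal Python sketch, not how bridge is actually scored: the hand-point values are made up to fit the example, the function names are hypothetical, and the raw hand-point difference stands in for the real conversion to utility points.

```python
def score_more_less(ours, theirs):
    """0/.5/1 per hand: only who scored more hand points counts."""
    total = 0.0
    for mine, yours in zip(ours, theirs):
        if mine > yours:
            total += 1.0
        elif mine == yours:
            total += 0.5
    return total


def score_by_margin(ours, theirs):
    """Stand-in for margin scoring: the raw hand-point difference."""
    return sum(ours) - sum(theirs)


# Hypothetical match: team A edges 11 ordinary hands by 20 hand points each,
# then loses the 50-50 guess on the last hand (100 vs. 1430).
team_a = [120] * 11 + [100]
team_b = [100] * 11 + [1430]

print(score_more_less(team_a, team_b), score_more_less(team_b, team_a))
print(score_by_margin(team_a, team_b))
```

With these made-up numbers, team A wins 11 hands to 1 under “more/less,” yet finishes 1110 hand points behind once the size of the difference is what counts.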

If you’re waiting for me to say which system in bridge is “better”, keep waiting. They both have merit, and both lead to interesting tournaments and displays of skill. But each scoring system requires subtly different judgements, strategies, techniques, and skills.

Back to Diplomacy

Utility Points

Comparing this to the discussion in Diplomacy, we need to acknowledge the following conflated goals that we have when discussing scoring systems. The utility points should dictate how we want players modifying their play in-game. In my opinion, any player who is in a game with a prescribed scoring system and declines to increase their utility points in the outcome (at no risk) is not playing by the agreed-upon rules and norms of a game. Voluntarily tanking is against the spirit of fair play.[1]

Press Diplomacy (excepting tournaments) should have the option of acknowledging that different players have different personal utilities and allow an option of having NO utility points; this contributes to the ability of players to manipulate and disguise their goals as opposed to having an agreed upon scoring system. If you’ve successfully convinced someone that you “deserve” to be in the draw, then by all means take it.

Tournament Points

Tournament points are similar to utility points, except they exist across games. There are times where it can be beneficial for a player to tank their utility points in a single game in order to help their overall standing.[2] The play styles needed and skills to assess a difficult situation are myriad. Should you throw to a solo so only one player scores points, and you’ll still qualify for the next round? What about throwing so a player whom you assess is easier to beat joins you in the final (feeling like they owe you a favor to boot!)?

Another issue I want to raise is that players making these decisions towards the end of a round (after other players’ matches have ended) have an advantage: they can use information from other matches to make their decisions. Something to consider for symmetry would be adding anonymity, or hiding the running total in the event; modifications to prevent using the “barometer” (as it is called in bridge tournaments when it exists) to modify later play may be useful.

Skill Points

Skill points may be the hardest of all to design. There is the Ghost Rating on webDiplomacy, and diplomacy.ca used to have a similar system. But there are many skills in this game, and what you value still comes through in assigning skill points. Is surviving but losing to a solo better or worse than being eliminated fighting the solo? Is getting a lead but not making it to 18 centers better or worse than managing to corral a posse to stop the advance? What displays more skill: getting to 12 centers as France and stalling, or getting to 6 as Italy and stopping France before they can cross a stalemate line? Anyone who believes these questions are answerable is wrong. But we can still attempt to assign a ranking based on a player’s demonstrated results.

My Preferences

Now, for those who have read this much theory and think I have no personal preferences to contribute, I do have some thoughts.

First, I think that most current utility points overweight draws, or overweight relative size of a country in a draw. Both are problematic. For gunboat, I want a utility point system to be relatively agnostic to the large country that doesn’t solo, and the small country that plays beautifully to stop them soloing. Motivating the endgame play is important.

That said, I also want to motivate smaller draws, and solos. My solution is to return to something similar to diplomacy.ca’s scoring system, which achieved this by having the total points awarded at the end of a game change based on the number of players left.

For example, a solo might be worth 100 points. A 2-player draw worth 40 points each. A 3-player draw worth 25 points each. A 5-player draw might be worth only 10 points each (don’t get hung up on the specific values, but on the flavor of the motivation). Remember, these are not skill points, these are points assigned to motivate a play style and emphasize skills. In particular, they motivate survival, eliminating opponents, and preventing/achieving solos.
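To make the flavor of that motivation concrete, here is a small Python sketch using only the illustrative values from the paragraph above. The table and function names are hypothetical, and the 4-player entry is left out because no value is given for it.

```python
# Illustrative per-player points by number of survivors, as quoted above.
# No value is given for a 4-player draw, so it is omitted rather than guessed.
POINTS_PER_SURVIVOR = {1: 100, 2: 40, 3: 25, 5: 10}


def total_awarded(survivors):
    """Total points handed out in a game that ends with this many survivors."""
    return survivors * POINTS_PER_SURVIVOR[survivors]


for survivors in sorted(POINTS_PER_SURVIVOR):
    each = POINTS_PER_SURVIVOR[survivors]
    print(f"{survivors} left: {each} each, {total_awarded(survivors)} in total")
```

The totals come out to 100, 80, 75, and 50: the fewer players left at the end, the more points the game hands out overall, so every survivor has a reason to push for a smaller draw.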

For press, I want a utility point system that gives as much flexibility to the press as possible. One option is 0/1 (not-solo/solo). This allows maximum flexibility for the players to have their own personal pride points for anything but a solo. Another good option is to use the proposed gunboat system. 

Skill points are particularly difficult to assign. Elo works well in a 1v1 game, but has issues in teams and multiplayer games. Simply converting from utility points or tournament points ends up overvaluing a “big win” over simply “winning.” One option for both press and gunboat is to have judges assign skill points after the game, the way gymnastics or figure skating judges assign points based on the various skills and styles demonstrated during the match.

What about a “Duplicate” system?

Another option is a duplicate-style tournament like in backgammon or Scrabble. In these events, there is a measurable “best play.” Players are given a number of situations and asked what to do, and then scored based on their choices. I don’t think such a system is possible in Diplomacy (gunboat or press), but maybe we can do it closer to how bridge does duplicate, where the same situation is played by multiple teams and the results are then compared. I could see a duplicate Austria vs. France tournament (here’s the first 5 years, play it from here…), or gunboat and press (here’s the position in 1913 with 2 big countries and 2 littler ones remaining, play it out). Seeing how many people manage to play the same position could start to provide us a measure of duplicate skill.

My Criticism of the Other Systems

What do I not like about Sum-Of-Squares (SOS) or Draw Sized Scoring (DSS) or Tribute? They motivate play and skills that I don’t personally value and don’t enjoy focusing on.

For example, as a 1-center survivor in a key position in a stalemate line, SOS rewards me almost nothing for having managed the skills to foresee and move to that position.
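The 1-center survivor case above is easy to check with a few lines of Python. This sketch assumes the commonly described form of Sum-of-Squares, where each survivor’s share of the pot is their center count squared divided by the sum of all survivors’ squared center counts; the function name and the 17/16/1 board are hypothetical.

```python
def sum_of_squares_shares(centers):
    """Share of the pot per survivor under the assumed SOS formula."""
    squares = [c * c for c in centers]
    total = sum(squares)
    return [s / total for s in squares]


# Hypothetical final position: two large powers held by a 1-center survivor.
board = [17, 16, 1]
shares = sum_of_squares_shares(board)

for count, share in zip(board, shares):
    print(f"{count:2d} centers -> {share:.1%} of the pot")
```

Under that assumption the 1-center power receives roughly 0.2% of the pot, compared with a third if the three survivors split the draw equally.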

Draw Sized Scoring rewards me for settling into happy Care Bear 5-way draws: over time, they score about as well as nearly the same number of 3-way draws. This is not a game of just surviving; one must show the skills to push and thrive and eliminate.

Tribute ends up with wacky motivations for the 1st and 2nd place players on a board. In some situations, it rewards the move from 1st to solo comparably to the move from 2nd to 1st. It also rewards moving from 2nd to 1st more than it penalizes going from 2nd to giving up a solo. So the play ends up highly valuing ALL centers at the end of the game, and valuing being in 1st more than avoiding giving up a solo.

Part of the charm of Diplomacy is that different centers have different values at different times. Anyone who wants Munich in 1901 or 1902 doesn’t understand this. Anyone who doesn’t want Munich in 1915 equally doesn’t. But the Tribute system forces players to throw out this subtle valuation ability in favor of “total centers”. I personally don’t find it an appealing style of play, and I think it overweights certain countries’ abilities to score.

Conclusion

So, what should we do? We should start acknowledging the different goals of our points. I want to read more articles by top tournament players discussing how the scoring system and deadline affect their style of play. I want discussions on which skills we’re ranking in skill points. What play styles we’re motivating with utility points. What collusion, gaming, and play styles a tournament is encouraging with tournament points. We need to make these considerations more explicit, focusing not just on personal preference but on the skills examined and measured. All the scoring systems are telling the players how to play; we need to focus on what they’re being told.

Footnotes

1 I do not mean that a player should never threaten to throw if it looks like they are being cut out of a draw; that’s playing to their win condition, and they may follow through when pushed. I mean that a player who could safely remove someone from a draw and declines to is purposefully scoring fewer points. Someone declining to solo for a 2-way draw is purposefully scoring fewer points. In games with an assigned utility points system, this is arbitrarily colluding. See the next paragraph for press players who wish to manipulate others into allowing them to survive or adhere to a long term alliance.
2 There are famous cases in World Cup Soccer and Olympic Badminton where players were punished for doing exactly this. Tournaments need to be clear about players needing to maximize their win conditions. They should also do what they can to remove such conflicts, like making teams from the same country play each other in the first round of a qualifying round robin, or preventing knowledge of current standings from being known.

8 thoughts on “Guest Post: All Diplomacy Scoring Stinks”

  1. Mercy

    Nice article. It sums up well what I have been thinking about scoring systems.

    Personally I refer to ‘skill points’ as ‘rating’, so e.g. I would call Draw-Size-Scoring an example of a ‘scoring system’ and Ghost Rating an example of a ‘rating system’.

    Why the title, though? From the article I get the impression that you only dislike the particular scoring systems on webDiplomacy, not *all* scoring systems.

      1. BunnyGo

        Lol. I informed our Bored Brother that it’s common practice that editors choose titles for advertising and readership purposes and authors have little control. Something to keep in mind when you like an Op-Ed but find the title click-bait.

        Glad you liked the article. I think my opinion of any scoring system is that they all have merits, some just aren’t to my personal tastes.

  2. Eric Hunter

    The rules of Diplomacy are clear.
    The Object of the Game is to Solo, therefore a Solo has a value “S”. If the players agree to a Draw, all Survivors share equally in the Draw. Therefore, Draw value “D” < “S”, and each surviving player gets “D” ÷ P. So, assuming a 5-point buy-in:
    S = 35
    2-way = 17 – 5 or 12
    3-way = 11 – 5 or 6
    4-way = 4
    5-way = 2
    6-way = 1
    7-way = 0
    Any other scoring system means you are not playing Diplomacy.

    1. George Battlesworth

      The rules of Diplomacy are *not* clear where scoring systems are concerned (and convoy paradoxes but that’s just life).

      For a start, tournament scoring isn’t, and has never been, a concern of either Calhamer or Avalon Hill. Just like it was never an issue across the historical development of chess, as it is completely alien to the rules of the game itself yet things like ELO are a necessary addition to the game at e.g. tournament level.

      Play on e.g. PlayDiplomacy or webDiplomacy with Draw Size Scoring and you are *not* (according to your own logic, which I find valid) “playing Diplomacy”. You either win, draw, or lose in the game. There are no positions, buy-ins, points, etc. Who the hell grabs the board game, makes it into a 4-way draw, and says “Yay, I got 374 magic points because I shared equally in a draw”?

      Bottom line is, the only way to play Diplomacy “purely” is playing “unrated” – i.e. without any kind of points, with each draw position being valued per case in a purely qualitative manner… like Calhamer intended.

      Ah, and also without any kind of negotiations during the Retreat and Disband phases – you can only negotiate before orders are submitted for resolution, per the rulebook. Anything else means you are not, and have not been, playing Diplomacy. 😉

    2. CaptainMeme

      This is only the case if you assume that the scoring must be zero-sum, which was never the intention of the rulebook. The ‘All Survivors share equally in the Draw’ originally meant that all Survivors in a draw got the same value result, so something more akin to:
      Solo = 100 points
      Draw = 10 points to each surviving player (regardless of size)
      Anything else = 0

      The system you’re describing is what it morphs into when you force it to be zero-sum, and it promotes a very, very different style of play than original diplomacy rules do.

      1. BunnyGo

        Yes. I think dropping zero sum as a design requirement will help people focus on more useful applications. Either match or tournament or ranking points could all be non-zero sum. One just needs to be careful with the other applications.

  3. jay65536

    I have many differences of opinion with what’s in this article, but they’re just that: differences of opinion, not much worth commenting on.

    There is one point, however, where you claim to prefer assigning utility based solely on draw size in a fashion that increases the total points per game as the draw size decreases. This by itself is just another difference of opinion, but you later go on to claim not to like Calhamer points (what you call “DSS”) because the different draw points are too close to each other in value. This doesn’t make sense, and I can show why.

    To put some numbers to this, normally the way people think of CP is to take a fixed pot (let’s say 420 to make the numbers look good) and split equally among non-losing players. So:
    Solo 420
    2way 210
    3way 140
    4way 105
    5way 84
    6way 70
    7way 60
    Loss 0

    When framed this way, it certainly does appear that your complaint has merit. After all, two 5ways would appear to beat a 3way, to use your example. But here’s my point: this treats CP as a positive-sum system. What does it look like if we convert to zero-sum? The way we’d do that is to recognize that to make a pot of 420, each player should “bet” 60; then we track, not how many points the players *get* at the end, but how much they *win*. (The shortcut is to just subtract 60 from everything.) Now it looks like this:

    Solo 360
    2way 150
    3way 80
    4way 45
    5way 24
    6way 10
    7way 0
    Loss -60

    So, zero-sum CP looks like it actually gives the outputs you say you want! It is hard for a string of big draws to catch a small draw: it takes, for example, four 5ways to outscore a single 3way, not two. Similarly, it takes four 4ways to catch a single 2way, and it takes five 3ways to catch a solo!

    The difference is that this formulation highlights the fact that a loss is a decisive negative result (notice that the absolute value of a loss is in between those of a 4way and a 3way). In the 420-sum version, two 5ways appears to beat a 3way; in the zero-sum version, two 5ways beats *one 3way and one loss*–but not one 3way and no other games.

    Zero-sum CP has draw values that seem to align with what you say you want to motivate. So if you want to make a loss worth 0, not a negative number, then you’re not breaking zero-sum by increasing the values of solos and draws–you’re breaking zero-sum by increasing the value of a loss so that it scores the same as an indecisive result (i.e. a 7way).
