Good afternoon everyone! My name is Magne Skjæran, and I’m a senior programmer on Crusader Kings 3. I’m here to bring you the second entry in the Anatomy of a Game series. You can read the first entry, Matthew’s post on Startup and Loading, here.
Today’s topic will be how we change the gamestate. This is core to the whole simulation of the game, since if nothing changes there’s no real game to play.
I’ll be covering three main topics here. First I’ll talk about the command system, which is how all interaction with the game happens. Then I’ll cover how we determine what to change versus how we actually change it. Finally, I’ll explain what out of syncs are and how they occur.
What I cover here is all based on CK3, but a lot of it applies to our other games as well. There will be differences here and there, though, sometimes big ones! So don’t take any of this as gospel for our other games.
Command System
The core of our game is a simulation. It runs on its own even if no agent (the player or AI) makes any changes to it. But a simulation you can’t influence is just a toy rather than an actual game. This is where commands come in.
A command is a set of data on how to change the gamestate. A simple example would be a command to “queue movement of this unit to that province”, or “select this event option”.
All interaction happening via commands also makes it easy for us to find everything the player can influence, which makes a variety of bugs easier to debug and makes it easier for us to reason about how the game works.
[A command to disband an army]
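To make the idea concrete, here is a minimal Python sketch of a command as plain data, plus a single entry point that applies it. All names (`GameState`, `DisbandArmyCommand`, `execute`) are hypothetical. CK3 itself is written in C++ and its real command classes will differ. The point is only that every outside change arrives as a data object, so all of them pass through one function that can be validated, logged or inspected.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    # Hypothetical, heavily simplified gamestate: army id -> owning character id.
    armies: dict[int, int] = field(default_factory=dict)
    # Every command received, in order (useful when debugging).
    command_log: list = field(default_factory=list)


@dataclass(frozen=True)
class DisbandArmyCommand:
    # A command is just data describing the change it wants.
    sender: int  # character id of whoever sent it (player or AI)
    army_id: int

    def is_valid(self, state: GameState) -> bool:
        # Only the army's owner may disband it, and the army must still exist.
        return state.armies.get(self.army_id) == self.sender

    def apply(self, state: GameState) -> None:
        del state.armies[self.army_id]


def execute(state: GameState, command) -> bool:
    """Single entry point for changing the gamestate from outside the simulation."""
    state.command_log.append(command)
    if not command.is_valid(state):
        return False
    command.apply(state)
    return True
```

Because validation lives on the command, a stale or illegal request (say, disbanding an army someone else owns) is simply rejected rather than corrupting state.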
The command system also forms the basis of multiplayer. Anything a player does is communicated to the other players’ machines by sending a command over the network. Forcing all interaction through the system therefore makes multiplayer Just Work™ in the vast majority of cases without us having to write any MP-specific code. When a programmer implements a new system, they rarely have to think much at all about multiplayer (while the designer probably needs to give it some thought to make sure the feature is fun in both SP and MP).
The player’s interaction happens via the interface, unsurprisingly. The interface is a separate module from the actual game logic; it covers things like what to show, and how they’re interacted with. The game logic can’t see the existence of the interface at all in the code, which avoids a whole class of bugs where logic in some way depends on the interface, an issue that would occasionally happen in our older games. The interface is only able to read the gamestate, and this is enforced by the code systems we have. Commands are the only way for the interface to affect the gamestate.
[The code to send a command from the UI]
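The split between reading and writing can be sketched as follows. This is a hypothetical Python illustration, not how CK3’s C++ code enforces it. The interface receives a read-only view of (part of) the gamestate plus a callback for queueing commands, so an accidental write from UI code fails loudly instead of silently changing state.

```python
from types import MappingProxyType


class Interface:
    """Hypothetical UI layer: reads the gamestate, changes it only via commands."""

    def __init__(self, gold_by_character: dict, send_command):
        # MappingProxyType is a read-only view: reads see live data, writes raise TypeError.
        self._gold = MappingProxyType(gold_by_character)
        self._send_command = send_command

    def gold_label(self, character_id: int) -> str:
        return f"Gold: {self._gold[character_id]}"

    def on_build_button_clicked(self, character_id: int, building: str) -> None:
        # The UI never touches the gamestate itself; it only queues a command.
        self._send_command(("build", character_id, building))
```

Python can’t truly forbid access the way a C++ type system can, so treat this as an analogy for the rule rather than a guarantee.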
Since the gamestate cannot see the interface, it has no direct way to communicate with it. Naturally, this can pose a problem. Imagine something happens to the player and we want to send a notification about it, for instance when the player goes up a prestige level. Sure, the interface could store the player’s prestige level and generate the notification when it changes, but this ends up duplicating a ton of state between the logic and the interface. So instead we have a system similar to commands for sending information from the logic to the interface, which we call “messages”. Like commands, these are specific pieces of information that the interface is to act upon in some manner. They get handled in the interface: when the logic sends the message “player increased their prestige level”, the interface takes care of actually showing that notification to the player.
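Messages flow in the opposite direction. Here is a minimal sketch, again with hypothetical names. The logic appends plain message objects to an outbox and knows nothing about who reads it. The interface drains the outbox and decides how to present each message, so the logic never holds a reference to the interface.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PrestigeLevelIncreased:
    character_id: int
    new_level: int


class GameLogic:
    def __init__(self):
        self.prestige_level = {}
        # Outgoing messages: the logic only appends, it never knows who reads them.
        self.outbox = deque()

    def set_prestige_level(self, character_id: int, level: int) -> None:
        old = self.prestige_level.get(character_id, 0)
        self.prestige_level[character_id] = level
        if level > old:
            self.outbox.append(PrestigeLevelIncreased(character_id, level))


def drain_messages(logic: GameLogic) -> list[str]:
    """Interface side: turn each pending message into notification text."""
    notifications = []
    while logic.outbox:
        message = logic.outbox.popleft()
        if isinstance(message, PrestigeLevelIncreased):
            notifications.append(
                f"Character {message.character_id} reached prestige level {message.new_level}"
            )
    return notifications
```

The interface never needs its own copy of the prestige level. It only reacts to the events the logic reports.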
Now, that’s enough about the player. What about the AI? The AI plays by essentially the same rules. Anything that doesn’t happen via the simulation itself is done by command for the AI as well. Periodically, the AI considers the various actions it can take, and for each one it decides to take, it sends a command. Usually this is the exact same command a player would’ve sent; the player and AI both use the same command for “move this unit to this province”, for instance.
The AI and the player using the same system makes it easier for us to ensure the two play by the same rules. Even more importantly, it ensures that the same attempted action gives the same result, avoiding subtle bugs caused by differences in how the AI and the player interact with the game.
[The AI sending a command]
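A sketch of the AI side, using a hypothetical `MoveUnitCommand` and a toy decision rule. The AI’s job is only to pick a command. Applying it goes through exactly the same `execute` path as a player’s click, so both get the same validation and the same result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveUnitCommand:
    sender: int
    unit_id: int
    province_id: int


def execute(unit_location: dict, unit_owner: dict, command: MoveUnitCommand) -> bool:
    # Shared validation and application: players and the AI both go through here.
    if unit_owner.get(command.unit_id) != command.sender:
        return False
    unit_location[command.unit_id] = command.province_id
    return True


def ai_pick_command(character_id: int, unit_id: int, threatened_provinces: list[int]):
    """Hypothetical AI decision: send the unit to the first threatened province, if any."""
    if not threatened_provinces:
        return None
    return MoveUnitCommand(character_id, unit_id, threatened_provinces[0])
```

The command the AI produces is indistinguishable from the one a player clicking the same move would send.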
There’s not that much more to cover about how the AI interacts with the game without going into far more detail on the AI systems themselves, which could easily fill a dev diary of its own, so I’ll move on now.
Evaluation and execution
All changes to the gamestate can be considered to have two main parts: deciding what to do (evaluation) and actually doing it (execution). In a large number of cases, what takes the most time is figuring out what to do, not actually doing it. For instance, choosing which event to fire out of the hundreds available in a yearly pulse takes longer than applying the event we decide upon.
Generally speaking, we can only execute one thing at a time (otherwise we get out of syncs; more on that later). We can, however, evaluate multiple things at the same time by using threads. So instead of having each character decide, one after another, which events to fire, we can consider the events of 8 or more characters simultaneously (depending on how many CPU threads the player has). Each character then adds the events they’re going to fire to a queue. This step has to be synchronized, since two threads can’t safely add to the same list at the same time, but the vast majority of the time is spent on evaluation rather than on adding to the list, so distributing the work still saves huge amounts of time. Later, we go through the queue and fire each event, discarding any that are no longer applicable for whatever reason (maybe a character involved died?) along the way.
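The evaluate-in-parallel, execute-in-sequence pattern might look roughly like this in Python. The character and event data are hypothetical. Sorting the queue by character id before executing is an assumption made here so the execution order doesn’t depend on which thread finished first; the post doesn’t say how CK3 orders its queue. In CPython the GIL limits the real speedup for pure-Python work, so read this for its structure, not its performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(characters: dict[int, set[str]], threads: int = 8) -> list[tuple[int, str]]:
    """Evaluation phase: read-only, spread across several threads."""
    pending = []             # invisible to the rest of the game until execution
    lock = threading.Lock()  # appending from several threads must be synchronized

    def evaluate(character_id: int) -> None:
        # The expensive part: deciding which events (if any) this character gets.
        traits = characters[character_id]
        chosen = ["ambition_event"] if "ambitious" in traits else []
        with lock:
            pending.extend((character_id, event) for event in chosen)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(evaluate, characters))  # list() re-raises any worker exception
    # Threads may append in any order; sort so every machine gets the same queue.
    return sorted(pending)


def execute_all(pending: list[tuple[int, str]], alive: set[int]) -> list[tuple[int, str]]:
    """Execution phase: single-threaded, in queue order."""
    fired = []
    for character_id, event in pending:
        if character_id not in alive:  # no longer applicable, e.g. the character died
            continue
        fired.append((character_id, event))  # stand-in for actually applying the event
    return fired
```

For example, with characters `{1: {"ambitious"}, 2: {"content"}, 3: {"ambitious", "greedy"}}`, evaluation queues events for characters 1 and 3. If character 3 has died by execution time (`alive={1, 2}`), only character 1’s event fires.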
This split between evaluation and execution is one of the cornerstones of how we do threading in CK3. The gamestate is split up into various “managers” that are each responsible for one part of the game. For example, there’s a Secrets Manager, an Event Manager, a Character Manager, and a Title Manager. Advancing the game by a single day is mainly split into two phases: the pre-update and the main update. In the pre-update, each manager does its own evaluations and makes notes of things to do later. No visible gamestate is allowed to change, so each manager can safely look at things it doesn’t manage (e.g., the Title Manager is allowed to look at the holder of a title, even though it doesn’t own characters). The only things a manager may change are those invisible to the rest of the game (like the event queue mentioned earlier).
[Time spent in the various pre-update managers]
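The manager structure can be sketched the same way. Everything here is hypothetical: two toy managers and a dict standing in for the gamestate. It illustrates the single rule. `pre_update` may read anything but writes only to the manager’s own private notes, so all pre-updates can run concurrently. `main_update` then applies those notes one manager at a time, in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor


class Manager:
    def pre_update(self, state: dict) -> None:
        """Must not modify visible state; may only fill this manager's private notes."""
        raise NotImplementedError

    def main_update(self, state: dict) -> None:
        """Applies the notes taken during pre_update; runs sequentially."""
        raise NotImplementedError


class CharacterManager(Manager):
    def __init__(self):
        self._to_age = []  # private note, invisible to other managers

    def pre_update(self, state):
        self._to_age = [c for c, birthday in state["birthdays"].items() if birthday == state["day"]]

    def main_update(self, state):
        for character in self._to_age:
            state["age"][character] += 1
        self._to_age = []


class TitleManager(Manager):
    def __init__(self):
        self._to_vacate = []

    def pre_update(self, state):
        # Reading character data is fine here, even though this manager doesn't own characters.
        self._to_vacate = [t for t, holder in state["holders"].items() if holder in state["dead"]]

    def main_update(self, state):
        for title in self._to_vacate:
            state["holders"][title] = None
        self._to_vacate = []


def advance_day(state: dict, managers: list) -> None:
    with ThreadPoolExecutor() as pool:
        # Pre-update: all managers evaluate concurrently; nothing visible changes.
        list(pool.map(lambda manager: manager.pre_update(state), managers))
    # Main update: one manager at a time, always in the same order.
    for manager in managers:
        manager.main_update(state)
    state["day"] += 1
```

Since no pre-update writes to shared state, the order in which the threads happen to run cannot affect the result.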
The split makes it very easy for us to thread things, as there’s only one rule to follow: don’t modify any visible state. Threading in our older games came with far more rules to obey (only look at your own data, don’t look at this thing, don’t modify this other thing, etc.), so it was easy for experienced and new programmers alike to make mistakes. With only a single rule, mistakes are harder to make, easier to catch, and easier to fix. As a result we’re more productive, and CK3 is our most threaded game to date.
The AI works very similarly. It runs after the main update rather than before it, but works on the same principle: the AI is not allowed to change anything except certain pieces of purely AI-internal state, and instead just sends commands. The AI is split up into a variety of sub-tasks, which are grouped together based on how frequently they run. For example, an individual AI will check whether it should change its laws and whether it should leave a faction in the same task, as these happen at the same frequency. Each such grouping of tasks can run simultaneously with any other grouping. This fine granularity makes the AI’s threading very effective (it has “good load-balancing”), as one thread is unlikely to finish its work significantly earlier than another and be left idling.
[The AI updating a set of tasks]
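Here is a rough sketch of grouping AI sub-tasks by frequency. The task functions, the `TASK_GROUPS` schedule and the state layout are all hypothetical. Tasks that share a frequency are bundled into one job per AI, and each job only reads the state and returns commands. All jobs due on a given day run concurrently. Since each job is small, threads tend to finish at similar times.

```python
from concurrent.futures import ThreadPoolExecutor


def consider_law_change(ai_id, state):
    return [("change_law", ai_id)] if state["unhappy_vassals"].get(ai_id, 0) > 3 else []


def consider_leaving_faction(ai_id, state):
    return [("leave_faction", ai_id)] if ai_id in state["in_faction"] else []


def consider_troop_movement(ai_id, state):
    return [("move_troops", ai_id)] if ai_id in state["at_war"] else []


# Hypothetical schedule: frequency in days -> tasks that run together as one job.
TASK_GROUPS = {
    30: [consider_law_change, consider_leaving_faction],
    1: [consider_troop_movement],
}


def run_group(ai_id, tasks, state):
    # Read-only with respect to the gamestate: tasks only return commands.
    commands = []
    for task in tasks:
        commands.extend(task(ai_id, state))
    return commands


def ai_update(day, ai_ids, state):
    jobs = [(ai_id, tasks) for frequency, tasks in TASK_GROUPS.items()
            if day % frequency == 0 for ai_id in ai_ids]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda job: run_group(job[0], job[1], state), jobs)
        # map() yields results in job order, so the command list is reproducible.
        return [command for batch in results for command in batch]
```

The returned commands would then be executed afterwards, just like commands from players.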
As mentioned earlier, the use of the command system means that the effects of the AI are nicely isolated from its decision-making process. This makes it easier to iterate upon, easier to reason about, and easier to optimize.
Now, let's move on to the final topic of today: out of syncs.
Out of Sync
If you play multiplayer in any of our games, you’re aware of a particularly dreaded set of words: “game is out of sync”. When this happens you’re unable to continue playing and, depending on the game, have to either rehost or resync. But what is an out of sync (OOS), beyond us programmers having a laugh at your expense?
[CK3 going out of sync]
To explain what an OOS is, I first need to explain how multiplayer itself works. In most games out there, the core of multiplayer is that the server (or a player’s machine acting as the server, if it is peer-to-peer) tells all the clients the state of everything in the game: where everything is, where it’s moving, how much health it has left, etc. Usually the only things left out are static ones (what the map looks like in many games, for instance). Competitive games often also leave out things the client would have no way of knowing (like the position of another player on the other side of the map) to combat wallhack cheats and the like.
This is generally a very sensible model, but it breaks down if there’s too much gamestate to send over the network several times a second. In a first-person shooter with 10, 20, maybe even 100 players, all this info can be stored in a few kB, but CK3, by comparison, usually has around 20 thousand characters, never mind everything else. The full gamestate of CK3 takes around 30 to 100 MB to store uncompressed, and even with compression you’ll easily pass 10-20 MB once you’re far enough in. Clearly, this is not something we can send over the network repeatedly.
So what do we do instead? We use an architecture known as “lockstep multiplayer”, which is common for strategy games. Instead of telling clients the state of everything (or a large subset of everything), we first provide them with the initial state (in the form of a save), and then each client runs its own simulation. We send commands for player and AI interactions; everything else each client calculates on its own. As a result, far less info is sent over the network, since we only need to inform the clients of things that deviate from the natural flow of the simulation.
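Here is a toy model of lockstep, with obvious simplifications: a dict-based state, one hypothetical command type and a deterministic daily tick. Every client starts from the same save and applies the same commands on the same days, so their states stay identical even though the state itself is never sent. The `checksum` helper is only there to compare results compactly. How CK3 actually detects divergence isn’t covered in this post.

```python
import copy
import hashlib
import json


def tick(state: dict) -> None:
    # Deterministic simulation step: every holding earns 2 gold per day.
    for character, holdings in state["holdings"].items():
        state["gold"][character] += 2 * holdings


def apply_command(state: dict, command: dict) -> None:
    if command["type"] == "build_holding" and state["gold"][command["sender"]] >= 10:
        state["gold"][command["sender"]] -= 10
        state["holdings"][command["sender"]] += 1


def run_client(save: dict, commands_by_day: dict, days: int) -> dict:
    state = copy.deepcopy(save)  # every client loads the same initial save
    for day in range(days):
        for command in commands_by_day.get(day, []):  # same commands, same order, same day
            apply_command(state, command)
        tick(state)  # everything else is computed locally
    return state


def checksum(state: dict) -> str:
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()
```

Running two clients from the same save with the same command list gives identical states. A client that never receives the command diverges, and from then on the difference only grows.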
But here’s the problem: this means we have to ensure every single client simulates the game in exactly the same way. If anything differs, no matter how small, that tiny change will eventually Butterfly Effect its way into drastic differences between what’s happening on each machine. So while one player just had war declared on them by some Vikings, on another client that war might not be happening at all.
When anything differs, that’s an out of sync. At this point, major breakage is inevitable, and so we tell the players and force a rehost. This isn’t a great experience for anyone, so it is something we work hard on avoiding.
So how do out of syncs happen in the first place? It generally comes down to a lack of determinism. Determinism is when the same input always leads to the same result. As long as that’s the case, out of syncs are impossible (except if some input is lost or corrupted due to, say, network issues). But determinism isn’t easy.
It is simple enough if your game is single-threaded, but then it’ll also be slow. Any threading can introduce non-deterministic behavior if you’re not careful. The most common cause is ordering. Let’s say you’ve got the number X, with a value of 10. Thread A wants to add 2 to it. Thread B wants to multiply it by 2. If Thread A happens to run first, the end result will be (10 + 2) * 2 = 24. But if Thread B runs first, it will be (10 * 2) + 2 = 22. So if for any reason threads run in a different order on two machines (maybe one CPU core was busy with something else for a split second), an out of sync will occur.
This is a big reason why we usually only multi-thread evaluation. If nothing is changed, then order doesn’t matter. We sometimes thread things that change visible state too, but that’s much rarer and we’re far more careful to ensure that ordering doesn’t matter.
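The X example can be checked directly. The sketch below applies the same two updates to X in both orders. It then shows why read-only evaluation avoids the problem: if both “threads” only compute a result from X and write nothing back, the results are the same whichever runs first. The thread order is simulated explicitly here so the outcome is reproducible.

```python
def add_two(x: int) -> int:
    return x + 2


def double(x: int) -> int:
    return x * 2


def run_in_order(initial: int, operations) -> int:
    """Apply mutating operations to one shared value, in the given order."""
    x = initial
    for operation in operations:
        x = operation(x)
    return x


def evaluate_in_order(initial: int, operations) -> dict:
    """Read-only evaluation: each operation sees the same input; nothing is written back."""
    return {operation.__name__: operation(initial) for operation in operations}


print(run_in_order(10, [add_two, double]))  # 24: Thread A ran first
print(run_in_order(10, [double, add_two]))  # 22: Thread B ran first
print(evaluate_in_order(10, [add_two, double]) == evaluate_in_order(10, [double, add_two]))  # True
```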
Another cause of out of syncs, far more common in our older games, was the interface influencing the gamestate in some manner. To take a simple example, imagine we have some value we only rarely update because updating it is really time-consuming. But when the player looks at it, we want it to be fully up to date. It might be tempting to force an update when the player opens the interface, but oops… now you’ve introduced an out of sync, since only the machine of the player who opened it performs that update.
The way we’ve structured CK3 makes it far more difficult to make this mistake, as it’s much harder to modify the gamestate from the interface. We’d instead send a command to refresh the value, and/or maybe do the actual math for the new value just in the interface and leave the gamestate untouched.
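Here is a sketch of the two options from the paragraph above, with hypothetical names. The commented-out line is the tempting mistake: writing a refreshed value into the gamestate from UI code, which would only happen on the machine whose player opened the window. The alternatives are to compute the fresh value in the interface for display only, or to send a command so every client performs the refresh at the same point in the simulation.

```python
def compute_income(holdings: dict, character_id: int) -> int:
    # Stand-in for an expensive calculation that the simulation only runs rarely.
    return sum(holdings.get(character_id, []))


class IncomeWindow:
    """Hypothetical UI window showing a character's income."""

    def __init__(self, gamestate: dict, send_command):
        self.gamestate = gamestate
        self.send_command = send_command

    def open(self, character_id: int) -> int:
        # WRONG: this would change the gamestate on this machine only -> out of sync.
        # self.gamestate["cached_income"][character_id] = compute_income(...)

        # Option 1: compute a fresh value for display only; the gamestate is untouched.
        return compute_income(self.gamestate["holdings"], character_id)

    def request_refresh(self, character_id: int) -> None:
        # Option 2: request the refresh via a command, which every client executes.
        self.send_command(("refresh_income", character_id))
```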
Similarly, it’s easy to introduce issues when bits of game logic depend on whether a character is the local player. For example, we want to update the player’s predicted income daily rather than monthly to ensure the player’s info is up to date. The naive implementation would mean that on each client, that client’s own character gets updated daily while the other players get updated monthly. The game would thus be out of sync, as the player characters would have different cached incomes on different machines.
In CK3 we avoid this by checking whether the character is any player, rather than whether it’s the person playing on this machine. Furthermore, we’ve made it deliberately harder to check “is this the local player” than to just check “is this any player”. We still need the former quite a bit (primarily for sending notifications), but it involves the programmer basically going “yes, I’m sure I know what I’m doing here”:
[A notification being sent to the local player]
Note the “ALLOW_GET_LOCAL_PLAYER_IN_SCOPE” here; that’s our way of making sure we only check who the local player is if we really need to. Otherwise, we’d easily end up with something only getting changed on a player character for the client actually playing that character.
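Python has no preprocessor, so the sketch below approximates the idea behind `ALLOW_GET_LOCAL_PLAYER_IN_SCOPE` with a hypothetical context manager. This is an analogy, not how the CK3 macro works internally. It also covers the income fix: game logic asks “is this character any player?”, which gives the same answer on every client. The local player can only be looked up inside an explicit opt-in block, as you would for a notification.

```python
import threading
from contextlib import contextmanager

_scope = threading.local()  # per-thread flag: is local-player access allowed right now?


class Session:
    def __init__(self, player_characters: set, local_character: int):
        self.player_characters = player_characters  # identical on every client
        self._local_character = local_character      # differs from machine to machine

    def is_player(self, character_id: int) -> bool:
        # Safe for game logic: same answer on every client.
        return character_id in self.player_characters

    def local_player(self) -> int:
        # Deliberately awkward: only allowed inside allow_local_player_in_scope().
        if not getattr(_scope, "allowed", False):
            raise RuntimeError("local player requested outside an allowed scope")
        return self._local_character


@contextmanager
def allow_local_player_in_scope():
    previous = getattr(_scope, "allowed", False)
    _scope.allowed = True
    try:
        yield
    finally:
        _scope.allowed = previous  # restore, so nested scopes behave correctly


def income_update_interval(session: Session, character_id: int) -> int:
    # Daily for every player character, monthly for everyone else, on all clients alike.
    return 1 if session.is_player(character_id) else 30
```

Because `income_update_interval` never consults the local player, the host and a guest compute the same schedule for every character.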
So that’s the long and short of what out of syncs are, why they happen, and some of what we do to avoid them.
And with that, I’m done. I hope you found this post about how our gamestate works interesting!
I am on vacation today but Matthew (@blackninja9939) will be here to answer any of your questions about this topic as well! And I may check in too!