Victoria 3 - Dev Diary #76 - Performance

Hello and welcome to this week's Victoria 3 dev diary. This time we will be talking a bit about performance and how the game works under the hood. It will get somewhat detailed along the way and if you are mostly interested in what has improved in 1.2 then you can find that towards the end.

For those of you who don’t know me, my name is Emil and I’ve been at Paradox since 2018. I joined the Victoria 3 team as Tech Lead back in 2020 having previously been working in the same role on other projects.

What is performance?

It’s hard to talk about performance without first having some understanding of what we mean by it. For many games it is mostly about how high an fps you can get without having to turn the graphics settings down too far. But with simulation-heavy games like the ones we make at PDS another aspect comes into play, namely tick speed. This metric is not as consistently named across the games industry as fps is, but you might be familiar with the names Ticks Per Second or Updates Per Second from some other games. Here I will instead be using the inverse metric: how long a tick takes on average to complete, in either seconds or milliseconds. Some graphs will be from debug builds and some from release builds, so numbers might not always be directly comparable.

What exactly a tick means in terms of in-game time varies a bit. In CK3 and EU4 a tick is a single day, while in HOI4 it's just one hour. For Victoria 3 a tick is six hours, or a quarter of a day. Not all ticks are equal though. Some work might not need to happen as often as other work, so we divide the ticks into categories. In Victoria 3 we have yearly, monthly, weekly, daily, and (regular) ticks.
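As a rough illustration of how the tick categories could nest, here is a minimal Python sketch. Only the six-hour tick length and the category names come from the diary. Using a real calendar, running daily work on the midnight tick, and rolling weeks over on Monday are assumptions made for the example, and `tick_categories` is a hypothetical helper, not something from the game code.

```python
from datetime import datetime, timedelta

TICK_LENGTH = timedelta(hours=6)  # from the diary: one tick is a quarter of a day


def tick_categories(moment: datetime) -> list[str]:
    """Return the categories that would fire on the tick starting at `moment`."""
    categories = ["tick"]                # every tick runs the regular tasks
    if moment.hour == 0:                 # assumption: daily work runs on the midnight tick
        categories.append("daily")
        if moment.weekday() == 0:        # assumption: weeks roll over on Monday
            categories.append("weekly")
        if moment.day == 1:              # assumption: months roll over on the 1st
            categories.append("monthly")
            if moment.month == 1:
                categories.append("yearly")
    return categories


# The four ticks of a single in-game day: only the first one is also a daily tick.
start = datetime(1836, 1, 1)
first_day = [tick_categories(start + i * TICK_LENGTH) for i in range(4)]
```

The point of the sketch is just that a heavier category always arrives on top of the lighter ones, so the heaviest ticks are the ones where several categories coincide.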

If you thought 1.1 was slow you should have seen the game a year before release…
DD1.png

Content of a tick

Victoria 3 is very simulation driven and as such there is a lot of work that needs to happen in the tick. To keep the code organized we have our tick broken down into what we call tick tasks. A tick task is a distinct set of operations to perform on the gamestate along with information on how often it should happen and what other tick tasks it depends on before it is allowed to run.
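To make the idea of a tick task a bit more concrete, here is a small Python sketch of tasks that declare a frequency and their dependencies, with a run order derived from them. The `TickTask` class, the `run_order` helper and the dependencies in the usage example are all invented for illustration. Only the task names are taken from later in this diary, and the real engine is not written in Python.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class TickTask:
    name: str
    frequency: str                      # "tick", "daily", "weekly", ...
    depends_on: tuple[str, ...] = ()    # tasks that must finish first


def run_order(tasks: list[TickTask], categories: set[str]) -> list[str]:
    """Order the tasks due on this tick so dependencies always run first."""
    due = {task.name: task for task in tasks if task.frequency in categories}
    # Only keep dependencies on tasks that are actually due on this tick.
    graph = {
        name: {dep for dep in task.depends_on if dep in due}
        for name, task in due.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical dependency chain, purely for the example.
tasks = [
    TickTask("RecalculateModifierNodes", "tick"),
    TickTask("UpdateEmployment", "weekly", ("RecalculateModifierNodes",)),
    TickTask("WeeklyCountryBudgetUpdateParallel", "weekly", ("UpdateEmployment",)),
]
weekly_order = run_order(tasks, {"tick", "daily", "weekly"})
regular_order = run_order(tasks, {"tick"})
```

On a regular tick only the first task is due, while on a weekly tick all three run in dependency order.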

An overview of some of the tick tasks in the game. Available with the console command TickTask.Graph.
DD2.png

Many of the tick tasks are small things just updating one or a few values. On the other hand some of them are quite massive. Depending on how often they run and what game objects they operate on their impact on the game speed will vary. One of the most expensive things in the game is the employment update, followed by the pop need cache update and the modifier update.

Top ten most expensive tick tasks in our nightly tests as of Feb 15. Numbers in seconds averaged from multiple runs using a debug build.
DD3.png

As you can see from the graph above many of our most expensive tick tasks are run on a weekly basis. This combined with the fact that a weekly tick also includes all daily and tickly tick tasks means it usually ends up taking quite long. So let’s dive a bit deeper into what’s going on during a weekly tick. To do this we can use a profiler. One of the profilers we use here at PDS is Optick which is an open source profiler targeted mainly at game development.

Optick capture of a weekly tick around 1890 in a release build.
DD4.png

There’s a lot going on in the screenshot above so let’s break it down a bit. On the left you see the name of the threads we are looking at. First you have the Network/Session thread which is the main thread for the game logic. It’s responsible for running the simulation and acting on player commands. Then we have the primary task threads. The number will vary from machine to machine as the engine will create a different number of task threads depending on how many cores your cpu has. Here I have artificially limited it to eight to make things more readable. Task threads are responsible for doing work that can be parallelized. Then we have the Main Thread. This is the initial thread created by the operating system when the game starts and it is responsible for handling the interface and graphics updates. Then we have the Render Thread which does the actual rendering, and finally we have the secondary task threads. These are similar to the primary ones, but are generally responsible for non game logic things like helping out with the graphics update or with saving the game.

All the colored boxes with text in them are different parts of the code that we’ve deemed interesting enough to have them show up in the profiler. If we want an even more in-depth look we could instead use a different profiler like Superluminal or VTune, which would allow us to look directly at function level or even assembly.

The pink bars indicate a thread is waiting for something. For the task threads this usually means they are waiting for more work, while for the session thread it usually means it is blocked from modifying the game state because the interface or graphics updates need to read from it.

When looking at tick speed we are mostly interested in the session thread and the primary task threads. I’ve expanded the session thread here so we can see what is going on in the weekly tick. There are some things that stand out here.

First we have the commonly occurring red CScopedGameStateRelease blocks. These are when we need to take a break from updating to let the interface and graphics read the data they need in order to keep rendering at as close to 60 fps as possible. This can’t happen just anywhere though; it’s limited to in between tick tasks or between certain steps inside the tick tasks. This is in order to guarantee data consistency, so the interface doesn’t fetch data when, say, just half the country budget has been updated.
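The general pattern here, a lock that is handed over only at safe points, can be sketched in a few lines of Python. This is an assumption-laden illustration and not how CScopedGameStateRelease is actually implemented: `scoped_game_state_release`, `run_tick` and the `while_released` callback (a stand-in for the interface and graphics reading the state) are all hypothetical names.

```python
import threading
from contextlib import contextmanager


@contextmanager
def scoped_game_state_release(lock):
    """Give up the game state lock for the duration of the with-block."""
    lock.release()
    try:
        yield
    finally:
        lock.acquire()  # would block here until readers are done with the state


def run_tick(tick_tasks, lock, while_released):
    with lock:                                   # the session thread owns the state
        for task in tick_tasks:
            task()                               # state may be half-updated in here
            with scoped_game_state_release(lock):
                while_released()                 # stand-in for the UI/graphics read
```

Because the release only happens between tasks, a reader never sees the state in the middle of a task, which is the consistency guarantee described above.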

The next thing that stands out is again the UpdateEmployment tick task just as seen in the graph above. Here we get a bit more information though. Just at a glance we can see it’s split into (at least) two parts. One parallel and one serial. Ideally we want all work to be done in parallel because that allows us to better utilize modern cpus. Unfortunately not all of the things going on during employment can be done in parallel because it needs to do global operations like creating and destroying pop objects and executing script. So we’ve broken out as much as possible into a parallel pre-step to reduce the serial part as much as possible. There is actually a third step in between here that can’t be seen because it’s too quick, but in order to avoid issues with parallel execution order causing out of syncs between game clients in multiplayer games we have a sorting step in between.
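The three-step shape described here (parallel pre-step, sort, serial step) can be sketched as follows. This is a toy Python example under stated assumptions: the per-state data, `plan_state` and `update_employment` are made up, and "hiring" is reduced to taking the smaller of two numbers. It is only meant to show why the sort makes the serial part deterministic.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def plan_state(state_id, open_jobs, unemployed):
    """Parallel pre-step: only reads data and returns a plan, never mutates."""
    return state_id, min(open_jobs, unemployed)


def update_employment(states, workers=4):
    # 1. Parallel step: results arrive in whatever order the threads finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(plan_state, state_id, jobs, idle)
            for state_id, (jobs, idle) in states.items()
        ]
        plans = [future.result() for future in as_completed(futures)]

    # 2. Sorting step: make the order independent of thread timing.
    plans.sort()

    # 3. Serial step: apply the plans one by one in the now fixed order.
    applied = []
    for state_id, hires in plans:
        applied.append((state_id, hires))
    return applied


result = update_employment({"b": (3, 5), "a": (10, 2), "c": (0, 7)})
```

Without the sort, two machines could apply the same plans in a different order, which is exactly the kind of thing that leads to out of syncs in multiplayer.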

Closer look at the UpdateEmployment tick task.
DD5.png

Modifiers are slow

One concept that’s common throughout PDS games is modifiers and Victoria 3 is no exception. Quite the opposite. Compared to CK3 our modifier setup is about an order of magnitude more complex. In order to manage this we use a system similar to Stellaris which we call modifier nodes. In essence it’s a dependency management system that allows us to flag modifiers as dirty and only recalculate it and the other modifiers that depend on it. This is quite beneficial as recalculating a modifier is somewhat expensive.
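The dirty flagging part can be illustrated with a short Python sketch. It assumes the dependencies are stored as a mapping from a node to the nodes that depend on it. The modifier names and the `dirty_closure` helper are hypothetical and not taken from the game.

```python
def dirty_closure(dependents, changed):
    """Return every node that must be recalculated when `changed` is flagged dirty."""
    dirty = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        if node in dirty:
            continue                              # already flagged, skip its subtree
        dirty.add(node)
        stack.extend(dependents.get(node, ()))    # everything downstream is dirty too
    return dirty


# Hypothetical modifier nodes: node -> nodes that depend on it.
dependents = {
    "tax_rate": ["income"],
    "income": ["budget"],
    "tariffs": ["budget"],
    "budget": [],
}
to_recalculate = dirty_closure(dependents, "tax_rate")
```

In this example changing `tax_rate` flags `income` and `budget` but leaves `tariffs` untouched, so that node is not recalculated.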

However, this system used to be very single threaded which meant a large part of our tick was still spent updating modifiers. If you look at the graph at the top of this dev diary you can see that performance improved quite rapidly during early 2022. One of the main contributors to this was the parallelization of the modifier node calculations. Since we know which nodes depend on which we can make sure to divide the nodes into batches where each batch only depends on previous batches.
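One common way to get batches with this property is to peel the dependency graph layer by layer. The Python sketch below shows that idea under the assumption that dependencies form a directed acyclic graph. It is a generic technique, not necessarily the exact batching used in the engine, and the node names are placeholders.

```python
def batches(dependencies):
    """Split nodes into batches where each batch only depends on earlier batches.

    `dependencies` maps every node to the set of nodes it depends on.
    """
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    result = []
    while remaining:
        # Nodes with no unresolved dependencies can all be computed in parallel.
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        result.append(ready)
        for node in ready:
            del remaining[node]
        for deps in remaining.values():
            deps.difference_update(ready)
    return result


layers = batches({
    "a": set(),
    "b": set(),
    "c": {"a"},
    "d": {"a", "b"},
    "e": {"c", "d"},
})
```

Everything inside one batch is independent of everything else in that batch, so the nodes of a batch can be handed out to the task threads freely.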

Closer look at the RecalculateModifierNodes tick task.
DD6.png

Countries come in all sizes

A lot of the work going on in a tick needs to be done for every country in the world. But with the massive difference in scale between a small country like Luxembourg and a large one like Russia, some operations can take more than a hundred times as long for one country compared to another. When you do things serially this doesn’t really matter because all the work needs to happen and it makes no difference which one you do first. But when we start parallelizing things we can run into an issue where too many of the larger countries end up on the same thread. This means that after all the other threads are done with their work we still have to wait for this last thread to finish. In order to get around this we came up with a system where tick tasks can specify a heuristic cost for each part of the update. This then allows us to identify parts that stand out by checking the standard deviation of the expected computation time and schedule them separately.
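As a sketch of what cost-based scheduling could look like, the Python example below flags jobs whose heuristic cost is more than one standard deviation above the mean and then hands jobs out to the least loaded thread, most expensive first. The cost numbers, the one standard deviation cutoff and the greedy assignment are assumptions chosen for the example. The diary does not say how the real scheduler decides.

```python
import heapq
from statistics import mean, pstdev


def find_expensive(costs, threshold=1.0):
    """Jobs whose cost is more than `threshold` standard deviations above the mean."""
    cutoff = mean(costs.values()) + threshold * pstdev(costs.values())
    return {name for name, cost in costs.items() if cost > cutoff}


def schedule(costs, thread_count, threshold=1.0):
    expensive = find_expensive(costs, threshold)
    # Expensive jobs first, then the rest, each group from largest to smallest.
    order = sorted(costs, key=lambda name: (name not in expensive, -costs[name], name))
    loads = [(0, index) for index in range(thread_count)]   # (total cost, thread)
    assignment = [[] for _ in range(thread_count)]
    for name in order:
        load, index = heapq.heappop(loads)                  # least loaded thread
        assignment[index].append(name)
        heapq.heappush(loads, (load + costs[name], index))
    return assignment


# Made-up heuristic costs, not measurements from the game.
costs = {"China": 120, "Great Britain": 110, "Russia": 100, "Sweden": 6,
         "Belgium": 5, "Nepal": 2, "Hawaii": 1, "Luxembourg": 1}
expensive = find_expensive(costs)
plan = schedule(costs, thread_count=3)
```

With three threads the three big countries each land on their own thread and the small ones fill in around them, so no thread ends up with two of the expensive jobs.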

One place where this makes a large difference is the country budget update. Not having say China, Russia, and Great Britain all update on the same thread significantly reduces the time needed for the budget update.

(And this is also why the game runs slower during your world conquest playthroughs!)

Closer look at the WeeklyCountryBudgetUpdateParallel tick task. Note the Expensive vs Affordable jobs.
DD7.png

Improvements in 1.2

I’m going to guess that this is the part most of you are interested in. There have been many improvements both large and small.

If you’ve paid attention to the open beta so far you might have noticed some interface changes relating to the construction queue. With how many people play the game the queue can end up quite large. Unfortunately the old interface here was using a widget type that needs to compute the size of all its elements to properly lay them out, including the elements not visible on screen.

New construction queue interface.
DD8.png

To compound this issue even further the queued constructions had a lot of dependencies on each other in order to compute things like time until completion and similar. This too has been addressed and should be available in today’s beta build.

Side by side comparison of old vs new construction queue.
DD9.gif

DD10.gif

One big improvement to tick speed is a consequence of changes we’ve made to our graphics update. Later in the game updating the map could sometimes end up taking a lot of time, which in turn led to the game logic having to wait a lot for the graphics update. There have been both engine improvements and changes to our game-side code here to reduce the time needed for the graphics update. Some things here include improving the threading of the map name update, optimizing the air entity update, and reducing the work needed to find out where buildings should show up in the city graphics.

Graphics update before/after optimization.
DD11.png

As we talked about above, the employment update has a significant impact on performance. This is very strongly correlated with the number of pops in the game, as in the number of pop objects, not the total population. Especially in the late game you could end up with large amounts of tiny pops, which would make the employment update extremely slow. To alleviate this, design has tweaked how aggressively the game merges small pops, which should improve late game performance. For modders this can be changed with the POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION defines.

Another improvement we’ve made for 1.2 is replacing how we do memory allocation in Clausewitz. While we’ve always had dedicated allocators for special cases (pool allocators, game object “databases”, etc) there were still a lot of allocations ending up with the default allocator, which just deferred to the operating system. And especially on Windows this can be slow. To solve this we now make use of a library called mimalloc. It’s a very performant memory allocator library and basically a drop-in replacement for the functionality provided by the operating system. It’s already being used by other large engines such as Unreal Engine. While not as significant as the two things above, it did make the game run around 4% faster when measured over a year about two thirds into the timeline. And since it’s an engine improvement you can likely see it in CK3 as well some time in the future.

In addition to these larger changes there’s also been many small improvements that together add up to a lot. All in all the game should be noticeably faster in 1.2 compared to 1.1 as you can see in the graph below. Unfortunately the 1.1 overnight tests weren’t as stable as 1.2 so for the sake of clarity I cut the graph off at 1871, but in general the performance improvements in 1.2 are even more noticeable in the late game.

Year by year comparison of tick times between 1.1 and 1.2 with 1.2 being much faster. Numbers are yearly averages from multiple nightly tests over several weeks using debug builds.
DD12.png

That’s all from me for this week. Next week Nik will present the various improvements done to warfare mechanics in 1.2, including the new Strategic Objectives feature.
This comment is reserved by the Community Team for gathering Dev Responses in, for ease of reading.


Oglesby said:


I am curious, do the daily tasks all occur on the same tick or are you able to spread those over the 4 ticks to further load balance?

It sounds like the daily and weekly ticks occur on the same tick, would it be possible to at least shift those greater than tickly to occur on different ticks? So it would at most ever be tickly + something and not possibly be from tickly to yearly in one tick.
The tasks that are marked as daily all occur on the same tick, but we do something similar to what you describe but in a slightly different way. Some of our tickly tasks are actually daily tasks in disguise. What we do is we spread out the objects to update over several ticks. So say we do something with pops, one quarter of them would be processed in the morning, one quarter during midday, and so on.
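The spreading described in this answer could look roughly like the Python sketch below. The four slices match the four ticks in a day. Picking objects by their index modulo the number of slices is an assumption for the example, and `slice_for_tick` is a hypothetical helper.

```python
def slice_for_tick(objects, tick, slices=4):
    """Return the share of `objects` that should be processed on this tick."""
    return [obj for index, obj in enumerate(objects) if index % slices == tick % slices]


pops = list(range(10))                                   # stand-ins for pop objects
morning = slice_for_tick(pops, tick=0)                   # first quarter of the day
whole_day = [slice_for_tick(pops, tick) for tick in range(4)]
```

Over four consecutive ticks every object is processed exactly once, so the work is effectively daily even though the task itself runs every tick.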


Olaf Trygvasson said:


So I am not super into coding, so maybe this isn’t viable. But you mention how the weekly ticks take longer because there are far more things to calculate. Would it be possible to make many of these things biweekly, calculating on alternate weeks, to lighten the load on each individual weekly tick? For example, say X, Y, and Z are all weekly checks, you keep X as weekly and make Y and Z biweekly on alternating weeks, so that week 1 would check X and Y, week 2 would check X and Z, week 3 would check X and Y, etc. Maybe this would require a ton of work on your part and yield negligible improvements, and wouldn’t be worth it, I don’t know
From a technical perspective this would be possible. But it would change the game mechanics so any such change would likely require a lot of consideration from our designers before doing it. As mentioned above though, we do something similar for some updates.



Crashes said:


If employment is such an expensive calculation, would it make sense to shift employment (and perhaps pop need as well) to a monthly calculation? Most employment related factors don't shift that often on a weekly basis. You lose granularity but the improvement in performance would be noticeable from cutting the number of times it needs to run by 4.
This would be a radical change to the game mechanics. I'm not sure our game designer would appreciate me doing this change as it would affect the balance of basically every part of the game :)

It would also rather counter-intuitively risk making performance worse. The employment update is one of the main places where pops get removed from the game. So doing it less frequently could mean we end up with more pops hanging around for longer, leading to other parts of the update potentially taking longer.




FaIconere said:


Oh great mr.dev, do you perhaps have a fancy graph that tracks the amount of pops over time? I have a general sense that the amount of pops grow but no idea of the extent
I don't have a graph right now, but in general the game tends to start out at between 20k and 25k pops and usually end up somewhere around 100k pops towards the end in our automated tests. Since this is in observer mode the numbers will of course be somewhat different with a human player.

UBERWASP said:


Even though the gpu does gpu things, gpu things require cpu oversight, so to speak. So how much do graphical effects such as particles and the “building profiting” or “sol decreasing” effects we see popping in and out on top of cities impact cpu load? Do they make a tangible impact on the cpu, and if so can we get a graphics setting to turn them off?

I’ve never been a big fan of some of those popping fx anyway :D
The vfx (currently) don't affect the cpu side that much, they mostly put strain on the gpu. The most cpu-intensive things for the graphics update are changes to the dynamic terrain and to the city layouts in reaction to what happens in the game. Both of these are somewhat expensive and we do some things both to parallelize them and to limit how much is allowed to update visually in a single frame. You can notice this for example when starting a new game. If you are very quick to zoom in you'll see cities updating for a short while.

Crashes said:


If employment is such an expensive calculation, would it make sense to shift employment (and perhaps pop need as well) to a monthly calculation? Most employment related factors don't shift that often on a weekly basis. You lose granularity but the improvement in performance would be noticeable from cutting the number of times it needs to run by 4.
In addition to @egladil's answer I'll say I'm not generally a fan of the solution of putting heavier tasks on a less frequent update cycle. It effectively "hides" the performance problem by just making an annoying freeze happen more rarely. The trick is to get the execution time of the most frequent tick interval down low enough that we can seamlessly update frames in between every tick. So the more common strategy (like Emil also points out in another answer) is to increase the frequency of the updates but process fewer entities on each update.

For example, pops only process their growth once per month, but since this is a pretty heavy calculation that can also involve a lot of pop split/merge operations, we spread it out such that we only process 1/120th of all pops each tick. I'd love to do the same for the employment update, but it wouldn't be possible to break it up by pop since every possible pop in a state could potentially take any job offered in that same state, and we have to evaluate who would be eligible for every job in some priority sequence. We could potentially split it up by state, but if state employment updates on a different schedule than the market then potentially changes in the economy might not have been done in time to properly inform employment, etcetera. So it's possible to make further optimizations there, but it's complex and potentially bug-prone, and as much a design challenge as a technical challenge.


Oglesby said:


Could employment be sliced by nations or markets and then spread over multiple ticks (assuming this isn't already done to spread out the weight of the task)?
The issue would be syncing it up with the economy update, and since that needs* to happen on the same tick everywhere in the world in order to get trade to work properly, it's tricky.

* "needs" is questionable here - it's more a matter of it being safer to do it at the same time, to pre-empt bugs and side effects. We could absolutely look into spreading out employment updates on a different schedule and account for any side effects that may arise from it, but it's a task of unknown complexity at the moment.


MathyM said:


I’ll once again ask you to repost these as new threads so they properly appear on the news feed at the top of the forum. Changing visibility/moving to a different forum section makes it too easy to miss the updates for Vic 3 in particular.
We're aware and looking into it! There's some workflow kinks we have to sort out first.


Nikolai said:


I can confirm the public beta has seen some changes for the better. As I have mentioned before, I have a good PC so I didn't get affected as badly as many. But the effect was still there. In my current Belgium game (up to the 1920s now) the performance is only slightly down, and the problem is not tick speed. At all. Sometimes the UI is slow for some reason, but that is not consistent. :)
There can still be individual UI panels or such that need to be tweaked and we do this as we go along, but generally tickspeed is favoured :)


Chief of Staff said:


You know, this is funny, because, four months ago, I suggested adding statistics showing global population (along with global GDP) for Victoria 3. It would probably not be helpful, though, because such statistics does not tell you how many pops as a number of distinct groups/units. Still, it would be pretty cool to see the graph accompanying the number for global population to see whether it doubles, triples, or even quadruples. :p

Now that I think about it, I somehow attained a population of like 100 million or even more for Great Britain in a pre-1.1 game (played through to the end by 1936) even though, in real life, it only has 60 (almost 70) million today. :oops: Just to be clear, I came up with that number for the isles, excluding all other states outside the metropolis so to avoid these states skewing the number.

EDIT: By the way, population given for Great Britain in 1931-1951 (within which the end-game year is in) is over 46 million, but I should also note that the comparison is somewhat complicated because all the states in what is now the Republic of Ireland is still part of Great Britain in my game (mainly because I am too soft to actively oppress them. ;)).

We do log this for all our automated tests. But since it's after office hours in Sweden I'm not logged in to my work machine and so I can't produce a graph of it right this moment. I'll see if I can dig one out tomorrow :)


Aloraand said:


Two questions,

1. Could we have an option to turn off some visual parts of the game, like the queue, that hinder performance?
2. What are some parts of the game that had to be turned down due to the physical constraints on performance? For instance, connected subtrees of the market disconnected from the main node seem to be doable in O(f(n) α(n)), where α is the inverse Ackermann function (using the union-find data structure) and f(n) is the cost function of the market calculation. To my theory-oriented brain this seems feasible, but I suspect there will be practical implications.
1. If there are UI elements severely lowering framerate (like construction queue or specific lenses) we need to simply fix them.
2. Anything pathfinding is dodgy, an old iteration of Markets had each state trace market/infrastructure connections back to its market capital. Was dropped primarily for other reasons but would've been iffy for performance too.
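For readers unfamiliar with the union-find structure mentioned in the question, here is a minimal Python sketch with path compression and union by rank. It only illustrates the data structure itself. The state names are placeholders, and nothing here reflects how markets are actually computed in the game.

```python
class UnionFind:
    """Disjoint sets with path compression and union by rank."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        self.rank.setdefault(item, 0)
        root = item
        while self.parent[root] != root:          # walk up to the root
            root = self.parent[root]
        while self.parent[item] != root:          # compress the path behind us
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a       # attach the shallower tree
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1

    def connected(self, a, b):
        return self.find(a) == self.find(b)


# Placeholder states joined by hypothetical infrastructure links.
links = UnionFind()
links.union("capital", "state_a")
links.union("state_a", "state_b")
links.union("island_1", "island_2")
```

After these unions `state_b` is in the same set as `capital`, while the two islands form their own separate set.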


Fawr said:


This was a good read.

I'm seeing some good performance numbers in that last graph. However, that graph near the end suggests performance is still slowing down considerably during the game. Visually it looks like 1870s performance in 1.2 is like 1855 performance in 1.1.

To me that suggests now is not the time to add extra goods, extra countries, extra cultures, extra religions, or go easy on pop merging. Is that your view too? Do you do any metrics on what performance impact would come with design changes like that before they are done? Or is performance a more reactive role?


I'm interested in this too, but for the map graphics (eg urban centers visually growing). For me having an option to turn this off would be a quick performance win as I don't care about that kind of thing. I've seen mods which turn the actual drawing parts off, but I assume they can't change the way the calculations needed beforehand are done.

Of course, if those kinds of calculations are now put in a quiet thread and done gradually (say 1/360th per tick, so they never end up on the critical path) then it's not important anymore.

We have to consider new features from a performance perspective, yes. If the initial design would be problematic we can offer a compromise code solution that achieves virtually the same thing but at much lower costs. Good dialogue between design and programmers is key!
We can't micromanage everything ahead of time though so we still need to do performance passes after feature implementations etc.

1.2 should already have some hefty improvements to map graphics performance but there are more things we could do. There are several stakeholders involved though, Artists have to agree to any changes so we don't kneecap the visuals completely for the sake of performance.


EwaldvonKleist said:


As a member of the <10 Standard of Living interest group which can only buy standard computers, I appreciate the work on performance very much.

Question: Is the highest tick speed still limited or can it tick as quickly as the computer finishes its calculations?
Speed 5 is unlimited/as fast as your computer goes


miov01 said:


Setting POP_MERGE_MAX_WORKFORCE to 30000 instead of 30 and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION to 1 instead of 4 makes the game run quite a lot faster from my tests.

1.2.2 takes 6:30 minutes to reach 1838 in observation mode.

These settings bring it down to 5:43 minutes. And setting automatic saves to half a year instead of monthly makes it go down further to 5:26.

That is an improvement of 16.5% overall. Pretty good in my books.

Though setting the first define to 300k instead of 30k seems to reverse speed gains.

Think I'll post this as a mod later.

Doing this will likely tank endgame performance as that is where the pop numbers become a real issue.


Notme1 said:


Fun question: Why after deleting all pops, buildings and nations from map game won't go faster than 100 weeks per second? ;^)
100 weeks * 7 days/week * 4 ticks/day, that's 2800 ticks per second. I think that should be good enough ;)
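Spelled out as a quick calculation, and converted to the time-per-tick metric used in the diary:

```python
WEEKS_PER_SECOND = 100   # figure quoted in the question
DAYS_PER_WEEK = 7
TICKS_PER_DAY = 4        # one tick is six hours

ticks_per_second = WEEKS_PER_SECOND * DAYS_PER_WEEK * TICKS_PER_DAY
milliseconds_per_tick = 1000 / ticks_per_second

print(ticks_per_second)                  # 2800
print(round(milliseconds_per_tick, 3))   # 0.357
```

So at that speed each tick has well under half a millisecond to complete.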


Psieye said:


Is this GraphViz using the `dot` layout engine? If yes, how did you get the edges to bend around nodes to avoid overlap?
It's our own implementation of the Sugiyama graph layout algorithm based on a bunch of scientific papers. It's the same thing we use for the tech trees. The bending around nodes is achieved by inserting virtual invisible nodes at every layer along the edges.

This wikipedia article explains some of the concepts used: https://en.wikipedia.org/wiki/Layered_graph_drawing
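The virtual node trick can be shown in isolation with a few lines of Python. This sketch assumes that every node already has a layer assigned and that edges always point from a lower layer to a higher one. It covers only this one step of a layered layout, and `insert_virtual_nodes` and the naming scheme for the virtual nodes are invented for the example.

```python
def insert_virtual_nodes(layer_of, edges):
    """Split edges that skip layers so that every edge spans exactly one layer."""
    new_layers = dict(layer_of)
    new_edges = []
    for source, target in edges:
        previous = source
        # One invisible node on each layer the edge would otherwise jump over.
        for layer in range(layer_of[source] + 1, layer_of[target]):
            virtual = f"{source}->{target}@{layer}"
            new_layers[virtual] = layer
            new_edges.append((previous, virtual))
            previous = virtual
        new_edges.append((previous, target))
    return new_layers, new_edges


layer_of = {"a": 0, "b": 1, "c": 3}
layers, edges = insert_virtual_nodes(layer_of, [("a", "b"), ("a", "c")])
```

Once the virtual nodes take part in the ordering within each layer like any other node, the long edge is routed through their positions, which is what makes it bend around the real nodes.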


EsenTaishi said:


Any plan of an M1 native version? I remember this was discussed in early development...
Yes, this is still on the table. We're still waiting for the engine team for some of the things needed so I can't give you a timeline, but it is being worked on.
At risk of being perceived as an irritant, and in the very unlikely case that the development team isn't already aware, it appears that the "next music" button in the new music player added into the Beta does not seem to listen to the "is_valid" triggers in the music files.
I don't care about performance because I have NASA computer but ok good news
I am happy for you, but the better the performance the more room for improvements.
I find performance problems particularly tedious to solve, so thanks to the team for keeping an eye on it. Even though it doesn't make like the best DD ever :D
I am very happy about this huge performance boost, but it makes me wonder if we will see something like that ever again. The more you try to optimise, the harder it gets. I wonder how many months or years we need for the community to start whining again because of speed problems perceived by them.
Great to hear Strategic Objectives next week, I am very curious about it!

For future warfare updates, please consider making armies and HQs more manageable, making us able to merge/split/relocate armies
For those not in Discord I can share this post from user Petrogust Syndicates that confirms from the point of view of the beta testers:

1677165239727.png


PS: I hope I can test myself the game this weekend!
Yeah, construction queue is very fast now.

I wonder how performance would fare if you left world running for 200 - 300 years - way past game timeframe.
Glad this means that CK3 players get a performance boost too.
I will say this is the first game that has made me say that 16gb of ram is not enough. The game almost acted as if it had a memory leak, slowly using more ram and slowing down until it's restarted. It's much better with 32gb of ram.
I am curious, do the daily tasks all occur on the same tick or are you able to spread those over the 4 ticks to further load balance?

It sounds like the daily and weekly ticks occur on the same tick, would it be possible to at least shift those greater than tickly to occur on different ticks? So it would at most ever be tickly + something and not possibly be from tickly to yearly in one tick.
The first would lead to inaccurate results. For example the market prices update on the weekly ticks but if you update only part of the market on Monday and another part on Wednesday then on Tuesday the market will do something incorrect.

The second doesn't really make sense. You have ticks every six hours but no ticks between this. Every four ticks you get a daily tick no matter what you do even if you offset the hour values.
I am curious, do the daily tasks all occur on the same tick or are you able to spread those over the 4 ticks to further load balance?

It sounds like the daily and weekly ticks occur on the same tick, would it be possible to at least shift those greater than tickly to occur on different ticks? So it would at most ever be tickly + something and not possibly be from tickly to yearly in one tick.
The tasks that are marked as daily all occur on the same tick, but we do something similar to what you describe but in a slightly different way. Some of our tickly tasks are actually daily tasks in disguise. What we do is we spread out the objects to update over several ticks. So say we do something with pops, one quarter of them would be processed in the morning, one quarter during midday, and so on.

So I am not super into coding, so maybe this isn’t viable. But you mention how the weekly ticks take longer because there are far more things to calculate. Would it be possible to make many of these things biweekly, calculating on alternate weeks, to lighten the load on each individual weekly tick? For example, say X, Y, and Z are all weekly checks. You keep X as weekly and make Y and Z biweekly on alternating weeks, so that week 1 would check X and Y, week 2 would check X and Z, week 3 would check X and Y, etc. Maybe this would require a ton of work on your part and yield negligible improvements, and wouldn’t be worth it, I don’t know.
From a technical perspective this would be possible. But it would change the game mechanics so any such change would likely require a lot of consideration from our designers before doing it. As mentioned above though, we do something similar for some updates.
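
For reference, the alternating schedule from the question could be sketched as below. The task names X, Y, and Z come from the question. The function is purely illustrative and does not describe anything the game currently does.

```python
def tasks_for_week(week: int) -> list[str]:
    """Hypothetical schedule: X every week, Y on odd weeks, Z on even weeks."""
    tasks = ["X"]  # X stays weekly
    tasks.append("Y" if week % 2 == 1 else "Z")  # Y and Z alternate
    return tasks
```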


If employment is such an expensive calculation, would it make sense to shift employment (and perhaps pop need as well) to a monthly calculation? Most employment related factors don't shift that often on a weekly basis. You lose granularity but the improvement in performance would be noticeable from cutting the number of times it needs to run by 4.
This would be a radical change to the game mechanics. I'm not sure our game designer would appreciate me making this change, as it would affect the balance of basically every part of the game :)

It would also, rather counterintuitively, risk making performance worse. The employment update is one of the main places where pops get removed from the game. So doing it less frequently could mean we end up with more pops hanging around for longer, leading to other parts of the update potentially taking longer.
 
Oh great Mr. Dev, do you perhaps have a fancy graph that tracks the number of pops over time? I have a general sense that the number of pops grows, but no idea of the extent.
 