Stellaris Dev Diary #181: Threading and Loading Times

Hello everyone, this is The French Paradox speaking!

On behalf of the whole Stellaris team, we hope you've had a good summer vacation, with current circumstances and all!

We're all back to work, although not at the office yet. It is going to be a very exciting autumn and winter with a lot of interesting news! We are incredibly excited to be able to share the news with you over the coming weeks and months!

Today I open the first look at the upcoming 2.8 release with some of the technical stuff that we programmers have been working on over summer. The rest of the team will reveal more about the upcoming content and features in the following diaries.

Without further ado, let's talk about threads!

Threads? What threads?

There is a running joke that says fans are always wondering which one will come first: Victoria III or a PDS game using more than one thread.

Don't lie, I know that's how some of you think our big decision meetings go

I’m afraid I’ll have to dispel the myth (again): all PDS games in production today use threads, from EU4 to CK3. Even Stellaris! To better explain the meme and where it comes from, we have to go through a little history. I’m told you guys like history.

For a long time, the software industry relied on “Moore’s Law”, which, loosely put, says that a CPU built two years from now will be roughly twice as efficient as one built today.
This was especially true in the 90s, when CPUs went from 50 MHz to 1GHz in the span of a decade. The trend continued until 2005 when we reached up to 3.8GHz. And then the clock speed stopped growing. In the 15 years since, the frequency of CPUs has stayed roughly the same.
As it turns out, the laws of physics make it quite inefficient to increase speeds beyond 3-4 GHz. So instead manufacturers went in another direction and started “splitting” their CPUs into several cores and hardware threads. This is why today you’ll look at how many cores your CPU has and won’t spend much time checking the frequency. Moore’s Law is still valid, but, to put it in strategy terms, the CPU industry reached a soft cap while trying to play tall so they changed the meta and started playing wide.

This shift profoundly changed the software industry, as writing code that will run faster on a CPU with a higher speed is trivial: most code will naturally do just that. But making use of threads and cores is another story. Programs do not magically “split” their work in 2, 4 or 8 to be able to run on several cores simultaneously; it’s up to us programmers to design around that.

Threading nowhere faster

Which brings us back to our games and a concern we keep reading on the forums: “is the game using threads?”. The answer is yes, of course! In fact, we use them so much that we had a critical issue a few releases back where the game would not start on machines with 2 cores or fewer.

But I suspect the real question is: “are you making efficient use of threads?”. Then the answer is “it depends”. As I mentioned previously, making efficient use of more cores is a much more complex issue than making use of more clock cycles. In our case, there are two main challenges to overcome when distributing work among threads: sequencing and ordering.

Sequencing issues occur when 2 computations running simultaneously need to access the same data. For example let’s say we are computing the production of 2 pops: a Prikki-Ti and a Blorg. They both access the current energy stockpile, add their energy production to it and write the value back. Depending on the sequence, they could both read the initial value (say 100), add their production (say 12 and 3, the Blorg was having a bad day) and write back. Ideally we want to end up with 115 (100 + 12 + 3). But potentially both would read 100, then compute and overwrite each other ending up with 112 or 103.
The simple way around it is to introduce locks: the Prikki-Ti would “lock” the energy value until it’s done with its computation and has written the new value back, then the Blorg would take its turn and add its own. While this solves the problem, it introduces a greater one: the actions are now sequential again, and the benefit of doing them on concurrent threads has been lost. Worse, due to the cost of locking, unlocking and synchronizing, the whole thing will likely take longer than if we simply computed both on the same thread in the first place.
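
To make the locking idea concrete, here is a minimal sketch in Python. It is an illustration only, not how the engine is written: the `Stockpile` class and `run_pops` helper are hypothetical names invented for this example. The lock is held for the whole read-modify-write, so two pops can never both read 100 and overwrite each other.

```python
import threading


class Stockpile:
    """Hypothetical shared energy stockpile (illustration only)."""

    def __init__(self, energy):
        self.energy = energy
        self._lock = threading.Lock()

    def add(self, amount):
        # Hold the lock across the read, the addition and the write-back,
        # so no other thread can read a stale value in between.
        with self._lock:
            current = self.energy
            self.energy = current + amount


def run_pops(initial, productions):
    """Run one thread per pop; each adds its production to the stockpile."""
    stockpile = Stockpile(initial)
    threads = [
        threading.Thread(target=stockpile.add, args=(amount,))
        for amount in productions
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return stockpile.energy


# The Prikki-Ti produces 12, the Blorg produces 3.
total = run_pops(100, [12, 3])
```

Note that the threads now queue up behind the lock one at a time, which is exactly the loss of parallelism described above.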

The second issue is ordering, or “order dependency”: in some cases, changing the order of operations changes the outcome. For example let’s say our previous Prikki-Ti and Blorg decide to resolve a dispute in a friendly manner. We know the combat system will process both combatants, but since there are potentially hundreds of combat actions happening, we don’t know which one will happen first. And potentially on 2 different machines the order will differ. For example on the server the Prikki-Ti action will happen first, while on the client the Blorg will act first.

#BlorgShotFirst

On the server the Prikki-Ti action is resolved first, killing the Blorg. The Blorg action that comes after (possibly on another thread) is discarded as dead Blorgs can’t shoot (it’s a scientific fact). The client however distributed the computation in another way (maybe it has more cores than the server) and in his world the Blorg dispatched the Prikki-Ti first, which in turn couldn’t fight back. Then both players get the dreaded “Player is Out of Sync” popup as their realities have diverged.
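
The divergence can be reproduced without any threads at all, simply by processing the same two actions in a different order. This is a toy sketch with assumptions of my own: `resolve_duel` is a hypothetical function, and every shot is assumed to be lethal.

```python
def resolve_duel(order):
    """Resolve combat actions one by one, in the given order."""
    alive = {"Prikki-Ti": True, "Blorg": True}
    target = {"Prikki-Ti": "Blorg", "Blorg": "Prikki-Ti"}
    for attacker in order:
        if not alive[attacker]:
            continue  # dead combatants cannot shoot
        alive[target[attacker]] = False  # assumption: every shot is lethal
    return alive


# Same actions, different processing order on each machine.
server = resolve_duel(["Prikki-Ti", "Blorg"])
client = resolve_duel(["Blorg", "Prikki-Ti"])
out_of_sync = server != client
```

The server ends up with a living Prikki-Ti, the client with a living Blorg, and `out_of_sync` is true.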

There are, of course, ways to solve the problem, but they usually require redoing the design in a way that satisfies both constraints. For example in our first case each thread could store the production output of each pop to add to each empire, and then those could be consolidated at the end. In the same fashion our 2 duelists problem could be solved by recording damage immediately, but applying the effects in another phase to eliminate the need for a deterministic order.
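
Both redesigns can be sketched in a few lines of Python. Again, this is a hedged illustration rather than the actual engine code: `total_energy`, `chunk_total` and `resolve_duel_two_phase` are hypothetical names, production values are assumed to be integers, and every shot is assumed to be lethal.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_total(productions):
    """Each worker sums its own pops privately; nothing is shared or locked."""
    return sum(productions)


def total_energy(initial, productions, workers=2):
    """Split the pops between workers, then consolidate the partial sums."""
    chunks = [productions[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whichever worker finishes first.
        partials = list(pool.map(chunk_total, chunks))
    return initial + sum(partials)


def resolve_duel_two_phase(order):
    """Record all hits first, apply them afterwards."""
    alive = {"Prikki-Ti": True, "Blorg": True}
    target = {"Prikki-Ti": "Blorg", "Blorg": "Prikki-Ti"}
    # Phase 1: record hits using the state at the start of the tick.
    hits = set()
    for attacker in order:
        if alive[attacker]:
            hits.add(target[attacker])
    # Phase 2: apply every recorded hit once all actions have been processed.
    for victim in sorted(hits):
        alive[victim] = False
    return alive


energy = total_energy(100, [12, 3])
server = resolve_duel_two_phase(["Prikki-Ti", "Blorg"])
client = resolve_duel_two_phase(["Blorg", "Prikki-Ti"])
```

With the two-phase version both machines agree whatever the processing order, at the price of a different game outcome: here both duelists go down.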

As you can imagine, it is much easier to design something with threading in mind rather than retrofitting an existing system for it. If you don’t believe me just look at how much time is spent retrofitting your fleets, I’ll wait.

The good news

This is all nice and good, but what’s in it for you in the next patch, concretely? Well you will be happy to hear that I used some time to apply this to one of the oldest bits of our engine: the files and assets loading system.

For the longest time we have used third-party software to handle this. While it saved us a lot of trouble, it has also turned out to be quite bad at threading, to the point that it was sometimes slower with more cores than with fewer, most notably due to the locking issues I mentioned before.
Reworking this system, in conjunction with a few other optimizations, has enabled us to drastically reduce the startup time of the game.
I could spend another thousand words explaining why, but I think this video will speak better:


This comparison was done on my home PC, which uses a venerable i7 2600K and an SSD drive. Both were “hot” startups (the game had been launched recently), but in my experiments I found that even on a “cold” start it makes a serious difference.

To achieve the best speedup, you will need to use the new beta DirectX11 rendering engine. Yes, you read correctly: the next patch will also offer an open beta which replaces the old DX9 renderer with a more recent DX11 version that was initially made by our friends at Tantalus for the console edition of Stellaris. While visually identical, using DX11 to render graphics enables a whole range of multi-threading optimizations that are hard or impossible to achieve with DX9. Playing with the old renderer will still net you some nice speedup on startup, and the splash screen step should still be much faster, but you’re unlikely to see the progress bar “jump” as it does with DX11 when the game loads the models and textures.

Some of those optimizations have also been applied to newer versions of Clausewitz, and will be part of CK3 on release. Imperator should also benefit from them. It might be possible to also apply them to EU4 and HoI4, but so far my experiments with EU4 haven’t shown a huge speedup like they did for Stellaris and CK3.

If you want to read more technical details about the optimizations that were applied to speed up Stellaris, you can check out the article I recently published on my blog.

And with that I will leave you for now. This will likely be my last dev diary on Stellaris, as next month I will be moving teams to lead the HoI4 programmers. You can consider those optimizations my farewell gift.
This may have been a short time for me on Stellaris but don’t worry: even if I go, Jeff will still be there for you!
 
Okay,
Then I've obviously missed something somewhere.
DX12 and vulkan can both do low level (low overhead) resource and task management.
DX12 and vulkan can do graphics.

What do you mean by simulation thread, and how does that translate to dx11 doing something that opengl and vulkan cannot?

"Simulation thread" in this case refers to game logic simulation, not the graphical simulation. Stuff like "fleet A is moving from here to here, what's the best path". This isn't to say that some of these things cannot be written into "GPU code" but most of them do not require that high a degree of parallel calculations.
 
Sure, but this other guy's post was saying it can't be in vulkan and only in dx11.

My question was less about the details of code and simulation, and more about what capabilities dx11 has that opengl and vulkan don't.
That pdx would avoid using it,
 
I never said it couldn't be done in vulkan, only that doing so wouldn't deliver any benefits. And they are not "avoiding" using it, but instead the act of moving to another API comes with a cost in terms of programmers that need to understand and code for it, when they are already well acquainted with using DX. You risk introducing a lot of bugs.

Why fix what isn't broken or slow? Why waste manhours switching to a new API for nearly zero overall benefit? Yes, Vulkan can do certain compute tasks, but it doesn't magically make it easier to parallelize things. That is still dependent on the code, the problem, and the abilities of the programmers. If they can't parallelize the game logic thread as is, switching to Vulkan isn't going to do anything to alleviate it.
 
What you said was changing the API would be meaningless. But that's fine, I accept your argument that changing the API is problematic at each major step.
Programmers to learn, implementation (game coding), and retroactive change later on (patches+dlc).

However you've made a few errors.
Last I checked, DX API is windows exclusive, PDX titles are not. They're cross platform.
OpenGL and Vulkan are cross platform.

Vulkan is supposed to be much more accessible than opengl, and much faster. In the same kinds of ways DX12 is better and faster than 11 and 10 and 9.

Vulkan benchmarks have better results when compared to dx12. Maybe not more than 15%, but more than none.

Changing dx9 to 10, to 11, may not seem like a major change in software code when contrasted against moving to opengl/vulkan.
There are valid and nontrivial reasons for embracing vulkan.

But maybe it is as you say, the cost at the moment is much higher than perceived benefits from changing API to vulkan.

However my original question was separate to all of that.
What I was asking was- what can dx11 and 12 do that vulkan cannot?
 
Vulkan benchmarks have better results when compared to dx12.
Source? What about across differing manufacturers of graphics cards? (NVIDIA, ATI, etc)

What I was asking was- what can dx11 and 12 do that vulkan cannot?
Aside from running natively on non-Windows environments, What can vulkan do that dx11 and 12 cannot?
 
However my original question was separate to all of that.
What I was asking was- what can dx11 and 12 do that vulkan cannot?
Why Stellaris uses DX instead of Vulkan is not a question of features or performance, but rather the fact that by the time Vulkan was first released, Stellaris had already been a few years in development and was released like 3 months later. Switching to a different rendering API after the game is completed would be huge resource waste without much benefit.
 

Isn't that interesting, other people don't get asked for sources, but I do.


What about across differing manufacturers of graphics cards? (NVIDIA, ATI, etc)

Good question. If we look at data like that; Should we also look at data for dx12 to see how they compare?


Aside from running natively on non-Windows environments, What can vulkan do that dx11 and 12 cannot?
I have no idea what vulkan can and cannot do, nor do I know what dx 11, 10, 12, and 9 can and cannot do.

Now, lets look at my original question.

...Has anyone asked why pdx is using dx rendering instead of vulkan?

My immediate response was

Because the game is bound by the simulation thread, not the graphics thread. Switching to vulkan is meaningless.

Now, looking at that answer, in direct relevance to that question.
Is it unfair to conclude KaiserTom was implying a difference of features? That dx11 offered something that vulkan did not?

Also that "Switching to vulkan is meaningless" suggesting again that vulkan is inferior, or at most equal.
No one is challenging these assertions about dx 11, simulation threads, and whether switching to vulkan is meaningless.
Yet when I ask for further information about what each thing means, I get challenged and disagreements.


As far as sources go, I don't have an all encompassing broad spectrum quality list. But I do have things that are easy to find.

 
Why Stellaris uses DX instead of Vulkan is not a question of features or performance, but rather the fact that by the time Vulkan was first released, Stellaris had already been a few years in development and was released like 3 months later. Switching to a different rendering API after the game is completed would be huge resource waste without much benefit.

While it's good to see people contributing constructively, my previous post acknowledged the cost of adoption.

I wasn't trying to trivialise that cost, nor was I ignorant of when vulkan was released- implied by my comments that vulkan is supposed to be more accessible than opengl.
Hinting that I knew opengl was technically better, but it was apparently very clunky and difficult to use. Compared to microsoft directx which might be technically inferior, but it's the mainstream thing, on the mainstream OS everybody knows, with plenty of thorough documentation and after-sales support.

I don't know enough about opengl vs vulkan codebase to know how they differ. Whether transitioning opengl to vulkan is an upgrade path similar to upgrading dx9 to 11.
However here, you've merely moved the goal-post of my original question.
Stellaris had already been a few years in development and was released like 3 months later.

If pdx have multi-platform games. Why do they use a microsoft exclusive API?


It's also bewildering to see people disagreeing with me, when most of the objections so far have been over things I've already acknowledged, or about things
that I haven't said.
 
@ProtoformX Dude, you need to stop trying to push a graphics API that the game does not need. It will not accomplish much of anything and instead bring about many costs. Yes, the graphics threads may run a bit faster, but that means nothing when the entire game is bound by the game logic thread that Vulkan will not fix.

Is it unfair to conclude KaiserTom was implying a difference of features? That dx11 offered something that vulkan did not?

Also that "Switching to vulkan is meaningless" suggesting again that vulkan is inferior, or at most equal.
It does not matter whether Vulkan is better, equal, or inferior, because even if it is better, it will not make any performance difference in the game because the game speed is heavily bound by things unrelated to the graphics API: the game logic. Switching to Vulkan will not make the game logic run faster. It may make the individual graphics threads run faster, but that means nothing. Those threads are all run on separate cores from the game logic thread. The game is not bound by the graphics, at all. Stellaris is not a graphically intensive game unless you mod it to be. And even then you are still far more bound by the game logic. It does not matter if the rabbit gets across the finish line faster if you still need to wait for the tortoise to finish before continuing on.

When you posit that Vulkan should be used, it's up to you to demonstrate how that will make a meaningful difference in the game, considering developing it is not free. We are asserting that it is not meaningful to add at all. The game will not run any faster by switching to Vulkan. Hell, it won't run any faster switching to DX11 to be frank. But honestly, that was a free upgrade because they already needed a DX11 renderer for consoles. Migrating that over is quite easy since 99% of the work is done already.
 
@SephirothWS ??
I'm failing to see a question here. You still also haven't given me any source information that I requested. A link to a Google Search isn't enough. Yeah, I could perform the search myself. But when you state information and make it sound like fact,
Vulkan benchmarks have better results when compared to dx12.
then I want to know where you pull your information from.

I am going to have to agree with @KaiserTom here though, you should not be pushing a graphics API the game doesn't necessarily need unless you plan on going to work for Paradox to implement the change as doing so is quite the expensive expenditure for little to no benefit as the game itself is bound by game logic, and not the graphics API.
 
I'm sorry to be the bearer of bad news, the performance problem won't be solved with incremental improvements, you either attack the problem at its core and solve it or go around it by changing the design of the game. So it looks like that from what you're writing here, the performance thread will reach 140 pages or more in the next 5 years (now sits at 71 pages). Thanks but no thanks, can't be bothered...
The main problems with that are that (1) fixing everything completely is not realistically possible, (2) fixing most things and ensuring that the fixes don't come with bugs and other problems could be very time consuming, and (3) the content designers, artists, and various other people can't just sit around all day doing nothing.

As it stands, I never saw any of your customers complaining about loading times and even if the game took double the amount of time to load right now, without the beta patch I wouldn't care at all. What most of your players care about is how slow it plays near mid game/end game - and forum admins had to even isolate all those posts to a megathread, cause it was all over the place.
Me, as well as a fair number of other modders, and even some players that have a lot of mods, have been somewhat annoyed at very long load times. No joke, I've had it take more than 20 minutes. And typical load times are between 5 and 10 minutes. If this was reduced by just 50%, my playtime when modding could drop by up to 2/3.

As per the DD and the basics of threading are concerned, here's a free tip for you guys: have you considered having a local planetary buffer/stockpile? In that way you don't have to lock on the global empire stockpile for each pop, and you can parallelize all planets on n threads, with no lock contention.
Having pops lock global resources/values is a retarded/moronic design and coding decision, but the team that did that was working originally on a different and simplified specification and not the *thing* that came out to be known as the megacorp population system. So of course I mean no offense to them.
Please quote the exact location where they said they're locking the stockpile.

Considering that you're the same guy who disagreed with my findings in a repeatable, mostly scientific test, in favour of your findings from a single unscientific test, and then proceeded to tell me to add more pops to my test which uses 10,000 pops, I'd advise you to keep away from commenting on/about technical stuff.
 
Me, as well as a fair number of other modders, and even some players that have a lot of mods, have been somewhat annoyed at very long load times. No joke, I've had it take more than 20 minutes. And typical load times are between 5 and 10 minutes. If this was reduced by just 50%, my playtime when modding could drop by up to 2/3.
Can confirm. I can go take a shit, come back, and it's half loaded.
 
Well, I'll just say that I tried testing a few mods a few days ago (3 mods, 1 of which was Unofficial Hive DLC, + 10 UI mods that I always use), and it took about 12 minutes to load from clicking start on the launcher.

And the error log said that the game failed to find the files for the Unofficial Hive DLC mod.
 
@MatRopert A really quick note on that CppCon talk: You can avoid those potentially multiply-loaded texture files by simply maintaining a separate set, with its own lock, that you put those file names in and test for their presence in before you load them. It's unlikely to be a win significant enough to be worth implementing, but in pure performance terms it seems straightforwardly correct: You get rid of any duplicated work, and you avoid contention since you're not taking the same lock twice for each file you load.
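
For anyone following along, the suggestion above could look roughly like this. It is only a sketch in Python under my own assumptions: `LoadRegistry`, `try_claim` and `load_all` are made-up names, the file names are placeholders, and the actual disk read is replaced by appending to a list.

```python
import threading


class LoadRegistry:
    """Hypothetical set of already-claimed file names, with its own lock."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, name):
        # Test and insert under one short critical section, so exactly one
        # thread wins the right to load any given file.
        with self._lock:
            if name in self._claimed:
                return False
            self._claimed.add(name)
            return True


def load_all(requests, workers=4):
    """Spread requests over worker threads; each file is loaded at most once."""
    registry = LoadRegistry()
    loaded = []
    loaded_lock = threading.Lock()

    def worker(names):
        for name in names:
            if registry.try_claim(name):
                # Placeholder for the real (slow) file read, done outside
                # the registry lock so other threads are not held up by it.
                with loaded_lock:
                    loaded.append(name)

    threads = [
        threading.Thread(target=worker, args=(requests[i::workers],))
        for i in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(loaded)


textures = load_all(["a.dds", "b.dds", "a.dds", "c.dds", "b.dds"])
```

One thing this sketch leaves out: a thread that loses the claim may still need to wait until the winner has finished loading before it can use the texture.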