
Stellaris Dev Diary #181 : Threading and Loading Times

Hello everyone, this is The French Paradox speaking!

On behalf of the whole Stellaris team, we hope you've had a good summer vacation, with current circumstances and all!

We're all back to work, although not at the office yet. It is going to be a very exciting autumn and winter with a lot of interesting news! We are incredibly excited to be able to share the news with you over the coming weeks and months!

Today I open the first look at the upcoming 2.8 release with some of the technical stuff that we programmers have been working on over summer. The rest of the team will reveal more about the upcoming content and features in the following diaries.

Without further ado, let's talk about threads!

Threads? What threads?

There is a running joke that says fans are always wondering which one will come first: Victoria III or a PDS game using more than one thread.

image (26).png

Don't lie, I know that's how some of you think our big decision meetings go

I’m afraid I’ll have to dispel the myth (again): all PDS games in production today use threads, from EU4 to CK3. Even Stellaris! To better explain the meme and where it comes from, we have to go through a little history. I’m told you guys like history.

For a long time, the software industry relied on “Moore’s Law”, which observes that the number of transistors on a chip doubles roughly every two years, and which for decades meant in practice that a CPU built in two years would be roughly twice as fast as one today.
This was especially true in the 90s, when CPUs went from 50 MHz to 1 GHz in the span of a decade. The trend continued until 2005, when we reached up to 3.8 GHz. And then the clock speed stopped growing. In the 15 years since, the frequency of CPUs has stayed roughly the same.
As it turns out, the laws of physics make it quite inefficient to increase speeds beyond 3-4 GHz. So instead manufacturers went in another direction and started “splitting” their CPUs into several cores and hardware threads. This is why today you’ll look at how many cores your CPU has and won’t spend much time checking the frequency. Moore’s Law is still valid, but, to put it in strategy terms, the CPU industry reached a soft cap while trying to play tall so they changed the meta and started playing wide.

This shift profoundly changed the software industry. Writing code that will run faster on a CPU with a higher clock speed is trivial: most code will naturally do just that. But making use of threads and cores is another story. Programs do not magically “split” their work in 2, 4 or 8 to be able to run on several cores simultaneously; it’s up to us programmers to design around that.

Threading nowhere faster

Which brings us back to our games and a concern we keep reading on the forums: “is the game using threads?”. The answer is yes, of course! In fact, we use them so much that we had a critical issue a few releases back where the game would not start on machines with 2 cores or less.

But I suspect the real question is: “are you making efficient use of threads?”. Then the answer is “it depends”. As I mentioned previously, making efficient use of more cores is a much more complex issue than making use of more clock cycles. In our case, there are two main challenges to overcome when distributing work among threads: sequencing and ordering.

Sequencing issues occur when 2 computations running simultaneously need to access the same data. For example let’s say we are computing the production of 2 pops: a Prikki-Ti and a Blorg. They both access the current energy stockpile, add their energy production to it and write the value back. Depending on the sequence, they could both read the initial value (say 100), add their production (say 12 and 3, the Blorg was having a bad day) and write back. Ideally we want to end up with 115 (100 + 12 + 3). But potentially both would read 100, then compute and overwrite each other ending up with 112 or 103.
The simple way around it is to introduce locks: the Prikki-Ti would “lock” the energy value until it’s done with its computation and has written the new value back, then the Blorg would take its turn and add its own. While this solves the problem, it introduces a greater one: the actions are now sequential again, and the benefit of doing them on concurrent threads has been lost. Worse, due to the cost of locking, unlocking and synchronizing, the whole thing will likely take longer than if we simply computed both on the same thread in the first place.
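To make the sequencing problem concrete, here is a minimal sketch in Python (not engine code; the names `lost_update`, `Stockpile` and `locked_update` are made up for illustration). The first function replays the bad interleaving by hand, so the lost update is reproducible rather than left to thread timing. The second part shows the lock-based fix, which is correct but forces the pops to take turns.

```python
import threading


def lost_update(stockpile, productions):
    """Replay the bad interleaving: every pop reads before any pop writes."""
    snapshots = [stockpile for _ in productions]  # all pops read the same value
    for snapshot, production in zip(snapshots, productions):
        # Each write is based on a stale read and overwrites the previous one.
        stockpile = snapshot + production
    return stockpile


class Stockpile:
    """Hypothetical shared energy stockpile protected by a lock."""

    def __init__(self, energy):
        self.energy = energy
        self._lock = threading.Lock()

    def add(self, production):
        # Read, add and write happen as one unit; other threads wait here.
        with self._lock:
            self.energy += production


def locked_update(initial, productions):
    """Run one thread per pop; the lock serializes the actual updates."""
    stockpile = Stockpile(initial)
    threads = [
        threading.Thread(target=stockpile.add, args=(production,))
        for production in productions
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return stockpile.energy
```

With the numbers from the example, `lost_update(100, [12, 3])` gives 103 and `lost_update(100, [3, 12])` gives 112, depending on who writes last, while `locked_update(100, [12, 3])` gives 115. The locked version is correct, but the two additions still happen one after the other.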

The second issue is ordering, or “order dependency”, meaning that in some cases changing the order of operations changes the outcome. For example, let’s say our previous Prikki-Ti and Blorg decide to resolve a dispute in a friendly manner. We know the combat system will process both combatants, but since there are potentially hundreds of combat actions happening, we don’t know which one will happen first. And potentially on 2 different machines the order will differ. For example, on the server the Prikki-Ti action will happen first, while on the client the Blorg will act first.

OOS.png

#BlorgShotFirst

On the server the Prikki-Ti action is resolved first, killing the Blorg. The Blorg action that comes after (possibly on another thread) is discarded, as dead Blorgs can’t shoot (it’s a scientific fact). The client, however, distributed the computation in another way (maybe it has more cores than the server) and in its world the Blorg dispatched the Prikki-Ti first, which in turn couldn’t fight back. Then both players get the dreaded “Player is Out of Sync” popup as their realities have diverged.

There are, of course, ways to solve the problem, but they usually require redoing the design in a way that satisfies both constraints. For example in our first case each thread could store the production output of each pop to add to each empire, and then those could be consolidated at the end. In the same fashion our 2 duelists problem could be solved by recording damage immediately, but applying the effects in another phase to eliminate the need for a deterministic order.
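Both redesigns can be sketched in a few lines of Python. This is an illustration under assumptions, not how the game implements it: `total_production` and `resolve_duel` are hypothetical names, pops are reduced to plain integers, and a combat action is a simple (attacker, target, damage) tuple.

```python
from concurrent.futures import ThreadPoolExecutor


def total_production(pops, workers=4):
    """Each worker sums its own chunk privately; one thread merges the results."""
    chunks = [pops[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No shared stockpile is touched here, so no lock is needed.
        partials = list(pool.map(sum, chunks))
    # pool.map returns results in input order, so the merge order is fixed.
    return sum(partials)


def resolve_duel(hit_points, attacks):
    """Two-phase combat: record all damage first, apply it afterwards."""
    pending = {name: 0 for name in hit_points}
    # Phase 1: everyone alive at the start of the tick gets to act.
    # hit_points is only read here, so processing order cannot matter.
    for attacker, target, damage in attacks:
        if hit_points[attacker] > 0:
            pending[target] += damage
    # Phase 2: apply the recorded damage in one go.
    return {name: hp - pending[name] for name, hp in hit_points.items()}
```

With `duel = {"Prikki-Ti": 10, "Blorg": 10}` and each side dealing 10 damage, `resolve_duel` returns the same result whether the Prikki-Ti action or the Blorg action is listed first, which is the property that keeps server and client in sync. Note that this changes the design: both duelists can now go down in the same tick.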

As you can imagine, it is much easier to design something with threading in mind rather than retrofitting an existing system for it. If you don’t believe me just look at how much time is spent retrofitting your fleets, I’ll wait.

The good news

This is all nice and good, but what’s in it for you in the next patch, concretely? Well you will be happy to hear that I used some time to apply this to one of the oldest bits of our engine: the files and assets loading system.

For the longest time we have used third-party software to handle this. While it saved us a lot of trouble, it has also turned out to be quite bad at threading, up to the point that it was sometimes slower with more cores than with fewer, most notably due to the locking issues I mentioned before.
Reworking this system for threading, in conjunction with a few other optimizations, has enabled us to drastically reduce the startup time of the game.
I could spend another thousand words explaining why, but I think this video will speak better:


This comparison was done on my home PC, which uses a venerable i7 2600K and an SSD drive. Both were “hot” startups (the game had been launched recently), but in my experiments I found that even on a “cold” start it makes a serious difference.

To achieve the best speedup, you will need to use the new beta DirectX11 rendering engine. Yes, you read correctly: the next patch will also offer an open beta which replaces the old DX9 renderer with a more recent DX11 version that was initially made by our friends at Tantalus for the console edition of Stellaris. While visually identical, using DX11 to render graphics enables a whole range of multi-threading optimizations that are hard or impossible to achieve with DX9. Playing with the old renderer will still net you a nice speedup on startup, and the splash screen step should still be much faster, but you’re unlikely to see the progress bar “jump” as it does with DX11 when the game loads the models and textures.

Some of those optimizations have also been applied to newer versions of Clausewitz, and will be part of CK3 on release. Imperator should also benefit from them. It might be possible to also apply them to EU4 and HoI4, but so far my experiments with EU4 haven’t shown a huge speedup like the one we saw for Stellaris and CK3.

If you want to read more technical details about the optimizations that were applied to speed up Stellaris, you can check out the article I recently published on my blog.

And with that I will leave you for now. This will likely be my last dev diary on Stellaris, as next month I will be moving teams to lead the HoI4 programmers. You can consider those optimizations my farewell gift.
This may have been a short time for me on Stellaris but don’t worry: even if I go, Jeff will still be there for you!
 
Same, I was wondering too how the Linux version would benefit from this change. Have you done any comparison between the native Linux build and the new DX11 renderer running under Proton with DXVK?

Would we ever be likely to see a Vulkan renderer? Seeing as this could benefit multiple platforms.
I was also wondering how DX11 will affect Stellaris on Linux. And thanks for the update :)
DX11 will obviously not change anything outside of Windows.
Our engine team has been experimenting with Vulkan and DX12 for the new games but there are no plans for Stellaris as far as I know.
Will there be more attention for the technical limitations of the game when designing features for future expansions? The game is still a lot slower than it was before the Megacorp DLC. New features can be nice and all but if the late game stays as slow as it currently is, I wouldn't consider it a lot of fun.
We are aware of the concerns about game speed. I can take a moment to clarify a bit.
Features are usually not the reason why the game performance varies. The big impact from Megacorp is due to the population system rework in the free patch. Galaxies nowadays have at least 4-5 times more population to simulate, which comes with a cost.
As for why we looked at startup speed in the first place, first we developers restart the game a _lot_ during development (think something like 20+ times a day) so that's a huge productivity improvement for us. Second as I mentioned this could be shared with other titles, making it valuable for most of PDS and not just the Stellaris team.
 
I don't understand people on this forum.

First, they complain that there hasn't been any DD or official communication in months (rightly so I think). People said, if PDX can't show any new content yet, they could at least write about something boring or technical, just to show that they are working on the game and engage with the community.

Now we have a DD doing just that, Mat even stuck around answering questions (thank you @MatRopert ), and now people complain that the topic of this DD was not important for them and the devs should have worked on something else.

Also, keep in mind that Mat said that they are still in home office and I can say from experience that the complete lack of physical meetings and interactions with your colleagues over such a long period of time definitely takes a toll on productivity.

I too was complaining about the lack of communication, so if any devs read this, please don't be discouraged by the always unhappy and keep DDs like this coming, until the marketing people give you the "go" for the juicy stuff.
 
Bon voyage Mat
Info before your welcome party at HoI: any idea why this didn't grant any boost to EU4? And how is IR handling it? I suppose somewhere in between (better than EU4 but worse than CK3 and Stellaris)
I suspect it's because EU4 assets contain a _lot_ of text files in the form of history databases. This is hitting another bottleneck related to the Windows filesystem. Stellaris on the other hand is light on text files, having no history database.
 
so is this update on the beta now?
Only on the internal beta accessible to PDX staff and beta testers under NDA, I'm afraid. As I mentioned there's more to 2.8 than just those performance improvements. More to come in the following diaries.
 
Great stuff, thank you for your work and the dev diary. Will your work (plus maybe the work by others) and the move to DX11 have other performance benefits besides the startup time of the game?
I have another small optimization I didn't mention which should speed up loading savegames.
 
Who's Jeff?

[REDACTED] [REDACTED] [REDACTED] [REDACTED] Jeff [REDACTED]

Edit: Sorry, there's something wrong with my auto-correct today...
 
Hey @MatRopert . Thank you for posting a dev diary. Quick question sadly unrelated to this one: In your last dev diary you announced a debug_ai console command to get a grasp of what the military AI is "thinking". Unfortunately it never made it to release but it would be very much needed as said military AI is currently not working at all and hasn't been for quite some patches.
Any chance we could get the command for the sake of modders or even better yet an actual AI fix?
Thank you.
That command is definitely in the current release. You need to switch to observer mode and observe an AI nation to see it though.
 
OK, so my main fleet is running towards the other end of my space to merge with the single transporter there. OK, that is still dumb but it's not randomly dumb. I abandoned this game months ago and finally have an answer.
I feel like transport handling is probably the weakest bit of the fleet AI today. I have a few beta changes that improve objective selection (avoiding sending fleets on long trips) but it's a bit too early to present and it may not entirely address that exact issue. That's the trick with AI: while it is reasonably feasible to make it behave better in one given test case, you always run the risk of making it worse in all the others.
I was wondering if you could make an auto threading script that applies to mods
I'm afraid that's not how it works. If I could make a script that magically transforms some game logic into something thread friendly, I could probably put a bunch of programmers out of a job ;)
It would be cool to see higher interaction between the dev team and modders, giving them access to performance metrics. Some dos and don'ts
After playing with both EU4 and Stellaris loading benchmarks, I can give you one right now: try to avoid having lots of small files (like EU4 does with one text file per province). Windows is just terrible at loading those efficiently. My first idea was to "compile" the game text files into one big archive after the first load to speed up subsequent startups, but it would have taken some time to implement safely and wouldn't have benefited Stellaris as much as the other changes I made.
I still keep it in mind for later, but as you may guess there's always more things we could do than time to actually do them.
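For modders curious about what "compiling" many small text files into one archive could look like, here is a rough sketch using Python's standard `zipfile` module. To be clear, this is an assumption-laden illustration of the general idea, not something the game does: the function names, the `provinces` folder and the `.txt` pattern are all hypothetical, and it ignores the hard part mentioned above (knowing safely when the archive is stale).

```python
import tempfile
import zipfile
from pathlib import Path


def pack_text_files(source_dir, archive_path, pattern="*.txt"):
    """Bundle every matching file under source_dir into a single archive."""
    source = Path(source_dir)
    # ZIP_STORED skips compression: the goal is fewer files, not smaller ones.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(source.rglob(pattern)):
            archive.write(path, arcname=path.relative_to(source).as_posix())


def load_packed(archive_path):
    """Open one file on disk and read every entry out of it."""
    with zipfile.ZipFile(archive_path) as archive:
        return {name: archive.read(name).decode("utf-8") for name in archive.namelist()}


def demo():
    """Create three tiny 'province' files, pack them, and load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "provinces"
        root.mkdir()
        for i in range(3):
            (root / f"province_{i}.txt").write_text(f"owner = {i}", encoding="utf-8")
        archive_path = Path(tmp) / "provinces.zip"
        pack_text_files(root, archive_path)
        return load_packed(archive_path)
```

The later startups would then open one archive instead of thousands of individual files. Whether that is actually faster on a given machine is something to measure rather than assume.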
 
So, let me get this straight, see if I understood this correctly:

The programmers on Stellaris have worked all summer just to increase the start-up speed of the game, instead of the speed of the gameplay itself? Or will we see another optimization dev diary explaining in detail the optimizations made to the gameplay itself?

Excuse me for being so blunt, but starting up the game faster, doesn't make the game itself faster.
This isn't the only thing we worked on, no. This is merely what I toyed with during the slow summer months with a bit of outside help.
Why can't you calculate everything on host's machine and stream data to clients?
This is an idea that comes back from time to time. Thing is, it requires an efficient way of discerning what data changed during an update and then another efficient way to transmit that over the wire. So far we haven't seen a lot of empirical evidence that this would prove more efficient.
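As a toy illustration of the "discerning what data changed" part, here is a Python sketch that treats the game state as a flat dictionary, which is a big simplifying assumption; `diff_state` and `apply_diff` are hypothetical names, not engine functions.

```python
_MISSING = object()  # sentinel so a stored None is not mistaken for "absent"


def diff_state(previous, current):
    """Return (changed, removed) describing how current differs from previous."""
    changed = {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }
    removed = sorted(key for key in previous if key not in current)
    return changed, removed


def apply_diff(state, changed, removed):
    """What a client would do with the diff received from the host."""
    updated = dict(state)
    updated.update(changed)
    for key in removed:
        updated.pop(key, None)
    return updated
```

Note that this naive diff still walks the entire state on every update, which hints at the cost described above: finding the changes cheaply is a problem of its own, before anything is sent over the wire.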
 
Who's Jeff?
 
You forget that Stellaris fans are the kind of people who can spend a hundred hours on a game and then call it unplayable.
This is far too true.
So in other words: If I've reached an arbitrary limit of played hours ( so that I can be called an experienced player on top ) then it's no longer justified that I express any form of critique ?

How about the thing that critiques refer to the latest version of this game ( V2.7.2 ) whereas the counter for played hours is an accumulated one that includes not only the current, but all previous versions as well ? Wanna tell me that a guy or girl who had played ( for example ) V1.9.1 extensively is as satisfied with V2.7.2, just because he / she had accumulated a zillion + 1 hour of play-time although said zillion hours are from V1.9.1 whereas just said 1 hour is actually from V2.7.2 ? In theory, this counter for played hours has to be reset every time this game gets a new update.

It's also a thing that this counter for played hours counts also when you're playing with mods: You two adorn Paradox with borrowed plumes if you think it's Paradox's merit that players have accumulated a zillion hours of play-time although they had actually played a completed, fixed, optimised, balanced, changed and extended ( modded ) version of this game that makes it not only playable, but enjoyable, too.

How about the "wasted" hours of play-time ( that're also included in said counter ) due to bugs that have led to CtDs or due to other bugs that have led to abandoned games ?

Stellaris players always seem to need to find a reason to complain about what the devs are doing, and while I think that Stellaris could certainly be improved (and frankly does need more polish for its existing systems), the way people here go about it is... not okay.
Bad customer, BAD !

We got radio silence, and we complained.
So, you think that 3 months of radio silence is something that should be celebrated instead ?

They said they're looking at performance,
So, you think that a non-committal intention ( I've "never" "ever" heard before on top ) is something that should be celebrated, too ?

We demanded they fix systems, and they did, and we complained (specifically about it not being enough in this case, despite "enough" being subjective).
I'm sorry, but it's the ingame-performance-problems that most people complain about, not such outgame-performance-"problems" how Stellaris wouldn't boot, save and load that fast ( and we're just talking about seconds here on top ). Stellaris is not a Total-War-game in which something like this would make an actual difference ( several "breaks" in a session due to switches between the campaign-map and battles ) since it's still a Paradox-game that runs continuously once you're in your session.

Frankly, every time this community asks for something, when we get it, people complain.
It's rather the other way around: People have to complain in order to get what ( patches ) they want, especially if it's not that profitable as ... for example ... DLCs since otherwise Paradox gets the excuse that everything seems to be fine and dandy.

PDX has been absurdly tolerant of our crap, and seriously deserves a break from it all.
You've never heard about these rules against this so called "toxicity" ? And the 3 months of radio silence ( that qualifies as such a break ) you've already forgotten ?

I don't want to see the Stellaris dev team go the same way a lot of my favourite old things went; abandoned and despised by its creators for the people that came to call themselves fans.
My concern is more the other way around that Paradox continues to throw out DLCs ( as long as said DLCs get sold ) without having the prioritized intention or even the ability to overcome the state of this game.
 
Good afternoon, great DD. Since in the next patch you will switch to DirectX 11 on Windows, I guess the Linux and Mac versions will use OpenGL 4.1, as for Imperator and CK3, or not?
Thanks for any replies about this.
The game will be opt-in on dx11, dx9 will still be the default. There are no changes planned for rendering on Mac and Linux at the moment.
 
It's good to see a Dev Diary. That performance boost looks really cool. Will we get the same benefit in game or just during launch?
Those changes are purely about loading times. As I mentioned the in-game speed is limited by completely different factors.
Also will the next Dev Diary have information about content for 2.8, maybe a DLC?
You will hear more about the content of the 2.8 update in the upcoming dev diaries.
Although it'll mostly be about Jeff...
 