
Really good gen sir.

Certainly, there's going to be a lot of improvement possible at the architectural level of the game, especially considering how old the engine is. However, this is where business realities really do become the limiter.

If you review updates for PDS' various games, you'll see a lot more investment in trying to improve the performance of games at a higher level. This is 'cheaper', and has proven to be an effective way to get short-term improvements in performance on more easily quantifiable issues (as they are exposed and perceivable by the player, and to a greater degree, the modder).

This is also the level where other modders and I can work.

Your analysis is informative, but I'm not sure it will really influence PDS' methodology for addressing performance concerns with a game.
 
As I said:

In conclusion, if you are going to PDXcon, please ask the director of Stellaris this: does he expect this game to last through 5-8 more years' worth of DLC, and if yes, are they going to take a cost hit and improve the base engine so the game is playable from start to finish? I understand that it's impossible to expect a smooth experience with 2 million pops, but it should be able to cope with 50-100k. If they can't do that, then they should scale down the complexity and simplify systems to provide a solution.

My engineering insights, if I may:

The pop data are not all processed at the same time; the game has indeed split them up, so even with 2M pops, if you calculate a portion each day of the month, that would be 2,000,000 / 30 ≈ 66,666 pops per game day. If we then assume 10-way multithreading, that would be about 6,666 pops per thread per GAME DAY, not per FRAME, which is doable on potato machines without any specialized cache tuning. So the screw-up is somewhere else, or the game evaluates pops with interpreted scripts and multiple (dozens of) memory indirections rather than optimized C++ code. This paragraph doesn't include multiplayer considerations, and the devs could also build memory/computation tradeoffs into their algorithms to make the engine faster at the expense of RAM, so that a 16 GB rig would chew through a galaxy with 500k pops every frame. New processors with 16 cores / 32 threads will enter the budget range any time now, so they need to wake up and smell the coffee; imagine the above calculations at 30 threads. Hello, 5-10M pops!
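That back-of-the-envelope budget can be checked in a few lines (the pop count, month length and thread count are this post's assumptions, not measured engine figures):

```python
# Back-of-the-envelope pop budget from the paragraph above: spread a month's
# pop evaluations over 30 game days, then split each day's share across threads.
def pops_per_thread_per_day(total_pops, days_per_month=30, threads=10):
    per_day = total_pops // days_per_month  # pops evaluated each game day
    return per_day // threads               # share handled by one thread

assert pops_per_thread_per_day(2_000_000) == 6_666               # the 10-thread case
assert pops_per_thread_per_day(2_000_000, threads=30) == 2_222   # the 30-thread case
```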

I expect that the engine rewrite the post(s) above imply will happen for Clausewitz 2020+ and Stellaris 2. If we see a really dramatic improvement before S2, I will be amazed.
 
In the "Starnet AI mod is excellent" thread, some people seem to be able to play games on the scale of "1000 star galaxy with 24 regular empires, 4 fallen empires and 2 marauder empires. Everything else is standard except for 1.25x habitable planets with guaranteed habitable worlds turned off and 1.25x hyperlanes."

How is that even possible? I would LOVE to play such games, but on my powerful enough PC I can't play bigger than a small galaxy, and even there I tend to play an empire with only one species. The only difference seems to be the AI mod, but I'm reluctant to play with mods. (I know I may be wrong, but I'm only discovering this game and want to play it plenty before I do.)
 
In the "Starnet AI mod is excellent" thread, some people seem to be able to play games on the scale of "1000 star galaxy with 24 regular empires, 4 fallen empires and 2 marauder empires. Everything else is standard except for 1.25x habitable planets with guaranteed habitable worlds turned off and 1.25x hyperlanes."

How is that even possible? I would LOVE to play such games, but on my powerful enough PC I can't play bigger than a small galaxy, and even there I tend to play an empire with only one species. The only difference seems to be the AI mod, but I'm reluctant to play with mods. (I know I may be wrong, but I'm only discovering this game and want to play it plenty before I do.)

That's very subjective. I read the same post and thought the same. It is possible that you wouldn't play this savefile even if it ran better on your PC. I have the same problem: a good gaming rig, high performance in many games, but Stellaris crumbles to death (<- in my subjective view) in the late game. If I see lag in menus and stutter when panning through the galaxy, I very quickly get tired of playing further. It isn't fun to interact with the game for me, so I stop playing.

I would love to play a big, filled galaxy too, with many races, ethics and dangers, and with empires as strong as me, where every war could end your existence or freedom.
 
Let's be real, there is no alternative solution. They tried for years and things keep getting worse. I would trade the joy of seeing the AI economy going to shit (it probably was already shit anyway) for good late-game performance; if it depended on me: "Where do I sign?"
I would just argue that their team isn't extremely good when it comes to performance. A modder fixed their broken AI after updates so well that they incorporated it into the game. I just think they should either expand their expertise or hire somebody who is specialized in performance issues. They are already doing this: they hired somebody just for the UI because they saw they didn't have the necessary experience.
 
I would just argue that their team isn't extremely good when it comes to performance. A modder fixed their broken AI after updates so well that they incorporated it into the game. I just think they should either expand their expertise or hire somebody who is specialized in performance issues. They are already doing this: they hired somebody just for the UI because they saw they didn't have the necessary experience.

The skillset available from their staff is, I think, not my business, and you are simplifying a complex business process unfairly, I feel. PDS have and have had specialists in UI and AI, varying as you'd expect in a busy games studio, and they have proven many times that they are 'extremely good' at making successful games in a fairly niche market, regardless of whatever area of expertise is lacking at any given time and appears to need bolstering because of some 'issue' arising with one game or another.

tldr - Dismissing the team's competency isn't fair.
 
It seems clear to me that the 'issue' is not the number of pops as such, although the problem obviously scales with the number of pops. It feels to me that the actual issue may be one of the following:

1. Too many calls being made to optimise jobs. There should be a limited number of triggers for job optimisation (resettle, unemployed, growth, policy change, construction, tech, etc.). All of these would be limited to a single empire, single species, single stratum, or even single planet.
2. The optimise algorithm is possibly too wide and not properly localised. The result would be unnecessary pops being processed. There are very few events that would require processing more than a few hundred pops.
3. Threshold values, even hidden ones: for example, a pop that has been subject to an optimisation decision should have a cooldown before being reconsidered, even if this results in temporary inefficiencies. This is a last resort though.
4. Deferred processing. Some common events like pop growth, resettle, unemployment, and construction can be deferred to specific days (every X days, or even once a month).
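Point 4 above can be sketched as a simple day-of-month bucketing scheme (the `Pop` type and the bucket count are hypothetical, not the engine's actual data model):

```python
# Deferred-processing sketch: each pop gets a stable bucket 0..29 and is only
# re-evaluated on the matching day of the month, spreading the cost evenly.
from dataclasses import dataclass

@dataclass
class Pop:
    pop_id: int

def pops_due_today(pops, day_of_month, buckets=30):
    # A stable function of the id keeps each pop in the same bucket every month.
    return [p for p in pops if p.pop_id % buckets == day_of_month]

pops = [Pop(i) for i in range(90)]
assert len(pops_due_today(pops, day_of_month=0)) == 3  # 90 pops / 30 buckets
```

Urgent triggers (resettle, unemployment) could still force an immediate re-check; the bucket only bounds the routine background work.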
 
The skillset available from their staff is, I think, not my business, and you are simplifying a complex business process unfairly, I feel. PDS have and have had specialists in UI and AI, varying as you'd expect in a busy games studio, and they have proven many times that they are 'extremely good' at making successful games in a fairly niche market, regardless of whatever area of expertise is lacking at any given time and appears to need bolstering because of some 'issue' arising with one game or another.

tldr - Dismissing the team's competency isn't fair.
I don't think it's unfair. The reason Paradox is successful at making games in niche markets is partly because the markets actually are niche and they have a good overarching formula for developing their games. That, however, doesn't mean that competence can't be called into question. Every individual and every company has holes in their competence, and that's to be expected. The problem is that for Paradox, neither performance nor AI should be an area where the company has competence holes. Stellaris isn't the only game that suffers from performance problems; Hearts of Iron IV also suffers in the late game.

It might just be the Clausewitz engine not being able to support the games' new features well, or maybe it's a difficult engine to do performance work on, but if modders can implement performance fixes for the game and the PDX developers can't, then that's a lot of evidence pointing toward the PDX staff having holes in their competence around performance.

Some of the choices the devs made surrounding performance in HoI IV also highlighted a weird set of priorities. Community suggestions showed that fewer but larger divisions would improve the game's performance, but Paradox, probably wanting to stick to historical accuracy and not impose penalties on the AI, wanted to keep the divisions small and plentiful for the AI in the late game.
 
Some speculations regarding performance and architecture of Stellaris

Preliminaries

These days, the most common performance bottleneck is memory latency, i.e. the time from issuing a request to memory to fulfilling it. Processor caches can help with this, but only if the data access pattern is right, so that only a small fraction of the data set is accessed at once. CPUs are bad at handling scattered memory reads and writes, and caches help only if the data set's layout in memory and the access pattern are fine-tuned for performance.

Naturally, when performance is an issue, people aim to restructure algorithms around linear scans, which CPUs handle well. Incidentally, the commonly used object-oriented paradigm is not very compatible with this kind of optimization. Hence the Entity-Component-System approach is king in AAA projects.
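As a toy illustration of the linear-scan layout (Python won't show the cache effects, and the field names are invented, but the structure-of-arrays idea is this):

```python
# Locality sketch: keep each field in its own contiguous array
# (structure-of-arrays) so a tick is a linear pass over packed data
# instead of chasing pointers between individual pop objects.
from array import array

class PlanetPops:
    def __init__(self, n):
        self.happiness = array('d', [0.0] * n)  # one packed array per field
        self.output = array('d', [1.0] * n)

    def tick(self, stability_bonus):
        # Whole-planet update as a linear scan over contiguous memory.
        for i in range(len(self.happiness)):
            self.happiness[i] += stability_bonus
        return sum(self.output)

planet = PlanetPops(1000)
monthly_output = planet.tick(0.01)  # scans 1000 pops' fields back to back
```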

Incidentally, OOP is not a friend of performance. OOP extensively uses dynamic dispatch, i.e. indirect jumps. Indirect jumps are a performance killer, because an indirect jump causes a CPU pipeline stall and might cause a cache miss.

Yet another angle for optimization is to split data sets into small chunks that fit into the CPU cache completely. Stellaris' economic model is actually naturally hierarchical (Empires -> Systems -> Planets -> Pops), and there is nothing preventing planets from being processed locally, as long as all the relevant data is collected in compact per-planet arrays. However, if they access non-local resources during processing, that will be a big bottleneck.
That's all correct, and it applies to a lot of games (and software in general). However, this doesn't appear to be a significant factor in Stellaris. When looking at CPU utilization, the telling symptom of the problems you've described would be a high ratio of data cache misses and a lot of stalls in the back end. But in Stellaris that's not the case (assuming graphics is turned down enough to become a negligible cost). It's the front end that is overloaded, which is a typical symptom of an interpreter. Perhaps the scripting language used by PDX is the root issue?

...or the game tries to evaluate pops with interpreted scripts and multiple (dozens of) memory redirections, and not optimized C++ code.
That is my guess too.
 
Perhaps the scripting language used by PDX is the root issue?
AFAIK, Stellaris scripts are not a proper scripting language. They let you tie events to triggers and provide weights and numeric values for predefined slots. They also posted somewhere that the event handling is properly optimized and isn't an issue.

In short, it should not be a big problem.

back end.
I dunno how you split Stellaris into back-end and front-end. I heard there are problems with the GUI layer, but that they are not all that dire and are not a source of late-game lag.
 
It's the front end that is overloaded, which is a typical symptom of an interpreter. Perhaps the scripting language used by PDX is the root issue?

AFAIK, Stellaris scripts are not a proper scripting language. They let you tie events to triggers and provide weights and numeric values for predefined slots. They also posted somewhere that the event handling is properly optimized and isn't an issue.

There was also a video I saw with Wiz answering some questions after a presentation, and he said specifically about the scripting language that part of the loading time at game start is spent pre-compiling the scripts. Sure, it is not as efficient as it could be if written directly in C++, but it is clearly not purely interpreted either, and given what was dug up in various threads by numerous people, I think the problems the game faces are an order of magnitude above whatever inefficiency those scripts could bring.

I would have given a link to the video itself but I have no clue where I saw this... 99.99% certain tho.
 
I noticed the pop job swapping between two jobs too; it would definitely indicate that all pops are checked every day, maybe multiple times... if I judge from the swapping speed.
 
I noticed the pop job swapping between two jobs too; it would definitely indicate that all pops are checked every day, maybe multiple times... if I judge from the swapping speed.

I explained this earlier in the thread. The evaluation process gets derailed when a fringe case is hit.

edit - Inevitable "how do you know that?" qualifier. I wrote an AI mod that includes eliminating the fringe case. Which means during the development of the mod, I created test-cases to reproduce the issue, and to prove that it was in fact derailing. It was.

edit 2 - An example. When a version of GAI was integrated, a bunch of its scripted triggers were included. Here's one of the weight clauses for the artisan:

Code:
modifier = {
    factor = 0
    jobs_work_minerals_goods = yes
}

Normally the weight clause is evaluated for 25% of pops every 7 days. However, this evaluation is derailed when this weight is applied and all your artisans are sacked. Your artisans become unemployed and immediately re-evaluate all vacant jobs - they've come off the 25%/7 days track.

And then one of those ex-artisans grabs a vacant job. Now that original weighting that got it sacked is no longer valid, so rather than being back on the 25%/7 days track, the pop evaluates that the artisan job is actually more suitable for it and swaps back.

Repeat ad infinitum until fringe case is broken by some outside factor.

A pop only gets back on the 25%/7days track if it actually holds a job long enough to get back on it.
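The flip-flop loop described above can be modelled as a tiny state machine (the weights and job names are illustrative, not the real script):

```python
# Sketch of the derailment loop: a factor-0 modifier sacks the artisan, the now
# unemployed pop re-evaluates immediately (off the 25%/7-day track), takes the
# artisan job back at full weight, and the cycle repeats.
def employed_weight(base, factor):
    return base * factor  # factor = 0 means the held job's weight drops to 0

def simulate(days):
    state, history = "artisan", []
    for _ in range(days):
        if state == "artisan" and employed_weight(10.0, 0.0) == 0.0:
            state = "unemployed"  # weight hit zero: pop is sacked
        else:
            state = "artisan"     # vacancy re-evaluated at full weight: rehired
        history.append(state)
    return history

assert simulate(4) == ["unemployed", "artisan", "unemployed", "artisan"]
```

Every iteration here stands in for a full job re-evaluation, which is why the oscillation is expensive and not just cosmetic.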
 
AFAIK, Stellaris scripts are not a proper scripting language. They let you tie events to triggers and provide weights and numeric values for predefined slots. They also posted somewhere that the event handling is properly optimized and isn't an issue.

In short, it should not be a big problem.
That logic has to be evaluated somewhere, somehow, and it has often been observed that large mods tend to have much worse performance than vanilla, so associating the slowdown with scripts seems a reasonable guess.

I dunno how you split Stellaris into back-end and front-end. I heard there are problems with GUI layer, but that they are not all that dire and are not a source of late-game lag.
The CPU has hardware counters that record various events (such as instructions retiring, cache misses and a lot of other stuff), so running something like sysprof will show you where the time is being spent in the CPU (front end vs back end, and even more specifically). GUI issues in Stellaris are a separate source of slowdown, but they can be practically eliminated with the ticks_per_turn setting. I was referring to the apparently unavoidable slowdown (even when running with TPT 10).


There was also a video I saw with Wiz answering some questions after a presentation, and he said specifically about the scripting language that part of the loading time at game start is spent pre-compiling the scripts. Sure, it is not as efficient as it could be if written directly in C++, but it is clearly not purely interpreted either, and given what was dug up in various threads by numerous people, I think the problems the game faces are an order of magnitude above whatever inefficiency those scripts could bring.

I would have given a link to the video itself but I have no clue where I saw this... 99.99% certain tho.
Interpreted and compiled are somewhat loosely defined. The point is that the scripts are not compiled into code the CPU can run directly. Realistically, looking at the amount of scripts in the game, there is no way it could compile all of them into reasonably efficient machine code during the 4-5 seconds it takes to load the game. If it were compiled (as from efficient C++), the 10x-100x speedup would likely be enough for this thread not to exist at all.
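The cost difference being argued here can be illustrated with a toy weight-clause evaluator: a tree-walking interpreter re-reads the clause structure on every call, while "pre-compiling" flattens it into a closure once at load time. Both run as Python here, so this shows only the dispatch difference, not real speedups, and the clause format is invented:

```python
# Toy weight-clause evaluator (invented format, not PDX script internals).
def eval_interpreted(clause, pop):
    # Tree-walking path: re-read the clause structure on every evaluation.
    weight = clause["base"]
    for mod in clause["modifiers"]:
        if mod["condition"](pop):
            weight *= mod["factor"]
    return weight

def compile_clause(clause):
    # "Compiled" path: flatten the clause into a closure once, at load time.
    base = clause["base"]
    mods = [(m["condition"], m["factor"]) for m in clause["modifiers"]]
    def run(pop):
        weight = base
        for cond, factor in mods:
            if cond(pop):
                weight *= factor
        return weight
    return run

clause = {"base": 8.0,
          "modifiers": [{"condition": lambda pop: pop["unemployed"], "factor": 0.0}]}
evaluate = compile_clause(clause)
assert eval_interpreted(clause, {"unemployed": True}) == evaluate({"unemployed": True}) == 0.0
```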
 
CPU has hardware counters that record various events
I'm very much aware of that.
You STILL haven't clarified how you split Stellaris into back-end and front-end. Do qualify.

That said, there was dev diary no. 149, which posted plots of CPU time sinks and specifically stated that most time is spent on pop-related calculations.
 
I was reviewing that first thread I linked, where @KingAlamar, @UltimateTobi and others did a very thorough examination of lag issues with the game.

Many of their findings suggest the issue is unemployment. Pairing that with my own work on performance and eliminating job popping while developing EDAI, I'd agree, and I think I can draw the following conclusions from it:

1 - Employed pops will re-evaluate jobs on the schedule suggested in a dev post/changelog/dev diary (needs citation clarifying) for an update: approximately 25% every 7 days.
2 - Unemployed pops aren't on this schedule. They are evaluating jobs continuously, regardless of whether there are actual vacancies (presumably to see if they can displace an employed pop because they are more suitable).

It is #2 that is the cause of most performance problems coming from pops. Whether the potential clauses, or any clauses of a job other than the job weightings, are evaluated on a different schedule for unemployed vs. employed pops, I'm not sure.

I think a considerable performance improvement would result from re-thinking the creation of unemployment, and those pops method of evaluating jobs.
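A rough comparison of the two regimes above (the 25%/7-day figure comes from the thread; the unemployed-pop cost model is this post's hypothesis, not a measured figure):

```python
# Rough evaluation counts per game day under the two regimes described above.
# The 25%/7-day schedule is from the thread; the unemployed model (every pop
# checks every job type, every day) is a hypothesis.
def scheduled_evals(employed_pops):
    return employed_pops * 0.25 / 7  # ~25% re-evaluate every 7 days

def continuous_evals(unemployed_pops, job_types):
    return unemployed_pops * job_types  # every unemployed pop, every job, daily

# 10,000 employed pops cost ~357 evaluations/day, while just 100 unemployed
# pops scanning 50 job types cost 5,000/day: unemployment dominates quickly.
assert scheduled_evals(10_000) < continuous_evals(100, 50)
```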
 
I'm very much aware of that.
You STILL haven't clarified how you split Stellaris into back-end and front-end. Do qualify.
I think you're misunderstanding. I don't split the software into back-end and front-end; I only observe what the CPU is utilizing. There are many tools that read the counters and show what hardware (inside the CPU) an application is utilizing (I've used perf). In the big picture, the front end is responsible for fetching and decoding instructions, and the back end is responsible for loading data and executing instructions. For compiled algorithms it's typical to heavily load the back end (they may go over large data sets and contain complex instructions that take many cycles to run, but they keep running the same small set of instructions, so they are not demanding on the front end). The data-locality problems you described in your earlier post would cause the processor to stall in the back end while waiting for data to load (slowly, due to data cache misses). Stellaris stalls a lot in the front end, but not in the back end, which shows that data layout is not the main issue.

That said, there was dev diary no. 149, which posted plots of CPU time sinks and specifically stated that most time is spent on pop-related calculations.
It didn't really show much detail. Is it spending time running pop-related scripts, or some pop-related compiled code?
 
Many of their findings suggest the issue is unemployment. Pairing that with my own work on performance and eliminating job popping while developing EDAI, I'd agree, and I think I can draw the following conclusions from it:

1 - Employed pops will re-evaluate jobs on the schedule suggested in a dev post/changelog/dev diary (needs citation clarifying) for an update: approximately 25% every 7 days.
2 - Unemployed pops aren't on this schedule. They are evaluating jobs continuously, regardless of whether there are actual vacancies (presumably to see if they can displace an employed pop because they are more suitable).

It is #2 that is the cause of most performance problems coming from pops. Whether the potential clauses, or any clauses of a job other than the job weightings, are evaluated on a different schedule for unemployed vs. employed pops, I'm not sure.
I agree with (1), but there is more to (2). Unemployed pops don't always evaluate jobs every day. By playing on slow speed it's easy to observe situations where there is an open job and an unemployed pop, and it takes that pop several days to pick the job. I am not sure whether unemployment makes performance worse or not, but unfortunately (for me) I see performance degradation even when I have no unemployment (and that applies even in the case of an empty galaxy with no AI). Maybe it's possible to test it by disabling all jobs and comparing performance with the case when all those jobs are enabled?
 
Maybe it's possible to test it by disabling all jobs and comparing performance with the case when all those jobs are enabled?

It is. All the jobs are almost instantly taken by any pop that meets the basic potential clause. In other words, there's no 25%/7 day evaluation. It's happening constantly for unemployed pops.

Unemployed pops don't always evaluate jobs every day.

I'll do some more testing as this surprises me. Perhaps it is a matter of perception, or possibly the amount of unemployment is a factor. Or maybe there's an internal call when job availability changes. I'll see if I can get the weight clauses to pipe a ping out to the log as well.
 