I have a couple of suggestions for what might be causing this based on my very, very limited experience with Unity. But I am by no means an expert, so they might be laughably wrong. If anyone points out where I've gone wrong, then I will at least have learned something.
My first observation is that Unity divides games into Scenes (other engines might call these Worlds or Levels). In a platformer, each level might be a separate Scene. In a sports sim, the 3D stadium where you watch matches might be one Scene and the 2D screen where you buy and sell players might be another. In an RPG, when you step out of the Town Square into the Palace, you are probably moving from one Scene into another. Each Scene has a collection of things (Unity calls them GameObjects; I'll just say "Objects"): they could be football players, swords and shields, whatever. In today's 3D games, each Scene will generally be associated with a 3D space where the engine keeps track of the location of each Object and reports any collisions between them.
In the examples above, you will often leave one Scene to go into another, so it's important that each one closes down 'nicely'. When the RPG character leaves the Town Square, the graphics memory needs to be thoroughly cleaned of all the old Objects so that you don't find muddy puddles inside the Palace. But there's continuity to take care of as well. If you had a loaf of bread in your rucksack in the Town Square, that needs to be noted so that you still have a loaf of bread in your rucksack in the Palace, even though the Palace is in fact a completely new environment and the loaf of bread has been freshly recreated in the new 3D space from the game's files.
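To make the Scene-change idea a bit more concrete, here is a toy sketch in Python (not Unity code, and not how Unity actually implements it). The `Scene` class, `unload` and `change_scene` are names I've made up for illustration. It only shows the two jobs described above: tear down every old Object one at a time, and carry the rucksack contents across by recreating them in the new Scene.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical stand-in for a Scene: a name plus a list of Objects."""
    name: str
    objects: list = field(default_factory=list)

    def unload(self):
        # Destroy Objects one at a time rather than wiping the lot in one go.
        destroyed = 0
        while self.objects:
            self.objects.pop()
            destroyed += 1
        return destroyed


def change_scene(old, new_name, new_scenery, inventory):
    """Close the old Scene 'nicely' and build the new one with the carried items."""
    carried = list(inventory)       # note what is in the rucksack first
    destroyed = old.unload()        # then clean out every old Object
    new = Scene(new_name, list(new_scenery) + carried)  # recreate carried items
    return new, destroyed


town = Scene("Town Square", ["muddy puddle", "fountain", "loaf of bread"])
palace, destroyed = change_scene(
    town, "Palace", ["throne", "carpet"], ["loaf of bread"]
)
print(palace.objects, destroyed)
```

The point of the sketch is that the cost of `unload` grows with the number of Objects in the old Scene, which matters once the Scene is very large.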
This is relevant because each C:S city is one gargantuan Scene. There can't be many other Unity games with such large and complex Scenes. There are huge numbers of buildings, Cims, etc. But Unity isn't just going to say "blow this for a game of soldiers" and tell the OS to wipe the lot. As far as Unity knows, you might be closing this Scene and moving to another one, so every Object has to be checked and destroyed carefully. So when you exit C:S, closing it all down nicely is a huge task (or more accurately, millions of small tasks). A lot of data is going to be going through your CPU.
My second observation is that Unity games are scripted in the C# programming language, which is a 'garbage collected' language. This means (the next bit is simplified hugely) that the C:S programmers don't directly control the computer's memory. If you demolish a house in C:S, the memory used by that house isn't immediately freed. Think of the house's ID number as just being marked with an asterisk in the game's list of house Objects. Every so often (as far as I understand it, this is triggered when the game needs more memory rather than by a fixed timer), the garbage collector will check all the lists, and if there's an asterisk it will delete the house Object from the list and either re-use the memory or hand it back to the OS. So our lovely friends in Finland don't need to worry about the details of that and can focus on more important things.
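Here is the 'asterisk' picture above as a small Python sketch. It is deliberately as simplified as my description: a real garbage collector works out which objects are still in use rather than reading asterisks off a list, and `HouseRegistry`, `demolish` and `collect` are hypothetical names of mine, not anything from Unity or C:S. What it does show is the two separate steps: demolishing only marks the house, and the memory only comes back when the collector runs later.

```python
class HouseRegistry:
    """Toy model: demolishing marks a house, a later collection frees it."""

    def __init__(self):
        self.houses = {}        # house ID -> memory used (arbitrary units)
        self.marked = set()     # the "asterisks"
        self.memory_in_use = 0

    def build(self, house_id, size):
        self.houses[house_id] = size
        self.memory_in_use += size

    def demolish(self, house_id):
        # Nothing is freed here; the house is only flagged for later.
        if house_id in self.houses:
            self.marked.add(house_id)

    def collect(self):
        # The "garbage collector" pass: free everything that was flagged.
        freed = 0
        for house_id in self.marked:
            freed += self.houses.pop(house_id)
        self.marked.clear()
        self.memory_in_use -= freed
        return freed


registry = HouseRegistry()
for house_id in range(3):
    registry.build(house_id, size=10)

registry.demolish(1)
memory_after_demolish = registry.memory_in_use   # unchanged by the demolition

freed = registry.collect()
memory_after_collect = registry.memory_in_use    # drops only after collection
print(memory_after_demolish, freed, memory_after_collect)
```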
But there is a trade-off. In a game written with manual memory management (C++, say), the memory would have been freed at the moment the house was demolished. Now we have two phases: demolishing the house in-game, and then later tidying up the house Object's memory. When you close the Scene, you are suddenly putting an asterisk on every item of every list, so when the garbage collector activates it has hundreds of thousands of Objects to tidy up, at the very same moment that Unity is already force-feeding your CPU with the Scene-closing routines. Because C:S is soooooo memory-hungry and has so many Objects, just processing the lists of Objects must be a major task in itself. And again, the garbage collector has no knowledge that you're closing the game; it doesn't know the difference between a death wave that removes lots of Cim Objects and closing to desktop. It doesn't even know that you're playing a game; as far as the garbage collector knows, it could be handling bank transactions or phone calls to the emergency services. All it sees are a lot of asterisks on lists, and it must process every one carefully and correctly. And that just takes time.
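One last toy sketch in Python, again following my simplified picture rather than how a real collector is built. The `sweep` function and the figure of 100,000 Objects are made up for illustration. The same routine handles a death wave and a full exit; it has no idea which one it is dealing with, it just has far more to free in the second case.

```python
def sweep(objects, marked):
    """Walk the whole list; free what is marked, keep the rest.

    Returns (survivors, entries examined, entries freed).
    """
    survivors = []
    examined = 0
    freed = 0
    for obj in objects:
        examined += 1
        if obj in marked:
            freed += 1              # stand-in for the careful per-Object tidy-up
        else:
            survivors.append(obj)
    return survivors, examined, freed


city = list(range(100_000))         # hypothetical Object IDs for one city

# A death wave: a few hundred Cims are marked for removal.
death_wave = set(range(500))
_, wave_examined, wave_freed = sweep(city, death_wave)

# Exit to desktop: every single Object is marked.
everything = set(city)
remaining, exit_examined, exit_freed = sweep(city, everything)

print(wave_freed, exit_freed, len(remaining))
```

In both runs the whole list gets walked, but the exit case has to do the tidy-up step for every entry instead of a small fraction of them.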