jpd

Entil'Zha Anla'Shok
Demi Moderator
"My game just went belly up. What's wrong?"

Just to start off: no program to date that actually does something significant is completely free of errors. The games from Paradox are no exception to the rule, and will crash from time to time. What you need to understand here is that today's PCs are highly complex machines with all sorts of combinations of hardware, OS and driver software. It's just not possible (not even for Microsoft itself) to exhaustively test each and every combination. That can only be done (and even then only to a certain extent) on controlled platforms like an Apple Macintosh, a Sony PlayStation or a Microsoft Xbox.

Now to the failures themselves. Basically, there are three different kinds of failures.

1) At some point in the game, the game time stops advancing and 'pauses' indefinitely.
2) All of a sudden, the game stops abruptly, and you end up in the desktop.
3) All of a sudden, the PC locks up completely and becomes unresponsive to everything.

Now, let's examine each case in more detail.

1 - The game clock stops.

This is (most likely) caused by an internal routine that enters a loop (to search for something, for example) and never exits again. When this happens in the game engine part of the game (the part that deals with the game rules, updating the provinces, evaluating the AI, that stuff), the current game turn never finishes. Unlike a chess program, where the AI is simply cut off when the time has elapsed, Paradox games extend the game turn until all (AI) processing is complete. If one of the routines during this phase never exits, then the current game turn never ends. And that means that the game clock stops advancing. It should be treated as a game bug and, when reproducible, should be reported in the appropriate bug forum. Normal Windows behaviour is still possible, like using <alt><tab> to switch to the desktop.
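To make the idea concrete, here is a minimal sketch in Python. It is not taken from any Paradox game; the function names and the ring of provinces are made up purely for illustration. The first search never returns when nothing matches, so a turn that calls it would never end. The second one gives up after one full lap.

```python
def find_free_province(provinces, start):
    """Walk the ring of provinces looking for a free one.

    Buggy on purpose: if no province is free, this loop never exits.
    """
    i = start
    while provinces[i] != "free":
        i = (i + 1) % len(provinces)
    return i


def find_free_province_guarded(provinces, start):
    """Same search, but it gives up after one full lap around the ring."""
    for step in range(len(provinces)):
        i = (start + step) % len(provinces)
        if provinces[i] == "free":
            return i
    return None  # nothing free: the caller can still finish the turn


def run_turn(provinces):
    """A turn only ends once every routine called in it has returned."""
    target = find_free_province_guarded(provinces, 0)
    return "turn finished, target=%s" % target
```

Calling `find_free_province(["taken", "taken"], 0)` would hang forever, which is exactly the stopped game clock described above.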

2 - The game terminates suddenly

This is the most common type of failure. It is also known as a CtD, or Crash to Desktop.

What causes it, you may ask? Well, it can be caused by a lot of things, both under and outside of the game's control. What actually happens is standard Windows behaviour when what is known as an exception occurs. Exceptions are (mostly) fatal occurrences that prevent the application (our game in this case) from continuing. Other applications can have these kinds of fatal interruptions as well. How an application responds is ultimately a choice for the programmer. When the application does nothing, the exception ultimately ends up in the Windows kernel. The kernel has only two ways of dealing with it. It can either show you a dialog box with a cryptic looking text and an <Ok> button, or it presents you with the dreaded BSOD (or Blue Screen of Death). Either way, the application is dead and terminated.

For a game like the Paradox games, that is not a good solution. Letting the application die at the hands of Windows itself is a very bad idea. When that happens, none of the resources (except memory) that were claimed by the game are released. That means that DirectX remains active, sound buffers are still allocated, and so forth. Thus, the game engine contains a very rudimentary solution. The game engine captures the exception, and gives itself a more or less orderly way out. Since the game cannot continue running (that is, unfortunately, the nature of an exception), the only thing it can do is release all resources, close down DirectX, and quit. Now, it would have been nice if the game actually produced a (user friendly) message box stating why it closed down, but unless you are an expert or a programmer, that would not mean much to you, the user.
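The "capture it, clean up, quit" pattern can be sketched as follows. This is only an illustration in Python: the real game deals with native Windows exceptions, not Python ones, and `FakeResource` and `run_game` are hypothetical names standing in for things like DirectX surfaces and sound buffers.

```python
class FakeResource:
    """Stand-in for something like a video surface or a sound buffer."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def release(self):
        self.released = True


def run_game(resources, game_loop):
    """Run the loop; whatever happens, release every resource before leaving."""
    try:
        game_loop()
        return "normal exit"
    except Exception as exc:
        # The game cannot continue, so all that is left is an orderly way out.
        return "crash to desktop (%s)" % type(exc).__name__
    finally:
        # Release in reverse order of acquisition.
        for res in reversed(resources):
            res.release()
```

From the user's point of view the result is the same either way: the game is gone and the desktop is back. The difference is that nothing stays claimed behind the scenes.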

Now, what are these exceptions? Well, most (if not all) are in fact processor exceptions. The processor, during the course of running the program (and all of its supporting stuff like the video or sound driver), encounters a state that is illegal or otherwise undefined. The most common of those are:

Access Violation
The processor was instructed to access a piece of information at a memory location that either does not exist, or that the current process has no access to (for example, because it belongs to a different process). It invariably means that there is a bug somewhere, because normally this should never happen. Accessing memory through an uninitialized pointer can cause this, as can accessing memory through a pointer that has previously been released back to Windows. Now, the question that remains is whether this is caused by the game code (and thus is a bug in the game) or by a driver. If the problem is reproducible in the game (as in: load a save game, do the same steps over and over again, and each time it crashes at the same spot), then it's most likely a problem in the game, and should be reported in the appropriate bug forum.

Page error
The processor was instructed to access a piece of memory that is registered as being stored in a swap file, but for some reason the virtual memory manager could not load it (back) into main memory. It can indicate either a corruption of the paging tables (a malfunctioning device driver can cause this, for example), or that the system is low on pageable physical RAM.

No memory left
An attempt to allocate a chunk of memory has failed, most likely because available RAM has been exhausted. Having little physical RAM, together with an almost full hard disk partition holding the swap file, can cause this. It can indicate a memory leak in the game or another application, or simply that too many applications are open at the same time.

Invalid handle
An attempt was made to call a Windows function with a handle that does not exist (or no longer exists). Most Windows API functions perform a (limited) sanity check on the parameters they receive from the calling application. When something doesn't add up here, this exception can follow. It usually indicates a failure in the application that called the Windows function. Again, if this is reproducible, it should be reported in the appropriate bug forum.

When your OS is Windows 9x, there may be a second reason for this type of exception. On a Windows 9x system, there is only a limited amount of memory reserved for allocating Windows resources. Those are the things these handles normally refer to. On a Windows 9x system, a fixed amount of two times 64 KB (yes, you read that correctly: kilobytes) is set aside systemwide for storing resources like icons, mouse cursors, edit boxes, menu bars and what not. Having lots of applications open will quickly exhaust this limited amount of RAM, and that can cause Windows API functions to fail.

Illegal instruction
An attempt was made to execute an illegal (or non-existent) processor instruction. Normally, this can never happen. When it does, it usually means the program entered a random piece of memory, thinking that program instructions are stored there. It's usually an indication that some time before this point something has gone wrong, like a processor stack corruption. This can be caused by a function that tries to access a local buffer outside of its defined bounds. This is, btw, how viruses misuse buffer overflow vulnerabilities in the various operating systems.

Privileged instruction
Some processor instructions are reserved for the so-called supervisor mode. This is a processor mode reserved for OS kernel routines and key device drivers. Normal applications (including games) run in user mode. In this mode, the privileged CPU instructions may not be executed. If a program attempts this anyway, then this exception follows. It usually indicates that program execution has entered a chunk of code that it wasn't supposed to enter. Again, as with the Illegal instruction, stack corruption is the most likely cause.

Stack overflow
This is a simple one. The memory reserved for the stack has been exhausted. Usually this happens when a routine calls itself (directly or indirectly) indefinitely. It indicates a logic error in the program or a driver.
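A routine that calls itself without ever stopping looks like this. The sketch is in Python, which guards its own stack with a recursion limit and raises `RecursionError` instead of letting the processor fault, so take it as an illustration of the logic error rather than of the native exception. Both function names are made up.

```python
def count_down(n):
    """Buggy on purpose: there is no stopping condition."""
    return count_down(n - 1)


def try_count_down():
    """Call the runaway routine and report what happened."""
    try:
        count_down(10)
    except RecursionError:
        # Every call claimed a bit more stack until none was left.
        return "stack exhausted"
    return "finished"
```

A native program has no such safety net: once the stack memory is used up, the exception described above is what you get.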

Floating point failures
This is a collection of related exceptions, all linked to floating point operations. Things like division by zero, taking the square root of a negative value, that sort of thing. Usually indicates an error in the program's logic.
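The two examples mentioned can be reproduced in a few lines of Python. Python reports them as `ZeroDivisionError` and `ValueError` rather than as raw processor exceptions, and the helper names here are hypothetical, but the underlying mistake is the same: the program fed the math an input it should have checked first.

```python
import math


def safe_ratio(a, b):
    """Divide a by b, returning None instead of failing on a zero divisor."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


def safe_sqrt(x):
    """Square root of x, returning None for a negative input."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None
```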

3 - The system freezes completely, leaving the PC unusable.

This is a very nasty condition. However, it has very little to do with the game itself, and a lot with the current system configuration. The most common cause of a full system freeze is a condition that has been named 'infinite loop' by Microsoft. This is, in fact, a system failure within the AGP section of your mainboard. Let me explain a bit.

A modern-day AGP video card is much more than simply an advanced version of the good old VGA card and its predecessors. Those were simply dumb frame buffer cards, and all of their memory contents were manipulated by the CPU. Nowadays, video chips are even more advanced than the main CPU itself. Together with the support chips on the video board they are, in fact, a separate computer all by themselves. Like the main CPU, the video card runs its own, highly specialized operating system and communicates with the rest of the system via the AGP interface. The communication can be initiated both by the video chip and the main CPU, and the AGP interface in the main board's chipset controls this communication.

When all goes well, you will never notice anything of this. You only see the result, which is a great looking image in your game of choice. However, things can, unfortunately, go horribly wrong. When the video card is not used as a dumb frame buffer card (something that the standard PCI VGA driver does), the main CPU does not manipulate the contents of the frame buffer directly. Instead, it tells the video processor what to do. The video processor then executes those commands. For this to work, the CPU must be able to tell the video chip what to do, and the video chip must be able to accept those commands. The AGP interface is what connects these two subsystems. Now, in order to speed up processing on both sides of the AGP interface, the chipset maintains a command queue, which buffers the various instructions until such time as the video chip is ready to process them. The size of this buffer is actually determined by the chipset that is in use on your motherboard.

So, what happens if the CPU is stuffing commands into the AGP pipeline faster than the video chip can execute them? Well, sooner or later that buffer fills up. When that happens, the CPU will be stalled by the chipset until such time as the video chip has executed its current command, and retrieves the next pending one from the AGP pipeline. That will free up a slot at the other end. The CPU can now finish putting its command into the AGP pipeline. The stall is lifted, and the CPU is released by the chipset and can finally continue executing program instructions. If the video chip is slow at processing commands for any reason, then this stalling of the CPU by the main board's chipset will be perceived by you, the user, as a temporary system freeze.

Things can become even worse if for some reason the video chip stops retrieving commands from the AGP pipeline. Then the temporary CPU stall becomes a permanent one. Since the CPU isn't allowed to execute new program instructions, it cannot respond to keystrokes, mouse clicks and what not. Even the sound card's interrupts won't be honored. That usually causes a sound card to repeat its most recently loaded sound fragment over and over again.
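The full-buffer stall can be imitated with a bounded queue. This is a toy model in Python, not a description of real AGP hardware: the function name, the queue size and the short timeout (which stands in for a stall that in reality never ends) are all assumptions made for the example.

```python
import queue


def submit_commands(commands, queue_size, video_chip_alive, timeout=0.05):
    """Push commands into a bounded queue the way the CPU feeds the pipeline."""
    agp_queue = queue.Queue(maxsize=queue_size)
    submitted = 0
    for cmd in commands:
        if agp_queue.full() and video_chip_alive:
            # A working chip retrieves its next command, freeing a slot.
            agp_queue.get_nowait()
        try:
            # With no free slot this blocks, just like the stalled CPU.
            agp_queue.put(cmd, timeout=timeout)
        except queue.Full:
            return "CPU stalled after %d commands" % submitted
        submitted += 1
    return "all %d commands submitted" % submitted
```

With `video_chip_alive=True` every command gets through. With `video_chip_alive=False` the producer gets stuck as soon as the queue is full, which is the permanent freeze described above.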

What can cause such a condition to occur? Well, as said previously, a modern video chip is a highly sophisticated mini computer with its own operating system. Like Windows, this OS can crash. When it crashes, it won't execute its program until it gets rebooted. A video reset could do the trick, but it's not easy to let the main CPU issue a reset command if the CPU itself is stalled because the AGP pipeline has filled up as a result of the video chip's crash. So a hard system reset or a power cycle is usually the only viable way out.

The most likely cause of a video card's crash is, believe it or not, insufficient power. Like it or not, modern-day PCs are extremely power hungry. What's more, the tolerances for voltage fluctuations are significantly tighter than a couple of years ago. True, the tolerances are still rated as plus or minus 5%, but on today's AGP x8 boards that is 5% of 0.8 volt, and not 5% of 5 volt, which it was a mere 5 years ago. That means today's chips are far less forgiving if you have a power supply that is not completely up to the task. As a rule of thumb, a good power supply used in any Pentium 4 or AMD Athlon system which is paired with a modern AGP video board should be able to deliver at least 300 W. Be advised, this is not 300 W input, but 300 W output. Power supplies, when they operate, incur thermal loss. On a good power supply this is as little as 15%. On a bad one, this can be as high as 50%. As a second rule, the power supply must be able to deliver 21 Amps combined on the 3.3 and 5 volt power rails. This is not the same as simply adding up the separate Amps listings of the 3.3 and 5 volt rails. A good power supply will list the combined Amps as a separate rating.
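To put numbers on this: 5% of 5 volt leaves 0.25 volt of slack either way, while 5% of 0.8 volt leaves only 0.04 volt. The small Python sketch below does that arithmetic, plus the input-versus-output calculation. It assumes the loss percentage is counted as a share of the input power, which is one reading of the figures above; the function names are made up.

```python
def tolerance_band(nominal_volts, tolerance=0.05):
    """Lowest and highest voltage still within the given tolerance."""
    delta = nominal_volts * tolerance
    return (nominal_volts - delta, nominal_volts + delta)


def wall_draw(output_watts, loss_fraction):
    """Input power needed to deliver output_watts.

    Assumption: loss_fraction is the share of the *input* lost as heat.
    """
    return output_watts / (1.0 - loss_fraction)
```

Under that assumption, delivering 300 W takes about 353 W of input at 15% loss, and a full 600 W at 50% loss.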

A second cause of a video board crash is overheating. Modern video processors run hot, even hotter than your main processor. And while the main processor gets a big cooling solution, the video chip usually has nothing more than a large heat spreader and a small fan. What's worse, the mounting location of the AGP card itself in most computer cases is so bad that the tiny little cooling fan cannot suck in (enough) cool air and get rid of the heated air. And this causes the temperature to rise, especially if the chip is working hard, like in a game. When it overheats, a few things can happen. If you're lucky, the card has thermal protection and the chip simply stalls until it's cooled off a bit. Like in the filled up AGP pipeline case, you will perceive this as a momentary systemwide freeze that lasts a couple of seconds. If you're unlucky, the chip starts behaving erratically or stops altogether. Again, this causes a permanent system freeze until a hard reset or power cycle.

A third cause for a complete system freeze is the AGP driver software itself. Intel has written the specs for the AGP interface, and these specs allow for the main CPU and the video card to both access main RAM. So, it can happen that both want to access the same location at the same time. Normally this would not be a problem, as only one device can access main memory at any given time, and so either the CPU waits for the video processor or vice versa. However, this does interfere with another portion of the specs, specifically dealing with the CPU side of the communication. Intel has specified that all data transfers should happen in 64 bit chunks, or 8 bytes. Intel also specified that these chunks should always start at multiples of 8 bytes. However, there is a provision that allows access on the uneven 4 byte boundaries, in which case the actual data transfer is split into two separate ones. The first one deals with the lower 4 bytes (scaled up to 8 bytes), and the next one deals with the upper 4 bytes (also scaled to 8 bytes).
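The alignment rule can be pictured with a little address arithmetic. The Python sketch below is my own model of the rule as described in words above, not something lifted from the AGP specification; `CHUNK` and `split_transfer` are hypothetical names. It lists the aligned 8-byte chunks that a single 8-byte access touches.

```python
CHUNK = 8  # transfer size in bytes (64 bits)


def split_transfer(address, size=CHUNK):
    """Return the start addresses of the aligned chunks an access touches."""
    first = address - (address % CHUNK)
    last_byte = address + size - 1
    last = last_byte - (last_byte % CHUNK)
    return list(range(first, last + CHUNK, CHUNK))
```

An access starting at address 16 sits on a multiple of 8 and needs one transfer. An access starting at address 20 sits on a 4 byte boundary, straddles the chunks at 16 and 24, and so has to be split into two transfers.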

While a driver is allowed to do this, it is highly discouraged. The reason why is very simple. As stated before, AGP allows the video processor to initiate memory access. What happens if the video processor wants access to the same memory location that the main CPU is dealing with right now, and it does this precisely between the two split up partial data transfers? Well, the mainboard's chipset refuses the video processor access, because part of the memory transfer concerning precisely that location hasn't finished yet. Allowing the video processor to proceed would alter the memory location, and this would corrupt the pending second half of the CPU's data transfer. By the same token, the second half of the CPU's data transfer will be rejected by the video processor. So both data transfers are essentially blocked, and both the video processor and the CPU are stalled, and cannot continue with their respective programs. Again, what we have here is a complete system freeze. This particular variant was the first confirmed case of a system freeze, and because the data transfer requests bounce back and forth between the video chip and the CPU indefinitely, Microsoft called this type of problem 'infinite loop'. To date, mostly VIA is guilty of this type of failure in their AGP driver, which is part of the VIA 4in1 driver package, later dubbed Hyperion drivers. That's why owners of VIA chipsets (especially the aging KT133 and KT266 models) are hit with the freeze more often than owners of other types of chipsets.

If you get hit with any of these problems, there are a number of things you can do. Ultimately, all these measures seek to slow down the video card and/or the AGP interface. Less speed means less heat and fewer Amps drawn from the power supply.

1) Lower your AGP bus rating. Usually, the BIOS allows for manual selection between x1, x2, x4 or x8.
2) Disable sidebanding. Sidebanding is an AGP pipeline feature that implements a sort of passing lane for AGP commands, in which special AGP commands can bypass pending requests in the regular AGP pipeline buffer. Not all video chips implement this feature as robustly as they should.
3) Disable fast writes. Again, this will make your AGP pipeline slower, thus slowing down the video processor with it.
4) If your system came with power and temperature monitoring software, start it before running the game. While the game is running, check the readouts of the monitoring software to see if the temperature rises to critical levels, or if the power (especially on the 3.3 and 5 volt rails) fluctuates dangerously close to the 5% limit or even exceeds it. If the temperature rises too high, you need better cooling. If the power fluctuates too much, you need a better power supply, or the power regulators on your mainboard cannot cope and run too hot. Power regulators are those medium sized vertically mounted chips with a large heat spreader mounted on the back.
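If your monitoring software can log its readings, checking them against the 5% rule from step 4 is simple. The Python sketch below assumes you have copied a handful of voltage samples by hand; the function name and the sample values are made up for the example.

```python
def out_of_tolerance(samples, nominal_volts, tolerance=0.05):
    """Return the samples that stray more than the tolerance from nominal."""
    limit = nominal_volts * tolerance
    return [v for v in samples if abs(v - nominal_volts) > limit]
```

For the 3.3 volt rail the limit works out to 0.165 volt, so with the readings 3.30, 3.21, 3.10 and 3.48 the last two get flagged.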

If you are the unlucky owner of a VIA chipset (like me), do not install the VIA 4in1 drivers. Instead, rely on the AGP support that comes by default with Windows itself, together with support from the video driver. I am not sure about ATI or other brands, but I know for a fact that NVidia drivers will correctly activate the AGP support for a VIA chipset if the VIA 4in1 drivers are not installed.

Jan Peter
 

Castellon

★Paradox Forum Manager★
Administrator
Another one. You are on a roll, jpd. :)
*Stickied