I will try to give a reasonably comprehensible answer to the question of why the AI flees so much and how one might fix this behaviour:
First one needs to understand how the EU4 combat AI works:
Every stack (i.e. a number of units merged together) assigns a number to each province; the lower the number, the more the stack wants to go to that province. This number depends on a variety of factors such as distance and enemy troops, and can be seen by using the command mapmode aieval in the console. I do not want to go into detail on all the available factors, but rather focus on the one most important for this behaviour, which is threat, or acceptable balance:
The idea is the following: again the AI assigns a number to every province, which now depends on the estimated strength of enemy and allied regiments nearby. If the number is too high (i.e. the enemy is estimated to be stronger than the allied troops), the stack does not want to move to the province under almost any circumstances. This overrides all other modifiers and can completely block the AI from moving troops in a certain area if it is outmatched.
In detail this works as follows: every stack assigns two numbers to every province. The first is the threat, which will be discussed in detail later. The second describes how good or bad the province is for a fight, depending on terrain bonuses etc. (the higher, the better). The AI then divides the threat by the second number and compares the quotient against a define, ACCEPTABLE_BALANCE_DEFAULT, which is normally set to 1.7 and is one of the few modifiable aspects of EU4 combat. If the quotient is larger than ACCEPTABLE_BALANCE_DEFAULT^(-1), the AI deems the province too dangerous to enter.
In this example Ramazan fights against the Ottomans. While it would like to siege down Anatolia, it is too scared to do so (the provinces in red). For example, in one province the AI estimates the Ottomans to be 23.10 times stronger than Ramazan; it therefore wants to avoid combat at any cost and would rather go to the other side of the strait to siege down the Ottoman hinterlands if it can.
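To make the rule above concrete, here is a minimal sketch of the check as I read it. The function name and the neutral terrain score of 1.0 are my own assumptions for illustration; only the value 1.7 and the comparison against its reciprocal come from the description above.

```python
ACCEPTABLE_BALANCE_DEFAULT = 1.7  # value quoted above


def too_dangerous(threat: float, terrain_score: float,
                  acceptable_balance: float = ACCEPTABLE_BALANCE_DEFAULT) -> bool:
    """Return True if the stack should refuse to enter the province.

    threat        -- estimated enemy strength divided by allied strength
    terrain_score -- how favourable the province is (higher is better);
                     the scale is an assumption, 1.0 is taken as neutral here
    """
    quotient = threat / terrain_score
    return quotient > 1.0 / acceptable_balance


# With neutral terrain the cutoff is 1/1.7, i.e. roughly 0.59.
print(too_dangerous(threat=0.5, terrain_score=1.0))    # False
print(too_dangerous(threat=23.10, terrain_score=1.0))  # True
```

Read this way, a stack only enters a province if it estimates itself to be about 1.7 times stronger than the enemy there, after adjusting for terrain.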
Now we want to look at the way the threat is calculated and which problems arise from this:
As with the AI evaluation, there is also a mapmode that shows the assigned threat: mapmode aithreateval. From it one can deduce the formula used to calculate threat (with the necessary mathematical abilities, at least), and I want to give a quick summary of its most important aspects:
1) The formula calculates two numbers, enemy strength and allied strength and divides them,
2) The AI simulates a scenario where all troops except the selected stack are in the same position they are in right now and the selected stack is in the province for which the threat is calculated,
3) It takes into account all enemy/allied troops which are 4 or less provinces away,
4) It depends on troop quality, distance to the province and number of troops, and nothing else (most importantly not on combat width, flanking range or distribution of unit types),
5) The troop quality estimation is decent (it has some exploitable weaknesses, as bankrupt horde players might know); it could be improved slightly, but I won't go into detail here,
6) The distance function is a factor 10^(1-D/4) where D is the distance in number of provinces,
7) The formula is quadratic in the number of troops.
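The points above can be turned into a rough sketch. Only the distance factor, the 4-province cutoff and the quadratic dependence on troop numbers come from the list; how exactly quality, distance and troop count are combined is NOT stated here, so side_strength below is just one guess (sum the weighted regiments, then square), and all names are made up for illustration.

```python
MAX_RANGE = 4  # point 3: only troops at most 4 provinces away count


def distance_weight(distance: int) -> float:
    """Distance factor 10^(1 - D/4) from point 6; zero beyond MAX_RANGE."""
    if distance > MAX_RANGE:
        return 0.0
    return 10 ** (1 - distance / 4)


def side_strength(stacks) -> float:
    """ASSUMPTION: one possible way to combine the ingredients.

    stacks is a list of (regiments, quality, distance) tuples. Each stack
    contributes regiments * quality * distance_weight, and the total is
    squared to get the quadratic dependence from point 7.
    """
    total = sum(regiments * quality * distance_weight(distance)
                for regiments, quality, distance in stacks)
    return total ** 2


def threat(enemy_stacks, allied_stacks) -> float:
    """Point 1: enemy strength divided by allied strength.

    allied_stacks always contains at least the selected stack itself,
    placed in the evaluated province (point 2).
    """
    return side_strength(enemy_stacks) / side_strength(allied_stacks)


# The weight falls from 10 at distance 0 to 1 at distance 4.
for d in range(6):
    print(d, round(distance_weight(d), 2))

# Doubling the enemy regiments quadruples the threat.
print(threat([(10, 1.0, 0)], [(10, 1.0, 0)]))
print(threat([(20, 1.0, 0)], [(10, 1.0, 0)]))
```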
Some things to take away from these points (numbered as above):
2) The AI has a tendency to build small stacks. These are heavily disfavoured by the formula, as a large stack is far more influential when moved to another position. Larger stacks would therefore benefit the AI greatly.
4) EU4 land combat depends heavily on the omitted variables, especially combat width (CW), which I will demonstrate later.
7) This is the most problematic part, as EU4 land combat is quadratic in troop number only in rare circumstances, as the following examples show:
Example 1 (CW infinite, Flanking range 1)
3 Infantry vs 1 Infantry (Strength 1)
Here quadratic is indeed the right estimate, as the three regiments deal 3 times the damage and can suffer 3 times the casualties, which yields 3*3=3^2=9.
With a bit of mathematical knowledge this generalises: a version of the EU4 combat system with infinite CW, infinite flanking range and no backrow is indeed quadratic in the number of troops, which implies that the currently used strength estimate is correct for such a system.
Example 2 (CW infinite, Flanking range 1)
20 Infantry vs 1 Infantry (Strength 1)
Here the 20 infantry can sustain 20 times the damage, but only 3 can attack. This yields a strength comparison of 3*20=60, as opposed to the 20^2=400 the current formula yields, which is a vast overestimation of the larger army (a recurring theme with the current formula, as it is in fact an upper bound on the correct strength estimate).
Example 3 (CW 20, Flanking range arbitrary)
40 Infantry vs 20 Infantry (Strength 1)
If you know EU4 combat well enough, you know that this battle will be an exact draw (due to the backline morale damage). Note that here the strength is sub-linear in the number of troops, which is far from quadratic. (Strength = 1 vs 2^2 = 4 with the current formula)
Example 4 (CW 20, Flanking range arbitrary) Perfect Reinforcement
40 Infantry vs 20 Infantry (Strength 1)
Here the correct formula should be linear, as the 40 are exactly twice as strong as the 20, which is also far from quadratic. (Strength = 2 vs 2^2 = 4 with the current formula)
With a bit of mathematical knowledge this generalises: a version of the EU4 combat system with infinite CW, no flanking range and no backrow is indeed linear in the number of troops, which is vastly different from the AI strength estimation.
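The arithmetic of Examples 1, 2 and 4 can be reproduced with a small toy model: strength = (regiments able to attack) * (regiments able to absorb damage). This is my own simplification for illustration and not the game's actual combat code; it ignores the backline morale damage, so it does not cover Example 3. The function names are made up.

```python
import math


def engaged(own: int, enemy: int, cw: float, flank: int) -> float:
    """Regiments that can actually attack: limited by own numbers, by
    combat width, and by the enemy front plus flanking range on each side."""
    enemy_front = min(enemy, cw)
    return min(own, cw, enemy_front + 2 * flank)


def toy_ratio(a: int, b: int, cw: float, flank: int) -> float:
    """Strength of side a relative to side b in the toy model."""
    strength_a = engaged(a, b, cw, flank) * a
    strength_b = engaged(b, a, cw, flank) * b
    return strength_a / strength_b


def quadratic_ratio(a: int, b: int) -> float:
    """The estimate that is quadratic in troop numbers."""
    return (a / b) ** 2


# Example 1: 3 vs 1, infinite CW, flanking range 1
print(toy_ratio(3, 1, math.inf, 1), quadratic_ratio(3, 1))    # 9.0 9.0
# Example 2: 20 vs 1, infinite CW, flanking range 1
print(toy_ratio(20, 1, math.inf, 1), quadratic_ratio(20, 1))  # 60.0 400.0
# Example 4: 40 vs 20, CW 20, perfect reinforcement (flanking range irrelevant)
print(toy_ratio(40, 20, 20, 2), quadratic_ratio(40, 20))      # 2.0 4.0
```

In every case the quadratic estimate is at least as large as the toy model's ratio, and the two only agree when every regiment can attack.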
These examples show that EU4 land combat is highly sub-quadratic (sometimes even sub-linear). A quadratic strength estimate (as currently implemented) therefore vastly overestimates the value of troop quantity, which is at least to some degree responsible for the AI behaviour when outnumbered. So what could be changed in terms of a different strength evaluation formula? Here are two possibilities:
1) Keep the power law formula but make the (currently quadratic) exponent adjustable by defines. This would allow extensive testing of different versions, especially by AI modders (and there are a lot of very enthusiastic people out there who could do a lot of testing), or
2) Write a more accurate formula depending on CW (and possibly other factors). This is not an easy task, though. On the theoretical side it needs a post of its own and probably some valuable input from players very familiar with the combat system (for example the multiplayer community). On the implementation side, the more parameters are involved, the more clunky and probably slow everything gets.
Keep in mind that there are some exceptions to this AI behaviour (for example if there are already ongoing battles that need to be reinforced) and that I omitted a decent amount of detail in this post to keep it somewhat compact and readable. I hope I was moderately successful in the latter and that this helps in understanding the problem a bit better.
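As a rough illustration of possibility 1), here is what a power law with an adjustable exponent could look like. The parameter troop_exponent stands in for a hypothetical new define; no such define exists as far as this post is concerned, and the helper implied_exponent is likewise only for illustration.

```python
import math


def strength_ratio(own: int, enemy: int, troop_exponent: float = 2.0) -> float:
    """Power-law estimate; troop_exponent = 2.0 is the quadratic behaviour
    described above, lower values weigh troop quantity less."""
    return (own / enemy) ** troop_exponent


def implied_exponent(own: int, enemy: int, observed_ratio: float) -> float:
    """Exponent that would reproduce an observed strength ratio."""
    return math.log(observed_ratio) / math.log(own / enemy)


# 40 vs 20 with the quadratic exponent and with a linear one
print(strength_ratio(40, 20, 2.0))  # 4.0
print(strength_ratio(40, 20, 1.0))  # 2.0

# Exponents implied by the examples above
print(implied_exponent(20, 1, 60))  # Example 2: between 1 and 2
print(implied_exponent(40, 20, 2))  # Example 4: linear
print(implied_exponent(40, 20, 1))  # Example 3: a draw, i.e. exponent 0
```

The examples imply quite different exponents, so a single value can only ever be a compromise, which is exactly why being able to test different values would be useful.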