Exactly how it does work is not at all clear from the historical statistics.
One thing important to note is that zero ORG units do not count for stacking. They have 'left the fight', effectively, so they have no bearing on the reserve/front line/formation depth equation. Sure - they would do for a time after reaching zero ORG, but that would be an overcomplication for little gain, I think.
I thoroughly endorse your first point, Bal, but have to take issue with the second.
The issue of the effects of "defeated" units is not treated at all well in the original research and therefore, unsurprisingly, is not treated at all well in the model used here either. Zero org units might very well leave the fight in that they cease to contribute "combat effectiveness" as the model describes it, but they do not immediately and efficiently leave the zone of combat. Instead, their lack of organisation tends to result in them clogging avenues of tactical advance or retreat, disrupting communication, and exerting sometimes very powerful morale effects on the units remaining in combat or coming up to it. In this way, I would suggest, they reduce the effectiveness of those units - perhaps particularly in the reserve/front line/formation depth aspect of the equation.
The fact is that there is precious little that is unequivocally clear in the analysis of historical combat statistics; and if we stick ourselves with a model that confines itself only to those issues that are clear, then I fear we'll end up with a very poor and dislocated gaming experience. Where the data is silent, you use your best guess to fill the gaps until a more comprehensive interpretation is possible. Rather than not counting for stacking purposes, therefore, I'd argue that zero org units should attract a higher stacking weight to reflect the dysfunctional effects they produce.
Extending that argument to HistoryMan's request for stacking weights reflecting the differences between small and large divisions, there is certainly enough descriptive evidence to suggest that they did not perform equally in combat, and in his analysis of combat effectiveness Dupuy specifically attempts to control for this. Given that, I think it better to add stacking weights knowing that there's a strong likelihood that they won't entirely capture historical experience, than to continue to omit them in the knowledge that this will certainly misrepresent the historical case.
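To make the suggestion a bit more concrete, here is a rough Python sketch of how the two ideas might combine: a stacking weight that scales with division size, and a further penalty for units at zero org. Everything in it is my own assumption for illustration: the names `Unit`, `stacking_weight` and `total_stacking`, the reference strength of 10,000 men, and the 1.5 multiplier for defeated units are all hypothetical, and none of them come from the mod files or from the historical research.

```python
from dataclasses import dataclass

# Illustrative assumptions only: neither value comes from the mod or the research.
REFERENCE_STRENGTH = 10_000  # hypothetical "standard" division size, in men
ZERO_ORG_MULTIPLIER = 1.5    # hypothetical penalty for a defeated unit clogging the zone


@dataclass
class Unit:
    name: str
    org: float     # current organisation; 0 means the unit is defeated
    strength: int  # manpower


def stacking_weight(unit: Unit) -> float:
    """Weight a unit by its size, and weight it more heavily once it hits zero org."""
    size_factor = unit.strength / REFERENCE_STRENGTH
    if unit.org <= 0:
        # A defeated unit still occupies the combat zone and disrupts the others.
        return size_factor * ZERO_ORG_MULTIPLIER
    return size_factor


def total_stacking(units: list[Unit]) -> float:
    """Sum the weights of every unit in the province, defeated or not."""
    return sum(stacking_weight(u) for u in units)


units = [
    Unit("Full division, in good order", org=50.0, strength=10_000),
    Unit("Full division, defeated", org=0.0, strength=10_000),
    Unit("Half-size division, in good order", org=30.0, strength=5_000),
]
print(total_stacking(units))
```

The point of the sketch is only that the defeated division counts for more than its healthy twin rather than for nothing, and the small division counts for less than the large one. The actual numbers would need tuning against play experience.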