I guess a 64-bit multithreaded rewrite of the engine would do the job with plenty of headroom.
It was quite a while ago, but I remember there was some info about the optimizations made for AoD, and they were pretty decent. So I suppose there isn't much room for improvement, although there might still be some areas where parts of the code (some procedures or functions) are ridiculously unoptimized and nobody has even considered them a problem...