Finally...
.... and along with the user-specified byte-alignment boundary issue, there is another little quirk in the current Pentium cache design that penalizes you when two blocks of memory in concurrent use have addresses that differ by a multiple of 4K (or thereabouts). Hence, for the fastest possible speedup, you need to be sure that your 16-byte aligned data blocks do not live at addresses that differ by a multiple of 4K.

(And finally, for us DSP types, I would like to see a Lisp option for round-to-zero, avoiding denormal operations, and also some saturating arithmetic. And while you are at it, how about a fast native modulo array indexing option too? Heh, heh, heh... Why not? We fully intend to plant Lisp systems on the embedded PowerPCs inside some of the latest FPGA chips. I'd much rather work in Lisp than in any other alternative language, wouldn't you?)
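The 16-byte alignment plus 4K-stride advice can be illustrated outside Lisp with a small C sketch. It assumes a C11 compiler that provides `aligned_alloc`. The helper names `alloc_pair` and `aliases_4k`, and the 64-byte skew, are hypothetical choices for illustration only; they are not taken from the post or from any Lisp implementation. The idea is to carve two blocks out of one aligned allocation, round the first block up to a 4K boundary, and then push the second block a little further so that the two start addresses no longer differ by an exact multiple of 4K.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT   16    /* 16-byte alignment, as the post recommends */
#define PAGE_STRIDE 4096  /* the 4K stride the post warns about */
#define SKEW        64    /* hypothetical extra offset (one 64-byte cache line) */

/* Returns nonzero if the two addresses differ by an exact multiple of 4K,
   i.e. their low 12 address bits are identical. */
static int aliases_4k(const void *a, const void *b) {
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    uintptr_t d = x > y ? x - y : y - x;
    return (d % PAGE_STRIDE) == 0;
}

/* Two float blocks carved from a single allocation; free only `base`. */
typedef struct {
    void  *base;
    float *a;
    float *b;
} block_pair;

/* Allocates two 16-byte-aligned blocks of n floats each, skewed so that
   their start addresses do not differ by a multiple of 4K. */
static block_pair alloc_pair(size_t n) {
    size_t bytes  = n * sizeof(float);
    /* Round each block up to a whole number of 4K strides. */
    size_t padded = (bytes + PAGE_STRIDE - 1) / PAGE_STRIDE * PAGE_STRIDE;
    /* C11 aligned_alloc wants a size that is a multiple of the alignment;
       padded and SKEW are both multiples of 16, so total is too. */
    size_t total  = 2 * padded + SKEW;
    block_pair p;

    p.base = aligned_alloc(ALIGNMENT, total);
    if (p.base == NULL) {
        p.a = p.b = NULL;
        return p;
    }
    p.a = (float *)p.base;
    /* Without SKEW, b - a would equal padded, an exact multiple of 4K. */
    p.b = (float *)((char *)p.base + padded + SKEW);
    return p;
}
```

Calling `alloc_pair(1024)` gives two 4 KB float blocks that are each 16-byte aligned but whose start addresses differ by 4096 + 64 bytes rather than by an exact multiple of 4K. Dropping `SKEW` to 0 reproduces the problematic layout, which makes it easy to compare timings on the target machine.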
 
 
Dr. David McClain
Chief Technical Officer
Refined Audiometrics Laboratory
4391 N. Camino Ferreo
Tucson, AZ  85750
email: dbm@refined-audiometrics.com
phone: 1.520.390.3995