Finally...

... and along with the issue of user-specified byte alignment boundaries, there is another little quirk in the current Pentium cache design: it penalizes you when the addresses of two blocks of memory in concurrent use differ by some multiple of 4K or so. Hence, for the greatest speedup, you need to be sure that your 16-byte aligned data blocks do not live at addresses that differ by a multiple of 4K.
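To make the address rule concrete, here is a minimal sketch in Python of the arithmetic involved. It is not a real allocator: the names `conflicts` and `place_blocks` are hypothetical, the 4096-byte stride is taken at face value from the "4K or so" above, and only block start addresses are compared. It lays out blocks on 16-byte boundaries and nudges any block forward by 16 bytes while its start address would differ from an earlier block's by a multiple of 4K.

```python
PAGE = 4096   # assumed conflict stride, from the "4K or so" in the text
ALIGN = 16    # alignment boundary for each data block


def conflicts(addr_a, addr_b, stride=PAGE):
    """True when two start addresses differ by a multiple of `stride`."""
    return (addr_a - addr_b) % stride == 0


def place_blocks(base, sizes, align=ALIGN, stride=PAGE):
    """Return start addresses for blocks of the given sizes.

    Each address is a multiple of `align`, and no two returned
    addresses differ by a multiple of `stride`.
    """
    placed = []
    addr = base
    for size in sizes:
        # Round up to the next alignment boundary.
        addr = (addr + align - 1) // align * align
        # Step forward one alignment unit at a time until the address
        # no longer collides with any block placed so far.
        while any(conflicts(addr, p, stride) for p in placed):
            addr += align
        placed.append(addr)
        addr += size
    return placed


if __name__ == "__main__":
    for a in place_blocks(0x10000, [4096, 4096, 4096]):
        print(hex(a))
```

With three 4K blocks packed back to back, the second and third would otherwise start exactly 4K and 8K after the first; the sketch shifts them by 16 and 32 bytes so that all three stay 16-byte aligned without sharing the same offset within a 4K page.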

(And finally -- for us DSP types, I would like to see a Lisp option for round-to-zero, avoiding denormal operations, and also some saturating arithmetic. And while you are at it, how about a fast native modulo array indexing option too? heh, heh, heh.. Why not? We fully intend to plant Lisp systems down on the embedded PowerPCs inside some of the latest FPGA chips... I'd much rather work in Lisp than in any alternative language, wouldn't you?)
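For readers less familiar with the DSP operations on that wish list, the following Python sketch shows what each one computes. It is only an illustration of the semantics, under my own assumptions: the function names are hypothetical, 16-bit signed samples are assumed for saturation, "round-to-zero" is read as truncation toward zero, and "avoiding denormals" is read as flushing subnormal doubles to zero. On real hardware these would be processor modes or single instructions rather than function calls.

```python
import math
import sys

INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed sample range


def sat_add16(a, b):
    """Saturating add: clamp to the 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def round_to_zero(x):
    """Truncate toward zero rather than rounding to nearest."""
    return math.trunc(x)


def flush_denormal(x):
    """Replace a subnormal double with 0.0; pass other values through."""
    if x != 0.0 and abs(x) < sys.float_info.min:
        return 0.0
    return x


def ring_ref(buf, i):
    """Modulo indexing: treat `buf` as a circular buffer."""
    return buf[i % len(buf)]
```

Saturation keeps an overflowing sum pinned at full scale instead of wrapping around to the opposite sign, and modulo indexing lets a delay line or FIR history be addressed without explicit wrap-around tests.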

Dr. David McClain

Chief Technical Officer

Refined Audiometrics Laboratory

4391 N. Camino Ferreo

Tucson, AZ 85750

email: dbm@refined-audiometrics.com

phone: 1.520.390.3995

web: http://www.refined-audiometrics.com