numeric performance of LW

There's a recent thread in c.l.l. that was started by a nameless troll,
but it was followed up by some lispers who took up the challenge and
translated the numeric benchmark into code that was close to C in speed
with both cmucl and sbcl, and Duane Rettig followed up with a beta
for the upcoming ACL 6.2 which made it almost as fast. I tried this
version: http://www.cs.indiana.edu/~bmastenb/misc/almabench.lisp
of the benchmark with LispWorks 4.3.6 (linux), and it wasn't exactly
impressive - it's very evident that the declarations won't stop LW
from consing like hell (see end of message. Actually I think it's
pretty impressive that it is that fast, given that it has allocated
40 GIGABYTES!).

I've tried to do some numeric optimizations in LW before, with little
luck, so I wonder if there is something I'm missing here, or if LW
actually is quite bad at this kind of numeric stuff?

Note: I'm doing this mostly out of curiosity; for my actual work LW
has so far been more than fast enough. But who knows, I might be doing
something involving floats next, and I'd like not having to switch to
CMUCL or SBCL for that!

I'd LOVE to have unboxed 32-bit integers, though. I do a lot of crypto
stuff in my app, and that is really slow with bignums.  (Also on my
make-crypto-faster-wishlist is a built-in implementation of modular
exponentiation. That would be really cool; it is the only thing
slowing down my homebrewed RSA code.)
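
For readers unfamiliar with the operation on that wishlist, here is a
minimal portable sketch of modular exponentiation by repeated squaring.
The name EXPT-MOD is hypothetical (it is not a LispWorks built-in, and
it is not the code from the RSA implementation mentioned above). The
point is only that every intermediate value stays below the square of
the modulus, instead of building the full power first as
(mod (expt base exponent) modulus) would.

```lisp
;; Hypothetical helper, not a LispWorks built-in: right-to-left binary
;; (square-and-multiply) modular exponentiation.
(defun expt-mod (base exponent modulus)
  "Return BASE raised to EXPONENT, modulo MODULUS, for non-negative EXPONENT."
  (declare (type integer base)
           (type (integer 0) exponent)
           (type (integer 1) modulus))
  (let ((result 1)
        (b (mod base modulus)))
    (loop while (plusp exponent)
          do (when (oddp exponent)
               ;; Low bit set: fold the current power into the result.
               (setf result (mod (* result b) modulus)))
             ;; Square the base and move on to the next exponent bit.
             (setf b (mod (* b b) modulus)
                   exponent (ash exponent -1)))
    ;; The final MOD covers the EXPONENT = 0, MODULUS = 1 case.
    (mod result modulus)))
```

With RSA-sized moduli the multiplications are still bignum operations,
so this only bounds their size; it does not make them unboxed.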

Here are the timings:

LW 4.3.6 on Athlon XP 2200+-based workstation with linux 2.4/debian
unstable:
==============================================

CL-USER 19 > (time (almabench:main))
Timing the evaluation of (ALMABENCH:MAIN)

user time    =    455.280
system time  =      1.420
Elapsed time =   0:10:21
Allocation   = 40555763480 bytes standard / 66231 bytes conses
2 Page faults
Calls to %EVAL    33

SBCL on the same workstation:
=============================

* (time (almabench:main))

Evaluation took:
                 32.962 seconds of real time
                 4.11 seconds of user run time
                 0.08 seconds of system run time
                 0 page faults and
                 0 bytes consed.

-- 
  (espen)

Re: numeric performance of LW

On 10 Mar 2004, at 13:07, Espen Vestre wrote:

> LW 4.3.6 on Athlon XP 2200+-based workstation with linux 2.4/debian
> unstable:
> ==============================================
>
> CL-USER 19 > (time (almabench:main))
> Timing the evaluation of (ALMABENCH:MAIN)
>
> user time    =    455.280
> system time  =      1.420
> Elapsed time =   0:10:21
> Allocation   = 40555763480 bytes standard / 66231 bytes conses
> 2 Page faults
> Calls to %EVAL    33

On a totally different machine, PowerBook G4 1.25 GHz, 1 GB RAM, Mac OS
10.3.2, LWM 4.3.6:

CL-USER 129 >  (time (almabench:main))
Timing the evaluation of (ALMABENCH:MAIN)

user time    =    450.460
system time  =      1.940
Elapsed time =   0:07:44
Allocation   = 40743601840 bytes
0 Page faults
Calls to %EVAL    31

As much as I would like a PowerPC G4 to beat an Athlon, this is not
possible for this floating point code, unless that code is heavily
vectorized (which I don't think is the case here). Furthermore,
although my machine's load was 100%, it was still very responsive,
which suggests that it wasn't completely occupied. I suspect LW was
doing memory management most of the time, and not floating point
calculations.

Sven

Re: numeric performance of LW

Sven Van Caekenberghe <sven@beta9.be> writes:

> occupied. I suspect LW was doing memory management most of the time,
> and not floating point calculations.

Sure. That's the problem. It isn't able to compile the code that
works with those floats and float-vectors into non-allocating code
(LW allocates 40GB while SBCL allocates nothing).
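
To make "code that works with floats and float-vectors" concrete, here
is a small sketch of the kind of declared loop in question. DOT-PRODUCT
is a made-up example, not a function from almabench.lisp. The
declarations give the compiler everything it needs to keep the
arithmetic in unboxed double-floats, but whether it actually does so is
up to the implementation.

```lisp
;; Illustrative only; not taken from the benchmark source.
(defun dot-product (xs ys)
  "Sum of the element-wise products of two double-float vectors."
  (declare (type (simple-array double-float (*)) xs ys)
           (optimize (speed 3) (safety 1)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    ;; Iterate over the shorter of the two vectors.
    (dotimes (i (min (length xs) (length ys)) sum)
      (setf sum (+ sum (* (aref xs i) (aref ys i)))))))
```

Wrapping a call to a compiled function like this in (time ...) and
reading the allocation figure is a quick way to see whether a given
compiler boxes the intermediate results.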
-- 
  (espen)

Re: numeric performance of LW

Espen Vestre wrote:

> I tried this version: http://www.cs.indiana.edu/~bmastenb/misc/almabench.lisp
> of the benchmark with LispWorks 4.3.6 (linux), and it wasn't exactly
> impressive - it's very evident that the declarations won't stop LW
> from consing like hell [...]
> I've tried to do some numeric optimizations in LW before, with little
> luck, so I wonder if there is something I'm missing here, or if LW
> actually is quite bad at this kind of numeric stuff?

I'm also interested in learning more about this.

Adding #+LISPWORKS(FLOAT 0) to the (DECLAIM (OPTIMIZE ...)) form
seems to speed up the code by about 5% and also to reduce the consing
by 5%, but that's the only thing I can think of to get this code
faster.
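
For reference, the change could look roughly like this. The SPEED,
SAFETY and DEBUG settings are assumptions for illustration (the actual
form in almabench.lisp may differ); only the #+LISPWORKS (FLOAT 0) part
is what I added. SCALE is a throwaway function to show code compiled
under the declamation.

```lisp
;; The reader conditional makes other implementations skip (FLOAT 0),
;; which is a LispWorks-specific optimize quality.
;; The SPEED/SAFETY/DEBUG values here are assumed, not from the benchmark.
(declaim (optimize (speed 3) (safety 0) (debug 0)
                   #+lispworks (float 0)))

;; Throwaway example of a function compiled under those settings.
(defun scale (x)
  (declare (type double-float x))
  (* 2d0 x))
```

As far as I understand it, (FLOAT 0) relaxes the checking done on float
operations, which would explain why the gain is modest rather than
dramatic.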

A quick browse of the disassembly of PLANETPV shows a few places where
all the computations seem to be done on unboxed floats, but also quite
a few calls to SYSTEM::RAW-FAST-BOX-DOUBLE. It's not clear to me how
the compiler decides when floats must be boxed.
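
The same kind of inspection can be tried on something much smaller
than PLANETPV. ADD-SCALED below is a made-up function, and the
optimize settings are just an assumption; the idea is to compile it
and look through the output for calls to boxing routines such as the
one named above.

```lisp
;; Hypothetical two-argument example; not part of the benchmark.
(defun add-scaled (x y)
  (declare (type double-float x y)
           (optimize (speed 3) (safety 0)))
  (+ x (* 2d0 y)))

;; Make sure we are looking at compiled code, then dump it.
(compile 'add-scaled)
(disassemble 'add-scaled)
```

Note that the return value of a function like this generally has to be
boxed anyway, so one boxing call at the end is expected; the
interesting ones are those in the middle of a computation.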

Regards,

Arthur Lemmens

Re: numeric performance of LW

davef@xanalys.com writes:

> We've now done some work to improve LispWorks performance, adding more
> unboxed float optimizations and the type inferencing to trigger
> them. 

Great work!
Do you think this will make it into the next patch package?
-- 
  (espen)
