Re: Decoding an int32 from a part of a sequence

Greg Menke <gregm32768@comcast.net> writes:

> HI all,
>
> I ran a benchmark comparing the DPB approach vs an arithmetic one I've
> used, benchmark & code follows.  The results suggest the arithmetic
> approach may be faster w/ smaller memory footprint in some
> circumstances.  The performance difference follows reversing the order
> of the tests and with dataset lengths up to 2x10^6.  (only one test
> shown below).  Would be interesting if others find a similar result,
> or not.
>
> Lispworks 4.3.7
> Linux megatux 2.6.34.7-66.fc13.i686 #1 SMP
> Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz
> MemTotal:        3093120 kB
>
> ===========================================
>
> CL-USER 14 > (r)
>
> 500000 iterations dpb uint32
> Timing the evaluation of (loop for i from 0 below *dataset-length*
> with tval = 0 do (incf tval (pb-uint32-ref (aref dataset i) 0)))
>
> user time    =      0.865
> system time  =      0.001
> Elapsed time =   0:00:01
> Allocation   = 84000568 bytes standard / 1331 bytes conses
> 0 Page faults
>
> 500000 iterations gm uint32
> Timing the evaluation of (loop for i from 0 below *dataset-length*
> with tval = 0 do (incf tval (gm-uint32-ref (aref dataset i))))
>
> user time    =      0.498
> system time  =      0.000
> Elapsed time =   0:00:01
> Allocation   = 36000800 bytes standard / 1397 bytes conses
> 0 Page faults
>
> nil
>
>
> ===========================================
>
> ;; (proclaim '(optimize (speed 3) (safety 1) (debug 1)))
>
>
> (defun pb-uint32-ref (vector offset)
>   (the (unsigned-byte 32)
>        (dpb (aref vector offset) (byte 8  24)
>             (dpb (aref vector (1+ offset)) (byte 8  16)
>                  (dpb (aref vector (+ 2 offset)) (byte 8  8)
>                       (aref vector (+ 3 offset)))))))
>
>
> (defun gm-uint32-ref (dblock)
>   (+ (* #x01000000 (aref dblock 0))
>      (* #x00010000 (aref dblock 1))
>      (* #x00000100 (aref dblock 2))
>      (aref dblock 3)) )

It shouldn't make a difference with a not-dumb compiler, but:

 (defun gm-uint32-ref (dblock)
   (+ (ash (aref dblock 0) 24)
      (ash (aref dblock 1) 16)
      (ash (aref dblock 2)  8)
           (aref dblock 3)))


Then, depending on the strength of the compiler on types,  the following
might be even faster (notice that processor can OR faster than ADD
usually, because they don't have to check for carries)

(defun gm-uint32-ref (dblock) 
 (logand #xffffffff
   (logior (ash (aref dblock 0) 24)
           (ash (aref dblock 1) 16)
           (ash (aref dblock 2)  8)
                (aref dblock 3))))


The logand is actually a type declaration, a good compiler should do the
same with it as with (the (unsigned-byte 32) ...).
But again, I'd expect a good compiler to generate the same code from dpb
as from logior+ash.


So there, you have a few more options to benchmark.


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.