Lisp HUG Maillist Archive

Optimization Question?

Here is a fast implementation of a routine to find the power of 2 equal to or larger than the input value (an integer):

(defun ceiling-pwr2 (n)
  (declare (fixnum n))
  (labels ((iter (n nsh)
             (declare (fixnum n nsh))
             (logior n (ash n nsh))))
    (declare (inline iter))
    (1+ (iter 
         (iter
          (iter
           (iter
            (iter (1- n) -1)
            -2)
           -4)
          -8)
         -16))))


On disassembly for x86 architecture it produces the following:

21D37F62:
       0:      3B25EC330020     cmp   esp, [200033EC]  ; SYSTEM:*%STACK-LIMIT
       6:      7658             jbe   L1
       8:      80FD01           cmpb  ch, 1
      11:      7553             jne   L1
      13:      55               push  ebp
      14:      89E5             move  ebp, esp
      16:      83E804           sub   eax, 4
      19:      89C7             move  edi, eax
      21:      C1FF01           sar   edi, 1
      24:      81E7FCFFFFFF     and   edi, -4
      30:      0BC7             or    eax, edi
      32:      89C7             move  edi, eax
      34:      C1FF02           sar   edi, 2
      37:      81E7FCFFFFFF     and   edi, -4
      43:      0BC7             or    eax, edi
      45:      89C7             move  edi, eax
      47:      C1FF04           sar   edi, 4
      50:      81E7FCFFFFFF     and   edi, -4
      56:      0BC7             or    eax, edi
      58:      89C7             move  edi, eax
      60:      C1FF08           sar   edi, 8
      63:      81E7FCFFFFFF     and   edi, -4
      69:      0BC7             or    eax, edi
      71:      89C7             move  edi, eax
      73:      C1FF10           sar   edi, 10
      76:      81E7FCFFFFFF     and   edi, -4
      82:      0BC7             or    eax, edi
      84:      89C7             move  edi, eax
      86:      83C704           add   edi, 4
      89:      700A             jo    L2
      91:      FD               std   
      92:      89F8             move  eax, edi
      94:      C9               leave 
      95:      C3               ret   
L1:   96:      E8FBDD3DFE       call  20115DC2         ; #<Function RUNTIME:BAD-ARGS-OR-STACK 20115DC2>
L2:  101:      FF7500           push  [ebp]
     104:      83ED04           sub   ebp, 4
     107:      8B7508           move  esi, [ebp+8]
     110:      897504           move  [ebp+4], esi
     113:      C7450804000000   move  [ebp+8], 4
     120:      B502             moveb ch, 2
     122:      C9               leave 
     123:      FF2524880120     jmp   [20018824]       ; SYSTEM::+$INTEGER
     129:      90               nop   

As you can see, it produces nearly ideal code, but one item that is sub-optimal is the incessant (AND EDI,-4) after every SAR (shift) needed to convert the intermediate results back to FIXNUM. But given the nature of this algorithm, that type conversion isn't really needed until the very end.

So the question I have, for any Lisp compiler experts out there, is "how can I coax the compiler to elide these AND operations?"

Dr. David McClain
Chief Technical Officer
Refined Audiometrics Laboratory
4391 N. Camino Ferreo
Tucson, AZ  85750

email: dbm@refined-audiometrics.com
phone: 1.520.390.3995
web: http://refined-audiometrics.com



Re: Optimization Question?

On Tue, 8 Jul 2008 16:16:02 -0700, David McClain <dbm@refined-audiometrics.com> wrote:

> As you can see, it produces nearly ideal code, but one item that is
> sub-optimal is the incessant (AND EDI,-4) after every SAR (shift)
> needed to convert the intermediate results back to FIXNUM. But given
> the nature of this algorithm, that type conversion isn't really
> needed until the very end.

Yes, it would be nice if the compiler would be smart enough to figure
this out without the user having to hack sys:int32-foo code.  I
checked and it seems even with (optimize (hcl:fixnum-safety 0)) it
does that.


Updated at: 2020-12-10 08:42 UTC