fast 64 bit arithmetic

Hi guys,

Any way to do fast 64 bit arithmetic like the INT32 functions do for 32 bit?

(disassemble
 (compile 'test
          #'(lambda (a b)
              (declare (optimize (safety 0) (debug 0) (speed 3) (space
0) (compilation-speed 0)(float 0) (hcl:fixnum-safety 0))
                       (type fixnum a b))
              (+ (ash a 56) (ash b 56)))))
4060007C74:
       0:      4157             push  r15
       2:      55               push  rbp
       3:      4889E5           moveq rbp, rsp
       6:      4989DF           moveq r15, rbx
       9:      48C1E738         shlq  rdi, 38
      13:      48C1E638         shlq  rsi, 38
      17:      4803FE           addq  rdi, rsi
      20:      B901000000       move  ecx, 1
      25:      4889EC           moveq rsp, rbp
      28:      5D               pop   rbp
      29:      415F             pop   r15
      31:      C3               ret  

That's cool but if course it does not work if your result is bigger than
64 bits:
(format t "~x" (test 8 8))=> -1000000000000000

So let's be less aggressive for the optimizations.
(disassemble
 (compile 'test
          #'(lambda (a b)
              (declare (optimize (safety 0) (debug 0) (speed 3) (space
0) (compilation-speed 0)
                                 (float 0) #+nil(hcl:fixnum-safety 0))
                       (type fixnum a b))
              (+ (ash a 56) (ash b 56)))))
406000551C:
       0:      4155             push  r13
       2:      4153             push  r11
       4:      4157             push  r15
       6:      55               push  rbp
       7:      4889E5           moveq rbp, rsp
      10:      4989DF           moveq r15, rbx
      13:      4989F3           moveq r11, rsi
      16:      4D8B4FD4         moveq r9, [r15-2C]     ;
system::ash-left$fixnum$fixnum
      20:      498B590F         moveq rbx, [r9+F]
      24:      B902000000       move  ecx, 2
      29:      BEC0010000       move  esi, 1C0
      34:      FFD3             call  rbx
      36:      4989FD           moveq r13, rdi
      39:      4D8B4FD4         moveq r9, [r15-2C]     ;
system::ash-left$fixnum$fixnum
      43:      498B590F         moveq rbx, [r9+F]
      47:      B902000000       move  ecx, 2
      52:      4C89DF           moveq rdi, r11
      55:      BEC0010000       move  esi, 1C0
      60:      FFD3             call  rbx
      62:      4889FE           moveq rsi, rdi
      65:      4D89E9           moveq r9, r13
      68:      4C0BCE           orq   r9, rsi
      71:      41F6C107         testb r9b, 7
      75:      7518             jne   L1
      77:      4C89EF           moveq rdi, r13
      80:      4803FE           addq  rdi, rsi
      83:      7010             jo    L1
      85:      B901000000       move  ecx, 1
      90:      4889EC           moveq rsp, rbp
      93:      5D               pop   rbp
      94:      415F             pop   r15
      96:      415B             pop   r11
      98:      415D             pop   r13
     100:      C3               ret  
L1:  101:      4C89EF           moveq rdi, r13
     104:      4889EC           moveq rsp, rbp
     107:      5D               pop   rbp
     108:      415F             pop   r15
     110:      415B             pop   r11
     112:      415D             pop   r13
     114:      498B9E070E0000   moveq rbx, [r14+E07]   ;
system::*%+$any-code
     121:      FFE3             jmp   rbx
     123:      90               nop  

(format t "~x" (test 8 8))=> 1000000000000000

OK of course that works but it'svery far from being even reasonably fast.

So what we need is something like INT32 which will do the computation in
64 bitarithmeticand then convert the resultslike this:

(disassemble
 (compile 'test
          #'(lambda (a b)
              (declare (optimize (safety 0) (debug 0) (speed 3) (space
0) (compilation-speed 0)
                                 (float 0)(hcl:fixnum-safety 0))
                       (type fixnum a b))
              (sys:int32+ (sys:int32<< a 24) (sys:int32<< b 24)))))
406000B2A4:
       0:      4157             push  r15
       2:      55               push  rbp
       3:      4889E5           moveq rbp, rsp
       6:      4989DF           moveq r15, rbx
       9:      48C1FF03         sarq  rdi, 3
      13:      48C1E718         shlq  rdi, 18
      17:      48C1FE03         sarq  rsi, 3
      21:      48C1E618         shlq  rsi, 18
      25:      4803FE           addq  rdi, rsi
      28:      4D8B4FD4         moveq r9, [r15-2C]     ;
system::%%raw-integer-to-int32
      32:      498B590F         moveq rbx, [r9+F]
      36:      B901000000       move  ecx, 1
      41:      FFD3             call  rbx
      43:      4889EC           moveq rsp, rbp
      46:      5D               pop   rbp
      47:      415F             pop   r15
      49:      C3               ret  

Here it's less optimized than the first one but much better than the
second one.

Any ideas or tricks we could use? I even tried to use (unsigned-byte 64)
types but with no luck so far.

On a similar topic how could I read an int32 or an int64 from a simple
array of (unsigned byte 8)

(disassemble
 (compile 'read-uint32
          #'(lambda (frame idx)
              (declare (optimize (safety 0) (debug 0) (speed 3) (space
0) (compilation-speed 0)
                                 (float 0)(hcl:fixnum-safety 0))
                       (type (simple-array (unsigned-byte 8) (1024)) frame)
                       (type fixnum idx))
              (+ (aref frame (incf idx) (aref frame (incf idx)) 8) (ash
(aref frame (incf idx)) 16) (ash (aref frame (incf idx)) 24)))))
406002EAB4:
       0:      4157             push  r15
       2:      55               push  rbp
       3:      4889E5           moveq rbp, rsp
       6:      4989DF           moveq r15, rbx
       9:      4883C608         addq  rsi, 8
      13:      4989F1           moveq r9, rsi
      16:      49C1F903         sarq  r9, 3
      20:      4E0FB64C0F05     movzbq r9, [rdi+5+r9]
      26:      49C1E103         shlq  r9, 3
      30:      4883C608         addq  rsi, 8
      34:      4989F0           moveq r8, rsi
      37:      49C1F803         sarq  r8, 3
      41:      4E0FB6440705     movzbq r8, [rdi+5+r8]
      47:      49C1E003         shlq  r8, 3
      51:      49C1E008         shlq  r8, 8
      55:      4D03C8           addq  r9, r8
      58:      4883C608         addq  rsi, 8
      62:      4989F0           moveq r8, rsi
      65:      49C1F803         sarq  r8, 3
      69:      4E0FB6440705     movzbq r8, [rdi+5+r8]
      75:      49C1E003         shlq  r8, 3
      79:      49C1E010         shlq  r8, 10
      83:      4D03C8           addq  r9, r8
      86:      4989F0           moveq r8, rsi
      89:      4983C008         addq  r8, 8
      93:      49C1F803         sarq  r8, 3
      97:      4E0FB6440705     movzbq r8, [rdi+5+r8]
     103:      49C1E003         shlq  r8, 3
     107:      49C1E018         shlq  r8, 18
     111:      4B8D3C01         leaq  rdi, [r9+r8]
     115:      B901000000       move  ecx, 1
     120:      4889EC           moveq rsp, rbp
     123:      5D               pop   rbp
     124:      415F             pop   r15
     126:      C3               ret  

Well it does what it is supposed to do but it's somewhat overkilland
thus not fast.Anybody has tried to alias a FLI pointer to do a C like cast?

Marc

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html