Re: Optimization riddle
Interesting...
Note that if you change foo-2 to also optimize for debug 0 and speed 2, in addition to your existing declarations, it’s much faster (but still allocates). Call that version foo-3. The code produced for foo-2 and foo-3 is mostly the same, but if we look at the core of the loop we can see this:
foo-2: (the problematic code - safety 2, fixnum safety 0)
L4: 92: 4C896DF0 moveq [rbp-10], r13
96: 4D89DC moveq r12, r11
99: 4C89E0 moveq rax, r12
102: 480345F0 addq rax, [rbp-10]
106: 488945F8 moveq [rbp-8], rax
110: 4C8B5DF8 moveq r11, [rbp-8]
114: 4C896DE0 moveq [rbp-20], r13
118: 488B45E0 moveq rax, [rbp-20]
122: 4883C008 addq rax, 8
126: 488945D8 moveq [rbp-28], rax
130: 4C8B6DD8 moveq r13, [rbp-28]
134: 4C896DE8 moveq [rbp-18], r13
138: 488B45E8 moveq rax, [rbp-18]
142: 483B45D0 cmpq rax, [rbp-30]
146: 7FA6 jg L1
148: EBC0 jmp L3
L5
foo-3: (speed optimized - safety 2, fixnum safety 0, debug 0, speed 2)
L4: 75: 4D03DD addq r11, r13
78: 4983C508 addq r13, 8
82: 4D3BEC cmpq r13, r12
85: 7FD2 jg L1
87: EBEC jmp L3
So the other code is doing some kind of test and indirect referencing that’s adding to the cost. Note there are no indirect references (e.g., [rbp-18]) in the speed-optimized version. I see some indirect references in foo-1, but not as many in the loop (though I’m only guessing that L4-L6 is the loop, based on block L6 ending with the same instruction pair). The indirect references address memory at an offset from a register rather than doing direct register arithmetic, and are presumably much slower, as each one may require an access to memory that misses the cache.
foo-1: (safety 2)
L4: 113: 4D89EC moveq r12, r13
116: 4D89D9 moveq r9, r11
119: 4D0BCC orq r9, r12
122: 41F6C107 testb r9b, 7
126: 7554 jne L9
128: 4C89D8 moveq rax, r11
131: 4903C4 addq rax, r12
134: 488945F0 moveq [rbp-10], rax
138: 7048 jo L9
L5: 140: 4C8B5DF0 moveq r11, [rbp-10]
144: 41F6C507 testb r13b, 7
148: 7553 jne L10
150: 4C89E8 moveq rax, r13
153: 4883C008 addq rax, 8
157: 488945F8 moveq [rbp-8], rax
161: 7046 jo L10
L6: 163: 4C8B6DF8 moveq r13, [rbp-8]
167: 4C89EF moveq rdi, r13
170: 488B75E8 moveq rsi, [rbp-18]
174: 4989F9 moveq r9, rdi
177: 4C0BCE orq r9, rsi
180: 41F6C107 testb r9b, 7
184: 7546 jne L11
186: 483BFE cmpq rdi, rsi
189: 7F89 jg L1
191: EBAA jmp L3
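The orq / testb ..., 7 / jne and jo sequences in the foo-1 listing look like fixnum tag and overflow checks. Here is a minimal sketch of that idea, assuming (this is my reading of the listing, not something taken from the LispWorks documentation) that a fixnum n is held as the machine word n * 8, so its low three bits are zero and adding 1 shows up as addq rax, 8. The function names are made up for illustration.

```lisp
;; ASSUMPTION: a fixnum N is represented as the word N * 8 (low 3 tag bits = 0).
;; All names below are hypothetical helpers, not LispWorks functions.

(defun tag-fixnum (n)
  "Turn the integer N into its assumed tagged machine word."
  (ash n 3))

(defun untag-fixnum (word)
  "Recover the integer from a tagged machine word."
  (ash word -3))

(defun both-fixnum-words-p (a b)
  "Mirror of 'orq r9, r12 / testb r9b, 7 / jne ...': OR the two words
together and check that the low three bits of the result are all zero."
  (zerop (logand (logior a b) 7)))

(defun checked-add (a b)
  "Add two tagged words. Return NIL in the cases where the compiled code
would branch away to the generic arithmetic routine."
  (when (both-fixnum-words-p a b)
    (let ((sum (+ a b)))
      ;; Mirror of 'jo ...': bail out if the sum overflows a signed 64-bit word.
      (when (<= (- (expt 2 63)) sum (1- (expt 2 63)))
        sum))))
```

For example, (untag-fixnum (checked-add (tag-fixnum 5) (tag-fixnum 1))) gives 6, while (checked-add 3 8) gives NIL because the word 3 has non-zero tag bits. With fixnum safety 0 these checks disappear, which matches the foo-2 and foo-3 listings having no testb or jo in the loop.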
So it’s possible the code generator doesn’t bother avoiding the slower indirect references because you didn’t ask it to optimize for speed... not very intuitive, I’ll admit.
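For reference, here is roughly what I mean by foo-3: a sketch that takes the foo-2 definition from your message and adds (debug 0) and (speed 2). The #+lispworks guard is my addition so the form still reads in other implementations, where the hcl package does not exist.

```lisp
;; foo-3: foo-2 plus (debug 0) and (speed 2).
;; hcl:fixnum-safety is LispWorks-specific; the #+lispworks guard (an addition
;; for portability) makes other implementations skip that one quality.
(defun foo-3 (n)
  (declare (optimize #+lispworks (hcl:fixnum-safety 0)
                     (safety 2) (debug 0) (speed 2)))
  (loop for i from 0 to n
        sum i))
```

Comparing (disassemble 'foo-2) with (disassemble 'foo-3) and timing both with (time (foo-3 1000000000)) should show the difference; the exact listings and numbers will depend on the LispWorks version and platform.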
--
Bradford W. Miller
Sr. Cognition and Decision Scientist
Co-Chair Industrial Internet Consortium Technology Working Group
GE Global Research
Distributed Intelligent Systems Lab
518 387-4271
millerb@ge.com
http://www.geglobalresearch.com/blog/brilliant-machines
> On Jun 11, 2015, at 11:51 AM, Edi Weitz <edi@weitz.de> wrote:
>
>
> Hello everybody,
>
> I'm stumped by the following results and I hope that someone can
> explain to me what I'm missing. This is the 64-bit version of LWW
> 7.0.
>
> I have the following two functions. (And, yes, I know there's a
> better way to compute the sum of the first n integers. This is just
> an example.)
>
> (defun foo-1 (n)
>   (loop for i from 0 to n
>         sum i))
>
> (defun foo-2 (n)
>   (declare (optimize (hcl:fixnum-safety 0) (safety 2)))
>   (loop for i from 0 to n
>         sum i))
>
> If I disassemble FOO-2, I see essentially only pure machine code while
> FOO-1 contains calls to functions like
> SYSTEM::*%WRONG-NUMBER-OF-ARGUMENTS-STUB, SYSTEM::*%+$ANY-CODE, or
> SYSTEM::*%>$ANY-CODE which are obviously able to cope with arbitrary
> numbers and not only fixnums. Simple-minded me expects that FOO-2
> should be significantly faster. But it's the other way around! I
> consistently get results like this:
>
> CL-USER 14 > (time (foo-1 1000000000))
> Timing the evaluation of (FOO-1 1000000000)
>
> User time = 2.698
> System time = 0.000
> Elapsed time = 2.699
> Allocation = 50024 bytes
> 0 Page faults
> 500000000500000000
>
> CL-USER 15 > (time (foo-2 1000000000))
> Timing the evaluation of (FOO-2 1000000000)
>
> User time = 4.461
> System time = 0.000
> Elapsed time = 4.446
> Allocation = 52176 bytes
> 0 Page faults
> 500000000500000000
>
> What's happening?
>
> Thanks,
> Edi.
>
> _______________________________________________
> Lisp Hug - the mailing list for LispWorks users
> lisp-hug@lispworks.com
> http://www.lispworks.com/support/lisp-hug.html
>