Re: Help optimizing array iteration
On third thought, I think things could be faster if your
image, sprite and mask arrays were actually static Lisp
arrays of type (simple-array (unsigned-byte 8) (*)). If
you look at the documentation:
http://www.lispworks.com/documentation/lw44/FLI/html/fli-120.htm#pgfId-1112879
you will see that a Lisp array can be passed to foreign functions
directly if it is allocated in the static area. The Lisp compiler seems
to do a better job copying between arrays if they are actually Lisp
arrays; the FLI calls needed to copy bytes between foreign arrays do
not appear to be optimized. When you pass the arrays to OpenGL, use
fli:with-dynamic-lisp-array-pointer.
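A rough sketch of what that allocation and hand-off might look like. The names make-byte-buffer and call-with-buffer-pointer are made up for illustration, and the :allocation :static keyword is a LispWorks extension to make-array, so it is guarded with #+lispworks here. Check the FLI manual for your version before relying on it.

```lisp
;; Hypothetical helper: allocate a byte vector that can be shared with
;; foreign code. On LispWorks the array is placed in the static area.
(defun make-byte-buffer (n)
  (make-array n
              :element-type '(unsigned-byte 8)
              :initial-element 0
              ;; LispWorks-only keyword; other Lisps get an ordinary array.
              #+lispworks :allocation #+lispworks :static))

;; Hypothetical helper: call FUNCTION with a foreign pointer to the
;; data of BUFFER. The pointer is only valid inside the macro body.
#+lispworks
(defun call-with-buffer-pointer (buffer function)
  (fli:with-dynamic-lisp-array-pointer (ptr buffer)
    (funcall function ptr)))
```

The function you pass to call-with-buffer-pointer would be whatever wrapper you already have around the OpenGL call that takes the pixel data pointer.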
So if you change your OpenGL arrays from FLI-allocated arrays
to Lisp arrays, you can write functions like this one (I am still not
sure about the ma increment, though):
(defun cocoa-build-sprite (sprite-array image-array mask-array size)
  "Merges BGR Image and Alpha Mask arrays into a BGRA Sprite array"
  (declare (optimize (speed 3) (safety 0))
           (type fixnum size)
           (type (simple-array (unsigned-byte 8) (*))
                 sprite-array image-array mask-array))
  ;; SIZE is the number of pixels left to copy.
  (prog ((sb 0) (sg 1) (sr 2) (sa 3)   ; sprite offsets, 4 bytes per pixel
         (ib 0) (ig 1) (ir 2)          ; image offsets, 3 bytes per pixel
         (ma 0))                       ; mask offset, assumed 1 byte per pixel
     (declare (type fixnum sb sg sr sa ib ig ir ma))
   replace-bytes
     (when (zerop size) (return-from cocoa-build-sprite sprite-array))
     (setf (aref sprite-array sb) (aref image-array ib)
           (aref sprite-array sg) (aref image-array ig)
           (aref sprite-array sr) (aref image-array ir)
           (aref sprite-array sa) (aref mask-array ma))
     (incf sb 4) (incf sg 4) (incf sr 4) (incf sa 4)
     (incf ib 3) (incf ig 3) (incf ir 3)
     (incf ma 1)
     (decf size)
     (go replace-bytes)))
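Since the function runs at (safety 0), it may be worth checking its output against a slow but obvious version on a tiny input first. Below is a sketch of such a check. reference-build-sprite, byte-vector and *expected* are names invented for this example, and it assumes the mask holds exactly one byte per pixel (which is what the ma increment of 1 implies).

```lisp
;; Unoptimized BGR + alpha -> BGRA merge, for checking results only.
(defun reference-build-sprite (sprite image mask size)
  (dotimes (pixel size sprite)
    (let ((s (* pixel 4))   ; start of this pixel in the sprite
          (i (* pixel 3)))  ; start of this pixel in the image
      (setf (aref sprite s)       (aref image i)
            (aref sprite (+ s 1)) (aref image (+ i 1))
            (aref sprite (+ s 2)) (aref image (+ i 2))
            (aref sprite (+ s 3)) (aref mask pixel)))))

;; Convenience constructor for small test vectors.
(defun byte-vector (&rest bytes)
  (make-array (length bytes)
              :element-type '(unsigned-byte 8)
              :initial-contents bytes))

;; Two pixels: BGR (10 20 30) with alpha 255, BGR (40 50 60) with alpha 128.
(defparameter *expected*
  (reference-build-sprite (byte-vector 0 0 0 0 0 0 0 0)
                          (byte-vector 10 20 30 40 50 60)
                          (byte-vector 255 128)
                          2))
;; *expected* is now #(10 20 30 255 40 50 60 128)
```

Calling cocoa-build-sprite with the same three inputs and comparing the result to *expected* with equalp should show whether the offsets line up. If your mask is stored with more than one byte per pixel, the ma step needs to change accordingly.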
Disassembling on x86 reveals that there are no longer any function
calls.
CL-USER 2 > (disassemble 'cocoa-build-sprite)
206A82D2:
0: 55 push ebp
1: 89E5 move ebp, esp
3: 83EC10 sub esp, 10
6: C7042445100000 move [esp], 1045
13: 50 push eax
14: 50 push eax
15: 50 push eax
16: 50 push eax
17: 50 push eax
18: 50 push eax
19: 50 push eax
20: 50 push eax
21: 8B7D10 move edi, [ebp+10]
24: 8B5D0C move ebx, [ebp+C]
27: C745EC00000000 move [ebp-14], 0
34: C745E800010000 move [ebp-18], 100
41: C745E400020000 move [ebp-1C], 200
48: C745E000030000 move [ebp-20], 300
55: C745DC00000000 move [ebp-24], 0
62: C745D800010000 move [ebp-28], 100
69: C745D400020000 move [ebp-2C], 200
76: 33D2 xor edx, edx
L1: 78: 83F800 cmp eax, 0
81: 7507 jne L2
83: 89F8 move eax, edi
85: FD std
86: C9 leave
87: C20C00 ret C
L2: 90: 8B4DEC move ecx, [ebp-14]
93: 8D7310 lea esi, [ebx+10]
96: 8975FC move [ebp-4], esi
99: 8B75DC move esi, [ebp-24]
102: 8975F8 move [ebp-8], esi
105: C17DF808 sar [ebp-8], 8
109: 8B75FC move esi, [ebp-4]
112: 0375F8 add esi, [ebp-8]
115: 0FB636 movzb esi, [esi]
118: 8975FC move [ebp-4], esi
121: 8B75FC move esi, [ebp-4]
124: 8975D0 move [ebp-30], esi
127: C165D008 shl [ebp-30], 8
131: 8D7710 lea esi, [edi+10]
134: 8975FC move [ebp-4], esi
137: 894DF8 move [ebp-8], ecx
140: C17DF808 sar [ebp-8], 8
144: 8B75D0 move esi, [ebp-30]
147: 8975F4 move [ebp-C], esi
150: C17DF408 sar [ebp-C], 8
154: 8B4DF4 move ecx, [ebp-C]
157: 8B75FC move esi, [ebp-4]
160: 0375F8 add esi, [ebp-8]
163: 880E moveb [esi], cl
165: 8B4DE8 move ecx, [ebp-18]
168: 8D7310 lea esi, [ebx+10]
171: 8975FC move [ebp-4], esi
174: 8B75D8 move esi, [ebp-28]
177: 8975F8 move [ebp-8], esi
180: C17DF808 sar [ebp-8], 8
184: 8B75FC move esi, [ebp-4]
187: 0375F8 add esi, [ebp-8]
190: 0FB636 movzb esi, [esi]
193: 8975FC move [ebp-4], esi
196: 8B75FC move esi, [ebp-4]
199: 8975D0 move [ebp-30], esi
202: C165D008 shl [ebp-30], 8
206: 8D7710 lea esi, [edi+10]
209: 8975FC move [ebp-4], esi
212: 894DF8 move [ebp-8], ecx
215: C17DF808 sar [ebp-8], 8
219: 8B75D0 move esi, [ebp-30]
222: 8975F4 move [ebp-C], esi
225: C17DF408 sar [ebp-C], 8
229: 8B4DF4 move ecx, [ebp-C]
232: 8B75FC move esi, [ebp-4]
235: 0375F8 add esi, [ebp-8]
238: 880E moveb [esi], cl
240: 8B4DE4 move ecx, [ebp-1C]
243: 8D7310 lea esi, [ebx+10]
246: 8975FC move [ebp-4], esi
249: 8B75D4 move esi, [ebp-2C]
252: 8975F8 move [ebp-8], esi
255: C17DF808 sar [ebp-8], 8
259: 8B75FC move esi, [ebp-4]
262: 0375F8 add esi, [ebp-8]
265: 0FB636 movzb esi, [esi]
268: 8975FC move [ebp-4], esi
271: 8B75FC move esi, [ebp-4]
274: 8975D0 move [ebp-30], esi
277: C165D008 shl [ebp-30], 8
281: 8D7710 lea esi, [edi+10]
284: 8975FC move [ebp-4], esi
287: 894DF8 move [ebp-8], ecx
290: C17DF808 sar [ebp-8], 8
294: 8B75D0 move esi, [ebp-30]
297: 8975F4 move [ebp-C], esi
300: C17DF408 sar [ebp-C], 8
304: 8B4DF4 move ecx, [ebp-C]
307: 8B75FC move esi, [ebp-4]
310: 0375F8 add esi, [ebp-8]
313: 880E moveb [esi], cl
315: 8B4DE0 move ecx, [ebp-20]
318: 8B7508 move esi, [ebp+8]
321: 83C610 add esi, 10
324: 8975FC move [ebp-4], esi
327: 8955F8 move [ebp-8], edx
330: C17DF808 sar [ebp-8], 8
334: 8B75FC move esi, [ebp-4]
337: 0375F8 add esi, [ebp-8]
340: 0FB636 movzb esi, [esi]
343: 8975FC move [ebp-4], esi
346: 8B75FC move esi, [ebp-4]
349: 8975D0 move [ebp-30], esi
352: C165D008 shl [ebp-30], 8
356: 8D7710 lea esi, [edi+10]
359: 8975FC move [ebp-4], esi
362: 894DF8 move [ebp-8], ecx
365: C17DF808 sar [ebp-8], 8
369: 8B75D0 move esi, [ebp-30]
372: 8975F4 move [ebp-C], esi
375: C17DF408 sar [ebp-C], 8
379: 8B4DF4 move ecx, [ebp-C]
382: 8B75FC move esi, [ebp-4]
385: 0375F8 add esi, [ebp-8]
388: 880E moveb [esi], cl
390: 8B4DEC move ecx, [ebp-14]
393: 81C100040000 add ecx, 400
399: 894DEC move [ebp-14], ecx
402: 8B4DE8 move ecx, [ebp-18]
405: 81C100040000 add ecx, 400
411: 894DE8 move [ebp-18], ecx
414: 8B4DE4 move ecx, [ebp-1C]
417: 81C100040000 add ecx, 400
423: 894DE4 move [ebp-1C], ecx
426: 8B4DE0 move ecx, [ebp-20]
429: 81C100040000 add ecx, 400
435: 894DE0 move [ebp-20], ecx
438: 8B4DDC move ecx, [ebp-24]
441: 81C100030000 add ecx, 300
447: 894DDC move [ebp-24], ecx
450: 8B4DD8 move ecx, [ebp-28]
453: 81C100030000 add ecx, 300
459: 894DD8 move [ebp-28], ecx
462: 8B4DD4 move ecx, [ebp-2C]
465: 81C100030000 add ecx, 300
471: 894DD4 move [ebp-2C], ecx
474: 81C200010000 add edx, 100
480: 2D00010000 sub eax, 100
485: E964FEFFFF jmp L1
NIL
CL-USER 3 >
Is it fast? If this is still not fast enough,
this may be a good case for creating a C DLL with helper functions
to do this brute-force copying.
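One way to answer the speed question is to time both versions over many calls. Here is a minimal sketch using only standard Common Lisp. seconds-per-call is a made-up helper name, and the resolution depends on your implementation's internal-time-units-per-second, so use enough iterations to get a meaningful number.

```lisp
;; Hypothetical helper: average wall-clock seconds per call of THUNK.
(defun seconds-per-call (thunk &optional (iterations 100))
  (let ((start (get-internal-real-time)))
    (loop repeat iterations do (funcall thunk))
    (float (/ (- (get-internal-real-time) start)
              (* iterations internal-time-units-per-second)))))
```

You would pass it a closure that calls the copy routine on realistically sized arrays, for example (seconds-per-call (lambda () (cocoa-build-sprite sprite image mask size)) 1000), and do the same for the FLI-based version.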
Wade