Lisp HUG Maillist Archive

Help with optimization

Here's a simple macro/function pair from our system, used for tokenizing a
string:

(defmacro with-string-tokenizer ((str predicate) &body body)
  (let ((index (gensym))
	(data (gensym))
	(start (gensym))
	(end (gensym))
        (test-func (gensym)))
    `(let* ((,data ,str)
	    (,start 0)
	    (,end (length ,data))
	    (,index ,start)
            (,test-func ,predicate))
       (declare (optimize (speed 3) (debug 0) (hcl:fixnum-safety 0)))
       (flet ((next-token ()
                (if (not ,index)
                    nil
                  (let* ((i (position-if (complement ,test-func)
                                         ,data :start ,index))
                         (j (position-if ,test-func ,data :start i)))
                    (prog1
                        (subseq ,data i j)
                      (setf ,index j))))))
         ,@body))))

(defun tokenize (str &key delim)
  (let ((test-func (if delim (lambda (char)
                               (char= char delim))
                     'lw:whitespace-char-p)))
    (with-string-tokenizer (str test-func)
      (loop for tok = (next-token)
            while tok collect tok))))

It shows up during profiling (on the top of stack about 1% of the time).
>From my point of view that's not too bad, but I'm wondering if either a)
there's more optimizations that can be done; or b) a better way to write
this utility. Thanks for the comments.

-- david

For wisdom is more precious than rubies,
and nothing you desire can compare with her.
  -- Proverbs 8:11

But all the world understands my language.
  -- Franz Joseph Haydn


This email message is for the sole use of the intended recipients(s) and may contain confidential and privileged information of Bloodhound Software, Inc.. Any unauthorized review, use, disclosure is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.


Re: Help with optimization

Most likely you can get it to work much quicker, but if it is only taking up
1% of the time, it wouldn't be worth it.

On 3/8/07, Young, David <deyoung@bloodhoundinc.com> wrote:
>
> Here's a simple macro/function pair from our system, used for tokenizing a
> string:
>
> (defmacro with-string-tokenizer ((str predicate) &body body)
>  (let ((index (gensym))
>        (data (gensym))
>        (start (gensym))
>        (end (gensym))
>        (test-func (gensym)))
>    `(let* ((,data ,str)
>            (,start 0)
>            (,end (length ,data))
>            (,index ,start)
>            (,test-func ,predicate))
>       (declare (optimize (speed 3) (debug 0) (hcl:fixnum-safety 0)))
>       (flet ((next-token ()
>                (if (not ,index)
>                    nil
>                  (let* ((i (position-if (complement ,test-func)
>                                         ,data :start ,index))
>                         (j (position-if ,test-func ,data :start i)))
>                    (prog1
>                        (subseq ,data i j)
>                      (setf ,index j))))))
>         ,@body))))
>
> (defun tokenize (str &key delim)
>  (let ((test-func (if delim (lambda (char)
>                               (char= char delim))
>                     'lw:whitespace-char-p)))
>    (with-string-tokenizer (str test-func)
>      (loop for tok = (next-token)
>            while tok collect tok))))
>
> It shows up during profiling (on the top of stack about 1% of the time).
> >From my point of view that's not too bad, but I'm wondering if either a)
> there's more optimizations that can be done; or b) a better way to write
> this utility. Thanks for the comments.
>
> -- david
>
> For wisdom is more precious than rubies,
> and nothing you desire can compare with her.
>  -- Proverbs 8:11
>
> But all the world understands my language.
>  -- Franz Joseph Haydn
>
>
> This email message is for the sole use of the intended recipients(s) and may contain confidential and privileged information of Bloodhound Software, Inc.. Any unauthorized review, use, disclosure is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
>
>


-- 
~jrm


Re: Help with optimization


> > ... I'm wondering if either a)
> > there's more optimizations that can be done; or b) a better way to write
> > this utility. Thanks for the comments.
> >
> > -- david
....

Concerning (b) above, with-string-tokenizer fails with trailing white space.
A fix may be:

(defmacro with-string-tokenizer ((str predicate) &body body)
  (let ((index (gensym))
        (data (gensym))
        (start (gensym))
        (end (gensym))
        (test-func (gensym)))
    `(let* ((,data ,str)
            (,start 0)
            (,end (length ,data))
            (,index ,start)
            (,test-func ,predicate))
       (declare (optimize (speed 3) (debug 0) (hcl:fixnum-safety 0)))
       (flet ((next-token ()
                (if (not ,index)
                  nil
                  (let ((i (position-if (complement ,test-func)
                                        ,data :start ,index)))
                    (when i
                      (let ((j (position-if ,test-func ,data :start i)))
                        (prog1
                            (subseq ,data i j)
                          (setf ,index j))))))))
         ,@body))))

Jeff Caldwell



 
____________________________________________________________________________________
8:00? 8:25? 8:40? Find a flick in no time 
with the Yahoo! Search movie showtime shortcut.
http://tools.search.yahoo.com/shortcuts/#news


Re: Help with optimization

Ah, ok. So as a rule of thumb, code that shows up in this range isn't generally worth looking at?

We've optimized our system to the point where this utility is the only piece of our code showing up, so it was next on my list of things to look at.

Thanks.

On 3/8/07, Joe Marshall <jmarshall@alum.mit.edu> wrote:

Most likely you can get it to work much quicker, but if it is only taking up
1% of the time, it wouldn't be worth it.

On 3/8/07, Young, David <deyoung@bloodhoundinc.com > wrote:
>
> Here's a simple macro/function pair from our system, used for tokenizing a
> string:
>
> (defmacro with-string-tokenizer ((str predicate) &body body)
>  (let ((index (gensym))
>        (data (gensym))
>        (start (gensym))
>        (end (gensym))
>        (test-func (gensym)))
>    `(let* ((,data ,str)
>            (,start 0)
>            (,end (length ,data))
>            (,index ,start)
>            (,test-func ,predicate))
>       (declare (optimize (speed 3) (debug 0) (hcl:fixnum-safety 0)))
>       (flet ((next-token ()
>                (if (not ,index)
>                    nil
>                  (let* ((i (position-if (complement ,test-func)
>                                         ,data :start ,index))
>                         (j (position-if ,test-func ,data :start i)))
>                    (prog1
>                        (subseq ,data i j)
>                      (setf ,index j))))))
>         ,@body))))
>
> (defun tokenize (str &key delim)
>  (let ((test-func (if delim (lambda (char)
>                               (char= char delim))
>                     'lw:whitespace-char-p)))
>    (with-string-tokenizer (str test-func)
>      (loop for tok = (next-token)
>            while tok collect tok))))
>
> It shows up during profiling (on the top of stack about 1% of the time).
> >From my point of view that's not too bad, but I'm wondering if either a)
> there's more optimizations that can be done; or b) a better way to write
> this utility. Thanks for the comments.
>
> -- david
>
> For wisdom is more precious than rubies,
> and nothing you desire can compare with her.
>  -- Proverbs 8:11
>
> But all the world understands my language.
>  -- Franz Joseph Haydn
>
>
> This email message is for the sole use of the intended recipients(s) and may contain confidential and privileged information of Bloodhound Software, Inc.. Any unauthorized review, use, disclosure is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
>
>


--
~jrm




--
And now these three remain: faith, hope, and love.
But the greatest of these is love.
  -- 1 Corinthians 13:13

For wisdom is more precious than rubies,
and nothing you desire can compare with her.
  -- Proverbs 8:11

Re: Help with optimization

And that's precisely the issue. I wrote this utility because the list of strings *is* needed. Good idea, though, and worth remember. Thanks very much...

david

On 3/9/07, Gary King <gwking@metabang.com> wrote:
Hi David,

This way you could process all the tokens without necessarily needing to ever call subseq... Of course, if you _need_ the strings, then this won't work!



--
And now these three remain: faith, hope, and love.
But the greatest of these is love.
  -- 1 Corinthians 13:13

For wisdom is more precious than rubies,
and nothing you desire can compare with her.
  -- Proverbs 8:11

Re: Help with optimization

Unable to parse email body. Email id is 6477

RE: Help with optimization

So, for what it's worth I changed the original function/macro implementation to this:

(defun tokenize (string &key delim)
  "This version is the final result of a lot of optimization attempts"
  (declare (optimize (speed 3) (debug 0) (safety 0) (hcl:fixnum-safety 0))
           (type simple-string string)
           (type simple-string str))
  (let ((tokens (list))
        (str (if (char= delim #\Space)
                 (string-trim '(#\Space) string)
               string)))
    (loop for i from 0 below (length str)
          with start = 0 do
            (let ((char (schar str i)))
              (when (char= char delim)
                (push (subseq str start i) tokens)
                (setf start (1+ i))))
          finally (push (subseq str start i) tokens))
    (nreverse tokens)))

Profiling this version reveals that it drops below the 1% stack-top threshold. That's good enough for me. Thanks to all who offered advice, especially Martin who suggested replacing POSITION-IF with LOOP and SCHAR.

-- david


Re: Help with optimization

Doh! You're absolutely right; even it I don't bother optimizing it further, (COMPLEMENT ,TEST-FUNC) should *not* be in the loop. I'll also check the bug noticed by Jeff Caldwell.

Thanks, david

On 3/9/07, Martin Simmons <martin@lispworks.com> wrote:


If you were to optimize it, then a couple of things that spring to mind are:

1) Take (COMPLEMENT ,TEST-FUNC) out of the loop.

2) Consider implementing the calls to POSITION-IF as loops using CHAR (or
   SCHAR if you know the string is simple).  That also removes the need for
   COMPLEMENT.

--
Martin Simmons
LispWorks Ltd
http://www.lispworks.com/





--
And now these three remain: faith, hope, and love.
But the greatest of these is love.
  -- 1 Corinthians 13:13

For wisdom is more precious than rubies,
and nothing you desire can compare with her.
  -- Proverbs 8:11
Updated at: 2020-12-10 08:46 UTC