Lisp HUG Maillist Archive

Quite astoning results

Well one can argue about the problem at hand, however here we go
I have around 1500 files in a directory whith some informatoi on how
many downloads we've encountered. Now I wrote a few script to see how
"fast" or "slow" different languages on this task are. 

I come up with the following alternatives for LispWorks:
(without cl-ppcre)
(defun run (dir)
  (let ((sum 0)
        (count 0)
        (digits (make-string 10)))
    (cd dir)
    (dolist (file (directory "*.*"))
      (with-open-file (stream file)
        (loop for line = (read-line stream nil) 
              then (read-line stream nil)
              while line
              do 
#|             
              (multiple-value-bind (mstart mend reg-start reg-end) 
                  (cl-ppcre:scan "(\\d+) Windows executable"
                                 line)
                (when mstart
                  (let* ((start-pos (aref reg-start 0))
                         (end-pos (aref reg-end 0)))
|#
                   

              (let ((find (search "Windows exe" line )))
                (when find
                  (let ((j
                         (loop for i = 0 then (1+ i)
                               while (char= (schar line i) #\Space)
                               finally return i))
                        (to-add 0))
                    (loop for counter = j then (1+ counter)
                          and i = 0 then (1+ i)
                          while (digit-char-p (schar line counter))
                          do
                          (setf to-add (+ (* to-add 10) (- (char-int (schar line counter))
                                                           #.(char-int #\0)))))
                                                                     
                    
                    (incf sum to-add) ; (parse-integer line :start start-pos :end end-pos))
                    (incf count)))))))
                 
  (format t "~d download in ~d days = ~,2,f downloads/day"
          sum count (/ (float sum) count))))

Here are the timing results for it:
 (time (run "/home/frido/Mail/Administration/"))
Timing the evaluation of (RUN "/home/frido/Mail/Administration/")
61233 download in 168 days = 364.48 downloads/day
user time    =      9.178
system time  =      0.091
Elapsed time =   0:00:10
Allocation   = 109359240 bytes standard / 1237797 bytes conses
0 Page faults

You see it alloates more then 100 MB !!!

Now the same with the cl-ppcre solution:
(time (run "/home/frido/Mail/Administration/"))
Timing the evaluation of (RUN "/home/frido/Mail/Administration/")
61233 download in 168 days = 364.48 downloads/day
user time    =      1.946
system time  =      0.101
Elapsed time =   0:00:03
Allocation   = 198901592 bytes standard / 1236895 bytes conses
0 Page faults
NIL

It allocates round 98 MB more but neverthless is nearly 5 times faster
then the non cl-ppcre solution. 

I'm quite suprises about this findings. Have you an idea why this may
work that way?

Regards
Friedrich


Re: Quite astoning results

Friedrich Dominicus <frido@q-software-solutions.de> writes:

> It allocates round 98 MB more but neverthless is nearly 5 times faster
> then the non cl-ppcre solution. 
> 
> I'm quite suprises about this findings. Have you an idea why this may
> work that way?

I suspect that the built-in SEARCH does not use Boyer-Moore-Horspool
string matching, which trades space for speed when searching for
constant strings. CL-PPCRE uses BMH matching by default.

See:

   http://www.weitz.de/cl-ppcre/#use-bmh-matchers

Zach


Updated at: 2020-12-10 08:48 UTC