Lisp HUG Maillist Archive

threads and GC in LWW 4.2

I've been finding that if you have lots of threads in LWW then it
tends to leak memory enormously - generation 2 becomes hundreds of
Mb.  In some cases doing a mark and sweep in generation 2 will then
cause an abrupt termination of LW.

Obviously the latter is a bug of some kind (possibly in Windows), but
I'm not sure if the former is due to some resonance between the thread
system and memory allocation or something like that.  In general I'm
not clear if what I'm doing is reasonable or not, or if I could tweak
some GC parameters to fix it.

Unfortunately my real system lives behind a compatibility layer which
I can't show here, but a native LW version looks like the code below.
If you call (ts-native 100 100) - spawn 100 threads 100 times - you
should get a generation 2 which is hundreds of Mb, and if you try
(mark-and-sweep 2) LW will probably die.  (ts-native 1000 10) - spawn
10 threads 1000 times - does not seem to leak like this, so I guess
the issue is if there are hundreds of threads alive when GC happens?

My real system doesn't spawn this many threads; this was just part of
its test suite.  Adding the commented locking code back in doesn't
seem to make any difference.

Does anyone know anything about this?

Thanks

--tim

;;;; Test native thread leakage
;;;

(in-package :cl-user)

(defvar *n* 0)

(defun native-inc-n (cons)
  ;; Note there is only one lock here: LOAD-TIME-VALUE makes a single
  ;; lock that is shared by every call.
  #||(let ((lock (load-time-value (mp:make-lock))))
    (mp:with-lock (lock)
      (incf *n*)))||#
  (mp:without-preemption
    (incf *n*)                 ; count completions, so NATIVE-RUN-N returns N
    (setf (car cons) t)))      ; signal completion to the waiting spawner

(defun native-run-n (n)
  ;; spawn n threads and then wait for them all to complete
  (setf *n* 0)
  (loop for x in (loop repeat n
                       for c = (cons nil nil)
                       do (mp:process-run-function "native-thr" ()
                                                   #'native-inc-n c)
                       collect c)
        do (mp:process-wait "Waiting..." #'(lambda ()
                                             (car x)))
        finally (return *n*)))

(defun ts-native (n m)
  ;; run m threads n times
  (loop repeat n
        do (native-run-n m)))


Re: threads and GC in LWW 4.2

Tim Bradshaw writes


> I've been finding that if you have lots of threads in LWW then it
> tends to leak memory enormously - generation 2 becomes hundreds of
> Mb.  In some cases doing a mark and sweep in generation 2 will then
> cause an abrupt termination of LW.
>
> Obviously the latter is a bug of some kind (possibly in Windows), but
> I'm not sure if the former is due to some resonance between the thread
> system and memory allocation or something like that.  In general I'm
> not clear if what I'm doing is reasonable or not, or if I could tweak
> some GC parameters to fix it.
>
> Unfortunately my real system lives behind a compatibility layer which
> I can't show here, but a native LW version looks like the code below.
> If you call (ts-native 100 100) - spawn 100 threads 100 times - you
> should get a generation 2 which is hundreds of Mb, and if you try
> (mark-and-sweep 2) LW will probably die.  (ts-native 1000 10) - spawn
> 10 threads 1000 times - does not seem to leak like this, so I guess
> the issue is if there are hundreds of threads alive when GC happens?
>
> My real system doesn't spawn this many threads; this was just part of
> its test suite.  Adding the commented locking code back in doesn't
> seem to make any difference.
>
> Does anyone know anything about this?

I had the same problem with a LWL web service (Apache + mod_lisp), and now I
force a mark-and-sweep of gen 2 every 1000 requests. The LW 4.2 GC now has 3
generations, so this takes less time, but it's still painful.
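
Something like this (a sketch: HANDLE-REQUEST stands in for the real
mod_lisp handler):

(defvar *request-count* 0)

(defun handle-request-with-gc (request)
  (prog1 (handle-request request)
    ;; Every 1000th request, force a full GC of generation 2.
    (when (zerop (mod (incf *request-count*) 1000))
      (mark-and-sweep 2))))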

I don't know if there is a good way to handle this in a generational GC.
A thread's allocated data gets promoted to older generations by the GCs
triggered by the other threads' work.
Maybe some kind of per-thread generations?

Marc

PS: your code makes LWW die on my PC when I do (mark-and-sweep 2) after
running it.

CL-USER 7 > (room)
 Generation 0:  Total Size 525K, Allocated 229K, Free 287K
 Generation 1:  Total Size 1602K, Allocated 951K, Free 639K
 Generation 2:  Total Size 713045K, Allocated 564248K, Free 144192K
 Generation 3:  Total Size 16213K, Allocated 15783K, Free 412K

Total Size 731382K, Allocated 581213K, Free 145532K

I tried the Linux version and it hangs before finishing.




Re: threads and GC in LWW 4.2

At 16/04/2002 15:46 +0100, Tim Bradshaw wrote:
>I've been finding that if you have lots of threads in LWW then it
>tends to leak memory enormously - generation 2 becomes hundreds of
>Mb.

Would it not help to freeze the promotion mechanism during the call to 
native-run-n, in order to keep the garbage in generation 0? At the end of 
native-run-n you could also force a promotion from 0 to 1. I don't know LW 
yet. In ACL (a two-space generation-scavenging GC) I defined the 
following macro for a similar purpose:

(defmacro without-generation-stepping (&body body)
  "Stop the generation-stepping during the execution of BODY.
   Possible danger: could force the newspace to grow."
  ;; WITH-GENSYMS is the usual gensym-binding utility macro.
  (with-gensyms (auto-step)
    `(let ((,auto-step (sys:gsgc-switch :auto-step)))
       (unwind-protect
           (progn
             (setf (sys:gsgc-switch :auto-step) nil)
             ,@body)
         ;; Restore the old switch value in the cleanup form, so it
         ;; survives a non-local exit from BODY.
         (setf (sys:gsgc-switch :auto-step) ,auto-step)))))
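
For example, to keep one round of Tim's test in the newspace
(hypothetical usage, not tried on his code):

(without-generation-stepping
  (native-run-n 100))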



--
Francis Leboutte
Algorithme, Rue de la Charrette 141, 4130 Tilff, Belgium
   f.leboutte@algo.be    leboutte@acm.org
   www.algo.be   +32-(0)4.388.3919
    The Logo language at the Athénée de Waha: www.algo.be/logo.html


Re: threads and GC in LWW 4.2

* davef wrote:

> Most of the allocation in Tim's test is for the stacks of the
> processes. Stacks are allocated in generation 2 by default, which
> accounts for the effects observed. To improve the performance, try
> setting SYS::*DEFAULT-STACK-GROUP-LIST-LENGTH*.

Thanks for explaining this.  I'd worked out, I think, that there must
have been some allocation in generation 2, and stacks are the obvious
thing (especially as they're relatively large).

> LispWorks keeps a cache of stacks, so 'normal' creation and
> destruction of processes normally does not cause a problem.  The stack
> of a dead process is put in the cache and then re-used. The number of
> stacks in the cache is limited to the value of
> SYS::*DEFAULT-STACK-GROUP-LIST-LENGTH*, which defaults to 10. The
> test creates more than 10 processes and then they all die, so only
> 10 of the stacks are cached and the rest are not collected until the
> user calls (MARK-AND-SWEEP 2). This is what causes the enlargement of
> memory.

Yes.  One thing I tried was setting collect-second-generation (I'm
unsure of the exact name, but the switch that says to GC the second
generation), and this didn't seem to help.  I'm slightly mystified by
that, but everything else makes sense.  (Actually, maybe this is
because there was not enough allocation in generations 0 and 1 to
force a GC of 2?)

> If the application needs this fast turnover of processes, then try
> setting SYS::*DEFAULT-STACK-GROUP-LIST-LENGTH* to a larger value. For
> example, after doing 
>  (SETQ SYS::*DEFAULT-STACK-GROUP-LIST-LENGTH* 110)
> for me 
>  (TS-NATIVE 100 100) 
> runs in a few minutes and doesn't grow after the initial growth.
> Since a stack is approximately 65Kb, setting this to 110 will make a
> cache of around 7Mb.

If I understand correctly, the bad situation is when more processes
than SYS::*DEFAULT-STACK-GROUP-LIST-LENGTH* have died with no new
process creation in the meantime: if the death rate exceeds the birth
rate, stacks will `leak' until a GC of generation 2.  Is that right?
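
(As a back-of-envelope check: with the default cache of 10, each round
of (TS-NATIVE 100 100) strands about 90 stacks of ~65Kb, i.e. nearly
6Mb per round, so getting on for 600Mb over 100 rounds, which is close
to the ~560Mb Marc saw allocated in generation 2.)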

> Maybe we should make the initial value of
> SYS::*DEFAULT-STACK-GROUP-LIST-LENGTH* larger, but it is not obvious
> since this issue has not arisen before AFAICT. A larger value should
> be harmless since in normal circumstances the cache is not going to
> grow much anyway. Anyway, we have decided to export and document
> this variable, but not change the initial value at this stage.

> We would be interested to hear of a 'real-life' (non test-suite!)
> situation where such a large turnover of processes is useful. 

I doubt my system will have this problem.  I'm intending to create a
pool of processes and then feed them with mailboxes, so my stacks will
be very long-lived.
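
Something like this is the plan (just a sketch, using MP:MAKE-MAILBOX
and friends):

(defvar *work* (mp:make-mailbox))

(defun worker-loop ()
  ;; Block on the shared mailbox and run whatever thunk arrives, so
  ;; each worker's stack lives for the whole session.
  (loop (funcall (mp:mailbox-read *work*))))

(defun start-pool (n)
  (loop repeat n
        do (mp:process-run-function "pool-worker" () #'worker-loop)))

;; Work is then submitted with e.g.
;; (mp:mailbox-send *work* #'(lambda () ...))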

> There is a general mechanism in LispWorks for freezing the promotion
> mechanism similarly to what Francis described for ACL. See
> LW:BLOCK-PROMOTION. I tried this in the test case, and found that it
> does not improve performance, because stacks are allocated in
> generation 2 regardless. We'll improve the doc to mention this
> subtlety.

The other thing is that I *think* LW:BLOCK-PROMOTION won't work in
multithreaded contexts.  When I macro-expanded it, it seemed to work by
binding an internal special variable, and that binding will, I guess,
not be visible outside the process that created it.  Unless there is
some deep magic, I don't see how the GC can tell that BLOCK-PROMOTION
is in effect in another thread.  I think it should assign the variable
inside an unwind-protect or something.  But maybe I misunderstand
this...
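
To illustrate with a made-up *PROMOTION-BLOCKED* special (not the real
internal variable):

(defvar *promotion-blocked* nil)

;; Rebinding the special is per-process: a GC triggered from another
;; thread still sees the global value, NIL.
(defmacro block-promotion-by-binding (&body body)
  `(let ((*promotion-blocked* t))
     ,@body))

;; Assigning the global, and restoring it in an UNWIND-PROTECT cleanup,
;; is visible to every thread (though nesting such blocks concurrently
;; would still need care).
(defmacro block-promotion-by-assignment (&body body)
  (let ((old (gensym "OLD")))
    `(let ((,old *promotion-blocked*))
       (unwind-protect
           (progn (setq *promotion-blocked* t) ,@body)
         (setq *promotion-blocked* ,old)))))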

Anyway, thanks for explaining what is happening.  I'm now reassured
that bad things won't happen in real life.

--tim



