Multithreading question

Hello,

I would like to know if LispWorks 6 can be considered to have "true pre-emptive multithreading".

If yes or no can you also explain why is it so.


All the best,
Cam

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Hi Camille,

If I understand the term "true pre-emptive multi-threading" correctly,
the answer to your question is "yes" (I used this link to learn
http://en.wikipedia.org/wiki/Computer_multitasking#Cooperative_multitasking
_

My impression is that most of the following operating systems (at
least Linux and Windows) provide "true pre-emptive multi-threading",
and LispWorks certainly runs "native" threads on them, which means the
threads are OS-level threads:

"On Microsoft Windows, Mac OS X, Linux, x86/x64 Solaris and FreeBSD,
LispWorks multiprocessing uses native threads and supports Symmetric
Multiprocessing (SMP). The implementation is referred to as "SMP
LispWorks" where relevant."

http://www.lispworks.com/documentation/lw60/LW/html/lw-228.htm

Best,
David

On Sat, Jan 25, 2014 at 10:15 AM, Camille Troillard
<camille@osculator.net> wrote:
>
> Hello,
>
> I would like to know if LispWorks 6 can be considered to have "true pre-emptive multithreading".
>
> If yes or no can you also explain why is it so.
>
>
> All the best,
> Cam
>
> _______________________________________________
> Lisp Hug - the mailing list for LispWorks users
> lisp-hug@lispworks.com
> http://www.lispworks.com/support/lisp-hug.html
>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Hi Cam,

Okay, this goes beyond my expertise, as while I've used LispWorks'
multi-threading, I haven't done a phenomenal amount of benchmarking
with it.  Good luck in your search for information!

David

On Mon, Jan 27, 2014 at 12:16 PM, Camille Troillard
<camille@osculator.net> wrote:
> Thank you David.
>
> After some research and discussion with fellow developers, my understanding of LW's threading model is that it is cooperative threading implemented over pre-emptive threading (the operating system).
>
> I found that under some circumstances MP:YIELD or MP:PROCESS-ALLOW-SCHEDULING had to be used in order to relinquish control.
>
> I would love to have more information about this subject, because I did not found a lot in the documentation.
>
> Best,
> Cam
>
>
>> On 25 janv. 2014, at 19:21, "David L. Rager" <ragerdl@cs.utexas.edu> wrote:
>>
>>
>> Hi Camille,
>>
>> If I understand the term "true pre-emptive multi-threading" correctly,
>> the answer to your question is "yes" (I used this link to learn
>> http://en.wikipedia.org/wiki/Computer_multitasking#Cooperative_multitasking
>> _
>>
>> My impression is that most of the following operating systems (at
>> least Linux and Windows) provide "true pre-emptive multi-threading",
>> and LispWorks certainly runs "native" threads on them, which means the
>> threads are OS-level threads:
>>
>> "On Microsoft Windows, Mac OS X, Linux, x86/x64 Solaris and FreeBSD,
>> LispWorks multiprocessing uses native threads and supports Symmetric
>> Multiprocessing (SMP). The implementation is referred to as "SMP
>> LispWorks" where relevant."
>>
>> http://www.lispworks.com/documentation/lw60/LW/html/lw-228.htm
>>
>> Best,
>> David
>>
>> On Sat, Jan 25, 2014 at 10:15 AM, Camille Troillard
>> <camille@osculator.net> wrote:
>>>
>>> Hello,
>>>
>>> I would like to know if LispWorks 6 can be considered to have "true pre-emptive multithreading".
>>>
>>> If yes or no can you also explain why is it so.
>>>
>>>
>>> All the best,
>>> Cam
>>>
>>> _______________________________________________
>>> Lisp Hug - the mailing list for LispWorks users
>>> lisp-hug@lispworks.com
>>> http://www.lispworks.com/support/lisp-hug.html
>>
>> _______________________________________________
>> Lisp Hug - the mailing list for LispWorks users
>> lisp-hug@lispworks.com
>> http://www.lispworks.com/support/lisp-hug.html
>>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Hi Cam,

It sounds like you may be confusing coroutines with native threads. It is usually not necessary to use yield for native threads. A simple sleep will suffice. The scheduler will manage the threads, so you don't really have to worry about it.

Thanks,

-G

From: Camille Troillard

Sent: Monday, January 27, 2014 3:48 PM

To: Tim Bradshaw

Reply To: Camille Troillard

Cc: lisp-hug@lispworks.com Lispworks [lisp-hug@lispworks.com]

Subject: Re: Multithreading question

Hello Tim,

On 27 Jan 2014, at 22:34, Tim Bradshaw <tfb@tfeb.org> wrote:

>> I found that under some circumstances MP:YIELD or MP:PROCESS-ALLOW-SCHEDULING had to be used in order to relinquish control.
>
> Can you describe what these circumstances are and what your platform is?

Circumstances: I would like to get detailed information about multi-processing on LW 6.1.1, more particularly how and when I should use MP:YIELD.
Platform: Mac OS X, LW 64 bits.

> Are you sure you're not being confused by the SMP support LW now has which can allow multiple processes to be running simultaneously on a multicore system (which pretty much is any modern CPU)?

I initially assumed SMP implied pre-emptive multi-threading.

> In those cases processes often do not need to be preempted to allow others to run.

I’m sorry, I am bit confused.
What cases are you referring to?

All the best,
Cam

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

A simple way to demonstrate the need for yield points is to run

(funcall (compile nil '(lambda () (tagbody x (go x)))))

This will be uninterruptible and, at least on Linux, often crash the
GUI. TAGBODY has no yield points, but LOOP does, so

(funcall (compile nil '(lambda () (loop))))

is not a problem. Note this is some kind of compiler magic since we
don't see MP:YIELD in the macroexpansion of LOOP.

LW threads need yield points, which are inserted automatically into
non-inlined/non-safety-0 functions, LOOP, and perhaps some other
forms. This is a common enough case to make it seem like threads are
pre-emptive.

Regards,
lmj

On Mon, Jan 27, 2014 at 5:46 PM, Paul Tarvydas <paultarvydas@gmail.com> wrote:
> From
> http://www.lispworks.com/documentation/lw61/LW/html/lw-295.htm#pgfId-896065
>
> "Threads running Lisp code can be rescheduled preemptively, so if you call
> into Lisp from more than one thread simultaneously and one request takes a
> long time then it will not delay the requests in other threads."
>
>
> Up to 5.1, LW ran all code in one process.  With 6.0, true SMP
> multi-processing was implemented.  Every top-level window is a separate O/S
> process in LWW and LWL.  I believe that MacOS requires that all windows run
> under one process (or something like that, I'm not a Mac guy) and LWM
> respects this requirement.
>
>
> Here
> http://www.lispworks.com/documentation/lw61/LW/html/lw-1140.htm#pgfId-1688841
> MP:YIELD is documented:
>
> "Normally code compiled at safety 0 cannot be preempted because the
> necessary checks are ommited. ... if you find that preemption is not working
> in a loop with no function calls, yield can be useful."
>
> So, either
>
> 1) MP:YIELD is a hold-over from pre-6.0 days, and retained for backwards
> compatibility, or,
>
> 2) There is some hoary edge-case that prevents the operating system from
> preempting LW.  I don't believe this one.
>
> I don't have LWM, so I can't try this out, but you can test it.  Write a
> tight loop that has no function calls (check with the disassembler), then
> compile it at safety 0 and see if preemption stops.  Spawn 100 processes
> (e.g. some number larger than your number of cores) each running the loop.
> Do 99 of them get starved?
>
>
> When I first pored over the 6.0 MP manuals, I was confused by
> http://www.lispworks.com/documentation/lw61/LW/html/lw-283.htm#pgfId-894689
> .  Again, to my eyes, this looks like backwards compatibility with 5.1
> and/or a convenience function that allows process readiness to be computed
> by a function which is called at random intervals.  My conclusion was -
> don't use these functions in >= 6.0 code.
>
> pt
>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Rainer Joswig <joswig@lisp.de> writes:

> Hi,
>
> where in the disassembly are these yield calls?

If I compile these:

(defun gazonk ()
  (loop))

(defun bar ()
  (tagbody x (go x)))


I get these (LWM 6.1.1 64 bit). Note the lack of a call to
system::check-for-interrupts in the code for #'bar:

CL-USER 2 > (disassemble 'gazonk)
4140020434:
       0:      49396275         cmpq  [r10+75], rsp
       4:      7723             ja    L3
       6:      4883F900         cmpq  rcx, 0
      10:      751D             jne   L3
      12:      4157             push  r15
      14:      55               push  rbp
      15:      4889E5           moveq rbp, rsp
      18:      4989DF           moveq r15, rbx
L1:   21:      49396275         cmpq  [r10+75], rsp
      25:      7702             ja    L2
      27:      EBF8             jmp   L1
L2:   29:      4D8B4FD4         moveq r9, [r15-2C]     ; SYSTEM::CHECK-FOR-INTERRUPTS
      33:      498B590F         moveq rbx, [r9+F]
      37:      FFD3             call  rbx
      39:      EBEC             jmp   L1
L3:   41:      41FFA6E7020000   jmp   [r14+2E7]        ; SYSTEM::*%WRONG-NUMBER-OF-ARGUMENTS-STUB
      48:      90               nop   
      49:      90               nop   
      50:      90               nop   
      51:      90               nop   
NIL

Cl-USER 3 > (disassemble 'bar)
4060014944:
       0:      49396275         cmpq  [r10+75], rsp
       4:      7711             ja    L2
       6:      4883F900         cmpq  rcx, 0
      10:      750B             jne   L2
      12:      4157             push  r15
      14:      55               push  rbp
      15:      4889E5           moveq rbp, rsp
      18:      4989DF           moveq r15, rbx
L1:   21:      EBFE             jmp   L1
L2:   23:      41FFA6E7020000   jmp   [r14+2E7]        ; SYSTEM::*%WRONG-NUMBER-OF-ARGUMENTS-STUB
      30:      90               nop   
      31:      90               nop   
      32:      90               nop   
      33:      90               nop   
      34:      90               nop   
      35:      90               nop   
NIL


-- 
  (espen)

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Rainer Joswig <joswig@lisp.de> writes:

> Hi,
>
> but that's not a yield. It checks for interrupts.

Right, it does quite a bit more than mp:yield does. But it explains why
the loop variant behaves nice while the tagbody variant doesn't.
-- 
  (espen)

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Jeffrey Massung <massung@gmail.com> writes:

> One thing I've seem 
> missing from this discussion is the term "process", which is what LW 
> uses. And there can be multiple processes per HW thread. Processes can 
> run concurrently when on separate HW threads, but when on the same 
> thread must yield to each other (as James has pointed out).
>
> Threads != processes. And they are not (necessarily) 1:1. This is 
> important to remember.

Hmm. I think you just added to the confusion. How OS level threads are
actually executed as HW threads (i.e. in parallell on the CPU) is not
necessary to consider here(*). The important thing is that in LW 6 and
newer, there will be one OS-level *process* for your lispworks program
(I'm using the OS X interpretations of "process" and "thread" now, since
we were talking about LWM) and there will be one OS-level *thread* for
each lisp-level process your program starts. 

(*) Well, I actually have one program that counts the processor cores
before running, because I assume that it doesn't make sense to break up
a cpu-heavy task in more lisp-level processes than the machine has cores.
-- 
  (espen)

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Martin, we covered this in Lisp Support Call #39324, in which I showed
a problem in a multi-threaded stress test. The solution was to add an
MP:YIELD. Loops need a yield point, which will typically be present
since using LOOP or calling a non-inlined/non-safety-0 function will
suffice. A loop using TAGBODY and inlined functions does not have a
yield point, which was the problem in #39324. The stress test was
unrelated to a user needing to interrupt a loop. I wouldn't call LW
threads pre-emptive, even if they seem to be in common cases.

Best,
lmj


On Tue, Jan 28, 2014 at 10:33 AM, Martin Simmons <martin@lispworks.com> wrote:
> Sorry, it seems like the introduction to the Multiprocessing chapter isn't
> clear enough.
>
> On Microsoft Windows, Mac OS X, Linux, x86/x64 Solaris and FreeBSD, each Lisp
> mp:process has a separate native thread and in LispWorks 6.0 and later
> versions these threads can run simultaneously.
>
> As others have noted, the issue with loops is whether they can be interrupted
> by Lisp.  This doesn't affect preemption because the OS can always interrupt a
> thread to allow another one to use CPU, but it does affect communication
> between threads in Lisp.  In particular, it affects GC operations,
> mp:process-interrupt and mp:process-kill.
>
> --
> Martin Simmons
> LispWorks Ltd
> http://www.lispworks.com/
>
>
>>>>>> On Tue, 28 Jan 2014 15:23:40 +0100, Camille Troillard said:
>>
>>
>> Now it makes sense!
>> Thanks Jeffrey.
>>
>> Without knowing this it's hard to understand why MP:YIELD is useful in the first place.
>>
>> I would love to see the documentation clearer on the aspect of the mapping between LW processes and the operating system threads.
>>
>>
>>
>> > On 28 janv. 2014, at 15:12, Jeffrey Massung <massung@gmail.com> wrote:
>> >
>> > One thing I've seem missing from this discussion is the term "process", which is what LW uses. And there can be multiple processes per HW thread. Processes can run concurrently when on separate HW threads, but when on the same thread must yield to each other (as James has pointed out).
>> >
>> > Threads != processes. And they are not (necessarily) 1:1. This is important to remember.
>> >
>> > Jeff M.
>> >> <compose-unknown-contact.jpg>      James M. Lawrence       Tuesday, January 28, 2014 6:30 AM
>> >> A simple way to demonstrate the need for yield points is to run
>> >>
>> >> (funcall (compile nil '(lambda () (tagbody x (go x)))))
>> >>
>> >> This will be uninterruptible and, at least on Linux, often crash the
>> >> GUI. TAGBODY has no yield points, but LOOP does, so
>> >>
>> >> (funcall (compile nil '(lambda () (loop))))
>> >>
>> >> is not a problem. Note this is some kind of compiler magic since we
>> >> don't see MP:YIELD in the macroexpansion of LOOP.
>> >>
>> >> LW threads need yield points, which are inserted automatically into
>> >> non-inlined/non-safety-0 functions, LOOP, and perhaps some other
>> >> forms. This is a common enough case to make it seem like threads are
>> >> pre-emptive.
>> >>
>> >> Regards,
>> >> lmj
>> >>
>> >> _______________________________________________
>> >> Lisp Hug - the mailing list for LispWorks users
>> >> lisp-hug@lispworks.com
>> >> http://www.lispworks.com/support/lisp-hug.html
>>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

2014-01-25 Camille Troillard:
> I would like to know if LispWorks 6 can be considered to have "true pre-emptive multithreading".
>
> If yes or no can you also explain why is it so.

Hello.

Allow me to throw my 2 cents.

LispWorks 6 has SMP. A simple test is:

(dotimes (i 3)
  (mp:process-run-function
   (format nil "p~d" (1+ i))
   ()
   #'(lambda ()
       (let ((start (get-internal-real-time)))
         (loop
          (let ((current (get-internal-real-time)))
            (when (>= (- current start) (* 10 internal-time-units-per-second))
              (return))))))))

In the machine I'm answering from, which has 4 cores, it uses 75% CPU,
or 3 of the 4 cores if you will. If I replace 3 by 4, it tops at about
95% CPU. (I have other things running.)

PS: This test could be improved by mp:process-wait'ing for each
process and checking that the waiting time on a multi-core CPU was not
10 x num-processes.

As for preemptiveness, that's the OS job. Windows, Mac OS X, Linux
(with native threads), Solaris and FreeBSD all have preemptive
processes and threads. LW 6 happens to use a native OS thread for each
active Lisp process on these platforms.

I don't know anything about LW for UNIX, but I think that the
underlying platforms actually have preemptive processes and threads.
There must be a reason for not supporting SMP in these environments,
such as lack of interest or of proper support primitives for e.g.
locking, non-polling waiting functions, etc. Anyway, even though LW
for UNIX might use user-level threads, and thus not take advantage of
multiple cores, the underlying OS still has preemptive processes.

Versions of LW before 6 (at least version 5) already used one OS
thread per active Lisp process. The difference was in which one held a
global lock. So, the previous test would never exceed 25% CPU on this
machine. As such, LW was said to not be SMP, because in practice, Lisp
processes didn't ever run in parallel.

Note that foreign threads could run in parallel, since they didn't
have anything to do with the Lisp lock.

The most probable reason this was so for so long was the garbage
collector, which was not ready to execute simultaneously with Lisp
processes, or orchestrate them. I believe the current GC is a
stop-the-world, it suspends and resumes other Lisp processes during
its thing.

(progn
  (dotimes (i 3)
    (mp:process-run-function
     (format nil "p~d" (1+ i))
     ()
     #'(lambda ()
         (let ((start (get-internal-real-time)))
           (loop
            (let ((current (get-internal-real-time)))
              (when (>= (- current start) (* 10 internal-time-units-per-second))
                (return))))))))
  (mp:process-run-function
   "pgc"
   ()
   #'(lambda ()
       (let ((start (get-internal-real-time)))
         (loop
          (let ((current (get-internal-real-time)))
            (when (>= (- current start) (* 10 internal-time-units-per-second))
              (return))
            #+:lispworks5 (hcl:mark-and-sweep 0)
            #+:lispworks6 (hcl:gc-generation 0)))))))

The previous example averages at about 35% CPU for whatever amount of
processes. It's still above the theoretical 25% (on a 4 core CPU)
where the GC would keep the other processes from achieving anything
other than being suspended and resumed. This it probably because the
GC first tells other Lisp processes to suspend, waits until they're
all suspended, does its thing and resumes the suspended processes.

AFAIKT, the check for interrupts serves mp:process-interrupt, and
maybe the GC orchestration i.e. suspending a process.

About unresponding UI, that's another subject. For instance, in
Windows, if an application doesn't process window messages by a call
to GetMessage, PeekMesage et al, for about 5 seconds, it'll be
regarded as not responding. For that, you should move intensive
processing or blocking operations out of UI threads to non-UI threads.

Or for lengthy but chunkable/resumable operations (e.g. a state
machine), cooperate by processing messages once in a while. or in a
loop like modal dialogs, or by using wait functions that return when
there are messages so you can process them before waiting again. I
think LW has a wait function that processes window messages, so you
don't have to do the latter yourself.

Best regards,

Paulo Madeira

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

The issue is completely independent of the GUI. Attached is the test
case for Lisp Support Call #39324 demonstrating the problem using
messaging with lockless queues. RUN will eventually hang. Adding
MP:YIELD to the TAGBODY fixes it.

The underlying OS thread is preemptive, of course. I said a LW thread
is not preemptive, and by typical terminology I believe that is
accurate. Any thread abstraction requiring yield calls is not _really_
preemptive in my book, even if the necessary yield calls are
automatically inserted in most cases. Having had a case not in the
"most cases", I was ferociously bitten by the issue in a way that I
will never forget.

LispWorks performs poorly in the lparallel benchmarks

  (ql:quickload :lparallel-bench) (lparallel-bench:execute 4)

while SBCL and CCL perform quite well, which is another reason I
believe LispWorks has an unusual threading model. In particular there
is a 10X slowdown where there should be an N-fold speedup for
tight-loop situations like the Fibonacci benchmark. This is consistent
with what I understand about LW needing yield points.


On Tue, Jan 28, 2014 at 3:13 PM, Rainer Joswig <joswig@lisp.de> wrote:
>
> LispWorks 6.1.1 on Mac OS X 10.9.1. quad core i7.
>
> Preemption is done by the OS. It has nothing to do whether the thread is interruptible or not.
> On a timer event, the processor gives control to the OS kernel and the OS kernel then decides what to run next.
>
> For example I can easily create a bunch of threads which make my quad core with 2 threads per core Mac mini be completely busy, but the application GUI does not react anymore. The machine runs the threads at full speed and other applications are still nicely responsive. I'm writing this mail without any problems while LispWorks picks up 790% CPU time.
>
> The tagbody version does not check for interrupts and thus the LispWorks application does not respond anymore.
> It would be great if it did, but it is independent of just thread preemption. Note that 'interrupts' does not mean user interrupts. Interrupts are a basic OS services to communicate between OS processes.
>
> See this example:
>
> (dotimes (i 30)
>   (mp:process-run-function "foo" nil (compile nil '(lambda (&aux (x 5000000000))
>                                                                                (sleep 1)
>                                                                                (tagbody x (decf x)
>                                                                                         (when (plusp x)
>                                                                                           (go x)))))))
>
> I'm running it under LispWorks 6.1.1 on a Mac. LispWorks starts the threads and runs the computation. I can see that my machine is 790% busy. It is a quad core i7 with two threads per core.
>
> After some time LispWorks gets unresponsive. It does not take events from the event loop and the OS displays the busy cursor after a few seconds.
>
> But when the computation is over - I just count down -, LispWorks quits the threads and is responsive again.
>
> The TAGBODY version is also a bit faster than the LOOP version.
>
> It would be great though, if LispWorks could service the event loop, while background processes are running for a longer time, background processes which do not check for interrupts.
>
> @Martin: what would it take in above example to keep LispWorks responsive without help from the background threads? Shouldn't the initial thread run with some higher priority and be able to service the event loop?
>
> Regards,
>
> Rainer Joswig
>
>
> Am 28.01.2014 um 13:30 schrieb James M. Lawrence <llmjjmll@gmail.com>:
>
>>
>> A simple way to demonstrate the need for yield points is to run
>>
>> (funcall (compile nil '(lambda () (tagbody x (go x)))))
>>
>> This will be uninterruptible and, at least on Linux, often crash the
>> GUI. TAGBODY has no yield points, but LOOP does, so
>>
>> (funcall (compile nil '(lambda () (loop))))
>>
>> is not a problem. Note this is some kind of compiler magic since we
>> don't see MP:YIELD in the macroexpansion of LOOP.
>>
>> LW threads need yield points, which are inserted automatically into
>> non-inlined/non-safety-0 functions, LOOP, and perhaps some other
>> forms. This is a common enough case to make it seem like threads are
>> pre-emptive.
>>
>> Regards,
>> lmj
>>
>> On Mon, Jan 27, 2014 at 5:46 PM, Paul Tarvydas <paultarvydas@gmail.com> wrote:
>>> From
>>> http://www.lispworks.com/documentation/lw61/LW/html/lw-295.htm#pgfId-896065
>>>
>>> "Threads running Lisp code can be rescheduled preemptively, so if you call
>>> into Lisp from more than one thread simultaneously and one request takes a
>>> long time then it will not delay the requests in other threads."
>>>
>>>
>>> Up to 5.1, LW ran all code in one process.  With 6.0, true SMP
>>> multi-processing was implemented.  Every top-level window is a separate O/S
>>> process in LWW and LWL.  I believe that MacOS requires that all windows run
>>> under one process (or something like that, I'm not a Mac guy) and LWM
>>> respects this requirement.
>>>
>>>
>>> Here
>>> http://www.lispworks.com/documentation/lw61/LW/html/lw-1140.htm#pgfId-1688841
>>> MP:YIELD is documented:
>>>
>>> "Normally code compiled at safety 0 cannot be preempted because the
>>> necessary checks are ommited. ... if you find that preemption is not working
>>> in a loop with no function calls, yield can be useful."
>>>
>>> So, either
>>>
>>> 1) MP:YIELD is a hold-over from pre-6.0 days, and retained for backwards
>>> compatibility, or,
>>>
>>> 2) There is some hoary edge-case that prevents the operating system from
>>> preempting LW.  I don't believe this one.
>>>
>>> I don't have LWM, so I can't try this out, but you can test it.  Write a
>>> tight loop that has no function calls (check with the disassembler), then
>>> compile it at safety 0 and see if preemption stops.  Spawn 100 processes
>>> (e.g. some number larger than your number of cores) each running the loop.
>>> Do 99 of them get starved?
>>>
>>>
>>> When I first pored over the 6.0 MP manuals, I was confused by
>>> http://www.lispworks.com/documentation/lw61/LW/html/lw-283.htm#pgfId-894689
>>> .  Again, to my eyes, this looks like backwards compatibility with 5.1
>>> and/or a convenience function that allows process readiness to be computed
>>> by a function which is called at random intervals.  My conclusion was -
>>> don't use these functions in >= 6.0 code.
>>>
>>> pt
>>>
>>
>> _______________________________________________
>> Lisp Hug - the mailing list for LispWorks users
>> lisp-hug@lispworks.com
>> http://www.lispworks.com/support/lisp-hug.html
>
>
> _______________________________________________
> Lisp Hug - the mailing list for LispWorks users
> lisp-hug@lispworks.com
> http://www.lispworks.com/support/lisp-hug.html
>

Re: Multithreading question

On Tue, Jan 28, 2014 at 6:23 PM, Rainer Joswig <joswig@lisp.de> wrote:
>
>
> Am 28.01.2014 um 22:48 schrieb James M. Lawrence <llmjjmll@gmail.com>:
>
>> The issue is completely independent of the GUI. Attached is the test
>> case for Lisp Support Call #39324 demonstrating the problem using
>> messaging with lockless queues. RUN will eventually hang. Adding
>> MP:YIELD to the TAGBODY fixes it.
>
> What does 'eventually' mean. I'm running it on my quad-core Mac for five minutes now.
> Does it take longer to show?

It hangs immediately or almost immediately on quad-core Linux. Did you
compile it?

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Pascal Costanza <pc@p-cos.net> writes:

> My experience seems very different from yours. 

I recently had very good experiences converting a heavy computation
(millions of correlation coefficients each based on several hundred data
points) to run in parallell (I used as many threads as there are cpu
cores on the given machine) on LW 6.1 running on debian. In fact, there
was not very much overhead, the computation ran fairly close to in 1/N
of the original time (where N is the number of cpu cores). But while
each correlation coefficient is computed with heavily optimized code,
the main loop of each thread does not inline those function calls, and
that, I guess, makes the code behave LispWorks-friendly.

> We have never used yield for anything. When looking at the disassembly
> of yield, it looks like a no-op to me.

I was puzzled by that too. Is the point that the function call to
mp:yield (which compiles to a simple CALL) itself makes the thread
interruptable by lisp? I never took that assembly programming course I
should have taken (too many other courses in my curriculium ;-)) so I
feel like a complete ignorant when trying to understand these things...
-- 
  (espen)

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

On Wed, Jan 29, 2014 at 5:37 AM, Rainer Joswig <joswig@lisp.de> wrote:
>
> Hi,
>
> I was running James' code a bit on my quad core Mac yesterday.
>
> It is very unpredictable - sometimes it runs for many minutes and sometimes it got problems early. Since it can run sometimes for a long time, without the need of yield, I don't think that's a problem.
>
> I was using it from the terminal with multi-processing enabled.
>
> >From looking at the code I'm not completely convinced that it is thread safe and the example code can then go into endless loops. I can see that there are still threads running (the machine is still busy on one or more cores).
>
> If mp:yield would call an OS yield, then the thread would get stopped and put at the end of the queue. This could prevent a race condition between enqueue and dequeue. The yield would be in the dequeue and would put the dequeue threads at the end of the list of threads to execute.
>
> >From some testing I have the suspicion that the queue code has a problem - but can't pin that down yet. The enqueue function uses a thread-safe way to add to the end of the queue, but then the queue tail slot is modified.
>
> I guess I can play a bit more with it in the evening...

(defun enqueue (value queue)
  ;; Attempt CAS, repeat upon failure. Upon success update QUEUE-TAIL.
  (let ((new (cons value nil)))
    (loop (when (compare-and-swap (cdr (queue-tail queue)) nil new)
            (setf (queue-tail queue) new)
            (return value)))))

You are not alone in puzzling over ENQUEUE. A similar stress test
revealed a bug in Clozure, and it was only after some discussion that
the Clozure maintainer became convinced it was a Clozure bug.
Naturally, the initial reaction _should_ be that the queue code is
wrong. On the other hand, race conditions inside an implementation can
hide for years until stress testing roots them out, and in fact I've
found several bugs relating to multi-threading in Clozure (and one in
SBCL).

I had explained ENQUEUE like this
(http://trac.clozure.com/ccl/ticket/1058#comment:8):

It seems to me that the single CAS in ENQUEUE provides necessary and
sufficient ordering.

After a thread writes to the tail slot, another thread may read the
old tail for some length of time. If so then that's fine -- CAS will
fail for some number of iterations, and no harm is done.

Therefore a write to the tail slot is always preceded by a valid read.
And a valid read must have been preceded by a completed write.
Therefore writes to the tail slot are ordered as a consequence of CAS
writes being ordered.

Best,
lmj

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

As shown in the lparallel benchmarks, LispWorks does produce speedup
in kinds of cases that people have described in their responses.

However, LispWorks fails at a very specific kind of benchmark
(involving something like the Cilk algorithm
http://supertech.csail.mit.edu/cilk/) which requires fine-grained
context switching and a low overhead for threading. As I mentioned,
LispWorks is not just slightly bad in this case, but has a 10-fold
slowdown where SBCL and CCL achieve N-fold speedup. The problem is
only apparent when we choose a small function like Fibonacci where the
threading overhead is significant.

Thus the issue is not whether LispWorks does true paralellism -- most
obviously it does -- but what makes its threading model so different
from SBCL and CCL, and why it performs so poorly in this one case. The
requirement of LW threads needing yield points is a natural place to
look.

I'm not asking anyone to solve the problem for me -- mainly, I just
wanted to clarify that lparallel does in fact work fine with LispWorks
(except in this one special case).

Best,
lmj

On Wed, Jan 29, 2014 at 2:56 AM, Pascal Costanza <pc@p-cos.net> wrote:
> Hi,
>
> My experience seems very different from yours. We have used LispWorks's SMP very successfully and repeatedly achieved extremely good performance on multi-core CPUs. We can reliably get close in performance to equivalent code written in C++ with Threading Building Blocks, and on (rare) occasions even slightly better. (Incidentally, I will talk about some of our experiences in my talk at ELS in Paris this year, which will indeed focus on LispWorks.)
>
> We don't use a portable library for these purposes, but our own work-stealing scheduler that directly uses LispWorks facilities. It seems to me that it requires deeper investigation why you can't achieve similar speed ups with lparallel.
>
> We have never used yield for anything. When looking at the disassembly of yield, it looks like a no-op to me.
>
> Best,
> Pascal
>
> On 28 Jan 2014, at 22:48, James M. Lawrence <llmjjmll@gmail.com> wrote:
>
>> The issue is completely independent of the GUI. Attached is the test
>> case for Lisp Support Call #39324 demonstrating the problem using
>> messaging with lockless queues. RUN will eventually hang. Adding
>> MP:YIELD to the TAGBODY fixes it.
>>
>> The underlying OS thread is preemptive, of course. I said a LW thread
>> is not preemptive, and by typical terminology I believe that is
>> accurate. Any thread abstraction requiring yield calls is not _really_
>> preemptive in my book, even if the necessary yield calls are
>> automatically inserted in most cases. Having had a case not in the
>> "most cases", I was ferociously bitten by the issue in a way that I
>> will never forget.
>>
>> LispWorks performs poorly in the lparallel benchmarks
>>
>>  (ql:quickload :lparallel-bench) (lparallel-bench:execute 4)
>>
>> while SBCL and CCL perform quite well, which is another reason I
>> believe LispWorks has an unusual threading model. In particular there
>> is a 10X slowdown where there should be an N-fold speedup for
>> tight-loop situations like the Fibonacci benchmark. This is consistent
>> with what I understand about LW needing yield points.

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

The problem isn't the scheduler -- the algorithm isn't
scheduling-intensive. There are two schedulers available in lparallel:
a central-queue scheduler and a lock-free stealing scheduler. A
compile-time option decides between them. Both schedulers give good
performance on SBCL and CCL, while on LispWorks both schedulers give
super-bad performance in this one special case. In the
non-special-cases, LispWorks performs well with either scheduler.

The algorithm resembles Cilk in some respects but differs in other
respects. Since more detail would be (more) off-topic for this list,
I'll follow up in private email.

Thanks,
lmj

On Tue, Feb 4, 2014 at 2:46 AM, Pascal Costanza <pc@p-cos.net> wrote:
> Hi,
>
> It's exactly with Cilk-like work stealing where we achieved the best performance results with LispWorks. So your experience seems very different from ours.
>
> I hope I will be able to open source our own scheduler very soon, then we can compare.
>
> Where in your code basis do I need to look? I'd like to check what you're doing, maybe I notice something...
>
> Pascal
>
> Sent from my iPad
>
>> On 3 Feb 2014, at 20:33, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>
>>
>> As shown in the lparallel benchmarks, LispWorks does produce speedup
>> in kinds of cases that people have described in their responses.
>>
>> However, LispWorks fails at a very specific kind of benchmark
>> (involving something like the Cilk algorithm
>> http://supertech.csail.mit.edu/cilk/) which requires fine-grained
>> context switching and a low overhead for threading. As I mentioned,
>> LispWorks is not just slightly bad in this case, but has a 10-fold
>> slowdown where SBCL and CCL achieve N-fold speedup. The problem is
>> only apparent when we choose a small function like Fibonacci where the
>> threading overhead is significant.
>>
>> Thus the issue is not whether LispWorks does true paralellism -- most
>> obviously it does -- but what makes its threading model so different
>> from SBCL and CCL, and why it performs so poorly in this one case. The
>> requirement of LW threads needing yield points is a natural place to
>> look.
>>
>> I'm not asking anyone to solve the problem for me -- mainly, I just
>> wanted to clarify that lparallel does in fact work fine with LispWorks
>> (except in this one special case).
>>
>> Best,
>> lmj
>>
>>> On Wed, Jan 29, 2014 at 2:56 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>> Hi,
>>>
>>> My experience seems very different from yours. We have used LispWorks's SMP very successfully and repeatedly achieved extremely good performance on multi-core CPUs. We can reliably get close in performance to equivalent code written in C++ with Threading Building Blocks, and on (rare) occasions even slightly better. (Incidentally, I will talk about some of our experiences in my talk at ELS in Paris this year, which will indeed focus on LispWorks.)
>>>
>>> We don't use a portable library for these purposes, but our own work-stealing scheduler that directly uses LispWorks facilities. It seems to me that it requires deeper investigation why you can't achieve similar speed ups with lparallel.
>>>
>>> We have never used yield for anything. When looking at the disassembly of yield, it looks like a no-op to me.
>>>
>>> Best,
>>> Pascal
>>>
>>>> On 28 Jan 2014, at 22:48, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>
>>>> The issue is completely independent of the GUI. Attached is the test
>>>> case for Lisp Support Call #39324 demonstrating the problem using
>>>> messaging with lockless queues. RUN will eventually hang. Adding
>>>> MP:YIELD to the TAGBODY fixes it.
>>>>
>>>> The underlying OS thread is preemptive, of course. I said a LW thread
>>>> is not preemptive, and by typical terminology I believe that is
>>>> accurate. Any thread abstraction requiring yield calls is not _really_
>>>> preemptive in my book, even if the necessary yield calls are
>>>> automatically inserted in most cases. Having had a case not in the
>>>> "most cases", I was ferociously bitten by the issue in a way that I
>>>> will never forget.
>>>>
>>>> LispWorks performs poorly in the lparallel benchmarks
>>>>
>>>> (ql:quickload :lparallel-bench) (lparallel-bench:execute 4)
>>>>
>>>> while SBCL and CCL perform quite well, which is another reason I
>>>> believe LispWorks has an unusual threading model. In particular there
>>>> is a 10X slowdown where there should be an N-fold speedup for
>>>> tight-loop situations like the Fibonacci benchmark. This is consistent
>>>> with what I understand about LW needing yield points.
>>
>> _______________________________________________
>> Lisp Hug - the mailing list for LispWorks users
>> lisp-hug@lispworks.com
>> http://www.lispworks.com/support/lisp-hug.html
>>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

Pascal,

lparallel is not based upon futures. The promises/futures package is
completely independent of the rest of the project (save for PLET). In
particular it is not related to the Cilk-like algorithm in lparallel
(which uses a different PLET).

The lparallel documentation has an example showing how parallelism
based upon futures can be catastrophically bad
(http://lparallel.org/defpun/). Future-based parallelism fails to
deliver in the case of Fibonacci; there is no question about that, and
we certainly agree on that point. This problem -- the one that you
mention -- is solved by the Cilk-like algorithm in lparallel.

One purpose of lparallel is to provide parallelized variants of some
CL functions -- pmap, preduce, etc. -- which I call cognates. I wanted
these functions to match or beat their CL counterparts even in the
vicinity of worst-case scenarios (small data), so that they may be
used without hesitation in many cases.

The lparallel benchmarks are measuring two things: the performance of
some cognates, and the performance of the Cilk-like algorithm. In
comparison to SBCL and CCL, LispWorks doesn't fare particularly well
with the cognates, but it is still OK (the difference is greatly more
significant than the scheduling overhead, so again, scheduling is not
the issue). This is quite distinct from the 10X slowdown in the
Fibonacci benchmark. My story hasn't changed; I am just focusing on
the serious problem.

I am thrilled that you have taken an interest in hunting down this
LispWorks-only issue. The Cilk-like algorithm differs from Cilk and
will require some explanation, which I am very much willing to
provide. I have found threading bugs in every Lisp implementation I
have tested with lparallel, including in LispWorks (6.0, fixed before
I reported it). I hope, at least, to be taken seriously. And I would
ask -- plead! -- that you take the time to understand the issue before
drawing conclusions.

Regards,
lmj


On Wed, Feb 5, 2014 at 1:59 AM, Pascal Costanza <pc@p-cos.net> wrote:
> [I don't agree to move the discussion to private email, because I believe your statements throw a wrong light on LispWorks's support for SMP, and I believe this needs to be corrected.]
>
> A previous post of yours seemed to suggest that LispWorks performs poor in general (but I may have misunderstood you), while your recent post below narrows the claim down to one particular case, at least for me. If I understand correctly (please correct me if I'm wrong), the particular case is a parallel version of the Fibonacci function. Alas, I get very good performance out of the Fibonacci function as well with our own scheduler.
>
> When looking at lparallel, it appears to me that it's based on a future abstraction, where a force function is used to synchronize and wait for a result that is potentially still executing in parallel, or hasn't yet started executing yet.
>
> While futures are very flexible and useful for expressing concurrency, they don't guarantee good performance. They may speed up computations, but they also may not. This is a characteristic of futures, and not of particular implementations of futures. It may be that the slowdown you notice is because of an aspect of how LispWorks provides SMP, but it may also be because of this characteristic of futures.
>
> Cilk-style scheduling is based on the notion of work-stealing schedulers (which you seem to provide in lparallel), but also on fork/join computations, which require a much stricter expression of parallel algorithms than with futures. They are a lot less flexible than futures, but in return provide guaranteed speed ups, both in theory and in practice.
>
> The paper at http://supertech.csail.mit.edu/papers/steal.pdf describes this in detail. Especially, it stresses the importance of "fully strict" computations, and describes how this property is exploited by Cilk to guarantee speed ups. Fork/join computations yield "fully strict" computations, but futures don't.
>
> Pascal
>
> On 5 Feb 2014, at 04:32, James M. Lawrence <llmjjmll@gmail.com> wrote:
>
>>
>> The problem isn't the scheduler -- the algorithm isn't
>> scheduling-intensive. There are two schedulers available in lparallel:
>> a central-queue scheduler and a lock-free stealing scheduler. A
>> compile-time option decides between them. Both schedulers give good
>> performance on SBCL and CCL, while on LispWorks both schedulers give
>> super-bad performance in this one special case. In the
>> non-special-cases, LispWorks performs well with either scheduler.
>>
>> The algorithm resembles Cilk in some respects but differs in other
>> respects. Since more detail would be (more) off-topic for this list,
>> I'll follow up in private email.
>>
>> Thanks,
>> lmj
>>
>> On Tue, Feb 4, 2014 at 2:46 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>> Hi,
>>>
>>> It's exactly with Cilk-like work stealing where we achieved the best performance results with LispWorks. So your experience seems very different from ours.
>>>
>>> I hope I will be able to open source our own scheduler very soon, then we can compare.
>>>
>>> Where in your code basis do I need to look? I'd like to check what you're doing, maybe I notice something...
>>>
>>> Pascal
>>>
>>> Sent from my iPad
>>>
>>>> On 3 Feb 2014, at 20:33, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>>>
>>>>
>>>> As shown in the lparallel benchmarks, LispWorks does produce speedup
>>>> in kinds of cases that people have described in their responses.
>>>>
>>>> However, LispWorks fails at a very specific kind of benchmark
>>>> (involving something like the Cilk algorithm
>>>> http://supertech.csail.mit.edu/cilk/) which requires fine-grained
>>>> context switching and a low overhead for threading. As I mentioned,
>>>> LispWorks is not just slightly bad in this case, but has a 10-fold
>>>> slowdown where SBCL and CCL achieve N-fold speedup. The problem is
>>>> only apparent when we choose a small function like Fibonacci where the
>>>> threading overhead is significant.
>>>>
>>>> Thus the issue is not whether LispWorks does true paralellism -- most
>>>> obviously it does -- but what makes its threading model so different
>>>> from SBCL and CCL, and why it performs so poorly in this one case. The
>>>> requirement of LW threads needing yield points is a natural place to
>>>> look.
>>>>
>>>> I'm not asking anyone to solve the problem for me -- mainly, I just
>>>> wanted to clarify that lparallel does in fact work fine with LispWorks
>>>> (except in this one special case).
>>>>
>>>> Best,
>>>> lmj
>>>>
>>>>> On Wed, Jan 29, 2014 at 2:56 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>>> Hi,
>>>>>
>>>>> My experience seems very different from yours. We have used LispWorks's SMP very successfully and repeatedly achieved extremely good performance on multi-core CPUs. We can reliably get close in performance to equivalent code written in C++ with Threading Building Blocks, and on (rare) occasions even slightly better. (Incidentally, I will talk about some of our experiences in my talk at ELS in Paris this year, which will indeed focus on LispWorks.)
>>>>>
>>>>> We don't use a portable library for these purposes, but our own work-stealing scheduler that directly uses LispWorks facilities. It seems to me that it requires deeper investigation why you can't achieve similar speed ups with lparallel.
>>>>>
>>>>> We have never used yield for anything. When looking at the disassembly of yield, it looks like a no-op to me.
>>>>>
>>>>> Best,
>>>>> Pascal
>>>>>
>>>>>> On 28 Jan 2014, at 22:48, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>>>
>>>>>> The issue is completely independent of the GUI. Attached is the test
>>>>>> case for Lisp Support Call #39324 demonstrating the problem using
>>>>>> messaging with lockless queues. RUN will eventually hang. Adding
>>>>>> MP:YIELD to the TAGBODY fixes it.
>>>>>>
>>>>>> The underlying OS thread is preemptive, of course. I said a LW thread
>>>>>> is not preemptive, and by typical terminology I believe that is
>>>>>> accurate. Any thread abstraction requiring yield calls is not _really_
>>>>>> preemptive in my book, even if the necessary yield calls are
>>>>>> automatically inserted in most cases. Having had a case not in the
>>>>>> "most cases", I was ferociously bitten by the issue in a way that I
>>>>>> will never forget.
>>>>>>
>>>>>> LispWorks performs poorly in the lparallel benchmarks
>>>>>>
>>>>>> (ql:quickload :lparallel-bench) (lparallel-bench:execute 4)
>>>>>>
>>>>>> while SBCL and CCL perform quite well, which is another reason I
>>>>>> believe LispWorks has an unusual threading model. In particular there
>>>>>> is a 10X slowdown where there should be an N-fold speedup for
>>>>>> tight-loop situations like the Fibonacci benchmark. This is consistent
>>>>>> with what I understand about LW needing yield points.
>>>>
>>>> _______________________________________________
>>>> Lisp Hug - the mailing list for LispWorks users
>>>> lisp-hug@lispworks.com
>>>> http://www.lispworks.com/support/lisp-hug.html
>>>>
>>
>> _______________________________________________
>> Lisp Hug - the mailing list for LispWorks users
>> lisp-hug@lispworks.com
>> http://www.lispworks.com/support/lisp-hug.html
>>
>
> --
> Pascal Costanza
> The views expressed in this email are my own, and not those of my employer.
>
>
>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

(Responding to myself.) After learning that LW needed these yield
points scattered about code, coupled with the poor performance I was
seeing, I had suspected there was something peculiar about LW
threading. As I mentioned, I knew they were OS threads, but I got the
impression there was some kind of historical cruft lying underneath.
(The yield point requirement still seems peculiar, but never mind.)

Now that I'm convinced the expected performance should be good, I
should whittle down the issue to a small, self-contained test case.


On Wed, Feb 5, 2014 at 6:02 PM, James M. Lawrence <llmjjmll@gmail.com> wrote:
> Pascal,
>
> lparallel is not based upon futures. The promises/futures package is
> completely independent of the rest of the project (save for PLET). In
> particular it is not related to the Cilk-like algorithm in lparallel
> (which uses a different PLET).
>
> The lparallel documentation has an example showing how parallelism
> based upon futures can be catastrophically bad
> (http://lparallel.org/defpun/). Future-based parallelism fails to
> deliver in the case of Fibonacci; there is no question about that, and
> we certainly agree on that point. This problem -- the one that you
> mention -- is solved by the Cilk-like algorithm in lparallel.
>
> One purpose of lparallel is to provide parallelized variants of some
> CL functions -- pmap, preduce, etc. -- which I call cognates. I wanted
> these functions to match or beat their CL counterparts even in the
> vicinity of worst-case scenarios (small data), so that they may be
> used without hesitation in many cases.
>
> The lparallel benchmarks are measuring two things: the performance of
> some cognates, and the performance of the Cilk-like algorithm. In
> comparison to SBCL and CCL, LispWorks doesn't fare particularly well
> with the cognates, but it is still OK (the difference is greatly more
> significant than the scheduling overhead, so again, scheduling is not
> the issue). This is quite distinct from the 10X slowdown in the
> Fibonacci benchmark. My story hasn't changed; I am just focusing on
> the serious problem.
>
> I am thrilled that you have taken an interest in hunting down this
> LispWorks-only issue. The Cilk-like algorithm differs from Cilk and
> will require some explanation, which I am very much willing to
> provide. I have found threading bugs in every Lisp implementation I
> have tested with lparallel, including in LispWorks (6.0, fixed before
> I reported it). I hope, at least, to be taken seriously. And I would
> ask -- plead! -- that you take the time to understand the issue before
> drawing conclusions.
>
> Regards,
> lmj
>
>
> On Wed, Feb 5, 2014 at 1:59 AM, Pascal Costanza <pc@p-cos.net> wrote:
>> [I don't agree to move the discussion to private email, because I believe your statements throw a wrong light on LispWorks's support for SMP, and I believe this needs to be corrected.]
>>
>> A previous post of yours seemed to suggest that LispWorks performs poor in general (but I may have misunderstood you), while your recent post below narrows the claim down to one particular case, at least for me. If I understand correctly (please correct me if I'm wrong), the particular case is a parallel version of the Fibonacci function. Alas, I get very good performance out of the Fibonacci function as well with our own scheduler.
>>
>> When looking at lparallel, it appears to me that it's based on a future abstraction, where a force function is used to synchronize and wait for a result that is potentially still executing in parallel, or hasn't yet started executing yet.
>>
>> While futures are very flexible and useful for expressing concurrency, they don't guarantee good performance. They may speed up computations, but they also may not. This is a characteristic of futures, and not of particular implementations of futures. It may be that the slowdown you notice is because of an aspect of how LispWorks provides SMP, but it may also be because of this characteristic of futures.
>>
>> Cilk-style scheduling is based on the notion of work-stealing schedulers (which you seem to provide in lparallel), but also on fork/join computations, which require a much stricter expression of parallel algorithms than with futures. They are a lot less flexible than futures, but in return provide guaranteed speed ups, both in theory and in practice.
>>
>> The paper at http://supertech.csail.mit.edu/papers/steal.pdf describes this in detail. Especially, it stresses the importance of "fully strict" computations, and describes how this property is exploited by Cilk to guarantee speed ups. Fork/join computations yield "fully strict" computations, but futures don't.
>>
>> Pascal
>>
>> On 5 Feb 2014, at 04:32, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>
>>>
>>> The problem isn't the scheduler -- the algorithm isn't
>>> scheduling-intensive. There are two schedulers available in lparallel:
>>> a central-queue scheduler and a lock-free stealing scheduler. A
>>> compile-time option decides between them. Both schedulers give good
>>> performance on SBCL and CCL, while on LispWorks both schedulers give
>>> super-bad performance in this one special case. In the
>>> non-special-cases, LispWorks performs well with either scheduler.
>>>
>>> The algorithm resembles Cilk in some respects but differs in other
>>> respects. Since more detail would be (more) off-topic for this list,
>>> I'll follow up in private email.
>>>
>>> Thanks,
>>> lmj
>>>
>>> On Tue, Feb 4, 2014 at 2:46 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>> Hi,
>>>>
>>>> It's exactly with Cilk-like work stealing where we achieved the best performance results with LispWorks. So your experience seems very different from ours.
>>>>
>>>> I hope I will be able to open source our own scheduler very soon, then we can compare.
>>>>
>>>> Where in your code basis do I need to look? I'd like to check what you're doing, maybe I notice something...
>>>>
>>>> Pascal
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 3 Feb 2014, at 20:33, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>>>>
>>>>>
>>>>> As shown in the lparallel benchmarks, LispWorks does produce speedup
>>>>> in kinds of cases that people have described in their responses.
>>>>>
>>>>> However, LispWorks fails at a very specific kind of benchmark
>>>>> (involving something like the Cilk algorithm
>>>>> http://supertech.csail.mit.edu/cilk/) which requires fine-grained
>>>>> context switching and a low overhead for threading. As I mentioned,
>>>>> LispWorks is not just slightly bad in this case, but has a 10-fold
>>>>> slowdown where SBCL and CCL achieve N-fold speedup. The problem is
>>>>> only apparent when we choose a small function like Fibonacci where the
>>>>> threading overhead is significant.
>>>>>
>>>>> Thus the issue is not whether LispWorks does true paralellism -- most
>>>>> obviously it does -- but what makes its threading model so different
>>>>> from SBCL and CCL, and why it performs so poorly in this one case. The
>>>>> requirement of LW threads needing yield points is a natural place to
>>>>> look.
>>>>>
>>>>> I'm not asking anyone to solve the problem for me -- mainly, I just
>>>>> wanted to clarify that lparallel does in fact work fine with LispWorks
>>>>> (except in this one special case).
>>>>>
>>>>> Best,
>>>>> lmj
>>>>>
>>>>>> On Wed, Jan 29, 2014 at 2:56 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> My experience seems very different from yours. We have used LispWorks's SMP very successfully and repeatedly achieved extremely good performance on multi-core CPUs. We can reliably get close in performance to equivalent code written in C++ with Threading Building Blocks, and on (rare) occasions even slightly better. (Incidentally, I will talk about some of our experiences in my talk at ELS in Paris this year, which will indeed focus on LispWorks.)
>>>>>>
>>>>>> We don't use a portable library for these purposes, but our own work-stealing scheduler that directly uses LispWorks facilities. It seems to me that it requires deeper investigation why you can't achieve similar speed ups with lparallel.
>>>>>>
>>>>>> We have never used yield for anything. When looking at the disassembly of yield, it looks like a no-op to me.
>>>>>>
>>>>>> Best,
>>>>>> Pascal
>>>>>>
>>>>>>> On 28 Jan 2014, at 22:48, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>>>>
>>>>>>> The issue is completely independent of the GUI. Attached is the test
>>>>>>> case for Lisp Support Call #39324 demonstrating the problem using
>>>>>>> messaging with lockless queues. RUN will eventually hang. Adding
>>>>>>> MP:YIELD to the TAGBODY fixes it.
>>>>>>>
>>>>>>> The underlying OS thread is preemptive, of course. I said a LW thread
>>>>>>> is not preemptive, and by typical terminology I believe that is
>>>>>>> accurate. Any thread abstraction requiring yield calls is not _really_
>>>>>>> preemptive in my book, even if the necessary yield calls are
>>>>>>> automatically inserted in most cases. Having had a case not in the
>>>>>>> "most cases", I was ferociously bitten by the issue in a way that I
>>>>>>> will never forget.
>>>>>>>
>>>>>>> LispWorks performs poorly in the lparallel benchmarks
>>>>>>>
>>>>>>> (ql:quickload :lparallel-bench) (lparallel-bench:execute 4)
>>>>>>>
>>>>>>> while SBCL and CCL perform quite well, which is another reason I
>>>>>>> believe LispWorks has an unusual threading model. In particular there
>>>>>>> is a 10X slowdown where there should be an N-fold speedup for
>>>>>>> tight-loop situations like the Fibonacci benchmark. This is consistent
>>>>>>> with what I understand about LW needing yield points.
>>>>>
>>>>> _______________________________________________
>>>>> Lisp Hug - the mailing list for LispWorks users
>>>>> lisp-hug@lispworks.com
>>>>> http://www.lispworks.com/support/lisp-hug.html
>>>>>
>>>
>>> _______________________________________________
>>> Lisp Hug - the mailing list for LispWorks users
>>> lisp-hug@lispworks.com
>>> http://www.lispworks.com/support/lisp-hug.html
>>>
>>
>> --
>> Pascal Costanza
>> The views expressed in this email are my own, and not those of my employer.
>>
>>
>>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Multithreading question

My apologies to LispWorks for jumping to conclusions and dissing LW
threads. Whatever the underlying issue in this particular case turns
out to be, I think it will be interesting.


On Thu, Feb 6, 2014 at 12:53 AM, Pascal Costanza <pc@p-cos.net> wrote:
> I apologize for misrepresenting your library. I agree that identifying and solving the real underlying issue should be our focus. I hope I can contribute to that. More hopefully later...
>
> Pascal
>
> Sent from my iPad
>
>> On 6 Feb 2014, at 02:18, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>
>>
>> (Responding to myself.) After learning that LW needed these yield
>> points scattered about code, coupled with the poor performance I was
>> seeing, I had suspected there was something peculiar about LW
>> threading. As I mentioned, I knew they were OS threads, but I got the
>> impression there was some kind of historical cruft lying underneath.
>> (The yield point requirement still seems peculiar, but never mind.)
>>
>> Now that I'm convinced the expected performance should be good, I
>> should whittle down the issue to a small, self-contained test case.
>>
>>
>>> On Wed, Feb 5, 2014 at 6:02 PM, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>> Pascal,
>>>
>>> lparallel is not based upon futures. The promises/futures package is
>>> completely independent of the rest of the project (save for PLET). In
>>> particular it is not related to the Cilk-like algorithm in lparallel
>>> (which uses a different PLET).
>>>
>>> The lparallel documentation has an example showing how parallelism
>>> based upon futures can be catastrophically bad
>>> (http://lparallel.org/defpun/). Future-based parallelism fails to
>>> deliver in the case of Fibonacci; there is no question about that, and
>>> we certainly agree on that point. This problem -- the one that you
>>> mention -- is solved by the Cilk-like algorithm in lparallel.
>>>
>>> One purpose of lparallel is to provide parallelized variants of some
>>> CL functions -- pmap, preduce, etc. -- which I call cognates. I wanted
>>> these functions to match or beat their CL counterparts even in the
>>> vicinity of worst-case scenarios (small data), so that they may be
>>> used without hesitation in many cases.
>>>
>>> The lparallel benchmarks are measuring two things: the performance of
>>> some cognates, and the performance of the Cilk-like algorithm. In
>>> comparison to SBCL and CCL, LispWorks doesn't fare particularly well
>>> with the cognates, but it is still OK (the difference is greatly more
>>> significant than the scheduling overhead, so again, scheduling is not
>>> the issue). This is quite distinct from the 10X slowdown in the
>>> Fibonacci benchmark. My story hasn't changed; I am just focusing on
>>> the serious problem.
>>>
>>> I am thrilled that you have taken an interest in hunting down this
>>> LispWorks-only issue. The Cilk-like algorithm differs from Cilk and
>>> will require some explanation, which I am very much willing to
>>> provide. I have found threading bugs in every Lisp implementation I
>>> have tested with lparallel, including in LispWorks (6.0, fixed before
>>> I reported it). I hope, at least, to be taken seriously. And I would
>>> ask -- plead! -- that you take the time to understand the issue before
>>> drawing conclusions.
>>>
>>> Regards,
>>> lmj
>>>
>>>
>>>> On Wed, Feb 5, 2014 at 1:59 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>> [I don't agree to move the discussion to private email, because I believe your statements throw a wrong light on LispWorks's support for SMP, and I believe this needs to be corrected.]
>>>>
>>>> A previous post of yours seemed to suggest that LispWorks performs poor in general (but I may have misunderstood you), while your recent post below narrows the claim down to one particular case, at least for me. If I understand correctly (please correct me if I'm wrong), the particular case is a parallel version of the Fibonacci function. Alas, I get very good performance out of the Fibonacci function as well with our own scheduler.
>>>>
>>>> When looking at lparallel, it appears to me that it's based on a future abstraction, where a force function is used to synchronize and wait for a result that is potentially still executing in parallel, or hasn't yet started executing yet.
>>>>
>>>> While futures are very flexible and useful for expressing concurrency, they don't guarantee good performance. They may speed up computations, but they also may not. This is a characteristic of futures, and not of particular implementations of futures. It may be that the slowdown you notice is because of an aspect of how LispWorks provides SMP, but it may also be because of this characteristic of futures.
>>>>
>>>> Cilk-style scheduling is based on the notion of work-stealing schedulers (which you seem to provide in lparallel), but also on fork/join computations, which require a much stricter expression of parallel algorithms than with futures. They are a lot less flexible than futures, but in return provide guaranteed speed ups, both in theory and in practice.
>>>>
>>>> The paper at http://supertech.csail.mit.edu/papers/steal.pdf describes this in detail. Especially, it stresses the importance of "fully strict" computations, and describes how this property is exploited by Cilk to guarantee speed ups. Fork/join computations yield "fully strict" computations, but futures don't.
>>>>
>>>> Pascal
>>>>
>>>>> On 5 Feb 2014, at 04:32, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>>
>>>>>
>>>>> The problem isn't the scheduler -- the algorithm isn't
>>>>> scheduling-intensive. There are two schedulers available in lparallel:
>>>>> a central-queue scheduler and a lock-free stealing scheduler. A
>>>>> compile-time option decides between them. Both schedulers give good
>>>>> performance on SBCL and CCL, while on LispWorks both schedulers give
>>>>> super-bad performance in this one special case. In the
>>>>> non-special-cases, LispWorks performs well with either scheduler.
>>>>>
>>>>> The algorithm resembles Cilk in some respects but differs in other
>>>>> respects. Since more detail would be (more) off-topic for this list,
>>>>> I'll follow up in private email.
>>>>>
>>>>> Thanks,
>>>>> lmj
>>>>>
>>>>>> On Tue, Feb 4, 2014 at 2:46 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> It's exactly with Cilk-like work stealing where we achieved the best performance results with LispWorks. So your experience seems very different from ours.
>>>>>>
>>>>>> I hope I will be able to open source our own scheduler very soon, then we can compare.
>>>>>>
>>>>>> Where in your code basis do I need to look? I'd like to check what you're doing, maybe I notice something...
>>>>>>
>>>>>> Pascal
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On 3 Feb 2014, at 20:33, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> As shown in the lparallel benchmarks, LispWorks does produce speedup
>>>>>>> in kinds of cases that people have described in their responses.
>>>>>>>
>>>>>>> However, LispWorks fails at a very specific kind of benchmark
>>>>>>> (involving something like the Cilk algorithm
>>>>>>> http://supertech.csail.mit.edu/cilk/) which requires fine-grained
>>>>>>> context switching and a low overhead for threading. As I mentioned,
>>>>>>> LispWorks is not just slightly bad in this case, but has a 10-fold
>>>>>>> slowdown where SBCL and CCL achieve N-fold speedup. The problem is
>>>>>>> only apparent when we choose a small function like Fibonacci where the
>>>>>>> threading overhead is significant.
>>>>>>>
>>>>>>> Thus the issue is not whether LispWorks does true paralellism -- most
>>>>>>> obviously it does -- but what makes its threading model so different
>>>>>>> from SBCL and CCL, and why it performs so poorly in this one case. The
>>>>>>> requirement of LW threads needing yield points is a natural place to
>>>>>>> look.
>>>>>>>
>>>>>>> I'm not asking anyone to solve the problem for me -- mainly, I just
>>>>>>> wanted to clarify that lparallel does in fact work fine with LispWorks
>>>>>>> (except in this one special case).
>>>>>>>
>>>>>>> Best,
>>>>>>> lmj
>>>>>>>
>>>>>>>> On Wed, Jan 29, 2014 at 2:56 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> My experience seems very different from yours. We have used LispWorks's SMP very successfully and repeatedly achieved extremely good performance on multi-core CPUs. We can reliably get close in performance to equivalent code written in C++ with Threading Building Blocks, and on (rare) occasions even slightly better. (Incidentally, I will talk about some of our experiences in my talk at ELS in Paris this year, which will indeed focus on LispWorks.)
>>>>>>>>
>>>>>>>> We don't use a portable library for these purposes, but our own work-stealing scheduler that directly uses LispWorks facilities. It seems to me that it requires deeper investigation why you can't achieve similar speed ups with lparallel.
>>>>>>>>
>>>>>>>> We have never used yield for anything. When looking at the disassembly of yield, it looks like a no-op to me.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Pascal
>>>>>>>>
>>>>>>>>> On 28 Jan 2014, at 22:48, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> The issue is completely independent of the GUI. Attached is the test
>>>>>>>>> case for Lisp Support Call #39324 demonstrating the problem using
>>>>>>>>> messaging with lockless queues. RUN will eventually hang. Adding
>>>>>>>>> MP:YIELD to the TAGBODY fixes it.
>>>>>>>>>
>>>>>>>>> The underlying OS thread is preemptive, of course. I said a LW thread
>>>>>>>>> is not preemptive, and by typical terminology I believe that is
>>>>>>>>> accurate. Any thread abstraction requiring yield calls is not _really_
>>>>>>>>> preemptive in my book, even if the necessary yield calls are
>>>>>>>>> automatically inserted in most cases. Having had a case not in the
>>>>>>>>> "most cases", I was ferociously bitten by the issue in a way that I
>>>>>>>>> will never forget.
>>>>>>>>>
>>>>>>>>> LispWorks performs poorly in the lparallel benchmarks
>>>>>>>>>
>>>>>>>>> (ql:quickload :lparallel-bench) (lparallel-bench:execute 4)
>>>>>>>>>
>>>>>>>>> while SBCL and CCL perform quite well, which is another reason I
>>>>>>>>> believe LispWorks has an unusual threading model. In particular there
>>>>>>>>> is a 10X slowdown where there should be an N-fold speedup for
>>>>>>>>> tight-loop situations like the Fibonacci benchmark. This is consistent
>>>>>>>>> with what I understand about LW needing yield points.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lisp Hug - the mailing list for LispWorks users
>>>>>>> lisp-hug@lispworks.com
>>>>>>> http://www.lispworks.com/support/lisp-hug.html
>>>>>
>>>>> _______________________________________________
>>>>> Lisp Hug - the mailing list for LispWorks users
>>>>> lisp-hug@lispworks.com
>>>>> http://www.lispworks.com/support/lisp-hug.html
>>>>
>>>> --
>>>> Pascal Costanza
>>>> The views expressed in this email are my own, and not those of my employer.
>>
>> _______________________________________________
>> Lisp Hug - the mailing list for LispWorks users
>> lisp-hug@lispworks.com
>> http://www.lispworks.com/support/lisp-hug.html
>>

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html