Re: Multithreading question
My apologies to LispWorks for jumping to conclusions and dissing LW
threads. Whatever the underlying issue in this particular case turns
out to be, I think it will be interesting.
On Thu, Feb 6, 2014 at 12:53 AM, Pascal Costanza <pc@p-cos.net> wrote:
> I apologize for misrepresenting your library. I agree that identifying and solving the real underlying issue should be our focus. I hope I can contribute to that. Hopefully more later...
>
> Pascal
>
> Sent from my iPad
>
>> On 6 Feb 2014, at 02:18, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>
>>
>> (Responding to myself.) After learning that LW needed these yield
>> points scattered about the code, and given the poor performance I was
>> seeing, I suspected there was something peculiar about LW
>> threading. As I mentioned, I knew they were OS threads, but I got the
>> impression there was some kind of historical cruft lying underneath.
>> (The yield point requirement still seems peculiar, but never mind.)
>>
>> Now that I'm convinced the expected performance should be good, I
>> should whittle down the issue to a small, self-contained test case.
>>
>>
>>> On Wed, Feb 5, 2014 at 6:02 PM, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>> Pascal,
>>>
>>> lparallel is not based upon futures. The promises/futures package is
>>> completely independent of the rest of the project (save for PLET). In
>>> particular it is not related to the Cilk-like algorithm in lparallel
>>> (which uses a different PLET).
>>>
>>> The lparallel documentation has an example showing how parallelism
>>> based upon futures can be catastrophically bad
>>> (http://lparallel.org/defpun/). Future-based parallelism fails to
>>> deliver in the case of Fibonacci; there is no question about that, and
>>> we certainly agree on that point. This problem -- the one that you
>>> mention -- is solved by the Cilk-like algorithm in lparallel.
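>>>
>>> For reference, the documentation's fine-grained Fibonacci looks roughly
>>> like this (a sketch, assuming a kernel has already been created):
>>>
>>> ```lisp
>>> ;; Cilk-style fine-grained parallelism via DEFPUN; the PLET here is
>>> ;; the DEFPUN-aware variant, not the futures-based one.
>>> (lparallel:defpun pfib (n)
>>>   (if (< n 2)
>>>       n
>>>       (lparallel:plet ((a (pfib (- n 1)))
>>>                        (b (pfib (- n 2))))
>>>         (+ a b))))
>>> ```
>>>
>>> This is the shape of code whose performance the Fibonacci benchmark
>>> measures.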
>>>
>>> One purpose of lparallel is to provide parallelized variants of some
>>> CL functions -- pmap, preduce, etc. -- which I call cognates. I wanted
>>> these functions to match or beat their CL counterparts even in the
>>> vicinity of worst-case scenarios (small data), so that they may be
>>> used without hesitation in many cases.
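>>>
>>> For instance, a minimal sketch of cognate usage (the kernel size of 4
>>> is chosen arbitrarily):
>>>
>>> ```lisp
>>> (ql:quickload :lparallel)
>>> ;; A kernel must exist before any cognate is called.
>>> (setf lparallel:*kernel* (lparallel:make-kernel 4))
>>> ;; Drop-in parallel variants of MAP and REDUCE.
>>> (lparallel:pmap 'vector #'sqrt #(1 4 9 16))
>>> (lparallel:preduce #'+ #(1 2 3 4))
>>> ```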
>>>
>>> The lparallel benchmarks are measuring two things: the performance of
>>> some cognates, and the performance of the Cilk-like algorithm. In
>>> comparison to SBCL and CCL, LispWorks doesn't fare particularly well
>>> with the cognates, but it is still OK (the difference is far greater
>>> than the scheduling overhead, so again, scheduling is not
>>> the issue). This is quite distinct from the 10X slowdown in the
>>> Fibonacci benchmark. My story hasn't changed; I am just focusing on
>>> the serious problem.
>>>
>>> I am thrilled that you have taken an interest in hunting down this
>>> LispWorks-only issue. The Cilk-like algorithm differs from Cilk and
>>> will require some explanation, which I am very much willing to
>>> provide. I have found threading bugs in every Lisp implementation I
>>> have tested with lparallel, including in LispWorks (6.0, fixed before
>>> I reported it). I hope, at least, to be taken seriously. And I would
>>> ask -- plead! -- that you take the time to understand the issue before
>>> drawing conclusions.
>>>
>>> Regards,
>>> lmj
>>>
>>>
>>>> On Wed, Feb 5, 2014 at 1:59 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>> [I don't agree to move the discussion to private email, because I believe your statements throw a wrong light on LispWorks's support for SMP, and I believe this needs to be corrected.]
>>>>
>>>> A previous post of yours seemed to suggest that LispWorks performs poorly in general (but I may have misunderstood you), while your recent post below narrows the claim down to one particular case, at least for me. If I understand correctly (please correct me if I'm wrong), the particular case is a parallel version of the Fibonacci function. However, I get very good performance out of the Fibonacci function with our own scheduler.
>>>>
>>>> When looking at lparallel, it appears to me that it's based on a future abstraction, where a force function is used to synchronize and wait for a result that is potentially still executing in parallel or hasn't started executing yet.
>>>>
>>>> While futures are very flexible and useful for expressing concurrency, they don't guarantee good performance. They may speed up computations, but they also may not. This is a characteristic of futures, and not of particular implementations of futures. It may be that the slowdown you notice is because of an aspect of how LispWorks provides SMP, but it may also be because of this characteristic of futures.
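>>>>
>>>> To make the abstraction concrete, a future/force pair in lparallel's
>>>> style (a sketch; EXPENSIVE-WORK and OTHER-WORK are stand-in names):
>>>>
>>>> ```lisp
>>>> (let ((f (lparallel:future (expensive-work))))  ; may run in parallel
>>>>   (other-work)                                  ; overlaps with the future
>>>>   (lparallel:force f))                          ; blocks until the result is ready
>>>> ```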
>>>>
>>>> Cilk-style scheduling is based on the notion of work-stealing schedulers (which you seem to provide in lparallel), but also on fork/join computations, which require a much stricter expression of parallel algorithms than futures do. Fork/join is a lot less flexible than futures, but in return provides guaranteed speedups, both in theory and in practice.
>>>>
>>>> The paper at http://supertech.csail.mit.edu/papers/steal.pdf describes this in detail. In particular, it stresses the importance of "fully strict" computations and describes how this property is exploited by Cilk to guarantee speedups. Fork/join computations yield "fully strict" computations, but futures don't.
>>>>
>>>> Pascal
>>>>
>>>>> On 5 Feb 2014, at 04:32, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>>
>>>>>
>>>>> The problem isn't the scheduler -- the algorithm isn't
>>>>> scheduling-intensive. There are two schedulers available in lparallel:
>>>>> a central-queue scheduler and a lock-free stealing scheduler. A
>>>>> compile-time option decides between them. Both schedulers give good
>>>>> performance on SBCL and CCL, while on LispWorks both schedulers give
>>>>> super-bad performance in this one special case. In the
>>>>> non-special-cases, LispWorks performs well with either scheduler.
>>>>>
>>>>> The algorithm resembles Cilk in some respects but differs in other
>>>>> respects. Since more detail would be (more) off-topic for this list,
>>>>> I'll follow up in private email.
>>>>>
>>>>> Thanks,
>>>>> lmj
>>>>>
>>>>>> On Tue, Feb 4, 2014 at 2:46 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> It is exactly with Cilk-like work stealing that we achieved the best performance results with LispWorks. So your experience seems very different from ours.
>>>>>>
>>>>>> I hope I will be able to open source our own scheduler very soon, then we can compare.
>>>>>>
>>>>>> Where in your code base do I need to look? I'd like to check what you're doing; maybe I'll notice something...
>>>>>>
>>>>>> Pascal
>>>>>>
>>>>>> Sent from my iPad
>>>>>>
>>>>>>> On 3 Feb 2014, at 20:33, "James M. Lawrence" <llmjjmll@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> As shown in the lparallel benchmarks, LispWorks does produce speedup
>>>>>>> in the kinds of cases that people have described in their responses.
>>>>>>>
>>>>>>> However, LispWorks fails at a very specific kind of benchmark
>>>>>>> (involving something like the Cilk algorithm
>>>>>>> http://supertech.csail.mit.edu/cilk/) which requires fine-grained
>>>>>>> context switching and a low overhead for threading. As I mentioned,
>>>>>>> LispWorks is not just slightly bad in this case, but has a 10-fold
>>>>>>> slowdown where SBCL and CCL achieve N-fold speedup. The problem is
>>>>>>> only apparent when we choose a small function like Fibonacci where the
>>>>>>> threading overhead is significant.
>>>>>>>
>>>>>>> Thus the issue is not whether LispWorks does true parallelism -- most
>>>>>>> obviously it does -- but what makes its threading model so different
>>>>>>> from SBCL and CCL, and why it performs so poorly in this one case. The
>>>>>>> fact that LW threads need yield points is a natural place to
>>>>>>> look.
>>>>>>>
>>>>>>> I'm not asking anyone to solve the problem for me -- mainly, I just
>>>>>>> wanted to clarify that lparallel does in fact work fine with LispWorks
>>>>>>> (except in this one special case).
>>>>>>>
>>>>>>> Best,
>>>>>>> lmj
>>>>>>>
>>>>>>>> On Wed, Jan 29, 2014 at 2:56 AM, Pascal Costanza <pc@p-cos.net> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> My experience seems very different from yours. We have used LispWorks's SMP very successfully and repeatedly achieved extremely good performance on multi-core CPUs. We can reliably get close in performance to equivalent code written in C++ with Threading Building Blocks, and on (rare) occasions even slightly better. (Incidentally, I will talk about some of our experiences in my talk at ELS in Paris this year, which will indeed focus on LispWorks.)
>>>>>>>>
>>>>>>>> We don't use a portable library for these purposes, but our own work-stealing scheduler that directly uses LispWorks facilities. It seems to me that deeper investigation is needed into why you can't achieve similar speedups with lparallel.
>>>>>>>>
>>>>>>>> We have never used yield for anything. When looking at the disassembly of yield, it looks like a no-op to me.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Pascal
>>>>>>>>
>>>>>>>>> On 28 Jan 2014, at 22:48, James M. Lawrence <llmjjmll@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> The issue is completely independent of the GUI. Attached is the test
>>>>>>>>> case for Lisp Support Call #39324 demonstrating the problem using
>>>>>>>>> messaging with lockless queues. RUN will eventually hang. Adding
>>>>>>>>> MP:YIELD to the TAGBODY fixes it.
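>>>>>>>>>
>>>>>>>>> The attachment isn't reproduced here, but the pattern reduces to
>>>>>>>>> something like this (a hypothetical sketch; TRY-POP and HANDLE
>>>>>>>>> are stand-in names, not the actual test case):
>>>>>>>>>
>>>>>>>>> ```lisp
>>>>>>>>> (defun drain (queue)
>>>>>>>>>   (tagbody
>>>>>>>>>    top
>>>>>>>>>      (let ((item (try-pop queue)))  ; non-blocking pop from a lockless queue
>>>>>>>>>        (when item (handle item)))
>>>>>>>>>      (mp:yield)                     ; without this, RUN eventually hangs
>>>>>>>>>      (go top)))
>>>>>>>>> ```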
>>>>>>>>>
>>>>>>>>> The underlying OS thread is preemptive, of course. I said a LW thread
>>>>>>>>> is not preemptive, and by typical terminology I believe that is
>>>>>>>>> accurate. Any thread abstraction requiring yield calls is not _really_
>>>>>>>>> preemptive in my book, even if the necessary yield calls are
>>>>>>>>> automatically inserted in most cases. Having had a case not in the
>>>>>>>>> "most cases", I was ferociously bitten by the issue in a way that I
>>>>>>>>> will never forget.
>>>>>>>>>
>>>>>>>>> LispWorks performs poorly in the lparallel benchmarks
>>>>>>>>>
>>>>>>>>> (ql:quickload :lparallel-bench)
>>>>>>>>> (lparallel-bench:execute 4)
>>>>>>>>>
>>>>>>>>> while SBCL and CCL perform quite well, which is another reason I
>>>>>>>>> believe LispWorks has an unusual threading model. In particular there
>>>>>>>>> is a 10X slowdown where there should be an N-fold speedup for
>>>>>>>>> tight-loop situations like the Fibonacci benchmark. This is consistent
>>>>>>>>> with what I understand about LW needing yield points.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lisp Hug - the mailing list for LispWorks users
>>>>>>> lisp-hug@lispworks.com
>>>>>>> http://www.lispworks.com/support/lisp-hug.html
>>>>>
>>>>
>>>> --
>>>> Pascal Costanza
>>>> The views expressed in this email are my own, and not those of my employer.
>>