Bizarre deadlock with only one lock - (LW6.01/32/Win32/Quad-core Intel/Vista64)
I tried to distill the problem into a single code fragment (below). If
the problem doesn't reproduce, try changing the value of
"workers-available", or put the code in a function and compile it. It
reproduced faster on a beefy quad-core than on a slower dual-core XP
machine, but it reproduced on both. I think it's a multi-CPU issue.
The "wl" macrolet merely wraps its body in mp:with-lock on the single
lock created in the code fragment.
Example failure point:
........
Making thread #:G1590
Error: Trying to lock 134220981 :
Deadlock {simple} : waiting for another stack which waits for
the current thread.
Other stack: #<MP:PROCESS Name "Worker #:G1588" Priority 0 State
"PROCESS-LOCK waiting for Worker thread lock">
Waits for: #<MP:LOCK "Worker thread lock" Locked once by "CAPI
Execution Listener 1" 2071ABCB>
According to the stack trace, the failure occurs while the listener
process is still inside the "process-run-function" call, with the lock
still held. Note from the dump, though, that the thread being created
above was G1590, while the "other stack" is listed as G1588.
Is it legal to start a thread from within a thread that happens to hold
a lock? That's all I can see that could be wrong here in code, since
otherwise there is only one lock. . . .
Should I be doing this differently, or should I write this up as a
potential LW6.01 issue?
Anyway, the code is listed below:
(let ((lock (mp:make-lock :name "Worker thread lock"))
      (workers-available 16))
  (macrolet ((wl (&body body)
               `(mp:with-lock (lock) (prog1 (let nil ,@body)))))
    (labels ((worker-thread-run ()
               (wl
                 (incf workers-available)))
             (give-work ()
               (wl
                 (unless (zerop workers-available)
                   (decf workers-available)
                   (let ((id (gensym)))
                     (format t "Making thread ~S" id)
                     (terpri)
                     (mp:process-run-function (format nil "Worker ~S" id)
                                              nil #'worker-thread-run))))))
      (loop while t do
        (give-work)
        (sleep 0)))))
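
For comparison, one variation worth trying is the same fragment with the
lock released before the new process is created. This is only a sketch to
test whether calling mp:process-run-function while holding the lock is
the trigger; that is an unconfirmed assumption, not a diagnosis. The
helper name "claim-worker" is hypothetical. Everything else uses the same
calls as the fragment above.

```lisp
;; Sketch only: reserve a worker slot under the lock, then start the
;; process after the lock has been released. Whether this avoids the
;; reported deadlock is untested.
(let ((lock (mp:make-lock :name "Worker thread lock"))
      (workers-available 16))
  (labels ((worker-thread-run ()
             ;; Worker returns its slot, as in the original fragment.
             (mp:with-lock (lock)
               (incf workers-available)))
           (claim-worker ()
             ;; Hypothetical helper: returns T if a slot was reserved,
             ;; NIL if no workers are available.
             (mp:with-lock (lock)
               (unless (zerop workers-available)
                 (decf workers-available)
                 t)))
           (give-work ()
             ;; The lock is no longer held when the process is created.
             (when (claim-worker)
               (let ((id (gensym)))
                 (format t "Making thread ~S~%" id)
                 (mp:process-run-function (format nil "Worker ~S" id)
                                          nil #'worker-thread-run)))))
    (loop
      (give-work)
      (sleep 0))))
```

If this version runs without the error while the original still fails,
that would suggest the problem is tied to starting a process while the
lock is held.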