Subtle problem in MP code: Are we doing this right?
Dear lisp-huggers,
Consider the following piece of code:
(defun invoke-with-timeout (timeout bodyfn timeoutfn)
(block timeout
(let* ((process mp:*current-process*)
(timer (mp:make-timer
#'(lambda ()
(mp:process-interrupt process
#'(lambda ()
(return-from timeout
(funcall timeoutfn))))))))
(mp:schedule-timer-relative timer timeout)
(unwind-protect (funcall bodyfn)
(mp:unschedule-timer timer)))))
This code is from the acl-compat library distributed with portable
allegroserve. (Thanks Jochen!)
We have, in *very* rare circumstances, seen the following occur:
Condition: Uncaught throw of <timeoutfn return value> to (NIL).
Call to (HARLEQUIN-COMMON-LISP:SUBFUNCTION 1 ACL-COMPAT-MP::INVOKE-WITH-TIMEOUT)
Call to MP::PROCESS-OUTSTANDING-INTERRUPTS
Call to MP::RESUME-PROCESS
Call to MP:PROCESS-WAIT
Call to PING-READY-MONITORS
Call to (HARLEQUIN-COMMON-LISP:SUBFUNCTION 1 MP::INITIALIZE-PROCESS-STACK)
We believe that the following occurs:
a) Timer is scheduled
b) (funcall bodyfn) in unwind protect form runs
c) Timer begins running, mp:process-interrupt is called, but process
is not yet interrupted (in running non-safety-0 function?)
d) Process normally completes invoke-with-timeout
e) Process enters a wait-function
f) At this point, the interrupt is able to run, but the context in
which it was called has disappeared.
g) A condition is thrown containing the return value of timeoutfn.
[Because it's trying to do a RETURN-FROM to a block which is now gone]
Claudio and I propose the following correction:
(defun invoke-with-timeout (timeout bodyfn timeoutfn)
(block timeout
(let* ((process mp:*current-process*)
(unsheduled? nil)
(timer (mp:make-timer
#'(lambda ()
(mp:process-interrupt process
#'(lambda ()
(unless unsheduled?
(return-from timeout
(funcall timeoutfn)))))))))
(mp:schedule-timer-relative timer timeout)
(unwind-protect (funcall bodyfn)
(without-interrupts
(mp:unschedule-timer timer)
(setf unsheduled? t))))))
Does anyone know if this theory of ours is correct, and if our
above fix will work?
--
Alain Picard
Memetrics