Lisp HUG Maillist Archive

Weird Error...

I just spent several hours trying to track down why impromptu REPL forms sometimes interpret properly, and at other times interpretation generates miserable errors - things like “Unbound variables” on what should have clearly been lexical bindings in the interpreted code bodies. Code that was compiled always runs correctly.

I finally nailed it down to having my OS thread pool manager start up via the Lisp Actions list on system startup. Starting that pool manager would appear to have no involvement with the compiler / interpreter / runtime, yet some global system setting at the time of pool startup is different from the environment available after the Listener pane has been presented to the user. And the threads created early must have captured a global setting that causes interpreted code to perform improperly in those threads.

If I manually shut down my thread pool system and then restart it from the REPL, those same impromptu testing forms are interpreted properly in the threads of the pool. No errors at all, and no recompilation of anything between the thread pool shutdown and restart.

So some global environment from earlier times was captured by the threads started during Action list time, and that global environment must be different from when the threads are launched from the Listener REPL. And that early environment causes interpreted forms to incorrectly conclude that lexical references are referring to some unbound global symbol.

The test code in question is executed from the Listener or Editor, and gets compiled and executed on the spot each time. I traced the actions of the interpreting compiler by planting a tracer macro in the code. That showed me that the macro is never even called during compile time from the REPL, but gets fired only later when the supposedly compiled form is evaluated in the listener. It is that running of interpreted code that is being hosed up by global settings captured in the threads of the pool when those threads were produced early by the Actions list. If threads of the pool are manually started from the Listener, then the captured environment is suitable for successful running of the interpreted forms.

I ask the Actions to start me up in the “Initialize LispWorks Tools” section, :AFTER “Run the environment start up functions”. If I don’t ask the Actions to start up the thread pool system, and I do it manually from the REPL, everything works properly. I was hoping to start up the pool manager after system multithreading has been enabled, possibly as late as just before LW shows a Listener pane. Is there a better Action to be using?

- DM

Re: Weird Error...

… actually, I just thought of trying something, and it works.

My thread pool is always demand driven. It gets instantiated only as a result of code producing things that would like to run across a thread pool, if that pool isn’t yet available. I have an ensure-thread-pool planted in those critical locations.

So I go ahead and allow Actions to start up my system, which happens to produce two services that want the thread pool. They produce that pool, but instead of leaving the pool threads in place for later use - with the incorrect bindings, whatever they are… I do a (sleep 1) and (kill-pool) inside the startup form fired by the Actions.

That allows the system to start up properly, put the services in place, and then removes the bad threads. Later on when new code is performed at the REPL, the thread pool will once again be created automatically on demand.

That works.

- DM

On Oct 21, 2017, at 07:19, David McClain <dbm@refined-audiometrics.com> wrote:

[...]


Re: Weird Error...

On second reflection, that doesn’t totally solve the problem.

The problem of incorrect interpretation — thinking that lexical references are instead to unbound globals — happens whenever the thread pool is recreated as a result of a simple non-binding form making the pool creation request at the REPL, or from automatically triggered compiled code. 

But if the first thing to request the pool is an interpreted form with deep internal lexical bindings and macros, then the threads of a new pool formed during interpretation of that form are able to correctly evaluate the form. So this issue really has nothing to do with the Actions list.

It appears that macros used inside of interpreted forms are not examined until runtime. And when they rewrite themselves to executable form, the interpreter evaluates arguments of resulting function calls before calling the function. To do that, the interpreter needs to know where the value is for each symbol mentioned in the argument list.

When the pool is created while interpreting such a form, the environment is captured by all threads, so this environment must be a global. And that environment is rich enough to know the lexical bindings being referenced during arg evaluation. And so any of those threads can successfully interpret the user’s request.

But if the first thing you do that creates threads does not need any such compilation environment, the thread-captured environment will be too poor to support proper interpretation of additional REPL forms.

So it appears that we somehow need to always spawn threads while a rich enough global compiler environment is available to be captured by each thread. That seems to do nothing to impair performance of compiled code, but it allows any thread to properly interpret REPL forms typed in by the user.

The problem I ran into is that the REPL in the Listener thread actually ran to completion properly. But what it produced were objects that were thrown at a pool of threads, and those objects further contained more interpreted code. By the time the threads caught the code to interpret for themselves, they either had a sufficiently rich environment captured from the Listener, or they didn’t. It was a toss-up as to whether your code would or would not run properly in interpreter mode.

The other solution is to never use interpreted code. Always fully compile everything sent to a thread. That always seems to work properly.

This is not a criticism, but rather an attempt to understand the boundaries of the Lisp system under multithreading. I think I’m beginning to grasp some of it…

- DM


_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: Weird Error...

I'm not convinced that this is anything to do with threads, because they don't
have a lexical environment themselves.

Are you sure it isn't some variant of the "shared binding" problem:

(mapcar 'funcall (loop for x below 4 collect (lambda () x)))

which returns (4 4 4 4) rather than (0 1 2 3)?
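
A minimal sketch of that effect, for reference. It assumes an implementation in which LOOP steps a single binding of X (the standard leaves this implementation-dependent), so every closure sees the final value. Wrapping the LAMBDA in a fresh LET gives each closure its own binding. The function names are hypothetical:

```lisp
;; All closures capture the one X that LOOP steps, so in an implementation
;; that reuses the binding, each call returns the final value of X.
(defun shared-binding-results ()
  (mapcar 'funcall (loop for x below 4 collect (lambda () x))))

;; Rebinding X inside the loop body gives each closure a private binding,
;; so the results are the successive values 0, 1, 2, 3 everywhere.
(defun fresh-binding-results ()
  (mapcar 'funcall (loop for x below 4
                         collect (let ((x x))
                                   (lambda () x)))))
```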

-- 
Martin Simmons
LispWorks Ltd
http://www.lispworks.com/


>>>>> On Sat, 21 Oct 2017 10:51:20 -0700, David McClain said:
> [...]

Re: Weird Error...

Hi Martin,

Yes, I’m pretty sure, but there is always a chance I’m wrong… 

Here are the facts that I have uncovered:

1. Compiled code always works properly, seeing my intended lexical bindings as purely lexical. So I can put together the following kind of macro (bending some rules…)

(defmacro make-actor (name args state &body body &environment env)
  (let* ((a!self (anaphor 'self))
         (inner `(let (,a!self) ;; <-- Build the macro expansion here, called inner...
                   (setf ,a!self (make-instance 'Actor
                                                :name ,(if (consp name)
                                                           name
                                                         `',name)
                                                :lambda-list ',args
                                                :behav (behav ,args ,state ,@body)))
                   (add-actor ,a!self)
                   ,a!self)))
    (if (some (um:curry #'slot-value env) ;; <-- peek where I probably shouldn't
              '(compiler::compilation-env
                compiler::fenv
                compiler::venv))
        inner ;; <-- for assumed compile mode
      ;; else - we must be in eval mode...
      `(funcall (compile nil (lambda ()
                               ,inner))))))

And so far this always works, whether in eval mode in the Listener or Editor, or when compiling under ASDF.

2. The code that I’m producing with this macro actually has two portions. There is an outer frame that constructs an instance of Actor class and then installs it into the running system, after first backpatching a self-referential lexical binding. Then the inner frame is a portion of code actually stored inside that Actor instance.

You can see that outer frame in the example just shown. I construct a lexical binding named SELF, create that Actor instance, add to the running system, and return that instance. The (behav …) macro constructs the inner lexical closure that gets stored inside the Actor instance. And yes it *is* a lexical closure because SELF may be free in the body of the code, attempting to reference that outer binding shown above. 

And yes, I understand that you can’t compile closures, only functions that construct closures. See what I do in that last line during eval mode to get things fully compiled anyway.
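
As a rough, portable sketch of that last point (MAKE-COUNTER-MAKER and the counter are hypothetical illustrations, not part of the Actors code): COMPILE is handed a LAMBDA expression with no free lexical variables, and calling the resulting compiled function is what actually builds the closure.

```lisp
;; The consequences of compiling a function that already closes over
;; lexical variables are undefined, but a closed LAMBDA expression whose
;; body constructs a closure can be compiled safely.
(defun make-counter-maker ()
  (compile nil '(lambda ()
                 (let ((count 0))
                   ;; This inner LAMBDA closes over COUNT.
                   (lambda () (incf count))))))

;; Calling the compiled function yields a fresh closure over COUNT.
(defparameter *counter* (funcall (make-counter-maker)))
```

Each call to (funcall *counter*) then increments the COUNT that is private to that closure.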

3. When I don’t fully compile using this treachery, and leave the code in interpreted form, the outer portion that constructs the Actor instance actually runs properly in the REPL thread. What doesn’t run properly is that inner eval mode code that gets embedded into the Actor instance. And the Actor always runs in some other Executive thread, not the Listener or Editor threads. 

If there are no Executive threads when the expanded macro evaluates the (ADD-ACTOR …) clause, then Executive threads will be spawned to fill a thread pool. Under that situation, if the ADD-ACTOR was performed in eval mode in the Listener or Editor, then the resulting spawned Executive threads will fail to recognize inner references to SELF as lexical, and instead complain that my references are to an unbound symbol SELF. But that only happens if the free references are relatively deep inside the embedded code, such as within a contained LAMBDA form or inside a LABELS code body. Outer level references to SELF succeed.

4. Things always work properly if code is fully compiled. Things also work properly *IF* the very first thing performed in a newly spawned Executive thread happens to be an eval-mode expression from the Listener pane. But not if the very first thing performed in the Executive comes from compiled code.

——
Now, I don’t see any shared reference problems here, and it really appears to me that some residual setting from the compiling thread doesn’t get carried over to the Executive thread, when that Executive doesn’t get spawned during a REPL eval. 

That’s why I thought the new threads were capturing some compiler environment for themselves. If they spawn during a REPL eval, that compiler environment info seems to get captured into the new threads, and they perform just fine. But if the newly spawned Executive threads come instead from a previously compiled expression evaluation, then they don’t have sufficient information to see that my future eval mode references are to lexical bindings, not to unbound globals.

But, there’s always the possibility that I’m not seeing something that I caused for myself…

Cheers,

- DM


On Oct 24, 2017, at 05:47, Martin Simmons <martin@lispworks.com> wrote:

[...]


Re: Weird Error...

arrgh!! (dyslexia is a constant battle…)

3. When I don’t fully compile using this treachery, and leave the code in interpreted form, the outer portion that constructs the Actor instance actually runs properly in the REPL thread. What doesn’t run properly is that inner eval mode code that gets embedded into the Actor instance. And the Actor always runs in some other Executive thread, not the Listener or Editor threads. 

If there are no Executive threads when the expanded macro evaluates the (ADD-ACTOR …) clause, then Executive threads will be spawned to fill a thread pool. Under that situation, if the ADD-ACTOR was performed in eval mode in the Listener or Editor, then the resulting spawned Executive threads *will* correctly recognize inner references to SELF as lexical. But if the Executive pool had been spawned already, as a result of previously compiled forms, then eval mode code inside the Actor instance will not run properly in the Executive thread, and will instead complain that my references are to an unbound symbol SELF. But that only happens if the free references are relatively deep inside the embedded code, such as within a contained LAMBDA form or inside a LABELS code body. Outer level references to SELF succeed.


On Oct 24, 2017, at 08:42, David McClain <dbm@refined-audiometrics.com> wrote:

[...]


Re: Weird Error...

Ohhhh! Yes!… I think I see something like a shared binding between two threads now… I need to examine that. 

The MAKE-ACTOR macro builds a SELF LET binding in the launching thread (e.g. Listener or Editor), and hands off to the embedded code that will be running in a foreign thread. So in a sense, you are correct about captured cross-thread boundaries. But once executed at the REPL, no further use is made of the SELF lexical binding, at least from the REPL side. And the stuffing of that binding with the SETF just after constructing an Actor instance is correct and is the last thing to ever modify that binding.

Thanks for prodding me to look more closely.

- DM

Re: Weird Error...

No, what I was seeing is that there is a lexical binding established in one thread, used by a lexical closure executed in a foreign thread. I don’t see how that can be a problem, since closures are first-class objects and can be instantiated at any later time. Why would it matter which thread they are invoked on as long as they are lexically self-contained? 

Clearly, when code is compiled, it always behaves properly. The difficulty arises when in eval mode in both creator and user threads. Under some conditions the user thread cannot correctly discern the lexical bindings that the compiler has no trouble seeing.

- DM


> On Oct 24, 2017, at 09:39, David McClain <dbm@refined-audiometrics.com> wrote:
> [...]

Re: Weird Error...

Assuming your ANAPHOR function is defined something like:

(defun anaphor (name)
  (intern (symbol-name name)))

then I think all the problems are caused by having different values of
*PACKAGE* at different times and in different threads.

Note that the definition of ANAPHOR above has a dependency on the value of
*PACKAGE*, so using it at macroexpansion time will create a macroexpansion
time dependency on the value of *PACKAGE*.  This is probably OK when compiling
everything because all macros are fully expanded at compile time.  However,
when using the interpreter, macros are expanded at run time (as they are
encountered), so you will have a run time dependency on the value of
*PACKAGE*.
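
A small sketch of that dependency, using the assumed ANAPHOR definition above and two hypothetical packages, DEMO-A and DEMO-B (ANAPHOR-IN is also just an illustration):

```lisp
(defpackage :demo-a (:use :cl))
(defpackage :demo-b (:use :cl))

;; The assumed definition: INTERN with no package argument interns the
;; name in whatever package *PACKAGE* currently holds.
(defun anaphor (name)
  (intern (symbol-name name)))

;; Call ANAPHOR with *PACKAGE* dynamically bound to the named package.
(defun anaphor-in (package-name)
  (let ((*package* (find-package package-name)))
    (anaphor 'self)))
```

Here (anaphor-in :demo-a) and (anaphor-in :demo-b) both return a symbol named SELF, but they are two different symbols, so a binding of one is invisible to code that refers to the other.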

Here are two problems with this for interpreted code **within BEHAV**:

1. Any code that statically references SELF will refer to the symbol in
*PACKAGE* at read time, which may be different from the symbol bound by
MAKE-ACTOR at macroexpansion time.

2. If some other macro calls (anaphor 'self) hoping to pick up the binding
made by MAKE-ACTOR, then it will get a symbol named SELF in *PACKAGE* at the
run time (i.e. macroexpansion time) of that code.  If that code runs at the
same time as MAKE-ACTOR, then it will work.  However, code within a LAMBDA or
LABELS form might run later, in particular on the Executive thread.

Now for the fun part: the value of *PACKAGE* on the Executive thread will
depend on when the thread is created!

This is because the initial value of *PACKAGE* is defined by:

(assoc '*package* mp:*process-initial-bindings*) => (*package* . *package*)

so will capture the value in the thread that calls MP:PROCESS-RUN-FUNCTION.

I think that might explain all of the quirks you are seeing.
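
If that diagnosis holds, one possible way to remove the dependency is to intern the anaphor in a fixed package rather than in whatever *PACKAGE* happens to be at the time. A sketch only, in which the package ACTORS-DEMO and the function FIXED-ANAPHOR are hypothetical names:

```lisp
(defpackage :actors-demo (:use :cl))

;; Always intern in one known package, so the result does not depend on
;; the value of *PACKAGE* in the calling thread.
(defun fixed-anaphor (name)
  (intern (symbol-name name) (find-package :actors-demo)))
```

Code that mentions SELF literally would then need that same symbol to be accessible where it is read, for example by exporting it from the fixed package and importing it where the actor bodies are written.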

-- 
Martin Simmons
LispWorks Ltd
http://www.lispworks.com/



>>>>> On Tue, 24 Oct 2017 08:42:05 -0700, David McClain said:
> [...]

Updated at: 2020-12-10 08:30 UTC