Lisp HUG Maillist Archive

unicode support

It seems that Lispworks doesn't handle a subset of unicode properly,
please let me know if I am mistaken:

CL-USER 15 > (map 'string #'(lambda (c) (print c)) "a⍳c")

#\a
#\â
#\U+008D
#\³
#\c
"a⍳c"

CL-USER 24 : 1 > (string 'a⍳c)
Error: Cannot read character U+008D as part of a token because it has
constituent trait 'invalid'.
[...]

CL-USER 25 > (string 'aλc)
"AλC"

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: unicode support

Hi,

Try
(map 'lw:simple-bmp-string #'(lambda (c) (print c)) "a⍳c")
or
(map 'lw:bmp-string #'(lambda (c) (print c)) "a⍳c")

Read further here: http://www.lispworks.com/documentation/lw71/LW/html/lw-195.htm#pgfId-885921

Jerome Ibanes <jibanes@gmail.com> writes:

> It seems that Lispworks doesn't handle a subset of unicode properly,
> please let me know if I am mistaken:
> [...]

-- 
Br,
/Alexey


Re: unicode support

While this:
(map 'lw:simple-bmp-string #'(lambda (c) (print c)) "a⍳c")
or even
(string 'a⍳c)
works well in the LispWorks graphical environment, I can't get it to
work correctly on the console (for instance, an lw-console image
created with save-image and run in an xterm) or at the LispWorks REPL
prompt in an xterm.

Would you know how to get this to behave in an lw-console exactly as it
does in the LispWorks graphical environment? (Assuming an image created
with the instructions from
http://www.lispworks.com/documentation/lw70/LW/html/lw-91.htm section
"13.3.5 Saving a non-GUI image with multiprocessing enabled".)

On Thu, Jan 18, 2018 at 8:59 AM, Alexey Veretennikov
<txm.fourier@gmail.com> wrote:
> [...]


Re: unicode support

Hi Jerome,

Here is a thread from last year that should answer your question. I'm sending the whole discussion because I believe there are interesting bits throughout, not just in the last message.

I wish I could point to the list archives, but unfortunately I can't find them. Message to LispWorks: how do we get access to the list archives? Gmane is gone, and the most recent message on Narkive is 3 years old.


Cam


---
Begin forwarded message:

From: Sven Emtell <sven.emtell@doremir.com>
Subject: Re: *standard-output* element type
Date: 27 March 2017 at 10:40:46 GMT+2
To: lisp-hug@lispworks.com
Cc: Pascal Bourguignon <pjb@informatimago.com>, Burton Samograd <busfactor1@gmail.com>
Reply-To: Sven Emtell <sven.emtell@doremir.com>

I just realized that I never followed up on this…

Martin at LispWorks gave some inside information:
---
When running in a delivered app, *standard-output* is connected to the
terminal stream via *background-output*.  The problem is that the terminal
stream only supports Latin-1.

The simplest workaround is to log the debug info to a file (opened with
:external-format :utf-8 :element-type :default).  If you want to view the file
while the app is still running then you will need to call force-output to
flush the buffer to disk.
---
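
A minimal sketch of the file-logging workaround Martin describes, in case it helps. The names *debug-log*, open-debug-log and debug-log are made up for illustration. The :external-format and :element-type values are the ones quoted above. I have not checked this in a delivered LispWorks image.

```lisp
;; Hypothetical helper names; only the OPEN keywords come from Martin's advice.
(defvar *debug-log* nil
  "Stream for debug output, or NIL when no log file is open.")

(defun open-debug-log (path)
  "Open PATH for appending as a UTF-8 text file and remember the stream."
  (setf *debug-log*
        (open path
              :direction :output
              :if-exists :append
              :if-does-not-exist :create
              :external-format :utf-8
              :element-type :default)))

(defun debug-log (control-string &rest args)
  "Write one formatted line to the log and flush it to disk."
  (when *debug-log*
    (apply #'format *debug-log* control-string args)
    (terpri *debug-log*)
    ;; Flush so the file can be inspected while the app is still running.
    (force-output *debug-log*)))
```

Call (open-debug-log "debug.log") once at startup, then use (debug-log "item ~A" x) wherever (format t ...) was used before.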

In the end, I chose to write debug info using a macro like this:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;  write-to-stdout
;;
;;  Write to standard output, replacing every character with char-code > 255 with #\@
;;
;;  NOTE:
;;  This macro can be used for logging to the terminal stream on Linux, which only accepts Latin-1
;;

(defmacro write-to-stdout (control-string &rest args)
  `(format t "~a" (coerce (loop for char across (format nil ,control-string ,@args)
                                collect (if (> (char-code char) 255)
                                            #\@
                                          char)) 'string)))
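
The same substitution can also be sketched as an ordinary function that only filters the string, leaving the actual output to the caller. The name latin-1-safe and the optional replacement argument are hypothetical, not part of Sven's code, and this variant has not been tried against a LispWorks terminal stream.

```lisp
(defun latin-1-safe (string &optional (replacement #\@))
  "Return a copy of STRING with every character whose code exceeds 255
replaced by REPLACEMENT."
  (substitute-if replacement
                 (lambda (char) (> (char-code char) 255))
                 string))

;; U+2022 is the bullet character from the error message in this thread.
(latin-1-safe (coerce (list #\a (code-char #x2022) #\c) 'string))
;; => "a@c"
```

Keeping the filter separate from the call to format means the same helper can be reused for any stream that turns out to be limited to Latin-1.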


Thanks Burton and Pascal for your help!
Best,
Sven



11 feb. 2017 kl. 00:03 skrev Sven Emtell <sven.emtell@doremir.com>:

Thanks,
I’ll look more into this after the weekend...


10 feb. 2017 kl. 23:04 skrev Pascal Bourguignon <pjb@informatimago.com>:


On 10 Feb 2017, at 22:44, Sven Emtell <sven.emtell@doremir.com> wrote:

My mistake - it’s getting late here in Sweden ;-)
It was the other way around…

LispWorks GUI on Mac OS X:
(stream-element-type *standard-output*) => CHARACTER

Delivered with LispWorks for Linux (32-bit) and running in terminal on Linux:
(stream-element-type *standard-output*) => BASE-CHAR

Ok, so my advice, when dealing with terminals, would be to test that, and restrict yourself to base-char when it cannot deal with all characters:


(let ((itemizer (if (subtypep 'character (stream-element-type *standard-output*))
                   "•"
                   "*")))
   (dolist (item items)
      (format t "~A ~A~%" itemizer item)))


And then, test whether export LC_ALL=sv_SE.UTF-8 makes your program use 'character as element-type on the terminal.  If not, then you will have to explicitly change the stream-element-type (if possible in LispWorks), by testing the environment variable yourself.


Unfortunately, I don’t have LispWorks code; I can only show you a CCL example:


(defun locale-terminal-encoding ()
  "Returns the terminal encoding specified by the locale(7)."
  #+(and ccl windows-target)
  :iso-8859-1
  ;; ccl doesn't support :windows-1252.
  ;; (intern (format nil "WINDOWS-~A" (#_GetACP)) "KEYWORD")
  #-(and ccl windows-target)
  (dolist (var '("LC_ALL" "LC_CTYPE" "LANG")
               :iso-8859-1) ; some random default…
    (let* ((val (getenv var))
           (dot (position #\. val))
           (at  (position #\@ val :start (or dot (length val)))))
      (when (and dot (< dot (1- (length val))))
        (return (intern (let ((name (string-upcase (subseq val (1+ dot)
                                                           (or at (length val))))))
                          (if (and (prefixp "ISO" name) (not (prefixp "ISO-" name)))
                              (concatenate 'string "ISO-" (subseq name 3))
                              name))
                        "KEYWORD"))))))


(defun set-terminal-encoding (encoding)
  #-(and ccl (not swank)) (declare (ignore encoding))
  #+(and ccl (not swank))
  (mapc (lambda (stream)
          (setf (ccl::stream-external-format stream)
                (ccl:make-external-format :domain nil
                                          :character-encoding encoding
                                          :line-termination
                                          (if (boolean-enval "LSE_TELNET" nil)
                                              :windows
                                              (or
                                               #+unix :unix
                                               #+windows :windows
                                               #-(or unix windows) :unix)))))
        (list (two-way-stream-input-stream  *terminal-io*)
              (two-way-stream-output-stream *terminal-io*)))
  (values))


with:
   (set-terminal-encoding (locale-terminal-encoding))
called in the main.


One thing we can learn from this example, is that we’re actually talking about the terminal, and that the actual CL stream is *TERMINAL-IO*; *standard-input* and *standard-output* may be indirect streams redirected to *TERMINAL-IO*.  So you may have to actually modify *TERMINAL-IO*.
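
A small portable sketch for checking this on a given image: it reports the element type of each standard stream and, where the stream is a synonym stream, which variable it forwards to. The function name describe-output-streams is made up for illustration. Implementation-specific indirect streams (such as the *background-output* connection mentioned in this thread) will not show up as synonym streams.

```lisp
(defun describe-output-streams ()
  "Return a list of (NAME ELEMENT-TYPE SYNONYM-TARGET) for the usual
output streams. SYNONYM-TARGET is NIL unless the stream is a synonym stream."
  (loop for name in '(*standard-output* *error-output* *terminal-io*)
        for stream = (symbol-value name)
        collect (list name
                      (stream-element-type stream)
                      (if (typep stream 'synonym-stream)
                          (synonym-stream-symbol stream)
                          nil))))

;; Print one line per stream, e.g. from the main function of a delivered app.
(dolist (entry (describe-output-streams))
  (format t "~S: element type ~S, synonym for ~S~%"
          (first entry) (second entry) (third entry)))
```

Running this in both the GUI and the delivered console image should make it clear which stream carries the BASE-CHAR restriction.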


10 feb. 2017 kl. 22:37 skrev Pascal Bourguignon <pjb@informatimago.com>:


On 10 Feb 2017, at 22:26, Burton Samograd <busfactor1@gmail.com> wrote:

Try (set-default-character-element-type 'character) at the top of your program.

Sent from my iPhone

On Feb 10, 2017, at 2:11 PM, Sven Emtell <sven.emtell@doremir.com> wrote:

Hello,
I have a piece of server code where I make some debug printouts using (format t ...) containing characters with cl:char-code > 255.
When I run it in LispWorks GUI on Mac OS X it outputs the characters nicely in the Listener,
but when I deliver the code using LispWorks for Linux (32-bit) and then run it in a Linux terminal it says:

Call to SIGerror #\U+2022 (of type CHARACTER) is not of type BASE-CHAR. while writing to error log, error not logged

I can see why when checking the element-type of *standard-output* on the two platforms.

LispWorks GUI on Mac OS X:
(stream-element-type *standard-output*) => BASE-CHAR

Delivered with LispWorks for Linux (32-bit) and running in terminal on Linux:
(stream-element-type *standard-output*) => CHARACTER

That’s suspicious, given that (subtypep 'base-char 'character) must hold.
(I would expect a GUI to be able to display more characters than a terminal).
Furthermore, it’s not consistent with the error above.


Are you lying to us?



Is it possible to change the element type of *standard-output*?
Any other suggestions?

I would first check how the LC_ALL environment variable is set in both environments.
(Check also the other LC_* environment variables if LC_ALL is not set.)

Of course, normally LC_ALL should be set according to the capabilities of your terminal.

Perhaps you should check (stream-element-type *standard-output*) before sending to the terminal characters that it won’t be able to render?
I mean, if it gives a meaningfully different result for an ASCII terminal than for a GUI?

Re: unicode support


On 18 Jan 2018, at 15:55, Jerome Ibanes <jibanes@gmail.com> wrote:

It seems that Lispworks doesn't handle a subset of unicode properly,
please let me know if I am mistaken:

[I started writing this before the other replies but then forgot about it: it may now be obsolete (and/or wrong), sorry.]

I think you are.  I think what you've managed to do is to create a string which has in it a sequence of characters that, when interpreted by a terminal or something else that expects things in UTF-8, looks as if it is the string you think it should be.  But it's not.  I think that something like

(map 'list #'char-code <your string>)

will show you that.
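
As a rough illustration of what is probably going on, here is a portable sketch that computes the UTF-8 octets of a code point by hand. The function name utf-8-octets is made up for this example. It assumes the character in the original string is U+2373 (the APL iota) and that the implementation's char-code values are Unicode code points.

```lisp
(defun utf-8-octets (code)
  "Return the UTF-8 encoding of the code point CODE as a list of octets."
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(utf-8-octets #x2373)
;; => (226 141 179)
```

If each of those octets is read as a separate Latin-1 character, the middle one is U+008D, which is exactly the character named in the reader error. That suggests the string holds five characters (the three UTF-8 octets plus a and c) rather than three.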

If you, for instance, do this:

(set-default-character-element-type 'bmp-char) ;character is also fine but strings will be fatter
(coerce '(#\a #\u+03b9 #\c) 'string)

You'll get a string which actually contains the unicode characters you expect.

You will then, of course, have an enormous nightmare getting this to work on terminals, and I have no idea how to deal with that.
Updated at: 2020-12-10 08:30 UTC