Lisp HUG Maillist Archive

UTF-8 and the LW Shell on LWM

I have been going around in circles on this problem… perhaps someone out there has some answers.

I have a bunch of CD tracks recorded in Europe, with file and folder names containing extended characters (e.g., umlauts, c-cedilla, acute accents). I have not been able to reach them with command-line utilities, either from within the LW Shell pane or from my programs with SYS:CALL-SYSTEM. I really need to reach them from within some running application code. The LWM Shell pane was just a side track.

I read in some Erlang documentation that almost all operating systems today allow Unicode in filenames, and that OS X is actually quite rigorous about enforcing UTF-8 encodings when reading filenames back.

I have absolutely no trouble opening these files from inside of LWM using WITH-OPEN-FILE and the file pathnames returned by directory scans. It is only when attempting to go through a shell or from a direct SYS:CALL-SYSTEM that I have difficulties. The filenames do not go across with the proper character set translations.

This is true whether I run my LWM with the default BASE-CHAR, or the full CHARACTER as default.

I tried setting locale inside the Shell pane, and also surrounding my CALL-SYSTEM with FLI locale settings, but to no avail. 

At first I thought this was Apple’s fault for not sufficiently programming their own tools to handle these filenames with extended characters embedded within. But then I tried from Apple’s own Terminal panes, and they manage to execute the utility programs just fine. The Apple Terminal is running en_US.UTF-8 locale. So it isn’t Apple’s problem after all.

There is something amiss in the LWM Shell interface and CALL-SYSTEM interface, and I cannot find anything resembling system or environment parameters that can coax it to run properly. I did dig down through the disassembly on CALL-SYSTEM to look for where it might be hosing up the character string encodings to the outside world, but I could not find anything obvious.

The Erlang documentation discussed what a can of worms this all is. So my sympathies extend to anyone attempting to provide bridges between programs. It is almost looking like it would be more uniform to use binary interfaces instead of the old Unix textual interfaces. At least then we would have some definite protocols to follow. That flies in the opposite direction from the original Unix philosophy. But I guess, in a more inclusive global setting, we really must dispense with plain old 7-bit ASCII. But we now have ambiguous protocols.

- DM

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html

Re: UTF-8 and the LW Shell on LWM

… my workaround for now… since I can reach these files directly from inside of LW, I copy them bodily to a temp file with a simpler name, then call the command line utilities on that temp file. Then I erase the temp file after finishing.

That works, using LW:COPY-FILE, and MAKE-TEMP-FILE. But that is a kludge, and in many cases the reason that my command line utility calls fail is because some of these tracks have DRM encryption and cannot be read, even if you can reach them. 

But for now, I have to make two attempts: the first one against the original file name, then, if that fails, a second one against a copy of the file with a simpler name. Only after this second attempt can I be sure that the external utility fails because of DRM encoding. That first failure might have been due to a filename with extended characters that could not be found by the external utility, due to improper name translation from Lisp to the outside world.

- DM


> On Feb 13, 2017, at 23:32, David McClain <dbm@refined-audiometrics.com> wrote:
> 
> [...]



Re: UTF-8 and the LW Shell on LWM

Thank you Martin! That worked!!

- DM

> On Feb 14, 2017, at 04:00, Martin Simmons <martin@lispworks.com> wrote:
> 
> The problem is that SYS:CALL-SYSTEM doesn't convert its arguments into the
> correct external format.
> 
> You could try writing the command into a temporary file (opened with
> :external-format :utf-8 probably) and then use
> 
> (sys:call-system (list "/bin/sh" temp-file))
> 
> -- 
> Martin Simmons
> LispWorks Ltd
> http://www.lispworks.com/
> 


Re: UTF-8 and the LW Shell on LWM

Here’s Martin’s suggested approach, shared with everyone…

(I added a :SHOWING keyword so that this can be used for both SYS:CALL-SYSTEM and SYS:CALL-SYSTEM-SHOWING-OUTPUT.)

;; SAFE-CALL-SYSTEM - fixes a bug in LWM that prevents proper
;; character set translation of extended chars in filenames, making a
;; straightforward CALL-SYSTEM fail when it should otherwise succeed.
;;
;; DM/RAL 02/17 (LWM 7.0)

(defmethod safe-call-system ((v vector) &rest args &key &allow-other-keys)
  (apply #'safe-call-system (coerce v 'list) args))

(defmethod safe-call-system ((lst list) &rest args &key &allow-other-keys)
  (apply #'safe-call-system (format nil "~{~A ~}" lst) args))

(defmethod safe-call-system ((cmd string) &rest args &key showing &allow-other-keys)
  (let ((syscmd (if showing
                    #'sys:call-system-showing-output
                  #'sys:call-system)))
    (remf args :showing)
    (if (some (lambda (c)
                (> (char-code c) 127))
              cmd)
        (let ((scrfname (hcl:create-temp-file :directory "/tmp/")))
          (with-open-file (s scrfname
                             :direction :output
                             :if-exists :supersede
                             :external-format '(:UTF-8 :eol-style :lf))
            (write-line cmd s))
          (unwind-protect
            (apply syscmd (list "/bin/sh" (namestring scrfname)) args)
          (delete-file scrfname)))
      ;; else
      (apply syscmd cmd args))
    ))


- DM


On Feb 14, 2017, at 08:15, David McClain <dbm@refined-audiometrics.com> wrote:

Thank you Martin! That worked!!

- DM



Re: UTF-8 and the LW Shell on LWM

Yes… the UNWIND-PROTECT was in the wrong place…

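(defmethod safe-call-system ((cmd string) &rest args &key showing &allow-other-keys)
  ;; :SHOWING selects the variant that displays the command's output.
  (let ((syscmd (if showing
                    #'sys:call-system-showing-output
                  #'sys:call-system)))
    ;; :SHOWING is our own keyword, so strip it before passing ARGS along.
    (remf args :showing)
    (if (some (lambda (c)
                (> (char-code c) 127))
              cmd)
        ;; The command contains non-ASCII characters: write it to a
        ;; UTF-8 encoded script file and have /bin/sh run that file.
        (let ((scrfname (hcl:create-temp-file :directory "/tmp/")))
          (unwind-protect
              (progn
                (with-open-file (s scrfname
                                   :direction :output
                                   :if-exists :supersede
                                   :external-format '(:UTF-8 :eol-style :lf))
                  (write-line cmd s))
                (apply syscmd (list "/bin/sh" (namestring scrfname)) args))
          ;; Cleanup form: delete the script even if writing or running it fails.
          (delete-file scrfname)))
      ;; Pure ASCII command: call the system function directly.
      (apply syscmd cmd args))))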

—————
Also… you should still ensure that any special filenames have been properly escaped for the shell - either with backslashes or else enclosed in single-quotes.
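
As a rough illustration of the single-quote option, here is a minimal sketch in portable Common Lisp. The names SHELL-QUOTE and SHELL-COMMAND-LINE are hypothetical helpers, not part of LispWorks or of the code above, and the path in the usage comment is made up. It assumes a POSIX shell, where everything between single quotes is taken literally and an embedded single quote has to be written as '\''.

```lisp
;; Hypothetical helper: wrap STRING in single quotes for a POSIX shell.
;; Inside single quotes the shell takes every character literally, so the
;; only character needing care is the single quote itself, which is
;; emitted as the four characters '\'' (close, escaped quote, reopen).
(defun shell-quote (string)
  (with-output-to-string (out)
    (write-char #\' out)
    (loop for ch across string
          do (if (char= ch #\')
                 (write-string "'\\''" out)
                 (write-char ch out)))
    (write-char #\' out)))

;; Hypothetical helper: build one command string from a program name and
;; its arguments, quoting each word and separating them with spaces.
(defun shell-command-line (program &rest arguments)
  (format nil "~{~A~^ ~}" (mapcar #'shell-quote (cons program arguments))))

;; Possible use with SAFE-CALL-SYSTEM from this thread (illustrative path):
;; (safe-call-system
;;  (shell-command-line "ls" "-l" "/tmp/Café Müller/01 Élégie.aiff"))
```

Quoting every word, including the program name, is redundant for plain names but avoids having to decide which characters are special. Non-ASCII characters pass through unchanged, so the resulting string still takes the temp-file route in SAFE-CALL-SYSTEM.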

 - DM

On Feb 14, 2017, at 09:14, David McClain <dbm@refined-audiometrics.com> wrote:

[...]



Updated at: 2020-12-10 08:31 UTC