Re: string encoding for pathnames. FSRef?
On Mar 19, 2008, at 2:22 PM, Chris Perkins wrote:
> I'm using Mac LispWorks 5 and am passing some data back and forth
> with another process over a socket. Both LW and the other process
> are running on the same machine.
> Some of the data that is being passed are pathnames to files. I've
> been having a bit of trouble with pathnames that contain non-ASCII
> characters. It turns out that the other process is using a variety
> of string encodings (UTF8, MacRoman) and varying between precomposed
> and decomposed unicode. So now I'm trying to make everything
> consistent and fix it.
> What is the encoding LispWorks uses for its strings? Such as can be
> used to construct a pathname object or passed to (probe-file).
LW can decode a variety of formats, but you have to know what you are
decoding. If the files are all really paths on the same machine, how
is it that some would be UTF8 while others are MacRoman? I thought the
OS would represent them the same way. Unless perhaps they are stored
in a database or something from different OS versions.
If I recall correctly, MacRoman and Latin-1 are similar but not
identical, and both agree with UTF-8 only in the ASCII range (bytes
below 128). Above that, each encoding uses different bytes, so if the
encodings really are different, I think it should take only a small
number of bytes to determine which one you have.
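
As a rough illustration of that byte comparison, here is a minimal
sketch in portable Common Lisp. The name utf-8-octets-p is my own, not
a LispWorks function, and it assumes you have the raw bytes as a
sequence of integers before any decoding. It only checks lead and
continuation byte patterns, so treat it as a heuristic: a MacRoman
string can occasionally look like valid UTF-8 by coincidence.

```lisp
(defun utf-8-octets-p (octets)
  "Return true if OCTETS (a sequence of integers 0-255) looks like UTF-8.
Simplified check: it does not reject every overlong or surrogate form."
  (let ((bytes (coerce octets 'vector))
        (i 0))
    (flet ((continuation-p (index)
             ;; Continuation bytes have the bit pattern 10xxxxxx.
             (and (< index (length bytes))
                  (= (logand (aref bytes index) #xC0) #x80))))
      (loop while (< i (length bytes))
            do (let* ((b (aref bytes i))
                      ;; Number of continuation bytes the lead byte announces.
                      (extra (cond ((< b #x80) 0)
                                   ((<= #xC2 b #xDF) 1)
                                   ((<= #xE0 b #xEF) 2)
                                   ((<= #xF0 b #xF4) 3)
                                   (t (return-from utf-8-octets-p nil)))))
                 (loop for k from 1 to extra
                       unless (continuation-p (+ i k))
                         do (return-from utf-8-octets-p nil))
                 (incf i (1+ extra))))
      t)))
```

For example, "café" is 63 61 66 C3 A9 in UTF-8 but 63 61 66 8E in
MacRoman, and a lone 8E byte can never appear in well-formed UTF-8, so
the check separates the two.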
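Here is a way to convert a UTF-8 byte stream, read in as Latin-1
characters, to a Lisp string:

(defun translate-string-via-fli (string from to)
  ;; Write STRING to foreign memory encoded as FROM, then read the same
  ;; bytes back decoded as TO.
  (fli:with-foreign-string (ptr elements bytes :external-format from)
      string
    (declare (ignore elements bytes))
    (fli:convert-from-foreign-string ptr :external-format to)))

(defun decode-external-string (string)
  ;; STRING holds UTF-8 bytes that were read one byte per character.
  (translate-string-via-fli string :latin-1 :utf-8))
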
> Alternately, rather than the inconsistent paths I currently am
> accessing, I have the option of getting a Mac FSRef. That really
> seems ideal. But, I'd need to transmit it across the socket to
> LispWorks and then get LW to make a valid pathname object. An FSRef
> is an opaque 80 byte entity. I have no idea if it is portable across
> processes (both are on the same machine, though). But, if it were,
> is there any way I can get LispWorks to embrace it?
You can call Carbon FSRef related functions. For example,

(fli:define-foreign-function (CFURLCreateFromFSRef
                              "CFURLCreateFromFSRef" :source)
    ((allocator :pointer)
     (fs-ref :pointer))
  :result-type :pointer
  :language :c)

This gives you a CFURL which you can convert to a file system path.
Something like this:
(defun cf-url-file-system-path (url &optional (path-style +posix-path-
  (let ((cf-string (CFURLCopyFileSystemPath url path-style)))
    (unless (fli:null-pointer-p cf-string)
      (prog1 (%string cf-string) ;(cf-string-to-lisp-string cf-string)
        (CFRelease cf-string)))))
> P.S. I haven't worked much with string encoding in the past. If I
> have a pathname with arabic characters in, for example, is that even
> expressible with a string using MacRoman encoding? What about
> ISOLatin? UTF8?
UTF8, yes, MacRoman or Latin-X, no.
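
To make that concrete, here is a small sketch in portable Common Lisp.
Both function names are hypothetical helpers of mine, and the first one
assumes your Lisp's char-code returns Unicode code points (true of
LispWorks as far as I know, but not guaranteed by the standard). Arabic
letters have code points well above 255, so a single-byte encoding like
Latin-1 has no slot for them, and MacRoman is likewise an 8-bit set with
a Latin repertoire. UTF-8 simply spends two bytes on each.

```lisp
(defun latin-1-representable-p (string)
  "True if every character of STRING has a code point below 256.
Assumes CHAR-CODE returns Unicode code points."
  (every (lambda (ch) (< (char-code ch) 256)) string))

(defun utf-8-encode-code-point (code)
  "Return the UTF-8 octets for the Unicode code point CODE as a list."
  (cond ((< code #x80)
         ;; ASCII: one byte, unchanged.
         (list code))
        ((< code #x800)
         ;; Two bytes: 110xxxxx 10xxxxxx.
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         ;; Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         ;; Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))
```

For ARABIC LETTER ALEF (U+0627), (utf-8-encode-code-point #x627)
returns (#xD8 #xA7), while latin-1-representable-p is false for any
string containing it.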
John DeSoi, Ph.D.