Lisp HUG Maillist Archive

Unescaping strings

Is there anything in the CLHS that deals with escaped strings? 

For example, in C: "\t\n", while in Lisp I seem forced to (format nil "~c~c" #\tab #\newline).

I can obviously write a function to handle escaped strings (more importantly un-escaping them), but if something in the spec deals with this already, I'd rather use it... I just haven't found it yet.

Jeff M.

Re: Unescaping strings

On Mon, Feb 25, 2013 at 4:29 PM, Jeff Massung <massung@gmail.com> wrote:
> Is there anything in the CLHS that deals with escaped strings?
[...]

Nope. However, Edi Weitz' CL-INTERPOL will do it: http://weitz.de/cl-interpol/
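
Since the standard has nothing built in, a hand-rolled function is one option when a library is not wanted. The following is only a minimal sketch: UNESCAPE-STRING is a hypothetical name, and it decodes just \t, \n, \r, \" and \\ (no octal, hex or Unicode escapes). It also assumes the semi-standard character names #\Tab and #\Return are available.

```lisp
;; A minimal sketch, not part of the standard. UNESCAPE-STRING is a
;; hypothetical helper that decodes only a few C-style escapes.
(defun unescape-string (string)
  "Return a copy of STRING with \\t, \\n, \\r, \\\" and \\\\ decoded."
  ;; :element-type 'character asks for a result string that can hold
  ;; any character, not only base characters.
  (with-output-to-string (out nil :element-type 'character)
    (let ((escaped nil))
      (loop for ch across string
            do (cond (escaped
                      ;; Previous character was a backslash.
                      (write-char (case ch
                                    (#\t #\Tab)
                                    (#\n #\Newline)
                                    (#\r #\Return)
                                    ;; \" and \\ map to themselves; unknown
                                    ;; escapes are passed through unchanged.
                                    (otherwise ch))
                                  out)
                      (setf escaped nil))
                     ((char= ch #\\)
                      (setf escaped t))
                     (t
                      (write-char ch out)))))))
```

A lone trailing backslash is silently dropped by this sketch; a stricter version would signal an error there.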

    -tree

-- 
Tom Emerson
tremerson@gmail.com
http://www.dreamersrealm.net/tree

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html


Re: Unescaping strings

On Mon, Feb 25, 2013 at 4:29 PM, Jeff Massung <massung@gmail.com> wrote:
> Is there anything in the CLHS that deals with escaped strings?
>
> For example, in C: "\t\n", while in Lisp I seem forced to (format nil "~c~c"
> #\tab #\newline).
>
> I can obviously write a function to handle escaped strings (more importantly
> un-escaping them), but if something the spec deals with this already, I'd
> rather use it... I just haven't found it yet.

It's not necessarily great style to have meaningful literal whitespace
in your code, I suppose, but you can always just do "
" as well.  (Let's see if any of the processing between here and the
list will strip out that tab character...)
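
For comparison, here is a small sketch of a few standard ways to build the same tab-plus-newline string without literal whitespace in the source. The variable names are made up for illustration, and #\Tab is a semi-standard character name that the usual implementations provide.

```lisp
;; Three ways to build the two-character string TAB + NEWLINE.
;; The variable names are hypothetical, chosen for this sketch only.

;; FORMAT: ~C prints the character argument, ~% prints a newline.
(defparameter *tab-newline-1* (format nil "~C~%" #\Tab))

;; COERCE a list of characters to a string.
(defparameter *tab-newline-2* (coerce (list #\Tab #\Newline) 'string))

;; CONCATENATE one-character strings made with STRING.
(defparameter *tab-newline-3*
  (concatenate 'string (string #\Tab) (string #\Newline)))
```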

//JT

-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/



Re: Unescaping strings

Well, I'm parsing JSON, so I'm not trying to just use the Lisp reader. ;-)

See https://github.com/massung/json

So, while I've just parsed the strings and that was simple/fine (just hoping CL had something built-in), this leads me to another string-related question for LispWorks. How can I coerce a list of characters to a string when that list can contain Unicode characters?

For example:

(coerce `(#\a #\b ,(code-char 1049)) 'string) ;=> ERROR!

(format nil "~c" (code-char 1049)) ;=> WORKS!

(format nil "~{~c~}" `(#\a #\b ,(code-char 1049))) ;=> WORKS!

(type-of *) ;=> SIMPLE-TEXT-STRING

(coerce `(#\a #\b ,(code-char 1049)) 'simple-text-string) ;=> ERROR!

So, I think my question boils down to whether or not there is a type specifier in LispWorks for Unicode/multi-byte strings that I can use with COERCE, or if I'm stuck doing my FORMAT work-around?

Jeff M.




Re: Unescaping strings

> For example:
>
> (coerce `(#\a #\b ,(code-char 1049)) 'string) ;=> ERROR!

If you evaluate

(set-default-character-element-type 'lw:simple-char)

before you evaluate the above it works.

You could also write

(coerce '(#\a #\b #\U+0419) 'string)

> (coerce `(#\a #\b ,(code-char 1049)) 'simple-text-string) ;=> ERROR!

That works for me in LW 6.0.1 on Mac OS X even without the
set-default-character-element-type call.

    -tree

-- 
Tom Emerson
tremerson@gmail.com
http://www.dreamersrealm.net/tree



Re: Unescaping strings

On Tue, Feb 26, 2013 at 9:50 AM, Jeff Massung <massung@gmail.com> wrote:
> Well, I'm parsing JSON, so I'm not trying to just use the Lisp reader. ;-)
>
> See https://github.com/massung/json
>
> So, while I've just parsed the strings and that was simple/fine (just hoping
> CL had something built-in), this leads me to another string-related question
> for LispWorks. How can I coerce a list of characters to a string when that
> list can contain Unicode characters?
>
> For example:
>
> (coerce `(#\a #\b ,(code-char 1049)) 'string) ;=> ERROR!
>
> (format nil "~c" (code-char 1049)) ;=> WORKS!
>
> (format nil "~{~c~}" `(#\a #\b ,(code-char 1049))) ;=> WORKS!
>
> (type-of *) ;=> SIMPLE-TEXT-STRING
>
> (coerce `(#\a #\b ,(code-char 1049)) 'simple-text-string) ;=> ERROR!
>
> So, I think my question boils down to whether or not there is a type
> specifier in LispWorks for Unicode/multi-byte strings that I can use with
> COERCE, or if I'm stuck doing my FORMAT work-around?

Two others have already written about changing the default character
type, and that may be a suitable solution.  If you still want to
concatenate without changing default types, the string type
TEXT-STRING [1], which "is the string type that is guaranteed to
always hold any character used in writing text (program text or
natural language)," may be what you want:

CL-USER 2 > (coerce `(#\a #\b ,(code-char 1049)) 'string)  ; problem you saw
Error: #\Й is not of type BASE-CHAR.
....

CL-USER 4 > (coerce `(#\a #\b ,(code-char 1049)) 'text-string)
"abЙ"

//JT

[1] http://www.lispworks.com/documentation/lw50/LWRM/html/lwref-353.htm#82128
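
A variant that avoids LispWorks-specific type names is to request element type CHARACTER explicitly. This is only a sketch: CHARS-TO-STRING is a hypothetical name, and the assumption that a CHARACTER array holds the Cyrillic character on LispWorks has not been checked here.

```lisp
;; Hypothetical helper: build a string from a list of characters,
;; asking explicitly for element type CHARACTER rather than relying
;; on what the implementation picks for the bare type STRING.
(defun chars-to-string (chars)
  "Return a fresh string containing the characters in the list CHARS."
  (make-array (length chars)
              :element-type 'character
              :initial-contents chars))

;; Usage, mirroring the example from the thread:
;; (chars-to-string (list #\a #\b (code-char 1049)))
```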

-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/



Re: Unescaping strings

Thank you all for the awesome information!

Jeff M.



Re: Unescaping strings

Jeff Massung <massung@gmail.com> writes:

> Is there anything in the CLHS that deals with escaped strings? 
>
> For example, in C: "\t\n", while in Lisp I seem forced to (format nil
> "~c~c" #\tab #\newline).
>
> I can obviously write a function to handle escaped strings (more
> importantly un-escaping them), but if something the spec deals with
> this already, I'd rather use it... I just haven't found it yet.

This was posted on http://paste.lisp.org once:


;;;; -*- mode:lisp;coding:utf-8 -*-
;;;;**************************************************************************
;;;;FILE:               c-string-reader.lisp
;;;;LANGUAGE:           Common-Lisp
;;;;SYSTEM:             Common-Lisp
;;;;USER-INTERFACE:     NONE
;;;;DESCRIPTION
;;;;    
;;;;    A C string reader, implementing C string back-slash escapes.
;;;;    Also includes a writer to print strings with C back-slash escapes.
;;;;    
;;;;AUTHORS
;;;;    <PJB> Pascal J. Bourguignon <pjb@informatimago.com>
;;;;MODIFICATIONS
;;;;    2011-05-21 <PJB> Updated from http://paste.lisp.org/display/69905
;;;;BUGS
;;;;LEGAL
;;;;    GPL
;;;;    
;;;;    Copyright Pascal J. Bourguignon 2011 - 2011
;;;;    
;;;;    This program is free software; you can redistribute it and/or
;;;;    modify it under the terms of the GNU General Public License
;;;;    as published by the Free Software Foundation; either version
;;;;    2 of the License, or (at your option) any later version.
;;;;    
;;;;    This program is distributed in the hope that it will be
;;;;    useful, but WITHOUT ANY WARRANTY; without even the implied
;;;;    warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
;;;;    PURPOSE.  See the GNU General Public License for more details.
;;;;    
;;;;    You should have received a copy of the GNU General Public
;;;;    License along with this program; if not, write to the Free
;;;;    Software Foundation, Inc., 59 Temple Place, Suite 330,
;;;;    Boston, MA 02111-1307 USA
;;;;**************************************************************************


(defun write-c-string (string &optional (stream *standard-output*))
  "Prints the string as a C string, with C escape sequences."
  (loop
     :for ch :across string
     :initially (princ "\"" stream)
     :do (princ (case ch
                  ((#\bell)               "\\a")
                  ((#\backspace)         "\\b")
                  ((#\page)               "\\f")
                  ((#\newline #\linefeed) "\\n")
                  ((#\return)             "\\r")
                  ((#\tab)                "\\t")
                  ((#\vt)                 "\\v")
                  ((#\")                  "\\\"")
                  ((#\\)                  "\\\\")
                  (otherwise
                   (if (< (char-code ch) 32)
                       (format nil "\\~3,'0o" (char-code ch))
                       ch))) stream)
     :finally (princ "\"" stream)))


(defun read-c-string (stream)
  "Read a C string from the STREAM
The initial double-quote must have been read already."
  (let ((buffer (make-array 80 :element-type 'character
                            :adjustable t :fill-pointer 0))
        (state :in-string)
        (start  0))
    (flet ((process-token (ch)
             (ecase state
               ((:in-string)
                (setf state (case ch
                              ((#\")     :out)
                              ((#\\)     :escape)
                              (otherwise (vector-push-extend ch buffer)
                                         :in-string)))
                nil)
               ((:escape)
                (setf state :in-string)
                (case ch
                  ((#\' #\" #\? #\\) (vector-push-extend ch buffer))
                  ((#\a)  (vector-push-extend #\bell    buffer))
                  ((#\b)  (vector-push-extend #\backspace buffer))
                  ((#\f)  (vector-push-extend #\page    buffer))
                  ((#\n)  (vector-push-extend #\newline buffer))
                  ((#\newline) #|remove it|#)
                  ((#\r)  (vector-push-extend #\return  buffer))
                  ((#\t)  (vector-push-extend #\tab     buffer))
                  ((#\v)  (vector-push-extend #\vt      buffer))
                  ((#\x)
                   (setf state :in-hexa
                         start (fill-pointer buffer)))
                  ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7)
                   (setf state :in-octal
                         start (fill-pointer buffer))
                   (vector-push-extend ch buffer))
                  (otherwise
                   (error "Invalid escape character \\~C at position ~D"
                          ch (fill-pointer buffer))))
                nil)
               ((:in-octal)
                (flet ((insert-octal ()
                         (setf (aref buffer start)
                               (code-char (parse-integer buffer :start start :radix 8))
                               (fill-pointer buffer) (1+ start)
                               state :in-string)))
                  (case ch
                    ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7)
                     (vector-push-extend ch buffer)
                     (when (<= 3 (- (fill-pointer buffer) start))
                       (insert-octal))
                     nil)
                    (otherwise
                     (insert-octal)
                     :again))))
               ((:in-hexa)
                (case ch
                  ((#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9
                        #\a #\b #\c #\d #\e #\f
                        #\A #\B #\C #\D #\E #\F)
                   (vector-push-extend ch buffer)
                   nil)
                  (otherwise
                   (if (< start (fill-pointer buffer))
                       (setf (aref buffer start) (code-char (parse-integer buffer :start start :radix 16))
                             (fill-pointer buffer) (1+ start))
                       (error "Invalid hexadecimal digit at position ~A" (fill-pointer buffer)))
                   (setf state :in-string)
                   :again))))))
      (loop
         :for ch = (read-char stream)
         :do (loop :while (process-token ch))
         :until (eq state :out) 
         :finally (return buffer)))))



(defun test/read-c-string ()
 (let ((*readtable*
        (let ((rt (copy-readtable nil)))
          (set-macro-character #\"
                               (lambda (stream ch)
                                 (declare (ignore ch))
                                 (read-c-string stream))
                               nil
                               rt)
          rt)))
    (read-from-string "\"Hello, bell=\\a, backspace=\\b, page=\\f, newline=\\n, return=\\r, tab=\\t, vt=\\v, \\
\\\"double-quotes\\\", \\'single-quotes\\', question\\?, backslash=\\\\, \\
hexa=\\x3BB, octal=\\101, \\7\\77\\107\\3071\"")))


;;;; THE END ;;;;

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.



Re: GPL license in posted code, was Re: Unescaping strings

Rainer Joswig <joswig@lisp.de> writes:

> Hi,
>
> just a remark about the GPL license used in the code Pascal J. Bourguignon posted here.
>
> Note that I can't load this code into a Lispworks and distribute that
> as an application. The GPL requires that the complete code (including
> your app and the LispWorks code) needs to be available then. I would
> need to get this code under a different license from him - even though
> I could decide to make my code available, then still the LispWorks
> code is not available.
>
> See http://www.cliki.net/LGPL for variants. Only the LLGPL, not even
> the LGPL, would really make it possible for me to use such code with
> something like LispWorks and being able to distribute an application.

Indeed.  However, you can still use the code to learn how to do it.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.



Re: GPL license in posted code, was Re: Unescaping strings

Is this due to there being no Lispworks runtime library that applications link against?  The GPL FAQ states that GPL code is compatible with environments like Visual C++ as apps link against a separate runtime library. 



On Wednesday, February 27, 2013, Rainer Joswig wrote:

Hi,

just a remark about the GPL license used in the code Pascal J. Bourguignon posted here.

Note that I can't load this code into a Lispworks and distribute that as an application. The GPL requires that the complete code (including your app and the LispWorks code) needs to be available then. I would need to get this code under a different license from him - even though I could decide to make my code available, then still the LispWorks code is not available.

See http://www.cliki.net/LGPL  for variants. Only the LLGPL, not even the LGPL,  would really make it possible for me to use such code with something like LispWorks and being able to distribute an application.

Regards,

Rainer Joswig



On 27.02.2013 at 20:38, "Pascal J. Bourguignon" <pjb@informatimago.com> wrote:

> This was posted on http://paste.lisp.org once:
> ;;;;AUTHORS
> ;;;;    <PJB> Pascal J. Bourguignon <pjb@informatimago.com>
> ;;;;MODIFICATIONS
> ;;;;    2011-05-21 <PJB> Updated from http://paste.lisp.org/display/69905
> ;;;;BUGS
> ;;;;LEGAL
> ;;;;    GPL
> ;;;;
> ;;;;    Copyright Pascal J. Bourguignon 2011 - 2011
> ;;;;
> ;;;;    This program is free software; you can redistribute it and/or
> ;;;;    modify it under the terms of the GNU General Public License
> ;;;;    as published by the Free Software Foundation; either version
> ;;;;    2 of the License, or (at your option) any later version.
> ;;;;
> ;;;;    This program is distributed in the hope that it will be
> ;;;;    useful, but WITHOUT ANY WARRANTY; without even the implied
> ;;;;    warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
> ;;;;    PURPOSE.  See the GNU General Public License for more details.
> ;;;;
> ;;;;    You should have received a copy of the GNU General Public
> ;;;;    License along with this program; if not, write to the Free
> ;;;;    Software Foundation, Inc., 59 Temple Place, Suite 330,
> ;;;;    Boston, MA 02111-1307 USA
> ;;;;**************************************************************************



Re: GPL license in posted code, was Re: Unescaping strings

2013/03/02 Luke Crook <luke@...>:
> Is this due to there being no Lispworks runtime library that applications
> link against?  The GPL FAQ states that GPL code is compatible with
> environments like Visual C++ as apps link against a separate runtime
> library.

In Common Lisp, you usually load code (*.lisp or *.fasl) into the
current environment. I don't know how much of that can be considered
just linking.

Maybe if you use (declaim (ftype ...)) everywhere, so you can compile
your code which invokes yet undefined functions, and load external
Lisp code as late as possible. However, macros and reader (dispatch)
macro characters don't seem to match this simple category, as their
purpose is to do something in the context of the compiler and the
reader, so they have an immediate effect on the environment instead of
just in your final bundle.
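
As a rough illustration of the (declaim (ftype ...)) idea, the sketch below declares a function before any definition exists, compiles a caller against that declaration, and only later supplies the definition. All names are hypothetical; the last form is a stand-in for code that would really arrive from a separately loaded file. It shows only the mechanics for ordinary functions. Macros and reader macros cannot be deferred this way, because they have to exist when the calling code is read and compiled.

```lisp
;; EXTERNAL-UNESCAPE is a hypothetical name standing in for a function
;; supplied by separately loaded code.
(declaim (ftype (function (string) string) external-unescape))

;; This caller can be compiled before EXTERNAL-UNESCAPE is defined;
;; the proclamation tells the compiler that the name denotes a function
;; taking a string and returning a string.
(defun show-unescaped (text)
  (format nil "<~A>" (external-unescape text)))

;; Stand-in for the late-loaded definition, included only so the
;; sketch runs on its own. The upcasing body is an arbitrary placeholder.
(defun external-unescape (text)
  (string-upcase text))
```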

I don't know how this is thought of in the C world, where #defines can
be simple constants, but they can also be complex macros that expand
to non-trivial code or yet other macros.

Given the ambiguity and the lack of actual trial of the licenses (GPL
and LGPL), even in the scope of statically compiled/processed
languages, I have to agree that the LLGPL is the only friendly
"copyleft" license for Lisp in general.

-- 
Paulo Madeira



Updated at: 2020-12-10 08:35 UTC