Re: array to string
On Fri, Sep 30, 2011 at 4:03 AM, Joel Reymont <joelr1@gmail.com> wrote:
>
>
> On Sep 29, 2011, at 10:02 PM, Joshua TAYLOR wrote:
>
>> CL-USER > (map 'string 'code-char #(111 112 101 110 112 111 107 101 114))
>> "openpoker"
>
>
> Obvious but slow?

You may want to look at flexi-streams:octets-to-string and
external-format:decode-external-string. Both do the required
conversion from bytes to strings, and also let you specify a
character encoding.
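
Alternatively, if your byte array represents a UTF-8 string, the code
below should be faster than either flexi-streams:octets-to-string or
external-format:decode-external-string. It needs suitable
modifications first: as written it takes a string whose character
codes are the raw bytes, so you would iterate over the byte array
instead.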

(defun reinterpret-string-as-utf8 (string)
  "Useful when a string has been generated from an external source,
where the source data has been interpreted as an 8-bit (direct)
encoding - e.g., :latin-1, but should have been interpreted as :utf-8."
  (declare (optimize (speed 3) (compilation-speed 0)
                     (debug 1) (safety 1)))
  (let ((output-chars-remaining 0)   ; continuation bytes still expected
        (output-accumulator 0))      ; code point assembled so far
    (with-output-to-string (s nil :element-type 'character)
      (flet ((output-byte (byte)
               (if (zerop output-chars-remaining)
                   ;; Start of a character: ASCII or a lead byte.
                   (if (= (logand byte #x80) 0)
                       (write-char (code-char byte) s)
                       (cond ((= (logand byte #xe0) #xc0)
                              (setf output-accumulator (logand byte #x1f))
                              (setf output-chars-remaining 1))
                             ((= (logand byte #xf0) #xe0)
                              (setf output-accumulator (logand byte #x0f))
                              (setf output-chars-remaining 2))
                             ((= (logand byte #xf8) #xf0)
                              (setf output-accumulator (logand byte #x07))
                              (setf output-chars-remaining 3))
                             ;; Legacy 5- and 6-byte forms; RFC 3629
                             ;; limits UTF-8 to 4 bytes.
                             ((= (logand byte #xfc) #xf8)
                              (setf output-accumulator (logand byte #x03))
                              (setf output-chars-remaining 4))
                             ((= (logand byte #xfe) #xfc)
                              (setf output-accumulator (logand byte #x01))
                              (setf output-chars-remaining 5))
                             (t (error "Invalid UTF-8 byte ~A" byte))))
                   ;; Continuation byte (10xxxxxx): shift in six more bits.
                   (progn
                     (assert (= (logand byte #xc0) #x80))
                     (setf output-accumulator (logior (ash output-accumulator 6)
                                                      (logand byte #x3f)))
                     (decf output-chars-remaining)
                     (when (zerop output-chars-remaining)
                       (write-char (code-char output-accumulator) s))))))
        (loop for c across string
              do (output-byte (char-code c))
              finally (assert (= output-chars-remaining 0)))))))
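
For the byte-array case from the original question, the "suitable
modifications" might look roughly like the sketch below. It is only an
illustration: the name utf8-octets-to-string is made up for this
example, it assumes the input is a vector of (unsigned-byte 8) and an
implementation with Unicode characters, and it accepts only the 1- to
4-byte forms. It does not reject overlong encodings or surrogates, and
I have not benchmarked it.

```lisp
;; Sketch only: UTF8-OCTETS-TO-STRING is a hypothetical name, not a
;; function from any library mentioned in this thread.
(defun utf8-octets-to-string (octets)
  "Decode OCTETS, a vector of (unsigned-byte 8), as UTF-8 into a string."
  (let ((remaining 0)     ; continuation bytes still expected
        (accumulator 0))  ; code point assembled so far
    (with-output-to-string (s)
      (loop for byte across octets
            do (cond ((plusp remaining)
                      ;; Continuation bytes look like 10xxxxxx.
                      (unless (= (logand byte #xc0) #x80)
                        (error "Invalid UTF-8 continuation byte ~A" byte))
                      (setf accumulator (logior (ash accumulator 6)
                                                (logand byte #x3f)))
                      (when (zerop (decf remaining))
                        (write-char (code-char accumulator) s)))
                     ((< byte #x80)                     ; 0xxxxxxx
                      (write-char (code-char byte) s))
                     ((= (logand byte #xe0) #xc0)       ; 110xxxxx
                      (setf accumulator (logand byte #x1f)
                            remaining 1))
                     ((= (logand byte #xf0) #xe0)       ; 1110xxxx
                      (setf accumulator (logand byte #x0f)
                            remaining 2))
                     ((= (logand byte #xf8) #xf0)       ; 11110xxx
                      (setf accumulator (logand byte #x07)
                            remaining 3))
                     (t (error "Invalid UTF-8 lead byte ~A" byte)))
            finally (unless (zerop remaining)
                      (error "Truncated UTF-8 sequence"))))))
```

CL-USER > (utf8-octets-to-string #(111 112 101 110 112 111 107 101 114))
"openpoker"

Working on the octets directly skips the intermediate string that
reinterpret-string-as-utf8 expects as input.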