Re: READ-SEQUENCE and FILE-LENGTH (LWW 4.2.7)
Thanks to all who responded. The problem does indeed seem to be a
DOS/UNIX line-ending mismatch.
For the case at hand (trying to get 'albert' to work), returning the
subsequence (a memory-costly operation) appears to work.
I'll investigate the external encoding feature for future cases.
However, I am not sure that the semantics of READ-SEQUENCE warrant the
linefeed translation. Oh well.
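
For reference, here is a minimal sketch of the subsequence approach in portable Common Lisp. It treats FILE-LENGTH only as an upper bound on the number of characters, which holds when the external format maps CRLF to LF but is an assumption in general. It also falls back to chunked reading when FILE-LENGTH returns NIL. The name SLURP-FILE and the chunk size of 4096 are hypothetical and not part of 'albert' or LispWorks.

```lisp
;; Hypothetical helper: read a whole character file into a string.
(defun slurp-file (filename)
  "Return the contents of FILENAME as a string."
  (with-open-file (s filename :direction :input)
    (let ((size (file-length s)))       ; byte length, or NIL if unknown
      (if size
          (let* ((buffer (make-string size))
                 (chars-read (read-sequence buffer s)))
            ;; Fewer characters than bytes means the external format
            ;; consumed some bytes (e.g. CRLF -> LF), so trim the buffer.
            (if (< chars-read size)
                (subseq buffer 0 chars-read)
                buffer))
          ;; FILE-LENGTH could not be determined: read in chunks instead.
          (with-output-to-string (out)
            (let ((chunk (make-string 4096)))
              (loop for n = (read-sequence chunk s)
                    while (plusp n)
                    do (write-string chunk out :end n))))))))
```

The SUBSEQ copy is only paid when the character count is actually smaller than the byte length, so plain LF-terminated ASCII files are returned without an extra copy.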
Cheers
marco
On Tuesday, May 11, 2004, at 08:29 America/New_York, davef@xanalys.com
wrote:
>
> Well, I thought so. But I am unsure that is the case. The file I am
> testing was edited with LWW itself.
>
> LispWorks is capable of editing and creating files with different
> encodings. See the Editor User Guide for details.
>
> Now, checking with 'wc' on a Linux box shows that the difference
> between the file length and the number of chars read is suspiciously
> close to the number of lines in the file.
>
> Is that difference exactly equal to the number of lines?
>
> The file is a DOS file.
> Maybe an end of line mismatch?
>
> I guess that you have a CRLF-line-terminated file. OPEN detects this
> and creates a file stream with an appropriate external format. The
> Lisp line terminator is LF, so when that file is read into Lisp the
> external-format maps each CRLF pair to LF.
>
> You can check the stream's external format by STREAM-EXTERNAL-FORMAT.
>
> LispWorks FILE-LENGTH does not take account of the external format,
> because in general it would need to read the entire file to achieve
> that. Perhaps it should return NIL rather than the file's byte-length
> in such cases. In any case, your code needs to allow for FILE-LENGTH
> returning NIL.
>
> As an aside, this is a show stopper for 'albert' and LW(W)
>
> Perhaps you can simply call the LispWorks function FILE-STRING, which
> does handle external formats. Alternatively, you could hack it by
> specifying a no-conversion external format or even use a binary stream - you'll
> need to think about whether you want to see those Control-M characters
> in your Lisp strings if you take this route.
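
The binary-stream route can be sketched in portable Common Lisp as follows. With an element type of (UNSIGNED-BYTE 8) no external format is involved, so the number of elements read should match FILE-LENGTH, and any CR octets (13) stay in the data. The function name READ-FILE-OCTETS is hypothetical, and the sketch assumes FILE-LENGTH returns a number for the stream.

```lisp
;; Hypothetical helper: read a file as raw octets, with no line-ending
;; translation.  Assumes FILE-LENGTH returns a non-NIL value.
(defun read-file-octets (filename)
  "Return the contents of FILENAME as a vector of (UNSIGNED-BYTE 8)."
  (with-open-file (s filename :direction :input
                              :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length s)
                               :element-type '(unsigned-byte 8)))
           (octets-read (read-sequence octets s)))
      ;; On a binary stream the length is counted in stream elements,
      ;; so a short read here would be unexpected.
      (assert (= octets-read (length octets)))
      octets)))
```

Converting the octets to a string is then a separate step, and that is where you decide what to do with the Control-M characters.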
>
>
> Cheers
>
> Marco
>
>
>
>
> On Monday, May 10, 2004, at 16:31 America/New_York, Edi Weitz wrote:
>
>> Could it be something like a UTF8-encoded file containing non-ASCII
>> characters? See
>>
>>
>> <http://www.google.com/groups?selm=8765gs6qve.fsf%40bird.agharta.de&oe=UTF-8&output=gplain>
>>
>> Edi.
>>
>>> -----Original Message-----
>>> From: owner-lisp-hug@xanalys.com
>>> [mailto:owner-lisp-hug@xanalys.com] On Behalf Of Marco Antoniotti
>>> Sent: Montag, 10. Mai 2004 22:08
>>> To: lisp-hug@xanalys.com
>>> Subject: READ-SEQUENCE and FILE-LENGTH (LWW 4.2.7)
>>>
>>>
>>> Hi
>>>
>>> I have the following test function on LWW 4.2.7
>>>
>>> (defun test-rs (filename)
>>>   (with-open-file (s filename :direction :input)
>>>     (let ((buffer (make-array (file-length s)
>>>                               :element-type (stream-element-type s)))
>>>           (chars-read 0))
>>>       (format t "Trying to READ-SEQUENCE on stream ~S of length ~D.~%"
>>>               s
>>>               (file-length s))
>>>       (setf chars-read (read-sequence buffer s))
>>>       (cond ((< chars-read (file-length s))
>>>              (format t "Read only ~D chars instead of ~D~%"
>>>                      chars-read
>>>                      (file-length s))
>>>              (subseq buffer 0 chars-read))
>>>             (t
>>>              buffer)))))
>>>
>>>
>>> It turns out that the FILE-LENGTH appears to be greater than the
>>> number of characters actually read from the file.
>>>
>>> Has anybody observed this effect?
>>>
>>> Cheers
>>> --
>>> Marco
>>>
>>>
>>>
> --
> Marco Antoniotti http://bioinformatics.nyu.edu
> NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
> 715 Broadway 10th FL fax. +1 - 212 - 998 3484
> New York, NY, 10003, U.S.A.
>
> --
> Dave Fox
>
> Xanalys http://www.lispworks.com
> Compass House
> Vision Park, Chivers Way
> Histon
> Cambridge, CB4 9AD
> England
>
--
Marco Antoniotti http://bioinformatics.nyu.edu
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
715 Broadway 10th FL fax. +1 - 212 - 998 3484
New York, NY, 10003, U.S.A.