Lisp HUG Maillist Archive

ALPHANUMERICP

(loop for i below char-code-limit
      when (alphanumericp (code-char i))
      count 1)

In LW this results in 124 while Allegro CL yields 65470. For both
implementations CHAR-CODE-LIMIT is 65536. Is this a bug?

Thanks,
Edi.
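
A minimal sketch for narrowing down where two implementations disagree:
the same count, restricted to sub-ranges of the code space. The name
COUNT-ALPHANUMERICS and its keyword arguments are hypothetical, and the
NIL guard is there because the standard allows CODE-CHAR to return NIL
for a code that has no character.

```lisp
(defun count-alphanumerics (&key (start 0) (end char-code-limit))
  "Count the codes in [START, END) whose character satisfies ALPHANUMERICP."
  (loop for code from start below end
        for char = (code-char code)
        ;; Skip codes for which the implementation has no character.
        count (and char (alphanumericp char))))

;; Compare these three numbers across implementations to see whether
;; the difference lies inside or outside the Latin-1 range.
(list :ascii   (count-alphanumerics :end 128)
      :latin-1 (count-alphanumerics :end 256)
      :all     (count-alphanumerics))
```

The ranges 128 and 256 assume that character codes follow ASCII and
Latin-1, which the standard does not require.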


Re: ALPHANUMERICP

Edi Weitz (2003-03-25 01:08):

> (loop for i below char-code-limit
>       when (alphanumericp (code-char i))
>       count 1)
> 
> In LW this results in 124 while Allegro CL yields 65470. For both
> implementations CHAR-CODE-LIMIT is 65536. Is this a bug?
> 

How many alphanumerics are there in Unicode?

s.

> Thanks,
> Edi.


Re: ALPHANUMERICP

David Fox <davef@xanalys.com> writes:

>    (loop for i below char-code-limit
>          when (alphanumericp (code-char i))
>          count 1)
> 
>    In LW this results in 124 while Allegro CL yields 65470. For both
>    implementations CHAR-CODE-LIMIT is 65536. Is this a bug?
> 
> No.
> 
> Implementations of ANSI Common Lisp are free to define their character
> categories as long as these include specified subsets of the type
> STANDARD-CHAR. See
> http://www.lispworks.com/reference/HyperSpec/Body/13_ad.htm

Yes, I was aware of this. I actually meant "bug" with respect to
Unicode not with respect to ANSI Common Lisp.

> In LispWorks currently the alphabetics include only those of the
> Latin-1 character set. To do the ANSI CL categories stuff
> comprehensively across Unicode would require not just the
> alphabetics but the cased pairs, extra digits (e.g. the Arabic
> digits) and so on. I would be interested to learn why you care.

I don't really care (currently) but just came across this more or less
by chance. However, although I'm by no means an expert on this topic I
think there must exist some definition in the Unicode standard whether
a code point is alphanumeric or not. Perl (sorry for this example...)
has Unicode constructs like 'IsAlpha' or 'IsAlnum', other languages
probably have that, too. Wouldn't it be nice if CL's ALPHANUMERICP
would yield the same result for all implementations which support
Unicode?

> Allegro seems to take a more inclusive approach, and is returning
> true for ALPHANUMERICP on some undefined Unicode code points.

Ooops, I didn't look that close... :)

> I don't know if that's better or worse than the exclusive approach
> of LispWorks. Could be that both Lisps are ANSI CL-compliant whilst
> not really implementing the Unicode notion of 'alphabetic'
> (http://www.unicode.org/glossary/)

Yep, that's what I meant. After all, the function ALPHANUMERICP is
defined for all characters, so if you decide to return NIL you thereby
explicitly tell the user that the character is _not_
alphanumeric. There's no third answer saying "we don't care"... :)

Thanks,
Edi.
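
The point quoted above, that the character categories have to be
extended together rather than one at a time, can be probed with a small
consistency check. The ANSI description of ALPHANUMERICP notes that it
is equivalent to ALPHA-CHAR-P or DIGIT-CHAR-P being true. The sketch
below lists the codes where an implementation departs from that; the
name CATEGORY-MISMATCHES is hypothetical, and what it returns outside
the standard characters will vary by implementation.

```lisp
(defun category-mismatches (&key (start 0) (end char-code-limit))
  "Return the codes in [START, END) where ALPHANUMERICP disagrees with
(OR ALPHA-CHAR-P DIGIT-CHAR-P)."
  (loop for code from start below end
        for char = (code-char code)
        ;; Compare as booleans: DIGIT-CHAR-P returns a weight, not T.
        when (and char
                  (not (eq (not (alphanumericp char))
                           (not (or (alpha-char-p char)
                                    (digit-char-p char))))))
          collect code))

;; An empty list means the three predicates agree on that range.
(category-mismatches :end 256)
```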


Re: ALPHANUMERICP

* Edi Weitz wrote:

> Yep, that's what I meant. After all, the function ALPHANUMERICP is
> defined for all characters, so if you decide to return NIL you thereby
> explicitly tell the user that the character is _not_
> alphanumeric. There's no third answer saying "we don't care"... :)

I think this is the real problem.  There should be a second value
saying whether or not you know, so NIL, NIL would mean `dunno'.

--tim
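
A rough sketch of what such a two-valued predicate might look like,
following the convention SUBTYPEP uses for its second value. Everything
here is an assumption made for illustration: the name ALPHANUMERICP*,
the variable *KNOWN-CODE-LIMIT*, and the idea that an implementation
only classifies codes below 256 (roughly the Latin-1 situation
described earlier in the thread).

```lisp
(defparameter *known-code-limit* 256
  "Hypothetical bound: codes below this are assumed to be classified.")

(defun alphanumericp* (char)
  "Return two values: whether CHAR is alphanumeric, and whether that
answer is definite. NIL, NIL means `dunno'."
  (cond ((alphanumericp char)
         (values t t))                  ; a positive answer is definite
        ((< (char-code char) *known-code-limit*)
         (values nil t))                ; classified and not alphanumeric
        (t
         (values nil nil))))            ; outside the classified range

(alphanumericp* #\a)   ; T, T
(alphanumericp* #\!)   ; NIL, T
```

Callers that only look at the first value get the same behaviour as
plain ALPHANUMERICP; the second value is there for code that needs to
tell "no" from "not known".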


Updated at: 2020-12-10 09:00 UTC