ALPHANUMERICP
(loop for i below char-code-limit
      when (alphanumericp (code-char i))
      count 1)

In LW this results in 124 while Allegro CL yields 65470. For both
implementations CHAR-CODE-LIMIT is 65536. Is this a bug?

Thanks,
Edi.
Edi Weitz (2003-03-25 01:08):
> (loop for i below char-code-limit
>       when (alphanumericp (code-char i))
>       count 1)
>
> In LW this results in 124 while Allegro CL yields 65470. For both
> implementations CHAR-CODE-LIMIT is 65536. Is this a bug?
>

How many alphanumerics are there in Unicode?

s.

> Thanks,
> Edi.
David Fox <davef@xanalys.com> writes:

>   (loop for i below char-code-limit
>         when (alphanumericp (code-char i))
>         count 1)
>
>   In LW this results in 124 while Allegro CL yields 65470. For both
>   implementations CHAR-CODE-LIMIT is 65536. Is this a bug?
>
> No.
>
> Implementations of ANSI Common Lisp are free to define their character
> categories as long as these include specified subsets of the type
> STANDARD-CHAR. See
> http://www.lispworks.com/reference/HyperSpec/Body/13_ad.htm

Yes, I was aware of this. I actually meant "bug" with respect to
Unicode, not with respect to ANSI Common Lisp.

> In LispWorks currently the alphabetics include only those of the
> Latin-1 character set. To do the ANSI CL categories stuff
> comprehensively across Unicode would require not just the
> alphabetics but the cased pairs, extra digits (e.g. the Arabic
> digits) and so on. I would be interested to learn why you care.

I don't really care (currently) but just came across this more or less
by chance. However, although I'm by no means an expert on this topic,
I think the Unicode standard must define somewhere whether a code
point is alphanumeric or not. Perl (sorry for this example...) has
Unicode constructs like 'IsAlpha' or 'IsAlnum', and other languages
probably have that, too. Wouldn't it be nice if CL's ALPHANUMERICP
yielded the same result for all implementations which support
Unicode?

> Allegro seems to take a more inclusive approach, and is returning
> true for ALPHANUMERICP on some undefined Unicode code points.

Oops, I didn't look that closely... :)

> I don't know if that's better or worse than the exclusive approach
> of LispWorks. Could be that both Lisps are ANSI CL-compliant whilst
> not really implementing the Unicode notion of 'alphabetic'
> (http://www.unicode.org/glossary/)

Yep, that's what I meant. After all, the function ALPHANUMERICP is
defined for all characters, so if you decide to return NIL you thereby
explicitly tell the user that the character is _not_
alphanumeric. There's no third answer saying "we don't care"... :)

Thanks,
Edi.
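To see where two implementations disagree, rather than only comparing the totals, something like the following rough sketch could tally the alphanumerics per block of 256 character codes. The function name and the block size are made up for illustration and do not come from LispWorks or Allegro CL. The sketch also guards against CODE-CHAR returning NIL, which ANSI CL allows for codes that have no character.

```lisp
;; Hypothetical helper for illustration; not part of any implementation.
(defun alphanumeric-counts-by-block (&optional (block-size 256))
  "Return an alist of (BLOCK-START . COUNT) for each block of
BLOCK-SIZE character codes that contains at least one alphanumeric."
  (let ((counts '()))
    (loop for start from 0 below char-code-limit by block-size
          for n = (loop for i from start
                          below (min char-code-limit (+ start block-size))
                        for c = (code-char i)
                        ;; CODE-CHAR may return NIL, so test C first.
                        count (and c (alphanumericp c)))
          when (plusp n)
            do (push (cons start n) counts))
    (nreverse counts)))
```

Running this in both Lisps and comparing the two alists should show which code ranges account for the difference between 124 and 65470.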
* Edi Weitz wrote:
> Yep, that's what I meant. After all, the function ALPHANUMERICP is
> defined for all characters, so if you decide to return NIL you thereby
> explicitly tell the user that the character is _not_
> alphanumeric. There's no third answer saying "we don't care"... :)

I think this is the real problem. There should be a second value
saying whether or not you know, so NIL, NIL would mean `dunno'.

--tim
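The two-value idea resembles the way SUBTYPEP already reports certainty through its second return value. A minimal sketch of what it might look like follows. It assumes, purely for illustration, that the implementation only classifies codes below some limit (256 would match the Latin-1 case described for LispWorks). The name ALPHANUMERICP* and the KNOWN-LIMIT parameter are hypothetical and not part of ANSI CL or either product.

```lisp
;; Hypothetical two-valued variant; not part of ANSI CL.
;; Assumption: the implementation has reliable data only for
;; character codes below KNOWN-LIMIT (256 = Latin-1 by default).
(defun alphanumericp* (char &optional (known-limit 256))
  "Return two values: whether CHAR is alphanumeric, and whether
that answer is actually known."
  (if (< (char-code char) known-limit)
      ;; Normalize the generalized boolean to T or NIL.
      (values (and (alphanumericp char) t) t)
      ;; Outside the known range: NIL, NIL means `dunno'.
      (values nil nil)))
```

With this, (alphanumericp* #\a) yields T, T and (alphanumericp* #\Space) yields NIL, T, while any character at or above the limit yields NIL, NIL.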