Lisp HUG Maillist Archive

find-regexp-in-string: anything wrong?

Hi

am I doing something wrong here?

CL-USER 73 > (find-regexp-in-string "'[a-z]+'" " 'foo'" :start 0)
1
5

CL-USER 74 > (find-regexp-in-string "'\\([a-z]\\)+'" " 'foo'" :start 0)
NIL
NIL

Thanks

--
Marco Antoniotti


Re: find-regexp-in-string: anything wrong?

Uhm... why are you escaping the parentheses in the second form? That to me reads like you want to match literal parentheses and there aren't any in " 'foo'". 



Re: find-regexp-in-string: anything wrong?

Aren’t you supposed to escape the parentheses in order to tell the regexp compiler that they are to be used for grouping?

In any case….

CL-USER 78 > (find-regexp-in-string "'([a-z])+'" " 'foo'" :start 0)
NIL
NIL


MA



On Feb 23, 2016, at 14:21 , Alessio Stalla <alessiostalla@gmail.com> wrote:

Uhm... why are you escaping the parentheses in the second form? That to me reads like you want to match literal parentheses and there aren't any in " 'foo'". 


--
Marco Antoniotti, Associate Professor tel. +39 - 02 64 48 79 01
DISCo, Università Milano Bicocca U14 2043 http://bimib.disco.unimib.it
Viale Sarca 336
I-20126 Milan (MI) ITALY

Please check: http://cdac.lakecomoschool.org

Please note that I am not checking my Spam-box anymore.
Please do not forward this email without asking me first.





Re: find-regexp-in-string: anything wrong?

Antoniotti Marco wrote on Tue, 23 Feb 2016 13:15:26 +0000 16:15:

| CL-USER 74 > (find-regexp-in-string "'\\([a-z]\\)+'" " 'foo'" :start 0)
| NIL
| NIL 

On LWW 4.4, this also returns
1
5
--
Sincerely,
Dmitry Ivanov
lisp.ystok.ru

_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html


Re: find-regexp-in-string: anything wrong?


Just to clarify: my question is about how you do grouping in LW regexps.

MA


Re: find-regexp-in-string: anything wrong?

But if you change the position of the "+", you change the meaning of
the regexp.  (In this case, it shouldn't make a difference because the
group only consists of one character.)

Anyway, I can reproduce this behavior on LWW 7.  Compare with CL-PPCRE:


CL-USER 28 > (ppcre:scan "'([a-z])+'" " 'foo'")
1
6
#(4)
#(5)

CL-USER 29 > (ppcre:scan "'([a-z]+)'" " 'foo'")
1
6
#(2)
#(5)

CL-USER 30 > (find-regexp-in-string "'\\([a-z]\\)+'" " 'foo'")
NIL
NIL

CL-USER 31 > (find-regexp-in-string "'\\([a-z]+\\)'" " 'foo'")
1
5






On Tue, Feb 23, 2016 at 5:09 PM, Raymond Wiker <rwiker@gmail.com> wrote:
>
> I think the problem was the position of the "+" - it needs to be inside the group (like it is in your example).
>
>> On 23 Feb 2016, at 17:01 , Martin Simmons <martin@lispworks.com> wrote:
>>
>>
>> I'm not sure why your example doesn't work, but this works:
>>
>> (find-regexp-in-string "'\\([a-z]+\\)'" " 'foo'" :start 0)
>>
>> --
>> Martin Simmons
>> LispWorks Ltd
>> http://www.lispworks.com/


Re: find-regexp-in-string: anything wrong?

On Tue, Feb 23, 2016 at 5:17 PM, Edi Weitz <edi@agharta.de> wrote:
> CL-USER 30 > (find-regexp-in-string "'\\([a-z]\\)+'" " 'foo'")
> NIL
> NIL

On LWL 6.1.1, the return values are 1 and 5.  This looks like new
behavior on LW 7 to me...



Re: find-regexp-in-string: anything wrong?

On Tue, Feb 23, 2016 at 6:15 PM, Marco Antoniotti <marcoxa@cs.nyu.edu> wrote:
> So, this is a bug, isn’t it?

If LW6 and LW7 behave differently, then, yes, I'd call it a bug.
Unless the documentation has changed... :)

> P.S.  I understand that in CL-PPCRE you quote parentheses if you want to match them, don’t I?

Yes, CL-PPCRE uses Perl syntax while I think that LW uses Emacs syntax.
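
To make the difference between the two conventions concrete, here is a minimal sketch in portable Common Lisp that rewrites Emacs-style grouping, \( and \), into Perl-style grouping. The function name emacs-to-perl-parens is hypothetical and is not part of LispWorks or CL-PPCRE. The sketch deliberately ignores bracket expressions and every other difference between the two syntaxes, so treat it as an illustration rather than a converter.

```lisp
;; Hypothetical helper, for illustration only: swaps the escaping of
;; ( and ) in REGEXP. Assumes no parentheses or backslashes occur
;; inside bracket expressions such as [...].
(defun emacs-to-perl-parens (regexp)
  (with-output-to-string (out)
    (loop with n = (length regexp)
          with i = 0
          while (< i n)
          do (let ((c (char regexp i)))
               (cond
                 ;; Backslash followed by another character.
                 ((and (char= c #\\) (< (1+ i) n))
                  (let ((next (char regexp (1+ i))))
                    ;; \( and \) are grouping operators in the Emacs
                    ;; style: drop the backslash. Any other escape is
                    ;; copied unchanged.
                    (unless (find next "()")
                      (write-char c out))
                    (write-char next out))
                  (incf i 2))
                 ;; A bare ( or ) is literal in the Emacs style, so it
                 ;; must be escaped in the Perl style.
                 ((find c "()")
                  (write-char #\\ out)
                  (write-char c out)
                  (incf i))
                 (t
                  (write-char c out)
                  (incf i)))))))

;; (emacs-to-perl-parens "'\\([a-z]+\\)'")  returns "'([a-z]+)'"
;; (emacs-to-perl-parens "(foo)")           returns "\\(foo\\)"
```

The first call turns the pattern that works with find-regexp-in-string into the one used with ppcre:scan earlier in this thread.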



Re: find-regexp-in-string: anything wrong?



On 23/02/16 16:52, Antoniotti Marco wrote:
> Aren’t you supposed to escape the parentheses in order to tell the 
> regexp compiler that they are to be used for grouping? 
That depends.

Each regexp matcher may implement its own syntax.

Historically, the unix regex(3) library function has implemented TWO
different syntaxes!
They were eventually standardized by POSIX as Basic Regular Expressions
(BRE) and Extended Regular Expressions (ERE):
IEEE Std 1003.2 (``POSIX.2''), sections 2.8 (Regular Expression 
Notation) and B.5 (C Binding for Regular Expression Matching).
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

To make it short, in BRE, the special characters are: .[\*^$
while in ERE, the special characters are: .[\()*+?{|^$

The rules are complicated a little by context (inside or outside a
bracket expression), but in the right context it means that you have
to escape ( and ) to match them literally only if your regular
expression is an ERE (or derived from ERE), not if it's a BRE (or
derived from BRE), where the escaped forms \( and \) do the grouping.

Some libraries accept both ERE and BRE (possibly with extensions), 
selected by a flag, notably unix regex(3), which can be confusing.

cl-ppcre uses Perl syntax, which is derived from ERE:

    (scan "'([a-z])+'" " 'foo' ")
    1
    6
    #(4)
    #(5)


> In any case….
>
> CL-USER 78 > (find-regexp-in-string "'([a-z])+'" " 'foo'" :start 0)
> NIL
> NIL

So it looks like this find-regexp-in-string is expecting a BRE by
default: ( is not special, and you have to use \( and \) for grouping
and \+ for the repetition, with \ escaped in the string: "'\\([a-z]\\+\\)'"


Finally, notice that some regexp matchers are even more confusing,
mixing elements of both ERE and BRE. For example, emacs regexps look
like BRE, since you have to use \( and \), but for 1-or-more
repetition you use + rather than \+.
In emacs, it would be:

     (string-match "'\\([a-z]+\\)'" " 'foo'") --> 1
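
A note on the doubled backslashes in these patterns: they belong to Lisp string syntax, not to the regexp syntax. The small sketch below uses only standard Common Lisp to show what the regexp engine actually receives. The variable name *pattern* is just an illustrative choice.

```lisp
;; The Lisp reader turns each \\ inside a string literal into a single
;; backslash, so the regexp engine never sees the doubled form.
(defparameter *pattern* "'\\([a-z]+\\)'")

(length *pattern*)      ; 12 characters, not 14
(count #\\ *pattern*)   ; 2 backslashes
(write-line *pattern*)  ; prints '\([a-z]+\)'
```

So "'\\([a-z]+\\)'" as typed at the listener is the regexp '\([a-z]+\)' as seen by the matcher.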



-- __Pascal J. Bourguignon__ http://www.informatimago.com/



Re: find-regexp-in-string: anything wrong?

Thanks Pascal for the nice recap.

As things stand, I’d say it is a bug in LW 7.x.

Cheers

MA




> On Feb 23, 2016, at 21:16 , Pascal J. Bourguignon <pjb@informatimago.com> wrote:


Updated at: 2020-12-10 08:32 UTC