Re: (url-to-plaintext url) and crawler utils on mac pro?
Hello Richard,
My approach would be to use Drakma (http://www.weitz.de/drakma/) to
get the page source, and then either use closure-html
(http://common-lisp.net/project/closure/closure-html/) to parse the
HTML into sexps and extract the URLs and plaintext from the
sexp-ified HTML, or use cl-ppcre (http://weitz.de/cl-ppcre/) if you
are happy doing the processing with regexps. Rough sketches of both
approaches are below.
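Untested and typed straight into the mail, so treat this as a sketch
rather than working code: it assumes drakma and closure-html are
already loaded (via ASDF), and the function names (fetch-source,
parse-page, harvest-urls, harvest-plaintext) are placeholders of my
own.

(defun fetch-source (url)
  ;; DRAKMA:HTTP-REQUEST returns the body as a string for
  ;; textual content types.
  (drakma:http-request url))

(defun parse-page (source)
  ;; Parse the HTML into LHTML sexps of the form
  ;; (:TAG ((:ATTR "value") ...) child ...)
  (chtml:parse source (chtml:make-lhtml-builder)))

(defun harvest-urls (lhtml)
  ;; Walk the LHTML tree, collecting the HREF attribute of
  ;; every :A element.
  (let ((urls '()))
    (labels ((walk (node)
               (when (consp node)
                 (when (eq (first node) :a)
                   (let ((href (second (assoc :href (second node)))))
                     (when href (push href urls))))
                 (mapc #'walk (cddr node)))))
      (walk lhtml))
    (nreverse urls)))

(defun harvest-plaintext (lhtml)
  ;; Collect the text nodes (strings) in document order.
  (let ((chunks '()))
    (labels ((walk (node)
               (cond ((stringp node) (push node chunks))
                     ((consp node) (mapc #'walk (cddr node))))))
      (walk lhtml))
    (format nil "~{~A~^ ~}" (nreverse chunks))))

So (harvest-urls (parse-page (fetch-source "http://www.example.org/")))
should give you the link targets as a list of strings (example.org is
just a placeholder, of course).

The cl-ppcre route is cruder but avoids a full parse; something like
this (again a sketch; the (?i) makes the scan case-insensitive):

(defun harvest-urls/regexp (source)
  ;; Scan for href="..." attributes directly in the source.
  (let ((urls '()))
    (cl-ppcre:do-register-groups (url)
        ("(?i)href\\s*=\\s*[\"']([^\"']+)[\"']" source)
      (push url urls))
    (nreverse urls)))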
HTH, Denis
On 4 Jan 2009, at 20:33, Richard Wyckoff wrote:
>
> Greetings, lisp-hug.
>
> Seeking help: I'm looking to harvest some text from the web.
>
> I need a function that returns the HTML source for a URL, another
> to harvest the embedded URLs from that source, and another to
> harvest the plaintext from it.
>
> Someone else already have utils like this?
>
> It's been a while, and I need a jumpstart. I used to live and
> breathe Lisp - I was a professional developer in NLP applications
> for many, many years.
>
> I'm running LispWorks Pro on a Mac Pro, OS X 10.4.11, if that
> helps. I usually browse through Safari over DSL.
> My main questions, I suppose, involve getting to the outside world
> from my app.
>
> -vbr,
>
> Rich
>
> PS: Is this an appropriate forum for finding Lisp developers
> willing to take on small consulting jobs?
>