Re: (url-to-plaintext url) and crawler utils on mac pro?
Hello Richard,
My approach would be to use Drakma (http://www.weitz.de/drakma/) to
get the page source, and then either use closure-html
(http://common-lisp.net/project/closure/closure-html/) to parse the
HTML into sexps and extract the URLs and plaintext from the
sexp-ified HTML, or use cl-ppcre (http://weitz.de/cl-ppcre/) if you
are happy doing the processing with regexps. Rough sketches of both
approaches are below.
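Untested and typed straight into the mail, so treat this as a sketch
rather than working code: it assumes drakma and closure-html are
already loaded (via ASDF), and the function names (fetch-source,
parse-page, harvest-urls, harvest-plaintext) are placeholders of my
own.

(defun fetch-source (url)
  ;; DRAKMA:HTTP-REQUEST returns the body as a string for
  ;; textual content types.
  (drakma:http-request url))

(defun parse-page (source)
  ;; Parse the HTML into LHTML sexps of the form
  ;; (:TAG ((:ATTR "value") ...) child ...)
  (chtml:parse source (chtml:make-lhtml-builder)))

(defun harvest-urls (lhtml)
  ;; Walk the LHTML tree, collecting the HREF attribute of
  ;; every :A element.
  (let ((urls '()))
    (labels ((walk (node)
               (when (consp node)
                 (when (eq (first node) :a)
                   (let ((href (second (assoc :href (second node)))))
                     (when href (push href urls))))
                 (mapc #'walk (cddr node)))))
      (walk lhtml))
    (nreverse urls)))

(defun harvest-plaintext (lhtml)
  ;; Collect the text nodes (strings) in document order.
  (let ((chunks '()))
    (labels ((walk (node)
               (cond ((stringp node) (push node chunks))
                     ((consp node) (mapc #'walk (cddr node))))))
      (walk lhtml))
    (format nil "~{~A~^ ~}" (nreverse chunks))))

So (harvest-urls (parse-page (fetch-source "http://www.example.org/")))
should give you the link targets as a list of strings (example.org is
just a placeholder, of course).

The cl-ppcre route is cruder but avoids a full parse; something like
this (again a sketch; the (?i) makes the scan case-insensitive):

(defun harvest-urls/regexp (source)
  ;; Scan for href="..." attributes directly in the source.
  (let ((urls '()))
    (cl-ppcre:do-register-groups (url)
        ("(?i)href\\s*=\\s*[\"']([^\"']+)[\"']" source)
      (push url urls))
    (nreverse urls)))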
HTH, Denis
On 4 Jan 2009, at 20:33, Richard Wyckoff wrote:
>
> Greetings, lisp-hug.
>
> Seeking help: I'm looking to harvest some text from the web.
>
> I need a function that returns the HTML source for a URL, another
> to harvest the embedded URLs from that source, and another to
> harvest the plaintext from it.
>
> Someone else already have utils like this?
>
> It's been a while, and I need a jumpstart. I used to live and
> breathe Lisp - I was a professional developer in NLP applications
> for many, many years.
>
> I'm running LispWorks Pro on a Mac Pro, OS X 10.4.11, if that
> helps. I usually browse through Safari over DSL.
> My main questions, I suppose, involve getting to the outside world
> from my app.
>
> -vbr,
>
> Rich
>
> PS: Is this an appropriate forum for finding Lisp developers
> willing to take on small consulting jobs?
>