Plump is able to parse, modify and serialize an HTML back.
Let's write a crawler to grab @shinmera's posts from Twitter!
POFTHEDAY> (defvar *raw-html* (dex:get "https://twitter.com/shinmera")) POFTHEDAY> (defvar *html* (plump:parse *raw-html*)) ;; We need all divs with class "tweet-text" POFTHEDAY> (defvar *posts* (remove-if-not (lambda (div) (str:containsp "tweet-text" (plump:attribute div "class"))) (plump:get-elements-by-tag-name *html* "p"))) POFTHEDAY> (loop for post in (rutils:take 5 *posts*) for full-text = (plump:render-text post) for short-text = (str:shorten 40 full-text) do (format t "- ~A~2%" short-text)) - 1478 Lighting sketch #onesies https:/... - Trust Level: Swiss A fridge with cool... - The arch.pic.twitter.com/gMamJfZ1r4 - らくがきばかりアップしていたやつ、今度は動きます。週末にプロクリエイトで描... - Shit's broken. Will be back in a few ...
This library has more utils for HTML parsing. Read the documentation to learn more.
If you are going to write crawlers in Common lisp, I recommend you to use Plump together with another @shimera's library - clss but we'll play with it tomorrow :)