Lisp Project of the Day



Tests 😀
CI 🥺

This system adds Unicode support to cl-ppcre.

What does that mean? After loading cl-ppcre-unicode, you'll be able to match against Unicode character properties.
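As a minimal sketch of what "loading" looks like in practice, the snippet below assumes Quicklisp is installed in your Lisp image; the sample string is made up for illustration.

```lisp
;; Assumption: Quicklisp is available in the running image.
(ql:quickload :cl-ppcre-unicode)

;; Collect every run of Cyrillic characters from an illustrative string.
(ppcre:all-matches-as-strings "\\p{Cyrillic}+" "Lisp по-русски")
;; => ("по" "русски")
```

The hyphen is not a Cyrillic character, so it splits the text into two matches.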

A property matcher has a special syntax in cl-ppcre's regexps: \p{PropertyName}.

Here is an example:

;; This is how we can find the position
;; of the first Cyrillic letter
;; (the target string here is only illustrative):

POFTHEDAY> (ppcre:scan "\\p{Cyrillic}"
                       "Hello, Мир!")

;; Here we extract a sequence
;; of emoji from the text:
POFTHEDAY> (ppcre:regex-replace
            ".*?([\\p{Emoticons}|\\p{Supplemental Symbols and Pictographs}]+).*"
            "Hello, Lisper! 🤗😃 How are you?"
            "\\1")
"🤗😃"

We are using two different Unicode blocks as properties because these two characters belong to different blocks.

You can use cl-unicode to discover a character's Unicode block:

POFTHEDAY> (cl-unicode:code-block #\😃)
"Emoticons"

POFTHEDAY> (cl-unicode:code-block #\🤗)
"Supplemental Symbols and Pictographs"
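Besides code blocks, cl-unicode can report other properties that are usable inside \p{...}. The following is a small sketch, assuming Quicklisp is available; the character is chosen only for illustration.

```lisp
;; Assumption: Quicklisp is available in the running image.
(ql:quickload :cl-unicode)

;; Each of these returns the property name as a string in its first value.
(cl-unicode:script #\Я)            ; first value: "Cyrillic"
(cl-unicode:general-category #\Я)  ; first value: "Lu" (uppercase letter)

;; HAS-PROPERTY checks a character against a property name directly.
(cl-unicode:has-property #\Я "Cyrillic")
```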

How cl-ppcre-unicode works is quite interesting. It turns out that cl-ppcre has a special hook that lets you define a property resolver.
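To make the hook more concrete: a resolver is a function that takes a property name and returns a one-argument predicate on characters. The sketch below calls the resolver installed by cl-ppcre-unicode by hand; it assumes Quicklisp is available, and `*cyrillic-p*` is a hypothetical name used only here.

```lisp
;; Assumption: Quicklisp is available. Loading the system installs
;; its resolver into cl-ppcre:*property-resolver*.
(ql:quickload :cl-ppcre-unicode)

;; Ask the current resolver for the predicate behind \p{Cyrillic}.
;; *cyrillic-p* is a hypothetical name for this illustration.
(defparameter *cyrillic-p*
  (funcall cl-ppcre:*property-resolver* "Cyrillic"))

(funcall *cyrillic-p* #\Я) ; true: a Cyrillic letter
(funcall *cyrillic-p* #\R) ; false: a Latin letter
```

cl-ppcre performs this same lookup whenever it meets \p{...} while building a scanner.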

For example, if you want to have a special property for vowels, you might do something like this:

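POFTHEDAY> (defun my-property-resolver (property-name)
             (if (string-equal property-name "Vowel")
                 ;; Our custom property: a predicate on characters
                 (rutils:fn vowel-p (character)
                   (member character '(#\A #\E #\I #\O #\U)
                           :test #'char-equal))
                 ;; Everything else goes to the standard Unicode resolver
                 (cl-ppcre-unicode:unicode-property-resolver property-name)))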

POFTHEDAY> (setf cl-ppcre:*property-resolver*
                 #'my-property-resolver)

;; And now we can use the "Vowel" property in any
;; regular expression!
POFTHEDAY> (ppcre:regex-replace-all
            "\\p{Vowel}"
            "Hello, Lisper! How are you?"
            "")
"Hll, Lspr! Hw r y?"

Isn't this cool!? 🤪

Brought to you by 40Ants under Creative Commons License