Regular Expressions

From Coder Merlin

Introduction

A regular expression (often abbreviated as "regex" or "regexp") is a specially formatted pattern of characters that enables you to search for matches in another string (the target). In Swift (and many other languages) a regex is specified by enclosing the pattern between forward slashes, for example /sh/. This construct is termed a "regex literal", and it will search for any occurrences of "sh" in the target string. Consider:

CoderMerlin™ Code Explorer: W0000 (1) 🟢
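A minimal sketch of such a search, assuming the target is the sentence quoted in the discussion that follows. The literal is written in its extended-delimiter form, #/sh/#, because some toolchain configurations require a compiler flag for the bare /sh/ form; the two spell the same pattern.

```swift
// The string to be searched.
let target = "She took a jar down off one of the shelves as she passed."

// The pattern to search for. #/sh/# is the extended-delimiter
// spelling of the regex literal /sh/.
let regex = #/sh/#

// matches(of:) returns every non-overlapping match, left to right.
let matches = target.matches(of: regex)
print(matches.count) // 2

// Show where each match starts, to confirm which words were hit.
for match in matches {
    let offset = target.distance(from: target.startIndex, to: match.range.lowerBound)
    print("\(match.output) at offset \(offset)")
}
```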

The above code defines a constant target, which is the string to be searched, and regex, which is the pattern specifying what we'll be searching for. In this case, two "sh" strings are found. Because a regex is case-sensitive by default, the matches are found in the words "shelves" and "she", but not in the capitalized "She" that opens the sentence: "She took a jar down off one of the shelves as she passed." This is a trivial example; let's explore the true power of regular expressions.

Matching Options

There are several instance methods which may be invoked on a regex to alter its behavior. The most common of these is ignoresCase. Let's try the same regex as before, but this time use the ignoresCase method:

CoderMerlin™ Code Explorer: W0000 (2) 🟢
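A minimal sketch of the case-insensitive search, assuming the same target sentence as before. As in any standalone snippet here, the literal uses the extended-delimiter form #/sh/#, which is equivalent to /sh/.

```swift
let target = "She took a jar down off one of the shelves as she passed."

// ignoresCase() returns a new regex that matches without regard to case;
// the original regex is left unchanged.
let caseSensitive = #/sh/#
let regex = caseSensitive.ignoresCase()

let matches = target.matches(of: regex)
print(matches.count) // 3

// Print the matched text: the first match keeps its capital "S".
for match in matches {
    print(match.output)
}
```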

We can see that in this case there are three matches, as we now include the first word, "She", with a capital "S".

Special Characters

So far we have explored basic patterns built from ordinary characters. We'll now learn two additional constructs that enable us to do much more. Metacharacters enable us to refer to a class of characters, such as digits, letters, or punctuation, rather than just a single character. Quantifiers enable us to specify how many occurrences of a character (or group of characters) we expect.

Metacharacters

Let's search for all sequences that begin with a lowercase 'i' followed by any word character. A word character is represented by the special sequence \w, so the entire regex becomes /i\w/:

CoderMerlin™ Code Explorer: W0000 (3) 🟢
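A minimal sketch of this search. The target sentence below is a hypothetical one chosen for illustration, not the text used by the original example; #/i\w/# is the extended-delimiter spelling of /i\w/.

```swift
// Hypothetical target sentence, chosen for illustration.
let target = "it is in this bin so we go"

// A lowercase "i" followed by exactly one word character.
let regex = #/i\w/#

let matches = target.matches(of: regex)
print(matches.count) // 5

// Prints: it, is, in, is, in
// The last two come from inside "this" and "bin".
print(matches.map { String($0.output) }.joined(separator: ", "))
```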

How do you explain this result? Is it what you expected? Let's now select just two-character words beginning with a lowercase 'i' rather than all sequences of two characters beginning with a lowercase 'i'. To do so, we'll use another metacharacter, \b, which represents a word boundary. Consider the regex /\bi\w\b/:

CoderMerlin™ Code Explorer: W0000 (4) 🟢
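A minimal sketch of the word-boundary version, again assuming a hypothetical target sentence rather than the one in the original example. #/\bi\w\b/# is the extended-delimiter spelling of /\bi\w\b/.

```swift
// Hypothetical target sentence, chosen for illustration.
let target = "it is in this bin so we go"

// \b on both sides requires the two characters to form a whole word.
let regex = #/\bi\w\b/#

let matches = target.matches(of: regex)
print(matches.count) // 3

// Prints: it, is, in
// The "is" inside "this" and the "in" inside "bin" no longer match.
print(matches.map { String($0.output) }.joined(separator: ", "))
```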

With judicious use of \b we can select every two-character word:

CoderMerlin™ Code Explorer: W0000 (5) 🟢
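One way to write this is to replace the literal 'i' with a second \w, giving /\b\w\w\b/. The sketch below assumes that pattern and a hypothetical target sentence; neither is taken from the original example.

```swift
// Hypothetical target sentence, chosen for illustration.
let target = "it is in this bin so we go"

// Two word characters with a word boundary on each side.
let regex = #/\b\w\w\b/#

let matches = target.matches(of: regex)
print(matches.count) // 6

// Prints: it, is, in, so, we, go
print(matches.map { String($0.output) }.joined(separator: ", "))
```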


Challenge: Change the above regex so that it matches all four-character words ending in an exclamation point.

Solution: /\b\w\w\w\w\b!/
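A quick way to check the solution is to run it against a sentence containing words of several lengths. The target below is a hypothetical test string, not part of the original challenge.

```swift
// Hypothetical test string with two-, four-, and five-letter exclamations.
let target = "Stop! Wait! Go! Hurry! Look at that."

// Four word characters bounded on both sides, then a literal "!".
let regex = #/\b\w\w\w\w\b!/#

let matches = target.matches(of: regex)

// Prints: Stop!, Wait!
// "Go!" is too short, "Hurry!" is too long, and "Look" has no "!".
print(matches.map { String($0.output) }.joined(separator: ", "))
```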


Captures

Typed Captures
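A minimal sketch of a typed capture, assuming a hypothetical target and pattern. Each parenthesized group in a regex literal becomes one element of the match's output tuple, and the compiler knows the tuple's type. The + used here is a quantifier meaning "one or more".

```swift
// Hypothetical target, chosen for illustration.
let target = "jar-42"

// Two capture groups: one or more word characters, a hyphen,
// then one or more digits.
let regex = #/(\w+)-(\d+)/#

let match = target.firstMatch(of: regex)
if let match {
    // Element 0 is the whole match; elements 1 and 2 are the captures.
    let output: (Substring, Substring, Substring) = match.output
    print(output.0) // jar-42
    print(output.1) // jar
    print(output.2) // 42
}
```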

Named Captures
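A minimal sketch of named captures, assuming a hypothetical target and the hypothetical group names item and number. A group written as (?<name>...) can be read from the match output by name instead of by position.

```swift
// Hypothetical target, chosen for illustration.
let target = "shelf 3, jar 12"

// Two named groups separated by a single space.
let regex = #/(?<item>\w+) (?<number>\d+)/#

let matches = target.matches(of: regex)
for match in matches {
    // The labels in the output tuple come from the group names.
    print("\(match.output.item): \(match.output.number)")
}
```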

Extended Delimiters
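A minimal sketch of an extended delimiter, assuming a hypothetical path as the target. Wrapping the literal in #/ and /# lets the pattern contain forward slashes without escaping them.

```swift
// Hypothetical target, chosen for illustration.
let target = "/usr/local/bin"

// The forward slash inside the pattern is matched literally.
let regex = #/local/bin/#

let match = target.firstMatch(of: regex)
if let match {
    print(match.output) // local/bin
}
```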

Replacing Text

target.replacing(regex, with: replacement)
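A minimal sketch of the call above in context, assuming the target sentence from the introduction and a hypothetical replacement string. replacing(_:with:) returns a new string and leaves target unchanged.

```swift
let target = "She took a jar down off one of the shelves as she passed."

// Match "sh" regardless of case.
let regex = #/sh/#.ignoresCase()

// Hypothetical replacement text, chosen so the changes are easy to spot.
let replacement = "SH"

let result = target.replacing(regex, with: replacement)
print(result) // SHe took a jar down off one of the SHelves as SHe passed.
```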