Regex Cheatsheet
Quick Reference
Created: 2021-04-30
Anchors
| Anchors | Description |
|---|---|
| \A | match start of string |
| \Z | match end of string |
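| ^ | match start of line (with re.MULTILINE; otherwise start of string) |
| $ | match end of line (with re.MULTILINE; otherwise end of string or just before a final newline) |
| \b | match a word boundary (start or end of a word) |
| \B | match a non-boundary position (inverse of \b) |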
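A minimal sketch of the anchors above, assuming Python's built-in re module (other engines differ; for example, Python's \Z matches only at the very end of the string). The test strings are made up for illustration:

```python
import re

text = "first line\nsecond line"

# \A and \Z anchor to the start/end of the whole string, regardless of flags
starts = bool(re.search(r"\Afirst", text))   # True
ends = bool(re.search(r"line\Z", text))      # True

# $ also matches just before a trailing newline; Python's \Z does not
dollar = bool(re.search(r"line$", "line\n"))   # True
big_z = bool(re.search(r"line\Z", "line\n"))   # False

# By default ^ matches only at the start of the string...
default_starts = re.findall(r"^\w+", text)              # ['first']
# ...but with re.MULTILINE, ^ and $ match at every line boundary
line_starts = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'second']
line_ends = re.findall(r"\w+$", text, re.MULTILINE)     # ['line', 'line']

# \b sits between a word char and a non-word char (or a string edge); \B is any other position
words = "cat concat cats"
whole_word = [m.start() for m in re.finditer(r"\bcat\b", words)]   # [0]
inside_word = [m.start() for m in re.finditer(r"\Bcat\b", words)]  # [7], the end of "concat"

print(default_starts, line_starts, line_ends, whole_word, inside_word)
```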
Groups
Note: the ellipsis (...) is a placeholder for any sub-pattern.
| Group | Description | Consumes Characters? |
|---|---|---|
| (?:...) | non-capturing group | ✔ |
| `(?P<name>...)` | named capturing group | ✔ |
| (?=...) | positive lookahead | ✘ |
| (?!...) | negative lookahead | ✘ |
| (?<=...) | positive lookbehind | ✘ |
| (?<!...) | negative lookbehind | ✘ |
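The groups in the table can be exercised with Python's standard re module; the sample strings and patterns below are illustrative only:

```python
import re

# Named groups capture and label; (?:...) groups the month without capturing it
m = re.search(r"(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})", "Released 2021-04-30")
print(m.groupdict())  # {'year': '2021', 'day': '30'}
print(m.groups())     # ('2021', '30')

# Negative lookahead: digits NOT followed by another digit or "px"
no_px = re.findall(r"\d+(?!\d|px)", "10px 20em 30px")       # ['20']
# Positive lookahead: a word followed by ".txt" (".txt" is not part of the match)
stems = re.findall(r"\w+(?=\.txt)", "notes.txt data.csv")   # ['notes']

# Positive lookbehind: digits preceded by "$" ("$" is not part of the match)
dollars = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")     # ['5', '12']
# Negative lookbehind: digits not preceded by "$" or another digit
plain = re.findall(r"(?<![$\d])\d+", "$5 and 7 and $12")    # ['7']

# Lookarounds consume nothing, so they can mark positions, e.g. thousands separators
grouped = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", "1234567")  # '1,234,567'

print(no_px, stems, dollars, plain, grouped)
```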
Character Classes
| Class | Description |
|---|---|
| [ABC] | Match any character in the set |
| [^ABC] | Match any character not in the set |
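| [A-Za-z] | Match any character in a range (here, any ASCII letter). Avoid [A-z]: it also matches the six punctuation characters between Z and a |
| . | Match any character except a newline. In Python this is [^\n]; with re.DOTALL it also matches \n |
| \w | Match a word character. With re.ASCII, shortcut for [A-Za-z0-9_]; str patterns also match Unicode letters and digits by default |
| \W | Negated \w. With re.ASCII, shortcut for [^A-Za-z0-9_] |
| \d | Match a digit. With re.ASCII, shortcut for [0-9]; by default any Unicode decimal digit |
| \D | Negated \d. With re.ASCII, shortcut for [^0-9] |
| \s | Match whitespace. With re.ASCII, shortcut for [ \t\n\r\f\v] |
| [\uXXXX-\uYYYY] | Match a character in a Unicode code-point range (see below) |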
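A small sketch, assuming Python 3's built-in re module with str (not bytes) patterns, showing where the class shortcuts differ from their ASCII spellings. The sample strings are illustrative:

```python
import re

sample = "a_Z^1"
# [A-z] spans code points 65..122, so it also matches [ \ ] ^ _ ` between Z and a
loose = re.findall(r"[A-z]", sample)      # ['a', '_', 'Z', '^']
strict = re.findall(r"[A-Za-z]", sample)  # ['a', 'Z']

# In Python, . excludes only \n by default (a \r is still matched)
dots = re.findall(r".", "a\rb\nc")                 # ['a', '\r', 'b', 'c']
dots_all = re.findall(r".", "a\rb\nc", re.DOTALL)  # ['a', '\r', 'b', '\n', 'c']

# For str patterns, \w and \d are Unicode-aware unless re.ASCII is passed
text = "na\u00efve caf\u00e9 4\u0662"  # U+0662 is ARABIC-INDIC DIGIT TWO
words_unicode = re.findall(r"\w+", text)          # ['naïve', 'café', '4٢']
words_ascii = re.findall(r"\w+", text, re.ASCII)  # ['na', 've', 'caf', '4']
digits_unicode = re.findall(r"\d", text)          # ['4', '٢']
digits_ascii = re.findall(r"\d", text, re.ASCII)  # ['4']

# A code-point range: Greek capital letters U+0391..U+03A9
greek = re.findall(r"[\u0391-\u03a9]+", "\u0391\u0392\u0393 and \u03a9")  # ['ΑΒΓ', 'Ω']

print(loose, strict, words_ascii, greek)
```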
Import regex as re
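regex is a third-party library (installed with pip install regex) that extends the standard re module with more advanced features. It is designed to be largely backwards-compatible with re, so it's common to see it imported under the same name: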
```python
import regex as re
```
Using regex, we can match by Unicode property, such as general categories (e.g. \p{Lu}, uppercase letters) and blocks (e.g. \p{InBasicLatin}):
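```python
import regex as re  # third-party: pip install regex

# Test string: every printable code point from U+0020 up to U+10FFFF
chars = "".join(chr(i) for i in range(32, 0x110000) if chr(i).isprintable())

# \p{InBasicLatin} matches any character in the Basic Latin block (U+0000..U+007F)
result = re.findall(r"\p{InBasicLatin}", chars)
print(result[::8])  # every 8th match
# [' ', '(', '0', '8', '@', 'H', 'P', 'X', '`', 'h', 'p', 'x']
```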
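If the third-party regex package isn't available, a block like Basic Latin can be approximated with the built-in re module by writing out its code-point range (U+0000 to U+007F) as a character class. This sketch assumes the same test string as above and only works for contiguous blocks; categories such as \p{Lu} have no simple range equivalent:

```python
import re  # standard library only

# Same test string: every printable code point from U+0020 up to U+10FFFF
chars = "".join(chr(i) for i in range(32, 0x110000) if chr(i).isprintable())

# Basic Latin is U+0000..U+007F, so a plain code-point range does the same job
result = re.findall(r"[\u0000-\u007f]", chars)
print(len(result))   # 95 printable Basic Latin characters (U+0020..U+007E)
print(result[::8])
# [' ', '(', '0', '8', '@', 'H', 'P', 'X', '`', 'h', 'p', 'x']
```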