Regex Cheatsheet
Quick Reference
Created: 2021-04-30
Anchors
| Anchor | Description |
|---|---|
| \A | match start of string | 
| \Z | match end of string | 
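| ^ | match start of line (with the MULTILINE flag; otherwise start of string) |
| $ | match end of line (with the MULTILINE flag; otherwise end of string) |
| \b | match a word boundary (start or end of a word) |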
| \B | inverse of \b | 
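A minimal sketch of how the anchors above behave in Python's standard re module. The sample strings are made up for illustration.

```python
import re

text = "first line\nsecond line"

# ^ and $ match at every line only when re.MULTILINE is set
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \A and \Z ignore re.MULTILINE and only match at the ends of the whole string
string_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)
string_end = re.findall(r"\w+\Z", text, flags=re.MULTILINE)

# \b matches the empty position between a word character and a non-word character,
# so "line" inside "inline" is skipped
whole_word = re.findall(r"\bline\b", "line, inline, line")

print(line_starts)   # ['first', 'second']
print(line_ends)     # ['line', 'line']
print(string_start)  # ['first']
print(string_end)    # ['line']
print(whole_word)    # ['line', 'line']
```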
Groups
Note: The ellipsis (...) is a placeholder for the pattern inside the group
| Group | Description | Consumes Characters? | 
|---|---|---|
| (?:...) | non-capturing group | ✔ | 
| (?P<name>...) | named capturing group | ✔ |
| (?=...) | positive lookahead | ✘ | 
| (?!...) | negative lookahead | ✘ | 
| (?<=...) | positive lookbehind | ✘ | 
| (?<!...) | negative lookbehind | ✘ | 
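A short sketch of the group types above, using the standard re module. The input strings and the group names year and month are illustrative assumptions. Note that lookarounds only check their pattern and leave it out of the match.

```python
import re

# Named capturing groups: fetch sub-matches by name
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Created: 2021-04-30")
year, month = m.group("year"), m.group("month")

# Non-capturing group: groups "ab" for repetition, findall returns whole matches
repeated = re.findall(r"(?:ab)+", "ababab ab")

# Positive lookahead: digits followed by "px"; "px" is checked but not consumed
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: digits NOT followed by another digit or by "px"
# (the \d alternative stops the engine from backtracking to a partial number)
non_px_values = re.findall(r"\d+(?!\d|px)", "10px 20em 30px")

# Positive lookbehind: digits preceded by "$"
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")

# Negative lookbehind: digits NOT preceded by "$" or by another digit
quantities = re.findall(r"(?<![\$\d])\d+", "cost $15, qty 3")

print(year, month)     # 2021 04
print(repeated)        # ['ababab', 'ab']
print(px_values)       # ['10', '30']
print(non_px_values)   # ['20']
print(prices)          # ['15']
print(quantities)      # ['3']
```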
Character Classes
| Class | Description | 
|---|---|
| [ABC] | Match any character in the set | 
| [^ABC] | Match any character not in the set | 
| [A-Z] | Match any character in the range (avoid [A-z], which also matches [ \ ] ^ _ and `) |
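| . | Match any character except a newline, unless the DOTALL flag is set. Shortcut for [^\n] |
| \w | Match word characters. Shortcut for [A-Za-z0-9_] in ASCII mode |
| \W | Negated \w. Shortcut for [^A-Za-z0-9_] in ASCII mode |
| \d | Match digits. Shortcut for [0-9] in ASCII mode |
| \D | Negated \d. Shortcut for [^0-9] in ASCII mode |
| \s | Match whitespace (space, tab, newline, ...) |
| [\uXXXX-\uYYYY] | Match a character in a code point range (see below) |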
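A few character-class behaviours are easy to get wrong, so here is a minimal sketch with the standard re module. The sample strings are assumptions chosen for illustration; non-ASCII characters are written as escapes to keep the source unambiguous.

```python
import re

# [A-z] spans ASCII 65..122, so it also matches [ \ ] ^ _ and `
loose = re.findall(r"[A-z]", "a_Z^[9")
strict = re.findall(r"[A-Za-z]", "a_Z^[9")

# For str patterns, \w is Unicode-aware by default; re.ASCII restricts it
sample = "na\u00efve caf\u00e9"  # "naive cafe" with accented letters
unicode_words = re.findall(r"\w+", sample)
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

# . stops at "\n" unless re.DOTALL is given
dot_default = re.findall(r".+", "a\nb")
dot_all = re.findall(r".+", "a\nb", flags=re.DOTALL)

# Explicit code point range: the Greek and Coptic block, U+0370..U+03FF
greek = re.findall(r"[\u0370-\u03FF]", "alpha = \u03b1, beta = \u03b2")

print(loose)        # ['a', '_', 'Z', '^', '[']
print(strict)       # ['a', 'Z']
print(ascii_words)  # ['na', 've', 'caf']
print(dot_default)  # ['a', 'b']
print(dot_all)      # ['a\nb']
```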
Import regex as re
regex is a third-party library that extends the standard re module with more advanced functionality. It's mostly a drop-in replacement, so it's common to see:
import regex as re
Using regex, we can take advantage of Unicode categories and Unicode blocks:
import regex as re
# test string
chars = "".join(chr(i) for i in range(32, 0x10ffff) if chr(i).isprintable())
# raw string, so \p reaches the regex engine instead of being treated as a string escape
result = re.findall(r"\p{InBasicLatin}", chars)
print(result[::8])
[' ', '(', '0', '8', '@', 'H', 'P', 'X', '`', 'h', 'p', 'x']