Regex Cheatsheet
Quick Reference
Created: 2021-04-30
Anchors
Anchors | Description |
---|---|
\A | match start of string |
\Z | match end of string |
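^ | match start of string, or start of each line with re.MULTILINE |
$ | match end of string (or just before a trailing newline), or end of each line with re.MULTILINE |
\b | match a word boundary (start or end of a word) |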
\B | inverse of \b |
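
The difference between the string anchors and the line anchors is easiest to see on a multi-line string. The following is a minimal sketch using the standard re module; the sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\n"

# ^ matches at the start of every line only when re.MULTILINE is set.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A matches only at the very start of the string, even with re.MULTILINE.
string_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# \Z matches only at the very end of the string.
ends_with_newline = re.search(r"\n\Z", text) is not None

# \b sits between a word character and a non-word character,
# so "lines" and "inline" are not matched as the whole word "line".
whole_words = re.findall(r"\bline\b", "line lines inline line")

print(line_starts)        # ['first', 'second']
print(string_start)       # ['first']
print(ends_with_newline)  # True
print(whole_words)        # ['line', 'line']
```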
Groups
Note: Ellipsis (...) is for visualization purposes
Group | Description | Consumes Characters? |
---|---|---|
(?:...) | non-capturing group | ✔ |
(?P<name>...) | named capturing group | ✔ |
(?=...) | positive lookahead | ✘ |
(?!...) | negative lookahead | ✘ |
(?<=...) | positive lookbehind | ✘ |
(?<!...) | negative lookbehind | ✘ |
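
As a rough illustration of the "Consumes Characters?" column, the sketch below runs each group type against one sample string with the standard re module. The string, the group name amount, and the variable names are hypothetical.

```python
import re

log = "price: 42 USD, discount: 7 USD, stock: 13 units"

# Named capturing group: the digits are captured under the name "amount".
# The positive lookahead (?= USD) checks what follows without consuming it.
pattern = re.compile(r"(?P<amount>\d+)(?= USD)")
amounts = [m.group("amount") for m in pattern.finditer(log)]

# Negative lookahead: whole numbers that are NOT followed by " USD".
not_usd = re.findall(r"\b\d+\b(?! USD)", log)

# Positive lookbehind: digits preceded by "stock: ".
stock = re.findall(r"(?<=stock: )\d+", log)

# Non-capturing group: groups the alternation without creating a capture,
# so findall returns the whole match.
keys = re.findall(r"(?:price|stock): \d+", log)

print(amounts)  # ['42', '7']
print(not_usd)  # ['13']
print(stock)    # ['13']
print(keys)     # ['price: 42', 'stock: 13']
```

The lookarounds only assert context, which is why " USD" and "stock: " never appear in the results.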
Character Classes
Class | Description |
---|---|
[ABC] | Match any character in the set |
[^ABC] | Match any character not in the set |
[A-Z] | Match any character in the range |
. | Match any character except a newline. Shortcut for [^\n] (matches everything with re.DOTALL) |
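\w | Match word chars. Equivalent to [A-Za-z0-9_] with re.ASCII; Unicode-aware by default |
\W | Negated \w. Equivalent to [^A-Za-z0-9_] with re.ASCII |
\d | Match digits. Equivalent to [0-9] with re.ASCII; Unicode-aware by default |
\D | Negated \d. Equivalent to [^0-9] with re.ASCII |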
\s | Whitespace |
[\uxxxx-\uxxxy] | Match a character in a code point range (see below) |
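
Two details of character classes are easy to trip over in Python: a range such as [A-z] covers more than letters, and \w is Unicode-aware on str patterns unless re.ASCII is set. The sketch below, with a made-up sample string, shows both using the standard re module.

```python
import re

# "\u00e9" is the letter e with an acute accent.
sample = "Az_[]^ \u00e99"

# [A-z] spans code points 65-122, so it also matches [ \ ] ^ _ and `.
wide = re.findall(r"[A-z]", sample)

# [A-Za-z] matches ASCII letters only.
letters = re.findall(r"[A-Za-z]", sample)

# On str patterns \w is Unicode-aware by default, so it matches the accented e.
unicode_words = re.findall(r"\w", sample)

# re.ASCII restricts \w to [A-Za-z0-9_].
ascii_words = re.findall(r"\w", sample, flags=re.ASCII)

print(wide)           # ['A', 'z', '_', '[', ']', '^']
print(letters)        # ['A', 'z']
print(unicode_words)  # ['A', 'z', '_', 'é', '9']
print(ascii_words)    # ['A', 'z', '_', '9']
```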
Import regex as re
regex is a third-party library that provides more advanced functionality than the standard re module. It's mostly a drop-in replacement, so it's common to see:
import regex as re
Using regex, we can take advantage of Unicode categories and Unicode blocks:
import regex as re
# test string: every printable character
chars = "".join(chr(i) for i in range(32, 0x10ffff) if chr(i).isprintable())
# raw string, so that \p reaches the regex engine unchanged
result = re.findall(r"\p{InBasicLatin}", chars)
print(result[::8])
[' ', '(', '0', '8', '@', 'H', 'P', 'X', '`', 'h', 'p', 'x']
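
For comparison, the same block can be selected with the standard re module by spelling out its code point range, as in the \u range row of the character class table. This is a minimal sketch that assumes only that Basic Latin covers U+0000 to U+007F; the name basic_latin is illustrative, and the test string is cut down to U+0020 through U+00FF to keep it small.

```python
import re

# Printable characters from U+0020 up to U+00FF.
chars = "".join(chr(i) for i in range(32, 0x100) if chr(i).isprintable())

# Basic Latin is the block U+0000..U+007F, written here as an explicit range.
basic_latin = re.findall(r"[\u0000-\u007F]", chars)

print(basic_latin[::8])
# [' ', '(', '0', '8', '@', 'H', 'P', 'X', '`', 'h', 'p', 'x']
```

An explicit range works without the third-party dependency, but the block boundaries have to be looked up and typed by hand.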