Filter by offset
Filter by relative offset
What this means is best demonstrated by an example problem:
Suppose you have a DataFrame of user statuses. You'd like to find instances where a user changes from 'active' to 'inactive' (as opposed to 'inactive' to 'active')
Created: 2021-04-30
import pandas as pd
import numpy as np
# Make a sample dataset
user_states = ["active", "inactive", "meeting", "do not disturb"]
df = pd.DataFrame(np.random.choice(user_states, size=1000), columns=['status'])
df.head(5)
status | |
---|---|
0 | active |
1 | active |
2 | meeting |
3 | active |
4 | inactive |
Shift Operator
We know we can find where a row is active with:
df.loc[df.status == 'active']
To express the idea of 'then changes to inactive':
df.loc[df.status.shift(-1) == 'inactive']
Does the -1
surprise you? In reality, we aren't 'jumping' to another row with shift. We're shifting some row into the index of the row currently being evaluated.
In this case it means we take the following row and shift it back by -1 to place it into our scope.
matched_idx = df.loc[(df.status == 'active') & (df.status.shift(-1) == 'inactive')].index
df.loc[matched_idx]
status | |
---|---|
3 | active |
9 | active |
40 | active |
42 | active |
48 | active |
... | ... |
937 | active |
939 | active |
943 | active |
993 | active |
997 | active |
69 rows × 1 columns
Shifted Index
While this correctly selects the rows that met our condition we'd also like to see the 'inactive' row as well.
Here's how
matched_idx_pairs = matched_idx.union(matched_idx + 1)
df.loc[matched_idx_pairs]
status | |
---|---|
3 | active |
4 | inactive |
9 | active |
10 | inactive |
40 | active |
... | ... |
944 | inactive |
993 | active |
994 | inactive |
997 | active |
998 | inactive |
138 rows × 1 columns