Filter by offset

Filter by relative offset

What this means is best demonstrated by an example problem:

Suppose you have a DataFrame of user statuses. You'd like to find instances where a user changes from 'active' to 'inactive' (as opposed to 'inactive' to 'active')

Created: 2021-04-30

import pandas as pd
import numpy as np

# Make a sample dataset

user_states = ["active", "inactive", "meeting", "do not disturb"]

df = pd.DataFrame(np.random.choice(user_states, size=1000), columns=['status'])
df.head(5)

	status
0	active
1	active
2	meeting
3	active
4	inactive

Shift Operator

Shift Docs

We know we can find where a row is active with:

df.loc[df.status == 'active']

To express the idea of 'then changes to inactive':

df.loc[df.status.shift(-1) == 'inactive']

Does the -1 surprise you? In reality, we aren't 'jumping' to another row with shift. We're shifting some row into the index of the row currently being evaluated.

In this case it means we take the following row and shift it back by -1 to place it into our scope.

matched_idx = df.loc[(df.status == 'active') & (df.status.shift(-1) == 'inactive')].index

df.loc[matched_idx]

	status
3	active
9	active
40	active
42	active
48	active
...	...
937	active
939	active
943	active
993	active
997	active

69 rows × 1 columns

Shifted Index

While this correctly selects the rows that met our condition we'd also like to see the 'inactive' row as well.

Here's how

matched_idx_pairs = matched_idx.union(matched_idx + 1)

df.loc[matched_idx_pairs]

	status
3	active
4	inactive
9	active
10	inactive
40	active
...	...
944	inactive
993	active
994	inactive
997	active
998	inactive

138 rows × 1 columns

My Notes

Filter by offset

Shift Operator

Shifted Index