I have a dataset (df) that contains the historical daily (date) demand (demand) of different supermarkets (id_store). Each store offers different products (id_product), but the assortment varies daily, so the same id_product is not offered every day.
My goal is to compute, for each row, the mean demand of the same id_product in the same id_store on the last four identical weekdays (t-7, t-14, t-21, t-28), provided the id_product was offered in that id_store on at least two of those four days. If it was offered on none or only one of them, NaN should be returned.
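One way to express this rule without a per-row query is a self-join on shifted dates. This is a minimal sketch, not a tested solution for the real data: it assumes date is a datetime64 column and that each (id_store, id_product, date) combination appears at most once, and the function name mean_weekday_4w_merge is hypothetical. Every row is copied four times with its date moved forward by 7, 14, 21 and 28 days, so grouping the copies by store, product and date collects exactly the demands from t-7, t-14, t-21 and t-28 for each target date.

```python
import pandas as pd

# The four prior identical weekdays from the question: t-7, t-14, t-21, t-28.
LAGS_DAYS = (7, 14, 21, 28)


def mean_weekday_4w_merge(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: self-join on shifted dates instead of iterrows.

    Assumes 'date' is datetime64 and (id_store, id_product, date) is unique.
    """
    keys = ["id_store", "id_product", "date"]
    # Copy every observation forward by each lag, so a row seen on day d is
    # listed under d+7, d+14, d+21 and d+28 (the days for which it is "past").
    past = pd.concat(
        [
            df[keys + ["demand"]].assign(date=df["date"] + pd.Timedelta(days=k))
            for k in LAGS_DAYS
        ],
        ignore_index=True,
    )
    # Mean and number of non-missing past demands per (store, product, date).
    stats = (
        past.groupby(keys)["demand"]
        .agg(mean_weekday_4w="mean", n_weekdays="count")
        .reset_index()
    )
    # Left join keeps every original row in its original order; rows without
    # any matching past weekday get NaN in both new columns.
    out = df.merge(stats, on=keys, how="left")
    # Require at least two of the four weekdays, otherwise NaN.
    out["mean_weekday_4w"] = out["mean_weekday_4w"].where(out["n_weekdays"] >= 2)
    return out.drop(columns="n_weekdays")
```

The result has a fresh RangeIndex. Note that count ignores missing demand values, whereas a row-count check (shape[0]) counts rows; the two only differ if demand itself contains NaN.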
    import numpy as np
    import pandas as pd

    def mean_weekday_4w(df):
        # Rows of the same store and product on t-7, t-14, t-21 or t-28
        query = ("id_store == '%s' & id_product == '%s' & "
                 "(date == '%s' | date == '%s' | date == '%s' | date == '%s')")
        mean_weekday_list = []
        for i, row in df.iterrows():
            df_query = df.query(query % (row["id_store"], row["id_product"],
                                         row["date"] - pd.Timedelta(days=7),
                                         row["date"] - pd.Timedelta(days=14),
                                         row["date"] - pd.Timedelta(days=21),
                                         row["date"] - pd.Timedelta(days=28)))
            # Only average if the product was offered on at least two of those days
            if df_query.shape[0] >= 2:
                mean_weekday_list.append(df_query["demand"].mean())
            else:
                mean_weekday_list.append(np.nan)
        df.loc[:, "mean_weekday_4w"] = mean_weekday_list
        return df
I know that using iterrows is very inefficient, but all my attempts using groupby have failed.