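Problem description
I'm trying to replicate the IBM SPSS function @SINCE using Python and Pandas, but unfortunately I'm stuck on part of the process.
If anyone knows of a direct way to replicate the IBM SPSS CLEM @SINCE function in Python, I'd appreciate it.
IBM @SINCE function description
"This function returns the offset of the last record where the condition was true, that is, the number of records before this one in which the condition was true. If the condition has never been true, @SINCE returns @INDEX + 1." (IBM, 2020)
Can you help me solve this with Python / Pandas?
Here is the problem.
My data looks like this: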
+------+----------+
| Type | Flag |
+------+----------+
| d | |
+------+----------+
| A | myStatus |
+------+----------+
| c | |
+------+----------+
| B | myStatus |
+------+----------+
| c | |
+------+----------+
| c | myStatus |
+------+----------+
| c | |
+------+----------+
| d | |
+------+----------+
| d | |
+------+----------+
| A | myStatus |
+------+----------+
In IBM SPSS, I apply the following formula to this data:
if Type = 'A' or Type = 'B' then @SINCE(Flag = 'myStatus') else -1 endif
This is the output:
+------+----------+----------------+
| Type | Flag | Expected Count |
+------+----------+----------------+
| d | | -1 |
+------+----------+----------------+
| A | myStatus | 0 |
+------+----------+----------------+
| c | | -1 |
+------+----------+----------------+
| B | myStatus | 2 |
+------+----------+----------------+
| c | | -1 |
+------+----------+----------------+
| c | myStatus | -1 |
+------+----------+----------------+
| c | | -1 |
+------+----------+----------------+
| d | | -1 |
+------+----------+----------------+
| d | | -1 |
+------+----------+----------------+
| A | myStatus | 4 |
+------+----------+----------------+
Thanks.
Solution
So I found a way to solve this problem. Here is the code:
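import numpy as np
import pandas as pd
# Sample data taken from the table in the question (NaN marks an empty Flag)
df = pd.DataFrame({"Type": ["d", "A", "c", "B", "c", "c", "c", "d", "d", "A"], "Flag": [np.nan, "myStatus", np.nan, "myStatus", np.nan, "myStatus", np.nan, np.nan, np.nan, "myStatus"]})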
The function that solves the problem:
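def spssSince(df):
    # Keep only the rows where the condition holds; copy so the assignments below do not touch df
    df_temp = df[df.Flag == "myStatus"].copy()
    # Index of the previous matching row (NaN for the first match)
    df_temp["last_ind"] = df_temp.index.to_series().shift(1)
    # The first match has no predecessor; filling with 1 makes it yield 0 on the sample data
    df_temp["last_ind"] = df_temp["last_ind"].fillna(1)
    df_temp["Expected Count"] = df_temp.index - df_temp.last_ind
    # Matching rows whose Type is not A or B get -1, as in the SPSS expression
    df_temp.loc[~df_temp.Type.isin(["A", "B"]), "Expected Count"] = -1
    DFreturn = pd.merge(left=df, right=df_temp[["Expected Count"]], how="left", left_index=True, right_index=True)
    # Rows without the flag also get -1
    DFreturn["Expected Count"] = DFreturn["Expected Count"].fillna(-1)
    return DFreturn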
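Basically, the function keeps the rows where the condition holds, uses shift() to get the index of the previous such row, and subtracts that from the current row's index to get the SINCE value. Rows that do not satisfy the Type condition, or that have no flag, get -1.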
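For what it's worth, the same shift() idea can probably be written without the merge. The sketch below is only an illustration, not a general @SINCE port: the helper name since_sketch and its parameters are made up for this example, it works on row positions rather than index labels, and it copies the fillna(1) choice above for the first flagged row, which is an assumption that happens to fit the sample data.

```python
import numpy as np
import pandas as pd


def since_sketch(df, flag="myStatus", types=("A", "B")):
    """Hypothetical helper: rows since the previous flagged row, -1 elsewhere."""
    # Row positions 0..n-1, so the result does not depend on the index labels.
    pos = pd.Series(np.arange(len(df)), index=df.index)
    hit = df["Flag"].eq(flag)  # NaN compares as False
    # Position of the previous flagged row. As with fillna(1) in the answer,
    # the first flagged row is measured from position 1 (an assumption).
    prev = pos[hit].shift(1).fillna(1)
    gap = (pos[hit] - prev).reindex(df.index)
    # Keep the gap only for flagged rows whose Type is in `types`.
    keep = hit & df["Type"].isin(types)
    return gap.where(keep, -1).astype(int)


# Sample data from the question
df = pd.DataFrame({
    "Type": ["d", "A", "c", "B", "c", "c", "c", "d", "d", "A"],
    "Flag": [np.nan, "myStatus", np.nan, "myStatus", np.nan,
             "myStatus", np.nan, np.nan, np.nan, "myStatus"],
})
df["Expected Count"] = since_sketch(df)
print(df["Expected Count"].tolist())
# [-1, 0, -1, 2, -1, -1, -1, -1, -1, 4]
```

On the sample data this gives the same column as the expected output in the question. Returning an integer Series instead of a merged frame is just a design choice here, so the caller decides what to name the column.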