Problem description
import pandas as pd
import numpy as np

df = pd.DataFrame({'Date': [402,402,402,402,403,403,404,404],'Team': ['SFO','ARI','CUB','STL','NYY','SEA','OAK','LAA'],'Final': [4,6,2,5,7,2,1,2]})
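df_expected = pd.DataFrame({'Date': [402,402,402,402,403,403,404,404],'Team': ['SFO','ARI','CUB','STL','NYY','SEA','OAK','LAA'],'Final': [4,6,2,5,7,2,1,2],'Win_Loss': [0,1,0,1,1,0,0,1],'Run_diff': [-2,2,-3,3,5,-5,-1,1]})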
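I'm trying to create two columns: Run_diff for the run differential, and a binary win/loss column.
The best I've managed so far:
Set up an even/odd column to try to group the rows into games for analysis: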
df['Test'] = ''
for i in range(len(df)):
    if (i % 2) == 0:
        df.loc[i, 'Test'] = 'Even'
    else:
        df.loc[i, 'Test'] = 'Odd'
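The even/odd labelling loop can also be done in one vectorized step. A minimal sketch, assuming (as in the sample data) that each game occupies two consecutive rows; it uses row positions from `np.arange` rather than index labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [402, 402, 402, 402, 403, 403, 404, 404],
    'Team': ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
    'Final': [4, 6, 2, 5, 7, 2, 1, 2],
})

# Label each row by its position: 0, 2, 4, ... -> 'Even'; 1, 3, 5, ... -> 'Odd'.
# Using positions (not index labels) means a non-default index does not matter.
df['Test'] = np.where(np.arange(len(df)) % 2 == 0, 'Even', 'Odd')
print(df[['Team', 'Test']])
```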
Try to get the scores onto the same row to make adding/subtracting easier:
df['Shift'] = df['Final'].shift(fill_value=0)
Try using the two columns created above:
conditions = [(df['Test'] == 'Odd'),(df['Test'] == 'Even')]
values = [df['Final'] - df['Shift'],0]
df['Run_diff'] = np.select(conditions,values)
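This works for the odd rows, which is how I'm trying to group rows into games. But I can't figure out how to make it work for the even rows.
You don't have to use my code; I'm sure it isn't the most elegant. I'm very open to applying new or better techniques here.
Thanks.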
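One way to extend the shift idea to the even rows is to shift in the opposite direction for them: an even row (the first team of a game) faces the next row, `shift(-1)`, while an odd row faces the previous row, `shift(1)`. A sketch, assuming an even number of rows with every game on two consecutive rows; `is_first` and `opponent` are illustrative names, not from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [402, 402, 402, 402, 403, 403, 404, 404],
    'Team': ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
    'Final': [4, 6, 2, 5, 7, 2, 1, 2],
})

# Positions 0, 2, 4, ... hold the first team of each game (the 'Even' rows).
is_first = np.arange(len(df)) % 2 == 0

# First team: opponent is the next row. Second team: opponent is the previous row.
opponent = np.where(is_first, df['Final'].shift(-1), df['Final'].shift(1))

# shift() introduces NaN and therefore floats; with an even row count the NaNs
# are never selected, so casting back to int is safe here.
df['Run_diff'] = (df['Final'] - opponent).astype(int)
df['Win_Loss'] = (df['Run_diff'] > 0).astype(int)
print(df)
```

With an odd number of rows the last "first team" would get a NaN opponent and the `astype(int)` cast would raise, which at least makes a malformed game visible.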
Solution
I took a different approach. Looking at your data, I first reshape your DataFrame:
new_df = pd.DataFrame({
    "Date": df["Date"].iloc[::2].values,
    "Team1": df["Team"].iloc[::2].values,
    "Team2": df["Team"].iloc[1::2].values,
    "Final1": df["Final"].iloc[::2].values,
    "Final2": df["Final"].iloc[1::2].values,
})
This creates the following new_df:
Date Team1 Team2 Final1 Final2
0 402 SFO ARI 4 6
1 402 CUB STL 2 5
2 403 NYY SEA 7 2
3 404 OAK LAA 1 2
After that it's straightforward:
new_df["run_diff"] = new_df["Final1"] - new_df["Final2"]
# 1 when Team1 lost, i.e. when Team2 won the game
new_df["win_loss"] = (new_df["run_diff"] < 0).astype(int)
print(new_df)
Output:
Date Team1 Team2 Final1 Final2 run_diff win_loss
0 402 SFO ARI 4 6 -2 1
1 402 CUB STL 2 5 -3 1
2 403 NYY SEA 7 2 5 0
3 404 OAK LAA 1 2 -1 1
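The `iloc[::2]` / `iloc[1::2]` slicing in this answer silently assumes every game is exactly two consecutive rows. A small guard such as the hypothetical `check_pairs` helper below (not part of the answer) can catch data that breaks this assumption before reshaping; the same-date comparison is only a heuristic, since two different games can share a date:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': [402, 402, 402, 402, 403, 403, 404, 404],
    'Team': ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
    'Final': [4, 6, 2, 5, 7, 2, 1, 2],
})


def check_pairs(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if rows do not pair up as games."""
    if len(frame) % 2 != 0:
        raise ValueError("odd number of rows: some game has only one team")
    dates = frame["Date"].to_numpy()
    # Heuristic: both rows of a game should carry the same date.
    if not (dates[::2] == dates[1::2]).all():
        raise ValueError("a consecutive row pair spans two different dates")


check_pairs(df)  # the sample data passes silently

try:
    check_pairs(df.iloc[1:-1])  # misaligned pairs: (ARI, CUB), (STL, NYY), ...
except ValueError as exc:
    print(exc)
```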
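You can re-index on odd and even rows.
Then compute run_diff and win_loss for both teams of a game within a single row.
Finally, restore your data to its original row order by stacking and resetting the index.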
import pandas as pd
df = pd.DataFrame({'Date': [402,402,402,402,403,403,404,404],'Team': ['SFO','ARI','CUB','STL','NYY','SEA','OAK','LAA'],'Final': [4,6,2,5,7,2,1,2]})
# Re Index Based on Odd And Even Rows
new_df = df.set_index([df.index // 2,df.index % 2]).unstack()
# Calculate Run Diff in Both Directions
new_df['Run_Diff',0] = new_df['Final',0] - new_df['Final',1]
new_df['Run_Diff',1] = new_df['Final',1] - new_df['Final',0]
# Calculate Win Loss in Both Directions
new_df['Win_Loss',0] = (new_df['Run_Diff',0] > 0).astype(int)
new_df['Win_Loss',1] = (new_df['Run_Diff',1] > 0).astype(int)
# Remove the MultiIndex and put the columns in the desired order
new_df = (
new_df.stack(1)
.reset_index(drop=True)[['Date','Team','Final','Win_Loss','Run_Diff']]
)
print(new_df)
Output:
Date Team Final Win_Loss Run_Diff
0 402 SFO 4 0 -2
1 402 ARI 6 1 2
2 402 CUB 2 0 -3
3 402 STL 5 1 3
4 403 NYY 7 1 5
5 403 SEA 2 0 -5
6 404 OAK 1 0 -1
7 404 LAA 2 1 1
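A further option, not taken from either answer, avoids reshaping altogether with a `groupby` on a game number: in a two-team game the opponent's score is the game total minus the team's own score, so the run differential is `2 * Final - total`. A sketch under the same two-consecutive-rows assumption; `game` and `total` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [402, 402, 402, 402, 403, 403, 404, 404],
    'Team': ['SFO', 'ARI', 'CUB', 'STL', 'NYY', 'SEA', 'OAK', 'LAA'],
    'Final': [4, 6, 2, 5, 7, 2, 1, 2],
})

# Game number by row position: rows 0-1 -> 0, rows 2-3 -> 1, ...
game = np.arange(len(df)) // 2

# Total runs per game, broadcast back onto both rows of that game.
total = df.groupby(game)['Final'].transform('sum')

# Own score minus opponent's score, where opponent = total - own.
df['Run_Diff'] = 2 * df['Final'] - total
df['Win_Loss'] = (df['Run_Diff'] > 0).astype(int)
print(df)
```

If the real data carried an explicit game identifier column, grouping on that column instead of `game` would remove the dependence on row order.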