Pandas: comparing scores in baseball games

Problem description

import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': [402,402,402,402,403,403,404,404],'Team': ['SFO','ARI','CUB','STL','NYY','SEA','OAK','LAA'],'Final': [4,6,2,5,7,2,1,2]})

df_expected = pd.DataFrame({'Date': [402,402,402,402,403,403,404,404],'Win_Loss': [0,1,0,1,1,0,0,1],'Run_diff': [-2,2,-3,3,5,-5,-1,1]})

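I'm trying to create two columns: Run_diff for the run differential, and a binary win/loss column.

The best I've managed so far:

Set up an odd/even column to try to group the rows into games for analysis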

df['Test'] = ''
for i in range(len(df)):
    # Use .loc rather than chained indexing so the assignment reaches df itself
    if i % 2 == 0:
        df.loc[i, 'Test'] = 'Even'
    else:
        df.loc[i, 'Test'] = 'Odd'

Try to get both scores onto the same row so they are easier to add/subtract

df['Shift'] = df['Final'].shift(fill_value = 0)

Try to use the two columns created above

conditions = [(df['Test'] == 'Odd'),(df['Test'] == 'Even')]

values = [df['Final'] - df['Shift'],0]

df['Run_diff'] = np.select(conditions,values)

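This works for the rows labeled Odd, which was my attempt at grouping the rows into games. But I can't figure out how to make the rows labeled Even work.

You don't have to use my code; I'm sure it isn't the most elegant. I'm very open to trying new or better techniques for this.

Thanks.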

Solution

I took a different approach. Looking at your data, I would first reshape your dataframe so that each game occupies one row:

new_df = pd.DataFrame(
    {
        "Date": df["Date"].iloc[::2].values,
        "Team1": df["Team"].iloc[::2].values,
        "Team2": df["Team"].iloc[1::2].values,
        "Final1": df["Final"].iloc[::2].values,
        "Final2": df["Final"].iloc[1::2].values,
    }
)

This creates the following new_df:

   Date Team1 Team2  Final1  Final2
0   402   SFO   ARI       4       6
1   402   CUB   STL       2       5
2   403   NYY   SEA       7       2
3   404   OAK   LAA       1       2

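After that it's simple:

# Run differential from Team1's point of view (negative means Team2 scored more)
new_df["run_diff"] = new_df["Final1"] - new_df["Final2"]
# win_loss is 1 when run_diff is negative, i.e. when Team2 outscored Team1
new_df["win_loss"] = (new_df["run_diff"] < 0).astype(int)
print(new_df)

Prints: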

   Date Team1 Team2  Final1  Final2  run_diff  win_loss
0   402   SFO   ARI       4       6        -2         1
1   402   CUB   STL       2       5        -3         1
2   403   NYY   SEA       7       2         5         0
3   404   OAK   LAA       1       2        -1         1
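
The reshaping above relies on every pair of consecutive rows belonging to one game. A minimal sketch of a sanity check for that assumption follows; the names first, second, same_length and same_date are illustrative and not part of the original answer, and the check only confirms that paired rows share a Date, which is a necessary condition rather than proof of correct pairing.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [402, 402, 402, 402, 403, 403, 404, 404],
    "Team": ["SFO", "ARI", "CUB", "STL", "NYY", "SEA", "OAK", "LAA"],
    "Final": [4, 6, 2, 5, 7, 2, 1, 2],
})

# Assumption: rows 2k and 2k+1 are the two teams of the same game.
first = df.iloc[::2].reset_index(drop=True)    # rows 0, 2, 4, ...
second = df.iloc[1::2].reset_index(drop=True)  # rows 1, 3, 5, ...

# An odd number of rows would mean some game is missing a team.
same_length = len(first) == len(second)

# Only compare dates when the halves line up; mismatched lengths cannot be compared.
same_date = bool((first["Date"] == second["Date"]).all()) if same_length else False

print(same_length, same_date)
```

If either value is False, the every-other-row slicing would pair teams from different games, and the data would need an explicit game identifier instead.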
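
Alternatively, you can re-index the frame by odd and even rows.

Compute run_diff and win_loss for both teams of a game on a single row.

Then stack and re-index to restore the data to its original order.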

import pandas as pd

df = pd.DataFrame({'Date': [402,402,402,402,403,403,404,404],'Team': ['SFO','ARI','CUB','STL','NYY','SEA','OAK','LAA'],'Final': [4,6,2,5,7,2,1,2]})

# Re Index Based on Odd And Even Rows
new_df = df.set_index([df.index // 2,df.index % 2]).unstack()

# Calculate Run Diff in Both Directions
new_df['Run_Diff',0] = new_df['Final',0] - new_df['Final',1]
new_df['Run_Diff',1] = new_df['Final',1] - new_df['Final',0]
# Calculate Win Loss in Both Directions
new_df['Win_Loss',0] = (new_df['Run_Diff',0] > 0).astype(int)
new_df['Win_Loss',1] = (new_df['Run_Diff',1] > 0).astype(int)

# Remove the MultiIndex and change the column order
new_df = (
    new_df.stack(1)
        .reset_index(drop=True)[['Date','Team','Final','Win_Loss','Run_Diff']]
)
print(new_df)

Output:

   Date Team  Final  Win_Loss  Run_Diff
0   402  SFO      4         0        -2
1   402  ARI      6         1         2
2   402  CUB      2         0        -3
3   402  STL      5         1         3
4   403  NYY      7         1         5
5   403  SEA      2         0        -5
6   404  OAK      1         0        -1
7   404  LAA      2         1         1
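
For comparison, the shift idea from the question can also be completed without reshaping. The following is only a sketch, not taken from either answer: it assumes rows 2k and 2k+1 are the two teams of one game, and the names is_first and opponent are illustrative. Each row looks up its opponent's score by shifting in the appropriate direction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": [402, 402, 402, 402, 403, 403, 404, 404],
    "Team": ["SFO", "ARI", "CUB", "STL", "NYY", "SEA", "OAK", "LAA"],
    "Final": [4, 6, 2, 5, 7, 2, 1, 2],
})

# Assumption: the default RangeIndex is in place and rows come in game pairs.
is_first = df.index % 2 == 0

# Opponent's score: the next row for the first team of a pair,
# the previous row for the second team.
opponent = np.where(is_first, df["Final"].shift(-1), df["Final"].shift(1))

# shift() introduces floats, so cast back to int once the pairs are resolved.
df["Run_diff"] = (df["Final"] - opponent).astype(int)
df["Win_Loss"] = (df["Run_diff"] > 0).astype(int)
print(df)
```

This keeps one row per team, as in the expected frame, at the cost of depending on row order just as the other approaches do.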
