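Problem description
I have been working with a dataset of about 100,000 rows of daily basketball statistics. My project is to pick, for each day, the top 9 scorers subject to the same set of constraints. When I run the code below on a trimmed-down version of the dataset (a single date only), I am able to build the optimal lineup.
Single-date dataset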
from pulp import *
import numpy as np
import pandas as pd
from datetime import datetime  # pd.datetime is not available in recent pandas releases
dateparse = lambda dates: datetime.strptime(dates,'%m/%d/%Y')
players = pd.read_csv("Date.csv",parse_dates=['DATE'],index_col='DATE',date_parser=dateparse)
players["PG"] = (players["POSITION"] == 'PG').astype(float)
players["SG"] = (players["POSITION"] == 'SG').astype(float)
players["SF"] = (players["POSITION"] == 'SF').astype(float)
players["PF"] = (players["POSITION"] == 'PF').astype(float)
players["C"] = (players["POSITION"] == 'C').astype(float)
players["SALARY"] = players["SALARY"].astype(float)
model = LpProblem("problem",LpMaximize)
total_points = {}
cost = {}
PG = {}
SG = {}
SF = {}
PF = {}
C = {}
number_of_players = {}
for i,player in players.iterrows():
    var_name = 'x' + str(i) # Create variable name
    decision_var = pulp.LpVariable(var_name,cat='Binary') # Initialize Variables
    total_points[decision_var] = player["POINTS"] # Create PPG Dictionary
    cost[decision_var] = player["SALARY"] # Create Cost Dictionary
    # Create Dictionary for Player Types
    PG[decision_var] = player["PG"]
    SG[decision_var] = player["SG"]
    SF[decision_var] = player["SF"]
    PF[decision_var] = player["PF"]
    C[decision_var] = player["C"]
    number_of_players[decision_var] = 1.0
objective_function = pulp.LpAffineExpression(total_points)
model += objective_function
#Define cost constraint and add it to the model
total_cost = pulp.LpAffineExpression(cost)
model += (total_cost <= 60000)
PG_constraint = pulp.LpAffineExpression(PG)
SG_constraint = pulp.LpAffineExpression(SG)
SF_constraint = pulp.LpAffineExpression(SF)
PF_constraint = pulp.LpAffineExpression(PF)
C_constraint = pulp.LpAffineExpression(C)
total_players = pulp.LpAffineExpression(number_of_players)
model += (PG_constraint <= 2)
model += (SG_constraint <= 2)
model += (SF_constraint <= 2)
model += (PF_constraint <= 2)
model += (C_constraint <= 1)
model += (total_players <= 9)
model.solve()
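This works and solves the problem when my dataset contains only one date. I would like to include all dates in the range and have the optimizer run on each date and produce the best lineup for each day.
I tried to write a loop that creates the constraints for each day, but it fails: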
from datetime import datetime
start_date = "2021-01-25"
stop_date = "2021-01-27"
start = datetime.strptime(start_date,"%Y-%m-%d")
stop = datetime.strptime(stop_date,"%Y-%m-%d")
from datetime import timedelta
while start < stop:
    objective_function = pulp.LpAffineExpression(total_points)
    model += objective_function
    #Define cost constraint and add it to the model
    total_cost = pulp.LpAffineExpression(cost)
    model += (total_cost <= 60000)
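Is it possible to apply the same objective and constraints to multiple dates within a single dataset? The desired result is the 9 best players for each date in my dataset.
Solution
Simply build a list of your dates, then iterate over it and feed the dataframe filtered by date into your solver: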
from pulp import *
import numpy as np
import pandas as pd
from datetime import datetime  # pd.datetime is not available in recent pandas releases
dateparse = lambda dates: datetime.strptime(dates,'%m/%d/%Y')
players = pd.read_csv("Date.csv",parse_dates=['DATE'],index_col='DATE',date_parser=dateparse)
players["PG"] = (players["POSITION"] == 'PG').astype(float)
players["SG"] = (players["POSITION"] == 'SG').astype(float)
players["SF"] = (players["POSITION"] == 'SF').astype(float)
players["PF"] = (players["POSITION"] == 'PF').astype(float)
players["C"] = (players["POSITION"] == 'C').astype(float)
players["SALARY"] = players["SALARY"].astype(float)
players = players.reset_index()
date_list = list(set(players['DATE'])) #<-- create the list of dates in the dataframe
date_list.sort() #<-- sort the dates
for var_date in date_list:
    rows = []
    model = LpProblem("problem",LpMaximize)
    filter_players = players[players['DATE'] == var_date].reset_index(drop=True) #<-- filter by date
    total_points = {}
    cost = {}
    PG = {}
    SG = {}
    SF = {}
    PF = {}
    C = {}
    number_of_players = {}
    for i,player in filter_players.iterrows(): #<--then run on that filtered dataframe
        var_name = 'x' + str(i) # Create variable name
        decision_var = pulp.LpVariable(var_name,cat='Binary') # Initialize Variables
        total_points[decision_var] = player["POINTS"] # Create PPG Dictionary
        cost[decision_var] = player["SALARY"] # Create Cost Dictionary
        # Create Dictionary for Player Types
        PG[decision_var] = player["PG"]
        SG[decision_var] = player["SG"]
        SF[decision_var] = player["SF"]
        PF[decision_var] = player["PF"]
        C[decision_var] = player["C"]
        number_of_players[decision_var] = 1.0
    objective_function = pulp.LpAffineExpression(total_points)
    model += objective_function
    #Define cost constraint and add it to the model
    total_cost = pulp.LpAffineExpression(cost)
    model += (total_cost <= 60000)
    PG_constraint = pulp.LpAffineExpression(PG)
    SG_constraint = pulp.LpAffineExpression(SG)
    SF_constraint = pulp.LpAffineExpression(SF)
    PF_constraint = pulp.LpAffineExpression(PF)
    C_constraint = pulp.LpAffineExpression(C)
    total_players = pulp.LpAffineExpression(number_of_players)
    model += (PG_constraint <= 2)
    model += (SG_constraint <= 2)
    model += (SF_constraint <= 2)
    model += (PF_constraint <= 2)
    model += (C_constraint <= 1)
    model += (total_players <= 9)
    model.solve()
    row = {}
    for v in model.variables():
        if v.varValue:
            idx = int(v.name.replace('x',''))
            player_name = filter_players.iloc[idx]['PLAYER']
            position = filter_players.iloc[idx]['POSITION']
            salary = filter_players.iloc[idx]['SALARY']
            points = filter_players.iloc[idx]['POINTS']
            row = {'DATE':var_date,'PLAYER':player_name,'POSITION':position,'POINTS':points,'SALARY':salary}
            rows.append(row)
    lineup = pd.DataFrame(rows)
    lineup.loc[9,'PLAYER'] = ''
    lineup.loc[9,'POSITION'] = 'Total'
    lineup.loc[9,'POINTS'] = lineup['POINTS'].sum(axis=0)
    lineup.loc[9,'SALARY'] = lineup['SALARY'].sum(axis=0)
    print ('Lineup for: %s' %var_date.strftime('%m/%d/%Y'))
    print (lineup.iloc[:,1:])
    print ('\n\n')
Output:
Lineup for: 01/25/2021
                    PLAYER  POSITION  POINTS   SALARY
0             Daniel Theis        PF    44.1   4600.0
1           Thaddeus Young        PF    44.3   4200.0
2              Luka Doncic        PG    82.2  10700.0
3             Delon Wright        PG    55.9   5600.0
4  Shai Gilgeous-Alexander        SG    49.8   8200.0
5          Carmelo Anthony        SF    39.7   4100.0
6              Enes Kanter         C    49.9   6500.0
7            Norman Powell        SG    42.2   5600.0
8             LeBron James        SF    73.6   9800.0
9                              Total   481.7  59300.0

Lineup for: 01/26/2021
            PLAYER  POSITION  POINTS   SALARY
0     Terance Mann        SG    34.3   3500.0
1     John Collins        PF    43.7   7400.0
2       Trae Young        PG    49.1  10600.0
3      David Nwaba        SG    34.3   5700.0
4   Reggie Jackson        PG    47.4   5000.0
5       RJ Barrett        SF    29.8   7500.0
6    Royce O'Neale        PF    33.2   4500.0
7      Rudy Gobert         C    55.8   8300.0
8  De'Andre Hunter        SF    36.6   5600.0
9                      Total   364.2  58100.0

Lineup for: 01/27/2021
             PLAYER  POSITION  POINTS   SALARY
0  Precious Achiuwa        SF    35.7   3500.0
1       Ben Simmons        PG    50.2   8200.0
2   Khris Middleton        SF    48.5   7900.0
3      Bradley Beal        SG    69.8  10300.0
4        Chris Paul        PG    44.5   7400.0
5     James Johnson        PF    34.9   4300.0
6       Rudy Gobert         C    70.5   8000.0
7    Taurean Prince        PF    36.9   4900.0
8       Buddy Hield        SG    50.5   5500.0
9                       Total   441.5  60000.0
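Each day is a separate optimization problem, so you should instantiate a new LpProblem object for every day you want an optimal lineup for.
You should wrap the code above in a function that handles the optimization of a single lineup. It takes a dataframe of player statistics as its argument, i.e. day_players in the snippet below.
You should use pulp.LpVariable.dicts to create your decision variables. There is no need to loop over the player names to do this.
Note that you do not want the date to be the index of the dataframe. Instead, keep the date as a regular column and loop over the unique dates: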
optimals = []
for day in players['DATE'].unique():
    day_players = players.loc[players['DATE'] == day,:]
    optimals.append(optimize_lineup(day_players))
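A rough sketch of what such a function could look like is given here, assuming the column names from the question (DATE, PLAYER, POSITION, POINTS, SALARY). Only the name optimize_lineup comes from the snippet above; its body, the constants POSITION_LIMITS and SALARY_CAP, and the tiny demo dataframe are illustrative assumptions, not the answerer's actual code. It keeps the <= constraints used in the question.

```python
import pandas as pd
import pulp

# Assumed limits, copied from the constraints in the question.
POSITION_LIMITS = {"PG": 2, "SG": 2, "SF": 2, "PF": 2, "C": 1}
SALARY_CAP = 60000
MAX_PLAYERS = 9


def optimize_lineup(day_players: pd.DataFrame) -> pd.DataFrame:
    """Solve one day's lineup and return the selected rows."""
    day_players = day_players.reset_index(drop=True)
    # Plain Python lists avoid mixing numpy scalars into PuLP expressions.
    points = day_players["POINTS"].astype(float).tolist()
    salary = day_players["SALARY"].astype(float).tolist()
    position = day_players["POSITION"].tolist()
    idx = list(range(len(day_players)))

    # A fresh problem per day, with one binary variable per player row.
    model = pulp.LpProblem("lineup", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", idx, cat="Binary")

    model += pulp.lpSum(points[i] * x[i] for i in idx)  # objective
    model += pulp.lpSum(salary[i] * x[i] for i in idx) <= SALARY_CAP
    model += pulp.lpSum(x[i] for i in idx) <= MAX_PLAYERS
    for pos, limit in POSITION_LIMITS.items():
        members = [x[i] for i in idx if position[i] == pos]
        if members:  # skip positions nobody plays on this day
            model += pulp.lpSum(members) <= limit

    model.solve(pulp.PULP_CBC_CMD(msg=False))
    chosen = [i for i in idx if x[i].value() and x[i].value() > 0.5]
    return day_players.iloc[chosen]


# Made-up demo data: two dates, three players each.
players = pd.DataFrame({
    "DATE": ["2021-01-25"] * 3 + ["2021-01-26"] * 3,
    "PLAYER": ["A", "B", "C", "D", "E", "F"],
    "POSITION": ["PG", "SG", "C", "PG", "PG", "PG"],
    "POINTS": [30.0, 20.0, 25.0, 10.0, 12.0, 15.0],
    "SALARY": [30000, 20000, 40000, 10000, 10000, 10000],
})

optimals = []
for day in players["DATE"].unique():
    day_players = players.loc[players["DATE"] == day, :]
    optimals.append(optimize_lineup(day_players))

for lineup in optimals:
    print(lineup[["DATE", "PLAYER", "POSITION", "POINTS", "SALARY"]])
```

Because the model and its variables are created inside the function, nothing carries over from one date to the next, which is the problem the while loop in the question ran into.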
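The constraints in your code do not seem to capture the lineup requirements correctly. For example, do you intend to allow valid lineups with fewer than 9 players? Given the FanDuel lineup requirement of 2 PG, 2 SG, 2 SF, 2 PF and 1 C, it seems that all of these constraints should be ==. For a correct implementation for FanDuel basketball, see this recent article.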
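To make the difference between <= and == concrete, here is a small plain-Python check that accepts a lineup only if it fills every slot exactly. The function name is_valid_lineup and the constants are hypothetical; the position counts and the 60000 cap are taken from this thread, and the sample lineup is the 01/25/2021 result shown in the first answer.

```python
from collections import Counter

# Exact slot counts quoted above for FanDuel; the cap comes from the question.
REQUIRED = {"PG": 2, "SG": 2, "SF": 2, "PF": 2, "C": 1}
SALARY_CAP = 60000


def is_valid_lineup(lineup):
    """lineup is a list of (position, salary) pairs."""
    counts = Counter(pos for pos, _ in lineup)
    total_salary = sum(sal for _, sal in lineup)
    # Equality on the counts: every slot filled, none overfilled.
    return dict(counts) == REQUIRED and total_salary <= SALARY_CAP


# Positions and salaries of the 01/25/2021 lineup from the output above.
full = [("PF", 4600), ("PF", 4200), ("PG", 10700), ("PG", 5600), ("SG", 8200),
        ("SF", 4100), ("C", 6500), ("SG", 5600), ("SF", 9800)]

print(is_valid_lineup(full))       # all nine slots filled
print(is_valid_lineup(full[:-1]))  # one SF missing: passes <= limits, fails ==
```

An eight-player lineup like the second one satisfies every <= constraint in the question, so the solver is free to return it whenever that happens to be optimal under the salary cap.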
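I recommend using pydfs-lineup-optimizer. Note: I am a contributor to this project, but my role is minor and I gain nothing if you use it. It is well documented and correctly handles multi-position eligibility, multiple sites, and so on. The "multiple dates" problem you describe above is solved by instantiating a new optimizer object for each day and then appending the lineups to a master list.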