Solving the same optimization problem for multiple dates in a dataset

Problem description

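I have been working with a dataset of roughly 100,000 rows of daily basketball statistics. My project is to compile the best 9 scorers for each day, subject to the same constraints every day. When I run the following code on a stripped-down version of the dataset (a single date only), I am able to compile the optimal lineup.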

Single-date dataset

import pandas as pd
import pulp
from pulp import LpProblem, LpMaximize

# Parse the DATE column explicitly (MM/DD/YYYY) and use it as the index
players = pd.read_csv("Date.csv")
players["DATE"] = pd.to_datetime(players["DATE"], format="%m/%d/%Y")
players = players.set_index("DATE")
players["PG"] = (players["POSITION"] == 'PG').astype(float)
players["SG"] = (players["POSITION"] == 'SG').astype(float)
players["SF"] = (players["POSITION"] == 'SF').astype(float)
players["PF"] = (players["POSITION"] == 'PF').astype(float)
players["C"] = (players["POSITION"] == 'C').astype(float)
players["SALARY"] = players["SALARY"].astype(float)

model = LpProblem("problem",LpMaximize)

total_points = {}
cost = {}
PG = {}
SG = {}
SF = {}
PF = {}
C = {}
number_of_players = {}

for i,player in players.iterrows():
    var_name = 'x' + str(i) # Create variable name
    decision_var = pulp.LpVariable(var_name,cat='Binary') # Initialize Variables

    total_points[decision_var] = player["POINTS"] # Create PPG Dictionary
    cost[decision_var] = player["SALARY"] # Create Cost Dictionary
    
    # Create Dictionary for Player Types
    PG[decision_var] = player["PG"]
    SG[decision_var] = player["SG"]
    SF[decision_var] = player["SF"]
    PF[decision_var] = player["PF"]
    C[decision_var] = player["C"]
    number_of_players[decision_var] = 1.0

objective_function = pulp.LpAffineExpression(total_points)
model += objective_function
#Define cost constraint and add it to the model
total_cost = pulp.LpAffineExpression(cost)
model += (total_cost <= 60000)

PG_constraint = pulp.LpAffineExpression(PG)
SG_constraint = pulp.LpAffineExpression(SG)
SF_constraint = pulp.LpAffineExpression(SF)
PF_constraint = pulp.LpAffineExpression(PF)
C_constraint = pulp.LpAffineExpression(C)
total_players = pulp.LpAffineExpression(number_of_players)

model += (PG_constraint <= 2)
model += (SG_constraint <= 2)
model += (SF_constraint <= 2)
model += (PF_constraint <= 2)
model += (C_constraint <= 1)
model += (total_players <= 9)

model.solve()

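This works and solves the problem when my dataset contains only one date. I would like to add every date in the range and have the optimizer run on each date, producing the optimal lineup for each day.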

I tried to create a loop that builds the constraints for each day, but it fails:

from datetime import datetime

start_date = "2021-01-25"
stop_date = "2021-01-27"
start = datetime.strptime(start_date,"%Y-%m-%d")
stop = datetime.strptime(stop_date,"%Y-%m-%d")
from datetime import timedelta
while start < stop:
    objective_function = pulp.LpAffineExpression(total_points)
    model += objective_function
#Define cost constraint and add it to the model
    total_cost = pulp.LpAffineExpression(cost)
    model += (total_cost <= 60000)

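Is it possible to assign the same objective and constraints to multiple dates within a single dataset? The result I am after is the best 9 players for each date in my dataset.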

Solution

Just build a list of your dates, then iterate over it and feed the dataframe, filtered by date, into your solver:

import pandas as pd
import pulp
from pulp import LpProblem, LpMaximize

# Parse the DATE column explicitly (MM/DD/YYYY) and use it as the index
players = pd.read_csv("Date.csv")
players["DATE"] = pd.to_datetime(players["DATE"], format="%m/%d/%Y")
players = players.set_index("DATE")
players["PG"] = (players["POSITION"] == 'PG').astype(float)
players["SG"] = (players["POSITION"] == 'SG').astype(float)
players["SF"] = (players["POSITION"] == 'SF').astype(float)
players["PF"] = (players["POSITION"] == 'PF').astype(float)
players["C"] = (players["POSITION"] == 'C').astype(float)
players["SALARY"] = players["SALARY"].astype(float)


players = players.reset_index()
date_list = list(set(players['DATE'])) #<-- create the list of dates in the dataframe
date_list.sort()  #<-- sort the dates


for var_date in date_list:
    rows = []
    model = LpProblem("problem",LpMaximize)
    filter_players = players[players['DATE'] == var_date].reset_index(drop=True) #<-- filter by date
    total_points = {}
    cost = {}
    PG = {}
    SG = {}
    SF = {}
    PF = {}
    C = {}
    number_of_players = {}
    
    for i,player in filter_players.iterrows():  #<--then run on that filtered dataframe
        var_name = 'x' + str(i) # Create variable name
        decision_var = pulp.LpVariable(var_name,cat='Binary') # Initialize Variables
    
        total_points[decision_var] = player["POINTS"] # Create PPG Dictionary
        cost[decision_var] = player["SALARY"] # Create Cost Dictionary
        
        # Create Dictionary for Player Types
        PG[decision_var] = player["PG"]
        SG[decision_var] = player["SG"]
        SF[decision_var] = player["SF"]
        PF[decision_var] = player["PF"]
        C[decision_var] = player["C"]
        number_of_players[decision_var] = 1.0
    
    objective_function = pulp.LpAffineExpression(total_points)
    model += objective_function
    #Define cost constraint and add it to the model
    total_cost = pulp.LpAffineExpression(cost)
    model += (total_cost <= 60000)
    
    PG_constraint = pulp.LpAffineExpression(PG)
    SG_constraint = pulp.LpAffineExpression(SG)
    SF_constraint = pulp.LpAffineExpression(SF)
    PF_constraint = pulp.LpAffineExpression(PF)
    C_constraint = pulp.LpAffineExpression(C)
    total_players = pulp.LpAffineExpression(number_of_players)
    
    model += (PG_constraint <= 2)
    model += (SG_constraint <= 2)
    model += (SF_constraint <= 2)
    model += (PF_constraint <= 2)
    model += (C_constraint <= 1)
    model += (total_players <= 9)
    
    model.solve()
    
    row = {}
    for v in model.variables():
        if v.varValue:
            idx = int(v.name.replace('x',''))
            player_name = filter_players.iloc[idx]['PLAYER']
            position = filter_players.iloc[idx]['POSITION']
            salary = filter_players.iloc[idx]['SALARY']
            points = filter_players.iloc[idx]['POINTS']
            row = {'DATE':var_date,'PLAYER':player_name,'POSITION':position,'POINTS':points,'SALARY':salary}
            rows.append(row)
            
    lineup = pd.DataFrame(rows)
    lineup.loc[9,'PLAYER'] = ''
    lineup.loc[9,'POSITION'] = 'Total'
    lineup.loc[9,'POINTS'] = lineup['POINTS'].sum(axis=0)
    lineup.loc[9,'SALARY'] = lineup['SALARY'].sum(axis=0)   

    print ('Lineup for: %s' %var_date.strftime('%m/%d/%Y'))   
    print (lineup.iloc[:,1:])
    print ('\n\n')

Output:

Lineup for: 01/25/2021
                    PLAYER POSITION  POINTS   SALARY
0             Daniel Theis       PF    44.1   4600.0
1           Thaddeus Young       PF    44.3   4200.0
2              Luka Doncic       PG    82.2  10700.0
3             Delon Wright       PG    55.9   5600.0
4  Shai Gilgeous-Alexander       SG    49.8   8200.0
5          Carmelo Anthony       SF    39.7   4100.0
6              Enes Kanter        C    49.9   6500.0
7            Norman Powell       SG    42.2   5600.0
8             LeBron James       SF    73.6   9800.0
9                             Total   481.7  59300.0



Lineup for: 01/26/2021
            PLAYER POSITION  POINTS   SALARY
0     Terance Mann       SG    34.3   3500.0
1     John Collins       PF    43.7   7400.0
2       Trae Young       PG    49.1  10600.0
3      David Nwaba       SG    34.3   5700.0
4   Reggie Jackson       PG    47.4   5000.0
5       RJ Barrett       SF    29.8   7500.0
6    Royce O'Neale       PF    33.2   4500.0
7      Rudy Gobert        C    55.8   8300.0
8  De'Andre Hunter       SF    36.6   5600.0
9                     Total   364.2  58100.0



Lineup for: 01/27/2021
             PLAYER POSITION  POINTS   SALARY
0  Precious Achiuwa       SF    35.7   3500.0
1       Ben Simmons       PG    50.2   8200.0
2   Khris Middleton       SF    48.5   7900.0
3      Bradley Beal       SG    69.8  10300.0
4        Chris Paul       PG    44.5   7400.0
5     James Johnson       PF    34.9   4300.0
6       Rudy Gobert        C    70.5   8000.0
7    Taurean Prince       PF    36.9   4900.0
8       Buddy Hield       SG    50.5   5500.0
9                      Total   441.5  60000.0

Each day is a separate optimization problem, so you should instantiate a new LpProblem object for every day you want an optimal lineup for.

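You should wrap the code above in a function that handles a single lineup optimization. It takes a dataframe of player statistics as its argument, namely day_players in the snippet below.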

You should use pulp.LpVariable.dicts to create your decision variables. There is no need to loop over the players one by one to do this.
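A minimal sketch of what such a function could look like, assuming the column names from the question (PLAYER, POSITION, SALARY, POINTS) and the same salary and position caps. The name optimize_lineup matches the call used in this answer, but the body, the POSITION_CAPS constant, the keyword arguments and the toy data are illustrative assumptions, not the answerer's code.

```python
import pandas as pd
import pulp

# Position caps taken from the question's constraints
POSITION_CAPS = {"PG": 2, "SG": 2, "SF": 2, "PF": 2, "C": 1}


def optimize_lineup(day_players, salary_cap=60000, roster_size=9):
    """Return the rows of day_players selected for one day's lineup."""
    day_players = day_players.reset_index(drop=True)
    n = len(day_players)
    # Plain Python lists keep the PuLP expressions free of numpy scalars
    points = day_players["POINTS"].astype(float).tolist()
    salary = day_players["SALARY"].astype(float).tolist()
    positions = day_players["POSITION"].tolist()

    # A fresh problem per call, i.e. one LpProblem per day
    model = pulp.LpProblem("lineup", pulp.LpMaximize)
    # One binary decision variable per row, created in a single call
    x = pulp.LpVariable.dicts("x", range(n), cat="Binary")

    model += pulp.lpSum(points[i] * x[i] for i in range(n))                 # objective
    model += pulp.lpSum(salary[i] * x[i] for i in range(n)) <= salary_cap   # salary cap
    model += pulp.lpSum(x[i] for i in range(n)) <= roster_size              # roster size
    for pos, cap in POSITION_CAPS.items():
        members = [i for i in range(n) if positions[i] == pos]
        if members:  # skip positions that have no players on this day
            model += pulp.lpSum(x[i] for i in members) <= cap

    model.solve(pulp.PULP_CBC_CMD(msg=False))
    chosen = [i for i in range(n) if (x[i].value() or 0) > 0.5]
    return day_players.iloc[chosen]


# Hypothetical toy data, only to show the call
toy = pd.DataFrame({
    "PLAYER": ["A", "B", "C", "D"],
    "POSITION": ["PG", "PG", "PG", "C"],
    "SALARY": [100, 50, 50, 60],
    "POINTS": [10.0, 8.0, 7.0, 5.0],
})
lineup = optimize_lineup(toy, salary_cap=160)
print(sorted(lineup["PLAYER"]))
```

With the toy cap of 160, the highest scorer A is left out because B, C and D together score more within the same budget.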

Note that you do not want the date to be the dataframe's index. Instead, keep the date column as it is and loop over the unique dates:

optimals = []
for day in players['DATE'].unique():
    day_players = players.loc[players['DATE'] == day,:]
    optimals.append(optimize_lineup(day_players))

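The constraints in your code do not appear to capture the lineup requirements correctly. For example, do you intend to allow valid lineups with fewer than 9 players? Given FanDuel's lineup requirements of 2 PG, 2 SG, 2 SF, 2 PF and 1 C, it seems that all of these constraints should be ==. For a correct implementation for FanDuel basketball, see this recent article.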
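As a rough illustration of the == form, the sketch below uses a small, entirely hypothetical player pool and the roster requirement quoted above. The pool, the variable names and the 60000 cap carried over from the question are assumptions for the example only.

```python
from collections import Counter

import pulp

# Roster requirement quoted above: exactly this many players per position
ROSTER = {"PG": 2, "SG": 2, "SF": 2, "PF": 2, "C": 1}

# Hypothetical pool: (name, position, salary, projected points)
pool = [
    ("pg1", "PG", 9000, 50.0), ("pg2", "PG", 7000, 40.0), ("pg3", "PG", 5000, 30.0),
    ("sg1", "SG", 8000, 45.0), ("sg2", "SG", 6000, 35.0),
    ("sf1", "SF", 7500, 42.0), ("sf2", "SF", 5500, 33.0),
    ("pf1", "PF", 7000, 38.0), ("pf2", "PF", 5000, 29.0),
    ("c1", "C", 8000, 44.0), ("c2", "C", 4000, 25.0),
]

model = pulp.LpProblem("fanduel_lineup", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", range(len(pool)), cat="Binary")

# Objective: total projected points of the selected players
model += pulp.lpSum(p[3] * x[i] for i, p in enumerate(pool))
# The salary cap stays an inequality
model += pulp.lpSum(p[2] * x[i] for i, p in enumerate(pool)) <= 60000
# Position counts are equalities, so every slot must be filled
for pos, count in ROSTER.items():
    model += pulp.lpSum(x[i] for i, p in enumerate(pool) if p[1] == pos) == count

model.solve(pulp.PULP_CBC_CMD(msg=False))
picked = [pool[i] for i in range(len(pool)) if (x[i].value() or 0) > 0.5]
print(pulp.LpStatus[model.status], Counter(p[1] for p in picked))
```

Because the position counts sum to 9, a separate total-players constraint becomes redundant once the per-position constraints are equalities.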

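I would recommend using pydfs-lineup-optimizer. Note: I am a contributor to this project, but my role is minor and I gain nothing if you use it. It is well documented and correctly handles multi-position eligibility, multiple sites, and so on. The "multiple dates" problem you describe above is solved by instantiating a new optimizer object for each day and then appending the lineups to a master list.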

machine-learning pulp python