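Problem description

I spent the last hour doing some data entry, and now I'm stuck in Python.

Basically, I have a set of JSON data, and I want to pick entries whose price values add up to a given total (14.0 in my case). Among those, the final result should maximize the sum of the return field. Here is a sample of my dataset (there are more teams and fields):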

[
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]

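So in this case there are 3 possible answers:

a) England, Belgium (4.48)

b) England, Spain, Slovakia (5.32)

c) Belgium, Spain, Slovakia (5.38)

Of these, c) is optimal because it has the largest return (5.38). I want to implement the solution in Python.

I have looked at this question, but I can't figure out how to apply it to my case: Finding all possible combinations of numbers to reach a given sum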

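Solutions

Note: this solution assumes that team names are unique in the data, because they are used to look up rows in the converted DataFrame.

First, convert your JSON data to a pandas DataFrame.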

from itertools import combinations
import pandas as pd

data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]

df = pd.DataFrame(data)

       team  price  return
0   England    7.0    2.21
1   Belgium    7.0    2.27
2     Spain    6.0    2.14
3  Slovakia    1.0    0.97
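The uniqueness assumption noted above can be checked before doing anything else. The following is a minimal sketch, not part of the original answer; it works on the raw records with the standard library, and the names `team_counts` and `duplicate_teams` are made up for illustration.

```python
from collections import Counter

data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]

# Count how often each team name occurs in the raw records
team_counts = Counter(row["team"] for row in data)
duplicate_teams = [team for team, count in team_counts.items() if count > 1]

# Fail early: lookups by team name would silently merge duplicated rows
if duplicate_teams:
    raise ValueError(f"Duplicate team names: {duplicate_teams}")
print("All team names are unique")
```

If duplicates can occur in your data, selecting by row position instead of by team name avoids the problem.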

Convert the team column to a list:

teams = df['team'].tolist()
['England','Belgium','Spain','Slovakia']

Next, we generate all possible combinations from the list of teams:

all_team_combinations = []

# Sizes 1..len(teams), so the combination containing every team is included too
for i in range(1, len(teams) + 1):
    all_team_combinations.extend(combinations(teams, i))
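The same enumeration can be written without an explicit accumulator. This is a sketch based on the "powerset" idea from the itertools documentation, skipping the empty combination; the helper name `non_empty_combinations` is hypothetical.

```python
from itertools import chain, combinations

def non_empty_combinations(items):
    """Return every combination of sizes 1..len(items) as a list of tuples."""
    return list(
        chain.from_iterable(
            combinations(items, size) for size in range(1, len(items) + 1)
        )
    )

teams = ["England", "Belgium", "Spain", "Slovakia"]
all_team_combinations = non_empty_combinations(teams)
print(len(all_team_combinations))  # 2**4 - 1 = 15 combinations
```

The count grows as 2**n - 1, so this brute-force enumeration is only practical for a small number of teams.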

Now we check the price constraint:

price_threshold = 14
team_combinations_with_price_constraint = [c for c in all_team_combinations if df.loc[df['team'].isin(list(c)),'price'].sum() == price_threshold]

print(team_combinations_with_price_constraint)

[('England','Belgium'),('England','Spain','Slovakia'),('Belgium','Spain','Slovakia')]
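Comparing a float sum with `==` works for the prices in this sample, but sums of fractional prices (0.1 + 0.2, for example) are not always exact. As a hedged sketch, the same filter can use `math.isclose`; plain dictionaries are used here instead of DataFrame lookups, and the tolerance `abs_tol=1e-9` is an assumption you may need to adjust.

```python
from itertools import combinations
from math import isclose

data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]
price_threshold = 14.0

# Keep the team names of every combination whose prices sum to the threshold,
# allowing for tiny floating-point rounding errors
matching = [
    tuple(row["team"] for row in combo)
    for size in range(1, len(data) + 1)
    for combo in combinations(data, size)
    if isclose(sum(row["price"] for row in combo), price_threshold, abs_tol=1e-9)
]
print(matching)
```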

Next, we compute the sum of returns for each combination that satisfies the constraint:

combinations_return_sum = [round(df.loc[df['team'].isin(list(c)),'return'].sum(),3) for c in team_combinations_with_price_constraint]

print(combinations_return_sum)
[4.48,5.32,5.38]

Finally, use the index of the largest return sum to get the desired combination:

team_combinations_with_price_constraint[combinations_return_sum.index(max(combinations_return_sum))]

Result:

('Belgium','Spain','Slovakia')

To inspect how each combination maps to its return sum, you can build a dictionary like this:

combination_return_map = dict(zip(team_combinations_with_price_constraint,combinations_return_sum))

print(combination_return_map)

{('England','Belgium'): 4.48,('England','Spain','Slovakia'): 5.32,('Belgium','Spain','Slovakia'): 5.38}
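With such a dictionary in hand, the best combination can also be read off directly with `max` and a key function, instead of going through list indexes. A small sketch, with the values copied from the return sums printed earlier:

```python
combination_return_map = {
    ("England", "Belgium"): 4.48,
    ("England", "Spain", "Slovakia"): 5.32,
    ("Belgium", "Spain", "Slovakia"): 5.38,
}

# max() iterates over the keys; the key function ranks them by their return sum
best_combination = max(combination_return_map, key=combination_return_map.get)
print(best_combination, combination_return_map[best_combination])
```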

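Building on a previous SO solution for subset sum, using pandas

I use pandas to handle indexing the data. I didn't know whether England could be picked twice in your example, but I went ahead and solved it using pandas and itertools. pandas could be omitted.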

import pandas as pd
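from itertools import product

# I use pandas to handle indexing the data
your_json = [
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]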
your_data = pd.DataFrame(your_json)

# Generator copied from the previous SO solution. It yields lists of values that add up to your target.

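def subset_sum(numbers, target, partial=[], partial_sum=0):
    # partial is never mutated (a new list is built on each call), so the shared default is safe here
    if partial_sum == target:
        yield partial
    # Stop exploring once the target is reached or exceeded (assumes positive numbers)
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        # target must be passed along on every recursive call
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)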

# To get the row indexes matching the price values, iterate over the solution values:

soltion_indexes = []
for solution_values in subset_sum(your_data.price.tolist(), 14):
    possible_index = []
    for value in solution_values:
        # Indexes of rows with the right price are added to the list of possible indexes for this solution
        possible_index.append(your_data[your_data.price == value].index.tolist())
    # To get all combinations, product from itertools is used
    listed_posible_indexes = list(product(*possible_index))
    # If the indexes are not already among the solutions, and do not contain the same row twice, they are added to the solution indexes
    for possible_indexes in listed_posible_indexes:
        possible_solution_indexes = sorted(possible_indexes)
        if possible_solution_indexes not in soltion_indexes and not any(
            possible_solution_indexes.count(x) > 1 for x in possible_solution_indexes):
            soltion_indexes.append(possible_solution_indexes)

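# Then extract the rows for each set of indexes in the solution indexes, to build a DataFrame holding the complete rows of every solution, including the return.

solutions = []
for i, combination in enumerate(soltion_indexes, start=1):
    # assign() returns a new DataFrame, so your_data itself is left untouched
    solutions.append(your_data.iloc[combination].assign(solution_number=i))
all_solutions = pd.concat(solutions)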

# Then compute the sum of returns for each group:

ranked_groups_by_return = all_solutions.groupby("solution_number")['return'].sum().sort_values()

# Find and print the best group:

best = all_solutions[all_solutions.solution_number == ranked_groups_by_return.index[-1]]
print(best)

       team  price  return  solution_number
1   Belgium    7.0    2.27                3
2     Spain    6.0    2.14                3
3  Slovakia    1.0    0.97                3
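Because 7.0 appears twice in the prices, a value-based subset-sum generator yields [7, 6, 1] twice and cannot tell England from Belgium, which is why the answer above maps values back to row indexes and removes duplicates. As the answer notes, pandas can be omitted. The following is a minimal sketch of that idea which tracks row positions instead of values; the function name `subset_sum_indexes` is hypothetical, and like the original generator it assumes all prices are positive.

```python
def subset_sum_indexes(prices, target, start=0, chosen=(), total=0):
    """Yield tuples of row positions whose prices add up to target."""
    if total == target:
        yield chosen
    # With positive prices, going past the target can never be repaired
    if total >= target:
        return
    for i in range(start, len(prices)):
        yield from subset_sum_indexes(
            prices, target, i + 1, chosen + (i,), total + prices[i]
        )

data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]
prices = [row["price"] for row in data]

# Each solution is a tuple of distinct row positions, so no deduplication is needed
solutions = list(subset_sum_indexes(prices, 14))
best = max(solutions, key=lambda idx: sum(data[i]["return"] for i in idx))
best_teams = [data[i]["team"] for i in best]
print(solutions)
print(best_teams)
```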

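We need to iterate over the combinations of every size n, from the number of elements in the array down to 1, to capture all possible combinations. Then simply apply your conditions to get the combination with the maximum return.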

from itertools import combinations

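data = [
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]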

sum_data = []
COMB_SUM = 14  # Desired combination sum

max_combi = None
max_sum_return = float('-inf')  # Lowest possible value as temporary maximum

for i in range(len(data), 0, -1):  # 4,3,2,1
    for combi in combinations(data, i):  # Combinations of size i
        if sum(item['price'] for item in combi) == COMB_SUM:
            sum_return = sum(item['return'] for item in combi)
            if sum_return > max_sum_return:
                max_sum_return = sum_return
                max_combi = combi

print(max_combi)
print(max_sum_return)

Output

(
    {'team': 'Belgium','price': 7.0,'return': 2.27},{'team': 'Spain','price': 6.0,'return': 2.14},{'team': 'Slovakia','price': 1.0,'return': 0.97}
)
5.38
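Enumerating every combination takes on the order of 2**n steps, which becomes slow as more teams are added. As a hedged alternative that is not part of the answers here, the problem can be treated as a 0/1 knapsack with an exact price sum and solved with dynamic programming. This sketch assumes every price is a positive multiple of 1/scale (0.1 with the default used here), so prices can be converted to integers; `best_exact_sum` and `scale` are names made up for illustration.

```python
def best_exact_sum(items, target, scale=10):
    """Return (total_return, teams) for the best subset whose prices sum to target.

    Returns None when no subset reaches the target exactly.
    Assumes each price is a positive multiple of 1/scale.
    """
    goal = round(target * scale)
    # best[s] holds the best (return sum, chosen teams) with price sum s / scale
    best = [None] * (goal + 1)
    best[0] = (0.0, ())
    for item in items:
        cost = round(item["price"] * scale)
        # Walk the sums downwards so each item is used at most once
        for s in range(goal, cost - 1, -1):
            previous = best[s - cost]
            if previous is None:
                continue
            candidate = (previous[0] + item["return"], previous[1] + (item["team"],))
            if best[s] is None or candidate[0] > best[s][0]:
                best[s] = candidate
    return best[goal]

data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]
total_return, chosen_teams = best_exact_sum(data, 14)
print(chosen_teams, round(total_return, 2))
```

The running time is proportional to the number of items times the scaled target, rather than to the number of subsets.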

I would go for something like this:

import numpy.ma as ma
import numpy as np
import pandas as pd

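df = pd.DataFrame([
  { "team": "England", "price": 7.0, "return": 2.21 },
  { "team": "Belgium", "price": 7.0, "return": 2.27 },
  { "team": "Spain", "price": 6.0, "return": 2.14 },
  { "team": "Slovakia", "price": 1.0, "return": 0.97 }
])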

price_limit = 14
powers_of_two = np.array([1<<n for n in range(len(df))])
combinations = (np.arange(2**len(df))[:,None] & powers_of_two)[1:].astype(bool)

prices = ma.masked_array(np.tile(df.price,(len(combinations),1)),mask=~combinations)
valid_combinations = (prices.sum(axis=-1) == price_limit)

returns = ma.masked_array(np.tile(df["return"],(len(combinations),1)),mask=~(valid_combinations[:,None] & combinations))

best = np.argmax(returns.sum(axis=-1))

print(f"Best combination (price={df['price'][combinations[best]].sum():.0f}): {' + '.join(df.team[combinations[best]].to_list())} = {df['return'][combinations[best]].sum():.2f}")
# prints: Best combination (price=14): Belgium + Spain + Slovakia = 5.38

This is a bit liberal with memory, but that can be improved by simply re-striding df.price and df["return"] instead of tiling them.
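One way to read the remark about avoiding tiling is to lean on NumPy broadcasting, which stretches a 1-D array across the rows of the combination matrix without copying it. The sketch below is an interpretation, not the answer's own code: it swaps the masked arrays for `np.where`, uses plain NumPy arrays instead of a DataFrame, and the names `subsets`, `price_sums` and `return_sums` are made up. Note that `np.where` still builds a full-size result, so only the tiled inputs are saved.

```python
import numpy as np

teams = np.array(["England", "Belgium", "Spain", "Slovakia"])
prices = np.array([7.0, 7.0, 6.0, 1.0])
returns = np.array([2.21, 2.27, 2.14, 0.97])
price_limit = 14

# Row k holds the bits of k + 1, i.e. one row per non-empty subset of the teams
powers_of_two = 1 << np.arange(len(teams))
subsets = (np.arange(1, 2 ** len(teams))[:, None] & powers_of_two).astype(bool)

# Broadcasting stretches the 1-D arrays across all rows; nothing is tiled
price_sums = np.where(subsets, prices, 0.0).sum(axis=1)
return_sums = np.where(subsets, returns, 0.0).sum(axis=1)

# Rows that miss the price limit are ranked as -inf so they can never win
valid = np.isclose(price_sums, price_limit)
best = int(np.argmax(np.where(valid, return_sums, -np.inf)))
best_teams = teams[subsets[best]].tolist()
print(best_teams, round(float(return_sums[best]), 2))
```

If no subset meets the price limit, `np.argmax` falls back to row 0, so `valid.any()` should be checked first in real use.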
