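Problem description
I spent the last hour doing some data entry and have now run into trouble in Python.
Basically, I have a set of JSON data, and I want to pick entries whose price values add up to a given total (14.0 in my case). The final result should maximize the sum of the return field. Here is a sample of my dataset (there are more teams and fields):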
[
    { "team": "England", "price": 7.0, "return": 2.21 },
    { "team": "Belgium", "price": 7.0, "return": 2.27 },
    { "team": "Spain", "price": 6.0, "return": 2.14 },
    { "team": "Slovakia", "price": 1.0, "return": 0.97 }
]
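So in this case there are 3 possible answers:
a) England, Belgium (4.48)
b) England, Spain, Slovakia (5.32)
c) Belgium, Spain, Slovakia (5.38)
Of these, c) is optimal because it has the largest return (5.38). I want to implement the solution in Python.
I have looked at this question, but I can't work out how to apply it to my case: Finding all possible combinations of numbers to reach a given sum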
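Solution
Note: this solution assumes that team names in the data are unique, so that they can be used for lookups in the converted dataframe.
First, convert your JSON data to a pandas dataframe.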
from itertools import combinations
import pandas as pd
data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97}
]
df = pd.DataFrame(data)
       team  price  return
0   England    7.0    2.21
1   Belgium    7.0    2.27
2     Spain    6.0    2.14
3  Slovakia    1.0    0.97
Convert the team column to a list
teams = df['team'].tolist()
['England','Belgium','Spain','Slovakia']
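Next, we generate all possible combinations from the team list
all_team_combinations = []
# sizes 1 through len(teams), so the combination containing every team is included
for i in range(1, len(teams) + 1):
    all_team_combinations.extend(combinations(teams, i))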
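As a rough illustration of how much work this brute-force step does, the sketch below counts the combinations for the four sample teams. The names `all_combos` and `per_size` are made up for this example; the point is only that n teams give 2**n - 1 non-empty combinations, so the approach suits small datasets.

```python
from itertools import combinations
from math import comb

teams = ["England", "Belgium", "Spain", "Slovakia"]

# Every non-empty combination, from size 1 up to all teams.
all_combos = [
    combo
    for size in range(1, len(teams) + 1)
    for combo in combinations(teams, size)
]

# Number of combinations per size: C(n, size).
per_size = {size: comb(len(teams), size) for size in range(1, len(teams) + 1)}

print(len(all_combos))  # 15, i.e. 2**4 - 1
print(per_size)         # {1: 4, 2: 6, 3: 4, 4: 1}
```

With a few dozen teams the count runs into the billions, so this is only a sketch for data of roughly this size.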
Now, we check the price constraint
price_threshold = 14
team_combinations_with_price_constraint = [c for c in all_team_combinations if df.loc[df['team'].isin(list(c)),'price'].sum() == price_threshold]
print(team_combinations_with_price_constraint)
[('England','Belgium'),('England','Spain','Slovakia'),('Belgium','Spain','Slovakia')]
Next, we compute the sum of returns for the combinations that satisfy the constraint
combinations_return_sum = [round(df.loc[df['team'].isin(list(c)),'return'].sum(),3) for c in team_combinations_with_price_constraint]
print(combinations_return_sum)
[4.48,5.32,5.38]
Finally, use the index of the maximum return sum to get the desired combination
team_combinations_with_price_constraint[combinations_return_sum.index(max(combinations_return_sum))]
This yields
('Belgium','Spain','Slovakia')
To inspect the mapping from combination to return sum, you can build a dictionary like this.
combination_return_map = dict(zip(team_combinations_with_price_constraint,combinations_return_sum))
print(combination_return_map)
{('England','Belgium'): 4.48,('England','Spain','Slovakia'): 5.32,('Belgium','Spain','Slovakia'): 5.38}
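The note at the top of this answer relies on unique team names. As a hedged variation, the sketch below combines row labels instead of names, so duplicate names would not matter. The helper `best_rows` and the tolerance `1e-9` are assumptions made for this example, not part of the answer above.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame([
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
])


def best_rows(frame, target, tol=1e-9):
    """Return (row labels, return sum) of the best combination hitting target."""
    best_idx, best_return = None, float("-inf")
    for size in range(1, len(frame) + 1):
        # Combine row labels rather than team names.
        for idx in combinations(frame.index, size):
            rows = frame.loc[list(idx)]
            if abs(rows["price"].sum() - target) > tol:
                continue
            total_return = rows["return"].sum()
            if total_return > best_return:
                best_idx, best_return = idx, total_return
    return best_idx, best_return


best_idx, best_return = best_rows(df, 14)
print(df.loc[list(best_idx), "team"].tolist(), round(best_return, 2))
```

If no combination reaches the target, `best_idx` stays `None`, so a caller would need to check for that before indexing.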
,
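Building on the previous SO solution for subset sum, and using pandas
I use pandas to handle indexing the data. I don't know whether England can be picked twice in your example, but I went ahead and solved it using pandas and itertools; pandas could be left out.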
import pandas as pd
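from itertools import product
# I use pandas to handle indexing the data
your_json = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97}
]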
your_data = pd.DataFrame(your_json)
# Iterator copied from the previous SO solution. It generates the values that sum to your target.
def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        # pass target through on every recursive call
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
# To get the row indexes that match the price values, iterate over the solution values:
solution_indexes = []
for solution_values in subset_sum(your_data.price.tolist(), 14):
    possible_index = []
    for value in solution_values:
        # indexes of rows with the right price are candidates for this solution
        possible_index.append(your_data[your_data.price == value].index.tolist())
    # product from itertools gives every way of picking one row per value
    listed_possible_indexes = list(product(*possible_index))
    # a candidate is kept only if it is not already a solution and does not use the same row twice
    for possible_indexes in listed_possible_indexes:
        possible_solution_indexes = sorted(possible_indexes)
        if possible_solution_indexes not in solution_indexes and not any(
                possible_solution_indexes.count(x) > 1 for x in possible_solution_indexes):
            solution_indexes.append(possible_solution_indexes)
# Then extract the rows for each set of solution indexes to build a dataframe holding the full rows of every solution, including return.
solutions = []
for i, row_indexes in enumerate(solution_indexes, start=1):
    solution = your_data.iloc[row_indexes].copy()  # copy so the new column is not set on a slice
    solution["solution_number"] = i
    solutions.append(solution)
all_solutions = pd.concat(solutions)
# Then compute the sum of returns for each group:
ranked_groups_by_return = all_solutions.groupby("solution_number")['return'].sum().sort_values()
# Find and print the best group:
best = all_solutions[all_solutions.solution_number == ranked_groups_by_return.index[-1]]
print(best)
       team  price  return  solution_number
1   Belgium    7.0    2.27                3
2     Spain    6.0    2.14                3
3  Slovakia    1.0    0.97                3
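The value-based iterator above needs a second pass to map prices back to rows and to drop duplicates, because two teams share the price 7.0. A possible variation, sketched here under the assumption that all prices are positive, tracks row positions directly. The function name `subset_sum_indices` is hypothetical and not from the linked SO answer.

```python
def subset_sum_indices(prices, target, start=0, partial=(), partial_sum=0.0):
    """Yield lists of positions in prices whose values sum exactly to target.

    Assumes positive prices, so a branch can stop once the sum reaches target.
    """
    if partial_sum == target:
        yield list(partial)
    if partial_sum >= target:
        return
    for i in range(start, len(prices)):
        # Only look at later positions, so each row is used at most once.
        yield from subset_sum_indices(
            prices, target, i + 1, partial + (i,), partial_sum + prices[i]
        )


prices = [7.0, 7.0, 6.0, 1.0]   # England, Belgium, Spain, Slovakia
returns = [2.21, 2.27, 2.14, 0.97]

index_solutions = list(subset_sum_indices(prices, 14))
best_positions = max(index_solutions, key=lambda idx: sum(returns[i] for i in idx))

print(index_solutions)  # [[0, 1], [0, 2, 3], [1, 2, 3]]
print(best_positions)   # [1, 2, 3]
```

Each yielded list already identifies distinct rows, so no deduplication step is needed afterwards.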
,
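We need to iterate over all combinations of size n, for every n up to the number of elements in the array, to capture all possible combinations. Then simply apply your conditions to get the combination with the maximum return.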
from itertools import combinations
data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97}
]
COMB_SUM = 14  # Desired combination sum
max_combi = None
max_sum_return = float('-inf')  # Lowest possible value as temporary maximum
for i in range(len(data), 0, -1):  # 4,3,2,1
    for combi in combinations(data, i):  # Combinations of size i
        if sum(item['price'] for item in combi) == COMB_SUM:
            sum_return = sum(item['return'] for item in combi)
            if sum_return > max_sum_return:
                max_sum_return = sum_return
                max_combi = combi
print(max_combi)
print(max_sum_return)
Output
(
{'team': 'Belgium','price': 7.0,'return': 2.27},{'team': 'Spain','price': 6.0,'return': 2.14},{'team': 'Slovakia','price': 1.0,'return': 0.97}
)
5.38
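The loop above compares the price sum to the target with `==`, which works for the sample prices but can miss matches when prices are fractions such as 0.1 and 0.2. A minimal sketch of the same search with a tolerance follows; the function name `best_combination` and the `abs_tol` value are assumptions for illustration.

```python
from itertools import combinations
from math import isclose


def best_combination(items, target, abs_tol=1e-9):
    """Return (combination, return sum) with the highest return among
    combinations whose prices sum to target within abs_tol."""
    best, best_return = None, float("-inf")
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            total_price = sum(item["price"] for item in combo)
            if not isclose(total_price, target, abs_tol=abs_tol):
                continue
            total_return = sum(item["return"] for item in combo)
            if total_return > best_return:
                best, best_return = combo, total_return
    return best, best_return


data = [
    {"team": "England", "price": 7.0, "return": 2.21},
    {"team": "Belgium", "price": 7.0, "return": 2.27},
    {"team": "Spain", "price": 6.0, "return": 2.14},
    {"team": "Slovakia", "price": 1.0, "return": 0.97},
]

best, best_return = best_combination(data, 14)
best_teams = [item["team"] for item in best]
print(best_teams, round(best_return, 2))

# With fractional prices an exact comparison fails: 0.1 + 0.2 == 0.3 is False.
fractional = [
    {"team": "A", "price": 0.1, "return": 1.0},
    {"team": "B", "price": 0.2, "return": 1.0},
]
fractional_best, _ = best_combination(fractional, 0.3)
```

If prices are always whole or half units, converting them to integers (for example, cents) before summing would be another way to avoid the issue.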
,
Sure, I would go with something like this:
import numpy.ma as ma
import numpy as np
import pandas as pd
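df = pd.DataFrame([
    { "team": "England", "price": 7.0, "return": 2.21 },
    { "team": "Belgium", "price": 7.0, "return": 2.27 },
    { "team": "Spain", "price": 6.0, "return": 2.14 },
    { "team": "Slovakia", "price": 1.0, "return": 0.97 }
])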
price_limit = 14
powers_of_two = np.array([1<<n for n in range(len(df))])
combinations = (np.arange(2**len(df))[:,None] & powers_of_two)[1:].astype(bool)
prices = ma.masked_array(np.tile(df.price,(len(combinations),1)),mask=~combinations)
valid_combinations = (prices.sum(axis=-1) == price_limit)
returns = ma.masked_array(np.tile(df["return"],(len(combinations),1)),mask=~(valid_combinations[:,None] & combinations))
best = np.argmax(returns.sum(axis=-1))
print(f"Best combination (price={df['price'][combinations[best]].sum():.0f}): {' + '.join(df.team[combinations[best]].to_list())} = {df['return'][combinations[best]].sum():.2f}")
# prints: Best combination (price=14): Belgium + Spain + Slovakia = 5.38
This is a bit loose with memory, but that can be improved by simply re-striding df.price and df["return"] instead of tiling them.
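One way to avoid the tiled copies could look like the sketch below. Note that it swaps in a different technique from the re-striding mentioned above: it multiplies the 0/1 membership matrix by the price and return vectors, so neither vector is ever repeated in memory. The array names (`member`, `scores`, and so on) are assumptions for this example, and the membership matrix itself still has 2**n - 1 rows.

```python
import numpy as np

teams = np.array(["England", "Belgium", "Spain", "Slovakia"])
prices = np.array([7.0, 7.0, 6.0, 1.0])
returns = np.array([2.21, 2.27, 2.14, 0.97])
price_limit = 14

n = len(teams)
powers_of_two = 1 << np.arange(n)

# Row k (k = 1 .. 2**n - 1) marks which teams belong to combination k.
combos = (np.arange(1, 2**n)[:, None] & powers_of_two) != 0
member = combos.astype(float)

# One matrix-vector product per quantity replaces the tiled, masked sums.
total_price = member @ prices
total_return = member @ returns

# Combinations that miss the price limit get -inf so argmax ignores them.
valid = np.isclose(total_price, price_limit)
scores = np.where(valid, total_return, -np.inf)

best = int(np.argmax(scores))
best_teams = teams[combos[best]].tolist()
print(best_teams, round(float(total_return[best]), 2))
```

If no combination matches the limit, every score is `-inf` and `argmax` returns row 0, so `valid.any()` should be checked first in real use.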