Problem description
I am trying to add a Pareto front to a scatter plot I have. The scatter plot data is:
array([[1.44100000e+04,3.31808987e+07],[1.21250000e+04,3.22901074e+07],[6.03000000e+03,2.84933900e+07],[8.32500000e+03,2.83091317e+07],[6.68000000e+03,2.56373373e+07],[5.33500000e+03,1.89331461e+07],[3.87500000e+03,1.84107940e+07],[3.12500000e+03,1.60416570e+07],[6.18000000e+03,1.48054565e+07],[4.62500000e+03,1.33395341e+07],[5.22500000e+03,1.23150492e+07],[3.14500000e+03,1.20244820e+07],[6.79500000e+03,1.19525083e+07],[2.92000000e+03,9.18176770e+06],[5.45000000e+02,5.66882578e+06]])
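The scatter plot looks like this:
I used this tutorial to plot the Pareto front, but for some reason the result is very strange: all I get is a tiny red line:
This is the code I used: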
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def identify_pareto(scores):
    # Count number of items
    population_size = scores.shape[0]
    # Create a NumPy index for scores on the Pareto front (zero indexed)
    population_ids = np.arange(population_size)
    # Create a starting list of items on the Pareto front
    # All items start off as being labelled as on the Pareto front
    pareto_front = np.ones(population_size, dtype=bool)
    print(pareto_front)
    # Loop through each item. This will then be compared with all other items
    for i in range(population_size):
        # Loop through all other items
        for j in range(population_size):
            # Check if our 'i' point is dominated by our 'j' point
            if all(scores[j] >= scores[i]) and any(scores[j] > scores[i]):
                # j dominates i. Label 'i' point as not on Pareto front
                pareto_front[i] = 0
                # Stop further comparisons with 'i' (no more comparisons needed)
                break
    # Return ids of scenarios on Pareto front
    return population_ids[pareto_front]

pareto = identify_pareto(scores)
# Select the scores of the points on the Pareto front
# (without this line, pareto_front is undefined outside the function)
pareto_front = scores[pareto]
pareto_front_df = pd.DataFrame(pareto_front)
pareto_front_df.sort_values(0, inplace=True)
pareto_front = pareto_front_df.values
# Here I get weird results as output:
# array([[ 5,81],[15,80],[30,79],[55,77],[70,65],[80,60],[90,40],[97,23],[99,4]])
x_all = scores[:,0]
y_all = scores[:,1]
x_pareto = pareto_front[:,0]
y_pareto = pareto_front[:,1]
plt.scatter(x_all,y_all)
plt.plot(x_pareto,y_pareto,color='r')
plt.xlabel('Objective A')
plt.ylabel('Objective B')
plt.show()
The result is a tiny red line.
My question is: where is my mistake, and how can I get the Pareto line back?
Solution
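I don't think there is anything wrong with your code. The issue is the data that scores represents (assuming scores is the first array you posted).
The first element of the array, [1.44100000e+04,3.31808987e+07], has the largest value in both columns, so it dominates every other point. It is therefore the only outer-loop iteration for which the condition if all(scores[j] >= scores[i]) and any(scores[j] > scores[i]): is never satisfied, and the only point whose flag is not set to zero. All other points are set to zero.
I believe this is the only point that gets plotted in red, which is why the "line" looks so small.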
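The claim above can be checked directly. The following is a minimal sketch, assuming scores is the array from the question. It uses a condensed version of identify_pareto, which treats larger values as better in both columns. The second part is purely an assumption for illustration: nothing in the question says that Objective A should be minimized, but if it should, negating that column before the call yields a front with more than one point.

```python
import numpy as np

# Data copied from the question
scores = np.array([
    [1.44100000e+04, 3.31808987e+07], [1.21250000e+04, 3.22901074e+07],
    [6.03000000e+03, 2.84933900e+07], [8.32500000e+03, 2.83091317e+07],
    [6.68000000e+03, 2.56373373e+07], [5.33500000e+03, 1.89331461e+07],
    [3.87500000e+03, 1.84107940e+07], [3.12500000e+03, 1.60416570e+07],
    [6.18000000e+03, 1.48054565e+07], [4.62500000e+03, 1.33395341e+07],
    [5.22500000e+03, 1.23150492e+07], [3.14500000e+03, 1.20244820e+07],
    [6.79500000e+03, 1.19525083e+07], [2.92000000e+03, 9.18176770e+06],
    [5.45000000e+02, 5.66882578e+06],
])

def identify_pareto(scores):
    """Return indices of non-dominated rows (larger is better in every column)."""
    n = scores.shape[0]
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i: at least as good everywhere, strictly better somewhere
            if np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i]):
                on_front[i] = False
                break
    return np.arange(n)[on_front]

# Maximizing both objectives: only the first point survives
front_max = identify_pareto(scores)
print(front_max.tolist())  # [0]

# ASSUMPTION (not stated in the question): Objective A is to be minimized.
# Negating column 0 turns "smaller is better" into "larger is better".
flipped = scores * np.array([-1.0, 1.0])
front_mixed = identify_pareto(flipped)
print(front_mixed.tolist())  # [0, 1, 2, 5, 6, 7, 13, 14]
```

With both objectives maximized, a single point cannot form a visible line, which matches the plot in the question. Under the minimization assumption, scores[front_mixed] can be sorted by the first column and passed to plt.plot in the same way as in the question's code.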