Problem description
Suppose we have the following data frame:
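```python
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})
print(merged)
```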
```
    week  shopper  product  price  discount
0      0        0       63    543         0
1      0        0       80    644         0
2      0        0       91    770        10
3      0        1       42    620        12
4      0        1       77    560         0
5      1        0       55    354        30
6      1        1       77    525        10
7      1        1       95    667         0
8      1        2       77    525         0
9      2        0       98    654         5
10     2        2      202    783         0
11     2        2      225    662         0
```
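Can you think of a way to estimate the probability that each shopper buys each product in week 3? I'm looking for a final result that looks like this: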
```
    week  shopper  product     y
0      3        0       55  0.32
1      3        0       63  0.66
2      3        0       80  0.77
3      3        0       91  0.54
4      3        0       98  0.23
5      3        1       42  0.24
6      3        1       77  0.51
7      3        1       95  0.40
8      3        2       77  0.12
9      3        2      202  0.53
10     3        2      225  0.39
```
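I thought about using the number of times a customer-product combination has appeared in the past, or the time between orders, to predict the probability that it appears again next week, but I don't know how to implement it.
I would really appreciate your help!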
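A minimal sketch of the first idea (how often a shopper-product pair has appeared before), assuming the frame is the one printed above: the week-3 estimate `y` is simply the share of observed weeks in which the shopper bought the product. The `weeks_since_last` column is only the raw input for the second idea (time between orders); turning it into a probability would still need a model. With three weeks of history, `y` can only take coarse values (multiples of 1/3), so treat it as a baseline, not a forecast.

```python
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})

n_weeks = merged['week'].nunique()          # number of observed weeks (0, 1, 2)
next_week = int(merged['week'].max()) + 1   # the week to predict (3)

weeks_bought = merged.groupby(['shopper', 'product'])['week']
freq = pd.DataFrame({
    # share of past weeks in which this shopper bought this product
    'y': weeks_bought.nunique() / n_weeks,
    # weeks since the pair was last seen (raw feature for the "time between orders" idea)
    'weeks_since_last': next_week - weeks_bought.max(),
}).reset_index()
freq.insert(0, 'week', next_week)
print(freq)
```

The rows come out in the same shopper/product order as the table requested above, one row per pair the shopper has bought before.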
Solution
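This is not an easy task. The accuracy depends on the number of past observations, and a data set as small as the one you shared cannot give an accurate solution. However, the code below may give you an idea. As you guessed, you need to find relationships between the products and use them. First, I computed the average price (here for shopper 0) to see how much the shopper usually pays.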
```python
# Mean price shopper 0 paid in each of weeks 0, 1 and 2
is_shopper_0 = merged['shopper'] == 0
averaged_paid_price_0_0 = merged.loc[(merged['week'] == 0) & is_shopper_0, 'price'].mean()
averaged_paid_price_1_0 = merged.loc[(merged['week'] == 1) & is_shopper_0, 'price'].mean()
averaged_paid_price_2_0 = merged.loc[(merged['week'] == 2) & is_shopper_0, 'price'].mean()
# Average of the three weekly means
total_paid_average_0 = (averaged_paid_price_2_0 + averaged_paid_price_1_0 + averaged_paid_price_0_0) / 3
```
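The same per-shopper average can be computed for every shopper at once with `groupby`. This sketch (variable names are illustrative) takes the mean price per shopper and week, then averages those weekly means per shopper. One assumed difference: it averages only over the weeks in which the shopper bought something, whereas dividing by 3 when a week has no purchases yields NaN for shoppers 1 and 2. For shopper 0 it gives the same value as the snippet above, about 553.44.

```python
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})

# Mean price per (shopper, week) ...
weekly_mean = merged.groupby(['shopper', 'week'])['price'].mean()
# ... then the mean of those weekly means for each shopper
total_paid_average = weekly_mean.groupby(level='shopper').mean()
print(total_paid_average)
```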
Then I divided each product's price by total_paid_average_0:
```python
merged_price_points_0 = merged['price'] / total_paid_average_0
```
Basically, I am trying to give each purchase points.
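The price point that the final scoring loop derives from this ratio (ratio - 1 when the ratio is above 1, otherwise 1 - ratio) is just the absolute distance from 1, so it can be written as one vectorized expression. A sketch, with `price_points_0` as an illustrative name:

```python
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})

# Shopper 0's average price: mean of the weekly mean prices (weeks 0-2)
shopper_0 = merged[merged['shopper'] == 0]
total_paid_average_0 = shopper_0.groupby('week')['price'].mean().mean()

# Price point = |price / average - 1|, same as the if/else in the scoring loop
price_points_0 = (merged['price'] / total_paid_average_0 - 1).abs()
print(price_points_0.round(4))
```

Prices close to what the shopper usually pays get a point near 0, and prices far from it (in either direction) get larger points.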
After that, I looked at how the shopper's tendency to buy relates to discounts:
```python
# Share of shopper 0's purchases that were discounted, per week
is_shopper_0 = merged['shopper'] == 0
is_discounted = merged['discount'] != 0
n_rows_0 = is_shopper_0.sum()
discount_exist_0_0 = ((merged['week'] == 0) & is_shopper_0 & is_discounted).sum() / n_rows_0
discount_exist_1_0 = ((merged['week'] == 1) & is_shopper_0 & is_discounted).sum() / n_rows_0
discount_exist_2_0 = ((merged['week'] == 2) & is_shopper_0 & is_discounted).sum() / n_rows_0
# Average over the three weeks
discount_point_0 = (discount_exist_0_0 + discount_exist_1_0 + discount_exist_2_0) / 3
```
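A sketch of the discount point for all shoppers at once, under the same reading as the snippet above (a shopper's discounted purchases in each week divided by all of that shopper's purchases, averaged over the weeks). Summing the per-week shares and dividing by the number of weeks equals the shopper's total discounted count divided by (purchases x weeks), which is what this code computes; the name `discount_point` is illustrative.

```python
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})

n_weeks = merged['week'].nunique()      # 3 observed weeks
is_discounted = merged['discount'] != 0

# discounted purchases per shopper / all purchases per shopper / number of weeks
discount_point = (
    is_discounted.groupby(merged['shopper']).sum()
    / merged.groupby('shopper').size()
    / n_weeks
)
print(discount_point)
```

For shopper 0 this gives 0.2, the value `discount_point_0` takes in the snippet above; shopper 1 gets 1/6 and shopper 2, who never bought at a discount, gets 0.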
Again, I turned this into points. Finally, I combined all the points.
You can find the complete code below.
```python
import numpy as np
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})
weeks = [0, 1, 2]

# Average price paid per shopper: mean of the three weekly mean prices.
# Only shopper 0 is used below; shoppers 1 and 2 skipped a week, so theirs are NaN.
total_paid_average = {
    s: sum(merged.loc[(merged['week'] == w) & (merged['shopper'] == s), 'price'].mean()
           for w in weeks) / 3
    for s in [0, 1, 2]
}
total_paid_average_0 = total_paid_average[0]

# Price of every row relative to shopper 0's average
merged_price_points_0 = merged['price'] / total_paid_average_0

# Discount point: per week, the share of shopper 0's purchases that were discounted,
# averaged over the three weeks
is_shopper_0 = merged['shopper'] == 0
n_rows_0 = is_shopper_0.sum()
discount_exist_0 = [
    ((merged['week'] == w) & is_shopper_0 & (merged['discount'] != 0)).sum() / n_rows_0
    for w in weeks
]
discount_point_0 = sum(discount_exist_0) / 3

points_list = []
total_point = []
for counter in range(len(merged['product'])):
    # Discounted rows earn the discount point, the others earn nothing
    if merged['discount'][counter] != 0:
        points_list.append(discount_point_0)
    else:
        points_list.append(0)
    # Price point: distance of the price ratio from 1
    ratio = merged_price_points_0[counter]
    price_point = ratio - 1 if ratio > 1 else 1 - ratio
    total_point.append(price_point + points_list[counter])

# Normalize so that the scores over all rows sum to 1
sum_of_points = np.sum(total_point)
possibility_of_product_week3_for_0 = np.array(total_point) / sum_of_points

print("Possibility of 3th Week for 0")
for counter in range(len(merged['product'])):
    print(str(merged['product'][counter]) + "||" + str(possibility_of_product_week3_for_0[counter]))
```
Output
```
Possibility of 3th Week for 0
63||0.005959173323190062
80||0.05166730062127548
91||0.18671231139850392
42||0.10112843920375306
77||0.0037403321922150397
55||0.1769494104222138
77||0.07938379612019782
95||0.06479016102447063
77||0.01622923798656016
98||0.12052745023456325
202||0.13097502218841128
225||0.06193736528464565
```
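The loop above can also be written with vectorized pandas operations. The sketch below wraps the same steps in a hypothetical helper `week3_scores(df, shopper)`; for shopper 0 it should reproduce the printed output up to floating-point rounding. Like the original, it scores every row of the data (including other shoppers' rows) and normalizes over all of them. One assumption: the average price is taken over the weeks in which the shopper actually bought something, which only differs from the original for shoppers who skipped a week (where the original yields NaN).

```python
import numpy as np
import pandas as pd

merged = pd.DataFrame({
    'week':     [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    'shopper':  [0, 0, 0, 1, 1, 0, 1, 1, 2, 0, 2, 2],
    'product':  [63, 80, 91, 42, 77, 55, 77, 95, 77, 98, 202, 225],
    'price':    [543, 644, 770, 620, 560, 354, 525, 667, 525, 654, 783, 662],
    'discount': [0, 0, 10, 12, 0, 30, 10, 0, 0, 5, 0, 0],
})


def week3_scores(df: pd.DataFrame, shopper: int) -> pd.Series:
    """Hypothetical helper: the answer's point scheme for one shopper, vectorized."""
    own = df[df['shopper'] == shopper]
    n_weeks = df['week'].nunique()
    # mean of the shopper's weekly mean prices
    avg_price = own.groupby('week')['price'].mean().mean()
    # share of the shopper's purchases that were discounted, averaged over the weeks
    discount_point = (own['discount'] != 0).sum() / len(own) / n_weeks
    # price point |price / average - 1| plus the discount point on discounted rows
    price_points = (df['price'] / avg_price - 1).abs()
    total_point = price_points + np.where(df['discount'] != 0, discount_point, 0.0)
    # normalize so the scores over all rows sum to 1
    return total_point / total_point.sum()


p0 = week3_scores(merged, 0)
for product, p in zip(merged['product'], p0):
    print(f"{product}||{p}")
```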
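I suggest searching for what Chris mentioned in his comment. This is not a reliable answer, but it may give you an idea. The main idea: define the relationships between products and the reasons a shopper buys them, and assign points accordingly.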