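Problem description
I received a dataset describing the ingredients of different candy brands, together with each candy's price (as a percentage), sugar content (as a percentage), and win percentage. The ingredient columns are dummy variables, where 0 means the feature is absent and 1 means it is present. The goal is to choose a statistical method that identifies consumer preferences and predicts how a new product would perform, and I want to implement it in R. My idea was to run a choice-based conjoint analysis. As a first step, I computed the column sums of the variables and looked at the summary. Since the dataset contains both dummy variables and numeric variables, my first question is whether all variables must have the same data type. My second problem is that I have no respondents who stated their choices: I only have the candy ingredient features, the sugar percentage, the price, and the win percentage, which could serve as the choice outcome. What are the next steps for a choice-based conjoint analysis?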
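On the data-type question, here is a minimal sketch rather than an answer from the original post. It assumes a plain linear regression of winpercent on the attributes as a stand-in, since a true choice-based conjoint model would need respondent-level choices that this dataset does not have. The data frame `toy` and the product `new_candy` are hypothetical, with made-up values that only mimic the column types of cbc.df. The point it illustrates is that 0/1 dummies and numeric predictors can be mixed in one R model formula.

```r
# Toy data with the same column types as cbc.df (values are made up for illustration)
set.seed(1)
toy <- data.frame(
  chocolate    = rbinom(40, 1, 0.4),  # 0/1 dummy
  fruity       = rbinom(40, 1, 0.4),  # 0/1 dummy
  sugarpercent = runif(40),           # numeric in [0, 1]
  pricepercent = runif(40)            # numeric in [0, 1]
)
toy$winpercent <- 40 + 15 * toy$chocolate + 5 * toy$sugarpercent + rnorm(40, sd = 5)

# Dummies and numeric predictors sit side by side in one formula
fit <- lm(winpercent ~ chocolate + fruity + sugarpercent + pricepercent, data = toy)
coef(fit)

# Predicted win percentage for a hypothetical new product
new_candy <- data.frame(chocolate = 1, fruity = 0, sugarpercent = 0.5, pricepercent = 0.6)
pred <- predict(fit, newdata = new_candy)
pred
```

In such a model, the coefficient of each dummy can be read as a rough part-worth for that feature, under the assumption that winpercent reflects preference.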
dput(rbind(head(cbc.df,10),tail(cbc.df,10)))
structure(list(competitorname = c("100 Grand","3 Musketeers","One dime","One quarter","Air Heads","Almond Joy","Baby Ruth","Boston Baked Beans","Candy Corn","Caramel Apple Pops","Tootsie Roll Juniors","Tootsie Roll Midgies","Tootsie Roll Snack Bars","Trolli Sour Bites","Twix","Twizzlers","Warheads","WelchÕs Fruit Snacks","WertherÕs Original Caramel","Whoppers"),chocolate = c(1L,1L,0L,1L),fruity = c(0L,0L),caramel = c(1L,peanutyalmondy = c(0L,nougat = c(0L,crispedricewafer = c(1L,hard = c(0L,bar = c(1L,pluribus = c(0L,sugarpercent = c(0.73199999,0.60399997,0.011,0.90600002,0.465,0.31299999,0.17399999,0.546,0.22,0.093000002,0.186,0.87199998),pricepercent = c(0.86000001,0.51099998,0.116,0.76700002,0.32499999,0.255,0.26699999,0.84799999
),winpercent = c("66.971.725","67.602.936","32.261.086","46.116.505","52.341.465","50.347.546","56.914.547","23.417.824","38.010.963","34.517.681","43.068.897","45.736.748","49.653.503","47.173.229","81.642.914","45.466.282","39.011.898","44.375.519","41.904.308","49.524.113")),row.names = c(1L,2L,3L,4L,5L,6L,7L,8L,9L,10L,77L,78L,79L,80L,81L,82L,83L,84L,85L,86L),class = "data.frame")
summary(cbc.df)
competitorname    chocolate         fruity            caramel           peanutyalmondy    nougat            crispedricewafer
Length:86         Min.   :0.0000    Min.   :0.0000    Min.   :0.0000    Min.   :0.0000    Min.   :0.0000    Min.   :0.0000
Class :character  1st Qu.:0.0000    1st Qu.:0.0000    1st Qu.:0.0000    1st Qu.:0.0000    1st Qu.:0.0000    1st Qu.:0.0000
Mode  :character  Median :0.0000    Median :0.0000    Median :0.0000    Median :0.0000    Median :0.0000    Median :0.0000
                  Mean   :0.4302    Mean   :0.4535    Mean   :0.1628    Mean   :0.1628    Mean   :0.0814    Mean   :0.0814
                  3rd Qu.:1.0000    3rd Qu.:1.0000    3rd Qu.:0.0000    3rd Qu.:0.0000    3rd Qu.:0.0000    3rd Qu.:0.0000
                  Max.   :1.0000    Max.   :1.0000    Max.   :1.0000    Max.   :1.0000    Max.   :1.0000    Max.   :1.0000
hard            bar             pluribus        sugarpercent    pricepercent    winpercent
Min.   :0.0000  Min.   :0.0000  Min.   :0.0000  Min.   :0.0110  Min.   :0.0110  Length:86
1st Qu.:0.0000  1st Qu.:0.0000  1st Qu.:0.0000  1st Qu.:0.2200  1st Qu.:0.2580  Class :character
Median :0.0000  Median :0.0000  Median :1.0000  Median :0.4650  Median :0.4650  Mode  :character
Mean   :0.1744  Mean   :0.2442  Mean   :0.5233  Mean   :0.4736  Mean   :0.4672
3rd Qu.:0.0000  3rd Qu.:0.0000  3rd Qu.:1.0000  3rd Qu.:0.7320  3rd Qu.:0.6510
Max.   :1.0000  Max.   :1.0000  Max.   :1.0000  Max.   :0.9880  Max.   :0.9760
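The summary reports winpercent as a character column, and the dput output shows values such as "66.971.725" with two dots. A hedged sketch of one way to convert them follows. It assumes that the first dot is the decimal separator and that the later dots are stray grouping marks, which is plausible for a percentage below 100 but is not confirmed by the post. The helper name `fix_win` is hypothetical.

```r
# First three winpercent strings from the dput output
win_chr <- c("66.971.725", "67.602.936", "32.261.086")

# Keep the first "." as the decimal separator and drop any later ones
fix_win <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)
  vapply(parts, function(p) {
    as.numeric(paste0(p[1], ".", paste(p[-1], collapse = "")))
  }, numeric(1))
}

win_num <- fix_win(win_chr)
win_num
```

On the full data this would be applied as `cbc.df$winpercent <- fix_win(cbc.df$winpercent)`, after which summary() would report quartiles for the column instead of its length and class.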
Column sums
Variable chocolate: 0 = no chocolate, 1 = chocolate
dplyr::count(cbc.df, chocolate)
0 49
1 37
Variable fruity: 0 = not fruity, 1 = fruity
dplyr::count(cbc.df, fruity)
0 47
1 39
Variable caramel: 0 = no caramel, 1 = caramel
dplyr::count(cbc.df, caramel)
0 72
1 14
Variable peanutyalmondy: 0 = no peanuts or almonds, 1 = peanuts or almonds
dplyr::count(cbc.df, peanutyalmondy)
0 72
1 14
Variable nougat: 0 = no nougat, 1 = nougat
dplyr::count(cbc.df, nougat)
0 79
1 7
Variable crispedricewafer: 0 = no crisped rice or wafer, 1 = crisped rice or wafer
dplyr::count(cbc.df, crispedricewafer)
0 79
1 7
Variable hard: 0 = not hard, 1 = hard
dplyr::count(cbc.df, hard)
0 71
1 15
Variable bar: 0 = not a bar, 1 = bar
dplyr::count(cbc.df, bar)
0 65
1 21
Variable pluribus: 0 = not pluribus, 1 = pluribus
dplyr::count(cbc.df, pluribus)
0 41
1 45
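The per-variable counts above could also be produced in one step. The following is a minimal sketch using base R on a small stand-in data frame. `toy` and its values are illustrative only and are not taken from cbc.df.

```r
# Small stand-in for cbc.df; values are illustrative only
toy <- data.frame(
  chocolate = c(1L, 1L, 0L, 0L, 1L),
  fruity    = c(0L, 0L, 1L, 1L, 0L),
  bar       = c(1L, 1L, 0L, 0L, 0L)
)

dummy_cols <- c("chocolate", "fruity", "bar")

# One row per variable: how many 0s and how many 1s.
# factor(..., levels = 0:1) keeps a zero count when a level never occurs.
counts <- t(sapply(toy[dummy_cols], function(x) table(factor(x, levels = 0:1))))
counts
```

For the real data, `dummy_cols` would list the nine dummy columns from chocolate to pluribus and `toy` would be replaced by cbc.df.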
Column sum of sugarpercent
sugar_sum <- sum(cbc.df$sugarpercent)
40.731
Column sum of pricepercent
price_sum <- sum(cbc.df$pricepercent)
40.18
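As a quick consistency check, a column sum should equal the column mean times the number of rows. The sketch below uses only figures already shown in the summary output (86 rows and the two reported means). The variable names are chosen here for illustration.

```r
n          <- 86      # number of rows reported by summary(cbc.df)
mean_sugar <- 0.4736  # mean of sugarpercent from the summary
mean_price <- 0.4672  # mean of pricepercent from the summary

# sum = mean * n; small differences come from the rounded means
sugar_check <- n * mean_sugar
price_check <- n * mean_price
c(sugar = sugar_check, price = price_check)
```

Both products agree with the reported sums of 40.731 and 40.18 up to rounding. On the real data, `colSums(cbc.df[c("sugarpercent", "pricepercent")])` would return both sums in one call.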