问题描述
我有一个表,其中包含前 3 个原因(表 1)和另一个表,其中包含每个变量所属的类别(表 2)。 我正在尝试将类别箱与表 3 中的原因表相匹配。
Table 1: top 3 reasons,basically which 3 variable was the most important for that Register number
RegNo Reason1 Reason2 Reason3
1111 v3 v2 v6
2222 v2 v3 v5
3333 v3 v2 v6
4444 v3 v6 v2
5555 v3 v2 v5
Table 2: The category bin for each variable
RegNo v2 v3 v4 v5 v6
1111 (0.44,0.64] (0.0,60.0] missing (-inf,3.0] (102.0,175.0]
2222 (-inf,0.33] (0.0,60.0] welldone missing missing
3333 (0.44,60.0] rare missing missing
4444 (0.44,60.0] missing missing (20.0,102.0]
5555 (0.64,inf] (0.0,60.0] missing missing (-inf,20.0]
Table 3: For each ID,find category bin label and replace reason1,2 & 3 with the labels
RegNo Reason1 Reason2 Reason3
1111 (0.0,60.0] (0.44,0.64] (102.0,175.0]
2222 (-inf,60.0] missing
3333 (0.0,0.64] missing
4444 (0.0,60.0] (20.0,102.0] (0.44,0.64]
5555 (0.0,60.0] (0.64,inf] missing
解决方法
方法
- 以与
join()
一起使用的方式索引两个数据框 - 然后它是来自 join 输出的每个原因的
pd.concat()
df1 = pd.read_csv(io.StringIO(""" RegNo Reason1 Reason2 Reason3
1111 v3 v2 v6
2222 v2 v3 v5
3333 v3 v2 v6
4444 v3 v6 v2
5555 v3 v2 v5
"""),sep="\s+")
df2 = pd.read_csv(io.StringIO(""" RegNo v2 v3 v4 v5 v6
1111 (0.44,0.64] (0.0,60.0] missing (-inf,3.0] (102.0,175.0]
2222 (-inf,0.33] (0.0,60.0] welldone missing missing
3333 (0.44,60.0] rare missing missing
4444 (0.44,60.0] missing missing (20.0,102.0]
5555 (0.64,inf] (0.0,60.0] missing missing (-inf,20.0]
"""),sep="\s\s+",engine="python")
# index main df
df1 = df1.set_index("RegNo")
# reshape and index reasons
dfm = df2.set_index("RegNo").stack()
df3 = pd.concat([
# add apprpriate column into index,don't want othert columns
(df1.set_index(c,append="True").loc[:,[]]
# now it's a straight forward join
.join(dfm.rename_axis(["RegNo",c]).to_frame())
# cleanup index and rename columns of joined DF
.droplevel(1).rename(columns={0:c}))
for c in ["Reason1","Reason2","Reason3"]],axis=1)
RegNo | 原因1 | 原因2 | 原因3 |
---|---|---|---|
1111 | (0.0,60.0] | (0.44,0.64] | (102.0,175.0] |
2222 | (-inf,0.33] | (0.0,60.0] | 缺少 |
3333 | (0.0,0.64] | 缺少 | |
4444 | (0.0,60.0] | (20.0,102.0] | (0.44,0.64] |
5555 | (0.0,60.0] | (0.64,inf] | 缺少 |