Answer
Edit by @ayhan: use agg instead of apply (it is much faster).
from collections import Counter
df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))
Out:
id
a {'val2': 2, 'val6': 1, 'val7': 1, 'val1': 1}
b {'val9': 1, 'val33': 1, 'val6': 1}
Name: val, dtype: object
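For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and swaps the nested comprehension for itertools.chain.from_iterable, which is my substitution and should give the same counts. The helper name count_values is hypothetical.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data from the question: each cell of "val" holds a list of strings.
df = pd.DataFrame({
    "id": ["a", "b", "a"],
    "val": [["val1", "val2"], ["val33", "val9", "val6"], ["val2", "val6", "val7"]],
})


def count_values(lists):
    """Flatten one group's lists and count each value (hypothetical helper)."""
    return Counter(chain.from_iterable(lists))


# agg keeps each returned Counter as a single object per group.
counts = df.groupby("id")["val"].agg(count_values)

# Convert to plain dicts for display or comparison.
as_dicts = {key: dict(counter) for key, counter in counts.items()}
print(as_dicts)
```

Counter is a dict subclass, so the values can be used like ordinary dicts; the dict() conversion is only there to make printing and comparison tidy.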
Timing for this version:
%timeit df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))
1000 loops, best of 3: 820 µs per loop
Timing for @ayhan's version:
%timeit df.groupby('id')["val"].agg(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )
100 loops, best of 3: 1.91 ms per loop
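%timeit is an IPython magic, so the comparison cannot be pasted into a plain script as is. A rough equivalent with the standard timeit module might look like the sketch below. The function names and the repeat count of 200 are my own arbitrary choices, and absolute numbers will vary with machine and pandas version.

```python
import timeit
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "id": ["a", "b", "a"],
    "val": [["val1", "val2"], ["val33", "val9", "val6"], ["val2", "val6", "val7"]],
})


def with_counter():
    # Counter-based variant from the answer.
    return df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))


def with_value_counts():
    # value_counts-based variant attributed to @ayhan.
    return df.groupby("id")["val"].agg(
        lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict()
    )


# Both variants should agree on the counts before their speed is compared.
same = all(dict(c) == d for c, d in zip(with_counter(), with_value_counts()))

runs = 200  # arbitrary repeat count
t_counter = timeit.timeit(with_counter, number=runs)
t_value_counts = timeit.timeit(with_value_counts, number=runs)

print(f"results match: {same}")
print(f"Counter:      {t_counter / runs * 1e6:.0f} µs per call")
print(f"value_counts: {t_value_counts / runs * 1e6:.0f} µs per call")
```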
Question
Starting with the data from my previous question:
import pandas as pd
df = pd.DataFrame({'id':['a','b','a'],'val':[['val1','val2'],['val33','val9','val6'],['val2','val6','val7']]})
print(df)
  id                  val
0  a         [val1, val2]
1  b  [val33, val9, val6]
2  a   [val2, val6, val7]
How I get the lists into a dict:
pd.Series([a for b in df.val.tolist() for a in b]).value_counts().to_dict()
{'val1': 1, 'val2': 2, 'val33': 1, 'val6': 2, 'val7': 1, 'val9': 1}
How I get the lists by group:
df.groupby('id')["val"].apply(lambda x: [a for b in x.tolist() for a in b])
id
a [val1, val2, val2, val6, val7]
b [val33, val9, val6]
Name: val, dtype: object
My attempt at getting a dict of counts for each group:
df.groupby('id')["val"].apply(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )
Returns:
id
a   val1     1.0
    val2     2.0
    val6     1.0
    val7     1.0
b   val33    1.0
    val6     1.0
    val9     1.0
Name: val, dtype: float64
Expected output (what am I overlooking?):
id
a {'val1': 1, 'val2': 2, 'val6': 1, 'val7': 1}
b {'val33': 1, 'val6': 1, 'val9': 1}
Name: val, dtype: object
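To pin down the per-group counts being asked for, the same grouping can be sketched in plain Python with no pandas involved. The rows list is a hypothetical stand-in for the id and val columns of the frame above.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the DataFrame rows: (id, list of values).
rows = [
    ("a", ["val1", "val2"]),
    ("b", ["val33", "val9", "val6"]),
    ("a", ["val2", "val6", "val7"]),
]

# One Counter per id; update() adds the counts of each list to that group.
per_group = defaultdict(Counter)
for group_id, values in rows:
    per_group[group_id].update(values)

expected = {group_id: dict(counter) for group_id, counter in per_group.items()}
print(expected)
# {'a': {'val1': 1, 'val2': 2, 'val6': 1, 'val7': 1}, 'b': {'val33': 1, 'val9': 1, 'val6': 1}}
```

This is only a reference for checking results, not a replacement for a groupby on a large frame.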