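Problem description
I am working with a DataFrame and need to replace the values in its first column. My natural instinct is to reach for a Python dictionary. However, here is an example of what my data looks like (original_col is what I have, desired_col is what I want):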
original_col desired_col
cat animal
dog animal
bunny animal
cat animal
chair furniture
couch furniture
Bob person
Lisa person
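The dictionary would look something like this:
my_dict = {'animal': ['cat', 'dog', 'bunny'], 'furniture': ['chair', 'couch'], 'person': ['Bob', 'Lisa']}
I can't use the typical my_dict.get(), because I want to retrieve the corresponding KEY rather than the value. Is a dictionary the best data structure here? Any suggestions?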
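Solution
Invert the dictionary:
my_new_dict = {v: k for k, vals in my_dict.items() for v in vals}
Note that this breaks down if a value is listed under more than one key (for example dog -> animal and dog -> person): the inverted dictionary can hold only one entry per value, so the later key overwrites the earlier one.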
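The inversion and the duplicate case from the note can be sketched as follows. The helper name `invert_mapping` is hypothetical (it is not part of pandas or of the original answer), and raising an error on a conflict is just one possible policy:

```python
def invert_mapping(groups):
    """Turn {category: [members]} into {member: category}.

    Hypothetical helper for illustration only. Raises ValueError if a
    member is listed under more than one category, instead of silently
    keeping the last one as a plain dict comprehension would.
    """
    inverted = {}
    for category, members in groups.items():
        for member in members:
            if member in inverted and inverted[member] != category:
                raise ValueError(
                    f"{member!r} maps to both {inverted[member]!r} and {category!r}"
                )
            inverted[member] = category
    return inverted


my_dict = {
    'animal': ['cat', 'dog', 'bunny'],
    'furniture': ['chair', 'couch'],
    'person': ['Bob', 'Lisa'],
}
my_new_dict = invert_mapping(my_dict)
print(my_new_dict['bunny'])  # animal
```

For data without conflicts this gives the same result as the one-line comprehension; the extra check only matters when the grouping is ambiguous.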
DataFrame.replace already accepts a dictionary with a specific structure, so you don't need to reinvent the wheel:
{col_name: {old_value: new_value}}
df.replace({'original_col': {'cat': 'animal', 'dog': 'animal', 'bunny': 'animal', 'chair': 'furniture', 'couch': 'furniture', 'Bob': 'person', 'Lisa': 'person'}})
Alternatively, you can call Series.replace on the column itself, in which case only the inner dictionary is needed.
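A minimal sketch of the Series.replace variant, assuming a frame with a single column named `original_col` as in the question. The variable name `inner` and writing the result to a new `desired_col` column are choices made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'original_col': ['cat', 'dog', 'bunny', 'cat', 'chair', 'couch', 'Bob', 'Lisa']
})

# Only the inner {old_value: new_value} dictionary is needed here,
# because the column has already been selected.
inner = {
    'cat': 'animal', 'dog': 'animal', 'bunny': 'animal',
    'chair': 'furniture', 'couch': 'furniture',
    'Bob': 'person', 'Lisa': 'person',
}

# replace() returns a new Series; the original column is left untouched.
df['desired_col'] = df['original_col'].replace(inner)
print(df)
```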
If I understand correctly, the pandas map()
function performs this kind of lookup using a dictionary or another pandas Series:
import pandas as pd

# original column / data
data = ['cat', 'dog', 'bunny', 'cat', 'chair', 'couch', 'Bob', 'Lisa']
# original dict
my_dict = {
    'animal': ['cat', 'dog', 'bunny'],
    'furniture': ['chair', 'couch'],
    'person': ['Bob', 'Lisa'],
}
# invert the dictionary: each list member becomes a key pointing to its category
new_dict = {v: k
            for k, vs in my_dict.items()
            for v in vs}
# create series and use `map()` to perform dictionary lookup
df = pd.concat([
    pd.Series(data).rename('original_col'),
    pd.Series(data).map(new_dict).rename('desired_col')], axis=1)
print(df)
original_col desired_col
0 cat animal
1 dog animal
2 bunny animal
3 cat animal
4 chair furniture
5 couch furniture
6 Bob person
7 Lisa person
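One practical difference between the two approaches is how they treat values that are missing from the dictionary. The sketch below uses a made-up value, 'lamp', that is not in the lookup, and the 'unknown' fallback label is likewise an assumption for illustration:

```python
import pandas as pd

lookup = {'cat': 'animal', 'chair': 'furniture'}
s = pd.Series(['cat', 'chair', 'lamp'])  # 'lamp' is deliberately unmapped

# map() with a dict yields NaN for any value that is not a key.
mapped = s.map(lookup)

# replace() leaves values that are not keys unchanged.
replaced = s.replace(lookup)

# If NaN is not wanted after map(), a fallback label can be filled in.
filled = s.map(lookup).fillna('unknown')

print(mapped.tolist())
print(replaced.tolist())
print(filled.tolist())
```

Which behavior is preferable depends on whether unmapped values should stand out as missing or pass through as they are.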