Problem description
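When slicing a pivoted DataFrame, I cannot select a column by its name, and the error below is raised. I can't figure out what is happening; it looks as if the available columns are not picked up when the DataFrame is sliced. The dataset is the MovieLens 100K dataset.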
The data is pivoted as follows:
movieRatings = ratings.pivot_table(index=['user_id'], columns=['title'], values=['rating'])
Output of movieRatings.head():
rating
title 'Til There Was You (1997) 1-900 (1994) 101 Dalmatians (1996) 12 Angry Men (1957) 187 (1997) 2 Days in the Valley (1996) 20,000 Leagues Under the Sea (1954) 2001: A Space Odyssey (1968) 3 Ninjas: High Noon At Mega Mountain (1998) 39 Steps, The (1935) ... Yankee Zulu (1994) Year of the Horse (1997) You So Crazy (1994) Young Frankenstein (1974) Young Guns (1988) Young Guns II (1990) Young Poisoner's Handbook, The (1995) Zeus and Roxanne (1997) unknown Á köldum klaka (Cold Fever) (1994)
user_id
1 NaN NaN 2.0 5.0 NaN NaN 3.0 4.0 NaN NaN ... NaN NaN NaN 5.0 3.0 NaN NaN NaN 4.0 NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN 2.0 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 NaN NaN 2.0 NaN NaN NaN NaN 4.0 NaN NaN ... NaN NaN NaN 4.0 NaN NaN NaN NaN 4.0 NaN
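The two header rows in this output ("rating" above "title") suggest that the columns form a two-level MultiIndex. A minimal sketch to confirm that, assuming a small hypothetical `ratings` frame that only shares the column names (`user_id`, `title`, `rating`) with the real MovieLens data; the values are made up:

```python
import pandas as pd

# Hypothetical sample rows; only the column names match the MovieLens ratings frame.
ratings = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "title": ["12 Angry Men (1957)", "12 Angry Men (1957)", "187 (1997)", "187 (1997)"],
    "rating": [5, 4, 2, 3],
})

# Same call as in the question: values is passed as a one-element list.
movieRatings = ratings.pivot_table(index=["user_id"], columns=["title"], values=["rating"])

print(type(movieRatings.columns).__name__)   # MultiIndex
print(movieRatings.columns.nlevels)          # 2
print(list(movieRatings.columns.levels[0]))  # ['rating']
```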
Next statement:
starWarsRatings = movieRatings["12 Angry Men (1957)"]
starWarsRatings.head()
Error message:
KeyError Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: '12 Angry Men (1957)'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-134-9c0f30e1dc98> in <module>
----> 1 starWarsRatings = movieRatings["12 Angry Men (1957)"]
2 starWarsRatings.head()
D:\Anaconda\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2797 if is_single_key:
2798 if self.columns.nlevels > 1:
-> 2799 return self._getitem_multilevel(key)
2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
D:\Anaconda\lib\site-packages\pandas\core\frame.py in _getitem_multilevel(self, key)
2847 def _getitem_multilevel(self, key):
2848 # self.columns is a MultiIndex
-> 2849 loc = self.columns.get_loc(key)
2850 if isinstance(loc, (slice, Series, np.ndarray, Index)):
2851 new_columns = self.columns[loc]
D:\Anaconda\lib\site-packages\pandas\core\indexes\multi.py in get_loc(self, method)
2660 if not isinstance(key, (tuple, list)):
2661 # not including list here breaks some indexing, xref #30892
-> 2662 loc = self._get_level_indexer(key, level=0)
2663 return _maybe_to_slice(loc)
2664
D:\Anaconda\lib\site-packages\pandas\core\indexes\multi.py in _get_level_indexer(self, level, indexer)
2927 else:
2928
-> 2929 code = self._get_loc_single_level_index(level_index, key)
2930
2931 if level > 0 or self.lexsort_depth == 0:
D:\Anaconda\lib\site-packages\pandas\core\indexes\multi.py in _get_loc_single_level_index(self, level_index, key)
2596 return -1
2597 else:
-> 2598 return level_index.get_loc(key)
2599
2600 def get_loc(self, method=None):
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: '12 Angry Men (1957)'
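The traceback shows the lookup path: because `self.columns.nlevels > 1`, `__getitem__` hands the key to `_getitem_multilevel`, which ends up calling `_get_level_indexer(key, level=0)`, i.e. it searches level 0 only. A rough reproduction, assuming a hand-built frame whose column MultiIndex mirrors the pivot (the two titles and all ratings are hypothetical):

```python
import pandas as pd

# Hypothetical frame whose columns mirror the pivot: level 0 = 'rating', level 1 = title.
columns = pd.MultiIndex.from_product(
    [["rating"], ["12 Angry Men (1957)", "187 (1997)"]], names=[None, "title"]
)
movieRatings = pd.DataFrame(
    [[5.0, 3.0], [4.0, 2.0]],
    index=pd.Index([1, 2], name="user_id"),
    columns=columns,
)

# A plain string key is matched against level 0 only, which holds just 'rating'.
try:
    movieRatings["12 Angry Men (1957)"]
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print("KeyError:", exc)

print(movieRatings.columns.get_level_values(0).unique().tolist())  # ['rating']
```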
Solution
Edit: the original answer was not what the OP wanted.
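The problem is the way you created the table. A list passed as the values argument is interpreted differently from a plain string: with a list, pivot_table keeps the value name as an extra top level of the column MultiIndex (the "rating" row above "title" in the head() output). A bare string key such as "12 Angry Men (1957)" is then looked up only on that top level, which is why the traceback goes through _getitem_multilevel and raises KeyError. In this case you only need the string form. If you do use a list, you have to index with both levels, e.g. movieRatings['rating', '12 Angry Men (1957)'].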
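A sketch of indexing the list-based pivot without rebuilding it, using hypothetical sample rows in place of the real data. The tuple key follows the answer; the chained selection and `DataFrame.droplevel` are alternatives added here for comparison, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample rows standing in for the MovieLens ratings frame.
ratings = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "title": ["12 Angry Men (1957)", "12 Angry Men (1957)", "187 (1997)", "187 (1997)"],
    "rating": [5, 4, 2, 3],
})
# Same call as in the question: a list for values keeps 'rating' as column level 0.
movieRatings = ratings.pivot_table(index=["user_id"], columns=["title"], values=["rating"])

# Option 1 (from the answer): give both column levels as one tuple key.
by_tuple = movieRatings[("rating", "12 Angry Men (1957)")]

# Option 2: select the 'rating' block first, then the title inside it.
by_chain = movieRatings["rating"]["12 Angry Men (1957)"]

# Option 3 (alternative): drop level 0 so the columns are just the titles.
by_flat = movieRatings.droplevel(0, axis=1)["12 Angry Men (1957)"]

# All three select the same ratings; user 3 never rated this film, so it is NaN.
print(by_tuple.dropna().tolist())
```

Dropping the level is convenient if you plan to slice many titles; the tuple key is the least invasive if you want to keep the table as it is.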
So just change how the table is declared, passing values as a plain string, and it will work:
movieRatings = ratings.pivot_table(index=['user_id'], columns=['title'], values='rating')
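Applied to the question's setup, the fix would look roughly like this, again with made-up sample rows instead of the real MovieLens file. With a string for values, the columns should come out as a flat index of titles, so the question's original slicing statement runs unchanged:

```python
import pandas as pd

# Hypothetical sample rows; only the column names match the MovieLens ratings frame.
ratings = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "title": ["12 Angry Men (1957)", "12 Angry Men (1957)", "187 (1997)", "187 (1997)"],
    "rating": [5, 4, 2, 3],
})

# values is a plain string, so no extra 'rating' level is added to the columns.
movieRatings = ratings.pivot_table(index=["user_id"], columns=["title"], values="rating")
print(movieRatings.columns.nlevels)  # 1

# The statement from the question now finds the title directly.
starWarsRatings = movieRatings["12 Angry Men (1957)"]
print(starWarsRatings.head())
```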