Problem description
I have a DataFrame (cgf) like the one shown below, and I want to remove outliers from the numeric columns only:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Product        180 non-null    object
 1   Age            180 non-null    int64
 2   Gender         180 non-null    object
 3   Education      180 non-null    category
 4   MaritalStatus  180 non-null    object
 5   Usage          180 non-null    int64
 6   fitness        180 non-null    category
 7   Income         180 non-null    int64
 8   Miles          180 non-null    int64
dtypes: category(2), int64(4), object(3)
I tried several scripts using the z-score and IQR methods, but none of them worked. For example, here is a z-score script that fails:
from scipy import stats
import numpy as np
z = np.abs(stats.zscore(cgf)) # get the z-score of every value with respect to their columns
print(z)
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-102-2759aa3fbd60> in <module>
----> 1 z = np.abs(stats.zscore(cgf)) # get the z-score of every value with respect to their columns
2 print(z)
~\anaconda3\lib\site-packages\scipy\stats\stats.py in zscore(a,axis,ddof,nan_policy)
2495 sstd = np.nanstd(a=a,axis=axis,ddof=ddof,keepdims=True)
2496 else:
-> 2497 mns = a.mean(axis=axis,keepdims=True)
2498 sstd = a.std(axis=axis,keepdims=True)
2499
~\anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a,dtype,out,keepdims)
160 ret = umr_sum(arr,keepdims)
161 if isinstance(ret,mu.ndarray):
--> 162 ret = um.true_divide(
163 ret,rcount,out=ret,casting='unsafe',subok=False)
164 if is_float16_result and out is None:
TypeError: unsupported operand type(s) for /: 'str' and 'int'
The IQR attempt fails as well:
np.where((cgf < (Q1 - 1.5 * iqr)) | (cgf > (Q3 + 1.5 * iqr)))
Error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-96-bb3dfd2ce6c5> in <module>
----> 1 np.where((cgf < (Q1 - 1.5 * iqr)) | (cgf > (Q3 + 1.5 * iqr)))
~\anaconda3\lib\site-packages\pandas\core\ops\__init__.py in f(self,other)
702
703 # See GH#4537 for discussion of scalar op behavior
--> 704 new_data = dispatch_to_series(self,other,op,axis=axis)
705 return self._construct_result(new_data)
706
~\anaconda3\lib\site-packages\pandas\core\ops\__init__.py in dispatch_to_series(left,right,func,axis)
273 # _frame_arith_method_with_reindex
274
--> 275 bm = left._mgr.operate_blockwise(right._mgr,array_op)
276 return type(left)(bm)
277
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in operate_blockwise(self,array_op)
362 Apply array_op blockwise with another (aligned) BlockManager.
363 """
--> 364 return operate_blockwise(self,array_op)
365
366 def apply(self: T,f,align_keys=None,**kwargs) -> T:
~\anaconda3\lib\site-packages\pandas\core\internals\ops.py in operate_blockwise(left,array_op)
36 lvals,rvals = _get_same_shape_values(blk,rblk,left_ea,right_ea)
37
---> 38 res_values = array_op(lvals,rvals)
39 if left_ea and not right_ea and hasattr(res_values,"reshape"):
40 res_values = res_values.reshape(1,-1)
~\anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in comparison_op(left,op)
228 if should_extension_dispatch(lvalues,rvalues):
229 # Call the method on lvalues
--> 230 res_values = op(lvalues,rvalues)
231
232 elif is_scalar(rvalues) and isna(rvalues):
~\anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self,other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self,other)
66
67 return new_method
~\anaconda3\lib\site-packages\pandas\core\arrays\categorical.py in func(self,other)
74 if not self.ordered:
75 if opname in ["__lt__","__gt__","__le__","__ge__"]:
---> 76 raise TypeError(
77 "Unordered Categoricals can only compare equality or not"
78 )
TypeError: Unordered Categoricals can only compare equality or not
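How can I resolve these errors? It looks like the mix of categorical and numeric data in my df is causing the problem, but I'm new to this and don't know how to fix it so that I can remove the outliers.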
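Both tracebacks are consistent with that suspicion: the first fails while averaging a string column, the second while comparing an unordered categorical column. One possible way around it, offered as a sketch rather than a definitive fix, is to compute the statistics on the numeric columns only and then use the resulting boolean mask to filter the full frame. The small `cgf` built here is a hypothetical stand-in for the real data, and the cutoffs (|z| < 3 and 1.5 × IQR) are common conventions, not values taken from the post.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for cgf: object, category and int64 columns,
# with one obviously extreme Income value in the last row.
cgf = pd.DataFrame({
    "Product": ["P1", "P2"] * 10,                        # object dtype
    "Education": pd.Categorical([14, 16] * 10),          # category dtype
    "Age": list(range(20, 40)),                          # int64
    "Income": [40000 + 1000 * i for i in range(19)] + [500000],  # int64
})

# Pick the numeric columns; object and category columns are left out.
num_cols = cgf.select_dtypes(include="number").columns
numeric = cgf[num_cols]

# z-score approach: keep rows whose numeric values all have |z| < 3.
z = np.abs(stats.zscore(numeric.to_numpy(dtype=float)))
z_clean = cgf[(z < 3).all(axis=1)]

# IQR approach: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] per column,
# then drop any row that has at least one flagged value.
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1
is_outlier = (numeric < (q1 - 1.5 * iqr)) | (numeric > (q3 + 1.5 * iqr))
iqr_clean = cgf[~is_outlier.any(axis=1)]

print(len(cgf), len(z_clean), len(iqr_clean))
```

The non-numeric columns are never passed to `stats.zscore` or to the `<` / `>` comparisons, so neither `TypeError` should be triggered, while the filtered frames still keep all nine (here: four) original columns. Whether a row should be dropped when any single column is extreme, as done here, or only for specific columns, is a judgment call that depends on the data.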