Problem description
I have a pandas DataFrame raw_data with two columns, "T" and "BP":
              T       BP
0        -0.500  115.790
1        -0.499  115.441
2        -0.498  115.441
3        -0.497  115.441
4        -0.496  115.790
...         ...      ...
647163  646.663  105.675
647164  646.664  105.327
647165  646.665  105.327
647166  646.666  105.327
647167  646.667  104.978

[647168 rows x 2 columns]
I want to apply the Hodges-Lehmann mean (a robust average) over a rolling window and store the result in a new column. The function is as follows:
def hodgesLehmannMean(x):
    m = np.add.outer(x,x)
    ind = np.tril_indices(len(x),0)
    return 0.5 * np.median(m[ind])
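To make concrete what this function computes, here is a minimal sketch on a small made-up array. The name hl_mean_demo and the sample values are illustrative assumptions and do not come from the original post. The function takes the median of the pairwise averages (x[i] + x[j]) / 2 over all pairs with i <= j, which is why a single outlier affects it less than it affects the ordinary mean.

```python
import numpy as np

def hl_mean_demo(x):
    """Median of the pairwise averages (x[i] + x[j]) / 2 over all i <= j."""
    x = np.asarray(x, dtype=float)          # make sure we work on a plain array
    pair_sums = np.add.outer(x, x)          # pair_sums[i, j] = x[i] + x[j]
    rows, cols = np.tril_indices(len(x))    # lower triangle, diagonal included
    return 0.5 * np.median(pair_sums[rows, cols])

sample = [1.0, 2.0, 3.0, 10.0]              # hypothetical values with one outlier
hl = hl_mean_demo(sample)
mean = float(np.mean(sample))
print(hl, mean)                             # prints 2.75 4.0
```

Here the ten pairwise sums are 2, 3, 4, 4, 5, 6, 11, 12, 13, 20, their median is 5.5, and half of that is 2.75, whereas the ordinary mean is pulled up to 4.0 by the outlier.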
So I wrote:
raw_data[new_col] = raw_data['BP'].rolling(21,min_periods=1,center=True,win_type=None,axis=0,closed=None).agg(hodgesLehmannMean)
But I get a long string of error messages:
Traceback (most recent call last):
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code,main_globals,None,
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code,run_globals)
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target,run_name=compat.force_str("__main__"))
  File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code,init_globals,run_name,line 97,in _run_module_code
    _run_code(code,mod_globals,run_globals)
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 227, in <module>
    main()
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 75, in main
    raw_data[new_col] = raw_data['BP'].rolling(FILTER_WINDOW,
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1961, in aggregate
    return super().aggregate(func,*args,**kwargs)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 523, in aggregate
    return self.apply(func,raw=False,args=args,kwargs=kwargs)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1987, in apply
    return super().apply(
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1300, in apply
    return self._apply(
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 507, in _apply
    result = calc(values)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 495, in calc
    return func(x,start,end,min_periods)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1326, in apply_func
    return window_func(values,begin,min_periods)
  File "pandas\_libs\window\aggregations.pyx", line 1375, in pandas._libs.window.aggregations.roll_generic_fixed
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 222, in hodgesLehmannMean
    m = np.add.outer(x,x)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py", line 705, in __array_ufunc__
    return construct_return(result)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py", line 694, in construct_return
    raise NotImplementedError
NotImplementedError
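The error seems to be triggered by the line
m = np.add.outer(x,x)
and suggests that something is either not implemented or that numpy is missing. But I import numpy right at the start, as follows: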
import numpy as np
import pandas as pd
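The function runs fine on its own if I feed it a list or a numpy array, so I am not sure where the problem lies. Interestingly, if I use the median instead of the Hodges-Lehmann mean, it works like a charm: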
raw_data[new_col] = raw_data['BP'].rolling(21,closed=None).median()
What is causing the problem, and how do I fix it?
Sincerely,
Thomas Philips
Solution
I tried your code on a smaller DataFrame and it worked fine, so perhaps something in your DataFrame needs to be cleaned or converted.
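Solved. It turns out that
m = np.add.outer(x,x)
requires x to be an array. When I tested it with lists, numpy arrays, and so on, it worked perfectly, just as it did for you. But .rolling(...).agg hands each window to the function as a pandas Series (the traceback shows apply being called with raw=False and the error being raised from Series.__array_ufunc__), which is not the same as an array, so the function fails with a confusing error message. I modified the function to create a numpy array from its input, and now it works.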
def hodgesLehmannMean(x):
    # Each rolling window arrives as a pandas Series; convert it to a numpy array first
    x_array = np.array(x)
    m = np.add.outer(x_array,x_array)
    ind = np.tril_indices(len(x_array),0)
    return 0.5 * np.median(m[ind])
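A possible alternative, which is not part of the original answer: Rolling.apply accepts a raw=True argument, in which case pandas passes each window to the function as a numpy array rather than a Series, so no conversion inside the function should be needed. The sketch below assumes a small made-up DataFrame, a window of 3, and the illustrative names hodges_lehmann_raw, demo and BP_HL. It has not been run against the 647168-row data from the question.

```python
import numpy as np
import pandas as pd

def hodges_lehmann_raw(x):
    # With raw=True, x is already a numpy array, so np.add.outer works directly
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), 0)
    return 0.5 * np.median(m[ind])

# Hypothetical sample values, loosely modelled on the BP column in the question
values = [115.790, 115.441, 115.441, 115.441, 115.790, 105.675, 105.327]
demo = pd.DataFrame({"BP": values})

# raw=True asks pandas to pass each window as an ndarray instead of a Series
demo["BP_HL"] = (
    demo["BP"]
    .rolling(3, min_periods=1, center=True)
    .apply(hodges_lehmann_raw, raw=True)
)
print(demo)
```

Passing arrays also avoids building a Series object for every window, which may matter on a column of this size, although that has not been measured here.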
Thanks for your attention!