Trouble using pandas df.rolling with my own function

Problem description

I have a pandas DataFrame raw_data with two columns, "T" and "BP":

              T       BP
0        -0.500  115.790
1        -0.499  115.441
2        -0.498  115.441
3        -0.497  115.441
4        -0.496  115.790
...         ...      ...
647163  646.663  105.675
647164  646.664  105.327
647165  646.665  105.327
647166  646.666  105.327
647167  646.667  104.978

[647168 rows x 2 columns]

I want to apply the Hodges-Lehmann mean (a robust average) over a rolling window and store the result in a new column. The function is as follows:

def hodgesLehmannMean(x): 
    m = np.add.outer(x,x)
    ind = np.tril_indices(len(x),0)
    return 0.5 * np.median(m[ind])
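
To make the definition concrete, here is a small worked sketch of the same computation on a made-up three-point sample (the values and the variable names `pair_sums` and `hl` are illustrative only). It forms every pairwise sum with i >= j, takes the median, and halves it:

```python
import numpy as np

x = np.array([1.0, 2.0, 10.0])          # made-up sample with one outlier

sums = np.add.outer(x, x)               # 3x3 matrix with sums[i, j] = x[i] + x[j]
rows, cols = np.tril_indices(len(x))    # index pairs with i >= j (diagonal included)
pair_sums = sums[rows, cols]            # [2, 3, 4, 11, 12, 20]

hl = 0.5 * np.median(pair_sums)         # 0.5 * 7.5 = 3.75
print(hl)
```

For this sample the ordinary mean is about 4.33 and the median is 2, so the Hodges-Lehmann value of 3.75 sits between the two.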

So I wrote:

raw_data[new_col] = raw_data['BP'].rolling(21,min_periods=1,center=True,win_type=None,axis=0,closed=None).agg(hodgesLehmannMean)

But I get a long string of error messages:

Traceback (most recent call last):
  File "C:\Users\tkpme\miniconda3\lib\runpy.py",line 194,in _run_module_as_main
    return _run_code(code,main_globals,None,
  File "C:\Users\tkpme\miniconda3\lib\runpy.py",line 87,in _run_code
    exec(code,run_globals)
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy\__main__.py",line 45,in <module>
    cli.main()
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py",line 430,in main
    run()
  File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py",line 267,in run_file
    runpy.run_path(options.target,run_name=compat.force_str("__main__"))
  File "C:\Users\tkpme\miniconda3\lib\runpy.py",line 265,in run_path
    return _run_module_code(code,init_globals,run_name,line 97,in _run_module_code
    _run_code(code,mod_globals,run_globals)
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py",line 227,in <module>
    main()
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py",line 75,in main
    raw_data[new_col] = raw_data['BP'].rolling(FILTER_WINDOW,
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 1961,in aggregate
    return super().aggregate(func,*args,**kwargs)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 523,in aggregate
    return self.apply(func,raw=False,args=args,kwargs=kwargs)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 1987,in apply
    return super().apply(
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 1300,in apply
    return self._apply(
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 507,in _apply
    result = calc(values)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 495,in calc
    return func(x,start,end,min_periods)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py",line 1326,in apply_func
    return window_func(values,begin,min_periods)
  File "pandas\_libs\window\aggregations.pyx",line 1375,in pandas._libs.window.aggregations.roll_generic_fixed
  File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py",line 222,in hodgesLehmannMean
    m = np.add.outer(x,x)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py",line 705,in __array_ufunc__
    return construct_return(result)
  File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py",line 694,in construct_return
    raise NotImplementedError
NotImplementedError

The error seems to be triggered by the line

m = np.add.outer(x,x)

and suggests that something in numpy is either not implemented or missing. But I import numpy at the very start, as follows:

import numpy  as np
import pandas as pd 

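If I feed it a list or a numpy array, the function runs fine on its own, so I'm not sure where the problem lies. Interestingly, if I use the median instead of the Hodges-Lehmann mean, it works like a charm: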

raw_data[new_col] = raw_data['BP'].rolling(21,closed=None).median()

What is causing the problem, and how can I fix it?

Sincerely,

Thomas Philips

Solution

I tried your code on a smaller DataFrame and it worked fine, so perhaps something in your DataFrame needs to be cleaned or converted.

,

Solved. It turns out that

 m = np.add.outer(x,x)

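requires x to be an array. When I tested it with lists, numpy arrays, and so on, it worked perfectly, just as it did for you. But .rolling passes each window to the function as a slice of the data frame (a pandas Series, as the series.py frames in the traceback show), which is not the same as an array, so the function fails with a confusing error message. I modified the function to create a numpy array from its input, and now it works.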

def hodgesLehmannMean(x):
    x_array = np.array(x)
    m = np.add.outer(x_array,x_array)
    ind = np.tril_indices(len(x_array),0)
    return 0.5 * np.median(m[ind])
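
An alternative sketch, not from the original thread: the traceback shows `.agg` forwarding to `self.apply(func, raw=False, ...)`, so each window arrives as a Series. Calling `.rolling(...).apply(..., raw=True)` directly asks pandas to pass a NumPy array instead. The name `hodges_lehmann_mean`, the short `bp` series (a few BP values copied from the question), and the window of 3 are illustrative, and the edge-window behaviour shown assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd


def hodges_lehmann_mean(x):
    """Half the median of all pairwise sums x[i] + x[j] with i >= j."""
    x = np.asarray(x, dtype=float)         # accept lists, Series, or ndarrays
    sums = np.add.outer(x, x)              # sums[i, j] = x[i] + x[j]
    rows, cols = np.tril_indices(len(x))   # lower triangle, diagonal included
    return 0.5 * np.median(sums[rows, cols])


# A handful of BP values from the question, standing in for raw_data['BP']
bp = pd.Series([115.790, 115.441, 115.441, 115.441, 115.790, 105.675, 105.327])

# raw=True hands each window to the function as a NumPy array, not a Series
smoothed = bp.rolling(3, min_periods=1, center=True).apply(hodges_lehmann_mean, raw=True)
print(smoothed)
```

Keeping the `np.asarray` call inside the function means it still works if it is called with `raw=False` or through `.agg`, so the two fixes can be combined.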

Thanks for your attention!