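Error when calculating the entropy of a pandas Series

Problem description

I am trying to calculate the entropy of a pandas Series. Specifically, I group the strings in Direction into consecutive runs, using this expression: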

diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()

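This assigns a label to each run of identical Direction strings, incrementing whenever the string changes. For each run, I want to calculate the entropy of X and Y.

The run labels produced by this code: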

0    1
1    1
2    1
3    1
4    1
5    2
6    2
7    2
8    3
9    3
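
To make the labelling step concrete, here is a minimal sketch on a hypothetical Direction column built to match the run pattern shown above (the names `changed` and `run_id` are only for illustration):

```python
import pandas as pd

# Hypothetical sample: five 'Left', three 'Right', two 'Left'.
direction = pd.Series(['Left'] * 5 + ['Right'] * 3 + ['Left'] * 2)

# True wherever a value differs from the previous row.
# Row 0 is compared against NaN, so it is always True.
changed = direction.ne(direction.shift())

# The running total of the change flags gives each consecutive run its own id.
run_id = changed.cumsum()

print(run_id.tolist())
```

Note that the second block of 'Left' gets a new id (3) rather than sharing id 1, because the labels follow runs, not distinct values.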

The code used to work but now returns an error. I am not sure whether this started after an upgrade.

import pandas as pd
import numpy as np

def ApEn(U,m = 2,r = 0.2):

    '''
    Approximate Entropy 

    Quantify the amount of regularity over time-series data.

    Input parameters:
    
    U = Time series
    m = Length of compared run of data (subseries length)
    r = Filtering level (tolerance). A positive number

    '''

    def _maxdist(x_i,x_j):
        return max([abs(ua - va) for ua,va in zip(x_i,x_j)])

    def _phi(m):
        x = [U.tolist()[i:i + m] for i in range(N - m + 1)] 
        C = [len([1 for x_j in x if _maxdist(x_i,x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0)**(-1) * sum(np.log(C))

    N = len(U)

    return abs(_phi(m + 1) - _phi(m))

def Entropy(df):

    '''
    Calculate entropy for individual direction
    '''

    df = df[['Time','Direction','X','Y']]
                                    
    diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()

    # Calculate ApEn grouped by direction. 
    df['ApEn_X'] = df.groupby(diff_dir)['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby(diff_dir)['Y'].transform(ApEn)                 

    return df


df = pd.DataFrame(np.random.randint(0,50,size = (10,2)),columns=list('XY'))
df['Time'] = range(1,len(df) + 1)

direction = ['Left','Left','Left','Left','Left','Right','Right','Right','Left','Left']
df['Direction'] = direction


# Calculate defensive regularity
entropy = Entropy(df)

Error

return (N - m + 1.0)**(-1) * sum(np.log(C))
ZeroDivisionError: 0.0 cannot be raised to a negative power

Solution

The problem is in the following code:

(N - m + 1.0)**(-1)

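Consider the case N == 1. Since N = len(U), this happens when a group produced by groupby has size 1. With m == 2, the expression becomes

(1 - 2 + 1)**-1 == 0**-1

and 0**-1 is undefined, hence the error. The same thing happens for a group of size 2 when _phi(m + 1) is evaluated, because 2 - 3 + 1 is also 0.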
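
The failure can be reproduced in isolation. This is only a sketch of the arithmetic, not of ApEn itself, and the variable names `base` and `raised` are mine:

```python
# Group size and window length from the case described above.
N, m = 1, 2

# Adding 1.0 makes this a plain Python float, here exactly 0.0.
base = N - m + 1.0

raised = False
try:
    base ** (-1)
except ZeroDivisionError as exc:
    # Python floats refuse 0.0 ** -1 instead of returning inf.
    raised = True
    print(type(exc).__name__, exc)
```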

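Now, from a theoretical standpoint, how would you define the approximate entropy of a time series with only one value? One could argue it is highly unpredictable, so its entropy should be as high as possible. For this case, let's set it to np.nan to indicate that it is undefined (entropy is always greater than or equal to 0).

Code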

import pandas as pd
import numpy as np

def ApEn(U,m = 2,r = 0.2):

    '''
    Approximate Entropy 

    Quantify the amount of regularity over time-series data.

    Input parameters:
    
    U = Time series
    m = Length of compared run of data (subseries length)
    r = Filtering level (tolerance). A positive number

    '''

    def _maxdist(x_i,x_j):
        return max([abs(ua - va) for ua,va in zip(x_i,x_j)])

    def _phi(m):
        x = [U.tolist()[i:i + m] for i in range(N - m + 1)] 
        C = [len([1 for x_j in x if _maxdist(x_i,x_j) <= r]) / (N - m + 1.0) for x_i in x]
        if (N - m + 1) == 0:
            return np.nan
        return (N - m + 1)**(-1) * sum(np.log(C))

    N = len(U)

    return abs(_phi(m + 1) - _phi(m))

def Entropy(df):

    '''
    Calculate entropy for individual direction
    '''

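    # Work on a copy so the column assignments below do not touch a slice of the caller's frame.
    df = df[['Time','Direction','X','Y']].copy()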
                                    
    diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()

    # Calculate ApEn grouped by direction. 
    df['ApEn_X'] = df.groupby(diff_dir)['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby(diff_dir)['Y'].transform(ApEn)

    return df

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,50,size = (10,2)),columns=list('XY'))
df['Time'] = range(1,len(df) + 1)

direction = ['Left','Left','Left','Left','Left','Right','Right','Right','Left','Left']
df['Direction'] = direction

# Calculate defensive regularity
print (Entropy(df))

Output:

   Time Direction   X   Y    ApEn_X    ApEn_Y
0     1      Left   6  16  0.287682  0.287682
1     2      Left  22   6  0.287682  0.287682
2     3      Left  16   5  0.287682  0.287682
3     4      Left   5  48  0.287682  0.287682
4     5      Left  11  21  0.287682  0.287682
5     6     Right  44  25  0.693147  0.693147
6     7     Right  14  12  0.693147  0.693147
7     8     Right  43  40  0.693147  0.693147
8     9      Left  46  44       NaN       NaN
9    10      Left  49   2       NaN       NaN
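
The NaN rows belong to the run that has only two rows. As a rough sketch, assuming the same run pattern as the sample (the names `run_sizes` and `too_short` are mine), the runs that are too short for ApEn can be listed before calling transform:

```python
import pandas as pd

# Hypothetical Direction column with the same run pattern as the sample.
direction = pd.Series(['Left'] * 5 + ['Right'] * 3 + ['Left'] * 2)
run_id = direction.ne(direction.shift()).cumsum()

m = 2  # default window length of ApEn

# Number of rows in each consecutive run.
run_sizes = run_id.groupby(run_id).size()

# ApEn also evaluates windows of length m + 1, so a run needs at least
# m + 1 rows for N - (m + 1) + 1 to be positive.
too_short = run_sizes[run_sizes <= m]

print(run_sizes.tolist())
print(too_short.index.tolist())
```

Whether such runs should be dropped, merged, or reported as NaN depends on what the entropy is used for.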

A larger sample (which triggers the 0**-1 problem)

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,50,size = (100,2)),columns=list('XY'))
df['Time'] = range(1,len(df) + 1)
direction = ['Left','Right','Up','Down']
df['Direction'] = np.random.choice((direction),len(df))
print (Entropy(df))

Output:

    Time Direction   X   Y  ApEn_X  ApEn_Y
0      1      Left  44  47     NaN     NaN
1      2      Left   0   3     NaN     NaN
2      3      Down   3  39     NaN     NaN
3      4     Right   9  19     NaN     NaN
4      5        Up  21  36     NaN     NaN
..   ...       ...  ..  ..     ...     ...
95    96        Up  19  33     NaN     NaN
96    97      Left  40  32     NaN     NaN
97    98        Up  36   6     NaN     NaN
98    99      Left  21  31     NaN     NaN
99   100     Right  13   7     NaN     NaN

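It seems that when ApEn._phi() is called, certain values of N and m make N - m + 1 equal to 0. That value then has to be raised to the power -1, which is undefined (see also "Why does zero raised to the power of negative one equal infinity?").

To illustrate, I tried to replicate your scenario specifically. In the first iteration of the transform operation, the following happens: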

U is: 1     0
      2    48

(the first group has 2 elements)

N is: 2
m is: 3

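So when execution reaches the return statement of _phi(), you are effectively computing (N - m + 1.0)**-1 = (2 - 3 + 1)**-1 = 0**-1, which is undefined. Perhaps the key point is that you say you want to group by individual direction and pass the U array into the approximate entropy function, but you are actually grouping by diff_X and diff_Y, which produces very small groups because of the way that method works. As far as I understand, if you want the approximate entropy for each direction, you only need to group by 'Direction'.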

def Entropy(df):

    '''
    Calculate entropy for individual direction
    '''           

    # Calculate ApEn grouped by direction. 
    df['ApEn_X'] = df.groupby('Direction')['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby('Direction')['Y'].transform(ApEn)                 

    return df

This results in a dataframe like this:

entropy.head()

    Time    Direction   X   Y   ApEn_X      ApEn_Y
0   1       Left        28  47  0.035091    0.035091
1   2       Up          8   47  0.013493    0.046520
2   3       Up          0   32  0.013493    0.046520
3   4       Right       34  8   0.044452    0.044452
4   5       Right       49  27  0.044452    0.044452
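
To see how the two groupings differ, here is a rough sketch on made-up data. The seed, the four labels, and all variable names are assumptions for illustration, not taken from the question:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for repeatability
direction = pd.Series(rng.choice(['Left', 'Right', 'Up', 'Down'], size=100))

# Grouping by consecutive runs: many small groups.
run_id = direction.ne(direction.shift()).cumsum()
run_sizes = run_id.groupby(run_id).size()

# Grouping by the label itself: one large group per distinct direction.
label_sizes = direction.groupby(direction).size()

print('runs:', len(run_sizes), 'smallest run:', run_sizes.min())
print('labels:', len(label_sizes), 'smallest label group:', label_sizes.min())
```

Keep in mind that grouping by 'Direction' pools every row with that label, including rows that are not adjacent in time, so ApEn is then computed over a concatenation of separate runs. Whether that is what you want depends on the question you are asking of the data.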

You have to handle the ZeroDivisionError.


After that, you will run into a length mismatch in the groupby: df and diff_X must have the same length.
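
One possible way to handle the ZeroDivisionError and avoid the length mismatch is sketched below. This is not the code from the original answer: `safe_transform` and `inverse_window_count` are hypothetical helpers, and the latter is only a stand-in for the failing expression in _phi, not a real ApEn implementation.

```python
import numpy as np
import pandas as pd

def safe_transform(func):
    """Wrap func so a ZeroDivisionError yields NaN instead of aborting."""
    def wrapper(values):
        try:
            return func(values)
        except ZeroDivisionError:
            return np.nan
    return wrapper

def inverse_window_count(values, m=3):
    # Stand-in for (N - m + 1.0)**(-1) from _phi; fails when N == m - 1.
    return (len(values) - m + 1.0) ** (-1)

df = pd.DataFrame({'Direction': ['Left'] * 5 + ['Right'] * 3 + ['Left'] * 2,
                   'X': range(10)})

# Build the grouping key from df itself so it has the same length and index.
run_id = df['Direction'].ne(df['Direction'].shift()).cumsum()

result = df.groupby(run_id)['X'].transform(safe_transform(inverse_window_count))
print(result.tolist())
```

Catching the exception keeps the transform running, but it does not make the value meaningful for short runs; NaN simply marks them as undefined.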