Groupby shift (lagged values) analogue with NumPy only, no pandas

Problem description

I have a dataframe that looks like this:

          id    date       v1
0          0  1983.0    1.574
1          0  1984.0    1.806
2          0  1985.0    4.724
3          1  1986.0    0.320
4          1  1987.0    3.414
     ...     ...      ...
107191  9874  1993.0   52.448
107192  9874  1994.0  108.652
107193  9875  1992.0    1.597
107194  9875  1993.0    3.134
107195  9875  1994.0    7.619

I want to generate a new column containing the lagged values of v1, grouped by id. In pandas I would use

df.groupby('id')['v1'].shift(-1)

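However, I want to convert this to pure matrix/array form using only NumPy. What is the most straightforward way to get the analogue in NumPy? I need to avoid pandas tools because I want to use Numba @jit later.

Solution

IIUC, you want to implement df.groupby('id')['v1'].shift(-1) entirely in NumPy. This breaks down into two parts: a grouper and a shift.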

For a 2D array whose first column is the grouping column and whose second column holds the values, the NumPy equivalent of groupby() is -

np.split(arr[:,1],np.unique(arr[:,0],return_index=True)[1][1:])
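To make the split points concrete, here is a minimal sketch on a hypothetical toy array (the values are made up for illustration). It assumes the rows are already sorted by id, as they are in the question's data; np.unique returns the index of the first occurrence of each id, and all but the first of those indices serve as split points.

```python
import numpy as np

# Hypothetical toy array: column 0 is the id, column 1 is the value.
arr = np.array([[0, 10], [0, 11], [1, 20], [2, 30], [2, 31], [2, 32]])

# Index of the first row of each id (assumes rows are sorted by id).
first_idx = np.unique(arr[:, 0], return_index=True)[1]

# Split the value column at the start of every group except the first.
groups = np.split(arr[:, 1], first_idx[1:])

group_lists = [g.tolist() for g in groups]
print(group_lists)  # [[10, 11], [20], [30, 31, 32]]
```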

In NumPy, the equivalent of shift(-1) for a 1D array is -

np.append(np.roll(arr,-1)[:-1],np.nan)
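As a quick illustration of that expression, the sketch below wraps it in a helper (the name shift_back_one is made up for this example). np.roll moves every element one slot to the left, the wrapped-around last slot is dropped, and NaN is appended in its place, which also promotes an integer array to float.

```python
import numpy as np

def shift_back_one(values):
    """Hypothetical helper: analogue of pandas shift(-1) for a 1D array."""
    # Roll left by one, drop the wrapped-around last element, pad with NaN.
    return np.append(np.roll(values, -1)[:-1], np.nan)

shifted = shift_back_one(np.array([1, 2, 3]))
print(shifted)  # values: 2.0, 3.0, nan (dtype float64)
```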

Putting the two together gives you what you want -

import numpy as np

#2D array with only id and v1 as columns
arr = df[['id','v1']].values

#Groupby based on id: split v1 at the first row of each id
grouper = np.split(arr[:,1],np.unique(arr[:,0],return_index=True)[1][1:])

#apply shift to grouped elements
shift = [np.append(np.roll(i,-1)[:-1],np.nan) for i in grouper]

#stack them as a single array
new_col = np.hstack(shift)

#set as column
df['shifted'] = new_col

Testing on dummy data -

import numpy as np
import pandas as pd

#Dummy data (group sizes chosen to match the output shown below)
idx = [0]*5 + [1]*6 + [2]*4 + [3]*5
val = np.arange(len(idx))
arr = np.array([idx,val]).T
df = pd.DataFrame(arr,columns=['id','v1'])

#apply grouped shifting
arr = df[['id','v1']].values
grouper = np.split(arr[:,1],np.unique(arr[:,0],return_index=True)[1][1:])
shift = [np.append(np.roll(i,-1)[:-1],np.nan) for i in grouper]
new_col = np.hstack(shift)
df['shifted'] = new_col

print(df)

    id  v1  shifted
0    0   0      1.0
1    0   1      2.0
2    0   2      3.0
3    0   3      4.0
4    0   4      NaN
5    1   5      6.0
6    1   6      7.0
7    1   7      8.0
8    1   8      9.0
9    1   9     10.0
10   1  10      NaN
11   2  11     12.0
12   2  12     13.0
13   2  13     14.0
14   2  14      NaN
15   3  15     16.0
16   3  16     17.0
17   3  17     18.0
18   3  18     19.0
19   3  19      NaN
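Since the question mentions using Numba @jit later, a loop-free variant may be worth considering. The sketch below is an alternative to the split-and-roll approach above, not part of the original answer: it shifts the whole value array once and then writes NaN into the last row of every group. The function name grouped_shift_back_one is hypothetical, and the code assumes rows with the same id are contiguous. It uses only slicing and elementwise comparison, which should make it a reasonable candidate for Numba, though that has not been verified here.

```python
import numpy as np

def grouped_shift_back_one(ids, values):
    """Hypothetical helper: per-id shift(-1), assuming equal ids are contiguous."""
    n = len(values)
    out = np.empty(n, dtype=np.float64)
    if n == 0:
        return out
    # Shift everything up by one position, ignoring group boundaries for now.
    out[:-1] = values[1:]
    # A row is the last of its group if the next row has a different id.
    last_in_group = np.empty(n, dtype=np.bool_)
    last_in_group[:-1] = ids[:-1] != ids[1:]
    last_in_group[-1] = True
    # The last row of each group has no successor inside the group.
    out[last_in_group] = np.nan
    return out

ids = np.array([0, 0, 1, 1, 1])
values = np.array([1.5, 2.5, 3.0, 4.0, 5.0])
result = grouped_shift_back_one(ids, values)
print(result)  # values: 2.5, nan, 4.0, 5.0, nan
```

On the dummy data above, this should produce the same shifted column as the split-based version, with NaN in rows 4, 10, 14 and 19.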

