How to normalize features by their specific type in a 3D array

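Problem description

I have a 3D array of shape (1883, 100, 68), laid out as (batch, steps, features).

The 68 features are completely different kinds of quantities, such as energy and MFCCs.

I want each feature to be normalized with respect to itself.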

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0],-1)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(X_test.shape[0],-1)).reshape(X_test.shape)
print(X_train.shape)
print(max(X_train[0][0]))
print(min(X_train[0][0]))

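Obviously, converting it to a 2D array this way does not work, because each feature ends up normalized relative to all 6800 flattened features (100 steps x 68 features). As a result, several features become zero across all 100 steps.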
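To make the effect of the flattening concrete, here is a small NumPy-only sketch. The array sizes are made up and much smaller than the real data, and it assumes the scaler fits one mean and one standard deviation per column of the 2D array, which is how StandardScaler is documented to behave. Under that assumption, reshaping to (batch, steps * features) gives one set of statistics per (step, feature) pair rather than one per feature.

```python
import numpy as np

# Toy stand-in for the real (1883, 100, 68) array: 6 samples, 4 steps, 3 features.
rng = np.random.default_rng(0)
x = rng.random((6, 4, 3))

# Flattening as in the question: one column per (step, feature) pair.
flat = x.reshape(x.shape[0], -1)
print(flat.shape)  # (6, 12)

# Column-wise statistics, i.e. what a per-column scaler would fit.
col_mean = flat.mean(axis=0)
print(col_mean.shape)  # (12,)

# With the default C ordering, column k holds step k // 3 of feature k % 3.
k = 7
step, feature = divmod(k, x.shape[2])
same = np.array_equal(flat[:, k], x[:, step, feature])
print(same)  # True
```

So the same feature at two different steps is scaled with two different means and standard deviations, which is probably not what is wanted here.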

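For example, the feature I am looking at, feature [0], is energy. For one batch there are 100 energy values, one per step. I want these 100 energy values to be normalized among themselves.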

So the normalization should be carried out over [1,1,0], [1,2,0], [1,3,0] ... [1,100,0]. The same goes for all the other features.
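Read literally, the request above is to standardize each feature over the 100 steps of a single sample. A minimal sketch of that reading follows; the function name `normalize_within_sample` and the `eps` guard are assumptions added for illustration and are not part of the original post.

```python
import numpy as np

def normalize_within_sample(x, eps=1e-8):
    """Standardize each feature over the step axis, separately per sample.

    x has shape (batch, steps, features). `eps` is a hypothetical guard
    against division by zero for features that are constant within a sample.
    """
    mean = x.mean(axis=1, keepdims=True)  # shape (batch, 1, features)
    std = x.std(axis=1, keepdims=True)    # shape (batch, 1, features)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
x = rng.random((5, 100, 68))
x_norm = normalize_within_sample(x)

# Each (sample, feature) pair now has zero mean over its 100 steps.
print(np.allclose(x_norm.mean(axis=1), 0.0, atol=1e-6))
```

Note that this removes differences in level between samples (a loud and a quiet clip end up on the same scale), which may or may not be desirable.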

How should I go about this?

Update:

The following code was produced with sai's help.

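import numpy as np

def feature_normalization(x):
    # Merge the batch and step axes: (batch, steps, features) -> (1, batch * steps, features)
    batches_unrolled = np.expand_dims(np.reshape(x, (-1, x.shape[2])), axis=0)

    # Per-feature mean and std over all samples and steps, each of shape (1, 1, features)
    x_normalized = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(batches_unrolled, axis=1, keepdims=True)

    # Sanity check: feature 0 of the first sample against a direct computation
    np.testing.assert_allclose(x_normalized[0, :, 0], (x[0, :, 0] - np.mean(x[:, :, 0])) / np.std(x[:, :, 0]))
    return x_normalized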

def testset_normalization(X_train, X_test):
    # Fit the statistics on the training set only, then apply them to the test set
    batches_unrolled = np.expand_dims(np.reshape(X_train, (-1, X_train.shape[2])), axis=0)
    fitted_mean = np.mean(batches_unrolled, axis=1, keepdims=True)
    fitted_std = np.std(batches_unrolled, axis=1, keepdims=True)
    X_test_normalized = (X_test - fitted_mean) / fitted_std
    return X_test_normalized
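
One way to keep the training and test sets consistent is to compute the statistics once and reuse them for both. The sketch below is only an illustration on random toy data; `fit_feature_stats`, `apply_feature_stats` and the `eps` guard are hypothetical names that do not appear in the code above.

```python
import numpy as np

def fit_feature_stats(x_train, eps=1e-8):
    """Per-feature mean and std over all samples and steps of the training set."""
    flat = x_train.reshape(-1, x_train.shape[2])  # (batch * steps, features)
    mean = flat.mean(axis=0)                      # shape (features,)
    std = flat.std(axis=0) + eps                  # eps avoids division by zero
    return mean, std

def apply_feature_stats(x, mean, std):
    """Normalize x of shape (batch, steps, features) with precomputed statistics."""
    return (x - mean) / std                       # broadcasts over the last axis

rng = np.random.default_rng(0)
X_train = rng.random((30, 10, 4)) * 5.0 + 2.0
X_test = rng.random((8, 10, 4)) * 5.0 + 2.0

mean, std = fit_feature_stats(X_train)
X_train_n = apply_feature_stats(X_train, mean, std)
X_test_n = apply_feature_stats(X_test, mean, std)
```

The test set will generally not come out with exactly zero mean and unit variance, which is expected: it is scaled with the training statistics, not its own.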

Solution

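To normalize each feature independently across all samples:

  1. Unroll the batch samples to get a [10 (time steps) * batch_size] x [40 features] matrix
  2. Compute the mean and standard deviation of each feature
  3. Normalize the actual batch samples element-wise
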
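import numpy as np

x = np.random.random((20, 10, 40))

# Step 1: unroll to (1, batch * steps, features) = (1, 200, 40)
batches_unrolled = np.expand_dims(np.reshape(x, (-1, 40)), axis=0)

# Steps 2 and 3: per-feature mean and std, then element-wise normalization via broadcasting
x_normalized = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(batches_unrolled, axis=1, keepdims=True)

# Check feature 0 of the first sample against a direct computation
np.testing.assert_allclose(x_normalized[0, :, 0], (x[0, :, 0] - np.mean(x[:, :, 0])) / np.std(x[:, :, 0]))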
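
As a cross-check, the unrolling step can also be skipped by reducing over the batch and step axes directly, since NumPy reductions accept a tuple of axes. The sketch below, on the same kind of random toy data, suggests the two forms give the same per-feature statistics; it is meant as a quick verification rather than a replacement for the code above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((20, 10, 40))

# Unrolled form: merge batch and step axes, then reduce over the merged axis.
unrolled = x.reshape(-1, x.shape[2])  # shape (200, 40)
mean_a = unrolled.mean(axis=0)
std_a = unrolled.std(axis=0)

# Tuple-axis form: reduce over the batch and step axes in one call.
mean_b = x.mean(axis=(0, 1))
std_b = x.std(axis=(0, 1))

np.testing.assert_allclose(mean_a, mean_b)
np.testing.assert_allclose(std_a, std_b)

# Shapes (20, 10, 40) and (40,) broadcast over the last axis.
x_normalized = (x - mean_b) / std_b

# After normalization every feature has mean ~0 and std ~1 over all samples and steps.
np.testing.assert_allclose(x_normalized.mean(axis=(0, 1)), 0.0, atol=1e-10)
np.testing.assert_allclose(x_normalized.std(axis=(0, 1)), 1.0, atol=1e-10)
```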