How do I scale the test set using the mean and standard deviation of the training set in Python?

Problem description

I read an answer to "Why feature scaling only to training set?", which says to "standardize any test set using the training set's mean and standard deviation."

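So I tried to fix my earlier mistake. However, I checked the official documentation for StandardScaler(), and it does not support scaling with a given mean and standard deviation. I was hoping for something like this: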

from sklearn.preprocessing import StandardScaler
sc = StandardScaler(mean = train_x.mean(),var_x = train.std())
sc.fit(test_x)

# this code is incorrect,but what is the correct code?

So my question is: how do I scale the test set using the mean and standard deviation of the training set in Python?

Solution

According to the official documentation:

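with_mean : bool, default=True. If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.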

with_std : bool, default=True. If True, scale the data to unit variance (or equivalently, unit standard deviation).

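So you do not pass the mean and standard deviation in yourself. Instead, fit the scaler on the training set, then use the fitted scaler to transform the test set:

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(train_x)                       # learn the mean and standard deviation from the training set
test_x_scaled = sc.transform(test_x)  # apply those training statistics to the test set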

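StandardScaler() only accepts with_mean and with_std as booleans (True or False). They switch centering and scaling on or off; they are not a way to supply a mean or standard deviation. The statistics themselves are learned by fit() and stored on the fitted scaler as mean_ and scale_, and transform() reuses them on whatever data it is given.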
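As a minimal sketch of the fit-on-train, transform-on-test pattern, the example below checks that the scaler's output on the test set matches the result computed by hand from the training statistics. The arrays are randomly generated purely for illustration, and names such as train_scaled and manual_test_scaled are made up for this example; train_x and test_x are assumed to be 2-D NumPy arrays with the same number of columns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
train_x = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
test_x = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

sc = StandardScaler()
train_scaled = sc.fit_transform(train_x)  # learn mean/std from the training set only
test_scaled = sc.transform(test_x)        # reuse the training statistics on the test set

# The same result computed by hand from the training statistics.
# StandardScaler uses the population standard deviation (ddof=0).
train_mean = train_x.mean(axis=0)
train_std = train_x.std(axis=0, ddof=0)
manual_test_scaled = (test_x - train_mean) / train_std

print(np.allclose(sc.mean_, train_mean))             # fitted mean equals the training mean
print(np.allclose(sc.scale_, train_std))             # fitted scale equals the training std
print(np.allclose(test_scaled, manual_test_scaled))  # transform matches the manual formula
```

One detail to watch if you compute the statistics yourself: NumPy's std() defaults to ddof=0, which matches StandardScaler, while pandas' std() defaults to ddof=1, so values scaled by hand from a DataFrame can differ slightly from the scaler's output.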

normalization python scale standardized