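Problem description
I am working with a subset of the MNIST dataset, and I want to normalize the sample features in it. I am trying to load the dataset from .mat files. Can anyone explain how to convert a .mat file to a numpy array so that I can perform basic operations such as mean and standard deviation over the vectors?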
import scipy.io
import numpy as np
train_0 = scipy.io.loadmat('data/training_data_0.mat')
train_1 = scipy.io.loadmat('data/training_data_1.mat')
test_0 = scipy.io.loadmat('data/testing_data_0.mat')
test_1 = scipy.io.loadmat('data/testing_data_1.mat')
# to return a group of the key-value
# pairs in the dictionary
result = train_0.items()
# Convert object to a list
data = list(result)
# Convert list to an array
numpyArray = np.array(data)
print(numpyArray.mean())
But when I run it, I get this error:
numpyArray = np.array(data)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/mish/Work/ASU/Fall20/CSE 569/main.py", line 20, in <module>
    print(numpyArray.mean())
  File "/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py", line 160, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: can only concatenate str (not "bytes") to str
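Solution
You are passing a list of (key, value) tuples to numpy.array. The dictionary also contains metadata entries that are strings and bytes, which is why mean() fails with that TypeError. The numpy array you want is already there: train_0['<some variable name here>'].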
To find the variable names, just use: print(train_0.keys())
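A minimal sketch of the whole flow, assuming the features are stored under a variable called "X". That name is hypothetical, as are the temporary demo.mat file and the random stand-in data, which are only there so the snippet runs on its own. Use whatever print(train_0.keys()) reports for your files.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Build a tiny stand-in .mat file so this sketch is self-contained.
# "X" is a hypothetical variable name; the real one comes from train_0.keys().
rng = np.random.default_rng(0)
samples = rng.random((5, 4))  # 5 samples, 4 features
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
scipy.io.savemat(path, {"X": samples})

mat = scipy.io.loadmat(path)
features = mat["X"]  # this value is already a numpy.ndarray

mean = features.mean(axis=0)  # per-feature mean
std = features.std(axis=0)    # per-feature standard deviation

# Guard against division by zero for features that never vary.
safe_std = np.where(std == 0, 1.0, std)
normalized = (features - mean) / safe_std

print(features.shape)
print(mean, std)
```

Indexing the dictionary by variable name replaces the items() / list() / np.array() chain from the question. The zero-variance guard is a design choice for image data such as MNIST, where some pixels may be constant across all samples.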
This might also answer your question: Convert loaded mat file back to numpy array
scipy.io.loadmat returns a dictionary:
Returns
mat_dict : dict
    dictionary with variable names as keys, and loaded matrices as values.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
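Besides the saved variables, the returned dictionary also carries metadata entries (__header__, __version__, __globals__) that are not arrays. The sketch below shows one way to keep only the array variables. The helper name array_variables and the variable names "X" and "labels" are made up for illustration and are not part of scipy.

```python
import os
import tempfile

import numpy as np
import scipy.io


def array_variables(mat_dict):
    """Return only the loadmat entries that hold numpy arrays (hypothetical helper)."""
    return {
        name: value
        for name, value in mat_dict.items()
        if not name.startswith("__") and isinstance(value, np.ndarray)
    }


# Stand-in file with two made-up variables so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
scipy.io.savemat(
    path,
    {"X": np.arange(6.0).reshape(2, 3), "labels": np.array([[0, 1]])},
)

mat = scipy.io.loadmat(path)
variables = array_variables(mat)

print(sorted(mat.keys()))        # includes the "__...__" metadata keys
print(sorted(variables.keys()))  # only the saved variables
print(variables["X"].mean())
```

Filtering this way avoids mixing the bytes and str metadata values with numeric data, which is the situation that led to the TypeError in the question.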