PCA:TypeError:只能将列表不是“int”连接到列表

问题描述

我试图按照 this post 的建议解决方案在我的数据集上拟合 PCA。该代码适用于形状为 iris(150,8) 数据,如下所示:


array([[ 1.7837721,-1.23464679,4.27808537,...,0.63061657,-1.79849625,-1.41574397],[-0.35396307,-0.13400175,3.91751182,-0.58928302,-0.15735542,-0.99157312],[-0.20380491,-1.06074392,4.65814864,2.19686369,0.14920164,2.33371106],[-1.05079672,1.46836264,5.41970214,0.32847349,0.27133141,1.01266607],[ 0.19569856,0.57739573,3.84749973,0.02400556,-0.08193678,0.51223263],[ 0.04905765,0.66314259,6.22608157,0.60076934,-0.56890579,-0.23642103]])

但是,使用我的 (3475,29) 形状数据,它发现了错误

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-292-5661ffbde57b> in <module>
     38 # data = array([randn(8) for k in range(150)])
     39 data[:50,2:4] += 5
---> 40 data[50:,2:5] += 5
     41 
     42 """ visualize """

TypeError: can only concatenate list (not "int") to list

我的数据(形状 (3475,29))如下所示:

array([[58.5,27.0,88.5,nan,0.0,-3.0],[58.5,[47.0,45.0,92.0,1.6,-0.649519052838329,-1.1249999999999998],[46.0,44.5,98.0,2.5,-1.3],40.0,[46.5,76.5,17.767857142857142,-0.4788774197473401,-1.4219984343829701]],dtype=object)

使用的代码

# SO - doug - my data 
from numpy import array,dot,mean,std,empty,argsort
from numpy.linalg import eigh,solve
from numpy.random import randn
from matplotlib.pyplot import subplots,show


def cov(X):
    """
    Covariance matrix
    note: specifically for mean-centered data
    note: numpy's `cov` uses N-1 as normalization
    """
    return dot(X.T,X) / X.shape[0]
    # N = data.shape[1]
    # C = empty((N,N))
    # for j in range(N):
    #   C[j,j] = mean(data[:,j] * data[:,j])
    #   for k in range(j + 1,N):
    #       C[j,k] = C[k,k])
    # return C

def pca(data,pc_count = None):
    """
    Principal component analysis using eigenvalues
    note: this mean-centers and auto-scales the data (in-place)
    """
    data -= mean(data,0)
    data /= std(data,0)
    C = cov(data)
    E,V = eigh(C)
    key = argsort(E)[::-1][:pc_count]
    E,V = E[key],V[:,key]
    U = dot(data,V)  # used to be dot(V.T,data.T).T
    return U,E,V

""" test data """
# data = array([randn(8) for k in range(150)])
data = my_data1   # Using my own data
data[:50,2:4] += 5
data[50:,2:5] += 5

""" visualize """
trans = pca(data,3)[0]
fig,(ax1,ax2) = subplots(1,2)
ax1.scatter(data[:50,0],data[:50,1],c = 'r')
ax1.scatter(data[50:,data[50:,c = 'b')
ax2.scatter(trans[:50,trans[:50,c = 'r')
ax2.scatter(trans[50:,trans[50:,c = 'b')
show()

做什么

data[:50,2:5] += 5

做什么?

我试图用

替换这两行
data = [data[:50,2:4] += 5]
data = [data[50:,2:5] += 5]

基于this answer,然后返回

  File "<ipython-input-296-5d80e1852b4e>",line 42
    data = [data[:50,2:4] += 5]
                            ^
SyntaxError: invalid Syntax

感谢任何建议!

解决方法

如果 data 是一个二维数值数组,

data[:50,2:4]

选择数组的一个切片(技术上是 view),并且

data[:50,2:4] += 5

将 5 添加到该切片的所有元素 - 并修改 data

但是

TypeError: can only concatenate list (not "int") to list

表示 data 包含一个或多个列表,而不是(仅)数字。对于列表,+ 不是(数学)加法,而是连接/连接。 [1,2,3]+[4]

至于你的第二次尝试:

[data[:50,2:4] += 5]

很不一样
[i + 1]

[] 创建一个列表。您不能在列表中执行赋值,例如 =+=。因此出现语法错误。

您显示的 (3475,29) 数组将 dtype 列为 object。这是一个强有力的指标,表明它包含数字以外的东西(或除数字之外)。根据错误,那必须是一个列表。

所以你需要清理mydata