Problem description
How can I vstack two matrices in a case like the one below?
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]])
b = np.array([[2,2],[2,2],[2,2]])
a = np.vstack([a,b])
Output:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
The output I would like looks like this:
a = array([[[3,3,3],[3,3,3],[3,3,3]],[[2,2],[2,2],[2,2]]])
My goal is then to loop over the contents of the stacked matrices, index each matrix, and call a function on a specific row.
for matrix in a:
    row = matrix[1]
    print(row)
Output:
[3,3,3]
[2,2]
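Solution
Be careful with those "NumPy is faster" claims. If you already have arrays and make full use of array methods, numpy is indeed faster. But if you start from lists, or have to use Python-level iteration (as you do in Pack...), the numpy version may well be slower.
Just timing the Pack step: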
In [12]: timeit Pack_Matrices_with_NaN([a,b,c],5)
221 µs ± 9.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compare that with a simple list comprehension that fetches row 1 of each array:
In [13]: [row[1] for row in [a,b,c]]
Out[13]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [14]: timeit [row[1] for row in [a,b,c]]
808 ns ± 2.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
200 µs versus less than 1 µs!
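For reference, the list-based approach timed above can be sketched as a standalone script. The inputs here are an assumption: `a`, `b` and `c` are taken to be the 3x3, 3x2 and 3x4 matrices that appear in the packed output, and the variable names `matrices` and `rows` are only illustrative.

```python
import numpy as np

# Assumed inputs: same number of rows, different widths.
a = np.full((3, 3), 3.0)
b = np.full((3, 2), 2.0)
c = np.full((3, 4), 4.0)

# A plain Python list can hold ragged arrays without any padding.
matrices = [a, b, c]

# Fetch row 1 of every matrix; each result keeps its own length.
rows = [m[1] for m in matrices]
for row in rows:
    print(row)
```

Because nothing is padded, there is no NaN to strip afterwards, which is probably where most of the time difference comes from.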
And timing your Unpack:
In [21]: [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
Out[21]: [array([3., 3., 3.]), array([2., 2.]), array([4., 4., 4., 4.])]
In [22]: timeit [Unpack_Matrix_with_NaN(packed_matrices.reshape(3,3,5),i)[1,:] for i in range(3)]
199 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
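I was able to solve this using only NumPy. Since NumPy is much faster than Python's list functions (https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347), I want to share my answer, as it may be useful to others.
I started by adding np.nan so that both arrays have the same shape.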
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
# Extend each row of b with NaN so that b has the same shape as a
b = np.insert(b,2,np.nan,axis=1)
# Now vstack the arrays; wrapping each in a list adds a leading axis
a = np.vstack([[a],[b]])
print(a)
Output:
[[[ 3.  3.  3.]
  [ 3.  3.  3.]
  [ 3.  3.  3.]]

 [[ 2.  2. nan]
  [ 2.  2. nan]
  [ 2.  2. nan]]]
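The same padding step could also be written with np.pad, which computes the number of missing columns instead of hard-coding the insert position. This is only a sketch of an alternative, not what the code above does; the names `missing`, `b_padded` and `stacked` are illustrative, and it assumes both matrices have the same number of rows.

```python
import numpy as np

a = np.full((3, 3), 3.0)
b = np.full((3, 2), 2.0)

# Number of NaN columns b needs to become as wide as a.
missing = a.shape[1] - b.shape[1]

# Pad only on the right-hand side of axis 1; axis 0 is left untouched.
b_padded = np.pad(b, ((0, 0), (0, missing)), mode="constant", constant_values=np.nan)

# np.stack adds a new leading axis, giving shape (2, 3, 3).
stacked = np.stack([a, b_padded])
print(stacked.shape)
```

np.stack is used here in place of `np.vstack([[a],[b]])`; for equally shaped inputs both should give one matrix per index along the first axis.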
I then wrote a function that unpacks each array in a and removes the nan values.
def Unpack_Matrix_with_NaN(Matrix_with_nan,matrix_of_interest):
    for first_row in Matrix_with_nan[matrix_of_interest,:1]:
        # find shape of matrix row without nan
        first_row_without_nan = first_row[~np.isnan(first_row)]
        shape = first_row_without_nan.shape[0]
        # placeholder row of the right width, so that vstack has something to stack onto
        matrix_without_nan = np.arange(shape)
    for row in Matrix_with_nan[matrix_of_interest]:
        row_without_nan = row[~np.isnan(row)]
        matrix_without_nan = np.vstack([matrix_without_nan,row_without_nan])
    # Remove vector specifying shape
    matrix_without_nan = matrix_without_nan[1:]
    return matrix_without_nan
Then I can loop over the matrices, find the row I want, and print it.
Matrix_with_nan = a
for matrix in range(len(Matrix_with_nan)):
    matrix_of_interest = Unpack_Matrix_with_NaN(a,matrix)
    row = matrix_of_interest[1]
    print(row)
Output:
[3. 3. 3.]
[2. 2.]
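If only a single row is needed, it may be enough to index that row directly and mask out the NaN padding, without rebuilding the whole matrix first. The helper name `row_without_nan` below is hypothetical, and the sketch assumes the real data never contains NaN, so that every NaN is padding.

```python
import numpy as np

def row_without_nan(stacked, matrix_index, row_index):
    """Hypothetical helper: return one row of one padded matrix, padding removed."""
    row = stacked[matrix_index, row_index]
    # Boolean mask keeps only the entries that are not NaN.
    return row[~np.isnan(row)]

# Padded stack equivalent to the example above: shape (2, 3, 3).
stacked = np.array([[[3, 3, 3]] * 3, [[2, 2, np.nan]] * 3], dtype=float)

for i in range(len(stacked)):
    print(row_without_nan(stacked, i, 1))
```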
I also wrote a function that packs the matrices for the case where more than one nan has to be added per row:
import numpy as np
a = np.array([[3,3,3],[3,3,3],[3,3,3]]).astype(float)
b = np.array([[2,2],[2,2],[2,2]]).astype(float)
c = np.array([[4,4,4,4],[4,4,4,4],[4,4,4,4]]).astype(float)
# Extend each row in every array with NaN to reach the same width
def Pack_Matrices_with_NaN(List_of_matrices,Matrix_size):
    # placeholder row of the target width, removed again before returning
    Matrix_with_nan = np.arange(Matrix_size)
    for array in List_of_matrices:
        start_position = len(array[0])
        # append one NaN column at a time until the array is Matrix_size wide
        for x in range(start_position,Matrix_size):
            array = np.insert(array,(x),np.nan,axis=1)
        Matrix_with_nan = np.vstack([Matrix_with_nan,array])
    Matrix_with_nan = Matrix_with_nan[1:]
    return Matrix_with_nan
arrays = [a,b,c]
packed_matrices = Pack_Matrices_with_NaN(arrays,5)
print(packed_matrices)
Output:
[[ 3.  3.  3. nan nan]
 [ 3.  3.  3. nan nan]
 [ 3.  3.  3. nan nan]
 [ 2.  2. nan nan nan]
 [ 2.  2. nan nan nan]
 [ 2.  2. nan nan nan]
 [ 4.  4.  4.  4. nan]
 [ 4.  4.  4.  4. nan]
 [ 4.  4.  4.  4. nan]]
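A possible variation on the packing step is to allocate the NaN-filled result once and copy each matrix into its slot, which avoids the repeated np.insert and np.vstack calls. This is a sketch rather than a drop-in replacement: the function name `pack_with_nan` is hypothetical, it assumes every matrix has the same number of rows, and it returns a 3-D array of shape (number of matrices, rows, width) instead of the 2-D array printed above.

```python
import numpy as np

def pack_with_nan(matrices, width):
    """Hypothetical helper: pack equally tall matrices into one NaN-padded 3-D array."""
    rows = matrices[0].shape[0]
    # Allocate the full result up front, filled with NaN.
    packed = np.full((len(matrices), rows, width), np.nan)
    for i, m in enumerate(matrices):
        # Copy each matrix into the left-hand columns of its slot.
        packed[i, :, :m.shape[1]] = m
    return packed

a = np.full((3, 3), 3.0)
b = np.full((3, 2), 2.0)
c = np.full((3, 4), 4.0)

packed = pack_with_nan([a, b, c], 5)
print(packed.shape)
```

With this layout, `packed[i]` is the i-th padded matrix, so no reshape should be needed before indexing.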