Problem description
Given the 1d arrays w and x below, I can form their Cartesian product with the following code.
import numpy as np
w = np.array([1,2,3,4])
x = np.array([1,2,3,4])
V1 = np.transpose([np.repeat(w,len(x)),np.tile(x,len(w))])
print(V1)
[[1 1]
 [1 2]
 [1 3]
 [1 4]
 [2 1]
 [2 2]
 [2 3]
 [2 4]
 [3 1]
 [3 2]
 [3 3]
 [3 4]
 [4 1]
 [4 2]
 [4 3]
 [4 4]]
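To make the repeat/tile construction concrete, here is a small sketch (assuming x = [1, 2, 3, 4], as the output above implies) that prints the two columns separately. np.column_stack is used only as an equivalent way of writing the transpose.

```python
import numpy as np

# Same inputs as in the question.
w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])

# np.repeat repeats each element of w len(x) times in place:
# [1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
first = np.repeat(w, len(x))

# np.tile repeats the whole array x len(w) times end to end:
# [1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4]
second = np.tile(x, len(w))

# Stacking the two as columns pairs every w value with every x value.
# np.column_stack gives the same result as np.transpose([first, second]).
V1 = np.column_stack([first, second])

print(first)
print(second)
print(V1.shape)  # (16, 2)
```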
However, I want the output V1 to include only the rows where w < x:
[[1 2]
 [1 3]
 [1 4]
 [2 3]
 [2 4]
 [3 4]]
Solutions
Approach #1
To get ONLY the array rows where w < x (for pairwise combinations), here's one way to do it -
In [81]: r,c = np.nonzero(w[:,None]<x) # or np.less.outer(w,x)
In [82]: np.c_[w[r],x[c]]
Out[82]:
array([[1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]])
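A minimal sketch of what Approach #1 computes step by step, using the same four-element arrays: the broadcast comparison, the indices np.nonzero returns, and the gathered pairs. The names mask and pairs are just for illustration.

```python
import numpy as np

w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])

# w[:, None] has shape (4, 1) and x has shape (4,), so the comparison
# broadcasts to a (4, 4) boolean grid: mask[i, j] is True when w[i] < x[j].
mask = w[:, None] < x
assert np.array_equal(mask, np.less.outer(w, x))  # equivalent spelling

# np.nonzero returns the row indices (into w) and column indices (into x)
# of the True cells, in row-major (C) order.
r, c = np.nonzero(mask)
print(r)  # [0 0 0 1 1 2]
print(c)  # [1 2 3 2 3 3]

# Gather the values and stack them as columns.
pairs = np.c_[w[r], x[c]]
print(pairs.tolist())  # [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```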
Approach #2
With a purely mask-based approach, it would be -
In [93]: mask = np.less.outer(w,x)
In [94]: s = (len(w),len(x))
In [95]: np.c_[np.broadcast_to(w[:,None],s)[mask],np.broadcast_to(x,s)[mask]]
Out[95]:
array([[1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]])
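For Approach #2, it may help to see that the two broadcast_to views are shaped like the mask, and that boolean indexing visits cells in the same row-major order as np.nonzero, so both approaches return the same rows. A sketch, where W, X and the pairs_* names are illustrative only:

```python
import numpy as np

w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])

mask = np.less.outer(w, x)  # (4, 4) boolean grid, True where w[i] < x[j]
s = (len(w), len(x))

# broadcast_to returns read-only views shaped like the mask, without copying:
# W[i, j] == w[i] (each row constant), X[i, j] == x[j] (each column constant).
W = np.broadcast_to(w[:, None], s)
X = np.broadcast_to(x, s)

# Boolean indexing flattens the selected cells in row-major order, which is
# the same order np.nonzero uses, so both approaches give identical rows.
pairs_mask = np.c_[W[mask], X[mask]]

r, c = np.nonzero(mask)
pairs_nonzero = np.c_[w[r], x[c]]

print(np.array_equal(pairs_mask, pairs_nonzero))  # True
print(W.flags.writeable)  # False: broadcast views are read-only
```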
Benchmarking
Using relatively large arrays:
In [8]: np.random.seed(0)
...: w = np.random.randint(0,1000,(1000))
...: x = np.random.randint(0,1000,(1000))
In [9]: %%timeit
...: r,c = np.nonzero(w[:,None]<x)
...: np.c_[w[r],x[c]]
11.3 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %%timeit
...: mask = np.less.outer(w,x)
...: s = (len(w),len(x))
...: np.c_[np.broadcast_to(w[:,None],s)[mask],np.broadcast_to(x,s)[mask]]
10.5 ms ± 275 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [11]: import itertools
# @Akshay Sehgal's soln
In [12]: %timeit [i for i in itertools.product(w,x) if i[0]<i[1]]
105 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
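%%timeit is an IPython/Jupyter magic. Outside a notebook, a rough equivalent of the benchmark can be sketched with the standard-library timeit module. The wrapper names approach1, approach2 and approach_itertools are hypothetical, and absolute timings will differ by machine and NumPy version.

```python
import itertools
import timeit

import numpy as np

# Same random inputs as the IPython benchmark above.
np.random.seed(0)
w = np.random.randint(0, 1000, 1000)
x = np.random.randint(0, 1000, 1000)


def approach1():
    # np.nonzero on the broadcast comparison grid
    r, c = np.nonzero(w[:, None] < x)
    return np.c_[w[r], x[c]]


def approach2():
    # boolean-mask indexing on read-only broadcast views
    mask = np.less.outer(w, x)
    s = (len(w), len(x))
    return np.c_[np.broadcast_to(w[:, None], s)[mask], np.broadcast_to(x, s)[mask]]


def approach_itertools():
    # pure-Python filter over the lazy Cartesian product
    return [i for i in itertools.product(w, x) if i[0] < i[1]]


# Sanity check: all three produce the same pairs in the same order.
expected = approach1()
assert np.array_equal(expected, approach2())
assert np.array_equal(expected, np.array(approach_itertools()).reshape(-1, 2))

timings = {}
for fn in (approach1, approach2, approach_itertools):
    # best of 3 runs of a single call each, converted to milliseconds
    timings[fn.__name__] = min(timeit.repeat(fn, number=1, repeat=3)) * 1e3
    print(f"{fn.__name__}: {timings[fn.__name__]:.1f} ms")
```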
Try this one-liner using itertools -
import itertools
import numpy as np
w = np.array([1,2,3,4])
x = np.array([1,2,3,4])
[i for i in itertools.product(w,x) if i[0]<i[1]]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
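itertools is fast and memory-efficient (itertools.product generates pairs lazily), so this should be reasonably quick, although on large arrays the benchmark above shows it is roughly 10x slower than the vectorized NumPy approaches.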
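Note that the list comprehension materializes all matching pairs as Python tuples, so the laziness of itertools.product only helps up to that point. A sketch of keeping the filter lazy with a generator expression and converting to an (n, 2) NumPy array at the end, assuming an array result is what you want:

```python
import itertools

import numpy as np

w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])

# itertools.product is lazy; a generator expression (parentheses instead of
# brackets) keeps the filter lazy until something consumes it.
lazy_pairs = ((a, b) for a, b in itertools.product(w, x) if a < b)

# Materialize into an (n, 2) integer array. Passing the dtype keeps integer
# values; reshape(-1, 2) keeps the 2-column shape even if nothing matches.
pairs = np.array(list(lazy_pairs), dtype=w.dtype).reshape(-1, 2)
print(pairs.tolist())  # [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```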
You can also filter your own solution like this:
V1 = np.transpose([np.repeat(w,len(x)),np.tile(x,len(w))])
V1[V1[:,0]<V1[:,1]]
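As a quick check (a sketch with the question's four-element arrays), the boolean row filter keeps the same rows, in the same order, as the np.nonzero approach, because V1 is laid out with w varying slowest and x fastest:

```python
import numpy as np

w = np.array([1, 2, 3, 4])
x = np.array([1, 2, 3, 4])

# Full Cartesian product: len(w) * len(x) rows.
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])

# Boolean row mask: True where the first column is less than the second.
keep = V1[:, 0] < V1[:, 1]
filtered = V1[keep]

# Same rows, same order as the np.nonzero approach.
r, c = np.nonzero(w[:, None] < x)
print(np.array_equal(filtered, np.c_[w[r], x[c]]))  # True
print(keep.sum(), "of", len(V1), "rows kept")  # 6 of 16 rows kept
```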
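The solutions proposed by @Divakar are of course faster, since they skip the redundant work of building the full Cartesian product before filtering it.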
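To put a rough number on that extra work: filtering V1 first materializes all len(w) * len(x) integer pairs, whereas the mask-based approaches build only a boolean grid with the same number of cells before selecting. A sketch comparing the two buffers for n = m = 1000 (an arbitrary size; the integer itemsize depends on the platform):

```python
import numpy as np

n = m = 1000
w = np.arange(n)
x = np.arange(m)

# Filtering the full product first materializes every pair: n*m rows of 2 ints.
V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])

# The mask-based approaches materialize an n*m boolean grid
# (1 byte per cell) before selecting the kept pairs.
mask = w[:, None] < x

print(V1.shape, V1.nbytes)      # (1000000, 2) and 2 * n * m * itemsize bytes
print(mask.shape, mask.nbytes)  # (1000, 1000) 1000000
```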
Benchmarking with @Divakar's benchit package:
import numpy as np
#@Proposed solution here
def m1(x):
    w = x.copy()  # pair the input with itself
    V1 = np.transpose([np.repeat(w,len(x)),np.tile(x,len(w))])
    return V1[V1[:,0]<V1[:,1]]
#@Divakar's approach 1
def m2(x):
    w = x.copy()
    r,c = np.nonzero(w[:,None]<x)
    return np.c_[w[r],x[c]]
#Divakar's approach 2
def m3(x):
    w = x.copy()
    mask = np.less.outer(w,x)
    s = (len(w),len(x))
    return np.c_[np.broadcast_to(w[:,None],s)[mask],np.broadcast_to(x,s)[mask]]
in_ = [np.arange(n) for n in [10,100,1000]]
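benchit is a third-party package. If it is not installed, a rough stand-in can be sketched with the standard-library timeit module: it first checks that m1, m2 and m3 agree on every input in in_, then records the best-of-3 time per call. The results dictionary and the repeat counts are illustrative choices, not part of the original benchmark.

```python
import functools
import timeit

import numpy as np


# Same three functions as above; each pairs its input array with a copy of itself.
def m1(x):
    w = x.copy()
    V1 = np.transpose([np.repeat(w, len(x)), np.tile(x, len(w))])
    return V1[V1[:, 0] < V1[:, 1]]


def m2(x):
    w = x.copy()
    r, c = np.nonzero(w[:, None] < x)
    return np.c_[w[r], x[c]]


def m3(x):
    w = x.copy()
    mask = np.less.outer(w, x)
    s = (len(w), len(x))
    return np.c_[np.broadcast_to(w[:, None], s)[mask], np.broadcast_to(x, s)[mask]]


in_ = [np.arange(n) for n in [10, 100, 1000]]

results = {}
for arr in in_:
    # All three must agree before their timings are worth comparing.
    ref = m1(arr)
    assert np.array_equal(ref, m2(arr)) and np.array_equal(ref, m3(arr))
    for fn in (m1, m2, m3):
        # best of 3 repeats, 5 calls each, reported per call
        t = min(timeit.repeat(functools.partial(fn, arr), number=5, repeat=3)) / 5
        results[(fn.__name__, len(arr))] = t
        print(f"{fn.__name__} n={len(arr)}: {t * 1e3:.3f} ms")
```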