NumPy: np.searchsorted for sorted 2D arrays

Question

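np.searchsorted only works for 1D arrays.

I have a lexicographically sorted 2D array, meaning that row 0 is sorted, then for equal values in row 0 the corresponding elements of row 1 are sorted too, and for equal values in row 1 the values of row 2 are sorted too. In other words, the tuples formed by the columns are sorted.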
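
As a small illustration of the ordering described above, here is a sketch with made-up data (not part of the original question). It assumes the keys are stored as rows, so each column is one tuple. np.lexsort treats the last key as the primary one, which is why the rows are passed in reverse.

```python
import numpy as np

# Hypothetical example data: 2 key rows, 5 columns (each column is one tuple).
a = np.array([[2, 1, 2, 1, 1],
              [3, 5, 1, 0, 5]])

# np.lexsort sorts by the LAST key first, so reverse the rows to make
# row 0 the primary key and row 1 the tie-breaker.
order = np.lexsort(a[::-1])
a_sorted = a[:, order]

# Read as tuples, the columns are now in non-decreasing order.
cols = [tuple(c) for c in a_sorted.T.tolist()]
is_sorted = all(x <= y for x, y in zip(cols, cols[1:]))
```

The same `a[:, np.lexsort(a[::-1])]` idiom appears in the test code of the first answer, where it is used to sort the array being inserted.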

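I also have some other 2D arrays whose columns are tuples that need to be inserted into the first 2D array at the correct column positions. For the 1D case, np.searchsorted is usually used to find the correct positions.

But is there an alternative to np.searchsorted for 2D arrays, something analogous to how np.lexsort is the 2D alternative to the 1D np.argsort?

If there is no such function, can this functionality be implemented efficiently using existing NumPy functions?

I am interested in efficient solutions for arrays of any dtype, including np.object_.

One naive way to handle any dtype would be to convert each column of both arrays to a 1D array (or tuple) and then store these columns in another 1D array of dtype = np.object_. Maybe it is not that naive, and it could even be fast, especially if the columns are long.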

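Solution

I've created several more advanced strategies.

I also implemented the simple strategy that uses tuples, as in my other answer.

Timings are measured for all solutions.

Most of the strategies use np.searchsorted as the underlying engine. To implement these advanced strategies, a special wrapper class _CmpIx is used to provide a custom comparison function (__lt__) for the np.searchsorted call.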
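
The wrapper idea can be sketched in a few lines. The class name ColRef, the helper wrap and the sample arrays are hypothetical. This is only a minimal illustration of the technique, relying on np.searchsorted using Python's rich comparisons for np.object_ arrays; it is not the _CmpIx class itself.

```python
import numpy as np

class ColRef:
    """Hypothetical wrapper: refers to column i of arr and compares lazily."""
    def __init__(self, arr, i):
        self.arr, self.i = arr, i

    def _cmp(self, other):
        # Lexicographic comparison that stops at the first differing row.
        for l, r in zip(self.arr[:, self.i], other.arr[:, other.i]):
            if l < r:
                return -1
            if l > r:
                return 1
        return 0

    def __lt__(self, o): return self._cmp(o) < 0
    def __le__(self, o): return self._cmp(o) <= 0
    def __gt__(self, o): return self._cmp(o) > 0
    def __ge__(self, o): return self._cmp(o) >= 0
    def __eq__(self, o): return self._cmp(o) == 0

def wrap(arr):
    """Build a 1D object array holding one ColRef per column of arr."""
    out = np.empty(arr.shape[1], dtype=np.object_)
    for i in range(arr.shape[1]):
        out[i] = ColRef(arr, i)
    return out

a = np.array([[1, 1, 2, 2],
              [0, 5, 1, 3]])   # columns already in lexicographic order
b = np.array([[1, 2, 3],
              [2, 1, 0]])      # columns to locate in a
pos = np.searchsorted(wrap(a), wrap(b))
```

Every comparison goes through Python, so this shows the mechanism rather than a fast implementation.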

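The py.tuples strategy just converts all columns to tuples, stores them as a NumPy 1D array of np.object_ dtype, and then does a regular np.searchsorted.
py.zip uses Python's zip to do the same task lazily.
The np.lexsort strategy just uses np.lexsort to compare two columns lexicographically.
np.nonzero uses the np.flatnonzero(a != b) expression.
cmp_numba uses ahead-of-time compiled numba code inside the _CmpIx wrapper for a fast lexicographic comparison of the two provided elements.
np.searchsorted uses the standard NumPy function, but is measured only for the 1D case.
For the numba strategy, the whole search algorithm is implemented from scratch using the Numba engine; the algorithm is based on binary search. There are _py and _nm variants of this algorithm: _nm is much faster because it uses the Numba compiler, while _py is the same algorithm left uncompiled. There is also a _sorted flavor, which adds an extra optimization for the case where the array to be inserted is already sorted.
view1d: the methods suggested by @MadPhysicist in this answer. They are disabled in the code because they returned incorrect answers in most tests with key length > 1, probably due to some problem with the raw view into the array.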
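
To illustrate the idea behind the numba strategy without the compiled machinery, here is a simplified sketch. The function name searchsorted_2d is hypothetical and this is not the implementation below. For each key column it narrows the candidate range one key row at a time, using the 1D np.searchsorted on the current slice.

```python
import numpy as np

def searchsorted_2d(a, v):
    """Return (left, right) insertion indices for every column of v.

    a must have its columns in lexicographic order; a and v must have
    the same number of rows (the key length).
    """
    a, v = np.asarray(a), np.asarray(v)
    assert a.ndim == 2 and v.ndim == 2 and a.shape[0] == v.shape[0]
    left = np.empty(v.shape[1], dtype=np.int64)
    right = np.empty(v.shape[1], dtype=np.int64)
    for j in range(v.shape[1]):
        lo, hi = 0, a.shape[1]
        for k in range(a.shape[0]):
            if lo >= hi:          # range is empty, position is decided
                break
            seg = a[k, lo:hi]     # row k is sorted inside [lo, hi)
            new_lo = lo + int(np.searchsorted(seg, v[k, j], side='left'))
            new_hi = lo + int(np.searchsorted(seg, v[k, j], side='right'))
            lo, hi = new_lo, new_hi
        left[j], right[j] = lo, hi
    return left, right

a = np.array([[1, 1, 2, 2],
              [0, 5, 1, 3]])
v = np.array([[1, 2, 3],
              [2, 1, 0]])
left, right = searchsorted_2d(a, v)
```

Inside the range where the first k rows equal the key, row k is itself sorted, which is what makes the per-row 1D search valid.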


class SearchSorted2D:
    class _CmpIx:
        def __init__(self,t,p,i):
            self.p,self.i = p,i
            self.leg = self.leg_cache()[t]
            self.lt = lambda o: self.leg(self,o,False) if self.i != o.i else False
            self.le = lambda o: self.leg(self,o,True) if self.i != o.i else True
        @classmethod
        def leg_cache(cls):
            if not hasattr(cls,'leg_cache_data'):
                cls.leg_cache_data = {
                    'py.zip': cls._leg_py_zip,'np.lexsort': cls._leg_np_lexsort,'np.nonzero': cls._leg_np_nonzero,'cmp_numba': cls._leg_numba_create(),}
            return cls.leg_cache_data
        def __eq__(self,o): return not self.lt(o) and self.le(o)
        def __ne__(self,o): return self.lt(o) or not self.le(o)
        def __lt__(self,o): return self.lt(o)
        def __le__(self,o): return self.le(o)
        def __gt__(self,o): return not self.le(o)
        def __ge__(self,o): return not self.lt(o)
        @staticmethod
        def _leg_np_lexsort(self,o,eq):
            import numpy as np
            ia,ib = (self.i,o.i) if eq else (o.i,self.i)
            return (np.lexsort(self.p.ab[::-1,ia : (ib + (-1,1)[ib >= ia],None)[ib == 0] : ib - ia])[0] == 0) == eq
        @staticmethod
        def _leg_py_zip(self,o,eq):
            for l,r in zip(self.p.ab[:,self.i],self.p.ab[:,o.i]):
                if l < r:
                    return True
                if l > r:
                    return False
            return eq
        @staticmethod
        def _leg_np_nonzero(self,o,eq):
            import numpy as np
            a,b = self.p.ab[:,self.i],self.p.ab[:,o.i]
            ix = np.flatnonzero(a != b)
            return a[ix[0]] < b[ix[0]] if ix.size != 0 else eq
        @staticmethod
        def _leg_numba_create():
            import numpy as np

            try:
                from numba.pycc import CC
                cc = CC('ss_numba_mod')
                @cc.export('ss_numba_i8','b1(i8[:],i8[:],b1)')
                def ss_numba(a,b,eq):
                    for i in range(a.size):
                        if a[i] < b[i]:
                            return True
                        elif b[i] < a[i]:
                            return False
                    return eq
                cc.compile()
                success = True
            except:    
                success = False
                
            if success:
                try:
                    import ss_numba_mod
                except:
                    success = False
            
            def odo(self,o,eq):
                a,b = self.p.ab[:,self.i],self.p.ab[:,o.i]
                assert a.ndim == 1 and a.shape == b.shape,(a.shape,b.shape)
                return ss_numba_mod.ss_numba_i8(a,b,eq)
                
            return odo if success else None

    def __init__(self,type_):
        import numpy as np
        self.type_ = type_
        self.ci = np.array([],dtype = np.object_)
    def __call__(self,a,b,*pargs,**nargs):
        import numpy as np
        self.ab = np.concatenate((a,b),axis = 1)
        self._grow(self.ab.shape[1])
        ix = np.searchsorted(self.ci[:a.shape[1]],self.ci[a.shape[1] : a.shape[1] + b.shape[1]],*pargs,**nargs)
        return ix
    def _grow(self,to):
        import numpy as np
        if self.ci.size >= to:
            return
        import math
        to = 1 << math.ceil(math.log(to) / math.log(2))
        self.ci = np.concatenate((self.ci,[self._CmpIx(self.type_,self,i) for i in range(self.ci.size,to)]))

class SearchSorted2DNumba:
    @classmethod
    def do(cls,a,v,side = 'left',*,vsorted = False,numba_ = True):
        import numpy as np

        if not hasattr(cls,'_ido_numba'):
            def _ido_regular(a,b,vsorted,lrt):
                nk,na,nb = a.shape[0],a.shape[1],b.shape[1]
                res = np.zeros((2,nb),dtype = np.int64)
                max_depth = 0
                if nb == 0:
                    return res,max_depth
                #lb,le,rb,re = 0,0
                lrb,lre = 0,0
                
                if vsorted:
                    brngs = np.zeros((nb,6),dtype = np.int64)
                    brngs[0,:4] = (-1,0,nb >> 1,nb)
                    i,j,size = 0,1,1
                    while i < j:
                        for k in range(i,j):
                            cbrng = brngs[k]
                            bp,bb,bm,be = cbrng[:4]
                            if bb < bm:
                                brngs[size,:4] = (k,bb,(bb + bm) >> 1,bm)
                                size += 1
                            bmp1 = bm + 1
                            if bmp1 < be:
                                brngs[size,:4] = (k,bmp1,(bmp1 + be) >> 1,be)
                                size += 1
                        i,j = j,size
                    assert size == nb
                    brngs[:,4:] = -1

                for ibc in range(nb):
                    if not vsorted:
                        ib,lrb,lre = ibc,0,na
                    else:
                        ibpi,ib = int(brngs[ibc,0]),int(brngs[ibc,2])
                        if ibpi == -1:
                            lrb,lre = 0,na
                        else:
                            ibp = int(brngs[ibpi,2])
                            if ib < ibp:
                                lrb,lre = int(brngs[ibpi,4]),int(res[1,ibp])
                            else:
                                lrb,lre = int(res[0,ibp]),int(brngs[ibpi,5])
                        brngs[ibc,4 : 6] = (lrb,lre)
                        assert lrb != -1 and lre != -1
                        
                    for ik in range(nk):
                        if lrb >= lre:
                            if ik > max_depth:
                                max_depth = ik
                            break

                        bv = b[ik,ib]
                        
                        # Binary searches
                        
                        if nk != 1 or lrt == 2:
                            cb,ce = lrb,lre
                            while cb < ce:
                                cm = (cb + ce) >> 1
                                av = a[ik,cm]
                                if av < bv:
                                    cb = cm + 1
                                elif bv < av:
                                    ce = cm
                                else:
                                    break
                            lrb,lre = cb,ce
                                
                        if nk != 1 or lrt >= 1:
                            cb,ce = lrb,lre
                            while cb < ce:
                                cm = (cb + ce) >> 1
                                if not (bv < a[ik,cm]):
                                    cb = cm + 1
                                else:
                                    ce = cm
                            #rb,re = cb,ce
                            lre = ce
                                
                        if nk != 1 or lrt == 0 or lrt == 2:
                            cb,ce = lrb,lre
                            while cb < ce:
                                cm = (cb + ce) >> 1
                                if a[ik,cm] < bv:
                                    cb = cm + 1
                                else:
                                    ce = cm
                            #lb,le = cb,ce
                            lrb = cb
                            
                        #lrb,lre = lb,re
                            
                    res[:,ib] = (lrb,lre)
                    
                return res,max_depth

            cls._ido_regular = _ido_regular
            
            import numba
            cls._ido_numba = numba.jit(nopython = True,nogil = True,cache = True)(cls._ido_regular)
            
        assert side in ['left','right','left_right'],side
        a,v = np.array(a),np.array(v)
        assert a.ndim == 2 and v.ndim == 2 and a.shape[0] == v.shape[0],(a.shape,v.shape)
        res,max_depth = (cls._ido_numba if numba_ else cls._ido_regular)(
            a,v,vsorted,{'left': 0,'right': 1,'left_right': 2}[side],)
        return res[0] if side == 'left' else res[1] if side == 'right' else res

def Test():
    import time
    import numpy as np
    np.random.seed(0)
    
    def round_float_fixed_str(x,n = 0):
        if type(x) is int:
            return str(x)
        s = str(round(float(x),n))
        if n > 0:
            s += '0' * (n - (len(s) - 1 - s.rfind('.')))
        return s

    def to_tuples(x):
        r = np.empty([x.shape[1]],dtype = np.object_)
        r[:] = [tuple(e) for e in x.T]
        return r
    
    searchsorted2d = {
        'py.zip': SearchSorted2D('py.zip'),'np.nonzero': SearchSorted2D('np.nonzero'),'np.lexsort': SearchSorted2D('np.lexsort'),'cmp_numba': SearchSorted2D('cmp_numba'),}
    
    for iklen,klen in enumerate([1,2,5,10,20,50,100,200]):
        times = {}
        for side in ['left','right']:
            a = np.zeros((klen,0),dtype = np.int64)
            tac = to_tuples(a)

            for itest in range((15,100)[iklen == 0]):
                b = np.random.randint(0,(3,100000)[iklen == 0],(klen,np.random.randint(1,(1000,2000)[iklen == 0])),dtype = np.int64)
                b = b[:,np.lexsort(b[::-1])]
                
                if iklen == 0:
                    assert klen == 1,klen
                    ts = time.time()
                    ix1 = np.searchsorted(a[0],b[0],side = side)
                    te = time.time()
                    times['np.searchsorted'] = times.get('np.searchsorted',0.) + te - ts
                    
                for cached in [False,True]:
                    ts = time.time()
                    tb = to_tuples(b)
                    ta = tac if cached else to_tuples(a)
                    ix1 = np.searchsorted(ta,tb,side = side)
                    if not cached:
                        ix0 = ix1
                    tac = np.insert(tac,ix0,tb) if cached else tac
                    te = time.time()
                    timesk = f'py.tuples{("","_cached")[cached]}'
                    times[timesk] = times.get(timesk,0.) + te - ts

                for type_ in searchsorted2d.keys():
                    if iklen == 0 and type_ in ['np.nonzero','np.lexsort']:
                        continue
                    ss = searchsorted2d[type_]
                    try:
                        ts = time.time()
                        ix1 = ss(a,b,side = side)
                        te = time.time()
                        times[type_] = times.get(type_,0.) + te - ts
                        assert np.array_equal(ix0,ix1)
                    except Exception:
                        times[type_ + '!failed'] = 0.

                for numba_ in [False,True]:
                    for vsorted in [False,True]:
                        if numba_:
                            # Heat-up/pre-compile numba
                            SearchSorted2DNumba.do(a,b,side = side,vsorted = vsorted,numba_ = numba_)
                        
                        ts = time.time()
                        ix1 = SearchSorted2DNumba.do(a,b,side = side,vsorted = vsorted,numba_ = numba_)
                        te = time.time()
                        timesk = f'numba{("_py","_nm")[numba_]}{("","_sorted")[vsorted]}'
                        times[timesk] = times.get(timesk,0.) + te - ts
                        assert np.array_equal(ix0,ix1)


                # View-1D methods suggested by @MadPhysicist
                if False: # Commented out as working just some-times
                    aT,bT = np.copy(a.T),np.copy(b.T)
                    assert aT.ndim == 2 and bT.ndim == 2 and aT.shape[1] == klen and bT.shape[1] == klen,(aT.shape,bT.shape,klen)
                    
                    for ty in ['if','cf']:
                        try:
                            dt = np.dtype({'if': [('',b.dtype)] * klen,'cf': [('row',b.dtype,klen)]}[ty])
                            ts = time.time()
                            va = np.ndarray(aT.shape[:1],dtype = dt,buffer = aT)
                            vb = np.ndarray(bT.shape[:1],dtype = dt,buffer = bT)
                            ix1 = np.searchsorted(va,vb,side = side)
                            te = time.time()
                            assert np.array_equal(ix0,ix1),(ix0.shape,ix1.shape,ix0[:20],ix1[:20])
                            times[f'view1d_{ty}'] = times.get(f'view1d_{ty}',0.) + te - ts
                        except Exception:
                            raise
                
                a = np.insert(a,ix0,b,axis = 1)
            
        stimes = ([f'key_len: {str(klen).rjust(3)}'] +
            [f'{k}: {round_float_fixed_str(v,4).rjust(7)}' for k,v in times.items()])
        nlines = 4
        print('-' * 50 + '\n' + ('','!LARGE!:\n')[iklen == 0],end = '')
        for i in range(nlines):
            print(','.join(stimes[len(stimes) * i // nlines : len(stimes) * (i + 1) // nlines]),flush = True)
            
Test()

Output:

--------------------------------------------------
!LARGE!:
key_len:   1,np.searchsorted:  0.0250
py.tuples_cached:  3.3113,py.tuples: 30.5263,py.zip: 40.9785
cmp_numba: 25.7826,numba_py:  3.6673
numba_py_sorted:  6.8926,numba_nm:  0.0466,numba_nm_sorted:  0.0505
--------------------------------------------------
key_len:   1,py.tuples_cached:  0.1371
py.tuples:  0.4698,py.zip:  1.2005,np.nonzero:  4.7827
np.lexsort:  4.4672,cmp_numba:  1.0644,numba_py:  0.2748
numba_py_sorted:  0.5699,numba_nm:  0.0005,numba_nm_sorted:  0.0020
--------------------------------------------------
key_len:   2,py.tuples_cached:  0.1131
py.tuples:  0.3643,py.zip:  1.0670,np.nonzero:  4.5199
np.lexsort:  3.4595,cmp_numba:  0.8582,numba_py:  0.4958
numba_py_sorted:  0.6454,numba_nm:  0.0025,numba_nm_sorted:  0.0025
--------------------------------------------------
key_len:   5,py.tuples_cached:  0.1876
py.tuples:  0.4493,py.zip:  1.6342,np.nonzero:  5.5168
np.lexsort:  4.6086,cmp_numba:  1.0939,numba_py:  1.0607
numba_py_sorted:  0.9737,numba_nm:  0.0050,numba_nm_sorted:  0.0065
--------------------------------------------------
key_len:  10,py.tuples_cached:  0.6017
py.tuples:  1.2275,py.zip:  3.5276,np.nonzero: 13.5460
np.lexsort: 12.4183,cmp_numba:  2.5404,numba_py:  2.8334
numba_py_sorted:  2.3991,numba_nm:  0.0165,numba_nm_sorted:  0.0155
--------------------------------------------------
key_len:  20,py.tuples_cached:  0.8316
py.tuples:  1.3759,py.zip:  3.4238,np.nonzero: 13.7834
np.lexsort: 16.2164,cmp_numba:  2.4483,numba_py:  2.6405
numba_py_sorted:  2.2226,numba_nm:  0.0170,numba_nm_sorted:  0.0160
--------------------------------------------------
key_len:  50,py.tuples_cached:  1.0443
py.tuples:  1.4085,py.zip:  2.2475,np.nonzero:  9.1673
np.lexsort: 19.5266,cmp_numba:  1.6181,numba_py:  1.7731
numba_py_sorted:  1.4637,numba_nm:  0.0415,numba_nm_sorted:  0.0405
--------------------------------------------------
key_len: 100,py.tuples_cached:  2.0136
py.tuples:  2.5380,py.zip:  2.2279,np.nonzero:  9.2929
np.lexsort: 33.9505,cmp_numba:  1.5722,numba_py:  1.7158
numba_py_sorted:  1.4208,numba_nm:  0.0871,numba_nm_sorted:  0.0851
--------------------------------------------------
key_len: 200,py.tuples_cached:  3.5945
py.tuples:  4.1847,py.zip:  2.3553,np.nonzero: 11.3781
np.lexsort: 66.0104,cmp_numba:  1.8153,numba_py:  1.9449
numba_py_sorted:  1.6463,numba_nm:  0.1661,numba_nm_sorted:  0.1651

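According to the timings, the numba_nm implementation is the fastest; it outperforms the next fastest (py.zip or py.tuples_cached) by 15-100x. For the 1D case its speed is comparable to the standard np.searchsorted (it is 1.85x slower). It also appears that the _sorted flavor, which uses the information that the inserted array is sorted, does not improve the situation.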

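The cmp_numba method, which is compiled to machine code, appears to be about 1.5x faster on average than py.zip, which runs the same algorithm in pure Python. Because the average maximal equal-key depth is only around 15-18 elements, numba does not gain much speedup here. If the depth were in the hundreds, the numba code would probably have a large speedup.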

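The py.tuples_cached strategy is faster than py.zip for key lengths <= 100.

It also appears that np.lexsort is in fact very slow. Either it is not optimized for the case of just two columns, or it spends time on preprocessing such as splitting rows into lists, or it does a non-lazy lexicographic comparison. The last case is probably the real reason, since lexsort slows down as the key length grows.

The np.nonzero strategy is also non-lazy, so it is slow too and slows down as the key length grows (though not as quickly as np.lexsort does).

The timings above may not be precise, because my CPU randomly lowers its core frequency by 2-2.3x whenever it overheats, and it overheats often because it is a powerful CPU inside a laptop.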

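You have two things that can help you here: (1) you can sort and search structured arrays, and (2) if you have finite collections that can be mapped to integers, you can use that to your advantage.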

Viewing as 1D

Suppose you have an array of strings that you want to insert into:

data = np.array([['a','1'],['a','z'],['b','a']],dtype=object)

Since arrays are never ragged, you can construct a dtype that is the size of a row:

dt = np.dtype([('',data.dtype)] * data.shape[1])

Using my shamelessly plugged answer here, you can now view the original 2D array as 1D:

view = np.ndarray(data.shape[:1],dtype=dt,buffer=data)

Searching can now be done in a completely straightforward manner:

key = np.array([('a','a')],dtype=dt)
index = np.searchsorted(view,key)

You can even find the insertion indices of incomplete elements by using the appropriate minimum values. For strings this would be ''.
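
For comparison, the same per-field view can be sketched on a plain integer array. The helper name as_records and the sample data are hypothetical, and the sketch assumes a C-contiguous array with one key per row. Integers are used here because the behavior with object arrays is reported as problematic elsewhere on this page.

```python
import numpy as np

def as_records(rows):
    """View each row of a 2D array as one structured element (one field per column)."""
    rows = np.ascontiguousarray(rows)
    dt = np.dtype([(f'f{i}', rows.dtype) for i in range(rows.shape[1])])
    return rows.view(dt).ravel()

# Rows already in lexicographic order.
data = np.array([[1, 0], [1, 5], [2, 1], [2, 3]], dtype=np.int64)
keys = np.array([[1, 2], [2, 1], [3, 0]], dtype=np.int64)

# Structured elements are compared field by field, i.e. lexicographically.
index = np.searchsorted(as_records(data), as_records(keys))
```

Note that the layout is transposed relative to the first answer: here each row is one key, while there each column is.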

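Faster Comparison

You may get better mileage out of the comparison if you do not have to check each field of the dtype. You can make a similar dtype with a single homogeneous field: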

dt2 = np.dtype([('row',data.dtype,data.shape[1])])

Constructing the view is the same as before:

view = np.ndarray(data.shape[:1],dtype=dt2,buffer=data)

The key is built a little differently this time (another plug here):

key = np.array([(['a','a'],)],dtype=dt2)

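The sort order imposed on objects is not correct with this method: Sorting array of objects by row using custom dtype. I am leaving a reference here in case the linked question gets a fix. Also, the method is still quite useful for sorting integers.

Integer Mapping

If you have a finite number of objects to search for, it is easier to map them to integers: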

idata = np.empty(data.shape,dtype=int)
keys = [None] * data.shape[1]     # Map index to key per column
indices = [None] * data.shape[1]  # Map key to index per column
for i in range(data.shape[1]):
    keys[i],idata[:,i] = np.unique(data[:,i],return_inverse=True)
    indices[i] = {k: i for i,k in enumerate(keys[i])}  # Assumes hashable objects

idt = np.dtype([('row',idata.dtype,idata.shape[1])])
view = idata.view(idt).ravel()

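This method only works if data actually contains all the possible keys in each column. Otherwise, you will have to get the forward and reverse mappings by other means. Once you have this established, setting up the keys is much simpler and only requires indices: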

key = np.array([index[k] for index,k in zip(indices,['a','a'])])
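
A compact end-to-end sketch of the mapping step, on made-up data: the final lookup uses Python's bisect on row tuples purely to keep the example short, which is an assumption of this sketch and not the structured view used above. Since np.unique returns its values sorted, the integer codes preserve the per-column order.

```python
import bisect
import numpy as np

# Hypothetical data, rows already in lexicographic order.
data = np.array([['a', '1'], ['a', 'z'], ['b', 'a']])

idata = np.empty(data.shape, dtype=np.int64)
indices = []  # per column: dict mapping original value -> integer code
for i in range(data.shape[1]):
    uniq, inv = np.unique(data[:, i], return_inverse=True)
    idata[:, i] = inv.reshape(-1)
    indices.append({str(k): code for code, k in enumerate(uniq)})

# Translate the search key ('a', 'a') into integer codes, column by column.
key = [index[k] for index, k in zip(indices, ['a', 'a'])]

# Locate the key among the mapped rows.
rows = [tuple(r) for r in idata.tolist()]
pos = bisect.bisect_left(rows, tuple(key))
```

As noted above, a key value that does not occur in its column has no code in this scheme and would raise KeyError.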

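Further Improvements

If you have eight or fewer categories, and each category has 256 or fewer elements, you can construct an even better hash by fitting everything into a single np.uint64 element or so.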

k = math.ceil(math.log(data.shape[1],2))  # math.log provides base directly
assert 0 < k <= 64
idata = np.empty((data.shape[0],k),dtype=np.uint8)
...
idata = idata.view(f'>u{k}').ravel()

Keys are made similarly as well:

key = np.array([index[k] for index,k in zip(indices,['a','a'])]).view(f'>u{k}')
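
The packing idea can be sketched as follows. The helper name pack_rows and the sample codes are hypothetical, and the sketch assumes at most eight columns with every code fitting in a uint8. Rows are zero-padded to eight bytes and read as one big-endian 64-bit integer, so integer order matches the lexicographic order of the rows.

```python
import numpy as np

def pack_rows(idata):
    """Pack each row of up to 8 uint8 codes into a single uint64."""
    idata = np.asarray(idata, dtype=np.uint8)
    m, n = idata.shape
    assert n <= 8, n
    padded = np.zeros((m, 8), dtype=np.uint8)
    padded[:, :n] = idata  # zero-pad the least significant bytes
    # Big-endian view: column 0 becomes the most significant byte.
    return padded.view('>u8').ravel().astype(np.uint64)

idata = np.array([[0, 0], [0, 2], [1, 1]], dtype=np.uint8)  # sorted rows
ikeys = np.array([[0, 1], [1, 1]], dtype=np.uint8)          # mapped keys
pos = np.searchsorted(pack_rows(idata), pack_rows(ikeys))
```

The final astype converts to the native byte order, so the search runs on an ordinary uint64 array.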

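Timing

I've timed the methods shown here (not the other answers) using randomly shuffled strings. The key timing parameters are:

M: number of rows: 10 ** {2, 3, 4, 5}
N: number of columns: 2 ** {3, 4, 5, 6}
K: number of elements to insert: 1, 10, M // 10
Method: individual_fields, combined_field, int_mapping, int_packing. The functions are shown below.

For the last two methods, I assume that you will pre-convert the data into the mapped dtype, but not the search keys. I am therefore passing in the converted data, but timing the conversion of the keys.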

import numpy as np
from math import ceil,log

def individual_fields(data,keys):
    dt = [('',data.dtype)] * data.shape[1]
    dview = np.ndarray(data.shape[:1],dtype=dt,buffer=data)
    kview = np.ndarray(keys.shape[:1],dtype=dt,buffer=keys)
    return np.searchsorted(dview,kview)

def combined_fields(data,keys):
    dt = [('row',data.dtype,data.shape[1])]
    dview = np.ndarray(data.shape[:1],dtype=dt,buffer=data)
    kview = np.ndarray(keys.shape[:1],dtype=dt,buffer=keys)
    return np.searchsorted(dview,kview)

def int_mapping(idata,keys,indices):
    idt = np.dtype([('row',idata.dtype,idata.shape[1])])
    dview = idata.view(idt).ravel()
    kview = np.empty(keys.shape[0],dtype=idt)
    for i,(index,key) in enumerate(zip(indices,keys.T)):
        kview['row'][:,i] = [index[k] for k in key]
    return np.searchsorted(dview,kview)

def int_packing(idata,keys,indices):
    idt = f'>u{idata.shape[1]}'
    dview = idata.view(idt).ravel()
    kview = np.empty(keys.shape,dtype=np.uint8)
    for i,(index,key) in enumerate(zip(indices,keys.T)):
        kview[:,i] = [index[k] for k in key]
    kview = kview.view(idt).ravel()
    return np.searchsorted(dview,kview)

The timing code:

from math import ceil,log
from string import ascii_lowercase
from timeit import Timer

def time(m,n,k,fn,*args):
    t = Timer(lambda: fn(*args))
    s = t.autorange()[0]
    print(f'M={m}; N={n}; K={k} {fn.__name__}: {min(t.repeat(5,s)) / s}')

selection = np.array(list(ascii_lowercase),dtype=object)
for lM in range(2,6):
    M = 10**lM
    for lN in range(3,7):
        N = 2**lN
        data = np.random.choice(selection,size=(M,N))
        np.ndarray(data.shape[0],dtype=[('',data.dtype)] * data.shape[1],buffer=data).sort()
        idata = np.array([[ord(a) - ord('a') for a in row] for row in data],dtype=np.uint8)
        ikeys = [selection] * data.shape[1]
        indices = [{k: i for i,k in enumerate(selection)}] * data.shape[1]
        for K in (1,10,M // 10):
            key = np.random.choice(selection,size=(K,N))
            time(M,N,K,individual_fields,data,key)
            time(M,N,K,combined_fields,data,key)
            time(M,N,K,int_mapping,idata,key,indices)
            if N <= 8:
                time(M,N,K,int_packing,idata,key,indices)

The results:

M = 100 (units = us)

   |                           K                           |
   +---------------------------+---------------------------+
N  |             1             |            10             |
   +------+------+------+------+------+------+------+------+
   |  IF  |  CF  |  IM  |  IP  |  IF  |  CF  |  IM  |  IP  |
---+------+------+------+------+------+------+------+------+
 8 | 25.9 | 18.6 | 52.6 | 48.2 | 35.8 | 22.7 | 76.3 | 68.2 | 
16 | 40.1 | 19.0 | 87.6 |  --  | 51.1 | 22.8 | 130. |  --  |
32 | 68.3 | 18.7 | 157. |  --  | 79.1 | 22.4 | 236. |  --  |
64 | 125. | 18.7 | 290. |  --  | 135. | 22.4 | 447. |  --  |
---+------+------+------+------+------+------+------+------+

M = 1000 (units = us)

   |                                         K                                         |
   +---------------------------+---------------------------+---------------------------+
N  |             1             |            10             |            100            |
   +------+------+------+------+------+------+------+------+------+------+------+------+
   |  IF  |  CF  |  IM  |  IP  |  IF  |  CF  |  IM  |  IP  |  IF  |  CF  |  IM  |  IP  |
---+------+------+------+------+------+------+------+------+------+------+------+------+
 8 | 26.9 | 19.1 | 55.0 | 55.0 | 44.8 | 25.1 | 79.2 | 75.0 | 218. | 74.4 | 305. | 250. |
16 | 41.0 | 19.2 | 90.5 |  --  | 59.3 | 24.6 | 134. |  --  | 244. | 79.0 | 524. |  --  | 
32 | 68.5 | 19.0 | 159. |  --  | 87.4 | 24.7 | 241. |  --  | 271. | 80.5 | 984. |  --  |
64 | 128. | 19.7 | 312. |  --  | 168. | 26.0 | 549. |  --  | 396. | 7.78 | 2.0k |  --  |
---+------+------+------+------+------+------+------+------+------+------+------+------+

M = 10K (units = us)

   |                                         K                                         |
   +---------------------------+---------------------------+---------------------------+
N  |             1             |            10             |           1000            |
   +------+------+------+------+------+------+------+------+------+------+------+------+
   |  IF  |  CF  |  IM  |  IP  |  IF  |  CF  |  IM  |  IP  |  IF  |  CF  |  IM  |  IP  |
---+------+------+------+------+------+------+------+------+------+------+------+------+
 8 | 28.8 | 19.5 | 54.5 | 107. | 57.0 | 27.2 | 90.5 | 128. | 3.2k | 762. | 2.7k | 2.1k |
16 | 42.5 | 19.6 | 90.4 |  --  | 73.0 | 27.2 | 140. |  --  | 3.3k | 752. | 4.6k |  --  |
32 | 73.0 | 19.7 | 164. |  --  | 104. | 26.7 | 246. |  --  | 3.4k | 803. | 8.6k |  --  |
64 | 135. | 19.8 | 302. |  --  | 162. | 26.1 | 466. |  --  | 3.7k | 791. | 17.k |  --  |
---+------+------+------+------+------+------+------+------+------+------+------+------+

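individual_fields (IF) is generally the fastest working method. Its cost grows in proportion to the number of columns. Unfortunately, combined_fields (CF) does not work for object arrays. Otherwise, it would be not only the fastest method, but also one whose cost does not grow with the number of columns.

None of the techniques I thought would be faster actually are, because mapping Python objects to keys is slow (the actual lookup in packed int arrays, for example, is much faster than in structured arrays).

References

Here are the additional questions I had to ask to get this code working at all: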

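I am posting the first naive solution that I mentioned in my question. It just converts the 2D array to a 1D array of dtype = np.object_ that contains the original columns as Python tuples, and then uses the 1D np.searchsorted. The solution works for any dtype. In fact, this solution is not that naive: it is quite fast, as measured in my other answer to this question, especially for key lengths below 100.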
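
The code for this answer is not reproduced here. A minimal sketch of the approach it describes could look like the following, where the function name searchsorted_tuples and the sample data are hypothetical and to_tuples mirrors the helper of the same name in the test code of the other answer.

```python
import numpy as np

def to_tuples(x):
    """Return a 1D object array whose i-th element is column i of x as a tuple."""
    r = np.empty(x.shape[1], dtype=np.object_)
    for i, col in enumerate(x.T.tolist()):
        r[i] = tuple(col)  # element-wise assignment keeps each tuple intact
    return r

def searchsorted_tuples(a, v, side='left'):
    """Lexicographic searchsorted of the columns of v into the columns of a."""
    a, v = np.asarray(a), np.asarray(v)
    return np.searchsorted(to_tuples(a), to_tuples(v), side=side)

a = np.array([[1, 1, 2, 2],
              [0, 5, 1, 3]])   # columns already in lexicographic order
v = np.array([[1, 2, 3],
              [2, 1, 0]])
left = searchsorted_tuples(a, v)
right = searchsorted_tuples(a, v, side='right')
```

The keys are wrapped in an object array as well, because passing a bare tuple to np.searchsorted would have it interpreted as a sequence of separate values.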
