Merging lists by unique key in Python

Problem description

from operator import itemgetter

def merge(a, b):
    '''
    Merge two tuple lists and remove duplicates, sorting by the first element of each tuple.
    Used for merging time series data to existing data.

    a = [          (1,123),(2,122),(3,121),(4,120)]
    b = [(0,(1,999),120),(5,123)]
    merge(a,b) == [(0,123)]
    '''
    L = list(set(a + b))
    L.sort(key=itemgetter(0))
    return L

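The function above is used to merge time series data. The first element of each tuple is actually a datetime.datetime object.
What I need from the merge:

Tuples should be sorted in ascending order by their first element
No two tuples may share the same key (the first element)
Values from the right-hand list take precedence during the merge, so that: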

merge(a,123)]

Note that the tuple (3,999) from the right-hand list (variable b) has replaced the tuple (3,121).
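
The three requirements above can be met without keeping a dict around. What follows is a minimal sketch, not a definitive implementation: it relies on Python's sort being stable, so that for equal keys the items from a come before the items from b, and then keeps only the last item of each run of equal keys. The name merge_unique and the sample values for b are illustrative assumptions.

```python
from operator import itemgetter


def merge_unique(a, b):
    """Merge two lists of (key, value) tuples.

    The result is sorted by key, contains each key once, and prefers
    the value from b when both lists contain the same key.
    """
    # sorted() is stable: for equal keys, items from a stay ahead of items from b.
    combined = sorted(a + b, key=itemgetter(0))
    result = []
    for item in combined:
        if result and result[-1][0] == item[0]:
            # Same key as the previous item: the later one (from b) wins.
            result[-1] = item
        else:
            result.append(item)
    return result


a = [(1, 123), (2, 122), (3, 121), (4, 120)]
b = [(0, 123), (3, 999), (5, 123)]  # illustrative sample, not from the question
print(merge_unique(a, b))
# [(0, 123), (1, 123), (2, 122), (3, 999), (4, 120), (5, 123)]
```

The temporary list a + b costs one extra list of references, but no hash table is built at any point.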

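A dict would help enforce the unique-key constraint, but in my tests comparing a simple dict with the equivalent list, the dict used roughly 3-6 times the memory of the list, which is unacceptable.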

from datetime import datetime,timedelta
from sys import getsizeof
for exp in range(3,7):
    atime = datetime(2021,5,17,18,0)
    num_elements = 10**exp
    alist = [(atime + timedelta(seconds=i),i) for i in range(num_elements)]
    adict = {atime + timedelta(seconds=i): i for i in range(num_elements)}
    alist_size = getsizeof(alist)
    adict_size = getsizeof(adict)
    print(f'{num_elements=}: {adict_size / alist_size = :.1f}')

num_elements=1000: adict_size / alist_size = 4.1
num_elements=10000: adict_size / alist_size = 3.4
num_elements=100000: adict_size / alist_size = 6.4
num_elements=1000000: adict_size / alist_size = 4.8
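
One caveat about this measurement: sys.getsizeof reports only the container's own footprint (the list's pointer array or the dict's hash table), not the objects it refers to. The list version also holds one tuple per entry, which the dict version does not need. The sketch below adds the element sizes for both structures. The helper names list_total_size and dict_total_size are hypothetical, the totals are approximate (shared objects such as cached small ints are counted once per reference), and the numbers are specific to CPython.

```python
from datetime import datetime, timedelta
from sys import getsizeof


def list_total_size(pairs):
    """Approximate size of a list of 2-tuples, including tuples and their items."""
    total = getsizeof(pairs)
    for pair in pairs:
        total += getsizeof(pair) + sum(getsizeof(x) for x in pair)
    return total


def dict_total_size(mapping):
    """Approximate size of a dict, including its keys and values."""
    total = getsizeof(mapping)
    for key, value in mapping.items():
        total += getsizeof(key) + getsizeof(value)
    return total


atime = datetime(2021, 5, 17, 18, 0)
num_elements = 1000
alist = [(atime + timedelta(seconds=i), i) for i in range(num_elements)]
adict = {atime + timedelta(seconds=i): i for i in range(num_elements)}

shallow_ratio = getsizeof(adict) / getsizeof(alist)
deep_ratio = dict_total_size(adict) / list_total_size(alist)
print(f'{shallow_ratio = :.1f}, {deep_ratio = :.1f}')
```

Once the elements are counted, the gap between the two structures may turn out to be much smaller than the container-only ratios suggest, so it is worth measuring on real data before ruling a dict out.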

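I am currently looking into the Pandas DataFrame, which may also be useful.
I would appreciate any pointers to functions or data structures that provide this behaviour simply and efficiently. Thanks for reading.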
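
If both lists are already sorted by key and each contains no repeated keys (an assumption that often holds for time series, though the question does not state it), a linear two-pointer merge avoids both the sort and any dict. This is a sketch under that assumption, and merge_sorted is a hypothetical name.

```python
from datetime import datetime, timedelta


def merge_sorted(a, b):
    """Merge two key-sorted lists of (key, value) tuples in O(len(a) + len(b)).

    Assumes each input is sorted by key and has no repeated keys.
    When both lists contain the same key, the tuple from b is kept.
    """
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        key_a, key_b = a[i][0], b[j][0]
        if key_a < key_b:
            result.append(a[i])
            i += 1
        elif key_b < key_a:
            result.append(b[j])
            j += 1
        else:
            # Same key on both sides: the right-hand list takes precedence.
            result.append(b[j])
            i += 1
            j += 1
    # At most one of these slices is non-empty.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


t0 = datetime(2021, 5, 17, 18, 0)
old = [(t0 + timedelta(seconds=s), s) for s in (1, 2, 3, 4)]
new = [(t0, 0), (t0 + timedelta(seconds=3), 999)]
merged = merge_sorted(old, new)
```

Here merged starts with the new entry at t0, and the entry three seconds later carries the value 999 from the right-hand list.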


dictionary list merge python unique-index