基于嵌套列表中包含的 id 元素比较两个通用列表的最有效方法 (C#)

问题描述

我有两个通用的项目列表，每个列表都包含一个供应商列表及其 ID：

List<ExisitingItems>
    List<suppliers>

List <PotentialMatches>
    List<suppliers>

suppliers
    supplierId
    Name

我需要根据每个项目中的嵌套供应商 ID 和名称，将现有项目列表与潜在匹配项列表进行比较。

我目前将这两个列表与所需的结果进行比较：

  foreach (var potentialMatch in _potentialMatches)
            {
foreach (var supplier in potentialMatch.suppliers)
                {
                    var match = ExistingItems.Find
                       (e => e.suppliers.Any
                        (s => s.supplierId == supplier.supplierItemId && s.Name == supplier.Name));

//Do stuff with match
                }
             }

但是，当处理超过 500k 的大量记录时，效率不高且执行速度非常慢。

如何才能更有效地执行相同类型的比较？

解决方法

您当前的算法似乎是 O(n*m*s*s)，其中 n = 现有项目的数量，m = 潜在匹配的数量，s = 每个现有项目/PotentialMatch 的平均供应商数量。您可以通过使用哈希集来匹配供应商，从而将运行时间减少到 O(n*m*s)。

通用版本看起来像这样

public static IEnumerable<(T1,T2)> SetJoin<T1,T2,TKey>(
        IEnumerable<T1> t1s,IEnumerable<T2> t2s,Func<T1,IEnumerable<TKey>> t1Key,Func<T2,IEnumerable<TKey>> t2Key) where TKey : IEquatable<TKey>
    {
        foreach (var t1 in t1s)
        {
            var t1Keys = new HashSet<TKey>(t1Key(t1));
            foreach (var t2 in t2s)
            {
                // t2Key(t2) would be called many times,// might be worth pre-computing it for each t2.
                if (t2Key(t2).Any(t1Keys.Contains))
                {
                    yield return (t1,t2);
                }
            }    
        }
    }

并称其为

SetJoin<ExistingItems,PotentialMatches,int>(
              existingItems,potentialMatches,e=> e.Suppliers.Select(s => s.Id),p => p.Suppliers.Select(s => s.Id))

此外，虽然 linq 可生成紧凑且美观的代码，但如果性能很重要，使用常规循环编写等效逻辑通常会更快。

c#c#generic-list ienumerable linq linq