IEnumerable.Select

Problem description

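I need to break a large table down into a series of two-column tables so that I can dynamically create table rules for a configurator engine. This code demonstrates the problem: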

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

namespace Spike
{
    class Program
    {
        static void Main(string[] args)
        {
            // The actual data I need to break down has ~20 properties of type string and decimal, over 18,000 rows
            var data = new List<MyData>()
            {
                new MyData("one", "two", 3m, "four"),
                new MyData("five", "six", 7m, "eight"),
                new MyData("nine", "ten", 11m, "twelve"),
                new MyData("thirteen", "fourteen", 15m, "sixteen"),
                new MyData("one", "five", 9m, "thirteen"),
                new MyData("two", "six", 10m, "fourteen"),
                new MyData("three", "seven", 11m, "fifteen"),
                new MyData("four", "eight", 12m, "sixteen")
            };

            // This shows the desired combinations of properties
            // The actual data will have ~230 combinations
            var properties = typeof(MyData).GetProperties(BindingFlags.Instance | BindingFlags.Public);
            for (var i = 0; i < properties.Length - 1; i++)
            {
                for (var j = i + 1; j < properties.Length; j++)
                {
                    Console.WriteLine($"{properties[i].Name} <=> {properties[j].Name}");
                }
            }
            /* output:
                P1 <=> P2
                P1 <=> P3
                P1 <=> P4
                P2 <=> P3
                P2 <=> P4
                P3 <=> P4
            */

            // This shows how I want one combination to appear
            // The challenge seems to be the creation of a dynamic lambda in the Select method.
            var items = data.Select(x => new { x.P2, x.P3 }).Distinct().ToList();
            Console.WriteLine();
            items.ForEach(x => Console.WriteLine($"{x.P2},{x.P3}"));
            /* output:
                two,3
                six,7
                ten,11
                fourteen,15
                five,9
                six,10
                seven,11
                eight,12
            */

            Console.ReadKey();
        }
    }

    public class MyData
    {
        public string P1 { get; set; }
        public string P2 { get; set; }
        public decimal P3 { get; set; }
        public string P4 { get; set; }

        public MyData(string p1,string p2,decimal p3,string p4)
        {
            P1 = p1;
            P2 = p2;
            P3 = p3;
            P4 = p4;
        }
    }
}

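I have looked into LINQ, reflection, and expression trees, but I cannot seem to get past the hurdle of building this expression dynamically:

var items = data.Select(x => new { x.P2, x.P3 }).Distinct().ToList();

where x.P2 and x.P3 are dynamic.

This post seemed to be heading in the right direction, but I could not get a result from it.

Any suggestions? Thanks in advance!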

Solution

I hope I have understood your question correctly. Here is a simple extension that enumerates the required pairs:

var items = data.EnumeratePropPairs().Distinct().ToList();
items.ForEach(x => Console.WriteLine($"{x.Item1},{x.Item2}"));

And the implementation:

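using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class EnumerableExtensions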
{
    public static IEnumerable<Tuple<string,string>> EnumeratePropPairs<T>(this IEnumerable<T> items)
    {
        var properties = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);
        var param = Expression.Parameter(typeof(T));
        var accessors = properties.ToDictionary(p => p,p =>
        {
            var body = (Expression)Expression.MakeMemberAccess(param,p);
            if (body.Type != typeof(string))
            {
                body = Expression.Call(body,"ToString",Type.EmptyTypes);
            }

            var lambda = Expression.Lambda<Func<T,string>>(body,param);
            return lambda.Compile();
        });

        var pairs = new List<Tuple<Func<T,string>,Func<T,string>>>();

        for (var i = 0; i < properties.Length - 1; i++)
        {
            var prop1 = properties[i];
            var prop1Accessor = accessors[prop1];

            for (var j = i + 1; j < properties.Length; j++)
            {
                var prop2 = properties[j];
                var prop2Accessor = accessors[prop2];

                pairs.Add(Tuple.Create(prop1Accessor,prop2Accessor));
            }
        }

        return items.SelectMany(item => pairs.Select(p => Tuple.Create(p.Item1(item),p.Item2(item))));
    }
}
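Note that the extension above returns the value pairs of every property combination in one flat sequence, so a single Distinct() runs across all combinations at once. If one separate two-column table per combination is wanted, a variant along the following lines might work. This is only a sketch built on the same accessor idea: the names PairTableExtensions and ToPairTables are made up for illustration, the dictionary key format is an arbitrary choice, and it assumes every public property is a plain non-null, non-indexer property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper, not part of the answer above.
public static class PairTableExtensions
{
    // Returns one distinct two-column table per property pair,
    // keyed by a label such as "P1 <=> P2".
    public static Dictionary<string, List<Tuple<string, string>>> ToPairTables<T>(
        this IEnumerable<T> items)
    {
        var rows = items.ToList(); // enumerate the source only once
        var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);
        var param = Expression.Parameter(typeof(T), "x");

        // Compile one string-returning getter per property.
        var getters = props.Select(p =>
        {
            Expression body = Expression.Property(param, p);
            if (body.Type != typeof(string))
            {
                // Non-string columns (e.g. decimal) are converted with ToString().
                body = Expression.Call(body, "ToString", Type.EmptyTypes);
            }
            return Expression.Lambda<Func<T, string>>(body, param).Compile();
        }).ToArray();

        var tables = new Dictionary<string, List<Tuple<string, string>>>();
        for (var i = 0; i < props.Length - 1; i++)
        {
            for (var j = i + 1; j < props.Length; j++)
            {
                var first = getters[i];
                var second = getters[j];
                // Distinct() is applied per pair, so each table is deduplicated on its own.
                tables[$"{props[i].Name} <=> {props[j].Name}"] =
                    rows.Select(r => Tuple.Create(first(r), second(r))).Distinct().ToList();
            }
        }
        return tables;
    }
}
```

Calling data.ToPairTables() would then give a dictionary whose entries can be iterated to emit each table separately. Tuple<string, string> compares by value, which is what lets Distinct() remove duplicate rows here.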

Fortunately, I stumbled across an answer in this Fiddle, which uses the NuGet package LatticeUtils.Core. This snippet illustrates the result:

            var properties = typeof(MyData).GetProperties(BindingFlags.Instance | BindingFlags.Public);
            for (var i = 0; i < properties.Length - 1; i++)
            {
                for (var j = i + 1; j < properties.Length; j++)
                {
                    var subTable = data.SelectDynamic(new[] { properties[i].Name,properties[j].Name }).Distinct();
                    Console.WriteLine($"{properties[i].Name} <=> {properties[j].Name}: {subTable.Count()}");
                }
            }

With a source data set of 18,753 rows and 21 columns, the output is:

P1 <=> P2: 26
...
P2 <=> P3: 18
... and so forth
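The number of lines this nested loop prints is the number of unordered column pairs, n(n-1)/2 for n columns. A tiny sketch of that arithmetic follows; PairCount is a made-up name used only for illustration.

```csharp
using System;

// Hypothetical helper for checking how many two-column tables to expect.
public static class PairCount
{
    // Number of unordered pairs that can be formed from `columns` distinct columns.
    public static int Of(int columns)
    {
        if (columns < 0) throw new ArgumentOutOfRangeException(nameof(columns));
        return columns * (columns - 1) / 2;
    }
}
```

For the four-property MyData class in the question, PairCount.Of(4) is 6, matching the six combinations listed there, and PairCount.Of(21) is 210 for the 21-column data set.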

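This lets me programmatically create the 210 two-column tables that the target system will accept, which would be impractical for a person to enter through the vendor's UI.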
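For anyone who would rather not take a package dependency, a two-property selector can probably also be built by hand with expression trees. The following is a minimal sketch under my own assumptions, not how LatticeUtils.Core implements SelectDynamic: PairSelector and Build are hypothetical names, and both values are boxed to object so that string and decimal columns can share one return type.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper, unrelated to LatticeUtils.Core.
public static class PairSelector
{
    // Builds the equivalent of: x => new Tuple<object, object>(x.first, x.second)
    public static Func<T, Tuple<object, object>> Build<T>(string first, string second)
    {
        var param = Expression.Parameter(typeof(T), "x");

        // Expression.Property throws ArgumentException if the property name does not exist.
        var left = Expression.Convert(Expression.Property(param, first), typeof(object));
        var right = Expression.Convert(Expression.Property(param, second), typeof(object));

        var ctor = typeof(Tuple<object, object>)
            .GetConstructor(new[] { typeof(object), typeof(object) });
        var body = Expression.New(ctor, left, right);

        return Expression.Lambda<Func<T, Tuple<object, object>>>(body, param).Compile();
    }
}
```

With the MyData class from the question, data.Select(PairSelector.Build<MyData>("P2", "P3")).Distinct().ToList() should reproduce the P2/P3 table. Distinct() works because Tuple compares its components by value, and boxed strings and decimals still use their own Equals and GetHashCode.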