Using ReadOnlySpan<T>

Problem description

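On .NET Core 3.1, I have many large 2D arrays, and I need to operate on part of a single row of such an array. The same slice may be used by several operations, so I would like to perform the slicing only once and reuse the slice.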

The sample code below slices an array and then calls two functions that operate on the slice.

public void MyFunc()
{
    double[,] array = ...;  // populate the array

    // select which part of the array to slice; the values are not important
    int index0 = 0;
    int startIndex1 = 1;
    int sliceLength = 2;

    // slice the array
    ReadOnlySpan<double> slice = Slice(array, index0, startIndex1, sliceLength);

    // do things with the slice
    DoSomething1(slice);
    DoSomething2(slice);
}

public unsafe ReadOnlySpan<double> Slice(double[,] array, int index0, int startIndex1, int sliceLength)
{
    int arrayLength = array.GetLength(0) * array.GetLength(1);
    int arrayStartIndex = index0 * array.GetLength(1) + startIndex1;
    ReadOnlySpan<double> slice;
    fixed (double* arrayPtr = array)
    {
        slice = new ReadOnlySpan<double>(arrayPtr, arrayLength).Slice(arrayStartIndex, sliceLength);
    }

    // does it matter if slice is returned inside or outside of the fixed block?
    return slice;
}

public void DoSomething1(ReadOnlySpan<double> slice)
{
    ...
}

public void DoSomething2(ReadOnlySpan<double> slice)
{
    ...
}

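"fixed" ensures that the GC does not move "array" while "slice" is being created. Once "slice" has been created, if the GC moves "array", does it also update "slice" to refer to the new address of "array", or does "slice" keep referring to the old address? In other words, will DoSomething1(...) and DoSomething2(...) always operate on the intended slice of the original array, or could they inadvertently operate on some random block of memory?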

Also, does it matter whether "return slice" is inside or outside the "fixed" block?

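Edit: Inspired by https://stackoverflow.com/a/40589439/13532170, I managed to write a test showing that V0ldek is correct: the GC updates the address held by the ReadOnlySpan when the parent array is moved.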

public static unsafe void ReadOnlySpanTest()
{
    // create 2D array
    double[,] array = new double[,] { { 1, 2, 3 }, { 4, 5, 6 } };

    // parameters to convert 2D array to 1D span
    int arrayLength = array.GetLength(0) * array.GetLength(1);
    int sliceStartIndex = 1;
    int sliceLength = 2;

    // create span
    IntPtr arrayAddressBeforeMove;
    ReadOnlySpan<double> spanFromPointer;
    fixed (double* arrayPtr = array)
    {
        arrayAddressBeforeMove = (IntPtr)arrayPtr;

        // spanFromPointer should contain { 2, 3 }
        spanFromPointer = new ReadOnlySpan<double>(arrayPtr, arrayLength).Slice(sliceStartIndex, sliceLength);
    }

    // trick GC into moving the array
    GC.AddMemoryPressure(10000000);
    GC.Collect();
    GC.RemoveMemoryPressure(10000000);

    // check array address and span contents again
    IntPtr arrayAddressAfterMove;
    fixed (double* arrayPtr = array)
    {
        // arrayAddressAfterMove should be different from arrayAddressBeforeMove
        arrayAddressAfterMove = (IntPtr) arrayPtr;

        // spanFromPointer should still contain { 2,3 }
    }
}

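Stepping through ReadOnlySpanTest() in the debugger, I can see that arrayAddressAfterMove != arrayAddressBeforeMove, which shows that the GC did move my array. I can also see that spanFromPointer contains { 2, 3 } both before and after the array is moved. So it is fine to create the ReadOnlySpan inside a "fixed" block, and it can still be used safely after leaving the "fixed" block.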

Solution

Once you have created a Span<T>, ReadOnlySpan<T> or Memory<T>, all subsequent uses of it are safe.

Here's a reference by Stephen Toub.

First, Span<T> is a value type containing a ref and a length, defined approximately as follows:

public readonly ref struct Span<T>
{
  private readonly ref T _pointer;
  private readonly int _length;
  ...
}

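The concept of a ref T field may seem strange at first; in fact, one can't actually declare a ref T field in C# or even in MSIL. But Span<T> is actually written to use a special internal type in the runtime that is treated as a just-in-time (JIT) intrinsic, with the JIT generating for it the equivalent of a ref T field.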

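Span<T> is a ref-like type because it contains a ref field, and ref fields can refer not only to the beginning of objects such as arrays, but also to the middle of them (...) These references are called interior pointers, and tracking them is a relatively expensive operation for the .NET runtime's garbage collector. As such, the runtime constrains these refs to live only on the stack, as that provides an implicit low limit on the number of interior pointers that might be in existence.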
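To make the "middle of an object" point concrete, here is a minimal sketch. The class and method names are made up for illustration and do not come from the quoted article. It takes a span that starts at the second element of an array, forces a collection, and then reads and writes through the span.

```csharp
using System;

public static class InteriorPointerSketch
{
    // Hypothetical helper: returns what the array and the span see after a forced GC.
    public static int[] ReadMiddleAfterCollect()
    {
        int[] numbers = { 10, 20, 30, 40 };

        // The span's reference points at numbers[1], i.e. into the middle of the array object.
        Span<int> middle = numbers.AsSpan(1, 2);

        // A collection may or may not relocate the array; the span is expected to stay valid either way.
        GC.Collect();

        // A write through the span lands in the original array.
        middle[0] = 21;

        return new[] { numbers[1], middle[1] };
    }
}
```

Calling InteriorPointerSketch.ReadMiddleAfterCollect() yields { 21, 30 }: the first value is read back from the array, the second through the span.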

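So the GC does in fact track the pointer inside a ReadOnlySpan<T>, which means the span is always safe to use once it has been constructed. The span will always point into the array you sliced, and it does not matter where you return it. The exact implementation details of how this works are specific to the CLR. The keywords to search for are "managed pointers" and "interior pointers". If you want more detail, I recommend this article.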
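Since the span ends up holding a managed pointer anyway, one possible variation is to build it from a managed reference in the first place and skip the pointer and the "fixed" block. The sketch below is an assumption on my part, not something proposed in the question or in the quoted article; SliceRow and RowSliceSketch are hypothetical names. It relies on MemoryMarshal.CreateReadOnlySpan, which performs no bounds checks of its own, so the helper validates the range first.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RowSliceSketch
{
    // Hypothetical helper: a span over part of one row of a 2D array,
    // created from a managed reference instead of a pinned pointer.
    public static ReadOnlySpan<double> SliceRow(double[,] array, int index0, int startIndex1, int sliceLength)
    {
        // CreateReadOnlySpan does not validate the length, so check the column range here.
        if (sliceLength < 0 || startIndex1 < 0 || startIndex1 > array.GetLength(1) - sliceLength)
            throw new ArgumentOutOfRangeException(nameof(sliceLength));
        if (sliceLength == 0)
            return ReadOnlySpan<double>.Empty;

        // The element access is bounds-checked by the runtime (this also validates index0).
        // Elements of one row are contiguous in memory, so the span covers the requested columns.
        return MemoryMarshal.CreateReadOnlySpan(ref array[index0, startIndex1], sliceLength);
    }
}
```

With the array { { 1, 2, 3 }, { 4, 5, 6 } }, SliceRow(array, 0, 1, 2) gives a span over { 2, 3 }, the same slice as in the question's test.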

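Have you considered using the Microsoft.Data.Analysis NuGet package? Once you have filled a DataFrame df with your data, getting a row (the equivalent of your Slice method) is as simple as df.Rows[rowIndex]. To access each value in the returned row, you can use the indexer again: df.Rows[rowIndex][columnIndex].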
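A rough sketch of that suggestion follows. It assumes the Microsoft.Data.Analysis package is referenced; the column names c0, c1, c2 and the class name are invented for the example, and the data mirrors the 2D array from the question's test.

```csharp
using System;
using Microsoft.Data.Analysis;

public static class DataFrameRowSketch
{
    // Hypothetical helper: reads one cell through the row indexer.
    public static double ReadCell()
    {
        // Three columns and two rows, the same values as { { 1, 2, 3 }, { 4, 5, 6 } }.
        var df = new DataFrame(
            new PrimitiveDataFrameColumn<double>("c0", new double[] { 1, 4 }),
            new PrimitiveDataFrameColumn<double>("c1", new double[] { 2, 5 }),
            new PrimitiveDataFrameColumn<double>("c2", new double[] { 3, 6 }));

        // Take the second row, then index into it by column position.
        var row = df.Rows[1];

        // Row values come back as object, so a cast is needed.
        return (double)row[2];
    }
}
```

Note that values read this way are boxed objects rather than a view over contiguous doubles, so this trades the span's zero-copy access for convenience.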

.net-core c# garbage-collection unsafe-pointers