如何使不变的F#更具性能?

问题描述

| 我想使用不可变的F#编写大量的C#代码。它是一个设备监视器,当前的实现方式是不断从串行端口获取数据并根据新数据更新成员变量。我想将其转移到F#并获得不可变记录的好处,但是我在概念验证实现中的第一枪确实很慢。
open System
open System.Diagnostics

type DeviceStatus = { RPM         : int;
                      Pressure    : int;
                      Temperature : int }

// I\'m assuming my actual implementation,using serial data,would be something like 
// \"let rec UpdateStatusWithSerialReadings (status:DeviceStatus) (serialInput:string[])\".
// where serialInput is whatever the device streamed out since the prevIoUs check: something like
// [\"RPM=90\",\"Pres=50\",\"Temp=85\",\"RPM=40\",\"Pres=23\",etc.]
// The device streams out different parameters at different intervals,so I can\'t just wait for them all to arrive and aggregate them all at once.
// I\'m just doing a POC here,so want to eliminate noise from parsing etc.
// So this just updates the status\'s RPM i times and returns the result.
let rec UpdateStatusITimes (status:DeviceStatus) (i:int) = 
    match i with
    | 0 -> status
    | _ -> UpdateStatusITimes {status with RPM = 90} (i - 1)

let initStatus = { RPM = 80 ; Pressure = 100 ; Temperature = 70 }
let stopwatch = new Stopwatch()

stopwatch.Start()
let endStatus = UpdateStatusITimes initStatus 100000000
stopwatch.Stop()

printfn \"endStatus.RPM = %A\" endStatus.RPM
printfn \"stopwatch.ElapsedMilliseconds = %A\" stopwatch.ElapsedMilliseconds
Console.ReadLine() |> ignore
这在我的机器上大约1400毫秒内运行,而等效的C#代码(具有可变成员变量)则在310毫秒内运行。有什么办法可以在不失去不变性的情况下加快速度吗?我希望F#编译器会注意到initStatus和所有中间状态变量从未被重用,因此只是在幕后对那些记录进行了突变,但我想没有。     

解决方法

在F#社区中,命令性代码和可变数据只要不属于您的公共界面,就不会感到厌烦。也就是说,只要封装可变数据并将其与其余代码隔离,就可以使用可变数据。为此,我建议如下:
type DeviceStatus =
  { RPM         : int
    Pressure    : int
    Temperature : int }

// one of the rare scenarios in which I prefer explicit classes,// to avoid writing out all the get/set properties for each field
[<Sealed>]
type private DeviceStatusFacade =
    val mutable RPM         : int
    val mutable Pressure    : int
    val mutable Temperature : int
    new(s) =
        { RPM = s.RPM; Pressure = s.Pressure; Temperature = s.Temperature }
    member x.ToDeviceStatus () =
        { RPM = x.RPM; Pressure = x.Pressure; Temperature = x.Temperature }

let UpdateStatusITimes status i =
    let facade = DeviceStatusFacade(status)
    let rec impl i =
        if i > 0 then
            facade.RPM <- 90
            impl (i - 1)
    impl i
    facade.ToDeviceStatus ()

let initStatus = { RPM = 80; Pressure = 100; Temperature = 70 }
let stopwatch = System.Diagnostics.Stopwatch.StartNew ()
let endStatus = UpdateStatusITimes initStatus 100000000
stopwatch.Stop ()

printfn \"endStatus.RPM = %d\" endStatus.RPM
printfn \"stopwatch.ElapsedMilliseconds = %d\" stopwatch.ElapsedMilliseconds
stdin.ReadLine () |> ignore
这样,公共接口就不会受到影响–ѭ2takes仍然会接收并返回一个本质不变的
DeviceStatus
–但是内部
UpdateStatusITimes
使用可变类来消除分配开销。 编辑:(回应评论)这是我通常更喜欢的类样式,使用主构造函数和
let
s+属性而不是
val
s:
[<Sealed>]
type private DeviceStatusFacade(status) =
    let mutable rpm      = status.RPM
    let mutable pressure = status.Pressure
    let mutable temp     = status.Temperature
    member x.RPM         with get () = rpm      and set n = rpm      <- n
    member x.Pressure    with get () = pressure and set n = pressure <- n
    member x.Temperature with get () = temp     and set n = temp     <- n
    member x.ToDeviceStatus () =
        { RPM = rpm; Pressure = pressure; Temperature = temp }
但是对于每个属性都是盲目的获取者/设置者的简单外观类,我觉得这有点乏味。 F#3+允许以下内容,但就我个人而言,它仍然不是一个改进(除非一个人教义地避免使用字段):
[<Sealed>]
type private DeviceStatusFacade(status) =
    member val RPM         = status.RPM with get,set
    member val Pressure    = status.Pressure with get,set
    member val Temperature = status.Temperature with get,set
    member x.ToDeviceStatus () =
        { RPM = x.RPM; Pressure = x.Pressure; Temperature = x.Temperature }
    ,这不会回答您的问题,但是可能值得退后一步,考虑一下大局: 对于此用例,您认为不变数据结构的优势是什么? F#也支持可变数据结构。 您声称F#确实“很慢”-但它仅比C#代码慢4.5倍,并且每秒执行超过7,000万次更新……对于您的实际性能而言,这可能是不可接受的吗?应用?您是否有特定的性能目标?是否有理由相信这种类型的代码将成为您应用程序的瓶颈? 设计总是要权衡。您可能会发现,为了在短时间内记录许多更改,根据您的需求,不可变的数据结构会对性能造成不可接受的损失。另一方面,如果您有诸如一次跟踪一个数据结构的多个较旧版本之类的要求,则尽管性能下降,但不变数据结构的好处可能使它们有吸引力。     ,我怀疑您看到的性能问题是由于在每次循环迭代中克隆记录(加上可忽略的时间分配记录并随后对其进行垃圾回收)时涉及到的块内存清零。您可以使用结构重写示例:
[<Struct>]
type DeviceStatus =
    val RPM : int
    val Pressure : int
    val Temperature : int
    new(rpm:int,pres:int,temp:int) = { RPM = rpm; Pressure = pres; Temperature = temp }

let rec UpdateStatusITimes (status:DeviceStatus) (i:int) = 
    match i with
    | 0 -> status
    | _ -> UpdateStatusITimes (DeviceStatus(90,status.Pressure,status.Temperature)) (i - 1)

let initStatus = DeviceStatus(80,100,70)
现在的性能将接近使用全局可变变量或将“ 10”重新定义为“ 11”的性能。这仅在您的结构长度不超过16个字节的情况下有效,否则它将以与记录相同的缓慢方式被复制。 正如您在注释中所暗示的,如果您打算将其用作共享内存多线程设计的一部分,那么在某些时候您将需要可变性。您的选择是:a)每个参数的共享可变变量b)包含结构的共享可变变量或c)包含可变字段的共享外观对象(如ildjarn的答案)。我会选择最后一个,因为它已经很好地封装并且可以扩展到四个int字段之外。     ,如下使用元组比原始解决方案快15倍:
type DeviceStatus = int * int * int

let rec UpdateStatusITimes (rpm,pressure,temp) (i:int) = 
    match i with
    | 0 -> rpm,temp
    | _ -> UpdateStatusITimes (90,temp) (i - 1)

while true do
  let initStatus = 80,70
  let stopwatch = new Stopwatch()

  stopwatch.Start()
  let rpm,_,_ as endStatus = UpdateStatusITimes initStatus 100000000
  stopwatch.Stop()

  printfn \"endStatus.RPM = %A\" rpm
  printfn \"Took %fs\" stopwatch.Elapsed.TotalSeconds
顺便说一句,计时时应使用
stopwatch.Elapsed.TotalSeconds
。