PowerShell 7.0: how to compute the hash sum of a large file read in chunks

Problem description

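The script should copy files and compute their hash sums. My goal is a function that reads each file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy), to minimize network load. So I tried reading the file a chunk at a time, computing the MD5 hash sum, and writing the data out to several locations. File sizes range from 100 MB to 2 TB, possibly more. There is no need to verify the copies at this point; I only need the hash sum of the original file.

I am stuck on computing the hash sum: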

    $ifile = "C:\Users\User\Desktop\inputfile"
    $ofile = "C:\Users\User\Desktop\outputfile_1"
    $ofile2 = "C:\Users\User\Desktop\outputfile_2"
    
    $md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
    $bufferSize = 10mb
    $stream = [System.IO.File]::OpenRead($ifile)
    $makenew = [System.IO.File]::OpenWrite($ofile)
    $makenew2 = [System.IO.File]::OpenWrite($ofile2)
    $buffer = new-object Byte[] $bufferSize
    
    while ( $stream.Position -lt $stream.Length ) {
       
     $bytesRead = $stream.Read($buffer, 0, $bufferSize)
     $makenew.Write($buffer, 0, $bytesRead)
     $makenew2.Write($buffer, 0, $bytesRead)
    
     # I am stuck here
     $hash = [System.BitConverter]::ToString($md5.ComputeHash($buffer)) -replace "-",""      
            
            }
    
    $stream.Close()
    $makenew.Close()
    $makenew2.Close()

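How can I collect the chunks of data so that I get the hash of the whole file?

A bonus question: is it possible to compute the hash and write out the data in parallel, given that workflow { parallel { } } is no longer supported as of PS version 6?

Many thanks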

Solution

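If you want to handle the input buffering manually, you need to use the TransformBlock / TransformFinalBlock methods exposed by $md5. TransformBlock feeds each chunk into the running hash, TransformFinalBlock completes it, and the digest is then available in $md5.Hash: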

while($bytesRead = $stream.Read($buffer, 0, $bufferSize))
{
    # Write only the bytes actually read to both file copies
    $makenew.Write($buffer, 0, $bytesRead)
    $makenew2.Write($buffer, 0, $bytesRead)

    # Feed next chunk to MD5 CSP
    $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
}

# Complete the hashing routine with an empty final block
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)

# Grab hash value from CSP
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
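
The loop above can be wrapped into a reusable function. The following is only a sketch: the name Copy-FileWithHash and its parameters are hypothetical, it uses [System.Security.Cryptography.MD5]::Create() instead of the CSP class, and it opens the destinations with File.Create (which truncates an existing file, whereas File.OpenWrite does not).

```powershell
# Hypothetical helper: reads the source once, writes every chunk to all
# destinations, and returns the MD5 of the source as an upper-case hex string.
function Copy-FileWithHash {
    param(
        [Parameter(Mandatory)] [string]   $Source,
        [Parameter(Mandatory)] [string[]] $Destination,
        [int] $BufferSize = 1MB
    )

    $md5        = [System.Security.Cryptography.MD5]::Create()
    $inStream   = $null
    $outStreams = [System.Collections.Generic.List[System.IO.Stream]]::new()
    try {
        $inStream = [System.IO.File]::OpenRead($Source)
        foreach ($path in $Destination) {
            # File.Create truncates an existing file
            $outStreams.Add([System.IO.File]::Create($path))
        }

        $buffer = [byte[]]::new($BufferSize)
        while (($bytesRead = $inStream.Read($buffer, 0, $buffer.Length)) -gt 0) {
            foreach ($out in $outStreams) { $out.Write($buffer, 0, $bytesRead) }
            # Feed the same chunk into the running hash
            $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
        }
        # Finalize with an empty block, then read the digest
        $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
        [System.BitConverter]::ToString($md5.Hash).Replace('-', '')
    }
    finally {
        foreach ($out in $outStreams) { $out.Dispose() }
        if ($inStream) { $inStream.Dispose() }
        $md5.Dispose()
    }
}
```

It could be called as `Copy-FileWithHash -Source $ifile -Destination $ofile, $ofile2`. Full paths are the safer choice, because .NET file APIs resolve relative paths against the process working directory rather than the current PowerShell location.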

"My goal is a function that reads each file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy), to minimize network load"

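I'm not entirely sure what you mean by network load here. If the source file is on a remote file share but the new copies go to the local file system, you can minimize network load by copying the source file over once and then using that local copy as the source for both the second copy and the hash calculation: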

$ifile = "\\remoteMachine\c$\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
    
# Copy remote -> local
Copy-Item -Path $ifile -Destination $ofile
# Copy local -> local
Copy-Item -Path $ofile -Destination $ofile2

# Hash local file stream
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$stream = [System.IO.File]::OpenRead($ofile)
$hash = [BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-','')
# Release the file handle once the hash has been computed
$stream.Close()

FWIW, passing the file stream object directly to $md5.ComputeHash($stream) is likely to be faster than buffering the input manually.
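
For the hash-only step, PowerShell also ships the Get-FileHash cmdlet, which accepts -Algorithm MD5 and reads the file as a stream. A minimal sketch comparing it with the ComputeHash($stream) call; the sample file under the temp directory is an assumption made only to keep the snippet self-contained:

```powershell
# Hypothetical sample file, created only so the snippet runs on its own
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'hash-demo.bin'
[System.IO.File]::WriteAllBytes($path, [byte[]](1..255))

# Built-in cmdlet: returns an object whose .Hash property is a hex string
$viaCmdlet = (Get-FileHash -Path $path -Algorithm MD5).Hash

# Equivalent .NET call: pass the open stream straight to ComputeHash
$md5    = [System.Security.Cryptography.MD5]::Create()
$stream = [System.IO.File]::OpenRead($path)
try {
    $viaStream = [System.BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-', '')
}
finally {
    $stream.Dispose()
    $md5.Dispose()
}

# Both approaches hash the same bytes, so the strings should match
$viaCmdlet -eq $viaStream
```

Either form reads the file once more, so it fits the copy-once-then-hash-locally approach rather than the single-pass loop.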


Final listing

$ifile = "C:\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"

$md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$bufferSize = 1mb
$stream = [System.IO.File]::OpenRead($ifile)
$makenew = [System.IO.File]::OpenWrite($ofile)
$makenew2 = [System.IO.File]::OpenWrite($ofile2)
$buffer = new-object Byte[] $bufferSize

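while ( $stream.Position -lt $stream.Length ) 
{
     # Read one chunk and write only the bytes actually read to both copies
     $bytesRead = $stream.Read($buffer, 0, $bufferSize)
     $makenew.Write($buffer, 0, $bytesRead)
     $makenew2.Write($buffer, 0, $bytesRead)
    
     # Feed the chunk into the running MD5 state
     $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
} 

# Finalize the hash with an empty block, then format the digest as hex
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')      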
$hash
$stream.Flush()
$stream.Close()
$makenew.Flush()
$makenew.Close()
$makenew2.Flush()
$makenew2.Close()
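
On the bonus question about doing the hashing and the writes in parallel: one option that needs no workflow is to start both writes with Stream.WriteAsync, hash the chunk while they are in flight, and wait for the writes before reusing the buffer. This is only a sketch under assumptions: the temp-directory paths and sample data are hypothetical, and whether the overlap gives a measurable gain depends on the storage and on how the streams are opened (for example with FileOptions.Asynchronous), so it would need measuring.

```powershell
# Hypothetical paths and sample input, created only so the sketch is runnable
$tmp    = [System.IO.Path]::GetTempPath()
$ifile  = Join-Path $tmp 'overlap-demo.in'
$ofile  = Join-Path $tmp 'overlap-demo.out1'
$ofile2 = Join-Path $tmp 'overlap-demo.out2'
$sample = [byte[]]::new(2500000)
[System.Random]::new().NextBytes($sample)
[System.IO.File]::WriteAllBytes($ifile, $sample)

$md5    = [System.Security.Cryptography.MD5]::Create()
$in     = [System.IO.File]::OpenRead($ifile)
$out1   = [System.IO.File]::Create($ofile)
$out2   = [System.IO.File]::Create($ofile2)
$buffer = [byte[]]::new(1MB)
try {
    while (($bytesRead = $in.Read($buffer, 0, $buffer.Length)) -gt 0) {
        # Start both writes without waiting for them to finish
        $pending = [System.Threading.Tasks.Task[]]@(
            $out1.WriteAsync($buffer, 0, $bytesRead),
            $out2.WriteAsync($buffer, 0, $bytesRead)
        )
        # Hash the same chunk meanwhile; the buffer is only read here
        $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
        # The buffer must not be overwritten until both writes have completed
        [System.Threading.Tasks.Task]::WaitAll($pending)
    }
    $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
    $hash = [System.BitConverter]::ToString($md5.Hash).Replace('-', '')
}
finally {
    $out1.Dispose()
    $out2.Dispose()
    $in.Dispose()
    $md5.Dispose()
}
$hash
```

The hash itself stays sequential, because MD5 has to consume the chunks in order; only the two writes and the hashing of the current chunk overlap.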