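Problem description
I have two large files to compare (over 10 GB each). The commands below work for small files, but they seem to use up all the RAM on my machine.
Any ideas would be appreciated.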
robocopy.exe C:\Folder\ C:\Folder\ /l /nocopy /is /e /fp /ns /nc /njh /njs /tee /log:c:\temp\FolderList.txt
$path = 'C:\Folder\'
$pattern = [regex]::Escape($path)
$newContent = @()
Get-Content -Path "c:\temp\FolderList.txt" | ForEach-Object {$newContent += $_ -replace $pattern,''}
Set-Content -Path "c:\temp\FolderList.txt" -Value $newContent
(Get-Content C:\temp\FolderList.txt).Trim() -ne '' | Set-Content C:\temp\FolderList.txt
robocopy.exe C:\Folder2\ C:\Folder2\ /l /nocopy /is /e /fp /ns /nc /njh /njs /tee /log:c:\temp\FolderList2.txt
$path = 'C:\Folder2\'
$pattern = [regex]::Escape($path)
$newContent = @()
Get-Content -Path "c:\temp\FolderList2.txt" | ForEach-Object {$newContent += $_ -replace $pattern,''}
Set-Content -Path "c:\temp\FolderList2.txt" -Value $newContent
(Get-Content C:\temp\FolderList2.txt).Trim() -ne '' | Set-Content C:\temp\FolderList2.txt
Compare-Object -ReferenceObject (Get-Content c:\temp\FolderList.txt) -DifferenceObject (Get-Content c:\temp\FolderList2.txt)
Latest update
FolderList.txt
C:\Folder\Data2\Documents\
C:\Folder\Data2\Documents\1.txt
C:\Folder\Data2\Documents\2.txt
C:\Folder\Data2\Documents\3.txt
C:\Folder\Data2\Documents\4.txt
C:\Folder\Data2\Documents\5.txt
CompareLog1.txt
Data2\Documents\
C:\Folder\Data2\Documents\
Data2\Documents\1.txt
C:\Folder\Data2\Documents\1.txt
Data2\Documents\2.txt
C:\Folder\Data2\Documents\2.txt
Data2\Documents\3.txt
C:\Folder\Data2\Documents\3.txt
Data2\Documents\4.txt
C:\Folder\Data2\Documents\4.txt
Data2\Documents\5.txt
C:\Folder\Data2\Documents\5.txt
Desired output:
Data2\Documents\
Data2\Documents\1.txt
Data2\Documents\2.txt
Data2\Documents\3.txt
Data2\Documents\4.txt
Data2\Documents\5.txt
Update 2:
Output:
Data2\Documents\
C:\Folder\Data2\Documents\
Data2\Documents\1.txt
C:\Folder\Data2\Documents\1.txt
Data2\Documents\2.txt
C:\Folder\Data2\Documents\2.txt
Data2\Documents\3.txt
C:\Folder\Data2\Documents\3.txt
Data2\Documents\4.txt
C:\Folder\Data2\Documents\4.txt
Data2\Documents\5.txt
C:\Folder\Data2\Documents\5.txt
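Solution
First, adding to an array with += is a known memory hog: arrays have a fixed length, so each time you add a new element the complete array has to be rebuilt in memory.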
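To make the cost of += concrete, here is a minimal sketch that builds the same collection twice. The sample data and the variable names ($lines, $slow, $fast) are made up for illustration, and newer PowerShell releases may have reduced the += penalty, so treat any speed difference as indicative only.

```powershell
# Hypothetical demo data: 20,000 short strings standing in for log lines.
$lines = 1..20000 | ForEach-Object { "C:\Folder\Data2\Documents\$_.txt" }

# Slow: every += allocates a new array and copies all existing elements into it.
$slow = @()
foreach ($line in $lines) { $slow += $line }

# Faster: a generic List grows its internal buffer as needed instead of
# copying the whole collection on every single addition.
$fast = [System.Collections.Generic.List[string]]::new()
foreach ($line in $lines) { $fast.Add($line) }
```

Note that even the List version still holds every line in memory, which is not workable for a 10 GB log. That is why the approach below processes the file line by line and writes straight to disk.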
So for doing the replacement and removing the empty lines in each log file, I would suggest this:
robocopy.exe C:\Folder\ C:\Folder\ /l /nocopy /is /e /fp /ns /nc /njh /njs /tee /log:c:\temp\FolderList.txt
robocopy.exe C:\Folder2\ C:\Folder2\ /l /nocopy /is /e /fp /ns /nc /njh /njs /tee /log:c:\temp\FolderList2.txt
$path = 'C:\Folder\'
$newFile = 'C:\temp\CompareLog_1.txt' # have it create a new file instead of gathering all 10Gb in memory
$pattern = [regex]::Escape($path)
# use 'switch' to parse the log file line-by-line
# and write the processed lines to the new file.
# this will be lean on memory, but takes a lot of disk write actions.
switch -Regex -File 'C:\temp\FolderList.txt' {
$pattern { Add-Content $newFile -Value ($_ -replace $pattern).Trim() }
default { if ($_ -match '\S') { Add-Content $newFile -Value $_.Trim() }} # skip empty or whitespace-only lines
}
And for the second log file:
$path = 'C:\Folder2\'
$newFile = 'C:\temp\CompareLog_2.txt'
$pattern = [regex]::Escape($path)
switch -Regex -File 'C:\temp\FolderList2.txt' {
$pattern { Add-Content $newFile -Value ($_ -replace $pattern).Trim() }
default { if ($_ -match '\S') { Add-Content $newFile -Value $_.Trim() }}
}
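Next, you need to compare the new file CompareLog_1.txt with CompareLog_2.txt, but I guess these files may still be very large, so I agree with Zilog80 that it is best to use dedicated software for that.
Depending on what you want to see as the result, you could also consider using good old fc.exe, which is fast and does not hog memory.
Something like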
fc.exe /C /N 'C:\temp\CompareLog_1.txt' 'C:\temp\CompareLog_2.txt'
You can speed up writing the files to compare by using a StreamWriter instead of Add-Content
(this creates a file in UTF-8 encoding without a BOM):
$path = 'C:\Folder\'
$newFile = 'C:\temp\CompareLog_1.txt'
$writer = [System.IO.StreamWriter]::new($newFile)
$pattern = [regex]::Escape($path)
switch -Regex -File 'C:\temp\FolderList.txt' {
$pattern { $writer.WriteLine(($_ -replace $pattern).Trim()) }
default { if ($_ -match '\S') { $writer.WriteLine($_.Trim()) }}
}
# clean up
$writer.Flush()
$writer.Dispose()
$path = 'C:\Folder2\'
$newFile = 'C:\temp\CompareLog_2.txt'
$writer = [System.IO.StreamWriter]::new($newFile)
$pattern = [regex]::Escape($path)
switch -Regex -File 'C:\temp\FolderList2.txt' {
$pattern { $writer.WriteLine(($_ -replace $pattern).Trim()) }
default { if ($_ -match '\S') { $writer.WriteLine($_.Trim()) }}
}
# clean up
$writer.Flush()
$writer.Dispose()
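Since the two StreamWriter blocks above differ only in their paths, they could be folded into one helper. The following is a rough sketch, not part of the original answer: the function name Convert-RobocopyLog and its parameter names are hypothetical. It adds a try/finally so the writer is closed even if an error occurs midway, and, unlike the code above, it skips lines that become empty after the root path is removed.

```powershell
# Hypothetical helper (name and parameters are assumptions for illustration).
function Convert-RobocopyLog {
    param(
        [Parameter(Mandatory)] [string] $LogFile,   # robocopy log to read
        [Parameter(Mandatory)] [string] $RootPath,  # prefix to strip, e.g. 'C:\Folder\'
        [Parameter(Mandatory)] [string] $OutFile    # cleaned file to write
    )
    $pattern = [regex]::Escape($RootPath)
    $writer  = [System.IO.StreamWriter]::new($OutFile)
    try {
        # 'switch -File' reads the log one line at a time
        switch -Regex -File $LogFile {
            $pattern {
                $line = ($_ -replace $pattern).Trim()
                # deviation from the code above: drop lines left empty by the replace
                if ($line) { $writer.WriteLine($line) }
            }
            default { if ($_ -match '\S') { $writer.WriteLine($_.Trim()) } }
        }
    }
    finally {
        # Dispose flushes pending output and releases the file handle
        $writer.Dispose()
    }
}

# Usage with the paths from the question:
# Convert-RobocopyLog -LogFile 'C:\temp\FolderList.txt'  -RootPath 'C:\Folder\'  -OutFile 'C:\temp\CompareLog_1.txt'
# Convert-RobocopyLog -LogFile 'C:\temp\FolderList2.txt' -RootPath 'C:\Folder2\' -OutFile 'C:\temp\CompareLog_2.txt'
```

Pass full paths: .NET classes such as StreamWriter resolve relative paths against the process working directory, which is not necessarily PowerShell's current location.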