Data manipulation and deduplication in PowerShell

Question

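Hey, I'd like to deduplicate some data from a CSV and merge columns, but I'm not sure how to go about it. Here is a sample of the data I'm working with: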

cmmc,stig,descr
AC.1.001,SV-205663r569188_rule,The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
AC.1.001,SV-205667r569188_rule,Inappropriate granting of user rights can provide system administrative and other high-level capabilities.
AC.1.002,SV-205663r569188_rule,The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
AC.1.002,SV-205665r569188_rule,Enterprise Domain Controllers groups on domain controllers.

I'm very close to the data I'm looking for, but I'm having trouble appending |<value of 'descr'> after each item in the second column.

Here is my script:

Import-Csv '.\input.csv' | Group-Object 'cmmc' |
    ForEach-Object {
        [PSCustomObject]@{
            cmmc = $_.Name
            stig = $_.Group.stig -join "`n"
        }
    } | Export-Csv '.\output.csv' -NoTypeInformation

The output looks like this (formatted for readability, column names omitted):

AC.1.001   SV-205663r569188_rule
           SV-205667r569188_rule
AC.1.002   SV-205663r569188_rule
           SV-205665r569188_rule

But I'm looking for this:

AC.1.001 SV-205663r569188_rule|The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
         SV-205667r569188_rule|Inappropriate granting of user rights can provide system administrative and other high-level capabilities.
AC.1.002 SV-205663r569188_rule|The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
         SV-205665r569188_rule|Enterprise Domain Controllers groups on domain controllers.

Solution

Use the following, which combines the Group-Object cmdlet with a Select-Object call that applies calculated properties to the resulting groups:

Import-Csv .\input.csv |
  Group-Object cmmc |
    Select-Object @{ Name = 'cmmc'; e = 'Name' }, @{ Name = 'stig_descr'; e = {
          # [array] guarantees arrays even when a group holds a single record.
          [array] $stigs, [array] $descrs, $i = $_.Group.stig, $_.Group.descr, 0
          # Pair each stig with its descr as "stig|descr", one pair per line.
          $stigs.ForEach({ $stigs[$i], $descrs[$i++] -join '|' }) -join "`n"
        }
      } | Export-Csv -NoTypeInformation -Encoding utf8 .\output.csv
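
As a variation on the command above, the two parallel arrays and the index variable can be avoided by looping over the records of each group directly. This is only a sketch of an alternative, not the approach used in the answer; the sample rows are stand-ins for input.csv with shortened, made-up descriptions, and $rows, $lines and $result are names chosen for illustration.

```powershell
# Sample rows mirroring the shape of the question's input (descriptions are placeholders).
$rows = @'
cmmc,stig,descr
AC.1.001,SV-205663r569188_rule,First description
AC.1.001,SV-205667r569188_rule,Second description
AC.1.002,SV-205665r569188_rule,Third description
'@ | ConvertFrom-Csv

$result = $rows |
  Group-Object cmmc |
    ForEach-Object {
      # Build one "stig|descr" string per record in the group.
      $lines = foreach ($row in $_.Group) { $row.stig, $row.descr -join '|' }
      [pscustomobject]@{
        cmmc       = $_.Name
        stig_descr = $lines -join "`n"   # also works when $lines is a single string
      }
    }

$result | Format-Table -Wrap
```

Because each record is read as a whole, no [array] type constraint is needed here: a group with a single record simply yields a single line.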

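Notes:
• The [array] type constraints on $stigs and $descrs are needed to handle the case where a group contains only one record. In that case $_.Group.stig and $_.Group.descr, due to the behavior of member enumeration, each return a single string rather than a single-element array. Without the [array] constraint, indexing (e.g. [$i]) would be performed on a [string] instance, which returns the single character at that position in the string.
• In the Export-Csv call, adjust -Encoding as needed. BOM-less UTF-8 is now the default in PowerShell (Core) 7+, where -NoTypeInformation is also no longer needed.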
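
The first note can be seen in isolation with a small experiment. The objects below are made-up stand-ins for the records in $_.Group, and the variable names are chosen for illustration only.

```powershell
# A "group" with two records and a "group" with a single record.
$two = [pscustomobject]@{ stig = 'SV-1' }, [pscustomobject]@{ stig = 'SV-2' }
$one = @([pscustomobject]@{ stig = 'SV-3' })

$two.stig.GetType().Name   # Object[]  (member enumeration collects both values)
$one.stig.GetType().Name   # String    (a single value is returned as a scalar)

# Without a type constraint, indexing hits the string itself.
$unconstrained = $one.stig
$unconstrained[0]          # S  (a [char], the first character of 'SV-3')

# With the [array] constraint, the scalar is wrapped in a one-element array.
[array] $constrained = $one.stig
$constrained[0]            # SV-3
```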

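The resulting file has the following content, which shows the use of column-internal newlines (protected by enclosing the entire value in "..."):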

"cmmc","stig_descr"
"AC.1.001","SV-205663r569188_rule|The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
SV-205667r569188_rule|Inappropriate granting of user rights can provide system administrative and other high-level capabilities."
"AC.1.002","SV-205663r569188_rule|The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
SV-205665r569188_rule|Enterprise Domain Controllers groups on domain controllers."

To visualize that this produces the desired data, you can re-import the resulting file and pipe it to Format-Table with the -Wrap switch:

PS> Import-Csv .\output.csv | Format-Table -Wrap

cmmc     stig_descr
----     ----------
AC.1.001 SV-205663r569188_rule|The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
         SV-205667r569188_rule|Inappropriate granting of user rights can provide system administrative and other high-level capabilities.
AC.1.002 SV-205663r569188_rule|The ability to set access permissions and auditing is critical to maintaining the security and proper access controls of a system. To support this volumes must be formatted using a file system that supports NTFS attributes.
         SV-205665r569188_rule|Enterprise Domain Controllers groups on domain controllers.

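Note that -Wrap respects the property-internal newlines, but breaks individual lines into multiple lines if they are too wide for the console window.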
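
If the combined column later needs to be taken apart again, the embedded newlines survive a round trip through Export-Csv and Import-Csv, so the cell can be split back into its individual pairs. The following is a minimal sketch under assumed values: the temp-file name and the shortened cell content are made up for the demonstration.

```powershell
# Write one record whose stig_descr value contains an embedded newline.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'newline-demo.csv'
[pscustomobject]@{ cmmc = 'AC.1.001'; stig_descr = "SV-1|first`nSV-2|second" } |
  Export-Csv -NoTypeInformation -Encoding utf8 $path

# Re-import and split the cell on its internal newlines.
$back  = Import-Csv $path
$pairs = $back.stig_descr -split '\r?\n'

# Each pair can be split further into its stig and descr parts.
$pairs | ForEach-Object {
  $stig, $descr = $_ -split '\|', 2
  [pscustomobject]@{ cmmc = $back.cmmc; stig = $stig; descr = $descr }
}

Remove-Item $path
```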
