Problem description
I am working on a requirement that involves two CSV files, shown below.
CSV.csv:

| Short Description | Category |
| --- | --- |
| Device is DOWN! | Server Down |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization |
| Device Performance Alerts was triggered on Physical memory | Memory utilization |
| Device Performance Alerts was triggered on Physical memory | Memory utilization |
| Device Performance Alerts was triggered on Physical memory | Memory utilization |
| disk Space Is Lowon ;E: | disk Space utilization |
| disk Space Is Lowon;C: | disk Space utilization |
| Network Interface Down | Interface Down |
and reference.csv:

| Category | Complexity |
| --- | --- |
| Server Down | Simple |
| Network Interface down | Complex |
| Drive Cleanup Windows | Medium |
| cpu utilization | Medium |
| Memory utilization | Medium |
| disk Space utilization Unix | Simple |
| Windows Service Restart | Medium |
| UNIX Service Restart | Medium |
| Web Tomcat Instance Restart | Simple |
Expected output:

| Short Description | Category | Complexity |
| --- | --- | --- |
| Device is DOWN! | Server Down | Simple |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization | Medium |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization | Medium |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization | Medium |
| cpu Warning Monitoron XSSXSXSXSXSX.com | cpu utilization | Medium |
| Device Performance Alerts was triggered on Physical memory | Memory utilization | Medium |
| Device Performance Alerts was triggered on Physical memory | Memory utilization | Medium |
| Device Performance Alerts was triggered on Physical memory | Memory utilization | Medium |
| disk Space Is Lowon ;E: | disk Space utilization | Medium |
| disk Space Is Lowon;C: | disk Space utilization | Medium |
| Network Interface Down | Interface Down | Complex |
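Now I need to read CSV1.csv, take each value in its 'Category' column, find all possible matches for it in the 'Category' column of reference.csv, get the corresponding 'Complexity' value from reference.csv, and put that value against each category in CSV1.csv.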
I am using find.all to do this, but I cannot get it to work as expected. Is there a better way to achieve this?
Solution
One possible approach:
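```python
import pandas as pd

CSV1_df = pd.read_csv('CSV1.csv')
reference_df = pd.read_csv('reference.csv')
# Map each reference Category to its Complexity
my_dict = dict(zip(reference_df['Category'].values, reference_df['Complexity'].values))

def match_key(key, default_value):
    # Complexity of the first reference Category that contains `key` or is contained in it
    for d_key in my_dict.keys():
        if key in d_key or d_key in key:
            return my_dict[d_key]
    return default_value

CSV1_df['Complexity'] = CSV1_df['Category'].apply(lambda x: match_key(x, 'default'))
```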
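The `in` test in `match_key` is case-sensitive, so 'Interface Down' from CSV1.csv does not match 'Network Interface down' in reference.csv and would fall back to the default value. Below is a minimal, standalone sketch of the same "first key that contains, or is contained in, the other" rule with a case-insensitive comparison. The name `match_key_ci` is hypothetical, and the dictionary is simply the reference.csv data copied from the question; an empty key is rejected because the empty string is a substring of every key.

```python
# Reference data copied from reference.csv in the question
REFERENCE = {
    'Server Down': 'Simple',
    'Network Interface down': 'Complex',
    'Drive Cleanup Windows': 'Medium',
    'cpu utilization': 'Medium',
    'Memory utilization': 'Medium',
    'disk Space utilization Unix': 'Simple',
    'Windows Service Restart': 'Medium',
    'UNIX Service Restart': 'Medium',
    'Web Tomcat Instance Restart': 'Simple',
}


def match_key_ci(key, mapping, default_value='default'):
    """Hypothetical helper: return the value of the first mapping key that
    contains `key` or is contained in it, ignoring case."""
    k = key.strip().lower()
    if not k:
        # '' is a substring of everything, so never treat it as a match
        return default_value
    for ref_key, value in mapping.items():
        r = ref_key.strip().lower()
        if k in r or r in k:
            return value
    return default_value


for category in ['Server Down', 'cpu utilization', 'Memory utilization',
                 'disk Space utilization', 'Interface Down']:
    print(category, '->', match_key_ci(category, REFERENCE))
```

With the answer's DataFrame this would be used as `CSV1_df['Category'].apply(lambda x: match_key_ci(x, my_dict))`. Note that with the reference data as given, 'disk Space utilization' resolves to Simple, because the only reference category containing it is 'disk Space utilization Unix'; the expected output in the question lists Medium for those rows.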
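Explanation:
- Build a dict by zipping the Category and Complexity columns of the reference DataFrame, i.e. {'Server Down': 'Simple', 'Network Interface down': 'Complex', ...}
- Use apply with a lambda to look up each Category value of the CSV1 DataFrame in the dictionary and get the corresponding Complexity value. Because the categories do not match the keys exactly, we define a function that checks whether the Category value from CSV1 is a substring of any key in the dictionary, or vice versa, and call it inside apply.
- Save the result to a new column in the CSV1 DataFrame.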
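Since the question asks for all possible matches, another option is to pair every CSV1 row with every reference row and filter the pairs, so the full set of candidate matches is visible before choosing one. This is an alternative sketch, not the answer's method: it assumes pandas 1.2 or later (for `merge(how='cross')`), uses a small subset of the question's rows inline so it runs standalone, compares case-insensitively, and keeps the first match per row in reference.csv order.

```python
import pandas as pd

# Subset of the question's data, inlined so the sketch runs without files
csv1_df = pd.DataFrame({
    'Short Description': ['Device is DOWN!', 'cpu Warning Monitoron XSSXSXSXSXSX.com',
                          'Network Interface Down'],
    'Category': ['Server Down', 'cpu utilization', 'Interface Down'],
})
reference_df = pd.DataFrame({
    'Category': ['Server Down', 'Network Interface down', 'cpu utilization', 'Memory utilization'],
    'Complexity': ['Simple', 'Complex', 'Medium', 'Medium'],
})

# Every (CSV1 row, reference row) pair; 'index' remembers the original CSV1 row
pairs = csv1_df.reset_index().merge(reference_df, how='cross', suffixes=('', '_ref'))

# Keep pairs where either lower-cased category contains the other
left = pairs['Category'].str.lower()
right = pairs['Category_ref'].str.lower()
mask = [a in b or b in a for a, b in zip(left, right)]
matches = pairs.loc[mask]  # all candidate matches

# First match per CSV1 row (reference order), aligned back onto csv1_df
first = matches.drop_duplicates('index').set_index('index')['Complexity']
csv1_df['Complexity'] = first.reindex(csv1_df.index).fillna('default')
print(csv1_df)
```

The cross join grows as rows(CSV1) × rows(reference), which is fine for small lookup tables like reference.csv but worth keeping in mind for large inputs.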