Problem description
I have master data of IDs and names, almost 13,000 entries. The file is named master.txt:
id name
1: name1
2: test
3: fin
4: miar
Now I have another list of data with an id and some property. Each ID can appear multiple times. There are 74,000 entries, in person_entries.txt.
Example data:
id property
1: somevalue001
2: somevalue002
2: somevalue003
1: somevalue004
The output I want, for example:
name property
name1: somevalue001
test: somevalue002
test: somevalue003
name1: somevalue004
I am trying the following script, vlookup.sh:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
    IFS=$'\n'
    myarr=($(echo "$line" | awk -f break_data.awk))
    # This breaks each line into two lines (id and property), which are then stored as an array
    awk -v var="${myarr[0]}:" -v var2="${myarr[1]}" -f find_data.awk master.txt
    # Here we pass the id and property to awk as variables. It searches master.txt for the id and prints the name and property
done < "person_entries.txt"
break_data.awk
# INPUT
# 1: name1
# OUTPUT
# 1
# name1
BEGIN {
    FS = ": "
}
{
    # print every ": "-separated field on its own line
    for (i = 1; i <= NF; i++)
        print $i
}
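Splitting a line does not need a separate awk process. A minimal sketch, assuming every line looks like "<id>: <value>" and that only the first ": " is the separator (break_data.awk, by contrast, prints every field); split_entry is a hypothetical name. Inside the read loop you could also assign id=${line%%: *} and property=${line#*: } directly and skip the command substitution entirely.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split_entry LINE
# Prints the id and the value on separate lines, like break_data.awk,
# using bash parameter expansion so no awk process is started.
# Assumes the line has the form "<id>: <value>".
split_entry() {
    local line=$1
    # ${line%%: *} drops everything from the first ": " on  -> the id
    # ${line#*: }  drops everything up to the first ": "    -> the value
    printf '%s\n' "${line%%: *}" "${line#*: }"
}
```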
find_data.awk
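# Searches each line of master.txt for the id passed in as var (e.g. "1:")
# and, when it is found, prints the name and the property (var2)
BEGIN {
    FS = ": "
}
{
    # index() == 1 anchors the match at the start of the line,
    # so "1:" does not also match a line starting with "11:"
    if (index($0, var) == 1)
        print $2 ": " var2
}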
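index($0, var) is a substring search, so with var="1:" the original rule also fires on a line such as "11: ..." in master.txt. A sketch of an exact lookup that compares the first ": "-separated field instead; lookup_name is a hypothetical helper. It still starts one awk process per person line, so it fixes correctness, not speed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lookup_name ID MASTER_FILE
# Prints the name whose id field equals ID exactly, then stops reading
# (ids are assumed to be unique in the master file).
lookup_name() {
    awk -F': ' -v id="$1" '$1 == id { print $2; exit }' "$2"
}
```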
When I run
sh vlookup.sh
it takes a very long time.
Excel can do this faster.
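A rough count of the work vlookup.sh does, assuming both awk scripts run once per line of person_entries.txt and find_data.awk reads all of master.txt each time (it has no early exit):

```latex
\begin{aligned}
\text{awk processes started} &\approx 74\,000 \times 2 = 148\,000 \\
\text{lines of master.txt read} &\approx 74\,000 \times 13\,000 = 962\,000\,000 \\
\text{lines read by a single-pass lookup} &\approx 13\,000 + 74\,000 = 87\,000
\end{aligned}
```

Process start-up plus nearly a billion line reads is where the time goes; a lookup table built once avoids both.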
The answer's code, annotated for my own understanding:
$ awk '                        # use awk
{
  if (NR == FNR)
  {                             # process the first file
    a[$1] = $2                  # hash: the id is the key, the name is the value
    next                        # go to the next record without running the code below
  } else
  {                             # process the second file
    print a[$1] ":", $2         # print the name (looked up in a) and the property
  }
}' master.txt person_entries.txt
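Why NR==FNR selects the first file: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. A small sketch that prints both counters; show_counters is a hypothetical name. One caveat: if the first file is empty, NR==FNR stays true while the second file is read, so its lines would be loaded into the array instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show_counters FILE...
# For every input line, prints NR (records read so far across all files),
# FNR (record number within the current file) and which branch
# the NR == FNR test would take.
show_counters() {
    awk '{ print NR, FNR, (NR == FNR ? "first file" : "later file") }' "$@"
}
```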
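Answer
Something like this should do it. You may, however, need to adjust how the ':' is handled (see the header line) and decide what to do when there is no match:
$ awk 'NR==FNR{a[$1]=$2;next}{print a[$1]":",$2}' master.txt person_entries.txt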
Output:
name: property
name1: somevalue001
test: somevalue002
test: somevalue003
name1: somevalue004
Explanation:
$ awk '                        # use awk
NR==FNR {                      # process the first file
    a[$1]=$2                   # hash: the id is the key, the name is the value
    next                       # go to the next record without running the code below
}
{                              # process the second file
    print a[$1]":",$2          # print the name (from array a), a colon, and the property from the second file
}' master.txt person_entries.txt
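One way to make the tweaks mentioned above, as a sketch rather than a definitive version: split on ": " so the key is "1" instead of "1:", handle the header lines explicitly, and print a placeholder for ids missing from master.txt. join_names and the default placeholder NA are hypothetical choices, and it assumes names and properties do not themselves contain ": ".

```bash
#!/usr/bin/env bash
# Hypothetical helper: join_names MASTER PERSON [MISSING]
# Same NR == FNR hash join as the answer, split on ": ", with MISSING
# (default "NA") printed for ids that are not in MASTER.
join_names() {
    awk -F': ' -v missing="${3:-NA}" '
        FNR == 1 {                          # header lines: "id name" / "id property"
            n = split($0, h, " ")
            if (NR == 1) hname = h[n]       # remember "name" from the master header
            else print hname ": " h[n]      # print "name: property"
            next
        }
        NR == FNR { name[$1] = $2; next }   # master file: id -> name
        {                                   # person file: look the id up
            nm = ($1 in name) ? name[$1] : missing
            print nm ": " $2
        }
    ' "$1" "$2"
}
```

Usage: join_names master.txt person_entries.txt > result.txt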
Alternatively, in pure bash. It is not as fast as awk, but it is faster than the bash code in the question.
#!/usr/bin/env bash
IFS= read -r master_head < master.txt
IFS= read -r person_head < person_entries.txt
printf '%s: %s\n' "${master_head##* }" "${person_head##* }"
while IFS= read -ru8 person; do               # each data line, e.g. "1: somevalue001"
    while IFS= read -ru9 master; do           # each master line, e.g. "1: name1"
        if [[ ${master%% *} == "${person%% *}" ]]; then   # compare the "1:" ids literally
            printf '%s: %s\n' "${master##* }" "${person##* }"
        fi
    done 9< <(tail -n+2 master.txt)
done 8< <(tail -n+2 person_entries.txt)
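The nested loop still rereads master.txt once per person line. A sketch of a pure-bash version that uses the same hash-table idea as the awk answer, assuming bash 4 or later for associative arrays; bash_join and the NA placeholder are hypothetical. Each file is read only once, although a bash read loop is likely to remain slower than awk on 74,000 lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: bash_join MASTER PERSON
# Reads MASTER once into an id -> name table, then streams PERSON once.
# Requires bash 4+ (associative arrays). Unknown ids print as "NA".
bash_join() {
    local master_file=$1 person_file=$2
    local -A name_by_id=()
    local line id header_name

    {
        IFS= read -r line                     # header "id name"
        header_name=${line##* }
        while IFS= read -r line || [[ -n $line ]]; do
            [[ -z $line ]] && continue        # skip blank lines
            id=${line%%: *}                   # "1: name1" -> "1"
            name_by_id["$id"]=${line#*: }     # "1: name1" -> "name1"
        done
    } < "$master_file"

    {
        IFS= read -r line                     # header "id property"
        printf '%s: %s\n' "$header_name" "${line##* }"
        while IFS= read -r line || [[ -n $line ]]; do
            [[ -z $line ]] && continue
            id=${line%%: *}
            printf '%s: %s\n' "${name_by_id[$id]-NA}" "${line#*: }"
        done
    } < "$person_file"
}
```

Usage: bash_join master.txt person_entries.txt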