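Replace a value if it exists in a txt file

Problem description

Hello everyone, I have a data.ped file made up of several thousand columns and a few hundred rows. The first 6 columns and first 4 rows of the file look like this: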

186 A_Han-4.DG 0 0 1 1
187 A_Mbuti-5.DG 0 0 1 1
188 A_Karitiana-4.DG 0 0 1 1
191 A_french-4.DG 0 0 1 1

I also have an ids.txt file that looks like this:

186 Ignore_Han(discovery).DG
187 Ignore_Mbuti(discovery).DG
188 Ignore_Karitiana(discovery).DG
189 Ignore_Yoruba(discovery).DG
190 Ignore_Sardinian(discovery).DG
191 Ignore_french(discovery).DG
192 Dinka.DG
193 Dai.DG

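I need to replace the values in the first column of data.ped with the values from the second column of ids.txt (on UNIX), matching on the ID in the first column of both files. For example, I want to replace the value "186" in the first column of data.ped with "Ignore_Han(discovery).DG" from the second column of ids.txt (because "186" is on the same line as that value), so the output.ped file must look like this: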

Ignore_Han(discovery).DG A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG A_Karitiana-4.DG 0 0 1 1
Ignore_french(discovery).DG A_french-4.DG 0 0 1 1

The values in the first column of data.ped are a subset of the values in the first column of ids.txt, so there is always a match.

Edit:

I have already tried this:

awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]; print}' ids.txt data.ped

But when I check the result with:

cut -f 1-6 -d " " output.ped

I get this strange output:

A_Han-4.DG 0 0 1 1y).DG
A_Mbuti-5.DG 0 0 1 1y).DG
A_Karitiana-4.DG 0 0 1 1y).DG
A_french-4.DG 0 0 1 1y).DG

If I use this command:

cut -f 1-6 -d " " output.ped | less

I get this:

Ignore_Han(discovery).DG^M A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG^M A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG^M A_Karitiana-4.DG 0 0 1 1
Ignore_french(discovery).DG^M A_french-4.DG 0 0 1 1

I don't know why there is a ^M on every line.

Solution

awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]} 1' ids.txt data.ped

Output:

Ignore_Han(discovery).DG A_Han-4.DG 0 0 1 1
Ignore_Mbuti(discovery).DG A_Mbuti-5.DG 0 0 1 1
Ignore_Karitiana(discovery).DG A_Karitiana-4.DG 0 0 1 1
Ignore_french(discovery).DG A_french-4.DG 0 0 1 1

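This is a classic awk task, with many variations depending on your requirements. Here we replace the first field of a data.ped line only if it is found in the first column of ids.txt; otherwise the line is printed unchanged. If you want to drop the lines that do not match instead: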

awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]; print}' ids.txt data.ped

There is no need to sort the input files, and the order of the second file (data.ped) is preserved.
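To see the difference between the two variants, here is a small self-contained sketch using the sample rows from the question. The scratch directory, the output file names kept.ped and dropped.ped, and the extra row with ID 999 are assumptions added purely for illustration; the question states that real data always has a match.

```shell
# Work in a scratch directory (an assumption for this demo only).
cd "$(mktemp -d)"

# Sample rows from the question, plus one hypothetical unmatched row (999).
cat > data.ped <<'EOF'
186 A_Han-4.DG 0 0 1 1
187 A_Mbuti-5.DG 0 0 1 1
188 A_Karitiana-4.DG 0 0 1 1
191 A_french-4.DG 0 0 1 1
999 A_Unknown.DG 0 0 1 1
EOF

# Lookup table: ID in column 1, replacement name in column 2.
cat > ids.txt <<'EOF'
186 Ignore_Han(discovery).DG
187 Ignore_Mbuti(discovery).DG
188 Ignore_Karitiana(discovery).DG
189 Ignore_Yoruba(discovery).DG
190 Ignore_Sardinian(discovery).DG
191 Ignore_french(discovery).DG
192 Dinka.DG
193 Dai.DG
EOF

# Variant 1: replace field 1 when the ID is known; the trailing "1" prints every line.
awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]} 1' ids.txt data.ped > kept.ped

# Variant 2: print only the lines whose ID was found in ids.txt.
awk 'NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]; print}' ids.txt data.ped > dropped.ped

echo "--- kept.ped (unmatched row passes through unchanged)"
cat kept.ped
echo "--- dropped.ped (unmatched row removed)"
cat dropped.ped
```

In both variants the rows come out in the order they appear in data.ped, because ids.txt is only used to fill the lookup array.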

Update:

If the input contains Ctrl-M (carriage return) characters, remove them first with:

tr -d '\r' < file > file.tmp && mv file.tmp file

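Do this for whichever file you are using. In general, I recommend running dos2unix on any text file that may contain characters such as \r (shown as ^M), which usually come from editing on DOS/Windows.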
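If cleaning the files beforehand is not convenient, the carriage return can also be stripped inside awk itself. The following is only a sketch and not part of the original answer: it builds a small CRLF copy of ids.txt with printf to reproduce the symptom, and the scratch directory and two-row sample are assumptions for illustration.

```shell
# Work in a scratch directory (an assumption for this demo only).
cd "$(mktemp -d)"

# ids.txt with Windows (CRLF) line endings, data.ped with plain LF endings.
printf '186 Ignore_Han(discovery).DG\r\n187 Ignore_Mbuti(discovery).DG\r\n' > ids.txt
printf '186 A_Han-4.DG 0 0 1 1\n187 A_Mbuti-5.DG 0 0 1 1\n' > data.ped

# Count the lines that end in a carriage return (a quick way to confirm the cause).
cr=$(printf '\r')
grep -c "$cr\$" ids.txt

# sub(/\r$/, "") removes a trailing CR from every line of both files
# before any field is stored or compared, so the names stay clean.
awk '{sub(/\r$/, "")} NR==FNR{a[$1]=$2; next} $1 in a{$1=a[$1]} 1' ids.txt data.ped > output.ped

cat output.ped
```

Without the sub call, the carriage return stays attached to the end of the second field of ids.txt and ends up in the middle of each output line, which is why the terminal appears to overwrite the start of the line.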

Use the join command to merge the two files:

join ids.txt data.ped > temp

You can then use the cut command to remove the first column, for example:

cut -d " " -f 2- temp > output.ped
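One caveat with this approach: join expects both inputs to be sorted on the join field, and the sample files in the question happen to be in order already. A hedged sketch that sorts first is shown below; the scratch directory, the intermediate names ids.sorted and data.sorted, and the deliberately unsorted two-row data.ped are assumptions for illustration.

```shell
# Work in a scratch directory (an assumption for this demo only).
cd "$(mktemp -d)"

# data.ped is deliberately out of order here to show why sorting matters.
printf '187 A_Mbuti-5.DG 0 0 1 1\n186 A_Han-4.DG 0 0 1 1\n' > data.ped
printf '186 Ignore_Han(discovery).DG\n187 Ignore_Mbuti(discovery).DG\n189 Ignore_Yoruba(discovery).DG\n' > ids.txt

# join needs both inputs sorted on the join field (field 1 by default).
sort -k1,1 ids.txt > ids.sorted
sort -k1,1 data.ped > data.sorted

# Each joined line is: ID, name from ids.txt, then the remaining data.ped fields.
# IDs present in only one file (189 here) are dropped by default.
join ids.sorted data.sorted > temp

# Drop the numeric ID so the name becomes the first column.
cut -d ' ' -f 2- temp > output.ped

cat output.ped
```

Unlike the awk version, the rows come out in sorted ID order rather than in the original order of data.ped.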
