How to iterate over the occurrences of a keyword

Problem description


----The text file I have to search for the keywords [file name: test] (a $ marks the end of each line):

    centos is my bro$
    red hat is my course$
    ubuntu is my OS$
    fqdn is stupid $
    $
    $
    $
    tom outsmart jerry$
    red hat is my boy$
    jerry is samall 

------The keyword file is [word.txt]:

  red hat$
  we$
  hello$
  bye$
  Compensation

-----My code:

  while read "p"; do
  paste  -d',' <(echo -n  "$p" ) <(echo "searchall") <(  grep -i "$p" test | wc -l) <(grep  -i -A 1  -B 1   "$p" test )
  done <word.txt

----What I expect the output to be:

    keyword,searchall,frequency,line above it
                                line the keyword is found in
                                line below it

    red hat,searchall,2,centos is my bro
                        red hat is my course
                        ubuntu is my OS

    red hat,tom outsmart jerry
                        red hat is my boy
                        jerry is samall

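----Please give me suggestions and point me in the right direction to get the desired output.

----I'm trying to take the keywords from the file and print them. Two records should be created here, because the keyword (red hat) occurs twice.

----How do I iterate over each occurrence of a keyword?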
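A likely reason the paste attempt cannot produce these records: paste joins its inputs line by line, so once the one-line inputs (the keyword, "searchall", the count) run out, the remaining lines of grep's output are printed with empty leading fields. A minimal sketch with made-up inline input instead of the real files (demo_paste is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: paste takes line 1 of every input, then line 2 of every input, etc.
# Inputs that have run out contribute empty fields, but the delimiters remain.
demo_paste() {
  paste -d',' <(echo "red hat") <(echo "searchall") <(echo 2) \
    <(printf '%s\n' "centos is my bro" "red hat is my course" "ubuntu is my OS")
}

demo_paste
# red hat,searchall,2,centos is my bro
# ,,,red hat is my course
# ,,,ubuntu is my OS
```

With real grep context output there would also be a "--" line between groups of hits, printed the same way. Building each record inside a loop avoids this.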

Solution

This sounds a lot like homework.
See the BashFAQ for some good reading; keep it simple and focus on your requirements.
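For reference, the BashFAQ's standard idiom for reading a file line by line is "while IFS= read -r line". The question's loop uses plain "read", which trims surrounding blanks and drops backslashes. A small sketch with a hypothetical input line:

```shell
#!/usr/bin/env bash
# Sketch: plain "read" strips leading/trailing blanks and treats backslashes
# as escapes; "IFS= read -r" keeps the line exactly as it is.
line_in='  red\hat  '            # hypothetical input line with spaces and a backslash

read plain <<< "$line_in"        # default IFS trimming + backslash processing
IFS= read -r raw <<< "$line_in"  # no trimming, backslash kept literally

printf '[%s]\n' "$plain"         # prints [redhat]
printf '[%s]\n' "$raw"           # prints [  red\hat  ]
```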

Rewritten to be a bit more precise:

while IFS= read -r key                       # read each search key
do [[ -z $key ]] && continue                 # skip blank lines in word.txt
   cnt=$(grep -c -F -- "$key" test)          # count the matching lines (key taken literally)
   pad="$key,searchall,$cnt,"                # build the "header" fields
   indent="${pad//?/ }"                      # blanks as wide as the full header
   while IFS= read -r line                   # read the output of grep
   do if [[ $line == -- ]]                   # grep's separator between groups of hits
      then pad="$key,"                       # shorter "header" for the next group
           echo                              # add the blank line
           continue                          # skip to the next line of data
      fi
      echo "$pad$line"                       # echo "header" and data
      pad=$indent                            # later lines get only the spacing
   done < <(grep -B1 -A1 -F -- "$key" test)  # pull hits (with context) for this key
   echo                                      # add a blank line between keys
done < word.txt                              # set stdin for the outer read

$: cat word.txt
course
red hat

$: ./tst
course,searchall,1,centos is my bro
                   red hat is my course
                   ubuntu is my OS

red hat,searchall,2,centos is my bro
                    red hat is my course
                    ubuntu is my OS

red hat,tom outsmart jerry
                    red hat is my boy
                    jerry is samall
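The blank line between the two red hat records comes from the "--" separator that grep prints between non-adjacent groups of context lines, which the inner loop checks for. A sketch that recreates the sample test file in a temporary directory (trailing spaces dropped) and shows the raw grep output:

```shell
#!/usr/bin/env bash
# Sketch: recreate the question's "test" file (trailing spaces dropped) in a
# temporary directory and look at grep's raw context output for one key.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "centos is my bro" "red hat is my course" "ubuntu is my OS" \
  "fqdn is stupid" "" "" "" "tom outsmart jerry" "red hat is my boy" \
  "jerry is samall" > "$dir/test"

grep -c -- "red hat" "$dir/test"        # number of matching lines: 2
grep -B1 -A1 -- "red hat" "$dir/test"   # two 3-line groups separated by a "--" line
```

If two hits are close enough that their context lines touch or overlap, grep merges them into one group with no "--" between them, so the loop would print them as a single record.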

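This produces the expected output under one interpretation of your requirements, and it should be easy to modify if I've guessed wrong about anything you're trying to do.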

$ cat tst.awk
BEGIN {
    RS = ""                 # paragraph mode: blank lines separate records
    FS = "\n"               # each line of a paragraph is one field
}
{ gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
NR == FNR {
    for (i=1; i<=NF; i++)   # every line of the keyword file is a keyword
        words[$i]
    next
}
{
    for (word in words) {
        for (i=1; i<=NF; i++) {
            if ($i ~ word) {
                # line above (if any), matching line, line below (if any)
                map[word,++cnt[word]] = (i>1 ? $(i-1) FS : "") $i (i<NF ? FS $(i+1) : "")
            }
        }
    }
}
END {
    for (word in words) {
        for (i=1; i<=cnt[word]; i++) {
            beg = sprintf("%s,%d,",word,cnt[word])
            split(map[word,i],lines)
            for (j=1; j in lines; j++) {
                print beg lines[j]
                beg = sprintf("%*s",length(beg),"")
            }
            print ""
        }
    }
}

$ awk -f tst.awk words file
red hat,2,centos is my bro
          red hat is my course
          ubuntu is my OS

red hat,2,tom outsmart jerry
          red hat is my boy
          jerry is samall
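The awk version relies on paragraph mode: with RS set to the empty string, one or more blank lines separate records, and FS = "\n" makes each line a field. A sketch showing how the sample test file is split (recreated in a temporary directory, trailing spaces dropped):

```shell
#!/usr/bin/env bash
# Sketch: show how awk's paragraph mode (RS = "") splits the sample file.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "centos is my bro" "red hat is my course" "ubuntu is my OS" \
  "fqdn is stupid" "" "" "" "tom outsmart jerry" "red hat is my boy" \
  "jerry is samall" > "$dir/file"

awk 'BEGIN { RS = ""; FS = "\n" }
     { print "record " NR ": " NF " lines, first line: " $1 }' "$dir/file"
# record 1: 4 lines, first line: centos is my bro
# record 2: 3 lines, first line: tom outsmart jerry
```

The keyword file is read in the same mode, so a list with no blank lines arrives as one record whose fields are the individual keywords. Also, the "line above" and "line below" come from the same paragraph, so a hit on the first or last line of a paragraph gets no neighbour on that side, unlike grep -B1 -A1.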

I'm assuming your real input doesn't start with a bunch of spaces the way the sample you posted does; if it does, that's easy to accommodate.
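One way to accommodate indented input, as a sketch: strip leading and trailing blanks from every line before handing the files to awk. trim_lines is a hypothetical helper name; empty lines are kept rather than deleted, so the blank-line record separators survive.

```shell
#!/usr/bin/env bash
# Sketch: trim leading/trailing blanks from every line of a file.
# trim_lines is a hypothetical helper; empty lines are kept, not deleted.
trim_lines() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '  red hat\n  we\n\n  Compensation  \n' > "$dir/words"   # indented sample
trim_lines "$dir/words"
```

With real files this could be used as, for example, awk -f tst.awk <(trim_lines words) <(trim_lines file).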
