Why does my nohup bash script that reads a file always stop printing its count about 6k before the end of the file?

Problem description

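I use nohup to run a bash script that reads each line of a file (and extracts the information I need). I have used it on several files with different line counts, mostly between 50k and 100k lines. But no matter how many lines the file has, nohup always stops printing progress about 6k lines before the last line.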

My script is called fetchStuff.sh:

#!/bin/bash

urlFile=$1
myHost='http://example.com'
useragent='me'
count=0
total_lines=$(wc -l < $urlFile)

while read url; do
    if [[ "$url" == *html ]]; then continue; fi

    reqURL=${myHost}${url}
    stuffInfo=$(curl -s -XGET -A "$useragent" "$reqURL" | jq -r '.stuff')
    [ "$stuffInfo" != "null" ] && echo ${stuffInfo/unwanted_garbage/} >> newversion-${urlFile}
    ((count++))
    if [ $(( $count%20 )) -eq 0 ]
    then
        sleep 1
    fi
    if [ $(( $count%100 )) -eq 0 ]; then echo "$urlFile read ${count} of $total_lines"; fi
done < $urlFile

I invoke it like this: nohup ./fetchStuff.sh file1.txt & and I get the count messages in nohup.out, e.g. "file1 read 100 of 60000", "file1 read 200 of 60000", and so on. But it always stops about 6k before the end of the file.

After running the script on several files, these are the last lines of nohup.out each time I run tail nohup.out:

file1.txt read 90000 of 96317  
file2.txt read 68000 of 73376  
file3.txt read 85000 of 91722  
file4.txt read 93000 of 99757

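I can't figure out why it always stops about 6k before the end of the file. (I added the sleep timer to avoid flooding the API with too many requests.)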

Solution

The loop skips lines that end in html, and those lines are never counted in $count. So I'd bet that about 6317 lines in file1.txt end in html, about 5376 in file2.txt, and so on.
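One way to check this guess is to count the skipped lines directly. The sketch below is not from the original answer: count_skipped is a hypothetical helper name, and the sample file contents are made up. It compares the total line count with the number of lines ending in html. Because progress is only printed every 100 lines, the real number of skipped lines can be up to 99 lower than the gap shown in nohup.out.

```shell
#!/bin/bash
# count_skipped is a hypothetical helper, not part of fetchStuff.sh.
# It reports how many lines the loop would skip (those ending in "html")
# and how many it would actually process.
count_skipped() {
    local file=$1 total skipped
    total=$(( $(wc -l < "$file") ))        # arithmetic expansion trims any padding from wc
    skipped=$(grep -c 'html$' "$file")     # lines the loop skips via "continue"
    echo "total=$total skipped=$skipped processed=$(( total - skipped ))"
}

# Small made-up sample file to demonstrate the helper.
sample=$(mktemp)
printf '%s\n' /a.json /b.html /c.json /d.html /e.json > "$sample"
count_skipped "$sample"
rm -f "$sample"
```

For the sample file this prints total=5 skipped=2 processed=3. Run against file1.txt, the processed value should land within 100 of the last count shown in nohup.out if the explanation holds.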

If you want $count to include them, move ((count++)) before the if statement that checks the suffix.

while read url; do
    ((count++))
    if [[ "$url" == *html ]]; then continue; fi

    reqURL=${myHost}${url}
    stuffInfo=$(curl -s -XGET -A "$useragent" "$reqURL" | jq -r '.stuff')
    [ "$stuffInfo" != "null" ] && echo ${stuffInfo/unwanted_garbage/} >> newversion-${urlFile}
    if [ $(( $count%20 )) -eq 0 ]
    then
        sleep 1
    fi
    if [ $(( $count%100 )) -eq 0 ]; then echo "$urlFile read ${count} of $total_lines"; fi
done < $urlFile

Alternatively, you can exclude them from total_lines with:

total_lines=$(grep -c -v 'html$' "$urlFile")

You can also eliminate the if statement by using:

grep -v 'html$' "$urlFile" | while read url; do
    ...
done
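One caveat with the pipeline form: by default bash runs each part of a pipeline in a subshell, so changes to count made inside the loop are not visible after done. Progress messages printed inside the loop still work. If the final count is needed afterwards, a process substitution is one possible alternative. The sketch below assumes bash, uses a made-up sample file, and replaces the curl/jq step with a placeholder comment.

```shell
#!/bin/bash
# Sketch only: sample data is invented and the fetch step is omitted.
urlFile=$(mktemp)
printf '%s\n' /a.json /b.html /c.json /d.html /e.json > "$urlFile"

count=0
total_lines=$(grep -c -v 'html$' "$urlFile")   # lines that will actually be processed

# Process substitution keeps the loop in the current shell,
# so count is still set after "done".
while IFS= read -r url; do
    count=$((count + 1))
    # ... fetch "$url" here, as in the original loop ...
done < <(grep -v 'html$' "$urlFile")

echo "read ${count} of ${total_lines}"
rm -f "$urlFile"
```

Here both numbers come from the same filtered set of lines, so the final message reads "read 3 of 3" for the sample file.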

bash centos7 linux nohup
