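I have data in one column of a .csv file that sometimes contains commas and newlines. Whenever the data contains a comma, I enclose the whole string in double quotes. How can I parse that column's output into a .txt file while taking the newlines and commas into account?
Sample data that does not work with my command: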
,"This is some text with a,in it.",#data with commas are enclosed in double quotes
,line 1 of data
line 2 of data,#data with a couple of newlines
,"Data that may a have,in it and also be on a newline as well.",
This is what I have so far:
awk -F "\"*,\"*" '{print $4}' file.csv > column_output.txt
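To see why this separator is not enough, here is a minimal sketch on a made-up one-line record (the sample text and the helper name `split_demo` are hypothetical, not from the question). The regex `"*,"*` also matches the comma inside the quoted field, so the quoted text ends up spread over two fields:

```shell
# Hypothetical sample: column 2 is quoted and contains a comma.
split_demo() {
    printf '%s\n' ',"some text, with a comma",tail' |
        awk -F '"*,"*' '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
}
split_demo
```

The quoted column comes back as `$2` and `$3` rather than as a single field, and a quoted field that spans several lines would additionally be cut at each newline, because awk reads one line per record by default.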
$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
    # create strings that cannot exist in the input to map escaped quotes to
    gsub(/a/,"aA")
    gsub(/\\"/,"aB")
    gsub(/""/,"aC")

    # prepend previous incomplete record segment if any
    $0 = prev $0
    numq = gsub(/"/,"&")
    if ( numq % 2 ) {
        # this is inside double quotes so incomplete record
        prev = $0 RT
        next
    }
    prev = ""

    for (i=1;i<=NF;i++) {
        # map the replacement strings back to their original values
        gsub(/aC/,"\"\"",$i)
        gsub(/aB/,"\\\"",$i)
        gsub(/aA/,"a",$i)
    }

    printf "Record %d:\n", ++recNr
    for (i=0;i<=NF;i++) {
        printf "\t$%d=<%s>\n", i, $i
    }
    print "#######"
}
$ awk -f decsv.awk file
Record 1:
    $0=<,"This is some text with a,in it.", #data with commas are enclosed in double quotes>
    $1=<>
    $2=<"This is some text with a,in it.">
    $3=< #data with commas are enclosed in double quotes>
#######
Record 2:
    $0=<,"line 1 of data
line 2 of data", #data with a couple of newlines>
    $1=<>
    $2=<"line 1 of data
line 2 of data">
    $3=< #data with a couple of newlines>
#######
Record 3:
    $0=<,"Data that may a have,in it and also be on a newline as well.",>
    $1=<>
    $2=<"Data that may a have,in it and also be on a newline as well.">
    $3=<>
#######
Record 4:
    $0=<,"Data that \"may\" a have ""quote"" in it and also be on a newline as well.",>
    $1=<>
    $2=<"Data that \"may\" a have ""quote"" in it and also be on a newline as well.">
    $3=<>
#######
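The above uses GNU awk for FPAT and RT. I'm not aware of any CSV format that lets you have a newline in the middle of an unquoted field (if one did, you could never know where any record ends), so the script does not allow for that. The above was run on this input file: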
$ cat file,
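For the original goal of writing one column to a .txt file, the same idea (join lines until the double quotes balance, then split on commas outside quotes) can be sketched without the debugging output. This is a minimal sketch under stated assumptions, not the script above: it deliberately avoids FPAT and RT so that it should run under any POSIX awk, it assumes quotes inside a field are escaped only by doubling them (`""`, not `\"`), and the function name `csv_column` is hypothetical.

```shell
# csv_column N FILE: print column N of FILE (hypothetical helper).
# Embedded newlines are kept; enclosing quotes are removed.
csv_column() {
    awk -v col="$1" '
    {
        # Join this line to any unfinished record from earlier lines.
        rec = (inq ? rec "\n" : "") $0
        tmp = rec
        # An odd number of double quotes means a quoted field is still open.
        inq = gsub(/"/, "", tmp) % 2
        if (inq) next

        n = 0
        rest = rec
        # Peel one field at a time off the front of the record.
        while (1) {
            if (match(rest, /^"([^"]|"")*"/) == 0)
                match(rest, /^[^,]*/)
            field = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
            if (++n == col) {
                # Strip the enclosing quotes and undo doubled quotes.
                if (field ~ /^"/) {
                    field = substr(field, 2, length(field) - 2)
                    gsub(/""/, "\"", field)
                }
                print field
                break
            }
            if (rest == "") break     # fewer than col fields
            rest = substr(rest, 2)    # drop the separating comma
        }
    }' "$2"
}

# Example: csv_column 2 file.csv > column_output.txt
```

Because each record is buffered until its quotes balance, a quoted field that spans several lines is printed as one multi-line value. Input that uses backslash-escaped quotes would need the placeholder mapping from the script above.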