Problem description
I have a .txt file with a rather odd delimiter. The data look like this:
|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||
|ABC5|,|Name2,extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||
|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|
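So one problem is that some variables have no pipes: the sixth variable here is just an amount without pipes, while other variables lack pipes only when they are empty, like the fourth variable here, which is either |y| or ,,. Some variables also contain commas, so I cannot simply use the comma as the delimiter. So there are basically two problems: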
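- The delimiter is a comma, but commas also occur inside string values
- Some variables are enclosed in pipes, some are not, and some are enclosed only when they are not empty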
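To make the first point concrete, here is a minimal sketch. The names fragment, ncommas and nseps are made up for illustration and do not come from the original post. It takes a three-field fragment of the second row and compares the number of commas with the number of |,| sequences that actually separate two piped fields.

```stata
clear
input str60 fragment
"|ABC5|,|Name2,extraname|,|NameRaw2|"
end

* Number of commas: full length minus the length with every comma removed.
gen ncommas = length(fragment) - length(subinstr(fragment, ",", "", .))

* Number of "|,|" separators: each removed occurrence shortens the string by 3.
gen nseps = (length(fragment) - length(subinstr(fragment, "|,|", "", .))) / 3

list ncommas nseps
```

Three fields need only two separators, yet the fragment holds three commas, so splitting on every comma would cut the second field in two.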
I am looking for a way to solve this in Stata. Does anyone know how to do it?
Solution
If the full dataset is messier than this example, I really don't want to know. But the following seems to make some sense of it.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str100 whatever
"|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||"
"|ABC5|,|Name2,extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,||"
"|ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ|"
end
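* Work on a copy so that the raw line is kept.
gen work = whatever
* An empty field shows up as ",," with no pipes; give it an empty pair of pipes.
replace work = subinstr(work, ",,", ",||,", .)

* Fields 1 to 5 are piped: peel off everything up to and including the first "|,".
forval j = 1/5 {
    gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
    replace work = subinstr(work, work`j', "", 1)
}

* Field 6 is a bare number, so it ends at the first comma.
gen work6 = substr(work, 1, strpos(work, ","))
replace work = subinstr(work, work6, "", 1)

* Fields 7 and 8 are piped again.
forval j = 7/8 {
    gen work`j' = substr(work, 1, strpos(work, "|,") + 1)
    replace work = subinstr(work, work`j', "", 1)
}

* Whatever is left is field 9.
gen work9 = work
drop work

* Strip the pipes and surrounding spaces, then any trailing comma.
forval j = 1/9 {
    replace work`j' = trim(subinstr(work`j', "|", "", .))
    replace work`j' = substr(work`j', 1, length(work`j') - 1) if substr(work`j', -1, 1) == ","
}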
list
+-----------------------------------------------------------------------------------+
1. | whatever |
| |ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,|| |
|-----------------------------------------------------------------------------------|
| work1 | work2 | work3 | work4 | work5 | work6 | work7 |
| ABC4 | Name1 | NameRaw1 | y | XY1 | 10000.0 | |
|-----------------------------------------------------------------------------------|
| work8 | work9 |
| FOURTH QUARTER REPORT | |
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
2. | whatever |
| |ABC5|,|Name2,extraname|,|NameRaw2|,,|XY2|,266539.0,|pac |,|MID-YEAR REPORT|,|| |
|-----------------------------------------------------------------------------------|
| work1 | work2 | work3 | work4 | work5 | work6 | work7 |
| ABC5 | Name2,extraname | NameRaw2 | | XY2 | 266539.0 | pac |
|-----------------------------------------------------------------------------------|
| work8 | work9 |
| MID-YEAR REPORT | |
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
3. | whatever |
| |ABC6|,|Name3|,|NameRaw3|,|y|,|X,Y3|,60000.0,|name |,|YEAR-END REPORT|,|XYZ| |
|-----------------------------------------------------------------------------------|
| work1 | work2 | work3 | work4 | work5 | work6 | work7 |
| ABC6 | Name3 | NameRaw3 | y | X,Y3 | 60000.0 | name |
|-----------------------------------------------------------------------------------|
| work8 | work9 |
| YEAR-END REPORT | XYZ |
+-----------------------------------------------------------------------------------+
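The peeling step used in the loops can be tried on a single row to see what each function contributes. This is only an illustrative sketch: the variable names piece and clean are invented here, and the row is the first line of the example data.

```stata
clear
input str100 work
"|ABC4|,|Name1|,|NameRaw1|,|y|,|XY1|,10000.0,| |,|FOURTH QUARTER REPORT|,||"
end

* strpos() returns where the first "|," starts (the closing pipe of field 1);
* adding 1 makes the piece include the comma that follows that pipe.
gen piece = substr(work, 1, strpos(work, "|,") + 1)

* Remove only the first occurrence of the piece, so the next field moves to the front.
replace work = subinstr(work, piece, "", 1)

* Cleanup as in the final loop: strip pipes and spaces first, then the trailing comma.
gen clean = trim(subinstr(piece, "|", "", .))
replace clean = substr(clean, 1, length(clean) - 1) if substr(clean, -1, 1) == ","

list piece clean work
```

Here piece holds |ABC4|, (with its trailing comma), clean holds ABC4, and work now starts with |Name1|, so the same two lines can be applied again for the next field. Searching for "|," rather than "," alone is what keeps a comma inside a piped value from ending the field early.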