Problem description
I use RFC4180Parser to read files that contain a '\' directly before a ". This works well.
Here is my code. I am using RFC4180Parser and CsvToBeanBuilder to read the CSV file.
final RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
final CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(new FileReader(inputDpaCsvFilePath))
        .withCSVParser(rfc4180Parser);
final List<MyClass> infos = new CsvToBeanBuilder<MyClass>(csvReaderBuilder.build())
        .withType(MyClass.class)
        .withSeparator(',')
        .build().parse();
Original CSV file:
"A","B","C","D"
"value 1","value 2","value 3","value 4"
"value\\" 11","value 22\\"","value 33","value 44"
But now the file format has changed: some commas were added to the values in the Header E column.
New CSV file:
"Header A","Header B","Header C","Header D","Header E"
"value1","value2","value3","value4","spA,spB,spC"
"value\\"5","value6\\"","value 7","value8",spC"
"value\\" 9","value 10","value 11","value 12","spC"
This throws the following exception:
Exception in thread "pool-1-thread-1" java.lang.RuntimeException: com.opencsv.exceptions.CsvRequiredFieldEmptyException: Number of data fields does not match number of headers.
at com.opencsv.bean.concurrent.ProcessCsvLine.run(ProcessCsvLine.java:101)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.opencsv.exceptions.CsvRequiredFieldEmptyException: Number of data fields does not match number of headers.
at com.opencsv.bean.HeaderColumnNameMappingStrategy.verifyLineLength(HeaderColumnNameMappingStrategy.java:110)
at com.opencsv.bean.AbstractMappingStrategy.populateNewBean(AbstractMappingStrategy.java:313)
at com.opencsv.bean.concurrent.ProcessCsvLine.processLine(ProcessCsvLine.java:132)
at com.opencsv.bean.concurrent.ProcessCsvLine.run(ProcessCsvLine.java:85)
... 3 more
Solution
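See the RFC 4180 specification, section 2, item 4:
Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma. For example: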
aaa,bbb,ccc
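This is why the error "Number of data fields does not match number of headers." occurs. With a space after each comma, the record
"value1", "value2", "value3", "value4", "spA,spB,spC"
is parsed into 7 fields (every field after the first keeps its leading space and its double quotes, so the commas inside the last field act as separators):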
value1
"value2"
"value3"
"value4"
"spA
spB
spC
But the header contains only 5 fields.
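To make the count concrete, here is a minimal sketch in plain Java of strict splitting followed by a header/record length comparison. It is not OpenCSV's implementation: the class `FieldCountSketch` and its methods `split` and `matchesHeader` are hypothetical names, and the sketch assumes that a double quote opens a quoted field only when it is the very first character of that field. Escaped quotes and multi-line records are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not how OpenCSV is implemented.
class FieldCountSketch {

    // Strict splitting: a field is treated as quoted only if its first
    // character is a double quote. Spaces are kept as part of the field.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        while (pos <= line.length()) {
            int end;
            if (pos < line.length() && line.charAt(pos) == '"') {
                // Quoted field: take everything up to the closing quote.
                int close = line.indexOf('"', pos + 1);
                if (close < 0) {
                    close = line.length();
                }
                fields.add(line.substring(pos + 1, close));
                int comma = line.indexOf(',', close);
                end = (comma < 0) ? line.length() : comma;
            } else {
                // Unquoted field: take everything up to the next comma.
                int comma = line.indexOf(',', pos);
                end = (comma < 0) ? line.length() : comma;
                fields.add(line.substring(pos, end));
            }
            pos = end + 1;
        }
        return fields;
    }

    // The idea behind the failing check: a record must have as many
    // fields as the header.
    static boolean matchesHeader(String header, String record) {
        return split(header).size() == split(record).size();
    }

    public static void main(String[] args) {
        String header = "\"Header A\",\"Header B\",\"Header C\",\"Header D\",\"Header E\"";
        String record = "\"value1\", \"value2\", \"value3\", \"value4\", \"spA,spB,spC\"";
        System.out.println(split(record).size());          // 7
        System.out.println(matchesHeader(header, record)); // false
    }
}
```

Removing the spaces after the commas in `record` makes every field start with a quote, so the same sketch then yields 5 fields and the comparison succeeds.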
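Without modifying the CSV, we can use CSVParser instead of RFC4180Parser and ignore leading white space. The following program demonstrates how CSVParser parses the provided CSV, and how RFC4180Parser parses fields with leading spaces: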
import java.io.IOException;
import java.io.StringReader;
import java.util.List;

import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.RFC4180Parser;
import com.opencsv.RFC4180ParserBuilder;
import com.opencsv.exceptions.CsvException;

public class ParseCsvFieldContainsCommaAndLeadingSpaceTest {

    public static void main(String[] args) throws IOException, CsvException {
        parseWithCSVParser();
        parseWithRFC4180Parser();
    }

    private static void parseWithCSVParser() throws IOException, CsvException {
        final CSVParser parser = new CSVParserBuilder().withIgnoreLeadingWhiteSpace(true).build();
        final CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
                new StringReader("\"Header A\",\"Header B\",\"Header C\",\"Header D\",\"Header E\"\r\n" +
                        "\"value1\",\"value2\",\"value3\",\"value4\",\"spA,spC\""))
                .withCSVParser(parser);
        System.out.println("Result from CSVParser");
        List<String[]> lines = csvReaderBuilder.build().readAll();
        for (String[] line : lines) {
            System.out.println(String.join(" | ", line));
        }
    }

    private static void parseWithRFC4180Parser() throws IOException, CsvException {
        final RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
        final CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
                new StringReader("\"Header A\",spC\""))
                // Space removed from the input to make this runnable
                .withCSVParser(rfc4180Parser);
        System.out.println("Result from RFC4180Parser");
        List<String[]> lines = csvReaderBuilder.build().readAll();
        for (String[] line : lines) {
            System.out.println(String.join(" | ", line));
        }
    }
}
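The effect of ignoring leading white space can also be sketched without OpenCSV. The following is a simplified model under stated assumptions, not the library's CSVParser: the class `LeadingSpaceSketch`, its `split` method and the `ignoreLeadingWhiteSpace` flag are hypothetical names, only the space character is skipped, and escaped quotes are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not OpenCSV's CSVParser.
class LeadingSpaceSketch {

    // Splits one record on commas. When ignoreLeadingWhiteSpace is true,
    // spaces before a field are skipped, so a quote that follows them
    // still opens a quoted field.
    static List<String> split(String line, boolean ignoreLeadingWhiteSpace) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        while (pos <= line.length()) {
            int start = pos;
            if (ignoreLeadingWhiteSpace) {
                while (start < line.length() && line.charAt(start) == ' ') {
                    start++;
                }
            }
            int end;
            if (start < line.length() && line.charAt(start) == '"') {
                // Quoted field: commas before the closing quote are data.
                int close = line.indexOf('"', start + 1);
                if (close < 0) {
                    close = line.length();
                }
                fields.add(line.substring(start + 1, close));
                int comma = line.indexOf(',', close);
                end = (comma < 0) ? line.length() : comma;
            } else {
                // Unquoted field: every comma is a separator.
                int comma = line.indexOf(',', start);
                end = (comma < 0) ? line.length() : comma;
                fields.add(line.substring(start, end));
            }
            pos = end + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        String record = "\"value1\", \"value2\", \"value3\", \"value4\", \"spA,spB,spC\"";
        // prints: value1 | value2 | value3 | value4 | spA,spB,spC
        System.out.println(String.join(" | ", split(record, true)));
        // prints: 7
        System.out.println(split(record, false).size());
    }
}
```

With the flag set, the record yields 5 fields and the last one keeps its embedded commas. With the flag cleared, the same record falls back to 7 fields, which is the mismatch reported in the exception.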