Cannot parse headers from a GitHub CSV URL with Apache Commons CSV

Problem description

I am trying to use the Apache Commons CSV library to read values by header name from each record of a CSV file hosted at a GitHub URL.

Here is my code:

@Service
public class CoronaVirusDataService {

    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";
    
    @PostConstruct
    public void getVirusData()
    {
        try
        {
        URL url = new URL(virus_data_url);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        BufferedReader in = new BufferedReader( new InputStreamReader(con.getInputStream()));
        
        while((in.readLine()) != null)
        {
            StringReader csvReader = new StringReader(in.readLine());
            Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(csvReader);
            for (CSVRecord record : records) {
                String country = record.get("Country/Region");
                System.out.println(country);
            }       
        }
        in.close();
        }
        catch(Exception e) 
        {
            e.printStackTrace();
        }
    }
}

When I run the application I get this error:

java.lang.IllegalArgumentException: A header name is missing in [,Afghanistan,33.93911,67.709953,1,2,4,5,7,8,11,12,13,15,16,18,20,24,25,29,30,34,41,43,76,80,91,107,118,146,175,197,240,275,300,338,368,424,445,485,532,556,608,666,715,785,841,907,934,997,1027,1093]
at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:501)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:412)
at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:378)
at org.apache.commons.csv.CSVFormat.parse(CSVFormat.java:1157)
at com.p1.Services.CoronaVirusDataService.getVirusData(CoronaVirusDataService.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

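Solution

If you want the first line to be treated as the header, do not feed the parser one line at a time. Every call to parse() creates a new parser, and with withFirstRecordAsHeader() that parser takes the first record it sees as the header row. In the question's loop, the readLine() in the while condition reads a line and throws it away, and the second readLine() hands the following line to a parser on its own, so a data row such as ",Afghanistan,33.93911,..." is read as a header. Its first field is empty, which is why the exception says "A header name is missing". Instead, pass the reader itself to the parser so the whole file is parsed in one pass, as the corrected code in this answer does.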
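To make the line skipping visible, here is a minimal sketch that reproduces only the loop structure from the question, using nothing but the standard library. The class name ReadLineSkipDemo, the method linesHandedToParser, and the CountryB/CountryC rows are hypothetical stand-ins; only the Afghanistan row is taken from the error message.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original service.
class ReadLineSkipDemo {

    // Mimics the loop from the question: the while condition reads a line and
    // discards it, then the body reads the next line and would hand it to the parser.
    static List<String> linesHandedToParser(String csvText) {
        List<String> handed = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(csvText))) {
            while (in.readLine() != null) {   // consumes one line, result thrown away
                String line = in.readLine();  // consumes the following line
                if (line == null) {
                    break; // odd line count: the original would pass null to StringReader
                }
                handed.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handed;
    }

    public static void main(String[] args) {
        // Illustrative sample shaped like the real file; rows B and C are made up.
        String sample = "Province/State,Country/Region,Lat,Long\n"
                + ",Afghanistan,33.93911,67.709953\n"
                + ",CountryB,1.0,2.0\n"
                + ",CountryC,5.0,6.0\n";
        for (String line : linesHandedToParser(sample)) {
            System.out.println(line);
        }
    }
}
```

With this sample, the real header line is consumed by the while condition and never reaches the parser; the first line handed over is the Afghanistan row, which matches the row shown in the exception message. The corrected service itself follows.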

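import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on newer Spring versions

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import org.springframework.stereotype.Service;

@Service
public class CoronaVirusDataService {

    private static String virus_data_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/Aysen_Chile_07032021/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv";

    @PostConstruct
    public void getVirusData()
    {
        try
        {
            URL url = new URL(virus_data_url);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            // try-with-resources closes the reader even if parsing fails.
            // UTF-8 is assumed for the file's encoding.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8)))
            {
                // Parse the whole stream once; the first record supplies the header names.
                Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in);
                for (CSVRecord record : records) {
                    String country = record.get("Country/Region");
                    System.out.println(country);
                }
            }
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}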
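
For intuition about what "first record as header" means when a whole reader is parsed in one pass, the following is a rough standard-library sketch of the idea, not how Apache Commons CSV is implemented. The class HeaderLookupSketch and the method column are hypothetical names, the CountryB row is made up, and the naive comma split does not handle quoted fields or embedded commas, which is one reason to keep using the library for real data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; Apache Commons CSV does far more than this.
class HeaderLookupSketch {

    // Reads the input once: the first line supplies the column names,
    // every later line is a record whose fields are looked up by name.
    static List<String> column(Reader source, String columnName) {
        List<String> values = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String headerLine = in.readLine();
            if (headerLine == null) {
                return values; // empty input
            }
            // Naive split: no support for quoted fields or embedded commas.
            String[] names = headerLine.split(",", -1);
            Map<String, Integer> indexByName = new HashMap<>();
            for (int i = 0; i < names.length; i++) {
                indexByName.put(names[i], i);
            }
            Integer index = indexByName.get(columnName);
            if (index == null) {
                throw new IllegalArgumentException("No such column: " + columnName);
            }
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", -1);
                if (index < fields.length) {
                    values.add(fields[index]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "Province/State,Country/Region,Lat,Long\n"
                + ",Afghanistan,33.93911,67.709953\n"
                + ",CountryB,1.0,2.0\n";
        System.out.println(column(new StringReader(sample), "Country/Region"));
    }
}
```

Because the header is read exactly once, a data row with an empty first field (like the Afghanistan row) is just an ordinary record here and causes no error.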