Why can't I open some .ann files with `pandas.read_csv()`?

Problem description

import pandas as pd
from pathlib import Path


Drugs = ['ARTHROTEC','CAMBIA','CATAFLAM','DICLOFENAC-POTASSIUM','DICLOFENAC-sodium','FLECTOR','LIPITOR','PENNSAID','SOLaraZE','VOLTAREN','VOLTAREN-XR','ZIPSOR']
def extract_Tags(drug):
    Files = Path('E:/TM/Final/CADEC/original').glob(drug+'*.ann')
    for file in Files:
        try:
            data = pd.read_csv(file,sep='\t',header=None)
        except:
            print('Cannot open ',file)
    print(drug,'\n')

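I have many .ann files in one directory, and each file name starts with a drug name. I am trying to read their contents with pandas.read_csv(). Some of the files open and others do not. The output I get is shown below, and I don't know how to check what is wrong with the files that cannot be opened. Should I use a different command to open them?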

for drug in Drugs:
    extract_Tags(drug)

ARTHROTEC 

Cannot open  E:\TM\Final\CADEC\original\CAMBIA.1.ann
CAMBIA 

CATAFLAM 

DICLOFENAC-POTASSIUM 

DICLOFENAC-sodium 

FLECTOR 

Cannot open  E:\TM\Final\CADEC\original\LIPITOR.197.ann
Cannot open  E:\TM\Final\CADEC\original\LIPITOR.243.ann
Cannot open  E:\TM\Final\CADEC\original\LIPITOR.28.ann
...
Cannot open  E:\TM\Final\CADEC\original\LIPITOR.964.ann
LIPITOR 

Cannot open  E:\TM\Final\CADEC\original\PENNSAID.2.ann
PENNSAID 

Cannot open  E:\TM\Final\CADEC\original\SOLaraZE.1.ann
Cannot open  E:\TM\Final\CADEC\original\SOLaraZE.3.ann
SOLaraZE 

Cannot open  E:\TM\Final\CADEC\original\VOLTAREN-XR.11.ann
Cannot open  E:\TM\Final\CADEC\original\VOLTAREN-XR.13.ann
Cannot open  E:\TM\Final\CADEC\original\VOLTAREN-XR.4.ann
...
VOLTAREN 

Cannot open  E:\TM\Final\CADEC\original\VOLTAREN-XR.11.ann
Cannot open  E:\TM\Final\CADEC\original\VOLTAREN-XR.13.ann
Cannot open  E:\TM\Final\CADEC\original\VOLTAREN-XR.4.ann
...
VOLTAREN-XR 

ZIPSOR 

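If I try to open one of those specific files, it raises "No columns to parse from file", which I don't really understand. How can I tell whether the data file is corrupted or whether I should handle it some other way? By the way, since this is a benchmark dataset, I find it strange that the files would be badly constructed.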

pd.read_csv("E:\TM\Final\CADEC\original\LIPITOR.197.ann",header=None)


---------------------------------------------------------------------------
EmptyDataError                            Traceback (most recent call last)
<ipython-input-4-8f6a7735c992> in <module>
      1 # check one of the unopenable files
----> 2 pd.read_csv("E:\TM\Final\CADEC\original\LIPITOR.197.ann",header=None)

d:\python\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer,sep,delimiter,header,names,index_col,usecols,squeeze,prefix,mangle_dupe_cols,dtype,engine,converters,true_values,false_values,skipinitialspace,skiprows,skipfooter,nrows,na_values,keep_default_na,na_filter,verbose,skip_blank_lines,parse_dates,infer_datetime_format,keep_date_col,date_parser,dayfirst,cache_dates,iterator,chunksize,compression,thousands,decimal,lineterminator,quotechar,quoting,doublequote,escapechar,comment,encoding,dialect,error_bad_lines,warn_bad_lines,delim_whitespace,low_memory,memory_map,float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer,kwds)
    677 
    678     parser_f.__name__ = name

d:\python\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer,kwds)
    446 
    447     # Create the parser.
--> 448     parser = TextFileReader(fp_or_buf,**kwds)
    449 
    450     if chunksize or iterator:

d:\python\lib\site-packages\pandas\io\parsers.py in __init__(self,f,**kwds)
    878             self.options["has_index_names"] = kwds["has_index_names"]
    879 
--> 880         self._make_engine(self.engine)
    881 
    882     def close(self):

d:\python\lib\site-packages\pandas\io\parsers.py in _make_engine(self,engine)
   1112     def _make_engine(self,engine="c"):
   1113         if engine == "c":
-> 1114             self._engine = CParserWrapper(self.f,**self.options)
   1115         else:
   1116             if engine == "python":

d:\python\lib\site-packages\pandas\io\parsers.py in __init__(self,src,**kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src,**kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

EmptyDataError: No columns to parse from file

Solution
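
The traceback ends in `EmptyDataError: No columns to parse from file`, which pandas raises when the file contains nothing it can parse. A likely explanation, not confirmed by the question, is that the failing .ann files are simply empty (for example, posts that have no annotations) rather than corrupted. Below is a minimal sketch under that assumption: it checks the file size first and falls back to an empty DataFrame. The helper name `read_ann` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def read_ann(path):
    """Read a tab-separated .ann file; return an empty DataFrame if it has no content."""
    path = Path(path)
    # A zero-byte file has nothing to parse, so skip read_csv entirely.
    if path.stat().st_size == 0:
        return pd.DataFrame()
    try:
        return pd.read_csv(path, sep="\t", header=None)
    except pd.errors.EmptyDataError:
        # Assumption: a file with no parseable content is treated the same as an empty one.
        return pd.DataFrame()
```

Calling `read_ann(file).empty` inside the loop then distinguishes files without annotations from files that fail for some other reason. Catching only `pd.errors.EmptyDataError`, instead of a bare `except:`, lets any other error surface with its real message.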

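A separate detail in the output above: the VOLTAREN-XR files are reported under both VOLTAREN and VOLTAREN-XR, because the pattern `drug+'*.ann'` matches every file name that merely starts with the drug name. Assuming the files are named `<drug>.<number>.ann`, as the listing suggests, putting a literal dot after the drug name avoids the overlap. The helper name `ann_files` is hypothetical.

```python
from pathlib import Path


def ann_files(directory, drug):
    """List the .ann files named '<drug>.<something>.ann' in `directory`."""
    # The literal dot after the drug name keeps 'VOLTAREN' from also
    # matching files that start with 'VOLTAREN-XR'.
    return sorted(Path(directory).glob(f"{drug}.*.ann"))
```

With this pattern, each file is visited once, under the drug it actually belongs to.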