How to use pandas.read_csv to read numbers with thousands separators inside double quotes

Problem description

I have a csv file (data.csv) like this:

A,B,C,D,E
1.50,2.70,"2,481","1,569",2.15
29.10,31.00,4,10,37.00
John,Jeff,Ruben,Catherine,James

I tried df=pd.read_csv("data.csv") and got df=

      A     B      C          D      E
0   1.5   2.7  2,481      1,569   2.15
1  29.1  31.0      4         10  37.00
2  John  Jeff  Ruben  Catherine  James

This looks fine, but in fact df.iloc[0,2]='2,481' and df.iloc[0,3]='1,569', i.e. they are strings rather than integers or floats.

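When I tried df=pd.read_csv("data.csv",quotechar='"'), everything was a string. I cannot simply convert everything to float or int, because there are other elements (not shown in the example) that are read in correctly as datetimes.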

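I also tried passing thousands=',' as an argument to read_csv, but it only works when the last row is not in my data file. The row of strings seems to make thousands=',' ineffective. In my actual file there are many rows of numbers and many rows of strings (names).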
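The behavior described here can be reproduced with a small self-contained sketch. It inlines the sample data with io.StringIO instead of reading data.csv from disk; the variable names are illustrative and not part of the original question.

```python
import io

import pandas as pd

# Same content as data.csv, inlined so the example is self-contained.
numbers_only = (
    'A,B,C,D,E\n'
    '1.50,2.70,"2,481","1,569",2.15\n'
    '29.10,31.00,4,10,37.00\n'
)
with_names = numbers_only + 'John,Jeff,Ruben,Catherine,James\n'

df_numbers = pd.read_csv(io.StringIO(numbers_only), thousands=',')
df_mixed = pd.read_csv(io.StringIO(with_names), thousands=',')

# Without the row of names, column C is parsed as numbers.
print(pd.api.types.is_numeric_dtype(df_numbers['C']))

# With the row of names, column C cannot be numeric as a whole,
# so the quoted value keeps its comma and stays a string.
print(pd.api.types.is_numeric_dtype(df_mixed['C']))
print(repr(df_mixed.loc[0, 'C']))
```

The thousands option only takes effect when a column is converted to a numeric type, which is why a single name in the column is enough to leave '2,481' untouched.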

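How can I handle thousands separators inside double quotes in a csv file where the numeric rows are mixed with rows of strings?

Solution

There is a thousands parameter. See the read_csv documentation: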

import pandas as pd

# thousands=',' makes the parser treat commas inside numeric fields as digit separators
df = pd.read_csv('data.csv', thousands=',')
print(df)
print()
print(df.dtypes)

      A     B     C     D      E
0   1.5   2.7  2481  1569   2.15
1  29.1  31.0     4    10  37.00

A    float64
B    float64
C      int64
D      int64
E    float64
dtype: object
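Note that the output above has only two rows, so it corresponds to the file without the final row of names. For a file that does mix names into the numeric columns, one possible approach (a sketch that is not part of the original answer) is to read every cell as text and convert number-like cells one by one. The helper parse_cell and the NUMBER pattern are hypothetical names introduced for illustration, and the pattern assumes plain decimal numbers with optional comma grouping.

```python
import io
import re

import pandas as pd

# Sample data from the question, inlined so the example is self-contained.
csv_text = (
    'A,B,C,D,E\n'
    '1.50,2.70,"2,481","1,569",2.15\n'
    '29.10,31.00,4,10,37.00\n'
    'John,Jeff,Ruben,Catherine,James\n'
)

# Assumed number format: digits, optionally grouped by commas,
# with an optional decimal part.
NUMBER = re.compile(r'[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?')


def parse_cell(value):
    """Hypothetical helper: float for number-like text, otherwise the input."""
    if isinstance(value, str) and NUMBER.fullmatch(value):
        return float(value.replace(',', ''))
    return value


# dtype=str keeps every cell as raw text, so nothing is converted implicitly.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Convert cell by cell; names are returned unchanged.
df = raw.apply(lambda column: column.map(parse_cell))
print(df)
```

With this sketch each column holds a mix of floats and strings, so numeric operations still have to be applied row by row. Because dtype=str turns off automatic type detection, date-like cells such as those mentioned in the question would stay as text and need a separate conversion step.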
