Problem description
Hello, I want to sum up numbers in a document that is read from a CSV file.
For example, my CSV looks like this
Date,Customer number,Customer,Project number,Project,Worked time
2020,2020010,Apple,12345,Buying laptops,1,00
2020,4,3,Nokia,98738,Buying phones,00
I would like to output this to a CSV file and have the script sum the worked time per customer, like this
Apple, 11
Nokia, 5
So far I only have this
import csv

results = []
with open('Time_export.csv') as File:
    reader = csv.DictReader(File)
    for row in reader:
        results.append(row)
print(results)
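I'm a newbie at this :) I've been trying to Google it but can't figure it out :( Any ideas?
Solution
Use a dictionary to store the customer names and totals: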
import csv
data = '''
Date,Customer number,Customer,Project number,Project,Worked time
2020,2020010,Apple,12345,Buying laptops,1,00
2020,4,3,Nokia,98738,Buying phones,00
'''.strip()
with open('Time_export.csv', 'w') as f:
    f.write(data)  # write test file
################################
cust = {}  # customer totals
with open('Time_export.csv') as File:
    reader = csv.DictReader(File)
    for row in reader:
        if row['Customer'] in cust:
            cust[row['Customer']] += int(row['Worked time'])
        else:
            cust[row['Customer']] = int(row['Worked time'])
print(cust)
Output
{'Apple': 11, 'Nokia': 5}
If you want to try pandas, the code gets shorter:
import pandas
df = pandas.read_csv('Time_export.csv',index_col=False )
df['Worked time'] = df['Worked time'].astype(int)
gb = df.groupby('Customer')["Worked time"].sum().reset_index()
print(gb.to_string(index=False))
Output
Customer  Worked time
   Apple           11
   Nokia            5
,
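I find collections.defaultdict useful for this kind of thing. It automatically creates new key/value pairs as needed. In this case, the default factory int creates a 0 whenever a key is missing.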
import csv
import collections

with open('Time_export.csv') as File:
    results = collections.defaultdict(int)
    reader = csv.DictReader(File)
    for row in reader:
        results[row['Customer']] += int(row['Worked time'])

for name, num in sorted(results.items()):
    print(f"{name}: {num}")
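In case it helps, here is a small sketch of what defaultdict(int) does when it sees a key for the first time. The rows list is made-up sample data standing in for the CSV, not the real export.

```python
import collections

# Hypothetical (customer, hours) pairs standing in for rows of the CSV.
rows = [("Apple", 6), ("Nokia", 5), ("Apple", 5)]

totals = collections.defaultdict(int)
for customer, hours in rows:
    # A missing key is first created with int() == 0, then += is applied.
    totals[customer] += hours

print(dict(totals))  # {'Apple': 11, 'Nokia': 5}
```

With a plain dict the first += for each customer would raise KeyError, which is why the dictionary version needs the if/else.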
,
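pandas is a powerful library for working with tables. It is hard to learn, but worth the effort. Your data uses a comma inside the "Worked time" column, which makes it invalid CSV. If you change that comma to "." or escape the field properly, you can do the job in a few lines of code.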
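This groups by customer, drops everything except the "Worked time" column, and then sums each group. The result is a Series object, which behaves a lot like a dictionary: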
import pandas as pd
df = pd.read_csv('Time_export.csv')
sums = df.groupby("Customer")["Worked time"].sum()
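On the "escape it properly" part, here is a sketch using only the standard csv module. It assumes the export can be made to wrap the decimal-comma value in double quotes. The sample text is invented for illustration.

```python
import csv
import io

# Made-up sample in which the decimal-comma value is quoted, so the
# comma inside it is not treated as a field separator.
sample = 'Customer,Worked time\nApple,"1,00"\nNokia,"5,00"\n'

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["Customer"], row["Worked time"])
```

Each row then has exactly one "Worked time" value (still a string such as '1,00'), instead of the value being split across two columns.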
,
I tried
import csv
import collections

with open('Time.csv') as File:
    results = collections.defaultdict(int)
    reader = csv.DictReader(File)
    for row in reader:
        results[row['Customer']] += int(row['Worked time'])
for name, num in sorted(results.items()):
    print(f"{name}: {num}")
But I got this result
Traceback (most recent call last):
  File "/Users/stoffe/Desktop/Python/Time.py", line 8, in <module>
    results[row['Customer']] += int(row['Worked time'])
ValueError: invalid literal for int() with base 10: '1,00'
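The error comes from int() being handed the string '1,00'. A minimal sketch of one way around it, assuming the comma is a decimal separator (so '1,00' means 1.0) and the values contain no thousands separators. parse_hours is a name made up for this example.

```python
def parse_hours(text):
    """Convert a decimal-comma string such as '1,00' to a float.

    Assumes the comma is the decimal separator and that the value
    contains no thousands separators.
    """
    return float(text.strip().replace(",", "."))


print(parse_hours("1,00"))  # 1.0
print(parse_hours("2,5"))   # 2.5
```

In the loop, int(row['Worked time']) could then become parse_hours(row['Worked time']), with the defaultdict created as defaultdict(float) if fractional hours can occur.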
,
I have most of it working now, but I'm still having some trouble
import csv
import collections

f = open('./Time_export.csv', 'r')
a = [',00']
lst = []
for line in f:
    for word in a:
        if word in line:
            line = line.replace(word, '')
    lst.append(line)
f.close()

f = open('./Time_export.csv', 'w')
for line in lst:
    f.write(line)
f.close()

with open('Time_export.csv') as File:
    results = collections.defaultdict(int)
    reader = csv.DictReader(File)
    for row in reader:
        print(row['Project'], row['Service'], row['Worked time'])

f = open('./Time.csv', 'w')
for name, num in sorted(results.items()):
    f.write(f"{name}: {num}")
f.close()
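I open the file to strip the ,00 after the hours, but for some reason I get one line per entry instead of the numbers being added up per project. The results show up in the terminal window, but the Time.csv file is still empty.
It looks like this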
Apple Cleaning 6
Volvo Installing 4
AFRY Window Cleaning 5
Apple Cleaning 1
Apple Building 1
AFRY Window Cleaning 2
Donald Duck Writing 12
Donald Duck Reading 2
Any ideas?
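Two things stand out in the script above. The loop only prints each row and never adds anything to results, so results is still empty when Time.csv is written. Also, f.write is called without a newline. Here is a sketch of one possible fix. It assumes the columns are named Project, Service and Worked time, as in the print call, and that Worked time holds whole numbers once the ,00 is stripped. The sample rows are copied from the terminal output, and how each line splits into Project and Service is a guess. summarize is a hypothetical helper name.

```python
import collections
import csv
import io


def summarize(in_file, out_file):
    """Sum 'Worked time' per (Project, Service) pair and write the totals as CSV."""
    totals = collections.defaultdict(int)
    for row in csv.DictReader(in_file):
        # Accumulate into the dict instead of only printing the row.
        totals[(row["Project"], row["Service"])] += int(row["Worked time"])

    writer = csv.writer(out_file)
    writer.writerow(["Project", "Service", "Worked time"])
    for (project, service), hours in sorted(totals.items()):
        writer.writerow([project, service, hours])
    return totals


# Sample rows taken from the terminal output; the column split is assumed.
sample = io.StringIO(
    "Project,Service,Worked time\n"
    "Apple,Cleaning,6\n"
    "Volvo,Installing,4\n"
    "AFRY,Window Cleaning,5\n"
    "Apple,Cleaning,1\n"
    "Apple,Building,1\n"
    "AFRY,Window Cleaning,2\n"
    "Donald Duck,Writing,12\n"
    "Donald Duck,Reading,2\n"
)
out = io.StringIO()
totals = summarize(sample, out)
print(out.getvalue())
```

With real files, you would pass open('Time_export.csv', newline='') and open('Time.csv', 'w', newline='') in place of the two StringIO objects.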