Python CSV: find the latest record matching a condition

Question

I have a CSV file with the following sample data:

id bb_id cc_id datetime
-------------------------
1  11    44    2019-06-09
2  33    55    2020-06-09
3  22    66    2020-06-09
4  11    44    2019-06-09
5  11    44    2020-02-22

Suppose the condition is bb_id == 11 and cc_id == 44, and I want the latest record that matches it, i.e.:

11    44    2020-02-22

How can I get this from the CSV?

What I tried:

 import csv

 with open('sample.csv', newline='') as csv_file:
     for indx, data in enumerate(csv.DictReader(csv_file)):
         # check if the conditional data is in the file?
         if data['bb_id'] == '11' and data['cc_id'] == '44':
             # sort the data by date? Or should I first store all the relevant rows in a structure such as a list and then sort it? Could I avoid that? I need to do this interactively, multiple times.
             pass

Answers

Put all the selected records in a list, then use the max() function with the date as the key.

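import csv

selected_rows = []
with open('sample.csv', newline='') as csv_file:
    for data in csv.DictReader(csv_file):
        # csv.DictReader yields every field as a string, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            selected_rows.append(data)

# ISO dates (YYYY-MM-DD) order correctly as strings; default=None covers the no-match case
latest = max(selected_rows, key=lambda x: x['datetime'], default=None)
print(latest)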
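
The question also asks whether the intermediate list can be avoided. One way, shown here only as a sketch, is to pass a generator straight to max() so that rows are filtered and compared in a single pass. The function name latest_matching and the SAMPLE string are made up for illustration, and the data is assumed to be comma-separated with a header row.

```python
import csv
import io

# Sample data from the question, assumed to be comma-separated with a header row.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

def latest_matching(csv_file, bb_id, cc_id):
    """Return the matching row with the greatest 'datetime', or None if nothing matches."""
    reader = csv.DictReader(csv_file)
    # csv yields strings, so compare against str(bb_id) and str(cc_id).
    matches = (
        row for row in reader
        if row["bb_id"] == str(bb_id) and row["cc_id"] == str(cc_id)
    )
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(matches, key=lambda row: row["datetime"], default=None)

latest = latest_matching(io.StringIO(SAMPLE), 11, 44)
print(latest["bb_id"], latest["cc_id"], latest["datetime"])
```

With a real file, the same function can be called as latest_matching(open('sample.csv', newline=''), 11, 44), ideally inside a with block.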

If you really want to do this in plain Python, something like the following is simple enough:

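import csv

with open('sample.csv', newline='') as csv_file:
    list_of_dates = []
    for data in csv.DictReader(csv_file):
        # fields are read as strings, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            list_of_dates.append(data['datetime'])

# list.sort() sorts in place and returns None, so use sorted() to get a new list
sorted_dates = sorted(list_of_dates)
print(sorted_dates[-1])  # you already know the values for bb and cc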
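
Comparing the datetime column as text works here only because the sample dates are in YYYY-MM-DD form. If a file used another layout, the values would need to be parsed before comparing. The sketch below uses made-up day/month/year strings (not taken from the question) to show the difference. The helper name parse_date is an assumption.

```python
from datetime import datetime

# Hypothetical values in day/month/year layout, which does not sort correctly as text.
dates = ["09/06/2019", "22/02/2020", "09/06/2020"]

def parse_date(text, fmt="%d/%m/%Y"):
    """Convert a date string to a datetime so comparisons are chronological."""
    return datetime.strptime(text, fmt)

latest_by_text = max(dates)                  # plain string comparison
latest_by_date = max(dates, key=parse_date)  # chronological comparison

print(latest_by_text)
print(latest_by_date)
```

The two results differ: the string comparison picks "22/02/2020", while the parsed comparison picks "09/06/2020".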

Alternatively, to keep the whole row rather than just the date:

import csv

def sort_func(e):
    return e['datetime']

with open('sample.csv', newline='') as csv_file:
    matching_rows = []
    for data in csv.DictReader(csv_file):
        # fields are read as strings, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            matching_rows.append(data)

# sort in place by date; the last element is then the latest row
matching_rows.sort(key=sort_func)
print(matching_rows[-1])
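
Since the question mentions running the lookup interactively many times, it may be worth reading the file once and keeping only the latest row per (bb_id, cc_id) pair in a dict. This is a rough sketch, not part of the original answers. The name build_latest_index and the SAMPLE string are assumptions, and the data is assumed to be comma-separated with ISO dates.

```python
import csv
import io

# Sample data from the question, assumed to be comma-separated with a header row.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

def build_latest_index(csv_file):
    """Map (bb_id, cc_id) to the row with the greatest 'datetime' for that pair."""
    index = {}
    for row in csv.DictReader(csv_file):
        key = (row["bb_id"], row["cc_id"])
        current = index.get(key)
        # Keep the row only if it is newer than what is already stored for this pair.
        if current is None or row["datetime"] > current["datetime"]:
            index[key] = row
    return index

index = build_latest_index(io.StringIO(SAMPLE))
print(index[("11", "44")]["datetime"])
print(index.get(("99", "99")))  # None when no row has that pair
```

Each later query is then a dictionary lookup instead of another pass over the file. The trade-off is that the index has to be rebuilt if the file changes.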

The simplest way I know:

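import pandas as pd
import pandasql as ps

sample_df = pd.read_csv('sample.csv')  # the file from the question

# filter on both ids, order newest first, and keep only the top row
query = """
select *
from sample_df
where bb_id = 11
  and cc_id = 44
order by datetime desc
limit 1
"""
latest = ps.sqldf(query, locals())
print(latest)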
