Python CSV: find the latest record matching a condition

Problem description

I have a CSV with the following sample data:

id bb_id cc_id datetime
-------------------------
1  11    44    2019-06-09
2  33    55    2020-06-09
3  22    66    2020-06-09
4  11    44    2019-06-09
5  11    44    2020-02-22

Say the condition is bb_id == 11 and cc_id == 44, and I want the latest matching record, i.e.:

11    44    2020-02-22

How can I get this from the CSV?

What I've done:

 import csv

 with open('sample.csv') as csv_file:
     for indx, data in enumerate(csv.DictReader(csv_file)):
         # check if the conditional data is in the file?
         # (DictReader yields strings, so compare against '11' and '44')
         if data['bb_id'] == '11' and data['cc_id'] == '44':
             # sort the data by date? Or should I store all the relevant rows beforehand in a data structure such as a list and then sort it? Could I avoid that? I need to do this interactively, multiple times.
             pass

Solution

Put all the selected records in a list, then use the max() function with the date as the key.

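import csv

selected_rows = []
with open('sample.csv', newline='') as csv_file:
    for data in csv.DictReader(csv_file):
        # DictReader yields every field as a string, so compare against strings
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            selected_rows.append(data)
# ISO dates (YYYY-MM-DD) order correctly as plain strings;
# max() raises ValueError if no row matched
latest = max(selected_rows, key=lambda x: x['datetime'])
print(latest)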
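
The question also asks whether storing every matching row can be avoided. A minimal sketch of a single-pass variant that keeps only the best row seen so far is shown here. The `latest_match` helper and the comma-separated `SAMPLE` string are assumptions made for illustration (the real file layout is not shown in the question), and the string comparison relies on the dates being in ISO `YYYY-MM-DD` form.

```python
import csv
import io

# Assumed comma-separated version of the sample data from the question.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

def latest_match(csv_file, bb_id, cc_id):
    """Return the matching row with the greatest 'datetime', or None."""
    latest = None
    for row in csv.DictReader(csv_file):
        # DictReader yields strings, so bb_id and cc_id are passed as strings.
        if row['bb_id'] == bb_id and row['cc_id'] == cc_id:
            # ISO dates (YYYY-MM-DD) order correctly as plain strings.
            if latest is None or row['datetime'] > latest['datetime']:
                latest = row
    return latest

result = latest_match(io.StringIO(SAMPLE), '11', '44')
print(result['bb_id'], result['cc_id'], result['datetime'])
```

With a real file, the same function can be called as `latest_match(open('sample.csv', newline=''), '11', '44')`. It returns `None` rather than raising when nothing matches.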

If you really want to do this in plain Python, something like the following is simple enough:

import csv

with open('sample.csv') as csv_file:
    list_of_dates = []
    for data in csv.DictReader(csv_file):
        # DictReader yields strings, so compare against strings
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            list_of_dates.append(data['datetime'])

# list.sort() sorts in place and returns None, so do not assign its result
list_of_dates.sort()
print(list_of_dates[-1])  # you already know the values for bb and cc

Also try:

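import csv

def sort_func(e):
    # sort key: the row's date string (ISO dates order correctly as strings)
    return e['datetime']

with open('sample.csv') as csv_file:
    matching_rows = []
    for data in csv.DictReader(csv_file):
        # DictReader yields strings, so compare against strings
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            matching_rows.append(data)

# list.sort() sorts in place and returns None, so do not assign its result
matching_rows.sort(key=sort_func)
print(matching_rows[-1])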
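
Since the question mentions running this lookup interactively many times, one option is to read the file once and keep the latest row per (bb_id, cc_id) pair in a dictionary. The following is only a sketch: `build_latest_index` and the comma-separated `SAMPLE` data are hypothetical names and inputs introduced for illustration, and the dates are assumed to be ISO `YYYY-MM-DD` strings.

```python
import csv
import io

# Assumed comma-separated version of the sample data from the question.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

def build_latest_index(csv_file):
    """Map each (bb_id, cc_id) pair to its row with the greatest 'datetime'."""
    index = {}
    for row in csv.DictReader(csv_file):
        key = (row['bb_id'], row['cc_id'])  # keys are strings, as read from the CSV
        current = index.get(key)
        if current is None or row['datetime'] > current['datetime']:
            index[key] = row
    return index

latest_by_pair = build_latest_index(io.StringIO(SAMPLE))

# Each later query is a dictionary lookup instead of another pass over the file.
print(latest_by_pair[('11', '44')]['datetime'])
print(latest_by_pair.get(('99', '99')))  # None when the pair never occurs
```

The trade-off is memory proportional to the number of distinct pairs, and the index has to be rebuilt if the file changes.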

The simplest way I know:

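import pandas as pd
import pandasql as ps

# replace <filepath> with the path to your CSV file
sample_df = pd.read_csv(<filepath>)

# pandas parses bb_id and cc_id as integers, so numeric comparison works here
ps.sqldf("""select *
            from sample_df
            where bb_id = 11
              and cc_id = 44
            order by datetime desc
            limit 1""", locals())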
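
If adding pandas and pandasql is not an option, roughly the same query can be run with the standard-library `sqlite3` module. This is a sketch under stated assumptions rather than an equivalent of the pandasql answer: the table name `sample`, the column types, and the comma-separated `SAMPLE` data are all made up for illustration.

```python
import csv
import io
import sqlite3

# Assumed comma-separated version of the sample data from the question.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE sample (id INTEGER, bb_id INTEGER, cc_id INTEGER, "datetime" TEXT)'
)
# DictReader rows are dicts, so they can feed the named placeholders directly;
# the INTEGER columns convert the numeric strings on insert.
conn.executemany(
    'INSERT INTO sample VALUES (:id, :bb_id, :cc_id, :datetime)',
    csv.DictReader(io.StringIO(SAMPLE)),
)

latest_row = conn.execute(
    'SELECT bb_id, cc_id, "datetime" FROM sample '
    'WHERE bb_id = ? AND cc_id = ? '
    'ORDER BY "datetime" DESC LIMIT 1',
    (11, 44),
).fetchone()
print(latest_row)
conn.close()
```

Keeping the connection open between queries would let the table be loaded once and queried repeatedly, which fits the interactive use described in the question.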