DataFrame append and drop_duplicates problem

Problem description

I have a dummy df like the one below, which I save to CSV:

import io
import pandas as pd

# new_data and old_data are CSV text (strings); their contents are not shown here
raw = pd.read_csv(io.StringIO(new_data), encoding='UTF-8')

stream = pd.DataFrame(raw, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
stream['timestamp'] = pd.to_datetime(stream['timestamp'], unit='ms')
stream['date'] = stream['timestamp'].dt.date  # datetime.date objects
stream['time'] = stream['timestamp'].dt.time  # datetime.time objects
stream = stream[['date', 'time', 'volume']]

for dif_date in stream.date.unique():
    grouped = stream.groupby(stream.date)
    df_new = grouped.get_group(dif_date)
    df_old = pd.read_csv(io.StringIO(old_data), encoding='UTF-8')

# DataFrame.append was removed in pandas 2.0; pd.concat([df_old, df_new]) is the equivalent call
df_stream = df_old.append(df_new).reset_index(drop=True)
df_stream = df_stream.drop_duplicates(subset=['time'])
print(df_stream)

>    date        time      open       high       low        close      volume
> 0  2021-05-06  04:08:00  9150090.0  9150090.0  9125001.0  9130000.0  9.015642
> 1  2021-05-06  04:09:00  9140000.0  9145000.0  9125012.0  9134068.0  3.121043
> 2  2021-05-06  04:10:00  9133882.0  9133882.0  9125002.0  9132999.0  5.536345
> 3  2021-05-06  04:11:00  9132999.0  9135013.0  9131000.0  9132999.0  5.880620
> 4  2021-05-06  04:08:00  9150090.0  9150090.0  9125001.0  9130000.0  9.015642
> 5  2021-05-06  04:09:00  9140000.0  9145000.0  9125012.0  9134068.0  3.121043
> 6  2021-05-06  04:10:00  9133882.0  9133882.0  9125002.0  9132999.0  5.536345
> 7  2021-05-06  04:11:00  9132999.0  9135013.0  9131000.0  9132999.0  5.880620

I tried to check whether df_old and df_new contain duplicate rows and, if they do, to drop them:

{{1}}

But the result still contains the duplicate rows. How can I fix this, or reorder the data? https://colab.research.google.com/drive/1vMx9hXKcbz8SDawTnHbzpV6JiRZsEuVP?usp=sharing Thanks in advance.

Solution

No working solution has been posted for this problem yet.
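
One likely cause, offered as a guess rather than a confirmed diagnosis: the 'time' values in df_new are datetime.time objects (they come from .dt.time), while the 'time' values in df_old are plain strings read back from CSV. A string such as "04:08:00" never compares equal to datetime.time(4, 8), so drop_duplicates treats the rows as different. The sketch below reproduces that with made-up data (old_csv and new_csv are hypothetical stand-ins for old_data and new_data, and normalise is an illustrative helper name) and then converts both columns to strings before deduplicating. It uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the question's old_data / new_data.
old_csv = (
    "date,time,volume\n"
    "2021-05-06,04:08:00,9.015642\n"
    "2021-05-06,04:09:00,3.121043\n"
)
new_csv = (
    "timestamp,volume\n"
    "1620274080000,9.015642\n"   # 2021-05-06 04:08:00 UTC
    "1620274140000,3.121043\n"   # 2021-05-06 04:09:00 UTC
    "1620274200000,5.536345\n"   # 2021-05-06 04:10:00 UTC
)

# Old data read back from CSV: 'date' and 'time' are strings.
df_old = pd.read_csv(io.StringIO(old_csv))

# New data built as in the question: 'date' and 'time' are Python objects.
df_new = pd.read_csv(io.StringIO(new_csv))
ts = pd.to_datetime(df_new["timestamp"], unit="ms")
df_new["date"] = ts.dt.date
df_new["time"] = ts.dt.time
df_new = df_new[["date", "time", "volume"]]

# Naive merge: the string and datetime.time values are never equal,
# so no row is recognised as a duplicate.
naive = pd.concat([df_old, df_new], ignore_index=True)
naive = naive.drop_duplicates(subset=["time"])


def normalise(df):
    """Return a copy whose 'date' and 'time' columns are plain strings."""
    out = df.copy()
    out["date"] = out["date"].astype(str)
    out["time"] = out["time"].astype(str)
    return out


# Merge after normalising both frames to the same representation.
fixed = pd.concat([normalise(df_old), normalise(df_new)], ignore_index=True)
fixed = fixed.drop_duplicates(subset=["date", "time"]).reset_index(drop=True)

print(len(naive), len(fixed))
print(fixed)
```

Deduplicating on both 'date' and 'time' is a deliberate choice here: with subset=['time'] alone, rows that share a clock time on different dates would also be dropped.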

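
A separate point about the loop in the question: df_new is overwritten on every iteration and the merge happens after the loop, so only the last date's group is ever combined with df_old. Assuming the intent is to merge each date's rows with the stored rows for that date, a minimal sketch could do the merge inside the loop and keep one result per date. The sample data and the names results and old_for_day are hypothetical.

```python
import io
import pandas as pd

# Hypothetical new data spanning two dates (milliseconds since the epoch, UTC).
new_csv = (
    "timestamp,volume\n"
    "1620274080000,9.015642\n"   # 2021-05-06 04:08:00
    "1620274140000,3.121043\n"   # 2021-05-06 04:09:00
    "1620360480000,1.5\n"        # 2021-05-07 04:08:00
)
# Hypothetical previously stored data.
old_csv = (
    "date,time,volume\n"
    "2021-05-06,04:08:00,9.015642\n"
)

stream = pd.read_csv(io.StringIO(new_csv))
ts = pd.to_datetime(stream["timestamp"], unit="ms")
# Store date and time as strings so they compare equal to values read from CSV.
stream["date"] = ts.dt.date.astype(str)
stream["time"] = ts.dt.time.astype(str)
stream = stream[["date", "time", "volume"]]

df_old = pd.read_csv(io.StringIO(old_csv), dtype={"date": str, "time": str})

results = {}
# Iterating over a groupby yields (key, sub-frame) pairs, one per date.
for day, df_new in stream.groupby("date"):
    old_for_day = df_old[df_old["date"] == day]
    merged = pd.concat([old_for_day, df_new], ignore_index=True)
    merged = merged.drop_duplicates(subset=["date", "time"]).reset_index(drop=True)
    results[day] = merged

for day, merged in results.items():
    print(day, len(merged))
```

Each entry in results could then be written to its own CSV file, if that is how the data is stored.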