Problem description
So, I have a dummy df like this, which I save to csv:
import io
import pandas as pd

# new_data and old_data are CSV strings (defined elsewhere)
raw = pd.read_csv(io.StringIO(new_data), encoding='UTF-8')
stream = pd.DataFrame(raw, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
stream['timestamp'] = pd.to_datetime(stream['timestamp'], unit='ms')
stream['date'] = stream['timestamp'].dt.date
stream['time'] = stream['timestamp'].dt.time
stream = stream[['date', 'time', 'volume']]

grouped = stream.groupby(stream.date)
for dif_date in stream.date.unique():
    df_new = grouped.get_group(dif_date)
    df_old = pd.read_csv(io.StringIO(old_data), encoding='UTF-8')
    # DataFrame.append is deprecated since pandas 1.4; pd.concat is the replacement
    df_stream = pd.concat([df_old, df_new]).reset_index(drop=True)
    df_stream = df_stream.drop_duplicates(subset=['time'])
    print(df_stream)
> date time open high low close volume
> 0 2021-05-06 04:08:00 9150090.0 9150090.0 9125001.0 9130000.0 9.015642
> 1 2021-05-06 04:09:00 9140000.0 9145000.0 9125012.0 9134068.0 3.121043
> 2 2021-05-06 04:10:00 9133882.0 9133882.0 9125002.0 9132999.0 5.536345
> 3 2021-05-06 04:11:00 9132999.0 9135013.0 9131000.0 9132999.0 5.880620
> 4 2021-05-06 04:08:00 9150090.0 9150090.0 9125001.0 9130000.0 9.015642
> 5 2021-05-06 04:09:00 9140000.0 9145000.0 9125012.0 9134068.0 3.121043
> 6 2021-05-06 04:10:00 9133882.0 9133882.0 9125002.0 9132999.0 5.536345
> 7 2021-05-06 04:11:00 9132999.0 9135013.0 9131000.0 9132999.0 5.880620
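One plausible explanation for the surviving duplicates (an assumption, since `old_data` and `new_data` themselves are not shown): `df_old` is read straight from a CSV, so its `time` column holds plain strings such as `'04:08:00'`, while `df_new`'s `time` column holds `datetime.time` objects produced by `.dt.time`. `drop_duplicates` compares values for equality, and a string never equals a `datetime.time`, so rows that print identically are not flagged as duplicates. A minimal reproduction:

```python
import datetime
import pandas as pd

# Same time-of-day twice: once as a string (as read from a CSV),
# once as a datetime.time (as produced by .dt.time).
df = pd.DataFrame({'time': ['04:08:00', datetime.time(4, 8, 0)],
                   'volume': [9.015642, 9.015642]})

# Both rows render as "04:08:00", but the values compare unequal,
# so drop_duplicates keeps both of them.
deduped = df.drop_duplicates(subset=['time'])
print(len(deduped))  # 2 — nothing was dropped
```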
I tried checking whether there is duplicate data between df_old and df_new, and dropping it if there is:
{{1}}
But the result still returns the duplicated rows. How can I fix this, or re-sort the data? https://colab.research.google.com/drive/1vMx9hXKcbz8SDawTnHbzpV6JiRZsEuVP?usp=sharing Thanks in advance.
Solution
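One way to make the deduplication reliable (a sketch under the dtype-mismatch assumption above, not a confirmed answer; the CSV strings below are invented stand-ins for the question's `old_data` and `new_data`): normalize the `time` column on both sides to the same type before concatenating, and drop duplicates on both keys, `['date', 'time']`, so rows from different days with the same time are not lost:

```python
import io
import pandas as pd

# Invented stand-ins for old_data / new_data from the question.
old_data = (
    "date,time,volume\n"
    "2021-05-06,04:08:00,9.015642\n"
    "2021-05-06,04:09:00,3.121043\n"
)
new_data = (
    "date,time,volume\n"
    "2021-05-06,04:09:00,3.121043\n"  # duplicate of an old row
    "2021-05-06,04:10:00,5.536345\n"
)

df_old = pd.read_csv(io.StringIO(old_data))
df_new = pd.read_csv(io.StringIO(new_data))

# Normalize both sides to plain strings so equal times actually compare
# equal, even if one side held datetime.time objects.
for df in (df_old, df_new):
    df['time'] = df['time'].astype(str)

# concat (append is deprecated), dedupe on both keys, then restore order.
df_stream = (pd.concat([df_old, df_new], ignore_index=True)
               .drop_duplicates(subset=['date', 'time'])
               .sort_values(['date', 'time'])
               .reset_index(drop=True))
print(df_stream)
```

Sorting on `['date', 'time']` after the merge also answers the "re-sort" part of the question: new rows are interleaved chronologically instead of being appended at the bottom.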