flat_table 获取 ValueError:无法从重复的轴重新索引,我的问题与此错误不同

问题描述

我有如下数据框

  behavIoUr_attributes 
  0 {'className': 'behavIoUr','type': 'behavIoUr','verb': 'can_perform_stw_everything','bs': [{'bid': ObjectId('6050da979198a053c3a02484'),'n': 'Can Perform Spin Wheel Everything','ao': datetime.datetime(2021,4,6,266000),'bs': 'CountLimitException','tids': [ObjectId('605073cb9198a053c39d7a4d')],'tags': [{'tid': ObjectId('605073cb9198a053c39d7a4d'),'prsn': True}],'prz': {'ch': False,'pts': [{'pid': ObjectId('6050d99e9198a053c3a01bee'),'pts': 0,'eo': datetime.datetime(2021,8,18,0)}]}}]} 
  1 {'className': 'behavIoUr','verb': 'game_escape_run','md': [{'n': 'total_score','v': '32'},{'n': 'game_id','v': '3'}],'bs': [{'bid': ObjectId('6050dba29198a053c3a02e4d'),'n': 'Game Escape Run',5,1,230000),'bs': 'OK','tids': [ObjectId('605073769198a053c39d77f1'),ObjectId('605071569198a053c39d6ab9')],'tags': [{'tid': ObjectId('605071569198a053c39d6ab9'),'prsn': True},{'tid': ObjectId('605073769198a053c39d77f1'),'pts': [{'pid': ObjectId('6050d9689198a053c3a019f8'),'pts': 1,0)}],'at': {'tids': [ObjectId('605073769198a053c39d77f1'),'prsn': True}]}}}]}

import flat_table
if 'behavIoUr_attributes' in getDataByDate_df.columns:
    df = pd.DataFrame(getDataByDate_df['behavIoUr_attributes'])
    getDataByDate_dfA = flat_table.normalize(df)
    getDataByDate_df = pd.concat([getDataByDate_df,getDataByDate_dfA],axis=1)
    getDataByDate_df.drop('index',axis=1,inplace=True)
    getDataByDate_df.drop('behavIoUr_attributes',inplace=True)
    del getDataByDate_dfA
    del df

我尝试删除索引然后使用 flat_table ,但在 getDataByDate_dfA = flat_table.normalize(df) 行错误仍然相同

解决方法

在用文字字符串替换 flat_table.normalize()ObjectId() 对象后,我在 datetime.datetime() 上没有遇到错误。不确定这是库错误还是功能。

数据

我假设您的数据以 dict 类型存储,因此我尝试通过 dict 将您粘贴的数据恢复为 ast.literal_eval()。由于此方法对对象有问题,因此需要将它们引用出来。

import pandas as pd
import io
import ast
import re
import flat_table

df = pd.read_csv(io.StringIO("""
   behaviour_attributes 
0  {'className': 'behaviour','type': 'behaviour','verb': 'can_perform_stw_everything','bs': [{'bid': ObjectId('6050da979198a053c3a02484'),'n': 'Can Perform Spin Wheel Everything','ao': datetime.datetime(2021,4,6,266000),'bs': 'CountLimitException','tids': [ObjectId('605073cb9198a053c39d7a4d')],'tags': [{'tid': ObjectId('605073cb9198a053c39d7a4d'),'prsn': True}],'prz': {'ch': False,'pts': [{'pid': ObjectId('6050d99e9198a053c3a01bee'),'pts': 0,'eo': datetime.datetime(2021,8,18,0)}]}}]} 
1  {'className': 'behaviour','verb': 'game_escape_run','md': [{'n': 'total_score','v': '32'},{'n': 'game_id','v': '3'}],'bs': [{'bid': ObjectId('6050dba29198a053c3a02e4d'),'n': 'Game Escape Run',5,1,230000),'bs': 'OK','tids': [ObjectId('605073769198a053c39d77f1'),ObjectId('605071569198a053c39d6ab9')],'tags': [{'tid': ObjectId('605071569198a053c39d6ab9'),'prsn': True},{'tid': ObjectId('605073769198a053c39d77f1'),'pts': [{'pid': ObjectId('6050d9689198a053c3a019f8'),'pts': 1,0)}],'at': {'tids': [ObjectId('605073769198a053c39d77f1'),'prsn': True}]}}}]}
"""),sep=r"\s{2,}",engine='python')

def restore_dict(s: str):
    """Restore dictionary by quoting out special objects."""
    s1 = re.sub(r"ObjectId\('([^)]*)'\)",r"'ObjectId(\1)'",s)
    s2 = re.sub(r"datetime\.datetime\(([^)]*)\)",r"'datetime.datetime(\1)'",s1)
    return ast.literal_eval(s2)
df["behaviour_attributes"] = df["behaviour_attributes"].apply(restore_dict)

结果

df2 = flat_table.normalize(df)

# remove long prefix in column names for printing
df2.columns = [s.replace("behaviour_attributes.","") for s in df2.columns]
print(df2)

    index md.v         md.n  ...                        verb       type  className
0       0  NaN          NaN  ...  can_perform_stw_everything  behaviour  behaviour
1       1   32  total_score  ...             game_escape_run  behaviour  behaviour
2       1   32  total_score  ...             game_escape_run  behaviour  behaviour
3       1   32  total_score  ...             game_escape_run  behaviour  behaviour
4       1   32  total_score  ...             game_escape_run  behaviour  behaviour
5       1   32  total_score  ...             game_escape_run  behaviour  behaviour
6       1   32  total_score  ...             game_escape_run  behaviour  behaviour
7       1   32  total_score  ...             game_escape_run  behaviour  behaviour
8       1   32  total_score  ...             game_escape_run  behaviour  behaviour
9       1    3      game_id  ...             game_escape_run  behaviour  behaviour
10      1    3      game_id  ...             game_escape_run  behaviour  behaviour
11      1    3      game_id  ...             game_escape_run  behaviour  behaviour
12      1    3      game_id  ...             game_escape_run  behaviour  behaviour
13      1    3      game_id  ...             game_escape_run  behaviour  behaviour
14      1    3      game_id  ...             game_escape_run  behaviour  behaviour
15      1    3      game_id  ...             game_escape_run  behaviour  behaviour
16      1    3      game_id  ...             game_escape_run  behaviour  behaviour
[17 rows x 20 columns]