How to collapse multiple rows into one and create list-valued column elements in Python Pandas

Problem description

I have a DataFrame that looks like this:

               tags       categories                                            classification
          0    label      ['legislative','law,govt and politics','exe...        None
          0    document   ['legislative',govt and politics','exe...             NaN
          0    text       ['legislative','exe...                                NaN
          0    paper      ['legislative',govt and politics','exe...             NaN
          0    poster     ['legislative','exe...                                NaN

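I want to create a new DataFrame that collapses the one above into a single row, so that the elements of each column, such as tags and classification, are gathered into a single list-valued item, for example: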

                tags            categories          classification
       0   ['label',            ['legislative',     ['None','NaN',
            'document',         govt and            'NaN',
            'text',             politics','exe...   'NaN']
            'paper',poster']

How can I do this? Can I use stack or groupby to get this result? Thanks in advance.

*This is the result of df.to_dict()

           {'tags': {0: ' letter',1: ' head',2: ' water',3: ' art',4: ' indoors',5: ' flyer',6: ' poster',...},'categories': {0: "['legislative','executive branch','work','society','government']",1: "['unrest and war','religion and spirituality','buddhism']",2: '[]',3: '[]',4: "['unemployment','foreign policy','politics','armed forces']",5: '[]',6: "['sports','wrestling']",'classfication': {0: nan,1: nan,2: nan,3: nan,4: nan,5: nan,6: nan,...}}

Solution

I'm not sure I fully understood your question, but is this the kind of thing you want?

df:

    trial_num   subject samples
0   1           1       [-1.74,-0.78,-0.11]
1   2           1       [0.86,0.21,-0.01]
2   3           1       [2.04,0.6,-0.79]
3   1           2       [0.52,0.49,1.56]
4   2           2       [0.07,0.84,-1.1]
5   3           2       [0.43,-1.3,1.99]

Transformed df:

     trial_num            subject              samples
0   [1,2,3,1,2,3]  [1,1,1,2,2,2]  [[-1.74,-0.78,-0.11],[0.86,0.21,-0.0...

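import numpy as np
import pandas as pd
# Six rows: three trials for each of two subjects, each with three random samples
df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     'samples': [list(np.random.randn(3).round(2)) for i in range(6)]
    }
)
# Join each column into one comma-separated string, split it back into a list,
# then transpose into a single row. Note that every element ends up as a string.
df = df.astype(str).apply(','.join).apply(lambda x: x.split(',')).to_frame().T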
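
The string round trip above converts every value to text and flattens the nested samples lists. If the original types matter, a possible alternative is to collect each column directly into a Python list. The sketch below is not part of the original answer. It hard-codes the sample values from the table above instead of random numbers, and the names collapsed and per_subject are purely illustrative. It also shows the groupby variant the question asks about, assuming one row per subject is wanted.

```python
import pandas as pd

# Same shape as the example df above, with fixed values instead of random ones
df = pd.DataFrame({
    "trial_num": [1, 2, 3, 1, 2, 3],
    "subject": [1, 1, 1, 2, 2, 2],
    "samples": [
        [-1.74, -0.78, -0.11], [0.86, 0.21, -0.01], [2.04, 0.6, -0.79],
        [0.52, 0.49, 1.56], [0.07, 0.84, -1.1], [0.43, -1.3, 1.99],
    ],
})

# Collapse the whole frame into one row: each cell holds the full column as a list,
# and the elements keep their original types (ints, nested lists of floats)
collapsed = pd.DataFrame({col: [df[col].tolist()] for col in df.columns})

# Collapse per group instead: one row per subject, each cell a list of that group's values
per_subject = df.groupby("subject").agg(list)

print(collapsed)
print(per_subject)
```

Wrapping each column's list in another one-element list is what makes pandas treat it as a single cell rather than as six separate rows.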
