How to reshape data in Python

Problem description

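I have a DataFrame with only one row but many columns.

I want to move every 5 columns into a new row, so that each review ends up on its own row with the columns review, compound, neg, neu, and pos.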

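The original data is in a list, which I converted to a DataFrame. I don't know whether reshaping from the list would be easier, but here is a sample list to try; the real list is much longer.

['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106,', 'neg: 0.0,', 'neu: 0.708,', 'pos: 0.292,', 'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0,', 'neg: 0.0,', 'neu: 1.0,', 'pos: 0.0,']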

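Solution

It is easier to parse the list first and then convert the result to a DataFrame.

  • For each entry, split it at the ":" and add the key/value pair to a dictionary
  • Convert the dictionary to a DataFrame

Try this: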

import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106,', 'neg: 0.0,', 'neu: 0.708,', 'pos: 0.292,', 'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0,', 'neg: 0.0,', 'neu: 1.0,', 'pos: 0.0,']

dd = {}

for x in lst:
    # Split only at the first colon so colons inside a review are preserved
    key, value = x.split(':', 1)
    # Drop surrounding whitespace and the trailing comma
    value = value.strip(' ,')
    if key in dd:
        dd[key].append(value)
    else:
        dd[key] = [value]

print(dd)
print(pd.DataFrame(dd).to_string(index=False))

Output (the DataFrame printed by the last line)

                                                       review compound  neg    neu    pos
          I stayed around 11 days and enjoyed stay very much.   0.5106  0.0  0.708  0.292
 Plans for weekend stay canceled due to Coronavirus shutdown.      0.0  0.0    1.0    0.0
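
Note that every value in the resulting DataFrame is still a string. If the scores are needed as numbers, one possible follow-up is to convert the score columns with pd.to_numeric. The sketch below is only a minimal example: it rebuilds the two-row result by hand, and the name score_cols is illustrative rather than part of the original answer.

```python
import pandas as pd

# Same content as the DataFrame built above, with every value stored as a string
df = pd.DataFrame({
    'review': ['I stayed around 11 days and enjoyed stay very much.',
               'Plans for weekend stay canceled due to Coronavirus shutdown.'],
    'compound': ['0.5106', '0.0'],
    'neg': ['0.0', '0.0'],
    'neu': ['0.708', '1.0'],
    'pos': ['0.292', '0.0'],
})

# Illustrative name: the columns that hold numeric scores
score_cols = ['compound', 'neg', 'neu', 'pos']

# Convert each score column from string to a numeric dtype
df[score_cols] = df[score_cols].apply(pd.to_numeric)

print(df.dtypes)
```

The review column is left untouched, so only the four score columns change type.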
,


You can try using a dictionary:

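from collections import defaultdict
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106,', 'neg: 0.0,', 'neu: 0.708,', 'pos: 0.292,', 'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0,', 'neg: 0.0,', 'neu: 1.0,', 'pos: 0.0,']

# Missing keys start out as an empty list, so no existence check is needed
data_dict = defaultdict(list)
for item in lst:
    # Split only at the first colon so colons inside a review are preserved
    header, value = item.split(':', 1)
    # Drop surrounding whitespace and the trailing comma
    data_dict[header].append(value.strip(' ,'))

df = pd.DataFrame.from_dict(data_dict)
print(df)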

The result is a DataFrame with one row per review and the columns review, compound, neg, neu, and pos.

,

You can do this easily with numpy:

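import numpy as np
import pandas as pd

lis = np.array(['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106,', 'neg: 0.0,', 'neu: 0.708,', 'pos: 0.292,', 'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0,', 'neg: 0.0,', 'neu: 1.0,', 'pos: 0.0,'])

columns = 5  # number of fields per record
# Split each "key: value" entry at the first colon
t = np.char.split(lis, ":", maxsplit=1)
cols, vals = zip(*t)
# Drop surrounding whitespace and the trailing comma from each value
vals = np.char.strip(np.array(vals), " ,")
# Cut the flat array into chunks of `columns` values, one chunk per row
dff = pd.DataFrame(np.split(vals, len(vals) // columns), columns=list(cols[:columns]))
print(dff)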
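
If the data is already in a one-row DataFrame, as in the question, the values can also be reshaped directly instead of going back to the list. The following is a minimal sketch, assuming that the number of columns is an exact multiple of 5 and that every record lists its fields in the same order. The names wide, tall, and fields_per_row are illustrative and do not come from the original post.

```python
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.',
       'compound: 0.5106,', 'neg: 0.0,', 'neu: 0.708,', 'pos: 0.292,',
       'review: Plans for weekend stay canceled due to Coronavirus shutdown.',
       'compound: 0.0,', 'neg: 0.0,', 'neu: 1.0,', 'pos: 0.0,']

# One-row, many-column DataFrame like the one described in the question
wide = pd.DataFrame([lst])

fields_per_row = 5  # assumption: every record has exactly 5 fields, in the same order

# Reshape the single row into rows of 5 values each
values = wide.to_numpy().reshape(-1, fields_per_row)

# Take the column names from the keys of the first record
cols = [entry.split(':', 1)[0] for entry in values[0]]

# Keep only the text after the first colon; strip spaces and the trailing comma
tall = pd.DataFrame(values, columns=cols).apply(
    lambda col: col.str.split(':', n=1).str[1].str.strip(' ,')
)

print(tall.to_string(index=False))
```

If a record could be missing a field, the fixed-width reshape would misalign the rows, and the dictionary-based parsing is the safer choice.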