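Problem description
I tried three different ways to add a list of string values (content) as a new column in an existing DataFrame (all_df), but each attempt fails with an error. The list is built by comparing two columns and copying the corresponding value when they match; that is, it matches the two columns and assigns values accordingly. The matching itself works perfectly, but I cannot get the resulting list into the DataFrame with any of these approaches.
I searched but could not find a solution. Please help.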
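content = []
for i in range(len(col1)):
    # Default to "Daily Update" unless a matching prefix is found in col2.
    val = "Daily Update"
    for j in range(len(col2)):
        # Compare the first five characters of each value.
        a = col1[i][0:5]
        b = col2[j][0:5]
        if a == b:
            val = con[j]
            break
    content.append(val)
print(content)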
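# Output of content: """ ['Motivation Post', 'Awareness Post', 'Awareness Post', 'Product Post', 'Festival Days Post', 'Daily Update', 'Festival Days Post', 'General Post', 'Product Post', 'Awareness Post', 'Motivation Post', 'Product Post', 'Motivation Post', 'Awareness Post', 'Daily Update', 'Product Post', 'Motivation Post', 'General Post', 'Product Post', 'Festival Days Post'] """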
#(first approach)
all_df.insert(loc=0,column='Content Bucket',value=content)
"""error:Traceback (most recent call last):
  File "C:/Users/Desktop/analytics/twitter/demo.py", line 43, in <module>
    all_df.insert(loc=0,column='Content Bucket',value=content)
TypeError: insert() takes no keyword arguments
"""
#(second approach)
all_df['Content Bucket']=np.array(content)
"""error:Traceback (most recent call last):
  File "C:/Users/Desktop/analytics/twitter/demo.py", line 45, in <module>
    all_df['Content Bucket']=np.array(content)
TypeError: list indices must be integers or slices, not str
"""
#(third approach)
dftemp = pd.DataFrame(data=content,columns=["Content Bucket"])
dft=pd.concat(dftemp,all_df)
"""error:Traceback (most recent call last):
  File "C:/Users/jeshal/Desktop/analytics/twitter/demo.py", line 47, in <module>
    dft=pd.concat(dftemp,all_df)
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\reshape\concat.py", line 271, in concat
    op = _Concatenator(
  File "C:\Users\\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\reshape\concat.py", line 306, in __init__
    raise TypeError(
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
"""
all_df.to_excel("mergedt.xlsx",index=False)
Solution
If you want to add it as a column:
df_content = pd.DataFrame(content,columns=['Content Bucket'])
all_df = all_df.append(df_content)
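One caveat about the snippet above: `DataFrame.append` stacks rows below the existing ones rather than adding a column, and it was removed in pandas 2.0. A minimal sketch of a side-by-side join with `pd.concat(..., axis=1)` follows. The `Tweet` column and the sample values are hypothetical stand-ins for the question's data, and the sketch assumes both frames share the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical stand-ins for the question's all_df and content.
all_df = pd.DataFrame({"Tweet": ["t1", "t2", "t3"]})
content = ["Product Post", "Daily Update", "General Post"]

df_content = pd.DataFrame(content, columns=["Content Bucket"])

# axis=1 places the frames side by side, aligning rows on the index.
# The first argument must be a list (or other iterable) of pandas objects.
combined = pd.concat([all_df, df_content], axis=1)
print(combined)
```

If the two frames have different indexes, rows are aligned by index label rather than by position, so resetting the index first (for example with `reset_index(drop=True)`) may be needed.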

Another approach that appears to work is assigning the list directly to a new column:
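import pandas as pd
import numpy as np

def main():
    # The list must have exactly one entry per row of the DataFrame.
    new_column = ['', 'A', 'B', '', 'C', 'D']
    df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
    print(df)
    print('\n', len(new_column), new_column, '\n')
    # Assigning a list to a new column label creates the column.
    df['Content Bucket'] = new_column
    print(df)

if __name__ == '__main__':
    main()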
This produces the following output:
          A         B         C         D
0 -1.179613 -0.374270 -0.214203 -0.400627
1  0.664314 -1.339739  0.740338 -1.637909
2  1.394077 -0.709522  1.119306  0.478199
3  0.733929  0.714355 -2.518329 -1.076162
4  0.811021  1.296503  0.280754  0.053859
5  0.419472 -0.541438 -0.215574 -0.322361

 6 ['', 'A', 'B', '', 'C', 'D']

          A         B         C         D Content Bucket
0 -1.179613 -0.374270 -0.214203 -0.400627
1  0.664314 -1.339739  0.740338 -1.637909              A
2  1.394077 -0.709522  1.119306  0.478199              B
3  0.733929  0.714355 -2.518329 -1.076162
4  0.811021  1.296503  0.280754  0.053859              C
5  0.419472 -0.541438 -0.215574 -0.322361              D
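
A possible explanation for the errors in the question: "takes no keyword arguments" on `insert` and "list indices must be integers or slices, not str" are the messages a built-in Python `list` raises, which suggests that `all_df` may be a list (for instance, a list of DataFrames read from several files) rather than a single DataFrame. The third error is separate: `pd.concat` expects its first argument to be a list of pandas objects. The sketch below reproduces the first two failures and shows one way around them. The list-of-frames setup and the `Tweet` column are assumptions, since the question does not show how `all_df` is built.

```python
import pandas as pd

# Assumption: all_df is a plain list of DataFrames, not a DataFrame.
all_df = [
    pd.DataFrame({"Tweet": ["t1", "t2"]}),
    pd.DataFrame({"Tweet": ["t3"]}),
]
content = ["Product Post", "Daily Update", "General Post"]

# Both calls fail with TypeError because they hit list methods,
# not DataFrame methods.
errors = []
try:
    all_df.insert(loc=0, column="Content Bucket", value=content)
except TypeError as exc:
    errors.append(exc)
try:
    all_df["Content Bucket"] = content
except TypeError as exc:
    errors.append(exc)

# Combine the pieces into one DataFrame first, then add the column.
merged = pd.concat(all_df, ignore_index=True)
assert len(merged) == len(content)  # one value per row is required
merged.insert(loc=0, column="Content Bucket", value=content)
print(merged)
```

Checking `type(all_df)` before the assignment is a quick way to confirm or rule out this explanation.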