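Problem description
I have a DataFrame with many columns. For each column there is a csv file in a directory, named after the column but not exactly: the file name contains the column name plus some extra letters and words. What I want is to build a regular expression for each column that lets me find the file name, load the file into pandas, and merge the two data frames, but I can't get the "r" prefix onto the pattern.
This is what I'm trying to do: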
import re
import pandas as pd
data={"one":[1,2,3,4,5],"two":[6,7,8,9,10]}
left_df=pd.DataFrame(data)
routes={"wordsone.csv":"c:\route\route\one.csv","wordstwo.csv":"c:\route\route\two.csv"}
column_names=list(left_df.columns)
for i in column_names:
    # This pattern will be used to get the file name associated with the column name
    pattern="\w*"+i+"\w*\.csv"
    # Here I'm expecting to get the name of the file
    filename=re.findall(pattern,list(routes.keys()))
    # Here I'm expecting to get the file route
    filepath=routes[filename]
    # Create a DataFrame to merge with left_df
    right_df=pd.read_csv(filepath)
    # Add right_df to left_df
    left_df=pd.merge(left_df,right_df,how="left",on=i)
return left_df
But I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-257-0d70c9171983> in <module>
9 for i in column_names:
10 pattern="\w*"+i+"\w*\.csv"
---> 11 filename=re.findall(pattern,list(routes.keys()))
12 filepath=routes[filename]
13 right_df=pd.read_csv(filepath)
~\Anaconda3\lib\re.py in findall(pattern,string,flags)
239
240 Empty matches are included in the result."""
--> 241 return _compile(pattern,flags).findall(string)
242
243 def finditer(pattern,flags=0):
TypeError: expected string or bytes-like object
I have tried many ways to get something like r"pattern", but every time Python alters or drops part of it.
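Solution
re.findall expects a single string, not the list you are passing. Build one string from the list of keys and pass that to findall. Also note that the backslashes in the paths cause problems: in a normal string literal, "\r" becomes a carriage return and "\t" becomes a tab. To see these escape characters when printing, use repr().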
keys_list = list(routes.keys())
print(keys_list)
# re.findall needs one string, so join the file names
keys_string = ','.join(keys_list)
print(keys_string)
for i in column_names:
    # Raw string pieces keep \w and \. intact for the regex engine
    pattern = r"\w*" + i + r"\w*\.csv"
    filename = re.findall(pattern, keys_string)
    print(filename)
    filepath = routes[filename[0]]
    print(repr(filepath))
Output from print statements:
['wordsone.csv', 'wordstwo.csv']
wordsone.csv,wordstwo.csv
['wordsone.csv']
'c:\route\route\\one.csv'
['wordstwo.csv']
'c:\route\route\two.csv'
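As a possible variation on the fix above, here is a minimal sketch that checks each file name on its own instead of joining the keys, and writes both the paths and the regex pieces as raw strings. The helper name find_route is hypothetical (it is not in the original post), and the use of re.escape and fullmatch is an assumption about what the asker needs, not the only way to do it.

```python
import re

# Raw string literals keep every backslash as-is, so "\r" and "\t" inside
# these Windows paths are not turned into carriage return and tab.
routes = {
    "wordsone.csv": r"c:\route\route\one.csv",
    "wordstwo.csv": r"c:\route\route\two.csv",
}


def find_route(column, routes):
    """Return the path whose file name contains `column`, or None.

    Hypothetical helper, written only for illustration.
    """
    # The r prefix applies to string literals only. The column name is
    # plain data, so it is escaped and concatenated between raw pieces.
    pattern = re.compile(r"\w*" + re.escape(column) + r"\w*\.csv")
    for name, path in routes.items():
        # fullmatch tests one file name at a time, so no join is needed.
        if pattern.fullmatch(name):
            return path
    return None


for column in ["one", "two"]:
    print(column, "->", find_route(column, routes))
```

With raw-string paths, the value returned by find_route can be passed to pd.read_csv without the backslashes being reinterpreted. Returning None for a column with no matching file leaves it to the caller to decide whether to skip that column or raise an error.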