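Problem description
I want to load the data at the URL "https://www.treasury.gov/ofac/downloads/sdn.csv" directly into a table named sdn. The only change I want to make is to replace every '-0-' with '' in all columns that contain that value.
I tried doing this with pandas, but my approach doesn't look clean.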
import requests
import pandas as pd
sdnURL = "https://www.treasury.gov/ofac/downloads/sdn.csv"
altURL = "https://www.treasury.gov/ofac/downloads/alt.csv"
addURL = "https://www.treasury.gov/ofac/downloads/add.csv"
sdnCommentsURL = "https://www.treasury.gov/ofac/downloads/sdn_comments.csv"
sdnHeader = ["sdn_id","sdn_name","sdn_type","program","title","call_sign","vessel_type","tonnage","gross_tonnage","vessel_flag","vessel_owner","remarks"]
altHeader = ["sdn_id","alt_id","alt_type","alt_name","remarks"]
addHeader = ["sdn_id","address_id","address","city_state_post","country","remarks"]
sdnCommentsHeader = ["sdn_id","remarks"]
sdn = pd.read_csv(sdnURL,names = sdnHeader,header = None)
alt = pd.read_csv(altURL,names = altHeader,header = None)
add = pd.read_csv(addURL,names = addHeader,header = None)
sdnComments = pd.read_csv(sdnCommentsURL,names = sdnCommentsHeader,header = None)
sdn.to_csv('sdn.csv',index = False)
alt.to_csv('alt.csv',index = False)
add.to_csv('add.csv',index = False)
sdnComments.to_csv('sdnComments.csv',index = False)
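I also plan to load the CSVs into MySQL tables. There are two problems with my approach:
- I don't want to write a separate command for each file.
- I want to replace '-0-' in all columns in one go.
Final edit: thanks to @Jimmar's answer, I ended up writing code like this: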
import requests
import pandas as pd
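# Map each OFAC file name (without extension) to its column headers.
files = {
    "sdn": ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"],
    "alt": ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
    "add": ["sdn_id", "address_id", "address", "city_state_post", "country", "remarks"],
    "sdn_comments": ["sdn_id", "remarks"],
}

def fetch_csv(file, headers):
    df = pd.read_csv("https://www.treasury.gov/ofac/downloads/" + file + ".csv", names=headers, header=None)
    # Blank out the '-0- ' placeholder wherever it is the whole cell value.
    df = df.replace('-0- ', '')
    df.to_csv(file + '.csv', index=False)

for file, headers in files.items():
    fetch_csv(file, headers)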
Solution
You can organize your code this way (I have only done two of the files):
import requests
import pandas as pd
def fetch_csv(url, headers, file_name):
    # Read the headerless CSV, blank out the '-0- ' placeholder, and save it locally.
    df = pd.read_csv(url, names=headers, header=None)
    df = df.replace('-0- ', '')
    df.to_csv(file_name, index=False)

sources = [
    {
        "url": "https://www.treasury.gov/ofac/downloads/sdn.csv",
        "headers": ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"],
        "file_name": "sdn.csv"
    }, {
        "url": "https://www.treasury.gov/ofac/downloads/alt.csv",
        "headers": ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
        "file_name": "alt.csv"
    }  # add the rest in the same pattern
]

for source in sources:
    fetch_csv(source['url'], source['headers'], source['file_name'])
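One thing to watch: df.replace('-0- ', '') only matches cells whose entire value is exactly that string, trailing space included. If the spacing around the placeholder varies, a regex replacement may be more forgiving. The following is a minimal sketch of that idea, not the code from the answer; the load_clean helper and the two-row SAMPLE are made up for illustration and are not real OFAC data.

```python
import io
import pandas as pd

# Hypothetical sample standing in for an OFAC file; not real data.
SAMPLE = '1,"EXAMPLE CO",-0- ,"TEST"\n2,"OTHER CO","individual",-0- \n'

def load_clean(source, headers):
    """Read a headerless CSV and blank out the '-0-' placeholder in every column."""
    df = pd.read_csv(source, names=headers, header=None)
    # regex=True lets the pattern absorb stray spaces around the placeholder;
    # the ^ and $ anchors ensure only whole-cell matches are replaced.
    return df.replace(r"^\s*-0-\s*$", "", regex=True)

df = load_clean(io.StringIO(SAMPLE), ["sdn_id", "sdn_name", "sdn_type", "program"])
print(df)
```

Because read_csv accepts both URLs and file-like objects, the same helper could be pointed at one of the treasury.gov URLs instead of the in-memory sample.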
If you need to write the data to a database, replace the df.to_csv line with a call to df.to_sql.
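As a rough sketch of what that swap could look like, the example below writes a cleaned frame into an in-memory SQLite database, because pandas accepts a plain sqlite3 connection and the snippet then runs without a server. For MySQL, to_sql would instead need an SQLAlchemy engine (for example one created from a "mysql+pymysql://..." URL, which assumes SQLAlchemy and a MySQL driver are installed). The load_into_table helper and the SAMPLE rows are assumptions made for illustration.

```python
import io
import sqlite3
import pandas as pd

# Hypothetical sample standing in for an OFAC file; not real data.
SAMPLE = '1,"EXAMPLE CO",-0- \n2,"OTHER CO","individual"\n'

def load_into_table(source, headers, table, conn):
    """Read a headerless CSV, blank out '-0- ', and write it to a SQL table."""
    df = pd.read_csv(source, names=headers, header=None)
    df = df.replace('-0- ', '')
    # if_exists="replace" drops and recreates the table on every run.
    df.to_sql(table, conn, if_exists="replace", index=False)
    return len(df)

conn = sqlite3.connect(":memory:")
n = load_into_table(io.StringIO(SAMPLE), ["sdn_id", "sdn_name", "sdn_type"], "sdn", conn)
rows = conn.execute(
    "SELECT sdn_id, sdn_name, sdn_type FROM sdn ORDER BY sdn_id"
).fetchall()
print(n, rows)
```

With if_exists="replace" the table is rebuilt each time, which suits a full refresh of the list; if_exists="append" would add rows to an existing table instead.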