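Load CSV data from a URL directly into a MySQL table after cleaning it

Problem description

I want to load the data at the URL "https://www.treasury.gov/ofac/downloads/sdn.csv" directly into a table named sdn. The only change I want to make is to replace every '-0-' with '' (an empty string) in all columns that contain that value.

I tried to do this with pandas, but my approach doesn't look clean.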

import requests
import pandas as pd


sdnURL = "https://www.treasury.gov/ofac/downloads/sdn.csv"
altURL = "https://www.treasury.gov/ofac/downloads/alt.csv"
addURL = "https://www.treasury.gov/ofac/downloads/add.csv"
sdnCommentsURL = "https://www.treasury.gov/ofac/downloads/sdn_comments.csv"

sdnHeader = ["sdn_id","sdn_name","sdn_type","program","title","call_sign","vessel_type","tonnage","gross_tonnage","vessel_flag","vessel_owner","remarks"]
altHeader = ["sdn_id","alt_id","alt_type","alt_name","remarks"]
addHeader = ["sdn_id","address_id","address","city_state_post","country","remarks"]
sdnCommentsHeader = ["sdn_id","remarks"]


sdn = pd.read_csv(sdnURL,names = sdnHeader,header = None)
alt = pd.read_csv(altURL,names = altHeader,header = None)
add = pd.read_csv(addURL,names = addHeader,header = None)
sdnComments = pd.read_csv(sdnCommentsURL,names = sdnCommentsHeader,header = None)

sdn.to_csv('sdn.csv',index = False)
alt.to_csv('alt.csv',index = False)
add.to_csv('add.csv',index = False)
sdnComments.to_csv('sdnComments.csv',index = False)

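I also plan to load the CSV files into MySQL tables. There are two problems with my approach:

  1. I don't want to write a separate command for each file.
  2. I want to replace "-0-" in all columns in one go.

Final edit: thanks to @Jimmar's answer, I ended up writing code like this: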

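import pandas as pd

# Map each file name (without ".csv") to its column headers.
files = {
    "sdn": ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign",
            "vessel_type", "tonnage", "gross_tonnage", "vessel_flag",
            "vessel_owner", "remarks"],
    "alt": ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
    "add": ["sdn_id", "address_id", "address", "city_state_post", "country", "remarks"],
    "sdn_comments": ["sdn_id", "remarks"],
}

def fetch_csv(file, headers):
    # The files are read without a header row, so the column names are supplied explicitly.
    df = pd.read_csv("https://www.treasury.gov/ofac/downloads/" + file + ".csv", names=headers, header=None)
    # Blank out every cell whose whole value is the '-0- ' placeholder.
    df = df.replace('-0- ', '')
    df.to_csv(file + '.csv', index=False)

for file, headers in files.items():
    fetch_csv(file, headers)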

Solution

You can organize your code this way (I have only filled in two of the sources):

import pandas as pd

def fetch_csv(url, headers, file_name):
    # Read the CSV without a header row and apply our own column names.
    df = pd.read_csv(url, names=headers, header=None)
    # Blank out every cell whose whole value is the '-0- ' placeholder.
    df = df.replace('-0- ', '')
    df.to_csv(file_name, index=False)

sources = [
    {
        "url": "https://www.treasury.gov/ofac/downloads/sdn.csv",
        "headers": ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign",
                    "vessel_type", "tonnage", "gross_tonnage", "vessel_flag",
                    "vessel_owner", "remarks"],
        "file_name": "sdn.csv",
    },
    {
        "url": "https://www.treasury.gov/ofac/downloads/alt.csv",
        "headers": ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
        "file_name": "alt.csv",
    },  # add the rest in the same pattern
]

for source in sources:
    fetch_csv(source['url'], source['headers'], source['file_name'])
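
One thing worth checking about the replace call: without regex=True, DataFrame.replace only matches cells whose entire value equals the search string, so '-0- ' (with the trailing space) will not match '-0-' or ' -0- '. The sketch below compares that exact match with a whitespace-tolerant regular expression. The column values are made up for illustration and are not taken from the real files; which variant you need depends on how the placeholder actually appears in your data.

```python
import pandas as pd

# Made-up sample values, only to illustrate the matching behaviour.
df = pd.DataFrame({
    "title": ["-0- ", "-0-", "Director"],
    "remarks": [" -0- ", "DOB -0- unknown", "-0- "],
})

# Exact match: only cells that are exactly '-0- ' become empty.
exact = df.replace("-0- ", "")

# Regex match: cells that are '-0-' with any surrounding whitespace become empty;
# text that merely contains '-0-' is left alone because of the ^ and $ anchors.
pattern = df.replace(r"^\s*-0-\s*$", "", regex=True)

print(exact)
print(pattern)
```

Both calls apply to every column at once, which covers the "replace in all columns in one go" part of the question.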

If you need to write the data to a database, replace the df.to_csv line with a call to df.to_sql.
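
As a rough sketch of that change, the function below writes the cleaned frame with DataFrame.to_sql instead of to_csv. To keep it runnable without a database server or network access, it uses an in-memory SQLite connection and a small made-up CSV string in the shape of alt.csv; both are assumptions for illustration, as is the function name load_csv_to_table. For MySQL you would normally pass a SQLAlchemy engine instead (for example one created from a "mysql+pymysql://user:password@host/dbname" URL, assuming the pymysql driver is installed) and a URL instead of the StringIO object.

```python
import io
import sqlite3

import pandas as pd

# Made-up rows in the same shape as alt.csv; not real data.
SAMPLE_ALT_CSV = (
    '1,10,"aka","EXAMPLE NAME ONE",-0- \n'
    '2,11,"fka","EXAMPLE NAME TWO","some remark"\n'
)
ALT_HEADERS = ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"]


def load_csv_to_table(source, headers, table_name, conn):
    """Read a headerless CSV, blank out the '-0- ' placeholder, write it to a table."""
    df = pd.read_csv(source, names=headers, header=None)
    df = df.replace("-0- ", "")
    # if_exists="replace" drops and recreates the table on every run;
    # use "append" to keep existing rows instead.
    df.to_sql(table_name, conn, if_exists="replace", index=False)
    return df


# Stand-in for a MySQL connection (assumption: SQLite is used only for this demo).
conn = sqlite3.connect(":memory:")
load_csv_to_table(io.StringIO(SAMPLE_ALT_CSV), ALT_HEADERS, "alt", conn)

rows = conn.execute("SELECT sdn_id, remarks FROM alt ORDER BY sdn_id").fetchall()
print(rows)
```

Looping over the sources list and calling this function once per entry would load every file into its own table.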
