Exporting a PDF to CSV with Python (tabula)

Problem description

When I export a PDF file to CSV, I get the error: writeheader() takes 1 positional argument but 2 were given

from tabula import read_pdf
from tabulate import tabulate
import csv

df = read_pdf("asd.pdf")
print(df)


with open('ddd.csv',"w",newline="") as file:
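    # note: the comma after Total_seats is inside the quotes, so
    # 'Total_seats,' "document_type" becomes one name: 'Total_seats,document_type'
    columns = ['specialty ',"name",'number_of_seats','Total_seats,' "document_type","concent"]
    writer = csv.DictWriter(file,fieldnames=columns)
    # this line raises the TypeError: writeheader() accepts no data argument
    writer.writeheader(df)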
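The error comes from the last line: csv.DictWriter.writeheader() takes no arguments besides self, so passing df is the extra positional argument the message complains about. The header is built from fieldnames, and the data rows go to writerow()/writerows() as dicts keyed by those field names. A minimal sketch using only the standard library, assuming the rows are already available as dicts (write_rows and the sample row are hypothetical, and the trailing space in 'specialty ' is assumed to be unintended and is dropped). It also restores the comma missing from the column list.

```python
import csv
import io

# Column names from the question, with the missing comma restored
# ('Total_seats,' "document_type" was concatenated into one string).
# Assumption: the trailing space in 'specialty ' was a typo, so it is dropped.
COLUMNS = ["specialty", "name", "number_of_seats", "Total_seats",
           "document_type", "concent"]


def write_rows(file_obj, rows):
    """Hypothetical helper: write a header line, then one line per dict in rows."""
    writer = csv.DictWriter(file_obj, fieldnames=COLUMNS)
    writer.writeheader()    # no data argument: the header comes from fieldnames
    writer.writerows(rows)  # the actual data is passed here


# Made-up sample row standing in for the table extracted from the PDF.
buf = io.StringIO()
write_rows(buf, [
    {"specialty": "spec_1", "name": "name_1", "number_of_seats": 10,
     "Total_seats": 20, "document_type": "doc_1", "concent": "yes"},
])
print(buf.getvalue())
```

With tabula-py 2.x, read_pdf returns a list of pandas DataFrames rather than a single one, so rows in this shape could come from something like df[0].to_dict("records"), provided the DataFrame's columns match the field names. In that case df[0].to_csv("ddd.csv", index=False) writes the CSV directly without the csv module.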

Solution

Code copied from http://theautomatic.net/2019/05/24/3-ways-to-scrape-tables-from-pdfs-with-python/, which has more details.

import tabula

file = "http://lab.fs.uni-lj.si/lasin/wp/IMIT_files/neural/doc/seminar8.pdf"

# read every table in the PDF into a list of DataFrames
#tables = tabula.read_pdf(file,pages = "all",multiple_tables = True)

# output the table(s) on the first page of the PDF (pages defaults to 1) to a CSV
tabula.convert_into(file,"output.csv",output_format="csv")

# output all the tables in the PDF to one CSV; an output path is required
# ("output_all.csv" is just an example filename)
tabula.convert_into(file,"output_all.csv",output_format="csv",pages='all')
