将PDF文件转换为多页图像

问题描述

我正在尝试使用PyMuPDF将多页PDF文件转换为图像:

<?PHP
$str = '["userdomain.ltd"],["test.com"]';

$arr = explode(',',$str);

foreach ($arr as $el) {
    echo json_decode($el)[0];
    echo "\n";
}

但是我需要在多页tiff中将PDF文件的所有页面转换为单个图像,当我给page参数设置页面范围时,它只占用一页,有人知道我能做到吗?

解决方法

import fitz
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
compression = 'zip'  # "zip","lzw","group4" - need binarized image...

zoom = 2 # to increase the resolution
mat = fitz.Matrix(zoom,zoom)

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    pix = page.getPixmap(matrix = mat)
    img = Image.frombytes("RGB",[pix.width,pix.height],pix.samples)
    image_list.append(img)
    
if image_list:
    image_list[0].save(
        output_name,save_all=True,append_images=image_list[1:],compression=compression,dpi=(300,300),)
,

要转换PDF的所有页面时,需要一个for循环。另外,在调用.getPixmap()时,您需要matrix = mat之类的属性来基本上提高分辨率。这是代码段(不确定这是否是您想要的,但这会将所有PDF转换为图像):

doc = fitz.open(pdf_file)
zoom = 2 # to increase the resolution
mat = fitz.Matrix(zoom,zoom)
noOfPages = doc.pageCount
image_folder = '/path/to/where/to/save/your/images'

for pageNo in range(noOfPages):
    page = doc.loadPage(pageNo) #number of page
    pix = page.getPixmap(matrix = mat)
    
    output = image_folder + str(pageNo) + '.jpg' # you could change image format accordingly
    pix.writePNG(output)
    print('Converting PDFs to Image ... ' + output)
    # do your things afterwards

为了解决问题,请从Github上here is a good example进行演示,以了解其含义以及在需要时如何将其用于您的案件。