How to split a multi-document PDF by bookmark using PDF::API2

Question

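Is it possible to split a multi-document PDF based on its bookmarks using PDF::API2? For example, if myfile.pdf contains the following bookmarks:

Bookmark 1
Bookmark 2
Bookmark 3

then it needs to be split into the following individual PDF files:

Bookmark 1.pdf
Bookmark 2.pdf
Bookmark 3.pdf

I couldn't find any bookmark terminology in the PDF::API2 documentation. Are bookmarks what it calls outlines? Thanks!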

Answer

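I gave this a quick try in pure Perl, then gave up and handed the hard work to pdftk. I still drive it from Perl, though. Here is an example script, where my bookmarks have titles such as "Chapter 1" and "Appendix 1". You can probably adapt it, but be aware that some of it is specific to my files. I also use some newer features (`each` on an array, the `/r` substitution flag, named captures), but you can easily swap those parts out if you don't want to use Perl 5.13: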
use 5.013;

use Data::Dumper;
use File::Basename;
use File::Spec::Functions;
use File::Path qw(make_path);

my $pdftk = 'pdftk';

# First argument: the PDF to split. Second (optional): the output directory.
my $file = $ARGV[0];
say( "\n$0 <FILENAME>" ) && exit 1 unless $file;

my $dir        = dirname( $file ) || '.';
my $output_dir = $ARGV[1] || $dir;

unless( -e $output_dir ) {
    make_path $output_dir, { mode => 0755 };
    die "mkdir failed: $!" unless -e $output_dir;
    }


my $string = `$pdftk @{[quotemeta($file)]} dump_data output -`;

my( $last_page ) = $string =~ m/NumberOfPages: (\d+)/;
say "last page is $last_page";

# Each bookmark in the dump_data output is a Title/Level/PageNumber triple.
my $regex = qr/
    BookmarkTitle:      \s+ (?<title>.*?) \s+
    BookmarkLevel:      \s+ (?<level>\d+) \s+
    BookmarkPageNumber: \s+ (?<page>\d+)
    /x;

my @page_numbers;
while( $string =~ /$regex/g ) {
    next unless $+{level} == 1;
    push @page_numbers,[ @+{ qw(title page) } ];
    }

say "Last index is $#page_numbers";

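# pdftk reports titles with HTML entities, e.g. Chapter&#160;1.&#160;Introduction
# Turn each [ title, start ] pair into [ number, title, start, end ].
while( my( $index, $elem ) = each @page_numbers ) {
    # The last bookmark has no successor, so it is handled after the loop.
    last if $index == $#page_numbers;
    $elem->[0] =~ s/&#160;/ /g;
    # Strip the "Chapter N." or "Appendix N." prefix and keep N; otherwise use XX.
    unshift @$elem, $elem->[0] =~ s/(?:Chapter|Appendix)\s+(\d+|[ABC]|).?\s+//g
            ?
        $1
            :
        'XX';

    # This section ends on the page before the next bookmark starts.
    push @$elem, $page_numbers[$index+1]->[-1] - 1;
    }
unshift @{ $page_numbers[-1] }, 'XX';
push @{ $page_numbers[-1] }, $last_page;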

print Dumper( \@page_numbers );

# pdftk A=one.pdf B=two.pdf cat A1-7 B1-5 A8 output combined.pdf
foreach my $elem ( @page_numbers ) {
    my $chapter  = $elem->[1] =~ s/\s+/_/rg;
    my $filename = catfile( $output_dir, "$elem->[0].$chapter.pdf" );
    say "Splitting Chapter $elem->[0] $elem->[1]";
    # Extract pages start-end of the input file into its own PDF.
    my @command = ( $pdftk, $file, 'cat', "$elem->[2]-$elem->[3]", 'output', $filename );
    say "Running @command";
    system( @command ) == 0 or warn "pdftk failed for $filename: $?\n";
    }
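
If you want to check the page-range logic without running pdftk, here is a minimal sketch of that step on its own. It assumes the dump_data text has the same Title/Level/PageNumber layout the script above matches. The sample text and the `bookmark_ranges` name are made up for illustration, and it only needs Perl 5.10:

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical sample of pdftk dump_data output (not taken from a real file).
my $dump = <<'END';
NumberOfPages: 12
BookmarkTitle: Chapter 1. Introduction
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkTitle: Background
BookmarkLevel: 2
BookmarkPageNumber: 3
BookmarkTitle: Appendix A. Tables
BookmarkLevel: 1
BookmarkPageNumber: 9
END

# Return a list of [ title, first_page, last_page ] for top-level bookmarks.
sub bookmark_ranges {
    my( $text ) = @_;
    my( $last_page ) = $text =~ /^NumberOfPages:\s+(\d+)/m;

    # Collect [ title, start_page ] for every level-1 bookmark.
    my @starts;
    while( $text =~ /BookmarkTitle:\s+(?<title>.*?)\s+
                     BookmarkLevel:\s+(?<level>\d+)\s+
                     BookmarkPageNumber:\s+(?<page>\d+)/gx ) {
        push @starts, [ $+{title}, $+{page} ] if $+{level} == 1;
        }

    my @ranges;
    for my $i ( 0 .. $#starts ) {
        # A section ends just before the next one starts; the last runs to the end.
        my $end = $i < $#starts ? $starts[$i+1][1] - 1 : $last_page;
        push @ranges, [ $starts[$i][0], $starts[$i][1], $end ];
        }
    return @ranges;
    }

# Print one tab-separated line per section: title, first page, last page.
say join "\t", @$_ for bookmark_ranges( $dump );
```

Each range can then be passed to pdftk as `cat FIRST-LAST`, as in the script above. Unlike that script, this sketch treats the last bookmark the same way as the others and does not strip the chapter prefix from the title.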