问题描述
我拿了this C# example并尝试以PdfDocument的形式获取附件,但是我不知道该怎么做。
最后,我想简单地将投资组合中包含的每个pdf文件合并为一个“正常” pdf文件。每个非PDF附件都应忽略。
编辑:
(好的,很抱歉,我太含糊了。我只是想说出我想实现的目标,只是想让你们更容易帮助我。我不想让您为我编写程序。)>
因此,这是链接示例中的部分代码:
protected void ManipulatePdf(String dest)
{
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC),new PdfWriter(dest));
PdfDictionary root = pdfDoc.GetCatalog().GetPdfObject();
PdfDictionary names = root.GetAsDictionary(PdfName.Names);
PdfDictionary embeddedFiles = names.GetAsDictionary(PdfName.EmbeddedFiles);
PdfArray namesArray = embeddedFiles.GetAsArray(PdfName.Names);
// Remove the description of the embedded file
namesArray.Remove(0);
// Remove the reference to the embedded file.
namesArray.Remove(0);
pdfDoc.Close();
}
除了要从源文档中删除任何内容外,我想知道如何在可能的情况下从PdfArray中获取PdfDocument对象。
样本文件: http://www.mediafire.com/file/c4tw07wci8swdx9/NPort_5000.pdf/file
将mkl解决方案移植到C#:
PdfNameTree embeddedFilesTree = pdfDocument.GetCatalog().GetNameTree(PdfName.EmbeddedFiles);
IDictionary<string,PdfObject> embeddedFilesMap = embeddedFilesTree.GetNames();
List<PdfStream> embeddedPdfs = new List<PdfStream>();
foreach (PdfObject pdfObject in embeddedFilesMap.Values)
{
if (!(pdfObject is PdfDictionary))
continue;
PdfDictionary filespecDict = (PdfDictionary)pdfObject;
PdfDictionary embeddedFileDict = filespecDict.GetAsDictionary(PdfName.EF);
if (embeddedFileDict == null)
continue;
PdfStream embeddedFileStream = embeddedFileDict.GetAsStream(PdfName.F);
if (embeddedFileStream == null)
continue;
PdfName subtype = embeddedFileStream.GetAsName(PdfName.Subtype);
if (PdfName.ApplicationPdf.CompareTo(subtype) != 0)
continue;
embeddedPdfs.Add(embeddedFileStream);
}
if (embeddedPdfs.Count > 0)
{
PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf",new WriterProperties().SetFullCompressionMode(true));
PdfDocument flatPdfDocument = new PdfDocument(pdfWriter);
PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
foreach (PdfStream pdfStream in embeddedPdfs)
{
PdfReader embeddedReader = new PdfReader(sourceFactory.CreateSource(pdfStream.GetBytes()),new ReaderProperties());
PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader);
pdfMerger.Merge(embeddedPdfDocument,1,embeddedPdfDocument.GetNumberOfPages());
}
flatPdfDocument.Close();
}
解决方法
要将PDF包中的所有pdf文件合并为普通pdf文件,您必须遍历 EmbeddedFiles 的名称树,检索其中所有PDF的流,并然后合并所有这些PDF。
对于加载到stepFunction1
(Java版本; OP将C#的端口编辑到他的问题正文中)的端口中,您可以按照以下步骤进行操作:
PdfDocument pdfDocument
(FlattenPortfolio测试PdfNameTree embeddedFilesTree = pdfDocument.getCatalog().getNameTree(PdfName.EmbeddedFiles);
Map<String,PdfObject> embeddedFilesMap = embeddedFilesTree.getNames();
List<PdfStream> embeddedPdfs = new ArrayList<PdfStream>();
for (Map.Entry<String,PdfObject> entry : embeddedFilesMap.entrySet()) {
PdfObject pdfObject = entry.getValue();
if (!(pdfObject instanceof PdfDictionary))
continue;
PdfDictionary filespecDict = (PdfDictionary) pdfObject;
PdfDictionary embeddedFileDict = filespecDict.getAsDictionary(PdfName.EF);
if (embeddedFileDict == null)
continue;
PdfStream embeddedFileStream = embeddedFileDict.getAsStream(PdfName.F);
if (embeddedFileStream == null)
continue;
PdfName subtype = embeddedFileStream.getAsName(PdfName.Subtype);
if (!PdfName.ApplicationPdf.equals(subtype))
continue;
embeddedPdfs.add(embeddedFileStream);
}
Assert.assertFalse("No embedded PDFs found",embeddedPdfs.isEmpty());
try ( PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf",new WriterProperties().setFullCompressionMode(true));
PdfDocument flatPdfDocument = new PdfDocument(pdfWriter) ) {
PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
for (PdfStream pdfStream : embeddedPdfs) {
try ( PdfReader embeddedReader = new PdfReader(sourceFactory.createSource(pdfStream.getBytes()),new ReaderProperties());
PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader)) {
pdfMerger.merge(embeddedPdfDocument,1,embeddedPdfDocument.getNumberOfPages());
}
}
}
)