使用iText7 C#解析/读取PDF文档

问题描述

我正在尝试使用iText7库升级代码。 以前我使用iTextSharp库 但是看起来iText7是全新的 我尝试读取pdf文档,但是在“找不到Pdf标头”之间遇到异常。 这是我的代码

byte[] bytes = System.Convert.FromBase64String(UploadedFileByes);

MemoryStream memory = new MemoryStream(bytes);
            BinaryReader BRreader = new BinaryReader(memory);
            StringBuilder text = new StringBuilder();


            iText.Kernel.Pdf.PdfReader iTextReader = new iText.Kernel.Pdf.PdfReader(memory);
            iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(new iText.Kernel.Pdf.PdfReader(memory));



            int numberofpages = pdfDoc.GetNumberOfPages();
            for (int page = 1; page <= numberofpages; page++) {
                iText.Kernel.Pdf.Canvas.Parser.Listener.ITextExtractionStrategy strategy = new iText.Kernel.Pdf.Canvas.Parser.Listener.SimpleTextExtractionStrategy();
                string currentText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page),strategy);
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(
                    Encoding.Default,Encoding.UTF8,Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }

我在做什么错了?

解决方法

我找到了解决方案。我使用了我定义的pdfreader,而不是创建一个新的。这是代码。希望对别人有帮助。

byte[] bytes = System.Convert.FromBase64String(UploadedFileByes);

MemoryStream memory = new MemoryStream(bytes);
            BinaryReader BRreader = new BinaryReader(memory);
            StringBuilder text = new StringBuilder();


            iText.Kernel.Pdf.PdfReader iTextReader = new iText.Kernel.Pdf.PdfReader(memory);
            iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(iTextReader);



            int numberofpages = pdfDoc.GetNumberOfPages();
            for (int page = 1; page <= numberofpages; page++) {
                iText.Kernel.Pdf.Canvas.Parser.Listener.ITextExtractionStrategy strategy = new iText.Kernel.Pdf.Canvas.Parser.Listener.SimpleTextExtractionStrategy();
                string currentText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page),strategy);
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(
                    Encoding.Default,Encoding.UTF8,Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...