Problem description
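After migrating from iTextSharp 5.5.13.2 to iText 7.1.15, I am testing my application and get an exception when extracting text from PDF documents produced by one particular institution. These files contain …; however, iTextSharp extracts all of the text from these same PDF documents without problems. I have also reproduced the exception with iText 7.1.14.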
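Comparing the PdfEncodings class in iText 7 and iTextSharp, it appears that the Symbol encoding is present there.
Because this exception occurs only with iText 7, and not with iTextSharp, when extracting text from the same PDF, I believe it is a bug.
Any ideas?
Here is the exception: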
'SymbolEncoding' is not a supported encoding name.
For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name ArgumentException
at System.Globalization.EncodingTable.internalGetCodePageFromName(String name)
at System.Globalization.EncodingTable.GetCodePageFromName(String name)
at iText.IO.Util.IanaEncodings.GetEncodingEncoding(String name)
at iText.IO.Util.EncodingUtil.ConvertToBytes(Char[] chars, String encoding)
at iText.IO.Font.PdfEncodings.ConvertToBytes(String text, String encoding)
at iText.IO.Font.FontEncoding.FillNamedEncoding()
at iText.IO.Font.FontEncoding.CreateFontEncoding(String baseEncoding)
at iText.Kernel.Font.PdfType1Font..ctor(PdfDictionary fontDictionary)
at iText.Kernel.Font.PdfFontFactory.CreateFont(PdfDictionary fontDictionary)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.GetFont(PdfDictionary fontDict)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.SetTextFontOperator.Invoke(PdfCanvasProcessor processor, PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.InvokeOperator(PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessContent(Byte[] contentBytes, PdfResources resources)
at PDFKeeper.WindowsApplication.PdfFileInfo.GetText()
at PDFKeeper.WindowsApplication.UploadService.UploadStagedPdfsAndSupplementalData()
at PDFKeeper.WindowsApplication.UploadService.ExecuteUploadCycle()
at System.Threading.Tasks.Task.Execute()
Here is the function in my application that uses iText 7:
Public Function GetText() As String
    ' Requires: Imports System.Text, iText.Kernel.Pdf,
    ' iText.Kernel.Pdf.Canvas.Parser, iText.Kernel.Pdf.Canvas.Parser.Listener
    Using reader = New PdfReader(fileInfo.FullName)
        Dim textString As New StringBuilder
        Using pdfDoc As New PdfDocument(reader)
            For page As Integer = 1 To pdfDoc.GetNumberOfPages
                ' A fresh strategy per page, because the strategy accumulates the text it collects
                Dim strategy As ITextExtractionStrategy = New LocationTextExtractionStrategy
                Dim pageText As String = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy)
                Dim lines As String() = pageText.Split(ControlChars.Lf)
                For Each line In lines
                    textString.AppendLine(line)
                Next
            Next
        End Using
        Return textString.ToString
    End Using
End Function
And here is the same function as it was written for iTextSharp:
Public Function GetText() As String
    Using reader = New PdfReader(fileInfo.FullName)
        Dim textString As New StringBuilder
        For page As Integer = 1 To reader.NumberOfPages
            Try
                Dim strategy As ITextExtractionStrategy = New LocationTextExtractionStrategy
                Dim pageText As String = PdfTextExtractor.GetTextFromPage(reader, page, strategy)
                Dim lines As String() = pageText.Split(ControlChars.Lf)
                For Each line In lines
                    textString.AppendLine(line)
                Next
            Catch ex As InlineImageParseException
            End Try
        Next
        Return textString.ToString
    End Using
End Function