Problem description
I am using Apache Tika on Windows 10 with JRE 1.8.0_241, and I imported Tika 1.24.1 using Ant. I have the following code to extract the content of a PDF:
import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaExtraction {
    public static void main(final String[] args) throws IOException, TikaException {
        // Path to the PDF to extract text from
        File file = new File("C:\\Users\\myPC\\Desktop\\testPDF.pdf");
        // Instantiating Tika facade class
        Tika tika = new Tika();
        String filecontent = tika.parseToString(file);
        System.out.println("Extracted Content: " + filecontent);
    }
}
It fails with the following exception:
Exception in thread "main" org.apache.tika.exception.TikaException: Failed to close temporary resources
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:150)
at org.apache.tika.Tika.parseToString(Tika.java:527)
at com.oracle.cegbu.filesearch.service.kafka.TikaExtraction.main(TikaExtraction.java:28)
Caused by: java.nio.file.FileSystemException: C:\Users\myPC\AppData\Local\Temp\apache-tika-6518312717498705085.tmp: The process cannot access the file because it is being used by another process.
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
at java.nio.file.Files.delete(Unknown Source)
at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:84)
at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:145)
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:172)
... 3 more
Solution
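Try passing an InputStream opened in a try-with-resources statement instead of passing the File directly, so that the stream is closed as soon as parsing finishes: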
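import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaExtraction {
    public static void main(final String[] args) throws IOException, TikaException {
        // Instantiating Tika facade class
        Tika tika = new Tika();
        // Path to the PDF to extract text from
        File file = new File("C:\\Users\\myPC\\Desktop\\testPDF.pdf");
        // try-with-resources closes the stream when the block exits
        try (InputStream inputStream = new FileInputStream(file)) {
            String filecontent = tika.parseToString(inputStream);
            System.out.println("Extracted Content: " + filecontent);
        }
    }
}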
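
The exception above comes from Files.delete on a temporary file that something still holds open. As a rough illustration of the general pattern only (this is not Tika's internal code, and it does not reproduce the Tika failure), the sketch below uses nothing but the JDK: it reads a temporary file inside try-with-resources and deletes it only after the stream has been closed. The class name TempFileCleanup and the helper readThenDelete are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileCleanup {

    // Hypothetical helper (not part of Tika): read a file fully, then delete it.
    static String readThenDelete(Path tmp) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the stream even if read() throws
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        // The stream opened above is closed by now, so this code no longer holds the file
        Files.delete(tmp);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo-", ".tmp");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        String content = readThenDelete(tmp);
        System.out.println("Read: " + content);
        System.out.println("Temp file still exists: " + Files.exists(tmp));
    }
}
```

If the delete were attempted inside the try block instead, it could fail on Windows with the same "being used by another process" message, which is why the order of close and delete matters here.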