Error when reading a file with resume_parser

Problem description

Reading a file with the resume_parser Python module fails with a Tika Server Jar error. The input files are pdf/doc/docx. It emits the following warning:

Python script

2021-05-22 18:12:05,899 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to C:\Users\Users\AppData\Local\Temp\tika-server.jar.md5.
INFO:tika.tika:Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to C:\Users\Users\AppData\Local\Temp\tika-server.jar.md5.
ERROR:root:Error in docx file:: <urlopen error [WinError 10061] No connection Could be made because the target machine actively refused it>

  1. Tried setting the environment variable as follows; it had no effect.

TIKA_SERVER_JAR = http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar

  2. Tried downloading the jar file and setting the environment variable to its local path; that did not work either.

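The expected output is a dictionary with the details extracted from the resume, such as email, skills, phone number, university, and company information.

It worked the first few times, and then I started getting this error.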
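
The log appears to show tika-python trying to fetch the server jar's .md5 from Maven and the connection being refused. One thing that may be worth trying, as a sketch rather than a confirmed fix, is to point tika-python at a locally downloaded jar through a file:// URL instead of a plain path, and to do so before tika or resume_parser is imported. TIKA_SERVER_JAR and TIKA_PATH are environment variables described in the tika-python README; the helper names and the jar location below are hypothetical, and keeping a matching .md5 file next to the jar is an assumption about how the checksum lookup behaves.

```python
import os
from pathlib import Path


def tika_env_for_local_jar(jar_path):
    """Build environment settings that refer to a local Tika server jar.

    Hypothetical helper: it only computes values and does not import tika.
    """
    # abspath also works for files that do not exist yet, on every platform.
    jar = Path(os.path.abspath(os.path.expanduser(jar_path)))
    return {
        # A file:// URL, since the variable is documented as a URL to the jar.
        "TIKA_SERVER_JAR": jar.as_uri(),
        # Directory tika-python should use for its cached server files.
        "TIKA_PATH": str(jar.parent),
    }


def apply_tika_env(jar_path):
    """Export the settings; call this BEFORE importing tika or resume_parser."""
    env = tika_env_for_local_jar(jar_path)
    os.environ.update(env)
    return env
```

A possible way to use it: call apply_tika_env(r"C:\tika\tika-server-1.24.jar") (an example path, not one from the question) at the very top of the script, and only then import resume_parser. If the connection is still refused afterwards, the refusal may be coming from the local Tika server port rather than from the download, so it would be worth checking that Java is installed and that nothing is blocking localhost.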
