How do you make sure the Stanza CoreNLPClient has a viable endpoint?

Problem description

I want to use the Stanza CoreNLPClient to extract noun phrases, similar to this method.

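However, I can't seem to find a good port to start the server on. The default is 9000, but that port is often occupied, as the error message shows: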

PermanentlyFailedException: Error: unable to start the CoreNLP server on port 9000 (possibly something is already running there)

EDIT: Port 9000 is in use by python.exe, which is why I can't simply shut that process down to make room for the CoreNLPClient.

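Then, when I choose other ports such as 7999, 8000 or 8080, the server keeps listening indefinitely, never executes the subsequent lines of code, and shows only the following: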

2021-07-19 12:05:55 INFO: Starting server with command: java -Xmx8G -cp C:\Users\timjo\stanza_corenlP* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 7998 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-2e15724b8064491b.props -preload -outputFormat serialized

I have the latest version of stanza installed and am running the following code from an .ipynb file in VS Code:

from stanza.server import CoreNLPClient

# sample sentence
sentence = "Albert Einstein was a German-born theoretical physicist."

# start the client as indicated in the docs
with CoreNLPClient(properties='corenlp_server-2e15724b8064491b.props', endpoint='https://localhost:7998', memory='8G', be_quiet=True) as client:
    matches = client.tregex(text=sentence, pattern='NP')

# extract the noun phrases and their character offsets;
# each entry of matches['sentences'] is a dict keyed by match id
noun_phrases = [
    [match['spanString'], match['characterOffsetBegin'], match['characterOffsetEnd']]
    for sent in matches['sentences']
    for match in sent.values()
]

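Main question: how do I make sure the server starts on an open port and shuts down afterwards? I would prefer a semi-automatic way of finding an open port (or freeing an occupied one) for the client to run on.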

Solution

After 2 hours of research, I now know the following:

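Using port 9000 is not an option, because it is in use by python. Anecdotal evidence suggests this has to do with running a Jupyter notebook rather than a "regular" Python .py file.
As for the client not shutting down when using other endpoints: I should simply have used 'http://localhost:port' instead of 'https://...'.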
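
Both points can be checked before the client is started. The following is a minimal sketch using only the standard library: port_is_free and local_endpoint are hypothetical helper names (not part of stanza), and the check is inherently racy, since another process could take the port between the check and the server start.

```python
import socket


def port_is_free(port, host="localhost"):
    """Return True if nothing is currently bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # The bind failed, so something else already holds this port.
            return False
    return True


def local_endpoint(port):
    """Build a plain-HTTP endpoint string (http, not https)."""
    return f"http://localhost:{port}"


# Hypothetical usage: check the default port before handing it to the client.
port = 9000
if port_is_free(port):
    print("free:", local_endpoint(port))
else:
    print(f"port {port} is already in use")
```

The string returned by local_endpoint can then be passed as the endpoint argument of CoreNLPClient.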

I hope this helps others who run into this problem. I guess this is my non-computer-science background seeping through.

(Edited to fix typos)

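Generally speaking, it is enough to pick another number that nobody else is using; maybe 9017? There are lots of numbers to choose from! A more careful option is to create the CoreNLPClient inside a while loop with try/except, incrementing the port number until you find an open one.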
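
A rough sketch of that retry loop is shown here. start_on_first_free_port, fake_start and PortTaken are hypothetical names for illustration only. The loop takes a caller-supplied function that starts a client for a given endpoint, so the demo runs without stanza or a CoreNLP install. The commented-out stanza usage assumes that CoreNLPClient.start() raises PermanentlyFailedException when the port is occupied and that the exception can be imported from stanza.server.client; check both against your installed version.

```python
def start_on_first_free_port(start_client, first_port=9000, max_tries=20,
                             retry_on=(Exception,)):
    """Call start_client(endpoint) on successive ports until one succeeds.

    start_client is supplied by the caller: it starts and returns a client for
    the given endpoint and raises one of retry_on if the port is taken.
    Returns a (client, port) pair.
    """
    port = first_port
    last_error = None
    while port < first_port + max_tries:
        endpoint = f"http://localhost:{port}"  # http, not https
        try:
            return start_client(endpoint), port
        except retry_on as err:
            last_error = err
            port += 1
    raise RuntimeError(
        f"no free port in {first_port}..{first_port + max_tries - 1}"
    ) from last_error


# Stand-in for a real client so the sketch runs anywhere:
# it pretends that ports 9000 and 9001 are occupied.
class PortTaken(Exception):
    pass


def fake_start(endpoint):
    port = int(endpoint.rsplit(":", 1)[1])
    if port < 9002:
        raise PortTaken(endpoint)
    return f"client@{endpoint}"


client, port = start_on_first_free_port(fake_start, first_port=9000,
                                        retry_on=(PortTaken,))
print(client, port)  # client@http://localhost:9002 9002

# Assumed usage with stanza (not run here):
#
#   from stanza.server import CoreNLPClient
#   from stanza.server.client import PermanentlyFailedException
#
#   def start_client(endpoint):
#       c = CoreNLPClient(endpoint=endpoint, memory='8G', be_quiet=True)
#       c.start()
#       return c
#
#   client, port = start_on_first_free_port(
#       start_client, retry_on=(PermanentlyFailedException,))
#   try:
#       matches = client.tregex(text=sentence, pattern='NP')
#   finally:
#       client.stop()
```

Capping the number of attempts with max_tries keeps the loop from running forever if the failure is caused by something other than a busy port.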

nlp python-3.7 stanford-nlp stanza