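Question
I understand how to build a corpus and a dfm with quanteda. I also understand how to use spacy_parse to lemmatize a text or a corpus object.
But I don't understand how to replace the original tokens in my corpus with their lemmas.
I was hoping for something like this: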
corpus(my_txt) %>%
  dfm(lemmatize = spacy_parse)
which would produce a matrix of lemmas, for example:
            be have go
first_text   2    6  6
second_text  4    4  2
third_text   6    4  3
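Instead, the only solution I have found is to reassemble the lemmatized text from the "lemma" column of the data frame returned by spacy_parse, using code like this: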
txt_parsed %>%
  select(doc_id, lemma) %>%
  group_by(doc_id) %>%
  summarise(new_txt = str_c(lemma, collapse = " "))
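Any suggestions for a better solution?
Answer
You can use quanteda::as.tokens() to convert the object returned by spacy_parse() into tokens. Before doing so, overwrite the token column of that object with its lemma column.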
txt <- c("I like having to be going.", "Then I will be gone.", "I had him going.")
library("spacyr")
sp <- spacy_parse(txt, lemma = TRUE, entity = FALSE, pos = FALSE)
## Found 'spacy_condaenv'. spacyr will use this environment
## successfully initialized (spaCy Version: 2.3.2, language model: en_core_web_sm)
## (python options: type = "condaenv", value = "spacy_condaenv")
sp$token <- sp$lemma
library("quanteda")
## Package version: 3.0.0
## Unicode version: 10.0
## ICU version: 61.1
## Parallel computing: 12 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
as.tokens(sp) %>%
  dfm()
## Document-feature matrix of: 3 documents, 9 features (37.04% sparse) and 0 docvars.
##        features
## docs    -pron- like have to be go . then will
##   text1      1    1    1  1  1  1 1    0    0
##   text2      1    0    0  0  1  1 1    1    1
##   text3      2    0    1  0  0  1 1    0    0
Created on 2021-04-12 by the reprex package (v2.0.0)
Actually, I found an even simpler solution: pass use_lemma = TRUE to as.tokens(), so the token column does not have to be overwritten by hand.
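A minimal sketch of that simpler approach follows. It assumes a working spaCy installation with an English model that spacyr can find, and it reuses the example sentences from the answer above. The calls are nested rather than piped so that the snippet does not depend on `%>%` being available.

```r
library("spacyr")
library("quanteda")

# Example sentences reused from the answer above
txt <- c("I like having to be going.", "Then I will be gone.", "I had him going.")

# Parse with lemmas; entity and POS tagging are not needed here
sp <- spacy_parse(txt, lemma = TRUE, entity = FALSE, pos = FALSE)

# use_lemma = TRUE tells as.tokens() to build the tokens from the
# lemma column instead of the token column
toks <- as.tokens(sp, use_lemma = TRUE)

# Build the document-feature matrix of lemmas
dfm(toks)
```

Because the parsed data frame is left unmodified, the original tokens remain available in `sp$token` if they are needed later.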