Is the Rasa RegexFeaturizer token-based or sentence-based?

Problem description

- regex: regex features for intent classification
  examples: |
    - \bon road pric/i
    - \bonroad pric/i

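I have tested the regular expressions above and they work as expected, so I am confident the regexes themselves are not the problem.

Example: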

training-row-1] Please tell me on road price Now.  
training-row-2] Please tell me price Now.

Based on the regex patterns above, the regex features that should be added are:

training-row-1] Please tell me on road price Now. ==> TRUE (because the regex matches)
training-row-2] Please tell me price Now.         ==> FALSE (the regex doesn't match)

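My question is: in the RegexFeaturizer, does regex matching happen against the whole sentence or against each token? Matching against the whole sentence would make sense.

Is the featurization I assumed above correct?

Solution

I found the following docstring in the RegexFeaturizer code.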

"""
Given a sentence, returns a vector of {1,0} values indicating which
regexes did match. Furthermore, if the message is tokenized, the
function will mark all tokens with a dict relating the name of the
regex to whether it was matched.
"""

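So I believe the whole sentence is taken as input. It is hard to look inside the feature space in Rasa, but I have confirmed that the correct entities are picked up across tokens when using RegexEntityExtractor. This is easy to verify by temporarily adding an entity example to your NLU data (make sure it appears at least twice in an intent) and running rasa interactive.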
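To illustrate what the docstring describes, here is a minimal sketch in plain Python using only the standard re module. It is not Rasa's actual implementation: the names PATTERNS, whitespace_tokens and regex_features are hypothetical, the tokenizer is a simple whitespace split, and re.IGNORECASE is assumed to stand in for the "/i" suffix in the question. It searches each pattern over the whole sentence and then marks every token that overlaps a match.

```python
import re

# Patterns from the question. re.IGNORECASE is an assumption standing in
# for the "/i" suffix; the dictionary keys are hypothetical names.
PATTERNS = {
    "on_road_price": re.compile(r"\bon road pric", re.IGNORECASE),
    "onroad_price": re.compile(r"\bonroad pric", re.IGNORECASE),
}


def whitespace_tokens(text):
    """Return (token, start, end) triples for whitespace-separated tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def regex_features(text):
    """Search each pattern over the whole sentence, then mark overlapping tokens."""
    tokens = whitespace_tokens(text)
    sentence_vector = []                    # one 1/0 entry per pattern
    token_flags = [{} for _ in tokens]      # per token: pattern name -> bool
    for name, pattern in PATTERNS.items():
        matches = list(pattern.finditer(text))
        sentence_vector.append(1 if matches else 0)
        for i, (_, start, end) in enumerate(tokens):
            # A token is marked when its character span overlaps any match span.
            token_flags[i][name] = any(
                start < m.end() and end > m.start() for m in matches
            )
    return sentence_vector, tokens, token_flags


row1 = regex_features("Please tell me on road price Now.")
row2 = regex_features("Please tell me price Now.")
```

In this sketch the pattern spans three tokens and still matches, because the search runs over the sentence text rather than over each token separately; a purely per-token search could never match "on road pric".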

rasa rasa-nlu
