Problem description
- regex: regex features for intent classification
  examples: |
    - \bon road pric/i
    - \bonroad pric/i
I have tested the regular expressions above and they work, so I am confident that the regexes themselves are not the problem.
Example:
training-row-1] Please tell me on road price Now.
training-row-2] Please tell me price Now.
training-row-1] Please tell me on road price Now. ==> TRUE (because regex match)
training-row-2] Please tell me price Now. ==> FALSE (regex doesn't match)
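My question is: in RegexFeaturizer, does regex matching happen on the whole sentence or on each token? Applying it to the whole sentence would make sense.
Is the featurization I assumed above correct?
Solution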
I found the following docstring in the RegexFeaturizer code.
"""
Given a sentence, returns a vector of {1,0} values indicating which
regexes did match. Furthermore, if the message is tokenized, the
function will mark all tokens with a dict relating the name of the
regex to whether it was matched.
"""
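So I believe the whole sentence is taken as input. It is hard to see inside the feature space in Rasa, but I have confirmed that when using RegexEntityExtractor the correct entity was picked up across tokens. This is easy to verify: temporarily add an entity example to the NLU data (make sure it appears at least twice in an intent) and run rasa interactive.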
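The behaviour described in the docstring can be illustrated with a small standalone sketch. This is not Rasa's actual code: the helper names (whitespace_tokens, sentence_features, token_features, per_token_only) and the pattern names are hypothetical, and the tokenizer is a plain whitespace split. The sketch also drops the trailing /i from the patterns and passes re.IGNORECASE instead, since Python's re module has no slash-delimited flag syntax. It contrasts searching the whole sentence with searching each token in isolation, which is why a pattern containing a space can only match in the first case.

```python
import re

# Patterns from the question. Assumption: the trailing "/i" is replaced by
# re.IGNORECASE, and the dictionary keys are made-up names for illustration.
PATTERNS = {
    "on_road_price": re.compile(r"\bon road pric", re.IGNORECASE),
    "onroad_price": re.compile(r"\bonroad pric", re.IGNORECASE),
}


def whitespace_tokens(text):
    """Return (token, start, end) triples for whitespace-separated tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def sentence_features(text):
    """One 0/1 value per pattern, searched over the whole sentence."""
    return [1 if p.search(text) else 0 for p in PATTERNS.values()]


def token_features(text):
    """Mark each token whose character span overlaps a sentence-level match."""
    marked = []
    for token, start, end in whitespace_tokens(text):
        flags = {
            name: any(m.start() < end and m.end() > start for m in p.finditer(text))
            for name, p in PATTERNS.items()
        }
        marked.append((token, flags))
    return marked


def per_token_only(text):
    """Contrast case: search every token in isolation, never the full sentence."""
    tokens = [tok for tok, _, _ in whitespace_tokens(text)]
    return [1 if any(p.search(tok) for tok in tokens) else 0 for p in PATTERNS.values()]


if __name__ == "__main__":
    row_1 = "Please tell me on road price Now."
    row_2 = "Please tell me price Now."
    print(sentence_features(row_1))  # whole-sentence search
    print(sentence_features(row_2))
    print(per_token_only(row_1))     # token-by-token search
    print([tok for tok, flags in token_features(row_1) if flags["on_road_price"]])
```

With sentence-level matching, the first training row fires the first pattern and the second row fires nothing, which is the featurization assumed in the question. Searching token by token could never match a pattern that spans a space, so the per-token contrast returns all zeros for the same row.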