How can I change this ngrams_practice function to return any n-gram instead of only bigrams?

Problem description

import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams 
from nltk.lm.preprocessing import pad_both_ends
from nltk.util import bigrams

input1 = [['A','B','C','D','E'],['D','E'],['A','D']]

def ngrams_practice(n,input1):    
    test_ngrams = []
    for i in range(len(input1)-n+1):
        test_ngrams2 = list(bigrams(pad_both_ends(input1[i],n)))
        test_ngrams.append(test_ngrams2)
    return test_ngrams

ngrams_practice(2,input1)

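For n = 2, I need the output to look like the one shown below. I could only get the function to return output like this once I used "bigrams" in the code. I need it to work the same way for any value of n, so not only bigrams but also trigrams and so on. As written, it works well only for n = 2. Any suggestions?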

Output:
[[('<s>','A'),('A','B'),('B','C'),('C','D'),('D','E'),('E','</s>')],[('<s>','</s>')]]

Solution
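
One possible fix, offered as a sketch rather than a confirmed answer: bigrams always yields pairs, so in the original function n only affects the padding. Swapping bigrams for nltk.util.ngrams (which the question already imports) and passing n through should generalize it. The sketch also loops over every sentence instead of range(len(input1)-n+1), on the assumption that the original bound was unintentional, since it skips trailing sentences as n grows. The fallback definitions in the except branch are stand-ins added here only so the example runs without NLTK installed; they are not part of the library.

```python
from itertools import chain, repeat

try:
    # Preferred: the NLTK helpers the question already imports.
    from nltk.util import ngrams
    from nltk.lm.preprocessing import pad_both_ends
except ImportError:
    # Hypothetical stand-ins (not NLTK code) so the sketch runs without NLTK.
    def pad_both_ends(sequence, n):
        # Add n-1 start symbols and n-1 end symbols around the sequence.
        return chain(repeat("<s>", n - 1), sequence, repeat("</s>", n - 1))

    def ngrams(sequence, n):
        # Slide a window of width n over the tokens and yield tuples.
        tokens = list(sequence)
        return zip(*(tokens[i:] for i in range(n)))


def ngrams_practice(n, sentences):
    """Return one list of padded n-grams per sentence."""
    result = []
    for sentence in sentences:  # visit every sentence, regardless of n
        padded = pad_both_ends(sentence, n)
        result.append(list(ngrams(padded, n)))
    return result


input1 = [['A', 'B', 'C', 'D', 'E'], ['D', 'E'], ['A', 'D']]
trigrams_per_sentence = ngrams_practice(3, input1)
```

With n = 3, the sentence ['A','D'] becomes [('<s>','<s>','A'), ('<s>','A','D'), ('A','D','</s>'), ('D','</s>','</s>')], because the padding adds n-1 symbols on each side.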


n-gram nltk python
