Storing word counts in a Python trie

Problem description

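I took a word list and put it into a trie. I'd also like to store the word counts in it for further analysis. What is the best way to do that? I think this is the class where the frequencies should be collected and stored, but I'm not sure how to go about it. You can see my attempt below: the last line of insert is where I try to store the count.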

class TrieNode:
    def __init__(self,k):
        self.v = 0
        self.k = k
        self.children = {}
    def all_words(self,prefix):
        if self.end:
            yield prefix
        for letter,child in self.children.items():
            yield from child.all_words(prefix + letter)
class Trie:
    def __init__(self):
        self.root = TrieNode()
    
    def insert(self,word):
        curr = self.root
        for letter in word:
            node = curr.children.get(letter)
            if not node:
                node = TrieNode()
                curr.children[letter] = node
            curr.v += 1

    def insert_many(self,words):
        for word in words:
            self.insert(word)
    def all_words_beginning_with_prefix(self,prefix):
        cur = self.root
        for c in prefix:
            cur = cur.children.get(c)
            if cur is None:
                return  # No words with given prefix
        yield from cur.all_words(prefix)


I want to store the counts so that when I call

print(list(trie.all_words_beginning_with_prefix('prefix')))

I get a result like this:

[(word, count), (word, count)]

Solution

During insertion, every node you pass through means one more word is being added along that path, so increment that node's word_count.

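class TrieNode:
    def __init__(self, char):
        self.char = char
        self.word_count = 0  # updated by Trie.insert as words pass through
        self.children = {}   # letter -> TrieNode

    def all_words(self, prefix, path):
        # Only leaf nodes are reported, so a word that is also a prefix of a
        # longer word (e.g. "he" alongside "hello") is not listed.
        if len(self.children) == 0:
            yield prefix + path
        for letter, child in self.children.items():
            yield from child.all_words(prefix, path + letter)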


class Trie:
    def __init__(self):
        self.root = TrieNode('')

    def insert(self, word):
        curr = self.root
        curr.word_count += 1  # every word passes through the root
        for letter in word:
            node = curr.children.get(letter)
            if node is None:
                node = TrieNode(letter)
                curr.children[letter] = node
            curr = node
            # Count after stepping down, so the node for the last letter is
            # counted too and word_count("hello") is 1 rather than 0.
            curr.word_count += 1

    def insert_many(self, words):
        for word in words:
            self.insert(word)

    def all_words_beginning_with_prefix(self, prefix):
        cur = self.root
        for c in prefix:
            cur = cur.children.get(c)
            if cur is None:
                return  # No words with given prefix
        yield from cur.all_words(prefix, path="")

    def word_count(self, prefix):
        cur = self.root
        for c in prefix:
            cur = cur.children.get(c)
            if cur is None:
                return 0
        return cur.word_count


trie = Trie()
trie.insert_many(["hello", "hi", "random", "heap"])

prefix = "he"
words = list(trie.all_words_beginning_with_prefix(prefix))

print("Lazy method:\n Prefix: %s, Words: %s, Count: %d" % (prefix, words, len(words)))
print("Proactive method:\n Word count for '%s': %d" % (prefix, trie.word_count(prefix)))

Output:

Lazy method:
 Prefix: he, Words: ['hello', 'heap'], Count: 2
Proactive method:
 Word count for 'he': 2

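I would add a field called is_word to the trie node, where is_word is true only for the last letter of a word. For example, if you have the word AND, is_word is true for the trie node holding the letter D. I would then update the frequency only on nodes where is_word is true, not on every letter of the word.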

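So when you iterate from a letter, check whether it is a word; if it is, stop iterating and return the count and the word. I assume that in your iteration you keep track of the letters and append them to the prefix.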
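
A minimal sketch of the is_word idea, in case it helps. The attribute name count and the sample words are my own assumptions, not something from the question. Note that this sketch keeps descending after it finds a word instead of stopping, on the assumption that longer words sharing the prefix (such as "hello" after "he") should be listed as well.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True only on the node holding a word's last letter
        self.count = 0        # hypothetical name: times this exact word was inserted

    def all_words(self, prefix):
        # Report this node if a word ends here, then keep descending so that
        # longer words sharing the prefix are found as well.
        if self.is_word:
            yield prefix, self.count
        for letter, child in self.children.items():
            yield from child.all_words(prefix + letter)


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        curr = self.root
        for letter in word:
            node = curr.children.get(letter)
            if node is None:
                node = TrieNode()
                curr.children[letter] = node
            curr = node
        # Only the terminal node is marked and counted.
        curr.is_word = True
        curr.count += 1

    def all_words_beginning_with_prefix(self, prefix):
        curr = self.root
        for letter in prefix:
            curr = curr.children.get(letter)
            if curr is None:
                return  # no words with the given prefix
        yield from curr.all_words(prefix)


trie = Trie()
for word in ["he", "hello", "heap", "hello", "hi"]:
    trie.insert(word)

result = list(trie.all_words_beginning_with_prefix("he"))
print(result)
```

Because the count lives only on terminal nodes, inserting the same word twice raises its frequency without touching the counts of other words that share its prefix, and the generator produces the (word, count) pairs asked for in the question.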

Your trie is a multi-way trie.