How do I fix my TF-IDF vocabulary error?

Problem description

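I fitted a TfidfVectorizer from sklearn on my training data. When I apply the learned vocabulary to new data, I get a KeyError, because the new data contains tokens that were not seen during fitting. How can I fix this?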

Here is my code:

    def feature_engineering(self, inputs):
        x = [self.analyser(seq) for seq in inputs]
        return x

    def fit(self,inputs):
        if self.vocabulary and self.analyser:
            pass
        else:
            vectorizer = TfidfVectorizer(
                ngram_range=(self.config_dict["min_n_gram"], self.config_dict["max_n_gram"]),
                lowercase=False,
                stop_words=None,
                min_df=2,
            )
            vectorizer.fit(inputs)
            self.analyser = vectorizer.build_analyzer()
            self.vocabulary = vectorizer.vocabulary_
            save_object(os.path.join(self.feature_extraction_folder,"analyzer.pickle"),self.analyser)
            save_object(os.path.join(self.feature_extraction_folder,"vocabulary.pickle"),self.vocabulary)

    def transform(self,inputs):
        vocab_size = len(self.vocabulary)
        inputs = self.feature_engineering(inputs)
        inputs = [[self.vocabulary[x] for x in l] for l in inputs]  # This line raises the KeyError

        return np.array(inputs)

Solution

I solved it by adding an if condition that skips tokens missing from the vocabulary:

    inputs = [[self.vocabulary[x] for x in l if x in self.vocabulary] for l in inputs]
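
To see the fix in isolation, here is a minimal standalone sketch. The toy documents and the helper name `tokens_to_ids` are made up for illustration and are not part of the original class. It fits a vectorizer with the same `lowercase=False, stop_words=None, min_df=2` settings, then maps new documents to vocabulary indices while skipping unseen tokens.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data, assumed purely for illustration.
train_docs = ["the cat sat", "the cat ran", "a dog sat"]
new_docs = ["the cat barked", "unknown words only"]

vectorizer = TfidfVectorizer(lowercase=False, stop_words=None, min_df=2)
vectorizer.fit(train_docs)

analyser = vectorizer.build_analyzer()
vocabulary = vectorizer.vocabulary_  # dict: token -> column index


def tokens_to_ids(docs, analyser, vocabulary):
    """Map each document to vocabulary indices, skipping unseen tokens.

    Hypothetical helper: it mirrors the list comprehension in the answer.
    """
    return [
        [vocabulary[tok] for tok in analyser(doc) if tok in vocabulary]
        for doc in docs
    ]


ids = tokens_to_ids(new_docs, analyser, vocabulary)

# "barked" was never seen, so only "the" and "cat" survive in the first
# document; the second document contains no known tokens at all.
print(ids)

# The vectorizer's own transform() also ignores out-of-vocabulary tokens.
matrix = vectorizer.transform(new_docs)
print(matrix.shape)
```

Note that after filtering, the inner lists can have different lengths (even zero), so wrapping them in `np.array(...)` as in `transform` above may need padding or `dtype=object`; recent NumPy versions refuse to build a regular array from ragged lists. If you only need the TF-IDF matrix and not the raw index sequences, calling `vectorizer.transform` directly avoids the manual lookup entirely.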