What does max_features do in CountVectorizer, and how does CountVectorizer work?

Problem description

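I would like to know what the max_features parameter of CountVectorizer does. I looked for examples but could not find any.
I would also like to understand how CountVectorizer works. preprocessed_reviews1 is a list, and preprocessed_reviews1[0] gives the output "available product victor traps unreal course total fly genocide nearby stinky". But when I call count_vect.get_feature_names()[0], the output is "ability". I expected the output to be "product". Why is there a difference?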

This code is credited to AAIC.

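    import re
    from bs4 import BeautifulSoup
    from tqdm import tqdm
    from sklearn.feature_extraction.text import CountVectorizer

    # `final`, `decontracted` and `stopwords` are defined elsewhere in the original code.
    preprocessed_reviews1 = []
    for sentance1 in tqdm(final["Summary"].values):
        sentance1 = re.sub(r"http\S+", "", sentance1)            # drop URLs
        sentance1 = BeautifulSoup(sentance1, "lxml").get_text()  # strip HTML tags
        sentance1 = decontracted(sentance1)                      # expand contractions
        sentance1 = re.sub(r"\S*\d\S*", " ", sentance1)          # drop tokens containing digits
        sentance1 = re.sub('[^A-Za-z]+', ' ', sentance1)         # keep letters only
        # lowercase, remove stopwords, and rejoin the words with single spaces
        sentance1 = ' '.join(e1.lower() for e1 in sentance1.split() if e1.lower() not in stopwords)
        preprocessed_reviews1.append(sentance1.strip())

    # unigrams and bigrams that occur in at least 10 reviews, capped at 5000 features
    count_vect = CountVectorizer(ngram_range=(1, 2), min_df=10, max_features=5000)
    final_bigram_counts = count_vect.fit_transform(preprocessed_reviews1)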

Solution

No verified solution has been posted for this question yet.
