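Problem description
I am having trouble adding an extra feature to the input of a vectorizer. I have text content and a page count for each document, and I am using sklearn's ColumnTransformer to add the page count to the vectorizer input:
training_content = pd.DataFrame({'text': training_text, 'pages': training_pages})
The text content and the page counts have the same length:
19872 19872
The resulting DataFrame has this shape:
(19872, 2)
I then use ColumnTransformer to build a pipeline for feature preprocessing: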
pipe = ColumnTransformer(
    [('text', TfidfVectorizer(tokenizer=remove_strings_smaller_three_chars_tokenizer, ngram_range=(1, ngram)), ['text'])],
    remainder=MinMaxScaler(),
)
pipe = pipe.fit(training_content)
But I get this error:
Traceback (most recent call last):
  File "test_clfs.py", line 336, in <module>
    pipe = pipe.fit(training_content)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 494, in fit
    self.fit_transform(X, y=y)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 553, in fit_transform
    return self._hstack(list(Xs))
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 639, in _hstack
    return np.hstack(Xs)
  File "<__array_function__ internals>", line 6, in hstack
  File "/root/semantic_env/lib/python3.7/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 19872
Solution
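The "size 1" in the error suggests that the text transformer produced a single row. A likely cause is the column selector: passing the list ['text'] makes ColumnTransformer hand the vectorizer a 2-D DataFrame, and iterating over a DataFrame yields its column names, so TfidfVectorizer sees one "document" (the string 'text'). Passing the scalar string 'text' hands it a 1-D Series of documents instead. The sketch below illustrates this on a small made-up DataFrame; the sample rows are hypothetical, and the custom tokenizer and ngram_range from the question are left out for brevity.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data; the question's real corpus has 19872 rows.
training_content = pd.DataFrame({
    'text': ['first sample document', 'second sample document', 'third one here'],
    'pages': [1, 5, 12],
})

# Selecting with the list ['text'] gives the vectorizer a 2-D DataFrame.
# Iterating a DataFrame yields column names, so only the single string
# 'text' is vectorized and the result has one row.
as_frame = TfidfVectorizer().fit_transform(training_content[['text']])
print(as_frame.shape[0])

# Selecting with the scalar string 'text' gives the vectorizer a 1-D Series,
# so every row is treated as a document. The remaining 'pages' column is
# still passed to MinMaxScaler as a 2-D block, which is what it expects.
pipe = ColumnTransformer(
    [('text', TfidfVectorizer(), 'text')],
    remainder=MinMaxScaler(),
)
features = pipe.fit_transform(training_content)
print(features.shape[0])
```

In the original pipeline the corresponding change would be replacing ['text'] with 'text' in the transformer tuple, keeping the tokenizer and ngram_range arguments as they are.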