Adding a feature to vectorizer output with ColumnTransformer raises a dimension error on fit

Problem description

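I'm having trouble adding an extra feature to my vectorized text. I have the text content and a page count for each document, and I'm using sklearn's ColumnTransformer to combine the page count with the vectorizer's features: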

training_content = pd.DataFrame({'text': training_text, 'pages': training_pages})

The text content and the page counts have the same length:

19872 19872

The resulting DataFrame has this shape:

(19872,2)

I then use ColumnTransformer to build the feature-preprocessing pipeline:

pipe = ColumnTransformer([('text', TfidfVectorizer(tokenizer=remove_strings_smaller_three_chars_tokenizer, ngram_range=(1, ngram)), ['text'])], remainder=MinMaxScaler())

pipe = pipe.fit(training_content)

But I get this error:

Traceback (most recent call last):
  File "test_clfs.py", line 336, in <module>
    pipe = pipe.fit(training_content)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 494, in fit
    self.fit_transform(X, y=y)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 553, in fit_transform
    return self._hstack(list(Xs))
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 639, in _hstack
    return np.hstack(Xs)
  File "<__array_function__ internals>", line 6, in hstack
  File "/root/semantic_env/lib/python3.7/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 19872
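
A likely cause, judging from the "size 1" in the traceback: when the column is given as a list (['text']), ColumnTransformer hands the vectorizer a 2-D one-column DataFrame. TfidfVectorizer expects a 1-D iterable of strings, and iterating over a DataFrame yields its column labels, so the vectorizer sees a single "document" and returns one row, which cannot be stacked with the 19872 scaled page counts. Passing the column name as a plain string ('text') selects a 1-D Series instead. The sketch below illustrates this on toy data; the sample texts and page counts are made up, and the custom tokenizer and ngram_range from the question are left out only to keep the example self-contained.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Toy stand-ins for training_text / training_pages (hypothetical data).
training_content = pd.DataFrame({
    "text": ["first sample document", "second sample page", "third document here"],
    "pages": [3, 10, 7],
})

# Cause: a 2-D selection is iterated column by column, so the vectorizer
# receives the column label "text" as its only document.
rows_from_2d = TfidfVectorizer().fit_transform(training_content[["text"]]).shape[0]
print(rows_from_2d)  # one row, regardless of how many rows the DataFrame has

# Fix: give the text column as a scalar string so a 1-D Series is passed.
# The remainder ("pages") is still passed as 2-D, which MinMaxScaler expects.
pipe = ColumnTransformer(
    [("text", TfidfVectorizer(), "text")],
    remainder=MinMaxScaler(),
)
features = pipe.fit_transform(training_content)
print(features.shape)  # (n_documents, n_tfidf_terms + 1)
```

In the pipeline from the question, this would amount to replacing ['text'] with 'text' and leaving the tokenizer, ngram_range and remainder arguments as they are.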

