Problem description
I'm new to fastText and have read the tutorial: https://fasttext.cc/docs/en/supervised-tutorial.html.
I downloaded the sample data and noticed that the labels are strings.
$ head cooking.stackexchange.txt
__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?
__label__restaurant Michelin Three Star Restaurant; but if the chef is not there
__label__knife-skills __label__dicing Without knife skills,how can I quickly and accurately dice vegetables?
__label__storage-method __label__equipment __label__bread What's the purpose of a bread Box?
__label__baking __label__food-safety __label__substitutions __label__peanuts how to seperate peanut oil from roasted peanuts at home?
__label__chocolate American equivalent for British chocolate terms
__label__baking __label__oven __label__convection Fan bake vs bake
__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise Regulation and balancing of readymade packed mayonnaise and other sauces
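In this format, a label is just a whitespace-separated token that starts with the prefix `__label__` (fastText's default label prefix), and every other token is part of the text. The plain-Python sketch below is an illustration, not fastText's own parser; `parse_line` is a hypothetical helper. It splits one of the sample lines above into its labels and words:

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix


def parse_line(line):
    """Split a fastText-format line into (labels, words).

    Tokens starting with LABEL_PREFIX are labels (returned with the prefix
    stripped); every other token is treated as part of the text.
    """
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, words


line = "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
labels, words = parse_line(line)
print(labels)           # ['sauce', 'cheese']
print(" ".join(words))  # How much does potato starch affect a cheese sauce recipe?
```

Note that most lines carry more than one label, so each example is a multi-label sample.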
Here is the training and testing code from the tutorial:
>>> model = fasttext.train_supervised(input="cooking.train", lr=1.0)
Read 0M words
Number of words: 9012
Number of labels: 734
Progress: 100.0% words/sec/thread: 81469 lr: 0.000000 loss: 6.405640 eta: 0h0m
>>> model.test("cooking.valid")
(3000L, 0.563, 0.245)
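According to the tutorial, the tuple returned by `model.test` is (number of examples, precision at 1, recall at 1). Because each question can have several true labels, precision and recall differ. The sketch below shows how these two numbers can be computed from predicted and gold label sets; it follows the usual precision@k/recall@k definitions as I understand fastText to use them, `precision_recall_at_k` is a hypothetical helper, and the toy data is made up rather than taken from cooking.valid:

```python
def precision_recall_at_k(predictions, gold):
    """Compute (n_examples, precision, recall) like model.test's output.

    predictions: list of lists, the top-k predicted labels per example.
    gold: list of sets, the true labels per example (aligned with predictions).
    precision = correct predictions / all predictions made
    recall    = correct predictions / all gold labels
    """
    n_predicted = sum(len(p) for p in predictions)
    n_gold = sum(len(g) for g in gold)
    n_correct = sum(
        sum(1 for label in p if label in g)
        for p, g in zip(predictions, gold)
    )
    return len(gold), n_correct / n_predicted, n_correct / n_gold


# Toy example with k=1: one predicted label per question.
preds = [["sauce"], ["baking"], ["knife-skills"]]
gold = [{"sauce", "cheese"}, {"food-safety", "acidity"}, {"knife-skills", "dicing"}]
print(precision_recall_at_k(preds, gold))  # (3, 0.6666666666666666, 0.3333333333333333)
```

With two gold labels per question and only one prediction each, recall is capped at 0.5 here, which is why recall at 1 is usually much lower than precision at 1 on this dataset.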
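My question is: why are the labels not encoded first (for example with sklearn's LabelEncoder)? I ran the example and it works fine, so I'm confused.
[Update]--------
IMO, the code should look like this: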
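One way to think about this: while reading the training file, fastText builds its own dictionary in which every distinct `__label__...` token gets an integer index, so string labels are mapped to ids internally and mapped back to strings when predicting (in the Python bindings, the learned labels can be listed with `model.get_labels()`). The sketch below imitates that idea in plain Python. It is not fastText's actual code; `build_label_index` is a hypothetical helper, and ordering labels by descending frequency is an assumption loosely modelled on how fastText sorts its dictionary:

```python
from collections import Counter

LABEL_PREFIX = "__label__"


def build_label_index(lines):
    """Assign an integer id to every distinct label token, most frequent first.

    Ties keep first-appearance order (Counter.most_common is stable for equal
    counts). Returns (label_to_id, id_to_label).
    """
    counts = Counter(
        token
        for line in lines
        for token in line.split()
        if token.startswith(LABEL_PREFIX)
    )
    id_to_label = [label for label, _ in counts.most_common()]
    label_to_id = {label: i for i, label in enumerate(id_to_label)}
    return label_to_id, id_to_label


sample = [
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?",
    "__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments",
    "__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise Regulation and balancing of readymade packed mayonnaise and other sauces",
]
label_to_id, id_to_label = build_label_index(sample)
print(label_to_id["__label__sauce"])  # 0
print(id_to_label[1])                 # __label__acidity
```

This plays the same role as LabelEncoder, which is why the tutorial can feed string labels directly.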
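import fasttext
from sklearn import preprocessing
# load_dataset() stands for the asker's own loading code (not shown in the post)
texts_train, labels_train = load_dataset()
# Map each string label to an integer id in 0..n_classes-1
label_encoder = preprocessing.LabelEncoder()
labels_train = label_encoder.fit_transform(labels_train)
# Write one example per line, using the integer id as the fastText label
with open('cooking.train.2', 'w') as f:
    for text, label in zip(texts_train, labels_train):
        f.write('%s __label__%d\n' % (text, label))
model = fasttext.train_supervised('cooking.train.2', lr=1.0)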
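Two caveats on the code above: sklearn's `LabelEncoder` expects one label per sample, while most lines in cooking.stackexchange.txt carry several labels, and integer ids are only useful if the mapping is kept to decode predictions. Since fastText treats a label as an opaque token, renaming `__label__sauce` to `__label__0` should not, in principle, change what the model learns; it only adds a decoding step. A stdlib-only sketch that encodes every label on a line and maps predicted tokens back (`encode_lines` and `decode_label` are hypothetical helpers written for this illustration):

```python
LABEL_PREFIX = "__label__"


def encode_lines(lines):
    """Replace every __label__<name> token with __label__<integer id>.

    Unlike LabelEncoder on a 1-D array, this handles several labels per line.
    Ids are assigned in order of first appearance.
    Returns (encoded_lines, id_to_name).
    """
    name_to_id, id_to_name = {}, []
    encoded = []
    for line in lines:
        tokens = []
        for token in line.split():
            if token.startswith(LABEL_PREFIX):
                name = token[len(LABEL_PREFIX):]
                if name not in name_to_id:
                    name_to_id[name] = len(id_to_name)
                    id_to_name.append(name)
                token = LABEL_PREFIX + str(name_to_id[name])
            tokens.append(token)
        encoded.append(" ".join(tokens))
    return encoded, id_to_name


def decode_label(predicted, id_to_name):
    """Map a predicted token such as '__label__1' back to its original name."""
    return id_to_name[int(predicted[len(LABEL_PREFIX):])]


lines = [
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?",
    "__label__baking __label__oven __label__convection Fan bake vs bake",
]
encoded, id_to_name = encode_lines(lines)
print(encoded[1])                              # __label__2 __label__3 __label__4 Fan bake vs bake
print(decode_label("__label__1", id_to_name))  # cheese
```

The encoded lines could be written to a file and passed to `fasttext.train_supervised` as in the question; labels returned by `model.predict` would then be `__label__<id>` strings, which `decode_label` turns back into names.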