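Problem description
For my LSTM model I get the following error. The data has three columns (Sentence, value and label).
"ValueError: Data cardinality is ambiguous:
x sizes: 720
y sizes: 89
Please provide data which shares the same first dimension."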
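The message comes from the sample-count check Keras runs before training or evaluating: x and y must have the same first dimension, i.e. one target row per input sample. The function below is a hypothetical pure-Python illustration of that rule, not Keras's actual implementation; `check_cardinality` is a name made up for this sketch.

```python
def check_cardinality(x, y):
    """Hypothetical illustration of the sample-count check (not Keras code).

    x and y are sequences whose first dimension is the number of samples.
    """
    x_size, y_size = len(x), len(y)
    if x_size != y_size:
        raise ValueError(
            "Data cardinality is ambiguous:\n"
            f"  x sizes: {x_size}\n"
            f"  y sizes: {y_size}\n"
            "Please provide data which shares the same first dimension."
        )
    return x_size


# 720 padded sequences vs. 89 target rows, as in the error above
x = [[0] * 200 for _ in range(720)]
y = [[0.0, 0] for _ in range(89)]
try:
    check_cardinality(x, y)
except ValueError as err:
    print(err)

print(check_cardinality(x[:89], y))  # 89: equal first dimensions pass
```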
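from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Flatten, Dense
# df: a pandas DataFrame with 'Sentence', 'value' and 'label' columns, loaded elsewhere
### Create sequence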
vocab_size = 20000
tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(df['Sentence'])
sequences = tokenizer.texts_to_sequences(df['Sentence'])
data = pad_sequences(sequences,maxlen=100)
le = LabelEncoder()
df['label'] = le.fit_transform(df['label'])
X = df['Sentence']
y = df[['value','label']]
X_train,y_train,X_test,y_test = train_test_split(X,y,test_size = 0.1)
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
vocab_size = len(tokenizer.word_index) + 1
maxlen = 200
X_train = pad_sequences(X_train,padding='post',maxlen=maxlen)
X_test = pad_sequences(X_test,maxlen=maxlen)
#print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
model = Sequential()
model.add(Embedding(vocab_size,128))
model.add(LSTM(128,dropout=0.2,recurrent_dropout=0.2))
model.add(Flatten())
model.add(Dense(2,activation='sigmoid'))
# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
print(model.summary())
model.fit(X_train,epochs=3,batch_size=8,validation_split=0.1)
accr = model.evaluate(X_test,y_test)
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0],accr[1]))
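Separately from the error, X_train is padded with padding='post' while X_test uses pad_sequences' default padding='pre', so the zeros end up on opposite sides at training and test time. The hypothetical helper `pad` below mimics only that padding behaviour (plus the default 'pre' truncation) for a single sequence, to show the difference; passing padding='post' to both pad_sequences calls would keep them consistent.

```python
def pad(seq, maxlen, padding="pre", value=0):
    """Hypothetical one-sequence sketch of pad_sequences' `padding` argument.

    Truncation keeps the last `maxlen` tokens, like the default truncating='pre'.
    """
    seq = list(seq)[-maxlen:] if maxlen > 0 else []
    fill = [value] * (maxlen - len(seq))
    return fill + seq if padding == "pre" else seq + fill


tokens = [5, 8, 2]
print(pad(tokens, 6, padding="post"))  # [5, 8, 2, 0, 0, 0]  like X_train
print(pad(tokens, 6))                  # [0, 0, 0, 5, 8, 2]  like X_test (default)
```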
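Solution
Your data has two target columns ("value" and "label"), while the model ends in a single Dense(2) output, so the targets must arrive as one array with two values per sample.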
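To make the shape requirement concrete: selecting df[['value', 'label']] gives one row per sample and one column per selected field, i.e. shape [num_samples, 2], which lines up with the [batch_size, 2] output of Dense(2). The sketch below uses a hypothetical `shape` helper on plain nested lists (standing in for the DataFrame) to check this.

```python
def shape(nested):
    """Return the dimensions of a rectangular nested list, e.g. [n, 2]."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return dims


# Hypothetical rows standing in for df[['value', 'label']]
rows = [{"value": 0.3, "label": 1}, {"value": 0.7, "label": 0}, {"value": 0.1, "label": 1}]
y = [[r["value"], r["label"]] for r in rows]

dense_units = 2  # the question's final layer: Dense(2, activation='sigmoid')
print(shape(y))                    # [3, 2]
print(shape(y)[1] == dense_units)  # True
```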
This works with the model defined as in the question:
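import tensorflow as tf
# 100 dummy sequences of length 100; integer dtypes require maxval, ids fall in [0, 100)
X_train = tf.random.uniform([100, 100], maxval=100, dtype=tf.int32)
# two target values per sample, matching the Dense(2) output
y_train = tf.random.uniform([100, 2])
model.fit(X_train, y_train, epochs=3, batch_size=8, validation_split=0.1)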
Check the shape of y_train: it should be [num_samples, 2], so that each batch is [batch_size, 2].
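It is also worth checking how the split is unpacked. scikit-learn's train_test_split returns X_train, X_test, y_train, y_test in that order, but the question unpacks it as X_train, y_train, X_test, y_test, so the variable named y_train actually holds test sentences; model.fit is also called without any targets. The pure-Python `split_like_sklearn` below is a hypothetical stand-in (same return order, shuffled split, test count rounded up as scikit-learn does for a float test_size) that shows the effect without needing scikit-learn.

```python
import math
import random


def split_like_sklearn(X, y, test_size=0.1, seed=0):
    """Hypothetical stand-in for sklearn.model_selection.train_test_split.

    Returns (X_train, X_test, y_train, y_test), the same order sklearn uses.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)        # shuffle sample indices
    n_test = math.ceil(test_size * len(X))  # round the test count up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


sentences = [f"sentence {i}" for i in range(20)]
targets = [[i / 20, i % 2] for i in range(20)]  # hypothetical [value, label] rows

# Question's order: the second value is really the test sentences
X_train, y_train, X_test, y_test = split_like_sklearn(sentences, targets, test_size=0.25)
print(len(X_train), len(y_train))  # 15 5

# Correct order: features and targets line up
X_train, X_test, y_train, y_test = split_like_sklearn(sentences, targets, test_size=0.25)
print(len(X_train), len(y_train))  # 15 15
print(len(y_train[0]))             # 2 values per sample, matching Dense(2)
```

With the correct unpacking order, the training call in the question becomes model.fit(X_train, y_train, epochs=3, batch_size=8, validation_split=0.1).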