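Problem description
I am trying to figure out how to use TF Records when training an LSTM in Keras. I am new to the TF Records format and am confused about how to implement this.
I am using some data from the Amazon sentiment analysis dataset and want to include the review text (which I will feed into an embedding layer) along with several context fields. So this is a multi-input setup: I will combine the embedding vectors with a dense layer over the numeric context fields.
Writing the TF Records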
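import json
import tensorflow as tf

# Short aliases for the protobuf wrappers exposed under tf.train
BytesList, FloatList, Int64List = tf.train.BytesList, tf.train.FloatList, tf.train.Int64List
Feature, Features = tf.train.Feature, tf.train.Features
FeatureList, FeatureLists = tf.train.FeatureList, tf.train.FeatureLists
SequenceExample = tf.train.SequenceExample

def create_tf_example(review, helpful, label):
    # Context: the two "helpful" counts and the rating used as the label
    context = Features(feature={
        "helpful1": Feature(int64_list=Int64List(value=[helpful[0]])),
        "helpful2": Feature(int64_list=Int64List(value=[helpful[1]])),
        "label": Feature(float_list=FloatList(value=[label])),
    })
    # A single Feature holding every word of the review as a UTF-8 byte string
    review_feature = Feature(bytes_list=BytesList(
        value=[word.encode("utf-8") for word in review.split(" ")]))
    sequence_example = SequenceExample(
        context=context,
        feature_lists=FeatureLists(feature_list={
            "review": FeatureList(feature=[review_feature]),
        }))
    return sequence_example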
with tf.io.TFRecordWriter("sequence.tfrecords") as writer:
    with open("reviews_Video_Games_5.json", "r") as f:
        for line in f:
            json_instance = json.loads(line)
            review = json_instance["reviewText"]
            helpful = json_instance["helpful"]
            label = json_instance["overall"]
            # create_tf_example takes all three fields
            example = create_tf_example(review, helpful, label)
            writer.write(example.SerializeToString())  # write one TF Record
Reading and parsing the TF Records
def _parse_function(proto):
    # Context features are scalars here (they can hold lists as well)
    context_feature_descriptions = {
        "helpful1": tf.io.FixedLenFeature([], tf.int64, default_value=0),
        "helpful2": tf.io.FixedLenFeature([], tf.int64, default_value=0),
        "label": tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
    }
    # The review is a variable-length list of word strings
    sequence_feature_descriptions = {
        "review": tf.io.VarLenFeature(tf.string),
    }
    # Parse the context and the feature lists using the two descriptions above
    parsed_context, parsed_feature_lists = tf.io.parse_single_sequence_example(
        proto, context_feature_descriptions, sequence_feature_descriptions)
    return parsed_context, parsed_feature_lists
# Build X as a tuple of the review and a list of the two context values;
# the other member of the returned tuple is the label
def _xy_function(context, feature_lists):
    label = context.pop("label")
    reviews = tf.RaggedTensor.from_sparse(feature_lists["review"])
    context = [context["helpful1"], context["helpful2"]]
    return (reviews, context), label
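For reference, here is a minimal sketch of how parsing functions like these might be wired into a `tf.data.TFRecordDataset`. It is a round trip over two made-up records, not the Amazon data. The helper names `make_example` and `parse`, the toy reviews, and the temporary file are all assumptions for illustration. Unlike `_xy_function`, this sketch joins the words back into a single string and stacks the two counts into one float vector, which is just one possible layout.

```python
import os
import tempfile

import tensorflow as tf


def make_example(words, helpful, label):
    """Hypothetical helper: builds a SequenceExample shaped like the ones above."""
    def int_feature(v):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))

    context = tf.train.Features(feature={
        "helpful1": int_feature(helpful[0]),
        "helpful2": int_feature(helpful[1]),
        "label": tf.train.Feature(float_list=tf.train.FloatList(value=[label])),
    })
    review = tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[w.encode("utf-8") for w in words]))
    return tf.train.SequenceExample(
        context=context,
        feature_lists=tf.train.FeatureLists(feature_list={
            "review": tf.train.FeatureList(feature=[review]),
        }))


# Write two toy records (made-up data) to a temporary file
path = os.path.join(tempfile.mkdtemp(), "toy.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(make_example(["great", "game"], [3, 4], 5.0).SerializeToString())
    writer.write(make_example(["too", "short", "though"], [0, 1], 2.0).SerializeToString())


def parse(proto):
    context, sequences = tf.io.parse_single_sequence_example(
        proto,
        context_features={
            "helpful1": tf.io.FixedLenFeature([], tf.int64, default_value=0),
            "helpful2": tf.io.FixedLenFeature([], tf.int64, default_value=0),
            "label": tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
        },
        sequence_features={"review": tf.io.VarLenFeature(tf.string)})
    # Join the words back into one string so a text vectorizer can re-tokenize it
    review = tf.strings.reduce_join(sequences["review"].values, separator=" ")
    # Stack the two counts into one float vector suitable for a dense layer
    helpful = tf.cast(
        tf.stack([context["helpful1"], context["helpful2"]]), tf.float32)
    return {"review": review, "helpful": helpful}, context["label"]


dataset = tf.data.TFRecordDataset(path).map(parse)
records = list(dataset.as_numpy_iterator())
```

Returning a dict keyed by input name, rather than a nested tuple, is a design choice of this sketch: it lets the names line up with named Keras inputs later on.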
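At this point I think I have the data in a usable format, since I appear to have a tuple (X, y) for Keras, where X is itself a tuple.
I would like to use tensorflow.keras.layers.experimental.preprocessing.TextVectorization.
How do I process this format and ultimately fit a model with 2 inputs?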
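One possible direction, offered as a rough sketch rather than a confirmed answer: apply the text vectorizer inside the `tf.data` pipeline and give the model two named inputs, one for token ids and one for the numeric context. The sketch uses in-memory toy data standing in for the parsed records, and it assumes each review is available as a single string and the two helpful counts as a float vector. It also assumes a TensorFlow release in which the layer is exposed as `tf.keras.layers.TextVectorization`. The sizes `MAX_TOKENS` and `SEQ_LEN`, the layer widths, the input names, and the regression-style loss are all placeholders.

```python
import tensorflow as tf

# Toy stand-ins for parsed records (made-up data, not from the Amazon dataset)
reviews = tf.constant(["great game loved it", "boring and buggy", "fun but short"])
helpful = tf.constant([[3.0, 4.0], [0.0, 1.0], [2.0, 2.0]])
labels = tf.constant([5.0, 1.0, 4.0])

MAX_TOKENS, SEQ_LEN = 1000, 8  # assumed vocabulary size and padded length

# Learn the vocabulary from the review text, then emit fixed-length id sequences
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=MAX_TOKENS, output_mode="int", output_sequence_length=SEQ_LEN)
vectorizer.adapt(reviews)


def vectorize(features, label):
    # Replace the raw string with token ids; pass the context through unchanged
    return {
        "review_tokens": vectorizer(features["review"]),
        "helpful": features["helpful"],
    }, label


dataset = (
    tf.data.Dataset.from_tensor_slices(
        ({"review": reviews, "helpful": helpful}, labels))
    .batch(2)
    .map(vectorize)
)

# Branch 1: token ids -> embedding -> LSTM
text_in = tf.keras.Input(shape=(SEQ_LEN,), dtype="int64", name="review_tokens")
x = tf.keras.layers.Embedding(MAX_TOKENS, 16, mask_zero=True)(text_in)
x = tf.keras.layers.LSTM(8)(x)

# Branch 2: numeric context -> dense layer
ctx_in = tf.keras.Input(shape=(2,), dtype="float32", name="helpful")
c = tf.keras.layers.Dense(4, activation="relu")(ctx_in)

# Merge both branches and predict the rating
merged = tf.keras.layers.Concatenate()([x, c])
out = tf.keras.layers.Dense(1)(merged)

model = tf.keras.Model(
    inputs={"review_tokens": text_in, "helpful": ctx_in}, outputs=out)
model.compile(optimizer="adam", loss="mse")
history = model.fit(dataset, epochs=1, verbose=0)
```

Keying the dataset features and the model inputs by the same names avoids relying on tuple ordering. If the context fields arrive as int64, as in the records above, they would need a `tf.cast(..., tf.float32)` before reaching the dense layer.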