Correctly using a TF Records SequenceExample with Keras

Problem description

I am trying to figure out how to use TF Records when training an LSTM in Keras. I am new to the TF Records format and am confused about how to implement this.

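I am using some Amazon sentiment-analysis data and want to include the review text (which I will feed into an embedding layer) along with several context fields. So this is a multi-input setup in which I will combine the embedding vectors with a dense layer over the numeric context fields.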

Writing the TF Records

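import json

import tensorflow as tf

# Short aliases for the protobuf message classes used below
BytesList, FloatList, Int64List = tf.train.BytesList, tf.train.FloatList, tf.train.Int64List
Feature, Features = tf.train.Feature, tf.train.Features
FeatureList, FeatureLists = tf.train.FeatureList, tf.train.FeatureLists
SequenceExample = tf.train.SequenceExample


def create_tf_example(review, helpful, label):
    # Context features: one value per review (two helpfulness counts and the rating)
    context = Features(feature={
        "helpful1": Feature(int64_list=Int64List(value=[helpful[0]])),
        "helpful2": Feature(int64_list=Int64List(value=[helpful[1]])),
        "label": Feature(float_list=FloatList(value=[label])),
    })

    # All words of the review are stored in a single bytes-list Feature
    review_feature = Feature(bytes_list=BytesList(
        value=[word.encode("utf-8") for word in review.split(' ')]))

    sequence_example = SequenceExample(
        context=context,
        feature_lists=FeatureLists(feature_list={
            "review": FeatureList(feature=[review_feature]),
        }),
    )

    return sequence_example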


with tf.io.TFRecordWriter("sequence.tfrecords") as writer:
    with open('reviews_Video_Games_5.json', 'r') as reviews_file:
        for line in reviews_file:
            json_instance = json.loads(line)
            review = json_instance['reviewText']
            helpful = json_instance['helpful']
            label = json_instance['overall']

            example = create_tf_example(review, helpful, label)

            # Write the serialized SequenceExample to the TFRecord file
            writer.write(example.SerializeToString())
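
For comparison, here is a minimal sketch of a different layout, which is an assumption on my part and not what the code above does: each word becomes its own step in the FeatureList. With one scalar string per step, `tf.io.FixedLenSequenceFeature` can parse the review directly into a 1-D string tensor. The helper name `per_token_example` and the sample text are made up for illustration.

```python
import tensorflow as tf


def per_token_example(review):
    # Hypothetical variant: one Feature per word, not one Feature holding every word.
    steps = [
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[word.encode("utf-8")]))
        for word in review.split(" ")
    ]
    return tf.train.SequenceExample(
        feature_lists=tf.train.FeatureLists(
            feature_list={"review": tf.train.FeatureList(feature=steps)}))


serialized = per_token_example("great game would buy again").SerializeToString()

# Each step holds exactly one string, so a fixed-length sequence feature fits.
_, feature_lists = tf.io.parse_single_sequence_example(
    serialized,
    sequence_features={"review": tf.io.FixedLenSequenceFeature([], tf.string)})

tokens = feature_lists["review"]  # 1-D string tensor, one entry per word
print(tokens.numpy())
```

Whether this layout is preferable depends on how the text is tokenized later; if the whole review is going to be re-joined into one string anyway, the single-Feature layout above is equally workable.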

Reading and parsing the TF Records

    def _parse_function(proto):

        # see examples where these can hold lists as well
        context_feature_descriptions = {
            "helpful1": tf.io.FixedLenFeature([], tf.int64, default_value=0),
            "helpful2": tf.io.FixedLenFeature([], tf.int64, default_value=0),
            "label": tf.io.FixedLenFeature([], tf.float32),
        }
        # the review string list
        sequence_feature_descriptions = {
            "review": tf.io.VarLenFeature(tf.string),
        }

        # the parsed context and feature lists, based on the two dictionaries above
        parsed_context, parsed_feature_lists = tf.io.parse_single_sequence_example(
            proto, context_feature_descriptions, sequence_feature_descriptions)

        return parsed_context, parsed_feature_lists
    
    # here create a tuple of 'X' which is the review and then a list of the two contexts
    # the other member of the tuple is the label
    def _xy_function(context, feature_lists):

        label = context.pop('label')
        reviews = tf.RaggedTensor.from_sparse(feature_lists['review'])
        context = [context['helpful1'], context['helpful2']]

        return (reviews, context), label
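
Parsing functions like the ones above are typically chained onto a `tf.data.TFRecordDataset` with `map`. The sketch below is self-contained and rests on a few assumptions: it writes two made-up records to a temporary file in the same layout as `create_tf_example`, folds parsing and the (X, y) split into one hypothetical `parse` function, and joins the words back into a single string per review (not the RaggedTensor used above) so that the result batches without padding.

```python
import os
import tempfile

import tensorflow as tf


def _int(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


# Write two made-up records in the same layout as create_tf_example.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
samples = [("great game", [2, 3], 5.0), ("broke after one day", [0, 1], 1.0)]
with tf.io.TFRecordWriter(path) as writer:
    for review, helpful, label in samples:
        words = [w.encode("utf-8") for w in review.split(" ")]
        example = tf.train.SequenceExample(
            context=tf.train.Features(feature={
                "helpful1": _int(helpful[0]),
                "helpful2": _int(helpful[1]),
                "label": tf.train.Feature(float_list=tf.train.FloatList(value=[label])),
            }),
            feature_lists=tf.train.FeatureLists(feature_list={
                "review": tf.train.FeatureList(feature=[
                    tf.train.Feature(bytes_list=tf.train.BytesList(value=words))]),
            }))
        writer.write(example.SerializeToString())


def parse(proto):
    context, feature_lists = tf.io.parse_single_sequence_example(
        proto,
        context_features={
            "helpful1": tf.io.FixedLenFeature([], tf.int64),
            "helpful2": tf.io.FixedLenFeature([], tf.int64),
            "label": tf.io.FixedLenFeature([], tf.float32),
        },
        sequence_features={"review": tf.io.VarLenFeature(tf.string)})
    # Join the words back into one string so a text layer can tokenize it later.
    review = tf.strings.reduce_join(feature_lists["review"].values, separator=" ")
    helpful = tf.stack([context["helpful1"], context["helpful2"]])
    return (review, helpful), context["label"]


dataset = tf.data.TFRecordDataset(path).map(parse).batch(2)
for (reviews, helpful), labels in dataset:
    print(reviews.numpy(), helpful.numpy(), labels.numpy())
```

Each element of `dataset` has the ((review, context), label) structure the question describes, with the two helpfulness counts stacked into one tensor of shape (batch, 2).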

At this point I believe I have the data in a format I can use, since I seem to have a tuple (X, y) for Keras, where X is itself a tuple.


I would like to use tensorflow.keras.layers.experimental.preprocessing.TextVectorization.
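
As a rough illustration of what that layer does, the sketch below adapts it to a few made-up review strings and converts them to padded integer ids. It assumes a TensorFlow release where the layer is also exposed as `tf.keras.layers.TextVectorization` (the non-experimental path); the `max_tokens` and `output_sequence_length` values are arbitrary.

```python
import tensorflow as tf

# Made-up reviews standing in for the parsed review strings.
reviews = tf.constant(["great game", "broke after one day", "great fun"])

# max_tokens and output_sequence_length are arbitrary illustrative values.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_mode="int", output_sequence_length=6)
vectorizer.adapt(reviews)  # builds the vocabulary from the sample text

# Each review becomes a row of 6 integer ids; short reviews are padded with 0.
token_ids = vectorizer(reviews)
print(vectorizer.get_vocabulary()[:3])
print(token_ids.numpy())
```

The layer expects whole strings and splits on whitespace by default, so reviews stored as word lists would need to be joined back into one string per review before being passed in.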

How do I process this format and ultimately fit a model with 2 inputs?
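
One possible shape for such a model, offered as a sketch under stated assumptions and not as a confirmed answer: tokenize the text inside the `tf.data` pipeline, then feed a functional Keras model whose two inputs are matched by name. The data here is synthetic, the layer sizes are arbitrary, and the input names `review` and `context` as well as the helper `to_model_inputs` are made up for illustration.

```python
import tensorflow as tf

# Synthetic stand-ins for the parsed records: text, two helpful counts, a rating.
reviews = tf.constant(["great game", "broke after one day", "great fun", "would not buy"])
helpful = tf.constant([[2, 3], [0, 1], [5, 5], [1, 4]], dtype=tf.float32)
labels = tf.constant([[5.0], [1.0], [4.0], [2.0]])

vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=6)
vectorizer.adapt(reviews)


def to_model_inputs(review, context, label):
    # Tokenize in the input pipeline so the model only sees integer ids.
    return {"review": vectorizer(review), "context": context}, label


dataset = (tf.data.Dataset.from_tensor_slices((reviews, helpful, labels))
           .batch(2)
           .map(to_model_inputs))

# Two named inputs: padded token ids and the numeric context fields.
review_in = tf.keras.Input(shape=(6,), dtype="int64", name="review")
context_in = tf.keras.Input(shape=(2,), name="context")

# Text branch: embedding followed by an LSTM; id 0 is treated as padding.
x = tf.keras.layers.Embedding(input_dim=1000, output_dim=8, mask_zero=True)(review_in)
x = tf.keras.layers.LSTM(16)(x)

# Context branch: a small dense layer over the two helpful counts.
c = tf.keras.layers.Dense(4, activation="relu")(context_in)

merged = tf.keras.layers.Concatenate()([x, c])
output = tf.keras.layers.Dense(1)(merged)

model = tf.keras.Model(
    inputs={"review": review_in, "context": context_in}, outputs=output)
model.compile(optimizer="adam", loss="mse")
history = model.fit(dataset, epochs=1, verbose=0)
```

Passing the inputs as a dict keyed by input name is a design choice made here to keep the mapping between dataset fields and model inputs explicit. With real data, the synthetic tensors would be replaced by the dataset read from the TFRecord file.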
