Why is it recommended to store a single value per list in a tf.train.Example rather than an entire column?

Problem description

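Why does https://www.tensorflow.org/tutorials/load_data/tfrecord#tfrecord_files_using_tfdata recommend putting a single value in each list message instead of an entire column? Storing the entire column is (from what I have found) significantly more space efficient. What am I missing?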

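The code up to the dashed line is copied from the page above; everything after it is my own. A sample run gave me a total serialized size of 843898 bytes for the tutorial's approach and 123994 bytes for mine.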

import tensorflow as tf
import pandas as pd
import numpy as np
from tensorflow.train import BytesList,FloatList,Int64List
from tensorflow.train import Feature,Features,Example



# The following functions can be used to convert a value to a type compatible
# with tf.train.Example.

def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  if isinstance(value,type(tf.constant(0))):
    value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
  """Returns an int64_list from a bool / enum / int / uint."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# The number of observations in the dataset.
n_observations = int(1e4)

# Boolean feature, encoded as False or True.
feature0 = np.random.choice([False,True],n_observations)

# Integer feature, random from 0 to 4.
feature1 = np.random.randint(0,5,n_observations)

# String feature
strings = np.array([b'cat',b'dog',b'chicken',b'horse',b'goat'])
feature2 = strings[feature1]

# Float feature, from a standard normal distribution
feature3 = np.random.randn(n_observations)


def serialize_example(feature0,feature1,feature2,feature3):
  """
  Creates a tf.train.Example message ready to be written to a file.
  """
  # Create a dictionary mapping the feature name to the tf.train.Example-compatible
  # data type.
  feature = {
      'feature0': _int64_feature(feature0),
      'feature1': _int64_feature(feature1),
      'feature2': _bytes_feature(feature2),
      'feature3': _float_feature(feature3),
  }

  # Create a Features message using tf.train.Example.

  example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
  return example_proto.SerializeToString()

#--------------------------
length = 0

# Example Tensorflow: convert one "row" per iteration to the TFRecord-format
for i in range(n_observations):
  se = serialize_example(feature0[i],feature1[i],feature2[i],feature3[i])
  length += len(se)

print(length)

# Example Me: Dump the entire column of the corresponding feature into the respective list
def create_example2(feature0, feature1, feature2, feature3):
  # Each feature holds the whole column, so a single Example covers all rows.
  feature = {
    'feature0': Feature(int64_list=Int64List(value=feature0)),
    'feature1': Feature(int64_list=Int64List(value=feature1)),
    'feature2': Feature(bytes_list=BytesList(value=feature2)),
    'feature3': Feature(float_list=FloatList(value=feature3)),
  }
  example_proto = tf.train.Example(features=tf.train.Features(feature=feature))

  return example_proto.SerializeToString()

example2 = create_example2(feature0, feature1, feature2, feature3)

print(len(example2))
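
To put the two totals side by side, here is a rough back-of-the-envelope sketch in plain Python (no TensorFlow needed). It uses only the two sizes from my sample run and the four feature names from the code above. The variable names are mine, and the name-byte count is only a lower bound on the per-record overhead: protobuf field tags and length prefixes are not counted. Because the data is random, the totals will differ slightly from run to run.

```python
# Totals reported from the sample run in this question (bytes of serialized output).
n_observations = 10_000
per_row_total = 843_898     # one tf.train.Example per row, lengths summed
per_column_total = 123_994  # one tf.train.Example holding whole columns

feature_names = ["feature0", "feature1", "feature2", "feature3"]

# Every serialized Example stores its own copy of each feature name,
# so the per-row layout repeats these bytes once per observation.
name_bytes_per_example = sum(len(name.encode("utf-8")) for name in feature_names)
repeated_name_bytes = name_bytes_per_example * n_observations

bytes_per_row_tutorial = per_row_total / n_observations
bytes_per_row_columns = per_column_total / n_observations
size_ratio = per_row_total / per_column_total

print(f"per-row layout:    {bytes_per_row_tutorial:.1f} bytes per observation")
print(f"per-column layout: {bytes_per_row_columns:.1f} bytes per observation")
print(f"size ratio:        {size_ratio:.2f}x")
print(f"feature-name bytes repeated in the per-row layout: {repeated_name_bytes}")
```

So at least 32 of the roughly 84 bytes per record in the tutorial's layout are just the feature names, whereas the single-Example layout stores them once.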
