TensorFlow: how to convert numpy to tfrecords and then generate batches?

Posted by 1sbrub3j on 2023-06-24 in Other


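My question is about how to get batched inputs from multiple (or sharded) tfrecords. I have read the example at https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L410. Taking the training set as an example, the basic pipeline is: (1) first generate a series of tfrecords (e.g., train-000-of-005, train-001-of-005, ...), (2) build a list from these filenames and feed it into tf.train.string_input_producer to get a queue, (3) at the same time create a tf.RandomShuffleQueue to do other stuff, (4) use tf.train.batch_join to generate the batched inputs.
I think this is complicated, and I am not sure about the logic of this procedure. In my case, I have a list of .npy files, and I want to generate sharded tfrecords (multiple separate tfrecords, not just one single large file). Each of these .npy files contains a different number of positive and negative samples (2 classes). A basic approach would be to generate one single large tfrecord file, but that file is too large (~20 GB), so I resort to sharded tfrecords. Is there a simpler way to do this?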

tensorflow

Source: https://stackoverflow.com/questions/45427637/how-to-convert-numpy-to-tfrecords-and-then-generate-batches

1 Answer


a64a0gku1#

Using the Dataset API simplifies the whole process. There are two parts: (1) convert the numpy arrays to tfrecords and (2) read the tfrecords to generate batches.

1. Create tfrecords from numpy arrays:

import numpy as np
import tensorflow as tf

# Example arrays: 5 samples of shape (32, 32, 3) with binary labels
inputs = np.random.normal(size=(5, 32, 32, 3))
labels = np.random.randint(0, 2, size=(5,))

def npy_to_tfrecords(inputs, labels, filename):
  with tf.io.TFRecordWriter(filename) as writer:
    for X, y in zip(inputs, labels):
        # Feature contains a map of string to feature proto objects
        feature = {}
        feature['X'] = tf.train.Feature(float_list=tf.train.FloatList(value=X.flatten()))
        feature['y'] = tf.train.Feature(int64_list=tf.train.Int64List(value=[y]))

        # Construct the Example proto object
        example = tf.train.Example(features=tf.train.Features(feature=feature))

        # Serialize the example to a string
        serialized = example.SerializeToString()

        # Write the serialized object to disk
        writer.write(serialized)

npy_to_tfrecords(inputs, labels, 'numpy.tfrecord')
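
The question asks for sharded output rather than one large file. The following is a minimal sketch of one way to do that, assuming the samples are split evenly across files named with the train-000-of-005 pattern from the question. The helper names `_to_example` and `write_sharded_tfrecords` and the `num_shards` parameter are hypothetical and not part of TensorFlow.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def _to_example(X, y):
    # Same feature layout as npy_to_tfrecords: flattened floats plus one int label
    feature = {
        'X': tf.train.Feature(float_list=tf.train.FloatList(value=X.flatten())),
        'y': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(y)])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def write_sharded_tfrecords(inputs, labels, out_dir, prefix='train', num_shards=5):
    """Hypothetical helper: split the samples across num_shards TFRecord files."""
    paths = []
    # np.array_split tolerates sample counts that are not divisible by num_shards
    index_chunks = np.array_split(np.arange(len(inputs)), num_shards)
    for shard_id, idx in enumerate(index_chunks):
        name = '%s-%03d-of-%03d' % (prefix, shard_id, num_shards)
        path = os.path.join(out_dir, name)
        with tf.io.TFRecordWriter(path) as writer:
            for i in idx:
                writer.write(_to_example(inputs[i], labels[i]).SerializeToString())
        paths.append(path)
    return paths


# Demo data written to a temporary directory (assumption: 12 samples, 5 shards)
out_dir = tempfile.mkdtemp()
demo_inputs = np.random.normal(size=(12, 32, 32, 3))
demo_labels = np.random.randint(0, 2, size=(12,))
shard_paths = write_sharded_tfrecords(demo_inputs, demo_labels, out_dir)
```

With a list of .npy files, one option is to load each file with np.load and write one shard per file, so the full ~20 GB never has to be held in memory at once.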

2. Read the tfrecords with the Dataset API:

filenames = ['numpy.tfrecord']
dataset = tf.data.TFRecordDataset(filenames)
# tf.data.TFRecordDataset is available in TensorFlow 1.5 and above

# example proto decode
def _parse_function(example_proto):
    keys_to_features = {'X':tf.io.FixedLenFeature(shape=(32, 32, 3), dtype=tf.float32),
                      'y': tf.io.FixedLenFeature((), tf.int64, default_value=0)}
    parsed_features = tf.io.parse_single_example(example_proto, keys_to_features)
    return parsed_features['X'], parsed_features['y']

# Parse the record into tensors.
dataset = dataset.map(_parse_function)  
  
# Generate batches
dataset = dataset.batch(5)

Check that the generated batch is correct:

# Take the first batch (iterating a dataset directly requires eager execution)
for data in dataset:
    break
np.testing.assert_allclose(inputs[0], data[0][0])
np.testing.assert_allclose(labels[0], data[1][0])
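
The reading code above takes a list of filenames, so it should extend to several shards. The following is a minimal sketch, assuming shards named train-000-of-003 and so on in a temporary directory. It writes three tiny shards first so that it runs on its own; the names `shard_dir`, `parse_example`, and `sharded_dataset` are made up for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write three small shards (4 samples each) so the sketch is self-contained
shard_dir = tempfile.mkdtemp()
for shard_id in range(3):
    path = os.path.join(shard_dir, 'train-%03d-of-003' % shard_id)
    with tf.io.TFRecordWriter(path) as writer:
        for _ in range(4):
            X = np.random.normal(size=(32, 32, 3))
            feature = {
                'X': tf.train.Feature(float_list=tf.train.FloatList(value=X.flatten())),
                'y': tf.train.Feature(int64_list=tf.train.Int64List(value=[shard_id % 2])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def parse_example(example_proto):
    # Same feature spec as _parse_function in the answer
    spec = {
        'X': tf.io.FixedLenFeature(shape=(32, 32, 3), dtype=tf.float32),
        'y': tf.io.FixedLenFeature((), tf.int64, default_value=0),
    }
    parsed = tf.io.parse_single_example(example_proto, spec)
    return parsed['X'], parsed['y']


# List the shard files in random order, then read them in an interleaved fashion
files = tf.data.Dataset.list_files(os.path.join(shard_dir, 'train-*'), shuffle=True)
sharded_dataset = (
    files.interleave(lambda f: tf.data.TFRecordDataset(f), cycle_length=3)
    .map(parse_example)
    .shuffle(buffer_size=12)  # buffer covers all 12 demo samples
    .batch(4)
)
batches = list(sharded_dataset)
```

Interleaving the files and shuffling the records plays roughly the role of the string_input_producer and RandomShuffleQueue steps described in the question. For a real dataset the shuffle buffer would normally be much smaller than the dataset, so the mixing is only approximate.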
