Interleaving multiple TensorFlow datasets together

Posted by rqmkfv5c on 2023-03-19 in Other


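The current TensorFlow dataset interleave functionality is basically an interleaved flat-map that takes a single dataset as input. Given the current API, what is the best way to interleave multiple datasets together? Say they have already been constructed and I have a list of them. I want to produce elements from them alternately, and I want to support lists with more than 2 datasets (i.e., stacked zips and interleaves would be pretty ugly).
Thanks! :)
@mrry might be able to help.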

tensorflow

Source: https://stackoverflow.com/questions/49058913/interleaving-multiple-tensorflow-datasets-together

4 Answers


6ju8rftf1#

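**EDIT 2:** See tf.contrib.data.choose_from_datasets. It performs deterministic dataset interleaving.
**EDIT:** See tf.contrib.data.sample_from_datasets. Even though it performs random sampling, I guess it can be useful.

Even though this is not "clean", it is the only workaround I came up with.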

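import tensorflow as tf

datasets = [tf.data.Dataset...]

def concat_datasets(datasets):
    # Here `datasets` is one zipped tuple of elements, not datasets.
    # Wrap each element in a one-element dataset and chain them in order.
    ds0 = tf.data.Dataset.from_tensors(datasets[0])
    for ds1 in datasets[1:]:
        ds0 = ds0.concatenate(tf.data.Dataset.from_tensors(ds1))
    return ds0

# zip yields one tuple per step (one element from each dataset);
# flat_map flattens each tuple back into a stream of single elements.
ds = tf.data.Dataset.zip(tuple(datasets)).flat_map(
    lambda *args: concat_datasets(args)
)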
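To see what this zip + flat_map workaround produces, here is a pure-Python model of the same semantics using plain lists instead of tf.data. It is only an analogy, not TensorFlow code, and `interleave_round_robin` is a hypothetical helper name. It also illustrates one caveat: like `Dataset.zip`, Python's `zip` stops at the shortest input, so leftover elements of longer inputs are dropped.

```python
from itertools import chain


def interleave_round_robin(sequences):
    """Model of Dataset.zip(...).flat_map(...): one element from each input in turn."""
    # zip() groups the i-th elements of every input;
    # chain.from_iterable flattens each group back into a single stream.
    return list(chain.from_iterable(zip(*sequences)))


equal = interleave_round_robin([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(equal)  # [1, 4, 7, 2, 5, 8, 3, 6, 9]

# Inputs of unequal length: everything after the shortest input ends is lost.
uneven = interleave_round_robin([[1, 2, 3], [4], [7, 8]])
print(uneven)  # [1, 4, 7]
```

If the datasets differ in length and every element must be kept, a choice-based approach such as choose_from_datasets is probably a better fit than zipping.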


2nbm6dog2#

Expanding on user2781994's answer (with its edits), here is how I implemented it:

import tensorflow as tf

ds11 = tf.data.Dataset.from_tensor_slices([1,2,3])
ds12 = tf.data.Dataset.from_tensor_slices([4,5,6])
ds13 = tf.data.Dataset.from_tensor_slices([7,8,9])
all_choices_ds = [ds11, ds12, ds13]

choice_dataset = tf.data.Dataset.range(len(all_choices_ds)).repeat()
ds14 = tf.contrib.data.choose_from_datasets(all_choices_ds, choice_dataset)

# alternatively:
# ds14 = tf.contrib.data.sample_from_datasets(all_choices_ds)

iterator = ds14.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    while True:
        try:
            value=sess.run(next_element)
        except tf.errors.OutOfRangeError:
            break
        print(value)

The output is 1, 4, 7, 2, 5, 8, 3, 6, 9, printed one value per line.
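In TensorFlow 2.x, tf.contrib and tf.Session are no longer available, so the same round-robin idea needs a slightly different spelling. The following is a minimal sketch, assuming TensorFlow 2.7 or later, where `tf.data.Dataset.choose_from_datasets` exists (earlier 2.x releases expose it as `tf.data.experimental.choose_from_datasets`). The variable names `datasets`, `interleaved` and `result` are illustrative.

```python
import tensorflow as tf

datasets = [
    tf.data.Dataset.from_tensor_slices([1, 2, 3]),
    tf.data.Dataset.from_tensor_slices([4, 5, 6]),
    tf.data.Dataset.from_tensor_slices([7, 8, 9]),
]

# 0, 1, 2, 0, 1, 2, ...: the index of the dataset to draw from at each step.
# Dataset.range yields int64 scalars, which is what the choice dataset must contain.
choice_dataset = tf.data.Dataset.range(len(datasets)).repeat()

interleaved = tf.data.Dataset.choose_from_datasets(datasets, choice_dataset)

# Eager iteration replaces the iterator/session boilerplate of TF 1.x.
result = [int(x) for x in interleaved.as_numpy_iterator()]
print(result)
```

Because the choice dataset simply cycles through the indices, this should work unchanged for a list of any length.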


gkl3eglg3#

In TensorFlow 2.0:

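import tensorflow as tf

# dataset1 and dataset2 are assumed to be already-built tf.data.Dataset objects.
tot_imm_dataset1 = 105  # presumably the number of elements in dataset1
tot_imm_dataset2 = 55   # presumably the number of elements in dataset2

# Repeating pattern 1, 0, 1: two picks from datasets[1] for each pick from datasets[0].
e = tf.data.Dataset.from_tensor_slices(tf.cast([1, 0, 1], tf.int64)).repeat(int(tot_imm_dataset1 / 2))
# Trailing zeros that draw further elements from datasets[0].
f = tf.data.Dataset.range(1).repeat(int(tot_imm_dataset2 - tot_imm_dataset1 / 2))
choice = e.concatenate(f)
datasets = [dataset2, dataset1]
dataset_rgb_compl__con_patch = tf.data.experimental.choose_from_datasets(datasets, choice)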

This works for me.
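To check how many elements such a choice sequence actually requests from each dataset, the arithmetic can be replayed in plain Python. This is only a sketch that mirrors the index pattern with a list; it does not touch tf.data, and it assumes, as the variable names suggest, that the two totals are the dataset sizes.

```python
from collections import Counter

tot_imm_dataset1 = 105
tot_imm_dataset2 = 55

pattern_repeats = int(tot_imm_dataset1 / 2)                # int(52.5) == 52
tail_zeros = int(tot_imm_dataset2 - tot_imm_dataset1 / 2)  # int(2.5) == 2

# Same sequence as e.concatenate(f), as a plain list of indices.
choice = [1, 0, 1] * pattern_repeats + [0] * tail_zeros
counts = Counter(choice)

# Index 1 refers to dataset1 and index 0 to dataset2, since datasets = [dataset2, dataset1].
print(counts[1], counts[0])  # 104 54
```

With these sizes, the sequence asks for 104 of the 105 elements of dataset1 and 54 of the 55 elements of dataset2, so one element of each is never selected. Whether that matters depends on the use case.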


nfs0ujit4#

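The basic idea is:
1. Create a dataset of the files (the super-dataset).
2. Interleave each file (a sub-dataset): load it and wrap it with from_tensor_slices.
3. Batch.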

import glob

import numpy as np
import tensorflow as tf

batch_size = 32
files = glob.glob('datasets/*.npy')

def read_npy_file(file_path):
  # Inside tf.py_function, file_path is an eager string tensor.
  return np.load(file_path.numpy().decode())

def create_dataset(files):
  dataset = tf.data.Dataset.from_tensor_slices(files)
  dataset = dataset.interleave(
    lambda x: tf.data.Dataset.from_tensor_slices(
      # Tout=tf.float32 assumes the .npy files hold float32 arrays.
      tf.py_function(read_npy_file, [x], tf.float32)
    ),
    cycle_length=len(files),
    num_parallel_calls=tf.data.AUTOTUNE
  )
  dataset = dataset.batch(batch_size)
  return dataset

dataset = create_dataset(files)

for batch in dataset:
  print(batch.shape)

The tricky part is wrapping read_npy_file with tf.py_function; otherwise the argument passed in is a symbolic Tensor, which has no numpy() method.
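The order in which elements come out of such an interleave can be pictured with a small pure-Python model. This is a sketch of what interleave is expected to do when cycle_length equals the number of files and block_length is left at its default of 1, under deterministic ordering; `interleave_model` is a hypothetical name and the lists stand in for the arrays loaded from each file.

```python
def interleave_model(sub_sequences):
    """Take one element from each open input in turn until all are exhausted."""
    iterators = [iter(s) for s in sub_sequences]
    out = []
    while iterators:
        still_open = []
        for it in iterators:
            try:
                out.append(next(it))
                still_open.append(it)  # this input may have more elements
            except StopIteration:
                pass  # exhausted inputs drop out of the cycle
        iterators = still_open
    return out


merged = interleave_model([[1, 2, 3], [10], [20, 21]])
print(merged)  # [1, 10, 20, 2, 21, 3]
```

Unlike a zip-based approach, shorter inputs simply drop out of the rotation and the remaining ones keep producing elements.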
