NumPy: How to resolve dimensionality and sequence problems with LSTM input?

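I have two problems related to the input requirements of an LSTM model. My LSTM requires 3D input as a Tensor. The input comes from a replay buffer (the buffer itself is a deque), which stores each experience as a tuple of several components. The LSTM requires each component to be a single value rather than a sequence.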

state_dim = 21; batch_size = 32

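Problems:
1. Batch sampling returns a one-dimensional (1D) NumPy array, whereas a 3D array is required. np.reshape, np.expand_dims, and np.asarray all fail with errors such as: ValueError: cannot reshape array of size 32 into shape (32,1,21)
2. When array broadcasting is used as a workaround (only as a test; I do not want broadcasting in the final code), converting the array to a Tensor raises another error: ValueError: setting an array element with a sequence.
Code structure:
A function returns the state as a list of 21 features; call it def get_state():
It is then used by this part of the code: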

def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]

    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])

    # Ensure states and next_states are 2-dimensional arrays - this is the workaround that was mentioned but it should not be in the final code
    if states.ndim == 1:
        states = states[:, np.newaxis]
    if next_states.ndim == 1:
        next_states = next_states[:, np.newaxis]

    n_timesteps = 1  # Specify the number of timesteps

    # Add a time_steps dimension to states and next_states using broadcasting - this is the workaround that was mentioned but it should not be in the final code
    states = states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)
    next_states = next_states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)

    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)

    return states, actions, rewards, next_states, done_flags

# Update the network based on target model Q-values
def update_network():
    global batch_size
    states, actions, rewards, next_states, done_flags = sample_batch()

    Q_values = model.predict(states)
    Q_values_next = target_model.predict(next_states)

    for i in range(batch_size):
        if done_flags[i]:
            Q_values[i][actions[i]] = rewards[i]
        else:
            Q_values[i][actions[i]] = rewards[i] + gamma * np.max(Q_values_next[i])

    model.train_on_batch(states, Q_values)

# Define max steps per episode
max_steps_per_episode = 1000

# Set the number of episodes over which to calculate the average reward
average_over_episodes = 5

# Initialize a list to store the rewards for each episode
episode_rewards = []

# Keep track of the best average reward
best_avg_reward = float('-inf')

# Definition of Q-learning + definition of state as state = get_state()
for episode in range(num_episodes):
    state = get_state()
    episode_reward = 0
    # Decay epsilon (episode - 1 is to start from initial value and then decay in the next episode)
    epsilon = initial_epsilon * (decay_rate ** (episode - 1))
    step = 0
    done = False

    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            Q_values_single = model.predict(np.array([state]))
            action = np.argmax(Q_values_single)

        # Take action and get reward
        take_action(action)
        reward = get_reward()
        next_state = get_state()

        # Add experience to replay buffer
        replay_buffer.append((state, action, reward, next_state, done))

By the way, there are two models, the online model and the target model:

# Choose optimizer:
optimizer = keras.optimizers.RMSprop(learning_rate=alpha)

# Set up target model -- influenced by Deep Q-learning paper by Mnih et al. (2015)
target_model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)
target_model.compile(loss="mse", optimizer=optimizer)

# Define a function to update the target network's weights
def update_target_network():
    target_model.set_weights(model.get_weights())

# Set up online model and load weights
model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)

model.compile(loss="mse", optimizer=optimizer)

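I did try various ways to resolve these errors, as mentioned above: np.reshape, np.expand_dims, np.asarray, dtype=np.float32, and array broadcasting (which actually works, but exposes the second problem, the sequence error caused by the replay buffer components).
If anyone has ideas on how to prepare the LSTM input so that the state feature data is a 3D array and each element of a state array is a single value (even though a state consists of 21 features), I would greatly appreciate any help.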

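The errors appear to come from the shape of the data fed to the LSTM model. As you mentioned, the LSTM needs 3D input as a Tensor, so the data must have the right shape before it reaches the LSTM.
From the code you provided, states and next_states should have shape (batch_size, 21) after batch sampling, provided every stored state is a flat sequence of exactly 21 numbers. The LSTM, however, expects input of shape (batch_size, n_timesteps, state_dim), where n_timesteps is the number of time steps in the sequence and state_dim is the number of features in a state. We need to reshape states and next_states to match. Note that the "size 32" in the reported ValueError means the array held only 32 elements, i.e. it was a 1D array of shape (batch_size,). That happens when the stored states are nested or do not all have the same length, and reshaping alone cannot fix it; the stored states have to be made uniform first.
To do this, we can use NumPy's reshape function. For example, to reshape states to (batch_size, 1, state_dim), we can use the following code: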

states = np.reshape(states, (batch_size, 1, state_dim))

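Similarly, we can reshape next_states to the same shape. Note that the second dimension of the new shape is set to 1 because we feed the LSTM only one time step at a time.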
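For an array that really is 2D with shape (batch_size, state_dim), np.reshape, np.expand_dims, and np.newaxis indexing all give the same single-timestep layout. The following is a minimal sketch; the random placeholder data is an assumption standing in for the sampled states.

```python
import numpy as np

state_dim = 21
batch_size = 32

# Placeholder batch standing in for the sampled states, shape (32, 21).
states = np.random.rand(batch_size, state_dim).astype(np.float32)

# Three equivalent ways to insert a time axis of length 1.
by_reshape = np.reshape(states, (-1, 1, state_dim))  # -1 infers the batch size
by_expand = np.expand_dims(states, axis=1)
by_newaxis = states[:, np.newaxis, :]

print(by_reshape.shape, by_expand.shape, by_newaxis.shape)
```

Using -1 for the batch dimension avoids hard-coding batch_size, which may be convenient if a batch ever contains fewer samples.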
Once states and next_states have been reshaped correctly, we can convert them to Tensors with tf.convert_to_tensor as before.
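Whether the reshape can succeed depends on the shape that np.array actually produced. The sketch below is only a diagnostic, not part of the original code: it assumes that one stored state has the wrong length (here 20 instead of 21) and shows that the result is then a 1D object array, for which the reshape raises the "size 32" error from the question. The names uniform, ragged, and bad_indices are hypothetical.

```python
import numpy as np

state_dim = 21
batch_size = 32

# Uniform case: every state has exactly state_dim features.
uniform = [np.zeros(state_dim, dtype=np.float32) for _ in range(batch_size)]
uniform_arr = np.array(uniform)
print(uniform_arr.shape)  # 2D: one row per state

# Ragged case (assumed for illustration): one state is one feature short.
ragged = list(uniform)
ragged[0] = np.zeros(state_dim - 1, dtype=np.float32)
# dtype=object is required on recent NumPy versions to build a ragged array.
ragged_arr = np.array(ragged, dtype=object)
print(ragged_arr.shape)  # 1D: each element is itself an array

# The 1D object array holds only batch_size elements, so reshape fails.
reshape_error = None
try:
    np.reshape(ragged_arr, (batch_size, 1, state_dim))
except ValueError as err:
    reshape_error = str(err)
    print(reshape_error)

# Locate the offending entries so the source (e.g. get_state) can be fixed.
bad_indices = [i for i, s in enumerate(ragged)
               if np.asarray(s).shape != (state_dim,)]
print(bad_indices)
```

Running the same kind of check against the real replay buffer should show which stored states do not have exactly 21 scalar features.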
Here is an updated sample_batch function:

def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]

    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])

    n_timesteps = 1  # Specify the number of timesteps

    # Reshape states and next_states to have shape (batch_size, n_timesteps, state_dim)
    states = np.reshape(states, (batch_size, 1, state_dim))
    next_states = np.reshape(next_states, (batch_size, 1, state_dim))

    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)

    return states, actions, rewards, next_states, done_flags
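A more defensive variant is sketched below, under a few assumptions that are not in the original code: the buffer and sizes are passed as arguments instead of globals, np.stack is used so that states of unequal length raise an error immediately, and the tf.convert_to_tensor step is left out because Keras predict and train_on_batch also accept NumPy arrays. The helper single_state_input is hypothetical; it gives a single state the same 3D layout for the epsilon-greedy prediction step.

```python
from collections import deque

import numpy as np


def sample_batch(replay_buffer, batch_size, state_dim):
    """Sample transitions; states come back as (batch_size, 1, state_dim)."""
    indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[i] for i in indices]

    # np.stack raises ValueError if the states differ in shape, rather than
    # silently building a 1D object array.
    states = np.stack([np.asarray(item[0], dtype=np.float32) for item in batch])
    next_states = np.stack([np.asarray(item[3], dtype=np.float32) for item in batch])
    if states.shape[1:] != (state_dim,) or next_states.shape[1:] != (state_dim,):
        raise ValueError(f"expected states of shape ({state_dim},), "
                         f"got {states.shape[1:]} and {next_states.shape[1:]}")

    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch], dtype=np.float32)
    done_flags = np.array([item[4] for item in batch], dtype=bool)

    # Insert a time axis of length 1: (batch, state_dim) -> (batch, 1, state_dim).
    return (states[:, np.newaxis, :], actions, rewards,
            next_states[:, np.newaxis, :], done_flags)


def single_state_input(state, state_dim):
    """Shape one state as (1, 1, state_dim) for a single prediction."""
    return np.asarray(state, dtype=np.float32).reshape(1, 1, state_dim)


# Usage with a synthetic buffer (random data, for illustration only).
replay_buffer = deque(maxlen=1000)
for _ in range(100):
    state = np.random.rand(21).tolist()
    next_state = np.random.rand(21).tolist()
    replay_buffer.append((state, np.random.randint(4), 0.0, next_state, False))

states, actions, rewards, next_states, done_flags = sample_batch(replay_buffer, 32, 21)
print(states.shape, next_states.shape)
print(single_state_input(replay_buffer[0][0], 21).shape)
```

Failing fast in np.stack points directly at a malformed state in the buffer, which tends to be easier to debug than a reshape error further downstream.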

I hope this helps you resolve the input-shape problem with your LSTM model!
