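I have two questions about the input requirements of an LSTM model. My LSTM requires 3D input as a Tensor. The input is supplied by a replay buffer (the replay buffer itself is a deque) as a tuple of several components. The LSTM requires each component to be a single value rather than a sequence.
state_dim = 21; batch_size = 32
Problems:
1. Batch sampling returns a one-dimensional (1D) NumPy array, whereas a 3D array is required. Using np.reshape, np.expand_dims, or np.asarray does not work, because each returns an error such as ValueError: cannot reshape array of size 32 into shape (32,1,21)
2. When array broadcasting is used as a workaround (for testing only; I do not want broadcasting in the final code), converting the array to a Tensor raises another error: ValueError: setting an array element with a sequence.
Code structure:
There is a function that returns the state as a list of 21 features; let's call it def get_state():
It is then used by this part of the code: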
def sample_batch():
    global batch_size
    batch_indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[index] for index in batch_indices]
    states = np.array([np.array(item[0], dtype=np.float32) for item in batch])
    actions = np.array([item[1] for item in batch])
    rewards = np.array([item[2] for item in batch])
    next_states = np.array([np.array(item[3], dtype=np.float32) for item in batch])
    done_flags = np.array([item[4] for item in batch])
    # Ensure states and next_states are 2-dimensional arrays - this is the workaround that was mentioned but it should not be in the final code
    if states.ndim == 1:
        states = states[:, np.newaxis]
    if next_states.ndim == 1:
        next_states = next_states[:, np.newaxis]
    n_timesteps = 1  # Specify the number of timesteps
    # Add a time_steps dimension to states and next_states using broadcasting - this is the workaround that was mentioned but it should not be in the final code
    states = states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)
    next_states = next_states[:, np.newaxis, :] * np.ones((1, n_timesteps, 1), dtype=np.float32)
    # Convert to tensor
    states = tf.convert_to_tensor(states, dtype=tf.float32)
    next_states = tf.convert_to_tensor(next_states, dtype=tf.float32)
    return states, actions, rewards, next_states, done_flags
# Update the network based on target model Q-values
def update_network():
    global batch_size
    states, actions, rewards, next_states, done_flags = sample_batch()
    Q_values = model.predict(states)
    Q_values_next = target_model.predict(next_states)
    for i in range(batch_size):
        if done_flags[i]:
            Q_values[i][actions[i]] = rewards[i]
        else:
            Q_values[i][actions[i]] = rewards[i] + gamma * np.max(Q_values_next[i])
    model.train_on_batch(states, Q_values)
# Define max steps per episode
max_steps_per_episode = 1000
# Set the number of episodes over which to calculate the average reward
average_over_episodes = 5
# Initialize a list to store the rewards for each episode
episode_rewards = []
# Keep track of the best average reward
best_avg_reward = float('-inf')
# Definition of Q-learning + definition of state as state = get_state()
for episode in range(num_episodes):
    state = get_state()
    episode_reward = 0
    # Decay epsilon (episode - 1 is to start from initial value and then decay in the next episode)
    epsilon = initial_epsilon * (decay_rate ** (episode - 1))
    step = 0
    done = False
    while not done:
        # Choose action using epsilon-greedy policy
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            Q_values_single = model.predict(np.array([state]))
            action = np.argmax(Q_values_single)
        # Take action and get reward
        take_action(action)
        reward = get_reward()
        next_state = get_state()
        # Add experience to replay buffer
        replay_buffer.append((state, action, reward, next_state, done))
By the way, there are two models, an online model and a target model:
# Choose optimizer:
optimizer = keras.optimizers.RMSprop(learning_rate=alpha)
# Set up target model -- influenced by Deep Q-learning paper by Mnih et al. (2015)
target_model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)
target_model.compile(loss="mse", optimizer=optimizer)
# Define a function to update the target network's weights
def update_target_network():
    target_model.set_weights(model.get_weights())
# Set up online model and load weights
model = keras.Sequential(
    [
        layers.LSTM(64, input_shape=(None, state_dim)),
        layers.Dense(num_actions),
    ]
)
model.compile(loss="mse", optimizer=optimizer)
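I did try various ways to fix these errors, as mentioned above: np.reshape, np.expand_dims, np.asarray, array broadcasting (which actually works, but reveals another problem with the sequential nature of the replay buffer components), and dtype=np.float32.
If anyone has ideas on how to prepare the LSTM input so that the state feature data is a 3D array and each state array element is a single value (even though a state consists of 21 features), I would be very grateful for any help.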
1 Answer
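The errors appear to come from the shape of the data fed to the LSTM model. As you mentioned, the LSTM needs 3D input as a Tensor, so we have to make sure the data has the correct shape before it is passed to the LSTM.
Based on the code you provided, the shape of states and next_states after batch sampling appears to be (batch_size, 21). However, the LSTM expects input of shape (batch_size, n_timesteps, state_dim), where n_timesteps is the number of time steps in the sequence and state_dim is the number of features in the state. We need to reshape states and next_states to match that shape.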
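Before reshaping, it may be worth confirming that the sampled states really stack into a (batch_size, 21) array. The error quoted in the question reports an array of size 32 rather than 32 × 21 = 672, which suggests that at least some stored states may not be flat 21-element vectors. The following is a minimal sketch of such a check; the helper name stack_states is hypothetical and not part of the original code, and it assumes every state should flatten to exactly state_dim numeric values.

```python
import numpy as np

STATE_DIM = 21  # value taken from the question


def stack_states(states, state_dim=STATE_DIM):
    """Stack per-transition states into a (batch, state_dim) float32 array.

    Hypothetical helper: raises ValueError naming the first entry that does
    not flatten to exactly `state_dim` numeric values.
    """
    rows = []
    for i, s in enumerate(states):
        try:
            # Ragged or non-numeric states fail here with ValueError/TypeError.
            row = np.asarray(s, dtype=np.float32).reshape(-1)
        except (ValueError, TypeError) as exc:
            raise ValueError(f"state {i} is not a flat numeric vector: {exc}") from exc
        if row.shape != (state_dim,):
            raise ValueError(
                f"state {i} has shape {row.shape}, expected ({state_dim},)"
            )
        rows.append(row)
    return np.stack(rows)
```

If this raises, the problem probably lies in what get_state() returns or in how transitions are appended to the replay buffer, and no amount of reshaping afterwards will fix it.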
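To do this, we can use NumPy's reshape function. For example, to reshape states to (batch_size, 1, state_dim), we can call states.reshape(batch_size, 1, state_dim).
Similarly, we can reshape next_states to the same shape. Note that the second dimension of the new shape is set to 1 because we feed the LSTM only one time step at a time.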
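As a small illustration of the reshape described above, here is a sketch that uses a placeholder array of zeros in place of real sampled states. It assumes the states have already been stacked into a (batch_size, state_dim) array; np.expand_dims is shown as an equivalent alternative.

```python
import numpy as np

batch_size, state_dim = 32, 21  # values from the question

# Placeholder standing in for a correctly stacked batch of states.
states = np.zeros((batch_size, state_dim), dtype=np.float32)

# Insert a time-step axis of length 1 between the batch and feature axes.
states_3d = states.reshape(batch_size, 1, state_dim)

# Equivalent: add the new axis at position 1 without naming the sizes.
states_3d_alt = np.expand_dims(states, axis=1)

print(states_3d.shape)      # (32, 1, 21)
print(states_3d_alt.shape)  # (32, 1, 21)
```

Both calls only change the array's shape metadata; the 672 values are untouched. This is why the reshape fails when the array holds only 32 elements.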
Once states and next_states have been reshaped correctly, we can convert them to Tensors with tf.convert_to_tensor, as before.
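The sample_batch function should be updated to apply this reshape to both states and next_states before they are converted to tensors.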
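One possible version of such an updated function is sketched below. It is not the original answer's code. As assumptions, it takes replay_buffer, batch_size, and state_dim as parameters instead of reading globals, it expects each transition to be a (state, action, reward, next_state, done) tuple whose states are flat lists of state_dim numbers, and it returns NumPy arrays so that it runs without TensorFlow.

```python
from collections import deque

import numpy as np


def sample_batch(replay_buffer, batch_size, state_dim):
    """Sample transitions and return arrays ready for an LSTM.

    states and next_states have shape (batch_size, 1, state_dim).
    """
    # Sample indices with replacement, as in the question's code.
    indices = np.random.randint(len(replay_buffer), size=batch_size)
    batch = [replay_buffer[i] for i in indices]

    # Raises ValueError if the stored states are ragged or non-numeric.
    states = np.array([t[0] for t in batch], dtype=np.float32)
    next_states = np.array([t[3] for t in batch], dtype=np.float32)
    actions = np.array([t[1] for t in batch], dtype=np.int64)
    rewards = np.array([t[2] for t in batch], dtype=np.float32)
    done_flags = np.array([t[4] for t in batch], dtype=bool)

    # Add a single time-step axis: (batch, state_dim) -> (batch, 1, state_dim).
    states = states.reshape(batch_size, 1, state_dim)
    next_states = next_states.reshape(batch_size, 1, state_dim)
    return states, actions, rewards, next_states, done_flags


# Usage with a dummy buffer of 100 made-up transitions.
buffer = deque(maxlen=1000)
for i in range(100):
    buffer.append(([float(i)] * 21, i % 4, 1.0, [float(i + 1)] * 21, False))

states, actions, rewards, next_states, done_flags = sample_batch(buffer, 32, 21)
print(states.shape, next_states.shape)  # (32, 1, 21) (32, 1, 21)
```

The returned states and next_states can then be passed through tf.convert_to_tensor(..., dtype=tf.float32) exactly as in the question's code.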
I hope this helps you resolve the input shape problem with your LSTM model!