I'm trying to use PettingZoo with Tianshou, but the documentation doesn't explain much about training multiple agents at the same time. My agents have to act cooperatively and perform the same tasks, so they share the same policy. It looks simple - just replace the random policy in the example with the right one - and it even seems to work, but I'm not sure it works correctly.
For some reason, only the first agent's rewards reach the reward_metric function; for the second agent the value is always 0. Inside the PettingZoo AEC environment, the rewards are accumulated successfully.
def step(self, action):
    ...  # the very end of my step function
    if self._agent_selector.is_last():
        self._accumulate_rewards()  # global rewards are definitely updated
        self._clear_rewards()
    self.agent_selection = self._agent_selector.next()
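To make the accumulate/clear cycle above concrete, here is a minimal, dependency-free mock of PettingZoo's AEC reward bookkeeping. The `AECToy` class is hypothetical - it only mirrors the `_accumulate_rewards` / `_clear_rewards` semantics from the snippet: per-step rewards are added into a cumulative dict, then the per-step dict is zeroed.

```python
# Toy mock of the PettingZoo AEC reward bookkeeping used in the snippet above.
# AECToy is a hypothetical stand-in, not the real PettingZoo base class.

class AECToy:
    def __init__(self, agents):
        self.agents = agents
        self.rewards = {a: 0.0 for a in agents}              # per-step rewards
        self._cumulative_rewards = {a: 0.0 for a in agents}  # what observers read

    def _accumulate_rewards(self):
        # Fold the current per-step rewards into the cumulative totals.
        for a, r in self.rewards.items():
            self._cumulative_rewards[a] += r

    def _clear_rewards(self):
        # Reset per-step rewards so they are not double-counted next step.
        self.rewards = {a: 0.0 for a in self.agents}


env = AECToy(["agent_0", "agent_1"])
env.rewards = {"agent_0": 1.0, "agent_1": 1.0}
env._accumulate_rewards()
env._clear_rewards()
print(env._cumulative_rewards)  # both agents credited: {'agent_0': 1.0, 'agent_1': 1.0}
```

As the mock shows, accumulation itself credits every agent, so if `_cumulative_rewards` is correct inside the environment, the zeros for the second agent are more likely introduced on the wrapper/collector side than in `step()`.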
def reward_metric(rews):
    return rews[:, 0]  # shape of rews is correct, but rews[:, 1] is always 0
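In Tianshou's multi-agent examples, `reward_metric` receives an array of shape `(batch, n_agents)` and should return a `(batch,)` array. Here is a quick sketch with toy data (the values are made up to mirror the reported symptom, with agent 1's column all zero); for a cooperative task, averaging across agents is a symmetric alternative to picking a single column:

```python
import numpy as np

# Toy rews array shaped like what reward_metric receives:
# one row per transition, one column per agent.
rews = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [0.5, 0.0],
])

def reward_metric(rews):
    # Averaging treats cooperating agents symmetrically; selecting only
    # rews[:, 0] would hide the second agent's column entirely.
    return rews.mean(axis=1)

print(reward_metric(rews))  # [0.5  1.   0.25]
```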
Am I initializing the policy correctly? I use a shared network object, because otherwise the second agent is not trained (although maybe that follows from the first problem).
def _get_agents(...):
    ...
    if agent1 is None:
        net = Net(
            state_shape=observation_space.shape or observation_space.n,
            action_shape=env.action_space.shape or env.action_space.n,
            hidden_sizes=args.hidden_sizes,
            softmax=True,
            num_atoms=51,
            dueling_param=(
                {"linear_layer": noisy_linear},
                {"linear_layer": noisy_linear},
            ),
            device=args.device,
        ).to(args.device)
        if optim is None:
            optim = torch.optim.Adam(net.parameters(), lr=args.lr)
        agent1 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)
        if args.watch:
            agent1.load_state_dict(torch.load('./log/ttt/dqn/policy_0.pth'))
    if agent2 is None:
        agent2 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)
        if args.watch:
            agent2.load_state_dict(torch.load('./log/ttt/dqn/policy_1.pth'))
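One thing worth noting about the setup above: since both `RainbowPolicy` instances are built around the same `net` object, they hold one shared set of parameters by reference - training through either policy updates both. A dependency-free sketch of this aliasing (`ToyModel` and `ToyPolicy` are hypothetical stand-ins, not Tianshou classes):

```python
# Hypothetical stand-ins illustrating parameter sharing: both policies hold a
# reference to the same model object, so an update through one is visible
# through the other. This mirrors passing the same `net` to both
# RainbowPolicy constructors above.

class ToyModel:
    def __init__(self):
        self.weight = 0.0

class ToyPolicy:
    def __init__(self, model):
        self.model = model  # stored by reference, not copied

    def update(self, grad_step):
        self.model.weight += grad_step

shared = ToyModel()
agent1 = ToyPolicy(shared)
agent2 = ToyPolicy(shared)

agent1.update(0.5)
print(agent2.model.weight)  # 0.5 -- agent2 sees agent1's update
```

A side effect of this aliasing in the original code: if `policy_0.pth` and `policy_1.pth` differ, the second `load_state_dict` call silently overwrites the first, because both policies wrap the same underlying `net`.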
Tianshou: 1.0.0
PettingZoo: 1.24.3
Thank you for your time!
The core team that has been working on Tianshou for the last 6 months has deprioritized MARL - there are just too many other things that need fixing first. Also, it's more of a niche topic.
I'm afraid that in the foreseeable future we won't be able to help you with that, but I'm happy to review a PR if you decide to dive deeper into MARL with Tianshou.