Link to original content: http://github.com/thu-ml/tianshou/issues/1137
Does Tianshou truly support MARL out of the box? #1137

Open
4 of 9 tasks
Legendorik opened this issue May 5, 2024 · 1 comment
Labels: MARL (temporary label to group all things MARL), question (further information is requested)

Comments


Legendorik commented May 5, 2024

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

I'm trying to use PettingZoo with Tianshou, but the documentation doesn't explain much about training multiple agents at the same time. My agents have to act cooperatively and perform the same tasks, so they share the same policy. It looks simple: just replace the random policy in the example with the right one. It even seems to work, but I'm not sure it works correctly.
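To make the questions below concrete, this is roughly the wiring I copied from the tic-tac-toe example. It is a simplified sketch: tictactoe_v3 and RandomPolicy are only placeholders for my custom AEC environment and the Rainbow policies built further down, and the calls follow the Tianshou 1.0.0 example, so they may differ in other versions.

# Simplified sketch of the wiring (Tianshou 1.0.0): tictactoe_v3 and RandomPolicy
# are placeholders for my custom AEC env and the Rainbow policies built below.
from pettingzoo.classic import tictactoe_v3

from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.env.pettingzoo_env import PettingZooEnv
from tianshou.policy import MultiAgentPolicyManager, RandomPolicy


def _get_env():
    # Wrap the AEC environment so Tianshou can step it agent by agent.
    return PettingZooEnv(tictactoe_v3.env())


env = _get_env()
# One policy per agent id; in my case both entries are the Rainbow policies
# from _get_agents() below, which share the same network.
policy = MultiAgentPolicyManager(
    policies=[
        RandomPolicy(action_space=env.action_space),
        RandomPolicy(action_space=env.action_space),
    ],
    env=env,
)

train_envs = DummyVectorEnv([_get_env for _ in range(4)])
collector = Collector(policy, train_envs, VectorReplayBuffer(20_000, len(train_envs)))
collector.collect(n_step=100)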

  1. For some reason, only the first agent's rewards reach the reward_metric function; the entry for the second agent is always 0. Within the PettingZoo AEC environment, the rewards are accumulated successfully.
def step(self, action):
    ...  # the very end of my step function
    if self._agent_selector.is_last():
        self._accumulate_rewards()  # global rewards are definitely updated
        self._clear_rewards()

    self.agent_selection = self._agent_selector.next()

def reward_metric(rews):
    return rews[:, 0]  # the shape of rews is correct, but rews[:, 1] is always 0

  2. Am I initializing the policies correctly? I pass a shared network object, because otherwise the second agent is not trained (although maybe that follows from the first point). An alternative wiring I considered is sketched after the code below.
def _get_agents(...):
    ...
    if agent1 is None:
        net = Net(
            state_shape=observation_space.shape or observation_space.n,
            action_shape=env.action_space.shape or env.action_space.n,
            hidden_sizes=args.hidden_sizes,
            softmax=True,
            num_atoms=51,
            dueling_param=(
                {"linear_layer": noisy_linear},
                {"linear_layer": noisy_linear},
            ),
            device=args.device,
        ).to(args.device)
        if optim is None:
            optim = torch.optim.Adam(net.parameters(), lr=args.lr)
        agent1 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)

        if args.watch:
            agent1.load_state_dict(torch.load('./log/ttt/dqn/policy_0.pth'))

    if agent2 is None:
        # The same net and optim objects are reused here, so both agents
        # share weights and are updated through the same optimizer.
        agent2 = RainbowPolicy(
            model=net,
            optim=optim,
            action_space=env.action_space,
            discount_factor=args.gamma,
            estimation_step=args.n_step,
            target_update_freq=args.target_update_freq,
        ).to(args.device)

        if args.watch:
            agent2.load_state_dict(torch.load('./log/ttt/dqn/policy_1.pth'))
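For completeness, this is the alternative wiring I considered but am unsure about: reuse one policy object for both agent ids instead of only sharing the network, and pass reward_metric to the trainer. A rough sketch that continues the snippets above; the MultiAgentPolicyManager and OffpolicyTrainer arguments follow the 1.0.0 tic-tac-toe example, and the collectors, args and reward_metric are assumed to be set up as there.

# Rough sketch of the alternative: one shared RainbowPolicy instance for both
# agent slots, instead of two policies that only share the network.
from tianshou.policy import MultiAgentPolicyManager
from tianshou.trainer import OffpolicyTrainer

shared_policy = agent1  # reuse the exact same policy object for both agent ids
policy = MultiAgentPolicyManager(policies=[shared_policy, shared_policy], env=env)

result = OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=args.epoch,
    step_per_epoch=args.step_per_epoch,
    step_per_collect=args.step_per_collect,
    episode_per_test=args.test_num,
    batch_size=args.batch_size,
    update_per_step=args.update_per_step,
    test_in_train=False,
    # reward_metric receives the per-episode rewards of all agents; this is
    # where I see rews[:, 1] staying at 0.
    reward_metric=reward_metric,
).run()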

Tianshou: 1.0.0
PettingZoo: 1.24.3

Thank you for your time!

@MischaPanch (Collaborator) commented:

Hi @Legendorik

The core team that has been working on Tianshou for the last six months has deprioritized MARL: there are just too many other things that need fixing first. Also, it's more of a niche topic.

I'm afraid that for the foreseeable future we won't be able to help you with this, but I'm happy to review a PR if you decide to dive deeper into MARL with Tianshou.

Maybe @Trinkle23897 or @ChenDRAG can help, though.

@MischaPanch added the question and MARL labels on May 5, 2024