Tianze Zhou Fubiao Zhang Kun Shao Zipeng Dai Kai Li 0022 Wenhan Huang Weixun Wang Bin Wang 0034 Dong Li 0016 Wulong Liu Jianye Hao Cooperative Multiagent Transfer Learning With Coalition Pattern Decomposition. 352-364 2024 June 16 IEEE Trans. Games 2 https://doi.org/10.1109/TG.2023.3272386 db/journals/tciaig/tciaig16.html#ZhouZSDLHWWLLH24
Shengyi Huang Michael Noukhovitch Arian Hosseini Kashif Rasul Weixun Wang Lewis Tunstall The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization. 2024 abs/2403.17031 CoRR https://doi.org/10.48550/arXiv.2403.17031 db/journals/corr/corr2403.html#abs-2403-17031
Jian Hu Xibin Wu Weixun Wang Xianyu Dehao Zhang Yu Cao OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework. 2024 abs/2405.11143 CoRR https://doi.org/10.48550/arXiv.2405.11143 db/journals/corr/corr2405.html#abs-2405-11143
Shilong Li Yancheng He Hui Huang Xingyuan Bu Jiaheng Liu Hangyu Guo Weixun Wang Jihao Gu Wenbo Su Bo Zheng 0007 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision. 2024 abs/2410.19720 CoRR https://doi.org/10.48550/arXiv.2410.19720 db/journals/corr/corr2410.html#abs-2410-19720
Tianpei Yang Weixun Wang Jianye Hao Matthew E. Taylor Yong Liu 0007 Xiaotian Hao Yujing Hu Yingfeng Chen Changjie Fan Chunxu Ren Ye Huang Jiangcheng Zhu Yang Gao 0001 ASN: action semantics network for multiagent reinforcement learning. 45 2023 October 37 Auton. Agents Multi Agent Syst. 2 https://doi.org/10.1007/s10458-023-09628-3 db/journals/aamas/aamas37.html#YangWHTLHHCFRHZG23
Siyi Hu Yifan Zhong Minquan Gao Weixun Wang Hao Dong 0003 Xiaodan Liang Zhihui Li 0001 Xiaojun Chang Yaodong Yang 0001 MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library. 315:1-315:23 2023 24 J. Mach. Learn. Res. http://jmlr.org/papers/v24/23-0378.html db/journals/jmlr/jmlr24.html#HuZGW0L0C023
Jian Zhao 0018 Youpeng Zhao Weixun Wang Mingyu Yang Xunhan Hu Wengang Zhou Jianye Hao Houqiang Li Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents. 1032-1042 2022 23 Frontiers Inf. Technol. Electron. Eng. 7 https://doi.org/10.1631/FITEE.2100594 db/journals/jzusc/jzusc23.html#ZhaoZWYHZHL22
Jian Zhao 0018 Yue Zhang Xunhan Hu Weixun Wang Wengang Zhou Jianye Hao Jiangcheng Zhu Houqiang Li Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization. 2022 abs/2202.04427 CoRR https://arxiv.org/abs/2202.04427 db/journals/corr/corr2202.html#abs-2202-04427
Xiaotian Hao Weixun Wang Hangyu Mao Yaodong Yang 0002 Dong Li 0016 Yan Zheng 0002 Zhen Wang 0004 Jianye Hao API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks. 2022 abs/2203.05285 CoRR https://doi.org/10.48550/arXiv.2203.05285 db/journals/corr/corr2203.html#abs-2203-05285
Jian Zhao 0018 Youpeng Zhao Weixun Wang Mingyu Yang Xunhan Hu Wengang Zhou Jianye Hao Houqiang Li Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents. 2022 abs/2203.08454 CoRR https://doi.org/10.48550/arXiv.2203.08454 db/journals/corr/corr2203.html#abs-2203-08454
Shengyi Huang Anssi Kanervisto Antonin Raffin Weixun Wang Santiago Ontañón Rousslan Fernand Julien Dossa A2C is a special case of PPO. 2022 abs/2205.09123 CoRR https://doi.org/10.48550/arXiv.2205.09123 db/journals/corr/corr2205.html#abs-2205-09123
Wei Qiu 0001 Weixun Wang Rundong Wang Bo An 0001 Yujing Hu Svetlana Obraztsova Zinovi Rabinovich Jianye Hao Yingfeng Chen Changjie Fan Off-Beat Multi-Agent Reinforcement Learning. 2022 abs/2205.13718 CoRR https://doi.org/10.48550/arXiv.2205.13718 db/journals/corr/corr2205.html#abs-2205-13718
Siyi Hu Yifan Zhong Minquan Gao Weixun Wang Hao Dong 0003 Zhihui Li 0001 Xiaodan Liang Xiaojun Chang Yaodong Yang 0001 MARLlib: Extending RLlib for Multi-agent Reinforcement Learning. 2022 abs/2210.13708 CoRR https://doi.org/10.48550/arXiv.2210.13708 db/journals/corr/corr2210.html#abs-2210-13708
Tianze Zhou Fubiao Zhang Kun Shao Kai Li 0022 Wenhan Huang Jun Luo 0009 Weixun Wang Yaodong Yang 0002 Hangyu Mao Bin Wang 0034 Dong Li 0016 Wulong Liu Jianye Hao Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment. 2021 abs/2106.00517 CoRR https://arxiv.org/abs/2106.00517 db/journals/corr/corr2106.html#abs-2106-00517
Peng Zhang Jianye Hao Weixun Wang Hongyao Tang Yi Ma 0005 Yihai Duan Yan Zheng 0002 KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge. 2020 abs/2002.07418 CoRR https://arxiv.org/abs/2002.07418 db/journals/corr/corr2002.html#abs-2002-07418
Tianpei Yang Weixun Wang Hongyao Tang Jianye Hao Zhaopeng Meng Wulong Liu Yujing Hu Yingfeng Chen Learning When to Transfer among Agents: An Efficient Multiagent Transfer Learning Framework. 2020 abs/2002.08030 CoRR https://arxiv.org/abs/2002.08030 db/journals/corr/corr2002.html#abs-2002-08030
Tianpei Yang Jianye Hao Zhaopeng Meng Zongzhang Zhang Weixun Wang Yujing Hu Yingfeng Chen Changjie Fan Zhaodong Wang Jiajie Peng Efficient Deep Reinforcement Learning through Policy Transfer. 2020 abs/2002.08037 CoRR https://arxiv.org/abs/2002.08037 db/journals/corr/corr2002.html#abs-2002-08037
Xiaotian Hao Junqi Jin Jianye Hao Jin Li Weixun Wang Yi Ma 0005 Zhenzhe Zheng Han Li Jian Xu 0015 Kun Gai Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising. 2020 abs/2005.04355 CoRR https://arxiv.org/abs/2005.04355 db/journals/corr/corr2005.html#abs-2005-04355
Yujing Hu Weixun Wang Hangtian Jia Yixiang Wang Yingfeng Chen Jianye Hao Feng Wu 0001 Changjie Fan Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping. 2020 abs/2011.02669 CoRR https://arxiv.org/abs/2011.02669 db/journals/corr/corr2011.html#abs-2011-02669
Weixun Wang Tianpei Yang Yong Liu 0007 Jianye Hao Xiaotian Hao Yujing Hu Yingfeng Chen Changjie Fan Yang Gao 0001 Action Semantics Network: Considering the Effects of Actions in Multiagent Systems. 2019 abs/1907.11461 CoRR http://arxiv.org/abs/1907.11461 db/journals/corr/corr1907.html#abs-1907-11461
Weixun Wang Tianpei Yang Yong Liu 0007 Jianye Hao Xiaotian Hao Yujing Hu Yingfeng Chen Changjie Fan Yang Gao 0001 From Few to More: Large-scale Dynamic Multiagent Curriculum Learning. 2019 abs/1909.02790 CoRR http://arxiv.org/abs/1909.02790 db/journals/corr/corr1909.html#abs-1909-02790
Xiaotian Hao Weixun Wang Jianye Hao Yaodong Yang 0002 Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems. 2019 abs/1909.11468 CoRR http://arxiv.org/abs/1909.11468 db/journals/corr/corr1909.html#abs-1909-11468
Yong Liu 0007 Weixun Wang Yujing Hu Jianye Hao Xingguo Chen Yang Gao 0001 Multi-Agent Game Abstraction via Graph Attention Neural Network. 2019 abs/1911.10715 CoRR http://arxiv.org/abs/1911.10715 db/journals/corr/corr1911.html#abs-1911-10715
Weixun Wang Jianye Hao Yixi Wang Matthew E. Taylor Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach. 2018 abs/1803.00162 CoRR http://arxiv.org/abs/1803.00162 db/journals/corr/corr1803.html#abs-1803-00162
Weixun Wang Junqi Jin Jianye Hao Chunjie Chen 0004 Chuan Yu Weinan Zhang 0001 Jun Wang 0012 Yixi Wang Han Li Jian Xu 0015 Kun Gai Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. 2018 abs/1809.03149 CoRR http://arxiv.org/abs/1809.03149 db/journals/corr/corr1809.html#abs-1809-03149
Weixun Wang Sanjay Ranka Prabhat Mishra 0001 Energy-aware dynamic slack allocation for real-time multitasking systems. 128-137 2012 2 Sustain. Comput. Informatics Syst. 3 https://doi.org/10.1016/j.suscom.2012.04.001 db/journals/suscom/suscom2.html#WangRM12
Xiaoke Qin Weixun Wang Prabhat Mishra 0001 TCEC: Temperature and Energy-Constrained Scheduling in Real-Time Multitasking Systems. 1159-1168 2012 31 IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 8 https://doi.org/10.1109/TCAD.2012.2190824 db/journals/tcad/tcad31.html#QinWM12
Weixun Wang Prabhat Mishra 0001 Ann Gordon-Ross Dynamic Cache Reconfiguration for Soft Real-Time Systems. 28:1-28:31 2012 11 ACM Trans. Embed. Comput. Syst. 2 https://doi.org/10.1145/2220336.2220340 db/journals/tecs/tecs11.html#WangMG12
Weixun Wang Prabhat Mishra 0001 System-Wide Leakage-Aware Energy Minimization Using Dynamic Voltage Scaling and Cache Reconfiguration in Multitasking Systems. 902-910 2012 20 IEEE Trans. Very Large Scale Integr. Syst. 5 https://doi.org/10.1109/TVLSI.2011.2116814 db/journals/tvlsi/tvlsi20.html#WangM12
Kanad Basu Subrata Mitra Srishti Mukherjee Weixun Wang A Novel Approach for Handling Misbehaving Nodes in Behavior-Aware Mobile Networking. 2012 abs/1211.1736 CoRR http://arxiv.org/abs/1211.1736 db/journals/corr/corr1211.html#abs-1211-1736
Weixun Wang Prabhat Mishra 0001 Dynamic Reconfiguration of Two-Level Cache Hierarchy in Real-Time Embedded Systems. 17-28 2011 7 J. Low Power Electron. 1 https://doi.org/10.1166/jolpe.2011.1113 db/journals/jolpe/jolpe7.html#WangM11
Weixun Wang Sanjay Ranka Prabhat Mishra 0001 Energy-aware dynamic reconfiguration algorithms for real-time multitasking systems. 35-45 2011 1 Sustain. Comput. Informatics Syst. 1 https://doi.org/10.1016/j.suscom.2010.10.006 db/journals/suscom/suscom1.html#WangRM11