Keming Lu Bowen Yu 0002 Fei Huang 0004 Yang Fan Runji Lin Chang Zhou Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment. 2024 abs/2405.17931 CoRR https://doi.org/10.48550/arXiv.2405.17931 db/journals/corr/corr2405.html#abs-2405-17931
Bofei Gao Zefan Cai Runxin Xu Peiyi Wang Ce Zheng Runji Lin Keming Lu Junyang Lin Chang Zhou Wen Xiao Junjie Hu Tianyu Liu 0001 Baobao Chang LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback. 2024 abs/2406.14024 CoRR https://doi.org/10.48550/arXiv.2406.14024 db/journals/corr/corr2406.html#abs-2406-14024
An Yang Baosong Yang Binyuan Hui Bo Zheng 0007 Bowen Yu 0002 Chang Zhou Chengpeng Li Chengyuan Li Dayiheng Liu Fei Huang 0004 Guanting Dong Haoran Wei Huan Lin Jialong Tang Jialin Wang Jian Yang 0003 Jianhong Tu Jianwei Zhang 0012 Jianxin Ma Jianxin Yang Jin Xu Jingren Zhou Jinze Bai Jinzheng He Junyang Lin Kai Dang Keming Lu Keqin Chen Kexin Yang 0002 Mei Li Mingfeng Xue Na Ni Pei Zhang 0011 Peng Wang 0028 Ru Peng Rui Men Ruize Gao Runji Lin Shijie Wang Shuai Bai Sinan Tan Tianhang Zhu Tianhao Li Tianyu Liu 0001 Wenbin Ge Xiaodong Deng Xiaohuan Zhou Xingzhang Ren Xinyu Zhang 0017 Xipin Wei Xuancheng Ren Xuejing Liu Yang Fan Yang Yao Yichang Zhang Yu Wan 0004 Yunfei Chu Yuqiong Liu Zeyu Cui Zhenru Zhang Zhifang Guo Zhihao Fan Qwen2 Technical Report. 2024 abs/2407.10671 CoRR https://doi.org/10.48550/arXiv.2407.10671 db/journals/corr/corr2407.html#abs-2407-10671
Luo Ji Runji Lin Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence. 2024 abs/2409.07341 CoRR https://doi.org/10.48550/arXiv.2409.07341 db/journals/corr/corr2409.html#abs-2409-07341
An Yang Beichen Zhang Binyuan Hui Bofei Gao Bowen Yu 0002 Chengpeng Li Dayiheng Liu Jianhong Tu Jingren Zhou Junyang Lin Keming Lu Mingfeng Xue Runji Lin Tianyu Liu 0001 Xingzhang Ren Zhenru Zhang Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement. 2024 abs/2409.12122 CoRR https://doi.org/10.48550/arXiv.2409.12122 db/journals/corr/corr2409.html#abs-2409-12122
Muning Wen Runji Lin Hanjing Wang Yaodong Yang 0001 Ying Wen 0001 Luo Mai Jun Wang 0012 Hai-Feng Zhang 0002 Weinan Zhang 0001 Large sequence models for sequential decision-making: a survey. Frontiers Comput. Sci. 17(6): 176349 (December 2023) https://doi.org/10.1007/s11704-023-2689-5 db/journals/fcsc/fcsc17.html#WenLWYWMWZZ23
Muning Wen Runji Lin Hanjing Wang Yaodong Yang 0001 Ying Wen 0001 Luo Mai Jun Wang 0012 Haifeng Zhang 0002 Weinan Zhang 0001 Large Sequence Models for Sequential Decision-Making: A Survey. 2023 abs/2306.13945 CoRR https://doi.org/10.48550/arXiv.2306.13945 db/journals/corr/corr2306.html#abs-2306-13945
Keming Lu Hongyi Yuan Zheng Yuan 0002 Runji Lin Junyang Lin Chuanqi Tan Chang Zhou Jingren Zhou #InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models. 2023 abs/2308.07074 CoRR https://doi.org/10.48550/arXiv.2308.07074 db/journals/corr/corr2308.html#abs-2308-07074
Jinze Bai Shuai Bai Yunfei Chu Zeyu Cui Kai Dang Xiaodong Deng Yang Fan Wenbin Ge Yu Han Fei Huang 0004 Binyuan Hui Luo Ji Mei Li Junyang Lin Runji Lin Dayiheng Liu Gao Liu Chengqiang Lu Keming Lu Jianxin Ma Rui Men Xingzhang Ren Xuancheng Ren Chuanqi Tan Sinan Tan Jianhong Tu Peng Wang 0028 Shijie Wang Wei Wang 0225 Shengguang Wu Benfeng Xu Jin Xu An Yang Hao Yang 0006 Jian Yang 0003 Shusheng Yang Yang Yao Bowen Yu 0002 Hongyi Yuan Zheng Yuan 0002 Jianwei Zhang 0012 Xingxuan Zhang Yichang Zhang Zhenru Zhang Chang Zhou Jingren Zhou Xiaohuan Zhou Tianhang Zhu Qwen Technical Report. 2023 abs/2309.16609 CoRR https://doi.org/10.48550/arXiv.2309.16609 db/journals/corr/corr2309.html#abs-2309-16609
Keming Lu Hongyi Yuan Runji Lin Junyang Lin Zheng Yuan 0002 Chang Zhou Jingren Zhou Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models. 2023 abs/2311.08692 CoRR https://doi.org/10.48550/arXiv.2311.08692 db/journals/corr/corr2311.html#abs-2311-08692
Weiyu Ma Qirui Mi Xue Yan Yuqiao Wu Runji Lin Haifeng Zhang 0002 Jun Wang 0012 Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach. 2023 abs/2312.11865 CoRR https://doi.org/10.48550/arXiv.2312.11865 db/journals/corr/corr2312.html#abs-2312-11865
Muning Wen Jakub Grudzien Kuba Runji Lin Weinan Zhang 0001 Ying Wen 0001 Jun Wang 0012 Yaodong Yang 0001 Multi-Agent Reinforcement Learning is a Sequence Modeling Problem. 2022 abs/2205.14953 CoRR https://doi.org/10.48550/arXiv.2205.14953 db/journals/corr/corr2205.html#abs-2205-14953
Yali Du 0001 Chengdong Ma Yuchen Liu Runji Lin Hao Dong 0003 Jun Wang 0012 Yaodong Yang 0001 Fully Decentralized Model-based Policy Optimization for Networked Systems. 2022 abs/2207.06559 CoRR https://doi.org/10.48550/arXiv.2207.06559 db/journals/corr/corr2207.html#abs-2207-06559
Runji Lin Ye Li 0015 Xidong Feng Zhaowei Zhang Xian Hong Wu Fung Haifeng Zhang 0002 Jun Wang 0012 Yali Du 0001 Yaodong Yang 0001 Contextual Transformer for Offline Meta Reinforcement Learning. 2022 abs/2211.08016 CoRR https://doi.org/10.48550/arXiv.2211.08016 db/journals/corr/corr2211.html#abs-2211-08016