Link to original content: https://dblp.dagstuhl.de/pid/64/10094.xml

Marc Lanctot

David Sychrovsky Michal Sustr Elnaz Davoodi Michael Bowling Marc Lanctot Martin Schmid Learning Not to Regret. 15202-15210 2024 AAAI https://doi.org/10.1609/aaai.v38i14.29443 conf/aaai/2024 db/conf/aaai/aaai2024.html#SychrovskySDBLS24

Ian Gemp Marc Lanctot Luke Marris Yiran Mao Edgar A. Duéñez-Guzmán Sarah Perrin Andras Gyorgy Romuald Elie Georgios Piliouras Michael Kaisers Daniel Hennes Kalesha Bullard Kate Larson Yoram Bachrach Approximating the Core via Iterative Coalition Sampling. 669-678 2024 AAMAS https://dl.acm.org/doi/10.5555/3635637.3662919 conf/atal/2024 db/conf/atal/aamas2024.html#GempLMMDPGEPKHB24

Siqi Liu 0002 Luke Marris Marc Lanctot Georgios Piliouras Joel Z. Leibo Nicolas Heess Neural Population Learning beyond Symmetric Zero-Sum Games. 1247-1255 2024 AAMAS https://dl.acm.org/doi/10.5555/3635637.3662982 conf/atal/2024 db/conf/atal/aamas2024.html#LiuMLPLH24

Siqi Liu 0002 Luke Marris Marc Lanctot Georgios Piliouras Joel Z. Leibo Nicolas Heess Neural Population Learning beyond Symmetric Zero-sum Games. 2024 abs/2401.05133 CoRR https://doi.org/10.48550/arXiv.2401.05133 db/journals/corr/corr2401.html#abs-2401-05133

Ian Gemp Yoram Bachrach Marc Lanctot Roma Patel Vibhavari Dasagi Luke Marris Georgios Piliouras Siqi Liu 0002 Karl Tuyls States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers. 2024 abs/2402.01704 CoRR https://doi.org/10.48550/arXiv.2402.01704 db/journals/corr/corr2402.html#abs-2402-01704

Ian Gemp Marc Lanctot Luke Marris Yiran Mao Edgar A. Duéñez-Guzmán Sarah Perrin Andras Gyorgy Romuald Elie Georgios Piliouras Michael Kaisers Daniel Hennes Kalesha Bullard Kate Larson Yoram Bachrach Approximating the Core via Iterative Coalition Sampling. 2024 abs/2402.03928 CoRR https://doi.org/10.48550/arXiv.2402.03928 db/journals/corr/corr2402.html#abs-2402-03928

Luca D'Amico-Wong Hugh Zhang Marc Lanctot David C. Parkes Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization. 2024 abs/2402.11835 CoRR https://doi.org/10.48550/arXiv.2402.11835 db/journals/corr/corr2402.html#abs-2402-11835

Heymann Benjamin Marc Lanctot Learning in Games with progressive hiding. 2024 abs/2409.03875 CoRR https://doi.org/10.48550/arXiv.2409.03875 db/journals/corr/corr2409.html#abs-2409-03875 streams/journals/corr

Marc Lanctot John Schultz Neil Burch Max Olan Smith Daniel Hennes Thomas Anthony 0001 Julien Pérolat Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning. 2023 2023 Trans. Mach. Learn. Res. https://openreview.net/forum?id=gQnJ7ODIAx db/journals/tmlr/tmlr2023.html#LanctotSBSH0P23

Zun Li 0002 Marc Lanctot Kevin R. McKee Luke Marris Ian Gemp Daniel Hennes Kate Larson Yoram Bachrach Michael P. Wellman Paul Muller Search-Improved Game-Theoretic Multiagent Reinforcement Learning in General and Negotiation Games. 2445-2447 2023 AAMAS https://dl.acm.org/doi/10.5555/3545946.3598962 conf/atal/2023 db/conf/atal/aamas2023.html#LiLMMGHLBWM23

Stephen Marcus McAleer Gabriele Farina Marc Lanctot Tuomas Sandholm ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret. 2023 ICLR https://openreview.net/forum?id=35QyoZv8cKO conf/iclr/2023 db/conf/iclr/iclr2023.html#McAleerFLS23

Samuel Sokota Ryan D'Orazio J. Zico Kolter Nicolas Loizou Marc Lanctot Ioannis Mitliagkas Noam Brown Christian Kroer A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games. 2023 ICLR https://openreview.net/forum?id=DpE5UYUQzZH conf/iclr/2023 db/conf/iclr/iclr2023.html#SokotaDKLLMBK23

Zun Li 0002 Marc Lanctot Kevin R. McKee Luke Marris Ian Gemp Daniel Hennes Paul Muller Kate Larson Yoram Bachrach Michael P. Wellman Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning. 2023 abs/2302.00797 CoRR https://doi.org/10.48550/arXiv.2302.00797 db/journals/corr/corr2302.html#abs-2302-00797

David Sychrovsky Michal Sustr Elnaz Davoodi Marc Lanctot Martin Schmid Learning not to Regret. 2023 abs/2303.01074 CoRR https://doi.org/10.48550/arXiv.2303.01074 db/journals/corr/corr2303.html#abs-2303-01074

Marc Lanctot John Schultz Neil Burch Max Olan Smith Daniel Hennes Thomas W. Anthony 0001 Julien Pérolat Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning. 2023 abs/2303.03196 CoRR https://doi.org/10.48550/arXiv.2303.03196 db/journals/corr/corr2303.html#abs-2303-03196

Marc Lanctot Kate Larson Yoram Bachrach Luke Marris Zun Li 0002 Avishkar Bhoopchand Thomas W. Anthony 0001 Brian Tanner Anna Koop Evaluating Agents using Social Choice Theory. 2023 abs/2312.03121 CoRR https://doi.org/10.48550/arXiv.2312.03121 db/journals/corr/corr2312.html#abs-2312-03121

Ian Gemp Thomas W. Anthony 0001 Yoram Bachrach Avishkar Bhoopchand Kalesha Bullard Jerome T. Connor Vibhavari Dasagi Bart De Vylder Edgar A. Duéñez-Guzmán Romuald Elie Richard Everett 0001 Daniel Hennes Edward Hughes 0001 Mina Khan Marc Lanctot Kate Larson Guy Lever Siqi Liu 0002 Luke Marris Kevin R. McKee Paul Muller Julien Pérolat Florian Strub Andrea Tacchetti Eugene Tarassov Zhe Wang Karl Tuyls Developing, evaluating and scaling learning agents in multi-agent environments. 271-284 2022 35 AI Commun. 4 https://doi.org/10.3233/AIC-220113 db/journals/aicom/aicom35.html#GempABBBCDVDEEH22

Ian Gemp Rahul Savani Marc Lanctot Yoram Bachrach Thomas W. Anthony 0001 Richard Everett 0001 Andrea Tacchetti Tom Eccles János Kramár Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent. 507-515 2022 AAMAS https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p507.pdf https://dl.acm.org/doi/10.5555/3535850.3535908 conf/atal/2022 db/conf/atal/aamas2022.html#GempSLBA0TEK22

Siqi Liu 0002 Marc Lanctot Luke Marris Nicolas Heess Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games. 13793-13806 2022 ICML https://proceedings.mlr.press/v162/liu22h.html conf/icml/2022 db/conf/icml/icml2022.html#LiuLMH22

Finbarr Timbers Nolan Bard Edward Lockhart Marc Lanctot Martin Schmid Neil Burch Julian Schrittwieser Thomas Hubert Michael Bowling Approximate Exploitability: Learning a Best Response. 3487-3493 2022 IJCAI https://doi.org/10.24963/ijcai.2022/484 conf/ijcai/2022 db/conf/ijcai/ijcai2022.html#TimbersBLLSBSHB22

Julien Pérolat Bart De Vylder Daniel Hennes Eugene Tarassov Florian Strub Vincent de Boer Paul Muller Jerome T. Connor Neil Burch Thomas Anthony 0001 Stephen McAleer Romuald Elie Sarah H. Cen Zhe Wang Audrunas Gruslys Aleksandra Malysheva Mina Khan Sherjil Ozair Finbarr Timbers Toby Pohlen Tom Eccles Mark Rowland 0001 Marc Lanctot Jean-Baptiste Lespiau Bilal Piot Shayegan Omidshafiei Edward Lockhart Laurent Sifre Nathalie Beauguerlange Rémi Munos David Silver Satinder Singh 0001 Demis Hassabis Karl Tuyls Figure Data for the paper "Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning". 2022 October https://doi.org/10.5281/zenodo.7118519 Zenodo streams/repo/zenodo

Stephen McAleer Kevin Wang 0003 John B. Lanier Marc Lanctot Pierre Baldi Tuomas Sandholm Roy Fox Anytime PSRO for Two-Player Zero-Sum Games. 2022 abs/2201.07700 CoRR https://arxiv.org/abs/2201.07700 db/journals/corr/corr2201.html#abs-2201-07700

Siqi Liu 0002 Marc Lanctot Luke Marris Nicolas Heess Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games. 2022 abs/2205.15879 CoRR https://doi.org/10.48550/arXiv.2205.15879 db/journals/corr/corr2205.html#abs-2205-15879

Stephen McAleer Gabriele Farina Marc Lanctot Tuomas Sandholm ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret. 2022 abs/2206.04122 CoRR https://doi.org/10.48550/arXiv.2206.04122 db/journals/corr/corr2206.html#abs-2206-04122

Samuel Sokota Ryan D'Orazio J. Zico Kolter Nicolas Loizou Marc Lanctot Ioannis Mitliagkas Noam Brown Christian Kroer A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games. 2022 abs/2206.05825 CoRR https://doi.org/10.48550/arXiv.2206.05825 db/journals/corr/corr2206.html#abs-2206-05825

Julien Pérolat Bart De Vylder Daniel Hennes Eugene Tarassov Florian Strub Vincent de Boer Paul Muller Jerome T. Connor Neil Burch Thomas W. Anthony 0001 Stephen McAleer Romuald Elie Sarah H. Cen Zhe Wang Audrunas Gruslys Aleksandra Malysheva Mina Khan Sherjil Ozair Finbarr Timbers Toby Pohlen Tom Eccles Mark Rowland 0001 Marc Lanctot Jean-Baptiste Lespiau Bilal Piot Shayegan Omidshafiei Edward Lockhart Laurent Sifre Nathalie Beauguerlange Rémi Munos David Silver Satinder Singh 0001 Demis Hassabis Karl Tuyls Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning. 2022 abs/2206.15378 CoRR https://doi.org/10.48550/arXiv.2206.15378 db/journals/corr/corr2206.html#abs-2206-15378

Ian Gemp Thomas W. Anthony 0001 Yoram Bachrach Avishkar Bhoopchand Kalesha Bullard Jerome T. Connor Vibhavari Dasagi Bart De Vylder Edgar A. Duéñez-Guzmán Romuald Elie Richard Everett 0001 Daniel Hennes Edward Hughes 0001 Mina Khan Marc Lanctot Kate Larson Guy Lever Siqi Liu 0002 Luke Marris Kevin R. McKee Paul Muller Julien Pérolat Florian Strub Andrea Tacchetti Eugene Tarassov Zhe Wang Karl Tuyls Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments. 2022 abs/2209.10958 CoRR https://doi.org/10.48550/arXiv.2209.10958 db/journals/corr/corr2209.html#abs-2209-10958

Luke Marris Marc Lanctot Ian Gemp Shayegan Omidshafiei Stephen McAleer Jerome T. Connor Karl Tuyls Thore Graepel Game Theoretic Rating in N-player general-sum games with Equilibria. 2022 abs/2210.02205 CoRR https://doi.org/10.48550/arXiv.2210.02205 db/journals/corr/corr2210.html#abs-2210-02205

Dustin Morrill Ryan D'Orazio Reca Sarfati Marc Lanctot James R. Wright Amy R. Greenwald Michael Bowling Hindsight and Sequential Rationality of Correlated Play. 5584-5594 2021 AAAI https://doi.org/10.1609/aaai.v35i6.16702 conf/aaai/2021 db/conf/aaai/aaai2021.html#MorrillDSLWGB21

Samuel Sokota Edward Lockhart Finbarr Timbers Elnaz Davoodi Ryan D'Orazio Neil Burch Martin Schmid Michael Bowling Marc Lanctot Solving Common-Payoff Games with Approximate Policy Iteration. 9695-9703 2021 AAAI https://doi.org/10.1609/aaai.v35i11.17166 conf/aaai/2021 db/conf/aaai/aaai2021.html#SokotaLTDDBSBL21

Michal Sustr Martin Schmid Matej Moravcík Neil Burch Marc Lanctot Michael Bowling Sound Algorithms in Imperfect Information Games. 1674-1676 2021 AAMAS https://www.ifaamas.org/Proceedings/aamas2021/pdfs/p1674.pdf https://dl.acm.org/doi/10.5555/3463952.3464197 conf/atal/2021 db/conf/atal/aamas2021.html#SustrSMBLB21

Luke Marris Paul Muller Marc Lanctot Karl Tuyls Thore Graepel Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers. 7480-7491 2021 ICML http://proceedings.mlr.press/v139/marris21a.html conf/icml/2021 db/conf/icml/icml2021.html#MarrisMLTG21

Dustin Morrill Ryan D'Orazio Marc Lanctot James R. Wright Michael Bowling Amy R. Greenwald Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games. 7818-7828 2021 ICML http://proceedings.mlr.press/v139/morrill21a.html conf/icml/2021 db/conf/icml/icml2021.html#MorrillDLWBG21

Julien Pérolat Rémi Munos Jean-Baptiste Lespiau Shayegan Omidshafiei Mark Rowland 0001 Pedro A. Ortega Neil Burch Thomas W. Anthony 0001 David Balduzzi Bart De Vylder Georgios Piliouras Marc Lanctot Karl Tuyls From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization. 8525-8535 2021 ICML http://proceedings.mlr.press/v139/perolat21a.html conf/icml/2021 db/conf/icml/icml2021.html#PerolatMLOROBAB21

Abhinav Gupta 0002 Marc Lanctot Angeliki Lazaridou Dynamic population-based meta-learning for multi-agent communication with natural language. 16899-16912 2021 NeurIPS https://proceedings.neurips.cc/paper/2021/hash/8caa38721906c1a0bb95c80fab33a893-Abstract.html conf/nips/2021 db/conf/nips/neurips2021.html#GuptaLL21

Samuel Sokota Edward Lockhart Finbarr Timbers Elnaz Davoodi Ryan D'Orazio Neil Burch Martin Schmid Michael Bowling Marc Lanctot Solving Common-Payoff Games with Approximate Policy Iteration. 2021 abs/2101.04237 CoRR https://arxiv.org/abs/2101.04237 db/journals/corr/corr2101.html#abs-2101-04237

Dustin Morrill Ryan D'Orazio Marc Lanctot James R. Wright Michael Bowling Amy Greenwald Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games. 2021 abs/2102.06973 CoRR https://arxiv.org/abs/2102.06973 db/journals/corr/corr2102.html#abs-2102-06973

Ian Gemp Rahul Savani Marc Lanctot Yoram Bachrach Thomas W. Anthony 0001 Richard Everett 0001 Andrea Tacchetti Tom Eccles János Kramár Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent. 2021 abs/2106.01285 CoRR https://arxiv.org/abs/2106.01285 db/journals/corr/corr2106.html#abs-2106-01285

Luke Marris Paul Muller Marc Lanctot Karl Tuyls Thore Graepel Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers. 2021 abs/2106.09435 CoRR https://arxiv.org/abs/2106.09435 db/journals/corr/corr2106.html#abs-2106-09435

Abhinav Gupta 0002 Marc Lanctot Angeliki Lazaridou Dynamic population-based meta-learning for multi-agent communication with natural language. 2021 abs/2110.14241 CoRR https://arxiv.org/abs/2110.14241 db/journals/corr/corr2110.html#abs-2110-14241

Martin Schmid Matej Moravcik Neil Burch Rudolf Kadlec Joshua Davidson Kevin Waugh Nolan Bard Finbarr Timbers Marc Lanctot G. Zacharias Holland Elnaz Davoodi Alden Christianson Michael Bowling Player of Games. 2021 abs/2112.03178 CoRR https://arxiv.org/abs/2112.03178 db/journals/corr/corr2112.html#abs-2112-03178

Karl Tuyls Julien Pérolat Marc Lanctot Edward Hughes 0001 Richard Everett 0001 Joel Z. Leibo Csaba Szepesvári Thore Graepel Bounds and dynamics for empirical game theoretic analysis. 7 2020 34 Auton. Agents Multi Agent Syst. 1 https://doi.org/10.1007/s10458-019-09432-y db/journals/aamas/aamas34.html#TuylsPLHELSG20

Nolan Bard Jakob N. Foerster Sarath Chandar Neil Burch Marc Lanctot H. Francis Song Emilio Parisotto Vincent Dumoulin Subhodeep Moitra Edward Hughes 0001 Iain Dunning Shibl Mourad Hugo Larochelle Marc G. Bellemare Michael Bowling The Hanabi challenge: A new frontier for AI research. 103216 2020 280 Artif. Intell. https://doi.org/10.1016/j.artint.2019.103216 db/journals/ai/ai280.html#BardFCBLSPDMHDM20

Yoram Bachrach Richard Everett 0001 Edward Hughes 0001 Angeliki Lazaridou Joel Z. Leibo Marc Lanctot Michael Johanson Wojciech M. Czarnecki Thore Graepel Negotiating team formation using deep reinforcement learning. 103356 2020 288 Artif. Intell. https://doi.org/10.1016/j.artint.2020.103356 db/journals/ai/ai288.html#BachrachEHLLLJC20

Daniel Hennes Dustin Morrill Shayegan Omidshafiei Rémi Munos Julien Pérolat Marc Lanctot Audrunas Gruslys Jean-Baptiste Lespiau Paavo Parmas Edgar A. Duéñez-Guzmán Karl Tuyls Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients. 492-501 2020 AAMAS https://dl.acm.org/doi/10.5555/3398761.3398822 https://www.ifaamas.org/Proceedings/aamas2020/pdfs/p492.pdf conf/atal/2020 db/conf/atal/aamas2020.html#HennesMOMPLGLPD20

Paul Muller Shayegan Omidshafiei Mark Rowland 0001 Karl Tuyls Julien Pérolat Siqi Liu 0002 Daniel Hennes Luke Marris Marc Lanctot Edward Hughes 0001 Zhe Wang Guy Lever Nicolas Heess Thore Graepel Rémi Munos A Generalized Training Approach for Multiagent Learning. 2020 ICLR https://openreview.net/forum?id=Bkl5kxrKDr conf/iclr/2020 db/conf/iclr/iclr2020.html#MullerORTPLHMLH20

Rémi Munos Julien Pérolat Jean-Baptiste Lespiau Mark Rowland 0001 Bart De Vylder Marc Lanctot Finbarr Timbers Daniel Hennes Shayegan Omidshafiei Audrunas Gruslys Mohammad Gheshlaghi Azar Edward Lockhart Karl Tuyls Fast computation of Nash Equilibria in Imperfect Information Games. 7119-7129 2020 ICML http://proceedings.mlr.press/v119/munos20a.html conf/icml/2020 db/conf/icml/icml2020.html#MunosPLRVLTHOGA20

Thomas W. Anthony 0001 Tom Eccles Andrea Tacchetti János Kramár Ian Gemp Thomas C. Hudson Nicolas Porcel Marc Lanctot Julien Pérolat Richard Everett 0001 Satinder Singh 0001 Thore Graepel Yoram Bachrach Learning to Play No-Press Diplomacy with Best Response Policy Iteration. 2020 NeurIPS https://proceedings.neurips.cc/paper/2020/hash/d1419302db9c022ab1d48681b13d5f8b-Abstract.html conf/nips/2020 db/conf/nips/neurips2020.html#AnthonyETKGHPLP20

Julien Pérolat Rémi Munos Jean-Baptiste Lespiau Shayegan Omidshafiei Mark Rowland 0001 Pedro A. Ortega Neil Burch Thomas W. Anthony 0001 David Balduzzi Bart De Vylder Georgios Piliouras Marc Lanctot Karl Tuyls From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization. 2020 abs/2002.08456 CoRR https://arxiv.org/abs/2002.08456 db/journals/corr/corr2002.html#abs-2002-08456

Finbarr Timbers Edward Lockhart Martin Schmid Marc Lanctot Michael Bowling Approximate exploitability: Learning a best response in large games. 2020 abs/2004.09677 CoRR https://arxiv.org/abs/2004.09677 db/journals/corr/corr2004.html#abs-2004-09677

Thomas W. Anthony 0001 Tom Eccles Andrea Tacchetti János Kramár Ian Gemp Thomas C. Hudson Nicolas Porcel Marc Lanctot Julien Pérolat Richard Everett 0001 Satinder Singh 0001 Thore Graepel Yoram Bachrach Learning to Play No-Press Diplomacy with Best Response Policy Iteration. 2020 abs/2006.04635 CoRR https://arxiv.org/abs/2006.04635 db/journals/corr/corr2006.html#abs-2006-04635

Michal Sustr Martin Schmid Matej Moravcík Neil Burch Marc Lanctot Michael Bowling Sound Search in Imperfect Information Games. 2020 abs/2006.08740 CoRR https://arxiv.org/abs/2006.08740 db/journals/corr/corr2006.html#abs-2006-08740

Audrunas Gruslys Marc Lanctot Rémi Munos Finbarr Timbers Martin Schmid Julien Pérolat Dustin Morrill Vinícius Flores Zambaldi Jean-Baptiste Lespiau John Schultz Mohammad Gheshlaghi Azar Michael Bowling Karl Tuyls The Advantage Regret-Matching Actor-Critic. 2020 abs/2008.12234 CoRR https://arxiv.org/abs/2008.12234 db/journals/corr/corr2008.html#abs-2008-12234

Yoram Bachrach Richard Everett 0001 Edward Hughes 0001 Angeliki Lazaridou Joel Z. Leibo Marc Lanctot Michael Johanson Wojciech M. Czarnecki Thore Graepel Negotiating Team Formation Using Deep Reinforcement Learning. 2020 abs/2010.10380 CoRR https://arxiv.org/abs/2010.10380 db/journals/corr/corr2010.html#abs-2010-10380

Dustin Morrill Ryan D'Orazio Reca Sarfati Marc Lanctot James R. Wright Amy Greenwald Michael Bowling Hindsight and Sequential Rationality of Correlated Play. 2020 abs/2012.05874 CoRR https://arxiv.org/abs/2012.05874 db/journals/corr/corr2012.html#abs-2012-05874

Guy Barash Mauricio Castillo-Effen Niyati Chhaya Peter Clark Huáscar Espinoza Eitan Farchi Christopher W. Geib Odd Erik Gundersen Seán Ó hÉigeartaigh José Hernández-Orallo Chiori Hori Xiaowei Huang 0001 Kokil Jaidka Pavan Kapanipathi Sarah Keren Seokhwan Kim Marc Lanctot Danny Lange Julian J. McAuley David R. Martinez Marwan Mattar Mausam Martin Michalowski Reuth Mirsky Roozbeh Mottaghi Joseph C. Osborn Julien Pérolat Martin Schmid Arash Shaban-Nejad Onn Shehory Biplav Srivastava William W. Streilein Kartik Talamadupula Julian Togelius Koichiro Yoshino Quanshi Zhang Imed Zitouni Reports of the Workshops Held at the 2019 AAAI Conference on Artificial Intelligence. 67-78 2019 40 AI Mag. 3 https://doi.org/10.1609/aimag.v40i3.4981 db/journals/aim/aim40.html#BarashCCCEFGGhH19

Martin Schmid Neil Burch Marc Lanctot Matej Moravcik Rudolf Kadlec Michael Bowling Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games Using Baselines. 2157-2164 2019 AAAI https://doi.org/10.1609/aaai.v33i01.33012157 conf/aaai/2019 db/conf/aaai/aaai2019.html#SchmidBLMKB19

Edward Lockhart Marc Lanctot Julien Pérolat Jean-Baptiste Lespiau Dustin Morrill Finbarr Timbers Karl Tuyls Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent. 464-470 2019 IJCAI https://doi.org/10.24963/ijcai.2019/66 conf/ijcai/2019 db/conf/ijcai/ijcai2019.html#LockhartLPLMTT19

Nolan Bard Jakob N. Foerster Sarath Chandar Neil Burch Marc Lanctot H. Francis Song Emilio Parisotto Vincent Dumoulin Subhodeep Moitra Edward Hughes 0001 Iain Dunning Shibl Mourad Hugo Larochelle Marc G. Bellemare Michael Bowling The Hanabi Challenge: A New Frontier for AI Research. 2019 abs/1902.00506 CoRR http://arxiv.org/abs/1902.00506 db/journals/corr/corr1902.html#abs-1902-00506

Joel Z. Leibo Edward Hughes 0001 Marc Lanctot Thore Graepel Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research. 2019 abs/1903.00742 CoRR http://arxiv.org/abs/1903.00742 db/journals/corr/corr1903.html#abs-1903-00742

Shayegan Omidshafiei Christos H. Papadimitriou Georgios Piliouras Karl Tuyls Mark Rowland 0001 Jean-Baptiste Lespiau Wojciech M. Czarnecki Marc Lanctot Julien Pérolat Rémi Munos α-Rank: Multi-Agent Evaluation by Evolution. 2019 abs/1903.01373 CoRR http://arxiv.org/abs/1903.01373 db/journals/corr/corr1903.html#abs-1903-01373

Edward Lockhart Marc Lanctot Julien Pérolat Jean-Baptiste Lespiau Dustin Morrill Finbarr Timbers Karl Tuyls Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent. 2019 abs/1903.05614 CoRR http://arxiv.org/abs/1903.05614 db/journals/corr/corr1903.html#abs-1903-05614

Shayegan Omidshafiei Daniel Hennes Dustin Morrill Rémi Munos Julien Pérolat Marc Lanctot Audrunas Gruslys Jean-Baptiste Lespiau Karl Tuyls Neural Replicator Dynamics. 2019 abs/1906.00190 CoRR http://arxiv.org/abs/1906.00190 db/journals/corr/corr1906.html#abs-1906-00190

Marc Lanctot Edward Lockhart Jean-Baptiste Lespiau Vinícius Flores Zambaldi Satyaki Upadhyay Julien Pérolat Sriram Srinivasan 0005 Finbarr Timbers Karl Tuyls Shayegan Omidshafiei Daniel Hennes Dustin Morrill Paul Muller Timo Ewalds Ryan Faulkner 0001 János Kramár Bart De Vylder Brennan Saeta James Bradbury David Ding Sebastian Borgeaud Matthew Lai Julian Schrittwieser Thomas W. Anthony 0001 Edward Hughes 0001 Ivo Danihelka Jonah Ryan-Davis OpenSpiel: A Framework for Reinforcement Learning in Games. 2019 abs/1908.09453 CoRR http://arxiv.org/abs/1908.09453 db/journals/corr/corr1908.html#abs-1908-09453

Paul Muller Shayegan Omidshafiei Mark Rowland 0001 Karl Tuyls Julien Pérolat Siqi Liu 0002 Daniel Hennes Luke Marris Marc Lanctot Edward Hughes 0001 Zhe Wang Guy Lever Nicolas Heess Thore Graepel Rémi Munos A Generalized Training Approach for Multiagent Learning. 2019 abs/1909.12823 CoRR http://arxiv.org/abs/1909.12823 db/journals/corr/corr1909.html#abs-1909-12823

Todd Hester Matej Vecerík Olivier Pietquin Marc Lanctot Tom Schaul Bilal Piot Dan Horgan John Quan Andrew Sendonaris Ian Osband Gabriel Dulac-Arnold John P. Agapiou Joel Z. Leibo Audrunas Gruslys Deep Q-learning From Demonstrations. 3223-3230 2018 AAAI https://doi.org/10.1609/aaai.v32i1.11757 conf/aaai/2018 db/conf/aaai/aaai2018.html#HesterVPLSPHQSO18

Karl Tuyls Julien Pérolat Marc Lanctot Joel Z. Leibo Thore Graepel A Generalised Method for Empirical Game Theoretic Analysis. 77-85 2018 AAMAS http://dl.acm.org/citation.cfm?id=3237402 conf/atal/2018 db/conf/atal/aamas2018.html#TuylsPLLG18

Peter Sunehag Guy Lever Audrunas Gruslys Wojciech Marian Czarnecki Vinícius Flores Zambaldi Max Jaderberg Marc Lanctot Nicolas Sonnerat Joel Z. Leibo Karl Tuyls Thore Graepel Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. 2085-2087 2018 AAMAS http://dl.acm.org/citation.cfm?id=3238080 conf/atal/2018 db/conf/atal/aamas2018.html#SunehagLGCZJLSL18

Kris Cao Angeliki Lazaridou Marc Lanctot Joel Z. Leibo Karl Tuyls Stephen Clark Emergent Communication through Negotiation. 2018 ICLR (Poster) https://openreview.net/forum?id=Hk6WhagRW conf/iclr/2018 db/conf/iclr/iclr2018.html#CaoLLLTC18

Sriram Srinivasan 0005 Marc Lanctot Vinícius Flores Zambaldi Julien Pérolat Karl Tuyls Rémi Munos Michael Bowling Actor-Critic Policy Optimization in Partially Observable Multiagent Environments. 3426-3439 2018 NeurIPS https://proceedings.neurips.cc/paper/2018/hash/e22dd5dabde45eda5a1a67772c8e25dd-Abstract.html http://papers.nips.cc/paper/7602-actor-critic-policy-optimization-in-partially-observable-multiagent-environments conf/nips/2018 db/conf/nips/nips2018.html#SrinivasanLZPTM18

Karl Tuyls Julien Pérolat Marc Lanctot Joel Z. Leibo Thore Graepel A Generalised Method for Empirical Game Theoretic Analysis. 2018 abs/1803.06376 CoRR http://arxiv.org/abs/1803.06376 db/journals/corr/corr1803.html#abs-1803-06376

Kris Cao Angeliki Lazaridou Marc Lanctot Joel Z. Leibo Karl Tuyls Stephen Clark Emergent Communication through Negotiation. 2018 abs/1804.03980 CoRR http://arxiv.org/abs/1804.03980 db/journals/corr/corr1804.html#abs-1804-03980

Martin Schmid Neil Burch Marc Lanctot Matej Moravcik Rudolf Kadlec Michael Bowling Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines. 2018 abs/1809.03057 CoRR http://arxiv.org/abs/1809.03057 db/journals/corr/corr1809.html#abs-1809-03057

Sriram Srinivasan 0005 Marc Lanctot Vinícius Flores Zambaldi Julien Pérolat Karl Tuyls Rémi Munos Michael Bowling Actor-Critic Policy Optimization in Partially Observable Multiagent Environments. 2018 abs/1810.09026 CoRR http://arxiv.org/abs/1810.09026 db/journals/corr/corr1810.html#abs-1810-09026

Joel Z. Leibo Vinícius Flores Zambaldi Marc Lanctot Janusz Marecki Thore Graepel Multi-agent Reinforcement Learning in Sequential Social Dilemmas. 464-473 2017 AAMAS http://dl.acm.org/citation.cfm?id=3091194 conf/atal/2017 db/conf/atal/aamas2017.html#LeiboZLMG17

Marc Lanctot Vinícius Flores Zambaldi Audrunas Gruslys Angeliki Lazaridou Karl Tuyls Julien Pérolat David Silver Thore Graepel A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. 4190-4203 2017 NIPS https://proceedings.neurips.cc/paper/2017/hash/3323fe11e9595c09af38fe67567a9394-Abstract.html http://papers.nips.cc/paper/7007-a-unified-game-theoretic-approach-to-multiagent-reinforcement-learning conf/nips/2017 db/conf/nips/nips2017.html#LanctotZGLTPSG17

Joel Z. Leibo Vinícius Flores Zambaldi Marc Lanctot Janusz Marecki Thore Graepel Multi-agent Reinforcement Learning in Sequential Social Dilemmas. 2017 abs/1702.03037 CoRR http://arxiv.org/abs/1702.03037 db/journals/corr/corr1702.html#LeiboZLMG17

Todd Hester Matej Vecerík Olivier Pietquin Marc Lanctot Tom Schaul Bilal Piot Andrew Sendonaris Gabriel Dulac-Arnold Ian Osband John P. Agapiou Joel Z. Leibo Audrunas Gruslys Learning from Demonstrations for Real World Reinforcement Learning. 2017 abs/1704.03732 CoRR http://arxiv.org/abs/1704.03732 db/journals/corr/corr1704.html#HesterVPLSPSDOA17

Peter Sunehag Guy Lever Audrunas Gruslys Wojciech Marian Czarnecki Vinícius Flores Zambaldi Max Jaderberg Marc Lanctot Nicolas Sonnerat Joel Z. Leibo Karl Tuyls Thore Graepel Value-Decomposition Networks For Cooperative Multi-Agent Learning. 2017 abs/1706.05296 CoRR http://arxiv.org/abs/1706.05296 db/journals/corr/corr1706.html#SunehagLGCZJLSL17

Marc Lanctot Vinícius Flores Zambaldi Audrunas Gruslys Angeliki Lazaridou Karl Tuyls Julien Pérolat David Silver Thore Graepel A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. 2017 abs/1711.00832 CoRR http://arxiv.org/abs/1711.00832 db/journals/corr/corr1711.html#abs-1711-00832

Karl Tuyls Julien Pérolat Marc Lanctot Georg Ostrovski Rahul Savani Joel Z. Leibo Toby Ord Thore Graepel Shane Legg Symmetric Decomposition of Asymmetric Games. 2017 abs/1711.05074 CoRR http://arxiv.org/abs/1711.05074 db/journals/corr/corr1711.html#abs-1711-05074

David Silver Thomas Hubert Julian Schrittwieser Ioannis Antonoglou Matthew Lai Arthur Guez Marc Lanctot Laurent Sifre Dharshan Kumaran Thore Graepel Timothy P. Lillicrap Karen Simonyan Demis Hassabis Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. 2017 abs/1712.01815 CoRR http://arxiv.org/abs/1712.01815 db/journals/corr/corr1712.html#abs-1712-01815

Branislav Bosanský Viliam Lisý Marc Lanctot Jirí Cermák Mark H. M. Winands Algorithms for computing strategies in two-player simultaneous move games. 1-40 2016 237 Artif. Intell. https://doi.org/10.1016/j.artint.2016.03.005 https://www.wikidata.org/entity/Q59209595 db/journals/ai/ai237.html#BosanskyLLCW16

David Silver Aja Huang Chris J. Maddison Arthur Guez Laurent Sifre George van den Driessche 0002 Julian Schrittwieser Ioannis Antonoglou Vedavyas Panneershelvam Marc Lanctot Sander Dieleman Dominik Grewe John Nham Nal Kalchbrenner Ilya Sutskever Timothy P. Lillicrap Madeleine Leach Koray Kavukcuoglu Thore Graepel Demis Hassabis Mastering the game of Go with deep neural networks and tree search. 484-489 2016 529 Nat. 7587 https://doi.org/10.1038/nature16961 https://www.wikidata.org/entity/Q28005460 db/journals/nature/nature529.html#SilverHMGSDSAPL16

Chrisantha Fernando Dylan Banarse Malcolm Reynolds Frederic Besse David Pfau Max Jaderberg Marc Lanctot Daan Wierstra Convolution by Evolution: Differentiable Pattern Producing Networks. 109-116 2016 GECCO https://doi.org/10.1145/2908812.2908890 conf/gecco/2016 db/conf/gecco/gecco2016.html#FernandoBRBPJLW16

Ziyu Wang 0001 Tom Schaul Matteo Hessel Hado van Hasselt Marc Lanctot Nando de Freitas Dueling Network Architectures for Deep Reinforcement Learning. 1995-2003 2016 ICML http://proceedings.mlr.press/v48/wangf16.html conf/icml/2016 db/conf/icml/icml2016.html#WangSHHLF16

Audrunas Gruslys Rémi Munos Ivo Danihelka Marc Lanctot Alex Graves Memory-Efficient Backpropagation Through Time. 4125-4133 2016 NIPS https://proceedings.neurips.cc/paper/2016/hash/a501bebf79d570651ff601788ea9d16d-Abstract.html http://papers.nips.cc/paper/6221-memory-efficient-backpropagation-through-time conf/nips/2016 db/conf/nips/nips2016.html#GruslysMDLG16

Audrunas Gruslys Rémi Munos Ivo Danihelka Marc Lanctot Alex Graves Memory-Efficient Backpropagation Through Time. 2016 abs/1606.03401 CoRR http://arxiv.org/abs/1606.03401 db/journals/corr/corr1606.html#GruslysMDLG16

Viliam Lisý Marc Lanctot Michael H. Bowling Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games. 27-36 2015 AAMAS http://dl.acm.org/citation.cfm?id=2772887 conf/atal/2015 db/conf/atal/aamas2015.html#LisyLB15

Johannes Heinrich Marc Lanctot David Silver Fictitious Self-Play in Extensive-Form Games. 805-813 2015 ICML http://proceedings.mlr.press/v37/heinrich15.html conf/icml/2015 db/conf/icml/icml2015.html#HeinrichLS15

Ziyu Wang 0001 Nando de Freitas Marc Lanctot Dueling Network Architectures for Deep Reinforcement Learning. 2015 abs/1511.06581 CoRR http://arxiv.org/abs/1511.06581 db/journals/corr/corr1511.html#WangFL15

Tom Pepels Mark H. M. Winands Marc Lanctot Real-Time Monte Carlo Tree Search in Ms Pac-Man. 245-257 2014 6 IEEE Trans. Comput. Intell. AI Games 3 https://doi.org/10.1109/TCIAIG.2013.2291577 https://www.wikidata.org/entity/Q56883109 db/journals/tciaig/tciaig6.html#PepelsWL14

Marc Lanctot Further developments of extensive-form replicator dynamics using the sequence-form representation. 1257-1264 2014 AAMAS http://dl.acm.org/citation.cfm?id=2617448 conf/atal/2014 db/conf/atal/aamas2014.html#Lanctot14

Marc Lanctot Mark H. M. Winands Tom Pepels Nathan R. Sturtevant Monte Carlo Tree Search with heuristic evaluations using implicit minimax backups. 1-8 2014 CIG https://doi.org/10.1109/CIG.2014.6932903 conf/cig/2014 db/conf/cig/cig2014.html#LanctotWPS14

Mandy J. W. Tak Marc Lanctot Mark H. M. Winands Monte Carlo Tree Search variants for simultaneous move games. 1-8 2014 CIG https://doi.org/10.1109/CIG.2014.6932889 conf/cig/2014 db/conf/cig/cig2014.html#TakLW14

Tom Pepels Tristan Cazenave Mark H. M. Winands Marc Lanctot Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search. 1-15 2014 CGW@ECAI https://doi.org/10.1007/978-3-319-14923-3_1 conf/ecai/2014cgw db/conf/ecai/cgw2014.html#PepelsCWL14

Tom Pepels Mandy J. W. Tak Marc Lanctot Mark H. M. Winands Quality-based Rewards for Monte-Carlo Tree Search Simulations. 705-710 2014 ECAI https://doi.org/10.3233/978-1-61499-419-0-705 conf/ecai/2014 db/conf/ecai/ecai2014.html#PepelsTLW14

Marc J. V. Ponsen Steven de Jong Marc Lanctot Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling. 2014 abs/1401.4591 CoRR http://arxiv.org/abs/1401.4591 db/journals/corr/corr1401.html#PonsenJL14

Marc Lanctot Mark H. M. Winands Tom Pepels Nathan R. Sturtevant Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups. 2014 abs/1406.0486 CoRR http://arxiv.org/abs/1406.0486 db/journals/corr/corr1406.html#LanctotWPS14

Marc Lanctot Mark H. M. Winands LOA Wins Lines of Action Tournament. 239-240 2013 36 J. Int. Comput. Games Assoc. 4 https://doi.org/10.3233/ICG-2013-36416 db/journals/icga/icga36.html#LanctotW13

Marc Lanctot Mark H. M. Winands SIA Wins Surakarta Tournament. 241 2013 36 J. Int. Comput. Games Assoc. 4 https://doi.org/10.3233/ICG-2013-36418 db/journals/icga/icga36.html#LanctotW13a

Markus Esser Michael Gras Mark H. M. Winands Maarten P. D. Schadd Marc Lanctot Improving Best-Reply Search. 125-137 2013 Computers and Games https://doi.org/10.1007/978-3-319-09165-5_11 conf/cg/2013 db/conf/cg/cg2013.html#EsserGWSL13 Todd W. Neller Marc Lanctot Devika Subramanian Stephanie E. August Model AI Assignments 2013. 2013 EAAI https://doi.org/10.1609/aaai.v27i3.19009 conf/eaai/2013 db/conf/eaai/eaai2013.html#NellerLSA13 Marc Lanctot Viliam Lisý Mark H. M. Winands Monte Carlo Tree Search in Simultaneous Move Games with Applications to Goofspiel. 28-43 2013 CGW@IJCAI https://doi.org/10.1007/978-3-319-05428-5_3 https://www.wikidata.org/entity/Q59209609 conf/ijcai/2013cgw db/conf/ijcai/cgw2013.html#LanctotLW13 Marc Lanctot Abdallah Saffidine Joel Veness Christopher Archibald Mark H. M. Winands Monte Carlo *-Minimax Search. 580-586 2013 IJCAI http://www.aaai.org/ocs/index.php/IJCAI/IJCAI13/paper/view/6862 http://ijcai.org/Abstract/13/093 conf/ijcai/2013 db/conf/ijcai/ijcai2013.html#LanctotSVAW13 Viliam Lisý Vojtech Kovarík Marc Lanctot Branislav Bosanský Convergence of Monte Carlo Tree Search in Simultaneous Move Games. 2112-2120 2013 NIPS https://proceedings.neurips.cc/paper/2013/hash/1579779b98ce9edb98dd85606f2c119d-Abstract.html http://papers.nips.cc/paper/5145-convergence-of-monte-carlo-tree-search-in-simultaneous-move-games conf/nips/2013 db/conf/nips/nips2013.html#LisyKLB13

Marc Lanctot Abdallah Saffidine Joel Veness Christopher Archibald Mark H. M. Winands Monte Carlo *-Minimax Search. 2013 abs/1304.6057 CoRR http://arxiv.org/abs/1304.6057 db/journals/corr/corr1304.html#abs-1304-6057

Viliam Lisý Vojtech Kovarík Marc Lanctot Branislav Bosanský Convergence of Monte Carlo Tree Search in Simultaneous Move Games. 2013 abs/1310.8613 CoRR http://arxiv.org/abs/1310.8613 db/journals/corr/corr1310.html#LisyKLB13

Richard G. Gibson Marc Lanctot Neil Burch Duane Szafron Michael Bowling Generalized Sampling and Variance in Counterfactual Regret Minimization. 1355-1361 2012 AAAI https://doi.org/10.1609/aaai.v26i1.8241 conf/aaai/2012 db/conf/aaai/aaai2012.html#GibsonLBSB12 Michael Johanson Nolan Bard Marc Lanctot Richard G. Gibson Michael Bowling Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization. 837-846 2012 AAMAS http://dl.acm.org/citation.cfm?id=2343816 conf/atal/2012 db/conf/atal/aamas2012.html#JohansonBLGB12 Marc Lanctot Richard G. Gibson Neil Burch Michael Bowling No-Regret Learning in Extensive-Form Games with Imperfect Recall. 2012 ICML http://icml.cc/2012/papers/58.pdf conf/icml/2012 db/conf/icml/icml2012.html#LanctotGBB12 Richard G. Gibson Neil Burch Marc Lanctot Duane Szafron Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions. 1889-1897 2012 NIPS https://proceedings.neurips.cc/paper/2012/hash/3df1d4b96d8976ff5986393e8767f5b2-Abstract.html http://papers.nips.cc/paper/4569-efficient-monte-carlo-counterfactual-regret-minimization-in-games-with-many-player-actions conf/nips/2012 db/conf/nips/nips2012.html#GibsonBLS12

Marc Lanctot Richard G. Gibson Neil Burch Martin Zinkevich Michael H. Bowling No-Regret Learning in Extensive-Form Games with Imperfect Recall. 2012 abs/1205.0622 CoRR http://arxiv.org/abs/1205.0622 db/journals/corr/corr1205.html#abs-1205-0622

Marc J. V. Ponsen Steven de Jong Marc Lanctot Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling. 575-605 2011 42 J. Artif. Intell. Res. https://doi.org/10.1613/jair.3402 db/journals/jair/jair42.html#PonsenJL11

Joel Veness Marc Lanctot Michael H. Bowling Variance Reduction in Monte-Carlo Tree Search. 1836-1844 2011 NIPS https://proceedings.neurips.cc/paper/2011/hash/d736bb10d83a904aefc1d6ce93dc54b8-Abstract.html http://papers.nips.cc/paper/4288-variance-reduction-in-monte-carlo-tree-search conf/nips/2011 db/conf/nips/nips2011.html#VenessLB11 Marc J. V. Ponsen Marc Lanctot Steven de Jong MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling. 2010 Interactive Decision Theory and Game Theory http://aaai.org/ocs/index.php/WS/AAAIW10/paper/view/1985 conf/aaai/2010game db/conf/aaai/game2010.html#PonsenLJ10 Marc Lanctot Kevin Waugh Martin Zinkevich Michael H. Bowling Monte Carlo Sampling for Regret Minimization in Extensive Games. 1078-1086 2009 NIPS https://proceedings.neurips.cc/paper/2009/hash/00411460f7c92d2124a67ea0f4cb5f85-Abstract.html http://papers.nips.cc/paper/3713-monte-carlo-sampling-for-regret-minimization-in-extensive-games conf/nips/2009 db/conf/nips/nips2009.html#LanctotWZB09 Franisek Sailer Michael Buro Marc Lanctot Adversarial Planning Through Strategy Simulation. 80-87 2007 CIG https://doi.org/10.1109/CIG.2007.368082 conf/cig/2007 db/conf/cig/cig2007.html#SailerBL07

John P. Agapiou Thomas W. Anthony 0001 / Thomas Anthony 0001 Ioannis Antonoglou Christopher Archibald Stephanie E. August Mohammad Gheshlaghi Azar Yoram Bachrach Pierre Baldi David Balduzzi Dylan Banarse Guy Barash Nolan Bard Nathalie Beauguerlange Marc G. Bellemare Heymann Benjamin Frederic Besse Avishkar Bhoopchand Vincent de Boer Sebastian Borgeaud Branislav Bosanský Michael H. Bowling / Michael Bowling James Bradbury Noam Brown Kalesha Bullard Neil Burch Michael Buro Kris Cao Mauricio Castillo-Effen Tristan Cazenave Sarah H. Cen Jiri Cermak / Jirí Cermák Sarath Chandar Niyati Chhaya Alden Christianson Peter Clark Stephen Clark Jerome T. Connor Wojciech Czarnecki 0001 / Wojciech M. Czarnecki / Wojciech Marian Czarnecki Luca D'Amico-Wong Ivo Danihelka Vibhavari Dasagi Joshua Davidson Elnaz Davoodi Sander Dieleman David Ding Ryan D'Orazio George van den Driessche 0002 Edgar A. Duéñez-Guzmán Gabriel Dulac-Arnold Vincent Dumoulin Iain Dunning Tom Eccles Romuald Elie Huáscar Espinoza Markus Esser Richard Everett 0001 Timo Ewalds Eitan Farchi Gabriele Farina Ryan Faulkner 0001 Chrisantha Fernando Jakob N. Foerster Roy Fox Nando de Freitas Christopher W. Geib Ian Gemp Richard G. Gibson Thore Graepel Michael Gras Alex Graves Amy Greenwald / Amy R. Greenwald Dominik Grewe Audrunas Gruslys Arthur Guez Odd Erik Gundersen Abhinav Gupta 0002 Andras Gyorgy Demis Hassabis Hado van Hasselt Nicolas Heess Seán Ó hÉigeartaigh Johannes Heinrich Daniel Hennes José Hernández-Orallo Matteo Hessel Todd Hester G. Zacharias Holland Daniel Horgan / Dan Horgan Chiori Hori Aja Huang Xiaowei Huang 0001 Thomas Hubert Thomas C. Hudson Edward Hughes 0001 Max Jaderberg Kokil Jaidka Michael Johanson Steven de Jong Rudolf Kadlec Michael Kaisers Nal Kalchbrenner Pavan Kapanipathi Koray Kavukcuoglu Sarah Keren Mina Khan Seokhwan Kim J. Zico Kolter Anna Koop Vojtech Kovarík János Kramár Christian Kroer Dharshan Kumaran Matthew Lai Danny Lange John B. Lanier Hugo Larochelle Kate Larson Angeliki Lazaridou Madeleine Leach Shane Legg Joel Z. Leibo Jean-Baptiste Lespiau Guy Lever Zun Li 0002 Timothy P. Lillicrap Viliam Lisý Siqi Liu 0002 Edward Lockhart Nicolas Loizou Chris J. Maddison Aleksandra Malysheva Yiran Mao Janusz Marecki Luke Marris David R. Martinez Marwan Mattar Mausam Stephen McAleer / Stephen Marcus McAleer Julian J. McAuley Kevin R. McKee Martin Michalowski Reuth Mirsky Ioannis Mitliagkas Subhodeep Moitra Matej Moravcik / Matej Moravcík Dustin Morrill Roozbeh Mottaghi Shibl Mourad Paul Muller Rémi Munos Todd W. Neller John Nham Shayegan Omidshafiei Toby Ord Pedro A. Ortega Ian Osband Joseph C. Osborn Georg Ostrovski Sherjil Ozair Vedavyas Panneershelvam Christos H. Papadimitriou Emilio Parisotto David C. Parkes Paavo Parmas Roma Patel Tom Pepels Julien Pérolat Sarah Perrin David Pfau Olivier Pietquin Georgios Piliouras Bilal Piot Tobias Pohlen / Toby Pohlen Marc J. V. Ponsen Nicolas Porcel John Quan Malcolm Reynolds Mark Rowland 0001 Jonah Ryan-Davis Brennan Saeta Abdallah Saffidine Franisek Sailer Tuomas Sandholm Reca Sarfati Rahul Savani Maarten P. D. Schadd Tom Schaul Martin Schmid Julian Schrittwieser John Schultz Andrew Sendonaris Arash Shaban-Nejad Onn Shehory Laurent Sifre David Silver Karen Simonyan Satinder Singh 0001 Max Olan Smith Samuel Sokota H. Francis Song Nicolas Sonnerat Sriram Srinivasan 0005 Biplav Srivastava William W. Streilein Florian Strub Nathan R. Sturtevant Devika Subramanian Peter Sunehag Michal Sustr Ilya Sutskever David Sychrovsky Duane Szafron Csaba Szepesvári Andrea Tacchetti Mandy J. W. Tak Kartik Talamadupula Brian Tanner Eugene Tarassov Finbarr Timbers Julian Togelius Karl Tuyls Satyaki Upadhyay Matej Vecerík Joel Veness Bart De Vylder Kevin A. Wang / Kevin Wang 0003 Zhe Wang Ziyu Wang 0001 Kevin Waugh Michael P. Wellman Daan Wierstra Mark H. M. Winands James R. Wright Koichiro Yoshino Vinícius Flores Zambaldi Hugh Zhang Quanshi Zhang Martin Zinkevich Imed Zitouni