Click on [|•|] to show an entry's abstract, and on the abstract to hide it. Click on a title to open the paper on arXiv, or on the authors to open it on ar5iv.
[|•|] Bootstrapping Motor Skill Learning with Motion Planning (2021)   -   Abbatematteo, Ben and Rosen, Eric and Tellex, Stefanie and Konidaris, George   [|•|]
[|•|] Relative Entropy Regularized Policy Iteration (2018)   -   Abdolmaleki, Abbas and Springenberg, Jost Tobias and Degrave, Jonas and Bohez, Steven and Tassa, Yuval and Belov, Dan and Heess, Nicolas and Riedmiller, Martin   [|•|]
[|•|] Towards Characterizing Divergence in Deep Q-Learning (2019)   -   Achiam, Joshua and Knight, Ethan and Abbeel, Pieter   [|•|]
[|•|] Legged Locomotion in Challenging Terrains using Egocentric Vision (2022)   -   Agarwal, Ananye and Kumar, Ashish and Malik, Jitendra and Pathak, Deepak   [|•|]
[|•|] Deep Reinforcement Learning at the Edge of the Statistical Precipice (2021)   -   Agarwal, Rishabh and Schwarzer, Max and Castro, Pablo Samuel and Courville, Aaron and Bellemare, Marc G.   [|•|]
[|•|] Understanding the impact of entropy on policy optimization (2018)   -   Ahmed, Zafarali and Roux, Nicolas Le and Norouzi, Mohammad and Schuurmans, Dale   [|•|]
[|•|] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning (2020)   -   Ajay, Anurag and Kumar, Aviral and Agrawal, Pulkit and Levine, Sergey and Nachum, Ofir   [|•|]
[|•|] Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards (2020)   -   Amin, Susan and Gomrokchi, Maziar and Aboutalebi, Hossein and Satija, Harsh and Precup, Doina   [|•|]
[|•|] A Survey of Exploration Methods in Reinforcement Learning (2021)   -   Amin, Susan and Gomrokchi, Maziar and Satija, Harsh and Hoof, Herke van and Precup, Doina   [|•|]
[|•|] Input Convex Neural Networks (2016)   -   Amos, Brandon and Xu, Lei and Kolter, J. Zico   [|•|]
[|•|] Hindsight Experience Replay (2017)   -   Andrychowicz, Marcin and Wolski, Filip and Ray, Alex and Schneider, Jonas and Fong, Rachel and Welinder, Peter and McGrew, Bob and Tobin, Josh and Abbeel, Pieter and Zaremba, Wojciech   [|•|]
[|•|] Layer-wise learning of deep generative models (2012)   -   Arnold, Ludovic and Ollivier, Yann   [|•|]
[|•|] A Brief Survey of Deep Reinforcement Learning (2017)   -   Arulkumaran, Kai and Deisenroth, Marc Peter and Brundage, Miles and Bharath, Anil Anthony   [|•|]
[|•|] Breaking the Curse of Dimensionality with Convex Neural Networks (2014)   -   Bach, Francis   [|•|]
[|•|] The Option-Critic Architecture (2016)   -   Bacon, Pierre-Luc and Harb, Jean and Precup, Doina   [|•|]
[|•|] Never Give Up: Learning Directed Exploration Strategies (2020)   -   Badia, Adrià Puigdomènech and Sprechmann, Pablo and Vitvitskyi, Alex and Guo, Daniel and Piot, Bilal and Kapturowski, Steven and Tieleman, Olivier and Arjovsky, Martín and Pritzel, Alexander and Bolt, Andew and Blundell, Charles   [|•|]
[|•|] Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies (2015)   -   Balduzzi, David and Ghifary, Muhammad   [|•|]
[|•|] Efficient Online Reinforcement Learning with Offline Data (2023)   -   Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey   [|•|]
[|•|] Ready Policy One: World Building Through Active Learning (2020)   -   Ball, Philip and Parker-Holder, Jack and Pacchiano, Aldo and Choromanski, Krzysztof and Roberts, Stephen   [|•|]
[|•|] Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization (2020)   -   Barde, Paul and Roy, Julien and Jeon, Wonseok and Pineau, Joelle and Pal, Christopher and Nowrouzezahrai, Derek   [|•|]
[|•|] Successor Features for Transfer in Reinforcement Learning (2016)   -   Barreto, André and Dabney, Will and Munos, Rémi and Hunt, Jonathan J. and Schaul, Tom and Hasselt, Hado van and Silver, David   [|•|]
[|•|] Rearrangement: A Challenge for Embodied AI (2020)   -   Batra, Dhruv and Chang, Angel X. and Chernova, Sonia and Davison, Andrew J. and Deng, Jia and Koltun, Vladlen and Levine, Sergey and Malik, Jitendra and Mordatch, Igor and Mottaghi, Roozbeh and Savva, Manolis and Su, Hao   [|•|]
[|•|] Relational inductive biases, deep learning, and graph networks (2018)   -   Battaglia, Peter W. and Hamrick, Jessica B. and Bapst, Victor and Sanchez-Gonzalez, Alvaro and Zambaldi, Vinicius and Malinowski, Mateusz and Tacchetti, Andrea and Raposo, David and Santoro, Adam and Faulkner, Ryan and Gulcehre, Caglar and Song, Francis and Ballard, Andrew and Gilmer, Justin and Dahl, George and Vaswani, Ashish and Allen, Kelsey and Nash, Charles and Langston, Victoria and Dyer, Chris and Heess, Nicolas and Wierstra, Daan and Kohli, Pushmeet and Botvinick, Matt and Vinyals, Oriol and Li, Yujia and Pascanu, Razvan   [|•|]
[|•|] Learning to Continually Learn (2020)   -   Beaulieu, Shawn and Frati, Lapo and Miconi, Thomas and Lehman, Joel and Stanley, Kenneth O. and Clune, Jeff and Cheney, Nick   [|•|]
[|•|] Training in Task Space to Speed Up and Guide Reinforcement Learning (2019)   -   Bellegarda, Guillaume and Byl, Katie   [|•|]
[|•|] A Geometric Perspective on Optimal Representations for Reinforcement Learning (2019)   -   Bellemare, Marc G. and Dabney, Will and Dadashi, Robert and Taiga, Adrien Ali and Castro, Pablo Samuel and Roux, Nicolas Le and Schuurmans, Dale and Lattimore, Tor and Lyle, Clare   [|•|]
[|•|] A Distributional Perspective on Reinforcement Learning (2017)   -   Bellemare, Marc G. and Dabney, Will and Munos, Rémi   [|•|]
[|•|] The Cramer Distance as a Solution to Biased Wasserstein Gradients (2017)   -   Bellemare, Marc G. and Danihelka, Ivo and Dabney, Will and Mohamed, Shakir and Lakshminarayanan, Balaji and Hoyer, Stephan and Munos, Rémi   [|•|]
[|•|] Unifying Count-Based Exploration and Intrinsic Motivation (2016)   -   Bellemare, Marc G. and Srinivasan, Sriram and Ostrovski, Georg and Schaul, Tom and Saxton, David and Munos, Remi   [|•|]
[|•|] Representation Learning: A Review and New Perspectives (2012)   -   Bengio, Yoshua and Courville, Aaron and Vincent, Pascal   [|•|]
[|•|] Model-Based Action Exploration for Learning Dynamic Motion Skills (2018)   -   Berseth, Glen and Panne, Michiel van de   [|•|]
[|•|] LEAF: Latent Exploration Along the Frontier (2020)   -   Bharadhwaj, Homanga and Garg, Animesh and Shkurti, Florian   [|•|]
[|•|] Proximal Distilled Evolutionary Reinforcement Learning (2019)   -   Bodnar, Cristian and Day, Ben and Lió, Pietro   [|•|]
[|•|] Universal Successor Features Approximators (2018)   -   Borsa, Diana and Barreto, André and Quan, John and Mankowitz, Daniel and Munos, Rémi and Hasselt, Hado van and Silver, David and Schaul, Tom   [|•|]
[|•|] Practical Gauss-Newton Optimisation for Deep Learning (2017)   -   Botev, Aleksandar and Ritter, Hippolyt and Barber, David   [|•|]
[|•|] A Theory of Universal Learning (2020)   -   Bousquet, Olivier and Hanneke, Steve and Moran, Shay and Handel, Ramon van and Yehudayoff, Amir   [|•|]
[|•|] On Identifiability in Transformers (2019)   -   Brunner, Gino and Liu, Yang and Pascual, Damián and Richter, Oliver and Ciaramita, Massimiliano and Wattenhofer, Roger   [|•|]
[|•|] Modern Koopman Theory for Dynamical Systems (2021)   -   Brunton, Steven L. and Budišić, Marko and Kaiser, Eurika and Kutz, J. Nathan   [|•|]
[|•|] Exploration by Random Network Distillation (2018)   -   Burda, Yuri and Edwards, Harrison and Storkey, Amos and Klimov, Oleg   [|•|]
[|•|] Offline Reinforcement Learning at Multiple Frequencies (2022)   -   Burns, Kaylee and Yu, Tianhe and Finn, Chelsea and Hausman, Karol   [|•|]
[|•|] Capturability-based Pattern Generation for Walking with Variable Height (2018)   -   Caron, Stéphane and Escande, Adrien and Lanari, Leonardo and Mallein, Bastien   [|•|]
[|•|] Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning (2022)   -   Castanet, Nicolas and Lamprier, Sylvain and Sigaud, Olivier   [|•|]
[|•|] Robust Feedback Motion Policy Design Using Reinforcement Learning on a 3D Digit Bipedal Robot (2021)   -   Castillo, Guillermo A. and Weng, Bowen and Zhang, Wei and Hereid, Ayonga   [|•|]
[|•|] Learning Action Representations for Reinforcement Learning (2019)   -   Chandak, Yash and Theocharous, Georgios and Kostas, James and Jordan, Scott and Thomas, Philip S.   [|•|]
[|•|] Goal-Conditioned Reinforcement Learning with Imagined Subgoals (2021)   -   Chane-Sane, Elliot and Schmid, Cordelia and Laptev, Ivan   [|•|]
[|•|] Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills (2021)   -   Chebotar, Yevgen and Hausman, Karol and Lu, Yao and Xiao, Ted and Kalashnikov, Dmitry and Varley, Jake and Irpan, Alex and Eysenbach, Benjamin and Julian, Ryan and Finn, Chelsea and Levine, Sergey   [|•|]
[|•|] Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking (2022)   -   Chen, Kai and Cao, Rui and James, Stephen and Li, Yichuan and Liu, Yun-Hui and Abbeel, Pieter and Dou, Qi   [|•|]
[|•|] Decision Transformer: Reinforcement Learning via Sequence Modeling (2021)   -   Chen, Lili and Lu, Kevin and Rajeswaran, Aravind and Lee, Kimin and Grover, Aditya and Laskin, Michael and Abbeel, Pieter and Srinivas, Aravind and Mordatch, Igor   [|•|]
[|•|] Neural Ordinary Differential Equations (2018)   -   Chen, Ricky T. Q. and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David   [|•|]
[|•|] InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (2016)   -   Chen, Xi and Duan, Yan and Houthooft, Rein and Schulman, John and Sutskever, Ilya and Abbeel, Pieter   [|•|]
[|•|] Symbolic Discovery of Optimization Algorithms (2023)   -   Chen, Xiangning and Liang, Chen and Huang, Da and Real, Esteban and Wang, Kaiyuan and Liu, Yao and Pham, Hieu and Dong, Xuanyi and Luong, Thang and Hsieh, Cho-Jui and Lu, Yifeng and Le, Quoc V.   [|•|]
[|•|] Randomized Ensembled Double Q-Learning: Learning Fast Without a Model (2021)   -   Chen, Xinyue and Wang, Che and Zhou, Zijian and Ross, Keith   [|•|]
[|•|] Optimal transport natural gradient for statistical manifolds with continuous sample space (2018)   -   Chen, Yifan and Li, Wuchen   [|•|]
[|•|] Adversarially Trained Actor Critic for Offline Reinforcement Learning (2022)   -   Cheng, Ching-An and Xie, Tengyang and Jiang, Nan and Agarwal, Alekh   [|•|]
[|•|] Divide & Conquer Imitation Learning (2022)   -   Chenu, Alexandre and Perrin-Gilbert, Nicolas and Sigaud, Olivier   [|•|]
[|•|] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (2023)   -   Chi, Cheng and Xu, Zhenjia and Feng, Siyuan and Cousineau, Eric and Du, Yilun and Burchfiel, Benjamin and Tedrake, Russ and Song, Shuran   [|•|]
[|•|] Lyapunov-based Safe Policy Optimization for Continuous Control (2019)   -   Chow, Yinlam and Nachum, Ofir and Faust, Aleksandra and Duenez-Guzman, Edgar and Ghavamzadeh, Mohammad   [|•|]
[|•|] Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning (2019)   -   Chu, Casey and Blanchet, Jose and Glynn, Peter   [|•|]
[|•|] Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (2018)   -   Chua, Kurtland and Calandra, Roberto and McAllister, Rowan and Levine, Sergey   [|•|]
[|•|] Better Exploration with Optimistic Actor-Critic (2019)   -   Ciosek, Kamil and Vuong, Quan and Loftin, Robert and Hofmann, Katja   [|•|]
[|•|] Phasic Policy Gradient (2020)   -   Cobbe, Karl and Hilton, Jacob and Klimov, Oleg and Schulman, John   [|•|]
[|•|] Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents (2017)   -   Conti, Edoardo and Madhavan, Vashisht and Such, Felipe Petroski and Lehman, Joel and Stanley, Kenneth O. and Clune, Jeff   [|•|]
[|•|] Hierarchical Behavioral Repertoires with Unsupervised Descriptors (2018)   -   Cully, Antoine and Demiris, Yiannis   [|•|]
[|•|] Quality and Diversity Optimization: A Unifying Modular Framework (2017)   -   Cully, Antoine and Demiris, Yiannis   [|•|]
[|•|] Implicit Quantile Networks for Distributional Reinforcement Learning (2018)   -   Dabney, Will and Ostrovski, Georg and Silver, David and Munos, Rémi   [|•|]
[|•|] Distributional Reinforcement Learning with Quantile Regression (2017)   -   Dabney, Will and Rowland, Mark and Bellemare, Marc G. and Munos, Rémi   [|•|]
[|•|] Primal Wasserstein Imitation Learning (2020)   -   Dadashi, Robert and Hussenot, Léonard and Geist, Matthieu and Pietquin, Olivier   [|•|]
[|•|] Continuous Control with Action Quantization from Demonstrations (2021)   -   Dadashi, Robert and Hussenot, Léonard and Vincent, Damien and Girgin, Sertan and Raichuk, Anton and Geist, Matthieu and Pietquin, Olivier   [|•|]
[|•|] Deep Gaussian Processes (2012)   -   Damianou, Andreas C. and Lawrence, Neil D.   [|•|]
[|•|] Natural Neural Networks (2015)   -   Desjardins, Guillaume and Simonyan, Karen and Pascanu, Razvan and Kavukcuoglu, Koray   [|•|]
[|•|] Sharp Minima Can Generalize For Deep Nets (2017)   -   Dinh, Laurent and Pascanu, Razvan and Bengio, Samy and Bengio, Yoshua   [|•|]
[|•|] Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning (2019)   -   Doan, Thang and Mazoure, Bogdan and Abdar, Moloud and Durand, Audrey and Pineau, Joelle and Hjelm, R. Devon   [|•|]
[|•|] GAN Q-learning (2018)   -   Doan, Thang and Mazoure, Bogdan and Lyle, Clare   [|•|]
[|•|] Tutorial on Variational Autoencoders (2016)   -   Doersch, Carl   [|•|]
[|•|] Adapting Auxiliary Losses Using Gradient Similarity (2018)   -   Du, Yunshu and Czarnecki, Wojciech M. and Jayakumar, Siddhant M. and Farajtabar, Mehrdad and Pascanu, Razvan and Lakshminarayanan, Balaji   [|•|]
[|•|] Online Trajectory Planning Through Combined Trajectory Optimization and Function Approximation: Application to the Exoskeleton Atalante (2019)   -   Duburcq, Alexis and Chevaleyre, Yann and Bredeche, Nicolas and Boéris, Guilhem   [|•|]
[|•|] Reactive Stepping for Humanoid Robots using Reinforcement Learning: Application to Standing Push Recovery on the Exoskeleton Atalante (2022)   -   Duburcq, Alexis and Schramm, Fabian and Boéris, Guilhem and Bredeche, Nicolas and Chevaleyre, Yann   [|•|]
[|•|] First return, then explore (2020)   -   Ecoffet, Adrien and Huizinga, Joost and Lehman, Joel and Stanley, Kenneth O. and Clune, Jeff   [|•|]
[|•|] Go-Explore: a New Approach for Hard-Exploration Problems (2019)   -   Ecoffet, Adrien and Huizinga, Joost and Lehman, Joel and Stanley, Kenneth O. and Clune, Jeff   [|•|]
[|•|] RvS: What is Essential for Offline RL via Supervised Learning? (2021)   -   Emmons, Scott and Eysenbach, Benjamin and Kostrikov, Ilya and Levine, Sergey   [|•|]
[|•|] Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO (2020)   -   Engstrom, Logan and Ilyas, Andrew and Santurkar, Shibani and Tsipras, Dimitris and Janoos, Firdaus and Rudolph, Larry and Madry, Aleksander   [|•|]
[|•|] IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (2018)   -   Espeholt, Lasse and Soyer, Hubert and Munos, Remi and Simonyan, Karen and Mnih, Volodymir and Ward, Tom and Doron, Yotam and Firoiu, Vlad and Harley, Tim and Dunning, Iain and Legg, Shane and Kavukcuoglu, Koray   [|•|]
[|•|] Diversity is All You Need: Learning Skills without a Reward Function (2018)   -   Eysenbach, Benjamin and Gupta, Abhishek and Ibarz, Julian and Levine, Sergey   [|•|]
[|•|] Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification (2021)   -   Eysenbach, Benjamin and Levine, Sergey and Salakhutdinov, Ruslan   [|•|]
[|•|] Maximum Entropy RL (Provably) Solves Some Robust RL Problems (2021)   -   Eysenbach, Benjamin and Levine, Sergey   [|•|]
[|•|] Imitating Past Successes can be Very Suboptimal (2022)   -   Eysenbach, Benjamin and Udatha, Soumith and Levine, Sergey and Salakhutdinov, Ruslan   [|•|]
[|•|] Contrastive Learning as Goal-Conditioned Reinforcement Learning (2022)   -   Eysenbach, Benjamin and Zhang, Tianjun and Salakhutdinov, Ruslan and Levine, Sergey   [|•|]
[|•|] Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (2016)   -   Finn, Chelsea and Levine, Sergey and Abbeel, Pieter   [|•|]
[|•|] Deep Spatial Autoencoders for Visuomotor Learning (2015)   -   Finn, Chelsea and Tan, Xin Yu and Duan, Yan and Darrell, Trevor and Levine, Sergey and Abbeel, Pieter   [|•|]
[|•|] Bootstrapped Meta-Learning (2021)   -   Flennerhag, Sebastian and Schroecker, Yannick and Zahavy, Tom and Hasselt, Hado van and Silver, David and Singh, Satinder   [|•|]
[|•|] Automatic Goal Generation for Reinforcement Learning Agents (2017)   -   Florensa, Carlos and Held, David and Geng, Xinyang and Abbeel, Pieter   [|•|]
[|•|] Reverse Curriculum Generation for Reinforcement Learning (2017)   -   Florensa, Carlos and Held, David and Wulfmeier, Markus and Zhang, Michael and Abbeel, Pieter   [|•|]
[|•|] Differentiable Quality Diversity (2021)   -   Fontaine, Matthew C. and Nikolaidis, Stefanos   [|•|]
[|•|] Noisy Networks for Exploration (2017)   -   Fortunato, Meire and Azar, Mohammad Gheshlaghi and Piot, Bilal and Menick, Jacob and Osband, Ian and Graves, Alex and Mnih, Vlad and Munos, Remi and Hassabis, Demis and Pietquin, Olivier and Blundell, Charles and Legg, Shane   [|•|]
[|•|] An Introduction to Deep Reinforcement Learning (2018)   -   Francois-Lavet, Vincent and Henderson, Peter and Islam, Riashat and Bellemare, Marc G. and Pineau, Joelle   [|•|]
[|•|] Brax – A Differentiable Physics Engine for Large Scale Rigid Body Simulation (2021)   -   Freeman, C. Daniel and Frey, Erik and Raichuk, Anton and Girgin, Sertan and Mordatch, Igor and Bachem, Olivier   [|•|]
[|•|] D4RL: Datasets for Deep Data-Driven Reinforcement Learning (2020)   -   Fu, Justin and Kumar, Aviral and Nachum, Ofir and Tucker, George and Levine, Sergey   [|•|]
[|•|] Diagnosing Bottlenecks in Deep Q-learning Algorithms (2019)   -   Fu, Justin and Kumar, Aviral and Soh, Matthew and Levine, Sergey   [|•|]
[|•|] A Minimalist Approach to Offline Reinforcement Learning (2021)   -   Fujimoto, Scott and Gu, Shixiang Shane   [|•|]
[|•|] Addressing Function Approximation Error in Actor-Critic Methods (2018)   -   Fujimoto, Scott and Hoof, Herke van and Meger, David   [|•|]
[|•|] Off-Policy Deep Reinforcement Learning without Exploration (2018)   -   Fujimoto, Scott and Meger, David and Precup, Doina   [|•|]
[|•|] Policy Optimization by Genetic Distillation (2017)   -   Gangwani, Tanmay and Peng, Jian   [|•|]
[|•|] Hierarchical Skills for Efficient Exploration (2021)   -   Gehring, Jonas and Synnaeve, Gabriel and Krause, Andreas and Usunier, Nicolas   [|•|]
[|•|] A Theory of Regularized Markov Decision Processes (2019)   -   Geist, Matthieu and Scherrer, Bruno and Pietquin, Olivier   [|•|]
[|•|] Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis (2018)   -   George, Thomas and Laurent, César and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal   [|•|]
[|•|] Reinforcement Learning from Passive Data via Latent Intentions (2023)   -   Ghosh, Dibya and Bhateja, Chethan and Levine, Sergey   [|•|]
[|•|] Learning to Reach Goals via Iterated Supervised Learning (2019)   -   Ghosh, Dibya and Gupta, Abhishek and Reddy, Ashwin and Fu, Justin and Devin, Coline and Eysenbach, Benjamin and Levine, Sergey   [|•|]
[|•|] Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (2022)   -   Ghugare, Raj and Bharadhwaj, Homanga and Eysenbach, Benjamin and Levine, Sergey and Salakhutdinov, Ruslan   [|•|]
[|•|] Recall Traces: Backtracking Models for Efficient Reinforcement Learning (2018)   -   Goyal, Anirudh and Brakel, Philemon and Fedus, William and Singhal, Soumye and Lillicrap, Timothy and Levine, Sergey and Larochelle, Hugo and Bengio, Yoshua   [|•|]
[|•|] Feedback MPC for Torque-Controlled Legged Robots (2019)   -   Grandia, Ruben and Farshidian, Farbod and Ranftl, René and Hutter, Marco   [|•|]
[|•|] Variational Intrinsic Control (2016)   -   Gregor, Karol and Rezende, Danilo Jimenez and Wierstra, Daan   [|•|]
[|•|] Correlation and variable importance in random forests (2013)   -   Gregorutti, Baptiste and Michel, Bertrand and Saint-Pierre, Philippe   [|•|]
[|•|] Hamiltonian Neural Networks (2019)   -   Greydanus, Sam and Dzamba, Misko and Yosinski, Jason   [|•|]
[|•|] Provably Faster Gradient Descent via Long Steps (2023)   -   Grimmer, Benjamin   [|•|]
[|•|] A Review of Safe Reinforcement Learning: Methods, Theory and Applications (2022)   -   Gu, Shangding and Yang, Long and Du, Yali and Chen, Guang and Walter, Florian and Wang, Jun and Yang, Yaodong and Knoll, Alois   [|•|]
[|•|] Continuous Deep Q-Learning with Model-based Acceleration (2016)   -   Gu, Shixiang and Lillicrap, Timothy and Sutskever, Ilya and Levine, Sergey   [|•|]
[|•|] Neural Predictive Belief Representations (2018)   -   Guo, Zhaohan Daniel and Azar, Mohammad Gheshlaghi and Piot, Bilal and Pires, Bernardo A. and Munos, Rémi   [|•|]
[|•|] BYOL-Explore: Exploration by Bootstrapped Prediction (2022)   -   Guo, Zhaohan Daniel and Thakoor, Shantanu and Pîslar, Miruna and Pires, Bernardo Avila and Altché, Florent and Tallec, Corentin and Saade, Alaa and Calandriello, Daniele and Grill, Jean-Bastien and Tang, Yunhao and Valko, Michal and Munos, Rémi and Azar, Mohammad Gheshlaghi and Piot, Bilal   [|•|]
[|•|] Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning (2017)   -   Gupta, Abhishek and Devin, Coline and Liu, YuXuan and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning (2022)   -   Gupta, Abhishek and Lynch, Corey and Kinman, Brandon and Peake, Garrett and Levine, Sergey and Hausman, Karol   [|•|]
[|•|] Meta-Reinforcement Learning of Structured Exploration Strategies (2018)   -   Gupta, Abhishek and Mendonca, Russell and Liu, YuXuan and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] Towards Variable Assistance for Lower Body Exoskeletons (2019)   -   Gurriet, Thomas and Tucker, Maegan and Duburcq, Alexis and Boeris, Guilhem and Ames, Aaron D.   [|•|]
[|•|] World Models (2018)   -   Ha, David and Schmidhuber, Jürgen   [|•|]
[|•|] Learning to Walk in the Real World with Minimal Human Effort (2020)   -   Ha, Sehoon and Xu, Peng and Tan, Zhenyu and Levine, Sergey and Tan, Jie   [|•|]
[|•|] Latent Space Policies for Hierarchical Reinforcement Learning (2018)   -   Haarnoja, Tuomas and Hartikainen, Kristian and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning (2023)   -   Haarnoja, Tuomas and Moran, Ben and Lever, Guy and Huang, Sandy H. and Tirumala, Dhruva and Humplik, Jan and Wulfmeier, Markus and Tunyasuvunakool, Saran and Siegel, Noah Y. and Hafner, Roland and Bloesch, Michael and Hartikainen, Kristian and Byravan, Arunkumar and Hasenclever, Leonard and Tassa, Yuval and Sadeghi, Fereshteh and Batchelor, Nathan and Casarini, Federico and Saliceti, Stefano and Game, Charles and Sreendra, Neil and Patel, Kushal and Gwira, Marlon and Huber, Andrea and Hurley, Nicole and Nori, Francesco and Hadsell, Raia and Heess, Nicolas   [|•|]
[|•|] Composable Deep Reinforcement Learning for Robotic Manipulation (2018)   -   Haarnoja, Tuomas and Pong, Vitchyr and Zhou, Aurick and Dalal, Murtaza and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (2018)   -   Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] Soft Actor-Critic Algorithms and Applications (2018)   -   Haarnoja, Tuomas and Zhou, Aurick and Hartikainen, Kristian and Tucker, George and Ha, Sehoon and Tan, Jie and Kumar, Vikash and Zhu, Henry and Gupta, Abhishek and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow (2017)   -   Hafner, Danijar and Davidson, James and Vanhoucke, Vincent   [|•|]
[|•|] Deep Hierarchical Planning from Pixels (2022)   -   Hafner, Danijar and Lee, Kuang-Huei and Fischer, Ian and Abbeel, Pieter   [|•|]
[|•|] Dream to Control: Learning Behaviors by Latent Imagination (2019)   -   Hafner, Danijar and Lillicrap, Timothy and Ba, Jimmy and Norouzi, Mohammad   [|•|]
[|•|] Mastering Atari with Discrete World Models (2020)   -   Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy   [|•|]
[|•|] Mastering Diverse Domains through World Models (2023)   -   Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy   [|•|]
[|•|] Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion (2020)   -   Hafner, Roland and Hertweck, Tim and Klöppner, Philipp and Bloesch, Michael and Neunert, Michael and Wulfmeier, Markus and Tunyasuvunakool, Saran and Heess, Nicolas and Riedmiller, Martin   [|•|]
[|•|] Hierarchical Few-Shot Imitation with Skill Transition Models (2021)   -   Hakhamaneshi, Kourosh and Zhao, Ruihan and Zhan, Albert and Abbeel, Pieter and Laskin, Michael   [|•|]
[|•|] On the role of planning in model-based deep reinforcement learning (2020)   -   Hamrick, Jessica B. and Friesen, Abram L. and Behbahani, Feryal and Guez, Arthur and Viola, Fabio and Witherspoon, Sims and Anthony, Thomas and Buesing, Lars and Veličković, Petar and Weber, Théophane   [|•|]
[|•|] Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration (2020)   -   Han, Seungyul and Sung, Youngchul   [|•|]
[|•|] Temporal Difference Learning for Model Predictive Control (2022)   -   Hansen, Nicklas and Wang, Xiaolong and Su, Hao   [|•|]
[|•|] IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies (2023)   -   Hansen-Estruch, Philippe and Kostrikov, Ilya and Janner, Michael and Kuba, Jakub Grudzien and Levine, Sergey   [|•|]
[|•|] Feedback Control of an Exoskeleton for Paraplegics: Toward Robustly Stable Hands-free Dynamic Walking (2018)   -   Harib, Omar and Hereid, Ayonga and Agrawal, Ayush and Gurriet, Thomas and Finet, Sylvain and Boeris, Guilhem and Duburcq, Alexis and Mungai, M. Eva and Masselin, Matthieu and Ames, Aaron D. and Sreenath, Koushil and Grizzle, Jessy   [|•|]
[|•|] Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery (2019)   -   Hartikainen, Kristian and Geng, Xinyang and Haarnoja, Tuomas and Levine, Sergey   [|•|]
[|•|] Deep Reinforcement Learning and the Deadly Triad (2018)   -   Hasselt, Hado van and Doron, Yotam and Strub, Florian and Hessel, Matteo and Sonnerat, Nicolas and Modayil, Joseph   [|•|]
[|•|] Soft Hindsight Experience Replay (2020)   -   He, Qiwei and Zhuang, Liansheng and Li, Houqiang   [|•|]
[|•|] Emergence of Locomotion Behaviours in Rich Environments (2017)   -   Heess, Nicolas and TB, Dhruva and Sriram, Srinivasan and Lemmon, Jay and Merel, Josh and Wayne, Greg and Tassa, Yuval and Erez, Tom and Wang, Ziyu and Eslami, S. M. Ali and Riedmiller, Martin and Silver, David   [|•|]
[|•|] Dropout Q-Functions for Doubly Efficient Reinforcement Learning (2021)   -   Hiraoka, Takuya and Imagawa, Takahisa and Hashimoto, Taisei and Onishi, Takashi and Tsuruoka, Yoshimasa   [|•|]
[|•|] Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets ()   -   Hong, Zhang-Wei and Kumar, Aviral and Karnik, Sathwik and Bhandwaldar, Abhishek and Srivastava, Akash and Pajarinen, Joni and Laroche, Romain and Gupta, Abhishek and Agrawal, Pulkit   [|•|]
[|•|] Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay (2016)   -   Hosu, Ionel-Alexandru and Rebedea, Traian   [|•|]
[|•|] Evolved Policy Gradients (2018)   -   Houthooft, Rein and Chen, Richard Y. and Isola, Phillip and Stadie, Bradly C. and Wolski, Filip and Ho, Jonathan and Abbeel, Pieter   [|•|]
[|•|] Planning Goals for Exploration (2023)   -   Hu, Edward S. and Chang, Richard and Rybkin, Oleh and Jayaraman, Dinesh   [|•|]
[|•|] A2C is a special case of PPO (2022)   -   Huang, Shengyi and Kanervisto, Anssi and Raffin, Antonin and Wang, Weixun and Ontañón, Santiago and Dossa, Rousslan Fernand Julien   [|•|]
[|•|] Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence (2024)   -   Hussing, Marcel and Voelcker, Claas and Gilitschenski, Igor and Farahmand, Amir-massoud and Eaton, Eric   [|•|]
[|•|] Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning (2020)   -   Ichter, Brian and Sermanet, Pierre and Lynch, Corey   [|•|]
[|•|] A Closer Look at Deep Policy Gradients (2018)   -   Ilyas, Andrew and Engstrom, Logan and Santurkar, Shibani and Tsipras, Dimitris and Janoos, Firdaus and Rudolph, Larry and Madry, Aleksander   [|•|]
[|•|] Improving Regression Performance with Distributional Losses (2018)   -   Imani, Ehsan and White, Martha   [|•|]
[|•|] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)   -   Ioffe, Sergey and Szegedy, Christian   [|•|]
[|•|] Population Based Training of Neural Networks (2017)   -   Jaderberg, Max and Dalibard, Valentin and Osindero, Simon and Czarnecki, Wojciech M. and Donahue, Jeff and Razavi, Ali and Vinyals, Oriol and Green, Tim and Dunning, Iain and Simonyan, Karen and Fernando, Chrisantha and Kavukcuoglu, Koray   [|•|]
[|•|] Reinforcement Learning with Unsupervised Auxiliary Tasks (2016)   -   Jaderberg, Max and Mnih, Volodymyr and Czarnecki, Wojciech Marian and Schaul, Tom and Leibo, Joel Z. and Silver, David and Kavukcuoglu, Koray   [|•|]
[|•|] Attention is not Explanation (2019)   -   Jain, Sarthak and Wallace, Byron C.   [|•|]
[|•|] Task-Embedded Control Networks for Few-Shot Imitation Learning (2018)   -   James, Stephen and Bloesch, Michael and Davison, Andrew J.   [|•|]
[|•|] 3D Simulation for Robot Arm Control with Deep Q-Learning (2016)   -   James, Stephen and Johns, Edward   [|•|]
[|•|] When to Trust Your Model: Model-Based Policy Optimization (2019)   -   Janner, Michael and Fu, Justin and Zhang, Marvin and Levine, Sergey   [|•|]
[|•|] Offline Reinforcement Learning as One Big Sequence Modeling Problem (2021)   -   Janner, Michael and Li, Qiyang and Levine, Sergey   [|•|]
[|•|] Fast Marching Tree: a Fast Marching Sampling-Based Method for Optimal Motion Planning in Many Dimensions (2013)   -   Janson, Lucas and Schmerling, Edward and Clark, Ashley and Pavone, Marco   [|•|]
[|•|] gradSim: Differentiable simulation for system identification and visuomotor control (2021)   -   Jatavallabhula, Krishna Murthy and Macklin, Miles and Golemo, Florian and Voleti, Vikram and Petrini, Linda and Weiss, Martin and Considine, Breandan and Parent-Levesque, Jerome and Xie, Kevin and Erleben, Kenny and Paull, Liam and Shkurti, Florian and Nowrouzezahrai, Derek and Fidler, Sanja   [|•|]
[|•|] Benchmarking Potential Based Rewards for Learning Humanoid Locomotion (2023)   -   Jeon, Se Hwan and Heim, Steve and Khazoom, Charles and Kim, Sangbae   [|•|]
[|•|] Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic (2023)   -   Ji, Tianying and Luo, Yu and Sun, Fuchun and Zhan, Xianyuan and Zhang, Jianwei and Xu, Huazhe   [|•|]
[|•|] Population-Guided Parallel Policy Search for Reinforcement Learning (2020)   -   Jung, Whiyoung and Park, Giseung and Sung, Youngchul   [|•|]
[|•|] Uncertainty-Aware Reinforcement Learning for Collision Avoidance (2017)   -   Kahn, Gregory and Villaflor, Adam and Pong, Vitchyr and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation (2018)   -   Kalashnikov, Dmitry and Irpan, Alex and Pastor, Peter and Ibarz, Julian and Herzog, Alexander and Jang, Eric and Quillen, Deirdre and Holly, Ethan and Kalakrishnan, Mrinal and Vanhoucke, Vincent and Levine, Sergey   [|•|]
[|•|] Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching (2021)   -   Kamienny, Pierre-Alexandre and Tarbouriech, Jean and Lamprier, Sylvain and Lazaric, Alessandro and Denoyer, Ludovic   [|•|]
[|•|] Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives (2023)   -   Kazim, Muhammad and Hong, JunGee and Kim, Min-Gyeom and Kim, Kwang-Ki K.   [|•|]
[|•|] Learning Stable Normalizing-Flow Control for Robotic Manipulation (2020)   -   Khader, Shahbaz Abdul and Yin, Hang and Falco, Pietro and Kragic, Danica   [|•|]
[|•|] Supervised Contrastive Learning (2020)   -   Khosla, Prannay and Teterwak, Piotr and Wang, Chen and Sarna, Aaron and Tian, Yonglong and Isola, Phillip and Maschinot, Aaron and Liu, Ce and Krishnan, Dilip   [|•|]
[|•|] Improving Variational Inference with Inverse Autoregressive Flow (2016)   -   Kingma, Diederik P. and Salimans, Tim and Jozefowicz, Rafal and Chen, Xi and Sutskever, Ilya and Welling, Max   [|•|]
[|•|] An Introduction to Variational Autoencoders (2019)   -   Kingma, Diederik P. and Welling, Max   [|•|]
[|•|] Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning (2019)   -   Koller, Torsten and Berkenkamp, Felix and Turchetta, Matteo and Boedecker, Joschka and Krause, Andreas   [|•|]
[|•|] Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning (2018)   -   Kostrikov, Ilya and Agrawal, Kumar Krishna and Dwibedi, Debidatta and Levine, Sergey and Tompson, Jonathan   [|•|]
[|•|] Offline Reinforcement Learning with Implicit Q-Learning (2021)   -   Kostrikov, Ilya and Nair, Ashvin and Levine, Sergey   [|•|]
[|•|] Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels (2020)   -   Kostrikov, Ilya and Yarats, Denis and Fergus, Rob   [|•|]
[|•|] An Efficiently Solvable Quadratic Program for Stabilizing Dynamic Locomotion (2013)   -   Kuindersma, Scott and Permenter, Frank and Tedrake, Russ   [|•|]
[|•|] Deep Successor Reinforcement Learning (2016)   -   Kulkarni, Tejas D. and Saeedi, Ardavan and Gautam, Simanta and Gershman, Samuel J.   [|•|]
[|•|] RMA: Rapid Motor Adaptation for Legged Robots (2021)   -   Kumar, Ashish and Fu, Zipeng and Pathak, Deepak and Malik, Jitendra   [|•|]
[|•|] Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction (2019)   -   Kumar, Aviral and Fu, Justin and Tucker, George and Levine, Sergey   [|•|]
[|•|] DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction (2020)   -   Kumar, Aviral and Gupta, Abhishek and Levine, Sergey   [|•|]
[|•|] Expanding Motor Skills through Relay Neural Networks (2017)   -   Kumar, Visak C. V. and Ha, Sehoon and Liu, C. Karen   [|•|]
[|•|] Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (2020)   -   Kuznetsov, Arsenii and Shvechikov, Pavel and Grishin, Alexander and Vetrov, Dmitry   [|•|]
[|•|] Exploration in Deep Reinforcement Learning: A Survey (2022)   -   Ladosz, Pawel and Weng, Lilian and Kim, Minwoo and Oh, Hyondong   [|•|]
[|•|] Professor Forcing: A New Algorithm for Training Recurrent Networks (2016)   -   Lamb, Alex and Goyal, Anirudh and Zhang, Ying and Zhang, Saizheng and Courville, Aaron and Bengio, Yoshua   [|•|]
[|•|] CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery (2022)   -   Laskin, Michael and Liu, Hao and Peng, Xue Bin and Yarats, Denis and Rajeswaran, Aravind and Abbeel, Pieter   [|•|]
[|•|] Optimal Control via Combined Inference and Numerical Optimization (2021)   -   Layeghi, Daniel and Tonneau, Steve and Mistry, Michael   [|•|]
[|•|] Contrastive Representation Learning: A Framework and Review (2020)   -   Le-Khac, Phuc H. and Healy, Graham and Smeaton, Alan F.   [|•|]
[|•|] Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning (2019)   -   Lee, Joonho and Hwangbo, Jemin and Hutter, Marco   [|•|]
[|•|] Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization (2021)   -   Lee, Youngwoon and Lim, Joseph J. and Anandkumar, Anima and Zhu, Yuke   [|•|]
[|•|] Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients (2017)   -   Lehman, Joel and Chen, Jay and Clune, Jeff and Stanley, Kenneth O.   [|•|]
[|•|] State Representation Learning for Control: An Overview (2018)   -   Lesort, Timothée and Díaz-Rodríguez, Natalia and Goudou, Jean-François and Filliat, David   [|•|]
[|•|] End-to-End Training of Deep Visuomotor Policies (2015)   -   Levine, Sergey and Finn, Chelsea and Darrell, Trevor and Abbeel, Pieter   [|•|]
[|•|] Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection (2016)   -   Levine, Sergey and Pastor, Peter and Krizhevsky, Alex and Quillen, Deirdre   [|•|]
[|•|] Learning Multi-Level Hierarchies with Hindsight (2017)   -   Levy, Andrew and Konidaris, George and Platt, Robert and Saenko, Kate   [|•|]
[|•|] Visualizing the Loss Landscape of Neural Nets (2017)   -   Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom   [|•|]
[|•|] Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning (2022)   -   Li, Jinning and Tang, Chen and Tomizuka, Masayoshi and Zhan, Wei   [|•|]
[|•|] Understanding the Complexity Gains of Single-Task RL with a Curriculum (2022)   -   Li, Qiyang and Zhai, Yuexiang and Ma, Yi and Levine, Sergey   [|•|]
[|•|] Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning (2019)   -   Li, Richard and Jabri, Allan and Darrell, Trevor and Agrawal, Pulkit   [|•|]
[|•|] Active Hierarchical Exploration with Stable Subgoal Representation Learning (2021)   -   Li, Siyuan and Zhang, Jin and Wang, Jianhao and Yu, Yang and Zhang, Chongjie   [|•|]
[|•|] Solving Compositional Reinforcement Learning Problems via Task Reduction (2021)   -   Li, Yunfei and Wu, Yilin and Xu, Huazhe and Wang, Xiaolong and Wu, Yi   [|•|]
[|•|] Robust and Versatile Bipedal Jumping Control through Reinforcement Learning (2023)   -   Li, Zhongyu and Peng, Xue Bin and Abbeel, Pieter and Levine, Sergey and Berseth, Glen and Sreenath, Koushil   [|•|]
[|•|] Continuous control with deep reinforcement learning (2015)   -   Lillicrap, Timothy P. and Hunt, Jonathan J. and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan   [|•|]
[|•|] Dynamics-Aware Quality-Diversity for Efficient Learning of Skill Repertoires (2021)   -   Lim, Bryan and Grillotti, Luca and Bernasconi, Lorenzo and Cully, Antoine   [|•|]
[|•|] Learning Null Space Projections in Operational Space Formulation (2016)   -   Lin, Hsiu-Chin and Howard, Matthew   [|•|]
[|•|] From Motor Control to Team Play in Simulated Humanoid Football (2021)   -   Liu, Siqi and Lever, Guy and Wang, Zhe and Merel, Josh and Eslami, S. M. Ali and Hennes, Daniel and Czarnecki, Wojciech M. and Tassa, Yuval and Omidshafiei, Shayegan and Abdolmaleki, Abbas and Siegel, Noah Y. and Hasenclever, Leonard and Marris, Luke and Tunyasuvunakool, Saran and Song, H. Francis and Wulfmeier, Markus and Muller, Paul and Haarnoja, Tuomas and Tracey, Brendan D. and Tuyls, Karl and Graepel, Thore and Heess, Nicolas   [|•|]
[|•|] Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations (2018)   -   Locatello, Francesco and Bauer, Stefan and Lucic, Mario and Rätsch, Gunnar and Gelly, Sylvain and Schölkopf, Bernhard and Bachem, Olivier   [|•|]
[|•|] Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control (2018)   -   Lowrey, Kendall and Rajeswaran, Aravind and Kakade, Sham and Todorov, Emanuel and Mordatch, Igor   [|•|]
[|•|] Reset-Free Lifelong Learning with Skill-Space Planning (2020)   -   Lu, Kevin and Grover, Aditya and Abbeel, Pieter and Mordatch, Igor   [|•|]
[|•|] A Unified Approach to Interpreting Model Predictions (2017)   -   Lundberg, Scott and Lee, Su-In   [|•|]
[|•|] Self-Imitation Learning by Planning (2021)   -   Luo, Sha and Kasaei, Hamidreza and Schomaker, Lambert   [|•|]
[|•|] Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation (2021)   -   Luo, Zhengyi and Hachiuma, Ryo and Yuan, Ye and Kitani, Kris   [|•|]
[|•|] Embodied Scene-aware Human Pose Estimation (2022)   -   Luo, Zhengyi and Iwase, Shun and Yuan, Ye and Kitani, Kris   [|•|]
[|•|] From Universal Humanoid Control to Automatic Physically Valid Character Creation (2022)   -   Luo, Zhengyi and Yuan, Ye and Kitani, Kris M.   [|•|]
[|•|] Learning Dynamics and Generalization in Reinforcement Learning (2022)   -   Lyle, Clare and Rowland, Mark and Dabney, Will and Kwiatkowska, Marta and Gal, Yarin   [|•|]
[|•|] Learning Latent Plans from Play (2019)   -   Lynch, Corey and Khansari, Mohi and Xiao, Ted and Kumar, Vikash and Tompson, Jonathan and Levine, Sergey and Sermanet, Pierre   [|•|]
[|•|] Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning (2021)   -   Ma, Haitong and Liu, Changliu and Li, Shengbo Eben and Zheng, Sifa and Sun, Wenchao and Chen, Jianyu   [|•|]
[|•|] Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics (2017)   -   Mahler, Jeffrey and Liang, Jacky and Niyaz, Sherdil and Laskey, Michael and Doan, Richard and Liu, Xinyu and Ojea, Juan Aparicio and Goldberg, Ken   [|•|]
[|•|] Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning (2021)   -   Makoviychuk, Viktor and Wawrzyniak, Lukasz and Guo, Yunrong and Lu, Michelle and Storey, Kier and Macklin, Miles and Hoeller, David and Rudin, Nikita and Allshire, Arthur and Handa, Ankur and State, Gavriel   [|•|]
[|•|] Hamilton-Jacobi formulation for reach-avoid differential games (2009)   -   Margellos, Kostas and Lygeros, John   [|•|]
[|•|] New insights and perspectives on the natural gradient method (2014)   -   Martens, James   [|•|]
[|•|] Speed learning on the fly (2015)   -   Massé, Pierre-Yves and Ollivier, Yann   [|•|]
[|•|] PBCS : Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning (2020)   -   Matheron, Guillaume and Perrin, Nicolas and Sigaud, Olivier   [|•|]
[|•|] The problem with DDPG: understanding failures in deterministic environments with sparse rewards (2019)   -   Matheron, Guillaume and Perrin, Nicolas and Sigaud, Olivier   [|•|]
[|•|] Leveraging exploration in off-policy algorithms via normalizing flows (2019)   -   Mazoure, Bogdan and Doan, Thang and Durand, Audrey and Hjelm, R. Devon and Pineau, Joelle   [|•|]
[|•|] State Representation Learning from Demonstration (2019)   -   Merckling, Astrid and Coninx, Alexandre and Cressot, Loic and Doncieux, Stéphane and Perrin-Gilbert, Nicolas   [|•|]
[|•|] Modified Actor-Critics (2019)   -   Merdivan, Erinc and Hanke, Sten and Geist, Matthieu   [|•|]
[|•|] Neural probabilistic motor primitives for humanoid control (2018)   -   Merel, Josh and Hasenclever, Leonard and Galashov, Alexandre and Ahuja, Arun and Pham, Vu and Wayne, Greg and Teh, Yee Whye and Heess, Nicolas   [|•|]
[|•|] Catch & Carry: Reusable Neural Controllers for Vision-Guided Whole-Body Tasks (2019)   -   Merel, Josh and Tunyasuvunakool, Saran and Ahuja, Arun and Tassa, Yuval and Hasenclever, Leonard and Pham, Vu and Erez, Tom and Wayne, Greg and Heess, Nicolas   [|•|]
[|•|] Discrete Sequential Prediction of Continuous Actions for Deep RL (2017)   -   Metz, Luke and Ibarz, Julian and Jaitly, Navdeep and Davidson, James   [|•|]
[|•|] Transformers are Sample-Efficient World Models (2022)   -   Micheli, Vincent and Alonso, Eloi and Fleuret, François   [|•|]
[|•|] Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt (2022)   -   Mindermann, Sören and Brauner, Jan and Razzak, Muhammed and Sharma, Mrinank and Kirsch, Andreas and Xu, Winnie and Höltgen, Benedikt and Gomez, Aidan N. and Morisot, Adrien and Farquhar, Sebastian and Gal, Yarin   [|•|]
[|•|] Longitudinal high-throughput TCR repertoire profiling reveals the dynamics of T cell memory formation after mild COVID-19 infection (2020)   -   Minervina, Anastasia A. and Komech, Ekaterina A. and Titov, Aleksei and Koraichi, Meriem Bensouda and Rosati, Elisa and Mamedov, Ilgar Z. and Franke, Andre and Efimov, Grigory A. and Chudakov, Dmitriy M. and Mora, Thierry and Walczak, Aleksandra M. and Lebedev, Yuri B. and Pogorelyy, Mikhail V.   [|•|]
[|•|] A geometrical introduction to screw theory (2012)   -   Minguzzi, E.   [|•|]
[|•|] Computational Geometry Column 42 (2001)   -   Mitchell, Joseph S. B. and O’Rourke, Joseph   [|•|]
[|•|] Asynchronous Methods for Deep Reinforcement Learning (2016)   -   Mnih, Volodymyr and Badia, Adrià Puigdomènech and Mirza, Mehdi and Graves, Alex and Lillicrap, Timothy P. and Harley, Tim and Silver, David and Kavukcuoglu, Koray   [|•|]
[|•|] Reinforcement Learning with Probabilistically Complete Exploration (2020)   -   Morere, Philippe and Francis, Gilad and Blau, Tom and Ramos, Fabio   [|•|]
[|•|] Convolutional neural network models for cancer type prediction based on gene expression (2019)   -   Mostavi, Milad and Chiu, Yu-Chiao and Huang, Yufei and Chen, Yidong   [|•|]
[|•|] Illuminating search spaces by mapping elites (2015)   -   Mouret, Jean-Baptiste and Clune, Jeff   [|•|]
[|•|] Regularizing Action Policies for Smooth Control with Reinforcement Learning (2020)   -   Mysore, Siddharth and Mabsout, Bassel and Mancuso, Renato and Saenko, Kate   [|•|]
[|•|] Near-Optimal Representation Learning for Hierarchical Reinforcement Learning (2018)   -   Nachum, Ofir and Gu, Shixiang and Lee, Honglak and Levine, Sergey   [|•|]
[|•|] Smoothed Action Value Functions for Learning Gaussian Policies (2018)   -   Nachum, Ofir and Norouzi, Mohammad and Tucker, George and Schuurmans, Dale   [|•|]
[|•|] Trust-PCL: An Off-Policy Trust Region Method for Continuous Control (2017)   -   Nachum, Ofir and Norouzi, Mohammad and Xu, Kelvin and Schuurmans, Dale   [|•|]
[|•|] Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? (2019)   -   Nachum, Ofir and Tang, Haoran and Lu, Xingyu and Gu, Shixiang and Lee, Honglak and Levine, Sergey   [|•|]
[|•|] AWAC: Accelerating Online Reinforcement Learning with Offline Datasets (2020)   -   Nair, Ashvin and Gupta, Abhishek and Dalal, Murtaza and Levine, Sergey   [|•|]
[|•|] Overcoming Exploration in Reinforcement Learning with Demonstrations (2017)   -   Nair, Ashvin and McGrew, Bob and Andrychowicz, Marcin and Zaremba, Wojciech and Abbeel, Pieter   [|•|]
[|•|] Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (2023)   -   Nakamoto, Mitsuhiko and Zhai, Yuexiang and Singh, Anikait and Mark, Max Sobol and Ma, Yi and Finn, Chelsea and Kumar, Aviral and Levine, Sergey   [|•|]
[|•|] Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics (2020)   -   Neunert, Michael and Abdolmaleki, Abbas and Wulfmeier, Markus and Lampe, Thomas and Springenberg, Jost Tobias and Hafner, Roland and Romano, Francesco and Buchli, Jonas and Heess, Nicolas and Riedmiller, Martin   [|•|]
[|•|] Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images (2014)   -   Nguyen, Anh and Yosinski, Jason and Clune, Jeff   [|•|]
[|•|] When Does Stochastic Gradient Algorithm Work Well? (2018)   -   Nguyen, Lam M. and Nguyen, Nam H. and Phan, Dzung T. and Kalagnanam, Jayant R. and Scheinberg, Katya   [|•|]
[|•|] On First-Order Meta-Learning Algorithms (2018)   -   Nichol, Alex and Achiam, Joshua and Schulman, John   [|•|]
[|•|] The Primacy Bias in Deep Reinforcement Learning (2022)   -   Nikishin, Evgenii and Schwarzer, Max and D’Oro, Pierluca and Bacon, Pierre-Luc and Courville, Aaron   [|•|]
[|•|] Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles (2011)   -   Ollivier, Yann and Arnold, Ludovic and Auger, Anne and Hansen, Nikolaus   [|•|]
[|•|] Training recurrent networks online without backtracking (2015)   -   Ollivier, Yann and Tallec, Corentin and Charpiat, Guillaume   [|•|]
[|•|] Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences (2013)   -   Ollivier, Yann   [|•|]
[|•|] True Asymptotic Natural Gradient Optimization (2017)   -   Ollivier, Yann   [|•|]
[|•|] Representation Learning with Contrastive Predictive Coding (2018)   -   Oord, Aaron van den and Li, Yazhe and Vinyals, Oriol   [|•|]
[|•|] Dota 2 with Large Scale Deep Reinforcement Learning (2019)   -   OpenAI and Berner, Christopher and Brockman, Greg and Chan, Brooke and Cheung, Vicki and Dębiak, Przemysław and Dennison, Christy and Farhi, David and Fischer, Quirin and Hashme, Shariq and Hesse, Chris and Józefowicz, Rafal and Gray, Scott and Olsson, Catherine and Pachocki, Jakub and Petrov, Michael and Pinto, Henrique P. d. O. and Raiman, Jonathan and Salimans, Tim and Schlatter, Jeremy and Schneider, Jonas and Sidor, Szymon and Sutskever, Ilya and Tang, Jie and Wolski, Filip and Zhang, Susan   [|•|]
[|•|] Deep Exploration via Bootstrapped DQN (2016)   -   Osband, Ian and Blundell, Charles and Pritzel, Alexander and Roy, Benjamin Van   [|•|]
[|•|] Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? (2020)   -   Ota, Kei and Oiki, Tomoaki and Jha, Devesh K. and Mariyama, Toshisada and Nikovski, Daniel   [|•|]
[|•|] Vector Quantized Models for Planning (2021)   -   Ozair, Sherjil and Li, Yazhe and Razavi, Ali and Antonoglou, Ioannis and Oord, Aäron van den and Vinyals, Oriol   [|•|]
[|•|] Making Efficient Use of Demonstrations to Solve Hard Exploration Problems (2019)   -   Paine, Tom Le and Gulcehre, Caglar and Shahriari, Bobak and Denil, Misha and Hoffman, Matt and Soyer, Hubert and Tanburn, Richard and Kapturowski, Steven and Rabinowitz, Neil and Williams, Duncan and Barth-Maron, Gabriel and Wang, Ziyu and Freitas, Nando de and Team, Worlds   [|•|]
[|•|] TD-Regularized Actor-Critic Methods (2018)   -   Parisi, Simone and Tangkaratt, Voot and Peters, Jan and Khan, Mohammad Emtiyaz   [|•|]
[|•|] Lipschitz-constrained Unsupervised Skill Discovery (2022)   -   Park, Seohong and Choi, Jongwook and Kim, Jaekyeom and Lee, Honglak and Kim, Gunhee   [|•|]
[|•|] Controllability-Aware Unsupervised Skill Discovery (2023)   -   Park, Seohong and Lee, Kimin and Lee, Youngwoon and Abbeel, Pieter   [|•|]
[|•|] Predictable MDP Abstraction for Unsupervised Model-Based RL (2023)   -   Park, Seohong and Levine, Sergey   [|•|]
[|•|] Effective Diversity in Population Based Reinforcement Learning (2020)   -   Parker-Holder, Jack and Pacchiano, Aldo and Choromanski, Krzysztof and Roberts, Stephen   [|•|]
[|•|] Revisiting Natural Gradient for Deep Networks (2013)   -   Pascanu, Razvan and Bengio, Yoshua   [|•|]
[|•|] Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates (2019)   -   Penedones, Hugo and Riquelme, Carlos and Vincent, Damien and Maennel, Hartmut and Mann, Timothy and Barreto, Andre and Gelly, Sylvain and Neu, Gergely   [|•|]
[|•|] MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies (2019)   -   Peng, Xue Bin and Chang, Michael and Zhang, Grace and Abbeel, Pieter and Levine, Sergey   [|•|]
[|•|] Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning (2019)   -   Peng, Xue Bin and Kumar, Aviral and Zhang, Grace and Levine, Sergey   [|•|]
[|•|] AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (2021)   -   Peng, Xue Bin and Ma, Ze and Abbeel, Pieter and Levine, Sergey and Kanazawa, Angjoo   [|•|]
[|•|] Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter? (2016)   -   Peng, Xue Bin and Panne, Michiel van de   [|•|]
[|•|] Non-local Policy Optimization via Diversity-regularized Collaborative Exploration (2020)   -   Peng, Zhenghao and Sun, Hao and Zhou, Bolei   [|•|]
[|•|] Accelerating Reinforcement Learning with Learned Skill Priors (2020)   -   Pertsch, Karl and Lee, Youngwoon and Lim, Joseph J.   [|•|]
[|•|] Demonstration-Guided Reinforcement Learning with Learned Skills (2021)   -   Pertsch, Karl and Lee, Youngwoon and Wu, Yue and Lim, Joseph J.   [|•|]
[|•|] TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis (2023)   -   Petrovich, Mathis and Black, Michael J. and Varol, Gül   [|•|]
[|•|] Computational Optimal Transport (2018)   -   Peyré, Gabriel and Cuturi, Marco   [|•|]
[|•|] Learning Compositional Neural Programs with Recursive Tree Search and Planning (2019)   -   Pierrot, Thomas and Ligner, Guillaume and Reed, Scott and Sigaud, Olivier and Perrin, Nicolas and Laterre, Alexandre and Kas, David and Beguir, Karim and Freitas, Nando de   [|•|]
[|•|] Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization (2020)   -   Pierrot, Thomas and Macé, Valentin and Chalumeau, Félix and Flajolet, Arthur and Cideron, Geoffrey and Beguir, Karim and Cully, Antoine and Sigaud, Olivier and Perrin-Gilbert, Nicolas   [|•|]
[|•|] Learning Compositional Neural Programs for Continuous Control (2020)   -   Pierrot, Thomas and Perrin, Nicolas and Behbahani, Feryal and Laterre, Alexandre and Sigaud, Olivier and Beguir, Karim and Freitas, Nando de   [|•|]
[|•|] First-order and second-order variants of the gradient descent in a unified framework (2018)   -   Pierrot, Thomas and Perrin, Nicolas and Sigaud, Olivier   [|•|]
[|•|] Multi-Objective Quality Diversity Optimization (2022)   -   Pierrot, Thomas and Richard, Guillaume and Beguir, Karim and Cully, Antoine   [|•|]
[|•|] Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning (2020)   -   Pitis, Silviu and Chan, Harris and Zhao, Stephen and Stadie, Bradly and Ba, Jimmy   [|•|]
[|•|] Skew-Fit: State-Covering Self-Supervised Reinforcement Learning (2019)   -   Pong, Vitchyr H. and Dalal, Murtaza and Lin, Steven and Nair, Ashvin and Bahl, Shikhar and Levine, Sergey   [|•|]
[|•|] Importance mixing: Improving sample reuse in evolutionary policy search methods (2018)   -   Pourchot, Aloïs and Perrin, Nicolas and Sigaud, Olivier   [|•|]
[|•|] Learning to Solve NP-Complete Problems - A Graph Neural Network for Decision TSP (2018)   -   Prates, Marcelo O. R. and Avelar, Pedro H. C. and Lemos, Henrique and Lamb, Luis and Vardi, Moshe   [|•|]
[|•|] A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems (2022)   -   Prudencio, Rafael Figueiredo and Maximo, Marcos R. O. A. and Colombini, Esther Luna   [|•|]
[|•|] Estimating Training Data Influence by Tracing Gradient Descent (2020)   -   Pruthi, Garima and Liu, Frederick and Sundararajan, Mukund and Kale, Satyen   [|•|]
[|•|] Information geometry for multiparameter models: New perspectives on the origin of simplicity (2021)   -   Quinn, Katherine N. and Abbott, Michael C. and Transtrum, Mark K. and Machta, Benjamin B. and Sethna, James P.   [|•|]
[|•|] Automated curricula through setter-solver interactions (2019)   -   Racaniere, Sebastien and Lampinen, Andrew K. and Santoro, Adam and Reichert, David P. and Firoiu, Vlad and Lillicrap, Timothy P.   [|•|]
[|•|] Real-World Humanoid Locomotion with Reinforcement Learning (2023)   -   Radosavovic, Ilija and Xiao, Tete and Zhang, Bike and Darrell, Trevor and Malik, Jitendra and Sreenath, Koushil   [|•|]
[|•|] Smooth Exploration for Robotic Reinforcement Learning (2020)   -   Raffin, Antonin and Kober, Jens and Stulp, Freek   [|•|]
[|•|] Decoupling Value and Policy for Generalization in Reinforcement Learning (2021)   -   Raileanu, Roberta and Fergus, Rob   [|•|]
[|•|] Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations (2017)   -   Rajeswaran, Aravind and Kumar, Vikash and Gupta, Abhishek and Vezzani, Giulia and Schulman, John and Todorov, Emanuel and Levine, Sergey   [|•|]
[|•|] Towards Generalization and Simplicity in Continuous Control (2017)   -   Rajeswaran, Aravind and Lowrey, Kendall and Todorov, Emanuel and Kakade, Sham   [|•|]
[|•|] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (2019)   -   Rakelly, Kate and Zhou, Aurick and Quillen, Deirdre and Finn, Chelsea and Levine, Sergey   [|•|]
[|•|] Contrastive Language, Action, and State Pre-training for Robot Learning (2023)   -   Rana, Krishan and Melnik, Andrew and Sünderhauf, Niko   [|•|]
[|•|] Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics (2022)   -   Rana, Krishan and Xu, Ming and Tidd, Brendan and Milford, Michael and Sünderhauf, Niko   [|•|]
[|•|] Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems (2020)   -   Rana, Muhammad Asif and Li, Anqi and Fox, Dieter and Boots, Byron and Ramos, Fabio and Ratliff, Nathan   [|•|]
[|•|] On the Convergence of Adam and Beyond (2019)   -   Reddi, Sashank J. and Kale, Satyen and Kumar, Sanjiv   [|•|]
[|•|] SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards (2019)   -   Reddy, Siddharth and Dragan, Anca D. and Levine, Sergey   [|•|]
[|•|] Neural Programmer-Interpreters (2015)   -   Reed, Scott and Freitas, Nando de   [|•|]
[|•|] Successor Feature Representations (2021)   -   Reinke, Chris and Alameda-Pineda, Xavier   [|•|]
[|•|] Extended Tree Search for Robot Task and Motion Planning (2021)   -   Ren, Tianyu and Chalvatzaki, Georgia and Peters, Jan   [|•|]
[|•|] Backplay: "Man muss immer umkehren" (2018)   -   Resnick, Cinjon and Raileanu, Roberta and Kapoor, Sanyam and Peysakhovich, Alexander and Cho, Kyunghyun and Bruna, Joan   [|•|]
[|•|] Variational Inference with Normalizing Flows (2015)   -   Rezende, Danilo Jimenez and Mohamed, Shakir   [|•|]
[|•|] Learning by Playing - Solving Sparse Reward Tasks from Scratch (2018)   -   Riedmiller, Martin and Hafner, Roland and Lampe, Thomas and Neunert, Michael and Degrave, Jonas and Wiele, Tom Van de and Mnih, Volodymyr and Heess, Nicolas and Springenberg, Jost Tobias   [|•|]
[|•|] A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets (2012)   -   Roux, Nicolas Le and Schmidt, Mark and Bach, Francis   [|•|]
[|•|] An Analysis of Categorical Distributional Reinforcement Learning (2018)   -   Rowland, Mark and Bellemare, Marc G. and Dabney, Will and Munos, Rémi and Teh, Yee Whye   [|•|]
[|•|] An overview of gradient descent optimization algorithms (2016)   -   Ruder, Sebastian   [|•|]
[|•|] Generative Class-conditional Autoencoders (2014)   -   Rudy, Jan and Taylor, Graham   [|•|]
[|•|] CAQL: Continuous Action Q-Learning (2019)   -   Ryu, Moonkyung and Chow, Yinlam and Anderson, Ross and Tjandraatmadja, Christian and Boutilier, Craig   [|•|]
[|•|] Dynamic Routing Between Capsules (2017)   -   Sabour, Sara and Frosst, Nicholas and Hinton, Geoffrey E.   [|•|]
[|•|] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks (2016)   -   Salimans, Tim and Kingma, Diederik P.   [|•|]
[|•|] Graph networks as learnable physics engines for inference and control (2018)   -   Sanchez-Gonzalez, Alvaro and Heess, Nicolas and Springenberg, Jost Tobias and Merel, Josh and Riedmiller, Martin and Hadsell, Raia and Battaglia, Peter   [|•|]
[|•|] Group Sparse Regularization for Deep Neural Networks (2016)   -   Scardapane, Simone and Comminiello, Danilo and Hussain, Amir and Uncini, Aurelio   [|•|]
[|•|] Prioritized Experience Replay (2015)   -   Schaul, Tom and Quan, John and Antonoglou, Ioannis and Silver, David   [|•|]
[|•|] Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991) (2019)   -   Schmidhuber, Juergen   [|•|]
[|•|] Learning to Modulate pre-trained Models in RL (2023)   -   Schmied, Thomas and Hofmarcher, Markus and Paischer, Fabian and Pascanu, Razvan and Hochreiter, Sepp   [|•|]
[|•|] Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision (2021)   -   Scholz, Julien and Weber, Cornelius and Hafez, Muhammad Burhan and Wermter, Stefan   [|•|]
[|•|] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (2019)   -   Schrittwieser, Julian and Antonoglou, Ioannis and Hubert, Thomas and Simonyan, Karen and Sifre, Laurent and Schmitt, Simon and Guez, Arthur and Lockhart, Edward and Hassabis, Demis and Graepel, Thore and Lillicrap, Timothy and Silver, David   [|•|]
[|•|] Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning (2020)   -   Schroecker, Yannick and Isbell, Charles   [|•|]
[|•|] Trust Region Policy Optimization (2015)   -   Schulman, John and Levine, Sergey and Moritz, Philipp and Jordan, Michael I. and Abbeel, Pieter   [|•|]
[|•|] High-Dimensional Continuous Control Using Generalized Advantage Estimation (2015)   -   Schulman, John and Moritz, Philipp and Levine, Sergey and Jordan, Michael and Abbeel, Pieter   [|•|]
[|•|] Proximal Policy Optimization Algorithms (2017)   -   Schulman, John and Wolski, Filip and Dhariwal, Prafulla and Radford, Alec and Klimov, Oleg   [|•|]
[|•|] DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems (2022)   -   Schumacher, Pierre and Häufle, Daniel and Büchler, Dieter and Schmitt, Syn and Martius, Georg   [|•|]
[|•|] Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup (2019)   -   Schwab, Devin and Springenberg, Tobias and Martins, Murilo F. and Lampe, Thomas and Neunert, Michael and Abdolmaleki, Abbas and Hertweck, Tim and Hafner, Roland and Nori, Francesco and Riedmiller, Martin   [|•|]
[|•|] Bigger, Better, Faster: Human-level Atari with human-level efficiency (2023)   -   Schwarzer, Max and Obando-Ceron, Johan and Courville, Aaron and Bellemare, Marc and Agarwal, Rishabh and Castro, Pablo Samuel   [|•|]
[|•|] State Entropy Maximization with Random Encoders for Efficient Exploration (2021)   -   Seo, Younggyo and Chen, Lili and Shin, Jinwoo and Lee, Honglak and Abbeel, Pieter and Lee, Kimin   [|•|]
[|•|] Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies (2021)   -   Seyde, Tim and Gilitschenski, Igor and Schwarting, Wilko and Stellato, Bartolomeo and Riedmiller, Martin and Wulfmeier, Markus and Rus, Daniela   [|•|]
[|•|] Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles (2020)   -   Seyde, Tim and Schwarting, Wilko and Karaman, Sertac and Rus, Daniela   [|•|]
[|•|] Solving Continuous Control via Q-learning (2022)   -   Seyde, Tim and Werner, Peter and Schwarting, Wilko and Gilitschenski, Igor and Riedmiller, Martin and Rus, Daniela and Wulfmeier, Markus   [|•|]
[|•|] Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning (2020)   -   Sharma, Archit and Ahn, Michael and Levine, Sergey and Kumar, Vikash and Hausman, Karol and Gu, Shixiang   [|•|]
[|•|] Dynamics-Aware Unsupervised Discovery of Skills (2019)   -   Sharma, Archit and Gu, Shixiang and Levine, Sergey and Kumar, Vikash and Hausman, Karol   [|•|]
[|•|] Sequential Interpretability: Methods, Applications, and Future Direction for Understanding Deep Learning Models in the Context of Sequential Data (2020)   -   Shickel, Benjamin and Rashidi, Parisa   [|•|]
[|•|] Residual Policy Learning (2018)   -   Silver, Tom and Allen, Kelsey and Tenenbaum, Josh and Kaelbling, Leslie   [|•|]
[|•|] Parrot: Data-Driven Behavioral Priors for Reinforcement Learning (2020)   -   Singh, Avi and Liu, Huihan and Zhou, Gaoyue and Yu, Albert and Rhinehart, Nicholas and Levine, Sergey   [|•|]
[|•|] Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences (2024)   -   Singh, Nikhil Kumar and Saha, Indranil   [|•|]
[|•|] SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition (2022)   -   Slack, Dylan and Chow, Yinlam and Dai, Bo and Wichers, Nevan   [|•|]
[|•|] A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning (2022)   -   Smith, Laura and Kostrikov, Ilya and Levine, Sergey   [|•|]
[|•|] Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit (2019)   -   Sohl-Dickstein, Jascha and Kawaguchi, Kenji   [|•|]
[|•|] ES-MAML: Simple Hessian-Free Meta Learning (2019)   -   Song, Xingyou and Gao, Wenbo and Yang, Yuxiang and Choromanski, Krzysztof and Pacchiano, Aldo and Tang, Yunhao   [|•|]
[|•|] Local Search for Policy Iteration in Continuous Control (2020)   -   Springenberg, Jost Tobias and Heess, Nicolas and Mankowitz, Daniel and Merel, Josh and Byravan, Arunkumar and Abdolmaleki, Abbas and Kay, Jackie and Degrave, Jonas and Schrittwieser, Julian and Tassa, Yuval and Buchli, Jonas and Belov, Dan and Riedmiller, Martin   [|•|]
[|•|] Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications (2022)   -   Streeter, Matthew and Dillon, Joshua V.   [|•|]
[|•|] Do Differentiable Simulators Give Better Policy Gradients? (2022)   -   Suh, H. J. Terry and Simchowitz, Max and Zhang, Kaiqing and Tedrake, Russ   [|•|]
[|•|] FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer (2020)   -   Sun, Chuangchuang and Kim, Dong-Ki and How, Jonathan P.   [|•|]
[|•|] A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation (2016)   -   Tai, Lei and Zhang, Jingwei and Liu, Ming and Boedecker, Joschka and Burgard, Wolfram   [|•|]
[|•|] Novelty Search in Representational Space for Sample Efficient Exploration (2020)   -   Tao, Ruo Yu and François-Lavet, Vincent and Pineau, Joelle   [|•|]
[|•|] DeepMind Control Suite (2018)   -   Tassa, Yuval and Doron, Yotam and Muldal, Alistair and Erez, Tom and Li, Yazhe and Casas, Diego de Las and Budden, David and Abdolmaleki, Abbas and Merel, Josh and Lefrancq, Andrew and Lillicrap, Timothy and Riedmiller, Martin   [|•|]
[|•|] Action Branching Architectures for Deep Reinforcement Learning (2017)   -   Tavakoli, Arash and Pardo, Fabio and Kormushev, Petar   [|•|]
[|•|] On Bonus-Based Exploration Methods in the Arcade Learning Environment (2021)   -   Taïga, Adrien Ali and Fedus, William and Machado, Marlos C. and Courville, Aaron and Bellemare, Marc G.   [|•|]
[|•|] Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning (2021)   -   Team, DeepMind Interactive Agents and Abramson, Josh and Ahuja, Arun and Brussee, Arthur and Carnevale, Federico and Cassin, Mary and Fischer, Felix and Georgiev, Petko and Goldin, Alex and Gupta, Mansi and Harley, Tim and Hill, Felix and Humphreys, Peter C. and Hung, Alden and Landon, Jessica and Lillicrap, Timothy and Merzic, Hamza and Muldal, Alistair and Santoro, Adam and Scully, Guy and Glehn, Tamara von and Wayne, Greg and Wong, Nathaniel and Yan, Chen and Zhu, Rui   [|•|]
[|•|] Open-Ended Learning Leads to Generally Capable Agents (2021)   -   Team, Open Ended Learning and Stooke, Adam and Mahajan, Anuj and Barros, Catarina and Deck, Charlie and Bauer, Jakob and Sygnowski, Jakub and Trebacz, Maja and Jaderberg, Max and Mathieu, Michael and McAleese, Nat and Bradley-Schmieg, Nathalie and Wong, Nathaniel and Porcel, Nicolas and Raileanu, Roberta and Hughes-Fitt, Steph and Dalibard, Valentin and Czarnecki, Wojciech Marian   [|•|]
[|•|] Safe Reinforcement Learning by Imagining the Near Future (2022)   -   Thomas, Garrett and Luo, Yuping and Ma, Tengyu   [|•|]
[|•|] Learning Setup Policies: Reliable Transition Between Locomotion Behaviours (2021)   -   Tidd, Brendan and Hudson, Nicolas and Cosgun, Akansel and Leitner, Jurgen   [|•|]
[|•|] Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts (2020)   -   Tidd, Brendan and Hudson, Nicolas and Cosgun, Akansel and Leitner, Jurgen   [|•|]
[|•|] Behavior Priors for Efficient Reinforcement Learning (2020)   -   Tirumala, Dhruva and Galashov, Alexandre and Noh, Hyeonwoo and Hasenclever, Leonard and Pascanu, Razvan and Schwarz, Jonathan and Desjardins, Guillaume and Czarnecki, Wojciech Marian and Ahuja, Arun and Teh, Yee Whye and Heess, Nicolas   [|•|]
[|•|] Invariant Funnels around Trajectories using Sum-of-Squares Programming (2010)   -   Tobenkin, Mark M. and Manchester, Ian R. and Tedrake, Russ   [|•|]
[|•|] Modular Safety-Critical Control of Legged Robots (2023)   -   Tosun, Berk and Samur, Evren   [|•|]
[|•|] Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards (2019)   -   Trott, Alexander and Zheng, Stephan and Xiong, Caiming and Socher, Richard   [|•|]
[|•|] Jump-Start Reinforcement Learning (2022)   -   Uchendu, Ikechukwu and Xiao, Ted and Lu, Yao and Zhu, Banghua and Yan, Mengyuan and Simon, Joséphine and Bennice, Matthew and Fu, Chuyuan and Ma, Cong and Jiao, Jiantao and Levine, Sergey and Hausman, Karol   [|•|]
[|•|] Discovering the Elite Hypervolume by Leveraging Interspecies Correlation (2018)   -   Vassiliades, Vassilis and Mouret, Jean-Baptiste   [|•|]
[|•|] Attention Is All You Need (2017)   -   Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia   [|•|]
[|•|] Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards (2017)   -   Vecerik, Mel and Hester, Todd and Scholz, Jonathan and Wang, Fumin and Pietquin, Olivier and Piot, Bilal and Heess, Nicolas and Rothörl, Thomas and Lampe, Thomas and Riedmiller, Martin   [|•|]
[|•|] Diffusion-based neuromodulation can eliminate catastrophic forgetting in simple neural networks (2017)   -   Velez, Roby and Clune, Jeff   [|•|]
[|•|] SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration (2022)   -   Vezzani, Giulia and Tirumala, Dhruva and Wulfmeier, Markus and Rao, Dushyant and Abdolmaleki, Abbas and Moran, Ben and Haarnoja, Tuomas and Humplik, Jan and Hafner, Roland and Neunert, Michael and Fantacci, Claudio and Hertweck, Tim and Lampe, Thomas and Sadeghi, Fereshteh and Heess, Nicolas and Riedmiller, Martin   [|•|]
[|•|] Implicitly Regularized RL with Implicit Q-Values (2021)   -   Vieillard, Nino and Andrychowicz, Marcin and Raichuk, Anton and Pietquin, Olivier and Geist, Matthieu   [|•|]
[|•|] Krylov Subspace Descent for Deep Learning (2011)   -   Vinyals, Oriol and Povey, Daniel   [|•|]
[|•|] Convex optimization (2021)   -   Vorontsova, Evgeniya and Hildebrand, Roland and Gasnikov, Alexander and Stonyakin, Fedor   [|•|]
[|•|] Explainable CNN-attention Networks (C-Attention Network) for Automated Detection of Alzheimer’s Disease (2020)   -   Wang, Ning and Chen, Mingxuan and Subbalakshmi, K. P.   [|•|]
[|•|] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation (2019)   -   Wang, Ruohan and Ciliberto, Carlo and Amadori, Pierluigi and Demiris, Yiannis   [|•|]
[|•|] Support-weighted Adversarial Imitation Learning (2020)   -   Wang, Ruohan and Ciliberto, Carlo and Amadori, Pierluigi and Demiris, Yiannis   [|•|]
[|•|] Benchmarking Model-Based Reinforcement Learning (2019)   -   Wang, Tingwu and Bao, Xuchan and Clavera, Ignasi and Hoang, Jerrick and Wen, Yeming and Langlois, Eric and Zhang, Shunshi and Zhang, Guodong and Abbeel, Pieter and Ba, Jimmy   [|•|]
[|•|] A Surrogate-Assisted Controller for Expensive Evolutionary Reinforcement Learning (2022)   -   Wang, Yuxing and Zhang, Tiantian and Chang, Yongzhe and Liang, Bin and Wang, Xueqian and Yuan, Bo   [|•|]
[|•|] Sample Efficient Actor-Critic with Experience Replay (2016)   -   Wang, Ziyu and Bapst, Victor and Heess, Nicolas and Mnih, Volodymyr and Munos, Remi and Kavukcuoglu, Koray and Freitas, Nando de   [|•|]
[|•|] Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies (2019)   -   Ward, Patrick Nadeem and Smofsky, Ariella and Bose, Avishek Joey   [|•|]
[|•|] Unsupervised Control Through Non-Parametric Discriminative Rewards (2018)   -   Warde-Farley, David and Wiele, Tom Van de and Kulkarni, Tejas and Ionescu, Catalin and Hansen, Steven and Mnih, Volodymyr   [|•|]
[|•|] Optimization-Based Control for Dynamic Legged Robots (2022)   -   Wensing, Patrick M. and Posa, Michael and Hu, Yue and Escande, Adrien and Mansard, Nicolas and Prete, Andrea Del   [|•|]
[|•|] Q-Learning in enormous action spaces via amortized approximate maximization (2020)   -   Wiele, Tom Van de and Warde-Farley, David and Mnih, Andriy and Mnih, Volodymyr   [|•|]
[|•|] Model Predictive Path Integral Control using Covariance Variable Importance Sampling (2015)   -   Williams, Grady and Aldrich, Andrew and Theodorou, Evangelos   [|•|]
[|•|] A Lyapunov Analysis of Momentum Methods in Optimization (2016)   -   Wilson, Ashia C. and Recht, Benjamin and Jordan, Michael I.   [|•|]
[|•|] Evolving simple programs for playing Atari games (2018)   -   Wilson, Dennis G. and Cussat-Blanc, Sylvain and Luga, Hervé and Miller, Julian F.   [|•|]
[|•|] Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance (2021)   -   Wu, Yanqiu and Chen, Xinyue and Wang, Che and Zhang, Yiming and Ross, Keith W.   [|•|]
[|•|] When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning (2022)   -   Xie, Annie and Tajwar, Fahim and Sharma, Archit and Finn, Chelsea   [|•|]
[|•|] Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization (2023)   -   Xu, Haoran and Jiang, Li and Li, Jianxiong and Yang, Zhuoran and Wang, Zhaoran and Chan, Victor Wai Kin and Zhan, Xianyuan   [|•|]
[|•|] VideoGPT: Video Generation using VQ-VAE and Transformers (2021)   -   Yan, Wilson and Zhang, Yunzhi and Abbeel, Pieter and Srinivas, Aravind   [|•|]
[|•|] Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation (2024)   -   Yang, Jonathan and Glossop, Catherine and Bhorkar, Arjun and Shah, Dhruv and Vuong, Quan and Finn, Chelsea and Sadigh, Dorsa and Levine, Sergey   [|•|]
[|•|] TRAIL: Near-Optimal Imitation Learning with Suboptimal Data (2021)   -   Yang, Mengjiao and Levine, Sergey and Nachum, Ofir   [|•|]
[|•|] Discovering Diverse Athletic Jumping Strategies (2021)   -   Yin, Zhiqi and Yang, Zeshi and Panne, Michiel van de and Yin, KangKang   [|•|]
[|•|] How transferable are features in deep neural networks? (2014)   -   Yosinski, Jason and Clune, Jeff and Bengio, Yoshua and Lipson, Hod   [|•|]
[|•|] Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning (2022)   -   You, Bang and Xie, Jingming and Chen, Youping and Peters, Jan and Arenz, Oleg   [|•|]
[|•|] Diffeomorphic Learning (2018)   -   Younes, Laurent   [|•|]
[|•|] COMBO: Conservative Offline Model-Based Policy Optimization (2021)   -   Yu, Tianhe and Kumar, Aviral and Rafailov, Rafael and Rajeswaran, Aravind and Levine, Sergey and Finn, Chelsea   [|•|]
[|•|] Learning Symmetric and Low-energy Locomotion (2018)   -   Yu, Wenhao and Turk, Greg and Liu, C. Karen   [|•|]
[|•|] On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning (2021)   -   Zhang, Baohe and Rajan, Raghu and Pineda, Luis and Lambert, Nathan and Biedenkapp, André and Chua, Kurtland and Hutter, Frank and Calandra, Roberto   [|•|]
[|•|] Three Mechanisms of Weight Decay Regularization (2018)   -   Zhang, Guodong and Wang, Chaoqi and Xu, Bowen and Grosse, Roger   [|•|]
[|•|] Understanding Hindsight Goal Relabeling from a Divergence Minimization Perspective (2022)   -   Zhang, Lunjun and Stadie, Bradly C.   [|•|]
[|•|] C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks (2021)   -   Zhang, Tianjun and Eysenbach, Benjamin and Salakhutdinov, Ruslan and Levine, Sergey and Gonzalez, Joseph E.   [|•|]
[|•|] Adversarially Regularized Autoencoders (2017)   -   Zhao, Jake and Kim, Yoon and Zhang, Kelly and Rush, Alexander M. and LeCun, Yann   [|•|]
[|•|] Domain Generalization: A Survey (2021)   -   Zhou, Kaiyang and Liu, Ziwei and Qiao, Yu and Xiang, Tao and Loy, Chen Change   [|•|]
[|•|] Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation (2021)   -   Zhu, Yifeng and Stone, Peter and Zhu, Yuke   [|•|]
[|•|] Transfer Learning in Deep Reinforcement Learning: A Survey (2020)   -   Zhu, Zhuangdi and Lin, Kaixiang and Jain, Anil K. and Zhou, Jiayu   [|•|]
[|•|] Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains (2019)   -   Zimmer, Matthieu and Weng, Paul   [|•|]
[|•|] A Bayesian Approach to Policy Recognition and State Representation Learning (2016)   -   Šošić, Adrian and Zoubir, Abdelhak M. and Koeppl, Heinz   [|•|]