A curated collection of papers on world models, with a focus on autonomous driving.
If you find a paper we have overlooked, feel free to open a pull request or an issue, or email me / Qi Wang. Contributions in any form that make this list more comprehensive are welcome.
If you find this repository useful, please consider giving it a star ⭐.
Feel free to share this list with others!
CVPR 2024 Workshop & Challenge | OpenDriveLab
Track #4: Predictive World Model.
Serving as an abstract spatio-temporal representation of reality, a world model can predict future states based on the current state. Learning such a model has the potential to elevate a pre-trained foundation model to the next level. Given vision-only input, the neural network must output future point clouds, demonstrating its predictive grasp of the world.
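To make that input/output contract concrete, here is a minimal PyTorch sketch (not OpenDriveLab's reference model) of a vision-only predictive world model: past camera frames are encoded into a latent state, a recurrent dynamics model rolls the state forward, and a head decodes each imagined state into a point cloud. All module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PredictiveWorldModel(nn.Module):
    """Illustrative vision-only world model: frames in, future point clouds out."""

    def __init__(self, latent_dim: int = 256, num_points: int = 1024):
        super().__init__()
        # Encode one camera frame into a compact feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Latent dynamics: advance the scene state by one step.
        self.dynamics = nn.GRUCell(latent_dim, latent_dim)
        # Decode a latent state into an (x, y, z) point set.
        self.point_head = nn.Linear(latent_dim, num_points * 3)
        self.num_points = num_points

    def forward(self, frames: torch.Tensor, horizon: int) -> torch.Tensor:
        # frames: (batch, time, 3, H, W) past camera observations.
        b, t = frames.shape[:2]
        h = torch.zeros(b, self.dynamics.hidden_size, device=frames.device)
        for i in range(t):                       # absorb the observed past
            h = self.dynamics(self.encoder(frames[:, i]), h)
        clouds = []
        for _ in range(horizon):                 # imagine the unobserved future
            h = self.dynamics(torch.zeros_like(h), h)
            clouds.append(self.point_head(h).view(b, self.num_points, 3))
        return torch.stack(clouds, dim=1)        # (batch, horizon, N, 3)

model = PredictiveWorldModel()
future = model(torch.randn(2, 4, 3, 64, 64), horizon=3)
print(future.shape)  # torch.Size([2, 3, 1024, 3])
```

Training such a sketch would compare predicted clouds against future LiDAR sweeps, typically with a Chamfer-style set distance.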
CVPR 2023 Workshop on Autonomous Driving
CHALLENGE 3: ARGOVERSE CHALLENGES, 3D Occupancy Forecasting using the Argoverse 2 Sensor Dataset. Predict the spacetime occupancy of the world for the next 3 seconds.
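As a rough sketch of what that forecasting target looks like in code, the snippet below treats the task as per-voxel binary classification over a spacetime grid; the grid resolution and frame rate are assumptions for illustration, not the Argoverse 2 spec.

```python
import torch
import torch.nn.functional as F

# Assumed grid: 6 future frames over 3 s, a 128 x 128 x 16 voxel volume.
T, X, Y, Z = 6, 128, 128, 16
logits = torch.randn(1, T, X, Y, Z)                  # a forecaster's raw outputs
target = (torch.rand(1, T, X, Y, Z) > 0.9).float()   # stand-in ground truth

# Per-voxel binary cross-entropy: is each spacetime voxel occupied or free?
loss = F.binary_cross_entropy_with_logits(logits, target)
print(loss.item())
```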
Yann LeCun: A Path Towards Autonomous Machine Intelligence. [Paper] [Video]
CVPR'23 WAD Keynote - Ashok Elluswamy, Tesla. [Video]
Wayve: Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [Blog]
World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
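The "learned simulator" role can be sketched in a few lines: sample candidate action sequences, roll each one out inside the learned dynamics model without touching the real environment, score the imagined trajectories with a reward model, and execute only the first action of the best sequence. `ToyDynamics` and the toy reward below are hypothetical stand-ins for whatever dynamics and reward heads a given paper learns.

```python
import torch
import torch.nn as nn

class ToyDynamics(nn.Module):
    """Stand-in latent dynamics head; a real one would be learned from data."""
    def __init__(self, state_dim: int = 8, action_dim: int = 2):
        super().__init__()
        self.net = nn.Linear(state_dim + action_dim, state_dim)

    def step(self, s, a):
        return torch.tanh(self.net(torch.cat([s, a], dim=-1)))

def plan_by_imagination(dynamics, reward_fn, state, horizon=10,
                        num_candidates=256, action_dim=2):
    # Candidate action sequences: (candidates, horizon, action_dim).
    actions = torch.randn(num_candidates, horizon, action_dim)
    s = state.expand(num_candidates, -1).clone()   # branch the current state
    returns = torch.zeros(num_candidates)
    for t in range(horizon):          # imagined rollout: no real env calls
        s = dynamics.step(s, actions[:, t])
        returns += reward_fn(s)
    return actions[returns.argmax(), 0]   # execute only the first best action

dynamics = ToyDynamics()
reward_fn = lambda s: -s.pow(2).sum(-1)   # toy reward: drive the state to zero
a0 = plan_by_imagination(dynamics, reward_fn, torch.zeros(1, 8))
print(a0)
```

This is the random-shooting flavor of model-predictive control; methods such as TD-MPC2 and the Dreamer line listed below replace the random candidates with a policy learned in imagination.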
WACVW 2024: [Paper] [Code]
ISSREW: [Paper]
arXiv 2024.11: [Paper]
arXiv 2024.11: [Paper]
arXiv 2024.7: [Paper] [Code]
arXiv 2024.5: [Paper] [Code]
arXiv 2024.3: [Paper]
TITS: [Paper]
NeurIPS 2024: [Paper] [Code]
NeurIPS 2024: [Paper] [Project]
ECCV 2024: [Paper]
ECCV 2024: [Paper] [Code]
ECCV 2024: [Paper] [Code]
ECCV 2024: [Paper] [Code]
ECCV 2024: [Paper] [Code]
ECCV 2024: [Paper]
ECCV 2024: [Paper] [Code]
ECCV 2024: [Code]
ECCV 2024: [Paper] [Code]
ECCV 2024: [Paper] [Code]
ICML 2024: [Paper]
CVPR 2024: [Paper] [Code]
CVPR 2024: [Paper] [Data]
CVPR 2024: [Paper] [Code]
CVPR 2024: [Paper] [Code]
CVPR 2024: [Paper]
CVPR 2024: [Paper] [Code]
CVPR 2024: [Paper] [Code]
ICLR 2024: [Paper] [Code]
ICLR 2024: [Paper]
ICLR 2024: [Paper] [Code]
arXiv 2024.12: [Paper] [Code]
arXiv 2024.12: [Paper] [Project]
arXiv 2024.12: [Paper]
arXiv 2024.12: [Paper] [Project]
arXiv 2024.12: [Paper] [Code]
arXiv 2024.12: [Paper] [Code]
arXiv 2024.12: [Paper] [Code]
arXiv 2024.12: [Paper]
arXiv 2024.12: [Paper] [Project Page]
arXiv 2024.11: [Paper] [Code]
arXiv 2024.11: [Paper]
arXiv 2024.11: [Paper] [Project Page]
arXiv 2024.10: [Paper] [Project Page]
arXiv 2024.10: [Paper] [Project Page]
arXiv 2024.10: [Paper] [Project Page]
arXiv 2024.9: [Paper] [Code]
arXiv 2024.9: [Paper]
arXiv 2024.9: [Paper] [Code]
arXiv 2024.9: [Paper]
arXiv 2024.9: [Paper]
arXiv 2024.8: [Paper]
arXiv 2024.8: [Paper]
arXiv 2024.7: [Paper] [Code]
arXiv 2024.7: [Paper]
arXiv 2024.6: [Paper]
arXiv 2024.6: [Paper] [Code]
arXiv 2024.6: [Paper] [Code]
arXiv 2024.6: [Paper] [Code]
arXiv 2024.6: [Paper] [Code]
arXiv 2024.5: [Paper] [Code]
arXiv 2024.5: [Paper] [Code]
arXiv 2024.5: [Paper] [Code]
arXiv 2024.5: [Paper] [Code]
arXiv 2024.4: [Paper] [Code]
arXiv 2024.3: [Paper] [Project]
arXiv 2024.3: [Paper] [Code]
ICRA 2023: [Paper] [Code]
arXiv 2023.12: [Paper] [Code]
arXiv 2023.11: [Paper]
arXiv 2023.11: [Paper]
arXiv 2023.9: [Paper]
arXiv 2023.9: [Paper]
arXiv 2023.8: [Paper] [Code]
NeurIPS 2022: [Paper] [Code]
NeurIPS 2022 Spotlight: [Paper] [Code]
ICRA 2022: [Paper]
IROS 2022: [Paper]
NeurIPS 2022 workshop: [Paper]
NVIDIA: [Paper] [Code]
[SMAC] Grounded Answers for Multi-agent Decision-making Problem through Generative World Model. NeurIPS 2024
[Paper]
[CoWorld] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. NeurIPS 2024
[Paper] [Website] [Torch Code]
[Diamond] Diffusion for World Modeling: Visual Details Matter in Atari. NeurIPS 2024
[Paper] [Code]
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. NeurIPS 2024
[Paper]
[MUN] Learning World Models for Unconstrained Goal Navigation. NeurIPS 2024
[Paper] [Code]
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. NeurIPS 2024
[Paper]
Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity. NeurIPSW 2024
[Paper]
Emergence of Implicit World Models from Mortal Agents. NeurIPSW 2024
[Paper]
Causal World Representation in the GPT Model. NeurIPSW 2024
[Paper]
PreLAR: World Model Pre-training with Learnable Action Representation. ECCV 2024
[Paper] [Code]
[CWM] Understanding Physical Dynamics with Counterfactual World Modeling. ECCV 2024
[Paper] [Code]
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. ECCV 2024
[Paper] [Code]
[DWL] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. RSS 2024 (Best Paper Award Finalist)
[Paper]
[LLM-Sim] Can Language Models Serve as Text-Based World Simulators? ACL 2024
[Paper] [Code]
RoboDreamer: Learning Compositional World Models for Robot Imagination. ICML 2024
[Paper] [Code]
[Δ-IRIS] Efficient World Models with Context-Aware Tokenization. ICML 2024
[Paper] [Code]
AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. ICML 2024
[Paper]
Hieros: Hierarchical Imagination on Structured State Space Sequence World Models. ICML 2024
[Paper]
[HRSSM] Learning Latent Dynamic Robust Representations for World Models. ICML 2024
[Paper] [Code]
HarmonyDream: Task Harmonization Inside World Models. ICML 2024
[Paper] [Code]
[REM] Improving Token-Based World Models with Parallel Observation Prediction. ICML 2024
[Paper] [Code]
Do Transformer World Models Give Better Policy Gradients? ICML 2024
[Paper]
TD-MPC2: Scalable, Robust World Models for Continuous Control. ICLR 2024
[Paper] [Torch Code]
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing. ICLR 2024
[Paper]
[R2I] Mastering Memory Tasks with World Models. ICLR 2024
[Paper] [JAX Code]
MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024
[Paper] [Code]
Multi-Task Interactive Robot Fleet Learning with Visual World Models. CoRL 2024
[Paper] [Code]
Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction. arXiv 2024.12
[Paper]
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination. arXiv 2024.12
[Paper] [Project]
Transformers Use Causal World Models in Maze-Solving Tasks. arXiv 2024.12
[Paper]
Owl-1: Omni World Model for Consistent Long Video Generation. arXiv 2024.12
[Paper] [Code]
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization. arXiv 2024.12
[Paper] [Code]
SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation. BNAIC 2024
[Paper]
Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm. arXiv 2024.12
[Paper]
Genie 2: A large-scale foundation world model. 2024.12
Google DeepMind
[Blog]
[NWM] Navigation World Models. arXiv 2024.12
Yann LeCun
[Paper] [Project]
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control. arXiv 2024.12
[Paper] [Project]
Motion Prompting: Controlling Video Generation with Motion Trajectories. arXiv 2024.12
[Paper] [Project]
Generative World Explorer. arXiv 2024.11
[Paper] [Project]
[WebDreamer] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. arXiv 2024.11
[Paper] [Code]
WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making. arXiv 2024.11
[Paper]
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. arXiv 2024.11
Yann LeCun
[Paper]
Scaling Laws for Pre-training Agents and World Models. arXiv 2024.11
[Paper]
[Phyworld] How Far is Video Generation from World Model: A Physical Law Perspective. arXiv 2024.11
[Paper] [Project]
IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI. arXiv 2024.10
[Paper] [Project]
EVA: An Embodied World Model for Future Video Anticipation. arXiv 2024.10
[Paper]
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning. arXiv 2024.10
[Paper]
[LLMCWM] Language Agents Meet Causality -- Bridging LLMs and Causal World Models. arXiv 2024.10
[Paper] [Code]
Reward-free World Models for Online Imitation Learning. arXiv 2024.10
[Paper]
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation. arXiv 2024.10
[Paper]
[GLIMO] Grounding Large Language Models In Embodied Environment With Imperfect World Models. arXiv 2024.10
[Paper]
AVID: Adapting Video Diffusion Models to World Models. arXiv 2024.10
[Paper] [Code]
[WMP] World Model-based Perception for Visual Legged Locomotion. arXiv 2024.9
[Paper] [Project]
[OSWM] One-shot World Models Using a Transformer Trained on a Synthetic Prior. arXiv 2024.9
[Paper]
R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models. arXiv 2024.9
[Paper]
Representing Positional Information in Generative World Models for Object Manipulation. arXiv 2024.9
[Paper]
Making Large Language Models into World Models with Precondition and Effect Knowledge. arXiv 2024.9
[Paper]
DexSim2Real²: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. arXiv 2024.9
[Paper]
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. arXiv 2024.8
[Paper]
[MoReFree] World Models Increase Autonomy in Reinforcement Learning. arXiv 2024.8
[Paper] [Project]
UrbanWorld: An Urban World Model for 3D City Generation. arXiv 2024.7
[Paper]
PWM: Policy Learning with Large World Models. arXiv 2024.7
[Paper] [Code]
Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling. arXiv 2024.7
[Paper]
[GenRL] Multimodal foundation world models for generalist embodied agents. arXiv 2024.6
[Paper] [Code]
[DLLM] World Models with Hints of Large Language Models for Goal Achieving. arXiv 2024.6
[Paper]
Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model. arXiv 2024.6
[Paper]
CityBench: Evaluating the Capabilities of Large Language Model as World Model. arXiv 2024.6
[Paper] [Code]
CoDreamer: Communication-Based Decentralised World Models. arXiv 2024.6
[Paper]
[EBWM] Cognitively Inspired Energy-Based World Models. arXiv 2024.6
[Paper]
Evaluating the World Model Implicit in a Generative Model. arXiv 2024.6
[Paper] [Code]
Transformers and Slot Encoding for Sample Efficient Physical World Modelling. arXiv 2024.5
[Paper] [Code]
[Puppeteer] Hierarchical World Models as Visual Whole-Body Humanoid Controllers. arXiv 2024.5
Yann LeCun
[Paper] [Code]
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. arXiv 2024.5
[Paper]
Pandora: Towards General World Model with Natural Language Actions and Video States.
[Paper] [Code]
[WKM] Agent Planning with World Knowledge Model. arXiv 2024.5
[Paper] [Code]
Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI
[Blog]
Compete and Compose: Learning Independent Mechanisms for Modular World Models. arXiv 2024.4
[Paper]
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. arXiv 2024.4
[Paper] [Code]
Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. arXiv 2024.3
[Paper] [Code]
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. arXiv 2024.3
[Paper] [Code]
V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI
Yann LeCun
[Blog] [Paper] [Code]
[IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI
[Paper]
Genie: Generative Interactive Environments. DeepMind
[Paper] [Blog]
[Sora] Video generation models as world simulators. OpenAI
[Technical report]
[LWM] World Model on Million-Length Video And Language With RingAttention. arXiv 2024.2
[Paper] [Code]
Planning with an Ensemble of World Models. OpenReview
[Paper]
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. arXiv 2024.1
[Paper] [Code]
ICLR 2023 Oral: [Paper] [Torch Code]
NeurIPS 2023: [Paper] [Torch Code]
ICLR 2023: [Paper] [Torch Code]
arXiv 2023.8: [Paper] [JAX Code]
arXiv 2023.1: [Paper] [JAX Code] [Torch Code]
ICML 2022: [Paper] [Torch Code]
ICML 2022: [Paper] [TF Code]
CoRL 2022: [Paper] [TF Code]
NeurIPS 2022: [Paper] [TF Code]
NeurIPS 2022 Spotlight: [Paper] [Torch Code]
arXiv 2022.3: [Paper]
ICLR 2021: [Paper] [TF Code] [Torch Code]
ICRA 2021: [Paper]
ICLR 2020: [Paper] [TF Code] [Torch Code]
ICML 2020: [Paper] [TF Code] [Torch Code]
NeurIPS 2018 Oral: [Paper]