Overview
The goal of this workshop is to exchange ideas and establish
communication among researchers working on building generalizable
world models that describe how the physical world evolves in
response to interacting agents (e.g., humans and robots).
Large-scale datasets of videos, images, and text hold the key to
learning generalizable world models that are visually plausible.
However, distilling useful physical information from such diverse,
unstructured data is challenging: it requires careful data
curation, scalable algorithms, and suitable training curricula. On
the other hand, physics-based priors can enable learning plausible
scene dynamics, but they are difficult to scale to complex
phenomena that lack efficient solvers or even known governing
equations.
Developing general world models that can simulate complex
real-world phenomena in a physically plausible fashion would
unlock enormous opportunities in generative modeling and robotics,
and would be of wide interest to the larger AI community. We
believe this workshop comes at an ideal time, given recent
significant progress in both video modeling and physics-based
simulation. This workshop
aims to bring together researchers in machine learning, robotics,
physics-based simulation, and computer vision who broadly aspire
to build scalable world models by utilizing internet data,
simulation, and beyond in myriad ways.