I was looking into V-JEPA 2 (Meta, 2025) and related predictive representation approaches recently, and it got me wondering whether we are slowly circling back toward something more continuous-time and dynamical-systems flavored.

A lot of current world-model work focuses on predictive latent representations:

  • predict abstract structure
  • avoid pixel reconstruction
  • learn semantics through prediction

Which makes perfect sense.

But at the same time, many of these systems still evolve internally through fundamentally discrete transitions:

$$z_t \to z_{t+1}$$

Coming from the Neural ODE / continuous dynamics side of things, I keep wondering:

What happens if the latent world model itself is treated as a continuous dynamical system?

Not just "predict the next embedding," but learn a latent flow:

$$\frac{dz}{dt} = f(z, t)$$
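To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the vector field `f` is a hand-written 2-D rotation standing in for a learned network, and `rk4_step`, `flow`, and `discrete_step` are hypothetical helper names, not part of V-JEPA 2 or any library. The point is only that a latent flow can be queried at arbitrary times, while a discrete transition is tied to its step size.

```python
import math

def f(z, t):
    """Toy latent vector field: unit-speed rotation in a 2-D latent space.
    Stand-in (assumption) for a learned network f_theta(z, t)."""
    return [-z[1], z[0]]

def rk4_step(f, z, t, h):
    """One classical Runge-Kutta (RK4) step for dz/dt = f(z, t)."""
    k1 = f(z, t)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], t + 0.5 * h)
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], t + 0.5 * h)
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)], t + h)
    return [zi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def flow(f, z0, t0, t1, n_steps=100):
    """Integrate the latent state from t0 to t1; t1 can be any real time."""
    h = (t1 - t0) / n_steps
    z, t = list(z0), t0
    for _ in range(n_steps):
        z = rk4_step(f, z, t, h)
        t += h
    return z

def discrete_step(z):
    """Discrete counterpart: a fixed map z_t -> z_{t+1} for one unit of time."""
    c, s = math.cos(1.0), math.sin(1.0)
    return [c * z[0] - s * z[1], s * z[0] + c * z[1]]

z0 = [1.0, 0.0]
z_one = flow(f, z0, 0.0, 1.0)    # agrees with discrete_step(z0) up to solver error
z_frac = flow(f, z0, 0.0, 0.37)  # an off-grid time the unit-step map cannot reach directly
```

In a real model `f` would be a neural network and the integrator would likely come from an ODE solver library; the hand-rolled RK4 here just keeps the sketch self-contained.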

This starts to blur boundaries between:

  • predictive representation learning
  • continuous-time latent dynamics
  • neural differential equations
  • control theory
  • and eventually differentiable simulation

One interesting aspect is that JEPA-style objectives and Neural ODE-style dynamics are not competitors at all. They solve different problems:

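  • JEPA learns what matters (which features of the input are worth keeping in the latent space)
  • Neural ODEs learn how it evolves (the vector field that moves those latents through time)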

Combining the two feels surprisingly natural: semantic latent spaces with continuous trajectories instead of discrete jumps.
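One way to write the combination down, purely as a sketch: assume a context encoder $E_\theta$, a target encoder $E_{\bar\theta}$ with stop-gradient $\mathrm{sg}[\cdot]$ (as JEPA-style setups typically use), a latent vector field $f_\phi$, a time gap $\Delta$, and some distance $d$ in latent space. These symbols are my notation for illustration, not taken from any specific paper.

$$
z(t) = E_\theta(x_t), \qquad
\hat{z}(t+\Delta) = z(t) + \int_{t}^{t+\Delta} f_\phi\big(z(s), s\big)\, ds
$$

$$
\mathcal{L}(\theta, \phi) = d\Big(\hat{z}(t+\Delta),\ \mathrm{sg}\big[E_{\bar\theta}(x_{t+\Delta})\big]\Big)
$$

The discrete predictor is replaced by an integrated flow, while the loss still lives entirely in latent space, so no pixel reconstruction is involved. Because $\Delta$ is a real number, irregularly sampled frames could in principle share the same objective.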

It also raises interesting questions:

  • Should world models conserve structure?
  • Should latent flows obey geometric constraints?
  • Do continuous latent trajectories produce more stable long-horizon predictions?
  • Is "time" even best represented discretely in learned simulators?
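As a small illustration of why the structure and stability questions are linked, here is a toy sketch under heavy assumptions: the "latent" state is a pair `(q, p)` following a harmonic oscillator, which is a hand-picked stand-in and not a learned model, and all function names are hypothetical. It compares a plain explicit Euler rollout with a symplectic Euler rollout, which respects the Hamiltonian structure of the system.

```python
def energy(q, p):
    """Energy of the toy system H = (q^2 + p^2) / 2; exactly conserved by the true flow."""
    return 0.5 * (q * q + p * p)

def explicit_euler(q, p, h):
    """Structure-agnostic step: both updates use the old state."""
    return q + h * p, p - h * q

def symplectic_euler(q, p, h):
    """Structure-preserving step: update p first, then q with the new p."""
    p_new = p - h * q
    q_new = q + h * p_new
    return q_new, p_new

def rollout(step, q, p, h, n):
    """Apply the given one-step map n times."""
    for _ in range(n):
        q, p = step(q, p, h)
    return q, p

h, n = 0.1, 1000
e0 = energy(1.0, 0.0)
e_euler = energy(*rollout(explicit_euler, 1.0, 0.0, h, n))
e_symp = energy(*rollout(symplectic_euler, 1.0, 0.0, h, n))
print(e0, e_euler, e_symp)
```

With this step size the explicit Euler energy grows by a factor of (1 + h^2) every step and blows up over the rollout, while the symplectic variant stays close to its initial energy. Whether learned latent spaces admit this kind of structure is exactly the open question; the sketch only shows what is at stake for long horizons.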

Feels like there is still a lot of unexplored territory between self-supervised predictive learning and continuous dynamical systems.

Related work from my side