13 Comments

I agree with this prediction

Thank you Meng! 🤝

You have to embed AI in artificial worlds. Constructing training worlds for AI is the only way to arrive at AGI.

https://forum.effectivealtruism.org/posts/uHeeE5d96TKowTzjA/world-and-mind-in-artificial-intelligence-arguments-against

Do you think we would have to embody AI, as OpenAI is doing with Figure? Or just create realistic worlds using something like Unreal Engine to simulate physics and let it play around in there?

The cheap solution is to create virtual realities, which will probably be valuable in themselves and allow for ultra-fast training.

That’s fascinating, and I think it has a lot of potential.

In David Silver's course on reinforcement learning, there is an autonomous helicopter flying; they probably crashed thousands of virtual helicopters until the machine learned to pilot the real one.

https://m.youtube.com/watch?v=2pWv7GOvuf0

Minute 15
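
For anyone curious what "crash thousands of virtual helicopters until it learns" looks like in code, here is a minimal sketch of sim-first reinforcement learning. Gymnasium's Pendulum-v1 environment and Stable-Baselines3's SAC are stand-ins I'm assuming for illustration; the actual helicopter project used its own simulator and algorithms.

```python
# Minimal sketch of sim-first RL: every "crash" is just an episode reset
# inside the simulator, so failure is cheap and training can run ultra fast.
# Pendulum-v1 and SAC are illustrative stand-ins, not what the helicopter
# project actually used.
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")          # stand-in for a helicopter simulator
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=100_000)   # thousands of simulated episodes

# Only after simulated training would you risk the policy on real hardware.
obs, _ = env.reset()
for _ in range(200):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```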

Super cool: my friend does this kind of thing. It allows you to simulate whatever you want, and when you think you have a winner, create a real-life MVP.

Hi Matthew,

Well written, thanks. Your insights on JEPA (especially the robotics part, which I hadn't noticed or thought about) were quite helpful.

I agree with everything you said.

I think JEPA, diffusion, and LLMs (and reinforcement learning) together can do great things and potentially bring us closer to AGI.

It seems we both agree on this, and the comment of mine you read earlier was partly inspired by your article (in the sense that not many people are talking about JEPA yet). I've been thinking a lot about Yann's and Meta's ideas, and seeing this article a few days ago made me realize that other people see things this way too (which is helpful, since it's a kind of positive reinforcement, assuming we're both right).

I like how JEPA can be used to predict high-level representations or features within a constrained framework. Predicting in this way (by adding context blocks, decoding, and so on) is quite useful and starts to give a sense of what "intuition" and "world models" feel like.
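
To make that concrete, here is a toy sketch of the JEPA objective as I understand it, with the loss computed in representation space rather than pixel space. The linear encoders and shapes are simplifying assumptions; real I-JEPA uses ViT encoders, block masking, and an EMA-updated target encoder.

```python
# Toy sketch of the JEPA idea: predict target-block representations from
# context-block representations, with the loss in latent space, not pixels.
# Linear layers are simplifying assumptions standing in for ViT encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256
context_encoder = nn.Linear(dim, dim)    # encodes the visible context blocks
target_encoder = nn.Linear(dim, dim)     # in I-JEPA: an EMA copy, no backprop
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

context_blocks = torch.randn(32, dim)    # visible patches
target_blocks = torch.randn(32, dim)     # masked patches to be predicted

s_ctx = context_encoder(context_blocks)
with torch.no_grad():                    # no gradients through the targets
    s_tgt = target_encoder(target_blocks)

# Predict high-level features of the masked region; compare in latent space.
loss = F.mse_loss(predictor(s_ctx), s_tgt)
loss.backward()
```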

We still need more precise ways to solve problems like hierarchical planning, and it's interesting that you suggest the interplay between JEPA and LLMs as a possible approach; that makes a lot of sense and is intuitive. If you tell a model you need to go to France, it can use travel "contexts" to decode from a latent representation space the steps required, such as driving a car. It's also interesting that you propose combining JEPA, LLMs, and reinforcement learning to address problems of agency, which makes perfect sense: if you couple having a goal, and exploring or exploiting the environment, with advanced planning to realize that goal, you start to realize some of the characteristic properties of agency.
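
A conceptual sketch of that interplay might look like the following. Everything here is hypothetical scaffolding: llm_propose_subgoals and latent_rollout are placeholder functions, not real APIs. The LLM decomposes an abstract goal into candidate subgoal sequences, and a JEPA-style world model scores each candidate by rolling it out in latent space.

```python
# Hypothetical sketch of LLM + world-model hierarchical planning.
# llm_propose_subgoals and latent_rollout are placeholders, not real APIs.
def plan(goal, state, llm_propose_subgoals, latent_rollout, depth=2):
    # The LLM turns "go to France" into candidate subgoal sequences,
    # e.g. ["drive to the airport", "board the flight", ...].
    best_plan, best_score = None, float("-inf")
    for subgoals in llm_propose_subgoals(goal, state, n_candidates=5):
        score = latent_rollout(state, subgoals)  # predicted outcome quality
        if score > best_score:
            best_plan, best_score = subgoals, score
    if best_plan is None or depth <= 1:
        return best_plan
    # Recurse: each abstract subgoal becomes a concrete goal one level down.
    return [plan(g, state, llm_propose_subgoals, latent_rollout, depth - 1)
            for g in best_plan]
```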

I think re-reading your article will help me refine some of the things I wrote in my note (e.g., JEPA itself can be seen as an inner instance of the world model, which is how I described it, but I could be more explicit in explaining its mechanism; interestingly, I just had a déjà vu feeling while writing this, weird lol). I think we are onto something: JEPA as a wrapper, a "top" or "outer" layer in some sense, could do great things. Implementations are hard, but high-level systems thinking is a first step.

Great explanatory piece from Matthew. I do have a question: do you all think we have entered a new chapter of human progress?

We had the first industrial revolution (1750), followed by the second (1870), followed by the IT revolution (1960).

Does 2020 mark the beginning of an AI revolution that is separate and distinct from the IT revolution?

I think so. The future is already here; it's just not evenly distributed yet. And it will become increasingly obvious once we get agentic AI.

Thanks for your well-thought-out response!

Completely agree. I’ve always thought that different queries should ping different parts of the model or architecture; a rough sketch of this routing idea follows the examples below.

For example, the query “is the sky blue” would ideally fire up only a small language model, since it’s not very complex and would require less compute and energy.

For a query about protein synthesis, fire up the large language model, since it’s a more complex prompt.

For image/video generation, use diffusion with reinforcement learning (Sora’s architecture).

And for agency, use JEPA.
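
To illustrate the idea, here is a hedged sketch of such a router: classify each query, then dispatch it to the cheapest backend that can handle it. The classifier heuristics and backend names are my own illustrative assumptions, not any product's real API.

```python
# Illustrative sketch of query routing: send each prompt to the cheapest
# backend that can handle it. Heuristics and backend names are assumptions.
def classify(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("image", "video", "draw")):
        return "diffusion"
    if any(w in q for w in ("book", "plan", "schedule")):
        return "agent"                        # agentic / JEPA-style backend
    return "small_llm" if len(q.split()) < 12 else "large_llm"

BACKENDS = {
    "small_llm": lambda q: f"[small LM] {q}",       # cheap, low latency
    "large_llm": lambda q: f"[large LM] {q}",       # costly, higher quality
    "diffusion": lambda q: f"[diffusion model] {q}",
    "agent":     lambda q: f"[planning agent] {q}",
}

def route(query: str) -> str:
    return BACKENDS[classify(query)](query)

print(route("Is the sky blue?"))                    # simple -> small model
print(route("Walk me through how ribosomes and transfer RNA coordinate "
            "during the elongation phase of protein synthesis."))  # -> large
```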

I don’t think many people grasp the implications of OpenAI’s new omni model. It combines many of the modalities we’ve been talking about into a single interface.

We’re still at the foundational layer and most of the applications have yet to be built.

I think the future is super bright, and I’m so glad we can both be a part of it!
