I agree with this prediction
Thank you Meng! 🤝
You have to embed AI in artificial worlds. Constructing training worlds for AI is the only way to arrive at AGI.
https://forum.effectivealtruism.org/posts/uHeeE5d96TKowTzjA/world-and-mind-in-artificial-intelligence-arguments-against
Do you think we would have to embody AI, as OpenAI is doing with Figure? Or just create realistic worlds using something like Unreal Engine to simulate physics and let it play around in there?
The cheap solution is to create virtual realities, which will probably be valuable in themselves and allow for ultra-fast training.
That’s fascinating, and I think it has a lot of potential.
In David Silver's course on reinforcement learning there is an autonomous helicopter flying; they probably crashed thousands of virtual helicopters until the machine learned to pilot the real one.
https://m.youtube.com/watch?v=2pWv7GOvuf0
Minute 15
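To make that concrete, here's a toy sketch of the simulation-first idea; the HelicopterSim environment and the random-search "learning" below are made up purely for illustration, not how the actual lecture example works:

```python
import random

class HelicopterSim:
    """Hypothetical toy simulator: crashing here is free, unlike in real life."""

    def reset(self):
        self.altitude = 10.0
        return self.altitude

    def step(self, action):
        # The action nudges altitude; drifting too far from 10 m counts as a crash.
        self.altitude += action + random.uniform(-0.5, 0.5)
        crashed = self.altitude <= 0.0 or self.altitude >= 20.0
        reward = -1.0 if crashed else 1.0 - abs(self.altitude - 10.0) / 10.0
        return self.altitude, reward, crashed


def run_episode(env, policy):
    obs, total = env.reset(), 0.0
    for _ in range(200):
        obs, reward, crashed = env.step(policy(obs))
        total += reward
        if crashed:
            break
    return total


# Naive "learning": try many random hover controllers in simulation and keep
# the one that crashes least -- thousands of virtual crashes, zero real ones.
best_gain, best_score = None, float("-inf")
for _ in range(10_000):
    gain = random.uniform(0.0, 1.0)
    policy = lambda altitude, g=gain: g * (10.0 - altitude)
    score = run_episode(HelicopterSim(), policy)
    if score > best_score:
        best_gain, best_score = gain, score

print(f"best gain {best_gain:.2f}, best simulated return {best_score:.1f}")
```

The real course uses proper RL rather than this brute-force search, but the shape is the same: all the expensive failures happen inside the simulator.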
Super cool: my friend does this kind of thing. It allows you to simulate whatever you want, and when you think you have a winner, create a real-life MVP.
Hi Matthew,
Well written, thanks. Your insights (especially the robotics part, which I hadn't noticed or thought about) on JEPA were quite helpful.
I agree with everything you said.
I think JEPA, diffusion, and LLMs (and reinforcement learning) together can do great things and potentially bring us closer to AGI.
It seems we both agree on this, and I think I was partly inspired by your article in the comment you read earlier that I posted (in the sense that not many people are talking about JEPA yet). I've been thinking a lot about Yann's and Meta's ideas, and seeing this article a few days ago made me realize that other people see things this way, too (which is helpful because it's kind of positive reinforcement, assuming we're both right).
I like how JEPA can be used to predict high-level representations or features given a constrained framework. Predicting in this way (by adding context blocks and decoding and so on) is quite useful and starts to give some ideas of what "intuition" and "world models" feel like.
We still need more precise ways to solve problems like hierarchical planning, and it's interesting that you suggest the interplay between JEPA and LLMs as a possible approach. That makes a lot of sense and is intuitive. If you tell a model that you need to go to France or something, the model can use "contexts" of travel to abstract or decode from a latent representation space the steps you need to take to do that, such as driving a car. It's also interesting how you propose the interplay between JEPA, LLMs, and reinforcement learning to address problems of agency, which makes perfect sense because if you couple having a goal and exploring or exploiting the environment with advanced planning to realize such a goal, you start to realize some of the characteristic properties of having agency.
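To make that interplay a little more concrete, here's a rough sketch of what I'm imagining for the France example; the LLM decomposer and the JEPA-style encoder/predictor below are invented stand-ins for illustration, not real APIs:

```python
from dataclasses import dataclass

# Everything below is a stand-in for illustration, not a real model or API.

def llm_decompose(goal: str) -> list[str]:
    """Stand-in for an LLM proposing high-level subgoals for a goal."""
    return ["book a flight to Paris", "drive to the airport", "board the plane"]

@dataclass
class JEPAStyleWorldModel:
    """Stand-in for a JEPA-like model: encode observations, predict next latents."""

    def encode(self, observation: str) -> list[float]:
        return [float(len(observation))]            # toy latent representation

    def predict(self, latent: list[float], action: str) -> list[float]:
        return [latent[0] + 0.1 * len(action)]      # toy latent rollout

def closeness(predicted: list[float], target: list[float]) -> float:
    """Negative latent-space distance: higher means 'nearer to the subgoal'."""
    return -abs(predicted[0] - target[0])

def plan(goal: str, observation: str, candidate_actions: list[str]) -> list[str]:
    world = JEPAStyleWorldModel()
    steps = []
    for subgoal in llm_decompose(goal):             # the LLM handles the hierarchy
        target = world.encode(subgoal)
        latent = world.encode(observation)
        # Choose the action whose predicted latent lands nearest the subgoal,
        # i.e. plan by searching in representation space rather than in pixels.
        best = max(candidate_actions,
                   key=lambda a: closeness(world.predict(latent, a), target))
        steps.append(f"{subgoal} -> {best}")
    return steps

print(plan("go to France", "at home", ["walk", "drive the car", "call a taxi"]))
```

The point is just the shape of the loop: the LLM supplies the hierarchy, the JEPA-style model scores candidate steps in latent space, and an RL-trained policy would eventually replace the brute-force argmax over actions.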
I think re-reading your article will help me refine some of the things I wrote in the note (e.g., JEPA itself can be seen as an inner instance of the world model, which is how I described it, but I could be more explicit in explaining its mechanism; interestingly, I just had a "déjà vu" feeling while writing this, weird lol), but I think we are onto something. JEPA as a wrapper or "top" or "outer" layer in some sense could do great things. Implementations are hard, but high-level systems thinking is a first step.
Great explanatory piece from Matthew. I do have a question: do you all think we have entered a new chapter of human progress?
We had the first industrial revolution (1750), followed by the second (1870), followed by the IT revolution (1960).
Does 2020 mark the beginning of an AI revolution that is separate and distinct from the IT revolution?
I think so. The future is already here; it’s just not evenly distributed yet. And it will become increasingly obvious once we get agentic AI.
Thanks for your well thought out response!
Completely agree. I’ve always thought that different queries should ping different parts of the model / architecture (rough sketch of the routing below the examples).
For example, the query “is the sky blue” ideally would only fire up a small language model, since it’s not very complex and would require less compute and energy.
For a query about protein synthesis, fire up the large language model since it’s a more complex prompt.
For image / video generation, use diffusion with reinforcement learning (Sora architecture).
And for agency use JEPA.
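Here’s a minimal sketch of what that kind of router could look like; the complexity score and the placeholder back-ends are made up for illustration (none of them are real APIs):

```python
from typing import Callable

# Placeholder back-ends; in a real system these would call actual models.
def small_lm(query: str) -> str:    return f"[small LM]   {query}"
def large_lm(query: str) -> str:    return f"[large LM]   {query}"
def diffusion(query: str) -> str:   return f"[diffusion]  {query}"
def jepa_agent(query: str) -> str:  return f"[JEPA agent] {query}"

def complexity(query: str) -> float:
    """Crude stand-in for a learned difficulty estimator."""
    return len(query.split()) / 20.0

def route(query: str) -> Callable[[str], str]:
    q = query.lower()
    if any(word in q for word in ("image", "video", "draw", "animate")):
        return diffusion                  # generative media -> diffusion model
    if any(word in q for word in ("plan", "book", "schedule")):
        return jepa_agent                 # agentic tasks -> JEPA-style agent
    # Cheap model for simple prompts, large model for complex ones.
    return small_lm if complexity(query) < 0.5 else large_lm

for query in ("Is the sky blue?",
              "Walk me through every stage of eukaryotic protein synthesis, "
              "including transcription, splicing, and translation.",
              "Generate a short video of a cat surfing",
              "Plan and book my trip to France"):
    print(route(query)(query))
```

In practice the routing itself would probably be learned rather than keyword-based, but the payoff is the same: cheap queries never touch the expensive models.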
I don’t think many people grasp the implications of OpenAI’s new omni model. It combines many of the modalities we’ve been talking about into a single interface.
We’re still at the foundational layer and most of the applications have yet to be built.
Think the future is super bright and I’m so glad we can both be a part of it!