"We are not ready to talk about that. I mean, we work on all kinds of research. We have said for a while that we think better reasoning in these systems is an important direction that we'd like to pursue. We haven't cracked the code yet. We're very interested in it."
—Sam Altman on the Lex Fridman podcast re: Q*
A recent Reuters article confirmed the existence of a project code-named “Strawberry”, formerly known as Q*.
For anyone who has been following the AI ecosystem for a while, you may already be familiar with Q*.
It’s the same initiative that was most likely behind the November 2023 attempted ousting of Sam Altman by members of the OpenAI Board.
I did an entire write-up on what we knew about Q* then, which you can find → here.
And I covered the entire OpenAI Board saga as it happened, which can be found → here.
So what do we know about Q*?
How has it changed since becoming “Strawberry”?
And what is so potentially dangerous about it that it would force longtime friend, collaborator, and Chief Scientist Ilya Sutskever to pull the ripcord on the entire project and help lead a coup to oust CEO Sam Altman?
Call me Q
OpenAI's Q* takes its name from Q-learning, a seminal reinforcement learning algorithm.
It's used to determine the "value" of a particular state in an environment, given a certain action taken by the agent. The equation looks like this:
Q(s, a) = r(s, a) + γ · max_a′ Q(s′, a′)
The optimal version of this value function is written Q*, hence the shorthand. The equation lets an agent learn which actions are likely to lead to higher rewards over time: the model is penalized for bad outcomes and rewarded for good ones, steering it toward better (more user-aligned) behavior.
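If you want to see that equation doing something, here is a minimal sketch of tabular Q-learning on a made-up five-state “chain” environment. The environment, rewards, and hyperparameters are my own toy example for illustration, not anything from OpenAI:

```python
import random

# Toy chain environment: states 0..4, action 0 moves left, action 1 moves right.
# Reaching state 4 yields a reward of 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1   # discount, learning rate, exploration rate

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # the Q(s, a) table

def step(state, action):
    """Return (next_state, reward, done) for the toy chain environment."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, occasionally explore.
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = Q[state].index(max(Q[state]))
        next_state, reward, done = step(state, action)
        # The update from the equation above:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

print(Q)   # right-moving actions end up with higher estimated value
```

Run it and the table converges so that “move right” carries more value than “move left” in every state, which is exactly the “learn what leads to reward” behavior described above.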
So why did people freak out back in November 2023?
It was speculated that Q* would be able to do complex math at an alarmingly fast rate. And if an AI model could get that good at math, it might be able to break the encryption most enterprises use to safeguard their data.
This seems to be the most plausible explanation as to why Ilya pulled the ripcord on the project, and formed a coalition with non-profit board members Helen Toner and Tasha McCauley. Though bear in mind, this is just my speculative opinion.
Toner and McCauley had their own reasons for wanting Sama ousted. Officially he was “not sufficiently candid” with them, which was the reason they gave for firing him. Unofficially, he was telling them to “fuck off and let me do my job” and they didn’t like that.
Toner in particular did not vibe well with Altman, even going so far as to co-write and publish a paper intended for policymakers, which discussed approaches to AI safety signaling by different companies.
The paper highlighted Anthropic's approach, noting that their "desire to be perceived as a company that values safety shines through across its communications" and it also mentioned criticisms of OpenAI, related to the launches of ChatGPT and GPT-4.
When you come at the king, you best not miss, because Toner and McCauley got the boot, Sama was reinstated as CEO, and Ilya recently left OpenAI to start his own “responsible AI company.” Which brings us to:
Strawberry
So what do we know about code name “strawberry”?
Advanced reasoning capabilities:
Strawberry aims to significantly enhance AI models' ability to solve complex problems through improved planning and decision-making.
This involves enabling AI to think ahead and execute multi-step tasks over extended periods, similar to human multi-step reasoning.
The goal is to move beyond purely predictive generation and instead allow AI to autonomously navigate information and perform deep research.
Basically, the goal is to allow the AI to form mental models, then think through how to solve problems on its own, rather than relying on rote memorization of simulations in its training data.
It’s the difference between understanding why something works, versus just knowing you’re supposed to act a certain way in a certain situation because someone told you to.
Autonomous research
A core focus of Strawberry is to enable AI models to independently browse the internet and conduct what OpenAI terms "deep research". This is one of Perplexity’s main focuses right now with its “Pro” search.
This capability would allow AI to gather information, analyze it, and draw conclusions without constant human guidance. It would represent a significant leap forward in AI autonomy.
It’s the difference between an individual contributor who will do what you ask them to do, and a Program Manager who understands what the overarching goal is, and may make creative decisions on how best to get you there.
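Nothing about Strawberry’s internals is public, but the general shape of an autonomous research loop can be sketched: plan, gather, analyze, conclude. Every function below is a stand-in I’ve invented for illustration; a real agent would call a search API and a language model at each step, and none of this reflects OpenAI’s or Perplexity’s actual implementation:

```python
from dataclasses import dataclass, field

# Hypothetical shape of an autonomous "deep research" loop: plan, gather, analyze, conclude.
# All methods here are stubs so the example runs end to end.

@dataclass
class ResearchAgent:
    question: str
    notes: list[str] = field(default_factory=list)

    def plan(self) -> list[str]:
        # A real agent would ask a model to decompose the question into sub-queries.
        return [f"background: {self.question}", f"recent developments: {self.question}"]

    def gather(self, query: str) -> str:
        # Stand-in for web search plus reading and summarizing the results.
        return f"(stub) summarized findings for '{query}'"

    def conclude(self) -> str:
        # Stand-in for a model call that weighs the accumulated notes into an answer.
        return "conclusion drawn from: " + " | ".join(self.notes)

    def run(self) -> str:
        for query in self.plan():              # multi-step plan, executed without a human in the loop
            self.notes.append(self.gather(query))
        return self.conclude()

print(ResearchAgent("long-horizon reasoning in LLMs").run())
```

The point of the sketch is the control flow: the agent decides what to look up, works through the steps itself, and only comes back to you with the synthesized result, program-manager style.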
Post-training approach
Strawberry utilizes a specialized post-training method to refine AI models after their initial training on large datasets.
This approach allows for targeted improvements in specific areas like reasoning and planning.
While the exact details are kept secret, it likely involves techniques such as fine-tuning and potentially the creation of self-generated training data to bootstrap the model's capabilities.
Similarities to STaR
The project reportedly shares similarities with the "Self-Taught Reasoner" (STaR) method developed at Stanford University in 2022.
STaR enables AI models to iteratively create their own training data, allowing them to "bootstrap" themselves to higher levels of intelligence.
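The STaR recipe itself is public (Zelikman et al., 2022), so the loop can be sketched: sample a rationale and an answer, keep only the rationales that lead to correct answers, fine-tune on them, and repeat. Everything below is a toy stand-in for those steps, not an actual training pipeline:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the STaR (Self-Taught Reasoner) loop from Zelikman et al., 2022.
# Model calls and fine-tuning are stubs so the loop runs end to end; in the real
# method each would involve an actual language model.

@dataclass
class Problem:
    question: str
    gold_answer: str

def generate_rationale(question: str, hint: Optional[str] = None) -> tuple[str, str]:
    """Stand-in for sampling a chain-of-thought rationale and a final answer."""
    answer = hint if hint is not None else "unknown"
    return f"step-by-step reasoning about '{question}'", answer

def finetune(examples: list[tuple[str, str, str]]) -> None:
    """Stand-in for fine-tuning on (question, rationale, answer) triples."""
    print(f"fine-tuning on {len(examples)} self-generated, verified examples")

def star_round(problems: list[Problem]) -> None:
    kept = []
    for p in problems:
        rationale, answer = generate_rationale(p.question)
        if answer != p.gold_answer:
            # "Rationalization": retry with the correct answer given as a hint, so the
            # model still produces a usable rationale for problems it initially misses.
            rationale, answer = generate_rationale(p.question, hint=p.gold_answer)
        if answer == p.gold_answer:
            kept.append((p.question, rationale, answer))   # keep only verified reasoning traces
    finetune(kept)   # run this round repeatedly: each pass bootstraps the next

star_round([Problem("What is 17 * 24?", "408")])
```

The “bootstrapping” is that last comment: each round of fine-tuning makes the model better at producing correct rationales, which in turn produces better training data for the next round.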
This approach could potentially lead to AI models surpassing human-level intelligence in certain domains, and it caused hysteria back in November 2023, when people claimed it would allow the model to “auto-regressively” bootstrap itself to superintelligence, which could lead to an “extinction level” event.
This is, of course, a histrionic argument that is not based in fact, and it has been debunked by Sam Altman and Yann LeCun, among others, who both agree that “AGI” and “ASI” will be an iterative process, not a single moment in time.
Q Branch
While some demonstrations of advanced reasoning capabilities have been shown internally, it's unclear whether these were specifically part of the Strawberry project or related initiatives.
OpenAI is maintaining strict secrecy around the Strawberry project, and this level of confidentiality suggests that the company considers the technology to be highly sensitive and potentially groundbreaking.
With all the secrecy, OpenAI really missed out on the opportunity to call this department Q Branch 😂.
In the long term, Q*, Strawberry, and projects like them appear to be just another stepping stone toward "AGI”, a technology that could serve as the Steam Engine of the Mind.
It could help us cure disease through Gene Therapy, increase productivity to a hitherto unforeseen level, and act as a deflationary lever to our massively inflated fiat currencies, as well as help us in our quest to become a multi-planetary species.
Avanti!