Q* MAY have made a breakthrough in two more intelligence domains (reasoning and planning). That shouldn't surprise you; it will eventually happen.
We need to start aligning the educational system with a world where humans live with machines that have intelligence capabilities that approximate their own.
TL;DR
*AIs can already engage in natural language conversation and are rapidly accumulating knowledge that exceeds what any individual human can hold, and the accuracy of that knowledge continues to improve.
*Q* (and similar projects; there are many) *may* mean AIs will have the ability (when scaled) to engage in significant reasoning and planning, leading to abilities to undertake complex math, develop science, and take over more “human” tasks at work.
*This is all happening much faster than anticipated.
*Today’s freshmen *may* graduate into a world where AIs have at least similar intelligence abilities to humans. Today’s 1st graders *probably* will.
*Efforts need to be made to align the educational system with a world where machines will have intelligence capabilities similar to those of humans.
Introduction
In the last couple of days, there has been a lot of drama around Q*, a supposed (there is no confirmation) research project at OpenAI (the maker of ChatGPT) that is getting us closer to Artificial General Intelligence (AGI) [explanation of AGI here] and that may have (again, this is speculation, and it has been denied) contributed to the split on the OpenAI board over whether AI was advancing too quickly and needed to be slowed down.
Project Q* is built on the principles of Q-learning. You can imagine Q-learning as a game where the AI learns the best moves to make. The AI tries different moves to see which ones give the best results or rewards. It keeps track of these results using something called a Q-function. The AI uses this function to guess which move will give the best outcome in the future. As the AI tries more and more moves, it gets better at predicting and making better decisions.
This is fundamentally different from how today’s large language models (LLMs) that drive ChatGPT and other AIs operate. Imagine Q-learning as teaching a robot to navigate a maze; it moves around, bumps into walls, and eventually figures out the best path to the exit. It keeps track of the successful routes using a virtual scorecard, which helps it remember which actions are worth repeating. This method is goal-oriented and interactive, constantly updating its strategy based on new experiences. Based on its learning, it undertakes the actions needed to achieve the goal, and its memory is cumulative, giving it the ability to improve on itself.
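For readers who want to see the mechanics, here is a minimal, purely illustrative Q-learning sketch in Python. The toy maze, reward values, and hyperparameters are all invented for illustration; this is the textbook technique, not anything from Q* itself.

```python
import random
from collections import defaultdict

# Toy maze: 6 positions in a row; the agent starts at 0 and the exit is at 5.
# Actions: 0 = move left, 1 = move right. Reaching the exit gives a reward of 1.
N_STATES, EXIT = 6, 5
ACTIONS = [0, 1]

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate
Q = defaultdict(float)                  # the Q-function: (state, action) -> estimated value

def step(state, action):
    """Move left or right; the episode ends (reward 1) when the exit is reached."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = next_state == EXIT
    return next_state, (1.0 if done else 0.0), done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known move, occasionally explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted best future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy is simply "always move right".
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

The entire "intelligence" here lives in the Q-table built up from trial, error, and reward. That goal-and-feedback loop is the part today’s LLMs lack.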
In contrast to this kind of goal-directed learning, large language models like GPT-4 and others (there are currently more than 400,000 language models on GitHub) are akin to an encyclopedia that learned to talk. These models are fed a vast amount of text, everything from novels to newspapers, and use this information to learn patterns in language. When you ask a question or request a text, the model predicts the best response based on what it has read, without learning from interaction or aiming for a reward. It doesn't have a goal beyond generating coherent and contextually appropriate language, and it doesn't learn from its 'experiences' the way a Q-learning AI does. This is why it’s often called a really smart stochastic parrot, mimicking human-like responses based on the massive amount of text it has been exposed to. It arguably has limited reasoning abilities (“a little bit,” per Altman; chain-of-thought prompting), but not the abstract reasoning abilities that would enable it to make scientific discoveries.
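To make the contrast concrete, here is an equally toy sketch of what a language model's core loop amounts to: predicting the next word from patterns in text it has seen. The bigram "model" below is a deliberately crude stand-in, not how GPT-4 works internally, but the basic operation (pattern-based next-word prediction, with no goal and no reward) is the same in spirit.

```python
from collections import Counter, defaultdict

# Toy "language model": bigram counts over a tiny corpus. Real LLMs use neural
# networks trained on billions of documents, but the core operation is the same:
# given the words so far, predict a likely next word. No goal, no reward.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt_word, length=6):
    """Greedily pick the most frequent continuation, one word at a time."""
    out = [prompt_word]
    for _ in range(length):
        counts = bigrams.get(out[-1])
        if not counts:
            break
        out.append(counts.most_common(1)[0][0])  # pure pattern-matching, no planning
    return " ".join(out)

print(generate("the"))  # prints a continuation stitched together purely from patterns in the corpus
```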
Why is Q so significant?
“Q” is not some radical new project or approach to AI, and projects like it have been under development at many companies for years.
Yann LeCun makes this point and adds more here — https://www.linkedin.com/posts/yann-lecun_please-ignore-the-deluge-of-complete-nonsense-activity-7133900073117061121-tTmG?utm_source=share&utm_medium=member_desktop
For example, Noam Brown created an AI system that out-negotiated humans in Diplomacy, one of the most significant AI accomplishments ever.
OpenAI has been talking about this since 2016.
What could be significant is if there were an actual breakthrough (this link also explains the significance of the development, if it happened). It has been reported that, so far, Q* has learned to solve basic, grade-school math problems using the learned Q* function rather than something in its training data, which represents not simply a scaling of models with more data, but the potential to develop models that can engage in advanced reasoning and planning without such data.
Multiple experts explain:
"If it has the ability to logically reason and reason about abstract concepts, which right now is what it really struggles with, that's a pretty tremendous leap," said Charles Higgins, a cofounder of the AI-training startup Tromero who's also a Ph.D. candidate in AI safety.
He added, "Maths is about symbolically reasoning — saying, for example, 'If X is bigger than Y and Y is bigger than Z, then X is bigger than Z.' Language models traditionally really struggle at that because they don't logically reason, they just have what are effectively intuitions."
Sophia Kalanovska, a fellow Tromero cofounder and Ph.D. candidate, told BI that Q*'s name implied it was a combination of two well-known AI techniques, Q-learning and A* search.
She said this suggested the new model could combine the deep-learning techniques that power ChatGPT with rules programmed by humans. It's an approach that could help fix the chatbot's hallucination problem.
"I think it's symbolically very important. On a practical level, I don't think it's going to end the world," Kalanovska said.
"I think the reason why people believe that Q* is going to lead to AGI is because, from what we've heard so far, it seems like it will combine the two sides of the brain and be capable of knowing some things out of experience, while still being able to reason about facts," she added, referring to artificial general intelligence.
"That is definitely a step closer to what we consider intelligence, and it is possible that it leads to the model being able to have new ideas, which is not the case with ChatGPT."
The inability to reason and develop new ideas, rather than just regurgitating information from within their training data, is seen as a huge limitation of existing models, even by the people building them.
Andrew Rogoyski, a director at the Surrey Institute for People-Centered AI, told BI that solving unseen problems was a key step toward creating AGI.
"In the case of math, we know existing AIs have been shown to be capable of undergraduate-level math but to struggle with anything more advanced," he said.
"However, if an AI can solve new, unseen problems, not just regurgitate or reshape existing knowledge, then this would be a big deal, even if the math is relatively simple," he added.
On LinkedIn, Barry Scannell and Connor Grennan offered similar explanations. Scannell also linked this relevant paper.
To reach AGI, scientists believe, computers need to evolve beyond merely repeating learned information, as language models do. The ability to reason and plan would let AI approach a wider range of complex tasks more effectively, bringing us closer to human-level intelligence.
What else does it mean for work?
Artificial intelligence that can eventually tackle complex mathematical problems and reason and plan holds the potential to redefine numerous professional fields that hinge on analytical and problem-solving skills.
Just Math
Financial analysts and economists, for instance, might see parts of their roles automated as AI becomes capable of sifting through vast amounts of financial and economic data to predict market trends and inform policy decisions. Similarly, research scientists, who spend considerable time analyzing experimental data, could benefit from AI that automates data processing and hypothesis generation, potentially speeding up scientific discovery.
Operations research analysts and quantitative analysts—especially those in finance known as "quants"—could find AI handling the intricate models and simulations they currently manage. This could lead to more efficient decision-making and market analysis. Strategic planners and supply chain analysts might also leverage AI to streamline their processes, utilizing AI's ability to forecast trends and optimize logistics.
The use of AI in solving complex mathematical problems can greatly accelerate scientific discovery by processing large volumes of data swiftly, developing predictive models, automating repetitive tasks, generating new hypotheses, conducting simulations, and facilitating cross-disciplinary research. By doing the heavy lifting in data analysis and hypothesis testing, AI allows scientists to focus on the more creative and innovative aspects of their work, leading to faster and potentially groundbreaking discoveries.
For example, in drug discovery, an AI system could analyze vast datasets of chemical compounds and their interactions with biological targets to predict which compounds are most likely to succeed as new drugs. AI could significantly speed up this process, which typically requires years of trial and error in laboratories. The AI's predictions would enable researchers to focus on a smaller, more targeted set of compounds for laboratory testing, thereby speeding up the development of new medications and reducing associated costs.
We’ve already seen progress in many of these areas, but advanced reasoning capabilities would allow for a more nuanced interpretation of data, would enable AI to approach novel problems, would enable the consideration of more factors, and would enable the generation and testing of hypotheses.
More Applications of Reasoning and Planning
Beyond a greater capability to analyze data, AIs that could reason and plan would further disrupt the job market. Customer service could also see a transformation as AI systems manage customer inquiries and complaints, reducing the need for human customer service representatives. Manufacturing could witness a decline in the need for human workers and supervisors as AI systems optimize production schedules, manage inventory, and oversee quality control. The retail industry could see a change in roles for buyers and inventory managers as AI predicts consumer trends and automates ordering processes. In legal services, AI's ability to analyze documents and assist in legal strategy planning might reduce the workload on paralegals and researchers. In real estate, AI's evaluation of market data to predict trends and plan investment strategies could affect real estate analysts and agents.
In healthcare, such AI could take over diagnostic tasks after analyzing medical records and test results, potentially reducing the need for some medical specialists.
In the transportation sector, autonomous vehicles equipped with this AI could diminish the demand for human drivers, including truck, taxi, and delivery drivers, by efficiently handling routing and logistics. Autonomous vehicles have long been a “Holy Grail” of AI, and AIs that can reason and plan could use those abilities to adapt to novel situations, which would make Level 5 driverless vehicles possible; getting them to react properly in situations they have not been trained for has been the difficult part. Q-learning has been part of driverless car development for a long time.
The education sector could experience a shift with AI planning personalized learning paths, potentially impacting the roles of educational planners and some teachers, as AIs would now be able to plan Johnny’s math instruction for Unit 4. Lastly, in content creation, AI's management of complex projects like marketing campaigns or media production could alter the jobs of project managers and producers.
Quite possibly, it could eventually do almost anything, even without specific job training.
What makes Q-transformers most different is their potential to unlock few-shot or one-shot learning of new tasks in an offline environment. You can't explicitly train an AI agent to do everything. But what if you could teach it enough in pre-training for it to be able to learn novel tasks after a handful of demonstrations? Then you wouldn't need to retrain or fine-tune the model to integrate it into an existing process. Instead, you could take the base model, give it a dataset of examples of various tasks being correctly performed, and hand it a task. If it makes a mistake, you can correct it, and it will add that episode to its dataset of negative examples, dynamically reprogramming itself to give the corrected action a higher Q-value the next time it tries that task or a similar one. This would make it much easier to drop a GPT agent into a work setting and have it conform to the idiosyncrasies of a particular job with minimal extra work.
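The paragraph above can be sketched as a tiny feedback loop: demonstrations raise an action's score, corrections lower it, and no retraining of the underlying model is required. Everything below (the class, the situations, the update rule) is hypothetical and greatly simplified; it illustrates the "corrections become negative examples" idea, not an actual Q-transformer implementation.

```python
from collections import defaultdict

class CorrectableAgent:
    """Toy agent that scores actions with a Q-table and learns from corrections.

    Demonstrations of correct behaviour raise an action's Q-value; corrections
    (negative examples) lower it, so the next attempt prefers a different action.
    This is only a sketch of the idea described above, not a real Q-transformer.
    """

    def __init__(self, lr=0.5):
        self.q = defaultdict(float)   # (situation, action) -> estimated value
        self.lr = lr

    def add_demonstration(self, situation, action):
        self.q[(situation, action)] += self.lr * (1.0 - self.q[(situation, action)])

    def add_correction(self, situation, wrong_action, right_action):
        # The mistake becomes a negative example; the fix becomes a positive one.
        self.q[(situation, wrong_action)] -= self.lr * (1.0 + self.q[(situation, wrong_action)])
        self.add_demonstration(situation, right_action)

    def act(self, situation, actions):
        return max(actions, key=lambda a: self.q[(situation, a)])

agent = CorrectableAgent()
agent.add_demonstration("invoice received", "file in accounts payable")
print(agent.act("invoice received", ["file in accounts payable", "discard"]))
# A human corrects one mistake; the agent re-weights without any retraining.
agent.add_correction("duplicate invoice", "file in accounts payable", "flag as duplicate")
print(agent.act("duplicate invoice", ["file in accounts payable", "flag as duplicate"]))
```

Again, this is only the shape of the idea; a real system would use learned neural value functions over language and actions, not a lookup table.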
Are the Q developments real?
I have four answers to this question.
There are more and more reports that OpenAI has figured out how to do this with basic-level math. Regardless of whether or not it resulted in Altman’s termination, it seems plausible that an important advancement was made.
Related comments from leading insiders suggest that this or something similar has been achieved.
Ilya Sutskever on the Turning Point podcast.
“The most near-term limit to scaling is obviously data…This is well-known. And some research is required to address this. Without going into details, I’ll just say the data limits can be overcome and progress will continue.”
Sam Altman in an interview with Marc Benioff, founder and CEO of Salesforce.
“The models are going to get dramatically more capable”
The models are going to get significantly “better at reasoning”
“Companies will have their AI agents that customers can go off and interact with.” [This would require a better ability to reason].
Sam Altman at the APEC Summit (I posted these two videos before).
Sam Altman at the Developer’s Conference (November 7):
"What we launch today is going to look very quaint relative to what we're busy creating for you now."
This work is not as revolutionary as some suggest; it is something that OpenAI and other companies have been working on for years (see above).
Plus, there are recent papers and videos from Google, and remember that Demis Hassabis has been saying that DeepMind will soon release a model that has the ability to reason and plan (and to have memory, which would allow it to self-improve). Could it be built on this kind of technology? It seems so.
It doesn’t really matter
There isn’t a leading AI scientist in the world who doesn’t think we will achieve AGI or something similar, and even something that is somewhat similar and falls short of all domains of human intelligence is going to have a dramatic impact on all aspects of our lives. Will Q* be one of the critical breakthroughs that gets us there? Maybe. If not, it will be something else, or, most probably, something in addition to it.
Is this “AGI”?
Even if it is one of the critical ingredients of AGI or discoveries that will lead to AGI, we are not close yet. The technology has to be scaled to more consequential uses beyond elementary school math for it to be a critical component of AGI. It would also have to be integrated with language models (potentially). Tremendous computing power would be needed to scale this.
This new “AI” has to be productized, and companies have to learn to use the products, though there is some reporting that it is already being integrated into products. We probably also need more work on integrating creativity and empathy. We may need a massive increase in visual data:
[Note: Despite Yann LeCun’s insistence on the need for visual data, he thinks that in 10–20 years we will likely have AIs that are smarter than us in all the ways we consider ourselves to be smart].
But as I’ve been saying, each of these advancements has consequences that will impact the world. AGI isn’t an all-or-nothing technology that involves flipping a switch (or, more accurately, finding the switch).
What does this mean for generative AI and LLMs?
This is subject to speculation, but some suggest a future where the reasoning and planning abilities of Q-like projects are combined with the ability of large language models to communicate and generate content. At this point, such a system will have achieved many of the domains of human intelligence (see my essay on the domains here).
Why should you not be surprised?
Maybe you should be surprised that this technology is being developed faster than anticipated; many are. But no AI scientist is surprised that we are moving towards human-level AI along a continuum. You shouldn’t be surprised by this general trend.
Is it fair to criticize people for overreacting?
“Q” is not some secretive OpenAI-CIA project that emerged out of nowhere, so it is somewhat fair to criticize the “OMG! AGI” reaction on Twitter and LinkedIn. But the reality is that, “Skynet/extinction” scenarios aside, it isn’t unfair for the average person to be concerned about the development of computer intelligence that exceeds human intelligence. Even in the best of all worlds, this isn’t a completely transparent process, and it has been hard on some fields, especially education, where many still consider the ability to produce an accurate bibliography or to write in the author’s voice to be a significant advance in AI. And, of course, this has been exacerbated by the “I won’t believe it until I see it” attitude and by concerns about an “AI winter” despite billions in investment by the Department of Defense alone. Anyhow, it’s not surprising to see such a reaction, and AI experts should be a little more understanding of the public’s response.
What does this mean for society and education?
“AGI has the potential to revolutionize every aspect of society, and it’s crucial that we prepare for its impact across all spheres of humanity.”
“AGI will be achieved in the next 6 - 24 months.”
“OpenAI’s success in the area caused me and many others to think we could see AGI within a couple of years.”
“Just a few days ago, I thought AGI might be 5-10 years in the future. Now I think it’s late next year, mid-2025 at the latest.”
“It wouldn’t be a bad time to start thinking about community AI boards to start the alignment aspects of the transition we face. The last week gives us clues to what we could expect in the future.”
These are just a few reactions to potential Q developments on the OpenAI community boards. I’m not vouching for any of them as truth statements, but they represent ideas that relatively informed people have.
What does this mean for society?
The focus at the societal level has been on how to manage the rate of development, both so that AI is safe (doesn't act in a way that can harm us) and so that we can manage disruptions to employment (note: Google’s DeepMind is now supporting a social adjustment fund).
What does it mean for education?
First, we need to work more on how to align the education system with a world where our students will live and work with machines that approximate their intelligence level. Part of this means integrating AI into the educational system to support today’s educational professionals. It also means adjusting learning and assessment practices so that they are relevant to preparing students for a world in which the economic value of knowledge, reasoning, and planning will collapse, since those capabilities can be purchased at a small fraction of the cost of human labor. And it means helping students learn how to use AIs to thrive in this world.
This is not about AI writing and cheating; it is about a new world.
Second, it means helping students and faculty understand this world, one that is likely to come very soon (it’s already partially here). It will trigger a radical economic and social transformation on top of growing international and societal conflict. Perhaps we will emerge into a new Renaissance, but during the transition, even absent a “Skynet”/Terminator scenario, things could get much worse.
How do we keep up?
It is hard to keep up with the pace of every new AI development. For most, it is impossible. Earlier today, Dr. Mairead Pratschke referenced a recent (November 21) podcast where there was a discussion of ChatGPT spitting out fake bibliographic references. That problem is so yesterday. A lot of what is being stated as “current” is substantially out of date.
To a degree, this is inevitable, and the rate of change has discouraged many in education from even trying to adapt, but we have to. So, what can we do?
We can plan to restructure education to support a world where not only will students be able to learn from AIs, but people will live and work with computers that are as smart as or smarter than they are in almost all domains, at least as far as the machines’ capabilities go. That is an evolving reality, and it is the one we have to prepare ourselves and our students for. Planning for this is more useful than trying to “keep up.”
Where to learn more about “Q” and reasoning and planning in AI?
Sunil Ramlochan. November 21, 2023. Q* - OpenAI's Potential Breakthrough in Goal-Oriented Reasoning.
Another take — This article focuses on how Q may be about the development of reinforcement learning with AI feedback (RLAIF), which may make it more feasible to scale models with synthetic data; this is another interpretation of the “we can solve the data problem” remark (Sutskever, above). It would make it possible for those with resources to 10X the scaling of models, and scaling is still effective at improving models and producing “emergent properties”/new capabilities.
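To show the shape of that interpretation, here is a schematic RLAIF-style loop: a generator model proposes answers, an AI critic scores them, and critic-approved answers become synthetic training data for the next round of fine-tuning. Every function, name, and threshold below is a made-up placeholder, not reported detail about Q* or OpenAI's methods.

```python
# Hypothetical RLAIF-style loop: a generator model produces candidate answers,
# an AI critic scores them, and the best candidates are kept as synthetic
# training data for the next round of fine-tuning. All functions are stubs.

def generate_candidates(model, prompt, n=4):
    """Placeholder: sample n candidate answers from the current model."""
    return [model(prompt) for _ in range(n)]

def ai_feedback_score(critic, prompt, answer):
    """Placeholder: a second model rates the answer (e.g. correctness from 0.0 to 1.0)."""
    return critic(prompt, answer)

def rlaif_round(model, critic, prompts, threshold=0.8):
    """One round: keep only critic-approved answers as new synthetic training pairs."""
    synthetic_data = []
    for prompt in prompts:
        for answer in generate_candidates(model, prompt):
            if ai_feedback_score(critic, prompt, answer) >= threshold:
                synthetic_data.append((prompt, answer))
    return synthetic_data  # would then be used to fine-tune the model, and the loop repeats

# Tiny demo with trivially fake model/critic stubs:
fake_model = lambda prompt: prompt + " -> 42"
fake_critic = lambda prompt, answer: 1.0 if "42" in answer else 0.0
print(rlaif_round(fake_model, fake_critic, ["What is 6 * 7?"]))
```

In a real pipeline the critic would itself be a strong model and the kept pairs would feed a fine-tuning or reinforcement-learning step; the sketch is only meant to show why AI feedback could substitute for scarce human-labeled data.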