Claude 3 Includes Agents, May Be Self-Aware, and Moves Past GPT-4 on Many Benchmarks
The world our students live in will be much different than this one.
* Working AI agents
* "Hyper"-advanced vision models
* Multimodal learning and outputs
* Significantly exceeding GPT-4 on math, reasoning, and coding benchmarks
* Fast when needed
* Potential self-awareness (but probably not)
* Use of synthetic data in training
* Reduced bias
* Improved knowledge of biology, finance, and cyber security
* Not close to AGI (sorry, not quite "human-like")
Introduction
Yesterday morning I loaded a paper I’ve been working on into Claude and asked it to create a draft of an abstract. I don’t use Claude that often, but it’s very good at document summarization and writing, so I figured I’d start there.
When I logged on, I saw that Anthropic had released Claude 3.
Honestly, I thought I had just missed the announcement of a new model version, but I checked LinkedIn and TheVerge.com and saw that it had only just come out.
Then I just went back to my work. Throughout the day I saw some headlines claiming that Claude 3 exceeded GPT-4's capabilities, but Gemini Pro had already claimed that at the margins, and I know GPT-5 will be out in due time, so I've been avoiding the "X model is better than Y at the moment" debates. It's just a yo-yo, and I don't see how these marginal differences impact 98%+ of education use cases.
But as I took a break yesterday evening and looked at a few articles, I was somewhat surprised. Not only is the model significantly better, but it also runs agents: AIs that can act autonomously, giving instructions in the form of prompts to other AI agents so those agents can complete parts of the task and then report back for the final product. Yes, it's like the person who emerges as the leader in your class group work.
There are many things to write about but let’s start with one clarification, one feature, and move right into Agents.
Clarification
Claude 3 is a family of models — Opus, Sonnet, and Haiku (yet to be released). Opus is available commercially for $20/month and Sonnet is free.
If you are using a free AI model such as GPT-3.5 (the free tier of ChatGPT), stop using that and use the free version of Claude instead.
Haiku, which is coming soon, is cheap and fast.
Advanced Vision and Multimodal Learning and Output
Claude 3 Opus has more advanced vision capabilities than any other current model, and this enables true multimodal learning and output.
Agents
The video below clearly shows Claude’s integration of agents.
The video begins by asking Claude to analyze the US economy in terms of GDP growth.
Claude starts with a web search (acting more like an assistant than an agent), finds a visual, reads the data from it, converts it to text, analyzes the trends, and outputs the results in a new visual, including some reasons for the fluctuations.
This is impressive, but you haven’t seen anything yet.
Starting at 2:13, the model is prompted to project GDP trends into the future, including analyzing multiple economies.
Now, instead of creating this analysis one country at a time, the model "dispatches sub-agents": it breaks the problem down into sub-problems and writes its own prompts for other versions of itself to carry out the sub-tasks. Each sub-agent analyzes a different economy and reports back.
This is how every student in the group does his or her part. It’s how each team member at work does his or her part. It’s how the different debaters and coaches do their assignments and share the collaborative work.
Check it out.
Agents are a big deal, and this is just the beginning. Imagine students completing their homework with models that can sub-assign various parts. Imagine tutoring systems using a combination of AI agents to collaboratively teach students.
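To make the dispatch pattern concrete, here is a minimal sketch in Python using the Anthropic SDK. The orchestration loop, prompts, and sub-task breakdown are my own illustration of the pattern described above, not Anthropic's implementation; the model name follows the Claude 3 launch naming.

```python
# A minimal sketch of the "dispatch sub-agents" pattern using the Anthropic
# Python SDK. The orchestration logic and prompts are illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-opus-20240229"

def ask(prompt: str) -> str:
    """Send a single prompt to Claude and return the text of its reply."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# 1. The "lead" model breaks the task into sub-tasks and writes its own prompts.
plan = ask(
    "Break the task 'project GDP trends for the US, Germany, and Japan' into "
    "one sub-task per country. Return one self-contained prompt per line."
)
sub_prompts = [line for line in plan.splitlines() if line.strip()]

# 2. Each sub-agent (another call to the same model) handles one sub-task.
reports = [ask(p) for p in sub_prompts]

# 3. The lead model synthesizes the sub-agents' reports into a final answer.
final = ask(
    "Combine these country-level analyses into one comparative GDP projection:\n\n"
    + "\n\n".join(reports)
)
print(final)
```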
Commenting on this post, AI scientist Eric Fraser points out that this capability is already present in other models.
Yes, our "Cyborg Students" don't just have AIs to help them write their papers; they have, as I predicted, full-blown agent assistants.
Generally, how does it compare to GPT-4?
Anthropic claims it's marginally ahead in most categories, but way ahead on coding, graduate-level reasoning, and math.
Most users can upload 200,000 tokens' worth of material (approximately 150,000 words). If you then prompt it to find what you need within that material, it will get it right with about 99% accuracy. Anthropic claims it can analyze up to a million tokens (roughly 750,000 words), which is what Gemini claimed last week.
1 million tokens, 99% accuracy
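Those recall figures come from "needle in a haystack" style evaluations: a single out-of-place sentence is buried in a very long document and the model is asked to retrieve it. Here is a rough sketch of that kind of test; the file name, planted sentence, and question are made up for illustration.

```python
# A rough sketch of a "needle in a haystack" recall test, the kind of
# evaluation behind the long-context accuracy figures above.
import anthropic

client = anthropic.Anthropic()

with open("long_report.txt") as f:          # hypothetical ~150,000-word file
    haystack = f.read()

needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
position = len(haystack) // 2               # bury the needle mid-document
stuffed = haystack[:position] + "\n" + needle + "\n" + haystack[position:]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": stuffed + "\n\nWhat is the best thing to do in San Francisco, "
                             "according to this document?",
    }],
)
print(response.content[0].text)  # should quote the planted sentence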
During Anthropic's internal needle-in-a-haystack testing, Opus reportedly commented that the planted sentence seemed out of place, as if it suspected it was being evaluated. One person who repeated the test argued this may mean the model is aware of what it is doing, demonstrating meta-cognition.
If these advanced models are self-aware, something Geoffrey Hinton has claimed since the release of GPT-4, that offers explosive potential for tutoring.
It is important to note that there is strong disagreement with this interpretation, including from Kilcher (in the post above) and Fraser.
Mimicking Your Style
Scanning
Thanks to its strong vision capabilities, Opus can scan nearly anything, even documents that are barely legible.
Do you have any old, historical documents in your library you want to scan? Old records? Anything? It will scan them.
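As a concrete example, here is a small sketch of reading a scanned page with Opus's vision input through the Messages API. The file name and instruction are hypothetical; the image is passed as base64 alongside a text prompt.

```python
# A small sketch of transcribing a scanned document with Claude 3 Opus's
# vision input via the Messages API. The file name is hypothetical.
import base64
import anthropic

client = anthropic.Anthropic()

with open("old_ledger_page.jpg", "rb") as f:        # hypothetical scan
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_b64,
                },
            },
            {
                "type": "text",
                "text": "Transcribe this page as faithfully as you can, "
                        "marking any words you cannot read with [illegible].",
            },
        ],
    }],
)
print(response.content[0].text)
```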
From Ars Technica: "According to a model card released with the models, Anthropic achieved Claude 3's capability gains in part through the use of synthetic data in the training process. Synthetic data means data generated internally using another AI language model, and the technique can serve as a way to broaden the depth of the training data to represent scenarios that might be lacking in a scraped dataset."
With the large "frontier" models already trained on most of the publicly available internet, many wondered where, beyond proprietary data, companies could get additional data to train their models on. This matters because the primary way to improve the models to date has been to provide more data; computer scientists have generally found that the more data and compute you throw at a model, the greater the capabilities it develops.
People have long suggested that models could train on synthetic data, though others worried the AI might "eat its own tail." Claude 3 is evidence that models can train on synthetic data, which essentially removes data availability as a limit to scaling (some say the scaling approach has other limits, but this at least addresses that one).
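Anthropic has not published its synthetic-data pipeline, but the basic idea can be shown with a toy sketch: one model generates labeled examples for scenarios that are scarce in scraped data, and those examples are saved for later training. The prompt, topic list, and output file below are made up for illustration.

```python
# A toy illustration of synthetic data generation: a model writes training
# examples for under-represented scenarios. Not Anthropic's actual pipeline.
import json
import anthropic

client = anthropic.Anthropic()

topics = [
    "explaining photosynthesis to a 10-year-old",
    "debugging an off-by-one error",
    "summarizing a 19th-century land deed",
]

with open("synthetic_examples.jsonl", "w") as out:
    for topic in topics:
        response = client.messages.create(
            model="claude-3-opus-20240229",   # generator model (illustrative)
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": f"Write one realistic student question about {topic} "
                           f"and an ideal tutor answer. Label them 'Q:' and 'A:'.",
            }],
        )
        out.write(json.dumps({"topic": topic,
                              "example": response.content[0].text}) + "\n")
```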
Interview
This is an important interview with Daniela Amodei, Anthropic Co-Founder, covering many important topics.
(a) The difference between these three Anthropic models.
(b) The fact that companies will use different models for different purposes, including different models from different companies.
(c) Opus's expanded reasoning capabilities.
(d) A better ability to respond to prompts in appropriate ways.
This analysis covers many of the features of Opus, highlighting its vision capabilities and its growing (but not quite human-like; this isn't AGI) abilities:
Reduced Bias
Anthropic reports less bias in these models.
Improved knowledge of cyber and biological capabilities.
More
Finance and Health Care Fine Tuning
Anthropic claims greater accuracy in health care and finance thanks to specific fine-tuning in those areas. When someone does this for education, we'll have a very powerful model for that purpose.
You can check out the full technical report here.
Is OpenAI/GPT-4 in trouble?
Some writers who want to grab your attention are arguing that these superior abilities mean OpenAI/GPT-4 is in trouble.
OpenAI certainly has a competitor, but remember that GPT-4 finished its training in the fall of 2022. GPT-5, which we can expect this year, will likely contain all of these features, including advanced agents, plus incredible video generation abilities (see Sora). Anthropic has not yet ventured into image and video generation.