A free, very friendly (and flirty) tutor that sometimes gives you the wrong answer and can manipulate you
To start, I want to thank everyone who has voluntarily opted into the paid subscription ($30/year) to this blog. Writing these posts consumes a lot of my free time, so I appreciate the support.
Now, to the beef.
As I noted yesterday, with its release of ChatGPT-4o, OpenAI gave the world free access to the world’s most advanced general intelligence model and the tools to interact with it in a very conversational way.
This new model, which is supported by multimodality, enables users to engage in natural, “humanlike” conversation that picks up on emotion, including facial expressions, and mimics human exchange. Users can not only have direct conversations with it through text and speech, but they can show it objects in the room or math problems on their screens and have conversations about those as well.
The impact of its ability to engage in this type of conversation cannot be overstated.
Comparisons to Scarlett Johansson in Her quickly appeared all over the internet. OpenAI CEO Sam Altman said that “it feels like AI from the movies.” :) I assume he meant Her and not The Terminator.
Claire Zhao raised the issue of whether the Turing Test was passed.
One Twitter (“X”) poster even claimed it represented AGI.
Users can access their new “friend” through a desktop app. I tried that last night and it’s convenient and easy to use.
I’d be surprised if it’s long before we are interacting with expressive avatars of our choosing.
Based on the newest benchmarks (see below) it is true that this is the world’s most advanced AI model, if only at the margins. Indeed, students who can pay for access to these models (access previously cost $20/month) have a significant advantage over those who cannot, both because when used correctly they can learn more, and also because they are likely to get better grades for work when they turn it in (assuming they use it consistent with class guidelines and do not get caught).
It is also true that this is a very friendly tutor.
What is not true is that it’s a tutor that is always right.
While ChatGPT-4o is an improvement over ChatGPT-4 on all benchmarks, there are still gaps between what it can do and being correct. This is particularly the case with math.
A lot of the time it does the math problems correctly, and its ability in this area has grown.
But it’s far from getting everything correct; sometimes it will give the wrong answer, especially to slightly more complicated math problems. Neither the problem the Khans tried nor the one that was demonstrated live was especially complicated; both were very basic geometry and algebra, and it generally gets those problems right. Both demonstrations also walked ChatGPT through the problem; they didn’t just ask ChatGPT to do the problem from start to finish.
It can’t do everything right.
And remember, OpenAI never claimed it would get everything correct. Their own chart indicates the opposite.
Unfortunately, many don’t understand this.
So, the downside of the “hype” around the tutors that is sweeping the internet is that some students will get the wrong answers. And, more importantly in my mind, it risks a backlash against AI-enabled intelligent tutoring systems. More advanced systems that are not reliant on generative AI to teach math won’t make these mistakes, though most people won’t understand the difference.
In no way is this a “Don’t use ChatGPT-4o as a tutor” post. Even with its limitations, it provides access to the most advanced generative AI tutoring that is based on LLMs. It not only levels the playing field but also provides access to a tutor many students would not otherwise have. My guess is that if every student learned what ChatGPT was capable of teaching, we could reverse the decline in NAEP scores in the US.
I just think it’s important that we recognize its limits and share those with students so that when they do use it as a tutor, they use it in the best possible way.
But beyond the issue of accuracy, I do want to raise another concern related to where the world of conversational AI may be headed, especially for students.
As Ethan Mollick noted in his take on the announcement, “people building close relationships with AIs seem inevitable.”
We know that AIs are persuasive, more persuasive than humans. They are also skilled at deception and manipulation.
ChatGPT-4o can arguably flirt. Christian Talbot notes that “my instincts tell me that deeper impact is going to be in ‘simulated intimacy.’”
Historian Yuval Harari has been arguing this intimacy will unlock political persuasion.
I wrote about these dangers back in July when the Pi bot was released: “The new Pi bot launches with the objectives of becoming your close friend and helping you make decisions.”
Yes, this isn’t our first go-around with emotionally connecting AI. We’ve had Pi and Hume.ai, though ChatGPT-4o is the most sophisticated and, given OpenAI’s notoriety, the most popular.
And it opens up many questions beyond the accuracy of the models.
I’m sure that many foreign governments and both political parties would be happy to open up free tutoring systems to our students. So would many advertisers. This doesn’t appear to be a problem with ChatGPT-4o (though some say it has political leanings), but as we think about the future, we are going to have to give this some thought.
Again, I think this is all probably a net good. But it doesn’t come without downsides. It’s not always right, and it can be pretty good at manipulating people (not that people and the internet are not good at these things).