Stefan Bauschard
Dr. Sabba Quidwai
2023 has been a wild year in the world of AI, and it is difficult for anyone to keep up, especially now that change is exponential!
To bring you up to speed, we review the major highlights.
ChatGPT and other language models that generate text. The biggest news was the release of ChatGPT. While version 3.5 was released at the end of November 2022, it took off in 2023, reaching 100 million users by the end of January 2023. Today, roughly 180 million people have created accounts, there are about 100 million active users, and the site has logged 1.7 billion total visits. GPT-4 was released in March, and similar tools from Anthropic (Claude), Google (Bard), Inflection (Pi), Perplexity, Meta (Llama), and xAI (Grok) followed. These are all large language models trained on trillions of words.
Image generators. Words are not the only thing AI models can generate; diffusion models can also generate images. At this point, almost everyone has seen an AI-generated image. We certainly had AI image generation before ChatGPT arrived in November 2022, but it was terrible. We now have image generators that can produce images that are often indistinguishable from actual photographs. Popular image generators include DALL-E 3 (part of ChatGPT), Midjourney, and Google’s Imagen.
Video generators. AI can also generate video from simple text descriptions. Popular video generators include invideo.io, Runway (runwayml.com), and Pika (pika.art). These tools really took off at the end of 2023 (November-December) and will be a huge part of 2024.
AI models that can read, hear, and speak. We all know that AI models can respond to a prompt, so they were always reading to a degree, but the latest models (GPT-4, Claude) can now ingest and summarize texts of up to approximately 150,000 words. They can also work with a microphone to “hear” what is being said, and their text output can be converted into audio so they can “speak.”
AI models that can see. There are now AI vision models that can “see.” These models can describe an uploaded image, see a room through a pair of glasses and tell a blind person what is present, and recognize text written on a piece of paper.
Toward the end of 2023, we may have moved beyond models that see, read, and hear through separate mechanisms loosely tied together (GPT-4) toward models (Gemini) that were trained on text, images, and audio at the same time. This could significantly enhance their capabilities.
Improving models. Many people have a negative impression of AI because they used it only once, many months ago, and only used the free version of ChatGPT, which runs version 3.5. These older models not only lack advanced capabilities but also hallucinate (make things up) more often and were trained on less data. The most advanced models (GPT-4, Claude 2) hallucinate significantly less, can perform statistical analysis, generate accurate bibliographies, can read and speak in multiple languages in a user’s own voice, have been trained on non-English datasets, and have improved performance in specialized domains such as science, technology, and specific professional fields.
“Multimodal” models. Models that can take in text, sound, and video as input and produce text, sound, and video as output are considered multimodal. Individuals who experience this in an “immersive” environment are using some form of virtual or augmented reality.
Training on specific content. The most advanced, “frontier” AI models have been trained on more specialized domains, but we are now seeing new products whose models have been “fine-tuned” on a specific knowledge base, such as legal (Harvey.ai), financial (BloombergGPT), and medical (Med-PaLM). This produces more accurate results in those domains.
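To make the idea of fine-tuning a little more concrete, here is a minimal sketch of how a small team might adapt a general-purpose model to a specific knowledge base using a hosted fine-tuning API (the OpenAI Python SDK is used here only as an example). The dataset file name is hypothetical, and the specialized products named above were built with far larger, proprietary training pipelines; this only illustrates the general concept.

```python
# Minimal sketch: fine-tuning a general model on a domain-specific dataset.
# Assumes the OpenAI Python SDK (v1.x) and a hypothetical file legal_qa.jsonl
# containing training examples in the chat-message format.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload the domain-specific training data.
training_file = client.files.create(
    file=open("legal_qa.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job on top of a general-purpose base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print("Fine-tuning job started:", job.id)
```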
Reasoning & planning. There is a debate within the AI community as to whether language models can reason. Most AI scientists believe they can engage in low-level reasoning (basic inference) but cannot engage in the more advanced, abstract reasoning needed to formulate a hypothesis and design an experiment to test it. In 2023, however, we started to see more and more claims that the most advanced models (GPT-4, Gemini Ultra) can reason, and we saw an article in a peer-reviewed journal about an experiment conducted by a “Coscientist,” an AI model. This may also have demonstrated some “planning” abilities, the first we have seen.
Smaller, local, and “free” open-source models. There is always a lot of conversation about the large language models such as ChatGPT and the large diffusion models that produce images, but there are more than 300,000 models (mostly language models) that people can download and run on their own computers without any internet connection. Many of these models are “open source” (free). Recently, Mistral released a model that is a bit stronger than GPT-3.5 and is freely available. Subsequently, Apple released a technique that allows such models to run on iPhones. As a related trend, we closed the year with models that are much smaller but nearly as capable as the larger ones.
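For readers curious about what “running a model locally” looks like in practice, here is a minimal sketch using the open-source Hugging Face transformers library. The model named below is just a small, freely downloadable example that runs on an ordinary laptop; larger open-weight models such as Mistral 7B need a machine with more memory or a GPU.

```python
# Minimal sketch: downloading and running an open-weight language model locally
# with the Hugging Face transformers library. After the one-time download,
# generation happens entirely on your own machine, with no internet connection.
from transformers import pipeline

# "distilgpt2" is a small demo model; swap in a larger open-weight model
# (e.g., a Mistral 7B instruct variant) if your hardware allows.
generator = pipeline("text-generation", model="distilgpt2")

result = generator(
    "In 2023, generative AI changed education by",
    max_new_tokens=40,
    do_sample=True,
)
print(result[0]["generated_text"])
```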
Integration of AI into products. Users can interact with these models through easy-to-use public interfaces (e.g., chat.openai.com (ChatGPT), bard.google.com), but the underlying technologies are also being integrated into many products, such as Character.ai, the most popular generative AI site after ChatGPT, which lets people chat with virtual characters. Others include websites that help people create and chat with virtual girlfriends and boyfriends, write essays and blog posts, study for their AP exams, generate music, and create public service announcements in multiple languages. Microsoft 365 and Google Workspace are also adding integrations.
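As a rough illustration of how such integrations work under the hood, the sketch below calls a hosted language model through its API, which is essentially what products built on these models do behind their own interfaces. The helper function and prompts are hypothetical, and the OpenAI Python SDK is used only as one example of such an API.

```python
# Rough sketch: how a product (e.g., a study tool or writing assistant) might
# integrate a hosted language model via an API, here the OpenAI Python SDK v1.x.
# The helper name and prompts are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_study_guide(topic: str) -> str:
    """Ask the model for a short study guide on a topic."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful study assistant."},
            {"role": "user", "content": f"Write a five-point study guide on {topic}."},
        ],
    )
    return response.choices[0].message.content


print(draft_study_guide("the causes of World War I"))
```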
Incredible scientific and medical applications. We’ve seen more and more medical and scientific advances, including the ability to read radiology reports almost as well as a human, the discovery of thousands of proteins and new antibiotics, and the discovery of millions of new materials.
Launch of personal bots. By this we mean the launch of new bots (“GPTs,” Poe, playlab.ai) that let a person load in their own knowledge and have an assistant help them do their work. As these “assistants” develop the ability to reason and plan, they will become “agents” that can carry out tasks when given a goal, including choosing which tasks to carry out.
Robotics. You can think of AI models as the “brains” and robots as the bodies, and we are starting to see the development and production of those bodies. It is harder to build a working robot body than to develop the AI “brain,” but robots are starting to work in factories and other controlled environments.
So, yes, in 2023 we started to see the basic assembly of robots, from the development of the brains to the creation of the bodies. We are probably two or more years from reliable autonomous robots with physical embodiments, but you can see them starting to take shape.
Education’s reaction
Education was arguably the first industry impacted by generative AI applications, since students' papers can now be written by generative AI. Most faculty were taken aback by this; while many tried to incorporate AI writing into their courses, others spent a great deal of time trying to “detect” and punish such writing, and those efforts consumed much of educators’ time.
In addition to spending time thinking about how to manage student writing in an age of AI, some schools and faculty began integrating AI products that increase surveillance of students (HallPass.com) or make it easier for teachers to complete tasks (lesson plans, grades, quizzes, vocabulary lists, etc.). These tools focus on doing what we already do in school, just faster. There is certainly some merit to this, but it leaves unanswered fundamental questions about how we should prepare students for an AI world.
A small number of schools and universities have started thinking about how to prepare students for a world where robots can read, listen, see, think, and respond in text, voice, image, and video in the same way a teacher or professor who is knowledgeable in the content area and has strong human interaction skills would. We outline many ideas for consideration in our report, and we expect more development in this area in 2024.