On the USDOE's new Designing for Education with Artificial Intelligence: An Essential Guide for Developers
Meeting the standards may make AI smarter than us
TLDR
* This is an excellent set of standards for edtech vendors.
* Since most student interaction with AIs will happen outside the school-based edtech system, these standards should be applied to all AI products.
* If vendors meet the standards (reduce bias below human levels, for example), edtech AIs may become smarter and better decision-makers than humans.
* Rapid advances in AI will make it challenging to produce “evidence-based” support for AI usage in education, but blocking use for lack of evidence when new developments emerge will leave students using systems at school that are inferior to what they can access on their phones, glasses, and contact lenses.
* AI systems that are smarter than humans challenge the principle of the Human in the Loop (HITL).
* Trust requires vendors to be honest about where their products stand relative to the standards outlined in the document.
* Parents will use AIs to challenge in-school decision-making.
* At some point, we need to talk more about preparing students for an AI world and not just about how to evaluate and use AI in edtech products that prepare students for a previous world.
* If we want the developers of AI products to be honest with us about the current capabilities of their products, we also need to start being honest with ourselves about what advancing machine intelligence means for us, teachers, and our students’ future.
Yesterday, the Department of Education released a new report, Designing for Education with Artificial Intelligence: An Essential Guide for Developers.
It’s really the first federal government report to directly tackle the challenges of AI in education. In May 2023, the Department released Artificial Intelligence and the Future of Teaching and Learning, but that report was largely written before ChatGPT (running on GPT-3.5) launched in November 2022, and it barely mentioned generative/interactive models. In January, it released the revised National Educational Technology Plan, which focused on technology and equity; that focus is important, but it is not unique to AI.
Yesterday’s release is an important report, and it highlights the areas where “edtech” application developers need to be strong in order to properly support educators and reassure schools.
These are the areas it identifies —
* Reduced bias
* Data privacy protection
* Accuracy in output
* Reduced risk of malicious use
* Transparency and explainability
* Risks from a “Race to Release”
* Harmful content risks
* Ineffective system risks
* Honesty in how the products work
* Unprepared user risks
Essentially, the report frames these problems as standards for developers to meet: create systems that protect privacy, are free of bias, and so on.
These are all important risks for developers to address, and not just in educational products. The reality is that students will use AI tools, including general, non-edtech tools, outside of school far more than they will ever use them inside of school, so it would be great if bias, harmful content, and the other risks were reduced in all AI products. If these problems are only solved in “edtech” products, it will be helpful, but it will honestly have a relatively trivial impact given the extensive use of AI products by students and faculty on their own PCs, phones, glasses, and even contact lenses. We’ll just end up in a situation where schools are in compliance with the law (which is essential and valuable) but students still lack privacy.
This isn’t in any way meant to discount the importance of addressing these risks in edtech products, but if 95%+ of student interaction is with AIs that are neither edtech products nor school-based, we have a long way to go.
While I support all of the recommendations, they do create two challenges for schools —
(a) Demanding “evidence-based” support before any new product is used risks locking schools into products built on what will quickly become ancient technology, given the rapid rate of AI development.
(b) If the developers meet these standards, the AI systems may become smarter than humans and challenge the emphasis on the HITL.
Locking in Old Tech
The one place the report acknowledges the pace of AI advancement is when it points out that just as educators became familiar with text-based chatbots, industry started releasing multimodal capabilities.
Yes, this is significant, but not primarily because having more than one type of output (text, image, video, or a combination) creates new security risks.
Multimodal AI represents a significant advancement in AI because it allows AI systems to learn in ways that closely mimic human cognition. By processing and integrating information from multiple sensory inputs, these systems can develop a more comprehensive understanding of the world, much like humans do.
This approach enables AI to make connections between different types of data and develop more robust, flexible intelligence. Robotics further enhances this capability by enabling embodied intelligence, which even more closely simulates the human learning experience. Through robotics, AI can physically interact with its environment, learn through direct experience and trial-and-error, and develop sensorimotor skills and spatial awareness.
The combination of multimodal AI and robotics creates a learning paradigm that closely mirrors human cognitive development. This synergy potentially leads to AI systems that can adapt more readily to new situations, develop common-sense reasoning, and understand context and nuance with greater effectiveness. As a result, these advancements are pushing AI closer to human-like learning and intelligence, opening up new possibilities for AI applications and our understanding of cognition itself.
But even this is only a part of recent developments.
In late May, Anthropic released a paper titled “Mapping the Mind of a Large Language Model,” which deals with “interpretability.”
Interpretability refers to our ability to understand and explain how these models arrive at their outputs. It's about making the "black box" of neural networks more transparent, allowing us to peek inside and comprehend the decision-making processes. Interpretability research aims to achieve several key objectives. First, it seeks to understand how knowledge is represented and stored within the model. Second, it attempts to trace the reasoning process that leads to specific outputs. Third, it works to identify potential biases or failure modes in the model's behavior. Lastly, interpretability research aims to improve model performance by understanding its strengths and weaknesses.
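To make the idea concrete, here is a minimal sketch (in Python) of one basic interpretability technique: a linear “probe” trained to test whether a concept is linearly recoverable from a model’s hidden activations. The activations and the concept signal below are synthetic stand-ins I invented for illustration, and Anthropic’s actual work relies on far more sophisticated dictionary-learning methods, so treat this strictly as a sketch of the general idea, not their method.

```python
# Illustrative sketch only: a linear "probe" that checks whether a concept
# (here, a synthetic signal) is linearly recoverable from hidden activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend these are hidden-layer activations for 1,000 inputs, each a
# 512-dimensional vector. In practice you would capture them from a real
# model with a forward hook.
n_examples, hidden_dim = 1000, 512
activations = rng.normal(size=(n_examples, hidden_dim))

# Pretend half the inputs mention a concept; inject a weak signal along one
# direction so the concept is (partially) encoded in the activations.
labels = rng.integers(0, 2, size=n_examples)
concept_direction = rng.normal(size=hidden_dim)
activations += np.outer(labels, concept_direction) * 0.2

# Train the probe on one split, evaluate on the other.
X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Well-above-chance held-out accuracy suggests the concept is represented
# internally; chance-level accuracy suggests it is not (at least not linearly).
print(f"Probe accuracy: {probe.score(X_test, y_test):.2f}")
```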
As retired MIT research scientist Tim Dasey notes:
While Dasey’s examples are about industry, it doesn’t take a wild imagination to think about how the same capabilities apply to “edtech” tools.
And it’s not just significant progress in multimodal learning and interpretability.
At a minimum, we went from ChatGPT-3 (there were versions prior to this, but in terms of reasonably usable systems) to ChatGPT-4, which can at least simulate reasoning, use tools, browse the internet, and summarize documents.
Claude 3.5 Sonnet allows users to interact with Claude's outputs in real time, creating a dynamic workspace for editing and building upon AI-generated content like code snippets, text documents, or website designs.
We are also starting to see the emergence of agentic AI systems, which are designed to autonomously pursue complex goals and workflows with limited direct human supervision. Unlike traditional AI, which simply responds to inputs or performs narrow tasks, agentic AI aims to operate more like a human employee: understanding context, setting goals, reasoning through subtasks, and adapting decisions and actions based on changing conditions.
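To give a feel for what “agentic” means in practice, here is a deliberately simplified sketch of an agent loop. The `call_llm` function and the `search` tool are placeholders made up for illustration (no real API is involved), but the structure (set a goal, let the model choose an action, execute it, feed the observation back, repeat) is the core pattern these systems build on.

```python
# Hypothetical sketch of an agent loop: goal -> choose action -> execute ->
# observe -> repeat. call_llm() is a stand-in for a real model API.
from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; here it just scripts two steps."""
    if "Observation" not in prompt:
        return "ACTION search: average 8th-grade reading level benchmarks"
    return "FINISH: Draft summary of benchmarks prepared for the teacher."

def search(query: str) -> str:
    """Placeholder tool; a real agent might hit a search API or database."""
    return f"(stub) top results for '{query}'"

TOOLS: dict[str, Callable[[str], str]] = {"search": search}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = call_llm(history)
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        # Parse "ACTION <tool>: <input>", run the tool, record the observation.
        tool_name, tool_input = decision.removeprefix("ACTION ").split(":", 1)
        observation = TOOLS[tool_name.strip()](tool_input.strip())
        history += f"\nAction: {decision}\nObservation: {observation}"
    return "Stopped: step limit reached."

print(run_agent("Summarize reading benchmarks for an 8th-grade class"))
```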
Andrew Ng and Andrej Karpathy have both noted that recent developments in agents, especially the ability of large teams of agents to collaborate, potentially put us on a shorter-term path to AGI. Neither of these AI experts is known for hyping AI.
Others, such as Dario Amodei, CEO of Anthropic, claim that scaling AI models to $100 billion training runs (we are currently at $100 million training runs) will lead us to AGI. While there is a debate about how far scaling can take us, I don’t think it’s a good idea to dismiss this possibility.
I lay this all out not to rehash the AGI debate, but to point out that these systems are demonstrating new capabilities and advancing rapidly. These new capabilities will arrive faster than evidence for their educational effectiveness can ever be produced.
For example, perhaps a study will reveal that ChatGPT-3.5, which is nothing more than a text-based predictor, can be used in a way that is effective for at least most students, but there are no studies on the effectiveness of the capabilities in ChatGPT-4, Claude 3.5 Sonnet, and emerging agents. Will these newer systems (or the same capabilities in “edtech” products) be restricted at schools and only available to students on their phones?
AI That Becomes Smarter than Us
The end of the last section addresses the possibility of AI systems becoming smarter than us, something that most leading AI experts believe will happen in the next 2-20 years.
But let’s look at a few specific ways AI could easily become smarter than us if the standards outlined in the document are met.
Bias in counselor recommendations. The document expresses concern about bias in counselor recommendations that are based on AI output.
This can be solved very easily through basic reinforcement learning. Synthetic data may also help.
To me, the magic is just getting these systems to be less biased than humans, which probably isn’t hard, as human bias is rampant.
Who is detecting the “unfairness in the recommendations due to biases” when lower-SES, Black, and Hispanic students are recommended for less demanding academic courses?
When are they encouraged to apply to less competitive colleges than their Asian and White peers?
We have DEI programs to address this, and while I’m in favor of them, these programs struggle to show results for various reasons, including that many trainees resist the training. AIs don’t resist their training.
So, yes, I think it’s entirely plausible for an edtech company to develop a counseling AI that is less biased than humans; this can be done with reinforcement learning. When one does, are we still going to roll with the biased recommendation of the human counselor?
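To illustrate what detecting “unfairness in the recommendations due to biases” could look like in practice, here is a hypothetical audit sketch in Python. It compares advanced-course recommendation rates across student groups, whether the recommendations came from humans or from an AI. The records, group names, and the four-fifths threshold are illustrative assumptions; a real audit would control for prior achievement and use proper statistical tests.

```python
# Hypothetical bias-audit sketch: compare advanced-course recommendation
# rates across demographic groups. The records and the 0.8 "four-fifths rule"
# threshold are illustrative assumptions, not a real audit.
from collections import defaultdict

records = [
    # (student_group, recommended_for_advanced_course)
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]

counts = defaultdict(lambda: {"recommended": 0, "total": 0})
for group, recommended in records:
    counts[group]["total"] += 1
    counts[group]["recommended"] += int(recommended)

rates = {g: c["recommended"] / c["total"] for g, c in counts.items()}
best_rate = max(rates.values())

for group, rate in rates.items():
    ratio = rate / best_rate if best_rate else 0.0
    flag = "REVIEW" if ratio < 0.8 else "ok"   # four-fifths rule of thumb
    print(f"{group}: recommended {rate:.0%} of students "
          f"({ratio:.2f} of highest-rate group) -> {flag}")
```

The same kind of disparity metric could, in principle, be folded into a reinforcement-learning reward as a penalty, which is one plausible route to a counseling AI whose bias sits below typical human levels.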
Hallucination rates. Yes, generative AI systems make factual errors. If you use ChatGPT-3.5, you’ll find the error rates to be high. But error rates are declining.
Developers are employing various strategies to reduce hallucination rates in AI models, particularly in LLMs. One effective approach is Retrieval Augmented Generation (RAG), which grounds an LLM in a specific knowledge base before it generates a response, producing more accurate and contextually relevant outputs. Another is reinforcement learning from human feedback (RLHF), in which human reviewers rate and correct AI-generated content, gradually improving the model's accuracy over time. Developers are also focusing on using high-quality, verified training data and crafting more specific prompts to guide AI responses. Additionally, some researchers are exploring techniques like multi-token prediction and semantic entropy measurement to detect inconsistencies in AI output. Many even have systems debate each other to produce the best answer.
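As a concrete (and deliberately toy) illustration of the RAG idea, the sketch below retrieves the most relevant passage from a small trusted corpus using naive word overlap and then builds a prompt that instructs the model to answer only from that passage. Production systems use embedding-based vector search and an actual LLM call; both are stubbed out here, and the corpus is invented for illustration.

```python
# Toy RAG sketch: ground the model's answer in a retrieved passage instead of
# letting it answer from memory alone. Retrieval here is naive word overlap;
# production systems use embedding/vector search and a real LLM call.
import re

corpus = {
    "course_catalog": "AP Biology requires completion of Biology I and Chemistry I.",
    "attendance_policy": "Students with more than 10 unexcused absences must meet with a counselor.",
    "grading_policy": "Semester grades combine coursework (70%) and final exams (30%).",
}

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: dict[str, str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = tokenize(question)
    return max(documents.values(),
               key=lambda text: len(q_words & tokenize(text)))

def build_grounded_prompt(question: str, passage: str) -> str:
    """Instruct the model to answer only from the retrieved passage."""
    return (
        "Answer using ONLY the passage below. If the passage does not contain "
        "the answer, say you don't know.\n\n"
        f"Passage: {passage}\n\nQuestion: {question}"
    )

question = "What are the prerequisites for AP Biology?"
passage = retrieve(question, corpus)
print(build_grounded_prompt(question, passage))
# The resulting prompt would then be sent to the LLM, which is far less likely
# to hallucinate prerequisites because the answer is present in its context.
```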
While there is ongoing debate in the AI community about whether hallucinations can be fully eliminated within current generative AI architectures or if entirely new approaches are necessary, there is growing optimism that this challenge will eventually be overcome. Some experts argue that incremental improvements to existing models and training techniques may be sufficient, while others contend that fundamental breakthroughs in AI architecture are required. Regardless of the specific path forward, many researchers believe that continued advancements in AI will lead to systems with significantly reduced hallucination rates. The ultimate goal is to develop AI models that produce only incidental errors at rates lower than human error rates, effectively making AI-generated content as reliable as, or even more dependable than, human-produced information. As the field progresses, it seems likely that a combination of refined architectures, improved training data, and novel validation techniques will contribute to solving this complex problem.
A World of Less-Biased and More Accurate AIs
One of the most consistent themes across this and similar AI guidance documents is the call to keep the human as the primary decision-maker.
This makes sense for many reasons: while these systems are competitive with us now, human + AI is still better in all or nearly all instances; we all want to keep our jobs; and what we do every day is really focused on using and developing human intelligence.
But, at some point, if AI becomes ‘smarter’ than us at certain tasks, then we may need to let it do those tasks. Imagine this dialogue.
Dad: (concerned) I've been reviewing your recommendation for placement X for my son. Can you walk me through your reasoning?
Counselor: (calmly) Of course. Our recommendation is based on a comprehensive evaluation involving multiple professionals. We've considered input from your son's special education teachers, general education instructors, district specialists, and other experts who've worked closely with him. Their collective assessment suggests that placement X would provide the most suitable environment for his learning needs.
Dad: (skeptical) I appreciate the thorough process, but I'm worried about concrete outcomes. What evidence do we have that this placement will actually benefit my son?
Counselor: (confidently) Our recommendation is grounded in both experience and research. We've seen positive outcomes for students with similar profiles in this placement. Additionally, current educational studies support this approach for children with your son's specific learning challenges.
Dad: (pulling out his tablet) I understand that, but I've been doing my own research. I used this new AI application that analyzed my son's profile across more than 750 data points. It compared his case against all published research on this program, and it suggests that this placement won't effectively address his core issues.
Counselor: (pausing, then speaking carefully) I appreciate your proactive approach, Mr. Jones. While AI tools can provide interesting insights, they have limitations. They may not account for the nuanced, day-to-day observations of educators who work directly with your son. Our team's recommendation comes from years of hands-on experience and a holistic understanding of your son's needs.
Dad: (frustrated) But isn't that the problem? Your recommendation seems subjective, while this AI tool is offering an evidence-based, data-driven analysis. Shouldn't we at least consider what it's suggesting?
Counselor: (leaning forward, speaking empathetically) I understand your frustration, and I agree that we should consider all available information. However, education isn't just about data points. It's about understanding the whole child - their personality, their daily interactions, their emotional needs. These are aspects that experienced educators are uniquely qualified to assess.
Dad: (firmly) I appreciate the value of human experience, but I think we need to look at this objectively. The AI system is drawing on an incredibly vast amount of data - far more than any individual or even team of educators could possibly process. It's analyzed hundreds of thousands of cases, outcomes, and research papers. In this particular instance, relying solely on human judgment might actually lead to a less desirable outcome for my son.
Counselor: (taken aback) I... I see your point. You're right that the AI has access to a broader research base than we do.
Dad: (continuing) And it's not just about quantity. The AI can identify patterns and correlations that humans might miss. It's not influenced by personal biases or limited by individual experiences. For a decision this important, shouldn't we prioritize the most comprehensive and objective analysis available?
Counselor: (looking thoughtful) You've given me something to consider, Mr. Jones. While I still believe human insight is valuable, I can't deny the potential benefits of such a data-driven approach. Perhaps we need to reassess how we make these decisions.
Dad: (nodding) That's all I'm asking. I want us to use every tool at our disposal to make the best choice for my son. Can we agree to review the AI's analysis in detail and consider adjusting the recommendation based on what we find?
Counselor: (after a pause) Yes, I think that's a fair request. Let's schedule a meeting to go through the AI analysis together. We'll involve the rest of the team as well. If the data suggests a different approach might be more beneficial, we should certainly be open to that.
Dad: (relieved) Thank you. I appreciate your willingness to consider this perspective. It's not about discounting your expertise, but about combining it with the most advanced analytical tools we have.
Counselor: (with a small smile) You're right. After all, our goal is the same - to do what's best for your son. Let's work together to make sure we're making the most informed decision possible.
Here, the human educator is not replaced but takes on a different role, perhaps allowing the “more intelligent” AI to take on a larger part of the analysis and shift the narrative. The educator no longer competes with the AI and tries to trump its better decision-making. Rather, the educator (and the parent) uses the AI as an extension of their intelligence to make a better decision.
It won’t be long before this type of “edtech” product is available to schools and parents.
Other Standards
There really is no reason developers cannot meet the other standards.
The world’s leading financial institutions, health care centers, and militaries that have started integrating AIs into their workflows demand data privacy protection, cybersecurity protections, honesty about how the products work, reduced risk of malicious use, and an end to “moving fast and breaking things.” Educators, and everyone else using these products, should demand the same.
Trust
Building trust with schools requires companies to be transparent about their AI products' current standing on key benchmarks. Just as major institutions in finance, healthcare, and defense demand rigorous standards for AI integration, educational institutions should expect no less. Companies must candidly assess and disclose how their offerings measure up in terms of data privacy, cybersecurity, operational transparency, safeguards against malicious use, and overall system stability. By openly acknowledging both strengths and areas for improvement across these critical domains, AI providers demonstrate a commitment to responsible innovation. This honesty not only helps educators make informed decisions but also fosters a collaborative environment where schools and companies can work together to address challenges and advance AI applications in education. Ultimately, this transparency is essential for creating AI tools that truly serve the needs of students and teachers while upholding the highest standards of safety and ethics.
Companies that tell schools their edtech products meet these standards when they do not ultimately set themselves up for failure.
Conclusion
The Department of Education has released an important set of standards for the developers of edtech products to meet. Given that most students will use AI products more outside of school than inside of school, I hope all developers of AI tools meet these standards.
In a couple of instances, meeting these standards will push developers to create products that exceed human intelligence in some areas (something they are already trying to do). This will create more fundamental, even existential, challenges for educators and force them to reconsider the traditional role of “edtech” in the school system. While many will find this challenging, it will also create many opportunities when AI is properly utilized.
P.S. When AIs Become Smarter than Us.
In this essay, I introduced the general trend of AIs becoming smarter than us and identified a couple of specific areas where they could easily become smarter than us in the education context.
When AIs become smarter than us in these contexts, this will create discomfort for educators. It will also start to create discomfort for students (it already has).
This shift may challenge students' sense of accomplishment and self-worth, particularly if they perceive AI as an insurmountable competitor rather than a tool for augmentation. Preparing students for this AI-driven world involves not only teaching them how to effectively use AI tools but also helping them develop resilience, adaptability, and a growth mindset. By fostering these qualities, we can empower students to navigate the evolving landscape with confidence, seeing AI as a collaborator in their learning journey rather than a threat to their potential.
If we want the developers of AI products to be honest with us about the current capabilities of their products, we also need to start being honest with ourselves about what advancing machine intelligence means for us, teachers, and our students’ future.