July 11: Inability to Regulate AI Use, Student-Led AI Instruction, Hour of AI, Grok 4, Parent Concerns Grow, “Reasoning” Explained, ARC-AGI Explained, Humanity's Last Exam, AI and the Economy
Sometimes these types of update posts focus on significant model and application updates, but this week I want to cover some important developments related to AI use.
An inability to control when and how students use AI. Despite their popularity, I’ve always been skeptical of approaches that attempt to regulate when/how/where students can use AI, and I don’t think schools should build policies around this idea. My skepticism was grounded in two ideas: AI writing is becoming more human and individualized, making it difficult to detect in most instances, and AI is simply everywhere: phones, the applications students use, increasingly in browsers, and soon in glasses and even contact lenses. Though I’ve had success convincing some schools of this, it has been an uphill battle.
Yesterday, however, Adam Pacton posted something on LinkedIn that drove home the point: with the release (Perplexity, if you have the top subscription) and pending release (OpenAI/ChatGPT) of AI-enabled browsers that are entirely interactive, how do we stop students from using AI (in certain situations)? AI usage will become inextricably intertwined with every part of the academic process. You can read what has become a long thread here (LinkedIn).
A student teaches fellow students about AI. I’ve been teaching an “AI & Entrepreneurship” class for high school students. Yesterday a high school student taught most of my class. She showed them how to build bots in Playlab AI and how to build websites and a speech analyzer in Replit. My students were blown away; they had no idea how to do these things. One student had a website he built in Replit up and running within 15 minutes.
They didn't know what tokens were, yet they were also interested in recursive self-improvement (though they didn't know the term).
So, in 90 minutes, one HS student taught other HS students how to build bots and websites with AI, we learned about tokens, we had a discussion about recursive self-improvement, and I shared some resources for learning more about RSI. A useful 90 minutes. No one plagiarized any papers, and I'm quite confident no one experienced any brain rot.
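Since tokens came up, here is a minimal sketch of what tokenization actually looks like in code, using OpenAI's open-source tiktoken library (the sample sentence and the encoding choice are mine, not something we used in class):

```python
# Minimal tokenization sketch using OpenAI's open-source tiktoken library.
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Students built bots and websites with AI in under 90 minutes."
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens: {token_ids}")
# Decode each id back into its text fragment to see how the sentence was split.
print([enc.decode([tid]) for tid in token_ids])
```

Running something like this is a quick way to show students that models don't see words or letters; they see integer IDs for chunks of text.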
Empowering students to teach other students will be an essential part of the AI transition.
Hour of AI. This week, CODE.org launched Hour of AI, a new global initiative designed to introduce students and educators to the fundamentals of AI through accessible, hands-on activities. Building on the widespread success of the Hour of Code, Hour of AI aims to empower learners of all ages to move from passive users of AI technology to active creators who can shape the future of this rapidly evolving field. The program emphasizes that AI literacy is essential for everyone, not just computer scientists, and provides resources that require no prior expertise—making it possible for any teacher, parent, or community leader to facilitate an Hour of AI event. Activities range from training machine learning models to creating AI-generated art and exploring AI concepts through popular platforms like Minecraft, ensuring that learning is both engaging and relevant to students' everyday experiences. By making these resources globally accessible, CODE.org seeks to ensure that every student, regardless of background or location, has the opportunity to understand and influence how AI will impact their world. For more details and to access the curriculum, visit CODE.org's Hour of AI page.
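To make "training machine learning models" concrete, here is a tiny classifier sketch of the kind a one-hour activity might build on; this is my own illustration using scikit-learn, not something taken from the official Hour of AI materials:

```python
# A tiny "train a model" demo with scikit-learn (not from the Hour of AI curriculum).
# pip install scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Classic flower-measurement dataset: 150 samples, 4 features, 3 species.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a small decision tree, then check how well it labels flowers it has never seen.
model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(f"Accuracy on held-out flowers: {model.score(X_test, y_test):.2f}")
```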
Grok 4, Reasoning, ARC-AGI, and Humanity’s Last Exam.
Grok 4 leads the current generation of AI models with an Artificial Analysis Intelligence Index of 73. This places it ahead of competitors like OpenAI's o3 and Google's Gemini 2.5 Pro, both of which score 70 on the same index. The model's MMLU (Massive Multitask Language Understanding) score is 0.866, further underscoring its high-level performance on a broad range of academic and professional tasks.
Grok 4 demonstrates exceptional proficiency across scientific, mathematical, and technical domains. It achieves near-perfect scores on benchmarks such as the AIME (American Invitational Mathematics Examination) with a 95–100% score, and excels in graduate-level scientific Q&A (GPQA) with scores around 87–88%.
The model's content mastery extends to advanced mathematics, physics, chemistry, engineering, and code generation. It can analyze complex proofs, solve advanced calculus problems, and provide structured, step-by-step explanations for challenging scientific and philosophical questions.
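For context on what these percentages mean, question-answer benchmarks like AIME and GPQA are typically scored as exact-match accuracy: the share of questions whose final answers match the reference answers. A simplified sketch (the sample answers below are made up, and real evaluation harnesses add answer normalization, multiple runs, and tool-use settings):

```python
# Simplified exact-match benchmark scoring (sample data is made up).
model_answers     = {"q1": "42", "q2": "7", "q3": "3.14", "q4": "x=2"}
reference_answers = {"q1": "42", "q2": "7", "q3": "2.72", "q4": "x=2"}

correct = sum(model_answers[q] == reference_answers[q] for q in reference_answers)
score = 100 * correct / len(reference_answers)
print(f"Score: {score:.1f}% ({correct}/{len(reference_answers)} exact matches)")
```

On a 15-problem exam like the AIME, a 95–100% score corresponds to getting roughly 14 or 15 final answers right.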
Related —
Reasoning. In the reveal, Musk talked a lot about Grok 4’s reasoning abilities. A few still argue that AIs can’t reason like humans, and while AI reasoning is not necessarily the same as the human reasoning process, AI models do engage in what can reasonably be described as a reasoning process. Dr. Tim Dasey recently posted the first of three Substack posts explaining reasoning models.
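As a rough illustration of the idea, "reasoning" in these models means generating intermediate steps before committing to an answer; reasoning-tuned models do this internally with hidden thinking tokens, but you can approximate the behavior with an ordinary prompt. This sketch uses the OpenAI Python SDK with a placeholder model name, and it is my generic example rather than anything specific to Grok 4 or Dr. Dasey's posts:

```python
# Chain-of-thought-style prompting via the OpenAI Python SDK (v1.x).
# The model name is a placeholder; reasoning-tuned models generate their own
# intermediate "thinking" tokens before producing the final answer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {
            "role": "user",
            "content": "A train leaves at 2:40pm and the trip takes 95 minutes. "
                       "Think step by step, then state the arrival time.",
        }
    ],
)
print(response.choices[0].message.content)
```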
ARC-AGI. François Chollet recently gave a talk on the three levels of the ARC-AGI test. Grok 4 currently has the highest score on ARC-AGI-2, twice the score of the next highest leading model (Claude 4).
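For readers who haven't seen ARC, each task is a handful of example grid transformations plus one or more test grids the solver must complete; the publicly released tasks are small JSON files. A minimal loading sketch (the file name below is hypothetical):

```python
# Sketch of loading one ARC-style task; the file name is hypothetical.
# Public ARC tasks are JSON files with "train" and "test" lists of
# {"input": grid, "output": grid} pairs, where a grid is a 2D list of ints 0-9.
import json

with open("arc_task_example.json") as f:
    task = json.load(f)

for pair in task["train"]:
    print("example input :", pair["input"])
    print("example output:", pair["output"])

# The solver sees only the test input and must predict the matching output grid.
print("test input:", task["test"][0]["input"])
```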
Humanity’s Last Exam. Humanity’s Last Exam (HLE) stands as the most formidable benchmark ever designed to assess the reasoning and knowledge of advanced artificial intelligence systems. Developed by the Center for AI Safety and Scale AI, HLE features 2,500 to 3,000 questions spanning over 100 academic disciplines, each crafted by nearly 1,000 subject-matter experts worldwide. The exam’s extreme difficulty is intentional: it probes the very frontiers of human expertise, requiring not just rote memorization but multi-step reasoning, deep domain knowledge, and even the ability to interpret diagrams or images. Previous state-of-the-art models from OpenAI, Google, and Anthropic scored below 26% on HLE, highlighting the benchmark’s rigor and the significant gap between current AI capabilities and true expert-level reasoning.
The recent performance of Grok 4, the latest model from xAI, marks a dramatic leap in this landscape. According to multiple reports and leaked benchmarks, Grok 4 achieved a remarkable score of 44–45% on Humanity’s Last Exam—nearly doubling the previous world record and far surpassing competitors like Gemini 2.5 Pro, OpenAI’s o3, and Claude 4 Opus. In some testing scenarios, Grok 4 scored as high as 51% when using advanced tool integration and parallel multi-agent synthesis, and even its base model without tools reached 27%.
This achievement has been described as a “quantum leap” in AI performance, suggesting that Grok 4 can now tackle complex, multi-step problems that previously stumped even the best language models. While these results are generating excitement about the rapid evolution of AI, some experts urge caution, noting that official verification and transparency about training data are still needed.
Worries about what to learn, college majors, and AI unemployment.
As AI capabilities grow and AI-induced job loss spreads, more and more teachers and parents are voicing concerns about what education will look like in the near future and what students should study.
Yesterday, a friend messaged me to say that she had been at a school, and the panel she was on was asked what education will look like in five years.
Amarda Shehu posted on LinkedIn.
It’s a great post that created a good discussion in the comments.
In my opinion, the reality is that we don’t know, and that students must develop core skills, learn foundational knowledge, learn how to learn, and prepare to be entrepreneurs.
These were the answers that resonated most with me.
Microsoft and Anthropic Start Studying the Impact of AI on the Economy
Microsoft
The Microsoft AI Economy Institute is a newly established corporate think tank and research hub designed to study and guide the economic and social transformations brought about by artificial intelligence. Launched in early 2025 and housed within Microsoft’s AI for Good Lab, the Institute’s mission is to advance independent, actionable research that helps societies worldwide adapt to AI’s rapid evolution.
Key Features and Activities:
Multidisciplinary Research: The Institute brings together experts from academia, business, government, and international organizations to explore how AI is reshaping work, education, and productivity. Current research projects include evaluating the labor market value of AI skills and micro-credentials, investigating generative AI’s impact on academic innovation, and addressing policy gaps in higher education, particularly in regions like Africa.
Open Collaboration and Rapid Publication: The Institute sponsors global academic research, emphasizing open collaboration and fast publication cycles. This ensures that findings quickly inform both Microsoft’s internal strategies and broader public policy debates.
Support for Policymaking and Workforce Development: Research from the Institute directly informs the design of Microsoft Elevate’s training programs and policy recommendations, aiming to prepare individuals and institutions for the AI-driven economy. The Institute’s work is intended to help shape inclusive growth, ensuring that the benefits of AI are widely shared and that no communities are left behind.
Global Reach and Equity: The Institute is committed to supporting diverse perspectives, with specific initiatives inviting researchers from underrepresented regions, such as Africa, to contribute to the global conversation on AI’s economic impact. For example, it has issued calls for proposals from African scholars to examine workforce transitions and policy strategies for inclusive AI adoption.
Convenings and Thought Leadership: Through workshops, fellowships, and convenings, the Institute acts as a forum for stakeholders to discuss the future of the AI economy and develop evidence-based solutions for challenges such as labor displacement, skills mismatches, and equitable access to AI opportunities.
Anthropic
Anthropic’s new Economic Futures Program is a multidisciplinary initiative designed to address the economic transformations driven by AI. The program is structured around three main pillars:
Research Grants: Anthropic is funding independent research into how AI affects labor markets, productivity, and new forms of value creation. These grants are open to external researchers and institutions, with an emphasis on rapid, empirical studies that can inform public understanding and policy within a short timeframe. For example, initial grants of up to $50,000 are available for studies that can deliver results within six months. Anthropic is also providing API credits and fostering partnerships to facilitate this research (Anthropic Economic Futures).
Evidence-Based Policy Development: The program is creating forums for collaboration among researchers, policymakers, and industry professionals to evaluate and propose policy responses to AI-driven economic changes. These symposia will be held in locations such as Washington, D.C., and Europe, focusing on labor transitions, fiscal policy, and innovation (InfoQ).
Economic Measurement and Data Infrastructure: Anthropic is expanding its Economic Index, one of the first longitudinal datasets tracking AI’s economic usage and long-term effects. This open-access data aims to help researchers and policymakers monitor how AI adoption is transforming industries, job markets, and productivity. The goal is to provide a robust empirical foundation for ongoing research and policy development (The Economic).
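As a rough sketch of what working with an open-access usage dataset like the Economic Index could look like, here are a few lines of pandas; the file name and column names are my assumptions, not Anthropic's actual schema, so check the published release for the real format:

```python
# Hypothetical exploration of an AI-usage dataset with pandas.
# The file name and column names are assumptions, not Anthropic's actual schema.
import pandas as pd

df = pd.read_csv("economic_index_sample.csv")  # hypothetical export

# Share of AI conversations mapped to each occupational category.
usage_by_occupation = (
    df.groupby("occupation_category")["conversation_count"]
      .sum()
      .sort_values(ascending=False)
)
print(usage_by_occupation.head(10))
```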
What they say is true/we shouldn’t be surprised…the world is changing.
Critics argue that truly human-level AI simply isn't achievable, at least not for a long time. They view the current excitement around AI as nothing more than overblown hype and often engage in detailed semantic arguments about whether AI can reason. They point out that there are financial incentives for companies to make strong AI claims.
This attitude is dangerous, maybe even deadly.
AI continues to advance radically. Nothing in the new Grok release was surprising or unexpected. New and likely more advanced releases are coming from OpenAI (ChatGPT-5), Google (Gemini 3 Pro?), and Anthropic.
We may soon hit significant recursive self-improvement. Most of us only have access to what is publicly available, and even within that, very few people are using the best of what is publicly available (ChatGPT's o3 Pro, for example). As a regular o3 Pro user, I can say there is a huge difference between o3 Pro and the standard $20 models. Tyler Cowen has said o3 Pro is AGI.
Epic Games CEO Tim Sweeney is now saying the same about Grok 4.
Personally, I prefer more aggressive interpretations of AGI (requiring high scores not only on ARC-AGI-2 but also on ARC-AGI-3 and Humanity’s Last Exam), which push the time frame out approximately five years (Hassabis). But Dr. Ben Goertzel is right that, when applied, current technologies may be able to automate 80% of jobs.
We don’t need AGI for AI to radically alter the economy, something many say will happen by 2030.
More and more jobs are being lost without people understanding the necessity of preparing to become creators and entrepreneurs.
Inequality is growing as some people are making millions on AI and others are struggling to get by.
Sophisticated deepfakes continue to spread (Rubio, AI rock band) while most people lack the AI literacy/fluency to understand what is happening. Despite this, many students this year will spend significantly more time in school learning about a battle in the Korean War than about AI.
Many ignore the signals and warning signs (advancing AI, emerging agents, emerging robotics) and focus only on what AI can do in the moment. Schools cannot prepare only for what AI can do in the moment, because it takes them many months to respond and adapt. They must plan ahead.