Claude 4: Memory, Tools, and the Evolution of AIs as Co-Workers

May 23, 2025

With the release of Claude Opus 4 and Claude Sonnet 4, announced at its "Code with Claude" event, Anthropic has delivered what might be the most significant leap toward the "open agentic web" that Microsoft CTO Kevin Scott has long envisioned: a world where AI agents can act autonomously on your behalf through reliable, interoperable protocols.

Extended Thinking Meets Real-World Tools

Previous iterations could stay coherent for one to two hours before losing focus. Claude 4 models can now work autonomously for up to 24 hours while maintaining coherence and context.

Imagine this: Claude can read your email, search for relevant information, craft a thoughtful response, and send it—all in one continuous process. It can access your Google Drive, manipulate spreadsheets, and coordinate across multiple apps without losing track of the broader objective. The AI doesn't just use tools; it thinks strategically about when and how to use them.

Let me break down how this works.

When you give Claude 4 a list of tasks, it doesn't just execute them mechanically like traditional automation. Instead, it approaches them with contextual understanding. For example, if you ask it to "research competitor pricing and update our product strategy," Claude 4 might:

Start with your original task: Research competitor pricing
Recognize related needs: "I should also check recent market trends that might affect pricing."
Add relevant subtasks: Analyze customer reviews to understand value perception
Expand scope intelligently: "Since I'm updating strategy, I should also review our current positioning against these findings."
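The expansion steps above can be pictured as a work queue that grows while it is being processed. The sketch below is only an illustration of that idea, not how Claude 4 works internally: the `RELATED` lookup table and the `plan` function are hypothetical names, and a real model would derive the follow-up subtasks through its own reasoning rather than from a fixed table.

```python
from collections import deque

# Hypothetical rule table: which follow-up subtasks a task suggests.
# The entries are taken from the example above; a real agent would
# generate them by reasoning, not by lookup.
RELATED = {
    "Research competitor pricing": [
        "Check recent market trends that might affect pricing",
        "Analyze customer reviews to understand value perception",
    ],
    "Update product strategy": [
        "Review current positioning against the findings",
    ],
}


def plan(initial_tasks):
    """Return tasks in execution order, adding related subtasks as they appear."""
    queue = deque(initial_tasks)
    seen = set(initial_tasks)
    ordered = []
    while queue:
        task = queue.popleft()
        ordered.append(task)
        # Queue any related subtask that has not been planned yet.
        for subtask in RELATED.get(task, []):
            if subtask not in seen:
                seen.add(subtask)
                queue.append(subtask)
    return ordered


steps = plan(["Research competitor pricing", "Update product strategy"])
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```

Two requested tasks go in and five planned steps come out, which is the difference between mechanical execution and the scope expansion described above.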

There are a couple of basic ways that Claude 4 makes this all work.

Memory expansion. Both Claude 4 models introduce sophisticated memory capabilities that address one of AI's biggest limitations: the inability to maintain context over extended periods. When given access to local files, the models can create and update "memory files," tracking progress and important information across long sessions—essentially taking notes the way humans do during extended work.

This isn't just about remembering what happened earlier in a conversation. It's about building tacit knowledge over time, maintaining continuity across sessions, and developing an understanding of your specific context and preferences.
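To make the "memory file" idea concrete, here is a minimal sketch of notes that survive from one session to the next. It is an assumption-laden illustration, not Anthropic's implementation: the JSON layout, the `memory.json` filename, and the helper names `load_memory`, `save_memory`, and `record_progress` are all invented for this example.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path):
    """Read the memory file, or start with an empty structure if none exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"progress": [], "preferences": {}}


def save_memory(path, memory):
    """Write the whole memory structure back to disk."""
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")


def record_progress(path, note):
    """Append one progress note and persist it immediately."""
    memory = load_memory(path)
    memory["progress"].append(note)
    save_memory(path, memory)
    return memory


# Hypothetical file location; a temporary directory keeps the demo self-contained.
memory_path = Path(tempfile.mkdtemp()) / "memory.json"

# First working session.
record_progress(memory_path, "Collected competitor price list")

# A later session starts from the file, not from the conversation history.
restored = record_progress(memory_path, "Drafted strategy update")
print(restored["progress"])
```

Because every note is written to disk as soon as it is recorded, a later session can pick up where the earlier one stopped even if the original conversation is gone.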

Connecting to other tools. Perhaps most exciting from the user perspective is that you're not limited to Anthropic-approved tools. Through platforms like Zapier, Claude 4 can tap into thousands of applications—Google Sheets, Slack, Notion, and countless others. This openness represents a fundamental shift away from closed AI ecosystems toward truly interoperable agent networks.
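One way to picture this openness is a registry in which any application can expose an action under a name, and the agent calls actions by name. The sketch below is a toy model under stated assumptions: the tool names (`sheets.append_row`, `slack.post_message`) and the functions behind them are hypothetical stand-ins that only return strings, not the real Zapier, Google Sheets, or Slack APIs.

```python
from typing import Callable, Dict

# Registry mapping a tool name to the function that implements it.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name):
    """Decorator that registers a function under the given tool name."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


# Hypothetical stand-ins: they describe the action instead of calling a service.
@tool("sheets.append_row")
def append_row(sheet: str, row: list) -> str:
    return f"appended {len(row)} cells to {sheet}"


@tool("slack.post_message")
def post_message(channel: str, text: str) -> str:
    return f"posted to {channel}: {text}"


def call_tool(name, **arguments):
    """Look a tool up by name and invoke it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


result = call_tool("slack.post_message", channel="#pricing", text="Report ready")
print(result)
```

Adding another application means registering one more function; the calling code does not change, which is the practical meaning of an interoperable rather than closed ecosystem.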

Some Empirical Proof

The numbers tell a compelling story. Opus 4 has operated on complex tasks for up to 24 hours, from playing Pokémon (demonstrating sustained strategic thinking) to running code refactoring tasks for seven hours without interruption. Measured against the one-to-two-hour limit of previous generations, 24 hours is roughly a 12x improvement in sustained performance.

According to Anthropic:

Cursor calls it state-of-the-art for coding and a leap forward in complex codebase understanding. Replit reports improved precision and dramatic advancements for complex changes across multiple files. Block calls it the first model to boost code quality during editing and debugging in its agent, codename goose, while maintaining full performance and reliability. Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance. Cognition notes Opus 4 excels at solving complex challenges that other models can't, successfully handling critical actions that previous models have missed.

The Deep View newsletter gave some specific examples:

What This Means for the Future

We're witnessing the emergence of AI that doesn't just respond to prompts but actively participates in complex, multi-step workflows. Claude 4 represents a bridge between today's conversational AI and tomorrow's fully autonomous agents.

The "open agentic web" is no longer a distant vision—it's materializing in tools you can use today. As these capabilities mature and integrate deeper into our digital infrastructure, we're approaching a fundamental transformation in how work gets done.

Thoughts Beyond

Anthropic CEO Dario Amodei made several striking predictions that frame the true magnitude of what Claude 4 represents.

The $1 Billion One-Person Company: Amodei predicts that by 2026, we'll see the first company generating $1 billion in revenue run by a single person—enabled by AI agents like Claude 4. This isn't hyperbole; it's a logical extrapolation of what happens when one person can orchestrate complex operations across thousands of tools and processes simultaneously.

Self-Improving AI: Perhaps most remarkably, these same technologies are now being used to build Claude itself, setting the stage for what Amodei describes as a "hard takeoff"—exponential improvement as AI systems become capable of enhancing their own development processes.

Healthcare Revolution: With proper resource allocation, Amodei believes we can "cure all diseases within 5 years." This isn't just about Claude 4's current capabilities, but about the acceleration it enables across research, drug discovery, and medical innovation.

These aren't distant science fiction scenarios—they're the logical next steps of capabilities we can observe and test today.

The future of work isn't just changing—it's already here.

Education Disrupted: Teaching and Learning in An AI World
