Anthropic Just Shipped Claude Managed Agents

Anthropic's new platform runs your agents on their infrastructure - hosted sessions, sandboxing, multi-agent coordination, and execution tracing. Here's what's in the box and what to build with it.

By Travisse Hansen

Anthropic shipped Claude Managed Agents yesterday in public beta on the Claude Platform, and the full announcement is on the Claude blog at claude.com/blog/claude-managed-agents. The pitch is that you define an agent's tasks, tools, and guardrails, and Anthropic runs the whole thing on their infrastructure - hosted sandboxes, long-running sessions that survive disconnections, authenticated tool calls, checkpointing, execution tracing, and multi-agent coordination, all handled on their side of the API.

I have a folder on my laptop full of half-finished agent scaffolding - session state, a sandbox runner, a retry loop, tool auth, and a checkpointer I never finished. None of it is the agent. The agent is a prompt and three tools. The folder is the stuff I had to build around the agent so it wouldn't die five minutes in, and Managed Agents deletes all of it. Shipping an agent just went from a weekend infra project to an afternoon prompt.

What's Actually In The Box

Long-running sessions that survive disconnections

This is the one that matters. You start an agent, close your laptop, and come back to finished work. Until now you had to build session state and reconnection logic yourself. The real implication is that hosted agents become background threads for your brain - you can spin one up before a meeting, walk away, and the thinking keeps going without you.
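For a sense of what "session state and reconnection logic" means when you roll it yourself, here is a minimal sketch of the pattern. It is not the Managed Agents API, and I am not claiming this is how Anthropic implements it. The names load_checkpoint, save_checkpoint, and run_session are made up for illustration, and the stop_after parameter exists only to simulate a disconnect.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "results": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def run_session(steps, path: Path, stop_after=None) -> dict:
    """Run steps in order, saving a checkpoint after each completed step.

    stop_after is a hypothetical knob that simulates a disconnect: the loop
    exits once that many steps have run during this call.
    """
    state = load_checkpoint(path)
    ran = 0
    while state["next_step"] < len(steps):
        if stop_after is not None and ran >= stop_after:
            break
        step = steps[state["next_step"]]
        state["results"].append(step())
        state["next_step"] += 1
        save_checkpoint(path, state)
        ran += 1
    return state
```

Calling run_session a second time with the same checkpoint path picks up at the first unfinished step instead of starting over, which is the behavior you would otherwise have to build and babysit yourself.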

Sandboxed tool execution with real credentials

The two things that used to stop side-project agents from becoming real products were sandboxing and secrets management, and Anthropic is now handling both. Your agent can call tools that need real credentials without you standing up a vault or a sandbox runner, which is the difference between "cool demo" and "actually ships."

Multi-agent coordination (research preview)

Agents can spawn and direct other agents, which is the same pattern the leaked Claude Code roadmap called Coordinator Mode, now landing as a core platform primitive. Work that looked like "hire 10 junior people" starts to look like "spin up 10 agents with a coordinator on top."
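The coordinator pattern itself is easy to sketch, even though the hard parts (isolation, retries, tracing) are exactly what the platform is selling. The code below is a local illustration only and assumes nothing about how Managed Agents exposes coordination. run_worker is a hypothetical stand-in for a worker agent, and coordinate is a made-up name for the fan-out step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_worker(subtask: str) -> str:
    """Hypothetical worker agent; a real one would call a model and tools."""
    return f"done: {subtask}"


def coordinate(
    subtasks: list[str],
    worker: Callable[[str], str] = run_worker,
    max_workers: int = 10,
) -> dict[str, str]:
    """Fan subtasks out to workers in parallel and collect results by subtask."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(worker, subtasks))
    return dict(zip(subtasks, results))
```

A coordinator agent would do more than this, such as deciding how to split the work and what to do with a weak result, but the shape is the same: one plan, many parallel workers, one place where results come back together.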

Self-evaluation and iteration

Agents can grade their own output against success criteria and iterate until they hit the bar. This is also in research preview, and Anthropic reports up to a 10-point improvement in task success on structured file generation in internal testing. This is the thing that turns "prompt and hope" into "define the bar and let it work."
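"Define the bar and let it work" boils down to a generate, grade, retry loop. Here is a rough sketch of that loop under my own assumptions, not Anthropic's implementation. The generate and grade callables are hypothetical placeholders for a model call and a grader, and threshold and max_rounds are parameters I invented for the example.

```python
from typing import Callable


def iterate_until_pass(
    generate: Callable[[str], str],
    grade: Callable[[str], tuple[float, str]],
    threshold: float,
    max_rounds: int = 5,
) -> tuple[str, float, int]:
    """Regenerate until the grader's score meets the threshold or rounds run out.

    generate takes the previous round's feedback (empty on the first round).
    grade returns a (score, feedback) pair for an output.
    Returns the best output seen, its score, and the number of rounds used.
    """
    feedback = ""
    best_output, best_score = "", float("-inf")
    for round_number in range(1, max_rounds + 1):
        output = generate(feedback)
        score, feedback = grade(output)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            return best_output, best_score, round_number
    return best_output, best_score, max_rounds
```

The round cap matters as much as the threshold: without it, an agent that can never satisfy its own grader will keep spending tokens forever.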

Execution tracing in the Console

Every session shows up in the Claude Console with full traces of what the agent did, why, which tools it called, and where it got stuck. This sounds boring, but it's the thing that makes agents debuggable enough to ship to real users, which is the part most teams underestimate until they're stuck in it.

Scoped permissions and identity

Agents get scoped permissions and identities, which is what you need before you let an agent touch real customer data or take actions on behalf of real users. It's the feature that quietly makes enterprise deployments possible.
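If "scoped permissions" sounds abstract, the core idea is an allowlist checked before every tool call. The sketch below is a toy version under my own assumptions and says nothing about how Managed Agents models identity. AgentIdentity, PermissionDenied, and call_tool are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


class PermissionDenied(Exception):
    """Raised when an agent tries to use a tool outside its scope."""


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical identity: a name plus the set of tools it may call."""
    name: str
    allowed_tools: frozenset[str]


def call_tool(
    identity: AgentIdentity,
    tool_name: str,
    tools: dict[str, Callable[..., Any]],
    **kwargs: Any,
) -> Any:
    """Run a tool only if it is inside the agent's allowlist."""
    if tool_name not in identity.allowed_tools:
        raise PermissionDenied(f"{identity.name} may not call {tool_name}")
    return tools[tool_name](**kwargs)
```

A meeting prep agent scoped to read_notes can read your notes all day, and the moment it tries send_email the call fails loudly instead of quietly going through.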

Who's Already Building On It

The launch partners matter less than the shape of the work they're doing with it, and a few of them stuck out to me.

  • Notion: engineers ship code, and knowledge workers produce websites and presentations in parallel. Non-engineers making websites while they do something else is the whole thesis of where this technology is heading, and it's buried in a customer quote on launch day.
  • Rakuten: enterprise agents across product, sales, marketing, finance, and HR, with each one deployed in about a week. That used to be a quarter of work for an internal tools team, and they're doing it on a rolling basis.
  • Asana: AI Teammates that collaborate on tasks and draft deliverables inside Asana projects. Agents sitting in the tool where the work already lives, instead of yet another tab to check.
  • Vibecode: users spinning up infrastructure 10x faster than before, which is the early proof that "agent as a product feature" actually works when you put it in front of real customers.
  • Sentry: a debugging agent paired with a patch-writing agent, so a bug comes in and a reviewable PR comes out the other side. They say they shipped it in weeks instead of months, and the before-and-after on that timeline is the entire point of Managed Agents.

Things You Could Build This Week

The launch partners are enterprise stories, but the same primitives work for individuals. Here are a handful of things I'd build this week that weren't practical before today:

  • Meeting prep agent: point it at your calendar and it researches attendees, their companies, recent posts, and any past context from your notes, so when you sit down in the morning the prep doc is already waiting. This is the one I'm building and turning into a guide.
  • Weekly review agent: reads your calendar, Notion, email, and GitHub for the week and drafts your Friday review on Thursday night, so you wake up to something you edit rather than write from scratch.
  • Competitive monitor: watches competitor domains, X accounts, and changelogs and sends you a Monday brief on anything that actually moved, which is the kind of work a junior marketer used to spend a morning on.
  • Inbox triage agent: drafts routine replies, flags the messages that actually need your brain, and queues the rest in a decision list you can clear in one sitting.
  • Customer research agent: give it a list of leads and it produces a research doc on each one overnight with firmographics, recent news, and a suggested opener.
  • Release notes agent: watches your git history and turns shipped commits into customer-facing changelog entries on a schedule, so the thing you always mean to do finally gets done.
  • Content refresh agent: audits your published posts against current search data and flags the ones that need updating, running monthly without you thinking about it.
  • Deal room agent: when a deal moves to a new stage, an agent goes and produces the next set of artifacts - security questionnaire responses, tailored proposal, reference customer match - so they're ready before the rep logs in.

None of these need a model breakthrough. They all existed as possibilities yesterday, and what changed is that they stopped needing a month of infrastructure work to ship.

What This Means If You're Learning Claude Code

I'm biased because I teach this stuff, but here's my honest take: the most valuable AI skill for the rest of 2026 isn't prompting, it's knowing what to hand off to a background thread. The question shifted from "what can I do with an LLM in a chat window" to "what job can I give an agent that runs without me," and Managed Agents just made that second question dramatically cheaper to answer.

Everything I teach in ClaudeFluent - tool use, structured output, prompt design, and when to let the model think vs. constrain it - transfers directly to building on this. The people who understand the fundamentals are going to absorb it in an afternoon, and the people who wait are looking at a steeper curve because the surface area just expanded again.

Where To Start

Managed Agents is live on the Claude Platform. You can read the docs and spin up your first agent at claude.com/blog/claude-managed-agents. I'm building the Meeting Prep Agent this week and turning it into a walkthrough guide, and if you want to build along and get the guide the second it drops, the next ClaudeFluent cohort is where we're going to work through this stuff live.

The Bigger Picture

Two announcements in two weeks point in the same direction. First, the leaked Claude Code roadmap told us Anthropic was building Chyros, Coordinator Mode, and Wizard. Then yesterday they shipped the platform version of all three at once, in public beta, for everyone. That's not a coincidence - Claude Code was the proof of concept, Managed Agents is the platform, and the Notion and Rakuten customer stories are the revenue justification.

Six months from now, the question people ask won't be "should I use an AI assistant," it'll be "how many background agents do I have running, and what's each one working on." I'd start building.
