
Claude Code Subagents: Running 4+ Hour Autonomous Refactors Without Context Rot

AIHelpTools Team | May 3, 2026
Tags: claude-code, autonomous-agents, refactoring, code-migration, agentic-workflows


You start a large refactor at 9 AM. By noon, your single Claude Code session has drifted so far from the original plan that it's renaming variables based on hallucinated conventions. The context window is stuffed with dead ends, and you're babysitting instead of building.

This is where subagents shine. Instead of one bloated context trying to hold an entire migration in memory, you spawn focused workers that each handle a bounded task, report back, and die. The parent agent orchestrates. The children execute.

Here's what actually works after running overnight refactors and multi-hour migrations.

Table of Contents

  1. When Subagents Beat Single-Agent Runs
  2. The Core Subagent Spawn Pattern
  3. Context Passing Without Bloat
  4. Failure Recovery and Checkpointing
  5. Real Numbers: Cost and Time Tradeoffs
  6. What Still Breaks

When Subagents Beat Single-Agent Runs

Not every task needs subagents. Code reviews, feature additions under 500 lines, debugging a specific function: these work fine in a single session. The context stays tight, the goal is clear, and you're done in 20 minutes.

Subagents make sense when:

  • The task spans multiple subsystems. Migrating from Redux to Zustand touches state management, component props, middleware, and tests. Each area is a different problem domain.
  • You need parallel exploration. Testing five different API client libraries means five independent evaluation threads, not one sequential slog.
  • Failure isolation matters. If the database migration agent crashes, you don't want it taking down the API refactor agent with it.
  • The work exceeds 2 hours of continuous execution. Context drift becomes real around the 90-minute mark. By hour 3, a single agent is rewriting code it already fixed.

Analogy: A single agent is a solo developer trying to hold an entire microservices migration in their head. Subagents are a team where each person owns a service, and the tech lead coordinates via Slack.

The Core Subagent Spawn Pattern

The pattern that works reliably:

  1. Parent agent reads the high-level plan, breaks it into bounded tasks, writes task manifests to disk.
  2. Spawns child agents as separate Claude Code sessions (for example, headless runs with claude -p), each with its own context and manifest.
  3. Children execute, write results to structured output files (JSON, markdown, or diffs).
  4. Parent polls the output directory, reads results, decides next steps.
  5. Repeat until the plan is complete or a hard failure occurs.

Here's a simplified version of step 1, where the parent writes a task manifest to disk:

import json
import os

# Parent agent writes one manifest per bounded task
task_manifest = {
    "task_id": "migrate_auth_module",
    "goal": "Replace passport.js with lucia-auth in src/auth",
    "context_files": ["src/auth/passport-config.js", "docs/auth-requirements.md"],
    "output_file": "outputs/auth_migration_result.json",
    "max_runtime": "30min"
}

os.makedirs("tasks", exist_ok=True)  # make sure the manifest directory exists
with open("tasks/task_001.json", "w") as f:
    json.dump(task_manifest, f, indent=2)

The child agent reads task_001.json, loads only the files listed in context_files, does the work, writes to output_file. The parent never sees the child's internal context.
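Steps 2 through 4 can be sketched as follows, assuming each child runs as a headless Claude Code session (claude -p is Claude Code's non-interactive print mode). The prompt wording and the helper names build_child_command, spawn_child, and wait_for_outputs are hypothetical, not part of Claude Code; treat this as a starting point rather than a finished orchestrator.

```python
import json
import subprocess
import time
from pathlib import Path


def build_child_command(manifest_path: str) -> list[str]:
    """Build a headless Claude Code invocation for one task manifest.

    The prompt text is a hypothetical example; adapt it to your setup.
    """
    prompt = (
        f"Read the task manifest at {manifest_path}. Load only the files in "
        "context_files, complete the goal, and write your result as JSON "
        "to the manifest's output_file."
    )
    return ["claude", "-p", prompt]


def spawn_child(manifest_path: str) -> subprocess.Popen:
    # Each child is a separate process, so it starts with a fresh context.
    return subprocess.Popen(build_child_command(manifest_path))


def wait_for_outputs(output_files: list[str], timeout_s: float, poll_s: float = 5.0) -> dict:
    """Poll until every output file is readable JSON or the timeout expires.

    Returns a mapping of output path -> parsed JSON for the files that arrived.
    """
    deadline = time.monotonic() + timeout_s
    results: dict = {}
    while True:
        for path in output_files:
            if path in results or not Path(path).exists():
                continue
            try:
                results[path] = json.loads(Path(path).read_text())
            except json.JSONDecodeError:
                pass  # the child may still be writing; retry on the next poll
        if len(results) == len(output_files) or time.monotonic() >= deadline:
            return results
        time.sleep(poll_s)
```

Polling the filesystem keeps the parent decoupled from how children run: any child that eventually writes valid JSON to its output_file works, and anything missing after the timeout is a candidate for the recovery path described later.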

Context Passing Without Bloat

The killer mistake: passing the entire project context to every subagent. You end up with 10 agents, each holding 50,000 tokens of irrelevant code.

Instead, use scoped context manifests:

| Context Type | What to Include | What to Exclude |
|---|---|---|
| Code files | Only files the agent will modify + direct dependencies | Test files, unless the agent is writing tests |
| Documentation | API contracts, migration guides, architecture decision records | General onboarding docs, marketing copy |
| Prior results | Outputs from upstream agents this agent depends on | Outputs from parallel or downstream agents |
| Constraints | Coding standards, library versions, breaking-change rules | Company history, team bios |
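The "Prior results" row is easy to enforce mechanically. A minimal sketch, assuming each task record declares a hypothetical depends_on list and the parent tracks finished tasks in a task_id to output-path dict; build_scoped_manifest is an illustrative name, not an existing API.

```python
def build_scoped_manifest(task: dict, finished_outputs: dict[str, str]) -> dict:
    """Build a child manifest that carries only the upstream results it needs.

    task: hypothetical record with task_id, goal, context_files,
          depends_on (list of task_ids), and output_file.
    finished_outputs: task_id -> output file path for completed tasks.
    """
    deps = task.get("depends_on", [])
    missing = [dep for dep in deps if dep not in finished_outputs]
    if missing:
        raise ValueError(f"{task['task_id']} is blocked on unfinished tasks: {missing}")
    return {
        "task_id": task["task_id"],
        "goal": task["goal"],
        "context_files": list(task["context_files"]),
        # Only outputs this task depends on; parallel and downstream
        # results are deliberately left out.
        "upstream_results": [finished_outputs[dep] for dep in deps],
        "output_file": task["output_file"],
    }
```

Raising on unfinished dependencies means a task can never be spawned with partial upstream context, which is one of the quieter ways a child ends up guessing.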

A well-scoped manifest for a database migration agent might be:

{
    "context_files": [
        "src/db/schema.sql",
        "src/db/migrations/002_add_users.sql",
        "docs/database-conventions.md"
    ],
    "upstream_results": ["outputs/schema_analysis.json"],
    "constraints": {
        "postgres_version": "15.2",
        "no_data_loss": true,
        "rollback_required": true
    }
}

For files of typical size, this keeps the agent's starting context under roughly 5,000 tokens. It knows what to do, has the context to do it, and nothing extra.
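A cheap guard is to estimate a manifest's context size before spawning. This sketch assumes the common rule of thumb of about 4 characters per token, which is only an approximation; swap in a real tokenizer count if you need precision. The function names and the 5,000-token default are illustrative.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic for English text and code; not a real tokenizer


def estimate_context_tokens(manifest: dict) -> int:
    """Approximate the tokens a child will load from its manifest's files."""
    paths = manifest.get("context_files", []) + manifest.get("upstream_results", [])
    total_chars = sum(len(Path(p).read_text(encoding="utf-8")) for p in paths)
    return total_chars // CHARS_PER_TOKEN


def check_budget(manifest: dict, budget_tokens: int = 5_000) -> None:
    """Refuse to spawn a child whose estimated starting context exceeds the budget."""
    estimate = estimate_context_tokens(manifest)
    if estimate > budget_tokens:
        raise ValueError(
            f"manifest loads ~{estimate} tokens, over the {budget_tokens} budget"
        )
```

When the check fails, the usual fix is to split the task or trim context_files rather than raise the budget.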

Failure Recovery and Checkpointing

Long-running agents fail. Network drops, API rate limits, the agent gets stuck in a loop. The question is whether you lose 4 hours of work or 10 minutes.

Checkpoint strategy:

Every 15 minutes of agent work, write a checkpoint file:

{
    "checkpoint_id": "auth_migration_cp_003",
    "timestamp": "2025-01-15T14:32:00Z",
    "completed_steps": [
        "analyzed_passport_config",
        "installed_lucia_auth",
        "migrated_session_storage"
    ],
    "current_step": "updating_middleware",
    "files_modified": ["src/auth/session.js", "src/middleware/auth.js"],
    "next_steps": ["update_route_guards", "migrate_tests"]
}
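A checkpoint is only useful if a crash can't leave it half-written. One way to get that, sketched here with a hypothetical write_checkpoint helper, is to write to a temporary file in the same directory and then rename it over the target with os.replace, which is atomic on POSIX filesystems.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_checkpoint(path: str, checkpoint: dict) -> None:
    """Write a checkpoint so readers see either the old file or the new one, never a partial one."""
    # Stamp the current UTC time in the same format as the example above.
    stamped = {
        **checkpoint,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(stamped, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except Exception:
        os.unlink(tmp_path)
        raise
```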

If the agent crashes, the parent spawns a recovery agent with:

  • The checkpoint file as context
  • Instructions to resume from current_step
  • A note that completed_steps should not be redone

This works because each step produces observable artifacts (modified files, test results, logs). The recovery agent can verify what's done and pick up where the crash happened.
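Turning a checkpoint into a recovery manifest can be sketched like this. The output keys (resume_from, do_not_redo, remaining_steps, instructions) are a hypothetical schema, not a Claude Code format; the point is that already-modified files go back into context so the recovery agent can verify them instead of redoing them.

```python
import json


def build_recovery_manifest(checkpoint_path: str, original_manifest: dict) -> dict:
    """Build a manifest for a recovery agent from the last checkpoint (hypothetical schema)."""
    with open(checkpoint_path, encoding="utf-8") as f:
        cp = json.load(f)
    # Keep original context and add modified files, preserving order without duplicates.
    files = list(dict.fromkeys(original_manifest["context_files"] + cp["files_modified"]))
    return {
        **original_manifest,  # keeps goal, output_file, constraints, etc.
        "task_id": original_manifest["task_id"] + "_recovery",
        "context_files": files,
        "resume_from": cp["current_step"],
        "do_not_redo": cp["completed_steps"],
        "remaining_steps": [cp["current_step"], *cp["next_steps"]],
        "instructions": (
            f"Resume at step '{cp['current_step']}'. Verify but do not redo: "
            + ", ".join(cp["completed_steps"])
        ),
    }
```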

Hard failure rules:

  • If an agent fails the same step 3 times, escalate to the parent for replanning.
  • If the parent can't replan, write a failure report and halt. Don't let agents flail for hours.
  • Set a global timeout. If the entire task isn't done in 6 hours, something is wrong with the plan.
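The three rules above fit in a small policy object the parent consults after every failed step. This is a sketch: FailurePolicy and its "retry"/"replan"/"halt" return values are hypothetical names, and the thresholds mirror the rules (3 failures per step, 6-hour global timeout).

```python
import time
from collections import Counter
from typing import Optional

MAX_STEP_FAILURES = 3
GLOBAL_TIMEOUT_S = 6 * 60 * 60  # 6 hours


class FailurePolicy:
    """Track per-step failures and a run-wide deadline for the parent agent."""

    def __init__(self, started_at: Optional[float] = None, timeout_s: float = GLOBAL_TIMEOUT_S):
        self.started_at = time.monotonic() if started_at is None else started_at
        self.timeout_s = timeout_s
        self.failures: Counter = Counter()

    def timed_out(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.started_at > self.timeout_s

    def record_failure(self, task_id: str, step: str) -> str:
        """Return what the parent should do after a failed step."""
        if self.timed_out():
            return "halt"  # the plan itself is suspect; write a failure report
        self.failures[(task_id, step)] += 1
        if self.failures[(task_id, step)] >= MAX_STEP_FAILURES:
            return "replan"  # escalate; if replanning fails too, halt
        return "retry"
```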

Real Numbers: Cost and Time Tradeoffs

Here's what a 4-hour migration actually costs running Claude Code on pay-as-you-go API billing (subscription plans meter usage differently, so these figures assume per-token pricing).

Single agent approach:

  • Input tokens: ~800,000 (context keeps growing)
  • Output tokens: ~150,000
  • Cost at Sonnet 4.5 pricing ($3 per million input tokens, $15 per million output tokens): ~$2.40 input + $2.25 output = $4.65
  • Wall time: 4 hours
  • Context rot incidents: 5-7 (agent rewrites same code, forgets constraints)

Subagent approach (1 parent + 6 children):

  • Input tokens: ~200,000 (scoped contexts, no bloat)
  • Output tokens: ~180,000 (more structured outputs)
  • Cost: ~$0.60 input + $2.70 output = $3.30
  • Wall time: 2.5 hours (parallel execution)
  • Context rot incidents: 0-1 (each agent dies before drift)

The subagent approach costs about 29% less, finishes 38% faster, and produces cleaner diffs. The real tradeoff is setup time: you spend 30 minutes writing the parent orchestration logic.
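The cost arithmetic is easy to rerun with your own token counts. This sketch assumes Sonnet 4.5 list prices of $3 per million input tokens and $15 per million output tokens for standard-length prompts, and ignores prompt caching and batch discounts; check current pricing before relying on it.

```python
# Sonnet 4.5 list prices in USD per million tokens (standard context, no caching)
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a run at per-million-token list prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


single = run_cost(800_000, 150_000)
sub = run_cost(200_000, 180_000)
print(f"single: ${single:.2f}, subagents: ${sub:.2f}, savings: {1 - sub / single:.0%}")
# prints: single: $4.65, subagents: $3.30, savings: 29%
```

Output tokens dominate the subagent bill, so structured outputs that stay terse matter more than trimming input context further.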

For a one-off refactor under 2 hours, single agent wins. For recurring migrations or multi-day projects, subagents pay off.

What Still Breaks

Subagents aren't magic. Here's what still fails:

Circular dependencies between tasks. If agent A needs agent B's output, and agent B needs agent A's output, the parent deadlocks. You have to detect cycles in the task graph before spawning.
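Cycle detection before spawning can be done with a topological sort. This sketch uses Kahn's algorithm; the deps input format (task_id mapped to the task_ids whose outputs it needs) is an assumption, and the returned order doubles as a valid spawn order.

```python
from collections import deque


def topological_order(deps: dict[str, list[str]]) -> list[str]:
    """Return a spawn order for tasks, or raise if the dependency graph has a cycle.

    deps maps each task_id to the task_ids whose outputs it needs (Kahn's algorithm).
    """
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: 0 for t in tasks}
    dependents: dict[str, list[str]] = {t: [] for t in tasks}
    for task, needs in deps.items():
        for need in needs:
            indegree[task] += 1
            dependents[need].append(task)
    # Start with tasks that need nothing; sorted for a deterministic order.
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for child in dependents[t]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        # Anything left has an unmet dependency: it is in, or downstream of, a cycle.
        blocked = sorted(t for t in tasks if indegree[t] > 0)
        raise ValueError(f"tasks blocked by a dependency cycle: {blocked}")
    return order
```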

Merge conflicts. Two parallel agents modifying the same file will create conflicts. The parent needs to sequence tasks that touch shared files or use a locking mechanism.
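A simple sequencing strategy is to group tasks into batches where no two tasks touch the same file, then run batches one after another. This greedy sketch assumes the parent knows which files each task may modify (for example, from its context_files); it ignores dependency order, so in practice you would apply it within each level of the topological sort.

```python
def schedule_batches(task_files: dict[str, set[str]]) -> list[list[str]]:
    """Greedily group tasks into batches whose members share no files.

    task_files maps task_id -> files the task may modify. Tasks in the same
    batch can run in parallel; batches run sequentially.
    """
    batches: list[tuple[list[str], set[str]]] = []
    for task in sorted(task_files):
        files = task_files[task]
        for members, locked in batches:
            if locked.isdisjoint(files):
                members.append(task)
                locked |= files  # mutates the batch's file set in place
                break
        else:
            batches.append(([task], set(files)))
    return [members for members, _ in batches]
```

Greedy first-fit won't always find the fewest batches, but it is predictable and good enough when most tasks touch disjoint areas of the codebase.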

Drift in coding style. Each subagent interprets style guides slightly differently. You get inconsistent naming, different error handling patterns. Running a linter/formatter after each agent helps but doesn't eliminate it.

Over-subdivision. Spawning 50 agents for a task that could be 5 creates coordination overhead that outweighs the benefits. If a task takes less than 15 minutes, it doesn't need its own agent.
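One way to apply the 15-minute rule is to pack small tasks into bundles so each spawned agent gets at least that much estimated work. In this sketch the per-task minute estimates are assumed to come from the parent's plan, and leftover small tasks are folded into the last bundle; bundle_small_tasks is a hypothetical helper.

```python
MIN_AGENT_MINUTES = 15


def bundle_small_tasks(estimates: dict[str, int], min_minutes: int = MIN_AGENT_MINUTES) -> list[list[str]]:
    """Group tasks so each agent gets at least min_minutes of estimated work.

    estimates maps task_id -> estimated minutes. Large tasks keep their own
    agent; small ones are packed together in plan order.
    """
    bundles: list[list[str]] = []
    current: list[str] = []
    current_minutes = 0
    for task, minutes in estimates.items():
        if minutes >= min_minutes:
            bundles.append([task])
            continue
        current.append(task)
        current_minutes += minutes
        if current_minutes >= min_minutes:
            bundles.append(current)
            current, current_minutes = [], 0
    if current:
        # Leftover small tasks: fold into the last bundle rather than spawn a tiny agent.
        if bundles:
            bundles[-1].extend(current)
        else:
            bundles.append(current)
    return bundles
```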

API rate limits. Anthropic has usage tiers. If you spawn 10 agents concurrently, you might hit rate limits faster than you expect. Stagger spawns or use a queue.
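Bounded concurrency plus a delay between submissions covers both suggestions. In this sketch, run_task is a hypothetical callable that spawns one child agent and blocks until it finishes; the defaults of 3 concurrent agents and 10 seconds between submissions are placeholders to tune against your usage tier.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_staggered(tasks: list[str], run_task: Callable[[str], str],
                  max_concurrent: int = 3, stagger_s: float = 10.0) -> dict[str, str]:
    """Run tasks with at most max_concurrent in flight, submitting them stagger_s apart.

    Once all workers are busy, further tasks wait in the executor's queue.
    """
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {}
        for i, task in enumerate(tasks):
            if i:
                time.sleep(stagger_s)  # spread out bursts of new requests
            futures[task] = pool.submit(run_task, task)
        for task, fut in futures.items():
            results[task] = fut.result()  # re-raises any exception from run_task
    return results
```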

Conclusion

Subagents are a tool for managing complexity, not a silver bullet. Use them when tasks naturally decompose into independent units, when you need parallel execution, or when a single context window can't hold the entire problem.

The patterns that work: scoped context manifests, checkpoint-based recovery, structured output files, and a parent agent that coordinates but doesn't micromanage.

The patterns that fail: dumping full project context into every agent, spawning agents for 5-minute tasks, ignoring failure recovery, and hoping agents will magically coordinate.

Start with a single agent. When you hit context rot or 2+ hour runtimes, split into subagents. Measure token usage and wall time. Iterate.

Autonomous work is possible. It just requires better architecture than "run Claude and pray."