Two AIs Walk Into a Data Center

One could execute anything. The other knew when to stop.


We expected this to take a couple of hours.

I’ve deployed an Oracle E-Business Suite Vision environment in that time before — but without a load balancer, HTTPS, or full production routing in place.

This time, we gave an AI agent the full scope: end-to-end deployment on OCI with HTTPS, WAF, and zero manual maintenance.

This is the kind of work that usually demands a senior engineer’s sustained focus, fast judgment, and a lot of small decisions made without drama.

It did everything right. And it still took days.

Not because it was bad at its job. But because being good at execution is not the same as being good at delivery.

Here’s what happened — and what it revealed about a pattern we’re now using on every complex deployment.


Part 1: The First AI Was Actually Very Good

The agent — an OpenClaw instance we call Jethro — got a lot right in the early phases:

  • Networking, subnets, and load balancing provisioned cleanly
  • Private application tier deployed with no public exposure
  • IAM gaps identified and fixed without being told where to look
  • Database and application services brought online in sequence

If this were a demo, we’d have called it done early.

But real infrastructure work doesn’t stop at the happy path.


Part 2: Where Capable Agents Break Down

The problems that emerged weren’t dramatic. They were ordinary:

  • IAM permissions missing in the target environment
  • Load balancer health checks failing due to application behavior
  • Certificate automation stalling on an environment gap
  • Intermediate host instability interrupting execution mid-task
  • Shell edge cases causing silent retries
  • Pre-existing defects in the vendor base image

Every one of these is solvable. None are unusual. Any experienced engineer has seen them all.

But together, they created a loop that’s very hard for a single agent to break out of:

Progress → Blocker → Pause → Clarification request → Wait → Repeat.

The agent wasn’t failing. It was pacing.

And the pace was being set by uncertainty, not capability.

It would hit an ambiguity, a point where two valid paths existed or where the cost of the wrong choice wasn’t clear, and default to asking a human. Which meant: stop, wait, context-switch, respond, resume.

At some point, the AI isn’t accelerating the work anymore. It’s metering it out.

This is the failure mode most AI write-ups skip: the agent can reason, but it doesn’t reliably know when to act versus when to ask. Without that judgment, every ambiguity becomes a human dependency.
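
To make that concrete, here’s a minimal sketch of the judgment that was missing. Everything in it is hypothetical (the agents expose no such API); it only illustrates what “act versus ask” looks like when it’s an explicit policy instead of a default to escalation:

```python
from dataclasses import dataclass

@dataclass
class Ambiguity:
    """A decision point where more than one path is defensible."""
    reversible: bool    # can a wrong choice be undone cheaply?
    blast_radius: str   # "task", "environment", or "production"
    options: list[str]  # candidate paths, best guess first

def act_or_ask(a: Ambiguity) -> str:
    """Hypothetical policy: act when being wrong is cheap to undo;
    ask only when the cost of a wrong choice is real."""
    if a.blast_radius == "production" and not a.reversible:
        return "ask: a human should own this risk"
    if a.reversible:
        return f"act: try {a.options[0]!r}, fall back to the next option"
    return "act with a checkpoint: snapshot state, proceed, verify"

# A shell edge case is reversible and task-scoped: act, don't wait.
print(act_or_ask(Ambiguity(True, "task", ["retry with bash -lc", "escalate"])))
```

Without something like this, the safe default wins every time, and the safe default is “ask.”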


Part 3: The Fix Wasn’t Technical

We didn’t upgrade the agent. We didn’t add more tools. We didn’t solve this with a better prompt.

We added a second agent, Timothy, with a completely different job.

The first agent’s job: execute. Follow the plan. Implement solutions. Move through the task list.

The second agent’s job: orchestrate. Set the tempo. Break deadlocks. Change strategy when the current one isn’t working. Keep the human out of low-level decisions that don’t need them.

The pattern:

  • Worker agent: technical execution, task-level decisions, implementation
  • Orchestrator agent: direction, momentum, strategy-level judgment

This is the operating model: execution separated from orchestration.

The orchestrator doesn’t need to know how to provision a subnet. It needs to know when the worker is stuck, what kind of stuck it is, and whether to push through, pivot, or escalate.

That’s a different skill set entirely. And it’s one almost nobody is talking about clearly.
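
Here’s a minimal sketch of the operating model as a control loop. The names, states, and simulated worker are hypothetical; this illustrates the division of labor, not the agents’ actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    notes: list[str] = field(default_factory=list)  # decisions the orchestrator made

class Worker:
    """Stand-in for the execution agent; reports its state after each run."""
    def __init__(self, outcomes):
        self._outcomes = iter(outcomes)
    def run(self, task: Task) -> str:
        return next(self._outcomes, "done")

def orchestrate(worker: Worker, task: Task, max_rounds: int = 5) -> str:
    """Hypothetical orchestrator loop: classify the kind of stuck, then
    push through, pivot, or escalate. Humans only see real escalations."""
    for _ in range(max_rounds):
        state = worker.run(task)               # execute until done or blocked
        if state == "done":
            return "delivered"
        if state == "stuck_ambiguity":         # two valid paths: pick one, keep moving
            task.notes.append("decision: take the default path, checkpoint first")
        elif state == "stuck_environment":     # broken path: change the problem
            task.notes.append("pivot: route around the failure from outside")
        else:                                  # unclassified blocker: shrink scope, retry
            task.notes.append("narrow the task and retry")
    return "escalate"                          # only now does a human get pulled in

# Simulated run: two blockers, then completion, zero human round-trips.
print(orchestrate(Worker(["stuck_ambiguity", "stuck_environment", "done"]),
                  Task("deploy EBS on OCI")))
```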


Part 4: The Moment Everything Changed

The clearest example: the worker agent spent hours trying to repair a configuration failure the “correct” way.

Standard tools weren’t present. Scripts failed. Each fix attempt exposed another layer of the problem. The agent kept trying because it was, by every measure, doing the right thing. It just wasn’t working.

The orchestrator stepped in and changed the problem:

Stop trying to fix the broken path.

Confirm the application works locally. Correct what needs correcting at the data layer. Route around the rest from the outside.

Three moves:

  • Verified the application was working at the local level
  • Made required corrections directly at the data layer
  • Let the reverse proxy handle external routing consistency

The system went live without ever fixing the original failure.
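
For concreteness, here’s what “route around it from the outside” can look like at the reverse proxy. This is a sketch only: the hostname, private address, and port are hypothetical, not the deployment’s actual configuration:

```nginx
# Hypothetical Nginx reverse proxy: present one consistent external URL
# while the application tier stays private and the broken path stays internal.
server {
    listen 443 ssl;
    server_name ebs.example.com;

    ssl_certificate     /etc/letsencrypt/live/ebs.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ebs.example.com/privkey.pem;

    location / {
        proxy_pass http://10.0.2.15:8000;          # private app-tier address
        proxy_set_header Host $host;               # keep the external hostname consistent
        proxy_set_header X-Forwarded-Proto https;  # so the app generates HTTPS URLs
        proxy_redirect http://10.0.2.15:8000/ https://ebs.example.com/;
    }
}
```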

That’s the difference between tool execution and system-level thinking. The worker was executing correctly. The orchestrator changed what “correct” meant.


Part 5: What the Deployment Produced

6 days to stable delivery · 0 manual maintenance steps · 2 agents, one outcome

  • WAF in front of a public load balancer
  • Private application tier with no direct exposure
  • Reverse proxy handling external routing and URL consistency
  • Automated certificate management
  • Recovery backup in place
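
On the certificate side, “zero manual maintenance” usually reduces to a scheduled renewal on the certbot runner. A minimal sketch, assuming Nginx terminates TLS; the schedule and hook are illustrative, not the deployment’s actual configuration:

```
# Hypothetical cron entry: attempt renewal twice a day. certbot is a no-op
# unless a certificate is near expiry; on success, reload the proxy.
0 3,15 * * * certbot renew --quiet --deploy-hook "systemctl reload nginx"
```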

More importantly: the model now repeats. That’s the real deliverable.


Part 6: What Did It Actually Cost?

Six days sounds like a long time. But the actual human investment looked nothing like six days of work.

How the time actually broke down:
Most of the “work” happened in Discord — short bursts from a phone while managing the farm, during downtime on client projects, at night before bed. The total time requiring active human attention was roughly 2–3 hours spread across six days. The AI worked continuously in between.

Here’s what the real costs looked like:

Line item                | Details                                                                                                        | Estimated cost
AI compute (both agents) | 906 messages, 6 days of execution, monitoring loops, document generation (Claude API across Jethro + Timothy)  | ~$150–200
OCI infrastructure       | EBS VM (2 OCPU / 32 GB, 6 days), load balancer, certbot runner, boot volume backup                             | ~$60–80
Human time               | ~2–3 hours across 6 days: phone approvals, quick answers, direction-setting                                    | Internal
Total out-of-pocket      |                                                                                                                | ~$210–280

Now compare that to the traditional path:

This approach: $210–280 total, including AI compute and OCI infrastructure for 6 days.

Traditional Oracle consultant: $3,600–12,000+, at $150–300/hr for a 2–5 day engagement. Plus scheduling lead time. Plus availability constraints.

The AI-assisted deployment cost roughly 2–8% of what a traditional consulting engagement would have cost, and it ran continuously while the human did other things.

The real cost wasn’t money. It was the learning curve: figuring out that a solo agent without an orchestration layer is significantly less effective than a structured two-agent model. That lesson cost some extra days on this project. It’s free on the next one.


Takeaway: The Model Matters More Than the Agent

The question isn’t whether AI can do the work.

The question is whether it can finish it.

What this deployment forced us to confront was simpler and more important:

Do you have the structure to make AI deliver reliably on work that actually matters?

What we observed:

  • A skilled worker without direction stalls at ambiguity
  • A system without accountability drifts toward asking instead of acting
  • An orchestration layer converts raw capability into sustained momentum

This isn’t about AI replacing engineers. It’s about changing how work actually gets delivered.

AI workers, guided by AI or human orchestration, can deliver outcomes that neither could produce alone.

AI doesn’t eliminate operations problems. It amplifies your operating model.

Weak model — things break faster.
Strong model — things move faster.

What we learned, the hard way, on a real deployment: the model matters more than the agent.


We’re actively building and refining this pattern. If you’re trying to move from AI demos to real delivery systems — and you’ve run into this same wall — we’d like to compare notes.

David Norton Consulting
