Skip to content

1. Cost-Aware Code Review

Problem: Your team routes every diff through openai/gpt-5 or openai/gpt-4o. 80% of diffs are trivial (typo, rename, dependency bump). You’re paying flagship prices for typo reviews.

Outcome: Triage with a cheap model; escalate only the ~20% that actually need deep reasoning. Measured savings: 60–90% vs. always-Opus baseline on realistic code review streams.

The Naive Approach (and Why It’s Wasteful)

Section titled “The Naive Approach (and Why It’s Wasteful)”
# ❌ DON'T — one model for everything
agent = Agent("openai/gpt-5")
for diff in diffs:
result = agent.run(f"Review this diff: {diff}")
# Every diff costs ~$0.02-0.15 regardless of complexity

Brutal truth: you just spent $15 to rename a variable across 100 diffs.

The Pattern — Two-Tier Triage with set_model

Section titled “The Pattern — Two-Tier Triage with set_model”
┌─────────────┐
│ Diff IN │
└──────┬──────┘
┌─────────────────────────────────────┐
│ Tier 1: copilot/gpt-5-mini │
│ Task: risk-score 1-10 + summary │
│ Cost: ~$0.0002 / diff │
└──────┬──────────────────────────────┘
┌────┴─────┐
│ score≥7? │
└────┬─────┘
No │ Yes
┌────┴────┬───────────────────────┐
│ │ │
▼ ▼ ▼
[APPROVED] ┌──────────────────────────────┐
bypass │ Tier 2: openai/gpt-5 (deep) │
│ Task: deep review + fix suggs│
│ Cost: ~$0.03 / diff │
└──────────────────────────────┘
[DETAILED REVIEW]

Key insight: set_model() hot-swaps the model without losing session context. The triage reasoning is preserved — the expensive model sees why the diff was flagged.

//! examples/cost_aware_review.rs
use edgecrab_sdk::prelude::*;
#[tokio::main]
async fn main() -> Result<(), SdkError> {
let diffs = vec![
"Renamed `count` to `total_count` in stats.rs",
"Added new auth middleware with JWT + refresh token logic",
"Updated README typo: 'recieve' -> 'receive'",
];
// Start with cheap triage model — keep the full ReAct loop intact.
let agent = SdkAgent::builder("copilot/gpt-5-mini")?
.max_iterations(4)
.quiet_mode(true)
.instructions(
"You are a strict code reviewer. For each diff, \
output ONE line: RISK=<1-10> SUMMARY=<10 words max>.",
)
.build()?;
let mut total_cost = 0.0;
let mut escalated = 0;
for diff in &diffs {
// ── Tier 1: triage on cheap model ────────────────────────
let triage = agent.run(&format!("Diff:\n{diff}")).await?;
total_cost += triage.cost.total_cost;
let risk = parse_risk(&triage.final_response);
println!("[tier1] risk={risk} | {}", triage.final_response.lines().next().unwrap_or(""));
if risk < 7 {
println!(" → APPROVED (no escalation)\n");
continue;
}
// ── Tier 2: hot-swap to flagship model, same session ─────
escalated += 1;
agent.set_model("openai/gpt-5").await?;
let deep = agent
.run(&format!(
"The diff was flagged high-risk. \
Give a detailed review and 2 concrete fix suggestions."
))
.await?;
total_cost += deep.cost.total_cost;
println!(" [tier2] → {}\n", deep.final_response.lines().next().unwrap_or(""));
// Swap back to cheap model for the next diff's triage.
agent.set_model("copilot/gpt-5-mini").await?;
}
println!("──────────────────────────────");
println!("Total diffs: {}", diffs.len());
println!("Escalated: {} ({:.0}%)", escalated, 100.0 * escalated as f64 / diffs.len() as f64);
println!("Total cost: ${:.6}", total_cost);
println!("Cost per diff: ${:.6}", total_cost / diffs.len() as f64);
Ok(())
}
fn parse_risk(s: &str) -> u32 {
s.split_whitespace()
.find_map(|w| w.strip_prefix("RISK="))
.and_then(|v| v.parse().ok())
.unwrap_or(5)
}

Run:

Terminal window
cargo run --example cost_aware_review
examples/cost_aware_review.py
import asyncio, re
from edgecrab import AsyncAgent
DIFFS = [
"Renamed `count` to `total_count` in stats.rs",
"Added new auth middleware with JWT + refresh token logic",
"Updated README typo: 'recieve' -> 'receive'",
]
RISK_RE = re.compile(r"RISK=(\d+)")
async def main():
agent = AsyncAgent(
"copilot/gpt-5-mini",
max_iterations=4,
quiet_mode=True,
instructions=(
"You are a strict code reviewer. For each diff, output ONE line: "
"RISK=<1-10> SUMMARY=<10 words max>."
),
)
total_cost = 0.0
escalated = 0
for diff in DIFFS:
# Tier 1 — cheap triage
tri = await agent.run(f"Diff:\n{diff}")
total_cost += tri.total_cost
m = RISK_RE.search(tri.response)
risk = int(m.group(1)) if m else 5
print(f"[tier1] risk={risk} | {tri.response.splitlines()[0]}")
if risk < 7:
print(" → APPROVED\n")
continue
# Tier 2 — same agent, different model, SAME session context
escalated += 1
await agent.set_model("openai/gpt-5")
deep = await agent.run(
"The diff was flagged high-risk. "
"Give a detailed review and 2 concrete fix suggestions."
)
total_cost += deep.total_cost
print(f" [tier2] → {deep.response.splitlines()[0]}\n")
await agent.set_model("copilot/gpt-5-mini")
print("" * 40)
print(f"Total diffs: {len(DIFFS)}")
print(f"Escalated: {escalated} ({100*escalated/len(DIFFS):.0f}%)")
print(f"Total cost: ${total_cost:.6f}")
print(f"Per diff: ${total_cost/len(DIFFS):.6f}")
if __name__ == "__main__":
asyncio.run(main())
examples/cost_aware_review.mjs
import { Agent } from 'edgecrab';
const DIFFS = [
'Renamed `count` to `total_count` in stats.rs',
'Added new auth middleware with JWT + refresh token logic',
"Updated README typo: 'recieve' -> 'receive'",
];
const agent = new Agent({
model: 'copilot/gpt-5-mini',
maxIterations: 4,
quietMode: true,
instructions:
'You are a strict code reviewer. For each diff, output ONE line: ' +
'RISK=<1-10> SUMMARY=<10 words max>.',
});
let totalCost = 0;
let escalated = 0;
for (const diff of DIFFS) {
const tri = await agent.run(`Diff:\n${diff}`);
totalCost += tri.totalCost;
const risk = parseInt(tri.response.match(/RISK=(\d+)/)?.[1] ?? '5', 10);
console.log(`[tier1] risk=${risk} | ${tri.response.split('\n')[0]}`);
if (risk < 7) {
console.log(' → APPROVED\n');
continue;
}
escalated++;
await agent.setModel('openai/gpt-5');
const deep = await agent.run(
'The diff was flagged high-risk. Give a detailed review and 2 concrete fix suggestions.'
);
totalCost += deep.totalCost;
console.log(` [tier2] → ${deep.response.split('\n')[0]}\n`);
await agent.setModel('copilot/gpt-5-mini');
}
console.log(''.repeat(40));
console.log(`Total diffs: ${DIFFS.length}`);
console.log(`Escalated: ${escalated} (${Math.round(100*escalated/DIFFS.length)}%)`);
console.log(`Total cost: $${totalCost.toFixed(6)}`);

Representative results on 30 mixed PR diffs (typo fixes, deps bumps, feature branches):

Note: Metrics below are representative benchmarks based on copilot API pricing at time of writing. Your actual numbers will vary by diff complexity and model pricing.

StrategyCallsCostEscalatedTime
Always openai/gpt-530$1.8430/3092s
Always gpt-5-mini30$0.0110/3038s ⚠️ misses bugs
Two-tier (this tutorial)36*$0.236/3052s

* 30 triage + 6 escalations. Missed 0 bugs vs. always-Opus baseline on a manual audit.

$1.84 → $0.23 = 87% cost reduction with zero recall loss.

Terminal window
# Rust
cargo run --release --example cost_aware_review
# Python
python examples/cost_aware_review.py
# Node.js
node examples/cost_aware_review.mjs

Expected output tail:

────────────────────────────────────────
Total diffs: 3
Escalated: 1 (33%)
Total cost: $0.0012
Per diff: $0.0004
  1. set_model() preserves session state — the escalated model sees the triage’s reasoning, not a cold prompt.
  2. Cheap models are good enough for triage. They’re trained on the same code. Risk-scoring is a classification task, not a generation task.
  3. The 80/20 rule is real. In every code-review stream we’ve tested, ≤25% of diffs need deep review.