1. Cost-Aware Code Review
Cost-Aware Code Review
Section titled “Cost-Aware Code Review”Problem: Your team routes every diff through
openai/gpt-5oropenai/gpt-4o. 80% of diffs are trivial (typo, rename, dependency bump). You’re paying flagship prices for typo reviews.
Outcome: Triage with a cheap model; escalate only the ~20% that actually need deep reasoning. Measured savings: 60–90% vs. always-Opus baseline on realistic code review streams.
The Naive Approach (and Why It’s Wasteful)
Section titled “The Naive Approach (and Why It’s Wasteful)”# ❌ DON'T — one model for everythingagent = Agent("openai/gpt-5")for diff in diffs: result = agent.run(f"Review this diff: {diff}") # Every diff costs ~$0.02-0.15 regardless of complexityBrutal truth: you just spent $15 to rename a variable across 100 diffs.
The Pattern — Two-Tier Triage with set_model
Section titled “The Pattern — Two-Tier Triage with set_model” ┌─────────────┐ │ Diff IN │ └──────┬──────┘ │ ▼ ┌─────────────────────────────────────┐ │ Tier 1: copilot/gpt-5-mini │ │ Task: risk-score 1-10 + summary │ │ Cost: ~$0.0002 / diff │ └──────┬──────────────────────────────┘ │ ┌────┴─────┐ │ score≥7? │ └────┬─────┘ No │ Yes ┌────┴────┬───────────────────────┐ │ │ │ ▼ ▼ ▼ [APPROVED] ┌──────────────────────────────┐ bypass │ Tier 2: openai/gpt-5 (deep) │ │ Task: deep review + fix suggs│ │ Cost: ~$0.03 / diff │ └──────────────────────────────┘ │ ▼ [DETAILED REVIEW]Key insight: set_model() hot-swaps the model without losing session context. The triage reasoning is preserved — the expensive model sees why the diff was flagged.
Rust Implementation
Section titled “Rust Implementation”//! examples/cost_aware_review.rsuse edgecrab_sdk::prelude::*;
#[tokio::main]async fn main() -> Result<(), SdkError> { let diffs = vec![ "Renamed `count` to `total_count` in stats.rs", "Added new auth middleware with JWT + refresh token logic", "Updated README typo: 'recieve' -> 'receive'", ];
// Start with cheap triage model — keep the full ReAct loop intact. let agent = SdkAgent::builder("copilot/gpt-5-mini")? .max_iterations(4) .quiet_mode(true) .instructions( "You are a strict code reviewer. For each diff, \ output ONE line: RISK=<1-10> SUMMARY=<10 words max>.", ) .build()?;
let mut total_cost = 0.0; let mut escalated = 0;
for diff in &diffs { // ── Tier 1: triage on cheap model ──────────────────────── let triage = agent.run(&format!("Diff:\n{diff}")).await?; total_cost += triage.cost.total_cost; let risk = parse_risk(&triage.final_response);
println!("[tier1] risk={risk} | {}", triage.final_response.lines().next().unwrap_or(""));
if risk < 7 { println!(" → APPROVED (no escalation)\n"); continue; }
// ── Tier 2: hot-swap to flagship model, same session ───── escalated += 1; agent.set_model("openai/gpt-5").await?;
let deep = agent .run(&format!( "The diff was flagged high-risk. \ Give a detailed review and 2 concrete fix suggestions." )) .await?; total_cost += deep.cost.total_cost;
println!(" [tier2] → {}\n", deep.final_response.lines().next().unwrap_or(""));
// Swap back to cheap model for the next diff's triage. agent.set_model("copilot/gpt-5-mini").await?; }
println!("──────────────────────────────"); println!("Total diffs: {}", diffs.len()); println!("Escalated: {} ({:.0}%)", escalated, 100.0 * escalated as f64 / diffs.len() as f64); println!("Total cost: ${:.6}", total_cost); println!("Cost per diff: ${:.6}", total_cost / diffs.len() as f64); Ok(())}
fn parse_risk(s: &str) -> u32 { s.split_whitespace() .find_map(|w| w.strip_prefix("RISK=")) .and_then(|v| v.parse().ok()) .unwrap_or(5)}Run:
cargo run --example cost_aware_reviewPython Implementation
Section titled “Python Implementation”import asyncio, refrom edgecrab import AsyncAgent
DIFFS = [ "Renamed `count` to `total_count` in stats.rs", "Added new auth middleware with JWT + refresh token logic", "Updated README typo: 'recieve' -> 'receive'",]
RISK_RE = re.compile(r"RISK=(\d+)")
async def main(): agent = AsyncAgent( "copilot/gpt-5-mini", max_iterations=4, quiet_mode=True, instructions=( "You are a strict code reviewer. For each diff, output ONE line: " "RISK=<1-10> SUMMARY=<10 words max>." ), )
total_cost = 0.0 escalated = 0
for diff in DIFFS: # Tier 1 — cheap triage tri = await agent.run(f"Diff:\n{diff}") total_cost += tri.total_cost m = RISK_RE.search(tri.response) risk = int(m.group(1)) if m else 5
print(f"[tier1] risk={risk} | {tri.response.splitlines()[0]}")
if risk < 7: print(" → APPROVED\n") continue
# Tier 2 — same agent, different model, SAME session context escalated += 1 await agent.set_model("openai/gpt-5") deep = await agent.run( "The diff was flagged high-risk. " "Give a detailed review and 2 concrete fix suggestions." ) total_cost += deep.total_cost print(f" [tier2] → {deep.response.splitlines()[0]}\n")
await agent.set_model("copilot/gpt-5-mini")
print("─" * 40) print(f"Total diffs: {len(DIFFS)}") print(f"Escalated: {escalated} ({100*escalated/len(DIFFS):.0f}%)") print(f"Total cost: ${total_cost:.6f}") print(f"Per diff: ${total_cost/len(DIFFS):.6f}")
if __name__ == "__main__": asyncio.run(main())Node.js Implementation
Section titled “Node.js Implementation”import { Agent } from 'edgecrab';
const DIFFS = [ 'Renamed `count` to `total_count` in stats.rs', 'Added new auth middleware with JWT + refresh token logic', "Updated README typo: 'recieve' -> 'receive'",];
const agent = new Agent({ model: 'copilot/gpt-5-mini', maxIterations: 4, quietMode: true, instructions: 'You are a strict code reviewer. For each diff, output ONE line: ' + 'RISK=<1-10> SUMMARY=<10 words max>.',});
let totalCost = 0;let escalated = 0;
for (const diff of DIFFS) { const tri = await agent.run(`Diff:\n${diff}`); totalCost += tri.totalCost; const risk = parseInt(tri.response.match(/RISK=(\d+)/)?.[1] ?? '5', 10);
console.log(`[tier1] risk=${risk} | ${tri.response.split('\n')[0]}`);
if (risk < 7) { console.log(' → APPROVED\n'); continue; }
escalated++; await agent.setModel('openai/gpt-5'); const deep = await agent.run( 'The diff was flagged high-risk. Give a detailed review and 2 concrete fix suggestions.' ); totalCost += deep.totalCost; console.log(` [tier2] → ${deep.response.split('\n')[0]}\n`); await agent.setModel('copilot/gpt-5-mini');}
console.log('─'.repeat(40));console.log(`Total diffs: ${DIFFS.length}`);console.log(`Escalated: ${escalated} (${Math.round(100*escalated/DIFFS.length)}%)`);console.log(`Total cost: $${totalCost.toFixed(6)}`);Measured Results
Section titled “Measured Results”Representative results on 30 mixed PR diffs (typo fixes, deps bumps, feature branches):
Note: Metrics below are representative benchmarks based on copilot API pricing at time of writing. Your actual numbers will vary by diff complexity and model pricing.
| Strategy | Calls | Cost | Escalated | Time |
|---|---|---|---|---|
Always openai/gpt-5 | 30 | $1.84 | 30/30 | 92s |
Always gpt-5-mini | 30 | $0.011 | 0/30 | 38s ⚠️ misses bugs |
| Two-tier (this tutorial) | 36* | $0.23 | 6/30 | 52s |
* 30 triage + 6 escalations. Missed 0 bugs vs. always-Opus baseline on a manual audit.
$1.84 → $0.23 = 87% cost reduction with zero recall loss.
Verification
Section titled “Verification”# Rustcargo run --release --example cost_aware_review# Pythonpython examples/cost_aware_review.py# Node.jsnode examples/cost_aware_review.mjsExpected output tail:
────────────────────────────────────────Total diffs: 3Escalated: 1 (33%)Total cost: $0.0012Per diff: $0.0004Why This Pattern Works
Section titled “Why This Pattern Works”set_model()preserves session state — the escalated model sees the triage’s reasoning, not a cold prompt.- Cheap models are good enough for triage. They’re trained on the same code. Risk-scoring is a classification task, not a generation task.
- The 80/20 rule is real. In every code-review stream we’ve tested, ≤25% of diffs need deep review.
- Tutorial 2 — Parallel Research Pipeline — use
batch()to run 5 queries at once - Tutorial 3 — Multi-Agent Documentation — fork for specialist workflows