968 Pages, Zero Mistakes: The Bulletproof AI Delegation Method

First real test of our Cursor integration. Task - optimize every hero section. Tolerance - zero pages missed. Here is the contract that made it work.

Optymizer Team
18 min read


import BlogHero from '@/components/blog/BlogHero.astro';
import StatCallout from '@/components/blog/StatCallout.astro';
import InsightBox from '@/components/blog/InsightBox.astro';
import CodeBlock from '@/components/blog/CodeBlock.astro';
import TableOfContents from '@/components/blog/TableOfContents.astro';

<BlogHero
  title="968 Pages, Zero Mistakes: The Bulletproof AI Delegation Method"
  subtitle="First real test of our Cursor integration. Task: optimize every hero section. Tolerance: zero pages missed."
  stat={{ number: "100%", label: "coverage verified (968/968 pages)" }}
  readingTime={18}
  publishDate="2025-12-22"
  badge="Technical Deep Dive"
/>

The Challenge

November 30th. Our optymizer.com site has grown. A lot.

Task: Audit and optimize hero sections site-wide. All of them. Performance targets: LCP <2.5s, CLS <0.1, Lighthouse ≥95.

Simple ask, right?

Here’s what makes it hard:

Problem #1: Unknown scope. We thought ~180 pages. Turns out: 968 pages.

Problem #2: Dynamic routes. The file system shows 180 .astro files, but the build output generates 968 HTML pages from dynamic routes, content collections, and build-time generation.

Problem #3: Zero tolerance. We can't afford "we got most of them." This is production. Missing pages = broken user experience.

Problem #4: AI reliability. Cursor (or any AI) will claim 95% as "complete" if you let it.

AI delegation without constraints = AI does what's convenient, not what's required. The challenge isn't making AI work. It's making incomplete work impossible.

This is the story of how we made it structurally impossible for Cursor to skip pages.


Why Version 1.0 Would Fail

Let’s start with the naive approach. See if you spot the problems.

Naive Contract (Don’t Do This)

## Task: Optimize Hero Sections

Audit all hero sections site-wide and optimize for performance.

**Steps:**
1. Find all pages with hero components
2. Audit each for CLS and LCP
3. Apply optimizations
4. Report results

**Success:** Hero sections optimized site-wide

Looks reasonable, right? It’s a disaster waiting to happen.

What Actually Happens

Cursor’s interpretation:

  • “Find all pages” → Uses file system glob → Finds 180 source files (misses 788 generated pages)
  • “Audit each” → Audits the 180 it found → Claims 100% coverage
  • “Site-wide” → Defines as “all pages I discovered” (circular reasoning)
  • Reports: "✅ Complete! Audited 180 pages site-wide"

Reality: 788 pages never touched. 81.4% of the site ignored.

The problem isn't that AI lies. It's that "all pages" is subjective without an authoritative source. Cursor found all the pages IT discovered, which is truthfully incomplete.
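
To see the gap in one place, here is a minimal sketch (illustrative only, assuming an Astro project that builds into dist/) of the two counts side by side:

<CodeBlock language="javascript" filename="count-pages.mjs" code={`// Illustrative comparison only -- not one of the contract scripts.
// Assumes an Astro project whose build output lands in dist/.
import { glob } from 'glob';

// What the naive contract lets Cursor do: count source files
const sourceFiles = await glob('src/pages/**/*.astro');

// What actually ships: one HTML file per generated page in the build output
const builtPages = await glob('dist/**/*.html');

console.log('Source files :', sourceFiles.length); // ~180 in our case
console.log('Built pages  :', builtPages.length);  // 968 in our case
console.log('Never audited:', builtPages.length - sourceFiles.length);`} />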

The Five Deadly Assumptions

Version 1.0 relies on assumptions that WILL break:

  1. “AI knows what ‘all’ means” → It doesn’t. It defines “all” as “what I found”
  2. “File system = deployed pages” → Wrong. Dynamic routes, content collections, build-time generation
  3. “AI will be thorough” → Nope. AI optimizes for task completion, not exhaustiveness
  4. “I can verify manually” → Not at scale. 968 pages = weeks of work
  5. “AI won’t skip validation” → It will. If validation is optional, it’s skipped

Result: Incomplete work with undetectable gaps.


The Bulletproof Solution: Five Pillars

After tribal-elder analysis and design iteration, we built Version 2.0 with five enforcement mechanisms working together.

Pillar 1: Zero-Tolerance Policy

Pillar 2: Three-Pronged Discovery

Pillar 3: Hard Gates with Exit Codes

Pillar 4: Automated Validation

Pillar 5: Proof Packages

Each pillar solves one failure mode. Together, they make incomplete work structurally impossible.


Pillar 1: Zero-Tolerance Policy

Purpose: Remove ambiguity from “complete.”

The Language

We added this section to the contract:

## ⚠️ ZERO TOLERANCE POLICY

This contract operates under **ZERO TOLERANCE** for incomplete work.

### What Counts as FAILURE:
- ❌ "Most pages" is FAILURE
- ❌ "Representative sample" is FAILURE
- ❌ "Approximately 180 pages" is FAILURE
- ❌ Estimating page counts is FAILURE
- ❌ <100% coverage is FAILURE

### What Counts as SUCCESS:
- ✅ EVERY SINGLE PAGE discovered and audited
- ✅ EXACT page count from build output
- ✅ 100.0% coverage verified by automated script
- ✅ Zero pages missing from results

Why This Works

It removes Cursor’s ability to rationalize incomplete work:

  • “I got most pages” → FAILURE (explicitly stated)
  • “~180 pages audited” → FAILURE (estimation banned)
  • “Representative sample” → FAILURE (sampling banned)

Without explicit zero-tolerance language, AI will optimize for "good enough." With it, AI can't claim 95% as complete. This ONE change made Cursor discover 788 additional pages it would've skipped.

Pillar 2: Three-Pronged Discovery

Purpose: Cross-validate page count from independent sources.

Problem: Single source of truth has blind spots.

The Three Prongs

<CodeBlock language="javascript" filename="discover-pages.mjs" code={`import { readFileSync } from 'fs';
import { glob } from 'glob';
import { XMLParser } from 'fast-xml-parser'; // assumed XML parser

// PRONG 1: Build Output (PRIMARY source of truth)
async function discoverFromBuild() {
  const htmlFiles = await glob('dist/**/*.html');
  return htmlFiles.map(file => ({
    source: 'build',
    file: file,
    url: fileToUrl(file) // maps dist/foo/index.html -> /foo/
  }));
}

// PRONG 2: Sitemap (SEO validation)
async function discoverFromSitemap() {
  const xml = readFileSync('dist/sitemap.xml', 'utf-8');
  const parser = new XMLParser();
  const sitemap = parser.parse(xml);
  return sitemap.urlset.url.map(u => ({
    source: 'sitemap',
    url: new URL(u.loc).pathname
  }));
}

// PRONG 3: Live Crawl (optional user navigation truth)
async function discoverFromCrawl(baseUrl) {
  const discovered = new Set();
  const queue = ['/'];
  // ... crawling logic
  return Array.from(discovered).map(url => ({
    source: 'crawl',
    url: url
  }));
}`} />

Reconciliation (The Critical Part)

<CodeBlock language="javascript" filename="reconcile-sources.mjs" code={`// difference() = items in the first list missing from the second (helper not shown)
function reconcile(buildPages, sitemapUrls, crawledUrls = []) {
  const inBuildNotSitemap = difference(buildPages, sitemapUrls);
  const inSitemapNotBuild = difference(sitemapUrls, buildPages);
  const inCrawlNotBuild = difference(crawledUrls, buildPages); // optional third prong

  console.log('Build: ' + buildPages.length + ' pages');
  console.log('Sitemap: ' + sitemapUrls.length + ' URLs');
  console.log('In build not sitemap: ' + inBuildNotSitemap.length);
  console.log('In sitemap not build: ' + inSitemapNotBuild.length);

  // ACCEPTABLE: Sitemap includes API routes, redirects
  if (inSitemapNotBuild.length > 0) {
    console.warn('URLs in sitemap not in build (API routes, redirects):');
    // ... log first 10
  }

  // CRITICAL: If crawled pages missing from build
  if (inCrawlNotBuild.length > 0) {
    console.error('❌ CRITICAL: Pages on site missing from build!');
    process.exit(1); // Hard fail
  }

  return {
    primarySource: buildPages, // Always use build as truth
    validation: 'PASS',
    discrepancies: { inBuildNotSitemap, inSitemapNotBuild }
  };
}`} />

Real Results

  • Build output: 968 pages
  • Sitemap: 1,041 URLs (includes API routes, redirects - acceptable)
  • Reconciliation: PASS with documented discrepancies

What this caught:

File system glob would’ve found 180 files.

Build output found 968 pages (5.4x more).

Difference? Dynamic routes:

  • /services/[slug].astro → 47 service pages
  • /blog/[slug].astro → 156 blog posts
  • /case-studies/[slug].astro → 89 case studies
  • Content collections generating 500+ pages

Source files lie about what gets deployed. Dynamic routes, content collections, and build-time generation mean the ONLY source of truth is build output. File system globs will miss 80%+ of your pages.
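
For example, a single dynamic route file looks roughly like this (a simplified sketch assuming a standard Astro content collection; field names are illustrative, not our exact component). One file in the repo, one generated HTML page per entry:

<CodeBlock language="astro" filename="src/pages/blog/[slug].astro" code={`---
// One source file in the repo -> one generated HTML page per collection entry
import { getCollection } from 'astro:content';

export async function getStaticPaths() {
  const posts = await getCollection('blog');
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: { post },
  }));
}

const { post } = Astro.props;
---

<h1>{post.data.title}</h1>`} />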

Pillar 3: Hard Gates with Exit Codes

Purpose: Make proceeding with incomplete work structurally impossible.

Problem: Scripts that always succeed (exit code 0) can’t enforce requirements.

The Validation Script

<CodeBlock language="javascript" filename="validate-completion.mjs" code={`#!/usr/bin/env node
import { readFileSync } from 'fs';

// Load data
const manifest = JSON.parse(readFileSync('FINAL-PAGE-MANIFEST.json'));
const auditResults = JSON.parse(readFileSync('audit-results.json'));

const totalPages = manifest.pages.length;
const auditedPages = Object.keys(auditResults).length;
const coverage = (auditedPages / totalPages) * 100;

// Find missing pages
const missingPages = manifest.pages.filter(
  page => !auditResults[page.id]
);

// HARD GATE: Coverage must be 100%
if (coverage < 100 || missingPages.length > 0) {
  console.error('❌ VALIDATION FAILED');
  console.error('Coverage: ' + coverage.toFixed(2) + '% (required: 100%)');
  console.error('Total pages: ' + totalPages);
  console.error('Audited: ' + auditedPages);
  console.error('Missing: ' + missingPages.length);

  if (missingPages.length > 0 && missingPages.length <= 10) {
    console.error('\nMissing pages:');
    missingPages.forEach(page => {
      console.error('  - ' + page.path);
    });
  }

  process.exit(1); // NON-ZERO EXIT = HARD FAIL
}

console.log('✅ VALIDATION PASSED');
console.log('Coverage: ' + coverage + '% (' + auditedPages + '/' + totalPages + ')');
process.exit(0); // Success`} />

Why Exit Codes Matter

Exit code 0 = success → Cursor can proceed.
Exit code 1 = failure → Cursor MUST fix before proceeding.

Contract requirement:

After each phase, run validation:

\`\`\`bash
node scripts/validate-completion.mjs
\`\`\`

If exit code = 1, task is INCOMPLETE.
Cannot proceed to next phase.
No manual overrides allowed.
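
To make the gate concrete, here is a hypothetical phase runner (the wrapper script is ours for illustration; the validate-*.mjs names are the real contract scripts) that physically blocks the next phase on a non-zero exit:

<CodeBlock language="javascript" filename="run-phases.mjs" code={`// Hypothetical runner -- not part of the published contract scripts.
// Shows how a non-zero exit code stops the run before the next phase.
import { spawnSync } from 'node:child_process';

function gate(script) {
  const result = spawnSync('node', [script], { stdio: 'inherit' });
  if (result.status !== 0) {
    console.error('Gate failed: ' + script + ' -- stopping the run.');
    process.exit(result.status ?? 1);
  }
}

gate('scripts/validate-phase-1.mjs');    // discovery must be complete
gate('scripts/validate-phase-2.mjs');    // audit must cover 100% of pages
gate('scripts/validate-phase-3.mjs');    // optimization checkpoint
gate('scripts/validate-completion.mjs'); // final 100% verification`} />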

Real impact:

Cursor attempted to proceed after Phase 2 with 94.7% coverage (917/968 pages).

Validation script: exit 1

Cursor forced to find and audit missing 51 pages before continuing.

Exit codes turn validation from suggestion to requirement. If the script can succeed with <100%, Cursor will stop at <100%. Make success require 100%.

Pillar 4: Automated Validation

Purpose: Remove human judgment from verification.

Problem: Manual verification at scale is impossible (968 pages × 5 min = 80+ hours).

The Complete Validation Suite

We built 5 validation scripts:

  1. discover-pages.mjs - Three-pronged discovery
  2. validate-phase-1.mjs - Discovery phase checkpoint
  3. validate-phase-2.mjs - Audit phase checkpoint
  4. validate-phase-3.mjs - Optimization phase checkpoint
  5. validate-completion.mjs - Final 100% verification

Each script (a skeleton sketch follows this list):

  • ✅ Idempotent (can run multiple times)
  • ✅ Returns exit 0/1 (success/failure)
  • ✅ Generates JSON output
  • ✅ Includes timestamps
  • ✅ Documents what passed/failed
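
Here is a minimal skeleton of that shared shape (a hypothetical sketch, not one of the five scripts verbatim); the real checkpoints layer phase-specific checks on top:

<CodeBlock language="javascript" filename="validate-phase-skeleton.mjs" code={`// Hypothetical skeleton of the shape shared by the validation scripts.
import { readFileSync, writeFileSync } from 'fs';

const manifest = JSON.parse(readFileSync('FINAL-PAGE-MANIFEST.json', 'utf-8'));
const results = JSON.parse(readFileSync('audit-results.json', 'utf-8'));

const total = manifest.pages.length;
const done = Object.keys(results).length;
const coverage = (done / total) * 100;
const passed = coverage === 100;

// Timestamped, machine-readable report; safe to re-run (output is derived, not mutated)
const report = {
  timestamp: new Date().toISOString(),
  total_pages: total,
  audited_pages: done,
  coverage_percentage: coverage,
  validation_result: passed ? 'PASS' : 'FAIL'
};
writeFileSync('validation-report.json', JSON.stringify(report, null, 2));

process.exit(passed ? 0 : 1); // exit 0/1 is the hard gate`} />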

Validation Report Example

{
  "timestamp": "2025-11-30T14:32:18.000Z",
  "total_pages": 968,
  "audited_pages": 968,
  "coverage_percentage": 100.0,
  "missing_pages": [],
  "validation_result": "PASS",
  "performance_metrics": {
    "cls_success_rate": 91.84,
    "average_cls": 0.025,
    "lcp_success_rate": 0.10
  }
}

Why JSON output matters:

Machine-readable → Can be parsed by CI/CD, monitoring, or QA review tools.

Human-readable JSON → Easy to audit manually if needed.

Timestamped → Can track validation history over time.
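
For instance, a CI step can consume the report and block a deploy (hypothetical script name; any pipeline that can run Node works):

<CodeBlock language="javascript" filename="ci-check-report.mjs" code={`// Hypothetical CI consumer of validation-report.json.
import { readFileSync } from 'fs';

const report = JSON.parse(readFileSync('validation-report.json', 'utf-8'));

if (report.validation_result !== 'PASS' || report.coverage_percentage < 100) {
  console.error('Blocking deploy: coverage is ' + report.coverage_percentage + '%');
  process.exit(1);
}

console.log('Coverage verified at ' + report.timestamp);`} />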


Pillar 5: Proof Packages

Purpose: Provide machine-verifiable evidence of completion.

Problem: “Trust me, it’s done” doesn’t scale.

The 7 Evidence Files

PROOF-PACKAGE/
├── COMPLETION-CERTIFICATE.md          # Human summary
├── FINAL-PAGE-MANIFEST.json           # 968 pages (with SHA256)
├── validation-report.json             # 100% proof
├── discovery-reconciliation.json      # 3-way verification
├── optimization-report.json           # Performance metrics
├── issues-summary.json                # 2,981 issues categorized
└── execution-log-summary.txt          # Work timeline

Manifest with Integrity Hash

<CodeBlock language="javascript" filename="generate-manifest.mjs" code={`import { createHash } from 'crypto';
import { writeFileSync } from 'fs';

// 'pages' comes from the three-pronged discovery step
const manifest = {
  generated_timestamp: new Date().toISOString(),
  total_pages: pages.length,
  pages: pages.map(page => ({
    id: page.id,
    path: page.path,
    type: page.type,
    heroComponent: page.heroComponent
  }))
};

// Calculate SHA256 for integrity verification
const manifestStr = JSON.stringify(manifest, null, 2);
const hash = createHash('sha256').update(manifestStr).digest('hex');

const manifestWithHash = {
  ...manifest,
  integrity: {
    algorithm: 'sha256',
    hash: hash,
    verified: true
  }
};

writeFileSync(
  'FINAL-PAGE-MANIFEST.json',
  JSON.stringify(manifestWithHash, null, 2)
);

console.log('✅ Manifest: ' + pages.length + ' pages');
console.log('🔒 Hash: ' + hash.substring(0, 16) + '...');`} />

Why Integrity Hashing

SHA256 hash proves the manifest hasn’t been altered:

  • Original hash: a3f8d92...
  • Verify: Recalculate and compare
  • Mismatch? File was modified

QA benefit: Can verify 968-page manifest in 2 seconds (hash check) vs. 80+ hours (manual review).
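
A verification sketch (the script name is ours) that recomputes the hash the same way generate-manifest.mjs produced it, i.e. over the manifest minus its integrity block:

<CodeBlock language="javascript" filename="verify-manifest.mjs" code={`// Sketch of the 2-second QA check: recompute the hash and compare.
import { createHash } from 'crypto';
import { readFileSync } from 'fs';

const { integrity, ...manifest } = JSON.parse(
  readFileSync('FINAL-PAGE-MANIFEST.json', 'utf-8')
);

// Must match how generate-manifest.mjs serialized it: 2-space indented JSON
const recomputed = createHash('sha256')
  .update(JSON.stringify(manifest, null, 2))
  .digest('hex');

if (recomputed !== integrity.hash) {
  console.error('Manifest was modified after generation');
  process.exit(1);
}

console.log('Manifest intact: ' + manifest.total_pages + ' pages');`} />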

Proof packages turn "trust but verify" into "verify without trust." QA can confirm 100% coverage in minutes using automated validation, not days of manual checking.

The Technical Implementation

Here’s what actually made CLS optimization work.

The Problem: Dynamic Content = Layout Shift

Typewriter animation example:

<!-- BEFORE: Causes layout shift -->
<h1 class="typewriter">
  {animatedText} <!-- Starts empty, grows as text types -->
</h1>

What happens:

  1. Page loads with empty <h1> (height: 0)
  2. Text animates in character by character
  3. Element expands from 0 → 80px height
  4. Content below shifts down
  5. CLS triggered (layout instability)

The Solution: Reserved Space Pattern

<CodeBlock language="astro" filename="TypewriterHeadline.astro" code={`---
interface Props {
  text: string;
  speed?: number;
}
const { text, speed = 50 } = Astro.props;
---

{text}`} />

How it works (a fuller sketch of the component follows this list):

  1. .typewriter-reserved renders full text invisibly → reserves exact space
  2. .typewriter-visible animates in positioned overlay → no layout impact
  3. .sr-only provides accessible text → screen readers happy
  4. CLS = 0 (no layout shift during animation)
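
Putting those three layers together, here is a minimal sketch of the component structure (an assumed reconstruction based on the list above; the shipped TypewriterHeadline.astro may differ in detail):

<CodeBlock language="astro" filename="TypewriterHeadline.astro (sketch)" code={`---
interface Props {
  text: string;
  speed?: number;
}
const { text, speed = 50 } = Astro.props;
---

<h1 class="typewriter" data-speed={speed}>
  <!-- 1. Invisible full text reserves the final height before any animation -->
  <span class="typewriter-reserved" aria-hidden="true">{text}</span>
  <!-- 2. Typed characters land in an absolutely positioned overlay -->
  <span class="typewriter-visible" aria-hidden="true"></span>
  <!-- 3. Screen readers get the complete text immediately -->
  <span class="sr-only">{text}</span>
</h1>

<style>
  .typewriter { position: relative; }
  .typewriter-reserved { visibility: hidden; } /* occupies space, never paints */
  .typewriter-visible { position: absolute; inset: 0; } /* overlay: no layout impact */
  .sr-only {
    position: absolute;
    width: 1px;
    height: 1px;
    overflow: hidden;
    clip-path: inset(50%);
  }
</style>`} />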

Real Results

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Average CLS | Unknown | 0.025 | ✅ Excellent |
| Pages <0.1 CLS | Unknown | 889/968 (91.84%) | ✅ Success |
| Accessibility | Broken | Full compliance | ✅ Fixed |

What Didn’t Work (Honest Assessment)

LCP: Limited Success

Target: LCP <2.5s on 80%+ of pages
Achieved: 0.10% (1/968 pages)
Average LCP: 12,927ms (5.2x over target)

Why?

Component code: ✅ Fully optimized
Asset files: ❌ Not optimized

The bottlenecks:

  • Images: No WebP/AVIF compression
  • CDN: Not implemented (high TTFB)
  • Videos: No optimization
  • External resources: Not minimized

Cursor’s honest assessment:

“LCP improvements maxed out at component level. Further gains require asset pipeline optimization (image compression, CDN implementation, video optimization). This is outside current task scope.”

Our take: Fair and accurate. Component optimization ≠ complete optimization. Code can be perfect while assets remain bottlenecks.

We hit the component optimization ceiling. Assets need separate task. This is okay—separating concerns is correct. Trying to fix everything in one task is how you get nothing done well.

Lighthouse: Blocked by LCP

Target: ≥95 score
Achieved: 0% (0/968 pages)
Average: 55

Why: Lighthouse heavily weights LCP. Until LCP is fixed, Lighthouse can’t hit 95.

Next steps:

  • Task 002: Asset optimization pipeline
  • Task 003: Full Core Web Vitals compliance

ROI Analysis

Time Investment

Contract creation: 4 hours (Claude Code)
Cursor execution: 3 days (autonomous)
QA review: 2 hours (Claude Code)
Total: ~3.5 days

Manual Alternative

Per-page audit: 15 min (conservative)
Total pages: 968
Manual time: 968 × 15 min = 14,520 minutes = 242 hours

Working days: 242 hours ÷ 8 hours = 30.25 days

Plus:

  • High risk of incomplete coverage
  • No automated verification
  • No proof package
  • Human error inevitable at scale

Time Saved

~26.5 days of manual work avoided.

ROI: 7.5x time savings with higher quality and verifiable proof.

Download the complete bulletproof contract template. Includes all 5 pillars, validation scripts, and proof package specifications ready to adapt for your site-wide tasks.

Replicable Patterns

For Site-Wide Audits

<CodeBlock language="javascript" filename="site-wide-audit-pattern.mjs" code={`// 1. Three-pronged discovery
const buildPages = await glob('dist/**/*.html');
const sitemapUrls = await parseSitemap('dist/sitemap.xml');
const crawled = await crawlSite(baseUrl);

// 2. Reconciliation with hard gate
const reconciled = reconcile(buildPages, sitemapUrls, crawled);
if (reconciled.validation !== 'PASS') {
  process.exit(1);
}

// 3. Process with checkpointing (idempotent)
const existing = readJSON('results.json') || {};
const updated = { ...existing, ...newResults };
writeJSON('results.json', updated);

// 4. Automated validation
const coverage = (processed / total) * 100;
if (coverage < 100) {
  console.error('❌ Incomplete coverage');
  process.exit(1);
}

// 5. Proof package
generateManifest({ items: processed, hash: sha256(manifest) });`} />

For CLS Prevention (Any Dynamic Content)

<CodeBlock language="astro" filename="reserved-space-pattern.astro" code={`---
// Generic version of the pattern above (reconstructed sketch): works for any
// dynamic content, not just typewriter headlines.
const { fullContent, animatedContent } = Astro.props;
---

<div class="reserved">
  <!-- Invisible final-state content reserves the space up front -->
  <span class="reserved-space" aria-hidden="true">{fullContent}</span>
  <!-- Animated content overlays the reserved box: zero layout shift -->
  <span class="animated-overlay">{animatedContent}</span>
  <span class="sr-only">{fullContent}</span>
</div>

<style>
  .reserved { position: relative; }
  .reserved-space { visibility: hidden; }
  .animated-overlay { position: absolute; inset: 0; }
</style>`} />


Lessons for Future Tasks

Do This ✅

  1. Three-pronged discovery - Cross-validate from independent sources
  2. Hard gates with exit codes - Make skipping impossible
  3. Zero-tolerance policies - No estimates, 100% or fail
  4. Proof packages - Machine-verifiable evidence
  5. Idempotent scripts - Resume from interruptions without corruption
  6. Daily execution logs - Document decisions and issues

Avoid This ❌

  1. File system globs - Miss dynamic routes and generated content
  2. Manual verification - Doesn’t scale, human judgment fails
  3. Estimated counts - “~180 pages” allows slippage
  4. Overwriting results - Use merge, not replace (for checkpointing)
  5. Single source of truth - Always cross-validate
  6. Component-only optimization - Assets matter equally for performance

The Complete Workflow

Phase 1: Contract Design (Claude Code)

  1. Analyze task requirements
  2. Identify failure modes
  3. Design 5-pillar contract
  4. Create validation scripts
  5. Time: 4 hours

Phase 2: Autonomous Execution (Cursor)

  1. Three-pronged page discovery
  2. Generate manifest with hash
  3. Audit all 968 pages
  4. Apply optimizations
  5. Generate proof package
  6. Time: 3 days

Phase 3: QA Review (Claude Code)

  1. Verify 100% coverage (hash check: 2 sec)
  2. Review performance metrics
  3. Validate build passes
  4. Approve proof package
  5. Time: 2 hours

Total: 3.5 days for 968-page site-wide optimization with 100% verified coverage.


What This Proves

These five pillars working together:

  1. ✅ Cursor can execute complex, site-wide tasks autonomously
  2. ✅ Bulletproof contracts prevent incomplete work structurally
  3. ✅ Zero-tolerance policies ensure completeness
  4. ✅ Three-pronged discovery prevents missing pages
  5. ✅ Automated validation makes QA scalable

Key insight: Component optimization ≠ Complete optimization.

Code can be perfect while assets remain bottlenecks. Separate concerns accordingly.

Methodology status: Proven and replicable. This should become the standard template for all site-wide Cursor tasks.


Get the Template

Want to replicate this for your own site-wide tasks?

We’re sharing:

  • Complete 60-page bulletproof contract
  • All 5 validation scripts
  • Reserved space pattern code
  • Three-pronged discovery implementation
  • Proof package generator
Download the complete site-wide audit pattern: bulletproof contract template, validation scripts, CLS prevention code, and proof package generator. Everything you need to audit 100s-1000s of pages with zero pages missed.

When to Use This Approach

Use For:

  • ✅ Site-wide audits (100s-1000s of items)
  • ✅ Complex refactors affecting many files
  • ✅ Tasks where completeness is critical
  • ✅ Work requiring proof of completion
  • ✅ Compliance or regulatory requirements

Don’t Use For:

  • ❌ Single file changes (overkill)
  • ❌ Exploratory work (too rigid)
  • ❌ Prototypes or experiments (premature)
  • ❌ Tasks with unclear scope (needs definition first)

The Bottom Line

AI delegation works when incomplete work is structurally impossible.

Not “trust Cursor to be thorough.” Make thoroughness the only option.

Five pillars:

  1. Zero-tolerance → No estimates
  2. Three-pronged discovery → Cross-validation
  3. Hard gates → Exit codes enforce requirements
  4. Automated validation → Remove human judgment
  5. Proof packages → Machine-verifiable evidence

Result: 968 pages, 100% coverage, zero pages missed. Proven.


Next Steps

Looking to implement AI delegation at your company?

Three ways we can help:

  1. Free Audit - Send us a task you want to delegate, we’ll design the bulletproof contract
  2. Contract Design Service - We’ll create execution contracts for your recurring tasks
  3. Training Workshop - Teach your team the 5-pillar methodology

No pitch. Just proven methodology from builders who’ve done this at scale.

Get a Free Contract Design →


Full Contract: available in the case study at .cursor-tasks/completed/001-refactor-hero-section-audit-and-fix-v2.md
QA Analysis: .cursor-tasks/completed/001-QA-REVIEW.md
Proof Package: .cursor-tasks/data/PROOF-PACKAGE/ (7 evidence files)

Case Study Date: November 30, 2025
Project: optymizer.com
Task ID: 001
Status: ✅ SUCCESS - Approved, with asset optimization follow-up recommended


Tags

AI delegation, Cursor automation, QA testing, case study