import BlogHero from '@/components/blog/BlogHero.astro';
import StatCallout from '@/components/blog/StatCallout.astro';
import InsightBox from '@/components/blog/InsightBox.astro';
import CodeBlock from '@/components/blog/CodeBlock.astro';
import TableOfContents from '@/components/blog/TableOfContents.astro';
<BlogHero
  title="968 Pages, Zero Mistakes: The Bulletproof AI Delegation Method"
  subtitle="First real test of our Cursor integration. Task: optimize every hero section. Tolerance: zero pages missed."
  stat={{ number: "100%", label: "coverage verified (968/968 pages)" }}
  readingTime={18}
  publishDate="2025-12-22"
  badge="Technical Deep Dive"
/>
The Challenge
November 30th. Our optymizer.com site has grown. A lot.
Task: Audit and optimize hero sections site-wide. All of them. Performance targets: LCP <2.5s, CLS <0.1, Lighthouse ≥95.
Simple ask, right?
Here’s what makes it hard:
Problem #1: Unknown scope
We thought ~180 pages. Turns out: 968 pages.
Problem #2: Dynamic routes
File system shows 180 .astro files. Build output generates 968 HTML pages from dynamic routes, content collections, and build-time generation.
Problem #3: Zero tolerance
Can’t afford “we got most of them.” This is production. Missing pages = broken user experience.
Problem #4: AI reliability
Cursor (or any AI) will claim 95% as “complete” if you let it.
This is the story of how we made it structurally impossible for Cursor to skip pages.
Why Version 1.0 Would Fail
Let’s start with the naive approach. See if you spot the problems.
Naive Contract (Don’t Do This)
## Task: Optimize Hero Sections
Audit all hero sections site-wide and optimize for performance.
**Steps:**
1. Find all pages with hero components
2. Audit each for CLS and LCP
3. Apply optimizations
4. Report results
**Success:** Hero sections optimized site-wide
Looks reasonable, right? It’s a disaster waiting to happen.
What Actually Happens
Cursor’s interpretation:
- “Find all pages” → Uses file system glob → Finds 180 source files (misses 788 generated pages)
- “Audit each” → Audits the 180 it found → Claims 100% coverage
- “Site-wide” → Defines as “all pages I discovered” (circular reasoning)
- Reports: “✅ Complete! Audited 180 pages site-wide”
Reality: 788 pages never touched. 81.4% of the site ignored.
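You can see the gap for yourself by comparing a source glob against the build output. A minimal sketch, assuming the `glob` package and an Astro `dist/` build (the filename is hypothetical):

<CodeBlock language="javascript" filename="count-gap.mjs" code={`import { glob } from 'glob';

// Source files vs. built pages: the gap is every dynamically generated route
const sourceFiles = await glob('src/pages/**/*.astro'); // the ~180 source files
const builtPages = await glob('dist/**/*.html'); // the 968 deployed pages

console.log('Source files:', sourceFiles.length);
console.log('Built pages:', builtPages.length);`} />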
The Five Deadly Assumptions
Version 1.0 relies on assumptions that WILL break:
- “AI knows what ‘all’ means” → It doesn’t. It defines “all” as “what I found”
- “File system = deployed pages” → Wrong. Dynamic routes, content collections, build-time generation
- “AI will be thorough” → Nope. AI optimizes for task completion, not exhaustiveness
- “I can verify manually” → Not at scale. 968 pages = weeks of work
- “AI won’t skip validation” → It will. If validation is optional, it’s skipped
Result: Incomplete work with undetectable gaps.
The Bulletproof Solution: Five Pillars
After a “tribal elder” failure-mode review and several design iterations, we built Version 2.0 with five enforcement mechanisms working together.
Pillar 1: Zero-Tolerance Policy
Pillar 2: Three-Pronged Discovery
Pillar 3: Hard Gates with Exit Codes
Pillar 4: Automated Validation
Pillar 5: Proof Packages
Each pillar solves one failure mode. Together, they make incomplete work structurally impossible.
Pillar 1: Zero-Tolerance Policy
Purpose: Remove ambiguity from “complete.”
The Language
We added this section to the contract:
## ⚠️ ZERO TOLERANCE POLICY
This contract operates under **ZERO TOLERANCE** for incomplete work.
### What Counts as FAILURE:
- ❌ "Most pages" is FAILURE
- ❌ "Representative sample" is FAILURE
- ❌ "Approximately 180 pages" is FAILURE
- ❌ Estimating page counts is FAILURE
- ❌ <100% coverage is FAILURE
### What Counts as SUCCESS:
- ✅ EVERY SINGLE PAGE discovered and audited
- ✅ EXACT page count from build output
- ✅ 100.0% coverage verified by automated script
- ✅ Zero pages missing from results
Why This Works
It removes Cursor’s ability to rationalize incomplete work:
- “I got most pages” → FAILURE (explicitly stated)
- “~180 pages audited” → FAILURE (estimation banned)
- “Representative sample” → FAILURE (sampling banned)
Pillar 2: Three-Pronged Discovery
Purpose: Cross-validate page count from independent sources.
Problem: Single source of truth has blind spots.
The Three Prongs
<CodeBlock language="javascript" filename="discover-pages.mjs" code={`import { glob } from 'glob';
import { readFileSync } from 'node:fs';
import { XMLParser } from 'fast-xml-parser';

// PRONG 1: Build Output (PRIMARY source of truth)
async function discoverFromBuild() {
  const htmlFiles = await glob('dist/**/*.html');
  return htmlFiles.map((file) => ({
    source: 'build',
    file: file,
    url: fileToUrl(file), // project helper: maps dist/ paths to URLs (not shown)
  }));
}

// PRONG 2: Sitemap (SEO validation)
async function discoverFromSitemap() {
  const xml = readFileSync('dist/sitemap.xml', 'utf-8');
  const sitemap = new XMLParser().parse(xml);
  return sitemap.urlset.url.map((u) => ({
    source: 'sitemap',
    url: new URL(u.loc).pathname,
  }));
}

// PRONG 3: Live Crawl (optional user navigation truth)
async function discoverFromCrawl(baseUrl) {
  const discovered = new Set();
  const queue = ['/'];
  // … crawling logic (see the sketch below)
  return Array.from(discovered).map((url) => ({ source: 'crawl', url: url }));
}`} />
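The crawling logic is elided above. As a rough idea of what it could look like, here is a minimal same-origin BFS, assuming Node 18+ (built-in `fetch`) and a naive link regex instead of a real HTML parser:

<CodeBlock language="javascript" filename="crawl-sketch.mjs" code={`// Hypothetical fill-in for the elided crawl logic
async function crawl(baseUrl) {
  const discovered = new Set();
  const queue = ['/'];
  while (queue.length > 0) {
    const path = queue.shift();
    if (discovered.has(path)) continue;
    discovered.add(path);
    const res = await fetch(new URL(path, baseUrl));
    const html = await res.text();
    // Naive internal-link extraction; a production crawler should parse HTML properly
    for (const match of html.matchAll(/href="(\/[^"#?]*)"/g)) {
      if (!discovered.has(match[1])) queue.push(match[1]);
    }
  }
  return Array.from(discovered);
}`} />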
Reconciliation (The Critical Part)
<CodeBlock language="javascript" filename="reconcile-sources.mjs" code={`// Set difference on URL lists (items may be strings or { url } objects)
const difference = (a, b) => {
  const urls = new Set(b.map((x) => x.url ?? x));
  return a.filter((x) => !urls.has(x.url ?? x));
};

function reconcile(buildPages, sitemapUrls, crawledUrls) {
  const inBuildNotSitemap = difference(buildPages, sitemapUrls);
  const inSitemapNotBuild = difference(sitemapUrls, buildPages);
  const inCrawlNotBuild = difference(crawledUrls, buildPages);

  console.log(`Build: ${buildPages.length} pages`);
  console.log(`Sitemap: ${sitemapUrls.length} URLs`);
  console.log(`In build not sitemap: ${inBuildNotSitemap.length}`);
  console.log(`In sitemap not build: ${inSitemapNotBuild.length}`);

  // ACCEPTABLE: Sitemap includes API routes, redirects
  if (inSitemapNotBuild.length > 0) {
    console.warn('URLs in sitemap not in build (API routes, redirects):');
    // … log first 10
  }

  // CRITICAL: Pages reachable by crawl but missing from build
  if (inCrawlNotBuild.length > 0) {
    console.error('❌ CRITICAL: Pages on site missing from build!');
    process.exit(1); // Hard fail
  }

  return {
    primarySource: buildPages, // Always use build as truth
    validation: 'PASS',
    discrepancies: { inBuildNotSitemap, inSitemapNotBuild },
  };
}`} />
Real Results
- Build output: 968 pages
- Sitemap: 1,041 URLs (includes API routes, redirects - acceptable)
- Reconciliation: PASS with documented discrepancies
What this caught:
File system glob would’ve found 180 files.
Build output found 968 pages (5.4x more).
Difference? Dynamic routes:
- /services/[slug].astro → 47 service pages
- /blog/[slug].astro → 156 blog posts
- /case-studies/[slug].astro → 89 case studies
- Content collections generating 500+ pages
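For context, this is roughly how one dynamic route file fans out into many built pages in Astro (a simplified frontmatter sketch using the standard `getStaticPaths` and `getCollection` APIs; the collection name follows the example above):

<CodeBlock language="javascript" filename="blog/[slug].astro (frontmatter)" code={`import { getCollection } from 'astro:content';

// One source file; each collection entry becomes its own HTML page at build time
export async function getStaticPaths() {
  const posts = await getCollection('blog'); // e.g. 156 blog entries
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: { post },
  }));
}`} />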
Pillar 3: Hard Gates with Exit Codes
Purpose: Make proceeding with incomplete work structurally impossible.
Problem: Scripts that always succeed (exit code 0) can’t enforce requirements.
The Validation Script
<CodeBlock language="javascript" filename="validate-completion.mjs" code={`#!/usr/bin/env node
import { readFileSync } from 'fs';

// Load data
const manifest = JSON.parse(readFileSync('FINAL-PAGE-MANIFEST.json'));
const auditResults = JSON.parse(readFileSync('audit-results.json'));

const totalPages = manifest.pages.length;
const auditedPages = Object.keys(auditResults).length;
const coverage = (auditedPages / totalPages) * 100;

// Find missing pages
const missingPages = manifest.pages.filter(
  (page) => !auditResults[page.id]
);

// HARD GATE: Coverage must be 100%
if (coverage < 100 || missingPages.length > 0) {
  console.error('❌ VALIDATION FAILED');
  console.error(`Coverage: ${coverage.toFixed(2)}% (required: 100%)`);
  console.error(`Total pages: ${totalPages}`);
  console.error(`Audited: ${auditedPages}`);
  console.error(`Missing: ${missingPages.length}`);

  if (missingPages.length > 0 && missingPages.length <= 10) {
    console.error('\nMissing pages:');
    missingPages.forEach((page) => {
      console.error(`  - ${page.path}`);
    });
  }

  process.exit(1); // NON-ZERO EXIT = HARD FAIL
}

console.log('✅ VALIDATION PASSED');
console.log(`Coverage: ${coverage}% (${auditedPages}/${totalPages})`);
process.exit(0); // Success`} />
Why Exit Codes Matter
Exit code 0 = success → Cursor can proceed.
Exit code 1 = failure → Cursor MUST fix before proceeding.
Contract requirement:
After each phase, run validation:
```bash
node scripts/validate-completion.mjs
```
If exit code = 1, task is INCOMPLETE.
Cannot proceed to next phase.
No manual overrides allowed.
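The same gate works programmatically. A minimal sketch of a phase runner (the script name is hypothetical; `spawnSync` is Node's standard child-process API):

<CodeBlock language="javascript" filename="run-gate.mjs" code={`import { spawnSync } from 'node:child_process';

// Run the validator, inheriting its stdout/stderr
const validation = spawnSync('node', ['scripts/validate-completion.mjs'], {
  stdio: 'inherit',
});

// Any non-zero exit code stops the pipeline right here
if (validation.status !== 0) {
  console.error('Validation failed. Cannot proceed to the next phase.');
  process.exit(validation.status ?? 1);
}

// Only reached after a zero exit code
console.log('Gate passed. Continuing to the next phase.');`} />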
Real impact:
Cursor attempted to proceed after Phase 2 with 94.7% coverage (917/968 pages).
Validation script: exit 1
Cursor forced to find and audit missing 51 pages before continuing.
Pillar 4: Automated Validation
Purpose: Remove human judgment from verification.
Problem: Manual verification at scale is impossible (968 pages × 5 min = 80+ hours).
The Complete Validation Suite
We built 5 validation scripts:
- discover-pages.mjs - Three-pronged discovery
- validate-phase-1.mjs - Discovery phase checkpoint
- validate-phase-2.mjs - Audit phase checkpoint
- validate-phase-3.mjs - Optimization phase checkpoint
- validate-completion.mjs - Final 100% verification
Each script:
- ✅ Idempotent (can run multiple times)
- ✅ Returns exit 0/1 (success/failure)
- ✅ Generates JSON output
- ✅ Includes timestamps
- ✅ Documents what passed/failed
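A sketch of the shared shape all five scripts follow (the helper name and report fields here are illustrative, not the project's actual code):

<CodeBlock language="javascript" filename="validate-common.mjs" code={`import { writeFileSync } from 'node:fs';

// Illustrative helper: every validator funnels its checks through one exit path
export function finishValidation(name, checks) {
  const failed = checks.filter((check) => !check.passed);
  const report = {
    timestamp: new Date().toISOString(), // timestamped for history
    validator: name,
    checks, // documents exactly what passed/failed
    validation_result: failed.length === 0 ? 'PASS' : 'FAIL',
  };

  // Writing (not appending) the report keeps reruns idempotent
  writeFileSync(name + '-report.json', JSON.stringify(report, null, 2));

  process.exit(failed.length === 0 ? 0 : 1); // exit 0/1 = success/failure
}`} />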
Validation Report Example
{
"timestamp": "2025-11-30T14:32:18.000Z",
"total_pages": 968,
"audited_pages": 968,
"coverage_percentage": 100.0,
"missing_pages": [],
"validation_result": "PASS",
"performance_metrics": {
"cls_success_rate": 91.84,
"average_cls": 0.025,
"lcp_success_rate": 0.10
}
}
Why JSON output matters:
Machine-readable → Can be parsed by CI/CD, monitoring, or QA review tools.
Human-readable JSON → Easy to audit manually if needed.
Timestamped → Can track validation history over time.
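For example, a CI step can consume the report directly. A sketch (field names match the report above; the filename is hypothetical):

<CodeBlock language="javascript" filename="ci-check.mjs" code={`import { readFileSync } from 'node:fs';

const report = JSON.parse(readFileSync('validation-report.json', 'utf-8'));

// Fail the CI job unless automated validation passed with full coverage
if (report.validation_result !== 'PASS' || report.coverage_percentage < 100) {
  console.error('Validation:', report.validation_result);
  console.error('Coverage:', report.coverage_percentage + '%');
  process.exit(1);
}

console.log('Verified', report.audited_pages + '/' + report.total_pages, 'pages at', report.timestamp);`} />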
Pillar 5: Proof Packages
Purpose: Provide machine-verifiable evidence of completion.
Problem: “Trust me, it’s done” doesn’t scale.
The 7 Evidence Files
PROOF-PACKAGE/
├── COMPLETION-CERTIFICATE.md # Human summary
├── FINAL-PAGE-MANIFEST.json # 968 pages (with SHA256)
├── validation-report.json # 100% proof
├── discovery-reconciliation.json # 3-way verification
├── optimization-report.json # Performance metrics
├── issues-summary.json # 2,981 issues categorized
└── execution-log-summary.txt # Work timeline
Manifest with Integrity Hash
<CodeBlock language="javascript" filename="generate-manifest.mjs" code={`import { createHash } from 'crypto';
import { writeFileSync } from 'fs';

// pages: the final page list produced by the discovery phase
const manifest = {
  generated_timestamp: new Date().toISOString(),
  total_pages: pages.length,
  pages: pages.map((page) => ({
    id: page.id,
    path: page.path,
    type: page.type,
    heroComponent: page.heroComponent,
  })),
};

// Calculate SHA256 for integrity verification
const manifestStr = JSON.stringify(manifest, null, 2);
const hash = createHash('sha256').update(manifestStr).digest('hex');

const manifestWithHash = {
  ...manifest,
  integrity: {
    algorithm: 'sha256',
    hash: hash,
    verified: true,
  },
};

writeFileSync(
  'FINAL-PAGE-MANIFEST.json',
  JSON.stringify(manifestWithHash, null, 2)
);

console.log(`✅ Manifest: ${pages.length} pages`);
console.log(`🔒 Hash: ${hash.substring(0, 16)}...`);`} />
Why Integrity Hashing
SHA256 hash proves the manifest hasn’t been altered:
- Original hash: a3f8d92...
- Verify: recalculate the hash and compare
- Mismatch? The file was modified
QA benefit: Can verify 968-page manifest in 2 seconds (hash check) vs. 80+ hours (manual review).
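Here is what that 2-second check could look like, mirroring generate-manifest.mjs above (a sketch that assumes the hash was computed over the manifest before the integrity block was attached, as in the generator):

<CodeBlock language="javascript" filename="verify-manifest.mjs" code={`import { readFileSync } from 'node:fs';
import { createHash } from 'node:crypto';

// Separate the integrity block from the rest of the manifest
const { integrity, ...manifest } = JSON.parse(
  readFileSync('FINAL-PAGE-MANIFEST.json', 'utf-8')
);

// Recompute the hash over the manifest exactly as the generator hashed it
const recalculated = createHash('sha256')
  .update(JSON.stringify(manifest, null, 2))
  .digest('hex');

if (recalculated !== integrity.hash) {
  console.error('❌ Manifest was modified after generation');
  process.exit(1);
}

console.log('✅ Manifest integrity verified');`} />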
The Technical Implementation
Here’s what actually made CLS optimization work.
The Problem: Dynamic Content = Layout Shift
Typewriter animation example:
<!-- BEFORE: Causes layout shift -->
<h1 class="typewriter">
{animatedText} <!-- Starts empty, grows as text types -->
</h1>
What happens:
- Page loads with empty <h1> (height: 0)
- Text animates in character by character
- Element expands from 0 → 80px height
- Content below shifts down
- CLS triggered (layout instability)
The Solution: Reserved Space Pattern
<CodeBlock language="astro" filename="TypewriterHeadline.astro" code={`---
interface Props {
  text: string;
  speed?: number;
}
const { text, speed = 50 } = Astro.props;
---

<h1 class="typewriter">
  <!-- Invisible copy of the full text reserves the final height up front -->
  <span class="typewriter-reserved" aria-hidden="true">{text}</span>
  <!-- The typing animation renders into this overlay (client script not shown) -->
  <span class="typewriter-visible" aria-hidden="true" data-text={text} data-speed={speed}></span>
  <!-- Screen readers get the full text immediately -->
  <span class="sr-only">{text}</span>
</h1>

<style>
  .typewriter { position: relative; }
  .typewriter-reserved { visibility: hidden; }
  .typewriter-visible { position: absolute; inset: 0; }
</style>`} />
How it works:
- .typewriter-reserved renders the full text invisibly → reserves the exact space
- .typewriter-visible animates in a positioned overlay → no layout impact
- .sr-only provides accessible text → screen readers happy
- CLS = 0 (no layout shift during animation)
Real Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average CLS | Unknown | 0.025 | ✅ Excellent |
| Pages <0.1 CLS | Unknown | 889/968 (91.84%) | ✅ Success |
| Accessibility | Broken | Full compliance | ✅ Fixed |
What Didn’t Work (Honest Assessment)
LCP: Limited Success
Target: LCP <2.5s on 80%+ of pages
Achieved: 0.10% (1/968 pages)
Average LCP: 12,927ms (5.2x over target)
Why?
Component code: ✅ Fully optimized
Asset files: ❌ Not optimized
The bottlenecks:
- Images: No WebP/AVIF compression
- CDN: Not implemented (high TTFB)
- Videos: No optimization
- External resources: Not minimized
Cursor’s honest assessment:
“LCP improvements maxed out at component level. Further gains require asset pipeline optimization (image compression, CDN implementation, video optimization). This is outside current task scope.”
Our take: Fair and accurate. Component optimization ≠ complete optimization. Code can be perfect while assets remain bottlenecks.
Lighthouse: Blocked by LCP
Target: ≥95 score
Achieved: 0% (0/968 pages)
Average score: 55
Why: Lighthouse heavily weights LCP. Until LCP is fixed, Lighthouse can’t hit 95.
Next steps:
- Task 002: Asset optimization pipeline
- Task 003: Full Core Web Vitals compliance
ROI Analysis
Time Investment
Contract creation: 4 hours (Claude Code)
Cursor execution: 3 days (autonomous)
QA review: 2 hours (Claude Code)
Total: ~3.5 days
Manual Alternative
Per-page audit: 15 min (conservative)
Total pages: 968
Manual time: 968 × 15 min = 14,520 minutes = 242 hours
Working days: 242 hours ÷ 8 hours = 30.25 days
Plus:
- High risk of incomplete coverage
- No automated verification
- No proof package
- Human error inevitable at scale
Time Saved
~26.5 days of manual work avoided.
ROI: 7.5x time savings with higher quality and verifiable proof.
Replicable Patterns
For Site-Wide Audits
<CodeBlock language="javascript" filename="site-wide-audit-pattern.mjs" code={`// Pattern sketch: parseSitemap, crawlSite, reconcile, readJSON,
// writeJSON, generateManifest, and sha256 are project-specific helpers.

// 1. Three-pronged discovery
const buildPages = await glob('dist/**/*.html');
const sitemapUrls = await parseSitemap('dist/sitemap.xml');
const crawled = await crawlSite(baseUrl);

// 2. Reconciliation with hard gate
const reconciled = reconcile(buildPages, sitemapUrls, crawled);
if (reconciled.validation !== 'PASS') {
  process.exit(1);
}

// 3. Process with checkpointing (idempotent: merge, never overwrite)
const existing = readJSON('results.json') || {};
const updated = { ...existing, ...newResults };
writeJSON('results.json', updated);

// 4. Automated validation
const coverage = (processed / total) * 100;
if (coverage < 100) {
  console.error('❌ Incomplete coverage');
  process.exit(1);
}

// 5. Proof package
generateManifest({ items: processed, hash: sha256(manifest) });`} />
For CLS Prevention (Any Dynamic Content)
A generalized sketch of the same reserved-space idea (the element and class names here are illustrative):

<CodeBlock language="astro" filename="reserved-space-pattern.astro" code={`<!-- Reserved-space pattern for any dynamic content -->
<div class="slot">
  <!-- Invisible placeholder at final size reserves the space up front -->
  <span class="reserved" aria-hidden="true">{finalContent}</span>
  <!-- Dynamic content renders in an overlay, so the layout never moves -->
  <span class="dynamic">{dynamicContent}</span>
</div>

<style>
  .slot { position: relative; }
  .reserved { visibility: hidden; }
  .dynamic { position: absolute; inset: 0; }
</style>`} />
Lessons for Future Tasks
Do This ✅
- Three-pronged discovery - Cross-validate from independent sources
- Hard gates with exit codes - Make skipping impossible
- Zero-tolerance policies - No estimates, 100% or fail
- Proof packages - Machine-verifiable evidence
- Idempotent scripts - Resume from interruptions without corruption
- Daily execution logs - Document decisions and issues
Avoid This ❌
- File system globs - Miss dynamic routes and generated content
- Manual verification - Doesn’t scale, human judgment fails
- Estimated counts - “~180 pages” allows slippage
- Overwriting results - Use merge, not replace (for checkpointing)
- Single source of truth - Always cross-validate
- Component-only optimization - Assets matter equally for performance
The Complete Workflow
Phase 1: Contract Design (Claude Code)
- Analyze task requirements
- Identify failure modes
- Design 5-pillar contract
- Create validation scripts
- Time: 4 hours
Phase 2: Autonomous Execution (Cursor)
- Three-pronged page discovery
- Generate manifest with hash
- Audit all 968 pages
- Apply optimizations
- Generate proof package
- Time: 3 days
Phase 3: QA Review (Claude Code)
- Verify 100% coverage (hash check: 2 sec)
- Review performance metrics
- Validate build passes
- Approve proof package
- Time: 2 hours
Total: 3.5 days for 968-page site-wide optimization with 100% verified coverage.
What This Proves
These five pillars working together:
- ✅ Cursor can execute complex, site-wide tasks autonomously
- ✅ Bulletproof contracts prevent incomplete work structurally
- ✅ Zero-tolerance policies ensure completeness
- ✅ Three-pronged discovery prevents missing pages
- ✅ Automated validation makes QA scalable
Key insight: Component optimization ≠ Complete optimization.
Code can be perfect while assets remain bottlenecks. Separate concerns accordingly.
Methodology status: Proven and replicable. This should become the standard template for all site-wide Cursor tasks.
Get the Template
Want to replicate this for your own site-wide tasks?
We’re sharing:
- Complete 60-page bulletproof contract
- All 5 validation scripts
- Reserved space pattern code
- Three-pronged discovery implementation
- Proof package generator
When to Use This Approach
Use For:
- ✅ Site-wide audits (100s-1000s of items)
- ✅ Complex refactors affecting many files
- ✅ Tasks where completeness is critical
- ✅ Work requiring proof of completion
- ✅ Compliance or regulatory requirements
Don’t Use For:
- ❌ Single file changes (overkill)
- ❌ Exploratory work (too rigid)
- ❌ Prototypes or experiments (premature)
- ❌ Tasks with unclear scope (needs definition first)
The Bottom Line
AI delegation works when incomplete work is structurally impossible.
Not “trust Cursor to be thorough.” Make thoroughness the only option.
Five pillars:
- Zero-tolerance → No estimates
- Three-pronged discovery → Cross-validation
- Hard gates → Exit codes enforce requirements
- Automated validation → Remove human judgment
- Proof packages → Machine-verifiable evidence
Result: 968 pages, 100% coverage, zero pages missed. Proven.
Next Steps
Looking to implement AI delegation at your company?
Three ways we can help:
- Free Audit - Send us a task you want to delegate, we’ll design the bulletproof contract
- Contract Design Service - We’ll create execution contracts for your recurring tasks
- Training Workshop - Teach your team the 5-pillar methodology
No pitch. Just proven methodology from builders who’ve done this at scale.
Full Contract: Available in case study .cursor-tasks/completed/001-refactor-hero-section-audit-and-fix-v2.md
QA Analysis: .cursor-tasks/completed/001-QA-REVIEW.md
Proof Package: .cursor-tasks/data/PROOF-PACKAGE/ (7 evidence files)
Case Study Date: November 30, 2025
Project: optymizer.com
Task ID: 001
Status: ✅ SUCCESS - Approved with asset optimization follow-up recommended
