Phase 4 Playbook — Quality & Hardening
Duration: 3-7 days | Agents: 8 | Gate Keeper: Reality Checker (sole authority)
Objective
The final quality gauntlet. The Reality Checker defaults to “NEEDS WORK”, so you must prove production readiness with overwhelming evidence. This phase exists because first implementations typically need 2-3 revision cycles, and that’s healthy.
Pre-Conditions
Critical Mindset
Step 1: Evidence Collection (Day 1-2, All Parallel)
Evidence Collector
Comprehensive Visual Evidence
Full screenshot suite:
- Desktop (1920x1080): every page/view
- Tablet (768x1024): every page/view
- Mobile (375x667): every page/view
Interactive elements:
- Navigation flows
- Form interactions (all states)
- Modal/dialog interactions
- Accordion/expandable content
Theme coverage:
- Light mode: all pages
- Dark mode: all pages
- System preference detection
Error states:
- 404 pages, validation errors, network errors, empty states
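One way to make “every page/view” auditable is to treat the screenshot suite as a page × viewport × theme matrix and diff it against what was actually captured. The Python sketch below assumes a hypothetical file-naming convention, `{page}_{device}_{theme}.png`, and a hypothetical page list. The viewport sizes come from the list above.

```python
from itertools import product

# Viewport sizes from the screenshot suite above.
VIEWPORTS = {
    "desktop": (1920, 1080),
    "tablet": (768, 1024),
    "mobile": (375, 667),
}
THEMES = ("light", "dark")


def expected_shots(pages):
    """Map every required screenshot name to the viewport it must be captured at.

    Assumes the hypothetical naming convention {page}_{device}_{theme}.png.
    """
    return {
        f"{page}_{device}_{theme}.png": VIEWPORTS[device]
        for page, device, theme in product(pages, VIEWPORTS, THEMES)
    }


def missing_shots(pages, captured_files):
    """Return the required screenshots that were never captured, sorted by name."""
    return sorted(set(expected_shots(pages)) - set(captured_files))


if __name__ == "__main__":
    pages = ["home", "checkout"]  # hypothetical page list
    captured = [
        "home_desktop_light.png", "home_desktop_dark.png",
        "home_tablet_light.png", "home_tablet_dark.png",
        "home_mobile_light.png", "home_mobile_dark.png",
        "checkout_desktop_light.png",
    ]
    plan = expected_shots(pages)
    for name in missing_shots(pages, captured):
        width, height = plan[name]
        print(f"MISSING {name} (capture at {width}x{height})")
```

An empty result from `missing_shots` is the evidence that every page/view has been covered at all three sizes and in both themes.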
API Tester
Full API Regression
Endpoint regression:
- All endpoints (GET, POST, PUT, DELETE)
- Authentication and authorization verification
- Input validation testing
- Error response verification
Integration testing:
- Cross-service communication
- Database operations
- External API integration
Edge cases:
- Rate limiting, large payloads, concurrent requests, malformed input
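The endpoint regression and error-response checks reduce to a table of requests paired with expected status codes. Below is a minimal standard-library Python sketch. It assumes a hypothetical `POST /api/items` endpoint that requires a bearer token and a `name` field, and the 400/401/422 status choices are also assumptions. `StubHandler` only stands in for the real service so the example runs on its own. Against a real deployment, you would pass the staging base URL to `run_regression` instead.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

TOKEN = "test-token"  # hypothetical credential for the stub service
# Ignore proxy environment variables so requests go straight to the target.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the service under test."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)  # drain the body before any early reply
        if self.headers.get("Authorization") != f"Bearer {TOKEN}":
            return self._reply(401, {"error": "unauthorized"})
        try:
            body = json.loads(raw)
        except json.JSONDecodeError:
            return self._reply(400, {"error": "malformed JSON"})
        if not isinstance(body, dict) or "name" not in body:
            return self._reply(422, {"error": "name is required"})
        return self._reply(201, {"name": body["name"]})

    def _reply(self, status, payload):
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def status_of(url, body, token=None):
    """POST raw bytes and return the HTTP status code, including 4xx/5xx."""
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with OPENER.open(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# (description, request body, token, expected status)
CASES = [
    ("valid request", b'{"name": "widget"}', TOKEN, 201),
    ("missing auth", b'{"name": "widget"}', None, 401),
    ("malformed JSON", b'{"name": ', TOKEN, 400),
    ("missing required field", b"{}", TOKEN, 422),
]


def run_regression(base_url):
    """Return (description, expected, actual) for every case that did not match."""
    failures = []
    for desc, body, token, expected in CASES:
        actual = status_of(f"{base_url}/api/items", body, token)
        if actual != expected:
            failures.append((desc, expected, actual))
    return failures


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(run_regression(base) or "all cases passed")
    server.shutdown()
```

Keeping the cases as data makes regression runs repeatable: a fix in one revision cycle cannot silently break a case that passed in the previous one.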
Performance Benchmarker
Load Testing
Load test at 10x expected traffic:
- Response time distribution (P50, P95, P99)
- Throughput under load
- Error rate under load
- Resource utilization
Core Web Vitals:
- LCP < 2.5s
- FID < 100ms (Google replaced FID with INP as a Core Web Vital in March 2024; the INP target is < 200ms)
- CLS < 0.1
Database performance:
- Query execution times
- Connection pool utilization
- Index effectiveness
Stress testing:
- Breaking point identification
- Graceful degradation
- Recovery time
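The response-time distribution and the pass/fail thresholds are simple arithmetic once raw latencies are exported from the load tool. The Python sketch below uses the nearest-rank percentile method, which is an assumption: load-testing tools may interpolate differently, so compare like with like. The thresholds are the ones this playbook states (LCP and CLS above, P95 < 200ms in the quality gate).

```python
import math

# Thresholds from this playbook: LCP and CLS above, P95 from the quality gate.
THRESHOLDS = {"p95_ms": 200.0, "lcp_s": 2.5, "cls": 0.1}
LOAD_MULTIPLIER = 10  # "load test at 10x expected traffic"


def target_rps(expected_peak_rps):
    """Request rate the load test should sustain."""
    return expected_peak_rps * LOAD_MULTIPLIER


def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    """P50, P95 and P99 of the measured response times."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


def failed_thresholds(metrics):
    """Names of metrics that are missing or not strictly below their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if name not in metrics or not metrics[name] < limit]


if __name__ == "__main__":
    latencies = [float(ms) for ms in range(1, 101)]  # stand-in data: 1..100 ms
    summary = latency_summary(latencies)
    print(summary)  # {'p50': 50.0, 'p95': 95.0, 'p99': 99.0}
    print(failed_thresholds({"p95_ms": summary["p95"], "lcp_s": 2.1, "cls": 0.05}))  # []
    print(target_rps(120))  # 1200
```

A missing metric counts as a failure here on purpose. This matches the evidence-only stance of the phase: an unmeasured number is not a passing number.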
Legal Compliance Checker
Final Compliance Audit
Privacy compliance:
- Privacy policy accuracy
- Consent management
- Data subject rights
- Cookie consent
Security:
- Data encryption (at rest and in transit)
- Authentication security
- Input sanitization
- OWASP Top 10 check
Regulatory:
- GDPR (if applicable)
- CCPA (if applicable)
- Industry-specific requirements
Accessibility:
- WCAG 2.1 AA verification
- Screen reader compatibility
- Keyboard navigation
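Colour contrast is one WCAG 2.1 AA check that can be computed rather than eyeballed. Success Criterion 1.4.3 requires a 4.5:1 ratio for normal text and 3:1 for large text. The Python sketch below implements the WCAG relative-luminance and contrast-ratio formulas for hex colours. It covers only this one criterion; screen reader and keyboard testing still need a human.

```python
def _linearize(channel_8bit):
    """sRGB channel (0-255) to linear light, using the WCAG 2.1 formula."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a colour given as '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg, bg):
    """(lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text=False):
    """WCAG 2.1 SC 1.4.3: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    print(round(contrast_ratio("#767676", "#FFFFFF"), 2))  # 4.54
    print(passes_aa("#777777", "#FFFFFF"))  # False: just under 4.5:1
```

Running this over the colour pairs in both the light and dark themes gives the Legal Compliance Checker a numeric record to attach to the audit, rather than a visual judgement.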
Step 2: Analysis (Day 3-4, Parallel)
- Test Results Analyzer
- Workflow Optimizer
- Infrastructure Maintainer
Quality Metrics Aggregation
Input: ALL Step 1 reports
Deliverables:
- Overall quality score
- Category breakdown (visual, functional, performance, security, compliance)
- Issue severity distribution
- Trend analysis (if multiple test cycles)
- Issue prioritization (Critical/High/Medium/Low)
- Risk assessment
- Production readiness probability
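The severity distribution, category breakdown and prioritized fix list can be derived mechanically from the Step 1 findings. The Python sketch below assumes a hypothetical `Issue` record with the fields shown. The playbook does not define formulas for the overall quality score or the readiness probability, so the sketch leaves them out rather than inventing one.

```python
from collections import Counter
from dataclasses import dataclass

SEVERITIES = ("Critical", "High", "Medium", "Low")  # highest first


@dataclass(frozen=True)
class Issue:
    source: str    # reporting agent, e.g. "API Tester"
    category: str  # visual, functional, performance, security, compliance
    severity: str  # one of SEVERITIES
    title: str


def severity_distribution(issues):
    """Count of issues per severity, including zero counts."""
    counts = Counter(issue.severity for issue in issues)
    return {sev: counts[sev] for sev in SEVERITIES}


def category_breakdown(issues):
    """Count of issues per category."""
    return dict(Counter(issue.category for issue in issues))


def prioritized(issues):
    """Critical first; ties broken by category, then title, for a stable fix list."""
    rank = {sev: i for i, sev in enumerate(SEVERITIES)}
    return sorted(issues, key=lambda i: (rank[i.severity], i.category, i.title))


if __name__ == "__main__":
    # Hypothetical findings standing in for the Step 1 reports.
    issues = [
        Issue("Evidence Collector", "visual", "Medium", "Modal overflows at 375px width"),
        Issue("API Tester", "security", "Critical", "DELETE succeeds without auth"),
        Issue("Performance Benchmarker", "performance", "High", "P95 above 200ms at 10x load"),
        Issue("Legal Compliance Checker", "compliance", "High", "Cookie banner lacks reject option"),
    ]
    print(severity_distribution(issues))  # {'Critical': 1, 'High': 2, 'Medium': 1, 'Low': 0}
    for issue in prioritized(issues):
        print(issue.severity, issue.category, issue.title)
```

Because the sort key is deterministic, repeated test cycles produce comparable lists, which is what the trend analysis needs.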
Step 3: Final Judgment (Day 5-7, Sequential)
Reality Checker — THE FINAL VERDICT
MANDATORY PROCESS (DO NOT SKIP):
Step 1: Reality Check Commands
- Verify what was actually built (ls, grep for claimed features)
- Cross-check claimed features against specification
- Run comprehensive screenshot capture
- Review all evidence from Step 1 and Step 2
- Review Evidence Collector findings
- Cross-reference with API Tester results
- Verify Performance Benchmarker data
- Confirm Legal Compliance Checker findings
- Test COMPLETE user journeys (not individual features)
- Verify responsive behavior across ALL devices
- Check interaction flows end-to-end
- Review actual performance data
- Quote EXACT text from original specification
- Compare with ACTUAL implementation evidence
- Document EVERY gap between spec and reality
- No assumptions — evidence only
Verdict (exactly one):
- READY: Overwhelming evidence of production readiness (rare first pass)
- NEEDS WORK: Specific issues identified with fix list (expected)
- NOT READY: Major architectural issues requiring Phase 1/2 revisit
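The spec-versus-reality comparison works best as a point-by-point ledger. Each entry quotes the specification exactly and lists the evidence that supposedly satisfies it. The Python sketch below uses a hypothetical `Requirement` record. Anything without evidence, or with evidence nobody has reviewed, is reported as a gap, which is the “no assumptions, evidence only” rule in code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Requirement:
    spec_quote: str  # EXACT text copied from the original specification
    evidence: list[str] = field(default_factory=list)  # screenshot/report paths
    verified: bool = False  # True only after the evidence was reviewed and matches


def gap_report(requirements):
    """One line per requirement that lacks reviewed, matching evidence."""
    gaps = []
    for req in requirements:
        if not req.evidence:
            gaps.append(f'NO EVIDENCE: "{req.spec_quote}"')
        elif not req.verified:
            gaps.append(f'UNVERIFIED: "{req.spec_quote}" -> {", ".join(req.evidence)}')
    return gaps


def spec_coverage(requirements):
    """Fraction of requirements verified; the quality gate expects 1.0 (100%)."""
    if not requirements:
        return 0.0  # nothing listed means nothing is proven
    return sum(req.verified for req in requirements) / len(requirements)


if __name__ == "__main__":
    # Hypothetical requirements and evidence paths.
    reqs = [
        Requirement("Users can reset their password by email",
                    ["screenshots/reset_mobile_light.png"], verified=True),
        Requirement("Dashboard loads in under 2 seconds", ["reports/perf.md"]),
        Requirement("Export to CSV"),
    ]
    for line in gap_report(reqs):
        print(line)
    print(f"{spec_coverage(reqs):.0%} of spec verified")  # 33% of spec verified
```

The gap lines double as the fix list that accompanies a NEEDS WORK verdict.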
Quality Gate — THE FINAL GATE
| # | Criterion | Threshold | Evidence Required |
|---|---|---|---|
| 1 | User journeys complete | All critical paths working | Reality Checker screenshots |
| 2 | Cross-device consistency | Desktop + Tablet + Mobile | Responsive screenshots |
| 3 | Performance certified | P95 < 200ms, LCP < 2.5s, >99.9% uptime | Performance report |
| 4 | Security validated | Zero critical vulnerabilities | Security scan + compliance |
| 5 | Compliance certified | All regulatory requirements met | Legal Compliance report |
| 6 | Specification compliance | 100% of spec requirements | Point-by-point verification |
| 7 | Infrastructure ready | Production environment validated | Infrastructure report |
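The gate’s default-to-NEEDS-WORK rule can be written down so that a missing result can never pass by accident. The Python sketch below is illustrative only: the Reality Checker remains the sole authority. The `architectural_issues` flag is a hypothetical input standing for the Reality Checker’s own judgment that Phase 1/2 must be revisited.

```python
from enum import Enum


class Verdict(Enum):
    READY = "READY"
    NEEDS_WORK = "NEEDS WORK"
    NOT_READY = "NOT READY"


# The seven criteria from the quality gate table.
CRITERIA = (
    "User journeys complete",
    "Cross-device consistency",
    "Performance certified",
    "Security validated",
    "Compliance certified",
    "Specification compliance",
    "Infrastructure ready",
)


def unmet_criteria(results):
    """Criteria that failed or have no recorded result (missing evidence = failure)."""
    return [c for c in CRITERIA if results.get(c) is not True]


def gate_decision(results, architectural_issues=False):
    """READY only if all seven criteria are proven; otherwise the default applies."""
    if architectural_issues:
        return Verdict.NOT_READY
    return Verdict.READY if not unmet_criteria(results) else Verdict.NEEDS_WORK


if __name__ == "__main__":
    results = {c: True for c in CRITERIA}
    results["Performance certified"] = False  # e.g. P95 measured above 200ms
    print(gate_decision(results).value, unmet_criteria(results))
```

With one criterion failing, this prints `NEEDS WORK ['Performance certified']`. That is the expected shape of a first-pass result.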
Gate Decision
- READY
- NEEDS WORK
- NOT READY
Proceed to Phase 5
Production deployment approved with overwhelming evidence.
Handoff includes:
- Reality Checker certification
- Performance certification
- Compliance certification
- Infrastructure readiness
- Known limitations (if any)
Phase 4 is complete when the Reality Checker issues a READY verdict with overwhelming evidence. NEEDS WORK is the expected first-pass result — it means the system is working but needs polish.
