Scoring Methodology
How SLOPE scores AI coding sprints.
SLOPE uses a par-based scoring system inspired by golf to quantify AI-assisted development sprints. Every sprint gets a scorecard. Over time, scorecards reveal patterns — which types of tasks get over-scoped, where hazards cluster, and how estimation accuracy trends.
Par-Based Scoring
Each sprint is assigned a par value — the expected number of tickets based on complexity and track record. A sprint with 3 planned tickets is par 3. A sprint with 5 is par 5.
The score is the actual number of tickets delivered, adjusted for hazards and counted in strokes. Finishing a par 4 sprint in 3 strokes is a birdie (-1); taking 5 strokes is a bogey (+1).
| Label | Relative to Par | Meaning |
|---|---|---|
| Eagle | -2 | Delivered well under expected effort |
| Birdie | -1 | Delivered under expected effort |
| Par | 0 | Delivered as planned |
| Bogey | +1 | Took more effort than planned |
| Double Bogey+ | +2 or more | Significantly over-scoped |
Slope is the difficulty modifier (0-3). Higher slope means more unknowns, dependencies, or new territory. It adjusts expectations without changing the par.
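The label table above can be sketched as a small lookup. This is a minimal illustration, not SLOPE's actual implementation: the function name `score_label` is hypothetical, and treating anything better than -2 as an Eagle is an assumption, since the table only lists -2.

```python
def score_label(par: int, score: int) -> tuple[int, str]:
    """Return (strokes relative to par, label) for one sprint."""
    relative = score - par
    if relative <= -2:
        label = "Eagle"  # assumption: results better than -2 are still reported as Eagle
    elif relative == -1:
        label = "Birdie"
    elif relative == 0:
        label = "Par"
    elif relative == 1:
        label = "Bogey"
    else:
        label = "Double Bogey+"  # +2 or more
    return relative, label


print(score_label(4, 3))  # (-1, 'Birdie')
print(score_label(4, 5))  # (1, 'Bogey')
```

Slope does not appear in the function because, as described above, it adjusts expectations without changing the par.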
Handicap Calculation
The handicap is a rolling performance index that tracks improvement over time. Lower is better. It is computed from configurable windows (last 5, last 10, and all-time).
Inputs
- Fairways hit: how often sprint scoping is on target, meaning tickets are delivered as planned without scope changes.
- Greens in regulation: how often estimation is accurate, meaning the actual score is at or under par.
- Putts: revision cycles after landing on target, that is, how much polishing was needed once the approach was correct.
- Penalties: breaking changes, scope creep, and data loss, the hazards severe enough to warrant penalty strokes.
A handicap of 0.0 means consistent delivery at or under par. As it rises, the miss pattern (long/short/left/right) reveals where the drift is coming from — over-engineering, under-scoping, dependency blocks, or tech debt accumulation.
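The exact handicap formula is not given here, so the following is only a sketch of how a rolling index over the three windows could behave. It assumes the handicap is the average number of strokes over par per sprint, with sprints at or under par counting as zero, which is consistent with 0.0 meaning consistent delivery at or under par. The function name `handicap` and the sample history are hypothetical.

```python
def handicap(differentials, window=None):
    """Average strokes over par across the most recent `window` sprints.

    `differentials` is a list of (score - par) values, oldest first.
    Assumption: sprints at or under par contribute 0, so the index
    is 0.0 when every sprint in the window lands at or under par.
    """
    recent = differentials if window is None else differentials[-window:]
    if not recent:
        return 0.0
    over_par = [max(0, d) for d in recent]
    return sum(over_par) / len(over_par)


# Illustrative history only, not real SLOPE data.
history = [0, 1, -1, 0, 0, 2, 0, 0, 0, -1, 0, 0]

print(handicap(history, 5))   # 0.0
print(handicap(history, 10))  # 0.2
print(handicap(history))      # 0.25
```

Comparing the windows shows the trend: a last-5 value below the all-time value suggests estimation is improving.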
Scorecard Structure
Every sprint produces a JSON scorecard in docs/retros/. Each scorecard records:
- Sprint number, theme, date, par, slope, score, and score label
- Per-ticket breakdown: ticket key, title, club (complexity), shot result, hazards encountered, and notes
- Aggregate stats: fairways hit, greens in regulation, putts, penalties, hazard penalties
- Optional metadata: tests before/after, findings, review notes
Scorecards are the source of truth for everything else — briefings, handicap cards, and trend analysis all derive from the scorecard history.
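To make the shape concrete, here is a sketch of what one scorecard might look like when built and serialized. The field names, the club and shot-result values, and all the numbers are illustrative assumptions; the real schema may differ.

```python
import json

# Hypothetical scorecard; every key name and value below is illustrative.
scorecard = {
    "sprint": 12,
    "theme": "Example theme",
    "date": "2025-01-15",
    "par": 4,
    "slope": 1,
    "score": 5,
    "score_label": "Bogey",
    "tickets": [
        {
            "key": "EX-1",
            "title": "Example ticket",
            "club": "short_iron",      # complexity (assumed value)
            "result": "fairway",       # shot result (assumed value)
            "hazards": ["flaky_tests"],
            "notes": "",
        }
    ],
    "stats": {
        "fairways_hit": 3,
        "greens_in_regulation": 3,
        "putts": 2,
        "penalties": 0,
        "hazard_penalties": 1,
    },
    # Optional metadata
    "metadata": {"tests_before": 120, "tests_after": 134},
}

# Serialize the way a file under docs/retros/ might be written.
print(json.dumps(scorecard, indent=2))
```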
Guard System
SLOPE includes 29 guard hooks that run during coding sessions. Guards inject context, warnings, and blocks into your agent's workflow in real time. They are organized by enforcement type:
- Guards that block actions until conditions are met, e.g., can't commit without tests, can't close a sprint without a review.
- Guards that inject context and suggestions without blocking, e.g., "you hit this same bug in Sprint 12, here's how you fixed it."
Guard state persists to disk in .slope/guard-state/, surviving context window compression. The common issues database tracks recurring hazards across sprints — when a pattern appears again, the agent gets warned before it starts.
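The two enforcement types and the on-disk state can be sketched roughly as follows. This is not SLOPE's guard API: `GuardResult`, both guard functions, the state keys, and the one-JSON-file-per-guard layout are all hypothetical, chosen only to illustrate blocking versus advising and state that survives between sessions.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GuardResult:
    action: str        # "block", "advise", or "allow" (hypothetical values)
    message: str = ""


def require_tests_before_commit(state: dict) -> GuardResult:
    """Blocking guard: refuse the commit until tests have passed."""
    if not state.get("tests_passed", False):
        return GuardResult("block", "Run the test suite before committing.")
    return GuardResult("allow")


def warn_on_known_issue(state: dict, error_text: str) -> GuardResult:
    """Advisory guard: surface a past fix without blocking."""
    for issue in state.get("common_issues", []):
        if issue["pattern"] in error_text:
            return GuardResult(
                "advise", f"Seen in Sprint {issue['sprint']}: {issue['fix']}"
            )
    return GuardResult("allow")


def save_state(directory: Path, name: str, state: dict) -> None:
    """Persist guard state as JSON so it outlives the agent's context."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{name}.json").write_text(json.dumps(state))


def load_state(directory: Path, name: str) -> dict:
    path = directory / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else {}
```

Because the state is reloaded from disk on each check, a guard can still recall a recurring issue after the agent's context window has been compressed.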
The Feedback Loop
SLOPE's core value proposition is the compounding feedback loop:
Score → Review → Learn → Improve
Each sprint produces a scorecard. The scorecard feeds the handicap card. The handicap card feeds the next sprint's briefing. The briefing warns the agent about recurring hazards before it starts coding. Over time, the same mistakes stop appearing because the agent has structured memory of what went wrong.
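One step of this loop, turning scorecard history into briefing warnings, might look like the sketch below. It assumes each scorecard is a dict with a `tickets` list whose entries carry a `hazards` list; those key names, the function `recurring_hazards`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def recurring_hazards(scorecards, min_count=2):
    """Return (hazard, count) pairs seen at least `min_count` times, most frequent first."""
    counts = Counter(
        hazard
        for card in scorecards
        for ticket in card.get("tickets", [])
        for hazard in ticket.get("hazards", [])
    )
    return [(h, n) for h, n in counts.most_common() if n >= min_count]


# Illustrative history only.
history = [
    {"sprint": 1, "tickets": [{"hazards": ["flaky_tests"]}]},
    {"sprint": 2, "tickets": [{"hazards": ["scope_creep"]}, {"hazards": []}]},
    {"sprint": 3, "tickets": [{"hazards": ["flaky_tests"]}]},
]

print(recurring_hazards(history))  # [('flaky_tests', 2)]
```

A briefing generator could then list these hazards at the top of the next sprint's briefing.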
In the SLOPE reference implementation, this loop has run for 69+ sprints with 97% estimation accuracy and a handicap of 0.2. Every scorecard is public.