Scoring Methodology

How SLOPE scores AI coding sprints.

SLOPE uses a par-based scoring system inspired by golf to quantify AI-assisted development sprints. Every sprint gets a scorecard. Over time, scorecards reveal patterns — which types of tasks get over-scoped, where hazards cluster, and how estimation accuracy trends.

Par-Based Scoring

Each sprint is assigned a par value — the expected number of tickets based on complexity and track record. A sprint with 3 planned tickets is par 3. A sprint with 5 is par 5.

The score is the actual number of tickets delivered (adjusted for hazards). If you finish a par 4 sprint in 3 strokes, that's a birdie (-1). If it takes 5, that's a bogey (+1).

Label           Relative to Par   Meaning
Eagle           -2                Delivered well under expected effort
Birdie          -1                Delivered under expected effort
Par             0                 Delivered as planned
Bogey           +1                Took more effort than planned
Double Bogey+   +2 or more        Significantly over-scoped
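
The label mapping above can be sketched as a small function. This is an illustrative sketch, not SLOPE's actual code: the name `score_label` is hypothetical, and treating anything at -2 or better as an Eagle is an assumption, since the table lists only -2.

```python
def score_label(par: int, score: int) -> str:
    """Map a sprint's score relative to par onto a scorecard label."""
    diff = score - par
    if diff <= -2:
        # Assumption: results better than -2 are also reported as Eagle.
        return "Eagle"
    if diff == -1:
        return "Birdie"
    if diff == 0:
        return "Par"
    if diff == 1:
        return "Bogey"
    # +2 or more strokes over par.
    return "Double Bogey+"


print(score_label(par=4, score=3))  # Birdie
print(score_label(par=4, score=5))  # Bogey
```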

Slope is the difficulty modifier (0-3). Higher slope means more unknowns, dependencies, or new territory. It adjusts expectations without changing the par.

Handicap Calculation

The handicap is a rolling performance index that tracks improvement over time. Lower is better. It is computed from configurable windows (last 5, last 10, and all-time).
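
The exact handicap formula is not spelled out here, so the following is only a sketch of the windowing idea. It assumes the handicap is the mean number of strokes over par within a window, with under-par sprints clamped to zero so that 0.0 corresponds to consistent at-or-under-par delivery. The function name `handicap` and the sample history are made up for illustration.

```python
from typing import Optional, Sequence


def handicap(diffs: Sequence[int], window: Optional[int] = None) -> float:
    """Mean strokes over par across the most recent `window` sprints.

    `diffs` holds (score - par) per sprint, oldest first.
    `window=None` means all-time.
    """
    if window is not None and window <= 0:
        raise ValueError("window must be a positive number of sprints")
    recent = list(diffs) if window is None else list(diffs[-window:])
    if not recent:
        return 0.0
    # Assumption: under-par sprints count as 0 rather than offsetting misses.
    over_par = [max(0, d) for d in recent]
    return round(sum(over_par) / len(over_par), 1)


# Illustrative history of (score - par) values, not real SLOPE data.
history = [1, 0, 0, 2, 0, 0, -1, 0, 1, 0, 0, 0]
for label, window in (("last 5", 5), ("last 10", 10), ("all-time", None)):
    print(f"{label}: {handicap(history, window)}")
# last 5: 0.2
# last 10: 0.3
# all-time: 0.3
```

Comparing the short window against the all-time value shows whether recent sprints are trending better or worse than the long-run average.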

Inputs

Estimation Accuracy (Fairway %)

How often sprint scoping is on target — tickets delivered as planned, without scope changes.

Delivery Accuracy (GIR %)

How often delivery lands on target: the actual score is at or under par.

Rework Cycles (Putts)

Revision cycles after landing on target — how much polishing was needed after the approach was correct.

Serious Hazards (Penalties)

Breaking changes, scope creep, data loss — hazards severe enough to warrant penalty strokes.

A handicap of 0.0 means consistent delivery at or under par. As it rises, the miss pattern (long/short/left/right) reveals where the drift is coming from — over-engineering, under-scoping, dependency blocks, or tech debt accumulation.
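
The four inputs above can be derived from per-sprint aggregates. The sketch below assumes each sprint reports ticket counts for fairways hit and greens in regulation; the `SprintStats` fields and the `handicap_inputs` function are hypothetical names, and how SLOPE weights these inputs into a single handicap is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SprintStats:
    """Hypothetical per-sprint aggregates; field names are assumptions."""
    tickets: int
    fairways_hit: int           # tickets delivered as scoped
    greens_in_regulation: int   # tickets landing at or under par
    putts: int                  # rework cycles
    penalties: int              # serious hazards


def handicap_inputs(sprints: Sequence[SprintStats]) -> Dict[str, float]:
    """Aggregate the four handicap inputs over a window of sprints."""
    total_tickets = sum(s.tickets for s in sprints)
    if total_tickets == 0:
        raise ValueError("need at least one ticket to compute percentages")
    return {
        "fairway_pct": 100 * sum(s.fairways_hit for s in sprints) / total_tickets,
        "gir_pct": 100 * sum(s.greens_in_regulation for s in sprints) / total_tickets,
        "avg_putts": sum(s.putts for s in sprints) / len(sprints),
        "penalties": float(sum(s.penalties for s in sprints)),
    }


# Illustrative numbers only.
window = [SprintStats(4, 4, 3, 5, 0), SprintStats(6, 5, 5, 7, 1)]
print(handicap_inputs(window))
```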

Scorecard Structure

Every sprint produces a JSON scorecard in docs/retros/. Each scorecard records:

  • Sprint number, theme, date, par, slope, score, and score label
  • Per-ticket breakdown: ticket key, title, club (complexity), shot result, hazards encountered, and notes
  • Aggregate stats: fairways hit, greens in regulation, putts, penalties, hazard penalties
  • Optional metadata: tests before/after, findings, review notes

Scorecards are the source of truth for everything else — briefings, handicap cards, and trend analysis all derive from the scorecard history.
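
To make the structure concrete, here is a sketch of what such a scorecard might look like when built and serialized from Python. The key names, the club and shot-result values, and all data are assumptions chosen to mirror the list above; the real schema may differ.

```python
import json
from typing import List

# Hypothetical scorecard; every key name and value is illustrative.
scorecard = {
    "sprint": 1,
    "theme": "example theme",
    "date": "2025-01-01",
    "par": 3,
    "slope": 1,
    "score": 3,
    "score_label": "Par",
    "tickets": [
        {
            "key": "T-1",
            "title": "example ticket",
            "club": "short_iron",      # complexity (made-up value)
            "result": "green",         # shot result (made-up value)
            "hazards": [],
            "notes": "",
        }
    ],
    "stats": {
        "fairways_hit": 1,
        "greens_in_regulation": 1,
        "putts": 1,
        "penalties": 0,
        "hazard_penalties": 0,
    },
}

REQUIRED_KEYS = ("sprint", "theme", "date", "par", "slope", "score",
                 "score_label", "tickets", "stats")


def missing_fields(card: dict) -> List[str]:
    """Return the required top-level keys that a scorecard lacks."""
    return [key for key in REQUIRED_KEYS if key not in card]


print(missing_fields(scorecard))        # []
print(json.dumps(scorecard, indent=2))  # the JSON that would be stored
```

A check like `missing_fields` is useful before anything downstream consumes the history, since briefings and handicap cards all depend on the same records.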

Guard System

SLOPE includes 29 guard hooks that run during coding sessions. Guards inject context, warnings, and blocks into your agent's workflow in real time. They are organized by enforcement type:

Mechanical Guards

Block actions until conditions are met — e.g., can't commit without tests, can't close a sprint without a review.

Advisory Guards

Inject context and suggestions without blocking — e.g., "you hit this same bug in Sprint 12, here's how you fixed it."

Guard state persists to disk in .slope/guard-state/, surviving context window compression. The common issues database tracks recurring hazards across sprints — when a pattern appears again, the agent gets warned before it starts.
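
The difference between the two enforcement types, and the idea of persisting guard state, can be sketched as follows. This is not SLOPE's hook API: the function names, the returned dictionary shape, and the one-JSON-file-per-guard layout are all assumptions, and the demo writes to a temporary directory rather than .slope/guard-state/.

```python
import json
import tempfile
from pathlib import Path


def commit_guard(state: dict) -> dict:
    """Mechanical guard: block the commit until tests have passed."""
    if not state.get("tests_passed"):
        return {"action": "block", "message": "Run the tests before committing."}
    return {"action": "allow", "message": ""}


def recurring_issue_guard(common_issues: dict, error_text: str) -> dict:
    """Advisory guard: surface a known fix without blocking anything."""
    for pattern, note in common_issues.items():
        if pattern in error_text:
            return {"action": "advise", "message": note}
    return {"action": "allow", "message": ""}


def save_state(directory: Path, name: str, state: dict) -> None:
    """Persist guard state so it survives a lost or compressed context."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{name}.json").write_text(json.dumps(state))


def load_state(directory: Path, name: str) -> dict:
    """Load guard state, or an empty state if nothing was saved yet."""
    path = directory / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else {}


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp) / "guard-state"
    print(commit_guard(load_state(state_dir, "commit"))["action"])  # block
    save_state(state_dir, "commit", {"tests_passed": True})
    print(commit_guard(load_state(state_dir, "commit"))["action"])  # allow

issues = {"ECONNRESET": "Seen in an earlier sprint; retrying with backoff fixed it."}
print(recurring_issue_guard(issues, "Error: ECONNRESET on upload")["action"])  # advise
```

Because the state is reloaded from disk on each check, the mechanical guard keeps its answer even if the agent no longer remembers that the tests ran.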

The Feedback Loop

SLOPE's core value proposition is the compounding feedback loop:

Score → Review → Learn → Improve

Each sprint produces a scorecard. The scorecard feeds the handicap card. The handicap card feeds the next sprint's briefing. The briefing warns the agent about recurring hazards before it starts coding. Over time, the same mistakes stop appearing because the agent has structured memory of what went wrong.
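
One step of this loop, turning scorecard history into briefing warnings, can be sketched like this. The scorecard shape (a `tickets` list with per-ticket `hazards`), the function name `briefing_warnings`, and the two-sprint threshold are assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence


def briefing_warnings(scorecards: Sequence[dict], min_sprints: int = 2) -> List[str]:
    """Return hazards that appeared in at least `min_sprints` past sprints."""
    sprints_seen: Counter = Counter()
    for card in scorecards:
        # Count each hazard once per sprint, however many tickets hit it.
        hazards = {
            hazard
            for ticket in card.get("tickets", [])
            for hazard in ticket.get("hazards", [])
        }
        sprints_seen.update(hazards)
    return sorted(h for h, n in sprints_seen.items() if n >= min_sprints)


# Illustrative history, not real scorecards.
history = [
    {"tickets": [{"hazards": ["flaky test"]}, {"hazards": ["flaky test"]}]},
    {"tickets": [{"hazards": ["scope creep"]}]},
    {"tickets": [{"hazards": ["flaky test"]}, {"hazards": []}]},
]
print(briefing_warnings(history))  # ['flaky test']
```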

In the SLOPE reference implementation, this loop has run for 69+ sprints with 97% estimation accuracy and a handicap of 0.2. Every scorecard is public.