UI-driven API regression is a test method that catches one specific bug class: API behavior drift that survives your existing tests because it only manifests through real UI interaction. This page explains the method: where it fits, what it costs, and what it doesn't cover.
A test scenario is a recorded sequence of UI actions (click, fill, navigate) with pixel-anchored element locators. Recording happens once. The recording is reviewable as plain code in your repo, not stored in an opaque database.
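To make "reviewable as plain code" concrete, here is a minimal sketch of what a recorded scenario could look like as data. The `Step` and `Scenario` types, the field names, the coordinates, and the `validate` helper are all hypothetical, chosen for illustration; they are not scout's actual recording format.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

# Hypothetical types for illustration only; not scout's real recording format.
@dataclass(frozen=True)
class Step:
    action: Literal["click", "fill", "navigate"]
    x: Optional[int] = None      # pixel anchor, used by click and fill
    y: Optional[int] = None
    value: Optional[str] = None  # text for fill, path for navigate
    label: str = ""              # human-readable annotation for reviewers

@dataclass(frozen=True)
class Scenario:
    name: str
    viewport: Tuple[int, int]    # (width, height) the anchors were recorded at
    steps: Tuple[Step, ...]

# Made-up coordinates and paths, standing in for a real recording.
CREATE_PRODUCT = Scenario(
    name="create_product",
    viewport=(1280, 800),
    steps=(
        Step("navigate", value="/products", label="open product list"),
        Step("click", x=1184, y=96, label="press Create"),
        Step("fill", x=640, y=260, value="Test shirt", label="title field"),
        Step("click", x=1184, y=740, label="press Save"),
    ),
)

def validate(scenario):
    """Return a list of problems; an empty list means the scenario is well-formed."""
    problems = []
    width, height = scenario.viewport
    for i, step in enumerate(scenario.steps):
        if step.action in ("click", "fill"):
            if step.x is None or step.y is None:
                problems.append(f"step {i}: {step.action} needs a pixel anchor")
            elif not (0 <= step.x < width and 0 <= step.y < height):
                problems.append(f"step {i}: anchor outside viewport")
        if step.action in ("fill", "navigate") and step.value is None:
            problems.append(f"step {i}: {step.action} needs a value")
    return problems
```

Because the scenario is ordinary data in the repo, a change to a recording shows up as a normal code diff in review.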
Same scenario, same app, same trace. Pixel coordinates resolve to elements by direct math, not selector guessing. No LLM judgement in the hot path. Replays produce bit-identical recordings of the API traffic the browser generated.
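The source does not spell out the math used to resolve a pixel coordinate, so the following is only one plausible reading: pick the smallest element bounding box that contains the point. The `Box` type, the `resolve` function, and the sample layout are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical bounding-box model; the real resolver may work differently.
@dataclass(frozen=True)
class Box:
    element_id: str
    left: float
    top: float
    width: float
    height: float

    def contains(self, x, y):
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

    @property
    def area(self):
        return self.width * self.height

def resolve(x, y, boxes):
    """Return the smallest box containing (x, y), or None if nothing is hit."""
    hits = [b for b in boxes if b.contains(x, y)]
    if not hits:
        return None
    # Smallest area wins; element_id breaks ties so input order never matters.
    return min(hits, key=lambda b: (b.area, b.element_id))

# Made-up layout: a page, a toolbar inside it, and a button inside the toolbar.
LAYOUT = [
    Box("page", 0, 0, 1280, 800),
    Box("toolbar", 0, 64, 1280, 64),
    Box("create-button", 1120, 80, 128, 32),
]

target = resolve(1184, 96, LAYOUT)
print(target.element_id)  # create-button
```

The point of the sketch is that the same coordinates and the same layout always yield the same element: there is no ranking heuristic or model call whose answer could vary between runs.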
Run the same scenarios against version A, then version B. Compare the recorded API traffic. Surface structural, value, status, and endpoint-existence differences. The output is what changed in API behavior between releases, grouped by the user action that triggered each call.
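The comparison step can be sketched as a small diff over two lists of recorded calls. The record layout (`action`, `method`, `path`, `status`, `body`), the function names, and the sample traffic are assumptions for illustration, not scout's report format. The sketch also simplifies: it assumes one call per action and endpoint, and it infers a list's structure from its first item only.

```python
from collections import defaultdict

def shape(value):
    """Reduce a JSON value to its structure: keys and type names, no data."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        return [shape(value[0])] if value else []
    return type(value).__name__

def diff_runs(baseline, target):
    """Group API differences between two runs by the UI action that triggered them."""
    def index(run):
        return {(c["action"], c["method"], c["path"]): c for c in run}

    a, b = index(baseline), index(target)
    report = defaultdict(list)
    for key in sorted(a.keys() | b.keys()):
        action, method, path = key
        endpoint = f"{method} {path}"
        if key not in b:
            report[action].append(("endpoint_removed", endpoint))
        elif key not in a:
            report[action].append(("endpoint_added", endpoint))
        else:
            old, new = a[key], b[key]
            if old["status"] != new["status"]:
                report[action].append(("status", endpoint, old["status"], new["status"]))
            if shape(old["body"]) != shape(new["body"]):
                report[action].append(("structure", endpoint))
            elif old["body"] != new["body"]:
                report[action].append(("value", endpoint))
    return dict(report)

# Made-up traffic for two versions of the same scenario.
BASELINE = [
    {"action": "open product list", "method": "GET", "path": "/admin/products",
     "status": 200, "body": {"products": [], "count": 0}},
    {"action": "press Save", "method": "POST", "path": "/admin/products",
     "status": 200, "body": {"product": {"id": "p1", "title": "Test shirt"}}},
]
TARGET = [
    {"action": "open product list", "method": "GET", "path": "/admin/products",
     "status": 200, "body": {"products": [], "count": 0}},
    {"action": "open product list", "method": "GET", "path": "/admin/feature-flags",
     "status": 200, "body": {}},
    {"action": "press Save", "method": "POST", "path": "/admin/products",
     "status": 201,
     "body": {"product": {"id": "p1", "title": "Test shirt", "handle": "test-shirt"}}},
]

REPORT = diff_runs(BASELINE, TARGET)
```

Here `REPORT` attributes a new endpoint to "open product list" and a status change plus a structural change to "press Save", while the unchanged product-list call produces no entry.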
Existing test layers each cover a specific question. UI-driven API regression is the layer that catches what falls between them.
| Layer | What it tests | Blind spot |
|---|---|---|
| Unit tests | Internal functions behave correctly in isolation | Integration effects, real API behavior |
| Contract tests | API responses match a declared schema (OpenAPI, JSON Schema) | Drift in fields the schema doesn't document (common with intentional additions and undocumented metadata) |
| E2E (Playwright / Cypress) | UI flows work end-to-end — clicks render, screens load, no crashes | API responses the UI doesn't visually render but downstream consumers depend on |
| UI-driven API regression | API behavior under real UI interaction, compared cross-version | Behavior at endpoints the UI doesn't reach; load behavior; security/auth flow vulnerabilities |
Contract tests need a maintained schema, and they catch breakages against that spec. UI-driven API regression catches drift the spec doesn't document: new fields, value pattern changes, and status code shifts that were made "intentionally" but never recorded as breaking changes. The two complement each other rather than compete: contract testing covers the documented surface, and UI-driven regression covers the behavior actually observed.
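A small sketch of the gap: two responses that both satisfy the same contract while differing in fields the contract never mentions. The contract, the field names, and both helpers are hypothetical, and the check is a deliberately tiny stand-in for a real OpenAPI or JSON Schema validator.

```python
# Hypothetical documented contract: only these fields are specified.
SCHEMA_REQUIRED = {"id": str, "title": str}

def conforms(body, required=SCHEMA_REQUIRED):
    """Contract-style check: documented fields exist with the right type; extras are allowed."""
    return all(isinstance(body.get(name), kind) for name, kind in required.items())

def undocumented_drift(old, new, required=SCHEMA_REQUIRED):
    """Fields outside the contract that were added, removed, or changed between versions."""
    extras = (old.keys() | new.keys()) - required.keys()
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(extras)
        if old.get(name) != new.get(name)
    }

# Made-up responses from version A and version B of the same endpoint.
VERSION_A = {"id": "p1", "title": "Test shirt", "status": "draft"}
VERSION_B = {"id": "p1", "title": "Test shirt", "status": "DRAFT", "metadata": {}}

print(conforms(VERSION_A), conforms(VERSION_B))  # True True
print(undocumented_drift(VERSION_A, VERSION_B))
```

Both versions pass the contract check, yet a consumer that compares `status` against the lowercase string would break on version B. Only a comparison of observed traffic surfaces that change.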
AI-generated tests use an LLM at runtime to figure out what to click and what to assert. Cost scales per run, results vary between runs, and stochastic failures are painful to debug. UI-driven API regression uses AI at recording time, to help annotate, and not at execution time. Once recorded, the test is deterministic Python: $0 per run, same result every run.
Manual regression catches what humans visually notice. It misses field-level API changes that don't surface in the UI, and it doesn't scale per release. UI-driven API regression captures what the UI causes the API to do — tracking every request and response — even if the result is invisible on screen.
UI-driven API regression is one method, not a complete testing strategy. Use it alongside other layers, not as a replacement.
The method is a runner + a diff. The deployment shape on top of it splits into two patterns:
Pattern 1: a one-shot baseline-vs-target comparison on release. This is scout used natively, as a CLI in your existing CI pipeline. Attach the diff report to a PR or release note.
Pattern 2: scheduled scout runs plus an analysis layer that builds drift timelines, schema-change views, and endpoint inventories. Scout itself doesn't include a scheduler or dashboard; you assemble that layer yourself or contract it out.
```shell
pip install boxprobe-scout
playwright install chromium

git clone https://github.com/boxprobe/scout-medusa
cd scout-medusa
bash compose/start.sh  # 5-10 min first time

cd admin
scout run scenarios/ --web-version 2.13.6
scout run scenarios/ --web-version 2.14.0 \
  --web-base-url http://localhost:29000/app \
  --api-base-url http://localhost:29000
scout diff <baseline-id> <target-id>
```