Method

UI-driven API regression testing

A test method that catches one specific bug class: API behavior drift that survives your existing tests because it only manifests through real UI interaction. This page explains the method — where it fits, what it costs, what it doesn't cover.

What it is

Three properties

Recording-based

A test scenario is a recorded sequence of UI actions (click, fill, navigate) with pixel-anchored element locators. Recording happens once. The recording is reviewable as plain code in your repo, not stored in an opaque database.
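To make "reviewable as plain code" concrete, here is a minimal sketch of what a recorded scenario could look like in Python. The `Action` dataclass, its field names, the coordinates, and `login_scenario` are hypothetical illustrations, not scout's actual recording format.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical shape of one recorded UI action; not scout's real format.
@dataclass(frozen=True)
class Action:
    kind: Literal["navigate", "click", "fill"]
    x: Optional[int] = None      # pixel anchor (viewport coordinates) for click/fill
    y: Optional[int] = None
    value: Optional[str] = None  # URL for navigate, text for fill
    note: str = ""               # human-readable annotation for reviewers

# A scenario is an ordered list of actions that lives in the repo as code.
login_scenario = [
    Action("navigate", value="/app/login", note="open login page"),
    Action("fill", x=640, y=310, value="admin@example.com", note="email field"),
    Action("fill", x=640, y=370, value="secret", note="password field"),
    Action("click", x=640, y=440, note="submit button"),
]

def describe(scenario: list[Action]) -> list[str]:
    """Render a scenario as one reviewable line per action."""
    lines = []
    for i, a in enumerate(scenario, start=1):
        where = f" at ({a.x}, {a.y})" if a.x is not None else ""
        what = f" {a.value!r}" if a.value is not None else ""
        lines.append(f"{i}. {a.kind}{where}{what}  # {a.note}")
    return lines
```

Because the scenario is ordinary data in a source file, a change to it shows up in code review like any other diff.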

Deterministic replay

Same scenario, same app, same trace. Pixel coordinates resolve to elements by direct math, not selector guessing. No LLM judgement in the hot path. Replays produce bit-identical recordings of the API traffic the browser generated.
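The "direct math" idea can be illustrated with a small hit-testing sketch: given element bounding boxes, a pixel coordinate maps to exactly one element with no heuristics involved. The `Box` type, the `resolve` function, and the layout values are assumptions made for illustration. In a real browser the engine does this lookup itself (for example via `document.elementFromPoint`), and this is not a description of scout's internals.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Box:
    name: str
    left: int
    top: int
    width: int
    height: int
    z: int = 0  # stacking order; higher is painted on top

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

def resolve(boxes: list[Box], x: int, y: int) -> Optional[Box]:
    """Return the topmost box containing (x, y), or None if nothing is hit."""
    hits = [b for b in boxes if b.contains(x, y)]
    # max() returns the first maximal element, so ties resolve by list order,
    # which keeps the result stable across runs.
    return max(hits, key=lambda b: b.z) if hits else None

# Hypothetical layout: a form containing two stacked controls.
layout = [
    Box("form", left=400, top=250, width=480, height=260, z=0),
    Box("email", left=440, top=290, width=400, height=40, z=1),
    Box("submit", left=440, top=420, width=400, height=40, z=1),
]
```

With the same layout and the same coordinate, `resolve` returns the same element every time, which is the property the replay relies on.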

Cross-version diff

Run the same scenarios against version A, then version B. Compare the recorded API traffic. Surface structural, value, status, and endpoint-existence differences. The output is what changed in API behavior between releases, grouped by the user action that triggered each call.
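The comparison step can be sketched in a few lines of Python. The trace shape (a dict keyed by user action, HTTP method, and path), the kind labels, and the example endpoints are hypothetical. The sketch compares only top-level body fields, and it is not scout's actual diff algorithm.

```python
from typing import Any

# Hypothetical trace shape: {(action, method, path): {"status": int, "body": dict}}
Trace = dict[tuple[str, str, str], dict[str, Any]]

def diff_traces(a: Trace, b: Trace) -> list[tuple[str, str, str]]:
    """Return (action, kind, detail) tuples describing how b differs from a."""
    out = []
    for key in sorted(a.keys() | b.keys()):
        action, method, path = key
        call = f"{method} {path}"
        if key not in b:
            out.append((action, "endpoint-removed", call))
            continue
        if key not in a:
            out.append((action, "endpoint-added", call))
            continue
        ra, rb = a[key], b[key]
        if ra["status"] != rb["status"]:
            out.append((action, "status", f"{call}: {ra['status']} -> {rb['status']}"))
        ka, kb = set(ra["body"]), set(rb["body"])
        for field in sorted(ka ^ kb):  # fields present on only one side
            sign = "+" if field in kb else "-"
            out.append((action, "structural", f"{call}: {sign}{field}"))
        for field in sorted(ka & kb):  # fields present on both sides
            if ra["body"][field] != rb["body"][field]:
                out.append((action, "value", f"{call}: {field}"))
    return out

v_a = {
    ("click submit", "POST", "/auth/session"): {"status": 200, "body": {"user": "admin"}},
    ("open products", "GET", "/products"): {"status": 200, "body": {"count": 3, "limit": 20}},
}
v_b = {
    ("click submit", "POST", "/auth/session"): {"status": 201, "body": {"user": "admin", "expires": 3600}},
    ("open products", "GET", "/products"): {"status": 200, "body": {"count": 3, "limit": 50}},
    ("open products", "GET", "/regions"): {"status": 200, "body": {}},
}
changes = diff_traces(v_a, v_b)
```

Each tuple carries the user action first, so the result can be grouped by the action that triggered the call.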

Where it fits

Next to E2E, contract testing, and unit tests

Existing test layers each cover a specific question. UI-driven API regression is the layer that catches what falls between them.

Layer	What it tests	Blind spot
Unit tests	Internal functions behave correctly in isolation	Integration effects, real API behavior
Contract tests	API responses match a declared schema (OpenAPI, JSON Schema)	Drift in fields the schema doesn't document — common with intentional additions, undocumented metadata
E2E (Playwright / Cypress)	UI flows work end-to-end — clicks render, screens load, no crashes	API responses the UI doesn't visually render but downstream consumers depend on
UI-driven API regression	API behavior under real UI interaction, compared cross-version	Behavior at endpoints the UI doesn't reach; load behavior; security/auth flow vulnerabilities

Versus alternatives

Three near-neighbors, three different trade-offs

vs. Contract testing (Pact, Schemathesis, Bruno)

Contract tests need a maintained schema, and they catch breakages against that spec. UI-driven API regression catches drift the spec doesn't document: new fields, changed value patterns, and status code shifts that were made intentionally but never recorded as breaking changes. The two complement each other: contract testing covers the documented surface, UI-driven regression covers the behavior actually observed.
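A small sketch shows how a response can satisfy its contract while still drifting. The `CONTRACT` dict, the helper names, and the sample payloads are hypothetical. The contract check tolerates extra fields, mirroring JSON Schema's default behavior when `additionalProperties` is not set.

```python
# Hypothetical contract: required fields and their types. Extra fields are
# allowed, mirroring JSON Schema's default when additionalProperties is unset.
CONTRACT = {"id": str, "title": str}

def satisfies_contract(body: dict) -> bool:
    """True if every documented field is present with the declared type."""
    return all(isinstance(body.get(k), t) for k, t in CONTRACT.items())

def undocumented_drift(baseline: dict, target: dict) -> dict:
    """Fields outside the contract that were added, removed, or changed."""
    extra = (set(baseline) | set(target)) - set(CONTRACT)
    return {k: (baseline.get(k), target.get(k))
            for k in sorted(extra) if baseline.get(k) != target.get(k)}

old = {"id": "prod_1", "title": "Shirt", "metadata": {"source": "import"}}
new = {"id": "prod_1", "title": "Shirt", "metadata": None, "handle": "shirt"}

both_pass = satisfies_contract(old) and satisfies_contract(new)
drift = undocumented_drift(old, new)
```

Both payloads pass the contract check, yet `drift` reports that `metadata` changed and `handle` appeared, which is the gap an observed-behavior comparison is meant to cover.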

vs. AI-generated tests (Browser Use, Cursor for tests)

AI-generated tests use an LLM at runtime to decide what to click and what to assert. Cost scales with every run, results vary between runs, and stochastic failures are painful to debug. UI-driven API regression uses AI only at recording time, to help annotate, and never at execution time. Once recorded, the test is deterministic Python: $0 per run, same result every run.

vs. Manual regression testing

Manual regression catches what humans visually notice. It misses field-level API changes that don't surface in the UI, and the effort repeats with every release. UI-driven API regression captures what the UI causes the API to do, recording every request and response, even when the result is invisible on screen.

What it doesn't cover

Narrow on purpose

UI-driven API regression is one method, not a complete testing strategy. Use it alongside other layers, not as a replacement.

Endpoints the UI doesn't reach: backend cron jobs, admin tools never used by the UI, internal-only APIs. Use service-level integration tests.
Load / performance under concurrency: one scenario simulates one user. Use load testing tools (k6, Locust) for capacity.
Authorization / security boundaries: deterministic replay against a single user role doesn't probe RBAC. Use authorization-specific tests.
UI logic correctness: whether buttons disable correctly and error messages render properly. Use E2E or component tests.
Unit-level logic: internal functions in isolation. Use unit tests.

Application patterns

Two ways the method gets used

The method is a runner + a diff. The deployment shape on top of it splits into two patterns:

Release regression testing

A one-shot baseline-versus-target comparison at release time. This is scout's native mode: a CLI step in your existing CI pipeline. Attach the diff report to a PR or release note.
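For a CI step, the commands can be assembled in a short Python wrapper. This is a sketch under assumptions: the helper names are hypothetical, and only the `scout run` and `scout diff` flags that appear in the commands on this page are used. How the run IDs are obtained is not covered here, so the sketch only builds argument lists (for use with `subprocess.run`) and does not execute them.

```python
from typing import Optional

def scout_run_cmd(scenarios: str, web_version: str,
                  web_base_url: Optional[str] = None,
                  api_base_url: Optional[str] = None) -> list[str]:
    """Build a `scout run` argument list suitable for subprocess.run()."""
    cmd = ["scout", "run", scenarios, "--web-version", web_version]
    if web_base_url:
        cmd += ["--web-base-url", web_base_url]
    if api_base_url:
        cmd += ["--api-base-url", api_base_url]
    return cmd

def scout_diff_cmd(baseline_id: str, target_id: str) -> list[str]:
    """Build a `scout diff` argument list for two previously recorded runs."""
    return ["scout", "diff", baseline_id, target_id]

baseline_cmd = scout_run_cmd("scenarios/", "2.13.6")
target_cmd = scout_run_cmd(
    "scenarios/", "2.14.0",
    web_base_url="http://localhost:29000/app",
    api_base_url="http://localhost:29000",
)
```

Passing argument lists rather than a shell string avoids quoting problems when versions or URLs come from CI variables.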

Drift monitoring over time

Scheduled scout runs plus an analysis layer that builds drift timelines, schema-change views, and endpoint inventories. Scout itself doesn't include a scheduler or dashboard; you assemble that layer yourself or contract it out.
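As an illustration of the analysis layer, here is a minimal sketch that folds the results of scheduled runs into a per-endpoint drift timeline. The input shape (a date plus a list of endpoint and change-kind pairs), the function name, and the sample data are hypothetical. Scout's actual report format is not assumed.

```python
from collections import defaultdict

def build_timeline(runs):
    """Map each endpoint to the chronological list of (date, kind) changes."""
    timeline = defaultdict(list)
    # ISO 8601 dates sort correctly as plain strings.
    for date, changes in sorted(runs, key=lambda r: r[0]):
        for endpoint, kind in changes:
            timeline[endpoint].append((date, kind))
    return dict(timeline)

# Hypothetical results from two scheduled runs, deliberately out of order.
runs = [
    ("2024-03-08", [("GET /products", "structural")]),
    ("2024-03-01", [("GET /products", "value"), ("POST /carts", "status")]),
]
timeline = build_timeline(runs)
```

The keys of the resulting dict double as a simple endpoint inventory: every endpoint that has ever drifted appears once.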

Try the method

In two clones

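# Install scout and the Chromium build that Playwright drives
pip install boxprobe-scout
playwright install chromium

# Fetch the Medusa example and start its local environment
git clone https://github.com/boxprobe/scout-medusa
cd scout-medusa
bash compose/start.sh   # 5-10 min first time

# Run the same scenarios against the baseline, then the target version
cd admin
scout run scenarios/ --web-version 2.13.6
scout run scenarios/ --web-version 2.14.0 \
    --web-base-url http://localhost:29000/app \
    --api-base-url http://localhost:29000

# Compare the two runs; substitute their run IDs for the placeholders
scout diff <baseline-id> <target-id>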

See the Medusa case · About Scout (the runner) · scout-medusa repo