Release regression testing

The native use of scout: run the same scenarios against a baseline and a target, then get an HTML diff report. Each run is one-shot and deterministic, and the report attaches to a PR or release. No scheduler, no dashboard: just a CLI in your existing pipeline.

When to use this pattern

Pre-release validation

Before tagging a release, run scout against the candidate and the last shipped version. The diff report, attached to the release PR, shows reviewers exactly which API behavior changed under real UI use.

Upgrade certification

When upgrading a dependency, framework, or infrastructure component, run scout against the pre-upgrade and post-upgrade environments to surface any behavior drift the upgrade introduced.

Drift review on demand

When a downstream integration files a suspicious bug report, run scout against the current production version and a known-good past version to confirm or rule out API-shape changes as the cause.

How it works

Four steps, no plumbing

Have scenarios

Pixel-anchored Python scenarios in your repo. Hand-written or generated by recording tooling.

Run against baseline

scout run scenarios/ --web-version $BASELINE_VERSION drives the UI and records every API call.

Run against target

Same scenarios, new version: scout run scenarios/ --web-version $TARGET_VERSION. Deterministic: same Locators, same trace.

Diff

scout diff <baseline-id> <target-id> produces the HTML report. Attach it to your PR or release notes.
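
The four steps above can be sketched as a small Python driver. This is a minimal sketch, not part of scout: it only assembles the two commands shown above (scout run with --web-version, and scout diff). The helper names run_command, diff_command, and execute are hypothetical, and the version strings and run IDs are placeholders. How a run ID is obtained after scout run is not described here, so the sketch takes the IDs as plain arguments.

```python
import shlex
import subprocess


def run_command(scenarios_dir: str, web_version: str) -> list[str]:
    """Build the `scout run` invocation for one version (baseline or target)."""
    return ["scout", "run", scenarios_dir, "--web-version", web_version]


def diff_command(baseline_id: str, target_id: str) -> list[str]:
    """Build the `scout diff` invocation that produces the HTML report."""
    return ["scout", "diff", baseline_id, target_id]


def execute(cmd: list[str], dry_run: bool = True) -> str:
    """Print the command line; actually run it only when dry_run is False."""
    line = shlex.join(cmd)
    print(line)
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code,
        # which is what fails the surrounding pipeline step.
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    # Placeholder versions and run IDs (assumptions for illustration only).
    execute(run_command("scenarios/", "BASELINE_VERSION"))
    execute(run_command("scenarios/", "TARGET_VERSION"))
    execute(diff_command("BASELINE_RUN_ID", "TARGET_RUN_ID"))
```

Keeping the default at dry_run=True makes it easy to review the exact command lines in a pipeline log before letting the script drive a real run.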

In your CI

Two commands, any pipeline

scout is a CLI. Whatever you already use (GitHub Actions, GitLab, Forgejo, CircleCI, Jenkins), add the steps below: install, two scout run steps, and one scout diff.

- run: pip install boxprobe-scout
- run: playwright install chromium
- run: scout run scenarios/ --web-version $BASELINE_VERSION
- run: scout run scenarios/ --web-version $TARGET_VERSION
- run: scout diff $BASELINE_RUN_ID $TARGET_RUN_ID

scout writes JUnit XML alongside the HTML for status-only integrations. The HTML diff report can be uploaded as a build artifact so reviewers can click through to inspect findings. No data leaves the runner.
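
For a status-only integration, a pipeline step could read the JUnit XML and fail the build when anything is reported. The sketch below assumes the conventional JUnit layout, with failures and errors attributes on each testsuite element. The exact schema and file name scout emits are not documented here, so the function name junit_failed and the hand-written SAMPLE document are illustrative assumptions, not scout output.

```python
import xml.etree.ElementTree as ET


def junit_failed(xml_text: str) -> bool:
    """Return True if any <testsuite> reports at least one failure or error."""
    root = ET.fromstring(xml_text)
    # root.iter() also yields the root itself, so this covers both a bare
    # <testsuite> root and a <testsuites> wrapper.
    for suite in root.iter("testsuite"):
        failures = int(suite.get("failures", "0"))
        errors = int(suite.get("errors", "0"))
        if failures or errors:
            return True
    return False


# Hand-written sample in the conventional JUnit shape (not real scout output).
SAMPLE = """<testsuites>
  <testsuite name="example" tests="2" failures="1" errors="0">
    <testcase name="unchanged-endpoint"/>
    <testcase name="changed-endpoint"><failure message="response changed"/></testcase>
  </testsuite>
</testsuites>"""

print(junit_failed(SAMPLE))
```

In a real step, the script would read the XML from the file the run produced and exit non-zero when junit_failed returns True.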

Scope

What scout provides natively (and what it doesn't)

Provided

Scenario execution via Playwright
Recording proxy (mitmproxy child process)
Cross-version diff with category breakdown
HTML diff report + JUnit XML
Local SQLite run index
Noise suppression via diff_ignore.json

Not provided — bring your own

The scenarios themselves (write or generate)
Baseline and target environments (your infra)
CI pipeline (your existing GitHub Actions / GitLab / etc.)
Long-term storage of historical runs
Scheduled / monitoring-style runs (see monitoring use case)
Alerting / dashboards

See it in action

A worked Medusa example

15 scenarios across two pinned Medusa releases, with full instructions to reproduce locally.

Open the Medusa case · scout-medusa repo · About Scout (the runner)