BoxProbe

Release regression testing

The native use of scout: run the same scenarios against a baseline and a target, then get an HTML diff report. One-shot, deterministic, attached to a PR or release. No scheduler, no dashboard, just a CLI in your existing pipeline.

01

Pre-release validation

Before tagging a release, run scout against the candidate and the last shipped version. The diff report attached to the release PR shows reviewers exactly what API behavior changed under real UI use.

02

Upgrade certification

When upgrading a dependency, framework, or infrastructure component, run scout against the pre-upgrade and post-upgrade environments to surface any behavior drift the upgrade introduced.

03

Drift review on demand

When a suspicious bug report comes in from a downstream integration, run scout against the current production version and a known-good past version to confirm or rule out API-shape changes as the cause.

Four steps, no plumbing

1

Have scenarios

Pixel-anchored Python scenarios in your repo. Hand-written or generated by recording tooling.

2

Run against baseline

scout run scenarios/ --web-version $BASELINE_VERSION drives the UI and records every API call.

3

Run against target

Same scenarios, new version. Deterministic — same Locators, same trace.

4

Diff

scout diff <baseline-id> <target-id> produces the HTML report. Attach to your PR or release notes.
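
The steps above can be sketched as a small Python helper that assembles the command lines in order. This is only an illustration: the version strings and run IDs are placeholders, the helper names (run_command, diff_command, plan) are hypothetical, and how run IDs are obtained after scout run is not covered here, so they are passed in as plain arguments.

```python
import shlex


def run_command(web_version, scenarios_dir="scenarios/"):
    """argv for steps 2 and 3: run the scenarios against one version."""
    return ["scout", "run", scenarios_dir, "--web-version", web_version]


def diff_command(baseline_id, target_id):
    """argv for step 4: diff two recorded runs."""
    return ["scout", "diff", baseline_id, target_id]


def plan(baseline_version, target_version, baseline_id, target_id):
    """Shell-quoted command lines, in the order the steps list them."""
    steps = [
        run_command(baseline_version),
        run_command(target_version),
        diff_command(baseline_id, target_id),
    ]
    return [shlex.join(argv) for argv in steps]


if __name__ == "__main__":
    # Placeholder values; real versions and run IDs come from your pipeline.
    for line in plan("v1.0.0", "v1.1.0", "baseline-run", "target-run"):
        print(line)
```

The helper only builds and prints the commands, so it is safe to run anywhere; passing each argv list to subprocess.run would execute them.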

Two commands, any pipeline

scout is a CLI. Whatever you already use (GitHub Actions, GitLab, Forgejo, CircleCI, Jenkins), add two scout run steps followed by a scout diff step.

- run: pip install boxprobe-scout
- run: playwright install chromium
- run: scout run scenarios/ --web-version $BASELINE_VERSION
- run: scout run scenarios/ --web-version $TARGET_VERSION
- run: scout diff $BASELINE_RUN_ID $TARGET_RUN_ID

JUnit XML is written alongside the HTML for status-only integrations. The HTML diff report is uploaded as a build artifact, so reviewers can click through to inspect findings. No data leaves the runner.
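
For a status-only integration, a JUnit XML file can be summarized with the Python standard library. The sketch below assumes the conventional JUnit layout (testcase elements with failure or error children). The sample document and test names are made up for illustration, and how scout maps its findings onto these elements is not specified here.

```python
import xml.etree.ElementTree as ET


def junit_summary(xml_text):
    """Count test cases and list those with a <failure> or <error> child."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so both <testsuites> and bare
    # <testsuite> roots are handled.
    cases = list(root.iter("testcase"))
    failed = [
        case.get("name", "")
        for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"total": len(cases), "failed": failed}


# Made-up sample in the conventional JUnit shape, not real scout output.
SAMPLE = """<testsuite name="example" tests="2" failures="1">
  <testcase name="checkout_flow"/>
  <testcase name="login_flow"><failure message="response shape changed"/></testcase>
</testsuite>"""

summary = junit_summary(SAMPLE)
print(summary)  # {'total': 2, 'failed': ['login_flow']}
```

A pipeline step could exit non-zero when the failed list is non-empty, leaving the detail to the HTML report.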

What scout provides natively (and what it doesn't)

Provided

  • Scenario execution via Playwright
  • Recording proxy (mitmproxy child process)
  • Cross-version diff with category breakdown
  • HTML diff report + JUnit XML
  • Local SQLite run index
  • Noise suppression via diff_ignore.json
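
To illustrate what noise suppression means in a cross-version diff, the sketch below drops volatile fields from two recorded response bodies before comparing them. It is a conceptual example only: the actual schema of diff_ignore.json is not described here, so a plain set of key names stands in for it, and the payloads are invented.

```python
def strip_ignored(value, ignored_keys):
    """Recursively drop dict keys listed in ignored_keys."""
    if isinstance(value, dict):
        return {
            key: strip_ignored(child, ignored_keys)
            for key, child in value.items()
            if key not in ignored_keys
        }
    if isinstance(value, list):
        return [strip_ignored(item, ignored_keys) for item in value]
    return value


# Invented payloads: identical except for a timestamp that changes every run.
baseline = {"id": "ord_1", "total": 42, "created_at": "2024-01-01T00:00:00Z"}
target = {"id": "ord_1", "total": 42, "created_at": "2024-02-01T09:30:00Z"}

# Hypothetical stand-in for an ignore list; not the real diff_ignore.json format.
ignored = {"created_at"}

raw_equal = baseline == target
filtered_equal = strip_ignored(baseline, ignored) == strip_ignored(target, ignored)
print(raw_equal, filtered_equal)  # False True
```

Without the ignore list, every run would report the timestamp as a change; with it, only differences in the remaining fields surface.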

Not provided — bring your own

  • The scenarios themselves (write or generate)
  • Baseline and target environments (your infra)
  • CI pipeline (your existing GHA / GitLab / etc.)
  • Long-term storage of historical runs
  • Scheduled / monitoring-style runs (see monitoring use case)
  • Alerting / dashboards

A worked Medusa example

15 scenarios across two pinned Medusa releases, full reproduce-locally instructions.

  • Open the Medusa case
  • scout-medusa repo
  • About Scout (the runner)

Want this for your application?

Send a URL and a few user flows. We'll record scout-runnable scenarios into your repo and run them against a baseline + target you specify.

Get a Free Sample Report