scout runs real UI scenarios against two versions of an app, captures the browser-side API traffic, and produces deterministic diff reports. No hosted SaaS, no account, no telemetry.
Run the same scenario against baseline and target — pip-installed CLI, no service.
Replay real UI flows and capture browser-side API traffic deterministically.
Review a static HTML diff report you can attach to a PR or release note.
API behavior can drift while your E2E suite stays green. UI-driven API regression testing sits in the overlap your current pipeline doesn't cover.
End-to-end tests. Click flows pass, screens render, nothing crashes. But the UI doesn't read every response field, so the suite can stay green while the API beneath it quietly changes.
OpenAPI or schema validation. Only as accurate as the spec, and most teams under-document even the fields they add intentionally. Drift in undocumented fields is invisible to this kind of check.
The missing layer. Drive real user flows, capture what the browser actually requests and what the API actually responds with, diff between versions. scout fills this layer.
A minor version upgrade keeps the UI green while browser-side API behavior changes underneath. Fully reproducible from scout-medusa.
The POST /admin/product-categories response gained a new field, external_id: string | null. The UI doesn't render it, so no E2E test catches it. Any integration that parses this response now encounters an unexpected field.
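To make this kind of drift concrete, here is a minimal Python sketch of a field-level comparison between two captured response bodies. It is not scout's implementation; the helper names (`type_name`, `flatten`, `field_drift`) and the sample payload shape are assumptions made for illustration only.

```python
from typing import Any


def type_name(value: Any) -> str:
    """Map a decoded JSON value to a coarse type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def flatten(body: Any, prefix: str = "") -> dict[str, str]:
    """Flatten nested JSON objects into {dotted.path: type label}.

    Arrays and empty objects are treated as leaves to keep the sketch short.
    """
    if isinstance(body, dict) and body:
        fields: dict[str, str] = {}
        for key, value in body.items():
            path = f"{prefix}.{key}" if prefix else key
            fields.update(flatten(value, path))
        return fields
    return {prefix: type_name(body)}


def field_drift(baseline: Any, target: Any) -> dict[str, list[str]]:
    """Report fields that were added, removed, or changed type."""
    old, new = flatten(baseline), flatten(target)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical payloads: only the external_id field comes from the text above.
baseline = {"product_category": {"id": "pcat_1", "name": "Shoes"}}
target = {"product_category": {"id": "pcat_1", "name": "Shoes", "external_id": None}}

drift = field_drift(baseline, target)
print(drift)
# {'added': ['product_category.external_id'], 'removed': [], 'retyped': []}
```

A UI that never reads `external_id` renders both responses identically, which is why the change only shows up when the captured traffic itself is compared.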
Anything that touches your CI and records your API traffic should be auditable before it runs. scout is MIT-licensed, has no telemetry, doesn't phone home, and works against a locally running app behind your firewall.
Every Python file is reviewable. The recording proxy is a child process on localhost. Nothing leaves the machine running scout. No account required.
Pixel-anchored locators make element resolution pure math. No LLM in the hot path. No per-request fees. Run on every PR or nightly without finance asking.
scout catches one specific class of bug: API behavior drift that survives your existing tests because it only manifests through real UI interaction.
Recording produces Python scenario files. scout executes them deterministically against any version and diffs the API traffic. Hand-write scenarios or generate them with any recording tool that targets scout's file format.
A scenario is plain Python — pixel-anchored Locators describing what to click, fill, and wait for. Hand-written or generated by recording tooling (BoxProbe uses Argus internally).
scout run scenarios/ drives the browser through every scenario against your current version. Every browser-side API request and response is captured.
Same scenarios, new version. Deterministic — same Locators, same inputs, same trace. No LLM judgement at runtime.
scout diff produces an HTML report grouping changes by endpoint and user flow. Filter by category. Attach to PRs. Decide what to fix and what to ignore.
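The grouping and filtering described above can be pictured with a short Python sketch. This is an illustration under stated assumptions, not scout's data model: the `Change` record, its fields, the category names, and the second sample entry are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One detected difference between baseline and target (hypothetical shape)."""
    method: str
    path: str
    flow: str      # the user flow (scenario) in which the request was observed
    category: str  # e.g. "field_added"; category names here are made up
    detail: str


def group_by_endpoint(
    changes: list[Change], category: Optional[str] = None
) -> dict[str, list[Change]]:
    """Group changes under "METHOD /path", optionally keeping one category."""
    grouped: dict[str, list[Change]] = defaultdict(list)
    for change in changes:
        if category is not None and change.category != category:
            continue
        grouped[f"{change.method} {change.path}"].append(change)
    return dict(grouped)


changes = [
    Change("POST", "/admin/product-categories", "create category",
           "field_added", "external_id: string | null"),
    # Invented entry, present only to show filtering at work.
    Change("GET", "/admin/example", "example flow",
           "status_changed", "illustrative only"),
]

for endpoint, items in group_by_endpoint(changes, category="field_added").items():
    print(endpoint, [item.detail for item in items])
# POST /admin/product-categories ['external_id: string | null']
```

Keying on method plus path keeps every flow that touched the same endpoint in one place, which matches how a reviewer would typically triage a report: endpoint first, then the flows affected.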
scout is a CLI. Drop it into whatever workflow you already have — GitHub Actions, GitLab CI, Forgejo Actions, Jenkins, your own runner. No plugin to install, no service to subscribe to.
- run: pip install boxprobe-scout
- run: playwright install chromium
- run: scout run scenarios/ --web-version $BASELINE_VERSION
- run: scout run scenarios/ --web-version $TARGET_VERSION
- run: scout diff $BASELINE_RUN_ID $TARGET_RUN_ID
JUnit XML output for CI integrations that expect it; HTML diff report as a build artifact. No data ever leaves the runner.
scout-medusa is the first reproducible demo. Below are projects we're evaluating for v2 case studies. Each must have an active community, accessible admin or storefront UI, and a release cadence worth diffing.
Active CMS, Node-based, public Docker images. Admin panel has clear release-to-release deltas.
Open-source scheduling, fast release cadence, Playwright-heavy existing test culture.
Publishing platform, stable admin UI, clear API surface for integrations.
Analytics platform, complex admin flows, sensitive to schema drift downstream.
Suggesting a target? Email us — especially if it's a project you maintain.