BoxProbe

UI-driven API regression testing,
open source and reproducible.

Scout runs real UI scenarios against two versions of an app, captures the browser-side API traffic, and produces deterministic diff reports. No hosted SaaS, no account, no telemetry.

Run the same scenario against baseline and target — pip-installed CLI, no service.

Replay real UI flows and capture browser-side API traffic deterministically.

Review a static HTML diff report you can attach to a PR or release note.

Three test layers, one blind spot

API behavior can drift while your E2E suite stays green. UI-driven API regression testing covers the gap between E2E and contract tests that your current pipeline leaves open.

E2E

Tests that the UI works

Click flows pass, screens render, no crashes. But the UI doesn't read every response field — it can stay green while the API beneath quietly changes.

Contract

Tests the spec the API claims

OpenAPI or schema validation. Only as accurate as the spec, and most teams under-document fields, even ones they added intentionally. Drift in undocumented fields is invisible.

UI→API

Records what the UI actually triggers

The missing layer. Drive real user flows, capture what the browser actually requests and what the API actually returns, then diff the two versions. scout fills this layer.
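To make the idea concrete, here is a minimal sketch of a field-level diff between two captured response bodies. It is not scout's implementation: `flatten`, `diff_bodies`, and the sample payloads are hypothetical names and made-up data, chosen only to show how a change can hide in fields the UI never reads.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON into {path: leaf}. Empty containers count as leaves."""
    if isinstance(value, dict) and value:
        pairs = [(f"{prefix}.{key}" if prefix else str(key), child)
                 for key, child in value.items()]
    elif isinstance(value, list) and value:
        pairs = [(f"{prefix}[{index}]", child) for index, child in enumerate(value)]
    else:
        return {prefix: value}
    flat: dict[str, Any] = {}
    for path, child in pairs:
        flat.update(flatten(child, path))
    return flat


def diff_bodies(baseline: Any, target: Any) -> dict[str, list[str]]:
    """Report which field paths were added, removed, or changed."""
    old, new = flatten(baseline), flatten(target)
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Compare types too, so 1 -> True or 1000 -> "1000" is not missed.
        "changed": sorted(
            path for path in shared
            if type(old[path]) is not type(new[path]) or old[path] != new[path]
        ),
    }


# Made-up payloads for illustration; not taken from any real application.
baseline_body = {"order": {"id": "o_1", "total": 1000, "currency": "usd"}}
target_body = {"order": {"id": "o_1", "total": "10.00", "metadata": None}}

print(diff_bodies(baseline_body, target_body))
```

A UI that only renders `order.id` would pass its E2E checks against both payloads, while the diff still surfaces the removed, added, and retyped fields.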

Medusa Admin 2.13.6 → 2.14.0

A minor version upgrade keeps the UI green while browser-side API behavior changes underneath. Fully reproducible from scout-medusa.

Scout is the runner. We open-sourced it on purpose.

Anything that touches your CI and records your API traffic should be auditable before it runs. scout is MIT-licensed, has no telemetry, doesn't phone home, and works against a locally running app behind your firewall.

01

Read the source

Every Python file is reviewable. The recording proxy is a child process on localhost. Nothing leaves the machine running scout. No account required.

02

Deterministic + free to run

Pixel-anchored locators make element resolution pure math. No LLM in the hot path. No per-request fees. Run on every PR or nightly without finance asking.

03

Narrow on purpose

scout catches one specific class of bug: API behavior drift that survives your existing tests because it only manifests through real UI interaction.

About Scout

github.com/boxprobe/scout | PyPI | Docs

Recorded scenarios in, diff reports out

Recording produces Python scenario files. scout executes them deterministically against any version and diffs the API traffic. Hand-write scenarios or generate them with any recording tool that targets scout's file format.

1

Capture a scenario

A scenario is plain Python — pixel-anchored Locators describing what to click, fill, and wait for. Hand-written or generated by recording tooling (BoxProbe uses Argus internally).

2

Run against baseline

scout run scenarios/ drives the browser through every scenario against your current version. Every browser-side API request and response is captured.

3

Run against target

Same scenarios, new version. Deterministic — same Locators, same inputs, same trace. No LLM judgement at runtime.

4

Diff and review

scout diff produces an HTML report grouping changes by endpoint and user flow. Filter by category. Attach to PRs. Decide what to fix and what to ignore.
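The grouping described in step 4 can be sketched roughly as follows. This is an illustration only, not scout's report code: the `Change` record, its fields, the category names, and `group_changes` are all assumptions introduced for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One detected difference (hypothetical shape, for illustration)."""
    flow: str      # user flow / scenario that triggered the request
    method: str    # HTTP method
    path: str      # endpoint path
    category: str  # assumed categories: "added", "removed", "changed"
    field: str     # field path inside the response body


def group_changes(
    changes: Iterable[Change],
    categories: Optional[set[str]] = None,
) -> dict[str, dict[str, list[Change]]]:
    """Group changes by endpoint, then by user flow, with an optional category filter."""
    grouped: dict[str, dict[str, list[Change]]] = defaultdict(lambda: defaultdict(list))
    for change in changes:
        if categories is not None and change.category not in categories:
            continue
        grouped[f"{change.method} {change.path}"][change.flow].append(change)
    # Sort endpoints so the same input always yields the same report order.
    return {endpoint: dict(flows) for endpoint, flows in sorted(grouped.items())}


# Made-up changes for illustration.
sample = [
    Change("create order", "POST", "/admin/orders", "changed", "order.total"),
    Change("view order", "GET", "/admin/orders/{id}", "removed", "order.currency"),
    Change("create order", "POST", "/admin/orders", "added", "order.metadata"),
]

for endpoint, flows in group_changes(sample).items():
    for flow, items in flows.items():
        print(endpoint, "|", flow, "|", [f"{c.category}:{c.field}" for c in items])
```

Passing `categories={"removed"}` narrows the same report to removed fields only, which mirrors the "filter by category" step in a review.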

Two commands, any pipeline

scout is a CLI. Drop it into whatever workflow you already have — GitHub Actions, GitLab CI, Forgejo Actions, Jenkins, your own runner. No plugin to install, no service to subscribe to.

- run: pip install boxprobe-scout
- run: playwright install chromium

- run: scout run scenarios/ --web-version $BASELINE_VERSION
- run: scout run scenarios/ --web-version $TARGET_VERSION
- run: scout diff $BASELINE_RUN_ID $TARGET_RUN_ID

JUnit XML output for CI integrations that expect it; HTML diff report as a build artifact. No data ever leaves the runner.
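For readers unfamiliar with the JUnit XML shape that CI systems typically consume, here is a generic sketch that turns per-flow diff results into such a file using only the Python standard library. The layout scout itself emits is not documented here, so `to_junit`, the suite name, and the input structure are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def to_junit(results: dict[str, list[str]], suite_name: str = "api-diff") -> str:
    """Render results as JUnit-style XML.

    `results` maps a flow name to a list of human-readable change
    descriptions; a flow with no changes becomes a passing test case.
    """
    failures = sum(1 for changes in results.values() if changes)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for flow, changes in results.items():
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=flow)
        if changes:
            failure = ET.SubElement(
                case, "failure", message=f"{len(changes)} API change(s) detected"
            )
            failure.text = "\n".join(changes)
    return ET.tostring(suite, encoding="unicode")


# Made-up results for illustration.
print(to_junit({
    "login": [],
    "create order": ["POST /admin/orders: removed order.currency"],
}))
```

Reporting each user flow as one test case keeps the CI summary readable: a flow turns red only when its API traffic changed, and the failure text lists what changed.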

Projects under evaluation

scout-medusa is the first reproducible demo. Below are projects we're evaluating for v2 case studies. Each must have an active community, an accessible admin or storefront UI, and a release cadence worth diffing.

v2

Strapi

Active CMS, Node-based, public Docker images. Admin panel has clear release-to-release deltas.

v2

Cal.com

Open-source scheduling, fast release cadence, Playwright-heavy existing test culture.

v2

Ghost

Publishing platform, stable admin UI, clear API surface for integrations.

v2

PostHog

Analytics platform, complex admin flows, sensitive to schema drift downstream.

Suggesting a target? Email us — especially if it's a project you maintain.

Latest writing

Why Green E2E Tests Can Still Miss API Drift (May 2026)

Why UI Flows Need API Behavior Diffs (May 2026)

AI-Generated Tests vs Deterministic Execution (May 2026)

Want this report for your flows?

Submit a URL and a user scenario. We'll send you a sample report within 3 business days.

Get a Free Sample Report