A UI pass tells you the DOM rendered correctly. It tells you the user saw what they expected to see. It does not tell you what the browser received from the server to produce that DOM.

This distinction matters because the browser is not the only consumer of an API. Admin extensions, data sync scripts, mobile clients, webhook handlers, and third-party integrations all parse the same API responses. When those responses change shape — even in ways the UI doesn't care about — downstream consumers break.

The invisible contract

Every API response is an implicit contract. The frontend only uses a subset of the fields in that contract. The rest — metadata, internal IDs, configuration flags, nested objects — flow through to other systems. When a version upgrade adds, removes, or restructures these fields, the UI doesn't notice. The integration does.

This is the gap that behavior diffs fill. A behavior diff captures the full API response, not just the fields the UI renders, and compares it field by field across two versions.
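The field-by-field comparison can be sketched in a few lines of Python. This is a minimal illustration, not BoxProbe's implementation: the function names (`flatten`, `differs`, `diff_responses`) and the sample payloads are assumptions made for the example. It flattens each response into JSONPath-style keys and reports what was added, removed, or changed.

```python
from typing import Any


def flatten(value: Any, path: str = "$") -> dict[str, Any]:
    """Map every leaf of a JSON-like value to a JSONPath-style key."""
    if isinstance(value, dict) and value:
        out: dict[str, Any] = {}
        for key, child in value.items():
            out.update(flatten(child, f"{path}.{key}"))
        return out
    if isinstance(value, list) and value:
        out = {}
        for index, child in enumerate(value):
            out.update(flatten(child, f"{path}[{index}]"))
        return out
    # Scalars, None, and empty containers are treated as leaves.
    return {path: value}


def differs(a: Any, b: Any) -> bool:
    """Count a type change (for example 1 -> True) as a difference too."""
    return type(a) is not type(b) or a != b


def diff_responses(old: Any, new: Any) -> dict[str, dict[str, Any]]:
    """Compare two response bodies leaf by leaf."""
    before, after = flatten(old), flatten(new)
    return {
        "added": {p: after[p] for p in after.keys() - before.keys()},
        "removed": {p: before[p] for p in before.keys() - after.keys()},
        "changed": {
            p: (before[p], after[p])
            for p in before.keys() & after.keys()
            if differs(before[p], after[p])
        },
    }


# Hypothetical payloads: a UI that renders only "name" shows the same
# screen for both versions, but the response shape is different.
old_body = {"product_category": {"id": 7, "name": "Shoes"}}
new_body = {"product_category": {"id": 7, "name": "Shoes", "external_id": None}}

result = diff_responses(old_body, new_body)
print(result)
# {'added': {'$.product_category.external_id': None}, 'removed': {}, 'changed': {}}
```

A UI assertion on the category name passes against both payloads. Only the full-response comparison surfaces the new field.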

Why the diff must be step-bound

Raw API diffing tools exist. You can record traffic on two versions, compare the responses, and get a list of changed fields. The problem is context: a list of 326 changed values is noise without knowing which user action triggered each change.

A behavior diff without user-flow context is just a changelog with extra steps. The value is in knowing that "Create Category" now returns an extra field — not that $.product_category.external_id exists somewhere in the traffic.

BoxProbe binds every API observation to the scenario step that triggered it. When you read the diff report, you scan user actions: "Log in," "Navigate to Categories," "Create a category." Each action shows what changed between versions.
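One way to picture step binding is to tag each recorded response with the scenario step that was active when it arrived, then group the diff by step. The sketch below is an assumption-laden illustration, not BoxProbe's data model: the `Observation` record, the `step_bound_diff` function, and the request paths are all hypothetical, and it assumes each step makes a given call at most once.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Observation:
    step: str    # scenario step label, e.g. "Create a category"
    method: str  # HTTP method
    path: str    # request path (hypothetical in this example)
    body: Any    # parsed JSON response body


def leaves(value: Any, path: str = "$") -> dict[str, Any]:
    """Flatten a JSON-like value into JSONPath-style leaf keys."""
    if isinstance(value, dict) and value:
        out: dict[str, Any] = {}
        for key, child in value.items():
            out.update(leaves(child, f"{path}.{key}"))
        return out
    if isinstance(value, list) and value:
        out = {}
        for index, child in enumerate(value):
            out.update(leaves(child, f"{path}[{index}]"))
        return out
    return {path: value}


def step_bound_diff(old_run: list[Observation],
                    new_run: list[Observation]) -> dict[str, list[str]]:
    """Group response differences under the step that triggered the call.

    Assumes (step, method, path) is unique within a run; repeated calls
    inside one step would need an extra sequence number in the key.
    """
    new_by_key = {(o.step, o.method, o.path): o for o in new_run}
    old_keys = {(o.step, o.method, o.path) for o in old_run}
    report: dict[str, list[str]] = {}
    for old in old_run:
        lines = report.setdefault(old.step, [])
        call = f"{old.method} {old.path}"
        new = new_by_key.get((old.step, old.method, old.path))
        if new is None:
            lines.append(f"{call}: not observed in new version")
            continue
        before, after = leaves(old.body), leaves(new.body)
        lines += [f"{call}: added {p}" for p in sorted(after.keys() - before.keys())]
        lines += [f"{call}: removed {p}" for p in sorted(before.keys() - after.keys())]
        lines += [f"{call}: changed {p}"
                  for p in sorted(before.keys() & after.keys())
                  if before[p] != after[p]]
    for new in new_run:
        if (new.step, new.method, new.path) not in old_keys:
            report.setdefault(new.step, []).append(f"{new.method} {new.path}: new call")
    return report


old_run = [
    Observation("Log in", "POST", "/auth", {"ok": True}),
    Observation("Create a category", "POST", "/categories",
                {"product_category": {"id": 7, "name": "Shoes"}}),
]
new_run = [
    Observation("Log in", "POST", "/auth", {"ok": True}),
    Observation("Create a category", "POST", "/categories",
                {"product_category": {"id": 7, "name": "Shoes", "external_id": None}}),
]

report = step_bound_diff(old_run, new_run)
for step, lines in report.items():
    print(step, "->", lines or "no change")
```

Keying the report by step rather than by endpoint is what lets a reader scan it as a sequence of user actions. The same changed path appearing under two different steps stays two separate findings.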

Selector resilience as a side effect

Traditional E2E tests break when the DOM changes — a class name is renamed, a wrapper div is added, a component is refactored. Teams spend 30–40% of test maintenance time on selector repairs.

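BoxProbe scenarios use pixel-level bounding box annotations instead of CSS selectors. Because the annotations target what is rendered rather than how the markup is structured, a DOM refactor that leaves the rendered layout unchanged does not break them. This is a different kind of resilience than what most test tools offer.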

But this is a side effect, not the core value. The core value is the API behavior diff: knowing that a version upgrade changed the API contract, even when the UI continued to work.

Two tools, two failure modes

E2E tests catch UI regressions: broken layouts, missing elements, failed navigation. Behavior diffs catch API drift: changed fields, new endpoints, altered status codes. These are different failure modes. They require different tools.
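Field changes are only one kind of drift. New endpoints and altered status codes can be detected by comparing which calls each version made and how the server answered. The sketch below assumes each run has already been reduced to a mapping from (method, path) to status code; the `summarize_drift` function and the sample paths are hypothetical.

```python
Call = tuple[str, str]  # (HTTP method, request path)


def summarize_drift(old_calls: dict[Call, int],
                    new_calls: dict[Call, int]) -> dict[str, object]:
    """Compare the calls seen in two runs: which exist, and their status codes."""
    shared = old_calls.keys() & new_calls.keys()
    return {
        "new_endpoints": sorted(new_calls.keys() - old_calls.keys()),
        "missing_endpoints": sorted(old_calls.keys() - new_calls.keys()),
        "status_changes": {
            call: (old_calls[call], new_calls[call])
            for call in sorted(shared)
            if old_calls[call] != new_calls[call]
        },
    }


# Hypothetical traffic from the same scenario on two versions.
old_calls = {("GET", "/categories"): 200, ("POST", "/categories"): 200}
new_calls = {
    ("GET", "/categories"): 200,
    ("POST", "/categories"): 201,
    ("GET", "/categories/tree"): 200,
}

drift = summarize_drift(old_calls, new_calls)
print(drift)
```

In this example the page would likely render identically on both versions, yet a client that checks for exactly `200` on category creation would start failing.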

Running both gives you coverage over two orthogonal dimensions of release quality. Running only E2E tests gives you half the picture — the visible half.

— BoxProbe documentation

If your release process includes E2E tests but not behavior diffs, you are testing the surface and trusting the plumbing. A behavior diff makes the plumbing visible.