
Harness Atlas

One map of every test, benchmark, and eval harness in the workspace — what each is, which directory it lives in, and where they overlap.

What it is

A consolidated field guide to the five harness-shaped efforts in this workspace — agentlab, sam-evals, doesitarm-harness, roof-frames, and harness-watch — each mapped to its directory, its mechanism, and its status.

Why it exists

These harnesses grew independently over two years and across several domains. They share a spine (run a thing → capture evidence → score it deterministically) that is easy to miss when each lives in its own folder. This map makes the overlap visible.
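
The shared spine can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code taken from any of the five harnesses: the names `Evidence`, `run_and_capture`, and `score` are hypothetical, and a real harness would likely capture richer evidence (logs, screenshots, timings) than a single return value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record of what one run left behind."""
    output: Optional[str]
    error: Optional[str]


def run_and_capture(task: Callable[[], str]) -> Evidence:
    # Step 1 and 2: run a thing, then capture evidence of what happened,
    # including failures, instead of letting the exception escape.
    try:
        return Evidence(output=task(), error=None)
    except Exception as exc:
        return Evidence(output=None, error=repr(exc))


def score(evidence: Evidence, expected: str) -> float:
    # Step 3: score deterministically. The score is a pure function of the
    # captured evidence, with no clock, network, or randomness involved.
    if evidence.error is not None:
        return 0.0
    return 1.0 if evidence.output == expected else 0.0


def passing_task() -> str:
    return str(6 * 7)


def failing_task() -> str:
    raise RuntimeError("task crashed")


passing_score = score(run_and_capture(passing_task), expected="42")
failing_score = score(run_and_capture(failing_task), expected="42")
```

Keeping capture and scoring as separate steps means stored evidence can be re-scored later without re-running the task, which is one way the three stages might stay comparable across otherwise unrelated harnesses.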