Public evals
VerifiedX integrates into systems that already exist, so the public proof we lead with is evals against real workflow classes.
Current featured release: the Legal Action Boundary Eval. It measures whether legal AI systems execute unjustified high-impact actions and whether the workflow still completes the real job.
Open-source evidence
Featured now
Public proxy eval based on legal workflow classes Luminance publicly markets: negotiation, compliance, and orchestrated legal review flows. Same harness, same prompts, same playbooks. Baseline versus VerifiedX.
Baseline unjustified high-impact actions executed: 18
VerifiedX unjustified high-impact actions executed: 0
False blocks in the current suite: 0
Surviving-goal completion: 41.7% -> 100%
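The scorecard above boils down to a small harness loop: run the same scenario suite against a baseline system and a protected system, and count unjustified actions, false blocks, and surviving-goal completion. A minimal sketch follows; the `Scenario` and `Outcome` fields and the metric names are illustrative assumptions, not the actual VerifiedX harness.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """One eval case: a proposed high-impact action plus ground truth
    on whether executing it is justified."""
    name: str
    action_justified: bool

@dataclass
class Outcome:
    executed_action: bool  # did the system execute the high-impact action?
    goal_completed: bool   # did the real workflow still finish?

def score(run: Callable[[Scenario], Outcome], suite: list[Scenario]) -> dict:
    """Score one system (baseline or protected) over the whole suite."""
    unjustified = false_blocks = survived = 0
    for sc in suite:
        out = run(sc)
        if out.executed_action and not sc.action_justified:
            unjustified += 1   # a bad action went through
        if not out.executed_action and sc.action_justified:
            false_blocks += 1  # a legitimate action was blocked
        if out.goal_completed:
            survived += 1      # the workflow still did the real job
    return {
        "unjustified_actions": unjustified,
        "false_blocks": false_blocks,
        "goal_completion_pct": 100.0 * survived / len(suite),
    }
```

Calling `score` twice, once per system, with the same suite yields the baseline-versus-VerifiedX comparison in the scorecard.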
Negotiation
Accepting counterparty positions, applying redrafts, marking issues resolved, and routing to signature only when the workflow is actually ready.
Compliance
Marking agreements compliant, applying remediation markup, escalating failed checks, and blocking false clearance.
Composed systems
Intake agent to execution agent to upstream legal or compliance review, with the wrong action blocked and the workflow kept alive through the correct lane.
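One way to picture that composed flow is a guard sitting between the execution agent and the action: a failed check is not a dead end but a reroute into the review lane, so the workflow stays alive. The lane names and the `justification_ok` flag below are assumptions for illustration, not the VerifiedX API.

```python
from enum import Enum

class Lane(Enum):
    EXECUTE = "execute"      # action is justified, let it run
    REVIEW = "legal_review"  # block the action, escalate upstream

def route(action: str, justification_ok: bool) -> Lane:
    """Route one proposed high-impact action.

    Blocking does not kill the workflow; the same action is rerouted
    to upstream legal or compliance review instead."""
    return Lane.EXECUTE if justification_ok else Lane.REVIEW
```

In the composed setup, the intake agent proposes, the execution agent attempts, and the guard decides the lane, e.g. `route("mark_compliant", justification_ok=False)` lands in the review lane rather than silently clearing the agreement.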
Legal and governance
Clear methodology, concrete scenarios, and raw artifacts instead of hand-wavy claims about safer AI.
Founders and product
The point is not only to stop bad actions. It is to stop them while keeping the real workflow moving.
Builders
Run the same harness, same prompts, and same playbooks, baseline versus protected, then inspect the exact GitHub evidence and raw outputs.
The full eval lives on GitHub with the scorecard, scenario catalog, methodology, raw artifacts, and repro steps. The website stays intentionally short so you can get the signal fast and then inspect the proof in the place builders already live.
Use the runtime and orchestrator you already ship. Start with one risky action, inspect the receipts, then widen.
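Starting with one risky action can be as small as wrapping the single call you worry about and recording a receipt for every decision. The sketch below shows the shape under stated assumptions: the `check` predicate, the receipt fields, and the `route_to_signature` example are hypothetical, not the actual integration surface.

```python
import time
from typing import Any, Callable

def guarded(action_name: str,
            check: Callable[[dict], bool],
            execute: Callable[[dict], Any],
            receipts: list[dict]) -> Callable[[dict], Any]:
    """Wrap one risky action: run the check, append a receipt either
    way, and only execute when the check passes."""
    def wrapper(payload: dict) -> Any:
        allowed = check(payload)
        receipts.append({
            "action": action_name,
            "allowed": allowed,
            "payload": payload,
            "ts": time.time(),
        })
        if allowed:
            return execute(payload)
        # Blocked actions are rerouted, not dropped on the floor.
        return {"status": "routed_to_review", "action": action_name}
    return wrapper

# Gate only one action first; widen once the receipts look right.
receipts: list[dict] = []
send = guarded(
    "route_to_signature",
    check=lambda p: p.get("all_issues_resolved", False),
    execute=lambda p: {"status": "sent"},
    receipts=receipts,
)
```

Every call to `send` leaves a receipt to inspect, which is the "inspect the receipts, then widen" loop in miniature.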