How we verify

Methodology

The grading pipeline is the central editorial instrument of the platform. Every priority decision is traceable to a heuristic, and every heuristic carries a plain-language description.

A case is a folder of documents the journalist has uploaded. The agent reads every document, assigns it to a topic, generates a graded signal for each topic and each document, and rolls the most severe signal up to the case. The journalist reads the case fastest first: highest severity on top, the rest in order below.

Topics, heuristics, and priority are the three primitives. Nothing else in the system carries editorial weight.

Topics

A topic is a reusable theme the agent infers from the submitted documents (for example, "Procurement," "Communications messaging," "Sanctions evasion"). Each document belongs to exactly one topic. When a new case shares a theme with an older case, the agent assigns it to the existing topic; when no existing topic fits, the agent names a new one.

Topic identity is global across the platform, so two independent leaks about the same procurement scheme will land in the same topic and become directly comparable.

Heuristics

A heuristic is a named, graded signal the agent emits while reading a document. Each heuristic has three fields: a name (for example, sensitivity, claim_supported), a rating in high / medium / low, and a description that explains in plain language how the agent reached that grade.

The heuristic set is open: the agent generates the names it needs for the document in front of it. We do not maintain a closed taxonomy. The benefit is fidelity to the actual evidence; the cost is that two cases will not always have identical heuristic sets, and cross-case comparison happens through topics rather than through fixed metrics.

Examples the agent has emitted in production:

consistency

Solid

All four submissions agree on the figures and the timeline.

references

Sourced

Two independent FOIA returns and one ministry memo support the same fact pattern.

emotive

Even

Language is neutral and procedural. No charged adjectives.

ideology

Neutral

No partisan framing detected. The narrative reads as descriptive.

sensitivity

Severe

Highest sensitivity level in this topic is 3.

claim_supported

Partial

Atlas lacked a purchase order. Evidence: no PO on file.

Sensitivity and topic priority

Every topic carries a sensitivity level on a four-point scale. Sensitivity is derived from the highest-sensitivity heuristic the agent returned for any document in that topic, for that case. The mapping to a priority rating is fixed:

Level 1

Low Priority

Background. Useful context, rarely the next action.

Level 2

Medium Priority

Worth a look. In the queue, not at the top.

Levels 3 and 4

High Priority

Read first. Strongest evidence in the case.

Case priority

A case takes its priority from its most severe topic. When the cases list orders rows on the dashboard, it sorts by case priority first (high, then medium, then low) and then by recency. The case that lands at the top of the list is the next thing the journalist should read.

The three priority levels

High Priority

Strongest evidence in the case. Read first; verify; consider escalation.

Medium Priority

Worth a look. In the queue once the highs are clear.

Low Priority

Background. Context for the rest, rarely the next action.

What the grading does not measure

The grading is a measurement of evidence the agent can see in the documents the journalist uploaded. It is not a probability of truth. A topic graded high means the agent found heuristics of high severity in the documents that landed in it; it does not mean the underlying allegation is more or less likely to be true.

The grading also does not measure legal admissibility, public interest, or the editorial gravity of a finding. Those judgments are made by reporters and editors after the grading is computed.

Heuristics also do not weight against each other inside a topic. Two medium-rated heuristics do not become a high. The severity of the topic is the maximum severity it contains, full stop. This avoids the false confidence of arithmetic over signals that were never numeric to begin with.

The model

The agent is a language model (currently Maple) that runs against a structured prompt. Its job is to read the document, assign it to a topic, and emit graded heuristics. It does not narrate, summarize, or recommend. It is a working part of the grading pipeline, not a contributor to the editorial product.

The model is also replaceable. If a stronger or more specialized model becomes available, the grading layer is swapped without changes to the heuristic schema, the sensitivity scale, or the dashboard.

The grading pipeline is open to inspection. We invite security researchers, journalists, and adversarial red-teamers to audit, critique, and propose improvements.