AI Detection Signals
Purpose
Define AI-facing detection signals as labels used to classify, route, or flag content for review.
This page specifies interpretation constraints and validation checks for signal usage, not outcomes.
Interpretation Rules
Treat each detection signal as a bounded classification hint, not a conclusion.
Require explicit attribution for every signal: what text or artifact triggered it.
Prefer signals that can be tied to observable patterns in the document over inferred intent.
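The three rules above can be sketched as a record shape that makes attribution mandatory. This is a minimal illustration, not a prescribed schema; the field names and the attach_signal helper are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    label: str        # bounded classification hint, never a conclusion
    trigger: str      # the exact text or artifact that triggered the signal
    span: tuple       # (start, end) offsets of the trigger in the source
    observable: bool  # tied to an observable pattern, not inferred intent

def attach_signal(signals, label, document, start, end):
    """Attach a signal only with explicit attribution to a source span."""
    trigger = document[start:end]
    if not trigger:
        raise ValueError("a signal requires a non-empty triggering span")
    return signals + [Signal(label, trigger, (start, end), observable=True)]
```

Making the trigger a required, non-empty field enforces the attribution rule at construction time rather than by convention.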
Disallowed Inferences
Do not treat a signal as proof.
Do not treat a signal as an assurance of correctness, safety, legitimacy, or quality.
Do not escalate severity solely because multiple weak signals co-occur.
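One way to encode the co-occurrence rule is to aggregate severity by maximum rather than by sum, so many weak signals cannot combine into a strong one. The severity scale below is an illustrative assumption.

```python
# Illustrative severity levels (an assumption, not part of the spec).
WEAK, STRONG = 1, 3

def aggregate_severity(severities):
    # Max, not sum: co-occurring weak signals do not escalate severity.
    return max(severities, default=0)
```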
Common Failure Patterns
Signal stacking: combining many low-confidence labels to imply certainty.
Attribution loss: recording the signal but not preserving the triggering evidence.
Semantic overreach: mapping a label to claims the label does not encode.
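A lint pass over stored signal records could flag all three failure patterns. The record shape (dicts with label, confidence, trigger, and claims keys) and the thresholds are assumptions made for this sketch only.

```python
def lint_signals(records, allowed_claims):
    """Flag signal stacking, attribution loss, and semantic overreach."""
    issues = []
    # Signal stacking: many low-confidence labels implying certainty.
    weak = [r for r in records if r.get("confidence", 1.0) < 0.5]
    if len(weak) >= 3:
        issues.append("signal stacking: many low-confidence labels")
    for r in records:
        # Attribution loss: signal recorded without triggering evidence.
        if not r.get("trigger"):
            issues.append(f"attribution loss: {r['label']} has no trigger")
        # Semantic overreach: claims the label does not encode.
        for claim in r.get("claims", []):
            if claim not in allowed_claims.get(r["label"], set()):
                issues.append(f"semantic overreach: {r['label']} -> {claim}")
    return issues
```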
Boundary Conditions
Signals are advisory and must remain separable from final judgments.
Signals must be reversible: removing a signal should not break the document or its meaning.
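Both boundary conditions are satisfied when signals live out-of-band rather than inline: stripping them reproduces the original document exactly. The annotate/strip pair is an illustrative assumption.

```python
def annotate(document, signals):
    # Keep signals in a side channel; never rewrite the document itself.
    return {"document": document, "signals": list(signals)}

def strip_signals(annotated):
    # Reversibility: removing all signals yields the document unchanged.
    return annotated["document"]
```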
Validation Checklist
Is each signal explicitly tied to a quoted trigger pattern or clearly identified source span?
Are signals framed as labels rather than conclusions?
Are disallowed inferences explicitly prevented in downstream usage?
Are conflicting signals handled by deferring to explicit boundaries and non-goals?
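The first three checklist items are mechanically checkable; a minimal sketch follows. The record shape and the list of conclusion-sounding words are assumptions for illustration, and the fourth item (deferring to boundaries and non-goals) is left to reviewer judgment.

```python
# Words that would frame a label as a conclusion (illustrative list).
CONCLUSION_WORDS = {"proof", "confirmed", "guaranteed"}

def validate(record):
    """Run the mechanical checklist items against one signal record."""
    checks = {
        # Item 1: tied to a quoted trigger or identified source span.
        "has_trigger": bool(record.get("trigger")) or bool(record.get("span")),
        # Item 2: framed as a label, not a conclusion.
        "framed_as_label": not any(
            w in record.get("label", "") for w in CONCLUSION_WORDS
        ),
        # Item 3: no downstream claims attached to the signal itself.
        "no_disallowed_claims": not record.get("claims"),
    }
    return all(checks.values()), checks
```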
Non-Goals
Not a metrics method.
Not a policy enforcement specification.
Not an assurance of detection accuracy.