Evidence normalisation

Evidence arrives in many forms. Two organisations publishing the same fact may use different key ordering, different encoding, different whitespace, different locale conventions for numbers and dates. Without normalisation, the same semantic content produces two different cryptographic hashes — defeating reproducible verification.

AIEP’s deterministic normalisation engine (P10) solves this by converting all evidence to a canonical form before hashing, indexing, or cross-node processing. The key property: two independent normalisers processing semantically equivalent source material must produce bit-identical output.


The problem with non-deterministic normalisation

| Issue | What goes wrong |
| --- | --- |
| Key ordering variance | `{"a":1,"b":2}` and `{"b":2,"a":1}` produce different hashes despite being identical |
| Whitespace variance | Additional spaces, newlines, or indentation alter the hash |
| Encoding variance | UTF-8 vs UTF-16, BOM presence, normalisation form (NFC vs NFD) |
| Locale variance | `1,234.56` vs `1.234,56` for the same number |
| Timestamp variance | `2026-03-05T09:00:00Z` vs `2026-03-05T09:00:00+00:00` |
| Container metadata | Spreadsheet internal metadata, PDF creator fields, EXIF data |

Any of these variances produces a different hash for semantically identical content. Non-deterministic normalisation makes distributed verification impossible — two nodes independently hashing the same evidence produce different identifiers.
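The first two rows of the table can be demonstrated directly: serialising with sorted keys and no insignificant whitespace collapses the variants to a single byte sequence. A minimal sketch — the `canonicalise` helper is illustrative, not AIEP's engine:

```python
import hashlib
import json

def canonicalise(obj) -> bytes:
    """Toy canonical form: sorted keys, compact separators, UTF-8."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

a = json.loads('{"a": 1, "b": 2}')
b = json.loads('{"b": 2,  "a": 1}')

# The raw texts differ, so their hashes differ...
assert (hashlib.sha256(b'{"a": 1, "b": 2}').hexdigest()
        != hashlib.sha256(b'{"b": 2,  "a": 1}').hexdigest())
# ...but the canonical forms are bit-identical.
assert canonicalise(a) == canonicalise(b)
```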


The NormalisationProfile

The normalisation engine operates under a version-bound NormalisationProfile — a declared, schema-pinned set of rules that fully specifies how any input of a given type is to be normalised.

A NormalisationProfile declares:

| Rule set | What it specifies |
| --- | --- |
| Parsing rules | How to parse each supported input type |
| Encoding rules | Target encoding (UTF-8 NFC, no BOM) and encoding detection logic |
| Locale rules | Canonical number, date, and currency representation |
| Key ordering rules | Lexicographic stable sort of all object keys |
| Whitespace rules | Elimination of insignificant whitespace |
| Container metadata stripping | Which metadata fields to strip from container formats |
| Lossless constraints | Which transformations are permitted — lossy transformations are rejected |

The profile is versioned and pinned. A normaliser running ProfileVersion 1.0.3 and another normaliser also running 1.0.3 will produce identical output from identical input — regardless of operating system, locale, or hardware.
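A pinned profile can be sketched as an immutable value object: two normalisers configured with the same version compare equal, so agreement on the rule set is checkable. Field names here are illustrative assumptions, not AIEP's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NormalisationProfile:
    """Version-bound rule set; immutable so a pinned version cannot drift."""
    version_id: str                  # e.g. "1.0.3" — also fed into the hash input
    target_encoding: str = "utf-8"   # UTF-8 NFC, no BOM
    unicode_form: str = "NFC"
    sort_keys: bool = True           # lexicographic stable sort of object keys
    strip_whitespace: bool = True    # eliminate insignificant whitespace
    stripped_metadata: tuple = ()    # container metadata fields to remove

# Two independent normalisers pinned to the same version share identical rules.
assert NormalisationProfile(version_id="1.0.3") == NormalisationProfile(version_id="1.0.3")
```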


The normalisation pipeline

Input object
    → InputType detection (declared type + signature classification)
    → NormalisationProfile selection (version-bound)
    → Deterministic parsing under profile rules
    → CanonicalForm generation
    → CanonicalHash = SHA-256(CanonicalForm || ProfileVersionId)
    → NormalisationManifest generation
    → Acceptance or fail-closed rejection

The CanonicalHash formula:

$$\text{CanonicalHash} = H(\text{CanonicalForm} \,\Vert\, \text{ProfileVersionId})$$

The profile version is included in the hash input. A document normalised under profile 1.0.2 produces a different hash from the same document normalised under 1.0.3 — even if the canonical forms are identical. This makes profile version changes detectable.
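The version-binding step can be shown in a few lines. The concatenation order and the UTF-8 encoding of the version id are assumptions for illustration; the property that matters is that identical canonical forms under different profile versions hash differently:

```python
import hashlib

def canonical_hash(canonical_form: bytes, profile_version_id: str) -> str:
    """CanonicalHash = SHA-256(CanonicalForm || ProfileVersionId)."""
    return hashlib.sha256(
        canonical_form + profile_version_id.encode("utf-8")
    ).hexdigest()

form = b'{"a":1,"b":2}'
h_102 = canonical_hash(form, "1.0.2")
h_103 = canonical_hash(form, "1.0.3")
assert h_102 != h_103  # same canonical form, different profile version → different hash
```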


Fail-closed rejection

The normalisation engine does not attempt to recover from ambiguous inputs. If a transformation cannot be performed deterministically under the declared profile — if it would be lossy, or if it depends on parameters not declared in the profile — the upload is rejected and a deterministic rejection record is generated.

The rejection record identifies:

  • Which constraint was violated
  • At which transformation step
  • The input type and profile version in force

This record is itself appended to the evidence ledger. The rejection is auditable — you can see not just what evidence was accepted but what was refused and why.
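A rejection record carrying the three identifiers above might look like the following; the field names and constraint labels are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RejectionRecord:
    """Deterministic fail-closed rejection record (illustrative field names)."""
    violated_constraint: str   # which constraint was violated
    failed_step: str           # at which transformation step
    input_type: str            # the input type in force
    profile_version_id: str    # the profile version in force

record = RejectionRecord(
    violated_constraint="lossless",        # transformation would discard information
    failed_step="locale-normalisation",
    input_type="text/csv",
    profile_version_id="1.0.3",
)
```

Because the record is a pure function of the input and the profile, two nodes rejecting the same upload produce the same record — keeping even refusals reproducible.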


NormalisationManifest

Every accepted normalisation produces a NormalisationManifest that binds:

| Field | Contents |
| --- | --- |
| inputType | Detected and declared input type |
| profileVersionId | The NormalisationProfile version applied |
| transformationSteps | Ordered list of transformation step identifiers |
| canonicalHash | SHA-256 of the canonical form under the profile |
| timestamp | Recorded as data, not as system time dependency |

The manifest is itself hash-bound and appended to the evidence ledger. This means the normalisation process itself is auditable — any node can replay the transformation and verify that the same manifest would result.
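Hash-binding the manifest can be sketched by serialising it canonically (the same sorted-key, compact-separator discipline applied to evidence) and hashing the result. The structure mirrors the table above; the serialisation details are assumptions:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class NormalisationManifest:
    input_type: str
    profile_version_id: str
    transformation_steps: tuple   # ordered step identifiers
    canonical_hash: str
    timestamp: str                # recorded as data, not read from the system clock

def manifest_hash(m: NormalisationManifest) -> str:
    """Hash-bind the manifest so it can itself be appended to the ledger."""
    body = json.dumps(asdict(m), sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body).hexdigest()
```

Since the manifest bytes are themselves canonical, any node replaying the transformation derives the same manifest hash.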


Cross-jurisdiction normalisation (P17)

Evidence originating in different jurisdictions may use jurisdiction-specific conventions — legal date formats, currency representations, statutory reference formats, regulatory identifier schemes. P17 extends the NormalisationProfile model to multi-jurisdiction normalisation profiles.

A multi-jurisdiction profile declares:

  • The source jurisdiction
  • The target canonical representation
  • Jurisdiction-specific parsing rules for known format variants
  • A deterministic mapping from jurisdiction-specific identifiers to canonical identifiers

This enables evidence from a UK regulatory filing and a US regulatory filing describing the same underlying fact to be normalised to the same canonical form — and therefore to the same hash — enabling cross-jurisdiction comparison.
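One concrete jurisdiction-specific rule is date order: UK filings conventionally write day-first, US filings month-first. A sketch of a deterministic mapping to a canonical ISO 8601 form — the format table is a simplified assumption, not P17's actual rule set:

```python
from datetime import datetime

# Illustrative jurisdiction-specific parsing rules.
JURISDICTION_DATE_FORMATS = {
    "UK": "%d/%m/%Y",   # day-first
    "US": "%m/%d/%Y",   # month-first
}

def canonical_date(raw: str, jurisdiction: str) -> str:
    """Map a jurisdiction-specific date to the canonical ISO 8601 form."""
    fmt = JURISDICTION_DATE_FORMATS[jurisdiction]  # unknown jurisdiction → KeyError (fail closed)
    return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")

# The same underlying fact, written per local convention, normalises identically.
assert canonical_date("05/03/2026", "UK") == canonical_date("03/05/2026", "US") == "2026-03-05"
```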


Deterministic replay

Any normalisation can be replayed deterministically. Given:

  • The original input object
  • The NormalisationManifest (which records the profile version and transformation steps)

Any node can re-run the normalisation and verify that the resulting CanonicalHash matches the manifest. This is the basis of cross-node verification in distributed AIEP deployments.
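Replay verification reduces to recomputing the hash and comparing it against the manifest. A sketch with a stand-in normalisation engine (the real engine is driven by the profile; the toy function and the dict-shaped manifest are assumptions):

```python
import hashlib

def verify_replay(input_bytes: bytes, manifest: dict, normalise) -> bool:
    """Re-run the normalisation recorded in a manifest and check the hash matches.

    `normalise` stands in for the deterministic engine:
    (input bytes, profile version id) -> canonical form bytes.
    """
    canonical_form = normalise(input_bytes, manifest["profileVersionId"])
    recomputed = hashlib.sha256(
        canonical_form + manifest["profileVersionId"].encode("utf-8")
    ).hexdigest()
    return recomputed == manifest["canonicalHash"]

# Toy engine: collapse whitespace. Any deterministic function works for the sketch.
def toy_normalise(data: bytes, _version: str) -> bytes:
    return b"".join(data.split())

manifest = {
    "profileVersionId": "1.0.3",
    "canonicalHash": hashlib.sha256(b'{"a":1}' + b"1.0.3").hexdigest(),
}
assert verify_replay(b'{ "a": 1 }', manifest, toy_normalise)
```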

