P10 — AIEP — Deterministic Normalisation Engine for Heterogeneous Data Upload Formats
Publication Date: 2026-02-26
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Ltd
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 (https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1)
Field of the Invention
[0001] The disclosure relates to deterministic data ingestion and canonicalisation in computing systems.
[0002] More particularly, the disclosure concerns a deterministic normalisation engine that transforms heterogeneous uploaded data into a canonical representation suitable for reproducible hash-binding, indexing, and cross-node processing.
Framework Context
[0003] This disclosure operates within the AIEP framework.
[0004] Implementers may refer to related open disclosures in this series for additional architectural context. The present disclosure is independently implementable as described herein.
Background
[0005] Computing systems commonly accept uploads in multiple heterogeneous formats including structured text, delimited tables, semi-structured documents, and binary containers.
[0006] The same semantic content may be represented with different ordering of keys, whitespace, encodings, container metadata, locale-dependent number formats, or time zone representations.
[0007] Conventional ingestion pipelines often normalise opportunistically but do not guarantee that semantically equivalent uploads deterministically map to a single canonical representation across environments.
[0008] Non-deterministic normalisation produces inconsistent cryptographic identifiers, duplicated index entries, unstable audit trails, and failure of reproducible processing across distributed nodes.
[0009] Existing systems do not provide:
(a) deterministic selection of a version-bound normalisation profile prior to parsing;
(b) lossless transformation verification prior to acceptance;
(c) fail-closed suppression of upload acceptance when deterministic conditions cannot be satisfied;
(d) a reproducible transformation manifest bound to the canonical output by cryptographic hash; or
(e) deterministic replay independent of current system time.
[0010] There is a need for a deterministic normalisation engine that produces a canonical serialisation from heterogeneous inputs, records a reproducible transformation manifest, and fails closed when deterministic conditions cannot be satisfied under a declared profile and rule set.
Summary of the Disclosure
[0011] A computer-implemented deterministic normalisation engine is executed by one or more processors.
[0012] An uploaded input object is received together with an InputType identifier derived from declared type metadata and/or signature classification.
[0013] A version-bound NormalisationProfile is selected, comprising parsing rules, encoding rules, locale rules, canonical ordering rules, container metadata stripping rules, and lossless transformation constraints.
[0014] Processors parse the input object strictly under the NormalisationProfile and generate a CanonicalForm by deterministic serialisation of the parsed representation.
[0015] A CanonicalHash is computed as:
CanonicalHash = H(CanonicalForm || ProfileVersionId)
[0016] A NormalisationManifest is generated binding at least InputType, ProfileVersionId, transformation step identifiers, CanonicalHash, and a timestamp recorded as data.
[0017] If any transformation would be lossy or dependent on non-declared environment parameters, acceptance of the upload is suppressed in a fail-closed manner and a deterministic rejection record is generated identifying the violated constraint.
[0018] The technical effect is modification of computing system operation by enforcing deterministic canonicalisation prior to hash computation and indexing, thereby enabling reproducible identification and processing of heterogeneous inputs across distributed nodes.
Brief Description of the Drawings
[0019] Figure 1 illustrates the deterministic ingestion pipeline from uploaded object through InputType detection, NormalisationProfile selection, deterministic parsing, CanonicalForm generation, and CanonicalHash computation.
[0020] Figure 2 illustrates fail-closed enforcement logic applied at each transformation step, showing the decision gate between deterministic continuation and fail-closed rejection.
[0021] Figure 3 illustrates NormalisationManifest binding architecture showing the cryptographic relationship between CanonicalForm, ProfileVersionId, and the manifest output.
[0022] Figure 4 illustrates deterministic replay verification across distributed nodes operating under identical ProfileVersionId.
ASCII Drawings
Figure 1 — Deterministic Ingestion Pipeline
+-------------------------+
|     Uploaded Object     |
+------------+------------+
             |
             v
+------------+------------+
|   InputType Detection   |
|  (signature + metadata) |
+------------+------------+
             |
             v
+------------+------------+
|  NormalisationProfile   |
|        Selection        |
|     (Version-Bound)     |
+------------+------------+
             |
             v
+------------+------------+
|   Deterministic Parse   |
|  (Profile Rules Only)   |
+------------+------------+
             |
             v
+------------+------------+
|      CanonicalForm      |
+------------+------------+
             |
             v
+------------+------------+
|      CanonicalHash      |
|   H(CanonicalForm ||    |
|    ProfileVersionId)    |
+-------------------------+
Figure 2 — Fail-Closed Enforcement Gate
     +---------------------------+
     |    Transformation Step    |
     +-------------+-------------+
                   |
                   v
       +-----------+-----------+
       | Lossless under        |
       | NormalisationProfile? |
       +-----------+-----------+
                   |
        +----------+----------+
        No                   Yes
        |                     |
        v                     v
  +-----------+        +-------------+
  | REJECT    |        | Continue    |
  | Fail-     |        | Pipeline    |
  | Closed    |        +-------------+
  |           |
  | Generate  |
  | Rejection |
  | Record    |
  +-----------+
Figure 3 — Manifest Binding Architecture
CanonicalForm -------+
                      \
                       +--> H(CanonicalForm || ProfileVersionId)
                      /                      |
ProfileVersionId ----+                       |
                                             v
                              +-----------------------------+
                              | NormalisationManifest       |
                              | - InputType                 |
                              | - ProfileVersionId          |
                              | - Parser Identity           |
                              | - Transformation Step IDs   |
                              | - CanonicalHash             |
                              | - Timestamp (stored data)   |
                              +--------------+--------------+
                                             |
                                             v
                              +--------------+--------------+
                              | Append-Only Repository      |
                              +-----------------------------+
Figure 4 — Deterministic Replay Across Distributed Nodes
      Node A                          Node B
+-----------------+             +-----------------+
|  Input Object   |             |  Input Object   |
|   (identical)   |             |   (identical)   |
+--------+--------+             +--------+--------+
         |                               |
         v                               v
+--------+--------+             +--------+--------+
| ProfileVersionId|             | ProfileVersionId|
|   (identical)   |             |   (identical)   |
+--------+--------+             +--------+--------+
         |                               |
         v                               v
+--------+--------+             +--------+--------+
|  CanonicalForm  |             |  CanonicalForm  |
+--------+--------+             +--------+--------+
         |                               |
         v                               v
+--------+--------+             +--------+--------+
|  CanonicalHash  |             |  CanonicalHash  |
+--------+--------+             +--------+--------+
         |                               |
         +---------------+---------------+
                         |
                         v
              +---------------------+
              | Hash Equivalence    |
              | Verification        |
              | (stored timestamps  |
              |  used; does not     |
              |  depend on current  |
              |  system time)       |
              +----------+----------+
                         |
            +------------+------------+
            No                       Yes
            |                         |
            v                         v
      +-----------+           +---------------+
      | Execution |           |   Execution   |
      | Suppressed|           |    Enabled    |
      +-----------+           +---------------+
Detailed Description
1. Input Classification and Profile Selection
[0023] Upon receipt of an uploaded input object, one or more processors determine an InputType identifier.
[0024] InputType determination uses declared MIME type metadata, file signature classification, or explicit user-supplied type declaration.
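The classification step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the signature table, the function name `determine_input_type`, and the policy of rejecting an upload whose declared type conflicts with its signature are all assumptions introduced here.

```python
from typing import Optional

# Hypothetical signature table: leading magic bytes -> InputType identifier.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def determine_input_type(data: bytes, declared_mime: Optional[str] = None) -> str:
    """Return an InputType identifier or raise ValueError (assumed fail-closed policy)."""
    detected = None
    for magic, input_type in SIGNATURES:
        if data.startswith(magic):
            detected = input_type
            break
    # Assumption: a declared type that contradicts the signature is rejected
    # rather than silently overridden.
    if detected and declared_mime and detected != declared_mime:
        raise ValueError(
            f"declared type {declared_mime!r} conflicts with signature {detected!r}"
        )
    result = detected or declared_mime
    if result is None:
        raise ValueError("InputType could not be determined")
    return result
```

The function consults only the bytes of the object and the declared metadata, so the same upload yields the same InputType on every node.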
[0025] A NormalisationProfile corresponding to the InputType is retrieved and identified by a ProfileVersionId.
[0026] The NormalisationProfile specifies:
(a) permitted parsers and parser versions;
(b) stable key ordering rules for structured data objects;
(c) canonical numeric representation rules including precision and decimal separator;
(d) encoding normalisation rules including character set and byte order;
(e) locale rules governing number formats and date representations;
(f) time zone normalisation rules;
(g) container metadata stripping rules; and
(h) lossless transformation constraints identifying permitted transformations that do not remove semantic content.
[0027] Profile selection is deterministic and schema-version bound. Identical InputType under identical ProfileVersionId produces identical profile selection.
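One way to picture version-bound selection is an immutable registry keyed by InputType and ProfileVersionId. The sketch below is illustrative only: the field names, the registry contents, and the identifier `csv-profile/1.0.0` are hypothetical and cover only a subset of the rule categories listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalisationProfile:
    """Hypothetical subset of the rule categories a profile declares."""
    profile_version_id: str
    input_type: str
    parser: str
    encoding: str
    decimal_separator: str
    utc_offset_minutes: int


# Hypothetical registry; a real deployment would load published, versioned profiles.
_REGISTRY = {
    ("text/csv", "csv-profile/1.0.0"): NormalisationProfile(
        profile_version_id="csv-profile/1.0.0",
        input_type="text/csv",
        parser="csv-strict/1",
        encoding="utf-8",
        decimal_separator=".",
        utc_offset_minutes=0,
    ),
}


def select_profile(input_type: str, profile_version_id: str) -> NormalisationProfile:
    """Pure lookup: the result depends only on the two identifiers."""
    try:
        return _REGISTRY[(input_type, profile_version_id)]
    except KeyError:
        raise LookupError(
            f"no profile {profile_version_id!r} for InputType {input_type!r}"
        ) from None
```

Because the profile object is frozen and the lookup has no fallback or "latest version" behaviour, an unknown combination raises instead of selecting a default.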
2. Deterministic Parsing and Canonical Serialisation
[0028] Processors parse the input object into an intermediate representation using only parsing rules declared in the selected NormalisationProfile.
[0029] No parsing step may depend on environment parameters not declared within the NormalisationProfile.
[0030] The intermediate representation is serialised into a CanonicalForm using deterministic serialisation rules enforcing stable key ordering and stable numeric representation.
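As an illustration of stable key ordering and stable numeric representation, the sketch below serialises a JSON-like intermediate representation. The concrete rules chosen (keys sorted by code point, no insignificant whitespace, fixed-point decimals without trailing zeros, rejection of binary floats) are assumptions made for this example; the disclosure leaves the actual rules to the NormalisationProfile.

```python
import json
from decimal import Decimal


def _number(value: Decimal) -> str:
    """Render a Decimal in fixed-point form without exponent or trailing zeros."""
    if not value.is_finite():
        raise ValueError("non-finite numbers have no canonical representation")
    if value == 0:
        return "0"  # collapses 0.0, -0 and 0E+3 to one spelling
    return format(value.normalize(), "f")


def _encode(value) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Decimal):
        return _number(value)
    if isinstance(value, float):
        raise TypeError("binary floats are rejected; parse numbers as Decimal")
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ",".join(_encode(item) for item in value) + "]"
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("object keys must be strings")
        parts = (
            json.dumps(key, ensure_ascii=False) + ":" + _encode(value[key])
            for key in sorted(value)  # stable ordering by Unicode code point
        )
        return "{" + ",".join(parts) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def canonical_form(value) -> bytes:
    """Serialise the intermediate representation to UTF-8 bytes."""
    return _encode(value).encode("utf-8")
```

Parsing with `json.loads(text, parse_float=Decimal)` keeps decimal literals exact, so inputs that differ only in key order, whitespace, or trailing zeros serialise to the same bytes.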
[0031] For delimited inputs, column ordering is determined by a schema mapping rule and values are normalised according to declared locale and precision constraints.
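A hedged sketch of this rule for a delimited table follows. All parameter names are hypothetical stand-ins for values a NormalisationProfile would declare, and rejecting a value that carries more fractional digits than the declared scale is an assumed reading of the precision constraint.

```python
import csv
import io
from decimal import Decimal, InvalidOperation
from typing import List, Sequence, Set


def normalise_delimited(
    text: str,
    column_order: Sequence[str],   # schema mapping rule (assumed: explicit list)
    numeric_columns: Set[str],
    delimiter: str,
    decimal_separator: str,
    group_separator: str,
    scale: int,                    # declared number of fractional digits
) -> List[List[str]]:
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or set(reader.fieldnames) != set(column_order):
        raise ValueError("columns do not match the schema mapping")
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal("0.01")
    rows = []
    for record in reader:
        if None in record or None in record.values():
            raise ValueError("ragged row")
        row = []
        for name in column_order:
            raw = record[name]
            if name in numeric_columns:
                plain = raw.replace(group_separator, "").replace(decimal_separator, ".")
                try:
                    value = Decimal(plain)
                except InvalidOperation:
                    raise ValueError(f"not a number: {raw!r}") from None
                fixed = value.quantize(quantum)
                if fixed != value:
                    raise ValueError(f"lossy at scale {scale}: {raw!r}")
                row.append(format(fixed, "f"))
            else:
                row.append(raw)
        rows.append(row)
    return rows
```

For example, a semicolon-delimited row `1.234,50;alice` under header `amount;name`, with declared column order `name, amount`, becomes `["alice", "1234.50"]` regardless of the locale of the machine running the code.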
[0032] For document container inputs, non-semantic container metadata is removed and embedded resources are ordered according to deterministic resource ordering rules.
[0033] For time values, normalisation is performed under declared time zone rules and resulting values are stored as canonical data fields within the CanonicalForm.
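The following sketch shows one possible time rule: ISO 8601 inputs are converted to UTC, and a value without an offset is interpreted under an offset declared by the profile. The fixed-offset parameter and the microsecond-precision output format are assumptions for this example; a profile could equally declare a named zone with versioned zone data.

```python
from datetime import datetime, timedelta, timezone


def normalise_time(text: str, declared_offset_minutes: int) -> str:
    """Return a UTC timestamp string; never consults the host's local zone."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Naive value: apply the offset declared in the profile (hypothetical field).
        declared = timezone(timedelta(minutes=declared_offset_minutes))
        parsed = parsed.replace(tzinfo=declared)
    # Fixed-width output so equal instants always serialise identically.
    return parsed.astimezone(timezone.utc).isoformat(timespec="microseconds")
```

Two spellings of the same instant, one with an explicit `+02:00` offset and one naive value read under a declared 120-minute offset, produce the same canonical field.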
3. Hash Binding and Manifest Generation
[0034] A CanonicalHash is computed as:
CanonicalHash = H(CanonicalForm || ProfileVersionId)
[0035] Any change to profile version produces a distinct CanonicalHash, ensuring that canonical identity is version-bound.
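A minimal sketch of the hash binding follows. The disclosure does not name the hash function H or the byte encoding of ProfileVersionId; SHA-256 and UTF-8 are assumptions made here, and `||` is taken as plain byte concatenation.

```python
import hashlib


def canonical_hash(canonical_form: bytes, profile_version_id: str) -> str:
    """Compute H(CanonicalForm || ProfileVersionId) with H assumed to be SHA-256."""
    digest = hashlib.sha256()
    digest.update(canonical_form)
    digest.update(profile_version_id.encode("utf-8"))  # assumed encoding
    return digest.hexdigest()
```

Hashing the same CanonicalForm under two different ProfileVersionId values yields two different digests, which is the version-binding property described above.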
[0036] A NormalisationManifest is generated comprising at least:
(a) InputType identifier;
(b) ProfileVersionId;
(c) parser identity;
(d) transformation step identifiers in application order;
(e) CanonicalHash; and
(f) timestamp recorded as a data field within the manifest.
[0037] The CanonicalForm and NormalisationManifest are appended to an append-only repository for later verification and replay.
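The manifest and repository can be sketched as below. The class and field names are hypothetical, the in-memory list merely stands in for durable append-only storage, and the timestamp is passed in by the caller so that it is stored as data rather than read from the clock during serialisation.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NormalisationManifest:
    input_type: str
    profile_version_id: str
    parser_identity: str
    transformation_steps: Tuple[str, ...]  # in application order
    canonical_hash: str
    timestamp: str                         # recorded as data, supplied by caller


class AppendOnlyRepository:
    """In-memory stand-in: exposes append and read, no update or delete."""

    def __init__(self) -> None:
        self._entries: List[Tuple[bytes, bytes]] = []

    def append(self, canonical_form: bytes, manifest: NormalisationManifest) -> int:
        # Serialise the manifest deterministically before storing it.
        record = json.dumps(
            asdict(manifest), sort_keys=True, separators=(",", ":")
        ).encode("utf-8")
        self._entries.append((canonical_form, record))
        return len(self._entries) - 1

    def read(self, index: int) -> Tuple[bytes, bytes]:
        return self._entries[index]
```

A production repository would additionally need tamper evidence, for example by chaining entry hashes; that is outside what this sketch shows.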
4. Fail-Closed Acceptance Gate
[0038] Prior to acceptance, processors verify that each applied transformation is lossless under the NormalisationProfile constraints.
[0039] Processors further verify that no transformation step depended on:
(a) undefined locale state;
(b) non-declared time zone state;
(c) non-deterministic ordering; or
(d) lossy conversion not permitted by the profile.
[0040] If any verification fails, processors suppress acceptance of the upload in a fail-closed manner.
[0041] A deterministic rejection record is generated identifying:
(a) the violated constraint;
(b) the ProfileVersionId under which violation was detected; and
(c) the transformation step identifier at which failure occurred.
[0042] The rejection record is appended to the append-only repository.
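The acceptance gate can be sketched as a pipeline runner that stops at the first violated constraint and returns a rejection record instead of output. The names `RejectionRecord`, `ConstraintViolation`, `run_pipeline`, and the example step are hypothetical; treating any unexpected exception as a rejection is an assumed interpretation of fail-closed behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple, Union


@dataclass(frozen=True)
class RejectionRecord:
    violated_constraint: str
    profile_version_id: str
    step_id: str


class ConstraintViolation(Exception):
    def __init__(self, constraint: str) -> None:
        super().__init__(constraint)
        self.constraint = constraint


Step = Tuple[str, Callable[[bytes], bytes]]


def run_pipeline(
    data: bytes, steps: Sequence[Step], profile_version_id: str
) -> Union[bytes, RejectionRecord]:
    current = data
    for step_id, transform in steps:
        try:
            current = transform(current)
        except ConstraintViolation as exc:
            return RejectionRecord(exc.constraint, profile_version_id, step_id)
        except Exception:
            # Fail closed: an unexpected error never yields partial output.
            return RejectionRecord("unexpected-error", profile_version_id, step_id)
    return current


def decode_utf8_strict(data: bytes) -> bytes:
    """Example step: accept only well-formed UTF-8; never substitute characters."""
    try:
        return data.decode("utf-8").encode("utf-8")
    except UnicodeDecodeError:
        raise ConstraintViolation("encoding.utf8-strict") from None
```

The rejection record contains no clock reading or host-specific detail, so the same invalid upload produces the same record on every node.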
5. Deterministic Replay
[0043] Deterministic replay is performed by re-applying the NormalisationProfile to the same input object and verifying that the recomputed CanonicalHash matches the stored value.
[0044] Replay recomputes cryptographic hashes using stored fields, including timestamps recorded as data within entries of the append-only repository.
[0045] Replay does not depend on current system time.
[0046] Distributed nodes operating with identical input objects and identical ProfileVersionId produce identical CanonicalForm and identical CanonicalHash values, thereby enabling cross-node equivalence verification.
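Replay verification might look like the sketch below. The `normalise` callable stands in for re-applying the selected NormalisationProfile, and SHA-256 with UTF-8 encoding of ProfileVersionId is again an assumption, since the disclosure does not fix H.

```python
import hashlib
import hmac
from typing import Callable


def replay_matches(
    input_object: bytes,
    normalise: Callable[[bytes], bytes],  # hypothetical: applies the profile
    profile_version_id: str,
    stored_hash: str,
) -> bool:
    """Recompute the CanonicalHash from stored inputs; no clock is consulted."""
    canonical_form = normalise(input_object)
    recomputed = hashlib.sha256(
        canonical_form + profile_version_id.encode("utf-8")
    ).hexdigest()
    return hmac.compare_digest(recomputed, stored_hash)
```

Running the same function on two nodes with the same input object and ProfileVersionId gives the same boolean result; a mismatch indicates that the input, the profile, or the stored hash differs.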
Claims
1. A computer-implemented method for deterministic normalisation of heterogeneous uploaded data, the method comprising:
(a) receiving an input object and determining an InputType identifier from declared metadata or signature classification;
(b) selecting a version-bound NormalisationProfile identified by a ProfileVersionId, the NormalisationProfile defining parsing rules, encoding rules, locale rules, canonical ordering rules, and lossless transformation constraints;
(c) parsing the input object strictly under the NormalisationProfile and generating a CanonicalForm by deterministic serialisation of the parsed representation;
(d) computing a CanonicalHash as H(CanonicalForm || ProfileVersionId);
(e) generating a NormalisationManifest binding the InputType identifier, ProfileVersionId, transformation step identifiers, CanonicalHash, and a timestamp recorded as data; and
(f) upon determining that any transformation is lossy or dependent on non-declared environment parameters, suppressing acceptance of the input object in a fail-closed manner and generating a deterministic rejection record identifying the violated constraint.
2. The method of claim 1 wherein the NormalisationProfile defines stable key ordering rules and stable numeric representation rules for serialisation of structured data objects.
3. The method of claim 1 wherein the NormalisationManifest is appended as an immutable entry to an append-only repository together with the CanonicalForm.
4. The method of claim 1 wherein the rejection record comprises the violated constraint identifier, the ProfileVersionId, and the transformation step identifier at which failure occurred.
5. The method of claim 1 wherein replay recomputes the CanonicalHash using stored CanonicalForm and ProfileVersionId and does not depend on current system time.
6. The method of claim 1 wherein identical input objects processed under identical ProfileVersionId on distributed nodes produce identical CanonicalForm and identical CanonicalHash values.
7. A computing system comprising one or more processors and memory storing instructions which, when executed, cause the processors to perform the method of any of claims 1 to 6.
8. A non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the method of any of claims 1 to 6.
Licence
Any person is granted a perpetual, irrevocable, worldwide, royalty-free licence to make, use, implement, modify, or distribute any system or method described in this disclosure for any purpose, without restriction, under the Apache License 2.0.
A copy of the Apache License 2.0 is available at https://www.apache.org/licenses/LICENSE-2.0
Abstract
A deterministic normalisation engine for heterogeneous data upload formats is disclosed. An input object is received with an InputType identifier and a version-bound NormalisationProfile defining parsing, encoding, locale, ordering, and lossless transformation constraints. Processors parse the input strictly under the profile, generate a CanonicalForm by deterministic serialisation, and compute a CanonicalHash as H(CanonicalForm || ProfileVersionId). A NormalisationManifest binds the profile, transformation steps, and CanonicalHash as an immutable append-only entry. Acceptance is suppressed in a fail-closed manner when any transformation is lossy or dependent on non-declared environment parameters, and a deterministic rejection record is generated. Replay recomputes hashes using stored fields and does not depend on current system time. The system modifies computing operation by enforcing deterministic canonicalisation prior to hash computation and indexing, enabling reproducible identification and processing across heterogeneous inputs and distributed nodes.