P274 — AIEP — Canonical JSON Serialisation Composition
Publication Date: 2026-04-12
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Limited
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 — https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1
Patent ID: P274
Classification: OS — Open Source Prior Art
Implemented in: AIEP GENOME SDK kernel/canon/canon.py — canonical_json()
Related filings: GB2519711.2 (AIEP Core Protocol)
Field of the Invention
[0001] The disclosure relates to deterministic serialisation of structured data objects for use in cryptographic hash computation.
[0002] More particularly, the disclosure concerns a composed serialisation function for producing canonical JSON — a byte-identical UTF-8 string representation of a normalised Python object — through application of three independently-justified serialisation parameters whose combined effect eliminates all cross-platform sources of serialisation divergence.
Framework Context
[0003] This disclosure operates within the AIEP framework.
[0004] The AIEP protocol uses SHA-256 hashes of canonical JSON strings as artefact identity proofs. Two implementations that produce different JSON strings for the same input will produce different hash values, breaking cross-platform protocol interoperability. Canonical JSON production is therefore a security property of the protocol, not only a convenience.
[0005] The normalisation step that prepares any Python object for serialisation is described in P273 (_norm_object()) and P270 (_canon_number_str()). This disclosure covers the subsequent serialisation step: composing the canonical_json() function from the normalised form.
Background
[0006] Python’s json.dumps() function accepts several keyword arguments that affect the output string:
- sort_keys: when True, dictionary keys are emitted in sorted order;
- separators: a two-element tuple (item_separator, key_separator) controlling whitespace between tokens;
- ensure_ascii: when True, non-ASCII characters are escaped as \uXXXX sequences; when False, they are emitted as literal characters, which become multi-byte sequences once the string is encoded as UTF-8;
- indent: when an integer, emits the JSON in multi-line pretty-printed form.
[0007] Different combinations of sort_keys, separators, and ensure_ascii each produce a valid JSON document from the same input object, but they can produce different byte sequences and therefore different SHA-256 hashes. Without a canonical choice, two implementations that make different but individually reasonable choices will produce incompatible hashes.
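The divergence can be observed directly. The following is a minimal sketch, not part of the AIEP SDK: the sample record and variable names are assumptions chosen for illustration. It serialises one object under each combination of a default or compact separators value with both settings of sort_keys and ensure_ascii, and counts the distinct SHA-256 digests.

```python
import hashlib
import itertools
import json

# Hypothetical sample record: keys inserted out of sorted order, one non-ASCII value.
record = {"value": "café", "id": "a1"}

digests = set()
for sort_keys, separators, ensure_ascii in itertools.product(
    (False, True),
    ((", ", ": "), (",", ":")),  # Python default form vs compact form
    (False, True),
):
    text = json.dumps(
        record,
        sort_keys=sort_keys,
        separators=separators,
        ensure_ascii=ensure_ascii,
    )
    digests.add(hashlib.sha256(text.encode("utf-8")).hexdigest())

# Every combination is valid JSON for the same object, yet for this record
# no two combinations share a digest.
print(len(digests))
```

For this particular record each parameter changes the output independently, so all eight combinations hash differently. A record with a single ASCII-only key would show fewer distinct digests.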
[0008] The problem is compounded across programming languages. JavaScript’s JSON.stringify() has no equivalent of sort_keys: key ordering in the output follows the property order of the object, which for string keys is insertion order and therefore depends on the code that constructed the object. Go’s encoding/json marshaller sorts the keys of map[string]interface{} values in byte order. For valid UTF-8 keys, byte order coincides with Unicode code point order, but it can differ from the UTF-16 code unit order used by default string comparison in JavaScript when keys contain characters outside the Basic Multilingual Plane.
[0009] Existing canonical JSON specifications (e.g., RFC 8785 — JSON Canonicalization Scheme; the original Canonical JSON draft) define specific serialisation rules but are either not implemented in all major languages, not aligned with AIEP’s normalisation stage (P273, P270), or both.
[0010] There exists a need for a documented canonical JSON serialisation composition that:
(a) specifies each parameter of json.dumps() as a deliberate protocol decision with explicit rationale;
(b) is composable with the P273/P270 normalisation stage;
(c) produces identical output in Python, JavaScript, Go, and any other language implementing the same rules; and
(d) defines when canonical_json() is the appropriate primitive and when concat_hash() is used instead.
Summary of the Disclosure
[0011] A canonical JSON serialisation function is provided, defined as the composition of a normalisation stage and a serialisation stage with three specified parameters.
[0012] The canonical JSON function is:
canonical_json(obj) := json.dumps(
    _norm_object(obj),
    sort_keys=True,
    separators=(",", ":"),
    ensure_ascii=False
)
[0013] The normalisation stage _norm_object(obj) is defined in P273 and P270. The serialisation stage is json.dumps() with the three parameters specified in [0012]. The output is a Python str value. The SHA-256 hash is computed over the UTF-8 encoding of this string.
[0014] Each of the three parameters is a deliberate protocol decision:
- sort_keys=True ensures deterministic key ordering. Dictionary key ordering in Python 3.7+ is insertion-order preserving, not sorted; sort_keys=True overrides this with ordering by Unicode code point (ASCII order for ASCII keys), providing cross-platform determinism as a belt-and-suspenders guarantee over the normalisation stage.
- separators=(",", ":") eliminates all whitespace between tokens. The default Python separators value is (", ", ": "), with a trailing space in each element, which produces different bytes than compact form. Compact form is the standard interoperability choice.
- ensure_ascii=False emits non-ASCII characters literally rather than as \uXXXX escape sequences, so they become UTF-8 bytes when the string is encoded. Because P273 enforces NFC normalisation of string values, the UTF-8 bytes are deterministic. \uXXXX escapes are a divergence risk for characters outside the Basic Multilingual Plane, which require surrogate pairs that implementations do not emit consistently.
[0015] The technical effect is modification of a computing system’s data serialisation behaviour such that the same structured object produces bit-identical JSON bytes in all conforming implementations across all languages and execution environments, enabling SHA-256 hash interoperability.
Brief Description of the Drawings
[0016] Figure 1 illustrates the two-stage composition: normalisation (P273/P270) then serialisation (this disclosure).
[0017] Figure 2 shows the R1–R5 rule coverage table mapping each canonical_json parameter to the protocol conformance rules it enforces.
[0018] Figure 3 shows the decision table for choosing canonical_json() versus concat_hash().
ASCII Drawings
Figure 1 — Two-Stage Canonical JSON Pipeline
Input Python object (any JSON-serialisable type)
        |
        v
_norm_object(obj)                [P273: Bool-Guarded dispatch,
        |                         P270: _canon_number_str]
        v
Normalised Python object
  (all types reduced to str/int/list/dict/None,
   no floats, no bools, strings NFC-normalised)
        |
        v
json.dumps(normed,
    sort_keys=True,              [R1: key ordering]
    separators=(",", ":"),       [R2: no whitespace]
    ensure_ascii=False)          [R3: NFC UTF-8 bytes]
        |
        v
canonical_json_str  (Python str, UTF-8 encoded downstream)
        |
        v
SHA-256(canonical_json_str.encode("utf-8"))
        |
        v
sha256_hex
Figure 2 — Protocol Rule Coverage
Parameter                | Rule | Rule Description
-------------------------+------+-------------------------------------------------------
sort_keys=True           | R1   | Key ordering must be deterministic and cross-platform
separators=(",", ":")    | R2   | Output must contain no whitespace between tokens
ensure_ascii=False       | R3   | String values emitted as NFC UTF-8, not \uXXXX escapes
_norm_object (P273)      | R4   | Bool values rendered as canonical string tokens
_canon_number_str (P270) | R5   | Numeric values rendered as minimal decimal string
Figure 3 — canonical_json vs concat_hash Decision Table
Scenario | Primitive to use
----------------------------------------------+-----------------
Hashing a structured record (dict/object) | canonical_json
Hashing a flat sequence of known-type values | concat_hash
Constructing a hash from sub-hashes | concat_hash
Any case involving a dict with string keys | canonical_json
Cross-language interoperability is required | canonical_json for dicts
and the input is a flat field list | concat_hash for sequences
Detailed Description of Embodiments
The Composition
[0019] The canonical JSON function is defined in kernel/canon/canon.py as:
import json

# _norm_object() is defined in P273.
def canonical_json(obj: object) -> str:
    """Return the canonical JSON string for obj.

    Parameters are deliberate protocol decisions, not style choices.
    sort_keys=True         R1 key ordering guarantee (belt-and-suspenders for ASCII)
    separators=(",", ":")  R2 no whitespace between tokens
    ensure_ascii=False     R3 NFC UTF-8 bytes, not \\uXXXX escapes
    """
    return json.dumps(
        _norm_object(obj),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
[0020] _norm_object() is defined in P273. It is called inside canonical_json() and must be called before json.dumps(). Calling json.dumps() directly on an un-normalised object is a protocol error that may produce non-canonical output.
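To make the composition concrete, the following is a minimal runnable sketch. The names _norm_object_stub and canonical_json_sketch are hypothetical, and the stub is only a stand-in: it does not reproduce the P273 or P270 rules. It NFC-normalises strings (treating keys the same way is an assumption of this sketch) and refuses bools and floats, whose canonical tokens are defined in those disclosures.

```python
import json
import unicodedata


def _norm_object_stub(obj):
    """Hypothetical stand-in for P273's _norm_object(); NOT the real rules.

    It NFC-normalises strings and recurses into dicts and lists. Bools and
    floats are rejected because their canonical tokens are defined in P273
    and P270, which this sketch does not reproduce.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(obj, (bool, float)):
        raise TypeError("bool/float handling is defined in P273/P270")
    if obj is None or isinstance(obj, int):
        return obj
    if isinstance(obj, str):
        return unicodedata.normalize("NFC", obj)
    if isinstance(obj, list):
        return [_norm_object_stub(item) for item in obj]
    if isinstance(obj, dict):
        # Normalising keys as well is an assumption of this sketch.
        return {
            unicodedata.normalize("NFC", key): _norm_object_stub(value)
            for key, value in obj.items()
        }
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def canonical_json_sketch(obj):
    # Same three serialisation parameters as canonical_json() in [0019].
    return json.dumps(
        _norm_object_stub(obj),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


record = {"value": "hello", "id": "a1", "tags": ["x", None, 3]}
print(canonical_json_sketch(record))
# {"id":"a1","tags":["x",null,3],"value":"hello"}
```

The stub raises rather than guessing at bool and float output, so the sketch cannot silently produce a string that differs from a conforming implementation for those types.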
sort_keys=True
[0021] In Python 3.7+, dictionary key order is guaranteed to be insertion order. This is useful for most Python code but is not canonical: two dicts with the same key-value pairs but different insertion orders produce different json.dumps() output without sort_keys. The normalisation step (P273) converts the object to a plain dict but does not impose key ordering. sort_keys=True provides a belt-and-suspenders guarantee: whatever order the caller or the normalisation stage leaves the keys in, the serialiser emits them in sorted order.
[0022] For ASCII-only keys (the common case in AIEP protocol records), sort_keys=True sorts by ASCII code point, so uppercase letters precede lowercase ones; this is equivalent to UTF-8 byte order. Cross-language implementations that apply the same UTF-8 byte-order sort will produce the same key ordering.
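The effect of sort_keys can be checked with a short sketch. The sample dicts are illustrative assumptions; only standard json.dumps() behaviour is used.

```python
import json

# Same key-value pairs, different insertion order.
first = {"id": "a1", "value": "hello"}
second = {"value": "hello", "id": "a1"}

compact = {"separators": (",", ":"), "ensure_ascii": False}

# Without sort_keys the output follows insertion order and diverges.
unsorted_first = json.dumps(first, **compact)
unsorted_second = json.dumps(second, **compact)

# With sort_keys=True both dicts serialise identically.
sorted_first = json.dumps(first, sort_keys=True, **compact)
sorted_second = json.dumps(second, sort_keys=True, **compact)

print(unsorted_first == unsorted_second)  # False
print(sorted_first == sorted_second)      # True

# Ordering is by code point, not dictionary-style alphabetical order:
# "Z" (0x5A) sorts before "a" (0x61).
mixed_case = json.dumps({"a": 1, "Z": 2}, sort_keys=True, **compact)
print(mixed_case)  # {"Z":2,"a":1}
```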
separators=(",", ":")
[0023] Python’s default separators value is (", ", ": "): both separators include a trailing space. This produces {"id": "a1", "value": "hello"} with spaces. Canonical form requires {"id":"a1","value":"hello"} with no spaces. Passing separators=(",", ":") removes the space after every comma and colon. Every conforming implementation in every language must use the compact delimiter form.
[0024] This parameter cannot be omitted. An implementation that omits it falls back to Python’s default separators, which include spaces, and produces non-canonical output without raising any error.
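A brief sketch of the difference, using an illustrative record. It also shows that separators affects only the delimiters between tokens, not commas, colons, or spaces inside string values.

```python
import json

record = {"id": "a1", "value": "hello"}

default_form = json.dumps(record, sort_keys=True)
compact_form = json.dumps(record, sort_keys=True, separators=(",", ":"))

print(default_form)  # {"id": "a1", "value": "hello"}
print(compact_form)  # {"id":"a1","value":"hello"}

# Whitespace and punctuation inside a string value are left untouched.
inside_value = json.dumps({"note": "a, b: c"}, separators=(",", ":"))
print(inside_value)  # {"note":"a, b: c"}
```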
ensure_ascii=False
[0025] Python’s default ensure_ascii=True encodes any character with code point > 127 as a \uXXXX escape sequence. For example, the string "café" would become "caf\u00e9". With ensure_ascii=False, the same string is serialised as "café": the é character is emitted literally and becomes the byte sequence \xc3\xa9 when the output is encoded as UTF-8.
[0026] The NFC normalisation enforced by _norm_object() (P273) ensures that canonically equivalent Unicode strings are reduced to a single NFC form. Given a unique NFC string, the UTF-8 encoding of that string is unique. Therefore the combination of ensure_ascii=False and NFC normalisation produces a unique byte sequence for any string input.
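The argument can be illustrated with the two common spellings of "café". This is a minimal sketch that calls unicodedata.normalize() directly; it stands in for, and does not reproduce, the normalisation performed by _norm_object().

```python
import json
import unicodedata

composed = "caf\u00e9"      # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e followed by combining acute accent, U+0301

# The two spellings render alike but are different strings and different bytes.
assert composed != decomposed
assert composed.encode("utf-8") != decomposed.encode("utf-8")

# After NFC normalisation both collapse to the same string ...
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)

# ... so serialising with ensure_ascii=False yields the same UTF-8 bytes.
bytes_a = json.dumps(nfc_a, ensure_ascii=False).encode("utf-8")
bytes_b = json.dumps(nfc_b, ensure_ascii=False).encode("utf-8")
print(bytes_a == bytes_b)  # True
print(bytes_a)             # b'"caf\xc3\xa9"'
```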
[0027] The \uXXXX escape form is rejected because: (a) for characters outside the Basic Multilingual Plane, Python emits surrogate-pair escapes (\ud83d\ude00 for 😀), while serialisers in other languages may emit the character literally or format the escape differently; (b) this creates a cross-language divergence point for emoji and supplementary characters that appear in some AIEP string fields.
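The two forms can be compared for a supplementary-plane character. This sketch relies only on documented json module behaviour; note that CPython writes the hex digits of the escape in lowercase.

```python
import json

emoji = "\U0001F600"  # 😀, a character outside the Basic Multilingual Plane

escaped = json.dumps(emoji)                      # ensure_ascii=True is the default
literal = json.dumps(emoji, ensure_ascii=False)

print(escaped)                       # "\ud83d\ude00"
print(len(literal.encode("utf-8")))  # 6: two quote bytes plus a 4-byte UTF-8 sequence

# Both forms decode to the same value, but their bytes differ,
# so their SHA-256 hashes would differ too.
print(json.loads(escaped) == json.loads(literal))          # True
print(escaped.encode("utf-8") == literal.encode("utf-8"))  # False
```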
Cross-Language Implementation Notes
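[0028] JavaScript: JSON.stringify() does not accept a key-sorting parameter. The implementation must sort object keys before passing the object to JSON.stringify(). The sort must be by UTF-8 byte order of the key string. For ASCII keys this is equivalent to the default Array.prototype.sort() ordering of Object.keys(obj), which compares UTF-16 code units. Locale-aware comparison such as String.prototype.localeCompare() should not be used, because locale collation does not follow code point order (for example, it typically places "a" before "B"). For keys containing characters outside the Basic Multilingual Plane, UTF-16 code unit order and UTF-8 byte order can differ, so an explicit byte-order comparison is needed.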
[0029] Go: encoding/json marshalling of map[string]interface{} sorts keys by byte order by default. For protocol records with ASCII keys this is identical to Python’s sort_keys=True. Implementors should verify that their Go json.Marshal key ordering matches the Python sort_keys=True output for any non-ASCII key strings present in their record types.
[0030] Rust: The serde_json serialiser by default serialises BTreeMap<String, Value> in key-sorted order. A HashMap<String, Value> is NOT sorted — implementations must use BTreeMap or a custom serialiser.
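The relationship between the orderings mentioned in these notes can be checked in Python. The sketch below is illustrative: the three keys are assumptions chosen to expose the difference, and the UTF-16 sort merely models a comparison by UTF-16 code units rather than calling any JavaScript runtime.

```python
# Three hypothetical keys: ASCII, a BMP character (U+FF61), and a
# supplementary-plane character (U+1F600).
keys = ["\U0001F600", "\uff61", "z"]

by_code_point = sorted(keys)  # Python str comparison, as used by sort_keys=True
by_utf8_bytes = sorted(keys, key=lambda k: k.encode("utf-8"))
by_utf16_units = sorted(keys, key=lambda k: k.encode("utf-16-be"))

# UTF-8 byte order agrees with code point order ...
print(by_code_point == by_utf8_bytes)   # True

# ... but UTF-16 code unit order places the surrogate pair (0xD83D ...)
# before U+FF61, so it disagrees for this key set.
print(by_code_point == by_utf16_units)  # False
```

For ASCII-only keys all three orderings coincide, which is why the divergence matters only for record types that admit non-ASCII keys.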
sha256_hex Composition
[0031] The SHA-256 hash of a canonical JSON record is computed as:
sha256_hex(canonical_json(record))
where sha256_hex(data: str) -> str is defined as:
import hashlib

def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
[0032] The data.encode("utf-8") call converts the Python string (which is internally Unicode) to UTF-8 bytes. Since ensure_ascii=False causes json.dumps() to produce a string containing literal multi-byte characters, the .encode("utf-8") step must be performed against the full string, not against an ASCII-safe representation.
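A short sketch of why the encoding step matters. The record is an illustrative assumption, and json.dumps() is called directly without the P273 normalisation stage, which is acceptable here only because the input is already a plain dict of NFC strings.

```python
import hashlib
import json


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


record = {"name": "café"}
literal = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
escaped = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=True)

# The str has 15 characters but encodes to 16 bytes, because é takes two bytes in UTF-8.
print(len(literal), len(literal.encode("utf-8")))  # 15 16

# The escaped form decodes to the same value yet hashes differently.
print(json.loads(literal) == json.loads(escaped))   # True
print(sha256_hex(literal) == sha256_hex(escaped))   # False
```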
Claims Summary
[0033] The following novel aspects of the canonical JSON composition are asserted as prior art:
- A canonical JSON serialisation function defined as the composition of _norm_object(obj) (P273) and json.dumps() with three parameters, sort_keys=True, separators=(",", ":"), and ensure_ascii=False, in which each parameter is an individually justified protocol decision with documented rule coverage.
- The use of ensure_ascii=False in combination with NFC Unicode normalisation (P273) as the selected mechanism for canonical encoding of non-ASCII string values, rejecting \uXXXX escape form to eliminate surrogate-pair divergence across implementations.
- A documented decision table specifying when canonical_json() is the appropriate AIEP primitive and when concat_hash() is to be used in its place.
- Cross-language implementation notes establishing that JavaScript objects must be key-sorted before passing to JSON.stringify(), Go maps must use byte-order sort, and Rust serialisers must use BTreeMap rather than HashMap.
Related Specifications
| Reference | Description |
|---|---|
| P270 | Fixed-point float canonical serialisation (_canon_number_str) |
| P271 | Shim distribution architecture — distributes this function |
| P272 | Conformance vector corpus — suites 01–05 exercise canonical_json |
| P273 | Bool-Guarded canonical normalisation (_norm_object) |
| GB2519711.2 | Core AIEP protocol — canonical_json is the primary hash primitive |