
P274 — AIEP — Canonical JSON Serialisation Composition

Publication Date: 2026-04-12
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Limited
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 (https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1)
Patent ID: P274
Classification: OS (Open Source Prior Art)
Implemented in: AIEP GENOME SDK kernel/canon/canon.py, canonical_json()
Related filings: GB2519711.2 (AIEP Core Protocol)


Field of the Invention

[0001] The disclosure relates to deterministic serialisation of structured data objects for use in cryptographic hash computation.

[0002] More particularly, the disclosure concerns a composed serialisation function for producing canonical JSON — a byte-identical UTF-8 string representation of a normalised Python object — through application of three independently-justified serialisation parameters whose combined effect eliminates all cross-platform sources of serialisation divergence.


Framework Context

[0003] This disclosure operates within the AIEP framework.

[0004] The AIEP protocol uses SHA-256 hashes of canonical JSON strings as artefact identity proofs. Two implementations that produce different JSON strings for the same input will produce different hash values, breaking cross-platform protocol interoperability. Canonical JSON production is therefore a security property of the protocol, not only a convenience.

[0005] The normalisation step that prepares any Python object for serialisation is described in P273 (_norm_object()) and P270 (_canon_number_str()). This disclosure covers the subsequent serialisation step: composing the canonical_json() function from the normalised form.


Background

[0006] Python’s json.dumps() function accepts several keyword arguments that affect the output string:

  • sort_keys — when True, dictionary keys are emitted in sorted order;
  • separators — a two-element tuple (item_separator, key_separator) controlling whitespace between tokens;
  • ensure_ascii — when True, non-ASCII characters are escaped as \uXXXX sequences; when False, they are emitted as literal characters in the returned str, which become UTF-8 bytes only when the string is encoded;
  • indent — when an integer, emits the JSON in multi-line pretty-printed form.

[0007] Each combination of sort_keys, separators, and ensure_ascii produces a valid JSON document from the same input object, but different combinations can produce different byte sequences and therefore different SHA-256 hashes. Even with separators restricted to the default and compact forms, there are eight such combinations. Without a canonical choice, any two implementations that each make different but individually reasonable choices will produce incompatible hashes.
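To make the divergence concrete, the following minimal sketch serialises one small record under two parameter choices and compares the resulting SHA-256 digests. The record and the helper name digest are illustrative assumptions, not AIEP test vectors or SDK functions.

```python
import hashlib
import json

# Illustrative record (not an AIEP test vector): keys inserted out of
# sorted order, with one non-ASCII value.
record = {"value": "café", "id": "a1"}

# Two individually reasonable choices that both yield valid JSON.
default_form = json.dumps(record)
compact_form = json.dumps(
    record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
)


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Both strings decode to the same object, yet they hash differently.
same_object = json.loads(default_form) == json.loads(compact_form)
same_digest = digest(default_form) == digest(compact_form)

print(default_form)
print(compact_form)
print(same_object, same_digest)
```

Both outputs are valid JSON for the same object; only the byte sequences, and hence the digests, differ.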

[0008] The problem is compounded across programming languages. JavaScript’s JSON.stringify() has no equivalent of sort_keys: it emits integer-like keys first in ascending numeric order and all other string keys in insertion order, so the output depends on the code that constructed the object. Go’s encoding/json marshaller sorts map keys by default for map[string]interface{} values, comparing the keys byte by byte. For valid UTF-8 keys this matches Unicode code point order, and therefore Python’s sort_keys=True, but it differs from the UTF-16 code unit order used by JavaScript’s default string comparison whenever supplementary-plane characters are compared with characters in the range U+E000 to U+FFFF.

[0009] Existing canonical JSON specifications (e.g., RFC 8785 — JSON Canonicalization Scheme; the original Canonical JSON draft) define specific serialisation rules but are either not implemented in all major languages, not aligned with AIEP’s normalisation stage (P273, P270), or both.

[0010] There exists a need for a documented canonical JSON serialisation composition that:

(a) specifies each parameter of json.dumps() as a deliberate protocol decision with explicit rationale;
(b) is composable with the P273/P270 normalisation stage;
(c) produces identical output in Python, JavaScript, Go, and any other language implementing the same rules; and
(d) defines when canonical_json() is the appropriate primitive and when concat_hash() is used instead.


Summary of the Disclosure

[0011] A canonical JSON serialisation function is provided, defined as the composition of a normalisation stage and a serialisation stage with three specified parameters.

[0012] The canonical JSON function is:

canonical_json(obj) := json.dumps(
    _norm_object(obj),
    sort_keys=True,
    separators=(",", ":"),
    ensure_ascii=False
)

[0013] The normalisation stage _norm_object(obj) is defined in P273 and P270. The serialisation stage is json.dumps() with the three parameters specified in [0012]. The output is a Python str value. The SHA-256 hash is computed over the UTF-8 encoding of this string.

[0014] Each of the three parameters is a deliberate protocol decision:

  • sort_keys=True ensures deterministic key ordering. Dictionary key ordering in Python 3.7+ is insertion-order preserving, not sorted; sort_keys=True overrides this with ordering by Unicode code point (plain ASCII order for ASCII-only keys), providing cross-platform determinism as a belt-and-suspenders guarantee over the normalisation stage.
  • separators=(",",":") eliminates all whitespace between tokens. The default Python separators value is (", ", ": ") — note the spaces — which produces different bytes than compact form. Compact form is the standard interoperability choice.
  • ensure_ascii=False emits non-ASCII characters literally, so that they become plain UTF-8 bytes when the string is encoded, rather than \uXXXX escape sequences. Because P273 enforces NFC normalisation of string values, the UTF-8 bytes are deterministic. \uXXXX escapes are a divergence point for characters outside the Basic Multilingual Plane, which require surrogate-pair escapes that implementations do not all emit identically.

[0015] The technical effect is modification of a computing system’s data serialisation behaviour such that the same structured object produces bit-identical JSON bytes in all conforming implementations across all languages and execution environments, enabling SHA-256 hash interoperability.


Brief Description of the Drawings

[0016] Figure 1 illustrates the two-stage composition: normalisation (P273/P270) then serialisation (this disclosure).

[0017] Figure 2 shows the R1–R5 rule coverage table mapping each canonical_json parameter to the protocol conformance rules it enforces.

[0018] Figure 3 shows the decision table for choosing canonical_json() versus concat_hash().


ASCII Drawings

Figure 1 — Two-Stage Canonical JSON Pipeline

Input Python object (any JSON-serialisable type)
              |
              v
    _norm_object(obj)                  [P273 — Bool-Guarded dispatch,
              |                          P270 — _canon_number_str]
              v
    Normalised Python object
    (all types reduced to str/int/list/dict/None,
     no floats, no bools, strings NFC-normalised)
              |
              v
    json.dumps(normed,
               sort_keys=True,         [R1 — key ordering]
               separators=(",",":"),   [R2 — no whitespace]
               ensure_ascii=False)     [R3 — NFC UTF-8 bytes]
              |
              v
    canonical_json_str (Python str, UTF-8 encoded downstream)
              |
              v
    SHA-256(canonical_json_str.encode("utf-8"))
              |
              v
    sha256_hex

Figure 2 — Protocol Rule Coverage

Parameter                | Rule | Rule Description
-------------------------+------+-------------------------------------------------------
sort_keys=True           | R1   | Key ordering must be deterministic and cross-platform
separators=(",",":")     | R2   | Output must contain no whitespace between tokens
ensure_ascii=False       | R3   | String values emitted as NFC UTF-8, not \uXXXX escapes
_norm_object (P273)      | R4   | Bool values rendered as canonical string tokens
_canon_number_str (P270) | R5   | Numeric values rendered as minimal decimal string

Figure 3 — canonical_json vs concat_hash Decision Table

Scenario                                      | Primitive to use
----------------------------------------------+-----------------
Hashing a structured record (dict/object)     | canonical_json
Hashing a flat sequence of known-type values  | concat_hash
Constructing a hash from sub-hashes           | concat_hash
Any case involving a dict with string keys    | canonical_json
Cross-language interoperability is required   | canonical_json for dicts
and the input is a flat field list            | concat_hash for sequences

Detailed Description of Embodiments

The Composition

[0019] The canonical JSON function is defined in kernel/canon/canon.py as:

import json

def canonical_json(obj: object) -> str:
    """Return the canonical JSON string for obj.

    Parameters are deliberate protocol decisions, not style choices.
    sort_keys=True      — R1 key ordering guarantee (belt-and-suspenders for ASCII)
    separators=(",",":")— R2 no whitespace between tokens
    ensure_ascii=False  — R3 NFC UTF-8 bytes, not \\uXXXX escapes
    """
    return json.dumps(
        _norm_object(obj),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )

[0020] _norm_object() is defined in P273. It is called inside canonical_json() and must be called before json.dumps(). Calling json.dumps() directly on an un-normalised object is a protocol error that may produce non-canonical output.
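The composition can be exercised end to end with a stand-in for the normalisation stage. In the sketch below, _norm_object_sketch and canonical_json_sketch are hypothetical names. The stand-in is deliberately reduced: it only NFC-normalises strings and recurses into dicts and lists, and it rejects bools and floats rather than guessing at the canonical tokens that P273 and P270 define.

```python
import json
import unicodedata


def _norm_object_sketch(obj):
    """Hypothetical stand-in for P273's _norm_object(); NOT the real rules."""
    # bool and float are rejected here: their canonical forms are defined
    # in P273 and P270 and are not reproduced in this sketch.
    if isinstance(obj, (bool, float)):
        raise TypeError("bool/float handling is defined in P273/P270")
    if obj is None or isinstance(obj, int):
        return obj
    if isinstance(obj, str):
        return unicodedata.normalize("NFC", obj)
    if isinstance(obj, (list, tuple)):
        return [_norm_object_sketch(item) for item in obj]
    if isinstance(obj, dict):
        # Keys are assumed to be str in this sketch.
        return {
            unicodedata.normalize("NFC", key): _norm_object_sketch(value)
            for key, value in obj.items()
        }
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def canonical_json_sketch(obj) -> str:
    """Apply the json.dumps() parameters of [0019] to the stand-in output."""
    return json.dumps(
        _norm_object_sketch(obj),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


# Two dicts with equal content but different insertion order.
first = {"id": "a1", "tags": ["x", "y"], "count": 3}
second = {"count": 3, "tags": ["x", "y"], "id": "a1"}
print(canonical_json_sketch(first))
print(canonical_json_sketch(first) == canonical_json_sketch(second))
```

Raising on unsupported values, rather than passing them through, is one way to keep an un-normalised object from reaching json.dumps() silently.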

sort_keys=True

[0021] In Python 3.7+, dictionary key order is guaranteed to be insertion order. This is useful for most Python code but is not canonical: two dicts with the same key-value pairs but different insertion orders produce different json.dumps() output without sort_keys. The normalisation step (P273) already converts the object to a plain dict, but does not impose key ordering. sort_keys=True provides a belt-and-suspenders guarantee: whatever insertion order the caller’s dict has, and whether or not the normalisation stage orders a given pair of keys deterministically, the serialiser emits the keys in sorted order.

[0022] For ASCII-only keys (the common case in AIEP protocol records), sort_keys=True alphabetises by ASCII code point, which is equivalent to UTF-8 byte order. Cross-language implementations that apply the same UTF-8 byte-order sort will produce the same key ordering.
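A short sketch of both points: insertion order leaks into the output unless sort_keys=True is set, and Python's key sort (by Unicode code point) agrees with a sort over the keys' UTF-8 bytes. The sample dicts and key list are illustrative assumptions.

```python
import json

# Same key-value pairs, different insertion order.
a = {"id": "a1", "value": "hello"}
b = {"value": "hello", "id": "a1"}

unsorted_a = json.dumps(a, separators=(",", ":"))
unsorted_b = json.dumps(b, separators=(",", ":"))
sorted_a = json.dumps(a, sort_keys=True, separators=(",", ":"))
sorted_b = json.dumps(b, sort_keys=True, separators=(",", ":"))

print(unsorted_a == unsorted_b)  # insertion order differs
print(sorted_a == sorted_b)      # sorted output agrees

# Python compares str keys by Unicode code point, which gives the same
# order as comparing the keys' UTF-8 encodings byte by byte.
keys = ["b", "a", "Z", "é", "\U0001F600", "\ue000"]
by_code_point = sorted(keys)
by_utf8_bytes = sorted(keys, key=lambda k: k.encode("utf-8"))
print(by_code_point == by_utf8_bytes)
```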

separators=(",", ":")

[0023] Python’s default separators value is (", ", ": "): both separators include a trailing space. This produces {"id": "a1", "value": "hello"} with spaces. Canonical form requires {"id":"a1","value":"hello"} with no spaces. The separators parameter (",",":") removes the trailing space from both separators. Every conforming implementation in every language must use the compact delimiter form.

[0024] This parameter cannot be omitted. An implementation that omits it falls back to Python’s default separators (", ", ": "), which insert a space after every comma and colon, and so produces non-canonical output without raising any error.
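The effect of the separators argument can be checked directly. The record below is the example from [0023]; the extra "note" record is an illustrative assumption showing that whitespace inside string values is data and is left untouched.

```python
import json

record = {"id": "a1", "value": "hello"}

default_form = json.dumps(record, sort_keys=True)
compact_form = json.dumps(record, sort_keys=True, separators=(",", ":"))

print(default_form)  # {"id": "a1", "value": "hello"}
print(compact_form)  # {"id":"a1","value":"hello"}

# Only whitespace between tokens is removed; spaces inside a string
# value are part of the data and are preserved.
with_inner_space = json.dumps({"note": "a b"}, separators=(",", ":"))
print(with_inner_space)  # {"note":"a b"}
```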

ensure_ascii=False

[0025] Python’s default ensure_ascii=True encodes any character with code point > 127 as a \uXXXX escape sequence. For example, the string "café" would become "caf\u00e9". With ensure_ascii=False, the same string is serialised as "café": the é character is emitted literally and becomes the UTF-8 byte sequence \xc3\xa9 when the output string is encoded.

[0026] The NFC normalisation enforced by _norm_object() (P273) ensures that for any unicode string, the NFC form is unique. Given a unique NFC string, the UTF-8 encoding of that string is unique. Therefore ensure_ascii=False + NFC normalisation produces a unique byte sequence for any string input.
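The argument can be illustrated with the two common spellings of "café". The helper encode_value below is a hypothetical name for this sketch; it applies NFC directly with unicodedata rather than calling the P273 normalisation stage.

```python
import json
import unicodedata

composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by combining acute accent (U+0301)


def encode_value(text: str, normalise: bool) -> bytes:
    """Serialise one string value and return the bytes that would be hashed."""
    if normalise:
        text = unicodedata.normalize("NFC", text)
    return json.dumps(text, ensure_ascii=False).encode("utf-8")


# Without NFC, visually identical strings yield different bytes.
raw_composed = encode_value(composed, normalise=False)
raw_decomposed = encode_value(decomposed, normalise=False)

# With NFC, both spellings collapse to one byte sequence.
nfc_composed = encode_value(composed, normalise=True)
nfc_decomposed = encode_value(decomposed, normalise=True)

print(raw_composed == raw_decomposed)
print(nfc_composed == nfc_decomposed)
```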

[0027] The \uXXXX escape form is rejected because: (a) for characters outside the Basic Multilingual Plane, Python emits a surrogate pair escape (\ud83d\ude00 for 😀, with lower-case hex digits), while other serialisers may emit upper-case hex digits (\uD83D\uDE00), emit the character literally, or use non-JSON forms such as \U0001F600; (b) this creates a cross-language divergence point for emoji and supplementary characters that appear in some AIEP string fields.
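A minimal sketch of the divergence, using CPython's json module, which writes escape hex digits in lower case. The upper-case variant is constructed by hand here to stand in for what another serialiser might plausibly emit; it is an assumption for illustration, not the output of a specific library.

```python
import json

emoji = "\U0001F600"  # 😀, a code point outside the Basic Multilingual Plane

escaped = json.dumps(emoji)  # ensure_ascii=True is the default
literal = json.dumps(emoji, ensure_ascii=False)

print(escaped)                  # "\ud83d\ude00"
print(literal.encode("utf-8"))  # b'"\xf0\x9f\x98\x80"'

# Hand-written upper-case variant: it decodes to the same string but is
# a different character sequence, so it would hash differently.
upper_variant = '"\\uD83D\\uDE00"'
print(json.loads(upper_variant) == json.loads(escaped))
print(upper_variant == escaped)
```

With ensure_ascii=False there is a single literal form for the character, so the case of hex digits never comes into play.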

Cross-Language Implementation Notes

[0028] JavaScript: JSON.stringify() does not accept a sortKeys parameter. The implementation must sort object keys before passing the object to JSON.stringify(). The sort must be by UTF-8 byte order of the key string, which for ASCII keys is equivalent to the default Array.prototype.sort() comparison (UTF-16 code unit order). String.prototype.localeCompare() is not suitable, because locale collation differs from code point order (for example, it typically places "a" before "B"). An implementation should sort keys using a stable sort of Object.keys(obj) by UTF-8 byte order.
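When porting the key sort, it may help to see where UTF-16 code unit order (what JavaScript's default string comparison uses) departs from UTF-8 byte order. The sketch below emulates both orders in Python; the key list is an illustrative assumption chosen to trigger the difference.

```python
keys = ["\U0001F600", "\ue000", "b", "a"]

# Order used by Python's sort_keys=True: Unicode code point order,
# which matches UTF-8 byte order.
code_point_order = sorted(keys)
utf8_order = sorted(keys, key=lambda k: k.encode("utf-8"))

# UTF-16 code unit order, emulated by comparing big-endian UTF-16 bytes.
# U+1F600 is stored as the surrogate pair D83D DE00, which sorts
# before the single code unit E000.
utf16_order = sorted(keys, key=lambda k: k.encode("utf-16-be"))

print(code_point_order == utf8_order)
print(code_point_order == utf16_order)
```

For ASCII-only keys the three orders coincide, so the difference only matters for record types that admit such keys.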

[0029] Go: encoding/json marshalling of map[string]interface{} sorts keys by byte order by default. For protocol records with ASCII keys this is identical to Python’s sort_keys=True. Implementors should verify that their Go json.Marshal key ordering matches the Python sort_keys=True output for any non-ASCII key strings present in their record types.

[0030] Rust: The serde_json serialiser by default serialises BTreeMap<String, Value> in key-sorted order. A HashMap<String, Value> is NOT sorted — implementations must use BTreeMap or a custom serialiser.

sha256_hex Composition

[0031] The SHA-256 hash of a canonical JSON record is computed as:

sha256_hex(canonical_json(record))

where sha256_hex(s: str) -> str is defined as:

import hashlib

def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

[0032] The data.encode("utf-8") call converts the Python string (which is internally Unicode) to UTF-8 bytes. Since ensure_ascii=False causes json.dumps() to produce a string containing literal multi-byte characters, the .encode("utf-8") step must be performed against the full string, not against an ASCII-safe representation.
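The distinction between the str and the bytes that are hashed can be seen in a small sketch. The record is an illustrative assumption, and the normalisation stage is omitted because the input is already plain NFC text.

```python
import hashlib
import json


def sha256_hex(data: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoding of data, as in [0031]."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


text = json.dumps(
    {"name": "café"}, sort_keys=True, separators=(",", ":"), ensure_ascii=False
)
utf8_bytes = text.encode("utf-8")

# "é" is one character in the str but two bytes in UTF-8, so the hashed
# byte sequence is longer than the string.
print(len(text), len(utf8_bytes))
digest = sha256_hex(text)
print(len(digest))

# The literal "é" cannot be encoded with an ASCII-only codec.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```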


Claims Summary

[0033] The following novel aspects of the canonical JSON composition are asserted as prior art:

  1. A canonical JSON serialisation function defined as the composition of _norm_object(obj) (P273) and json.dumps() with three parameters — sort_keys=True, separators=(",",":"), ensure_ascii=False — in which each parameter is a specifically reasoned protocol decision with documented rule coverage.

  2. The use of ensure_ascii=False in combination with NFC unicode normalisation (P273) as the selected mechanism for canonical encoding of non-ASCII string values, rejecting \uXXXX escape form to eliminate surrogate-pair divergence across implementations.

  3. A documented decision table specifying when canonical_json() is the appropriate AIEP primitive and when concat_hash() is to be used in its place.

  4. Cross-language implementation notes establishing that JavaScript objects must be key-sorted before passing to JSON.stringify(), Go maps must use byte-order sort, and Rust serialisers must use BTreeMap rather than HashMap.


Reference   | Description
------------+------------------------------------------------------------------
P270        | Fixed-point float canonical serialisation (_canon_number_str)
P271        | Shim distribution architecture; distributes this function
P272        | Conformance vector corpus; suites 01–05 exercise canonical_json
P273        | Bool-Guarded canonical normalisation (_norm_object)
GB2519711.2 | Core AIEP protocol; canonical_json is the primary hash primitive