
P272 — AIEP — Cryptographically-Pinned Conformance Vector Corpus

Publication Date: 2026-04-12
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Limited
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 (https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1)
Patent ID: P272
Classification: OS (Open Source Prior Art)
Implemented in: AIEP-VECTORS repository, corpus/ directory and GENOME_LOCKFILE.json
Related filings: GB2519711.2 (AIEP Core Protocol)

Field of the Invention

[0001] The disclosure relates to protocol conformance testing systems and artefact integrity assurance for deterministic hash computation frameworks.

[0002] More particularly, the disclosure concerns a conformance vector corpus architecture in which each test vector is individually cryptographically addressable, the corpus as a whole is hash-pinned in a lockfile, and conformance claims bind not to a semantic version label but to a specific kernel commit and the SHA-256 hash of the kernel file.

Framework Context

[0003] This disclosure operates within the AIEP framework.

[0004] The AIEP canonical hash kernel (canon.py, disclosed in P270, P271, P273, P274) produces deterministic byte-identical outputs across all conforming implementations in all languages and execution environments. A conformance vector corpus provides the mechanism by which any independent implementation can demonstrate it produces these same outputs.

[0005] This disclosure describes the structure of that corpus, the conformance claim model it enables, and the integrity mechanisms that bind the corpus to a particular kernel version.

Background

[0006] Semantic versioning (SemVer) is the dominant method by which software libraries assert compatibility. A library claiming compatibility with version 1.0.3, for example, asserts that it conforms to the published specification for that version. This model is sufficient for most compatibility contexts.

[0007] For cryptographic hash protocol implementations, semantic version conformance is insufficient. Two implementations may both correctly claim 1.0.3 compatibility while computing different hash values for the same input, because:

(a) the specification may contain ambiguities that two correct implementors resolve differently; (b) the specification may have been updated without incrementing the version label; (c) the reference implementation may have changed between the time each implementor read the specification; or (d) one implementor relied on the reference implementation and the other relied on the written specification, which are not guaranteed to be byte-identical.

[0008] In the AIEP protocol, a hash value is an artefact identity. Two implementations that produce different hashes for the same input cannot interoperate — artefacts signed by one implementation are unverifiable by the other. The protocol therefore requires a stronger conformance model than semantic versioning provides.

[0009] Vector-based conformance testing — providing known input-output pairs that any implementation must reproduce — is widely used in protocol testing. However, existing vector corpora do not typically:

(a) record the SHA-256 hash of each individual test vector file; (b) bind those hashes to a corpus-level lockfile that is itself part of the repository; (c) define a conformance claim form that cites a kernel commit identifier and kernel file hash rather than a version label; or (d) include suite-typed vectors with structured metadata fields adapted to the specific failure mode each suite probes.

[0010] There exists a need for a conformance vector architecture that enables implementors to make cryptographically-specific conformance claims, and enables consumers of those claims to verify them, without trust in the version label or the publisher’s communication of specification details.

Summary of the Disclosure

[0011] A cryptographically-pinned conformance vector corpus is provided for a deterministic hash computation framework.

[0012] The corpus comprises a set of test vector files, a MANIFEST.json aggregate descriptor, and a GENOME_LOCKFILE.json integrity record. The corpus is organised into named suites, each probing a distinct category of canonical behaviour.

[0013] Each test vector is a JSON file containing: a suite identifier, a vector_id (unique across the corpus), an input object, an expected output record, and metadata fields. The exact fields present in the input and output vary by suite class.
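A corpus loader can enforce this record shape and the corpus-wide uniqueness of vector_id before any vector is executed. The sketch below is illustrative only: it assumes that vector files sit one level below the corpus root (as in Figure 1), that the suite field equals the name of the directory holding the file, and that description and other metadata are optional. The function name load_vectors is hypothetical.

```python
import json
from pathlib import Path

# Fields every vector is assumed to carry; "description" and other
# metadata are treated as optional in this sketch.
REQUIRED_FIELDS = ("suite", "vector_id", "input", "expected_output")


def load_vectors(corpus_dir: Path) -> dict[str, dict]:
    """Load every suite vector and enforce vector_id uniqueness across the corpus."""
    vectors: dict[str, dict] = {}
    # Vector files live inside suite directories, so MANIFEST.json at the
    # corpus root is not matched by this pattern.
    for path in sorted(corpus_dir.glob("*/*.json")):
        vector = json.loads(path.read_text(encoding="utf-8"))
        missing = [f for f in REQUIRED_FIELDS if f not in vector]
        if missing:
            raise ValueError(f"{path}: missing fields {missing}")
        # Assumption: a vector's suite field names the directory it lives in.
        if vector["suite"] != path.parent.name:
            raise ValueError(f"{path}: suite field does not match directory")
        vid = vector["vector_id"]
        if vid in vectors:
            raise ValueError(f"duplicate vector_id {vid!r} in {path}")
        vectors[vid] = vector
    return vectors
```

Keying the result by vector_id means a duplicate identifier is caught at load time rather than silently overwriting an earlier vector.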

[0014] The MANIFEST.json declares: the corpus name, the bound kernel version and kernel hash, the suite list, per-suite vector counts, and the corpus total vector count. A conforming corpus runner asserts that the physical vector file count matches the declared total.

[0015] The GENOME_LOCKFILE.json records the SHA-256 hash of the kernel authority file bound to this corpus version. A conforming implementation may compute the SHA-256 of its installed kernel authority file and assert equality with the declared hash before executing any vectors.
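The pre-execution kernel check reduces to hashing one file and comparing hex digests. This is a minimal sketch, assuming the kernel authority file is hashed as raw bytes with no line-ending or encoding normalisation, and that the pinned value is read from the kernel_bundle_hash key shown in the GENOME_LOCKFILE.json example later in this document. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the lowercase hex SHA-256 of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_kernel_pinned(kernel_path: Path, lockfile_path: Path) -> str:
    """Refuse to run any vectors unless the installed kernel matches the lockfile pin."""
    lock = json.loads(lockfile_path.read_text(encoding="utf-8"))
    declared = lock["kernel_bundle_hash"]
    actual = sha256_file(kernel_path)
    if actual != declared:
        raise RuntimeError(f"wrong kernel: {actual} != pinned {declared}")
    return actual
```

Hashing raw bytes is the conservative choice here: any normalisation step would itself become a source of divergence between runners.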

[0016] The conformance claim form is: “Implementation X produces outputs identical to AIEP canonical kernel at commit <sha>, kernel hash <H>.” This claim is verifiable by any party in possession of the corpus and the named kernel version, without contacting the author’s infrastructure.

[0017] The technical effect is modification of a computing system’s protocol conformance verification such that conformance claims are cryptographically specific rather than semantically labelled, and independently verifiable using only the corpus and the declared kernel hash.

Brief Description of the Drawings

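[0018] Figure 1 illustrates the corpus directory structure showing the suite directories and MANIFEST.json. GENOME_LOCKFILE.json resides at the repository root, alongside the corpus/ directory, and is therefore outside the tree shown.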

[0019] Figure 2 illustrates the suite taxonomy with the 7 suites in the P272 corpus and the failure mode each probes.

[0020] Figure 3 illustrates the conformance claim verification flow: vector runner → kernel hash assertion → vector execution → pass/fail report.

ASCII Drawings

Figure 1 — Corpus Directory Structure

corpus/
├── MANIFEST.json
├── 01-deterministic-replay/
│   ├── DET-001.json
│   ├── DET-002.json
│   └── DET-003.json
├── 02-tamper-detection/
│   ├── TAMP-001.json
│   └── TAMP-002.json
├── 03-negative-proof/
│   ├── NEG-001.json
│   └── NEG-002.json
├── 04-admissibility/
│   ├── ADM-001.json
│   ├── ADM-002.json
│   └── ADM-003.json
├── 05-number-normalisation/
│   └── R5-001.json ... R5-008.json
├── 06-malformed-input/
│   └── MAL-001.json ... MAL-006.json
└── 07-hash-primitives/
    └── HP-001.json ... HP-007.json

Figure 2 — Suite Taxonomy

Suite ID  | Suite Name             | Vectors | Failure Mode Probed
----------+------------------------+---------+--------------------------------
01        | deterministic-replay   | 3       | Non-determinism across calls
02        | tamper-detection       | 2       | Tampered input produces same hash
03        | negative-proof         | 2       | Implausible inputs accepted
04        | admissibility          | 3       | Gate incorrectly admits/refuses
05        | number-normalisation   | 8       | Float/int canonical form divergence
06        | malformed-input        | 6       | Malformed inputs not rejected
07        | hash-primitives        | 7       | sha256_hex / concat_hash byte errors
----------+------------------------+---------+--------------------------------
TOTAL                             | 31      |

Figure 3 — Conformance Verification Flow

Implementation under test
         |
         v
[1] Compute SHA-256(installed kernel/canon/canon.py)
         |
         v
[2] Assert == GENOME_LOCKFILE.kernel_bundle_hash  ← FAIL: wrong kernel
         |
         v
[3] For each vector in corpus:
      actual_output = implementation(vector.input)
      assert actual_output == vector.expected_output  ← FAIL: wrong output
         |
         v
[4] Assert physical vector count == MANIFEST.total_vectors ← FAIL: missing vectors
         |
         v
[5] Emit: PASS — conformant to kernel <hash>
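The five steps of Figure 3 can be sketched as a small Python runner. This is an illustration under stated assumptions, not the reference runner: the implementation callable (vector input object in, output object out), the file locations passed as arguments, and exception-based failure reporting are all hypothetical choices.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

# Hypothetical signature for the implementation under test:
# it maps a vector's "input" object to an output object.
Implementation = Callable[[dict], dict]


def run_conformance(corpus_dir: Path, lockfile: Path, kernel: Path,
                    implementation: Implementation) -> str:
    lock = json.loads(lockfile.read_text(encoding="utf-8"))
    manifest = json.loads((corpus_dir / "MANIFEST.json").read_text(encoding="utf-8"))

    # [1]-[2] Pin the kernel before executing anything.
    kernel_hash = hashlib.sha256(kernel.read_bytes()).hexdigest()
    if kernel_hash != lock["kernel_bundle_hash"]:
        raise RuntimeError("FAIL: wrong kernel")

    # [3] Every vector's actual output must equal its expected output.
    paths = sorted(corpus_dir.glob("*/*.json"))
    failures = []
    for path in paths:
        vector = json.loads(path.read_text(encoding="utf-8"))
        if implementation(vector["input"]) != vector["expected_output"]:
            failures.append(vector["vector_id"])
    if failures:
        raise RuntimeError(f"FAIL: wrong output for {failures}")

    # [4] A corpus with missing vector files cannot yield a pass.
    if len(paths) != manifest["total_vectors"]:
        raise RuntimeError("FAIL: missing vectors")

    # [5] The pass report names the kernel hash, not a version label.
    return f"PASS: conformant to kernel {kernel_hash}"
```

Collecting all output failures before raising gives the implementor the full list of divergent vector_ids in one run, while the kernel check still aborts before any vector is executed.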

Detailed Description of Embodiments

MANIFEST.json

[0021] The MANIFEST.json at the corpus root has the following structure:

{
  "corpus": "AIEP Canonical Conformance Vectors",
  "bound_kernel_version": "1.0.3",
  "bound_kernel_hash": "<SHA-256 of kernel/canon/canon.py>",
  "total_vectors": 31,
  "suites": [
    {
      "suite_id": "01-deterministic-replay",
      "count": 3
    },
    {
      "suite_id": "02-tamper-detection",
      "count": 2
    },
    {
      "suite_id": "03-negative-proof",
      "count": 2
    },
    {
      "suite_id": "04-admissibility",
      "count": 3
    },
    {
      "suite_id": "05-number-normalisation",
      "count": 8
    },
    {
      "suite_id": "06-malformed-input",
      "count": 6
    },
    {
      "suite_id": "07-hash-primitives",
      "count": 7
    }
  ]
}
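Because MANIFEST.json declares both per-suite counts and a total, a runner can check the descriptor against itself and against the suite directories on disk. The sketch below assumes that each suite_id names a directory directly under the corpus root and that every *.json file in a suite directory is a vector; the function name check_manifest is hypothetical.

```python
import json
from pathlib import Path


def check_manifest(corpus_dir: Path) -> None:
    manifest = json.loads((corpus_dir / "MANIFEST.json").read_text(encoding="utf-8"))
    declared = {s["suite_id"]: s["count"] for s in manifest["suites"]}

    # The per-suite counts must add up to the declared corpus total.
    if sum(declared.values()) != manifest["total_vectors"]:
        raise ValueError("suite counts do not sum to total_vectors")

    # The set of suite directories on disk must match the declared suites.
    on_disk = {p.name for p in corpus_dir.iterdir() if p.is_dir()}
    if on_disk != set(declared):
        raise ValueError(f"suite directories differ: {sorted(on_disk ^ set(declared))}")

    # Each suite directory must hold exactly its declared number of vector files.
    for suite_id, count in declared.items():
        found = len(list((corpus_dir / suite_id).glob("*.json")))
        if found != count:
            raise ValueError(f"{suite_id}: {found} files, manifest declares {count}")
```

Checking per suite, rather than only the total, localises a failure: a vector deleted from one suite and duplicated into another would pass a total-only check.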

Individual Vector Format — Canonical Output Suites

[0022] Vectors in suites 01, 02, 03, 05 follow a canonical_json input-output format:

{
  "suite": "01-deterministic-replay",
  "vector_id": "DET-001",
  "description": "Minimal record: single string field",
  "input": {
    "record": { "id": "a1", "value": "hello" }
  },
  "expected_output": {
    "canonical_json": "{\"id\":\"a1\",\"value\":\"hello\"}",
    "sha256_hex": "<SHA-256 of the canonical JSON bytes as UTF-8>"
  }
}

[0023] The canonical_json field is the deterministic UTF-8 string produced by canonical_json(record). The sha256_hex field is sha256_hex(canonical_json(record)). A conforming implementation must produce both values exactly.
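A canonical-output vector can be checked as sketched below. The canonical_json shown here is an approximation only (sorted keys, no insignificant whitespace, non-ASCII characters emitted as raw UTF-8); it does not reproduce the normative AIEP rules such as number normalisation (P273, P274), and the helper signatures are assumptions, not the kernel's API.

```python
import hashlib
import json


def canonical_json(record: dict) -> str:
    # Approximation only: sorted keys, compact separators, raw UTF-8 text.
    # The normative AIEP canonicalisation rules are not reproduced here.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_hex(text: str) -> str:
    # Assumed signature: hash the UTF-8 bytes of a string, return lowercase hex.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_canonical_vector(vector: dict) -> bool:
    """Both the canonical string and its hash must match exactly."""
    produced = canonical_json(vector["input"]["record"])
    expected = vector["expected_output"]
    return (produced == expected["canonical_json"]
            and sha256_hex(produced) == expected["sha256_hex"])
```

Comparing the canonical string as well as the hash makes a failure diagnosable: a matching string with a mismatched hash points at the byte encoding, a mismatched string points at canonicalisation.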

Individual Vector Format — Gate Decision Suites

[0024] Vectors in suite 04 (admissibility) carry gate input fields and an expected decision:

{
  "suite": "04-admissibility",
  "vector_id": "ADM-001",
  "description": "Closed gate blocks admission regardless of plausibility",
  "input": {
    "gate_status": "CLOSED",
    "plausibility_status": "plausible",
    "dissent_final_position": "agree"
  },
  "expected_output": {
    "decision": "REFUSED",
    "refusing_constraint": "CC-005"
  }
}

Individual Vector Format — Hash Primitive Suites

[0025] Vectors in suite 07 (hash-primitives) test sha256_hex and concat_hash directly:

{
  "suite": "07-hash-primitives",
  "vector_id": "HP-001",
  "description": "sha256_hex of ASCII string 'abc'",
  "input": {
    "primitive": "sha256_hex",
    "data_hex": "616263"
  },
  "expected_output": {
    "sha256_hex": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
  }
}

[0026] The data_hex field encodes the input bytes as a lowercase hexadecimal string with no spaces. This eliminates character encoding ambiguity across runner implementations in different languages.
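A runner for sha256_hex vectors can decode data_hex strictly and hash the resulting bytes. In this sketch, rejecting uppercase digits, spaces, and odd-length strings is a design assumption that enforces the encoding rule above; concat_hash is deliberately not sketched because its construction is not specified in this section. Function names are hypothetical.

```python
import hashlib
import re

# Lowercase hex digit pairs only: no spaces, no "0x" prefix, even length.
_DATA_HEX = re.compile(r"(?:[0-9a-f]{2})*")


def decode_data_hex(data_hex: str) -> bytes:
    if not _DATA_HEX.fullmatch(data_hex):
        raise ValueError(f"data_hex must be lowercase hex without spaces: {data_hex!r}")
    return bytes.fromhex(data_hex)


def run_sha256_vector(vector: dict) -> bool:
    if vector["input"]["primitive"] != "sha256_hex":
        raise NotImplementedError("only the sha256_hex primitive is sketched here")
    data = decode_data_hex(vector["input"]["data_hex"])
    return hashlib.sha256(data).hexdigest() == vector["expected_output"]["sha256_hex"]
```

Python's bytes.fromhex would tolerate spaces and uppercase digits on its own, which is why the stricter pattern check runs first.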

Conformance Claim Form

[0027] An implementation that passes all 31 vectors MAY emit the conformance claim:

“This implementation is conformant with the AIEP canonical hash kernel at commit <git-sha>, kernel SHA-256 <kernel_bundle_hash>.”

[0028] This claim is semantically stronger than a SemVer conformance claim in two ways: (a) the commitment is byte-specific, not label-specific; and (b) any third party with the named commit can independently regenerate the corpus and verify the implementation’s pass result without trust in the claiming party.
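Claim emission and claim parsing can both be kept mechanical. The sketch below is hypothetical: RunResult is an illustrative container, the claim is emitted only when every vector passed, and a consumer extracts the commit and kernel hash so it can compare them with its own lockfile before re-running the corpus. The regular expression assumes the commit is given as lowercase hex.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    commit: str        # git commit of the kernel the corpus is bound to
    kernel_hash: str   # SHA-256 of the kernel file, as pinned in the lockfile
    passed: int
    total: int


def conformance_claim(result: RunResult) -> str:
    # The claim may be emitted only when every vector in the corpus passed.
    if result.total == 0 or result.passed != result.total:
        raise ValueError(f"{result.passed}/{result.total} vectors passed; no claim")
    return ("This implementation is conformant with the AIEP canonical hash kernel "
            f"at commit {result.commit}, kernel SHA-256 {result.kernel_hash}.")


_CLAIM = re.compile(r"at commit (?P<commit>[0-9a-f]+), kernel SHA-256 (?P<kernel>[0-9a-f]{64})\.$")


def parse_claim(claim: str) -> tuple[str, str]:
    """Return (commit, kernel_hash) so a third party can re-verify independently."""
    m = _CLAIM.search(claim)
    if m is None:
        raise ValueError("not a conformance claim")
    return m["commit"], m["kernel"]
```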

GENOME_LOCKFILE.json

[0029] The GENOME_LOCKFILE.json at the corpus repository root records:

{
  "LOCKFILE_VERSION": "1.0.3",
  "total_vectors": 31,
  "kernel_bundle_hash": "<SHA-256 of bound kernel/canon/canon.py>",
  "suites": {
    "01-deterministic-replay": { "count": 3 },
    "02-tamper-detection":     { "count": 2 },
    "03-negative-proof":       { "count": 2 },
    "04-admissibility":        { "count": 3 },
    "05-number-normalisation": { "count": 8 },
    "06-malformed-input":      { "count": 6 },
    "07-hash-primitives":      { "count": 7 }
  }
}

[0030] The total_vectors field in the lockfile is a third independent declaration of the corpus size. Together with MANIFEST.json and the physical file count, this constitutes a three-party count integrity mechanism (P276).
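The three-party check can be sketched as a comparison of the three independent counts, followed by a per-suite comparison between the two descriptors, which shape their suite data differently (a list of objects in MANIFEST.json, a mapping in the lockfile). File locations and the function name are assumptions for illustration.

```python
import json
from pathlib import Path


def three_party_count(corpus_dir: Path, lockfile: Path) -> int:
    manifest = json.loads((corpus_dir / "MANIFEST.json").read_text(encoding="utf-8"))
    lock = json.loads(lockfile.read_text(encoding="utf-8"))
    physical = len(list(corpus_dir.glob("*/*.json")))

    # All three parties must report the same corpus size.
    declared = {"MANIFEST.json": manifest["total_vectors"],
                "GENOME_LOCKFILE.json": lock["total_vectors"],
                "filesystem": physical}
    if len(set(declared.values())) != 1:
        raise ValueError(f"count disagreement: {declared}")

    # Per-suite counts must also agree between the two descriptors.
    manifest_suites = {s["suite_id"]: s["count"] for s in manifest["suites"]}
    lock_suites = {name: entry["count"] for name, entry in lock["suites"].items()}
    if manifest_suites != lock_suites:
        raise ValueError("per-suite counts differ between MANIFEST.json and lockfile")
    return physical
```

Reporting all three values in the error makes it clear which party is the outlier, which a pairwise check would not.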

Claims Summary

[0031] The following novel aspects of the conformance corpus are asserted as prior art:

(a) A conformance vector corpus architecture for a deterministic hash computation protocol in which conformance claims bind to a specific kernel commit and the SHA-256 hash of the kernel authority file, rather than to a semantic version label.
(b) A MANIFEST.json aggregate descriptor that declares per-suite vector counts and a total_vectors count, enabling corpus count integrity verification against physically present files.
(c) A GENOME_LOCKFILE.json record that declares the SHA-256 hash of the kernel authority file bound to the corpus version, enabling pre-execution kernel integrity verification by any consumer.
(d) A suite taxonomy of seven named categories probing distinct classes of canonical behaviour: deterministic replay, tamper detection, negative proof, admissibility gate, number normalisation, malformed input, and hash primitives.
(e) A hash-primitive vector encoding in which input bytes are represented as lowercase hexadecimal with no spaces, eliminating character encoding ambiguity across implementations in multiple programming languages.

Reference    | Description
-------------+--------------------------------------------------------------------
P271         | Shim distribution architecture: kernel whose hash this corpus pins
P273         | Bool-Guarded canonical normalisation: probed by suite 05
P274         | Canonical JSON composition: probed by suites 01–05
P275         | Fail-closed admissibility gate: probed by suite 04
P276         | Three-party count integrity: total_vectors field participates
GB2519711.2  | Core AIEP protocol: canonical primitives these vectors test