
P189 — AIEP — Evidence Compression and Archival Format Protocol

Publication Date: 2026-03-27
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Ltd
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 — https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1


Framework Context

[0001] This disclosure operates within an Architected Instruction and Evidence Protocol (AIEP) environment as defined in United Kingdom patent application number GB2519711.2, filed 20 November 2025, the entire contents of which are incorporated herein by reference.

[0002] The present disclosure defines a protocol for the compressed, integrity-verified, self-describing archival of evidence artefacts and their associated metadata (ProvenanceChain, ClassificationVector, CitationRecord, LicenceRecord, AnnotationRecords) into a standardised AIEP Evidence Archive (AEA) file format — suitable for long-term cold storage, regulatory retention, corpus backup, and corpus migration between AIEP nodes.


Field of the Disclosure

[0003] This disclosure relates to archival format and compression protocols for large evidence corpora in governed artificial intelligence systems.

[0004] More particularly, the disclosure concerns: the AIEP Evidence Archive (AEA) binary format specification; the AEAManifest header structure; per-artefact AEARecord packing; compression algorithm selection and integrity verification; AEA creation, extraction, and validation tooling; and the relationship between this Protocol and the archival and retention policy protocol (P167) and the HashChain protocol (P175).


Background

[0005] Evidence corpora grow continuously and require periodic archival: completed case evidence, superseded artefacts, and regulatory retention data must be stored in a format that can be reliably restored decades later without depending on the availability of the original corpus system. The archival format must be self-describing (no dependency on external schema registries), compressed (minimising storage cost), and integrity-verified (enabling detection of archive corruption at any future restoration point).

[0006] Standard archive formats (ZIP, TAR, etc.) are not self-describing in the AIEP evidence sense: they carry no knowledge of AIEP schemas, ProvenanceChains, or content hash verification. The AEA format encodes this knowledge within the archive itself.


Summary of the Disclosure

[0007] AEA File Format Structure:

  • AEA Header (fixed 512 bytes): magic bytes (0x41 0x45 0x41 0x01 — ASCII “AEA” + version 1), total artefact count, archive creation timestamp, generating node fingerprint (P46), HashChain entry index range covered by this archive, archive-level compression algorithm code, archive-level checksum (SHA-256 of all subsequent bytes), generating node signature
  • Schema Block (variable): a self-contained schema descriptor sufficient to decode all AEARecords in this archive, including field names, types, and encoding details; referenced by AEARecord field decoders without needing external registry access
  • AEARecord sequence: one AEARecord per artefact (see [0008])
  • AEAManifest (trailer): an ordered list of {deid, byte_offset, compressed_size} for every AEARecord in the file, enabling random access to individual artefacts without sequential file scan
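
The disclosure fixes the header at 512 bytes and names its fields, but it does not give byte offsets or field widths. The following Python sketch therefore assumes a hypothetical little-endian layout (the struct format, field widths, and zero padding are illustrative assumptions, not part of the AEA format) to show how a reader might check the magic bytes and recover the fixed fields:

```python
import struct

AEA_MAGIC = bytes([0x41, 0x45, 0x41, 0x01])  # ASCII "AEA" + version 1
HEADER_SIZE = 512                            # fixed header length

# Hypothetical field layout (an assumption for illustration only):
# magic, artefact count, creation timestamp, first and last HashChain
# index, compression algorithm code, archive checksum. The node
# fingerprint and signature are left out; unused bytes are zero padding.
_FIXED = struct.Struct("<4sQQQQH32s")

def pack_header(count, created, chain_first, chain_last, algo, checksum):
    fixed = _FIXED.pack(AEA_MAGIC, count, created, chain_first,
                        chain_last, algo, checksum)
    return fixed.ljust(HEADER_SIZE, b"\x00")

def parse_header(raw):
    if len(raw) != HEADER_SIZE:
        raise ValueError("AEA header must be exactly 512 bytes")
    magic, count, created, first, last, algo, checksum = _FIXED.unpack_from(raw)
    if magic != AEA_MAGIC:
        raise ValueError("bad magic bytes: not an AEA version 1 file")
    return {"artefact_count": count, "created": created,
            "hashchain_range": (first, last),
            "compression_algo": algo, "archive_checksum": checksum}
```

Rejecting a file on the magic bytes before reading anything else keeps a validator from misinterpreting a ZIP or TAR file as an AEA archive.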

[0008] AEARecord Structure: For each evidence artefact:

  • deid (uncompressed, fixed 32 bytes — SHA-256)
  • hashchain_entry_index — the artefact’s position in the corpus HashChain (P175)
  • content_hash — SHA-256 of canonical content (P10 normalised)
  • provenance_chain_bytes — compressed ProvenanceChain (P150) serialisation
  • classification_bytes — compressed ClassificationVector (P160) serialisation
  • citation_bytes — compressed CitationRecord (P157) serialisation
  • licence_bytes — compressed LicenceRecord (P183) serialisation
  • annotation_bytes — compressed AnnotationRecord list (P176) (public-level annotations only)
  • content_bytes — compressed canonical content text
  • record_hash — SHA-256 of all preceding fields (uncompressed) — verification digest for this AEARecord
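
The record_hash covers the uncompressed form of every preceding field, but the disclosure does not say how the fields are delimited when hashed. A minimal Python sketch, assuming an 8-byte big-endian encoding for hashchain_entry_index and an 8-byte length prefix in front of each variable-length field (both are assumptions made here so that field boundaries are unambiguous):

```python
import hashlib
import struct

# Field order follows [0008]; values are the UNCOMPRESSED serialisations.
VARIABLE_FIELDS = ("provenance_chain", "classification", "citation",
                   "licence", "annotations", "content")

def compute_record_hash(deid, hashchain_entry_index, content_hash, fields):
    h = hashlib.sha256()
    h.update(deid)                                      # fixed 32 bytes
    h.update(struct.pack(">Q", hashchain_entry_index))  # assumed u64 encoding
    h.update(content_hash)                              # fixed 32 bytes
    for name in VARIABLE_FIELDS:
        data = fields[name]
        h.update(struct.pack(">Q", len(data)))          # assumed length prefix
        h.update(data)
    return h.digest()
```

Without some delimiter, moving a byte from the end of one field to the start of the next would leave the digest unchanged; the length prefix is one simple way to rule that out.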

[0009] Compression: The default compression algorithm for AEA files is Zstandard (zstd) at compression level 15, providing a balance of compression ratio and decompression speed suitable for archival workloads. The compression algorithm code in the AEA Header identifies the algorithm; future AEA versions may specify alternative algorithms. Each field within an AEARecord is compressed independently, enabling field-level decompression without decompressing the full record.
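
Independent per-field compression can be sketched as follows. This Python example uses zlib from the standard library purely as a stand-in so that it runs without third-party packages; an actual AEA writer would substitute a Zstandard binding at level 15, and zlib's level 9 used here is not equivalent to that setting. The function names are hypothetical.

```python
import zlib

def compress_fields(fields, level=9):
    # Each field becomes its own compressed stream, so any one field
    # can later be decompressed without touching the others.
    return {name: zlib.compress(data, level) for name, data in fields.items()}

def read_field(compressed_fields, name):
    # Decompress only the requested field.
    return zlib.decompress(compressed_fields[name])

record = compress_fields({
    "licence_bytes": b"licence metadata " * 8,
    "content_bytes": b"canonical evidence text " * 200,
})
licence = read_field(record, "licence_bytes")  # content_bytes stays compressed
```

The trade-off is a slightly lower overall ratio than compressing the whole record as one stream, in exchange for cheap access to small metadata fields.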

[0010] AEA Creation: The aiep-archive create CLI produces an AEA file from a specified DEID list or taxonomy scope and date range (respecting the archival policy thresholds from P167). Creation includes: fetching all specified artefacts and their associated metadata; computing record_hash for each AEARecord; building the AEAManifest; and computing the archive-level checksum and signature.
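
The final creation steps (building the AEAManifest and computing the archive-level checksum) can be sketched in Python. Several details below are assumptions because the disclosure leaves them open: byte_offset is counted from the first AEARecord, each trailer entry is encoded as a 32-byte deid followed by two big-endian 64-bit integers, and the checksum covers every byte after the header. The function name is hypothetical, and signing is omitted.

```python
import hashlib
import struct

def build_archive_body(schema_block, packed_records):
    """packed_records: list of (deid, record_bytes) in archive order."""
    manifest, chunks, offset = [], [], 0
    for deid, record in packed_records:
        # Assumption: byte_offset counts from the first AEARecord.
        manifest.append((deid, offset, len(record)))
        chunks.append(record)
        offset += len(record)
    # Assumed trailer encoding: 32-byte deid + two big-endian u64 values.
    trailer = b"".join(deid + struct.pack(">QQ", off, size)
                       for deid, off, size in manifest)
    body = schema_block + b"".join(chunks) + trailer
    # Assumption: the checksum covers everything that follows the header.
    return body, manifest, hashlib.sha256(body).digest()
```

The returned digest is the value a writer would place in the header's archive_checksum field before signing.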

[0011] AEA Validation: The aiep-archive validate CLI verifies an AEA file without full extraction: it checks the AEA Header magic bytes and version; verifies the archive-level checksum; spot-checks a configurable number of AEARecord record_hash values; and confirms the AEAManifest covers all AEARecords in the file. A full validation option recomputes all record_hash values.
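
The difference between spot-check and full validation can be sketched in Python. For brevity, each record here is represented as a dict holding "payload" (the already-decompressed bytes that record_hash covers) and the stored "record_hash"; that representation, the function names, and the seeded random sampling are assumptions made for illustration.

```python
import hashlib
import random

def verify_archive_checksum(body, expected):
    # body: the bytes the archive-level checksum is defined over.
    return hashlib.sha256(body).digest() == expected

def check_records(records, sample_size=None, seed=None):
    """Return the indices of records whose record_hash does not match.

    sample_size=None recomputes every record_hash (full validation);
    otherwise only a random sample of that size is checked.
    """
    indices = range(len(records))
    if sample_size is not None and sample_size < len(records):
        indices = sorted(random.Random(seed).sample(indices, sample_size))
    return [i for i in indices
            if hashlib.sha256(records[i]["payload"]).digest()
            != records[i]["record_hash"]]
```

A spot check can miss a corrupted record that falls outside the sample, which is why the full validation option exists for restoration-critical archives.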

[0012] AEA Extraction: The aiep-archive extract CLI extracts specified DEIDs from an AEA file using the AEAManifest for direct byte-offset access. Extracted artefacts can be re-admitted to an AIEP corpus using the TRANSFER ProvenanceChain step type, preserving the original content_hash and provenance data.
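
Direct byte-offset access through the AEAManifest can be sketched in Python. The sketch assumes that byte_offset is relative to the start of the AEARecord sequence and that the manifest has already been parsed into dicts; the function name and the toy archive bytes are hypothetical.

```python
import io

def extract_record(archive, records_start, manifest, deid):
    """Read one packed AEARecord by seeking straight to it.

    archive: seekable binary file object; records_start: byte position of
    AEARecord[0]; manifest: list of dicts with deid, byte_offset and
    compressed_size (byte_offset assumed relative to records_start).
    """
    by_deid = {entry["deid"]: entry for entry in manifest}
    entry = by_deid.get(deid)
    if entry is None:
        raise KeyError("DEID not present in this archive")
    archive.seek(records_start + entry["byte_offset"])
    data = archive.read(entry["compressed_size"])
    if len(data) != entry["compressed_size"]:
        raise EOFError("archive truncated inside the requested record")
    return data

# Toy archive: 8 bytes standing in for header and Schema Block, then two records.
toy = io.BytesIO(b"H" * 8 + b"first" + b"second")
toy_manifest = [
    {"deid": b"A" * 32, "byte_offset": 0, "compressed_size": 5},
    {"deid": b"B" * 32, "byte_offset": 5, "compressed_size": 6},
]
second = extract_record(toy, 8, toy_manifest, b"B" * 32)
```

Only the requested record is read from storage, so the cost of extracting one artefact does not grow with the size of the archive.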


ASCII Architecture

AEA File
┌─────────────────────────────────────────────────────────┐
│ AEA Header (512 bytes)                                  │
│  magic | version | artefact_count | timestamp           │
│  node_fingerprint | hashchain_range | compression_algo  │
│  archive_checksum | node_signature                      │
├─────────────────────────────────────────────────────────┤
│ Schema Block (self-describing field definitions)        │
├─────────────────────────────────────────────────────────┤
│ AEARecord[0]: DEID | hashchain_index | content_hash     │
│   + compressed: provenance | classification | citation  │
│   + compressed: licence | annotations | content_text    │
│   + record_hash                                         │
├─────────────────────────────────────────────────────────┤
│ AEARecord[1] ... AEARecord[N-1]                         │
├─────────────────────────────────────────────────────────┤
│ AEAManifest (trailer)                                   │
│  [{deid, byte_offset, compressed_size}] × N             │
└─────────────────────────────────────────────────────────┘

Tooling:
  aiep-archive create   → produce AEA from corpus scope
  aiep-archive validate → verify checksums, record_hashes
  aiep-archive extract  → restore specific DEIDs

Operational Detail

[0013] Incremental Archives: For large corpora where full archive creation is impractical, the aiep-archive create --incremental flag produces a delta AEA covering only artefacts admitted since the HashChain index specified by --since-chain-index. Incremental archives include a back-reference to the base archive’s archive_checksum in the AEA Header, enabling restoration tools to verify continuity of incremental archive chains.
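
A restoration tool's continuity check might look like the following Python sketch. It assumes that each incremental archive back-references the archive immediately before it in the chain, and that header fields have already been parsed into dicts with the hypothetical keys archive_checksum and base_checksum (None for the full base archive).

```python
def verify_incremental_chain(archives):
    """archives: header dicts ordered from the full base archive onwards."""
    if not archives or archives[0]["base_checksum"] is not None:
        return False  # the chain must start at a full (non-incremental) archive
    for previous, current in zip(archives, archives[1:]):
        # Assumption: each delta names its immediate predecessor.
        if current["base_checksum"] != previous["archive_checksum"]:
            return False  # missing, reordered, or substituted archive
    return True

chain = [
    {"archive_checksum": b"\x01" * 32, "base_checksum": None},
    {"archive_checksum": b"\x02" * 32, "base_checksum": b"\x01" * 32},
    {"archive_checksum": b"\x03" * 32, "base_checksum": b"\x02" * 32},
]
chain_ok = verify_incremental_chain(chain)
```

If every delta instead referenced the original full archive, the loop would compare against archives[0] rather than the previous element; the disclosure does not settle which reading applies.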

[0014] Retention Policy Integration: The archival retention policy (P167) triggers AEA creation for artefacts transitioning to ARCHIVED status. The AEA file path and archive_checksum are recorded in the RetentionRecord (P167), enabling retrieval of the AEA at the end of the retention period.

[0015] Long-Term Readability: The Schema Block within every AEA file is self-contained and includes all field definitions needed to decode the archive without reference to any external schema registry. Archives produced today can therefore be decoded by a future tool that has no access to the current schema registry, which is how the self-describing format addresses the 30-year readability requirement.
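
The idea of decoding from the embedded schema alone can be sketched in Python. The disclosure does not define the Schema Block encoding, so the JSON descriptor, the three type names, and the trimmed field list below are all hypothetical; the point is only that the decoder hard-codes no field names.

```python
import json
import struct

# Hypothetical Schema Block: field names, types, and sizes, in record order.
SCHEMA_BLOCK = json.dumps({"fields": [
    {"name": "deid", "type": "bytes", "size": 32},
    {"name": "hashchain_entry_index", "type": "u64"},
    {"name": "content_bytes", "type": "varbytes"},
]}).encode()

def decode_record(schema_block, raw):
    """Decode one record using only the descriptor carried in the archive."""
    out, pos = {}, 0
    for field in json.loads(schema_block)["fields"]:
        kind = field["type"]
        if kind == "bytes":                      # fixed-width byte string
            out[field["name"]] = raw[pos:pos + field["size"]]
            pos += field["size"]
        elif kind == "u64":                      # big-endian unsigned 64-bit
            (out[field["name"]],) = struct.unpack_from(">Q", raw, pos)
            pos += 8
        elif kind == "varbytes":                 # u32 length, then payload
            (length,) = struct.unpack_from(">I", raw, pos)
            pos += 4
            out[field["name"]] = raw[pos:pos + length]
            pos += length
        else:
            raise ValueError("unknown field type: " + kind)
    return out

raw_record = b"\xab" * 32 + struct.pack(">Q", 42) + struct.pack(">I", 5) + b"hello"
decoded = decode_record(SCHEMA_BLOCK, raw_record)
```

A future reader needs to understand only the small set of primitive types; new fields added to later archives are picked up from each archive's own descriptor.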


Claims-Exclusion Notice

This specification is published as open-source prior art. No patent claims are asserted by the author in respect of the mechanisms described. Any third party seeking to patent mechanisms substantially equivalent to those described herein is placed on notice of this prior art disclosure.