P123 — AIEP — Bulk Ingestion and Delta Feed Protocol
Publication Date: 2026-03-01
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Ltd
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 (https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1)
Framework Context
[0001] This disclosure operates within an Architected Instruction and Evidence Protocol (AIEP) environment as defined in United Kingdom patent application numbers GB2519711.2 and GB2519798.9, both filed 20 November 2025, the entire contents of which are incorporated herein by reference.
[0002] The present disclosure extends the multimodal ingestion protocol defined in P119 and the AODSR defined in P122 to define: a bulk ingestion endpoint accepting multiple URLs or files in a single call; and a delta feed protocol by which an AIEP evidence substrate subscribes to a registered source for incremental content updates.
Field of the Disclosure
[0003] This disclosure relates to governed artificial intelligence evidence substrates that support high-throughput ingestion of multiple evidence sources in a single operation and subscription-based incremental updates from registered authoritative sources.
[0004] More particularly, the disclosure concerns: (a) a bulk ingestion endpoint that accepts an ordered list of input items (URLs or file references), processes each through the standard ingestion pipeline, and returns a batch ingestion record with per-item status; and (b) a delta feed registration and consumption mechanism that periodically retrieves new or updated content from registered sources and commits the incremental artefacts to the Evidence Ledger.
Background
[0005] Evidence-grounded AI systems that operate on dynamic corpora — legislative repositories that publish amendments, standards bodies that release updated specifications, institutional databases that publish new research periodically — require efficient mechanisms for: (a) ingesting large initial document sets without per-item API calls; and (b) receiving incremental updates as documents are published or revised.
[0006] Existing ingestion approaches require one API call per document. For corpora of hundreds or thousands of documents, this incurs high per-call overhead, provides no batch commitment linking related documents, and offers no subscription mechanism for receiving future updates.
Summary of the Disclosure
[0007] The Bulk Ingestion Endpoint at POST /ingest/bulk accepts a JSON body comprising: an ordered items array, where each item is either a { "url": string } object for URL-based sources or a { "file_ref": string } object referencing a previously uploaded file (from P119); a tenant_id; and an optional batch_label string. The maximum items array length is tenant-tier-dependent (default: 50; enterprise: 500).
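As an illustration of the request shape in [0007], the following is a minimal validation sketch, not the endpoint's actual implementation. The function name validate_bulk_request, the tier keys "default" and "enterprise", the error wording, and the rule that an empty items array is rejected are all assumptions made for the example; only the field names and the 50/500 limits come from the text.

```python
from typing import Any

# Limits are taken from [0007]; the tier names used as keys are assumptions.
MAX_ITEMS_BY_TIER = {"default": 50, "enterprise": 500}


def validate_bulk_request(body: dict[str, Any], tier: str = "default") -> list[str]:
    """Return a list of validation errors; an empty list means the body is acceptable."""
    errors: list[str] = []

    tenant_id = body.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        errors.append("tenant_id is required")

    # batch_label is optional, but must be a string when present.
    label = body.get("batch_label")
    if label is not None and not isinstance(label, str):
        errors.append("batch_label must be a string")

    items = body.get("items")
    if not isinstance(items, list) or not items:
        # Rejecting an empty array is an assumption; the text does not say.
        errors.append("items must be a non-empty array")
        return errors

    limit = MAX_ITEMS_BY_TIER.get(tier, MAX_ITEMS_BY_TIER["default"])
    if len(items) > limit:
        errors.append(f"items exceeds tier limit of {limit}")

    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"items[{i}] must be an object")
            continue
        # Each item carries exactly one key: "url" or "file_ref", with a string value.
        keys = set(item)
        if keys not in ({"url"}, {"file_ref"}) or not isinstance(next(iter(item.values())), str):
            errors.append(f"items[{i}] must be {{'url': str}} or {{'file_ref': str}}")

    return errors
```

A caller would typically reject the request with a client error when the returned list is non-empty, and pass the ordered items array to the ingestion pipeline otherwise.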
[0008] Each item is processed through the standard ingestion pipeline (P119 for files, standard evidence retrieval for URLs). Items are processed in parallel up to a configurable concurrency limit. For each item, the result is one of: ingested (new artefact committed); duplicate (content_hash matched an existing artefact — no duplicate committed); failed (retrieval or extraction error — error detail recorded).
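The bounded-parallel processing and three-way result classification in [0008] can be sketched as follows. This is an illustrative outline under stated assumptions: fetch is a hypothetical caller-supplied coroutine standing in for the P119 or URL retrieval pipeline, known_hashes is a hypothetical in-memory stand-in for the Evidence Ledger's content_hash lookup, and SHA-256 is assumed as the content hash function.

```python
import asyncio
import hashlib
from typing import Any, Awaitable, Callable

Item = dict[str, str]


async def process_items(
    items: list[Item],
    fetch: Callable[[Item], Awaitable[bytes]],  # hypothetical retrieval/extraction step
    known_hashes: set[str],                     # hypothetical stand-in for ledger lookup
    concurrency: int = 4,                       # the configurable concurrency limit
) -> list[dict[str, Any]]:
    """Process items in parallel and return per-item results in input order."""
    sem = asyncio.Semaphore(concurrency)

    async def process_one(index: int, item: Item) -> dict[str, Any]:
        # The semaphore caps how many retrievals are in flight at once.
        async with sem:
            try:
                content = await fetch(item)
            except Exception as exc:
                # Retrieval or extraction error: record the detail, do not abort the batch.
                return {"index": index, "item": item, "result": "failed", "error": str(exc)}

        content_hash = hashlib.sha256(content).hexdigest()
        # No await occurs between the check and the add, so the pair cannot be
        # interleaved with another task on the same event loop.
        if content_hash in known_hashes:
            return {"index": index, "item": item, "result": "duplicate", "content_hash": content_hash}
        known_hashes.add(content_hash)
        return {"index": index, "item": item, "result": "ingested", "content_hash": content_hash}

    # asyncio.gather returns results in the order the awaitables were passed.
    return await asyncio.gather(*(process_one(i, it) for i, it in enumerate(items)))
```

Because a failure is captured per item rather than raised, one unreachable URL does not prevent the remaining items from being committed.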
[0009] A BatchIngestionRecord is assembled on completion comprising: batch_id; batch_label (if provided); item_count; counts by result type; an array of per-item result records; and a batch_hash — SHA-256 over the batch record in canonical JSON form. The BatchIngestionRecord is persisted to the Evidence Ledger.
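A minimal sketch of assembling the record in [0009] and computing its batch_hash is given below. The text does not define the canonical JSON form, so the example assumes sorted keys, compact separators, and UTF-8 encoding; it also assumes the hash covers every field except batch_hash itself. The field names counts and items, and the helper names, are hypothetical.

```python
import hashlib
import json
import uuid
from collections import Counter
from typing import Any, Optional


def canonical_json(obj: Any) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def build_batch_record(
    item_results: list[dict[str, Any]],
    batch_label: Optional[str] = None,
    batch_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a BatchIngestionRecord-like dict and seal it with a batch_hash."""
    counts = Counter(r["result"] for r in item_results)
    record: dict[str, Any] = {
        "batch_id": batch_id or str(uuid.uuid4()),
        "item_count": len(item_results),
        "counts": {k: counts.get(k, 0) for k in ("ingested", "duplicate", "failed")},
        "items": item_results,
    }
    if batch_label is not None:
        record["batch_label"] = batch_label
    # The hash is computed before the batch_hash field is added to the record.
    record["batch_hash"] = hashlib.sha256(canonical_json(record)).hexdigest()
    return record


def verify_batch_record(record: dict[str, Any]) -> bool:
    """Recompute the hash over all fields except batch_hash and compare."""
    body = {k: v for k, v in record.items() if k != "batch_hash"}
    return hashlib.sha256(canonical_json(body)).hexdigest() == record.get("batch_hash")
```

Recomputing the hash on read, as verify_batch_record does, lets an auditor detect any later change to a per-item result or count in the persisted record.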
[0010] The Delta Feed Protocol enables an AIEP substrate to register a subscription against an AODSR-registered source or a tenant-registered source. A DeltaFeedRegistration comprises: source_id (referencing an AODSR entry or tenant source); feed_url (the URL of the source’s Atom/RSS feed or a polling endpoint); poll_interval_hours (default: 24); domain_filter (optional domain labels to restrict which updates are ingested); and a registration_hash.
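The registration fields in [0010] map naturally onto a small immutable data structure, sketched below. The text does not state how registration_hash is derived, so the example assumes SHA-256 over a sorted-key JSON serialisation of the other fields. The accepts method, and its rule that an empty domain_filter admits every update, are likewise assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class DeltaFeedRegistration:
    source_id: str                       # references an AODSR entry or tenant source
    feed_url: str                        # Atom/RSS feed or polling endpoint
    poll_interval_hours: int = 24        # default from the text
    domain_filter: tuple[str, ...] = ()  # optional domain labels

    @property
    def registration_hash(self) -> str:
        # Assumed derivation: SHA-256 over the fields in sorted-key compact JSON.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def accepts(self, entry_domains: Iterable[str]) -> bool:
        """Assumed rule: no filter admits everything; otherwise any shared label admits."""
        if not self.domain_filter:
            return True
        return bool(set(self.domain_filter) & set(entry_domains))
```

For example, DeltaFeedRegistration("src-1", "https://example.org/feed.atom", domain_filter=("tax",)) would accept an entry labelled with the domains "tax" and "finance" and skip one labelled only "health"; the identifiers and URL here are placeholders.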
[0011] A scheduled Worker (Cloudflare Cron Trigger) iterates active DeltaFeedRegistrations. For each registration, the feed URL is fetched and parsed for new entries since the last poll timestamp. New entries are submitted to the standard ingestion pipeline. A DeltaFeedEvent is committed to the Evidence Ledger for each poll cycle, recording: registration_id; polled_at; new_items_count; duplicate_items_count; and a delta_hash.
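One poll cycle from [0011] can be sketched as below, written in Python for illustration rather than as Worker code. The sketch handles Atom feeds only, takes the already-fetched feed body as a string, and treats ingest as a hypothetical callback into the standard ingestion pipeline that returns "ingested" or "duplicate". Selecting entries by their updated timestamp and hashing the event in sorted-key JSON are assumptions, since the text specifies neither.

```python
import hashlib
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Any, Callable, Optional

ATOM = "{http://www.w3.org/2005/Atom}"


def entries_since(feed_xml: str, last_polled: datetime) -> list[str]:
    """Return link hrefs of Atom entries updated after last_polled (timezone-aware)."""
    root = ET.fromstring(feed_xml)
    urls: list[str] = []
    for entry in root.iter(f"{ATOM}entry"):
        updated = entry.findtext(f"{ATOM}updated")
        link = entry.find(f"{ATOM}link")
        href = link.get("href") if link is not None else None
        if updated is None or href is None:
            continue
        # Atom timestamps are RFC 3339; map a trailing "Z" to an explicit offset.
        ts = datetime.fromisoformat(updated.strip().replace("Z", "+00:00"))
        if ts > last_polled:
            urls.append(href)
    return urls


def run_poll_cycle(
    registration_id: str,
    feed_xml: str,
    last_polled: datetime,
    ingest: Callable[[str], str],  # hypothetical: returns "ingested" or "duplicate"
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Submit new entries for ingestion and build a DeltaFeedEvent-like dict."""
    polled_at = (now or datetime.now(timezone.utc)).isoformat()
    results = [ingest(url) for url in entries_since(feed_xml, last_polled)]
    event: dict[str, Any] = {
        "registration_id": registration_id,
        "polled_at": polled_at,
        "new_items_count": results.count("ingested"),
        "duplicate_items_count": results.count("duplicate"),
    }
    # Assumed: delta_hash is SHA-256 over the event fields in sorted-key JSON.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    event["delta_hash"] = hashlib.sha256(canonical).hexdigest()
    return event
```

The scheduler would then store polled_at as the registration's new last poll timestamp, so that the next cycle only considers entries published after this one.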
[0012] Delta feed events are surfaced in the AODSR management UI as a timeline of update activity per registered source, enabling tenants to audit the currency of their evidence pool and identify sources that have not been updated within expected intervals.
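The currency audit described in [0012] could be derived from the committed events along the following lines. This is a sketch only: the function name, the expected_update_hours mapping of registration_id to an expected update interval, and the rule that a source is stale when no poll has yielded new items within that interval are all assumptions introduced for illustration.

```python
from datetime import datetime, timedelta
from typing import Any


def stale_registrations(
    events: list[dict[str, Any]],
    expected_update_hours: dict[str, int],  # hypothetical: registration_id -> hours
    now: datetime,                          # timezone-aware, like polled_at
) -> list[str]:
    """Return registration ids with no poll yielding new items within the expected interval."""
    last_update: dict[str, datetime] = {}
    for ev in events:
        if ev["new_items_count"] > 0:
            ts = datetime.fromisoformat(ev["polled_at"])
            rid = ev["registration_id"]
            if rid not in last_update or ts > last_update[rid]:
                last_update[rid] = ts

    stale = []
    for rid, hours in expected_update_hours.items():
        seen = last_update.get(rid)
        # A source that has never produced new items is treated as stale.
        if seen is None or now - seen > timedelta(hours=hours):
            stale.append(rid)
    return sorted(stale)
```

Polls that found nothing new still appear in the timeline, but under this rule they do not reset the staleness clock, which separates "the poller is running" from "the source is publishing".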