
P122 — AIEP — Authoritative Open Data Source Registry

Publication Date: 2026-03-01
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Ltd
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 (https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1)


Framework Context

[0001] This disclosure operates within an Architected Instruction and Evidence Protocol (AIEP) environment as defined in United Kingdom patent application numbers GB2519711.2 and GB2519798.9, both filed 20 November 2025, the entire contents of which are incorporated herein by reference.

[0002] The present disclosure defines an Authoritative Open Data Source Registry (AODSR) — a structured, machine-readable catalogue of trusted tier-1 data sources across defined subject matter domains, against which AIEP evidence retrieval systems preferentially route domain-relevant queries.


Field of the Disclosure

[0003] This disclosure relates to governed artificial intelligence evidence substrates that prioritise retrieval from authoritative sources in domain-specific evidence rail operations.

[0004] More particularly, the disclosure concerns a registry of authoritative open data sources classified by domain, jurisdiction, and confidence tier, together with a routing protocol that applies this registry to bias evidence retrieval toward tier-1 sources when a query is classified as falling within a registered domain.


Background

[0005] Web-retrieved evidence for AI inference is subject to quality variation: a query about UK tax law may surface blog posts, forum discussions, and legal commentary in addition to primary legislation. An evidence-grounded AI system that gives equal weight to all retrieved sources produces lower-quality outputs than one that can discriminate between primary legislation and secondary commentary.

[0006] Existing AI retrieval systems use relevance scoring to rank results but do not: (a) maintain a structured registry of universally trusted sources for defined domains; (b) apply domain classification to route queries to preferred sources before open-web search; (c) assign a confidence tier based on registry membership; or (d) surface registry membership as a visible governance signal in the Evidence Rail.


Summary of the Disclosure

[0007] The AODSR is a structured JSON document comprising an array of RegistryEntry records. Each RegistryEntry comprises: source_id (a unique identifier); name (a human-readable label); base_url (the URL prefix that identifies content from the source); domain (one or more labels from a defined taxonomy: legislation, regulation, treaty, standards, judicial, statistical, patent, scientific); jurisdiction (an ISO 3166-1 alpha-2 country code, or INT for international sources); confidence_tier (always the value verified for tier-1 registry members); and last_validated (an ISO 8601 date).
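
The record layout in paragraph [0007] can be sketched in Python as follows. This is a minimal illustration, not a normative schema: the field names come from the text above, while the bare top-level JSON array, the source_id value "gb-legislation", the last_validated date, and the helper names parse_entry and load_registry are assumptions made for the example. The jurisdiction check is a shape check only and does not consult the ISO 3166-1 code list.

```python
import json
from dataclasses import dataclass
from datetime import date

# Domain taxonomy as listed in paragraph [0007].
DOMAINS = frozenset({
    "legislation", "regulation", "treaty", "standards",
    "judicial", "statistical", "patent", "scientific",
})


@dataclass(frozen=True)
class RegistryEntry:
    source_id: str
    name: str
    base_url: str
    domain: tuple[str, ...]   # one or more taxonomy labels
    jurisdiction: str         # two-letter code or "INT"
    confidence_tier: str      # always "verified" for tier-1 members
    last_validated: date


def parse_entry(raw: dict) -> RegistryEntry:
    """Validate one decoded JSON object and build a RegistryEntry."""
    domains = tuple(raw["domain"])
    if not domains or not set(domains) <= DOMAINS:
        raise ValueError(f"invalid domain labels: {domains!r}")
    jurisdiction = raw["jurisdiction"]
    # Assumption: shape check only (two uppercase ASCII letters, or INT);
    # codes are not looked up against the ISO 3166-1 list.
    two_letter = (len(jurisdiction) == 2 and jurisdiction.isascii()
                  and jurisdiction.isalpha() and jurisdiction.isupper())
    if jurisdiction != "INT" and not two_letter:
        raise ValueError(f"invalid jurisdiction: {jurisdiction!r}")
    if raw["confidence_tier"] != "verified":
        raise ValueError("tier-1 registry members must be 'verified'")
    return RegistryEntry(
        source_id=raw["source_id"],
        name=raw["name"],
        base_url=raw["base_url"],
        domain=domains,
        jurisdiction=jurisdiction,
        confidence_tier=raw["confidence_tier"],
        last_validated=date.fromisoformat(raw["last_validated"]),
    )


def load_registry(text: str) -> list[RegistryEntry]:
    """Parse a JSON array of entries (assumed top-level shape)."""
    return [parse_entry(item) for item in json.loads(text)]


# Hypothetical entry; source_id and last_validated are placeholders.
EXAMPLE_JSON = """
[
  {
    "source_id": "gb-legislation",
    "name": "legislation.gov.uk",
    "base_url": "https://legislation.gov.uk",
    "domain": ["legislation"],
    "jurisdiction": "GB",
    "confidence_tier": "verified",
    "last_validated": "2026-03-01"
  }
]
"""

registry = load_registry(EXAMPLE_JSON)
```

Rejecting unknown domain labels and non-verified tiers at load time keeps malformed entries out of the routing step, though a real deployment might prefer to log and skip them instead of raising.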

[0008] Example tier-1 registry entries:

| Source | Domain | Jurisdiction | Base URL |
|---|---|---|---|
| legislation.gov.uk | legislation | GB | https://legislation.gov.uk |
| WTO Legal Texts | treaty | INT | https://wto.org/english/res_e/reser_e/gattdocs_e.htm |
| WIPO Lex | patent, treaty | INT | https://wipolex.wipo.int |
| EUR-Lex | legislation, regulation | EU | https://eur-lex.europa.eu |
| HMRC Technical Guidance | regulation | GB | https://www.gov.uk/government/organisations/hm-revenue-customs |
| Federal Register | regulation | US | https://federalregister.gov |
| PubMed Central | scientific | INT | https://pmc.ncbi.nlm.nih.gov |

[0009] On query classification (reasoning step 1), if the query’s query_class falls within a domain covered by the AODSR, the evidence retrieval phase is augmented with targeted retrieval against relevant registry entries whose domain and jurisdiction match the query’s classified domain and jurisdiction signals. Retrieved sources whose url matches a registry entry’s base_url prefix are assigned the registry’s confidence_tier regardless of their computed relevance score.
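
The two routing steps in paragraph [0009] (selecting registry entries for targeted retrieval, then assigning the registry tier by base_url prefix) might look like the sketch below. It rests on several assumptions not stated in the disclosure: the Entry class is a trimmed stand-in for RegistryEntry, the names targets_for, matches_base_url and registry_tier are hypothetical, the fallback tier "unverified" is a placeholder, and prefix matching is done on the parsed URL (same scheme, same host ignoring a leading "www.", path prefix on a segment boundary) instead of raw string comparison.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Entry:
    """Trimmed stand-in for a RegistryEntry (illustrative only)."""
    source_id: str
    base_url: str
    domain: tuple[str, ...]
    jurisdiction: str
    confidence_tier: str = "verified"


def targets_for(entries, domain, jurisdictions):
    """Entries whose domain and jurisdiction match the query's signals."""
    return [e for e in entries
            if domain in e.domain and e.jurisdiction in jurisdictions]


def _host(hostname):
    # Assumption: a leading "www." is not significant for matching.
    host = (hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches_base_url(url, base_url):
    """True if url falls under base_url, compared component by component."""
    u, b = urlsplit(url), urlsplit(base_url)
    if u.scheme != b.scheme or _host(u.hostname) != _host(b.hostname):
        return False
    base_path = b.path.rstrip("/")
    # Match the base path exactly or on a "/" boundary below it.
    return u.path == base_path or u.path.startswith(base_path + "/")


def registry_tier(url, entries, fallback_tier):
    """Registry tier if url matches an entry, else the caller's fallback.

    The computed relevance score is deliberately not consulted here,
    mirroring "regardless of their computed relevance score" in [0009].
    """
    for entry in entries:
        if matches_base_url(url, entry.base_url):
            return entry.confidence_tier
    return fallback_tier


REGISTRY = [
    Entry("gb-legislation", "https://legislation.gov.uk",
          ("legislation",), "GB"),
    Entry("eur-lex", "https://eur-lex.europa.eu",
          ("legislation", "regulation"), "EU"),
]

gb_targets = targets_for(REGISTRY, "legislation", {"GB"})
tier = registry_tier("https://www.legislation.gov.uk/ukpga/2010/4",
                     REGISTRY, fallback_tier="unverified")
```

Comparing parsed components avoids a pitfall of plain string prefixes: a lookalike host such as legislation.gov.uk.example.com begins with the registered base_url as a string but is a different host, so it falls through to the fallback tier here.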

[0010] An EvidenceRef sourced from an AODSR registry entry carries an aodsr_member: true flag. This flag is surfaced in the Evidence Rail UI as a distinct visual indicator (e.g. a “Tier-1” badge) to distinguish registry-sourced evidence from open-web-retrieved evidence of the same confidence tier.
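
As a rough illustration of paragraph [0010], the flag and its badge could be modelled as below. Only the aodsr_member field and the "Tier-1" label come from the text; the other EvidenceRef fields, the rail_badge helper, and the example.org URL are assumptions for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRef:
    url: str
    confidence_tier: str
    aodsr_member: bool = False  # True only for registry-sourced evidence


def rail_badge(ref: EvidenceRef) -> Optional[str]:
    """Badge text for the Evidence Rail, or None when no badge applies."""
    return "Tier-1" if ref.aodsr_member else None


# Two references with the same confidence tier; only one is registry-sourced.
registry_ref = EvidenceRef("https://legislation.gov.uk/ukpga/2010/4",
                           "verified", aodsr_member=True)
open_web_ref = EvidenceRef("https://example.org/commentary", "verified")
```

The badge depends on the flag alone, not on the tier, which is what lets the rail distinguish the two references above even though both are verified.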

[0011] The AODSR is versioned. The version embedded in the running PIEA worker is identified by a registry_version field. On update, the delta between versions (added sources, removed sources, changed domains) is committed as a Registry Changelog artefact with a commitment hash. The Bulk Ingestion and Delta Feed protocol defined in P123 extends this mechanism to support subscription-based delta feeds.
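
A version delta and its commitment hash, as described in paragraph [0011], could be computed along the following lines. The disclosure does not specify a hash function or serialisation, so SHA-256 over key-sorted compact JSON is an assumption here, as are the artefact field names, the version strings, and the simplification of each registry version to a mapping from source_id to domain labels.

```python
import hashlib
import json


def registry_delta(old, new):
    """Added sources, removed sources and changed domains between versions.

    old and new map source_id -> list of domain labels (a simplification).
    """
    changed = {
        sid: {"from": sorted(old[sid]), "to": sorted(new[sid])}
        for sid in sorted(old.keys() & new.keys())
        if set(old[sid]) != set(new[sid])
    }
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed_domains": changed,
    }


def changelog_artefact(from_version, to_version, old, new):
    """Build a Registry Changelog record with a commitment hash."""
    body = {
        "from_version": from_version,
        "to_version": to_version,
        "delta": registry_delta(old, new),
    }
    # Assumption: SHA-256 over key-sorted, compact JSON so that equal
    # deltas always serialise, and therefore hash, identically.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "commitment_hash": digest}


# Hypothetical registry versions keyed by source_id.
v1 = {"gb-legislation": ["legislation"], "wipo-lex": ["patent"]}
v2 = {
    "gb-legislation": ["legislation"],
    "wipo-lex": ["patent", "treaty"],
    "eur-lex": ["legislation", "regulation"],
}

artefact = changelog_artefact("1.0.0", "1.1.0", v1, v2)
```

Sorting keys and lists before hashing means the same pair of registry versions always yields the same commitment hash, regardless of the order in which entries were loaded.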