P168 — AIEP — Evidence Semantic Query Protocol

Publication Date: 2026-03-27
Status: Open Source Prior Art Disclosure
Licence: Apache License 2.0
Author/Organisation: Phatfella Ltd
Schema: AIEP_OS_SPEC_TEMPLATE v1.0.1 (https://aiep.dev/schemas/aiep-os-spec-template/v1.0.1)


Framework Context

[0001] This disclosure operates within an Architected Instruction and Evidence Protocol (AIEP) environment as defined in United Kingdom patent application number GB2519711.2, filed 20 November 2025, the entire contents of which are incorporated herein by reference.

[0002] The present disclosure defines a protocol for issuing rich, semantically expressive queries against AIEP evidence corpora beyond the structured taxonomy filters of the distributed evidence index (P133), through a SemanticQuery schema supporting natural-language statements, concept-based matching, and multi-constraint combination, a SemanticQueryEngine processing queries against an evidence corpus semantic index, and a SemanticQueryResult schema encoding ranked results with relevance scores and query match explanations.


Field of the Disclosure

[0003] This disclosure relates to semantic query protocols for evidence retrieval in governed artificial intelligence reasoning systems.

[0004] More particularly, the disclosure concerns a SemanticQuery schema supporting text statements, concept identifiers, similarity thresholds, and constraint combinators; a SemanticQueryEngine wrapping a configurable vector embedding model to produce embedding-based relevance rankings; a result schema including per-result relevance scores and match explanations; and governance constraints ensuring that semantic queries are recorded in the ledger for audit, rate-limited per requesting entity, and bounded in corpus scope to prevent unrestricted corpus exposure.


Background

[0005] AIEP evidence discovery through the distributed index (P133) is taxonomy-filtered: queries specify ClassificationVector dimension values (P160) and the index returns artefacts classified to those dimensions. This is precise and efficient for well-classified corpora but cannot address queries of the form “find evidence relevant to the hypothesis that X causes Y” or “find evidence similar to this specific artefact”, which require semantic rather than structural matching.

[0006] Reasoning chains that must retrieve the most relevant evidence for a given analytical question benefit from semantic retrieval — the ability to rank all evidence artefacts by semantic relevance to the query statement, rather than relying on a reasoning system to guess the correct taxonomy codes. Semantic retrieval does not replace taxonomy-based retrieval; it complements it for open-ended exploratory queries.

[0007] Semantic query results must be deterministic enough for governance purposes: given the same query and corpus state, results must be reproducible, and the ranking basis must be inspectable. The choice of embedding model and similarity metric must be recorded in the ledger as part of the query audit record.


Summary of the Disclosure

[0008] SemanticQuery Schema:

  • query_id — SHA-256 of canonical serialisation of all other fields
  • query_type — one of TEXT_STATEMENT (natural language query text), CONCEPT_ID (a known concept identifier from the AIEP concept registry), or REFERENCE_ARTEFACT (find artefacts semantically similar to a given DEID)
  • query_text — the query string (for TEXT_STATEMENT queries)
  • concept_id — concept identifier (for CONCEPT_ID queries)
  • reference_deid — DEID of the reference artefact (for REFERENCE_ARTEFACT queries)
  • corpus_scope — a TaxonomyQuery (P160) or DEID list restricting the corpus to search; null = full accessible corpus
  • minimum_relevance — minimum relevance score threshold (float 0–1; default: 0.60)
  • max_results — maximum number of results to return (default: 20; maximum: 200)
  • embedding_model — identifier of the embedding model to use (null = node default)
  • requestor_id — requesting entity identifier
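
The schema above can be sketched as a small Python data class. This is an illustrative sketch only: the disclosure does not define the canonical serialisation, so the JSON form used here (sorted keys, compact separators, UTF-8) is an assumption, as are the lower bound of 1 on max_results and the rule that the field matching query_type must be populated.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SemanticQuery:
    query_type: str                        # TEXT_STATEMENT | CONCEPT_ID | REFERENCE_ARTEFACT
    requestor_id: str
    query_text: Optional[str] = None
    concept_id: Optional[str] = None
    reference_deid: Optional[str] = None
    corpus_scope: Optional[List[str]] = None   # None = full accessible corpus
    minimum_relevance: float = 0.60
    max_results: int = 20
    embedding_model: Optional[str] = None      # None = node default

    def __post_init__(self) -> None:
        if not 0.0 <= self.minimum_relevance <= 1.0:
            raise ValueError("minimum_relevance must be in [0, 1]")
        # The upper bound of 200 is from the schema; the lower bound is assumed.
        if not 1 <= self.max_results <= 200:
            raise ValueError("max_results must be in [1, 200]")
        payload = {
            "TEXT_STATEMENT": self.query_text,
            "CONCEPT_ID": self.concept_id,
            "REFERENCE_ARTEFACT": self.reference_deid,
        }
        if self.query_type not in payload:
            raise ValueError(f"unknown query_type: {self.query_type!r}")
        if payload[self.query_type] is None:
            raise ValueError(f"{self.query_type} query is missing its payload field")

    @property
    def query_id(self) -> str:
        # Assumed canonical form: JSON, sorted keys, no whitespace, UTF-8.
        canonical = json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":"), ensure_ascii=False
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because query_id is derived rather than stored, two queries with identical field values always share an identifier, which is what allows the audit record to reference a query unambiguously.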

[0009] SemanticQueryEngine: The SemanticQueryEngine processes a SemanticQuery by: (a) encoding the query into a dense vector embedding using the specified embedding_model; (b) for REFERENCE_ARTEFACT queries, using the embedding of the reference artefact’s content_canonical as the query vector; (c) computing cosine similarity between the query vector and the pre-computed content embeddings of all artefacts within corpus_scope; (d) ranking results by cosine similarity score, descending; (e) filtering to results with similarity score ≥ minimum_relevance; and (f) returning the top max_results results.
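
Steps (c) to (f) can be sketched in a few lines of standard-library Python. The function names and the in-memory dictionary standing in for the pre-computed embedding index are hypothetical; a real node would presumably use a vector index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_artefacts(
    query_vector: Sequence[float],
    index: Dict[str, Sequence[float]],   # DEID -> pre-computed content embedding
    minimum_relevance: float = 0.60,
    max_results: int = 20,
) -> List[Tuple[str, float]]:
    # (c) score every artefact in scope against the query vector
    scored = [(deid, cosine_similarity(query_vector, vec)) for deid, vec in index.items()]
    # (d) rank by similarity, descending
    scored.sort(key=lambda item: item[1], reverse=True)
    # (e) keep only results at or above the threshold
    matching = [item for item in scored if item[1] >= minimum_relevance]
    # (f) return the top max_results
    return matching[:max_results]


index = {"deid-1": [1.0, 0.0], "deid-2": [0.8, 0.6], "deid-3": [0.0, 1.0]}
ranked = rank_artefacts([1.0, 0.0], index)
```

With the toy index above, deid-3 is orthogonal to the query vector and falls below the default threshold, so only deid-1 and deid-2 are returned, in that order.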

[0010] SemanticQueryResult Schema: The result of a SemanticQuery is a SemanticQueryResult comprising:

  • query_id — the originating query identifier
  • corpus_size_searched — total number of artefacts evaluated
  • result_count — total results matching criteria
  • returned_count — number of results in this response
  • embedding_model_used — identifier and version of the embedding model applied
  • results — ordered list of SemanticQueryResultItem:
    • evidence_id — DEID of the matching artefact
    • relevance_score — cosine similarity score (float 0–1)
    • summary — first 280 characters of content_canonical
    • classification — ClassificationVector (P160) of the artefact
    • citation — CitationRecord (P157) of the artefact
    • match_fields — list of content sections with highest similarity contribution
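
As a rough illustration of how the result fields relate to one another, the sketch below assembles a SemanticQueryResult from an already ranked and filtered list. The helper names make_summary and build_result are hypothetical, classification and citation are modelled as plain dictionaries, and "characters" is read here as Python string code points, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticQueryResultItem:
    evidence_id: str
    relevance_score: float
    summary: str
    classification: dict          # stand-in for a ClassificationVector (P160)
    citation: dict                # stand-in for a CitationRecord (P157)
    match_fields: List[str] = field(default_factory=list)


@dataclass
class SemanticQueryResult:
    query_id: str
    corpus_size_searched: int
    result_count: int             # everything that met the criteria
    returned_count: int           # what this response actually carries
    embedding_model_used: str
    results: List[SemanticQueryResultItem]


def make_summary(content_canonical: str, limit: int = 280) -> str:
    # Assumption: "first 280 characters" means the first 280 code points.
    return content_canonical[:limit]


def build_result(
    query_id: str,
    embedding_model_used: str,
    corpus_size_searched: int,
    matches: List[SemanticQueryResultItem],   # already ranked and filtered
    max_results: int,
) -> SemanticQueryResult:
    returned = matches[:max_results]
    return SemanticQueryResult(
        query_id=query_id,
        corpus_size_searched=corpus_size_searched,
        result_count=len(matches),
        returned_count=len(returned),
        embedding_model_used=embedding_model_used,
        results=returned,
    )
```

Keeping result_count separate from returned_count lets a requester see that more matches exist than max_results allowed through.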

[0011] Embedding Index Maintenance: The SemanticQueryEngine maintains a pre-computed embedding index over all Active tier (P167) artefacts in its corpus scope. Embeddings are computed when artefacts enter Active tier and are updated when content_canonical changes (e.g. after re-translation with an improved model, P159). Embeddings of Archived artefacts are retained in a lower-priority index partition to allow semantic search over the archive when explicitly requested.
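
One possible shape for this maintenance logic is sketched below. The class and method names are hypothetical, the embedding function is injected by the caller, and using a SHA-256 digest of content_canonical to detect changes is an assumption rather than something the disclosure prescribes.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class EmbeddingIndex:
    """Toy two-partition index: 'active' is searched by default, 'archived' on request."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._active: Dict[str, Tuple[str, Vector]] = {}
        self._archived: Dict[str, Tuple[str, Vector]] = {}

    @staticmethod
    def _digest(content_canonical: str) -> str:
        return hashlib.sha256(content_canonical.encode("utf-8")).hexdigest()

    def upsert_active(self, deid: str, content_canonical: str) -> bool:
        """Embed on entry to Active tier or on content change; True if recomputed."""
        digest = self._digest(content_canonical)
        existing = self._active.get(deid)
        if existing is not None and existing[0] == digest:
            return False                      # content unchanged, keep embedding
        self._archived.pop(deid, None)        # artefact is (back) in Active tier
        self._active[deid] = (digest, self._embed(content_canonical))
        return True

    def archive(self, deid: str) -> None:
        """Move an artefact's embedding to the archive partition without recomputing."""
        entry = self._active.pop(deid, None)
        if entry is not None:
            self._archived[deid] = entry

    def vectors(self, include_archived: bool = False) -> Dict[str, Vector]:
        out = {deid: vec for deid, (_, vec) in self._active.items()}
        if include_archived:
            out.update({deid: vec for deid, (_, vec) in self._archived.items()})
        return out
```

The archive partition is only consulted when include_archived is set, mirroring the "explicitly requested" condition in the paragraph above.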

[0012] Governance Constraints:

  • All SemanticQuery executions are recorded in the QueryAuditLog partition of the dual ledger (P80), with query_id, requestor_id, corpus_scope, embedding_model_used, and result_count.
  • Requesting entities are rate-limited: by default, each requesting entity may issue 100 SemanticQueries per 24-hour window on a given node; the limit is configurable by the node operator.
  • corpus_scope is always enforced against the requesting entity’s PermissionTokens (P166); an entity cannot semantic-query evidence it does not have READ access to.
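
The three constraints above might be sketched as follows. Everything here is illustrative: the disclosure does not say whether the 24-hour window is fixed or sliding (a sliding window is assumed), PermissionTokens are reduced to a plain set of readable DEIDs, and the audit record is a dictionary rather than a real ledger entry.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Optional, Set

WINDOW_SECONDS = 24 * 60 * 60


class QueryRateLimiter:
    """Sliding-window limiter keyed by requesting entity (window type is assumed)."""

    def __init__(self, limit: int = 100, window_seconds: float = WINDOW_SECONDS) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, requestor_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        history = self._history[requestor_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.limit:
            return False
        history.append(now)
        return True


def enforce_scope(requested: Optional[Iterable[str]], readable: Set[str]) -> Set[str]:
    """Restrict corpus_scope to DEIDs the requester may READ; None means 'all readable'."""
    if requested is None:
        return set(readable)
    return set(requested) & readable


def audit_record(query_id: str, requestor_id: str, corpus_scope: Optional[list],
                 embedding_model_used: str, result_count: int) -> dict:
    # The five fields the QueryAuditLog entry is stated to carry.
    return {
        "query_id": query_id,
        "requestor_id": requestor_id,
        "corpus_scope": corpus_scope,
        "embedding_model_used": embedding_model_used,
        "result_count": result_count,
    }
```

Intersecting the requested scope with the readable set before any similarity is computed means unreadable artefacts never contribute to scores or counts.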

[0013] Determinism Note: Cosine similarity over floating-point embeddings is deterministic in principle, given the same model weights and the same artefact content. In practice, hardware floating-point non-determinism may cause minor variations in the ordering of results with near-identical relevance scores; the semantic index therefore records the embedding model version so that results can be audited for reproducibility.
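
The disclosure does not prescribe a tie-breaking rule. Purely as an illustration, a node could reduce ordering jitter by rounding scores to a fixed precision and breaking ties on the DEID, as in the hypothetical stable_rank helper below. The precision of six decimal places is an arbitrary assumption, and scores that straddle a rounding boundary can still swap places.

```python
from typing import List, Tuple


def stable_rank(scored: List[Tuple[str, float]], precision: int = 6) -> List[Tuple[str, float]]:
    """Sort by rounded score (descending), then by DEID (ascending) to break ties."""
    return sorted(scored, key=lambda item: (-round(item[1], precision), item[0]))


# Two scores that differ only in the tenth decimal place are ordered by DEID.
ranked = stable_rank([("b", 0.9000000001), ("a", 0.9000000002), ("c", 0.95)])
ordered_ids = [deid for deid, _ in ranked]
```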


ASCII Architecture

SemanticQuery
(text / concept / DEID reference)
           │
           ▼
┌────────────────────────┐
│  SemanticQueryEngine   │
│  encode query → vector │
│  cosine sim over index │
└──────────┬─────────────┘
           │ ranked result list
           ▼
┌────────────────────────┐    ┌──────────────────────┐
│  PermissionToken check │───▶│  AccessControl (P166)│
│  (P166) per result     │    └──────────────────────┘
└──────────┬─────────────┘
           │ filtered results
           ▼
┌────────────────────────┐    ┌──────────────────────┐
│  SemanticQueryResult   │───▶│  QueryAuditLog (P80) │
│  (ranked, annotated)   │    └──────────────────────┘
└────────────────────────┘

Embedding Index:
Active artefacts → compute embeddings → store in vector index
(updated on content change)

Operational Detail

[0014] Hybrid Query: A requesting node may combine a SemanticQuery with a TaxonomyQuery (P160/P133) to perform a hybrid retrieval: first filter by taxonomy, then rank the filtered set semantically. This reduces semantic search space and improves precision for queries that are both topically scoped and semantically rich.
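
A minimal sketch of the filter-then-rank order follows. The artefact records, the dictionary form of the taxonomy filter, and the hybrid_search name are all hypothetical; a ClassificationVector is reduced here to a flat mapping of dimension to value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def hybrid_search(
    query_vector: Sequence[float],
    artefacts: List[dict],               # each: {"deid", "classification", "embedding"}
    taxonomy_filter: Dict[str, str],     # dimension -> required value
    minimum_relevance: float = 0.60,
    max_results: int = 20,
) -> List[Tuple[str, float]]:
    # Stage 1: structural filter, keeping artefacts whose classification
    # matches every dimension named in the taxonomy filter.
    candidates = [
        a for a in artefacts
        if all(a["classification"].get(dim) == val for dim, val in taxonomy_filter.items())
    ]
    # Stage 2: semantic ranking over the reduced candidate set only.
    scored = [(a["deid"], _cosine(query_vector, a["embedding"])) for a in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [item for item in scored if item[1] >= minimum_relevance][:max_results]


artefacts = [
    {"deid": "d1", "classification": {"domain": "medicine"}, "embedding": [1.0, 0.0]},
    {"deid": "d2", "classification": {"domain": "law"}, "embedding": [1.0, 0.0]},
    {"deid": "d3", "classification": {"domain": "medicine"}, "embedding": [0.0, 1.0]},
]
hits = hybrid_search([1.0, 0.0], artefacts, {"domain": "medicine"})
```

In this toy corpus d2 is a perfect semantic match but is excluded by the taxonomy filter, and d3 passes the filter but scores below the relevance threshold, so only d1 is returned.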

[0015] Concept Registry: The AIEP concept registry (referenced in CONCEPT_ID query type) is a governed list of named concepts with associated description text and canonical embedding vectors. Concept IDs enable stable query references that do not depend on the exact phrasing of a query text string. The concept registry is versioned and published via the AIEP well-known mechanism (P64).
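
A concept registry of this kind could be modelled as below. The Concept and ConceptRegistry types are hypothetical, and using the stored canonical embedding directly as the query vector for CONCEPT_ID queries is an assumption drawn from the description above rather than an explicit requirement.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Concept:
    concept_id: str
    name: str
    description: str
    embedding: Tuple[float, ...]     # canonical embedding vector


class ConceptRegistry:
    """A single published version of the registry, keyed by concept_id."""

    def __init__(self, version: str, concepts: Iterable[Concept]) -> None:
        self.version = version
        self._by_id: Dict[str, Concept] = {c.concept_id: c for c in concepts}

    def query_vector(self, concept_id: str) -> Tuple[float, ...]:
        # Assumed behaviour: a CONCEPT_ID query reuses the canonical embedding,
        # so the result does not depend on how a text query might be phrased.
        try:
            return self._by_id[concept_id].embedding
        except KeyError:
            raise KeyError(
                f"unknown concept {concept_id!r} in registry version {self.version}"
            ) from None


registry = ConceptRegistry(
    version="2026.1",
    concepts=[Concept("c-001", "causation", "X causes Y relationships", (0.2, 0.9))],
)
vector = registry.query_vector("c-001")
```

Recording the registry version alongside the query would let an auditor recover exactly which canonical vector was used.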

[0016] Model Governance: Embedding models used by the SemanticQueryEngine must be registered in the node’s model registry before use, with a model identifier, version, input/output specifications, and the governance tier under which the model is permitted (P89). Unregistered models cannot be specified in a SemanticQuery embedding_model field.
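
The registration check might look like the following sketch. RegisteredModel and ModelRegistry are hypothetical names, the field types are guesses at what "input/output specifications" could contain, and raising LookupError for an unregistered model is an assumed error-handling choice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RegisteredModel:
    model_id: str
    version: str
    input_spec: str            # e.g. accepted text encoding and length limit
    output_dimensions: int     # length of the embedding vector
    governance_tier: str       # tier under which the model is permitted (P89)


class ModelRegistry:
    def __init__(self, default_model: RegisteredModel) -> None:
        self._models: Dict[str, RegisteredModel] = {default_model.model_id: default_model}
        self._default_id = default_model.model_id

    def register(self, model: RegisteredModel) -> None:
        self._models[model.model_id] = model

    def resolve(self, embedding_model: Optional[str]) -> RegisteredModel:
        """Map a SemanticQuery embedding_model value to a registered model."""
        model_id = self._default_id if embedding_model is None else embedding_model
        if model_id not in self._models:
            # Unregistered models are rejected before any query is executed.
            raise LookupError(f"embedding model {model_id!r} is not registered on this node")
        return self._models[model_id]


node_models = ModelRegistry(RegisteredModel("node-default", "1.0", "utf-8 text", 384, "tier-1"))
chosen = node_models.resolve(None)
```

The resolved model's identifier and version are the values a node would then report in embedding_model_used.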


Claims-Exclusion Notice

This specification is published as open-source prior art. No patent claims are asserted by the author in respect of the mechanisms described. Any third party seeking to patent mechanisms substantially equivalent to those described herein is placed on notice of this prior art disclosure.