P214 — AIEP — Meta-Reasoning and Quality Evaluation Engine
Applicant: Neil Grassby
Classification: Patent Application — Confidential
Priority: Claims priority from GB2519711.2 filed 20 November 2025
Architecture Layer: AIEP AGI Cognition Layer — Phase 2
Framework Context
[0001] This specification operates within an AIEP environment as defined in GB2519711.2 and GB2519798.9. The present specification defines the meta-reasoning and reasoning quality evaluation mechanism of the Phase-2 AIEP cognition architecture, enabling the system to monitor the quality of its own reasoning outputs and initiate corrective cycles when quality falls below policy thresholds.
Field of the Invention
[0002] The present invention relates to meta-cognitive systems and reasoning quality evaluation architectures for evidence-bound artificial intelligence.
[0003] More particularly, the invention relates to a system that applies structured quality criteria to the outputs of active reasoning processes, detects quality deficiencies, and triggers escalation, additional evidence retrieval, or reasoning restart as appropriate.
Background
[0004] Reasoning systems that produce structured outputs without self-evaluation may generate low-quality conclusions without flagging them to downstream consumers. Quality deficiencies, including insufficient evidence support, internal logical inconsistency, over-reliance on uncertain capabilities, and inadequate dissent resolution, must be detected and addressed before outputs are admitted to the evidence ledger.
Summary of the Invention
[0005] The invention provides a Meta-Reasoning and Quality Evaluation Engine (MRQEE) that evaluates reasoning session outputs against a multidimensional quality rubric comprising: evidence support score (ratio of claims supported by admitted evidence); internal consistency score (absence of self-contradicting claims); dissent resolution completeness (whether all active dissent records have been resolved); capability confidence coverage (whether all applied tools met minimum capability confidence thresholds); and goal alignment score (whether the output advances approved goal states).
[0006] Sessions whose quality score falls below the policy threshold are escalated to a second reasoning cycle with additional evidence retrieval and expanded agent ensemble. Session outputs that pass quality evaluation are admitted to the AIEP evidence ledger with their quality scores embedded in the evidence artefact metadata.
ASCII Architecture
Reasoning Session Output
|
v
+------------------------------------------+
| Meta-Reasoning / Quality Evaluation |
| |
| Evidence Support Score |
| Consistency Score |
| Dissent Resolution Completeness |
| Capability Confidence Coverage |
| Goal Alignment Score |
| |
| Composite Quality Score = f(above) |
+------------------+-----------------------+
|
PASS FAIL
| |
v v
Evidence Ledger Second Reasoning Cycle
Admission (expanded evidence +
(with scores) extended ensemble)
|
PASS FAIL (x2)
| |
v v
Ledger Admission Escalation
(P215 / P259)
Definitions
[0007] Meta-Reasoning and Quality Evaluation Engine (MRQEE): The subsystem that evaluates completed reasoning sessions against a multidimensional quality rubric and gates evidence ledger admission to sessions meeting the quality threshold.
[0008] Evidence Support Score: The ratio of reasoning session claims that are traceable to at least one admitted evidence artefact, expressed as a fraction in the range [0.0, 1.0].
[0009] Internal Consistency Score: A score in the range [0.0, 1.0] reflecting the absence of self-contradicting claims within the reasoning session output, evaluated by applying the AIEP divergence detection protocol (P37).
[0010] Dissent Resolution Completeness: A binary indicator (1.0 = all dissent records resolved, 0.0 = outstanding unresolved dissent) derived from the Multi-Agent Reasoning Dissent Engine (P209) session summary.
[0011] Composite Quality Score: A weighted aggregate of the five component scores, computed using weights defined in the active governance policy, where the minimum passing threshold is also policy-defined.
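The weighted aggregation described in [0011] can be sketched as follows. This is a minimal illustration only: the weight values and the 0.75 passing threshold are assumed for the example, since the specification leaves both policy-defined.

```python
# Illustrative sketch of the composite quality score ([0011]).
# Weights and threshold are assumed values, not taken from the specification;
# both are defined by the active governance policy.

POLICY_WEIGHTS = {
    "evidence_support": 0.30,
    "internal_consistency": 0.25,
    "dissent_resolution": 0.15,
    "capability_coverage": 0.15,
    "goal_alignment": 0.15,
}
POLICY_THRESHOLD = 0.75  # assumed minimum passing composite score

def composite_quality_score(scores: dict) -> float:
    """Weighted aggregate of the five component scores, each in [0.0, 1.0]."""
    return sum(POLICY_WEIGHTS[k] * scores[k] for k in POLICY_WEIGHTS)

def passes(scores: dict) -> bool:
    """True when the composite score meets the policy threshold."""
    return composite_quality_score(scores) >= POLICY_THRESHOLD
```

Because the weights sum to 1.0, a session scoring 1.0 on every component scores 1.0 overall, and any single weak component pulls the composite down in proportion to its policy weight.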
Detailed Description
Session Quality Assessment Protocol. [0012] On completion of a reasoning session, the MRQEE receives the full session output package comprising: the reasoning trace, all claims proposed by participating agents, dissent records from the Dissent Engine (P209), capability confidence records from the Self-Model Engine (P212), and the active GoalVector from the Goal Formation Engine (P210). The MRQEE computes five component scores independently before combining them into the composite quality score.
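The session output package received by the MRQEE in [0012] can be modelled as a simple record. The field names below are illustrative assumptions mirroring the listed components, not identifiers taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class SessionOutputPackage:
    """Input to the MRQEE on session completion ([0012]).
    Field names are assumed for illustration."""
    reasoning_trace: list   # full reasoning trace for the session
    claims: list            # all claims proposed by participating agents
    dissent_records: list   # dissent records from the Dissent Engine (P209)
    capability_records: dict  # confidence records from the Self-Model Engine (P212)
    goal_vector: object     # active GoalVector from the Goal Formation Engine (P210)
```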
Evidence Support Evaluation. [0013] The MRQEE iterates over all claims in the reasoning session output and, for each claim, queries the AIEP evidence ledger for at least one admitted evidence artefact whose assertion set supports the claim. Claims with at least one supporting artefact are marked SUPPORTED; claims without support are marked UNSUPPORTED. The evidence support score is computed as: (count of SUPPORTED claims) / (total claims in session). Claims marked UNSUPPORTED are included in the quality evaluation report as individually identified deficiencies.
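The evidence support computation in [0013] reduces to a supported/unsupported partition over the claim set. The sketch below assumes a simplified ledger shape (a list of artefacts each carrying an assertion set); the real ledger query interface is not specified here.

```python
def evidence_support_score(claims, ledger):
    """Mark each claim SUPPORTED or UNSUPPORTED and return the ratio ([0013]).
    `ledger` is an assumed simplified shape: a list of artefacts, each a dict
    with an "assertions" set. Returns (score, unsupported_claims)."""
    unsupported = []
    for claim in claims:
        if not any(claim in artefact["assertions"] for artefact in ledger):
            unsupported.append(claim)
    total = len(claims)
    score = (total - len(unsupported)) / total if total else 1.0
    # UNSUPPORTED claims are returned so the quality evaluation report can
    # list each one as an individually identified deficiency.
    return score, unsupported
```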
Consistency Evaluation. [0014] The MRQEE submits the full claim set to the divergence detection protocol (P37) to identify pairs of contradictory claims. Each detected contradiction reduces the internal consistency score by a policy-defined decrement. The consistency score begins at 1.0 and is decremented for each identified contradiction pair. Where the same evidence artefact supports contradictory claims, a consistency contradiction artefact is admitted to the evidence ledger as a separate record, flagging the contradiction for governance review.
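The decrement rule in [0014] can be sketched directly; the 0.1 per-contradiction decrement below is an assumed illustrative default, since the specification leaves the decrement policy-defined.

```python
def internal_consistency_score(contradiction_pairs, decrement=0.1):
    """Start at 1.0 and subtract a policy decrement per contradiction pair
    detected by the divergence protocol (P37), floored at 0.0 ([0014]).
    The 0.1 default is an assumed value; the spec leaves it policy-defined."""
    return max(0.0, 1.0 - decrement * len(contradiction_pairs))
```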
Dissent and Capability Checks. [0015] Dissent resolution completeness is read directly from the session dissent summary produced by P209: if no unresolved dissent records exist at session close, the score is 1.0; otherwise 0.0. Capability confidence coverage evaluates whether every tool invoked during the session had a confidence score at or above the governance policy minimum at invocation time; the score is computed as the proportion of tool invocations meeting the minimum threshold.
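The two checks in [0015] can be sketched as follows; record shapes are assumed for illustration (a `resolved` flag per dissent record, a bare confidence value per tool invocation).

```python
def dissent_resolution_completeness(dissent_records):
    """Binary score from the P209 session summary ([0015]): 1.0 iff no
    unresolved dissent records remain at session close."""
    return 1.0 if all(r["resolved"] for r in dissent_records) else 0.0

def capability_confidence_coverage(invocation_confidences, policy_minimum):
    """Proportion of tool invocations whose confidence met the governance
    policy minimum at invocation time ([0015])."""
    if not invocation_confidences:
        return 1.0  # no tools invoked: coverage is vacuously complete
    ok = sum(1 for conf in invocation_confidences if conf >= policy_minimum)
    return ok / len(invocation_confidences)
```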
Goal Alignment Evaluation. [0016] Goal alignment is assessed by comparing the session’s output claims and proposed actions to the active GoalVector. The alignment score is computed as the proportion of session outputs that advance or maintain at least one active terminal or instrumental goal. Outputs that neither advance nor conflict with any active goal are treated as neutral and do not reduce the score.
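The neutral-output rule in [0016] means neutral outputs are excluded from the denominator rather than counted against the session. The sketch assumes each output has already been labelled upstream as advancing, neutral, or conflicting with the active GoalVector.

```python
def goal_alignment_score(output_labels):
    """Proportion of non-neutral outputs that advance or maintain at least
    one active goal ([0016]). Labels ('advances', 'neutral', 'conflicts')
    are an assumed upstream classification against the GoalVector."""
    scored = [label for label in output_labels if label != "neutral"]
    if not scored:
        return 1.0  # only neutral outputs: nothing reduces the score
    return sum(1 for label in scored if label == "advances") / len(scored)
```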
Second Cycle and Escalation. [0017] Sessions failing the composite quality threshold trigger a second reasoning cycle with an expanded evidence retrieval window and an extended agent ensemble. If the second cycle also fails, the session is escalated to the Safety Constraint Engine (P215) and, if required, the break-glass override mechanism (P259). Sessions admitted after a second cycle carry a second-cycle annotation in their ledger admission record.
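The two-cycle gate in [0017] is a small control flow: one retry with expanded evidence and ensemble, then escalation. The function names and result shape below are illustrative assumptions.

```python
def gate_session(session, evaluate, run_second_cycle):
    """Two-cycle quality gate described in [0017]. `evaluate` and
    `run_second_cycle` are assumed callables: the first applies the composite
    quality threshold, the second re-runs the session with an expanded
    evidence window and extended agent ensemble."""
    if evaluate(session):
        return {"outcome": "ADMITTED", "second_cycle": False}
    retried = run_second_cycle(session)
    if evaluate(retried):
        # Admission record carries the second-cycle annotation.
        return {"outcome": "ADMITTED", "second_cycle": True}
    # Two failures: escalate to the Safety Constraint Engine (P215) and,
    # if required, the break-glass override mechanism (P259).
    return {"outcome": "ESCALATED", "second_cycle": True}
```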
Technical Effect
[0018] The invention provides a multidimensional, evidence-grounded quality gate for AI reasoning sessions. By requiring quantified evidence support, internal consistency, dissent resolution, capability adequacy, and goal alignment before ledger admission, the engine ensures that only verifiably sound reasoning products enter the AIEP evidence record. By triggering a structured second reasoning cycle rather than discarding failed sessions, the engine recovers from single-cycle quality failures without silently admitting low-quality outputs.
Claims
1. A computer-implemented method for meta-reasoning quality evaluation, the method comprising: (a) receiving a completed reasoning session output comprising claims, dissent records, capability confidence records, and an active GoalVector; (b) computing an evidence support score by evaluating each session claim against admitted evidence artefacts and recording the proportion of claims with at least one supporting artefact; (c) computing an internal consistency score by applying a divergence detection protocol to the full claim set and decrementing from 1.0 for each identified contradiction pair; (d) computing a dissent resolution completeness score and a capability confidence coverage score from session component records; (e) computing a goal alignment score reflecting the proportion of session outputs advancing or maintaining active goals; and (f) combining component scores into a composite quality score, admitting passing sessions to the evidence ledger with scores embedded in artefact metadata, and triggering a second reasoning cycle for sessions falling below the quality threshold.
2. The method of claim 1, wherein claims identified as UNSUPPORTED in the evidence support evaluation are individually listed in the quality evaluation report as named deficiencies.
3. The method of claim 1, wherein contradiction pairs identified in the consistency evaluation where both sides are supported by admitted evidence generate a consistency contradiction artefact admitted to the evidence ledger for governance review.
4. The method of claim 1, wherein sessions failing the second reasoning cycle are escalated to the Safety Constraint Engine and the break-glass override mechanism with the full quality evaluation report attached.
5. The method of claim 1, wherein the component score weights and the composite quality threshold are defined in the active governance policy, enabling policy-controlled tuning of quality standards.
6. A Meta-Reasoning and Quality Evaluation Engine comprising: one or more processors; memory storing a quality rubric, session evaluation report buffer, and ledger admission interface; wherein the processors are configured to execute the method of claim 1.
7. A non-transitory computer-readable medium storing instructions that, when executed by a processor, implement the method of claim 1.
Abstract
A meta-reasoning and quality evaluation engine for evidence-bound artificial intelligence computes five component quality scores — evidence support, internal consistency, dissent resolution completeness, capability confidence coverage, and goal alignment — and combines them into a composite quality score. Sessions meeting the policy-defined threshold are admitted to the AIEP evidence ledger with quality metadata embedded in the artefact record. Sessions failing the threshold trigger a second reasoning cycle with expanded evidence and ensemble; persistent failures are escalated to the safety enforcement layer.