OS PUB Apache 2.0

P204 — AIEP — Hypothesis Simulation Engine

Applicant: Neil Grassby
Classification: Patent Application (Confidential)
Priority: Claims priority from GB2519711.2 filed 20 November 2025
Architecture Layer: AIEP AGI Cognition Layer, Phase 2
Dependencies: P200–P203

Framework Context

[0001] This specification operates within an AIEP environment as defined in GB2519711.2 and GB2519798.9. The present specification defines the hypothesis generation and simulation capability of the Phase-2 AIEP cognition layer.

Field of the Invention

[0002] The present invention relates to artificial intelligence hypothesis generation systems and evidence-bound simulation architectures.

[0003] More particularly, the invention relates to a system for formulating, evaluating, and comparing scientific-method-style hypotheses about real-world states and causal relationships, using an evidence-bound world state model as the simulation substrate.

Background

[0004] AI reasoning systems that generate causal explanations or predictions must support hypothesis formation and evaluation. Without a formal hypothesis lifecycle — generation, simulation, evaluation, acceptance or rejection — systems produce conjecture that is not verifiably linked to evidence and cannot be replicated.

[0005] Hypothesis simulation requires: a starting world state from which to simulate; a hypothesis specification defining the proposed causal mechanism; a simulation engine to propagate the hypothesis forward or backward through the world state; and an evaluation mechanism that compares simulated outcomes to actual evidence.

Summary of the Invention

[0006] The invention provides a Hypothesis Simulation Engine (HSE) that accepts a hypothesis specification — a proposed causal relationship, predictive claim, or explanatory model — and evaluates it against the current world state using counterfactual simulation.

[0007] The HSE generates simulation runs by initialising the Counterfactual Timeline Engine (P203) from a specified world state snapshot, applying the hypothesis as a set of injected events or rule modifications, and comparing the simulated world state against actual evidence.

[0008] A hypothesis receives a simulation fitness score based on: the fraction of simulated state changes that match actual evidence-grounded state transitions; the causal coherence of the simulated chain; and the absence of contradictions with admitted evidence artefacts.

ASCII Architecture

Hypothesis Specification
(proposed causal mechanism)
         |
         v
+-----------------------------------+
| Hypothesis Simulation Engine      |
|            (HSE)                  |
+----------------+------------------+
                 |
         +-------+-------+
         |               |
         v               v
  Simulation Run A   Simulation Run B
  (with hypothesis)  (null hypothesis)
         |               |
         v               v
+-----------------------------------+
|    Comparative Outcome Evaluator  |
+----------------+------------------+
                 |
                 v
    Hypothesis Fitness Score
    + Evidence Alignment Report

Definitions

[0009] HypothesisSpec: A structured specification of a proposed causal mechanism, predictive claim, or explanatory model. Each HypothesisSpec contains: a hypothesis identifier; a natural-language description; a formal causal rule expression; and the starting world state snapshot from which simulation proceeds.
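As a minimal illustration, the four HypothesisSpec fields listed above can be held in an immutable record. The Python field names and the example values are assumptions made for this sketch. The specification does not prescribe a concrete schema or a syntax for the formal causal rule expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypothesisSpec:
    """Hypothetical container for the HypothesisSpec fields defined above."""
    hypothesis_id: str           # hypothesis identifier
    description: str             # natural-language description
    causal_rule_expression: str  # formal causal rule expression (syntax assumed: free text)
    snapshot_id: str             # starting world state snapshot for the simulation


# Illustrative values only.
spec = HypothesisSpec(
    hypothesis_id="H-001",
    description="Closing valve A causes the pressure in vessel B to rise",
    causal_rule_expression="closed(valve_A) -> rises(pressure_B)",
    snapshot_id="snapshot-0001",
)
```

Because the dataclass is frozen, a specification cannot be edited after it has been submitted for simulation. Each SimulationRun therefore stays traceable to one fixed input.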

[0010] SimulationRun: A single execution of a counterfactual simulation (using the CTE, P203) from a specified starting world state, applying the causal rules defined in or modified by the HypothesisSpec.

[0011] NullHypothesisRun: A SimulationRun applying only the baseline causal rule set without hypothesis-specific modifications, providing a control baseline for comparison.

[0012] HypothesisFitnessScore: A numerical score in the range 0.0–1.0 quantifying the degree to which a SimulationRun’s outcome matches the subsequent evidence-grounded world state. A score of 1.0 indicates perfect alignment; a score of 0.0 indicates complete contradiction.

[0013] EvidenceAlignmentReport: A structured comparison of the SimulationRun’s final world state against the actual evidence-grounded world state at the equivalent temporal position, itemising: matched entity states, divergent entity states, causal relationships supported by simulation, and causal relationships contradicted by simulation.
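The sketch below shows one way to populate the four sections of an EvidenceAlignmentReport. It rests on three assumptions, none of which the specification fixes. World states are flat mappings from entity identifiers to state labels. Causal relationships are (cause, effect) pairs. The evidence base supplies explicit sets of confirmed and refuted relationships.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Link = tuple[str, str]  # (cause, effect); representation assumed


@dataclass
class EvidenceAlignmentReport:
    matched: dict[str, str] = field(default_factory=dict)
    # entity -> (simulated state, actual state or None when no evidence exists)
    divergent: dict[str, tuple[str, str | None]] = field(default_factory=dict)
    supported: set[Link] = field(default_factory=set)
    contradicted: set[Link] = field(default_factory=set)


def align(simulated: dict[str, str], actual: dict[str, str],
          simulated_links: set[Link], confirmed_links: set[Link],
          refuted_links: set[Link]) -> EvidenceAlignmentReport:
    """Compare a run's final state with the evidence-grounded state at the same time."""
    report = EvidenceAlignmentReport()
    for entity, sim_state in simulated.items():
        actual_state = actual.get(entity)
        if sim_state == actual_state:
            report.matched[entity] = sim_state
        else:
            report.divergent[entity] = (sim_state, actual_state)
    # Relationships exercised by the simulation, split by what the evidence says.
    report.supported = simulated_links & confirmed_links
    report.contradicted = simulated_links & refuted_links
    return report
```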

Detailed Description

[0014] Hypothesis Lifecycle. The HSE manages each hypothesis through a four-stage lifecycle:

(i) generation, in which the reasoning layer or the Goal Formation Engine (P210) proposes a hypothesis;
(ii) simulation, in which the HSE executes one or more SimulationRuns to evaluate the hypothesis;
(iii) evaluation, in which the HSE produces the EvidenceAlignmentReport and assigns the HypothesisFitnessScore; and
(iv) acceptance, suspension, or rejection, based on fitness score thresholds.

Accepted hypotheses are promoted to the Causal World State Graph (CWSG) as candidate causal rules. Rejected hypotheses are archived with their evidence alignment reports.
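The lifecycle can be read as a small state machine. In the sketch below, the stage names follow the lifecycle stages listed above. The advance helper is hypothetical. One transition is an assumption: a suspended hypothesis returns to simulation when new evidence arrives. This follows from the "suspended pending additional evidence" wording of [0017].

```python
from enum import Enum, auto


class Stage(Enum):
    GENERATED = auto()
    SIMULATED = auto()
    EVALUATED = auto()
    ACCEPTED = auto()
    SUSPENDED = auto()
    REJECTED = auto()


# Allowed transitions. SUSPENDED -> SIMULATED (re-simulation on new evidence) is assumed.
TRANSITIONS = {
    Stage.GENERATED: {Stage.SIMULATED},
    Stage.SIMULATED: {Stage.EVALUATED},
    Stage.EVALUATED: {Stage.ACCEPTED, Stage.SUSPENDED, Stage.REJECTED},
    Stage.SUSPENDED: {Stage.SIMULATED},
    Stage.ACCEPTED: set(),  # terminal: promoted to the CWSG as a candidate causal rule
    Stage.REJECTED: set(),  # terminal: archived with its evidence alignment report
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a hypothesis to the next stage, refusing transitions the lifecycle does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```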

[0015] Simulation Execution. To evaluate a hypothesis, the HSE: selects a starting world state snapshot from the CWSG (P200); initialises the Counterfactual Timeline Engine (P203) from that snapshot; injects the hypothesis-defined causal rule modifications; propagates forward to the designated evaluation time horizon; and retrieves the counterfactual world state at that time horizon.
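The five execution steps can be sketched with a toy, discrete-time stand-in for the CTE. Here a world state is a mapping from entity names to integers. A causal rule is a function that returns the updates it produces for one step. The real CTE interface (P203) is not described in this document, so run_simulation and its parameters are hypothetical.

```python
from typing import Callable

State = dict[str, int]
Rule = Callable[[State], State]  # maps the current state to the updates for one step


def run_simulation(snapshots: dict[str, State], snapshot_id: str,
                   baseline_rules: list[Rule], injected_rules: list[Rule],
                   horizon: int) -> State:
    # Select the starting snapshot; copy it so the stored snapshot is never mutated.
    state = dict(snapshots[snapshot_id])
    # Initialise the timeline with the baseline rules plus the hypothesis modifications.
    rules = baseline_rules + injected_rules
    # Propagate forward one step at a time to the evaluation horizon.
    for _ in range(horizon):
        updates: State = {}
        for rule in rules:
            updates.update(rule(state))  # every rule reads the same pre-step state
        state = {**state, **updates}
    # Retrieve the counterfactual state at the horizon.
    return state


def decay(s: State) -> State:    # baseline rule: load falls by one per step
    return {"load": max(s["load"] - 1, 0)}


def heating(s: State) -> State:  # hypothesised mechanism: load drives temperature
    return {"temp": s["temp"] + s["load"]}


snapshots = {"s0": {"temp": 20, "load": 5}}
```

With these rules, `run_simulation(snapshots, "s0", [decay], [heating], 3)` ends at `{"temp": 32, "load": 2}`. The same call with no injected rules leaves `temp` at 20. Every update within a step is computed from a common pre-step state, so rule order does not matter. The exception is two rules writing the same entity, in which case the later rule wins.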

[0016] Null Hypothesis Comparison. For each SimulationRun, the HSE simultaneously executes a NullHypothesisRun using the baseline causal rule set without hypothesis modifications. The HypothesisFitnessScore is computed relative to both the null baseline and the actual evidence-grounded state, allowing isolation of the explanatory contribution of the specific hypothesis.
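[0016] does not state how the null baseline enters the score. One plausible reading, sketched below as an assumption, is to score both runs against the same evidence-grounded state. The difference between the two scores is then reported as the hypothesis's explanatory lift. The names match_fraction and explanatory_lift are illustrative.

```python
def match_fraction(simulated: dict[str, int], actual: dict[str, int]) -> float:
    """Fraction of evidence-grounded entities whose simulated state matches."""
    if not actual:
        return 0.0
    matched = sum(1 for entity, value in actual.items() if simulated.get(entity) == value)
    return matched / len(actual)


def explanatory_lift(hypothesis_state: dict[str, int], null_state: dict[str, int],
                     actual: dict[str, int]) -> tuple[float, float, float]:
    """Score both runs against the same evidence and return (hypothesis, null, lift)."""
    hypothesis_score = match_fraction(hypothesis_state, actual)
    null_score = match_fraction(null_state, actual)
    return hypothesis_score, null_score, hypothesis_score - null_score
```

Consider evidence `{"temp": 32, "load": 2}`, a hypothesis run ending at `{"temp": 32, "load": 2}`, and a null run ending at `{"temp": 20, "load": 2}`. The function returns `(1.0, 0.5, 0.5)`, meaning the baseline rules alone explain half of the matched evidence. A lift near zero would indicate that the hypothesis adds little beyond the baseline.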

[0017] Fitness Score Computation. The HypothesisFitnessScore is computed as: score = (matched_entity_states / total_entity_states) × (1 - (contradicted_causal_rules / total_causal_rules)). Hypotheses producing a score above the acceptance_threshold configuration value are promoted as candidate causal rules. Hypotheses producing a score below the rejection_threshold are rejected. Hypotheses between the thresholds are suspended pending additional evidence.
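The score formula and the three-way classification translate directly into code. The sketch assumes non-zero totals. The text above only specifies the strictly-above and strictly-below cases, so a score exactly equal to either threshold is treated here as suspended. The threshold values in the example are hypothetical.

```python
def fitness_score(matched_entity_states: int, total_entity_states: int,
                  contradicted_causal_rules: int, total_causal_rules: int) -> float:
    """score = (matched / total entities) x (1 - contradicted / total causal rules)."""
    if total_entity_states <= 0 or total_causal_rules <= 0:
        raise ValueError("totals must be positive")
    entity_match = matched_entity_states / total_entity_states
    contradiction_rate = contradicted_causal_rules / total_causal_rules
    return entity_match * (1 - contradiction_rate)


def classify(score: float, acceptance_threshold: float, rejection_threshold: float) -> str:
    if score > acceptance_threshold:
        return "accepted"   # promoted as a candidate causal rule (after governance approval)
    if score < rejection_threshold:
        return "rejected"   # archived with its evidence alignment report
    return "suspended"      # between the thresholds: awaits additional evidence
```

Suppose 8 of 10 entity states match and 1 of 4 causal rules is contradicted. The score is 0.8 × 0.75 ≈ 0.6. Under hypothetical thresholds of 0.8 (acceptance) and 0.4 (rejection), the hypothesis is suspended.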

[0018] Multi-Hypothesis Evaluation. The HSE supports concurrent evaluation of multiple hypotheses against the same world state. When two or more competing hypotheses are under evaluation, the HSE produces a comparative ranking by HypothesisFitnessScore. Where competing hypotheses achieve comparable fitness scores (within a configurable tolerance), they are retained as coexisting candidate explanations.
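A comparative ranking that keeps near-equal hypotheses together can be sketched by grouping scores into tiers. The tolerance parameter corresponds to the configurable tolerance recited in the claims. Each hypothesis is measured against the highest score in its tier rather than against its immediate neighbour. This is a design assumption that stops a chain of small gaps from merging distant hypotheses into one tier.

```python
def rank_hypotheses(scores: dict[str, float], tolerance: float) -> list[list[str]]:
    """Rank by descending score; hypotheses within `tolerance` of a tier's leader share that tier."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    tiers: list[list[str]] = []
    leader_score = 0.0
    for hypothesis_id, score in ordered:
        if tiers and leader_score - score <= tolerance:
            tiers[-1].append(hypothesis_id)  # comparable: a coexisting candidate explanation
        else:
            tiers.append([hypothesis_id])    # start a new tier led by this hypothesis
            leader_score = score
    return tiers
```

For example, `rank_hypotheses({"H1": 0.82, "H2": 0.80, "H3": 0.55}, 0.05)` returns `[["H1", "H2"], ["H3"]]`. H1 and H2 are retained side by side, and H3 is ranked below them.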

[0019] Governance Gate Integration. All hypothesis acceptance decisions — promotion to the CWSG, archival, or rejection — are evaluated through the Safety Constraint and Governance Enforcement Engine (P215) before being applied. Governance approval is required for any causal rule modification to the shared world model.
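The gate can be modelled as a precondition on every write to the shared world model. In this sketch, a caller-supplied approval function stands in for the Safety Constraint and Governance Enforcement Engine (P215). WorldModel and apply_decision are hypothetical names. The sketch illustrates only that an unapproved decision leaves the model unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable

# (decision, causal rule) -> approved?  Stand-in for the P215 governance check.
GovernanceCheck = Callable[[str, str], bool]


@dataclass
class WorldModel:
    causal_rules: set[str] = field(default_factory=set)


def apply_decision(model: WorldModel, decision: str, rule: str,
                   governance_approves: GovernanceCheck) -> bool:
    """Apply a lifecycle decision only after governance approval; return whether it was applied."""
    if not governance_approves(decision, rule):
        return False  # withheld: the shared world model is left untouched
    if decision == "accepted":
        model.causal_rules.add(rule)  # promotion of a candidate causal rule
    return True
```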

Technical Effect

[0020] The invention provides a structured, evidence-bound hypothesis lifecycle for AI reasoning systems, replacing ad hoc conjecture generation with a formal simulation, evaluation, and governance-controlled acceptance process. By anchoring simulation runs to verified world state snapshots and measuring fitness against actual evidence, the system ensures that only hypotheses consistent with admitted evidence are promoted to the shared world model. Null hypothesis comparison isolates the explanatory contribution of specific causal mechanisms.

Claims

1. A computer-implemented method for evaluating hypotheses within an evidence-bound reasoning architecture, the method comprising:

(a) receiving a hypothesis specification comprising a hypothesis identifier, a proposed causal mechanism, and a starting world state snapshot identifier;

(b) executing a simulation run by applying the proposed causal mechanism to the specified world state snapshot and propagating causal consequences to an evaluation time horizon;

(c) executing a null hypothesis run applying only the baseline causal rule set to the same world state snapshot;

(d) computing a hypothesis fitness score comparing the simulation run outcome to both the null hypothesis run outcome and the evidence-grounded world state at the evaluation time horizon; and

(e) classifying the hypothesis as accepted, suspended, or rejected based on configurable fitness score thresholds, subject to governance gate approval before any causal rule modifications are applied to the shared world model.

2. The method of claim 1, wherein the hypothesis fitness score is computed as a function of matched entity states divided by total entity states, adjusted by the fraction of causal rules contradicted by the simulation outcome.

3. The method of claim 1, wherein the method supports concurrent evaluation of multiple competing hypotheses against the same world state, producing a comparative ranking by fitness score.

4. The method of claim 3, wherein competing hypotheses achieving comparable fitness scores within a configurable tolerance are retained as coexisting candidate explanations.

5. The method of claim 1, wherein an evidence alignment report is produced for each simulated hypothesis, itemising matched entity states, divergent entity states, causal relationships supported by simulation, and causal relationships contradicted by simulation.

6. The method of claim 1, wherein accepted hypotheses are promoted to the causal world state graph as candidate causal rules, and rejected hypotheses are archived with their evidence alignment reports.

7. A hypothesis simulation engine comprising one or more processors, configured to execute the method of any of claims 1 to 6.

8. A non-transitory computer-readable medium storing instructions which, when executed by one or more processors, perform the method of any of claims 1 to 6.

Abstract

A computer-implemented hypothesis simulation engine is disclosed for evidence-bound AI reasoning architectures. Hypothesis specifications defining proposed causal mechanisms are evaluated by simulation from verified world state snapshots. Each simulation run is compared against a null hypothesis run and against the actual evidence-grounded world state, producing a hypothesis fitness score and evidence alignment report. Hypotheses are accepted, suspended, or rejected based on configurable fitness thresholds, with acceptance subject to governance gate approval before causal rule modifications are applied to the shared world model. Concurrent multi-hypothesis evaluation and comparative ranking are supported.


---

## Detailed Description

[0009] **Hypothesis Specification Format.** A hypothesis specification contains: `hypothesis_id` (SHA-256 of hypothesis content); `proposed_cause` (entity or event serving as the hypothesised cause); `predicted_effect` (entity state change or event predicted to follow); `causal_mechanism` (natural language or structured description of the mechanism); `time_horizon` (temporal range of the simulation); and `confidence_prior` (initial confidence level before simulation).
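Deriving `hypothesis_id` as a SHA-256 digest requires a deterministic serialisation of the content. The specification does not state which serialisation is used. The sketch below assumes canonical JSON (sorted keys, compact separators, UTF-8) over the remaining specification fields. It excludes `hypothesis_id` itself from the hashed content. The function name and example values are hypothetical.

```python
import hashlib
import json


def compute_hypothesis_id(content: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the specification content (encoding assumed)."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


spec_content = {
    "proposed_cause": "valve_A closed",
    "predicted_effect": "pressure_B rises",
    "causal_mechanism": "closing valve_A blocks the outflow from vessel B",
    "time_horizon": [0, 10],   # units assumed
    "confidence_prior": 0.3,
}
```

Keys are sorted before hashing, so two specifications with identical content receive the same identifier regardless of field order. Any change to the content, including the confidence prior, yields a new identifier.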

[0010] **Simulation Run Protocol.** For each hypothesis, the HSE requests two simulation runs from the CTE (P203): an experimental run in which the hypothesis's causal mechanism is active; and a null hypothesis run in which no intervention is applied. Both runs share the same branch origin and temporal horizon.
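Both runs must share one branch origin and one temporal horizon, and differ only in whether the mechanism is active. Constructing the two requests together captures this pairing constraint. `RunRequest` and `paired_requests` are hypothetical; the actual request format accepted by the CTE (P203) is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRequest:
    branch_origin: str            # snapshot both runs branch from
    horizon: tuple[int, int]      # temporal range of the simulation (units assumed)
    intervention: Optional[str]   # causal mechanism to activate; None for the null run


def paired_requests(branch_origin: str, horizon: tuple[int, int],
                    mechanism: str) -> tuple[RunRequest, RunRequest]:
    """Build the experimental and null-hypothesis requests from one set of parameters."""
    experimental = RunRequest(branch_origin, horizon, mechanism)
    null_run = RunRequest(branch_origin, horizon, None)
    return experimental, null_run
```

Building both requests from a single call prevents the two runs from drifting apart in origin or horizon. This is what allows their outcomes to be compared directly.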

[0011] **Fitness Evaluation.** After both runs complete, the HSE compares the experimental run's counterfactual state to: the actual evidence-grounded canonical state; and the null hypothesis run's simulated state. A fitness score is computed from the fraction of predicted entity state changes that are confirmed by actual evidence.
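A minimal reading of this fitness measure is sketched below. The predicted changes are the entities whose state at the end of the experimental run differs from the branch origin. The score is the share of those changes that the evidence-grounded state confirms. Two details are assumptions: the function returns 0.0 when the hypothesis predicts no change, and states are flat entity-to-value mappings.

```python
def confirmation_fitness(start: dict[str, int], experimental_end: dict[str, int],
                         actual_end: dict[str, int]) -> float:
    """Fraction of the run's predicted entity state changes confirmed by actual evidence."""
    predicted = {entity: value for entity, value in experimental_end.items()
                 if start.get(entity) != value}
    if not predicted:
        return 0.0  # no predicted changes: nothing to confirm (assumed convention)
    confirmed = sum(1 for entity, value in predicted.items() if actual_end.get(entity) == value)
    return confirmed / len(predicted)
```

The same function can be applied to the null hypothesis run, which gives a directly comparable score for the baseline.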

[0012] **Hypothesis Record.** Hypotheses are persisted to the Hypothesis Store with their simulation records and fitness scores. Accepted hypotheses (fitness score above threshold) are registered in the CWSG as confirmed causal relationships. Rejected hypotheses are retained in the store for audit and for informing future hypothesis generation.
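The persistence rule above can be sketched with an in-memory store. The threshold value, the record fields, and the set standing in for CWSG registration are assumptions. This paragraph describes a single acceptance threshold, so the sketch does not model the suspended state.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisRecord:
    hypothesis_id: str
    fitness: float
    simulation_records: list[str]
    status: str  # "accepted" or "rejected"


@dataclass
class HypothesisStore:
    threshold: float
    records: dict[str, HypothesisRecord] = field(default_factory=dict)
    registered_in_cwsg: set[str] = field(default_factory=set)  # stand-in for CWSG registration

    def persist(self, hypothesis_id: str, fitness: float,
                simulation_records: list[str]) -> HypothesisRecord:
        status = "accepted" if fitness > self.threshold else "rejected"
        record = HypothesisRecord(hypothesis_id, fitness, list(simulation_records), status)
        self.records[hypothesis_id] = record  # every hypothesis is kept for audit
        if status == "accepted":
            self.registered_in_cwsg.add(hypothesis_id)
        return record

    def rejected(self) -> list[HypothesisRecord]:
        """Rejected records remain available to inform future hypothesis generation."""
        return [r for r in self.records.values() if r.status == "rejected"]
```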

[0013] **Certification Integration.** Simulation outputs are eligible for certification by the Evidence-Bound Simulation Certification Engine (P233), which binds the simulation record to the world state snapshot hash, ensuring reproducibility of the fitness evaluation.
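The binding described above can be approximated by a content hash over the snapshot hash and the simulation record together. Recomputing the hash then detects any later change to either input. This is a sketch under that assumption, not the P233 certification procedure. `certify` and `verify` are hypothetical names. A plain digest supports reproducibility checking but does not prove who produced the record.

```python
import hashlib
import json


def certify(snapshot_hash: str, simulation_record: dict) -> str:
    """Digest binding a simulation record to the world state snapshot it ran from."""
    payload = json.dumps({"snapshot_hash": snapshot_hash, "record": simulation_record},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(certificate: str, snapshot_hash: str, simulation_record: dict) -> bool:
    """Recompute the digest; a mismatch means the record or snapshot binding has changed."""
    return certify(snapshot_hash, simulation_record) == certificate
```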

---

## Claims

1. A hypothesis simulation engine for an evidence-bound reasoning architecture, wherein hypotheses are evaluated by comparing counterfactual simulation outcomes to actual evidence.
2. The hypothesis simulation engine of claim 1, wherein an experimental simulation run and a null-hypothesis simulation run are generated for each hypothesis.
3. The hypothesis simulation engine of claim 1, wherein a fitness score quantifies the alignment of simulated outcomes with the evidence-grounded world state.
4. The hypothesis simulation engine of claim 1, wherein accepted hypotheses are registered in the world state graph as confirmed causal relationships.
5. The hypothesis simulation engine of claim 1, wherein simulation records are bound to world state snapshot hashes for reproducibility verification.
6. A method of evaluating a causal hypothesis against an evidence-bound world state model, comprising generating simulation runs, comparing outcomes to evidence, and computing a fitness score.