
P55 — AIEP — Feature Extraction and Similarity Computation Model

Status: Active, OS PUB (Apache 2.0)
Revision: Reinstated from abandoned status; confirmed implementation substrate for P95/P96
Related Specs: P95 (Cross-Session Cognitive Pattern Accumulation), P96 (Reasoning Style Fingerprint Substrate)
PIEA Build Relevance: Required implementation substrate for cognitive-pattern TypeScript package. Defines FeatureVector construction, weighted Euclidean distance similarity, and the SimilarityRecord structure that P95/P96 depend on.
Note: Specific feature vector weight distributions and similarity threshold calibration values remain internal (Phatfella Ltd trade configuration). The mechanism is fully disclosed here as prior art.


Publication Date: 2026-03-08
Licence: Apache License 2.0
Author / Organisation: Phatfella Ltd
Related Filings: GB2519711.2 (Core Protocol)
Related Specs (current): P95, P96


Field of the Invention

[0001] The invention relates to pattern recognition systems for AI substrates. More particularly, the invention relates to a deterministic feature extraction and similarity computation model enabling an AIEP substrate to identify characteristic reasoning patterns in its own outputs and compute similarity scores across sessions for continuity and fingerprinting.


Background

[0002] The continuity of an AI reasoning substrate across sessions is not merely a matter of state persistence. A substrate that has developed a characteristic way of approaching a problem domain — a reasoning style, an evidence preference, a constraint weighting pattern — should recognise its own prior outputs as its own, not because it was told to, but because those outputs are structurally similar in extractable feature space.

[0003] This capability is the computational basis for cognitive continuity. Without it, each session begins from zero. With it, the substrate can identify that a new reasoning challenge is structurally similar to a prior challenge it resolved, retrieve the relevant cognitive pattern, and apply it. This is not memory retrieval — it is pattern-based inference from structural similarity.

[0004] No existing AI system provides a principled, governed mechanism for computing structural similarity between reasoning outputs — as opposed to semantic similarity between text. Semantic similarity matches words and concepts. Structural similarity matches the governance decisions, evidence weights, and branch resolution patterns that characterise how a particular substrate thinks.


Summary of the Invention

[0005] The invention provides a deterministic feature extraction and similarity computation model in which reasoning outputs are decomposed into a structured FeatureVector — a schema-defined set of numeric features representing the governance decisions, evidence weights, and branch resolution patterns present in the output.

[0006] FeatureVectors are extracted deterministically from reasoning outputs. Two identical reasoning outputs produce identical FeatureVectors. The extraction is schema-versioned — a FeatureVector extracted under schema v1.0 is not comparable to one extracted under schema v1.1 without a governed transformation.
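The schema-version rule can be sketched as a guard that runs before any comparison. This is a minimal illustration, not the package's actual API: the `FeatureVector` shape, the `SchemaTransform` type, the `transforms` registry, and its `"from->to"` key layout are all assumptions introduced here.

```typescript
// Hypothetical shape: only the schema version and the numeric values are implied by the text.
interface FeatureVector {
  schemaVersion: string;
  values: number[];
}

// A governed transformation between two schema versions (hypothetical type).
type SchemaTransform = (v: FeatureVector) => FeatureVector;

// Registry of approved transformations, keyed "from->to" (assumed layout).
const transforms = new Map<string, SchemaTransform>();

// Returns a vector expressed under targetVersion, or throws if no governed
// transformation has been registered for the version pair.
function toComparable(v: FeatureVector, targetVersion: string): FeatureVector {
  if (v.schemaVersion === targetVersion) return v;
  const t = transforms.get(`${v.schemaVersion}->${targetVersion}`);
  if (!t) {
    throw new Error(
      `No governed transformation from ${v.schemaVersion} to ${targetVersion}`
    );
  }
  return t(v);
}
```

Rejecting the comparison outright when no transformation exists keeps the behaviour fail-closed, which seems consistent with the rest of the model, though the specification does not prescribe it.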

[0007] Similarity between FeatureVectors is computed as a deterministic similarity score in [0, 1]. The score is derived from a weighted Euclidean distance over the feature space, where feature weights are retrieved from a versioned registry. A similarity score of 1.0 indicates structural identity (zero distance). A similarity score at or above a schema-defined SimilarityThreshold constitutes a match: the substrate recognises the two reasoning outputs as structurally similar.

[0008] The similarity computation is used by P95 (cognitive pattern accumulation) and P96 (reasoning style fingerprint) to identify recurring patterns and establish continuity signatures.


Detailed Description

[0009] FeatureVector construction proceeds as follows. The reasoning output is analysed for the following feature categories:

(a) Governance features — the set of governance rules activated during production; the number of fail-closed rejections; the constitutional constraint proximity score at output production.

(b) Evidence weight features — the mean, variance, and distribution of evidence weights across the input evidence set; the ratio of high-confidence to low-confidence evidence artefacts.

(c) Branch resolution features — the number of active branches at resolution; the dominant branch advantage (lead over second branch at resolution); the presence of active dissent at resolution.

(d) Temporal features — evidence age distribution; the proportion of recently retrieved versus archive evidence.

(e) Jurisdiction features — the jurisdiction profile of the evidence set; the number of distinct jurisdictions represented.

[0010] Each feature category produces a normalised sub-vector. The full FeatureVector is the concatenation of sub-vectors. The FeatureVector length is fixed by the schema version — a constant-length representation enabling direct comparison.
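The concatenation step can be sketched as follows. The field names, the per-category lengths in `LAYOUT`, and the reading of "normalised" as "each component lies in [0, 1]" are assumptions made for illustration; the specification fixes only that the total length is constant per schema version.

```typescript
// Sub-vectors for feature categories (a) to (e); field names are illustrative.
interface SubVectors {
  governance: number[];
  evidenceWeight: number[];
  branchResolution: number[];
  temporal: number[];
  jurisdiction: number[];
}

// Fixed category order so that concatenation is deterministic.
const ORDER: (keyof SubVectors)[] = [
  "governance", "evidenceWeight", "branchResolution", "temporal", "jurisdiction",
];

// Hypothetical layout for one schema version: required length of each sub-vector.
const LAYOUT: Record<keyof SubVectors, number> = {
  governance: 3, evidenceWeight: 4, branchResolution: 3, temporal: 2, jurisdiction: 2,
};

// Concatenates the sub-vectors in ORDER, rejecting any input that would
// break the fixed-length guarantee.
function concatSubVectors(sub: SubVectors): number[] {
  const out: number[] = [];
  for (const key of ORDER) {
    const part = sub[key];
    if (part.length !== LAYOUT[key]) {
      throw new Error(`${key}: expected ${LAYOUT[key]} values, got ${part.length}`);
    }
    for (const x of part) {
      // Assumption: "normalised" means each component lies in [0, 1].
      if (!(x >= 0 && x <= 1)) {
        throw new Error(`${key}: component ${x} is outside [0, 1]`);
      }
      out.push(x);
    }
  }
  return out;
}
```

Checking lengths at construction time means two vectors under the same schema version can be compared dimension by dimension without further alignment.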

[0011] The weighted Euclidean distance between two FeatureVectors V₁ and V₂ is computed as:

distance(V₁, V₂) = sqrt(sum(w_i * (V₁_i - V₂_i)^2))

over all feature dimensions i, where w_i is the weight for feature dimension i retrieved from the versioned registry.

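[0012] The similarity score is 1 / (1 + distance(V₁, V₂)). For non-negative weights the distance is non-negative, so the score lies in (0, 1] and needs no further normalisation to fall within [0, 1]. A distance of 0 (identical vectors) produces similarity 1.0. Similarity decreases monotonically as distance increases.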
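A minimal sketch of the distance and similarity formulas above, operating on plain number arrays. The function names and the example threshold of 0.8 are illustrative assumptions; actual weights and thresholds come from the versioned registry and the schema.

```typescript
// Weighted Euclidean distance: sqrt(sum_i w_i * (v1_i - v2_i)^2).
function weightedEuclideanDistance(v1: number[], v2: number[], w: number[]): number {
  if (v1.length !== v2.length || v1.length !== w.length) {
    throw new Error("dimension mismatch between vectors and weights");
  }
  let sum = 0;
  for (let i = 0; i < v1.length; i++) {
    const d = v1[i] - v2[i];
    sum += w[i] * d * d;
  }
  return Math.sqrt(sum);
}

// Similarity score: 1 / (1 + distance). Equals 1 for identical vectors.
function similarityScore(distance: number): number {
  return 1 / (1 + distance);
}

// A match is a score at or above the threshold (the comparison is inclusive).
function isStructuralMatch(score: number, threshold: number): boolean {
  return score >= threshold;
}

// Worked example with unit weights: the vectors differ by (3, 4), so distance = 5.
const d = weightedEuclideanDistance([0, 0], [3, 4], [1, 1]); // 5
const s = similarityScore(d);                                // 1 / 6
const matched = isStructuralMatch(s, 0.8);                   // false (0.8 is a hypothetical threshold)
```

Because the score depends on unnormalised distance, the scale of the weights directly affects how quickly similarity falls off, which is one reason the weight version has to travel with every result.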

[0013] The SimilarityRecord carries: the two FeatureVector identifiers; the computed distance; the similarity score; the feature weight version applied; the schema version; and a SimilarityHash = H(fv1_id ‖ fv2_id ‖ similarity_score ‖ feature_weight_version ‖ schema_version).

[0014] The FeatureVector itself carries: the reasoning output identifier; the component sub-vector values; the schema version; and a FeatureVectorHash = H(output_id ‖ feature_values ‖ schema_version).
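The two hash bindings can be sketched with Node's built-in `crypto` module. SHA-256 and six-decimal float serialisation follow Figures 1 and 4; the `"|"` field delimiter, the comma-joined value list, and the hex encoding are assumptions made here, since the specification does not define the byte-level encoding of the concatenation.

```typescript
import { createHash } from "node:crypto";

// H realised as SHA-256 over the fields joined with "|" (delimiter is an assumption).
function H(parts: string[]): string {
  return createHash("sha256").update(parts.join("|"), "utf8").digest("hex");
}

// Floats are serialised to 6 decimal places so the hash input is canonical.
const fmt = (x: number): string => x.toFixed(6);

// FeatureVectorHash = H(output_id ‖ feature_values ‖ schema_version)
function featureVectorHash(outputId: string, values: number[], schemaVersion: string): string {
  return H([outputId, values.map(fmt).join(","), schemaVersion]);
}

// SimilarityHash = H(fv1_id ‖ fv2_id ‖ similarity_score ‖ feature_weight_version ‖ schema_version)
function similarityHash(
  fv1Id: string,
  fv2Id: string,
  score: number,
  featureWeightVersion: string,
  schemaVersion: string
): string {
  return H([fv1Id, fv2Id, fmt(score), featureWeightVersion, schemaVersion]);
}

// Usage: the vector hashes serve as the identifiers bound into the similarity hash.
const idA = featureVectorHash("out-A", [0.25, 0.5], "2.0.0");
const idB = featureVectorHash("out-B", [0.25, 0.75], "2.0.0");
const simHash = similarityHash(idA, idB, 0.8, "1.0.0", "2.0.0");
```

Fixing the float format before hashing matters because two nodes that print the same number differently would otherwise produce different hashes for the same record.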

[0015] Integration with P95 (Cognitive Pattern Accumulation): P95 uses the SimilarityRecord to identify when a new reasoning output is structurally similar to prior outputs. When similarity score ≥ SimilarityThreshold, the reasoning output is classified as an instance of the matching cognitive pattern. Pattern frequency, evidence weight distribution, and branch resolution characteristics accumulate across sessions into a CognitivePatternProfile.
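One way the classification step might look is sketched here. The `CognitivePattern` fields, the choice of the best-scoring pattern when several exceed the threshold, and the seeding of a new pattern on no match are assumptions; the specification states only that matching outputs are classified and accumulated. Only pattern frequency is tracked, and the evidence weight and branch resolution statistics are omitted for brevity.

```typescript
interface CognitivePattern {
  id: string;          // hypothetical identifier, e.g. "P1"
  exemplar: number[];  // feature values of the vector that seeded the pattern
  frequency: number;   // number of outputs classified under this pattern
}

interface CognitivePatternProfile {
  patterns: CognitivePattern[];
}

// Similarity as 1 / (1 + weighted Euclidean distance).
function similarity(a: number[], b: number[], w: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += w[i] * (a[i] - b[i]) * (a[i] - b[i]);
  return 1 / (1 + Math.sqrt(sum));
}

// Returns a new profile; the input profile is not mutated.
function accumulate(
  profile: CognitivePatternProfile,
  vNew: number[],
  weights: number[],
  threshold: number
): CognitivePatternProfile {
  let best = -1;
  let bestScore = -Infinity;
  for (let i = 0; i < profile.patterns.length; i++) {
    const s = similarity(vNew, profile.patterns[i].exemplar, weights);
    // Strict ">" keeps the earliest pattern on ties, so the result is deterministic.
    if (s >= threshold && s > bestScore) {
      best = i;
      bestScore = s;
    }
  }
  if (best === -1) {
    // No match: seed a new pattern from this vector (assumed behaviour).
    const seeded: CognitivePattern = {
      id: `P${profile.patterns.length + 1}`,
      exemplar: [...vNew],
      frequency: 1,
    };
    return { patterns: [...profile.patterns, seeded] };
  }
  return {
    patterns: profile.patterns.map((p, i) =>
      i === best ? { ...p, frequency: p.frequency + 1 } : p
    ),
  };
}
```

Returning a fresh profile rather than mutating in place makes each accumulation step easy to replay, which fits the determinism requirement.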

[0016] Integration with P96 (Reasoning Style Fingerprint): P96 derives a ReasoningStyleFingerprint from the statistical distribution of FeatureVector components across all outputs produced by a given substrate identity. The fingerprint is a compact representation of the substrate’s characteristic reasoning style — its typical governance activation profile, evidence weight preferences, and branch resolution tendencies. On substrate migration (P99), the fingerprint is used to verify continuity: the migrated substrate’s outputs should produce FeatureVectors with statistical distributions consistent with the prior fingerprint.
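As a rough illustration only, a fingerprint could summarise each feature dimension by its mean and variance, with continuity checked by comparing means. The specification does not state which statistics or which consistency test P96 uses, so the `ReasoningStyleFingerprint` shape, the three-standard-deviation tolerance, and the small `epsilon` floor below are all assumptions.

```typescript
// Hypothetical fingerprint: per-dimension mean and population variance.
interface ReasoningStyleFingerprint {
  mean: number[];
  variance: number[];
  count: number;
}

function deriveFingerprint(vectors: number[][]): ReasoningStyleFingerprint {
  if (vectors.length === 0) throw new Error("at least one FeatureVector is required");
  const dim = vectors[0].length;
  const n = vectors.length;
  const mean = new Array<number>(dim).fill(0);
  const variance = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("all vectors must share one schema length");
    for (let i = 0; i < dim; i++) mean[i] += v[i];
  }
  for (let i = 0; i < dim; i++) mean[i] /= n;
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) variance[i] += (v[i] - mean[i]) * (v[i] - mean[i]);
  }
  for (let i = 0; i < dim; i++) variance[i] /= n;
  return { mean, variance, count: n };
}

// Assumed consistency test: every post-migration mean lies within
// maxStdDevs standard deviations of the prior mean (plus a small floor).
function isConsistent(
  prior: ReasoningStyleFingerprint,
  candidate: ReasoningStyleFingerprint,
  maxStdDevs = 3,
  epsilon = 1e-9
): boolean {
  if (prior.mean.length !== candidate.mean.length) return false;
  return prior.mean.every((m, i) => {
    const tolerance = maxStdDevs * Math.sqrt(prior.variance[i]) + epsilon;
    return Math.abs(candidate.mean[i] - m) <= tolerance;
  });
}
```

A production check would likely use a proper statistical test over the full distributions; the mean comparison here is only meant to show where the fingerprint enters the migration check.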

[0017] Implementation note for cognitive-pattern TypeScript package: The package must implement the following functions:

(a) extractFeatureVector(reasoningOutput: ReasoningOutput): FeatureVector

(b) computeSimilarity(v1: FeatureVector, v2: FeatureVector, weightRegistry: FeatureWeightRegistry): SimilarityRecord

(c) accumulatePattern(record: SimilarityRecord, profile: CognitivePatternProfile): CognitivePatternProfile

All computations are deterministic. The weight registry version must be committed with every SimilarityRecord to ensure reproducibility.


Claims

  1. A deterministic feature extraction and similarity computation model for an AIEP substrate, wherein reasoning outputs are decomposed into schema-defined FeatureVectors, similarity between FeatureVectors is computed as a weighted distance metric over feature space, and similarity scores at or above a schema-defined threshold constitute structural matches.

  2. The model of claim 1 wherein FeatureVectors are extracted deterministically — identical reasoning outputs produce identical FeatureVectors under the same schema version.

  3. The model of claim 1 wherein feature weights are retrieved from a versioned registry and are immutable during computation.

  4. The model of claim 1 wherein similarity scores are computed as 1 / (1 + weighted_euclidean_distance), normalised to [0, 1].

  5. The model of claim 1 wherein FeatureVectors comprise governance features, evidence weight features, branch resolution features, temporal features, and jurisdiction features as normalised sub-vectors.

  6. The model of claim 1 wherein SimilarityRecords carry the two FeatureVector identifiers, computed distance, similarity score, feature weight version, and SimilarityHash.

  7. The model of claim 1 wherein, on integration with P95, SimilarityRecords with scores at or above SimilarityThreshold are classified as instances of the matching cognitive pattern and accumulated into a CognitivePatternProfile across sessions.

  8. The model of claim 1 wherein, on integration with P96, the statistical distribution of FeatureVector components across all substrate outputs constitutes a ReasoningStyleFingerprint used for substrate continuity verification on migration.

  9. A computing system implementing the model of claim 1.


Abstract

A deterministic feature extraction and similarity computation model is disclosed for AIEP reasoning substrates. Reasoning outputs are decomposed into schema-defined FeatureVectors comprising governance decision features, evidence weight features, branch resolution features, temporal features, and jurisdiction features as normalised sub-vectors. Similarity is computed from a weighted Euclidean distance as 1 / (1 + distance), which lies in [0, 1], with weights retrieved from a versioned registry. Similarity scores at or above a schema-defined SimilarityThreshold constitute structural matches. P95 uses SimilarityRecords to accumulate CognitivePatternProfiles across sessions. P96 derives a ReasoningStyleFingerprint from FeatureVector distributions for substrate continuity verification. All computations are deterministic and schema-versioned.


Drawings

Figure 1 — FeatureVector Construction Pipeline

   Reasoning Output
        |
        v
   +------------------------------+
   | Feature Category Extraction  |
   |                              |
   |  (a) Governance features     |  → sub-vector G  (normalised)
   |  (b) Evidence weight feats   |  → sub-vector E  (mean/var/dist)
   |  (c) Branch resolution feats |  → sub-vector B  (count/advantage)
   |  (d) Temporal features       |  → sub-vector T  (age distribution)
   |  (e) Jurisdiction features   |  → sub-vector J  (profile/count)
   +------------------------------+
        |
        v
   FeatureVector = concat(G, E, B, T, J)   ← fixed-length per schema version
        |
        v
   feature_vector_hash = sha256(output_id || feature_values || schema_version)

Figure 2 — Weighted Euclidean Similarity Computation

   FeatureVector V1 (output A)    FeatureVector V2 (output B)
          |                              |
          +----------------------------->+
                         |
                         v
   +-----------------------------------------------+
   | distance(V1, V2) = sqrt(                      |
   |   sum_i( w_i * (V1_i - V2_i)^2 )             |
   | )                                             |
   |                                               |
   | w_i from FeatureWeightRegistry (versioned)    |
   +-----------------------------------------------+
                         |
                         v
   similarity_score = 1 / (1 + distance)   ∈ [0, 1]

   similarity_score ≥ SimilarityThreshold  →  STRUCTURAL MATCH
   similarity_score <  SimilarityThreshold  →  no match

Figure 3 — Integration with P95 (Cognitive Pattern Accumulation)

   New reasoning output
        |
        v
   extractFeatureVector(output) → V_new
        |
        v
   for each stored V_prior in CognitivePatternProfile:
     computeSimilarity(V_new, V_prior, weightRegistry)
        |
        +-- score ≥ threshold -->  classify as instance of pattern P_k
        |                          accumulate: frequency++, weight dist update
        |
        +-- score <  threshold -->  no match; V_new may seed new pattern

   CognitivePatternProfile updated → persisted across sessions

Figure 4 — SimilarityRecord Structure and Hash Binding

   SimilarityRecord {
     fv1_id:                  feature_vector_hash of output A
     fv2_id:                  feature_vector_hash of output B
     distance:                (float, 6 decimal places)
     similarity_score:        (float, 6 decimal places)
     feature_weight_version:  "1.0.0"
     schema_version:          "2.0.0"
     similarity_hash:         sha256(
                                fv1_id || fv2_id
                                || similarity_score
                                || feature_weight_version
                                || schema_version
                              )
   }

   Determinism guarantee:
   identical V1, V2, weight_version, schema_version → identical similarity_hash
   across all distributed nodes.

P55 — AIEP — Feature Extraction and Similarity Computation Model Phatfella Ltd · piea.ai · OS PUB — Apache 2.0 Implementation substrate for P95 (cognitive pattern) and P96 (reasoning style fingerprint). Required for cognitive-pattern TypeScript package.