Source Integrity

A citation URL tells you where a source claims to be. It does not tell you how that source was reached, whether it was served from a cache, routed through a proxy, restricted by law, or accessible only from a private network.

AIEP’s P124 inspectSourceIntegrity() function checks all of these before any source enters the evidence chain.

Why network path matters

The same source URL can deliver different content depending on:

Whether the request is routed through a corporate VPN or proxy (which may serve a cached, modified, or locally overridden version)
Whether the source has been geo-restricted since the last time it was cited
Whether TLS verification succeeds (confirming content has not been tampered with in transit)
Whether the source hostname resolves to a private network range (an internal system exposed by misconfiguration)
Whether the source has returned an HTTP 451 (Unavailable for Legal Reasons) response, indicating a jurisdiction-based restriction

An AI evidence system that does not inspect the network path of its sources cannot distinguish between a clean public record and a proxy-cached corporate intranet page. The citation looks the same. The evidence is not.

None of AIEP’s competitors — including Perplexity, Grok, Gemini, Harvey, or AlphaSense — interrogate the network path of fetched sources. They inherit whatever the HTTP response delivers.

The inspectSourceIntegrity() function (P124)

P124 defines six checks applied at fetch time:

1. TLS verification

The source must be served over a valid TLS connection with a verifiable certificate chain. A source that attempts TLS but fails verification is not accepted. A source served over plain HTTP (no TLS at all) receives a LOW confidence tier and a flag in the artefact record.
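
The TLS rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AIEP's implementation: verify_tls, classify_transport, and the flag names NO_TLS and TLS_FAILED are hypothetical, and VERIFIED is used here only as the starting tier for a clean HTTPS fetch.

```python
import socket
import ssl
from urllib.parse import urlsplit


def verify_tls(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with certificate validation succeeds."""
    # create_default_context() enables certificate-chain and hostname checks.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError and socket errors are OSError subclasses
        return False


def classify_transport(url: str, handshake_ok: bool) -> dict:
    """Map URL scheme and handshake result onto the outcomes described above."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "http":
        # Plain HTTP: accepted, but LOW tier plus a flag (flag name is hypothetical).
        return {"accepted": True, "confidence_tier": "LOW",
                "tls_valid": False, "integrity_flags": ["NO_TLS"]}
    if scheme == "https" and handshake_ok:
        return {"accepted": True, "confidence_tier": "VERIFIED",
                "tls_valid": True, "integrity_flags": []}
    # HTTPS with a failed handshake (or an unknown scheme) is not accepted.
    return {"accepted": False, "confidence_tier": None,
            "tls_valid": False, "integrity_flags": ["TLS_FAILED"]}
```

In use, the result of verify_tls(host) would be passed as handshake_ok; the two steps are kept separate so the classification can be exercised without a network connection.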

2. Via header analysis

HTTP Via headers indicate proxy traversal. A source delivered with one or more Via headers has been routed through at least one intermediate layer. This is flagged in the artefact record. The content may not represent the canonical source as served by the origin server.
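
As an illustration of the Via check, the sketch below collects proxy hops from a list of response headers. The function name via_hops and the header-list shape are assumptions made for this example; the parsing is deliberately simple and splits on commas, which would mis-split a Via comment that itself contains a comma.

```python
from typing import Iterable, List, Tuple


def via_hops(headers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return every hop listed in Via headers, in the order received."""
    hops: List[str] = []
    for name, value in headers:
        # Header names are case-insensitive; a response may carry several Via lines.
        if name.strip().lower() != "via":
            continue
        # Each Via value is a comma-separated list of "protocol received-by" entries.
        hops.extend(part.strip() for part in value.split(",") if part.strip())
    return hops


headers = [
    ("Content-Type", "text/html"),
    ("Via", "1.1 proxy-a.example, 1.0 cache-b"),
    ("via", "1.1 edge"),
]
hops = via_hops(headers)
via_proxy = bool(hops)  # would populate the via_proxy field of the artefact record
```

An empty hop list only shows that no intermediary announced itself; proxies are not obliged to add a Via header in every deployment.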

3. VPN / proxy IP detection

The outbound request IP is checked against known VPN provider ranges and anonymous proxy lists. A request originating from a tunnel endpoint is flagged. This prevents corporate network deployments from silently serving proxied content as if it were direct public-internet evidence.
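
A range lookup of this kind might look like the sketch below. The names TUNNEL_RANGES and is_tunnel_egress are hypothetical, and the two CIDR blocks are reserved documentation ranges used as placeholders; a real deployment would load a maintained list of VPN and proxy ranges instead.

```python
import ipaddress

# Placeholder ranges only (reserved documentation blocks), standing in for
# a maintained feed of VPN provider and anonymous proxy ranges.
TUNNEL_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]


def is_tunnel_egress(ip: str, ranges=TUNNEL_RANGES) -> bool:
    """Return True if the outbound request IP falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in ranges)
```

A linear scan is enough for a short list; a large feed would more likely be held in a prefix tree or interval structure.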

4. RFC1918 private range detection

Source URLs that resolve to RFC1918 addresses (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) are blocked. An internal IP address is not a public evidence source and must not enter a verifiable evidence chain as if it were.
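
The three ranges named above can be checked with Python's standard ipaddress module. The function names here are illustrative assumptions. The networks are listed explicitly because ipaddress's own is_private attribute covers more than RFC1918 (loopback and link-local, for example).

```python
import ipaddress
import socket

RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(ip: str) -> bool:
    """Return True if the address lies in one of the three RFC1918 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def host_resolves_private(host: str) -> bool:
    """Return True if any IPv4 address the host resolves to is RFC1918.

    Performs a DNS lookup; raises socket.gaierror if the host has no
    IPv4 address or cannot be resolved.
    """
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return any(is_rfc1918(info[4][0]) for info in infos)
```

Checking every resolved address, rather than only the first, avoids admitting a host that returns a mix of public and private records.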

5. HTTP 451 check

An HTTP 451 response indicates the source is unavailable for legal reasons in the current jurisdiction. A P124-compliant system records this as a negative integrity signal. The failure to retrieve legally restricted content is an evidence integrity event, not a retrieval error.
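
The distinction between an integrity event and an ordinary retrieval error could be expressed as in the sketch below. The function name, the flag strings, and the dictionary shape are assumptions for illustration only.

```python
from typing import Optional

HTTP_UNAVAILABLE_FOR_LEGAL_REASONS = 451


def integrity_event_for_status(status: int) -> Optional[dict]:
    """Classify an HTTP status code; return None for non-error responses."""
    if status == HTTP_UNAVAILABLE_FOR_LEGAL_REASONS:
        # Legal restriction: recorded as a negative integrity signal.
        return {"kind": "integrity_event", "flag": "HTTP_451"}
    if status >= 400:
        # Any other client or server error is a plain retrieval failure.
        return {"kind": "retrieval_error", "flag": f"HTTP_{status}"}
    return None
```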

6. Content hash verification

Where a prior retrieval exists for the same URL, the content hash of the current response is compared to the recorded hash. A mismatch indicates the source has changed since it was last cited. This is not treated as an error — sources change — but it is recorded as a freshness signal in the artefact record.
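
A minimal sketch of the hash comparison, assuming SHA-256 over the raw response body (the sample record elsewhere on this page shows a sha256 value, but the exact input to the hash is not specified). The function names and the three status strings are hypothetical.

```python
import hashlib
from typing import Optional


def content_hash(body: bytes) -> str:
    """Hex-encoded SHA-256 digest of the response body."""
    return hashlib.sha256(body).hexdigest()


def freshness_signal(body: bytes, previous_hash: Optional[str]) -> dict:
    """Compare the current body with the hash recorded at the prior retrieval."""
    current = content_hash(body)
    if previous_hash is None:
        status = "FIRST_RETRIEVAL"  # nothing to compare against
    elif previous_hash == current:
        status = "UNCHANGED"
    else:
        status = "CHANGED"  # a freshness signal, not an error
    return {"content_hash": current, "freshness": status}
```

Hashing raw bytes means that any change, including whitespace or markup changes, registers as CHANGED; a deployment that wants to ignore such differences would need to normalise the content first.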

How integrity results affect artefact records

Each fetched source produces an artefact record that includes:

{
  "artefact_id": "sha256(...)",
  "source_url": "https://example.com/article",
  "retrieved_at": "2026-03-17T14:22:00Z",
  "content_hash": "sha256(...)",
  "confidence_tier": "VERIFIED",
  "integrity_flags": [],
  "via_proxy": false,
  "tls_valid": true,
  "private_range": false
}

If any integrity check fails:

The confidence_tier is downgraded: VERIFIED → DEGRADED → FLAGGED, with BLOCKED as the tier at which the artefact is no longer usable
The specific flag is recorded in integrity_flags
The artefact is still usable — unless the tier drops to BLOCKED — but its weight in the evidence reasoning is reduced
The degradation reason is surfaced to the user via the evidence rail
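
The downgrade behaviour can be sketched as below. Several details are assumptions rather than part of P124 as described here: that each failed check moves the tier down one step, that FLAGGED is the floor for non-blocking failures, and that private-range and HTTP 451 results go straight to BLOCKED. The flag names are hypothetical.

```python
from typing import List

TIER_ORDER = ["VERIFIED", "DEGRADED", "FLAGGED", "BLOCKED"]
FLAGGED_INDEX = TIER_ORDER.index("FLAGGED")
BLOCKED_INDEX = TIER_ORDER.index("BLOCKED")

# Assumed blocking conditions, based on the RFC1918 and HTTP 451 descriptions.
BLOCKING_FLAGS = {"PRIVATE_RANGE", "HTTP_451"}


def apply_flags(record: dict, flags: List[str]) -> dict:
    """Return a copy of the artefact record with flags recorded and tier downgraded."""
    updated = dict(record)
    updated["integrity_flags"] = list(record.get("integrity_flags", [])) + list(flags)
    tier_index = TIER_ORDER.index(record["confidence_tier"])
    for flag in flags:
        if flag in BLOCKING_FLAGS:
            tier_index = BLOCKED_INDEX
        elif tier_index < FLAGGED_INDEX:
            tier_index += 1  # one step down per failed check (assumption)
    updated["confidence_tier"] = TIER_ORDER[tier_index]
    # The artefact stays usable unless the tier has dropped to BLOCKED.
    updated["usable"] = tier_index != BLOCKED_INDEX
    return updated
```

Returning a copy rather than mutating the input keeps the original record available for audit.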

The difference from content accuracy checks

Source integrity is distinct from source accuracy. A source can be:

Accurate and clean (passes all integrity checks — VERIFIED)
Accurate but proxy-routed (passes accuracy, fails network path — DEGRADED)
Inaccurate but TLS-clean (fails accuracy checks in reasoning, passes integrity)
Unavailable and restricted (HTTP 451 — BLOCKED, dissent signal triggered)

AIEP separates these two dimensions. The evidence admissibility gate (P03, P04) handles accuracy and plausibility. P124 handles provenance — the independent question of whether the source reached the system cleanly.

Both questions must be answered before an artefact is fully admitted to the evidence chain.

Architecture position

Source URL
     │
     ▼
[ inspectSourceIntegrity() ] ← P124
     │
     ├─ Proxy/VPN check
     ├─ TLS validation
     ├─ Via header analysis
     ├─ RFC1918 range check
     ├─ HTTP 451 check
     └─ Content hash comparison
     │
     ▼
artefact_record { confidence_tier, integrity_flags, content_hash }
     │
     ▼
[ Admissibility Gate ] ← P03, P04
     │
     ▼
Evidence chain (P37)
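
The flow in the diagram can be sketched as a list of independent checks run over one fetch result. Everything named here is hypothetical: the FetchResult fields, the check functions, the flag strings, and the Python spelling inspect_source_integrity. Only four of the six checks are shown, to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FetchResult:
    url: str
    status: int
    tls_valid: bool
    via_headers: List[str] = field(default_factory=list)
    resolved_private: bool = False


# Each check returns a flag name on failure, or None when it passes.
Check = Callable[[FetchResult], Optional[str]]


def check_tls(r: FetchResult) -> Optional[str]:
    return None if r.tls_valid else "TLS_INVALID"


def check_via(r: FetchResult) -> Optional[str]:
    return "VIA_PROXY" if r.via_headers else None


def check_private(r: FetchResult) -> Optional[str]:
    return "PRIVATE_RANGE" if r.resolved_private else None


def check_451(r: FetchResult) -> Optional[str]:
    return "HTTP_451" if r.status == 451 else None


PIPELINE: List[Check] = [check_tls, check_via, check_private, check_451]


def inspect_source_integrity(result: FetchResult,
                             checks: List[Check] = PIPELINE) -> dict:
    """Run every check and collect the flags into a partial artefact record."""
    flags = [flag for flag in (check(result) for check in checks) if flag is not None]
    return {
        "source_url": result.url,
        "integrity_flags": flags,
        "via_proxy": bool(result.via_headers),
        "tls_valid": result.tls_valid,
        "private_range": result.resolved_private,
    }
```

The record produced here would then be handed to the admissibility gate; keeping each check as a separate function means one can be added or replaced without touching the others.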

P124 — Source Integrity Inspection Protocol
P37 — Evidence commitment chain
P03 — Plausibility matrix (admissibility)
P126 — Negative proof hash (dissent when evidence chain is incomplete)