Every few months, a new paper announces a technique to "reduce hallucinations" in large language models. Retrieval-Augmented Generation. Chain-of-thought prompting. Constitutional AI. Self-consistency checking.

These are patches on an architecture that was never designed for factual reliability.

The Real Problem

LLMs hallucinate because they were built to predict the next token, not to reason from verified facts. When you ask a language model "What are the contraindications for metformin?", it doesn't consult a knowledge base. It generates text that looks like a correct answer based on patterns in its training data.

Sometimes it's right. Sometimes it's plausible-sounding garbage. And you can't tell the difference by looking at the output.

In casual conversation, this is annoying. In clinical decision support, it's dangerous.

The Design Alternative

Our approach starts from a different premise: if the AI doesn't have provenance-tracked knowledge about the topic, it should say so — not make something up.

This is implemented through three mechanisms:

1. Knowledge Grounding

Every claim a NuSy being makes must trace to a triple in its knowledge graph. The being can't assert "metformin is contraindicated in severe renal failure" unless it has:

<metformin> contraindicated_in <severe_renal_failure>
    source: "FDA prescribing information, 2024"
    confidence: 0.98
    learned_from: "corpus/guidelines/diabetes_management.md"

No triple? No assertion. The being says "I don't have validated knowledge about this" instead of confabulating.
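The grounding gate can be sketched in a few lines. This is a minimal illustration, not the actual NuSy implementation: the `Triple` dataclass, the `KnowledgeGraph` store, and the `assert_claim` function are all hypothetical names invented for this example, keyed on the same subject/predicate/object shape shown above.

```python
from dataclasses import dataclass

@dataclass
class Triple:
    # Provenance-tracked claim: who says so, and how sure we are.
    subject: str
    predicate: str
    obj: str
    source: str
    confidence: float

class KnowledgeGraph:
    def __init__(self):
        self._triples: dict[tuple[str, str, str], Triple] = {}

    def add(self, t: Triple) -> None:
        self._triples[(t.subject, t.predicate, t.obj)] = t

    def lookup(self, subject: str, predicate: str, obj: str):
        return self._triples.get((subject, predicate, obj))

def assert_claim(kg: KnowledgeGraph, subject: str, predicate: str, obj: str) -> str:
    """Only assert claims backed by a stored, provenance-tracked triple."""
    triple = kg.lookup(subject, predicate, obj)
    if triple is None:
        # No triple? No assertion.
        return "I don't have validated knowledge about this."
    return (f"{subject} {predicate} {obj} "
            f"(source: {triple.source}, confidence: {triple.confidence})")
```

The point of the sketch is the control flow: the generative layer never gets to phrase an answer unless the symbolic lookup succeeds first.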

2. Coverage Gating

Before answering a question, the reasoning engine checks Y6 (metacognitive) coverage for the relevant topic. If the being has only studied 30% of the diabetes management curriculum, it reports that — rather than pretending to expertise it doesn't have.

This is the opposite of how most LLMs work. ChatGPT will confidently answer questions about any topic. A NuSy being will honestly report its coverage gaps.
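The gating logic itself is simple once coverage is measured. The sketch below is a hedged approximation: the real Y6 metacognitive layer is not shown, and the `coverage_report` / `gated_answer` helpers and the 80% threshold are illustrative assumptions.

```python
def coverage_report(topics_studied: set[str], curriculum: set[str]) -> float:
    """Fraction of the curriculum the being has actually studied."""
    if not curriculum:
        return 0.0
    return len(topics_studied & curriculum) / len(curriculum)

def gated_answer(coverage: float, answer: str, threshold: float = 0.8) -> str:
    """Prepend an honest coverage disclosure when knowledge is incomplete."""
    if coverage < threshold:
        return (f"Note: I have studied only {coverage:.0%} of this "
                f"curriculum, so this answer may be incomplete. {answer}")
    return answer
```

A being that has covered 3 of 10 curriculum topics would report 30% coverage and lead every answer on that topic with the disclosure, instead of answering with unearned confidence.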

3. Contradiction Detection

When new claims enter the system — whether from crystallization, curriculum study, or conversation — they're checked against existing knowledge. If a new claim contradicts a validated triple, it's flagged for review rather than silently replacing the existing knowledge.

This prevents a particularly insidious failure mode: the LLM hallucinating a claim that overwrites correct knowledge in the graph.

The Healthcare Case

In clinical decision support, these mechanisms aren't nice-to-have — they're the difference between a certified medical device and a toy.

The regulatory landscape (FDA, CE marking) requires, at minimum: traceability of every output to a validated source, explicit disclosure of the system's scope and limitations, and controlled handling of new or conflicting evidence.

A hallucinating LLM fails all three. A neurosymbolic being with provenance tracking, coverage gating, and contradiction detection passes them structurally — not through post-hoc patches, but through architectural design.

What the Industry Gets Wrong

The current approach to hallucination reduction is fundamentally misguided. It treats hallucination as a failure mode to be minimized — like reducing a noise floor. But hallucination isn't noise. It's the default behavior of generative models operating without grounding.

The correct response isn't "make the LLM hallucinate less." It's "build an architecture where hallucination is structurally impossible for claims the system makes with confidence."

Ungrounded claims get deferred, not asserted. Coverage gaps get reported, not hidden. Contradictions get flagged, not ignored.

This is harder than fine-tuning away the worst hallucinations. But it's the only approach that leads to AI you can actually trust in high-stakes domains.


Previous: What Video Games Taught Us About AI Memory | Next: When Simulations Lie: What Live Testing Taught Us