# Crystallization: Teaching AI to Remember

When an LLM answers a question, the answer evaporates. Next session, it's gone — no trace, no memory, no learning. The weights didn't change. The knowledge graph didn't grow. The system is exactly as ignorant as it was before.

We built a mechanism called crystallization that changes this. But not the way you'd expect.

## The Geological Analogy

Minerals dissolved in water are invisible and impermanent. Given the right conditions — temperature, pressure, time — they crystallize into solid structures that persist for millennia.

Source prose — textbooks, guidelines, research papers — contains knowledge dissolved in natural language. Our crystallization pipeline provides the conditions to extract it into permanent, structured form.

## The Key Insight: Don't Trust the LLM

The obvious approach would be to take the LLM's answer and extract knowledge from it. We tried that. It's dangerous.

If the LLM hallucinates — and it will — you've just permanently stored a hallucination in your knowledge graph. In medical domains, that's not a quirky bug. It's a liability.

So we separated two things that most systems conflate: answering and learning.

Answering is what the LLM does for the user right now. It synthesizes a response, the user gets their answer, and that's it. The LLM's text is never treated as a source of permanent knowledge.

Learning is what the crystallization pipeline does from source prose. Textbooks, clinical guidelines, curated documents that the being has in its library. The same pipeline runs whether a being is reading a book or filling a gap discovered during conversation.

## How It Works

A being has a library of source documents — its prose layer. When the being reads and studies these documents, crystallization extracts structured knowledge from them. The same process kicks in during conversation when a gap is detected.

Here's what happens when a user asks a question the being can't answer from its graph:

1. Gap Detection. The being queries its knowledge graph. No results. There's a gap.

2. Prose Lookup. A local LLM checks the being's source prose — its Y0 layer of chunked documents — to determine whether the answer exists there. It doesn't respond to the user. It simply confirms to the brain: "yes, we can likely answer this from the prose," and points to the relevant location.

3. Extraction. From there, the standard learning pipeline takes over. The same pattern-based extraction that runs when a being reads a book now runs against the identified prose passages, pulling out subject-predicate-object triples. These patterns are domain-specific — loaded from configuration files, not hardcoded — so a medical being extracts different relationships than an educational one.

4. Validation. Each extracted triple is checked: Does it contradict existing knowledge? Is the subject actually present in the source passage? Does it match the domain ontology? Triples that pass proceed. Triples that fail are flagged.

5. Commitment. Validated triples are committed to the knowledge graph with full provenance: which source document, which passage, when, and the confidence score.
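A committed triple and its provenance might look like the following record. This is a minimal sketch; every field name here is an illustrative assumption, not the actual system's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrystallizedTriple:
    # The extracted subject-predicate-object claim.
    subject: str
    predicate: str
    obj: str
    # Full provenance: where the claim came from and how confident we are.
    source_document: str
    passage_id: str
    extracted_at: str  # ISO 8601 timestamp
    confidence: float

triple = CrystallizedTriple(
    subject="pulmonary embolism",
    predicate="has_risk_factor",
    obj="recent surgery",
    source_document="clinical-guideline.md",
    passage_id="ch3-p12",
    extracted_at="2024-05-01T12:00:00Z",
    confidence=0.92,
)
```

Because the record is immutable and carries its source, any triple in the graph can be audited back to the passage that produced it.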

The critical point: the LLM's only role is a lookup — confirming whether the prose can answer the question and where. Everything after that is the same symbolic knowledge creation pipeline that runs when a being reads a book. No LLM output is ever treated as knowledge.
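The extraction, validation, and commitment steps can be sketched as a toy pipeline. Everything here is an illustrative assumption: the real patterns are domain-specific and loaded from configuration files, and the real graph is not a Python list.

```python
import re

# Illustrative pattern format: (predicate, regex with named groups).
# Real patterns are domain-specific, loaded from configuration.
PATTERNS = [("treats", r"(?P<subject>\w+) treats (?P<object>\w+)")]

def extract_triples(passage, patterns):
    """Pattern-based extraction of subject-predicate-object triples."""
    return [
        (m.group("subject"), predicate, m.group("object"))
        for predicate, regex in patterns
        for m in re.finditer(regex, passage)
    ]

def validate(triple, passage, graph):
    """Reject a triple whose subject is absent from the source passage,
    or that contradicts an existing triple (same s/p, different o)."""
    s, p, o = triple
    if s not in passage:
        return False
    return not any(gs == s and gp == p and go != o for gs, gp, go in graph)

def crystallize(passage, patterns, graph):
    """Extract, validate, and commit; rejected triples never enter the graph."""
    for triple in extract_triples(passage, patterns):
        if validate(triple, passage, graph):
            graph.append(triple)  # a real commit would record provenance too
    return graph

graph = crystallize("aspirin treats headache", PATTERNS, [])
# graph == [("aspirin", "treats", "headache")]
```

Note that no LLM appears in this sketch: by the time these steps run, the model's only contribution was pointing at the passage.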

## The Same Pipeline, Two Entry Points

This isn't a special mechanism that only runs during conversation. When a being studies a document — `being-cli santiago learn textbook.md` — the exact same extraction pipeline processes the text. Same patterns, same validation, same provenance tracking.

Crystallization during conversation is just the same learning process, triggered by a gap instead of a direct command. The being notices it doesn't know something, checks if its library has the answer, and if so, learns from the source material. If the library doesn't have it, no knowledge is created. The being simply doesn't learn what it can't verify.
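One way to picture the two entry points is a single shared pipeline with two callers. This is a hypothetical sketch, and `crystallize` here is a stub standing in for the real extraction, validation, and provenance steps:

```python
import re

def crystallize(text, graph):
    """Stub for the shared learning pipeline. The real system runs
    pattern extraction, validation, and provenance tracking here."""
    for m in re.finditer(r"(\w+) is a (\w+)", text):
        graph.append((m.group(1), "is_a", m.group(2)))
    return graph

def learn_command(path, graph):
    """Entry point 1: explicit study, e.g. `being-cli santiago learn <file>`."""
    with open(path) as f:
        return crystallize(f.read(), graph)

def on_gap(question, library, graph, locate_passages):
    """Entry point 2: a conversational gap. locate_passages is the
    LLM-backed lookup; it only points at source prose, and only that
    prose (never LLM output) feeds crystallize."""
    for passage in locate_passages(question, library):
        crystallize(passage, graph)
    return graph
```

Both callers converge on the same `crystallize`, so a triple learned mid-conversation carries the same validation as one learned from a direct study command. If `locate_passages` finds nothing, `on_gap` commits nothing.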

## The Rejection Pipeline

Not everything extracted becomes knowledge. This is the critical design choice.

Claims that contradict existing triples are rejected, not silently overwritten. The being preserves what it learned before. If new evidence is strong enough to override prior knowledge, that's a deliberate revision — logged, justified, and auditable.

Claims with low confidence are deferred. They exist as candidates, not as knowledge. A human reviewer or a subsequent learning session can promote them or discard them.
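The triage described above is a three-way decision per candidate triple. A minimal sketch, in which the confidence threshold and the contradiction test are illustrative assumptions:

```python
def triage(triple, confidence, graph, threshold=0.8):
    """Decide the fate of a candidate triple: reject, defer, or commit."""
    s, p, o = triple
    # Contradiction (same subject/predicate, different object):
    # reject rather than silently overwrite prior knowledge.
    if any(gs == s and gp == p and go != o for gs, gp, go in graph):
        return "rejected"
    # Low confidence: keep as a candidate for later review or promotion.
    if confidence < threshold:
        return "deferred"
    graph.append(triple)
    return "committed"

graph = [("warfarin", "interacts_with", "aspirin")]
triage(("warfarin", "interacts_with", "ibuprofen"), 0.95, graph)  # → "rejected"
triage(("heparin", "treats", "thrombosis"), 0.50, graph)          # → "deferred"
triage(("heparin", "treats", "thrombosis"), 0.90, graph)          # → "committed"
```

Only the "committed" branch mutates the graph; rejections and deferrals leave prior knowledge untouched, which is what makes revisions deliberate and auditable.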

The graph grows carefully, not recklessly.

## The Numbers

From our Paper 104 research:

In our largest training run: 16 books, 11,583 triples crystallized, 6,533 validated facts retained. That's a being that remembers what it studied — not perfectly, not completely, but verifiably. And critically: not a single triple came from an LLM's improvisation.
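For scale, those figures imply the validation stage retained a bit over half of what was extracted:

```python
extracted, retained = 11_583, 6_533
print(f"retention: {retained / extracted:.1%}")  # → retention: 56.4%
```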

## Why This Matters

In clinical decision support, the difference between "the LLM said this once" and "this is validated knowledge extracted from a clinical guideline" is the difference between a liability and a tool.

When a being recommends considering pulmonary embolism based on a patient's risk factors, you can trace that recommendation back to its source: the guideline document it came from, the exact passage, when it was extracted, and the confidence it carried.

That chain of provenance is what makes neurosymbolic AI suitable for domains where "trust me, I'm a large language model" isn't good enough. The LLM helps you think. The knowledge graph remembers what's actually true.


Previous: [Why We Built Neurosymbolic Beings](/why-neurosymbolic-beings) | Next: [What Video Games Taught Us About AI Memory](/video-games-ai-memory)