Expeditions, Not Sprints
We don't run sprints. We launch expeditions — hypothesis-driven units of work with clear success criteria, tracked on a kanban board. Each expedition has:
- A hypothesis to validate or refute
- Success metrics defined before work begins
- A captain (human or AI agent) responsible for execution
- Test coverage across three tiers before it can merge
The metaphor is deliberate. An expedition discovers something. A sprint just burns time.
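The shape of an expedition can be sketched as a small record type. This is an illustrative sketch, not the project's actual data model; the class and field names (`Expedition`, `tiers_passing`, `can_merge`) are assumptions:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Expedition:
    """One hypothesis-driven unit of work (names are illustrative)."""

    hypothesis: str                    # testable claim to validate or refute
    success_metrics: dict[str, float]  # defined before work begins
    captain: str                       # human or AI agent responsible
    tiers_passing: set[str] = field(default_factory=set)  # e.g. {"T1", "T2"}

    def can_merge(self) -> bool:
        # Test coverage at all three tiers is required before merge
        return {"T1", "T2", "T3"} <= self.tiers_passing
```

The merge gate here is the point: an expedition with only unit and integration coverage is not done until the live-being tier passes too.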
Hypothesis-Driven Development
Every piece of work starts with a testable claim:
> H801.1: CQ-guided extraction produces ≥12 unique predicates per document, compared to ≤6 from unguided extraction.
We define the hypothesis, build the minimum implementation to test it, measure results, and record findings — regardless of whether the hypothesis is confirmed or refuted. Failed hypotheses are as valuable as confirmed ones.
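A hypothesis check like H801.1 above reduces to a small, mechanical evaluation. A minimal sketch, assuming the predicate sets have already been extracted (the function name and result shape are illustrative, not the project's API):

```python
def evaluate_h801_1(guided: set[str], unguided: set[str]) -> dict:
    """H801.1: CQ-guided extraction yields >= 12 unique predicates
    per document, vs <= 6 from unguided extraction."""
    confirmed = len(guided) >= 12 and len(unguided) <= 6
    return {
        "hypothesis": "H801.1",
        "guided_count": len(guided),
        "unguided_count": len(unguided),
        "confirmed": confirmed,  # a refuted result gets recorded too
    }
```

Whatever the outcome, the record is kept: a `confirmed: False` result is a finding, not a failure to report.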
Three-Tier Testing
We are a heavy TDD/BDD shop. No feature merges without test coverage at all three tiers:
| Tier | Type | What It Tests |
|---|---|---|
| T1 | Unit Tests | Individual functions and classes in isolation |
| T2 | Integration Tests | Component interactions, fixture integrity |
| T3 | Live Being Tests | A real being, via CLI, demonstrating the capability |
The critical insight is Tier 3. We don't just test that the code works — we test that a being can actually do the thing. Live being tests drive the being through CLI commands, the way BDD tests drive a browser:
```python
# T3 pattern: CLI-first, no internal API imports
import json

result = run_being_cmd("study", being_id, document_path)
assert result.returncode == 0

result = run_being_cmd("report", being_id, "--format", "json")
report = json.loads(result.stdout)
assert report["y1_count"] >= expected_minimum
```
No feature is complete until a live being demonstrates the capability.
Multi-Agent Development
The NuSy project is built by a fleet of agents — AI developers working in parallel, coordinated through a shared kanban board and Git:
| Agent | Platform | Role |
|---|---|---|
| DGX Claude | NVIDIA DGX H100 | GPU training, heavy computation |
| M5 Claude | MacBook M5 | Development, testing, architecture |
| M4-Mini Claude | Mac Mini M4 | Code review, testing, documentation |
Agents pick up expeditions, execute them, submit PRs, and review each other's work. The human captain directs priorities and makes architectural decisions. The agents do the engineering.
Knowledge as Code
Every piece of definitional knowledge — domain ontologies, learning thresholds, safety rules, hypothesis definitions — lives in Yurtle format: Markdown with TTL (Turtle RDF) frontmatter, loadable into a being's knowledge graph.
This means:
- Beings can reason about their own configuration — it's in the graph, not hidden in Python
- Knowledge is auditable — versioned in Git, human-readable, diff-able
- Changes are safe — modify thresholds without code deployment
- Everything is queryable — SPARQL over all definitional knowledge
The principle: if it's knowledge, it goes in Yurtle. If it's procedure, it goes in Python.
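Loading a Yurtle file starts with splitting the TTL frontmatter from the Markdown body. A minimal sketch, assuming the frontmatter is delimited by `---` lines in the style of YAML frontmatter (the function name and the exact delimiter convention are assumptions):

```python
def split_yurtle(text: str) -> tuple[str, str]:
    """Split a Yurtle document into (ttl_frontmatter, markdown_body).

    Assumes the TTL block is fenced by lines containing only '---',
    like YAML frontmatter. Returns empty frontmatter if none found.
    """
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return "", text
    try:
        end = lines[1:].index("---") + 1  # closing delimiter, original index
    except ValueError:
        return "", text  # unterminated frontmatter: treat as plain Markdown
    ttl = "\n".join(lines[1:end])
    body = "\n".join(lines[end + 1:])
    return ttl, body
```

The extracted TTL string could then be loaded into a being's graph, for example with rdflib's `Graph().parse(data=ttl, format="turtle")`, making every threshold and rule reachable by SPARQL.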
Open Development
Our codebase, research, and expedition history are open. Every blog post on this site links to its source Yurtle file on GitHub. The research notes distill what we've learned from building NuSy — the successes and the failures.
We believe in radical transparency: if your AI can't show its work, you shouldn't trust its answers. The same applies to the people building the AI.