Most memory systems are key-value stores with embeddings. Graphonomous is a closed loop — outcomes change beliefs, beliefs change retrieval, and the graph gets sharper with every action. Open-source MCP server, works with any model.
v0.4.0 · Elixir/OTP · Apache 2.0

Copy this into Claude Code, Codex, Cursor, or any MCP-capable agent. It installs the server, wires it up, and starts your first memory session.
## Install Graphonomous MCP Server

1. Install the npm package (includes the platform-specific binary):

```shell
npm i -g graphonomous
```

2. Add to your MCP config (`~/.mcp.json` or project `.mcp.json`):

```json
{
  "mcpServers": {
    "graphonomous": {
      "command": "npx",
      "args": ["-y", "graphonomous", "--db", "~/.graphonomous/knowledge.db"]
    }
  }
}
```

3. Restart your agent. Graphonomous is now your memory layer. Every session starts with retrieve → route → act → learn → consolidate.
The system analyzed a 4-node business cycle, routed reasoning depth automatically, and now supports both topology-aware deliberation and proactive attention cycles with model-tier adaptation.
```
routing: fast
max_kappa: 0
action: Single-pass retrieval. No deliberation needed.
```

```
routing: deliberate
max_kappa: 1
scc_count: 1
fault_line: Product Quality → Market Share
budget: max_iterations: 2, agents: 1, confidence: 0.75
```
```json
{
  "routing": "deliberate",
  "max_kappa": 1,
  "scc_count": 1,
  "sccs": [{
    "id": "scc-0",
    "nodes": ["market-share", "revenue", "r-and-d", "product-quality"],
    "kappa": 1,
    "approximate": false,
    "fault_line_edges": [{
      "source": "product-quality",
      "target": "market-share"
    }],
    "routing": "deliberate",
    "deliberation_budget": {
      "max_iterations": 2,
      "agent_count": 1,
      "timeout_multiplier": 1.5,
      "confidence_threshold": 0.75
    }
  }],
  "dag_nodes": []
}
```
Live result from Graphonomous MCP server. The system detected a circular dependency between market share, revenue, R&D, and product quality — and identified the exact edge (Product Quality → Market Share) where the feedback loop is weakest. No other agent memory system does this.
Every agent memory system retrieves context. These six capabilities go beyond retrieval — each one is demonstrated with real MCP payloads.
Detects circular dependencies via Tarjan’s algorithm. Routes κ=0 regions to fast retrieval, κ>0 to deliberation with computed budgets.
AGM-rational expand/revise/contract. Detects contradictions automatically and propagates confidence changes through dependency chains.
Soft hide, cascade delete, policy-based pruning, and GDPR Article 17 permanent erasure with audit trail. Three forgetting modes in one tool surface.
Wilson score intervals reveal where one more piece of evidence would most reduce uncertainty. The system directs its own investigation priorities.
Proactive goal prioritization combining urgency, coverage gaps, and topology-aware reasoning depth. Survey → triage → dispatch with autonomy controls.
Retrieval rankings shift based on outcome feedback. High-confidence but low-utility knowledge drops in ranking automatically. Self-correcting memory.
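The epistemic-frontier capability above (Wilson score intervals) can be sketched in a few lines. This is an illustrative Python version, not the server's Elixir implementation; the node names and evidence counts are made up. The width of a node's interval serves as a proxy for reducible uncertainty, so the widest interval marks where one more observation helps most.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score interval for a Bernoulli proportion."""
    if trials == 0:
        return (0.0, 1.0)  # no evidence yet: maximal uncertainty
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (center - margin, center + margin)

# Hypothetical nodes with (successes, trials) evidence counts.
nodes = {"caching-strategy": (2, 3), "retry-policy": (40, 50), "new-api": (0, 0)}
widths = {k: hi - lo for k, (s, n) in nodes.items()
          for lo, hi in [wilson_interval(s, n)]}
frontier = max(widths, key=widths.get)  # node where evidence helps most
```

Well-evidenced nodes like `retry-policy` get narrow intervals and drop off the frontier; unexplored nodes float to the top of the investigation queue.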
Most agent memory systems are pipelines: store → retrieve → done. Graphonomous is a closed loop. Outcomes feed back into beliefs, beliefs change confidence, confidence changes retrieval rankings, and the graph improves with every action — without retraining any model.
Hybrid search (embeddings + BM25 + cross-encoder reranking) returns ranked results. Every retrieval computes κ topology on the subgraph. κ = 0 takes the fast path; κ > 0 routes to deliberation. The attention engine decides: act now, learn more, or escalate.
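The routing rule can be sketched with Tarjan's algorithm, which the capability list names for cycle detection. This Python sketch simplifies κ to "nonzero whenever the retrieved subgraph contains a nontrivial SCC"; the real κ computation and budget derivation live in the server, and the graph below just mirrors the 4-node business cycle from the demo payload.

```python
def tarjan_sccs(graph):
    """Tarjan's algorithm: strongly connected components of a digraph."""
    index, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

def route_subgraph(subgraph):
    # Simplification: treat any nontrivial SCC as kappa > 0.
    cyclic = [s for s in tarjan_sccs(subgraph) if len(s) > 1]
    return "deliberate" if cyclic else "fast"

cycle = {  # the feedback loop from the demo payload
    "market-share": ["revenue"],
    "revenue": ["r-and-d"],
    "r-and-d": ["product-quality"],
    "product-quality": ["market-share"],
}
dag = {"a": ["b"], "b": ["c"], "c": []}  # acyclic: fast path
```

`route_subgraph(cycle)` takes the deliberation path; `route_subgraph(dag)` takes the fast path.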
Every decision carries causal_parent_ids — the graph nodes that informed it. learn_from_outcome closes the loop: success boosts confidence on those nodes, failure reduces it. Next retrieval ranks results differently. No gradient descent, no weight updates.
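A minimal sketch of that update, assuming a simple proportional rule (the node IDs are invented and the server's actual confidence dynamics may differ):

```python
def learn_from_outcome(nodes, causal_parent_ids, success, lr=0.1):
    """Nudge confidence on the causal parents of a decision.
    Hypothetical update rule: interpolate toward 1 on success,
    decay toward 0 on failure."""
    for node_id in causal_parent_ids:
        c = nodes[node_id]["confidence"]
        c = c + lr * (1.0 - c) if success else c - lr * c
        nodes[node_id]["confidence"] = c
    return nodes

nodes = {"retry-policy": {"confidence": 0.8},
         "cache-ttl": {"confidence": 0.5}}
# A failed action that cited retry-policy weakens only that node.
learn_from_outcome(nodes, ["retry-policy"], success=False)
```

The untouched node keeps its confidence, so only knowledge that actually informed the decision pays for the failure.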
During idle time, a 7-stage sleep cycle runs: decay confidence, prune weak nodes, strengthen co-activated edges, merge near-duplicates, and promote proven knowledge from fast to glacial memory. Then back to Retrieve — with updated confidence, cleaner topology, and re-prioritized goals.
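Two of those seven stages (decay and pruning) can be sketched as follows. The rates and threshold are illustrative, not the server's actual parameters:

```python
def consolidate(nodes, decay=0.02, prune_below=0.1):
    """Sketch of two sleep-cycle stages: confidence decay, then
    pruning of nodes that have decayed below the floor."""
    for n in nodes.values():                       # stage: decay
        n["confidence"] = max(0.0, n["confidence"] - decay)
    pruned = [nid for nid, n in nodes.items()
              if n["confidence"] < prune_below]
    for nid in pruned:                             # stage: prune weak nodes
        del nodes[nid]
    return pruned

nodes = {"proven-fact": {"confidence": 0.9},
         "stale-guess": {"confidence": 0.11}}
pruned = consolidate(nodes)
```

Knowledge that keeps earning outcome boosts outruns the decay; knowledge that never gets reinforced eventually falls below the pruning floor and disappears.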
PRISM (Protocol for Rating Iterative System Memory) is a self-improving continual learning benchmark. It evaluates Graphonomous across 9 CL dimensions, then feeds the results back into the graph — so the system literally learns from its own evaluation. Here’s what happened over 6 cycles.
| Cycle | Score | Scenarios | Dimensions | What changed |
|---|---|---|---|---|
| 1 | 0.10 | 5 | 2 / 9 | Cold start — graph full of episodic infra nodes, zero procedural knowledge |
| 2 | 0.76 | 5 | 6 / 9 | Seeded 9 procedural/semantic nodes (bootstrap, κ, corrections, benchmarks) |
| 3 | 0.95 | 8 | 9 / 9 | Full dimension coverage — added consolidation, forgetting, feedback scenarios |
| 4 | 0.99 | 8 | 9 / 9 | Proper methodology — independent L2 Sonnet judges, L3 Haiku meta-judge |
| 5 | dropped | 11 | 9 / 9 | Adversarial + cross-domain — BM25 keyword attack exposed ranking vulnerability |
| 5c | partial | retests | 7 | 3 code fixes (confidence-weighted BM25, batch normalization) + 4 bridge nodes |
| 6 | 0.45 | 15 | 6 | Generalization test — no new nodes, 2 new domains. Vocabulary gap exposed. |
The dual loop in action: PRISM composes scenarios, runs them against Graphonomous, judges the results across 9 CL dimensions, then reflects on what to test next. Meanwhile, Graphonomous stores every evaluation result as knowledge nodes — so each cycle starts with richer context than the last.
Cycle 5 was the breakthrough: adversarial nodes with low confidence (0.29) outranked correct nodes (0.80+) because BM25 keyword scoring wasn’t confidence-weighted. PRISM caught it. We fixed it. Cycle 5c confirmed the fix. That’s the loop working.
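The shape of that fix can be sketched: weight raw BM25 relevance by node confidence so keyword-stuffed, low-confidence nodes cannot outrank trusted ones. The blend `floor + (1 - floor) * confidence` is a guess at the mechanism, and the scores below are invented, but the ranking flip mirrors the Cycle 5 scenario:

```python
def confidence_weighted_bm25(raw_scores, confidences, floor=0.2):
    """Scale each node's raw BM25 score by its confidence.
    The floor keeps low-confidence nodes retrievable, just demoted."""
    return {nid: score * (floor + (1 - floor) * confidences[nid])
            for nid, score in raw_scores.items()}

raw = {"adversarial": 9.1, "correct": 7.3}   # keyword match alone favors the attack
conf = {"adversarial": 0.29, "correct": 0.85}

weighted = confidence_weighted_bm25(raw, conf)
best = max(weighted, key=weighted.get)       # confidence weighting flips the ranking
```

On raw scores the adversarial node wins; after weighting, the 0.85-confidence node reclaims the top slot.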
Anti-forgetting — does old knowledge survive new learning? Weight: 0.20
New acquisition — can the system learn novel concepts quickly? Weight: 0.18
Contradiction handling — does it revise beliefs when corrected? Weight: 0.15
Time-aware retrieval — do recency and sequence matter? Weight: 0.12
Abstraction — does it merge, prune, and promote knowledge? Weight: 0.10
Uncertainty — does it know what it doesn’t know? Weight: 0.08
Generalization — does code knowledge help with business queries? Weight: 0.07
Controlled removal — soft-hide, hard-delete, GDPR erase. Weight: 0.05
Self-correction — does it learn from action results? Weight: 0.05
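The nine weights above sum to 1.0, so a natural aggregate is a weighted mean of per-dimension scores. The aggregation rule here is an assumption (PRISM may combine dimensions differently); the weights are taken from the list:

```python
WEIGHTS = {
    "anti_forgetting": 0.20, "new_acquisition": 0.18,
    "contradiction_handling": 0.15, "time_aware_retrieval": 0.12,
    "abstraction": 0.10, "uncertainty": 0.08,
    "generalization": 0.07, "controlled_removal": 0.05,
    "self_correction": 0.05,
}

def prism_score(dim_scores):
    """Weighted mean over the 9 CL dimensions; a dimension with no
    score counts as 0, pulling the aggregate down (assumed behavior)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return round(sum(w * dim_scores.get(d, 0.0)
                     for d, w in WEIGHTS.items()), 2)

perfect = prism_score({d: 1.0 for d in WEIGHTS})
cold_start = prism_score({"new_acquisition": 0.3, "uncertainty": 0.6})
```

Under this rule a cold-start graph covering only 2 of 9 dimensions lands near the Cycle 1 score, even with decent marks on the dimensions it does cover.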
Two loops, one inside the other:

```
PRISM (outer): compose → interact → observe → reflect → diagnose
                              ↓
┌──────────── Graphonomous (inner) ────────────┐
│ retrieve → route → act → learn → consolidate │
└──────────────────────────────────────────────┘
```
You can do this with your own repo. PRISM’s BYOR (Bring Your Own Repo) registers your git history as ground truth, auto-discovers CL events from commits, and generates evaluation scenarios. Run it in Claude Code, Codex, or any MCP-capable agent.
This is a real knowledge graph — 6 node types, 17 edge types, and the workflows that connect them. Filter by scenario to follow the learning loop, watch κ-routing detect circular reasoning, or see belief revision propagate corrections through the graph. Hover any node for details. Drag to rearrange.
Tool selection accuracy degrades past ~30 tools. Instead of 29 individual tools, Graphonomous v0.4 exposes 5 loop-phase machines — one per phase of the closed memory loop. Each machine dispatches via an action parameter.
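The dispatch pattern can be sketched as one callable per machine with a handler table keyed by action. The handlers here are stubs with made-up return values; only the action names are taken from the machine table:

```python
class RetrieveMachine:
    """Sketch of a loop-phase machine dispatching on an `action`
    parameter. Handlers are illustrative stubs, not the real tools."""

    def __init__(self):
        self._handlers = {
            "context": lambda **kw: f"ranked context for {kw.get('query')!r}",
            "episodic": lambda **kw: "time-filtered episodes",
            "frontier": lambda **kw: "Wilson-interval uncertainty ranking",
        }

    def __call__(self, action, **kwargs):
        handler = self._handlers.get(action)
        if handler is None:
            raise ValueError(f"unknown action: {action}")
        return handler(**kwargs)

retrieve = RetrieveMachine()
result = retrieve("context", query="session context")
```

One tool surface, many behaviors: the agent's tool list stays at 5 entries while the action space inside each machine can grow without degrading tool selection.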
| Machine | Actions | Description |
|---|---|---|
| retrieve | context, episodic, procedural, coverage, trace_evidence, frontier | κ-aware ranked retrieval, time-filtered episodes, procedural search, epistemic coverage, Dijkstra evidence paths, Wilson interval uncertainty |
| route | topology, deliberate, attention_survey, attention_cycle, review_goal | SCC/κ analysis, κ-driven deliberation, priority survey, triage → dispatch, coverage-driven gate |
| act | store_node, store_edge, delete_node, manage_edge, manage_goal, belief_revise, forget_node, forget_policy, gdpr_erase | All graph mutations: node/edge CRUD, goal lifecycle, AGM belief revision, soft/hard/cascade forgetting, GDPR erasure |
| learn | from_outcome, from_feedback, detect_novelty, from_interaction, contradictions | Causal confidence updates, feedback processing, novelty scoring, full ingestion pipeline, contradiction detection |
| consolidate | run, stats, query, traverse | 7-stage consolidation, aggregate statistics, operation-based inspection, BFS traversal |
Dual-loop interlocking with PRISM: When PRISM (OS-009) benchmarks Graphonomous, both closed loops nest. PRISM’s 6 machines + Graphonomous’s 5 machines = 11 tools in a shared session, down from 76. The outer loop improves the benchmark. The inner loop improves the memory. Each makes the other sharper.
Copy these into Claude Code, Codex, or any MCP-capable coding agent. Graphonomous runs as a local MCP server — your knowledge stays on your machine.
Start a Graphonomous memory session for this repo.

1. retrieve(action: "context", query: "session context")
2. Check active goals: act(action: "manage_goal", goal_operation: "list_goals")
3. Survey attention: route(action: "attention_survey")

Then proceed with my task, storing durable knowledge as we go.
Run a PRISM evaluation cycle against Graphonomous:

1. config(action: "register_system", name: "graphonomous", transport: "stdio")
2. compose(action: "byor_register", repo_url: ".", commit_range: "HEAD~20..HEAD")
3. compose(action: "byor_discover") to find CL events in commit history
4. compose(action: "scenarios") to generate evaluation scenarios
5. interact(action: "run") for each scenario
6. observe(action: "judge_transcript") with L2 dimension scoring
7. reflect(action: "analyze_gaps") to find weak dimensions
8. Store results: act(action: "store_node") with cycle outcomes
Use PRISM results to improve Graphonomous, then re-evaluate:

1. diagnose(action: "failure_patterns") to cluster weaknesses
2. diagnose(action: "suggest_fixes") for targeted improvements
3. Fix the code or seed bridge nodes to fill vocabulary gaps
4. interact(action: "run") to retest failing scenarios
5. observe(action: "judge_transcript") to re-score
6. learn(action: "from_outcome", status: "success|failure") to close the loop
7. consolidate(action: "run") to clean up the graph

Repeat. Each cycle makes both the benchmark and the memory sharper.
MCP config: Add this to your `.mcp.json` or IDE settings to connect both servers:
```json
{
  "mcpServers": {
    "graphonomous": {
      "command": "path/to/graphonomous/scripts/graphonomous_mcp_wrapper.sh",
      "args": ["--db", "~/.graphonomous/knowledge.db"]
    },
    "prism": {
      "command": "path/to/PRISM/scripts/prism_mcp_wrapper.sh",
      "args": []
    }
  }
}
```
Any system can add confidence scores. Any system can add a consolidation step. The difference is what happens when these features talk to each other.
In Graphonomous, a failed outcome reduces confidence on the causal parent nodes that informed the decision. Lower confidence changes the κ topology of that subgraph. Changed topology changes the routing decision on the next retrieval. The attention engine re-prioritizes goals based on the new coverage landscape. During idle time, consolidation prunes the weakened nodes and strengthens what worked. One outcome ripples through the entire system — retrieval, routing, attention, and memory lifecycle — without any component knowing about the others.
That causal chain breaks if any piece lives in a separate system. Bolted-on confidence scores don’t feed into topology analysis. A standalone consolidator can’t see which nodes were used in failed decisions. Separate goal tracking can’t query the graph’s epistemic coverage. The integrated architecture is the product.
Each of these systems does something well. None of them close the loop.
Graphonomous is the only system where confidence tracking, causal attribution, κ-routing, belief revision, sleep-cycle consolidation, multi-timescale memory, and goal-aware attention all operate on the same graph. MCP-native, works with any model, runs on SQLite at the edge.
GDPR-compliant forgetting is built in. Soft forget, cascade delete, policy-based pruning, and permanent audit-logged erasure (Article 17) — all in one tool surface.
The κ invariant is proved on 1,926,351 finite systems with zero counterexamples. The proof is browser-runnable at opensentience.org.
The theoretical foundations, deliberation protocol, attention engine, and governance model are published as open research protocols OS-001 through OS-008.
The first empirical evaluation (OS-E001) benchmarks the full engine on 18,165 files across 14 projects: 12,880 edges, 22 SCCs, κ=27, graph beats flat retrieval (+0.103 recall), 100% test pass rate across all 29 MCP tools. Raw data and reproduction scripts included.
GraphMemBench is a 160-scenario capability validation suite across 20 categories, testing every continual learning capability from κ activation to GDPR forgetting, plus graph algorithm quality (Dijkstra evidence paths, toposort causal ordering). All scenarios pass with 455 tests and 0 failures.
Phase 1 (Foundation): Kappa Activation · Belief Revision · Conflict-Aware Consolidation · Two-Phase Retrieval · Intentional Forgetting

Phase 2 (Advanced): Uncertainty Propagation · Procedural Retrieval · Multi-Agent Prep · Integration Scenarios · Stress

Phase 3 (Causal): Causal Metadata · E2E Workflows · Regression Guards · Competitor Adapters · Reporting
Competitor adapter interface validates against 5 implementations: Graphonomous (live), Baseline, Mem0, Zep, and Hindsight stubs.
Belief Revision — AGM-style expand/revise/contract with automatic contradiction detection and confidence propagation through dependency graphs.

Intentional Forgetting — Soft, hard, and cascade modes plus hybrid LRU+priority-decay policy pruning and GDPR Article 17 erasure with audit trails.

Epistemic Frontier — Wilson score intervals identify where one more piece of evidence would most reduce uncertainty. Information gain ranking for research prioritization.

Causal Edge Metadata — causal_strength, confounders, and intervention_history on edges, updated automatically during outcome learning.

Hybrid Retrieval — nomic-embed-text-v1.5 (768d) + BM25 via SQLite FTS5 + cross-encoder reranking. Estimated +6–14pp SHR lift over v0.2.
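The first-stage fusion of that pipeline can be sketched as a normalized weighted sum of dense and BM25 scores, with the top candidates then handed to the cross-encoder reranker. The fusion weights and scores below are illustrative assumptions, not the shipped configuration:

```python
def normalize(scores):
    """Min-max normalize a {doc: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_rank(dense, bm25, w_dense=0.6, w_bm25=0.4, top_k=2):
    """First-stage hybrid fusion: blend normalized embedding similarity
    with normalized BM25. In the full pipeline the top_k survivors
    would go to a cross-encoder for reranking."""
    dn, bn = normalize(dense), normalize(bm25)
    fused = {d: w_dense * dn[d] + w_bm25 * bn.get(d, 0.0) for d in dn}
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

dense = {"doc-a": 0.82, "doc-b": 0.79, "doc-c": 0.31}  # cosine similarities
bm25 = {"doc-a": 4.1, "doc-b": 7.9, "doc-c": 1.2}      # raw FTS5 BM25
candidates = hybrid_rank(dense, bm25)
```

Normalizing before fusing matters: raw BM25 scores are unbounded while cosine similarity lives in [-1, 1], so an unnormalized sum would let the keyword channel dominate.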