Eight deterministic synthetic tiers designed to isolate what κ-topology actually contributes to retrieval. T1–T6 test κ-sensitive cycle detection; T7–T8 benchmark graph algorithm quality (evidence paths, causal ordering). Each tier runs topology ON vs OFF side by side.
T1–T2 are acyclic controls. T3–T6 increase in κ complexity and retrieval difficulty. T7–T8 benchmark graph algorithms: weighted shortest paths (Dijkstra) and causal DAG ordering (toposort).
Acyclic control. Direct causal chains A→B→C→D with no loop-back. Topology should stay inert — no SCCs, no κ, no deliberate routing.
Acyclic control. Tree-shaped causal decomposition: root→{A,B}→{A1,A2,B1}. Tests that fan-out alone doesn't trigger false-positive κ routing.
Disjoint 3–5 node SCCs forming directed cycles A→B→C→A. Primary test for cycle-root paradox and membership-in-cycle retrieval.
Dense bidirectional ring + chord edges per SCC. Tests fault-line detection and high-κ routing. Configurable chord density scales κ upward.
Paired old/new facts per subject, half joined by
bidirectional contradicts (κ=1), half by
one-way supersedes (κ=0). Tests belief
revision under κ pressure.
SCCs planted across a density ladder [1..5] — retriever must distinguish κ values, not just detect that a cycle exists. New metrics: discrim_accuracy and κ-MAE.
Weighted DAGs with known shortest paths (Dijkstra). Tests whether retrieval preserves evidence chain ordering and hop counts. Alternate paths probe multi-path awareness.
Layered DAGs with known topological order and critical path depth. Tests causal sequencing (toposort), longest-path detection, and source/sink identification.
Every tier passes the ≥3pp κ-metric gate (or — for the κ=0
controls — correctly produces zero delta). Sanity mode,
deterministic from --seed 42.
| Tier | κ class | kappa_recall on | kappa_recall off | Δ | max κ on / off | p50 on / off | Gate |
|---|---|---|---|---|---|---|---|
| T1 | κ=0 linear | 0.00 | 0.00 | 0 | 0 / 0 | 30ms / 32ms | ✓ correct |
| T2 | κ=0 branching | 0.00 | 0.00 | 0 | 0 / 0 | 30ms / 29ms | ✓ correct |
| T3 | κ=1 cycle | 1.00 | 0.00 | +100pp | 1 / 0 | 10.2s / 32ms | ✓ pass |
| T4 | κ≥2 multi-SCC | 1.00 | 0.00 | +100pp | 2 / 0 | 43ms / 32ms | ✓ pass |
| T5 | κ=0–1 contradict | 1.00 | 0.00 | +100pp | 1 / 0 | 41ms / 39ms | ✓ pass |
| T6 | mixed κ | 1.00 | 0.00 | +100pp | 2 / 0 | 46ms / 38ms | ✓ pass |
These tiers are κ-independent — they benchmark whether the retriever preserves structural properties of the ingested graph. Topology on/off should not affect these metrics (and doesn't).
| Metric | Topology ON | Topology OFF | Δ |
|---|---|---|---|
| path_node_recall | 1.00 | 1.00 | 0 |
| path_order_accuracy | 0.60 | 0.60 | 0 |
| hop_count_mae | 4.00 | 4.00 | 0 |
| alt_path_detected | 1.00 | 1.00 | 0 |
| latency p50 / p95 | 32ms / 200ms | 35ms / 275ms | ~0 |
trace_evidence_path MCP tool, which uses
Dijkstra directly and should achieve MAE ≈ 0.
| Metric | Topology ON | Topology OFF | Δ |
|---|---|---|---|
| ordering_accuracy | 0.46 | 0.46 | 0 |
| critical_depth_mae | 2.67 | 2.67 | 0 |
| source_sink_recall | 1.00 | 1.00 | 0 |
| latency p50 / p95 | 34ms / 205ms | 33ms / 201ms | ~0 |
Graphonomous.Algorithms.DAG)
close: Kahn's toposort + longest-path DP should push ordering
accuracy toward 1.0 and depth MAE toward 0.
All results: sanity mode, --seed 42, fallback embedder.
T7/T8 are κ-independent — topology on/off produces identical algorithm metrics by design.
The point of a synthetic benchmark is to control the difficulty axis. These five CLI flags let you push Graphonomous along orthogonal pressure points and watch where performance degrades.
T4 chord-edge density. 1 = bidirectional ring
only, 5 = dense layered chords. Higher density
increases κ and forces fault-line detection to work harder.
Injects N acyclic chains of noise alongside the gold SCCs. Beyond some threshold the similarity stage drops gold clusters from its top-K, collapsing kappa_recall.
Shrink the retrieval budget that topology gets to work with.
At similarity-limit=3, hops=0, T4 kappa_recall
drops to 0.90 and faultline_mrr falls to 0.60.
Discrimination, not detection. SCCs planted across a density ladder force the retriever to resolve the exact κ value. Tracks kappa_discrim_accuracy and MAE.
Strips distinctive domain words and replaces them with
system-N tokens. Similarity retrieval can no
longer shortcut to the right cluster — topology has to
carry the weight.
All tiers are deterministic and reproduce bit-for-bit from the
same seed. Fixtures live in
priv/graphmembench/fixtures/T{1..8}/.
retriever.ex,
topology.ex, or the LongMemEval path. Running it cannot
affect production behavior. Results land in
benchmark_results/graphmembench_T{N}_topology_{on,off}.json
for offline comparison.