Built on the Savanna Engine · Apple M5 Max · Metal GPU
Norayr Matevosyan · 2026
DagDB. A six-bounded ranked directed acyclic graph database. Built from first principles on the Savanna engine. Running on Apple M five Max with Metal GPU compute. Every node connects to at most six edges. That single constraint makes everything possible.
1 / 24
The Problem
Traditional graph databases: unbounded degree
Cache misses, unpredictable memory, no parallelism guarantees
Can't fit on GPU. Can't deterministically replay.
What if we bounded every node to 6 edges?
Traditional graph databases allow unlimited edges per node. This sounds flexible, but it's a trap. Unbounded degree means unpredictable memory access, cache misses everywhere, and no parallelism guarantees. You can't fit that on a GPU. You can't replay it deterministically. What if we accepted a constraint: maximum six edges per node? That single rule changes everything.
2 / 24
The Rule of 6
6
Maximum directed edges per node
Domain
Why 6?
Physics
6 degrees of freedom describe any rigid body in 3D
Geometry
Hexagons pack space optimally (honeycombs, graphene)
Cognition
Miller's Law: 7 ± 2 items in working memory
Hardware
26 = 64 = one UInt64 machine word
Reach
610 = 60 million nodes in 10 hops
The Rule of Six. Every node connects to at most six directed edges. This isn't arbitrary. Six appears everywhere. Six degrees of freedom in physics. Hexagonal packing in geometry, the most efficient way to tile a surface. Miller's Law says working memory handles seven plus or minus two items. And critically: two to the sixth is sixty-four, one machine word. Every possible Boolean function of six inputs fits in a single sixty-four-bit integer. Six to the tenth is sixty million, enough reach for most real-world graphs in ten hops.
3 / 24
LUT6: The Core Primitive
Presets built in:
AND6 0x8000000000000000 all inputs must be true
OR6 0xFFFFFFFFFFFFFFFE any input triggers output
XOR6 (computed parity) odd number of true inputs
MAJ6 (computed majority) 4+ of 6 inputs true
IDENT pass-through ghost node / elevator
CONST 0x0 or 0xFFFF... fixed premises
LUT six. The core primitive. Six input bits form a six-bit index, zero through sixty-three. That index looks up one bit in a sixty-four-bit integer. One clock cycle. Zero branching. Zero warp divergence. The beauty: by choosing different sixty-four-bit values, each node independently becomes any Boolean gate. AND, OR, XOR, majority vote, identity pass-through, or any of the two to the sixty-four possible functions. This is how FPGAs work. We brought it to graph databases.
4 / 24
The Ranked DAG
Edges flow upward only: leaves to root. Topologically identical to a multilayer perceptron.
The ranked DAG. Every node lives at a rank. Leaf nodes at the bottom hold raw evidence, atomic facts. Intermediate ranks hold reasoning, arguments, aggregation. The root at rank zero holds the final decision. Edges flow strictly upward: leaves to root. This is topologically identical to a multilayer perceptron in neural networks. Input layer, hidden layers, output. The rank structure is the execution schedule.
5 / 24
Dual-Time Execution
Dual-time execution. Two clocks running at different scales. Micro-time is horizontal, within a single rank. Nodes debate locally, like neurons in a layer. Hex-coloring prevents race conditions. The cluster resonates until it reaches consensus. For monotone functions, Knaster-Tarski theorem guarantees convergence. For non-monotone, we cap at a hundred micro-ticks and output undefined, the paradox horizon. Macro-time is vertical: once a rank stabilizes, it locks and passes results strictly upward. One rank per tick. Perfectly acyclic. Perfectly deterministic.
6 / 24
Morton Z-Curve Memory Layout
Memory Address (64-bit):
[ 8 bits: Rank MSB ] | [ 56 bits: Morton Z-Curve LSB ]
Rank MSB: all nodes of same rank physically adjacent
Z-curve LSB: spatially connected nodes in same cache line
Bit-shift right = zoom out one octave (1 CPU instruction)
Morton Z-curve memory layout. Every sixty-four-bit address splits into eight rank bits on top and fifty-six Morton Z-curve bits below. The rank bits keep all nodes of the same rank physically adjacent on NVMe. The Z-curve bits ensure spatially connected nodes land in the same cache line. One bit-shift right zooms out one octave, a single CPU instruction. We already proved this works: two point one one times faster at one billion cells. The Z-curve is natively fractal, which means the hub node splitting tree maps directly to cache lines with zero extra work.
7 / 24
Octaves: Fractal Self-Similarity
Each note in a diatonic scale connects to 6 peers within its octave. When a cluster reaches consensus, it abstracts into a single macro-node one octave up. Renormalization in physics. Hierarchical modular networks in graph theory.
Octaves. Inspired by music theory. Each note in a diatonic scale connects to six peers within its octave. In DagDB, when a local cluster at rank N reaches consensus, it abstracts into a single macro-node at rank N minus one. This is renormalization in physics, hierarchical modular networks in graph theory. The octave is the rank boundary. Morton Z-curves make octaves native to hardware: same rank prefix means physically adjacent on disk.
8 / 24
The Engine: Savanna
14
GCUPS
128
GB Unified Memory
M5
Apple Silicon
Component
What It Does
ELLPACK sparse matrix
Fixed 6 columns, perfect for GPU
Morton Z-curve
Connected nodes adjacent in cache
SpMV core
Adjacency x state = one tick
7-coloring
Lock-free parallel update
Carlos Delta
50x compression, ACID persistence
The engine. Savanna. Already built and battle-tested. Fourteen giga cell updates per second on Apple M five Max with one hundred twenty-eight gigabytes of unified memory. ELLPACK sparse matrix format with fixed six columns. Morton Z-curve ordering so connected nodes sit in the same cache line. Sparse matrix-vector multiplication as the core operation. Seven-coloring for lock-free parallel updates. And Carlos Delta Transport for persistence with fifty times compression. We do not build a new engine. We re-label the concepts.
9 / 24
Mathematical Isomorphism
Do not build a new engine. Re-label the concepts.
Savanna Simulation
DagDB Logic
Cell (Zebra)
=
Node (Premise)
Neighbor (Lion)
=
Dependency (Counter-argument)
State (Alive/Dead)
=
Truth-state (Valid/Invalid)
6 spatial neighbors
=
6 logical edges
Evaluation tick
=
Inference step
Scent diffusion
=
Truth propagation
Morton Z-curve
=
Morton Z-curve
Carlos Delta
=
Carlos Delta
The mathematical isomorphism. This is why DagDB exists. A zebra cell in Savanna is a premise node in DagDB. A lion neighbor is a counter-argument dependency. Alive or dead state maps to valid or invalid truth state. Six spatial neighbors become six logical edges. An evaluation tick becomes an inference step. Scent diffusion becomes truth propagation. Even the infrastructure carries over unchanged: Morton Z-curve, Carlos Delta, the Metal shaders. Same engine. Different interpretation.
10 / 24
Hub Nodes: 6-ary Fractal Fan-In
Hub nodes. Real knowledge graphs have power-law distributions. A node like water or substation might have forty-six thousand edges. That violates the rule of six. The resolution: six-ary fractal fan-in. A hub with six to the sixth edges compiles into a six-deep micro-tree of virtual nodes. Each virtual node has exactly six edges, all within bounds. The Morton Z-curve is natively fractal. The virtual nodes share the same rank prefix and sequential Z-curve bits, so they land in the same L one cache line. The sparse matrix engine doesn't even know it's a hub.
11 / 24
Convergence: Knaster-Tarski + Paradox Horizon
Reality doesn't always cleanly resolve. Sometimes the argument is a paradox. Let the DAG natively highlight the paradox.
Convergence. How do we know micro-time terminates? Two tiers. For monotone Boolean functions like AND, OR, and majority, the Knaster-Tarski theorem mathematically guarantees a fixed point. They always converge. For non-monotone functions like XOR rings or NOT gates, oscillation is possible. Here we enforce a paradox horizon: a maximum of one hundred micro-ticks. If the cluster hasn't stabilized, it outputs undefined, the third truth state. This is elegant: reality doesn't always cleanly resolve. Sometimes the argument is genuinely paradoxical. Let the database say so.
12 / 24
Skip Connections: Ghost Nodes
Memory is cheap. Execution stalls are fatal.
Skip connections. Real arguments skip ranks. Evidence at rank ten might directly support a rank zero conclusion. But direct rank-skipping breaks the layer-parallel schedule. The resolution: identity ghost nodes. The compiler pads intermediate ranks with ghost nodes whose LUT is set to identity, pure pass-through. Truth state rides the elevator up through ranks, migrating physically closer in memory at each step. Memory is cheap. Execution stalls are fatal. The M five processes millions of identity gates in nanoseconds because ELLPACK stays dense and predictable.
13 / 24
Orthogonal Wavefronts
Orthogonal wavefronts. The Fisher K P P equation from reaction-diffusion has a direct analog in DagDB. Vertical: macro-time propagation speed is exactly one rank per tick. Deterministic, universal. Horizontal: micro-time speed depends on the sensitivity of local LUT gates. OR gates are permissive: one true input fires immediately, truth flashes at maximum speed. AND gates are skeptical: all six inputs must be true, propagation is slow and requires massive corroboration. The logic topology IS the information velocity. Choose your gates. Choose your speed.
14 / 24
Persistence: Carlos Delta Transport
50x
Compression
ACID
Guarantees
0
CPU Copies
Property
How
Atomicity
Each delta block = one complete tick
Consistency
6-degree bound prevents invalid graphs
Isolation
Hex coloring + rank = lock-free
Durability
Compressed deltas on NVMe, bidirectional
GPU to NVMe direct (GPUDirect Storage style)
~98% of nodes static per tick = massive delta compression
Backward replay: stream in reverse for time-travel queries
Carlos Delta Transport. Already built and shipped. Fifty times delta compression because about ninety-eight percent of nodes remain static each tick. GPU writes directly to NVMe, bypassing the CPU entirely. Full ACID: atomicity from one delta block per tick. Consistency from the six-degree bound. Isolation from hex coloring and rank scheduling, all lock-free. Durability from compressed deltas on NVMe. And it's bidirectional: stream forward to persist, stream backward to reconstruct any historical state. Time-travel queries with zero extra infrastructure.
15 / 24
SQL Interface: PostgreSQL
[Any SQL Client]
|
v
[Postgres Wire Protocol]
|
v
[Berkeley Cost-Based Optimizer]
(40 years of query planning)
|
v
[CustomScan / ExecutorRun_hook]
(intercept BEFORE Postgres CPU touches data)
|
v
[Apple Metal API]
(GPU executes SpMV in unified memory)
|
v
[Virtual Tuples returned]
S Q L interface. PostgreSQL as query planner. Any S Q L client connects via the standard Postgres wire protocol. The Berkeley cost-based optimizer, forty years of query planning expertise, generates the execution plan. Then our custom scan hook intercepts before Postgres's CPU touches any data. The hook translates relational operations into Metal G P U graph operations. Sparse matrix-vector multiplication executes entirely in unified memory. Virtual tuples return to Postgres. Standard S Q L in, G P U-accelerated graph traversal out.
16 / 24
SQL to GPU Graph Operations
Postgres Op
GPU Equivalent
SeqScan
Color-wave sweep with predicate filter
IndexScan
Direct array index O(1)
Filter
Per-node predicate, pure SIMD
Join
Follow edges = graph traversal (6-bounded = O(1))
Aggregate
Parallel reduction kernels
Sort/OrderBy
Parallel radix sort
Group By
Parallel hash-based grouping
Every relational operation has a direct GPU graph equivalent. 6-bounded degree makes JOIN cost constant, not variable.
S Q L to G P U graph operations. Every relational operation maps directly. Sequential scan becomes a color-wave sweep. Index scan is a direct array index, O of one. Filter is per-node predicate in pure SIMD. The critical one: JOIN. In traditional databases, join cost depends on table size. In DagDB, a join is following edges in the graph. Six-bounded degree means join cost is constant, not variable. Aggregate becomes parallel reduction. Sort becomes parallel radix sort. The G P U does all of it in unified memory.
17 / 24
Platform Architecture
NVIDIA translates unstructured reality into 6-bounded Boolean leaves. The Mac reasons over the structured graph.
Platform architecture. Three machines, each with a distinct role. Agora, the M five Max with one hundred twenty-eight gigs of unified memory: the reasoner. Hot path. Active DAG evaluation, sparse matrix-vector multiplication, Carlos Delta persistence. Unified memory means zero-copy between GPU and CPU, critical for sparse graph traversal. Sparta, an NVIDIA forty-ninety with twenty-four gigs: the translator. Dense L L M inference, P D F ingestion, fact extraction, natural language to six-bounded Boolean leaves. Athens, a thirty-ninety: additional translation capacity. The NVIDIA boxes translate reality. The Mac reasons over the result.
18 / 24
Deterministic Replay
Seed = Murmur3( Genesis_Epoch_Hash + Tick_Number )
Carlos Delta guarantees tick atomicity
Replay hashes tick number
Perfectly predicts color shuffle
Zero bytes added to delta stream
7-color dispatch order shuffled per tick (prevents chromatic advection)
Shuffle seed derived from tick number via Murmur3 hash
No need to log seed in Carlos stream
Any tick can be exactly reconstructed from genesis + tick number
Cryptographic-grade determinism at zero storage cost
Deterministic replay. We shuffle the seven-color dispatch order every tick to prevent chromatic advection, a subtle directional bias. But replay needs exact determinism. The solution: tick-derived pseudo-random number generation. Hardcode Murmur three in Metal. The seed is the genesis epoch hash plus the tick number. Carlos Delta guarantees tick atomicity, so replaying just hashes the tick number and perfectly predicts the shuffle. Zero bytes added to the delta stream. Any tick can be exactly reconstructed from genesis plus the tick number. Cryptographic-grade determinism at zero storage cost.
The review process. Before writing a single line of code, this architecture went through three independent review passes. First, Deep Research analyzed eight areas of feasibility: six green, one yellow for hub nodes, one flagged as novel. Second, the Harmonic Resolver: three local models, Gemma four, Llama Scout, and Qwen Coder, debated sixty-nine claims across nine rounds with Asymmetric Veto, generating six hundred twenty-one total viewpoints. Third, Gemini Deep Think resolved all five remaining critical questions. Only after all three passes gave green light did we build.
Phase zero point one: first run. The engine compiles and runs on Apple M five Max. Two hundred fifty-six nodes on a sixteen by sixteen hex grid. Seven-coloring balanced across thirty-six to thirty-seven nodes per group. Tick zero completes in four point one nine milliseconds with leaves-up rank evaluation. All eight LUT six unit tests pass: AND, OR, XOR parity, majority vote. Build zero point one is complete. The foundation works.
Verification gates. Three of seven gates are green. Gate one: swift build passes. Gate two: LUT six correct for all eight test cases. Gate three: leaves-up execution works at four point one nine milliseconds. Gates four through seven are pending for later phases: Carlos Delta persistence, Postgres integration, million-node scaling, and time-travel queries. The file structure is clean: DagDB state, engine, hex grid reused from Savanna, Carlos Delta reused, the Metal shader, and a C L I test harness.
Phase 1: Full ranked DAG kernel with intra-rank resonance
Phase 2: Carlos Delta persistence for DagDB
Phase 3: PostgreSQL CustomScan executor hook
Phase 4: SwiftUI application
Phase 5: MCP server + Sparta/Athens integration
5
Phases Remaining
7
Verification Gates
3/7
Gates Passed
Roadmap. Phase zero point one is done. Five phases remain. Phase zero point two: generalize the neighbor table to support edge weights, enabling continuous mode. Phase one: full ranked DAG kernel with intra-rank resonance and the paradox horizon. Phase two: wire Carlos Delta Transport for DagDB persistence and time-travel. Phase three: the PostgreSQL custom scan executor hook for S Q L access. Phase four: SwiftUI application for visualization. Phase five: M C P server integration with the NVIDIA translation cluster. Seven verification gates, three passed so far.
23 / 24
DagDB
Constraints do not limit a system. They make it viable.
6 edges · 64-bit LUT · Morton Z-curve · 14 GCUPS
Carlos Delta · PostgreSQL · Apple M5 Max
This is an amateur engineering project. We are not HPC or database professionals and make no competitive claims. Numbers speak; ego does not. Errors likely.
Norayr Matevosyan · 2026
DagDB. A six-bounded ranked directed acyclic graph database. One constraint: six edges per node. From that, everything follows. LUT six gives you any Boolean function in one clock cycle. Morton Z-curve puts connected nodes in the same cache line. Savanna engine delivers fourteen giga cell updates per second. Carlos Delta provides ACID persistence with fifty times compression. PostgreSQL gives you forty years of query planning. And Apple's unified memory architecture makes sparse graph traversal viable at scale. Constraints do not limit a system. They make it viable. Thank you.
24 / 24
Press Space to narrate · ↑↓ to navigate · Works offline