A browser demo where the same sparse matrix-vector multiplication is drawn two ways — as a rigid logic pipeline, and as a deep graph. Once both are visible side by side, the difference between rule-based and neural stops being a difference.
github.com/norayr-m/drt-generator · live demo · April 2026
This deck is a small primer on sparse matrix-vector multiplication, told through a worked example. The example is a browser demo. The demo runs a chain of seven small operations that turn a rotating clock tick into a harmonic waveform. Most of those operations are matrix-vector multiplications, and most of the matrices involved are sparse. The same chain is then redrawn as a seven-layer deep graph, side by side with the pipeline. The point is to make a single thing visible: a rule-based pipeline and a deep neural network can be two different drawings of the same arithmetic. Once you see them at the same time, the question of whether the demo is doing logic or doing learning stops being interesting; the linear algebra is the same either way. No prior background is required to follow the next eight slides.
1 / 9
What you see when the page loads
Seven columns laid out left to right, each labelled.
A rotating clock at the far left, ticking out a phase.
A row of weighted "dice" appearing in column four — these are the nonzero entries of a sparse vector.
A small audio waveform rebuilding itself at the far right.
A play / pause control, an audio mute, and a phase counter.
When you open the live page, the first thing you see is a row of seven labelled columns running left to right. At the far left is a rotating clock. The clock ticks out a phase, a single integer that increments. As the clock ticks, the value flows rightward through the columns. Column two takes the phase and produces an index. Column three uses that index to select a row from a sparse routing matrix. Column four shows the selected row as a set of weighted dice — these dice are the nonzero entries of a sparse vector; the rest of the entries are zero and are not drawn. The dice values pass through column five's element-wise activation functions, column six's small feed-forward propagation, and arrive at column seven, where the result vector is reassembled into the harmonic frequencies of the audio. You can press play, watch the columns light up in sequence on every tick, and listen to the resulting waveform.
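For readers who think in code, here is the shape of that tick loop: a hypothetical sketch, not the repository's source. runChain, render, and the 250 ms interval are all invented stand-ins.

```ts
// Hypothetical sketch of the clock loop; runChain, render, and the tick
// rate are invented stand-ins, not identifiers from the repository.
let phase = 0;      // column 1: the integer the clock ticks out
let playing = true; // toggled by the play / pause control

// Columns 2-7 collapsed into one stub; slide 6 unpacks them step by step.
function runChain(p: number): number[] {
  return [Math.sin(p)]; // placeholder standing in for the harmonic output
}

function render(samples: number[]): void {
  console.log(samples); // the real page lights up columns and rebuilds audio
}

setInterval(() => {
  if (!playing) return;
  phase += 1;              // the clock advances one tick
  render(runChain(phase)); // the value flows rightward through the columns
}, 250);                   // tick rate is a guess
```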
2 / 9
The same thing, twice
Pipeline view
Rigid boxes. Arrows. Index → row selection → sparse vector → activation → propagation → audio. The mat-vec structure is explicit.
Graph view
The same operations redrawn as a seven-layer graph. Every box becomes a layer of nodes, every arrow becomes a weighted edge — the matrices are the edge weights.
Identical computation. Identical signal. Two drawings of the same matrix-vector product.
The demo can be viewed in two modes. The pipeline view shows the seven columns as discrete boxes with arrows between them — the way a programmer would draw a flowchart of mat-vec operations in a numerical library. The graph view redraws the same machine as a seven-layer deep graph: every box becomes a layer of nodes, every arrow becomes a weighted edge, and the matrices that the pipeline view treats as opaque table lookups become explicit edge-weight patterns. Both diagrams compute the same output vector from the same input phase. Toggling between them is the central pedagogical point of the demo: what the pipeline view shows as a fixed sequence of mat-vec calls, the graph view shows as a small neural network. Same arithmetic, two pictures.
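The toggle fits in a dozen lines of code. The sketch below uses tiny made-up matrices and plain TypeScript, nothing from the repository: a pipeline function that composes named mat-vec calls, a network function that folds over layers, and a check that both return the same vector.

```ts
type Vec = number[];
type Mat = number[][];

const matVec = (A: Mat, x: Vec): Vec =>
  A.map(row => row.reduce((s, a, j) => s + a * x[j], 0));
const relu = (v: Vec): Vec => v.map(x => Math.max(0, x));
const id = (v: Vec): Vec => v;

const W1: Mat = [[1, 0], [0, 2]]; // made-up weights; any values work
const W2: Mat = [[1, 1]];

// Pipeline view: a fixed sequence of named mat-vec calls.
const pipeline = (x: Vec): Vec => matVec(W2, relu(matVec(W1, x)));

// Graph view: the same matrices read as layers with activations.
const layers: Array<[Mat, (v: Vec) => Vec]> = [[W1, relu], [W2, id]];
const network = (x: Vec): Vec =>
  layers.reduce((v, [W, act]) => act(matVec(W, v)), x);

console.log(pipeline([3, -1]), network([3, -1])); // identical outputs
```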
3 / 9
Why sparse matrix-vector multiplication matters
Most matrices that appear in real computation are sparse — the great majority of their entries are zero. Adjacency matrices of graphs (social networks, molecules, circuits, finite-element meshes) are sparse. Attention masks in transformers, especially block-sparse and local-attention variants, are sparse. The transition matrices of physical simulations are sparse because each cell talks only to its near neighbours. Even a fully-connected layer becomes sparse once magnitude pruning is applied.
The cost of multiplying a dense N × N matrix by a vector is O(N²) — every entry, zero or not, contributes one multiply and one add. A sparse mat-vec costs O(nnz) where nnz is the number of nonzero entries. For a graph adjacency with average degree k, that is O(kN), and k is usually a small constant. That factor of N / k is the entire saving: the same result for a small fraction of the dense work.
Compute. Skip the zeros. Each nonzero is one multiply-add.
Memory. Store the nonzero pattern only — COO, CSR, CSC, or block-sparse. The dense form is never materialised.
Bandwidth. Sparse storage means fewer bytes per useful operation, which matters more than FLOPs on modern hardware.
y = A x,  cost = O(nnz)
where nnz is the number of nonzero entries of A — typically much less than N² for the matrices that actually show up
Most matrices that appear in real computation are sparse — the great majority of their entries are zero. Graph adjacency matrices are sparse, and that includes social networks, molecular bonds, electrical circuits, and finite-element meshes. Transformer attention masks, especially the block-sparse and local-attention variants, are sparse. Physical simulations are sparse because every cell only interacts with its near neighbours. Even a fully-connected neural layer becomes sparse once magnitude pruning has been applied. The cost of multiplying a dense N-by-N matrix by a vector is order N squared — every entry contributes one multiply and one add, regardless of whether it is zero. A sparse matrix-vector multiplication costs only on the order of the number of nonzeros. For a graph adjacency with average degree k, that is order k times N, and k is usually a small constant. That factor of N over k is the entire saving: the same result for a small fraction of the dense work. Three savings stack: compute, by skipping the zeros; memory, by storing only the nonzero pattern in formats like compressed sparse row; and bandwidth, by reading fewer bytes per useful operation, which on modern hardware matters more than the raw FLOP count.
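To make the O(nnz) claim concrete, here is a textbook compressed-sparse-row mat-vec in a few lines of TypeScript. This is the standard CSR layout, not the demo's internal storage, which this deck does not document.

```ts
interface CSR {
  rowPtr: number[]; // length rows+1; nonzeros of row i live at rowPtr[i]..rowPtr[i+1]
  colIdx: number[]; // column index of each nonzero, length nnz
  vals: number[];   // value of each nonzero, length nnz
}

function csrMatVec(A: CSR, x: number[]): number[] {
  const y: number[] = new Array(A.rowPtr.length - 1).fill(0);
  for (let i = 0; i < y.length; i++) {
    // Only row i's nonzeros are touched: one multiply-add per nonzero.
    for (let k = A.rowPtr[i]; k < A.rowPtr[i + 1]; k++) {
      y[i] += A.vals[k] * x[A.colIdx[k]];
    }
  }
  return y; // total work is O(nnz), never O(N^2)
}

// 3 x 3 example with 4 nonzeros: [[2,0,0],[0,0,3],[1,0,4]]
const A: CSR = { rowPtr: [0, 1, 2, 4], colIdx: [0, 2, 0, 2], vals: [2, 3, 1, 4] };
console.log(csrMatVec(A, [1, 1, 1])); // [2, 3, 5]
```

The zeros never enter the inner loop, which is the compute saving; they never enter memory either, which is the storage and bandwidth saving.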
4 / 9
Pipeline equals network. Drawn out, not assumed.
Most of the time, "rule-based pipeline" and "deep neural network" are treated as different kinds of object — one is written by a programmer, the other is trained from data. The seven-column generator demonstrates, by direct construction on a single mat-vec chain, that they are the same kind of object. The same arithmetic that the pipeline view shows as a fixed logical sequence is what the graph view shows as a small network with weighted layers and activation functions. No retraining. No wrapper. Same matrices, two pictures.
The reason this matters for sparse mat-vec specifically is that the two views suggest different optimisations. The pipeline view makes the structured sparsity pattern explicit — you can see that column three is a one-hot row pick, column four is a sparse vector with a known support, column six is a small dense block. The graph view makes the parallelism explicit — you can see which layers can be evaluated independently and where the data dependencies live. Both views run on the same matrices; choosing between them is choosing what to optimise for.
There is a particular reason this equivalence deserves its own demo. In most software, the line between rule-based numerical code and neural network code is treated as architectural — one is hand-written, the other is trained from data. The seven-column generator demonstrates, by direct construction on a single chain of mat-vec operations, that the line is a drawing convention rather than a structural difference. The same arithmetic that the pipeline view shows as a fixed sequence of matrix-vector calls is what the graph view shows as a small neural network with weights and activations. Toggling between them changes the picture, not the computation. The reason this matters for sparse mat-vec specifically is that the two views suggest different optimisations. The pipeline view makes the structured sparsity pattern explicit — column three is a one-hot row pick, column four is a sparse vector with known support, column six is a small dense block. The graph view makes the parallelism explicit — you can see which layers can be evaluated independently and where the data dependencies live. Both views run on the same matrices; choosing between them is choosing what to optimise for.
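The smallest pipeline-view optimisation can be shown directly. In the sketch below (invented names and values), knowing the mat-vec is one-hot removes the multiply entirely: multiplying any matrix by e_p just reads off its p-th column, so however the routing matrix is oriented, the pick is a gather.

```ts
type Mat = number[][];

// Generic view: a literal mat-vec against the basis vector e_p. O(rows * cols).
function selectGeneric(S: Mat, p: number): number[] {
  return S.map(row => row.reduce((s, a, j) => s + a * (j === p ? 1 : 0), 0));
}

// Structured view: S e_p is the p-th column of S, so the multiply is a gather. O(rows).
function selectGather(S: Mat, p: number): number[] {
  return S.map(row => row[p]);
}

const S: Mat = [[0.9, 0], [0, 0.7], [0.4, 0]]; // made-up routing values
console.log(selectGeneric(S, 0), selectGather(S, 0)); // same vector, different cost
```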
5 / 9
Anatomy of the seven columns
Phase. A rotating clock ticks out an integer index t = 0, 1, 2, …
Index map. A small lookup table sends t to a pattern identifier p. Function call, no mat-vec yet.
Row select. A sparse selection matrix S picks one row of the routing matrix indexed by p. This is a one-hot mat-vec: r = S e_p, where e_p is the p-th standard basis vector.
Sparse vector. The selected row is the support of a sparse weight vector — only a handful of entries are nonzero. The dice physically extrude exactly the nonzero positions.
Activation. Element-wise nonlinearity (rectifier, sigmoid, hyperbolic tangent) on the sparse vector. Cheap because zero stays zero; only the nonzeros change.
Propagate. Multiply by a small dense matrix W — a hidden-layer weight. Output is dense (most entries become nonzero), but input is sparse so cost is O(rows(W) · nnz).
Assemble. Multiply by an output mixing matrix that mixes the dense intermediate into the harmonic frequency channels. The result is the audio waveform.
Pipeline view shows these as boxes; graph view shows them as layers. Same arithmetic.
Here is the column-by-column picture. Column one is a rotating clock that ticks out an integer index. Column two is a lookup table that sends that index to a pattern identifier — a function call, no mat-vec yet. Column three is the first matrix-vector multiplication: a sparse selection matrix picks one row of the routing matrix indexed by the pattern. This is the canonical one-hot mat-vec, multiplying by a standard basis vector. Column four shows the selected row as a sparse weight vector — only a handful of entries are nonzero, and the dice physically extrude exactly the nonzero positions. Column five applies an element-wise nonlinearity, which is cheap because zero stays zero and only the nonzeros change. Column six is the second mat-vec: multiply by a small dense weight matrix, the hidden-layer weight. The output is dense, but the input is sparse, so the cost is the number of output rows times the number of input nonzeros, not the dense cost. Column seven is the third mat-vec: multiply by an output mixing matrix that maps the intermediate vector into harmonic frequency channels, producing the audio waveform. The pipeline view draws this as boxes connected by arrows. The graph view draws it as layers of a network with weighted edges. The arithmetic is identical.
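Put together, the whole chain fits in a short sketch. Every shape and value below is invented for illustration; the comments map each line back to its column.

```ts
type Vec = number[];
type Mat = number[][];
const matVec = (A: Mat, x: Vec): Vec =>
  A.map(row => row.reduce((s, a, j) => s + a * x[j], 0));
const relu = (v: number): number => Math.max(0, v);

const routing: Mat = [ // rows are sparse patterns, like column four's dice
  [0.9, 0, 0, 0.4],
  [0, 0.7, 0, 0],
];
const W: Mat = [[0.5, -0.2, 0, 0.1], [0, 0.3, 0.8, 0]]; // column 6: hidden weight
const mix: Mat = [[1, 0], [0.5, 0.5], [0, 1]];          // column 7: harmonic mixing

function chain(phase: number): Vec {
  const p = phase % routing.length;  // column 2: index map (a lookup, no mat-vec)
  const sparse = routing[p];         // columns 3-4: one-hot pick r = S e_p, done as a gather
  const active = sparse.map(relu);   // column 5: element-wise; zeros stay zero
  const hidden = matVec(W, active);  // column 6: small dense propagation
  return matVec(mix, hidden);        // column 7: assemble harmonic channels
}

console.log(chain(0)); // one tick's harmonic amplitudes: [0.49, 0.245, 0]
```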
6 / 9
What sparsity patterns the demo emits
Each phase tick produces one sparse weight vector — a particular pattern of nonzero entries in column four. The space of possible patterns is fixed by the lookup table in column two and the routing matrix in column three; the demo cycles through them as the clock advances. Watching the demo run for one full clock cycle is, in effect, watching the entire pattern dictionary pass through the chain once. Three illustrative patterns are shown alongside this narration.
sparse · concentrated
balanced · roughly uniform
dense · many small entries
Each tick of the rotating clock produces one sparse weight vector — a specific arrangement of nonzero entries in column four. The set of possible arrangements is fixed by the lookup table in column two and the routing matrix in column three; the demo cycles through them as the clock advances. Watching the demo run for one full clock period is, in effect, watching the entire pattern dictionary pass through the chain exactly once. Three illustrative cases are shown alongside this narration. The first is a concentrated sparse pattern with one or two heavy entries — most of the signal energy is in a small number of nonzeros. The second is a balanced pattern with roughly uniform weights spread across the support. The third is a denser pattern with many small entries — many channels active simultaneously at low individual weight. The audio sounds different for each because the downstream linear algebra is doing different work; the architecture is the same. In numerical-library terms, these three patterns have different sparsity ratios — the first has the lowest nonzero count and the highest individual magnitude; the third has the highest nonzero count and the lowest individual magnitude.
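One way to tell the three narrated cases apart numerically is a quick statistics pass. The vectors and field names below are invented, not read from the demo:

```ts
function patternStats(v: number[]): { nnz: number; maxAbs: number; fillRatio: number } {
  const nz = v.filter(x => x !== 0);
  const maxAbs = nz.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  return { nnz: nz.length, maxAbs, fillRatio: nz.length / v.length };
}

console.log(patternStats([0, 0, 0.9, 0, 0.1, 0, 0, 0]));             // concentrated: low nnz, high maxAbs
console.log(patternStats([0.3, 0, 0.3, 0, 0.3, 0, 0.3, 0]));         // balanced: uniform weights
console.log(patternStats([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])); // dense: high nnz, low maxAbs
```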
7 / 9
Forward and backward through the same matrices
The seven-column chain has a companion repository, DRT_Scanner, which runs the same chain in reverse. The forward pass takes a phase index and produces an audio output. The reverse pass takes an output signal and projects it back through the transposes of the same matrices, recovering an estimate of the input. This is the same forward/backward pattern that powers automatic differentiation in modern numerical libraries — and the same one that powers message-passing on the transpose graph in graph neural networks.
Agreement. Forward and reverse passes give matching values at each column. The matrices are well-conditioned for inversion at this configuration.
Divergence. The reverse pass produces values the forward pass did not emit. Information has been lost — the forward map is not invertible at this configuration, or numerical conditioning has broken down.
Generator multiplies by A. Scanner multiplies by Aᵀ. Same matrices, opposite direction.
A natural next question, once you have a forward chain of matrix-vector multiplications, is whether the chain can be run backwards. The companion repository, DRT_Scanner, exists to answer that. The scanner uses the same seven matrices, but the wires flow from right to left. A signal sits at the right, and at each tick that signal is multiplied by the transposes of the matrices, in reverse order. The intermediate values are read off and shown. Where the reverse pass agrees with what the forward pass produced, the columns light up green; the matrices are well-conditioned for inversion at that configuration. Where the reverse pass disagrees, the columns flag the divergence; information has been lost, the forward map is not invertible at that configuration, or numerical conditioning has broken down. This forward-backward pattern is exactly the same one that powers automatic differentiation in modern numerical libraries, and the same one that powers message-passing on transpose graphs in graph neural networks. The pair of repositories is deliberately released together to make that symmetry visible on a small example.
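The symmetry fits in a few lines. The sketch below runs one made-up matrix forward and back the way the generator and scanner run seven; it is not either repository's code. It also shows why agreement is conditional: multiplying by the transpose recovers the input exactly only when the matrix is orthogonal.

```ts
type Mat = number[][];
const matVec = (A: Mat, x: number[]): number[] =>
  A.map(row => row.reduce((s, a, j) => s + a * x[j], 0));
const transpose = (A: Mat): Mat => A[0].map((_, j) => A.map(row => row[j]));

const x = [3, 5];

// Agreement: a rotation is orthogonal (A^T A = I), so the reverse pass matches.
const rot: Mat = [[0, -1], [1, 0]];
console.log(matVec(transpose(rot), matVec(rot, x))); // [3, 5]

// Divergence: a rank-deficient map loses information the transpose cannot restore.
const squash: Mat = [[1, 1], [0, 0]];
console.log(matVec(transpose(squash), matVec(squash, x))); // [8, 8], not [3, 5]: the divergence the scanner would flag
```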
8 / 9
What an interested reader walks away with
A small browser demo where seven columns turn a clock tick into a harmonic waveform — a worked example of sparse matrix-vector multiplication.
A live, side-by-side proof that "rule-based pipeline" and "deep neural network" can be two drawings of the same chain of mat-vec operations.
One concrete, runnable example of how sparse vectors flow through a small pipeline of one-hot selection, element-wise activation, and dense propagation.
Visualization co-authored with Claude (Anthropic).
What you walk away from this deck with, if everything has landed cleanly, is three things. First, a small browser demo where seven columns turn a rotating clock tick into a harmonic waveform — a worked example of sparse matrix-vector multiplication. Second, a live, side-by-side demonstration that what is often described as a rule-based pipeline and what is often described as a deep neural network can be two different drawings of the same chain of mat-vec operations. Third, one concrete and runnable example of how a sparse vector flows through a small pipeline of one-hot selection, element-wise activation, and dense propagation. The live demo is at norayr-m.github.io slash drt-generator. The companion scanner repository, where the same matrices run backwards, is at norayr-m.github.io slash drt-scanner. This is an amateur engineering project. We are not HPC professionals and make no competitive claims. Errors are likely; the work is openly in progress; the demonstrations are honest about where they are and are not airtight.
9 / 9
Press Space to narrate · ↑↓ to navigate · Works offline