MoE Parameters, MoE Problems: Visualizing Mixture of Experts Routing Layers

Overview

I built a real-time neural telemetry engine that intercepts and visualizes the internal gating decisions of a Mixture of Experts (MoE) model as it runs.
The project uses a PyTorch forward hook to capture the raw 40-dimensional routing weights from Layer 20 of an IBM Granite 3.0 model. Because everything runs on a laptop, the demo provides a live “neural heartbeat” that shows how sparse activation lets a model reason well without the latency or computational cost of a cloud-scale cluster.
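
To make the hook mechanism concrete, here is a minimal sketch of capturing router outputs with a PyTorch forward hook. It does not load Granite: `ToyMoELayer`, its `router` attribute, `HIDDEN_SIZE`, and the top-k value of 8 are all hypothetical stand-ins chosen for illustration, and the real model's module names and routing details may differ. Only the 40-wide router output comes from the description above.

```python
import torch
import torch.nn as nn

NUM_EXPERTS = 40   # matches the 40-dimensional routing weights described above
HIDDEN_SIZE = 64   # assumption: arbitrary small size for this toy layer


class ToyMoELayer(nn.Module):
    """Hypothetical stand-in for one MoE layer; not the Granite implementation."""

    def __init__(self):
        super().__init__()
        # The router maps each token's hidden state to one logit per expert.
        self.router = nn.Linear(HIDDEN_SIZE, NUM_EXPERTS, bias=False)

    def forward(self, hidden_states):
        self.router(hidden_states)  # routing decision (triggers the hook)
        # Expert dispatch is omitted; the input is returned unchanged.
        return hidden_states


captured = []  # routing logits collected by the hook, one entry per forward pass


def capture_router_logits(module, inputs, output):
    # Detach and copy to CPU so telemetry does not hold on to the autograd graph.
    captured.append(output.detach().cpu())


layer = ToyMoELayer()
handle = layer.router.register_forward_hook(capture_router_logits)

with torch.no_grad():
    layer(torch.randn(1, 5, HIDDEN_SIZE))  # batch of 1, sequence of 5 tokens

handle.remove()  # unregister the hook once capture is done

router_logits = captured[0]                           # shape: (1, 5, 40)
routing_probs = torch.softmax(router_logits, dim=-1)  # per-token expert distribution
# k=8 is an assumption for illustration, not a value taken from this write-up.
top_probs, top_experts = routing_probs.topk(k=8, dim=-1)
```

With a real model, the same pattern would apply to whichever submodule produces the routing logits in the target layer. The hook only observes the output, so it should not change the model's behavior.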

Tech stack