Transformer Architecture

Transformer World: A Deep Dive into the Building Blocks of LLMs

A hands-on walkthrough of Transformer-based LLM internals — from each module’s role to key optimization techniques.

March 5, 2026 · 15 min · 3010 words
ICMS and BlueField-4 DPU

Know Your Enemy, Know Yourself, Part 4: Memory Capacity Bottleneck and NVIDIA ICMS

We explore the technical principles behind NVIDIA’s ICMS, a new storage tier designed to solve the KV cache capacity bottleneck in LLMs, and the BlueField-4 DPU that manages it.

February 24, 2026 · 12 min · 2456 words