KV Cache

Transformer World: A Deep Dive into the Building Blocks of LLMs

A hands-on walkthrough of Transformer-based LLM internals, from each module’s role to key optimization techniques.

Know Your Enemy, Know Yourself, Part 4: Memory Capacity Bottleneck and NVIDIA ICMS

We explore the technical principles behind NVIDIA’s ICMS, a new storage tier designed to solve the KV cache capacity bottleneck in LLMs, and the BlueField-4 DPU that manages it.