
Memory in the AI Era, Part 4: Understanding CXL
Next to the GPU, HBM and HBF fill the gap — but there’s another empty seat next to the CPU. We look at CXL, the new interface that fills the awkward gap between PCIe and DDR: its basic structure, device types, and the CXL product blueprints the big three memory vendors are drawing.

Memory in the AI Era, Part 3: The Remaining Challenges of HBF
HBF clearly has its place, but it still has gaps to fill before it can claim a spot in the memory hierarchy pyramid. We walk through the latest trends in LLMs and inference workloads, how Flash memory is actually used in LLM serving today, and the remaining challenges HBF has to solve.

Memory in the AI Era, Part 2: Where Does HBF Actually Fit?
Centered on SK hynix’s H³ architecture, we explore workloads that can overcome HBF’s weaknesses.

Know Your Enemy, Know Yourself Part 5: Cerebras and the Wafer-Scale Engine
Following reports of a large OpenAI deal, this post explains Cerebras’s recent momentum, the WSE-3 architecture, and the trade-offs of wafer-scale chips in a beginner-friendly way.

Memory in the AI Era, Part 1: Understanding HBF
Why are there so many types of memory, and where does HBF fit in? From SRAM to HBF, we explore the physical principles behind the memory hierarchy and the technical architecture of HBF.

Project Glasswing: Claude Mythos Preview
Centered on Anthropic’s Project Glasswing and Claude Mythos Preview, this post explains why cybersecurity capability has jumped, how benchmark design is shifting, what real defensive findings looked like, and how developers should evolve their workflow with agents.

Building an In-House Dev Environment on Kubernetes Part 3: Kubernetes Device Plugin for LPU
Hello! I’m Younghoon Jun, a DevOps Engineer on the ML team at HyperAccel. This post is the third installment of the Building an In-House Dev Environment on Kubernetes series! In Part 1, we covered the background, overall design, and direction of building a Kubernetes-based development environment. Part 2 introduced the strategy and process for building an ARC-based CI/CD infrastructure to overcome the structural limitations of self-hosted runners. In this third article, we will discuss the Device Plugin required for utilizing custom resources on Kubernetes. ...

AITER Analysis: How AMD Doubled ROCm Inference Performance
An analysis of AITER (AI Tensor Engine for ROCm), which boosts inference performance on AMD GPUs.

Transformer World: A Deep Dive into the Building Blocks of LLMs
A hands-on walkthrough of Transformer-based LLM internals — from each module’s role to key optimization techniques.

Know Your Enemy, Know Yourself, Supplement: Pallas Programming Model
Learn about the Pallas programming model, which enables writing custom kernels on TPUs.