Hello. I’m Namyoon Kim, working on the ML team at HyperAccel.

In Part 1, we covered the motivation and overall design direction for building a Kubernetes-based development environment. This article continues from there, focusing on how we redesigned our CI/CD infrastructure.

Once the development environment lives on Kubernetes, running CI/CD pipelines on the same environment is the architecturally consistent choice. This article structurally analyzes why traditional self-hosted runners were unsustainable long-term, then documents the key technical decisions made during ARC (Actions Runner Controller) adoption — the selection criteria between DinD mode and Kubernetes mode, Rook-Ceph based Ephemeral PVC strategy, secret lifecycle management through Vault, and pipeline observability through a custom-built GitHub Actions Exporter.


The Structural Limitations of Self-hosted Runners

Previously, we operated by installing self-hosted runners as Docker containers directly on servers. Runner labels were set to hostnames (e.g., ha-xxx), and GPU test workflows used the --all-gpu flag to utilize all GPUs on a given server.

The reasons we didn’t use GitHub’s Hosted Runners (ubuntu-latest, etc.) were as follows:

  • Hardware dependency: FPGA synthesis and GPU-based testing can only run on nodes with these devices physically installed.
  • Network isolation: Internal infrastructure such as the Harbor registry and the Vault server is unreachable from external runners.
  • Image transfer costs: The round-trip of building, pushing, and pulling multi-GB Docker images through external networks is inefficient in both bandwidth and time.

While self-hosted runners resolved these constraints, fundamental problems with the structure itself emerged over time.

First, tight coupling between servers and workflows. Since runner labels were bound to hostnames, replacing or renaming a server required modifying every workflow referencing it. Additionally, the --all-gpu flag approach meant that when two jobs were scheduled on the same server simultaneously, GPU resource contention couldn’t be controlled, causing one job to fail unpredictably.

Second, execution environment contamination. Runners execute directly on the host OS, so packages installed or system configurations changed during Build A can affect Build B’s behavior. This means build reproducibility cannot be guaranteed — undermining the fundamental purpose of a CI pipeline.

Third, inability to scale elastically. The number of concurrent workflows varies significantly by time of day, but the number of server-installed runners is static. Under-provisioning increases queue wait times; over-provisioning wastes idle resources.

All three problems arise because runners exist outside Kubernetes’ scheduling and lifecycle management framework. In an environment where a Kubernetes cluster is already established, converting runners to Pods and placing them under cluster control was the logical decision.


ARC (Actions Runner Controller) Architecture

ARC is a Kubernetes Operator officially supported by GitHub that manages the complete lifecycle of self-hosted runners on Kubernetes. Understanding ARC’s architecture requires examining the AutoScalingRunnerSet CRD (Custom Resource Definition), the core resource that controls runner creation, scaling, and deletion.

Operational Flow

[Figure: ARC Operational Flow]

  1. A Listener Pod maintains an HTTPS Long Poll connection with the GitHub Actions Service to watch for new jobs. Since this is polling-based rather than webhook-based, no inbound network configuration is required.

  2. When a job is detected, the Listener patches the EphemeralRunnerSet replica count via the Kubernetes API to request a scale-up.

  3. The EphemeralRunner Controller obtains a JIT (Just-in-Time) configuration token and creates a Runner Pod, which registers itself with the GitHub Actions Service.

  4. Upon job completion, the EphemeralRunner Controller verifies with the GitHub API and deletes the pod. This ephemeral execution model fundamentally prevents residual artifacts from previous builds from affecting subsequent ones.

Purpose-specific AutoScalingRunnerSet Design

HyperAccel’s CI workloads span a wide resource spectrum — from general builds to FPGA synthesis to GPU testing. We deploy 7 AutoScalingRunnerSets organized by purpose:

$ kubectl get autoscalingrunnerset -n arc-systems
NAME                     MIN   MAX   CURRENT   RUNNING
runner-base              1     10    1         1       # DinD Mode
runner-cpu               1     10    3         3       # Kubernetes Mode
runner-cpu-largememory   1     10    1         1       # Kubernetes Mode
runner-fpga              1      3    1         1       # Kubernetes Mode
runner-gpu               1      3    1         1       # Kubernetes Mode
runner-highcpu           1     10    1         1       # Kubernetes Mode
runner-hybrid            1      4    1         1       # Kubernetes Mode
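
Each Scale Set is installed from GitHub's gha-runner-scale-set Helm chart. A minimal values sketch for the GPU Scale Set might look like the following (the config URL and secret name are illustrative, not our actual values):

# values.yaml for runner-gpu (simplified sketch; URL and secret name are illustrative)
githubConfigUrl: https://github.com/hyperaccel        # hypothetical org URL
githubConfigSecret: arc-github-app                    # hypothetical GitHub App credential secret
runnerScaleSetName: runner-gpu                        # the value referenced by runs-on
minRunners: 1
maxRunners: 3                                         # capped to available GPU hardware
containerMode:
  type: kubernetes                                    # Kubernetes mode (not dind)
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: rook-ceph-block
    resources:
      requests:
        storage: 15Gi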

Each Scale Set is scheduled only on appropriate nodes via nodeAffinity or tolerations (a configuration sketch follows the list):

  • runner-gpu: Placed on GPU nodes with label nvidia.com/gpu.present=true
  • runner-fpga: Placed only on specific FPGA-equipped nodes
  • runner-cpu / runner-cpu-largememory: Placed on nodes with label ci=large-memory
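
As a sketch, the GPU constraint could be expressed in the Scale Set's pod template like this (the toleration key is an assumption; our actual template may differ):

# runner-gpu pod template: scheduling constraints (sketch)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nvidia.com/gpu.present
                operator: In
                values: ["true"]
  tolerations:
    - key: nvidia.com/gpu      # assumed taint key on GPU nodes
      operator: Exists
      effect: NoSchedule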

Workflows select the appropriate runner via the runs-on key:

jobs:
  gpu-test:
    runs-on: runner-gpu     # GPU Runner
  fpga-synth:
    runs-on: runner-fpga    # FPGA Runner
  build:
    runs-on: runner-cpu     # General CPU Runner

A critical observation: only runner-base uses DinD mode, while the remaining 6 all use Kubernetes mode. The difference between these two modes is one of the most important architectural decisions in ARC operations.


DinD Mode vs Kubernetes Mode: Architectural Differences

When a Runner Pod needs to execute containers within a workflow (the container: key or container actions), ARC provides two modes: DinD (Docker-in-Docker) mode and Kubernetes mode. These modes differ fundamentally in their container execution mechanism, volume management, and security model.

DinD Mode (runner-base)

Runs a Docker daemon as a sidecar container inside the Runner Pod. At HyperAccel, runner-base operates in this mode.

# runner-base core configuration (DinD Mode)
spec:
  containers:
    - name: runner
      env:
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock
        - name: RUNNER_WAIT_FOR_DOCKER_IN_SECONDS
          value: "120"
      volumeMounts:
        - mountPath: /var/run
          name: dind-sock          # Docker socket sharing
        - mountPath: /home/runner/_work
          name: work
  initContainers:
    - name: init-dind-externals    # Copy runner externals
      command: ["cp"]
      args: ["-r", "/home/runner/externals/.", "/home/runner/tmpDir/"]
    - name: dind                   # Docker daemon (Sidecar)
      image: docker:dind
      securityContext:
        privileged: true           # ⚠️ Privileged required
      restartPolicy: Always
      args: ["dockerd", "--host=unix:///var/run/docker.sock"]
      volumeMounts:
        - mountPath: /var/run
          name: dind-sock
        - mountPath: /home/runner/externals
          name: dind-externals
  volumes:
    - name: dind-sock
      emptyDir: {}                 # Volatile volume
    - name: dind-externals
      emptyDir: {}
    - name: work
      emptyDir: {}

The structural characteristics of DinD mode are as follows.

First, the docker:dind image runs as an initContainer with restartPolicy: Always, operating in the Sidecar pattern. The runner container accesses this Docker daemon’s Unix socket via the DOCKER_HOST environment variable.

Second, since the Docker daemon handles container layer management, image pulls, and network creation entirely within the Pod, it cannot leverage the node’s containerd image cache. Even images already cached on the node must be pulled again by the DinD daemon.

Third, privileged: true is mandatory for Docker daemon execution. This grants the Pod access to nearly all host kernel capabilities, requiring careful consideration in environments with strict security policies.

Fourth, all volumes are configured as emptyDir — data disappears when the Pod is deleted. Docker build cache is also not preserved.

Kubernetes Mode (runner-cpu, runner-gpu, runner-fpga, etc.)

The Runner Pod calls the Kubernetes API to create workflow container steps as separate Pods. ARC’s Container Hook (runner-container-hooks) mediates this process.

# runner-cpu core configuration (Kubernetes Mode)
spec:
  containers:
    - name: runner
      env:
        - name: ACTIONS_RUNNER_CONTAINER_HOOKS
          value: /home/runner/k8s/index.js
        - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
          value: /home/runner/k8s/worker-podspec.yaml
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "true"
      volumeMounts:
        - mountPath: /home/runner/_work
          name: work
        - mountPath: /home/runner/k8s/worker-podspec.yaml
          name: hook-template
          subPath: worker-podspec.yaml
  volumes:
    - name: work
      ephemeral:
        volumeClaimTemplate:        # Ephemeral PVC
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: rook-ceph-block
            resources:
              requests:
                storage: 15Gi
    - name: hook-template
      configMap:
        name: arc-hook-cpu          # worker-podspec ConfigMap

The core mechanisms of Kubernetes mode are as follows.

ACTIONS_RUNNER_CONTAINER_HOOKS specifies the Container Hook entry point (index.js). When a workflow uses the container: key, instead of executing directly, the Hook calls the Kubernetes API to create a separate Workflow Pod.

ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE specifies the Workflow Pod’s spec template (worker-podspec.yaml). This template is managed as a ConfigMap, with different ConfigMaps referenced per runner type (arc-hook-cpu, arc-hook-gpu, arc-hook-fpga, etc.).

ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER=true enforces that all Jobs must execute within a container via the container: key, preventing direct execution on the Runner Pod itself.

When a job runs, a separate Workflow Pod is created alongside the Runner Pod, separating the runner from the actual workflow execution environment.
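
For example, a job bound for a Kubernetes-mode runner declares its execution image with the container: key (the image path is illustrative):

# Job on a Kubernetes-mode runner; container: is mandatory
# because ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER=true
jobs:
  unit-test:
    runs-on: runner-cpu
    container:
      image: harbor.internal.example/ci/build-base:latest   # illustrative image path
    steps:
      - uses: actions/checkout@v4
      - run: make test   # executes inside the Workflow Pod created by the Container Hook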

Structural Comparison of Both Modes

| Aspect | DinD Mode (runner-base) | Kubernetes Mode (runner-cpu, etc.) |
| --- | --- | --- |
| Container execution | Docker daemon inside Pod | Separate Pod via K8s API |
| Privileged mode | Required (Docker daemon) | Optional (per worker-podspec) |
| Image cache | Isolated in DinD (no node cache) | Shares node's containerd cache |
| Work volume | emptyDir (volatile) | Ephemeral PVC (rook-ceph-block, 15Gi) |
| Step isolation | Same Docker network | Independent Pod per step possible |
| Docker CLI | Fully compatible | Not available (container steps go through the Container Hook) |
| Config complexity | Low | High (worker-podspec, RBAC, etc.) |
| Best for | Docker build/push workflows | In-container build/test |

runner-base remains in DinD mode because some workflows require direct Docker CLI usage (image builds, registry pushes, etc.). Kubernetes mode has no Docker daemon, so docker build cannot be executed there.
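
A sketch of such a job, which therefore must target runner-base (the registry path is illustrative; Harbor login is elided):

# Image build/push: only possible on the DinD runner
jobs:
  image-build:
    runs-on: runner-base
    steps:
      - uses: actions/checkout@v4
      - name: Build and push
        run: |
          docker build -t harbor.internal.example/ml/devcontainer:${GITHUB_SHA} .
          docker push harbor.internal.example/ml/devcontainer:${GITHUB_SHA}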


Volume Strategy: Ephemeral PVCs and Cache Layers

Runner volume design directly impacts build performance and stability. The volume strategies differ qualitatively between DinD mode and Kubernetes mode.

DinD Mode Volumes: emptyDir

# runner-base volumes (DinD Mode)
volumes:
  - name: dind-sock          # Docker socket sharing (Runner ↔ DinD daemon)
    emptyDir: {}
  - name: dind-externals     # Runner externals copy
    emptyDir: {}
  - name: work               # Workspace (checkout, build artifacts)
    emptyDir: {}
  - name: harbor-ca          # Harbor CA certificate
    configMap:
      name: harbor-ca

All work volumes are emptyDir, so data is lost when the Pod is deleted. Docker build cache is also not preserved — previously cached layers cannot be reused. This is disadvantageous for large image builds.

Kubernetes Mode Volumes: Ephemeral PVC + Cache Layers

# runner-cpu volumes (Kubernetes Mode)
volumes:
  - name: work
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: rook-ceph-block   # Ceph block storage
          resources:
            requests:
              storage: 15Gi                   # 15Gi per runner
  - name: hook-template
    configMap:
      name: arc-hook-cpu                      # worker-podspec template

In Kubernetes mode, the work directory (/home/runner/_work) uses Ephemeral PVCs. Unlike emptyDir, these are backed by Rook-Ceph block storage, providing stable I/O independent of node local disk conditions with a dedicated 15Gi volume per runner. Ephemeral PVCs are bound to the Runner Pod’s lifecycle — they are automatically deleted when the Pod is deleted, ensuring consistent operation without storage leaks.

Cache Volume Strategy in worker-podspec

Workflow Pods created by Container Hook have additional cache volumes mounted. These are defined in each runner type’s ConfigMap (arc-hook-cpu, arc-hook-gpu, etc.) within worker-podspec.yaml.

# arc-hook-cpu ConfigMap (worker-podspec.yaml) — excerpt
spec:
  containers:
    - name: "$job"
      env:
        - name: HF_HOME
          value: /mnt/cache/huggingface
        - name: CCACHE_DIR
          value: /mnt/cache/ccache
        - name: UV_CACHE_DIR
          value: /mnt/cache/uv
      resources:
        limits:
          cpu: "32"
          memory: "128Gi"
      volumeMounts:
        - name: huggingface-cache
          mountPath: /mnt/cache/huggingface
        - name: ccache-cache
          mountPath: /mnt/cache/ccache
        - name: uv-cache
          mountPath: /mnt/cache/uv
  volumes:
    - name: huggingface-cache
      persistentVolumeClaim:
        claimName: huggingface-runner-pvc    # 1Ti NFS (shared across all runners)
    - name: ccache-cache
      hostPath:
        path: /tmp/ccache                    # Node-local cache
    - name: uv-cache
      hostPath:
        path: /tmp/uv                        # Node-local cache

The volume strategy organized by layer:

| Layer | Volume Type | Capacity | Lifecycle | Purpose |
| --- | --- | --- | --- | --- |
| Work directory | Ephemeral PVC (rook-ceph-block) | 15Gi / Runner | Pod-bound | Checkout, build artifacts |
| Model cache | PVC (huggingface-runner-pvc) | 1Ti (NFS) | Persistent (shared) | HuggingFace models, datasets |
| Build cache | hostPath | Node disk | Node-bound | ccache, uv package cache |

Notably, huggingface-runner-pvc is a PVC shared across all runners, preventing multi-GB LLM models from being downloaded on every build. ccache and uv caches use hostPath to share between runners scheduled on the same node.

However, hostPath caches can cause lock contention when multiple runners execute simultaneously on the same node. We actually encountered this issue with uv cache, which we resolved by separating cache paths per runner using the UV_CACHE_DIR environment variable.
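
One way to implement this separation, assuming the Downward API is usable inside worker-podspec.yaml, is to key the cache path on the Pod name:

# worker-podspec.yaml: per-Pod uv cache path (sketch)
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: UV_CACHE_DIR
    value: /mnt/cache/uv/$(POD_NAME)   # $(VAR) expansion of a previously defined env var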


Vault: Secret Lifecycle Management

Secret management (registry credentials, API keys, signing keys, etc.) in CI/CD pipelines is critical from both security and operational perspectives.

GitHub Secrets Limitations

GitHub’s Repository Secrets and Organization Secrets suffice for small-scale environments. However, as repositories scale to dozens, the following problems emerge:

  • Duplicate management: Identical secrets registered across multiple repositories → full manual updates required on rotation
  • No audit trail: No visibility into when a secret was last updated or by whom
  • No privilege separation: Repository Admin permissions required for secret access

The most critical trigger for adopting Vault was devcontainer image tag management. Previously, we managed devcontainer image tags via GitHub Repository Variables (e.g., vars.DEVCONTAINER_IMAGE_AIDA_CU126). Every time a new image was built, a developer had to manually update the variable, and frequent omissions meant workflows ran with outdated images. After adopting Vault, CI pipelines automatically record the latest image tag in Vault upon build completion, and subsequent workflows and the Developer Portal reference the latest value via needs.fetch-secrets.outputs. This completely eliminates the manual step of maintaining image version information.
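
As a sketch of the write path (the Vault path, key name, and step id are illustrative), a build workflow can reuse the token exported by hashicorp/vault-action to record the tag:

# Record the freshly built tag in Vault (sketch)
# Assumes the vault CLI exists in the job image and a prior
# vault-action step ran with exportToken: true (sets VAULT_TOKEN).
- name: Record image tag in Vault
  env:
    VAULT_ADDR: ${{ secrets.VAULT_ACTION_URL }}
  run: |
    vault kv put secret/devcontainer/aida-cu126 image_tag="${{ steps.build.outputs.tag }}"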

Vault with Dual Auth Strategy

We deployed HashiCorp Vault on the Kubernetes cluster and applied different Auth Methods depending on the access subject.

[Figure: Vault Auth Strategy]

ARC Runners use the JWT Auth Method. GitHub Actions’ OIDC provider issues an ID Token, which is submitted to Vault for authentication. By declaring permissions: id-token: write in the workflow, GitHub automatically issues an OIDC token, and hashicorp/vault-action forwards it to Vault.

# Vault secret injection in workflows (JWT Auth)
permissions:
  id-token: write    # Enable GitHub OIDC token issuance
  contents: read

steps:
  - name: Import Secrets from Vault
    uses: hashicorp/vault-action@v3
    with:
      url: ${{ secrets.VAULT_ACTION_URL }}
      method: jwt                              # GitHub OIDC JWT
      role: ${{ secrets.VAULT_ACTION_ROLE }}
      exportToken: true
      secrets: |
        secret/data/harbor username | HARBOR_USERNAME ;
        secret/data/harbor password | HARBOR_PASSWORD

In contrast, the Developer Portal runs as a Pod directly inside the Kubernetes cluster, so it authenticates via the Kubernetes Auth Method using its ServiceAccount token. By applying separate Auth Methods suited to each access subject’s characteristics, we optimize the security model for each path.

The key benefits of this architecture are twofold:

Single management point: Secret rotation requires only a single update in Vault, immediately reflected across all pipelines.

Audit logging: Every secret access records who, when, and what — satisfying security audit requirements.

fetch-secrets: Secret Centralization via Reusable Workflows

Even with Vault in place, if each workflow individually implements Vault authentication and secret retrieval logic, duplicate code proliferates. To prevent this, we designed a fetch-secrets reusable workflow (GitHub Actions workflow_call). Vault authentication (JWT) and secret retrieval logic are encapsulated in this single workflow, and callers simply reference the outputs.

# docker-build-push.yml — Caller side
jobs:
  fetch-secrets:
    uses: ./.github/workflows/fetch-secrets.yml    # Delegate Vault auth/retrieval
    secrets: inherit

  build:
    needs: [fetch-secrets]
    steps:
      - name: Log in to Harbor
        uses: docker/login-action@v3
        with:
          registry: ${{ needs.fetch-secrets.outputs.harbor_registry_url }}
          username: ${{ needs.fetch-secrets.outputs.harbor_username }}
          password: ${{ needs.fetch-secrets.outputs.harbor_password }}

Build workflows don’t even need to know Vault exists — they simply retrieve values from needs.fetch-secrets.outputs. Even if Vault’s secret paths change, updating fetch-secrets.yml alone propagates to all pipelines.
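
A simplified sketch of the callee side (output names are illustrative). Note that GitHub Actions skips job outputs it detects as containing masked secret values, so a real implementation needs extra care for sensitive fields:

# fetch-secrets.yml: callee side (simplified sketch)
on:
  workflow_call:
    outputs:
      harbor_registry_url:
        value: ${{ jobs.fetch.outputs.harbor_registry_url }}

jobs:
  fetch:
    runs-on: runner-cpu
    permissions:
      id-token: write
      contents: read
    outputs:
      harbor_registry_url: ${{ steps.vault.outputs.harbor_registry_url }}
    steps:
      - name: Import Secrets from Vault
        id: vault
        uses: hashicorp/vault-action@v3
        with:
          url: ${{ secrets.VAULT_ACTION_URL }}
          method: jwt
          role: ${{ secrets.VAULT_ACTION_ROLE }}
          secrets: |
            secret/data/harbor registry_url | harbor_registry_url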


GitHub Actions Exporter: Pipeline Observability

To elevate the operational maturity of CI/CD infrastructure, observability is essential. GitHub Actions’ web UI is adequate for checking individual workflow statuses, but falls short for real-time identification of cross-repository trends, bottleneck points, and anomalies.

To address this, we developed a GitHub Actions Exporter in Go.

Development Rationale

Three metrics not provided by existing open-source exporters were needed:

  • Queue wait time by runner label: To identify bottleneck points in a mixed ARC + Hosted Runner environment
  • Consecutive failure tracking: For real-time detection and early response to serial workflow failures
  • Branch-level analysis: Separate analysis needed since main and feature branch build patterns differ

Architecture and Key Metrics

GitHub REST API  ──→  Collector  ──→  /metrics endpoint
                                           │
                                    Prometheus Scrape
                                           │
                                    Grafana Dashboard + AlertManager

| Metric | Description | Usage |
| --- | --- | --- |
| workflow_runs_total | Total workflow run count | Usage trend analysis |
| workflow_failure_rate | Failure rate (0.0 ~ 1.0) | Quality monitoring |
| workflow_duration_seconds | Execution time histogram | Performance regression detection |
| workflow_queue_time_seconds | Queue wait time | Runner shortage detection |
| workflow_consecutive_failures | Consecutive failure count | Immediate alert trigger |
| workflow_runs_in_progress | Currently running workflows | Real-time status |
| workflow_runs_by_branch_total | Runs per branch | Branch strategy analysis |

The exporter is deployed as a Kubernetes Deployment and scraped via a Prometheus ServiceMonitor. ARC’s Listener Pods also have Prometheus metric annotations configured, enabling collection of runner scaling metrics.
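
A minimal ServiceMonitor sketch, assuming the exporter's Service carries an app: github-actions-exporter label and names its metrics port "metrics" (namespaces are assumptions):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: github-actions-exporter
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: ["arc-systems"]
  selector:
    matchLabels:
      app: github-actions-exporter
  endpoints:
    - port: metrics
      path: /metrics
      interval: 60s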

Decision-making Through Observability

Issues detected and addressed through the dashboard:

  • Queue time spikes: Concurrent job surges hitting maxRunners limits → adjusted Scale Set maximum runner counts
  • 50%+ failure rate on a specific workflow: Identified Docker layer cache expiration pattern → revised caching strategy
  • Gradual build time increase: Growing test cases extended builds from 30 to 45 minutes → applied test parallelization

These signals are expressed as PromQL alert conditions:

# Identify workflows with failure rate > 20%
github_actions_workflow_failure_rate > 0.2

# Queue wait exceeding 60s — runner shortage signal
github_actions_workflow_queue_time_seconds_avg > 60

# 3+ consecutive failures — immediate response needed
github_actions_workflow_consecutive_failures >= 3
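
These conditions can be codified as a PrometheusRule so AlertManager handles routing. A sketch with one rule (label names and namespace are assumptions):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: github-actions-alerts
  namespace: monitoring
spec:
  groups:
    - name: github-actions
      rules:
        - alert: WorkflowConsecutiveFailures
          expr: github_actions_workflow_consecutive_failures >= 3
          labels:
            severity: critical
          annotations:
            summary: "Workflow {{ $labels.workflow }} failed 3+ times in a row"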

Infrastructure Operations & Maintenance Automation

We automated operational tasks to maintain the stability and currency of the infrastructure.

Automated Vault Backups

Secrets and policies stored in Vault are critical data for recovery in case of cluster failure. A Raft snapshot is created every Sunday morning and backed up to two storage locations (AWS S3 and on-premise MinIO). Old backups are automatically pruned according to the retention policy (default 6 days).
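
A condensed sketch of the backup job (image tag, secret name, and in-cluster address are illustrative; upload tooling and retention pruning are elided):

# vault-backup CronJob (condensed sketch)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-backup
  namespace: vault
spec:
  schedule: "0 3 * * 0"                    # every Sunday 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: hashicorp/vault:1.17   # illustrative tag
              env:
                - name: VAULT_ADDR
                  value: http://vault.vault.svc:8200    # assumed in-cluster address
              envFrom:
                - secretRef:
                    name: vault-backup-token            # illustrative token secret
              command: ["/bin/sh", "-c"]
              args:
                - |
                  vault operator raft snapshot save /tmp/vault-$(date +%F).snap
                  # upload to S3 / MinIO and prune old snapshots (elided)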

ARC & Runner Version Tracking

GitHub Actions Runner and Container Hooks are continuously updated. A version tracking workflow runs every Monday to check for the latest releases and compare them with the current versions. If a new version is detected, it automatically opens a PR to notify administrators, ensuring the runner environment stays up-to-date.

[Figure: ARC Runner Version Update Notification]
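
A sketch of the weekly check, assuming the gh CLI is available in the runner image (version comparison and PR creation are elided):

# runner-version-check.yml: weekly version check (sketch)
on:
  schedule:
    - cron: "0 0 * * 1"        # every Monday 00:00 UTC

jobs:
  check:
    runs-on: runner-cpu
    steps:
      - name: Fetch latest upstream releases
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh api repos/actions/runner/releases/latest --jq .tag_name
          gh api repos/actions/runner-container-hooks/releases/latest --jq .tag_name
          # compare with deployed versions and open a PR if newer (elided)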


Full Architecture

Bringing all components together yields the following architecture:

[Figure: Full Architecture]

| Component | Role |
| --- | --- |
| AutoScalingRunnerSet (x7) | Purpose-specific runner scaling policies (1 DinD + 6 K8s mode) |
| Listener Pod | Job detection via GitHub Long Poll, Prometheus metrics exposure |
| EphemeralRunner | JIT token registration → job execution → auto-deletion |
| Container Hook + worker-podspec | Workflow Pod creation with volume/resource injection in K8s mode |
| Vault | Centralized secret management with JWT (ARC) / K8s Auth (Portal) |
| Rook-Ceph | Ephemeral PVC backend (runner work directories) |
| GitHub Actions Exporter | Workflow metric collection, Prometheus exposure |

Quantitative Impact

| Metric | Before | After |
| --- | --- | --- |
| Build queue wait time | 3+ minutes average | Under 15 seconds |
| Secret management | Manual per-repository | Single Vault management |
| Incident detection latency | Post-inquiry confirmation | Real-time Grafana dashboard |
| Hardware resource management | Manual server allocation | nodeAffinity auto-scheduling |
| Build reproducibility | Non-deterministic | Guaranteed via Ephemeral Pods |

For hardware with limited availability like GPUs and FPGAs, maxRunners (3 each) caps concurrent job counts to match available hardware, preventing resource contention.


Closing

This article covered the full journey from the structural limitations of self-hosted runners to a complete CI/CD infrastructure redesign with ARC. Across the architectural differences between DinD mode and Kubernetes mode, the Rook-Ceph based Ephemeral PVC and multi-layer cache strategy, secret lifecycle management through Vault, and pipeline observability through a custom-built Exporter, we aimed to share not just the tools we adopted but the technical rationale behind each design decision.

Thank you for reading!


P.S.: HyperAccel is Hiring!

Vault manages secrets, ARC schedules workloads, Rook abstracts storage, and Prometheus observes everything. Each plays a different role, but when combined within a single cluster, they form a complete system. HyperAccel works the same way — experts across HW, SW, and AI come together, moving toward one goal. If you’d like to be part of this combination, visit HyperAccel Career.
