ci-pipeline-operations

Use when debugging CI failures, understanding the build pipeline, modifying the GitHub Actions workflow, working with artifact caching, or troubleshooting why a build succeeded locally but fails in CI

CI Pipeline Operations

Overview

The CI pipeline (.github/workflows/build-egg.yml) builds the Bluefin OCI image inside the bst2 container on Blacksmith runners, validates it with bootc container lint, and pushes to GHCR on main. Caching uses Blacksmith sticky disks (NVMe-backed Ceph) as primary storage, with GNOME upstream CAS (read-only, configured in project.conf) as a remote artifact source, and Cloudflare R2 as a read-only cold preseed for bootstrapping empty sticky disks.

Quick Reference

| What | Value |
| --- | --- |
| Workflow file | .github/workflows/build-egg.yml |
| Runner | blacksmith-4vcpu-ubuntu-2404 |
| Build target | oci/bluefin.bst |
| Build timeout | 120 minutes |
| bst2 container | registry.gitlab.com/.../bst2:<sha> (pinned in workflow env.BST2_IMAGE) |
| GNOME CAS endpoint | gbm.gnome.org:11003 (gRPC, read-only) |
| Cache strategy | Single Blacksmith sticky disk (NVMe-backed Ceph, ~3s mount, 7-day eviction) |
| Sticky disk | bst-cache -> ~/.cache/buildstream (CAS + artifacts + source_protos + sources) |
| R2 role | Read-only cold preseed (bootstraps empty sticky disks) |
| R2 bucket | bst-cache |
| Published image | ghcr.io/projectbluefin/egg:latest and :$SHA |
| Build logs artifact | buildstream-logs (7-day retention) |

Workflow Steps

| # | Step | What it does | Notes |
| --- | --- | --- | --- |
| 1 | Checkout | Clones the repo | Standard |
| 2 | Pull bst2 image | podman pull of the pinned bst2 container | Same image as GNOME upstream CI |
| 3 | Mount BST cache (sticky disk) | useblacksmith/stickydisk@v1 mounts NVMe volume | Key: ${{ github.repository }}-bst-cache; mounted at ~/.cache/buildstream |
| 4 | Prepare cache layout | mkdir -p subdirs | Ensures CAS/artifacts/source_protos/sources dirs exist |
| 5 | Preseed CAS from R2 | Downloads cas.tar.zst, artifact refs, source protos from R2 | Cold cache only; skipped if sticky disk already has CAS objects; installs rclone on-demand |
| 6 | Install just | sudo apt-get install -y just | Used by build and export steps |
| 7 | Generate BST config | Writes buildstream-ci.conf with CI-tuned settings | No remote artifact server; only local cache + upstream GNOME |
| 8 | Build | just bst --log-file /src/logs/build.log build oci/bluefin.bst | --privileged --device /dev/fuse; no --network=host needed |
| 9 | Cache and disk status | df -h + du -sh of cache components | Diagnostic; always runs |
| 10 | Export OCI image | just export (checkout + skopeo load + bootc fixup) | Uses Justfile recipe |
| 11 | Verify image loaded | podman images | Diagnostic |
| 12 | bootc lint | bootc container lint on exported image | Validates ostree structure, no /usr/etc, valid bootc metadata |
| 13 | Upload build logs | actions/upload-artifact | Always runs, even on failure |
| 14 | Login to GHCR | podman login with GITHUB_TOKEN | Main only |
| 15 | Tag for GHCR | Tags as :latest and :$SHA | Main only |
| 16 | Push to GHCR | podman push --retry 3 both tags | Main only |
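The cache-mount and build steps above can be sketched in workflow YAML. This is a hedged illustration, not the real file: step names, the disk key, and the build command come from the table, but the stickydisk action's input names are assumptions.

```yaml
- name: Mount BST cache (sticky disk)
  uses: useblacksmith/stickydisk@v1
  with:
    key: ${{ github.repository }}-bst-cache   # per-repository isolation
    path: ~/.cache/buildstream

- name: Build
  timeout-minutes: 120
  run: just bst --log-file /src/logs/build.log build oci/bluefin.bst
```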

CI BuildStream Config

Generated as buildstream-ci.conf in step 7 ("Generate BST config") and consumed by the build in step 8. Values and rationale:

| Setting | Value | Why |
| --- | --- | --- |
| on-error | continue | Find all failures in one run, not just the first |
| fetchers | 12 | Parallel downloads from artifact caches |
| builders | 1 | Conservative, to avoid OOM on complex elements |
| network-retries | 3 | Retry transient network failures |
| retry-failed | True | Auto-retry flaky builds |
| error-lines | 80 | Generous error context in logs |
| cache-buildtrees | never | Save disk; only final artifacts matter |
| max-jobs | 0 | Let BuildStream auto-detect (uses nproc) |
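Assembled from the table above, the generated config would look roughly like this. The grouping under scheduler/build/logging/cache follows common BuildStream 2 user-config layout and is an assumption; only the values are from this document.

```yaml
scheduler:
  on-error: continue       # collect every failure in one run
  fetchers: 12             # parallel downloads
  builders: 1              # conservative; avoids OOM
  network-retries: 3
build:
  max-jobs: 0              # 0 = auto-detect (nproc)
  retry-failed: True
logging:
  error-lines: 80
cache:
  cache-buildtrees: never  # save disk; only final artifacts matter
```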

Important: No remote artifact server is configured in buildstream-ci.conf. BuildStream uses only the local sticky disk cache and upstream GNOME caches defined in project.conf (read-only). Sticky disks auto-commit on job end -- no explicit upload step is needed.

Caching Architecture

Three layers, checked in order:

1. Sticky disk cache (~/.cache/buildstream/)
   Single NVMe-backed Ceph volume, persists across CI runs, ~3s mount
   Contains: CAS objects, artifact refs, source protos, source tarballs
   |-- miss -->
2. GNOME upstream CAS (gbm.gnome.org:11003, gRPC)
   Read-only, configured in project.conf
   |-- miss -->
3. Build from source

Sticky Disk Details

Blacksmith sticky disks are NVMe-backed Ceph volumes that persist across CI runs. They auto-commit on job end (regardless of job outcome -- even failed builds persist their cache progress) and are evicted after 7 days of inactivity.

A single sticky disk holds the entire BuildStream cache:

| Disk key | Mount point | Contains |
| --- | --- | --- |
| ${{ github.repository }}-bst-cache | ~/.cache/buildstream | CAS objects, artifact refs, source protos, source tarballs |

Key behaviors:

  • ~3 second mount -- negligible overhead vs. downloading a multi-GB cache archive
  • Auto-commit on job end -- all changes written during the job are persisted automatically, even if the build fails
  • 7-day eviction -- disks unused for 7 days are reclaimed; the R2 preseed step handles cold recovery
  • Per-repository isolation -- disk keys include github.repository, so forks get separate disks
  • Single volume inside container -- the Justfile mounts ~/.cache/buildstream at /root/.cache/buildstream inside the bst2 podman container; only this single volume is visible to BuildStream

R2 Cold Preseed

R2 is used only for cold cache recovery (when sticky disks are empty). The preseed step:

  1. Checks if ~/.cache/buildstream/cas/ already has objects (warm disk = skip)
  2. Installs rclone on-demand (not a separate workflow step)
  3. Downloads cas.tar.zst from R2, validates with zstd -t, extracts
  4. Syncs artifact refs from r2:bst-cache/artifacts/
  5. Syncs source protos from r2:bst-cache/source_protos/

R2 is never written to during normal operation. The preseed data is a static snapshot. To refresh it, manually upload a new cas.tar.zst archive to R2.
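The five preseed steps could be wired up roughly as follows. The bucket paths and the warm-disk skip come from this document; the exact rclone invocations and the install method are assumptions.

```yaml
- name: Preseed CAS from R2
  continue-on-error: true   # missing secrets or R2 outage must not fail the build
  run: |
    # Warm disk: CAS already has objects, skip the preseed entirely
    if [ -n "$(ls -A ~/.cache/buildstream/cas 2>/dev/null)" ]; then
      echo "Sticky disk already warm"; exit 0
    fi
    command -v rclone >/dev/null || sudo apt-get install -y rclone
    rclone copy r2:bst-cache/cas.tar.zst /tmp/
    zstd -t /tmp/cas.tar.zst                                 # validate before extracting
    tar --zstd -xf /tmp/cas.tar.zst -C ~/.cache/buildstream
    rclone sync r2:bst-cache/artifacts ~/.cache/buildstream/artifacts
    rclone sync r2:bst-cache/source_protos ~/.cache/buildstream/source_protos
```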

Layer Summary

| Layer | Configured in | Read | Write | Contains |
| --- | --- | --- | --- | --- |
| Sticky disk | useblacksmith/stickydisk@v1 | Always | Always (auto-commit) | CAS objects, artifact refs, source protos, source tarballs |
| GNOME upstream | project.conf artifacts: section | Always | Never | freedesktop-sdk + gnome-build-meta artifacts |
| R2 preseed | Workflow step (rclone, on-demand) | Cold cache only | Never | Bootstrap CAS snapshot (currently corrupt) |

Trigger Modes

| Behavior | Scheduled (daily 08:00 UTC) | Manual dispatch |
| --- | --- | --- |
| Build runs? | Yes | Yes |
| bootc lint? | Yes | Yes |
| Sticky disk cache? | Yes | Yes |
| R2 preseed (cold cache)? | Yes | Yes |
| Push to GHCR? | Yes | Yes |
| Concurrency | Single group; cancel-in-progress if another build arrives | Single group; cancel-in-progress if another build arrives |

Schedule timing: The workflow runs daily at 08:00 UTC, after GNOME OS publishes artifacts (~03:03 UTC, done by ~04:00) and source tracking runs (06:00 UTC).
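In workflow YAML, the trigger and concurrency behavior described above amounts to something like this sketch (the concurrency group name is an assumption):

```yaml
on:
  schedule:
    - cron: "0 8 * * *"   # daily 08:00 UTC, after GNOME OS artifacts land
  workflow_dispatch:

concurrency:
  group: build-egg
  cancel-in-progress: true
```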

Secrets and Permissions

| Secret | Required? | Purpose |
| --- | --- | --- |
| R2_ACCESS_KEY | Optional | Cloudflare R2 access key ID (cold preseed only) |
| R2_SECRET_KEY | Optional | Cloudflare R2 secret access key (cold preseed only) |
| R2_ENDPOINT | Optional | R2 S3-compatible endpoint (cold preseed only) |
| GITHUB_TOKEN | Auto-provided | GHCR login (always pushes on success) |

All R2 secrets are optional. If missing, the cold preseed step is skipped and the build proceeds using the sticky disk cache (if warm) plus GNOME upstream CAS. On a truly cold start without R2 secrets, everything builds from source -- slow but functional.

Job permissions: contents: read, packages: write.
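In the workflow these correspond to:

```yaml
permissions:
  contents: read    # checkout
  packages: write   # GHCR push
```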

bst2 Container Configuration

The bst2 container runs via podman run (NOT as a GitHub Actions container:), because sticky disk mounts must happen on the host before being bind-mounted into the container.

| Flag | Why |
| --- | --- |
| --privileged | Required for bubblewrap sandboxing inside BuildStream |
| --device /dev/fuse | Required for buildbox-fuse (ext4 on GHA lacks reflinks) |
| -v workspace:/src:rw | Mounts the repo into the container |
| -v ~/.cache/buildstream:...:rw | Persists CAS across steps (backed by sticky disk) |
| ulimit -n 1048576 | buildbox-casd needs many file descriptors |
| --no-interactive | Prevents blocking on prompts in CI |

Note: --network=host is not needed since there is no local cache proxy. The bst2 container only needs network access for GNOME upstream CAS, which is accessed directly.
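Put together, the container invocation implied by the table looks roughly like this Justfile-style sketch. The recipe name, variable names, and argument passing are assumptions; the real recipe lives in the repo's Justfile.

```just
# Hypothetical recipe: run bst inside the pinned bst2 container
bst *ARGS:
    podman run --rm --privileged \
        --device /dev/fuse \
        --ulimit nofile=1048576:1048576 \
        -v "$(pwd)":/src:rw \
        -v ~/.cache/buildstream:/root/.cache/buildstream:rw \
        "$BST2_IMAGE" \
        bst --no-interactive {{ARGS}}
```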

Debugging CI Failures

Where to Find Logs

| Log | Location | Contents |
| --- | --- | --- |
| Build log | buildstream-logs artifact -> logs/build.log | Full BuildStream build output |
| Preseed output | "Preseed CAS from R2" step output | Cold cache restore status (skipped if warm) |
| Workflow log | GitHub Actions UI -> step output | Each step's stdout/stderr |
| Disk usage | "Cache and disk status" step | df -h + cache component breakdown |

Common Failures

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Build OOM or hangs | Too many parallel builders | builders is already 1; check whether the element's own build is too memory-heavy |
| "No space left on device" | BuildStream CAS fills sticky disk | cache-buildtrees: never is already set; consider whether the sticky disk needs a size increase |
| bootc container lint fails | Image has /usr/etc, missing ostree refs, or invalid metadata | Check the oci/bluefin.bst assembly script; ensure the /usr/etc merge runs |
| Build succeeds locally, fails in CI | Different element versions cached, or network-dependent sources | Compare bst show output locally vs. CI; check whether GNOME CAS has stale artifacts |
| GHCR push fails | Token permissions or rate limiting | Check the packages: write permission; --retry 3 handles transient failures |
| Source fetch timeout | GNOME CAS or upstream source unreachable | network-retries: 3 handles transient issues; check GNOME infra status |
| Sticky disk mount fails | Blacksmith infra issue or disk key mismatch | Check the useblacksmith/stickydisk step output; verify the disk key in the workflow |
| Sticky disk is cold (7-day eviction) | Normal; the disk was evicted after inactivity | The preseed step handles this automatically if R2 secrets are configured |
| Preseed step fails but build continues | R2 secrets missing or R2 unreachable | continue-on-error: true; the build proceeds from GNOME upstream CAS + source builds |
| Preseed step skipped on warm disk | Normal; the sticky disk already has CAS objects | No action needed; this is the fast path |

Debugging Workflow

  1. Check sticky disk mount: In the useblacksmith/stickydisk step output, verify the disk mounted successfully. If mounting fails, it's a Blacksmith infra issue.

  2. Check preseed status: If the build is unexpectedly slow, check the "Preseed CAS from R2" step. If it ran, the sticky disk was cold. If it was skipped ("Sticky disk already warm"), the cache should have been populated.

  3. Check disk space: Look at the "Cache and disk status" step -- it shows df -h and a breakdown of each cache component's size.

  4. Search build log: Download buildstream-logs artifact and look for [FAILURE] lines in logs/build.log. on-error: continue means all failures are collected in one run.

  5. Reproduce locally: just bst build oci/bluefin.bst uses the same bst2 container. See local-e2e-testing skill for full local workflow.
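Step 4 above can be scripted. A minimal sketch: the [FAILURE] marker comes from this document, while the helper name and the sample log are invented for illustration.

```shell
# Collect every [FAILURE] line from a BuildStream log. Because the CI config
# sets on-error: continue, a single grep surfaces all failed elements at once.
find_failures() {
  grep -n '\[FAILURE\]' "$1"
}

# Invented sample log standing in for the downloaded logs/build.log
log=$(mktemp)
printf '%s\n' \
  '[INFO] Building core/foo.bst' \
  '[FAILURE] core/foo.bst: Command exited with status 1' \
  '[INFO] Building core/bar.bst' > "$log"

find_failures "$log"   # -> 2:[FAILURE] core/foo.bst: Command exited with status 1
```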

Cross-References

| Skill | When |
| --- | --- |
| local-e2e-testing | Reproducing CI issues locally |
| oci-layer-composition | Understanding what the build produces |
| debugging-bst-build-failures | Diagnosing individual element build failures |
| buildstream-element-reference | Writing or modifying .bst elements |
Repository: projectbluefin/dakota