Set up or align a GitHub Actions deploy pipeline for an app or service. Use when standardizing repos around the verify-then-deploy shape: push to main → detect affected lanes → verify and build artifacts → e2e → deploy each lane to its host (Cloudflare Pages, AWS Amplify, GHCR + VPS).
Common failure modes when standing up or operating a deploy pipeline. Check here before rewriting the workflow.
**Two overlapping pushes both deployed; the wrong artifact won**
The deploy concurrency group had `cancel-in-progress: true`. Both runs reached the host; whichever uploaded last wins. The first run's e2e validated an artifact that never shipped.
Fix: `concurrency: { group: deploy-<env>-<lane>, cancel-in-progress: false }` at the job level. Verify and e2e can stay cancellable; deploy must serialize.

**Deploy ran even though verify was skipped**
The `if:` gate uses `success()` instead of an explicit `result == 'success'` check. `success()` is true when the upstream job is skipped, which the lane-detection job does on unrelated changes.
Fix: `if: ${{ needs.verify-<lane>.result == 'success' && needs.e2e-<lane>.result == 'success' }}`.

**A CI/hosting change didn't redeploy anything**
The paths-filter rules only mention app source paths, not CI/hosting paths. A change to `.github/workflows/main.yml` or `Dockerfile` wasn't redeployed because no lane claimed those paths.
Fix: even when lane detection uses `--affected`, layer a path-based force-trigger on top of the package-graph result.

**The host is serving something other than the verified build**
Check the `Download build artifact` step: its "Total downloaded size" should match the upload step's bytes from verify. The deploy job must `actions/download-artifact` the exact name uploaded by verify, with no `npm run build` in the deploy job, and both jobs' checkout pinned to the same `${{ github.sha }}`. Typical causes: a `dist/` from the previous run was still on the runner, or the new build wrote to `apps/web/build` while the deploy points at `apps/web/dist`. Confirm the `actions/upload-artifact` step has `if-no-files-found: error` — if it ever silently succeeded with 0 files, the artifact was empty. Add an `ls -la apps/web/dist` debug step to confirm what's about to ship.

**Amplify deploy sits in PENDING**
Either `amplify:StartDeployment` was never called, the IAM role lacks `amplify:GetJob` so the polling loop can't progress, or `StartDeployment` failed silently.
Fix: grant `amplify:CreateDeployment`, `amplify:StartDeployment`, and `amplify:GetJob`. Add structured logging in `deploy-branch.ts` so each Amplify API call writes its response to the GitHub log.

**OIDC role assumption fails**
The trust policy's `sub` claim doesn't match the workflow's actual claim. Common: the trust policy says `ref:refs/heads/main` but the workflow runs on a `pull_request` (claim is `pull_request:<head-ref>`). Adding `aws sts get-caller-identity` after the configure step won't help — the job never reaches that point.
Check: read the configure step's "Federated Authentication" log line and the role's trust policy side by side.
Fix: match a `StringLike` `sub` claim to the workflow event. For preview deploys, use a separate role whose trust policy allows `pull_request`-event subs.

**GHCR push is rejected**
The job lacks `permissions: { packages: write }`, or the repo is private and the org's package visibility rules block the push.
Fix: set `packages: write` at the job level. For private repos, in org settings → Packages → "Container registry", confirm the source repo is allowed to publish.

**Traefik still routes to the old slot after a blue/green flip**
Both slots have `traefik.enable=true`, so two routers match the same host rule. Traefik picks one in label order, which may not be the new one.
Check: `curl -fsS http://<vps>:8080/api/http/routers` (the Traefik dashboard API) and inspect both `api-blue` and `api-green` routers. Exactly one should have the production host rule.

**The health probe passed but public traffic fails**
The probe hits `127.0.0.1:<port>` directly, bypassing Traefik. Traefik routing changes (host rules, TLS, middleware) are exactly the kind of bug that fails public traffic but passes the local probe. The new slot starts with `traefik.enable=false`, and the label flip happens after the health probe, never before: if the probe fails, the playbook aborts and the live slot is untouched.

**`op://` strings leak into the rendered `.env`**
`1password/load-secrets-action` ran but `OP_SERVICE_ACCOUNT_TOKEN` was empty (typo, wrong secret name, or scoped to a different vault). The action exits 0 on missing references unless `OP_CONNECT_TOKEN` is also set, and `op://` strings pass through as literals.
Check: `grep '^[A-Z_]*=op://' "$rendered_env_file"` after the load step. Any match is a render failure.
Fix: fail the deploy when `op://` literals remain. Add a guard step:

```sh
if grep -q '^[A-Z_]*=op://' "$rendered_env_file"; then
  echo "::error::1Password references not resolved"; exit 1
fi
```

**The PR comment claims success before smoke ran**
The order is deploy → smoke → comment. The comment job must `needs:` the smoke job, not just the deploy job. A comment without a successful smoke is misinformation.

**A post-deploy commit retriggers the pipeline**
The workflow pushed a commit back to `main`, retriggering itself. Don't push back to `main`. If a post-deploy file update is genuinely required (e.g. updating a CDN manifest), put it in a separate workflow gated on a non-`main` ref, or commit with `[skip ci]` in the message and gate `main.yml` on `!contains(github.event.head_commit.message, '[skip ci]')`.

**Lanes overwrite each other's artifacts**
Use lane-scoped artifact names (`web-dist`, `tv-dist`, never just `dist`). Verify with `actions/list-artifacts` if you need to inspect mid-run.

**workflow_dispatch deploy ignores my `ref:` input**
The `actions/checkout` step is missing `with: { ref: ${{ inputs.ref }} }`. Without it, checkout falls back to the workflow's commit, not the requested ref.
Fix: pass `ref:` to checkout in `deploy.yml`. Sanity-check by running `git rev-parse HEAD` early in the job and confirming it matches `inputs.ref`.

**A manual dispatch races a push deploy**
The concurrency group is keyed on `${{ github.ref }}` instead of (env, lane), so two pushes on `main` correctly serialize, but a manual dispatch from a different ref gets its own queue and can race.
Fix: key the group on env and lane (e.g. `deploy-production-web`), not on the ref, and use the same key in `main.yml` and `deploy.yml`. Different lanes get different keys so web and api can deploy in parallel.

**The PR comment step can't post**
The job lacks `pull-requests: write`, or the workflow was triggered from a fork (where `GITHUB_TOKEN` is read-only).
Fix: set `permissions: { pull-requests: write }` at the job level. For fork PRs, gate the comment step on `github.event.pull_request.head.repo.full_name == github.repository`.
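The serialization and explicit-gate fixes can be sketched together in one deploy job. Job names (`verify-web`, `e2e-web`, `deploy-web`) and the env/lane values are illustrative, not prescribed by the text:

```yaml
# Sketch, not a drop-in: a deploy job that queues per (env, lane) and
# only runs on explicit upstream success — skipped jobs don't count.
deploy-web:
  needs: [verify-web, e2e-web]
  if: ${{ needs.verify-web.result == 'success' && needs.e2e-web.result == 'success' }}
  concurrency:
    group: deploy-production-web   # keyed on (env, lane), never on the ref
    cancel-in-progress: false      # deploys serialize; never cancelled mid-flight
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        ref: ${{ github.sha }}     # same commit verify built from
```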
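One way to layer the path-based force-trigger on top of a package-graph `--affected` result — `dorny/paths-filter` and the `affected-lanes` helper script are assumptions for illustration:

```yaml
# Sketch: any CI/hosting change forces all lanes, regardless of what the
# package graph says. Filter names and the helper script are hypothetical.
- uses: dorny/paths-filter@v3
  id: infra
  with:
    filters: |
      ci:
        - '.github/workflows/**'
        - 'Dockerfile'
- id: lanes
  run: |
    if [ "${{ steps.infra.outputs.ci }}" = "true" ]; then
      echo 'lanes=["web","api","tv"]' >> "$GITHUB_OUTPUT"
    else
      echo "lanes=$(node scripts/affected-lanes.js)" >> "$GITHUB_OUTPUT"
    fi
```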
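The artifact handoff can be written so failures are loud. The `web-dist` name and `apps/web/dist` path come from the text; the rest is a sketch:

```yaml
# verify job — upload fails the run if the build produced no files
- uses: actions/upload-artifact@v4
  with:
    name: web-dist                 # lane-scoped, never just "dist"
    path: apps/web/dist
    if-no-files-found: error

# deploy job — download the exact name; never rebuild here
- uses: actions/download-artifact@v4
  with:
    name: web-dist
    path: apps/web/dist
- run: ls -la apps/web/dist        # confirm what's about to ship
```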
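The AWS side of the OIDC fix lives in the role's trust policy condition block. A minimal sketch, assuming the standard `token.actions.githubusercontent.com` provider; the org/repo is a placeholder:

```json
{
  "Condition": {
    "StringEquals": {
      "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
    },
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:ref:refs/heads/main"
    }
  }
}
```

A separate preview-deploy role would instead allow `repo:my-org/my-repo:pull_request` subjects.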
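The `[skip ci]` gate from the self-retriggering item, sketched on a minimal `main.yml`; the `detect` job name is an assumption:

```yaml
on:
  push:
    branches: [main]

jobs:
  detect:
    # Ignore commits the pipeline itself pushed back
    if: ${{ !contains(github.event.head_commit.message, '[skip ci]') }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "proceeding with lane detection"
```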
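The `workflow_dispatch` fix plus its sanity check, as a sketch for `deploy.yml`:

```yaml
on:
  workflow_dispatch:
    inputs:
      ref:
        description: Ref to deploy
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}   # without this, checkout uses the workflow's commit
      - run: git rev-parse HEAD    # confirm it matches inputs.ref
```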
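A fork-safe comment job wiring both comment fixes together — `needs:` on smoke and the same-repo gate. Job names and the `gh` comment body are illustrative:

```yaml
comment:
  needs: [deploy-preview, smoke-preview]   # smoke, not just deploy
  permissions:
    pull-requests: write
  runs-on: ubuntu-latest
  steps:
    - name: Comment the preview URL
      # Fork PRs get a read-only GITHUB_TOKEN — skip instead of failing
      if: ${{ github.event.pull_request.head.repo.full_name == github.repository }}
      env:
        GH_TOKEN: ${{ github.token }}
      run: >
        gh pr comment "${{ github.event.pull_request.number }}"
        --repo "${{ github.repository }}"
        --body "Preview deployed and smoke-tested"
```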
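The Traefik router check can be scripted so the playbook fails fast. A sketch with inlined sample JSON standing in for the `curl -fsS http://<vps>:8080/api/http/routers` response; router and host names are illustrative:

```sh
# Sketch: assert exactly one Traefik router owns the production host rule.
# In CI, routers_json would come from the Traefik dashboard API instead.
routers_json='[
  {"name":"api-blue@docker","rule":"Host(`api.example.com`)","status":"enabled"},
  {"name":"api-green@docker","rule":"Host(`green.internal`)","status":"enabled"}
]'

# Count router entries whose rule matches the production host
live_count=$(printf '%s\n' "$routers_json" | grep -c 'Host(`api.example.com`)')

if [ "$live_count" -ne 1 ]; then
  echo "::error::expected exactly 1 router on the production host, found $live_count"
  exit 1
fi
echo "router check ok"
```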