Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
Use Linkerd for:
Do not expect Linkerd to replace:
| Component | Role |
|---|---|
| `destination` | Resolves service endpoints, traffic policies, and route configurations |
| `identity` | Issues short-lived workload certificates (default 24h) for mTLS |
| `proxy-injector` | Webhook that injects the sidecar proxy into pods at admission time |
Each meshed pod gets a linkerd-proxy sidecar (written in Rust). The proxy intercepts all inbound and outbound traffic transparently using iptables rules added by the linkerd-init init container.
The proxy handles:
- `:4191/metrics` (scraped by Prometheus)

Prerequisites:

- `linkerd` CLI matching the control plane version

```bash
# Pre-flight checks
linkerd check --pre
# Install CRDs
linkerd install --crds | kubectl apply -f -
# Install control plane
linkerd install | kubectl apply -f -
# Verify all components are healthy
linkerd check
```

Back the Linkerd identity issuer with cert-manager so the intermediate CA rotates automatically:
```yaml
# Certificate issued by cert-manager for Linkerd identity
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 25h
  issuerRef:
    name: linkerd-trust-anchor # ClusterIssuer backed by your root CA
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
    - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth
```

Then install Linkerd so that it reads the issuer certificate from the `linkerd-identity-issuer` Secret that cert-manager maintains:
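```bash
linkerd install \
  --identity-external-issuer \
  | kubectl apply -f -
```

Install the viz extension, which provides the `linkerd viz` commands used below:

```bash
linkerd viz install | kubectl apply -f -
linkerd viz check
```

Annotate a namespace to auto-inject the proxy into every new pod: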
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    linkerd.io/inject: enabled
```

Existing pods are not injected retroactively. Restart the deployments after annotating:

```bash
kubectl rollout restart deployment -n payments
```

Opt a workload's pods out of injection inside an injected namespace by annotating its pod template:
```yaml
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: disabled
```

```bash
# Check data-plane proxy health for the pods in a namespace
linkerd check --proxy --namespace payments
# Show the injected proxy version per pod (empty means the pod is not meshed)
kubectl get pods -n payments -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.linkerd\.io/proxy-version}{"\n"}{end}'
```

Do not inject:

- `kube-system` and `kube-public`: control plane components will break
- The `linkerd` namespace: already meshed differently
- Pods with `hostNetwork: true`

Linkerd automatically establishes mTLS between all meshed pods. No application changes or per-workload certificate management are needed.
- `proxy-injector` injects the sidecar at pod admission
- Each proxy obtains a short-lived workload certificate from the `identity` component

```bash
# Show secured/unsecured edges between deployments
linkerd viz edges deployment -n payments
# Output includes: SRC, DST, SECURED (yes/no)
# If a pod is unmeshed, the edge shows "no": traffic is plaintext
```

Restrict which identities can call a service (replaces or supplements NetworkPolicy):
```yaml
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  name: payments-api
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  port: 8080
  proxyProtocol: HTTP/2
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: payments-api-allow-checkout
  namespace: payments
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: payments-api
  requiredAuthenticationRefs:
    - name: checkout-sa
      kind: MeshTLSAuthentication
      group: policy.linkerd.io
---
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: checkout-sa
  namespace: payments
spec:
  identities:
    - "checkout.payments.serviceaccount.identity.linkerd.cluster.local"
```

Linkerd uses the Kubernetes Gateway API (HTTPRoute) for traffic splitting and routing rules.
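HTTPRoute `weight` values are relative, not percentages. Under the Gateway API spec, each backend receives `weight / sum(weights)` of the requests, so `90`/`10` and `9`/`1` produce the same split. The sketch below sanity-checks a split before you apply it. `weight_shares` is a hypothetical helper, not part of the `linkerd` or `kubectl` CLIs:

```bash
# weight_shares WEIGHT...
# Prints each backend's share of traffic in percent (one decimal),
# given HTTPRoute backendRef weights in the order they are listed.
# Hypothetical helper for illustration; it does not contact the cluster.
weight_shares() {
  if [ "$#" -eq 0 ]; then
    echo "usage: weight_shares WEIGHT..." >&2
    return 2
  fi
  for w in "$@"; do
    # Gateway API weights are non-negative integers
    case "$w" in
      ''|*[!0-9]*) echo "invalid weight: $w" >&2; return 2 ;;
    esac
  done
  # awk handles the floating-point division; the weights arrive in ARGV
  awk 'BEGIN {
    total = 0
    for (i = 1; i < ARGC; i++) total += ARGV[i]
    if (total == 0) exit 1
    sep = ""
    for (i = 1; i < ARGC; i++) {
      printf "%s%.1f", sep, 100 * ARGV[i] / total
      sep = " "
    }
    printf "\n"
  }' "$@" || { echo "all weights are 0: no backend would receive traffic" >&2; return 1; }
}

weight_shares 90 10   # prints: 90.0 10.0
weight_shares 3 1     # prints: 75.0 25.0
```

A backend with weight `0` receives no traffic. That makes it a convenient way to drain the canary without deleting its `backendRef`.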
Route 90% of traffic to stable, 10% to canary:
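```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: payments-api-canary
  namespace: payments
spec:
  parentRefs:
    - name: payments-api
      kind: Service
      group: core
      port: 8080
  rules:
    - backendRefs:
        - name: payments-api-stable
          port: 8080
          weight: 90
        - name: payments-api-canary
          port: 8080
          weight: 10
```

Rate-limit inbound requests to the `payments-api` Server with an `HTTPLocalRateLimitPolicy` (the limit shown is an example value):

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: HTTPLocalRateLimitPolicy
metadata:
  name: payments-api-rate-limit
  namespace: payments
spec:
  targetRef: {group: policy.linkerd.io, kind: Server, name: payments-api}
  total: {requestsPerSecond: 100}  # example limit; tune to measured capacity
```

For retries, add the Linkerd retry annotations to the Service or to the HTTPRoute: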
```yaml
# Retry on 5xx responses, up to 2 retries
metadata:
  annotations:
    retry.linkerd.io/http: "5xx"
    retry.linkerd.io/limit: "2"
```

```yaml
# Per-route timeout via HTTPRoute
spec:
  rules:
    - timeouts:
        request: 10s
        backendRequest: 5s
```

```bash
# Success rate, RPS, and latency for all deployments in a namespace
linkerd viz stat deploy -n payments
# Drill into a specific deployment
linkerd viz stat deploy/payments-api -n payments
# Per-route breakdown
linkerd viz stat httproute -n payments
```

```bash
# Tap live traffic to a deployment (shows headers, status codes, latency)
linkerd viz tap deploy/payments-api -n payments
# Only show requests whose path starts with this prefix
linkerd viz tap deploy/payments-api -n payments --path /api/v1/charge
```

Linkerd proxies expose metrics on port 4191. Scrape them with a PodMonitor (Prometheus Operator):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: linkerd-proxy
  namespace: monitoring
spec:
  namespaceSelector:
    any: true
  podMetricsEndpoints:
    - port: linkerd-admin
      path: /metrics
  selector:
    matchLabels:
      linkerd.io/control-plane-ns: linkerd
```

Key metrics to alert on:
| Metric | Alert threshold |
|---|---|
| `response_total{direction="inbound", classification="failure"}` | > 1% of all inbound responses |
| `response_latency_ms_bucket` | p99 (computed with `histogram_quantile`) > 1000 ms |
| `tcp_open_connections` | Sustained spike vs. baseline |
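
The failure-rate row compares failed inbound responses with all inbound responses over the same window. The sketch below shows the arithmetic behind the 1% threshold. `error_rate_check` is a hypothetical helper; in production you would express the same ratio in a Prometheus alerting rule:

```bash
# error_rate_check FAILURES TOTAL [THRESHOLD_PCT]
# Prints the failure rate (percent, two decimals) followed by ALERT if the
# rate is strictly above the threshold (default 1), otherwise OK.
# Hypothetical helper; the counts would come from your metrics backend.
error_rate_check() {
  local failures="$1" total="$2" threshold="${3:-1}"
  if [ -z "$failures" ] || [ -z "$total" ]; then
    echo "usage: error_rate_check FAILURES TOTAL [THRESHOLD_PCT]" >&2
    return 2
  fi
  if [ "$total" -eq 0 ]; then
    # No responses in the window, so there is no rate to alert on
    echo "0.00% OK"
    return 0
  fi
  awk -v f="$failures" -v t="$total" -v th="$threshold" 'BEGIN {
    rate = 100 * f / t
    status = (rate > th + 0) ? "ALERT" : "OK"
    printf "%.2f%% %s\n", rate, status
  }'
}

error_rate_check 15 1000   # prints: 1.50% ALERT
error_rate_check 5 1000    # prints: 0.50% OK
```

The comparison is strict, so exactly 1.00% does not alert. This matches the `> 1%` threshold in the table.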
Linkerd multi-cluster mirrors services from one cluster into another using a gateway and a ServiceMirror controller. Both clusters must share a common trust anchor so that mTLS identities verify across clusters.
On the target cluster (where the service lives):
```bash
# Install multi-cluster extension
linkerd multicluster install | kubectl apply -f -
linkerd multicluster check
# Generate link credentials for the source cluster
linkerd multicluster link --cluster-name production > link-production.yaml
```

On the source cluster (where you want to consume the service):
```bash
kubectl apply -f link-production.yaml
linkerd multicluster check
```

Label the Service in the target cluster to make it available cross-cluster:
```yaml
metadata:
  labels:
    mirror.linkerd.io/exported: "true"
```

The ServiceMirror controller creates a mirrored service in the source cluster named `<service>-<cluster-name>` (e.g., `payments-api-production`). Traffic to that service is tunnelled over mTLS to the target cluster's gateway.
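The sketch below derives the mirrored Service name and the DNS name that clients in the source cluster would use. It assumes the mirror lands in the same namespace as the exported Service and that the cluster uses the default `cluster.local` domain. `mirrored_service` is a hypothetical helper. It also flags names longer than 63 characters, the DNS-label limit Kubernetes enforces on Service names:

```bash
# mirrored_service SERVICE CLUSTER_NAME NAMESPACE [CLUSTER_DOMAIN]
# Prints the mirrored Service name and its in-cluster FQDN.
# Hypothetical helper; assumes the <service>-<cluster-name> naming rule
# and that the mirror is created in the same namespace.
mirrored_service() {
  local svc="$1" cluster="$2" ns="$3" domain="${4:-cluster.local}"
  if [ -z "$svc" ] || [ -z "$cluster" ] || [ -z "$ns" ]; then
    echo "usage: mirrored_service SERVICE CLUSTER_NAME NAMESPACE [CLUSTER_DOMAIN]" >&2
    return 2
  fi
  local name="${svc}-${cluster}"
  # Kubernetes Service names must be DNS labels of at most 63 characters
  if [ "${#name}" -gt 63 ]; then
    echo "mirrored name '$name' is ${#name} characters (limit 63)" >&2
    return 1
  fi
  echo "$name $name.$ns.svc.$domain"
}

mirrored_service payments-api production payments
# prints: payments-api-production payments-api-production.payments.svc.cluster.local
```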
```bash
# On the source cluster: mirrored services should be listed
kubectl get svc -n payments | grep "production"
# Check mirroring is healthy
linkerd multicluster gateways
```

For every Linkerd issue: identify the layer (control plane / data plane / policy / multi-cluster), collect evidence, form a hypothesis, then fix.
Symptom: Pods not getting proxies injected
Evidence to collect:
```bash
kubectl describe namespace <ns> | grep -i inject
kubectl get mutatingwebhookconfigurations linkerd-proxy-injector-webhook-config -o yaml | grep -A5 namespaceSelector
linkerd check
```

Likely causes:

- The namespace is missing the `linkerd.io/inject: enabled` annotation
- The workload carries a `linkerd.io/inject: disabled` override
- The `proxy-injector` webhook is failing; check the `linkerd check` output

Symptom: mTLS edges showing `no` (plaintext)
Evidence to collect:
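```bash
linkerd viz edges deployment -n <namespace>
kubectl get pods -n <namespace> -o wide  # check both src and dst are meshed (READY counts the proxy container)
linkerd check --proxy --namespace <namespace>
```

Likely causes:

- The source or destination pod is not meshed, for example because it started before its namespace was annotated

Fix:

```bash
kubectl rollout restart deployment/<name> -n <namespace>
```

Symptom: `linkerd check` reports certificate expiry or identity errors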
Evidence to collect:
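```bash
linkerd check
kubectl get secret linkerd-identity-issuer -n linkerd -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
```

Likely causes:

- The identity issuer certificate has expired or was not renewed (compare its `notAfter` date with the current time)

Fix: rotate it through cert-manager renewal, or with `linkerd upgrade --identity-external-issuer`.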
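Two relationships between the issuer `Certificate` timings are worth checking when you tune it. First, `renewBefore` has to be shorter than `duration` for the renewal window to make sense. Second, on our reading of the `48h`/`25h` values in the cert-manager example, `renewBefore` should exceed the 24h workload certificate lifetime. That way, every proxy certificate signed by the old issuer expires before the old issuer does. Below is a hypothetical whole-hour check, as a shell sketch:

```bash
# check_issuer_timing DURATION_H RENEW_BEFORE_H [LEAF_LIFETIME_H]
# Whole-hour sanity check for the identity issuer Certificate timings.
# Hypothetical helper; LEAF_LIFETIME_H defaults to 24, the workload
# certificate lifetime listed for the identity component.
check_issuer_timing() {
  local duration="$1" renew_before="$2" leaf="${3:-24}"
  if [ -z "$duration" ] || [ -z "$renew_before" ]; then
    echo "usage: check_issuer_timing DURATION_H RENEW_BEFORE_H [LEAF_LIFETIME_H]" >&2
    return 2
  fi
  if [ "$renew_before" -ge "$duration" ]; then
    echo "renewBefore (${renew_before}h) must be shorter than duration (${duration}h)" >&2
    return 1
  fi
  if [ "$renew_before" -le "$leaf" ]; then
    echo "renewBefore (${renew_before}h) should exceed the leaf lifetime (${leaf}h)" >&2
    return 1
  fi
  # cert-manager renews once the remaining validity drops to renewBefore,
  # i.e. when the certificate is (duration - renewBefore) hours old
  local renew_at=$((duration - renew_before))
  # The last leaf signed by the old issuer is issued just before renewal
  local last_leaf=$((renew_at + leaf))
  echo "OK: renew at ${renew_at}h, last old-issuer leaf expires at ${last_leaf}h (issuer expires at ${duration}h)"
}

check_issuer_timing 48 25
# prints: OK: renew at 23h, last old-issuer leaf expires at 47h (issuer expires at 48h)
```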
Symptom: High latency after enabling Linkerd
Evidence to collect:
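```bash
linkerd viz stat deploy -n <namespace>
linkerd viz tap deploy/<name> -n <namespace> --max-rps 10
kubectl top pods -n <namespace> --containers  # check linkerd-proxy CPU
```

Likely causes:

- The `linkerd-proxy` container is constrained by its CPU or memory limits

Fix: increase the proxy resource limits with annotations on the pod template: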
```yaml
annotations:
  config.linkerd.io/proxy-cpu-limit: "2"
  config.linkerd.io/proxy-memory-limit: "256Mi"
```

Symptom: Multi-cluster mirrored service unreachable
Evidence to collect:
```bash
linkerd multicluster gateways            # check gateway status
kubectl get svc -n linkerd-multicluster  # check gateway LB IP is assigned
linkerd multicluster check
```

Likely causes:

- The gateway Service in the target cluster has no load-balancer address assigned