Analyze Azure DevOps CI build failures in the dd-trace-dotnet pipeline. Use this skill when the user mentions a failing CI build, failing PR checks, Azure DevOps pipeline failures, or test failures in CI, or when they share a build ID or PR number and want to understand what went wrong. It analyzes build failures, categorizes them (infrastructure/flaky/real), and provides actionable recommendations.
Troubleshoot Azure DevOps pipeline failures with automated analysis.
- PowerShell (pwsh) preferred; PowerShell 5.1 (powershell.exe, Windows only) is supported.
- Azure CLI (az): optional for read-only analysis; required for stage retry.
- GitHub CLI (gh): optional; HTTP fallback for public repos.

See scripts-reference.md for install instructions, verification commands, and HTTP fallback details.
If the user does not have PowerShell installed:
- Check for pwsh first: pwsh -Version
- Then check for Windows PowerShell: powershell -NoProfile -Command '$PSVersionTable.PSVersion'

Always prefer pwsh over powershell.exe when both are available.
Analyze CI failures for the dd-trace-dotnet repository.
Phase 1 - Quick Initial Analysis (DO THIS FIRST):
Assumption: All tests pass in master. If a test were consistently failing, it wouldn't have been merged.
Phase 2 - Deep Analysis (ONLY IF USER REQUESTS):
The skill accepts these invocation patterns:
- pr <NUMBER> - Analyze failures for a GitHub PR
- build <BUILD_ID> - Analyze a specific Azure DevOps build

Arguments are available as: $ARGUMENTS
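The invocation patterns above can be sketched as a small parser. This is only an illustration, not the skill's actual implementation: the function name `parse_invocation` and the `"auto"` mode label (used here for the no-argument case) are hypothetical.

```python
from typing import Optional, Tuple


def parse_invocation(arguments: str) -> Tuple[str, Optional[int]]:
    """Map the raw argument string to (mode, number).

    Hypothetical helper: "auto" stands for the no-argument case, where
    the PR is detected from the current branch.
    """
    parts = arguments.split()
    if not parts:
        return ("auto", None)
    if len(parts) == 2 and parts[0] in ("pr", "build") and parts[1].isdigit():
        return (parts[0], int(parts[1]))
    raise ValueError(f"unrecognized arguments: {arguments!r}")
```

Rejecting anything that is not exactly `pr <NUMBER>` or `build <BUILD_ID>` keeps a typo from silently falling through to auto-detection.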
Perform these steps quickly to give user an overview, then ask what they want to investigate.
Use the Get-AzureDevOpsBuildAnalysis.ps1 script for quick analysis:
No arguments provided (the script auto-detects the PR for the current branch):

    pwsh -NoProfile -Command ".\tracer\tools\Get-AzureDevOpsBuildAnalysis.ps1 -Verbose"

For PR analysis (pr <NUMBER>):

    pwsh -NoProfile -Command ".\tracer\tools\Get-AzureDevOpsBuildAnalysis.ps1 -PullRequest $PR_NUMBER -Verbose"

For direct build analysis (build <BUILD_ID>):

    pwsh -NoProfile -Command ".\tracer\tools\Get-AzureDevOpsBuildAnalysis.ps1 -BuildId $BUILD_ID -Verbose"

The script outputs:
Note: The script uses the scratchpad/temp directory automatically for saved artifacts.
Present a concise summary and ask what they want to investigate next.
After presenting the quick summary, check if any failed tests are likely snapshot verification failures. Detect these by looking for:
- Error messages containing Received file does not match, *.received.* vs *.verified.* diffs, or Verify assertion failures
- Test names containing SubmitsTraces in Datadog.Trace.ClrProfiler.IntegrationTests.*, but only if the error is a snapshot diff, not a span count mismatch

Important: SubmitsTraces tests typically assert on span count first (Expected N spans but got M), then compare snapshots. A span count mismatch is a deeper issue (missing/extra instrumentation), not a snapshot problem, so updating snapshots won't help. Only suggest snapshot updates when the error is specifically about snapshot file content differences.
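The distinction between a snapshot diff and a span count mismatch can be sketched as a simple text classifier. This is a minimal sketch under assumptions: the marker strings are taken from the description above, the exact wording of real test output may differ, and the function name and return labels are hypothetical.

```python
import re

# Marker strings taken from the description above; real messages may vary.
SNAPSHOT_MARKERS = ("Received file does not match", ".received.", ".verified.")
# Assumed shape of the span count assertion: "Expected N spans but got M".
SPAN_COUNT = re.compile(r"Expected \d+ spans? but got \d+", re.IGNORECASE)


def classify_test_error(message: str) -> str:
    """Return 'span-count-mismatch', 'snapshot-diff', or 'other'."""
    # Check span count first: those tests assert on count before
    # comparing snapshots, and a count mismatch is not fixed by
    # updating snapshots.
    if SPAN_COUNT.search(message):
        return "span-count-mismatch"
    if any(marker in message for marker in SNAPSHOT_MARKERS):
        return "snapshot-diff"
    return "other"
```

Only a `snapshot-diff` result would justify suggesting a snapshot update; the other two labels call for a closer look at the failure.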
If snapshot failures are detected:
    ./tracer/build.ps1 UpdateSnapshotsFromBuild --BuildId <BUILD_ID>
    ./tracer/build.sh UpdateSnapshotsFromBuild --BuildId <BUILD_ID>

The command takes the .received.txt snapshot artifacts from the CI build and replaces local .verified.txt files in tracer/test/snapshots/ with them.

When the user selects "Retry failed stages" from the investigation menu:
1. Show what would be retried: run with -WhatIf first to preview:

       pwsh -NoProfile -Command ".\tracer\tools\Retry-AzureDevOpsFailedStages.ps1 -BuildId $BUILD_ID -All -WhatIf"

2. Confirm with user: show the list of stages that would be retried and ask for confirmation.
3. Run the retry after confirmation:

       pwsh -NoProfile -Command ".\tracer\tools\Retry-AzureDevOpsFailedStages.ps1 -BuildId $BUILD_ID -All"

   Or for specific stages:

       pwsh -NoProfile -Command ".\tracer\tools\Retry-AzureDevOpsFailedStages.ps1 -BuildId $BUILD_ID -Stage stage_identifier_1, stage_identifier_2"

4. Optionally re-check status: after retrying, offer to re-run the analysis script later to check the new results.
Note: Use -ForceRetryAllJobs if the user wants to rerun all jobs in a stage, not just the failed ones.
DO NOT perform these steps automatically. Only do them if the user asks for:
Download Logs (if requested):
    pwsh -NoProfile -Command ".\tracer\tools\Get-AzureDevOpsBuildAnalysis.ps1 -BuildId $BUILD_ID -IncludeLogs -Verbose"

The script automatically:
- Downloads the logs with Invoke-RestMethod

Note: Log downloads use timeline URLs directly, not az devops invoke --resource logs, which returns HTTP 500.
Important: Use actual Unicode emoji characters (e.g., ❌, 🔴, 🟡, 🔵, 🔍, ✅), NOT markdown emoji codes (e.g., :x:).
Structure the output as:
# CI Failure Analysis for Build <BUILD_ID>

- Build link (https://dev.azure.com/datadoghq/dd-trace-dotnet/_build/results?buildId=<BUILD_ID>), PR link (if PR-triggered), Branch, Commit
- For PR builds, read the branch from triggerInfo["pr.sourceBranch"] instead of sourceBranch
- Read the PR number from triggerInfo["pr.number"] or parse it from refs/pull/<NUMBER>/merge
- Use the FailureHierarchy field from the script output to show the full tree. Format as:
❌ stage_name
  ❌ failed_job_name
    - failed_task_name
  ⚠️ canceled_job_name (duration)

- List failed tasks with a - prefix under their job.
- Flag jobs with result="canceled" and duration >= 55 min (show duration)
- For snapshot failures, include the UpdateSnapshotsFromBuild command

For detailed categorization rules, pattern examples, and the decision tree, see failure-patterns.md.
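Two of the rules above are mechanical enough to sketch in code: resolving the PR number and flagging long-running canceled jobs. This is a sketch under assumptions, not the analysis script's implementation. It assumes build and timeline records shaped like the Azure DevOps REST responses (`triggerInfo`, `sourceBranch`, `type`, `result`, `startTime`, `finishTime`); the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

PR_REF = re.compile(r"^refs/pull/(\d+)/merge$")


def resolve_pr_number(build: dict) -> Optional[int]:
    """Prefer triggerInfo["pr.number"]; fall back to refs/pull/<N>/merge."""
    trigger_info = build.get("triggerInfo") or {}
    number = trigger_info.get("pr.number")
    if number:
        return int(number)
    match = PR_REF.match(build.get("sourceBranch") or "")
    return int(match.group(1)) if match else None


def _parse_time(value: str) -> datetime:
    # Assumption: timestamps look like 2026-02-04T10:00:00.1234567Z.
    # Keep only the first 19 characters (whole seconds, treated as UTC).
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")


def canceled_minutes(record: dict) -> Optional[float]:
    """Duration in minutes of a canceled Job record, else None."""
    if record.get("type") != "Job" or record.get("result") != "canceled":
        return None
    start, finish = record.get("startTime"), record.get("finishTime")
    if not start or not finish:
        return None
    return (_parse_time(finish) - _parse_time(start)).total_seconds() / 60


def is_long_canceled(record: dict, threshold_minutes: float = 55.0) -> bool:
    """True when a canceled job ran at least threshold_minutes."""
    minutes = canceled_minutes(record)
    return minutes is not None and minutes >= threshold_minutes
```

The 55-minute default mirrors the threshold stated above; it is a parameter so the check can be adjusted if the pipeline's limits change.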
Quick reference: three categories (infrastructure, flaky, real).

- Find the PR's build with gh pr checks <PR> --repo DataDog/dd-trace-dotnet
- Take log URLs from the timeline (log.url field) and use curl or WebFetch to download
- The az devops invoke --resource logs API returns HTTP 500, so avoid it.
- Check the issues field in timeline records for inline error messages (already available without downloading logs).
- Entries in issues are not the same as test failures; filter by result == "failed"

Example 1

Command: /analyze-azdo-build build 195272
Phase 1 Output (shown immediately):
# CI Failure Analysis for Build 195272
**Build**: 20260204-49
**Status**: ❌ Failed
**Failure Hierarchy (Stage > Job > Task)**:
❌ integration_tests_linux
  ❌ Test alpine_net8.0_Tracer
    - docker-compose run IntegrationTests (Tracer)
  ❌ Test debian_net8.0_Tracer
    - docker-compose run IntegrationTests (Tracer)
❌ integration_tests_windows
  ❌ Win x86_net8.0_Tracer
    - Run integration tests (Tracer)
  ❌ Win x64_net8.0_Tracer
    - Run integration tests (Tracer)
  (and 4 more...)
⚠️ profiler_integration_tests
  ⚠️ Test alpine (43.2 min)
**Failed Tests**: 12
- Datadog.Trace.ClrProfiler.IntegrationTests.AspNetCore.AspNetCoreMvcTests.SubmitMetrics
- Datadog.Trace.ClrProfiler.IntegrationTests.AspNetCore.AspNetCoreMvcTests.TracingDisabled_DoesNotSubmitTraces
- Datadog.Trace.ClrProfiler.IntegrationTests.HttpClientTests.HttpClient_GetAsync_SubmitsTraces
- Datadog.Trace.ClrProfiler.IntegrationTests.HttpClientTests.HttpClient_PostAsync_SubmitsTraces
(and 8 more...)
What would you like to investigate?
1. Categorize failures
2. View specific logs
3. Show full analysis
4. Retry failed stages

Example 2

Command: /analyze-azdo-build pr 7806
Output:
Found Azure DevOps build 195272 for PR #7806
[Quick summary as in Example 1]
What would you like to investigate?
1. Categorize failures
2. View specific logs
3. Show full analysis

- docs/development/CI/TroubleshootingCIFailures.md - Manual troubleshooting steps
- docs/development/CI/RunSmokeTestsLocally.md - Reproduce failures locally
- .azure-pipelines.yml - Pipeline configuration

The Azure DevOps timeline contains a hierarchy:
- type field: "Stage", "Phase", "Job", or "Task"
- identifier field reveals platform/variant info (e.g., integration_tests_linux.Test.Job23)
- result values: "succeeded", "failed", "canceled", "abandoned"
- Some records report result == "canceled", NOT "failed"; see failure-patterns.md for classification details
- gh and az CLIs are optional for read-only analysis (HTTP fallback for public repos/projects)
- az CLI with azure-devops extension is required for stage retry (PATCH operations)
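To illustrate the timeline hierarchy described above, here is a minimal sketch that rebuilds a Stage > Job > Task failure tree from raw timeline records. The analysis script already exposes this as FailureHierarchy, so this is purely illustrative. It assumes each record carries `id`, `parentId`, `name`, `type`, and `result` as in the Azure DevOps timeline response; the function name `failure_tree` is hypothetical.

```python
from typing import Dict, List, Optional


def failure_tree(records: List[dict]) -> List[str]:
    """Render failed tasks grouped by stage and job as indented lines."""
    by_id = {r["id"]: r for r in records}

    def ancestor(record: dict, wanted_type: str) -> Optional[dict]:
        # Walk parentId links upward until a record of wanted_type is found.
        current: Optional[dict] = record
        while current is not None:
            if current.get("type") == wanted_type:
                return current
            current = by_id.get(current.get("parentId"))
        return None

    tree: Dict[str, Dict[str, List[str]]] = {}
    for record in records:
        if record.get("type") != "Task" or record.get("result") != "failed":
            continue
        job = ancestor(record, "Job")
        stage = ancestor(record, "Stage")
        if job is None or stage is None:
            continue  # orphaned record; skip rather than guess
        tree.setdefault(stage["name"], {}).setdefault(job["name"], []).append(record["name"])

    lines: List[str] = []
    for stage_name, jobs in tree.items():
        lines.append(f"❌ {stage_name}")
        for job_name, tasks in jobs.items():
            lines.append(f"  ❌ {job_name}")
            lines.extend(f"    - {task}" for task in tasks)
    return lines
```

Walking `parentId` upward, rather than assuming a fixed depth, lets the sketch skip the intermediate "Phase" level without special-casing it.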