CtrlK
BlogDocsLog inGet started
Tessl Logo

workflow-triaging

Triage AEM Workflow issues on AEM as a Cloud Service by classifying symptoms, gathering the right logs and metrics, and mapping to runbooks or Splunk searches. Use when the user asks for workflow activity/errors on a Cloud Service host, needs to classify a Jira ticket, or wants to know what to collect for workflow debugging.

90

Quality

88%

Does it follow best practices?

Impact

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

SKILL.md
Quality
Evals
Security

AEM Workflow Triaging — Cloud Service

Classify workflow issues, determine what logs and data to gather, and map to the correct runbook or log search. Optimized for production support on AEM as a Cloud Service.

Variant Scope

  • This skill is cloud-service-only.
  • Log access via Cloud Manager download or log streaming.
  • No JMX — workflow counts and queue metrics come from logs, APIs, or Developer Console.

When to use this skill

  • User asks: "Workflow errors on <host> for the past X hours", "Workflow activity on <host>", "Why did workflow X fail?", "What should I collect to debug this workflow ticket?"
  • User needs: Symptom classification, log patterns to search, Splunk queries, or required inputs for a runbook.
  • Context: AEM Cloud Service (e.g. cm-p12345-e67890).

Step 1: Classify symptom (symptom_id)

Map the user's description to a symptom_id and runbook.

User says / observessymptom_idRunbook
Workflow not moving to next step; stuck in Runningworkflow_stuck_not_progressingrunbook-workflow-stuck.md
Task should be in Inbox but is not visibletask_not_in_inboxrunbook-task-not-in-inbox.md
Workflow should start automatically but no instance createdworkflow_not_starting_launcherrunbook-launcher-not-starting.md
Workflow in Failed state or step shows errorworkflow_fails_or_shows_errorrunbook-workflow-fails-or-shows-error.md
Step failed after retries; failure item in Inboxstep_failed_retries_exhaustedrunbook-failed-work-items.md
Instance Running but no current work item (inconsistent)stale_workflow_no_work_itemrunbook-stale-workflows.md
Too many instances; slow queries; disk/repo bloatrepository_bloat_too_many_instancesrunbook-purge-and-cleanup.md
User cannot see work item or complete/delegate/returnuser_cannot_see_or_complete_itemrunbook-inbox-and-permissions.md
Cannot delete workflow model (running instances)cannot_delete_modelrunbook-model-delete-and-update.md
Jobs queued a long time; slow completion; queue depth highslow_throughput_queue_backlogrunbook-job-throughput-and-concurrency.md
New or changed workflow not starting or step not executingworkflow_setup_validationrunbook-validate-workflow-setup.md

Step 2: Required inputs for triage

Before suggesting a runbook or Splunk search, try to obtain:

InputPurpose
Host / instancee.g. cm-p163724-e1759416 (Cloud Service program-environment format).
Time rangee.g. "past 4 hours", "past 10 hours" – for log/Splunk scope.
Workflow model or step namee.g. "Dynamic Media Reupload", "DAM Update Asset", "testmodel".
Instance ID (if known)From Workflow console URL or payload; ties logs to one instance.
Payload path (if known)e.g. /content/dam/...; for path-related errors.
Log sourceCloud Manager log download, log streaming, or Splunk index/sourcetype.

If the user only provides host + time, respond with the generic workflow error searches and note that narrowing by model/instance ID will improve accuracy.


Step 3: Log patterns and Splunk (what to search)

Logs on Cloud Service are accessed via Cloud Manager → Environments → Logs (download or streaming). When logs are in Splunk (or any log aggregator), use these patterns.

ScenarioPrimary log pattern(s)Splunk hint
Step failedError executing workflow stepAdd instance ID or model name to narrow.
Process not foundgetProcess for '*' failedExtract process name for OSGi check.
Stuck at Process stepSame as step failed + getProcessCombine with payload path.
Stale workflowCannot archive workitemCorrelate time with instance.
Lock / throughputwait for a lock or refreshing the session since we had to waitTimechart by host.
PermissionTerminate failed / Resume failed / Suspend failed + verifyAccessOr AccessControlException.
Payload pathPathNotFoundException + workflow/payloadLauncher: "launcher config".
Launcher not startingError adding launcher config / Error retrieving launcher config entriesPath: /conf/global/settings/workflow/launcher/config.
Purge failureWorkflow purge '*' :Filter by repository exception / invalid state.

Example Splunk searches (replace index/sourcetype/field names as needed):

  • All workflow step errors (last 24h): index=aem sourcetype=aem:error "Error executing workflow step" | table _time host message | sort - _time
  • Process not registered: index=aem "getProcess for" "failed" | table _time host message
  • By workflow model or instance: index=aem ("Error executing workflow step" OR WorkflowException) (message=*<modelName>* OR message=*<instanceId>*) | sort - _time
  • Lock contention: index=aem "wait for a lock" OR "refreshing the session since we had to wait" | table _time host message

Step 4: Example triage prompts and responses

User promptTriage response
"Workflow errors on <host> for the past X hours"Classify as workflow_fails_or_shows_error / step_failed_retries_exhausted. Search Cloud Manager logs or Splunk for "Error executing workflow step", "Error processing workflow job", "getProcess for … failed" on that host. Route to runbook-workflow-fails-or-shows-error.
"Workflow activity on <host> for the past X hours"Clarify: "activity" = counts (started/completed/failed) or list of errors? For errors, use same searches. For counts on Cloud Service, use log aggregation or custom reporting API — no JMX.
"Why did <workflow-or-step> fail? Show failure details."Need: host, time range, and if possible instance ID. Search Cloud Manager logs for "Error executing workflow step" + model/step name or instance ID; return exception type, message, and stack. Route to runbook-workflow-fails-or-shows-error.
"Task not in Inbox"symptom_id: task_not_in_inbox. Route to runbook-task-not-in-inbox. Gather: instance ID, assignee, whether user is initiator/assignee; check Inbox filters and enforceWorkitemAssigneePermissions.
"Workflow not starting"symptom_id: workflow_not_starting_launcher. Route to runbook-launcher-not-starting. Gather: model name, payload path, launcher config path; search logs for launcher errors.
"Workflow stuck / not progressing"symptom_id: workflow_stuck_not_progressing. Route to runbook-workflow-stuck. First: Does instance have a current work item? If no → stale. If yes, follow decision tree by step type.

Step 5: What logs can and cannot answer

Can answer (with AEM workflow logs in Cloud Manager / Splunk):

  • Step failures: exception type, message, stack (by host, time, model, step).
  • Process not registered: which process.label is missing.
  • Stuck: step errors, getProcess failures, lock wait, payload/path errors.
  • Stale: "Cannot archive workitem" and transition errors.
  • Throughput: lock wait, session refresh, JobHandler volume.
  • Permission: Terminate/Resume/Suspend failed (verifyAccess), AccessControlException.
  • Payload/launcher: PathNotFoundException, launcher config errors.
  • Purge: "Workflow purge …" repository exception or invalid state.

Cannot answer directly (Cloud Service limitations):

  • Console state (e.g. "is there a current work item?"). Use Workflow Console UI or custom API.
  • JMX counts (e.g. countStaleWorkflows, queue depth). No JMX on Cloud Service — use log aggregation, custom HTTP APIs, or Developer Console.
  • Thread pool metrics. Request thread dump via Developer Console or support.
  • Configuration status ZIP. Request from support.

Always pair log-based triage with the appropriate runbook for actions (retry via Inbox, Purge Scheduler config, pipeline deploy).


References (in repo)

  • Machine-readable index: aem-agent-marketplace-workflow-knowledge-base/docs/debugging-index.md
  • Decision guide: runbooks/runbook-decision-guide.md
  • Splunk scenarios and queries: Workflow-docs/splunk-workflow-triaging.md
  • Error patterns: docs/error-patterns.md
Repository
adobe/skills
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.