Triage AEM Workflow issues on AEM as a Cloud Service by classifying symptoms, gathering the right logs and metrics, and mapping to runbooks or Splunk searches. Use when the user asks for workflow activity/errors on a Cloud Service host, needs to classify a Jira ticket, or wants to know what to collect for workflow debugging.
90
88%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Classify workflow issues, determine what logs and data to gather, and map to the correct runbook or log search. Optimized for production support on AEM as a Cloud Service.
Map the user's description to a symptom_id and runbook.
| User says / observes | symptom_id | Runbook |
|---|---|---|
| Workflow not moving to next step; stuck in Running | workflow_stuck_not_progressing | runbook-workflow-stuck.md |
| Task should be in Inbox but is not visible | task_not_in_inbox | runbook-task-not-in-inbox.md |
| Workflow should start automatically but no instance created | workflow_not_starting_launcher | runbook-launcher-not-starting.md |
| Workflow in Failed state or step shows error | workflow_fails_or_shows_error | runbook-workflow-fails-or-shows-error.md |
| Step failed after retries; failure item in Inbox | step_failed_retries_exhausted | runbook-failed-work-items.md |
| Instance Running but no current work item (inconsistent) | stale_workflow_no_work_item | runbook-stale-workflows.md |
| Too many instances; slow queries; disk/repo bloat | repository_bloat_too_many_instances | runbook-purge-and-cleanup.md |
| User cannot see work item or complete/delegate/return | user_cannot_see_or_complete_item | runbook-inbox-and-permissions.md |
| Cannot delete workflow model (running instances) | cannot_delete_model | runbook-model-delete-and-update.md |
| Jobs queued a long time; slow completion; queue depth high | slow_throughput_queue_backlog | runbook-job-throughput-and-concurrency.md |
| New or changed workflow not starting or step not executing | workflow_setup_validation | runbook-validate-workflow-setup.md |
Before suggesting a runbook or Splunk search, try to obtain:
| Input | Purpose |
|---|---|
| Host / instance | e.g. cm-p163724-e1759416 (Cloud Service program-environment format). |
| Time range | e.g. "past 4 hours", "past 10 hours" – for log/Splunk scope. |
| Workflow model or step name | e.g. "Dynamic Media Reupload", "DAM Update Asset", "testmodel". |
| Instance ID (if known) | From Workflow console URL or payload; ties logs to one instance. |
| Payload path (if known) | e.g. /content/dam/...; for path-related errors. |
| Log source | Cloud Manager log download, log streaming, or Splunk index/sourcetype. |
If the user only provides host + time, respond with the generic workflow error searches and note that narrowing by model/instance ID will improve accuracy.
Logs on Cloud Service are accessed via Cloud Manager → Environments → Logs (download or streaming). When logs are in Splunk (or any log aggregator), use these patterns.
| Scenario | Primary log pattern(s) | Splunk hint |
|---|---|---|
| Step failed | Error executing workflow step | Add instance ID or model name to narrow. |
| Process not found | getProcess for '*' failed | Extract process name for OSGi check. |
| Stuck at Process step | Same as step failed + getProcess | Combine with payload path. |
| Stale workflow | Cannot archive workitem | Correlate time with instance. |
| Lock / throughput | wait for a lock or refreshing the session since we had to wait | Timechart by host. |
| Permission | Terminate failed / Resume failed / Suspend failed + verifyAccess | Or AccessControlException. |
| Payload path | PathNotFoundException + workflow/payload | Launcher: "launcher config". |
| Launcher not starting | Error adding launcher config / Error retrieving launcher config entries | Path: /conf/global/settings/workflow/launcher/config. |
| Purge failure | Workflow purge '*' : | Filter by repository exception / invalid state. |
Example Splunk searches (replace index/sourcetype/field names as needed):
index=aem sourcetype=aem:error "Error executing workflow step" | table _time host message | sort - _timeindex=aem "getProcess for" "failed" | table _time host messageindex=aem ("Error executing workflow step" OR WorkflowException) (message=*<modelName>* OR message=*<instanceId>*) | sort - _timeindex=aem "wait for a lock" OR "refreshing the session since we had to wait" | table _time host message| User prompt | Triage response |
|---|---|
| "Workflow errors on <host> for the past X hours" | Classify as workflow_fails_or_shows_error / step_failed_retries_exhausted. Search Cloud Manager logs or Splunk for "Error executing workflow step", "Error processing workflow job", "getProcess for … failed" on that host. Route to runbook-workflow-fails-or-shows-error. |
| "Workflow activity on <host> for the past X hours" | Clarify: "activity" = counts (started/completed/failed) or list of errors? For errors, use same searches. For counts on Cloud Service, use log aggregation or custom reporting API — no JMX. |
| "Why did <workflow-or-step> fail? Show failure details." | Need: host, time range, and if possible instance ID. Search Cloud Manager logs for "Error executing workflow step" + model/step name or instance ID; return exception type, message, and stack. Route to runbook-workflow-fails-or-shows-error. |
| "Task not in Inbox" | symptom_id: task_not_in_inbox. Route to runbook-task-not-in-inbox. Gather: instance ID, assignee, whether user is initiator/assignee; check Inbox filters and enforceWorkitemAssigneePermissions. |
| "Workflow not starting" | symptom_id: workflow_not_starting_launcher. Route to runbook-launcher-not-starting. Gather: model name, payload path, launcher config path; search logs for launcher errors. |
| "Workflow stuck / not progressing" | symptom_id: workflow_stuck_not_progressing. Route to runbook-workflow-stuck. First: Does instance have a current work item? If no → stale. If yes, follow decision tree by step type. |
Can answer (with AEM workflow logs in Cloud Manager / Splunk):
process.label is missing.Cannot answer directly (Cloud Service limitations):
Always pair log-based triage with the appropriate runbook for actions (retry via Inbox, Purge Scheduler config, pipeline deploy).
aem-agent-marketplace-workflow-knowledge-base/docs/debugging-index.mdrunbooks/runbook-decision-guide.mdWorkflow-docs/splunk-workflow-triaging.mddocs/error-patterns.mded46712
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.