Diagnose and fix common AEM 6.5 LTS replication issues including blocked queues, connectivity failures, and content distribution problems
This skill provides systematic troubleshooting guidance for Adobe Experience Manager 6.5 LTS replication issues. Use it to diagnose and resolve problems with content distribution, agent configuration, and replication workflows.
Use this skill when experiencing:
Follow this systematic approach to identify and resolve replication issues:
1. Verify Symptoms
↓
2. Check Agent Status
↓
3. Review Replication Queue
↓
4. Test Connectivity
↓
5. Examine Logs
↓
6. Verify Configuration
↓
7. Apply Fix
↓
8. Validate Resolution

Symptoms:
Diagnosis:
Check agent status:
Navigate to: Tools → Deployment → Replication → Agents on author
Look for: Red indicator next to agent name

View queue details:
Click agent name
Review queue entries
Check error message on failed item

Root Causes:
Solutions:
Solution A: Retry Failed Item
Steps:
1. Open blocked replication agent
2. Click "Force Retry" button
3. Monitor queue to see if item processes
4. If successful, remaining items will process automatically

Solution B: Clear Failed Item
Steps:
1. Open blocked replication agent
2. Select failed item in queue
3. Click "Clear" to remove it
4. Remaining items will process
5. Manually re-replicate cleared content if needed

Solution C: Restart Replication Components
Navigate to: /system/console/bundles
Search for: "replication"
Restart these bundles:
- com.day.cq.cq-replication
- com.day.cq.cq-replication-audit
- com.day.cq.wcm.cq-wcm-replication
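Before restarting, the current state of these bundles can also be read from the command line. The sketch below is an illustration under assumptions: that the Felix web console serves per-bundle JSON at /system/console/bundles/<symbolic-name>.json with a "state" field, that Author runs on localhost:4502, and that admin:admin credentials apply. `bundle_state` is a hypothetical helper name, not part of AEM.

```shell
#!/bin/sh
# bundle_state: read bundle JSON from the web console on stdin and print
# the value of its "state" field (for example Active or Resolved).
# Assumption: the JSON carries a string field named "state".
bundle_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Assumed usage against a local Author instance (host, port, and
# credentials are placeholders; not executed here):
# for b in com.day.cq.cq-replication com.day.cq.cq-replication-audit \
#          com.day.cq.wcm.cq-wcm-replication; do
#   printf '%s: ' "$b"
#   curl -s -u admin:admin \
#     "http://localhost:4502/system/console/bundles/$b.json" | bundle_state
# done
```

Any bundle that does not report Active is a candidate for the stop/start procedure.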
Steps:
1. Find bundle
2. Click "Stop"
3. Wait for status: Resolved
4. Click "Start"
5. Verify status: Active

Solution D: Restart Event Processing
OSGi Console: /system/console/bundles
Restart: Apache Sling Event Support (org.apache.sling.event)
This clears event queue backlogs

Symptoms:
Diagnosis:
Verify Publish instance is running:
# Check if Publish is accessible
curl -I http://publish-host:4503/system/console
# Or browse to:
http://publish-host:4503/system/console

Test network connectivity:
# From Author server
telnet publish-host 4503
# Or
nc -zv publish-host 4503
# Or
ping publish-host

Check replication agent URI:
Navigate to: Agent → Edit → Transport tab
Verify: URI matches Publish host and port
Expected: http://publish-host:4503/bin/receive?sling:authRequestLogin=1

Root Causes:
Solutions:
Solution A: Start Publish Instance
cd /path/to/publish/crx-quickstart
./bin/start

Solution B: Fix Network/Firewall
1. Verify firewall rules allow Author → Publish on port 4503
2. Check network ACLs and security groups (cloud environments)
3. Verify no proxy blocking connection
4. Test from Author server command line

Solution C: Correct Agent URI
Steps:
1. Edit replication agent
2. Transport tab
3. Update URI to correct host/port:
http://correct-publish-host:4503/bin/receive?sling:authRequestLogin=1
4. Save
5. Test Connection

Symptoms:
Diagnosis:
Check agent credentials:
Agent → Edit → Transport tab
Verify: User and Password fields

Verify user exists on Publish:
Publish instance: http://publish:4503/crx/explorer
Navigate to: /home/users
Search for: replication service user

Check user permissions:
On Publish instance:
User → Permissions
Required: Read, Write, Replicate privileges

Root Causes:
Solutions:
Solution A: Update Credentials
Steps:
1. Edit replication agent
2. Transport tab
3. Enter correct username
4. Enter correct password
5. Save
6. Test Connection

Solution B: Create/Enable User on Publish
On Publish instance:
1. Navigate to: Security → Users
2. Create user: replication-service
3. Set password (match Agent configuration)
4. Save
Grant permissions:
1. Navigate to: Security → Permissions
2. Select user: replication-service
3. Add entries:
- Path: /content
- Privileges: jcr:read, crx:replicate, jcr:write
4. Save

Solution C: Reset Password
On Publish instance:
1. Navigate to: Security → Users
2. Find user in agent configuration
3. Click "Set Password"
4. Enter new password
5. Save
On Author:
1. Update replication agent with new password
2. Save
3. Test Connection

Symptoms:
Diagnosis:
Check agent URI protocol:
Agent → Transport tab
URI: https://... or http://...

Review error logs:
error.log contains:
- javax.net.ssl.SSLHandshakeException
- PKIX path building failed
- Certificate validation failed

Solutions:
Solution A: Enable Relaxed SSL (Development Only)
WARNING: Only for development/testing environments
Steps:
1. Edit replication agent
2. Transport tab
3. SSL section:
✓ Relaxed SSL (allow self-signed certificates)
✓ Allow expired (allow expired certificates)
4. Save
5. Test Connection

Solution B: Import Certificates (Production)
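On Author instance:
1. Export certificate from Publish (closing stdin so the command returns, and keeping only the PEM certificate):
openssl s_client -connect publish:4503 -showcerts </dev/null | openssl x509 -outform PEM > publish-cert.pem
2. Import into the Java truststore (Java 11+ layout; on Java 8 the directory is $JAVA_HOME/jre/lib/security):
cd $JAVA_HOME/lib/security
keytool -import -alias publish-aem -file /path/to/publish-cert.pem \
-keystore cacerts -storepass changeit
3. Restart AEM Author
4. Test replication agent connection

Solution C: Use HTTP (Not Recommended for Production)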
If SSL is not required:
1. Edit agent
2. Transport tab
3. Change URI from https:// to http://
4. Save
5. Test Connection

Symptoms:
Diagnosis:
Check content directly on Publish:
Bypass Dispatcher:
http://publish:4503/content/mysite/en/page.html
If content appears here but not via Dispatcher:
→ Dispatcher cache issue
If content doesn't appear:
→ Replication issue

Verify replication status:
On Author:
Page → Properties → Basic tab
Check: Last Published timestamp
Verify: Status shows "Published"

Check Publish logs:
Publish instance: crx-quickstart/logs/error.log
Search for: path of page
Look for: Errors during content import

Root Causes:
Solutions:
Solution A: Manual Dispatcher Cache Clear
# On Dispatcher server
# Clear the whole cache (an absolute path avoids deleting from the wrong directory if cd fails)
rm -rf /path/to/dispatcher/cache/*
# Or specific path
rm -rf /path/to/dispatcher/cache/content/mysite/en/*
# Check Dispatcher logs
tail -f /path/to/dispatcher/logs/dispatcher.log

Solution B: Verify Dispatcher Flush Agent
On Publish instance:
1. Navigate to: Tools → Deployment → Replication
2. Select: Agents on publish
3. Click: Dispatcher Flush
4. Verify: Enabled = ✓
5. Transport tab:
URI: http://dispatcher:80/dispatcher/invalidate.cache
6. Test Connection
7. If failed, fix connectivity

Solution C: Check Content Permissions on Publish
On Publish instance:
1. Navigate to: CRXDE Lite
2. Browse to: /content/mysite/en/page
3. Check node exists
4. Verify permissions: anonymous user can read
5. If not, adjust permissions

Solution D: Force Republish
On Author:
1. Select page(s)
2. Manage Publication
3. Action: Unpublish
4. Execute
5. Wait for completion
6. Manage Publication
7. Action: Publish
8. Execute
9. Verify on Publish

Symptoms:
Diagnosis:
Check Dispatcher Flush agent:
Publish instance: /etc/replication/agents.publish/flush
Status: Should be green (idle/active)

Review Dispatcher configuration:
dispatcher.any file:
/allowedClients {
/0 { /type "allow" /glob "*publish-ip*" }
}
/invalidate {
/0000 { /glob "*" /type "allow" }
}

Check Dispatcher logs:
tail -f /var/log/httpd/dispatcher.log
Look for invalidation requests:
[date] [I] [pid] Received invalidate request

Solutions:
Solution A: Enable Dispatcher Flush Agent
On Publish instance:
1. Navigate to: /etc/replication/agents.publish/flush
2. Edit agent
3. Settings tab: ✓ Enabled
4. Serialization Type: Dispatcher Flush
5. Save

Solution B: Fix Dispatcher Configuration
Edit dispatcher.any:
/allowedClients {
/0 {
/type "allow"
/glob "*<publish-instance-ip>*"
}
}
/cache {
/invalidate {
/0000 { /glob "*" /type "allow" }
}
}
Reload Dispatcher:
apachectl graceful

Solution C: Verify Flush Agent Transport
Dispatcher Flush agent → Transport tab
Correct URI format:
http://dispatcher-host:80/dispatcher/invalidate.cache
OR if virtual host:
http://www.example.com/dispatcher/invalidate.cache
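The flush URI can also be exercised outside AEM with a manual invalidation request. The sketch below assumes the Dispatcher accepts a POST carrying the CQ-Action and CQ-Handle headers at the flush URI; `send_flush` is a hypothetical helper name, and the host and content path in the usage line are placeholders.

```shell
#!/bin/sh
# send_flush <flush-uri> <content-path>
# Sends one invalidation request for the given path and prints the
# HTTP status code returned by the Dispatcher.
send_flush() {
  curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H 'CQ-Action: Activate' \
    -H "CQ-Handle: $2" \
    -H 'Content-Length: 0' \
    -H 'Content-Type: application/octet-stream' \
    "$1"
}

# Example (placeholder host and path; not executed here):
# send_flush http://dispatcher-host:80/dispatcher/invalidate.cache /content/mysite/en/page
```

A 200 status together with an invalidation entry in dispatcher.log suggests the endpoint works; any other status may point back to the /allowedClients rules or to the URI itself.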
Test Connection

Symptoms:
Diagnosis:
Check agent timeouts:
Agent → Edit → Extended tab
Connection Timeout: default 10000ms
Socket Timeout: default 10000ms

Review package size:
Large packages (>100MB) may timeout
Check: crx-quickstart/logs/replication.log

Solutions:
Solution A: Increase Timeouts
Agent → Edit → Extended tab
Connection Timeout: 30000 (30 seconds)
Socket Timeout: 60000 (60 seconds)
For very large packages: 120000 (2 minutes)

Solution B: Use Asynchronous Replication
For large content:
1. Use default async replication (not synchronous)
2. Monitor queue instead of waiting
3. Package-based replication for very large sets

Solution C: Split Large Packages
Instead of tree activation:
1. Activate in smaller batches
2. Use incremental replication
3. Schedule large activations during off-peak hours

Symptoms:
Diagnosis:
Check enabled agents:
Navigate to: /etc/replication/agents.author
Verify: At least one agent is enabled
Check: Green status indicator

Review agent triggers:
Agent → Edit → Triggers tab
Check: "Ignore default" is NOT checked
Verify: Appropriate triggers enabled

Root Causes:
Solutions:
Solution A: Enable Default Agent
Steps:
1. Navigate to: /etc/replication/agents.author/publish
2. Edit agent
3. Settings tab: ✓ Enabled
4. Triggers tab: Uncheck "Ignore default"
5. Save

Solution B: Check Agent Filters (Programmatic)
// If using ReplicationOptions in code
import com.day.cq.replication.Agent;
import com.day.cq.replication.AgentFilter;
import com.day.cq.replication.ReplicationOptions;

ReplicationOptions opts = new ReplicationOptions();
// Ensure the filter doesn't exclude all agents
opts.setFilter(new AgentFilter() {
    @Override
    public boolean isIncluded(Agent agent) {
        // Must return true for at least one enabled agent
        return !agent.getId().contains("invalid");
    }
});

Solution C: Verify Agent Configuration
For each agent:
1. Enabled: ✓
2. Transport URI: Valid and reachable
3. Test Connection: Success
4. Triggers: At least one enabled
5. Ignore default: Unchecked (unless custom workflow)

Symptoms:
Diagnosis:
From official AEM 6.5 LTS documentation: Check for blocking nodes in repository.
Navigate to: CRXDE Lite (/crx/de/index.jsp)
Search for: /bin/replicate or /bin/replicate.json
These nodes may block the replication servlet

Root Cause:
Custom nodes created at /bin/replicate or /bin/replicate.json can override the default replication servlet, preventing normal replication operations.
Solution:
Steps:
1. Navigate to CRXDE Lite: http://localhost:4502/crx/de/index.jsp
2. Check path: /bin/replicate
3. If node exists and is not the system servlet:
- Right-click node
- Select "Delete"
- Save All
4. Repeat for: /bin/replicate.json
5. Test replication

Verification:
After deletion:
1. Activate a test page
2. Check replication queue processes
3. Verify content appears on Publish

Symptoms:
Diagnosis:
From official AEM 6.5 LTS documentation: Replication user lacks namespace management privileges.
Navigate to: CRXDE Lite
Path: Repository level (/)
Check: Replication user privileges

Root Cause:
The replication user (configured in agent's "Agent User Id") doesn't have jcr:namespaceManagement privilege, which is required to replicate custom namespaces.
Solution:
Steps:
1. Navigate to CRXDE Lite
2. Select repository root: /
3. Click "Access Control" tab
4. Find replication service user
5. Add privilege:
- Privilege: jcr:namespaceManagement
- Apply
6. Save All

Grant via CRX/DE:
1. Tools → Security → Permissions
2. Search for: replication-service user
3. Repository level permissions:
✓ jcr:read
✓ jcr:write
✓ crx:replicate
✓ jcr:namespaceManagement ← Add this
4. Save

Symptoms:
/var/replication/data has many items

Diagnosis:
From official AEM 6.5 LTS documentation: Check for corrupted replication jobs.
Check event queue:
Navigate to: CRXDE Lite
XPath Query:
/jcr:root/var/eventing/jobs//element(*,slingevent:Job)
This shows all pending Sling event jobs

Check replication data:
Path: /var/replication/data
Look for: Large number of nodes

Root Cause: Repository corruption or serialization errors can cause replication jobs to get stuck in the Sling event queue.
Solution A: Clean Event Jobs
Via CRXDE Lite:
1. Run XPath query:
/jcr:root/var/eventing/jobs//element(*,slingevent:Job)
2. Review results for stuck jobs
3. Identify jobs with:
- Old timestamps
- Error properties
- Replication-related topic
4. Carefully delete stuck jobs
5. Save All

Solution B: Clear Replication Data
WARNING: Only if queue is irreparably stuck
1. Stop AEM instance
2. Navigate to: crx-quickstart/repository/
3. Backup: /var/replication/data
4. Delete corrupted items in /var/replication/data
5. Start AEM
6. Verify replication resumes

Solution C: Enable Detailed Logging
From official documentation - configure detailed replication logging:
Navigate to: /system/console/configMgr
Search for: Apache Sling Logging Logger Configuration
Create new configuration:
- Logger: com.day.cq.replication
- Log Level: DEBUG
- Log File: logs/replication.log
Save and review logs for root cause

Symptoms:
Diagnosis:
From official AEM 6.5 LTS documentation: Queue pause has known limitations.
Known Limitations:
Not persisted across restarts
Auto-resume timeout
Workaround:
Instead of pausing, disable the agent:
Agent configuration:
1. Edit agent
2. Settings tab
3. Uncheck "Enabled"
4. Save
This persists across restarts

For temporary pause:
Accept the limitations:
- Must re-pause after restart
- Must re-pause if idle >1 hour
- Use agent monitoring to track state

Location: crx-quickstart/logs/replication.log
Key patterns to search:
# Successful replication
grep "Replication (ACTIVATE) of /content/mysite" replication.log
# Failed replication
grep "ERROR" replication.log | grep replication
# Agent not found
grep "no agent found" replication.log
# Authentication failures
grep "401" replication.log
# Connection issues
grep "Connection refused" replication.log

Example log analysis:
# Find all replication attempts for a path
grep "/content/mysite/en/page" replication.log
# Count failures by type
grep "ERROR" replication.log | cut -d' ' -f5- | sort | uniq -c | sort -rn
# Recent replication activity
tail -100 replication.log | grep "ACTIVATE\|DEACTIVATE"

Navigate to: /system/console/jmx
Search for: com.day.cq.replication
Monitor MBeans:
- Replication Agent Stats
- Queue Size
- Number of queued items
- Last processed item
- Error count
- Replication Service
- Active replications
- Failed replications
- Average processing time

Navigate to: /system/console/configMgr
Relevant configurations:
- Day CQ Replication Service
- Day CQ WCM Replication Impl ReplicationComponentFactoryImpl
- Apache Sling Job Consumer Manager
Verify:
- Services are active
- No configuration errors
- Thread pools not exhausted

Navigate to: /system/console/slingevent
Check:
- Event queue depth
- Stuck events
- Processing rate
- Failed events
If queue stuck:
- Restart org.apache.sling.event bundle
- Check disk space
- Review thread dumps

Schedule periodic tests:
Weekly:
1. Test Connection for all agents
2. Verify queues are empty
3. Review error logs
4. Check disk space on Publish

Set up monitoring for:
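One check that lends itself to monitoring is whether an agent queue is blocked. The sketch below is a minimal illustration, assuming the agent exposes its queue as JSON at <agent-path>/jcr:content.queue.json and that the payload contains an "isBlocked" flag; both the endpoint and the field name are assumptions to confirm on your instance. `queue_blocked` is a hypothetical helper name.

```shell
#!/bin/sh
# queue_blocked: read an agent's queue JSON on stdin.
# Exit status 0 means the queue reports itself as blocked.
# Assumption: the payload contains "isBlocked": true|false.
queue_blocked() {
  grep -Eq '"isBlocked"[[:space:]]*:[[:space:]]*true'
}

# Assumed usage from cron or a monitoring agent (host, credentials,
# and agent path are placeholders; not executed here):
# if curl -s -u admin:admin \
#   "http://localhost:4502/etc/replication/agents.author/publish/jcr:content.queue.json" \
#   | queue_blocked; then
#   echo "ALERT: publish agent queue is blocked"
# fi
```

The exit status makes the function easy to wire into whichever alerting tool is already in place.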
Regular maintenance:
Monthly:
- Review and clear old logs
- Verify agent credentials
- Test disaster recovery procedures
- Update documentation
Quarterly:
- Certificate renewal checks
- Performance testing
- Capacity planning review

Use this checklist for systematic troubleshooting:
□ Verify symptom and impact
□ Check replication agent status (green/red)
□ Review replication queue for stuck items
□ Test agent connectivity
□ Verify Publish instance is running
□ Check authentication credentials
□ Review error.log and replication.log
□ Verify agent configuration (URI, credentials, settings)
□ Check network connectivity (ping, telnet, curl)
□ Test direct Publish access (bypass Dispatcher)
□ Verify Dispatcher Flush agent (if applicable)
□ Check content permissions on Publish
□ Review OSGi bundles status
□ Examine Sling event queue
□ Check disk space on Author and Publish
□ Verify JVM heap usage
□ Test with simple content first
□ Document findings and resolution

If issue persists after troubleshooting:
Gather diagnostic information:
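As one way to gather this in a single step, the sketch below bundles the tail of error.log and replication.log plus a disk-space snapshot into an archive. The choice of files, the 2000-line limit, and the `collect_diagnostics` name are illustrative assumptions, not an Adobe-prescribed procedure.

```shell
#!/bin/sh
# collect_diagnostics <crx-quickstart-dir> <output.tar.gz>
# Copies the last 2000 lines of each log (if present) and the output of
# df -h into a temporary directory, then archives that directory.
collect_diagnostics() {
  aem_home=$1
  out=$2
  tmp=$(mktemp -d) || return 1
  for f in error.log replication.log; do
    if [ -f "$aem_home/logs/$f" ]; then
      tail -n 2000 "$aem_home/logs/$f" > "$tmp/$f"
    fi
  done
  df -h > "$tmp/disk-space.txt"
  tar -czf "$out" -C "$tmp" .
  rm -rf "$tmp"
}

# Example (placeholder paths; not executed here):
# collect_diagnostics /path/to/author/crx-quickstart /tmp/aem-replication-diag.tar.gz
```

Review the archive for credentials or other sensitive data before sharing it.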
Check Adobe Experience League Community:
Adobe Support (if entitled):
configure-replication-agent: Set up and configure agents properly
replicate-content: Understand replication methods
replication-api: Programmatic replication for custom code