igmarin/rails-agent-skills

Curated library of 28 atomic skills and 9 personas for Ruby on Rails development. Organized by category: testing, code-quality, engines, infrastructure, api, context, and personas. Covers code review, architecture, security, testing (RSpec), engines, Hotwire, and TDD automation. Shared Ruby skills (YARD docs, DDD, service objects) have moved to ruby-core-skills.

name:: background-job
type:: persona
tags:: personas
license:: MIT
description:: Orchestrates robust background job implementation with hard gates: design job with idempotency strategy and error classification (transient→retry, permanent→discard) → TDD implementation where test MUST fail before code → configure retry_on/discard_on strategies → test failure scenarios covering idempotency/retry/error handling → production monitoring; phases design→TDD→retry config→failure testing→monitoring. Use when adding async processing, implementing background jobs, or configuring job queues. Trigger: background job, async processing, sidekiq, solid queue, active job, job queue, worker.
metadata:: {"version":"1.0.0","user-invocable":"true","entry_point":"Invoke when implementing background jobs with proper retry/discard strategies and monitoring","phases":"Phase 1: Job Design, Phase 2: TDD Implementation, Phase 3: Retry/Discard Configuration, Phase 4: Testing & Monitoring","hard_gates":"Job Design Complete, Tests Pass, Retry Strategy Configured, Failure Scenarios Tested","dependencies":[{"source":"self","skills":["implement-background-job","write-tests"]},{"source":"ruby-core-skills","skills":["tdd-process"]}],"keywords":"rails, background-job, async, sidekiq, solid-queue, active-job, retry, monitoring"}

Background Job Persona

Name: igmarin/rails-agent-skills
Rating: 93.27 (1 review)
Author: igmarin

Orchestrates robust background job implementation with TDD discipline, proper retry/discard strategies, comprehensive failure scenario testing, and production monitoring to ensure reliable async processing.

Phase 1: Job Design

Objective: Define job responsibilities, idempotency strategy, and error classification before writing code.

Steps:

Job Purpose — Define trigger conditions, input parameters, expected output/side effects, and criticality.
Idempotency — Design job to be safely re-runnable: use unique job keys, status checks, or sentinel timestamps.
Error Classification — Classify all anticipated errors:
- Transient (network timeouts, rate limits) → retry
- Permanent (invalid data, record not found) → discard
- Configuration (missing credentials) → alert
Queue & Timeout — Assign queue priority and set execution timeout.
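
The error classification step above can be sketched as a lookup table that is written down before any job code exists. This is a minimal illustration, not part of the skill itself: `MissingCredentialsError`, `ERROR_POLICY`, and `classify` are hypothetical names, and the `EmailService` error classes are local stand-ins for whatever a real client raises.

```ruby
# Hypothetical error classes; a real service client would define its own.
module EmailService
  class TimeoutError < StandardError; end
  class RateLimitError < StandardError; end
  class InvalidEmailError < StandardError; end
  class MissingCredentialsError < StandardError; end
end

# Maps each anticipated error class to the action chosen during design.
ERROR_POLICY = {
  EmailService::TimeoutError            => :retry,
  EmailService::RateLimitError          => :retry,
  EmailService::InvalidEmailError       => :discard,
  EmailService::MissingCredentialsError => :alert
}.freeze

# Returns the designed action for an error, or :unclassified when the
# error was never considered during design.
def classify(error)
  ERROR_POLICY.each { |klass, action| return action if error.is_a?(klass) }
  :unclassified
end

puts classify(EmailService::TimeoutError.new) # prints "retry"
puts classify(ArgumentError.new)              # prints "unclassified"
```

An `:unclassified` result is a useful signal that the design gate is not yet satisfied for that error.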

HARD GATE — Job Design Complete:

Purpose, trigger, input/output defined
Idempotency strategy specified
All errors classified as transient, permanent, or configuration
Queue and timeout values chosen

If gate fails: Clarify requirements before implementation.
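
One way to meet the "Idempotency strategy specified" item is the sentinel-timestamp approach named in the steps. The sketch below uses plain Ruby with an in-memory `Order` struct and a hypothetical `FakeMailer`, so it runs without Rails; in a real job the timestamp would live in a database column.

```ruby
# In-memory stand-in for an orders table; a real job would use Active Record.
Order = Struct.new(:id, :email_sent_at)

# Hypothetical mailer that records every delivery so duplicates are visible.
class FakeMailer
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def deliver(order_id)
    @deliveries << order_id
  end
end

# Sends the confirmation at most once per order by checking a sentinel timestamp.
def send_confirmation_once(order, mailer, now: Time.now)
  return :skipped if order.email_sent_at # an earlier run already sent it

  mailer.deliver(order.id)
  order.email_sent_at = now
  :sent
end

order  = Order.new(42, nil)
mailer = FakeMailer.new
first  = send_confirmation_once(order, mailer)
second = send_confirmation_once(order, mailer)
puts "#{first}, #{second}, deliveries: #{mailer.deliveries.size}"
```

The check and the write are not atomic here, so two workers running at the same moment could both pass the guard. A database unique constraint or a lock is the usual way to close that gap.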

Phase 2: TDD Implementation

Objective: Implement job logic under TDD discipline.

Steps:

Choose unit vs. integration test approach.
Write failing tests covering: successful execution, idempotency (run twice = same result), transient error raises, permanent error discards.
Confirm tests FAIL for the right reason (job not yet implemented).
Propose implementation approach and wait for explicit user approval.
Implement job using the structure shown in Phase 3 (retry/discard declarations included from the start); confirm tests PASS.
Run full test suite — confirm no regressions.

HARD GATE — Tests Pass:

Tests exist and run
Tests failed before implementation
All tests pass after implementation
Full suite green

Example job test skeleton (for OrderConfirmationEmailJob — see Phase 3 for the matching implementation):

# spec/jobs/order_confirmation_email_job_spec.rb
require 'rails_helper'

# Assumes the :test queue adapter, so have_enqueued_job can observe retries.
RSpec.describe OrderConfirmationEmailJob do
  let(:order) { create(:order, :completed) }
  let(:args)  { [order.id, order.customer_email, order.total] }

  it 'sends confirmation email' do
    expect(EmailService).to receive(:send_confirmation).with(*args)
    described_class.perform_now(*args)
  end

  it 'is idempotent' do
    expect(EmailService).to receive(:send_confirmation).once
    2.times { described_class.perform_now(*args) }
  end

  # retry_on rescues the error inside perform_now and re-enqueues the job,
  # so assert the retry rather than expecting the error to propagate.
  it 're-enqueues on transient errors' do
    allow(EmailService).to receive(:send_confirmation).and_raise(EmailService::TimeoutError)
    expect { described_class.perform_now(*args) }.to have_enqueued_job(described_class)
  end

  it 'logs transient errors before the retry' do
    allow(EmailService).to receive(:send_confirmation).and_raise(EmailService::TimeoutError)
    allow(Rails.logger).to receive(:error)
    described_class.perform_now(*args)
    expect(Rails.logger).to have_received(:error).with(/transient error/)
  end

  it 'discards silently on permanent error' do
    allow(EmailService).to receive(:send_confirmation).and_raise(EmailService::InvalidEmailError)
    expect { described_class.perform_now(order.id, 'bad', order.total) }.not_to raise_error
  end
end

Phase 3: Retry/Discard Configuration

Objective: Harden job for production with correct retry backoff, discard rules, timeouts, and monitoring hooks.

Steps:

Choose backend (Solid Queue for Rails 8+, Sidekiq for high scale) and configure worker concurrency.
Apply retry_on with exponential backoff and a capped attempt count (3–5) for every transient error class.
Apply discard_on for every permanent error class; log discards.
Set job execution timeout and queue timeout at the worker/config level.
Wire error tracking (e.g., Sentry) and metrics (e.g., StatsD/Datadog) in ApplicationJob callbacks.

Complete job implementation (matches the test skeleton in Phase 2):

# app/jobs/order_confirmation_email_job.rb
class OrderConfirmationEmailJob < ApplicationJob
  queue_as :default

  # :polynomially_longer replaces :exponentially_longer, which was deprecated in Rails 7.1.
  retry_on EmailService::TimeoutError,   wait: :polynomially_longer, attempts: 5
  retry_on EmailService::RateLimitError, wait: :polynomially_longer, attempts: 3
  discard_on ActiveRecord::RecordNotFound
  discard_on EmailService::InvalidEmailError

  def perform(order_id, customer_email, order_total)
    order = Order.find(order_id)
    return if order.email_sent_at.present? # idempotency guard

    EmailService.send_confirmation(order_id, customer_email, order_total)
    order.update!(email_sent_at: Time.current)
  rescue EmailService::TimeoutError, EmailService::RateLimitError => e
    # Log, then re-raise so retry_on can schedule the next attempt.
    Rails.logger.error("[#{self.class}] transient error: #{e.message}")
    raise
  end
end
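
To make the attempt caps above concrete, the following sketch estimates how long a job keeps retrying under Rails' polynomial backoff (named `:polynomially_longer` since Rails 7.1, previously `:exponentially_longer`). It follows the documented formula of roughly `executions**4 + 2` seconds and ignores the random jitter Rails adds, so real waits will be somewhat longer. `polynomial_wait` and `retry_schedule` are hypothetical helper names.

```ruby
# Approximate wait in seconds after failed execution number `executions`,
# following the documented polynomial formula without jitter.
def polynomial_wait(executions)
  (executions**4) + 2
end

# With attempts: N the job runs at most N times, so there are N - 1 waits.
def retry_schedule(attempts)
  (1...attempts).map { |n| polynomial_wait(n) }
end

schedule = retry_schedule(5)
puts schedule.inspect # prints [3, 18, 83, 258]
puts schedule.sum     # prints 362
```

Under these assumptions, `attempts: 5` gives a transient outage about six minutes to clear before the retries are exhausted, which helps when choosing a cap between 3 and 5.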

Solid Queue (Rails 8+) snippet:

# config/queue.yml (Solid Queue reads worker settings from this YAML file)
production:
  workers:
    - { queues: "*", processes: 2, threads: 5, polling_interval: 1 }

Sidekiq snippet:

# config/initializers/sidekiq.rb
Sidekiq.configure_server { |c| c.redis = { url: ENV['REDIS_URL'] } }

Monitoring hook in ApplicationJob:

class ApplicationJob < ActiveJob::Base
  around_perform do |job, block|
    key   = "jobs.#{job.class.name.underscore}"
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    block.call
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
    # Assumes the statsd-instrument gem, whose StatsD.measure takes milliseconds.
    StatsD.measure("#{key}.duration", elapsed_ms)
    StatsD.increment("#{key}.success")
  rescue StandardError
    StatsD.increment("#{key}.failure")
    raise
  end
end
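
The same hook logic can be exercised outside Rails, which is handy when checking that both the success and failure paths emit metrics. In this sketch `FakeMetrics` and `instrument_job` are hypothetical names, and the in-memory client only mimics the `increment` and `measure` calls a StatsD-style client would offer.

```ruby
# Hypothetical in-memory metrics client, used only to make the sketch runnable.
class FakeMetrics
  attr_reader :counters, :timings

  def initialize
    @counters = Hash.new(0)
    @timings  = {}
  end

  def increment(key)
    @counters[key] += 1
  end

  def measure(key, milliseconds)
    @timings[key] = milliseconds
  end
end

# Wraps a unit of work, recording its duration plus a success or failure counter.
def instrument_job(metrics, name)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
  metrics.measure("jobs.#{name}.duration", elapsed_ms)
  metrics.increment("jobs.#{name}.success")
  result
rescue StandardError
  metrics.increment("jobs.#{name}.failure")
  raise # keep propagating so the caller can retry or discard
end

metrics = FakeMetrics.new
instrument_job(metrics, "demo") { :ok }
begin
  instrument_job(metrics, "demo") { raise "boom" }
rescue RuntimeError
  # expected: the failure counter was incremented before the error reached us
end
puts metrics.counters.inspect
```

A monotonic clock is used for the duration so that system clock adjustments cannot produce negative or inflated timings.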

HARD GATE — Retry Strategy Configured:

retry_on declared for every transient error with backoff and attempt cap
discard_on declared for every permanent error with logging
Timeouts configured at job and worker level
Metrics/alerting wired

If gate fails: Job is not production-ready.

Phase 4: Failure Scenario Testing & Monitoring

Objective: Verify retry/discard behaviour under injected failures at the integration/production level and confirm observability.

Steps:

Inject transient errors at the integration level → assert job raises and the queue backend schedules a retry (not just that the error propagates in a unit test).
Inject permanent errors → assert job does not raise, error is logged, and the job is not re-enqueued.
Confirm timeout handling by stubbing slow operations and verifying the worker-level timeout fires correctly.
Verify metrics increment on success and failure paths (assert StatsD/Datadog counters, not just that no exception is raised).
Confirm queue-depth alerts fire when queue backs up.
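
The retry and discard paths in the steps above can be modelled with a small in-memory runner, which helps pin down what each scenario should assert before wiring up a real backend. This is a simplified model rather than Active Job's actual behaviour: `TransientError`, `PermanentError`, and `run_with_policy` are hypothetical, and waits between attempts are omitted.

```ruby
class TransientError < StandardError; end
class PermanentError < StandardError; end

# Minimal model of a queue backend: runs a job, re-runs it on transient
# errors up to max_attempts, and drops it on the first permanent error.
def run_with_policy(max_attempts:)
  attempts = 0
  begin
    attempts += 1
    yield attempts
    { status: :succeeded, attempts: attempts }
  rescue TransientError
    retry if attempts < max_attempts
    { status: :retries_exhausted, attempts: attempts }
  rescue PermanentError
    { status: :discarded, attempts: attempts }
  end
end

# Fails twice, then succeeds on the third attempt.
flaky   = run_with_policy(max_attempts: 5) { |n| raise TransientError if n < 3 }
# Permanent failure: dropped after a single attempt.
invalid = run_with_policy(max_attempts: 5) { |_n| raise PermanentError }
# Never recovers: stops once the attempt cap is reached.
down    = run_with_policy(max_attempts: 3) { |_n| raise TransientError }

[flaky, invalid, down].each do |outcome|
  puts "#{outcome[:status]} after #{outcome[:attempts]} attempt(s)"
end
```

The three outcomes map onto the assertions in the gate below: a retried job eventually succeeds, a discarded job runs exactly once, and an exhausted job stops at its cap.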

HARD GATE — Failure Scenarios Tested:

Retry path tested end-to-end (job raises on transient error and backend re-enqueues)
Discard path tested (no raise on permanent error, job not re-enqueued)
Error logging assertions pass
Metrics verified on success and failure
Performance acceptable under expected load

If gate fails: Address failure scenarios before deploying.

HARD GATE: Production Readiness

Never deploy a background job without:

Idempotency guard implemented and tested
All transient errors covered by retry_on with backoff
All permanent errors covered by discard_on with logging
Failure scenario tests passing
Metrics and error-tracking wired
Timeouts configured

Error Recovery

Job fails repeatedly in production:

Check retry patterns and error rates in monitoring.
Review logs for error class and stack trace.
Classify error (transient vs. permanent) and adjust retry_on/discard_on if mis-classified.
Fix root cause; redeploy.

Queue backs up:

Scale worker processes/threads.
Promote critical jobs to a higher-priority queue.
Optimise job execution time or batch size.

Output Style

When completing a background job implementation, output MUST include:

# Background Job Report — [Job Name]

## Design
- Job class: <path>
- Purpose: <one-line description>
- Idempotency strategy: <database unique constraint / Redis lock / conditional check>
- Error classification: transient (<list>) / permanent (<list>)

## TDD
- Spec: <spec file path>
- RED: <failure message confirming job behavior missing>
- GREEN: <spec passes after implementation>

## Retry Configuration
- retry_on: <error classes, backoff strategy, attempt cap>
- discard_on: <error classes, logging>
- Timeouts: <job-level and worker-level>

## Failure Scenarios Tested
- Transient error → retries: ✓
- Permanent error → discards: ✓
- Idempotency → no duplicate side effects: ✓
- Timeout handling: ✓

## Monitoring
- Metrics: <StatsD/Datadog counters for success/failure/duration>
- Error tracking: <Sentry/Honeybadger integration>
- Queue depth alerts: <configured threshold>

Integration

Predecessor	This Persona	Successor
load-context	background-job	code-review
tdd	background-job	quality
None (standalone)	background-job	PR submission

Use implement-background-job alone if the job design is already decided and you only need to implement the job class and specs.
