A prompt repetition technique for improving LLM accuracy. It yields statistically significant gains on 47 of 70 benchmarks (67%) and is applied automatically on lightweight models (haiku, flash, mini).
Install with Tessl CLI
npx tessl i github:supercent-io/skills-template --skill prompt-repetition
LLMs are trained as causal language models, in which each token attends only to the tokens before it. This leads to:
Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.
[Context] → [Question]
    ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear

[First Pass]          [Second Pass]
Context → Question → Context' → Question'
                        ↑          ↑
              Can reference entire first pass

In the second repetition, the model reprocesses information across the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.
Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.
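To make the attention argument concrete, here is a minimal sketch in plain Python (no model involved) of which positions a causal mask lets each token see. The helper names `causal_visible` and `can_see` and the one-token-per-segment simplification are illustrative assumptions, not part of the skill.

```python
def causal_visible(query_pos: int, key_pos: int) -> bool:
    """Under a causal mask, a token attends only to itself and earlier positions."""
    return key_pos <= query_pos


def can_see(tokens: list, query: str, key: str) -> bool:
    """Check whether the segment `query` may attend to the segment `key`."""
    return causal_visible(tokens.index(query), tokens.index(key))


# Toy sequences: one "token" per segment, for readability only.
single = ["Context", "Question"]
repeated = ["Context", "Question", "Context'", "Question'"]

# Single pass: Context is processed before Question, so it cannot see it.
print(can_see(single, "Context", "Question"))     # False

# Repeated prompt: the second copy of Context comes after the first Question.
print(can_see(repeated, "Context'", "Question"))  # True
```

The second copy gains access to the question only because it sits later in the sequence; the mask itself is unchanged.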
| Metric | Result |
|---|---|
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |
Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)
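The headline figures can be re-derived from the numbers quoted above. The short check below only restates that arithmetic and introduces no additional data.

```python
# Benchmark outcome counts as reported in the table above.
improved, degraded, neutral = 47, 0, 23

total = improved + degraded + neutral
improvement_rate = improved / total
print(total)                      # 70
print(f"{improvement_rate:.0%}")  # 67%

# NameIndex accuracy for Gemini 2.0 Flash-Lite, before and after repetition.
before, after = 21.33, 97.33
gain_pp = round(after - before, 2)  # difference in percentage points
print(gain_pp)                    # 76.0
```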
| Provider | Auto-apply models | Excluded models |
|---|---|---|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|---|---|---|---|
| Options-First MCQ | A. B. C. D. choices first | 2× | +15-40%p |
| Index/Position | slot, position, index, N-th | 3× | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | step by step, think through | 0× (not applied) | ~0% |
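The table can be read as a small decision rule. The sketch below is one possible encoding of it, assuming simple keyword matching. The function name `choose_repetitions` and the exact regular expressions are hypothetical, and a word such as "position" in an unrelated sentence would also trigger the 3× branch.

```python
import re

# Hypothetical keyword lists derived from the "Keyword Pattern" column.
COT_HINTS = (r"step by step", r"think through")
POSITION_HINTS = (r"\bslot\b", r"\bposition\b", r"\bindex\b", r"\b\d+(st|nd|rd|th)\b")


def choose_repetitions(prompt: str) -> int:
    """Return 0 (do not repeat), 2, or 3 according to the task-type table."""
    text = prompt.lower()
    if any(re.search(p, text) for p in COT_HINTS):
        return 0  # CoT prompts: repetition is not applied
    if any(re.search(p, text) for p in POSITION_HINTS):
        return 3  # index/position lookups
    return 2      # options-first MCQ and general context + question


print(choose_repetitions("What item is in slot 25?"))              # 3
print(choose_repetitions("Solve this step by step."))              # 0
print(choose_repetitions("Which city is the capital of France?"))  # 2
```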
# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))

def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)

Before:
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.

After (repetition ×2 applied):
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.

Expected output:
A

Accuracy: original 78% → after repetition 93% (+15%p)
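As a quick check of the transformation itself, the following sketch builds the options-first prompt from this example and repeats it twice. `apply_prompt_repetition` is restated from the helper shown earlier so the snippet runs on its own; the variable names are illustrative.

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt `times` times, separated by a blank line."""
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)


mcq = "\n".join([
    "A. Paris",
    "B. London",
    "C. Berlin",
    "D. Madrid",
    "Which city is the capital of France?",
    "Reply with one letter.",
])

repeated = apply_prompt_repetition(mcq, times=2)

# The question now appears twice, and the second copy of the options
# follows the first copy of the question.
print(repeated.count("Which city is the capital of France?"))  # 2
print(len(repeated.splitlines()))                              # 13
```

The 13 lines are the two 6-line copies plus the blank separator line.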
Before:
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map
What item is in slot 25?

After (repetition ×3 applied): Prompt repeated 3 times

Expected output:
Dragon Scale

Accuracy: original 21% → after repetition 97% (+76%p)
Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.
Before:
Use the calculator tool to compute 234 * 567.
What is the result?

After (repetition ×2):
Use the calculator tool to compute 234 * 567.
What is the result?
Use the calculator tool to compute 234 * 567.
What is the result?

Research results show that full repetition including tool call sections is also effective.
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]

@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"

class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False
        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False
        # Skip when CoT pattern detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False
        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()
        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions
        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to prompt"""
        if not self.should_apply(model, prompt):
            return prompt
        repetitions = self.determine_repetitions(prompt, model)
        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break
        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)
        # Reduce repetitions if token limit exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1
        if repetitions <= 1:
            return prompt
        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped

def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()
    results = {"baseline": [], "repeated": []}
    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)
        # With Repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)
    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)
    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")

| Metric | Measurement Method |
|---|---|
| Accuracy | Compare correct answer rates |
| Consistency | Variance across 10 runs of same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50, p99 latency |
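The four measurements in the table can each be computed with the standard library. The sketch below shows one way to do so, assuming per-run correctness flags and per-request latencies have already been collected. All sample values and helper names are hypothetical, and the percentile uses the nearest-rank definition (other definitions give slightly different results).

```python
import math
import statistics


def accuracy(responses: list, expected: list) -> float:
    """Fraction of responses that exactly match the expected answer."""
    return sum(r == e for r, e in zip(responses, expected)) / len(expected)


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def token_increase_rate(baseline_tokens: int, repeated_tokens: int) -> float:
    """Relative growth in input tokens (1.0 means +100%)."""
    return (repeated_tokens - baseline_tokens) / baseline_tokens


# Hypothetical sample data, for illustration only.
answers = ["A", "B", "A", "D"]
truth = ["A", "B", "C", "D"]
runs_correct = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]  # 10 runs of the same prompt
latencies_ms = [450, 460, 440, 1200, 455, 470, 445, 465, 452, 458]

print(accuracy(answers, truth))                         # 0.75
print(statistics.pvariance(runs_correct))               # consistency: lower is steadier
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 455 1200
print(token_increase_rate(500, 1000))                   # 1.0
```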
| Case | Reason |
|---|---|
| Using CoT | Reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding context limit |
| Already repeated | Duplicate application wastes tokens |
| Metric | Baseline | With Repetition | Change |
|---|---|---|---|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450ms | 460ms | +2% |
| Latency (p99) | 1200ms | 1250ms | +4% |
| Accuracy | 78% | 89% | +11%p |
| Cost per correct answer | $0.019 | $0.020 | +5% |
Key insight: The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.
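The "Change" column and the cost-per-correct-answer row follow from simple ratios. The sketch below recomputes them from the table values. It assumes that cost per correct answer means cost per request divided by accuracy; the per-request costs in the last two lines are back-solved from the table under that assumption and are not stated in the source.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100


def cost_per_correct(cost_per_request: float, accuracy: float) -> float:
    """Assumed definition: average spend needed to obtain one correct answer."""
    return cost_per_request / accuracy


print(round(pct_change(500, 1000)))        # input tokens: 100
print(round(pct_change(450, 460), 1))      # latency p50: 2.2
print(round(pct_change(1200, 1250), 1))    # latency p99: 4.2
print(round(pct_change(0.019, 0.020), 1))  # cost per correct answer: 5.3

# Accuracy: 11 percentage points absolute, about 14% relative.
print(89 - 78)                             # 11
print(round(pct_change(78, 89), 1))        # 14.1

# Hypothetical per-request costs implied by the table under the assumed definition.
print(round(cost_per_correct(0.01482, 0.78), 3))  # 0.019
print(round(cost_per_correct(0.0178, 0.89), 3))   # 0.02
```

Cost per correct answer rises far less than input tokens because the accuracy gain offsets part of the higher per-request spend.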
| Agent | Model | Repetition Applied | Applied At |
|---|---|---|---|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |
To prevent duplicate application in multi-agent pipelines:
- <!-- prompt-repetition-applied --> marker
- x-prompt-repetition-applied: true header between agents

[Claude Sonnet] Planning (no repetition needed)
↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)

# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)
        return prompt

. etc. has no effect (per research)

=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low
=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)
=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)
=== Cost ===
Input tokens: +100%
Latency: +2% (Prefill parallelization)
Cost per correct answer: +5%
=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->
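
A duplicate-application check can be sketched as follows, covering both the in-prompt marker and the `x-prompt-repetition-applied: true` header mentioned for multi-agent pipelines. How headers travel between agents is not specified in this document, so the plain `dict` used here and the helper names `already_repeated` and `mark_applied` are assumptions for illustration.

```python
from typing import Optional

APPLIED_MARKER = "<!-- prompt-repetition-applied -->"
APPLIED_HEADER = "x-prompt-repetition-applied"


def already_repeated(prompt: str, headers: Optional[dict] = None) -> bool:
    """True if an upstream agent has already applied prompt repetition."""
    if APPLIED_MARKER in prompt:
        return True
    value = (headers or {}).get(APPLIED_HEADER, "")
    return str(value).lower() == "true"


def mark_applied(prompt: str, headers: dict) -> str:
    """Prefix the marker and set the header so downstream agents skip re-applying."""
    headers[APPLIED_HEADER] = "true"
    return f"{APPLIED_MARKER}\n{prompt}"


headers = {}
prompt = "What item is in slot 25?"
print(already_repeated(prompt, headers))   # False

marked = mark_applied(prompt, headers)
print(already_repeated(marked))            # True (marker found in the prompt)
print(already_repeated(prompt, headers))   # True (header set by the upstream agent)
```

Checking both signals lets a downstream agent skip repetition even if an intermediate step strips the marker from the prompt text.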