
m10-performance

CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache, SIMD, make it faster, 性能优化, 基准测试

Install with Tessl CLI

npx tessl i github:actionbook/rust-skills --skill m10-performance

Performance Optimization

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

  • Have you measured? (Don't guess)
  • What's the acceptable performance?
  • Will optimization add complexity?

Performance Decision → Implementation

| Goal | Design Choice | Implementation |
|------|---------------|----------------|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
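
As a minimal sketch of the "Reduce allocations" row (pre-allocate, reuse), the following uses only the standard library. The names `squares` and `count_errors` are hypothetical and chosen for illustration; they are not part of the skill.

```rust
/// Builds the squares of 0..n with one up-front allocation.
fn squares(n: usize) -> Vec<u64> {
    // Reserve room for all n elements so push never has to grow the buffer.
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

/// Counts lines that contain "ERROR" (case-insensitive, ASCII only),
/// reusing one scratch buffer instead of allocating a new String per line.
fn count_errors(lines: &[&str], scratch: &mut String) -> usize {
    let mut count = 0;
    for line in lines {
        scratch.clear(); // drops the contents but keeps the capacity
        scratch.push_str(line);
        scratch.make_ascii_uppercase();
        if scratch.contains("ERROR") {
            count += 1;
        }
    }
    count
}

fn main() {
    let v = squares(5);
    println!("{:?} (capacity {})", v, v.capacity());

    let mut scratch = String::new();
    let n = count_errors(&["ok", "Error: disk", "error: net"], &mut scratch);
    println!("{n} error lines");
}
```

Whether the reuse pays off depends on how hot the loop is, so it is worth confirming with a profile before restructuring code around a shared buffer.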

Thinking Prompt

Before optimizing:

  1. Have you measured?

    • Profile first → flamegraph, perf
    • Benchmark → criterion, cargo bench
    • Identify actual hotspots
  2. What's the priority?

    • Algorithm (10x-1000x improvement)
    • Data structure (2x-10x)
    • Allocation (2x-5x)
    • Cache (1.5x-3x)
  3. What's the trade-off?

    • Complexity vs speed
    • Memory vs CPU
    • Latency vs throughput
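
For the "Have you measured?" step, criterion is the benchmarking tool named above. When a dependency-free sanity check is enough, a rough timing helper along these lines can give a first signal. `time_it` and `sum_to` are hypothetical names, and the result is a plain mean with no statistical analysis, so treat it as a coarse indication only and build with --release.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times; returns the last result and the mean time per call.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    // black_box discourages the optimizer from discarding the computed value.
    let mut last = black_box(f());
    for _ in 1..iters {
        last = black_box(f());
    }
    (last, start.elapsed() / iters)
}

/// Example workload: sum of 0..=n.
fn sum_to(n: u64) -> u64 {
    (0..=n).sum()
}

fn main() {
    let (total, per_call) = time_it(1_000, || sum_to(black_box(10_000)));
    println!("sum = {total}, about {per_call:?} per call");
}
```

A helper like this does not handle warm-up, outliers, or noise from other processes, which is the gap a statistical harness such as criterion is meant to fill.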

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"
    ↑ Ask: What's the performance SLA?
    ↑ Check: domain-* (latency requirements)
    ↑ Check: Business requirements (acceptable response time)

| Question | Trace To | Ask |
|----------|----------|-----|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"
    ↓ m01-ownership: Use references, avoid clone
    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"
    ↓ m07-concurrency: Choose rayon or threads
    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"
    ↓ Data layout: Prefer Vec over HashMap when possible
    ↓ Access patterns: Sequential over random access
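
The "Need cache efficiency" trace can be sketched as a tiny map stored in one contiguous Vec and searched linearly. `SmallMap` is a hypothetical type written for illustration, with keys fixed to `u32` to keep it short. It is only a reasonable trade for a handful of entries, since lookup is O(n).

```rust
/// A tiny map backed by a contiguous Vec; lookups scan sequentially.
struct SmallMap<V> {
    entries: Vec<(u32, V)>,
}

impl<V> SmallMap<V> {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Inserts or overwrites; returns the previous value if the key existed.
    fn insert(&mut self, key: u32, value: V) -> Option<V> {
        for (k, v) in self.entries.iter_mut() {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    /// Linear scan over adjacent memory; no hashing, no pointer chasing.
    fn get(&self, key: u32) -> Option<&V> {
        self.entries.iter().find(|(k, _)| *k == key).map(|(_, v)| v)
    }
}

fn main() {
    let mut codes = SmallMap::new();
    codes.insert(404, "not found");
    codes.insert(500, "server error");
    println!("{:?}", codes.get(404));
}
```

The size at which a HashMap overtakes this layout depends on key type, hasher, and access pattern, so the crossover is something to benchmark, not assume.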

Quick Reference

| Tool | Purpose |
|------|---------|
| cargo bench | Micro-benchmarks |
| criterion | Statistical benchmarks |
| perf / flamegraph | CPU profiling |
| heaptrack | Allocation tracking |
| valgrind / cachegrind | Cache analysis |

Optimization Priority

1. Algorithm choice     (10x - 1000x)
2. Data structure       (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization   (1.5x - 3x)
5. SIMD/Parallelism     (2x - 8x)
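
For the parallelism entry, the skill points to rayon or threads. The sketch below takes the threads route with `std::thread::scope` so that it runs without external crates. `parallel_sum` and its `workers` parameter are hypothetical names; with rayon the same idea is usually expressed through its parallel iterators.

```rust
use std::thread;

/// Sums `data` by splitting it into roughly equal chunks, one per worker thread.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `data` because they are joined before `scope` returns.
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("total = {}", parallel_sum(&data, 4));
}
```

Spawning threads has a fixed cost, so for small inputs the sequential sum is likely to be faster; this is one reason the list ranks parallelism after algorithm and data-structure choices.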

Common Techniques

| Technique | When | How |
|-----------|------|-----|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
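
The "Avoid cloning" row mentions Cow<T>. As a small sketch, a function can return borrowed data on the common path and allocate only when it actually has to change something. `normalize_tabs` is a hypothetical function used for illustration.

```rust
use std::borrow::Cow;

/// Replaces tabs with spaces, allocating only if the input contains a tab.
fn normalize_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: build a new String with the replacements applied.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Fast path: hand back the original slice, no allocation.
        Cow::Borrowed(input)
    }
}

fn main() {
    for s in ["no tabs here", "one\ttab"] {
        let out = normalize_tabs(s);
        let kind = match &out {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{out:?} ({kind})");
    }
}
```

Callers can treat the result as a `&str` through deref and only call `into_owned()` when they need a `String`.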

Common Mistakes

| Mistake | Why Wrong | Better |
|---------|-----------|--------|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Unoptimized build; timings are not representative | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
| Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
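
To illustrate the LinkedList row, a queue-like workload can use VecDeque, which keeps its elements in a single ring buffer. The sketch below computes a moving average; `moving_average` is a hypothetical function, not something the skill defines.

```rust
use std::collections::VecDeque;

/// Moving average over the last `window` samples, using a ring buffer as the queue.
fn moving_average(samples: &[f64], window: usize) -> Vec<f64> {
    assert!(window > 0, "window must be at least 1");
    let mut recent: VecDeque<f64> = VecDeque::with_capacity(window);
    let mut sum = 0.0_f64;
    let mut out = Vec::with_capacity(samples.len());

    for &x in samples {
        if recent.len() == window {
            // Evict the oldest sample; O(1) and no per-node allocation.
            if let Some(old) = recent.pop_front() {
                sum -= old;
            }
        }
        recent.push_back(x);
        sum += x;
        out.push(sum / recent.len() as f64);
    }
    out
}

fn main() {
    println!("{:?}", moving_average(&[1.0, 2.0, 3.0, 4.0], 2));
}
```

Both `push_back` and `pop_front` are O(1) on VecDeque, so the usual reason for reaching for a linked list (cheap insertion and removal at the ends) is covered without giving up contiguous storage.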

Anti-Patterns

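| Anti-Pattern | Why Bad | Better |
|--------------|---------|--------|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | Repeated reallocation; O(n^2) when each step builds a new String (e.g. s = format!("{s}{x}")) | String::with_capacity, then push_str or write! |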
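
For the string-building anti-pattern, one possible shape is to reserve capacity once and append into the same buffer. `join_ids` is a hypothetical function, and the capacity estimate of four bytes per id is an assumption made for this example.

```rust
use std::fmt::Write;

/// Joins ids with commas, appending into one pre-sized buffer.
fn join_ids(ids: &[u32]) -> String {
    // Rough guess: about 4 bytes per id including the separator.
    // An underestimate only costs an extra reallocation, not correctness.
    let mut out = String::with_capacity(ids.len() * 4);
    for (i, id) in ids.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        // write! formats directly into `out`; no temporary String per item.
        write!(out, "{id}").expect("writing to a String cannot fail");
    }
    out
}

fn main() {
    println!("{}", join_ids(&[1, 22, 333]));
}
```

By contrast, `s = format!("{s}{id}")` inside the loop would copy the whole accumulated string on every iteration, which is where the quadratic cost comes from.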

Related Skills

| When | See |
|------|-----|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |

Repository: actionbook/rust-skills