running tests at various levels from smoke tests to full suite to randomized tests
This skill is for running tests systematically, starting with fast/focused tests and progressing to slower/broader tests. This ordering allows failures to be caught early, minimizing wasted time.
This skill is designed to be run as a subagent to avoid cluttering the invoking agent's context. The output is either confirmation that all tests passed, or a report of failures.
Since subagents cannot ask for clarification, the invoking agent must gather this information before launching:
Changed files/modules: Which files or modules were changed, so the subagent can identify appropriate smoke tests and focused tests.
Test levels to run: Which levels to execute. Options:
The subagent prompt should include: "Run tests for changes in <files/modules>."
To reduce noise and keep agent context manageable, always use these flags:
# Recommended flags for quiet output
--ll fatal # Only log fatal errors (not info/debug messages)
-r simple # Use simple reporter (minimal output)
--disable-dots # Don't print progress dots
--abort # Stop on first failure (don't run remaining tests)
Example:
./stellar-core test --ll fatal -r simple --disable-dots --abort "test name"
If you need more information about a test while diagnosing a failure, raise the log level from fatal to info, debug, or even trace, for example with --ll debug or --ll trace.
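As a convenience, the quiet flags above can be wrapped in a small shell function so they are not retyped for every invocation. This is a minimal sketch, not part of stellar-core: the function name `run_quiet_test` and the `STELLAR_CORE` and `LOG_LEVEL` environment variables are hypothetical names introduced here. Only the flags themselves come from the text above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run stellar-core tests with the quiet flags listed above.
# STELLAR_CORE (binary path) and LOG_LEVEL are assumed names, not stellar-core options.
run_quiet_test() {
    local binary="${STELLAR_CORE:-./stellar-core}"
    local log_level="${LOG_LEVEL:-fatal}"
    # Remaining arguments (test name, tag filter, extra flags) pass through unchanged.
    "$binary" test --ll "$log_level" -r simple --disable-dots --abort "$@"
}

# Usage (commented out so sourcing this file does not start a test run):
#   run_quiet_test "test name"
#   LOG_LEVEL=debug run_quiet_test "test name"   # more detail when diagnosing
```

Keeping the log level as an overridable default means the quiet setting stays the norm, while a single diagnostic run can raise it without editing the command.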
Many tests are protocol-specific and can behave differently across protocol versions. Use these flags to control which protocol versions are tested:
--version <N> # Run tests for a specific protocol version
--all-versions # Run tests for all supported protocol versions
For focused testing during development, test with the current protocol version, which is the default. The full test suite should eventually be run with --all-versions.
Tests use a deterministic PRNG. By default, the seed varies, but you can set a specific seed for reproducibility:
--rng-seed <N> # Use a specific RNG seed for reproducibility
This is useful for reproducing failures or for baseline checks that require consistent output.
Tests are run in order of increasing cost. Stop at the first failure.
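The "cheapest first, stop at the first failure" rule can be sketched as a small driver function. This is an illustrative sketch only: `run_levels_in_order` is a hypothetical name, each argument is assumed to be one complete command line, and the level numbers it prints are simply positions in the argument list.

```bash
#!/usr/bin/env bash
# Run each argument as one test level, cheapest first, stopping at the first failure.
# Each argument is a full command line passed as a single string.
run_levels_in_order() {
    local level=0
    local cmd
    for cmd in "$@"; do
        level=$((level + 1))
        echo "Level ${level}: ${cmd}"
        if ! bash -c "$cmd"; then
            echo "FAIL at level ${level}; skipping remaining levels"
            return 1
        fi
    done
    echo "PASS: all ${level} levels completed"
}

# Usage (commented out so sourcing this file does not start a test run):
#   run_levels_in_order \
#       './stellar-core test --ll fatal -r simple --abort "exact test name"' \
#       './stellar-core test --ll fatal -r simple --abort "[ledgertxn]"' \
#       'make check'
```

Returning as soon as one level fails means the slower levels never start, which is where the time saving comes from.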
Run 2-3 specific tests that are most likely to catch breakage in the changed code. These should complete in seconds.
To identify smoke tests:
# Run a specific test by name (use quotes for exact match)
./stellar-core test --ll fatal -r simple --abort "exact test name"
Run all tests in the test file(s) related to the change. This typically takes a few minutes.
# Run tests matching a tag pattern
./stellar-core test --ll fatal -r simple --abort "[ModuleName*]"
# Run tests from a specific area
./stellar-core test --ll fatal -r simple --abort "[ledgertxn]"
# Combine tags (AND logic - must match all)
./stellar-core test --ll fatal -r simple --abort "[tx][soroban]"
Ledger/Transaction tests:
- "[ledgertxn]" - LedgerTxn operations
- "[tx][payment]" - Payment transaction tests
- "[tx][createaccount]" - CreateAccount tests
- "[tx][offers]" - Offer/DEX tests
- "[tx][soroban]" - Soroban (smart contract) transaction tests
Bucket/BucketList tests:
- "[bucket]" - General bucket tests
- "[bucketlist]" - BucketList specific tests
- "[bucketmergemap]" - Bucket merge map tests
Herder tests:
- "[herder]" - General herder tests
- "[txset]" - Transaction set tests
- "[transactionqueue]" - Transaction queue tests
- "[quorumintersection]" - Quorum intersection tests
- "[upgrades]" - Protocol upgrade tests
Overlay/Network tests:
- "[overlay]" - Overlay network tests
- "[flood]" - Transaction flooding tests
- "[PeerManager]" - Peer management tests
Crypto/Utility tests:
- "[crypto]" - Cryptography tests
- "[decoder]" - Base32/64 encoding tests
- "[timer]" - VirtualClock timer tests
- "[cache]" - Cache implementation tests
Soroban-specific tests:
- "[soroban]" - All Soroban tests
- "[soroban][archival]" - State archival tests
- "[soroban][upgrades]" - Soroban upgrade tests
Run the complete unit test suite. This may take 10-30 minutes.
make check
Or directly with quiet output:
./stellar-core test --ll fatal -r simple --disable-dots --abort
For faster execution, use parallel partitions via make check:
# Run with partitions equal to CPU cores
NUM_PARTITIONS=$(nproc) make check
The full test suite should be run with all protocol versions:
ALL_VERSIONS=1 NUM_PARTITIONS=$(nproc) make check
To test with SQLite only (faster, no Postgres dependency):
./configure --disable-postgres --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
NUM_PARTITIONS=$(nproc) make check
This validates that transaction test execution produces the same metadata hashes as fixed baselines stored in the repository. This catches unintended changes to transaction semantics.
Important: Always use --rng-seed 12345 for baseline checks to ensure
deterministic results.
# Check transaction tests against current protocol baseline
./stellar-core test "[tx]" --all-versions --rng-seed 12345 --ll fatal \
--abort -r simple --check-test-tx-meta test-tx-meta-baseline-current
For next-protocol testing (when preparing protocol upgrades):
./stellar-core test "[tx]" --all-versions --rng-seed 12345 --ll fatal \
--abort -r simple --check-test-tx-meta test-tx-meta-baseline-next
If the check fails after an intentional change, the output indicates which baselines differ and therefore need updating.
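The two baseline checks differ only in the baseline name, so they can be folded into one function that always applies the fixed seed. This is a sketch under stated assumptions: `check_tx_meta_baseline` and the `STELLAR_CORE` variable are hypothetical names introduced here, while the flags and baseline names are taken from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the tx-meta baseline checks above.
# Argument: "current" or "next" (selects which baseline to check against).
check_tx_meta_baseline() {
    local baseline="${1:-current}"
    local binary="${STELLAR_CORE:-./stellar-core}"
    case "$baseline" in
        current|next) ;;
        *)
            echo "usage: check_tx_meta_baseline current|next" >&2
            return 2
            ;;
    esac
    # The fixed seed (12345) keeps the generated metadata deterministic.
    "$binary" test "[tx]" --all-versions --rng-seed 12345 --ll fatal \
        --abort -r simple --check-test-tx-meta "test-tx-meta-baseline-${baseline}"
}

# Usage (commented out so sourcing this file does not start a test run):
#   check_tx_meta_baseline current
#   check_tx_meta_baseline next
```

Hard-coding the seed inside the function removes the chance of running a baseline check with a varying seed by mistake.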
When to run: Only needed for changes touching memory management, pointers, concurrency, or threading code. Skip for simple logic changes, config changes, or test-only changes.
Run tests with sanitizers enabled to catch memory errors and undefined behavior. This requires reconfiguring and rebuilding.
Catches memory errors: buffer overflows, use-after-free, memory leaks.
./configure --enable-asan --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
Catches data races and threading issues.
./configure --enable-threadsanitizer --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
Catches undefined behavior like integer overflow, null pointer dereference.
./configure --enable-undefinedcheck --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
When to run: Only for changes to core data structures or when Level 4 sanitizers found something suspicious. Usually overkill.
Run with C++ standard library debugging enabled. Slower but catches more issues.
./configure --enable-extrachecks --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
Before running tests at Levels 4-6, also verify the build succeeds with --disable-tests (the production configuration):
./configure --disable-tests --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
This doesn't run tests but ensures the production build works.
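The sanitizer, extra-checks, and production builds above all follow the same reconfigure-and-rebuild pattern and differ only in one ./configure flag. The sketch below captures that pattern. The short mode names (`asan`, `tsan`, `ubsan`, `extrachecks`, `production`) and both function names are hypothetical labels introduced here; the configure flags are the ones shown above, and `nproc` is assumed to be available as in the original commands.

```bash
#!/usr/bin/env bash
# Map a short, hypothetical build-mode name to the ./configure flag from the text above.
configure_flag_for() {
    case "$1" in
        asan)        echo "--enable-asan" ;;
        tsan)        echo "--enable-threadsanitizer" ;;
        ubsan)       echo "--enable-undefinedcheck" ;;
        extrachecks) echo "--enable-extrachecks" ;;
        production)  echo "--disable-tests" ;;
        *)           echo "unknown build mode: $1" >&2; return 1 ;;
    esac
}

# Reconfigure and rebuild from scratch for the given mode.
rebuild_for() {
    local flag
    flag=$(configure_flag_for "$1") || return 1
    ./configure "$flag" --enable-ccache --enable-sdfprefs &&
        make clean &&
        make -j "$(nproc)"
}

# Usage (commented out so sourcing this file does not start a rebuild):
#   rebuild_for production   # build verification only, no tests to run
#   rebuild_for asan && ./stellar-core test --ll fatal -r simple --disable-dots --abort
```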
When a test fails:
Report the results:
## Test Results: PASS
All test levels completed successfully:
- Level 1 (Smoke): 3 tests, 2.1s
- Level 2 (Focused): 47 tests, 1m 12s
- Level 3 (Full Suite): 1,234 tests, 18m 45s
- Level 3b (TX Meta Baseline): OK
Build verification:
- --disable-tests: OK
Or on failure:
## Test Results: FAIL
Failed at Level 2 (Focused Unit Tests)
**Failing test:** `LedgerManagerTests.processTransactionRejectsEmpty`
**File:** src/ledger/LedgerManagerTests.cpp:142
**Error:**
REQUIRE( result == TRANSACTION_REJECTED )
with expansion:
TRANSACTION_SUCCESS == TRANSACTION_REJECTED
**Analysis:** The test expects empty transactions to be rejected, but the
new code path is allowing them through. See LedgerManager.cpp:98 where the
empty check appears to be missing.
Levels completed before failure:
- Level 1 (Smoke): 3 tests, 2.1s ✓
For most changes (logic fixes, new features, refactors):
--all-versions
For memory-sensitive changes (pointers, allocations, C++ containers):
For concurrency changes (threading, async, locks):
For test-only changes or documentation:
- Stop at the first failure (--abort flag)
- --ll fatal -r simple --disable-dots for quiet output
- --all-versions before considering complete
- --rng-seed 12345 for tx-meta baseline checks
Report to the invoking agent: