Structured performance opportunity investigation for SpiderMonkey (the Firefox JavaScript engine). Use this skill when the user wants to investigate JS engine performance, profile SpiderMonkey, find optimization opportunities, write performance patches, or evaluate benchmark regressions. Trigger on mentions of: profiling JS, SpiderMonkey performance, JIT optimization, benchmark regression analysis, shell benchmarking, or any request to make JS workloads faster. The methodology is described mostly for the JS shell but can be adapted to browser investigation.
This skill guides a structured, evidence-driven performance investigation for the SpiderMonkey JavaScript engine. The methodology has four phases: hypothesis generation, evidence gathering, patch writing, and evaluation. Each phase builds on the last: resist the urge to skip ahead to writing patches before you have empirical evidence that a change will help.
When asked to create multiple patches, iterate through the phases each time so that each patch is independently validated and measured. Always commit the current patch before moving on to a new one. This makes each patch easier to review and its contribution easier to measure.
The end result of this skill will be a summary of the investigation, and one or more patches that measurably improve the performance of the targeted workload, with each patch describing supporting evidence and measured impact.
The user should provide:
You have access to:
samply: sampling profiler that produces Firefox Profiler-compatible output
profiler-cli: for analyzing profiles. This can also be used to investigate Gecko profiler profiles if the investigation is being done in the browser.
searchfox-cli: source code search for the Firefox codebase
For more details on how to use these tools, load the "profiler-analysis" skill, which will also hint at how to get the tools installed if needed.
You can create an artifacts/ directory for profiles, logs, and results; it is excluded from version control.
The goal is to identify where time is being spent and form testable hypotheses about what could be improved.
Use an opt-nodebug (optimized, no debug checks) build. Debug builds distort profiles with assertion overhead.
The user should provide or confirm the mozconfig to use. The key settings for an opt-nodebug build are:
ac_add_options --enable-optimize
ac_add_options --disable-debug
If the user hasn't specified a mozconfig, ask them: build configurations vary across machines and the user will know which obj-dir and config is appropriate for their setup.
Always run the shell with --strict-benchmark-mode when investigating performance.
This flag validates the runtime configuration and will error if something would produce
unreliable numbers (e.g. JIT is disabled unexpectedly). Generating profiles without this
flag risks producing misleading data.
Examine the workload to understand what it does. If the workload has an iteration count or loop parameter, determine an appropriate count so that the workload runs for at least 30 seconds under profiling. Statistical profilers need sufficient samples to produce meaningful data — short runs produce noisy profiles where real hotspots are hard to distinguish from sampling noise.
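One way to pick the count is to time a short calibration run and scale it up. The sketch below is illustrative only: the function name and the calibration numbers are hypothetical, and it assumes run time grows roughly linearly with the iteration count. JIT warm-up makes the first iterations slower than later ones, so the scaled count may fall short of the target; confirm it with a real run.

```python
import math


def iterations_for_duration(calib_iterations: int, calib_seconds: float,
                            target_seconds: float = 30.0) -> int:
    """Scale a short calibration run up to a target profiling duration.

    Assumes wall time is roughly proportional to the iteration count.
    """
    if calib_iterations <= 0 or calib_seconds <= 0:
        raise ValueError("calibration run needs positive iterations and duration")
    # iterations needed = target time / (time per iteration)
    needed = target_seconds * calib_iterations / calib_seconds
    return max(1, math.ceil(needed))


# Hypothetical calibration: 100 iterations took 4.0 seconds.
print("30s run:", iterations_for_duration(100, 4.0))
print("60s run:", iterations_for_duration(100, 4.0, target_seconds=60.0))
```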
For targeted micro-optimizations (e.g. improving a single opcode or a specific stub), longer runs (60s+) may be necessary to accumulate enough samples in the specific code path of interest.
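To see why run length matters, a back-of-the-envelope estimate helps. The sketch below models each sample as an independent draw (a simplification; real samples are correlated) and assumes a 1000 Hz sampling rate, which is an assumption about the profiler configuration rather than something this skill specifies. The 0.5% share is a made-up example. It estimates how many samples land in a code path and how noisy its measured share is.

```python
import math


def expected_samples(duration_s: float, rate_hz: float, fraction: float) -> float:
    """Expected number of samples that land in a code path."""
    return duration_s * rate_hz * fraction


def relative_error(duration_s: float, rate_hz: float, fraction: float) -> float:
    """Relative standard error of the measured share under a binomial model.

    With n samples and true share p, SE(p_hat) = sqrt(p * (1 - p) / n),
    so the relative error is sqrt((1 - p) / (n * p)).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = duration_s * rate_hz
    if n <= 0:
        raise ValueError("duration and rate must be positive")
    return math.sqrt((1.0 - fraction) / (n * fraction))


RATE_HZ = 1000.0  # assumed sampling rate
FRACTION = 0.005  # hypothetical: a stub taking 0.5% of run time

for seconds in (5, 30, 60):
    samples = expected_samples(seconds, RATE_HZ, FRACTION)
    err = relative_error(seconds, RATE_HZ, FRACTION)
    print(f"{seconds:>3}s: ~{samples:.0f} samples, ~{err:.0%} relative error")
```

Under these assumptions the error shrinks with the square root of the run length, so doubling the duration only reduces it by a factor of about 1.4. That is why small code paths need disproportionately long runs.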
If the workload driver supports iteration configuration, prefer that.
Otherwise, wrap it:
for (let i = 0; i < ITERATIONS; i++) {
  load("workload.js"); // or call the main function
}
Record a profile with samply. Always set IONPERF=func and PERF_SPEW_DIR so that
JIT-compiled functions appear with readable names in the profile instead of raw addresses.
The overhead is negligible:
mkdir -p artifacts/perf-spew
PERF_SPEW_DIR=artifacts/perf-spew IONPERF=func \
samply record --save-only -o artifacts/profile.json.gz -- \
./obj-opt-nodebug/dist/bin/js --strict-benchmark-mode workload.js
Using --save-only avoids opening the browser and gives you a local file you can analyze
with profiler-cli. Save profiles to the artifacts/ directory; you may need to gzip
the profile for profiler-cli to read it.
For deeper JIT investigation (e.g. understanding what IR the JIT emitted for a hot
function), use IONPERF=ir instead — see references/advanced-tools.md.
Start broad and narrow down. Looking at the profile, answer some of the following questions:
For Speedometer profiles, always use --focus-marker="-async,-sync" to exclude async idle
time between benchmark iterations.
Based on the profile data, form specific, testable hypotheses. Good hypotheses look like:
Bad hypotheses (avoid these):
Before writing a patch, gather enough evidence to be confident the hypothesis is sound.
Use searchfox-cli to read the relevant code and understand its current behavior.
Also use searchfox-cli for blame on that code, and check the git history of the relevant files. This may explain why things are the way they are.
Profiling shows where time is spent but not always why. When your hypothesis depends on runtime state (data distributions, cache hit rates, list lengths, frequency of code paths), add temporary instrumentation to measure it directly.
Use MOZ_LOG or JS_LOG for instrumentation.
JS_LOG(debug /* you can also add your own channel, but debug should be unused */, Debug,
       "list length: %zu, sorted: %s", list.length(), isSorted ? "yes" : "no");
Throttle instrumentation output when it would fire on every iteration: use a counter to log every Nth occurrence, or accumulate statistics and log a summary. Unthrottled logging in a hot path will drown the output and slow the workload enough to distort measurements.
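static uint32_t callCount = 0;
if (++callCount % 10000 == 0) {
  // totalLength is assumed to be a running sum maintained alongside callCount.
  // JS_LOG takes printf-style format strings, matching the example above.
  JS_LOG(debug, Debug, "after %u calls: avg length = %zu",
         callCount, totalLength / callCount);
}
Re-run with MOZ_LOG=debug:5 to see the output.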
In a browser build you can add profiler markers instead of logging; these can be read through gecko-profiling and the profiler-cli.
Run the instrumented build and collect the data. This confirms whether your hypothesis about runtime behavior is correct before you invest in writing a real patch.
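Once the instrumented run has produced output, a small script can turn the raw lines into a summary. This sketch parses the "list length: ..., sorted: ..." message from the JS_LOG example above. The "[debug]" prefix in the sample lines is made up for illustration, since the exact log prefix depends on the logging setup, so the pattern matches only the message body.

```python
import re
from collections import Counter
from statistics import median

# Matches the message body only; whatever prefix the logger adds is ignored.
LINE_RE = re.compile(r"list length: (\d+), sorted: (yes|no)")


def summarize(lines):
    """Summarize list lengths and sortedness from instrumentation output."""
    lengths = []
    sorted_counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip unrelated output
        lengths.append(int(match.group(1)))
        sorted_counts[match.group(2)] += 1
    if not lengths:
        return None
    return {
        "count": len(lengths),
        "median_length": median(lengths),
        "max_length": max(lengths),
        "sorted_fraction": sorted_counts["yes"] / len(lengths),
    }


# Hypothetical log excerpt; in practice, read the captured log file instead.
sample = [
    "[debug] list length: 3, sorted: yes",
    "unrelated line",
    "[debug] list length: 120, sorted: no",
    "[debug] list length: 5, sorted: yes",
]
print(summarize(sample))
```

A median alongside the maximum helps separate "lists are usually short" from "lists are occasionally huge", which often lead to different optimizations.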
Now that you have evidence, write the patch.
Where possible, gate the optimization behind a JS::Prefs preference so you can do apples-to-apples comparison on the same binary. This eliminates build-to-build variation as a confounding factor and makes it trivial to re-measure later.
To add the pref, add an entry to StaticPrefList.yaml:
- name: javascript.options.experimental.my_optimization
  type: bool
  value: true
  mirror: always
  set_spidermonkey_pref: always
Then guard the code path:
if (JS::Prefs::experimental_my_optimization()) {
  // new path (default: on)
} else {
  // old path
}
Use set_spidermonkey_pref: always (not startup) so the pref can be toggled via
--setpref without requiring a restart:
# Measure with optimization (default):
./js --strict-benchmark-mode workload.js
# Measure without:
./js --strict-benchmark-mode --setpref experimental.my_optimization=false workload.js
Note that pref-gating is not always feasible. For changes on extremely hot paths (tight JIT loops, inline caches), the branch on the pref check itself can be costly enough to distort measurements. In those cases, fall back to saving the obj-dir from a build without the patch and comparing against a build with the patch applied.
Note: You can't save just the js binary, because it depends on dynamically linked libraries.
Always save the obj-dir, or create a different mozconfig.
During patch development, add JS_LOG logging to the debug channel to verify the new
code path is being taken where expected. Throttle by a counter to avoid flooding output.
Do a run with the instrumentation logging to ensure the logging fires when/where/as-much
as expected. Remove or reduce this logging before the patch is finalized.
For a given optimization, it is often worthwhile to also write a microbenchmark that shows what the optimization can achieve under ideal circumstances. This is not a replacement for measuring the real workload, but it is a useful sanity check that the optimization works as intended and has the potential to produce the expected impact. It can also help when deciding whether to keep patches that are effective in the microbenchmark but don't show good impact on the real workload.
When investigating multiple optimization opportunities:
Run the workload with and without the patch (using the pref toggle or separate builds).
If hyperfine is available, you can use that. If not, start with 5 runs of each configuration, collecting timing results into arrays.
# With pref-gated optimization — collect results into a file:
for i in $(seq 1 5); do
./js --strict-benchmark-mode --setpref experimental.my_optimization=true workload.js \
2>&1 | tee -a artifacts/results_with.txt
done
for i in $(seq 1 5); do
./js --strict-benchmark-mode --setpref experimental.my_optimization=false workload.js \
2>&1 | tee -a artifacts/results_without.txt
done
After collecting initial results, use a Python script to assess whether the sample size is sufficient. Use the Mann-Whitney U test (non-parametric, robust to the non-normal distributions common in benchmark data) to test for significance:
# /// script
# dependencies = [
# "numpy",
# "scipy",
# ]
# ///
# use `uv run script.py` and deps should be automatically installed
import numpy as np
from scipy import stats
baseline = np.array([...]) # times without patch
patched = np.array([...]) # times with patch
stat, p_value = stats.mannwhitneyu(baseline, patched, alternative='two-sided')
effect_size = (np.mean(baseline) - np.mean(patched)) / np.mean(baseline) * 100
print(f"Baseline: {np.mean(baseline):.2f} +/- {np.std(baseline):.2f}")
print(f"Patched: {np.mean(patched):.2f} +/- {np.std(patched):.2f}")
print(f"Effect: {effect_size:.2f}%")
print(f"p-value: {p_value:.4f}")
if p_value > 0.05:
    print("Result not statistically significant at p<0.05; consider more runs")
If the p-value is borderline (0.01 < p < 0.10) or the effect size is small relative to the observed variance, collect additional runs and retest. But do not exceed 20 runs per configuration. If 20 runs on each side still can't produce a significant result, the effect is too close to the noise floor to be meaningfully measured this way. That's a signal to step back and reconsider: either the optimization isn't having the expected impact, or the workload needs to be restructured to isolate the effect better (e.g. more iterations of the hot path, a more targeted microbenchmark).
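The stopping rule above can be written down as a small decision helper. This is a sketch of one reading of that rule, not a prescribed procedure: the function name and return values are hypothetical, and it looks only at the p-value (judging effect size against variance is left to you).

```python
def next_step(p_value: float, runs_per_config: int, max_runs: int = 20) -> str:
    """Decide what to do after a Mann-Whitney U test on benchmark timings.

    Returns one of:
      "significant"  - the difference is clear enough to report
      "collect_more" - add runs to both configurations and retest
      "reconsider"   - run budget exhausted; rethink the patch or the workload
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= 0.01:
        return "significant"
    if runs_per_config < max_runs:
        # Borderline (0.01 < p < 0.10) or not significant: gather more data.
        return "collect_more"
    # At the run cap, accept p <= 0.05; otherwise the effect is in the noise.
    return "significant" if p_value <= 0.05 else "reconsider"


# Hypothetical p-values at different stages of an investigation.
for p, runs in [(0.004, 5), (0.03, 5), (0.03, 20), (0.20, 20)]:
    print(f"p={p:.3f} with {runs} runs per side -> {next_step(p, runs)}")
```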
Don't just measure — profile again to confirm the patch is having the expected effect. The profile should show reduced time in the targeted code path. If it doesn't, investigate why.
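A simple way to check this is to compare the targeted function's share of samples before and after the patch. The sketch below assumes you have already extracted per-function sample counts from each profile into dictionaries (how to export them from the profile tooling is not shown). The function names and counts are invented for illustration.

```python
def sample_share(samples_by_function: dict[str, int], target: str) -> float:
    """Fraction of all samples attributed to the target function."""
    total = sum(samples_by_function.values())
    if total <= 0:
        raise ValueError("profile has no samples")
    return samples_by_function.get(target, 0) / total


def share_change(before: dict[str, int], after: dict[str, int], target: str) -> float:
    """Change in the target's share, in percentage points (negative = less time)."""
    return (sample_share(after, target) - sample_share(before, target)) * 100.0


# Hypothetical sample counts from a baseline and a patched profile.
before = {"hot_stub": 3000, "other": 7000}
after = {"hot_stub": 1800, "other": 7000}

print(f"before: {sample_share(before, 'hot_stub'):.1%}")
print(f"after:  {sample_share(after, 'hot_stub'):.1%}")
print(f"change: {share_change(before, after, 'hot_stub'):+.1f} percentage points")
```

Shares are used instead of raw counts because the two runs rarely collect the same number of samples.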
After each patch is written, but before it is committed, run the correctness test suites.
Both of these must pass. Test with opt-nodebug first (because you already have that build), but also test with an opt-debug build: many debug-only assertions catch errors, and they need to be exercised too.
./mach jit-test
./mach jstests
If the patch touches GC-related code, run both suites with --jitflags=all for more
thorough coverage:
./mach jit-test --jitflags=all
./mach jstests --jitflags=all
Beyond the test suites, consider adding test cases to address
Produce a summary document (outside the source tree, e.g. in artifacts/) that records:
--strict-benchmark-mode: Without this flag, the shell may be in a
configuration that produces misleading numbers. Always use it.