Maintains a persistent, interlinked Markdown wiki between immutable raw sources and answers: it ingests, queries, lints, indexes, and logs, compounding knowledge instead of one-shot RAG.
- Best practices: 94%
- Impact: 96%
- Average score across 3 eval scenarios: 0.97x
- Status: Passed (no known issues)
A machine learning researcher named Dr. Chen has been accumulating raw notes and papers in a local folder for several months. She has decided to start maintaining a structured knowledge base to help her synthesize ideas across sources. She already wrote a configuration file (AGENTS.md) that defines where things should live in her wiki—including the raw sources root, the wiki root, the index filename, the log filename, and naming conventions for topic pages.
Dr. Chen has one raw source ready to ingest: a recent summary she wrote on attention mechanisms. She wants the agent to set up the wiki and ingest this article so the knowledge base is ready to use. She expects the agent to respect her directory layout exactly as configured, rather than making up its own folder structure.
Ingest the raw source according to the wiki schema defined in AGENTS.md. The expected outputs are:
All file paths used must match the schema configuration—do not create directories or files at paths not defined in or derived from the schema.
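As a rough illustration only (not part of the task inputs), the directory skeleton implied by the schema in AGENTS.md could be created like this; the paths are taken from the schema, and the touched files would still need their index/log content written per the configured formats:

```shell
# Sketch: create the wiki skeleton at the paths the AGENTS.md schema defines.
mkdir -p notes/raw notes/wiki/topics notes/wiki/people   # sources root + wiki sections
touch notes/wiki/index.md notes/wiki/log.md              # index and log files
```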
The following files are provided as inputs. Extract them before beginning.
=============== FILE: AGENTS.md ===============
- Raw sources root: `notes/raw/`
- Wiki root: `notes/wiki/`
- Index file: `notes/wiki/index.md`
- Log file: `notes/wiki/log.md`
- Topic pages live in `notes/wiki/topics/` (e.g. `attention-mechanisms.md`)
- People pages live in `notes/wiki/people/`, named `firstname-lastname.md`
- Log entry heading format: `## [YYYY-MM-DD] <keyword> | <short title>`
- The index groups entries under: `## Topics`, `## People`, `## Sources`
=============== FILE: notes/raw/2026-03-15-attention-mechanisms.md ===============
Date: 2026-03-15
Author: Vaswani et al. (summary by Dr. Chen)
Attention mechanisms allow neural networks to selectively focus on different parts of the input when producing each output token. The key insight is the scaled dot-product attention formula:
Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) * V
where Q, K, V are the query, key, and value matrices, and d_k is the key dimension. Scaling by sqrt(d_k) keeps the dot products from growing with d_k, which would otherwise push the softmax into saturated regions where gradients vanish.
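The formula above can be sketched directly in NumPy; this is a minimal illustration (matrix shapes and the function name are illustrative, not from the note):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k) scaled similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                     # weighted sum of value rows

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the rows of V, weighted by how well the corresponding query matches each key.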
Multi-head attention runs h parallel attention operations ("heads") over different linear projections of Q, K, V, then concatenates and projects the results. This lets the model attend to different representation subspaces simultaneously.
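A minimal sketch of the multi-head step, assuming per-head projection matrices and an output projection W_o (all names and shapes here are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, heads, W_o):
    """heads: list of (W_q, W_k, W_v) projection triples, one per head."""
    outputs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v               # per-head projections
        d_k = Q.shape[-1]
        outputs.append(softmax(Q @ K.T / np.sqrt(d_k)) @ V)
    return np.concatenate(outputs, axis=-1) @ W_o          # concat heads, then project

rng = np.random.default_rng(1)
n, d_model, h = 5, 8, 2
d_k = d_model // h
heads = [(rng.normal(size=(d_model, d_k)),
          rng.normal(size=(d_model, d_k)),
          rng.normal(size=(d_model, d_k))) for _ in range(h)]
W_o = rng.normal(size=(h * d_k, d_model))
X = rng.normal(size=(n, d_model))                          # a sequence of n token vectors
out = multi_head_attention(X, heads, W_o)
```

Because each head attends over its own low-dimensional projection, the heads can specialize in different representation subspaces, as the note describes.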
Self-attention builds on sequence-to-sequence models and earlier additive attention (Bahdanau et al., 2015), but replaces recurrence entirely.