Core documentation generation patterns and framework for Treasure Data pipeline layers. Provides shared templates, quality validation, testing framework, and Confluence integration used by all layer-specific documentation skills.
Core documentation generation framework providing shared patterns, templates, and utilities used by all APS layer-specific documentation skills (ingestion, hist-union, staging, id-unification, golden).
Use this skill when:
Note: For layer-specific documentation, use the specialized skills:

- aps-doc-skills:ingestion for ingestion layers
- aps-doc-skills:hist-union for hist-union workflows
- aps-doc-skills:staging for staging transformations
- aps-doc-skills:id-unification for ID unification
- aps-doc-skills:golden for golden layers

WITHOUT codebase access = NO documentation. Period.
If no codebase access provided:
I cannot create technical documentation without codebase access.
Required:
- Directory path to code
- Access to relevant files (.dig, .sql, .yml)
Without access, I cannot extract real configurations, SQL, or workflow logic.
Provide path: "Code is in /path/to/layer/"

Before proceeding:
All documentation MUST contain real data from codebase:
NO generic templates. Only production-ready, codebase-driven documentation.
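The access requirement above can be checked mechanically before any documentation is drafted. The following is a minimal sketch, assuming the layer lives in a local directory. The function names `find_layer_files` and `has_codebase_access` are illustrative and not part of the skill itself.

```python
from pathlib import Path

# Extensions the skill says it needs access to (.dig, .sql, .yml).
REQUIRED_EXTENSIONS = (".dig", ".sql", ".yml")


def find_layer_files(layer_dir):
    """Group files under layer_dir by extension.

    Hypothetical helper: raises FileNotFoundError when the directory is
    missing, so the caller can refuse to generate documentation.
    """
    root = Path(layer_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"No codebase access: {layer_dir} is not a directory")
    found = {ext: [] for ext in REQUIRED_EXTENSIONS}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in found:
            found[path.suffix].append(path)
    return found


def has_codebase_access(layer_dir):
    """Return True only if at least one workflow, SQL, or config file exists."""
    try:
        found = find_layer_files(layer_dir)
    except FileNotFoundError:
        return False
    return any(found.values())
```

A caller would stop and ask for a path whenever `has_codebase_access` returns `False`, rather than falling back to a generic template.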
All documentation follows a template-driven approach:
Process:
Benefits:
1. Fetch existing Confluence page (if provided)
2. Extract structure:
   - Section headings hierarchy
   - Content organization patterns
   - Tables and formatting styles
   - Code block conventions
3. Identify required sections
4. Map sections to codebase elements

1. Locate relevant files:
   - Workflow files (.dig)
   - Configuration files (.yml)
   - SQL/transformation files (.sql)
   - README and other documentation, if any
2. Extract metadata:
   - Table schemas (columns, types, nullability)
   - Data lineage (source → destination)
   - Dependencies (what depends on what)
   - Configuration parameters
3. Analyze patterns:
   - Processing logic (incremental, full, batch)
   - Error handling strategies
   - Performance optimizations
   - Security patterns (PII, auth)

1. Create outline matching template
2. Populate sections with codebase data:
   - Use actual file names and paths
   - Include real configuration examples
   - Show actual SQL transformations
   - Document real table/column names
3. Add visual elements:
   - Mermaid diagrams (flow, ERD, dependencies)
   - Tables (configuration, mappings, metrics)
   - Code blocks (with syntax highlighting)
4. Validate quality (60+ checks)
5. Test code examples (execute SQL, validate YAML)
6. Publish to Confluence

Use this structure as the base template for all layer documentation:
# {Layer Name}
## Overview
Brief introduction explaining purpose and key characteristics.
### Key Characteristics
* **Engine**: Processing engine (Presto/Trino, Hive, etc.)
* **Architecture**: Processing approach (loop-based, parallel, etc.)
* **Processing Mode**: Incremental/Full/Batch
* **Location**: File system path
---
## Architecture Overview
### Directory Structure
```
layer_directory/
├── main_workflow.dig
├── config/
│   └── configuration.yml
├── sql/ or queries/
│   └── transformation.sql
└── README.md
```
### Core Components
Detailed description of each component.
---
## Processing Flow
### Initial Load (if applicable)
Step-by-step description of first-time processing.
### Incremental Load
Step-by-step description of ongoing processing.
---
## Configuration
Complete configuration reference with examples.
---
## Monitoring and Troubleshooting
### Monitoring Queries
Executable SQL queries for checking status.
### Common Issues
Issue descriptions with solutions.
---
## Best Practices
Numbered list of recommendations.
---
## Summary
Key takeaways and benefits.

Generate Mermaid diagrams to visualize architecture:
```mermaid
graph LR
    A[Source] -->|Process| B[Destination]
    B -->|Transform| C[Output]
```

```mermaid
graph TD
    Start[Start] --> Task1[Task 1]
    Task1 --> Parallel{Parallel?}
    Parallel -->|Yes| Task2A[Task 2A]
    Parallel -->|Yes| Task2B[Task 2B]
    Task2A --> End[End]
    Task2B --> End
```

```mermaid
erDiagram
    TABLE_A ||--o{ TABLE_B : "has"
    TABLE_B ||--|| TABLE_C : "references"
```

```mermaid
graph TB
    A[Source A] --> D[Target D]
    B[Source B] --> D
    C[Source C] --> E[Target E]
    D --> F[Final]
    E --> F
```

Extract and document schemas:
```sql
-- Get schema
DESCRIBE {database}.{table};
SHOW COLUMNS FROM {database}.{table};
```

Document in table format:
| Column | Type | Nullable | Description | Source | Transformation |
|---|---|---|---|---|---|
| id | BIGINT | NO | Primary key | source.id | CAST(id AS BIGINT) |
| email | VARCHAR | YES | Email address | source.email | LOWER(TRIM(email)) |
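
A table in this layout can be generated from extracted column metadata instead of being typed by hand. The sketch below assumes the metadata is already available as a list of dictionaries. That input shape and the function name `schema_to_markdown` are assumptions made for illustration, not part of the skill.

```python
HEADERS = ["Column", "Type", "Nullable", "Description", "Source", "Transformation"]
KEYS = ["name", "type", "nullable", "description", "source", "transformation"]


def schema_to_markdown(columns):
    """Render column metadata as a Markdown table.

    `columns` is a list of dicts keyed by the names in KEYS
    (a hypothetical input shape chosen for this sketch).
    """
    lines = [
        "| " + " | ".join(HEADERS) + " |",
        "|" + "---|" * len(HEADERS),
    ]
    for col in columns:
        cells = []
        for key in KEYS:
            if key == "nullable":
                # Treat a missing flag as nullable rather than claiming NOT NULL.
                value = "YES" if col.get("nullable", True) else "NO"
            else:
                value = str(col.get(key, ""))
            # Escape pipes so values such as SQL expressions do not break the table.
            cells.append(value.replace("|", "\\|"))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(schema_to_markdown([
        {
            "name": "id",
            "type": "BIGINT",
            "nullable": False,
            "description": "Primary key",
            "source": "source.id",
            "transformation": "CAST(id AS BIGINT)",
        },
    ]))
```

Feeding it the output of the schema queries keeps column names and types tied to the codebase rather than to a template.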
```sql
SELECT
    '{table}' as table_name,
    COUNT(*) as total_rows,
    COUNT(DISTINCT primary_key) as unique_records,
    MIN(time) as earliest_record,
    MAX(time) as latest_record
FROM {database}.{table};
```

```
column_name:
  Source: source_system.source_table.source_column
  → Raw: raw_db.raw_table.column (as-is)
  → Staging: stg_db.stg_table.column_std (UPPER(TRIM(column)))
  → Unified: unif_db.unif_table.column (from staging)
  → Golden: gld_db.gld_table.column (SCD Type 2)
```

Before publishing, validate:
```sql
-- Test monitoring queries
SELECT * FROM {database}.{log_table}
WHERE source = '{source}'
ORDER BY time DESC
LIMIT 10;

-- Test schema queries
DESCRIBE {database}.{table};

-- Test volume queries
SELECT COUNT(*) FROM {database}.{table};
```

```bash
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('config.yml'))"

# Check for placeholders
grep -r "your_\|example_\|placeholder" config.yml
```

```sql
-- Verify table exists
SHOW TABLES IN {database} LIKE '{table}';

-- Verify columns exist
SELECT column_name FROM information_schema.columns
WHERE table_schema = '{database}'
  AND table_name = '{table}';
```

Tool: mcp__atlassian__createConfluencePage
Parameters:
- cloudId: "https://treasure-data.atlassian.net"
- spaceId: "{numeric space ID}"
- title: "Clear, descriptive title"
- body: "Complete Markdown content"
- parentId: "{parent page ID}" (optional, for hierarchy)

Tool: mcp__atlassian__updateConfluencePage

Parameters:
- cloudId: "https://treasure-data.atlassian.net"
- pageId: "{existing page ID}"
- body: "Updated Markdown content"
- title: "New title" (optional)
- versionMessage: "Description of changes" (optional)

For complex layers with multiple components:
## Components
1. [**Component 1**](https://treasure-data.atlassian.net/wiki/spaces/.../pages/.../Component+1)
   - Description
2. [**Component 2**](https://treasure-data.atlassian.net/wiki/spaces/.../pages/.../Component+2)
   - Description

For layers with multiple workflows/tables:
Document performance characteristics:
| Metric | Value | Benchmark |
|---|---|---|
| Avg Processing Time | 15 min | < 30 min SLA |
| Peak Memory Usage | 8 GB | 12 GB limit |
| Avg Rows/Day | 2.5M | Growing 10% monthly |
Document PII and compliance:
| Table | PII Columns | Protection | Retention | Access |
|---|---|---|---|---|
| table_a | email, phone | SHA256 | 7 years | Restricted |
| table_b | ip_address | Anonymization | 90 days | Internal |
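
The SHA256 entry in the Protection column refers to hashing a PII value before it is stored or shared. The sketch below shows one way this could look in Python, assuming emails are normalized along the lines of the `LOWER(TRIM(email))` rule from the schema example. The function name `hash_email` and the optional `salt` parameter are illustrative and not taken from any codebase.

```python
import hashlib


def hash_email(email, salt=""):
    """Return the SHA-256 hex digest of a normalized email address.

    Normalization approximates LOWER(TRIM(email)): surrounding whitespace
    is stripped and the value is lowercased. The salt is a hypothetical
    addition for this sketch.
    """
    normalized = email.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
```

Normalizing before hashing means that differently formatted copies of the same address map to the same digest, which keeps joins on the hashed column stable.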
Track documentation changes:
| Version | Date | Changed By | Changes | Impact |
|---|---|---|---|---|
| v2.1 | 2025-11-27 | Claude | Added 3 tables | Low |
| v2.0 | 2025-11-15 | Team | Migrated engine | High |
Solutions:
```bash
ls
find . -name "*.dig"
```

Solutions:
Solutions:
Solutions:
The APS Documentation Core provides:
This core framework is used by all layer-specific skills to ensure consistent, high-quality documentation across all Treasure Data pipeline layers.