cluster-documents

Automated content similarity and grouping analysis. Groups related documents by topic, purpose, or content similarity.

Document Clustering Skill

Analyze a repository of documents to group them based on content similarity, topic, or purpose. This skill helps organize large collections, identify redundancies, and discover relationships.

Inputs

PATH - The repository to analyze (e.g., "/repository")
SIMILARITY_THRESHOLD - (Optional) Float in the range 0.0-1.0; the similarity threshold used for grouping (default: 0.8)
VISUALIZATION - (Optional) Boolean, whether to generate a visual representation (default: false)
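
As an illustration of how these inputs and their defaults might be carried through a run, here is a minimal sketch. The `ClusterInputs` class and its field names are hypothetical and not part of the skill; only the parameter names, the 0.0-1.0 range, and the defaults come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ClusterInputs:
    """Hypothetical container for the three inputs listed above."""
    path: Path
    similarity_threshold: float = 0.8  # default stated in the skill
    visualization: bool = False        # default stated in the skill

    def __post_init__(self):
        # Reject thresholds outside the documented 0.0-1.0 range.
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("SIMILARITY_THRESHOLD must be between 0.0 and 1.0")


# Only PATH is required; the other two fall back to their defaults.
inputs = ClusterInputs(path=Path("/repository"))
```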

Workflow

Step 1: Text Processing

Ingest documents from PATH.
Normalize text (remove stop words; apply stemming or lemmatization).
Generate embeddings or TF-IDF vectors for each document.
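
The text-processing step could look roughly like the sketch below, which takes the TF-IDF option using only the standard library. Several things here are assumptions: the tiny stop-word list, the smoothed IDF formula, the omission of stemming, and the in-memory `docs` dictionary standing in for files read from PATH. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Map {name: text} to {name: {term: weight}} with L2-normalized TF-IDF weights."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF (an assumption) so terms found in every document keep a nonzero weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[name] = {t: w / norm for t, w in vec.items()}
    return vectors


# Made-up sample documents standing in for the contents of PATH.
docs = {
    "a.md": "Phishing triage runbook for the SOC",
    "b.md": "Runbook for phishing email triage",
    "c.md": "Quarterly budget planning notes",
}
vectors = tfidf_vectors(docs)
```

Normalizing each vector to unit length means a later cosine-similarity comparison reduces to a dot product.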

Step 2: Clustering Analysis

Apply a clustering algorithm (e.g., K-Means, DBSCAN) to the document vectors.
Group documents whose similarity meets or exceeds SIMILARITY_THRESHOLD.
Identify outliers, i.e., unique documents that fit no cluster.
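
One way to read the grouping step is sketched below. Note that it does not use K-Means or DBSCAN: it swaps in a simple single-link grouping over pairwise cosine similarity, because that maps directly onto SIMILARITY_THRESHOLD. Using cosine similarity as the measure is an assumption, and the function names and sample vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_by_threshold(vectors, threshold=0.8):
    """Single-link grouping: join documents linked by any pair at or above threshold."""
    parent = {name: name for name in vectors}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in vectors:
        groups.setdefault(find(name), []).append(name)
    # Groups of one are reported as outliers rather than clusters.
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    outliers = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, outliers


# Made-up vectors for illustration only.
vectors = {
    "a.md": {"phishing": 1.0, "triage": 1.0},
    "b.md": {"phishing": 1.0, "triage": 0.9},
    "c.md": {"budget": 1.0, "planning": 1.0},
}
clusters, outliers = group_by_threshold(vectors, threshold=0.8)
```

With these sample vectors, `a.md` and `b.md` land in one cluster and `c.md` is reported as an outlier. Single-link grouping can chain loosely related documents together, so a density-based method such as DBSCAN may be preferable on larger collections.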

Step 3: Cluster Labeling

Analyze the centroid or representative terms of each cluster to assign a meaningful label (Topic).
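
A minimal sketch of centroid-based labeling follows, assuming each document is a sparse `{term: weight}` vector and that a label is simply the highest-weighted centroid terms joined together. The `label_cluster` helper and the sample weights are hypothetical.

```python
from collections import Counter


def label_cluster(member_vectors, top_n=3):
    """Label a cluster with the top_n highest-weighted terms of its centroid."""
    centroid = Counter()
    for vec in member_vectors:
        for term, weight in vec.items():
            # Averaging the member vectors gives the cluster centroid.
            centroid[term] += weight / len(member_vectors)
    # Sort by weight (descending), then alphabetically so ties are deterministic.
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return ", ".join(term for term, _ in ranked[:top_n])


# Made-up member vectors for one cluster.
members = [
    {"phishing": 0.7, "triage": 0.5, "soc": 0.2},
    {"phishing": 0.6, "triage": 0.6, "email": 0.3},
]
label = label_cluster(members, top_n=2)  # "phishing, triage"
```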

Step 4: Output Generation

Generate the clustering report.
If VISUALIZATION is true, generate scatter-plot or dendrogram data.

Required Outputs

A CLUSTERING_REPORT object containing:

Cluster List: ID, Label, and List of Documents in each cluster.
Redundancy Report: Sets of highly similar documents (potential duplicates).
Visualization Data: (If requested) Coordinates for plotting.
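
The skill does not specify a schema for CLUSTERING_REPORT, so the following is only one possible shape, serialized as JSON. The class names, field names, and sample values are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Cluster:
    id: int
    label: str
    documents: list


@dataclass
class ClusteringReport:
    clusters: list = field(default_factory=list)    # Cluster List
    redundancy: list = field(default_factory=list)  # sets of highly similar documents
    visualization: Optional[list] = None            # filled only when VISUALIZATION is true


# Made-up values for illustration only.
report = ClusteringReport(
    clusters=[Cluster(id=0, label="phishing, triage", documents=["a.md", "b.md"])],
    redundancy=[["a.md", "b.md"]],
)
report_json = json.dumps(asdict(report), indent=2)
```

Leaving `visualization` as `None` when no plot is requested keeps the three report sections present in every output, which may make downstream parsing simpler.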

Quick Reference

Purpose: Organize unstructured content and find duplicates.
Techniques: Text Mining, NLP, Vector Space Models.

Repository: dandye/ai-runbooks
Commit: 086cbf6

Last updated: about 2 months ago
Created: about 2 months ago
