deduplication

Event deduplication with canonical selection, reputation scoring, and hash-based grouping for multi-source data aggregation. Handles both ID-based and content-based deduplication.

Quality

66%

Does it follow best practices?

Impact

98%

1.58x

Average score across 3 eval scenarios

Security

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./data-access/deduplication-dadbodgeoff-drift/SKILL.md

Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description excels at specificity and distinctiveness, clearly defining a narrow technical domain with concrete operations. However, it lacks an explicit 'Use when...' clause and could benefit from more natural trigger terms that users would actually say when needing this functionality.

Suggestions

Add a 'Use when...' clause such as 'Use when the user needs to remove duplicate events from multiple data sources, merge overlapping records, or deduplicate event streams.'

Include more natural language trigger terms like 'remove duplicates', 'dedupe', 'duplicate detection', 'merge duplicate records' alongside the technical terminology.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'canonical selection', 'reputation scoring', 'hash-based grouping', 'ID-based and content-based deduplication'. These are concrete, well-defined operations.	3 / 3
Completeness	Clearly answers 'what does this do' with specific capabilities, but lacks an explicit 'Use when...' clause or equivalent trigger guidance. The 'when' is only implied by the domain description.	2 / 3
Trigger Term Quality	Includes some relevant keywords like 'deduplication', 'multi-source', 'data aggregation', and 'hash-based grouping', but these are somewhat technical. Missing more natural user terms like 'remove duplicates', 'merge events', 'duplicate detection', or 'dedupe'.	2 / 3
Distinctiveness Conflict Risk	Highly specific niche combining event deduplication, canonical selection, reputation scoring, and hash-based grouping. This is unlikely to conflict with other skills due to its very targeted domain.	3 / 3
	Total	10 / 12 Passed

Implementation

64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides solid, executable TypeScript code for event deduplication with good coverage of both ID-based and content-based approaches. Its main weaknesses are the verbose introductory sections that explain concepts Claude already knows, the large inline code block that could benefit from being split into a reference file, and the lack of validation/verification steps in the workflow for what is essentially a data transformation pipeline where silent errors are possible.
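For orientation, the two approaches the review refers to can be sketched as follows. This is a minimal illustration, not the skill's actual code: the `SourceEvent` shape, the `REPUTATION` table with its domains and weights, and the function names are all assumptions made for the example. Events with an upstream ID are grouped by that ID, events without one are grouped by a hash of normalized title plus calendar day, and the highest-reputation member of each group is kept as the canonical record.

```typescript
// Hypothetical event shape; the skill's real types are not shown in this review.
interface SourceEvent {
  id?: string;          // stable upstream ID, when the source provides one
  title: string;
  startsAt: string;     // ISO 8601 timestamp
  sourceDomain: string;
}

// Assumed reputation table; domains and weights are placeholders.
const REPUTATION: Record<string, number> = {
  "official.example.com": 0.9,
  "aggregator.example.net": 0.5,
};
const DEFAULT_REPUTATION = 0.3;

// FNV-1a 32-bit hash over UTF-16 code units, used only to derive a compact grouping key.
// A 32-bit hash can collide; a production pipeline would likely use a stronger digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// ID-based key when an ID exists, otherwise a hash of normalized content.
function groupKey(event: SourceEvent): string {
  if (event.id) return `id:${event.id}`;
  const normalizedTitle = event.title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
  const day = event.startsAt.slice(0, 10); // calendar day, YYYY-MM-DD
  return `hash:${fnv1a(`${normalizedTitle}|${day}`)}`;
}

function reputationOf(event: SourceEvent): number {
  return REPUTATION[event.sourceDomain] ?? DEFAULT_REPUTATION;
}

// Group events by key, then keep the highest-reputation member of each group.
// Ties keep the member that was seen first.
function deduplicate(events: SourceEvent[]): SourceEvent[] {
  const groups = new Map<string, SourceEvent[]>();
  for (const event of events) {
    const key = groupKey(event);
    const group = groups.get(key);
    if (group) group.push(event);
    else groups.set(key, [event]);
  }
  return Array.from(groups.values()).map((group) =>
    group.reduce((best, candidate) =>
      reputationOf(candidate) > reputationOf(best) ? candidate : best
    )
  );
}
```

Normalizing before hashing is what lets "Jazz Night!" and "jazz night" on the same day fall into one group; how aggressive that normalization should be depends on the data and is a tuning decision rather than something this sketch settles.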

Suggestions

Remove or significantly trim the 'Core Concepts' and 'When to Use This Skill' sections — Claude understands deduplication concepts and can infer when to apply the skill from the implementation itself.

Add validation checkpoints: e.g., after grouping, log/check group sizes to detect overly aggressive or insufficient grouping; after canonical selection, verify the selected item meets minimum quality thresholds.

Move the full implementation code to a separate reference file (e.g., IMPLEMENTATION.md) and keep only a concise quick-start example in the main skill file.
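The validation checkpoints suggested above might look something like the sketch below. It is an illustration under stated assumptions, not part of the reviewed skill: the `DedupGroup` and `ValidationReport` shapes, the `validateGroups` name, and both thresholds are hypothetical and would need tuning per dataset. The idea is to flag unusually large groups (possible over-merging), canonical records below a reputation floor, and to report the overall share of items folded into another record.

```typescript
// Hypothetical shape of one deduplication group; not taken from the skill's code.
interface DedupGroup {
  key: string;
  members: { sourceDomain: string }[];
  canonical: { sourceDomain: string; reputation: number };
}

interface ValidationReport {
  oversizedGroups: string[]; // keys of groups that may have been merged too aggressively
  weakCanonicals: string[];  // keys whose canonical record falls below the quality floor
  duplicateRatio: number;    // share of input items folded into another record
}

// Assumed thresholds; tune them against real data.
const MAX_GROUP_SIZE = 10;
const MIN_CANONICAL_REPUTATION = 0.4;

function validateGroups(groups: DedupGroup[]): ValidationReport {
  const totalItems = groups.reduce((sum, g) => sum + g.members.length, 0);

  const oversizedGroups = groups
    .filter((g) => g.members.length > MAX_GROUP_SIZE)
    .map((g) => g.key);

  const weakCanonicals = groups
    .filter((g) => g.canonical.reputation < MIN_CANONICAL_REPUTATION)
    .map((g) => g.key);

  // Each group contributes one surviving record; the rest were treated as duplicates.
  const duplicateRatio =
    totalItems === 0 ? 0 : (totalItems - groups.length) / totalItems;

  return { oversizedGroups, weakCanonicals, duplicateRatio };
}
```

A caller could log the report after each run and fail the pipeline, or route flagged groups to manual review, when either list is non-empty or when the duplicate ratio moves sharply between runs.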

Dimension	Reasoning	Score
Conciseness	The skill includes some unnecessary sections like 'Core Concepts' that explain things Claude already understands (what deduplication is, why simple URL dedup isn't enough). The 'When to Use This Skill' section is also somewhat redundant. However, the code itself is reasonably lean and the best practices/common mistakes sections are concise bullet points.	2 / 3
Actionability	The skill provides fully executable TypeScript code with complete type definitions, concrete implementations for both ID-based and content-based deduplication, reputation scoring with real domain examples, and clear usage examples showing how to call the functions. Code is copy-paste ready.	3 / 3
Workflow Clarity	The two deduplication modes are explained and the usage examples show the sequence (fetch → deduplicate → use results), but there are no validation checkpoints or error handling steps. For a data aggregation pipeline that could silently drop or misgroup items, there should be verification steps (e.g., checking group quality, validating canonical selection).	2 / 3
Progressive Disclosure	The content is reasonably structured with clear sections, but the main implementation block is quite long (~100 lines of inline code) and could be referenced externally. The 'Related Patterns' section at the end hints at cross-references but uses plain text rather than actual links. For a skill of this length (~200 lines), splitting the full implementation into a separate reference file would improve organization.	2 / 3
	Total	9 / 12 Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
metadata_version	'metadata.version' is missing	Warning
	Total	10 / 11 Passed

Repository: majiayu000/claude-skill-registry-data
Commit: 6770aaa

Reviewed: 5 days ago
