
vector-text-fixer

Fix garbled text in PDF/SVG vector graphics for final editing in AI. Detect, replace and repair garbled text in vector graphic files while maintaining original formatting and layout.


Vector Text Fixer

Fixes garbled text in PDF/SVG vector graphics to make them editable in AI tools.

Features

  • Garbled Text Detection: Automatically identifies garbled text in PDF/SVG files
  • Smart Repair: Infers original text content based on context
  • Batch Processing: Supports batch processing of multiple files in a folder
  • Format Preservation: Repaired files maintain original vector format and layout
  • AI-assisted Editing: Outputs an intermediate JSON format that can be imported into AI editors

Supported Scenarios

1. PDF Garbled Text Repair

  • Box/question mark issues caused by font embedding problems
  • Garbled text caused by encoding conversion errors
  • Abnormal characters produced when a missing font is substituted
  • Multi-language mixed encoding issues
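
Encoding conversion errors of the kind listed above often arise when UTF-8 bytes are decoded with a single-byte codec. The following is a minimal sketch of reversing that, assuming the wrong codec was Latin-1. The function name `undo_mojibake` is hypothetical and is not part of this skill's API.

```python
def undo_mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Reverse a UTF-8 -> wrong_codec mis-decoding; return text unchanged on failure."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible with this codec pair; leave the text untouched.
        return text


# Simulate the error: UTF-8 bytes read as Latin-1.
garbled = "café".encode("utf-8").decode("latin-1")
print(garbled)
print(undo_mojibake(garbled))
```

Text that was never mis-decoded fails the round trip and is returned as-is, so the helper is safe to apply to mixed input.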

2. SVG Garbled Text Repair

  • Text entity encoding errors
  • Special character escaping issues
  • Display abnormalities caused by invalid font references
  • XML encoding declaration errors

Usage

Command Line

# Fix a single PDF file
python scripts/main.py --input document.pdf --output fixed.pdf

# Fix a single SVG file
python scripts/main.py --input diagram.svg --output fixed.svg

# Batch process folder
python scripts/main.py --batch ./input_folder --output ./output_folder

# Interactive repair (manually specify replacement content)
python scripts/main.py --input doc.pdf --interactive

# Export as editable format (JSON)
python scripts/main.py --input doc.pdf --export-json editable.json

Python API

from scripts.main import VectorTextFixer

# Create fixer instance
fixer = VectorTextFixer()

# Fix PDF
result = fixer.fix_pdf("input.pdf", "output.pdf")

# Fix SVG
result = fixer.fix_svg("input.svg", "output.svg")

# Batch processing
results = fixer.batch_fix("./input_folder", "./output_folder")

# Get text map (for AI editing)
text_map = fixer.extract_text_map("input.pdf")

Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| --input | str | Yes* | Input file path (PDF or SVG) |
| --batch | str | No | Batch processing input folder |
| --output | str | Yes* | Output file/folder path |
| --interactive | bool | No | Enable interactive repair mode |
| --export-json | str | No | Export editable JSON format |
| --encoding | str | No | Specify source file encoding (default: auto-detect) |
| --font-substitution | dict | No | Font replacement mapping |
| --repair-level | str | No | Repair level: minimal, standard, aggressive (default: standard) |

*At least one of --input and --batch is required
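
The parameter rules above can be sketched with `argparse`. This is an illustration only, not the actual contents of `scripts/main.py`. It assumes that `--font-substitution` is passed as a JSON object string, which the source does not specify; `build_parser` and `parse_args` are hypothetical names.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--input", help="Input file path (PDF or SVG)")
    parser.add_argument("--batch", help="Batch processing input folder")
    parser.add_argument("--output", help="Output file/folder path")
    parser.add_argument("--interactive", action="store_true",
                        help="Enable interactive repair mode")
    parser.add_argument("--export-json", help="Export editable JSON format")
    parser.add_argument("--encoding", default=None,
                        help="Source file encoding (None means auto-detect)")
    # Assumption: the font mapping arrives as a JSON object string.
    parser.add_argument("--font-substitution", type=json.loads, default=None,
                        help='Font replacement mapping, e.g. \'{"SimSun": "Arial"}\'')
    parser.add_argument("--repair-level", default="standard",
                        choices=["minimal", "standard", "aggressive"])
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the footnote: at least one of --input and --batch.
    if not (args.input or args.batch):
        parser.error("at least one of --input and --batch is required")
    return args


args = parse_args(["--input", "doc.pdf", "--font-substitution", '{"SimSun": "Arial"}'])
print(args.repair_level, args.font_substitution)
```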

Output Format

Repaired PDF/SVG

  • Maintains original vector format
  • Garbled text replaced with readable content
  • Fonts and layout remain unchanged

JSON Export Format

{
  "file_type": "pdf",
  "pages": [
    {
      "page_num": 1,
      "text_blocks": [
        {
          "id": "tb_001",
          "bbox": [100, 200, 300, 220],
          "original_text": "�����",
          "detected_encoding": "UTF-8",
          "confidence": 0.3,
          "suggested_fix": "Sample Text"
        }
      ]
    }
  ],
  "fonts_used": ["Arial", "SimSun"],
  "repair_summary": {
    "total_blocks": 15,
    "fixed_blocks": 12,
    "skipped_blocks": 3
  }
}
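
A short sketch of consuming this export: it walks the `pages` and `text_blocks` fields shown above and lists blocks that may deserve manual review. The reading that a low `confidence` value means "needs review", the 0.5 threshold, and the helper name `blocks_needing_review` are all assumptions.

```python
import json

# Trimmed copy of the export structure shown above.
sample = """
{
  "file_type": "pdf",
  "pages": [
    {
      "page_num": 1,
      "text_blocks": [
        {
          "id": "tb_001",
          "bbox": [100, 200, 300, 220],
          "original_text": "\ufffd\ufffd\ufffd\ufffd\ufffd",
          "confidence": 0.3,
          "suggested_fix": "Sample Text"
        }
      ]
    }
  ]
}
"""


def blocks_needing_review(export: dict, threshold: float = 0.5):
    """Yield (page_num, block_id, suggested_fix) for blocks below the threshold."""
    for page in export.get("pages", []):
        for block in page.get("text_blocks", []):
            if block.get("confidence", 0.0) < threshold:
                yield page["page_num"], block["id"], block.get("suggested_fix")


export = json.loads(sample)
for page_num, block_id, fix in blocks_needing_review(export):
    print(f"page {page_num}, block {block_id}: suggested fix {fix!r}")
```

In practice the dictionary would come from `json.load` on the file written by `--export-json`.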

Garbled Text Detection Rules

The tool uses the following rules to detect garbled text:

  1. Replacement Character Detection: Identifies U+FFFD (�) and box characters
  2. Control Character Filtering: Excludes non-printing control characters
  3. Encoding Consistency: Detects anomalies caused by mixed encodings
  4. Font Fallback Detection: Identifies substitution characters generated due to missing fonts
  5. Probability Model: Estimates the probability that text is garbled from character frequencies
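
Rules 1 and 2 can be sketched as a simple ratio. This is one possible reading, not the tool's actual detector: control characters are filtered out first, then replacement and box characters are counted. The set of box characters, the 0.3 threshold, and the function names are assumptions.

```python
import unicodedata

REPLACEMENT = "\ufffd"
# Assumed box-like glyphs: white square, white vertical rectangle, ballot box.
BOX_CHARS = {"\u25a1", "\u25af", "\u2610"}


def garbled_ratio(text: str) -> float:
    """Fraction of printable characters that are replacement or box characters."""
    # Rule 2: exclude non-printing control characters (Unicode category Cc).
    printable = [ch for ch in text if unicodedata.category(ch) != "Cc"]
    if not printable:
        return 0.0
    # Rule 1: count U+FFFD and box characters.
    suspicious = sum(1 for ch in printable if ch == REPLACEMENT or ch in BOX_CHARS)
    return suspicious / len(printable)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    return garbled_ratio(text) >= threshold


print(garbled_ratio("\ufffd" * 5))
print(looks_garbled("Sample Text"))
```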

Repair Strategies

Minimal

  • Only repairs obvious errors (replacement characters, null bytes)
  • Maintains maximum integrity of original text
  • Suitable for minor garbled text issues

Standard

  • Repairs common encoding issues
  • Smart font replacement
  • Balances repair rate and accuracy

Aggressive

  • Comprehensive text re-encoding
  • Uses OCR-assisted recognition
  • Suitable for severely garbled documents
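
As a rough illustration of the Minimal level, the sketch below touches only null bytes and U+FFFD and leaves everything else intact. Swapping U+FFFD for a visible placeholder is an assumption made for the example; the source does not say what the tool substitutes. `minimal_repair` is a hypothetical name.

```python
def minimal_repair(text: str, placeholder: str = "?") -> tuple[str, int]:
    """Drop null bytes and swap U+FFFD for a placeholder; return (text, changes)."""
    changes = text.count("\x00") + text.count("\ufffd")
    repaired = text.replace("\x00", "").replace("\ufffd", placeholder)
    return repaired, changes


repaired, changes = minimal_repair("Fig\x00ure \ufffd1")
print(repaired, changes)
```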

Examples

Fix Single Page PDF

Input:

python scripts/main.py --input report.pdf --output fixed_report.pdf

Output:

✓ Processing: report.pdf
✓ Detected 5 garbled text blocks
✓ Fixed 4 blocks automatically
⚠ 1 block requires manual review
✓ Output saved: fixed_report.pdf
✓ Report saved: fixed_report_repair_log.json

Export Editable JSON

Input:

python scripts/main.py --input diagram.svg --export-json editable.json

Output JSON Structure:

{
  "file_type": "svg",
  "svg_info": {
    "width": 800,
    "height": 600,
    "viewBox": "0 0 800 600"
  },
  "text_elements": [
    {
      "id": "text_1",
      "x": 100,
      "y": 200,
      "font_family": "Arial",
      "font_size": 14,
      "original": "�����",
      "user_editable": "",
      "confidence": 0.25
    }
  ]
}
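
Once `user_editable` has been filled in, the edits could be written back to the SVG along these lines. This sketch uses only the standard library and assumes that each `id` in the JSON matches the `id` attribute of a `<text>` element, which the source does not state. `apply_edits` is a hypothetical helper, not part of the skill's API.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def apply_edits(svg_source: str, text_elements: list[dict]) -> str:
    """Replace the text of <text> elements whose id has a non-empty user_editable."""
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(svg_source)
    edits = {e["id"]: e["user_editable"] for e in text_elements if e.get("user_editable")}
    for node in root.iter(f"{{{SVG_NS}}}text"):
        new_text = edits.get(node.get("id"))
        if new_text is not None:
            # Only the leading text is replaced; nested <tspan> children are left alone.
            node.text = new_text
    return ET.tostring(root, encoding="unicode")


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="800" height="600">'
    '<text id="text_1" x="100" y="200">\ufffd\ufffd\ufffd</text></svg>'
)
elements = [{"id": "text_1", "original": "\ufffd\ufffd\ufffd", "user_editable": "Sample Text"}]
fixed_svg = apply_edits(svg, elements)
print(fixed_svg)
```

Elements with an empty `user_editable` are skipped, so unreviewed entries leave the SVG unchanged.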

Dependencies

pdfplumber>=0.10.0      # PDF parsing
PyMuPDF>=1.23.0         # PDF processing (fitz)
cairosvg>=2.7.0         # SVG conversion
beautifulsoup4>=4.12.0  # SVG parsing
fonttools>=4.40.0       # Font processing
chardet>=5.0.0          # Encoding detection
Pillow>=10.0.0          # Image processing

Limitations

  • Encrypted PDFs must be unlocked with their password before processing
  • Severely damaged vector files may not be fully repairable
  • Some rare fonts may not map correctly
  • Scanned PDFs must be run through OCR first

Version Information

  • Version: 1.0.0
  • Last Updated: 2026-02-06
  • Status: Ready for use

Risk Assessment

| Risk Indicator | Assessment | Level |
| --- | --- | --- |
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

Security Checklist

  • No hardcoded credentials or API keys
  • No unauthorized file system access (../)
  • Output does not expose sensitive information
  • Prompt injection protections in place
  • Input file paths validated (no ../ traversal)
  • Output directory restricted to workspace
  • Script execution in sandboxed environment
  • Error messages sanitized (no stack traces exposed)
  • Dependencies audited
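
The path-related items in the checklist (no `../` traversal, output restricted to the workspace) could be checked as in the following sketch. It shows one common approach and is not necessarily how this skill validates paths; `resolve_inside` is a hypothetical name.

```python
from pathlib import Path


def resolve_inside(workspace: str, user_path: str) -> Path:
    """Resolve user_path against workspace and reject anything that escapes it."""
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, so the check below still applies.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

Resolving before comparing matters: a plain string check for `../` misses symlinks and absolute paths, while the resolved path reflects where the file would actually be written.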

Prerequisites

# Python dependencies
pip install -r requirements.txt

Evaluation Criteria

Success Metrics

  • Successfully executes main functionality
  • Output meets quality standards
  • Handles edge cases gracefully
  • Performance is acceptable

Test Cases

  1. Basic Functionality: Standard input → Expected output
  2. Edge Case: Invalid input → Graceful error handling
  3. Performance: Large dataset → Acceptable processing time

Lifecycle Status

  • Current Stage: Draft
  • Next Review Date: 2026-03-06
  • Known Issues: None
  • Planned Improvements:
    • Performance optimization
    • Additional feature support

Repository: aipoch/medical-research-skills