Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.
87
78%
Does it follow best practices?
Impact
92%
1.37x
Average score across 6 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./engineering-team/senior-computer-vision/SKILL.md
Dataset preparation pipeline
Uses pipeline script | 0% | 100%
COCO output format | 0% | 100%
Correct split ratios | 0% | 100%
Stratified split | 0% | 100%
Seed 42 | 0% | 100%
Mosaic augmentation | 0% | 100%
Mixup augmentation | 0% | 100%
Cutout parameters | 0% | 100%
Horizontal flip probability | 100% | 100%
Rotate limit | 0% | 0%
Brightness/contrast limits | 0% | 0%
Hue saturation values | 0% | 0%
pycocotools verification | 0% | 100%
Architecture selection and training setup
YOLOv8 or RT-DETR chosen | 100% | 100%
Uses vision_model_trainer.py | 0% | 100%
YOLO CLI training command | 100% | 100%
mAP@50 threshold | 100% | 100%
mAP@50:95 threshold | 100% | 100%
Precision threshold | 100% | 100%
Recall threshold | 100% | 100%
Inference time target | 100% | 100%
CNN vs ViT data reasoning | 100% | 100%
Detection task flag | 0% | 100%
Model optimization and deployment
Uses inference_optimizer.py | 0% | 100%
Baseline benchmark step | 70% | 100%
Benchmark parameters | 62% | 100%
Dynamic batch flag | 37% | 100%
Simplify flag | 50% | 100%
ONNX verification | 100% | 100%
Cloud path: TensorRT FP16 | 100% | 100%
Intel path: OpenVINO | 100% | 100%
ONNX intermediary for Intel | 100% | 100%
INT8 calibration samples | 0% | 100%
Opset version 17 | 100% | 50%
Segmentation architecture selection
Instance seg architecture | 100% | 100%
Real-time instance choice | 100% | 100%
Semantic seg architecture | 100% | 100%
SegFormer speed advantage | 100% | 62%
SAM for zero-shot | 100% | 100%
SAM prompt types | 100% | 100%
segment-anything package | 100% | 100%
Mask R-CNN quality trade-off | 0% | 100%
mmsegmentation or torchvision | 100% | 100%
Architecture decision format | 100% | 100%
Instance vs semantic distinction | 100% | 100%
Video tracking pipeline
ByteTrack or SORT | 100% | 100%
YOLOv8 detector | 100% | 100%
Real-time FPS target | 100% | 100%
Latency P99 target | 0% | 50%
GPU memory target | 0% | 100%
Model size target | 0% | 57%
Persistent track IDs | 100% | 100%
Video capture loop | 100% | 100%
Detection then tracking flow | 100% | 100%
Script is executable | 100% | 100%
FPS measurement | 100% | 100%
Small object detection and edge deployment
Edge architecture choice | 58% | 50%
SAHI or high-res for small objects | 100% | 100%
CNN over ViT justification | 100% | 100%
Edge FPS target | 100% | 100%
Edge mAP@50 target | 100% | 100%
Edge GPU memory target | 100% | 100%
Edge model size target | 100% | 100%
NVIDIA edge optimization path | 0% | 0%
Copy-paste for small objects | 100% | 100%
Architecture plan document | 100% | 100%
P2 FPN level or anchor adjustment | 100% | 100%
967fe01