# SAHI - Slicing Aided Hyper Inference

SAHI is a computer vision library for large-scale object detection and instance segmentation on high-resolution images. It addresses the difficulty of detecting small objects in large images with sliced inference: the image is divided into smaller, overlapping patches, each patch is processed separately, and the results are combined.

## Package Information

- **Package Name**: sahi
- **Language**: Python
- **Installation**: `pip install sahi`

## Core Imports

```python
import sahi
from sahi import AutoDetectionModel
```

Core classes and functions:

```python
from sahi import (
    BoundingBox,
    Category,
    Mask,
    AutoDetectionModel,
    DetectionModel,
    ObjectPrediction,
)
```

Prediction functions:

```python
from sahi.predict import get_prediction, get_sliced_prediction, predict
```

## Basic Usage

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load a detection model
detection_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="yolov8n.pt",
    confidence_threshold=0.3,
    device="cpu",
)

# Perform sliced inference on a large image
result = get_sliced_prediction(
    image="path/to/large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

# Access predictions
predictions = result.object_prediction_list
for prediction in predictions:
    print(f"Class: {prediction.category.name}")
    print(f"Confidence: {prediction.score.value}")
    print(f"BBox: {prediction.bbox.to_coco_bbox()}")

# Export visualization
result.export_visuals(export_dir="output/")
```

## Architecture

SAHI's architecture centers on three key concepts:

- **Detection Models**: Unified interface for various deep learning frameworks (YOLO, MMDetection, Detectron2, HuggingFace, etc.)
- **Sliced Inference**: Automatic image slicing with overlapping patches to handle large images and small objects
- **Annotation Framework**: Data structures for bounding boxes, masks, and predictions, with format conversions

The library integrates with popular detection frameworks and exposes consistent APIs for slicing, prediction postprocessing, dataset operations, and visualization in both research and production settings.

## Capabilities

### Model Integration and Loading

Unified interface for loading detection models from various frameworks including Ultralytics YOLO, MMDetection, Detectron2, HuggingFace Transformers, TorchVision, and Roboflow.

```python { .api }
class AutoDetectionModel:
    @staticmethod
    def from_pretrained(
        model_type: str,
        model_path: Optional[str] = None,
        model: Optional[Any] = None,
        config_path: Optional[str] = None,
        device: Optional[str] = None,
        mask_threshold: float = 0.5,
        confidence_threshold: float = 0.3,
        category_mapping: Optional[Dict] = None,
        category_remapping: Optional[Dict] = None,
        load_at_init: bool = True,
        image_size: Optional[int] = None,
        **kwargs,
    ) -> DetectionModel: ...
```

[Model Integration](./model-integration.md)

### Core Prediction Functions

Prediction entry points covering standard inference, sliced inference for large images, and batch and video processing, each configurable through the parameters listed below.

```python { .api }
def get_prediction(
    image,
    detection_model,
    shift_amount: list = [0, 0],
    full_shape=None,
    postprocess: Optional[PostprocessPredictions] = None,
    verbose: int = 0,
    exclude_classes_by_name: Optional[List[str]] = None,
    exclude_classes_by_id: Optional[List[int]] = None,
) -> PredictionResult: ...

def get_sliced_prediction(
    image,
    detection_model,
    slice_height: Optional[int] = None,
    slice_width: Optional[int] = None,
    overlap_height_ratio: float = 0.2,
    overlap_width_ratio: float = 0.2,
    perform_standard_pred: bool = True,
    postprocess_type: str = "GREEDYNMM",
    postprocess_match_metric: str = "IOS",
    postprocess_match_threshold: float = 0.5,
    postprocess_class_agnostic: bool = False,
    verbose: int = 1,
    merge_buffer_length: Optional[int] = None,
    auto_slice_resolution: bool = True,
    slice_export_prefix: Optional[str] = None,
    slice_dir: Optional[str] = None,
    exclude_classes_by_name: Optional[List[str]] = None,
    exclude_classes_by_id: Optional[List[int]] = None,
) -> PredictionResult: ...

def predict(
    detection_model: Optional[DetectionModel] = None,
    model_type: str = "ultralytics",
    model_path: Optional[str] = None,
    model_config_path: Optional[str] = None,
    model_confidence_threshold: float = 0.25,
    model_device: Optional[str] = None,
    model_category_mapping: Optional[dict] = None,
    model_category_remapping: Optional[dict] = None,
    source: Optional[str] = None,
    no_standard_prediction: bool = False,
    no_sliced_prediction: bool = False,
    image_size: Optional[int] = None,
    slice_height: int = 512,
    slice_width: int = 512,
    overlap_height_ratio: float = 0.2,
    overlap_width_ratio: float = 0.2,
    postprocess_type: str = "GREEDYNMM",
    postprocess_match_metric: str = "IOS",
    postprocess_match_threshold: float = 0.5,
    postprocess_class_agnostic: bool = False,
    novisual: bool = False,
    view_video: bool = False,
    frame_skip_interval: int = 0,
    export_pickle: bool = False,
    export_crop: bool = False,
    dataset_json_path: Optional[str] = None,
    project: str = "runs/predict",
    name: str = "exp",
    visual_bbox_thickness: Optional[int] = None,
    visual_text_size: Optional[float] = None,
    visual_text_thickness: Optional[int] = None,
    visual_hide_labels: bool = False,
    visual_hide_conf: bool = False,
    visual_export_format: str = "png",
    verbose: int = 1,
    return_dict: bool = False,
    force_postprocess_type: bool = False,
    exclude_classes_by_name: Optional[List[str]] = None,
    exclude_classes_by_id: Optional[List[int]] = None,
    **kwargs,
) -> Optional[Dict]: ...
```

[Prediction Functions](./prediction-functions.md)
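
The `shift_amount` parameter reflects a core step of sliced inference: a detector run on a slice reports boxes in slice-local coordinates, and these have to be translated by the slice's origin before results from different slices can be compared. The sketch below illustrates that idea in plain Python. It does not call SAHI; `shift_box`, `run_on_slices`, and `fake_detect` are hypothetical names introduced for illustration, and the fixed detection returned by `fake_detect` is made up.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def shift_box(box: Box, shift_amount: Tuple[int, int]) -> Box:
    """Translate a slice-local box by the slice's (shift_x, shift_y) offset."""
    shift_x, shift_y = shift_amount
    x_min, y_min, x_max, y_max = box
    return (x_min + shift_x, y_min + shift_y, x_max + shift_x, y_max + shift_y)


def run_on_slices(
    slice_origins: List[Tuple[int, int]],
    detect: Callable[[Tuple[int, int]], List[Box]],
) -> List[Box]:
    """Run a detector per slice and collect boxes in full-image coordinates."""
    full_image_boxes = []
    for origin in slice_origins:
        for local_box in detect(origin):
            full_image_boxes.append(shift_box(local_box, origin))
    return full_image_boxes


def fake_detect(origin: Tuple[int, int]) -> List[Box]:
    """Stand-in detector (hypothetical): one 20x20 box at (10, 10) in every slice."""
    return [(10.0, 10.0, 30.0, 30.0)]


full_image_boxes = run_on_slices([(0, 0), (410, 0)], fake_detect)
print(full_image_boxes)
# [(10.0, 10.0, 30.0, 30.0), (420.0, 10.0, 440.0, 30.0)]
```

After this step, boxes from overlapping slices can describe the same object twice, which is why a postprocessing pass normally follows.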

### Annotation and Data Structures

Core data structures for handling bounding boxes, masks, categories, and complete object annotations, with format conversion and manipulation methods.

```python { .api }
@dataclass(frozen=True)
class BoundingBox:
    box: Union[Tuple[float, float, float, float], List[float]]
    shift_amount: Tuple[int, int] = (0, 0)

    def get_expanded_box(self, ratio: float = 0.1, max_x: int = None, max_y: int = None) -> "BoundingBox": ...
    def to_coco_bbox(self) -> List[float]: ...
    def to_xyxy(self) -> List[float]: ...
    def get_shifted_box(self) -> "BoundingBox": ...

@dataclass(frozen=True)
class Category:
    id: Optional[Union[int, str]] = None
    name: Optional[str] = None

class Mask:
    def __init__(self, bool_mask: Optional[np.ndarray] = None, segmentation: Optional[List] = None, shift_amount: Tuple[int, int] = (0, 0)): ...
    @classmethod
    def from_float_mask(cls, mask: np.ndarray, mask_threshold: float = 0.5, shift_amount: Tuple[int, int] = (0, 0)) -> "Mask": ...
    @classmethod
    def from_bool_mask(cls, mask: np.ndarray, shift_amount: Tuple[int, int] = (0, 0)) -> "Mask": ...
    def get_shifted_mask(self) -> "Mask": ...

class ObjectPrediction(ObjectAnnotation):
    def __init__(
        self,
        bbox: Optional[BoundingBox] = None,
        category: Optional[Category] = None,
        score: Optional[PredictionScore] = None,
        mask: Optional[Mask] = None,
        shift_amount: Optional[List[int]] = None,
        full_shape: Optional[List[int]] = None,
    ): ...
    def get_shifted_object_prediction(self) -> "ObjectPrediction": ...
    def to_coco_prediction(self) -> CocoPrediction: ...
    def to_fiftyone_detection(self): ...
```

[Annotation Framework](./annotation-framework.md)
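
The two bounding-box conversions listed above, `to_xyxy` and `to_coco_bbox`, correspond to the corner format `[x_min, y_min, x_max, y_max]` and the COCO format `[x_min, y_min, width, height]`. As a minimal sketch of the arithmetic involved, written without SAHI and using the hypothetical helper names `xyxy_to_coco` and `coco_to_xyxy`:

```python
from typing import List


def xyxy_to_coco(box: List[float]) -> List[float]:
    """Convert [x_min, y_min, x_max, y_max] to COCO [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def coco_to_xyxy(box: List[float]) -> List[float]:
    """Convert COCO [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


corner_box = [10, 20, 110, 70]
coco_box = xyxy_to_coco(corner_box)
print(coco_box)                # [10, 20, 100, 50]
print(coco_to_xyxy(coco_box))  # [10, 20, 110, 70]
```

Mixing up the two formats is a common source of silently wrong boxes, so it helps to name variables after the format they hold.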

### Image Slicing and Processing

Image slicing functions for handling large images, including automatic slice parameter calculation, annotation processing, and dataset slicing operations.

```python { .api }
def get_slice_bboxes(
    image_height: int,
    image_width: int,
    slice_height: Optional[int] = None,
    slice_width: Optional[int] = None,
    auto_slice_resolution: Optional[bool] = True,
    overlap_height_ratio: Optional[float] = 0.2,
    overlap_width_ratio: Optional[float] = 0.2,
) -> List[List[int]]: ...

def slice_image(
    image: Union[str, Image.Image],
    output_file_name: Optional[str] = None,
    output_dir: Optional[str] = None,
    slice_height: int = 512,
    slice_width: int = 512,
    overlap_height_ratio: float = 0.2,
    overlap_width_ratio: float = 0.2,
    auto_slice_resolution: bool = True,
    min_area_ratio: float = 0.1,
    out_ext: Optional[str] = None,
    verbose: bool = False,
) -> SliceImageResult: ...

def slice_coco(
    coco_annotation_file_path: str,
    image_dir: str,
    output_coco_annotation_file_name: str = "",
    output_dir: Optional[str] = None,
    ignore_negative_samples: bool = False,
    slice_height: int = 512,
    slice_width: int = 512,
    overlap_height_ratio: float = 0.2,
    overlap_width_ratio: float = 0.2,
    min_area_ratio: float = 0.1,
    verbose: bool = False,
) -> str: ...
```

[Image Slicing](./image-slicing.md)
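
The slice size and overlap ratios together determine the stride between neighbouring slices. The following is a minimal, self-contained sketch of how such a grid of windows could be computed. `compute_slice_boxes` is a hypothetical helper written for illustration, not SAHI's `get_slice_bboxes`, and its edge handling (adding a final window flush with the image border) is an assumption rather than the library's documented behaviour.

```python
from typing import List


def compute_slice_boxes(
    image_height: int,
    image_width: int,
    slice_height: int = 512,
    slice_width: int = 512,
    overlap_height_ratio: float = 0.2,
    overlap_width_ratio: float = 0.2,
) -> List[List[int]]:
    """Return [x_min, y_min, x_max, y_max] windows that tile the image with overlap."""
    # Clamp the slice size so that a small image yields a single window.
    slice_height = min(slice_height, image_height)
    slice_width = min(slice_width, image_width)
    # Stride between window origins; at least 1 pixel so the loops always advance.
    step_y = max(1, slice_height - int(slice_height * overlap_height_ratio))
    step_x = max(1, slice_width - int(slice_width * overlap_width_ratio))

    def origins(length: int, window: int, step: int) -> List[int]:
        starts = list(range(0, length - window + 1, step))
        # Assumption: cover any leftover strip with a window flush with the edge.
        if starts[-1] + window < length:
            starts.append(length - window)
        return starts

    boxes = []
    for y in origins(image_height, slice_height, step_y):
        for x in origins(image_width, slice_width, step_x):
            boxes.append([x, y, x + slice_width, y + slice_height])
    return boxes


boxes = compute_slice_boxes(image_height=1000, image_width=1500)
print(len(boxes))  # 12
print(boxes[0])    # [0, 0, 512, 512]
print(boxes[-1])   # [988, 488, 1500, 1000]
```

With 512-pixel slices and a 0.2 overlap ratio the stride is 410 pixels, so a 1500x1000 image yields a 4x3 grid in this sketch.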

### Postprocessing and NMS

Postprocessing methods for combining overlapping predictions, including Non-Maximum Suppression (NMS), Non-Maximum Merging (NMM), and variants intended for sliced inference results.

```python { .api }
class PostprocessPredictions:
    def __init__(
        self,
        match_threshold: float = 0.5,
        match_metric: str = "IOS",
        class_agnostic: bool = False,
    ): ...
    def __call__(
        self,
        object_predictions: List[ObjectPrediction],
    ) -> List[ObjectPrediction]: ...

class NMSPostprocess(PostprocessPredictions): ...
class NMMPostprocess(PostprocessPredictions): ...
class GreedyNMMPostprocess(PostprocessPredictions): ...
class LSNMSPostprocess(PostprocessPredictions): ...

def nms(
    predictions: np.ndarray,
    match_threshold: float = 0.5,
    class_agnostic: bool = False,
) -> List[int]: ...

def greedy_nmm(
    predictions: np.ndarray,
    match_threshold: float = 0.5,
    class_agnostic: bool = False,
) -> List[int]: ...
```

[Postprocessing](./postprocessing.md)
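
To make the roles of `match_metric` and `match_threshold` concrete, here is a small pure-Python sketch of greedy NMS. It is an illustration under stated assumptions, not SAHI's implementation: `match_score` and `greedy_nms` are hypothetical names, and `"IOS"` is taken here to mean intersection over the area of the smaller box.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def match_score(a: Box, b: Box, metric: str = "IOU") -> float:
    """Overlap score: intersection over union, or over the smaller box (assumed IOS)."""
    inter = intersection(a, b)
    if metric == "IOU":
        denominator = area(a) + area(b) - inter
    elif metric == "IOS":
        denominator = min(area(a), area(b))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return inter / denominator if denominator > 0 else 0.0


def greedy_nms(
    boxes: List[Box],
    scores: List[float],
    match_threshold: float = 0.5,
    metric: str = "IOU",
) -> List[int]:
    """Keep the highest-scoring boxes, dropping any that match an already kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(match_score(boxes[i], boxes[k], metric) < match_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (20, 20, 40, 40)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, metric="IOU"))  # [0, 2]
print(greedy_nms(boxes, scores, metric="IOS"))  # [0]
```

The small box nested inside the large one survives under IoU (score 0.04) but is suppressed under the assumed IoS definition (score 1.0). This illustrates why the choice of metric matters when slices produce partial detections of a larger object.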

### COCO Dataset Integration

COCO dataset handling, including loading, manipulation, annotation processing, evaluation, and format conversion.

```python { .api }
class Coco:
    def __init__(self, coco_path: Optional[str] = None): ...
    def add_image(self, coco_image: CocoImage) -> int: ...
    def add_annotation(self, coco_annotation: CocoAnnotation) -> int: ...
    def add_category(self, coco_category: CocoCategory) -> int: ...
    def merge(self, coco2: "Coco") -> "Coco": ...
    def export_as_yolo(
        self,
        output_dir: str,
        train_split_rate: float = 1.0,
        numpy_seed: int = 0,
        mp: bool = True,
    ): ...

class CocoImage:
    def __init__(self, image_path: str, image_id: Optional[int] = None): ...
    def add_annotation(self, annotation: CocoAnnotation): ...

class CocoAnnotation:
    def __init__(
        self,
        bbox: Optional[List[int]] = None,
        category_id: Optional[int] = None,
        category_name: Optional[str] = None,
        iscrowd: int = 0,
        area: Optional[int] = None,
        segmentation: Optional[List] = None,
        image_id: Optional[int] = None,
        annotation_id: Optional[int] = None,
    ): ...

def create_coco_dict() -> Dict: ...
def export_coco_as_yolo(
    coco_path: str,
    output_dir: str,
    train_split_rate: float = 1.0,
    numpy_seed: int = 0,
) -> str: ...
```

[COCO Integration](./coco-integration.md)
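
Exporting COCO annotations to YOLO format comes down to one coordinate change: COCO stores `[x_min, y_min, width, height]` in pixels, while YOLO label lines store a class index followed by the box centre and size normalized by the image dimensions. The sketch below shows that conversion on a tiny in-memory COCO dictionary. It does not use SAHI; `coco_bbox_to_yolo` and `coco_to_yolo_lines` are hypothetical helpers, the sample data is made up, and assigning class indices by sorted category id is an assumption.

```python
from typing import Dict, List


def coco_bbox_to_yolo(bbox: List[float], image_width: int, image_height: int) -> List[float]:
    """Pixel [x_min, y_min, width, height] -> normalized [x_center, y_center, width, height]."""
    x_min, y_min, width, height = bbox
    return [
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    ]


def coco_to_yolo_lines(coco: Dict) -> Dict[str, List[str]]:
    """Group annotations by image file name and format them as YOLO label lines."""
    images = {image["id"]: image for image in coco["images"]}
    # Assumption: YOLO class indices follow the sorted order of COCO category ids.
    category_ids = sorted(category["id"] for category in coco["categories"])
    class_index = {category_id: index for index, category_id in enumerate(category_ids)}
    lines: Dict[str, List[str]] = {image["file_name"]: [] for image in coco["images"]}
    for annotation in coco["annotations"]:
        image = images[annotation["image_id"]]
        values = coco_bbox_to_yolo(annotation["bbox"], image["width"], image["height"])
        formatted = " ".join(f"{value:.6f}" for value in values)
        index = class_index[annotation["category_id"]]
        lines[image["file_name"]].append(f"{index} {formatted}")
    return lines


# Made-up sample data: one 400x200 image with a single annotation.
coco = {
    "images": [{"id": 1, "file_name": "tile.jpg", "width": 400, "height": 200}],
    "categories": [{"id": 3, "name": "car"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 3, "bbox": [100, 50, 200, 100]}],
}
yolo_lines = coco_to_yolo_lines(coco)
print(yolo_lines)
# {'tile.jpg': ['0 0.500000 0.500000 0.500000 0.500000']}
```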

### Command Line Interface

Command-line interface for prediction, dataset processing, evaluation, and format conversion, accessible through the `sahi` command.

```bash { .api }
# Main prediction command
sahi predict --model_type ultralytics --model_path yolov8n.pt --source image.jpg

# Prediction with FiftyOne integration
sahi predict-fiftyone --model_type ultralytics --model_path yolov8n.pt --source image.jpg

# COCO dataset operations
sahi coco slice --image_dir images/ --dataset_json_path dataset.json
sahi coco evaluate --dataset_json_path dataset.json --result_json_path results.json
sahi coco yolo --coco_annotation_file_path dataset.json --image_dir images/
sahi coco analyse --dataset_json_path dataset.json --result_json_path results.json
sahi coco fiftyone --coco_annotation_file_path dataset.json --image_dir images/

# Environment and version info
sahi version
sahi env
```

[Command Line Interface](./cli.md)

### Utilities and Framework Integration

Utility functions for computer vision operations, file I/O, and import checks that support framework-specific integrations across different deep learning ecosystems.

```python { .api }
# CV utilities
def read_image_as_pil(image: Union[Image.Image, str, np.ndarray], exif_fix: bool = True) -> Image.Image: ...
def read_image(image_path: str) -> np.ndarray: ...
def visualize_object_predictions(
    image: np.ndarray,
    object_prediction_list: List[ObjectPrediction],
    rect_th: int = 3,
    text_size: float = 3,
    text_th: float = 3,
    color: tuple = None,
    hide_labels: bool = False,
    hide_conf: bool = False,
    output_dir: Optional[str] = None,
    file_name: Optional[str] = "prediction_visual",
) -> np.ndarray: ...
def crop_object_predictions(
    image: np.ndarray,
    object_prediction_list: List[ObjectPrediction],
    output_dir: str,
    file_name: str,
    export_format: str = "PNG",
) -> None: ...
def get_video_reader(video_path: str): ...

# File utilities
def save_json(data, save_path: str, indent: Optional[int] = None): ...
def load_json(load_path: str, encoding: str = "utf-8") -> Dict: ...
def save_pickle(data: Any, save_path: str): ...
def load_pickle(load_path: str) -> Any: ...
def list_files(
    directory: str,
    contains: List[str] = None,
    verbose: bool = True,
    max_depth: Optional[int] = None,
) -> List[str]: ...
def get_base_filename(path: str) -> str: ...
def get_file_extension(path: str) -> str: ...
def download_from_url(from_url: str, to_path: str): ...

# Import utilities
def is_available(package: str) -> bool: ...
def check_requirements(requirements: List[str], raise_exception: bool = True): ...
```

[Utilities](./utilities.md)

## Types

```python { .api }
class PredictionResult:
    def __init__(
        self,
        object_prediction_list: List[ObjectPrediction],
        image: Image.Image,
        durations_in_seconds: Optional[Dict] = None,
    ): ...
    def export_visuals(self, export_dir: str, text_size: float = None): ...
    def to_coco_annotations(self) -> List[CocoAnnotation]: ...
    def to_coco_predictions(self) -> List[CocoPrediction]: ...

class PredictionScore:
    def __init__(self, value: Union[float, np.ndarray]): ...
    def is_greater_than_threshold(self, threshold: float) -> bool: ...

class SliceImageResult:
    def __init__(self, original_image_size: List[int], image_dir: str): ...

class SlicedImage:
    def __init__(self, image: Image.Image, coco_image: CocoImage, starting_pixel: List[int]): ...
```