Tessl Tile for pypi/oemer@0.1.0

# Neural Network Inference

Oemer performs semantic segmentation of musical elements with two specialized U-Net models, each handling a different aspect of music notation recognition.

## Capabilities

### Primary Inference Function

Run neural network inference on large images using a sliding-window approach.

```python { .api }
def inference(model_path: str, img_path: str, step_size: int = 128, batch_size: int = 16, manual_th: Optional[Any] = None, use_tf: bool = False) -> Tuple[ndarray, ndarray]:
    """
    Run neural network inference on image patches using sliding window approach.

    Parameters:
    - model_path (str): Path to model checkpoint directory containing model files
    - img_path (str): Path to input image file
    - step_size (int): Sliding window step size in pixels (default: 128)
    - batch_size (int): Number of patches to process in each batch (default: 16)
    - manual_th (Optional[Any]): Manual threshold for prediction binarization
    - use_tf (bool): Use TensorFlow instead of ONNX runtime (default: False)

    Returns:
    Tuple containing:
    - predictions (ndarray): Segmentation predictions with class labels
    - metadata (ndarray): Additional prediction metadata

    Raises:
    FileNotFoundError: If model files or input image are not found
    RuntimeError: If inference fails due to model or memory issues
    """
```
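
To illustrate the sliding-window approach, the sketch below enumerates the top-left corners of overlapping windows over an image. It is not oemer's implementation: `window_origins` is a hypothetical helper, and the 256-pixel window size is an assumption made for illustration (the real window size depends on the model).

```python
from typing import Iterator, Tuple


def window_origins(
    height: int, width: int, win_size: int = 256, step_size: int = 128
) -> Iterator[Tuple[int, int]]:
    """Yield (y, x) top-left corners of windows that cover the whole image.

    win_size=256 is an assumed value for illustration only.
    """

    def axis_positions(length: int) -> list:
        # An axis shorter than the window needs a single (padded) window.
        if length <= win_size:
            return [0]
        positions = list(range(0, length - win_size + 1, step_size))
        # Add a final window flush with the edge so no pixels are skipped.
        if positions[-1] != length - win_size:
            positions.append(length - win_size)
        return positions

    for y in axis_positions(height):
        for x in axis_positions(width):
            yield y, x


# A 512x640 image with the default step gives 3 rows x 4 columns of windows.
print(len(list(window_origins(512, 640))))  # 12
```

When `step_size` is smaller than the window, neighbouring windows overlap, so smaller steps produce more patches for the model to process.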

### Image Preprocessing

Prepare images for optimal neural network processing.

```python { .api }
def resize_image(image: Image.Image) -> Image.Image:
    """
    Resize image to optimal dimensions for neural network processing.

    Maintains aspect ratio while ensuring dimensions are compatible
    with the model's expected input size requirements.

    Parameters:
    - image (PIL.Image.Image): Input image to resize

    Returns:
    PIL.Image.Image: Resized image optimized for model inference
    """
```
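
As a rough picture of what an aspect-ratio-preserving resize involves, the sketch below computes new dimensions that land near a fixed pixel budget. The helper `fit_to_pixel_budget` and the 3,500,000-pixel budget are hypothetical; the sizing rule that `resize_image` actually applies is not described here.

```python
import math
from typing import Tuple


def fit_to_pixel_budget(
    width: int, height: int, target_pixels: int = 3_500_000
) -> Tuple[int, int]:
    """Scale (width, height) to roughly target_pixels, keeping the aspect ratio.

    target_pixels is a made-up budget, not a value taken from oemer.
    """
    # Area scales with the square of the linear factor, hence the square root.
    scale = math.sqrt(target_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


new_w, new_h = fit_to_pixel_budget(4000, 3000)
print(new_w, new_h)  # 2160 1620
```

The resulting pair could then be passed to Pillow's `Image.resize((new_w, new_h))`.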

### Symbol Classification

Use trained sklearn models for fine-grained symbol classification.

```python { .api }
def predict(region: ndarray, model_name: str) -> str:
    """
    Predict symbol type using trained sklearn classification models.

    Used for distinguishing between similar symbols that neural networks
    cannot reliably differentiate (e.g., different clef types, accidentals).

    Parameters:
    - region (ndarray): Image region containing the symbol to classify
    - model_name (str): Name of the sklearn model to use for prediction

    Returns:
    str: Predicted symbol class label

    Raises:
    ValueError: If model_name is not recognized
    FileNotFoundError: If sklearn model file is not found
    """
```

## Neural Network Architecture

### Two-Stage Processing

Oemer uses two specialized U-Net models for different aspects of music recognition:

#### Model 1: Staff vs. Symbols Segmentation
- **Purpose**: Separate staff lines from all other musical symbols
- **Input**: Full music sheet image
- **Output**: Segmentation mask labeling staff lines as class 1 and all other symbols as class 2
- **Location**: `checkpoints/unet_big/`

#### Model 2: Detailed Symbol Classification
- **Purpose**: Classify specific types of musical symbols
- **Input**: Full music sheet image
- **Output**: Multi-class segmentation with:
  - Class 1: Stems and rests
  - Class 2: Note heads
  - Class 3: Clefs and accidentals (sharp/flat/natural)
- **Location**: `checkpoints/seg_net/`

### Model Files Structure

Each model directory contains:
- `model.onnx` - ONNX format model (default runtime)
- `weights.h5` - TensorFlow/Keras weights (when using `--use-tf`)
- `metadata.pkl` - Model metadata and configuration
- `arch.json` - Model architecture description
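
A quick way to confirm that a checkpoint directory is complete before running inference is to check for the four files listed above. This is a minimal sketch: `missing_model_files` is a hypothetical helper, not part of oemer, and the example path simply follows the one used elsewhere in this document.

```python
from pathlib import Path
from typing import List

# File names as listed in this section.
EXPECTED_FILES = ("model.onnx", "weights.h5", "metadata.pkl", "arch.json")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example path; adjust to wherever the checkpoints are installed.
missing = missing_model_files("oemer/checkpoints/unet_big")
if missing:
    print(f"Missing files: {', '.join(missing)}")
```

An empty list means every expected file is present; a nonexistent directory reports all four names as missing.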

## Usage Examples

### Basic Inference

```python
from oemer.inference import inference
import numpy as np

# Run inference on a music sheet image
model_path = "oemer/checkpoints/unet_big"
img_path = "sheet_music.jpg"

# Generate predictions
predictions, metadata = inference(
    model_path=model_path,
    img_path=img_path,
    step_size=128,
    batch_size=16,
    use_tf=False  # Use ONNX runtime
)

# Extract staff and symbol predictions
staff_mask = np.where(predictions == 1, 1, 0)
symbol_mask = np.where(predictions == 2, 1, 0)

print(f"Predictions shape: {predictions.shape}")
print(f"Staff pixels: {np.sum(staff_mask)}")
print(f"Symbol pixels: {np.sum(symbol_mask)}")
```

### Two-Stage Inference Pipeline

```python
from oemer.inference import inference, resize_image
from PIL import Image
import numpy as np
import os

def run_complete_inference(img_path: str, use_tf: bool = False):
    """Run both inference models on an image."""

    # Resize image for optimal processing
    image = Image.open(img_path)
    resized_image = resize_image(image)
    temp_path = "temp_resized.jpg"
    # JPEG cannot store an alpha channel, so convert to RGB before saving
    resized_image.convert("RGB").save(temp_path)

    try:
        # Stage 1: Staff vs. symbols segmentation
        print("Running stage 1 inference (staff vs symbols)...")
        staff_symbols, _ = inference(
            model_path="oemer/checkpoints/unet_big",
            img_path=temp_path,
            step_size=128,
            batch_size=16,
            use_tf=use_tf
        )

        # Stage 2: Detailed symbol classification
        print("Running stage 2 inference (symbol details)...")
        symbol_details, _ = inference(
            model_path="oemer/checkpoints/seg_net",
            img_path=temp_path,
            step_size=128,
            batch_size=16,
            use_tf=use_tf
        )

        # Split each class map into one binary mask per class
        staff = np.where(staff_symbols == 1, 1, 0)
        symbols = np.where(staff_symbols == 2, 1, 0)
        stems_rests = np.where(symbol_details == 1, 1, 0)
        noteheads = np.where(symbol_details == 2, 1, 0)
        clefs_keys = np.where(symbol_details == 3, 1, 0)

        return {
            'staff': staff,
            'symbols': symbols,
            'stems_rests': stems_rests,
            'noteheads': noteheads,
            'clefs_keys': clefs_keys
        }

    finally:
        # Clean up temporary file
        if os.path.exists(temp_path):
            os.remove(temp_path)

# Run complete inference
results = run_complete_inference("my_sheet_music.jpg")
for key, mask in results.items():
    print(f"{key}: {mask.shape}, pixels: {np.sum(mask)}")
```

### Custom Inference Parameters

```python
from oemer.inference import inference

# High-precision inference with smaller steps
high_precision_predictions, _ = inference(
    model_path="oemer/checkpoints/unet_big",
    img_path="complex_score.jpg",
    step_size=64,      # Smaller steps for more overlap
    batch_size=8,      # Smaller batches to reduce memory usage
    use_tf=True        # Use TensorFlow for potentially better precision
)

# Fast inference with larger steps
fast_predictions, _ = inference(
    model_path="oemer/checkpoints/unet_big",
    img_path="simple_score.jpg",
    step_size=256,     # Larger steps for faster processing
    batch_size=32,     # Larger batches if memory allows
    use_tf=False       # ONNX is typically faster
)
```

### Symbol Classification with sklearn Models

```python
from oemer.inference import predict
import cv2

# Extract a symbol region from the image
image = cv2.imread("sheet_music.jpg", cv2.IMREAD_GRAYSCALE)
symbol_region = image[100:150, 200:250]  # Extract 50x50 region

# Classify the symbol using trained sklearn models
try:
    # Predict clef type
    clef_type = predict(symbol_region, "clef_classifier")
    print(f"Detected clef: {clef_type}")

    # Predict accidental type
    accidental_type = predict(symbol_region, "accidental_classifier")
    print(f"Detected accidental: {accidental_type}")

    # Predict rest type
    rest_type = predict(symbol_region, "rest_classifier")
    print(f"Detected rest: {rest_type}")

except (ValueError, FileNotFoundError) as e:
    # predict() documents both exceptions: unknown model name or missing model file
    print(f"Classification error: {e}")
```

## Performance Considerations

### Memory Management

- **Batch Size**: Reduce `batch_size` if encountering out-of-memory errors
- **Step Size**: Larger `step_size` uses less memory but may reduce accuracy
- **Model Backend**: ONNX runtime typically uses less memory than TensorFlow
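
To get a feel for how `step_size` and `batch_size` affect the amount of work, the sketch below estimates the number of patches and batches for a sliding-window pass. `estimate_workload` is a hypothetical helper, and the 256-pixel window size and the 2048x1536 page are assumptions chosen for illustration, not values taken from oemer.

```python
import math


def estimate_workload(
    height: int, width: int, step_size: int, batch_size: int, win_size: int = 256
) -> dict:
    """Estimate how many patches and batches a sliding-window pass needs.

    win_size=256 is an assumed window size, used only for illustration.
    """
    # Windows needed along each axis to reach the far edge of the image.
    rows = max(1, math.ceil((height - win_size) / step_size) + 1)
    cols = max(1, math.ceil((width - win_size) / step_size) + 1)
    patches = rows * cols
    return {"patches": patches, "batches": math.ceil(patches / batch_size)}


# Hypothetical 2048x1536 page: each halving of the step multiplies
# the patch count by nearly four.
for step in (256, 128, 64):
    print(step, estimate_workload(2048, 1536, step_size=step, batch_size=16))
```

Under these assumptions the patch count grows from 48 to 165 to 609 as the step shrinks from 256 to 128 to 64, which is why smaller steps cost more time and memory.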

### Speed Optimization

- **ONNX Runtime**: Generally faster than TensorFlow for inference
- **GPU Acceleration**: Install `onnxruntime-gpu` for GPU acceleration on Linux
- **Image Size**: Resize very large images to reduce processing time

### Quality vs. Speed Trade-offs

```python
# Quality-focused settings (slower)
quality_settings = {
    'step_size': 64,
    'batch_size': 8,
    'use_tf': True
}

# Speed-focused settings (faster)
speed_settings = {
    'step_size': 256,
    'batch_size': 32,
    'use_tf': False
}

# Balanced settings (recommended)
balanced_settings = {
    'step_size': 128,
    'batch_size': 16,
    'use_tf': False
}
```

The sliding-window design lets the same models handle images of different sizes; adjust `step_size`, `batch_size`, and the backend to balance accuracy, speed, and memory use.
