# Neural Network Inference

Oemer performs semantic segmentation of musical elements with U-Net models. Two specialized neural networks each handle a different aspect of music notation recognition.

## Capabilities

### Primary Inference Function

Run neural network inference on large images using a sliding-window approach.

```python { .api }
def inference(model_path: str, img_path: str, step_size: int = 128, batch_size: int = 16, manual_th: Optional[Any] = None, use_tf: bool = False) -> Tuple[ndarray, ndarray]:
    """
    Run neural network inference on image patches using sliding window approach.

    Parameters:
    - model_path (str): Path to model checkpoint directory containing model files
    - img_path (str): Path to input image file
    - step_size (int): Sliding window step size in pixels (default: 128)
    - batch_size (int): Number of patches to process in each batch (default: 16)
    - manual_th (Optional[Any]): Manual threshold for prediction binarization
    - use_tf (bool): Use TensorFlow instead of ONNX runtime (default: False)

    Returns:
    Tuple containing:
    - predictions (ndarray): Segmentation predictions with class labels
    - metadata (ndarray): Additional prediction metadata

    Raises:
    FileNotFoundError: If model files or input image are not found
    RuntimeError: If inference fails due to model or memory issues
    """
```

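The sliding-window idea can be sketched as follows. This is a minimal illustration, not Oemer's actual implementation: the helper names `window_origins` and `patch_grid` and the 256-pixel window size are assumptions made for the example. It computes the top-left corner of each patch so that consecutive patches advance by `step_size` and the final patch sits flush against the image edge.

```python
from itertools import product
from typing import List, Tuple


def window_origins(length: int, win_size: int, step_size: int) -> List[int]:
    """Start offsets of windows covering one image axis (hypothetical helper)."""
    # Regular steps for as long as a full window still fits inside the axis.
    starts = list(range(0, max(length - win_size, 0) + 1, step_size))
    # Add one final window flush with the edge if the tail is still uncovered.
    if starts[-1] + win_size < length:
        starts.append(length - win_size)
    return starts


def patch_grid(height: int, width: int, win_size: int = 256, step_size: int = 128) -> List[Tuple[int, int]]:
    """(top, left) corner of every patch; win_size=256 is an assumed value."""
    rows = window_origins(height, win_size, step_size)
    cols = window_origins(width, win_size, step_size)
    return list(product(rows, cols))


corners = patch_grid(height=600, width=400)
print(f"{len(corners)} patches, last corner at {corners[-1]}")
```

With a 128-pixel step and a 256-pixel window, neighbouring patches overlap by half, so most pixels are covered by more than one patch. A real implementation would then merge the overlapping predictions into a single map.
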
### Image Preprocessing

Prepare images for optimal neural network processing.

```python { .api }
def resize_image(image: Image.Image) -> Image.Image:
    """
    Resize image to optimal dimensions for neural network processing.

    Maintains aspect ratio while ensuring dimensions are compatible
    with the model's expected input size requirements.

    Parameters:
    - image (PIL.Image.Image): Input image to resize

    Returns:
    PIL.Image.Image: Resized image optimized for model inference
    """
```

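As a rough illustration of an aspect-ratio-preserving resize, the sketch below scales an image so that its total pixel count approaches a target. The function name `fit_size` and the target of 2,000,000 pixels are hypothetical; the values `resize_image` actually uses are not stated here.

```python
import math
from typing import Tuple


def fit_size(width: int, height: int, target_pixels: int = 2_000_000) -> Tuple[int, int]:
    """Return (new_width, new_height) holding roughly target_pixels pixels in total."""
    # A single scale factor for both axes keeps the aspect ratio unchanged.
    scale = math.sqrt(target_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_size(4000, 2000))
```

The resulting size tuple could then be passed to Pillow's `Image.resize`.
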
### Symbol Classification

Use trained sklearn models for fine-grained symbol classification.

```python { .api }
def predict(region: ndarray, model_name: str) -> str:
    """
    Predict symbol type using trained sklearn classification models.

    Used for distinguishing between similar symbols that neural networks
    cannot reliably differentiate (e.g., different clef types, accidentals).

    Parameters:
    - region (ndarray): Image region containing the symbol to classify
    - model_name (str): Name of the sklearn model to use for prediction

    Returns:
    str: Predicted symbol class label

    Raises:
    ValueError: If model_name is not recognized
    FileNotFoundError: If sklearn model file is not found
    """
```

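One way to obtain the `region` argument is to crop the image to the bounding box of a symbol mask. The sketch below uses NumPy only; `crop_to_mask` and its `pad` parameter are hypothetical helpers for illustration, not part of Oemer.

```python
from typing import Optional

import numpy as np


def crop_to_mask(image: np.ndarray, mask: np.ndarray, pad: int = 2) -> Optional[np.ndarray]:
    """Crop image to the bounding box of the non-zero pixels in mask, plus padding."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # The mask is empty, so there is nothing to crop.
    # Clamp the padded box to the image bounds.
    top = max(int(ys.min()) - pad, 0)
    bottom = min(int(ys.max()) + pad + 1, mask.shape[0])
    left = max(int(xs.min()) - pad, 0)
    right = min(int(xs.max()) + pad + 1, mask.shape[1])
    return image[top:bottom, left:right]


image = np.arange(100, dtype=np.uint8).reshape(10, 10)
mask = np.zeros((10, 10), dtype=np.uint8)
mask[4:6, 3:7] = 1  # A 2x4 blob standing in for one detected symbol.
region = crop_to_mask(image, mask, pad=1)
print(region.shape)
```

A small amount of padding keeps thin strokes at the symbol boundary inside the crop.
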
## Neural Network Architecture

### Two-Stage Processing

Oemer uses two specialized U-Net models, each covering a different aspect of music recognition:

#### Model 1: Staff vs. Symbols Segmentation
- **Purpose**: Separate staff lines from all other musical symbols
- **Input**: Full music sheet image
- **Output**: Segmentation map labeling staff lines as class 1 and all other symbols as class 2
- **Location**: `checkpoints/unet_big/`

#### Model 2: Detailed Symbol Classification
- **Purpose**: Classify specific types of musical symbols
- **Input**: Full music sheet image
- **Output**: Multi-class segmentation with:
  - Class 1: Stems and rests
  - Class 2: Note heads
  - Class 3: Clefs and accidentals (sharp/flat/natural)
- **Location**: `checkpoints/seg_net/`

### Model Files Structure

Each model directory contains:
- `model.onnx` - ONNX format model (default runtime)
- `weights.h5` - TensorFlow/Keras weights (when using `--use-tf`)
- `metadata.pkl` - Model metadata and configuration
- `arch.json` - Model architecture description

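Before running inference it can help to confirm that a checkpoint directory is complete. The following sketch checks for the files listed above. The helper `missing_model_files` is hypothetical, and the grouping of files per backend (ONNX needing `model.onnx`, TensorFlow needing `arch.json` and `weights.h5`, both needing `metadata.pkl`) is an assumption drawn from the descriptions in this list rather than from Oemer's loader.

```python
from pathlib import Path
from typing import List


def missing_model_files(model_dir: str, use_tf: bool = False) -> List[str]:
    """List expected files absent from model_dir (assumed per-backend grouping)."""
    if use_tf:
        expected = ["arch.json", "weights.h5", "metadata.pkl"]
    else:
        expected = ["model.onnx", "metadata.pkl"]
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_model_files("oemer/checkpoints/unet_big")
if missing:
    print(f"Missing model files: {missing}")
```

Checking up front gives a clearer message than a `FileNotFoundError` raised partway through loading.
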
## Usage Examples

### Basic Inference

```python
from oemer.inference import inference
import numpy as np

# Run inference on a music sheet image
model_path = "oemer/checkpoints/unet_big"
img_path = "sheet_music.jpg"

# Generate predictions
predictions, metadata = inference(
    model_path=model_path,
    img_path=img_path,
    step_size=128,
    batch_size=16,
    use_tf=False  # Use ONNX runtime
)

# Extract staff and symbol predictions
staff_mask = np.where(predictions == 1, 1, 0)
symbol_mask = np.where(predictions == 2, 1, 0)

print(f"Predictions shape: {predictions.shape}")
print(f"Staff pixels: {np.sum(staff_mask)}")
print(f"Symbol pixels: {np.sum(symbol_mask)}")
```

### Two-Stage Inference Pipeline

```python
from oemer.inference import inference, resize_image
from PIL import Image
import numpy as np
import os

def run_complete_inference(img_path: str, use_tf: bool = False):
    """Run both inference models on an image."""

    # Resize image for optimal processing
    image = Image.open(img_path)
    resized_image = resize_image(image)
    temp_path = "temp_resized.jpg"
    resized_image.save(temp_path)

    try:
        # Stage 1: Staff vs. symbols segmentation
        print("Running stage 1 inference (staff vs symbols)...")
        staff_symbols, _ = inference(
            model_path="oemer/checkpoints/unet_big",
            img_path=temp_path,
            step_size=128,
            batch_size=16,
            use_tf=use_tf
        )

        # Stage 2: Detailed symbol classification
        print("Running stage 2 inference (symbol details)...")
        symbol_details, _ = inference(
            model_path="oemer/checkpoints/seg_net",
            img_path=temp_path,
            step_size=128,
            batch_size=16,
            use_tf=use_tf
        )

        # Convert each class label into its own binary mask
        staff = np.where(staff_symbols == 1, 1, 0)
        symbols = np.where(staff_symbols == 2, 1, 0)
        stems_rests = np.where(symbol_details == 1, 1, 0)
        noteheads = np.where(symbol_details == 2, 1, 0)
        clefs_keys = np.where(symbol_details == 3, 1, 0)

        return {
            'staff': staff,
            'symbols': symbols,
            'stems_rests': stems_rests,
            'noteheads': noteheads,
            'clefs_keys': clefs_keys
        }

    finally:
        # Clean up temporary file
        if os.path.exists(temp_path):
            os.remove(temp_path)

# Run complete inference
results = run_complete_inference("my_sheet_music.jpg")
for key, mask in results.items():
    print(f"{key}: {mask.shape}, pixels: {np.sum(mask)}")
```

### Custom Inference Parameters

```python
from oemer.inference import inference

# High-precision inference with smaller steps
high_precision_predictions, _ = inference(
    model_path="oemer/checkpoints/unet_big",
    img_path="complex_score.jpg",
    step_size=64,   # Smaller steps for more overlap
    batch_size=8,   # Smaller batches to reduce memory usage
    use_tf=True     # Use TensorFlow for potentially better precision
)

# Fast inference with larger steps
fast_predictions, _ = inference(
    model_path="oemer/checkpoints/unet_big",
    img_path="simple_score.jpg",
    step_size=256,  # Larger steps for faster processing
    batch_size=32,  # Larger batches if memory allows
    use_tf=False    # ONNX is typically faster
)
```

### Symbol Classification with sklearn Models

```python
from oemer.inference import predict
import cv2
import numpy as np

# Extract a symbol region from the image
image = cv2.imread("sheet_music.jpg", cv2.IMREAD_GRAYSCALE)
symbol_region = image[100:150, 200:250]  # Extract 50x50 region

# Classify the symbol using trained sklearn models.
# All three calls use the same region for illustration only; in practice
# each region goes to the classifier that matches its symbol category.
try:
    # Predict clef type
    clef_type = predict(symbol_region, "clef_classifier")
    print(f"Detected clef: {clef_type}")

    # Predict accidental type
    accidental_type = predict(symbol_region, "accidental_classifier")
    print(f"Detected accidental: {accidental_type}")

    # Predict rest type
    rest_type = predict(symbol_region, "rest_classifier")
    print(f"Detected rest: {rest_type}")

except (ValueError, FileNotFoundError) as e:
    # predict() documents both exception types
    print(f"Classification error: {e}")
```

## Performance Considerations

### Memory Management

- **Batch Size**: Reduce `batch_size` if encountering out-of-memory errors
- **Step Size**: Larger `step_size` uses less memory but may reduce accuracy
- **Model Backend**: ONNX runtime typically uses less memory than TensorFlow

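The batch-size advice can be automated with a simple retry loop. This sketch halves the batch size whenever a run fails with a memory-related error. `run_with_backoff` is a hypothetical wrapper, and it assumes, as the `inference` docstring states, that memory problems surface as `RuntimeError` (Python's built-in `MemoryError` is caught as well).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(run: Callable[[int], T], batch_size: int = 16, min_batch: int = 1) -> T:
    """Call run(batch_size), halving batch_size after each memory-related failure."""
    while True:
        try:
            return run(batch_size)
        except (MemoryError, RuntimeError):
            if batch_size <= min_batch:
                raise  # Already at the smallest batch size, so give up.
            batch_size = max(batch_size // 2, min_batch)


# Example with oemer (shown as a comment because it needs the model files):
# predictions, _ = run_with_backoff(
#     lambda bs: inference("oemer/checkpoints/unet_big", "sheet_music.jpg", batch_size=bs)
# )
```

Because `RuntimeError` is broad, this loop also retries failures unrelated to memory; those simply fail again at the minimum batch size and are re-raised.
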
### Speed Optimization

- **ONNX Runtime**: Generally faster than TensorFlow for inference
- **GPU Acceleration**: Install `onnxruntime-gpu` for GPU acceleration on Linux
- **Image Size**: Resize very large images to reduce processing time

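To confirm that a GPU build of ONNX Runtime is installed, you can list its execution providers. `onnxruntime.get_available_providers()` is part of the ONNX Runtime Python API; the wrapper `available_providers` is only a convenience for this example, and whether Oemer selects the CUDA provider automatically is not covered here.

```python
from typing import List


def available_providers() -> List[str]:
    """Execution providers reported by ONNX Runtime, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


providers = available_providers()
print("CUDA available:", "CUDAExecutionProvider" in providers)
```

If `CUDAExecutionProvider` is missing from the list, inference will most likely run on the CPU even with a GPU present.
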
### Quality vs. Speed Trade-offs

```python
# Quality-focused settings (slower)
quality_settings = {
    'step_size': 64,
    'batch_size': 8,
    'use_tf': True
}

# Speed-focused settings (faster)
speed_settings = {
    'step_size': 256,
    'batch_size': 32,
    'use_tf': False
}

# Balanced settings (recommended)
balanced_settings = {
    'step_size': 128,
    'batch_size': 16,
    'use_tf': False
}
```

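To see why `step_size` has such a large effect on run time, the sketch below counts how many patches each setting produces. It assumes a 256-pixel square window and a 3000×2000-pixel image, both hypothetical values chosen for illustration, and `patch_count` is not an Oemer function.

```python
import math


def windows_along(length: int, win_size: int, step_size: int) -> int:
    """Windows needed to cover one axis, with the last one flush against the edge."""
    if length <= win_size:
        return 1
    return math.ceil((length - win_size) / step_size) + 1


def patch_count(height: int, width: int, step_size: int, win_size: int = 256) -> int:
    """Total patches for an image; win_size=256 is an assumed window size."""
    return windows_along(height, win_size, step_size) * windows_along(width, win_size, step_size)


for step in (64, 128, 256):
    print(f"step_size={step}: {patch_count(3000, 2000, step)} patches")
```

Under these assumptions the three settings yield 1276, 345, and 96 patches respectively, so halving the step size roughly quadruples the amount of work.
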
The sliding-window design lets the inference system handle images of varying size and quality. Tune `step_size`, `batch_size`, and the backend to balance accuracy against speed and memory use.