# Main Processing Pipeline

Complete end-to-end optical music recognition pipeline that handles the full workflow from image input to MusicXML output. The pipeline orchestrates neural network inference, feature extraction, musical analysis, and structured document generation.

## Capabilities

### Primary Pipeline Function

The main extraction function that coordinates the entire OMR pipeline.

```python { .api }
def extract(args: Namespace) -> str:
    """
    Main extraction pipeline function that processes a music sheet image and generates MusicXML.

    Parameters:
    - args.img_path (str): Path to the input image file
    - args.output_path (str): Directory path for output files
    - args.use_tf (bool): Use TensorFlow instead of ONNX runtime for inference
    - args.save_cache (bool): Save model predictions to disk for reuse
    - args.without_deskew (bool): Skip the deskewing step for aligned images

    Returns:
    str: Full path to the generated MusicXML file

    Raises:
    FileNotFoundError: If the input image file doesn't exist
    StafflineException: If staffline detection fails
    """
```
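
Because `extract` reads its options as attributes of a single `Namespace`, it can help to build that object in one place. The helper below is a minimal sketch, not part of the package: `build_args` is a hypothetical name, and its default values are assumptions chosen for illustration.

```python
from argparse import Namespace
from pathlib import Path


def build_args(img_path: str, output_path: str = "./", *, use_tf: bool = False,
               save_cache: bool = False, without_deskew: bool = False) -> Namespace:
    """Bundle the documented options into a Namespace (hypothetical helper)."""
    image = Path(img_path)
    if not image.is_file():
        # Fail early with the same exception type the pipeline documents.
        raise FileNotFoundError(f"Input image not found: {image}")
    return Namespace(
        img_path=str(image),
        output_path=str(output_path),
        use_tf=use_tf,
        save_cache=save_cache,
        without_deskew=without_deskew,
    )
```

The resulting object can then be passed straight to `extract`, for example `extract(build_args("score.jpg", "./output/"))`.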

### Command Line Interface

The CLI entry point that provides the `oemer` command.

```python { .api }
def main() -> None:
    """
    CLI entry point for the oemer command.

    Parses command line arguments and executes the extraction pipeline.
    Downloads model checkpoints automatically if not present.
    """

def get_parser() -> ArgumentParser:
    """
    Get the command line argument parser.

    Returns:
    ArgumentParser: Configured parser for oemer CLI options
    """
```
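
To illustrate how a parser could produce the `Namespace` fields that `extract` expects, here is a minimal sketch using only `argparse`. The function name `build_parser_sketch` is hypothetical, and the flag spellings and defaults are assumptions inferred from the documented attribute names; run `oemer --help` for the actual options.

```python
from argparse import ArgumentParser


def build_parser_sketch() -> ArgumentParser:
    """Hypothetical parser mirroring the documented options; flag names are assumptions."""
    parser = ArgumentParser(prog="oemer", description="Sheet music image to MusicXML")
    parser.add_argument("img_path", help="Path to the input image file")
    parser.add_argument("-o", "--output-path", default="./", help="Directory for output files")
    parser.add_argument("--use-tf", action="store_true", help="Use TensorFlow instead of ONNX runtime")
    parser.add_argument("--save-cache", action="store_true", help="Save model predictions to disk")
    parser.add_argument("-d", "--without-deskew", action="store_true", help="Skip the deskewing step")
    return parser


# argparse converts dashes in long option names to underscores,
# so "--output-path" becomes args.output_path.
args = build_parser_sketch().parse_args(["score.jpg", "-o", "out", "--save-cache"])
print(args)
```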

### Neural Network Prediction

Generate predictions using the two-stage neural network architecture.

```python { .api }
def generate_pred(img_path: str, use_tf: bool = False) -> Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]:
    """
    Generate neural network predictions for all musical elements.

    Runs two U-Net models:
    1. Staff vs. symbols segmentation
    2. Detailed symbol classification

    Parameters:
    - img_path (str): Path to the input image
    - use_tf (bool): Use TensorFlow instead of ONNX runtime

    Returns:
    Tuple containing:
    - staff (ndarray): Staff line predictions
    - symbols (ndarray): General symbol predictions
    - stems_rests (ndarray): Stems and rests predictions
    - notehead (ndarray): Note head predictions
    - clefs_keys (ndarray): Clefs and accidentals predictions
    """
```
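
A five-element tuple is easy to unpack in the wrong order. One way to guard against that is to give the positions names. The sketch below is an illustration only: `Predictions` and `wrap_predictions` are hypothetical names that are not part of the package, and the field order simply follows the return list documented above.

```python
from typing import Any, NamedTuple, Sequence


class Predictions(NamedTuple):
    """Hypothetical named wrapper for the five documented outputs."""
    staff: Any
    symbols: Any
    stems_rests: Any
    notehead: Any
    clefs_keys: Any


def wrap_predictions(outputs: Sequence[Any]) -> Predictions:
    """Attach field names to a 5-tuple in the documented order."""
    if len(outputs) != 5:
        raise ValueError(f"expected 5 prediction maps, got {len(outputs)}")
    return Predictions(*outputs)
```

With the package installed, `wrap_predictions(generate_pred("score.jpg"))` would let later code refer to `preds.notehead` instead of `preds[3]`.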

### Data Management

Functions for managing intermediate processing data and model checkpoints.

```python { .api }
def clear_data() -> None:
    """
    Clear all registered processing layers.

    Removes all intermediate data from the layer management system
    to free memory and reset state between processing runs.
    """

def download_file(title: str, url: str, save_path: str) -> None:
    """
    Download model checkpoint files with progress tracking.

    Parameters:
    - title (str): Display name for download progress
    - url (str): URL of the file to download
    - save_path (str): Local path to save the downloaded file
    """

def polish_symbols(rgb_black_th: int = 300) -> ndarray:
    """
    Polish symbol predictions by filtering background pixels.

    Parameters:
    - rgb_black_th (int): RGB threshold for black pixel detection
    """
```
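
For readers curious how a download with progress tracking can work, the following is a minimal sketch built on the standard library's `urllib.request.urlretrieve` and its `reporthook` callback. It is not the package's implementation: `download_with_progress` is a hypothetical name, and the percentage display format is an assumption.

```python
import sys
import urllib.request
from pathlib import Path


def download_with_progress(title: str, url: str, save_path: str) -> None:
    """Download url to save_path, printing a percentage as blocks arrive."""

    def report(block_num: int, block_size: int, total_size: int) -> None:
        # total_size is -1 when the server does not report a length.
        if total_size > 0:
            done = min(block_num * block_size, total_size)
            sys.stdout.write(f"\r{title}: {done * 100 // total_size}%")
            sys.stdout.flush()

    # Create the target directory so the first run does not fail.
    Path(save_path).parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, save_path, reporthook=report)
    sys.stdout.write("\n")
```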

### Registration Functions

Functions for registering extracted elements in the layer system.

```python { .api }
def register_notehead_bbox(bboxes: List[BBox]) -> ndarray:
    """
    Register notehead bounding boxes in the layer system.

    Parameters:
    - bboxes (List[BBox]): List of notehead bounding boxes

    Returns:
    ndarray: Updated bounding box layer
    """

def register_note_id() -> None:
    """
    Register note IDs in the layer system.

    Creates a mapping layer where each pixel contains the ID
    of the note it belongs to, or -1 for background pixels.
    """
```
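
The idea of a per-pixel note ID layer can be shown with a small, dependency-free sketch. It is a simplification under stated assumptions: `build_note_id_map` is a hypothetical name, each box is assumed to be `(x1, y1, x2, y2)` with exclusive upper bounds, the ID is just the box's list index, and whole boxes are filled rather than the actual notehead pixels.

```python
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2), upper bounds exclusive


def build_note_id_map(height: int, width: int, bboxes: List[BBox]) -> List[List[int]]:
    """Return a height x width grid holding the index of the covering box, or -1."""
    id_map = [[-1] * width for _ in range(height)]
    for note_id, (x1, y1, x2, y2) in enumerate(bboxes):
        # Clamp each box to the grid so out-of-range boxes are ignored safely.
        for y in range(max(y1, 0), min(y2, height)):
            for x in range(max(x1, 0), min(x2, width)):
                id_map[y][x] = note_id
    return id_map


grid = build_note_id_map(4, 6, [(0, 0, 2, 2), (3, 1, 5, 3)])
for row in grid:
    print(row)
```

A lookup such as `grid[y][x]` then answers "which note owns this pixel?" in constant time, which is the property the mapping layer provides.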

## Usage Examples

### Basic Programmatic Usage

```python
from oemer.ete import extract, clear_data
from argparse import Namespace
import os

# Clear any previous data
clear_data()

# Configure extraction parameters
args = Namespace(
    img_path='scores/beethoven_symphony.jpg',
    output_path='./output/',
    use_tf=False,         # Use ONNX runtime (faster)
    save_cache=True,      # Save predictions for reuse
    without_deskew=False  # Enable deskewing for phone photos
)

try:
    # Run the extraction pipeline
    musicxml_path = extract(args)
    print(f"Successfully generated: {musicxml_path}")

    # Check if teaser image was also created
    teaser_path = musicxml_path.replace('.musicxml', '_teaser.png')
    if os.path.exists(teaser_path):
        print(f"Analysis visualization: {teaser_path}")

except FileNotFoundError:
    print(f"Image file not found: {args.img_path}")
except Exception as e:
    print(f"Processing failed: {e}")
```

### Batch Processing Multiple Images

```python
from pathlib import Path
from oemer.ete import extract, clear_data
from argparse import Namespace

def process_directory(input_dir: str, output_dir: str):
    """Process all images in a directory."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    # parents=True also creates missing intermediate directories
    output_path.mkdir(parents=True, exist_ok=True)

    # Find all image files (sorted for a stable processing order)
    image_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff'}
    image_files = sorted(f for f in input_path.iterdir()
                         if f.suffix.lower() in image_extensions)

    print(f"Found {len(image_files)} images to process")

    for i, img_file in enumerate(image_files, 1):
        print(f"Processing {i}/{len(image_files)}: {img_file.name}")

        # Clear data between files to free memory
        clear_data()

        args = Namespace(
            img_path=str(img_file),
            output_path=str(output_path),
            use_tf=False,
            save_cache=True,  # Reuse predictions if processing same image again
            without_deskew=False
        )

        try:
            musicxml_path = extract(args)
            print(f"✓ Generated: {Path(musicxml_path).name}")
        except Exception as e:
            # Report the failure and continue with the next image
            print(f"✗ Failed: {e}")

# Process a directory of sheet music images
process_directory('./sheet_music_images/', './musicxml_output/')
```

### Using with Different Model Backends

```python
from oemer.ete import extract, clear_data
from argparse import Namespace

# Test with both ONNX and TensorFlow backends
test_image = 'test_score.jpg'

# ONNX runtime (default - faster inference)
args_onnx = Namespace(
    img_path=test_image,
    output_path='./onnx_output/',
    use_tf=False,
    save_cache=False,
    without_deskew=False
)

# TensorFlow backend (may have different precision)
args_tf = Namespace(
    img_path=test_image,
    output_path='./tf_output/',
    use_tf=True,
    save_cache=False,
    without_deskew=False
)

print("Processing with ONNX runtime...")
clear_data()
onnx_result = extract(args_onnx)

print("Processing with TensorFlow...")
# Reset the layer system so the second run starts from a clean state
clear_data()
tf_result = extract(args_tf)

print(f"ONNX result: {onnx_result}")
print(f"TensorFlow result: {tf_result}")
```

## Pipeline Architecture

The extraction pipeline follows these stages:

1. **Input Validation**: Verify image file exists and is readable
2. **Model Loading**: Load or download neural network checkpoints
3. **Image Preprocessing**: Resize and normalize input image
4. **Neural Network Inference**: Run two-stage semantic segmentation
5. **Image Dewarping**: Correct perspective and skew (optional)
6. **Layer Registration**: Store all predictions in layer system
7. **Staff Extraction**: Detect and analyze staff lines
8. **Note Extraction**: Identify and classify noteheads
9. **Note Grouping**: Group notes by stems and beams
10. **Symbol Extraction**: Extract clefs, accidentals, rests, barlines
11. **Rhythm Analysis**: Analyze beams, flags, and dots for timing
12. **MusicXML Building**: Generate structured musical document
13. **Output Generation**: Write MusicXML file and analysis visualization

Each stage can access intermediate results through the layer management system, enabling modular processing and debugging capabilities.
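
The way stages share intermediate results through a layer system can be sketched as a named registry that each stage reads from and writes to. Everything below is hypothetical and for illustration only: the function names, the layer names, and the tiny placeholder data are assumptions, not the package's actual layer API.

```python
from typing import Any, Callable, Dict, List

# Module-level registry standing in for the layer management system.
_layers: Dict[str, Any] = {}


def register_layer(name: str, data: Any) -> None:
    """Store an intermediate result under a name."""
    _layers[name] = data


def get_layer(name: str) -> Any:
    """Fetch a previously registered result, failing loudly if it is missing."""
    if name not in _layers:
        raise KeyError(f"layer {name!r} has not been registered")
    return _layers[name]


def clear_layers() -> None:
    """Drop all intermediate results, as is done between processing runs."""
    _layers.clear()


def run_stages(stages: List[Callable[[], None]]) -> None:
    """Run stages in order; each one communicates only through the registry."""
    for stage in stages:
        stage()


def inference_stage() -> None:
    # Placeholder for a prediction map produced by a model.
    register_layer("notehead_pred", [[0, 1], [1, 1]])


def note_stage() -> None:
    pred = get_layer("notehead_pred")
    register_layer("notehead_count", sum(map(sum, pred)))


clear_layers()
run_stages([inference_stage, note_stage])
print(get_layer("notehead_count"))
```

Because every stage depends only on named layers, a single stage can be rerun or inspected in isolation once its inputs are registered, which is what makes this style convenient for debugging.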