# Oemer

End-to-end Optical Music Recognition (OMR) system that transcribes musical notation from images into structured MusicXML format. Built on deep learning models and computer vision techniques, oemer can process skewed and phone-taken photos of Western music notation sheets, providing a complete pipeline from image preprocessing to digital score generation.

## Package Information

- **Package Name**: oemer
- **Language**: Python
- **Installation**: `pip install oemer`
- **Optional**: `pip install oemer[tf]` for TensorFlow support
- **CLI Usage**: `oemer <path_to_image>`
## Core Imports

```python
import oemer
from oemer.ete import extract, main
```

For programmatic usage:

```python
from oemer.ete import extract
from argparse import Namespace
```
## Basic Usage

### Command Line Interface

```bash
# Basic usage - outputs MusicXML file and analysis image
oemer image.jpg

# Specify output directory
oemer image.jpg -o ./output/

# Use TensorFlow instead of ONNX runtime
oemer image.jpg --use-tf

# Save model predictions for reuse
oemer image.jpg --save-cache

# Disable image deskewing for aligned images
oemer image.jpg --without-deskew
```
### Programmatic Usage

```python
from oemer.ete import extract
from argparse import Namespace

# Configure extraction parameters
args = Namespace(
    img_path='path/to/music_sheet.jpg',
    output_path='./',
    use_tf=False,
    save_cache=False,
    without_deskew=False
)

# Extract musical notation and generate MusicXML
musicxml_path = extract(args)
print(f"Generated MusicXML: {musicxml_path}")
```
## Architecture

Oemer follows a layered pipeline architecture using a global state management system:

1. **Image Preprocessing**: Dewarping and size normalization for skewed photos
2. **Neural Network Inference**: Two U-Net models for semantic segmentation
   - Model 1: Stafflines vs. all other symbols
   - Model 2: Detailed symbol classification (noteheads, clefs, stems, rests, accidentals)
3. **Feature Extraction**: Specialized extractors for musical elements
4. **Grouping and Rhythm Analysis**: Note grouping by stems/beams and rhythm detection
5. **MusicXML Generation**: Structured output with proper musical semantics

The **Layer Management System** (`oemer.layers`) provides global state management, allowing each processing stage to register intermediate results for use by subsequent stages.
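The staged flow above can be sketched as plain function composition. Every function and dictionary key below is an illustrative placeholder standing in for oemer's internal modules, not its actual API:

```python
# Illustrative five-stage pipeline; names and data are placeholders,
# not oemer's internal API.
def preprocess(img_path):
    # Dewarp / normalize the input image
    return {"img": img_path, "deskewed": True}

def run_models(state):
    # Two U-Net passes: stafflines, then detailed symbols
    state["staff_pred"] = "stafflines"
    state["symbol_pred"] = "symbols"
    return state

def extract_features(state):
    # Specialized extractors pull out musical elements
    state["noteheads"] = ["C4", "E4"]
    return state

def group_and_analyze(state):
    # Notes sharing a stem/beam become one group
    state["groups"] = [state["noteheads"]]
    return state

def build_musicxml(state):
    # Serialize groups into a (mock) score document
    return f"<score>{len(state['groups'])} group(s)</score>"

state = preprocess("photo.jpg")
for stage in (run_models, extract_features, group_and_analyze):
    state = stage(state)
print(build_musicxml(state))  # → <score>1 group(s)</score>
```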
## Capabilities

### Main Processing Pipeline

Complete end-to-end optical music recognition pipeline that handles the full workflow from image input to MusicXML output.

```python { .api }
def extract(args: Namespace) -> str:
    """
    Main extraction pipeline function.

    Parameters:
    - args.img_path (str): Path to input image
    - args.output_path (str): Output directory path
    - args.use_tf (bool): Use TensorFlow instead of ONNX
    - args.save_cache (bool): Save predictions for reuse
    - args.without_deskew (bool): Skip deskewing step

    Returns:
    str: Path to generated MusicXML file
    """

def main() -> None:
    """CLI entry point for oemer command."""

def generate_pred(img_path: str, use_tf: bool = False) -> Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]:
    """
    Generate neural network predictions.

    Returns:
    Tuple containing staff, symbols, stems_rests, notehead, and clefs_keys predictions
    """
```

[Main Processing Pipeline](./main-pipeline.md)
### Neural Network Inference

Model inference capabilities using U-Net architectures for semantic segmentation of musical elements.

```python { .api }
def inference(model_path: str, img_path: str, step_size: int = 128, batch_size: int = 16, manual_th: Optional[Any] = None, use_tf: bool = False) -> Tuple[ndarray, ndarray]:
    """
    Run neural network inference on image patches.

    Parameters:
    - model_path (str): Path to model checkpoint directory
    - img_path (str): Path to input image
    - step_size (int): Sliding window step size
    - batch_size (int): Inference batch size
    - manual_th: Manual threshold for predictions
    - use_tf (bool): Use TensorFlow instead of ONNX

    Returns:
    Tuple of prediction arrays and metadata
    """
```

[Neural Network Inference](./inference.md)
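Patch-based inference implies a sliding-window tiling step. The sketch below is not oemer's internal code (the window size of 256 is an assumption); it only shows how an image can be cut into fixed-size patches at a given `step_size` before batching:

```python
import numpy as np

def tile_image(img: np.ndarray, win: int = 256, step: int = 128) -> list:
    """Return (y, x, patch) tuples covering the image with a sliding window."""
    patches = []
    h, w = img.shape[:2]
    for y in range(0, max(h - win, 0) + 1, step):
        for x in range(0, max(w - win, 0) + 1, step):
            patches.append((y, x, img[y:y + win, x:x + win]))
    return patches

img = np.zeros((512, 384))   # stand-in for a loaded score image
patches = tile_image(img)
print(len(patches))          # → 6
print(patches[0][2].shape)   # → (256, 256)
```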
### Staffline Detection and Analysis

Detection and analysis of musical staff lines, which form the foundation for all subsequent processing steps.

```python { .api }
def extract(splits: int = 8, line_threshold: float = 0.8, horizontal_diff_th: float = 0.1, unit_size_diff_th: float = 0.1, barline_min_degree: int = 75) -> Tuple[ndarray, ndarray]:
    """Extract staff lines and group information."""

class Staff:
    """Complete staff (5 lines) representation."""
    lines: List[Line]
    track: int
    group: int
    is_interp: bool

    def add_line(self, line: Line) -> None: ...
    def duplicate(self, x_offset=0, y_offset=0): ...

    @property
    def unit_size(self) -> float: ...
    @property
    def y_center(self) -> float: ...
    @property
    def slope(self) -> float: ...
```

[Staffline Detection](./staffline-detection.md)
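The `unit_size` property represents the average vertical gap between adjacent stafflines, the basic length unit later stages use for pitch and symbol-size reasoning. A hypothetical helper (not oemer's implementation) makes the idea concrete:

```python
def unit_size(line_y_centers):
    """Average vertical gap between adjacent stafflines."""
    ys = sorted(line_y_centers)
    gaps = [b - a for a, b in zip(ys, ys[1:])]
    return sum(gaps) / len(gaps)

# Five stafflines roughly 12 pixels apart
print(unit_size([100, 112, 124, 137, 148]))  # → 12.0
```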
### Musical Symbol Recognition

Recognition and classification of musical symbols including noteheads, clefs, accidentals, rests, and barlines.

```python { .api }
# Notehead extraction
def extract() -> List[NoteHead]:
    """Extract noteheads from neural network predictions."""

# Symbol extraction
def extract(min_barline_h_unit_ratio: float = 3.75) -> Tuple[List[Barline], List[Clef], List[Sfn], List[Rest]]:
    """Extract musical symbols (barlines, clefs, accidentals, rests)."""

class NoteHead:
    """Note head representation with rhythm and pitch information."""
    points: List[Tuple[int, int]]
    pitch: Optional[int]
    has_dot: bool
    bbox: BBox
    stem_up: Optional[bool]
    stem_right: Optional[bool]
    track: Optional[int]
    group: Optional[int]
    staff_line_pos: int
    invalid: bool
    id: Optional[int]
    note_group_id: Optional[int]
    sfn: Optional[Any]  # Associated sharp/flat/natural
    label: NoteType

    def add_point(self, x: int, y: int) -> None: ...
    def force_set_label(self, label: NoteType) -> None: ...

class Clef:
    """Musical clef representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]
    label: ClefType

    @property
    def x_center(self) -> float: ...

class Sfn:
    """Sharp/Flat/Natural (accidental) representation."""
    bbox: BBox
    note_id: Optional[int]
    is_key: Optional[bool]  # Whether this belongs to a key signature rather than being an accidental
    track: Optional[int]
    group: Optional[int]
    label: SfnType

    @property
    def x_center(self) -> float: ...

class Rest:
    """Musical rest representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]
    label: RestType

    @property
    def x_center(self) -> float: ...

class Barline:
    """Musical barline representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]

    @property
    def x_center(self) -> float: ...
```

*For complete notehead extraction details, see:*

[Notehead Extraction](./notehead-extraction.md)
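Most symbol classes above expose the same `x_center` property: the horizontal midpoint of the symbol's bounding box, which is what lets later stages order clefs, accidentals, and barlines left to right. A minimal self-contained sketch (the `Symbol` class here is hypothetical):

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

class Symbol:
    """Hypothetical stand-in for Clef/Sfn/Rest/Barline."""
    def __init__(self, bbox: BBox):
        self.bbox = bbox

    @property
    def x_center(self) -> float:
        x1, _, x2, _ = self.bbox
        return (x1 + x2) / 2

clef = Symbol((40, 10, 60, 90))
print(clef.x_center)  # → 50.0
```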
### Note Grouping and Rhythm Analysis

Advanced grouping of individual notes into chords and rhythm pattern recognition through beam and flag analysis.

```python { .api }
def extract() -> Tuple[List[NoteGroup], ndarray]:
    """Group notes by stems and beams into chord groups."""

def extract(min_area_ratio: float = 0.08, max_area_ratio: float = 0.2, beam_th: float = 0.5) -> None:
    """Extract rhythm information from beams, flags, and dots."""

class NoteGroup:
    """Group of notes connected by stems/beams."""
    id: Optional[int]
    bbox: BBox
    note_ids: List[int]
    top_note_ids: List[int]  # For multi-melody cases
    bottom_note_ids: List[int]  # For multi-melody cases
    stem_up: Optional[bool]
    has_stem: Optional[bool]
    all_same_type: Optional[bool]  # All notes are solid or hollow
    group: Optional[int]
    track: Optional[int]

    @property
    def x_center(self) -> float: ...
```

*For complete note grouping and rhythm analysis details, see:*

[Note Grouping and Rhythm Analysis](./note-grouping.md)
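One way to picture stem-based grouping: noteheads whose x-centers fall within roughly one staff unit of the same detected stem are treated as a single chord. The function below is an illustrative sketch, not oemer's algorithm:

```python
def group_by_stem(note_xs, stem_xs, unit=12):
    """Assign each notehead (by index) to its nearest stem, if close enough."""
    groups = {x: [] for x in stem_xs}
    for i, nx in enumerate(note_xs):
        nearest = min(stem_xs, key=lambda sx: abs(sx - nx))
        if abs(nearest - nx) <= unit:
            groups[nearest].append(i)
    return groups

# Noteheads at x = 100 and 104 share the stem at x = 102; the one
# at x = 250 belongs to the stem at x = 251.
print(group_by_stem([100, 104, 250], [102, 251]))  # → {102: [0, 1], 251: [2]}
```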
### MusicXML Generation

Generation of structured MusicXML documents from extracted musical elements with proper musical semantics and formatting.

```python { .api }
class MusicXMLBuilder:
    """Main MusicXML document builder."""

    def __init__(self, title: str = "Unknown"): ...
    def build(self) -> None:
        """Build the MusicXML structure from extracted elements."""
    def to_musicxml(self) -> bytes:
        """Export to MusicXML format."""

# Key signature enumeration
class Key(enum.Enum):
    C_MAJOR = 0        # Same as A-minor
    G_MAJOR = 1        # Same as E-minor
    D_MAJOR = 2        # Same as B-minor
    A_MAJOR = 3        # Same as F#-minor
    E_MAJOR = 4        # Same as C#-minor
    B_MAJOR = 5        # Same as G#-minor
    F_SHARP_MAJOR = 6  # Same as D#-minor
    F_MAJOR = -1       # Same as D-minor
    B_FLAT_MAJOR = -2  # Same as G-minor
    E_FLAT_MAJOR = -3  # Same as C-minor
    A_FLAT_MAJOR = -4  # Same as F-minor
    D_FLAT_MAJOR = -5  # Same as Bb-minor
    G_FLAT_MAJOR = -6  # Same as Eb-minor

class Voice:
    """Voice representation for MusicXML generation."""
    id: Optional[int]
    note_ids: List[int]
    stem_up: Optional[bool]
    group_id: Optional[int]
    x_center: Optional[float]
    label: NoteType
    has_dot: Optional[bool]
    group: Optional[int]
    track: Optional[int]
    duration: int
    rhythm_name: Optional[str]

    def init(self) -> None: ...
```

*MusicXML generation is handled by the `MusicXMLBuilder` class; see the main processing pipeline documentation above for complete details.*
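The `Key` values follow the circle of fifths: positive numbers count sharps in the signature, negative numbers count flats. A reduced, self-contained copy of the enum (the helper function is illustrative, not part of oemer) demonstrates the convention:

```python
import enum

class Key(enum.Enum):  # reduced copy of the Key enumeration above
    C_MAJOR = 0
    G_MAJOR = 1
    F_MAJOR = -1

def accidentals_in_signature(key: Key) -> str:
    """Describe the key signature implied by the enum value."""
    n = key.value
    if n == 0:
        return "no accidentals"
    kind = "sharp" if n > 0 else "flat"
    return f"{abs(n)} {kind}(s)"

print(accidentals_in_signature(Key.G_MAJOR))  # → 1 sharp(s)
print(accidentals_in_signature(Key.F_MAJOR))  # → 1 flat(s)
```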
### Layer Management System

Global state management system for intermediate processing results, enabling modular pipeline architecture.

```python { .api }
def register_layer(name: str, layer: ndarray) -> None:
    """Register a processing layer for global access."""

def get_layer(name: str) -> ndarray:
    """Retrieve a registered processing layer."""

def delete_layer(name: str) -> None:
    """Delete a registered layer."""

def list_layers() -> List[str]:
    """List all registered layer names."""
```

[Layer Management](./layer-management.md)
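Conceptually the registry behaves like a module-level dictionary. A minimal self-contained sketch with the same four operations (oemer's real implementation lives in `oemer.layers` and stores `ndarray` layers):

```python
_layers: dict = {}

def register_layer(name, layer):
    """Store an intermediate result under a global name."""
    _layers[name] = layer

def get_layer(name):
    """Fetch a previously registered result."""
    return _layers[name]

def delete_layer(name):
    """Drop a layer that is no longer needed."""
    _layers.pop(name, None)

def list_layers():
    """Names of all currently registered layers."""
    return list(_layers)

register_layer("staff_pred", [[0, 1], [1, 0]])
register_layer("notehead_pred", [[1, 0], [0, 1]])
print(list_layers())        # → ['staff_pred', 'notehead_pred']
delete_layer("staff_pred")
print(list_layers())        # → ['notehead_pred']
```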
### Image Processing Utilities

Comprehensive image processing utilities including dewarping, morphological operations, and bounding box management.

```python { .api }
def estimate_coords(staff_pred: ndarray) -> Tuple[ndarray, ndarray]:
    """Estimate dewarping coordinates from staff predictions."""

def dewarp(img: ndarray, coords_x: ndarray, coords_y: ndarray) -> ndarray:
    """Apply dewarping transformation to correct image skew."""

def get_bbox(data: ndarray) -> List[BBox]:
    """Extract bounding boxes from binary image data using OpenCV contours."""

def get_center(bbox: Union[BBox, ndarray]) -> Tuple[int, int]:
    """Get center coordinates of a bounding box."""

def merge_nearby_bbox(bboxes: List[BBox], distance: float, x_factor: int = 1, y_factor: int = 1) -> List[BBox]:
    """Merge nearby bounding boxes using agglomerative clustering."""

def rm_merge_overlap_bbox(bboxes: List[BBox], overlap_ratio: float = 0.8) -> List[BBox]:
    """Remove and merge overlapping bounding boxes."""

class Grid:
    """Grid structure for dewarping coordinate estimation."""
    id: Optional[int]
    bbox: BBox
    y_shift: int

    @property
    def y_center(self) -> float: ...
    @property
    def height(self) -> int: ...

class GridGroup:
    """Group of grids for dewarping processing."""
    id: Optional[int]
    reg_id: Optional[int]
    bbox: BBox
    gids: List[int]
    split_unit: int

    @property
    def y_center(self) -> int: ...

def build_grid(st_pred: ndarray, split_unit: int = 11) -> Tuple[ndarray, List[Grid]]:
    """Build grid structure from staff predictions for dewarping."""

def build_grid_group(grid_map: ndarray, grids: List[Grid]) -> Tuple[ndarray, List[GridGroup]]:
    """Group grids into connected components for dewarping."""
```

*Image processing utilities are used throughout the pipeline; key dewarping and bounding box functions are documented in the main pipeline and neural network inference sections above.*
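Two of the helpers above can be illustrated with plain tuples, using the `BBox = (x1, y1, x2, y2)` convention. The greedy merge below is a deliberate simplification of the agglomerative clustering the real `merge_nearby_bbox` performs:

```python
def get_center(bbox):
    """Integer center of a (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) // 2, (y1 + y2) // 2)

def merge_nearby_bbox(bboxes, distance):
    """Greedy single-pass merge: union boxes whose centers are close
    (Manhattan distance), a simplification of the clustering approach."""
    merged = []
    for box in sorted(bboxes):
        cx, cy = get_center(box)
        for i, m in enumerate(merged):
            mx, my = get_center(m)
            if abs(cx - mx) + abs(cy - my) <= distance:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged

print(get_center((10, 20, 30, 40)))  # → (20, 30)
print(merge_nearby_bbox([(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10)], 20))
# → [(0, 0, 22, 10), (100, 0, 110, 10)]
```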
## Types

```python { .api }
import enum
from typing import Tuple, List, Optional, Union
from numpy import ndarray
from argparse import Namespace

# Core type aliases
BBox = Tuple[int, int, int, int]  # Bounding box (x1, y1, x2, y2)

# Enumerations
class NoteType(enum.Enum):
    WHOLE = 0
    HALF = 1
    QUARTER = 2
    EIGHTH = 3
    SIXTEENTH = 4
    THIRTY_SECOND = 5
    SIXTY_FOURTH = 6
    TRIPLET = 7
    OTHERS = 8
    HALF_OR_WHOLE = 9  # Intermediate parsing state

class ClefType(enum.Enum):
    G_CLEF = 1
    F_CLEF = 2

class SfnType(enum.Enum):
    FLAT = 1
    SHARP = 2
    NATURAL = 3

class RestType(enum.Enum):
    WHOLE_HALF = 1
    QUARTER = 2
    EIGHTH = 3
    SIXTEENTH = 4
    THIRTY_SECOND = 5
    SIXTY_FOURTH = 6
    WHOLE = 7
    HALF = 8
```
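`NoteType` labels map naturally onto relative durations. The mapping below (in quarter-note units, with a dot extending a value by half) is an illustration of how such labels are typically interpreted, not part of oemer's API:

```python
import enum

class NoteType(enum.Enum):  # reduced copy of the enumeration above
    WHOLE = 0
    HALF = 1
    QUARTER = 2
    EIGHTH = 3

# Relative duration in quarter-note units (illustrative mapping)
DURATION = {NoteType.WHOLE: 4.0, NoteType.HALF: 2.0,
            NoteType.QUARTER: 1.0, NoteType.EIGHTH: 0.5}

def duration(label: NoteType, has_dot: bool = False) -> float:
    """Duration of a note; a dot multiplies the base value by 1.5."""
    d = DURATION[label]
    return d * 1.5 if has_dot else d

print(duration(NoteType.QUARTER, has_dot=True))  # → 1.5
```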
## Error Handling

Oemer defines custom exceptions for specific processing errors:

```python { .api }
class SfnException(Exception):
    """Base exception for Sharp/Flat/Natural processing errors."""

class SfnNoteTrackMismatch(SfnException):
    """Track mismatch error in accidental processing."""

class SfnNoteGroupMismatch(SfnException):
    """Group mismatch error in accidental processing."""

class StafflineException(Exception):
    """Base exception for staffline processing errors."""

class StafflineCountInconsistent(StafflineException):
    """Inconsistent staffline count detected."""

class StafflineNotAligned(StafflineException):
    """Stafflines are not properly aligned."""

class StafflineUnitSizeInconsistent(StafflineException):
    """Inconsistent unit sizes across stafflines."""
```

Common error handling pattern:

```python
try:
    musicxml_path = extract(args)
except FileNotFoundError:
    print("Input image file not found")
except StafflineException as e:
    print(f"Staffline processing error: {e}")
except Exception as e:
    print(f"Processing failed: {e}")
```