# Oemer

End-to-end Optical Music Recognition (OMR) system that transcribes musical notation from images into structured MusicXML format. Built on deep learning models and computer vision techniques, oemer can process skewed and phone-taken photos of Western music notation sheets, providing a complete pipeline from image preprocessing to digital score generation.

## Package Information

- **Package Name**: oemer
- **Language**: Python
- **Installation**: `pip install oemer`
- **Optional**: `pip install oemer[tf]` for TensorFlow support
- **CLI Usage**: `oemer <path_to_image>`

## Core Imports

```python
import oemer
from oemer.ete import extract, main
```

For programmatic usage:

```python
from oemer.ete import extract
from argparse import Namespace
```

## Basic Usage

### Command Line Interface

```bash
# Basic usage - outputs MusicXML file and analysis image
oemer image.jpg

# Specify output directory
oemer image.jpg -o ./output/

# Use TensorFlow instead of ONNX runtime
oemer image.jpg --use-tf

# Save model predictions for reuse
oemer image.jpg --save-cache

# Disable image deskewing for aligned images
oemer image.jpg --without-deskew
```

### Programmatic Usage

```python
from oemer.ete import extract
from argparse import Namespace

# Configure extraction parameters
args = Namespace(
    img_path='path/to/music_sheet.jpg',
    output_path='./',
    use_tf=False,
    save_cache=False,
    without_deskew=False,
)

# Extract musical notation and generate MusicXML
musicxml_path = extract(args)
print(f"Generated MusicXML: {musicxml_path}")
```

## Architecture

Oemer follows a layered pipeline architecture using a global state management system:

1. **Image Preprocessing**: Dewarping and size normalization for skewed photos
2. **Neural Network Inference**: Two U-Net models for semantic segmentation
   - Model 1: Stafflines vs. all other symbols
   - Model 2: Detailed symbol classification (noteheads, clefs, stems, rests, accidentals)
3. **Feature Extraction**: Specialized extractors for musical elements
4. **Grouping and Rhythm Analysis**: Note grouping by stems/beams and rhythm detection
5. **MusicXML Generation**: Structured output with proper musical semantics

The **Layer Management System** (`oemer.layers`) provides global state management, allowing each processing stage to register intermediate results for use by subsequent stages.
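
The staged flow can be sketched as a chain of functions passing a shared state from stage to stage. Everything below is an illustrative stand-in for oemer's real stages, not its actual call graph; the stage names, state keys, and payloads are hypothetical:

```python
# Hypothetical stage stubs sketching the five-stage data flow; the real
# stages are far richer and exchange results through oemer's layer registry.
def preprocess(image):
    return {"image": image, "deskewed": True}        # 1. dewarp / normalize

def run_models(state):
    state["staff_pred"] = "stafflines"               # 2a. model 1 output
    state["symbol_pred"] = "noteheads/clefs/stems"   # 2b. model 2 output
    return state

def extract_features(state):
    state["staffs"] = ["staff-0", "staff-1"]         # 3. element extractors
    return state

def group_notes(state):
    state["groups"] = ["chord-0"]                    # 4. stems/beams to chords
    return state

def build_musicxml(state):
    return "<score-partwise/>"                       # 5. placeholder output

def run_pipeline(image):
    state = preprocess(image)
    state = run_models(state)
    state = extract_features(state)
    state = group_notes(state)
    return build_musicxml(state)
```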

## Capabilities

### Main Processing Pipeline

Complete end-to-end optical music recognition pipeline that handles the full workflow from image input to MusicXML output.

```python { .api }
def extract(args: Namespace) -> str:
    """
    Main extraction pipeline function.

    Parameters:
    - args.img_path (str): Path to input image
    - args.output_path (str): Output directory path
    - args.use_tf (bool): Use TensorFlow instead of ONNX
    - args.save_cache (bool): Save predictions for reuse
    - args.without_deskew (bool): Skip deskewing step

    Returns:
        str: Path to generated MusicXML file
    """

def main() -> None:
    """CLI entry point for the oemer command."""

def generate_pred(img_path: str, use_tf: bool = False) -> Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]:
    """
    Generate neural network predictions.

    Returns:
        Tuple containing staff, symbols, stems_rests, notehead, and clefs_keys predictions
    """
```

[Main Processing Pipeline](./main-pipeline.md)

### Neural Network Inference

Model inference capabilities using U-Net architectures for semantic segmentation of musical elements.

```python { .api }
def inference(model_path: str, img_path: str, step_size: int = 128, batch_size: int = 16, manual_th: Optional[Any] = None, use_tf: bool = False) -> Tuple[ndarray, ndarray]:
    """
    Run neural network inference on image patches.

    Parameters:
    - model_path (str): Path to model checkpoint directory
    - img_path (str): Path to input image
    - step_size (int): Sliding window step size
    - batch_size (int): Inference batch size
    - manual_th: Manual threshold for predictions
    - use_tf (bool): Use TensorFlow instead of ONNX

    Returns:
        Tuple of prediction arrays and metadata
    """
```

[Neural Network Inference](./inference.md)
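
The `step_size` parameter points at a sliding-window scheme: the page is cut into fixed-size patches at `step_size` intervals, each patch is predicted separately, and overlapping predictions are combined back into a page-sized map. A minimal NumPy sketch of that windowing; the window size, identity-model usage, and overlap averaging shown here are assumptions for illustration, not oemer's exact implementation:

```python
import numpy as np

def sliding_window_predict(img, model, win=256, step=128):
    """Run `model` on overlapping win x win patches; average overlapping outputs."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)     # accumulated predictions
    counts = np.zeros((h, w), dtype=float)  # how many windows covered each pixel
    for y in range(0, max(h - win, 0) + 1, step):
        for x in range(0, max(w - win, 0) + 1, step):
            patch = img[y:y + win, x:x + win]
            out[y:y + win, x:x + win] += model(patch)
            counts[y:y + win, x:x + win] += 1
    return out / np.maximum(counts, 1)      # average; avoid divide-by-zero
```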

### Staffline Detection and Analysis

Detection and analysis of musical staff lines, which form the foundation for all subsequent processing steps.

```python { .api }
def extract(splits: int = 8, line_threshold: float = 0.8, horizontal_diff_th: float = 0.1, unit_size_diff_th: float = 0.1, barline_min_degree: int = 75) -> Tuple[ndarray, ndarray]:
    """Extract staff lines and group information."""

class Staff:
    """Complete staff (5 lines) representation."""
    lines: List[Line]
    track: int
    group: int
    is_interp: bool

    def add_line(self, line: Line) -> None: ...
    def duplicate(self, x_offset=0, y_offset=0): ...

    @property
    def unit_size(self) -> float: ...
    @property
    def y_center(self) -> float: ...
    @property
    def slope(self) -> float: ...
```

[Staffline Detection](./staffline-detection.md)
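
`unit_size`, the vertical spacing between adjacent staff lines, is the basic length unit that later stages rely on (for example, when relating symbol sizes and positions to the staff). A sketch of how such a value can be derived from five detected line centers; the mean-gap formula is an assumption for illustration, not oemer's exact computation:

```python
def mean_unit_size(line_y_centers):
    """Average gap between adjacent staff lines, given y-centers of the 5 lines."""
    ys = sorted(line_y_centers)                     # top-to-bottom order
    gaps = [b - a for a, b in zip(ys, ys[1:])]      # 4 adjacent-line gaps
    return sum(gaps) / len(gaps)
```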

### Musical Symbol Recognition

Recognition and classification of musical symbols including noteheads, clefs, accidentals, rests, and barlines.

```python { .api }
def extract() -> List[NoteHead]:
    """Extract noteheads from neural network predictions."""

def extract(min_barline_h_unit_ratio: float = 3.75) -> Tuple[List[Barline], List[Clef], List[Sfn], List[Rest]]:
    """Extract musical symbols (barlines, clefs, accidentals, rests)."""

class NoteHead:
    """Note head representation with rhythm and pitch information."""
    points: List[Tuple[int, int]]
    pitch: Optional[int]
    has_dot: bool
    bbox: BBox
    stem_up: Optional[bool]
    stem_right: Optional[bool]
    track: Optional[int]
    group: Optional[int]
    staff_line_pos: int
    invalid: bool
    id: Optional[int]
    note_group_id: Optional[int]
    sfn: Optional[Any]  # Sharp/flat/natural association
    label: NoteType

    def add_point(self, x: int, y: int) -> None: ...
    def force_set_label(self, label: NoteType) -> None: ...

class Clef:
    """Musical clef representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]
    label: ClefType

    @property
    def x_center(self) -> float: ...

class Sfn:
    """Sharp/Flat/Natural (accidental) representation."""
    bbox: BBox
    note_id: Optional[int]
    is_key: Optional[bool]  # Key signature if True, accidental otherwise
    track: Optional[int]
    group: Optional[int]
    label: SfnType

    @property
    def x_center(self) -> float: ...

class Rest:
    """Musical rest representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]
    label: RestType

    @property
    def x_center(self) -> float: ...

class Barline:
    """Musical barline representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]

    @property
    def x_center(self) -> float: ...
```

*For complete notehead extraction details, see:*

[Notehead Extraction](./notehead-extraction.md)
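
Most symbol classes above share a bbox-derived `x_center`, since horizontal position is what ties clefs, accidentals, and barlines to nearby notes and measures. A minimal sketch of that shared pattern, assuming the `(x1, y1, x2, y2)` bbox layout from the Types section; the class below is illustrative, not an oemer class:

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

class SymbolBase:
    """Illustrative stand-in for the bbox/x_center shape the API classes share."""

    def __init__(self, bbox: BBox):
        self.bbox = bbox

    @property
    def x_center(self) -> float:
        x1, _, x2, _ = self.bbox
        return (x1 + x2) / 2  # midpoint of the horizontal extent
```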

### Note Grouping and Rhythm Analysis

Advanced grouping of individual notes into chords and rhythm pattern recognition through beam and flag analysis.

```python { .api }
def extract() -> Tuple[List[NoteGroup], ndarray]:
    """Group notes by stems and beams into chord groups."""

def extract(min_area_ratio: float = 0.08, max_area_ratio: float = 0.2, beam_th: float = 0.5) -> None:
    """Extract rhythm information from beams, flags, and dots."""

class NoteGroup:
    """Group of notes connected by stems/beams."""
    id: Optional[int]
    bbox: BBox
    note_ids: List[int]
    top_note_ids: List[int]     # For multi-melody cases
    bottom_note_ids: List[int]  # For multi-melody cases
    stem_up: Optional[bool]
    has_stem: Optional[bool]
    all_same_type: Optional[bool]  # All notes are solid or hollow
    group: Optional[int]
    track: Optional[int]

    @property
    def x_center(self) -> float: ...
```

*For complete note grouping and rhythm analysis details, see:*

[Note Grouping and Rhythm Analysis](./note-grouping.md)
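
Grouping by stems means noteheads whose attachment points fall on (nearly) the same vertical line belong to one chord. A toy sketch of that idea, clustering sorted notehead x-centers within a tolerance; the tolerance and the greedy clustering rule are assumptions for illustration, not oemer's algorithm:

```python
def group_by_stem_x(note_xs, tol=5):
    """Cluster sorted notehead x-centers; gaps <= tol share one stem/chord."""
    groups = []
    for x in sorted(note_xs):
        if groups and x - groups[-1][-1] <= tol:
            groups[-1].append(x)   # close enough: same stem
        else:
            groups.append([x])     # gap too large: start a new chord
    return groups
```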

### MusicXML Generation

Generation of structured MusicXML documents from extracted musical elements with proper musical semantics and formatting.

```python { .api }
class MusicXMLBuilder:
    """Main MusicXML document builder."""

    def __init__(self, title: str = "Unknown"): ...

    def build(self) -> None:
        """Build the MusicXML structure from extracted elements."""

    def to_musicxml(self) -> bytes:
        """Export to MusicXML format."""

# Key signature enumeration
class Key(enum.Enum):
    C_MAJOR = 0        # Same as A-minor
    G_MAJOR = 1        # Same as E-minor
    D_MAJOR = 2        # Same as B-minor
    A_MAJOR = 3        # Same as F#-minor
    E_MAJOR = 4        # Same as C#-minor
    B_MAJOR = 5        # Same as G#-minor
    F_SHARP_MAJOR = 6  # Same as D#-minor
    F_MAJOR = -1       # Same as D-minor
    B_FLAT_MAJOR = -2  # Same as G-minor
    E_FLAT_MAJOR = -3  # Same as C-minor
    A_FLAT_MAJOR = -4  # Same as F-minor
    D_FLAT_MAJOR = -5  # Same as Bb-minor
    G_FLAT_MAJOR = -6  # Same as Eb-minor

class Voice:
    """Voice representation for MusicXML generation."""
    id: Optional[int]
    note_ids: List[int]
    stem_up: Optional[bool]
    group_id: Optional[int]
    x_center: Optional[float]
    label: NoteType
    has_dot: Optional[bool]
    group: Optional[int]
    track: Optional[int]
    duration: int
    rhythm_name: Optional[str]

    def init(self) -> None: ...
```

*MusicXML generation is handled by the MusicXMLBuilder class - see the main processing pipeline documentation above for complete details.*
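
The integer values of `Key` follow the circle of fifths (positive counts sharps, negative counts flats), which matches how MusicXML's `<fifths>` element encodes key signatures. A hedged sketch of emitting that element from a key value; the helper below is illustrative and not part of `MusicXMLBuilder`:

```python
import enum
import xml.etree.ElementTree as ET

class Key(enum.Enum):
    # Subset of the enumeration above; values are circle-of-fifths positions.
    C_MAJOR = 0
    G_MAJOR = 1
    F_MAJOR = -1

def key_element(key: Key) -> bytes:
    """Render a MusicXML <key> element whose <fifths> is the enum value."""
    el = ET.Element("key")
    ET.SubElement(el, "fifths").text = str(key.value)
    return ET.tostring(el)
```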

### Layer Management System

Global state management system for intermediate processing results, enabling a modular pipeline architecture.

```python { .api }
def register_layer(name: str, layer: ndarray) -> None:
    """Register a processing layer for global access."""

def get_layer(name: str) -> ndarray:
    """Retrieve a registered processing layer."""

def delete_layer(name: str) -> None:
    """Delete a registered layer."""

def list_layers() -> List[str]:
    """List all registered layer names."""
```

[Layer Management](./layer-management.md)
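
A dictionary-backed sketch of this registry pattern, matching the signatures above (with `ndarray` relaxed to `object` to keep the sketch dependency-free); this is an illustrative re-implementation of the pattern, not `oemer.layers` itself:

```python
from typing import Dict, List

_layers: Dict[str, object] = {}  # module-level store acting as a global blackboard

def register_layer(name: str, layer: object) -> None:
    """Register a processing layer for global access."""
    _layers[name] = layer

def get_layer(name: str) -> object:
    """Retrieve a registered processing layer."""
    return _layers[name]

def delete_layer(name: str) -> None:
    """Delete a registered layer."""
    del _layers[name]

def list_layers() -> List[str]:
    """List all registered layer names."""
    return list(_layers.keys())
```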

### Image Processing Utilities

Comprehensive image processing utilities including dewarping, morphological operations, and bounding box management.

```python { .api }
def estimate_coords(staff_pred: ndarray) -> Tuple[ndarray, ndarray]:
    """Estimate dewarping coordinates from staff predictions."""

def dewarp(img: ndarray, coords_x: ndarray, coords_y: ndarray) -> ndarray:
    """Apply dewarping transformation to correct image skew."""

def get_bbox(data: ndarray) -> List[BBox]:
    """Extract bounding boxes from binary image data using OpenCV contours."""

def get_center(bbox: Union[BBox, ndarray]) -> Tuple[int, int]:
    """Get center coordinates of a bounding box."""

def merge_nearby_bbox(bboxes: List[BBox], distance: float, x_factor: int = 1, y_factor: int = 1) -> List[BBox]:
    """Merge nearby bounding boxes using agglomerative clustering."""

def rm_merge_overlap_bbox(bboxes: List[BBox], overlap_ratio: float = 0.8) -> List[BBox]:
    """Remove and merge overlapping bounding boxes."""

class Grid:
    """Grid structure for dewarping coordinate estimation."""
    id: Optional[int]
    bbox: BBox
    y_shift: int

    @property
    def y_center(self) -> float: ...
    @property
    def height(self) -> int: ...

class GridGroup:
    """Group of grids for dewarping processing."""
    id: Optional[int]
    reg_id: Optional[int]
    bbox: BBox
    gids: List[int]
    split_unit: int

    @property
    def y_center(self) -> int: ...

def build_grid(st_pred: ndarray, split_unit: int = 11) -> Tuple[ndarray, List[Grid]]:
    """Build grid structure from staff predictions for dewarping."""

def build_grid_group(grid_map: ndarray, grids: List[Grid]) -> Tuple[ndarray, List[GridGroup]]:
    """Group grids into connected components for dewarping."""
```

*Image processing utilities are used throughout the pipeline - key dewarping and bounding box functions are documented in the main pipeline and neural network inference sections above.*
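
With bboxes stored as `(x1, y1, x2, y2)` tuples (the `BBox` alias in the Types section), the geometric helpers reduce to small arithmetic. A sketch of the center computation and of the box union that merging ultimately performs; the real `merge_nearby_bbox` drives this with agglomerative clustering, which is omitted here, and these helper names are illustrative:

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def bbox_center(bbox: BBox) -> Tuple[int, int]:
    """Integer center of a bounding box."""
    x1, y1, x2, y2 = bbox
    return (x1 + x2) // 2, (y1 + y2) // 2

def bbox_union(a: BBox, b: BBox) -> BBox:
    """Smallest box containing both inputs, as produced when merging boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```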

## Types

```python { .api }
import enum
from typing import Any, List, Optional, Tuple, Union

from argparse import Namespace
from numpy import ndarray

# Core type aliases
BBox = Tuple[int, int, int, int]  # Bounding box (x1, y1, x2, y2)

# Enumerations
class NoteType(enum.Enum):
    WHOLE = 0
    HALF = 1
    QUARTER = 2
    EIGHTH = 3
    SIXTEENTH = 4
    THIRTY_SECOND = 5
    SIXTY_FOURTH = 6
    TRIPLET = 7
    OTHERS = 8
    HALF_OR_WHOLE = 9  # Intermediate parsing state

class ClefType(enum.Enum):
    G_CLEF = 1
    F_CLEF = 2

class SfnType(enum.Enum):
    FLAT = 1
    SHARP = 2
    NATURAL = 3

class RestType(enum.Enum):
    WHOLE_HALF = 1
    QUARTER = 2
    EIGHTH = 3
    SIXTEENTH = 4
    THIRTY_SECOND = 5
    SIXTY_FOURTH = 6
    WHOLE = 7
    HALF = 8
```

## Error Handling

Oemer defines custom exceptions for specific processing errors:

```python { .api }
class SfnException(Exception):
    """Base exception for Sharp/Flat/Natural processing errors."""

class SfnNoteTrackMismatch(SfnException):
    """Track mismatch error in accidental processing."""

class SfnNoteGroupMismatch(SfnException):
    """Group mismatch error in accidental processing."""

class StafflineException(Exception):
    """Base exception for staffline processing errors."""

class StafflineCountInconsistent(StafflineException):
    """Inconsistent staffline count detected."""

class StafflineNotAligned(StafflineException):
    """Stafflines are not properly aligned."""

class StafflineUnitSizeInconsistent(StafflineException):
    """Inconsistent unit sizes across stafflines."""
```

Common error handling pattern:

```python
try:
    musicxml_path = extract(args)
except FileNotFoundError:
    print("Input image file not found")
except StafflineException as e:
    print(f"Staffline processing error: {e}")
except Exception as e:
    print(f"Processing failed: {e}")
```