# Oemer

End-to-end Optical Music Recognition (OMR) system that transcribes musical notation from images into structured MusicXML format. Built on deep learning models and computer vision techniques, oemer can process skewed and phone-taken photos of Western music notation sheets, providing a complete pipeline from image preprocessing to digital score generation.

## Package Information

- **Package Name**: oemer
- **Language**: Python
- **Installation**: `pip install oemer`
- **Optional**: `pip install oemer[tf]` for TensorFlow support
- **CLI Usage**: `oemer <path_to_image>`

## Core Imports

```python
import oemer
from oemer.ete import extract, main
```

For programmatic usage:

```python
from oemer.ete import extract
from argparse import Namespace
```

## Basic Usage

### Command Line Interface

```bash
# Basic usage - outputs MusicXML file and analysis image
oemer image.jpg

# Specify output directory
oemer image.jpg -o ./output/

# Use TensorFlow instead of ONNX runtime
oemer image.jpg --use-tf

# Save model predictions for reuse
oemer image.jpg --save-cache

# Disable image deskewing for aligned images
oemer image.jpg --without-deskew
```

### Programmatic Usage

```python
from oemer.ete import extract
from argparse import Namespace

# Configure extraction parameters
args = Namespace(
    img_path='path/to/music_sheet.jpg',
    output_path='./',
    use_tf=False,
    save_cache=False,
    without_deskew=False,
)

# Extract musical notation and generate MusicXML
musicxml_path = extract(args)
print(f"Generated MusicXML: {musicxml_path}")
```

## Architecture

Oemer follows a layered pipeline architecture using a global state management system:

1. **Image Preprocessing**: Dewarping and size normalization for skewed photos
2. **Neural Network Inference**: Two U-Net models for semantic segmentation
   - Model 1: Stafflines vs. all other symbols
   - Model 2: Detailed symbol classification (noteheads, clefs, stems, rests, accidentals)
3. **Feature Extraction**: Specialized extractors for musical elements
4. **Grouping and Rhythm Analysis**: Note grouping by stems/beams and rhythm detection
5. **MusicXML Generation**: Structured output with proper musical semantics

The **Layer Management System** (`oemer.layers`) provides global state management, allowing each processing stage to register intermediate results for use by subsequent stages.
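
The staged flow can be sketched as a chain of functions passing a shared state from stage to stage. Everything below is an illustrative stand-in for oemer's real stages, not its actual call graph; the stage names, state keys, and payloads are hypothetical:

```python
# Hypothetical stage stubs sketching the five-stage data flow; the real
# stages are far richer and exchange results through oemer's layer registry.
def preprocess(image):
    return {"image": image, "deskewed": True}        # 1. dewarp / normalize

def run_models(state):
    state["staff_pred"] = "stafflines"               # 2a. model 1 output
    state["symbol_pred"] = "noteheads/clefs/stems"   # 2b. model 2 output
    return state

def extract_features(state):
    state["staffs"] = ["staff-0", "staff-1"]         # 3. element extractors
    return state

def group_notes(state):
    state["groups"] = ["chord-0"]                    # 4. stems/beams to chords
    return state

def build_musicxml(state):
    return "<score-partwise/>"                       # 5. placeholder output

def run_pipeline(image):
    state = preprocess(image)
    state = run_models(state)
    state = extract_features(state)
    state = group_notes(state)
    return build_musicxml(state)
```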

## Capabilities

### Main Processing Pipeline

Complete end-to-end optical music recognition pipeline that handles the full workflow from image input to MusicXML output.

```python { .api }
def extract(args: Namespace) -> str:
    """
    Main extraction pipeline function.

    Parameters:
    - args.img_path (str): Path to input image
    - args.output_path (str): Output directory path
    - args.use_tf (bool): Use TensorFlow instead of ONNX
    - args.save_cache (bool): Save predictions for reuse
    - args.without_deskew (bool): Skip deskewing step

    Returns:
        str: Path to generated MusicXML file
    """

def main() -> None:
    """CLI entry point for the oemer command."""

def generate_pred(img_path: str, use_tf: bool = False) -> Tuple[ndarray, ndarray, ndarray, ndarray, ndarray]:
    """
    Generate neural network predictions.

    Returns:
        Tuple containing staff, symbols, stems_rests, notehead, and clefs_keys predictions
    """
```

[Main Processing Pipeline](./main-pipeline.md)

### Neural Network Inference

Model inference capabilities using U-Net architectures for semantic segmentation of musical elements.

```python { .api }
def inference(model_path: str, img_path: str, step_size: int = 128, batch_size: int = 16, manual_th: Optional[Any] = None, use_tf: bool = False) -> Tuple[ndarray, ndarray]:
    """
    Run neural network inference on image patches.

    Parameters:
    - model_path (str): Path to model checkpoint directory
    - img_path (str): Path to input image
    - step_size (int): Sliding window step size
    - batch_size (int): Inference batch size
    - manual_th: Manual threshold for predictions
    - use_tf (bool): Use TensorFlow instead of ONNX

    Returns:
        Tuple of prediction arrays and metadata
    """
```

[Neural Network Inference](./inference.md)
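
The `step_size` parameter points at a sliding-window scheme: the page is cut into fixed-size patches at `step_size` intervals, each patch is predicted separately, and overlapping predictions are combined back into a page-sized map. A minimal NumPy sketch of that windowing; the window size, identity-model usage, and overlap averaging shown here are assumptions for illustration, not oemer's exact implementation:

```python
import numpy as np

def sliding_window_predict(img, model, win=256, step=128):
    """Run `model` on overlapping win x win patches; average overlapping outputs."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)     # accumulated predictions
    counts = np.zeros((h, w), dtype=float)  # how many windows covered each pixel
    for y in range(0, max(h - win, 0) + 1, step):
        for x in range(0, max(w - win, 0) + 1, step):
            patch = img[y:y + win, x:x + win]
            out[y:y + win, x:x + win] += model(patch)
            counts[y:y + win, x:x + win] += 1
    return out / np.maximum(counts, 1)      # average; avoid divide-by-zero
```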

### Staffline Detection and Analysis

Detection and analysis of musical staff lines, which form the foundation for all subsequent processing steps.

```python { .api }
def extract(splits: int = 8, line_threshold: float = 0.8, horizontal_diff_th: float = 0.1, unit_size_diff_th: float = 0.1, barline_min_degree: int = 75) -> Tuple[ndarray, ndarray]:
    """Extract staff lines and group information."""

class Staff:
    """Complete staff (5 lines) representation."""
    lines: List[Line]
    track: int
    group: int
    is_interp: bool

    def add_line(self, line: Line) -> None: ...
    def duplicate(self, x_offset=0, y_offset=0): ...

    @property
    def unit_size(self) -> float: ...
    @property
    def y_center(self) -> float: ...
    @property
    def slope(self) -> float: ...
```

[Staffline Detection](./staffline-detection.md)
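
`unit_size`, the vertical spacing between adjacent staff lines, is the basic length unit that later stages rely on (for example, when relating symbol sizes and positions to the staff). A sketch of how such a value can be derived from five detected line centers; the mean-gap formula is an assumption for illustration, not oemer's exact computation:

```python
def mean_unit_size(line_y_centers):
    """Average gap between adjacent staff lines, given y-centers of the 5 lines."""
    ys = sorted(line_y_centers)                     # top-to-bottom order
    gaps = [b - a for a, b in zip(ys, ys[1:])]      # 4 adjacent-line gaps
    return sum(gaps) / len(gaps)
```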

### Musical Symbol Recognition

Recognition and classification of musical symbols including noteheads, clefs, accidentals, rests, and barlines.

```python { .api }
def extract() -> List[NoteHead]:
    """Extract noteheads from neural network predictions."""

def extract(min_barline_h_unit_ratio: float = 3.75) -> Tuple[List[Barline], List[Clef], List[Sfn], List[Rest]]:
    """Extract musical symbols (barlines, clefs, accidentals, rests)."""

class NoteHead:
    """Note head representation with rhythm and pitch information."""
    points: List[Tuple[int, int]]
    pitch: Optional[int]
    has_dot: bool
    bbox: BBox
    stem_up: Optional[bool]
    stem_right: Optional[bool]
    track: Optional[int]
    group: Optional[int]
    staff_line_pos: int
    invalid: bool
    id: Optional[int]
    note_group_id: Optional[int]
    sfn: Optional[Any]  # Sharp/flat/natural association
    label: NoteType

    def add_point(self, x: int, y: int) -> None: ...
    def force_set_label(self, label: NoteType) -> None: ...

class Clef:
    """Musical clef representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]
    label: ClefType

    @property
    def x_center(self) -> float: ...

class Sfn:
    """Sharp/Flat/Natural (accidental) representation."""
    bbox: BBox
    note_id: Optional[int]
    is_key: Optional[bool]  # Key signature if True, accidental otherwise
    track: Optional[int]
    group: Optional[int]
    label: SfnType

    @property
    def x_center(self) -> float: ...

class Rest:
    """Musical rest representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]
    label: RestType

    @property
    def x_center(self) -> float: ...

class Barline:
    """Musical barline representation."""
    bbox: BBox
    track: Optional[int]
    group: Optional[int]

    @property
    def x_center(self) -> float: ...
```

*For complete notehead extraction details, see:*

[Notehead Extraction](./notehead-extraction.md)
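
Most symbol classes above share a bbox-derived `x_center`, since horizontal position is what ties clefs, accidentals, and barlines to nearby notes and measures. A minimal sketch of that shared pattern, assuming the `(x1, y1, x2, y2)` bbox layout from the Types section; the class below is illustrative, not an oemer class:

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

class SymbolBase:
    """Illustrative stand-in for the bbox/x_center shape the API classes share."""

    def __init__(self, bbox: BBox):
        self.bbox = bbox

    @property
    def x_center(self) -> float:
        x1, _, x2, _ = self.bbox
        return (x1 + x2) / 2  # midpoint of the horizontal extent
```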

### Note Grouping and Rhythm Analysis

Advanced grouping of individual notes into chords and rhythm pattern recognition through beam and flag analysis.

```python { .api }
def extract() -> Tuple[List[NoteGroup], ndarray]:
    """Group notes by stems and beams into chord groups."""

def extract(min_area_ratio: float = 0.08, max_area_ratio: float = 0.2, beam_th: float = 0.5) -> None:
    """Extract rhythm information from beams, flags, and dots."""

class NoteGroup:
    """Group of notes connected by stems/beams."""
    id: Optional[int]
    bbox: BBox
    note_ids: List[int]
    top_note_ids: List[int]     # For multi-melody cases
    bottom_note_ids: List[int]  # For multi-melody cases
    stem_up: Optional[bool]
    has_stem: Optional[bool]
    all_same_type: Optional[bool]  # All notes are solid or hollow
    group: Optional[int]
    track: Optional[int]

    @property
    def x_center(self) -> float: ...
```

*For complete note grouping and rhythm analysis details, see:*

[Note Grouping and Rhythm Analysis](./note-grouping.md)
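
Grouping by stems means noteheads whose attachment points fall on (nearly) the same vertical line belong to one chord. A toy sketch of that idea, clustering sorted notehead x-centers within a tolerance; the tolerance and the greedy clustering rule are assumptions for illustration, not oemer's algorithm:

```python
def group_by_stem_x(note_xs, tol=5):
    """Cluster sorted notehead x-centers; gaps <= tol share one stem/chord."""
    groups = []
    for x in sorted(note_xs):
        if groups and x - groups[-1][-1] <= tol:
            groups[-1].append(x)   # close enough: same stem
        else:
            groups.append([x])     # gap too large: start a new chord
    return groups
```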

### MusicXML Generation

Generation of structured MusicXML documents from extracted musical elements with proper musical semantics and formatting.

```python { .api }
class MusicXMLBuilder:
    """Main MusicXML document builder."""

    def __init__(self, title: str = "Unknown"): ...

    def build(self) -> None:
        """Build the MusicXML structure from extracted elements."""

    def to_musicxml(self) -> bytes:
        """Export to MusicXML format."""

# Key signature enumeration
class Key(enum.Enum):
    C_MAJOR = 0        # Same as A-minor
    G_MAJOR = 1        # Same as E-minor
    D_MAJOR = 2        # Same as B-minor
    A_MAJOR = 3        # Same as F#-minor
    E_MAJOR = 4        # Same as C#-minor
    B_MAJOR = 5        # Same as G#-minor
    F_SHARP_MAJOR = 6  # Same as D#-minor
    F_MAJOR = -1       # Same as D-minor
    B_FLAT_MAJOR = -2  # Same as G-minor
    E_FLAT_MAJOR = -3  # Same as C-minor
    A_FLAT_MAJOR = -4  # Same as F-minor
    D_FLAT_MAJOR = -5  # Same as Bb-minor
    G_FLAT_MAJOR = -6  # Same as Eb-minor

class Voice:
    """Voice representation for MusicXML generation."""
    id: Optional[int]
    note_ids: List[int]
    stem_up: Optional[bool]
    group_id: Optional[int]
    x_center: Optional[float]
    label: NoteType
    has_dot: Optional[bool]
    group: Optional[int]
    track: Optional[int]
    duration: int
    rhythm_name: Optional[str]

    def init(self) -> None: ...
```

*MusicXML generation is handled by the MusicXMLBuilder class - see the main processing pipeline documentation above for complete details.*
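
The integer values of `Key` follow the circle of fifths (positive counts sharps, negative counts flats), which matches how MusicXML's `<fifths>` element encodes key signatures. A hedged sketch of emitting that element from a key value; the helper below is illustrative and not part of `MusicXMLBuilder`:

```python
import enum
import xml.etree.ElementTree as ET

class Key(enum.Enum):
    # Subset of the enumeration above; values are circle-of-fifths positions.
    C_MAJOR = 0
    G_MAJOR = 1
    F_MAJOR = -1

def key_element(key: Key) -> bytes:
    """Render a MusicXML <key> element whose <fifths> is the enum value."""
    el = ET.Element("key")
    ET.SubElement(el, "fifths").text = str(key.value)
    return ET.tostring(el)
```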

### Layer Management System

Global state management system for intermediate processing results, enabling a modular pipeline architecture.

```python { .api }
def register_layer(name: str, layer: ndarray) -> None:
    """Register a processing layer for global access."""

def get_layer(name: str) -> ndarray:
    """Retrieve a registered processing layer."""

def delete_layer(name: str) -> None:
    """Delete a registered layer."""

def list_layers() -> List[str]:
    """List all registered layer names."""
```

[Layer Management](./layer-management.md)
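
A dictionary-backed sketch of this registry pattern, matching the signatures above (with `ndarray` relaxed to `object` to keep the sketch dependency-free); this is an illustrative re-implementation of the pattern, not `oemer.layers` itself:

```python
from typing import Dict, List

_layers: Dict[str, object] = {}  # module-level store acting as a global blackboard

def register_layer(name: str, layer: object) -> None:
    """Register a processing layer for global access."""
    _layers[name] = layer

def get_layer(name: str) -> object:
    """Retrieve a registered processing layer."""
    return _layers[name]

def delete_layer(name: str) -> None:
    """Delete a registered layer."""
    del _layers[name]

def list_layers() -> List[str]:
    """List all registered layer names."""
    return list(_layers.keys())
```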

### Image Processing Utilities

Comprehensive image processing utilities including dewarping, morphological operations, and bounding box management.

```python { .api }
def estimate_coords(staff_pred: ndarray) -> Tuple[ndarray, ndarray]:
    """Estimate dewarping coordinates from staff predictions."""

def dewarp(img: ndarray, coords_x: ndarray, coords_y: ndarray) -> ndarray:
    """Apply dewarping transformation to correct image skew."""

def get_bbox(data: ndarray) -> List[BBox]:
    """Extract bounding boxes from binary image data using OpenCV contours."""

def get_center(bbox: Union[BBox, ndarray]) -> Tuple[int, int]:
    """Get center coordinates of a bounding box."""

def merge_nearby_bbox(bboxes: List[BBox], distance: float, x_factor: int = 1, y_factor: int = 1) -> List[BBox]:
    """Merge nearby bounding boxes using agglomerative clustering."""

def rm_merge_overlap_bbox(bboxes: List[BBox], overlap_ratio: float = 0.8) -> List[BBox]:
    """Remove and merge overlapping bounding boxes."""

class Grid:
    """Grid structure for dewarping coordinate estimation."""
    id: Optional[int]
    bbox: BBox
    y_shift: int

    @property
    def y_center(self) -> float: ...
    @property
    def height(self) -> int: ...

class GridGroup:
    """Group of grids for dewarping processing."""
    id: Optional[int]
    reg_id: Optional[int]
    bbox: BBox
    gids: List[int]
    split_unit: int

    @property
    def y_center(self) -> int: ...

def build_grid(st_pred: ndarray, split_unit: int = 11) -> Tuple[ndarray, List[Grid]]:
    """Build grid structure from staff predictions for dewarping."""

def build_grid_group(grid_map: ndarray, grids: List[Grid]) -> Tuple[ndarray, List[GridGroup]]:
    """Group grids into connected components for dewarping."""
```

*Image processing utilities are used throughout the pipeline - key dewarping and bounding box functions are documented in the main pipeline and neural network inference sections above.*
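
With bboxes stored as `(x1, y1, x2, y2)` tuples (the `BBox` alias in the Types section), the geometric helpers reduce to small arithmetic. A sketch of the center computation and of the box union that merging ultimately performs; the real `merge_nearby_bbox` drives this with agglomerative clustering, which is omitted here, and these helper names are illustrative:

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def bbox_center(bbox: BBox) -> Tuple[int, int]:
    """Integer center of a bounding box."""
    x1, y1, x2, y2 = bbox
    return (x1 + x2) // 2, (y1 + y2) // 2

def bbox_union(a: BBox, b: BBox) -> BBox:
    """Smallest box containing both inputs, as produced when merging boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```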

## Types

```python { .api }
import enum
from typing import Any, List, Optional, Tuple, Union

from argparse import Namespace
from numpy import ndarray

# Core type aliases
BBox = Tuple[int, int, int, int]  # Bounding box (x1, y1, x2, y2)

# Enumerations
class NoteType(enum.Enum):
    WHOLE = 0
    HALF = 1
    QUARTER = 2
    EIGHTH = 3
    SIXTEENTH = 4
    THIRTY_SECOND = 5
    SIXTY_FOURTH = 6
    TRIPLET = 7
    OTHERS = 8
    HALF_OR_WHOLE = 9  # Intermediate parsing state

class ClefType(enum.Enum):
    G_CLEF = 1
    F_CLEF = 2

class SfnType(enum.Enum):
    FLAT = 1
    SHARP = 2
    NATURAL = 3

class RestType(enum.Enum):
    WHOLE_HALF = 1
    QUARTER = 2
    EIGHTH = 3
    SIXTEENTH = 4
    THIRTY_SECOND = 5
    SIXTY_FOURTH = 6
    WHOLE = 7
    HALF = 8
```

## Error Handling

Oemer defines custom exceptions for specific processing errors:

```python { .api }
class SfnException(Exception):
    """Base exception for Sharp/Flat/Natural processing errors."""

class SfnNoteTrackMismatch(SfnException):
    """Track mismatch error in accidental processing."""

class SfnNoteGroupMismatch(SfnException):
    """Group mismatch error in accidental processing."""

class StafflineException(Exception):
    """Base exception for staffline processing errors."""

class StafflineCountInconsistent(StafflineException):
    """Inconsistent staffline count detected."""

class StafflineNotAligned(StafflineException):
    """Stafflines are not properly aligned."""

class StafflineUnitSizeInconsistent(StafflineException):
    """Inconsistent unit sizes across stafflines."""
```

Common error handling pattern:

```python
try:
    musicxml_path = extract(args)
except FileNotFoundError:
    print("Input image file not found")
except StafflineException as e:
    print(f"Staffline processing error: {e}")
except Exception as e:
    print(f"Processing failed: {e}")
```