
# Neural Network Inference


Model inference capabilities using U-Net architectures for semantic segmentation of musical elements. Oemer uses two specialized neural networks for different aspects of musical notation recognition.

## Capabilities

### Primary Inference Function

Run neural network inference over large images using a sliding-window approach.

```python { .api }
def inference(model_path: str, img_path: str, step_size: int = 128, batch_size: int = 16, manual_th: Optional[Any] = None, use_tf: bool = False) -> Tuple[ndarray, ndarray]:
    """
    Run neural network inference on image patches using a sliding-window approach.

    Parameters:
    - model_path (str): Path to model checkpoint directory containing model files
    - img_path (str): Path to input image file
    - step_size (int): Sliding window step size in pixels (default: 128)
    - batch_size (int): Number of patches to process in each batch (default: 16)
    - manual_th (Optional[Any]): Manual threshold for prediction binarization
    - use_tf (bool): Use TensorFlow instead of ONNX runtime (default: False)

    Returns:
    Tuple containing:
    - predictions (ndarray): Segmentation predictions with class labels
    - metadata (ndarray): Additional prediction metadata

    Raises:
    FileNotFoundError: If model files or input image are not found
    RuntimeError: If inference fails due to model or memory issues
    """
```
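The sliding-window idea behind `step_size` and `batch_size` can be sketched as follows. This is a minimal illustration rather than Oemer's actual implementation: the 256-pixel window size, the helper names `window_origins` and `iter_batches`, and the policy of adding one final window flush with the image edge are all assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def window_origins(length: int, win_size: int, step_size: int) -> List[int]:
    """Start offsets of windows along one axis (hypothetical helper)."""
    if length <= win_size:
        return [0]
    origins = list(range(0, length - win_size + 1, step_size))
    # Add one last window flush with the edge so trailing pixels are covered.
    if origins[-1] != length - win_size:
        origins.append(length - win_size)
    return origins


def iter_batches(
    height: int,
    width: int,
    win_size: int = 256,   # assumed window size, not taken from Oemer
    step_size: int = 128,
    batch_size: int = 16,
) -> Iterator[List[Tuple[int, int]]]:
    """Yield lists of (top, left) patch corners, at most batch_size per list."""
    corners = [
        (top, left)
        for top in window_origins(height, win_size, step_size)
        for left in window_origins(width, win_size, step_size)
    ]
    for start in range(0, len(corners), batch_size):
        yield corners[start:start + batch_size]


batches = list(iter_batches(height=600, width=1000))
print(len(batches), sum(len(b) for b in batches))  # prints "2 28"
```

With a step smaller than the window, neighbouring patches overlap, so each pixel is predicted more than once; a real pipeline then has to merge the overlapping predictions, which this sketch leaves out.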

### Image Preprocessing

Prepare images for optimal neural network processing.

```python { .api }
def resize_image(image: Image.Image) -> Image.Image:
    """
    Resize image to optimal dimensions for neural network processing.

    Maintains aspect ratio while ensuring dimensions are compatible
    with the model's expected input size requirements.

    Parameters:
    - image (PIL.Image.Image): Input image to resize

    Returns:
    PIL.Image.Image: Resized image optimized for model inference
    """
```
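To make "maintains aspect ratio while ensuring compatible dimensions" concrete, here is one way such a target size could be computed. It is a sketch under stated assumptions: the pixel budget of 3,000,000, the rounding to a multiple of 32, and the name `target_size` are hypothetical and are not taken from Oemer's `resize_image`.

```python
import math
from typing import Tuple


def target_size(
    width: int,
    height: int,
    max_pixels: int = 3_000_000,  # assumed pixel budget
    multiple: int = 32,           # assumed size granularity
) -> Tuple[int, int]:
    """Shrink (width, height) to fit max_pixels, keeping the aspect ratio."""
    # Never upscale: cap the scale factor at 1.0.
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    # Round each side down to the nearest multiple (at least one multiple).
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


print(target_size(4000, 3000))  # prints "(1984, 1472)"
```

The resulting tuple could be passed to `PIL.Image.Image.resize`. Rounding each side to a multiple distorts the aspect ratio very slightly, which is usually acceptable for segmentation input.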

### Symbol Classification

Use trained sklearn models for fine-grained symbol classification.

```python { .api }
def predict(region: ndarray, model_name: str) -> str:
    """
    Predict symbol type using trained sklearn classification models.

    Used for distinguishing between similar symbols that neural networks
    cannot reliably differentiate (e.g., different clef types, accidentals).

    Parameters:
    - region (ndarray): Image region containing the symbol to classify
    - model_name (str): Name of the sklearn model to use for prediction

    Returns:
    str: Predicted symbol class label

    Raises:
    ValueError: If model_name is not recognized
    FileNotFoundError: If sklearn model file is not found
    """
```
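The error contract documented above (`ValueError` for an unknown name, `FileNotFoundError` for a missing file) could be honoured by a lookup step like the one below. Everything here is illustrative: `KNOWN_MODELS`, `resolve_model_file`, and the `.model` file extension are hypothetical, and the model names are simply the ones used in this page's usage example.

```python
from pathlib import Path

# Hypothetical registry; the model names shipped with Oemer may differ.
KNOWN_MODELS = {"clef_classifier", "accidental_classifier", "rest_classifier"}


def resolve_model_file(model_name: str, model_dir: Path) -> Path:
    """Return the path of a classifier file, mirroring the documented errors."""
    if model_name not in KNOWN_MODELS:
        raise ValueError(f"Unrecognized model name: {model_name!r}")
    path = model_dir / f"{model_name}.model"  # assumed file naming
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path
```

Checking the name before touching the file system keeps the two failure modes distinct, so callers can tell a typo from a broken installation.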

## Neural Network Architecture

### Two-Stage Processing

Oemer uses two specialized U-Net models for different aspects of music recognition:

#### Model 1: Staff vs. Symbols Segmentation

- **Purpose**: Separate staff lines from all other musical symbols
- **Input**: Full music sheet image
- **Output**: Segmentation mask distinguishing staff lines (class 1) from symbols (class 2)
- **Location**: `checkpoints/unet_big/`

#### Model 2: Detailed Symbol Classification

- **Purpose**: Classify specific types of musical symbols
- **Input**: Full music sheet image
- **Output**: Multi-class segmentation with:
  - Class 1: Stems and rests
  - Class 2: Note heads
  - Class 3: Clefs and accidentals (sharp/flat/natural)
- **Location**: `checkpoints/seg_net/`
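The class indices listed for the second model can be turned into one binary mask per class with NumPy. This is a small sketch that assumes the predictions arrive as an integer class map and that any other value (such as 0) means background; `SYMBOL_CLASSES` and `split_classes` are names invented for the example.

```python
import numpy as np

# Class indices as listed above for the second model.
SYMBOL_CLASSES = {"stems_rests": 1, "noteheads": 2, "clefs_keys": 3}


def split_classes(class_map: np.ndarray, classes: dict) -> dict:
    """Return one uint8 binary mask per named class."""
    return {
        name: (class_map == idx).astype(np.uint8)
        for name, idx in classes.items()
    }


# Tiny synthetic class map standing in for a real prediction.
demo = np.array([[0, 1, 1],
                 [2, 2, 3]])
masks = split_classes(demo, SYMBOL_CLASSES)
print({name: int(mask.sum()) for name, mask in masks.items()})
# prints "{'stems_rests': 2, 'noteheads': 2, 'clefs_keys': 1}"
```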

### Model Files Structure

Each model directory contains:

- `model.onnx` - ONNX format model (default runtime)
- `weights.h5` - TensorFlow/Keras weights (when using `--use-tf`)
- `metadata.pkl` - Model metadata and configuration
- `arch.json` - Model architecture description
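Before running inference it can be useful to confirm that a checkpoint directory is complete. The sketch below uses only the file names listed above; which files each backend actually needs is an assumption, and `REQUIRED_FILES` and `missing_model_files` are hypothetical names.

```python
from pathlib import Path
from typing import List

# Which files each backend needs is an assumption based on the list above.
REQUIRED_FILES = {
    "onnx": ["model.onnx", "metadata.pkl"],
    "tf": ["arch.json", "weights.h5", "metadata.pkl"],
}


def missing_model_files(model_dir: str, use_tf: bool = False) -> List[str]:
    """List expected files that are absent from model_dir for the chosen backend."""
    backend = "tf" if use_tf else "onnx"
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES[backend] if not (root / name).is_file()]


missing = missing_model_files("oemer/checkpoints/unet_big")
if missing:
    print("Missing model files:", ", ".join(missing))
```

Running a check like this first gives a clearer message than a failure deep inside the model loader.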

## Usage Examples

### Basic Inference

```python
from oemer.inference import inference
import numpy as np

# Run inference on a music sheet image
model_path = "oemer/checkpoints/unet_big"
img_path = "sheet_music.jpg"

# Generate predictions
predictions, metadata = inference(
    model_path=model_path,
    img_path=img_path,
    step_size=128,
    batch_size=16,
    use_tf=False  # Use ONNX runtime
)

# Extract staff and symbol predictions
staff_mask = np.where(predictions == 1, 1, 0)
symbol_mask = np.where(predictions == 2, 1, 0)

print(f"Predictions shape: {predictions.shape}")
print(f"Staff pixels: {np.sum(staff_mask)}")
print(f"Symbol pixels: {np.sum(symbol_mask)}")
```

### Two-Stage Inference Pipeline

```python
from oemer.inference import inference, resize_image
from PIL import Image
import numpy as np  # needed for the np.where calls below
import os

def run_complete_inference(img_path: str, use_tf: bool = False):
    """Run both inference models on an image."""

    # Resize image for optimal processing
    image = Image.open(img_path)
    resized_image = resize_image(image)
    temp_path = "temp_resized.jpg"
    resized_image.save(temp_path)

    try:
        # Stage 1: Staff vs. symbols segmentation
        print("Running stage 1 inference (staff vs symbols)...")
        staff_symbols, _ = inference(
            model_path="oemer/checkpoints/unet_big",
            img_path=temp_path,
            step_size=128,
            batch_size=16,
            use_tf=use_tf
        )

        # Stage 2: Detailed symbol classification
        print("Running stage 2 inference (symbol details)...")
        symbol_details, _ = inference(
            model_path="oemer/checkpoints/seg_net",
            img_path=temp_path,
            step_size=128,
            batch_size=16,
            use_tf=use_tf
        )

        # Process results: one binary mask per class
        staff = np.where(staff_symbols == 1, 1, 0)
        symbols = np.where(staff_symbols == 2, 1, 0)
        stems_rests = np.where(symbol_details == 1, 1, 0)
        noteheads = np.where(symbol_details == 2, 1, 0)
        clefs_keys = np.where(symbol_details == 3, 1, 0)

        return {
            'staff': staff,
            'symbols': symbols,
            'stems_rests': stems_rests,
            'noteheads': noteheads,
            'clefs_keys': clefs_keys
        }

    finally:
        # Clean up temporary file
        if os.path.exists(temp_path):
            os.remove(temp_path)

# Run complete inference
results = run_complete_inference("my_sheet_music.jpg")
for key, mask in results.items():
    print(f"{key}: {mask.shape}, pixels: {np.sum(mask)}")
```

### Custom Inference Parameters

```python
from oemer.inference import inference

# High-precision inference with smaller steps
high_precision_predictions, _ = inference(
    model_path="oemer/checkpoints/unet_big",
    img_path="complex_score.jpg",
    step_size=64,    # Smaller steps for more overlap
    batch_size=8,    # Smaller batches to reduce memory usage
    use_tf=True      # Use TensorFlow for potentially better precision
)

# Fast inference with larger steps
fast_predictions, _ = inference(
    model_path="oemer/checkpoints/unet_big",
    img_path="simple_score.jpg",
    step_size=256,   # Larger steps for faster processing
    batch_size=32,   # Larger batches if memory allows
    use_tf=False     # ONNX is typically faster
)
```

### Symbol Classification with sklearn Models

```python
from oemer.inference import predict
import cv2
import numpy as np

# Extract a symbol region from the image
image = cv2.imread("sheet_music.jpg", cv2.IMREAD_GRAYSCALE)
symbol_region = image[100:150, 200:250]  # Extract 50x50 region

# Classify the symbol using trained sklearn models
try:
    # Predict clef type
    clef_type = predict(symbol_region, "clef_classifier")
    print(f"Detected clef: {clef_type}")

    # Predict accidental type
    accidental_type = predict(symbol_region, "accidental_classifier")
    print(f"Detected accidental: {accidental_type}")

    # Predict rest type
    rest_type = predict(symbol_region, "rest_classifier")
    print(f"Detected rest: {rest_type}")

except (ValueError, FileNotFoundError) as e:
    # predict() documents both errors: unknown model name or missing model file
    print(f"Classification error: {e}")
```

## Performance Considerations

### Memory Management

- **Batch Size**: Reduce `batch_size` if you encounter out-of-memory errors
- **Step Size**: Larger `step_size` uses less memory but may reduce accuracy
- **Model Backend**: ONNX runtime typically uses less memory than TensorFlow
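The advice to reduce `batch_size` on memory errors can be automated with a simple retry loop. The helper below is a sketch: `run_with_backoff` is a hypothetical name, and it assumes that memory exhaustion surfaces as the `RuntimeError` documented for `inference`. It retries on any `RuntimeError`, so unrelated failures are retried too until the minimum batch size is reached.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(
    run: Callable[[int], T],
    batch_size: int = 16,
    min_batch_size: int = 1,
) -> T:
    """Call run(batch_size), halving the batch size after each RuntimeError."""
    while True:
        try:
            return run(batch_size)
        except RuntimeError:
            if batch_size <= min_batch_size:
                raise  # nothing left to reduce; surface the original error
            batch_size = max(min_batch_size, batch_size // 2)
```

With Oemer, `run` could be something like `lambda bs: inference(model_path, img_path, batch_size=bs)`.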

### Speed Optimization

- **ONNX Runtime**: Generally faster than TensorFlow for inference
- **GPU Acceleration**: Install `onnxruntime-gpu` for GPU acceleration on Linux
- **Image Size**: Resize very large images to reduce processing time
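To see whether a GPU-capable ONNX Runtime build is installed, the available execution providers can be listed. This sketch only reports what ONNX Runtime exposes; whether Oemer selects the GPU provider automatically is not covered here, and `available_onnx_providers` is a name made up for the example.

```python
from typing import List


def available_onnx_providers() -> List[str]:
    """Return ONNX Runtime execution providers, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


providers = available_onnx_providers()
print("Providers:", providers)
print("CUDA available:", "CUDAExecutionProvider" in providers)
```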

### Quality vs. Speed Trade-offs

```python
# Quality-focused settings (slower)
quality_settings = {
    'step_size': 64,
    'batch_size': 8,
    'use_tf': True
}

# Speed-focused settings (faster)
speed_settings = {
    'step_size': 256,
    'batch_size': 32,
    'use_tf': False
}

# Balanced settings (recommended)
balanced_settings = {
    'step_size': 128,
    'batch_size': 16,
    'use_tf': False
}
```
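A rough patch count shows why the step size dominates the cost of these presets. The estimate below assumes a 256-pixel window, which is a guess for illustration and not a value taken from Oemer; `count_patches` is a hypothetical helper.

```python
import math


def count_patches(height: int, width: int, step_size: int, win_size: int = 256) -> int:
    """Approximate number of windows needed to cover an image."""
    rows = math.ceil(max(height - win_size, 0) / step_size) + 1
    cols = math.ceil(max(width - win_size, 0) / step_size) + 1
    return rows * cols


for step in (64, 128, 256):
    print(step, count_patches(3000, 2000, step))
# prints:
# 64 1276
# 128 345
# 256 96
```

Halving the step size roughly quadruples the number of patches, so the quality-focused preset does about four times the work of the balanced one on the same image.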

The inference system is designed to handle a range of image sizes and qualities. The `step_size`, `batch_size`, and `use_tf` parameters let you trade accuracy against speed and memory use.