# CRF Estimator

The main `CRF` class provides a scikit-learn compatible interface for Conditional Random Field (CRF) sequence labeling. It wraps the CRFsuite library through the `pycrfsuite` bindings and follows scikit-learn's estimator conventions, so it can be used with scikit-learn's tools for model selection, cross-validation, and pipelines.

## Capabilities

### Constructor and Configuration

Initialize a CRF estimator with algorithm selection and comprehensive hyperparameter configuration.

```python { .api }
class CRF:
    def __init__(
        self,
        algorithm='lbfgs',
        min_freq=0,
        all_possible_states=False,
        all_possible_transitions=False,
        c1=0,
        c2=1.0,
        max_iterations=None,
        num_memories=6,
        epsilon=1e-5,
        period=10,
        delta=1e-5,
        linesearch='MoreThuente',
        max_linesearch=20,
        calibration_eta=0.1,
        calibration_rate=2.0,
        calibration_samples=1000,
        calibration_candidates=10,
        calibration_max_trials=20,
        pa_type=1,
        c=1,
        error_sensitive=True,
        averaging=True,
        variance=1,
        gamma=1,
        verbose=False,
        model_filename=None,
        keep_tempfiles=False,
        trainer_cls=None
    ):
        """
        Initialize CRF estimator.

        Parameters:
        - algorithm: str, training algorithm ('lbfgs', 'l2sgd', 'ap', 'pa', 'arow')
        - min_freq: float, feature occurrence frequency cutoff threshold
        - all_possible_states: bool, generate state features for all attribute-label combinations
        - all_possible_transitions: bool, generate transition features for all label pairs
        - c1: float, L1 regularization coefficient (lbfgs only)
        - c2: float, L2 regularization coefficient
        - max_iterations: int, maximum optimization iterations
        - num_memories: int, limited memories for inverse hessian approximation (lbfgs)
        - epsilon: float, convergence condition parameter
        - period: int, iteration period for stopping criterion testing
        - delta: float, stopping criterion threshold
        - linesearch: str, line search algorithm ('MoreThuente', 'Backtracking', 'StrongBacktracking')
        - max_linesearch: int, maximum line search trials
        - calibration_eta: float, initial learning rate for calibration (l2sgd)
        - calibration_rate: float, learning rate change rate (l2sgd)
        - calibration_samples: int, calibration sample count (l2sgd)
        - calibration_candidates: int, learning rate candidates (l2sgd)
        - calibration_max_trials: int, maximum calibration trials (l2sgd)
        - pa_type: int, passive aggressive strategy (0=no slack, 1=PA-I, 2=PA-II)
        - c: float, aggressiveness parameter for PA
        - error_sensitive: bool, include prediction error count in objective
        - averaging: bool, compute averaged feature weights
        - variance: float, initial feature weight variance (arow)
        - gamma: float, loss vs weight change tradeoff (arow)
        - verbose: bool, enable training progress output
        - model_filename: str, path to existing model file
        - keep_tempfiles: bool, preserve temporary model files
        - trainer_cls: class, custom trainer class
        """
```

**Usage Example:**

```python
from sklearn_crfsuite import CRF

# Basic L-BFGS with regularization
crf = CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=100)

# Stochastic gradient descent setup
crf_sgd = CRF(
    algorithm='l2sgd',
    c2=1.0,
    calibration_eta=0.01,
    calibration_samples=500,
    verbose=True
)

# Passive Aggressive configuration
crf_pa = CRF(algorithm='pa', pa_type=1, c=0.5, error_sensitive=True)
```
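
Because the constructor takes plain keyword arguments, candidate configurations can be enumerated as dictionaries before any estimator is built. The sketch below is only an illustration and not part of the library: `candidate_grid` is a hypothetical helper, and the values shown are arbitrary rather than recommended settings.

```python
from itertools import product


def candidate_grid(algorithm, **value_lists):
    """Return one kwargs dict per combination of the given parameter values.

    Hypothetical helper for illustration; the names in `value_lists` are
    expected to match CRF constructor parameters such as c1 or c2.
    """
    names = sorted(value_lists)
    grid = []
    for combo in product(*(value_lists[name] for name in names)):
        params = {"algorithm": algorithm}
        params.update(zip(names, combo))
        grid.append(params)
    return grid


# Arbitrary example values, not recommended settings.
grid = candidate_grid("lbfgs", c1=[0.0, 0.1], c2=[0.1, 1.0], max_iterations=[100])
for params in grid:
    print(params)  # each dict could be passed as CRF(**params)
```

Keeping the candidates as plain dicts makes it easy to log exactly which configuration produced which result.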

### Training

Train the CRF model on sequence data, optionally with a development set for validation.

```python { .api }
def fit(self, X, y, X_dev=None, y_dev=None):
    """
    Train the CRF model.

    Parameters:
    - X: List[List[Dict]], feature sequences for training documents
    - y: List[List[str]], label sequences for training documents
    - X_dev: List[List[Dict]], optional development/validation feature sequences
    - y_dev: List[List[str]], optional development/validation label sequences

    Returns:
    - self: fitted CRF instance
    """
```

**Usage Example:**

```python
# Basic training
crf.fit(X_train, y_train)

# Training with validation set
crf.fit(X_train, y_train, X_dev=X_val, y_dev=y_val)
```
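
The `List[List[Dict]]` and `List[List[str]]` shapes expected by `fit` can be produced from tokenized, labeled sentences. The following is a minimal sketch under stated assumptions: `token_features` and `sentence_features` are hypothetical helpers, the feature names are made up for illustration, and the two sentences are toy data.

```python
def token_features(tokens, i):
    """Build the feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
    }
    if i == 0:
        features["BOS"] = True  # beginning of sequence
    else:
        features["prev.lower"] = tokens[i - 1].lower()
    if i == len(tokens) - 1:
        features["EOS"] = True  # end of sequence
    return features


def sentence_features(tokens):
    """Map one token sequence to a List[Dict], one dict per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Toy labeled data: (token, label) pairs per sentence.
sentences = [
    [("Alice", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")],
    [("It", "O"), ("rained", "O")],
]

X_train = [sentence_features([tok for tok, _ in sent]) for sent in sentences]
y_train = [[label for _, label in sent] for sent in sentences]
```

Each inner list in `X_train` has the same length as the matching list in `y_train`, which is the alignment that `fit(X_train, y_train)` relies on.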

### Prediction

Predict label sequences, or per-position marginal probabilities, for new sequences. Each method has a batch form and a single-sequence form.

```python { .api }
def predict(self, X):
    """
    Predict labels for input sequences.

    Parameters:
    - X: List[List[Dict]], feature sequences to predict

    Returns:
    - List[List[str]]: predicted label sequences
    """

def predict_single(self, xseq):
    """
    Predict labels for a single sequence.

    Parameters:
    - xseq: List[Dict], single feature sequence

    Returns:
    - List[str]: predicted labels for the sequence
    """

def predict_marginals(self, X):
    """
    Get marginal probabilities for all labels at each position.

    Parameters:
    - X: List[List[Dict]], feature sequences

    Returns:
    - List[List[Dict[str, float]]]: marginal probabilities for each position
    """

def predict_marginals_single(self, xseq):
    """
    Get marginal probabilities for a single sequence.

    Parameters:
    - xseq: List[Dict], single feature sequence

    Returns:
    - List[Dict[str, float]]: marginal probabilities for each position
    """
```

**Usage Example:**

```python
# Basic prediction
predictions = crf.predict(X_test)

# Single sequence prediction
single_pred = crf.predict_single(X_test[0])

# Get prediction confidence
marginals = crf.predict_marginals(X_test)
for seq_marginals in marginals:
    for pos_probs in seq_marginals:
        best_label = max(pos_probs, key=pos_probs.get)
        confidence = pos_probs[best_label]
        print(f"Label: {best_label}, Confidence: {confidence:.3f}")
```
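
One possible use of the marginals is to flag positions where the most likely label has a low probability, for example to route them to manual review. The sketch below assumes the `List[List[Dict[str, float]]]` shape documented for `predict_marginals`. `low_confidence_positions` is a hypothetical helper, the threshold is arbitrary, and `example_marginals` is hand-written data rather than real model output.

```python
def low_confidence_positions(marginals, threshold=0.6):
    """Return (sequence_index, position, best_label, probability) for uncertain tokens."""
    flagged = []
    for seq_idx, seq_marginals in enumerate(marginals):
        for pos, pos_probs in enumerate(seq_marginals):
            best_label = max(pos_probs, key=pos_probs.get)
            if pos_probs[best_label] < threshold:
                flagged.append((seq_idx, pos, best_label, pos_probs[best_label]))
    return flagged


# Hand-written stand-in for the output of crf.predict_marginals(X_test).
example_marginals = [
    [{"B-PER": 0.92, "O": 0.08}, {"B-PER": 0.45, "O": 0.55}],
    [{"B-PER": 0.10, "O": 0.90}],
]

flagged = low_confidence_positions(example_marginals, threshold=0.6)
print(flagged)  # [(0, 1, 'O', 0.55)]
```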

### Evaluation

Evaluate model performance with the built-in `score` method, which reports token-level accuracy.

```python { .api }
def score(self, X, y):
    """
    Return token-level accuracy score.

    Parameters:
    - X: List[List[Dict]], feature sequences
    - y: List[List[str]], true label sequences

    Returns:
    - float: flat accuracy score (token-level accuracy)
    """
```
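
To make "token-level accuracy" concrete: the label sequences are flattened, and the score is the fraction of tokens whose predicted label matches the true label. The following plain-Python sketch illustrates that metric on toy data. `flat_accuracy` is a hypothetical stand-in written for this example, not the library's own implementation.

```python
def flat_accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label equals the true label."""
    correct = 0
    total = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        if len(true_seq) != len(pred_seq):
            raise ValueError("each predicted sequence must match its reference length")
        for true_label, pred_label in zip(true_seq, pred_seq):
            correct += true_label == pred_label
            total += 1
    return correct / total if total else 0.0


# Toy data: 4 of the 5 tokens are labeled correctly.
y_true = [["B-PER", "O", "B-LOC"], ["O", "O"]]
y_pred = [["B-PER", "O", "O"], ["O", "O"]]
print(flat_accuracy(y_true, y_pred))  # 0.8
```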

### Model Introspection

Access learned model parameters and feature information.

```python { .api }
@property
def classes_(self):
    """List of class labels learned during training."""

@property
def tagger_(self):
    """Underlying pycrfsuite.Tagger instance."""

@property
def size_(self):
    """Model size in bytes."""

@property
def num_attributes_(self):
    """Number of non-zero CRF attributes."""

@property
def attributes_(self):
    """List of learned feature attributes."""

@property
def state_features_(self):
    """
    Dict mapping (attribute_name, label) tuples to feature coefficients.
    Shows learned weights for state features.
    """

@property
def transition_features_(self):
    """
    Dict mapping (label_from, label_to) tuples to transition coefficients.
    Shows learned weights for label transitions.
    """

@property
def training_log_(self):
    """Training log parser with iteration details."""
```

**Usage Example:**

```python
# Inspect learned model
print(f"Model size: {crf.size_} bytes")
print(f"Number of attributes: {crf.num_attributes_}")
print(f"Learned labels: {crf.classes_}")

# Examine state feature weights
for (attr, label), weight in crf.state_features_.items():
    if abs(weight) > 0.1:  # Skip weights close to zero
        print(f"Feature '{attr}' -> '{label}': {weight:.3f}")

# Check transition patterns
for (from_label, to_label), weight in crf.transition_features_.items():
    if abs(weight) > 0.1:
        print(f"Transition '{from_label}' -> '{to_label}': {weight:.3f}")
```
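
Since `state_features_` is a flat dict keyed by `(attribute_name, label)`, grouping it by label is a convenient way to see which attributes push hardest toward each label. This is a minimal sketch: `top_state_features` is a hypothetical helper, and `example_state_features` is hand-written data in the documented shape, not weights from a trained model.

```python
from collections import defaultdict


def top_state_features(state_features, k=2):
    """Group (attribute, label) -> weight entries by label and keep the k largest weights."""
    by_label = defaultdict(list)
    for (attr, label), weight in state_features.items():
        by_label[label].append((attr, weight))
    return {
        label: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]
        for label, pairs in by_label.items()
    }


# Hand-written stand-in with the same shape as crf.state_features_.
example_state_features = {
    ("word.istitle", "B-PER"): 1.8,
    ("word.lower:paris", "B-LOC"): 2.4,
    ("bias", "O"): 0.9,
    ("word.isdigit", "O"): 0.3,
    ("word.istitle", "O"): -1.2,
}

top = top_state_features(example_state_features, k=2)
for label, pairs in top.items():
    print(label, pairs)
```

Sorting by raw weight surfaces the strongest positive evidence for a label. Sorting by `abs(weight)` instead would also surface strongly negative attributes.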