tessl/pypi-sklearn-crfsuite

CRFsuite (python-crfsuite) wrapper which provides interface similar to scikit-learn

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/sklearn-crfsuite@0.3.x

To install, run

npx @tessl/cli install tessl/pypi-sklearn-crfsuite@0.3.0

# sklearn-crfsuite

A scikit-learn-compatible wrapper for CRFsuite that enables Conditional Random Fields (CRF) for sequence labeling tasks. It provides the familiar fit/predict interface while delegating training and inference to the efficient C++ CRFsuite implementation through python-crfsuite, which makes it well suited to named entity recognition, part-of-speech tagging, and other structured prediction tasks.

## Package Information

- **Package Name**: sklearn-crfsuite
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install sklearn-crfsuite`

## Core Imports

```python
from sklearn_crfsuite import CRF
```

Common pattern for metrics and evaluation:

```python
from sklearn_crfsuite import metrics
```

For scikit-learn integration:

```python
from sklearn_crfsuite import scorers
```

For utility functions:

```python
from sklearn_crfsuite import utils
```

For advanced trainer customization:

```python
from sklearn_crfsuite import trainer
```

## Basic Usage

```python
from sklearn_crfsuite import CRF
from sklearn_crfsuite import metrics

# Prepare training data (list of lists of feature dicts)
X_train = [
    [{'word': 'I', 'pos': 'PRP'}, {'word': 'love', 'pos': 'VBP'}, {'word': 'Python', 'pos': 'NNP'}],
    [{'word': 'CRF', 'pos': 'NNP'}, {'word': 'models', 'pos': 'NNS'}, {'word': 'work', 'pos': 'VBP'}]
]

# Labels for each sequence (one label per token)
y_train = [
    ['O', 'O', 'B-LANG'],
    ['B-TECH', 'I-TECH', 'O']
]

# Create and train the CRF model
crf = CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

# Make predictions
X_test = [
    [{'word': 'Java', 'pos': 'NNP'}, {'word': 'is', 'pos': 'VBZ'}, {'word': 'popular', 'pos': 'JJ'}]
]
y_pred = crf.predict(X_test)

# Evaluate with token-level (flat) and sequence-level metrics
y_test = [['B-LANG', 'O', 'O']]
accuracy = metrics.flat_accuracy_score(y_test, y_pred)
seq_accuracy = metrics.sequence_accuracy_score(y_test, y_pred)

print(f"Token accuracy: {accuracy}")
print(f"Sequence accuracy: {seq_accuracy}")
```
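
The feature dicts in the example above are written by hand; in practice they are usually derived from tokens. The following is a minimal sketch of one way to do that, assuming each sentence arrives as a list of `(word, pos)` tuples. The helper names `token_features` and `sentence_features` and the feature keys (`word.lower`, `prev_pos`, `BOS`, `EOS`) are illustrative choices, not part of sklearn-crfsuite.

```python
from typing import Dict, List, Tuple, Union

FeatureDict = Dict[str, Union[str, int, float, bool]]


def token_features(sentence: List[Tuple[str, str]], i: int) -> FeatureDict:
    """Build a feature dict for the token at position i (hypothetical helper)."""
    word, pos = sentence[i]
    features: FeatureDict = {
        'word': word,
        'word.lower': word.lower(),
        'word.istitle': word.istitle(),
        'pos': pos,
    }
    if i > 0:
        # Context feature: part-of-speech tag of the previous token
        features['prev_pos'] = sentence[i - 1][1]
    else:
        features['BOS'] = True  # beginning of sequence
    if i == len(sentence) - 1:
        features['EOS'] = True  # end of sequence
    return features


def sentence_features(sentence: List[Tuple[str, str]]) -> List[FeatureDict]:
    """Convert one tagged sentence into a list of feature dicts."""
    return [token_features(sentence, i) for i in range(len(sentence))]


tagged = [('Java', 'NNP'), ('is', 'VBZ'), ('popular', 'JJ')]
X_test = [sentence_features(tagged)]  # list of lists of feature dicts
```

The result has the same list-of-lists-of-dicts shape as `X_train` and `X_test` in the example above, so it could be passed to `fit` or `predict` in the same way.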

## Architecture

sklearn-crfsuite bridges two key technologies:

- **CRFsuite**: High-performance C++ implementation of Conditional Random Fields
- **scikit-learn**: Python machine learning ecosystem providing standardized interfaces

The library maintains compatibility with sklearn's model selection utilities (cross-validation, grid search, pipeline integration) while providing access to CRF-specific features like marginal probabilities and feature introspection.

## Capabilities

### CRF Estimator

The main `CRF` class, providing a scikit-learn-compatible interface for Conditional Random Field sequence labeling with a choice of training algorithms and hyperparameters.

```python { .api }
class CRF:
    def __init__(self, algorithm='lbfgs', c1=0, c2=1.0, max_iterations=None, **kwargs): ...
    def fit(self, X, y, X_dev=None, y_dev=None): ...
    def predict(self, X): ...
    def predict_marginals(self, X): ...
    def score(self, X, y): ...
```

[CRF Estimator](./crf-estimator.md)

### Evaluation Metrics

Specialized metrics for sequence labeling evaluation, including both token-level (flat) and sequence-level accuracy measures designed for structured prediction tasks.

```python { .api }
def flat_accuracy_score(y_true, y_pred): ...
def flat_precision_score(y_true, y_pred, **kwargs): ...
def flat_recall_score(y_true, y_pred, **kwargs): ...
def flat_f1_score(y_true, y_pred, **kwargs): ...
def sequence_accuracy_score(y_true, y_pred): ...
```

[Evaluation Metrics](./metrics.md)
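
To make the difference between token-level and sequence-level accuracy concrete, here is a pure-Python sketch of the two ideas. It is not the library's code: `flat_accuracy` and `sequence_accuracy` are stand-in names, and the sketch assumes non-empty inputs whose sequences have matching lengths.

```python
from itertools import chain


def flat_accuracy(y_true, y_pred):
    """Fraction of individual tokens labeled correctly (stand-in sketch)."""
    true_flat = list(chain.from_iterable(y_true))
    pred_flat = list(chain.from_iterable(y_pred))
    correct = sum(t == p for t, p in zip(true_flat, pred_flat))
    return correct / len(true_flat)


def sequence_accuracy(y_true, y_pred):
    """Fraction of sequences whose labels all match (stand-in sketch)."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


y_true = [['O', 'O', 'B-LANG'], ['B-TECH', 'I-TECH', 'O']]
y_pred = [['O', 'O', 'B-LANG'], ['B-TECH', 'O', 'O']]

token_acc = flat_accuracy(y_true, y_pred)    # 5 of 6 tokens match
seq_acc = sequence_accuracy(y_true, y_pred)  # 1 of 2 sequences matches exactly
```

A single wrong token lowers the flat score only slightly but counts the whole sequence as wrong, which is why the two numbers can diverge sharply on long sequences.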

### Scikit-learn Integration

Ready-to-use scorer functions compatible with scikit-learn's cross-validation, grid search, and model selection utilities for seamless integration into ML pipelines.

```python { .api }
flat_accuracy: sklearn.metrics.scorer
sequence_accuracy: sklearn.metrics.scorer
```

[Scikit-learn Integration](./sklearn-integration.md)

### Utility Functions

Helper functions for working with sequence data and CRF-specific data transformations.

```python { .api }
def flatten(sequences): ...
```

[Utility Functions](./utils.md)
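
The signature suggests that `flatten` concatenates a list of label sequences into one flat list, the shape that token-level scikit-learn metrics expect. The sketch below illustrates that assumed behavior with a stand-in named `flatten_sketch`; it does not import or reproduce the library function.

```python
from itertools import chain


def flatten_sketch(sequences):
    """Concatenate a list of sequences into a single flat list (assumed behavior)."""
    return list(chain.from_iterable(sequences))


y_pred = [['O', 'O', 'B-LANG'], ['B-TECH', 'I-TECH', 'O']]
flat = flatten_sketch(y_pred)
# flat is ['O', 'O', 'B-LANG', 'B-TECH', 'I-TECH', 'O']
```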

### Advanced Features

Advanced customization options including custom trainer classes for specialized training workflows and logging.

```python { .api }
class LinePerIterationTrainer: ...
```

[Advanced Features](./advanced.md)

## Types

```python { .api }
from typing import Dict, List, Union

# Feature representation for CRF input
FeatureDict = Dict[str, Union[str, int, float, bool]]
Sequence = List[FeatureDict]
Dataset = List[Sequence]

# Label representation
LabelSequence = List[str]
LabelDataset = List[LabelSequence]

# Marginal probabilities output
MarginalProbs = Dict[str, float]
SequenceMarginals = List[MarginalProbs]
DatasetMarginals = List[SequenceMarginals]
```
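
As an illustration of how data shaped like `SequenceMarginals` might be consumed, the sketch below picks the highest-probability label for each token. The helper `most_likely_labels` is hypothetical, and the probabilities are hand-written example values, not output from a trained model.

```python
from typing import Dict, List, Tuple

MarginalProbs = Dict[str, float]
SequenceMarginals = List[MarginalProbs]


def most_likely_labels(seq_marginals: SequenceMarginals) -> List[Tuple[str, float]]:
    """Return (label, probability) for the top label of each token (hypothetical helper)."""
    return [max(token.items(), key=lambda kv: kv[1]) for token in seq_marginals]


# Made-up marginals for a two-token sequence
marginals: SequenceMarginals = [
    {'O': 0.1, 'B-LANG': 0.9},
    {'O': 0.8, 'B-LANG': 0.2},
]

best = most_likely_labels(marginals)
# best is [('B-LANG', 0.9), ('O', 0.8)]
```

Taking the per-token maximum like this treats each position independently, so in general it need not agree with the jointly decoded sequence returned by `predict`.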