tessl/pypi-sklearn-crfsuite

A CRFsuite (python-crfsuite) wrapper that provides an interface similar to scikit-learn's.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/sklearn-crfsuite@0.3.x

To install, run

npx @tessl/cli install tessl/pypi-sklearn-crfsuite@0.3.0


sklearn-crfsuite

A scikit-learn-compatible wrapper for CRFsuite that applies Conditional Random Fields (CRF) to sequence labeling tasks. It provides the familiar fit/predict interface and delegates training and inference to CRFsuite, a fast C implementation, through the python-crfsuite bindings. Typical applications include named entity recognition, part-of-speech tagging, and other structured prediction tasks.

Package Information

  • Package Name: sklearn-crfsuite
  • Package Type: pypi
  • Language: Python
  • Installation: pip install sklearn-crfsuite

Core Imports

from sklearn_crfsuite import CRF

For metrics and evaluation:

from sklearn_crfsuite import metrics

For scikit-learn integration:

from sklearn_crfsuite import scorers

For utility functions:

from sklearn_crfsuite import utils

For advanced trainer customization:

from sklearn_crfsuite import trainer

Basic Usage

from sklearn_crfsuite import CRF
from sklearn_crfsuite import metrics

# Prepare training data (list of lists of feature dicts)
X_train = [
    [{'word': 'I', 'pos': 'PRP'}, {'word': 'love', 'pos': 'VBP'}, {'word': 'Python', 'pos': 'NNP'}],
    [{'word': 'CRF', 'pos': 'NNP'}, {'word': 'models', 'pos': 'NNS'}, {'word': 'work', 'pos': 'VBP'}]
]

# Labels for each sequence
y_train = [
    ['O', 'O', 'B-LANG'],
    ['B-TECH', 'I-TECH', 'O']
]

# Create and train the CRF model
crf = CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)

# Make predictions
X_test = [
    [{'word': 'Java', 'pos': 'NNP'}, {'word': 'is', 'pos': 'VBZ'}, {'word': 'popular', 'pos': 'JJ'}]
]
y_pred = crf.predict(X_test)

# Evaluate with token-level (flat) and sequence-level accuracy
y_test = [['B-LANG', 'O', 'O']]
accuracy = metrics.flat_accuracy_score(y_test, y_pred)
seq_accuracy = metrics.sequence_accuracy_score(y_test, y_pred)

print(f"Token accuracy: {accuracy}")
print(f"Sequence accuracy: {seq_accuracy}")

Architecture

sklearn-crfsuite bridges two key technologies:

  • CRFsuite: a fast C implementation of Conditional Random Fields, accessed through the python-crfsuite bindings
  • scikit-learn: the Python machine learning library whose standardized estimator interface (fit, predict, score) the wrapper follows

The library stays compatible with scikit-learn's model selection utilities (cross-validation, grid search, pipeline integration). It also exposes CRF-specific features such as per-token marginal probabilities (predict_marginals) and introspection of learned weights (the state_features_ and transition_features_ attributes).
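For scikit-learn's model selection utilities, each sample is one whole sequence. X and y are indexed along their first axis, so a split never cuts a sentence in half. The dependency-free sketch below imitates a contiguous (unshuffled) k-fold split to make this concrete. `kfold_sequences` is a hypothetical helper for illustration. Real code should use scikit-learn's own splitters.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')
LabelT = TypeVar('LabelT')


def kfold_sequences(X: Sequence[T], y: Sequence[LabelT], k: int
                    ) -> List[Tuple[List[T], List[LabelT], List[T], List[LabelT]]]:
    """Split whole sequences into k contiguous folds (no shuffling).

    Returns (X_train, y_train, X_test, y_test) tuples. X[i] and y[i]
    always stay together, so no sequence is split across folds.
    """
    n = len(X)
    if n != len(y):
        raise ValueError('X and y must contain the same number of sequences')
    if not 1 < k <= n:
        raise ValueError('k must be between 2 and the number of sequences')
    folds = []
    start = 0
    for f in range(k):
        # The first n % k folds get one extra sequence, as in a standard k-fold split.
        size = n // k + (1 if f < n % k else 0)
        test_idx = range(start, start + size)
        start += size
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        X_train = [X[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        folds.append((X_train, y_train, X_test, y_test))
    return folds
```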

Capabilities

CRF Estimator

The main CRF class provides a scikit-learn-compatible interface for Conditional Random Field sequence labeling. It exposes CRFsuite's training algorithms ('lbfgs', 'l2sgd', 'ap', 'pa', 'arow') and their hyperparameters.

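class CRF:
    # Abbreviated: **kwargs stands in for the remaining explicit keyword
    # parameters (e.g. min_freq, all_possible_transitions, verbose).
    def __init__(self, algorithm='lbfgs', c1=0, c2=1.0, max_iterations=None, **kwargs): ...
    def fit(self, X, y, X_dev=None, y_dev=None): ...  # X_dev/y_dev: optional held-out data
    def predict(self, X): ...            # one list of labels per input sequence
    def predict_marginals(self, X): ...  # one {label: probability} dict per token
    def score(self, X, y): ...           # token-level (flat) accuracy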
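`predict` returns the single most likely label sequence found by Viterbi decoding. `predict_marginals` instead returns, for every token, a dict that maps each label to its marginal probability. The dependency-free sketch below works on data already in that marginals format and picks the highest-probability label at each token. This per-token decoding is an illustrative alternative that can disagree with `predict`, because the labels that are best one token at a time need not form the best sequence. The helper name is hypothetical.

```python
from typing import Dict, List

SequenceMarginals = List[Dict[str, float]]


def max_marginal_labels(marginals: List[SequenceMarginals]) -> List[List[str]]:
    """Pick the label with the highest marginal probability at each token."""
    return [
        [max(token_probs, key=token_probs.get) for token_probs in seq]
        for seq in marginals
    ]


# Hand-written values in the shape returned by crf.predict_marginals(X)
example = [[
    {'B-LANG': 0.7, 'O': 0.3},
    {'B-LANG': 0.1, 'O': 0.9},
    {'B-LANG': 0.45, 'O': 0.55},
]]
print(max_marginal_labels(example))
```

Marginals are also useful on their own, for example to flag tokens whose top probability is low for manual review.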

Evaluation Metrics

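Metrics for evaluating sequence labeling. The token-level ("flat") scores are computed over all tokens after the sequences are flattened. The sequence-level accuracy counts a sequence as correct only if every one of its labels matches. The flat precision, recall, and F1 functions forward **kwargs (such as average and labels) to the corresponding scikit-learn functions.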

def flat_accuracy_score(y_true, y_pred): ...
def flat_precision_score(y_true, y_pred, **kwargs): ...
def flat_recall_score(y_true, y_pred, **kwargs): ...
def flat_f1_score(y_true, y_pred, **kwargs): ...
def sequence_accuracy_score(y_true, y_pred): ...
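The difference between the flat and sequence-level scores is easiest to see on a small example. The sketch below re-implements the two accuracy measures in plain Python, assuming the semantics described above: flatten and then compare tokens, and count a sequence only when it matches exactly. It is for illustration only. In practice, use the `metrics` functions.

```python
from itertools import chain
from typing import List


def flat_accuracy(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Fraction of individual tokens labeled correctly, across all sequences."""
    true_flat = list(chain.from_iterable(y_true))
    pred_flat = list(chain.from_iterable(y_pred))
    if len(true_flat) != len(pred_flat):
        raise ValueError('y_true and y_pred contain different numbers of tokens')
    if not true_flat:
        return 0.0
    return sum(t == p for t, p in zip(true_flat, pred_flat)) / len(true_flat)


def sequence_accuracy(y_true: List[List[str]], y_pred: List[List[str]]) -> float:
    """Fraction of sequences whose labels all match exactly."""
    if not y_true:
        return 0.0
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


y_true = [['B-LANG', 'O', 'O'], ['B-TECH', 'I-TECH', 'O']]
y_pred = [['B-LANG', 'O', 'O'], ['B-TECH', 'O', 'O']]
print(flat_accuracy(y_true, y_pred))      # 5 of 6 tokens correct
print(sequence_accuracy(y_true, y_pred))  # 1 of 2 sequences correct
```

A single wrong label costs only one token in the flat score but a whole sequence in the sequence-level score. The sequence-level score is therefore much stricter on long sequences.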

Scikit-learn Integration

Ready-to-use scorer objects, created with sklearn.metrics.make_scorer. They can be passed as the scoring argument to scikit-learn's cross-validation, grid search, and other model selection utilities.

flat_accuracy = make_scorer(metrics.flat_accuracy_score)
sequence_accuracy = make_scorer(metrics.sequence_accuracy_score)
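A scikit-learn scorer is a callable invoked as `scorer(estimator, X, y)`. It calls `estimator.predict(X)` and compares the result with `y`, which is how `cross_val_score` or `GridSearchCV(scoring=...)` can evaluate a CRF. The dependency-free sketch below imitates that protocol to show the call flow. `make_sequence_scorer` and `FixedPredictor` are hypothetical stand-ins. Real code should pass `scorers.flat_accuracy` or `scorers.sequence_accuracy` directly.

```python
from typing import Callable, List, Sequence

Labels = List[List[str]]


def sequence_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of sequences predicted exactly right."""
    if not y_true:
        return 0.0
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def make_sequence_scorer(score_func: Callable[[Labels, Labels], float]):
    """Wrap a metric so it can be called as scorer(estimator, X, y)."""
    def scorer(estimator, X: Sequence, y: Labels) -> float:
        return score_func(y, estimator.predict(X))
    return scorer


class FixedPredictor:
    """Hypothetical stand-in for a fitted CRF that returns canned labels."""

    def __init__(self, labels: Labels):
        self.labels = labels

    def predict(self, X: Sequence) -> Labels:
        return self.labels[:len(X)]


scorer = make_sequence_scorer(sequence_accuracy)
model = FixedPredictor([['B-LANG', 'O', 'O'], ['O', 'O', 'O']])
X = [[{'word': 'Java'}, {'word': 'is'}, {'word': 'popular'}],
     [{'word': 'CRF'}, {'word': 'models'}, {'word': 'work'}]]
y = [['B-LANG', 'O', 'O'], ['B-TECH', 'I-TECH', 'O']]
print(scorer(model, X, y))
```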

Utility Functions

Helper functions for working with sequence data, such as flattening a list of label sequences into a single list of labels.

def flatten(sequences): ...
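`flatten` turns a list of sequences into one flat list. The flat_* metrics use this step to reduce sequence labels to the one-dimensional form that scikit-learn's metrics expect. The sketch below gives a standard-library equivalent built on `itertools.chain.from_iterable`. It is an illustration, not the library's own code.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar('T')


def flatten(sequences: Iterable[Iterable[T]]) -> List[T]:
    """Concatenate nested sequences into a single list, preserving order."""
    return list(chain.from_iterable(sequences))


y = [['B-LANG', 'O', 'O'], ['B-TECH', 'I-TECH', 'O']]
print(flatten(y))
```

The same call is handy for quick corpus statistics, for example `collections.Counter(flatten(y_train))` to count how often each label occurs.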

Advanced Features

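Hooks for customizing the training process. LinePerIterationTrainer, a python-crfsuite Trainer subclass, is the default trainer class. When verbose=True, it prints a compact one-line summary for each training iteration. A different trainer class can be supplied through the CRF trainer_cls parameter.

class LinePerIterationTrainer(pycrfsuite.Trainer): ...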
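Trainer customization works by subclassing and overriding a per-iteration callback. The dependency-free sketch below imitates that pattern with a hypothetical `BaseTrainer` stand-in. It assumes, as in python-crfsuite, a hook named `on_iteration(log, info)` that receives a dict of statistics for each iteration. The `info` keys used here (`'num'`, `'loss'`) are assumptions for illustration.

```python
from typing import Any, Dict, List


class BaseTrainer:
    """Hypothetical stand-in for the callback interface of pycrfsuite.Trainer."""

    def on_iteration(self, log: str, info: Dict[str, Any]) -> None:
        pass  # default: ignore per-iteration statistics


class CompactTrainer(BaseTrainer):
    """Records and prints one short line per training iteration."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def on_iteration(self, log: str, info: Dict[str, Any]) -> None:
        # 'num' and 'loss' are assumed keys of the per-iteration info dict.
        line = 'Iter {num:<3} loss={loss:.2f}'.format(**info)
        self.lines.append(line)
        print(line)


trainer = CompactTrainer()
# Simulate two iterations reported by a training loop.
for info in ({'num': 1, 'loss': 12.5}, {'num': 2, 'loss': 9.25}):
    trainer.on_iteration('', info)
```

With the real library, a pycrfsuite.Trainer subclass written this way would be supplied through the `trainer_cls` parameter of `CRF`.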

Types

from typing import Dict, List, Union

# Feature representation for CRF input
FeatureDict = Dict[str, Union[str, int, float, bool]]
Sequence = List[FeatureDict]
Dataset = List[Sequence]

# Label representation
LabelSequence = List[str]
LabelDataset = List[LabelSequence]

# Marginal probabilities output
MarginalProbs = Dict[str, float]
SequenceMarginals = List[MarginalProbs]
DatasetMarginals = List[SequenceMarginals]
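These aliases imply an alignment invariant. X and y must contain the same number of sequences, and each feature sequence needs exactly one label per token. A small hypothetical checker like the one below, which is not part of sklearn-crfsuite, can catch such mismatches before `fit` is called.

```python
from typing import Dict, List, Union

FeatureDict = Dict[str, Union[str, int, float, bool]]
Dataset = List[List[FeatureDict]]
LabelDataset = List[List[str]]


def check_alignment(X: Dataset, y: LabelDataset) -> None:
    """Raise ValueError if X and y are not aligned sequence by sequence."""
    if len(X) != len(y):
        raise ValueError(f'{len(X)} feature sequences but {len(y)} label sequences')
    for i, (xseq, yseq) in enumerate(zip(X, y)):
        if len(xseq) != len(yseq):
            raise ValueError(f'sequence {i}: {len(xseq)} tokens but {len(yseq)} labels')


check_alignment([[{'word': 'I'}, {'word': 'love'}]], [['O', 'O']])  # passes silently
```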