Tessl Tile for pypi/sklearn-crfsuite@0.3.0

# Utility Functions

Helper functions for working with sequence data and CRF-specific data transformations. These utilities are primarily used internally by the metrics module but are available for advanced use cases requiring sequence data manipulation.

## Capabilities

### Sequence Flattening

Converts nested sequence structures into flat lists, essential for adapting CRF sequence data to work with standard scikit-learn metrics that expect flat label arrays.

```python { .api }
def flatten(sequences):
    """
    Flatten a list of sequences into a single list.

    Parameters:
    - sequences: List[List[Any]], list of sequences to flatten

    Returns:
    - List[Any]: flattened list combining all sequence elements
    """
```
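
To make the documented behavior concrete, here is a minimal sketch of one-level flattening built on `itertools.chain.from_iterable`. The name `flatten_one_level` is a hypothetical stand-in written for illustration; it is an assumption about how such a helper could work, not a copy of the library's source.

```python
from itertools import chain


def flatten_one_level(sequences):
    """Hypothetical stand-in for flatten: join the inner sequences in order."""
    return list(chain.from_iterable(sequences))


labels = flatten_one_level([["B-PER", "I-PER", "O"], ["O", "B-LOC"]])
print(labels)  # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC']

# Only the outermost level of nesting is removed; deeper lists stay intact.
nested = flatten_one_level([[["a"], ["b"]], [["c"]]])
print(nested)  # [['a'], ['b'], ['c']]
```

Under this assumption, element order is preserved and sequence boundaries are discarded, which is what flat label arrays require.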

**Usage Example:**

```python
from sklearn_crfsuite.utils import flatten

# Flatten sequence labels for use with sklearn metrics
y_sequences = [['B-PER', 'I-PER', 'O'], ['O', 'B-LOC']]
y_flat = flatten(y_sequences)
print(y_flat)  # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC']

# Flatten feature sequences (less common use case)
feature_sequences = [
    [{'word': 'John'}, {'word': 'Smith'}],
    [{'word': 'New'}, {'word': 'York'}]
]
# Note: flatten joins the inner lists, so extract the values to keep first
flat_features = flatten([[f['word'] for f in seq] for seq in feature_sequences])
print(flat_features)  # ['John', 'Smith', 'New', 'York']
```

### Integration with Metrics

The "flat" metrics in `sklearn_crfsuite.metrics` call `flatten` automatically to convert sequence data before passing it to the corresponding scikit-learn metric functions.

**Usage Pattern:**

```python
from sklearn_crfsuite import metrics
from sklearn_crfsuite.utils import flatten
from sklearn.metrics import classification_report

# Example data: one list of labels per sequence
y_true = [['B-PER', 'I-PER', 'O'], ['O', 'B-LOC']]
y_pred = [['B-PER', 'O', 'O'], ['O', 'B-LOC']]

# Automatic flattening (recommended)
report = metrics.flat_classification_report(y_true, y_pred)

# Manual flattening (for custom metrics)
y_true_flat = flatten(y_true)
y_pred_flat = flatten(y_pred)
custom_report = classification_report(y_true_flat, y_pred_flat)
```
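
The manual pattern above can be wrapped once and reused for any custom metric. The sketch below is one possible way to do that: `flat_metric` and `token_accuracy` are hypothetical names introduced here, and the flattening uses `itertools.chain` as a stand-in so the example runs without the library installed. It also checks that each predicted sequence has the same length as its reference, since position-wise metrics give misleading results on misaligned data.

```python
from functools import wraps
from itertools import chain


def flat_metric(metric_fn):
    """Hypothetical decorator: flatten y_true and y_pred, then call metric_fn."""
    @wraps(metric_fn)
    def wrapper(y_true, y_pred, **kwargs):
        # Position-wise comparison requires aligned sequences.
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred differ in number of sequences")
        for i, (true_seq, pred_seq) in enumerate(zip(y_true, y_pred)):
            if len(true_seq) != len(pred_seq):
                raise ValueError(
                    f"sequence {i}: {len(true_seq)} true labels "
                    f"vs {len(pred_seq)} predicted labels"
                )
        flat_true = list(chain.from_iterable(y_true))
        flat_pred = list(chain.from_iterable(y_pred))
        return metric_fn(flat_true, flat_pred, **kwargs)
    return wrapper


@flat_metric
def token_accuracy(y_true, y_pred):
    """Share of tokens whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if y_true else 0.0


y_true = [["B-PER", "I-PER", "O"], ["O", "B-LOC"]]
y_pred = [["B-PER", "O", "O"], ["O", "B-LOC"]]
print(token_accuracy(y_true, y_pred))  # 0.8
```

Any function that accepts two flat label lists can be decorated the same way, so the alignment check lives in one place.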

### Data Preprocessing Applications

`flatten` is also useful for sequence data preprocessing tasks such as counting labels or building a vocabulary:

**Usage Example:**

```python
from sklearn_crfsuite.utils import flatten
from collections import Counter

def analyze_label_distribution(y_sequences):
    """Analyze label distribution across all sequences."""
    all_labels = flatten(y_sequences)
    return Counter(all_labels)

def create_vocabulary(feature_sequences, feature_key='word'):
    """Create vocabulary from feature sequences."""
    all_words = flatten([[token.get(feature_key, '') for token in seq]
                        for seq in feature_sequences])
    return set(all_words)

# Example usage
y_train = [['B-PER', 'I-PER', 'O'], ['O', 'B-LOC', 'I-LOC']]
label_dist = analyze_label_distribution(y_train)
print(f"Label distribution: {label_dist}")

X_train = [
    [{'word': 'John', 'pos': 'NNP'}, {'word': 'lives', 'pos': 'VBZ'}],
    [{'word': 'in', 'pos': 'IN'}, {'word': 'Boston', 'pos': 'NNP'}]
]
vocab = create_vocabulary(X_train)
print(f"Vocabulary: {sorted(vocab)}")
```
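
Raw counts are often easier to interpret as proportions, for example when checking how dominant the `O` label is. The following sketch extends the label-distribution idea; `label_shares` is a hypothetical helper, and it uses `itertools.chain` as a stand-in for `flatten` so that it runs on its own.

```python
from collections import Counter
from itertools import chain


def label_shares(y_sequences):
    """Hypothetical helper: relative frequency of each label over all tokens."""
    counts = Counter(chain.from_iterable(y_sequences))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


y_train = [['B-PER', 'I-PER', 'O'], ['O', 'B-LOC', 'I-LOC']]
shares = label_shares(y_train)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.2f}")
```

With the sample data, `O` accounts for two of the six tokens and every other label for one each.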
