
# Core Transformers

PyTerrier's core transformer architecture provides the foundation for building composable information retrieval pipelines. All PyTerrier components inherit from base transformer classes that support operator overloading for intuitive pipeline construction.

## Capabilities

### Base Transformer Class

The fundamental base class that all PyTerrier components inherit from, providing pipeline composition through operator overloading.

```python { .api }
class Transformer:
    """
    Base class for all PyTerrier transformers that process dataframes or iterators.

    Core Methods:
    - transform(topics_or_res): Transform DataFrame input to DataFrame output
    - transform_iter(input_iter): Transform iterator input to iterator output
    - search(query): Convenience method for single query search
    """
    def transform(self, topics_or_res: pd.DataFrame) -> pd.DataFrame: ...
    def transform_iter(self, input_iter: Iterator[Dict[str, Any]]) -> Iterator[Dict[str, Any]]: ...
    def search(self, query: str, qid: str = "1") -> pd.DataFrame: ...
    def compile(self) -> 'Transformer': ...
    def parallel(self, jobs: int = 2, backend: str = 'joblib') -> 'Transformer': ...
    def get_parameter(self, name: str) -> Any: ...
    def set_parameter(self, name: str, value: Any) -> 'Transformer': ...

    # Static methods
    @staticmethod
    def identity() -> 'Transformer': ...
    @staticmethod
    def from_df(df: pd.DataFrame, copy: bool = True) -> 'Transformer': ...

    # Pipeline operators
    def __rshift__(self, other: 'Transformer') -> 'Transformer': ...  # >>
    def __add__(self, other: 'Transformer') -> 'Transformer': ...     # +
    def __pow__(self, other: 'Transformer') -> 'Transformer': ...     # **
    def __or__(self, other: 'Transformer') -> 'Transformer': ...      # |
    def __and__(self, other: 'Transformer') -> 'Transformer': ...     # &
    def __mod__(self, cutoff: int) -> 'Transformer': ...              # %
    def __xor__(self, other: 'Transformer') -> 'Transformer': ...     # ^
    def __mul__(self, factor: float) -> 'Transformer': ...            # *
```

**Usage Examples:**

```python
# Basic pipeline composition
pipeline = retriever >> reranker >> cutoff_transformer

# Score combination
combined = system1 + system2  # Add scores

# Feature union
features = feature_extractor1 ** feature_extractor2

# Set operations
union_results = system1 | system2  # Union of retrieved documents
intersection = system1 & system2   # Intersection of retrieved documents

# Rank cutoff
top10 = retriever % 10  # Keep only top 10 results

# Result concatenation
concatenated = system1 ^ system2
```
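
To make the operator semantics concrete, here is a deliberately simplified, hypothetical sketch (plain Python lists of dicts; the class names `ToyTransformer`, `ToyScorer`, etc. are invented for illustration and are not PyTerrier's real classes) of how `>>` and `%` can build composite transformers by returning wrapper objects:

```python
# Hypothetical sketch: operator overloading returning wrapper transformers.
from typing import Any, Dict, List

Row = Dict[str, Any]


class ToyTransformer:
    def transform(self, rows: List[Row]) -> List[Row]:
        return rows

    def __rshift__(self, other: "ToyTransformer") -> "ToyTransformer":
        return ToyCompose(self, other)      # a >> b

    def __mod__(self, cutoff: int) -> "ToyTransformer":
        return ToyRankCutoff(self, cutoff)  # a % k


class ToyCompose(ToyTransformer):
    """Feeds the left transformer's output into the right one."""
    def __init__(self, left: ToyTransformer, right: ToyTransformer) -> None:
        self.left, self.right = left, right

    def transform(self, rows: List[Row]) -> List[Row]:
        return self.right.transform(self.left.transform(rows))


class ToyRankCutoff(ToyTransformer):
    """Keeps only the k highest-scored rows."""
    def __init__(self, inner: ToyTransformer, k: int) -> None:
        self.inner, self.k = inner, k

    def transform(self, rows: List[Row]) -> List[Row]:
        ranked = sorted(self.inner.transform(rows), key=lambda r: -r["score"])
        return ranked[: self.k]


class ToyScorer(ToyTransformer):
    """Scores each document by its text length."""
    def transform(self, rows: List[Row]) -> List[Row]:
        return [dict(r, score=len(r["text"])) for r in rows]


class ToyDoubler(ToyTransformer):
    """Doubles every score."""
    def transform(self, rows: List[Row]) -> List[Row]:
        return [dict(r, score=r["score"] * 2) for r in rows]


pipeline = (ToyScorer() >> ToyDoubler()) % 2  # score, double, keep top 2
results = pipeline.transform([
    {"docno": "d1", "text": "a"},
    {"docno": "d2", "text": "aaa"},
    {"docno": "d3", "text": "aa"},
])
```

The key design point this illustrates: operators construct new transformer objects rather than executing anything, so a pipeline is an inert description until `transform` is called.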

### Estimator Class

Base class for trainable transformers that can learn from training data.

```python { .api }
class Estimator(Transformer):
    """
    Base class for trainable transformers that learn from training data.

    Parameters:
    - topics_or_res_tr: Training topics (usually with documents)
    - qrels_tr: Training qrels (relevance judgments)
    - topics_or_res_va: Validation topics (usually with documents)
    - qrels_va: Validation qrels (relevance judgments)

    Returns:
    - Trained estimator instance
    """
    def fit(self, topics_or_res_tr: pd.DataFrame, qrels_tr: pd.DataFrame,
            topics_or_res_va: pd.DataFrame, qrels_va: pd.DataFrame) -> 'Estimator': ...
```

**Usage Example:**

```python
# Train a learning-to-rank model
ltr_model = SomeLearnToRankTransformer()
trained_model = ltr_model.fit(training_topics_res, training_qrels,
                              validation_topics_res, validation_qrels)

# Use trained model in pipeline
pipeline = retriever >> trained_model
```
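
The fit/transform split can be sketched with a hypothetical minimal estimator (plain Python, invented names, not the real PyTerrier base class): `fit` learns state from training results and judgments, and `transform` applies that state at inference time.

```python
# Hypothetical sketch: fit() learns one weight, transform() applies it.
from typing import Any, Dict, List


class ToyWeightedScorer:
    def __init__(self) -> None:
        self.weight = 1.0

    def fit(self, res_tr: List[Dict[str, Any]],
            qrels_tr: Dict[str, int]) -> "ToyWeightedScorer":
        # "Learn" a weight: the mean score of relevant training documents.
        rel_scores = [r["score"] for r in res_tr
                      if qrels_tr.get(r["docno"], 0) > 0]
        self.weight = sum(rel_scores) / len(rel_scores) if rel_scores else 1.0
        return self

    def transform(self, res: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [dict(r, score=r["score"] * self.weight) for r in res]


model = ToyWeightedScorer().fit(
    [{"docno": "d1", "score": 2.0}, {"docno": "d2", "score": 4.0}],
    {"d2": 1},  # only d2 is judged relevant
)
scored = model.transform([{"docno": "d3", "score": 1.5}])
# model.weight == 4.0, so d3's score becomes 6.0
```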

### Indexer Class

Base class for components that create searchable indexes from document collections.

```python { .api }
class Indexer(Transformer):
    """
    Base class for indexers that create searchable indexes from document collections.

    Parameters:
    - iter_dict: Iterator over documents with 'docno' and 'text' fields

    Returns:
    - IndexRef object representing the created index
    """
    def index(self, iter_dict: Iterator[Dict[str, Any]]) -> Any: ...
```

**Usage Example:**

```python
# Create an indexer for iterables of document dicts
indexer = pt.IterDictIndexer('/path/to/index')

# Index documents
documents = [
    {'docno': 'doc1', 'text': 'This is document 1'},
    {'docno': 'doc2', 'text': 'This is document 2'}
]
index_ref = indexer.index(documents)
```
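
The `Indexer` contract (consume an iterator of `{'docno', 'text'}` dicts, return an index reference) can be illustrated with a hypothetical in-memory indexer. `ToyDictIndexer` and its set-based inverted index are invented for illustration; a real PyTerrier index is an on-disk structure, not a dict.

```python
# Hypothetical sketch: build a tiny inverted index from an iter of dicts.
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


class ToyDictIndexer:
    def index(self, iter_dict: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
        postings: Dict[str, Set[str]] = defaultdict(set)
        for doc in iter_dict:
            # Map each lowercased whitespace token to the docs containing it.
            for term in doc["text"].lower().split():
                postings[term].add(doc["docno"])
        return dict(postings)  # the "index reference"


index_ref = ToyDictIndexer().index([
    {"docno": "doc1", "text": "This is document 1"},
    {"docno": "doc2", "text": "This is document 2"},
])
# index_ref["document"] == {"doc1", "doc2"}; index_ref["2"] == {"doc2"}
```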

### Pipeline Operators

Specialized transformer classes that implement pipeline operators for combining multiple transformers.

```python { .api }
class Compose(Transformer):
    """Pipeline composition operator (>>). Chains transformers sequentially."""
    def __init__(self, *transformers: Transformer): ...
    def index(self, iter: Iterator[Dict[str, Any]], batch_size: int = None) -> Any: ...
    def transform_iter(self, inp: Iterator[Dict[str, Any]]) -> Iterator[Dict[str, Any]]: ...
    def fit(self, topics_or_res_tr: pd.DataFrame, qrels_tr: pd.DataFrame,
            topics_or_res_va: pd.DataFrame = None, qrels_va: pd.DataFrame = None) -> None: ...

class RankCutoff(Transformer):
    """Rank cutoff operator (%). Limits results to top-k documents."""
    def __init__(self, k: int = 1000): ...

class FeatureUnion(Transformer):
    """Feature union operator (**). Combines features from multiple transformers."""
    def __init__(self, *transformers: Transformer): ...

class Sum(Transformer):
    """Score addition operator (+). Adds scores from multiple transformers."""
    def __init__(self, left: Transformer, right: Transformer): ...

class SetUnion(Transformer):
    """Set union operator (|). Union of documents from multiple transformers."""
    def __init__(self, left: Transformer, right: Transformer): ...

class SetIntersection(Transformer):
    """Set intersection operator (&). Intersection of documents from multiple transformers."""
    def __init__(self, left: Transformer, right: Transformer): ...

class Concatenate(Transformer):
    """Concatenation operator (^). Concatenates results from multiple transformers."""
    def __init__(self, left: Transformer, right: Transformer): ...

class ScalarProduct(Transformer):
    """Scalar multiplication operator (*). Multiplies scores by a constant factor."""
    def __init__(self, scalar: float): ...
```

### Apply Interface

Dynamic transformer creation interface for building custom transformers from functions.

```python { .api }
# Apply interface methods accessed via pt.apply.*
def query(fn: Callable[[Union[pd.Series, Dict[str, Any]]], str], *args, **kwargs) -> Transformer: ...
def doc_score(fn: Union[Callable[[Union[pd.Series, Dict[str, Any]]], float],
                        Callable[[pd.DataFrame], Sequence[float]]],
              *args, batch_size: Optional[int] = None, **kwargs) -> Transformer: ...
def doc_features(fn: Callable[[Union[pd.Series, Dict[str, Any]]], npt.NDArray[Any]],
                 *args, **kwargs) -> Transformer: ...
def indexer(fn: Callable[[Iterator[Dict[str, Any]]], Any], **kwargs) -> Indexer: ...
def rename(columns: Dict[str, str], *args, errors: Literal['raise', 'ignore'] = 'raise', **kwargs) -> Transformer: ...
def generic(fn: Union[Callable[[pd.DataFrame], pd.DataFrame],
                      Callable[[Iterator[Dict[str, Any]]], Iterator[Dict[str, Any]]]],
            *args, batch_size: Optional[int] = None, iter: bool = False, **kwargs) -> Transformer: ...
def by_query(fn: Union[Callable[[pd.DataFrame], pd.DataFrame],
                       Callable[[Iterator[Dict[str, Any]]], Iterator[Dict[str, Any]]]],
             *args, batch_size: Optional[int] = None, iter: bool = False,
             verbose: bool = False, **kwargs) -> Transformer: ...
```

**Usage Examples:**

```python
# Create custom query transformer
query_expander = pt.apply.query(lambda q: q["query"] + " information retrieval")

# Create custom scoring transformer (row-wise)
score_booster = pt.apply.doc_score(lambda row: row["score"] * 2)

# Create custom feature transformer
feature_extractor = pt.apply.doc_features(lambda row: np.array([len(row["text"])]))

# Column renaming transformer
renamer = pt.apply.rename({'old_column': 'new_column'})

# Batch-wise scoring transformer
def batch_scorer(df):
    return df["score"] * 2

batch_score_booster = pt.apply.doc_score(batch_scorer, batch_size=128)
```

## Design Patterns

### Operator Overloading

PyTerrier's operator overloading enables intuitive pipeline construction:

- `>>`: Sequential composition (pipe operator)
- `+`: Score addition for late fusion
- `**`: Feature union for combining features
- `|`: Set union for combining document sets
- `&`: Set intersection for filtering results
- `%`: Rank cutoff for limiting results
- `^`: Result concatenation
- `*`: Score multiplication by a constant factor

### Dual API Support

Most transformers support both DataFrame and iterator interfaces:

- `transform(df)`: Process a pandas DataFrame (preferred for most use cases)
- `transform_iter(iter)`: Process an iterator of dictionaries (memory-efficient for large datasets)
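
A hypothetical toy transformer (invented `ToyUppercaser`, not part of PyTerrier) shows how the same logic can be exposed through both interfaces:

```python
# Hypothetical sketch: one transformer, DataFrame and iterator entry points.
from typing import Any, Dict, Iterator

import pandas as pd


class ToyUppercaser:
    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # DataFrame path: vectorized, whole input held in memory.
        out = df.copy()
        out["query"] = out["query"].str.upper()
        return out

    def transform_iter(self, inp: Iterator[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        # Iterator path: one record at a time, constant memory.
        for row in inp:
            yield dict(row, query=row["query"].upper())


t = ToyUppercaser()
df_out = t.transform(pd.DataFrame([{"qid": "1", "query": "rank fusion"}]))
iter_out = list(t.transform_iter(iter([{"qid": "1", "query": "rank fusion"}])))
# Both paths yield query == "RANK FUSION"
```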

### Parameter Management

Transformers support dynamic parameter access:

- `get_parameter(name)`: Retrieve a parameter value
- `set_parameter(name, value)`: Update a parameter value

This enables parameter tuning and grid search functionality.
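
Why a uniform parameter API enables grid search can be sketched with attribute-based stand-ins (`ToyParamTransformer` and `toy_eval` are invented for illustration; they are not PyTerrier's implementation or its evaluation tools):

```python
# Hypothetical sketch: sweep a parameter via the get/set interface.
from typing import Any


class ToyParamTransformer:
    def __init__(self, k: int = 10) -> None:
        self.k = k

    def get_parameter(self, name: str) -> Any:
        return getattr(self, name)

    def set_parameter(self, name: str, value: Any) -> "ToyParamTransformer":
        setattr(self, name, value)
        return self


def toy_eval(t: ToyParamTransformer) -> float:
    # Stand-in evaluation metric that happens to peak at k == 20.
    return -abs(t.get_parameter("k") - 20)


t = ToyParamTransformer()
# Grid search: try each candidate value, keep the best-scoring setting.
best_k = max([5, 10, 20, 50], key=lambda k: toy_eval(t.set_parameter("k", k)))
# best_k == 20
```

Because the tuner only needs `get_parameter`/`set_parameter`, the same sweep loop works for any transformer without knowing its internals.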

## Types

The apply-interface signatures above also reference `Sequence`, `Literal`, and `npt`, so those imports are included here alongside the aliases.

```python { .api }
from typing import Dict, List, Any, Iterator, Callable, Union, Optional, Sequence, Literal

import numpy.typing as npt
import pandas as pd

# Common type aliases
IterDictRecord = Dict[str, Any]
IterDict = Iterator[IterDictRecord]
TransformerLike = Union[Transformer, Callable[[pd.DataFrame], pd.DataFrame]]
```