tessl/pypi-catboost

CatBoost is a fast, scalable, high-performance gradient boosting on decision trees library used for ranking, classification, regression, and other ML tasks.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/catboost@1.2.x

To install, run

npx @tessl/cli install tessl/pypi-catboost@1.2.0

# CatBoost

CatBoost is a fast, scalable, high-performance gradient boosting on decision trees library used for ranking, classification, regression, and other ML tasks. CatBoost provides superior quality compared to other GBDT libraries, best-in-class prediction speed, native GPU and multi-GPU support, built-in visualization tools, and distributed training capabilities.

## Package Information

- **Package Name**: catboost
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install catboost`

## Core Imports

```python
import catboost
```

Common imports for working with models:

```python
from catboost import CatBoostClassifier, CatBoostRegressor, CatBoostRanker
from catboost import Pool, cv, train
```

Submodule imports:

```python
# Dataset utilities
from catboost import datasets
# or specific functions
from catboost.datasets import titanic, adult, amazon

# Utility functions
from catboost import utils
# or specific functions
from catboost.utils import eval_metric, get_roc_curve, create_cd

# Evaluation framework
from catboost import eval
# or specific classes
from catboost.eval import CatboostEvaluation, EvaluationResults

# Metrics framework
from catboost import metrics
# or specific metrics
from catboost.metrics import Logloss, AUC, RMSE

# Text processing
from catboost.text_processing import Tokenizer, Dictionary

# Model interpretation
from catboost.monoforest import to_polynom, explain_features
```

## Basic Usage

```python
from catboost import CatBoostClassifier, Pool
import pandas as pd
import numpy as np

# Prepare data
train_data = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})
train_labels = np.random.randint(0, 2, 1000)

# Create CatBoost pool with categorical features
train_pool = Pool(
    data=train_data,
    label=train_labels,
    cat_features=['category']
)

# Initialize and train classifier
model = CatBoostClassifier(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    verbose=True
)

model.fit(train_pool)

# Make predictions
predictions = model.predict(train_data)
probabilities = model.predict_proba(train_data)

# Get feature importance
feature_importance = model.get_feature_importance()
```

## Architecture

CatBoost is built around several key components:

- **Model Classes**: CatBoost, CatBoostClassifier, CatBoostRegressor, and CatBoostRanker provide different interfaces for gradient boosting tasks
- **Data Handling**: The Pool class efficiently manages training data with categorical features, text features, and metadata
- **Training Pipeline**: Support for cross-validation, hyperparameter tuning, and early stopping (see the sketch below)
- **Feature Analysis**: Comprehensive feature importance, SHAP values, and automatic feature selection
- **GPU Acceleration**: Native GPU support for training and prediction across multiple devices
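
For instance, early stopping against a validation set works like this; the data and parameter values below are synthetic and purely illustrative:

```python
from catboost import CatBoostClassifier, Pool
import numpy as np

# Synthetic binary classification data, split into train and validation parts
X = np.random.randn(1000, 5)
y = np.random.randint(0, 2, 1000)
train_pool = Pool(X[:800], y[:800])
eval_pool = Pool(X[800:], y[800:])

model = CatBoostClassifier(iterations=500, learning_rate=0.1, verbose=False)
model.fit(
    train_pool,
    eval_set=eval_pool,            # validation set monitored during training
    early_stopping_rounds=50,      # stop if the eval metric stops improving
    use_best_model=True            # keep the iteration with the best eval score
)
print(model.get_best_iteration(), model.get_best_score())
```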

## Capabilities


### Core Model Classes

Scikit-learn compatible classifier, regressor, and ranker implementations, with the base CatBoost class providing the core gradient boosting functionality.

```python { .api }
class CatBoostClassifier:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='Logloss', **kwargs): ...
    def fit(self, X, y, cat_features=None, sample_weight=None, baseline=None,
            use_best_model=None, eval_set=None, **kwargs): ...
    def predict(self, data, prediction_type='Class', **kwargs): ...
    def predict_proba(self, X, **kwargs): ...

class CatBoostRegressor:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='RMSE', **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def predict(self, data, **kwargs): ...

class CatBoostRanker:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='YetiRank', **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def predict(self, data, **kwargs): ...
```

[Core Model Classes](./core-models.md)
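
As a quick illustration of the regressor interface (synthetic data, illustrative hyperparameters):

```python
from catboost import CatBoostRegressor
import numpy as np

# Target is a noisy linear function of the first feature
X = np.random.randn(500, 4)
y = 2.0 * X[:, 0] + 0.1 * np.random.randn(500)

reg = CatBoostRegressor(iterations=200, depth=4, loss_function='RMSE', verbose=False)
reg.fit(X, y)
preds = reg.predict(X[:5])
```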

### Data Handling

Pool class and FeaturesData for efficient data management with categorical features, text features, embeddings, and metadata like groups and weights.

```python { .api }
class Pool:
    def __init__(self, data, label=None, cat_features=None, text_features=None,
                 embedding_features=None, column_description=None, pairs=None,
                 delimiter='\t', has_header=False, weight=None, group_id=None,
                 **kwargs): ...
    def slice(self, rindex): ...
    def save(self, fname): ...
    def quantize(self, **kwargs): ...

class FeaturesData:
    # Container for feature data with metadata
    ...
```

[Data Handling](./data-handling.md)
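
A small sketch of building a Pool from a pandas DataFrame with categorical and text columns; the column names and values are made up for illustration:

```python
from catboost import Pool
import pandas as pd

df = pd.DataFrame({
    'price': [10.0, 22.5, 13.0],
    'color': ['red', 'blue', 'red'],                   # categorical feature
    'review': ['great item', 'too small', 'just ok']   # text feature
})
labels = [1, 0, 1]

pool = Pool(
    data=df,
    label=labels,
    cat_features=['color'],
    text_features=['review']
)
print(pool.num_row(), pool.num_col())
```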

### Training and Evaluation

Cross-validation, training functions, and model evaluation utilities for comprehensive model development and assessment.

```python { .api }
def train(pool, params=None, dtrain=None, logging_level=None, verbose=None,
          iterations=None, **kwargs): ...

def cv(pool, params=None, dtrain=None, iterations=None, num_boost_round=None,
       fold_count=3, inverted=False, shuffle=True, partition_random_seed=0,
       stratified=None, **kwargs): ...

def sample_gaussian_process(X, y, **kwargs): ...
```

[Training and Evaluation](./training-evaluation.md)
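
For example, cross-validation on a synthetic dataset might look like the following sketch (parameters are illustrative):

```python
from catboost import Pool, cv
import numpy as np

X = np.random.randn(300, 4)
y = np.random.randint(0, 2, 300)

cv_results = cv(
    pool=Pool(X, y),
    params={'loss_function': 'Logloss', 'iterations': 100, 'verbose': False},
    fold_count=5,
    shuffle=True
)

# cv returns per-iteration results as a DataFrame, with columns such as
# 'test-Logloss-mean' and 'test-Logloss-std'
print(cv_results.tail(1))
```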

### Feature Analysis

Feature importance calculation, SHAP values, feature selection algorithms, and interpretability tools for understanding model behavior.

```python { .api }
# Enums for feature analysis
class EFstrType:
    PredictionValuesChange = 0
    LossFunctionChange = 1
    FeatureImportance = 2
    Interaction = 3
    ShapValues = 4
    PredictionDiff = 5
    ShapInteractionValues = 6
    SageValues = 7

class EShapCalcType:
    Regular = "Regular"
    Approximate = "Approximate"
    Exact = "Exact"

class EFeaturesSelectionAlgorithm:
    RecursiveByPredictionValuesChange = "RecursiveByPredictionValuesChange"
    RecursiveByLossFunctionChange = "RecursiveByLossFunctionChange"
    RecursiveByShapValues = "RecursiveByShapValues"

class EFeaturesSelectionGrouping:
    Individual = "Individual"
    ByTags = "ByTags"
```

[Feature Analysis](./feature-analysis.md)
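
A minimal sketch of computing importances and SHAP values with `get_feature_importance` (synthetic data):

```python
from catboost import CatBoostClassifier, Pool, EFstrType
import numpy as np

X = np.random.randn(200, 3)
y = np.random.randint(0, 2, 200)
pool = Pool(X, y)

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(pool)

# Default importance type (PredictionValuesChange)
importances = model.get_feature_importance(pool)

# SHAP values: one row per object, one column per feature plus the expected value
shap_values = model.get_feature_importance(pool, type=EFstrType.ShapValues)
```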

### Utility Functions

Model conversion, GPU utilities, metric evaluation, confusion matrices, ROC curves, and threshold selection tools.

```python { .api }
def sum_models(models, weights=None, ctr_merge_policy='IntersectingCountersAverage'): ...
def to_regressor(model): ...
def to_classifier(model): ...
def to_ranker(model): ...

# From catboost.utils
def eval_metric(label, approx, metric, weight=None, group_id=None, **kwargs): ...
def get_gpu_device_count(): ...
def get_confusion_matrix(model, data, thread_count=-1): ...
def get_roc_curve(model, data, thread_count=-1, plot=False): ...
def select_threshold(model, data, curve=None, FPR=None, FNR=None, thread_count=-1): ...
```

[Utilities](./utilities.md)
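
For example, `eval_metric` can score raw predictions against labels without a model object (the values below are made up):

```python
from catboost.utils import eval_metric

labels = [0, 1, 1, 0, 1]
raw_scores = [-2.1, 1.3, 0.4, -0.7, 2.2]   # raw model outputs (logits)

# eval_metric returns a list of metric values
auc = eval_metric(labels, raw_scores, 'AUC')[0]
logloss = eval_metric(labels, raw_scores, 'Logloss')[0]
```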

### Dataset Utilities

Built-in datasets for testing and learning, including Titanic, Amazon, IMDB, Adult, Higgs, and ranking datasets.

```python { .api }
# From catboost.datasets
def titanic(): ...
def amazon(): ...
def adult(): ...
def imdb(): ...
def higgs(): ...
def msrank(): ...
def msrank_10k(): ...
def epsilon(): ...
def rotten_tomatoes(): ...
def monotonic1(): ...
def monotonic2(): ...
def set_cache_path(path): ...
```

[Dataset Utilities](./datasets.md)
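
Each loader downloads (and caches) the data and typically returns a train/test pair of pandas DataFrames, for example:

```python
from catboost import datasets

titanic_train, titanic_test = datasets.titanic()
print(titanic_train.shape, titanic_test.shape)
```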

### Visualization

Interactive widgets for Jupyter notebooks, metrics plotting, and compatibility with XGBoost and LightGBM plotting callbacks.

```python { .api }
# From catboost.widget (conditionally imported)
class MetricVisualizer:
    # Interactive metric visualization widget for Jupyter
    ...

class MetricsPlotter:
    # Plotting utility for training metrics
    ...

def XGBPlottingCallback(): ...
def lgbm_plotting_callback(): ...
```

[Visualization](./visualization.md)
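
In a Jupyter notebook, the simplest way to use the metric widget is `plot=True` during training; the data below is synthetic:

```python
from catboost import CatBoostClassifier, Pool
import numpy as np

X = np.random.randn(500, 4)
y = np.random.randint(0, 2, 500)
train_pool, eval_pool = Pool(X[:400], y[:400]), Pool(X[400:], y[400:])

model = CatBoostClassifier(iterations=100, verbose=False)
# plot=True renders the interactive metric widget while training progresses
model.fit(train_pool, eval_set=eval_pool, plot=True)
```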

### Advanced Features

Text processing, monoforest model interpretation, custom metrics and objectives for specialized use cases.

```python { .api }
# Custom metrics and objectives
class MultiRegressionCustomMetric: ...
class MultiRegressionCustomObjective: ...
class MultiTargetCustomMetric: ...  # Alias
class MultiTargetCustomObjective: ...  # Alias

# From catboost.text_processing
class Tokenizer: ...
class Dictionary: ...

# From catboost.monoforest
def to_polynom(model): ...
def to_polynom_string(model): ...
def explain_features(model): ...
class FeatureExplanation: ...
```

[Advanced Features](./advanced-features.md)
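
A small sketch of the text tokenizer; the option shown is illustrative rather than exhaustive:

```python
from catboost.text_processing import Tokenizer

# Whitespace tokenization with lowercasing
tokenizer = Tokenizer(lowercasing=True)
tokens = tokenizer.tokenize("CatBoost handles TEXT features natively")
# e.g. ['catboost', 'handles', 'text', 'features', 'natively']
```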

### Model Evaluation Framework

Comprehensive evaluation framework for statistical testing, performance comparisons, and model validation with confidence intervals.

```python { .api }
# From catboost.eval
class EvalType: ...
class CatboostEvaluation: ...
class ScoreType: ...
class ScoreConfig: ...
class CaseEvaluationResult: ...
class MetricEvaluationResult: ...
class EvaluationResults: ...
class ExecutionCase: ...

def calc_wilcoxon_test(): ...
def calc_bootstrap_ci_for_mean(): ...
def make_dirs_if_not_exists(): ...
def series_to_line(): ...
def save_plot(): ...
```

[Model Evaluation Framework](./evaluation.md)
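
A rough sketch of how the evaluation framework is typically driven; the file paths, fold sizes, and feature indices below are placeholders, and exact arguments may differ (see the linked page):

```python
from catboost.eval import CatboostEvaluation

# Placeholder dataset files and parameters
evaluation = CatboostEvaluation(
    'train.tsv',                    # dataset in TSV format
    fold_size=10000,                # objects per fold
    fold_count=20,                  # number of folds
    column_description='train.cd',  # column description file
    partition_random_seed=0
)

# Compare a baseline model against models trained with candidate features added
results = evaluation.eval_features(
    learn_config={'iterations': 100, 'learning_rate': 0.1, 'loss_function': 'Logloss'},
    eval_metrics=['Logloss', 'AUC'],
    features_to_eval=[6, 7, 8]      # indices of candidate features
)
# results is an EvaluationResults object with per-metric comparison results
```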

### Metrics Framework

Dynamic metric classes for evaluating model performance across classification, regression, and ranking tasks.

```python { .api }
# From catboost.metrics
class BuiltinMetric:
    def eval(self, label, approx, weight=None, group_id=None, **kwargs): ...
    def is_max_optimal(self): ...
    def is_min_optimal(self): ...
    def set_hints(self, **hints): ...
    @staticmethod
    def params_with_defaults(): ...

# Dynamically generated metric classes (examples)
class Logloss(BuiltinMetric): ...
class CrossEntropy(BuiltinMetric): ...
class Accuracy(BuiltinMetric): ...
class AUC(BuiltinMetric): ...
class RMSE(BuiltinMetric): ...
class MAE(BuiltinMetric): ...
class NDCG(BuiltinMetric): ...
class MAP(BuiltinMetric): ...
```

[Metrics Framework](./metrics.md)
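
Metric objects can be evaluated directly on labels and predictions, or passed to a model as `eval_metric`; a brief sketch with made-up values:

```python
from catboost import CatBoostClassifier
from catboost.metrics import AUC

# Standalone evaluation on labels and raw predictions
auc_value = AUC().eval([0, 1, 1, 0], [-1.2, 0.8, 0.3, -0.5])

# Metric objects can also be passed as eval_metric when constructing a model
model = CatBoostClassifier(iterations=100, eval_metric=AUC(), verbose=False)
```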

## Constants and Exceptions

```python { .api }
class CatBoostError(Exception):
    """Main exception class for CatBoost errors."""
    ...

# Compatibility alias
CatboostError = CatBoostError

__version__: str  # Currently '1.2.8'
```
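
CatBoostError is the exception to catch around training and prediction calls; a minimal sketch:

```python
from catboost import CatBoostClassifier, CatBoostError

model = CatBoostClassifier(iterations=10, verbose=False)
try:
    # Predicting before fitting raises CatBoostError
    model.predict([[1.0, 2.0]])
except CatBoostError as err:
    print(f"CatBoost error: {err}")
```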