tessl/pypi-catboost

CatBoost is a fast, scalable, high-performance gradient boosting on decision trees library used for ranking, classification, regression, and other ML tasks.

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/catboost@1.2.x

To install, run

npx @tessl/cli install tessl/pypi-catboost@1.2.0

# CatBoost

CatBoost is a fast, scalable, high-performance gradient boosting on decision trees library used for ranking, classification, regression, and other ML tasks. CatBoost provides superior quality compared to other GBDT libraries, best-in-class prediction speed, native GPU and multi-GPU support, built-in visualization tools, and distributed training capabilities.

## Package Information

- **Package Name**: catboost
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install catboost`

## Core Imports

```python
import catboost
```

Common imports for working with models:

```python
from catboost import CatBoostClassifier, CatBoostRegressor, CatBoostRanker
from catboost import Pool, cv, train
```

Submodule imports:

```python
# Dataset utilities
from catboost import datasets
# or specific functions
from catboost.datasets import titanic, adult, amazon

# Utility functions
from catboost import utils
# or specific functions
from catboost.utils import eval_metric, get_roc_curve, create_cd

# Evaluation framework
from catboost import eval
# or specific classes
from catboost.eval import CatboostEvaluation, EvaluationResults

# Metrics framework
from catboost import metrics
# or specific metrics
from catboost.metrics import Logloss, AUC, RMSE

# Text processing
from catboost.text_processing import Tokenizer, Dictionary

# Model interpretation
from catboost.monoforest import to_polynom, explain_features
```

## Basic Usage

```python
from catboost import CatBoostClassifier, Pool
import pandas as pd
import numpy as np

# Prepare data
train_data = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000),
    'category': np.random.choice(['A', 'B', 'C'], 1000)
})
train_labels = np.random.randint(0, 2, 1000)

# Create CatBoost pool with categorical features
train_pool = Pool(
    data=train_data,
    label=train_labels,
    cat_features=['category']
)

# Initialize and train classifier
model = CatBoostClassifier(
    iterations=100,
    learning_rate=0.1,
    depth=6,
    verbose=True
)

model.fit(train_pool)

# Make predictions
predictions = model.predict(train_data)
probabilities = model.predict_proba(train_data)

# Get feature importance
feature_importance = model.get_feature_importance()
```

## Architecture

CatBoost is built around several key components:

- **Model Classes**: CatBoost, CatBoostClassifier, CatBoostRegressor, and CatBoostRanker provide different interfaces for gradient boosting tasks
- **Data Handling**: The Pool class efficiently manages training data with categorical features, text features, and metadata
- **Training Pipeline**: Support for cross-validation, hyperparameter tuning, and early stopping (see the sketch below)
- **Feature Analysis**: Comprehensive feature importance, SHAP values, and automatic feature selection
- **GPU Acceleration**: Native GPU support for training and prediction across multiple devices
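
For instance, early stopping against a validation set works like this; the data and parameter values below are synthetic and purely illustrative:

```python
from catboost import CatBoostClassifier, Pool
import numpy as np

# Synthetic binary classification data, split into train and validation parts
X = np.random.randn(1000, 5)
y = np.random.randint(0, 2, 1000)
train_pool = Pool(X[:800], y[:800])
eval_pool = Pool(X[800:], y[800:])

model = CatBoostClassifier(iterations=500, learning_rate=0.1, verbose=False)
model.fit(
    train_pool,
    eval_set=eval_pool,            # validation set monitored during training
    early_stopping_rounds=50,      # stop if the eval metric stops improving
    use_best_model=True            # keep the iteration with the best eval score
)
print(model.get_best_iteration(), model.get_best_score())
```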

## Capabilities


### Core Model Classes

Scikit-learn compatible classifier, regressor, and ranker implementations, with the base CatBoost class providing the core gradient boosting functionality.

```python { .api }
class CatBoostClassifier:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='Logloss', **kwargs): ...
    def fit(self, X, y, cat_features=None, sample_weight=None, baseline=None,
            use_best_model=None, eval_set=None, **kwargs): ...
    def predict(self, data, prediction_type='Class', **kwargs): ...
    def predict_proba(self, X, **kwargs): ...

class CatBoostRegressor:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='RMSE', **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def predict(self, data, **kwargs): ...

class CatBoostRanker:
    def __init__(self, iterations=500, learning_rate=None, depth=6, l2_leaf_reg=3.0,
                 loss_function='YetiRank', **kwargs): ...
    def fit(self, X, y, **kwargs): ...
    def predict(self, data, **kwargs): ...
```

[Core Model Classes](./core-models.md)
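
As a quick illustration of the regressor interface (synthetic data, illustrative hyperparameters):

```python
from catboost import CatBoostRegressor
import numpy as np

# Target is a noisy linear function of the first feature
X = np.random.randn(500, 4)
y = 2.0 * X[:, 0] + 0.1 * np.random.randn(500)

reg = CatBoostRegressor(iterations=200, depth=4, loss_function='RMSE', verbose=False)
reg.fit(X, y)
preds = reg.predict(X[:5])
```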

### Data Handling

Pool class and FeaturesData for efficient data management with categorical features, text features, embeddings, and metadata like groups and weights.

```python { .api }
class Pool:
    def __init__(self, data, label=None, cat_features=None, text_features=None,
                 embedding_features=None, column_description=None, pairs=None,
                 delimiter='\t', has_header=False, weight=None, group_id=None,
                 **kwargs): ...
    def slice(self, rindex): ...
    def save(self, fname): ...
    def quantize(self, **kwargs): ...

class FeaturesData:
    # Container for feature data with metadata
    ...
```

[Data Handling](./data-handling.md)
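
A small sketch of building a Pool from a pandas DataFrame with categorical and text columns; the column names and values are made up for illustration:

```python
from catboost import Pool
import pandas as pd

df = pd.DataFrame({
    'price': [10.0, 22.5, 13.0],
    'color': ['red', 'blue', 'red'],                   # categorical feature
    'review': ['great item', 'too small', 'just ok']   # text feature
})
labels = [1, 0, 1]

pool = Pool(
    data=df,
    label=labels,
    cat_features=['color'],
    text_features=['review']
)
print(pool.num_row(), pool.num_col())
```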

### Training and Evaluation

Cross-validation, training functions, and model evaluation utilities for comprehensive model development and assessment.

```python { .api }
def train(pool, params=None, dtrain=None, logging_level=None, verbose=None,
          iterations=None, **kwargs): ...

def cv(pool, params=None, dtrain=None, iterations=None, num_boost_round=None,
       fold_count=3, inverted=False, shuffle=True, partition_random_seed=0,
       stratified=None, **kwargs): ...

def sample_gaussian_process(X, y, **kwargs): ...
```

[Training and Evaluation](./training-evaluation.md)
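
For example, cross-validation on a synthetic dataset might look like the following sketch (parameters are illustrative):

```python
from catboost import Pool, cv
import numpy as np

X = np.random.randn(300, 4)
y = np.random.randint(0, 2, 300)

cv_results = cv(
    pool=Pool(X, y),
    params={'loss_function': 'Logloss', 'iterations': 100, 'verbose': False},
    fold_count=5,
    shuffle=True
)

# cv returns per-iteration results as a DataFrame, with columns such as
# 'test-Logloss-mean' and 'test-Logloss-std'
print(cv_results.tail(1))
```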

### Feature Analysis

Feature importance calculation, SHAP values, feature selection algorithms, and interpretability tools for understanding model behavior.

```python { .api }
# Enums for feature analysis
class EFstrType:
    PredictionValuesChange = 0
    LossFunctionChange = 1
    FeatureImportance = 2
    Interaction = 3
    ShapValues = 4
    PredictionDiff = 5
    ShapInteractionValues = 6
    SageValues = 7

class EShapCalcType:
    Regular = "Regular"
    Approximate = "Approximate"
    Exact = "Exact"

class EFeaturesSelectionAlgorithm:
    RecursiveByPredictionValuesChange = "RecursiveByPredictionValuesChange"
    RecursiveByLossFunctionChange = "RecursiveByLossFunctionChange"
    RecursiveByShapValues = "RecursiveByShapValues"

class EFeaturesSelectionGrouping:
    Individual = "Individual"
    ByTags = "ByTags"
```

[Feature Analysis](./feature-analysis.md)
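
A minimal sketch of computing importances and SHAP values with `get_feature_importance` (synthetic data):

```python
from catboost import CatBoostClassifier, Pool, EFstrType
import numpy as np

X = np.random.randn(200, 3)
y = np.random.randint(0, 2, 200)
pool = Pool(X, y)

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(pool)

# Default importance type (PredictionValuesChange)
importances = model.get_feature_importance(pool)

# SHAP values: one row per object, one column per feature plus the expected value
shap_values = model.get_feature_importance(pool, type=EFstrType.ShapValues)
```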

### Utility Functions

Model conversion, GPU utilities, metric evaluation, confusion matrices, ROC curves, and threshold selection tools.

```python { .api }
def sum_models(models, weights=None, ctr_merge_policy='IntersectingCountersAverage'): ...
def to_regressor(model): ...
def to_classifier(model): ...
def to_ranker(model): ...

# From catboost.utils
def eval_metric(label, approx, metric, weight=None, group_id=None, **kwargs): ...
def get_gpu_device_count(): ...
def get_confusion_matrix(model, data, thread_count=-1): ...
def get_roc_curve(model, data, thread_count=-1, plot=False): ...
def select_threshold(model, data, curve=None, FPR=None, FNR=None, thread_count=-1): ...
```

[Utilities](./utilities.md)
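
For example, `eval_metric` can score raw predictions against labels without a model object (the values below are made up):

```python
from catboost.utils import eval_metric

labels = [0, 1, 1, 0, 1]
raw_scores = [-2.1, 1.3, 0.4, -0.7, 2.2]   # raw model outputs (logits)

# eval_metric returns a list of metric values
auc = eval_metric(labels, raw_scores, 'AUC')[0]
logloss = eval_metric(labels, raw_scores, 'Logloss')[0]
```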

### Dataset Utilities

Built-in datasets for testing and learning, including Titanic, Amazon, IMDB, Adult, Higgs, and ranking datasets.

```python { .api }
# From catboost.datasets
def titanic(): ...
def amazon(): ...
def adult(): ...
def imdb(): ...
def higgs(): ...
def msrank(): ...
def msrank_10k(): ...
def epsilon(): ...
def rotten_tomatoes(): ...
def monotonic1(): ...
def monotonic2(): ...
def set_cache_path(path): ...
```

[Dataset Utilities](./datasets.md)
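
Each loader downloads (and caches) the data and typically returns a train/test pair of pandas DataFrames, for example:

```python
from catboost import datasets

titanic_train, titanic_test = datasets.titanic()
print(titanic_train.shape, titanic_test.shape)
```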

### Visualization

Interactive widgets for Jupyter notebooks, metrics plotting, and compatibility with XGBoost and LightGBM plotting callbacks.

```python { .api }
# From catboost.widget (conditionally imported)
class MetricVisualizer:
    # Interactive metric visualization widget for Jupyter
    ...

class MetricsPlotter:
    # Plotting utility for training metrics
    ...

def XGBPlottingCallback(): ...
def lgbm_plotting_callback(): ...
```

[Visualization](./visualization.md)
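
In a Jupyter notebook, the simplest way to use the metric widget is `plot=True` during training; the data below is synthetic:

```python
from catboost import CatBoostClassifier, Pool
import numpy as np

X = np.random.randn(500, 4)
y = np.random.randint(0, 2, 500)
train_pool, eval_pool = Pool(X[:400], y[:400]), Pool(X[400:], y[400:])

model = CatBoostClassifier(iterations=100, verbose=False)
# plot=True renders the interactive metric widget while training progresses
model.fit(train_pool, eval_set=eval_pool, plot=True)
```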

### Advanced Features

Text processing, monoforest model interpretation, custom metrics and objectives for specialized use cases.

```python { .api }
# Custom metrics and objectives
class MultiRegressionCustomMetric: ...
class MultiRegressionCustomObjective: ...
class MultiTargetCustomMetric: ...  # Alias
class MultiTargetCustomObjective: ...  # Alias

# From catboost.text_processing
class Tokenizer: ...
class Dictionary: ...

# From catboost.monoforest
def to_polynom(model): ...
def to_polynom_string(model): ...
def explain_features(model): ...
class FeatureExplanation: ...
```

[Advanced Features](./advanced-features.md)
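
A small sketch of the text tokenizer; the option shown is illustrative rather than exhaustive:

```python
from catboost.text_processing import Tokenizer

# Whitespace tokenization with lowercasing
tokenizer = Tokenizer(lowercasing=True)
tokens = tokenizer.tokenize("CatBoost handles TEXT features natively")
# e.g. ['catboost', 'handles', 'text', 'features', 'natively']
```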

### Model Evaluation Framework

Comprehensive evaluation framework for statistical testing, performance comparisons, and model validation with confidence intervals.

```python { .api }
# From catboost.eval
class EvalType: ...
class CatboostEvaluation: ...
class ScoreType: ...
class ScoreConfig: ...
class CaseEvaluationResult: ...
class MetricEvaluationResult: ...
class EvaluationResults: ...
class ExecutionCase: ...

def calc_wilcoxon_test(): ...
def calc_bootstrap_ci_for_mean(): ...
def make_dirs_if_not_exists(): ...
def series_to_line(): ...
def save_plot(): ...
```

[Model Evaluation Framework](./evaluation.md)
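
A rough sketch of how the evaluation framework is typically driven; the file paths, fold sizes, and feature indices below are placeholders, and exact arguments may differ (see the linked page):

```python
from catboost.eval import CatboostEvaluation

# Placeholder dataset files and parameters
evaluation = CatboostEvaluation(
    'train.tsv',                    # dataset in TSV format
    fold_size=10000,                # objects per fold
    fold_count=20,                  # number of folds
    column_description='train.cd',  # column description file
    partition_random_seed=0
)

# Compare a baseline model against models trained with candidate features added
results = evaluation.eval_features(
    learn_config={'iterations': 100, 'learning_rate': 0.1, 'loss_function': 'Logloss'},
    eval_metrics=['Logloss', 'AUC'],
    features_to_eval=[6, 7, 8]      # indices of candidate features
)
# results is an EvaluationResults object with per-metric comparison results
```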

### Metrics Framework

Dynamic metric classes for evaluating model performance across classification, regression, and ranking tasks.

```python { .api }
# From catboost.metrics
class BuiltinMetric:
    def eval(self, label, approx, weight=None, group_id=None, **kwargs): ...
    def is_max_optimal(self): ...
    def is_min_optimal(self): ...
    def set_hints(self, **hints): ...
    @staticmethod
    def params_with_defaults(): ...

# Dynamically generated metric classes (examples)
class Logloss(BuiltinMetric): ...
class CrossEntropy(BuiltinMetric): ...
class Accuracy(BuiltinMetric): ...
class AUC(BuiltinMetric): ...
class RMSE(BuiltinMetric): ...
class MAE(BuiltinMetric): ...
class NDCG(BuiltinMetric): ...
class MAP(BuiltinMetric): ...
```

[Metrics Framework](./metrics.md)
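
Metric objects can be evaluated directly on labels and predictions, or passed to a model as `eval_metric`; a brief sketch with made-up values:

```python
from catboost import CatBoostClassifier
from catboost.metrics import AUC

# Standalone evaluation on labels and raw predictions
auc_value = AUC().eval([0, 1, 1, 0], [-1.2, 0.8, 0.3, -0.5])

# Metric objects can also be passed as eval_metric when constructing a model
model = CatBoostClassifier(iterations=100, eval_metric=AUC(), verbose=False)
```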

## Constants and Exceptions

```python { .api }
class CatBoostError(Exception):
    """Main exception class for CatBoost errors."""
    ...

# Compatibility alias
CatboostError = CatBoostError

__version__: str  # Currently '1.2.8'
```
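
CatBoostError is the exception to catch around training and prediction calls; a minimal sketch:

```python
from catboost import CatBoostClassifier, CatBoostError

model = CatBoostClassifier(iterations=10, verbose=False)
try:
    # Predicting before fitting raises CatBoostError
    model.predict([[1.0, 2.0]])
except CatBoostError as err:
    print(f"CatBoost error: {err}")
```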