Tessl Tile for pypi/xgboost-cpu@3.0.0

# Utilities and Visualization

Utility functions for model interpretation, configuration management, visualization, and distributed communication. These tools help you understand model behavior, manage global XGBoost settings, produce visual summaries, and coordinate distributed training.

## Capabilities

### Model Visualization

Visualization tools for feature importance and decision tree structure. `plot_importance` draws with matplotlib. `to_graphviz` requires the `graphviz` Python package, plus the Graphviz executables to render. `plot_tree` needs both graphviz and matplotlib, because it renders the Graphviz layout into a matplotlib axes.

```python { .api }
def plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None,
                   title='Feature importance', xlabel='F score',
                   ylabel='Features', fmap='', importance_type='weight',
                   max_num_features=None, grid=True, show_values=True,
                   values_format='{v}', **kwargs):
    """
    Plot feature importance based on fitted trees.
    
    Parameters:
    - booster: Trained model (Booster, XGBModel, or a dict of scores such as
        the output of Booster.get_score())
    - ax: Matplotlib axes object (matplotlib.axes.Axes, optional); a new
        figure and axes are created if None
    - height: Bar height for horizontal bar plot (float)
    - xlim: X-axis limits as (xmin, xmax) (tuple, optional)
    - ylim: Y-axis limits as (ymin, ymax) (tuple, optional)
    - title: Plot title (str)
    - xlabel: X-axis label (str)
    - ylabel: Y-axis label (str)
    - fmap: Feature map file path (str)
    - importance_type: Type of importance to plot (str)
        Options: 'weight', 'gain', 'cover', 'total_gain', 'total_cover'
        - weight: number of times a feature is used to split
        - gain / cover: average gain / coverage of the splits that use the feature
        - total_gain / total_cover: summed gain / coverage of those splits
    - max_num_features: Maximum number of top features to display (int, optional)
    - grid: Whether to show grid lines (bool)
    - show_values: Whether to show importance values on bars (bool)
    - values_format: Format string for importance values; 'v' is replaced by
        the value, e.g. '{v:.2f}' (str)
    - **kwargs: Additional arguments passed to matplotlib's Axes.barh
    
    Returns: matplotlib.axes.Axes - The plot axes object
    """

def plot_tree(booster, fmap='', num_trees=None, rankdir=None, ax=None,
              with_stats=False, tree_idx=0, **kwargs):
    """
    Plot the specified tree. Graphviz lays the tree out and the resulting
    image is drawn on a matplotlib axes.
    
    Parameters:
    - booster: Trained XGBoost model (Booster or XGBModel)
    - fmap: Feature map file path (str)
    - num_trees: Deprecated alias for tree_idx (int, optional)
    - rankdir: Graphviz layout direction: 'TB', 'LR', 'BT', or 'RL'
        (str, optional; top to bottom if None)
    - ax: Matplotlib axes object (matplotlib.axes.Axes, optional)
    - with_stats: Whether to include split statistics such as gain and cover
        in the nodes (bool)
    - tree_idx: Index of tree to plot (int)
    - **kwargs: Additional arguments forwarded to to_graphviz
    
    Returns: matplotlib.axes.Axes - The plot axes object
    """

def to_graphviz(booster, fmap='', num_trees=None, rankdir=None,
                yes_color=None, no_color=None, condition_node_params=None,
                leaf_node_params=None, with_stats=False, tree_idx=0, **kwargs):
    """
    Convert the specified tree to a graphviz object for custom rendering.
    
    Parameters:
    - booster: Trained XGBoost model (Booster or XGBModel)
    - fmap: Feature map file path (str)
    - num_trees: Deprecated alias for tree_idx (int, optional)
    - rankdir: Graphviz layout direction: 'TB', 'LR', 'BT', or 'RL' (str, optional)
    - yes_color: Color for 'yes' edges (str, optional)
    - no_color: Color for 'no' edges (str, optional)
    - condition_node_params: Graphviz attributes for split nodes,
        e.g. {'shape': 'box'} (dict, optional)
    - leaf_node_params: Graphviz attributes for leaf nodes (dict, optional)
    - with_stats: Whether to include node statistics (bool)
    - tree_idx: Index of tree to visualize (int)
    - **kwargs: Other keywords passed to the graph's graph_attr
    
    Returns: graphviz.Source - Graphviz source object for rendering
    """
```
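
The five `importance_type` options differ only in how the statistics of individual splits are aggregated for each feature. The minimal pure-Python sketch below follows the definitions listed above. It aggregates a made-up list of `(feature, gain, cover)` split records using the hypothetical `importance_scores` helper, and it does not call XGBoost.

```python
from collections import defaultdict

# Hypothetical split records: (feature, gain, cover) for each split node across
# all trees. The numbers are made up so the arithmetic is easy to follow.
SPLITS = [
    ("f0", 10.0, 100.0),
    ("f0", 2.0, 40.0),
    ("f1", 6.0, 60.0),
]


def importance_scores(splits, importance_type="weight"):
    """Aggregate split records into per-feature scores.

    weight      -> number of splits that use the feature
    total_gain  -> sum of the split gains
    total_cover -> sum of the split covers
    gain, cover -> totals divided by weight (per-split averages)
    """
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover

    if importance_type == "weight":
        return {f: float(n) for f, n in count.items()}
    if importance_type == "total_gain":
        return dict(total_gain)
    if importance_type == "total_cover":
        return dict(total_cover)
    if importance_type == "gain":
        return {f: total_gain[f] / count[f] for f in count}
    if importance_type == "cover":
        return {f: total_cover[f] / count[f] for f in count}
    raise ValueError(f"unknown importance_type: {importance_type!r}")


for kind in ("weight", "gain", "cover", "total_gain", "total_cover"):
    print(kind, importance_scores(SPLITS, kind))
```

With these numbers, `f0` and `f1` tie under `'gain'` (6.0 each). Under `'total_gain'`, however, `f0` leads (12.0 vs 6.0) because it is used in two splits. This is why different importance types can rank the same features differently.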

### Configuration Management

Global configuration covers process-wide settings such as the verbosity level, the OpenMP thread count, and the RMM memory allocator. These differ from booster parameters (`max_depth`, `device`, and so on), which are passed per model to `xgb.train` or to the scikit-learn estimators.

```python { .api }
def set_config(**new_config):
    """
    Set global XGBoost configuration parameters.
    
    Parameters:
    - **new_config: Configuration parameters as keyword arguments
        Common parameters:
        - verbosity: Global verbosity level (int, 0-3)
            0=silent, 1=warning, 2=info, 3=debug
        - use_rmm: Whether to use the RAPIDS Memory Manager (RMM) allocator
            (bool); only meaningful in CUDA builds with RMM support
        - nthread: Global number of OpenMP threads (int); prefer the per-model
            nthread parameter unless you need to override OpenMP settings
    
    Example configurations:
        set_config(verbosity=2, nthread=4)
        set_config(use_rmm=True)  # GPU builds with RMM only
    """

def get_config():
    """
    Get current global XGBoost configuration values.
    
    Returns: dict - Dictionary containing all current configuration parameters
        Keys include: 'verbosity', 'use_rmm', 'nthread'
    """

def config_context(**new_config):
    """
    Context manager for temporary XGBoost configuration changes.
    
    Parameters:
    - **new_config: Temporary configuration parameters
    
    Returns: Context manager that restores all previous settings on exit,
        not only the ones changed on entry
    
    Usage:
        with config_context(verbosity=0):
            # XGBoost operations with temporary config
            model = xgb.train(params, dtrain, num_boost_round=100)
        # Previous configuration restored automatically
    """
```
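
The save-and-restore behavior of `config_context` can be sketched in pure Python: every setting is snapshotted on entry and written back on exit, so even a setting changed with `set_config` inside the block is undone. `_CONFIG` and the `*_sketch` functions below are hypothetical stand-ins that illustrate the pattern. They are not XGBoost's implementation, and the stored values are placeholders.

```python
from contextlib import contextmanager

# Hypothetical in-memory store standing in for the global configuration;
# the keys mirror the documented options, the values are placeholders.
_CONFIG = {"verbosity": 1, "use_rmm": False, "nthread": 0}


def set_config_sketch(**new_config):
    """Update known keys and reject unknown ones."""
    unknown = set(new_config) - set(_CONFIG)
    if unknown:
        raise ValueError(f"unknown global parameters: {sorted(unknown)}")
    _CONFIG.update(new_config)


def get_config_sketch():
    """Return a copy so callers cannot mutate the store by accident."""
    return dict(_CONFIG)


@contextmanager
def config_context_sketch(**new_config):
    """Snapshot every setting, apply the changes, restore the snapshot on exit."""
    saved = get_config_sketch()
    set_config_sketch(**new_config)
    try:
        yield
    finally:
        # Restore all keys, including ones changed inside the with-block.
        set_config_sketch(**saved)


with config_context_sketch(verbosity=0):
    set_config_sketch(nthread=2)  # changed inside the block, undone on exit
    inside = get_config_sketch()
outside = get_config_sketch()

try:
    set_config_sketch(not_a_setting=True)
    rejected = False
except ValueError:
    rejected = True

print(inside)
print(outside)
print(rejected)
```

The `finally` clause ensures the snapshot is also restored when the body of the `with` block raises.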

### Collective Communication

Low-level distributed communication primitives for custom distributed training setups. After each worker initializes the communicator, these functions coordinate the workers. Calls to `xgb.train` made while the communicator is active train collectively across all of them.

```python { .api }
import xgboost.collective as collective

# The functions and classes below live in the xgboost.collective module.

def init(**args):
    """
    Initialize the collective communication library.
    
    Parameters:
    - **args: Communicator settings as keyword arguments, typically the dict
        returned by RabitTracker.worker_args() (tracker address and port)
    """

def finalize():
    """Finalize collective communication and clean up resources."""

def get_rank():
    """
    Get rank (ID) of current process in distributed setup.
    
    Returns: int - Process rank (0-based indexing)
    """

def get_world_size():
    """
    Get total number of workers in distributed setup.
    
    Returns: int - Total number of processes
    """

def is_distributed():
    """
    Check if running in distributed mode.
    
    Returns: bool - True if distributed, False if single process
    """

def communicator_print(msg):
    """
    Print message from the communicator with rank information.
    
    Parameters:
    - msg: Message to print (str)
    """

def get_processor_name():
    """
    Get name of the processor/node.
    
    Returns: str - Processor/hostname identifier
    """

def broadcast(data, root):
    """
    Broadcast object from root process to all other processes.
    
    Parameters:
    - data: Data to broadcast (any picklable object; only the root's value
        is used, the value passed on other ranks is ignored)
    - root: Rank of root process (int)
    
    Returns: object - Broadcasted data (same on all processes)
    """

def allreduce(data, op):
    """
    Perform allreduce operation across all processes.
    
    Parameters:
    - data: Data to reduce (numpy.ndarray)
    - op: Reduction operation (collective.Op)
        Options: Op.MAX, Op.MIN, Op.SUM, Op.BITWISE_AND, Op.BITWISE_OR, Op.BITWISE_XOR
    
    Returns: numpy.ndarray - Reduced result (same on all processes)
    """

def signal_error(msg):
    """
    Signal error condition to terminate all processes.
    
    Parameters:
    - msg: Error message (str)
    """

class Config:
    def __init__(self, *, retry=3, timeout=300, tracker_host_ip=None,
                 tracker_port=0, tracker_timeout=30):
        """
        Configuration for collective communication.
        
        Parameters:
        - retry: Number of connection retries (int)
        - timeout: Communication timeout in seconds (int)
        - tracker_host_ip: IP address of tracker (str, optional)
        - tracker_port: Port number for tracker (int)
        - tracker_timeout: Tracker connection timeout in seconds (int)
        """

class CommunicatorContext:
    def __init__(self, **kwargs):
        """
        Context manager that calls init(**kwargs) on entry and finalize()
        on exit.
        
        Parameters:
        - **kwargs: Arguments passed to collective.init()
        """

class Op:
    """Enumeration of reduction operations for allreduce."""
    MAX = 0      # Maximum value
    MIN = 1      # Minimum value
    SUM = 2      # Sum of values
    BITWISE_AND = 3  # Bitwise AND
    BITWISE_OR = 4   # Bitwise OR
    BITWISE_XOR = 5  # Bitwise XOR
```
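
The semantics of `broadcast` and `allreduce` can be checked without a cluster. The sketch below simulates both in a single process over a list of per-rank values. `simulate_allreduce`, `simulate_broadcast`, and the local `Op` copy are hypothetical helpers written for illustration; they are not part of `xgboost.collective`.

```python
import operator
from enum import IntEnum
from functools import reduce


class Op(IntEnum):
    """Local copy of the reduction codes listed above, for this simulation only."""
    MAX = 0
    MIN = 1
    SUM = 2
    BITWISE_AND = 3
    BITWISE_OR = 4
    BITWISE_XOR = 5


# Binary function applied pairwise for each reduction code.
_REDUCERS = {
    Op.MAX: max,
    Op.MIN: min,
    Op.SUM: operator.add,
    Op.BITWISE_AND: operator.and_,
    Op.BITWISE_OR: operator.or_,
    Op.BITWISE_XOR: operator.xor,
}


def simulate_allreduce(per_rank_data, op):
    """Reduce element-wise across ranks; every rank receives the same result."""
    length = len(per_rank_data[0])
    if any(len(values) != length for values in per_rank_data):
        raise ValueError("every rank must contribute the same number of elements")
    combine = _REDUCERS[op]
    result = [reduce(combine, column) for column in zip(*per_rank_data)]
    return [list(result) for _ in per_rank_data]


def simulate_broadcast(per_rank_data, root):
    """Every rank receives the root's value; the other ranks' inputs are ignored."""
    return [per_rank_data[root] for _ in per_rank_data]


# Per-rank data as in the Distributed Communication Example: rank r holds [r + 1, r * 2].
world = [[rank + 1, rank * 2] for rank in range(3)]  # [[1, 0], [2, 2], [3, 4]]
print(simulate_allreduce(world, Op.SUM)[0])  # [6, 6]
print(simulate_allreduce(world, Op.MAX)[0])  # [3, 4]
print(simulate_broadcast([{"max_depth": 6}, None, None], root=0))
```

Returning one copy of the result per rank mirrors the "same on all processes" guarantee documented for both operations.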

### Distributed Coordination

Coordination utilities for distributed training. `RabitTracker` runs the rendezvous service that workers connect to in order to obtain their ranks and find each other. `build_info` reports which optional features (CUDA, NCCL, OpenMP, and so on) the installed library was compiled with.

```python { .api }
from xgboost.tracker import RabitTracker

class RabitTracker:
    def __init__(self, n_workers, host_ip, port=0, *, sortby='host',
                 timeout=0):
        """
        Tracker for collective communication coordination between workers.
        
        Parameters:
        - n_workers: Number of worker processes (int)
        - host_ip: IP address the tracker binds to, e.g. '127.0.0.1' for
            single-machine runs (str or None)
        - port: Port number for tracker (int, 0 for auto-assignment)
        - sortby: How ranks are assigned: 'host' (by worker host) or 'task'
            (by the dmlc_task_id each worker passes to collective.init) (str)
        - timeout: Timeout in seconds for forming the communication group
            and for shutting it down (int)
        """
    
    def start(self):
        """
        Start the tracker in the background. The tracker must later be
        joined with wait_for().
        """
    
    def wait_for(self, timeout=None):
        """
        Wait for the tracker to finish, i.e. for all workers to complete
        and disconnect.
        
        Parameters:
        - timeout: Maximum wait time in seconds (int, optional)
            If None, waits without a time limit
        
        Raises: ValueError if the timeout is reached
        """
    
    def worker_args(self):
        """
        Get the connection arguments that workers need.
        
        Returns: dict - Keyword arguments for collective.init() or
            collective.CommunicatorContext(), including:
            - 'dmlc_tracker_uri': Tracker address
            - 'dmlc_tracker_port': Tracker port
        """

def build_info():
    """
    Get build information for the XGBoost installation (xgboost.build_info).
    
    Returns: dict - Build configuration; the exact keys depend on the build:
        - 'USE_CUDA': Whether CUDA support is compiled in
        - 'USE_NCCL': Whether NCCL support is compiled in
        - 'USE_OPENMP': Whether OpenMP is enabled
        - Compiler version entries such as 'GCC_VERSION' or 'CLANG_VERSION'
        - And other compilation flags
    """
```

## Usage Examples

### Feature Importance Visualization

```python
import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# Create and train model
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=42)
feature_names = [f'feature_{i}' for i in range(20)]

dtrain = xgb.DMatrix(X, label=y, feature_names=feature_names)
params = {'objective': 'binary:logistic', 'max_depth': 6, 'learning_rate': 0.1}
model = xgb.train(params, dtrain, num_boost_round=100)

# Plot feature importance with different metrics
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Weight-based importance (number of splits that use the feature)
xgb.plot_importance(model, ax=axes[0, 0], importance_type='weight',
                    max_num_features=10, title='Importance by Weight')

# Gain-based importance (average gain of splits that use the feature)
xgb.plot_importance(model, ax=axes[0, 1], importance_type='gain',
                    max_num_features=10, title='Importance by Gain')

# Cover-based importance (average coverage of splits that use the feature)
xgb.plot_importance(model, ax=axes[1, 0], importance_type='cover',
                    max_num_features=10, title='Importance by Cover')

# Total gain importance (summed gain of splits that use the feature)
xgb.plot_importance(model, ax=axes[1, 1], importance_type='total_gain',
                    max_num_features=10, title='Importance by Total Gain')

plt.tight_layout()
plt.show()

# Customized importance plot; pass an explicit axes so plot_importance
# does not open a second, separate figure
fig, ax = plt.subplots(figsize=(10, 8))
xgb.plot_importance(model,
                    ax=ax,
                    height=0.5,
                    importance_type='gain',
                    max_num_features=15,
                    title='Top 15 Features by Gain',
                    xlabel='Average gain',
                    grid=True,
                    show_values=True,
                    values_format='{v:.3f}',
                    color='skyblue',
                    edgecolor='navy')
plt.show()
```
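
The `max_num_features` and `values_format` options amount to a sort-and-slice step plus a `str.format` call. The minimal sketch below assumes that `plot_importance` sorts scores in ascending order before drawing horizontal bars, so the largest bar ends up on top, and that it formats each label with `values_format.format(v=value)`. `top_features` and `bar_labels` are hypothetical helpers, and the scores are made up.

```python
def top_features(scores, max_num_features=None):
    """Sort (name, score) pairs ascending and keep the largest k.

    Ascending order suits a horizontal bar chart: barh draws the first item
    at the bottom, so the most important feature ends up on top.
    """
    items = sorted(scores.items(), key=lambda kv: kv[1])
    if max_num_features is not None:
        items = items[-max_num_features:]
    return items


def bar_labels(items, values_format="{v}"):
    """Format each value with a template in which 'v' is the importance value."""
    return [values_format.format(v=value) for _, value in items]


# Hypothetical gain scores, shaped like the dict Booster.get_score() returns.
gain = {"feature_3": 0.51234, "feature_7": 2.25, "feature_1": 1.0}
kept = top_features(gain, max_num_features=2)
print(kept)                        # [('feature_1', 1.0), ('feature_7', 2.25)]
print(bar_labels(kept, "{v:.3f}"))  # ['1.000', '2.250']
```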

### Tree Visualization

```python
import xgboost as xgb
import matplotlib.pyplot as plt

# Reuses X, y, and feature_names from the previous example.
# Train a small model on the first 5 features so the trees stay readable;
# feature_names must have exactly one entry per column of the data.
dtrain = xgb.DMatrix(X[:100, :5], label=y[:100], feature_names=feature_names[:5])
simple_model = xgb.train({'max_depth': 3, 'objective': 'binary:logistic'},
                         dtrain, num_boost_round=3)

# Plot individual trees (requires graphviz and matplotlib)
fig, axes = plt.subplots(1, 3, figsize=(20, 6))

for i in range(3):
    xgb.plot_tree(simple_model, ax=axes[i], tree_idx=i, with_stats=True)
    axes[i].set_title(f'Tree {i}')

plt.tight_layout()
plt.show()

# Create a graphviz object for high-quality output
graphviz_tree = xgb.to_graphviz(simple_model, tree_idx=0,
                                rankdir='TB',  # top to bottom
                                yes_color='lightblue',
                                no_color='lightcoral',
                                condition_node_params={'shape': 'box', 'style': 'filled'},
                                leaf_node_params={'shape': 'ellipse', 'style': 'filled'})

# Save as PNG (use format='pdf' for PDF)
graphviz_tree.render('xgb_tree_visualization', format='png', cleanup=True)
print("Tree visualization saved as 'xgb_tree_visualization.png'")

# In a Jupyter notebook, making graphviz_tree the last expression of a cell
# displays it inline; graphviz_tree.view() opens an external viewer instead.
```
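
`to_graphviz` returns DOT source that Graphviz then lays out, with `rankdir` applied as a graph attribute and the yes/no edges colored separately. To show what that kind of DOT text looks like, here is a hand-written sketch that renders a tiny nested-dict tree. The dict layout and `tree_to_dot` are hypothetical, and the output does not reproduce XGBoost's exact DOT.

```python
def tree_to_dot(node, rankdir="TB", yes_color="blue", no_color="red"):
    """Render a nested split dict as Graphviz DOT source.

    Hypothetical input layout (not XGBoost's dump format):
      split node: {"id": 0, "split": "f0<0.5", "yes": {...}, "no": {...}}
      leaf node:  {"id": 1, "leaf": 0.2}
    """
    lines = ["digraph {", f"    graph [rankdir={rankdir}]"]

    def visit(n):
        if "leaf" in n:
            lines.append(f'    {n["id"]} [label="leaf={n["leaf"]}" shape=ellipse]')
            return
        lines.append(f'    {n["id"]} [label="{n["split"]}" shape=box]')
        # The "yes" branch is taken when the split condition holds.
        lines.append(f'    {n["id"]} -> {n["yes"]["id"]} [label=yes color="{yes_color}"]')
        lines.append(f'    {n["id"]} -> {n["no"]["id"]} [label=no color="{no_color}"]')
        visit(n["yes"])
        visit(n["no"])

    visit(node)
    lines.append("}")
    return "\n".join(lines)


tree = {"id": 0, "split": "f0<0.5",
        "yes": {"id": 1, "leaf": 0.2},
        "no": {"id": 2, "leaf": -0.1}}
dot = tree_to_dot(tree, rankdir="LR", yes_color="lightblue", no_color="lightcoral")
print(dot)
```

If the `graphviz` package is installed, `graphviz.Source(dot).render('sketch', format='png')` lays out this text the same way `graphviz_tree.render` does in the example.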

### Configuration Management

```python
import xgboost as xgb

# Reuses dtrain from the previous example.

# Check current configuration (kept so it can be restored at the end)
current_config = xgb.get_config()
print("Current XGBoost configuration:")
for key, value in current_config.items():
    print(f"  {key}: {value}")

# Set global configuration (device is a booster parameter, not a global one)
xgb.set_config(verbosity=2,    # More verbose output
               nthread=4)      # Use 4 OpenMP threads globally

print(f"\nUpdated verbosity: {xgb.get_config()['verbosity']}")

# Use configuration context for temporary changes
print("\nTraining with temporary quiet configuration:")
with xgb.config_context(verbosity=0):  # Silent mode
    quiet_model = xgb.train({'objective': 'binary:logistic'}, dtrain,
                            num_boost_round=10)
    print("Model trained silently")

print("Back to previous verbosity level")

# GPU training: select the device in the training parameters.
# The CPU-only xgboost-cpu build has no CUDA support, so expect the except branch there.
try:
    gpu_params = {'objective': 'binary:logistic', 'tree_method': 'hist', 'device': 'cuda'}
    gpu_model = xgb.train(gpu_params, dtrain, num_boost_round=10)
    print("GPU training completed")
except Exception as e:
    print(f"GPU training not available: {e}")

# Restore the configuration captured at the start
xgb.set_config(**current_config)
```

### Distributed Communication Example

```python
import numpy as np
from xgboost import collective

# Example of basic collective operations. Run this in every worker process;
# worker_args is the dict returned by RabitTracker.worker_args().
def distributed_example(worker_args):
    """Example showing collective communication primitives."""

    # Initialize collective communication; finalize() runs on exit
    with collective.CommunicatorContext(**worker_args):
        rank = collective.get_rank()
        world_size = collective.get_world_size()

        print(f"Process {rank} of {world_size}")
        print(f"Running on: {collective.get_processor_name()}")

        # Example data for each process
        local_data = np.array([rank + 1, rank * 2])

        # Broadcast data from rank 0 to all processes (the value passed
        # on the other ranks is ignored)
        if rank == 0:
            broadcast_data = {'model_params': {'max_depth': 6, 'learning_rate': 0.1}}
        else:
            broadcast_data = None

        shared_params = collective.broadcast(broadcast_data, root=0)
        print(f"Rank {rank} received: {shared_params}")

        # Sum all local data across processes
        global_sum = collective.allreduce(local_data, collective.Op.SUM)
        print(f"Rank {rank} global sum: {global_sum}")

        # Find maximum across all processes
        global_max = collective.allreduce(local_data, collective.Op.MAX)
        print(f"Rank {rank} global max: {global_max}")

# Note: this must run in several worker processes connected to a running
# RabitTracker (see the RabitTracker example for the tracker setup).
# distributed_example(worker_args)
```

### RabitTracker for Distributed Training

```python
import multiprocessing as mp

import xgboost as xgb
from sklearn.datasets import make_classification
from xgboost import collective
from xgboost.tracker import RabitTracker


def worker_process(worker_id, worker_args, data_partition):
    """Worker process for distributed training."""

    # Connect to the tracker; the communicator is finalized on exit
    with collective.CommunicatorContext(**worker_args):
        # Create local DMatrix from data partition
        X_local, y_local = data_partition
        dtrain_local = xgb.DMatrix(X_local, label=y_local)

        # Training parameters (must be identical on every worker)
        params = {
            'objective': 'binary:logistic',
            'max_depth': 6,
            'learning_rate': 0.1,
            'tree_method': 'hist'
        }

        # Distributed training: workers synchronize through the communicator
        model = xgb.train(params, dtrain_local, num_boost_round=50)

        print(f"Worker {worker_id} completed training")
        # All workers end with the same model; save it once from rank 0
        if collective.get_rank() == 0:
            model.save_model('distributed_model.json')


def distributed_training_example():
    """Example of distributed training setup with RabitTracker."""

    # Create sample data and split into partitions
    X, y = make_classification(n_samples=10000, n_features=20,
                               n_classes=2, random_state=42)

    n_workers = 4
    partition_size = len(X) // n_workers
    data_partitions = []

    for i in range(n_workers):
        start_idx = i * partition_size
        end_idx = (i + 1) * partition_size if i < n_workers - 1 else len(X)
        partition = (X[start_idx:end_idx], y[start_idx:end_idx])
        data_partitions.append(partition)

    # Initialize and start the tracker (it runs in the background)
    tracker = RabitTracker(n_workers=n_workers, host_ip='127.0.0.1', timeout=300)
    tracker.start()

    # Connection arguments for collective.init / CommunicatorContext
    worker_args = tracker.worker_args()
    print(f"Tracker listening at {worker_args}")

    # Start worker processes
    processes = []
    for worker_id in range(n_workers):
        p = mp.Process(target=worker_process,
                       args=(worker_id, worker_args, data_partitions[worker_id]))
        p.start()
        processes.append(p)

    # Wait for the workers, then for the tracker to shut down
    for p in processes:
        p.join()
    tracker.wait_for()

    failed = [p.exitcode for p in processes if p.exitcode != 0]
    if not failed:
        print("Distributed training completed successfully!")
    else:
        print(f"Distributed training failed; worker exit codes: {failed}")


# Run this as a script; the __main__ guard is required where
# multiprocessing uses the "spawn" start method
if __name__ == "__main__":
    distributed_training_example()
```
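
The partitioning loop in `distributed_training_example` gives every worker `len(X) // n_workers` rows and hands the remainder to the last worker. The pure-Python sketch below compares that rule with a balanced alternative that spreads the remainder one row at a time, which is the rule `numpy.array_split` follows. Both `partition_bounds` and `balanced_bounds` are hypothetical helpers.

```python
def partition_bounds(n_rows, n_workers):
    """Contiguous [start, end) row ranges; the last worker absorbs the remainder."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    size = n_rows // n_workers
    bounds = []
    for i in range(n_workers):
        start = i * size
        end = (i + 1) * size if i < n_workers - 1 else n_rows
        bounds.append((start, end))
    return bounds


def balanced_bounds(n_rows, n_workers):
    """Alternative: spread the remainder so partition sizes differ by at most one."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for i in range(n_workers):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


print(partition_bounds(10, 4))  # [(0, 2), (2, 4), (4, 6), (6, 10)]
print(balanced_bounds(10, 4))   # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

For the 10,000-row example, both rules produce four equal partitions. The difference only matters when `n_rows` is not divisible by `n_workers`, where the first rule can leave the last worker with up to `n_workers - 1` extra rows.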

### Build Information and Diagnostics

```python
import numpy as np
import xgboost as xgb

# Get build information
build_info = xgb.build_info()

print("XGBoost Build Information:")
print("=" * 50)

# Check for key capabilities
gpu_support = build_info.get('USE_CUDA', False)
nccl_support = build_info.get('USE_NCCL', False)
omp_support = build_info.get('USE_OPENMP', False)

print(f"GPU Support (CUDA): {gpu_support}")
print(f"Multi-GPU Support (NCCL): {nccl_support}")
print(f"OpenMP Support: {omp_support}")

# Compiler details: which key is present depends on the compiler used for the build
compiler_keys = [k for k in ('GCC_VERSION', 'CLANG_VERSION', 'MSVC_VERSION') if k in build_info]
for key in compiler_keys:
    print(f"\nCompiler: {key} = {build_info[key]}")
if not compiler_keys:
    print("\nCompiler: Unknown")

# Print all build flags
print("\nAll build configuration:")
for key, value in sorted(build_info.items()):
    print(f"  {key}: {value}")

# Version information
print(f"\nXGBoost version: {xgb.__version__}")

# Device availability check
def check_device_availability():
    """Check which devices XGBoost can train on in this environment."""
    devices = ['cpu']  # CPU is always available

    # CUDA support must be compiled in; a tiny training run probes for a usable GPU
    if build_info.get('USE_CUDA', False):
        try:
            X_tiny = np.random.rand(16, 2)
            y_tiny = np.random.randint(0, 2, size=16)
            xgb.train({'device': 'cuda', 'tree_method': 'hist'},
                      xgb.DMatrix(X_tiny, label=y_tiny), num_boost_round=1)
            devices.append('cuda')
        except Exception:
            pass

    return devices

available_devices = check_device_availability()
print(f"\nAvailable devices: {available_devices}")

# Memory and performance recommendations
def get_performance_recommendations():
    """Get performance recommendations based on build configuration."""
    recommendations = []

    if not build_info.get('USE_CUDA', False):
        recommendations.append("Consider a CUDA-enabled build for large datasets")

    if not build_info.get('USE_OPENMP', False):
        recommendations.append("OpenMP not available - limited CPU parallelization")

    if build_info.get('USE_NCCL', False):
        recommendations.append("NCCL available - good for multi-GPU training")

    return recommendations

recommendations = get_performance_recommendations()
if recommendations:
    print("\nPerformance recommendations:")
    for rec in recommendations:
        print(f"  - {rec}")
```
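
Because the exact keys returned by `build_info()` vary between builds, diagnostics code should read them defensively. The minimal sketch below reduces the dict to a few flags. It assumes compiler versions appear under keys such as `GCC_VERSION`, `CLANG_VERSION`, or `MSVC_VERSION`; that is an assumption, so check your own `build_info()` output. `summarize_build` and the sample dict are hypothetical.

```python
# Assumed key names; verify against your own build_info() output.
COMPILER_KEYS = ("GCC_VERSION", "CLANG_VERSION", "MSVC_VERSION")


def summarize_build(info):
    """Reduce a build_info()-style dict to a few capability flags."""
    compiler = "unknown"
    for key in COMPILER_KEYS:
        if key in info:
            version = info[key]
            # Versions may be stored as a list such as [10, 2, 1].
            if isinstance(version, (list, tuple)):
                version = ".".join(str(part) for part in version)
            compiler = f"{key.split('_')[0].lower()} {version}"
            break
    return {
        "cuda": bool(info.get("USE_CUDA", False)),
        "nccl": bool(info.get("USE_NCCL", False)),
        "openmp": bool(info.get("USE_OPENMP", False)),
        "compiler": compiler,
    }


# Made-up input resembling a CPU-only build; not real build_info() output.
sample = {"USE_CUDA": False, "USE_NCCL": False, "USE_OPENMP": True,
          "GCC_VERSION": [10, 2, 1]}
print(summarize_build(sample))
```

With a real installation, you would call `summarize_build(xgb.build_info())`. Missing keys simply fall back to `False` or `"unknown"` instead of raising `KeyError`.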

### Advanced Visualization with Custom Styling

```python
import xgboost as xgb
import matplotlib.pyplot as plt
import seaborn as sns

# Reuses `model` from the Feature Importance Visualization example.

# Set up styling (the 'seaborn-v0_8' style name requires matplotlib >= 3.6)
plt.style.use('seaborn-v0_8')
colors = sns.color_palette("husl", 10)

def create_comprehensive_model_report(model):
    """Create a visual report for a trained XGBoost Booster."""

    plt.figure(figsize=(20, 16))

    # Feature importance by different metrics (first row)
    importance_types = ['weight', 'gain', 'cover', 'total_gain']

    for i, imp_type in enumerate(importance_types):
        ax = plt.subplot(3, 4, i + 1)
        xgb.plot_importance(model, ax=ax, importance_type=imp_type,
                            max_num_features=10, color=colors[i],
                            title=f'Importance by {imp_type.title()}')

    # Individual trees (first 4 trees, second row); with binary:logistic
    # each boosting round adds exactly one tree
    for i in range(min(4, model.num_boosted_rounds())):
        ax = plt.subplot(3, 4, i + 5)
        xgb.plot_tree(model, ax=ax, tree_idx=i)
        ax.set_title(f'Tree {i}')

    # Top features by gain, drawn manually (third row, left)
    ax = plt.subplot(3, 2, 5)

    feature_scores = model.get_score(importance_type='gain')
    top_features = sorted(feature_scores.items(),
                          key=lambda x: x[1], reverse=True)[:10]
    if top_features:
        features, scores = zip(*top_features)
        bars = ax.barh(range(len(features)), scores, color=colors[:len(features)])
        ax.set_yticks(range(len(features)))
        ax.set_yticklabels(features)
        ax.invert_yaxis()  # largest gain at the top
        ax.set_xlabel('Feature Gain')
        ax.set_title('Top 10 Features by Gain')

        # Add value labels on bars
        for bar, score in zip(bars, scores):
            ax.text(bar.get_width() + max(scores) * 0.01, bar.get_y() + bar.get_height() / 2,
                    f'{score:.3f}', ha='left', va='center', fontsize=8)

    # Model info panel (third row, right)
    ax = plt.subplot(3, 2, 6)
    ax.axis('off')

    # Create info text; best_iteration/best_score exist only after early stopping
    info_text = (
        "Model Information:\n\n"
        f"Number of boosting rounds: {model.num_boosted_rounds()}\n"
        f"Number of features: {model.num_features()}\n"
        f"Best iteration: {getattr(model, 'best_iteration', 'N/A')}\n"
        f"Best score: {getattr(model, 'best_score', 'N/A')}\n\n"
        "Top 3 Features:"
    )
    for i, (feature, score) in enumerate(top_features[:3]):
        info_text += f"\n  {i + 1}. {feature}: {score:.3f}"

    ax.text(0.1, 0.9, info_text, transform=ax.transAxes, fontsize=12,
            verticalalignment='top', fontfamily='monospace',
            bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.8))

    plt.suptitle('XGBoost Model Analysis Report', fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.show()

# Generate the report
create_comprehensive_model_report(model)
```
