# Statistics and Manifold Learning

High-performance implementations of statistical analysis and manifold learning algorithms with Intel hardware acceleration. These algorithms provide significant speedups for statistical computations and dimensionality reduction on large datasets.

## Capabilities

### Basic Statistics

#### BasicStatistics

Intel-accelerated computation of basic statistical metrics with vectorized operations for large datasets.

```python { .api }
class BasicStatistics:
    """
    Basic statistics computation with Intel optimization.

    Provides efficient computation of fundamental statistical metrics
    including mean, variance, coefficient of variation, sums, and
    extrema (min and max).
    """

    def __init__(
        self,
        result_options='all',
        algorithm='by_default'
    ):
        """
        Initialize BasicStatistics estimator.

        Parameters:
            result_options (str or list): Statistics to compute
                ('all', 'mean', 'variance', 'variation', 'sum', 'sum_squares',
                'sum_squares_centered', 'second_order_raw_moment', 'min', 'max')
            algorithm (str): Algorithm implementation to use
        """

    def fit(self, X, y=None):
        """
        Compute basic statistics for the input data.

        Parameters:
            X (array-like): Input data of shape (n_samples, n_features)
            y: Ignored, present for API consistency

        Returns:
            self: Fitted estimator with computed statistics
        """

    def partial_fit(self, X, y=None):
        """
        Update statistics with new batch of data.

        Parameters:
            X (array-like): New batch of data
            y: Ignored

        Returns:
            self: Updated estimator
        """

    def finalize_fit(self):
        """
        Finalize the computation of statistics.

        Returns:
            self: Finalized estimator
        """

    # Attributes available after fitting
    min_: ...                      # Minimum values per feature
    max_: ...                      # Maximum values per feature
    sum_: ...                      # Sum of values per feature
    mean_: ...                     # Mean values per feature
    variance_: ...                 # Variance per feature
    variation_: ...                # Coefficient of variation per feature
    sum_squares_: ...              # Sum of squares per feature
    sum_squares_centered_: ...     # Centered sum of squares per feature
    second_order_raw_moment_: ...  # Second order raw moments
    n_samples_seen_: ...           # Number of samples processed
```
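
To make the `result_options` names concrete, the following pure-Python sketch spells out the textbook quantity each name usually denotes for a single feature column. It is an illustration only, not the library's implementation. The helper name `describe_column` and its `ddof` argument are assumptions: this page does not state whether the variance is divided by `n` or by `n - 1`, so the sketch exposes that choice explicitly.

```python
import math


def describe_column(values, ddof=1):
    """Textbook definitions of the basic statistics for one feature column.

    `ddof` is a hypothetical knob: ddof=1 gives the sample variance,
    ddof=0 the population variance.
    """
    n = len(values)
    total = sum(values)
    mean = total / n
    sum_squares = sum(v * v for v in values)
    sum_squares_centered = sum((v - mean) ** 2 for v in values)
    variance = sum_squares_centered / (n - ddof)
    return {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "mean": mean,
        "sum_squares": sum_squares,
        "sum_squares_centered": sum_squares_centered,
        "second_order_raw_moment": sum_squares / n,
        "variance": variance,
        "variation": math.sqrt(variance) / mean,  # std / mean
    }


column = [2.0, 4.0, 4.0, 6.0]
stats = describe_column(column)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Note that the coefficient of variation divides by the mean, so it is only meaningful for features whose mean is not close to zero.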

#### IncrementalBasicStatistics

Intel-accelerated incremental computation of basic statistics for streaming data.

```python { .api }
class IncrementalBasicStatistics:
    """
    Incremental basic statistics with Intel optimization.

    Enables efficient online computation of statistical metrics
    for streaming data or datasets that don't fit in memory.
    """

    def __init__(
        self,
        result_options='all',
        algorithm='by_default'
    ):
        """
        Initialize IncrementalBasicStatistics estimator.

        Parameters:
            result_options (str or list): Statistics to compute
                ('all', 'mean', 'variance', 'variation', 'sum', 'sum_squares',
                'sum_squares_centered', 'second_order_raw_moment', 'min', 'max')
            algorithm (str): Algorithm implementation to use
        """

    def partial_fit(self, X, y=None):
        """
        Update statistics incrementally with new data batch.

        Parameters:
            X (array-like): New batch of data
            y: Ignored, present for API consistency

        Returns:
            self: Updated estimator
        """

    def fit(self, X, y=None):
        """
        Compute statistics for input data (equivalent to single partial_fit).

        Parameters:
            X (array-like): Input data of shape (n_samples, n_features)
            y: Ignored

        Returns:
            self: Fitted estimator
        """

    def finalize_fit(self):
        """
        Finalize incremental statistics computation.

        Returns:
            self: Finalized estimator with complete statistics
        """

    # Attributes available after fitting
    min_: ...                      # Minimum values per feature
    max_: ...                      # Maximum values per feature
    sum_: ...                      # Sum of values per feature
    mean_: ...                     # Mean values per feature
    variance_: ...                 # Variance per feature
    variation_: ...                # Coefficient of variation per feature
    sum_squares_: ...              # Sum of squares per feature
    sum_squares_centered_: ...     # Centered sum of squares per feature
    second_order_raw_moment_: ...  # Second order raw moments
    n_samples_seen_: ...           # Total number of samples processed
```
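
One common way to support `partial_fit`-style updates is to keep a small running summary (count, mean, centered sum of squares, min, max) and merge each new batch into it. The sketch below uses the well-known pairwise update for mean and variance in plain Python. It only illustrates the idea and is not claimed to be how the library computes its results; the names `RunningStats`, `update`, and `variance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running summary of one feature, updated batch by batch."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0                   # centered sum of squares
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, batch):
        """Merge one batch into the summary using the pairwise update."""
        b_n = len(batch)
        if b_n == 0:
            return self
        b_mean = sum(batch) / b_n
        b_m2 = sum((v - b_mean) ** 2 for v in batch)
        delta = b_mean - self.mean
        total = self.n + b_n
        self.m2 += b_m2 + delta * delta * self.n * b_n / total
        self.mean += delta * b_n / total
        self.n = total
        self.minimum = min(self.minimum, min(batch))
        self.maximum = max(self.maximum, max(batch))
        return self

    def variance(self, ddof=1):
        """Variance from the running summary; ddof is an assumed convention."""
        return self.m2 / (self.n - ddof)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
running = RunningStats()
for start in range(0, len(data), 3):       # batches of size 3, 3, 2
    running.update(data[start:start + 3])

print(running.n, running.mean, running.variance())
```

Only the summary is kept between batches, so its size depends on the number of features being tracked, not on how many samples have been seen.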

### Covariance Estimation

#### IncrementalEmpiricalCovariance

Intel-accelerated incremental empirical covariance estimation for streaming data and large datasets.

```python { .api }
class IncrementalEmpiricalCovariance:
    """
    Incremental empirical covariance estimation with Intel optimization.

    Efficiently computes sample covariance matrix incrementally, making it
    suitable for streaming data and datasets too large to fit in memory.
    """

    def __init__(
        self,
        store_precision=True,
        assume_centered=False
    ):
        """
        Initialize Incremental Empirical Covariance.

        Parameters:
            store_precision (bool): Whether to store precision matrix
            assume_centered (bool): Whether data is already centered
        """

    def fit(self, X, y=None):
        """
        Fit covariance model to data.

        Parameters:
            X (array-like): Training data of shape (n_samples, n_features)
            y: Ignored, present for API consistency

        Returns:
            self: Fitted estimator
        """

    def partial_fit(self, X, y=None):
        """
        Incrementally fit covariance model.

        Parameters:
            X (array-like): Data batch of shape (n_samples, n_features)
            y: Ignored

        Returns:
            self: Updated estimator
        """

    def score(self, X, y=None):
        """
        Compute log-likelihood under the model.

        Parameters:
            X (array-like): Test data
            y: Ignored

        Returns:
            float: Average log-likelihood
        """

    # Attributes available after fitting
    covariance_: ...      # Estimated covariance matrix
    location_: ...        # Estimated location (mean)
    precision_: ...       # Estimated precision matrix (if store_precision=True)
    n_samples_seen_: ...  # Number of samples processed
```
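
The idea behind incremental covariance estimation can be sketched without any library: accumulate the sample count, per-feature sums, and the matrix of cross-products, then form the covariance once it is needed. The pure-Python class below is a conceptual sketch under stated assumptions, not the library's algorithm. `RunningCovariance` is a hypothetical name, and dividing by `n` (the maximum-likelihood convention) is an assumption about the normalization.

```python
class RunningCovariance:
    """Accumulates sums and cross-products so covariance can be formed later."""

    def __init__(self, n_features, assume_centered=False):
        self.assume_centered = assume_centered
        self.n = 0
        self.sums = [0.0] * n_features
        self.cross = [[0.0] * n_features for _ in range(n_features)]

    def partial_fit(self, batch):
        """Fold one batch of rows into the running sums."""
        for row in batch:
            self.n += 1
            for i, xi in enumerate(row):
                self.sums[i] += xi
                for j, xj in enumerate(row):
                    self.cross[i][j] += xi * xj
        return self

    def covariance(self):
        """E[x x^T] - mean mean^T, with the mean taken as zero if centered."""
        d = len(self.sums)
        if self.assume_centered:
            mean = [0.0] * d
        else:
            mean = [s / self.n for s in self.sums]
        return [
            [self.cross[i][j] / self.n - mean[i] * mean[j] for j in range(d)]
            for i in range(d)
        ]


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
est = RunningCovariance(n_features=2)
est.partial_fit(rows[:2]).partial_fit(rows[2:])   # two batches of two rows
for line in est.covariance():
    print(line)
```

The raw sums-of-products form is easy to read but can lose precision when the mean is large relative to the spread; production code typically uses a centered update instead.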

### Manifold Learning

#### t-SNE (t-Distributed Stochastic Neighbor Embedding)

Intel-accelerated t-SNE for non-linear dimensionality reduction and visualization.

```python { .api }
class TSNE:
    """
    t-distributed Stochastic Neighbor Embedding with Intel optimization.

    Provides efficient non-linear dimensionality reduction for visualization
    and exploratory data analysis with optimized gradient computations.
    """

    def __init__(
        self,
        n_components=2,
        perplexity=30.0,
        early_exaggeration=12.0,
        learning_rate='warn',
        n_iter=1000,
        n_iter_without_progress=300,
        min_grad_norm=1e-7,
        metric='euclidean',
        init='warn',
        verbose=0,
        random_state=None,
        method='barnes_hut',
        angle=0.5,
        n_jobs=None,
        square_distances='deprecated'
    ):
        """
        Initialize t-SNE estimator.

        Parameters:
            n_components (int): Dimension of embedded space (usually 2 or 3)
            perplexity (float): Related to number of nearest neighbors
            early_exaggeration (float): Controls how tightly natural clusters
                from the original space are packed in the embedded space
            learning_rate (float or str): Learning rate for optimization
            n_iter (int): Maximum number of iterations
            n_iter_without_progress (int): Maximum iterations without progress
            min_grad_norm (float): Minimum gradient norm for early stopping
            metric (str): Distance metric to use
            init (str or array): Initialization method ('random', 'pca', array)
            verbose (int): Verbosity level
            random_state (int): Random state for reproducibility
            method (str): Algorithm to use ('barnes_hut', 'exact')
            angle (float): Trade-off between speed and accuracy for Barnes-Hut
            n_jobs (int): Number of parallel jobs
            square_distances (str): Deprecated parameter
        """

    def fit(self, X, y=None):
        """
        Fit X into an embedded space.

        Parameters:
            X (array-like): Input data of shape (n_samples, n_features)
            y: Ignored, present for API consistency

        Returns:
            self: Fitted estimator
        """

    def fit_transform(self, X, y=None):
        """
        Fit X into an embedded space and return transformed array.

        Parameters:
            X (array-like): Input data of shape (n_samples, n_features)
            y: Ignored

        Returns:
            array: Embedded coordinates of shape (n_samples, n_components)
        """

    # Attributes available after fitting
    embedding_: ...       # Stores embedding vectors
    kl_divergence_: ...   # Kullback-Leibler divergence after optimization
    n_features_in_: ...   # Number of features in input data
    n_iter_: ...          # Number of iterations run
    learning_rate_: ...   # Effective learning rate
```
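
The `perplexity` parameter is easier to reason about with a small worked example. In the usual description of t-SNE, each point gets a Gaussian neighbor distribution whose bandwidth is tuned so that the distribution's perplexity, `2 ** entropy`, matches the requested value; this is why perplexity behaves like an effective number of neighbors. The pure-Python sketch below shows that tuning for a single point. The function names and the bisection bounds are hypothetical, and it is not claimed to mirror the library's internals.

```python
import math


def perplexity_for_sigma(sq_dists, sigma):
    """Perplexity 2**H of the Gaussian neighbor distribution for one point."""
    d_min = min(sq_dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / (2.0 * sigma * sigma)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=100):
    """Bisection on a log scale; perplexity grows as sigma grows."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity_for_sigma(sq_dists, mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Squared distances from one point to its eight neighbors (made-up values)
sq_dists = [0.5, 1.0, 2.0, 4.0, 9.0, 16.0, 25.0, 36.0]
sigma = sigma_for_perplexity(sq_dists, target=4.0)
print(f"sigma = {sigma:.4f}")
print(f"perplexity = {perplexity_for_sigma(sq_dists, sigma):.4f}")
```

With a very wide bandwidth every neighbor is weighted equally and the perplexity approaches the number of neighbors, which is why the requested perplexity has to be smaller than the number of samples.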

## Usage Examples

### Basic Statistics Computation

```python
import numpy as np
from sklearnex.basic_statistics import BasicStatistics
from sklearn.datasets import make_regression

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Compute basic statistics
stats = BasicStatistics(result_options='all')
stats.fit(X)

print("Basic Statistics Results:")
print(f"Data shape: {X.shape}")
print(f"Samples processed: {stats.n_samples_seen_}")

# Access computed statistics
print(f"Mean per feature: {stats.mean_}")
print(f"Variance per feature: {stats.variance_}")
print(f"Min values: {stats.min_}")
print(f"Max values: {stats.max_}")
print(f"Sum per feature: {stats.sum_}")

# Coefficient of variation (std/mean)
print(f"Coefficient of variation: {stats.variation_}")

# Statistical moments
print(f"Sum of squares: {stats.sum_squares_}")
print(f"Centered sum of squares: {stats.sum_squares_centered_}")
print(f"Second order raw moment: {stats.second_order_raw_moment_}")

# Compute specific statistics only
stats_subset = BasicStatistics(result_options=['mean', 'variance', 'min', 'max'])
stats_subset.fit(X)

print("\nSubset of statistics:")
print(f"Mean: {stats_subset.mean_}")
print(f"Variance: {stats_subset.variance_}")
print(f"Min: {stats_subset.min_}")
print(f"Max: {stats_subset.max_}")

# Verify against NumPy computations
print("\nVerification against NumPy:")
print(f"Mean matches NumPy: {np.allclose(stats.mean_, np.mean(X, axis=0))}")
# This check assumes population variance (ddof=0). If the library reports
# sample variance instead, compare against np.var(X, axis=0, ddof=1).
print(f"Variance matches NumPy: {np.allclose(stats.variance_, np.var(X, axis=0, ddof=0))}")
print(f"Min matches NumPy: {np.allclose(stats.min_, np.min(X, axis=0))}")
print(f"Max matches NumPy: {np.allclose(stats.max_, np.max(X, axis=0))}")
```

### Incremental Statistics for Streaming Data

```python
import numpy as np
from sklearnex.basic_statistics import BasicStatistics, IncrementalBasicStatistics

# Simulate streaming data
np.random.seed(42)
total_samples = 5000
batch_size = 500
n_features = 8

# Create incremental statistics estimator
inc_stats = IncrementalBasicStatistics(result_options='all')

# Process data in batches
all_data = []
for batch_idx in range(0, total_samples, batch_size):
    # Generate batch of data
    batch_data = np.random.randn(batch_size, n_features)
    all_data.append(batch_data)

    # Update statistics incrementally
    inc_stats.partial_fit(batch_data)

    print(f"Processed batch {batch_idx//batch_size + 1}: "
          f"{inc_stats.n_samples_seen_} total samples")

# Finalize computation
inc_stats.finalize_fit()

# Compare with batch computation
full_data = np.vstack(all_data)
batch_stats = BasicStatistics(result_options='all')
batch_stats.fit(full_data)

print("\nIncremental vs Batch Statistics Comparison:")
print(f"Samples processed - Incremental: {inc_stats.n_samples_seen_}, "
      f"Batch: {batch_stats.n_samples_seen_}")

# Verify results agree within floating-point tolerance
print(f"Mean identical: {np.allclose(inc_stats.mean_, batch_stats.mean_)}")
print(f"Variance identical: {np.allclose(inc_stats.variance_, batch_stats.variance_)}")
print(f"Min identical: {np.allclose(inc_stats.min_, batch_stats.min_)}")
print(f"Max identical: {np.allclose(inc_stats.max_, batch_stats.max_)}")

# Demonstrate memory efficiency for large datasets
print("\nMemory-efficient processing example:")
inc_stats_large = IncrementalBasicStatistics(result_options=['mean', 'variance'])

# Simulate processing very large dataset in small batches
n_batches = 100
batch_size = 1000

for i in range(n_batches):
    # Generate and immediately process batch (no storage)
    batch = np.random.normal(loc=i*0.1, scale=1.0, size=(batch_size, n_features))
    inc_stats_large.partial_fit(batch)

    if (i + 1) % 20 == 0:
        print(f"  Processed {inc_stats_large.n_samples_seen_} samples")

inc_stats_large.finalize_fit()
print(f"Final mean: {inc_stats_large.mean_}")
print(f"Final variance: {inc_stats_large.variance_}")
```

### t-SNE for Dimensionality Reduction and Visualization

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearnex.manifold import TSNE
from sklearn.datasets import load_digits, make_blobs

# Example 1: Digits dataset visualization
digits = load_digits()
X_digits, y_digits = digits.data, digits.target

print(f"Digits dataset shape: {X_digits.shape}")
print(f"Number of classes: {len(np.unique(y_digits))}")

# Apply t-SNE for 2D visualization
tsne = TSNE(n_components=2, perplexity=30, random_state=42, verbose=1)
X_tsne = tsne.fit_transform(X_digits)

print(f"t-SNE embedding shape: {X_tsne.shape}")
print(f"KL divergence: {tsne.kl_divergence_:.4f}")
print(f"Iterations run: {tsne.n_iter_}")

# Visualize results
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_digits, cmap='tab10', s=20, alpha=0.7)
plt.colorbar()
plt.title('t-SNE: Digits Dataset (Colored by Digit)')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')

# Example 2: High-dimensional synthetic data
X_synthetic, y_synthetic = make_blobs(
    n_samples=1000, centers=5, n_features=50,
    cluster_std=2.0, random_state=42
)

print(f"\nSynthetic dataset shape: {X_synthetic.shape}")

# t-SNE with different parameters
tsne_synthetic = TSNE(
    n_components=2,
    perplexity=50,
    early_exaggeration=12.0,
    learning_rate=200.0,
    n_iter=1000,
    random_state=42
)
X_tsne_synthetic = tsne_synthetic.fit_transform(X_synthetic)

plt.subplot(1, 2, 2)
plt.scatter(X_tsne_synthetic[:, 0], X_tsne_synthetic[:, 1],
            c=y_synthetic, cmap='viridis', s=20, alpha=0.7)
plt.colorbar()
plt.title('t-SNE: Synthetic High-D Data')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')

plt.tight_layout()
plt.show()

# Example 3: 3D embedding
tsne_3d = TSNE(n_components=3, perplexity=30, random_state=42)
X_tsne_3d = tsne_3d.fit_transform(X_digits[:500])  # Use subset for faster computation

print(f"\n3D t-SNE embedding shape: {X_tsne_3d.shape}")

# 3D visualization
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
scatter = ax.scatter(X_tsne_3d[:, 0], X_tsne_3d[:, 1], X_tsne_3d[:, 2],
                     c=y_digits[:500], cmap='tab10', s=30, alpha=0.7)
ax.set_xlabel('t-SNE Component 1')
ax.set_ylabel('t-SNE Component 2')
ax.set_zlabel('t-SNE Component 3')
ax.set_title('3D t-SNE: Digits Dataset')
plt.colorbar(scatter)
plt.show()

# Parameter sensitivity analysis
perplexity_values = [5, 15, 30, 50, 100]
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.ravel()

for i, perp in enumerate(perplexity_values):
    if i >= len(axes):
        break

    tsne_param = TSNE(n_components=2, perplexity=perp, random_state=42)
    X_param = tsne_param.fit_transform(X_digits[:1000])  # Use subset for speed

    axes[i].scatter(X_param[:, 0], X_param[:, 1], c=y_digits[:1000],
                    cmap='tab10', s=10, alpha=0.7)
    axes[i].set_title(f'Perplexity = {perp}')
    axes[i].set_xlabel('t-SNE Component 1')
    axes[i].set_ylabel('t-SNE Component 2')

# Hide the last subplot if not used
if len(perplexity_values) < len(axes):
    axes[-1].axis('off')

plt.tight_layout()
plt.show()
```

### Combined Statistics and Manifold Analysis

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearnex.basic_statistics import BasicStatistics
from sklearnex.manifold import TSNE
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Load real-world dataset
cancer = load_breast_cancer()
X_cancer, y_cancer = cancer.data, cancer.target

print(f"Breast cancer dataset shape: {X_cancer.shape}")
print(f"Feature names: {cancer.feature_names[:5]}...")  # Show first 5 features

# Compute basic statistics on raw data
raw_stats = BasicStatistics(result_options='all')
raw_stats.fit(X_cancer)

print("\nRaw data statistics:")
print(f"Mean range: [{raw_stats.mean_.min():.2f}, {raw_stats.mean_.max():.2f}]")
print(f"Variance range: [{raw_stats.variance_.min():.2e}, {raw_stats.variance_.max():.2e}]")
print(f"Min values range: [{raw_stats.min_.min():.2f}, {raw_stats.min_.max():.2f}]")
print(f"Max values range: [{raw_stats.max_.min():.2f}, {raw_stats.max_.max():.2f}]")

# Identify features with high variance
high_var_features = np.where(raw_stats.variance_ > np.percentile(raw_stats.variance_, 90))[0]
print(f"High variance features: {[cancer.feature_names[i] for i in high_var_features]}")

# Standardize data for better t-SNE performance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_cancer)

# Compute statistics on scaled data
scaled_stats = BasicStatistics(result_options=['mean', 'variance'])
scaled_stats.fit(X_scaled)

print("\nScaled data statistics:")
print(f"Mean after scaling: {scaled_stats.mean_}")
print(f"Variance after scaling: {scaled_stats.variance_}")

# Apply t-SNE to scaled data
tsne_cancer = TSNE(
    n_components=2,
    perplexity=30,
    learning_rate=200,
    n_iter=1000,
    random_state=42,
    verbose=1
)
X_tsne_cancer = tsne_cancer.fit_transform(X_scaled)

# Analyze t-SNE embedding statistics
tsne_stats = BasicStatistics(result_options='all')
tsne_stats.fit(X_tsne_cancer)

print("\nt-SNE embedding statistics:")
print(f"Embedding mean: {tsne_stats.mean_}")
print(f"Embedding variance: {tsne_stats.variance_}")
print(f"Embedding range: [{tsne_stats.min_}, {tsne_stats.max_}]")

# Visualize results with statistics
plt.figure(figsize=(15, 5))

# In load_breast_cancer, target 0 is malignant and target 1 is benign
class_label = 'Malignant (0) / Benign (1)'

# Original data: first two features
plt.subplot(1, 3, 1)
plt.scatter(X_cancer[:, 0], X_cancer[:, 1], c=y_cancer, cmap='coolwarm', alpha=0.7)
plt.xlabel(f"{cancer.feature_names[0]}")
plt.ylabel(f"{cancer.feature_names[1]}")
plt.title("Original Data (First 2 Features)")
plt.colorbar(label=class_label)

# Scaled data: first two features
plt.subplot(1, 3, 2)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y_cancer, cmap='coolwarm', alpha=0.7)
plt.xlabel(f"Scaled {cancer.feature_names[0]}")
plt.ylabel(f"Scaled {cancer.feature_names[1]}")
plt.title("Scaled Data (First 2 Features)")
plt.colorbar(label=class_label)

# t-SNE embedding
plt.subplot(1, 3, 3)
plt.scatter(X_tsne_cancer[:, 0], X_tsne_cancer[:, 1], c=y_cancer, cmap='coolwarm', alpha=0.7)
plt.xlabel("t-SNE Component 1")
plt.ylabel("t-SNE Component 2")
plt.title(f"t-SNE Embedding (KL={tsne_cancer.kl_divergence_:.2f})")
plt.colorbar(label=class_label)

plt.tight_layout()
plt.show()

# Pairwise feature correlation analysis (computed with NumPy)
feature_correlations = []
for i in range(X_cancer.shape[1]):
    for j in range(i+1, X_cancer.shape[1]):
        corr = np.corrcoef(X_cancer[:, i], X_cancer[:, j])[0, 1]
        feature_correlations.append({
            'feature1': cancer.feature_names[i],
            'feature2': cancer.feature_names[j],
            'correlation': abs(corr)
        })

# Find most correlated features
feature_correlations.sort(key=lambda x: x['correlation'], reverse=True)
print("\nTop 5 most correlated feature pairs:")
for i in range(5):
    fc = feature_correlations[i]
    print(f"  {fc['feature1']} <-> {fc['feature2']}: {fc['correlation']:.3f}")
```

### Performance Comparison

```python
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.manifold import TSNE as StandardTSNE
from sklearnex.basic_statistics import BasicStatistics as IntelStats
from sklearnex.manifold import TSNE as IntelTSNE

# Generate large dataset for performance testing
X_large, _ = make_regression(n_samples=100000, n_features=50, random_state=42)

print("Performance comparison on large dataset:")
print(f"Dataset shape: {X_large.shape}")

# Test BasicStatistics performance
print("\nBasic Statistics Performance:")

# Intel-optimized version (imports happen above so they are not timed)
start_time = time.perf_counter()
intel_stats = IntelStats(result_options='all')
intel_stats.fit(X_large)
intel_time = time.perf_counter() - start_time

print(f"Intel BasicStatistics: {intel_time:.3f} seconds")

# NumPy comparison
start_time = time.perf_counter()
numpy_mean = np.mean(X_large, axis=0)
numpy_var = np.var(X_large, axis=0)
numpy_min = np.min(X_large, axis=0)
numpy_max = np.max(X_large, axis=0)
numpy_sum = np.sum(X_large, axis=0)
numpy_time = time.perf_counter() - start_time

print(f"NumPy equivalent computations: {numpy_time:.3f} seconds")
print(f"Speedup: {numpy_time / intel_time:.1f}x")

# Verify results match
print("Results identical:")
print(f"  Mean: {np.allclose(intel_stats.mean_, numpy_mean)}")
# np.var defaults to ddof=0; use ddof=1 here if the library reports sample variance
print(f"  Variance: {np.allclose(intel_stats.variance_, numpy_var)}")
print(f"  Min: {np.allclose(intel_stats.min_, numpy_min)}")
print(f"  Max: {np.allclose(intel_stats.max_, numpy_max)}")

# Test t-SNE performance (smaller dataset for practical timing)
X_tsne_test = X_large[:5000, :20]  # Reduce size for t-SNE timing

print(f"\nt-SNE Performance (shape: {X_tsne_test.shape}):")

# Intel-optimized version
start_time = time.perf_counter()
intel_tsne = IntelTSNE(n_components=2, perplexity=30, random_state=42, verbose=0)
intel_embedding = intel_tsne.fit_transform(X_tsne_test)
intel_tsne_time = time.perf_counter() - start_time

print(f"Intel t-SNE: {intel_tsne_time:.2f} seconds")
print(f"KL divergence: {intel_tsne.kl_divergence_:.4f}")

# Standard scikit-learn version
start_time = time.perf_counter()
standard_tsne = StandardTSNE(n_components=2, perplexity=30, random_state=42, verbose=0)
standard_embedding = standard_tsne.fit_transform(X_tsne_test)
standard_tsne_time = time.perf_counter() - start_time

print(f"Standard t-SNE: {standard_tsne_time:.2f} seconds")
print(f"KL divergence: {standard_tsne.kl_divergence_:.4f}")
print(f"Speedup: {standard_tsne_time / intel_tsne_time:.1f}x")

# Compare embeddings. t-SNE coordinates are not directly comparable across
# runs or implementations, so treat this number as a rough indicator only;
# the KL divergences printed above are the more meaningful quality measure.
embedding_diff = np.mean(np.abs(intel_embedding - standard_embedding))
print(f"Mean absolute difference in embeddings: {embedding_diff:.4f}")
```
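
Single-shot timings like the ones above can be noisy, since the first call may include one-off setup cost. A small helper that repeats the call and keeps the fastest run gives steadier numbers. The sketch below uses only the standard library; `best_of` and `workload` are hypothetical names, and `workload` is a stand-in for a call such as `estimator.fit(X)`.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def workload():
    # Stand-in for the operation being benchmarked
    return sum(i * i for i in range(100_000))


elapsed = best_of(workload, repeats=3)
print(f"best of 3: {elapsed:.4f} s")
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on the measurement.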

## Performance Notes

- BasicStatistics shows significant speedups on datasets with more than 10,000 samples
- IncrementalBasicStatistics enables processing of datasets larger than memory
- t-SNE optimization provides substantial improvements on high-dimensional data (more than 20 features)
- Statistical computations benefit most from vectorized operations on wide datasets
- The computed statistics are stored per feature, so their memory footprint is small and does not grow with the number of samples
- t-SNE memory usage scales with sample count, similar to scikit-learn
- All algorithms maintain high numerical accuracy compared to standard implementations