# Data Tables and Arrays

ArrayTable, ETE3's data-table class, handles numerical data associated with trees and sequences. It supports matrix operations, statistical analysis, and integration with scientific computing workflows, and is built on NumPy for efficient numerical work.

## Capabilities

### ArrayTable Class

Main class for handling 2D numerical data with matrix operations and scientific computing integration.

```python { .api }
class ArrayTable:
    """
    Efficient 2D data table with matrix operations and scientific computing support.
    Built on NumPy for high performance numerical operations.
    """

    def __init__(self, matrix_file=None, mtype="float"):
        """
        Initialize array table.

        Parameters:
        - matrix_file (str): Path to matrix data file
        - mtype (str): Data type ("float", "int", "str")
        """

    def __len__(self):
        """Number of rows in table."""

    def __str__(self):
        """String representation of table."""
```

### Data Access and Retrieval

Methods for accessing rows, columns, and individual data elements.

```python { .api }
def get_column_array(self, colname):
    """
    Get column data as NumPy array.

    Parameters:
    - colname (str): Column name

    Returns:
    numpy.ndarray: Column data array
    """

def get_row_array(self, rowname):
    """
    Get row data as NumPy array.

    Parameters:
    - rowname (str): Row name

    Returns:
    numpy.ndarray: Row data array
    """

def get_several_column_arrays(self, colnames):
    """
    Get multiple columns as arrays.

    Parameters:
    - colnames (list): List of column names

    Returns:
    dict: Mapping from column names to arrays
    """

def get_several_row_arrays(self, rownames):
    """
    Get multiple rows as arrays.

    Parameters:
    - rownames (list): List of row names

    Returns:
    dict: Mapping from row names to arrays
    """

# Properties for data access
matrix: numpy.ndarray  # Underlying data matrix
colNames: list  # Column names
rowNames: list  # Row names
colValues: dict  # Column name to index mapping
rowValues: dict  # Row name to index mapping
```
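
The name-to-index mappings listed above suggest how a lookup by name can resolve to a position in the matrix. The following is a minimal standard-library sketch of that idea, using a list of lists in place of the NumPy matrix. `TinyTable` and its methods are hypothetical illustrations and are not part of ETE3.

```python
class TinyTable:
    """Hypothetical stand-in for a named 2D table (not the ETE3 class)."""

    def __init__(self, row_names, col_names, matrix):
        self.row_names = list(row_names)
        self.col_names = list(col_names)
        self.matrix = [list(row) for row in matrix]
        # Name -> positional index, following the mapping idea described above.
        self.row_index = {name: i for i, name in enumerate(self.row_names)}
        self.col_index = {name: j for j, name in enumerate(self.col_names)}

    def row(self, name):
        """Return a copy of the row with the given name."""
        return list(self.matrix[self.row_index[name]])

    def column(self, name):
        """Return the values of the column with the given name."""
        j = self.col_index[name]
        return [r[j] for r in self.matrix]


table = TinyTable(
    row_names=["geneA", "geneB"],
    col_names=["s1", "s2", "s3"],
    matrix=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
)
print(table.row("geneB"))   # [4.0, 5.0, 6.0]
print(table.column("s2"))   # [2.0, 5.0]
```

Keeping a dictionary next to the ordered name list makes each lookup a constant-time operation while preserving the original row and column order.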

### Matrix Operations

Mathematical operations and transformations on the data matrix.

```python { .api }
def transpose(self):
    """
    Transpose the matrix (swap rows and columns).

    Returns:
    ArrayTable: New transposed table
    """

def remove_column(self, colname):
    """
    Remove column from table.

    Parameters:
    - colname (str): Column name to remove
    """

def remove_row(self, rowname):
    """
    Remove row from table.

    Parameters:
    - rowname (str): Row name to remove
    """

def add_column(self, colname, colvalues):
    """
    Add new column to table.

    Parameters:
    - colname (str): Name for new column
    - colvalues (array-like): Column data values
    """

def add_row(self, rowname, rowvalues):
    """
    Add new row to table.

    Parameters:
    - rowname (str): Name for new row
    - rowvalues (array-like): Row data values
    """
```
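
Transposing a named table has to swap the row and column names along with the values. The sketch below shows that bookkeeping with plain Python lists; `transpose_table` is a hypothetical helper written for illustration, not an ETE3 function.

```python
def transpose_table(row_names, col_names, matrix):
    """Return (new_row_names, new_col_names, new_matrix) with the axes swapped."""
    # zip(*matrix) yields the columns of a row-major matrix one at a time.
    transposed = [list(col) for col in zip(*matrix)]
    # The old column names label the new rows, and vice versa.
    return list(col_names), list(row_names), transposed


new_rows, new_cols, data = transpose_table(
    ["g1", "g2"],
    ["s1", "s2", "s3"],
    [[1, 2, 3], [4, 5, 6]],
)
print(new_rows)  # ['s1', 's2', 's3']
print(new_cols)  # ['g1', 'g2']
print(data)      # [[1, 4], [2, 5], [3, 6]]
```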

### File I/O Operations

Read and write table data in various formats.

```python { .api }
def write(self, fname=None, colnames=None):
    """
    Write table to file.

    Parameters:
    - fname (str): Output file path, if None returns string
    - colnames (list): Specific columns to write

    Returns:
    str: Formatted table string (if fname is None)
    """

def read(self, matrix_file, mtype="float", **kwargs):
    """
    Read table data from file.

    Parameters:
    - matrix_file (str): Input file path
    - mtype (str): Data type for parsing
    - kwargs: Additional parsing parameters
    """
```
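
This section does not spell out the file layout. The sketch below assumes a tab-delimited text layout in which a header line starting with `#NAMES` lists the column names, and every later line holds a row name followed by its values. Treat that layout, and the hypothetical `parse_matrix_text` helper, as assumptions to check against your own files and ETE3 version.

```python
import io


def parse_matrix_text(handle):
    """Parse an assumed tab-delimited matrix into (col_names, row_names, rows)."""
    col_names, row_names, rows = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if fields[0].upper() == "#NAMES":
            # Header line: everything after the marker is a column name.
            col_names = fields[1:]
        elif not line.startswith("#"):
            # Data line: row name first, then one float per column.
            row_names.append(fields[0])
            rows.append([float(value) for value in fields[1:]])
    return col_names, row_names, rows


sample = io.StringIO(
    "#NAMES\tcol1\tcol2\n"
    "A\t-1.23\t0.81\n"
    "B\t0.50\t2.00\n"
)
cols, names, values = parse_matrix_text(sample)
print(cols)    # ['col1', 'col2']
print(names)   # ['A', 'B']
print(values)  # [[-1.23, 0.81], [0.5, 2.0]]
```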

### Statistical Operations

Built-in statistical analysis and data summary methods.

```python { .api }
def get_stats(self):
    """
    Calculate basic statistics for all columns.

    Returns:
    dict: Statistics including mean, std, min, max for each column
    """

def get_column_stats(self, colname):
    """
    Calculate statistics for specific column.

    Parameters:
    - colname (str): Column name

    Returns:
    dict: Column statistics (mean, std, min, max, etc.)
    """

def normalize(self, method="standard"):
    """
    Normalize data using specified method.

    Parameters:
    - method (str): Normalization method ("standard", "minmax", "robust")

    Returns:
    ArrayTable: Normalized table
    """
```
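
The method names "standard" and "minmax" conventionally refer to z-score scaling and rescaling to the range [0, 1]. The sketch below applies those conventional definitions to a single column using only the standard library. It assumes per-column scaling and a population standard deviation, neither of which is stated in this document, and `normalize_column` is a hypothetical helper rather than an ETE3 function.

```python
import statistics


def normalize_column(values, method="standard"):
    """Scale one column; constant columns would divide by zero and are not handled."""
    if method == "standard":
        # z-score: subtract the mean, divide by the (population) standard deviation.
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        return [(v - mean) / std for v in values]
    if method == "minmax":
        # Rescale linearly so the minimum maps to 0 and the maximum to 1.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unsupported method: {method}")


column = [2.0, 4.0, 6.0]
minmax = normalize_column(column, "minmax")
zscores = normalize_column(column, "standard")
print(minmax)  # [0.0, 0.5, 1.0]
```

After z-score scaling the column has mean 0 and standard deviation 1, which puts columns measured on different scales on a comparable footing before clustering.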

### Data Filtering and Selection

Filter and select subsets of data based on criteria.

```python { .api }
def filter_columns(self, condition_func):
    """
    Filter columns based on condition function.

    Parameters:
    - condition_func (function): Function that takes column array, returns bool

    Returns:
    ArrayTable: Filtered table
    """

def filter_rows(self, condition_func):
    """
    Filter rows based on condition function.

    Parameters:
    - condition_func (function): Function that takes row array, returns bool

    Returns:
    ArrayTable: Filtered table
    """

def select_columns(self, colnames):
    """
    Select specific columns.

    Parameters:
    - colnames (list): Column names to select

    Returns:
    ArrayTable: Table with selected columns
    """

def select_rows(self, rownames):
    """
    Select specific rows.

    Parameters:
    - rownames (list): Row names to select

    Returns:
    ArrayTable: Table with selected rows
    """
```
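
Predicate-based filtering, as described above, amounts to evaluating the condition once per column and keeping the columns for which it returns true. The following standard-library sketch illustrates the pattern; `keep_columns` is a hypothetical helper, not the library's implementation.

```python
import statistics


def keep_columns(col_names, matrix, condition):
    """Return (names, matrix) restricted to columns where condition(column) is true."""
    columns = list(zip(*matrix))  # column-wise view of the row-major matrix
    kept = [j for j, col in enumerate(columns) if condition(col)]
    new_names = [col_names[j] for j in kept]
    new_matrix = [[row[j] for j in kept] for row in matrix]
    return new_names, new_matrix


names, data = keep_columns(
    ["flat", "variable"],
    [[1.0, 0.0], [1.0, 5.0], [1.0, 10.0]],
    condition=lambda col: statistics.pvariance(col) > 1.0,
)
print(names)  # ['variable']
print(data)   # [[0.0], [5.0], [10.0]]
```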

### Integration with Trees

Methods for associating tabular data with tree structures.

```python { .api }
def link_to_tree(self, tree, attr_name="profile"):
    """
    Link table data to tree nodes.

    Parameters:
    - tree (Tree): Tree to link data to
    - attr_name (str): Attribute name for storing data in nodes
    """

def get_tree_profile(self, tree, attr_name="profile"):
    """
    Extract profile data from tree nodes.

    Parameters:
    - tree (Tree): Tree with profile data
    - attr_name (str): Attribute name containing data

    Returns:
    ArrayTable: Table with tree profile data
    """
```
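
Linking a table to a tree comes down to pairing each node with the matching table row and storing that row under a chosen attribute. The sketch below assumes leaves are matched to rows by name, which this document does not state explicitly. It uses `SimpleNamespace` objects as stand-ins for tree leaves, and `link_profiles` is a hypothetical helper.

```python
from types import SimpleNamespace


def link_profiles(leaves, row_names, matrix, attr_name="profile"):
    """Attach each row to the leaf with the same name; return unmatched leaf names."""
    rows_by_name = dict(zip(row_names, matrix))
    unmatched = []
    for leaf in leaves:
        if leaf.name in rows_by_name:
            setattr(leaf, attr_name, list(rows_by_name[leaf.name]))
        else:
            unmatched.append(leaf.name)
    return unmatched


leaves = [
    SimpleNamespace(name="human"),
    SimpleNamespace(name="mouse"),
    SimpleNamespace(name="fly"),
]
missing = link_profiles(
    leaves,
    row_names=["human", "mouse"],
    matrix=[[1.0, 2.0], [3.0, 4.0]],
    attr_name="expression",
)
print(leaves[0].expression)  # [1.0, 2.0]
print(missing)               # ['fly']
```

Reporting the unmatched names makes it easier to spot spelling differences between the tree and the table before running any analysis.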

## Clustering Integration

### ClusterTree with ArrayTable

Enhanced clustering functionality when combined with data tables.

```python { .api }
def get_distance_matrix(self):
    """
    Calculate distance matrix between rows.

    Returns:
    numpy.ndarray: Symmetric distance matrix
    """

def cluster_data(self, method="ward", metric="euclidean"):
    """
    Perform hierarchical clustering on data.

    Parameters:
    - method (str): Linkage method ("ward", "complete", "average", "single")
    - metric (str): Distance metric ("euclidean", "manhattan", "cosine")

    Returns:
    ClusterTree: Tree representing clustering hierarchy
    """
```
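
A symmetric distance matrix between rows can be filled by computing each pair once and mirroring the value across the diagonal. The sketch below does this for the Euclidean metric with the standard library only. `distance_matrix` is a hypothetical helper that illustrates the computation and says nothing about how ETE3 implements it.

```python
import math


def distance_matrix(rows):
    """Return the symmetric matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]  # the diagonal stays at zero
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d  # mirror across the diagonal
    return dist


profiles = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dm = distance_matrix(profiles)
print(dm[0][1])  # 5.0
print(dm[0][2])  # 10.0
```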

## Usage Examples

### Basic Table Operations

```python
from ete3 import ArrayTable
import numpy as np

# Create table from file
table = ArrayTable("data_matrix.txt", mtype="float")

# Basic properties
print(f"Table dimensions: {len(table.rowNames)} x {len(table.colNames)}")
print(f"Column names: {table.colNames}")
print(f"Row names: {table.rowNames}")

# Access data
col_data = table.get_column_array("column1")
row_data = table.get_row_array("row1")

print(f"Column1 stats: mean={np.mean(col_data):.2f}, std={np.std(col_data):.2f}")
```

### Data Manipulation

```python
from ete3 import ArrayTable

# Load data
table = ArrayTable("expression_data.txt")

# Remove unwanted columns/rows
table.remove_column("control_sample")
table.remove_row("uninformative_gene")

# Add new data (one value per row; this example assumes five rows remain)
new_column_data = [1.5, 2.3, 0.8, 3.1, 1.9]
table.add_column("new_condition", new_column_data)

# Transpose for different analysis perspective
transposed = table.transpose()

# Save results
table.write("modified_data.txt")
```

### Statistical Analysis

```python
from ete3 import ArrayTable
import numpy as np  # needed for np.var in the filter below

table = ArrayTable("experimental_data.txt")

# Get overall statistics
stats = table.get_stats()
for col, col_stats in stats.items():
    print(f"{col}: mean={col_stats['mean']:.2f}, std={col_stats['std']:.2f}")

# Normalize data
normalized_table = table.normalize(method="standard")

# Filter based on criteria
def high_variance_filter(col_array):
    return np.var(col_array) > 1.0

high_var_table = table.filter_columns(high_variance_filter)
print(f"Filtered to {len(high_var_table.colNames)} high-variance columns")
```

### Integration with Trees

```python
from ete3 import ArrayTable, Tree

# Load data and tree
table = ArrayTable("gene_expression.txt")
tree = Tree("species_tree.nw")

# Link expression data to tree nodes
table.link_to_tree(tree, attr_name="expression")

# Access linked data
for leaf in tree.get_leaves():
    if hasattr(leaf, 'expression'):
        print(f"{leaf.name}: {leaf.expression[:5]}...")  # First 5 values

# Extract profile data back from tree
extracted_table = table.get_tree_profile(tree, attr_name="expression")
```

### Clustering Analysis

```python
from ete3 import ArrayTable

# Load expression data
expression_table = ArrayTable("gene_expression_matrix.txt")

# Perform hierarchical clustering
cluster_tree = expression_table.cluster_data(method="ward", metric="euclidean")

# Analyze clustering results
print(f"Clustering tree: {cluster_tree.get_ascii()}")

# Get distance matrix for further analysis
dist_matrix = expression_table.get_distance_matrix()
print(f"Distance matrix shape: {dist_matrix.shape}")
```

### Advanced Data Analysis

```python
from ete3 import ArrayTable, ClusterTree
import numpy as np

# Load and prepare data
table = ArrayTable("multi_condition_data.txt")

# Select specific conditions
selected_conditions = ["treatment1", "treatment2", "control"]
filtered_table = table.select_columns(selected_conditions)

# Normalize and filter
normalized = filtered_table.normalize(method="standard")

# Filter for genes with significant variation
def significant_variation(row_array):
    return np.max(row_array) - np.min(row_array) > 2.0

variable_genes = normalized.filter_rows(significant_variation)

# Cluster the filtered, normalized data
cluster_result = variable_genes.cluster_data(method="complete")

# Visualize clustering
cluster_result.show()

# Save processed data
variable_genes.write("filtered_normalized_data.txt")
```

### Custom Data Processing

```python
from ete3 import ArrayTable
import numpy as np

# Create table from Python data
data_matrix = np.random.rand(100, 20)  # 100 genes, 20 samples
row_names = [f"gene_{i}" for i in range(100)]
col_names = [f"sample_{i}" for i in range(20)]

# Initialize empty table and populate
table = ArrayTable()
table.matrix = data_matrix
table.rowNames = row_names
table.colNames = col_names
table.rowValues = {name: i for i, name in enumerate(row_names)}
table.colValues = {name: i for i, name in enumerate(col_names)}

# Apply custom transformations
log_transformed = table.matrix.copy()
log_transformed = np.log2(log_transformed + 1)  # log2(x+1) transformation

# Create new table with transformed data
log_table = ArrayTable()
log_table.matrix = log_transformed
log_table.rowNames = table.rowNames
log_table.colNames = table.colNames
log_table.rowValues = table.rowValues
log_table.colValues = table.colValues

# Save transformed data
log_table.write("log_transformed_data.txt")
```