Tessl Tile for pypi/ete3@3.1.0

# Data Tables and Arrays

ETE3's `ArrayTable` handles numerical data associated with trees and sequences. It supports matrix operations, statistical analysis, and integration with scientific computing workflows, and is built on NumPy.

## Capabilities

### ArrayTable Class

Main class for handling 2D numerical data with matrix operations and scientific computing integration.

```python { .api }
class ArrayTable:
    """
    Efficient 2D data table with matrix operations and scientific computing support.
    Built on NumPy for high performance numerical operations.
    """

    def __init__(self, matrix_file=None, mtype="float"):
        """
        Initialize array table.

        Parameters:
        - matrix_file (str): Path to matrix data file
        - mtype (str): Data type ("float", "int", "str")
        """

    def __len__(self):
        """Number of rows in table."""

    def __str__(self):
        """String representation of table."""
```

### Data Access and Retrieval

Methods for accessing rows, columns, and individual data elements.

```python { .api }
def get_column_array(self, colname):
    """
    Get column data as NumPy array.

    Parameters:
    - colname (str): Column name

    Returns:
    numpy.ndarray: Column data array
    """

def get_row_array(self, rowname):
    """
    Get row data as NumPy array.

    Parameters:
    - rowname (str): Row name

    Returns:
    numpy.ndarray: Row data array
    """

def get_several_column_arrays(self, colnames):
    """
    Get multiple columns as arrays.

    Parameters:
    - colnames (list): List of column names

    Returns:
    dict: Mapping from column names to arrays
    """

def get_several_row_arrays(self, rownames):
    """
    Get multiple rows as arrays.

    Parameters:
    - rownames (list): List of row names

    Returns:
    dict: Mapping from row names to arrays
    """

# Properties for data access
matrix: numpy.ndarray    # Underlying data matrix
colNames: list           # Column names
rowNames: list           # Row names
colValues: dict          # Column name to index mapping
rowValues: dict          # Row name to index mapping
```
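
The name-based accessors above rely on mappings from row and column names to matrix positions. The following is a minimal, dependency-free sketch of that lookup pattern using plain lists. The helpers `row_by_name` and `column_by_name` and the sample data are hypothetical and are not part of ETE3.

```python
# Illustrative data: 3 rows x 2 columns, stored as nested lists.
matrix = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
row_names = ["geneA", "geneB", "geneC"]
col_names = ["sample1", "sample2"]

# Name-to-index mappings, in the spirit of the documented
# rowValues/colValues properties (assumption: names are unique).
row_index = {name: i for i, name in enumerate(row_names)}
col_index = {name: j for j, name in enumerate(col_names)}


def row_by_name(name):
    """Return the row stored under `name` (hypothetical helper)."""
    return matrix[row_index[name]]


def column_by_name(name):
    """Return the column stored under `name` (hypothetical helper)."""
    j = col_index[name]
    return [row[j] for row in matrix]


print(row_by_name("geneB"))
print(column_by_name("sample2"))
```

Keeping the mappings next to the name lists makes a lookup a single dictionary access, but both structures have to be updated together whenever a row or column is added or removed.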

### Matrix Operations

Mathematical operations and transformations on the data matrix.

```python { .api }
def transpose(self):
    """
    Transpose the matrix (swap rows and columns).

    Returns:
    ArrayTable: New transposed table
    """

def remove_column(self, colname):
    """
    Remove column from table.

    Parameters:
    - colname (str): Column name to remove
    """

def remove_row(self, rowname):
    """
    Remove row from table.

    Parameters:
    - rowname (str): Row name to remove
    """

def add_column(self, colname, colvalues):
    """
    Add new column to table.

    Parameters:
    - colname (str): Name for new column
    - colvalues (array-like): Column data values
    """

def add_row(self, rowname, rowvalues):
    """
    Add new row to table.

    Parameters:
    - rowname (str): Name for new row
    - rowvalues (array-like): Row data values
    """
```
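
To make the effect of these operations on the row and column names concrete, here is a small plain-Python sketch. It is not ETE3's implementation; the standalone functions `transpose` and `remove_column` below are illustrative stand-ins that operate on nested lists.

```python
def transpose(matrix, row_names, col_names):
    """Return (matrix, row_names, col_names) with rows and columns swapped."""
    flipped = [list(col) for col in zip(*matrix)]
    # After transposing, the old column names label the rows and vice versa.
    return flipped, list(col_names), list(row_names)


def remove_column(matrix, col_names, name):
    """Return a new matrix and name list without the column `name`."""
    j = col_names.index(name)
    trimmed = [row[:j] + row[j + 1:] for row in matrix]
    return trimmed, col_names[:j] + col_names[j + 1:]


matrix = [[1, 2, 3], [4, 5, 6]]
rows = ["r1", "r2"]
cols = ["a", "b", "c"]

t_matrix, t_rows, t_cols = transpose(matrix, rows, cols)
smaller, smaller_cols = remove_column(matrix, cols, "b")

print(t_matrix, t_rows, t_cols)
print(smaller, smaller_cols)
```

Both helpers return new objects instead of modifying their inputs, which keeps the original table usable after the call.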

### File I/O Operations

Read and write table data in various formats.

```python { .api }
def write(self, fname=None, colnames=None):
    """
    Write table to file.

    Parameters:
    - fname (str): Output file path, if None returns string
    - colnames (list): Specific columns to write

    Returns:
    str: Formatted table string (if fname is None)
    """

def read(self, matrix_file, mtype="float", **kwargs):
    """
    Read table data from file.

    Parameters:
    - matrix_file (str): Input file path
    - mtype (str): Data type for parsing
    - kwargs: Additional parsing parameters
    """
```
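
The file layout is not specified above. As a rough sketch, the example below assumes a tab-delimited text file whose first line holds a label cell followed by the column names, and whose remaining lines each hold a row name followed by numeric values. The `#NAMES` label, the functions `parse_table` and `format_table`, and the sample text are assumptions made for illustration; check them against the files you actually use.

```python
import csv
import io


def parse_table(text):
    """Parse tab-delimited text into (row_names, col_names, matrix)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    col_names = header[1:]          # first cell is a label, not a column
    row_names, matrix = [], []
    for record in reader:
        if not record:              # skip blank lines
            continue
        row_names.append(record[0])
        matrix.append([float(value) for value in record[1:]])
    return row_names, col_names, matrix


def format_table(row_names, col_names, matrix, label="#NAMES"):
    """Serialize the table back to the same tab-delimited layout."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([label] + col_names)
    for name, row in zip(row_names, matrix):
        writer.writerow([name] + row)
    return out.getvalue()


# Assumed example input: two rows, two columns.
text = "#NAMES\ts1\ts2\ngeneA\t1.5\t2.0\ngeneB\t0.5\t4.0\n"
rows, cols, data = parse_table(text)
round_trip = format_table(rows, cols, data)
print(round_trip)
```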

### Statistical Operations

Built-in statistical analysis and data summary methods.

```python { .api }
def get_stats(self):
    """
    Calculate basic statistics for all columns.

    Returns:
    dict: Statistics including mean, std, min, max for each column
    """

def get_column_stats(self, colname):
    """
    Calculate statistics for specific column.

    Parameters:
    - colname (str): Column name

    Returns:
    dict: Column statistics (mean, std, min, max, etc.)
    """

def normalize(self, method="standard"):
    """
    Normalize data using specified method.

    Parameters:
    - method (str): Normalization method ("standard", "minmax", "robust")

    Returns:
    ArrayTable: Normalized table
    """
```
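
The exact formulas behind the statistics and normalization methods are not given above. The sketch below shows one common reading, using only the standard library: "standard" is taken to mean z-score scaling with the population standard deviation, and "minmax" to mean rescaling to the range 0 to 1. These interpretations, and the function names `column_stats`, `standardize`, and `min_max`, are assumptions.

```python
import statistics


def column_stats(values):
    """Summarize one column; std is the population standard deviation."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]   # constant column: nothing to scale
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]   # constant column: nothing to scale
    return [(v - low) / (high - low) for v in values]


column = [2.0, 4.0, 6.0, 8.0]
stats = column_stats(column)
z_scores = standardize(column)
scaled = min_max(column)
print(stats)
```

Whether the standard deviation is computed over the population or a sample changes the z-scores slightly, so it is worth confirming which convention a given tool uses before comparing results.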

### Data Filtering and Selection

Filter and select subsets of data based on criteria.

```python { .api }
def filter_columns(self, condition_func):
    """
    Filter columns based on condition function.

    Parameters:
    - condition_func (function): Function that takes column array, returns bool

    Returns:
    ArrayTable: Filtered table
    """

def filter_rows(self, condition_func):
    """
    Filter rows based on condition function.

    Parameters:
    - condition_func (function): Function that takes row array, returns bool

    Returns:
    ArrayTable: Filtered table
    """

def select_columns(self, colnames):
    """
    Select specific columns.

    Parameters:
    - colnames (list): Column names to select

    Returns:
    ArrayTable: Table with selected columns
    """

def select_rows(self, rownames):
    """
    Select specific rows.

    Parameters:
    - rownames (list): Row names to select

    Returns:
    ArrayTable: Table with selected rows
    """
```
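
The filtering methods above take a predicate that receives one column (or row) and returns a boolean. A minimal plain-Python sketch of that pattern follows. The standalone `filter_columns` function and the `value_range_above_one` predicate are hypothetical and only illustrate how such a predicate is applied.

```python
def filter_columns(matrix, col_names, keep):
    """Keep only the columns for which keep(column_values) is true."""
    columns = list(zip(*matrix))                      # column-wise view
    kept = [j for j, col in enumerate(columns) if keep(list(col))]
    new_matrix = [[row[j] for j in kept] for row in matrix]
    return new_matrix, [col_names[j] for j in kept]


def value_range_above_one(values):
    """Example predicate: true when max - min exceeds 1.0."""
    return max(values) - min(values) > 1.0


matrix = [[1.0, 5.0, 2.0], [1.2, 9.0, 2.1]]
cols = ["flat", "variable", "steady"]

filtered, kept_names = filter_columns(matrix, cols, value_range_above_one)
print(filtered, kept_names)
```

Filtering rows works the same way with the roles of rows and columns swapped.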

### Integration with Trees

Methods for associating tabular data with tree structures.

```python { .api }
def link_to_tree(self, tree, attr_name="profile"):
    """
    Link table data to tree nodes.

    Parameters:
    - tree (Tree): Tree to link data to
    - attr_name (str): Attribute name for storing data in nodes
    """

def get_tree_profile(self, tree, attr_name="profile"):
    """
    Extract profile data from tree nodes.

    Parameters:
    - tree (Tree): Tree with profile data
    - attr_name (str): Attribute name containing data

    Returns:
    ArrayTable: Table with tree profile data
    """
```
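
How rows are matched to tree nodes is not stated above. A natural assumption is that each row is attached to the leaf with the same name, and that leaves without a matching row are left untouched. The sketch below illustrates that assumption with a stand-in `Leaf` class and a hypothetical `link_profiles` function; neither is part of ETE3.

```python
class Leaf:
    """Stand-in for a tree leaf; only the name matters for this sketch."""

    def __init__(self, name):
        self.name = name


def link_profiles(leaves, row_names, matrix, attr_name="profile"):
    """Attach each matching row to its leaf; return names left unmatched."""
    rows = dict(zip(row_names, matrix))
    unmatched = []
    for leaf in leaves:
        if leaf.name in rows:
            setattr(leaf, attr_name, rows[leaf.name])
        else:
            unmatched.append(leaf.name)
    return unmatched


leaves = [Leaf("human"), Leaf("mouse"), Leaf("yeast")]
unmatched = link_profiles(
    leaves,
    row_names=["human", "mouse"],
    matrix=[[1.0, 2.0], [3.0, 4.0]],
    attr_name="expression",
)
print(unmatched)
```

Under this assumption some leaves may end up without the attribute, so code that reads it back should check for its presence first.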

## Clustering Integration

### ClusterTree with ArrayTable

Enhanced clustering functionality when combined with data tables.

```python { .api }
def get_distance_matrix(self):
    """
    Calculate distance matrix between rows.

    Returns:
    numpy.ndarray: Symmetric distance matrix
    """

def cluster_data(self, method="ward", metric="euclidean"):
    """
    Perform hierarchical clustering on data.

    Parameters:
    - method (str): Linkage method ("ward", "complete", "average", "single")
    - metric (str): Distance metric ("euclidean", "manhattan", "cosine")

    Returns:
    ClusterTree: Tree representing clustering hierarchy
    """
```
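
As an illustration of what a symmetric distance matrix between rows looks like, here is a small standard-library sketch using the Euclidean metric. The function `distance_matrix` and the sample profiles are hypothetical; the sketch does not reproduce ETE3's implementation or its other metrics.

```python
import math


def distance_matrix(rows):
    """Pairwise Euclidean distances between rows, as a symmetric matrix."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])   # requires Python 3.8+
            dist[i][j] = dist[j][i] = d       # fill both triangles
    return dist


profiles = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = distance_matrix(profiles)
for line in dist:
    print(line)
```

The diagonal stays at zero because each row is at distance zero from itself, and only the upper triangle is computed since the metric is symmetric.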

## Usage Examples

### Basic Table Operations

```python
from ete3 import ArrayTable
import numpy as np

# Create table from file
table = ArrayTable("data_matrix.txt", mtype="float")

# Basic properties
print(f"Table dimensions: {len(table.rowNames)} x {len(table.colNames)}")
print(f"Column names: {table.colNames}")
print(f"Row names: {table.rowNames}")

# Access data
col_data = table.get_column_array("column1")
row_data = table.get_row_array("row1")

print(f"Column1 stats: mean={np.mean(col_data):.2f}, std={np.std(col_data):.2f}")
```

### Data Manipulation

```python
from ete3 import ArrayTable

# Load data
table = ArrayTable("expression_data.txt")

# Remove unwanted columns/rows
table.remove_column("control_sample")
table.remove_row("uninformative_gene")

# Add new data (one value per row, so the length must match the row count)
new_column_data = [1.5, 2.3, 0.8, 3.1, 1.9]
table.add_column("new_condition", new_column_data)

# Transpose for different analysis perspective
transposed = table.transpose()

# Save results
table.write("modified_data.txt")
```

### Statistical Analysis

```python
from ete3 import ArrayTable
import numpy as np

table = ArrayTable("experimental_data.txt")

# Get overall statistics
stats = table.get_stats()
for col, col_stats in stats.items():
    print(f"{col}: mean={col_stats['mean']:.2f}, std={col_stats['std']:.2f}")

# Normalize data
normalized_table = table.normalize(method="standard")

# Filter based on criteria
def high_variance_filter(col_array):
    return np.var(col_array) > 1.0

high_var_table = table.filter_columns(high_variance_filter)
print(f"Filtered to {len(high_var_table.colNames)} high-variance columns")
```

### Integration with Trees

```python
from ete3 import ArrayTable, Tree

# Load data and tree
table = ArrayTable("gene_expression.txt")
tree = Tree("species_tree.nw")

# Link expression data to tree nodes
table.link_to_tree(tree, attr_name="expression")

# Access linked data
for leaf in tree.get_leaves():
    if hasattr(leaf, 'expression'):
        print(f"{leaf.name}: {leaf.expression[:5]}...")  # First 5 values

# Extract profile data back from tree
extracted_table = table.get_tree_profile(tree, attr_name="expression")
```

### Clustering Analysis

```python
from ete3 import ArrayTable

# Load expression data
expression_table = ArrayTable("gene_expression_matrix.txt")

# Perform hierarchical clustering
cluster_tree = expression_table.cluster_data(method="ward", metric="euclidean")

# Analyze clustering results
print(f"Clustering tree: {cluster_tree.get_ascii()}")

# Get distance matrix for further analysis
dist_matrix = expression_table.get_distance_matrix()
print(f"Distance matrix shape: {dist_matrix.shape}")
```

### Advanced Data Analysis

```python
from ete3 import ArrayTable, ClusterTree
import numpy as np

# Load and prepare data
table = ArrayTable("multi_condition_data.txt")

# Select specific conditions
selected_conditions = ["treatment1", "treatment2", "control"]
filtered_table = table.select_columns(selected_conditions)

# Normalize and filter
normalized = filtered_table.normalize(method="standard")

# Filter for genes with significant variation
def significant_variation(row_array):
    return np.max(row_array) - np.min(row_array) > 2.0

variable_genes = normalized.filter_rows(significant_variation)

# Cluster the filtered, normalized data
cluster_result = variable_genes.cluster_data(method="complete")

# Visualize clustering
cluster_result.show()

# Save processed data
variable_genes.write("filtered_normalized_data.txt")
```

### Custom Data Processing

```python
from ete3 import ArrayTable
import numpy as np

# Create table from Python data
data_matrix = np.random.rand(100, 20)  # 100 genes, 20 samples
row_names = [f"gene_{i}" for i in range(100)]
col_names = [f"sample_{i}" for i in range(20)]

# Initialize empty table and populate
table = ArrayTable()
table.matrix = data_matrix
table.rowNames = row_names
table.colNames = col_names
table.rowValues = {name: i for i, name in enumerate(row_names)}
table.colValues = {name: i for i, name in enumerate(col_names)}

# Apply custom transformations
# log2(x+1) transformation; np.log2 returns a new array, so table.matrix is unchanged
log_transformed = np.log2(table.matrix + 1)

# Create new table with transformed data
log_table = ArrayTable()
log_table.matrix = log_transformed
log_table.rowNames = table.rowNames
log_table.colNames = table.colNames
log_table.rowValues = table.rowValues
log_table.colValues = table.colValues

# Save transformed data
log_table.write("log_transformed_data.txt")
```
