tessl/pypi-datatable

Python package for manipulating 2-dimensional tabular data structures with emphasis on speed and big data support

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/datatable@1.1.x

To install, run `npx @tessl/cli install tessl/pypi-datatable@1.1.0`.

# datatable

datatable is a high-performance Python library for manipulating two-dimensional tabular data, with an emphasis on speed and support for datasets of up to 100 GB on a single-node machine. It features column-oriented storage with a native C implementation, fast CSV reading, multi-threaded processing, and an expressive query syntax similar to R's data.table.

## Package Information

- **Package Name**: datatable
- **Language**: Python
- **Installation**: `pip install datatable`

## Core Imports

```python
import datatable as dt
from datatable import f, g, by, join
```

Common pattern for data manipulation:

```python
import datatable as dt
from datatable import f, g, by
```

## Basic Usage

```python
import datatable as dt
from datatable import f, g, by

# Read data from CSV
DT = dt.fread("data.csv")

# Create a Frame from data
DT = dt.Frame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, 5.5]
})

# Basic operations
result = DT[:, f.A]          # Select column A
result = DT[f.A > 2, :]      # Filter rows where A > 2
result = DT[:, dt.sum(f.A)]  # Aggregate sum of column A

# Groupby operations
result = DT[:, dt.sum(f.A), by(f.B)]  # Sum A grouped by B

# Update operations
DT[:, dt.update(D=f.A * 2)]  # Add new column D in place

# Join operations
DT2 = dt.Frame({'B': ['a', 'b'], 'X': [10, 20]})
DT2.key = 'B'                    # The joined frame must be keyed
result = DT[:, :, dt.join(DT2)]  # Left join on the key column B
```

## Architecture

datatable follows a columnar storage architecture for performance:

- **Frame**: Main data structure representing a 2D table with column-oriented storage
- **Expression System**: f/g objects for column references and expression building
- **Type System**: Comprehensive stype/ltype system for precise data type control
- **Native-C Core**: Performance-critical operations implemented in C for speed
- **Memory Mapping**: Support for out-of-memory operations on large datasets

The library is designed specifically for machine learning applications requiring fast feature generation from large datasets, with copy-on-write semantics and rowindex views to minimize data copying.
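
The row-index idea mentioned above can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not datatable's implementation: `ColumnStore` and `RowIndexView` are hypothetical names invented for the example, and the real library does this work in native code.

```python
from typing import Dict, Sequence


class ColumnStore:
    """Toy column-oriented table: one Python list per column (hypothetical)."""

    def __init__(self, columns: Dict[str, list]):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.nrows = lengths.pop() if lengths else 0

    def where(self, name: str, predicate) -> "RowIndexView":
        # Scan a single column and record only the matching row positions.
        rows = [i for i, v in enumerate(self.columns[name]) if predicate(v)]
        return RowIndexView(self, rows)


class RowIndexView:
    """Filtered view that shares the parent's columns and stores row positions."""

    def __init__(self, parent: ColumnStore, rows: Sequence[int]):
        self.parent = parent
        self.rows = list(rows)

    def column(self, name: str) -> list:
        # Values are gathered from the shared column only when requested.
        source = self.parent.columns[name]
        return [source[i] for i in self.rows]


store = ColumnStore({"A": [1, 2, 3, 4, 5], "B": ["a", "b", "c", "d", "e"]})
view = store.where("A", lambda v: v > 2)
print(view.rows)         # [2, 3, 4]
print(view.column("B"))  # ['c', 'd', 'e']
```

Filtering here touches only column `A` and produces a list of row positions; column `B` is never copied until it is read through the view.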

## Capabilities

### Core Data Structure

The Frame class provides the main interface for tabular data manipulation with high-performance columnar storage and comprehensive data type support.

```python { .api }
class Frame:
    def __init__(self, data=None, *, names=None, stypes=None,
                 stype=None, types=None, type=None): ...

    @property
    def shape(self) -> tuple: ...
    @property
    def names(self) -> tuple: ...
    @property
    def stypes(self) -> tuple: ...

    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
```

[Core Data Structures](./core-data-structures.md)

### Expression System

Column references and expression building using f and g objects for flexible data queries and transformations.

```python { .api }
# Column reference objects
f: object  # Primary column reference
g: object  # Secondary column reference (for joins)

class FExpr:
    """Expression object for column operations"""
    pass

class Namespace:
    """Namespace for organizing column references"""
    pass
```

[Expression System](./expression-system.md)

### File I/O Operations

High-performance reading and writing of various file formats with automatic type detection and memory-efficient processing.

```python { .api }
def fread(anysource=None, *, file=None, text=None, cmd=None,
          url=None, **kwargs) -> Frame: ...

def iread(anysource=None, *, file=None, text=None, cmd=None,
          url=None, **kwargs): ...  # Iterator version
```

[File I/O](./file-io.md)

### Data Manipulation Functions

Comprehensive set of functions for combining, transforming, and reshaping data frames.

```python { .api }
def cbind(*frames) -> Frame: ...
def rbind(*frames, force=False, bynames=True) -> Frame: ...
def unique(frame, *cols) -> Frame: ...
def sort(frame, *cols) -> Frame: ...
def update(**kwargs): ...  # Update/add columns
def fillna(frame, value): ...  # Fill missing values
def repeat(frame, n): ...  # Repeat rows n times
def shift(frame, n): ...  # Shift values by n positions
```

[Data Manipulation](./data-manipulation.md)

### Reduction and Aggregation

Statistical and mathematical reduction functions for data analysis and aggregation operations.

```python { .api }
def sum(expr): ...
def mean(expr): ...
def count(expr=None): ...
def min(expr): ...
def max(expr): ...
def median(expr): ...
def sd(expr): ...  # Standard deviation
def nunique(expr): ...
```

[Reductions and Aggregations](./reductions-aggregations.md)

### Mathematical Functions

Comprehensive mathematical operations including trigonometric, logarithmic, and statistical functions.

```python { .api }
def abs(x): ...
def exp(x): ...
def log(x): ...
def log10(x): ...
def sqrt(x): ...
def isna(x): ...
def ifelse(condition, x, y): ...  # Conditional selection
```

[Mathematical Functions](./mathematical-functions.md)

### Set Operations

Mathematical set operations for combining and comparing data frames.

```python { .api }
def union(*frames) -> Frame: ...
def intersect(*frames) -> Frame: ...
def setdiff(frame1, frame2) -> Frame: ...
def symdiff(frame1, frame2) -> Frame: ...
```

[Set Operations](./set-operations.md)

### Row-wise Operations

Functions that compute one value per row from the values in a set of columns.

```python { .api }
def rowall(*cols): ...
def rowany(*cols): ...
def rowcount(*cols): ...
def rowsum(*cols): ...
def rowmean(*cols): ...
```

[Row-wise Operations](./row-operations.md)

### String Operations

Text processing and manipulation functions for string columns.

```python { .api }
# String module functions
def len(x): ...  # String length
def slice(x, start, stop=None): ...  # String slicing
```

[String Operations](./string-operations.md)

### Time Operations

Date and time manipulation functions for temporal data analysis.

```python { .api }
def year(x): ...
def month(x): ...
def day(x): ...
def hour(x): ...
def minute(x): ...
def second(x): ...
```

[Time Operations](./time-operations.md)

### Type System and Conversion

Comprehensive type system with storage types (stype) and logical types (ltype) for precise data type control.

```python { .api }
class stype(Enum):
    void = 0
    bool8 = 1
    int8 = 2
    int16 = 3
    int32 = 4
    int64 = 5
    float32 = 6
    float64 = 7
    str32 = 11
    str64 = 12
    obj64 = 21

def as_type(frame, new_type): ...
```

[Type System](./type-system.md)

### Data Binning and Encoding

Functions for data discretization and categorical encoding operations.

```python { .api }
def cut(x, bins, right=True, labels=None): ...  # Bin values into discrete intervals
def qcut(x, q, labels=None): ...  # Quantile-based discretization
def split_into_nhot(frame, delimiter=","): ...  # One-hot encoding for delimited strings
```

## Global Objects

```python { .api }
# Module alias
dt = datatable  # Common alias for the datatable module

# Configuration
options: Config  # Global configuration system

# Display initialization
def init_styles(): ...  # Initialize display styles (auto-run in Jupyter)
```