# Pandas Integration

Drop-in replacement for pandas with distributed computing capabilities. Xorbits pandas provides the same API as pandas while enabling computation on datasets that exceed single-machine memory through distributed processing.

## Capabilities

### Core Data Structures

The fundamental data structures that mirror pandas DataFrame, Series, and Index with distributed capabilities.

```python { .api }
class DataFrame:
    """
    Distributed DataFrame with pandas-compatible API.

    Provides all pandas DataFrame functionality with automatic distribution
    across multiple workers for scalable data processing.
    """

class Series:
    """
    Distributed Series with pandas-compatible API.

    One-dimensional labeled array capable of holding any data type,
    distributed across multiple workers.
    """

class Index:
    """
    Distributed Index with pandas-compatible API.

    Immutable sequence used for indexing and alignment,
    supporting distributed operations.
    """
```
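To make the relationship between the three structures concrete, here is a minimal sketch. It uses plain pandas so the values are easy to check locally; since Xorbits is described above as a drop-in replacement, the assumption is that swapping the import for `import xorbits.pandas as pd` behaves the same way. The column names and data are made up for illustration.

```python
import pandas as pd  # assumption: `import xorbits.pandas as pd` is interchangeable here

# A DataFrame is a table of labeled columns.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.5, 19.0, 27.5]})

# Selecting one column yields a Series that shares the frame's row index.
temps = df["temp_c"]

# An Index holds the row labels; here it replaces the default 0..n-1 labels.
labels = pd.Index(["a", "b", "c"])
df_labeled = df.set_index(labels)

# Label-based lookups go through the Index.
warmest = df_labeled["temp_c"].idxmax()
print(warmest)
```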

### Data Types and Time Components

Pandas-compatible data types and time-related classes for working with temporal data.

```python { .api }
class Timedelta:
    """Time delta class for representing durations."""

class DateOffset:
    """Date offset class for date arithmetic."""

class Interval:
    """Interval class for representing intervals between values."""

class Timestamp:
    """Timestamp class for representing points in time."""

NaT: object
"""Not-a-Time constant for missing time values."""

NA: object
"""Missing value indicator (pandas >= 1.0)."""

class NamedAgg:
    """Named aggregation class for groupby operations (pandas >= 1.0)."""

class ArrowDtype:
    """Arrow data type for PyArrow integration (pandas >= 1.5)."""
```
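The time-related classes above combine through ordinary arithmetic. The following sketch uses plain pandas, on the assumption that `xorbits.pandas` exposes the same classes with the same behavior; the dates and the `sales` frame are invented for illustration.

```python
import pandas as pd  # assumption: `import xorbits.pandas as pd` is interchangeable here

start = pd.Timestamp("2024-01-31 09:00")

# Timedelta is a fixed duration: 36 hours later is the evening of Feb 1.
end = start + pd.Timedelta(hours=36)

# DateOffset follows the calendar: one month after Jan 31 is clamped to Feb 29
# because 2024 is a leap year.
next_month = start + pd.DateOffset(months=1)

# Interval is closed on the right by default: (start, end].
window = pd.Interval(start, end)
inside = pd.Timestamp("2024-02-01") in window

# NaT is the missing-value marker for timestamps.
missing = pd.isna(pd.NaT)

# NamedAgg names the output column of a groupby aggregation.
sales = pd.DataFrame({"region": ["n", "n", "s"], "amount": [10, 20, 5]})
summary = sales.groupby("region").agg(
    total=pd.NamedAgg(column="amount", aggfunc="sum")
)
print(summary)
```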

### Configuration Functions

Configuration management specific to pandas operations, mirroring the pandas options system.

```python { .api }
def describe_option(option_name: str) -> None:
    """
    Describe a configuration option.

    Parameters:
    - option_name: Name of the option to describe
    """

def get_option(option_name: str):
    """
    Get the value of a configuration option.

    Parameters:
    - option_name: Name of the option to retrieve

    Returns:
    - Current value of the option
    """

def set_option(option_name: str, value) -> None:
    """
    Set the value of a configuration option.

    Parameters:
    - option_name: Name of the option to set
    - value: New value for the option
    """

def reset_option(option_name: str) -> None:
    """
    Reset a configuration option to its default value.

    Parameters:
    - option_name: Name of the option to reset
    """

def option_context(*args, **kwargs):
    """
    Context manager for temporarily changing pandas options.

    Parameters:
    - *args: Option names and values as alternating arguments
    - **kwargs: Option names and values as keyword arguments

    Returns:
    - Context manager for temporary option changes
    """

def set_eng_float_format(format_string: str) -> None:
    """
    Set engineering float format for display.

    Parameters:
    - format_string: Format string for engineering notation
    """
```

### Specialized Modules

Access to pandas specialized functionality through submodules.

```python { .api }
# Submodules providing specialized functionality
accessors  # DataFrame and Series accessor functionality
core       # Core pandas data structures
groupby    # GroupBy functionality
plotting   # Plotting functionality
window     # Window operations
offsets    # Date offset functionality
```
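In everyday code these areas are usually reached through methods and attributes rather than by importing the submodules directly. The sketch below shows the user-facing side of accessors, window operations, date offsets, and groupby. It uses plain pandas; that `xorbits.pandas` routes the same calls through the submodules listed above is an assumption, not something this page confirms.

```python
import pandas as pd  # assumption: `import xorbits.pandas as pd` is interchangeable here

# Accessors: the .str namespace applies string methods element-wise.
names = pd.Series(["alpha", "beta", "gamma"])
upper = names.str.upper()

# Window operations: a rolling mean over pairs of neighboring values.
values = pd.Series([1.0, 2.0, 3.0, 4.0])
rolling_mean = values.rolling(window=2).mean()  # first entry is NaN

# Offsets: roll a date forward to the end of its month.
month_end = pd.Timestamp("2024-02-10") + pd.offsets.MonthEnd(1)

# GroupBy: sum values per key.
frame = pd.DataFrame({"k": ["x", "y", "x"], "v": [1, 2, 3]})
totals = frame.groupby("k")["v"].sum()
print(totals)
```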

### Dynamic Function Access

All pandas module-level functions are available through dynamic import, including but not limited to:

```python { .api }
# Data I/O functions
def read_csv(filepath_or_buffer, **kwargs): ...
def read_parquet(path, **kwargs): ...
def read_json(path_or_buf, **kwargs): ...
def read_excel(io, **kwargs): ...
def read_sql(sql, con, **kwargs): ...
def read_pickle(filepath_or_buffer, **kwargs): ...

# Data manipulation functions
def concat(objs, **kwargs): ...
def merge(left, right, **kwargs): ...
def merge_asof(left, right, **kwargs): ...
def crosstab(index, columns, **kwargs): ...
def pivot_table(data, **kwargs): ...
def melt(frame, **kwargs): ...

# Utility functions
def cut(x, bins, **kwargs): ...
def qcut(x, q, **kwargs): ...
def get_dummies(data, **kwargs): ...
def factorize(values, **kwargs): ...
def unique(values): ...
def value_counts(values, **kwargs): ...

# Date/time utilities
def date_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def period_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def timedelta_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def to_datetime(arg, **kwargs): ...
def to_timedelta(arg, **kwargs): ...
def to_numeric(arg, **kwargs): ...
```

**Usage Examples:**

```python
import xorbits
import xorbits.pandas as pd
import xorbits.numpy as np

xorbits.init()

# Creating DataFrames (same as pandas)
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, 5.5]
})

# Second frame sharing column 'B' with df, used by merge and concat below
other_df = pd.DataFrame({'B': ['a', 'b', 'c'], 'D': [10, 20, 30]})

# Reading data (same as pandas); 'data.csv' is a placeholder path
df_from_csv = pd.read_csv('data.csv')

# Data manipulation (same as pandas)
grouped = df.groupby('B').agg({'A': 'sum', 'C': 'mean'})
merged = pd.merge(df, other_df, on='B')
concatenated = pd.concat([df, other_df])

# Operations chain the same way as in pandas
result = df.query('A > 2').sort_values('C').head(10)

# Execute computation
computed = xorbits.run(result)

xorbits.shutdown()
```
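The utility and date/time helpers listed under Dynamic Function Access are not covered by the example above, so here is a short sketch of a few of them. It runs against plain pandas, assuming `xorbits.pandas` forwards these module-level functions unchanged; the sample values are invented.

```python
import pandas as pd  # assumption: `import xorbits.pandas as pd` is interchangeable here

# cut: assign values to fixed, right-closed bins (0, 18] and (18, 65].
ages = pd.Series([5, 17, 30, 64])
groups = pd.cut(ages, bins=[0, 18, 65], labels=["minor", "adult"])

# get_dummies: one indicator column per distinct value.
dummies = pd.get_dummies(pd.Series(["red", "blue", "red"]))

# date_range: three consecutive days starting on 2024-01-01.
days = pd.date_range(start="2024-01-01", periods=3, freq="D")

# to_datetime / to_numeric: errors="coerce" turns unparseable input into NaT / NaN.
parsed = pd.to_datetime(["2024-01-02", "not a date"], errors="coerce")
numbers = pd.to_numeric(pd.Series(["1", "2.5", "x"]), errors="coerce")
print(numbers)
```

Using `errors="coerce"` keeps a single bad record from aborting a large load, at the cost of having to check for missing values afterwards.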

### Configuration Usage

```python
import xorbits.pandas as pd

# Get current display options
max_rows = pd.get_option('display.max_rows')

# Set display options
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)

# Use option context for temporary changes
large_dataframe = pd.DataFrame({'x': list(range(1000))})
with pd.option_context('display.max_rows', 20):
    print(large_dataframe)  # Shows at most 20 rows

# Reset options
pd.reset_option('display.max_rows')
```
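One property of `option_context` worth checking is that it restores the previous values on exit, so temporary display changes do not leak into later code. A minimal sketch, written against plain pandas on the assumption that the Xorbits wrappers behave identically:

```python
import pandas as pd  # assumption: `import xorbits.pandas as pd` is interchangeable here

before = pd.get_option("display.max_rows")

# Several options can be changed at once by alternating names and values.
with pd.option_context("display.max_rows", 5, "display.precision", 2):
    inside = pd.get_option("display.max_rows")

# On leaving the block the earlier value is back in effect.
after = pd.get_option("display.max_rows")
print(before, inside, after)
```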