Tessl Tile for pypi/xorbits@0.8.0

# Pandas Integration

`xorbits.pandas` is a drop-in replacement for pandas with distributed computing capabilities. It exposes the pandas API while distributing work across multiple workers, which enables computation on datasets that exceed single-machine memory.

## Capabilities

### Core Data Structures

The fundamental data structures mirror the pandas DataFrame, Series, and Index and add distributed capabilities.

```python { .api }
class DataFrame:
    """
    Distributed DataFrame with pandas-compatible API.

    Provides all pandas DataFrame functionality with automatic distribution
    across multiple workers for scalable data processing.
    """

class Series:
    """
    Distributed Series with pandas-compatible API.

    One-dimensional labeled array capable of holding any data type,
    distributed across multiple workers.
    """

class Index:
    """
    Distributed Index with pandas-compatible API.

    Immutable sequence used for indexing and alignment,
    supporting distributed operations.
    """
```
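
As a minimal sketch of how these three structures relate, the snippet below builds a small DataFrame, derives a Series from one column, and reads back the Index. It assumes Xorbits objects offer `to_pandas()` for fetching local results. The `to_local` helper and the fallback to plain pandas when Xorbits is not installed are illustrative additions, not part of the documented API.

```python
# Assumption: fall back to plain pandas so the sketch also runs without Xorbits.
try:
    import xorbits.pandas as pd
except ImportError:
    import pandas as pd


def to_local(obj):
    """Hypothetical helper: fetch a local pandas object if `to_pandas` exists."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


# DataFrame: two-dimensional table with labeled rows and columns.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": ["x", "y", "z"]},
    index=[10, 20, 30],
)

# Series: a single column, here doubled element-wise.
doubled = df["A"] * 2

# Index: the row labels shared by the DataFrame and the derived Series.
row_labels = df.index

doubled_values = to_local(doubled).tolist()
label_values = list(to_local(row_labels))
print(doubled_values, label_values)
```

The derived Series keeps the row labels of the DataFrame it came from, which is what makes label-based alignment possible.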

### Data Types and Time Components

Pandas-compatible data types and time-related classes for working with temporal data.

```python { .api }
class Timedelta:
    """Time delta class for representing durations."""

class DateOffset:
    """Date offset class for date arithmetic."""

class Interval:
    """Interval class for representing intervals between values."""

class Timestamp:
    """Timestamp class for representing points in time."""

NaT: object
"""Not-a-Time constant for missing time values."""

NA: object
"""Missing value indicator (pandas >= 1.0)."""

class NamedAgg:
    """Named aggregation class for groupby operations (pandas >= 1.0)."""

class ArrowDtype:
    """Arrow data type for PyArrow integration (pandas >= 1.5)."""
```
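
The difference between a fixed duration and a calendar offset is easiest to see side by side. The sketch below assumes these names behave as they do in pandas. The fallback import for environments without Xorbits is an illustrative addition.

```python
# Assumption: fall back to plain pandas so the sketch also runs without Xorbits.
try:
    import xorbits.pandas as pd
except ImportError:
    import pandas as pd

start = pd.Timestamp("2024-01-31")

# Timedelta is a fixed duration: exactly one day later.
next_day = start + pd.Timedelta(days=1)

# DateOffset follows the calendar: one month later, clamped to the month end.
month_later = start + pd.DateOffset(months=1)

# An Interval closed on the left contains `start` but not `next_day`.
window = pd.Interval(start, next_day, closed="left")

# NaT marks a missing time value and never compares equal, even to itself.
nat_equals_itself = pd.NaT == pd.NaT

print(next_day.date(), month_later.date(), start in window, next_day in window)
```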

### Configuration Functions

Configuration management specific to pandas operations, mirroring the pandas options system.

```python { .api }
def describe_option(option_name: str) -> None:
    """
    Describe a configuration option.

    Parameters:
    - option_name: Name of the option to describe
    """

def get_option(option_name: str):
    """
    Get the value of a configuration option.

    Parameters:
    - option_name: Name of the option to retrieve

    Returns:
    - Current value of the option
    """

def set_option(option_name: str, value) -> None:
    """
    Set the value of a configuration option.

    Parameters:
    - option_name: Name of the option to set
    - value: New value for the option
    """

def reset_option(option_name: str) -> None:
    """
    Reset a configuration option to its default value.

    Parameters:
    - option_name: Name of the option to reset
    """

def option_context(*args, **kwargs):
    """
    Context manager for temporarily changing pandas options.

    Parameters:
    - *args: Option names and values as alternating arguments
    - **kwargs: Option names and values as keyword arguments

    Returns:
    - Context manager for temporary option changes
    """

def set_eng_float_format(accuracy: int = 3, use_eng_prefix: bool = False) -> None:
    """
    Set engineering float format for display.

    Parameters:
    - accuracy: Number of decimal digits after the floating point
    - use_eng_prefix: Whether to use SI prefixes (e.g. "k", "M") instead of exponents
    """
```
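
The sketch below contrasts temporary and persistent changes: `option_context` restores the previous values when the block exits, while `set_option` stays in effect until `reset_option` is called. It assumes these functions follow pandas semantics. The fallback import for environments without Xorbits is an illustrative addition.

```python
# Assumption: fall back to plain pandas so the sketch also runs without Xorbits.
try:
    import xorbits.pandas as pd
except ImportError:
    import pandas as pd

default_rows = pd.get_option("display.max_rows")

# Options are passed as alternating name/value pairs.
with pd.option_context("display.max_rows", 5, "display.max_columns", 3):
    rows_inside = pd.get_option("display.max_rows")
    cols_inside = pd.get_option("display.max_columns")

# Leaving the block restores the previous value.
rows_after = pd.get_option("display.max_rows")

# set_option persists until reset_option restores the registered default.
pd.set_option("display.precision", 3)
precision_set = pd.get_option("display.precision")
pd.reset_option("display.precision")
precision_reset = pd.get_option("display.precision")

print(rows_inside, cols_inside, rows_after == default_rows, precision_set, precision_reset)
```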

### Specialized Modules

Access to pandas specialized functionality through submodules.

```python { .api }
# Submodules providing specialized functionality
accessors  # DataFrame and Series accessor functionality
core       # Core pandas data structures
groupby    # GroupBy functionality
plotting   # Plotting functionality
window     # Window operations
offsets    # Date offset functionality
```

### Dynamic Function Access

All pandas module-level functions are available through dynamic import, including but not limited to:

```python { .api }
# Data I/O functions
def read_csv(filepath_or_buffer, **kwargs): ...
def read_parquet(path, **kwargs): ...
def read_json(path_or_buf, **kwargs): ...
def read_excel(io, **kwargs): ...
def read_sql(sql, con, **kwargs): ...
def read_pickle(filepath_or_buffer, **kwargs): ...

# Data manipulation functions
def concat(objs, **kwargs): ...
def merge(left, right, **kwargs): ...
def merge_asof(left, right, **kwargs): ...
def crosstab(index, columns, **kwargs): ...
def pivot_table(data, **kwargs): ...
def melt(frame, **kwargs): ...

# Utility functions
def cut(x, bins, **kwargs): ...
def qcut(x, q, **kwargs): ...
def get_dummies(data, **kwargs): ...
def factorize(values, **kwargs): ...
def unique(values): ...
def value_counts(values, **kwargs): ...

# Date/time utilities
def date_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def period_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def timedelta_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def to_datetime(arg, **kwargs): ...
def to_timedelta(arg, **kwargs): ...
def to_numeric(arg, **kwargs): ...
```

**Usage Examples:**

```python
import xorbits
import xorbits.pandas as pd

xorbits.init()

# Creating DataFrames (same as pandas)
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [1.1, 2.2, 3.3, 4.4, 5.5]
})
# Second frame sharing column 'B', used for the merge and concat below
other_df = pd.DataFrame({'B': ['a', 'b', 'c'], 'D': [10, 20, 30]})

# Reading data (same as pandas); assumes a local file named data.csv exists
df_from_csv = pd.read_csv('data.csv')

# Data manipulation (same as pandas)
grouped = df.groupby('B').agg({'A': 'sum', 'C': 'mean'})
merged = pd.merge(df, other_df, on='B')
concatenated = pd.concat([df, other_df])

# Method chaining works as it does in pandas
result = df.query('A > 2').sort_values('C').head(10)

# Execute computation
computed = xorbits.run(result)

xorbits.shutdown()
```
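
The utility and date/time helpers listed under Dynamic Function Access are called the same way. The sketch below bins values with `cut`, one-hot encodes a column with `get_dummies`, and builds a daily range with `date_range`. It assumes these functions accept the same arguments as in pandas and that Xorbits objects offer `to_pandas()`. The `to_local` helper and the fallback import are illustrative additions.

```python
# Assumption: fall back to plain pandas so the sketch also runs without Xorbits.
try:
    import xorbits.pandas as pd
except ImportError:
    import pandas as pd


def to_local(obj):
    """Hypothetical helper: fetch a local pandas object if `to_pandas` exists."""
    return obj.to_pandas() if hasattr(obj, "to_pandas") else obj


df = pd.DataFrame({"score": [5, 55, 95], "group": ["a", "b", "a"]})

# cut: bin numeric values into the labeled intervals (0, 50] and (50, 100].
bands = pd.cut(df["score"], bins=[0, 50, 100], labels=["low", "high"])

# get_dummies: one indicator column per distinct value.
dummies = pd.get_dummies(df["group"])

# date_range: three consecutive daily timestamps.
days = pd.date_range(start="2024-01-01", periods=3, freq="D")

band_labels = [str(label) for label in to_local(bands)]
dummy_columns = list(to_local(dummies).columns)
day_strings = [str(day.date()) for day in to_local(days)]
print(band_labels, dummy_columns, day_strings)
```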

### Configuration Usage

```python
import xorbits.pandas as pd

# Example frame with more rows than the temporary display limit below
large_dataframe = pd.DataFrame({'x': list(range(1000))})

# Get current display options
max_rows = pd.get_option('display.max_rows')

# Set display options
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)

# Use option context for temporary changes
with pd.option_context('display.max_rows', 20):
    print(large_dataframe)  # Output is truncated to at most 20 rows

# Reset options
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
```
