# Pandas

Pandas is a Python data analysis library built around labeled data structures for structured (tabular) and time series data. It covers data cleaning, transformation, and analysis, including alignment, merging, reshaping, grouping, and statistical operations.

## Package Information

- **Package Name**: pandas
- **Package Type**: library
- **Language**: Python
- **Installation**: `pip install pandas`

## Core Imports

```python
import pandas as pd
```

Common imports for specific functionality:

```python
import pandas as pd
from pandas import DataFrame, Series, Index
```

## Basic Usage

```python
import pandas as pd
import numpy as np

# Create a DataFrame from a dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'city': ['New York', 'London', 'Tokyo', 'Paris'],
    'salary': [50000, 60000, 70000, 55000]
}
df = pd.DataFrame(data)

# Basic operations
print(df.head())      # Display first 5 rows
df.info()             # Print DataFrame info (info() prints directly and returns None)
print(df.describe())  # Statistical summary

# Data selection and filtering
young_employees = df[df['age'] < 30]
high_earners = df[df['salary'] > 55000]

# Create a Series
ages = pd.Series([25, 30, 35, 28], name='ages')
print(ages.mean())  # Calculate mean age

# Read data from files (assumes data.csv and data.xlsx exist;
# read_excel also needs an Excel engine such as openpyxl installed)
df_csv = pd.read_csv('data.csv')
df_excel = pd.read_excel('data.xlsx')

# Basic data manipulation
df['bonus'] = df['salary'] * 0.1  # Add new column
df_sorted = df.sort_values('salary')  # Sort by salary
df_grouped = df.groupby('city')['salary'].mean()  # Group and aggregate
```

## Architecture

Pandas is built around three fundamental data structures:

- **Series**: One-dimensional labeled array capable of holding any data type
- **DataFrame**: Two-dimensional labeled data structure with heterogeneous columns
- **Index**: Immutable sequence used for indexing and alignment

Pandas is built on NumPy and gets much of its performance from vectorized operations on whole arrays rather than Python-level loops. It is widely used across the Python data science ecosystem and works with Jupyter notebooks, matplotlib, scikit-learn, and many domain-specific analysis libraries.
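
The relationship between the three structures can be sketched as follows. This is a minimal illustration, not a canonical example; the sample values and variable names (`prices`, `quantities`, `frame`) are made up for the purpose.

```python
import numpy as np
import pandas as pd

# A Series pairs values with an Index of labels.
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"], name="price")
quantities = pd.Series([1, 2, 3], index=["c", "b", "a"], name="qty")

# Arithmetic aligns on labels, not positions: "a" pairs with "a", and so on.
totals = prices * quantities

# A DataFrame is a collection of Series that share one Index.
frame = pd.DataFrame({"price": prices, "qty": quantities})

# Vectorized NumPy functions apply to whole columns without a Python loop.
frame["log_price"] = np.log(frame["price"])

# An Index is immutable; item assignment raises TypeError.
try:
    frame.index[0] = "z"
    index_is_mutable = True
except TypeError:
    index_is_mutable = False

print(totals)
print(frame)
```

Label alignment is why `totals` pairs each price with the quantity that has the same label, even though the two Series were created in different orders.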


## Capabilities

### Core Data Structures

DataFrame, Series, and the various Index types are the fundamental data structures of pandas. They provide the building blocks for all data manipulation operations.

```python { .api }
class DataFrame:
    def __init__(self, data=None, index=None, columns=None, dtype=None, copy=None): ...

class Series:
    def __init__(self, data=None, index=None, dtype=None, name=None, copy=None, fastpath=False): ...

class Index:
    def __init__(self, data=None, dtype=None, copy=False, name=None, tupleize_cols=True): ...
```

[Core Data Structures](./core-data-structures.md)
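
As a rough sketch of how the constructor parameters listed above fit together, the following builds one object of each kind. The data and the labels (`row1`, `score`, `key`) are illustrative assumptions only.

```python
import pandas as pd

# DataFrame from a list of rows, with explicit row labels and column names.
df = pd.DataFrame(
    data=[[1, 2.5], [3, 4.5]],
    index=["row1", "row2"],
    columns=["count", "weight"],
)

# Series with an explicit index, dtype, and name.
s = pd.Series(data=[1, 2, 3], index=["x", "y", "z"], dtype="float64", name="score")

# Index with a name; it can label the rows of another object.
idx = pd.Index(["x", "y", "z"], name="key")
s_relabelled = pd.Series([10, 20, 30], index=idx)

print(df.dtypes)
print(s)
print(s_relabelled.index.name)
```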

### Data Input/Output

Comprehensive I/O capabilities for reading and writing data in various formats including CSV, Excel, JSON, SQL databases, HDF5, Parquet, and many statistical file formats.

```python { .api }
def read_csv(filepath_or_buffer, **kwargs): ...
def read_excel(io, **kwargs): ...
def read_json(path_or_buf, **kwargs): ...
def read_sql(sql, con, **kwargs): ...
def read_parquet(path, **kwargs): ...
```

[Data Input/Output](./data-io.md)
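
A round trip through two of these formats might look like the sketch below. To stay self-contained it uses in-memory `io.StringIO` buffers in place of files on disk; the sample frame is an assumption for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# to_csv with no path returns the CSV text; index=False omits the row labels.
csv_text = df.to_csv(index=False)

# read_csv accepts any file-like object, so StringIO stands in for a file.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same round trip through JSON, one record per row.
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text))

print(df_from_csv)
print(df_from_json)
```

Reading Excel, Parquet, or SQL sources follows the same pattern but needs optional dependencies (an Excel engine, a Parquet engine, or a database connection).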

### Data Manipulation and Reshaping

Functions for combining, reshaping, and transforming data including merging, concatenation, pivoting, melting, and advanced data restructuring operations.

```python { .api }
def concat(objs, axis=0, join='outer', **kwargs): ...
def merge(left, right, how='inner', on=None, **kwargs): ...
def pivot_table(data, values=None, index=None, columns=None, **kwargs): ...
def melt(data, id_vars=None, value_vars=None, **kwargs): ...
```

[Data Manipulation](./data-manipulation.md)
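
The four functions above can be sketched on small frames as follows. The tables and column names (`emp_id`, `quarter`, `sales`) are hypothetical and exist only to show the shape of each result.

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
salaries = pd.DataFrame({"emp_id": [1, 2, 4], "salary": [50000, 60000, 65000]})

# merge: SQL-style join on a key column; how="inner" keeps only matching keys.
merged = pd.merge(employees, salaries, how="inner", on="emp_id")

# concat: stack frames vertically (axis=0); ignore_index renumbers the rows.
more = pd.DataFrame({"emp_id": [5], "name": ["Diana"]})
stacked = pd.concat([employees, more], axis=0, ignore_index=True)

# melt: wide to long, one row per (city, quarter) pair.
wide = pd.DataFrame({"city": ["London", "Paris"], "q1": [10, 20], "q2": [30, 40]})
long = pd.melt(wide, id_vars="city", value_vars=["q1", "q2"],
               var_name="quarter", value_name="sales")

# pivot_table: long back to wide, aggregating duplicates (mean by default).
restored = pd.pivot_table(long, values="sales", index="city", columns="quarter")

print(merged)
print(restored)
```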

### Time Series and Date Handling

Comprehensive time series functionality including date/time parsing, time zone handling, frequency conversion, resampling, and specialized time-based operations.

```python { .api }
def date_range(start=None, end=None, periods=None, freq=None, **kwargs): ...
def to_datetime(arg, **kwargs): ...
class Timestamp:
    # Note: the freq argument was removed from Timestamp in pandas 2.0
    def __init__(self, ts_input=None, freq=None, tz=None, **kwargs): ...
```

[Time Series](./time-series.md)
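
A short sketch of these pieces working together, under the assumption of an hourly series of made-up readings:

```python
import pandas as pd

# Six hourly timestamps starting at midnight.
hours = pd.date_range(start="2024-01-01", periods=6, freq="h")
readings = pd.Series([1, 2, 3, 4, 5, 6], index=hours)

# Resample to 3-hour bins and sum the readings in each bin.
three_hourly = readings.resample("3h").sum()

# Parse strings into datetimes; an explicit format avoids ambiguous inference.
parsed = pd.to_datetime(["2024-01-15", "2024-02-20"], format="%Y-%m-%d")

# Attach a time zone, then convert to another one.
ts = pd.Timestamp("2024-01-01 12:00", tz="UTC")
ts_tokyo = ts.tz_convert("Asia/Tokyo")

print(three_hourly)
print(ts_tokyo)
```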

### Data Types and Missing Data

Extension data types, missing data handling, and type conversion utilities including nullable integer/boolean types, categorical data, and advanced missing value operations.

```python { .api }
def isna(obj): ...
def notna(obj): ...
class Categorical:
    def __init__(self, values, categories=None, ordered=None, dtype=None, fastpath=False): ...
```

[Data Types](./data-types.md)
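
The following sketch shows a nullable integer column, the missing-value masks, and an ordered categorical. The values are illustrative assumptions.

```python
import pandas as pd

# A nullable integer column: "Int64" (capital I) stores pd.NA instead of
# forcing the values to float.
counts = pd.Series([1, None, 3], dtype="Int64")

# isna / notna return boolean masks with the same shape as the input.
missing_mask = pd.isna(counts)
present_mask = pd.notna(counts)

# fillna replaces missing entries; the dtype stays Int64.
filled = counts.fillna(0)

# An ordered Categorical records the order of its categories.
sizes = pd.Categorical(
    ["small", "large", "medium"],
    categories=["small", "medium", "large"],
    ordered=True,
)
size_codes = sizes.codes  # integer position of each value in `categories`

print(counts)
print(size_codes)
```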

### Statistical and Mathematical Operations

Built-in statistical functions, mathematical operations, and data analysis utilities including descriptive statistics, correlation analysis, and numerical computations.

```python { .api }
def cut(x, bins, **kwargs): ...
def qcut(x, q, **kwargs): ...
def factorize(values, **kwargs): ...
# Note: top-level pd.value_counts is deprecated since pandas 2.1;
# prefer Series.value_counts()
def value_counts(values, **kwargs): ...
```

[Statistics and Math](./statistics-math.md)
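
Binning and encoding can be sketched as below. The ages, bin edges, and labels are assumptions chosen to make the grouping easy to follow.

```python
import pandas as pd

ages = pd.Series([22, 35, 47, 58, 63, 71])

# cut: bin by fixed edges; intervals are right-closed by default, e.g. (30, 60].
age_groups = pd.cut(ages, bins=[0, 30, 60, 100], labels=["young", "middle", "senior"])

# qcut: bin by quantiles so each bin holds roughly the same number of values.
halves = pd.qcut(ages, q=2, labels=["lower", "upper"])

# factorize: encode values as integer codes plus the array of unique values.
codes, uniques = pd.factorize(["b", "a", "b", "c"])

# Descriptive statistics are methods on Series and DataFrame.
mean_age = ages.mean()
group_counts = age_groups.value_counts()

print(age_groups)
print(group_counts)
```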

### Configuration and Options

Pandas configuration system for controlling display options, computational behavior, and library-wide settings.

```python { .api }
def get_option(pat): ...
def set_option(pat, value): ...
def reset_option(pat): ...
def option_context(*args): ...
```

[Configuration](./configuration.md)
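
A minimal sketch of reading, temporarily overriding, and resetting options. The option names used here (`display.max_rows`, `display.precision`, `display.max_columns`) are standard display options; the values are arbitrary.

```python
import pandas as pd

# Read the current value of an option by its full name.
default_rows = pd.get_option("display.max_rows")

# option_context changes options only inside the `with` block.
with pd.option_context("display.max_rows", 5, "display.precision", 2):
    rows_inside = pd.get_option("display.max_rows")

rows_after = pd.get_option("display.max_rows")

# set_option changes a value globally; reset_option restores the default.
pd.set_option("display.max_columns", 50)
columns_set = pd.get_option("display.max_columns")
pd.reset_option("display.max_columns")

print(default_rows, rows_inside, rows_after, columns_set)
```

Preferring `option_context` over `set_option` keeps display changes from leaking into unrelated code.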

### Plotting and Visualization

Comprehensive plotting capabilities including basic plot types, statistical visualizations, and advanced multivariate analysis plots built on matplotlib.

```python { .api }
# Available from the pandas.plotting module
def scatter_matrix(frame, **kwargs): ...
def parallel_coordinates(frame, class_column, **kwargs): ...
def andrews_curves(frame, class_column, **kwargs): ...
def radviz(frame, class_column, **kwargs): ...
```

[Plotting](./plotting.md)

### API and Type Checking

Type checking utilities and data type validation functions for working with pandas data structures and ensuring data quality.

```python { .api }
# Available from the pandas.api.types module
def is_numeric_dtype(arr_or_dtype): ...
def is_datetime64_dtype(arr_or_dtype): ...
# Note: deprecated since pandas 2.1; prefer isinstance(dtype, pd.CategoricalDtype)
def is_categorical_dtype(arr_or_dtype): ...
def infer_dtype(value, **kwargs): ...
```

[API Types](./api-types.md)
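
One way these checks might be used is to select columns by kind, as in this sketch. The frame and its column names are assumptions for illustration.

```python
import pandas as pd
from pandas.api.types import infer_dtype, is_datetime64_dtype, is_numeric_dtype

df = pd.DataFrame({
    "price": [1.5, 2.5],
    "label": ["a", "b"],
    "when": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

# Each check accepts an array-like or a dtype and returns a plain bool.
numeric_columns = [col for col in df.columns if is_numeric_dtype(df[col])]
when_is_datetime = is_datetime64_dtype(df["when"])

# infer_dtype inspects the values and returns a descriptive string.
integer_kind = infer_dtype([1, 2, 3])
mixed_kind = infer_dtype([1, "a", 2.0])

print(numeric_columns, when_is_datetime, integer_kind, mixed_kind)
```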

### Error Handling

Exception and warning classes for proper error handling in pandas applications, including parsing errors, performance warnings, and data validation errors.

```python { .api }
# Available from the pandas.errors module
class ParserError(ValueError): ...
class PerformanceWarning(Warning): ...
class SettingWithCopyWarning(Warning): ...
class DtypeWarning(Warning): ...
```

[Errors](./errors.md)
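
A hedged sketch of handling a parse failure and filtering a pandas warning. The malformed CSV text is a made-up input, and the last block only stands in for code that might emit a `PerformanceWarning`.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserError, PerformanceWarning

# The third line has three fields where the header declares two.
malformed_csv = "a,b\n1,2\n3,4,5\n"

try:
    pd.read_csv(io.StringIO(malformed_csv))
    parse_failed = False
except ParserError as exc:
    parse_failed = True
    print(f"Could not parse input: {exc}")

# on_bad_lines="skip" drops malformed rows instead of raising.
recovered = pd.read_csv(io.StringIO(malformed_csv), on_bad_lines="skip")

# pandas warnings are ordinary Warning subclasses, so the standard
# warnings module can filter them.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=PerformanceWarning)
    total = recovered["a"].sum()  # stands in for code that might warn
```

Because `ParserError` subclasses `ValueError`, an existing `except ValueError` handler would also catch it.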

## Types

```python { .api }
from typing import Sequence, Union

import numpy as np
from pandas import Index, Series

# Core scalar types
class Timestamp:
    """Pandas timestamp object."""
    pass

class Timedelta:
    """Pandas timedelta object."""
    pass

class Period:
    """Pandas period object."""
    pass

class Interval:
    """Pandas interval object."""
    pass

# Missing value sentinels
NA: object   # Pandas missing value sentinel
NaT: object  # Not-a-Time for datetime/timedelta

# Common type aliases
Scalar = Union[str, int, float, bool, Timestamp, Timedelta, Period, Interval]
ArrayLike = Union[list, tuple, np.ndarray, Series, Index]
Axes = Union[int, str, Sequence[Union[int, str]]]
```
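
The scalar types and sentinels above behave roughly as in this sketch. The dates and bounds are arbitrary illustrative values.

```python
import pandas as pd

# Timestamp minus Timestamp gives a Timedelta.
start = pd.Timestamp("2024-01-01 09:00")
end = pd.Timestamp("2024-01-02 10:30")
elapsed = end - start

# A Period is a span of time (here, a whole month), not an instant.
month = pd.Period("2024-01", freq="M")
next_month = month + 1

# An Interval has bounds and a closed side; membership respects both.
interval = pd.Interval(left=0, right=5, closed="right")
contains_zero = 0 in interval
contains_five = 5 in interval

# pd.NA propagates through comparisons; pd.NaT is the datetime counterpart.
na_comparison = pd.NA == 1
nat_is_missing = pd.isna(pd.NaT)

print(elapsed, next_month, contains_zero, contains_five)
```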