tessl/pypi-vega-datasets

A Python package for offline access to Vega datasets

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: pypipkg:pypi/vega-datasets@0.9.x

To install, run `npx @tessl/cli install tessl/pypi-vega-datasets@0.9.0`.

# Vega Datasets

A Python package for loading the Vega visualization datasets, a collection of well-known datasets commonly used in data visualization and statistical analysis. Some datasets are bundled with the package for offline access; the rest are downloaded on demand. Most datasets are returned as pandas DataFrames, so they integrate directly with Python data science workflows.

## Package Information

- **Package Name**: vega_datasets
- **Language**: Python
- **Installation**: `pip install vega_datasets`

## Core Imports

```python
import vega_datasets
from vega_datasets import data, local_data
```

Access individual components:

```python
from vega_datasets import DataLoader, LocalDataLoader
from vega_datasets.utils import connection_ok
```

## Basic Usage

```python
from vega_datasets import data, local_data

# Load a dataset by calling the data loader with the dataset name
iris_df = data('iris')
print(type(iris_df))  # <class 'pandas.core.frame.DataFrame'>

# Or use attribute access; dashes in dataset names become underscores
# (e.g. data.seattle_weather())
iris_df = data.iris()

# Get the list of available datasets
all_datasets = data.list_datasets()
print(len(all_datasets))  # 70 datasets

# Extra keyword arguments are forwarded to the pandas reader for the file's
# format; airports is a CSV file, so pd.read_csv options such as usecols apply
airports_df = data.airports(usecols=['iata', 'latitude', 'longitude'])

# Access only locally bundled datasets (no internet required)
stocks_df = local_data.stocks()

# Get raw bytes instead of a parsed DataFrame
raw_data = data.iris.raw()
print(type(raw_data))  # <class 'bytes'>
```

## Architecture

The package is built around loader objects with automatic fallback between local and remote data sources: with `use_local=True` (the default), a dataset's bundled copy is used when one exists, and otherwise the data is fetched from its remote URL.

- **DataLoader**: Main interface for accessing all 70 datasets (17 local + 53 remote)
- **LocalDataLoader**: Restricted interface for only the locally bundled datasets
- **Dataset**: Base class that handles loading, parsing, and metadata for an individual dataset
- **Specialized Dataset Subclasses**: Custom loaders for datasets that need specific handling

Because the 17 bundled datasets need no network access, code that relies only on them works offline; the other 53 are downloaded on demand, which requires an internet connection.
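The local-first fallback described above can be sketched in plain Python. This is a minimal illustration, not the package's actual code: `FallbackLoader`, `bundled`, and `fetch_remote` are hypothetical names, and the remote fetch is a stub so the example runs without a network.

```python
from typing import Callable, Dict, List


class FallbackLoader:
    """Hypothetical local-first loader: bundled bytes win, otherwise fetch remotely."""

    def __init__(self, bundled: Dict[str, bytes], fetch_remote: Callable[[str], bytes]):
        self._bundled = bundled            # name -> file contents shipped with the package
        self._fetch_remote = fetch_remote  # only called when no usable bundled copy exists

    def raw(self, name: str, use_local: bool = True) -> bytes:
        # Prefer the bundled copy unless the caller opts out with use_local=False
        if use_local and name in self._bundled:
            return self._bundled[name]
        return self._fetch_remote(name)


# Stub "network" that records which names it was asked for
calls: List[str] = []


def fake_fetch(name: str) -> bytes:
    calls.append(name)
    return b'remote:' + name.encode()


loader = FallbackLoader({'iris': b'local-iris'}, fake_fetch)
print(loader.raw('iris'))                   # b'local-iris'
print(loader.raw('github'))                 # b'remote:github'
print(loader.raw('iris', use_local=False))  # b'remote:iris'
print(calls)                                # ['github', 'iris']
```

The stub shows the key property of this design: the network is only touched for names without a bundled copy, or when the caller explicitly passes `use_local=False`.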

## Capabilities

### Core Data Loading

Primary interface for loading datasets by method call or attribute access, with automatic format detection and conversion to a pandas DataFrame.

```python { .api }
from typing import List
import pandas as pd

class DataLoader:
    # Returns bytes instead of a DataFrame when return_raw=True
    def __call__(self, name: str, return_raw: bool = False, use_local: bool = True, **kwargs) -> pd.DataFrame: ...
    def list_datasets(self) -> List[str]: ...

class LocalDataLoader:
    def __call__(self, name: str, return_raw: bool = False, use_local: bool = True, **kwargs) -> pd.DataFrame: ...
    def list_datasets(self) -> List[str]: ...
```

[Dataset Loading](./dataset-loading.md)

### Specialized Dataset Handling

Enhanced loaders for datasets that need custom parsing or date handling, or that return something other than a single DataFrame.

```python { .api }
from typing import Tuple
import pandas as pd

# These are called through a loader, e.g. data.stocks(...), data.miserables(...)

# Stocks with pivot support
def stocks(pivoted: bool = False, use_local: bool = True, **kwargs) -> pd.DataFrame: ...

# Miserables returns a tuple of DataFrames
def miserables(use_local: bool = True, **kwargs) -> Tuple[pd.DataFrame, pd.DataFrame]: ...

# Geographic data returns dict objects
def us_10m(use_local: bool = True, **kwargs) -> dict: ...
def world_110m(use_local: bool = True, **kwargs) -> dict: ...
```

[Specialized Datasets](./specialized-datasets.md)

## Dataset Categories

**Locally Bundled (17 datasets)** - Available without an internet connection:

- Statistical classics: `iris`, `anscombe`, `cars`
- Time series: `stocks`, `seattle-weather`, `seattle-temps`, `sf-temps`
- Economic data: `iowa-electricity`, `us-employment`
- Geographic: `airports`, `la-riots`
- Scientific: `barley`, `wheat`, `burtin`, `crimea`, `driving`
- Financial: `ohlc`

**Remote Datasets (53 datasets)** - Require an internet connection:

- Visualization examples: `7zip`, `flare`, `flare-dependencies`
- Global data: `countries`, `world-110m`, `population`
- Economic/social: `budget`, `budgets`, `disasters`, `gapminder`
- Scientific: `climate`, `co2-concentration`, `earthquakes`, `annual-precip`
- Technology: `github`, `ffox`, `movies`
- And many more specialized datasets

## Error Handling

```python
from vega_datasets import data, local_data

# Dataset not found
try:
    df = data('nonexistent-dataset')
except ValueError as e:
    print(e)  # "No such dataset nonexistent-dataset exists..."

# Remote-only dataset requested from LocalDataLoader
try:
    df = local_data.github()  # github is remote-only
except ValueError as e:
    print(e)  # "'github' dataset is not available locally..."

# Network issues for remote datasets
try:
    df = data.github(use_local=False)  # Force remote access
except Exception as e:
    print(f"Network error: {e}")
```

## Utility Functions

```python { .api }
def connection_ok() -> bool:
    """
    Check if web connection is available for remote datasets.

    Returns:
        bool: True if web connection is OK, False otherwise.
    """
```

## Types

```python { .api }
from typing import List, Tuple, Dict, Any
import pandas as pd

# Core classes
class DataLoader: ...
class LocalDataLoader: ...
class Dataset: ...

# Package-level exports
data: DataLoader
local_data: LocalDataLoader
__version__: str

# Utility functions
def connection_ok() -> bool: ...
```