# Dataset Management

Comprehensive dataset creation, management, and preparation for various ML tasks including tabular, image, text, video, and time series data. Vertex AI datasets provide managed data storage with automatic schema detection and data validation.

## Capabilities

### Tabular Datasets

Structured data management for classification, regression, and forecasting tasks with automatic schema detection and data quality analysis.

```python { .api }
class TabularDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        # Data comes from Cloud Storage or BigQuery: pass one of
        # gcs_source / bq_source.
        gcs_source: Optional[Union[str, Sequence[str]]] = None,
        bq_source: Optional[str] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'TabularDataset': ...

    def import_data(
        self,
        gcs_source: Optional[Union[str, Sequence[str]]] = None,
        bq_source: Optional[str] = None,
        import_schema_uri: Optional[str] = None,
        data_item_labels: Optional[Dict] = None,
        sync: bool = True,
        **kwargs
    ) -> 'TabularDataset': ...

    @property
    def column_names(self) -> List[str]: ...

    @property
    def schema(self) -> Dict[str, str]: ...
```
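
The `create` signature above takes data from Cloud Storage (`gcs_source`) or BigQuery (`bq_source`). The following is a minimal pre-flight sketch, assuming exactly one source should be supplied and that URIs use the `gs://` and `bq://` prefixes. `resolve_tabular_source` is a hypothetical helper for illustration, not part of the SDK.

```python
from typing import Dict, List, Optional, Sequence, Union


def resolve_tabular_source(
    gcs_source: Optional[Union[str, Sequence[str]]] = None,
    bq_source: Optional[str] = None,
) -> Dict[str, object]:
    """Hypothetical helper: validate source arguments before calling create()."""
    # Reject both "neither given" and "both given".
    if bool(gcs_source) == bool(bq_source):
        raise ValueError("Pass exactly one of gcs_source or bq_source.")

    if bq_source:
        if not bq_source.startswith("bq://"):
            raise ValueError(f"Expected a bq:// URI, got {bq_source!r}")
        return {"bq_source": bq_source}

    # Normalize a single URI to a list so both input forms are handled alike.
    uris: List[str] = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"Expected a gs:// URI, got {uri!r}")
    return {"gcs_source": uris}
```

The returned dictionary can be unpacked into the call, for example `TabularDataset.create(display_name="customer-data", **resolve_tabular_source(gcs_source="gs://my-bucket/customer_data.csv"))`.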

### Image Datasets

Image data management for classification, object detection, and segmentation tasks with support for various annotation formats.

```python { .api }
class ImageDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: str,
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'ImageDataset': ...

    def import_data(
        self,
        gcs_source: str,
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        sync: bool = True,
        **kwargs
    ) -> 'ImageDataset': ...
```
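
For image datasets, `gcs_source` typically points at an import file that lists each image with its annotation, and `import_schema_uri` tells Vertex AI how to read that file. The following is a minimal sketch of building a JSON Lines import file for single-label classification. It assumes the `imageGcsUri` / `classificationAnnotation` field layout described in the Vertex AI data-preparation documentation. The bucket paths and labels are placeholders, and `build_import_lines` is a hypothetical helper.

```python
import json
from typing import Iterable, List, Tuple


def build_import_lines(items: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: return one JSON line per (image URI, label) pair."""
    lines = []
    for uri, label in items:
        if not uri.startswith("gs://"):
            raise ValueError(f"Expected a gs:// URI, got {uri!r}")
        record = {
            "imageGcsUri": uri,
            "classificationAnnotation": {"displayName": label},
        }
        lines.append(json.dumps(record))
    return lines


# Placeholder image URIs and labels.
lines = build_import_lines([
    ("gs://my-bucket/images/shoe_001.jpg", "shoes"),
    ("gs://my-bucket/images/bag_001.jpg", "bags"),
])

# Text of the import file; it would be uploaded to Cloud Storage before use.
import_file_text = "\n".join(lines) + "\n"
print(import_file_text)
```

The uploaded file's `gs://` URI would then be passed as `gcs_source`, paired with the matching single-label classification schema.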

### Text Datasets

Text data management for classification, entity extraction, and sentiment analysis with support for various text formats.

```python { .api }
class TextDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: Union[str, Sequence[str]],
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'TextDataset': ...
```

### Time Series Datasets

Specialized datasets for forecasting and time series analysis with support for multiple time series and hierarchical forecasting.

```python { .api }
class TimeSeriesDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        # Data comes from Cloud Storage or BigQuery: pass one of
        # gcs_source / bq_source.
        gcs_source: Optional[Union[str, Sequence[str]]] = None,
        bq_source: Optional[str] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'TimeSeriesDataset': ...
```
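
Forecasting data with multiple time series is usually laid out in long format: one row per series per timestamp, with a column identifying the series. The following is a minimal sketch that builds such a CSV with the standard library. The column names (`series_id`, `date`, `sales`) and the values are illustrative assumptions, not names required by the SDK.

```python
import csv
import io
from datetime import date, timedelta

# Illustrative daily values for two hypothetical series.
series = {
    "store_a": [120, 135, 128],
    "store_b": [80, 82, 91],
}
start = date(2024, 1, 1)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["series_id", "date", "sales"])

# Emit one row per series per day (long format).
for series_id, values in series.items():
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        writer.writerow([series_id, day.isoformat(), value])

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting file would be uploaded to Cloud Storage and its `gs://` URI passed as `gcs_source`.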

### Video Datasets

Video data management for action recognition, object tracking, and video classification tasks.

```python { .api }
class VideoDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: Union[str, Sequence[str]],
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'VideoDataset': ...
```

## Usage Examples

**Create tabular dataset:**

```python
from google.cloud import aiplatform

# Set the default project and region for subsequent SDK calls.
aiplatform.init(project='my-project', location='us-central1')

# Create a tabular dataset from a CSV file in Cloud Storage.
dataset = aiplatform.TabularDataset.create(
    display_name="customer-data",
    gcs_source="gs://my-bucket/customer_data.csv",
    labels={"purpose": "classification", "team": "ml"}
)

print(f"Dataset created: {dataset.resource_name}")
print(f"Column names: {dataset.column_names}")
```

**Create image dataset:**

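```python
from google.cloud import aiplatform

# Assumes aiplatform.init(...) has already been called, as in the tabular example.
# import_schema_uri selects how the import data is interpreted
# (here: single-label image classification).
dataset = aiplatform.ImageDataset.create(
    display_name="product-images",
    gcs_source="gs://my-bucket/images/",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification
)
```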