# Dataset Management

Comprehensive dataset creation, management, and preparation for various ML tasks including tabular, image, text, video, and time series data. Vertex AI datasets provide managed data storage with automatic schema detection and data validation.

## Capabilities

### Tabular Datasets

Structured data management for classification, regression, and forecasting tasks with automatic schema detection and data quality analysis.

```python { .api }
class TabularDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: Optional[Union[str, Sequence[str]]] = None,
        bq_source: Optional[str] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'TabularDataset': ...

    def import_data(
        self,
        gcs_source: Optional[Union[str, Sequence[str]]] = None,
        bq_source: Optional[str] = None,
        import_schema_uri: Optional[str] = None,
        data_item_labels: Optional[Dict] = None,
        sync: bool = True,
        **kwargs
    ) -> 'TabularDataset': ...

    @property
    def column_names(self) -> List[str]: ...
    @property
    def schema(self) -> Dict[str, str]: ...
```
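
Because a tabular dataset can be backed by either Cloud Storage files or a BigQuery table, it can help to validate the source before calling `create`. The following is a minimal sketch, not part of the SDK: the helper name `tabular_source_kwargs` is hypothetical, and it assumes that exactly one source should be supplied, that Cloud Storage URIs start with `gs://`, and that BigQuery URIs start with `bq://`.

```python
from typing import Any, Dict, Optional, Sequence, Union


def tabular_source_kwargs(
    gcs_source: Optional[Union[str, Sequence[str]]] = None,
    bq_source: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the data source and return it as keyword arguments.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    # Assumption: exactly one of the two sources should be supplied.
    if (gcs_source is None) == (bq_source is None):
        raise ValueError("Provide exactly one of gcs_source or bq_source.")

    if bq_source is not None:
        if not bq_source.startswith("bq://"):
            raise ValueError(f"Expected a bq:// URI, got {bq_source!r}.")
        return {"bq_source": bq_source}

    # Accept a single URI or a sequence of URIs, as the signature allows.
    uris = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    invalid = [uri for uri in uris if not uri.startswith("gs://")]
    if not uris or invalid:
        raise ValueError(f"Expected one or more gs:// URIs, got {uris!r}.")
    return {"gcs_source": uris}


# Example: build the source arguments, then pass them to create().
source = tabular_source_kwargs(bq_source="bq://my-project.sales.customers")
# dataset = aiplatform.TabularDataset.create(display_name="customer-data", **source)
```

Validating locally surfaces a mistyped URI before any request is sent; the `create` call itself is left commented out because it needs the SDK and valid credentials.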

### Image Datasets

Image data management for classification, object detection, and segmentation tasks with support for various annotation formats.

```python { .api }
class ImageDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: str,
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'ImageDataset': ...

    def import_data(
        self,
        gcs_source: str,
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        sync: bool = True,
        **kwargs
    ) -> 'ImageDataset': ...
```
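
The `import_schema_uri` argument describes how the import file at `gcs_source` is laid out. As an illustration, the sketch below builds a JSON Lines import file for single-label image classification. The field names (`imageGcsUri`, `classificationAnnotation`, `dataItemResourceLabels`) reflect the import format as commonly documented for Vertex AI and should be treated as assumptions to verify against the schema you use; the helper names are hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def classification_record(
    image_uri: str, label: str, ml_use: Optional[str] = None
) -> str:
    """Return one JSON Lines record for a single-label classification import.

    Field names are assumptions based on the documented import format.
    """
    record = {
        "imageGcsUri": image_uri,
        "classificationAnnotation": {"displayName": label},
    }
    if ml_use is not None:
        # Optional data split hint, for example "training" or "test".
        record["dataItemResourceLabels"] = {
            "aiplatform.googleapis.com/ml_use": ml_use
        }
    return json.dumps(record)


def build_import_file(examples: Iterable[Tuple[str, str]]) -> str:
    """Join (image_uri, label) pairs into JSON Lines text, one record per line."""
    return "".join(classification_record(uri, label) + "\n" for uri, label in examples)


# Example: two labeled images; upload the resulting text to Cloud Storage
# and pass its gs:// URI as gcs_source.
content = build_import_file(
    [
        ("gs://my-bucket/images/shoe_001.jpg", "shoe"),
        ("gs://my-bucket/images/bag_001.jpg", "bag"),
    ]
)
```

Generating the file programmatically keeps the labels and image URIs in one place, which makes it easier to regenerate the import file when annotations change.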

### Text Datasets

Text data management for classification, entity extraction, and sentiment analysis with support for various text formats.

```python { .api }
class TextDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: Union[str, Sequence[str]],
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'TextDataset': ...
```
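
Each text task needs a matching `import_schema_uri`. One way to keep that choice explicit is a small lookup from task name to schema constant, sketched below. The task names and helper functions are hypothetical, and the attribute names under `aiplatform.schema.dataset.ioformat.text` are assumed from the SDK's schema constants; verify them against the installed version.

```python
# Maps a task name (hypothetical, chosen for this sketch) to the attribute
# assumed to exist on aiplatform.schema.dataset.ioformat.text.
TEXT_SCHEMA_ATTRIBUTES = {
    "classification": "single_label_classification",
    "multi_label_classification": "multi_label_classification",
    "entity_extraction": "extraction",
    "sentiment_analysis": "sentiment",
}


def text_schema_attribute(task: str) -> str:
    """Return the schema attribute name for a task, or raise ValueError."""
    try:
        return TEXT_SCHEMA_ATTRIBUTES[task]
    except KeyError:
        known = ", ".join(sorted(TEXT_SCHEMA_ATTRIBUTES))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


def text_import_schema_uri(task: str) -> str:
    """Resolve the schema URI for a task (requires google-cloud-aiplatform)."""
    attribute = text_schema_attribute(task)
    # Imported lazily so the lookup above works without the SDK installed.
    from google.cloud import aiplatform

    return getattr(aiplatform.schema.dataset.ioformat.text, attribute)


# Example:
# dataset = aiplatform.TextDataset.create(
#     display_name="support-tickets",
#     gcs_source="gs://my-bucket/text/import_file.jsonl",  # hypothetical path
#     import_schema_uri=text_import_schema_uri("sentiment_analysis"),
# )
```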

### Time Series Datasets

Specialized datasets for forecasting and time series analysis with support for multiple time series and hierarchical forecasting.

```python { .api }
class TimeSeriesDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: Optional[Union[str, Sequence[str]]] = None,
        bq_source: Optional[str] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'TimeSeriesDataset': ...
```
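
Forecasting data often lives in BigQuery, in which case the dataset is created from `bq_source` rather than `gcs_source`. The sketch below assumes the `bq://project.dataset.table` URI form; the helper names and the project, dataset, and table names are hypothetical.

```python
def bigquery_uri(project: str, dataset: str, table: str) -> str:
    """Build a bq:// URI from its parts (assumed form: bq://project.dataset.table)."""
    for name, value in (("project", project), ("dataset", dataset), ("table", table)):
        if not value:
            raise ValueError(f"{name} must be a non-empty string.")
    return f"bq://{project}.{dataset}.{table}"


def create_time_series_dataset(display_name: str, project: str, dataset: str, table: str):
    """Create a TimeSeriesDataset from a BigQuery table.

    Requires google-cloud-aiplatform and valid credentials.
    """
    # Imported lazily so bigquery_uri() can be used without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.TimeSeriesDataset.create(
        display_name=display_name,
        bq_source=bigquery_uri(project, dataset, table),
    )


# Example: the URI that would be passed as bq_source.
uri = bigquery_uri("my-project", "sales", "daily_demand")
```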

### Video Datasets

Video data management for action recognition, object tracking, and video classification tasks.

```python { .api }
class VideoDataset:
    @classmethod
    def create(
        cls,
        display_name: str,
        gcs_source: Union[str, Sequence[str]],
        import_schema_uri: str,
        data_item_labels: Optional[Dict] = None,
        project: Optional[str] = None,
        location: Optional[str] = None,
        labels: Optional[Dict[str, str]] = None,
        encryption_spec_key_name: Optional[str] = None,
        sync: bool = True,
        create_request_timeout: Optional[float] = None,
        **kwargs
    ) -> 'VideoDataset': ...
```

## Usage Examples

**Create tabular dataset:**
```python
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

dataset = aiplatform.TabularDataset.create(
    display_name="customer-data",
    gcs_source="gs://my-bucket/customer_data.csv",
    labels={"purpose": "classification", "team": "ml"}
)

print(f"Dataset created: {dataset.resource_name}")
print(f"Column names: {dataset.column_names}")
```
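
The `labels` argument attaches key-value metadata to the dataset, and malformed labels are rejected by the service. A local check can catch them earlier. The sketch below is not part of the SDK: `validate_labels` is a hypothetical helper, and the limits it enforces (lowercase letters, digits, underscores, and dashes; keys starting with a letter; at most 63 characters; at most 64 labels) are conservative assumptions based on general Google Cloud labeling rules, so check the current Vertex AI documentation for the exact constraints.

```python
import re
from typing import Dict

# Assumed, conservative limits; verify against the current documentation.
MAX_LABELS = 64
_KEY_PATTERN = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_VALUE_PATTERN = re.compile(r"[a-z0-9_-]{0,63}")


def validate_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Return labels unchanged if they satisfy the assumed rules, else raise ValueError."""
    if len(labels) > MAX_LABELS:
        raise ValueError(f"At most {MAX_LABELS} labels are allowed, got {len(labels)}.")
    for key, value in labels.items():
        if not _KEY_PATTERN.fullmatch(key):
            raise ValueError(f"Invalid label key: {key!r}")
        if not _VALUE_PATTERN.fullmatch(value):
            raise ValueError(f"Invalid label value for {key!r}: {value!r}")
    return labels


# Example: the labels used above pass the check.
labels = validate_labels({"purpose": "classification", "team": "ml"})
# dataset = aiplatform.TabularDataset.create(display_name="customer-data", labels=labels, ...)
```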

**Create image dataset:**
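```python
# Assumes aiplatform has been imported and initialized as in the previous example.
dataset = aiplatform.ImageDataset.create(
    display_name="product-images",
    # Hypothetical path: gcs_source is expected to point to an import file
    # (CSV or JSONL) listing image URIs and labels, not to a folder of images.
    gcs_source="gs://my-bucket/images/import_file.jsonl",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification
)
```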