# Document Operations

Essential CRUD operations for working with individual documents in Elasticsearch. These operations provide the foundation for document-based interactions including creation, retrieval, updates, and deletion.

## Capabilities

### Document Creation

Create new documents with explicit IDs, ensuring the document doesn't already exist.

```python { .api }
def create(index: str, doc_type: str, id: str, body: dict, **params) -> dict:
    """
    Create a new document with the specified ID.

    Parameters:
    - index: Index name where the document will be stored
    - doc_type: Document type (use '_doc' for Elasticsearch 6.x+ compatibility)
    - id: Unique document identifier
    - body: Document content as a dictionary
    - refresh: Control when changes are visible ('true', 'false', 'wait_for')
    - routing: Routing value for document placement
    - timeout: Request timeout
    - version: Expected document version for optimistic concurrency
    - version_type: Version type ('internal', 'external', 'external_gte')

    Returns:
    dict: Response containing '_index', '_id', '_version', 'result', and '_shards'

    Raises:
    ConflictError: If document with the same ID already exists
    """
```
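
Because `create` raises on an existing ID, callers often wrap it in a small "create if absent" helper. The following is a minimal sketch of that pattern, not part of the library: `create_if_absent`, `FakeClient`, and `FakeConflict` are hypothetical names, and the stand-in client exists only so the example runs without a cluster.

```python
def create_if_absent(client, conflict_error, index, doc_type, doc_id, body):
    """Try to create the document; return True if created, False if the ID was taken.

    `client` is any object exposing create(index=..., doc_type=..., id=..., body=...).
    `conflict_error` is the exception class that client raises when the ID exists.
    """
    try:
        client.create(index=index, doc_type=doc_type, id=doc_id, body=body)
    except conflict_error:
        return False
    return True


# Stand-in exception and client for illustration only (hypothetical).
class FakeConflict(Exception):
    pass


class FakeClient:
    def __init__(self):
        self.store = {}

    def create(self, index, doc_type, id, body):
        key = (index, doc_type, id)
        if key in self.store:
            raise FakeConflict(key)
        self.store[key] = body
        return {"_index": index, "_id": id, "_version": 1, "result": "created"}


client = FakeClient()
first = create_if_absent(client, FakeConflict, "articles", "_doc", "article-1", {"title": "A"})
second = create_if_absent(client, FakeConflict, "articles", "_doc", "article-1", {"title": "B"})
print(first, second)
```

With a real client, `conflict_error` would be the `ConflictError` class named in the Raises section above. The second call leaves the stored document untouched, which is the point of using `create` rather than `index`.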

### Document Indexing

Index documents (create or update) with optional auto-generated IDs.

```python { .api }
def index(index: str, doc_type: str, body: dict, id: str = None, **params) -> dict:
    """
    Index a document (create new or update existing).

    Parameters:
    - index: Index name where the document will be stored
    - doc_type: Document type
    - body: Document content as a dictionary
    - id: Document ID (auto-generated if not provided)
    - op_type: Operation type ('index', 'create')
    - refresh: Control when changes are visible
    - routing: Routing value for document placement
    - timeout: Request timeout
    - version: Expected document version
    - version_type: Version type ('internal', 'external', 'external_gte')
    - pipeline: Ingest pipeline to process document

    Returns:
    dict: Response with document metadata and operation result
    """
```

### Document Retrieval

Retrieve documents by ID with support for field filtering and routing.

```python { .api }
def get(index: str, id: str, doc_type: str = '_all', **params) -> dict:
    """
    Retrieve a document by its ID.

    Parameters:
    - index: Index name containing the document
    - id: Document identifier
    - doc_type: Document type (default '_all' searches all types)
    - _source: Fields to include/exclude in response
    - _source_excludes: Fields to exclude from _source
    - _source_includes: Fields to include in _source
    - routing: Routing value used when indexing
    - preference: Node preference for request execution
    - realtime: Whether to retrieve from transaction log (true) or search (false)
    - refresh: Refresh index before retrieval
    - version: Expected document version
    - version_type: Version type for version checking

    Returns:
    dict: Document with '_source', '_id', '_version', and metadata

    Raises:
    NotFoundError: If document doesn't exist
    """

def get_source(index: str, doc_type: str, id: str, **params) -> dict:
    """
    Retrieve only the document source (_source field).

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - _source_excludes: Fields to exclude
    - _source_includes: Fields to include
    - routing: Routing value
    - preference: Node preference
    - realtime: Real-time retrieval flag
    - refresh: Refresh before retrieval
    - version: Expected version
    - version_type: Version type

    Returns:
    dict: Document source content only
    """
```
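
Since `get` raises when the ID is missing, a thin wrapper that returns `None` instead can keep calling code simpler. This is a sketch under stated assumptions: `fetch_source_or_none`, `FakeClient`, and `FakeNotFound` are hypothetical names, and the stand-in client only imitates the `_source` filtering described above so the example runs offline.

```python
def fetch_source_or_none(client, not_found_error, index, doc_type, doc_id, fields=None):
    """Return the document's _source, or None when the document is missing.

    `fields`, if given, is forwarded as the `_source` parameter to limit
    which fields come back.
    """
    params = {}
    if fields:
        params["_source"] = list(fields)
    try:
        response = client.get(index=index, doc_type=doc_type, id=doc_id, **params)
    except not_found_error:
        return None
    return response["_source"]


# Stand-in exception and client for illustration only (hypothetical).
class FakeNotFound(Exception):
    pass


class FakeClient:
    def __init__(self, docs):
        self.docs = docs

    def get(self, index, doc_type, id, _source=None):
        if (index, id) not in self.docs:
            raise FakeNotFound(id)
        source = self.docs[(index, id)]
        if _source:
            # Imitate server-side field filtering.
            source = {k: v for k, v in source.items() if k in _source}
        return {"_index": index, "_id": id, "found": True, "_source": source}


client = FakeClient({("articles", "1"): {"title": "My Article", "author": "John Doe", "content": "text"}})
partial = fetch_source_or_none(client, FakeNotFound, "articles", "_doc", "1", fields=["title", "author"])
missing = fetch_source_or_none(client, FakeNotFound, "articles", "_doc", "404")
print(partial, missing)
```

With a real client, `not_found_error` would be the `NotFoundError` class listed under Raises.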
112
113
### Document Existence Checks
114
115
Check if documents exist without retrieving full content.
116
117
```python { .api }
118
def exists(index: str, doc_type: str, id: str, **params) -> bool:
119
"""
120
Check if a document exists.
121
122
Parameters:
123
- index: Index name
124
- doc_type: Document type
125
- id: Document identifier
126
- routing: Routing value
127
- preference: Node preference
128
- realtime: Real-time check flag
129
- refresh: Refresh before check
130
- version: Expected version
131
- version_type: Version type
132
133
Returns:
134
bool: True if document exists, False otherwise
135
"""
136
137
def exists_source(index: str, doc_type: str, id: str, **params) -> bool:
138
"""
139
Check if document source exists.
140
141
Parameters: Same as exists()
142
143
Returns:
144
bool: True if document source exists
145
"""
146
```

### Document Updates

Update existing documents with partial updates or script-based modifications.

```python { .api }
def update(index: str, doc_type: str, id: str, body: dict = None, **params) -> dict:
    """
    Update an existing document.

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - body: Update specification with 'doc' or 'script', optionally with 'upsert'
    - retry_on_conflict: Number of retry attempts on version conflicts
    - routing: Routing value
    - timeout: Request timeout
    - refresh: Control when changes are visible
    - _source: Fields to return in response
    - version: Expected current version
    - version_type: Version type
    - wait_for_active_shards: Wait for N shards to be active

    Body structure (supply either 'doc' or 'script' in one request, not both):
    {
        "doc": {"field": "new_value"},  # Partial document update
        "script": {  # Script-based update
            "source": "ctx._source.counter += params.increment",
            "params": {"increment": 1}
        },
        "upsert": {"field": "default_value"}  # Create if doesn't exist
    }

    Returns:
    dict: Update result with '_version', 'result', and optionally 'get'

    Raises:
    NotFoundError: If document doesn't exist and no upsert provided
    """
```
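
The body structure above lists `doc`, `script`, and `upsert` side by side, but a single update request carries either a partial document or a script. A small builder can enforce that before the request is sent. This is a sketch: `build_update_body` is a hypothetical helper, not a library function.

```python
def build_update_body(doc=None, script=None, params=None, upsert=None):
    """Assemble an update body with exactly one of `doc` or `script`.

    `script` is the script source string; `params` are its parameters.
    `upsert` is the document to insert when the target does not exist.
    """
    if (doc is None) == (script is None):
        raise ValueError("provide exactly one of 'doc' or 'script'")
    if params is not None and script is None:
        raise ValueError("'params' only applies to script updates")

    body = {}
    if doc is not None:
        body["doc"] = doc
    else:
        body["script"] = {"source": script}
        if params:
            body["script"]["params"] = params
    if upsert is not None:
        body["upsert"] = upsert
    return body


partial_body = build_update_body(doc={"content": "Updated"}, upsert={"content": "Initial"})
script_body = build_update_body(
    script="ctx._source.counter += params.increment",
    params={"increment": 1},
)
print(partial_body)
print(script_body)
```

Either result can be passed as `body=` to `update`.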

### Document Deletion

Delete documents by ID with support for routing and versioning.

```python { .api }
def delete(index: str, doc_type: str, id: str, **params) -> dict:
    """
    Delete a document by ID.

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - routing: Routing value used when indexing
    - timeout: Request timeout
    - refresh: Control when changes are visible
    - version: Expected document version
    - version_type: Version type
    - wait_for_active_shards: Wait for N shards to be active

    Returns:
    dict: Deletion result with '_version', 'result', and '_shards'

    Raises:
    NotFoundError: If document doesn't exist
    """
```

### Multi-Document Retrieval

Retrieve multiple documents in a single request for improved performance.

```python { .api }
def mget(body: dict, index: str = None, doc_type: str = None, **params) -> dict:
    """
    Retrieve multiple documents by their IDs.

    Parameters:
    - body: Multi-get request specification
    - index: Default index name for documents without explicit index
    - doc_type: Default document type
    - _source: Default fields to include/exclude
    - _source_excludes: Default fields to exclude
    - _source_includes: Default fields to include
    - preference: Node preference
    - realtime: Real-time retrieval flag
    - refresh: Refresh before retrieval
    - routing: Default routing value

    Body structure:
    {
        "docs": [
            {"_index": "my_index", "_type": "_doc", "_id": "1"},
            {"_index": "my_index", "_type": "_doc", "_id": "2", "_source": ["title"]},
            {"_index": "other_index", "_type": "_doc", "_id": "3"}
        ]
    }

    Or with default index/type:
    {
        "ids": ["1", "2", "3"]
    }

    Returns:
    dict: Response with 'docs' array containing each document or error
    """
```
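
The two body shapes described above can be generated from plain Python data. The helpers below are a sketch with hypothetical names (`mget_ids_body`, `mget_docs_body`). They only build dictionaries and make no request.

```python
def mget_ids_body(ids):
    """Short form: IDs only, resolved against the index/doc_type passed to mget."""
    return {"ids": [str(doc_id) for doc_id in ids]}


def mget_docs_body(entries):
    """Long form: each entry is (index, doc_type, doc_id) or
    (index, doc_type, doc_id, fields), where fields limits the returned _source."""
    docs = []
    for entry in entries:
        index, doc_type, doc_id = entry[:3]
        item = {"_index": index, "_type": doc_type, "_id": str(doc_id)}
        if len(entry) > 3 and entry[3]:
            item["_source"] = list(entry[3])
        docs.append(item)
    return {"docs": docs}


short_body = mget_ids_body([1, 2, 3])
long_body = mget_docs_body([
    ("my_index", "_doc", 1),
    ("my_index", "_doc", 2, ["title"]),
    ("other_index", "_doc", 3),
])
print(short_body)
print(long_body)
```

The short form would be used as `mget(body=short_body, index=..., doc_type=...)`. The long form needs no defaults because every entry names its own index and type.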

## Usage Examples

### Basic Document Lifecycle

```python
from elasticsearch5 import Elasticsearch
# ConflictError is an exception class, not an attribute of the client instance
from elasticsearch5.exceptions import ConflictError

es = Elasticsearch(['localhost:9200'])

# Create a document
doc = {
    'title': 'My Article',
    'content': 'This is the article content',
    'author': 'John Doe',
    'created_at': '2023-01-01T12:00:00'
}

# Index with auto-generated ID
result = es.index(index='articles', doc_type='_doc', body=doc)
doc_id = result['_id']

# Create with explicit ID (fails if exists)
try:
    es.create(index='articles', doc_type='_doc', id='article-1', body=doc)
except ConflictError:
    print("Document already exists")

# Check if document exists
if es.exists(index='articles', doc_type='_doc', id=doc_id):
    # Get the document
    retrieved = es.get(index='articles', doc_type='_doc', id=doc_id)
    print(f"Document: {retrieved['_source']}")
```

### Document Updates

```python
# Partial document update
update_body = {
    'doc': {
        'content': 'Updated article content',
        'updated_at': '2023-01-02T12:00:00'
    }
}
es.update(index='articles', doc_type='_doc', id=doc_id, body=update_body)

# Script-based update
script_update = {
    'script': {
        'source': 'ctx._source.view_count = (ctx._source.view_count ?: 0) + 1'
    }
}
es.update(index='articles', doc_type='_doc', id=doc_id, body=script_update)

# Upsert (update or insert)
upsert_body = {
    'doc': {'title': 'New Title'},
    'upsert': {'title': 'Default Title', 'created_at': '2023-01-01T00:00:00'}
}
es.update(index='articles', doc_type='_doc', id='new-article', body=upsert_body)
```

### Multi-Document Operations

```python
# Retrieve multiple documents
mget_body = {
    'docs': [
        {'_index': 'articles', '_type': '_doc', '_id': doc_id},
        {'_index': 'articles', '_type': '_doc', '_id': 'article-2', '_source': ['title', 'author']}
    ]
}
results = es.mget(body=mget_body)

for doc in results['docs']:
    # An entry that failed may carry 'error' instead of 'found', so avoid a KeyError
    if doc.get('found'):
        print(f"Found: {doc['_source']}")
    else:
        print(f"Not found: {doc['_id']}")
```
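
The loop above checks `found` on each entry. Since the `docs` array can hold either a document or an error, it can help to sort entries into three groups. The following is a sketch: `partition_mget_docs` is a hypothetical helper, and the sample response is hand-written for illustration and not captured from a cluster.

```python
def partition_mget_docs(response):
    """Split an mget response into found sources, missing IDs, and per-ID errors."""
    found, missing, errors = {}, [], {}
    for item in response.get("docs", []):
        doc_id = item.get("_id")
        if "error" in item:
            errors[doc_id] = item["error"]
        elif item.get("found"):
            found[doc_id] = item.get("_source", {})
        else:
            missing.append(doc_id)
    return found, missing, errors


# Hand-written sample shaped like the 'docs' array described above (illustrative only).
sample_response = {
    "docs": [
        {"_index": "articles", "_type": "_doc", "_id": "1", "found": True,
         "_source": {"title": "My Article"}},
        {"_index": "articles", "_type": "_doc", "_id": "article-2", "found": False},
        {"_index": "closed", "_type": "_doc", "_id": "3",
         "error": {"type": "index_closed_exception"}},
    ]
}

found, missing, errors = partition_mget_docs(sample_response)
print(found, missing, errors)
```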