# Document Operations

Essential CRUD operations for working with individual documents in Elasticsearch. These operations provide the foundation for document-based interactions including creation, retrieval, updates, and deletion.

## Capabilities

### Document Creation

Create new documents with explicit IDs, ensuring the document doesn't already exist.

```python { .api }
def create(index: str, doc_type: str, id: str, body: dict, **params) -> dict:
    """
    Create a new document with the specified ID.

    Parameters:
    - index: Index name where the document will be stored
    - doc_type: Document type (use '_doc' for Elasticsearch 6.x+ compatibility)
    - id: Unique document identifier
    - body: Document content as a dictionary
    - refresh: Control when changes are visible ('true', 'false', 'wait_for')
    - routing: Routing value for document placement
    - timeout: Request timeout
    - version: Expected document version for optimistic concurrency
    - version_type: Version type ('internal', 'external', 'external_gte')

    Returns:
    dict: Response containing '_index', '_id', '_version', 'result', and '_shards'

    Raises:
    ConflictError: If document with the same ID already exists
    """
```
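A common pattern built on `create` is "insert only if absent". The sketch below uses a hypothetical helper, `create_if_absent`, which is not part of the client. It assumes that a conflict surfaces as an exception whose `status_code` attribute is 409, and it uses an in-memory stand-in (`FakeClient`, also hypothetical) so the snippet runs without a cluster.

```python
def create_if_absent(client, index, doc_type, doc_id, body):
    """Return True if the document was created, False if the ID was taken.

    Hypothetical helper: `client` is any object with a `create` method
    shaped like the one documented above.
    """
    try:
        client.create(index=index, doc_type=doc_type, id=doc_id, body=body)
        return True
    except Exception as exc:
        # Assumption: a conflict carries HTTP status 409 on `status_code`.
        if getattr(exc, "status_code", None) == 409:
            return False
        raise


class FakeConflict(Exception):
    """Stand-in for the client's conflict exception (illustration only)."""
    status_code = 409


class FakeClient:
    """In-memory stand-in for illustration only; not the real client."""

    def __init__(self):
        self.docs = {}

    def create(self, index, doc_type, id, body, **params):
        key = (index, id)
        if key in self.docs:
            raise FakeConflict("document already exists")
        self.docs[key] = body
        return {"_index": index, "_id": id, "_version": 1, "result": "created"}


client = FakeClient()
first = create_if_absent(client, "articles", "_doc", "article-1", {"title": "A"})
second = create_if_absent(client, "articles", "_doc", "article-1", {"title": "B"})
print(first, second)  # True False
```

Any exception other than a conflict is re-raised, so connection failures are not mistaken for "already exists".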

### Document Indexing

Index documents (create or update) with optional auto-generated IDs.

```python { .api }
def index(index: str, doc_type: str, body: dict, id: str = None, **params) -> dict:
    """
    Index a document (create new or update existing).

    Parameters:
    - index: Index name where the document will be stored
    - doc_type: Document type
    - body: Document content as a dictionary
    - id: Document ID (auto-generated if not provided)
    - op_type: Operation type ('index', 'create')
    - refresh: Control when changes are visible
    - routing: Routing value for document placement
    - timeout: Request timeout
    - version: Expected document version
    - version_type: Version type ('internal', 'external', 'external_gte')
    - pipeline: Ingest pipeline to process document

    Returns:
    dict: Response with document metadata and operation result
    """
```
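Because `index` either creates or overwrites, callers often want to know which of the two happened. The following sketch shows one way to read that from the response. The helper `index_and_report` and the stand-in `FakeClient` are hypothetical, and the snippet assumes the response's `result` field is `'created'` for a new document and `'updated'` otherwise.

```python
import uuid


def index_and_report(client, index, doc_type, body, doc_id=None):
    """Index a document and return (id, was_created). Hypothetical helper."""
    kwargs = {"index": index, "doc_type": doc_type, "body": body}
    if doc_id is not None:
        # Leave `id` out entirely to request an auto-generated ID.
        kwargs["id"] = doc_id
    response = client.index(**kwargs)
    # Assumption: 'result' is 'created' for a new document, 'updated' otherwise.
    return response["_id"], response.get("result") == "created"


class FakeClient:
    """In-memory stand-in for illustration only; not the real client."""

    def __init__(self):
        self.docs = {}

    def index(self, index, doc_type, body, id=None, **params):
        if id is None:
            id = uuid.uuid4().hex  # mimic an auto-generated ID
        existed = (index, id) in self.docs
        self.docs[(index, id)] = body
        return {"_index": index, "_id": id,
                "result": "updated" if existed else "created"}


client = FakeClient()
auto_id, created = index_and_report(client, "articles", "_doc", {"title": "Draft"})
same_id, created_again = index_and_report(
    client, "articles", "_doc", {"title": "Final"}, doc_id=auto_id
)
print(created, created_again)  # True False
```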

### Document Retrieval

Retrieve documents by ID with support for field filtering and routing.

```python { .api }
def get(index: str, id: str, doc_type: str = '_all', **params) -> dict:
    """
    Retrieve a document by its ID.

    Parameters:
    - index: Index name containing the document
    - id: Document identifier
    - doc_type: Document type (default '_all' searches all types)
    - _source: Fields to include/exclude in response
    - _source_excludes: Fields to exclude from _source
    - _source_includes: Fields to include in _source
    - routing: Routing value used when indexing
    - preference: Node preference for request execution
    - realtime: Whether to retrieve from transaction log (true) or search (false)
    - refresh: Refresh index before retrieval
    - version: Expected document version
    - version_type: Version type for version checking

    Returns:
    dict: Document with '_source', '_id', '_version', and metadata

    Raises:
    NotFoundError: If document doesn't exist
    """

def get_source(index: str, doc_type: str, id: str, **params) -> dict:
    """
    Retrieve only the document source (_source field).

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - _source_excludes: Fields to exclude
    - _source_includes: Fields to include
    - routing: Routing value
    - preference: Node preference
    - realtime: Real-time retrieval flag
    - refresh: Refresh before retrieval
    - version: Expected version
    - version_type: Version type

    Returns:
    dict: Document source content only
    """
```
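Since `get` raises when the document is missing, lookups that treat absence as a normal outcome usually wrap the call. The sketch below shows a hypothetical `get_or_none` helper that also passes a `_source` field filter. It assumes a missing document surfaces as an exception with `status_code == 404`; `FakeClient` and `FakeNotFound` are stand-ins for illustration only.

```python
def get_or_none(client, index, doc_type, doc_id, fields=None):
    """Return the document's _source, or None when it does not exist."""
    params = {"_source": fields} if fields else {}
    try:
        hit = client.get(index=index, doc_type=doc_type, id=doc_id, **params)
    except Exception as exc:
        # Assumption: a missing document carries HTTP status 404.
        if getattr(exc, "status_code", None) == 404:
            return None
        raise
    return hit["_source"]


class FakeNotFound(Exception):
    """Stand-in for the client's not-found exception (illustration only)."""
    status_code = 404


class FakeClient:
    """In-memory stand-in for illustration only; not the real client."""

    def __init__(self, docs):
        self.docs = docs

    def get(self, index, id, doc_type="_all", _source=None, **params):
        if (index, id) not in self.docs:
            raise FakeNotFound(id)
        source = self.docs[(index, id)]
        if _source:
            # Mimic source filtering by keeping only the requested fields.
            source = {k: v for k, v in source.items() if k in _source}
        return {"_index": index, "_id": id, "found": True, "_source": source}


client = FakeClient({("articles", "1"): {"title": "My Article", "author": "John Doe"}})
title_only = get_or_none(client, "articles", "_doc", "1", fields=["title"])
absent = get_or_none(client, "articles", "_doc", "no-such-id")
print(title_only, absent)  # {'title': 'My Article'} None
```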

### Document Existence Checks

Check if documents exist without retrieving full content.

```python { .api }
def exists(index: str, doc_type: str, id: str, **params) -> bool:
    """
    Check if a document exists.

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - routing: Routing value
    - preference: Node preference
    - realtime: Real-time check flag
    - refresh: Refresh before check
    - version: Expected version
    - version_type: Version type

    Returns:
    bool: True if document exists, False otherwise
    """

def exists_source(index: str, doc_type: str, id: str, **params) -> bool:
    """
    Check if document source exists.

    Parameters: Same as exists()

    Returns:
    bool: True if document source exists
    """
```
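As an illustration of using `exists` without fetching content, the sketch below filters a list of IDs down to those that are absent. `find_missing` and the stand-in `FakeClient` are hypothetical names introduced here, not part of the client.

```python
def find_missing(client, index, doc_type, ids):
    """Return the IDs from `ids` for which `exists` reports False."""
    return [
        doc_id
        for doc_id in ids
        if not client.exists(index=index, doc_type=doc_type, id=doc_id)
    ]


class FakeClient:
    """In-memory stand-in for illustration only; not the real client."""

    def __init__(self, known_ids):
        self.known_ids = set(known_ids)

    def exists(self, index, doc_type, id, **params):
        return id in self.known_ids


client = FakeClient(known_ids=["1", "3"])
missing = find_missing(client, "articles", "_doc", ["1", "2", "3"])
print(missing)  # ['2']
```

This issues one request per ID, which is fine for a handful of documents. For larger batches, a single `mget` call is likely the better fit.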

### Document Updates

Update existing documents with partial updates or script-based modifications.

```python { .api }
def update(index: str, doc_type: str, id: str, body: dict = None, **params) -> dict:
    """
    Update an existing document.

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - body: Update specification with 'doc', 'script', or 'upsert'
    - retry_on_conflict: Number of retry attempts on version conflicts
    - routing: Routing value
    - timeout: Request timeout
    - refresh: Control when changes are visible
    - _source: Fields to return in response
    - version: Expected current version
    - version_type: Version type
    - wait_for_active_shards: Wait for N shards to be active

    Body structure:
    {
        "doc": {"field": "new_value"},          # Partial document update
        "script": {                             # Script-based update
            "source": "ctx._source.counter += params.increment",
            "params": {"increment": 1}
        },
        "upsert": {"field": "default_value"}    # Create if doesn't exist
    }

    Returns:
    dict: Update result with '_version', 'result', and optionally 'get'

    Raises:
    NotFoundError: If document doesn't exist and no upsert provided
    """
```
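The body structure above can be assembled programmatically. The following is a minimal sketch of a hypothetical `build_update_body` helper, not part of the client. Requiring exactly one of `doc` or `script` is a design choice of this sketch, made to keep the intent of each request unambiguous.

```python
def build_update_body(doc=None, script=None, script_params=None, upsert=None):
    """Build the `body` argument for `update`. Hypothetical helper.

    Exactly one of `doc` (partial document) or `script` (script source
    string) must be given; `upsert` is optional in both cases.
    """
    if (doc is None) == (script is None):
        raise ValueError("provide exactly one of 'doc' or 'script'")

    body = {}
    if doc is not None:
        body["doc"] = doc
    else:
        body["script"] = {"source": script}
        if script_params:
            body["script"]["params"] = script_params
    if upsert is not None:
        body["upsert"] = upsert
    return body


partial = build_update_body(doc={"field": "new_value"})
scripted = build_update_body(
    script="ctx._source.counter += params.increment",
    script_params={"increment": 1},
    upsert={"counter": 1},
)
print(partial)   # {'doc': {'field': 'new_value'}}
print(scripted["script"]["params"])  # {'increment': 1}
```

The result would be passed as `body=` to `update`, for example `es.update(index='articles', doc_type='_doc', id='1', body=partial)`.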

### Document Deletion

Delete documents by ID with support for routing and versioning.

```python { .api }
def delete(index: str, doc_type: str, id: str, **params) -> dict:
    """
    Delete a document by ID.

    Parameters:
    - index: Index name
    - doc_type: Document type
    - id: Document identifier
    - routing: Routing value used when indexing
    - timeout: Request timeout
    - refresh: Control when changes are visible
    - version: Expected document version
    - version_type: Version type
    - wait_for_active_shards: Wait for N shards to be active

    Returns:
    dict: Deletion result with '_version', 'result', and '_shards'

    Raises:
    NotFoundError: If document doesn't exist
    """
```
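Because `delete` raises for a missing document, cleanup code that may run more than once often wants an idempotent variant. The sketch below shows one possible shape, under the assumption that the not-found case carries `status_code == 404`. `delete_if_present`, `FakeClient`, and `FakeNotFound` are hypothetical names used only for illustration.

```python
def delete_if_present(client, index, doc_type, doc_id):
    """Delete a document; return True if it was removed, False if absent."""
    try:
        client.delete(index=index, doc_type=doc_type, id=doc_id)
        return True
    except Exception as exc:
        # Assumption: a missing document carries HTTP status 404.
        if getattr(exc, "status_code", None) == 404:
            return False
        raise


class FakeNotFound(Exception):
    """Stand-in for the client's not-found exception (illustration only)."""
    status_code = 404


class FakeClient:
    """In-memory stand-in for illustration only; not the real client."""

    def __init__(self, docs):
        self.docs = dict(docs)

    def delete(self, index, doc_type, id, **params):
        if (index, id) not in self.docs:
            raise FakeNotFound(id)
        del self.docs[(index, id)]
        return {"_index": index, "_id": id, "result": "deleted"}


client = FakeClient({("articles", "article-1"): {"title": "A"}})
removed = delete_if_present(client, "articles", "_doc", "article-1")
removed_again = delete_if_present(client, "articles", "_doc", "article-1")
print(removed, removed_again)  # True False
```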

### Multi-Document Retrieval

Retrieve multiple documents in a single request for improved performance.

```python { .api }
def mget(body: dict, index: str = None, doc_type: str = None, **params) -> dict:
    """
    Retrieve multiple documents by their IDs.

    Parameters:
    - body: Multi-get request specification
    - index: Default index name for documents without explicit index
    - doc_type: Default document type
    - _source: Default fields to include/exclude
    - _source_excludes: Default fields to exclude
    - _source_includes: Default fields to include
    - preference: Node preference
    - realtime: Real-time retrieval flag
    - refresh: Refresh before retrieval
    - routing: Default routing value

    Body structure:
    {
        "docs": [
            {"_index": "my_index", "_type": "_doc", "_id": "1"},
            {"_index": "my_index", "_type": "_doc", "_id": "2", "_source": ["title"]},
            {"_index": "other_index", "_type": "_doc", "_id": "3"}
        ]
    }

    Or with default index/type:
    {
        "ids": ["1", "2", "3"]
    }

    Returns:
    dict: Response with 'docs' array containing each document or error
    """
```
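The two halves of an `mget` round trip, building the body and sorting the response, can be sketched as plain functions. Both helpers below (`build_mget_body`, `split_mget_response`) are hypothetical, and the sample response is hand-written to match the shape described above rather than captured from a cluster.

```python
def build_mget_body(ids):
    """Build the short-form body; index and type are passed to `mget` itself."""
    return {"ids": [str(doc_id) for doc_id in ids]}


def split_mget_response(response):
    """Split an mget response into ({id: source}, [ids without a document])."""
    found, missing = {}, []
    for doc in response["docs"]:
        # Entries describing an error may have no 'found' key, so use .get().
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:
            missing.append(doc["_id"])
    return found, missing


body = build_mget_body([1, 2, 3])

# Hand-written sample in the documented response shape (illustration only).
sample_response = {
    "docs": [
        {"_index": "articles", "_id": "1", "found": True, "_source": {"title": "A"}},
        {"_index": "articles", "_id": "2", "found": False},
        {"_index": "articles", "_id": "3", "error": {"type": "example_error"}},
    ]
}
found, missing = split_mget_response(sample_response)
print(body)     # {'ids': ['1', '2', '3']}
print(found)    # {'1': {'title': 'A'}}
print(missing)  # ['2', '3']
```

With a real client, the body would be sent as `es.mget(body=body, index='articles', doc_type='_doc')` and the return value handed to `split_mget_response`.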

## Usage Examples

### Basic Document Lifecycle

```python
from elasticsearch5 import Elasticsearch
from elasticsearch5.exceptions import ConflictError

es = Elasticsearch(['localhost:9200'])

# Define the document body
doc = {
    'title': 'My Article',
    'content': 'This is the article content',
    'author': 'John Doe',
    'created_at': '2023-01-01T12:00:00'
}

# Index with auto-generated ID
result = es.index(index='articles', doc_type='_doc', body=doc)
doc_id = result['_id']

# Create with explicit ID (fails if exists)
try:
    es.create(index='articles', doc_type='_doc', id='article-1', body=doc)
except ConflictError:  # exception class from the package, not an attribute of `es`
    print("Document already exists")

# Check if document exists
if es.exists(index='articles', doc_type='_doc', id=doc_id):
    # Get the document
    retrieved = es.get(index='articles', doc_type='_doc', id=doc_id)
    print(f"Document: {retrieved['_source']}")
```

### Document Updates

```python
# Reuses `es` and `doc_id` from the Basic Document Lifecycle example

# Partial document update
update_body = {
    'doc': {
        'content': 'Updated article content',
        'updated_at': '2023-01-02T12:00:00'
    }
}
es.update(index='articles', doc_type='_doc', id=doc_id, body=update_body)

# Script-based update
script_update = {
    'script': {
        'source': 'ctx._source.view_count = (ctx._source.view_count ?: 0) + 1'
    }
}
es.update(index='articles', doc_type='_doc', id=doc_id, body=script_update)

# Upsert (update or insert)
upsert_body = {
    'doc': {'title': 'New Title'},
    'upsert': {'title': 'Default Title', 'created_at': '2023-01-01T00:00:00'}
}
es.update(index='articles', doc_type='_doc', id='new-article', body=upsert_body)
```

### Multi-Document Operations

```python
# Retrieve multiple documents (reuses `es` and `doc_id` from the first example)
mget_body = {
    'docs': [
        {'_index': 'articles', '_type': '_doc', '_id': doc_id},
        {'_index': 'articles', '_type': '_doc', '_id': 'article-2', '_source': ['title', 'author']}
    ]
}
results = es.mget(body=mget_body)

for doc in results['docs']:
    # Use .get(): an entry that reports an error may not include 'found'
    if doc.get('found'):
        print(f"Found: {doc['_source']}")
    else:
        print(f"Not found: {doc['_id']}")
```