# PyES - Python ElasticSearch Driver

## Overview

PyES is a Python client library for ElasticSearch. First released in 2010, it provides a Pythonic interface for indexing, searching, and managing ElasticSearch clusters, and it supports both Python 2 and Python 3.

**Version**: 0.99.6
**License**: BSD
**Documentation**: http://pyes.rtfd.org/
**PyPI**: https://pypi.org/project/pyes/

## Installation

```bash
pip install pyes
```

## Core Imports

```python { .api }
# Main client class
from pyes import ES

# Query DSL classes
from pyes import (
    Query, Search, BoolQuery, MatchAllQuery, TermQuery, TermsQuery,
    RangeQuery, FilteredQuery, QueryStringQuery, MatchQuery,
    MultiMatchQuery, TextQuery, SimpleQueryStringQuery,
    FuzzyQuery, FuzzyLikeThisQuery, MoreLikeThisQuery,
    PrefixQuery, WildcardQuery, RegexTermQuery, IdsQuery,
    ConstantScoreQuery, DisMaxQuery, BoostingQuery,
    CustomScoreQuery, FunctionScoreQuery, HasChildQuery,
    HasParentQuery, TopChildrenQuery, NestedQuery,
    SpanTermQuery, SpanFirstQuery, SpanNearQuery,
    SpanNotQuery, SpanOrQuery, SpanMultiQuery,
    PercolatorQuery, RescoreQuery, Suggest
)

# Filter DSL classes
from pyes import (
    Filter, FilterList, ANDFilter, ORFilter, BoolFilter, NotFilter,
    TermFilter, TermsFilter, PrefixFilter, RegexTermFilter,
    ExistsFilter, MissingFilter, RangeFilter, LimitFilter,
    GeoDistanceFilter, GeoBoundingBoxFilter, GeoPolygonFilter,
    GeoShapeFilter, GeoIndexedShapeFilter, HasChildFilter,
    HasParentFilter, NestedFilter, TypeFilter, IdsFilter,
    QueryFilter, ScriptFilter, MatchAllFilter, RawFilter
)

# Facet and Aggregation classes
from pyes import (
    FacetFactory, TermFacet, DateHistogramFacet, HistogramFacet,
    RangeFacet, GeoDistanceFacet, StatisticalFacet, TermStatsFacet,
    QueryFacet, FilterFacet, AggFactory, Agg, BucketAgg,
    TermsAgg, DateHistogramAgg, HistogramAgg, RangeAgg,
    FilterAgg, FiltersAgg, NestedAgg, ReverseNestedAgg,
    MissingAgg, StatsAgg, ValueCountAgg, SumAgg, AvgAgg,
    MinAgg, MaxAgg, CardinalityAgg, TermStatsAgg
)

# Mapping classes
from pyes import (
    Mapper, AbstractField, StringField, NumericFieldAbstract,
    IntegerField, LongField, FloatField, DoubleField,
    DateField, BooleanField, BinaryField, IpField,
    ByteField, ShortField, GeoPointField, MultiField,
    ObjectField, NestedObject, DocumentObjectField,
    AttachmentField
)

# River classes
from pyes import (
    River, RabbitMQRiver, TwitterRiver, CouchDBRiver,
    JDBCRiver, MongoDBRiver
)

# Utility functions
from pyes import (
    file_to_attachment, make_path, make_id, clean_string,
    string_b64encode, string_b64decode, quote, ESRange,
    ESRangeOp, TermsLookup
)

# Exception classes
from pyes import (
    ElasticSearchException, QueryError, InvalidQuery,
    InvalidParameterQuery, IndexAlreadyExistsException,
    IndexMissingException, InvalidIndexNameException,
    TypeMissingException, DocumentAlreadyExistsException,
    DocumentMissingException, VersionConflictEngineException,
    BulkOperationException, SearchPhaseExecutionException,
    ReduceSearchPhaseException, ReplicationShardOperationFailedException,
    ClusterBlockException, MapperParsingException, NoServerAvailable
)
```

## Basic Usage Example

```python { .api }
from pyes import ES, TermQuery, Search

# Create ES client connection
es = ES('localhost:9200')

# Index a document
doc = {
    "title": "Python ElasticSearch Guide",
    "content": "Comprehensive guide to using PyES library",
    "tags": ["python", "elasticsearch", "search"],
    "published": "2023-01-15",
    "author": "John Doe"
}
es.index(doc, "blog", "post", id="1")

# Search for documents
query = Search(TermQuery("tags", "python"))
results = es.search(query, indices=["blog"])

# Process results
for hit in results:
    print(f"Title: {hit.title}")
    print(f"Score: {hit._meta.score}")
```

## Architecture Overview

PyES provides a layered architecture for ElasticSearch interaction:

1. **Client Layer** (`ES` class) - Connection management and high-level operations
2. **Query DSL** - Pythonic query construction with full ElasticSearch query support
3. **Filter DSL** - Filtering capabilities with logical and specialized filters
4. **Facets & Aggregations** - Data analysis and summarization tools
5. **Mapping System** - Schema definition and field type management
6. **River System** - Data streaming from external sources
7. **Bulk Operations** - High-performance batch processing
8. **Index Management** - Index lifecycle and cluster administration

## Core Capabilities

### ES Client Operations
The main `ES` class covers connection setup, single-document operations, and buffered bulk operations:

```python { .api }
# Initialize client with configuration
es = ES(
    server="localhost:9200",
    timeout=30.0,
    bulk_size=400,
    max_retries=3,
    basic_auth=("username", "password")
)

# Document operations
doc_id = es.index(document, "index_name", "doc_type", id="optional_id")
document = es.get("index_name", "doc_type", "doc_id")
es.update("index_name", "doc_type", "doc_id", script="ctx._source.views += 1")
es.delete("index_name", "doc_type", "doc_id")

# Bulk operations for performance
es.index(doc1, "index", "type", bulk=True)
es.index(doc2, "index", "type", bulk=True)
# flush_bulk() alone only sends once bulk_size is reached; force it here
es.flush_bulk(forced=True)  # Send all buffered operations now
```

**[→ Full ES Client Reference](client.md)**

### Query DSL Construction
Build complex search queries with the query DSL:

```python { .api }
from pyes import Search, BoolQuery, TermQuery, RangeQuery, MatchQuery

# Complex boolean query
query = Search(
    BoolQuery(
        must=[MatchQuery("title", "python")],
        should=[TermQuery("tags", "tutorial")],
        must_not=[TermQuery("status", "draft")],
        filter=RangeQuery("published", gte="2023-01-01")
    )
).size(20).sort("published", order="desc")

results = es.search(query, indices=["blog"])
```

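For orientation, the boolean query above corresponds roughly to the ElasticSearch request body sketched below. This is hand-written with plain dictionaries, not output captured from PyES: the helper name `build_bool_body` is hypothetical, the `filter` clause inside `bool` assumes ElasticSearch 2.x syntax, and the exact JSON PyES emits may differ in detail.

```python
import json


def build_bool_body(title, tag, excluded_status, published_from, size=20):
    """Hypothetical helper: build a raw ElasticSearch 2.x-style request body."""
    return {
        "query": {
            "bool": {
                # must: clauses that have to match and contribute to the score
                "must": [{"match": {"title": title}}],
                # should: optional clauses that raise the score when they match
                "should": [{"term": {"tags": tag}}],
                # must_not: clauses that exclude matching documents
                "must_not": [{"term": {"status": excluded_status}}],
                # filter: a non-scoring restriction (assumed 2.x syntax)
                "filter": {"range": {"published": {"gte": published_from}}},
            }
        },
        "size": size,
        "sort": [{"published": {"order": "desc"}}],
    }


body = build_bool_body("python", "tutorial", "draft", "2023-01-01")
print(json.dumps(body, indent=2))
```

Reading the query as a nested dictionary can help when comparing PyES objects with examples written in raw ElasticSearch JSON.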
**[→ Complete Query DSL Reference](query-dsl.md)**

### Filter DSL for Performance
Use filters for fast, non-scored filtering:

```python { .api }
from pyes import Search, BoolFilter, TermFilter, RangeFilter, GeoDistanceFilter

# Geographic and term filtering
# (named restaurant_filter to avoid shadowing the built-in filter())
restaurant_filter = BoolFilter(
    must=[
        TermFilter("category", "restaurant"),
        RangeFilter("rating", gte=4.0),
        GeoDistanceFilter(
            distance="5km",
            location={"lat": 40.7128, "lon": -74.0060}
        )
    ]
)

filtered_query = Search().filter(restaurant_filter)
```

**[→ Complete Filter DSL Reference](filters.md)**

### Facets and Aggregations
Analyze and summarize data with facets and aggregations:

```python { .api }
from pyes import Search, TermsAgg, DateHistogramAgg, StatsAgg

# Multi-level aggregations
search = Search().add_aggregation(
    TermsAgg("categories", field="category.keyword", size=10)
    .add_aggregation(
        DateHistogramAgg("monthly", field="published", interval="month")
    )
).add_aggregation(
    StatsAgg("price_stats", field="price")
)

results = es.search(search, indices=["products"])
categories = results.facets.categories
monthly_trend = results.facets.categories.monthly
price_stats = results.facets.price_stats
```

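As a rough guide to what the nested aggregation above asks ElasticSearch to compute, the sketch below writes an equivalent request body as plain dictionaries. It assumes ElasticSearch 1.x/2.x aggregation syntax; `terms_agg` is a hypothetical helper, and the JSON PyES actually sends may differ in detail.

```python
import json


def terms_agg(field, size=10, sub_aggs=None):
    """Hypothetical helper: a terms bucket aggregation with optional children."""
    agg = {"terms": {"field": field, "size": size}}
    if sub_aggs:
        # Sub-aggregations are computed once per bucket of the parent
        agg["aggs"] = sub_aggs
    return agg


body = {
    "aggs": {
        # One bucket per category, each split further into monthly buckets
        "categories": terms_agg(
            "category.keyword",
            size=10,
            sub_aggs={
                "monthly": {
                    "date_histogram": {"field": "published", "interval": "month"}
                }
            },
        ),
        # A sibling metric aggregation over all matching documents
        "price_stats": {"stats": {"field": "price"}},
    }
}

print(json.dumps(body, indent=2))
```

The nesting mirrors the chained `add_aggregation` calls: `monthly` sits inside `categories`, while `price_stats` sits beside it at the top level.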
**[→ Complete Facets & Aggregations Reference](facets-aggregations.md)**

### Index Mapping Management
Define and manage index schemas with typed field mappings:

```python { .api }
from pyes import Mapper, StringField, IntegerField, DateField, GeoPointField

# Define document mapping
mapping = Mapper()
mapping.add_property("title", StringField(analyzer="standard"))
mapping.add_property("content", StringField(analyzer="english"))
mapping.add_property("views", IntegerField())
mapping.add_property("published", DateField())
mapping.add_property("location", GeoPointField())

# Apply mapping to index
es.indices.put_mapping("blog_post", mapping.as_dict(), indices=["blog"])
```

**[→ Complete Mappings Reference](mappings.md)**

### Rivers for Data Streaming
Set up automated data ingestion from external sources:

```python { .api }
from pyes import CouchDBRiver, TwitterRiver, JDBCRiver

# CouchDB replication river
couchdb_river = CouchDBRiver(
    couchdb_db="mydb",
    couchdb_host="localhost",
    couchdb_port=5984,
    es_index="replicated_data",
    es_type="document"
)
es.create_river(couchdb_river, "couchdb_sync")

# Twitter streaming river
twitter_river = TwitterRiver(
    oauth_token="token",
    oauth_secret="secret",
    consumer_key="key",
    consumer_secret="secret",
    filter_tracks=["python", "elasticsearch"]
)
es.create_river(twitter_river, "twitter_stream")
```

**[→ Complete Rivers Reference](rivers.md)**

### Bulk Operations for Performance
Handle large-scale data operations efficiently:

```python { .api }
# Configure bulk processing
es.bulk_size = 1000  # Process in batches of 1000

# Bulk indexing with automatic flushing
documents = [{"title": f"Doc {i}", "content": f"Content {i}"} for i in range(5000)]

for doc in documents:
    es.index(doc, "bulk_index", "doc", bulk=True)
    # Automatically flushes when bulk_size reached

# Manual bulk operations
es.force_bulk()  # Force immediate processing

# Bulk deletion
es.delete("index", "type", "id1", bulk=True)
es.delete("index", "type", "id2", bulk=True)
# flush_bulk() alone only sends once bulk_size is reached; force it here
es.flush_bulk(forced=True)
```

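To make the buffering behaviour concrete, the following is a minimal sketch of a client-side bulk buffer. `BulkBuffer` is a hypothetical stand-in, not the PyES implementation; it assumes the newline-delimited action/source format of the ElasticSearch bulk API and collects payloads in a list instead of sending them over HTTP.

```python
import json


class BulkBuffer:
    """Hypothetical client-side bulk buffer (illustration only, not PyES code)."""

    def __init__(self, send, bulk_size=1000):
        self.send = send            # callable that receives one bulk payload
        self.bulk_size = bulk_size  # number of operations per automatic flush
        self.pending = []           # serialized operations waiting to be sent

    def index(self, index, doc_type, doc, doc_id=None):
        """Buffer one index operation as an action line plus a source line."""
        action = {"index": {"_index": index, "_type": doc_type}}
        if doc_id is not None:
            action["index"]["_id"] = doc_id
        self.pending.append(json.dumps(action) + "\n" + json.dumps(doc) + "\n")
        self.flush()  # sends only if the threshold has been reached

    def flush(self, forced=False):
        """Send pending operations if forced or if bulk_size is reached."""
        if not self.pending:
            return False
        if not forced and len(self.pending) < self.bulk_size:
            return False
        self.send("".join(self.pending))
        self.pending = []
        return True


payloads = []  # stands in for the HTTP transport
buffer = BulkBuffer(payloads.append, bulk_size=1000)
for i in range(2500):
    buffer.index("bulk_index", "doc", {"title": "Doc %d" % i})

automatic_flushes = len(payloads)  # two full batches were sent automatically
buffer.flush(forced=True)          # send the remaining 500 operations
```

The sketch shows why a final forced flush matters: without it, the last partial batch would stay in memory and never reach the cluster.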
**[→ Complete Bulk Operations Reference](bulk-operations.md)**

## Advanced Features

### Percolator Queries
Store queries and match documents against them:

```python { .api }
from pyes import TermQuery

# Register percolator query
percolator_query = TermQuery("tags", "python")
es.create_percolator("blog", "python_posts", percolator_query)

# Test document against registered queries
doc = {"title": "Python Tutorial", "tags": ["python", "programming"]}
matches = es.percolate("blog", ["post"], doc)
```

### More Like This
Find similar documents:

```python { .api }
similar_docs = es.morelikethis(
    "blog", "post", "doc_id_1",
    fields=["title", "content"],
    min_term_freq=1,
    max_query_terms=12
)
```

### Suggestions and Auto-complete
Provide search suggestions:

```python { .api }
from pyes import Suggest

# Term suggestions
suggest = Suggest()
suggest.add_term("python programming", "title_suggest", "title")

suggestions = es.suggest_from_object(suggest, indices=["blog"])
```

### Geospatial Search
Search by geographic location:

```python { .api }
from pyes import GeoDistanceFilter, Search

# Find restaurants within 2km
geo_query = Search().filter(
    GeoDistanceFilter(
        distance="2km",
        location={"lat": 40.7128, "lon": -74.0060}
    )
)

nearby_restaurants = es.search(geo_query, indices=["restaurants"])
```

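Conceptually, a geo-distance filter keeps documents whose location lies within the given radius of the reference point. The sketch below illustrates this with the haversine great-circle formula in plain Python. The helper names and the two sample points are hypothetical, and ElasticSearch's own distance computation may use a different method and give slightly different results.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(origin, point, max_km):
    """Return True if point lies within max_km of origin."""
    distance = haversine_km(origin["lat"], origin["lon"],
                            point["lat"], point["lon"])
    return distance <= max_km


origin = {"lat": 40.7128, "lon": -74.0060}   # reference point from the query
near = {"lat": 40.7200, "lon": -74.0000}     # hypothetical point about 1 km away
far = {"lat": 40.7580, "lon": -73.9855}      # hypothetical point about 5 km away

print(within_distance(origin, near, 2.0))
print(within_distance(origin, far, 2.0))
```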
## Connection and Configuration

PyES supports multiple connection protocols and a range of connection settings:

```python { .api }
from pyes import ES

# HTTP connection (default)
es = ES(
    server=["host1:9200", "host2:9200"],  # Multiple hosts for failover
    timeout=30.0,
    max_retries=3,
    retry_time=60,
    basic_auth=("username", "password"),
    cert_reqs='CERT_REQUIRED'  # SSL certificate verification
)

# Thrift connection (optional)
es = ES(server="localhost:9500", connection_type="thrift")
```

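The `server` list and `retry_time` setting above suggest a failover scheme in which a failing host is skipped for a while before being tried again. The sketch below illustrates that idea with a hypothetical `HostPool` class. It is not PyES's connection code, and the way PyES actually selects hosts and interprets `retry_time` may differ.

```python
import time


class HostPool:
    """Hypothetical round-robin host pool with temporary blacklisting."""

    def __init__(self, servers, retry_time=60, clock=time.monotonic):
        self.servers = list(servers)
        self.retry_time = retry_time
        self.clock = clock      # injectable so the behaviour can be tested
        self.dead_until = {}    # server -> time at which it may be retried
        self._next = 0

    def mark_dead(self, server):
        """Skip this server until retry_time seconds have passed."""
        self.dead_until[server] = self.clock() + self.retry_time

    def get(self):
        """Return the next usable server, or raise if all are marked dead."""
        now = self.clock()
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            retry_at = self.dead_until.get(server)
            if retry_at is None or retry_at <= now:
                self.dead_until.pop(server, None)
                return server
        raise RuntimeError("no server available")


pool = HostPool(["host1:9200", "host2:9200"], retry_time=60)
first = pool.get()       # "host1:9200"
pool.mark_dead(first)    # e.g. after a connection error
second = pool.get()      # "host2:9200"
```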
## Error Handling

PyES raises specific exception classes, so individual failure modes can be caught separately:

```python { .api }
from pyes import (
    ElasticSearchException, IndexMissingException,
    DocumentMissingException, BulkOperationException
)

try:
    result = es.get("missing_index", "doc_type", "doc_id")
except IndexMissingException:
    print("Index does not exist")
except DocumentMissingException:
    print("Document not found")
except ElasticSearchException as e:
    print(f"ElasticSearch error: {e}")
```

## Performance Considerations

- Use bulk operations for high-throughput indexing
- Implement connection pooling for concurrent access
- Use filters instead of queries when scoring is not needed
- Configure appropriate `bulk_size` based on document size and memory
- Use scan & scroll for large result sets
- Implement proper error handling and retry logic

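The last point can be sketched as a small retry helper with exponential backoff. `call_with_retries` and its parameters are hypothetical and not part of PyES. In practice one would probably restrict `retryable` to transient errors such as `NoServerAvailable` rather than retrying every exception.

```python
import time


def call_with_retries(operation, retryable=(Exception,), max_retries=3,
                      base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on retryable errors with doubling delays."""
    attempt = 0
    while True:
        try:
            return operation()
        except retryable:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            attempt += 1


# Demonstration with a stand-in operation that fails twice, then succeeds
calls = {"count": 0}
delays = []  # records the requested delays instead of actually sleeping


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky_operation, retryable=(ConnectionError,),
                           sleep=delays.append)
```

A PyES call could be wrapped the same way, for example by passing `lambda: es.search(query, indices=["blog"])` as the operation.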
## Migration and Compatibility

PyES maintains compatibility with ElasticSearch versions up to 2.x. For newer ElasticSearch versions (5.x+), consider migrating to the official `elasticsearch-py` client. PyES supports both Python 2 and Python 3.

---

This document is an overview of the PyES Python ElasticSearch driver. Each linked section contains detailed API references, examples, and usage patterns for building search-enabled applications.