# PySOLR

A lightweight Python client for Apache Solr. It provides a simple interface for basic Solr operations: selecting, updating, and deleting documents, optimizing the index, More Like This queries, spelling correction, timeout handling, and SolrCloud awareness.

## Package Information

- **Package Name**: pysolr
- **Language**: Python
- **Python Version**: 2.7 - 3.7+
- **Installation**: `pip install pysolr`
- **Optional Dependencies**: `simplejson`, `kazoo` (for SolrCloud)

## Core Imports

```python
import pysolr
```

For SolrCloud functionality:

```python
import pysolr
# Requires: pip install pysolr[solrcloud]
```

## Basic Usage

```python
import pysolr

# Create a Solr client instance; always_commit=True commits after every add/delete
solr = pysolr.Solr('http://localhost:8983/solr/my_core', always_commit=True)

# Health check
solr.ping()

# Index documents
solr.add([
    {
        "id": "doc_1",
        "title": "A test document",
        "content": "This is a sample document for indexing."
    },
    {
        "id": "doc_2",
        "title": "Another document",
        "content": "More content to be indexed."
    }
])

# Search for documents
results = solr.search('test')
print(f"Found {len(results)} documents")
for doc in results:
    print(f"ID: {doc['id']}, Title: {doc['title']}")

# Delete documents
solr.delete(id='doc_1')

# Commit changes explicitly (only needed when always_commit is False)
solr.commit()
```

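Committing after every `add()` call, as `always_commit=True` does, can be slow when indexing many documents. The following is a minimal sketch of batching with a single commit at the end. The helper name `index_in_batches` and its `batch_size` default are assumptions made for illustration, not part of pysolr; the helper only relies on the `add()` and `commit()` methods listed in this document.

```python
from itertools import islice


def index_in_batches(solr, docs, batch_size=500):
    """Send docs to Solr in fixed-size batches, committing once at the end.

    `solr` is any object exposing pysolr-style add() and commit() methods.
    Returns the number of documents sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(docs)
    sent = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # An explicit commit=False is passed so that no commit is
        # requested per batch, even if the client uses always_commit=True.
        solr.add(batch, commit=False)
        sent += len(batch)
    if sent:
        solr.commit()  # a single commit makes all batches searchable
    return sent
```

With a real client this could be called as `index_in_batches(pysolr.Solr('http://localhost:8983/solr/my_core'), docs)`.
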
## Architecture

PySOLR follows a straightforward client-server model built from a small set of classes:

- **Solr**: Main client class that handles HTTP communication with Solr servers
- **Results**: Wrapper class for search results with metadata and iteration support
- **SolrCoreAdmin**: Administrative operations for Solr cores
- **SolrCloud**: ZooKeeper-aware client for SolrCloud deployments with automatic failover
- **ZooKeeper**: Handles cluster coordination and node discovery for SolrCloud

The library wraps Solr's HTTP API in Python-friendly interfaces. Methods that accept `**kwargs` pass extra keyword arguments through as Solr request parameters, so Solr features without a dedicated argument remain reachable.

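As a rough illustration of the keyword-argument pass-through, the sketch below builds a dictionary of standard Solr request parameters (`start`, `rows`, `fq`, `fl`, `sort`) that could be unpacked into `search()`. The helper `build_search_params` is hypothetical and not part of pysolr, and it does no escaping of special characters in filter values.

```python
def build_search_params(page=1, page_size=10, filters=None, fields=None, sort=None):
    """Translate paging and filter options into Solr request parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    params = {
        "start": (page - 1) * page_size,  # zero-based offset of the first row
        "rows": page_size,                # number of rows to return
    }
    if filters:
        # Each entry becomes one fq (filter query) parameter.
        # Values are used verbatim; escaping is left to the caller.
        params["fq"] = [f"{field}:{value}" for field, value in sorted(filters.items())]
    if fields:
        params["fl"] = ",".join(fields)   # restrict the returned fields
    if sort:
        params["sort"] = sort
    return params
```

A call might then look like `solr.search("title:test", **build_search_params(page=2, filters={"lang": "en"}))`.
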
## Capabilities

### Core Client Operations

Essential Solr operations including client initialization, health checks, document indexing, searching, deletion, and index management. These operations form the foundation for interacting with Solr servers.

```python { .api }
class Solr:
    def __init__(self, url, decoder=None, encoder=None, timeout=60, results_cls=Results,
                 search_handler="select", use_qt_param=False, always_commit=False,
                 auth=None, verify=True, session=None): ...
    def ping(self, handler="admin/ping", **kwargs): ...
    def add(self, docs, boost=None, fieldUpdates=None, commit=None, softCommit=False,
            commitWithin=None, waitFlush=None, waitSearcher=None, overwrite=None,
            handler="update", min_rf=None): ...
    def delete(self, id=None, q=None, commit=None, softCommit=False,
               waitFlush=None, waitSearcher=None, handler="update"): ...
    def commit(self, softCommit=False, waitFlush=None, waitSearcher=None,
               expungeDeletes=None, handler="update"): ...
    def optimize(self, commit=True, waitFlush=None, waitSearcher=None,
                 maxSegments=None, handler="update"): ...
    def extract(self, file_obj, extractOnly=True, handler="update/extract", **kwargs): ...
```

[Core Client Operations](./core-client.md)

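One possible use of the health check is to wait for a Solr server to come up before indexing. The sketch below retries `ping()` a few times. The function name `wait_until_ready`, its defaults, and the injectable `sleep` parameter are assumptions for illustration; it catches a broad `Exception` so that it works with any client object, whereas code written against pysolr would likely catch `pysolr.SolrError`.

```python
import time


def wait_until_ready(solr, attempts=5, delay=1.0, sleep=time.sleep):
    """Call solr.ping() until it succeeds or the attempts run out.

    Returns True as soon as a ping succeeds, False if every attempt raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            solr.ping()
            return True
        except Exception:
            # Pause before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay)
    return False
```

For example, `wait_until_ready(solr, attempts=10, delay=2.0)` would give a starting server up to roughly 18 seconds of waiting between attempts.
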
### Search and Query Operations

Advanced search functionality including basic queries, More Like This queries, term suggestions, and result handling with pagination and cursor mark support.

```python { .api }
# Methods of the Solr class
def search(self, q, search_handler=None, **kwargs): ...
def more_like_this(self, q, mltfl, handler="mlt", **kwargs): ...
def suggest_terms(self, fields, prefix, handler="terms", **kwargs): ...

class Results:
    def __init__(self, decoded, next_page_query=None): ...
    docs: list
    hits: int
    highlighting: dict
    facets: dict
    spellcheck: dict
```

[Search Operations](./search-operations.md)

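Cursor marks let a client walk through a large result set without deep `start` offsets. The sketch below shows one way to page manually. It assumes that the result object exposes the current page as `docs` and the next cursor as `nextCursorMark` (the latter is not listed in the `Results` summary in this document), and that `sort` includes the collection's unique key field, which Solr requires for cursors. The helper name `iter_all_docs` is hypothetical.

```python
def iter_all_docs(solr, query, sort="id asc", rows=100):
    """Yield every document matching `query` using Solr cursor marks."""
    cursor = "*"  # Solr's documented starting cursor value
    while True:
        results = solr.search(query, sort=sort, rows=rows, cursorMark=cursor)
        for doc in results.docs:
            yield doc
        next_cursor = getattr(results, "nextCursorMark", None)
        # Solr signals the end by returning the same cursor that was sent.
        if next_cursor is None or next_cursor == cursor:
            break
        cursor = next_cursor
```

Usage could be as simple as `for doc in iter_all_docs(solr, "*:*"): ...`, which keeps only one page in memory at a time.
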
### Administrative Operations

Core administration capabilities for managing Solr cores including creation, reloading, renaming, swapping, and status monitoring.

```python { .api }
class SolrCoreAdmin:
    def __init__(self, url, *args, **kwargs): ...
    def status(self, core=None): ...
    def create(self, name, instance_dir=None, config="solrconfig.xml", schema="schema.xml"): ...
    def reload(self, core): ...
    def rename(self, core, other): ...
    def swap(self, core, other): ...
    def unload(self, core): ...
```

[Administrative Operations](./admin-operations.md)

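Core swapping is often used to rebuild an index offline and then switch traffic to it. The following is a minimal sketch of that workflow under stated assumptions: `rebuild_and_swap` is a hypothetical helper, `admin` is a `SolrCoreAdmin`-like object, and `staging_solr` is a `Solr`-like client pointed at the staging core. Only the `delete()`, `add()`, and `swap()` signatures listed in this document are used.

```python
def rebuild_and_swap(admin, staging_solr, docs, live_core, staging_core):
    """Rebuild the staging core from `docs`, then swap it with the live core."""
    # Clear the staging index without committing yet.
    staging_solr.delete(q="*:*", commit=False)
    # Index the fresh documents and commit so they become searchable.
    staging_solr.add(list(docs), commit=True)
    # Exchange the two cores; requests to live_core now hit the rebuilt index.
    return admin.swap(live_core, staging_core)
```

With real objects, `admin` might be created as `pysolr.SolrCoreAdmin('http://localhost:8983/solr/admin/cores')`; that URL is an assumption about a default local setup.
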
### SolrCloud Support

SolrCloud cluster support with ZooKeeper coordination, automatic failover, leader detection, and distributed query handling across multiple Solr nodes.

```python { .api }
class SolrCloud(Solr):
    def __init__(self, zookeeper, collection, decoder=None, encoder=None, timeout=60,
                 retry_count=5, retry_timeout=0.2, auth=None, verify=True, *args, **kwargs): ...

class ZooKeeper:
    def __init__(self, zkServerAddress, timeout=15, max_retries=-1, kazoo_client=None): ...
    def getHosts(self, collname, only_leader=False, seen_aliases=None): ...
    def getRandomURL(self, collname, only_leader=False): ...
    def getLeaderURL(self, collname): ...
```

[SolrCloud Support](./solrcloud-support.md)

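To make the role of `retry_count` and `retry_timeout` more concrete, here is a generic sketch of retry-with-failover logic. It is an illustration of the general technique, not pysolr's implementation: `request_with_failover`, `pick_url`, and `send` are hypothetical names, and the sketch treats only `ConnectionError` as a reason to try another node.

```python
import time


def request_with_failover(pick_url, send, retry_count=5, retry_timeout=0.2,
                          sleep=time.sleep):
    """Try send(url) against freshly picked nodes until one succeeds."""
    if retry_count < 0:
        raise ValueError("retry_count must not be negative")
    last_error = None
    # One initial attempt plus up to retry_count retries.
    for attempt in range(retry_count + 1):
        url = pick_url()  # e.g. a random live node for the collection
        try:
            return send(url)
        except ConnectionError as exc:
            last_error = exc
            if attempt < retry_count:
                sleep(retry_timeout)  # brief pause before trying another node
    raise last_error
```

In a SolrCloud setting, `pick_url` could plausibly be something like `lambda: zookeeper.getRandomURL("my_collection")`, where the collection name is a placeholder.
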
### Document Processing

Advanced document handling including content extraction with Apache Tika, nested document support, field updates, and XML/JSON processing utilities.

```python { .api }
# Method of the Solr class
def extract(self, file_obj, extractOnly=True, handler="update/extract", **kwargs): ...
```

[Document Processing](./document-processing.md)

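Nested documents are expressed by placing child documents under the `_childDocuments_` key of the parent. The sketch below builds such a structure with plain dictionaries. The helper `with_children` is hypothetical, and the constant is redefined locally with the same value this document gives for `NESTED_DOC_KEY`, so the block runs without pysolr installed.

```python
NESTED_DOC_KEY = "_childDocuments_"  # same value as the NESTED_DOC_KEY constant


def with_children(parent, children):
    """Return a copy of `parent` carrying `children` as nested documents."""
    doc = dict(parent)  # shallow copy so the caller's dict is not mutated
    doc[NESTED_DOC_KEY] = [dict(child) for child in children]
    return doc


book = with_children(
    {"id": "book_1", "title": "A test book"},
    [{"id": "book_1_ch1", "title": "Chapter one"}],
)
```

The resulting `book` dictionary could then be indexed with `solr.add([book])`.
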
### Utility Functions

Helper functions for data conversion, text processing, URL encoding, and XML sanitization used throughout the library.

```python { .api }
def get_version(): ...
def force_unicode(value): ...
def force_bytes(value): ...
def unescape_html(text): ...
def safe_urlencode(params, doseq=0): ...
def clean_xml_string(s): ...
def sanitize(data): ...
def is_py3(): ...
```

[Utility Functions](./utilities.md)

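XML sanitization typically means removing control characters that XML 1.0 forbids before text is sent to the server. The following standalone sketch shows the general idea. It is not pysolr's `clean_xml_string` or `sanitize` code, and the name `strip_illegal_xml_chars` is made up for this example.

```python
import re

# Control characters that XML 1.0 does not allow: everything below 0x20
# except tab (0x09), line feed (0x0A), and carriage return (0x0D).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text):
    """Remove control characters that would make an XML payload invalid."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Tabs and newlines survive the cleanup, so ordinary multi-line text passes through unchanged.
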
## Types

```python { .api }
import json

import requests


class SolrError(Exception):
    """Base exception for Solr-related errors."""
    pass

# Constants
NESTED_DOC_KEY = "_childDocuments_"  # Key for nested documents in document structure

# Type aliases for complex parameters
AuthType = tuple  # HTTP auth tuple (username, password) or requests auth object
SessionType = requests.Session  # Custom requests session
DecoderType = json.JSONDecoder  # Custom JSON decoder
EncoderType = json.JSONEncoder  # Custom JSON encoder
```