# Core Client Operations

The essential operations for interacting with a Solr server: client initialization, health monitoring, document management, and index maintenance.

## Capabilities

### Client Initialization

Create and configure a Solr client instance with connection settings, authentication, timeouts, and custom handlers.

```python { .api }
class Solr:
    def __init__(self, url, decoder=None, encoder=None, timeout=60, results_cls=Results,
                 search_handler="select", use_qt_param=False, always_commit=False,
                 auth=None, verify=True, session=None):
        """
        Initialize a Solr client.

        Parameters:
        - url (str): Solr server URL (e.g., 'http://localhost:8983/solr/core_name')
        - decoder (json.JSONDecoder, optional): Custom JSON decoder instance
        - encoder (json.JSONEncoder, optional): Custom JSON encoder instance
        - timeout (int): Request timeout in seconds (default: 60)
        - results_cls (type): Results class for search responses (default: Results)
        - search_handler (str): Default search handler name (default: "select")
        - use_qt_param (bool): Use qt parameter instead of handler path (default: False)
        - always_commit (bool): Auto-commit all update operations (default: False)
        - auth (tuple or requests auth object, optional): HTTP authentication
        - verify (bool or str): Enable SSL certificate verification, or path to a CA bundle (default: True)
        - session (requests.Session, optional): Custom requests session
        """
```

Usage:

```python
import pysolr

# Basic client
solr = pysolr.Solr('http://localhost:8983/solr/my_core')

# Client with timeout and authentication
solr = pysolr.Solr(
    'https://solr.example.com/solr/my_core',
    timeout=30,
    auth=('username', 'password'),
    always_commit=True
)

# Client with custom session and SSL settings
import requests
session = requests.Session()
session.headers.update({'User-Agent': 'MyApp/1.0'})

solr = pysolr.Solr(
    'https://solr.example.com/solr/my_core',
    session=session,
    verify='/path/to/ca-bundle.crt'
)
```
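
Connection settings often come from the environment rather than from literals in code. The following is a minimal sketch of that idea, not part of pysolr: the helper `solr_kwargs_from_env` and the variable names `SOLR_URL`, `SOLR_TIMEOUT`, `SOLR_USER`, and `SOLR_PASSWORD` are hypothetical. It only builds a dict of keyword arguments that match the constructor parameters documented above.

```python
import os


def solr_kwargs_from_env(env=None):
    """Build keyword arguments for pysolr.Solr from environment-style settings.

    The variable names (SOLR_URL, SOLR_TIMEOUT, SOLR_USER, SOLR_PASSWORD)
    are hypothetical and chosen only for this illustration.
    """
    env = os.environ if env is None else env
    kwargs = {
        "url": env["SOLR_URL"],  # required; raises KeyError if missing
        "timeout": int(env.get("SOLR_TIMEOUT", "60")),  # 60 matches the documented default
    }
    user, password = env.get("SOLR_USER"), env.get("SOLR_PASSWORD")
    if user and password:
        # A (username, password) tuple is one of the documented forms of `auth`
        kwargs["auth"] = (user, password)
    return kwargs


# A plain dict stands in for os.environ so the sketch runs anywhere
example = solr_kwargs_from_env({
    "SOLR_URL": "http://localhost:8983/solr/my_core",
    "SOLR_TIMEOUT": "30",
})
print(example)
```

With pysolr installed, the result could be unpacked into the constructor, for example `pysolr.Solr(**solr_kwargs_from_env())`.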

### Health Check

Test connectivity and server health with ping operations.

```python { .api }
def ping(self, handler="admin/ping", **kwargs):
    """
    Send a ping request to test server connectivity.

    Parameters:
    - handler (str): Ping handler path (default: "admin/ping")
    - **kwargs: Additional parameters passed to Solr

    Returns:
    str: Server response content

    Raises:
    SolrError: If ping fails or server is unreachable
    """
```

Usage:

```python
try:
    response = solr.ping()
    print("Solr server is healthy")
except pysolr.SolrError as e:
    print(f"Solr server is down: {e}")
```
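
A single ping can fail while a server is still starting up, so callers sometimes retry before giving up. The sketch below shows one way to do that, assuming `ping()` raises `SolrError` on failure as documented above. `wait_until_healthy` and `FlakyClient` are hypothetical names; the stand-in client exists only so the example runs without a server.

```python
import time

try:
    from pysolr import SolrError
except ImportError:
    # Fallback so the sketch can be run without pysolr installed
    class SolrError(Exception):
        pass


def wait_until_healthy(client, attempts=5, delay=1.0, sleep=time.sleep):
    """Call client.ping() until it succeeds or the attempts run out.

    Returns True on the first successful ping and False if every attempt
    raised SolrError. This is a hypothetical helper, not a pysolr API.
    """
    for attempt in range(1, attempts + 1):
        try:
            client.ping()
            return True
        except SolrError:
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff: delay, 2 * delay, ...
    return False


class FlakyClient:
    """Stand-in for pysolr.Solr whose ping() fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def ping(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise SolrError("connection refused")
        return "OK"


flaky = FlakyClient(failures=2)
healthy = wait_until_healthy(flaky, attempts=5, delay=0.0)
print(healthy, flaky.calls)
```

With a real client, the call would look like `wait_until_healthy(solr, attempts=5, delay=1.0)`.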

### Document Indexing

Add or update documents in the Solr index with support for batch operations, field updates, and commit control.

```python { .api }
def add(self, docs, boost=None, fieldUpdates=None, commit=None, softCommit=False,
        commitWithin=None, waitFlush=None, waitSearcher=None, overwrite=None,
        handler="update", min_rf=None):
    """
    Add or update documents in the index.

    Parameters:
    - docs (list or dict): Document(s) to index. Each document is a dict with field names as keys
    - boost (dict, optional): Per-field boost values {"field_name": boost_value}
    - fieldUpdates (dict, optional): Field update operations {"field": "set"/"add"/"inc"}
    - commit (bool, optional): Force commit after operation (overrides always_commit)
    - softCommit (bool): Perform soft commit (default: False)
    - commitWithin (int, optional): Auto-commit within specified milliseconds
    - waitFlush (bool, optional): Wait for flush to complete
    - waitSearcher (bool, optional): Wait for new searcher
    - overwrite (bool, optional): Allow document overwrites (default: True)
    - handler (str): Update handler path (default: "update")
    - min_rf (int, optional): Minimum replication factor for SolrCloud

    Returns:
    str: Server response content

    Raises:
    SolrError: If indexing fails
    ValueError: If docs parameter is invalid
    """
```

Usage:

```python
# Single document
solr.add({
    "id": "doc_1",
    "title": "Sample Document",
    "content": "This is the document content.",
    "category": "example"
})

# Multiple documents
docs = [
    {"id": "doc_1", "title": "First Document", "content": "Content 1"},
    {"id": "doc_2", "title": "Second Document", "content": "Content 2"}
]
solr.add(docs)

# With field boosts
solr.add(
    {"id": "doc_1", "title": "Important Document", "content": "Key content"},
    boost={"title": 2.0, "content": 1.5}
)

# Atomic field updates
solr.add(
    {"id": "existing_doc", "category": "updated"},
    fieldUpdates={"category": "set"}
)

# With commit control
solr.add(docs, commit=True)  # Force immediate commit
solr.add(docs, commitWithin=5000)  # Auto-commit within 5 seconds
```
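
For large loads, a common pattern is to send documents in fixed-size batches and commit once at the end instead of after every request. The sketch below illustrates that pattern under the `add()` and `commit()` signatures documented above. `chunked`, `index_in_batches`, and `RecordingClient` are hypothetical names, and the batch size of 500 is an arbitrary illustration, not a recommendation from pysolr.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_batches(client, docs, batch_size=500):
    """Send docs in batches without committing, then commit once at the end.

    `client` is expected to provide add() and commit() as documented above.
    Returns the number of documents sent.
    """
    sent = 0
    for batch in chunked(docs, batch_size):
        # commit=False defers the commit even if the client uses always_commit
        client.add(batch, commit=False)
        sent += len(batch)
    client.commit()  # a single commit makes the whole load searchable
    return sent


class RecordingClient:
    """Stand-in for pysolr.Solr that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def add(self, docs, commit=None):
        self.calls.append(("add", len(docs), commit))

    def commit(self):
        self.calls.append(("commit",))


client = RecordingClient()
docs = [{"id": f"doc_{n}", "title": f"Document {n}"} for n in range(1200)]
total = index_in_batches(client, docs, batch_size=500)
print(total, client.calls)
```

Passing a real `pysolr.Solr` instance in place of `RecordingClient` would issue three update requests followed by one commit for the 1200 documents above.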

### Document Deletion

Remove documents from the index by ID or query with commit control options.

```python { .api }
def delete(self, id=None, q=None, commit=None, softCommit=False,
           waitFlush=None, waitSearcher=None, handler="update"):
    """
    Delete documents from the index.

    Parameters:
    - id (str, list, or None): Document ID(s) to delete. Can be single ID or list of IDs
    - q (str or None): Lucene query to select documents for deletion
    - commit (bool, optional): Force commit after deletion (overrides always_commit)
    - softCommit (bool): Perform soft commit (default: False)
    - waitFlush (bool, optional): Wait for flush to complete
    - waitSearcher (bool, optional): Wait for new searcher
    - handler (str): Update handler path (default: "update")

    Returns:
    str: Server response content

    Raises:
    SolrError: If deletion fails
    ValueError: If neither id nor q is specified, or both are specified
    """
```

Usage:

```python
# Delete by single ID
solr.delete(id='doc_1')

# Delete by multiple IDs
solr.delete(id=['doc_1', 'doc_2', 'doc_3'])

# Delete by query
solr.delete(q='category:obsolete')
solr.delete(q='*:*')  # Delete all documents

# With commit control
solr.delete(id='doc_1', commit=True)
```
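
Because `q` is a Lucene query string, values taken from user input or data can change the meaning of a delete if they contain quotes or other query syntax. The sketch below builds a quoted phrase query for a single field, assuming the standard Lucene query parser, where a backslash escapes a double quote or another backslash inside a phrase. `quote_phrase` and `delete_query` are hypothetical helpers, not part of pysolr.

```python
def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def delete_query(field, value):
    """Build a field:"value" query string suitable for delete(q=...)."""
    return f"{field}:{quote_phrase(value)}"


query = delete_query("category", 'old "beta" items')
print(query)  # category:"old \"beta\" items"
```

The resulting string would then be passed as `solr.delete(q=query)`. Exactly one of `id` or `q` should be given, since the method is documented to raise `ValueError` otherwise.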

### Index Commit

Force Solr to write pending changes to disk and make them searchable.

```python { .api }
def commit(self, softCommit=False, waitFlush=None, waitSearcher=None,
           expungeDeletes=None, handler="update"):
    """
    Force Solr to commit pending changes to disk.

    Parameters:
    - softCommit (bool): Perform soft commit (visible but not durable) (default: False)
    - waitFlush (bool, optional): Wait for flush to complete before returning
    - waitSearcher (bool, optional): Wait for new searcher before returning
    - expungeDeletes (bool, optional): Expunge deleted documents during commit
    - handler (str): Update handler path (default: "update")

    Returns:
    str: Server response content

    Raises:
    SolrError: If commit fails
    """
```

Usage:

```python
# Standard commit
solr.commit()

# Soft commit (fast, visible immediately but not durable)
solr.commit(softCommit=True)

# Hard commit with deleted document cleanup
solr.commit(expungeDeletes=True)

# Synchronous commit (wait for completion)
solr.commit(waitFlush=True, waitSearcher=True)
```
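
When several updates belong together, it can be convenient to group them and commit once at the end. The sketch below wraps that in a context manager, relying only on the `add()`, `delete()`, and `commit()` signatures documented above. `commit_on_success` and `RecordingClient` are hypothetical names; this is one possible arrangement, not a pysolr feature.

```python
from contextlib import contextmanager


@contextmanager
def commit_on_success(client, soft=False):
    """Commit once when the with-block finishes without raising.

    If the block raises, no commit is issued and the exception propagates.
    Updates already sent stay pending on the server until a later commit.
    """
    yield client
    client.commit(softCommit=soft)


class RecordingClient:
    """Stand-in for pysolr.Solr that records calls instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def add(self, docs, commit=None):
        self.calls.append(("add", len(docs), commit))

    def delete(self, id=None, q=None, commit=None):
        self.calls.append(("delete", id, commit))

    def commit(self, softCommit=False):
        self.calls.append(("commit", softCommit))


client = RecordingClient()
with commit_on_success(client) as solr:
    solr.add([{"id": "doc_1"}], commit=False)
    solr.delete(id="doc_2", commit=False)
print(client.calls)
```

Note that skipping the commit is not a rollback: it only avoids making a partial set of changes visible through this helper.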

### Index Optimization

Optimize the Solr index by reducing the number of segments, improving query performance.

```python { .api }
def optimize(self, commit=True, waitFlush=None, waitSearcher=None,
             maxSegments=None, handler="update"):
    """
    Optimize the Solr index by merging segments.

    Parameters:
    - commit (bool): Commit after optimization (default: True)
    - waitFlush (bool, optional): Wait for flush to complete
    - waitSearcher (bool, optional): Wait for new searcher
    - maxSegments (int, optional): Maximum number of segments to merge down to
    - handler (str): Update handler path (default: "update")

    Returns:
    str: Server response content

    Raises:
    SolrError: If optimization fails
    """
```

Usage:

```python
# Basic optimization
solr.optimize()

# Optimize to specific segment count
solr.optimize(maxSegments=1)

# Asynchronous optimization
solr.optimize(waitFlush=False, waitSearcher=False)
```

### Content Extraction

Extract content and metadata from files using Apache Tika integration for rich document processing.

```python { .api }
def extract(self, file_obj, extractOnly=True, handler="update/extract", **kwargs):
    """
    Extract content and metadata from files using Apache Tika.

    Parameters:
    - file_obj (file-like object): File object with a 'name' attribute to extract from
    - extractOnly (bool): If True, only extract without indexing (default: True)
    - handler (str): Extract handler path (default: "update/extract")
    - **kwargs: Additional parameters passed to Solr ExtractingRequestHandler

    Returns:
    dict: Dictionary containing extracted content and metadata:
        - contents: Extracted full-text content (if applicable)
        - metadata: Key-value pairs of extracted metadata

    Raises:
    ValueError: If file_obj doesn't have a 'name' attribute
    SolrError: If extraction fails or server error occurs
    """
```

Usage:

```python
# Extract content from a PDF file
with open('document.pdf', 'rb') as pdf_file:
    extracted = solr.extract(pdf_file)
    print("Content:", extracted.get('contents', 'No content'))
    print("Metadata:", extracted.get('metadata', {}))

# Extract and index in one step
with open('document.docx', 'rb') as doc_file:
    result = solr.extract(
        doc_file,
        extractOnly=False,  # Index the document
        # Solr expects dotted "literal.<field>" parameter names, which are not
        # valid Python keyword names, so they are passed via ** unpacking
        **{
            'literal.id': 'doc_123',  # Provide document ID
            'literal.title': 'Important Document',  # Add custom fields
        }
    )
```
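
Since `extract()` is documented to raise `ValueError` when the file object has no `name` attribute, in-memory data needs a name before it can be submitted. The sketch below shows one way to do that with `io.BytesIO`, which has no `name` by default. `named_buffer` is a hypothetical helper, and the file name is only a label used as a hint; no such file needs to exist on disk.

```python
import io


def named_buffer(data, name):
    """Wrap in-memory bytes in a file-like object that carries a `name`."""
    buffer = io.BytesIO(data)
    buffer.name = name  # extract() is documented to require this attribute
    return buffer


pdf_like = named_buffer(b"%PDF-1.4 example bytes", "report.pdf")
print(pdf_like.name, len(pdf_like.getvalue()))
```

With a real client, the buffer would be passed directly, for example `solr.extract(named_buffer(payload, "report.pdf"))`, where `payload` holds the bytes of the document.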

## Types

```python { .api }
class SolrError(Exception):
    """Exception raised for Solr-related errors including network issues, timeouts, and server errors."""
    pass
```