Orange, a component-based data mining framework.
npx @tessl/cli install tessl/pypi-orange3@3.39.00
# Orange3

Orange3 is a visual data mining and machine learning framework for both novice and expert users. It combines a drag-and-drop workflow interface with programmatic Python APIs covering data input/output, preprocessing, classification, regression, clustering, evaluation, and visualization.

## Package Information

- **Package Name**: Orange3
- **Language**: Python
- **Installation**: `pip install Orange3`
- **Minimum Python Version**: 3.10

## Core Imports

```python
import Orange
```

Common imports for working with data and machine learning:

```python
from Orange.data import Table, Domain, Variable, ContinuousVariable, DiscreteVariable, StringVariable
from Orange.classification import LogisticRegressionLearner, TreeLearner, RandomForestLearner, NaiveBayesLearner
from Orange.regression import LinearRegressionLearner, RidgeRegressionLearner, LassoRegressionLearner, ElasticNetLearner, SGDRegressionLearner
from Orange.clustering import KMeans, DBSCAN, HierarchicalClustering
from Orange.evaluation import CrossValidation, CA, AUC, MSE, RMSE
from Orange.preprocess import Discretize, Impute, Normalizer, SelectBestFeatures
from Orange.projection import PCA, LDA
```

## Basic Usage

```python
import Orange
from Orange.data import Table
from Orange.classification import TreeLearner
from Orange.evaluation import CrossValidation, CA

# Load a dataset (using recommended factory method)
data = Table.from_file("iris")

# Create a learner
learner = TreeLearner()

# Evaluate using 5-fold cross-validation: configure the sampler, then call it.
# (Passing data and learners straight to the constructor is the older, deprecated form.)
cv = CrossValidation(k=5)
results = cv(data, [learner])
accuracy = CA(results)

print(f"Accuracy: {accuracy[0]:.3f}")

# Train a model on the full dataset
model = learner(data)

# Make predictions (here on the first five training rows, for illustration)
predictions = model(data[:5])
print(f"Predictions: {predictions}")
```

## Architecture

Orange3 uses a modular, lazy-loading architecture with several key components:

- **Data Module**: Core data handling (Table, Domain, Variable types)
- **Machine Learning Modules**: Classification, regression, clustering, and ensemble methods
- **Processing Modules**: Data preprocessing, feature selection, and transformation
- **Evaluation Module**: Model validation and performance metrics
- **Visual Programming**: Widget-based graphical interface for building workflows
- **Utilities**: Base classes, tree structures, distance metrics, and statistical functions

The framework supports both programmatic usage and visual workflow construction, so it stays accessible to users with little programming experience while still covering advanced data science tasks.

## Capabilities

### Data Handling and I/O

Core data structures and operations for loading, manipulating, and transforming datasets. Includes support for various file formats, missing value handling, and domain management.

```python { .api }
class Table:
    @classmethod
    def from_domain(cls, domain, n_rows=0, weights=False): ...
    @classmethod
    def from_table(cls, domain, source, row_indices=...): ...
    @classmethod
    def from_file(cls, filename, **kwargs): ...
    @classmethod
    def from_numpy(cls, domain, X, Y=None, metas=None, **kwargs): ...
    @classmethod
    def from_url(cls, url, **kwargs): ...
    def save(self, filename): ...
    def copy(self): ...
    def transform(self, domain): ...

class Domain:
    def __init__(self, attributes, class_vars=None, metas=None): ...

class Variable:
    def __init__(self, name="", compute_value=None): ...

class ContinuousVariable(Variable):
    def __init__(self, name="", number_of_decimals=None, compute_value=None, *, sparse=False): ...

class DiscreteVariable(Variable):
    def __init__(self, name="", values=(), ordered=False, compute_value=None, *, sparse=False): ...

class StringVariable(Variable):
    def __init__(self, name="", compute_value=None, *, sparse=False): ...
```
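The `Domain` signature above groups columns into three roles: `attributes`, `class_vars`, and `metas`. As a rough illustration of what that grouping means for the rows of a table, the plain-Python sketch below splits dictionary rows into feature, target, and meta parts. It does not import Orange; `split_rows` and the sample columns are hypothetical names chosen for this example, and Orange's own storage is more elaborate than this.

```python
# Conceptual sketch only: this is not Orange's implementation.
def split_rows(rows, attributes, class_vars=(), metas=()):
    """Split dict rows into X (features), Y (targets) and M (metas) by column role."""
    X = [[row[name] for name in attributes] for row in rows]
    Y = [[row[name] for name in class_vars] for row in rows]
    M = [[row[name] for name in metas] for row in rows]
    return X, Y, M


rows = [
    {"sepal length": 5.1, "petal length": 1.4, "iris": "setosa", "id": "a1"},
    {"sepal length": 6.3, "petal length": 4.9, "iris": "versicolor", "id": "b7"},
]
X, Y, M = split_rows(rows, ["sepal length", "petal length"], ["iris"], ["id"])
print(X)  # [[5.1, 1.4], [6.3, 4.9]]
print(Y)  # [['setosa'], ['versicolor']]
print(M)  # [['a1'], ['b7']]
```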

[Data Handling](./data-handling.md)

### Classification Algorithms

Supervised learning algorithms for categorical prediction tasks, including tree-based methods, probabilistic classifiers, support vector machines, neural networks, and ensemble methods.

```python { .api }
class TreeLearner:
    def __init__(self, binarize=False, max_depth=None, min_samples_leaf=1,
                 min_samples_split=2, sufficient_majority=0.95, preprocessors=None): ...
    def __call__(self, data): ...

class LogisticRegressionLearner:
    def __init__(self, penalty="l2", dual=False, tol=0.0001, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver="auto", max_iter=100,
                 multi_class="deprecated", verbose=0, n_jobs=1, preprocessors=None): ...
    def __call__(self, data): ...

class RandomForestLearner:
    def __init__(self, n_estimators=10, max_depth=None, preprocessors=None): ...
    def __call__(self, data): ...

class NaiveBayesLearner:
    def __init__(self, preprocessors=None): ...
    def __call__(self, data): ...
```
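Every learner above exposes `__call__(self, data)`: calling a configured learner on training data returns a fitted model, and that model is itself called on rows to obtain predictions. The toy sketch below reproduces only this calling convention in plain Python. `MajorityLearner` and `MajorityModel` are hypothetical classes written for this illustration (they are not the Orange classes), and the data is a simple list of `(features, label)` pairs rather than an Orange `Table`.

```python
from collections import Counter


class MajorityModel:
    """Toy model: calling it with rows returns one prediction per row."""

    def __init__(self, label):
        self.label = label

    def __call__(self, rows):
        return [self.label for _ in rows]


class MajorityLearner:
    """Toy learner: calling it with training data returns a fitted model."""

    def __call__(self, data):
        labels = [label for _, label in data]
        most_common = Counter(labels).most_common(1)[0][0]
        return MajorityModel(most_common)


train = [((5.1, 1.4), "setosa"), ((4.9, 1.5), "setosa"), ((6.3, 4.9), "versicolor")]
learner = MajorityLearner()   # configure
model = learner(train)        # fit: learner(data) -> model
predictions = model([(5.0, 1.3), (6.0, 4.5)])  # predict: model(rows) -> labels
print(predictions)  # ['setosa', 'setosa']
```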

[Classification](./classification.md)

### Regression Algorithms

Supervised learning algorithms for continuous prediction tasks, including linear models, tree-based regression, neural networks, and ensemble methods.

```python { .api }
class LinearRegressionLearner:
    def __init__(self, preprocessors=None, fit_intercept=True): ...
    def __call__(self, data): ...

class RidgeRegressionLearner:
    def __init__(self, alpha=1.0, fit_intercept=True, copy_X=True,
                 max_iter=None, tol=0.001, solver='auto', preprocessors=None): ...
    def __call__(self, data): ...

class LassoRegressionLearner:
    def __init__(self, alpha=1.0, fit_intercept=True, precompute=False,
                 copy_X=True, max_iter=1000, tol=0.0001, warm_start=False,
                 positive=False, preprocessors=None): ...
    def __call__(self, data): ...

class ElasticNetLearner:
    def __init__(self, alpha=1.0, l1_ratio=0.5, fit_intercept=True,
                 precompute=False, max_iter=1000, copy_X=True, tol=0.0001,
                 warm_start=False, positive=False, preprocessors=None): ...
    def __call__(self, data): ...

class SGDRegressionLearner:
    def __init__(self, loss='squared_error', penalty='l2', alpha=0.0001,
                 l1_ratio=0.15, fit_intercept=True, max_iter=5, tol=1e-3,
                 shuffle=True, epsilon=0.1, random_state=None, preprocessors=None): ...
    def __call__(self, data): ...

class RandomForestRegressionLearner:
    def __init__(self, n_estimators=10, max_depth=None, preprocessors=None): ...
    def __call__(self, data): ...
```
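Several of the learners above take an `alpha` regularization strength. To give a feel for what it does, here is a worked single-feature example of ridge regression, assuming the usual objective (sum of squared residuals plus `alpha` times the squared slope, with an unpenalized intercept). The helper `ridge_1d` is a hypothetical name for this sketch and is not part of Orange; with one feature the solution has the closed form `slope = S_xy / (S_xx + alpha)`.

```python
def ridge_1d(xs, ys, alpha=1.0):
    """Fit y ~ intercept + slope * x while penalizing alpha * slope**2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = s_xy / (s_xx + alpha)        # alpha = 0 gives ordinary least squares
    intercept = y_mean - slope * x_mean  # the intercept is not penalized
    return intercept, slope


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
ols = ridge_1d(xs, ys, alpha=0.0)
shrunk = ridge_1d(xs, ys, alpha=5.0)
print(ols)     # (0.0, 2.0)
print(shrunk)  # (2.5, 1.0): a larger alpha shrinks the slope toward zero
```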

[Regression](./regression.md)

### Clustering and Unsupervised Learning

Unsupervised learning algorithms for discovering patterns and structures in data without labeled examples.

```python { .api }
class KMeans:
    def __init__(self, n_clusters=8, init='k-means++', n_init=10, max_iter=300,
                 tol=0.0001, random_state=None, preprocessors=None): ...
    def __call__(self, data): ...

class DBSCAN:
    def __init__(self, eps=0.5, min_samples=5, metric='euclidean',
                 algorithm='auto', leaf_size=30, p=None, preprocessors=None): ...
    def __call__(self, data): ...

class HierarchicalClustering:
    def __init__(self, n_clusters=2, linkage='average'): ...
    def fit(self, X): ...
    def fit_predict(self, X, y=None): ...
```
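For readers unfamiliar with what `KMeans` iterates over (`max_iter`) and what its result looks like, the sketch below runs the standard Lloyd iteration on one-dimensional points in plain Python. It is a simplification: the starting centers are passed in explicitly for determinism instead of using `k-means++` or multiple restarts, and `kmeans_1d` is a hypothetical helper, not an Orange function.

```python
def kmeans_1d(points, centers, max_iter=300):
    """Lloyd's algorithm on 1-D points, starting from the given centers."""
    centers = list(centers)

    def nearest(p):
        return min(range(len(centers)), key=lambda i: abs(p - centers[i]))

    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, [nearest(p) for p in points]


centers, labels = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centers=[0.0, 6.0])
print(labels)  # [0, 0, 0, 1, 1, 1]
print([round(c, 6) for c in centers])
```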

[Clustering](./clustering.md)

### Data Preprocessing

Data transformation and preparation techniques including discretization, normalization, imputation, and feature selection.

```python { .api }
class Discretize:
    def __init__(self, method=None, n_intervals=4): ...
    def __call__(self, data): ...

class Impute:
    def __init__(self, method=None): ...
    def __call__(self, data): ...

class Normalizer:
    def __init__(self, norm_type='l2', transform_class=False): ...
    def __call__(self, data): ...

class SelectBestFeatures:
    def __init__(self, method=None, k=5): ...
    def __call__(self, data): ...
```
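To illustrate two of the transformations named above, the sketch below fills missing values with the column mean and then bins a numeric column into `n_intervals` equal-width intervals. Both are common choices, but this is an assumption for illustration: the document does not say which methods `Impute` and `Discretize` use by default, and `impute_mean` and `discretize_equal_width` are hypothetical helpers rather than Orange functions.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def discretize_equal_width(values, n_intervals=4):
    """Map each value to the index of one of n_intervals equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    if width == 0:
        return [0 for _ in values]  # constant column: a single bin
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_intervals - 1) for v in values]


column = [0.0, 2.0, None, 8.0, 6.0]
filled = impute_mean(column)  # None -> mean of 0, 2, 8, 6
bins = discretize_equal_width(filled, n_intervals=4)
print(filled)  # [0.0, 2.0, 4.0, 8.0, 6.0]
print(bins)    # [0, 1, 2, 3, 3]
```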

[Preprocessing](./preprocessing.md)

### Model Evaluation and Validation

Comprehensive model evaluation framework with cross-validation, performance metrics, and statistical testing capabilities.

```python { .api }
class CrossValidation:
    def __init__(self, k=10, stratified=True, random_state=0): ...
    def __call__(self, data, learners): ...

class TestOnTestData:
    def __call__(self, data, test_data, learners): ...

def CA(results): ...    # Classification Accuracy
def AUC(results): ...   # Area Under Curve
def MSE(results): ...   # Mean Squared Error
def RMSE(results): ...  # Root Mean Squared Error
```
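As a reference for what these pieces compute, the sketch below shows a bare-bones k-fold split together with the textbook definitions of classification accuracy, MSE, and RMSE. It is plain Python and deliberately simplified: folds are contiguous, with no shuffling or stratification, and the function names are hypothetical rather than Orange's.

```python
import math


def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


def classification_accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


folds = list(k_fold_indices(n_rows=7, k=3))
print([test for _, test in folds])  # [[0, 1, 2], [3, 4], [5, 6]]
print(classification_accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # 0.75
print(rmse([0.0, 0.0], [3.0, 3.0]))  # 3.0
```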

[Evaluation](./evaluation.md)

### Dimensionality Reduction and Projection

Techniques for reducing data dimensionality and creating low-dimensional representations for visualization and analysis.

```python { .api }
class PCA:
    def __init__(self, n_components=None): ...
    def __call__(self, data): ...

class LDA:
    def __init__(self, n_components=None): ...
    def __call__(self, data): ...

class FreeViz:
    def __init__(self): ...
    def __call__(self, data): ...
```

[Projection](./projection.md)

### Distance Metrics

Comprehensive collection of distance and similarity measures for various data types and analysis tasks.

```python { .api }
class Euclidean:
    def __call__(self, data): ...

class Manhattan:
    def __call__(self, data): ...

class Cosine:
    def __call__(self, data): ...

class Jaccard:
    def __call__(self, data): ...
```
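For orientation, the four measures named above have the following textbook definitions for a single pair of rows. This is a plain-Python sketch with hypothetical function names. The Orange classes operate on whole tables and may additionally handle missing values, normalization, or discrete features, so their results can differ in detail.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """One minus the ratio of shared items to all items in two non-empty sets."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


print(euclidean([0, 0], [3, 4]))        # 5.0
print(manhattan([0, 0], [3, 4]))        # 7
print(cosine_distance([1, 0], [0, 1]))  # 1.0
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```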

[Distance Metrics](./distance.md)

### Visual Programming Widgets

Widget-based graphical interface components for building data analysis workflows through drag-and-drop operations.

```python { .api }
# Widget categories accessible through Orange Canvas:
# - Orange.widgets.data: Data input/output widgets
# - Orange.widgets.visualize: Visualization widgets
# - Orange.widgets.model: Machine learning model widgets
# - Orange.widgets.evaluate: Evaluation widgets
# - Orange.widgets.unsupervised: Clustering widgets
```

[Widgets](./widgets.md)