# IPython Integration

Kedro provides seamless integration with IPython and Jupyter environments through magic commands, automatic project loading, and interactive development support. This enables iterative development and debugging of data pipelines.

## Capabilities

### IPython Extension Loading

The main entry point for the IPython extension: it registers Kedro's magic commands and sets up project integration.

```python { .api }
def load_ipython_extension(ipython):
    """
    Load Kedro IPython extension.

    Args:
        ipython (InteractiveShell): IPython shell instance

    Side Effects:
        - Registers %reload_kedro magic command
        - Registers %load_node magic command
        - Automatically loads Kedro project if found in current directory
        - Provides logging and status information
    """
```
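
The extension's responsibilities can be sketched with a stand-in shell object. Everything below (`FakeShell`, `load_extension_sketch`, the placeholder magic bodies) is illustrative, not Kedro's actual implementation; only the `register_magic_function` signature mirrors IPython's real `InteractiveShell` API:

```python
# Illustrative sketch of the extension's registration step, using a
# stand-in shell object instead of a real InteractiveShell.
class FakeShell:
    def __init__(self):
        self.magics = {}

    def register_magic_function(self, func, magic_kind="line", magic_name=None):
        # Mirrors InteractiveShell.register_magic_function's signature.
        self.magics[magic_name or func.__name__] = func


def load_extension_sketch(shell):
    """Register placeholder Kedro line magics on the given shell (sketch only)."""
    def reload_kedro(line):
        return f"reloading with args: {line!r}"

    def load_node(line):
        return f"loading node: {line!r}"

    shell.register_magic_function(reload_kedro, "line", "reload_kedro")
    shell.register_magic_function(load_node, "line", "load_node")


shell = FakeShell()
load_extension_sketch(shell)
print(sorted(shell.magics))  # ['load_node', 'reload_kedro']
```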

### Project Reloading

Functions for loading and reloading Kedro projects in interactive environments.

```python { .api }
def reload_kedro(path=None, env=None, runtime_params=None, local_namespace=None, conf_source=None):
    """
    Load or reload Kedro project in IPython/Jupyter environment.

    Args:
        path (str, optional): Path to Kedro project root
        env (str, optional): Environment name for configuration
        runtime_params (dict, optional): Runtime parameters to pass
        local_namespace (dict, optional): Local namespace for variable injection
        conf_source (str, optional): Configuration source directory

    Side Effects:
        - Creates KedroSession and loads context
        - Injects 'context', 'catalog', 'session', 'pipelines' into namespace
        - Loads project-specific entry points and magic commands
        - Configures project logging and settings
    """
```
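
The namespace injection listed in the side effects amounts to updating the user's namespace with a fixed set of names. A minimal sketch, with strings standing in for the real session, context, catalog, and pipeline objects (`inject_kedro_variables` is a hypothetical helper, not part of Kedro's API):

```python
def inject_kedro_variables(namespace, session, context, catalog, pipelines):
    """Mimic reload_kedro's side effect of exposing project objects (sketch)."""
    namespace.update(
        {"session": session, "context": context, "catalog": catalog, "pipelines": pipelines}
    )


ns = {}
inject_kedro_variables(
    ns,
    session="<session>",
    context="<context>",
    catalog="<catalog>",
    pipelines={"__default__": "<pipeline>"},
)
print(sorted(ns))  # ['catalog', 'context', 'pipelines', 'session']
```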

## Magic Commands

### %reload_kedro Magic

IPython line magic for loading and reloading Kedro projects with configuration options.

```python { .api }
# %reload_kedro [path] [--env ENV] [--params PARAMS] [--conf-source CONF_SOURCE]

"""
Reload Kedro project with specified configuration.

Arguments:
    path (str, optional): Path to project root directory

Options:
    --env, -e (str): Environment name for configuration loading
    --params (str): Runtime parameters in key=value,key2=value2 format
    --conf-source (str): Path to configuration source directory

Examples:
    %reload_kedro
    %reload_kedro /path/to/project
    %reload_kedro --env production
    %reload_kedro --params model_type=xgboost,n_estimators=100
    %reload_kedro --conf-source custom_conf
    %reload_kedro /path/to/project --env staging --params debug=true

Variables Created:
    context: KedroContext instance for project management
    catalog: DataCatalog instance for dataset operations
    session: KedroSession instance for lifecycle management
    pipelines: Pipeline registry with project pipelines
"""
```
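
The `--params key=value,key2=value2` string has to be turned into a runtime-parameters dict before it reaches `reload_kedro`. A hedged sketch of such parsing (Kedro's CLI has its own parser with additional features such as type coercion, so treat `parse_runtime_params` as illustrative only):

```python
def parse_runtime_params(params: str) -> dict:
    """Parse 'key=value,key2=value2' into a dict of runtime parameters (sketch)."""
    result = {}
    for pair in params.split(","):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key.strip()] = value.strip()
    return result


print(parse_runtime_params("model_type=xgboost,n_estimators=100"))
# {'model_type': 'xgboost', 'n_estimators': '100'}
```

Note that all values arrive as strings here; any numeric conversion would be a separate step.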

### %load_node Magic

IPython line magic for loading and debugging individual pipeline nodes.

```python { .api }
# %load_node [node_name]

"""
Load node code for debugging and development.

Arguments:
    node_name (str): Name of the pipeline node to load

Features:
    - Generates executable code for node debugging
    - Loads node inputs from catalog
    - Provides import statements and function definitions
    - Creates executable function calls with proper parameters
    - Supports multiple output cells in Jupyter/VSCode

Supported Environments:
    - Jupyter Notebook (>7.0)
    - Jupyter Lab
    - IPython terminal
    - VSCode Notebook

Examples:
    %load_node preprocess_data_node
    %load_node train_model_node

Generated Code Includes:
    1. Catalog loading statements for node inputs
    2. Import statements from node's source file
    3. Function definition from node source
    4. Function call with proper parameter mapping
"""
```
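
The first and last of the generated cells can be approximated by simple string assembly from a node's function name and input dataset names. `generate_debug_cells` below is a hypothetical sketch, not Kedro's implementation; it only illustrates why dataset names like `params:preprocessing` need sanitising into valid Python identifiers:

```python
def generate_debug_cells(func_name, inputs):
    """Assemble the input-loading cell and the call cell for a node (sketch)."""
    # Dataset names may contain ':' or '.', which are invalid in identifiers.
    var_names = [name.replace(":", "__").replace(".", "__") for name in inputs]
    load_cell = "\n".join(
        f'{var} = catalog.load("{name}")' for var, name in zip(var_names, inputs)
    )
    call_cell = f'{func_name}({", ".join(var_names)})'
    return [load_cell, call_cell]


cells = generate_debug_cells("preprocess_data", ["raw_data", "params:preprocessing"])
print(cells[1])  # preprocess_data(raw_data, params__preprocessing)
```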

## Usage Examples

### Basic IPython Integration

```python
# In an IPython/Jupyter cell
%load_ext kedro.ipython

# This automatically:
# 1. Registers magic commands
# 2. Detects and loads a Kedro project from the current directory
# 3. Provides the context, catalog, session, pipelines variables

# Access project components
print(f"Available datasets: {catalog.list()}")
print(f"Available pipelines: {list(pipelines.keys())}")

# Load data for exploration
raw_data = catalog.load("raw_data")
print(f"Raw data shape: {raw_data.shape}")

# Save results back to the catalog
processed_data = raw_data.dropna()
catalog.save("processed_data", processed_data)
```

### Project Reloading

```python
# Reload project after code changes
%reload_kedro

# Reload with specific environment
%reload_kedro --env production

# Reload with runtime parameters
%reload_kedro --params model_type=random_forest,n_estimators=200

# Reload from different project path
%reload_kedro /path/to/other/project --env local
```

### Node Debugging Workflow

```python
# Load specific node for debugging
%load_node preprocess_data_node

# This generates multiple cells with:
# 1. Input loading
# 2. Imports
# 3. Function definition
# 4. Function call

# Example generated code:
# Cell 1 - Load inputs
"""
# Prepare necessary inputs for debugging
# All debugging inputs must be defined in your project catalog
raw_data = catalog.load("raw_data")
parameters = catalog.load("params:preprocessing")
"""

# Cell 2 - Imports
"""
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
"""

# Cell 3 - Function definition
"""
def preprocess_data(raw_data, parameters):
    '''Clean and preprocess raw data.'''
    # Drop missing values
    clean_data = raw_data.dropna()

    # Apply scaling if requested
    if parameters.get("scale_features", False):
        scaler = StandardScaler()
        numeric_columns = clean_data.select_dtypes(include=[np.number]).columns
        clean_data[numeric_columns] = scaler.fit_transform(clean_data[numeric_columns])

    return clean_data
"""

# Cell 4 - Function call
"""
preprocess_data(raw_data, parameters)
"""
```

### Advanced Interactive Development

```python
# Load extension and project
%load_ext kedro.ipython
%reload_kedro --env development

# Explore pipeline structure
pipeline = pipelines["data_processing"]
print(f"Pipeline has {len(pipeline.nodes)} nodes")

# Visualize dependencies
for node in pipeline.nodes:
    print(f"{node.name}: {node.inputs} -> {node.outputs}")

# Run partial pipeline interactively
from kedro.runner import SequentialRunner
runner = SequentialRunner()

# Run just preprocessing nodes
preprocessing_pipeline = pipeline.filter(tags=["preprocessing"])
result = runner.run(preprocessing_pipeline, catalog)

# Inspect intermediate results
intermediate_data = catalog.load("cleaned_data")
print(f"Cleaned data statistics:\n{intermediate_data.describe()}")

# Test individual node modifications
def modified_preprocess_data(raw_data):
    # Test new preprocessing logic
    return raw_data.fillna(0)  # Different approach

# Test with current catalog data
test_input = catalog.load("raw_data")
test_output = modified_preprocess_data(test_input)
print(f"Modified preprocessing result: {test_output.shape}")
```

### Multi-Environment Development

```python
# Work with different environments
%reload_kedro --env local
local_catalog = catalog

%reload_kedro --env staging
staging_catalog = catalog

# Compare configurations
print("Local datasets:", local_catalog.list())
print("Staging datasets:", staging_catalog.list())

# Test pipeline with different data
%reload_kedro --env test
test_result = session.run("validation_pipeline")
print("Test validation results:", test_result)
```

### Custom Magic Command Development

```python
from IPython.core.magic import register_line_magic, needs_local_scope

@register_line_magic
@needs_local_scope
def kedro_status(line, local_ns=None):
    """Custom magic command to show Kedro project status."""
    if 'context' not in local_ns:
        print("No Kedro project loaded. Use %reload_kedro first.")
        return

    context = local_ns['context']
    catalog = local_ns['catalog']
    pipelines = local_ns['pipelines']

    print(f"Project Path: {context.project_path}")
    print(f"Environment: {context.env}")
    print(f"Datasets: {len(catalog.list())}")
    print(f"Pipelines: {len(pipelines)}")

    # Show pipeline node counts
    for name, pipeline in pipelines.items():
        print(f"  {name}: {len(pipeline.nodes)} nodes")

# Invoke the custom magic (registration happened at definition time)
%kedro_status
```

### Integration with Data Science Workflows

```python
# Load Kedro project
%load_ext kedro.ipython

# Use catalog data with pandas/matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Load data for analysis
df = catalog.load("cleaned_data")

# Exploratory data analysis
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), annot=True)
plt.title("Feature Correlations")
plt.show()

# Feature engineering experiments
def create_new_features(data):
    data = data.copy()
    data['feature_ratio'] = data['feature_a'] / data['feature_b']
    data['feature_interaction'] = data['feature_a'] * data['feature_b']
    return data

# Test new features
enhanced_data = create_new_features(df)

# Save back to catalog for pipeline use
catalog.save("enhanced_features", enhanced_data)

# Run modeling pipeline with new features
result = session.run("modeling_pipeline", from_inputs=["enhanced_features"])
```

### Debugging Failed Pipelines

```python
# Load project and examine the failed pipeline
%reload_kedro

# Load the specific node that failed
%load_node problematic_node

# Look the node up in the pipeline registry to inspect its inputs
node = next(
    n for p in pipelines.values() for n in p.nodes if n.name == "problematic_node"
)

# Debug with actual inputs
problematic_inputs = {
    input_name: catalog.load(input_name)
    for input_name in node.inputs
}

# Step through function logic
def debug_function(input_data, parameters):
    print(f"Input shape: {input_data.shape}")
    print(f"Parameters: {parameters}")

    # Add debugging prints
    step1_result = input_data.dropna()
    print(f"After dropna: {step1_result.shape}")

    step2_result = step1_result[step1_result['value'] > 0]
    print(f"After filtering: {step2_result.shape}")

    return step2_result

# Test with debugging
debug_result = debug_function(
    problematic_inputs['input_data'],
    problematic_inputs['params:config']
)
```

## Environment Detection and Adaptation

```python { .api }
def _guess_run_environment():
    """
    Detect the current IPython/Jupyter environment.

    Returns:
        str: Environment identifier - "vscode", "databricks", "jupyter", or "ipython"

    Detection Logic:
        - VSCode: Checks for VSCODE_PID or VSCODE_CWD environment variables
        - Databricks: Checks for DATABRICKS_RUNTIME_VERSION environment variable
        - Jupyter: Checks for kernel attribute on IPython instance
        - IPython: Default fallback for terminal IPython
    """
```
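
The detection order above can be sketched as a pure function. The real helper inspects `os.environ` and the live IPython instance; this sketch takes the environment mapping and a `has_kernel` flag explicitly so the logic stays testable:

```python
def guess_run_environment(environ, has_kernel=False):
    """Follow the documented detection order on an explicit environ mapping (sketch)."""
    if "VSCODE_PID" in environ or "VSCODE_CWD" in environ:
        return "vscode"
    if "DATABRICKS_RUNTIME_VERSION" in environ:
        return "databricks"
    if has_kernel:
        return "jupyter"
    return "ipython"


print(guess_run_environment({"VSCODE_PID": "1234"}))  # vscode
print(guess_run_environment({}, has_kernel=True))     # jupyter
```

The order matters: VSCode's notebook frontend also runs a Jupyter kernel, so the VSCode check must come before the kernel check.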

## Types

```python { .api }
from typing import Any, Dict, Optional

from IPython.core.interactiveshell import InteractiveShell

EnvironmentName = Optional[str]
RuntimeParams = Optional[Dict[str, Any]]
LocalNamespace = Optional[Dict[str, Any]]
ConfSource = Optional[str]
ProjectPath = Optional[str]
NodeName = str
MagicCommand = str
```