
tessl/pypi-kedro

A comprehensive Python framework for production-ready data science and analytics pipeline development

- Workspace: tessl
- Visibility: Public
- Describes: `pypipkg:pypi/kedro@1.0.x`

To install, run:

```
npx @tessl/cli install tessl/pypi-kedro@1.0.0
```

# Kedro

Kedro is a Python framework for building production-ready data science and analytics pipelines. It applies software engineering best practices to create reproducible, maintainable, and modular data engineering and data science pipelines through uniform project templates, a data abstraction layer, configuration management, and pipeline assembly tools.

## Package Information

- **Package Name**: kedro
- **Language**: Python
- **Installation**: `pip install kedro`
- **Requires Python**: >=3.9

## Core Imports

```python
import kedro
```

Common patterns for working with Kedro components:

```python
# Configuration management
from kedro.config import AbstractConfigLoader, OmegaConfigLoader

# Data catalog and datasets
from kedro.io import DataCatalog, AbstractDataset, MemoryDataset

# Pipeline construction
from kedro.pipeline import Pipeline, Node, pipeline, node

# Pipeline execution
from kedro.runner import SequentialRunner, ParallelRunner, ThreadRunner

# Framework components
from kedro.framework.context import KedroContext
from kedro.framework.session import KedroSession
from kedro.framework.project import configure_project, pipelines, settings
```

## Basic Usage

```python
from kedro.pipeline import pipeline, node
from kedro.io import DataCatalog, MemoryDataset
from kedro.runner import SequentialRunner

# Define a simple processing function
def process_data(input_data):
    """Process input data and return results."""
    return [x * 2 for x in input_data]

# Create a pipeline node
processing_node = node(
    func=process_data,
    inputs="raw_data",
    outputs="processed_data",
    name="process_data_node"
)

# Create a pipeline from nodes
data_pipeline = pipeline([processing_node])

# Set up a data catalog
catalog = DataCatalog({
    "raw_data": MemoryDataset([1, 2, 3, 4, 5]),
    "processed_data": MemoryDataset()
})

# Run the pipeline
runner = SequentialRunner()
runner.run(data_pipeline, catalog)

# Access results
results = catalog.load("processed_data")
print(results)  # [2, 4, 6, 8, 10]
```

## Architecture

Kedro follows a modular architecture built around key abstractions:

- **DataCatalog**: Central registry managing all datasets with consistent load/save interfaces
- **Pipeline**: Directed acyclic graph (DAG) of processing nodes with automatic dependency resolution
- **Node**: Individual computation units that transform inputs to outputs via Python functions
- **Runner**: Execution engines supporting sequential, parallel, and threaded processing strategies
- **KedroContext**: Project context providing configuration, catalog access, and environment management
- **KedroSession**: Session management for the project lifecycle and execution environment

This design enables scalable data workflows that follow software engineering principles, supporting everything from local development to production deployment across different compute environments.
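The automatic dependency resolution mentioned above works purely from dataset names: a node becomes runnable once every dataset it consumes has been produced. As a rough illustration of the idea (this is not Kedro's implementation, and the node definitions are hypothetical), a toy topological ordering might look like:

```python
# Illustrative sketch of DAG dependency resolution (not Kedro's code):
# order nodes so each runs only after the nodes producing its inputs.
nodes = [
    {"name": "train", "inputs": {"features"}, "outputs": {"model"}},
    {"name": "split", "inputs": {"raw"}, "outputs": {"features"}},
    {"name": "score", "inputs": {"model"}, "outputs": {"metrics"}},
]

def topo_order(nodes):
    available = {"raw"}  # datasets assumed to exist before the run
    ordered, remaining = [], list(nodes)
    while remaining:
        ready = [n for n in remaining if n["inputs"] <= available]
        if not ready:
            raise ValueError("cycle or missing input dataset")
        for n in ready:
            ordered.append(n["name"])
            available |= n["outputs"]
            remaining.remove(n)
    return ordered

print(topo_order(nodes))  # ['split', 'train', 'score']
```

Note that `split` is declared second but runs first, because `train` consumes the `features` dataset that `split` produces.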

## Capabilities

### Configuration Management

Flexible configuration loading supporting multiple formats (YAML, JSON) with environment-specific overrides, parameter management, and extensible loader implementations.

```python { .api }
class AbstractConfigLoader:
    def load_and_merge_dir_config(self, config_path, env=None, **kwargs): ...
    def get(self, *patterns, **kwargs): ...

class OmegaConfigLoader(AbstractConfigLoader):
    def __init__(self, conf_source, base_env="base", default_run_env="local", **kwargs): ...
```

[Configuration Management](./configuration.md)
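Conceptually, an environment such as `local` overrides matching keys from the base environment while leaving the rest intact. A minimal plain-Python sketch of that merge (not the `OmegaConfigLoader` implementation; the config keys are made up):

```python
# Illustrative sketch: environment-specific overrides merged over base config.
base = {"model": {"lr": 0.01, "epochs": 10}}
local = {"model": {"lr": 0.001}}  # hypothetical "local" env override

def merge(base_cfg, override):
    """Recursively overlay `override` onto `base_cfg`."""
    out = dict(base_cfg)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

cfg = merge(base, local)
print(cfg)  # {'model': {'lr': 0.001, 'epochs': 10}}
```

The override replaces `lr` but preserves `epochs`, which is the behavior environment layering relies on.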

### Data Catalog and Dataset Management

Comprehensive data abstraction layer providing consistent interfaces for various data sources, versioning support, lazy loading, and catalog-based dataset management.

```python { .api }
class DataCatalog:
    def load(self, name): ...
    def save(self, name, data): ...
    def list(self): ...
    def exists(self, name): ...
    def add(self, data_set_name, data_set, replace=False): ...

class AbstractDataset:
    def load(self): ...
    def save(self, data): ...
    def exists(self): ...
```

[Data Catalog and Datasets](./data-catalog.md)
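The `AbstractDataset` interface above is small enough to mimic in a few lines. A purely illustrative in-memory dataset following that load/save/exists shape (Kedro already ships `MemoryDataset` for this; `ToyMemoryDataset` is a hypothetical name):

```python
# Illustrative dataset implementing the load/save/exists interface.
class ToyMemoryDataset:
    _EMPTY = object()  # sentinel: distinguishes "no data" from saved None

    def __init__(self, data=_EMPTY):
        self._data = data

    def load(self):
        if self._data is self._EMPTY:
            raise ValueError("no data has been saved yet")
        return self._data

    def save(self, data):
        self._data = data

    def exists(self):
        return self._data is not self._EMPTY

ds = ToyMemoryDataset()
ds.save([1, 2, 3])
print(ds.exists(), ds.load())  # True [1, 2, 3]
```

A catalog is then little more than a name-to-dataset mapping that delegates `load(name)` and `save(name, data)` to the underlying dataset objects.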

### Pipeline Construction

Pipeline definition capabilities including node creation, dependency management, pipeline composition, filtering, and transformation operations.

```python { .api }
class Pipeline:
    def filter(self, tags=None, from_nodes=None, to_nodes=None, **kwargs): ...
    def tag(self, tags): ...
    def __add__(self, other): ...
    def __or__(self, other): ...

class Node:
    def __init__(self, func, inputs, outputs, name=None, tags=None): ...

def node(func, inputs, outputs, name=None, tags=None): ...
def pipeline(pipe, inputs=None, outputs=None, parameters=None, tags=None): ...
```

[Pipeline Construction](./pipeline-construction.md)
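The composition (`__add__`) and `filter` operations above treat a pipeline as a collection of nodes. A toy sketch of that behavior (illustrative only, not Kedro's `Pipeline` class; the node names and tags are invented):

```python
# Illustrative sketch: a pipeline as a node collection supporting
# composition (+) and tag-based filtering.
class ToyPipeline:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __add__(self, other):
        return ToyPipeline(self.nodes + other.nodes)

    def filter(self, tags=None):
        if tags is None:
            return self
        return ToyPipeline([n for n in self.nodes if tags & n["tags"]])

etl = ToyPipeline([{"name": "extract", "tags": {"etl"}},
                   {"name": "load", "tags": {"etl"}}])
ml = ToyPipeline([{"name": "train", "tags": {"ml"}}])

combined = etl + ml
print(len(combined.nodes))                                 # 3
print([n["name"] for n in combined.filter({"ml"}).nodes])  # ['train']
```

Both operations return new pipelines rather than mutating the originals, which is what makes composing sub-pipelines into larger ones safe.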

### Pipeline Execution

Multiple execution strategies for running pipelines including sequential, parallel (multiprocessing), and threaded execution with support for partial runs and custom data loading.

```python { .api }
class AbstractRunner:
    def run(self, pipeline, catalog, hook_manager=None, session_id=None): ...
    def run_only_missing(self, pipeline, catalog, hook_manager=None, session_id=None): ...

class SequentialRunner(AbstractRunner): ...
class ParallelRunner(AbstractRunner): ...
class ThreadRunner(AbstractRunner): ...
```

[Pipeline Execution](./pipeline-execution.md)
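The runner choice changes how nodes with no dependencies between them are scheduled; the outputs are the same either way. A sketch of the sequential vs. threaded strategies over two independent tasks (illustrative, not Kedro's runner code; `double` and `total` are made-up node functions):

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent "nodes": no output of one is an input of the other.
def double(xs):
    return [x * 2 for x in xs]

def total(xs):
    return sum(xs)

data = [1, 2, 3]

# Sequential strategy: one node at a time, like SequentialRunner.
seq_results = [double(data), total(data)]

# Threaded strategy: independent nodes run concurrently, like
# ThreadRunner (mainly useful when nodes wait on I/O).
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(double, data), pool.submit(total, data)]
    thr_results = [f.result() for f in futures]

print(seq_results)  # [[2, 4, 6], 6]
print(thr_results)  # [[2, 4, 6], 6]
```

`ParallelRunner` applies the same idea with separate processes, which sidesteps the GIL for CPU-bound nodes at the cost of pickling data between processes.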

### Project Context and Session Management

Project lifecycle management including context creation, session handling, configuration access, and environment setup for Kedro applications.

```python { .api }
class KedroContext:
    def run(self, pipeline_name=None, tags=None, runner=None, **kwargs): ...
    @property
    def catalog(self): ...
    @property
    def config_loader(self): ...

class KedroSession:
    @classmethod
    def create(cls, project_path=None, save_on_close=True, **kwargs): ...
    def load_context(self): ...
    def run(self, pipeline_name=None, tags=None, runner=None, **kwargs): ...
```

[Context and Session Management](./context-session.md)

182

183

### CLI and Project Management

184

185

Command-line interface for project creation, pipeline execution, and project management with extensible plugin system and project discovery utilities.

186

187

```python { .api }

188

def main(): ...

189

def configure_project(package_name): ...

190

def find_pipelines(raise_errors=False): ...

191

```

192

193

[CLI and Project Management](./cli-project.md)

194

195

### Hook System and Extensions

196

197

Plugin architecture enabling custom behavior injection at various lifecycle points including node execution, pipeline runs, and catalog operations.

198

199

```python { .api }

200

def hook_impl(func): ...

201

def _create_hook_manager(): ...

202

```

203

204

[Hook System](./hooks.md)
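At its core, a hook system of this kind is a registry of callables attached to named lifecycle points, invoked when execution reaches that point. A minimal illustrative sketch (not Kedro's actual hook machinery; the hook point name mirrors the node-execution lifecycle described above):

```python
# Illustrative hook registry: callables registered for lifecycle points
# such as "before_node_run", then invoked when that point is reached.
class ToyHookManager:
    def __init__(self):
        self._hooks = {}

    def register(self, point, func):
        self._hooks.setdefault(point, []).append(func)

    def call(self, point, **kwargs):
        # Invoke every hook registered for this point, collecting results.
        return [func(**kwargs) for func in self._hooks.get(point, [])]

manager = ToyHookManager()
manager.register("before_node_run", lambda node_name: f"logging {node_name}")

print(manager.call("before_node_run", node_name="process_data_node"))
# ['logging process_data_node']
```

Because hooks are looked up by point name, plugins can add behavior (logging, metrics, data validation) without the framework knowing about them in advance.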

205

206

### IPython and Jupyter Integration

207

208

Interactive development support with magic commands for reloading projects, debugging nodes, and seamless integration with Jupyter notebooks and IPython environments.

209

210

```python { .api }

211

def load_ipython_extension(ipython): ...

212

def reload_kedro(path=None, env=None, runtime_params=None, **kwargs): ...

213

```

214

215

[IPython Integration](./ipython-integration.md)