Library to easily sync/diff/update 2 different data sources
npx @tessl/cli install tessl/pypi-diffsync@2.1.00
# DiffSync

DiffSync is a Python utility library for comparing and synchronizing datasets. It acts as an intermediate translation layer between multiple data sources: developers define data models, plus adapters that translate each source into a unified data model. The library is best suited to repeated synchronization as data changes over time, accounting for creation, modification, and deletion of records, especially when the data forms hierarchical relationships.

## Package Information

- **Package Name**: diffsync
- **Language**: Python (>=3.9,<4.0)
- **Installation**: `pip install diffsync`
- **Optional Redis Support**: `pip install diffsync[redis]`

## Core Imports
```python
import diffsync
```

For main classes:

```python
from diffsync import DiffSyncModel, Adapter
```

For complete API access:

```python
from diffsync import (
    DiffSyncModel, Adapter, Diff,
    DiffSyncFlags, DiffSyncModelFlags, DiffSyncStatus,
    LocalStore, BaseStore,
    # Exceptions
    ObjectAlreadyExists, ObjectNotFound, ObjectStoreWrongType,
    DiffClassMismatch
)
from diffsync.diff import DiffElement
# Requires the optional Redis extra: pip install diffsync[redis]
from diffsync.store.redis import RedisStore
from diffsync.exceptions import (
    ObjectNotCreated, ObjectNotUpdated, ObjectNotDeleted
)
from diffsync.enum import DiffSyncActions
```

## Basic Usage
```python
from diffsync import DiffSyncModel, Adapter

# Define a data model
class Device(DiffSyncModel):
    _modelname = "device"
    _identifiers = ("name",)
    _attributes = ("os_version", "vendor")

    name: str
    os_version: str
    vendor: str

# Create adapters for different data sources
class NetworkAdapter(Adapter):
    device = Device
    top_level = ["device"]

    def load(self):
        # Load data from your source (database, API, etc.)
        device1 = Device(name="router1", os_version="15.1", vendor="cisco")
        device2 = Device(name="switch1", os_version="12.2", vendor="juniper")
        self.add(device1)
        self.add(device2)

class InventoryAdapter(Adapter):
    device = Device
    top_level = ["device"]

    def load(self):
        # A second, hypothetical source whose data differs:
        # router1 runs an older version and switch1 is missing
        self.add(Device(name="router1", os_version="15.0", vendor="cisco"))

# Create two adapters with different data
source = NetworkAdapter(name="source")
target = InventoryAdapter(name="target")

# Load their respective data
source.load()
target.load()

# Calculate differences
diff = target.diff_from(source)
print(diff.str())

# Synchronize data from source to target
sync_diff = target.sync_from(source)
```
## Architecture

DiffSync uses a hierarchical model-based approach with several key components:

- **DiffSyncModel**: Base class for defining data models with identifiers, attributes, and child relationships
- **Adapter**: Container for managing collections of DiffSyncModel instances and performing diff/sync operations
- **Store Backends**: Storage implementations (LocalStore for in-memory, RedisStore for persistent storage)
- **Diff Objects**: Structured representations of differences between datasets
- **Sync Operations**: Automated creation, update, and deletion of records based on calculated diffs
This design enables systematic comparison and synchronization of complex, hierarchical data structures between disparate systems while maintaining data integrity and providing detailed change tracking.
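To make the diff-then-sync idea concrete, here is a library-independent sketch in plain Python. It is not DiffSync's implementation: the `plan_changes` helper and the record layout (dicts keyed by a unique identifier) are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def plan_changes(source: Dict[str, Record], target: Dict[str, Record]) -> List[Tuple[str, str]]:
    """Return (action, identifier) pairs that would make target match source."""
    plan = []
    for key in sorted(source.keys() | target.keys()):
        if key not in target:
            plan.append(("create", key))      # only the source has it
        elif key not in source:
            plan.append(("delete", key))      # only the target has it
        elif source[key] != target[key]:
            plan.append(("update", key))      # both have it, attributes differ
    return plan


source = {
    "router1": {"os_version": "15.1", "vendor": "cisco"},
    "switch1": {"os_version": "12.2", "vendor": "juniper"},
}
target = {
    "router1": {"os_version": "15.0", "vendor": "cisco"},
    "old-fw": {"os_version": "9.1", "vendor": "acme"},
}

plan = plan_changes(source, target)
print(plan)
```

The library layers typed models, parent-child relationships, and pluggable storage on top of this basic create/update/delete classification.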
## Capabilities

### Model Definition

Core functionality for defining data models that represent your domain objects. Models specify unique identifiers, trackable attributes, and parent-child relationships between different object types.

```python { .api }
class DiffSyncModel(BaseModel):
    _modelname: ClassVar[str]
    _identifiers: ClassVar[Tuple[str, ...]]
    _attributes: ClassVar[Tuple[str, ...]]
    _children: ClassVar[Dict[str, str]]
    model_flags: DiffSyncModelFlags
    adapter: Optional["Adapter"]
```
[Model Definition](./model-definition.md)
### Data Management

Adapter functionality for managing collections of models, loading data from various sources, and providing query and storage operations through configurable storage backends.

```python { .api }
class Adapter:
    top_level: ClassVar[List[str]]

    def __init__(self, name: Optional[str] = None,
                 internal_storage_engine: Union[Type[BaseStore], BaseStore] = LocalStore): ...
    def load(self): ...
    def add(self, obj: DiffSyncModel): ...
    def get(self, obj: Union[str, DiffSyncModel, Type[DiffSyncModel]],
            identifier: Union[str, Dict]) -> DiffSyncModel: ...
    def get_all(self, obj: Union[str, DiffSyncModel, Type[DiffSyncModel]]) -> List[DiffSyncModel]: ...
```
[Data Management](./data-management.md)
### Diff Calculation

Comprehensive difference calculation between datasets, supporting hierarchical data structures, customizable comparison logic, and detailed change tracking with multiple output formats.

```python { .api }
def diff_from(self, source: "Adapter", diff_class: Type[Diff] = Diff,
              flags: DiffSyncFlags = DiffSyncFlags.NONE,
              callback: Optional[Callable[[str, int, int], None]] = None) -> Diff: ...

class Diff:
    def __init__(self): ...
    def add(self, element: "DiffElement"): ...
    def has_diffs(self) -> bool: ...
    def summary(self) -> Dict[str, int]: ...
```
[Diff Calculation](./diff-calculation.md)
### Synchronization

Automated synchronization operations that apply calculated differences to update target datasets. Supports creation, modification, and deletion of records with comprehensive error handling and status tracking.

```python { .api }
def sync_from(self, source: "Adapter", diff_class: Type[Diff] = Diff,
              flags: DiffSyncFlags = DiffSyncFlags.NONE,
              callback: Optional[Callable[[str, int, int], None]] = None,
              diff: Optional[Diff] = None) -> Diff: ...

def sync_to(self, target: "Adapter", diff_class: Type[Diff] = Diff,
            flags: DiffSyncFlags = DiffSyncFlags.NONE,
            callback: Optional[Callable[[str, int, int], None]] = None,
            diff: Optional[Diff] = None) -> Diff: ...
```
[Synchronization](./synchronization.md)
### Storage Backends

Pluggable storage backend implementations for different persistence requirements, from in-memory storage for temporary operations to Redis-based storage for distributed scenarios.

```python { .api }
class BaseStore:
    def get(self, *, model: Union[str, "DiffSyncModel", Type["DiffSyncModel"]],
            identifier: Union[str, Dict]) -> "DiffSyncModel": ...
    def add(self, *, obj: "DiffSyncModel"): ...
    def remove(self, *, obj: "DiffSyncModel", remove_children: bool = False): ...

class LocalStore(BaseStore): ...
class RedisStore(BaseStore): ...
```
[Storage Backends](./storage-backends.md)
### Flags and Configuration

Behavioral control flags and configuration options for customizing diff calculation and synchronization behavior, including error handling, skipping patterns, and logging verbosity.

```python { .api }
class DiffSyncFlags(enum.Flag):
    NONE = 0
    CONTINUE_ON_FAILURE = 0b1
    SKIP_UNMATCHED_SRC = 0b10
    SKIP_UNMATCHED_DST = 0b100
    LOG_UNCHANGED_RECORDS = 0b1000

class DiffSyncModelFlags(enum.Flag):
    NONE = 0
    IGNORE = 0b1
    SKIP_CHILDREN_ON_DELETE = 0b10
    SKIP_UNMATCHED_SRC = 0b100
    SKIP_UNMATCHED_DST = 0b1000
    NATURAL_DELETION_ORDER = 0b10000
```
[Flags and Configuration](./flags-configuration.md)