
# Data Cleaning


Automatic data cleaning and normalization including type conversion, trimming, filtering, and automatic value assignment.


## Capabilities


### Clean Method


Primary method for cleaning and normalizing data according to schema rules.


```typescript { .api }
/**
 * Cleans and normalizes an object according to schema rules
 * @param doc - Document to clean
 * @param options - Cleaning options
 * @returns Cleaned document
 */
clean(doc: Record<string | number | symbol, unknown>, options?: CleanOptions): Record<string | number | symbol, unknown>;

interface CleanOptions {
  /** Whether to automatically convert types (e.g., string "123" to number 123) */
  autoConvert?: boolean;
  /** Extended context object passed to autoValue functions */
  extendedAutoValueContext?: CustomAutoValueContext;
  /** Whether to remove properties not defined in schema */
  filter?: boolean;
  /** Whether to run autoValue functions */
  getAutoValues?: boolean;
  /** Whether the document being cleaned is a MongoDB modifier */
  isModifier?: boolean;
  /** Whether this is an upsert operation */
  isUpsert?: boolean;
  /** MongoObject instance for modifier operations */
  mongoObject?: MongoObject;
  /** Whether to mutate the original document (default: false, returns copy) */
  mutate?: boolean;
  /** Whether to remove empty strings */
  removeEmptyStrings?: boolean;
  /** Whether to remove null values from arrays */
  removeNullsFromArrays?: boolean;
  /** Whether to trim whitespace from string values */
  trimStrings?: boolean;
}
```


**Usage Examples:**


```typescript
import SimpleSchema from "simpl-schema";

const userSchema = new SimpleSchema({
  name: {
    type: String,
    trim: true
  },
  // Conversion is controlled by the autoConvert clean option, not a field-level key
  age: Number,
  email: String,
  tags: [String],
  profile: {
    type: Object,
    optional: true
  },
  'profile.bio': {
    type: String,
    optional: true,
    trim: true
  }
});

// Basic cleaning
const dirtyData = {
  name: " John Doe ",
  age: "25", // String that should be number
  email: "john@example.com",
  tags: ["tag1", "tag2", null, ""],
  unknownField: "should be removed"
};

const cleanData = userSchema.clean(dirtyData, {
  autoConvert: true,
  trimStrings: true,
  filter: true,
  removeEmptyStrings: true,
  removeNullsFromArrays: true
});
// Result: {
//   name: "John Doe",
//   age: 25,
//   email: "john@example.com",
//   tags: ["tag1", "tag2"]
// }
```


### Auto-conversion


Automatic type conversion during the cleaning process.


```typescript { .api }
interface CleanOptions {
  /** Whether to automatically convert types based on schema definitions */
  autoConvert?: boolean;
}

// Auto-conversion rules:
// - String numbers to Number type
// - String booleans ("true"/"false") to Boolean type
// - String dates to Date type
// - Arrays to proper array format
// - Objects to proper object format
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  count: Number,
  isActive: Boolean,
  createdAt: Date,
  tags: [String]
});

const data = {
  count: "42",
  isActive: "true",
  createdAt: "2023-01-01",
  tags: "tag1,tag2" // Not an array: wrapped as a single item, not split
};

const cleaned = schema.clean(data, { autoConvert: true });
// Result: {
//   count: 42,
//   isActive: true,
//   createdAt: new Date("2023-01-01"),
//   tags: ["tag1,tag2"] // Note: SimpleSchema doesn't auto-split strings
// }
```
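
The conversion rules listed above can be approximated with a small standalone function. This is only a sketch of the idea, not the library's implementation: `convertToType` and `TargetType` are hypothetical names, and the choice to leave unparseable strings untouched is an assumption made so that a later validation step could still report them.

```typescript
// Hypothetical helper for illustration; not part of the simpl-schema API.
type TargetType = "number" | "boolean" | "date";

function convertToType(value: unknown, target: TargetType): unknown {
  // Only strings are converted in this sketch; other values pass through
  if (typeof value !== "string") return value;

  if (target === "number") {
    const parsed = Number(value);
    // Keep blank or non-numeric strings as they are
    return value.trim() === "" || isNaN(parsed) ? value : parsed;
  }
  if (target === "boolean") {
    const lower = value.toLowerCase();
    if (lower === "true") return true;
    if (lower === "false") return false;
    return value;
  }
  // target === "date"
  const millis = Date.parse(value);
  return isNaN(millis) ? value : new Date(millis);
}

console.log(convertToType("42", "number"));    // 42
console.log(convertToType("true", "boolean")); // true
console.log(convertToType("abc", "number"));   // abc
```

Returning the original value on failure keeps the function total: callers never have to handle an exception, and bad input remains visible for validation.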


### String Trimming


Automatic whitespace removal from string values.


```typescript { .api }
interface CleanOptions {
  /** Whether to trim whitespace from string values */
  trimStrings?: boolean;
}

// Field-level trim option
interface SchemaKeyDefinitionBase {
  /** Set to false to skip trimming this field when trimStrings is enabled */
  trim?: boolean;
}
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  title: String, // Trimmed whenever trimStrings is enabled
  description: {
    type: String,
    trim: false // Field-level opt-out: skipped even when trimStrings is enabled
  }
});

const data = {
  title: " My Title ",
  description: " My description "
};

// Clean with the global trimStrings option (enabled by default)
const cleaned1 = schema.clean(data, { trimStrings: true });
// Result: { title: "My Title", description: " My description " }

// Clean with trimStrings disabled (no strings are trimmed)
const cleaned2 = schema.clean(data, { trimStrings: false });
// Result: { title: " My Title ", description: " My description " }
```
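
How the global option and the field-level flag interact can be written as a single condition. The sketch below is a simplified model under the assumption that `trimStrings` gates all trimming and a field-level `trim: false` opts a field out. `FieldRule` and `trimDocument` are hypothetical names, and only top-level keys are handled.

```typescript
// Hypothetical model of the trimming decision; not simpl-schema code.
interface FieldRule {
  trim?: boolean;
}

function trimDocument(
  doc: Record<string, unknown>,
  rules: Record<string, FieldRule>,
  trimStrings: boolean
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    const rule = rules[key];
    // A field opts out only when it explicitly sets trim: false
    const optedOut = rule !== undefined && rule.trim === false;
    out[key] =
      trimStrings && !optedOut && typeof value === "string" ? value.trim() : value;
  }
  return out;
}

const trimmed = trimDocument(
  { title: " My Title ", code: " keep " },
  { code: { trim: false } },
  true
);
console.log(trimmed.title === "My Title"); // true
console.log(trimmed.code === " keep ");    // true
```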


### Filtering


Remove properties not defined in the schema.


```typescript { .api }
interface CleanOptions {
  /** Whether to remove properties not defined in schema */
  filter?: boolean;
}
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  name: String,
  email: String
});

const data = {
  name: "John",
  email: "john@example.com",
  password: "secret123", // Not in schema
  adminField: true // Not in schema
};

const filtered = schema.clean(data, { filter: true });
// Result: { name: "John", email: "john@example.com" }

// Filtering is enabled by default, so disable it explicitly to keep unknown properties
const unfiltered = schema.clean(data, { filter: false });
// Result: { name: "John", email: "john@example.com", password: "secret123", adminField: true }
```
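
Conceptually, filtering is an allow-list over the document's keys. The following sketch shows that idea for top-level keys only. `filterKeys` is a hypothetical helper, not a library function, and nested paths such as `profile.bio` are deliberately out of scope.

```typescript
// Hypothetical allow-list filter; handles top-level keys only.
function filterKeys(
  doc: Record<string, unknown>,
  schemaKeys: string[]
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(doc)) {
    // Copy a property only when the schema declares it
    if (schemaKeys.indexOf(key) !== -1) out[key] = doc[key];
  }
  return out;
}

const safe = filterKeys(
  { name: "John", email: "john@example.com", password: "secret123" },
  ["name", "email"]
);
console.log(Object.keys(safe).join(",")); // name,email
```

Building a new object instead of deleting keys in place keeps the input untouched, which mirrors the non-mutating default described under Mutation Control.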


### Empty Value Removal


Remove empty-string values and null array items.


```typescript { .api }
interface CleanOptions {
  /** Whether to remove empty strings from the document */
  removeEmptyStrings?: boolean;
  /** Whether to remove null values from arrays */
  removeNullsFromArrays?: boolean;
}
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  title: {
    type: String,
    optional: true
  },
  tags: [String],
  categories: [String]
});

const data = {
  title: "",
  tags: ["javascript", "", "react", null, "node"],
  categories: ["tech", null, "", "programming"]
};

const cleaned = schema.clean(data, {
  removeEmptyStrings: true,
  removeNullsFromArrays: true
});
// Result: {
//   tags: ["javascript", "react", "node"],
//   categories: ["tech", "programming"]
// }
// Note: title is removed because it's an empty string
```
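
The effect shown in the result above can be modelled with a short standalone function. This is a sketch under the assumption that both options are enabled together. `removeEmptyValues` is a hypothetical name, and the sketch only looks at top-level fields and their direct array items.

```typescript
// Hypothetical model of removeEmptyStrings + removeNullsFromArrays combined.
function removeEmptyValues(doc: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    // Drop fields whose value is an empty string
    if (value === "") continue;
    if (Array.isArray(value)) {
      // Drop null and empty-string items, keep everything else in order
      out[key] = value.filter((item) => item !== null && item !== "");
    } else {
      out[key] = value;
    }
  }
  return out;
}

const pruned = removeEmptyValues({
  title: "",
  tags: ["javascript", "", "react", null, "node"]
});
console.log(JSON.stringify(pruned)); // {"tags":["javascript","react","node"]}
```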


### Auto Values


Automatically set field values using autoValue functions.


```typescript { .api }
interface CleanOptions {
  /** Whether to run autoValue functions during cleaning */
  getAutoValues?: boolean;
  /** Extended context passed to autoValue functions */
  extendedAutoValueContext?: CustomAutoValueContext;
}

// AutoValue function type
type AutoValueFunction = (this: AutoValueContext, obj: any) => any;

interface AutoValueContext {
  /** Current field key */
  key: string;
  /** Whether the field value is explicitly set */
  isSet: boolean;
  /** Current field value (if set) */
  value: any;
  /** MongoDB update operator being used (if any) */
  operator: string | null;
  /** Access to other field values */
  field(key: string): FieldInfo<any>;
  /** Access to sibling field values */
  siblingField(key: string): FieldInfo<any>;
  /** Parent document being cleaned */
  parentDoc: any;
  /** Whether this is an insert operation */
  isInsert: boolean;
  /** Whether this is an update operation */
  isUpdate: boolean;
  /** Whether this is an upsert operation */
  isUpsert: boolean;
  /** Whether this is a modifier document */
  isModifier: boolean;
  /** Current user ID (if available) */
  userId?: string;
  /** Whether field is inside an object that is an array item */
  isInArrayItemObject: boolean;
  /** Whether field is inside a sub-object */
  isInSubObject: boolean;
}
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  title: String,
  slug: {
    type: String,
    autoValue() {
      if (!this.isSet && this.field('title').isSet) {
        // Auto-generate slug from title
        return this.field('title').value.toLowerCase().replace(/\s+/g, '-');
      }
    }
  },
  createdAt: {
    type: Date,
    autoValue() {
      if (this.isInsert && !this.isSet) {
        return new Date();
      }
    }
  },
  updatedAt: {
    type: Date,
    autoValue() {
      if (this.isUpdate) {
        return new Date();
      }
    }
  },
  userId: {
    type: String,
    autoValue() {
      if (this.isInsert && !this.isSet && this.userId) {
        return this.userId;
      }
    }
  }
});

const data = { title: "My Article" };

const cleaned = schema.clean(data, {
  getAutoValues: true,
  // isInsert is supplied here so the insert-only autoValue functions fire
  extendedAutoValueContext: { userId: "user123", isInsert: true }
});
// Result: {
//   title: "My Article",
//   slug: "my-article",
//   createdAt: new Date(),
//   userId: "user123"
// }
```
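
To make the role of the context object concrete, here is a minimal standalone model of an auto-value pass. It is a sketch, not the library's runner: `applyAutoValues`, `SketchContext`, and `SketchFieldInfo` are hypothetical names, only top-level keys are supported, and the extended context is simply spread onto `this`.

```typescript
// Hypothetical, simplified auto-value runner for flat documents.
interface SketchFieldInfo {
  isSet: boolean;
  value: unknown;
}

interface SketchContext extends SketchFieldInfo {
  key: string;
  field(name: string): SketchFieldInfo;
  [extra: string]: unknown; // values supplied through the extended context
}

type SketchAutoValue = (this: SketchContext) => unknown;

function applyAutoValues(
  doc: Record<string, unknown>,
  autoValues: Record<string, SketchAutoValue>,
  extended: Record<string, unknown> = {}
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...doc };
  const info = (name: string): SketchFieldInfo => ({
    isSet: out[name] !== undefined,
    value: out[name],
  });
  for (const key of Object.keys(autoValues)) {
    const fn = autoValues[key];
    if (!fn) continue;
    const context: SketchContext = { ...extended, ...info(key), key, field: info };
    const result = fn.call(context);
    // Returning undefined means "leave the field as it is"
    if (result !== undefined) out[key] = result;
  }
  return out;
}

const withSlug = applyAutoValues(
  { title: "My Article" },
  {
    slug() {
      const title = this.field("title");
      if (!this.isSet && title.isSet) {
        return String(title.value).toLowerCase().replace(/\s+/g, "-");
      }
      return undefined;
    },
    owner() {
      return this.isSet ? undefined : this.userId;
    },
  },
  { userId: "user123" }
);
console.log(withSlug.slug);  // my-article
console.log(withSlug.owner); // user123
```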


### MongoDB Modifier Cleaning


Clean MongoDB update modifier documents.


```typescript { .api }
interface CleanOptions {
  /** Whether the document being cleaned is a MongoDB modifier */
  isModifier?: boolean;
  /** Whether this is an upsert operation */
  isUpsert?: boolean;
  /** MongoObject instance for advanced modifier handling */
  mongoObject?: MongoObject;
}
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  // Parent object declared explicitly for the dotted keys below
  profile: {
    type: Object,
    optional: true
  },
  'profile.name': {
    type: String,
    trim: true
  },
  'profile.age': Number,
  tags: [String],
  updatedAt: {
    type: Date,
    autoValue() {
      if (this.isUpdate) {
        return new Date();
      }
    }
  }
});

// Clean $set modifier
const setModifier = {
  $set: {
    'profile.name': ' John Doe ',
    'profile.age': '30'
  }
};

const cleanedSet = schema.clean(setModifier, {
  isModifier: true,
  trimStrings: true,
  autoConvert: true,
  getAutoValues: true,
  // isUpdate is supplied here so the updatedAt autoValue fires
  extendedAutoValueContext: { isUpdate: true }
});
// Result: {
//   $set: {
//     'profile.name': 'John Doe',
//     'profile.age': 30,
//     updatedAt: new Date()
//   }
// }

// Clean $push modifier
const pushModifier = {
  $push: {
    tags: ' javascript '
  }
};

const cleanedPush = schema.clean(pushModifier, {
  isModifier: true,
  trimStrings: true
});
// Result: {
//   $push: {
//     tags: 'javascript'
//   }
// }
```
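
A modifier differs from a plain document in that its top-level keys are operators and the field paths sit one level down. The sketch below illustrates that shape with two hypothetical helpers, `looksLikeModifierSketch` and `mapModifierValues`. It assumes every operator maps paths directly to values, so forms such as `$push` with `$each` are not covered.

```typescript
// Hypothetical helpers illustrating the operator -> path -> value layout.
type ModifierSketch = Record<string, Record<string, unknown>>;

// Treat a document as a modifier when any top-level key starts with "$"
function looksLikeModifierSketch(doc: Record<string, unknown>): boolean {
  return Object.keys(doc).some((key) => key.charAt(0) === "$");
}

// Apply a per-value cleaner to every path under every operator
function mapModifierValues(
  modifier: ModifierSketch,
  cleanValue: (value: unknown, path: string) => unknown
): ModifierSketch {
  const out: ModifierSketch = {};
  for (const operator of Object.keys(modifier)) {
    const fields = modifier[operator];
    const cleanedFields: Record<string, unknown> = {};
    for (const path of Object.keys(fields)) {
      cleanedFields[path] = cleanValue(fields[path], path);
    }
    out[operator] = cleanedFields;
  }
  return out;
}

const trimmedModifier = mapModifierValues(
  { $set: { "profile.name": " John Doe " }, $push: { tags: " javascript " } },
  (value) => (typeof value === "string" ? value.trim() : value)
);
console.log(JSON.stringify(trimmedModifier));
// {"$set":{"profile.name":"John Doe"},"$push":{"tags":"javascript"}}
```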


### Mutation Control


Control whether cleaning mutates the original object or returns a copy.


```typescript { .api }
interface CleanOptions {
  /** Whether to mutate the original document (default: false) */
  mutate?: boolean;
}
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  name: {
    type: String,
    trim: true
  }
});

const originalData = { name: " John " };

// Clean without mutation (default)
const cleaned1 = schema.clean(originalData, { trimStrings: true });
console.log(originalData); // { name: " John " } - unchanged
console.log(cleaned1); // { name: "John" } - cleaned copy

// Clean with mutation
const cleaned2 = schema.clean(originalData, {
  trimStrings: true,
  mutate: true
});
console.log(originalData); // { name: "John" } - mutated
console.log(cleaned2); // { name: "John" } - same reference as originalData
console.log(originalData === cleaned2); // true
```
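
The copy-versus-mutate choice comes down to which object the cleaning steps write to. The sketch below shows that pattern with a hypothetical `trimAll` function. It uses a shallow copy, which is only adequate for flat documents like the one here; a real cleaner would need a deep copy for nested data.

```typescript
// Hypothetical illustration of a mutate flag; shallow copy only.
function trimAll(
  doc: Record<string, unknown>,
  mutate: boolean = false
): Record<string, unknown> {
  // Write to the caller's object only when mutate is true
  const target: Record<string, unknown> = mutate ? doc : { ...doc };
  for (const key of Object.keys(target)) {
    const value = target[key];
    if (typeof value === "string") target[key] = value.trim();
  }
  return target;
}

const original: Record<string, unknown> = { name: " John " };

const copy = trimAll(original);
console.log(copy === original);          // false
console.log(original.name === " John "); // true

const same = trimAll(original, true);
console.log(same === original);          // true
console.log(original.name === "John");   // true
```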


### Default Value Assignment


Automatically assign default values during cleaning.


```typescript { .api }
interface SchemaKeyDefinitionBase {
  /** Default value to assign if field is not set */
  defaultValue?: any;
}

// Default values are assigned during cleaning when getAutoValues is true
```


**Usage Examples:**


```typescript
const schema = new SimpleSchema({
  name: String,
  status: {
    type: String,
    defaultValue: 'active'
  },
  settings: {
    type: Object,
    // Empty object so the nested defaults below have a parent to attach to
    defaultValue: {}
  },
  'settings.theme': {
    type: String,
    defaultValue: 'light'
  },
  'settings.lang': {
    type: String,
    defaultValue: 'en'
  }
});

const data = { name: "John" };

const cleaned = schema.clean(data, { getAutoValues: true });
// Result: {
//   name: "John",
//   status: "active",
//   settings: { theme: "light", lang: "en" }
// }
```
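
One way to think about `defaultValue` is as an auto value that only fires when the field is unset. The sketch below models that rule for top-level keys. `applyDefaults` is a hypothetical helper, and treating `undefined` as "not set" is an assumption of the sketch.

```typescript
// Hypothetical model: fill in defaults only for fields that are not set.
function applyDefaults(
  doc: Record<string, unknown>,
  defaults: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...doc };
  for (const key of Object.keys(defaults)) {
    // Explicit values, including falsy ones such as 0 or "", are kept
    if (out[key] === undefined) out[key] = defaults[key];
  }
  return out;
}

const filled = applyDefaults(
  { name: "John", status: "banned" },
  { status: "active", lang: "en" }
);
console.log(JSON.stringify(filled)); // {"name":"John","status":"banned","lang":"en"}
```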


## Types


```typescript { .api }
interface CustomAutoValueContext {
  [key: string]: any;
}

interface MongoObject {
  [key: string]: any;
}

interface FieldInfo<ValueType> {
  value: ValueType;
  isSet: boolean;
  operator: string | null;
}
```