# Character Text Splitting

Basic text splitting using a simple character-based separator. It suits straightforward document chunking where the separator pattern is predictable, such as line breaks or paragraph markers.

## Capabilities

### CharacterTextSplitter Class

Splits text on a single separator string (one or more characters), with configurable chunk size and overlap.

```typescript { .api }
/**
 * Text splitter that splits text based on a single character separator
 */
class CharacterTextSplitter extends TextSplitter implements CharacterTextSplitterParams {
  separator: string;

  constructor(fields?: Partial<CharacterTextSplitterParams>);
  splitText(text: string): Promise<string[]>;
  static lc_name(): string;
}

interface CharacterTextSplitterParams extends TextSplitterParams {
  /** The character(s) to split text on (default: "\n\n") */
  separator: string;
}
```

**Usage Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

// Basic paragraph splitting
const splitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 1000,
  chunkOverlap: 200,
});

const text = `Paragraph one content here.

Paragraph two content here.

Paragraph three content here.`;

const chunks = await splitter.splitText(text);
// The text is split on "\n\n", then adjacent pieces are merged back together
// while they fit within chunkSize. This input is far shorter than 1000
// characters, so expect a single chunk containing all three paragraphs.
// Lower chunkSize (for example to 30) to get one paragraph per chunk.

// Custom separator splitting
const csvSplitter = new CharacterTextSplitter({
  separator: ",",
  chunkSize: 50,
  chunkOverlap: 10,
});

const csvData = "apple,banana,cherry,date,elderberry,fig";
const csvChunks = await csvSplitter.splitText(csvData);

// Word-level splitting
const wordSplitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 20,
  chunkOverlap: 5,
});

const sentence = "The quick brown fox jumps over the lazy dog";
const wordChunks = await wordSplitter.splitText(sentence);
```
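The split-then-merge behavior behind these examples can be easier to see in isolation. The following is a simplified, self-contained sketch of the idea, not the library's implementation: `splitAndMerge` is a hypothetical helper, it ignores `chunkOverlap` and `keepSeparator`, and it always measures length with `text.length`.

```typescript
// Hypothetical helper: illustrates split-then-merge, not the library's code.
// It ignores chunkOverlap and keepSeparator to keep the core idea visible.
function splitAndMerge(text: string, separator: string, chunkSize: number): string[] {
  // Step 1: naive split on the separator, dropping empty pieces.
  const pieces = text.split(separator).filter((piece) => piece !== "");

  // Step 2: greedily merge adjacent pieces while the joined result fits.
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLength = 0; // length of `current` once joined by the separator

  for (const piece of pieces) {
    const joinedLength =
      current.length === 0
        ? piece.length
        : currentLength + separator.length + piece.length;

    if (current.length > 0 && joinedLength > chunkSize) {
      // Adding this piece would overflow: close the chunk and start a new one.
      chunks.push(current.join(separator));
      current = [piece];
      currentLength = piece.length;
    } else {
      current.push(piece);
      currentLength = joinedLength;
    }
  }

  if (current.length > 0) {
    chunks.push(current.join(separator));
  }
  return chunks;
}

const fruit = "apple,banana,cherry,date,elderberry,fig";
const smallChunks = splitAndMerge(fruit, ",", 15);
const largeChunks = splitAndMerge(fruit, ",", 1000);
console.log(smallChunks); // three chunks of two items each
console.log(largeChunks); // one chunk: the whole input fits within chunkSize
```

The second call shows why a generous `chunkSize` can return the input as a single chunk even though the separator occurs several times.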

### Document Creation

Create Document objects from split text with metadata preservation.

```typescript { .api }
/**
 * Create Document objects from split text
 * @param texts - Array of texts to split and convert to documents
 * @param metadatas - Optional metadata for each text
 * @param chunkHeaderOptions - Optional chunk header configuration
 * @returns Array of Document objects with split text and metadata
 */
createDocuments(
  texts: string[],
  metadatas?: Record<string, any>[],
  chunkHeaderOptions?: TextSplitterChunkHeaderOptions
): Promise<Document[]>;
```

**Usage Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

const splitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 100,
  chunkOverlap: 20,
});

// Create documents with metadata (metadatas[i] belongs to texts[i])
const texts = ["First document text", "Second document text"];
const metadatas = [
  { source: "doc1.txt", author: "Alice" },
  { source: "doc2.txt", author: "Bob" },
];

const documents = await splitter.createDocuments(texts, metadatas);
// Each document has pageContent holding one chunk, plus the metadata
// of the text that chunk came from

// Create documents with chunk headers
const documentsWithHeaders = await splitter.createDocuments(
  texts,
  metadatas,
  {
    chunkHeader: "=== DOCUMENT CHUNK ===\n",
    chunkOverlapHeader: "(continued from previous chunk) ",
    appendChunkOverlapHeader: true,
  }
);
```
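To make the chunk header options in the last call more concrete, here is a small self-contained sketch of how the three options might combine. `applyChunkHeaders` and `HeaderOptions` are hypothetical names for illustration, and the rule that only chunks after the first one of a text receive the overlap header is an assumption based on the option descriptions, not a statement about the library's internals.

```typescript
// Hypothetical type mirroring the documented option names.
interface HeaderOptions {
  chunkHeader?: string;
  chunkOverlapHeader?: string;
  appendChunkOverlapHeader?: boolean;
}

// Hypothetical helper: prefixes the chunks of ONE source text with headers.
function applyChunkHeaders(chunks: string[], options: HeaderOptions = {}): string[] {
  const {
    chunkHeader = "",
    chunkOverlapHeader = "(cont'd) ",
    appendChunkOverlapHeader = false,
  } = options;

  return chunks.map((chunk, index) => {
    // Assumption: a chunk "continues" when it is not the first chunk of its text.
    const overlapHeader =
      index > 0 && appendChunkOverlapHeader ? chunkOverlapHeader : "";
    return chunkHeader + overlapHeader + chunk;
  });
}

const withHeaders = applyChunkHeaders(["First part", "second part"], {
  chunkHeader: "=== DOCUMENT CHUNK ===\n",
  chunkOverlapHeader: "(continued from previous chunk) ",
  appendChunkOverlapHeader: true,
});
console.log(withHeaders);
```

Under these assumptions the first chunk carries only `chunkHeader`, while later chunks carry `chunkHeader` followed by `chunkOverlapHeader`.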

### Document Splitting

Split existing Document objects while preserving their metadata.

```typescript { .api }
/**
 * Split existing Document objects
 * @param documents - Array of documents to split
 * @param chunkHeaderOptions - Optional chunk header configuration
 * @returns Array of split Document objects
 */
splitDocuments(
  documents: Document[],
  chunkHeaderOptions?: TextSplitterChunkHeaderOptions
): Promise<Document[]>;
```

**Usage Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";
import { Document } from "@langchain/core/documents";

const splitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 50,
  chunkOverlap: 10,
});

const originalDocs = [
  new Document({
    pageContent: "Line one\nLine two\nLine three\nLine four",
    metadata: { source: "example.txt", type: "text" },
  }),
];

const splitDocs = await splitter.splitDocuments(originalDocs);
// Each resulting document keeps the original metadata plus line location info.
// This input is 38 characters, which fits within chunkSize, so expect a single
// document here; longer inputs (or a smaller chunkSize) yield several.
```
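The metadata-preservation idea can be sketched without the library. The following self-contained example is illustrative only: `SketchDoc` and `splitDocsSketch` are hypothetical names, the splitting function is passed in by the caller, and the line location info that the real method records is left out.

```typescript
// Hypothetical stand-in for the library's Document type, for illustration only.
interface SketchDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Illustrative only: splits each document's text and copies the parent's
// metadata onto every resulting chunk. Line locations are not tracked here.
function splitDocsSketch(
  docs: SketchDoc[],
  splitFn: (text: string) => string[]
): SketchDoc[] {
  const result: SketchDoc[] = [];
  for (const doc of docs) {
    for (const chunk of splitFn(doc.pageContent)) {
      // Shallow copy so chunks do not share one mutable metadata object.
      result.push({ pageContent: chunk, metadata: { ...doc.metadata } });
    }
  }
  return result;
}

const sketchDocs = splitDocsSketch(
  [
    {
      pageContent: "Line one\nLine two\nLine three\nLine four",
      metadata: { source: "example.txt", type: "text" },
    },
  ],
  (text) => text.split("\n") // one chunk per line, with no merging
);
console.log(sketchDocs.length);
console.log(sketchDocs.map((doc) => doc.metadata.source));
```

Copying the metadata per chunk, rather than sharing a reference, means that annotating one chunk later does not silently change its siblings.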

### Configuration Options

All character text splitters support the base TextSplitterParams configuration.

```typescript { .api }
interface TextSplitterParams {
  /** Maximum size of each chunk in characters (default: 1000) */
  chunkSize: number;
  /** Number of characters to overlap between chunks (default: 200) */
  chunkOverlap: number;
  /** Whether to keep the separator in the split text (default: false) */
  keepSeparator: boolean;
  /** Custom function to calculate text length (default: text.length) */
  lengthFunction?: ((text: string) => number) | ((text: string) => Promise<number>);
}

type TextSplitterChunkHeaderOptions = {
  /** Header text to prepend to each chunk */
  chunkHeader?: string;
  /** Header text for chunks that continue from previous (default: "(cont'd) ") */
  chunkOverlapHeader?: string;
  /** Whether to append overlap header to continuing chunks (default: false) */
  appendChunkOverlapHeader?: boolean;
};
```

**Configuration Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

// Custom length function approximating a token count
const tokenBasedSplitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 100, // measured by lengthFunction (roughly 100 words), not characters
  chunkOverlap: 20, // measured in the same unit as chunkSize
  lengthFunction: (text: string) => {
    // Simple token estimation: whitespace-separated words
    // (an actual implementation would use a proper tokenizer)
    return text.split(/\s+/).length;
  },
});

// Keep separators in output
const separatorKeepingSplitter = new CharacterTextSplitter({
  separator: "\n---\n",
  chunkSize: 500,
  chunkOverlap: 0,
  keepSeparator: true, // Separators will be included in the chunks
});
```
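Because `chunkSize` and `chunkOverlap` are interpreted in whatever unit `lengthFunction` returns, it can help to compare measurements directly. The sketch below is self-contained and uses hypothetical helper names (`byCharacters`, `byWordsNaive`, `byWords`); it also shows one edge case of the whitespace-based estimate, where an empty string counts as one word unless empty pieces are filtered out.

```typescript
// Default-style measurement: number of characters.
const byCharacters = (text: string): number => text.length;

// Naive word estimate, same expression as in the configuration example above.
const byWordsNaive = (text: string): number => text.split(/\s+/).length;

// Slightly more careful estimate: ignores empty pieces produced by split().
const byWords = (text: string): number =>
  text.split(/\s+/).filter((word) => word.length > 0).length;

const sample = "The quick brown fox jumps over the lazy dog";

console.log(byCharacters(sample)); // 43
console.log(byWordsNaive(sample)); // 9
console.log(byWords(sample)); // 9

// Edge case: "".split(/\s+/) yields [""], so the naive estimate reports 1.
console.log(byWordsNaive("")); // 1
console.log(byWords("")); // 0
```

With a word-based length function, a `chunkSize` of 20 would allow this whole sentence in one chunk, whereas the same setting measured in characters would not.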