# Character Text Splitting

Basic text splitting on a single, fixed separator string. It suits simple document chunking where the text has a predictable separator pattern, such as line breaks or paragraph markers.

## Capabilities

### CharacterTextSplitter Class

Splits text on a single separator string (one or more characters), then merges the resulting pieces into chunks governed by a configurable chunk size and overlap.

```typescript { .api }
/**
 * Text splitter that splits text on a single separator string
 */
class CharacterTextSplitter extends TextSplitter implements CharacterTextSplitterParams {
  separator: string;

  constructor(fields?: Partial<CharacterTextSplitterParams>);
  splitText(text: string): Promise<string[]>;
  static lc_name(): string;
}

interface CharacterTextSplitterParams extends TextSplitterParams {
  /** The character(s) to split text on (default: "\n\n") */
  separator: string;
}
```

**Usage Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

// Basic paragraph splitting
const splitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 1000,
  chunkOverlap: 200,
});

const text = `Paragraph one content here.

Paragraph two content here.

Paragraph three content here.`;

const chunks = await splitter.splitText(text);
// The three paragraphs together are far shorter than chunkSize, so they are
// merged back into a single chunk joined by the separator. Paragraphs land in
// separate chunks only once their combined length would exceed chunkSize.

// Custom separator splitting
const csvSplitter = new CharacterTextSplitter({
  separator: ",",
  chunkSize: 50,
  chunkOverlap: 10,
});

const csvData = "apple,banana,cherry,date,elderberry,fig";
const csvChunks = await csvSplitter.splitText(csvData);

// Word-level splitting
const wordSplitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 20,
  chunkOverlap: 5,
});

const sentence = "The quick brown fox jumps over the lazy dog";
const wordChunks = await wordSplitter.splitText(sentence);
```

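To make the interaction between `separator`, `chunkSize`, and `chunkOverlap` more concrete, here is a simplified, dependency-free sketch of a split-then-merge strategy. It is an illustration under stated assumptions, not the library's implementation: `sketchSplit` is a hypothetical helper, length is measured with `text.length`, and edge cases such as oversized pieces, `keepSeparator`, and whitespace trimming are ignored.

```typescript
// Hypothetical helper: NOT part of @langchain/textsplitters.
function sketchSplit(
  text: string,
  separator: string,
  chunkSize: number,
  chunkOverlap: number
): string[] {
  // 1. Cut the text on the separator and drop empty pieces.
  const pieces = text.split(separator).filter((p) => p !== "");

  // Length of a group of pieces once re-joined with the separator.
  const joinedLength = (parts: string[]): number =>
    parts.join(separator).length;

  const chunks: string[] = [];
  let current: string[] = [];

  for (const piece of pieces) {
    // 2. If adding this piece would overflow, emit the current chunk.
    if (current.length > 0 && joinedLength([...current, piece]) > chunkSize) {
      chunks.push(current.join(separator));
      // 3. Drop pieces from the front until what remains fits within
      //    chunkOverlap and still leaves room for the new piece.
      while (
        current.length > 0 &&
        (joinedLength(current) > chunkOverlap ||
          joinedLength([...current, piece]) > chunkSize)
      ) {
        current.shift();
      }
    }
    current.push(piece);
  }

  // 4. Emit whatever is left over.
  if (current.length > 0) chunks.push(current.join(separator));
  return chunks;
}

const words = sketchSplit("The quick brown fox jumps over the lazy dog", " ", 20, 5);
// → ["The quick brown fox", "fox jumps over the", "the lazy dog"]

const merged = sketchSplit("a\n\nb\n\nc", "\n\n", 1000, 200);
// → ["a\n\nb\n\nc"] (everything fits in one chunk)
```

In this sketch the overlap is made of whole pieces, so with `chunkOverlap: 5` only a short word such as "fox" or "the" is carried into the next chunk.
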
### Document Creation

Split raw strings and wrap each resulting chunk in a Document object, carrying over the metadata supplied for its source text.

```typescript { .api }
/**
 * Create Document objects from split text
 * @param texts - Array of texts to split and convert to documents
 * @param metadatas - Optional metadata for each text
 * @param chunkHeaderOptions - Optional chunk header configuration
 * @returns Array of Document objects with split text and metadata
 */
createDocuments(
  texts: string[],
  metadatas?: Record<string, any>[],
  chunkHeaderOptions?: TextSplitterChunkHeaderOptions
): Promise<Document[]>;
```

**Usage Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

const splitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 100,
  chunkOverlap: 20,
});

// Create documents with metadata
const texts = ["First document text", "Second document text"];
const metadatas = [
  { source: "doc1.txt", author: "Alice" },
  { source: "doc2.txt", author: "Bob" },
];

const documents = await splitter.createDocuments(texts, metadatas);
// Each document's pageContent holds one chunk; its metadata comes from the
// entry in `metadatas` at the same index as the chunk's source text.

// Create documents with chunk headers
const documentsWithHeaders = await splitter.createDocuments(
  texts,
  metadatas,
  {
    chunkHeader: "=== DOCUMENT CHUNK ===\n",
    chunkOverlapHeader: "(continued from previous chunk) ",
    appendChunkOverlapHeader: true,
  }
);
```

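The chunk header options can be pictured as plain string concatenation in front of each chunk. The sketch below is a simplified, dependency-free approximation of that behaviour rather than the library's code: `applyHeaders` and `HeaderOptions` are hypothetical names, and it assumes that every chunk after the first one from a given source text counts as a continuation.

```typescript
// Hypothetical mirror of TextSplitterChunkHeaderOptions, for illustration only.
type HeaderOptions = {
  chunkHeader?: string;
  chunkOverlapHeader?: string;
  appendChunkOverlapHeader?: boolean;
};

// Hypothetical helper: prefixes the chunks of ONE source text with headers.
function applyHeaders(chunks: string[], options: HeaderOptions = {}): string[] {
  const chunkHeader = options.chunkHeader ?? "";
  const overlapHeader = options.chunkOverlapHeader ?? "(cont'd) ";
  const appendOverlap = options.appendChunkOverlapHeader ?? false;

  return chunks.map((chunk, index) => {
    // Every chunk starts with the general header (empty by default).
    let pageContent = chunkHeader;
    // Assumption: chunks after the first are treated as continuations.
    if (index > 0 && appendOverlap) {
      pageContent += overlapHeader;
    }
    return pageContent + chunk;
  });
}

const withHeaders = applyHeaders(["first part", "second part"], {
  chunkHeader: "=== DOCUMENT CHUNK ===\n",
  chunkOverlapHeader: "(continued from previous chunk) ",
  appendChunkOverlapHeader: true,
});
// withHeaders[0] === "=== DOCUMENT CHUNK ===\nfirst part"
// withHeaders[1] === "=== DOCUMENT CHUNK ===\n(continued from previous chunk) second part"
```

Because the headers become part of `pageContent`, they also end up in whatever is later embedded or indexed, which is worth keeping in mind when choosing header text.
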
### Document Splitting

Split existing Document objects while preserving their metadata.

```typescript { .api }
/**
 * Split existing Document objects
 * @param documents - Array of documents to split
 * @param chunkHeaderOptions - Optional chunk header configuration
 * @returns Array of split Document objects
 */
splitDocuments(
  documents: Document[],
  chunkHeaderOptions?: TextSplitterChunkHeaderOptions
): Promise<Document[]>;
```

**Usage Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";
import { Document } from "@langchain/core/documents";

const splitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 20, // smaller than the 38-character text below, so it must be split
  chunkOverlap: 0,
});

const originalDocs = [
  new Document({
    pageContent: "Line one\nLine two\nLine three\nLine four",
    metadata: { source: "example.txt", type: "text" },
  }),
];

const splitDocs = await splitter.splitDocuments(originalDocs);
// The text exceeds chunkSize, so it is split into several documents; each
// keeps the original metadata and gains line location info.
```

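The line location info mentioned above can be thought of as the range of source lines that a chunk covers. A minimal sketch of that idea follows. `lineRange` is a hypothetical helper, it assumes the chunk appears verbatim in the source text, and the `{ from, to }` shape is an illustrative assumption rather than a guaranteed metadata format.

```typescript
// Hypothetical helper: 1-based line range that `chunk` occupies inside `text`.
function lineRange(
  text: string,
  chunk: string,
  searchFrom = 0
): { from: number; to: number } | undefined {
  const start = text.indexOf(chunk, searchFrom);
  if (start === -1) return undefined; // chunk not found verbatim

  // Count newline characters in a string.
  const countNewlines = (s: string): number => s.split("\n").length - 1;

  // Lines before the chunk determine where it starts...
  const from = 1 + countNewlines(text.slice(0, start));
  // ...and newlines inside the chunk determine where it ends.
  const to = from + countNewlines(chunk);
  return { from, to };
}

const source = "Line one\nLine two\nLine three\nLine four";
const range = lineRange(source, "Line three\nLine four");
// range → { from: 3, to: 4 }
```

Counting newlines in the original text, rather than in the chunks, keeps the numbers meaningful even when the separator itself is a newline that the split removes.
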
### Configuration Options

All character text splitters support the base `TextSplitterParams` configuration.

```typescript { .api }
interface TextSplitterParams {
  /** Maximum size of each chunk in characters (default: 1000) */
  chunkSize: number;
  /** Number of characters to overlap between chunks (default: 200) */
  chunkOverlap: number;
  /** Whether to keep the separator in the split text (default: false) */
  keepSeparator: boolean;
  /** Custom function to calculate text length (default: text.length) */
  lengthFunction?: ((text: string) => number) | ((text: string) => Promise<number>);
}

type TextSplitterChunkHeaderOptions = {
  /** Header text to prepend to each chunk */
  chunkHeader?: string;
  /** Header text for chunks that continue from previous (default: "(cont'd) ") */
  chunkOverlapHeader?: string;
  /** Whether to append overlap header to continuing chunks (default: false) */
  appendChunkOverlapHeader?: boolean;
};
```

**Configuration Examples:**

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

// Custom length function using an estimated token count
const tokenBasedSplitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 100, // measured by lengthFunction: ~100 tokens instead of characters
  chunkOverlap: 20, // also measured by lengthFunction
  lengthFunction: (text: string) => {
    // Rough estimate: count whitespace-separated words
    // (a real implementation would use a proper tokenizer)
    return text.split(/\s+/).length;
  },
});

// Keep separators in output
const separatorKeepingSplitter = new CharacterTextSplitter({
  separator: "\n---\n",
  chunkSize: 500,
  chunkOverlap: 0,
  keepSeparator: true, // Separators will be included in the chunks
});
```
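How `keepSeparator` changes the raw pieces can be sketched without the library. The helper below is a hypothetical illustration (this `splitOnSeparator` is defined here, not imported from the package). It assumes a kept separator stays attached to the start of the piece that follows it, which may differ from the behaviour of the version you have installed.

```typescript
// Hypothetical helper illustrating the two splitting modes.
function splitOnSeparator(
  text: string,
  separator: string,
  keepSeparator: boolean
): string[] {
  // An empty separator falls back to splitting into single characters.
  if (separator === "") return text.split("");

  let pieces: string[];
  if (keepSeparator) {
    // Escape regex metacharacters, then split on a zero-width lookahead so
    // the separator is not consumed and stays at the start of the next piece.
    const escaped = separator.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    pieces = text.split(new RegExp(`(?=${escaped})`));
  } else {
    // Plain split: the separator is removed from the output.
    pieces = text.split(separator);
  }
  return pieces.filter((p) => p !== "");
}

const dropped = splitOnSeparator("a,b,c", ",", false);
// → ["a", "b", "c"]

const kept = splitOnSeparator("a,b,c", ",", true);
// → ["a", ",b", ",c"]
```

Kept separators count toward each chunk's measured length, so a long separator such as `"\n---\n"` reduces the room left for content within `chunkSize`.
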