Various implementations of LangChain.js text splitters for retrieval-augmented generation (RAG) pipelines
Basic text splitting using a single character-based separator. Ideal for simple document chunking with predictable separator patterns such as line breaks or paragraph markers.
Splits text based on a single character separator with configurable chunk size and overlap.
/**
 * Text splitter that splits text based on a single character separator
 */
class CharacterTextSplitter extends TextSplitter implements CharacterTextSplitterParams {
  separator: string;
  constructor(fields?: Partial<CharacterTextSplitterParams>);
  splitText(text: string): Promise<string[]>;
  static lc_name(): string;
}

interface CharacterTextSplitterParams extends TextSplitterParams {
  /** The character(s) to split text on (default: "\n\n") */
  separator: string;
}

Usage Examples:
import { CharacterTextSplitter } from "@langchain/textsplitters";
// Basic paragraph splitting
const splitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 30,
  chunkOverlap: 0,
});

const text = `Paragraph one content here.

Paragraph two content here.

Paragraph three content here.`;

const chunks = await splitter.splitText(text);
// Each paragraph is 27 characters, so no two fit together within
// chunkSize 30 and each becomes its own chunk:
// ["Paragraph one content here.", "Paragraph two content here.", "Paragraph three content here."]
// Custom separator splitting
const csvSplitter = new CharacterTextSplitter({
  separator: ",",
  chunkSize: 50,
  chunkOverlap: 10,
});
const csvData = "apple,banana,cherry,date,elderberry,fig";
const csvChunks = await csvSplitter.splitText(csvData);
// Word-level splitting
const wordSplitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 20,
  chunkOverlap: 5,
});

const sentence = "The quick brown fox jumps over the lazy dog";
const wordChunks = await wordSplitter.splitText(sentence);

Create Document objects from split text with metadata preservation.
/**
 * Create Document objects from split text
 * @param texts - Array of texts to split and convert to documents
 * @param metadatas - Optional metadata for each text
 * @param chunkHeaderOptions - Optional chunk header configuration
 * @returns Array of Document objects with split text and metadata
 */
createDocuments(
  texts: string[],
  metadatas?: Record<string, any>[],
  chunkHeaderOptions?: TextSplitterChunkHeaderOptions
): Promise<Document[]>;

Usage Examples:
import { CharacterTextSplitter } from "@langchain/textsplitters";
const splitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 100,
  chunkOverlap: 20,
});
// Create documents with metadata
const texts = ["First document text", "Second document text"];
const metadatas = [
  { source: "doc1.txt", author: "Alice" },
  { source: "doc2.txt", author: "Bob" },
];
const documents = await splitter.createDocuments(texts, metadatas);
// Each document will have pageContent with split text and merged metadata

// Create documents with chunk headers
const documentsWithHeaders = await splitter.createDocuments(
  texts,
  metadatas,
  {
    chunkHeader: "=== DOCUMENT CHUNK ===\n",
    chunkOverlapHeader: "(continued from previous chunk) ",
    appendChunkOverlapHeader: true,
  }
);

Split existing Document objects while preserving their metadata.
/**
 * Split existing Document objects
 * @param documents - Array of documents to split
 * @param chunkHeaderOptions - Optional chunk header configuration
 * @returns Array of split Document objects
 */
splitDocuments(
  documents: Document[],
  chunkHeaderOptions?: TextSplitterChunkHeaderOptions
): Promise<Document[]>;

Usage Examples:
import { CharacterTextSplitter } from "@langchain/textsplitters";
import { Document } from "@langchain/core/documents";
const splitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 20,
  chunkOverlap: 10,
});

const originalDocs = [
  new Document({
    pageContent: "Line one\nLine two\nLine three\nLine four",
    metadata: { source: "example.txt", type: "text" },
  }),
];

const splitDocs = await splitter.splitDocuments(originalDocs);
// Results in multiple documents, each with preserved metadata plus line location info

All character text splitters support the base TextSplitterParams configuration.
interface TextSplitterParams {
  /** Maximum size of each chunk in characters (default: 1000) */
  chunkSize: number;
  /** Number of characters to overlap between chunks (default: 200) */
  chunkOverlap: number;
  /** Whether to keep the separator in the split text (default: false) */
  keepSeparator: boolean;
  /** Custom function to calculate text length (default: text.length) */
  lengthFunction?: ((text: string) => number) | ((text: string) => Promise<number>);
}
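To make the chunkSize/chunkOverlap interaction concrete, here is a simplified, self-contained sketch of the merge step a splitter performs after splitting on the separator. It approximates the library's behavior rather than reproducing its actual implementation; `joinLen` and `mergeSplits` are illustrative names, not part of the package's API.

```typescript
// Length of `parts` once joined with `sep`.
function joinLen(parts: string[], sep: string): number {
  const content = parts.reduce((n, p) => n + p.length, 0);
  return content + sep.length * Math.max(parts.length - 1, 0);
}

// Merge separator-splits back into chunks that respect chunkSize while
// carrying roughly chunkOverlap characters of trailing context forward.
function mergeSplits(
  splits: string[],
  separator: string,
  chunkSize: number,
  chunkOverlap: number
): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const split of splits) {
    // If adding this split would overflow the chunk, flush the current chunk.
    if (current.length > 0 && joinLen([...current, split], separator) > chunkSize) {
      chunks.push(current.join(separator));
      // Keep only a trailing run of splits that fits within chunkOverlap;
      // these become the overlap at the start of the next chunk.
      while (current.length > 0 && joinLen(current, separator) > chunkOverlap) {
        current.shift();
      }
    }
    current.push(split);
  }
  if (current.length > 0) chunks.push(current.join(separator));
  return chunks;
}

// Example: word splits with chunkSize 11 and chunkOverlap 5.
// mergeSplits(["The", "quick", "brown", "fox", "jumps"], " ", 11, 5)
//   → ["The quick", "quick brown", "brown fox", "fox jumps"]
```

Note how each chunk after the first re-starts with material from the end of the previous chunk; that repeated context is what chunkOverlap buys you in a RAG pipeline.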
type TextSplitterChunkHeaderOptions = {
  /** Header text to prepend to each chunk */
  chunkHeader?: string;
  /** Header text for chunks that continue from previous (default: "(cont'd) ") */
  chunkOverlapHeader?: string;
  /** Whether to append overlap header to continuing chunks (default: false) */
  appendChunkOverlapHeader?: boolean;
};

Configuration Examples:
// Custom length function using token count
const tokenBasedSplitter = new CharacterTextSplitter({
  separator: "\n",
  chunkSize: 100, // 100 tokens instead of characters
  chunkOverlap: 20,
  lengthFunction: (text: string) => {
    // Simple token estimation (a real implementation would use a proper tokenizer)
    return text.split(/\s+/).length;
  },
});
// Keep separators in output
const separatorKeepingSplitter = new CharacterTextSplitter({
  separator: "\n---\n",
  chunkSize: 500,
  chunkOverlap: 0,
  keepSeparator: true, // separators will be included in the chunks
});
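As a rough illustration of what keepSeparator changes, the snippet below approximates the two splitting modes with plain string operations. It is not the library's code; treat the claim that the retained separator attaches to the start of the following piece as an assumption about the splitter's behavior.

```typescript
const text = "Section A\n---\nSection B\n---\nSection C";
const separator = "\n---\n";

// keepSeparator: false — the separator is dropped from the output entirely.
const withoutSeparator = text.split(separator);
// ["Section A", "Section B", "Section C"]

// keepSeparator: true — the separator is retained, attached to the start
// of the piece that follows it (modeled here by splitting on a lookahead).
const escaped = separator.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const withSeparator = text.split(new RegExp(`(?=${escaped})`));
// ["Section A", "\n---\nSection B", "\n---\nSection C"]
```

Keeping separators matters when the separator itself carries meaning (e.g. Markdown horizontal rules or headings) that downstream retrieval should see inside each chunk.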