# String & Text Processing

The `boltons.strutils` module covers common text-processing tasks: case conversion, slugification, pluralization and ordinal formatting, HTML text extraction, ANSI escape stripping, gzip compression, multi-string replacement, shell argument escaping, and integer-range parsing, plus helpers for reducing accented text to ASCII.

## Capabilities

### Case Conversion

Convert between different string case formats.

```python { .api }
def camel2under(camel_string):
    """
    Convert CamelCase to under_score.

    Parameters:
    - camel_string (str): CamelCase string to convert

    Returns:
    str: Converted under_score string
    """

def under2camel(under_string):
    """
    Convert under_score to CamelCase.

    Parameters:
    - under_string (str): under_score string to convert

    Returns:
    str: Converted CamelCase string
    """
```
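To make the conversion rules concrete, here is a minimal standard-library sketch of the same idea. It is not the boltons implementation: `sketch_camel2under` and `sketch_under2camel` are hypothetical helpers introduced only for illustration, and edge cases such as runs of capitals may be handled differently by the library.

```python
import re

# Hypothetical helpers for illustration; the library's own code may differ.
# Zero-width match at every lowercase/digit -> uppercase boundary.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

def sketch_camel2under(camel_string):
    """Insert '_' at each lower-to-upper boundary, then lowercase."""
    return _BOUNDARY.sub("_", camel_string).lower()

def sketch_under2camel(under_string):
    """Capitalize each '_'-separated word and join the pieces."""
    return "".join(word.capitalize() for word in under_string.split("_"))

print(sketch_camel2under("BasicParseTest"))     # basic_parse_test
print(sketch_under2camel("complex_tokenizer"))  # ComplexTokenizer
```

Note that the round trip is not lossless: the first letter comes back capitalized, so `myVariableName` becomes `MyVariableName`.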

### Text Slugification

Convert text to URL-safe slugs and identifiers.

```python { .api }
def slugify(text, delim='_', lower=True, ascii=False):
    """
    Convert text to URL-safe slug.

    Parameters:
    - text (str): Text to slugify
    - delim (str): Delimiter character (default: '_')
    - lower (bool): Convert to lowercase (default: True)
    - ascii (bool): Force ASCII output (default: False)

    Returns:
    str: URL-safe slug
    """

def a10n(string):
    """
    Create internationalization-style abbreviation (a11y, i18n, etc.).

    Parameters:
    - string (str): String to abbreviate

    Returns:
    str: Abbreviated form (first + count + last)
    """
```
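The "first + count + last" rule and the delimiter-joining behaviour can be sketched in a few lines of plain Python. Both functions below are hypothetical stand-ins written for this page, not the library's code; in particular the slug sketch ignores the `ascii` option.

```python
import re

def sketch_a10n(string):
    """Abbreviate as first letter + number of inner letters + last letter."""
    if len(string) < 3:
        return string  # nothing to abbreviate
    return f"{string[0]}{len(string) - 2}{string[-1]}"

def sketch_slugify(text, delim="_", lower=True):
    """Join the word-like runs of *text* with *delim*."""
    slug = delim.join(re.findall(r"\w+", text))
    return slug.lower() if lower else slug

print(sketch_a10n("internationalization"))          # i18n
print(sketch_a10n("accessibility"))                 # a11y
print(sketch_slugify("Hello, World!"))              # hello_world
print(sketch_slugify("Hello, World!", delim="-"))   # hello-world
```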

### Text Formatting and Manipulation

Advanced text processing and formatting utilities.

```python { .api }
def split_punct_ws(text):
    """
    Split text on punctuation and whitespace.

    Parameters:
    - text (str): Text to split

    Returns:
    list: List of text segments
    """

def unit_len(sized_iterable, unit_noun='item'):
    """
    Format count with unit noun.

    Parameters:
    - sized_iterable: Iterable with __len__
    - unit_noun (str): Singular noun for the unit

    Returns:
    str: Formatted count with proper pluralization
    """

def ordinalize(number, ext_only=False):
    """
    Convert number to ordinal (1st, 2nd, etc.).

    Parameters:
    - number (int): Number to convert
    - ext_only (bool): Return only the suffix (default: False)

    Returns:
    str: Ordinal number or suffix
    """

def cardinalize(unit_noun, count):
    """
    Pluralize unit noun based on count.

    Parameters:
    - unit_noun (str): Singular noun
    - count (int): Count to determine pluralization

    Returns:
    str: Properly pluralized noun
    """

def singularize(word):
    """
    Convert plural word to singular form.

    Parameters:
    - word (str): Plural word

    Returns:
    str: Singular form
    """

def pluralize(word):
    """
    Convert singular word to plural form.

    Parameters:
    - word (str): Singular word

    Returns:
    str: Plural form
    """
```
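The English ordinal rule behind `ordinalize` is small enough to sketch directly. The helpers below are hypothetical and assume a non-negative integer; they illustrate the rule rather than reproduce the library's code.

```python
def sketch_ordinal_suffix(number):
    """Return the English ordinal suffix for a non-negative integer."""
    if 10 <= number % 100 <= 20:
        return "th"  # 11th, 12th, 13th ... are irregular
    return {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")

def sketch_ordinalize(number, ext_only=False):
    """Return '1st', '2nd', ... or only the suffix when ext_only is True."""
    suffix = sketch_ordinal_suffix(number)
    return suffix if ext_only else f"{number}{suffix}"

print(sketch_ordinalize(1))                  # 1st
print(sketch_ordinalize(11))                 # 11th
print(sketch_ordinalize(22))                 # 22nd
print(sketch_ordinalize(3, ext_only=True))   # rd
```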

### Text Analysis and Extraction

Extract and analyze text content.

```python { .api }
def find_hashtags(string):
    """
    Extract hashtags from text.

    Parameters:
    - string (str): Text containing hashtags

    Returns:
    list: List of hashtag strings (with the leading # removed)
    """

def is_uuid(string):
    """
    Check if string is valid UUID format.

    Parameters:
    - string (str): String to check

    Returns:
    bool: True if valid UUID format
    """

def is_ascii(text):
    """
    Check if text contains only ASCII characters.

    Parameters:
    - text (str): Text to check

    Returns:
    bool: True if text is ASCII-only
    """
```
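As a rough sketch of how these checks can work, the standard library is enough. The functions below are hypothetical illustrations: the hashtag pattern is an assumption (a `#` at the start of the text or after whitespace, so URL anchors are skipped), and the UUID check accepts any form that `uuid.UUID` can parse, which may be looser than the library's check.

```python
import re
import uuid

# Assumed pattern: '#' must start the text or follow whitespace.
_HASHTAG = re.compile(r"(?:^|\s)#(\w+)")

def sketch_find_hashtags(string):
    """Return hashtag names without the leading '#'."""
    return _HASHTAG.findall(string)

def sketch_is_uuid(string):
    """True if uuid.UUID can parse *string*."""
    try:
        uuid.UUID(string)
    except (ValueError, AttributeError, TypeError):
        return False
    return True

def sketch_is_ascii(text):
    """True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)

print(sketch_find_hashtags("#atag http://asite/#ananchor"))  # ['atag']
print(sketch_is_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(sketch_is_ascii("café"))  # False
```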

### Text Cleaning and Normalization

Clean and normalize text content.

```python { .api }
def strip_ansi(text):
    """
    Remove ANSI escape sequences from text.

    Parameters:
    - text (str): Text with ANSI sequences

    Returns:
    str: Text with ANSI sequences removed
    """

def asciify(text, ignore=False):
    """
    Convert text to ASCII by removing diacritics.

    Parameters:
    - text (str): Text to convert
    - ignore (bool): Drop characters that cannot be converted instead of replacing them

    Returns:
    bytes: ASCII-encoded text
    """

def unwrap_text(text, ending='\n\n'):
    """
    Unwrap text by removing line breaks.

    Parameters:
    - text (str): Text to unwrap
    - ending (str): String used to join the unwrapped paragraphs (default: '\n\n')

    Returns:
    str: Text with line breaks removed appropriately
    """

def indent(text, margin, newline='\n', key=bool):
    """
    Indent text lines with a margin string.

    Parameters:
    - text (str): Text to indent
    - margin (str): String to prepend to each line
    - newline (str): Newline used to rejoin the lines (default: '\n')
    - key (callable): Called on each line to decide whether to indent it (default: bool, so empty lines are left alone)

    Returns:
    str: Indented text
    """
```
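Two of these operations are easy to illustrate with the standard library. The sketch below is an assumption-laden illustration, not the library's code: the regular expression only covers CSI-style sequences (`ESC [` ... final byte), and the deaccenting helper relies on Unicode decomposition instead of a lookup table. Both function names are hypothetical.

```python
import re
import unicodedata

# CSI sequences: ESC [ parameter bytes, intermediate bytes, one final byte.
_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")

def sketch_strip_ansi(text):
    """Remove CSI escape sequences such as colour codes."""
    return _CSI.sub("", text)

def sketch_deaccent(text):
    """Decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(sketch_strip_ansi("\033[31mRed text\033[0m"))  # Red text
print(sketch_deaccent("café résumé naïve"))          # cafe resume naive
```

Unlike `asciify`, the deaccenting sketch returns `str` and leaves characters without a decomposition (for example `ø`) untouched.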

### HTML Processing

Extract and process HTML content.

```python { .api }
def html2text(html_text):
    """
    Extract plain text from HTML string.

    Parameters:
    - html_text (str): HTML content

    Returns:
    str: Plain text content
    """

class HTMLTextExtractor(HTMLParser):
    """Extract plain text from HTML."""
    def __init__(self): ...
    def handle_data(self, data): ...
    def get_text(self): ...
```
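The class outline above follows the usual `html.parser.HTMLParser` subclassing pattern: collect every text node in `handle_data` and join them at the end. A minimal sketch of that pattern, using the hypothetical names `SketchTextExtractor` and `sketch_html2text`, might look like this:

```python
from html.parser import HTMLParser

class SketchTextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into characters
        # before they reach handle_data.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)

def sketch_html2text(html_text):
    parser = SketchTextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered after the last tag
    return parser.get_text()

print(sketch_html2text("<p>Hello <b>world</b>!</p>"))  # Hello world!
```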

### Data Formatting

Format data for human consumption.

```python { .api }
def bytes2human(nbytes, ndigits=0):
    """
    Convert bytes to human readable format.

    Parameters:
    - nbytes (int): Number of bytes
    - ndigits (int): Number of decimal places

    Returns:
    str: Human readable size (e.g., "126K")
    """
```
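The underlying technique is repeated division by 1024 until the value fits the current unit. The sketch below shows that loop with a hypothetical `sketch_bytes2human`; the single-letter unit symbols are an assumption about the output style, and boundary values (exactly 1024, for instance) may be rendered differently by the library.

```python
_SYMBOLS = ("B", "K", "M", "G", "T", "P", "E", "Z", "Y")

def sketch_bytes2human(nbytes, ndigits=0):
    """Scale *nbytes* down by powers of 1024 and append a unit symbol."""
    size = float(nbytes)
    for symbol in _SYMBOLS:
        # Stop once the value fits this unit, or we run out of units.
        if abs(size) < 1024 or symbol == _SYMBOLS[-1]:
            break
        size /= 1024
    return f"{size:.{ndigits}f}{symbol}"

print(sketch_bytes2human(128991))    # 126K
print(sketch_bytes2human(1536, 1))   # 1.5K
```

With the default `ndigits=0` the value is rounded to a whole number, so pass `ndigits=1` or more when fractional sizes matter.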

### Compression

Text compression and decompression utilities.

```python { .api }
def gunzip_bytes(data):
    """
    Decompress gzip bytes.

    Parameters:
    - data (bytes): Gzipped data

    Returns:
    bytes: Decompressed data
    """

def gzip_bytes(data):
    """
    Compress data to gzip bytes.

    Parameters:
    - data (bytes): Data to compress

    Returns:
    bytes: Gzipped data
    """
```
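Both functions operate on `bytes`, so text must be encoded first. The round trip can be sketched with the standard `gzip` module; the wrapper names and the `level` parameter below are hypothetical and only mirror the shape of the API above.

```python
import gzip

def sketch_gzip_bytes(data, level=6):
    """Compress *data* (bytes) into a gzip container."""
    return gzip.compress(data, compresslevel=level)

def sketch_gunzip_bytes(data):
    """Decompress gzip-compressed bytes."""
    return gzip.decompress(data)

payload = ("spam and eggs " * 500).encode("utf-8")  # text -> bytes
packed = sketch_gzip_bytes(payload)

# The round trip restores the original bytes exactly.
restored = sketch_gunzip_bytes(packed)
print(restored == payload)          # True
print(len(packed) < len(payload))   # True
```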

### String Replacement

Efficient multiple string replacement operations.

```python { .api }
def multi_replace(input_string, sub_map, **kwargs):
    """
    Efficient multiple string replacement.

    Parameters:
    - input_string (str): String to process
    - sub_map (dict): Mapping of old -> new strings

    Returns:
    str: String with all replacements made
    """

class MultiReplace:
    """Efficient multiple string replacement."""
    def __init__(self, sub_map, **kwargs): ...
    def sub(self, text): ...
```
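A common way to perform many replacements in one pass is to build a single alternation pattern and look each match up in the mapping. The sketch below shows that technique with a hypothetical `sketch_multi_replace`; whether the library works exactly this way is an assumption.

```python
import re

def sketch_multi_replace(input_string, sub_map):
    """Replace every key of *sub_map* with its value in a single pass."""
    if not sub_map:
        return input_string
    # Longest keys first, so overlapping keys prefer the longer match.
    keys = sorted(sub_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: sub_map[match.group(0)], input_string)

print(sketch_multi_replace("Hello world, hello universe",
                           {"hello": "hi", "world": "earth"}))
# Hello earth, hi universe

print(sketch_multi_replace("ab", {"a": "b", "b": "a"}))  # ba
```

Because the text is scanned once, replacements never feed into each other: swapping `a` and `b` works, which chained `str.replace` calls cannot do. Matching is case-sensitive, so the capitalized `Hello` is left alone.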

### Shell Command Processing

Escape and format shell command arguments.

```python { .api }
def escape_shell_args(args, sep=' ', style=None):
    """
    Escape shell command arguments.

    Parameters:
    - args (list): List of arguments
    - sep (str): Separator between arguments
    - style (str): Shell style ('sh', 'cmd', etc.)

    Returns:
    str: Escaped shell command string
    """

def args2sh(args, sep=' '):
    """
    Convert args to shell-escaped string.

    Parameters:
    - args (list): List of arguments
    - sep (str): Separator between arguments

    Returns:
    str: Shell-escaped command string
    """

def args2cmd(args, sep=' '):
    """
    Convert args to cmd.exe-escaped string.

    Parameters:
    - args (list): List of arguments
    - sep (str): Separator between arguments

    Returns:
    str: CMD-escaped command string
    """
```
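The two quoting styles differ: POSIX shells use single quotes, while `cmd.exe` conventions use double quotes. The standard library has a helper for each, and the sketch below wraps them in hypothetical functions to show the difference. The exact quoting produced by the library functions above may differ from these.

```python
import shlex
import subprocess

def sketch_args2sh(args, sep=" "):
    """Quote each argument for a POSIX shell and join with *sep*."""
    return sep.join(shlex.quote(arg) for arg in args)

def sketch_args2cmd(args):
    """Quote arguments following the Microsoft C runtime rules."""
    return subprocess.list2cmdline(args)

args = ["echo", "hello world"]
print(sketch_args2sh(args))   # echo 'hello world'
print(sketch_args2cmd(args))  # echo "hello world"

# POSIX quoting round-trips through shlex.split.
print(shlex.split(sketch_args2sh(args)) == args)  # True
```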

### Integer List Processing

Parse and format integer ranges and lists.

```python { .api }
def parse_int_list(range_string, **kwargs):
    """
    Parse integer ranges from string.

    Parameters:
    - range_string (str): String like "1-5,7,9-12"

    Returns:
    list: List of integers
    """

def format_int_list(int_list, **kwargs):
    """
    Format integer list as range string.

    Parameters:
    - int_list (list): List of integers

    Returns:
    str: Formatted range string
    """

def complement_int_list(range_string, **kwargs):
    """
    Get complement of integer ranges.

    Parameters:
    - range_string (str): Range string to complement

    Returns:
    str: Complement range string
    """

def int_ranges_from_int_list(range_string, **kwargs):
    """
    Convert a range string to (start, end) pairs.

    Parameters:
    - range_string (str): String like "1,3,5-8"

    Returns:
    tuple: Tuple of (start, end) tuples
    """
```
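Parsing and formatting are inverse operations over the same compact notation. The sketch below shows one way to implement both directions; the function names are hypothetical, negative numbers are not supported (the `-` would clash with the range delimiter), and the library's handling of duplicates or reversed ranges may differ.

```python
def sketch_parse_int_list(range_string, delim=",", range_delim="-"):
    """Expand '1-3,7' into the sorted list [1, 2, 3, 7]."""
    numbers = set()
    for part in range_string.split(delim):
        part = part.strip()
        if not part:
            continue
        if range_delim in part:
            start, end = (int(x) for x in part.split(range_delim))
            low, high = sorted((start, end))
            numbers.update(range(low, high + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)

def _render(start, end, range_delim):
    return str(start) if start == end else f"{start}{range_delim}{end}"

def sketch_format_int_list(int_list, delim=",", range_delim="-"):
    """Collapse consecutive runs: [1, 2, 3, 7] becomes '1-3,7'."""
    parts = []
    run_start = prev = None
    for n in sorted(set(int_list)):
        if prev is not None and n == prev + 1:
            prev = n  # still inside the current run
            continue
        if prev is not None:
            parts.append(_render(run_start, prev, range_delim))
        run_start = prev = n
    if prev is not None:
        parts.append(_render(run_start, prev, range_delim))
    return delim.join(parts)

print(sketch_parse_int_list("1-5,7,9-12"))
# [1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
print(sketch_format_int_list([1, 2, 3, 5, 6, 8]))  # 1-3,5-6,8
```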

### Memory-Efficient Text Processing

Process large text files efficiently.

```python { .api }
def iter_splitlines(text):
    """
    Memory-efficient line iteration.

    Parameters:
    - text (str): Text to split into lines

    Yields:
    str: Each line
    """
```
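The memory saving comes from yielding one line at a time instead of building the whole list, as `str.splitlines()` does. A minimal generator sketch of that idea follows. `sketch_iter_splitlines` is a hypothetical name, and it only recognizes `\n`, `\r`, and `\r\n`, whereas `str.splitlines()` also splits on several other Unicode line boundaries.

```python
import re

_LINE_END = re.compile(r"\r\n|\n|\r")

def sketch_iter_splitlines(text):
    """Yield the lines of *text* lazily, without the line endings."""
    start = 0
    for match in _LINE_END.finditer(text):
        yield text[start:match.start()]
        start = match.end()
    if start < len(text):
        yield text[start:]  # final line without a trailing newline

for line in sketch_iter_splitlines("first\r\nsecond\nthird"):
    print(line)
```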

## Usage Examples

```python
from boltons.strutils import (
    slugify, camel2under, under2camel, bytes2human,
    strip_ansi, html2text, multi_replace, find_hashtags
)

# Create URL-friendly slugs (the default delimiter is '_')
title = "Hello, World! This is a test."
slug = slugify(title)
print(slug)  # "hello_world_this_is_a_test"
print(slugify(title, delim='-'))  # "hello-world-this-is-a-test"

# Case conversion
camel = "myVariableName"
under = camel2under(camel)
print(under)  # "my_variable_name"

# under2camel capitalizes every word, including the first
back_to_camel = under2camel(under)
print(back_to_camel)  # "MyVariableName"

# Human-readable byte sizes (ndigits=0 by default, so ask for one decimal)
size = bytes2human(1536, ndigits=1)
print(size)  # "1.5K"

# Clean ANSI escape sequences
ansi_text = "\033[31mRed text\033[0m"
clean = strip_ansi(ansi_text)
print(clean)  # "Red text"

# Extract text from HTML
html = "<p>Hello <b>world</b>!</p>"
text = html2text(html)
print(text)  # "Hello world!"

# Multiple string replacements (case-sensitive by default)
text = "Hello world, hello universe"
replacements = {"hello": "hi", "world": "earth"}
result = multi_replace(text, replacements)
print(result)  # "Hello earth, hi universe"

# Find hashtags in text (returned without the leading '#')
social_text = "Check out #python and #boltons!"
tags = find_hashtags(social_text)
print(tags)  # ["python", "boltons"]
```

### Advanced Text Processing

```python
from boltons.strutils import (
    ordinalize, cardinalize, pluralize, singularize,
    parse_int_list, format_int_list, asciify
)

# Number formatting
print(ordinalize(1))  # "1st"
print(ordinalize(22))  # "22nd"
print(ordinalize(103))  # "103rd"

# Pluralization (cardinalize returns only the noun, not the count)
print(cardinalize("item", 1))  # "item"
print(cardinalize("item", 5))  # "items"
print(pluralize("child"))  # "children"
print(singularize("children"))  # "child"

# Integer range processing
ranges = "1-5,7,9-12"
numbers = parse_int_list(ranges)
print(numbers)  # [1, 2, 3, 4, 5, 7, 9, 10, 11, 12]

formatted = format_int_list([1, 2, 3, 5, 6, 8])
print(formatted)  # "1-3,5-6,8"

# Text normalization (asciify returns bytes)
accented = "café résumé naïve"
ascii_text = asciify(accented)
print(ascii_text)  # b"cafe resume naive"
```

## Types

```python { .api }
# Character mapping for removing diacritics
class DeaccenterDict(dict):
    """Dictionary for character deaccenting mappings."""
    pass

# Regular expressions
HASHTAG_RE: re.Pattern  # Pattern for matching hashtags
ANSI_SEQUENCES: re.Pattern  # Pattern for ANSI escape sequences

# Character mappings
DEACCENT_MAP: dict  # Mapping for removing diacritical marks
```