# String Operations

String processing functions modeled after Apache Arrow's compute functions. They operate element-wise on arrays of strings and cover pattern matching, transformations, analysis, and categorical encoding. All functions also accept nested string arrays and preserve the nesting.

## Capabilities

### String Case Transformations

Functions for changing the case of each string in an array. Array structure is preserved and missing values pass through as missing.

```python { .api }
def str.capitalize(array):
    """
    Capitalize the first character of each string.

    Parameters:
    - array: Array of strings to capitalize

    Returns:
    Array with strings having first character capitalized
    """

def str.lower(array):
    """
    Convert strings to lowercase.

    Parameters:
    - array: Array of strings to convert

    Returns:
    Array with strings converted to lowercase
    """

def str.upper(array):
    """
    Convert strings to uppercase.

    Parameters:
    - array: Array of strings to convert

    Returns:
    Array with strings converted to uppercase
    """

def str.swapcase(array):
    """
    Swap case of each character in strings.

    Parameters:
    - array: Array of strings to swap case

    Returns:
    Array with case of each character swapped
    """

def str.title(array):
    """
    Convert strings to title case (capitalize first letter of each word).

    Parameters:
    - array: Array of strings to convert

    Returns:
    Array with strings converted to title case
    """
```
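
The structure-preserving behavior can be pictured with a plain-Python sketch on nested lists. It does not call the library: `map_strings` is a hypothetical helper, and it assumes missing values are represented as `None` and returned unchanged, as in the usage examples later in this document.

```python
def map_strings(func, data):
    """Apply func to every string in a nested list, keeping None and nesting intact."""
    if data is None:
        return None  # missing values pass through untouched
    if isinstance(data, str):
        return func(data)
    return [map_strings(func, item) for item in data]


nested = [["alice", "bob"], None, ["CHARLIE"]]
upper = map_strings(str.upper, nested)
swapped = map_strings(str.swapcase, nested)
print(upper)
print(swapped)
```

Python's built-in case methods stand in for the array functions here; the library's results may differ for some Unicode edge cases.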

### String Reversal and Ordering

Functions for reversing string content.

```python { .api }
def str.reverse(array):
    """
    Reverse each string character by character.

    Parameters:
    - array: Array of strings to reverse

    Returns:
    Array with strings reversed
    """
```

### String Padding and Alignment

Functions for padding strings to specified widths with customizable fill characters and alignment options.

```python { .api }
def str.center(array, width, padding=" "):
    """
    Center strings in fields of specified width.

    Parameters:
    - array: Array of strings to center
    - width: int, minimum width of resulting strings
    - padding: str, character to use for padding (default space)

    Returns:
    Array with strings centered and padded to specified width
    """

def str.lpad(array, width, padding=" "):
    """
    Left-pad strings to specified width.

    Parameters:
    - array: Array of strings to pad
    - width: int, minimum width of resulting strings
    - padding: str, character to use for padding (default space)

    Returns:
    Array with strings left-padded to specified width
    """

def str.rpad(array, width, padding=" "):
    """
    Right-pad strings to specified width.

    Parameters:
    - array: Array of strings to pad
    - width: int, minimum width of resulting strings
    - padding: str, character to use for padding (default space)

    Returns:
    Array with strings right-padded to specified width
    """
```
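
Because `width` is a minimum, strings that are already long enough should come back unchanged. The plain-Python sketch below illustrates that, along with one way to split an odd amount of padding when centering. `center_extra_right` is a hypothetical helper, not part of the library; it puts the extra character on the right, which is what the `"*22**"` result in the usage examples implies.

```python
def center_extra_right(text, width, padding=" "):
    """Center text; when the padding is odd, the extra character goes on the right."""
    missing = max(width - len(text), 0)  # width is a minimum, never truncate
    left = missing // 2
    right = missing - left
    return padding * left + text + padding * right


values = ["1", "22", "333"]
centered = [center_extra_right(v, 5, "*") for v in values]
left_padded = [v.rjust(5, "0") for v in values]   # padding added on the left
right_padded = [v.ljust(5, "0") for v in values]  # padding added on the right
print(centered, left_padded, right_padded)
```

Note that Python's own `str.center` resolves the odd case differently here (`"22".center(5, "*")` gives `"**22*"`), so it is not a drop-in reference for centering.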

### String Trimming and Cleanup

Functions for removing unwanted characters from the beginning, end, or both ends of strings.

```python { .api }
def str.trim(array, characters=None):
    """
    Remove leading and trailing characters from strings.

    Parameters:
    - array: Array of strings to trim
    - characters: str, characters to remove (None for whitespace)

    Returns:
    Array with specified characters trimmed from both ends
    """

def str.ltrim(array, characters=None):
    """
    Remove leading characters from strings.

    Parameters:
    - array: Array of strings to trim
    - characters: str, characters to remove (None for whitespace)

    Returns:
    Array with specified characters trimmed from start
    """

def str.rtrim(array, characters=None):
    """
    Remove trailing characters from strings.

    Parameters:
    - array: Array of strings to trim
    - characters: str, characters to remove (None for whitespace)

    Returns:
    Array with specified characters trimmed from end
    """

def str.trim_whitespace(array):
    """
    Remove leading and trailing whitespace from strings.

    Parameters:
    - array: Array of strings to trim

    Returns:
    Array with whitespace trimmed from both ends
    """

def str.ltrim_whitespace(array):
    """
    Remove leading whitespace from strings.

    Parameters:
    - array: Array of strings to trim

    Returns:
    Array with whitespace trimmed from start
    """

def str.rtrim_whitespace(array):
    """
    Remove trailing whitespace from strings.

    Parameters:
    - array: Array of strings to trim

    Returns:
    Array with whitespace trimmed from end
    """
```
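
A point that is easy to miss is how `characters` is interpreted. The sketch below uses Python's `str.strip` family as a stand-in and assumes the same reading: `characters` is a set of individual characters to strip, not a literal prefix or suffix. Treat that as an assumption to verify against the library.

```python
raw = ["xxhixx", "xyhiyx", "  hi  "]

both = [s.strip("xy") for s in raw]    # strip any mix of 'x' and 'y' from both ends
left = [s.lstrip("xy") for s in raw]   # leading characters only
right = [s.rstrip("xy") for s in raw]  # trailing characters only
whitespace = [s.strip() for s in raw]  # no argument: strip whitespace

print(both, left, right, whitespace)
```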

### String Length and Analysis

Functions for analyzing string properties including length, character counts, and pattern occurrences.

```python { .api }
def str.length(array):
    """
    Get length of each string in characters.

    Parameters:
    - array: Array of strings to measure

    Returns:
    Array of integers representing string lengths
    """

def str.count_substring(array, pattern, ignore_case=False):
    """
    Count non-overlapping occurrences of substring in each string.

    Parameters:
    - array: Array of strings to search
    - pattern: str, substring pattern to count
    - ignore_case: bool, if True perform case-insensitive search

    Returns:
    Array of integers representing count of pattern occurrences
    """

def str.count_substring_regex(array, pattern, flags=0):
    """
    Count non-overlapping regex matches in each string.

    Parameters:
    - array: Array of strings to search
    - pattern: str, regular expression pattern to count
    - flags: int, regex flags (e.g., re.IGNORECASE)

    Returns:
    Array of integers representing count of pattern matches
    """
```
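
"Non-overlapping" matters when a pattern can match at adjacent positions. As a rough illustration using only the Python standard library (not the array functions themselves), `"aaaa"` contains `"aa"` twice when matches may not overlap, even though there are three starting positions:

```python
import re

samples = ["aaaa", "Banana", ""]

# Literal, case-sensitive, non-overlapping counts.
literal = [s.count("aa") for s in samples]

# Case-insensitive literal count, approximated by lowercasing both sides.
caseless = [s.lower().count("an") for s in samples]

# Regex count: number of non-overlapping matches of a vowel pattern.
vowels = [len(re.findall(r"[aeiou]", s, flags=re.IGNORECASE)) for s in samples]

print(literal, caseless, vowels)
```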

### String Search and Pattern Finding

Functions for locating patterns within strings using both literal and regular expression matching.

```python { .api }
def str.find_substring(array, pattern, start=0, end=None, ignore_case=False):
    """
    Find first occurrence of substring in each string.

    Parameters:
    - array: Array of strings to search
    - pattern: str, substring pattern to find
    - start: int, starting position for search
    - end: int, ending position for search (None for end of string)
    - ignore_case: bool, if True perform case-insensitive search

    Returns:
    Array of integers representing position of first match (-1 if not found)
    """

def str.find_substring_regex(array, pattern, flags=0):
    """
    Find first regex match position in each string.

    Parameters:
    - array: Array of strings to search
    - pattern: str, regular expression pattern to find
    - flags: int, regex flags (e.g., re.IGNORECASE)

    Returns:
    Array of integers representing position of first match (-1 if not found)
    """
```
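
The "-1 if not found" convention is the same one Python's `str.find` uses, so the per-string behavior can be sketched with the standard library. `find_regex` is a hypothetical helper written for this illustration; none of this calls the array functions.

```python
import re

samples = ["hello world", "World", "none here"]

literal = [s.find("world") for s in samples]           # case-sensitive
caseless = [s.lower().find("world") for s in samples]  # rough case-insensitive search


def find_regex(pattern, text, flags=0):
    """Return the start of the first regex match, or -1 when nothing matches."""
    match = re.search(pattern, text, flags)
    return match.start() if match else -1


regex = [find_regex(r"wor", s, re.IGNORECASE) for s in samples]
print(literal, caseless, regex)
```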

### Character Type Predicates

Functions for testing character properties and string composition, useful for data validation and filtering.

```python { .api }
def str.is_alnum(array):
    """
    Test if all characters in strings are alphanumeric.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are alphanumeric
    """

def str.is_alpha(array):
    """
    Test if all characters in strings are alphabetic.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are alphabetic
    """

def str.is_ascii(array):
    """
    Test if all characters in strings are ASCII.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings contain only ASCII characters
    """

def str.is_decimal(array):
    """
    Test if all characters in strings are decimal digits.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are decimal
    """

def str.is_digit(array):
    """
    Test if all characters in strings are digits.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings contain only digits
    """

def str.is_lower(array):
    """
    Test if all cased characters in strings are lowercase.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are lowercase
    """

def str.is_numeric(array):
    """
    Test if all characters in strings are numeric.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are numeric
    """

def str.is_printable(array):
    """
    Test if all characters in strings are printable.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are printable
    """

def str.is_space(array):
    """
    Test if all characters in strings are whitespace.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings contain only whitespace
    """

def str.is_title(array):
    """
    Test if strings are in title case.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are in title case
    """

def str.is_upper(array):
    """
    Test if all cased characters in strings are uppercase.

    Parameters:
    - array: Array of strings to test

    Returns:
    Array of booleans indicating if strings are uppercase
    """
```
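
The difference between "decimal", "digit", and "numeric" is not obvious from the descriptions alone. Python's similarly named `str` methods draw the line as shown below. This is only an analogy: the array functions may follow slightly different Unicode rules, especially for edge cases such as the empty string.

```python
# Superscript two (U+00B2) and the fraction one half (U+00BD) separate the three tests.
samples = ["123", "\u00b2", "\u00bd", "abc1", ""]

# For each string: (isdecimal, isdigit, isnumeric)
table = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

# "All cased characters" means uncased characters are ignored,
# but at least one cased character is required.
upper_mixed = "ABC 123".isupper()
upper_no_case = "123".isupper()

for text, flags in table.items():
    print(repr(text), flags)
print(upper_mixed, upper_no_case)
```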

### Pattern Matching and Boolean Tests

Functions for testing string patterns using various matching strategies including prefix/suffix, regex, and SQL-like patterns.

```python { .api }
def str.starts_with(array, pattern, ignore_case=False):
    """
    Test if strings start with specified pattern.

    Parameters:
    - array: Array of strings to test
    - pattern: str, pattern to match at start of strings
    - ignore_case: bool, if True perform case-insensitive matching

    Returns:
    Array of booleans indicating if strings start with pattern
    """

def str.ends_with(array, pattern, ignore_case=False):
    """
    Test if strings end with specified pattern.

    Parameters:
    - array: Array of strings to test
    - pattern: str, pattern to match at end of strings
    - ignore_case: bool, if True perform case-insensitive matching

    Returns:
    Array of booleans indicating if strings end with pattern
    """

def str.match_substring(array, pattern, ignore_case=False):
    """
    Test if strings contain specified substring.

    Parameters:
    - array: Array of strings to test
    - pattern: str, substring pattern to match
    - ignore_case: bool, if True perform case-insensitive matching

    Returns:
    Array of booleans indicating if strings contain pattern
    """

def str.match_substring_regex(array, pattern, flags=0):
    """
    Test if strings match regular expression pattern.

    Parameters:
    - array: Array of strings to test
    - pattern: str, regular expression pattern to match
    - flags: int, regex flags (e.g., re.IGNORECASE)

    Returns:
    Array of booleans indicating if strings match pattern
    """

def str.match_like(array, pattern, ignore_case=False, escape=None):
    """
    Test strings using SQL LIKE pattern matching.

    Parameters:
    - array: Array of strings to test
    - pattern: str, SQL LIKE pattern (% for any chars, _ for single char)
    - ignore_case: bool, if True perform case-insensitive matching
    - escape: str, escape character for literal % and _ (default None)

    Returns:
    Array of booleans indicating if strings match LIKE pattern
    """
```
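
SQL LIKE semantics can be understood as a small translation into an anchored regular expression: `%` becomes "any run of characters", `_` becomes "exactly one character", and everything else is literal. The sketch below is a standalone illustration of that idea, not the library's implementation; `like_to_regex` and this `match_like` are hypothetical helpers that mirror the parameters listed above.

```python
import re


def like_to_regex(pattern, escape=None):
    """Translate a SQL LIKE pattern into a regular expression string."""
    parts = []
    chars = iter(pattern)
    for ch in chars:
        if escape is not None and ch == escape:
            # The character after the escape is literal; a trailing escape matches itself.
            parts.append(re.escape(next(chars, ch)))
        elif ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def match_like(text, pattern, ignore_case=False, escape=None):
    """Return True when the whole string matches the LIKE pattern."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(like_to_regex(pattern, escape), text, flags) is not None


print(match_like("user@example.com", "%@example.___"))
print(match_like("100%", "100\\%", escape="\\"))
```

The match is anchored at both ends, so a pattern without `%` must cover the entire string.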

### Set Membership Operations

Functions for testing string membership in collections and finding positions within value sets.

```python { .api }
def str.is_in(array, values):
    """
    Test if strings are in specified collection of values.

    Parameters:
    - array: Array of strings to test
    - values: Array or sequence of strings to test membership against

    Returns:
    Array of booleans indicating if strings are in value set
    """

def str.index_in(array, values):
    """
    Find index of strings in specified collection of values.

    Parameters:
    - array: Array of strings to find indices for
    - values: Array or sequence of strings to find indices in

    Returns:
    Array of integers representing index in values (-1 if not found)
    """
```
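
Both operations amount to a lookup against the value set. A minimal plain-Python sketch, assuming the first occurrence wins when `values` contains duplicates (an assumption, not something this document states):

```python
values = ["red", "green", "blue", "red"]

# Map each value to the index of its first occurrence.
position = {}
for index, value in enumerate(values):
    position.setdefault(value, index)

colors = ["blue", "red", "purple"]
is_in = [c in position for c in colors]
index_in = [position.get(c, -1) for c in colors]  # -1 marks "not found"

print(is_in, index_in)
```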

### String Replacement and Modification

Functions for replacing and modifying string content using literal patterns, regular expressions, or slice operations.

```python { .api }
def str.replace_substring(array, pattern, replacement, max_replacements=None):
    """
    Replace occurrences of substring with replacement string.

    Parameters:
    - array: Array of strings to modify
    - pattern: str, substring pattern to replace
    - replacement: str, replacement string
    - max_replacements: int, maximum number of replacements per string (None for all)

    Returns:
    Array with substring occurrences replaced
    """

def str.replace_substring_regex(array, pattern, replacement, max_replacements=None):
    """
    Replace regex matches with replacement string.

    Parameters:
    - array: Array of strings to modify
    - pattern: str, regular expression pattern to replace
    - replacement: str, replacement string (can include capture groups)
    - max_replacements: int, maximum number of replacements per string (None for all)

    Returns:
    Array with regex matches replaced
    """

def str.replace_slice(array, start, stop, replacement):
    """
    Replace string slice with replacement string.

    Parameters:
    - array: Array of strings to modify
    - start: int, start index of slice to replace
    - stop: int, stop index of slice to replace
    - replacement: str, replacement string

    Returns:
    Array with string slices replaced
    """

def str.repeat(array, repeats):
    """
    Repeat each string specified number of times.

    Parameters:
    - array: Array of strings to repeat
    - repeats: int or Array of ints, number of repetitions for each string

    Returns:
    Array with strings repeated
    """
```
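
The per-string effect of these operations can be sketched with the Python standard library. This is an illustration of the semantics described above, not library code: `replace_slice` is a hypothetical helper, and the `\2-\1` capture-group syntax is Python's `re` syntax, which may not match the regex engine the library uses.

```python
import re

samples = ["a-b-c", "2024-01-15"]

replace_all = [s.replace("-", "/") for s in samples]       # every occurrence
replace_first = [s.replace("-", "/", 1) for s in samples]  # at most one replacement

# Regex replacement that reuses capture groups, limited to the first match.
swapped = [re.sub(r"(\d+)-(\d+)", r"\2-\1", s, count=1) for s in samples]


def replace_slice(text, start, stop, replacement):
    """Replace text[start:stop] with replacement."""
    return text[:start] + replacement + text[stop:]


sliced = [replace_slice(s, 0, 1, "X") for s in samples]
repeated = [s * 2 for s in ["ab", "c"]]

print(replace_all, replace_first, swapped, sliced, repeated)
```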

### String Extraction and Slicing

Functions for extracting parts of strings using position-based slicing or pattern-based extraction.

```python { .api }
def str.slice(array, start=0, stop=None, step=1):
    """
    Extract substring using slice notation.

    Parameters:
    - array: Array of strings to slice
    - start: int, start index (default 0)
    - stop: int, stop index (None for end of string)
    - step: int, step size (default 1)

    Returns:
    Array containing extracted substrings
    """

def str.extract_regex(array, pattern, flags=0):
    """
    Extract regex capture groups from strings.

    Parameters:
    - array: Array of strings to extract from
    - pattern: str, regular expression with capture groups
    - flags: int, regex flags (e.g., re.IGNORECASE)

    Returns:
    Array of tuples/records containing captured groups (None if no match)
    """
```
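
As a rough picture of what "records containing captured groups" can look like, the standard-library sketch below slices each string and then extracts named groups into a dictionary, with `None` for strings that do not match. The group names (`area`, `prefix`, `line`) and the `extract` helper are made up for this illustration.

```python
import re

samples = ["Phone: 123-456-7890", "No phone"]

# Position-based slicing: the first five characters of each string.
prefixes = [s[0:5] for s in samples]

# Pattern-based extraction: one record per string, None when there is no match.
pattern = re.compile(r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})")


def extract(text):
    """Return a dict of named capture groups, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.groupdict() if match else None


records = [extract(s) for s in samples]
print(prefixes, records)
```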

### String Splitting and Joining

Functions for splitting strings into components and joining string arrays into single strings.

```python { .api }
def str.split_whitespace(array, max_splits=None):
    """
    Split strings on whitespace characters.

    Parameters:
    - array: Array of strings to split
    - max_splits: int, maximum number of splits per string (None for unlimited)

    Returns:
    Array of lists containing string components
    """

def str.split_pattern(array, pattern, max_splits=None):
    """
    Split strings on literal pattern.

    Parameters:
    - array: Array of strings to split
    - pattern: str, literal pattern to split on
    - max_splits: int, maximum number of splits per string (None for unlimited)

    Returns:
    Array of lists containing string components
    """

def str.split_pattern_regex(array, pattern, max_splits=None, flags=0):
    """
    Split strings using regular expression pattern.

    Parameters:
    - array: Array of strings to split
    - pattern: str, regular expression pattern to split on
    - max_splits: int, maximum number of splits per string (None for unlimited)
    - flags: int, regex flags (e.g., re.IGNORECASE)

    Returns:
    Array of lists containing string components
    """

def str.join(array, separator):
    """
    Join arrays of strings using separator.

    Parameters:
    - array: Array of string lists to join
    - separator: str, separator to use between elements
    
    Returns:
    Array of strings created by joining list elements
    """

def str.join_element_wise(array, separator):
    """
    Join corresponding elements from multiple string arrays.

    Parameters:
    - array: Array of string lists where each inner list contains strings to join
    - separator: str, separator to use between elements

    Returns:
    Array of strings created by joining corresponding elements
    """
```
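
Splitting adds one level of nesting (each string becomes a list of parts) and joining removes it again. The standard-library sketch below shows the per-string behavior, including the effect of a split limit. Whether the library collapses runs of whitespace the way Python's `str.split()` does is an assumption here.

```python
import re

samples = ["a b  c", "x,y;z"]

whitespace = [s.split() for s in samples]       # split on runs of whitespace
limited = [s.split(None, 1) for s in samples]   # at most one split per string
literal = [s.split(",") for s in samples]       # split on a literal pattern
regex = [re.split(r"[,;]", s) for s in samples] # split on a regex pattern

# Joining reverses the nesting: one string per inner list.
joined = ["-".join(parts) for parts in regex]

print(whitespace, limited, literal, regex, joined)
```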

### Categorical String Operations

Functions for working with categorical string data, enabling memory-efficient storage and processing of repeated string values.

```python { .api }
def str.to_categorical(array):
    """
    Convert string array to categorical representation.

    Parameters:
    - array: Array of strings to convert

    Returns:
    Array with categorical representation (indices + categories)
    """
```
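
The "indices + categories" representation is a form of dictionary encoding: each distinct string is stored once, and the array itself holds small integers pointing into that list. A minimal plain-Python sketch of the idea follows; this `to_categorical` is a hypothetical stand-in, and the order in which the library assigns categories is an assumption.

```python
def to_categorical(values):
    """Dictionary-encode a list of strings into (indices, categories)."""
    categories = []  # each distinct string, in order of first appearance
    lookup = {}      # string -> position in categories
    indices = []
    for value in values:
        if value not in lookup:
            lookup[value] = len(categories)
            categories.append(value)
        indices.append(lookup[value])
    return indices, categories


colors = ["red", "blue", "red", "green", "blue", "red"]
indices, categories = to_categorical(colors)
decoded = [categories[i] for i in indices]  # round-trips to the original strings
print(indices, categories)
```

The saving comes from storing six small integers plus three strings instead of six strings, and it grows with the amount of repetition.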

## Usage Examples

### Basic String Operations

```python
import awkward as ak

# Create array of strings
names = ak.Array(["alice", "bob", "CHARLIE", "diana"])

# Case transformations
upper_names = ak.str.upper(names)  # ["ALICE", "BOB", "CHARLIE", "DIANA"]
lower_names = ak.str.lower(names)  # ["alice", "bob", "charlie", "diana"]
title_names = ak.str.title(names)  # ["Alice", "Bob", "Charlie", "Diana"]

# String properties
lengths = ak.str.length(names)     # [5, 3, 7, 5]
is_upper = ak.str.is_upper(names)  # [False, False, True, False]
```

### String Filtering and Matching

```python
import awkward as ak

emails = ak.Array(["user@example.com", "admin@site.org", "test@example.com"])

# Pattern matching
has_example = ak.str.match_substring(emails, "example")  # [True, False, True]
starts_admin = ak.str.starts_with(emails, "admin")       # [False, True, False]
ends_com = ak.str.ends_with(emails, ".com")              # [True, False, True]

# Filter based on pattern
example_emails = emails[has_example]  # ["user@example.com", "test@example.com"]
```

### String Transformations

```python
import awkward as ak

# Nested string arrays
data = ak.Array([["hello world", "test"], ["python", "awkward array"]])

# Split strings
split_data = ak.str.split_whitespace(data)
# [[["hello", "world"], ["test"]], [["python"], ["awkward", "array"]]]

# Replace patterns
cleaned = ak.str.replace_substring(data, "test", "demo")
# [["hello world", "demo"], ["python", "awkward array"]]

# Extract parts
first_words = ak.str.split_whitespace(data)[:, :, 0]
# [["hello", "test"], ["python", "awkward"]]
```

### String Padding and Formatting

```python
import awkward as ak

numbers = ak.Array(["1", "22", "333"])

# Pad strings
left_padded = ak.str.lpad(numbers, 5, "0")  # ["00001", "00022", "00333"]
centered = ak.str.center(numbers, 5, "*")   # ["**1**", "*22**", "*333*"]

# Trim whitespace
messy = ak.Array([" hello ", " world ", "test"])
clean = ak.str.trim_whitespace(messy)  # ["hello", "world", "test"]
```

### Regular Expression Operations

```python
import awkward as ak

text = ak.Array(["Phone: 123-456-7890", "Call me at 555-123-4567", "No phone"])

# Extract phone numbers; the named groups give the extracted fields readable names
phone_pattern = r'(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})'
matches = ak.str.extract_regex(text, phone_pattern)

# Count pattern occurrences
digit_count = ak.str.count_substring_regex(text, r'\d')  # [10, 10, 0]

# Boolean matching
has_phone = ak.str.match_substring_regex(text, phone_pattern)  # [True, True, False]
```

### Advanced String Processing

```python
import awkward as ak

# String arrays with missing values
data = ak.Array([["alice", "bob"], None, ["charlie"]])

# Operations handle None gracefully
upper_data = ak.str.upper(data)  # [["ALICE", "BOB"], None, ["CHARLIE"]]

# Join string lists
sentences = ak.Array([["hello", "world"], ["python", "is", "great"]])
joined = ak.str.join(sentences, " ")  # ["hello world", "python is great"]

# Categorical conversion for memory efficiency
categories = ak.Array(["red", "blue", "red", "green", "blue", "red"])
categorical = ak.str.to_categorical(categories)  # More memory efficient
```