# String Operations
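
Functions for processing, slicing, and pattern-matching string columns in a datatable Frame. String functions live in the `dt.str` namespace and regular-expression functions in `dt.re`.
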
## Capabilities
### String Functions
```python { .api }
def str.len(x):
    """
    String length function.

    Parameters:
    - x: String column expression

    Returns:
    Integer column with the length of each string
    """

def str.slice(x, start, stop=None):
    """
    String slicing function.

    Parameters:
    - x: String column expression
    - start: Starting index
    - stop: Ending index, exclusive (optional)

    Returns:
    String column with sliced strings
    """

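def str.split_into_nhot(frame, sep=",", sort=False):
    """
    Split strings on a separator and n-hot encode the resulting labels.

    Parameters:
    - frame: Single-column Frame of strings
    - sep: Single-character separator used to split each string
    - sort: If True, output columns are ordered alphabetically by label

    Returns:
    Frame with one boolean column per distinct label
    """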
```
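The n-hot encoding performed by `str.split_into_nhot` is easiest to see in plain Python. The sketch below is a model, not datatable's implementation: `split_into_nhot` here is a local helper, a dict of lists stands in for the output Frame, and the whitespace trimming, dropping of empty pieces, treatment of missing values, and sorted label order are all assumptions.

```python
def split_into_nhot(values, sep=","):
    """Plain-Python model of n-hot encoding; not datatable's implementation.

    Each string is split on `sep`, surrounding whitespace is trimmed from
    every piece, and each distinct label becomes one boolean column that is
    True in the rows whose string contained that label.
    """
    row_labels = []
    for value in values:
        if value is None:
            # Assumption: a missing value carries no labels.
            row_labels.append(set())
        else:
            pieces = (piece.strip() for piece in value.split(sep))
            # Assumption: empty pieces (e.g. from "a,,b") are dropped.
            row_labels.append({p for p in pieces if p})

    # Sorted here only so the column order is deterministic.
    labels = sorted(set().union(*row_labels))

    # A dict of label -> list of booleans stands in for the output Frame.
    return {label: [label in labels_in_row for labels_in_row in row_labels]
            for label in labels}


encoded = split_into_nhot(["red, green", "green", "blue,red"])
print(encoded)
# {'blue': [False, False, True], 'green': [True, True, False], 'red': [True, False, True]}
```

Each distinct label becomes its own column, so a single row can have several `True` entries; that is what distinguishes n-hot from one-hot encoding.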
### Regular Expression Functions
```python { .api }
def re.match(x, pattern):
    """
    Regular expression matching.

    Parameters:
    - x: String column expression
    - pattern: Regular expression pattern

    Returns:
    Boolean column indicating matches
    """
```
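A row-wise model of `re.match` using Python's standard `re` module. `match_column` is a hypothetical helper, and both whole-string matching (`re.fullmatch`) and keeping missing values missing are assumptions to verify against the datatable documentation for your version.

```python
import re


def match_column(values, pattern):
    """Plain-Python model of a row-wise regex test over a string column.

    Assumptions: the pattern must match the whole string (re.fullmatch),
    and missing values (None) stay missing instead of becoming False.
    """
    compiled = re.compile(pattern)  # compile once, reuse for every row
    return [None if v is None else compiled.fullmatch(v) is not None
            for v in values]


codes = ["ABC-123", "DEF-456", "abc-789", "ABC-1234", None]
print(match_column(codes, r"[A-Z]{3}-\d{3}"))
# [True, True, False, False, None]
```

Under the whole-string assumption, `"ABC-1234"` fails because of the trailing fourth digit; with prefix matching (Python's `re.match`) it would pass, so the choice matters for validation patterns like this one.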
## Examples
```python
import datatable as dt
from datatable import f

DT = dt.Frame({
    'text': ['hello', 'world', 'datatable', 'python'],
    'codes': ['ABC-123', 'DEF-456', 'GHI-789', 'JKL-012']
})

# Add derived string columns; dt.update() modifies DT in place
DT[:, dt.update(
    text_length=dt.str.len(f.text),
    first_3_chars=dt.str.slice(f.text, 0, 3),
    last_2_chars=dt.str.slice(f.text, -2),
    matches_pattern=dt.re.match(f.codes, r'[A-Z]{3}-\d{3}')
)]
print(DT)
```
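To check what the example should produce, the same derived columns can be computed with ordinary Python string operations. This assumes `str.slice` follows Python slice semantics (a negative start counts from the end) and that `re.match` tests the whole string; it is a cross-check, not datatable code.

```python
import re

# Same data as the datatable example above.
text = ['hello', 'world', 'datatable', 'python']
codes = ['ABC-123', 'DEF-456', 'GHI-789', 'JKL-012']

# Plain-Python equivalents of each derived column, row by row.
text_length = [len(s) for s in text]
first_3_chars = [s[0:3] for s in text]
last_2_chars = [s[-2:] for s in text]
matches_pattern = [re.fullmatch(r'[A-Z]{3}-\d{3}', c) is not None for c in codes]

for row in zip(text, text_length, first_3_chars, last_2_chars, matches_pattern):
    print(row)
```

Under those assumptions, `last_2_chars` for `'datatable'` is `'le'`, and every value in `codes` matches the pattern.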