# Text Processing

Text processing utilities for natural language processing tasks, including name normalization and tokenization.

## Capabilities

### Name Processing

Utilities for processing and normalizing person names.

```python { .api }
def generalize_names(name):
    """
    Generalize a person name for consistency,
    e.g. "Pozo, José Ángel" -> "pozo j".

    Parameters:
    - name: str, person name to generalize

    Returns:
    - generalized_name: str, normalized name
    """

def generalize_names_duplcheck(df, col_name):
    """
    Generalize names in a DataFrame column, with duplicate
    checking and removal.

    Parameters:
    - df: pandas.DataFrame, DataFrame containing a column of person names
    - col_name: str, name of the column to generalize

    Returns:
    - df_new: pandas.DataFrame, copy of df with generalized names and
      duplicate rows removed
    """
```
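
For intuition, the kind of transformation `generalize_names` performs can be sketched in a few lines of plain Python. The helper `simple_generalize` below and its exact rules (accent stripping, "last f" output) are illustrative assumptions, not mlxtend's implementation:

```python
import re
import unicodedata

def simple_generalize(name):
    # Hypothetical helper, not mlxtend's implementation:
    # normalize a "Last, First" or "First Last" name to "last f" form.
    # Strip accents (e.g. "José" -> "Jose"), then lowercase.
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    name = name.lower()
    # Keep only letters, commas, and whitespace.
    name = re.sub(r"[^a-z,\s]", "", name)
    if "," in name:
        last, _, first = name.partition(",")
    else:
        parts = name.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    first_names = first.split()
    initial = first_names[0][0] if first_names else ""
    return f"{last.strip()} {initial}".strip()
```

In this sketch, `simple_generalize("Pozo, José Ángel")` and `simple_generalize("José Ángel Pozo")` both reduce to `"pozo j"`, which is what makes generalized names comparable for duplicate detection.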

### Text Tokenization

Tokenization utilities for text processing, including emoticon handling.

```python { .api }
def tokenizer_words_and_emoticons(text):
    """
    Tokenize text into words and emoticons.

    Parameters:
    - text: str, input text to tokenize

    Returns:
    - tokens: list, list of word and emoticon tokens
    """

def tokenizer_emoticons(text):
    """
    Extract emoticons from text.

    Parameters:
    - text: str, input text

    Returns:
    - emoticons: list, list of emoticon tokens found in text
    """
```
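
The tokenization behavior can be approximated with a short regex sketch; the emoticon pattern and helper names below are illustrative assumptions, not mlxtend's actual implementation:

```python
import re

# Hypothetical pattern covering common emoticons such as :), :-(, ;D, =P.
EMOTICON_RE = r"[:;=]-?[()DPp]"

def sketch_words_and_emoticons(text):
    # Collect emoticons first, then tokenize the rest into lowercase words.
    emoticons = re.findall(EMOTICON_RE, text)
    remainder = re.sub(EMOTICON_RE, " ", text)
    words = re.findall(r"[a-z0-9']+", remainder.lower())
    return words + emoticons

def sketch_emoticons(text):
    # Return only the emoticon tokens.
    return re.findall(EMOTICON_RE, text)
```

In this sketch, words come back lowercased with punctuation stripped, while emoticons are preserved verbatim and appended after the word tokens.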

## Usage Examples

```python
import pandas as pd

from mlxtend.text import generalize_names, generalize_names_duplcheck
from mlxtend.text import tokenizer_words_and_emoticons, tokenizer_emoticons

# Name processing example
name = "Pozo, José Ángel"
normalized = generalize_names(name)
print(f"Original: {name}")
print(f"Normalized: {normalized}")  # 'pozo j'

# Duplicate name handling on a DataFrame column
df = pd.DataFrame({"name": ["Smith, John", "Smith, J.", "Smith, John", "Doe, Jane"]})
df_unique = generalize_names_duplcheck(df, col_name="name")
print(df_unique)

# Text tokenization with emoticons
text = "I love machine learning! :) It's so cool :D"
tokens = tokenizer_words_and_emoticons(text)
emoticons = tokenizer_emoticons(text)

print(f"Text: {text}")
print(f"All tokens: {tokens}")
print(f"Emoticons only: {emoticons}")
```