Tessl Tile for pypi/azure-ai-translation-text@1.0.0

# Sentence Boundary Detection

Identify sentence boundaries in text, with optional automatic language detection and script-specific processing. The service determines where each sentence begins and ends and returns per-sentence lengths that callers can use to segment the input.

## Capabilities

### Find Sentence Boundaries

Analyzes input text and returns the length of each detected sentence. When `language` is omitted, the language is auto-detected.

```python { .api }
def find_sentence_boundaries(
    body: Union[List[str], List[InputTextItem], IO[bytes]],
    *,
    client_trace_id: Optional[str] = None,
    language: Optional[str] = None,
    script: Optional[str] = None,
    **kwargs: Any
) -> List[BreakSentenceItem]
```

**Parameters:**
- `body`: Text to analyze (strings, `InputTextItem` objects, or binary data)
- `client_trace_id`: Client-generated GUID for request tracking
- `language`: Language code for the text (auto-detected if omitted)
- `script`: Script identifier for the text (default script assumed if omitted)

**Returns:** A list of `BreakSentenceItem` results, one per input text

### Usage Examples

```python
from azure.ai.translation.text import TextTranslationClient
from azure.core.credentials import AzureKeyCredential

client = TextTranslationClient(
    credential=AzureKeyCredential("your-api-key"),
    region="your-region"
)

# Basic sentence boundary detection with auto-detection
response = client.find_sentence_boundaries(
    body=["The answer lies in machine translation. This is a test. How are you?"]
)

result = response[0]
print(f"Detected language: {result.detected_language.language}")
print(f"Detection confidence: {result.detected_language.score}")
print(f"Sentence lengths: {result.sent_len}")
# Example output: [40, 16, 12] (character counts; each length includes the trailing space)

# Multi-text analysis
multi_response = client.find_sentence_boundaries(
    body=[
        "First text with multiple sentences. This is sentence two.",
        "Second text. Also has multiple parts. Three sentences total."
    ]
)

for i, result in enumerate(multi_response):
    print(f"\nText {i+1}:")
    print(f"  Language: {result.detected_language.language}")
    print(f"  Sentence lengths: {result.sent_len}")

# Specify language and script explicitly
explicit_response = client.find_sentence_boundaries(
    body=["¡Hola mundo! ¿Cómo estás hoy? Me alegro de verte."],
    language="es",
    script="Latn"
)

# Complex punctuation handling
complex_response = client.find_sentence_boundaries(
    body=["Dr. Smith went to the U.S.A. yesterday. He said 'Hello!' to everyone."]
)

# Mixed language content (relies on auto-detection)
mixed_response = client.find_sentence_boundaries(
    body=["English sentence. Sentence en français. Back to English."]
)
```
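
Because `sent_len` holds lengths rather than offsets, recovering the sentence text means slicing the original string cumulatively. The following is a minimal sketch of that step, not part of the SDK: `split_by_lengths` is a hypothetical helper, it assumes the lengths count Python string characters and sum to no more than the length of the text, and the lengths in the example are computed by hand rather than returned by the service.

```python
from itertools import accumulate
from typing import List


def split_by_lengths(text: str, lengths: List[int]) -> List[str]:
    """Slice `text` into consecutive pieces using per-sentence lengths."""
    ends = list(accumulate(lengths))  # cumulative end offset of each sentence
    if ends and ends[-1] > len(text):
        raise ValueError("lengths exceed the length of the text")
    starts = [0] + ends[:-1]  # each sentence starts where the previous one ended
    return [text[start:end] for start, end in zip(starts, ends)]


text = "How are you? I am fine."
# Hand-computed lengths for illustration; in practice pass result.sent_len.
sentences = split_by_lengths(text, [13, 10])
print(sentences)
```

Each piece keeps any trailing whitespace covered by its length, so call `.strip()` on the pieces if only the sentence text is needed. Whether the service counts characters the same way Python does for text outside the Basic Multilingual Plane is not verified here.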

## Input Types

### Text Input Models

```python { .api }
class InputTextItem:
    text: str  # Text content to analyze for sentence boundaries
```

## Response Types

### Sentence Boundary Results

```python { .api }
class BreakSentenceItem:
    sent_len: List[int]  # Character lengths of each detected sentence
    detected_language: Optional[DetectedLanguage]  # Auto-detected language info
```

### Language Detection Information

```python { .api }
class DetectedLanguage:
    language: str  # Detected language code (ISO 639-1/639-3)
    score: float   # Detection confidence score (0.0 to 1.0)
```

## Sentence Segmentation Rules

The service applies language-specific and script-specific rules for sentence boundary detection:

### General Rules
- Periods, exclamation marks, and question marks typically end sentences
- Abbreviations (Dr., Mr., U.S.A.) are handled contextually
- Quotation marks and parentheses are considered in boundary detection
- Multiple consecutive punctuation marks are processed appropriately

### Language-Specific Processing
- **English**: Handles abbreviations, contractions, and decimal numbers
- **Spanish**: Processes inverted punctuation marks (¡¿)
- **Chinese/Japanese**: Recognizes full-width punctuation (。！？)
- **Arabic**: Handles right-to-left text directionality
- **German**: Manages compound words and capitalization rules

### Script Considerations
- **Latin scripts**: Standard punctuation processing
- **CJK scripts**: Full-width punctuation mark recognition
- **Arabic script**: Right-to-left text flow handling
- **Devanagari**: Script-specific sentence ending markers

## Integration with Translation

Translation requests run sentence boundary detection automatically when `include_sentence_length=True` is passed:

```python
# Translation with automatic sentence boundary detection
translation_response = client.translate(
    body=["First sentence. Second sentence. Third sentence."],
    to_language=["es"],
    include_sentence_length=True
)

translation = translation_response[0].translations[0]
if translation.sent_len:
    print(f"Source sentence lengths: {translation.sent_len.src_sent_len}")
    print(f"Target sentence lengths: {translation.sent_len.trans_sent_len}")
```

## Error Handling

```python
from azure.core.exceptions import HttpResponseError

try:
    response = client.find_sentence_boundaries(
        body=["Text to analyze"],
        language="invalid-code"  # Invalid language code
    )
except HttpResponseError as error:
    if error.error:
        print(f"Error Code: {error.error.code}")
        print(f"Message: {error.error.message}")
```
