tessl/pypi-azure-ai-translation-text

Azure Text Translation client library for Python, providing neural machine translation for fast, accurate, real-time text translation across all supported languages.

docs/sentence-boundaries.md

Sentence Boundary Detection

Identify sentence boundaries in text with automatic language detection and script-specific processing. This service determines where sentences begin and end in input text, providing length information for proper text segmentation and analysis.

Capabilities

Find Sentence Boundaries

Analyzes input text to identify sentence boundaries and returns length information for each detected sentence with optional language detection.

def find_sentence_boundaries(
    body: Union[List[str], List[InputTextItem], IO[bytes]],
    *,
    client_trace_id: Optional[str] = None,
    language: Optional[str] = None,
    script: Optional[str] = None,
    **kwargs: Any
) -> List[BreakSentenceItem]

Parameters:

  • body: Text to analyze (strings, InputTextItem objects, or binary data)
  • client_trace_id: Client-generated GUID for request tracking
  • language: Language code for the text (auto-detected if omitted)
  • script: Script identifier for the text (default script assumed if omitted)

Returns: List of sentence boundary analysis results

Usage Examples

from azure.ai.translation.text import TextTranslationClient
from azure.core.credentials import AzureKeyCredential

client = TextTranslationClient(
    credential=AzureKeyCredential("your-api-key"),
    region="your-region"
)

# Basic sentence boundary detection with auto-detection
response = client.find_sentence_boundaries(
    body=["The answer lies in machine translation. This is a test. How are you?"]
)

result = response[0]
if result.detected_language:
    print(f"Detected language: {result.detected_language.language}")
    print(f"Detection confidence: {result.detected_language.score}")
print(f"Sentence lengths: {result.sent_len}")
# sent_len lists the character count of each detected sentence;
# each count includes the whitespace that follows the sentence

# Multi-text analysis
multi_response = client.find_sentence_boundaries(
    body=[
        "First text with multiple sentences. This is sentence two.",
        "Second text. Also has multiple parts. Three sentences total."
    ]
)

for i, result in enumerate(multi_response):
    print(f"\nText {i+1}:")
    print(f"  Language: {result.detected_language.language}")
    print(f"  Sentence lengths: {result.sent_len}")

# Specify language and script explicitly
explicit_response = client.find_sentence_boundaries(
    body=["¡Hola mundo! ¿Cómo estás hoy? Me alegro de verte."],
    language="es",
    script="Latn"
)

# Complex punctuation handling
complex_response = client.find_sentence_boundaries(
    body=["Dr. Smith went to the U.S.A. yesterday. He said 'Hello!' to everyone."]
)

# Mixed language content (relies on auto-detection)
mixed_response = client.find_sentence_boundaries(
    body=["English sentence. Sentence en français. Back to English."]
)

Input Types

Text Input Models

class InputTextItem:
    text: str  # Text content to analyze for sentence boundaries

Response Types

Sentence Boundary Results

class BreakSentenceItem:
    sent_len: List[int]  # Character lengths of each detected sentence
    detected_language: Optional[DetectedLanguage]  # Auto-detected language info

Language Detection Information

class DetectedLanguage:
    language: str  # Detected language code (ISO 639-1/639-3)
    score: float   # Detection confidence score (0.0 to 1.0)
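
Because sent_len reports only lengths, recovering the sentence strings themselves is a simple local computation. A minimal sketch (split_by_sent_len is a helper defined here, not part of the SDK):

```python
def split_by_sent_len(text: str, sent_len: list) -> list:
    """Slice text into its sentences using the character lengths
    from BreakSentenceItem.sent_len. Each count includes the
    whitespace that follows the sentence, so the slices partition
    the input exactly."""
    sentences, offset = [], 0
    for length in sent_len:
        sentences.append(text[offset:offset + length])
        offset += length
    return sentences

text = "This is a test. How are you?"
print(split_by_sent_len(text, [16, 12]))
# → ['This is a test. ', 'How are you?']
```

The length values here are hard-coded for illustration; in practice they come from the service response.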

Sentence Segmentation Rules

The service applies language-specific and script-specific rules for sentence boundary detection:

General Rules

  • Periods, exclamation marks, and question marks typically end sentences
  • Abbreviations (Dr., Mr., U.S.A.) are handled contextually
  • Quotation marks and parentheses are considered in boundary detection
  • Multiple consecutive punctuation marks are processed appropriately
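
To see why contextual handling matters, compare with a naive regex splitter (a local illustration only, not the service's algorithm):

```python
import re

def naive_split(text: str) -> list:
    # Split after . ! ? when followed by whitespace and a sentence
    # start (upper-case letter or inverted punctuation). Deliberately
    # naive: it would wrongly break on abbreviations like "Dr." that
    # the service resolves contextually.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z¡¿])", text)

print(naive_split("This is a test. How are you? Fine!"))
# → ['This is a test.', 'How are you?', 'Fine!']
```

Fed "Dr. Smith went to the U.S.A. yesterday.", this splitter produces spurious breaks after "Dr." and "U.S.A.", which is exactly the failure mode the service's contextual rules avoid.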

Language-Specific Processing

  • English: Handles abbreviations, contractions, and decimal numbers
  • Spanish: Processes inverted punctuation marks (¡¿)
  • Chinese/Japanese: Recognizes full-width punctuation (。！？)
  • Arabic: Handles right-to-left text directionality
  • German: Manages compound words and capitalization rules
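
The CJK case differs from Latin scripts in that full-width terminators end a sentence with no following space. A naive local sketch of that rule (illustration only; the service's segmentation is more sophisticated):

```python
import re

def naive_cjk_split(text: str) -> list:
    # Split immediately after a full-width terminator 。！？,
    # keeping the terminator with its sentence. No whitespace
    # is required between CJK sentences.
    return [p for p in re.split(r"(?<=[。！？])", text) if p]

print(naive_cjk_split("你好。今天怎么样？很好！"))
# → ['你好。', '今天怎么样？', '很好！']
```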

Script Considerations

  • Latin scripts: Standard punctuation processing
  • CJK scripts: Full-width punctuation mark recognition
  • Arabic script: Right-to-left text flow handling
  • Devanagari: Script-specific sentence ending markers

Integration with Translation

Sentence boundary detection is automatically used when include_sentence_length=True in translation requests:

# Translation with automatic sentence boundary detection
translation_response = client.translate(
    body=["First sentence. Second sentence. Third sentence."],
    to_language=["es"],
    include_sentence_length=True
)

translation = translation_response[0].translations[0]
if translation.sent_len:
    print(f"Source sentence lengths: {translation.sent_len.src_sent_len}")
    print(f"Target sentence lengths: {translation.sent_len.trans_sent_len}")
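
Since src_sent_len and trans_sent_len are parallel lists, the source and translated texts can be sliced and zipped into aligned sentence pairs. A sketch (pair_sentences is a helper defined here, and the length values are made up for illustration):

```python
def pair_sentences(src_text, trans_text, src_sent_len, trans_sent_len):
    # Slice each text by its own length list, then zip the results
    # into (source sentence, translated sentence) pairs.
    def slices(text, lengths):
        out, pos = [], 0
        for n in lengths:
            out.append(text[pos:pos + n])
            pos += n
        return out
    return list(zip(slices(src_text, src_sent_len),
                    slices(trans_text, trans_sent_len)))

print(pair_sentences("Hi. Bye.", "Hola. Adiós.", [4, 4], [6, 6]))
# → [('Hi. ', 'Hola. '), ('Bye.', 'Adiós.')]
```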

Error Handling

from azure.core.exceptions import HttpResponseError

try:
    response = client.find_sentence_boundaries(
        body=["Text to analyze"],
        language="invalid-code"  # Invalid language code
    )
except HttpResponseError as error:
    if error.error:
        print(f"Error Code: {error.error.code}")
        print(f"Message: {error.error.message}")

Install with Tessl CLI

npx tessl i tessl/pypi-azure-ai-translation-text
