# Types-chardet

Type stubs for chardet, a universal character encoding detection library for Python. This package provides type annotations so that static type checkers can verify code that uses chardet to detect the character encoding of text data from unknown sources.

## Package Information

- **Package Name**: types-chardet
- **Language**: Python
- **Installation**: `pip install types-chardet`
- **Requires**: chardet library (`pip install chardet`)

## Core Imports

```python
from chardet import UniversalDetector
```

The stubs also declare a module-level `__getattr__`, so attributes they do not define statically, such as `detect`, still resolve for type checkers:

```python
import chardet

# Resolved through the stubs' __getattr__ (typed as Any); the function exists at runtime
detect_func = chardet.detect
```

Typing imports that are useful when annotating code that handles chardet results:

```python
from typing import Dict, Any, Optional, List, Tuple
from typing_extensions import TypedDict
```

## Basic Usage

```python
import chardet
from chardet import UniversalDetector

# Simple detection using detect() (resolved via __getattr__ in the stubs)
result = chardet.detect(b'Hello World')
# result: {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

# Incremental detection with UniversalDetector: stop reading once it is confident
detector = UniversalDetector()
with open('unknown_encoding.txt', 'rb') as f:
    for line in f:
        detector.feed(line)
        if detector.done:
            break
result = detector.close()
# Example result (actual values depend on the file):
# {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
```

## Capabilities

### Character Encoding Detection

A one-shot function that analyzes a byte string and reports its most likely character encoding.

```python { .api }
def detect(byte_str: bytes) -> _FinalResultType:
    """
    Detect the character encoding of the given bytes.

    Parameters:
    - byte_str: bytes, data to analyze for encoding detection

    Returns:
    Dict with 'encoding', 'confidence', and 'language' keys
    """
```
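
The result dictionary still needs interpreting before it is used. The sketch below is a hypothetical helper, not part of chardet or the stubs: it assumes a `None` encoding (which chardet can report at runtime, for example for empty input) or a confidence below an arbitrary 0.5 cutoff means the detection should not be trusted, and falls back to a default codec.

```python
from typing import Optional


def decode_with_fallback(
    data: bytes,
    encoding: Optional[str],
    confidence: float,
    fallback: str = "utf-8",
    min_confidence: float = 0.5,  # hypothetical cutoff; tune for your data
) -> str:
    """Decode bytes using a detected encoding, falling back when detection is unusable."""
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
    return data.decode(encoding, errors="replace")
```

With chardet installed, a typical call is `decode_with_fallback(raw, result["encoding"], result["confidence"])` where `result = chardet.detect(raw)`. Passing the two fields separately keeps the helper independent of the stubs' private result type.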

### Dynamic Module Access

The stubs declare a module-level `__getattr__`, so chardet attributes that are not defined statically in the stubs, such as `detect()`, are still accepted by type checkers. Because the hook returns `Any`, the arguments and return values of those attributes are not checked.

```python { .api }
def __getattr__(name: str) -> Any:
    """
    Provide dynamic access to any chardet module attribute.

    This mechanism enables access to runtime functions like detect() and
    other chardet functionality that is available at runtime but not
    statically defined in the type stubs.

    Parameters:
    - name: str, attribute name to access

    Returns:
    Any attribute from the chardet runtime module
    """
```
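
The stub hook mirrors the runtime mechanism from PEP 562 (Python 3.7+): a module-level `__getattr__` is consulted only when normal attribute lookup fails. This stdlib-only sketch does not import chardet; the module `demo_module` and the `_fallback` function are hypothetical and exist only to show the lookup order.

```python
import types
from typing import Any


def _fallback(name: str) -> Any:
    """Module-level __getattr__: only called when normal attribute lookup fails."""
    if name.startswith("_"):
        # Let private/dunder probes (e.g. hasattr(mod, "__path__")) fail normally
        raise AttributeError(f"module 'demo_module' has no attribute {name!r}")
    return f"<dynamic {name}>"


demo_module = types.ModuleType("demo_module")  # hypothetical stand-in module
demo_module.declared = "static value"            # found by normal lookup
demo_module.__getattr__ = _fallback              # stored in the module __dict__ (PEP 562)

print(demo_module.declared)  # static value
print(demo_module.detect)    # <dynamic detect>
```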

### Universal Detection

The `UniversalDetector` class provides the core detection functionality. It supports incremental detection (call `feed()` repeatedly, stop early once `done` is true, then call `close()`) and can restrict the candidate languages through `lang_filter`.

```python { .api }
class UniversalDetector:
    MINIMUM_THRESHOLD: float
    HIGH_BYTE_DETECTOR: Pattern[bytes]
    ESC_DETECTOR: Pattern[bytes]
    WIN_BYTE_DETECTOR: Pattern[bytes]
    ISO_WIN_MAP: dict[str, str]

    result: _IntermediateResultType
    done: bool
    lang_filter: int
    logger: Logger

    def __init__(self, lang_filter: int = ...) -> None:
        """
        Initialize UniversalDetector.

        Parameters:
        - lang_filter: int, language filter built from LanguageFilter constants
          (defaults to LanguageFilter.ALL at runtime)
        """

    def reset(self) -> None:
        """Reset the detector to its initial state."""

    def feed(self, byte_str: bytes) -> None:
        """
        Feed bytes to the detector for analysis.

        Parameters:
        - byte_str: bytes, data to analyze
        """

    def close(self) -> _FinalResultType:
        """
        Stop detection and return the final result.

        Returns:
        Dict with 'encoding', 'confidence', and 'language' keys
        """
```
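
The `feed`/`done`/`close` cycle can be wrapped in a reusable helper. This is a sketch: the `Detector` protocol and `detect_chunks` function are hypothetical names, and the protocol only restates the `UniversalDetector` members listed above so that any object of the same shape (including a test double) can be passed in.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping, Protocol


class Detector(Protocol):
    """Structural subset of the stubbed UniversalDetector interface (hypothetical name)."""

    done: bool

    def feed(self, byte_str: bytes) -> None: ...

    def close(self) -> Mapping[str, Any]: ...


def detect_chunks(detector: Detector, chunks: Iterable[bytes]) -> Mapping[str, Any]:
    """Feed chunks until the detector reports it is done, then return close()'s result."""
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            break  # enough evidence; skip the remaining input
    return detector.close()
```

With chardet installed, a binary file object `f` can be scanned in fixed-size chunks with `detect_chunks(UniversalDetector(), iter(lambda: f.read(4096), b""))`. Typing the parameter structurally keeps the helper testable without a real detector.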

### Detection State Enums

Integer constants describing the detector's classification of its input (`InputState`), each prober's progress (`ProbingState`), and the states of the coding state machines (`MachineState`).

```python { .api }
class InputState:
    PURE_ASCII: int
    ESC_ASCII: int
    HIGH_BYTE: int

class ProbingState:
    DETECTING: int
    FOUND_IT: int
    NOT_ME: int

class MachineState:
    START: int
    ERROR: int
    ITS_ME: int
```
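
The three `InputState` values correspond to what the detector has seen in its input. The stdlib-only sketch below approximates that classification; it assumes chardet's `UniversalDetector` checks its `HIGH_BYTE_DETECTOR` pattern before its `ESC_DETECTOR` pattern, and the local constants, regexes, and `classify` function are stand-ins rather than chardet's own objects.

```python
import re

# Stand-in constants; assumed to mirror chardet.enums.InputState
PURE_ASCII, ESC_ASCII, HIGH_BYTE = 0, 1, 2

# Patterns modeled on UniversalDetector.HIGH_BYTE_DETECTOR / ESC_DETECTOR (assumed)
HIGH_BYTE_RE = re.compile(b"[\x80-\xff]")
ESC_RE = re.compile(b"(\x1b|~{)")


def classify(data: bytes) -> int:
    """Return a stand-in InputState value for data seen by a fresh detector."""
    if HIGH_BYTE_RE.search(data):
        return HIGH_BYTE  # any byte >= 0x80 rules out plain 7-bit text
    if ESC_RE.search(data):
        return ESC_ASCII  # escape-based encodings such as ISO-2022 or HZ
    return PURE_ASCII
```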

### Language Filtering

Bit-flag constants that restrict detection to specific language groups. Pass them, combined with `|` if needed, as the `lang_filter` argument of `UniversalDetector`.

```python { .api }
class LanguageFilter:
    CHINESE_SIMPLIFIED: int
    CHINESE_TRADITIONAL: int
    JAPANESE: int
    KOREAN: int
    NON_CJK: int
    ALL: int
    CHINESE: int
    CJK: int
```
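
Because the filters are bit flags, groups combine with `|` and membership is tested with `&`. The values below are assumed to match chardet's `LanguageFilter` (check `chardet.enums` in your installed version), and `includes` is a hypothetical helper.

```python
# Assumed values of chardet.enums.LanguageFilter
CHINESE_SIMPLIFIED = 0x01
CHINESE_TRADITIONAL = 0x02
JAPANESE = 0x04
KOREAN = 0x08
NON_CJK = 0x10
ALL = 0x1F
CHINESE = CHINESE_SIMPLIFIED | CHINESE_TRADITIONAL  # 0x03
CJK = CHINESE | JAPANESE | KOREAN                   # 0x0F


def includes(lang_filter: int, group: int) -> bool:
    """True if any bit of `group` is enabled in `lang_filter`."""
    return bool(lang_filter & group)


# Example: restrict detection to Japanese plus non-CJK languages
japanese_or_non_cjk = JAPANESE | NON_CJK
```

With chardet, the equivalent detector would be `UniversalDetector(lang_filter=LanguageFilter.JAPANESE | LanguageFilter.NON_CJK)`.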

### Sequence Analysis

Constants used internally by sequence analysis: likelihood categories for character pairs (`SequenceLikelihood`) and special categories for non-letter characters (`CharacterCategory`).

```python { .api }
class SequenceLikelihood:
    NEGATIVE: int
    UNLIKELY: int
    LIKELY: int
    POSITIVE: int

    @classmethod
    def get_num_categories(cls) -> int:
        """Get the number of likelihood categories."""

class CharacterCategory:
    UNDEFINED: int
    LINE_BREAK: int
    SYMBOL: int
    DIGIT: int
    CONTROL: int
```
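
A toy sketch of how the likelihood categories can be tallied: each adjacent pair of frequent characters (given as character orders, such as a model's `char_to_order_map` produces) is looked up in a flattened precedence matrix, and the resulting category is counted. The constants, the 2x2 matrix, and `count_sequences` are illustrative assumptions loosely modeled on chardet's single-byte prober, which uses a much larger matrix.

```python
from __future__ import annotations

from typing import Sequence

# Stand-ins assumed to mirror chardet.enums.SequenceLikelihood
NEGATIVE, UNLIKELY, LIKELY, POSITIVE = 0, 1, 2, 3
NUM_CATEGORIES = 4  # what SequenceLikelihood.get_num_categories() is expected to report


def count_sequences(orders: Sequence[int], matrix: Sequence[int], sample_size: int) -> list[int]:
    """Count likelihood categories for adjacent character orders.

    Only orders below sample_size (the most frequent characters) are rated;
    matrix is flattened row-major: matrix[prev * sample_size + current].
    """
    counts = [0] * NUM_CATEGORIES
    prev = sample_size  # sentinel meaning "no rated character yet"
    for order in orders:
        if order < sample_size and prev < sample_size:
            counts[matrix[prev * sample_size + order]] += 1
        prev = order
    return counts


# Toy 2x2 matrix over character orders 0 and 1
toy_matrix = (
    POSITIVE, LIKELY,    # after order 0: next 0 -> POSITIVE, next 1 -> LIKELY
    UNLIKELY, NEGATIVE,  # after order 1: next 0 -> UNLIKELY, next 1 -> NEGATIVE
)
```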

### Language Models

Language-specific models for single-byte encodings. Each `...Model` name is a `_LangModelType` dictionary that combines a character-to-order map with a precedence matrix; the remaining tuples are those underlying tables.

```python { .api }
# Greek language model
Latin7GreekModel: _LangModelType
Win1253GreekModel: _LangModelType
Latin7_char_to_order_map: tuple[int, ...]
win1253_char_to_order_map: tuple[int, ...]
GreekLangModel: tuple[int, ...]

# Bulgarian language model
Latin5BulgarianModel: _LangModelType
Win1251BulgarianModel: _LangModelType
Latin5_BulgarianCharToOrderMap: tuple[int, ...]
win1251BulgarianCharToOrderMap: tuple[int, ...]
BulgarianLangModel: tuple[int, ...]

# Cyrillic language models
Koi8rModel: _LangModelType
Win1251CyrillicModel: _LangModelType
Latin5CyrillicModel: _LangModelType
MacCyrillicModel: _LangModelType
Ibm866Model: _LangModelType
Ibm855Model: _LangModelType
KOI8R_char_to_order_map: tuple[int, ...]
win1251_char_to_order_map: tuple[int, ...]
latin5_char_to_order_map: tuple[int, ...]
macCyrillic_char_to_order_map: tuple[int, ...]
IBM855_char_to_order_map: tuple[int, ...]
IBM866_char_to_order_map: tuple[int, ...]
RussianLangModel: tuple[int, ...]

# Hebrew language model
Win1255HebrewModel: _LangModelType
WIN1255_CHAR_TO_ORDER_MAP: tuple[int, ...]
HEBREW_LANG_MODEL: tuple[int, ...]

# Hungarian language model
Latin2HungarianModel: _LangModelType
Win1250HungarianModel: _LangModelType
Latin2_HungarianCharToOrderMap: tuple[int, ...]
win1250HungarianCharToOrderMap: tuple[int, ...]
HungarianLangModel: tuple[int, ...]

# Thai language model
TIS620ThaiModel: _LangModelType
TIS620CharToOrderMap: tuple[int, ...]
ThaiLangModel: tuple[int, ...]

# Turkish language model
Latin5TurkishModel: _LangModelType
Latin5_TurkishCharToOrderMap: tuple[int, ...]
TurkishLangModel: tuple[int, ...]
```
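
A model's `typical_positive_ratio` feeds into a confidence estimate. This sketch assumes the single-byte prober formula used in chardet 3.x: the share of `POSITIVE` sequences among all rated sequences, normalized by `typical_positive_ratio` and scaled by the share of frequent characters, capped at 0.99. `LangModel`, `sbcs_confidence`, and the toy model values are hypothetical.

```python
from __future__ import annotations

from typing import TypedDict


class LangModel(TypedDict):
    """Hypothetical local mirror of the stub's private _LangModelType."""
    char_to_order_map: tuple[int, ...]
    precedence_matrix: tuple[int, ...]
    typical_positive_ratio: float
    keep_english_letter: bool
    charset_name: str
    language: str


def sbcs_confidence(model: LangModel, positive_seqs: int, total_seqs: int,
                    freq_chars: int, total_chars: int) -> float:
    """Confidence estimate assumed to follow chardet 3.x's single-byte prober."""
    if total_seqs <= 0 or total_chars <= 0:
        return 0.01  # no usable evidence yet
    r = positive_seqs / total_seqs / model["typical_positive_ratio"]
    r *= freq_chars / total_chars
    return 0.99 if r >= 1.0 else r  # cap, mirroring the assumed original


toy_model: LangModel = {
    "char_to_order_map": tuple(range(256)),  # placeholder: byte value == order
    "precedence_matrix": (3, 2, 1, 0),       # placeholder 2x2 matrix
    "typical_positive_ratio": 0.9,
    "keep_english_letter": False,
    "charset_name": "toy-charset",
    "language": "Toy",
}
```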

### Version Information

Version of the installed chardet package: `__version__` is the version string, and `VERSION` is that string split into its dot-separated components.

```python { .api }
__version__: str
VERSION: list[str]
```
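
`VERSION` holds string components, so they must be converted before a numeric comparison (as strings, `"10"` sorts before `"9"`). A minimal sketch; `version_at_least` is a hypothetical helper that assumes every component is purely numeric.

```python
from __future__ import annotations

from typing import Sequence


def version_at_least(version: Sequence[str], minimum: tuple[int, ...]) -> bool:
    """Compare dot-separated version components numerically (assumes numeric parts)."""
    return tuple(int(part) for part in version) >= minimum
```

With chardet installed, a feature check might read `version_at_least(chardet.VERSION, (4, 0))`.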

## Types

These underscore-prefixed types exist only in the stubs, not in the chardet runtime; import them for annotations only under `typing.TYPE_CHECKING`.

```python { .api }
from typing import Pattern, Optional, List, Any
from typing_extensions import TypedDict
from logging import Logger

class _FinalResultType(TypedDict):
    encoding: str
    confidence: float
    language: str

class _IntermediateResultType(TypedDict):
    encoding: str | None
    confidence: float
    language: str | None

class _LangModelType(TypedDict):
    char_to_order_map: tuple[int, ...]
    precedence_matrix: tuple[int, ...]
    typical_positive_ratio: float
    keep_english_letter: bool
    charset_name: str
    language: str

class _SMModelType(TypedDict):
    class_table: tuple[int, ...]
    class_factor: int
    state_table: tuple[int, ...]
    char_len_table: tuple[int, ...]
    name: str
```
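
The `_SMModelType` fields describe a table-driven coding state machine: `class_table` maps each byte to a byte class, and the next state is read from `state_table[state * class_factor + byte_class]`. The toy model below accepts only 7-bit bytes; its tables, the `SMModel` mirror, and `run_machine` are hypothetical, the state constants are assumed to mirror `MachineState`, and the whole sketch is loosely modeled on, not copied from, chardet's `CodingStateMachine`.

```python
from __future__ import annotations

from typing import TypedDict

START, ERROR, ITS_ME = 0, 1, 2  # stand-ins assumed to mirror chardet.enums.MachineState


class SMModel(TypedDict):
    """Hypothetical local mirror of the stub's private _SMModelType."""
    class_table: tuple[int, ...]
    class_factor: int
    state_table: tuple[int, ...]
    char_len_table: tuple[int, ...]
    name: str


# Toy model: class 0 = byte below 0x80, class 1 = high byte (0x80-0xFF)
ASCII_ONLY: SMModel = {
    "class_table": tuple(0 if b < 0x80 else 1 for b in range(256)),
    "class_factor": 2,  # number of byte classes (columns per state row)
    "state_table": (
        START, ERROR,    # from START
        ERROR, ERROR,    # from ERROR (sticky)
        ITS_ME, ITS_ME,  # from ITS_ME (sticky)
    ),
    "char_len_table": (1, 1),  # character length per class (unused in this sketch)
    "name": "toy-ascii",
}


def run_machine(model: SMModel, data: bytes) -> int:
    """Drive the state machine over data and return the final state."""
    state = START
    for byte in data:
        byte_class = model["class_table"][byte]
        state = model["state_table"][state * model["class_factor"] + byte_class]
        if state == ERROR:
            break  # the encoding is ruled out; no need to read further
    return state
```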