# Types-chardet

Type stubs for chardet, a universal character encoding detection library for Python. This package provides type annotations for static type checking of code that uses chardet to detect the character encoding of text data from unknown sources.

## Package Information

- **Package Name**: types-chardet
- **Language**: Python
- **Installation**: `pip install types-chardet`
- **Requires**: chardet library (`pip install chardet`)

## Core Imports

```python
from chardet import UniversalDetector
```

The stubs also declare a module-level `__getattr__`, so attributes that are not statically defined (such as `detect`) can still be accessed:

```python
import chardet

# Access the main detect function dynamically
detect_func = chardet.detect  # Available at runtime
```

Import type annotations:

```python
from typing import Dict, Any, Optional, List, Tuple
from typing_extensions import TypedDict
```

## Basic Usage

```python
from chardet import UniversalDetector
import chardet

# Simple detection using detect() function (available via __getattr__)
result = chardet.detect(b'Hello World')
# result: {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

# Advanced incremental detection with UniversalDetector
detector = UniversalDetector()
with open('unknown_encoding.txt', 'rb') as f:
    for line in f:
        detector.feed(line)
        if detector.done:
            break
result = detector.close()
# result: {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
```

## Capabilities

### Character Encoding Detection

`detect()` analyzes a byte string in a single call and reports the most likely character encoding.

```python { .api }
def detect(byte_str: bytes) -> _FinalResultType:
    """
    Detect character encoding of the given bytes.

    Parameters:
    - byte_str: bytes, data to analyze for encoding detection

    Returns:
    Dict with 'encoding', 'confidence', and 'language' keys
    """
```
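
The result of `detect()` is typically used to decode the bytes that were analyzed. The following is a minimal sketch of that step, not part of chardet or its stubs: `decode_bytes`, `DetectionResult`, `min_confidence`, and `fallback` are hypothetical names, and `fake_detect` is a stand-in so that the block runs without chardet installed. In real code you would pass `chardet.detect` instead.

```python
from typing import Callable, Optional, TypedDict


class DetectionResult(TypedDict):
    # Mirrors the keys documented for detect(). Allowing None for the
    # encoding is a defensive assumption made for this sketch.
    encoding: Optional[str]
    confidence: float
    language: str


def decode_bytes(
    data: bytes,
    detect: Callable[[bytes], DetectionResult],
    min_confidence: float = 0.5,
    fallback: str = "utf-8",
) -> str:
    """Decode data with the detected encoding, or a fallback if unsure."""
    result = detect(data)
    encoding = result["encoding"]
    if encoding is None or result["confidence"] < min_confidence:
        encoding = fallback
    # errors="replace" substitutes undecodable bytes instead of raising.
    return data.decode(encoding, errors="replace")


def fake_detect(data: bytes) -> DetectionResult:
    """Hypothetical stand-in for chardet.detect, used only in this sketch."""
    return {"encoding": "latin-1", "confidence": 0.9, "language": ""}


text = decode_bytes("café".encode("latin-1"), fake_detect)
```

Passing the detection function as a parameter keeps the helper easy to test and makes the confidence threshold explicit rather than trusting every guess.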

### Dynamic Module Access

The stubs declare a module-level `__getattr__` that covers every chardet attribute. Functions such as `detect()`, and other attributes that exist at runtime but are not statically defined in the stubs, are therefore typed as `Any`.

```python { .api }
def __getattr__(name: str) -> Any:
    """
    Provide dynamic access to any chardet module attribute.

    This mechanism enables access to runtime functions like detect() and
    other chardet functionality that is available at runtime but not
    statically defined in the type stubs.

    Parameters:
    - name: str, attribute name to access

    Returns:
    Any attribute from the chardet runtime module
    """
```
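
Because attributes resolved through `__getattr__` are typed as `Any`, a type checker will not verify how they are called. One way to restore some checking is to give the attribute an explicit type with `typing.cast`. The sketch below assumes a `DetectFn` alias and a `get_typed_detect` helper, both hypothetical, and uses a `SimpleNamespace` in place of the real `chardet` module so it runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, cast

# The call signature we expect detect() to have at runtime (an assumption).
DetectFn = Callable[[bytes], Dict[str, Any]]


def get_typed_detect(module: Any) -> DetectFn:
    """Look up module.detect and give it an explicit callable type."""
    func = getattr(module, "detect")
    if not callable(func):
        raise TypeError("module.detect is not callable")
    # cast() has no runtime effect; it only informs the type checker.
    return cast(DetectFn, func)


# Hypothetical stand-in module so the sketch runs without chardet installed.
fake_chardet = SimpleNamespace(
    detect=lambda data: {"encoding": "ascii", "confidence": 1.0, "language": ""}
)

detect = get_typed_detect(fake_chardet)
result = detect(b"Hello World")
```

With the real library, `get_typed_detect(chardet)` would be the equivalent call; after it, passing a `str` instead of `bytes` to `detect` is flagged by the type checker.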

### Universal Detection

The `UniversalDetector` class provides the core encoding detection functionality, with support for incremental detection and language filtering.

```python { .api }
class UniversalDetector:
    MINIMUM_THRESHOLD: float
    HIGH_BYTE_DETECTOR: Pattern[bytes]
    ESC_DETECTOR: Pattern[bytes]
    WIN_BYTE_DETECTOR: Pattern[bytes]
    ISO_WIN_MAP: dict[str, str]

    result: _IntermediateResultType
    done: bool
    lang_filter: int
    logger: Logger

    def __init__(self, lang_filter: int = ...) -> None:
        """
        Initialize UniversalDetector.

        Parameters:
        - lang_filter: int, language filter using LanguageFilter constants
        """

    def reset(self) -> None:
        """Reset the detector to initial state."""

    def feed(self, byte_str: bytes) -> None:
        """
        Feed bytes to the detector for analysis.

        Parameters:
        - byte_str: bytes, data to analyze
        """

    def close(self) -> _FinalResultType:
        """
        Close detection and return final result.

        Returns:
        Dict with 'encoding', 'confidence', and 'language' keys
        """
```
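
The `reset()`, `feed()`, `done`, and `close()` members above suggest a reusable loop: reset, feed chunks until the detector is done, then close. The following sketch shows that loop against a small `Protocol`. `DetectorLike`, `detect_chunks`, and `FakeDetector` are hypothetical names introduced for illustration; `FakeDetector` only imitates the interface so that the block runs without chardet.

```python
from typing import Any, Dict, Iterable, Protocol


class DetectorLike(Protocol):
    """The subset of the UniversalDetector interface used below."""

    done: bool

    def reset(self) -> None: ...
    def feed(self, byte_str: bytes) -> None: ...
    def close(self) -> Dict[str, Any]: ...


def detect_chunks(detector: DetectorLike, chunks: Iterable[bytes]) -> Dict[str, Any]:
    """Feed chunks until the detector reports done, then return its result."""
    detector.reset()  # clear state left over from a previous input
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            break  # stop reading early once a result is available
    return detector.close()


class FakeDetector:
    """Hypothetical stand-in: reports done after seeing a non-ASCII byte."""

    def __init__(self) -> None:
        self.done = False
        self.fed = 0

    def reset(self) -> None:
        self.done = False
        self.fed = 0

    def feed(self, byte_str: bytes) -> None:
        self.fed += 1
        if any(b > 0x7F for b in byte_str):
            self.done = True

    def close(self) -> Dict[str, Any]:
        encoding = "utf-8" if self.done else "ascii"
        return {"encoding": encoding, "confidence": 1.0, "language": ""}


fake = FakeDetector()
result = detect_chunks(fake, [b"plain", "caf\u00e9".encode("utf-8"), b"never read"])
```

Calling `reset()` at the start of each run lets one detector instance be reused across several inputs, which is the apparent purpose of that method in the stub.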

### Detection State Enums

State constants for controlling and monitoring the detection process.

```python { .api }
class InputState:
    PURE_ASCII: int
    ESC_ASCII: int
    HIGH_BYTE: int

class ProbingState:
    DETECTING: int
    FOUND_IT: int
    NOT_ME: int

class MachineState:
    START: int
    ERROR: int
    ITS_ME: int
```

### Language Filtering

Language filter constants for focusing detection on specific language groups.

```python { .api }
class LanguageFilter:
    CHINESE_SIMPLIFIED: int
    CHINESE_TRADITIONAL: int
    JAPANESE: int
    KOREAN: int
    NON_CJK: int
    ALL: int
    CHINESE: int
    CJK: int
```
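
Integer filter constants with group names such as `CHINESE`, `CJK`, and `ALL` are usually bit flags that combine with `|` and are tested with `&`. The sketch below illustrates that pattern only. `DemoFilter` and `includes` are hypothetical, and the numeric values are chosen for this example; the stubs declare the real constants as plain `int` without stating their values.

```python
class DemoFilter:
    """Hypothetical flag values, chosen for this sketch only."""

    CHINESE_SIMPLIFIED = 0x01
    CHINESE_TRADITIONAL = 0x02
    JAPANESE = 0x04
    KOREAN = 0x08
    NON_CJK = 0x10
    # Composite groups are unions of the single-bit values above.
    CHINESE = CHINESE_SIMPLIFIED | CHINESE_TRADITIONAL
    CJK = CHINESE | JAPANESE | KOREAN
    ALL = CJK | NON_CJK


def includes(lang_filter: int, group: int) -> bool:
    """Return True if every bit of group is enabled in lang_filter."""
    return (lang_filter & group) == group


# Restrict detection to Japanese and Korean in this illustration.
east_asian = DemoFilter.JAPANESE | DemoFilter.KOREAN

has_japanese = includes(east_asian, DemoFilter.JAPANESE)
has_chinese = includes(east_asian, DemoFilter.CHINESE)
```

A combined value like `east_asian` is the kind of integer that the `lang_filter` parameter of `UniversalDetector.__init__` is annotated to accept.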

### Sequence Analysis

Likelihood and character categorization constants for internal sequence analysis.

```python { .api }
class SequenceLikelihood:
    NEGATIVE: int
    UNLIKELY: int
    LIKELY: int
    POSITIVE: int

    @classmethod
    def get_num_categories(cls) -> int:
        """Get the number of likelihood categories."""

class CharacterCategory:
    UNDEFINED: int
    LINE_BREAK: int
    SYMBOL: int
    DIGIT: int
    CONTROL: int
```

### Language Models

Language-specific detection models with character mappings and precedence matrices.

```python { .api }
# Greek language model
Latin7GreekModel: _LangModelType
Win1253GreekModel: _LangModelType
Latin7_char_to_order_map: tuple[int, ...]
win1253_char_to_order_map: tuple[int, ...]
GreekLangModel: tuple[int, ...]

# Bulgarian language model
Latin5BulgarianModel: _LangModelType
Win1251BulgarianModel: _LangModelType
Latin5_BulgarianCharToOrderMap: tuple[int, ...]
win1251BulgarianCharToOrderMap: tuple[int, ...]
BulgarianLangModel: tuple[int, ...]

# Cyrillic language models
Koi8rModel: _LangModelType
Win1251CyrillicModel: _LangModelType
Latin5CyrillicModel: _LangModelType
MacCyrillicModel: _LangModelType
Ibm866Model: _LangModelType
Ibm855Model: _LangModelType
KOI8R_char_to_order_map: tuple[int, ...]
win1251_char_to_order_map: tuple[int, ...]
latin5_char_to_order_map: tuple[int, ...]
macCyrillic_char_to_order_map: tuple[int, ...]
IBM855_char_to_order_map: tuple[int, ...]
IBM866_char_to_order_map: tuple[int, ...]
RussianLangModel: tuple[int, ...]

# Hebrew language model
Win1255HebrewModel: _LangModelType
WIN1255_CHAR_TO_ORDER_MAP: tuple[int, ...]
HEBREW_LANG_MODEL: tuple[int, ...]

# Hungarian language model
Latin2HungarianModel: _LangModelType
Win1250HungarianModel: _LangModelType
Latin2_HungarianCharToOrderMap: tuple[int, ...]
win1250HungarianCharToOrderMap: tuple[int, ...]
HungarianLangModel: tuple[int, ...]

# Thai language model
TIS620ThaiModel: _LangModelType
TIS620CharToOrderMap: tuple[int, ...]
ThaiLangModel: tuple[int, ...]

# Turkish language model
Latin5TurkishModel: _LangModelType
Latin5_TurkishCharToOrderMap: tuple[int, ...]
TurkishLangModel: tuple[int, ...]
```

### Version Information

Package version information for compatibility checking.

```python { .api }
__version__: str
VERSION: list[str]
```
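
Since `VERSION` is a list of strings, comparing it directly compares text rather than numbers (for example, `"10" < "9"` as strings). A compatibility check can convert the parts to integers first. The helper below is a sketch under that assumption; `version_tuple` is a hypothetical name and the version lists are made-up sample inputs, not values read from chardet.

```python
from typing import List, Tuple


def version_tuple(parts: List[str]) -> Tuple[int, ...]:
    """Convert ['5', '2', '0'] to (5, 2, 0), keeping only leading digits."""
    numbers = []
    for part in parts:
        digits = ""
        for ch in part:
            if ch not in "0123456789":
                break  # stop at a suffix such as 'rc1'
            digits += ch
        if not digits:
            break  # a part with no leading digits ends the numeric version
        numbers.append(int(digits))
    return tuple(numbers)


# Sample inputs shaped like the VERSION attribute (illustrative values).
current = version_tuple(["5", "2", "0"])
required = version_tuple(["3", "0", "4"])
is_compatible = current >= required
```

Tuples of integers compare element by element, so `(10, 0)` correctly sorts after `(9, 5)`.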

## Types

```python { .api }
from typing import Pattern, Optional, List, Any
from typing_extensions import TypedDict
from logging import Logger

class _FinalResultType(TypedDict):
    encoding: str
    confidence: float
    language: str

class _IntermediateResultType(TypedDict):
    encoding: str | None
    confidence: float
    language: str | None

class _LangModelType(TypedDict):
    char_to_order_map: tuple[int, ...]
    precedence_matrix: tuple[int, ...]
    typical_positive_ratio: float
    keep_english_letter: bool
    charset_name: str
    language: str

class _SMModelType(TypedDict):
    class_table: tuple[int, ...]
    class_factor: int
    state_table: tuple[int, ...]
    char_len_table: tuple[int, ...]
    name: str
```
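
The intermediate and final result types differ only in that the intermediate one allows `None` for `encoding` and `language`. Code that reads `UniversalDetector.result` before calling `close()` therefore has to handle the missing values. The sketch below mirrors the two shapes with local, hypothetical `TypedDict` classes (`IntermediateResult`, `FinalResult`) and a hypothetical `finalize` helper; the `"ascii"` default is an assumption for illustration, not chardet behavior.

```python
from typing import Optional, TypedDict


class FinalResult(TypedDict):
    encoding: str
    confidence: float
    language: str


class IntermediateResult(TypedDict):
    encoding: Optional[str]
    confidence: float
    language: Optional[str]


def finalize(partial: IntermediateResult, default_encoding: str = "ascii") -> FinalResult:
    """Replace None fields with defaults so the result has the final shape."""
    return {
        "encoding": partial["encoding"] or default_encoding,
        "confidence": partial["confidence"],
        "language": partial["language"] or "",
    }


# An undecided intermediate result, as it might look before any data is fed.
undecided: IntermediateResult = {"encoding": None, "confidence": 0.0, "language": None}
final = finalize(undecided)
```

Keeping the two shapes as separate types lets a type checker flag any place where a possibly missing `encoding` is passed to `bytes.decode` without a check.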