# Text and Data Processing

Specialized utilities for text processing, HTML parsing, image processing, and cryptographic operations. These tools support the core functionality with URL parsing, data extraction, content processing, and security operations.

## Types

```python { .api }
from typing import Dict, Any, List, Optional, Union, Pattern, Match
```

## Capabilities

### Text Processing Utilities

Comprehensive text processing tools for URL handling, domain management, and ID parsing.

```python { .api }
class JmcomicText:
    """
    Text processing utilities for URL parsing, domain extraction, and ID parsing.

    Provides essential text manipulation functions for working with JMComic
    URLs, domain names, and content identifiers.

    Static Methods:
    - parse_to_jm_id(text): Parse text to extract JM IDs
    - extract_domain(url): Extract domain from URL
    - normalize_url(url): Normalize URL format
    - is_valid_jm_id(jm_id): Validate JM ID format
    - parse_album_id(text): Extract album ID from text
    - parse_photo_id(text): Extract photo ID from text
    - clean_filename(filename): Clean filename for filesystem
    - format_title(title): Format title for display
    """

    @staticmethod
    def parse_to_jm_id(text: Union[str, int]) -> str:
        """
        Parse text or URL to extract JM ID.

        Handles various input formats including URLs, raw IDs,
        and text containing IDs.

        Parameters:
        - text: str or int - Text containing JM ID

        Returns:
        str - Extracted and normalized JM ID

        Raises:
        ValueError - If no valid ID found
        """

    @staticmethod
    def extract_domain(url: str) -> str:
        """
        Extract domain from URL.

        Parameters:
        - url: str - URL to parse

        Returns:
        str - Extracted domain name
        """

    @staticmethod
    def normalize_url(url: str) -> str:
        """
        Normalize URL format for consistent processing.

        Parameters:
        - url: str - URL to normalize

        Returns:
        str - Normalized URL
        """

    @staticmethod
    def is_valid_jm_id(jm_id: Union[str, int]) -> bool:
        """
        Validate JM ID format.

        Parameters:
        - jm_id: str or int - ID to validate

        Returns:
        bool - True if valid JM ID format
        """

    @staticmethod
    def clean_filename(filename: str) -> str:
        """
        Clean filename for filesystem compatibility.

        Removes or replaces invalid characters for safe file operations.

        Parameters:
        - filename: str - Original filename

        Returns:
        str - Cleaned filename safe for filesystem
        """
```

Usage examples:

```python
# Parse various ID formats
jm_id = JmcomicText.parse_to_jm_id("https://example.com/album/123456")
jm_id = JmcomicText.parse_to_jm_id("123456")
jm_id = JmcomicText.parse_to_jm_id("Album ID: 123456")

# Validate IDs
is_valid = JmcomicText.is_valid_jm_id("123456")

# Clean filenames
safe_filename = JmcomicText.clean_filename("Album: Title with/invalid\\chars")
```
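
To make the behavior described above concrete, here is a minimal standalone sketch of ID parsing, domain extraction, and filename cleaning using only the standard library. It is not the library's implementation: the `sketch_*` function names, the "first run of digits" ID rule, the optional `JM` prefix, and the set of replaced characters are all assumptions made for illustration.

```python
import re
from typing import Union
from urllib.parse import urlsplit

# Assumption: an ID is the first run of digits, optionally prefixed with "JM".
_ID_RE = re.compile(r"(?:JM)?(\d+)", re.IGNORECASE)
# Characters rejected by common filesystems (Windows is the strictest).
_INVALID_CHARS_RE = re.compile(r'[\\/:*?"<>|]')


def sketch_parse_id(text: Union[str, int]) -> str:
    """Return the first digit run found in text, or raise ValueError."""
    match = _ID_RE.search(str(text))
    if match is None:
        raise ValueError(f"no ID found in {text!r}")
    return match.group(1)


def sketch_extract_domain(url: str) -> str:
    """Return the host part of a URL, or an empty string if there is none."""
    return urlsplit(url).hostname or ""


def sketch_clean_filename(filename: str, replacement: str = "_") -> str:
    """Replace characters that are invalid in filenames, then trim the ends."""
    cleaned = _INVALID_CHARS_RE.sub(replacement, filename)
    return cleaned.strip(" .")
```

A digit-run rule like this would pick up digits in the host name if a URL contained any before the ID, so a production parser would likely match against the URL path instead.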

### HTML Parsing and Pattern Matching

Tools for parsing HTML content and extracting data using regular expressions.

```python { .api }
class PatternTool:
    """
    Regular expression utilities for HTML parsing and data extraction.

    Provides pre-compiled patterns and matching utilities for
    extracting structured data from HTML pages.

    Class Attributes:
    - ALBUM_ID_PATTERN: Pattern - Regex for album ID extraction
    - PHOTO_ID_PATTERN: Pattern - Regex for photo ID extraction
    - IMAGE_URL_PATTERN: Pattern - Regex for image URL extraction
    - TITLE_PATTERN: Pattern - Regex for title extraction

    Static Methods:
    - match_album_info(html): Extract album information from HTML
    - match_photo_info(html): Extract photo information from HTML
    - match_image_urls(html): Extract image URLs from HTML
    - find_all_matches(pattern, text): Find all regex matches
    """

    @staticmethod
    def match_album_info(html: str) -> Dict[str, Any]:
        """
        Extract album information from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        dict - Extracted album information
        """

    @staticmethod
    def match_photo_info(html: str) -> Dict[str, Any]:
        """
        Extract photo information from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        dict - Extracted photo information
        """

    @staticmethod
    def match_image_urls(html: str) -> List[str]:
        """
        Extract image URLs from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        List[str] - List of extracted image URLs
        """

    @staticmethod
    def find_all_matches(pattern: Pattern, text: str) -> List[Match]:
        """
        Find all regex matches in text.

        Parameters:
        - pattern: Pattern - Compiled regex pattern
        - text: str - Text to search

        Returns:
        List[Match] - List of regex match objects
        """
```
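
As an illustration of the pre-compiled-pattern approach, the sketch below collects match objects with `re.finditer` and pulls image URLs out of `<img>` tags. The pattern, the `sketch_*` names, and the sample markup are hypothetical; the actual patterns behind `IMAGE_URL_PATTERN` are not given in this document.

```python
import re
from typing import List

# Hypothetical pattern: capture the src attribute of each <img> tag.
IMG_SRC_PATTERN = re.compile(r'<img[^>]+src="([^"]+)"', re.IGNORECASE)


def sketch_find_all_matches(pattern: "re.Pattern[str]", text: str) -> "List[re.Match[str]]":
    """Collect every non-overlapping match object, in document order."""
    return list(pattern.finditer(text))


def sketch_match_image_urls(html: str) -> List[str]:
    """Return the captured src value of each <img> tag."""
    return [m.group(1) for m in sketch_find_all_matches(IMG_SRC_PATTERN, html)]


sample_html = (
    '<div><img class="page" src="https://cdn.example/1.jpg">'
    '<img src="https://cdn.example/2.jpg"></div>'
)
image_urls = sketch_match_image_urls(sample_html)
```

Returning match objects rather than strings keeps positions and capture groups available to the caller, which is presumably why `find_all_matches` is typed as `List[Match]`.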

### Page Processing Tools

Specialized tools for processing HTML pages and extracting structured data.

```python { .api }
class JmPageTool:
    """
    HTML page parsing and data extraction utilities.

    Provides high-level functions for parsing JMComic HTML pages
    and extracting structured data for albums, photos, and searches.

    Static Methods:
    - parse_album_page(html): Parse album detail page
    - parse_photo_page(html): Parse photo detail page
    - parse_search_page(html): Parse search results page
    - parse_category_page(html): Parse category listing page
    - extract_pagination(html): Extract pagination information
    - extract_metadata(html): Extract page metadata
    """

    @staticmethod
    def parse_album_page(html: str) -> 'JmAlbumDetail':
        """
        Parse album detail page HTML to extract album information.

        Parameters:
        - html: str - Album page HTML content

        Returns:
        JmAlbumDetail - Parsed album with metadata and episodes
        """

    @staticmethod
    def parse_photo_page(html: str) -> 'JmPhotoDetail':
        """
        Parse photo detail page HTML to extract photo information.

        Parameters:
        - html: str - Photo page HTML content

        Returns:
        JmPhotoDetail - Parsed photo with metadata and images
        """

    @staticmethod
    def parse_search_page(html: str) -> 'JmSearchPage':
        """
        Parse search results page HTML.

        Parameters:
        - html: str - Search page HTML content

        Returns:
        JmSearchPage - Parsed search results with albums and pagination
        """

    @staticmethod
    def extract_pagination(html: str) -> Dict[str, Any]:
        """
        Extract pagination information from page.

        Parameters:
        - html: str - HTML content with pagination
        
        Returns:
        dict - Pagination data (current_page, total_pages, has_next, etc.)
        """
```
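
The pagination dictionary described by `extract_pagination` can be illustrated with a small standard-library sketch. It assumes, purely for illustration, that page links carry a `page` query parameter and that the caller already knows the current page; the real page markup and the `sketch_extract_pagination` name are not taken from the library.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List
from urllib.parse import parse_qs, urlsplit


class _PageLinkCollector(HTMLParser):
    """Collect page numbers from <a href="...?page=N"> links (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.pages: List[int] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        values = parse_qs(urlsplit(href).query).get("page", [])
        if values and values[0].isdigit():
            self.pages.append(int(values[0]))


def sketch_extract_pagination(html: str, current_page: int = 1) -> Dict[str, Any]:
    """Build a pagination dict from the highest page number linked on the page."""
    collector = _PageLinkCollector()
    collector.feed(html)
    total_pages = max(collector.pages + [current_page])
    return {
        "current_page": current_page,
        "total_pages": total_pages,
        "has_next": current_page < total_pages,
    }


sample_html = (
    '<a href="/search?q=x&amp;page=1">1</a>'
    '<a href="/search?q=x&amp;page=2">2</a>'
    '<a href="/search?q=x&amp;page=3">3</a>'
)
pagination = sketch_extract_pagination(sample_html, current_page=2)
```

Using an HTML parser instead of a regular expression here means entity-encoded query strings such as `&amp;page=2` are decoded before the URL is inspected.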

### API Response Processing

Tools for processing and adapting API responses from different client types.

```python { .api }
class JmApiAdaptTool:
    """
    API response adaptation and transformation utilities.

    Handles conversion between different API response formats and
    standardizes data structures across client types.

    Static Methods:
    - adapt_album_response(response): Adapt album API response
    - adapt_photo_response(response): Adapt photo API response
    - adapt_search_response(response): Adapt search API response
    - normalize_response_data(data): Normalize response data format
    - validate_api_response(response): Validate API response structure
    """

    @staticmethod
    def adapt_album_response(response: Dict[str, Any]) -> 'JmAlbumDetail':
        """
        Adapt album API response to standard format.

        Parameters:
        - response: dict - Raw API response data

        Returns:
        JmAlbumDetail - Standardized album entity
        """

    @staticmethod
    def adapt_photo_response(response: Dict[str, Any]) -> 'JmPhotoDetail':
        """
        Adapt photo API response to standard format.

        Parameters:
        - response: dict - Raw API response data

        Returns:
        JmPhotoDetail - Standardized photo entity
        """

    @staticmethod
    def normalize_response_data(data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Normalize response data format across different APIs.

        Parameters:
        - data: dict - Raw response data

        Returns:
        dict - Normalized data structure
        """
```
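
What "normalizing" a response means is not spelled out here, so the following is only a sketch under stated assumptions: that some responses wrap their payload in a `data` envelope and that keys may arrive in camelCase. Both the response shapes and the `sketch_normalize_response_data` name are hypothetical.

```python
import re
from typing import Any, Dict

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def _to_snake_case(key: str) -> str:
    """Convert a camelCase key such as 'albumId' to 'album_id'."""
    return _CAMEL_BOUNDARY.sub("_", key).lower()


def sketch_normalize_response_data(data: Dict[str, Any]) -> Dict[str, Any]:
    """Unwrap an optional {"data": {...}} envelope and snake_case the keys."""
    payload = data.get("data", data)  # assumed envelope field name
    if not isinstance(payload, dict):
        raise ValueError("response payload is not a JSON object")
    return {_to_snake_case(key): value for key, value in payload.items()}


wrapped = {"code": 200, "data": {"albumId": "123456", "pageCount": 12}}
flat = {"albumId": "123456", "pageCount": 12}
```

With these assumptions, both `wrapped` and `flat` normalize to the same dictionary, which is the property an adapter layer needs before building entities from either client type.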

### Image Processing Tools

Comprehensive image processing utilities including decryption, format conversion, and manipulation.

```python { .api }
class JmImageTool:
    """
    Image processing, decryption, and format conversion utilities.

    Provides tools for handling scrambled images, format conversion,
    and image manipulation operations.

    Static Methods:
    - decrypt_image(image_data, scramble_id): Decrypt scrambled image
    - is_image_scrambled(image_data): Check if image is scrambled
    - convert_image_format(image_data, target_format): Convert image format
    - resize_image(image_data, width, height): Resize image
    - get_image_info(image_data): Get image metadata
    - merge_images_vertical(images): Merge images vertically
    - optimize_image(image_data): Optimize image for size
    """

    @staticmethod
    def decrypt_image(image_data: bytes, scramble_id: int) -> bytes:
        """
        Decrypt scrambled image data.

        JMComic images are sometimes scrambled for protection.
        This function reverses the scrambling process.

        Parameters:
        - image_data: bytes - Scrambled image data
        - scramble_id: int - Scramble algorithm identifier

        Returns:
        bytes - Decrypted image data
        """

    @staticmethod
    def is_image_scrambled(image_data: bytes) -> bool:
        """
        Check if image data is scrambled.

        Parameters:
        - image_data: bytes - Image data to check

        Returns:
        bool - True if image appears to be scrambled
        """

    @staticmethod
    def convert_image_format(image_data: bytes, target_format: str) -> bytes:
        """
        Convert image to different format.

        Parameters:
        - image_data: bytes - Original image data
        - target_format: str - Target format ('JPEG', 'PNG', 'WEBP')

        Returns:
        bytes - Converted image data
        """

    @staticmethod
    def get_image_info(image_data: bytes) -> Dict[str, Any]:
        """
        Get image metadata and properties.

        Parameters:
        - image_data: bytes - Image data

        Returns:
        dict - Image information (width, height, format, size)
        """

    @staticmethod
    def merge_images_vertical(images: List[bytes]) -> bytes:
        """
        Merge multiple images vertically into single image.

        Parameters:
        - images: List[bytes] - List of image data to merge

        Returns:
        bytes - Merged image data
        """
```
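
To show the kind of dictionary `get_image_info` is described as returning (width, height, format, size), here is a standard-library sketch that reads those fields from a PNG header. It handles PNG only and is not the library's implementation; the `sketch_get_png_info` name is hypothetical, and an imaging library such as Pillow would be the usual choice for covering JPEG and WEBP as well.

```python
import struct
from typing import Any, Dict

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sketch_get_png_info(image_data: bytes) -> Dict[str, Any]:
    """Read width and height from the IHDR chunk that follows the signature."""
    if len(image_data) < 24 or not image_data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG image")
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if image_data[12:16] != b"IHDR":
        raise ValueError("malformed PNG: IHDR chunk missing")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", image_data[16:24])
    return {
        "format": "PNG",
        "width": width,
        "height": height,
        "size": len(image_data),
    }


# A header-only sample: signature, IHDR length (13), type, then the IHDR fields.
sample_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)
    + b"IHDR"
    + struct.pack(">IIBBBBB", 720, 1280, 8, 2, 0, 0, 0)
)
info = sketch_get_png_info(sample_header)
```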

### Cryptographic Tools

Encryption and decryption utilities for API communications and data protection.

```python { .api }
class JmCryptoTool:
    """
    Encryption/decryption utilities for API communications.

    Handles the encryption protocols used by JMComic mobile API
    and provides security functions for data protection.

    Static Methods:
    - encrypt_api_request(data): Encrypt API request data
    - decrypt_api_response(encrypted_data): Decrypt API response
    - generate_request_signature(data): Generate request signature
    - validate_response_signature(response): Validate response signature
    - hash_password(password): Hash password for authentication
    """

    @staticmethod
    def encrypt_api_request(data: Dict[str, Any]) -> bytes:
        """
        Encrypt API request data using JMComic protocol.

        Parameters:
        - data: dict - Request data to encrypt

        Returns:
        bytes - Encrypted request data
        """

    @staticmethod
    def decrypt_api_response(encrypted_data: bytes) -> Dict[str, Any]:
        """
        Decrypt API response data using JMComic protocol.

        Parameters:
        - encrypted_data: bytes - Encrypted response data

        Returns:
        dict - Decrypted response data
        """

    @staticmethod
    def generate_request_signature(data: Dict[str, Any]) -> str:
        """
        Generate request signature for API authentication.

        Parameters:
        - data: dict - Request data

        Returns:
        str - Generated signature
        """

    @staticmethod
    def validate_response_signature(response: Dict[str, Any]) -> bool:
        """
        Validate response signature for data integrity.

        Parameters:
        - response: dict - API response with signature

        Returns:
        bool - True if signature is valid
        """
```
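
This document does not specify the signing algorithm, so the sketch below only illustrates the general generate-then-validate pattern using HMAC-SHA256 over canonical JSON. The secret key, the `signature` field name, and the `sketch_*` function names are all assumptions and should not be read as the JMComic protocol.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

# Hypothetical shared secret; a real client would load this from configuration.
SECRET_KEY = b"example-secret"


def sketch_generate_signature(data: Dict[str, Any], key: bytes = SECRET_KEY) -> str:
    """Sign a dict by hashing its canonical JSON form with HMAC-SHA256."""
    # Sorted keys and fixed separators make equal dicts serialize identically.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def sketch_validate_signature(response: Dict[str, Any], key: bytes = SECRET_KEY) -> bool:
    """Recompute the signature over every field except 'signature' and compare."""
    payload = {k: v for k, v in response.items() if k != "signature"}
    expected = sketch_generate_signature(payload, key)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, str(response.get("signature", "")))


payload = {"album_id": "123456", "ts": 1700000000}
signed = dict(payload, signature=sketch_generate_signature(payload))
tampered = dict(signed, album_id="654321")
```

In this sketch `signed` validates and `tampered` does not, because changing any field changes the canonical bytes that were signed.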

## Usage Examples

```python
# Text processing
jm_id = JmcomicText.parse_to_jm_id("https://jmcomic.example/album/123456")
clean_name = JmcomicText.clean_filename("Album: Title/with\\invalid*chars")

# HTML parsing
album_info = PatternTool.match_album_info(html_content)
image_urls = PatternTool.match_image_urls(photo_html)

# Page processing
album = JmPageTool.parse_album_page(album_html)
search_results = JmPageTool.parse_search_page(search_html)

# Image processing
decrypted_image = JmImageTool.decrypt_image(scrambled_data, scramble_id)
image_info = JmImageTool.get_image_info(image_data)
converted_image = JmImageTool.convert_image_format(image_data, 'JPEG')

# API processing
album = JmApiAdaptTool.adapt_album_response(api_response)
normalized_data = JmApiAdaptTool.normalize_response_data(raw_data)

# Cryptographic operations
encrypted_request = JmCryptoTool.encrypt_api_request(request_data)
decrypted_response = JmCryptoTool.decrypt_api_response(encrypted_response)
```

## Integration with Core Systems

These tools are used throughout the core download and client systems:

- **Text tools** are used throughout for ID parsing and URL handling
- **Pattern tools** power the HTML client's data extraction
- **Page tools** convert HTML pages to structured entities
- **API tools** standardize responses across different client types
- **Image tools** handle content processing in downloaders and plugins
- **Crypto tools** secure API communications in the mobile client