# Text and Data Processing

Specialized utilities for text processing, HTML parsing, image processing, and cryptographic operations. These tools support the core functionality with URL parsing, data extraction, content processing, and security operations.

## Types

```python { .api }
from typing import Dict, Any, List, Optional, Union, Pattern, Match
```

## Capabilities

### Text Processing Utilities

Comprehensive text processing tools for URL handling, domain management, and ID parsing.

```python { .api }
class JmcomicText:
    """
    Text processing utilities for URL parsing, domain extraction, and ID parsing.

    Provides essential text manipulation functions for working with JMComic
    URLs, domain names, and content identifiers.

    Static Methods:
    - parse_to_jm_id(text): Parse text to extract JM IDs
    - extract_domain(url): Extract domain from URL
    - normalize_url(url): Normalize URL format
    - is_valid_jm_id(jm_id): Validate JM ID format
    - parse_album_id(text): Extract album ID from text
    - parse_photo_id(text): Extract photo ID from text
    - clean_filename(filename): Clean filename for filesystem
    - format_title(title): Format title for display
    """

    @staticmethod
    def parse_to_jm_id(text: Union[str, int]) -> str:
        """
        Parse text or URL to extract JM ID.

        Handles various input formats including URLs, raw IDs,
        and text containing IDs.

        Parameters:
        - text: str or int - Text containing JM ID

        Returns:
        str - Extracted and normalized JM ID

        Raises:
        ValueError - If no valid ID found
        """

    @staticmethod
    def extract_domain(url: str) -> str:
        """
        Extract domain from URL.

        Parameters:
        - url: str - URL to parse

        Returns:
        str - Extracted domain name
        """

    @staticmethod
    def normalize_url(url: str) -> str:
        """
        Normalize URL format for consistent processing.

        Parameters:
        - url: str - URL to normalize

        Returns:
        str - Normalized URL
        """

    @staticmethod
    def is_valid_jm_id(jm_id: Union[str, int]) -> bool:
        """
        Validate JM ID format.

        Parameters:
        - jm_id: str or int - ID to validate

        Returns:
        bool - True if valid JM ID format
        """

    @staticmethod
    def clean_filename(filename: str) -> str:
        """
        Clean filename for filesystem compatibility.

        Removes or replaces invalid characters for safe file operations.

        Parameters:
        - filename: str - Original filename

        Returns:
        str - Cleaned filename safe for filesystem
        """
```

Usage examples:

```python
# Parse various ID formats
jm_id = JmcomicText.parse_to_jm_id("https://example.com/album/123456")
jm_id = JmcomicText.parse_to_jm_id("123456")
jm_id = JmcomicText.parse_to_jm_id("Album ID: 123456")

# Validate IDs
is_valid = JmcomicText.is_valid_jm_id("123456")

# Clean filenames
safe_filename = JmcomicText.clean_filename("Album: Title with/invalid\\chars")
```
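
To make the parsing behavior more concrete, the following is a minimal standalone sketch of how ID extraction and filename cleaning could work. It is not the library's implementation: the function names `sketch_parse_id` and `sketch_clean_filename`, the URL pattern, and the set of rejected filename characters are all assumptions made for illustration.

```python
import re
from typing import Union

# Hypothetical patterns: the real ID and URL formats are not specified here.
_ID_IN_URL = re.compile(r"/(?:album|photo)/(\d+)")
_BARE_DIGITS = re.compile(r"\d+")
_INVALID_FILENAME_CHARS = re.compile(r'[\\/:*?"<>|]')


def sketch_parse_id(text: Union[str, int]) -> str:
    """Return the first numeric ID found in a URL, a raw ID, or free text."""
    if isinstance(text, int):
        return str(text)
    # Prefer an ID embedded in a URL path, then fall back to any digit run.
    match = _ID_IN_URL.search(text) or _BARE_DIGITS.search(text)
    if match is None:
        raise ValueError(f"no ID found in {text!r}")
    # Only the URL pattern has a capture group; otherwise use the whole match.
    return match.group(1) if match.re is _ID_IN_URL else match.group(0)


def sketch_clean_filename(filename: str, replacement: str = "_") -> str:
    """Replace characters that common filesystems reject in file names."""
    return _INVALID_FILENAME_CHARS.sub(replacement, filename).strip()
```

Trying the URL pattern first keeps unrelated digits elsewhere in a URL (a port number, for example) from being mistaken for the ID.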

### HTML Parsing and Pattern Matching

Tools for parsing HTML content and extracting data using regular expressions.

```python { .api }
class PatternTool:
    """
    Regular expression utilities for HTML parsing and data extraction.

    Provides pre-compiled patterns and matching utilities for
    extracting structured data from HTML pages.

    Class Attributes:
    - ALBUM_ID_PATTERN: Pattern - Regex for album ID extraction
    - PHOTO_ID_PATTERN: Pattern - Regex for photo ID extraction
    - IMAGE_URL_PATTERN: Pattern - Regex for image URL extraction
    - TITLE_PATTERN: Pattern - Regex for title extraction

    Static Methods:
    - match_album_info(html): Extract album information from HTML
    - match_photo_info(html): Extract photo information from HTML
    - match_image_urls(html): Extract image URLs from HTML
    - find_all_matches(pattern, text): Find all regex matches
    """

    @staticmethod
    def match_album_info(html: str) -> Dict[str, Any]:
        """
        Extract album information from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        dict - Extracted album information
        """

    @staticmethod
    def match_photo_info(html: str) -> Dict[str, Any]:
        """
        Extract photo information from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        dict - Extracted photo information
        """

    @staticmethod
    def match_image_urls(html: str) -> List[str]:
        """
        Extract image URLs from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        List[str] - List of extracted image URLs
        """

    @staticmethod
    def find_all_matches(pattern: Pattern, text: str) -> List[Match]:
        """
        Find all regex matches in text.

        Parameters:
        - pattern: Pattern - Compiled regex pattern
        - text: str - Text to search

        Returns:
        List[Match] - List of regex match objects
        """
```
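
As an illustration of the pre-compiled pattern approach, here is a small sketch built on the standard `re` module. The pattern `IMG_SRC_PATTERN` and both `sketch_` functions are hypothetical; the actual patterns used by `PatternTool` are not shown in this document and may differ.

```python
import re
from typing import List

# Hypothetical pattern: captures the src attribute of <img> tags.
IMG_SRC_PATTERN = re.compile(r'<img[^>]+src="([^"]+)"')


def sketch_find_all_matches(pattern: re.Pattern, text: str) -> List[re.Match]:
    """Collect every non-overlapping match object, in order of appearance."""
    return list(pattern.finditer(text))


def sketch_match_image_urls(html: str) -> List[str]:
    """Return the captured URL from each <img> match."""
    return [m.group(1) for m in sketch_find_all_matches(IMG_SRC_PATTERN, html)]


sample_html = (
    '<div><img class="p" src="https://cdn.example/1.jpg">'
    '<img src="https://cdn.example/2.jpg"></div>'
)
print(sketch_match_image_urls(sample_html))
```

Compiling the pattern once at module level avoids recompiling it on every call, which is presumably why the class exposes its patterns as attributes.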

### Page Processing Tools

Specialized tools for processing HTML pages and extracting structured data.

```python { .api }
class JmPageTool:
    """
    HTML page parsing and data extraction utilities.

    Provides high-level functions for parsing JMComic HTML pages
    and extracting structured data for albums, photos, and searches.

    Static Methods:
    - parse_album_page(html): Parse album detail page
    - parse_photo_page(html): Parse photo detail page
    - parse_search_page(html): Parse search results page
    - parse_category_page(html): Parse category listing page
    - extract_pagination(html): Extract pagination information
    - extract_metadata(html): Extract page metadata
    """

    @staticmethod
    def parse_album_page(html: str) -> 'JmAlbumDetail':
        """
        Parse album detail page HTML to extract album information.

        Parameters:
        - html: str - Album page HTML content

        Returns:
        JmAlbumDetail - Parsed album with metadata and episodes
        """

    @staticmethod
    def parse_photo_page(html: str) -> 'JmPhotoDetail':
        """
        Parse photo detail page HTML to extract photo information.

        Parameters:
        - html: str - Photo page HTML content

        Returns:
        JmPhotoDetail - Parsed photo with metadata and images
        """

    @staticmethod
    def parse_search_page(html: str) -> 'JmSearchPage':
        """
        Parse search results page HTML.

        Parameters:
        - html: str - Search page HTML content

        Returns:
        JmSearchPage - Parsed search results with albums and pagination
        """

    @staticmethod
    def extract_pagination(html: str) -> Dict[str, Any]:
        """
        Extract pagination information from page.

        Parameters:
        - html: str - HTML content with pagination

        Returns:
        dict - Pagination data (current_page, total_pages, has_next, etc.)
        """
```
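
The pagination dictionary described above could be assembled roughly as follows. This sketch assumes pagination links carry a `page=N` query parameter and that the caller already knows the current page; both are assumptions, as are the names `_PaginationParser` and `sketch_extract_pagination`. The real page markup is not described in this document.

```python
import re
from html.parser import HTMLParser
from typing import Any, Dict, List

# Hypothetical link format: <a href="?page=3">3</a>
_PAGE_PARAM = re.compile(r"[?&]page=(\d+)")


class _PaginationParser(HTMLParser):
    """Collects page numbers from anchor hrefs that carry a page parameter."""

    def __init__(self) -> None:
        super().__init__()
        self.pages: List[int] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = _PAGE_PARAM.search(href)
        if match:
            self.pages.append(int(match.group(1)))


def sketch_extract_pagination(html: str, current_page: int = 1) -> Dict[str, Any]:
    """Build a pagination summary from the page links found in the HTML."""
    parser = _PaginationParser()
    parser.feed(html)
    # The highest linked page number is taken as the total page count.
    total_pages = max(parser.pages + [current_page])
    return {
        "current_page": current_page,
        "total_pages": total_pages,
        "has_next": current_page < total_pages,
    }
```

Using `html.parser` instead of a regex over the whole document keeps the sketch tolerant of attribute order and extra attributes on the anchor tags.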

### API Response Processing

Tools for processing and adapting API responses from different client types.

```python { .api }
class JmApiAdaptTool:
    """
    API response adaptation and transformation utilities.

    Handles conversion between different API response formats and
    standardizes data structures across client types.

    Static Methods:
    - adapt_album_response(response): Adapt album API response
    - adapt_photo_response(response): Adapt photo API response
    - adapt_search_response(response): Adapt search API response
    - normalize_response_data(data): Normalize response data format
    - validate_api_response(response): Validate API response structure
    """

    @staticmethod
    def adapt_album_response(response: Dict[str, Any]) -> 'JmAlbumDetail':
        """
        Adapt album API response to standard format.

        Parameters:
        - response: dict - Raw API response data

        Returns:
        JmAlbumDetail - Standardized album entity
        """

    @staticmethod
    def adapt_photo_response(response: Dict[str, Any]) -> 'JmPhotoDetail':
        """
        Adapt photo API response to standard format.

        Parameters:
        - response: dict - Raw API response data

        Returns:
        JmPhotoDetail - Standardized photo entity
        """

    @staticmethod
    def normalize_response_data(data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Normalize response data format across different APIs.

        Parameters:
        - data: dict - Raw response data

        Returns:
        dict - Normalized data structure
        """
```
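
One plausible way to standardize responses across client types is an alias table that maps source-specific keys to canonical names. The sketch below illustrates that idea only; the key names in `_KEY_ALIASES`, the required fields, and both `sketch_` functions are invented for the example and do not reflect the real response schema.

```python
from typing import Any, Dict, Sequence

# Hypothetical alias table: source-specific keys mapped to one canonical name.
_KEY_ALIASES = {
    "id": "album_id",
    "aid": "album_id",
    "name": "title",
}


def sketch_normalize_response(data: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys and store IDs uniformly as strings."""
    normalized: Dict[str, Any] = {}
    for key, value in data.items():
        canonical = _KEY_ALIASES.get(key, key)
        if canonical == "album_id":
            # IDs may arrive as int or str depending on the source.
            value = str(value)
        normalized[canonical] = value
    return normalized


def sketch_validate_response(
    data: Dict[str, Any],
    required: Sequence[str] = ("album_id", "title"),
) -> bool:
    """Return True only if every required field is present and non-empty."""
    return all(data.get(field) not in (None, "") for field in required)
```

Normalizing first and validating second means the validation rules only need to know the canonical field names.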

### Image Processing Tools

Comprehensive image processing utilities including decryption, format conversion, and manipulation.

```python { .api }
class JmImageTool:
    """
    Image processing, decryption, and format conversion utilities.

    Provides tools for handling scrambled images, format conversion,
    and image manipulation operations.

    Static Methods:
    - decrypt_image(image_data, scramble_id): Decrypt scrambled image
    - is_image_scrambled(image_data): Check if image is scrambled
    - convert_image_format(image_data, target_format): Convert image format
    - resize_image(image_data, width, height): Resize image
    - get_image_info(image_data): Get image metadata
    - merge_images_vertical(images): Merge images vertically
    - optimize_image(image_data): Optimize image for size
    """

    @staticmethod
    def decrypt_image(image_data: bytes, scramble_id: int) -> bytes:
        """
        Decrypt scrambled image data.

        JMComic images are sometimes scrambled for protection.
        This function reverses the scrambling process.

        Parameters:
        - image_data: bytes - Scrambled image data
        - scramble_id: int - Scramble algorithm identifier

        Returns:
        bytes - Decrypted image data
        """

    @staticmethod
    def is_image_scrambled(image_data: bytes) -> bool:
        """
        Check if image data is scrambled.

        Parameters:
        - image_data: bytes - Image data to check

        Returns:
        bool - True if image appears to be scrambled
        """

    @staticmethod
    def convert_image_format(image_data: bytes, target_format: str) -> bytes:
        """
        Convert image to different format.

        Parameters:
        - image_data: bytes - Original image data
        - target_format: str - Target format ('JPEG', 'PNG', 'WEBP')

        Returns:
        bytes - Converted image data
        """

    @staticmethod
    def get_image_info(image_data: bytes) -> Dict[str, Any]:
        """
        Get image metadata and properties.

        Parameters:
        - image_data: bytes - Image data

        Returns:
        dict - Image information (width, height, format, size)
        """

    @staticmethod
    def merge_images_vertical(images: List[bytes]) -> bytes:
        """
        Merge multiple images vertically into single image.

        Parameters:
        - images: List[bytes] - List of image data to merge

        Returns:
        bytes - Merged image data
        """
```
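
This document does not specify the scrambling scheme, so the following sketch only illustrates one common family of image scrambles: cutting the image into horizontal strips and reversing their order. Everything here is an assumption, including the strip-reversal scheme itself, the way strip heights are computed, and the function names. Rows are modeled as plain list items; real code would decode the bytes into pixel rows with an image library first.

```python
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def sketch_split_heights(height: int, num_strips: int) -> List[int]:
    """Split `height` rows into strips; the last strip absorbs the remainder."""
    base = height // num_strips
    heights = [base] * num_strips
    heights[-1] += height % num_strips
    return heights


def _reverse_strips(rows: Sequence[Row], heights: Sequence[int]) -> List[Row]:
    """Cut rows into strips of the given heights and emit the strips in reverse order."""
    strips, start = [], 0
    for h in heights:
        strips.append(list(rows[start:start + h]))
        start += h
    return [row for strip in reversed(strips) for row in strip]


def sketch_scramble(rows: Sequence[Row], num_strips: int) -> List[Row]:
    """Scramble by reversing the order of the horizontal strips."""
    return _reverse_strips(rows, sketch_split_heights(len(rows), num_strips))


def sketch_unscramble(rows: Sequence[Row], num_strips: int) -> List[Row]:
    """Undo sketch_scramble: the scrambled image holds the strips in reversed order,
    so the strip heights must be read in reversed order as well."""
    heights = list(reversed(sketch_split_heights(len(rows), num_strips)))
    return _reverse_strips(rows, heights)
```

Note that the inverse is not simply the same operation applied twice: when the height does not divide evenly, the oversized strip moves from the bottom to the top, so the strip boundaries differ between the two directions.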

### Cryptographic Tools

Encryption and decryption utilities for API communications and data protection.

```python { .api }
class JmCryptoTool:
    """
    Encryption/decryption utilities for API communications.

    Handles the encryption protocols used by JMComic mobile API
    and provides security functions for data protection.

    Static Methods:
    - encrypt_api_request(data): Encrypt API request data
    - decrypt_api_response(encrypted_data): Decrypt API response
    - generate_request_signature(data): Generate request signature
    - validate_response_signature(response): Validate response signature
    - hash_password(password): Hash password for authentication
    """

    @staticmethod
    def encrypt_api_request(data: Dict[str, Any]) -> bytes:
        """
        Encrypt API request data using JMComic protocol.

        Parameters:
        - data: dict - Request data to encrypt

        Returns:
        bytes - Encrypted request data
        """

    @staticmethod
    def decrypt_api_response(encrypted_data: bytes) -> Dict[str, Any]:
        """
        Decrypt API response data using JMComic protocol.

        Parameters:
        - encrypted_data: bytes - Encrypted response data

        Returns:
        dict - Decrypted response data
        """

    @staticmethod
    def generate_request_signature(data: Dict[str, Any]) -> str:
        """
        Generate request signature for API authentication.

        Parameters:
        - data: dict - Request data

        Returns:
        str - Generated signature
        """

    @staticmethod
    def validate_response_signature(response: Dict[str, Any]) -> bool:
        """
        Validate response signature for data integrity.

        Parameters:
        - response: dict - API response with signature

        Returns:
        bool - True if signature is valid
        """
```
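
The signing protocol itself is not documented here. As a generic illustration of how request signing and signature validation usually fit together, the sketch below uses HMAC-SHA256 over a canonical JSON encoding. The choice of HMAC-SHA256, the `signature` field name, the secret `SKETCH_SECRET`, and the function names are all assumptions, not the JMComic protocol.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

# Hypothetical shared secret; a real client would load this from configuration.
SKETCH_SECRET = b"example-secret"


def _canonical_bytes(data: Dict[str, Any]) -> bytes:
    """Serialize with sorted keys so equal dicts always produce equal bytes."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sketch_generate_signature(data: Dict[str, Any], secret: bytes = SKETCH_SECRET) -> str:
    """Return a hex HMAC-SHA256 digest of the canonical payload."""
    return hmac.new(secret, _canonical_bytes(data), hashlib.sha256).hexdigest()


def sketch_validate_signature(response: Dict[str, Any], secret: bytes = SKETCH_SECRET) -> bool:
    """Recompute the signature over everything except the signature field itself."""
    payload = {k: v for k, v in response.items() if k != "signature"}
    expected = sketch_generate_signature(payload, secret)
    received = str(response.get("signature", ""))
    # compare_digest runs in constant time, which avoids leaking match position.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Canonicalizing the payload before hashing matters because two dictionaries with the same content but different key order would otherwise produce different signatures.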

## Usage Examples

```python
# Inputs such as html_content, scrambled_data, and api_response
# are assumed to have been obtained elsewhere.

# Text processing
jm_id = JmcomicText.parse_to_jm_id("https://jmcomic.example/album/123456")
clean_name = JmcomicText.clean_filename("Album: Title/with\\invalid*chars")

# HTML parsing
album_info = PatternTool.match_album_info(html_content)
image_urls = PatternTool.match_image_urls(photo_html)

# Page processing
album = JmPageTool.parse_album_page(album_html)
search_results = JmPageTool.parse_search_page(search_html)

# Image processing
decrypted_image = JmImageTool.decrypt_image(scrambled_data, scramble_id)
image_info = JmImageTool.get_image_info(image_data)
converted_image = JmImageTool.convert_image_format(image_data, 'JPEG')

# API processing
album = JmApiAdaptTool.adapt_album_response(api_response)
normalized_data = JmApiAdaptTool.normalize_response_data(raw_data)

# Cryptographic operations
encrypted_request = JmCryptoTool.encrypt_api_request(request_data)
decrypted_response = JmCryptoTool.decrypt_api_response(encrypted_response)
```

## Integration with Core Systems

The core download and client systems use these tools as follows:

- **Text tools** are used throughout for ID parsing and URL handling
- **Pattern tools** power the HTML client's data extraction
- **Page tools** convert HTML pages to structured entities
- **API tools** standardize responses across different client types
- **Image tools** handle content processing in downloaders and plugins
- **Crypto tools** secure API communications in the mobile client